[GitHub] spark pull request #15118: Branch 2.0

2016-09-23 Thread asfgit
GitHub user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15118





[GitHub] spark pull request #15118: Branch 2.0

2016-09-16 Thread avinashkolla
GitHub user avinashkolla opened a pull request:

https://github.com/apache/spark/pull/15118

Branch 2.0

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)


## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)


(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15118.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15118


commit f46a074510e47206de9d3b3ac6902af321923ce8
Author: Sylvain Zimmer 
Date:   2016-07-28T16:51:45Z

[SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowMap

Avoid overflow of Long type causing a NegativeArraySizeException a few 
lines later.
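
The failure mode can be sketched in Python by emulating the wraparound of JVM `Long` subtraction. This is a minimal illustration, not the actual patch: `wrap_int64`, `is_dense_unguarded`, `is_dense_guarded`, and the threshold are hypothetical names chosen here, and it assumes the overflowing quantity is a key range computed as `max_key - min_key`.

```python
INT64_MIN = -(2**63)
INT64_SPAN = 2**64


def wrap_int64(value):
    """Wrap a Python int to a signed 64-bit value, like JVM Long arithmetic."""
    return (value - INT64_MIN) % INT64_SPAN + INT64_MIN


def is_dense_unguarded(key_range, threshold):
    # Unguarded check: a negative (overflowed) range passes as "small".
    return key_range < threshold


def is_dense_guarded(key_range, threshold):
    # Guarded check: reject a range that wrapped around to a negative value.
    return 0 <= key_range < threshold


# Hypothetical keys whose true difference exceeds the largest signed 64-bit value.
min_key = -(2**62) - 1
max_key = 2**62 + 1
key_range = wrap_int64(max_key - min_key)
```

With the unguarded check, the wrapped (negative) range would go on to size an array, which is the kind of path that ends in a NegativeArraySizeException.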

Unit tests for HashedRelationSuite still pass.

I can confirm the python script I included in 
https://issues.apache.org/jira/browse/SPARK-16740 works fine with this patch. 
Unfortunately I don't have the knowledge/time to write a Scala test case for 
HashedRelationSuite right now. As the patch is pretty obvious I hope it can be 
included without this.

Thanks!

Author: Sylvain Zimmer 

Closes #14373 from sylvinus/master.

(cherry picked from commit 1178d61ede816bf1c8d5bb3dbb3b965c9b944407)
Signed-off-by: Reynold Xin 

commit fb09a693d6f58d71ec042224b8ea66b972c1adc2
Author: Sameer Agarwal 
Date:   2016-07-28T20:04:19Z

[SPARK-16764][SQL] Recommend disabling vectorized parquet reader on 
OutOfMemoryError

## What changes were proposed in this pull request?

We currently don't bound or manage the data array size used by column 
vectors in the vectorized reader (they're bounded only by INT.MAX), which may 
lead to OOMs while reading data. As a short-term fix, this patch intercepts the 
OutOfMemoryError and suggests that the user disable the vectorized 
parquet reader.
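
The interception pattern can be sketched as follows. This is a hypothetical Python analogue, not the Spark code: `read_batch_with_hint` and the wording of the hint are made up for illustration, Python's `MemoryError` stands in for the JVM's `OutOfMemoryError`, and `spark.sql.parquet.enableVectorizedReader` is assumed to be the setting the message should point to.

```python
VECTORIZED_READER_KEY = "spark.sql.parquet.enableVectorizedReader"


def read_batch_with_hint(read_batch):
    """Call read_batch() and turn an out-of-memory failure into an actionable error."""
    try:
        return read_batch()
    except MemoryError as err:
        hint = (
            "Ran out of memory while reading a columnar batch. "
            f"Consider setting {VECTORIZED_READER_KEY} to false."
        )
        # Chain the original error so the root cause stays visible.
        raise RuntimeError(hint) from err


def failing_reader():
    # Stand-in for a reader whose buffer allocation fails.
    raise MemoryError("simulated allocation failure")
```

Calling `read_batch_with_hint(failing_reader)` raises a `RuntimeError` that carries the hint, while a reader that succeeds has its result returned unchanged.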

## How was this patch tested?

Existing Tests

Author: Sameer Agarwal 

Closes #14387 from sameeragarwal/oom.

(cherry picked from commit 3fd39b87bda77f3c3a4622d854f23d4234683571)
Signed-off-by: Reynold Xin 

commit 5cd79c396f98660e12b02c0151a084b4d1599b6b
Author: Nicholas Chammas 
Date:   2016-07-28T21:57:15Z

[SPARK-16772] Correct API doc references to PySpark classes + formatting 
fixes

## What's Been Changed

The PR corrects several broken or missing class references in the Python 
API docs. It also corrects formatting problems.

For example, you can see 
[here](http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.SQLContext.registerFunction)
 how Sphinx is not picking up the reference to `DataType`. That's because the 
reference is relative to the current module, whereas `DataType` is in a 
different module.

You can also see 
[here](http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.SQLContext.createDataFrame)
 how the formatting for byte, tinyint, and so on is italic instead of 
monospace. That's because in ReST single backticks just make things italic, 
unlike in Markdown.
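
Both kinds of fix can be illustrated with a small docstring. This is a sketch, not code from the PR: `register_function` below is a hypothetical stand-in, and the exact rendering of single backticks depends on the Sphinx default role.

```python
def register_function(name, f, return_type=None):
    """Register ``f`` under ``name`` (hypothetical helper for illustration).

    :param return_type: a :class:`pyspark.sql.types.DataType` object.
        Spelling out the module path lets Sphinx resolve the class even
        though this docstring lives in a different module.

    Double backticks, as in ``byte`` and ``tinyint``, render as monospace.
    Single backticks, as in `byte`, use the default role and typically
    render as italic.
    """
    return name, f, return_type
```

The function body is irrelevant here; only the markup inside the docstring matters.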

## Testing

I tested this PR by [building the Python 
docs](https://github.com/apache/spark/tree/master/docs#generating-the-documentation-html)
 and reviewing the results locally in my browser. I confirmed that the broken 
or missing class references were resolved, and that the formatting was 
corrected.

Author: Nicholas Chammas 

Closes #14393 from nchammas/python-docstring-fixes.

(cherry picked from commit 274f3b9ec86e4109c7678eef60f990d41dc3899f)
Signed-off-by: Reynold Xin 

commit ed03d0a690c9a7920a21c858df7f42f9a41f28d7
Author: Wesley Tang 
Date:   2016-07-29T11:26:05Z

[SPARK-16664][SQL] Fix persist call on Data frames with more than 200…

## What changes were proposed in this pull request?

Commit f12f11e578169b47e3f8b18b299948c0670ba585 introduced this bug: a 
foreach was mistakenly written as map.

## How was this patch tested?

Test added

Author: Wesley Tang 

Closes #14324 from breakdawn/master.

(cherry picked from commit d1d5069aa3744d46abd3889abab5f15e9067382a)
Signed-off-by: Sean Owen 

commit efad4aa1468867b36cffb1e8c91f9731c48eca81
Author: Yanbo Liang 
Date:   2016-07-29T11:40:20Z

[SPARK-16750][ML] Fix GaussianMixture training failed due to feature column 
type mistake

## What changes were proposed in this pull request?
ML ```GaussianMixture``` training failed due to feature column type 
mistake. The feature column type sh