spark git commit: [SPARK-16353][BUILD][DOC] Missing javadoc options for java unidoc

2016-07-04 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-1.6 c25aa8fca -> 4fcb88843 [SPARK-16353][BUILD][DOC] Missing javadoc options for java unidoc Link to Jira issue: https://issues.apache.org/jira/browse/SPARK-16353 ## What changes were proposed in this pull request? The javadoc options for

spark git commit: [SPARK-16353][BUILD][DOC] Missing javadoc options for java unidoc

2016-07-04 Thread srowen
Repository: spark Updated Branches: refs/heads/master 18fb57f58 -> 7dbffcdd6 [SPARK-16353][BUILD][DOC] Missing javadoc options for java unidoc Link to Jira issue: https://issues.apache.org/jira/browse/SPARK-16353 ## What changes were proposed in this pull request? The javadoc options for the

spark git commit: [SPARK-16353][BUILD][DOC] Missing javadoc options for java unidoc

2016-07-04 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 d5683a703 -> cc100ab54 [SPARK-16353][BUILD][DOC] Missing javadoc options for java unidoc Link to Jira issue: https://issues.apache.org/jira/browse/SPARK-16353 ## What changes were proposed in this pull request? The javadoc options for

spark git commit: [SPARK-16339][CORE] ScriptTransform does not print stderr when outstream is lost

2016-07-06 Thread srowen
Repository: spark Updated Branches: refs/heads/master ec79183ac -> 5f342049c [SPARK-16339][CORE] ScriptTransform does not print stderr when outstream is lost ## What changes were proposed in this pull request? Currently, if due to some failure, the outstream gets destroyed or closed and late

spark git commit: [SPARK-16339][CORE] ScriptTransform does not print stderr when outstream is lost

2016-07-06 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 6e8fa86eb -> 521fc7186 [SPARK-16339][CORE] ScriptTransform does not print stderr when outstream is lost ## What changes were proposed in this pull request? Currently, if due to some failure, the outstream gets destroyed or closed and

spark git commit: [MINOR][BUILD] Download Maven 3.3.9 instead of 3.3.3 because the latter is no longer published on Apache mirrors

2016-07-06 Thread srowen
use the latter is no longer published on Apache mirrors ## How was this patch tested? Jenkins Author: Sean Owen Closes #14066 from srowen/Maven339Branch16. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/76781950 Tree: h

spark git commit: [MINOR][CORE][1.6-BACKPORT] Fix display wrong free memory size in the log

2016-07-06 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-1.6 76781950f -> 2588776ad [MINOR][CORE][1.6-BACKPORT] Fix display wrong free memory size in the log ## What changes were proposed in this pull request? Free memory size displayed in the log is wrong (used memory), fix to make it correct.

spark git commit: [SPARK-16372][MLLIB] Retag RDD to tallSkinnyQR of RowMatrix

2016-07-07 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-1.6 2588776ad -> 45dda9221 [SPARK-16372][MLLIB] Retag RDD to tallSkinnyQR of RowMatrix ## What changes were proposed in this pull request? The following Java code because of type erasing: ```Java JavaRDD rows = jsc.parallelize(...); RowMa

spark git commit: [SPARK-16372][MLLIB] Retag RDD to tallSkinnyQR of RowMatrix

2016-07-07 Thread srowen
Repository: spark Updated Branches: refs/heads/master 986b25140 -> 4c6f00d09 [SPARK-16372][MLLIB] Retag RDD to tallSkinnyQR of RowMatrix ## What changes were proposed in this pull request? The following Java code because of type erasing: ```Java JavaRDD rows = jsc.parallelize(...); RowMatrix

spark git commit: [SPARK-16372][MLLIB] Retag RDD to tallSkinnyQR of RowMatrix

2016-07-07 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 d63428af6 -> 24933355c [SPARK-16372][MLLIB] Retag RDD to tallSkinnyQR of RowMatrix ## What changes were proposed in this pull request? The following Java code because of type erasing: ```Java JavaRDD rows = jsc.parallelize(...); RowMa

spark git commit: [SPARK-16399][PYSPARK] Force PYSPARK_PYTHON to python

2016-07-07 Thread srowen
Repository: spark Updated Branches: refs/heads/master 4c6f00d09 -> 6343f6655 [SPARK-16399][PYSPARK] Force PYSPARK_PYTHON to python ## What changes were proposed in this pull request? I would like to change ```bash if hash python2.7 2>/dev/null; then # Attempt to use Python 2.7, if installe

spark git commit: [SPARK-16369][MLLIB] tallSkinnyQR of RowMatrix should aware of empty partition

2016-07-08 Thread srowen
Repository: spark Updated Branches: refs/heads/master a54438cb2 -> 255d74fe4 [SPARK-16369][MLLIB] tallSkinnyQR of RowMatrix should aware of empty partition ## What changes were proposed in this pull request? tallSkinnyQR of RowMatrix should aware of empty partition, which could cause excepti

spark git commit: [SPARK-16369][MLLIB] tallSkinnyQR of RowMatrix should aware of empty partition

2016-07-08 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 221a4a7fb -> 8c8180605 [SPARK-16369][MLLIB] tallSkinnyQR of RowMatrix should aware of empty partition ## What changes were proposed in this pull request? tallSkinnyQR of RowMatrix should aware of empty partition, which could cause exc

spark git commit: [SPARK-17507][ML][MLLIB] check weight vector size in ANN

2016-09-15 Thread srowen
Repository: spark Updated Branches: refs/heads/master 6a6adb167 -> d15b4f90e [SPARK-17507][ML][MLLIB] check weight vector size in ANN ## What changes were proposed in this pull request? as the TODO described, check weight vector size and if wrong throw exception. ## How was this patch tested

spark git commit: [SPARK-17524][TESTS] Use specified spark.buffer.pageSize

2016-09-15 Thread srowen
Repository: spark Updated Branches: refs/heads/master d15b4f90e -> f893e2625 [SPARK-17524][TESTS] Use specified spark.buffer.pageSize ## What changes were proposed in this pull request? This PR has the appendRowUntilExceedingPageSize test in RowBasedKeyValueBatchSuite use whatever spark.buff

spark git commit: [SPARK-17521] Error when I use sparkContext.makeRDD(Seq())

2016-09-15 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 bb2bdb440 -> 5c2bc8360 [SPARK-17521] Error when I use sparkContext.makeRDD(Seq()) ## What changes were proposed in this pull request? when i use sc.makeRDD below ``` val data3 = sc.makeRDD(Seq()) println(data3.partitions.length) ``` I

spark git commit: [SPARK-17521] Error when I use sparkContext.makeRDD(Seq())

2016-09-15 Thread srowen
Repository: spark Updated Branches: refs/heads/master f893e2625 -> 647ee05e5 [SPARK-17521] Error when I use sparkContext.makeRDD(Seq()) ## What changes were proposed in this pull request? when i use sc.makeRDD below ``` val data3 = sc.makeRDD(Seq()) println(data3.partitions.length) ``` I got

spark git commit: [SPARK-17406][WEB UI] limit timeline executor events

2016-09-15 Thread srowen
Repository: spark Updated Branches: refs/heads/master 647ee05e5 -> ad79fc0a8 [SPARK-17406][WEB UI] limit timeline executor events ## What changes were proposed in this pull request? The job page will be too slow to open when there are thousands of executor events(added or removed). I found th

spark git commit: [SPARK-17536][SQL] Minor performance improvement to JDBC batch inserts

2016-09-15 Thread srowen
Repository: spark Updated Branches: refs/heads/master ad79fc0a8 -> 71a65825c [SPARK-17536][SQL] Minor performance improvement to JDBC batch inserts ## What changes were proposed in this pull request? Optimize a while loop during batch inserts ## How was this patch tested? Unit tests were do

spark git commit: [SPARK-17406][BUILD][HOTFIX] MiMa excludes fix

2016-09-15 Thread srowen
this is merged to 2.1.x only / master, I left the exclude in for 2.0.x in case we back port. It's a private API so is always a false positive. ## How was this patch tested? Jenkins build Author: Sean Owen Closes #15110 from srowen/SPARK-17406.2. Project: http://git-wip-us.apache.org/re

spark git commit: [SPARK-17543] Missing log4j config file for tests in common/network-…

2016-09-16 Thread srowen
Repository: spark Updated Branches: refs/heads/master b72486f82 -> b2e272624 [SPARK-17543] Missing log4j config file for tests in common/network-… ## What changes were proposed in this pull request? The Maven module `common/network-shuffle` does not have a log4j configuration file for its

spark git commit: [SPARK-17534][TESTS] Increase timeouts for DirectKafkaStreamSuite tests

2016-09-16 Thread srowen
Repository: spark Updated Branches: refs/heads/master b2e272624 -> fc1efb720 [SPARK-17534][TESTS] Increase timeouts for DirectKafkaStreamSuite tests **## What changes were proposed in this pull request?** There are two tests in this suite that are particularly flaky on the following hardware:

spark git commit: Correct fetchsize property name in docs

2016-09-17 Thread srowen
Repository: spark Updated Branches: refs/heads/master 39e2bad6a -> 69cb04969 Correct fetchsize property name in docs ## What changes were proposed in this pull request? Replace `fetchSize` with `fetchsize` in the docs. ## How was this patch tested? I manually tested `fetchSize` and `fetchsi

spark git commit: Correct fetchsize property name in docs

2016-09-17 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 3fce1255a -> 9ff158b81 Correct fetchsize property name in docs ## What changes were proposed in this pull request? Replace `fetchSize` with `fetchsize` in the docs. ## How was this patch tested? I manually tested `fetchSize` and `fet

spark git commit: [SPARK-17567][DOCS] Use valid url to Spark RDD paper

2016-09-17 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 9ff158b81 -> 3ca0dc007 [SPARK-17567][DOCS] Use valid url to Spark RDD paper https://issues.apache.org/jira/browse/SPARK-17567 ## What changes were proposed in this pull request? Documentation (http://spark.apache.org/docs/latest/api/

spark git commit: [SPARK-17567][DOCS] Use valid url to Spark RDD paper

2016-09-17 Thread srowen
Repository: spark Updated Branches: refs/heads/master 69cb04969 -> f15d41be3 [SPARK-17567][DOCS] Use valid url to Spark RDD paper https://issues.apache.org/jira/browse/SPARK-17567 ## What changes were proposed in this pull request? Documentation (http://spark.apache.org/docs/latest/api/scal

spark git commit: [SPARK-17561][DOCS] DataFrameWriter documentation formatting problems

2016-09-17 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 3ca0dc007 -> c9bd67e94 [SPARK-17561][DOCS] DataFrameWriter documentation formatting problems Fix ` / ` problems in SQL scaladoc. Scaladoc build and manual verification of generated HTML. Author: Sean Owen Closes #15117 from sro

spark git commit: [SPARK-17548][MLLIB] Word2VecModel.findSynonyms no longer spuriously rejects the best match when invoked with a vector

2016-09-17 Thread srowen
Repository: spark Updated Branches: refs/heads/master f15d41be3 -> 25cbbe6ca [SPARK-17548][MLLIB] Word2VecModel.findSynonyms no longer spuriously rejects the best match when invoked with a vector ## What changes were proposed in this pull request? This pull request changes the behavior of `W

spark git commit: [SPARK-17548][MLLIB] Word2VecModel.findSynonyms no longer spuriously rejects the best match when invoked with a vector

2016-09-17 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 c9bd67e94 -> eb2675de9 [SPARK-17548][MLLIB] Word2VecModel.findSynonyms no longer spuriously rejects the best match when invoked with a vector ## What changes were proposed in this pull request? This pull request changes the behavior o

spark git commit: [SPARK-17529][CORE] Implement BitSet.clearUntil and use it during merge joins

2016-09-17 Thread srowen
Repository: spark Updated Branches: refs/heads/master 25cbbe6ca -> 9dbd4b864 [SPARK-17529][CORE] Implement BitSet.clearUntil and use it during merge joins ## What changes were proposed in this pull request? Add a clearUntil() method on BitSet (adapted from the pre-existing setUntil() method)
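
As a rough illustration of what a `clearUntil()` operation on a word-backed bit set might look like, here is a minimal Java sketch. The class `SimpleBitSet` and its layout are hypothetical and are not Spark's `BitSet`; the sketch only assumes that bits are packed into 64-bit words and that `clearUntil(n)` clears every bit with an index below `n`.

```java
// Hypothetical illustration only; this is not Spark's BitSet implementation.
final class SimpleBitSet {
    private final long[] words;

    SimpleBitSet(int numBits) {
        // One 64-bit word per 64 bits, rounded up.
        this.words = new long[(numBits + 63) >>> 6];
    }

    void set(int index) {
        words[index >>> 6] |= 1L << (index & 63);
    }

    boolean get(int index) {
        return (words[index >>> 6] & (1L << (index & 63))) != 0;
    }

    /** Clears every bit whose index is strictly less than bitIndex. */
    void clearUntil(int bitIndex) {
        int wordIndex = bitIndex >>> 6;
        // Zero all whole words that lie entirely below the boundary.
        int fullWords = Math.min(wordIndex, words.length);
        for (int i = 0; i < fullWords; i++) {
            words[i] = 0L;
        }
        // In the boundary word, keep only the bits at or above bitIndex.
        if (wordIndex < words.length) {
            words[wordIndex] &= -1L << (bitIndex & 63);
        }
    }
}
```

Clearing whole words first and masking only the boundary word keeps the cost proportional to the number of words rather than the number of bits.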

spark git commit: [SPARK-17575][DOCS] Remove extra table tags in configuration document

2016-09-17 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 eb2675de9 -> ec2b73656 [SPARK-17575][DOCS] Remove extra table tags in configuration document ## What changes were proposed in this pull request? Remove extra table tags in configurations document. ## How was this patch tested? Run al

spark git commit: [SPARK-17575][DOCS] Remove extra table tags in configuration document

2016-09-17 Thread srowen
Repository: spark Updated Branches: refs/heads/master 9dbd4b864 -> bbe0b1d62 [SPARK-17575][DOCS] Remove extra table tags in configuration document ## What changes were proposed in this pull request? Remove extra table tags in configurations document. ## How was this patch tested? Run all te

spark git commit: [SPARK-17480][SQL][FOLLOWUP] Fix more instances which calls List.length/size which is O(n)

2016-09-17 Thread srowen
Repository: spark Updated Branches: refs/heads/master bbe0b1d62 -> 86c2d393a [SPARK-17480][SQL][FOLLOWUP] Fix more instances which calls List.length/size which is O(n) ## What changes were proposed in this pull request? This PR fixes all the instances which was fixed in the previous PR. To

spark git commit: [SPARK-17480][SQL][FOLLOWUP] Fix more instances which calls List.length/size which is O(n)

2016-09-17 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 ec2b73656 -> a3bba372a [SPARK-17480][SQL][FOLLOWUP] Fix more instances which calls List.length/size which is O(n) This PR fixes all the instances which was fixed in the previous PR. To make sure, I manually debugged and also checked t

spark git commit: [SPARK-17480][SQL][FOLLOWUP] Fix more instances which calls List.length/size which is O(n)

2016-09-17 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 0cfc0469b -> 5fd354b2d [SPARK-17480][SQL][FOLLOWUP] Fix more instances which calls List.length/size which is O(n) This PR fixes all the instances which was fixed in the previous PR. To make sure, I manually debugged and also checked t

spark-website git commit: replace with valid url to rdd paper

2016-09-17 Thread srowen
Repository: spark-website Updated Branches: refs/heads/asf-site a78faf582 -> eee58685c replace with valid url to rdd paper Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/eee58685 Tree: http://git-wip-us.ap

spark git commit: [SPARK-17506][SQL] Improve the check double values equality rule.

2016-09-18 Thread srowen
Repository: spark Updated Branches: refs/heads/master 3fe630d31 -> 5d3f4615f [SPARK-17506][SQL] Improve the check double values equality rule. ## What changes were proposed in this pull request? In `ExpressionEvalHelper`, we check the equality between two double values by comparing whether t

spark git commit: [SPARK-17546][DEPLOY] start-* scripts should use hostname -f

2016-09-18 Thread srowen
of course, but also verified output of command on OS X and Linux Author: Sean Owen Closes #15129 from srowen/SPARK-17546. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/342c0e65 Tree: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-17546][DEPLOY] start-* scripts should use hostname -f

2016-09-18 Thread srowen
sts of course, but also verified output of command on OS X and Linux Author: Sean Owen Closes #15129 from srowen/SPARK-17546. (cherry picked from commit 342c0e65bec4b9a715017089ab6ea127f3c46540) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://

spark git commit: [SPARK-17586][BUILD] Do not call static member via instance reference

2016-09-18 Thread srowen
Repository: spark Updated Branches: refs/heads/master 342c0e65b -> 7151011b3 [SPARK-17586][BUILD] Do not call static member via instance reference ## What changes were proposed in this pull request? This PR fixes a warning message as below: ``` [WARNING] .../UnsafeInMemorySorter.java:284: wa

spark git commit: [SPARK-17586][BUILD] Do not call static member via instance reference

2016-09-18 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 5619f095b -> 6c67d86f2 [SPARK-17586][BUILD] Do not call static member via instance reference ## What changes were proposed in this pull request? This PR fixes a warning message as below: ``` [WARNING] .../UnsafeInMemorySorter.java:284

spark git commit: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV cast null values properly

2016-09-18 Thread srowen
Repository: spark Updated Branches: refs/heads/master 7151011b3 -> 1dbb725db [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV cast null values properly ## Problem CSV in Spark 2.0.0: - does not read null values back correctly for certain data types such as `Boolean`, `TimestampType`, `

spark git commit: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV cast null values properly

2016-09-18 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 6c67d86f2 -> 151f808a1 [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV cast null values properly ## Problem CSV in Spark 2.0.0: - does not read null values back correctly for certain data types such as `Boolean`, `TimestampType

spark git commit: [SPARK-17297][DOCS] Clarify window/slide duration as absolute time, not relative to a calendar

2016-09-19 Thread srowen
not relative to a calendar. ## How was this patch tested? Doc build (no functional change) Author: Sean Owen Closes #15142 from srowen/SPARK-17297. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d720a401 Tree: http://git-

spark git commit: [SPARK-17297][DOCS] Clarify window/slide duration as absolute time, not relative to a calendar

2016-09-19 Thread srowen
not relative to a calendar. ## How was this patch tested? Doc build (no functional change) Author: Sean Owen Closes #15142 from srowen/SPARK-17297. (cherry picked from commit d720a4019460b6c284d0473249303c349df60a1f) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/sp

spark git commit: [SPARK-17437] Add uiWebUrl to JavaSparkContext and pyspark.SparkContext

2016-09-20 Thread srowen
Repository: spark Updated Branches: refs/heads/master f039d964d -> 4a426ff8a [SPARK-17437] Add uiWebUrl to JavaSparkContext and pyspark.SparkContext ## What changes were proposed in this pull request? The Scala version of `SparkContext` has a handy field called `uiWebUrl` that tells you whic

spark-website git commit: Add Israel Spark meetup to community page per request. Use https for meetup while we're here. Pick up a recent change to paper hyperlink reflected only in markdown, not HTML

2016-09-21 Thread srowen
Repository: spark-website Updated Branches: refs/heads/asf-site eee58685c -> 7c96b646e Add Israel Spark meetup to community page per request. Use https for meetup while we're here. Pick up a recent change to paper hyperlink reflected only in markdown, not HTML Project: http://git-wip-us.apa

spark git commit: [CORE][DOC] Fix errors in comments

2016-09-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master e48ebc4e4 -> 61876a427 [CORE][DOC] Fix errors in comments ## What changes were proposed in this pull request? While reading source code of CORE and SQL core, I found some minor errors in comments such as extra space, missing blank line and

spark git commit: [SPARK-17595][MLLIB] Use a bounded priority queue to find synonyms in Word2VecModel

2016-09-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master d3b886976 -> 7654385f2 [SPARK-17595][MLLIB] Use a bounded priority queue to find synonyms in Word2VecModel ## What changes were proposed in this pull request? The code in `Word2VecModel.findSynonyms` to choose the vocabulary elements with
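
The general idea of a bounded priority queue for top-k selection can be sketched in Java as follows. This is not the code from the commit: `TopK` and `topK` are hypothetical names, and the sketch assumes only that each vocabulary element has a numeric similarity score and that the `k` highest-scoring elements are wanted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of bounded top-k selection; not Spark's Word2VecModel code.
final class TopK {
    /** Returns the indices of the k largest scores, best first. */
    static List<Integer> topK(double[] scores, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap by score: the head is the weakest candidate currently kept.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>(k, (a, b) -> Double.compare(scores[a], scores[b]));
        for (int i = 0; i < scores.length; i++) {
            if (heap.size() < k) {
                heap.add(i);
            } else if (scores[i] > scores[heap.peek()]) {
                // Evict the weakest candidate so the heap never exceeds k entries.
                heap.poll();
                heap.add(i);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        // Order the survivors from highest to lowest score.
        result.sort((a, b) -> Double.compare(scores[b], scores[a]));
        return result;
    }
}
```

Because the heap never holds more than `k` entries, selection takes O(n log k) time and O(k) extra space, instead of sorting all n scores.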

spark git commit: [SPARK-17017][MLLIB][ML] add a chiSquare Selector based on False Positive Rate (FPR) test

2016-09-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master 28fafa3ee -> b366f1849 [SPARK-17017][MLLIB][ML] add a chiSquare Selector based on False Positive Rate (FPR) test ## What changes were proposed in this pull request? Univariate feature selection works by selecting the best features based o

spark git commit: [SPARK-17219][ML] Add NaN value handling in Bucketizer

2016-09-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master b366f1849 -> 57dc326bd [SPARK-17219][ML] Add NaN value handling in Bucketizer ## What changes were proposed in this pull request? This PR fixes an issue when Bucketizer is called to handle a dataset containing NaN value. Sometimes, null va

spark git commit: [SPARK-17583][SQL] Remove uesless rowSeparator variable and set auto-expanding buffer as default for maxCharsPerColumn option in CSV

2016-09-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master 57dc326bd -> 25a020be9 [SPARK-17583][SQL] Remove uesless rowSeparator variable and set auto-expanding buffer as default for maxCharsPerColumn option in CSV ## What changes were proposed in this pull request? This PR includes the changes b

spark git commit: [CORE][MINOR] Add minor code change to TaskState and Task

2016-09-21 Thread srowen
Repository: spark Updated Branches: refs/heads/master 25a020be9 -> dd7561d33 [CORE][MINOR] Add minor code change to TaskState and Task ## What changes were proposed in this pull request? - TaskState and ExecutorState expose isFailed and isFinished functions. It can be useful to add test cover

spark git commit: [BACKPORT 2.0][MINOR][BUILD] Fix CheckStyle Error

2016-09-21 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 65295bad9 -> 45bccdd9c [BACKPORT 2.0][MINOR][BUILD] Fix CheckStyle Error ## What changes were proposed in this pull request? This PR is to fix the code style errors. ## How was this patch tested? Manual. Before: ``` ./dev/lint-java Us

spark git commit: [SPARK-17421][DOCS] Documenting the current treatment of MAVEN_OPTS.

2016-09-22 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 e8b26be9b -> b25a8e6e1 [SPARK-17421][DOCS] Documenting the current treatment of MAVEN_OPTS. ## What changes were proposed in this pull request? Modified the documentation to clarify that `build/mvn` and `pom.xml` always add Java 7-spe

spark git commit: [SPARK-17421][DOCS] Documenting the current treatment of MAVEN_OPTS.

2016-09-22 Thread srowen
Repository: spark Updated Branches: refs/heads/master de7df7def -> 646f38346 [SPARK-17421][DOCS] Documenting the current treatment of MAVEN_OPTS. ## What changes were proposed in this pull request? Modified the documentation to clarify that `build/mvn` and `pom.xml` always add Java 7-specifi

spark git commit: [BUILD] Closes some stale PRs

2016-09-23 Thread srowen
Repository: spark Updated Branches: refs/heads/master 62ccf27ab -> 5c5396cb4 [BUILD] Closes some stale PRs ## What changes were proposed in this pull request? This PR proposes to close some stale PRs and ones suggested to be closed by committer(s) Closes #12415 Closes #14765 Closes #15118 C

spark git commit: [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulator API on top of Accumulator V2

2016-09-23 Thread srowen
Repository: spark Updated Branches: refs/heads/master 5c5396cb4 -> 90d575421 [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulator API on top of Accumulator V2 ## What changes were proposed in this pull request? Move the internals of the PySpark accumulator API from the old deprecated AP

spark git commit: [SPARK-10835][ML] Word2Vec should accept non-null string array, in addition to existing null string array

2016-09-24 Thread srowen
Vec, output a nullable string array type in NGram ## How was this patch tested? Jenkins tests. Author: Sean Owen Closes #15179 from srowen/SPARK-10835. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f3fe5543 Tree: http://

spark git commit: [SPARK-10835][ML] Word2Vec should accept non-null string array, in addition to existing null string array

2016-09-24 Thread srowen
ith Word2Vec, output a nullable string array type in NGram ## How was this patch tested? Jenkins tests. Author: Sean Owen Closes #15179 from srowen/SPARK-10835. (cherry picked from commit f3fe55439e4c865c26502487a1bccf255da33f4a) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/re

spark git commit: [SPARK-17057][ML] ProbabilisticClassifierModels' thresholds should have at most one 0

2016-09-24 Thread srowen
omForest cutoff, requiring all > 0 ## How was this patch tested? Jenkins tests plus new test cases Author: Sean Owen Closes #15149 from srowen/SPARK-17057. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/248916f5 Tree: htt

spark git commit: [SPARK-17017][FOLLOW-UP][ML] Refactor of ChiSqSelector and add ML Python API.

2016-09-26 Thread srowen
Repository: spark Updated Branches: refs/heads/master 59d87d240 -> ac65139be [SPARK-17017][FOLLOW-UP][ML] Refactor of ChiSqSelector and add ML Python API. ## What changes were proposed in this pull request? #14597 modified ```ChiSqSelector``` to support ```fpr``` type selector, however, it le

spark git commit: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdbc

2016-09-26 Thread srowen
Repository: spark Updated Branches: refs/heads/master ac65139be -> 50b89d05b [SPARK-14525][SQL] Make DataFrameWrite.save work for jdbc ## What changes were proposed in this pull request? This change modifies the implementation of DataFrameWriter.save such that it works with jdbc, and the cal

spark git commit: [SPARK-17017][ML][MLLIB][ML][DOC] Updated the ml/mllib feature selection docs for ChiSqSelector

2016-09-28 Thread srowen
Repository: spark Updated Branches: refs/heads/master 4a8339568 -> b2a7eedcd [SPARK-17017][ML][MLLIB][ML][DOC] Updated the ml/mllib feature selection docs for ChiSqSelector ## What changes were proposed in this pull request? A follow up for #14597 to update feature selection docs about ChiSq

spark git commit: [MINOR][PYSPARK][DOCS] Fix examples in PySpark documentation

2016-09-28 Thread srowen
Repository: spark Updated Branches: refs/heads/master b2a7eedcd -> 219003775 [MINOR][PYSPARK][DOCS] Fix examples in PySpark documentation ## What changes were proposed in this pull request? This PR proposes to fix wrongly indented examples in PySpark documentation ``` ->>> json_sdf =

spark git commit: [MINOR][PYSPARK][DOCS] Fix examples in PySpark documentation

2016-09-28 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 1b02f8820 -> 4d73d5cd8 [MINOR][PYSPARK][DOCS] Fix examples in PySpark documentation ## What changes were proposed in this pull request? This PR proposes to fix wrongly indented examples in PySpark documentation ``` ->>> json_s

spark git commit: [SPARK-17614][SQL] sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support

2016-09-29 Thread srowen
y rather than hard-coded WHERE 1=0 query ## How was this patch tested? Existing tests. Author: Sean Owen Closes #15196 from srowen/SPARK-17614. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b35b0dbb Tree: http://git

spark git commit: [MINOR][DOCS] Fix th doc. of spark-streaming with kinesis

2016-09-29 Thread srowen
Repository: spark Updated Branches: refs/heads/master b35b0dbbf -> b2e9731ca [MINOR][DOCS] Fix th doc. of spark-streaming with kinesis ## What changes were proposed in this pull request? This pr is just to fix the document of `spark-kinesis-integration`. Since `SPARK-17418` prevented all the k

spark git commit: [MINOR][DOCS] Fix th doc. of spark-streaming with kinesis

2016-09-29 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 7d612a7d5 -> ca8130050 [MINOR][DOCS] Fix th doc. of spark-streaming with kinesis ## What changes were proposed in this pull request? This pr is just to fix the document of `spark-kinesis-integration`. Since `SPARK-17418` prevented all t

spark git commit: [SPARK-17704][ML][MLLIB] ChiSqSelector performance improvement.

2016-10-01 Thread srowen
put ## How was this patch tested? Existing tests. Author: Sean Owen Closes #15299 from srowen/SPARK-17704.2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b88cb63d Tree: http://git-wip-us.apache.org/repos/asf/spark/t

spark git commit: [SPARK-17598][SQL][WEB UI] User-friendly name for Spark Thrift Server in web UI

2016-10-03 Thread srowen
Repository: spark Updated Branches: refs/heads/master 76dc2d907 -> de3f71ed7 [SPARK-17598][SQL][WEB UI] User-friendly name for Spark Thrift Server in web UI ## What changes were proposed in this pull request? The name of Spark Thrift JDBC/ODBC Server in web UI reflects the name of the class,

spark git commit: [SPARK-17736][DOCUMENTATION][SPARKR] Update R README for rmarkdown,…

2016-10-03 Thread srowen
Repository: spark Updated Branches: refs/heads/master de3f71ed7 -> a27033c0b [SPARK-17736][DOCUMENTATION][SPARKR] Update R README for rmarkdown,… ## What changes were proposed in this pull request? To build R docs (which are built when R tests are run), users need to install pandoc and rma

spark git commit: [SPARK-17736][DOCUMENTATION][SPARKR] Update R README for rmarkdown,…

2016-10-03 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 744aac8e6 -> b57e2acb1 [SPARK-17736][DOCUMENTATION][SPARKR] Update R README for rmarkdown,… ## What changes were proposed in this pull request? To build R docs (which are built when R tests are run), users need to install pandoc and

spark git commit: [SPARK-17671][WEBUI] Spark 2.0 history server summary page is slow even set spark.history.ui.maxApplications

2016-10-04 Thread srowen
simpler too to make this internal method also pass through an Iterator. ## How was this patch tested? Existing tests. Author: Sean Owen Closes #15321 from srowen/SPARK-17671. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spar

spark git commit: [SPARK-16962][CORE][SQL] Fix misaligned record accesses for SPARC architectures

2016-10-04 Thread srowen
Repository: spark Updated Branches: refs/heads/master 8e8de0073 -> 7d5160883 [SPARK-16962][CORE][SQL] Fix misaligned record accesses for SPARC architectures ## What changes were proposed in this pull request? Made changes to record length offsets to make them uniform throughout various areas

spark git commit: [BUILD] Closing some stale PRs

2016-10-06 Thread srowen
Repository: spark Updated Branches: refs/heads/master 7aeb20be7 -> 5e9f32dd8 [BUILD] Closing some stale PRs ## What changes were proposed in this pull request? This PR proposes to close some stale PRs and ones suggested to be closed by committer(s) or obviously inappropriate PRs (e.g. branch

spark git commit: [SPARK-17782][STREAMING][BUILD] Add Kafka 0.10 project to build modules

2016-10-07 Thread srowen
Repository: spark Updated Branches: refs/heads/master bcaa799cb -> 18bf9d2b2 [SPARK-17782][STREAMING][BUILD] Add Kafka 0.10 project to build modules ## What changes were proposed in this pull request? This PR adds the Kafka 0.10 subproject to the build infrastructure. This makes sure Kafka 0.

spark git commit: [SPARK-17795][WEB UI] Sorting on stage or job tables doesn’t reload page on that table

2016-10-07 Thread srowen
Repository: spark Updated Branches: refs/heads/master 18bf9d2b2 -> 24097d847 [SPARK-17795][WEB UI] Sorting on stage or job tables doesn’t reload page on that table ## What changes were proposed in this pull request? Added anchor on table header id to sorting links on job and stage tables.

spark git commit: [SPARK-16960][SQL] Deprecate approxCountDistinct, toDegrees and toRadians according to FunctionRegistry

2016-10-07 Thread srowen
Repository: spark Updated Branches: refs/heads/master 24097d847 -> 2b01d3c70 [SPARK-16960][SQL] Deprecate approxCountDistinct, toDegrees and toRadians according to FunctionRegistry ## What changes were proposed in this pull request? It seems `approxCountDistinct`, `toDegrees` and `toRadians`

spark git commit: [SPARK-17793][WEB UI] Sorting on the description on the Job or Stage page doesn’t always work

2016-10-08 Thread srowen
Repository: spark Updated Branches: refs/heads/master 471690f90 -> 362ba4b6f [SPARK-17793][WEB UI] Sorting on the description on the Job or Stage page doesn’t always work ## What changes were proposed in this pull request? Added secondary sorting on stage name for the description column. T

spark git commit: [SPARK-17768][CORE] Small (Sum, Count, Mean)Evaluator problems and suboptimalities

2016-10-08 Thread srowen
ats in each could use a bit of documentation as I had to guess at them - (Code could use a few cleanups and optimizations too) ## How was this patch tested? Existing and new tests Author: Sean Owen Closes #15341 from srowen/SPARK-17768. Project: http://git-wip-us.apache.org/repos/asf/spark/repo C

spark git commit: [MINOR][SQL] Use resource path for test_script.sh

2016-10-08 Thread srowen
Repository: spark Updated Branches: refs/heads/master 4201ddcc0 -> 8a6bbe095 [MINOR][SQL] Use resource path for test_script.sh ## What changes were proposed in this pull request? This PR modified the test case `test("script")` to use resource path for `test_script.sh`. Make the test case port

spark-website git commit: Add new book per request; again fix minor differences in rendering from others' different/older jekyll versions?

2016-10-09 Thread srowen
Repository: spark-website Updated Branches: refs/heads/asf-site 5a6c152c8 -> 4fbd720dd Add new book per request; again fix minor differences in rendering from others' different/older jekyll versions? Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us

spark-website git commit: Oops, fix HTML error in last new book addition

2016-10-09 Thread srowen
Repository: spark-website Updated Branches: refs/heads/asf-site 4fbd720dd -> 4150a7329 Oops, fix HTML error in last new book addition Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/4150a732 Tree: http://gi

spark git commit: [HOT-FIX][SQL][TESTS] Remove unused function in `SparkSqlParserSuite`

2016-10-10 Thread srowen
Repository: spark Updated Branches: refs/heads/master 23ddff4b2 -> 7e16c94f1 [HOT-FIX][SQL][TESTS] Remove unused function in `SparkSqlParserSuite` ## What changes were proposed in this pull request? The function `SparkSqlParserSuite.createTempViewUsing` is not used for now and causes build f

spark git commit: [SPARK-17828][DOCS] Remove unused generate-changelist.py

2016-10-10 Thread srowen
Repository: spark Updated Branches: refs/heads/master 689de9200 -> 3f8a0222e [SPARK-17828][DOCS] Remove unused generate-changelist.py ## What changes were proposed in this pull request? We can remove this file based on discussion at https://issues.apache.org/jira/browse/SPARK-17828 it's evide

spark git commit: [SPARK-14082][MESOS] Enable GPU support with Mesos

2016-10-10 Thread srowen
Repository: spark Updated Branches: refs/heads/master 3f8a0222e -> 29f186bfd [SPARK-14082][MESOS] Enable GPU support with Mesos ## What changes were proposed in this pull request? Enable GPU resources to be used when running coarse grain mode with Mesos. ## How was this patch tested? Manual

spark git commit: [SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13

2016-10-10 Thread srowen
Repository: spark Updated Branches: refs/heads/master 19401a203 -> 658c7147f [SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13 ## What changes were proposed in this pull request? Upgraded to a newer version of Pyrolite which supports serialization of a BinaryType StructField for PyS

spark git commit: [SPARK-11560][MLLIB] Optimize KMeans implementation / remove 'runs'

2016-10-12 Thread srowen
ning that it may return less than k centroids if not enough data is available. ## How was this patch tested? Existing tests Author: Sean Owen Closes #15342 from srowen/SPARK-11560. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spa

spark git commit: [SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13

2016-10-12 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 f12b74c02 -> 4dcbde48d [SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13 ## What changes were proposed in this pull request? Upgraded to a newer version of Pyrolite which supports serialization of a BinaryType StructField for

spark git commit: [SPARK-17870][MLLIB][ML] Change statistic to pValue for SelectKBest and SelectPercentile because of DoF difference

2016-10-14 Thread srowen
Repository: spark Updated Branches: refs/heads/master a1b136d05 -> c8b612dec [SPARK-17870][MLLIB][ML] Change statistic to pValue for SelectKBest and SelectPercentile because of DoF difference ## What changes were proposed in this pull request? For feature selection method ChiSquareSelector,

spark git commit: [SPARK-17855][CORE] Remove query string from jar url

2016-10-14 Thread srowen
Repository: spark Updated Branches: refs/heads/master c8b612dec -> 28b645b1e [SPARK-17855][CORE] Remove query string from jar url ## What changes were proposed in this pull request? Spark-submit support jar url with http protocol. However, if the url contains any query strings, `worker.Drive

spark git commit: [DOC] Fix typo in sql hive doc

2016-10-14 Thread srowen
Repository: spark Updated Branches: refs/heads/master 7486442fe -> a0ebcb3a3 [DOC] Fix typo in sql hive doc Change is too trivial to file a JIRA. Author: Dhruve Ashar Closes #15485 from dhruve/master. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apa

spark git commit: Typo: form -> from

2016-10-14 Thread srowen
Repository: spark Updated Branches: refs/heads/master a0ebcb3a3 -> fa37877af Typo: form -> from ## What changes were proposed in this pull request? Minor typo fix ## How was this patch tested? Existing unit tests on Jenkins Author: Andrew Ash Closes #15486 from ash211/patch-8. Project:

spark git commit: Fix example of tf_idf with minDocFreq

2016-10-17 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 d1a021178 -> a0d9015b3 Fix example of tf_idf with minDocFreq ## What changes were proposed in this pull request? The python example for tf_idf with the parameter "minDocFreq" is not properly set up because the same variable is used to

spark git commit: Fix example of tf_idf with minDocFreq

2016-10-17 Thread srowen
Repository: spark Updated Branches: refs/heads/master 56b0f5f4d -> e3bf37fa3 Fix example of tf_idf with minDocFreq ## What changes were proposed in this pull request? The python example for tf_idf with the parameter "minDocFreq" is not properly set up because the same variable is used to tra

spark git commit: [SPARK-17985][CORE] Bump commons-lang3 version to 3.5.

2016-10-19 Thread srowen
Repository: spark Updated Branches: refs/heads/master f39852e59 -> 9540357ad [SPARK-17985][CORE] Bump commons-lang3 version to 3.5. ## What changes were proposed in this pull request? `SerializationUtils.clone()` of commons-lang3 (<3.5) has a bug that breaks thread safety, which gets stack s

spark git commit: [SPARK-11653][DEPLOY] Allow spark-daemon.sh to run in the foreground

2016-10-20 Thread srowen
Repository: spark Updated Branches: refs/heads/master 4bd17c460 -> c2c107aba [SPARK-11653][DEPLOY] Allow spark-daemon.sh to run in the foreground ## What changes were proposed in this pull request? Add a SPARK_NO_DAEMONIZE environment variable flag to spark-daemon.sh that causes the process

spark git commit: [SPARK-17796][SQL] Support wildcard character in filename for LOAD DATA LOCAL INPATH

2016-10-20 Thread srowen
Repository: spark Updated Branches: refs/heads/master c2c107aba -> 986a3b8b5 [SPARK-17796][SQL] Support wildcard character in filename for LOAD DATA LOCAL INPATH ## What changes were proposed in this pull request? Currently, Spark 2.0 raises an `input path does not exist` AnalysisException i

spark git commit: [SPARK-17331][FOLLOWUP][ML][CORE] Avoid allocating 0-length arrays

2016-10-21 Thread srowen
-name '*.scala' | xargs -i bash -c 'egrep "Array\[[A-Za-z]+\]\(\)" -n {} && echo {}'` to find modification candidates. cc srowen ## How was this patch tested? existing tests Author: Zheng RuiFeng Closes #15564 from zhengruifeng/avoid_0_length_array. Pro
