Repository: spark
Updated Branches:
refs/heads/branch-2.4 b632e775c -> 085f731ad
[SPARK-25268][GRAPHX] run Parallel Personalized PageRank throws serialization
Exception
## What changes were proposed in this pull request?
mapValues in Scala is currently not serializable. To avoid the serializa
Repository: spark
Updated Branches:
refs/heads/master 7ef6d1daf -> 3b6591b0b
[SPARK-25268][GRAPHX] run Parallel Personalized PageRank throws serialization
Exception
## What changes were proposed in this pull request?
mapValues in Scala is currently not serializable. To avoid the serialization
Repository: spark
Updated Branches:
refs/heads/branch-2.3 42c1fdd22 -> f5983823e
[SPARK-25124][ML] VectorSizeHint setSize and getSize don't return values
backport to 2.3
## What changes were proposed in this pull request?
In feature.py, VectorSizeHint setSize and getSize don't return a value. A
Repository: spark
Updated Branches:
refs/heads/master 8ed044928 -> b5e118808
[SPARK-25124][ML] VectorSizeHint setSize and getSize don't return values
## What changes were proposed in this pull request?
In feature.py, VectorSizeHint setSize and getSize don't return a value. Add the
missing return statements.
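To illustrate why a missing return matters for a setter/getter pair, here is a minimal pure-Python sketch. `SizeHintSketch` and its method names are hypothetical stand-ins, not the PySpark class. They only mirror the pattern described above.

```python
class SizeHintSketch:
    """Hypothetical stand-in for a transformer with a `size` param."""

    def __init__(self):
        self._size = None

    def set_size_broken(self, value):
        # Bug pattern: the value is stored but nothing is returned,
        # so the call evaluates to None and cannot be chained.
        self._size = value

    def get_size_broken(self):
        # Bug pattern: the attribute is read but never returned.
        self._size

    def set_size(self, value):
        # Fixed: return self so that setters can be chained.
        self._size = value
        return self

    def get_size(self):
        # Fixed: return the stored value.
        return self._size


hint = SizeHintSketch()
broken_set_result = hint.set_size_broken(3)
broken_get_result = hint.get_size_broken()
fixed_get_result = SizeHintSketch().set_size(3).get_size()
```

With the broken variants both calls yield `None`, while the fixed variants allow the usual chained `set...().get...()` style.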
## How
Repository: spark
Updated Branches:
refs/heads/master 99d2e4e00 -> 72ecfd095
[SPARK-25149][GRAPHX] Update Parallel Personalized Page Rank to test with large
vertexIds
## What changes were proposed in this pull request?
runParallelPersonalizedPageRank in graphx checks that `sources` are <=
I
Repository: spark
Updated Branches:
refs/heads/master 244bcff19 -> 3cb1b5780
[SPARK-24852][ML] Update spark.ml to use Instrumentation.instrumented.
## What changes were proposed in this pull request?
Followup for #21719.
Update spark.ml training code to fully wrap instrumented methods and rem
Repository: spark
Updated Branches:
refs/heads/master 7688ce88b -> 912634b00
[SPARK-24747][ML] Make Instrumentation class more flexible
## What changes were proposed in this pull request?
This PR updates the Instrumentation class to make it more flexible and a little
bit easier to use. When
Repository: spark
Updated Branches:
refs/heads/master a33dcf4a0 -> ffaefe755
[SPARK-7132][ML] Add fit with validation set to spark.ml GBT
## What changes were proposed in this pull request?
Add fit with validation set to spark.ml GBT
## How was this patch tested?
Will add later.
Author: We
Repository: spark
Updated Branches:
refs/heads/master a7a9b1837 -> 439c69511
[SPARK-24114] Add instrumentation to FPGrowth.
## What changes were proposed in this pull request?
Have FPGrowth keep track of model training using the Instrumentation class.
## How was this patch tested?
manually
Repository: spark
Updated Branches:
refs/heads/master 991726f31 -> bfd75cdfb
[SPARK-22210][ML] Add seed for LDA variationalTopicInference
## What changes were proposed in this pull request?
- Add seed parameter for variationalTopicInference
- Add seed for calling variationalTopicInference in
Repository: spark
Updated Branches:
refs/heads/master 6b94420f6 -> 8a13c5096
[SPARK-24058][ML][PYSPARK] Default Params in ML should be saved separately:
Python API
## What changes were proposed in this pull request?
See SPARK-23455 for reference. Now default params in ML are saved separately
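As a rough illustration of saving user-supplied and default params separately, the sketch below writes them under two distinct JSON keys and merges them again on load. The key names `paramMap` and `defaultParamMap`, the helper functions, and the sample params are assumptions made for this example, not taken from the snippet above.

```python
import json


def save_metadata(user_params, default_params):
    # Keep the two groups under separate keys instead of merging them,
    # so a reader can tell which values the user actually set.
    return json.dumps(
        {"paramMap": user_params, "defaultParamMap": default_params},
        sort_keys=True,
    )


def load_metadata(text):
    meta = json.loads(text)
    return meta["paramMap"], meta["defaultParamMap"]


def effective_params(user_params, default_params):
    # User-supplied values take precedence over defaults.
    merged = dict(default_params)
    merged.update(user_params)
    return merged


saved = save_metadata({"maxIter": 50}, {"maxIter": 100, "tol": 1e-6})
user, defaults = load_metadata(saved)
merged = effective_params(user, defaults)
```

Keeping the groups apart means a later change to a default value does not get confused with a value the user chose explicitly.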
Repository: spark
Updated Branches:
refs/heads/master 628c7b517 -> 7aaa148f5
[SPARK-14682][ML] Provide evaluateEachIteration method or equivalent for
spark.ml GBTs
## What changes were proposed in this pull request?
Provide evaluateEachIteration method or equivalent for spark.ml GBTs.
## Ho
ser guide page. I also improved the wording and organization
slightly.
## How was this patch tested?
Built docs locally.
Author: Joseph K. Bradley
Closes #21272 from jkbradley/nb-doc-update.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/
Repository: spark
Updated Branches:
refs/heads/master f48bd6bdc -> 76ecd0950
[SPARK-20114][ML] spark.ml parity for sequential pattern mining - PrefixSpan
## What changes were proposed in this pull request?
PrefixSpan API for spark.ml. New implementation instead of #20810
## How was this patc
Repository: spark
Updated Branches:
refs/heads/master 1c9c5de95 -> f48bd6bdc
[SPARK-22885][ML][TEST] ML test for StructuredStreaming: spark.ml.tuning
## What changes were proposed in this pull request?
ML test for StructuredStreaming: spark.ml.tuning
## How was this patch tested?
N/A
Autho
Repository: spark
Updated Branches:
refs/heads/master d83e96372 -> 56a52e0a5
[SPARK-15750][MLLIB][PYSPARK] Constructing FPGrowth fails when no numPartitions
specified in pyspark
## What changes were proposed in this pull request?
Change FPGrowth from private to private[spark]. If no numParti
Repository: spark
Updated Branches:
refs/heads/master 83013752e -> 379bffa05
[SPARK-23990][ML] Instruments logging improvements - ML regression package
## What changes were proposed in this pull request?
Instruments logging improvements - ML regression package
I add an `OptionalInstrument` c
Repository: spark
Updated Branches:
refs/heads/master ce7ba2e98 -> 83013752e
[SPARK-23455][ML] Default Params in ML should be saved separately in metadata
## What changes were proposed in this pull request?
We save ML's user-supplied params and default params as one entity in metadata.
Durin
Repository: spark
Updated Branches:
refs/heads/master 55c4ca88a -> 2a24c481d
[SPARK-23975][ML] Allow Clustering to take Arrays of Double as input features
## What changes were proposed in this pull request?
- Multiple possible input types are added in validateAndTransformSchema() and
computeC
y author is wangmiao1981
## How was this patch tested?
This PR has 2 types of tests:
* Copies of tests from spark.mllib's PIC tests
* New tests specific to the spark.ml APIs
Author: wm...@hotmail.com
Author: wangmiao1981
Author: Joseph K. Bradley
Closes #21090 from jkbradley/wan
Repository: spark
Updated Branches:
refs/heads/master f39e82ce1 -> 1ca3c50fe
[SPARK-21741][ML][PYSPARK] Python API for DataFrame-based multivariate
summarizer
## What changes were proposed in this pull request?
Python API for DataFrame-based multivariate summarizer.
## How was this patch te
Repository: spark
Updated Branches:
refs/heads/master 5003736ad -> 04614820e
[SPARK-21088][ML] CrossValidator, TrainValidationSplit support collect all
models when fitting: Python API
## What changes were proposed in this pull request?
Add python API for collecting sub-models during
CrossVa
Repository: spark
Updated Branches:
refs/heads/master 083cf2235 -> 5003736ad
[SPARK-9312][ML] Add RawPrediction, numClasses, and numFeatures for
OneVsRestModel
add RawPrediction as output column
add numClasses and numFeatures to OneVsRestModel
## What changes were proposed in this pull reque
Repository: spark
Updated Branches:
refs/heads/master 0b19122d4 -> 0f93b91a7
[SPARK-23751][FOLLOW-UP] fix build for scala-2.12
## What changes were proposed in this pull request?
fix build for scala-2.12
## How was this patch tested?
Manual.
Author: WeichenXu
Closes #21051 from WeichenXu
Repository: spark
Updated Branches:
refs/heads/master 75a183071 -> 9d960de08
typo rawPredicition changed to rawPrediction
MultilayerPerceptronClassifier had 4 occurrences
## What changes were proposed in this pull request?
## How was this patch
Repository: spark
Updated Branches:
refs/heads/branch-2.3 acfc156df -> 03a4dfd69
typo rawPredicition changed to rawPrediction
MultilayerPerceptronClassifier had 4 occurrences
## What changes were proposed in this pull request?
## How was this pa
dds structured streaming tests using testTransformer for these suites:
* IDF
* Imputer
* Interaction
* MaxAbsScaler
* MinHashLSH
* MinMaxScaler
* NGram
## How was this patch tested?
It is a bunch of tests!
Author: Joseph K. Bradley
Author: Joseph K. Bradley
Closes #21042 from jkbradley/SPARK-22883-pa
ter
* Interaction
* MaxAbsScaler
* MinHashLSH
* MinMaxScaler
* NGram
## How was this patch tested?
It is a bunch of tests!
Author: Joseph K. Bradley
Closes #20964 from jkbradley/SPARK-22883-part2.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/
Repository: spark
Updated Branches:
refs/heads/master 4f1e8b9bb -> 7c7570d46
[SPARK-23944][ML] Add the set method for the two LSHModel
## What changes were proposed in this pull request?
Add two set methods for LSHModel in LSH.scala,
BucketedRandomProjectionLSH.scala, and MinHashLSH.scala
##
Repository: spark
Updated Branches:
refs/heads/master adb222b95 -> 4f1e8b9bb
[SPARK-23871][ML][PYTHON] add python api for VectorAssembler handleInvalid
## What changes were proposed in this pull request?
add python api for VectorAssembler handleInvalid
## How was this patch tested?
Add doct
Repository: spark
Updated Branches:
refs/heads/master e17965891 -> adb222b95
[SPARK-23751][ML][PYSPARK] Kolmogorov-Smirnoff test Python API in pyspark.ml
## What changes were proposed in this pull request?
Kolmogorov-Smirnoff test Python API in `pyspark.ml`
**Note** API with `CDF` is a litt
ide val rootNode: ClassificationNode
class DecisionTreeRegressionModel
override val rootNode: RegressionNode
```
Closes #17466
## How was this patch tested?
UT will be added soon.
Author: WeichenXu
Author: jkbradley
Closes #20786 from WeichenXu123/tree_stat_api_2.
Project: http://git-
Repository: spark
Updated Branches:
refs/heads/master c926acf71 -> d23a805f9
[SPARK-23859][ML] Initial PR for Instrumentation improvements: UUID and logging
levels
## What changes were proposed in this pull request?
Initial PR for Instrumentation improvements: UUID and logging levels.
This P
Repository: spark
Updated Branches:
refs/heads/master 4807d381b -> f2ac08795
[SPARK-23870][ML] Forward RFormula handleInvalid Param to VectorAssembler to
handle invalid values in non-string columns
## What changes were proposed in this pull request?
`handleInvalid` Param was forwarded to the
Repository: spark
Updated Branches:
refs/heads/master 28ea4e314 -> a1351828d
[SPARK-23690][ML] Add handleinvalid to VectorAssembler
## What changes were proposed in this pull request?
Introduce `handleInvalid` parameter in `VectorAssembler` that can take in
`"keep", "skip", "error"` options.
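The three options can be sketched in plain Python as follows. This is only an illustration of the policy semantics, assuming that `None` is the invalid value and that "keep" represents it as NaN. The `assemble` function is hypothetical and is not the `VectorAssembler` implementation.

```python
import math


def assemble(rows, handle_invalid="error"):
    """Copy each row into an output list, applying a policy to None entries."""
    if handle_invalid not in ("keep", "skip", "error"):
        raise ValueError("unknown handle_invalid option: %r" % (handle_invalid,))
    out = []
    for row in rows:
        if any(v is None for v in row):
            if handle_invalid == "error":
                # "error": fail fast on the first invalid row.
                raise ValueError("invalid value in row %r" % (row,))
            if handle_invalid == "skip":
                # "skip": drop the whole row.
                continue
            # "keep": retain the row, marking missing entries as NaN.
            row = [float("nan") if v is None else v for v in row]
        out.append(list(row))
    return out


rows = [[1.0, 2.0], [None, 3.0]]
kept = assemble(rows, "keep")
skipped = assemble(rows, "skip")
```

Calling `assemble(rows, "error")` on the same input raises `ValueError` instead of returning a result.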
of
JavaKolmogorovSmirnovTestSuite
Author: Joseph K. Bradley
Closes #20875 from jkbradley/kstest-lint-fix.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a091ee67
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a091ee67
Diff: http://git-
Repository: spark
Updated Branches:
refs/heads/master 500b21c3d -> bf09f2f71
[SPARK-10884][ML] Support prediction on single instance for regression and
classification related models
## What changes were proposed in this pull request?
Support prediction on single instance for regression and c
for `KolmogorovSmirnovTest` in `mllib.stat`.
## How was this patch tested?
Test suite added.
Author: WeichenXu
Author: jkbradley
Closes #19108 from WeichenXu123/ml-ks-test.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7f5e8
Repository: spark
Updated Branches:
refs/heads/branch-2.3 80e79430f -> 920493949
[SPARK-23728][BRANCH-2.3] Fix ML tests with expected exceptions running
streaming tests
## What changes were proposed in this pull request?
The testTransformerByInterceptingException failed to catch the expected
Repository: spark
Updated Branches:
refs/heads/master 1098933b0 -> 279b3db89
http://git-wip-us.apache.org/repos/asf/spark/blob/279b3db8/mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala
[SPARK-22915][MLLIB] Streaming tests for spark.ml.feature, from N to Z
# What changes were proposed in this pull request?
Adds structured streaming tests using testTransformer for these suites:
- NGramSuite
- NormalizerSuite
- OneHotEncoderEstimatorSuite
- OneHotEncoderSuite
- PCASuite
- Polynom
Repository: spark
Updated Branches:
refs/heads/branch-2.3 f3efbfa4b -> 0663b6119
http://git-wip-us.apache.org/repos/asf/spark/blob/0663b611/mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala
[SPARK-22915][MLLIB] Streaming tests for spark.ml.feature, from N to Z
# What changes were proposed in this pull request?
Adds structured streaming tests using testTransformer for these suites:
- NGramSuite
- NormalizerSuite
- OneHotEncoderEstimatorSuite
- OneHotEncoderSuite
- PCASuite
- Polynom
Repository: spark
Updated Branches:
refs/heads/master 508573958 -> 7706eea6a
[SPARK-18630][PYTHON][ML] Move del method from JavaParams to JavaWrapper; add
tests
The `__del__` method that explicitly detaches the object was moved from
`JavaParams` to `JavaWrapper` class, this way model summari
Repository: spark
Updated Branches:
refs/heads/master 4586eada4 -> 98a5c0a35
[SPARK-22882][ML][TESTS] ML test for structured streaming: ml.classification
## What changes were proposed in this pull request?
adding Structured Streaming tests for all Models/Transformers in
spark.ml.classificati
Repository: spark
Updated Branches:
refs/heads/branch-2.3 232b9f81f -> 4550673b1
[SPARK-22882][ML][TESTS] ML test for structured streaming: ml.classification
## What changes were proposed in this pull request?
adding Structured Streaming tests for all Models/Transformers in
spark.ml.classifi
ses #20111 from jkbradley/SPARK-22883-streaming-featureAM.
(cherry picked from commit 119f6a0e4729aa952e811d2047790a32ee90bf69)
Signed-off-by: Joseph K. Bradley
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/56cfbd93
Tree: h
111 from jkbradley/SPARK-22883-streaming-featureAM.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/119f6a0e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/119f6a0e
Diff: http://git-wip-us.apache.org/repos/asf/spark/d
Repository: spark
Updated Branches:
refs/heads/branch-2.2 a95c3e29d -> 1cc34f3e5
[SPARK-22700][ML] Bucketizer.transform incorrectly drops row containing NaN -
for branch-2.2
## What changes were proposed in this pull request?
for branch-2.2
only drops the rows containing NaN in the input colu
Repository: spark
Updated Branches:
refs/heads/branch-2.3 03960faa6 -> 0bd7765cd
[SPARK-23377][ML] Fixes Bucketizer with multiple columns persistence bug
## What changes were proposed in this pull request?
Problem:
Since 2.3, `Bucketizer` supports multiple input/output columns. We will
Repository: spark
Updated Branches:
refs/heads/master 6968c3cfd -> db45daab9
[SPARK-23377][ML] Fixes Bucketizer with multiple columns persistence bug
## What changes were proposed in this pull request?
Problem:
Since 2.3, `Bucketizer` supports multiple input/output columns. We will chec
ML models
and Pipelines from old Spark versions. Discussed & confirmed on linked JIRA.
Author: Joseph K. Bradley
Closes #20592 from jkbradley/SPARK-23154-backwards-compat-doc.
(cherry picked from commit d58fe28836639e68e262812d911f167cb071007b)
Signed-off-by: Joseph K. Bradley
Projec
ML models
and Pipelines from old Spark versions. Discussed & confirmed on linked JIRA.
Author: Joseph K. Bradley
Closes #20592 from jkbradley/SPARK-23154-backwards-compat-doc.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark
Repository: spark
Updated Branches:
refs/heads/branch-2.3 863ffdc8a -> 833a584bb
[SPARK-23045][ML][SPARKR] Update RFormula to use OneHotEncoderEstimator.
## What changes were proposed in this pull request?
RFormula should use VectorSizeHint & OneHotEncoderEstimator in its pipeline to
avoid u
Repository: spark
Updated Branches:
refs/heads/master 12db365b4 -> 4371466b3
[SPARK-23045][ML][SPARKR] Update RFormula to use OneHotEncoderEstimator.
## What changes were proposed in this pull request?
RFormula should use VectorSizeHint & OneHotEncoderEstimator in its pipeline to
avoid using
Repository: spark
Updated Branches:
refs/heads/branch-2.3 6bb22961c -> 55695c712
[SPARK-23008][ML] OnehotEncoderEstimator python API
## What changes were proposed in this pull request?
OnehotEncoderEstimator python API.
## How was this patch tested?
doctest
Author: WeichenXu
Closes #2020
Repository: spark
Updated Branches:
refs/heads/master 186bf8fb2 -> b5042d75c
[SPARK-23008][ML] OnehotEncoderEstimator python API
## What changes were proposed in this pull request?
OnehotEncoderEstimator python API.
## How was this patch tested?
doctest
Author: WeichenXu
Closes #20209 fr
Repository: spark
Updated Branches:
refs/heads/branch-2.3 f891ee324 -> 2ec302658
[SPARK-23046][ML][SPARKR] Have RFormula include VectorSizeHint in pipeline
## What changes were proposed in this pull request?
Including VectorSizeHint in RFormula pipelines will allow them to be applied
to str
Repository: spark
Updated Branches:
refs/heads/master 6f7aaed80 -> 186bf8fb2
[SPARK-23046][ML][SPARKR] Have RFormula include VectorSizeHint in pipeline
## What changes were proposed in this pull request?
Including VectorSizeHint in RFormula pipelines will allow them to be applied
to streami
zed the logic to show what I meant in the comment in the
previous PR. I think it's simpler but am open to suggestions.
I also made some small style cleanups based on IntelliJ warnings.
## How was this patch tested?
Existing unit tests
Author: Joseph K. Bradley
Closes #20132 from j
the logic to show what I meant in the comment in the
previous PR. I think it's simpler but am open to suggestions.
I also made some small style cleanups based on IntelliJ warnings.
## How was this patch tested?
Existing unit tests
Author: Joseph K. Bradley
Closes #20132 from jkbradle
Repository: spark
Updated Branches:
refs/heads/branch-2.3 145820bda -> 5b524cc0c
[SPARK-22949][ML] Apply CrossValidator approach to Driver/Distributed memory
tradeoff for TrainValidationSplit
## What changes were proposed in this pull request?
Avoid holding all models in memory for `TrainVal
Repository: spark
Updated Branches:
refs/heads/master 52fc5c17d -> cf0aa6557
[SPARK-22949][ML] Apply CrossValidator approach to Driver/Distributed memory
tradeoff for TrainValidationSplit
## What changes were proposed in this pull request?
Avoid holding all models in memory for `TrainValidat
Repository: spark
Updated Branches:
refs/heads/master 5955a2d0f -> 994065d89
[SPARK-13030][ML] Create OneHotEncoderEstimator for OneHotEncoder as Estimator
## What changes were proposed in this pull request?
This patch adds a new class `OneHotEncoderEstimator` which extends `Estimator`.
The
Repository: spark
Updated Branches:
refs/heads/master 816963043 -> 2ea17afb6
[SPARK-22881][ML][TEST] ML regression package testsuite add StructuredStreaming
test
## What changes were proposed in this pull request?
ML regression package testsuite add StructuredStreaming test
In order to make
Repository: spark
Updated Branches:
refs/heads/master 30fcdc038 -> 816963043
[SPARK-22734][ML][PYSPARK] Added Python API for VectorSizeHint.
Python API for VectorSizeHint Transformer.
Repository: spark
Updated Branches:
refs/heads/master ccda75b0d -> 30fcdc038
[SPARK-22922][ML][PYSPARK] Pyspark portion of the fit-multiple API
## What changes were proposed in this pull request?
Adding fitMultiple API to `Estimator` with default implementation. Also update
have ml.tuning me
Repository: spark
Updated Branches:
refs/heads/master 4e9e6aee4 -> afc364146
[SPARK-22905][ML][FOLLOWUP] Fix GaussianMixtureModel save
## What changes were proposed in this pull request?
make sure model data is stored in order. WeichenXu123
## How was this patch tested?
existing tests
Autho
Repository: spark
Updated Branches:
refs/heads/master ffe6fd77a -> c74573084
[SPARK-22905][MLLIB] Fix ChiSqSelectorModel save implementation
## What changes were proposed in this pull request?
Currently, in `ChiSqSelectorModel`, save:
```
spark.createDataFrame(dataArray).repartition(1).write.
Repository: spark
Updated Branches:
refs/heads/master 774715d5c -> 753793bc8
[SPARK-22899][ML][STREAMING] Fix OneVsRestModel transform on streaming data
failed.
## What changes were proposed in this pull request?
Fix the failure of OneVsRestModel transform on streaming data.
## How was this patch t
PR to fix it.
## Discussion
I give 3 approaches which we can compare. After discussion I realized none of
them is ideal; we have to make a trade-off.
**After discussion with jkbradley, we chose approach 3**
### Approach 1
~~The approach proposed by MrBago at~~
https://github.com/apache/spark/p
Repository: spark
Updated Branches:
refs/heads/master 13190a4f6 -> d23dc5b8e
[SPARK-22346][ML] VectorSizeHint Transformer for using VectorAssembler in
StructuredSteaming
## What changes were proposed in this pull request?
A new VectorSizeHint transformer was added. This transformer is meant
Repository: spark
Updated Branches:
refs/heads/master c7d014861 -> 0e36ba621
[SPARK-22644][ML][TEST] Make ML testsuite support StructuredStreaming test
## What changes were proposed in this pull request?
We need to add some helper code to make testing ML transformers & models easier
with str
Repository: spark
Updated Branches:
refs/heads/master 0605ad761 -> 1edb3175d
[SPARK-21866][ML][PYSPARK] Adding spark image reader
## What changes were proposed in this pull request?
Adding spark image reader, an implementation of schema for representing images
in spark DataFrames
The code is
Repository: spark
Updated Branches:
refs/heads/master 774398045 -> 1e6f76059
[SPARK-12375][ML] VectorIndexerModel support handle unseen categories via
handleInvalid
## What changes were proposed in this pull request?
Support skip/error/keep strategy, similar to `StringIndexer`.
Implemented v
Repository: spark
Updated Branches:
refs/heads/master b00972259 -> 774398045
[SPARK-21087][ML] CrossValidator, TrainValidationSplit expose sub models after
fitting: Scala
## What changes were proposed in this pull request?
We add a parameter that controls whether to collect the full model list when
Cross
Repository: spark
Updated Branches:
refs/heads/master c8b7f97b8 -> d8741b2b0
[SPARK-21911][ML][FOLLOW-UP] Fix doc for parallel ML Tuning in PySpark
## What changes were proposed in this pull request?
Fix doc issue mentioned here:
https://github.com/apache/spark/pull/19122#issuecomment-340111
Repository: spark
Updated Branches:
refs/heads/master b3d8fc3dc -> 20eb95e5e
[SPARK-21911][ML][PYSPARK] Parallel Model Evaluation for ML Tuning in PySpark
## What changes were proposed in this pull request?
Add parallelism support for ML tuning in pyspark.
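One common way to evaluate several parameter settings concurrently is a thread pool with a configurable size. The sketch below shows that general idea in plain Python. The use of `ThreadPool`, the `evaluate` scoring function, and the `parallelism` argument are assumptions for illustration and are not confirmed by the snippet above.

```python
from multiprocessing.pool import ThreadPool


def evaluate(param):
    # Hypothetical stand-in for "fit a model with this param and score it".
    # The score peaks at param == 3.
    return param, -(param - 3) ** 2


def tune(param_grid, parallelism=1):
    # Evaluate all candidates using at most `parallelism` worker threads.
    with ThreadPool(processes=max(1, parallelism)) as pool:
        results = pool.map(evaluate, param_grid)
    # Pick the candidate with the highest score.
    return max(results, key=lambda r: r[1])


best_param, best_score = tune([1, 2, 3, 4, 5], parallelism=3)
serial_best = tune([1, 2, 3, 4, 5], parallelism=1)
```

The selected parameter does not depend on the degree of parallelism; only the wall-clock time of the search changes.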
## How was this patch tested?
Test
Repository: spark
Updated Branches:
refs/heads/branch-2.2 9ed64048a -> 35725f735
[SPARK-22332][ML][TEST] Fix NaiveBayes unit test occasionly fail (cause by test
dataset not deterministic)
## What changes were proposed in this pull request?
Fix NaiveBayes unit test that occasionally fails:
Set seed f
Repository: spark
Updated Branches:
refs/heads/master b377ef133 -> 841f1d776
[SPARK-22332][ML][TEST] Fix NaiveBayes unit test occasionly fail (cause by test
dataset not deterministic)
## What changes were proposed in this pull request?
Fix NaiveBayes unit test that occasionally fails:
Set seed for `
Repository: spark
Updated Branches:
refs/heads/master 1f25d8683 -> 52facb006
[SPARK-14371][MLLIB] OnlineLDAOptimizer should not collect stats for each doc
in mini-batch to driver
Hi,
# What changes were proposed in this pull request?
As proposed by jkbradley, ```gammat``` are
Repository: spark
Updated Branches:
refs/heads/master 3e6a714c9 -> f180b6534
[SPARK-22060][ML] Fix CrossValidator/TrainValidationSplit param persist/load bug
## What changes were proposed in this pull request?
Currently the param of CrossValidator/TrainValidationSplit persist/loading is
hard
Repository: spark
Updated Branches:
refs/heads/branch-2.2 63098dc31 -> b606dc177
[SPARK-18608][ML] Fix double caching
## What changes were proposed in this pull request?
`df.rdd.getStorageLevel` => `df.storageLevel`
using cmd `find . -name '*.scala' | xargs -i bash -c 'egrep -in
"\.rdd\.getS
Repository: spark
Updated Branches:
refs/heads/master b9b54b1c8 -> c5f9b89dd
[SPARK-18608][ML] Fix double caching
## What changes were proposed in this pull request?
`df.rdd.getStorageLevel` => `df.storageLevel`
using cmd `find . -name '*.scala' | xargs -i bash -c 'egrep -in
"\.rdd\.getStora
Repository: spark
Updated Branches:
refs/heads/master 515910e9b -> 720c94fe7
[SPARK-21027][ML][PYTHON] Added tunable parallelism to one vs. rest in both
Scala mllib and Pyspark
# What changes were proposed in this pull request?
Added tunable parallelism to the pyspark implementation of one v
Repository: spark
Updated Branches:
refs/heads/master aba9492d2 -> 900f14f6f
[SPARK-21729][ML][TEST] Generic test for ProbabilisticClassifier to ensure
consistent output columns
## What changes were proposed in this pull request?
Add test for prediction using the model with all combinations
Repository: spark
Updated Branches:
refs/heads/master 96028e36b -> f5e10a34e
[SPARK-21862][ML] Add overflow check in PCA
## What changes were proposed in this pull request?
add overflow check in PCA, otherwise it is possible to throw
`NegativeArraySizeException` when `k` and `numFeatures` ar
Repository: spark
Updated Branches:
refs/heads/master cba69aeb4 -> 96028e36b
[SPARK-17139][ML][FOLLOW-UP] Add convenient method `asBinary` for casting to
BinaryLogisticRegressionSummary
## What changes were proposed in this pull request?
add an "asBinary" method to LogisticRegressionSummary
ion summary traits.
Author: Joseph K. Bradley
Closes #19071 from jkbradley/lr-summary-minor.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/840ba053
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/840ba053
Diff: h
Repository: spark
Updated Branches:
refs/heads/master 73e64f7d5 -> c7270a46f
[SPARK-17139][ML] Add model summary for MultinomialLogisticRegression
## What changes were proposed in this pull request?
Add 4 traits, using the following hierarchy:
LogisticRegressionSummary
LogisticRegressionTrain
Repository: spark
Updated Branches:
refs/heads/branch-2.2 a58536741 -> 2b4bd7910
[SPARK-21681][ML] fix bug of MLOR do not work correctly when featureStd
contains zero (backport PR for 2.2)
## What changes were proposed in this pull request?
This is a backport PR of https://github.com/apache/sp
Repository: spark
Updated Branches:
refs/heads/master d58a3507e -> d6b30edd4
[SPARK-12664][ML] Expose probability in mlp model
## What changes were proposed in this pull request?
Modify MLP model to inherit `ProbabilisticClassificationModel` so that it
can expose the probability column
Repository: spark
Updated Branches:
refs/heads/master 01a8e4627 -> d56c26210
[SPARK-21681][ML] fix bug of MLOR do not work correctly when featureStd
contains zero
## What changes were proposed in this pull request?
Fix a bug where MLOR does not work correctly when featureStd contains zero.
We can r
Repository: spark
Updated Branches:
refs/heads/master b0bdfce9c -> 35db3b9fe
[SPARK-17025][ML][PYTHON] Persistence for Pipelines with Python-only Stages
## What changes were proposed in this pull request?
Implemented a Python-only persistence framework for pipelines containing stages
that ca
Repository: spark
Updated Branches:
refs/heads/master baf5cac0f -> fdcee028a
[SPARK-21542][ML][PYTHON] Python persistence helper functions
## What changes were proposed in this pull request?
Added DefaultParamsWriteable, DefaultParamsReadable, DefaultParamsWriter, and
DefaultParamsReader to
Repository: spark
Updated Branches:
refs/heads/master 25826c77d -> 1347b2a69
[SPARK-21633][ML][PYTHON] UnaryTransformer in Python
## What changes were proposed in this pull request?
Implemented UnaryTransformer in Python.
## How was this patch tested?
This patch was tested by creating a Moc
Repository: spark
Updated Branches:
refs/heads/master 4ce735eed -> 7047f49f4
[SPARK-21221][ML] CrossValidator and TrainValidationSplit Persist Nested
Estimators such as OneVsRest
## What changes were proposed in this pull request?
Added functionality for CrossValidator and TrainValidationSpli
to rawPrediction instead of probability. This PR changes the param
in the Scala, Python and R APIs.
## How was this patch tested?
New unit test to make sure the threshold can be set to any Double value.
Author: Joseph K. Bradley
Closes #18151 from jkbradley/ml-2.2-linearsvc-cleanup.
Project: h
to rawPrediction instead of probability. This PR changes the param
in the Scala, Python and R APIs.
## How was this patch tested?
New unit test to make sure the threshold can be set to any Double value.
Author: Joseph K. Bradley
Closes #18151 from jkbradley/ml-2.2-linearsvc-cleanup.
(che
ery easily to have an overflow in calculating the number of
partitions for ML persistence.
This modifies the calculations to use Long.
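To see why switching such a calculation from Int to Long helps, the sketch below simulates JVM-style 32-bit wraparound in Python. The quantities (`num_words`, `vector_size`, 4 bytes per value) are hypothetical and chosen only to trigger an overflow. They are not taken from the patch.

```python
def wrap_int32(x):
    """Wrap an integer into the signed 32-bit range, like JVM Int arithmetic."""
    return (x + 2**31) % 2**32 - 2**31


def size_bytes_int32(num_words, vector_size, bytes_per_value=4):
    # Every intermediate product is wrapped, as it would be with Int operands.
    return wrap_int32(wrap_int32(num_words * vector_size) * bytes_per_value)


def size_bytes_int64(num_words, vector_size, bytes_per_value=4):
    # With Long operands the same product fits comfortably.
    return num_words * vector_size * bytes_per_value


overflowed = size_bytes_int32(10_000_000, 100)
correct = size_bytes_int64(10_000_000, 100)
```

Here `overflowed` is negative even though the true size is 4,000,000,000 bytes, so any partition count derived from it would be wrong.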
## How was this patch tested?
New unit test. I verified that the test fails before this patch.
Author: Joseph K. Bradley
Closes #18265 from jkbradley/word2