GitHub user Hydrotoast opened a pull request:
https://github.com/apache/spark/pull/13265
Log warnings for numIterations * miniBatchFraction < 1.0
## What changes were proposed in this pull request?
Add a warning log for the case where `numIterations * miniBatchFraction < 1.0`
during gradient descent. If the product of those two numbers is less than
`1.0`, then not all training examples will be used during optimization. To put
this concretely, suppose that `numExamples = 100`, `miniBatchFraction = 0.2`
and `numIterations = 3`. Then 3 iterations will occur, each sampling
approximately 20 examples. In the best case, all of the sampled examples are
unique; hence at most about 60/100 examples are used.
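The arithmetic behind the example can be worked through directly: each iteration draws roughly `miniBatchFraction * numExamples` examples, so the share of distinct examples seen is bounded above by `numIterations * miniBatchFraction`. The following is a minimal sketch in plain Scala, not the code in this patch; the object name, the helper names and the warning text are all hypothetical and chosen only for illustration.

```scala
// Illustrative only: these helpers are hypothetical and not part of Spark.
object MiniBatchCoverage {

  /** Expected number of examples drawn in a single iteration. */
  def expectedBatchSize(numExamples: Long, miniBatchFraction: Double): Double =
    numExamples * miniBatchFraction

  /**
   * Upper bound on the fraction of distinct examples seen over all iterations,
   * i.e. the best case in which no example is ever sampled twice.
   */
  def maxCoverage(numIterations: Int, miniBatchFraction: Double): Double =
    math.min(1.0, numIterations * miniBatchFraction)

  /** Returns a warning message when the product falls below 1.0 (assumed wording). */
  def coverageWarning(numIterations: Int, miniBatchFraction: Double): Option[String] =
    if (numIterations * miniBatchFraction < 1.0) {
      Some("Not all examples will be used if numIterations * miniBatchFraction < 1.0: " +
        s"numIterations=$numIterations and miniBatchFraction=$miniBatchFraction")
    } else {
      None
    }

  def main(args: Array[String]): Unit = {
    val numExamples = 100L
    val miniBatchFraction = 0.2
    val numIterations = 3

    // Roughly 20 examples per iteration, so at most about 60% of the data is seen.
    println(f"batch size  ~ ${expectedBatchSize(numExamples, miniBatchFraction)}%.1f")
    println(f"max coverage  ${maxCoverage(numIterations, miniBatchFraction)}%.2f")
    coverageWarning(numIterations, miniBatchFraction).foreach(println)
  }
}
```

The bound is only reached if the mini-batches never overlap; with independent random sampling per iteration, the actual coverage will typically be lower.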
This may be counter-intuitive to most users, and it led to an issue during the
development of another Spark ML model:
https://github.com/zhengruifeng/spark-libFM/issues/11. If a user actually does
not require the entire training data set, it would be easier and more intuitive
to use `RDD.sample`.
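For comparison, subsampling up front with `RDD.sample` might look like the sketch below. It assumes Spark is on the classpath and runs against a local master; the object name, the `subsample` helper, the fraction and the seed are hypothetical choices for illustration, not part of this patch.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Illustrative only: a hypothetical helper around the existing RDD.sample call.
object SubsampleSketch {

  /** Draws one random subset; `fraction` is the expected share of elements kept. */
  def subsample[T](data: RDD[T], fraction: Double, seed: Long): RDD[T] =
    data.sample(withReplacement = false, fraction = fraction, seed = seed)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("subsample-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(1 to 100)
      // Sampling is per element, so the subset size is only approximately 60.
      val subset = subsample(data, fraction = 0.6, seed = 42L).cache()
      println(s"kept ${subset.count()} of ${data.count()} examples")
    } finally {
      sc.stop()
    }
  }
}
```

Sampling once makes the reduced training set explicit, and the resulting RDD can then be passed to the optimizer with settings that cover all of it.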
## How was this patch tested?
`build/mvn -DskipTests clean package` build succeeds
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Hydrotoast/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13265.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13265
----
commit c7dd4c61b6d954504b6e03a58a4cdd47a5077883
Author: Gio Borje <[email protected]>
Date: 2016-05-23T20:45:52Z
Log warnings for numIterations * miniBatchFraction < 1.0 during gradient
descent
----