GitHub user Hydrotoast opened a pull request:

    https://github.com/apache/spark/pull/13265

    Log warnings for numIterations * miniBatchFraction < 1.0

    ## What changes were proposed in this pull request?
    
    Add a warning log for the case where `numIterations * miniBatchFraction < 1.0`
during gradient descent. If the product of those two numbers is less than
`1.0`, then not all training examples will be used during optimization. To put
this concretely, suppose that `numExamples = 100`, `miniBatchFraction = 0.2`
and `numIterations = 3`. Then 3 iterations will occur, each sampling
approximately 20 examples. In the best case, no example is sampled twice;
hence at most 60/100 examples are used.
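    
    The arithmetic behind the warning can be sketched in a few lines. This is
an illustrative Python sketch only, not the code in this patch: the function
names and the logger name are hypothetical, and the bound assumes the best case
in which no example is sampled in more than one iteration. With
`numIterations = 3` and `miniBatchFraction = 0.2` the bound works out to 0.6 of
the data set.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("gradient_descent_sketch")


def max_fraction_used(num_iterations: int, mini_batch_fraction: float) -> float:
    """Upper bound on the fraction of training examples seen at least once.

    Each iteration samples roughly mini_batch_fraction of the data, so in the
    best case (no example sampled twice) the iterations together cover
    num_iterations * mini_batch_fraction of it, capped at 1.0.
    """
    return min(1.0, num_iterations * mini_batch_fraction)


def warn_if_partial_coverage(num_iterations: int, mini_batch_fraction: float) -> bool:
    """Log a warning and return True when not all examples can be used."""
    if num_iterations * mini_batch_fraction < 1.0:
        logger.warning(
            "numIterations * miniBatchFraction = %.3f < 1.0: "
            "not all training examples will be used during optimization",
            num_iterations * mini_batch_fraction,
        )
        return True
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    bound = max_fraction_used(3, 0.2)
    print(f"at most {bound:.0%} of the examples are used")
    warn_if_partial_coverage(3, 0.2)
```

    The bound is optimistic: because each iteration samples independently,
some examples are usually drawn more than once, so actual coverage tends to be
lower.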
    
    This may be counter-intuitive to most users and led to an issue during the
development of another Spark ML model:
https://github.com/zhengruifeng/spark-libFM/issues/11. If a user actually does
not require the entire training data set, it would be easier and more
intuitive to use `RDD.sample`.
    
    ## How was this patch tested?
    
    `build/mvn -DskipTests clean package` build succeeds

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Hydrotoast/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13265.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13265
    
----
commit c7dd4c61b6d954504b6e03a58a4cdd47a5077883
Author: Gio Borje <[email protected]>
Date:   2016-05-23T20:45:52Z

    Log warnings for numIterations * miniBatchFraction < 1.0 during gradient 
descent

----


