Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-65879536
@dbtsai Here are the results of my tests:
- Settings:
- Spark: latest Spark manually merged with
https://github.com/dbtsai/spark/tree/dbtsai-mlor and
https://github.com/avulanov/spark/tree/annclassifier. The optimizer in MLOR was
changed to LBFGS to allow a fair comparison with the ANN, which uses LBFGS.
- Hadoop 1.2.1, dataset is loaded from hdfs
- Cluster: 6 machines, Xeon 3.3GHz, 16GB RAM; each machine runs 2 Spark
workers with a maximum of 8GB of RAM (2GB used), total 16 workers
- Dataset: mnist8m; classes: 10; data: 8,100,000 instances; features:
784; random split 99% train, 1% test
- Link to the dataset:
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/mnist8m.scale.bz2
- Learning settings: 40 iterations, tolerance=1e-4 (both); ANN
classifier: hidden layers `Array[Int]()` (no hidden layer, i.e. equivalent
to regression)
- Result
- ANN classifier: training time: 00:47:55; accuracy: 0.848
- MLOR: training time: 01:30:45; accuracy: 0.864
- Average gradient compute time (`mapPartitionsWithIndex at
RDDFunctions.scala:108`)
- ANN classifier: 51 seconds
- MLOR: 2.1 minutes
- Average update time (`reduce at RDDFunctions.scala:112`)
- ANN classifier: 90 ms
- MLOR: 90 ms
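For reference, the stopping criteria above (LBFGS, 40 iterations, tolerance 1e-4) correspond roughly to the following MLlib configuration sketch; the HDFS path is hypothetical, and the ANN classifier from the annclassifier branch is set up analogously with `Array[Int]()` hidden layers:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SimpleUpdater}
import org.apache.spark.mllib.util.MLUtils

val sc = new SparkContext()  // cluster configuration omitted

// mnist8m in LIBSVM format from HDFS (hypothetical path), random 99%/1% split
val data = MLUtils.loadLibSVMFile(sc, "hdfs:///data/mnist8m.scale")
val Array(train, test) = data.randomSplit(Array(0.99, 0.01))

// Stopping criteria used for both models in the comparison above.
// LogisticGradient is binary here; this PR generalizes it to K classes.
val optimizer = new LBFGS(new LogisticGradient(), new SimpleUpdater())
  .setNumIterations(40)
  .setConvergenceTol(1e-4)
```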
It seems that the ANN is almost 2x faster (with the settings above), though
its accuracy is 1.6 percentage points lower. The difference in accuracy can be
explained by the fact that the ANN uses a (half) squared error cost function
instead of cross entropy, and no softmax; the latter two are generally
considered better suited for classification.
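To make the contrast concrete, here is a minimal plain-Scala sketch (no Spark, names are illustrative) of the two cost functions on a single 3-class example with a one-hot target:

```scala
object LossComparison {
  // Half squared error on raw outputs: 0.5 * sum_k (o_k - t_k)^2
  // (the cost the ANN minimizes here)
  def halfSquaredError(output: Array[Double], target: Array[Double]): Double =
    0.5 * output.zip(target).map { case (o, t) => (o - t) * (o - t) }.sum

  // Softmax turns raw scores into a probability distribution
  def softmax(scores: Array[Double]): Array[Double] = {
    val max = scores.max                        // subtract max for numerical stability
    val exps = scores.map(s => math.exp(s - max))
    val sum = exps.sum
    exps.map(_ / sum)
  }

  // Cross entropy on softmax probabilities: -sum_k t_k * log(p_k)
  // (the cost MLOR minimizes)
  def crossEntropy(probs: Array[Double], target: Array[Double]): Double =
    -probs.zip(target).map { case (p, t) => t * math.log(p) }.sum

  def main(args: Array[String]): Unit = {
    val scores = Array(2.0, 1.0, 0.1)           // raw outputs for 3 classes
    val target = Array(1.0, 0.0, 0.0)           // one-hot: true class is 0
    println(f"half squared error (raw outputs): ${halfSquaredError(scores, target)}%.4f")
    println(f"softmax cross entropy:            ${crossEntropy(softmax(scores), target)}%.4f")
  }
}
```

Cross entropy penalizes confident mistakes much more heavily than squared error, which is one reason it tends to train better classifiers.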