Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1290#issuecomment-100334613
I ran a small test to compare the performance of the new implementation with
the previous one.
* 8 machines (Xeon 3.3GHz, 4 cores, 16GB RAM), 7 workers in total
* mnist8m dataset, persisted in memory
* Network topology 784x10 (no hidden layer, i.e. logistic regression)
* LBFGS optimizer, 40 steps, tolerance 1e-4, batch size 100
* Accuracy on the MNIST test set: 0.9076
Metric | Time (hh:mm:ss)
--------|------------
Total time | 00:03:53
Avg step time | 00:00:06
Code for the new version (https://github.com/avulanov/spark/tree/ann-interface-gemm):
```
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.ann.{FeedForwardTrainer, Topology}
import org.apache.spark.mllib.classification.ANNClassifier

// Load the training set (mnist8m) and the test set; mark both for in-memory caching.
val mnist = MLUtils.loadLibSVMFile(sc, "hdfs://my.net:9000/input/mnist8m.scale").persist
val mnist784 = MLUtils.loadLibSVMFile(sc, "hdfs://my.net:9000/input/mnist.scale.t.784").persist

// 784 inputs, 10 outputs, no hidden layer.
val topology = Topology.multiLayerPerceptron(Array[Int](784, 10), false)
val trainer = new FeedForwardTrainer(topology, 784, 10).setBatchSize(100)
trainer.LBFGSOptimizer.setNumIterations(40).setConvergenceTol(1e-4)
val model40 = new ANNClassifier(trainer).train(mnist)

// Accuracy on the test set: fraction of predictions that equal the label.
val predictionAndLabels = mnist784.map(lp => (model40.predict(lp.features), lp.label))
val accuracy = predictionAndLabels.map { case (p, l) => if (p == l) 1 else 0 }.sum() /
  predictionAndLabels.count()
```
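The accuracy step at the end of the snippet can be checked without a cluster. The following is a minimal sketch on plain Scala collections, not part of the benchmark: the object name `AccuracySketch`, the `accuracy` helper, and the sample pairs are all hypothetical and only illustrate the "fraction of predictions equal to the label" computation.

```scala
object AccuracySketch {
  // Fraction of (prediction, label) pairs where the two values match.
  // Returns 0.0 for an empty input to avoid dividing by zero.
  def accuracy(pairs: Seq[(Double, Double)]): Double =
    if (pairs.isEmpty) 0.0
    else pairs.count { case (p, l) => p == l }.toDouble / pairs.size

  def main(args: Array[String]): Unit = {
    // Made-up pairs for illustration only; these are not benchmark results.
    val predictionAndLabels = Seq((3.0, 3.0), (7.0, 7.0), (1.0, 2.0), (0.0, 0.0))
    println(accuracy(predictionAndLabels)) // 3 of 4 match: prints 0.75
  }
}
```

Unlike the RDD version, this helper guards against an empty input, which would otherwise produce NaN.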