[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-67083534
  
  [Test build #545 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/545/consoleFull) for PR 1290 at commit [`5e86c5e`](https://github.com/apache/spark/commit/5e86c5edab4c58fee55ddae841f29105f62ceec4).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-67083539
  
The test logs have expired... rerunning.


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-15 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-67087128
  
@witgo @ankurdave Thank you for the suggestion! I merged #3677 into the ANN 
code and then ran 100 iterations on MNIST (60K instances). The running time 
improved by only 1%. Is such a small gain expected, or should it be bigger in 
different settings? Could you elaborate on how the mentioned PR might improve 
the performance of this PR? 


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-15 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-67087246
  
@jkbradley Thank you!


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-67089450
  
  [Test build #545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/545/consoleFull) for PR 1290 at commit [`5e86c5e`](https://github.com/apache/spark/commit/5e86c5edab4c58fee55ddae841f29105f62ceec4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-67089972
  
It looks like the error is:
```
[error] /home/jenkins/workspace/NewSparkPullRequestBuilder/mllib/src/test/scala/org/apache/spark/mllib/ann/ANNSuite.scala:21: object LocalSparkContext is not a member of package org.apache.spark.mllib.util
```
Does it compile and run locally?


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-15 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-67091484
  
@jkbradley This PR references an older MLlib that had the class 
`LocalSparkContext`. It was replaced by `MLlibTestSparkContext` in one of 
the latest releases. This PR compiles locally, and it has not been updated to 
the new MLlib. Strange that we see such an error! Should we merge with the 
latest MLlib?


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-15 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-67092852
  
Yes, Jenkins will test against the master branch, so I'd recommend merging 
with master (or rebasing if the merge is messy).


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-13 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-66876255
  
@ankurdave 
#3677 should help improve the performance of this PR.


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-13 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-66876521
  
@avulanov I will submit a new PR about `AdaDelta` and `AdaGrad` next 
week. It should be usable in this PR.


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-12 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-66824545
  
@mengxr Could you suggest why the build failed?


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-10 Thread bgreeven
Github user bgreeven commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r21603916
  
--- Diff: docs/mllib-ann.md ---
@@ -0,0 +1,239 @@
+---
+layout: global
+title: Artificial Neural Networks - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Artificial Neural Networks
+---
+
+# Introduction
+
+This document describes MLlib's Artificial Neural Network (ANN) implementation.
+
+The implementation currently consists of the following files:
+
+* `ArtificialNeuralNetwork.scala`: implements the ANN
+* `ANNSuite`: implements automated tests for the ANN and its gradient
+* `ANNDemo`: a demo that approximates three functions and shows a graphical representation of
+the result
+
+# Summary of usage
+
+The ArtificialNeuralNetwork object is used as an interface to the neural network. It is
+called as follows:
+
+```
+val annModel = ArtificialNeuralNetwork.train(rdd, hiddenLayersTopology, maxNumIterations)
+```
+
+where
+
+* `rdd` is an RDD of type (Vector,Vector), the first element containing the input vector and
+the second the associated output vector.
+* `hiddenLayersTopology` is an array of integers (Array[Int]), which contains the number of
+nodes per hidden layer, starting with the layer that takes inputs from the input layer, and
+finishing with the layer that outputs to the output layer. The bias nodes are not counted.
+* `maxNumIterations` is an upper bound on the number of iterations to be performed.
+* `annModel` contains the trained ANN parameters, and can be used to calculate the ANN's
+approximation to arbitrary input values.
+
+The approximations can be calculated as follows:
+
+```
+val v_out = annModel.predict(v_in)
+```
+
+where `v_in` is either a Vector or an RDD of Vectors, and `v_out` respectively a Vector or an RDD of
+(Vector,Vector) pairs, corresponding to input and output values.
+
+Further details and other calling options will be elaborated upon below.
+
+# Architecture and Notation
+
+The file ArtificialNeuralNetwork.scala implements the ANN. The following picture shows the
+architecture of a 3-layer ANN:
+
+```
+   INPUT LAYER            HIDDEN LAYER           OUTPUT LAYER
+
+   N_0,0                  N_0,1                  N_0,2
+   N_1,0     --Wij1-->    N_1,1     --Wjk2-->    N_1,2
+     :                      :                      :
+   N_I-1,0                N_J-1,1                N_K-1,2
+   bias (-1)              bias (-1)
+```
+
+The i-th node in layer l is denoted by N_{i,l}, both i and l starting with 0. The weight
+between node i in layer l-1 and node j in layer l is denoted by W_{i,j,l}. Layer 0 is the input
+layer, whereas layer L is the output layer.
+
+The ANN also implements bias units. These are nodes that always output the value -1. The bias
+units are in all layers except the output layer. They act similarly to other nodes, but have
+no inputs.
+
+The value of node N_{j,l} is calculated as follows:
+
+`$N_{j,l} = g\left( \sum_{i=0}^{topology_{l-1}} W_{i,j,l} \, N_{i,l-1} \right)$`
+
+where the last term (i = topology_{l-1}) is the bias unit of layer l-1, and g is the sigmoid function
+
+`$g(t) = \frac{e^{\beta t} }{1+e^{\beta t}}$`
+
+# LBFGS
+
+MLlib's ANN implementation uses the LBFGS optimisation algorithm for training. It minimises the
+following error function:
+
+`$E = \sum_{k=0}^{K-1} (N_{k,L} - Y_k)^2$`
+
+where Y_k is the target output given inputs N_{0,0} ... N_{I-1,0}.
+
+# Implementation Details
+
+## The ArtificialNeuralNetwork class
+
+The ArtificialNeuralNetwork class has the following constructor:
+
+```
+class 
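
The node-value formula, sigmoid and squared error in the quoted doc can be illustrated with a short plain-Scala sketch. It is not the PR's code: the nested `weights(l)(j)` layout (last entry of each row being the bias weight), the object name `AnnForwardSketch` and the default `beta = 1.0` are assumptions made for this example. As in the doc, every non-output layer gets a bias unit that outputs -1.

```scala
object AnnForwardSketch {
  // Logistic sigmoid g(t) = e^{beta t} / (1 + e^{beta t}), written in the
  // algebraically equivalent form 1 / (1 + e^{-beta t}).
  def sigmoid(t: Double, beta: Double = 1.0): Double =
    1.0 / (1.0 + math.exp(-beta * t))

  // Assumed layout (hypothetical): weights(l)(j) holds the incoming weights of
  // node j in layer l + 1. Its first n entries multiply the n nodes of the
  // previous layer; its last entry multiplies that layer's bias unit (-1).
  def forward(input: Array[Double],
              weights: Array[Array[Array[Double]]],
              beta: Double = 1.0): Array[Double] =
    weights.foldLeft(input) { (prev, layer) =>
      val withBias = prev :+ (-1.0) // append the bias unit's constant output
      layer.map { w =>
        require(w.length == withBias.length, "weight row / layer size mismatch")
        val s = w.zip(withBias).map { case (wij, ni) => wij * ni }.sum
        sigmoid(s, beta)
      }
    }

  // E = sum_k (N_{k,L} - Y_k)^2, the error function minimised with LBFGS.
  def squaredError(output: Array[Double], target: Array[Double]): Double = {
    require(output.length == target.length, "output / target size mismatch")
    output.zip(target).map { case (o, y) => (o - y) * (o - y) }.sum
  }
}
```

The sketch writes g(t) as 1 / (1 + e^{-βt}), which equals the doc's e^{βt} / (1 + e^{βt}) but avoids overflow for large positive t. With all weights at zero every node outputs g(0) = 0.5, a quick way to check the wiring.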

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-66454787
  
  [Test build #24310 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24310/consoleFull) for PR 1290 at commit [`5e86c5e`](https://github.com/apache/spark/commit/5e86c5edab4c58fee55ddae841f29105f62ceec4).
 * This patch merges cleanly.


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-66461883
  
  [Test build #24310 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24310/consoleFull) for PR 1290 at commit [`5e86c5e`](https://github.com/apache/spark/commit/5e86c5edab4c58fee55ddae841f29105f62ceec4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class OutputCanvas2D(wd: Int, ht: Int) extends Canvas `
  * `class OutputFrame2D( title: String ) extends Frame( title ) `
  * `class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas `
  * `class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title) `



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-66461896
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24310/
Test FAILed.


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-09 Thread ZhangBanger
Github user ZhangBanger commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r21571009
  
--- Diff: docs/mllib-ann.md ---

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-05 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-65880449
  
@jkbradley @manishamde I did performance comparisons with multinomial 
regression and posted them here: 
https://github.com/apache/spark/pull/1379#issuecomment-65879536. Suggestions 
are very welcome.


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-04 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-65731474
  
I've submitted a feature request to Spark Jira regarding the ANN-based 
classifier: https://issues.apache.org/jira/browse/SPARK-4752. It is already 
implemented and I would like to share it with the community. However, the ANN 
PR is not in the main branch yet, so I cannot open a valid PR for the 
classifier code. 


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-12-02 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-65264353
  
Just in case, I've tested the ANN without the hidden layer and it seems to 
work as multinomial regression, though with a (one-half) squared-error cost 
function and without a softmax output. To do so, pass an empty array 
`Array[Int]()` as the hidden-layer parameter.

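The empty-hidden-layer case follows from how the PR's code comments describe the topology: `hiddenLayersTopology` is converted to a full topology by adding the input and output layer sizes, so `Array[Int]()` leaves a single layer of weights feeding sigmoid outputs. Below is a hypothetical Scala sketch of that conversion and of the resulting weight count, assuming one weight per node of the previous layer plus one bias weight per node (the layout given in the PR's comments). `TopologySketch`, `fullTopology` and `numWeights` are illustrative names, not the PR's API.

```scala
object TopologySketch {
  // Hypothetical helper: full topology = input size, then the hidden-layer
  // sizes, then the output size. Bias nodes are not counted.
  def fullTopology(numInputs: Int, hidden: Array[Int], numOutputs: Int): Array[Int] =
    (numInputs +: hidden) :+ numOutputs

  // Each node in layer l has topology(l - 1) regular weights plus one bias weight.
  def numWeights(topology: Array[Int]): Int =
    (1 until topology.length).map(l => topology(l) * (topology(l - 1) + 1)).sum
}
```

For example, with 784 inputs (the MNIST pixel count) and 10 outputs, `numWeights(fullTopology(784, Array[Int](), 10))` is 10 * 785 = 7850: one weight vector plus a bias weight per output node.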

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-24 Thread Lewuathe
Github user Lewuathe commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-64176809
  
@avulanov What type of resource manager did you use for checking 
performance?
I'm trying to look into scalability, i.e. how performance increases 
depending on the size of the cluster. So as a reference, I'd like to know which 
resource manager (YARN, Mesos or standalone) was used. 


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-24 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-64283686
  
@manishamde Thanks for the useful references! It seems that model 
parallelization for ANN is a challenging problem. I asked a few presenters at 
the recent AMP Camp about this and they confirmed it, given that the present 
MLlib interfaces are not well suited for this task. Moreover, there will be a 
huge communication overhead during the update step for big models that can 
still fit into memory. I took a look at the algorithms other than back 
propagation listed in this paper: 
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=393138&tag=1. A genetic 
algorithm needs to evaluate a number of models, which makes the task even 
harder. Simulated annealing, a global optimization routine, seems more 
promising. However, with the model distributed across several nodes, one needs 
to copy data points to all nodes that store the model. I suggest sticking with 
the current implementation until a clearly better approach is found. Does that 
make sense?


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-24 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-64285217
  
@jkbradley Thank you for the good explanation! As for (1), a single-threaded ANN 
implemented in C++ shows similar accuracy. For (2), it seems that some additional 
coding needs to be done; I plan to work on it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-24 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-64285419
  
@Lewuathe Ganglia is used to monitor cluster performance, and it shows 
that the cores are busy all the way while memory is underutilized. No resource 
manager is used.


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-19 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r20589805
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala ---
@@ -0,0 +1,528 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.ann
+
+import breeze.linalg.{DenseVector, Vector => BV, axpy => brzAxpy}
+
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
+import org.apache.spark.mllib.optimization._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.util.random.XORShiftRandom
+
+/*
+ * Implements an Artificial Neural Network (ANN)
+ *
+ * The data consists of an input vector and an output vector, combined into a single vector
+ * as follows:
+ *
+ * [ ---input--- ---output--- ]
+ *
+ * NOTE: output values should be in the range [0,1]
+ *
+ * For a network of H hidden layers:
+ *
+ * hiddenLayersTopology(h) indicates the number of nodes in hidden layer h, excluding the bias
+ * node. h counts from 0 (first hidden layer, taking inputs from input layer) to H - 1 (last
+ * hidden layer, sending outputs to the output layer).
+ *
+ * hiddenLayersTopology is converted internally to topology, which adds the number of nodes
+ * in the input and output layers.
+ *
+ * noInput = topology(0), the number of input nodes
+ * noOutput = topology(L-1), the number of output nodes
+ *
+ * input = data( 0 to noInput-1 )
+ * output = data( noInput to noInput + noOutput - 1 )
+ *
+ * W_ijl is the weight from node i in layer l-1 to node j in layer l
+ * W_ijl goes to position ofsWeight(l) + j*(topology(l-1)+1) + i in the weights vector
+ *
+ * B_jl is the bias input of node j in layer l
+ * B_jl goes to position ofsWeight(l) + j*(topology(l-1)+1) + topology(l-1) in the weights vector
+ *
+ * error function: E( O, Y ) = sum( (O_j - Y_j)^2 )
+ * (with O = (O_0, ..., O_(noOutput-1)) the output of the ANN,
+ * and Y = (Y_0, ..., Y_(noOutput-1)) the target output)
+ *
+ * node_jl is node j in layer l
+ * node_jl goes to position ofsNode(l) + j
+ *
+ * The weights gradient is defined as dE/dW_ijl and dE/dB_jl
+ * It has the same mapping as W_ijl and B_jl
+ *
+ * For back propagation:
+ * delta_jl = dE/dS_jl, where S_jl is the output of node_jl before applying the sigmoid
+ * delta_jl has the same mapping as node_jl
+ *
+ * Where E = ((estOutput-output),(estOutput-output)),
+ * the inner product of the difference between estimation and target output with itself.
+ *
+ */
+
+/**
+ * Artificial neural network (ANN) model
+ *
+ * @param weights the weights between the neurons in the ANN.
+ * @param topology array containing the number of nodes per layer in the network, including
+ * the nodes in the input and output layer, but excluding the bias nodes.
+ */
+class ArtificialNeuralNetworkModel private[mllib](val weights: Vector, val topology: Array[Int])
+  extends Serializable with ANNHelper {
+
+  /**
+   * Predicts values for a single data point using the trained model.
+   *
+   * @param testData represents a single data point.
+   * @return prediction using the trained model.
+   */
+  def predict(testData: Vector): Vector = {
+Vectors.dense(computeValues(testData.toArray, weights.toArray))
+  }
+
+  /**
+   * Predict values for an RDD of data points using the trained model.
+   *
+   * @param testDataRDD RDD representing the input vectors.
+   * @return RDD with predictions using the trained model as (input, 
output) pairs.
+   */
+  def predict(testDataRDD: RDD[Vector]): RDD[(Vector,Vector)] = {
+testDataRDD.map(T = (T, predict(T)) )
+  }
+
+  private def computeValues(arrData: Array[Double], arrWeights: 
Array[Double]): Array[Double] = {
+val arrNodes = forwardRun(arrData, arrWeights)
+arrNodes.slice(arrNodes.size - topology(L), 
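
The flat weight-vector layout described in the quoted comment (W_ijl at ofsWeight(l) + j*(topology(l-1)+1) + i, with B_jl stored right after node j's incoming weights) can be made concrete with a small sketch. It assumes layer 0 has no incoming weights, so ofsWeight(1) = 0; the object and method names are hypothetical and are not the PR's ANNHelper API.

```scala
// Hypothetical helper (not the PR's ANNHelper): index arithmetic for the flat weight
// vector described in the quoted comment. Assumes layer 0 has no incoming weights,
// so the block for layer 1 starts at offset 0.
object WeightLayout {
  /** ofsWeight(l): start of layer l's block; entry 0 is unused and entry 1 is 0. */
  def weightOffsets(topology: Array[Int]): Array[Int] = {
    val ofs = new Array[Int](topology.length)
    for (l <- 2 until topology.length) {
      // Layer l-1 has topology(l-1) nodes, each with topology(l-2) weights plus one bias.
      ofs(l) = ofs(l - 1) + topology(l - 1) * (topology(l - 2) + 1)
    }
    ofs
  }

  /** Total length of the weights vector, biases included. */
  def numWeights(topology: Array[Int]): Int = {
    val last = topology.length - 1
    weightOffsets(topology)(last) + topology(last) * (topology(last - 1) + 1)
  }

  /** Position of W_ijl, the weight from node i in layer l-1 to node j in layer l. */
  def weightIndex(topology: Array[Int], ofs: Array[Int], i: Int, j: Int, l: Int): Int =
    ofs(l) + j * (topology(l - 1) + 1) + i

  /** Position of B_jl, the bias of node j in layer l (stored after its weights). */
  def biasIndex(topology: Array[Int], ofs: Array[Int], j: Int, l: Int): Int =
    ofs(l) + j * (topology(l - 1) + 1) + topology(l - 1)

  /** ofsNode(l): start of layer l in the flat array of node values. */
  def nodeOffsets(topology: Array[Int]): Array[Int] =
    topology.scanLeft(0)(_ + _).dropRight(1)

  /** Splits a packed [ input, output ] data vector into its two parts. */
  def splitData(data: Array[Double], noInput: Int, noOutput: Int): (Array[Double], Array[Double]) =
    (data.slice(0, noInput), data.slice(noInput, noInput + noOutput))
}
```

For topology Array(2, 3, 1) this gives 9 entries for layer 1 (3 nodes x (2 inputs + 1 bias)) and 4 for layer 2, 13 in total, with every index from 0 to 12 used exactly once.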

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r20591105
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala 
---
@@ -0,0 +1,528 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.ann
+
+import breeze.linalg.{DenseVector, Vector => BV, axpy => brzAxpy}
+
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
+import org.apache.spark.mllib.optimization._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.util.random.XORShiftRandom
+
+/*
+ * Implements an Artificial Neural Network (ANN)
+ *
+ * The data consists of an input vector and an output vector, combined into a single vector
+ * as follows:
+ *
+ * [ ---input--- ---output--- ]
+ *
+ * NOTE: output values should be in the range [0,1]
+ *
+ * For a network of H hidden layers:
+ *
+ * hiddenLayersTopology(h) indicates the number of nodes in hidden layer h, excluding the bias
+ * node. h counts from 0 (first hidden layer, taking inputs from input layer) to H - 1 (last
+ * hidden layer, sending outputs to the output layer).
+ *
+ * hiddenLayersTopology is converted internally to topology, which adds the number of nodes
+ * in the input and output layers.
+ *
+ * noInput = topology(0), the number of input nodes
+ * noOutput = topology(L-1), the number of output nodes
+ *
+ * input = data( 0 to noInput-1 )
+ * output = data( noInput to noInput + noOutput - 1 )
+ *
+ * W_ijl is the weight from node i in layer l-1 to node j in layer l
+ * W_ijl goes to position ofsWeight(l) + j*(topology(l-1)+1) + i in the weights vector
+ *
+ * B_jl is the bias input of node j in layer l
+ * B_jl goes to position ofsWeight(l) + j*(topology(l-1)+1) + topology(l-1) in the weights vector
+ *
+ * error function: E( O, Y ) = sum( (O_j - Y_j)^2 )
+ * (with O = (O_0, ..., O_(noOutput-1)) the output of the ANN,
+ * and Y = (Y_0, ..., Y_(noOutput-1)) the target output)
+ *
+ * node_jl is node j in layer l
+ * node_jl goes to position ofsNode(l) + j
+ *
+ * The weights gradient is defined as dE/dW_ijl and dE/dB_jl
+ * It has the same mapping as W_ijl and B_jl
+ *
+ * For back propagation:
+ * delta_jl = dE/dS_jl, where S_jl is the output of node_jl before applying the sigmoid
+ * delta_jl has the same mapping as node_jl
+ *
+ * Where E = ((estOutput-output),(estOutput-output)),
+ * the inner product of the difference between estimation and target output with itself.
+ *
+ */
+
+/**
+ * Artificial neural network (ANN) model
+ *
+ * @param weights the weights between the neurons in the ANN.
+ * @param topology array containing the number of nodes per layer in the network, including
+ * the nodes in the input and output layer, but excluding the bias nodes.
+ */
+class ArtificialNeuralNetworkModel private[mllib](val weights: Vector, val topology: Array[Int])
+  extends Serializable with ANNHelper {
+
+  /**
+   * Predicts values for a single data point using the trained model.
+   *
+   * @param testData represents a single data point.
+   * @return prediction using the trained model.
+   */
+  def predict(testData: Vector): Vector = {
+    Vectors.dense(computeValues(testData.toArray, weights.toArray))
+  }
+
+  /**
+   * Predict values for an RDD of data points using the trained model.
+   *
+   * @param testDataRDD RDD representing the input vectors.
+   * @return RDD with predictions using the trained model as (input, output) pairs.
+   */
+  def predict(testDataRDD: RDD[Vector]): RDD[(Vector,Vector)] = {
+    testDataRDD.map(T => (T, predict(T)) )
+  }
+
+  private def computeValues(arrData: Array[Double], arrWeights: Array[Double]): Array[Double] = {
+    val arrNodes = forwardRun(arrData, arrWeights)
+    arrNodes.slice(arrNodes.size - topology(L), 
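
A minimal loop-based sketch of the forward run and of the output-layer delta_jl for the layout described in the quoted comment. It assumes sigmoid activations on every non-input layer, E = sum_j (O_j - Y_j)^2 (the inner-product form in the comment), and that layer 1's weight block starts at offset 0. It is an illustration, not the PR's forwardRun; the names are hypothetical.

```scala
// Hypothetical sketch, not the PR's code. Layout: W_ijl -> ofsWeight(l) + j*(topology(l-1)+1) + i,
// with the bias B_jl stored right after node j's incoming weights.
object ForwardSketch {
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  /** Returns all node values, input layer first, laid out as ofsNode(l) + j. */
  def forwardRun(input: Array[Double], weights: Array[Double], topology: Array[Int]): Array[Double] = {
    require(input.length == topology(0), "input size must equal topology(0)")
    val nodes = new Array[Double](topology.sum)
    Array.copy(input, 0, nodes, 0, input.length)
    var prevOfs = 0           // ofsNode(l - 1)
    var nodeOfs = topology(0) // ofsNode(l)
    var wOfs = 0              // ofsWeight(l); layer 1 starts at 0
    for (l <- 1 until topology.length) {
      val nPrev = topology(l - 1)
      for (j <- 0 until topology(l)) {
        val base = wOfs + j * (nPrev + 1)
        var s = weights(base + nPrev) // bias B_jl
        for (i <- 0 until nPrev) s += weights(base + i) * nodes(prevOfs + i) // W_ijl * node_i(l-1)
        nodes(nodeOfs + j) = sigmoid(s) // node_jl = sigmoid(S_jl)
      }
      wOfs += topology(l) * (nPrev + 1)
      prevOfs = nodeOfs
      nodeOfs += topology(l)
    }
    nodes
  }

  /** E = <O - Y, O - Y>, the squared error between output O and target Y. */
  def squaredError(output: Array[Double], target: Array[Double]): Double =
    output.zip(target).map { case (o, y) => (o - y) * (o - y) }.sum

  /** Output-layer delta_j = dE/dS_j = 2 (O_j - Y_j) O_j (1 - O_j) for sigmoid outputs. */
  def outputDelta(o: Double, y: Double): Double = 2.0 * (o - y) * o * (1.0 - o)
}
```

The delta formula follows from the chain rule with sigmoid'(S) = O (1 - O). The gradient for W_ijl is then delta_jl times node_i(l-1), and for B_jl it is delta_jl itself. A central finite difference on E is a cheap way to check such formulas.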

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-19 Thread bgreeven
Github user bgreeven commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r20620263
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala 
---
@@ -0,0 +1,528 @@

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-17 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-63371703
  
@jkbradley (a) I found a few papers with timing results; however, they were 
for SVM. Moreover, the reported times range from minutes to hours. Could you 
suggest a paper with performance results of ANN on mnist8m? (b) I can compare 
against a single-threaded C++ implementation. I think it will be much faster. 
Probably the data is not large enough to benefit from distributed mode; we may 
need a dataset that cannot be processed on a single machine. (c) Do you mean 
comparing with other machine learners in Spark MLlib? To the best of my 
knowledge, there is only one multiclass machine learner in MLlib, the Bayesian 
one (naive Bayes), and it is not supposed to work with negative features.





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-17 Thread manishamde
Github user manishamde commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-63375342
  
@avulanov Thanks for conducting the experiments. Could you plot graphs for 
the experiments that you conducted while varying the number of features and the 
number of machines? It would be good to understand weak scaling (scaling 
#machines with the size of the dataset) and strong scaling (fixed-size dataset 
with additional machines added for speedup) performance. You could look at the 
[strong scaling](https://github.com/apache/spark/pull/79) experiments that 
@etrain performed for the first decision tree PR for reference.

Also, could you compare the accuracy with a similar implementation in Python 
or R?

Finally, decision trees and random forests support multiclass 
classification.
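
The strong- and weak-scaling experiments suggested above are usually summarized as speedup and parallel efficiency. A minimal sketch of those standard metrics; the object and function names are hypothetical, and any timings plugged into them should come from real runs.

```scala
// Standard scaling metrics; names are illustrative, not part of MLlib.
object ScalingMetrics {
  /** Strong scaling (fixed dataset): speedup S(p) = T(1) / T(p). */
  def speedup(t1: Double, tp: Double): Double = t1 / tp

  /** Strong-scaling efficiency S(p) / p; 1.0 means ideal linear speedup. */
  def strongEfficiency(t1: Double, tp: Double, p: Int): Double = speedup(t1, tp) / p

  /** Weak scaling (data grows with p): ideal runtime stays flat, so efficiency = T(1) / T(p). */
  def weakEfficiency(t1: Double, tp: Double): Double = t1 / tp
}
```

For example, a job taking 100 s on 1 machine and 25 s on 5 machines has speedup 4 and strong-scaling efficiency 0.8 (illustrative numbers, not measurements from this thread).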





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-17 Thread manishamde
Github user manishamde commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-63375573
  
I found this reference recently about Netflix's distributed implementation 
of neural nets that could be relevant for MLlib. 
http://techblog.netflix.com/2014/02/distributed-neural-networks-with-gpus.html





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-17 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-63401446
  
@avulanov I'll look around for papers which might allow for comparisons; 
I'm not sure offhand.

For experiments, I agree with what @manishamde may be getting at: dividing 
into 2 types of tests:
(1) accuracy tests: Here, comparing with single-threaded implementations on 
small datasets sounds fine.
(2) scaling tests: By self-speedup tests, I was referring to the scaling 
tests which @manishamde is mentioning above.  Comparing this ANN implementation 
with itself to see how it scales in various ways: increasing # examples, # 
nodes in the model, # machines, etc.  That might let us spot bottlenecks or 
inefficiencies, even if there aren't good alternate implementations available 
for comparison.

If I find papers on distributed implementations referencing code available, 
I'll be sure to post here.





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-15 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-63196649
  
@avulanov  Thanks for running tests!  It would be great if we could 
calibrate tests somehow.  Some options would be: (a) trying to reproduce tests 
in a published paper which has timing results + info about the distributed 
system they used, (b) testing against another implementation (though I'm not 
sure about the availability of comparable distributed implementations), or (c) 
doing self-speedup tests to understand scalability.  What do you think?





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-12 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62783988
  
@debasish83 I agree, such a setup will not fit into the memory of a single 
node. Are you talking about recommender systems? As far as I know, they are 
usually addressed by different models and algorithms rather than ANN and 
back-propagation. Could you suggest other problems with such a big model size 
that are usually solved with ANN? Probably the data would need millions of 
features, as in weather forecasting. Just wondering...





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-12 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62823148
  
@mengxr I've performed a test on mnist8m with our implementation of ANN as 
you suggested. I used a 5-node cluster. Each node has a 4-core 3.3GHz Xeon with 
16GB RAM. My Spark setup was that each node runs 4 workers with 3 GB RAM and 1 
core each, 20 workers in total. 99.9% of the data was used for training and the 
rest for testing (random split). The error oscillated around 4% after 25 
iterations. Each iteration takes 30 minutes on average (ranging from 10 to 50 
minutes). Is such testing sufficient, or would you like me to produce some 
specific measurements?





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-12 Thread debasish83
Github user debasish83 commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62824356
  
@avulanov I meant first a sparse autoencoder with 1 layer, followed by a 
deep net...these huge nets are mostly for automated feature extraction...of 
course we can do sparse coding using the ALS design, but I am not sure if the 
X - W'H formulation in ALS is as effective as X - f(XW)H, even if we pass W'H 
from ALS through some non-linear function...





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r20181004
  
--- Diff: docs/mllib-ann.md ---
@@ -0,0 +1,223 @@
+---
+layout: global
+title: Artificial Neural Networks - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Artificial Neural Networks
+---
+
+# Introduction
+
+This document describes MLlib's Artificial Neural Network (ANN) implementation.
+
+The implementation currently consists of the following files:
+
+* 'ArtificialNeuralNetwork.scala': implements the ANN
+* 'ANNSuite': implements automated tests for the ANN and its gradient
+* 'ANNDemo': a demo that approximates three functions and shows a graphical representation of
+the result
+
+# Summary of usage
+
+The ArtificialNeuralNetwork object is used as an interface to the neural network. It is
+called as follows:
+
+```
+val annModel = ArtificialNeuralNetwork.train(rdd, hiddenLayersTopology, maxNumIterations)
--- End diff --

@manishamde  The upcoming API may make it a bit easier, but the current API 
here could support default parameters via a builder pattern for parameters.  
I'll take a closer look at this PR!
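
One way to read the builder-pattern suggestion: a mutable builder whose setters return this.type so calls can be chained, with defaults for anything the caller does not set. All names and default values below are hypothetical placeholders, not an MLlib API.

```scala
// Hypothetical parameter set and builder; names and defaults are illustrative only.
final case class ANNParams(
    hiddenLayersTopology: Seq[Int],
    maxNumIterations: Int,
    convergenceTol: Double)

class ANNParamsBuilder {
  // Defaults used when the caller does not override a parameter.
  private var hiddenLayersTopology: Seq[Int] = Seq(10)
  private var maxNumIterations: Int = 100
  private var convergenceTol: Double = 1e-4

  def setHiddenLayersTopology(value: Seq[Int]): this.type = {
    require(value.forall(_ > 0), "each hidden layer needs at least one node")
    hiddenLayersTopology = value
    this
  }

  def setMaxNumIterations(value: Int): this.type = {
    require(value > 0, "maxNumIterations must be positive")
    maxNumIterations = value
    this
  }

  def setConvergenceTol(value: Double): this.type = {
    require(value >= 0.0, "convergenceTol must be non-negative")
    convergenceTol = value
    this
  }

  /** Snapshot of the current settings; unset parameters keep their defaults. */
  def build(): ANNParams = ANNParams(hiddenLayersTopology, maxNumIterations, convergenceTol)
}
```

For example, new ANNParamsBuilder().setMaxNumIterations(50).build() overrides only the iteration count and keeps the default hidden topology and tolerance.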





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-11 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62623318
  
@bgreeven @avulanov About supporting various optimizer, updater, gradient, 
and error function options, I vote for keeping them as parameters, rather than 
having different versions of the class with different names.  (The 
LogisticRegressionWithX pattern is really awkward and not sustainable if we add 
more optimizers.)  I vote for separating these into (a) optimizer, (b) 
regularization type (translating to updater internally), and (c) error function 
(where the gradient should be paired with the error function, not a separate 
parameter).





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-08 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62254979
  
I agree with what @debasish83 said. We should find a suitable solution for 
distributed storage of the weight matrix.





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-07 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62187026
  
@witgo Thank you for your suggestion! Could you elaborate on how the ALS 
algorithm design could be used?





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-07 Thread debasish83
Github user debasish83 commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62190595
  
For matrix factorization we have a user x product sparse matrix...You can 
think of this sparse matrix as the feature matrix for ANN...Now consider two 
matrices H1 and H2 of size feature x rank...where rank is the number of hidden 
units (nodes in the single hidden layer)...With this, the problem is: minimize 
|| X - f(H1'X)H2 || + lambdaL1(H1) + lambdaL2(H2)

The major difference is whether f(H1'X) can be broken up the way matrix 
factorization breaks up. If it can, then we should be able to use the ALS 
design...or an extension of the ALS design...

But say the number of hidden layers grows from 1 to 10 (the latest Google paper 
mentioned 22 layers)...then I don't think this idea works...we have to 
formulate the problem on GraphX, where the model is distributed over workers 
and not built on the master.

@witgo do you think we can break f(H1'X) up in the ALS way? I have not thought 
more about it!
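
One dimension-consistent way to write the one-hidden-layer objective above, assuming X is the n x d (users x features) matrix, H_1 and H_2 are d x k with k the number of hidden units, f is applied element-wise, and the reconstruction and L2 terms are squared Frobenius norms. The transpose placement and the squaring are assumptions; the comment writes H1'X.

```latex
$$
\min_{H_1, H_2}\; \bigl\| X - f(X H_1)\, H_2^{\top} \bigr\|_F^2
  \;+\; \lambda_1 \|H_1\|_1 \;+\; \lambda_2 \|H_2\|_F^2
$$
```

Replacing f(X H_1) by a free n x k factor W gives the bilinear X ≈ W H_2^T form that ALS alternates over.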





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-07 Thread debasish83
Github user debasish83 commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62191470
  
f is the neural activation...it can be the tanh or sigmoid function (they are 
non-convex, nonlinear) or ReLU units (max is convex)...in this PR 
https://github.com/apache/spark/pull/2705 I am experimenting with convex and 
nonlinear functions for the matrix factorization loss...The idea is to use the 
gradient interfaces for the loss functions...if f(H1'X) can be broken up 
component-wise, we can re-use a lot of the ALS development...
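
The activations mentioned above, together with the derivatives back-propagation needs. ReLU(x) = max(0, x) is convex as the maximum of two linear functions, while sigmoid and tanh are neither convex nor concave over the whole real line. The object name is hypothetical.

```scala
// Hypothetical helper: common activations f and the derivatives f' used in back-propagation.
object Activations {
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))
  def sigmoidDeriv(x: Double): Double = { val s = sigmoid(x); s * (1.0 - s) }

  def tanh(x: Double): Double = math.tanh(x)
  def tanhDeriv(x: Double): Double = { val t = math.tanh(x); 1.0 - t * t }

  // ReLU(x) = max(0, x) is convex: it is the maximum of two linear functions.
  def relu(x: Double): Double = math.max(0.0, x)
  // Not differentiable at 0; by convention this uses the subgradient 0 there.
  def reluDeriv(x: Double): Double = if (x > 0.0) 1.0 else 0.0
}
```

A quick way to see the non-convexity of sigmoid: its value at 2 lies above the chord between 0 and 4.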





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-07 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62195511
  
I think I am missing something from your description. In ANN we need to 
compute K+1 weight matrices: W1(NxH1), W2(H1xH2), ..., WK(H(K-1)xHK), 
W(K+1)(HKxM), where N is the number of inputs (size of feature space), M is the 
number of outputs, K is the number of hidden layers, and Hi is the size of the 
i-th hidden layer. (There are also bias vectors, but we can ignore them for the 
sake of simplicity.) Could you explain why you have only two matrices in your 
description?
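
With layer sizes N, H1, ..., HK, M as defined above, the K+1 weight-matrix shapes and the total parameter count (including one bias per non-input node) follow directly. The helper names are hypothetical, and the 784-100-10 sizes in the usage note are only an illustration.

```scala
// Illustrative helper (not from the PR): layer sizes are N, H1, ..., HK, M.
object LayerShapes {
  /** (fan-in, fan-out) of each of the K + 1 weight matrices. */
  def weightShapes(layerSizes: Seq[Int]): Seq[(Int, Int)] = {
    require(layerSizes.length >= 2, "need at least an input and an output layer")
    layerSizes.zip(layerSizes.tail)
  }

  /** Total trainable parameters: every weight plus one bias per non-input node. */
  def numParams(layerSizes: Seq[Int]): Long =
    weightShapes(layerSizes).map { case (in, out) => in.toLong * out + out }.sum
}
```

For Seq(784, 100, 10), weightShapes gives (784,100) and (100,10), and numParams gives 784*100 + 100 + 100*10 + 10 = 79510.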





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-07 Thread debasish83
Github user debasish83 commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62197587
  
I am considering a 1 hidden layer ANN right now...For multiple layers I have 
not thought about it, since to scale it (if you want the model to be 
distributed) I don't think the ALS design fits...We need something built on 
the GraphX APIs...have to think more...





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-07 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62199347
  
I think before doing this we need to weigh the benefits against the current 
implementation with loops. I can name one: the code will be more readable with 
matrices. Could you suggest others? Scalability is not an issue as long as all 
weight matrices (unrolled into a vector for `Gradient`) fit into the memory of 
one node.





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-07 Thread debasish83
Github user debasish83 commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62247730
  
Our sparse data very often has ~10M users and 1M features...for a 1 hidden 
layer net with 10K nodes, we need 10M x 10K + 1M x 10K doubles...if the model 
is not distributed (as it is in the ALS design), it's not possible to fit such 
a complicated model into the memory of one node...having said that, it makes 
sense to have a baseline implementation so that it can be used as a reference 
for further enhancements...also, as an autoencoder, neural nets should generate 
better results than sparse coding (which we can do in MLlib through the PR) due 
to the non-linearity of the hidden units...
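
At 8 bytes per double, the count above is (10M x 10K + 1M x 10K) x 8 B = 8.8 x 10^11 bytes, roughly 880 GB (about 820 GiB) for a single dense copy of the weights, before gradients or JVM overhead. A small helper reproducing that arithmetic; the names are hypothetical.

```scala
// Back-of-the-envelope estimate: one dense copy of the weights only,
// ignoring gradient buffers, JVM object overhead, and serialization.
object ModelMemory {
  val BytesPerDouble: Long = 8L

  /** Doubles for the net described above: users x hidden + features x hidden. */
  def numDoubles(users: Long, features: Long, hidden: Long): Long =
    users * hidden + features * hidden

  def bytes(users: Long, features: Long, hidden: Long): Long =
    numDoubles(users, features, hidden) * BytesPerDouble
}
```

Using Long throughout matters here: 1.1 x 10^11 already overflows an Int.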





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-06 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62085010
  
We should use matrix operations to calculate forward propagation and back 
propagation; see 
http://deeplearning.stanford.edu/wiki/index.php/Neural_Network_Vectorization





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-06 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62086344
  
@witgo I implemented the ANN in matrix form for Spark with Breeze a while
ago: https://github.com/avulanov/spark/tree/neuralnetwork. We tested it with
@bgreeven, and it was less efficient than the loop-based version due to the overhead
of rolling/unrolling matrices. The latter is needed because the `Gradient` class does
not allow passing matrices. I also tried plugging native BLAS/LAPACK into Breeze, but
it didn't deliver better performance, probably because the loops are already well
optimized for vector-matrix multiplication. Do you think we should still use the
vectorized form for better code readability?
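To make the rolling/unrolling step concrete, here is a small sketch in plain Scala (no Breeze) that maps between per-layer weight matrices and the single flat vector a `Gradient` works with. It assumes the layout described in the `ArtificialNeuralNetwork.scala` header comment of this PR (row j of layer l holds the incoming weights of node j, followed by its bias); `WeightLayout` and its methods are hypothetical names, not part of the PR.

```scala
// Hypothetical helper, not part of the PR. Assumed layout (from the PR's header comment):
//   W_ijl at ofsWeight(l) + j*(topology(l-1)+1) + i
//   B_jl  at ofsWeight(l) + j*(topology(l-1)+1) + topology(l-1)
object WeightLayout {
  type Matrix = Array[Array[Double]] // Matrix(j) = incoming weights of node j, bias last

  /** ofsWeight(l) for l >= 1; index 0 is unused because the input layer has no weights. */
  def layerOffsets(topology: Array[Int]): Array[Int] = {
    val offsets = new Array[Int](topology.length)
    for (l <- 1 until topology.length - 1) {
      offsets(l + 1) = offsets(l) + topology(l) * (topology(l - 1) + 1)
    }
    offsets
  }

  /** Total length of the flat weight vector (weights plus biases). */
  def numWeights(topology: Array[Int]): Int = {
    val last = topology.length - 1
    layerOffsets(topology)(last) + topology(last) * (topology(last - 1) + 1)
  }

  def weightIndex(topology: Array[Int], i: Int, j: Int, l: Int): Int =
    layerOffsets(topology)(l) + j * (topology(l - 1) + 1) + i

  def biasIndex(topology: Array[Int], j: Int, l: Int): Int =
    layerOffsets(topology)(l) + j * (topology(l - 1) + 1) + topology(l - 1)

  /** "Unroll": concatenate the layer matrices (layers 1..L, row by row) into one vector. */
  def unroll(layers: Seq[Matrix]): Array[Double] =
    layers.iterator.flatMap(m => m.iterator.flatMap(row => row.iterator)).toArray

  /** "Roll": cut the flat vector back into a topology(l) x (topology(l-1)+1) matrix per layer. */
  def roll(weights: Array[Double], topology: Array[Int]): Seq[Matrix] = {
    val offsets = layerOffsets(topology)
    (1 until topology.length).map { l =>
      val rowLen = topology(l - 1) + 1
      Array.tabulate(topology(l)) { j =>
        weights.slice(offsets(l) + j * rowLen, offsets(l) + (j + 1) * rowLen)
      }
    }
  }
}
```

Both directions copy every weight on each call, which illustrates where a per-iteration overhead of this kind can come from.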



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-06 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-62089562
  
We need not use the existing Gradient classes; instead, the whole iterative process can be
completed as matrix calculations. Moreover, we can borrow the design of the ALS
algorithm and cut the matrix into appropriately sized blocks.
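This proposal can be illustrated on a single sigmoid layer: the layer is applied to a whole batch of examples as one matrix operation, and the batch is cut into row blocks that are processed independently and concatenated, which is the part that resembles ALS-style blocking. This is a sketch with plain Scala arrays under those assumptions; it does not use Breeze, MLlib's `Gradient`, or the actual ALS code, and `BlockedForward`, `layer` and `layerBlocked` are hypothetical names.

```scala
// Illustrative sketch (plain Scala arrays, no Breeze/MLlib); all names are hypothetical.
// One sigmoid layer is applied to a whole batch at once, and the batch can be cut into
// row blocks that are processed independently and then concatenated.
object BlockedForward {
  type Matrix = Array[Array[Double]] // row-major: Matrix(r)(c)

  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  /** out(r)(j) = sigmoid(b(j) + sum_i x(r)(i) * w(j)(i)); each row of x is one example. */
  def layer(x: Matrix, w: Matrix, b: Array[Double]): Matrix =
    x.map { row =>
      Array.tabulate(w.length) { j =>
        sigmoid(b(j) + row.indices.map(i => row(i) * w(j)(i)).sum)
      }
    }

  /** Same result as `layer`, computed on blocks of at most `blockSize` rows. */
  def layerBlocked(x: Matrix, w: Matrix, b: Array[Double], blockSize: Int): Matrix =
    x.grouped(blockSize).flatMap(block => layer(block, w, b).iterator).toArray
}
```

In a distributed setting each block would presumably map to a partition; the sketch only shows that block-wise evaluation reproduces the full-batch result.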



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-05 Thread manishamde
Github user manishamde commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r19922521
  
--- Diff: docs/mllib-ann.md ---
@@ -0,0 +1,223 @@
+---
+layout: global
+title: Artificial Neural Networks - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Artificial Neural Networks
+---
+
+# Introduction
+
+This document describes MLlib's Artificial Neural Network (ANN) implementation.
+
+The implementation currently consists of the following files:
+
+* 'ArtificialNeuralNetwork.scala': implements the ANN
+* 'ANNSuite': implements automated tests for the ANN and its gradient
+* 'ANNDemo': a demo that approximates three functions and shows a graphical representation of
+the result
+
+# Summary of usage
+
+The ArtificialNeuralNetwork object is used as an interface to the neural network. It is
+called as follows:
+
+```
+val annModel = ArtificialNeuralNetwork.train(rdd, hiddenLayersTopology, maxNumIterations)
--- End diff --

@bgreeven Thanks for the clarification.



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-04 Thread bgreeven
Github user bgreeven commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-61749598
  
Let's discuss in more detail making the optimizer, updater, gradient, and
error function customizable.

Note that in the current LBFGS setup, the error function determines the
gradient (since the error function is what gets minimized), and the gradient is in
turn used by the updater and the optimizer. Hence, a pluggable error function
requires a pluggable gradient.

I think there would be value in making the updater and optimizer
pluggable too. For the optimizers, we have already seen two candidates, LBFGS and
SGD, each with its pros and cons. Also, there may be other optimizers that
use something other than the gradient. Since the updater currently depends on
the gradient, I suggest making it pluggable too. (I experimented a bit with a
genetic optimizer; it doesn't work very well, but it is an example of an optimizer
that doesn't use the gradient.)

Maybe we can start by making the optimizer, gradient and updater in
the ArtificialNeuralNetwork class vars instead of vals. Then we can create a
different ANN object for each optimizer, gradient and updater
combination, e.g. ArtificialNeuralNetworkWithLBFGS. We also need to remove
the convergenceTol from the ArtificialNeuralNetwork constructor, since that is
LBFGS-specific.
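The dependency from error function to gradient can be shown in a few lines: with sigmoid output nodes, each output-layer delta is the error function's derivative times O_j(1 - O_j), so swapping the error function changes the gradient code as well. The sketch below is a hypothetical illustration in plain Scala; `ErrorFunction`, `SquaredError`, `CrossEntropyError` and `OutputDeltas` are not MLlib classes, and cross-entropy is only an example alternative that the comment does not mention.

```scala
// Hypothetical sketch: these traits/objects are not MLlib classes.
// Output-layer deltas dE/dS_j (S_j = pre-sigmoid output) are built from the
// error function's derivative, so the gradient must change with the error function.
trait ErrorFunction {
  def error(o: Array[Double], y: Array[Double]): Double
  /** dE/dO_j for every output node j. */
  def derivative(o: Array[Double], y: Array[Double]): Array[Double]
}

/** E = sum_j (O_j - Y_j)^2, the squared error used in the PR. */
object SquaredError extends ErrorFunction {
  def error(o: Array[Double], y: Array[Double]): Double =
    o.indices.map(j => (o(j) - y(j)) * (o(j) - y(j))).sum
  def derivative(o: Array[Double], y: Array[Double]): Array[Double] =
    Array.tabulate(o.length)(j => 2.0 * (o(j) - y(j)))
}

/** E = -sum_j [Y_j log O_j + (1 - Y_j) log(1 - O_j)], for outputs strictly inside (0, 1). */
object CrossEntropyError extends ErrorFunction {
  def error(o: Array[Double], y: Array[Double]): Double =
    -(o.indices.map(j => y(j) * math.log(o(j)) + (1 - y(j)) * math.log(1 - o(j))).sum)
  def derivative(o: Array[Double], y: Array[Double]): Array[Double] =
    Array.tabulate(o.length)(j => (o(j) - y(j)) / (o(j) * (1 - o(j))))
}

object OutputDeltas {
  /** delta_j = dE/dO_j * O_j * (1 - O_j) for sigmoid output nodes. */
  def apply(errFn: ErrorFunction, o: Array[Double], y: Array[Double]): Array[Double] = {
    val dEdO = errFn.derivative(o, y)
    Array.tabulate(o.length)(j => dEdO(j) * o(j) * (1 - o(j)))
  }
}
```

With cross-entropy the O_j(1 - O_j) factor cancels and the delta reduces to O_j - Y_j, one concrete reason a gradient written for the squared error cannot simply be reused.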



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-04 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-61757505
  
I think that these three parameters should be bound together somehow; otherwise, one
could plug in a gradient whose vector length does not correspond to the ANN size. We
could provide a factory of correct gradients or, better, create a
`trait ANNGradient` that every ANN gradient must implement. It should have a few
functions that allow, for example, setting the error function. However, some ML
algorithms with a specific optimizer are separate classes in MLlib, such as
`SVMWithSGD`. If we follow this route, we can create an abstract `trait ANN` with
`val`s for the optimizer, gradient and updater that the descendants have to
initialize. We would have one descendant, `ANNWithLBFGS`, which is the current
implementation.
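A minimal sketch of the trait-based design proposed here, assuming the gradient exposes the topology it was built for so that the ANN can reject a mismatched one. `ANNGradient` and `ANN` follow the names in the comment, but their members are our assumptions; the optimizer and updater are collapsed into a single `step` method, and `ANNWithGD` uses plain gradient descent as a stand-in for the proposed `ANNWithLBFGS`.

```scala
// Hypothetical sketch of the proposed design; none of these types exist in MLlib.
trait ANNGradient {
  def topology: Array[Int]
  /** Flat weight-vector length implied by the topology (weights plus one bias per node). */
  final def numWeights: Int =
    (1 until topology.length).map(l => topology(l) * (topology(l - 1) + 1)).sum
  def compute(weights: Array[Double], data: Array[Double]): Array[Double]
}

trait ANN {
  def topology: Array[Int]
  def gradient: ANNGradient
  /** One optimization step; each descendant supplies its own optimizer/updater logic. */
  def step(weights: Array[Double], data: Array[Double]): Array[Double]

  /** The binding asked for above: the gradient must be built for this network's topology. */
  protected def checkBinding(): Unit =
    require(gradient.topology.sameElements(topology),
      "gradient topology does not match the ANN topology")
}

/** Stand-in descendant using plain gradient descent; a real ANNWithLBFGS would plug in LBFGS. */
class ANNWithGD(val topology: Array[Int], val gradient: ANNGradient, stepSize: Double)
    extends ANN {
  checkBinding()

  def step(weights: Array[Double], data: Array[Double]): Array[Double] = {
    require(weights.length == gradient.numWeights, "weight vector has the wrong length")
    val g = gradient.compute(weights, data)
    weights.zip(g).map { case (w, gi) => w - stepSize * gi }
  }
}
```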



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-04 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-61757723
  
With regard to performance tests, I am going to use mnist8m
(http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html) as
suggested by @mengxr. I have set up an Apache Spark cluster of five machines,
each with a 4-core 3.3 GHz Xeon (8 MB cache) and 16 GB RAM.



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-03 Thread bgreeven
Github user bgreeven commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r19722711
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala ---
@@ -0,0 +1,528 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.ann
+
+import breeze.linalg.{DenseVector, Vector => BV, axpy => brzAxpy}
+
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
+import org.apache.spark.mllib.optimization._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.util.random.XORShiftRandom
+
+/*
+ * Implements an Artificial Neural Network (ANN)
+ *
+ * The data consists of an input vector and an output vector, combined into a single vector
+ * as follows:
+ *
+ * [ ---input--- ---output--- ]
+ *
+ * NOTE: output values should be in the range [0,1]
+ *
+ * For a network of H hidden layers:
+ *
+ * hiddenLayersTopology(h) indicates the number of nodes in hidden layer h, excluding the bias
+ * node. h counts from 0 (first hidden layer, taking inputs from input layer) to H - 1 (last
+ * hidden layer, sending outputs to the output layer).
+ *
+ * hiddenLayersTopology is converted internally to topology, which adds the number of nodes
+ * in the input and output layers.
+ *
+ * noInput = topology(0), the number of input nodes
+ * noOutput = topology(L-1), the number of output nodes
+ *
+ * input = data( 0 to noInput-1 )
+ * output = data( noInput to noInput + noOutput - 1 )
+ *
+ * W_ijl is the weight from node i in layer l-1 to node j in layer l
+ * W_ijl goes to position ofsWeight(l) + j*(topology(l-1)+1) + i in the weights vector
+ *
+ * B_jl is the bias input of node j in layer l
+ * B_jl goes to position ofsWeight(l) + j*(topology(l-1)+1) + topology(l-1) in the weights vector
+ *
+ * error function: E( O, Y ) = sum( (O_j - Y_j)^2 )
+ * (with O = (O_0, ..., O_(noOutput-1)) the output of the ANN,
+ * and Y = (Y_0, ..., Y_(noOutput-1)) the target output from the data)
+ *
+ * node_jl is node j in layer l
+ * node_jl goes to position ofsNode(l) + j
+ *
+ * The weights gradient is defined as dE/dW_ijl and dE/dB_jl
+ * It has the same mapping as W_ijl and B_jl
+ *
+ * For back propagation:
+ * delta_jl = dE/dS_jl, where S_jl is the output of node_jl before applying the sigmoid
+ * delta_jl has the same mapping as node_jl
+ *
+ * Where E = ((estOutput-output),(estOutput-output)),
+ * the inner product of the difference between estimation and target output with itself.
+ *
+ */
+
+/**
+ * Artificial neural network (ANN) model
+ *
+ * @param weights the weights between the neurons in the ANN.
+ * @param topology array containing the number of nodes per layer in the network, including
+ * the nodes in the input and output layer, but excluding the bias nodes.
+ */
+class ArtificialNeuralNetworkModel private[mllib](val weights: Vector, val topology: Array[Int])
+  extends Serializable with ANNHelper {
+
+  /**
+   * Predicts values for a single data point using the trained model.
+   *
+   * @param testData represents a single data point.
+   * @return prediction using the trained model.
+   */
+  def predict(testData: Vector): Vector = {
+    Vectors.dense(computeValues(testData.toArray, weights.toArray))
+  }
+
+  /**
+   * Predict values for an RDD of data points using the trained model.
+   *
+   * @param testDataRDD RDD representing the input vectors.
+   * @return RDD with predictions using the trained model as (input, output) pairs.
+   */
+  def predict(testDataRDD: RDD[Vector]): RDD[(Vector,Vector)] = {
+    testDataRDD.map(T => (T, predict(T)) )
+  }
+
+  private def computeValues(arrData: Array[Double], arrWeights: Array[Double]): Array[Double] = {
+    val arrNodes = forwardRun(arrData, arrWeights)
+    arrNodes.slice(arrNodes.size - topology(L), 

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-03 Thread bgreeven
Github user bgreeven commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r19723239
  
--- Diff: docs/mllib-ann.md ---
@@ -0,0 +1,223 @@
+---
+layout: global
+title: Artificial Neural Networks - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Artificial Neural Networks
+---
+
+# Introduction
+
+This document describes MLlib's Artificial Neural Network (ANN) implementation.
+
+The implementation currently consists of the following files:
+
+* 'ArtificialNeuralNetwork.scala': implements the ANN
+* 'ANNSuite': implements automated tests for the ANN and its gradient
+* 'ANNDemo': a demo that approximates three functions and shows a graphical representation of
+the result
+
+# Summary of usage
+
+The ArtificialNeuralNetwork object is used as an interface to the neural network. It is
+called as follows:
+
+```
+val annModel = ArtificialNeuralNetwork.train(rdd, hiddenLayersTopology, maxNumIterations)
--- End diff --

@manishamde: In most cases, one hidden layer is enough; for some special
functions, two hidden layers are needed. This is a nice text about choosing the
number of layers and the number of nodes per layer:
http://www.heatonresearch.com/node/707
The number of nodes per layer, in particular, depends heavily on the
problem at hand.



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-03 Thread bgreeven
Github user bgreeven commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r19733377
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala ---

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-03 Thread bgreeven
Github user bgreeven commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r19733442
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala ---

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-61480481
  
  [Test build #22818 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22818/consoleFull)
 for   PR 1290 at commit 
[`73759d5`](https://github.com/apache/spark/commit/73759d5bc587216c2484e9ebbfaac1fe0ab78bfb).
 * This patch merges cleanly.



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-61481496
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22817/
Test FAILed.



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-61499579
  
**[Test build #22818 timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22818/consoleFull)**
 for PR 1290 at commit 
[`73759d5`](https://github.com/apache/spark/commit/73759d5bc587216c2484e9ebbfaac1fe0ab78bfb)
 after a configured wait of `120m`.



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-61499597
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22818/
Test FAILed.



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-61584664
  
  [Test build #22851 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22851/consoleFull)
 for   PR 1290 at commit 
[`73759d5`](https://github.com/apache/spark/commit/73759d5bc587216c2484e9ebbfaac1fe0ab78bfb).
 * This patch merges cleanly.



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-61590143
  
  [Test build #22851 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22851/consoleFull)
 for   PR 1290 at commit 
[`73759d5`](https://github.com/apache/spark/commit/73759d5bc587216c2484e9ebbfaac1fe0ab78bfb).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-61590149
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22851/
Test PASSed.



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r19717255
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala ---
@@ -0,0 +1,528 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.ann
+
+import breeze.linalg.{DenseVector, Vector => BV, axpy => brzAxpy}
+
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
+import org.apache.spark.mllib.optimization._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.util.random.XORShiftRandom
+
+/*
+ * Implements a Artificial Neural Network (ANN)
+ *
+ * The data consists of an input vector and an output vector, combined 
into a single vector
+ * as follows:
+ *
+ * [ ---input--- ---output--- ]
+ *
+ * NOTE: output values should be in the range [0,1]
+ *
+ * For a network of H hidden layers:
+ *
+ * hiddenLayersTopology(h) indicates the number of nodes in hidden layer 
h, excluding the bias
+ * node. h counts from 0 (first hidden layer, taking inputs from input 
layer) to H - 1 (last
+ * hidden layer, sending outputs to the output layer).
+ *
+ * hiddenLayersTopology is converted internally to topology, which adds 
the number of nodes
+ * in the input and output layers.
+ *
+ * noInput = topology(0), the number of input nodes
+ * noOutput = topology(L-1), the number of output nodes
+ *
+ * input = data( 0 to noInput-1 )
+ * output = data( noInput to noInput + noOutput - 1 )
+ *
+ * W_ijl is the weight from node i in layer l-1 to node j in layer l
+ * W_ijl goes to position ofsWeight(l) + j*(topology(l-1)+1) + i in the weights vector
+ *
+ * B_jl is the bias input of node j in layer l
+ * B_jl goes to position ofsWeight(l) + j*(topology(l-1)+1) + topology(l-1) in the weights vector
+ *
+ * error function: E( O, Y ) = sum( (O_j - Y_j)^2 )
+ * (with O = (O_0, ..., O_(noOutput-1)) the output of the ANN,
+ * and Y = (Y_0, ..., Y_(noOutput-1)) the target output)
+ *
+ * node_jl is node j in layer l
+ * node_jl goes to position ofsNode(l) + j
+ *
+ * The weights gradient is defined as dE/dW_ijl and dE/dB_jl
+ * It has same mapping as W_ijl and B_jl
+ *
+ * For back propagation:
+ * delta_jl = dE/dS_jl, where S_jl is the output of node_jl before applying the sigmoid
+ * delta_jl has the same mapping as node_jl
+ *
+ * Where E = ((estOutput-output),(estOutput-output)),
+ * the inner product of the difference between estimation and target output with itself.
+ *
+ */
+
+/**
+ * Artificial neural network (ANN) model
+ *
+ * @param weights the weights between the neurons in the ANN.
+ * @param topology array containing the number of nodes per layer in the network, including
+ * the nodes in the input and output layer, but excluding the bias nodes.
+ */
+class ArtificialNeuralNetworkModel private[mllib](val weights: Vector, val topology: Array[Int])
+  extends Serializable with ANNHelper {
+
+  /**
+   * Predicts values for a single data point using the trained model.
+   *
+   * @param testData represents a single data point.
+   * @return prediction using the trained model.
+   */
+  def predict(testData: Vector): Vector = {
+    Vectors.dense(computeValues(testData.toArray, weights.toArray))
+  }
+
+  /**
+   * Predict values for an RDD of data points using the trained model.
+   *
+   * @param testDataRDD RDD representing the input vectors.
+   * @return RDD with predictions using the trained model as (input, output) pairs.
+   */
+  def predict(testDataRDD: RDD[Vector]): RDD[(Vector,Vector)] = {
+    testDataRDD.map(T => (T, predict(T)) )
+  }
+
+  private def computeValues(arrData: Array[Double], arrWeights: Array[Double]): Array[Double] = {
+    val arrNodes = forwardRun(arrData, arrWeights)
+    arrNodes.slice(arrNodes.size - topology(L), 

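The index mapping in the quoted comment (W_ijl at ofsWeight(l) + j*(topology(l-1)+1) + i, and B_jl at ofsWeight(l) + j*(topology(l-1)+1) + topology(l-1)) can be checked with a small standalone sketch. The quoted diff does not show how ofsWeight is computed, so this assumes the weights feeding each layer form one contiguous block, stored in layer order; the object and method names are hypothetical.

```scala
// Hypothetical sketch of the flat weight layout described in the quoted comment.
// Assumption: the weights feeding layer l form one contiguous block, stored in layer order.
object WeightLayout {
  // ofs(l) is the start of the block for layer l (l >= 1); ofs(topology.length) is the total size.
  def weightOffsets(topology: Array[Int]): Array[Int] = {
    val ofs = new Array[Int](topology.length + 1)
    for (l <- 1 until topology.length) {
      // each of the topology(l) nodes in layer l has topology(l-1) weights plus one bias
      ofs(l + 1) = ofs(l) + (topology(l - 1) + 1) * topology(l)
    }
    ofs
  }

  // Position of W_ijl: weight from node i in layer l-1 to node j in layer l
  def weightIndex(topology: Array[Int], ofs: Array[Int], l: Int, i: Int, j: Int): Int =
    ofs(l) + j * (topology(l - 1) + 1) + i

  // Position of B_jl: bias input of node j in layer l, stored right after that node's weights
  def biasIndex(topology: Array[Int], ofs: Array[Int], l: Int, j: Int): Int =
    ofs(l) + j * (topology(l - 1) + 1) + topology(l - 1)
}
```

For topology = Array(2, 3, 1), the first block holds (2 + 1) * 3 = 9 values and the second (3 + 1) * 1 = 4, so the weights vector has 13 entries under this assumption.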
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-61433892
  
@bgreeven I haven't studied the implementation details yet, but I have a 
question about the API. I realize that RDD[(Vector, Vector)] is a more general 
structure for training data, but it might be a good idea to also support 
RDD[LabeledPoint] (with conversion to the internal RDD[(Vector, Vector)] format) 
so that ANN can be plugged in place of other algorithms in the machine learning 
pipeline for single-label classification and regression use cases.
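One way to support labeled data as suggested is to map each labeled example to an (input, output) pair whose output is a one-hot vector, which also respects the quoted requirement that output values lie in [0,1]. The sketch below uses a hypothetical stand-in class, LabeledExample, instead of MLlib's LabeledPoint so that it runs without Spark; on an RDD[LabeledPoint] the same conversion would be applied inside `map`, wrapping the arrays with `Vectors.dense`.

```scala
// Hypothetical stand-in for MLlib's LabeledPoint (label plus feature array), so the sketch runs without Spark.
final case class LabeledExample(label: Double, features: Array[Double])

object LabeledConversion {
  // Maps a class label in {0, ..., numClasses - 1} to (input, one-hot target).
  def toInputOutput(p: LabeledExample, numClasses: Int): (Array[Double], Array[Double]) = {
    val k = p.label.toInt
    require(k.toDouble == p.label && k >= 0 && k < numClasses,
      s"label ${p.label} is not a class index in [0, $numClasses)")
    val target = Array.fill(numClasses)(0.0)
    target(k) = 1.0
    (p.features, target)
  }

  // Recovers a class label from a network output by taking the index of the largest entry.
  def toLabel(output: Array[Double]): Double =
    output.indices.maxBy(i => output(i)).toDouble
}
```

Decoding with the largest output entry is one common choice for single-label classification; regression would instead pass the label through as a one-element output, scaled into [0,1].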


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r19717451
  
--- Diff: docs/mllib-ann.md ---
@@ -0,0 +1,223 @@
+---
+layout: global
+title: Artificial Neural Networks - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Artificial Neural Networks
+---
+
+# Introduction
+
+This document describes MLlib's Artificial Neural Network (ANN) implementation.
+
+The implementation currently consists of the following files:
+
+* 'ArtificialNeuralNetwork.scala': implements the ANN
+* 'ANNSuite': implements automated tests for the ANN and its gradient
+* 'ANNDemo': a demo that approximates three functions and shows a graphical representation of
+the result
+
+# Summary of usage
+
+The ArtificialNeuralNetwork object is used as an interface to the neural network. It is
+called as follows:
+
+```
+val annModel = ArtificialNeuralNetwork.train(rdd, hiddenLayersTopology, maxNumIterations)
--- End diff --

How common is it to have a different number of nodes in each hidden layer? I 
am wondering whether there should be support for a simpler method ```def 
train(rdd, numNodesHiddenLayers, maxNumIterations)``` and perhaps even a 
simpler ```def train(rdd)``` with good default settings to help users get 
started.

@mengxr @jkbradley Would the upcoming MLlib API feature make this 
suggestion moot with support for default parameters?
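The simpler signatures proposed here could be layered on top of the existing one by building hiddenLayersTopology from a single width, while the full topology (which, per the quoted comment, adds the input and output layer sizes) follows from the data. A sketch with hypothetical helper names, none of which come from the PR:

```scala
// Hypothetical helpers for building topologies; not part of the PR.
object TopologySketch {
  // H hidden layers that all have the same number of nodes
  def uniformHidden(numHiddenLayers: Int, numNodes: Int): Array[Int] = {
    require(numHiddenLayers >= 1 && numNodes >= 1)
    Array.fill(numHiddenLayers)(numNodes)
  }

  // topology = input size, hidden layer sizes, output size (bias nodes excluded)
  def fullTopology(noInput: Int, hiddenLayersTopology: Array[Int], noOutput: Int): Array[Int] =
    (noInput +: hiddenLayersTopology) :+ noOutput

  // Size of the weights vector: one weight per incoming node plus one bias per non-input node
  def numWeights(topology: Array[Int]): Int =
    (1 until topology.length).map(l => (topology(l - 1) + 1) * topology(l)).sum
}
```

A `train(rdd, numNodesHiddenLayers, maxNumIterations)` overload would then only need to call `uniformHidden` before delegating to the existing method.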





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r19717692
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala ---
@@ -0,0 +1,528 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.ann
+
+import breeze.linalg.{DenseVector, Vector => BV, axpy => brzAxpy}
+
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
+import org.apache.spark.mllib.optimization._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.util.random.XORShiftRandom
+
+/*
+ * Implements an Artificial Neural Network (ANN)
+ *
+ * The data consists of an input vector and an output vector, combined into a single vector
+ * as follows:
+ *
+ * [ ---input--- ---output--- ]
+ *
+ * NOTE: output values should be in the range [0,1]
+ *
+ * For a network of H hidden layers:
+ *
+ * hiddenLayersTopology(h) indicates the number of nodes in hidden layer h, excluding the bias
+ * node. h counts from 0 (first hidden layer, taking inputs from input layer) to H - 1 (last
+ * hidden layer, sending outputs to the output layer).
+ *
+ * hiddenLayersTopology is converted internally to topology, which adds the number of nodes
+ * in the input and output layers.
+ *
+ * noInput = topology(0), the number of input nodes
+ * noOutput = topology(L-1), the number of output nodes
+ *
+ * input = data( 0 to noInput-1 )
+ * output = data( noInput to noInput + noOutput - 1 )
+ *
+ * W_ijl is the weight from node i in layer l-1 to node j in layer l
+ * W_ijl goes to position ofsWeight(l) + j*(topology(l-1)+1) + i in the weights vector
+ *
+ * B_jl is the bias input of node j in layer l
+ * B_jl goes to position ofsWeight(l) + j*(topology(l-1)+1) + topology(l-1) in the weights vector
+ *
+ * error function: E( O, Y ) = sum( (O_j - Y_j)^2 )
+ * (with O = (O_0, ..., O_(noOutput-1)) the output of the ANN,
+ * and Y = (Y_0, ..., Y_(noOutput-1)) the target output)
+ *
+ * node_jl is node j in layer l
+ * node_jl goes to position ofsNode(l) + j
+ *
+ * The weights gradient is defined as dE/dW_ijl and dE/dB_jl
+ * It has same mapping as W_ijl and B_jl
+ *
+ * For back propagation:
+ * delta_jl = dE/dS_jl, where S_jl is the output of node_jl before applying the sigmoid
+ * delta_jl has the same mapping as node_jl
+ *
+ * Where E = ((estOutput-output),(estOutput-output)),
+ * the inner product of the difference between estimation and target output with itself.
+ *
+ */
+
+/**
+ * Artificial neural network (ANN) model
+ *
+ * @param weights the weights between the neurons in the ANN.
+ * @param topology array containing the number of nodes per layer in the network, including
+ * the nodes in the input and output layer, but excluding the bias nodes.
+ */
+class ArtificialNeuralNetworkModel private[mllib](val weights: Vector, val topology: Array[Int])
+  extends Serializable with ANNHelper {
+
+  /**
+   * Predicts values for a single data point using the trained model.
+   *
+   * @param testData represents a single data point.
+   * @return prediction using the trained model.
+   */
+  def predict(testData: Vector): Vector = {
+    Vectors.dense(computeValues(testData.toArray, weights.toArray))
+  }
+
+  /**
+   * Predict values for an RDD of data points using the trained model.
+   *
+   * @param testDataRDD RDD representing the input vectors.
+   * @return RDD with predictions using the trained model as (input, output) pairs.
+   */
+  def predict(testDataRDD: RDD[Vector]): RDD[(Vector,Vector)] = {
+    testDataRDD.map(T => (T, predict(T)) )
+  }
+
+  private def computeValues(arrData: Array[Double], arrWeights: Array[Double]): Array[Double] = {
+    val arrNodes = forwardRun(arrData, arrWeights)
+    arrNodes.slice(arrNodes.size - topology(L), 

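A forward pass and error computation over the flat layout in the quoted comment might look like the sketch below. It is not the PR's forwardRun, whose body is cut off in the quoted diff: it assumes a sigmoid on every non-input layer (as the back-propagation note implies) and weight blocks stored contiguously in layer order, and it uses the squared error E = ((estOutput-output),(estOutput-output)) from the comment. All names are hypothetical.

```scala
// Hypothetical forward pass over the flat weight layout described in the quoted comment.
// Assumptions: every non-input layer applies the sigmoid; weight blocks are stored in layer order.
object ForwardPassSketch {
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // Returns the activations of every layer, input layer first.
  def forwardRun(input: Array[Double], weights: Array[Double], topology: Array[Int]): Array[Array[Double]] = {
    require(input.length == topology(0), "input size must match topology(0)")
    val layers = new Array[Array[Double]](topology.length)
    layers(0) = input
    var ofs = 0 // start of the weight block feeding the current layer
    for (l <- 1 until topology.length) {
      val nIn = topology(l - 1)
      val prev = layers(l - 1)
      layers(l) = Array.tabulate(topology(l)) { j =>
        val base = ofs + j * (nIn + 1)
        var s = weights(base + nIn) // bias B_jl
        for (i <- 0 until nIn) s += weights(base + i) * prev(i) // W_ijl times node i of layer l-1
        sigmoid(s) // S_jl -> node_jl
      }
      ofs += (nIn + 1) * topology(l)
    }
    layers
  }

  // E = (est - target, est - target): inner product of the residual with itself
  def squaredError(est: Array[Double], target: Array[Double]): Double =
    est.zip(target).map { case (o, y) => (o - y) * (o - y) }.sum
}
```

Keeping every layer's activations, rather than only the output, is what back-propagation needs to compute delta_jl layer by layer.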
[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r19717699
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala ---

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-61436212
  
@bgreeven Another general suggestion: consider adding logging to the code. 
It goes a long way in debugging errors and getting status on long-running jobs. 
See DecisionTree/RandomForest/GradientBoosting for examples. We need to add 
more logging to LogisticRegression as well (object GradientDescent extends 
Logging but there are no logs there), but I guess that's a separate PR.

cc: @mengxr @jkbradley
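A minimal sketch of per-iteration progress logging. Inside MLlib the log function would typically be `logInfo` from Spark's Logging trait (the trait GradientDescent is said above to extend); here it is passed in as a plain `String => Unit` so the sketch runs standalone. The object, method, and training step are hypothetical placeholders.

```scala
// Hypothetical training loop that reports progress every `logEvery` iterations.
object ProgressLogging {
  def trainWithProgress(numIterations: Int, logEvery: Int, log: String => Unit)
                       (step: Int => Double): Double = {
    require(numIterations > 0 && logEvery > 0)
    var loss = Double.NaN
    for (iter <- 1 to numIterations) {
      loss = step(iter) // one optimization step; returns the current loss
      if (iter % logEvery == 0 || iter == numIterations)
        log(f"iteration $iter/$numIterations: loss = $loss%.6f")
    }
    loss
  }
}
```

Always logging the final iteration, regardless of `logEvery`, makes sure the last loss shows up in the driver log.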





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r19717937
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala ---

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r19717985
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/ann/ArtificialNeuralNetwork.scala ---

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread avulanov
Github user avulanov commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r19719009
  
--- Diff: docs/mllib-ann.md ---
--- End diff --

@manishamde given the size of the PR, @mengxr suggested splitting it into 
multiple PRs. There is an implementation of a classifier based on this 
artificial neural network: https://github.com/avulanov/spark/tree/annclassifier. 
It employs `RDD[LabeledPoint]` and implements the MLlib `Classifier`. Softmax 
output and cross-entropy error are usually used for better classification 
performance, and they are not yet implemented. We've discussed this issue with 
@bgreeven, and our thinking is to have an interface in this PR that allows setting 
a different error function and optimizer. Does it make sense?
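To make the error function pluggable as proposed, one option is a small trait with one implementation per error. The sketch below pairs the squared error from the quoted ANN comment with softmax output and cross-entropy error. The trait and object names are hypothetical, and the gradients an optimizer would also need are omitted.

```scala
// Hypothetical pluggable error function; gradients are omitted for brevity.
trait ErrorFunction {
  def error(output: Array[Double], target: Array[Double]): Double
}

// E = sum_j (O_j - Y_j)^2, as in the quoted ANN comment
object SquaredError extends ErrorFunction {
  def error(output: Array[Double], target: Array[Double]): Double =
    output.zip(target).map { case (o, y) => (o - y) * (o - y) }.sum
}

// E = -sum_j Y_j * log(O_j); expects O to be a probability vector such as softmax output.
// O_j is clamped at 1e-15 to avoid log(0).
object CrossEntropyError extends ErrorFunction {
  def error(output: Array[Double], target: Array[Double]): Double = {
    val s = output.zip(target).map { case (o, y) => y * math.log(math.max(o, 1e-15)) }.sum
    -s
  }
}

object Softmax {
  // Subtracting the maximum before exponentiating avoids overflow without changing the result.
  def apply(z: Array[Double]): Array[Double] = {
    val m = z.max
    val e = z.map(v => math.exp(v - m))
    val total = e.sum
    e.map(_ / total)
  }
}
```

With a one-hot target, the cross-entropy reduces to minus the log of the probability assigned to the true class.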





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-11-02 Thread manishamde
Github user manishamde commented on a diff in the pull request:

https://github.com/apache/spark/pull/1290#discussion_r19720701
  
--- Diff: docs/mllib-ann.md ---
--- End diff --

@avulanov Thanks for the clarification. Sounds good to me. 





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-10-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-60742124
  
  [Test build #22359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22359/consoleFull)
 for   PR 1290 at commit 
[`6d167c5`](https://github.com/apache/spark/commit/6d167c52184582d58557fd16a462416391d29d82).
 * This patch merges cleanly.





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-10-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-60749017
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22359/
Test PASSed.





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-10-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-60749010
  
  [Test build #22359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22359/consoleFull)
 for   PR 1290 at commit 
[`6d167c5`](https://github.com/apache/spark/commit/6d167c52184582d58557fd16a462416391d29d82).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-10-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-60844377
  
  [Test build #22385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22385/consoleFull)
 for   PR 1290 at commit 
[`79c433e`](https://github.com/apache/spark/commit/79c433ef84f59a999a080dd421ca4bc5856d24ca).
 * This patch merges cleanly.





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-10-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-60853271
  
  [Test build #22385 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22385/consoleFull)
 for   PR 1290 at commit 
[`79c433e`](https://github.com/apache/spark/commit/79c433ef84f59a999a080dd421ca4bc5856d24ca).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class OutputCanvas2D(wd: Int, ht: Int) extends Canvas `
  * `class OutputFrame2D( title: String ) extends Frame( title ) `
  * `class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas `
  * `class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title) `



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-10-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-60853279
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22385/


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-57076438
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20930/consoleFull)
 for   PR 1290 at commit 
[`804c07a`](https://github.com/apache/spark/commit/804c07a3abd6a0e81d0f04b4a08f88df29cad357).
 * This patch merges cleanly.


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-57077703
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20930/


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-57077701
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20930/consoleFull)
 for   PR 1290 at commit 
[`804c07a`](https://github.com/apache/spark/commit/804c07a3abd6a0e81d0f04b4a08f88df29cad357).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class OutputCanvas2D(wd: Int, ht: Int) extends Canvas `
  * `class OutputFrame2D( title: String ) extends Frame( title ) `
  * `class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas `
  * `class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title) `
  * `class IndexedRecordToJavaConverter extends Converter[IndexedRecord, JMap[String, Any]]`
  * `class AvroWrapperToJavaConverter extends Converter[Any, Any] `



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56952211
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20857/consoleFull)
 for   PR 1290 at commit 
[`19d2faa`](https://github.com/apache/spark/commit/19d2faac9047bfe382646bb8c4cd1e5ea2faf2f7).
 * This patch merges cleanly.


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56957127
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20857/


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56957114
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20857/consoleFull)
 for   PR 1290 at commit 
[`19d2faa`](https://github.com/apache/spark/commit/19d2faac9047bfe382646bb8c4cd1e5ea2faf2f7).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class OutputCanvas2D(wd: Int, ht: Int) extends Canvas `
  * `class OutputFrame2D( title: String ) extends Frame( title ) `
  * `class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas `
  * `class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title) `



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56962296
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20858/consoleFull)
 for   PR 1290 at commit 
[`aaf3162`](https://github.com/apache/spark/commit/aaf31627110b379982d74c8882b6bd1491828b0e).
 * This patch merges cleanly.


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56971412
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20858/consoleFull)
 for   PR 1290 at commit 
[`aaf3162`](https://github.com/apache/spark/commit/aaf31627110b379982d74c8882b6bd1491828b0e).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class OutputCanvas2D(wd: Int, ht: Int) extends Canvas `
  * `class OutputFrame2D( title: String ) extends Frame( title ) `
  * `class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas `
  * `class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title) `



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56971423
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20858/


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56789789
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20793/consoleFull)
 for   PR 1290 at commit 
[`5b91bba`](https://github.com/apache/spark/commit/5b91bba9cde8f4257d37725fc167dee140e22be9).
 * This patch merges cleanly.


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56790021
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20791/


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56790721
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20795/consoleFull)
 for   PR 1290 at commit 
[`0db8951`](https://github.com/apache/spark/commit/0db89511aa95fcdf980f9e93b7a14c823eed620f).
 * This patch merges cleanly.


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56791029
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20794/


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56791103
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20795/consoleFull)
 for   PR 1290 at commit 
[`0db8951`](https://github.com/apache/spark/commit/0db89511aa95fcdf980f9e93b7a14c823eed620f).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class OutputCanvas2D(wd: Int, ht: Int) extends Canvas `
  * `class OutputFrame2D( title: String ) extends Frame( title ) `
  * `class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas `
  * `class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title) `



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56791106
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20795/


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56791275
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20797/consoleFull)
 for   PR 1290 at commit 
[`d4a692c`](https://github.com/apache/spark/commit/d4a692c2decb044115424561b5618a112f54c2f8).
 * This patch merges cleanly.


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56796798
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20793/consoleFull)
 for   PR 1290 at commit 
[`5b91bba`](https://github.com/apache/spark/commit/5b91bba9cde8f4257d37725fc167dee140e22be9).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class OutputCanvas2D(wd: Int, ht: Int) extends Canvas `
  * `class OutputFrame2D( title: String ) extends Frame( title ) `
  * `class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas `
  * `class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title) `



[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56796805
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20793/


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56798291
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20797/consoleFull)
 for   PR 1290 at commit 
[`d4a692c`](https://github.com/apache/spark/commit/d4a692c2decb044115424561b5618a112f54c2f8).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56798296
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20797/


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-22 Thread bgreeven
Github user bgreeven commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56343909
  
Changed the optimiser to LBFGS. It works much faster, but has the disadvantage 
(due to its faster convergence per iteration) that it also starts to 
overfit earlier (after far fewer iterations).


[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an 1-hi...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-56343891
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20650/consoleFull)
 for   PR 1290 at commit 
[`8acd799`](https://github.com/apache/spark/commit/8acd799abc20a054a14ec99780b7609c87f3255f).
 * This patch merges cleanly.

