That is a really old paper that basically pre-dates all of the recent
important work in neural networks.

You should look at work on Rectified Linear Units (ReLU), dropout
regularization, parameter servers (Downpour SGD) and deep learning more
generally.
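
As a rough illustration of the first two (a minimal sketch in plain
Python/NumPy, not taken from any of the papers below): ReLU replaces the
sigmoid activation, and dropout randomly zeroes activations during
training:

    import numpy as np

    def relu(x):
        # ReLU: elementwise max(0, x); replaces sigmoid/tanh activations.
        return np.maximum(0.0, x)

    def dropout(a, p=0.5, training=True, rng=np.random.default_rng(0)):
        # "Inverted" dropout: during training, zero a fraction p of the
        # activations at random and rescale the survivors, so nothing
        # needs to change at test time.
        if not training:
            return a
        mask = rng.random(a.shape) >= p
        return a * mask / (1.0 - p)

    h = relu(np.array([-1.5, 0.2, 3.0]))   # -> [0.0, 0.2, 3.0]
    h = dropout(h, p=0.5)                   # roughly half the units zeroed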

MapReduce as you have used it will not produce interesting results,
because the per-iteration overhead of MapReduce will be far too high.
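
To make that concrete: the scheme you quote below (each mapper computes a
partial gradient over its shard of the data, the reducer sums them and
takes one batch gradient step) amounts to something like the sketch below.
This is plain Python/NumPy rather than Hadoop, and a linear model with
squared loss stands in for the three-layer network, so treat it only as an
illustration of the data flow. The point is that every single gradient
step costs a full pass over the data plus the startup and shuffle of a
complete MapReduce job, which is exactly what a parameter server design
like Downpour SGD avoids.

    import numpy as np

    def map_partial_gradient(shard, w):
        # "Mapper": forward-propagate the shard, back-propagate the error,
        # and emit the summed partial gradient for the shared weights w.
        X, y = shard
        pred = X @ w
        return X.T @ (pred - y)

    def reduce_and_update(w, partial_grads, n_examples, lr=0.1):
        # "Reducer": sum the partial gradients from all mappers and take
        # one batch gradient descent step on the shared weights.
        g = sum(partial_grads) / n_examples
        return w - lr * g

    rng = np.random.default_rng(0)
    X = rng.normal(size=(90, 3))
    y = X @ np.array([1.0, -2.0, 0.5])
    shards = [(X[i::3], y[i::3]) for i in range(3)]   # three "mappers"

    w = np.zeros(3)
    for step in range(100):      # each step would be one full MapReduce job
        grads = [map_partial_gradient(s, w) for s in shards]
        w = reduce_and_update(w, grads, len(y))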

Here are some references:

http://www.cs.toronto.edu/~ranzato/publications/DistBeliefNIPS2012_withAppendix.pdf

http://arxiv.org/abs/1412.5567

http://arxiv.org/abs/1502.01710

http://www.comp.nus.edu.sg/~dbsystem/singa/

http://0xdata.com/product/deep-learning/


On Thu, Feb 12, 2015 at 2:14 AM, unmesha sreeveni <unmeshab...@gmail.com>
wrote:

> I am trying to implement a Neural Network in MapReduce. Apache Mahout
> refers to this paper:
> http://www.cs.stanford.edu/people/ang/papers/nips06-mapreducemulticore.pdf
>
> Neural Network (NN): We focus on backpropagation. By defining a network
> structure (we use a three-layer network with two output neurons
> classifying the data into two categories), each mapper propagates its set
> of data through the network. For each training example, the error is
> back-propagated to calculate the partial gradient for each of the weights
> in the network. The reducer then sums the partial gradients from each
> mapper and does a batch gradient descent to update the weights of the
> network.
>
> Here <http://homepages.gold.ac.uk/nikolaev/311sperc.htm> is a worked-out
> example of the gradient descent algorithm.
>
> Gradient Descent Learning Algorithm for Sigmoidal Perceptrons
> <http://pastebin.com/6gAQv5vb>
>
>    1. What is the better way to parallelize a neural network algorithm
>    from a MapReduce perspective? In the mapper, each record owns a partial
>    weight (from the above example: w0, w1, w2); I am not sure whether w0
>    is the bias. A random weight is assigned initially, the first record
>    calculates the output (o) and the weights get updated, then the second
>    record also finds the output and deltaW gets updated with the previous
>    deltaW value. Coming into the reducer, the sum of the gradients is
>    calculated, i.e. if we have 3 mappers we get 3 sets of w0, w1, w2.
>    These are summed, and using batch gradient descent we update the
>    weights of the network.
>    2. In the above method, how can we ensure which previous weight is
>    taken when there is more than one map task? Each map task updates its
>    own weights. How can that be accurate?
>    3. Where is backward propagation in the above-mentioned gradient
>    descent neural network algorithm? Or is it fine with this
>    implementation?
>    4. What is the termination condition mentioned in the algorithm?
>
> Please help me with some pointers.
>
> Thanks in advance.
>
> --
> *Thanks & Regards *
>
>
> *Unmesha Sreeveni U.B*
> *Hadoop, Bigdata Developer*
> *Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
> http://www.unmeshasreeveni.blogspot.in/
>
