[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Description: 
A single-node parameter server acts as a data-parallel parameter server; a 
multi-node model-parallel parameter server will be discussed if time permits. 

A diagram of the parameter server architecture is shown below.

  was:
A single-node parameter server acts as a data-parallel parameter server; a 
multi-node model-parallel parameter server will be discussed if time permits. 

Synchronization:

We also need to implement synchronization between the workers and the 
parameter server in order to support additional parameter update strategies, 
e.g., the stale-synchronous strategy needs a hyperparameter "staleness" to 
define the waiting interval. The idea is to maintain a vector clock in the 
server that records every worker's clock. Each time a worker finishes an 
iteration, it waits for a signal from the server, i.e., it sends a request to 
compute the staleness according to the vector clock. When the server receives 
gradients from a worker, it increments that worker's entry in the vector 
clock. We can then define BSP as "staleness==0", ASP as "staleness==-1", and 
SSP as "staleness==N".

A diagram of the parameter server architecture is shown below.
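The staleness check described above can be sketched as follows. This is a minimal Python illustration under stated assumptions (class and method names are invented here, and the real implementation would live in SystemML's Java runtime):

```python
import threading

class VectorClockServer:
    """Toy sketch of the server-side staleness check (illustrative names).
    One logical clock per worker; semantics per the description above:
    BSP: staleness == 0, ASP: staleness == -1 (never wait), SSP: staleness == N."""

    def __init__(self, num_workers, staleness):
        self.clocks = [0] * num_workers   # vector clock, one entry per worker
        self.staleness = staleness
        self.lock = threading.Lock()

    def on_gradients_received(self, worker_id):
        # Receiving gradients from a worker increments that worker's entry.
        with self.lock:
            self.clocks[worker_id] += 1

    def should_wait(self, worker_id):
        # A worker that finished an iteration asks whether it must wait.
        with self.lock:
            if self.staleness == -1:      # ASP: never wait
                return False
            slowest = min(self.clocks)
            # Wait if this worker is more than `staleness` iterations
            # ahead of the slowest worker.
            return self.clocks[worker_id] - slowest > self.staleness
```

With staleness 0 this degenerates to BSP (every worker waits for the slowest), and any positive N gives the bounded-staleness SSP behavior.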


> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Technical task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Description: 
A single-node parameter server acts as a data-parallel parameter server; a 
multi-node model-parallel parameter server will be discussed if time permits. 

Synchronization:

We also need to implement synchronization between the workers and the 
parameter server in order to support additional parameter update strategies, 
e.g., the stale-synchronous strategy needs a hyperparameter "staleness" to 
define the waiting interval. The idea is to maintain a vector clock in the 
server that records every worker's clock. Each time a worker finishes an 
iteration, it waits for a signal from the server, i.e., it sends a request to 
compute the staleness according to the vector clock. When the server receives 
gradients from a worker, it increments that worker's entry in the vector 
clock. We can then define BSP as "staleness==0", ASP as "staleness==-1", and 
SSP as "staleness==N".

A diagram of the parameter server architecture is shown below.

  was:
A single-node parameter server acts as a data-parallel parameter server; a 
multi-node model-parallel parameter server will be discussed if time permits. 

Push/Pull service: 

In general, we could launch a parameter server inside (local multi-threaded 
backend) or outside (Spark distributed backend) of the CP to provide the pull 
and push service. For the moment, all weights and biases are stored in a 
hashmap under a key, e.g., "global parameter". Each worker's gradients are put 
into the hashmap separately under a given key. The exchange between the server 
and the workers is implemented over TCP; hence, we can easily broadcast the IP 
address and the port number to the workers, which then send their gradients 
and retrieve the new parameters via a TCP socket. The server also spawns a 
thread that retrieves the gradients by polling the hashmap with the relevant 
keys, aggregates them, and finally updates the global parameter in the 
hashmap.
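The push/pull service above can be sketched in a few lines of Python. This is an illustration only: the real implementation would be Java inside SystemML, and the line-based JSON protocol, keys, and helper names here are all assumptions:

```python
import json
import socket
import socketserver
import threading

class ParamStore:
    """The server-side hashmap: parameters and per-worker gradients,
    keyed by strings such as "global parameter" or "gradients-worker-0"."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def push(self, key, value):
        with self._lock:
            self._data[key] = value

    def pull(self, key):
        with self._lock:
            return self._data.get(key)

STORE = ParamStore()

class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # One JSON request per line: {"op": "push"|"pull", "key": ..., "value": ...}
        for line in self.rfile:
            msg = json.loads(line)
            if msg["op"] == "push":
                STORE.push(msg["key"], msg["value"])
                self.wfile.write(b'{"ok": true}\n')
            else:
                reply = {"ok": True, "value": STORE.pull(msg["key"])}
                self.wfile.write((json.dumps(reply) + "\n").encode())

def start_server():
    # Port 0 lets the OS pick a free port; the resulting (host, port) pair
    # is what the CP would broadcast to the workers.
    srv = socketserver.ThreadingTCPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv

def rpc(addr, msg):
    # Worker-side helper: open a TCP socket, send one request, read one reply.
    with socket.create_connection(addr) as s:
        s.sendall((json.dumps(msg) + "\n").encode())
        return json.loads(s.makefile().readline())
```

A worker would push its gradients under its own key and pull "global parameter"; the aggregator thread sketched in the description would poll the gradient keys on the server side.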

Synchronization:

We also need to implement synchronization between the workers and the 
parameter server in order to support additional parameter update strategies, 
e.g., the stale-synchronous strategy needs a hyperparameter "staleness" to 
define the waiting interval. The idea is to maintain a vector clock in the 
server that records every worker's clock. Each time a worker finishes an 
iteration, it waits for a signal from the server, i.e., it sends a request to 
compute the staleness according to the vector clock. When the server receives 
gradients from a worker, it increments that worker's entry in the vector 
clock. We can then define BSP as "staleness==0", ASP as "staleness==-1", and 
SSP as "staleness==N".

A diagram of the parameter server architecture is shown below.


> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Technical task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png





[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-13 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Issue Type: Technical task  (was: Sub-task)

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Technical task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png





[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Due Date: 1/Jun/18  (was: 4/Jun/18)

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png





[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Attachment: (was: ps.png)

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png





[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Attachment: ps.png

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png





[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-09 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Description: 
A single-node parameter server acts as a data-parallel parameter server; a 
multi-node model-parallel parameter server will be discussed if time permits. 
 # For the case of a local multi-threaded parameter server, it is easy to 
maintain a concurrent hashmap (storing the parameters as values under defined 
keys) inside the CP. The workers are launched as threads that execute the 
gradient-calculation function and push their gradients into the hashmap; 
another thread pulls the gradients from the hashmap and calls the aggregation 
function to update the parameters. 
 # For the case of the Spark distributed backend, we could launch a remote 
single parameter server outside of the CP (as a worker) to provide the pull 
and push service. For the moment, all weights and biases are stored in this 
single server, and the exchange between the server and the workers is 
implemented over TCP. Hence, we can easily broadcast the IP address and the 
port number to the workers, which then send their gradients and retrieve the 
new parameters via a TCP socket. 
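Case 1 above (the local multi-threaded backend) can be sketched as follows. This is a minimal Python illustration of the described flow, not SystemML's actual (Java) implementation; the update rule and all names are illustrative assumptions:

```python
import threading

# Shared state: global parameters, and gradients keyed per worker.
params = {"global": [1.0, 1.0]}
grads = {}
lock = threading.Lock()
done = threading.Event()

def worker(wid, data):
    # Stand-in for the gradient-calculation function: push gradients
    # into the shared map under this worker's key.
    g = [x * 0.1 for x in data]
    with lock:
        grads[f"worker-{wid}"] = g

def aggregator():
    # The extra thread: pull gradients from the map and apply them
    # (here a plain subtraction stands in for the aggregation function).
    while not done.is_set() or grads:
        with lock:
            pending = [grads.pop(k) for k in list(grads)]
            for g in pending:
                params["global"] = [p - gi for p, gi in zip(params["global"], g)]

def run(num_workers=4):
    threads = [threading.Thread(target=worker, args=(i, [1.0, 2.0]))
               for i in range(num_workers)]
    agg = threading.Thread(target=aggregator)
    agg.start()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    done.set()           # all gradients pushed; let the aggregator drain and exit
    agg.join()
    return params["global"]
```

In SystemML itself the map would be a `ConcurrentHashMap` and the pushed values matrix blocks, but the thread structure is the same.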

We also need to implement synchronization between the workers and the 
parameter server in order to support additional parameter update strategies, 
e.g., the stale-synchronous strategy needs a hyperparameter "staleness" to 
define the waiting interval. The idea is to maintain a vector clock in the 
server consisting of all workers' clocks. Each time an iteration finishes, 
the worker sends a request to the server, and the server sends back a 
response indicating whether the worker should wait.

  was:A single-node parameter server acts as a data-parallel parameter 
server; a multi-node model-parallel parameter server will be discussed if 
time permits. The idea is to run a single-node parameter server by 
maintaining a hashmap inside the CP (Control Program), which stores the 
parameters as values under defined keys. For example, inserting the global 
parameter under a key named “worker-param-replica” allows the workers to 
retrieve the parameter replica. Hence, in the context of the local 
multi-threaded backend, workers can communicate directly with this hashmap in 
the same process. In the context of the Spark distributed backend, the CP 
first needs to fork a thread that starts a parameter server maintaining the 
hashmap; the workers can then send intermediates and retrieve parameters by 
connecting to the parameter server via a TCP socket. Since SystemML has good 
cache management, we only need to keep in the hashmap a matrix reference 
pointing to a file location instead of the real data instance. If time 
permits, in order to introduce the async and staleness update strategies, we 
would need to implement the synchronization by leveraging a vector clock.
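The reference-based design in the last sentences above can be sketched as follows (a Python illustration; the key, path, and class names are hypothetical, and in SystemML the buffer pool would own the actual matrix data):

```python
class MatrixRef:
    """Lightweight handle to a cached matrix: only the file location,
    never the matrix data itself."""
    def __init__(self, file_location):
        self.file_location = file_location

class ParamMap:
    """The CP-side hashmap: keys map to matrix references, so entries
    stay small regardless of matrix size."""
    def __init__(self):
        self._map = {}

    def put(self, key, ref):
        # Store only the reference; the cache manager owns the data.
        self._map[key] = ref

    def get(self, key):
        return self._map[key]

store = ParamMap()
# Hypothetical key and path, mirroring the "worker-param-replica" example.
store.put("worker-param-replica", MatrixRef("/tmp/sysml/cache/param_0.bin"))
```

A worker pulling "worker-param-replica" would receive the reference and let the cache materialize the matrix on demand.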


> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png

[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-05 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Attachment: ps.png

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major
> Attachments: ps.png





[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-05 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Description: A single-node parameter server acts as a data-parallel 
parameter server; a multi-node model-parallel parameter server will be 
discussed if time permits. The idea is to run a single-node parameter server 
by maintaining a hashmap inside the CP (Control Program), which stores the 
parameters as values under defined keys. For example, inserting the global 
parameter under a key named “worker-param-replica” allows the workers to 
retrieve the parameter replica. Hence, in the context of the local 
multi-threaded backend, workers can communicate directly with this hashmap in 
the same process. In the context of the Spark distributed backend, the CP 
first needs to fork a thread that starts a parameter server maintaining the 
hashmap; the workers can then send intermediates and retrieve parameters by 
connecting to the parameter server via a TCP socket. Since SystemML has good 
cache management, we only need to keep in the hashmap a matrix reference 
pointing to a file location instead of the real data instance. If time 
permits, in order to introduce the async and staleness update strategies, we 
would need to implement the synchronization by leveraging a vector clock.  
(was: A single-node parameter server acts as a data-parallel parameter 
server; a multi-node model-parallel parameter server will be discussed if 
time permits.)

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major





[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-05 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Description: A single-node parameter server acts as a data-parallel 
parameter server; a multi-node model-parallel parameter server will be 
discussed if time permits.  (was: A parameter server allows persisting the 
model parameters in a distributed manner. It is especially applied in the 
context of large-scale machine learning to train models. The parameter 
computation is done with data parallelism across the workers. The 
data-parallel parameter server architecture is illustrated in Figure 2. With 
the help of a lightweight parameter server interface [1], we are inspired to 
provide the push and pull methods as internal primitives, i.e., not exposed 
to the script level, allowing the workers to exchange intermediates.)

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major





[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-05 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Description: 
A parameter server allows persisting the model parameters in a distributed 
manner. It is especially applied in the context of large-scale machine 
learning to train models. The parameter computation is done with data 
parallelism across the workers. The data-parallel parameter server 
architecture is illustrated in Figure 2. With the help of a lightweight 
parameter server interface [1], we are inspired to provide the push and pull 
methods as internal primitives, i.e., not exposed to the script level, 
allowing the workers to exchange intermediates.
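A sketch of what such internal push/pull primitives might look like, in Python for illustration (the interface in [1] is not reproduced here, and these signatures are assumptions, not the actual SystemML API):

```python
from abc import ABC, abstractmethod

class ParamServer(ABC):
    """Internal primitive interface: called by the runtime, never
    exposed at the script (DML) level."""

    @abstractmethod
    def push(self, key, value):
        """Send an intermediate (e.g., a worker's gradients) to the server."""

    @abstractmethod
    def pull(self, key):
        """Retrieve the current value under key (e.g., model parameters)."""

class LocalParamServer(ParamServer):
    """In-process implementation, as a local backend might use."""
    def __init__(self):
        self._store = {}

    def push(self, key, value):
        self._store[key] = value

    def pull(self, key):
        return self._store.get(key)
```

A remote (e.g., TCP-backed) implementation could provide the same two methods, so workers stay agnostic of where the server runs.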

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Assignee: LI Guobao
>Priority: Major





[jira] [Updated] (SYSTEMML-2085) Single-node parameter server primitives

2018-05-04 Thread LI Guobao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:

Due Date: 4/Jun/18
 Summary: Single-node parameter server primitives  (was: Basic runtime 
primitives)

> Single-node parameter server primitives
> ---
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>Priority: Major
>



