[jira] [Commented] (SYSTEMML-2458) Add experiment on spark paramserv
[ https://issues.apache.org/jira/browse/SYSTEMML-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569601#comment-16569601 ] Matthias Boehm commented on SYSTEMML-2458: -- Thanks - the adagrad results are in the repo; currently adam and sgd are running. One observation is that ASP-batch is much slower than BSP-batch. Some gap is understandable because for BSP-batch we simply accrue the gradients and perform one update for all workers, but the effect should not be that pronounced. > Add experiment on spark paramserv > - > > Key: SYSTEMML-2458 > URL: https://issues.apache.org/jira/browse/SYSTEMML-2458 > Project: SystemML > Issue Type: Sub-task >Reporter: LI Guobao >Assignee: LI Guobao >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
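The BSP-batch versus ASP-batch observation in the comment above can be illustrated with a toy sketch. This is not SystemML's paramserv implementation: the function names, the fixed made-up gradients, and the choice to sum (rather than average) accrued gradients are all assumptions for illustration. It only counts how many model updates each scheme performs and does not measure runtime.

```python
from typing import List, Tuple

Vector = List[float]


def sgd_step(w: Vector, grad: Vector, lr: float) -> Vector:
    """Apply one plain SGD update to the model."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def bsp_batch(w: Vector, grads: List[List[Vector]], lr: float) -> Tuple[Vector, int]:
    """BSP per batch: accrue the gradients of all workers, then update once."""
    updates = 0
    for per_worker in zip(*grads):  # one gradient from each worker
        acc = [sum(col) for col in zip(*per_worker)]  # element-wise sum
        w = sgd_step(w, acc, lr)
        updates += 1
    return w, updates


def asp_batch(w: Vector, grads: List[List[Vector]], lr: float) -> Tuple[Vector, int]:
    """ASP per batch: every worker gradient triggers its own update."""
    updates = 0
    for per_worker in zip(*grads):
        for g in per_worker:  # arrival order simulated as worker order
            w = sgd_step(w, g, lr)
            updates += 1
    return w, updates


# grads[k][b] is the (hypothetical, fixed) gradient of worker k for batch b:
# 4 workers with 3 batches each.
grads = [[[1.0, 2.0] for _ in range(3)] for _ in range(4)]
w_bsp, n_bsp = bsp_batch([0.0, 0.0], grads, lr=0.5)
w_asp, n_asp = asp_batch([0.0, 0.0], grads, lr=0.5)
print(n_bsp, n_asp)  # 3 12
```

With 4 workers and 3 batches each, the BSP variant performs 3 updates and the ASP variant 12, so per-update overhead is paid once per worker in ASP. In a real run the ASP gradients would also depend on the model state at the time each worker pulled it, which this sketch ignores.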
[jira] [Commented] (SYSTEMML-2458) Add experiment on spark paramserv
[ https://issues.apache.org/jira/browse/SYSTEMML-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569424#comment-16569424 ] LI Guobao commented on SYSTEMML-2458: - [~mboehm7], yes, I added the baseline experiment w/o paramserv and fixed the location of the SystemML-config.xml file. Additionally, I've double checked the native BLAS configuration for the remote workers: it is correctly transferred to and set on the remote workers.
[jira] [Commented] (SYSTEMML-2458) Add experiment on spark paramserv
[ https://issues.apache.org/jira/browse/SYSTEMML-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569372#comment-16569372 ] Matthias Boehm commented on SYSTEMML-2458: -- OK, I just kicked off a run for the LOCAL experiments with MKL. However, note that the SystemML-config.xml file needs to be in each of the subdirectories, otherwise it's not picked up correctly. Also, Intel MKL's direct conv2d still runs into segmentation faults on this new architecture whenever the batch size is larger than 64, and hence I limited it to max 64. Tomorrow, I will kick off baseline runs (e.g., without parameter server, varying number of workers, and with our java backend operations). The distributed experiments will follow subsequently.
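The requirement that SystemML-config.xml be present in each experiment subdirectory could be checked before a run with a small helper like the one below. This is a minimal sketch: the directory layout, the function name `ensure_config`, and the idea of copying from a single source file are assumptions, not part of the experiment scripts in the repo.

```python
import shutil
from pathlib import Path
from typing import List


def ensure_config(root: Path, source: Path) -> List[Path]:
    """Copy `source` into every direct subdirectory of `root` that lacks it.

    Returns the list of newly created config file paths.
    """
    copied = []
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        target = sub / source.name
        if not target.exists():
            shutil.copy(source, target)
            copied.append(target)
    return copied
```

For example, `ensure_config(Path("experiments"), Path("SystemML-config.xml"))` (both paths hypothetical) would leave existing per-directory configs untouched and only fill in the missing ones.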
[jira] [Commented] (SYSTEMML-2458) Add experiment on spark paramserv
[ https://issues.apache.org/jira/browse/SYSTEMML-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569318#comment-16569318 ] Matthias Boehm commented on SYSTEMML-2458: -- Sure, I'm happy to kick off additional rounds of local and distributed experiments. For the presentation, it would also be important to have baseline comparisons. Could you please add the baseline without paramserv to the experiments? Furthermore, I'll run these experiments with MKL, so please double check that the native BLAS configuration is correctly set for distributed spark workers as well (see the remote parfor worker setup).
[jira] [Commented] (SYSTEMML-2458) Add experiment on spark paramserv
[ https://issues.apache.org/jira/browse/SYSTEMML-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569312#comment-16569312 ] LI Guobao commented on SYSTEMML-2458: - [~mboehm7], since we are hoping to have some experiment results for the presentation, I have pushed the latest polished scripts and the newly packaged jar with the recent patches. Maybe we could continue launching the experiments?
[jira] [Commented] (SYSTEMML-2458) Add experiment on spark paramserv
[ https://issues.apache.org/jira/browse/SYSTEMML-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556472#comment-16556472 ] Matthias Boehm commented on SYSTEMML-2458: -- Thanks - I just gave it a try and the script failed due to invalid name bindings on function invocations. I just pushed the fix. Subsequently, it ran into SYSTEMML-2466 - maybe you could have a look [~Guobao]?
[jira] [Commented] (SYSTEMML-2458) Add experiment on spark paramserv
[ https://issues.apache.org/jira/browse/SYSTEMML-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554752#comment-16554752 ] LI Guobao commented on SYSTEMML-2458: - [~mboehm7], I've pushed the scripts for the distributed spark experiments. Could you please take a look at them?