GitHub user holdenk commented on the pull request:
https://github.com/apache/spark/pull/6386#issuecomment-112265729
So, from running with a slightly larger scaling factor than I was using
initially, it seems all the approaches are pretty similar in terms of time usage.
origin:
glm-regression, glm-regression --num-trials=10 --inter-trial-wait=3
--num-partitions=6 --random-seed=5 --num-examples=50000 --num-features=10000
--num-iterations=20 --step-size=0.001 --reg-type=l2 --reg-param=0.1
--optimizer=lbfgs --intercept=0.0 --epsilon=0.1 --loss=l2
Training time: 4.5985, 0.175, 4.259, 4.452, 4.579
Test time: 0.1255, 0.007, 0.117, 0.125, 0.117
Training Set Metric: 33.2738700623, 0.133, 33.0214743257, 33.4757304367,
33.2006471123
Test Set Metric: 33.1470085163, 0.274, 32.8030691787, 33.372759872,
33.0449774996
glm-regression, glm-regression --num-trials=10 --inter-trial-wait=3
--num-partitions=6 --random-seed=5 --num-examples=50000 --num-features=10000
--num-iterations=20 --step-size=0.001 --reg-type=l2 --reg-param=0.1
--optimizer=lbfgs --intercept=0.0 --epsilon=0.1 --loss=elastic-net
Training time: 4.5985, 0.175, 4.259, 4.452, 4.579
Test time: 0.1255, 0.007, 0.117, 0.125, 0.117
Training Set Metric: 33.2738700623, 0.133, 33.0214743257, 33.4757304367,
33.2006471123
Test Set Metric: 33.1470085163, 0.274, 32.8030691787, 33.372759872,
33.0449774996
current PR (round trip through DataFrames):
glm-regression, glm-regression --num-trials=10 --inter-trial-wait=3
--num-partitions=6 --random-seed=5 --num-examples=50000 --num-features=10000
--num-iterations=20 --step-size=0.001 --reg-type=l2 --reg-param=0.1
--optimizer=lbfgs --intercept=0.0 --epsilon=0.1 --loss=l2
Training time: 4.382, 0.486, 3.937, 5.056, 3.937
Test time: 0.1255, 0.012, 0.114, 0.124, 0.115
Training Set Metric: 33.2738700623, 0.133, 33.0214743257, 33.4757304367,
33.2006471123
Test Set Metric: 33.1470085163, 0.274, 32.8030691787, 33.372759872,
33.0449774996
glm-regression, glm-regression --num-trials=10 --inter-trial-wait=3
--num-partitions=6 --random-seed=5 --num-examples=50000 --num-features=10000
--num-iterations=20 --step-size=0.001 --reg-type=l2 --reg-param=0.1
--optimizer=lbfgs --intercept=0.0 --epsilon=0.1 --loss=elastic-net
Training time: 4.382, 0.486, 3.937, 5.056, 3.937
Test time: 0.1255, 0.012, 0.114, 0.124, 0.115
Training Set Metric: 33.2738700623, 0.133, 33.0214743257, 33.4757304367,
33.2006471123
Test Set Metric: 33.1470085163, 0.274, 32.8030691787, 33.372759872,
33.0449774996
PR without the round trip through DataFrames:
glm-regression, glm-regression --num-trials=10 --inter-trial-wait=3
--num-partitions=6 --random-seed=5 --num-examples=50000 --num-features=10000
--num-iterations=20 --step-size=0.001 --reg-type=l2 --reg-param=0.1
--optimizer=lbfgs --intercept=0.0 --epsilon=0.1 --loss=l2
Training time: 4.3305, 0.374, 4.049, 5.034, 4.049
Test time: 0.1225, 0.011, 0.119, 0.149, 0.119
Training Set Metric: 33.2738700623, 0.133, 33.0214743257, 33.4757304367,
33.2006471123
Test Set Metric: 33.1470085163, 0.274, 32.8030691787, 33.372759872,
33.0449774996
glm-regression, glm-regression --num-trials=10 --inter-trial-wait=3
--num-partitions=6 --random-seed=5 --num-examples=50000 --num-features=10000
--num-iterations=20 --step-size=0.001 --reg-type=l2 --reg-param=0.1
--optimizer=lbfgs --intercept=0.0 --epsilon=0.1 --loss=elastic-net
Training time: 4.3305, 0.374, 4.049, 5.034, 4.049
Test time: 0.1225, 0.011, 0.119, 0.149, 0.119
Training Set Metric: 33.2738700623, 0.133, 33.0214743257, 33.4757304367,
33.2006471123
Test Set Metric: 33.1470085163, 0.274, 32.8030691787, 33.372759872,
33.0449774996
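As a rough check on the "pretty similar" reading, the training-time rows above can be compared against the origin run. This is only a sketch: it assumes the first value in each row is a central value (such as the median) in seconds, which the output itself does not state, and the dictionary keys are made-up labels for the three runs.

```python
# Training-time rows copied from the loss=l2 runs above
# (the elastic-net rows report the same numbers).
# ASSUMPTION: the first value in each row is a central value
# (e.g. the median) in seconds; the output does not label its columns.
training_time = {
    "origin": [4.5985, 0.175, 4.259, 4.452, 4.579],
    "pr_with_df_round_trip": [4.382, 0.486, 3.937, 5.056, 3.937],
    "pr_without_df_round_trip": [4.3305, 0.374, 4.049, 5.034, 4.049],
}


def relative_change(baseline, value):
    """Percent change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


baseline = training_time["origin"][0]
changes = {
    name: relative_change(baseline, row[0])
    for name, row in training_time.items()
}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}% vs origin")
```

Under that assumption both PR variants land within a few percent of origin, which is small next to the spread between the individual values in each row.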