[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator

2017-05-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000188#comment-16000188
 ] 

Apache Spark commented on SPARK-17134:
--

User 'VinceShieh' has created a pull request for this issue:
https://github.com/apache/spark/pull/17894

> Use level 2 BLAS operations in LogisticAggregator
> -
>
> Key: SPARK-17134
> URL: https://issues.apache.org/jira/browse/SPARK-17134
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Seth Hendrickson
> Assignee: Seth Hendrickson
>
> Multinomial logistic regression uses LogisticAggregator class for gradient 
> updates. We should look into refactoring MLOR to use level 2 BLAS operations 
> for the updates. Performance testing should be done to show improvements.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator

2017-05-07 Thread Vincent (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000176#comment-16000176
 ] 

Vincent commented on SPARK-17134:
-

I will submit a PR for this issue soon.




[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator

2016-09-23 Thread Seth Hendrickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515529#comment-15515529
 ] 

Seth Hendrickson commented on SPARK-17134:
--

This makes sense. In my initial testing I found that standardizing the 
features in every iteration takes a non-trivial amount of time. Still, you 
mentioned wanting to avoid caching the standardized dataset, since it can create 
unnecessary memory overhead. One solution is to let users specify that 
their data has already been standardized, so that we don't have to perform the 
extra divisions in the update method. Alternatively, we could do as you suggest 
above, but store the coefficients in column-major order so that we still 
maximize cache hits.

We'll need some testing for both cases to truly understand this.
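
The column-major layout mentioned above can be sketched as follows. This is only an illustration under assumptions: the object name `ColumnMajorMargins`, its method, and the parallel index/value arrays for the active features are hypothetical, and this is not claimed to be the layout Spark uses. The point is that all class coefficients for one feature sit next to each other, so the inner loop over classes reads a contiguous slice.

```scala
object ColumnMajorMargins {
  /**
   * Computes the per-class margins for one sparse instance.
   * `coefficients` is column-major: the entry for (classIdx, featureIdx)
   * lives at featureIdx * numClasses + classIdx.
   */
  def margins(
      numClasses: Int,
      coefficients: Array[Double],
      activeIndices: Array[Int],
      activeValues: Array[Double],
      featuresStd: Array[Double]): Array[Double] = {
    val result = Array.ofDim[Double](numClasses)
    var k = 0
    while (k < activeIndices.length) {
      val index = activeIndices(k)
      val value = activeValues(k)
      if (featuresStd(index) != 0.0 && value != 0.0) {
        // Standardize once per active feature, not once per class.
        val scaled = value / featuresStd(index)
        // Start of the contiguous block of class coefficients for this feature.
        val offset = index * numClasses
        var i = 0
        while (i < numClasses) {
          result(i) += coefficients(offset + i) * scaled
          i += 1
        }
      }
      k += 1
    }
    result
  }
}
```

If the data were known to be standardized already, the division by `featuresStd(index)` could be dropped, which is the first option described above.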




[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator

2016-09-22 Thread DB Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515491#comment-15515491
 ] 

DB Tsai commented on SPARK-17134:
-

I ran the benchmark again. With the old implementation, one iteration takes 
1.3 hours; with the new implementation, it takes 3.5 hours. I ran 
both experiments in the same Spark job for fairness, since they then get the same 
number of executors. I suspect the old implementation performs better because 
it caches the standardized dataset.




[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator

2016-09-21 Thread DB Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510672#comment-15510672
 ] 

DB Tsai commented on SPARK-17134:
-

I'll try the old MLOR in the RDD API tonight when the cluster is not busy. Actually, 
this is a very large training dataset, around 160GB in memory. Since there 
are 22533 classes and 100 features, there are about 2.2M parameters in total. I expect 
that level 2 BLAS will help significantly in this case.
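
As a quick check of the parameter figure above, the coefficient count is the product of the class and feature counts. The sketch below also derives the size of a dense coefficient array, assuming 8-byte doubles and no intercept terms; the object and value names are hypothetical.

```scala
object ParameterCount {
  val numClasses = 22533
  val numFeatures = 100

  // One coefficient per (class, feature) pair; intercepts are not counted here.
  val numCoefficients: Long = numClasses.toLong * numFeatures

  // Size of the coefficient array if stored as 8-byte doubles (an assumption).
  val coefficientBytes: Long = numCoefficients * 8L
}
```

This gives 2,253,300 coefficients, or roughly 18 MB as doubles, so the coefficient array is small next to the 160GB dataset.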




[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator

2016-09-21 Thread Seth Hendrickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510198#comment-15510198
 ] 

Seth Hendrickson commented on SPARK-17134:
--

Hmm, it would be nice to see this vs. the old MLOR in the RDD API, just as a sanity 
check. I conducted performance testing against MLlib initially, though, so 
there shouldn't be any regressions.




[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator

2016-09-21 Thread DB Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509053#comment-15509053
 ] 

DB Tsai commented on SPARK-17134:
-

I'm benchmarking MLOR with 22533 classes and 100 dense features. There 
are 200M instances. On a cluster with 1k executors, one iteration takes 2.5 
hours. It would be great if we could do some performance 
investigation to see whether we can push the performance further. Thanks.
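
For a rough sense of scale, the figures above can be turned into a back-of-envelope count of the multiply-adds needed for the margins alone in one pass over the data. This is only an estimate under assumptions: it takes "1k executors" as exactly 1000, assumes the work is split evenly, and ignores the gradient update and all other costs. The object and value names are hypothetical.

```scala
object IterationCost {
  val numInstances = 200000000L   // 200M instances
  val numClasses = 22533L
  val numFeatures = 100L
  val numExecutors = 1000L        // "1k executors", taken as exactly 1000
  val iterationSeconds = 9000L    // 2.5 hours

  // One multiply-add per (instance, class, feature) triple for the margins.
  val marginMultiplyAdds: Long = numInstances * numClasses * numFeatures

  // Implied margin multiply-adds per executor per second, assuming an even split.
  val perExecutorPerSecond: Long =
    marginMultiplyAdds / (numExecutors * iterationSeconds)
}
```

Under these assumptions the margins take about 4.5e14 multiply-adds per iteration, or roughly 5e7 per executor per second.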




[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator

2016-08-21 Thread Qian Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429941#comment-15429941
 ] 

Qian Huang commented on SPARK-17134:


Thank you. I will do it.




[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator

2016-08-19 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429196#comment-15429196
 ] 

Yanbo Liang commented on SPARK-17134:
-

[~qhuang] Please feel free to take this task and do the performance 
investigation. Thanks! 




[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator

2016-08-19 Thread DB Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428848#comment-15428848
 ] 

DB Tsai commented on SPARK-17134:
-

{code:borderStyle=solid}
// Accumulate the per-class margins for one instance.
val margins = Array.ofDim[Double](numClasses)
features.foreachActive { (index, value) =>
  if (featuresStd(index) != 0.0 && value != 0.0) {
    var i = 0
    // Standardize the feature value once, outside the loop over classes.
    val temp = value / featuresStd(index)
    while (i < numClasses) {
      margins(i) += coefficients(i * numFeaturesPlusIntercept + index) * temp
      i += 1
    }
  }
}

// The intercept of each class is stored after that class's feature coefficients.
if (fitIntercept) {
  var i = 0
  val length = features.size
  while (i < numClasses) {
    margins(i) += coefficients(i * numFeaturesPlusIntercept + length)
    i += 1
  }
}

val maxMargin = margins.max
val marginOfLabel = margins(label.toInt)
{code}
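
For comparison, the loop above amounts to a matrix-vector product, margins = W * (x / std), which is the operation a level 2 BLAS gemv routine performs. The following is a minimal, self-contained sketch in plain Scala, not a call into any BLAS library. It assumes dense features, a fitted intercept, and the same row-major coefficient layout as the snippet above; the names `GemvSketch`, `gemvRowMajor`, and `margins` are hypothetical.

```scala
object GemvSketch {
  /** y = A * x for a dense row-major matrix A with rows x cols entries. */
  def gemvRowMajor(rows: Int, cols: Int, a: Array[Double], x: Array[Double]): Array[Double] = {
    require(a.length == rows * cols, "matrix size mismatch")
    require(x.length == cols, "vector size mismatch")
    val y = Array.ofDim[Double](rows)
    var i = 0
    while (i < rows) {
      var sum = 0.0
      var j = 0
      while (j < cols) {
        sum += a(i * cols + j) * x(j)
        j += 1
      }
      y(i) = sum
      i += 1
    }
    y
  }

  /** Margins for one dense instance; the intercept is handled by appending 1.0. */
  def margins(
      numClasses: Int,
      coefficients: Array[Double],
      features: Array[Double],
      featuresStd: Array[Double]): Array[Double] = {
    val scaled = Array.ofDim[Double](features.length + 1)
    var j = 0
    while (j < features.length) {
      // Features with zero standard deviation contribute nothing, as in the loop above.
      if (featuresStd(j) != 0.0) scaled(j) = features(j) / featuresStd(j)
      j += 1
    }
    scaled(features.length) = 1.0
    gemvRowMajor(numClasses, features.length + 1, coefficients, scaled)
  }
}
```

Whether a tuned gemv beats the hand-written loop on this workload is what the performance testing requested in this issue would need to show.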





[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator

2016-08-19 Thread Qian Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428228#comment-15428228
 ] 

Qian Huang commented on SPARK-17134:


I could be your backup if you are not available. This task is similar to 
SPARK-6685, which I have worked on.







[jira] [Commented] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator

2016-08-18 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427529#comment-15427529
 ] 

Yanbo Liang commented on SPARK-17134:
-

This is interesting. We are also trying to use BLAS to accelerate linear algebra 
operations in other algorithms such as {{KMeans/ALS}}, and I have some basic 
performance test results. I would like to contribute to this task. Thanks!
