Re: How to speed up MLlib LDA?

2015-09-22 Thread Charles Earl
It seems that the Vowpal Wabbit version is most similar to what is in

https://github.com/intel-analytics/TopicModeling/blob/master/src/main/scala/org/apache/spark/mllib/topicModeling/OnlineHDP.scala
The Intel version, though, seems to implement the Hierarchical Dirichlet
Process (topics and subtopics), whereas the VW implementation is based on
   https://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf
Unlike Monte Carlo methods, both use iterative optimization of the model
parameters with respect to the predicted tokens (my best shot at a
one-sentence summary).
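
Concretely, the per-document update in the Hoffman et al. paper repeats
gamma_dk = alpha + sum_w n_dw * phi_dwk until convergence, where phi is a
softmax over E[log theta_dk] + E[log beta_kw]. A minimal Scala sketch of
that E-step (my own illustration, not code from VW or the Intel repo;
digamma comes from Breeze, and all names are made up). Note that the
digamma and log calls are exactly the hot spots in Marko's profile:

import breeze.numerics.digamma

// Sketch of the per-document variational E-step from Hoffman, Blei & Bach
// (2010). elogBeta(k)(w) = E[log beta_kw], precomputed from the topic-word
// variational parameters; ids/cts are the doc's word ids and counts.
def inferGamma(ids: Array[Int], cts: Array[Double],
               elogBeta: Array[Array[Double]],
               alpha: Double, k: Int, iters: Int = 50): Array[Double] = {
  val gamma = Array.fill(k)(1.0)                       // doc-topic params
  for (_ <- 0 until iters) {
    val dgSum = digamma(gamma.sum)
    val elogTheta = gamma.map(g => digamma(g) - dgSum) // E[log theta_dk]
    val next = Array.fill(k)(alpha)
    for (i <- ids.indices) {
      val w = ids(i)
      // phi_wk proportional to exp(E[log theta_k] + E[log beta_kw])
      val logPhi = Array.tabulate(k)(t => elogTheta(t) + elogBeta(t)(w))
      val mx = logPhi.max
      val phi = logPhi.map(lp => math.exp(lp - mx))
      val norm = phi.sum
      for (t <- 0 until k) next(t) += cts(i) * phi(t) / norm
    }
    Array.copy(next, 0, gamma, 0, k)
  }
  gamma
}
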
The VW code is *highly* optimized.

https://github.com/JohnLangford/vowpal_wabbit/blob/master/vowpalwabbit/lda_core.cc
A fast inferencer for Spark LDA would be of great value.
C

On Tue, Sep 22, 2015 at 1:30 PM, Pedro Rodriguez 
wrote:

> I helped some with the LDA and worked quite a bit on a Gibbs version. I
> don't know if the Gibbs version might help, but since it is not (yet) in
> MLlib, Intel Analytics kindly created a spark package with their adapted
> version plus a couple other LDA algorithms:
> http://spark-packages.org/package/intel-analytics/TopicModeling
> https://github.com/intel-analytics/TopicModeling
>
> It might be worth trying out. Do you know what LDA algorithm VW uses?
>
> Pedro
>
>
> On Tue, Sep 22, 2015 at 1:54 AM, Marko Asplund 
> wrote:
>
>> Hi,
>>
>> I did some profiling for my LDA prototype code that requests topic
>> distributions from a model.
>> According to Java Mission Control, more than 80% of the execution time
>> during the sampling interval is spent in the following methods:
>>
>> org.apache.commons.math3.util.FastMath.log(double); count: 337; 47.07%
>> org.apache.commons.math3.special.Gamma.digamma(double); count: 164; 22.91%
>> org.apache.commons.math3.util.FastMath.log(double, double[]); count: 50;
>> 6.98%
>> java.lang.Double.valueOf(double); count: 31; 4.33%
>>
>> Is there any way of using the API more optimally?
>> Are there any opportunities for optimising the "topicDistributions" code
>> path in MLlib?
>>
>> My code looks like this:
>>
>> // executed once
>> val model = LocalLDAModel.load(ctx, ModelFileName)
>>
>> // executed four times
>> val samples = Transformers.toSparseVectors(vocabularySize,
>>   ctx.parallelize(Seq(input))) // fast
>> // this seems to take about 4 seconds to execute:
>> model.topicDistributions(samples.zipWithIndex.map(_.swap))
>>
>>
>> marko
>>
>
>
>
> --
> Pedro Rodriguez
> PhD Student in Distributed Machine Learning | CU Boulder
> UC Berkeley AMPLab Alumni
>
> ski.rodrig...@gmail.com | pedrorodriguez.io | 208-340-1703
> Github: github.com/EntilZha | LinkedIn:
> https://www.linkedin.com/in/pedrorodriguezscience
>
>


-- 
- Charles


Re: How to speed up MLlib LDA?

2015-09-22 Thread Marko Asplund
How optimized are the Commons math3 methods that showed up in the profile?
Are there any higher-performance alternatives to them?
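
For what it's worth, the usual trick for digamma seems to be the asymptotic
expansion, after shifting x upward with psi(x) = psi(x + 1) - 1/x. A purely
illustrative Scala sketch (assumes x > 0; accuracy near the poles is not
production-tested):

// Illustrative leaner digamma via the standard asymptotic expansion.
def fastDigamma(xIn: Double): Double = {
  var x = xIn
  var acc = 0.0
  while (x < 6.0) { acc -= 1.0 / x; x += 1.0 } // psi(x) = psi(x + 1) - 1/x
  val inv  = 1.0 / x
  val inv2 = inv * inv
  // psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
  acc + math.log(x) - 0.5 * inv -
    inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
}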

marko


Re: How to speed up MLlib LDA?

2015-09-22 Thread Pedro Rodriguez
I helped some with the LDA and worked quite a bit on a Gibbs version. I
don't know if the Gibbs version might help, but since it is not (yet) in
MLlib, Intel Analytics kindly created a spark package with their adapted
version plus a couple other LDA algorithms:
http://spark-packages.org/package/intel-analytics/TopicModeling
https://github.com/intel-analytics/TopicModeling

It might be worth trying out. Do you know what LDA algorithm VW uses?

Pedro


On Tue, Sep 22, 2015 at 1:54 AM, Marko Asplund 
wrote:

> Hi,
>
> I did some profiling for my LDA prototype code that requests topic
> distributions from a model.
> According to Java Mission Control, more than 80% of the execution time
> during the sampling interval is spent in the following methods:
>
> org.apache.commons.math3.util.FastMath.log(double); count: 337; 47.07%
> org.apache.commons.math3.special.Gamma.digamma(double); count: 164; 22.91%
> org.apache.commons.math3.util.FastMath.log(double, double[]); count: 50;
> 6.98%
> java.lang.Double.valueOf(double); count: 31; 4.33%
>
> Is there any way of using the API more optimally?
> Are there any opportunities for optimising the "topicDistributions" code
> path in MLlib?
>
> My code looks like this:
>
> // executed once
> val model = LocalLDAModel.load(ctx, ModelFileName)
>
> // executed four times
> val samples = Transformers.toSparseVectors(vocabularySize,
>   ctx.parallelize(Seq(input))) // fast
> // this seems to take about 4 seconds to execute:
> model.topicDistributions(samples.zipWithIndex.map(_.swap))
>
>
> marko
>



-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodrig...@gmail.com | pedrorodriguez.io | 208-340-1703
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience


Re: How to speed up MLlib LDA?

2015-09-22 Thread Marko Asplund
Hi,

I did some profiling for my LDA prototype code that requests topic
distributions from a model.
According to Java Mission Control, more than 80% of the execution time
during the sampling interval is spent in the following methods:

org.apache.commons.math3.util.FastMath.log(double); count: 337; 47.07%
org.apache.commons.math3.special.Gamma.digamma(double); count: 164; 22.91%
org.apache.commons.math3.util.FastMath.log(double, double[]); count: 50;
6.98%
java.lang.Double.valueOf(double); count: 31; 4.33%

Is there any way of using the API more optimally?
Are there any opportunities for optimising the "topicDistributions" code
path in MLlib?

My code looks like this:

// executed once
val model = LocalLDAModel.load(ctx, ModelFileName)

// executed four times
val samples = Transformers.toSparseVectors(vocabularySize,
  ctx.parallelize(Seq(input))) // fast
// this seems to take about 4 seconds to execute:
model.topicDistributions(samples.zipWithIndex.map(_.swap))
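
One thing I notice: each topicDistributions call above runs a complete
Spark job over a one-element RDD, so a large share of the ~4 seconds may
be job scheduling rather than math. A sketch of batching several queries
into a single job (reusing ctx, ModelFileName and vocabularySize from
above; the sample document is made up):

import org.apache.spark.mllib.clustering.LocalLDAModel
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Score a whole batch of documents in one Spark job instead of one
// parallelize/collect round trip per document.
val model = LocalLDAModel.load(ctx, ModelFileName)
val inputs: Seq[Vector] = Seq( // batch of query docs (toy example)
  Vectors.sparse(vocabularySize, Array(1, 7, 42), Array(2.0, 1.0, 3.0))
)
val docs = ctx.parallelize(inputs.zipWithIndex.map { case (v, i) => (i.toLong, v) })
val topicDists = model.topicDistributions(docs).collect() // single job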


marko


Re: How to speed up MLlib LDA?

2015-09-17 Thread Marko Asplund
Hi Feynman,

I just tried that, but there wasn't a noticeable change in training
performance. On the other hand, model loading time was reduced from
~2 minutes to ~5 seconds (the model is now persisted as a LocalLDAModel).

However, query / prediction time was unchanged.
Unfortunately, this is the critical performance characteristic in our case.
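
For reference, the persistence change looks roughly like this (a simplified
sketch, not verbatim from my code; lda, corpus and sc stand for the trainer,
the training RDD and the SparkContext, and the path is made up):

import org.apache.spark.mllib.clustering.{DistributedLDAModel, LocalLDAModel}

// Collapse the EM-trained, GraphX-backed model to a LocalLDAModel before
// saving, so that loading it later is cheap.
val distModel = lda.run(corpus).asInstanceOf[DistributedLDAModel]
distModel.toLocal.save(sc, "/models/lda-local")
val localModel = LocalLDAModel.load(sc, "/models/lda-local") // fast to load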

marko


On 15 September 2015 at 19:26, Feynman Liang  wrote:

> Hi Marko,
>
> I haven't looked into your case in much detail, but one immediate thought
> is: have you tried the OnlineLDAOptimizer? Its implementation and the
> resulting LDA model (LocalLDAModel) are quite different (it doesn't depend
> on GraphX and assumes the model fits on a single machine), so you may see
> performance differences.
>
> Feynman
>
>


How to speed up MLlib LDA?

2015-09-15 Thread Marko Asplund
Hi,

I'm trying out MLlib LDA training with 100 topics, 105 K vocabulary size
and ~3.4 M documents using EMLDAOptimizer.

Training the model took ~2.5 hours with MLlib, whereas Vowpal Wabbit
training with the same data on the same system took ~5 minutes.

I realize that there are differences between the LDA implementations, but
which parameters should I tweak to run the two with similar settings and
thus make the results more comparable?

Any suggestions on how to speed up MLlib LDA and thoughts on speed-accuracy
tradeoffs?
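
For reference, these seem to be the main knobs on the MLlib side (a sketch;
the values are just examples, not recommendations):

import org.apache.spark.mllib.clustering.{EMLDAOptimizer, LDA}

val lda = new LDA()
  .setK(100)
  .setOptimizer(new EMLDAOptimizer())
  .setMaxIterations(20)       // iteration count dominates EM runtime
  .setCheckpointInterval(10)  // keeps the GraphX lineage short
  .setDocConcentration(-1)    // -1 lets MLlib pick the default priors
  .setTopicConcentration(-1)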

The log includes the following message, which, AFAIK, should mean
that netlib-java is using a machine-optimised native implementation:
"com.github.fommil.jni.JniLoader - successfully loaded
/tmp/jniloader4682745056459314976netlib-native_system-linux-x86_64.so"

The LDA model training proto code I'm using can be found here:
https://github.com/marko-asplund/tech-protos/blob/master/mllib-lda/src/main/scala/fi/markoa/proto/mllib/LDADemo.scala#L33-L47


marko


Re: How to speed up MLlib LDA?

2015-09-15 Thread Feynman Liang
Hi Marko,

I haven't looked into your case in much detail, but one immediate thought
is: have you tried the OnlineLDAOptimizer? Its implementation and the
resulting LDA model (LocalLDAModel) are quite different (it doesn't depend
on GraphX and assumes the model fits on a single machine), so you may see
performance differences.
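
A minimal sketch of that switch (parameter values are only examples;
corpus is the training RDD[(Long, Vector)], as EMLDAOptimizer consumes):

import org.apache.spark.mllib.clustering.{LDA, LocalLDAModel, OnlineLDAOptimizer}

// Train with the online variational optimizer instead of EM; the result
// is already a LocalLDAModel, so no GraphX state is kept around.
val onlineLda = new LDA()
  .setK(100)
  .setOptimizer(new OnlineLDAOptimizer().setMiniBatchFraction(0.05))
val localModel = onlineLda.run(corpus).asInstanceOf[LocalLDAModel]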

Feynman

On Tue, Sep 15, 2015 at 6:37 AM, Marko Asplund 
wrote:

>
> While doing some more testing I noticed that loading the persisted model
> from disk (~2 minutes) as well as querying LDA model topic distributions
> (~4 seconds for one document) are quite slow operations.
>
> Our application queries the LDA model topic distribution (for one doc at a
> time) as part of an end-user operation's execution flow, so a ~4 second
> execution time is very problematic. Am I using the MLlib LDA API correctly,
> or does this just reflect the current performance characteristics of the
> LDA implementation? My code can be found here:
>
>
> https://github.com/marko-asplund/tech-protos/blob/master/mllib-lda/src/main/scala/fi/markoa/proto/mllib/LDADemo.scala#L56-L57
>
> For what kinds of use cases are people currently using the LDA
> implementation?
>
>
> marko
>


Re: How to speed up MLlib LDA?

2015-09-15 Thread Marko Asplund
While doing some more testing I noticed that loading the persisted model
from disk (~2 minutes) as well as querying LDA model topic distributions
(~4 seconds for one document) are quite slow operations.

Our application queries the LDA model topic distribution (for one doc at a
time) as part of an end-user operation's execution flow, so a ~4 second
execution time is very problematic. Am I using the MLlib LDA API correctly,
or does this just reflect the current performance characteristics of the
LDA implementation? My code can be found here:

https://github.com/marko-asplund/tech-protos/blob/master/mllib-lda/src/main/scala/fi/markoa/proto/mllib/LDADemo.scala#L56-L57

For what kinds of use cases are people currently using the LDA
implementation?


marko