We added SPARK-3066 for this. In 1.4 you should get the code that does the
BLAS dgemm-based calculation.

On Thu, Jun 18, 2015 at 8:20 AM, Ayman Farahat <
ayman.fara...@yahoo.com.invalid> wrote:

> Thanks Sabarish and Nick
> Would you happen to have some code snippets that you can share.
> Best
> Ayman
>
> On Jun 17, 2015, at 10:35 PM, Sabarish Sasidharan <
> sabarish.sasidha...@manthan.com> wrote:
>
> Nick is right. I too have implemented this way and it works just fine. In
> my case, there can be even more products. You simply broadcast blocks of
> products to userFeatures.mapPartitions() and BLAS multiply in there to get
> recommendations. In my case 10K products form one block. Note that you
> would then have to union your recommendations. And if there are lots of product
> blocks, you might also want to checkpoint once every few times.
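A rough NumPy sketch of the per-block scoring and the later merge, in case it helps. It assumes each product block is a pair of (array of product ids, matrix of shape n_products x rank); the helper names block_top_k and merge_top_k are made up for illustration and are not part of MLlib.

```python
import heapq
import numpy as np

def block_top_k(user_vec, block_ids, block_matrix, k):
    """Top-k (score, product_id) pairs for one user within one product block."""
    # One matrix-vector product scores every product in the block at once.
    scores = block_matrix.dot(np.asarray(user_vec, dtype=np.float64))
    top = np.argsort(-scores)[:k]  # indices of the k highest scores
    return [(float(scores[i]), int(block_ids[i])) for i in top]

def merge_top_k(recs_a, recs_b, k):
    """Combine two per-block result lists and keep the k best overall."""
    return heapq.nlargest(k, recs_a + recs_b)

# Toy data (hypothetical values): two blocks of two products, rank 2.
block1 = (np.array([1, 2]), np.array([[1.0, 0.0], [0.0, 1.0]]))
block2 = (np.array([3, 4]), np.array([[2.0, 2.0], [0.5, 0.0]]))
user = [1.0, 3.0]

recs1 = block_top_k(user, block1[0], block1[1], k=2)
recs2 = block_top_k(user, block2[0], block2[1], k=2)
merged = merge_top_k(recs1, recs2, k=2)
```

In Spark terms, block_top_k would run inside userFeatures.mapPartitions() once per broadcast block, and merge_top_k is the kind of function you could pass to reduceByKey after unioning the per-block results.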
>
> Regards
> Sab
>
> On Thu, Jun 18, 2015 at 10:43 AM, Nick Pentreath <nick.pentre...@gmail.com
> > wrote:
>
>> One issue is that you broadcast the product vectors and then do a dot
>> product one-by-one with the user vector.
>>
>> You should try forming a matrix of the item vectors and doing the dot
>> product as a matrix-vector multiply which will make things a lot faster.
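A minimal NumPy sketch of that suggestion, assuming the broadcast value is a list of (product_id, feature_vector) pairs as in the original doMultiply code. The names stack_items and score_user are hypothetical, chosen only for this example.

```python
import numpy as np

def stack_items(product_features):
    """Split [(id, vector), ...] into an id array and an (n_items, rank) matrix."""
    ids = np.array([pid for pid, _ in product_features])
    matrix = np.array([vec for _, vec in product_features], dtype=np.float64)
    return ids, matrix

def score_user(item_matrix, user_vector):
    """Scores for all items for one user, as a single matrix-vector product."""
    return item_matrix.dot(np.asarray(user_vector, dtype=np.float64))

# Toy data standing in for pf.value (hypothetical values).
toy_products = [(10, [1.0, 0.0]), (20, [0.5, 0.5]), (30, [0.0, 2.0])]
item_ids, item_matrix = stack_items(toy_products)
scores = score_user(item_matrix, [2.0, 1.0])

# Same numbers as the one-by-one loop, but computed in one call.
loop_scores = [float(np.dot([2.0, 1.0], vec)) for _, vec in toy_products]
```

The idea is to build item_matrix once on the driver, broadcast that instead of the raw list, and call score_user from the map over userFeatures().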
>>
>> Another optimisation that is available in 1.4 is a recommendProducts
>> method that blockifies the factors to make use of level 3 BLAS (i.e.
>> matrix-matrix multiply). I am not sure if this is available in the Python
>> API yet.
>>
>> But you can do a version yourself by using mapPartitions over user
>> factors, blocking the factors into sub-matrices and doing matrix multiply
>> with item factor matrix to get scores on a block-by-block basis.
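One way the do-it-yourself version could look, sketched in plain NumPy so it can be tried without a cluster. It assumes the item factors are already stacked into an (n_items, rank) matrix and that each partition yields (user_id, feature_vector) pairs; score_block, score_partition and the block_size default are illustrative choices, not MLlib APIs.

```python
import numpy as np

def score_block(user_ids, user_vecs, item_matrix):
    """Score one block of users with a single matrix-matrix multiply."""
    user_matrix = np.asarray(user_vecs, dtype=np.float64)  # (block, rank)
    scores = user_matrix.dot(item_matrix.T)                # (block, n_items)
    return list(zip(user_ids, scores))

def score_partition(user_rows, item_matrix, block_size=4096):
    """Generator suitable for mapPartitions: yields (user_id, scores) pairs."""
    ids, vecs = [], []
    for user_id, vec in user_rows:
        ids.append(user_id)
        vecs.append(vec)
        if len(ids) == block_size:
            for pair in score_block(ids, vecs, item_matrix):
                yield pair
            ids, vecs = [], []
    if ids:  # leftover users that did not fill a whole block
        for pair in score_block(ids, vecs, item_matrix):
            yield pair

# Toy data (hypothetical values): 3 items and 3 users, rank 2.
item_matrix = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
users = [(1, [1.0, 2.0]), (2, [3.0, 4.0]), (3, [0.0, 1.0])]
result = {uid: s.tolist()
          for uid, s in score_partition(iter(users), item_matrix, block_size=2)}
```

With Spark this would be called as something like userFeatures().mapPartitions(lambda rows: score_partition(rows, bc.value)), where bc is a hypothetical broadcast of the item matrix. With 90 million users and 30,000 products the full score vectors are far too large to keep, so in practice you would reduce each row to its top few products before yielding it.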
>>
>> Also as Ilya says more parallelism can help. I don't think it's so
>> necessary to do LSH with 30,000 items.
>>
>>
>>
>> On Thu, Jun 18, 2015 at 6:01 AM, Ganelin, Ilya <
>> ilya.gane...@capitalone.com> wrote:
>>
>>> I actually talk about this exact thing in a blog post here
>>> http://blog.cloudera.com/blog/2015/05/working-with-apache-spark-or-how-i-learned-to-stop-worrying-and-love-the-shuffle/.
>>> Keep in mind, you're actually doing a ton of math. Even with proper caching
>>> and use of broadcast variables this will take a while depending on the size
>>> of your cluster. To get real results you may want to look into locality
>>> sensitive hashing to limit your search space and definitely look into
>>> spinning up multiple threads to process your product features in parallel
>>> to increase resource utilization on the cluster.
>>>
>>>
>>>
>>> Thank you,
>>> Ilya Ganelin
>>>
>>>
>>>
>>> -----Original Message-----
>>> *From: *afarahat [ayman.fara...@yahoo.com]
>>> *Sent: *Wednesday, June 17, 2015 11:16 PM Eastern Standard Time
>>> *To: *user@spark.apache.org
>>> *Subject: *Matrix Multiplication and mllib.recommendation
>>>
>>> Hello;
>>> I am trying to get predictions after running the ALS model.
>>> The model works fine. In the prediction/recommendation step, I have about
>>> 30,000 products and 90 million users.
>>> When I try predictAll, it fails.
>>> I have been trying to formulate the problem as a matrix multiplication
>>> where
>>> I first get the product features, broadcast them and then do a dot
>>> product.
>>> It's still very slow. Any idea why?
>>> Here is some sample code:
>>>
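>>> import numpy
>>> from pyspark.mllib.recommendation import MatrixFactorizationModel
>>>
>>> def doMultiply(x):
>>>     # Dot the user vector x with each broadcast product vector, one at a time.
>>>     a = []
>>>     for i in range(len(pf.value)):
>>>         myprod = numpy.dot(x, pf.value[i][1])
>>>         a.append(myprod)
>>>     return a
>>>
>>>
>>> # sc is the existing SparkContext.
>>> myModel = MatrixFactorizationModel.load(sc, "FlurryModelPath")
>>> # I need to select which products to broadcast, but let's try all
>>> # (the line below actually takes a 0.1% sample of the product features)
>>> m1 = myModel.productFeatures().sample(False, 0.001)
>>> pf = sc.broadcast(m1.collect())
>>> uf = myModel.userFeatures()
>>> f1 = uf.map(lambda x: (x[0], doMultiply(x[1])))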
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>
>
>
