Re: Matrix Multiplication and mllib.recommendation

Debasish Das Thu, 18 Jun 2015 08:41:40 -0700

Also, I am not sure how threading helps here, because Spark runs one
partition per core. There may be multiple threads on each core if you are
using Intel Hyper-Threading, but I would let Spark handle the threading.


On Thu, Jun 18, 2015 at 8:38 AM, Debasish Das <[email protected]>
wrote:

> We added SPARK-3066 for this. In 1.4 you should get the code to do BLAS
> dgemm based calculation.
>
> On Thu, Jun 18, 2015 at 8:20 AM, Ayman Farahat <
> [email protected]> wrote:
>
>> Thanks Sabarish and Nick
>> Would you happen to have some code snippets that you can share.
>> Best
>> Ayman
>>
>> On Jun 17, 2015, at 10:35 PM, Sabarish Sasidharan <
>> [email protected]> wrote:
>>
>> Nick is right. I have implemented it this way too and it works just fine. In
>> my case, there can be even more products. You simply broadcast blocks of
>> products to userFeatures.mapPartitions() and do a BLAS multiply in there to
>> get recommendations. In my case, 10K products form one block. Note that you
>> would then have to union your recommendations. And if there are lots of
>> product blocks, you might also want to checkpoint once every few blocks.
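
A minimal sketch of the block-and-merge idea described above, written in plain NumPy so it runs without a cluster. The helper names (product_blocks, recommend_for_block, merge_recommendations) and the toy data are hypothetical and not taken from anyone's actual implementation; the merge step stands in for the "union your recommendations" part.

```python
import heapq
import numpy as np

def product_blocks(item_ids, item_matrix, block_size=10000):
    """Split the product factors into consecutive blocks of block_size rows."""
    for start in range(0, len(item_ids), block_size):
        end = start + block_size
        yield item_ids[start:end], item_matrix[start:end]

def recommend_for_block(user_vec, block_ids, block_matrix, top_k):
    """Score one user against one product block; return top (score, id) pairs."""
    scores = block_matrix.dot(user_vec)  # one matrix-vector multiply per block
    return heapq.nlargest(top_k, zip(scores.tolist(), block_ids))

def merge_recommendations(recs_a, recs_b, top_k):
    """Combine two per-block result lists, keeping only the overall top_k."""
    return heapq.nlargest(top_k, recs_a + recs_b)

# Hypothetical toy data: 6 products with rank-1 factors, one user.
item_ids = list(range(6))
item_matrix = np.array([[0.1], [0.5], [0.3], [0.9], [0.2], [0.7]])
user_vec = np.array([1.0])

best = []
for block_ids, block_matrix in product_blocks(item_ids, item_matrix, block_size=2):
    block_recs = recommend_for_block(user_vec, block_ids, block_matrix, top_k=2)
    best = merge_recommendations(best, block_recs, top_k=2)

best_ids = [item_id for _, item_id in best]
```

On a cluster, each block would presumably be broadcast in turn and scored inside userFeatures.mapPartitions(), with the per-block results unioned and merged per user; that wiring is left out here.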
>>
>> Regards
>> Sab
>>
>> On Thu, Jun 18, 2015 at 10:43 AM, Nick Pentreath <
>> [email protected]> wrote:
>>
>>> One issue is that you broadcast the product vectors and then do a dot
>>> product one-by-one with the user vector.
>>>
>>> You should try forming a matrix of the item vectors and doing the dot
>>> product as a matrix-vector multiply which will make things a lot faster.
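
A small sketch of the difference between the two approaches, assuming NumPy. The function names and toy factors are hypothetical; the point is only that stacking the item vectors into one matrix replaces many small dot products with a single matrix-vector multiply.

```python
import numpy as np

def score_one_by_one(user_vec, item_vecs):
    # Baseline: one numpy.dot call per item vector.
    return [float(np.dot(user_vec, v)) for v in item_vecs]

def score_matvec(user_vec, item_matrix):
    # item_matrix has shape (num_items, rank); a single matrix-vector
    # product yields the scores of all items for this user.
    return item_matrix.dot(user_vec)

# Hypothetical toy data: 4 items with rank-3 factors.
rng = np.random.RandomState(0)
item_vecs = [rng.rand(3) for _ in range(4)]
item_matrix = np.vstack(item_vecs)  # build once, before broadcasting
user_vec = rng.rand(3)

slow = score_one_by_one(user_vec, item_vecs)
fast = score_matvec(user_vec, item_matrix)
```

Both paths give the same scores; the matrix would be built once on the driver and broadcast instead of the list of vectors.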
>>>
>>> Another optimisation that is available in 1.4 is a recommendProducts
>>> method that blockifies the factors to make use of level 3 BLAS (i.e.
>>> matrix-matrix multiply). I am not sure if this is available in the Python
>>> API yet.
>>>
>>> But you can do a version yourself by using mapPartitions over user
>>> factors, blocking the factors into sub-matrices and doing matrix multiply
>>> with item factor matrix to get scores on a block-by-block basis.
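
A rough sketch of what such a do-it-yourself version could look like. The function score_partition, its parameters, and the toy data are hypothetical; it takes an iterator of (user_id, factor_vector) pairs, which is the shape mapPartitions would hand it for a user-factors RDD, and scores users a block at a time with one matrix-matrix multiply per block.

```python
import numpy as np
from itertools import islice

def score_partition(user_iter, item_ids, item_matrix, block_size=1024, top_k=10):
    """Yield (user_id, [(item_id, score), ...]) for every user in a partition."""
    user_iter = iter(user_iter)
    while True:
        block = list(islice(user_iter, block_size))
        if not block:
            break
        user_ids = [uid for uid, _ in block]
        user_matrix = np.array([vec for _, vec in block])  # (block, rank)
        scores = user_matrix.dot(item_matrix.T)            # (block, num_items)
        k = min(top_k, scores.shape[1])
        top = np.argsort(-scores, axis=1)[:, :k]           # best items per row
        for row, uid in enumerate(user_ids):
            yield uid, [(item_ids[j], float(scores[row, j])) for j in top[row]]

# Hypothetical toy data: 3 items and 3 users with rank-2 factors.
item_ids = [10, 20, 30]
item_matrix = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
users = [(1, [1.0, 0.0]), (2, [0.0, 2.0]), (3, [2.0, 1.0])]

recs = dict(score_partition(iter(users), item_ids, item_matrix,
                            block_size=2, top_k=2))
```

With Spark, this would presumably be called as something like uf.mapPartitions(lambda it: score_partition(it, ids_bc.value, mat_bc.value)), where ids_bc and mat_bc are assumed broadcast variables holding the item ids and the item factor matrix.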
>>>
>>> Also as Ilya says more parallelism can help. I don't think it's so
>>> necessary to do LSH with 30,000 items.
>>>
>>>
>>>
>>> On Thu, Jun 18, 2015 at 6:01 AM, Ganelin, Ilya <
>>> [email protected]> wrote:
>>>
>>>> I actually talk about this exact thing in a blog post here:
>>>> http://blog.cloudera.com/blog/2015/05/working-with-apache-spark-or-how-i-learned-to-stop-worrying-and-love-the-shuffle/.
>>>> Keep in mind, you're actually doing a ton of math. Even with proper caching
>>>> and use of broadcast variables, this will take a while, depending on the size
>>>> of your cluster. To get real results, you may want to look into
>>>> locality-sensitive hashing to limit your search space, and definitely look
>>>> into spinning up multiple threads to process your product features in
>>>> parallel to increase resource utilization on the cluster.
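
To put a number on "a ton of math", here is a back-of-the-envelope calculation using the sizes from the original question (30,000 products, 90 million users). The rank of 50 is purely an assumption for illustration, since the thread does not state it, and the storage estimate assumes every score is kept as an 8-byte double.

```python
num_products = 30000
num_users = 90000000
rank = 50  # assumption: the actual ALS rank is not given in the thread

# One score per (user, product) pair.
num_scores = num_products * num_users

# Each score is a dot product of two rank-length vectors:
# rank multiplications plus rank additions.
flops = 2 * rank * num_scores

# Bytes needed to materialise every score as a double.
bytes_full = 8 * num_scores

print("scores: %.2e" % num_scores)
print("flops:  %.2e" % flops)
print("bytes:  %.2e" % bytes_full)
```

That is 2.7 trillion scores, which suggests keeping only a top-k list per user rather than materialising the full score matrix.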
>>>>
>>>>
>>>>
>>>> Thank you,
>>>> Ilya Ganelin
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> *From: *afarahat [[email protected]]
>>>> *Sent: *Wednesday, June 17, 2015 11:16 PM Eastern Standard Time
>>>> *To: *[email protected]
>>>> *Subject: *Matrix Multiplication and mllib.recommendation
>>>>
>>>> Hello,
>>>> I am trying to get predictions after running the ALS model.
>>>> The model works fine. In the prediction/recommendation step, I have about
>>>> 30,000 products and 90 million users.
>>>> When I try predictAll, it fails.
>>>> I have been trying to formulate the problem as a matrix multiplication,
>>>> where I first get the product features, broadcast them, and then do a dot
>>>> product.
>>>> It's still very slow. Any reason why?
>>>> Here is some sample code:
>>>>
>>>> import numpy
>>>> from pyspark.mllib.recommendation import MatrixFactorizationModel
>>>>
>>>> def doMultiply(x):
>>>>     # Dot the user vector x with each broadcast product vector in turn.
>>>>     a = []
>>>>     mylen = len(pf.value)
>>>>     for i in range(mylen):
>>>>         # pf.value[i] is a (product_id, feature_vector) pair
>>>>         myprod = numpy.dot(x, pf.value[i][1])
>>>>         a.append(myprod)
>>>>     return a
>>>>
>>>>
>>>> # sc is the active SparkContext
>>>> myModel = MatrixFactorizationModel.load(sc, "FlurryModelPath")
>>>> # I need to select which products to broadcast, but let's try all
>>>> m1 = myModel.productFeatures().sample(False, 0.001)
>>>> pf = sc.broadcast(m1.collect())
>>>> uf = myModel.userFeatures()
>>>> f1 = uf.map(lambda x: (x[0], doMultiply(x[1])))
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>> --
>>
>> Architect - Big Data
>> Ph: +91 99805 99458
>>
>> Manthan Systems | *Company of the year - Analytics (2014 Frost and
>> Sullivan India ICT)*
>>
>>
>>
>
