Hi Pat,
Thanks again, spark-1.1.0 works without complications and the errors have
gone. But there is still an out-of-memory problem. The error occurs when
Spark is trying to write a broadcast variable to disk. I tried to give each
executor 25g of memory but the same error occurs again. Also, I
The data structure is a HashBiMap from Guava. Yes, they could be replaced
with joins, but that adds some extra complexity: the code would have to
replace each HashBiMap with some RDD-backed collection. But if there is
memory available, perhaps something else is causing the error. Let’s think this
Hi Pat Ferrel,
Using the option --omitStrength outputs indexable data, but this leads to
less accuracy while querying, because the similarity values between items are
omitted. Can these values be kept in order to improve accuracy in a search
engine?
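For reference, the flag being asked about is passed on the spark-itemsimilarity command line; a sketch of an invocation (the paths here are illustrative, only the --omitStrength flag comes from this thread):

```shell
# --omitStrength drops the LLR strength values from the output rows so the
# indicators can be indexed as plain text by a search engine.
# Paths are made up; check `mahout spark-itemsimilarity --help` for all flags.
mahout spark-itemsimilarity \
  --input /path/to/actions.tsv \
  --output /path/to/indicators \
  --omitStrength
```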
On 23 December 2014 at 02:17, Pat Ferrel p...@occamsmachete.com
@Pat, thanks for your answers. It seems that I had cloned the snapshot
before the feature of configuring Spark was added. It works now in
local mode. Unfortunately, after trying the new snapshot and Spark,
submitting to the cluster in yarn-client mode raises the following error:
Exception in
@Pat, I am aware of your blog and of Ted’s practical machine learning books
and webinars. I have learned a lot from you guys ;)
@Ted, it is a small 3-node cluster for a POC. Each Spark executor is given 2g
and YARN is configured accordingly. I am trying to avoid Spark memory caching.
@Simon, I am using
On Tue, Dec 23, 2014 at 7:39 AM, AlShater, Hani halsha...@souq.com wrote:
Have you tried the map-reduce version?
Why do you say it will lead to less accuracy?
The weights are LLR weights, and they are used to filter and downsample the
indicator matrix. Once the downsampling is done they are not needed. When you
index the indicators in a search engine they will get TF-IDF weights, and this
is a good effect.
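For context (this formula is not spelled out in the thread; it is Dunning’s G² log-likelihood ratio test, which is what Mahout’s LLR is based on): the score for an item pair is computed from the 2x2 contingency table of cooccurrence counts,

    G^2 = 2 \sum_{i,j} k_{ij} \ln\left( \frac{k_{ij}}{E_{ij}} \right)

where the k_{ij} are the observed counts (both items together, item A alone, item B alone, neither) and E_{ij} are the counts expected if the two items occurred independently. A large G^2 means the cooccurrence is anomalous, which is why the score is useful for filtering and downsampling but not needed afterward.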
Both errors happen when the Spark Context is created using Yarn. I have no
experience with Yarn and so would try it in standalone clustered mode first.
Then, if all is well, check this page to make sure the Spark cluster is
configured correctly for Yarn.
Thank you for your explanation.
There is a situation I'm not clear on. Say I have this item similarity
result:

iphone   nexus:1   ipad:10
surface  nexus:10  ipad:1   galaxy:1
If we omit the LLR weights, then, given a user A with the purchase history
'nexus', which one should the recommendation engine prefer, the iphone or the
surface?
There is a large-ish data structure in the Spark version of this algorithm.
Each slave has a copy of several BiMaps that handle translation of your IDs
into and out of Mahout IDs. One of these is created for user IDs, and one for
each item ID set. For a single action that would be 2 BiMaps.
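To illustrate what those BiMaps do, here is a minimal sketch in plain Scala. This is not Mahout's actual class (Mahout uses Guava's HashBiMap); two ordinary Maps stand in for the single bidirectional structure:

```scala
// Hypothetical sketch of an ID dictionary: external string IDs are assigned
// dense integer indices (Mahout IDs), with lookup in both directions.
// Guava's HashBiMap provides this in one structure; two Maps illustrate it.
case class IdDictionary(toIndex: Map[String, Int], fromIndex: Map[Int, String]) {
  def index(id: String): Int = toIndex(id)      // external ID -> Mahout index
  def id(index: Int): String = fromIndex(index) // Mahout index -> external ID
}

object IdDictionary {
  def fromIds(ids: Seq[String]): IdDictionary = {
    // distinct preserves first-seen order, so indices are dense: 0, 1, 2, ...
    val toIndex = ids.distinct.zipWithIndex.toMap
    IdDictionary(toIndex, toIndex.map(_.swap))
  }
}
```

One such dictionary exists per ID space (one for users, one per item ID set), and a copy is broadcast to every slave, which is why a large ID cardinality inflates the broadcast memory footprint.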
On Tue, Dec 23, 2014 at 9:16 AM, Pat Ferrel p...@occamsmachete.com wrote:
To use the Hadoop MapReduce version (Ted’s suggestion) you’ll lose the
cross-cooccurrence indicators and you’ll have to translate your IDs into
Mahout IDs. This means mapping user and item IDs from your values into
First of all you need to index the indicator matrix with a search engine. Then
the query will be your user’s history. The search engine weights with TF-IDF
and the query is based on cosine similarity of doc to query terms. So the
weights won’t be the ones you have below, they will be TF-IDF weights.
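As a hypothetical illustration of that query step (the thread names no particular engine; Solr, the collection name "indicators", and the field name "purchase" are all assumptions here):

```shell
# Hypothetical: the user's purchase history becomes the query terms.
# The engine's TF-IDF scoring replaces the LLR values, which were only
# used to downsample the indicator matrix before indexing.
curl 'http://localhost:8983/solr/indicators/select' \
  --data-urlencode 'q=purchase:(nexus ipad)' \
  --data-urlencode 'fl=id,score'
```

The top-scoring documents (items) are the recommendations for that user.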
The job has an option -sem to set the spark.executor.memory config. Also you
can change runtime job config with -D:key=value to access any of the Spark
config values.
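Putting those two options together, an invocation might look like the following sketch (the paths and the specific Spark config key are illustrative; only -sem and the -D:key=value form come from this thread):

```shell
# -sem sets spark.executor.memory; -D:key=value passes any Spark config.
# Paths and values are made up for illustration.
mahout spark-itemsimilarity \
  --input /path/to/actions.tsv \
  --output /path/to/indicators \
  -sem 4g \
  -D:spark.yarn.executor.memoryOverhead=1024
```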
On Dec 21, 2014, at 11:44 PM, AlShater, Hani halsha...@souq.com wrote:
Can you say what kind of cluster you have?
How many machines? How much memory? How much memory is given to Spark?
On Sun, Dec 21, 2014 at 11:44 PM, AlShater, Hani halsha...@souq.com wrote:
Hi Hani,
I recently read about Souq.com. A very promising project.
If you are looking at spark-itemsimilarity for ecommerce-type
recommendations, you may be interested in some slide decks and blog posts I’ve
done on the subject.
Check out:
Also Ted has an ebook you can download:
mapr.com/practical-machine-learning
On Dec 22, 2014, at 10:52 AM, Pat Ferrel p...@occamsmachete.com wrote:
Hi All,
I am trying to use spark-itemsimilarity on 160M user interactions dataset.
The job launches and runs successfully for small data (1M actions).
However, when trying the larger dataset, some Spark stages continuously
fail with an out-of-memory exception.
I tried to change the