")
gregconvdf = gregconv.map(lambda x: ConvRecord(*x)).toDF()
I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/homes/afarahat/aofspark/share/spark/python/pyspark/sql/context.py",
line 60, in toDF
return sqlContext.create
Hello;
I am using the ALS recommender in MLlib. To select the optimal rank, I have
a number of users who used multiple items as my test set. I then get the
predictions for these users, compare them to the observed values, and use
RegressionMetrics to estimate the R^2.
I keep getting a negative value.
Any guidance on how to set these two?
I have way more users (100s of millions) than items.
Thanks
Ayman
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/ALS-how-to-set-numUserBlocks-and-numItemBlocks-tp23503.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
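For reference, RegressionMetrics computes R^2 as 1 - SS_res/SS_tot, so it goes negative whenever the predictions do worse than simply predicting the mean of the observed values; that usually points at the model or the train/test setup rather than at the metric. A plain-Python illustration with made-up numbers:

```python
# Made-up ratings where the predictions are worse than the mean predictor.
observed  = [4.0, 1.0, 5.0, 2.0]
predicted = [1.0, 4.0, 2.0, 5.0]

mean = sum(observed) / len(observed)
ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # model error
ss_tot = sum((o - mean) ** 2 for o in observed)                  # mean-predictor error
r2 = 1 - ss_res / ss_tot
print(r2)  # -2.6: negative because ss_res > ss_tot
```

If the model is mispredicting this badly, eyeballing a few raw (observed, predicted) pairs is usually more informative than the aggregate R^2.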
I have a simple HQL query (below). In Hive it takes maybe 10 minutes to
complete; when I run it with Spark it seems to take forever. The table is
partitioned by "datestamp". I am using Spark 1.3.1.
How can I tune/optimize this?
Here is the query:
tumblruser=hiveCtx.sql(" select s_mobile_id, receive_time
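One thing worth checking first: if no predicate on the partition column reaches the reader, Spark scans every partition of the table. Since the query above is truncated, this is only a hedged sketch; the table name and datestamp value are placeholders:

```python
# Hypothetical sketch: table name and datestamp value are placeholders,
# since the original query is cut off above.
query = """
    SELECT s_mobile_id, receive_time
    FROM some_partitioned_table
    WHERE datestamp = '2015-06-01'  -- predicate on the partition column
"""
# With the partition filter in place, only the matching partition is read:
# tumblruser = hiveCtx.sql(query)
```

Comparing the number of HDFS files/tasks the Spark job launches with and without such a filter is a quick way to confirm whether pruning is the issue.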
Hello;
I am fitting ALS models and would like to get an initial idea of the number
of factors. I want to use the reconstruction error on the training data as a
measure. Does the API expose the reconstruction error?
Thanks
Ayman
--
View this message in context:
http://apache-spark-user-list.1001560.n
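As far as I know, MLlib's MatrixFactorizationModel does not expose the training objective directly, so the usual workaround is to score the training pairs with predictAll and compute the RMSE yourself (note this measures only the squared-error term, not the regularized ALS objective). A sketch, assuming `ratings` is the training RDD of Rating objects:

```python
def train_rmse(model, ratings):
    """Approximate reconstruction error as RMSE over the training pairs.

    model: a pyspark.mllib.recommendation.MatrixFactorizationModel
    ratings: the RDD of Rating(user, product, rating) used for training
    Sketch only; this omits the regularization term of the ALS objective.
    """
    pairs = ratings.map(lambda r: (r.user, r.product))
    preds = model.predictAll(pairs).map(lambda r: ((r.user, r.product), r.rating))
    truth = ratings.map(lambda r: ((r.user, r.product), r.rating))
    sq_err = truth.join(preds).map(lambda kv: (kv[1][0] - kv[1][1]) ** 2)
    return sq_err.mean() ** 0.5
```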
Hello;
I am trying to find the optimal number of factors in ALS. To that end, I am
scanning various values and evaluating the RSE. Do I need to unpersist the
RDD between loops, or will the resources (memory) be freed and re-assigned
automatically between iterations?
for i in range(5):
ra
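On the unpersist question: rebinding the Python variable on the next loop iteration does not free the factor RDDs that ALS caches on the executors, so it is safer to unpersist them explicitly before the next fit. A hedged sketch (names are illustrative; `ratings` is the training RDD, and the RSE evaluation is elided):

```python
def scan_ranks(ratings, ranks=(5, 10, 20), iterations=10):
    """Fit ALS at several ranks, releasing each model's cached factor
    RDDs before the next fit so executor memory does not accumulate.
    Sketch only."""
    from pyspark.mllib.recommendation import ALS  # deferred import
    for rank in ranks:
        model = ALS.train(ratings, rank, iterations)
        # ... evaluate the RSE for this rank here ...
        # explicitly release the cached user/product factor RDDs
        model.userFeatures().unpersist()
        model.productFeatures().unpersist()
```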
Hello;
I am trying to get predictions after running the ALS model.
The model trains fine. For the prediction/recommendation step, I have about
30,000 products and 90 million users.
When I try predictAll, it fails.
I have been trying to formulate the problem as a matrix multiplication where
I fi
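On the matrix-multiplication framing (truncated above): with only ~30,000 items, the item-factor matrix (30,000 x rank) is small enough to broadcast, so each partition of user factors can be scored against all items locally with no shuffle of the user side. A numpy sketch of the per-partition kernel (shapes and names are illustrative):

```python
import numpy as np

def score_user_batch(user_factors, item_factors, top_n=10):
    """user_factors: (batch, k) rows for one partition of users;
    item_factors: (n_items, k), small enough to broadcast everywhere.
    Returns the indices of each user's top_n items. Sketch only."""
    scores = user_factors @ item_factors.T          # (batch, n_items)
    top = np.argsort(-scores, axis=1)[:, :top_n]    # best items per user
    return top
```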
Hello
I would like to multiply two matrices:
C = A * B
A is m x k, B is k x l, with
k, l << m, so that B can easily fit in memory.
Any ideas or suggestions on how to do that in PySpark?
Thanks
Ayman
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Pyspark-Den
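Since B fits in memory, a common pattern is to broadcast B and multiply each row of A locally, so the large A never needs to be shuffled. A sketch, assuming A is stored as an RDD of (row_index, row_vector) pairs (an assumption; the message does not say how A is represented):

```python
import numpy as np

def multiply_row(row, B):
    """Local kernel: one row of A times the full in-memory B."""
    return np.asarray(row).dot(B)

def multiply_by_broadcast(A_rdd, B, sc):
    """C = A * B with k, l << m: ship B once per executor via broadcast,
    then apply the row kernel without shuffling A. Sketch only;
    A_rdd is an RDD of (row_index, length-k vector) pairs."""
    bB = sc.broadcast(B)
    return A_rdd.mapValues(lambda row: multiply_row(row, bB.value))
```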
Hello;
I have a data set of about 80 million users and 12,000 items (very sparse).
I can get the training part working with no problem (the model has 20
factors). However, when I try using predictAll for 80 million users x 10
items, the job does not complete.
When I use a smaller data set, say 500k or a mi
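One way to get past a predictAll job that never completes is to split the user set into chunks and run several smaller predictAll jobs, so no single job has to materialize all 80M x 10 score pairs at once. A hedged sketch (argument types are simplified for illustration):

```python
def predict_in_chunks(sc, model, user_ids, product_ids, chunk=1000000):
    """Run model.predictAll over user chunks instead of one giant job.

    user_ids/product_ids are plain Python lists here for illustration; at
    this scale the chunking idea is what matters, not these exact types.
    """
    results = []
    for start in range(0, len(user_ids), chunk):
        batch = sc.parallelize(user_ids[start:start + chunk])
        pairs = batch.cartesian(sc.parallelize(product_ids))  # (user, product)
        results.append(model.predictAll(pairs))
    return results
```

Writing each chunk's result out before starting the next keeps the driver and executors from holding everything at once.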