Re: Scientific articles or a Mathematical description

2018-04-23 Thread GMAIL
Thank you, I will study them :)

2018-04-23 19:41 GMT+03:00 Andrew Troemner <atroem...@salesforce.com>:

> Hi,
>
> PredictionIO serves as a framework for other numerical-processing
> libraries, primarily Spark's MLlib. You can read more in the documentation
> on the Spark website <https://spark.apache.org/docs/latest/ml-guide.html>.
>
> The library has many tools for classification, regression, collaborative
> filtering, NLP, and other related tasks, so it's probably easier to
> describe the broad task you want to accomplish and then find the specific
> Spark / Java / Scala implementation APIs.
>
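For illustration, a minimal sketch of what calling one of those MLlib APIs
looks like in Scala (the data path and parameter values are placeholders,
not anything from this thread):

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("mllib-example").getOrCreate()

    // Load a sample dataset in LIBSVM format ("label" and "features"
    // columns); the path is a placeholder.
    val training = spark.read.format("libsvm").load("data/sample_libsvm_data.txt")

    // Fit a simple classifier; other tasks (regression, ALS, clustering)
    // follow the same Estimator/Transformer pattern.
    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
    val model = lr.fit(training)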
> If you want to learn more about various statistical techniques, I would
> recommend a general statistics book that describes many common techniques.
> I find The Elements of Statistical Learning
> <https://web.stanford.edu/~hastie/ElemStatLearn/> by Hastie, Tibshirani,
> and Friedman (published by Springer) to be particularly useful to that
> end. There are also several books about scikit-learn that describe the
> variety of statistical algorithms available to data scientists very well.
>
> Hope that helps!
>
>
> *ANDREW TROEMNER*
> Associate Principal Data Scientist | salesforce.com
> Office: 317.832.4404
> Mobile: 317.531.0216
>
> On Mon, Apr 23, 2018 at 12:32 PM, GMAIL <babaevka...@gmail.com> wrote:
>
>> Hello.
>> Is it possible to find scientific articles or a mathematical description
>> of how PredictionIO works anywhere?
>> I could not find anything on predictionio.apache.org except for a brief
>> description of its working principles and the deployment documentation.
>>
>
>


Scientific articles or a Mathematical description

2018-04-23 Thread GMAIL
Hello.
Is it possible to find scientific articles or a mathematical description of
how PredictionIO works anywhere?
I could not find anything on predictionio.apache.org except for a brief
description of its working principles and the deployment documentation.


Re: Recommendation returns scores greater than 5

2017-12-22 Thread GMAIL
But I strictly followed the instructions from the site and did not change
anything at all. Everything I did came from the steps on this page; I
performed no additional operations and did not edit the source code.

Instructions (Quick Start - Recommendation Engine Template):
http://predictionio.incubator.apache.org/templates/recommendation/quickstart/

2017-12-22 22:12 GMT+03:00 Pat Ferrel <p...@occamsmachete.com>:

> Implicit means you assign a score to the event based on your own guess.
> Explicit uses ratings the user makes. One score is a guess by you (like a
> 4 for a buy) and the other is a rating made by the user. ALS comes in two
> flavors: one for explicit scoring, used to predict ratings, and one for
> implicit scoring, used to predict something the user will prefer.
>
> Make sure your template is using the explicit version of ALS.
> https://spark.apache.org/docs/2.2.0/ml-collaborative-filtering.html#explicit-vs-implicit-feedback
>
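To make the two flavors concrete, here is a minimal sketch using the
spark.ml ALS API (column names and hyperparameter values are illustrative
assumptions, not the template's actual configuration):

    import org.apache.spark.ml.recommendation.ALS

    // Explicit feedback: input values are real user ratings, and the model
    // predicts ratings on the same conceptual scale.
    val explicitAls = new ALS()
      .setUserCol("user").setItemCol("item").setRatingCol("rating")
      .setRank(10).setMaxIter(10).setRegParam(0.1)
      .setImplicitPrefs(false) // the default

    // Implicit feedback: input values are confidence weights you assign
    // yourself (e.g. 4 for a buy); predictions are preference scores,
    // not ratings.
    val implicitAls = new ALS()
      .setUserCol("user").setItemCol("item").setRatingCol("rating")
      .setRank(10).setMaxIter(10).setRegParam(0.1)
      .setImplicitPrefs(true)
      .setAlpha(1.0) // confidence scaling used only in implicit mode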
>
> On Dec 21, 2017, at 11:09 PM, GMAIL <babaevka...@gmail.com> wrote:
>
> I wanted to use the Recommender because I expected that it could predict
> scores the way MovieLens does. And it seems to be doing so, but for some
> reason the input and output scales are different: the imported scores
> range from 1 to 5, while the predicted ones range from 1 to 10.
>
> If by implicit scores you mean events without parameters, then I am aware
> that in essence there is also a score there. I looked at the DataSource in
> the Recommender and there were only two events: rate and buy. Rate takes a
> score, and buy implicitly sets the rating to 4 (out of 5, I think).
>
> And I still do not understand exactly where I should look and what I
> should correct so that the incoming and predicted ratings are on the same
> scale.
>
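The rate/buy mapping described above looks roughly like this in the
Recommendation template's DataSource (a paraphrased sketch from memory, not
the verbatim source):

    import org.apache.predictionio.data.storage.Event

    // Paraphrased sketch of the template's event-to-rating mapping:
    // "rate" carries an explicit 1-5 rating, while "buy" is mapped to a
    // fixed rating of 4.
    def eventToRating(event: Event): Double = event.event match {
      case "rate" => event.properties.get[Double]("rating")
      case "buy"  => 4.0
      case _      => throw new Exception(s"Unexpected event ${event.event}")
    }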
> 2017-12-19 4:10 GMT+03:00 Pat Ferrel <p...@occamsmachete.com>:
>
>> There are two types of MLlib ALS recommenders, last I checked: implicit
>> and explicit. With the implicit one you assign any arbitrary score, like
>> a 1 for a purchase. With the explicit one you input ratings, and it is
>> expected to predict ratings for an individual. But IIRC both also have a
>> regularization parameter that affects the scoring, so you have to
>> experiment with it using cross-validation to see where you get the best
>> results.
>>
>> There is an old metric used for this type of thing called RMSE
>> (root-mean-square error) which, when minimized, will give you scores
>> that most closely match the actual scores in the hold-out set (see
>> Wikipedia on cross-validation and RMSE). You may have to use explicit
>> ALS and tweak the regularization param to get the lowest RMSE. I doubt
>> anything will guarantee the predictions to be in exactly the range of
>> the ratings, so you'll then need to pick the closest rating.
>>
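As a concrete version of that tuning loop, here is a minimal spark.ml
sketch that picks the regularization value with the lowest hold-out RMSE
(column names and the candidate grid are assumptions, not from the thread):

    import org.apache.spark.ml.evaluation.RegressionEvaluator
    import org.apache.spark.ml.recommendation.ALS
    import org.apache.spark.sql.DataFrame

    // RMSE = sqrt(mean((predicted - actual)^2)) over the hold-out set;
    // keep the regParam value that minimizes it.
    def bestRegParam(ratings: DataFrame): (Double, Double) = {
      val Array(train, test) = ratings.randomSplit(Array(0.8, 0.2), seed = 42L)
      val evaluator = new RegressionEvaluator()
        .setMetricName("rmse")
        .setLabelCol("rating")
        .setPredictionCol("prediction")
      Seq(0.01, 0.05, 0.1, 0.5).map { reg =>
        val model = new ALS()
          .setUserCol("user").setItemCol("item").setRatingCol("rating")
          .setRegParam(reg)
          .setColdStartStrategy("drop") // drop NaN predictions for unseen ids
          .fit(train)
        (reg, evaluator.evaluate(model.transform(test)))
      }.minBy(_._2) // returns (regParam, rmse)
    }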
>>
>> On Dec 18, 2017, at 10:42 AM, GMAIL <babaevka...@gmail.com> wrote:
>>
>> That is, the predicted scores that the Recommender returns are not
>> simply the ratings multiplied by two, and may be completely wrong?
>> I cannot, say, just divide the predictions by 2 and pretend that
>> everything is fine?
>>
>> 2017-12-18 21:35 GMT+03:00 Pat Ferrel <p...@occamsmachete.com>:
>>
>>> The UR and the Recommendations Template use very different technology
>>> underneath.
>>>
>>> In general, the scores you get from recommenders are meaningless on
>>> their own. "Matrix factorization" recommenders like the ones in MLlib,
>>> on which the Recommendations Template is based, use ratings as numerical
>>> values and need a regularization parameter. I don't know for sure, but
>>> maybe this is why the results don't come in the range of the input
>>> ratings. I haven't looked at the code in a long while.
>>>
>>> If you are asking about the UR: it does not take numeric ratings, and
>>> its scores cannot be compared to them.
>>>
>>> For many reasons that I have written about before, I always warn people
>>> about using ratings, which have been discontinued as a source of input
>>> by Netflix (who have removed them from their UX) and many other top
>>> recommender users. There are many reasons for this, not the least of
>>> which is that ratings are ambiguous and don't directly relate to whether
>>> a user might like an item. For instance, most video sources now use
>>> something like the length of time a user watches a video, and review
>>> sites prefer "like" and "dislike". The first is implicit and the second
>>> is quite unambiguous.
>>>
>>>
>>> On Dec 18, 2017, at 12:32 AM, GMAIL <babaevka...@gmail.com> wrote:
>>>
>>> Does it seem to me, or does the UR differ strongly from the Recommender?
>>> At least I can't find the method getRatings in the class DataSource,
>>> which contains all the events, in particular "rate", that I needed.

Re: Recommendation returns scores greater than 5

2017-12-21 Thread GMAIL
I wanted to use the Recommender because I expected that it could predict
scores the way MovieLens does. And it seems to be doing so, but for some
reason the input and output scales are different: the imported scores range
from 1 to 5, while the predicted ones range from 1 to 10.

If by implicit scores you mean events without parameters, then I am aware
that in essence there is also a score there. I looked at the DataSource in
the Recommender and there were only two events: rate and buy. Rate takes a
score, and buy implicitly sets the rating to 4 (out of 5, I think).

And I still do not understand exactly where I should look and what I should
correct so that the incoming and predicted ratings are on the same scale.

2017-12-19 4:10 GMT+03:00 Pat Ferrel <p...@occamsmachete.com>:

> There are two types of MLlib ALS recommenders, last I checked: implicit
> and explicit. With the implicit one you assign any arbitrary score, like a
> 1 for a purchase. With the explicit one you input ratings, and it is
> expected to predict ratings for an individual. But IIRC both also have a
> regularization parameter that affects the scoring, so you have to
> experiment with it using cross-validation to see where you get the best
> results.
>
> There is an old metric used for this type of thing called RMSE
> (root-mean-square error) which, when minimized, will give you scores that
> most closely match the actual scores in the hold-out set (see Wikipedia on
> cross-validation and RMSE). You may have to use explicit ALS and tweak the
> regularization param to get the lowest RMSE. I doubt anything will
> guarantee the predictions to be in exactly the range of the ratings, so
> you'll then need to pick the closest rating.
>
>
> On Dec 18, 2017, at 10:42 AM, GMAIL <babaevka...@gmail.com> wrote:
>
> That is, the predicted scores that the Recommender returns are not simply
> the ratings multiplied by two, and may be completely wrong?
> I cannot, say, just divide the predictions by 2 and pretend that
> everything is fine?
>
> 2017-12-18 21:35 GMT+03:00 Pat Ferrel <p...@occamsmachete.com>:
>
>> The UR and the Recommendations Template use very different technology
>> underneath.
>>
>> In general, the scores you get from recommenders are meaningless on their
>> own. "Matrix factorization" recommenders like the ones in MLlib, on which
>> the Recommendations Template is based, use ratings as numerical values
>> and need a regularization parameter. I don't know for sure, but maybe
>> this is why the results don't come in the range of the input ratings. I
>> haven't looked at the code in a long while.
>>
>> If you are asking about the UR: it does not take numeric ratings, and its
>> scores cannot be compared to them.
>>
>> For many reasons that I have written about before, I always warn people
>> about using ratings, which have been discontinued as a source of input by
>> Netflix (who have removed them from their UX) and many other top
>> recommender users. There are many reasons for this, not the least of
>> which is that ratings are ambiguous and don't directly relate to whether
>> a user might like an item. For instance, most video sources now use
>> something like the length of time a user watches a video, and review
>> sites prefer "like" and "dislike". The first is implicit and the second
>> is quite unambiguous.
>>
>>
>> On Dec 18, 2017, at 12:32 AM, GMAIL <babaevka...@gmail.com> wrote:
>>
>> Does it seem to me, or does the UR differ strongly from the Recommender?
>> At least I can't find the method getRatings in the class DataSource,
>> which contains all the events, in particular "rate", that I needed.
>>
>> 2017-12-18 11:14 GMT+03:00 Noelia Osés Fernández <no...@vicomtech.org>:
>>
>>> I didn't solve the problem :(
>>>
>>> Now I use the universal recommender
>>>
>>> On 18 December 2017 at 09:12, GMAIL <babaevka...@gmail.com> wrote:
>>>
>>>> And how did you solve this problem? Did you divide the prediction
>>>> score by 2?
>>>>
>>>> 2017-12-18 10:40 GMT+03:00 Noelia Osés Fernández <no...@vicomtech.org>:
>>>>
>>>>> I got the same problem. I still don't know the answer to your question
>>>>> :(
>>>>>
>>>>> On 17 December 2017 at 14:07, GMAIL <babaevka...@gmail.com> wrote:
>>>>>
>>>>>> I thought there was a 5-point scale, but if so, why do I get
>>>>>> predictions of 7, 8, etc.?
>>>>>>
>>>>>> P.S. Sorry for my English.
>>>>>>
>>>>>> 2017-12-17 16:05 GMT+03:00 GMAIL <babaevka...@gmail.com

SparkUncaughtExceptionHandler - ... java.lang.StackOverflowError

2017-12-16 Thread GMAIL
Hi.
Every time I try to start "pio train", I get this error:
https://pastebin.com/ADaM0q1z
When I ran it in VirtualBox on my laptop, I thought it was due to a lack of
resources, but now the server has 16 GB of RAM and runs on AWS (r4.large).
What could be the problem?

I did everything according to the instructions from here:
Installing Apache PredictionIO from Source Code

Quick Start - Recommendation Engine Template



P.S. Yes, I know that the AWS Marketplace has PredictionIO, but my task is
to build it myself. And I don't have access to the AWS console, only to the
Ubuntu server through SSH.
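
A StackOverflowError during "pio train" is often caused by the very deep
recursion that iterative ALS jobs can produce (long RDD lineage), rather
than by a shortage of RAM, so two things worth trying, assuming the stock
Recommendation template: lower numIterations in engine.json, and give the
JVM threads a larger stack. Everything after "--" in "pio train" is passed
through to spark-submit, so a sketch (the stack and memory sizes below are
untested starting points, not known-good values) would be:

    # Forward JVM options to the driver and executors via spark-submit
    pio train -- --driver-memory 8g --driver-java-options "-Xss8m" \
      --conf spark.executor.extraJavaOptions=-Xss8m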