[jira] [Commented] (FLINK-4712) Implementing ranking predictions for ALS

2016-11-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696250#comment-15696250
 ] 

ASF GitHub Bot commented on FLINK-4712:
---

Github user thvasilo commented on the issue:

https://github.com/apache/flink/pull/2838
  
> The problem is not with the `evaluate(test: TestType): DataSet[Double]` but rather with `evaluate(test: TestType): DataSet[(Prediction,Prediction)]`.

Completely agree there. I advocated for removing/renaming the `evaluate` 
function; we considered using a `score` function for a more sklearn-like 
approach before, see e.g. #902. Having _some_ function that returns a 
`DataSet[(truth: Prediction, pred: Prediction)]` is useful and probably 
necessary, but we should look at alternatives, as the current state is confusing.
I think I like the approach you are suggesting, so feel free to come up 
with an alternative in the WIP PRs.

Getting rid of the Pipeline requirements for recommendation algorithms 
would simplify some things. In that case we'll have to re-evaluate whether it makes 
sense for them to implement the `Predictor` interface at all. Maybe we could have 
`ChainablePredictor`s, but I think our hierarchy is deep enough already.
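
For illustration only (not part of the PR or the existing FlinkML API): a minimal sketch of how a sklearn-like `score` could sit next to the pair-producing method discussed above. The trait and method names (`PairwisePreparer`, `preparePairs`) are made up.

```scala
import org.apache.flink.api.scala._
import org.apache.flink.ml.common.ParameterMap

// Hypothetical sketch: the pair-producing step stays available, but a
// sklearn-like `score` collapses it into a single metric value.
trait PairwisePreparer[Testing, Prediction] {

  /** Produces (truth, prediction) pairs, i.e. what `evaluate` returns today. */
  def preparePairs(
      test: DataSet[Testing],
      parameters: ParameterMap = ParameterMap.Empty): DataSet[(Prediction, Prediction)]

  /** Convenience wrapper: apply a metric to the prepared pairs. */
  def score(
      test: DataSet[Testing],
      metric: DataSet[(Prediction, Prediction)] => DataSet[Double]): DataSet[Double] =
    metric(preparePairs(test))
}
```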


> Implementing ranking predictions for ALS
> 
>
> Key: FLINK-4712
> URL: https://issues.apache.org/jira/browse/FLINK-4712
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Domokos Miklós Kelen
>Assignee: Gábor Hermann
>
> We started working on implementing ranking predictions for recommender 
> systems. Ranking prediction means that besides predicting scores for user-item 
> pairs, the recommender system is able to recommend a top K list for the users.
> Details:
> In practice, this means finding the K items with the highest predicted rating 
> for a particular user. It should also be possible to specify whether to exclude 
> the items a particular user has already seen from that user's toplist. (See for 
> example the 'exclude_known' setting of [Graphlab Create's ranking factorization 
> recommender|https://turi.com/products/create/docs/generated/graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend.html#graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend].)
> The output of the topK recommendation function could be in the form of 
> {{DataSet[(Int,Int,Int)]}}, meaning (user, item, rank), similar to Graphlab 
> Create's output. However, this is arguable: follow-up work includes 
> implementing ranking recommendation evaluation metrics (such as precision@k, 
> recall@k, ndcg@k), similar to [Spark's 
> implementations|https://spark.apache.org/docs/1.5.0/mllib-evaluation-metrics.html#ranking-systems].
> It would be beneficial if we were able to design the API such that it could 
> be included in the proposed evaluation framework (see 
> [2157|https://issues.apache.org/jira/browse/FLINK-2157]), which makes it 
> necessary to consider the possible output types {{DataSet[(Int, 
> Array[Int])]}} or {{DataSet[(Int, Array[(Int,Double)])]}}, meaning (user, 
> array of items), possibly including the predicted scores as well. See 
> [4713|https://issues.apache.org/jira/browse/FLINK-4713] for details.
> Another question is whether to provide this function as a member of 
> the ALS class, as a switch-like parameter to the ALS implementation 
> (meaning the model is either a rating or a ranking recommender model), or in 
> some other way.
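
Purely as an illustration of the two candidate output shapes above (not code from the PR; the object name and toy data are made up), converting (user, item, rank) triples into the (user, Array[Int]) form could look like this:

```scala
import org.apache.flink.api.scala._

// Illustrative only: converting (user, item, rank) triples into the
// (user, items ordered by rank) shape discussed in the issue.
object RankingShapesExample {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // (user, item, rank): user 1's top-2 items and user 2's top-1 item
    val flat: DataSet[(Int, Int, Int)] =
      env.fromElements((1, 42, 1), (1, 7, 2), (2, 7, 1))

    // (user, items ordered by rank)
    val grouped: DataSet[(Int, Array[Int])] = flat
      .groupBy(0)
      .reduceGroup { triples =>
        val sorted = triples.toArray.sortBy(_._3)
        (sorted.head._1, sorted.map(_._2))
      }

    grouped.print()
  }
}
```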



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4712) Implementing ranking predictions for ALS

2016-11-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15693381#comment-15693381
 ] 

ASF GitHub Bot commented on FLINK-4712:
---

Github user gaborhermann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2838#discussion_r89503899
  
--- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala ---
@@ -72,14 +77,142 @@ trait Predictor[Self] extends Estimator[Self] with WithParameters {
     */
   def evaluate[Testing, PredictionValue](
       testing: DataSet[Testing],
-      evaluateParameters: ParameterMap = ParameterMap.Empty)(implicit
-      evaluator: EvaluateDataSetOperation[Self, Testing, PredictionValue])
+      evaluateParameters: ParameterMap = ParameterMap.Empty)
+      (implicit evaluator: EvaluateDataSetOperation[Self, Testing, PredictionValue])
     : DataSet[(PredictionValue, PredictionValue)] = {
     FlinkMLTools.registerFlinkMLTypes(testing.getExecutionEnvironment)
     evaluator.evaluateDataSet(this, evaluateParameters, testing)
   }
 }
 
+trait RankingPredictor[Self] extends Estimator[Self] with WithParameters {
+  that: Self =>
+
+  def predictRankings(
+      k: Int,
+      users: DataSet[Int],
+      predictParameters: ParameterMap = ParameterMap.Empty)(implicit
+      rankingPredictOperation: RankingPredictOperation[Self])
+    : DataSet[(Int, Int, Int)] =
+    rankingPredictOperation.predictRankings(this, k, users, predictParameters)
+
+  def evaluateRankings(
+      testing: DataSet[(Int, Int, Double)],
+      evaluateParameters: ParameterMap = ParameterMap.Empty)(implicit
+      rankingPredictOperation: RankingPredictOperation[Self])
+    : DataSet[(Int, Int, Int)] = {
+    // todo: do not burn 100 topK into code
+    predictRankings(100, testing.map(_._1).distinct(), evaluateParameters)
+  }
+}
+
+trait RankingPredictOperation[Instance] {
+  def predictRankings(
+      instance: Instance,
+      k: Int,
+      users: DataSet[Int],
+      predictParameters: ParameterMap = ParameterMap.Empty)
+    : DataSet[(Int, Int, Int)]
+}
+
+/**
+  * Trait for providing auxiliary data for ranking evaluations.
+  *
+  * They are useful e.g. for excluding items found in the training [[DataSet]]
+  * from the recommended top K items.
+  */
+trait TrainingRatingsProvider {
+
+  def getTrainingData: DataSet[(Int, Int, Double)]
+
+  /**
+    * Retrieving the training items.
+    * Although this can be calculated from the training data, it requires a costly
+    * [[DataSet.distinct]] operation, while in matrix factor models the set items could be
+    * given more efficiently from the item factors.
+    */
+  def getTrainingItems: DataSet[Int] = {
+    getTrainingData.map(_._2).distinct()
+  }
+}
+
+/**
+  * Ranking predictions for the most common case.
+  * If we can predict ratings, we can compute top K lists by sorting the predicted ratings.
+  */
+class RankingFromRatingPredictOperation[Instance <: TrainingRatingsProvider]
+    (val ratingPredictor: PredictDataSetOperation[Instance, (Int, Int), (Int, Int, Double)])
+  extends RankingPredictOperation[Instance] {
+
+  private def getUserItemPairs(users: DataSet[Int], items: DataSet[Int], exclude: DataSet[(Int, Int)])
+    : DataSet[(Int, Int)] = {
+    users.cross(items)
--- End diff --

I completely agree. There are use-cases where we would not like to give 
rankings from all the items. E.g. when recommending TV programs, we would only 
like to recommend currently running TV programs, but train on all of them.

We'll include an `item` DataSet parameter to ranking predictions.

(Btw. I believe the "Flink-way" is to let the user configure as much as 
possible, but that's just my opinion :) )



[jira] [Commented] (FLINK-4712) Implementing ranking predictions for ALS

2016-11-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15693360#comment-15693360
 ] 

ASF GitHub Bot commented on FLINK-4712:
---

Github user gaborhermann commented on the issue:

https://github.com/apache/flink/pull/2838
  
Thanks again for taking a look at our PR!

I've just realized from a developer mailing list thread that the FlinkML 
API is not carved in stone even until 2.0, and it's nice to hear that :)

The problem is not with the `evaluate(test: TestType): DataSet[Double]` but 
rather with `evaluate(test: TestType): DataSet[(Prediction,Prediction)]`. It's 
at least confusing to have both, and it might not be worth exposing the one 
giving `(Prediction,Prediction)` pairs to the user, as it only *prepares* 
evaluation. With the introduction of the evaluation framework, we could at least rename 
it to something like `preparePairwiseEvaluation(test: TestType): 
DataSet[(Prediction,Prediction)]`. In the ranking case we might generalize it 
to `prepareEvaluation(test: TestType): PreparedTesting`. We basically did this 
with the `PrepareDataSetOperation`; we've just left the old `evaluate` as it is 
for now. I suggest changing this if we can break the API.
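
For illustration only, a sketch of what the renamed and generalized signatures could look like (the trait name `EvaluationPreparing` and its type parameters are made up, not code from the PR):

```scala
import org.apache.flink.api.scala._

// Sketch of the renaming idea only, not code from the PR.
trait EvaluationPreparing[TestType, Prediction, PreparedTesting] {

  // today's pair-producing `evaluate`, under a name that makes the intent explicit
  def preparePairwiseEvaluation(test: DataSet[TestType]): DataSet[(Prediction, Prediction)]

  // generalization for rankings: the prepared form is model/score specific
  def prepareEvaluation(test: DataSet[TestType]): PreparedTesting
}
```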

I'll do a rebase on the cross-validation PR. At first glance, it should not 
really be a problem to do both cross-validation and hyper-parameter tuning, as 
the user has to provide a `Scorer` anyway. A minor issue I see is the user 
falling back to a default `score` (e.g. RMSE in the case of ALS). This might not be 
a problem for recommendation models that give rating predictions besides ranking 
predictions, but it is a problem for models that *only* give ranking 
predictions, because those do not extend the `Predictor` class. This is not an 
issue for now, but it might become one when adding more recommendation models. 
Should we try to handle this now, or would that be overengineering? I'll see if any 
other problems come up after rebasing.

The `RankingPredictor` interface is useful *internally* for the `Score`s: 
it serves as a contract between a `RankingScore` and the model. I'm sure it will 
be used only for recommendations, but it takes no effort to expose it, so the user 
can write code against a general `RankingPredictor` (although I would not think 
this is what users would like to do :) ). A better question is whether to use 
it in a `Pipeline`. We discussed this with some people, and could not really 
find a use-case where we need `Transformer`-like preprocessing for 
recommendations. Of course, there could be other preprocessing steps, such as 
removing/aggregating duplicates, but those do not have to be `fit` to training 
data. Based on this, it's not worth the effort to integrate `RankingPredictor` 
with the `Pipeline`, at least for now.


> Implementing ranking predictions for ALS
> 
>
> Key: FLINK-4712
> URL: https://issues.apache.org/jira/browse/FLINK-4712
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Domokos Miklós Kelen
>Assignee: Gábor Hermann
>
> We started working on implementing ranking predictions for recommender 
> systems. Ranking prediction means that beside predicting scores for user-item 
> pairs, the recommender system is able to recommend a top K list for the users.
> Details:
> In practice, this would mean finding the K items for a particular user with 
> the highest predicted rating. It should be possible also to specify whether 
> to exclude the already seen items from a particular user's toplist. (See for 
> example the 'exclude_known' setting of [Graphlab Create's ranking 
> factorization 
> recommender|https://turi.com/products/create/docs/generated/graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend.html#graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend]
>  ).
> The output of the topK recommendation function could be in the form of 
> {{DataSet[(Int,Int,Int)]}}, meaning (user, item, rank), similar to Graphlab 
> Create's output. However, this is arguable: follow up work includes 
> implementing ranking recommendation evaluation metrics (such as precision@k, 
> recall@k, ndcg@k), similar to [Spark's 
> implementations|https://spark.apache.org/docs/1.5.0/mllib-evaluation-metrics.html#ranking-systems].
>  It would be beneficial if we were able to design the API such that it could 
> be included in the proposed evaluation framework (see 
> [5157|https://issues.apache.org/jira/browse/FLINK-2157]), which makes it 
> neccessary to consider the possible output type {{DataSet[(Int, 
> Array[Int])]}} or {{DataSet[(Int, Array[(Int,Double)])]}} meaning (user, 
> array of items), possibly including the predicted scores as well. 

[jira] [Commented] (FLINK-4712) Implementing ranking predictions for ALS

2016-11-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15693161#comment-15693161
 ] 

ASF GitHub Bot commented on FLINK-4712:
---

Github user gaborhermann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2838#discussion_r89489117
  
--- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala ---
@@ -72,14 +77,142 @@ trait Predictor[Self] extends Estimator[Self] with WithParameters {
     */
   def evaluate[Testing, PredictionValue](
       testing: DataSet[Testing],
-      evaluateParameters: ParameterMap = ParameterMap.Empty)(implicit
-      evaluator: EvaluateDataSetOperation[Self, Testing, PredictionValue])
+      evaluateParameters: ParameterMap = ParameterMap.Empty)
+      (implicit evaluator: EvaluateDataSetOperation[Self, Testing, PredictionValue])
     : DataSet[(PredictionValue, PredictionValue)] = {
     FlinkMLTools.registerFlinkMLTypes(testing.getExecutionEnvironment)
     evaluator.evaluateDataSet(this, evaluateParameters, testing)
   }
 }
 
+trait RankingPredictor[Self] extends Estimator[Self] with WithParameters {
+  that: Self =>
+
+  def predictRankings(
+      k: Int,
+      users: DataSet[Int],
+      predictParameters: ParameterMap = ParameterMap.Empty)(implicit
+      rankingPredictOperation: RankingPredictOperation[Self])
+    : DataSet[(Int, Int, Int)] =
+    rankingPredictOperation.predictRankings(this, k, users, predictParameters)
+
+  def evaluateRankings(
+      testing: DataSet[(Int, Int, Double)],
+      evaluateParameters: ParameterMap = ParameterMap.Empty)(implicit
+      rankingPredictOperation: RankingPredictOperation[Self])
+    : DataSet[(Int, Int, Int)] = {
+    // todo: do not burn 100 topK into code
+    predictRankings(100, testing.map(_._1).distinct(), evaluateParameters)
+  }
+}
+
+trait RankingPredictOperation[Instance] {
+  def predictRankings(
+      instance: Instance,
+      k: Int,
+      users: DataSet[Int],
+      predictParameters: ParameterMap = ParameterMap.Empty)
+    : DataSet[(Int, Int, Int)]
+}
+
+/**
+  * Trait for providing auxiliary data for ranking evaluations.
+  *
+  * They are useful e.g. for excluding items found in the training [[DataSet]]
+  * from the recommended top K items.
+  */
+trait TrainingRatingsProvider {
+
+  def getTrainingData: DataSet[(Int, Int, Double)]
+
+  /**
+    * Retrieving the training items.
+    * Although this can be calculated from the training data, it requires a costly
+    * [[DataSet.distinct]] operation, while in matrix factor models the set items could be
+    * given more efficiently from the item factors.
+    */
+  def getTrainingItems: DataSet[Int] = {
+    getTrainingData.map(_._2).distinct()
+  }
+}
+
+/**
+  * Ranking predictions for the most common case.
+  * If we can predict ratings, we can compute top K lists by sorting the predicted ratings.
+  */
+class RankingFromRatingPredictOperation[Instance <: TrainingRatingsProvider]
+    (val ratingPredictor: PredictDataSetOperation[Instance, (Int, Int), (Int, Int, Double)])
+  extends RankingPredictOperation[Instance] {
+
+  private def getUserItemPairs(users: DataSet[Int], items: DataSet[Int], exclude: DataSet[(Int, Int)])
+    : DataSet[(Int, Int)] = {
+    users.cross(items)
--- End diff --

You're right. Although there's not much we can do in general to avoid this, 
we might be able to optimize for matrix factorization. This solution works for 
*every* predictor that predicts ratings, and we currently use it in ALS 
([here](https://github.com/apache/flink/pull/2838/files/45c98a97ef82d1012062dbcf6ade85a8d566062d#diff-80639a21b8fd166b5f7df5280cd609a9R467)). 
With a matrix factorization model *specifically*, we can avoid materializing 
all user-item pairs as tuples and compute the rankings more directly, which 
might be more efficient. So we could use a more specific `RankingPredictor` 
implementation in `ALS`. But even in that case, we still need to go through all 
the items for a particular user to calculate the top k items for that user.

Also, this is only calculated for the users we'd like to give rankings to, 
e.g. in a testing scenario, only for the users in the test data, which might be 
significantly fewer than the users in the training data.

I suggest keeping this anyway, as it is general. We might come up with a 
solution that's slightly more efficient in most cases for MF models. Should we put 
effort into working on it? What do you think?
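
For what it's worth, a rough sketch of the MF-specific idea (not the PR's ALS code; the object name, the broadcast-based approach, and the assumption that the item factors fit in memory are all illustrative):

```scala
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector

import scala.collection.JavaConverters._

// Illustrative sketch only, not the PR's ALS code: the item factors are
// broadcast, so each user's top-k is computed locally from dot products
// instead of materializing every (user, item) pair as a tuple DataSet.
object TopKFromFactorsSketch {

  def topK(
      userFactors: DataSet[(Int, Array[Double])],
      itemFactors: DataSet[(Int, Array[Double])],
      k: Int): DataSet[(Int, Int, Int)] = {

    userFactors
      .flatMap(new RichFlatMapFunction[(Int, Array[Double]), (Int, Int, Int)] {
        private var items: Seq[(Int, Array[Double])] = _

        override def open(parameters: Configuration): Unit = {
          // assumes the item factors fit into memory on each task
          items = getRuntimeContext
            .getBroadcastVariable[(Int, Array[Double])]("itemFactors")
            .asScala.toSeq
        }

        override def flatMap(user: (Int, Array[Double]), out: Collector[(Int, Int, Int)]): Unit = {
          val (userId, uf) = user
          items
            .map { case (itemId, f) => (itemId, uf.zip(f).map { case (a, b) => a * b }.sum) }
            .sortBy(-_._2)
            .take(k)
            .zipWithIndex
            .foreach { case ((itemId, _), rank) => out.collect((userId, itemId, rank + 1)) }
        }
      })
      .withBroadcastSet(itemFactors, "itemFactors")
  }
}
```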



[jira] [Commented] (FLINK-4712) Implementing ranking predictions for ALS

2016-11-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15690966#comment-15690966
 ] 

ASF GitHub Bot commented on FLINK-4712:
---

Github user gaborhermann commented on the issue:

https://github.com/apache/flink/pull/2838
  
Hello Theodore,

Thank you for checking out our solution! I would rather not answer right now, as 
we did most of the work together with @proto-n and I might be wrong on some 
points. I'll discuss these issues with him in person tomorrow and answer ASAP!




[jira] [Commented] (FLINK-4712) Implementing ranking predictions for ALS

2016-11-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15689765#comment-15689765
 ] 

ASF GitHub Bot commented on FLINK-4712:
---

Github user thvasilo commented on the issue:

https://github.com/apache/flink/pull/2838
  
Hello Gabor, 

I like the idea of having a RankingScore; it seems like having that 
hierarchy with Score, RankingScore, and PairwiseScore gives us the flexibility 
we need to include ranking and supervised learning evaluation under the same 
umbrella.

I would also encourage sharing any other ideas you broached that might 
break the API. This is still very much an evolving project, and there is no need 
to shoehorn everything into an `evaluate(test: TestType): DataSet[Double]` 
function if there are better alternatives.

One thing we need to consider is how this affects cross-validation and 
model selection/hyper-parameter tuning. These two aspects of the library are 
tightly linked, and I think we'll need to work on them in parallel to find 
issues that affect both.

I recommend taking a look at the [cross-validation 
PR](https://github.com/apache/flink/pull/891) I had opened way back when, and 
making a new WIP PR that uses the current one (#2838) as a basis. Since the 
`Score` interface still exists it shouldn't require many changes, and all 
that's added is the CrossValidation class. There are other fundamental issues 
with the sampling there that we can discuss in due time.

Regarding the RankingPredictor, we should consider the use case for such an 
interface. Is it only going to be used for recommendation? If so, what are the 
cases where we could build a Pipeline with current or future pre-processing 
steps? Could you give some pipeline examples in a recommendation setting?




[jira] [Commented] (FLINK-4712) Implementing ranking predictions for ALS

2016-11-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15689747#comment-15689747
 ] 

ASF GitHub Bot commented on FLINK-4712:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/2838#discussion_r89292988
  
--- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala ---
@@ -72,14 +77,142 @@ trait Predictor[Self] extends Estimator[Self] with WithParameters {
     */
   def evaluate[Testing, PredictionValue](
       testing: DataSet[Testing],
-      evaluateParameters: ParameterMap = ParameterMap.Empty)(implicit
-      evaluator: EvaluateDataSetOperation[Self, Testing, PredictionValue])
+      evaluateParameters: ParameterMap = ParameterMap.Empty)
+      (implicit evaluator: EvaluateDataSetOperation[Self, Testing, PredictionValue])
     : DataSet[(PredictionValue, PredictionValue)] = {
     FlinkMLTools.registerFlinkMLTypes(testing.getExecutionEnvironment)
     evaluator.evaluateDataSet(this, evaluateParameters, testing)
   }
 }
 
+trait RankingPredictor[Self] extends Estimator[Self] with WithParameters {
+  that: Self =>
+
+  def predictRankings(
+      k: Int,
+      users: DataSet[Int],
+      predictParameters: ParameterMap = ParameterMap.Empty)(implicit
+      rankingPredictOperation: RankingPredictOperation[Self])
+    : DataSet[(Int, Int, Int)] =
+    rankingPredictOperation.predictRankings(this, k, users, predictParameters)
+
+  def evaluateRankings(
+      testing: DataSet[(Int, Int, Double)],
+      evaluateParameters: ParameterMap = ParameterMap.Empty)(implicit
+      rankingPredictOperation: RankingPredictOperation[Self])
+    : DataSet[(Int, Int, Int)] = {
+    // todo: do not burn 100 topK into code
+    predictRankings(100, testing.map(_._1).distinct(), evaluateParameters)
+  }
+}
+
+trait RankingPredictOperation[Instance] {
+  def predictRankings(
+      instance: Instance,
+      k: Int,
+      users: DataSet[Int],
+      predictParameters: ParameterMap = ParameterMap.Empty)
+    : DataSet[(Int, Int, Int)]
+}
+
+/**
+  * Trait for providing auxiliary data for ranking evaluations.
+  *
+  * They are useful e.g. for excluding items found in the training [[DataSet]]
+  * from the recommended top K items.
+  */
+trait TrainingRatingsProvider {
+
+  def getTrainingData: DataSet[(Int, Int, Double)]
+
+  /**
+    * Retrieving the training items.
+    * Although this can be calculated from the training data, it requires a costly
+    * [[DataSet.distinct]] operation, while in matrix factor models the set items could be
+    * given more efficiently from the item factors.
+    */
+  def getTrainingItems: DataSet[Int] = {
+    getTrainingData.map(_._2).distinct()
+  }
+}
+
+/**
+  * Ranking predictions for the most common case.
+  * If we can predict ratings, we can compute top K lists by sorting the predicted ratings.
+  */
+class RankingFromRatingPredictOperation[Instance <: TrainingRatingsProvider]
+    (val ratingPredictor: PredictDataSetOperation[Instance, (Int, Int), (Int, Int, Double)])
+  extends RankingPredictOperation[Instance] {
+
+  private def getUserItemPairs(users: DataSet[Int], items: DataSet[Int], exclude: DataSet[(Int, Int)])
+    : DataSet[(Int, Int)] = {
+    users.cross(items)
--- End diff --

This could very well blow up. Do we have any limits on the size of the 
users and items DataSets?

If I understand correctly, `users` comes from the calling `predictRankings` 
function and contains user IDs (potentially all users), while `items` contains 
all the items in the training set, which could number in the millions.
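
To make the scale concern concrete: a million users crossed with a million items is 10^12 candidate pairs. A sketch (illustrative only, not the PR's code; names are made up) of keeping the cross product bounded by pre-filtering both sides and subtracting already-known pairs before scoring:

```scala
import org.apache.flink.api.scala._
import org.apache.flink.util.Collector

// Illustrative sketch only: the cross product is bounded by |users| * |items|,
// so both inputs should be pre-filtered (e.g. test users only, a restricted
// candidate item set) and known pairs subtracted before rating prediction.
object CandidatePairsSketch {

  def candidatePairs(
      users: DataSet[Int],              // ideally only the users to rank
      items: DataSet[Int],              // ideally a restricted candidate set
      exclude: DataSet[(Int, Int)])     // e.g. already-rated (user, item) pairs
    : DataSet[(Int, Int)] = {

    users.cross(items)
      .coGroup(exclude)
      .where(0, 1).equalTo(0, 1)
      .apply { (candidates: Iterator[(Int, Int)],
                known: Iterator[(Int, Int)],
                out: Collector[(Int, Int)]) =>
        // keep a candidate pair only if it has no match in the exclude set
        if (known.isEmpty) candidates.foreach(out.collect)
      }
  }
}
```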



[jira] [Commented] (FLINK-4712) Implementing ranking predictions for ALS

2016-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683574#comment-15683574
 ] 

ASF GitHub Bot commented on FLINK-4712:
---

Github user gaborhermann commented on the issue:

https://github.com/apache/flink/pull/2838
  
Hi @thvasilo, of course, thanks for taking a look! :)




[jira] [Commented] (FLINK-4712) Implementing ranking predictions for ALS

2016-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683524#comment-15683524
 ] 

ASF GitHub Bot commented on FLINK-4712:
---

Github user thvasilo commented on the issue:

https://github.com/apache/flink/pull/2838
  
Hello @gaborhermann, thanks for making the PR!

I'll try to take a look this week; I've been busy with a couple of other 
PRs these days.






[jira] [Commented] (FLINK-4712) Implementing ranking predictions for ALS

2016-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683179#comment-15683179
 ] 

ASF GitHub Bot commented on FLINK-4712:
---

Github user gaborhermann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2838#discussion_r88868762
  
--- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala ---
@@ -267,6 +401,21 @@ trait PredictOperation[Instance, Model, Testing, Prediction] extends Serializable {
   def predict(value: Testing, model: Model): Prediction
 }
 
+/**
+  * Operation for preparing a testing [[DataSet]] for evaluation.
+  *
+  * The most commonly [[EvaluateDataSetOperation]] is used, but evaluation of
+  * ranking recommendations need input in a different form.
+  */
+trait PrepareOperation[Instance, Testing, Prepared] extends Serializable {
--- End diff --

`PrepareOperation` is the common trait for `EvaluateDataSetOperation` and for 
the operation that prepares ranking evaluation.
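
A sketch of how the pieces could relate (the `prepare` method name and the `...Like` trait names are assumptions for illustration, not the PR's exact code):

```scala
import org.apache.flink.api.scala._
import org.apache.flink.ml.common.ParameterMap

// Sketch of the relationship only; simplified relative to the actual PR code.
trait PrepareOperation[Instance, Testing, Prepared] extends Serializable {
  def prepare(instance: Instance, testing: DataSet[Testing], parameters: ParameterMap): Prepared
}

// Pairwise evaluation prepares (truth, prediction) pairs...
trait EvaluateDataSetOperationLike[Instance, Testing, Prediction]
  extends PrepareOperation[Instance, Testing, DataSet[(Prediction, Prediction)]]

// ...while ranking evaluation prepares (true ratings, predicted rankings).
trait RankingPrepareOperationLike[Instance]
  extends PrepareOperation[Instance, (Int, Int, Double),
    (DataSet[(Int, Int, Double)], DataSet[(Int, Int, Int)])]
```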




[jira] [Commented] (FLINK-4712) Implementing ranking predictions for ALS

2016-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683173#comment-15683173
 ] 

ASF GitHub Bot commented on FLINK-4712:
---

Github user gaborhermann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2838#discussion_r88868369
  
--- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/evaluation/Score.scala ---
@@ -18,12 +18,37 @@
 
 package org.apache.flink.ml.evaluation
 
+import org.apache.flink.api.common.operators.Order
 import org.apache.flink.api.common.typeinfo.TypeInformation
 import org.apache.flink.api.scala._
 import org.apache.flink.ml._
+import org.apache.flink.ml.pipeline._
+import org.apache.flink.ml.RichNumericDataSet
+import org.apache.flink.util.Collector
 
 import scala.reflect.ClassTag
 
+trait Score[
--- End diff --

`Score` is the common trait for `RankingScore` and `PairwiseScore`.




[jira] [Commented] (FLINK-4712) Implementing ranking predictions for ALS

2016-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15683167#comment-15683167
 ] 

ASF GitHub Bot commented on FLINK-4712:
---

GitHub user gaborhermann opened a pull request:

https://github.com/apache/flink/pull/2838

[FLINK-4712] [FLINK-4713] [ml] Ranking recommendation & evaluation (WIP)

Please note that this is a work-in-progress PR for discussing API design 
decisions. We propose here a class hierarchy for fitting ranking 
evaluations into the proposed evaluation framework (see 
[PR](https://github.com/apache/flink/pull/1849)).
The features are mostly working, but documentation is missing and minor 
refactoring is needed. The evaluations currently work with top-100 rankings 
(hard-coded), and we still need to fix that. We need feedback on two main 
design decisions so that we can move on with the PR. Thanks for any comments!

### `RankingPredictor`

We have managed to rework the evaluation framework proposed by @thvasilo 
so that ranking predictions fit in. Our approach is to use separate 
`RankingPredictor` and `Predictor` traits. One main problem remains, however: 
there is no common superclass for `RankingPredictor` and `Predictor`, so the 
pipelining mechanism might not work. A `Predictor` can only be at the end of 
the pipeline, so this should not really be a problem, but I do not know for 
sure. An alternative solution would be to have different objects `ALS` and 
`RankingALS` that give different predictions, but both extend only a 
`Predictor`. There could be implicit conversions between the two. I would 
prefer the current solution if it does not break the pipelining. @thvasilo, what 
do you think about this?

(This seems to be a problem similar to having a `predict_proba` function in 
scikit-learn classification models, where the same model gives two different 
predictions for the same input: `predict` for discrete predictions and 
`predict_proba` for probabilities.)
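
To illustrate the design point, a simplified sketch of the two-trait idea (the `...Like` trait names and signatures are assumptions, not the PR's actual code):

```scala
import org.apache.flink.api.scala._

// Simplified sketch of the two-trait idea, not the PR's actual signatures.
trait PredictorLike[Self] {
  // rating prediction: (user, item) => (user, item, predicted rating)
  def predict(input: DataSet[(Int, Int)]): DataSet[(Int, Int, Double)]
}

trait RankingPredictorLike[Self] {
  // ranking prediction: users => (user, item, rank) top-k triples
  def predictRankings(k: Int, users: DataSet[Int]): DataSet[(Int, Int, Int)]
}

// The same model can offer both, analogous to predict / predict_proba:
// class ALS extends PredictorLike[ALS] with RankingPredictorLike[ALS] { ... }
```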

### Generalizing `EvaluateDataSetOperation`

On the other hand, we seem to have solved the scoring issue. Users can 
evaluate a recommendation algorithm such as ALS by using a score operating on 
rankings (e.g. nDCG) or a score operating on ratings (e.g. RMSE). They only 
need to change the `Score` they use in their code, and nothing else.

The main problem was that the evaluate method and 
`EvaluateDataSetOperation` were not general enough. They prepare the evaluation 
as `(trueValue, predictedValue)` pairs (i.e. a `DataSet[(PredictionType, 
PredictionType)]`), while ranking evaluations need a more general input: the 
true ratings (`DataSet[(Int,Int,Double)]`) and the predicted rankings 
(`DataSet[(Int,Int,Int)]`).

Instead of using `EvaluateDataSetOperation`, we use a more general 
`PrepareOperation`. We rename the `Score` in the original evaluation framework 
to `PairwiseScore`; `RankingScore` and `PairwiseScore` have a common trait 
`Score`. This way the user can use both a `RankingScore` and a `PairwiseScore` 
for a given model, and only needs to change the score used in the code.

In the case of pairwise scores (which only need true and predicted value pairs 
for evaluation), `EvaluateDataSetOperation` is used as a `PrepareOperation`. It 
prepares the evaluation by creating `(trueValue, predictedValue)` pairs from 
the test dataset, so the result of preparing, and the input of 
`PairwiseScore`s, is `DataSet[(PredictionType, PredictionType)]`. In the case of 
rankings, the `PrepareOperation` passes the test dataset through and creates the 
rankings, so the result of preparing, and the input of `RankingScore`s, is 
`(DataSet[(Int,Int,Double)], DataSet[(Int,Int,Int)])`. I believe this is a fairly 
acceptable solution that avoids breaking the API.
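
A type-level sketch of the hierarchy described above (the `...Like` trait names and the single `evaluate` method are simplified assumptions, not the exact PR code):

```scala
import org.apache.flink.api.scala._

// Simplified type-level sketch of the hierarchy described above;
// trait and method shapes are assumptions, not the exact PR code.
trait ScoreLike[PreparedTesting] {
  def evaluate(prepared: PreparedTesting): DataSet[Double]
}

// Pairwise scores (e.g. RMSE) consume (truth, prediction) pairs...
trait PairwiseScoreLike[Prediction]
  extends ScoreLike[DataSet[(Prediction, Prediction)]]

// ...while ranking scores (e.g. nDCG) consume true ratings plus predicted rankings.
trait RankingScoreLike
  extends ScoreLike[(DataSet[(Int, Int, Double)], DataSet[(Int, Int, Int)])]
```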

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gaborhermann/flink ranking-rec-eval

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2838.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2838





