Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/1849
Great. I agree this PR should be merged before #2838.
@skonto thanks for taking up the review :) This is indeed a bit blocking.
Hopefully I can improve upon #2838 next week, so
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/1849
Hi @skonto, I did not have time lately to finish up #2838, but I could
clean it up next week. Although I believe this PR could be merged separately
from mine. (Evaluating ranking
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2838#discussion_r89503899
--- Diff:
flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala
---
@@ -72,14 +77,142 @@ trait Predictor[Self] extends
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2838
Thanks again for taking a look at our PR!
I've just realized from a developer mailing list thread that the FlinkML
API is still not carved into stone even until 2.0, and it's nice
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2838#discussion_r89489117
--- Diff:
flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala
---
@@ -72,14 +77,142 @@ trait Predictor[Self] extends
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2819
Hello Theodore,
Thanks for taking a look at this PR!
1. No problem. Some performance evaluations might help us in this case too.
I don't believe it should have much effect
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2838
Hello Theodore,
Thank you for checking out our solution! I would not like to answer now, as
we did most of the work together with @proto-n, and I might be wrong in some
aspects. I'll
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2838
Hi @thvasilo, of course, thanks for taking a look! :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2838#discussion_r88868762
--- Diff:
flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala
---
@@ -267,6 +401,21 @@ trait PredictOperation[Instance
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2838#discussion_r88868369
--- Diff:
flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/evaluation/Score.scala
---
@@ -18,12 +18,37 @@
package
GitHub user gaborhermann opened a pull request:
https://github.com/apache/flink/pull/2838
[FLINK-4712] [FLINK-4713] [ml] Ranking recommendation & evaluation (WIP)
Please note that this is a work-in-progress PR for discussing API design
decisions. We propose here a class hiera
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2819
There are some open questions:
1. Should we optimize 3 way join? For now the join order is burnt into the
code, also we might be able to give hints for join strategies.
2. How
GitHub user gaborhermann opened a pull request:
https://github.com/apache/flink/pull/2819
[FLINK-4961] [ml] SGD for Matrix Factorization (WIP)
Please note, that this is a work-in-progress PR, to discuss some design
questions. There are minor things to be done including
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2542
Thank you @thvasilo for your thorough review :)
@mbalassi I think this PR is ready to merge. Could you do a review when you
have some time?
---
If your project is set up for it, you
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2542#discussion_r87588061
--- Diff:
flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/recommendation/ALS.scala
---
@@ -675,7 +756,69 @@ object ALS
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2542#discussion_r87587473
--- Diff:
flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/recommendation/ALS.scala
---
@@ -273,6 +308,14 @@ object ALS {
val
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2542
Okay, I've made the seed optional. I also removed the
`generateRandomMatrix` function as it wasn't used at all.
---
If your project is set up for it, you can reply to this email and have your
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2542
I agree. I already fixed the seed at the `ImplicitALSITSuite`. At the
`ALSITSuite` the seed is unset, but by default it's 0, that's why `ALSITSuite`
is deterministic, so I don't think it's
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2542
@tillrohrmann thanks for clarifying!
Beside testing against Spark, I also did a small NumPy implementation, to
make sure iALS works well. Although, I did not make efforts to imitate
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2542#discussion_r87355415
--- Diff:
flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/recommendation/ALS.scala
---
@@ -675,7 +756,69 @@ object ALS
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2542#discussion_r87354805
--- Diff:
flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/recommendation/ALS.scala
---
@@ -535,8 +581,17 @@ object ALS {
itemOut
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2542#discussion_r87353990
--- Diff:
flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/recommendation/ALS.scala
---
@@ -273,6 +308,14 @@ object ALS {
val
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2542#discussion_r87353332
--- Diff:
flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/recommendation/ALS.scala
---
@@ -156,6 +171,26 @@ class ALS extends Predictor[ALS
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2542
Okay. Thank you again @thvasilo for reviewing our code!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2542
Hi @thvasilo,
Thanks for your thoughts! I agree we should perform a benchmark in the
future. Furthermore, based on the results we could optimize the algorithm.
I split up
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2542
Thanks @jfeher for the measurements! :)
@thvasilo The filtering referred to having distinct (user,artist) pairs.
It's only because the input of the iALS is a sparse matrix, and it would
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/1849
Hi all,
What is the status of this PR?
It would be relevant for us, because we might like to use the evaluation
framework proposed here. See
[FLINK-4713](https://issues.apache.org
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2542
Okay, we're on it.
@thvasilo thanks for the dataset suggestion.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2542
I did some minor changes according to user suggestions, and put it in a
separate commit (Minor fixes in IALS), so you can see my fixes. Note that
before merging I'd prefer to squash the commit
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2542#discussion_r81115301
--- Diff:
flink-libraries/flink-ml/src/test/scala/org/apache/flink/ml/recommendation/ImplicitALSTest.scala
---
@@ -0,0 +1,171
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2542#discussion_r81115229
--- Diff:
flink-libraries/flink-ml/src/test/scala/org/apache/flink/ml/recommendation/ImplicitALSTest.scala
---
@@ -0,0 +1,171
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2542#discussion_r81114406
--- Diff:
flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/recommendation/ALS.scala
---
@@ -581,6 +637,16 @@ object ALS {
val
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2542#discussion_r81112771
--- Diff: docs/dev/libs/ml/als.md ---
@@ -99,6 +114,26 @@ The alternating least squares implementation can be
controlled by the following
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2542#discussion_r81112526
--- Diff: docs/dev/libs/ml/als.md ---
@@ -99,6 +114,26 @@ The alternating least squares implementation can be
controlled by the following
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2542#discussion_r81112516
--- Diff: docs/dev/libs/ml/als.md ---
@@ -49,6 +49,21 @@ By applying this step alternately to the matrices $U$
and $V$, we can iterativel
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2542
Hi @thvasilo, thanks for reviewing my code.
You are right, @jfeher told me that the numbers in the left column indicate
the number of non-zero elements in the rating matrix. We
Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2542
We did not measure performance against Spark or other implementations yet.
Those would reflect the performance of Flink ALS implementation, as there is
not much difference between the implicit
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2542#discussion_r80253321
--- Diff: docs/dev/libs/ml/als.md ---
@@ -49,6 +49,18 @@ By applying this step alternately to the matrices $U$
and $V$, we can iterativel
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2542#discussion_r80230683
--- Diff: docs/dev/libs/ml/als.md ---
@@ -49,6 +49,18 @@ By applying this step alternately to the matrices $U$
and $V$, we can iterativel
GitHub user gaborhermann opened a pull request:
https://github.com/apache/flink/pull/2542
[FLINK-4613] Extend ALS to handle implicit feedback datasets
This extension of the ALS algorithm changes some parts of the code if
`implicitPrefs` flag is set to true. Mainly the local parts
GitHub user gaborhermann opened a pull request:
https://github.com/apache/flink/pull/994
[FLINK-2286] [streaming] Wrapped ParallelMerge into stream operator
Resolved the issue by wrapping the ParallelMerge function into a
StreamOperator and emitting the remaining records at the end
Github user gaborhermann closed the pull request at:
https://github.com/apache/flink/pull/802
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/802#issuecomment-120875478
Okay, thanks :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/802#issuecomment-119321451
The main purpose of this is to enable a deterministic (non-time based) test
for iteration as it sometimes failed for some reason on Travis. (See [here]
(https
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/872#issuecomment-117644273
Sorry.
* updated the docs (custom partitioning was also missing in the Scala batch
API docs)
* added IT case tests (also for the other stream partitioning
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/872#issuecomment-117141916
Okay then. These are the effects of changing I did not know about. Let's
stick to (2) and later on, we might reconsider this.
---
If your project is set up
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/872#issuecomment-116647912
By the way, in the Scala DataSet the user should specify the Java
`Partitioner[K]` class. Wouldn't it be more convenient to wrap a function like
`(K, Int) = Int
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/872#issuecomment-116671285
I'd prefer the function implementation (like `(K, Int) = Int`), but it
should stay consistent with the batch API. I don't see why the wrapping would
effect
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/872#issuecomment-116736041
Sorry for not making myself clear.
I would actually go for
4. Only the Scala function (both in the streaming and batch API)
I don't understand
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/872#issuecomment-116755999
Okay, then I will
* deprecate the partitioner implementation in the batch API
* add the function implementation to the batch API
* add the function
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/872#issuecomment-116103719
I guess it is easier for the users to understand and partitioning to
multiple channels at a time is rarely needed. Is there a use-case where it is
needed
GitHub user gaborhermann opened a pull request:
https://github.com/apache/flink/pull/872
[FLINK-2138] Added custom partitioning to DataStream
Custom partitioning added to DataStream in order to be more consistent with
the batch API.
You can merge this pull request into a Git
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/802#issuecomment-114525454
Thanks for the heads up, I removed the aggregation test.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/802#issuecomment-112968975
I updated the PR. Added the fix for the iteration test. Probably not
getting a new element within the maximum wait time caused the failing of the
iteration tests
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/771#issuecomment-109894121
Thanks for merging. I removed the line that failed the test (I mistakenly
checked a non-deterministic id).
https://github.com/gaborhermann/flink/commits/FLINK
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/771#issuecomment-109909701
Of course. I took up the JIRA issue.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/803#issuecomment-109955074
As a note: I've just run into this issue at chained streaming aggregations
when writing tests (not copying created wrong results). I will check again
after this has
GitHub user gaborhermann opened a pull request:
https://github.com/apache/flink/pull/802
[wip] [FLINK-2180] [streaming] Iteration test fix
* Fixed the iteration test at DataStreamTest
* Will add more tests for DataStream
* Will fix IterateTest
You can merge this pull
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/771#issuecomment-108902692
Fixed the style issues and added more tests.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
GitHub user gaborhermann opened a pull request:
https://github.com/apache/flink/pull/771
[wip] [FLINK-2136] Adding DataStream tests for Scala API
* Added tests for scala DataStream
* Added tests for setting parallelism
* Fixed some small bugs in scala API
You can merge
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/589#issuecomment-92846151
Yes, of course, we can postpone the refactor.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/589#issuecomment-92413840
Thanks for the comments, I made the proposed changes (except Marton's).
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/589#discussion_r28251514
--- Diff:
flink-staging/flink-streaming/flink-streaming-connectors/src/test/java/org/apache/flink/streaming/connectors/kafka/KafkaITCase.java
GitHub user gaborhermann opened a pull request:
https://github.com/apache/flink/pull/557
[FLINK-1753] [streaming] Added test for Kafka connector with tuple type
Extended the KafkaITCase with a topology of a non-String type and fixed a
small bug with Kafka key serializer.
You can
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/557#issuecomment-88523220
Thanks for the remark. I updated the branch and added more tests.
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user gaborhermann closed the pull request at:
https://github.com/apache/flink/pull/474
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/474#issuecomment-85014101
I agree. Further work should be done elsewhere (over the master).
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/500#discussion_r26758463
--- Diff:
flink-staging/flink-streaming/flink-streaming-connectors/src/main/java/org/apache/flink/streaming/connectors/kafka/api/simple/iterator
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/500#discussion_r26760037
--- Diff:
flink-staging/flink-streaming/flink-streaming-connectors/src/test/java/org/apache/flink/streaming/connectors/kafka/KafkaITCase.java
Github user gaborhermann commented on a diff in the pull request:
https://github.com/apache/flink/pull/500#discussion_r26758570
--- Diff:
flink-staging/flink-streaming/flink-streaming-connectors/src/main/java/org/apache/flink/streaming/connectors/kafka/api/simple
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/474#issuecomment-81948052
Thanks for the fix.
How can the above job execution failure be reproduced?
I agree with adding the integration test.
---
If your project is set up
GitHub user gaborhermann opened a pull request:
https://github.com/apache/flink/pull/474
Cleanup of low level Kafka consumer (PersistentKafkaSource)
* When there are more partitions of topic than parallel subtasks listening
partitions get distributed between subtask in order
GitHub user gaborhermann opened a pull request:
https://github.com/apache/flink/pull/472
[FLINK-1594] [streaming] [wip] DataStream supports self-connection
* Added StreamEdge abstraction, so more than one edge can be defined
between stream operators.
* Added
Github user gaborhermann commented on the pull request:
https://github.com/apache/flink/pull/433#issuecomment-75928922
Sorry, I accidentally used a class only available in Java 8, and did not
wait for the Travis build.
I fixed it. Travis still fails, but on an unrelated issue (see
74 matches
Mail list logo