[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-14 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543734#comment-14543734
 ] 

Alexander Alexandrov commented on FLINK-1731:
-

I would go with a {{DataSet}} for the centroids as well. That said, we can 
reduce syntax at the client side by providing either

- an implicit converter {{Seq\[A\] => DataSet\[A\]}} (needs to be part of 
the Flink Scala API, could already be there), or
- an overloaded {{setCentroids(Seq\[A\])}} setter.
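To illustrate, a rough sketch of both options in Scala ({{KMeans}} and its 
{{centroids}} field are hypothetical placeholders, not the actual flink-ml 
API):

{code}
import scala.reflect.ClassTag

import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala._

object CentroidSetterSketch {

  // Option 1: an implicit view Seq[A] => DataSet[A]. It needs an
  // ExecutionEnvironment in scope, which is why it would belong in the
  // Flink Scala API rather than in the algorithm itself.
  implicit def seqToDataSet[A](seq: Seq[A])
      (implicit env: ExecutionEnvironment,
       ti: TypeInformation[A],
       ct: ClassTag[A]): DataSet[A] =
    env.fromCollection(seq)

  // Option 2: an overloaded setter on the (hypothetical) learner.
  class KMeans(env: ExecutionEnvironment) {
    private var centroids: Option[DataSet[Array[Double]]] = None

    def setCentroids(ds: DataSet[Array[Double]]): KMeans = {
      centroids = Some(ds)
      this
    }

    // Wraps the Seq into a DataSet so client code can stay concise.
    def setCentroids(seq: Seq[Array[Double]]): KMeans =
      setCentroids(env.fromCollection(seq))
  }
}
{code}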

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Peter Schrott
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the data types 
 used have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and kMeans|| [2] algorithms constitute a better 
 implementation because they improve the initial seeding phase to achieve 
 near-optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1743) Add multinomial logistic regression to machine learning library

2015-05-14 Thread Alexander Alexandrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alexandrov updated FLINK-1743:

Assignee: (was: Alexander Alexandrov)

 Add multinomial logistic regression to machine learning library
 ---

 Key: FLINK-1743
 URL: https://issues.apache.org/jira/browse/FLINK-1743
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
  Labels: ML

 Multinomial logistic regression [1] would be a good first classification 
 algorithm, since it can classify multiple classes. 
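 For reference, the model assigns class probabilities via the softmax over 
 per-class weight vectors $w_1, \dots, w_K$:
 $P(y = k \mid x) = \exp(w_k^\top x) / \sum_{j=1}^{K} \exp(w_j^\top x)$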
 Resources:
 [1] [http://en.wikipedia.org/wiki/Multinomial_logistic_regression]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-14 Thread Alexander Alexandrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alexandrov updated FLINK-1731:

Assignee: (was: Alexander Alexandrov)

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the data types 
 used have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and kMeans|| [2] algorithms constitute a better 
 implementation because they improve the initial seeding phase to achieve 
 near-optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-14 Thread Peter Schrott (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543670#comment-14543670
 ] 

Peter Schrott commented on FLINK-1731:
--

Hi Flink people,

now that we have figured out how to pass in the initial centroids (via 
ParameterMap), there is still the open question of whether we should use a 
Sequence or a DataSet.
As I already mentioned, I am not sure about the side effects regarding 
parallelism when using the DataSet type.

Thanks for any advice.

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Peter Schrott
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the data types 
 used have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and kMeans|| [2] algorithms constitute a better 
 implementation because they improve the initial seeding phase to achieve 
 near-optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-14 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543643#comment-14543643
 ] 

Alexander Alexandrov commented on FLINK-1731:
-

[~peedeeX21] for some reason I cannot assign this to you directly. I cleared 
the assignee field so you can assign the issue to yourself. 

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the data types 
 used have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and kMeans|| [2] algorithms constitute a better 
 implementation because they improve the initial seeding phase to achieve 
 near-optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-14 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543734#comment-14543734
 ] 

Alexander Alexandrov edited comment on FLINK-1731 at 5/14/15 2:35 PM:
--

I would go with a {{DataSet}} for the centroids as well. That said, we can 
reduce syntax at the client side by providing either

- an overloaded {{setCentroids(Seq\[A\])}} setter, or
- an implicit converter of type {{Seq\[A\] => DataSet\[A\]}} (needs to be part 
of the Flink Scala API, could already be there) which allows passing a 
{{Seq\[A\]}} argument to a {{setCentroids(DataSet\[A\])}} setter.


was (Author: aalexandrov):
I would go with a {{DataSet}} for the centroids as well. That said, we can 
reduce syntax at the client side by providing either

- an implicit converter {{Seq\[A\] => DataSet\[A\]}} (needs to be part of 
the Flink Scala API, could already be there), or
- an overloaded {{setCentroids(Seq\[A\])}} setter.

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Peter Schrott
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the data types 
 used have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and kMeans|| [2] algorithms constitute a better 
 implementation because they improve the initial seeding phase to achieve 
 near-optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2013) Create generalized linear model framework

2015-05-14 Thread Theodore Vasiloudis (JIRA)
Theodore Vasiloudis created FLINK-2013:
--

 Summary: Create generalized linear model framework
 Key: FLINK-2013
 URL: https://issues.apache.org/jira/browse/FLINK-2013
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Theodore Vasiloudis
Assignee: Theodore Vasiloudis


[Generalized linear 
models|http://en.wikipedia.org/wiki/Generalized_linear_model] (GLMs) provide an 
abstraction for many learning models that can be used for regression and 
classification tasks.

Some example GLMs are linear regression, logistic regression, LASSO and ridge 
regression.

The goal for this JIRA is to provide interfaces for the set of common 
properties and functions these models share. 
The goal would be to have a design pattern similar to the one that 
[sklearn|http://scikit-learn.org/stable/modules/linear_model.html] and 
[MLlib|http://spark.apache.org/docs/1.3.0/mllib-linear-methods.html] use.
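As a rough illustration of the intended pattern, a sketch in Scala (all names 
are hypothetical, not the final flink-ml design):

{code}
import org.apache.flink.api.scala._

// A labeled training example; the concrete vector type is an open question.
case class LabeledVector(label: Double, features: Array[Double])

// Shared shape of a GLM in the fit/predict style of sklearn and MLlib.
trait GeneralizedLinearModel {

  // Concrete models (linear regression, logistic regression, LASSO, ridge)
  // differ only in their loss gradient and regularization term.
  protected def lossGradient(weights: Array[Double],
                             example: LabeledVector): Array[Double]

  // Learns a weight vector from the training data.
  def fit(training: DataSet[LabeledVector]): DataSet[Array[Double]]

  // Applies the learned weights to new inputs.
  def predict(weights: DataSet[Array[Double]],
              input: DataSet[Array[Double]]): DataSet[Double]
}
{code}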



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2015) Add ridge regression

2015-05-14 Thread Theodore Vasiloudis (JIRA)
Theodore Vasiloudis created FLINK-2015:
--

 Summary: Add ridge regression
 Key: FLINK-2015
 URL: https://issues.apache.org/jira/browse/FLINK-2015
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Theodore Vasiloudis
Priority: Minor


Ridge regression is a linear regression model that imposes an L2 penalty on 
the size of the coefficients.
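For reference, ridge regression minimizes the least-squares loss plus an L2 
penalty on the weight vector $w$, with $\lambda$ controlling the penalty 
strength:

$\min_w \; \lVert X w - y \rVert_2^2 + \lambda \lVert w \rVert_2^2$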



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-14 Thread Theodore Vasiloudis (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543711#comment-14543711
 ] 

Theodore Vasiloudis commented on FLINK-1731:


Since the centroids will have to be broadcast to all task managers, they will 
have to be placed inside a DataSet eventually.

One approach is to use a Sequence which is then converted into a DataSet 
inside the algorithm; the alternative is to require that the user provides a 
DataSet as a parameter.

In GradientDescent we use the second option, i.e. we expect a DataSet of 
weights; you could do the same with centroids.
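For illustration, a minimal sketch of how a centroid DataSet could be 
broadcast to the mapper that assigns points to centroids (all names are 
illustrative, not the actual flink-ml code):

{code}
import scala.collection.JavaConverters._

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration

// Tags each point with the index of its nearest centroid; the centroid
// DataSet is shipped to every task manager as a broadcast set.
def assignPoints(points: DataSet[Array[Double]],
                 centroids: DataSet[Array[Double]]): DataSet[(Int, Array[Double])] = {
  points.map(new RichMapFunction[Array[Double], (Int, Array[Double])] {
    private var cents: Seq[Array[Double]] = _

    override def open(parameters: Configuration): Unit = {
      // materialize the broadcast variable once per task
      cents = getRuntimeContext
        .getBroadcastVariable[Array[Double]]("centroids").asScala.toSeq
    }

    override def map(p: Array[Double]): (Int, Array[Double]) = {
      def sqDist(a: Array[Double], b: Array[Double]): Double =
        a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
      (cents.indices.minBy(i => sqDist(p, cents(i))), p)
    }
  }).withBroadcastSet(centroids, "centroids")
}
{code}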

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Peter Schrott
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the data types 
 used have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and kMeans|| [2] algorithms constitute a better 
 implementation because they improve the initial seeding phase to achieve 
 near-optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-14 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543647#comment-14543647
 ] 

Robert Metzger commented on FLINK-1731:
---

[~aalexandrov]: only users with Contributor permissions can be assigned to 
issues.
I made [~peedeeX21] a contributor and assigned him.

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Alexander Alexandrov
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the data types 
 used have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and kMeans|| [2] algorithms constitute a better 
 implementation because they improve the initial seeding phase to achieve 
 near-optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-14 Thread Peter Schrott (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Schrott reassigned FLINK-1731:


Assignee: Peter Schrott  (was: Alexander Alexandrov)

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Peter Schrott
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the data types 
 used have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and kMeans|| [2] algorithms constitute a better 
 implementation because they improve the initial seeding phase to achieve 
 near-optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-14 Thread Peter Schrott (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543648#comment-14543648
 ] 

Peter Schrott commented on FLINK-1731:
--

Great! Thanks!

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Alexander Alexandrov
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the data types 
 used have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and kMeans|| [2] algorithms constitute a better 
 implementation because they improve the initial seeding phase to achieve 
 near-optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-2013) Create generalized linear model framework

2015-05-14 Thread Theodore Vasiloudis (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Theodore Vasiloudis updated FLINK-2013:
---
Description: 
[Generalized linear 
models|http://en.wikipedia.org/wiki/Generalized_linear_model] (GLMs) provide an 
abstraction for many learning models that can be used for regression and 
classification tasks.

Some example GLMs are linear regression, logistic regression, LASSO and ridge 
regression.

The goal for this JIRA is to provide interfaces for the set of common 
properties and functions these models share. 
In order to achieve that, a design pattern similar to the one that 
[sklearn|http://scikit-learn.org/stable/modules/linear_model.html] and 
[MLlib|http://spark.apache.org/docs/1.3.0/mllib-linear-methods.html] employ 
will be used.

  was:
[Generalized linear 
models|http://en.wikipedia.org/wiki/Generalized_linear_model] (GLMs) provide an 
abstraction for many learning models that can be used for regression and 
classification tasks.

Some example GLMs are linear regression, logistic regression, LASSO and ridge 
regression.

The goal for this JIRA is to provide interfaces for the set of common 
properties and functions these models share. 
The goal would be to have a design pattern similar to the one that 
[sklearn|http://scikit-learn.org/stable/modules/linear_model.html] and 
[MLlib|http://spark.apache.org/docs/1.3.0/mllib-linear-methods.html] use.


 Create generalized linear model framework
 -

 Key: FLINK-2013
 URL: https://issues.apache.org/jira/browse/FLINK-2013
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Theodore Vasiloudis
Assignee: Theodore Vasiloudis
  Labels: ML

 [Generalized linear 
 models|http://en.wikipedia.org/wiki/Generalized_linear_model] (GLMs) provide 
 an abstraction for many learning models that can be used for regression and 
 classification tasks.
 Some example GLMs are linear regression, logistic regression, LASSO and ridge 
 regression.
 The goal for this JIRA is to provide interfaces for the set of common 
 properties and functions these models share. 
 In order to achieve that, a design pattern similar to the one that 
 [sklearn|http://scikit-learn.org/stable/modules/linear_model.html] and 
 [MLlib|http://spark.apache.org/docs/1.3.0/mllib-linear-methods.html] employ 
 will be used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2004] Fix memory leak in presence of fa...

2015-05-14 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/674

[FLINK-2004] Fix memory leak in presence of failed checkpoints for Kafka 
Source



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink flink2004

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/674.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #674


commit 27f11822b7db2716f3484def8ad350eb7e0b0893
Author: Robert Metzger rmetz...@apache.org
Date:   2015-05-14T09:45:30Z

[FLINK-2004] Fix memory leak in presence of failed checkpoints in Kafka 
source

commit 36cb4758c200713a97858989ac73f117186ed9dc
Author: Robert Metzger rmetz...@apache.org
Date:   2015-05-14T13:57:18Z

unused imports




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2004) Memory leak in presence of failed checkpoints in KafkaSource

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543782#comment-14543782
 ] 

ASF GitHub Bot commented on FLINK-2004:
---

GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/674

[FLINK-2004] Fix memory leak in presence of failed checkpoints for Kafka 
Source



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink flink2004

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/674.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #674


commit 27f11822b7db2716f3484def8ad350eb7e0b0893
Author: Robert Metzger rmetz...@apache.org
Date:   2015-05-14T09:45:30Z

[FLINK-2004] Fix memory leak in presence of failed checkpoints in Kafka 
source

commit 36cb4758c200713a97858989ac73f117186ed9dc
Author: Robert Metzger rmetz...@apache.org
Date:   2015-05-14T13:57:18Z

unused imports




 Memory leak in presence of failed checkpoints in KafkaSource
 

 Key: FLINK-2004
 URL: https://issues.apache.org/jira/browse/FLINK-2004
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Robert Metzger
Priority: Critical
 Fix For: 0.9


 Checkpoints that fail never send a commit message to the tasks.
 Maintaining a map of all pending checkpoints introduces a memory leak, as 
 entries for failed checkpoints will never be removed.
 Approaches to fix this:
   - The source cleans up entries from older checkpoints once a checkpoint is 
 committed (simple implementation in a linked hash map)
   - The commit message could include the optional state handle (source needs 
 not maintain the map)
   - The checkpoint coordinator could send messages for failed checkpoints?
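 A minimal sketch of the first approach (illustrative Scala, not the actual 
 Flink sources):
 {code}
 import scala.collection.mutable
 
 // Pending checkpoint state, kept in checkpoint order. When checkpoint N is
 // committed, N and everything older (including failed checkpoints, which
 // never receive a commit) are dropped, bounding the map's size.
 class PendingCheckpoints[S] {
   private val pending = mutable.LinkedHashMap[Long, S]()
 
   def onCheckpoint(id: Long, state: S): Unit =
     pending += (id -> state)
 
   def onCommit(committedId: Long): Option[S] = {
     val committed = pending.get(committedId)
     pending.retain { case (id, _) => id > committedId }
     committed
   }
 }
 {code}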



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2008] Fix broker failure test case

2015-05-14 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/675

[FLINK-2008] Fix broker failure test case



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink stephan_kafka

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/675.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #675


commit ad449cfd308559734daa493b34d5d40305972c82
Author: Robert Metzger rmetz...@apache.org
Date:   2015-05-13T07:34:37Z

[FLINK-2008] Fix broker failure test case




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1907) Scala Interactive Shell

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543858#comment-14543858
 ] 

ASF GitHub Bot commented on FLINK-1907:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30330756
  
--- Diff: flink-optimizer/pom.xml ---
@@ -58,6 +58,12 @@ under the License.
 			<artifactId>guava</artifactId>
 			<version>${guava.version}</version>
 		</dependency>
+		<dependency>
+			<groupId>org.apache.hadoop</groupId>
+			<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
+			<version>2.2.0</version>
+			<scope>test</scope>
+		</dependency>
--- End diff --

why is this dependency necessary in the optimizer?


 Scala Interactive Shell
 ---

 Key: FLINK-1907
 URL: https://issues.apache.org/jira/browse/FLINK-1907
 Project: Flink
  Issue Type: New Feature
  Components: Scala API
Reporter: Nikolaas Steenbergen
Assignee: Nikolaas Steenbergen
Priority: Minor

 Build an interactive Shell for the Scala api.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1990] [staging table] Support upper cas...

2015-05-14 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/667#issuecomment-102076699
  
Can you post a link to the failed run? Or is it on your local machine? Some 
of the streaming test cases fail sometimes right now; this is a known problem. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1990) Uppercase AS keyword not allowed in select expression

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543871#comment-14543871
 ] 

ASF GitHub Bot commented on FLINK-1990:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/667#issuecomment-102076699
  
Can you post a link to the failed run? Or is it on your local machine? Some 
of the streaming test cases fail sometimes right now; this is a known problem. 


 Uppercase AS keyword not allowed in select expression
 ---

 Key: FLINK-1990
 URL: https://issues.apache.org/jira/browse/FLINK-1990
 Project: Flink
  Issue Type: Bug
  Components: Table API
Affects Versions: 0.9
Reporter: Fabian Hueske
Assignee: Aljoscha Krettek
Priority: Minor
 Fix For: 0.9


 Table API select expressions do not allow an uppercase AS keyword.
 The following expression fails with an {{ExpressionException}}:
  {{table.groupBy("request").select("request", "request.count AS cnt")}}
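 Presumably the lowercase keyword parses fine, so a workaround until the fix 
 lands would be:
  {{table.groupBy("request").select("request", "request.count as cnt")}}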



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1525]Introduction of a small input para...

2015-05-14 Thread hsaputra
Github user hsaputra commented on a diff in the pull request:

https://github.com/apache/flink/pull/664#discussion_r30342062
  
--- Diff: 
flink-java/src/test/java/org/apache/flink/api/java/utils/ParameterToolTest.java 
---
@@ -0,0 +1,220 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.java.utils;
+
+import org.apache.flink.api.java.ClosureCleaner;
+import org.apache.flink.configuration.Configuration;
+import org.junit.Assert;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Map;
+import java.util.Properties;
+
+public class ParameterToolTest {
+
+    @Rule
+    public TemporaryFolder tmp = new TemporaryFolder();
+
+
+    // - Parser tests -
+
+    @Test(expected = RuntimeException.class)
+    public void testIllegalArgs() {
+        ParameterTool parameter = ParameterTool.fromArgs(new String[]{"berlin"});
+        Assert.assertEquals(0, parameter.getNumberOfParameters());
+    }
+
+    @Test
+    public void testNoVal() {
+        ParameterTool parameter = ParameterTool.fromArgs(new String[]{"-berlin"});
+        Assert.assertEquals(1, parameter.getNumberOfParameters());
+        Assert.assertTrue(parameter.has("berlin"));
+    }
+
+    @Test
+    public void testNoValDouble() {
+        ParameterTool parameter = ParameterTool.fromArgs(new String[]{"--berlin"});
+        Assert.assertEquals(1, parameter.getNumberOfParameters());
+        Assert.assertTrue(parameter.has("berlin"));
+    }
+
+    @Test
+    public void testMultipleNoVal() {
+        ParameterTool parameter = ParameterTool.fromArgs(new String[]{"--a", "--b", "--c", "--d", "--e", "--f"});
+        Assert.assertEquals(6, parameter.getNumberOfParameters());
+        Assert.assertTrue(parameter.has("a"));
+        Assert.assertTrue(parameter.has("b"));
+        Assert.assertTrue(parameter.has("c"));
+        Assert.assertTrue(parameter.has("d"));
+        Assert.assertTrue(parameter.has("e"));
+        Assert.assertTrue(parameter.has("f"));
+    }
+
+    @Test
+    public void testMultipleNoValMixed() {
+        ParameterTool parameter = ParameterTool.fromArgs(new String[]{"--a", "-b", "-c", "-d", "--e", "--f"});
+        Assert.assertEquals(6, parameter.getNumberOfParameters());
+        Assert.assertTrue(parameter.has("a"));
+        Assert.assertTrue(parameter.has("b"));
+        Assert.assertTrue(parameter.has("c"));
+        Assert.assertTrue(parameter.has("d"));
+        Assert.assertTrue(parameter.has("e"));
+        Assert.assertTrue(parameter.has("f"));
+    }
+
+    @Test(expected = IllegalArgumentException.class)
+    public void testEmptyVal() {
+        ParameterTool parameter = ParameterTool.fromArgs(new String[]{"--a", "-b", "--"});
+        Assert.assertEquals(2, parameter.getNumberOfParameters());
+        Assert.assertTrue(parameter.has("a"));
+        Assert.assertTrue(parameter.has("b"));
+    }
+
+    @Test(expected = IllegalArgumentException.class)
+    public void testEmptyValShort() {
+        ParameterTool parameter = ParameterTool.fromArgs(new String[]{"--a", "-b", "-"});
+        Assert.assertEquals(2, parameter.getNumberOfParameters());
+        Assert.assertTrue(parameter.has("a"));
+        Assert.assertTrue(parameter.has("b"));
+    }
+
+
+
+    /*@Test
--- End diff --

Do you want to keep this test as optional? If yes then it is better to have 
comments on why and when to uncomment this test. Otherwise, let's just remove 
it for now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature

[jira] [Commented] (FLINK-1727) Add decision tree to machine learning library

2015-05-14 Thread Sachin Goel (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544095#comment-14544095
 ] 

Sachin Goel commented on FLINK-1727:


The approach in [1] seems the most generic to implement. The major optimization 
in terms of time will come from the number of splits we perform for each 
attribute, which I think really depends on the data. But from previous 
experience, a histogram size of 1000 works okay. Perhaps we can later provide 
some sort of cross-validation to decide on the size?
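To make the histogram idea concrete, here is a minimal sketch of the online 
histogram from [1] (illustrative only; the size-1000 figure above is the bin 
budget {{maxBins}}):

{code}
import scala.collection.immutable.TreeMap

// Ben-Haim & Tom-Tov style online histogram: keeps at most maxBins
// (value, count) bins; adding a point may merge the two closest bins.
class OnlineHistogram(maxBins: Int) {
  private var bins = TreeMap.empty[Double, Long]

  def add(p: Double): Unit = {
    bins = bins.updated(p, bins.getOrElse(p, 0L) + 1L)
    if (bins.size > maxBins) mergeClosestBins()
  }

  private def mergeClosestBins(): Unit = {
    val keys = bins.keys.toIndexedSeq
    // index of the adjacent pair with the smallest gap
    val i = (0 until keys.size - 1).minBy(j => keys(j + 1) - keys(j))
    val (p1, m1) = (keys(i), bins(keys(i)))
    val (p2, m2) = (keys(i + 1), bins(keys(i + 1)))
    // replace the pair by its weighted centroid
    bins = (bins - p1 - p2).updated((p1 * m1 + p2 * m2) / (m1 + m2), m1 + m2)
  }
}
{code}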

 Add decision tree to machine learning library
 -

 Key: FLINK-1727
 URL: https://issues.apache.org/jira/browse/FLINK-1727
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Mikio Braun
  Labels: ML

 Decision trees are widely used for classification and regression tasks. Thus, 
 it would be worthwhile to add support for them to Flink's machine learning 
 library. 
 A streaming parallel decision tree learning algorithm has been proposed by 
 Ben-Haim and Tom-Tov [1]. This can maybe be adapted to a batch use case as 
 well. [2] contains an overview of different techniques for scaling inductive 
 learning algorithms up. A presentation of Spark's MLlib decision tree 
 implementation can be found in [3].
 Resources:
 [1] [http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf]
 [2] 
 [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.46.8226&rep=rep1&type=pdf]
 [3] 
 [http://spark-summit.org/wp-content/uploads/2014/07/Scalable-Distributed-Decision-Trees-in-Spark-Made-Das-Sparks-Talwalkar.pdf]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-14 Thread Theodore Vasiloudis (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544164#comment-14544164
 ] 

Theodore Vasiloudis commented on FLINK-1731:


Yeah, that might be the better option. The optimization framework is more 
developer-oriented, but since kMeans is mostly aimed at practitioners it would 
be better to abstract away the complexity.

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Peter Schrott
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the data types 
 used have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and kMeans|| [2] algorithms constitute a better 
 implementation because they improve the initial seeding phase to achieve 
 near-optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544069#comment-14544069
 ] 

ASF GitHub Bot commented on FLINK-1525:
---

Github user hsaputra commented on a diff in the pull request:

https://github.com/apache/flink/pull/664#discussion_r30342062
  
--- Diff: 
flink-java/src/test/java/org/apache/flink/api/java/utils/ParameterToolTest.java 
---
@@ -0,0 +1,220 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.java.utils;
+
+import org.apache.flink.api.java.ClosureCleaner;
+import org.apache.flink.configuration.Configuration;
+import org.junit.Assert;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Map;
+import java.util.Properties;
+
+public class ParameterToolTest {
+
+    @Rule
+    public TemporaryFolder tmp = new TemporaryFolder();
+
+
+    // - Parser tests -
+
+    @Test(expected = RuntimeException.class)
+    public void testIllegalArgs() {
+        ParameterTool parameter = ParameterTool.fromArgs(new String[]{"berlin"});
+        Assert.assertEquals(0, parameter.getNumberOfParameters());
+    }
+
+    @Test
+    public void testNoVal() {
+        ParameterTool parameter = ParameterTool.fromArgs(new String[]{"-berlin"});
+        Assert.assertEquals(1, parameter.getNumberOfParameters());
+        Assert.assertTrue(parameter.has("berlin"));
+    }
+
+    @Test
+    public void testNoValDouble() {
+        ParameterTool parameter = ParameterTool.fromArgs(new String[]{"--berlin"});
+        Assert.assertEquals(1, parameter.getNumberOfParameters());
+        Assert.assertTrue(parameter.has("berlin"));
+    }
+
+    @Test
+    public void testMultipleNoVal() {
+        ParameterTool parameter = ParameterTool.fromArgs(new String[]{"--a", "--b", "--c", "--d", "--e", "--f"});
+        Assert.assertEquals(6, parameter.getNumberOfParameters());
+        Assert.assertTrue(parameter.has("a"));
+        Assert.assertTrue(parameter.has("b"));
+        Assert.assertTrue(parameter.has("c"));
+        Assert.assertTrue(parameter.has("d"));
+        Assert.assertTrue(parameter.has("e"));
+        Assert.assertTrue(parameter.has("f"));
+    }
+
+    @Test
+    public void testMultipleNoValMixed() {
+        ParameterTool parameter = ParameterTool.fromArgs(new String[]{"--a", "-b", "-c", "-d", "--e", "--f"});
+        Assert.assertEquals(6, parameter.getNumberOfParameters());
+        Assert.assertTrue(parameter.has("a"));
+        Assert.assertTrue(parameter.has("b"));
+        Assert.assertTrue(parameter.has("c"));
+        Assert.assertTrue(parameter.has("d"));
+        Assert.assertTrue(parameter.has("e"));
+        Assert.assertTrue(parameter.has("f"));
+    }
+
+    @Test(expected = IllegalArgumentException.class)
+    public void testEmptyVal() {
+        ParameterTool parameter = ParameterTool.fromArgs(new String[]{"--a", "-b", "--"});
+        Assert.assertEquals(2, parameter.getNumberOfParameters());
+        Assert.assertTrue(parameter.has("a"));
+        Assert.assertTrue(parameter.has("b"));
+    }
+
+    @Test(expected = IllegalArgumentException.class)
+    public void testEmptyValShort() {
+        ParameterTool parameter = ParameterTool.fromArgs(new String[]{"--a", "-b", "-"});
+        Assert.assertEquals(2, parameter.getNumberOfParameters());
+        Assert.assertTrue(parameter.has("a"));
+        Assert.assertTrue(parameter.has("b"));
+    }
+
+
+
+    /*@Test
--- End diff --

Do you want to keep this test as optional? If yes then it is better to have 

[GitHub] flink pull request: [FLINK-1525]Introduction of a small input para...

2015-05-14 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-102111410
  
Hi @rmetzger, just did a pass and other than comments about unused test and 
broken Travis due to check style I think this PR is ready to go. +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544074#comment-14544074
 ] 

ASF GitHub Bot commented on FLINK-1525:
---

Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-102111410
  
Hi @rmetzger, just did a pass and other than comments about unused test and 
broken Travis due to check style I think this PR is ready to go. +1


 Provide utils to pass -D parameters to UDFs 
 

 Key: FLINK-1525
 URL: https://issues.apache.org/jira/browse/FLINK-1525
 Project: Flink
  Issue Type: Improvement
  Components: flink-contrib
Reporter: Robert Metzger
Assignee: Robert Metzger
  Labels: starter

 Hadoop users are used to setting job configuration through -D on the 
 command line.
 Right now, Flink users have to manually parse command line arguments and pass 
 them to the methods.
 It would be nice to provide a standard args parser which takes care of 
 such stuff.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1727) Add decision tree to machine learning library

2015-05-14 Thread Sachin Goel (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544091#comment-14544091
 ] 

Sachin Goel commented on FLINK-1727:


I would like to work on this. I'm already halfway through the implementation.

 Add decision tree to machine learning library
 -

 Key: FLINK-1727
 URL: https://issues.apache.org/jira/browse/FLINK-1727
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Mikio Braun
  Labels: ML

 Decision trees are widely used for classification and regression tasks. Thus, 
 it would be worthwhile to add support for them to Flink's machine learning 
 library. 
 A streaming parallel decision tree learning algorithm has been proposed by 
 Ben-Haim and Tom-Tov [1]. This can maybe be adapted to a batch use case as 
 well. [2] contains an overview of different techniques for scaling inductive 
 learning algorithms up. A presentation of Spark's MLlib decision tree 
 implementation can be found in [3].
 Resources:
 [1] [http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf]
 [2] 
 [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.46.8226&rep=rep1&type=pdf]
 [3] 
 [http://spark-summit.org/wp-content/uploads/2014/07/Scalable-Distributed-Decision-Trees-in-Spark-Made-Das-Sparks-Talwalkar.pdf]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2014) Add LASSO regression

2015-05-14 Thread Theodore Vasiloudis (JIRA)
Theodore Vasiloudis created FLINK-2014:
--

 Summary: Add LASSO regression
 Key: FLINK-2014
 URL: https://issues.apache.org/jira/browse/FLINK-2014
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Theodore Vasiloudis


LASSO is a linear model that uses L1 regularization.
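For reference, LASSO minimizes the least-squares loss plus an L1 penalty on 
the weight vector $w$, which drives some coefficients exactly to zero:

$\min_w \; \lVert X w - y \rVert_2^2 + \lambda \lVert w \rVert_1$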



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2008) PersistentKafkaSource is sometimes emitting tuples multiple times

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543824#comment-14543824
 ] 

ASF GitHub Bot commented on FLINK-2008:
---

GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/675

[FLINK-2008] Fix broker failure test case



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink stephan_kafka

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/675.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #675


commit ad449cfd308559734daa493b34d5d40305972c82
Author: Robert Metzger rmetz...@apache.org
Date:   2015-05-13T07:34:37Z

[FLINK-2008] Fix broker failure test case




 PersistentKafkaSource is sometimes emitting tuples multiple times
 -

 Key: FLINK-2008
 URL: https://issues.apache.org/jira/browse/FLINK-2008
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector, Streaming
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger

 The PersistentKafkaSource is expected to emit records exactly once.
 Two test cases of the KafkaITCase are sporadically failing because records 
 are emitted multiple times.
 Affected tests:
 {{testPersistentSourceWithOffsetUpdates()}}, after the offsets have been 
 changed manually in ZK:
 {code}
 java.lang.RuntimeException: Expected v to be 3, but was 4 on element 0 
 array=[4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
 2]
 {code}
 {{brokerFailureTest()}} also fails:
 {code}
 05/13/2015 08:13:16   Custom source -> Stream Sink(1/1) switched to FAILED 
 java.lang.AssertionError: Received tuple with value 21 twice
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertFalse(Assert.java:64)
   at 
 org.apache.flink.streaming.connectors.kafka.KafkaITCase$15.invoke(KafkaITCase.java:877)
   at 
 org.apache.flink.streaming.connectors.kafka.KafkaITCase$15.invoke(KafkaITCase.java:859)
   at 
 org.apache.flink.streaming.api.operators.StreamSink.callUserFunction(StreamSink.java:39)
   at 
 org.apache.flink.streaming.api.operators.StreamOperator.callUserFunctionAndLogException(StreamOperator.java:137)
   at 
 org.apache.flink.streaming.api.operators.ChainableStreamOperator.collect(ChainableStreamOperator.java:54)
   at 
 org.apache.flink.streaming.api.collector.CollectorWrapper.collect(CollectorWrapper.java:39)
   at 
 org.apache.flink.streaming.connectors.kafka.api.persistent.PersistentKafkaSource.run(PersistentKafkaSource.java:173)
   at 
 org.apache.flink.streaming.api.operators.StreamSource.callUserFunction(StreamSource.java:40)
   at 
 org.apache.flink.streaming.api.operators.StreamOperator.callUserFunctionAndLogException(StreamOperator.java:137)
   at 
 org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:34)
   at 
 org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:139)
   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
   at java.lang.Thread.run(Thread.java:745)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1907] Scala shell

2015-05-14 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30331086
  
--- Diff: 
flink-staging/flink-scala-shell/src/main/java/org.apache.flink/api/java/ScalaShellRemoteEnvironment.java
 ---
@@ -0,0 +1,84 @@
+
--- End diff --

missing apache header


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1398] Introduce extractSingleField() in...

2015-05-14 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/308#issuecomment-102077233
  
I think the consensus was that we don't want to have such a method in the 
DataSet API. We can, however, put a utility for this in flink-contrib. This 
utility should work for both batch and streaming? Any other opinions? Please 
correct me if I'm wrong.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1907) Scala Interactive Shell

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543876#comment-14543876
 ] 

ASF GitHub Bot commented on FLINK-1907:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30331698
  
--- Diff: 
flink-staging/flink-scala-shell/src/main/scala/org.apache.flink/api/scala/FlinkShell.scala
 ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.scala
+
+
+import scala.tools.nsc.Settings
+
+import org.apache.flink.configuration.Configuration
+import org.apache.flink.runtime.minicluster.LocalFlinkMiniCluster
+
+/**
+ * Created by Nikolaas Steenbergen on 22-4-15.
--- End diff --

we don't put author's names into the comments (shared code ownership)


 Scala Interactive Shell
 ---

 Key: FLINK-1907
 URL: https://issues.apache.org/jira/browse/FLINK-1907
 Project: Flink
  Issue Type: New Feature
  Components: Scala API
Reporter: Nikolaas Steenbergen
Assignee: Nikolaas Steenbergen
Priority: Minor

 Build an interactive Shell for the Scala api.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1398) A new DataSet function: extractElementFromTuple

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543875#comment-14543875
 ] 

ASF GitHub Bot commented on FLINK-1398:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/308#issuecomment-102077233
  
I think the consensus was that we don't want to have such a method in the 
DataSet API. We can, however, put a utility for this in flink-contrib. This 
utility should work for both batch and streaming? Any other opinions? Please 
correct me if I'm wrong.


 A new DataSet function: extractElementFromTuple
 ---

 Key: FLINK-1398
 URL: https://issues.apache.org/jira/browse/FLINK-1398
 Project: Flink
  Issue Type: Wish
Reporter: Felix Neutatz
Assignee: Felix Neutatz
Priority: Minor

 This is the use case:
 {code:xml}
 DataSet<Tuple2<Integer, Double>> data = env.fromElements(
 new Tuple2<Integer, Double>(1, 2.0));
 
 data.map(new ElementFromTuple());
 
 public static final class ElementFromTuple implements 
 MapFunction<Tuple2<Integer, Double>, Double> {
 @Override
 public Double map(Tuple2<Integer, Double> value) {
 return value.f1;
 }
 }
 {code}
 It would be awesome if we had something like this:
 {code:xml}
 data.extractElement(1);
 {code}
 This means that we implement a function for DataSet which extracts a certain 
 element from a given Tuple.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1907] Scala shell

2015-05-14 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30331698
  
--- Diff: 
flink-staging/flink-scala-shell/src/main/scala/org.apache.flink/api/scala/FlinkShell.scala
 ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.scala
+
+
+import scala.tools.nsc.Settings
+
+import org.apache.flink.configuration.Configuration
+import org.apache.flink.runtime.minicluster.LocalFlinkMiniCluster
+
+/**
+ * Created by Nikolaas Steenbergen on 22-4-15.
--- End diff --

we don't put author's names into the comments (shared code ownership)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1907] Scala shell

2015-05-14 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30331377
  
--- Diff: 
flink-staging/flink-scala-shell/src/main/scala/org.apache.flink/api/scala/FlinkILoop.scala
 ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.scala
+
+import java.io.{BufferedReader, File, FileOutputStream}
+
+import scala.tools.nsc.Settings
+import scala.tools.nsc.interpreter._
+
+import org.apache.flink.api.java.ScalaShellRemoteEnvironment
+import org.apache.flink.util.AbstractID
+
+/**
+ * Created by Nikolaas Steenbergen on 16-4-15.
+ */
+class FlinkILoop(val host: String,
+ val port: Int,
+ in0: Option[BufferedReader],
+ out0: JPrintWriter)
+  extends ILoop(in0, out0) {
+
+  def this(host:String, port:Int, in0: BufferedReader, out: JPrintWriter){
+this(host:String, port:Int, Some(in0), out)
+  }
+
+  def this(host:String, port:Int){
+this(host:String,port: Int,None, new JPrintWriter(Console.out, true))
+  }
+  // remote environment
+  private val remoteEnv: ScalaShellRemoteEnvironment = {
+val remoteEnv = new ScalaShellRemoteEnvironment(host, port, this)
+remoteEnv
+  }
+
+  // local environment
+  val scalaEnv: ExecutionEnvironment = {
+val scalaEnv = new ExecutionEnvironment(remoteEnv)
+scalaEnv
+  }
+
+
+  /**
+   * we override the process (initialization) method to
+   * insert Flink related stuff for not using a file for input.
+   */
+
+  /** Create a new interpreter. */
+  override def createInterpreter() {
+if (addedClasspath != "")
+{
+  settings.classpath append addedClasspath
+}
+intp = new ILoopInterpreter
+intp.quietRun("import org.apache.flink.api.scala._")
+intp.quietRun("import org.apache.flink.api.common.functions._")
+intp.bind("env", this.scalaEnv)
+  }
+
+
+
+  /**
+   * creates a temporary directory to store compiled console files
+   */
+  private val tmpDirBase: File = {
+// get unique temporary folder:
+val abstractID: String = new AbstractID().toString
+val tmpDir: File = new File(
+  System.getProperty("java.io.tmpdir"),
+  "scala_shell_tmp-" + abstractID)
+if (!tmpDir.exists) {
+  tmpDir.mkdir
+}
+tmpDir
+  }
+
+  // scala_shell commands
+  private val tmpDirShell: File = {
+new File(tmpDirBase, "scala_shell_commands")
+  }
+
+  // scala shell jar file name
+  private val tmpJarShell: File = {
+new File(tmpDirBase, "scala_shell_commands.jar")
+  }
+
+
+  /**
+   * writes contents of the compiled lines that have been executed in the 
shell into a
+   * physical directory: creates a unique temporary directory
+   */
+  def writeFilesToDisk(): Unit = {
+val vd = intp.virtualDirectory
+
+var vdIt = vd.iterator
+
+for (fi <- vdIt) {
+  if (fi.isDirectory) {
+
+var fiIt = fi.iterator
+
+for (f <- fiIt) {
+
+  // directory for compiled line
+  val lineDir = new File(tmpDirShell.getAbsolutePath, fi.name)
+  lineDir.mkdirs()
+
+  // compiled classes for commands from shell
+  val writeFile = new File(lineDir.getAbsolutePath, f.name)
+  val outputStream = new FileOutputStream(writeFile)
+  val inputStream = f.input
+
+  // copy file contents
+  org.apache.commons.io.IOUtils.copy(inputStream, outputStream)
+
+  inputStream.close()
+  outputStream.close()
+}
+  }
+}
+  }
+
+  /**
+   * CUSTOM START METHODS OVERRIDE:
+   */
+  override def prompt = "Scala-Flink> "
+
+  /**
+   * custom welcome message
+   

[jira] [Commented] (FLINK-1907) Scala Interactive Shell

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543869#comment-14543869
 ] 

ASF GitHub Bot commented on FLINK-1907:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30331377
  
--- Diff: 
flink-staging/flink-scala-shell/src/main/scala/org.apache.flink/api/scala/FlinkILoop.scala
 ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.scala
+
+import java.io.{BufferedReader, File, FileOutputStream}
+
+import scala.tools.nsc.Settings
+import scala.tools.nsc.interpreter._
+
+import org.apache.flink.api.java.ScalaShellRemoteEnvironment
+import org.apache.flink.util.AbstractID
+
+/**
+ * Created by Nikolaas Steenbergen on 16-4-15.
+ */
+class FlinkILoop(val host: String,
+ val port: Int,
+ in0: Option[BufferedReader],
+ out0: JPrintWriter)
+  extends ILoop(in0, out0) {
+
+  def this(host: String, port: Int, in0: BufferedReader, out: JPrintWriter) {
+    this(host, port, Some(in0), out)
+  }
+
+  def this(host: String, port: Int) {
+    this(host, port, None, new JPrintWriter(Console.out, true))
+  }
+  // remote environment
+  private val remoteEnv: ScalaShellRemoteEnvironment = {
+    val remoteEnv = new ScalaShellRemoteEnvironment(host, port, this)
+    remoteEnv
+  }
+
+  // local environment
+  val scalaEnv: ExecutionEnvironment = {
+    val scalaEnv = new ExecutionEnvironment(remoteEnv)
+    scalaEnv
+  }
+
+
+  /**
+   * we override the process (initialization) method to
+   * insert Flink related stuff for not using a file for input.
+   */
+
+  /** Create a new interpreter. */
+  override def createInterpreter() {
+    if (addedClasspath != "") {
+      settings.classpath append addedClasspath
+    }
+    intp = new ILoopInterpreter
+    intp.quietRun("import org.apache.flink.api.scala._")
+    intp.quietRun("import org.apache.flink.api.common.functions._")
+    intp.bind("env", this.scalaEnv)
+  }
+
+
+
+  /**
+   * creates a temporary directory to store compiled console files
+   */
+  private val tmpDirBase: File = {
+    // get unique temporary folder:
+    val abstractID: String = new AbstractID().toString
+    val tmpDir: File = new File(
+      System.getProperty("java.io.tmpdir"),
+      "scala_shell_tmp-" + abstractID)
+    if (!tmpDir.exists) {
+      tmpDir.mkdir
+    }
+    tmpDir
+  }
+
+  // scala_shell commands
+  private val tmpDirShell: File = {
+    new File(tmpDirBase, "scala_shell_commands")
+  }
+
+  // scala shell jar file name
+  private val tmpJarShell: File = {
+    new File(tmpDirBase, "scala_shell_commands.jar")
+  }
+
+
+  /**
+   * writes contents of the compiled lines that have been executed in the shell into a
+   * physical directory: creates a unique temporary directory
+   */
+  def writeFilesToDisk(): Unit = {
+    val vd = intp.virtualDirectory
+
+    val vdIt = vd.iterator
+
+    for (fi <- vdIt) {
+      if (fi.isDirectory) {
+
+        val fiIt = fi.iterator
+
+        for (f <- fiIt) {
+
+          // directory for compiled line
+          val lineDir = new File(tmpDirShell.getAbsolutePath, fi.name)
+          lineDir.mkdirs()
+
+          // compiled classes for commands from shell
+          val writeFile = new File(lineDir.getAbsolutePath, f.name)
+          val outputStream = new FileOutputStream(writeFile)
+          val inputStream = f.input
+
+          // copy file contents
+          org.apache.commons.io.IOUtils.copy(inputStream, outputStream)
+
+  

[GitHub] flink pull request: [FLINK-1907] Scala shell

2015-05-14 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/672#issuecomment-102069667
  
Amazing! I'm super excited to see this finally implemented.

I'll soon review the changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1907) Scala Interactive Shell

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543825#comment-14543825
 ] 

ASF GitHub Bot commented on FLINK-1907:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/672#issuecomment-102069667
  
Amazing! I'm super excited to see this finally implemented.

I'll soon review the changes.


 Scala Interactive Shell
 ---

 Key: FLINK-1907
 URL: https://issues.apache.org/jira/browse/FLINK-1907
 Project: Flink
  Issue Type: New Feature
  Components: Scala API
Reporter: Nikolaas Steenbergen
Assignee: Nikolaas Steenbergen
Priority: Minor

 Build an interactive Shell for the Scala api.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1907) Scala Interactive Shell

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543829#comment-14543829
 ] 

ASF GitHub Bot commented on FLINK-1907:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30330062
  
--- Diff: docs/scala_shell_quickstart.md ---
@@ -0,0 +1,72 @@
+---
+title: "Quickstart: Scala Shell"
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+Start working on your Flink Scala program in a few simple steps.
+
+## Startup Flink interactive Scala shell
+
+Flink has an integrated interactive scala shell.
+It can be used in a local setup as well as in a cluster setup.
+
+To use it in a local setup just execute:
+
+__Sample Input__:
+~~~bash
+flink/bin/start-scala-shell.sh 
+~~~
+
+And it will initialize a local JobManager by itself.
+
+To use it in a cluster setup you can supply the host and port of the 
JobManager with:
+
+__Sample Input__:
+~~~bash
+flink/bin/start-scala-shell.sh -host <hostname> -port <portnumber>
+~~~
+
+
+## Usage
+
+The shell will prebind the ExecutionEnvironment as "env"; so far only batch mode is supported.
+
+The following example will execute the wordcount program in the scala 
shell:
+
+~~~scala
+Flink-Shell> val text = env.fromElements("To be, or not to be,--that is the question:--", "Whether 'tis nobler in the mind to suffer, The slings and arrows of outrageous fortune,", "Or to take arms against a sea of troubles,")
+Flink-Shell> val counts = text.flatMap { _.toLowerCase.split("\\W+") }.map { (_, 1) }.groupBy(0).sum(1)
+Flink-Shell> counts.print()
+~~~
+
+
+The print() command will automatically send the specified tasks to the 
JobManager for execution and will show the result of the computation in the 
terminal.
+
+It is possible to write results to a file, like in the standard Scala api. However, in this case you need to call the following to run your program:
+
+~~~scala
+Flink-Shell> env.execute("MyProgram")
+~~~
+
+The Flink Shell comes with command history and autocompletion.
--- End diff --

The file is using `Scala` in upper and lowercase variants. I would make 
them all uppercase.


 Scala Interactive Shell
 ---

 Key: FLINK-1907
 URL: https://issues.apache.org/jira/browse/FLINK-1907
 Project: Flink
  Issue Type: New Feature
  Components: Scala API
Reporter: Nikolaas Steenbergen
Assignee: Nikolaas Steenbergen
Priority: Minor

 Build an interactive Shell for the Scala api.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1907] Scala shell

2015-05-14 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30330465
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java 
---
@@ -1337,11 +1336,16 @@ public long count() throws Exception {
/**
 	 * Writes a DataSet to the standard output stream (stdout).<br/>
 	 * For each element of the DataSet the result of {@link Object#toString()} is written.
-	 *
-	 * @return The DataSink that writes the DataSet.
 	 */
-	public DataSink<T> print() {
-		return output(new PrintingOutputFormat<T>(false));
+	public void print() {
+		try {
+			List<T> elements = this.collect();
+			for (T e: elements) {
+				System.out.println(e);
+			}
+		} catch (Exception e) {
+			System.out.println("Could not retrieve values for printing: " + e);
--- End diff --

This prints only the message. I would suggest stringifying the exception, 
because the error messages coming from `collect()` might come from the remote 
cluster.
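
Something along these lines would do (just a hedged sketch of the idea in Scala; 
the actual patch is Java, and the helper name here is made up):

```scala
import java.io.{PrintWriter, StringWriter}

// Hypothetical helper: render the full stack trace, including causes
// raised on the remote cluster, instead of only e.toString().
def stringifyException(e: Throwable): String = {
  val sw = new StringWriter()
  e.printStackTrace(new PrintWriter(sw, true))
  sw.toString
}
```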


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1907) Scala Interactive Shell

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543859#comment-14543859
 ] 

ASF GitHub Bot commented on FLINK-1907:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30330874
  
--- Diff: 
flink-scala/src/main/scala/org/apache/flink/api/scala/DataSet.scala ---
@@ -1327,8 +1327,8 @@ class DataSet[T: ClassTag](set: JavaDataSet[T]) {
* Writes a DataSet to the standard output stream (stdout). This uses 
[[AnyRef.toString]] on
* each element.
--- End diff --

the scaladoc here should also mention that an execution is triggered


 Scala Interactive Shell
 ---

 Key: FLINK-1907
 URL: https://issues.apache.org/jira/browse/FLINK-1907
 Project: Flink
  Issue Type: New Feature
  Components: Scala API
Reporter: Nikolaas Steenbergen
Assignee: Nikolaas Steenbergen
Priority: Minor

 Build an interactive Shell for the Scala api.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1907) Scala Interactive Shell

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543862#comment-14543862
 ] 

ASF GitHub Bot commented on FLINK-1907:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30330912
  
--- Diff: flink-staging/flink-scala-shell/pom.xml ---
@@ -0,0 +1,246 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+	<modelVersion>4.0.0</modelVersion>
+
+	<parent>
+		<groupId>org.apache.flink</groupId>
+		<artifactId>flink-staging</artifactId>
+		<version>0.9-SNAPSHOT</version>
+		<relativePath>..</relativePath>
+	</parent>
+
+	<artifactId>flink-scala-shell</artifactId>
+	<name>flink-scala-shell</name>
+
+	<packaging>jar</packaging>
+
+	<dependencies>
+
+		<!-- scala command line parsing -->
+		<dependency>
+			<groupId>com.github.scopt</groupId>
+            <artifactId>scopt_${scala.binary.version}</artifactId>
+		</dependency>
--- End diff --

space / tab mixed indentation.
Please use tabs.


 Scala Interactive Shell
 ---

 Key: FLINK-1907
 URL: https://issues.apache.org/jira/browse/FLINK-1907
 Project: Flink
  Issue Type: New Feature
  Components: Scala API
Reporter: Nikolaas Steenbergen
Assignee: Nikolaas Steenbergen
Priority: Minor

 Build an interactive Shell for the Scala api.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1907] Scala shell

2015-05-14 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30330874
  
--- Diff: 
flink-scala/src/main/scala/org/apache/flink/api/scala/DataSet.scala ---
@@ -1327,8 +1327,8 @@ class DataSet[T: ClassTag](set: JavaDataSet[T]) {
* Writes a DataSet to the standard output stream (stdout). This uses 
[[AnyRef.toString]] on
* each element.
--- End diff --

the scaladoc here should also mention that an execution is triggered


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1907] Scala shell

2015-05-14 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30330912
  
--- Diff: flink-staging/flink-scala-shell/pom.xml ---
@@ -0,0 +1,246 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+	<modelVersion>4.0.0</modelVersion>
+
+	<parent>
+		<groupId>org.apache.flink</groupId>
+		<artifactId>flink-staging</artifactId>
+		<version>0.9-SNAPSHOT</version>
+		<relativePath>..</relativePath>
+	</parent>
+
+	<artifactId>flink-scala-shell</artifactId>
+	<name>flink-scala-shell</name>
+
+	<packaging>jar</packaging>
+
+	<dependencies>
+
+		<!-- scala command line parsing -->
+		<dependency>
+			<groupId>com.github.scopt</groupId>
+            <artifactId>scopt_${scala.binary.version}</artifactId>
+		</dependency>
--- End diff --

space / tab mixed indentation.
Please use tabs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1907] Scala shell

2015-05-14 Thread nikste
Github user nikste commented on the pull request:

https://github.com/apache/flink/pull/672#issuecomment-102086481
  
thanks for the comments Robert, I'll fix the stuff tomorrow!
Indeed, the Scala shell itself is not that much code; most of the changes are 
caused by the change of ```print()```.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1907) Scala Interactive Shell

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543936#comment-14543936
 ] 

ASF GitHub Bot commented on FLINK-1907:
---

Github user nikste commented on the pull request:

https://github.com/apache/flink/pull/672#issuecomment-102086481
  
thanks for the comments Robert, I'll fix the stuff tomorrow!
Indeed, the Scala shell itself is not that much code; most of the changes are 
caused by the change of ```print()```.


 Scala Interactive Shell
 ---

 Key: FLINK-1907
 URL: https://issues.apache.org/jira/browse/FLINK-1907
 Project: Flink
  Issue Type: New Feature
  Components: Scala API
Reporter: Nikolaas Steenbergen
Assignee: Nikolaas Steenbergen
Priority: Minor

 Build an interactive Shell for the Scala api.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1711] - Converted all usages of Commons...

2015-05-14 Thread lokeshrajaram
GitHub user lokeshrajaram reopened a pull request:

https://github.com/apache/flink/pull/673

[FLINK-1711] - Converted all usages of Commons Validate to Guava Checks (for 
Java classes), Scala predef require (for Scala classes)

[FLINK-1711] - Converted all usages of Commons Validate to Guava Checks (for 
Java classes), Scala predef require (for Scala classes)
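
As a hedged illustration of the conversion pattern (example names only, not code 
from this PR):

```scala
// Java classes:  org.apache.commons.lang3.Validate.notNull(x)
//                  -> com.google.common.base.Preconditions.checkNotNull(x)
// Scala classes: Validate.isTrue(cond, msg) -> Predef.require(cond, msg)
def connect(host: String, port: Int): Unit = {
  require(host != null, "host must not be null") // Scala Predef.require
  require(port > 0, "port must be positive")
}
```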

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lokeshrajaram/flink all_guava

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/673.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #673


commit 04e1695d3b8414616216264a5b0972d762664ec7
Author: lrajaram lokesh_raja...@intuit.com
Date:   2015-05-10T01:57:36Z

converted all usages of Commons Validate to Guava Checks

commit 4f68d03d50d0fab47f5067906ec805f4a8b93cfa
Author: lrajaram lokesh_raja...@intuit.com
Date:   2015-05-14T02:29:03Z

converted all usages of commons validate to guava checks(for Java classes), 
scala predef require(for scala classes)

commit 1ecf70952a75728a2e2b9ae70e8f2c66ca9d337a
Author: lrajaram lokesh_raja...@intuit.com
Date:   2015-05-14T14:43:03Z

added guava dependency for flink-spargel module




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1711] - Converted all usages of Commons...

2015-05-14 Thread lokeshrajaram
Github user lokeshrajaram closed the pull request at:

https://github.com/apache/flink/pull/673


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (FLINK-1949) YARNSessionFIFOITCase sometimes fails to detect when the detached session finishes

2015-05-14 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-1949.
---
   Resolution: Pending Closed
Fix Version/s: 0.9

http://git-wip-us.apache.org/repos/asf/flink/commit/bd7d8679

 YARNSessionFIFOITCase sometimes fails to detect when the detached session 
 finishes
 --

 Key: FLINK-1949
 URL: https://issues.apache.org/jira/browse/FLINK-1949
 Project: Flink
  Issue Type: Bug
  Components: Tests, YARN Client
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger
 Fix For: 0.9


 {code}
 10:32:24,393 INFO  org.apache.flink.yarn.YARNSessionFIFOITCase
- CLI Frontend has returned, so the job is running
 10:32:24,398 INFO  org.apache.flink.yarn.YARNSessionFIFOITCase
- waiting for the job with appId application_1430130687160_0003 to finish
 10:32:24,629 INFO  org.apache.flink.yarn.YARNSessionFIFOITCase
- The job has finished. TaskManager output file found 
 /home/travis/build/tillrohrmann/flink/flink-yarn-tests/../flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1430130687160_0003/container_1430130687160_0003_01_02/taskmanager-stdout.log
 10:32:24,630 WARN  org.apache.flink.yarn.YARNSessionFIFOITCase
- Error while detached yarn session was running
 java.lang.AssertionError: Expected string '(all,2)' not found in string ''
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at 
 org.apache.flink.yarn.YARNSessionFIFOITCase.testDetachedPerJobYarnClusterInternal(YARNSessionFIFOITCase.java:504)
   at 
 org.apache.flink.yarn.YARNSessionFIFOITCase.testDetachedPerJobYarnClusterWithStreamingJob(YARNSessionFIFOITCase.java:563)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:483)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
 {code}
 https://flink.a.o.uce.east.s3.amazonaws.com/travis-artifacts/tillrohrmann/flink/442/442.5.tar.gz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1907) Scala Interactive Shell

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543827#comment-14543827
 ] 

ASF GitHub Bot commented on FLINK-1907:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30329898
  
--- Diff: docs/scala_shell_quickstart.md ---
@@ -0,0 +1,72 @@
+---
+title: "Quickstart: Scala Shell"
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+Start working on your Flink Scala program in a few simple steps.
+
+## Startup Flink interactive Scala shell
+
+Flink has an integrated interactive scala shell.
+It can be used in a local setup as well as in a cluster setup.
+
+To use it in a local setup just execute:
+
+__Sample Input__:
+~~~bash
+flink/bin/start-scala-shell.sh 
+~~~
+
+And it will initialize a local JobManager by itself.
+
+To use it in a cluster setup you can supply the host and port of the 
JobManager with:
+
+__Sample Input__:
+~~~bash
+flink/bin/start-scala-shell.sh -host <hostname> -port <portnumber>
--- End diff --

wouldn't it be easier to just pass `hostname:port` to the script?

Are there any other flags available?
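
For illustration, parsing a single argument could look like this (hedged 
sketch, not code from the PR):

```scala
// Split a "hostname:port" argument into its two parts.
def parseHostPort(arg: String): Option[(String, Int)] =
  arg.split(":", 2) match {
    case Array(host, port) if host.nonEmpty && port.nonEmpty && port.forall(_.isDigit) =>
      Some((host, port.toInt))
    case _ => None
  }
```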


 Scala Interactive Shell
 ---

 Key: FLINK-1907
 URL: https://issues.apache.org/jira/browse/FLINK-1907
 Project: Flink
  Issue Type: New Feature
  Components: Scala API
Reporter: Nikolaas Steenbergen
Assignee: Nikolaas Steenbergen
Priority: Minor

 Build an interactive Shell for the Scala api.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1907] Scala shell

2015-05-14 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30329898
  
--- Diff: docs/scala_shell_quickstart.md ---
@@ -0,0 +1,72 @@
+---
+title: "Quickstart: Scala Shell"
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+Start working on your Flink Scala program in a few simple steps.
+
+## Startup Flink interactive Scala shell
+
+Flink has an integrated interactive scala shell.
+It can be used in a local setup as well as in a cluster setup.
+
+To use it in a local setup just execute:
+
+__Sample Input__:
+~~~bash
+flink/bin/start-scala-shell.sh 
+~~~
+
+And it will initialize a local JobManager by itself.
+
+To use it in a cluster setup you can supply the host and port of the 
JobManager with:
+
+__Sample Input__:
+~~~bash
+flink/bin/start-scala-shell.sh -host <hostname> -port <portnumber>
--- End diff --

wouldn't it be easier to just pass `hostname:port` to the script?

Are there any other flags available?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1907) Scala Interactive Shell

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543835#comment-14543835
 ] 

ASF GitHub Bot commented on FLINK-1907:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30330465
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java 
---
@@ -1337,11 +1336,16 @@ public long count() throws Exception {
/**
 	 * Writes a DataSet to the standard output stream (stdout).<br/>
 	 * For each element of the DataSet the result of {@link Object#toString()} is written.
-	 *
-	 * @return The DataSink that writes the DataSet.
 	 */
-	public DataSink<T> print() {
-		return output(new PrintingOutputFormat<T>(false));
+	public void print() {
+		try {
+			List<T> elements = this.collect();
+			for (T e: elements) {
+				System.out.println(e);
+			}
+		} catch (Exception e) {
+			System.out.println("Could not retrieve values for printing: " + e);
--- End diff --

This prints only the message. I would suggest stringifying the exception, 
because the error messages coming from `collect()` might come from the remote 
cluster.


 Scala Interactive Shell
 ---

 Key: FLINK-1907
 URL: https://issues.apache.org/jira/browse/FLINK-1907
 Project: Flink
  Issue Type: New Feature
  Components: Scala API
Reporter: Nikolaas Steenbergen
Assignee: Nikolaas Steenbergen
Priority: Minor

 Build an interactive Shell for the Scala api.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1907) Scala Interactive Shell

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543917#comment-14543917
 ] 

ASF GitHub Bot commented on FLINK-1907:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/672#issuecomment-102082153
  
The change looks good; I had some easy-to-resolve comments.
Sadly, it seems that most of the changes are caused by the semantics change 
of `print()` ... the Scala shell code itself isn't that much.

We certainly need to update the documentation as well. A lot of code 
examples call print() and execute() together.
The docs should explain what print() is doing.
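
Something like the following could go into the docs (hedged sketch; assumes 
the prebound `env` and the new eager semantics described above):

```scala
// print() now executes the program eagerly via collect() and prints locally:
val counts = env.fromElements(("a", 1), ("b", 2))
counts.print()                    // triggers execution; no env.execute() needed

// File sinks stay lazy and still require an explicit execute():
counts.writeAsText("/tmp/counts")
env.execute("MyProgram")
```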



 Scala Interactive Shell
 ---

 Key: FLINK-1907
 URL: https://issues.apache.org/jira/browse/FLINK-1907
 Project: Flink
  Issue Type: New Feature
  Components: Scala API
Reporter: Nikolaas Steenbergen
Assignee: Nikolaas Steenbergen
Priority: Minor

 Build an interactive Shell for the Scala api.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1907] Scala shell

2015-05-14 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/672#issuecomment-102082153
  
The change looks good; I had some easy-to-resolve comments.
Sadly, it seems that most of the changes are caused by the semantics change 
of `print()` ... the Scala shell code itself isn't that much.

We certainly need to update the documentation as well. A lot of code 
examples call print() and execute() together.
The docs should explain what print() is doing.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1907] Scala shell

2015-05-14 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30330062
  
--- Diff: docs/scala_shell_quickstart.md ---
@@ -0,0 +1,72 @@
+---
+title: "Quickstart: Scala Shell"
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+Start working on your Flink Scala program in a few simple steps.
+
+## Startup Flink interactive Scala shell
+
+Flink has an integrated interactive scala shell.
+It can be used in a local setup as well as in a cluster setup.
+
+To use it in a local setup just execute:
+
+__Sample Input__:
+~~~bash
+flink/bin/start-scala-shell.sh 
+~~~
+
+And it will initialize a local JobManager by itself.
+
+To use it in a cluster setup you can supply the host and port of the 
JobManager with:
+
+__Sample Input__:
+~~~bash
+flink/bin/start-scala-shell.sh -host <hostname> -port <portnumber>
+~~~
+
+
+## Usage
+
+The shell will prebind the ExecutionEnvironment as "env"; so far only batch mode is supported.
+
+The following example will execute the wordcount program in the scala 
shell:
+
+~~~scala
+Flink-Shell> val text = env.fromElements("To be, or not to be,--that is the question:--", "Whether 'tis nobler in the mind to suffer, The slings and arrows of outrageous fortune,", "Or to take arms against a sea of troubles,")
+Flink-Shell> val counts = text.flatMap { _.toLowerCase.split("\\W+") }.map { (_, 1) }.groupBy(0).sum(1)
+Flink-Shell> counts.print()
+~~~
+
+
+The print() command will automatically send the specified tasks to the 
JobManager for execution and will show the result of the computation in the 
terminal.
+
+It is possible to write results to a file, like in the standard Scala api. However, in this case you need to call the following to run your program:
+
+~~~scala
+Flink-Shell> env.execute("MyProgram")
+~~~
+
+The Flink Shell comes with command history and autocompletion.
--- End diff --

The file is using `Scala` in upper and lowercase variants. I would make 
them all uppercase.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1907) Scala Interactive Shell

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543866#comment-14543866
 ] 

ASF GitHub Bot commented on FLINK-1907:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30331151
  
--- Diff: 
flink-staging/flink-scala-shell/src/main/java/org.apache.flink/api/java/ScalaShellRemoteEnvironment.java
 ---
@@ -0,0 +1,84 @@
+
+package org.apache.flink.api.java;
--- End diff --

we have the license header first, then the package 


 Scala Interactive Shell
 ---

 Key: FLINK-1907
 URL: https://issues.apache.org/jira/browse/FLINK-1907
 Project: Flink
  Issue Type: New Feature
  Components: Scala API
Reporter: Nikolaas Steenbergen
Assignee: Nikolaas Steenbergen
Priority: Minor

 Build an interactive Shell for the Scala api.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1907) Scala Interactive Shell

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543867#comment-14543867
 ] 

ASF GitHub Bot commented on FLINK-1907:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30331192
  
--- Diff: 
flink-staging/flink-scala-shell/src/main/java/org.apache.flink/api/java/ScalaShellRemoteEnvironment.java
 ---
@@ -0,0 +1,84 @@
+
+package org.apache.flink.api.java;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.Plan;
+import org.apache.flink.api.common.PlanExecutor;
+
+import org.apache.flink.api.scala.FlinkILoop;
+
+import java.io.File;
+
+
+/**
+ * Created by Nikolaas Steenbergen on 23-4-15.
--- End diff --

Can you replace this with a short description of what the class does?


 Scala Interactive Shell
 ---

 Key: FLINK-1907
 URL: https://issues.apache.org/jira/browse/FLINK-1907
 Project: Flink
  Issue Type: New Feature
  Components: Scala API
Reporter: Nikolaas Steenbergen
Assignee: Nikolaas Steenbergen
Priority: Minor

 Build an interactive Shell for the Scala api.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-1949) YARNSessionFIFOITCase sometimes fails to detect when the detached session finishes

2015-05-14 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-1949.
-

 YARNSessionFIFOITCase sometimes fails to detect when the detached session 
 finishes
 --

 Key: FLINK-1949
 URL: https://issues.apache.org/jira/browse/FLINK-1949
 Project: Flink
  Issue Type: Bug
  Components: Tests, YARN Client
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger
 Fix For: 0.9


 {code}
 10:32:24,393 INFO  org.apache.flink.yarn.YARNSessionFIFOITCase
- CLI Frontend has returned, so the job is running
 10:32:24,398 INFO  org.apache.flink.yarn.YARNSessionFIFOITCase
- waiting for the job with appId application_1430130687160_0003 to finish
 10:32:24,629 INFO  org.apache.flink.yarn.YARNSessionFIFOITCase
- The job has finished. TaskManager output file found 
 /home/travis/build/tillrohrmann/flink/flink-yarn-tests/../flink-yarn-tests/target/flink-yarn-tests-fifo/flink-yarn-tests-fifo-logDir-nm-0_0/application_1430130687160_0003/container_1430130687160_0003_01_02/taskmanager-stdout.log
 10:32:24,630 WARN  org.apache.flink.yarn.YARNSessionFIFOITCase
- Error while detached yarn session was running
 java.lang.AssertionError: Expected string '(all,2)' not found in string ''
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at 
 org.apache.flink.yarn.YARNSessionFIFOITCase.testDetachedPerJobYarnClusterInternal(YARNSessionFIFOITCase.java:504)
   at 
 org.apache.flink.yarn.YARNSessionFIFOITCase.testDetachedPerJobYarnClusterWithStreamingJob(YARNSessionFIFOITCase.java:563)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:483)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
 {code}
 https://flink.a.o.uce.east.s3.amazonaws.com/travis-artifacts/tillrohrmann/flink/442/442.5.tar.gz



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1907) Scala Interactive Shell

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543865#comment-14543865
 ] 

ASF GitHub Bot commented on FLINK-1907:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30331086
  
--- Diff: 
flink-staging/flink-scala-shell/src/main/java/org.apache.flink/api/java/ScalaShellRemoteEnvironment.java
 ---
@@ -0,0 +1,84 @@
+
--- End diff --

missing apache header


 Scala Interactive Shell
 ---

 Key: FLINK-1907
 URL: https://issues.apache.org/jira/browse/FLINK-1907
 Project: Flink
  Issue Type: New Feature
  Components: Scala API
Reporter: Nikolaas Steenbergen
Assignee: Nikolaas Steenbergen
Priority: Minor

 Build an interactive Shell for the Scala api.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1907] Scala shell

2015-05-14 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30331151
  
--- Diff: 
flink-staging/flink-scala-shell/src/main/java/org.apache.flink/api/java/ScalaShellRemoteEnvironment.java
 ---
@@ -0,0 +1,84 @@
+
+package org.apache.flink.api.java;
--- End diff --

we have the license header first, then the package 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1907] Scala shell

2015-05-14 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30331192
  
--- Diff: 
flink-staging/flink-scala-shell/src/main/java/org.apache.flink/api/java/ScalaShellRemoteEnvironment.java
 ---
@@ -0,0 +1,84 @@
+
+package org.apache.flink.api.java;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.Plan;
+import org.apache.flink.api.common.PlanExecutor;
+
+import org.apache.flink.api.scala.FlinkILoop;
+
+import java.io.File;
+
+
+/**
+ * Created by Nikolaas Steenbergen on 23-4-15.
--- End diff --

Can you replace this with a short description of what the class does?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1907] Scala shell

2015-05-14 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30330524
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/RemoteEnvironment.java ---
@@ -85,4 +87,14 @@ public String toString() {
 		return "Remote Environment (" + this.host + ":" + this.port + " - parallelism = " +
 				(getParallelism() == -1 ? "default" : getParallelism()) + ") : " + getIdString();
}
+
+
+   // needed to call execute on ScalaShellRemoteEnvironment
+   public int getPort() {
+   return(this.port);
--- End diff --

why are there parentheses around the return value?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1907) Scala Interactive Shell

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543838#comment-14543838
 ] 

ASF GitHub Bot commented on FLINK-1907:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30330524
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/RemoteEnvironment.java ---
@@ -85,4 +87,14 @@ public String toString() {
 		return "Remote Environment (" + this.host + ":" + this.port + " - parallelism = " +
 				(getParallelism() == -1 ? "default" : getParallelism()) + ") : " + getIdString();
}
+
+
+   // needed to call execute on ScalaShellRemoteEnvironment
+   public int getPort() {
+   return(this.port);
--- End diff --

why are there parentheses around the return value?


 Scala Interactive Shell
 ---

 Key: FLINK-1907
 URL: https://issues.apache.org/jira/browse/FLINK-1907
 Project: Flink
  Issue Type: New Feature
  Components: Scala API
Reporter: Nikolaas Steenbergen
Assignee: Nikolaas Steenbergen
Priority: Minor

 Build an interactive Shell for the Scala api.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1711) Replace all usages of commons.Validate with guava.check

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543938#comment-14543938
 ] 

ASF GitHub Bot commented on FLINK-1711:
---

Github user lokeshrajaram closed the pull request at:

https://github.com/apache/flink/pull/673


 Replace all usages of commons.Validate with guava.check
 

 Key: FLINK-1711
 URL: https://issues.apache.org/jira/browse/FLINK-1711
 Project: Flink
  Issue Type: Improvement
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Lokesh Rajaram
Priority: Minor
  Labels: easyfix, starter
 Fix For: 0.9


 Per discussion on the mailing list, we decided to increase homogeneity. One 
 part is to consistently use the Guava methods {{checkNotNull}} and 
 {{checkArgument}}, rather than Apache Commons Lang3 {{Validate}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1711) Replace all usages of commons.Validate with guava.check

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543940#comment-14543940
 ] 

ASF GitHub Bot commented on FLINK-1711:
---

GitHub user lokeshrajaram reopened a pull request:

https://github.com/apache/flink/pull/673

[FLINK-1711] - Converted all usages of Commons Validate to Guava Checks (for 
Java classes), Scala predef require (for Scala classes)

[FLINK-1711] - Converted all usages of Commons Validate to Guava Checks (for 
Java classes), Scala predef require (for Scala classes)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lokeshrajaram/flink all_guava

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/673.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #673


commit 04e1695d3b8414616216264a5b0972d762664ec7
Author: lrajaram lokesh_raja...@intuit.com
Date:   2015-05-10T01:57:36Z

converted all usages of Commons Validate to Guava Checks

commit 4f68d03d50d0fab47f5067906ec805f4a8b93cfa
Author: lrajaram lokesh_raja...@intuit.com
Date:   2015-05-14T02:29:03Z

converted all usages of commons validate to guava checks(for Java classes), 
scala predef require(for scala classes)

commit 1ecf70952a75728a2e2b9ae70e8f2c66ca9d337a
Author: lrajaram lokesh_raja...@intuit.com
Date:   2015-05-14T14:43:03Z

added guava dependency for flink-spargel module




 Replace all usages of commons.Validate with guava.check
 

 Key: FLINK-1711
 URL: https://issues.apache.org/jira/browse/FLINK-1711
 Project: Flink
  Issue Type: Improvement
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Lokesh Rajaram
Priority: Minor
  Labels: easyfix, starter
 Fix For: 0.9


 Per discussion on the mailing list, we decided to increase homogeneity. One 
 part is to consistently use the Guava methods {{checkNotNull}} and 
 {{checkArgument}}, rather than Apache Commons Lang3 {{Validate}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2016) Add elastic net regression

2015-05-14 Thread Theodore Vasiloudis (JIRA)
Theodore Vasiloudis created FLINK-2016:
--

 Summary: Add elastic net regression
 Key: FLINK-2016
 URL: https://issues.apache.org/jira/browse/FLINK-2016
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Theodore Vasiloudis
Priority: Minor


[Elastic net|http://en.wikipedia.org/wiki/Elastic_net_regularization] is a 
linear regression method that combines L2 and L1 regularization.
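
For reference, the textbook objective is (standard formulation; notation is not 
from this issue):

{code}
\min_w \; \frac{1}{2n}\lVert Xw - y \rVert_2^2
  + \lambda \Big( \alpha \lVert w \rVert_1 + \frac{1-\alpha}{2} \lVert w \rVert_2^2 \Big)
{code}

where alpha in [0, 1] interpolates between ridge regression (alpha = 0) and the 
lasso (alpha = 1).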



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1907] Scala shell

2015-05-14 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30329764
  
--- Diff: docs/scala_shell_quickstart.md ---
@@ -0,0 +1,72 @@
+---
+title: "Quickstart: Scala Shell"
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+Start working on your Flink Scala program in a few simple steps.
+
+## Startup Flink interactive Scala shell
+
+Flink has an integrated interactive scala shell.
--- End diff --

scala uppercase


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1907) Scala Interactive Shell

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543826#comment-14543826
 ] 

ASF GitHub Bot commented on FLINK-1907:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/672#discussion_r30329764
  
--- Diff: docs/scala_shell_quickstart.md ---
@@ -0,0 +1,72 @@
+---
+title: "Quickstart: Scala Shell"
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+Start working on your Flink Scala program in a few simple steps.
+
+## Startup Flink interactive Scala shell
+
+Flink has an integrated interactive scala shell.
--- End diff --

scala uppercase


 Scala Interactive Shell
 ---

 Key: FLINK-1907
 URL: https://issues.apache.org/jira/browse/FLINK-1907
 Project: Flink
  Issue Type: New Feature
  Components: Scala API
Reporter: Nikolaas Steenbergen
Assignee: Nikolaas Steenbergen
Priority: Minor

 Build an interactive Shell for the Scala api.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Pluggable state backend added

2015-05-14 Thread gyfora
GitHub user gyfora opened a pull request:

https://github.com/apache/flink/pull/676

Pluggable state backend added

This PR introduces pluggable state backends using StateHandleProviders 
and extends the StateHandle interface with a discard method for cleaning up 
checkpoints that are no longer needed.

It also adds a statehandle/provider implementation for storing checkpoints 
in any Flink-supported file system such as HDFS or Tachyon.
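
To sketch the shape of these abstractions (illustrative only; method names 
beyond the discard method are assumptions, not this PR's actual signatures):

```scala
// Handle to checkpointed state that can be re-materialized or cleaned up.
trait StateHandle[T] extends Serializable {
  def getState: T     // restore the checkpointed state
  def discard(): Unit // delete the backing data, e.g. a file in HDFS
}

// Pluggable factory so the backend (memory, HDFS, Tachyon, ...) is configurable.
trait StateHandleProvider[T] extends Serializable {
  def createStateHandle(state: T): StateHandle[T]
}
```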

The checkpoint coordinator has been modified to properly discard user state 
handles using the following logic:
 - If a pending checkpoint expires (via the timer thread), it discards the 
user state
 - When a successful checkpoint is superseded by later successful ones, it 
discards its user states
 - When the checkpoint coordinator is shut down, it discards all pending and 
successful states

Travis error:
I modified the recovery IT case to use the local file system for storing 
the checkpoints. Afterwards it checks whether the directory is empty. The test 
passes every time when run locally, but it seems to fail on Travis for no 
apparent reason. Usually a couple of files (2-5) remain in the checkpoint 
directory, meaning that almost all of them had been deleted except those. The 
checkpointing and recovery logic itself runs fine without test failure.

I would appreciate some help figuring this out somehow, or trying to 
reproduce it locally. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gyfora/flink statehandle

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/676.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #676


commit 22b5996e046cf83612fac2cb5aa02f2fd76a7e7b
Author: Gyula Fora gyf...@apache.org
Date:   2015-05-07T12:29:23Z

[streaming] ByteStream and File state handle added

commit fed66675e2e824eee00b88197c5a73882415c919
Author: Gyula Fora gyf...@apache.org
Date:   2015-05-14T09:49:30Z

[streaming] Discard method added to state handle

commit 36474aafe74be9b61a89b5240bbc39f47226da77
Author: Gyula Fora gyf...@apache.org
Date:   2015-05-14T19:23:42Z

[streaming] StateHandleProvider added for configurable state backend




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-297) Redesign GUI client-server model

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544451#comment-14544451
 ] 

ASF GitHub Bot commented on FLINK-297:
--

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/677#issuecomment-102184072
  
A simple way to try this out is to execute the class `TestRunner` in the 
`flink-runtime-web` project. It starts a mini cluster, starts the new web 
server and runs three jobs (to have some jobs in the history to serve).


 Redesign GUI client-server model
 

 Key: FLINK-297
 URL: https://issues.apache.org/jira/browse/FLINK-297
 Project: Flink
  Issue Type: Improvement
Reporter: GitHub Import
  Labels: github-import
 Fix For: pre-apache


 Factor out job manager status information as a REST service running inside the 
 same process. Implement the visualization server as a separate web application 
 that runs on the client side and renders data fetched from the job 
 manager RESTful API.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/297
 Created by: [aalexandrov|https://github.com/aalexandrov]
 Labels: enhancement, gui, 
 Created at: Tue Nov 26 14:54:53 CET 2013
 State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-297) Redesign GUI client-server model

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1457#comment-1457
 ] 

ASF GitHub Bot commented on FLINK-297:
--

GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/677

[FLINK-297] [web frontend] First part of JobManager runtime monitor REST API

This pull request is the first step towards the new JobManager monitoring 
web frontend.

The code for the new web server that handles the requests is in 
`flink-runtime-web`. That way, we keep
the core runtime project free of the fat dependencies that come with some 
web frameworks.

The new web server runs side by side with the old one for now.
You can activate the new web server by adding `jobmanager.new-web-frontend: 
true` to the config.

By default, the server listens at `http://localhost:8082`.

The implementation uses almost pure Netty, which is fast and lightweight 
(dependency-wise), and we are using Netty anyway in the network stack for data 
exchange.

The server currently answers the following requests:

http://localhost:8082/overview
http://localhost:8082/jobs
http://localhost:8082/jobs/<job-id>
http://localhost:8082/jobs/<job-id>/vertices
http://localhost:8082/jobs/<job-id>/plan

Here, <job-id> refers to the ID of a current or archived job.

All requests respond with JSON.
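
As a quick illustration, any HTTP client can exercise these endpoints; for 
example, a minimal Java probe (assuming the new server is enabled and 
listening on the default port):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class RestProbe {
    public static void main(String[] args) throws Exception {
        // Fetch the cluster overview; all endpoints listed above return JSON.
        URL url = new URL("http://localhost:8082/overview");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```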

I am working with someone who is helping me draft a frontend (HTML5 + 
angular.js) that renders the information and issues the requests against the 
given URLs. I'll share more as soon as we have something worth sharing.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink web_frontend_2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #677


commit 482d12f155a66e22120f5e0a9993a5b3e56503a5
Author: Stephan Ewen se...@apache.org
Date:   2015-04-06T16:27:26Z

[FLINK-297] [web frontend] First part of JobManager runtime monitor REST API

 - Adds a separate Maven project for easier maintenance. Also allows users 
to refer to runtime without web libraries.
 - Simple HTTP server based on netty http (slim dependency, since we use 
netty anyways)
 - REST URL parsing via jauter netty router
 - Abstract stubs for handlers that deal with errors and request/response
 - First set of URL request handlers that produce JSON responses




 Redesign GUI client-server model
 

 Key: FLINK-297
 URL: https://issues.apache.org/jira/browse/FLINK-297
 Project: Flink
  Issue Type: Improvement
Reporter: GitHub Import
  Labels: github-import
 Fix For: pre-apache


 Factor out job manager status information as a REST service running inside the 
 same process. Implement the visualization server as a separate web application 
 that runs on the client side and renders data fetched from the job 
 manager RESTful API.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/297
 Created by: [aalexandrov|https://github.com/aalexandrov]
 Labels: enhancement, gui, 
 Created at: Tue Nov 26 14:54:53 CET 2013
 State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-297] [web frontend] First part of JobMa...

2015-05-14 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/677#issuecomment-102184072
  
A simple way to try this out is to execute the class `TestRunner` in the 
`flink-runtime-web` project. It starts a mini cluster, starts the new web 
server and runs three jobs (to have some jobs in the history to serve).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-297] [web frontend] First part of JobMa...

2015-05-14 Thread StephanEwen
GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/677

[FLINK-297] [web frontend] First part of JobManager runtime monitor REST API

This pull request is the first step towards the new JobManager monitoring 
web frontend.

The code for the new web server that handles the requests is in 
`flink-runtime-web`. That way, we keep
the core runtime project free of the fat dependencies that come with some 
web frameworks.

The new web server runs side by side with the old one for now.
You can activate the new web server by adding `jobmanager.new-web-frontend: 
true` to the config.

By default, the server listens at `http://localhost:8082`.

The implementation uses almost pure Netty, which is fast and lightweight 
(dependency-wise), and we are using Netty anyway in the network stack for data 
exchange.

The server currently answers the following requests:

http://localhost:8082/overview
http://localhost:8082/jobs
http://localhost:8082/jobs/<job-id>
http://localhost:8082/jobs/<job-id>/vertices
http://localhost:8082/jobs/<job-id>/plan

Here, <job-id> refers to the ID of a current or archived job.

All requests respond with JSON.

I am working with someone who is helping me draft a frontend (HTML5 + 
angular.js) that renders the information and issues the requests against the 
given URLs. I'll share more as soon as we have something worth sharing.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink web_frontend_2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #677


commit 482d12f155a66e22120f5e0a9993a5b3e56503a5
Author: Stephan Ewen se...@apache.org
Date:   2015-04-06T16:27:26Z

[FLINK-297] [web frontend] First part of JobManager runtime monitor REST API

 - Adds a separate Maven project for easier maintenance. Also allows users 
to refer to runtime without web libraries.
 - Simple HTTP server based on netty http (slim dependency, since we use 
netty anyways)
 - REST URL parsing via jauter netty router
 - Abstract stubs for handlers that deal with errors and request/response
 - First set of URL request handlers that produce JSON responses




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1976) Add ForwardedFields* hints for the optimizer

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543390#comment-14543390
 ] 

ASF GitHub Bot commented on FLINK-1976:
---

Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/663#issuecomment-101972007
  
Guys, 

Maybe it makes sense to merge this :)
It's been here for a while. 


 Add ForwardedFields* hints for the optimizer
 -

 Key: FLINK-1976
 URL: https://issues.apache.org/jira/browse/FLINK-1976
 Project: Flink
  Issue Type: Wish
  Components: Gelly
Affects Versions: 0.9
Reporter: Andra Lungu
Assignee: Andra Lungu

 Some classes in Graph.java can be improved by adding ForwardedFields* 
 annotations. For instance, EmitOneEdgePerNode, 
 EmitOneVertexWithEdgeValuePerNode, EmitOneEdgeWithNeighborPerNode, 
 ProjectEdgeWithNeighbor, etc. 
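 For illustration, such an annotation could look as follows (a sketch on a 
 made-up mapper, not one of the Graph.java classes mentioned above):
 {code:java}
 import org.apache.flink.api.common.functions.MapFunction;
 import org.apache.flink.api.java.functions.FunctionAnnotation.ForwardedFields;
 import org.apache.flink.api.java.tuple.Tuple2;

 // Field f0 passes through unchanged, so the optimizer may reuse existing
 // partitioning or sort orders on it instead of re-establishing them.
 @ForwardedFields("f0")
 class UppercaseValue
         implements MapFunction<Tuple2<Long, String>, Tuple2<Long, String>> {
     @Override
     public Tuple2<Long, String> map(Tuple2<Long, String> value) {
         return new Tuple2<Long, String>(value.f0, value.f1.toUpperCase());
     }
 }
 {code}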



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1976][gelly] Added ForwardedFields Anno...

2015-05-14 Thread andralungu
Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/663#issuecomment-101972007
  
Guys, 

Maybe it makes sense to merge this :)
It's been here for a while. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1523) Vertex-centric iteration extensions

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543408#comment-14543408
 ] 

ASF GitHub Bot commented on FLINK-1523:
---

Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/537#issuecomment-101978637
  
Hi @vasia, 

I had a look at the new branch. The changes look good, degrees are no 
longer exposed to the user and the current approach removes the need to 
subclass Vertex. :+1: 

The only small remark/comment I have comes from a user perspective: 
- let's say that, by mistake, I forgot to set the degrees option;
- let's also say I was too busy to read the manual :)
- result: I will get -1 instead of the expected number of degrees per vertex

I understand why you had to pass -1 there; it should be of the same type as 
the degrees. However, maybe we can come up with some way to hint users that 
they should not forget to set the corresponding options.  Adding an extra line 
in the documentation might not suffice.  
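
For reference, a sketch of the opt-in flow under discussion (class and method 
names follow the branch under review and may still change; the update and 
messaging functions are hypothetical placeholders):

```java
import org.apache.flink.graph.Graph;
import org.apache.flink.graph.spargel.VertexCentricConfiguration;

public class DegreesOptIn {
    static Graph<Long, Double, Double> run(Graph<Long, Double, Double> graph,
                                           int maxIterations) {
        VertexCentricConfiguration parameters = new VertexCentricConfiguration();
        // Without this line, getInDegree()/getOutDegree() return -1 inside the
        // update and messaging functions -- the pitfall described above.
        parameters.setOptDegrees(true);

        return graph.runVertexCentricIteration(
                new MyUpdateFunction(),     // hypothetical user-defined function
                new MyMessagingFunction(),  // hypothetical user-defined function
                maxIterations, parameters);
    }
}
```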


 Vertex-centric iteration extensions
 ---

 Key: FLINK-1523
 URL: https://issues.apache.org/jira/browse/FLINK-1523
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Reporter: Vasia Kalavri
Assignee: Andra Lungu

 We would like to make the following extensions to the vertex-centric 
 iterations of Gelly:
 - allow vertices to access their in/out degrees and the total number of 
 vertices of the graph, inside the iteration.
 - allow choosing the neighborhood type (in/out/all) over which to run the 
 vertex-centric iteration. Now, the model uses the updates of the in-neighbors 
 to calculate state and send messages to out-neighbors. We could add a 
 parameter with value in/out/all to the {{VertexUpdateFunction}} and 
 {{MessagingFunction}}, that would indicate the type of neighborhood.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1523][gelly] Vertex centric iteration e...

2015-05-14 Thread andralungu
Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/537#issuecomment-101978637
  
Hi @vasia, 

I had a look at the new branch. The changes look good, degrees are no 
longer exposed to the user and the current approach removes the need to 
subclass Vertex. :+1: 

The only small remark/comment I have comes from a user perspective: 
- let's say that, by mistake, I forgot to set the degrees option;
- let's also say I was too busy to read the manual :)
- result: I will get -1 instead of the expected number of degrees per vertex

I understand why you had to pass -1 there; it should be of the same type as 
the degrees. However, maybe we can come up with some way to hint users that 
they should not forget to set the corresponding options.  Adding an extra line 
in the documentation might not suffice.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-2012) addVertices, addEdges, removeVertices, removeEdges methods

2015-05-14 Thread Andra Lungu (JIRA)
Andra Lungu created FLINK-2012:
--

 Summary: addVertices, addEdges, removeVertices, removeEdges methods
 Key: FLINK-2012
 URL: https://issues.apache.org/jira/browse/FLINK-2012
 Project: Flink
  Issue Type: New Feature
  Components: Gelly
Affects Versions: 0.9
Reporter: Andra Lungu
Assignee: Andra Lungu
Priority: Minor


Currently, Gelly only allows the addition/deletion of one vertex/edge at a 
time. If a user wants to add two (or more) vertices, he/she would need to 
add a vertex -> create a new graph; then add another vertex -> another graph, 
etc.  

It would be nice to also have addVertices, addEdges, removeVertices, 
removeEdges methods. 
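
For illustration, a batched addVertices could be sketched on top of the 
existing primitives roughly like this (the concrete types and the 
union/distinct approach are assumptions, not the final design):

{code:java}
import java.util.List;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.graph.Graph;
import org.apache.flink.graph.Vertex;

public class GraphMutations {
    static Graph<Long, String, Double> addVertices(
            Graph<Long, String, Double> graph,
            List<Vertex<Long, String>> toAdd, ExecutionEnvironment env) {
        // Union the new vertices in and deduplicate on the vertex ID
        // (tuple field 0), producing one new graph instead of one graph
        // per added vertex.
        DataSet<Vertex<Long, String>> vertices = graph.getVertices()
                .union(env.fromCollection(toAdd))
                .distinct(0);
        return Graph.fromDataSet(vertices, graph.getEdges(), env);
    }
}
{code}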



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1398) A new DataSet function: extractElementFromTuple

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543419#comment-14543419
 ] 

ASF GitHub Bot commented on FLINK-1398:
---

Github user FelixNeutatz commented on the pull request:

https://github.com/apache/flink/pull/308#issuecomment-101981739
  
So, what is the final decision here?

> this seems like a lot of code for something that can be achieved using a 
simple mapper

--> that is exactly the reason why I would want this functionality - it is 
too much code for a simple thing :)
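
For what it's worth, a minimal sketch of such a helper outside the DataSet API 
(the name `extractElement` and the explicit `Class` argument are illustrative):

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple;

public class TupleFields {
    // Extracts field `pos` from every tuple; the explicit `type` argument
    // works around type erasure so Flink knows the output type.
    public static <T, IN extends Tuple> DataSet<T> extractElement(
            DataSet<IN> input, final int pos, Class<T> type) {
        return input.map(new MapFunction<IN, T>() {
            @Override
            public T map(IN tuple) {
                return tuple.<T>getField(pos);
            }
        }).returns(type);
    }
}
```

With that, the example from the issue would shrink to 
`TupleFields.extractElement(data, 1, Double.class)`.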


 A new DataSet function: extractElementFromTuple
 ---

 Key: FLINK-1398
 URL: https://issues.apache.org/jira/browse/FLINK-1398
 Project: Flink
  Issue Type: Wish
Reporter: Felix Neutatz
Assignee: Felix Neutatz
Priority: Minor

 This is the use case:
 {code:java}
 DataSet<Tuple2<Integer, Double>> data = env.fromElements(
     new Tuple2<Integer, Double>(1, 2.0));
 
 data.map(new ElementFromTuple());
 
 public static final class ElementFromTuple implements 
         MapFunction<Tuple2<Integer, Double>, Double> {
     @Override
     public Double map(Tuple2<Integer, Double> value) {
         return value.f1;
     }
 }
 {code}
 It would be awesome if we had something like this:
 {code:java}
 data.extractElement(1);
 {code}
 This means that we implement a function for DataSet which extracts a certain 
 element from a given Tuple.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1398] Introduce extractSingleField() in...

2015-05-14 Thread FelixNeutatz
Github user FelixNeutatz commented on the pull request:

https://github.com/apache/flink/pull/308#issuecomment-101981739
  
So, what is the final decision here?

> this seems like a lot of code for something that can be achieved using a 
simple mapper

--> that is exactly the reason why I would want this functionality - it is 
too much code for a simple thing :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1707) Add an Affinity Propagation Library Method

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543449#comment-14543449
 ] 

ASF GitHub Bot commented on FLINK-1707:
---

Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/649#issuecomment-101989242
  
@joey001, 

How did you rebase this? It should not normally contain everyone's 
commits... Right now it looks like 18 people participated in the implementation 
of Affinity Propagation :)

Also, a common practice is to leave a "PR updated" comment after adding 
something to your pull request. That way people will be notified and will 
review your changes asap. If you don't comment, you run the risk of having your 
PR hang around here longer ;)  


 Add an Affinity Propagation Library Method
 --

 Key: FLINK-1707
 URL: https://issues.apache.org/jira/browse/FLINK-1707
 Project: Flink
  Issue Type: New Feature
  Components: Gelly
Reporter: Vasia Kalavri
Assignee: joey
Priority: Minor

 This issue proposes adding an implementation of the Affinity Propagation 
 algorithm as a Gelly library method and a corresponding example.
 The algorithm is described in paper [1] and a description of a vertex-centric 
 implementation can be found in [2].
 [1]: http://www.psi.toronto.edu/affinitypropagation/FreyDueckScience07.pdf
 [2]: http://event.cwi.nl/grades2014/00-ching-slides.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1707][WIP]Add an Affinity Propagation L...

2015-05-14 Thread andralungu
Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/649#issuecomment-101989242
  
@joey001, 

How did you rebase this? It should not normally contain everyone's 
commits... Right now it looks like 18 people participated in the implementation 
of Affinity Propagation :)

Also, a common practice is to leave a "PR updated" comment after adding 
something to your pull request. That way people will be notified and will 
review your changes asap. If you don't comment, you run the risk of having your 
PR hang around here longer ;)  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1711] - Converted all usages of Commons...

2015-05-14 Thread lokeshrajaram
Github user lokeshrajaram closed the pull request at:

https://github.com/apache/flink/pull/673


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1711] - Converted all usages of Commons...

2015-05-14 Thread lokeshrajaram
GitHub user lokeshrajaram reopened a pull request:

https://github.com/apache/flink/pull/673

[FLINK-1711] - Converted all usages of Commons Validate to Guava checks (for 
Java classes) and Scala Predef require (for Scala classes)

[FLINK-1711] - Converted all usages of Commons Validate to Guava checks (for 
Java classes) and Scala Predef require (for Scala classes)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lokeshrajaram/flink all_guava

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/673.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #673


commit 04e1695d3b8414616216264a5b0972d762664ec7
Author: lrajaram lokesh_raja...@intuit.com
Date:   2015-05-10T01:57:36Z

converted all usages of Commons Validate to Guava Checks

commit 4f68d03d50d0fab47f5067906ec805f4a8b93cfa
Author: lrajaram lokesh_raja...@intuit.com
Date:   2015-05-14T02:29:03Z

converted all usages of commons validate to guava checks(for Java classes), 
scala predef require(for scala classes)

commit 1ecf70952a75728a2e2b9ae70e8f2c66ca9d337a
Author: lrajaram lokesh_raja...@intuit.com
Date:   2015-05-14T14:43:03Z

added guava dependency for flink-spargel module




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1711) Replace all usages of commons.Validate with guava.check

2015-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544863#comment-14544863
 ] 

ASF GitHub Bot commented on FLINK-1711:
---

Github user lokeshrajaram closed the pull request at:

https://github.com/apache/flink/pull/673


 Replace all usages of commons.Validate with guava.check
 

 Key: FLINK-1711
 URL: https://issues.apache.org/jira/browse/FLINK-1711
 Project: Flink
  Issue Type: Improvement
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Lokesh Rajaram
Priority: Minor
  Labels: easyfix, starter
 Fix For: 0.9


 Per discussion on the mailing list, we decided to increase homogeneity. One 
 part is to consistently use the Guava methods {{checkNotNull}} and 
 {{checkArgument}}, rather than Apache Commons Lang3 {{Validate}}.
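 For illustration, the change amounts to the following (made-up surrounding 
 code, not an actual Flink class):
 {code:java}
 import static com.google.common.base.Preconditions.checkArgument;
 import static com.google.common.base.Preconditions.checkNotNull;

 import org.apache.commons.lang3.Validate;

 class ChecksExample {
     void configure(Object config, int parallelism) {
         // Before: Apache Commons Lang3
         Validate.notNull(config, "config must not be null");
         Validate.isTrue(parallelism > 0, "parallelism must be positive");

         // After: Guava Preconditions, as decided on the mailing list
         checkNotNull(config, "config must not be null");
         checkArgument(parallelism > 0, "parallelism must be positive");
     }
 }
 {code}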



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)