[jira] [Commented] (FLINK-1284) Uniform random sampling operator over windows

2016-04-21 Thread Austin Ouyang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253383#comment-15253383
 ] 

Austin Ouyang commented on FLINK-1284:
--

Hi Paris,

Would we also want to add the ability to sample by percentage? Also, what would 
the fieldID refer to? I was thinking there are two naive possible solutions (a 
sketch of both follows below).
1) Once the trigger fires, randomly draw N samples, or a fixed percentage of 
all elements, from each window.
2) Given a percentage of samples to retain from each window, generate a random 
number between 0 and 1 for every element, and append the element to the result 
if the number is less than the specified percentage.
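For concreteness, here is a minimal stand-alone Java sketch of both naive 
approaches over a materialized window; class and method names are illustrative 
and nothing here is part of the Flink API:

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class WindowSampling {
    private static final Random RND = new Random();

    // Approach 1: draw exactly n elements uniformly from the window contents.
    public static <T> List<T> sampleFixed(List<T> window, int n) {
        List<T> copy = new ArrayList<>(window);    // materialized window
        Collections.shuffle(copy, RND);
        return copy.subList(0, Math.min(n, copy.size()));
    }

    // Approach 2: keep each element independently with the given probability
    // (a Bernoulli sample; the output size only matches the fraction in
    // expectation, not exactly).
    public static <T> List<T> sampleFraction(List<T> window, double fraction) {
        List<T> result = new ArrayList<>();
        for (T element : window) {
            if (RND.nextDouble() < fraction) {
                result.add(element);
            }
        }
        return result;
    }
}
{code}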


> Uniform random sampling operator over windows
> -
>
> Key: FLINK-1284
> URL: https://issues.apache.org/jira/browse/FLINK-1284
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming
>Reporter: Paris Carbone
>Priority: Minor
>
> It would be useful for several use cases to have a built-in uniform random 
> sampling operator in the streaming API that can operate on windows. This can 
> be used, for example, for online machine learning operations, evaluating 
> heuristics, or continuous visualisation of representative values.
> The operator could be given a field and the number of random samples needed, 
> following a window statement, like so:
> mystream.window(..).sample(fieldID,#samples)
> Given that pre-aggregation is enabled, this could perhaps be implemented as a 
> binary reduce operator or a combinable groupreduce that pre-aggregates the 
> empiricals of that field.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1743) Add multinomial logistic regression to machine learning library

2016-04-21 Thread David E Drummond (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253364#comment-15253364
 ] 

David E Drummond commented on FLINK-1743:
-

[~trohrm...@apache.org]

Are there any updates on this issue? I would like to use this as a base for 
LogitBoost (FLINK-1749), and it is an important algorithm in its own right. If 
this is still unresolved, then I would be happy to help with this issue by 
implementing the algorithm discussed in reference [1] below.

> Add multinomial logistic regression to machine learning library
> ---
>
> Key: FLINK-1743
> URL: https://issues.apache.org/jira/browse/FLINK-1743
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Ozturk Gokal
>  Labels: ML
>
> Multinomial logistic regression [1] would be a good first classification 
> algorithm, since it can classify multiple classes. 
> Resources:
> [1] [http://en.wikipedia.org/wiki/Multinomial_logistic_regression]
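As a quick illustration of what the model computes (a stand-alone sketch, not 
FlinkML code), here are the softmax class probabilities that multinomial 
logistic regression assigns to one feature vector, assuming a dense per-class 
weight matrix:

{code}
public class Softmax {
    /** Class probabilities P(k | x) for weight rows w_k and feature vector x. */
    public static double[] probabilities(double[][] weights, double[] x) {
        double[] scores = new double[weights.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < weights.length; k++) {
            double s = 0.0;
            for (int j = 0; j < x.length; j++) {
                s += weights[k][j] * x[j];          // linear score per class
            }
            scores[k] = s;
            max = Math.max(max, s);
        }
        double sum = 0.0;
        for (int k = 0; k < scores.length; k++) {
            scores[k] = Math.exp(scores[k] - max);  // shift by max for stability
            sum += scores[k];
        }
        for (int k = 0; k < scores.length; k++) {
            scores[k] /= sum;                       // normalize to probabilities
        }
        return scores;
    }
}
{code}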



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1749) Add Boosting algorithm for ensemble learning to machine learning library

2016-04-21 Thread David E Drummond (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253310#comment-15253310
 ] 

David E Drummond commented on FLINK-1749:
-

I noticed that multinomial logistic regression [FLINK-1743] is still open and 
unresolved, and it would be a dependency for LogitBoost as a weak learner. 
Similarly, AdaBoost may require decision trees [FLINK-1727] as a weak 
classifier. Would it make more sense to work on those projects before boosting?

> Add Boosting algorithm for ensemble learning to machine learning library
> 
>
> Key: FLINK-1749
> URL: https://issues.apache.org/jira/browse/FLINK-1749
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: narayana reddy
>  Labels: ML
>
> Boosting [1] can help to create strong learners from an ensemble of weak 
> learners and thus improve their performance. Widely used boosting algorithms 
> are AdaBoost [2] and LogitBoost [3]. The work of I. Palit and C. K. Reddy [4] 
> investigates how boosting can be efficiently realised in a distributed 
> setting. 
> Resources:
> [1] [http://en.wikipedia.org/wiki/Boosting_%28machine_learning%29]
> [2] [http://en.wikipedia.org/wiki/AdaBoost]
> [3] [http://en.wikipedia.org/wiki/LogitBoost]
> [4] [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6035709]
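For reference, a minimal Java sketch of a single AdaBoost round as described in 
[2]; the names and the in-place weight update are illustrative choices, not 
library code. Labels and weak-learner predictions are assumed to be in {-1, +1}:

{code}
public class AdaBoostRound {
    /** Runs one boosting round: returns the weak learner's weight alpha and
     *  reweights the examples in place. */
    public static double update(double[] weights, int[] labels, int[] predictions) {
        double error = 0.0;
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i];
            if (labels[i] != predictions[i]) {
                error += weights[i];                // weighted misclassification
            }
        }
        error = Math.min(Math.max(error / total, 1e-12), 1.0 - 1e-12); // guard 0/1
        double alpha = 0.5 * Math.log((1.0 - error) / error);          // learner weight
        double norm = 0.0;
        for (int i = 0; i < weights.length; i++) {
            // increase weight of misclassified examples, decrease the rest
            weights[i] *= Math.exp(-alpha * labels[i] * predictions[i]);
            norm += weights[i];
        }
        for (int i = 0; i < weights.length; i++) {
            weights[i] /= norm;                     // renormalize to a distribution
        }
        return alpha;
    }
}
{code}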



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FunctionAnnotation example should match docs

2016-04-21 Thread NathanHowell
GitHub user NathanHowell opened a pull request:

https://github.com/apache/flink/pull/1925

FunctionAnnotation example should match docs



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/NathanHowell/flink patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1925.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1925


commit 38341d2b40e2420f1c07dc75ada237ca9c8a4dfb
Author: Nathan Howell 
Date:   2016-04-22T02:48:58Z

FunctionAnnotation example should match docs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1746) Add linear discriminant analysis to machine learning library

2016-04-21 Thread Ronak Nathani (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253119#comment-15253119
 ] 

Ronak Nathani commented on FLINK-1746:
--

Hi [~till.rohrmann],

I wanted to check if there are any updates on this issue. I would like to work 
on the distributed LDA implementation you mentioned in reference [1] and 
contribute to the project. 

Best,
Ronak

> Add linear discriminant analysis to machine learning library
> 
>
> Key: FLINK-1746
> URL: https://issues.apache.org/jira/browse/FLINK-1746
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Raghav Chalapathy
>  Labels: ML
>
> Linear discriminant analysis (LDA) [1] is used for dimensionality reduction 
> prior to classification. But it can also be used to calculate a linear 
> classifier on its own. Since dimensionality reduction is an important 
> preprocessing step, a distributed LDA implementation is a valuable addition 
> to flink-ml.
> Resources:
> [1] [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5946724]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1729) Assess performance of classification algorithms

2016-04-21 Thread hoa nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253113#comment-15253113
 ] 

hoa nguyen commented on FLINK-1729:
---

Hi [~till.rohrmann], is there an update on this? To confirm: this would provide 
an example implementation of, say, SVMs on publicly available datasets to 
validate the algorithm. Would it be possible for me to be assigned this? Many 
thanks,
Hoa

> Assess performance of classification algorithms
> ---
>
> Key: FLINK-1729
> URL: https://issues.apache.org/jira/browse/FLINK-1729
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>  Labels: ML
>
> In order to validate Flink's classification algorithms (in terms of 
> performance and accuracy), we should run them on publicly available 
> classification data sets. This will not only serve as a proof of the 
> correctness of the implementations but will also show how easily the machine 
> learning library can be used.
> Bottou [1] published some results for the RCV1 dataset using SVMs for 
> classification. The SVMs are trained using stochastic gradient descent; thus, 
> they would be a good comparison for the CoCoA-trained SVMs.
> Some more benchmark results and publicly available data sets can be found 
> here [2].
> Resources:
> [1] [http://leon.bottou.org/projects/sgd]
> [2] [https://github.com/BIDData/BIDMach/wiki/Benchmarks]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1749) Add Boosting algorithm for ensemble learning to machine learning library

2016-04-21 Thread David E Drummond (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253007#comment-15253007
 ] 

David E Drummond commented on FLINK-1749:
-

Hi [~trohrm...@apache.org],

Are there any updates on this issue?  I am a full-time data engineer and I 
would enjoy contributing to this.  In particular, I would like to start by 
working on the LogitBoost that you referred to in reference [3], with the 
distributed approach discussed in [4].

> Add Boosting algorithm for ensemble learning to machine learning library
> 
>
> Key: FLINK-1749
> URL: https://issues.apache.org/jira/browse/FLINK-1749
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: narayana reddy
>  Labels: ML
>
> Boosting [1] can help to create strong learners from an ensemble of weak 
> learners and thus improve their performance. Widely used boosting algorithms 
> are AdaBoost [2] and LogitBoost [3]. The work of I. Palit and C. K. Reddy [4] 
> investigates how boosting can be efficiently realised in a distributed 
> setting. 
> Resources:
> [1] [http://en.wikipedia.org/wiki/Boosting_%28machine_learning%29]
> [2] [http://en.wikipedia.org/wiki/AdaBoost]
> [3] [http://en.wikipedia.org/wiki/LogitBoost]
> [4] [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6035709]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3802) Add Very Fast Reservoir Sampling

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252808#comment-15252808
 ] 

ASF GitHub Bot commented on FLINK-3802:
---

GitHub user gaoyike opened a pull request:

https://github.com/apache/flink/pull/1924

[FLINK-3802] Add Very Fast Reservoir Sampling

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [x] General
  - The pull request references the related JIRA issue
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message

- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed



An in-memory implementation of Very Fast Reservoir Sampling. The algorithm 
works well when the size of the streaming data is much larger than the size of 
the reservoir.

The algorithm samples each element with probability P = R/j, where R is the 
size of the reservoir and j is the current index of the streaming data. It 
consists of two parts:
(1) Before the size of the streaming data reaches a threshold, it uses regular 
reservoir sampling.
(2) After the size of the streaming data reaches the threshold, it uses a 
geometric distribution to generate an approximate gap with which to skip data; 
the size of the gap is drawn from the geometric distribution with probability 
p = R/j.

Thanks to Erik Erlandson, who is the author of this algorithm and helped me 
with the implementation.

Reference: 
http://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/
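For readers of the thread, a compact Java sketch of the gap-based scheme 
described above; the 4·R switch-over threshold and all names here are 
illustrative assumptions, not the PR's actual code:

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class FastReservoirSample<T> {
    private final int capacity;          // R: reservoir size
    private final List<T> reservoir;
    private final Random rnd = new Random();
    private long index = 0;              // j: elements seen so far
    private long skip = 0;               // remaining elements of the current gap

    public FastReservoirSample(int capacity) {
        this.capacity = capacity;
        this.reservoir = new ArrayList<>(capacity);
    }

    public void offer(T element) {
        index++;
        if (reservoir.size() < capacity) {      // fill the reservoir first
            reservoir.add(element);
            return;
        }
        if (skip > 0) {                         // fast path: inside a sampled gap
            skip--;
            return;
        }
        double p = (double) capacity / index;   // acceptance probability R/j
        if (index < 4L * capacity) {
            // phase 1: regular reservoir sampling, one coin flip per element
            if (rnd.nextDouble() < p) {
                reservoir.set(rnd.nextInt(capacity), element);
            }
        } else {
            // phase 2: accept this element, then draw the gap to the next
            // acceptance from a geometric distribution with parameter p
            reservoir.set(rnd.nextInt(capacity), element);
            skip = (long) (Math.log(1.0 - rnd.nextDouble()) / Math.log(1.0 - p));
        }
    }

    public List<T> sample() {
        return reservoir;
    }
}
{code}

In the gap phase each acceptance is followed by a Geometric(p) gap, so the 
long-run acceptance rate stays approximately R/j while most elements are 
skipped without a random-number draw.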

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gaoyike/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1924.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1924


commit 81e0622b20d8bc969dec1555bd55d4230d9b38de
Author: 晨光 何 
Date:   2016-04-21T21:42:26Z

An in-memory implementation of Very Fast Reservoir Sampling. The algorithm 
works well when the size of the streaming data is much larger than the size of 
the reservoir.

The algorithm samples each element with probability P = R/j, where R is the 
size of the reservoir and j is the current index of the streaming data. It 
consists of two parts:
(1) Before the size of the streaming data reaches a threshold, it uses regular 
reservoir sampling.
(2) After the size of the streaming data reaches the threshold, it uses a 
geometric distribution to generate an approximate gap with which to skip data; 
the size of the gap is drawn from the geometric distribution with probability 
p = R/j.

Thanks to Erik Erlandson, who is the author of this algorithm and helped me 
with the implementation.




> Add Very Fast Reservoir Sampling
> 
>
> Key: FLINK-3802
> URL: https://issues.apache.org/jira/browse/FLINK-3802
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API
>Reporter: Chenguang He
>Assignee: Chenguang He
>  Labels: Sampling
>
> Adding Very Fast Reservoir Sampling 
> (http://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/)
> An improved version of Reservoir Sampling, used to draw a small sample from a 
> large dataset where the size of the dataset is much larger than the size of 
> the sample.
> It is a random sampling scheme, proved in the link; the average sampling 
> probability is P = R/J, where R is the size of the sample and J is the index 
> of the streaming data.
> Thanks to Erik Erlandson, the author of this algorithm, who helped me with 
> the implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3802] Add Very Fast Reservoir Sampling

2016-04-21 Thread gaoyike
GitHub user gaoyike opened a pull request:

https://github.com/apache/flink/pull/1924

[FLINK-3802] Add Very Fast Reservoir Sampling

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [x] General
  - The pull request references the related JIRA issue
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message

- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed



An in-memory implementation of Very Fast Reservoir Sampling. The algorithm 
works well when the size of the streaming data is much larger than the size of 
the reservoir.

The algorithm samples each element with probability P = R/j, where R is the 
size of the reservoir and j is the current index of the streaming data. It 
consists of two parts:
(1) Before the size of the streaming data reaches a threshold, it uses regular 
reservoir sampling.
(2) After the size of the streaming data reaches the threshold, it uses a 
geometric distribution to generate an approximate gap with which to skip data; 
the size of the gap is drawn from the geometric distribution with probability 
p = R/j.

Thanks to Erik Erlandson, who is the author of this algorithm and helped me 
with the implementation.

Reference: 
http://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gaoyike/flink master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1924.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1924


commit 81e0622b20d8bc969dec1555bd55d4230d9b38de
Author: 晨光 何 
Date:   2016-04-21T21:42:26Z

An in-memory implementation of Very Fast Reservoir Sampling. The algorithm 
works well when the size of the streaming data is much larger than the size of 
the reservoir.

The algorithm samples each element with probability P = R/j, where R is the 
size of the reservoir and j is the current index of the streaming data. It 
consists of two parts:
(1) Before the size of the streaming data reaches a threshold, it uses regular 
reservoir sampling.
(2) After the size of the streaming data reaches the threshold, it uses a 
geometric distribution to generate an approximate gap with which to skip data; 
the size of the gap is drawn from the geometric distribution with probability 
p = R/j.

Thanks to Erik Erlandson, who is the author of this algorithm and helped me 
with the implementation.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-3802) Add Very Fast Reservoir Sampling

2016-04-21 Thread Chenguang He (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chenguang He updated FLINK-3802:

Description: 
Adding Very Fast Reservoir Sampling 
(http://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/)

An improved version of Reservoir Sampling, used to draw a small sample from a 
large dataset where the size of the dataset is much larger than the size of the 
sample.

It is a random sampling scheme, proved in the link; the average sampling 
probability is P = R/J, where R is the size of the sample and J is the index of 
the streaming data.

Thanks to Erik Erlandson, the author of this algorithm, who helped me with the 
implementation.

  was:
Adding Very Fast Reservoir Sampling 
(http://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/)

An improvement version of Reservoir Sampling, it's used to deal with small 
sampling in large dataset, where the set of dataset is much larger than the 
size of sampling.

It is a random sampling proved in the link. The average possibility is P(R/J), 
where R is size of sampling and J is index of streaming data 

Thanks Erik Erlandson who is the author of this algorithm help me with 
implementation.


> Add Very Fast Reservoir Sampling
> 
>
> Key: FLINK-3802
> URL: https://issues.apache.org/jira/browse/FLINK-3802
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API
>Reporter: Chenguang He
>Assignee: Chenguang He
>  Labels: Sampling
>
> Adding Very Fast Reservoir Sampling 
> (http://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/)
> An improved version of Reservoir Sampling, used to draw a small sample from a 
> large dataset where the size of the dataset is much larger than the size of 
> the sample.
> It is a random sampling scheme, proved in the link; the average sampling 
> probability is P = R/J, where R is the size of the sample and J is the index 
> of the streaming data.
> Thanks to Erik Erlandson, the author of this algorithm, who helped me with 
> the implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3802) Add Very Fast Reservoir Sampling

2016-04-21 Thread Chenguang He (JIRA)
Chenguang He created FLINK-3802:
---

 Summary: Add Very Fast Reservoir Sampling
 Key: FLINK-3802
 URL: https://issues.apache.org/jira/browse/FLINK-3802
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Reporter: Chenguang He
Assignee: Chenguang He


Adding Very Fast Reservoir Sampling 
(http://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/)

An improvement version of Reservoir Sampling, it's used to deal with small 
sampling in large dataset, where the set of dataset is much larger than the 
size of sampling.

It is a random sampling proved in the link. The average possibility is P(R/J), 
where R is size of sampling and J is index of streaming data 

Thanks Erik Erlandson who is the author of this algorithm help me with 
implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252289#comment-15252289
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1856#issuecomment-213030347
  
Thanks for your contribution @ramkrish86. Good work. But before merging, we 
should address the support for Scala tuples and add the `maxBy/minBy` to the 
Scala `GroupedDataSet`.


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3650 Add maxBy/minBy to Scala DataSet AP...

2016-04-21 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1856#issuecomment-213030347
  
Thanks for your contribution @ramkrish86. Good work. But before merging, we 
should address the support for Scala tuples and add the `maxBy/minBy` to the 
Scala `GroupedDataSet`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3801) Upgrade Joda-Time library to 2.9.3

2016-04-21 Thread Ted Yu (JIRA)
Ted Yu created FLINK-3801:
-

 Summary: Upgrade Joda-Time library to 2.9.3
 Key: FLINK-3801
 URL: https://issues.apache.org/jira/browse/FLINK-3801
 Project: Flink
  Issue Type: Improvement
Reporter: Ted Yu
Priority: Minor


Currently, Joda-Time 2.5 is used, which is very old.

We should upgrade to 2.9.3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3650 Add maxBy/minBy to Scala DataSet AP...

2016-04-21 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r60620601
  
--- Diff: 
flink-scala/src/test/scala/org/apache/flink/api/operator/MaxByOperatorTest.scala
 ---
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.operator
+
+import org.apache.flink.api.common.InvalidProgramException
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.java.tuple.{Tuple, Tuple5}
+import org.apache.flink.api.java.typeutils.TupleTypeInfo
+import org.apache.flink.api.scala._
+import org.junit.Test
+import org.junit.Assert;
+
+class MaxByOperatorTest {
+  private val emptyTupleData = List[Tuple5[Integer, Long, String, Long, 
Integer]]()
+  private val customTypeData = List[CustomType](new CustomType())
+  private val tupleTypeInfo: TupleTypeInfo[Tuple5[Integer, Long, String, 
Long, Integer]] =
--- End diff --

We should add a test for Scala tuples.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252258#comment-15252258
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r60620601
  
--- Diff: 
flink-scala/src/test/scala/org/apache/flink/api/operator/MaxByOperatorTest.scala
 ---
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.operator
+
+import org.apache.flink.api.common.InvalidProgramException
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.java.tuple.{Tuple, Tuple5}
+import org.apache.flink.api.java.typeutils.TupleTypeInfo
+import org.apache.flink.api.scala._
+import org.junit.Test
+import org.junit.Assert;
+
+class MaxByOperatorTest {
+  private val emptyTupleData = List[Tuple5[Integer, Long, String, Long, 
Integer]]()
+  private val customTypeData = List[CustomType](new CustomType())
+  private val tupleTypeInfo: TupleTypeInfo[Tuple5[Integer, Long, String, 
Long, Integer]] =
--- End diff --

We should add a test for Scala tuples.


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252254#comment-15252254
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r60620477
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/functions/SelectByMinFunction.java
 ---
@@ -72,8 +73,8 @@ public T reduce(T value1, T value2) throws Exception {
for (int position : fields) {
// Save position of compared key
// Get both values - both implement comparable
-   Comparable comparable1 = 
value1.getFieldNotNull(position);
-   Comparable comparable2 = 
value2.getFieldNotNull(position);
+   Comparable comparable1 = 
((Tuple)value1).getFieldNotNull(position);
--- End diff --

This won't work with Scala tuples.


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3650 Add maxBy/minBy to Scala DataSet AP...

2016-04-21 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r60620477
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/functions/SelectByMinFunction.java
 ---
@@ -72,8 +73,8 @@ public T reduce(T value1, T value2) throws Exception {
for (int position : fields) {
// Save position of compared key
// Get both values - both implement comparable
-   Comparable comparable1 = 
value1.getFieldNotNull(position);
-   Comparable comparable2 = 
value2.getFieldNotNull(position);
+   Comparable comparable1 = 
((Tuple)value1).getFieldNotNull(position);
--- End diff --

This won't work with Scala tuples.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252249#comment-15252249
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r60620452
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/functions/SelectByMinFunction.java
 ---
@@ -48,7 +49,7 @@ public SelectByMinFunction(TupleTypeInfo type, int... 
fields) {
}
 
// Check whether type is comparable
-   if (!type.getTypeAt(field).isKeyType()) {
+   if 
(!((TupleTypeInfo)type).getTypeAt(field).isKeyType()) {
--- End diff --

Scala tuple types are not of type `TupleTypeInfo` but instead 
`TupleTypeInfoBase`.


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3650 Add maxBy/minBy to Scala DataSet AP...

2016-04-21 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r60620452
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/functions/SelectByMinFunction.java
 ---
@@ -48,7 +49,7 @@ public SelectByMinFunction(TupleTypeInfo type, int... 
fields) {
}
 
// Check whether type is comparable
-   if (!type.getTypeAt(field).isKeyType()) {
+   if 
(!((TupleTypeInfo)type).getTypeAt(field).isKeyType()) {
--- End diff --

Scala tuple types are not of type `TupleTypeInfo` but instead 
`TupleTypeInfoBase`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3701] reuse serializer lists in Executi...

2016-04-21 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1913#issuecomment-213003239
  
The problem is a bit more involved. We have basically three possible 
branches for the `ExecutionConfig` usage.

1. Serialization during `JobGraph`/`StreamGraph` generation and 
deserialization using the user code class loader during instantiation of the 
tasks
2. Usage in `PojoSerializer` where no explicit 
serialization/deserialization is performed because it is assumed that the 
correct class loader is in place.
3. Reuse of the `ExecutionConfig` for further jobs

If we alter the `ExecutionConfig` after 1) by setting the fields to `null`, 
we change the configuration for the next job. The `ExecutionEnvironment` reuses 
the config. This problem is not always visible because it depends on Akka 
whether the class is serialized or simply passed as a reference. If the class 
is serialized, then a deserialization of the lists won't affect the original 
reference.

As a solution, I've wrapped the types/serializer lists in a 
`SerilizableCacheableValue` which keeps the value for as long as possible and 
deserializes using the default class loader when not explicitly deserialized 
during task instantiation.
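To make the wrapper idea concrete, here is a minimal Java sketch under 
illustrative names (this is not the class from the PR): it snapshots the 
wrapped value into bytes when serialized and defers deserialization until the 
caller supplies a class loader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class LazyDeserializingValue<T> implements Serializable {

    private transient T value;  // the live object; never written directly
    private byte[] bytes;       // the serialized form; survives shipping

    public LazyDeserializingValue(T value) {
        this.value = value;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        if (bytes == null) {    // snapshot the value on first serialization
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(value);
            }
            bytes = bos.toByteArray();
        }
        out.defaultWriteObject();
    }

    /** Deserializes lazily, resolving classes against the given class loader. */
    @SuppressWarnings("unchecked")
    public T get(final ClassLoader classLoader) throws IOException, ClassNotFoundException {
        if (value == null) {
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes)) {
                         @Override
                         protected Class<?> resolveClass(ObjectStreamClass desc)
                                 throws ClassNotFoundException {
                             return Class.forName(desc.getName(), false, classLoader);
                         }
                     }) {
                value = (T) in.readObject();
            }
        }
        return value;
    }
}
```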


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3701) Cant call execute after first execution

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252172#comment-15252172
 ] 

ASF GitHub Bot commented on FLINK-3701:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1913#issuecomment-213003239
  
The problem is a bit more involved. We have basically three possible 
branches for the `ExecutionConfig` usage.

1. Serialization during `JobGraph`/`StreamGraph` generation and 
deserialization using the user code class loader during instantiation of the 
tasks
2. Usage in `PojoSerializer` where no explicit 
serialization/deserialization is performed because it is assumed that the 
correct class loader is in place.
3. Reuse of the `ExecutionConfig` for further jobs

If we alter the `ExecutionConfig` after 1) by setting the fields to `null`, 
we change the configuration for the next job. The `ExecutionEnvironment` reuses 
the config. This problem is not always visible because it depends on Akka 
whether the class is serialized or simply passed as a reference. If the class 
is serialized, then a deserialization of the lists won't affect the original 
reference.

As a solution, I've wrapped the types/serializer lists in a 
`SerilizableCacheableValue` which keeps the value for as long as possible and 
deserializes using the default class loader when not explicitly deserialized 
during task instantiation.


> Cant call execute after first execution
> ---
>
> Key: FLINK-3701
> URL: https://issues.apache.org/jira/browse/FLINK-3701
> Project: Flink
>  Issue Type: Bug
>  Components: Scala Shell
>Reporter: Nikolaas Steenbergen
>Assignee: Maximilian Michels
>
> in the scala shell, local mode, version 1.0 this works:
> {code}
> Scala-Flink> var b = env.fromElements("a","b")
> Scala-Flink> b.print
> Scala-Flink> var c = env.fromElements("c","d")
> Scala-Flink> c.print
> {code}
> in the current master (after c.print) this leads to :
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:1031)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:961)
>   at 
> org.apache.flink.api.java.ScalaShellRemoteEnvironment.execute(ScalaShellRemoteEnvironment.java:70)
>   at 
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:855)
>   at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
>   at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
>   at org.apache.flink.api.scala.DataSet.print(DataSet.scala:1615)
>   at .(:56)
>   at .()
>   at .(:7)
>   at .()
>   at $print()
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
>   at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
>   at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
>   at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
>   at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
>   at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760)
>   at 
> scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:805)
>   at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717)
>   at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
>   at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
>   at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
>   at 
> scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882)
>   at 
> scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
>   at 
> scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
>   at 
> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>   at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837)
>   at 
> org.apache.flink.api.scala.FlinkShell$.startShell(FlinkShell.scala:199)
>   at org.apache.flink.api.scala.FlinkShell$.main(FlinkShell.scala:127)
>   at org.apache.flink.api.scala.FlinkShell.main(FlinkShell.scala)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2157) Create evaluation framework for ML library

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252073#comment-15252073
 ] 

ASF GitHub Bot commented on FLINK-2157:
---

Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/1849#issuecomment-212977748
  
I did some testing and I think the problem has to do with the types that 
each scaler expects.

`StandardScaler` has fit and transform operations for `DataSets` of type 
`Vector`, `LabeledVector`, and `(T <: Vector, Double)`, while `MinMaxScaler` 
does not provide one for `(T <: Vector, Double)`. If you add the operations, 
the code runs fine (at least re. your first comment).

So this is a bug unrelated to this PR, I think. The question becomes whether we 
want to support all three of these types. My recommendation would be to have 
support for `Vector` and `LabeledVector` only, and remove all operations that 
work on `(Vector, Double)` tuples. I will file a JIRA for that.

There is an argument to be made about whether some pre-processing steps are 
supervised (e.g. [PCA vs. LDA](https://stats.stackexchange.com/questions/161362/supervised-dimensionality-reduction)), 
but in the strict definition of a transformer we shouldn't care about the 
label, only the features, so that operation can be implemented at the 
`Transformer` level.


> Create evaluation framework for ML library
> --
>
> Key: FLINK-2157
> URL: https://issues.apache.org/jira/browse/FLINK-2157
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Theodore Vasiloudis
>  Labels: ML
> Fix For: 1.0.0
>
>
> Currently, FlinkML lacks the means to evaluate the performance of trained 
> models. It would be great to add some {{Evaluators}} which can calculate a 
> score based on the true and predicted labels. This could also be used for 
> cross validation to choose the right hyperparameters.
> Possible scores could be the F score [1], the zero-one loss, etc.
> Resources
> [1] [http://en.wikipedia.org/wiki/F1_score]
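As a concrete illustration of one such score (a stand-alone sketch, not 
FlinkML's {{Evaluators}} API), here is a minimal Java computation of F1 from 
aligned true/predicted labels:

{code}
import java.util.List;

public class F1Score {
    /** F1 for the given positive class over aligned true/predicted label lists. */
    public static double f1(List<Integer> truth, List<Integer> predicted, int positive) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < truth.size(); i++) {
            boolean actual = truth.get(i) == positive;
            boolean pred = predicted.get(i) == positive;
            if (actual && pred) tp++;          // true positive
            else if (!actual && pred) fp++;    // false positive
            else if (actual && !pred) fn++;    // false negative
        }
        double precision = (tp + fp) == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = (tp + fn) == 0 ? 0.0 : (double) tp / (tp + fn);
        return (precision + recall) == 0
            ? 0.0
            : 2.0 * precision * recall / (precision + recall);
    }
}
{code}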



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2157] [ml] Create evaluation framework ...

2016-04-21 Thread thvasilo
Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/1849#issuecomment-212977748
  
I did some testing and I think the problem has to do with the types that 
each scaler expects.

`StandardScaler` has fit and transform operations for `DataSets` of type 
`Vector`, `LabeledVector`, and `(T <: Vector, Double)`, while `MinMaxScaler` 
does not provide one for `(T <: Vector, Double)`. If you add the operations, 
the code runs fine (at least re. your first comment).

So this is a bug unrelated to this PR, I think. The question becomes whether we 
want to support all three of these types. My recommendation would be to have 
support for `Vector` and `LabeledVector` only, and remove all operations that 
work on `(Vector, Double)` tuples. I will file a JIRA for that.

There is an argument to be made about whether some pre-processing steps are 
supervised (e.g. [PCA vs. LDA](https://stats.stackexchange.com/questions/161362/supervised-dimensionality-reduction)), 
but in the strict definition of a transformer we shouldn't care about the 
label, only the features, so that operation can be implemented at the 
`Transformer` level.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3800] [jobmanager] Terminate ExecutionG...

2016-04-21 Thread tillrohrmann
GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/1923

[FLINK-3800] [jobmanager] Terminate ExecutionGraphs properly

This PR terminates ExecutionGraphs properly, without restarts, when the 
JobManager calls cancelAndClearEverything. This is achieved by allowing the 
method to be called only with a SuppressRestartsException. The 
SuppressRestartsException will disable the restart strategy of the respective 
ExecutionGraph. This is important because the root cause could be a different 
exception. In order to avoid race conditions, the restart strategy has to be 
asked twice whether it allows restarting the job: once before and once after 
the job has transitioned to the state RESTARTING. This prevents 
ExecutionGraphs from becoming orphans.

Furthermore, this PR fixes the problem that the default restart strategy is 
shared by multiple jobs. The problem is solved by introducing a 
RestartStrategyFactory which creates a separate RestartStrategy instance for 
every job.
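For illustration, a minimal Java sketch of the check-twice pattern around the 
state transition; all names are illustrative, not the actual JobManager code:

```java
import java.util.concurrent.atomic.AtomicReference;

class RestartCoordinator {
    enum State { FAILED, RESTARTING, TERMINATED }

    private final AtomicReference<State> state = new AtomicReference<>(State.FAILED);
    private volatile boolean restartsSuppressed = false;

    void suppressRestarts() {
        restartsSuppressed = true;
    }

    boolean tryRestart() {
        if (restartsSuppressed) {
            return false;                     // first check, before the transition
        }
        if (!state.compareAndSet(State.FAILED, State.RESTARTING)) {
            return false;                     // someone else owns the transition
        }
        if (restartsSuppressed) {             // second check closes the race:
            state.set(State.TERMINATED);      // suppression arrived in between
            return false;
        }
        return true;                          // proceed with the actual restart
    }
}
```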

- [X] General
  - The pull request references the related JIRA issue
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message

- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink fixJobRestart

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1923.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1923


commit ea05ae102428f6be8db4091b849b680112099c36
Author: Till Rohrmann 
Date:   2016-04-21T15:07:51Z

[FLINK-3800] [jobmanager] Terminate ExecutionGraphs properly

This PR terminates ExecutionGraphs properly, without restarts, when the 
JobManager calls cancelAndClearEverything. This is achieved by allowing the 
method to be called only with a SuppressRestartsException. The 
SuppressRestartsException will disable the restart strategy of the respective 
ExecutionGraph. This is important because the root cause could be a different 
exception. In order to avoid race conditions, the restart strategy has to be 
asked twice whether it allows restarting the job: once before and once after 
the job has transitioned to the state RESTARTING. This prevents 
ExecutionGraphs from becoming orphans.

Furthermore, this PR fixes the problem that the default restart strategy is 
shared by multiple jobs. The problem is solved by introducing a 
RestartStrategyFactory which creates a separate RestartStrategy instance for 
every job.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3800) ExecutionGraphs can become orphans

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252042#comment-15252042
 ] 

ASF GitHub Bot commented on FLINK-3800:
---

GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/1923

[FLINK-3800] [jobmanager] Terminate ExecutionGraphs properly

This PR terminates ExecutionGraphs properly, without restarts, when the 
JobManager calls cancelAndClearEverything. This is achieved by allowing the 
method to be called only with a SuppressRestartsException. The 
SuppressRestartsException will disable the restart strategy of the respective 
ExecutionGraph. This is important because the root cause could be a different 
exception. In order to avoid race conditions, the restart strategy has to be 
asked twice whether it allows restarting the job: once before and once after 
the job has transitioned to the state RESTARTING. This prevents 
ExecutionGraphs from becoming orphans.

Furthermore, this PR fixes the problem that the default restart strategy is 
shared by multiple jobs. The problem is solved by introducing a 
RestartStrategyFactory which creates a separate RestartStrategy instance for 
every job.

- [X] General
  - The pull request references the related JIRA issue
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message

- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink fixJobRestart

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1923.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1923


commit ea05ae102428f6be8db4091b849b680112099c36
Author: Till Rohrmann 
Date:   2016-04-21T15:07:51Z

[FLINK-3800] [jobmanager] Terminate ExecutionGraphs properly

This PR terminates ExecutionGraphs properly, without restarts, when the 
JobManager calls cancelAndClearEverything. This is achieved by allowing the 
method to be called only with a SuppressRestartsException. The 
SuppressRestartsException will disable the restart strategy of the respective 
ExecutionGraph. This is important because the root cause could be a different 
exception. In order to avoid race conditions, the restart strategy has to be 
asked twice whether it allows restarting the job: once before and once after 
the job has transitioned to the state RESTARTING. This prevents 
ExecutionGraphs from becoming orphans.

Furthermore, this PR fixes the problem that the default restart strategy is 
shared by multiple jobs. The problem is solved by introducing a 
RestartStrategyFactory which creates a separate RestartStrategy instance for 
every job.




> ExecutionGraphs can become orphans
> --
>
> Key: FLINK-3800
> URL: https://issues.apache.org/jira/browse/FLINK-3800
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>
> The {{JobManager.cancelAndClearEverything}} method fails all currently 
> executing jobs on the {{JobManager}} and then clears the list of 
> {{currentJobs}} kept in the JobManager. This can become problematic if the 
> user has set a restart strategy for a job, because the {{RestartStrategy}} 
> will try to restart the job. This can lead to unwanted re-deployments of the 
> job, which consume resources and thus disturb the execution of other jobs. If 
> the restart strategy never stops, this prevents the {{ExecutionGraph}} from 
> ever being properly terminated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3800) ExecutionGraphs can become orphans

2016-04-21 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3800:


 Summary: ExecutionGraphs can become orphans
 Key: FLINK-3800
 URL: https://issues.apache.org/jira/browse/FLINK-3800
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 1.0.0, 1.1.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann


The {{JobManager.cancelAndClearEverything}} method fails all currently 
executing jobs on the {{JobManager}} and then clears the list of 
{{currentJobs}} kept in the JobManager. This can become problematic if the user 
has set a restart strategy for a job, because the {{RestartStrategy}} will try 
to restart the job. This can lead to unwanted re-deployments of the job, which 
consume resources and thus disturb the execution of other jobs. If the restart 
strategy never stops, this prevents the {{ExecutionGraph}} from ever being 
properly terminated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3799) Graph checksum should execute single job

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252008#comment-15252008
 ] 

ASF GitHub Bot commented on FLINK-3799:
---

GitHub user greghogan opened a pull request:

https://github.com/apache/flink/pull/1922

[FLINK-3799] [gelly] Graph checksum should execute single job



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/greghogan/flink 
3799_graph_checksum_should_execute_single_job

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1922.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1922


commit d54274ce0f86613150c8716c71b9018524039a52
Author: Greg Hogan 
Date:   2016-04-21T12:36:04Z

[FLINK-3799] [gelly] Graph checksum should execute single job




> Graph checksum should execute single job
> 
>
> Key: FLINK-3799
> URL: https://issues.apache.org/jira/browse/FLINK-3799
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> {{GraphUtils.checksumHashCode()}} calls {{DataSetUtils.checksumHashCode()}} 
> for both the vertex and edge {{DataSet}}, each of which requires a separate 
> job. Rewrite this to execute only a single job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3799] [gelly] Graph checksum should exe...

2016-04-21 Thread greghogan
GitHub user greghogan opened a pull request:

https://github.com/apache/flink/pull/1922

[FLINK-3799] [gelly] Graph checksum should execute single job



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/greghogan/flink 
3799_graph_checksum_should_execute_single_job

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1922.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1922


commit d54274ce0f86613150c8716c71b9018524039a52
Author: Greg Hogan 
Date:   2016-04-21T12:36:04Z

[FLINK-3799] [gelly] Graph checksum should execute single job




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2157] [ml] Create evaluation framework ...

2016-04-21 Thread thvasilo
Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/1849#issuecomment-212942014
  
Well, Breeze was recently bumped to 0.12 in #1876; maybe that has something to 
do with it, but let's see.

Any chance you can try with the previous Breeze version?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2157) Create evaluation framework for ML library

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251963#comment-15251963
 ] 

ASF GitHub Bot commented on FLINK-2157:
---

Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/1849#issuecomment-212942014
  
Well, Breeze was recently bumped to 0.12 in #1876; maybe that has something to 
do with it, but let's see.

Any chance you can try with the previous Breeze version?


> Create evaluation framework for ML library
> --
>
> Key: FLINK-2157
> URL: https://issues.apache.org/jira/browse/FLINK-2157
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Theodore Vasiloudis
>  Labels: ML
> Fix For: 1.0.0
>
>
> Currently, FlinkML lacks the means to evaluate the performance of trained 
> models. It would be great to add some {{Evaluators}} which can calculate a 
> score based on the true and predicted labels. This could also be used for 
> cross validation to choose the right hyperparameters.
> Possible scores could be the F score [1], the zero-one loss, etc.
> Resources
> [1] [http://en.wikipedia.org/wiki/F1_score]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3794) Add checks for unsupported operations in streaming table API

2016-04-21 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-3794.
--
   Resolution: Fixed
Fix Version/s: 1.1.0

> Add checks for unsupported operations in streaming table API
> 
>
> Key: FLINK-3794
> URL: https://issues.apache.org/jira/browse/FLINK-3794
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
> Fix For: 1.1.0
>
>
> Unsupported operations on streaming tables currently fail during plan 
> translation. It would be nicer to add checks in the Table API methods and 
> fail with an informative message that the operation is not supported. The 
> operations that are not currently supported are:
> - aggregations inside select
> - groupBy
> - distinct
> - join
> We can simply check if the Table's environment is a streaming environment and 
> throw an unsupported operation exception.
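
For illustration, a self-contained sketch of the check described above, using simplified stand-in types (the actual change lives in {{org.apache.flink.api.table.Table}}):

```scala
sealed trait TableEnvironment
class BatchTableEnvironment extends TableEnvironment
class StreamTableEnvironment extends TableEnvironment

class TableException(msg: String) extends RuntimeException(msg)

// The method checks the environment up front and fails with an informative
// message instead of failing later during plan translation.
class Table(tableEnv: TableEnvironment) {
  def groupBy(fields: String): Table = tableEnv match {
    case _: StreamTableEnvironment =>
      throw new TableException("groupBy on stream tables is currently not supported.")
    case _ =>
      this // batch translation would proceed here
  }
}

object UnsupportedOpsDemo extends App {
  try new Table(new StreamTableEnvironment).groupBy("a")
  catch { case e: TableException => println("caught: " + e.getMessage) }
}
```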





[jira] [Resolved] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)

2016-04-21 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-3727.
--
   Resolution: Implemented
Fix Version/s: 1.1.0

> Add support for embedded streaming SQL (projection, filter, union)
> --
>
> Key: FLINK-3727
> URL: https://issues.apache.org/jira/browse/FLINK-3727
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
> Fix For: 1.1.0
>
>
> Similar to the support for SQL embedded in batch Table API programs, this 
> issue tracks the support for SQL embedded in stream Table API programs. The 
> only currently supported operations on streaming Tables are projection, 
> filtering, and union.
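
For orientation, a hedged usage sketch of the feature; the environment setup mirrors the test code later in this thread, while {{registerDataStream}}, {{sql}}, and {{toDataStream}} are assumed entry points whose details may differ:

```scala
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.table._
import org.apache.flink.api.table.TableEnvironment
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

object StreamSqlSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(env)

    // register a stream as a table (method name assumed)
    val stream = env.fromElements((1, "a"), (2, "b"), (3, "c"))
    tEnv.registerDataStream("T", stream, 'n, 's)

    // projection and filtering: among the operations currently supported on streams
    val result = tEnv.sql("SELECT n, s FROM T WHERE n > 1")

    result.toDataStream[(Int, String)].print()
    env.execute()
  }
}
```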





[jira] [Commented] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251902#comment-15251902
 ] 

ASF GitHub Bot commented on FLINK-3727:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1917


> Add support for embedded streaming SQL (projection, filter, union)
> --
>
> Key: FLINK-3727
> URL: https://issues.apache.org/jira/browse/FLINK-3727
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> Similar to the support for SQL embedded in batch Table API programs, this 
> issue tracks the support for SQL embedded in stream Table API programs. The 
> only currently supported operations on streaming Tables are projection, 
> filtering, and union.





[jira] [Commented] (FLINK-3794) Add checks for unsupported operations in streaming table API

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251903#comment-15251903
 ] 

ASF GitHub Bot commented on FLINK-3794:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1921


> Add checks for unsupported operations in streaming table API
> 
>
> Key: FLINK-3794
> URL: https://issues.apache.org/jira/browse/FLINK-3794
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> Unsupported operations on streaming tables currently fail during plan 
> translation. It would be nicer to add checks in the Table API methods and 
> fail with an informative message that the operation is not supported. The 
> operations that are not currently supported are:
> - aggregations inside select
> - groupBy
> - distinct
> - join
> We can simply check if the Table's environment is a streaming environment and 
> throw an unsupported operation exception.





[GitHub] flink pull request: [FLINK-3727] Embedded streaming SQL projection...

2016-04-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1917




[GitHub] flink pull request: [FLINK-3794] add checks for unsupported operat...

2016-04-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1921




[jira] [Commented] (FLINK-3772) Graph algorithms for vertex and edge degree

2016-04-21 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251900#comment-15251900
 ] 

Vasia Kalavri commented on FLINK-3772:
--

Hi [~greghogan],
I see your point about performance. But judging from your pull request, this 
adds a new set of utilities for Gelly.

So far, Gelly has had 3 components: 
- graph transformations (basically methods of the Graph class)
- iteration abstractions (vertex-centric, GSA, and scatter-gather)
- a library of algorithms (which one can simply call with {{graph.run()}}).

Your PR adds a caching utility and a set of degree annotation functions. I am 
personally in favor of, and always excited about, additions and extensions to Gelly. 
However, as the codebase grows, several people have often raised the point 
that we need to be careful about adding new functionality. This and FLINK-3771 
are considerably big additions, and I think it would be best to see 
what the community thinks first. Would you mind starting a discussion on the 
dev mailing list about these issues?

Thank you!

> Graph algorithms for vertex and edge degree
> ---
>
> Key: FLINK-3772
> URL: https://issues.apache.org/jira/browse/FLINK-3772
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Many graph algorithms require vertices or edges to be marked with the degree. 
> This ticket provides algorithms for annotating
> * vertex degree for undirected graphs
> * vertex out-, in-, and out- and in-degree for directed graphs
> * edge source, target, and source and target degree for undirected graphs
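
As a toy illustration of what the undirected degree annotation computes (plain Scala over an in-memory edge list, not the Gelly implementation):

```scala
// Each endpoint of an undirected edge gains one degree per incident edge.
val edges = Seq((1, 2), (1, 3), (2, 3), (3, 4))
val degree = edges
  .flatMap { case (s, t) => Seq(s, t) }
  .groupBy(identity)
  .mapValues(_.size)
println(degree.toSeq.sorted) // (1,2), (2,2), (3,3), (4,1)
```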





[jira] [Commented] (FLINK-2157) Create evaluation framework for ML library

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251880#comment-15251880
 ] 

ASF GitHub Bot commented on FLINK-2157:
---

Github user rawkintrevo commented on the pull request:

https://github.com/apache/flink/pull/1849#issuecomment-212916215
  
np, also RE: my comment on the docs: I think I can lend a hand there (I was 
actually testing functionality to make sure I understood how it worked). Let me 
know if I can be of assistance.

Also, I did some more hacking this morning...

```scala
%flink

import org.apache.flink.api.scala._

import org.apache.flink.ml.preprocessing.StandardScaler
val scaler = StandardScaler() // MinMaxScaler()

import org.apache.flink.ml.evaluation.{RegressionScores, Scorer}
val loss = RegressionScores.squaredLoss
val scorer = new Scorer(loss)

import org.apache.flink.ml.regression.MultipleLinearRegression
// microIters and survivalLV are defined elsewhere in the notebook
val mlr = MultipleLinearRegression()
  .setIterations(microIters)
  .setConvergenceThreshold(0.001)
  .setWarmStart(true)

val pipeline = scaler.chainPredictor(mlr)
val evaluationDS = survivalLV.map(x => (x.vector, x.label))

pipeline.fit(survivalLV)
// pipeline.evaluate(survivalLV).collect()
scorer.evaluate(evaluationDS, pipeline).collect().head
```

This throws the `breeze.linalg...` error, so I'm not sure exactly what is 
different, but it would seem breeze.linalg is close to the heart of the 
problem(?)



> Create evaluation framework for ML library
> --
>
> Key: FLINK-2157
> URL: https://issues.apache.org/jira/browse/FLINK-2157
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Theodore Vasiloudis
>  Labels: ML
> Fix For: 1.0.0
>
>
> Currently, FlinkML lacks the means to evaluate the performance of trained models. 
> It would be great to add some {{Evaluators}} which can calculate a score 
> based on the information about true and predicted labels. This could also be 
> used for cross validation to choose the right hyperparameters.
> Possible scores could be the F score [1], the zero-one loss, etc.
> Resources
> [1] [http://en.wikipedia.org/wiki/F1_score]





[jira] [Updated] (FLINK-3799) Graph checksum should execute single job

2016-04-21 Thread Greg Hogan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Hogan updated FLINK-3799:
--
Fix Version/s: 1.1.0

> Graph checksum should execute single job
> 
>
> Key: FLINK-3799
> URL: https://issues.apache.org/jira/browse/FLINK-3799
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> {{GraphUtils.checksumHashCode()}} calls {{DataSetUtils.checksumHashCode()}} 
> for both the vertex and edge {{DataSet}} which each require a separate job. 
> Rewrite this to only execute a single job.
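
A minimal sketch of the single-job approach (illustrative, not the code of the PR): attach an accumulator-updating identity map plus a discarding sink to each {{DataSet}}, execute once, and read both accumulators from the job result. {{ChecksumMapper}} and the accumulator names are assumptions here.

```scala
import org.apache.flink.api.common.accumulators.LongCounter
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.java.io.DiscardingOutputFormat
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration

// Identity mapper that folds each element's hashCode into a named accumulator.
class ChecksumMapper[T](name: String) extends RichMapFunction[T, T] {
  private val checksum = new LongCounter()

  override def open(parameters: Configuration): Unit =
    getRuntimeContext.addAccumulator(name, checksum)

  override def map(value: T): T = {
    checksum.add(value.hashCode() & 0xffffffffL)
    value
  }
}

object SingleJobChecksum {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val vertices = env.generateSequence(0, 99)
    val edges = env.generateSequence(0, 999)

    vertices.map(new ChecksumMapper[Long]("vertexChecksum"))
      .output(new DiscardingOutputFormat[Long]())
    edges.map(new ChecksumMapper[Long]("edgeChecksum"))
      .output(new DiscardingOutputFormat[Long]())

    // one execute() covers both checksums instead of one job per DataSet
    val result = env.execute("single-job checksum")
    println("vertices: " + result.getAccumulatorResult[java.lang.Long]("vertexChecksum"))
    println("edges:    " + result.getAccumulatorResult[java.lang.Long]("edgeChecksum"))
  }
}
```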





[jira] [Commented] (FLINK-3794) Add checks for unsupported operations in streaming table API

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251817#comment-15251817
 ] 

ASF GitHub Bot commented on FLINK-3794:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1921#issuecomment-212897347
  
Thanks, will do!


> Add checks for unsupported operations in streaming table API
> 
>
> Key: FLINK-3794
> URL: https://issues.apache.org/jira/browse/FLINK-3794
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> Unsupported operations on streaming tables currently fail during plan 
> translation. It would be nicer to add checks in the Table API methods and 
> fail with an informative message that the operation is not supported. The 
> operations that are not currently supported are:
> - aggregations inside select
> - groupBy
> - distinct
> - join
> We can simply check if the Table's environment is a streaming environment and 
> throw an unsupported operation exception.





[GitHub] flink pull request: [FLINK-3794] add checks for unsupported operat...

2016-04-21 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1921#issuecomment-212897347
  
Thanks, will do!




[jira] [Commented] (FLINK-3794) Add checks for unsupported operations in streaming table API

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251812#comment-15251812
 ] 

ASF GitHub Bot commented on FLINK-3794:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1921#issuecomment-212896335
  
Can be merged after the two minor comments have been resolved.
Thanks @vasia!


> Add checks for unsupported operations in streaming table API
> 
>
> Key: FLINK-3794
> URL: https://issues.apache.org/jira/browse/FLINK-3794
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> Unsupported operations on streaming tables currently fail during plan 
> translation. It would be nicer to add checks in the Table API methods and 
> fail with an informative message that the operation is not supported. The 
> operations that are not currently supported are:
> - aggregations inside select
> - groupBy
> - distinct
> - join
> We can simply check if the Table's environment is a streaming environment and 
> throw an unsupported operation exception.





[GitHub] flink pull request: [FLINK-3794] add checks for unsupported operat...

2016-04-21 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1921#issuecomment-212896335
  
Can be merged after the two minor comments have been resolved.
Thanks @vasia!




[jira] [Commented] (FLINK-3794) Add checks for unsupported operations in streaming table API

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251809#comment-15251809
 ] 

ASF GitHub Bot commented on FLINK-3794:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1921#discussion_r60570939
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/table/streaming/test/UnsupportedOpsTest.scala
 ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.scala.table.streaming.test
+
+import org.apache.flink.api.scala.table._
+import 
org.apache.flink.api.scala.table.streaming.test.utils.{StreamITCase, 
StreamTestData}
+import org.apache.flink.api.table.{TableException, TableEnvironment}
+import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
+import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase
+import org.junit.Test
+
+import scala.collection.mutable
+
+class UnsupportedOpsTest extends StreamingMultipleProgramsTestBase {
+
+  @Test(expected = classOf[TableException])
+  def testSelectWithAggregation(): Unit = {
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+val tEnv = TableEnvironment.getTableEnvironment(env)
+
+StreamITCase.testResults = mutable.MutableList()
--- End diff --

`testResults` not used.


> Add checks for unsupported operations in streaming table API
> 
>
> Key: FLINK-3794
> URL: https://issues.apache.org/jira/browse/FLINK-3794
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> Unsupported operations on streaming tables currently fail during plan 
> translation. It would be nicer to add checks in the Table API methods and 
> fail with an informative message that the operation is not supported. The 
> operations that are not currently supported are:
> - aggregations inside select
> - groupBy
> - distinct
> - join
> We can simply check if the Table's environment is a streaming environment and 
> throw an unsupported operation exception.





[GitHub] flink pull request: [FLINK-3794] add checks for unsupported operat...

2016-04-21 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1921#discussion_r60570939
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/api/scala/table/streaming/test/UnsupportedOpsTest.scala
 ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.scala.table.streaming.test
+
+import org.apache.flink.api.scala.table._
+import 
org.apache.flink.api.scala.table.streaming.test.utils.{StreamITCase, 
StreamTestData}
+import org.apache.flink.api.table.{TableException, TableEnvironment}
+import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
+import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase
+import org.junit.Test
+
+import scala.collection.mutable
+
+class UnsupportedOpsTest extends StreamingMultipleProgramsTestBase {
+
+  @Test(expected = classOf[TableException])
+  def testSelectWithAggregation(): Unit = {
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+val tEnv = TableEnvironment.getTableEnvironment(env)
+
+StreamITCase.testResults = mutable.MutableList()
--- End diff --

`testResults` not used.




[jira] [Created] (FLINK-3799) Graph checksum should execute single job

2016-04-21 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3799:
-

 Summary: Graph checksum should execute single job
 Key: FLINK-3799
 URL: https://issues.apache.org/jira/browse/FLINK-3799
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 1.1.0
Reporter: Greg Hogan
Assignee: Greg Hogan


{{GraphUtils.checksumHashCode()}} calls {{DataSetUtils.checksumHashCode()}} for 
both the vertex and edge {{DataSet}} which each require a separate job. Rewrite 
this to only execute a single job.





[GitHub] flink pull request: [FLINK-3794] add checks for unsupported operat...

2016-04-21 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1921#discussion_r60570764
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/table.scala
 ---
@@ -280,8 +293,14 @@ class Table(
 * }}}
 */
   def groupBy(fields: String): GroupedTable = {
-val fieldsExpr = ExpressionParser.parseExpressionList(fields)
-groupBy(fieldsExpr: _*)
+// group by on stream tables is currently not supported
+tableEnv match {
--- End diff --

This check can be skipped. It will be caught by the other `groupBy`.




[jira] [Commented] (FLINK-3794) Add checks for unsupported operations in streaming table API

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251808#comment-15251808
 ] 

ASF GitHub Bot commented on FLINK-3794:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1921#discussion_r60570764
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/table.scala
 ---
@@ -280,8 +293,14 @@ class Table(
 * }}}
 */
   def groupBy(fields: String): GroupedTable = {
-val fieldsExpr = ExpressionParser.parseExpressionList(fields)
-groupBy(fieldsExpr: _*)
+// group by on stream tables is currently not supported
+tableEnv match {
--- End diff --

This check can be skipped. It will be caught by the other `groupBy`.


> Add checks for unsupported operations in streaming table API
> 
>
> Key: FLINK-3794
> URL: https://issues.apache.org/jira/browse/FLINK-3794
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> Unsupported operations on streaming tables currently fail during plan 
> translation. It would be nicer to add checks in the Table API methods and 
> fail with an informative message that the operation is not supported. The 
> operations that are not currently supported are:
> - aggregations inside select
> - groupBy
> - distinct
> - join
> We can simply check if the Table's environment is a streaming environment and 
> throw an unsupported operation exception.





[jira] [Commented] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251806#comment-15251806
 ] 

ASF GitHub Bot commented on FLINK-3727:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1917#issuecomment-212892867
  
Thanks for the update @vasia! +1 to merge


> Add support for embedded streaming SQL (projection, filter, union)
> --
>
> Key: FLINK-3727
> URL: https://issues.apache.org/jira/browse/FLINK-3727
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> Similar to the support for SQL embedded in batch Table API programs, this 
> issue tracks the support for SQL embedded in stream Table API programs. The 
> only currently supported operations on streaming Tables are projection, 
> filtering, and union.





[GitHub] flink pull request: [FLINK-3727] Embedded streaming SQL projection...

2016-04-21 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1917#issuecomment-212892867
  
Thanks for the update @vasia! +1 to merge




[GitHub] flink pull request: [FLINK-3794] add checks for unsupported operat...

2016-04-21 Thread vasia
GitHub user vasia opened a pull request:

https://github.com/apache/flink/pull/1921

[FLINK-3794] add checks for unsupported operations in streaming table API

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following checklist into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [X] General
  - The pull request references the related JIRA issue
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed

groupBy, distinct, join, and aggregation inside select will now fail before 
translation.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vasia/flink flink-3794

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1921.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1921


commit 6a2f21a1f20933558635f89db03fcd03a180ca00
Author: vasia 
Date:   2016-04-21T10:17:32Z

[FLINK-3794] add checks for unsupported operations in streaming table API






[jira] [Commented] (FLINK-3798) Clean up RocksDB state backend access modifiers

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251674#comment-15251674
 ] 

ASF GitHub Bot commented on FLINK-3798:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1918#issuecomment-212845767
  
Hotfix time?


> Clean up RocksDB state backend access modifiers
> ---
>
> Key: FLINK-3798
> URL: https://issues.apache.org/jira/browse/FLINK-3798
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Minor
>
> The RocksDB state backend uses a lot of package-private methods and fields, which 
> makes it very hard to subclass the different parts for added functionality. I 
> think these should be protected instead. 
> Also, the AbstractRocksDBState declares some methods final when there are 
> use cases where a subclass might want to call them.
> Just to give you an example, I am creating a version of the value state which 
> would keep a small cache on heap. For this it would be enough to subclass the 
> RocksDBStateBackend and RocksDBValueState classes if the above-mentioned 
> changes were made. Now I have to use reflection to access package-private 
> fields and actually copy classes due to final methods.
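
To illustrate the use case, a toy sketch of the kind of subclass this would enable; the types below are hypothetical simplified stand-ins for the RocksDB state classes, and the point is that the base constructor and methods must be accessible and non-final:

```scala
// Stand-in for a backend-based value state (imagine RocksDB get/put inside).
class BackendValueState[V](protected var stored: Option[V] = None) {
  def value(): Option[V] = stored               // would be a RocksDB get()
  def update(v: V): Unit = { stored = Some(v) } // would be a RocksDB put()
}

// The subclass keeps a small on-heap cache in front of the backend.
class CachingValueState[V] extends BackendValueState[V] {
  private var cache: Option[V] = None

  override def value(): Option[V] = cache.orElse {
    val v = super.value()
    cache = v
    v
  }

  override def update(v: V): Unit = {
    cache = Some(v)
    super.update(v)
  }
}

object CachingStateDemo extends App {
  val state = new CachingValueState[String]
  state.update("hello")
  println(state.value()) // Some(hello), served from the cache
}
```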





[GitHub] flink pull request: [FLINK-3798] [streaming] Clean up RocksDB back...

2016-04-21 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1918#issuecomment-212845767
  
Hotfix time?




[jira] [Commented] (FLINK-3790) Rolling File sink does not pick up hadoop configuration

2016-04-21 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251672#comment-15251672
 ] 

Fabian Hueske commented on FLINK-3790:
--

The fix should be merged to the {{release-1.0}} branch as well, no?

> Rolling File sink does not pick up hadoop configuration
> ---
>
> Key: FLINK-3790
> URL: https://issues.apache.org/jira/browse/FLINK-3790
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Affects Versions: 1.0.2
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Critical
>
> In the rolling file sink function, a new Hadoop configuration is created to 
> get the FileSystem every time, which completely ignores the Hadoop config set 
> in flink-conf.yaml.
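
A short sketch of the failure mode only (plain Hadoop API, not the Flink fix): a {{Configuration}} built from scratch sees only classpath defaults, so settings Flink assembled from flink-conf.yaml are lost and the returned FileSystem may fall back to the local one:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

object FreshConfDemo {
  def main(args: Array[String]): Unit = {
    // A fresh Configuration only reads core-default.xml/core-site.xml from
    // the classpath; it knows nothing about Flink's configured resources.
    val freshConf = new Configuration()
    val fs = FileSystem.get(freshConf)
    println(fs.getUri) // e.g. file:/// instead of hdfs://namenode:8020
  }
}
```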





[jira] [Resolved] (FLINK-3798) Clean up RocksDB state backend access modifiers

2016-04-21 Thread Gyula Fora (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora resolved FLINK-3798.
---
Resolution: Fixed

> Clean up RocksDB state backend access modifiers
> ---
>
> Key: FLINK-3798
> URL: https://issues.apache.org/jira/browse/FLINK-3798
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Minor
>
> The RocksDB state backend uses a lot of package-private methods and fields, which 
> makes it very hard to subclass the different parts for added functionality. I 
> think these should be protected instead. 
> Also, the AbstractRocksDBState declares some methods final when there are 
> use cases where a subclass might want to call them.
> Just to give you an example, I am creating a version of the value state which 
> would keep a small cache on heap. For this it would be enough to subclass the 
> RocksDBStateBackend and RocksDBValueState classes if the above-mentioned 
> changes were made. Now I have to use reflection to access package-private 
> fields and actually copy classes due to final methods.





[jira] [Resolved] (FLINK-3790) Rolling File sink does not pick up hadoop configuration

2016-04-21 Thread Gyula Fora (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora resolved FLINK-3790.
---
Resolution: Fixed

> Rolling File sink does not pick up hadoop configuration
> ---
>
> Key: FLINK-3790
> URL: https://issues.apache.org/jira/browse/FLINK-3790
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Affects Versions: 1.0.2
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Critical
>
> In the rolling file sink function, a new Hadoop configuration is created to 
> get the FileSystem every time, which completely ignores the Hadoop config set 
> in flink-conf.yaml.





[jira] [Commented] (FLINK-3798) Clean up RocksDB state backend access modifiers

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251640#comment-15251640
 ] 

ASF GitHub Bot commented on FLINK-3798:
---

Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1918#issuecomment-212834630
  
Ahh, the actual classes were not public either. I'm so stupid to have missed 
that, lol... :D


> Clean up RocksDB state backend access modifiers
> ---
>
> Key: FLINK-3798
> URL: https://issues.apache.org/jira/browse/FLINK-3798
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Minor
>
> The RocksDB state backend uses a lot of package-private methods and fields, which 
> makes it very hard to subclass the different parts for added functionality. I 
> think these should be protected instead. 
> Also, the AbstractRocksDBState declares some methods final when there are 
> use cases where a subclass might want to call them.
> Just to give you an example, I am creating a version of the value state which 
> would keep a small cache on heap. For this it would be enough to subclass the 
> RocksDBStateBackend and RocksDBValueState classes if the above-mentioned 
> changes were made. Now I have to use reflection to access package-private 
> fields and actually copy classes due to final methods.





[GitHub] flink pull request: [FLINK-3798] [streaming] Clean up RocksDB back...

2016-04-21 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1918#issuecomment-212834630
  
Ahh, the actual classes were not public either. I'm so stupid to have missed 
that, lol... :D




[jira] [Commented] (FLINK-3790) Rolling File sink does not pick up hadoop configuration

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251627#comment-15251627
 ] 

ASF GitHub Bot commented on FLINK-3790:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1919


> Rolling File sink does not pick up hadoop configuration
> ---
>
> Key: FLINK-3790
> URL: https://issues.apache.org/jira/browse/FLINK-3790
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Affects Versions: 1.0.2
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Critical
>
> In the rolling file sink function, a new Hadoop configuration is created to 
> get the FileSystem every time, which completely ignores the Hadoop config set 
> in flink-conf.yaml.





[GitHub] flink pull request: [FLINK-3790] [streaming] Use proper hadoop con...

2016-04-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1919




[GitHub] flink pull request: [FLINK-3798] [streaming] Clean up RocksDB back...

2016-04-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1918




[jira] [Commented] (FLINK-3798) Clean up RocksDB state backend access modifiers

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251623#comment-15251623
 ] 

ASF GitHub Bot commented on FLINK-3798:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1918


> Clean up RocksDB state backend access modifiers
> ---
>
> Key: FLINK-3798
> URL: https://issues.apache.org/jira/browse/FLINK-3798
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Minor
>
> The RocksDB state backend uses a lot of package-private methods and fields, which 
> makes it very hard to subclass the different parts for added functionality. I 
> think these should be protected instead. 
> Also, the AbstractRocksDBState declares some methods final when there are 
> use cases where a subclass might want to call them.
> Just to give you an example, I am creating a version of the value state which 
> would keep a small cache on heap. For this it would be enough to subclass the 
> RocksDBStateBackend and RocksDBValueState classes if the above-mentioned 
> changes were made. Now I have to use reflection to access package-private 
> fields and actually copy classes due to final methods.





[jira] [Commented] (FLINK-3727) Add support for embedded streaming SQL (projection, filter, union)

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251617#comment-15251617
 ] 

ASF GitHub Bot commented on FLINK-3727:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1917#issuecomment-212828521
  
Thank you for the review @fhueske! I've addressed your comments. We can 
definitely try to figure out a better way to handle the SQL tests, as there is 
a lot of overlap with the Table API tests.


> Add support for embedded streaming SQL (projection, filter, union)
> --
>
> Key: FLINK-3727
> URL: https://issues.apache.org/jira/browse/FLINK-3727
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> Similar to the support for SQL embedded in batch Table API programs, this 
> issue tracks the support for SQL embedded in stream Table API programs. The 
> only currently supported operations on streaming Tables are projection, 
> filtering, and union.





[GitHub] flink pull request: [FLINK-3727] Embedded streaming SQL projection...

2016-04-21 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1917#issuecomment-212828521
  
Thank you for the review @fhueske! I've addressed your comments. We can 
definitely try to figure out a better way to handle the SQL tests, as there is 
a lot of overlap with the Table API tests.




[jira] [Commented] (FLINK-3790) Rolling File sink does not pick up hadoop configuration

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251616#comment-15251616
 ] 

ASF GitHub Bot commented on FLINK-3790:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1919#issuecomment-212828216
  
Cool, could you merge it then? 


> Rolling File sink does not pick up hadoop configuration
> ---
>
> Key: FLINK-3790
> URL: https://issues.apache.org/jira/browse/FLINK-3790
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Affects Versions: 1.0.2
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Critical
>
> In the rolling file sink function, a new Hadoop configuration is created to 
> get the FileSystem every time, which completely ignores the Hadoop config set 
> in flink-conf.yaml.





[GitHub] flink pull request: [FLINK-3790] [streaming] Use proper hadoop con...

2016-04-21 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1919#issuecomment-212828216
  
Cool, could you merge it then? 




[GitHub] flink pull request: [FLINK-3798] [streaming] Clean up RocksDB back...

2016-04-21 Thread aljoscha
Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/1918#discussion_r60550239
  
--- Diff: 
flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBValueState.java
 ---
@@ -64,7 +64,7 @@
 * @param stateDesc The state identifier for the state. This contains 
name
 *   and can create a default state value.
 */
-   RocksDBValueState(ColumnFamilyHandle columnFamily,
+   public RocksDBValueState(ColumnFamilyHandle columnFamily,
--- End diff --

Oh, for some reason I didn't see that... 😅 Could you go ahead and merge 
it then?




[jira] [Commented] (FLINK-3798) Clean up RocksDB state backend access modifiers

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251615#comment-15251615
 ] 

ASF GitHub Bot commented on FLINK-3798:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/1918#discussion_r60550239
  
--- Diff: 
flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBValueState.java
 ---
@@ -64,7 +64,7 @@
 * @param stateDesc The state identifier for the state. This contains 
name
 *   and can create a default state value.
 */
-   RocksDBValueState(ColumnFamilyHandle columnFamily,
+   public RocksDBValueState(ColumnFamilyHandle columnFamily,
--- End diff --

Oh, for some reason I didn't see that... Could you go ahead and merge it then?


> Clean up RocksDB state backend access modifiers
> ---
>
> Key: FLINK-3798
> URL: https://issues.apache.org/jira/browse/FLINK-3798
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Minor
>
> The RocksDB state backend uses a lot of package-private methods and fields, which 
> makes it very hard to subclass the different parts for added functionality. I 
> think these should be protected instead. 
> Also, the AbstractRocksDBState declares some methods final when there are 
> use cases where a subclass might want to call them.
> Just to give you an example, I am creating a version of the value state which 
> would keep a small cache on heap. For this it would be enough to subclass the 
> RocksDBStateBackend and RocksDBValueState classes if the above-mentioned 
> changes were made. Now I have to use reflection to access package-private 
> fields and actually copy classes due to final methods.





[jira] [Commented] (FLINK-3790) Rolling File sink does not pick up hadoop configuration

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251614#comment-15251614
 ] 

ASF GitHub Bot commented on FLINK-3790:
---

Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1919#issuecomment-212827419
  
We confirmed that this works in our production environment


> Rolling File sink does not pick up hadoop configuration
> ---
>
> Key: FLINK-3790
> URL: https://issues.apache.org/jira/browse/FLINK-3790
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Affects Versions: 1.0.2
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Critical
>
> In the rolling file sink function, a new Hadoop configuration is created to 
> get the FileSystem every time, which completely ignores the Hadoop config set 
> in flink-conf.yaml.





[GitHub] flink pull request: [FLINK-3790] [streaming] Use proper hadoop con...

2016-04-21 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1919#issuecomment-212827419
  
We confirmed that this works in our production environment




[GitHub] flink pull request: [FLINK-3798] [streaming] Clean up RocksDB back...

2016-04-21 Thread gyfora
Github user gyfora commented on a diff in the pull request:

https://github.com/apache/flink/pull/1918#discussion_r60549764
  
--- Diff: 
flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBValueState.java
 ---
@@ -64,7 +64,7 @@
 * @param stateDesc The state identifier for the state. This contains 
name
 *   and can create a default state value.
 */
-   RocksDBValueState(ColumnFamilyHandle columnFamily,
+   public RocksDBValueState(ColumnFamilyHandle columnFamily,
--- End diff --

I think that's how it is, except for the AbstractRocksDBState :)




[jira] [Commented] (FLINK-3798) Clean up RocksDB state backend access modifiers

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251593#comment-15251593
 ] 

ASF GitHub Bot commented on FLINK-3798:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/1918#discussion_r60547896
  
--- Diff: 
flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBValueState.java
 ---
@@ -64,7 +64,7 @@
 * @param stateDesc The state identifier for the state. This contains 
name
 *   and can create a default state value.
 */
-   RocksDBValueState(ColumnFamilyHandle columnFamily,
+   public RocksDBValueState(ColumnFamilyHandle columnFamily,
--- End diff --

Ah I see, you're right. Could you make all of the State constructors public 
then? Just for consistency...  


> Clean up RocksDB state backend access modifiers
> ---
>
> Key: FLINK-3798
> URL: https://issues.apache.org/jira/browse/FLINK-3798
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Minor
>
> The RocksDB state backend uses a lot of package-private methods and fields, which 
> makes it very hard to subclass the different parts for added functionality. I 
> think these should be protected instead. 
> Also, the AbstractRocksDBState declares some methods final when there are 
> use cases where a subclass might want to call them.
> Just to give you an example, I am creating a version of the value state which 
> would keep a small cache on heap. For this it would be enough to subclass the 
> RocksDBStateBackend and RocksDBValueState classes if the above-mentioned 
> changes were made. Now I have to use reflection to access package-private 
> fields and actually copy classes due to final methods.





[jira] [Commented] (FLINK-3790) Rolling File sink does not pick up hadoop configuration

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251575#comment-15251575
 ] 

ASF GitHub Bot commented on FLINK-3790:
---

Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1919#issuecomment-212816048
  
I need about an hour :) but working on it


> Rolling File sink does not pick up hadoop configuration
> ---
>
> Key: FLINK-3790
> URL: https://issues.apache.org/jira/browse/FLINK-3790
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Affects Versions: 1.0.2
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Critical
>
> In the rolling file sink function, a new Hadoop configuration is created to 
> get the FileSystem every time, which completely ignores the Hadoop config set 
> in flink-conf.yaml.





[GitHub] flink pull request: [FLINK-3790] [streaming] Use proper hadoop con...

2016-04-21 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1919#issuecomment-212816048
  
I need about an hour :) but working on it




[GitHub] flink pull request: [FLINK-3798] [streaming] Clean up RocksDB back...

2016-04-21 Thread gyfora
Github user gyfora commented on a diff in the pull request:

https://github.com/apache/flink/pull/1918#discussion_r60544472
  
--- Diff: 
flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBValueState.java
 ---
@@ -64,7 +64,7 @@
 * @param stateDesc The state identifier for the state. This contains 
name
 *   and can create a default state value.
 */
-   RocksDBValueState(ColumnFamilyHandle columnFamily,
+   public RocksDBValueState(ColumnFamilyHandle columnFamily,
--- End diff --

The reason why I made the state constructors public is so that if you subclass 
the RocksDBStateBackend you can still manually instantiate them. You wouldn't 
be able to call a protected constructor from your package, though. 

It's no biggie, I can change it if you still think it's better as protected.




[GitHub] flink pull request: [FLINK-3790] [streaming] Use proper hadoop con...

2016-04-21 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1919#issuecomment-212807484
  
Did you test the changes on your system and verify that it correctly picks 
up the config?




[jira] [Commented] (FLINK-3790) Rolling File sink does not pick up hadoop configuration

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251557#comment-15251557
 ] 

ASF GitHub Bot commented on FLINK-3790:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1919#issuecomment-212807484
  
Did you test the changes on your system and verify that it correctly picks 
up the config?


> Rolling File sink does not pick up hadoop configuration
> ---
>
> Key: FLINK-3790
> URL: https://issues.apache.org/jira/browse/FLINK-3790
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Affects Versions: 1.0.2
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Critical
>
> In the rolling file sink function, a new Hadoop configuration is created to 
> get the FileSystem every time, which completely ignores the Hadoop config set 
> in flink-conf.yaml.





[jira] [Commented] (FLINK-3798) Clean up RocksDB state backend access modifiers

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251554#comment-15251554
 ] 

ASF GitHub Bot commented on FLINK-3798:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/1918#discussion_r60543157
  
--- Diff: 
flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBValueState.java
 ---
@@ -64,7 +64,7 @@
 * @param stateDesc The state identifier for the state. This contains 
name
 *   and can create a default state value.
 */
-   RocksDBValueState(ColumnFamilyHandle columnFamily,
+   public RocksDBValueState(ColumnFamilyHandle columnFamily,
--- End diff --

Would it be enough to make this `protected`?


> Clean up RocksDB state backend access modifiers
> ---
>
> Key: FLINK-3798
> URL: https://issues.apache.org/jira/browse/FLINK-3798
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Minor
>
> The RocksDB state backend uses a lot of package-private methods and fields, which 
> makes it very hard to subclass the different parts for added functionality. I 
> think these should be protected instead. 
> Also, the AbstractRocksDBState declares some methods final when there are 
> use cases where a subclass might want to call them.
> Just to give you an example, I am creating a version of the value state which 
> would keep a small cache on heap. For this it would be enough to subclass the 
> RocksDBStateBackend and RocksDBValueState classes if the above-mentioned 
> changes were made. Now I have to use reflection to access package-private 
> fields and actually copy classes due to final methods.





[GitHub] flink pull request: [FLINK-3798] [streaming] Clean up RocksDB back...

2016-04-21 Thread aljoscha
Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/1918#discussion_r60543157
  
--- Diff: 
flink-contrib/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBValueState.java
 ---
@@ -64,7 +64,7 @@
 * @param stateDesc The state identifier for the state. This contains 
name
 *   and can create a default state value.
 */
-   RocksDBValueState(ColumnFamilyHandle columnFamily,
+   public RocksDBValueState(ColumnFamilyHandle columnFamily,
--- End diff --

Would it be enough to make this `protected`?




[jira] [Commented] (FLINK-3796) FileSourceFunction doesn't respect InputFormat's life cycle methods

2016-04-21 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251550#comment-15251550
 ] 

Aljoscha Krettek commented on FLINK-3796:
-

I think [~kkl0u] is currently working on the streaming file sources and I was 
under the impression that this would make the {{FileSourceFunction}} obsolete 
as well.

> FileSourceFunction doesn't respect InputFormat's life cycle methods 
> 
>
> Key: FLINK-3796
> URL: https://issues.apache.org/jira/browse/FLINK-3796
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.0.0, 1.0.1, 1.0.2
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> The {{FileSourceFunction}} wraps {{InputFormat}} but doesn't execute its life 
> cycle functions correctly.
> 1) It doesn't call {{close()}} before reading the next InputSplit via 
> {{open(InputSplit split)}}.
> 2) It calls {{close()}} even if no InputSplit has been read (and thus 
> open(..) hasn't been called previously).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3708) Scala API for CEP

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251547#comment-15251547
 ] 

ASF GitHub Bot commented on FLINK-3708:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1905#discussion_r60542216
  
--- Diff: 
flink-libraries/flink-cep-scala/src/main/scala/org/apache/flink/cep/scala/pattern/package.scala
 ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.cep.scala
+
+import org.apache.flink.cep.pattern.{FollowedByPattern => 
JFollowedByPattern, Pattern => JPattern}
+
+import _root_.scala.reflect.ClassTag
+
+package object pattern {
+  /**
+* Utility method to wrap { @link org.apache.flink.cep.pattern.Pattern} 
and its subclasses
+* for usage with the Scala API.
+*
+* @param javaPattern The underlying pattern from the Java API
+* @tparam T Base type of the elements appearing in the pattern
+* @tparam F Subtype of T to which the current pattern operator is 
constrained
+* @return A pattern from the Scala API which wraps the pattern from 
the Java API
+*/
+  private[flink] def wrapPattern[
+  T: ClassTag, F <: T : ClassTag](javaPattern: JPattern[T, F])
+  : Pattern[T, F] = javaPattern match {
+case f: JFollowedByPattern[T, F] => FollowedByPattern[T, F](f)
+case p: JPattern[T, F] => Pattern[T, F](p)
+case _ => null
--- End diff --

Oh yes, you're totally right Stefan. But then we could maybe return an 
`Option[Pattern]` from the `wrapPattern` method.


> Scala API for CEP
> -
>
> Key: FLINK-3708
> URL: https://issues.apache.org/jira/browse/FLINK-3708
> Project: Flink
>  Issue Type: Improvement
>  Components: CEP
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Stefan Richter
>
> Currently, The CEP library does not support Scala case classes, because the 
> {{TypeExtractor}} cannot handle them. In order to support them, it would be 
> necessary to offer a Scala API for the CEP library.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3708] Scala API for CEP (initial).

2016-04-21 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/1905#discussion_r60542216
  
--- Diff: 
flink-libraries/flink-cep-scala/src/main/scala/org/apache/flink/cep/scala/pattern/package.scala
 ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.cep.scala
+
+import org.apache.flink.cep.pattern.{FollowedByPattern => 
JFollowedByPattern, Pattern => JPattern}
+
+import _root_.scala.reflect.ClassTag
+
+package object pattern {
+  /**
+* Utility method to wrap { @link org.apache.flink.cep.pattern.Pattern} 
and its subclasses
+* for usage with the Scala API.
+*
+* @param javaPattern The underlying pattern from the Java API
+* @tparam T Base type of the elements appearing in the pattern
+* @tparam F Subtype of T to which the current pattern operator is 
constrained
+* @return A pattern from the Scala API which wraps the pattern from 
the Java API
+*/
+  private[flink] def wrapPattern[
+  T: ClassTag, F <: T : ClassTag](javaPattern: JPattern[T, F])
+  : Pattern[T, F] = javaPattern match {
+case f: JFollowedByPattern[T, F] => FollowedByPattern[T, F](f)
+case p: JPattern[T, F] => Pattern[T, F](p)
+case _ => null
--- End diff --

Oh yes, you're totally right Stefan. But then we could maybe return an 
`Option[Pattern]` from the `wrapPattern` method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2157) Create evaluation framework for ML library

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251513#comment-15251513
 ] 

ASF GitHub Bot commented on FLINK-2157:
---

Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/1849#issuecomment-212793430
  
Hello Trevor,

Thanks for taking the time to look at this, I'll investigate these issues
today hopefully.

-- 
Sent from a mobile device. May contain autocorrect errors.
On Apr 21, 2016 12:16 AM, "Trevor Grant"  wrote:

> Also two quick issues.
>
> *pipelines*
>
> val scaler = MinMaxScaler()val pipeline = scaler.chainPredictor(mlr)val 
evaluationDS = survivalLV.map(x => (x.vector, x.label))
>
> pipeline.fit(survivalLV)
> scorer.evaluate(evaluationDS, pipeline).collect().head
>
> When using this with a ChainedPredictor as the predictor I get the
> following error:
> error: could not find implicit value for parameter evaluateOperation:
> 
org.apache.flink.ml.pipeline.EvaluateDataSetOperation[org.apache.flink.ml.pipeline.ChainedPredictor[org.apache.flink.ml.preprocessing.MinMaxScaler,org.apache.flink.ml.regression.MultipleLinearRegression],(org.apache.flink.ml.math.Vector,
> Double),Double]
>
> *MinMaxScaler()*
> Merging for me broke the following code:
>
> val scaler = MinMaxScaler()val scaledSurvivalLV = 
scaler.transform(survivalLV)
>
> With the following error (omiting part of the stack trace)
> Caused by: java.lang.NoSuchMethodError:
> breeze.linalg.Vector$.scalarOf()Lbreeze/linalg/support/ScalarOf;
> at
> 
org.apache.flink.ml.preprocessing.MinMaxScaler$$anonfun$3.apply(MinMaxScaler.scala:156)
> at
> 
org.apache.flink.ml.preprocessing.MinMaxScaler$$anonfun$3.apply(MinMaxScaler.scala:154)
> at org.apache.flink.api.scala.DataSet$$anon$7.reduce(DataSet.scala:584)
> at
> 
org.apache.flink.runtime.operators.chaining.ChainedAllReduceDriver.collect(ChainedAllReduceDriver.java:93)
> at
> 
org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
> at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:97)
> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:480)
> at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
> at java.lang.Thread.run(Thread.java:745)
>
> I'm looking for a work around. Just saying I found a regression. Other
> than that, looks/works AWESOME well done.
>
> —
> You are receiving this because you authored the thread.
> Reply to this email directly or view it on GitHub
> 
>



> Create evaluation framework for ML library
> --
>
> Key: FLINK-2157
> URL: https://issues.apache.org/jira/browse/FLINK-2157
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Theodore Vasiloudis
>  Labels: ML
> Fix For: 1.0.0
>
>
> Currently, FlinkML lacks means to evaluate the performance of trained models. 
> It would be great to add some {{Evaluators}} which can calculate some score 
> based on the information about true and predicted labels. This could also be 
> used for the cross validation to choose the right hyper parameters.
> Possible scores could be F score [1], zero-one-loss score, etc.
> Resources
> [1] [http://en.wikipedia.org/wiki/F1_score]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2157] [ml] Create evaluation framework ...

2016-04-21 Thread thvasilo
Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/1849#issuecomment-212793430
  
Hello Trevor,

Thanks for taking the time to look at this, I'll investigate these issues
today hopefully.

-- 
Sent from a mobile device. May contain autocorrect errors.
On Apr 21, 2016 12:16 AM, "Trevor Grant"  wrote:

> Also two quick issues.
>
> *pipelines*
>
> val scaler = MinMaxScaler()val pipeline = scaler.chainPredictor(mlr)val 
evaluationDS = survivalLV.map(x => (x.vector, x.label))
>
> pipeline.fit(survivalLV)
> scorer.evaluate(evaluationDS, pipeline).collect().head
>
> When using this with a ChainedPredictor as the predictor I get the
> following error:
> error: could not find implicit value for parameter evaluateOperation:
> 
org.apache.flink.ml.pipeline.EvaluateDataSetOperation[org.apache.flink.ml.pipeline.ChainedPredictor[org.apache.flink.ml.preprocessing.MinMaxScaler,org.apache.flink.ml.regression.MultipleLinearRegression],(org.apache.flink.ml.math.Vector,
> Double),Double]
>
> *MinMaxScaler()*
> Merging for me broke the following code:
>
> val scaler = MinMaxScaler()val scaledSurvivalLV = 
scaler.transform(survivalLV)
>
> With the following error (omiting part of the stack trace)
> Caused by: java.lang.NoSuchMethodError:
> breeze.linalg.Vector$.scalarOf()Lbreeze/linalg/support/ScalarOf;
> at
> 
org.apache.flink.ml.preprocessing.MinMaxScaler$$anonfun$3.apply(MinMaxScaler.scala:156)
> at
> 
org.apache.flink.ml.preprocessing.MinMaxScaler$$anonfun$3.apply(MinMaxScaler.scala:154)
> at org.apache.flink.api.scala.DataSet$$anon$7.reduce(DataSet.scala:584)
> at
> 
org.apache.flink.runtime.operators.chaining.ChainedAllReduceDriver.collect(ChainedAllReduceDriver.java:93)
> at
> 
org.apache.flink.runtime.operators.chaining.ChainedMapDriver.collect(ChainedMapDriver.java:78)
> at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:97)
> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:480)
> at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
> at java.lang.Thread.run(Thread.java:745)
>
> I'm looking for a work around. Just saying I found a regression. Other
> than that, looks/works AWESOME well done.
>
> —
> You are receiving this because you authored the thread.
> Reply to this email directly or view it on GitHub
> 
>



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3781) BlobClient may be left unclosed in BlobCache#deleteGlobal()

2016-04-21 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251365#comment-15251365
 ] 

Fabian Hueske commented on FLINK-3781:
--

[~readman] Yes, definitely :-) 
Thanks for your contribution.
I gave you contributor permissions as well. You can now assign issues to 
yourself if you'd like to continue contributing.

> BlobClient may be left unclosed in BlobCache#deleteGlobal()
> ---
>
> Key: FLINK-3781
> URL: https://issues.apache.org/jira/browse/FLINK-3781
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Chenguang He
>Priority: Minor
> Fix For: 1.1.0
>
>
> {code}
>   public void deleteGlobal(BlobKey key) throws IOException {
> delete(key);
> BlobClient bc = createClient();
> bc.delete(key);
> bc.close();
> {code}
> If delete() throws IOException, BlobClient would be left inclosed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3781) BlobClient may be left unclosed in BlobCache#deleteGlobal()

2016-04-21 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske updated FLINK-3781:
-
Assignee: Chenguang He

> BlobClient may be left unclosed in BlobCache#deleteGlobal()
> ---
>
> Key: FLINK-3781
> URL: https://issues.apache.org/jira/browse/FLINK-3781
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Chenguang He
>Priority: Minor
> Fix For: 1.1.0
>
>
> {code}
>   public void deleteGlobal(BlobKey key) throws IOException {
> delete(key);
> BlobClient bc = createClient();
> bc.delete(key);
> bc.close();
> {code}
> If delete() throws IOException, BlobClient would be left inclosed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3229) Kinesis streaming consumer with integration of Flink's checkpointing mechanics

2016-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251355#comment-15251355
 ] 

ASF GitHub Bot commented on FLINK-3229:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/1911#discussion_r60531075
  
--- Diff: flink-streaming-connectors/pom.xml ---
@@ -45,6 +45,7 @@ under the License.
flink-connector-rabbitmq
flink-connector-twitter
flink-connector-nifi
+   flink-connector-kinesis
--- End diff --

Thanks, I missed the "include-kinesis" profile defined below. We'll 
probably need a more general profile name in the future though (ex. 
include-aws-connectors), for example when we start including more Amazon 
licensed libraries for other connectors such as for DynamoDB


> Kinesis streaming consumer with integration of Flink's checkpointing mechanics
> --
>
> Key: FLINK-3229
> URL: https://issues.apache.org/jira/browse/FLINK-3229
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming Connectors
>Affects Versions: 1.0.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>
> Opening a sub-task to implement data source consumer for Kinesis streaming 
> connector (https://issues.apache.org/jira/browser/FLINK-3211).
> An example of the planned user API for Flink Kinesis Consumer:
> {code}
> Properties kinesisConfig = new Properties();
> config.put(KinesisConfigConstants.CONFIG_AWS_REGION, "us-east-1");
> config.put(KinesisConfigConstants.CONFIG_AWS_CREDENTIALS_PROVIDER_TYPE, 
> "BASIC");
> config.put(
> KinesisConfigConstants.CONFIG_AWS_CREDENTIALS_PROVIDER_BASIC_ACCESSKEYID, 
> "aws_access_key_id_here");
> config.put(
> KinesisConfigConstants.CONFIG_AWS_CREDENTIALS_PROVIDER_BASIC_SECRETKEY,
> "aws_secret_key_here");
> config.put(KinesisConfigConstants.CONFIG_STREAM_INIT_POSITION_TYPE, 
> "LATEST"); // or TRIM_HORIZON
> DataStream kinesisRecords = env.addSource(new FlinkKinesisConsumer<>(
> "kinesis_stream_name",
> new SimpleStringSchema(),
> kinesisConfig));
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3229] Flink streaming consumer for AWS ...

2016-04-21 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/1911#discussion_r60531075
  
--- Diff: flink-streaming-connectors/pom.xml ---
@@ -45,6 +45,7 @@ under the License.
flink-connector-rabbitmq
flink-connector-twitter
flink-connector-nifi
+   flink-connector-kinesis
--- End diff --

Thanks, I missed the "include-kinesis" profile defined below. We'll 
probably need a more general profile name in the future though (ex. 
include-aws-connectors), for example when we start including more Amazon 
licensed libraries for other connectors such as for DynamoDB


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---