[jira] [Commented] (FLINK-3951) Add Histogram Metric Type

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335470#comment-15335470
 ] 

ASF GitHub Bot commented on FLINK-3951:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/2112
  
So we don't want to tie ourselves to DropWizard histograms, but the only 
way for users to use a histogram right now _without implementing one themselves 
(something that is too much work for us apparently)_ is to rely on DropWizard 
histograms. This doesn't really fit together. 

_Could_ we add other implementations? Sure! But I have a hard time 
believing that we'll ever get around to doing that. If it doesn't make sense 
now to implement one because the DW ones offer so much functionality, when 
would it?
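
For context, a minimal hedged sketch of how a user would register a DropWizard histogram through Flink's metric system today, assuming the {{DropwizardHistogramWrapper}} from the flink-metrics-dropwizard module discussed in this thread is on the classpath (metric name and function are illustrative):

{code}
import com.codahale.metrics.SlidingWindowReservoir;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.dropwizard.metrics.DropwizardHistogramWrapper;
import org.apache.flink.metrics.Histogram;

public class HistogramMapper extends RichMapFunction<Long, Long> {

    private transient Histogram histogram;

    @Override
    public void open(Configuration parameters) {
        // wrap a DropWizard histogram so it can be registered as a Flink metric
        com.codahale.metrics.Histogram dropwizardHistogram =
            new com.codahale.metrics.Histogram(new SlidingWindowReservoir(500));
        histogram = getRuntimeContext()
            .getMetricGroup()
            .histogram("valueHistogram", new DropwizardHistogramWrapper(dropwizardHistogram));
    }

    @Override
    public Long map(Long value) {
        histogram.update(value);
        return value;
    }
}
{code}

The wrapper is exactly what ties user code to DropWizard, which is the coupling discussed above.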


> Add Histogram Metric Type
> -
>
> Key: FLINK-3951
> URL: https://issues.apache.org/jira/browse/FLINK-3951
> Project: Flink
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Stephan Ewen
>Assignee: Till Rohrmann
> Fix For: 1.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2112: [FLINK-3951] Add histogram metric type

2016-06-16 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/2112
  
So we don't want to tie ourselves to DropWizard histograms, but the only 
way for users to use a histogram right now _without implementing one themselves 
(something that is too much work for us apparently)_ is to rely on DropWizard 
histograms. This doesn't really fit together. 

_Could_ we add other implementations? Sure! But I have a hard time 
believing that we'll ever get around to doing that. If it doesn't make sense 
now to implement one because the DW ones offer so much functionality, when 
would it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (FLINK-4075) ContinuousFileProcessingCheckpointITCase failed on Travis

2016-06-16 Thread Kostas Kloudas (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kostas Kloudas reassigned FLINK-4075:
-

Assignee: Kostas Kloudas

> ContinuousFileProcessingCheckpointITCase failed on Travis
> -
>
> Key: FLINK-4075
> URL: https://issues.apache.org/jira/browse/FLINK-4075
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Kostas Kloudas
>Priority: Critical
>  Labels: test-stability
>
> The test case {{ContinuousFileProcessingCheckpointITCase}} failed on Travis.
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/137748004/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334748#comment-15334748
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/1856
  
Thanks for the update @ramkrish86. PR is good to merge.


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.
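
For reference, a small sketch of the existing Java DataSet API calls that the Scala API is meant to mirror (sample data and the field index are illustrative):

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class MinMaxByExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Integer, String>> data = env.fromElements(
            new Tuple2<>(3, "c"), new Tuple2<>(1, "a"), new Tuple2<>(2, "b"));

        // select the element with the maximum / minimum value in tuple field 0
        data.maxBy(0).print();
        data.minBy(0).print();
    }
}
{code}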



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #1856: FLINK-3650 Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/1856
  
Thanks for the update @ramkrish86. PR is good to merge.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2108: [FLINK-4027] Flush FlinkKafkaProducer on checkpoints

2016-06-16 Thread eliaslevy
Github user eliaslevy commented on the issue:

https://github.com/apache/flink/pull/2108
  
👍 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4027) FlinkKafkaProducer09 sink can lose messages

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334636#comment-15334636
 ] 

ASF GitHub Bot commented on FLINK-4027:
---

Github user eliaslevy commented on the issue:

https://github.com/apache/flink/pull/2108
  
 


> FlinkKafkaProducer09 sink can lose messages
> ---
>
> Key: FLINK-4027
> URL: https://issues.apache.org/jira/browse/FLINK-4027
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.0.3
>Reporter: Elias Levy
>Assignee: Robert Metzger
>Priority: Critical
>
> The FlinkKafkaProducer09 sink appears to not offer at-least-once guarantees.
> The producer is publishing messages asynchronously.  A callback can record 
> publishing errors, which will be raised when detected.  But as far as I can 
> tell, there is no barrier to wait for async errors from the sink when 
> checkpointing or to track the event time of acked messages to inform the 
> checkpointing process.
> If a checkpoint occurs while there are pending publish requests, and the 
> requests return a failure after the checkpoint occurred, those messages will 
> be lost as the checkpoint will consider them processed by the sink.
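
A hedged sketch of the at-least-once idea addressed by the linked PR: flush pending sends and surface asynchronous errors before a checkpoint completes. This is not the actual FlinkKafkaProducer09 code; the class name, topic, and broker address are illustrative, and it uses the pre-1.2 {{Checkpointed}} interface only to show the barrier idea described above:

{code}
import java.io.Serializable;
import java.util.Properties;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.checkpoint.Checkpointed;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FlushingKafkaSink extends RichSinkFunction<String>
        implements Checkpointed<Serializable> {

    private transient KafkaProducer<byte[], byte[]> producer;
    private volatile Exception asyncError; // recorded by the send callback

    @Override
    public void open(Configuration parameters) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        producer = new KafkaProducer<>(props);
    }

    @Override
    public void invoke(String value) {
        producer.send(new ProducerRecord<>("output-topic", value.getBytes()),
                (metadata, exception) -> {
                    if (exception != null) {
                        asyncError = exception;
                    }
                });
    }

    @Override
    public Serializable snapshotState(long checkpointId, long checkpointTimestamp) throws Exception {
        producer.flush(); // barrier: wait for all pending requests before the checkpoint completes
        if (asyncError != null) {
            throw asyncError; // fail the checkpoint instead of silently losing records
        }
        return null; // nothing to store beyond the flush barrier
    }

    @Override
    public void restoreState(Serializable state) {
        // nothing to restore in this sketch
    }

    @Override
    public void close() {
        if (producer != null) {
            producer.close();
        }
    }
}
{code}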



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2109: [FLINK-3677] FileInputFormat: Allow to specify include/ex...

2016-06-16 Thread mushketyk
Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2109
  
Fixed my PR according to the review.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3677) FileInputFormat: Allow to specify include/exclude file name patterns

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334570#comment-15334570
 ] 

ASF GitHub Bot commented on FLINK-3677:
---

Github user mushketyk commented on the issue:

https://github.com/apache/flink/pull/2109
  
Fixed my PR according to the review.


> FileInputFormat: Allow to specify include/exclude file name patterns
> 
>
> Key: FLINK-3677
> URL: https://issues.apache.org/jira/browse/FLINK-3677
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Maximilian Michels
>Assignee: Ivan Mushketyk
>Priority: Minor
>  Labels: starter
>
> It would be nice to be able to specify a regular expression to filter files.
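
Until such an include/exclude option exists, a hedged sketch of how a user could approximate it today, assuming {{FileInputFormat.acceptFile(FileStatus)}} as the extension point; the regex-based class below is illustrative only:

{code}
import java.util.regex.Pattern;

import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.FileStatus;
import org.apache.flink.core.fs.Path;

public class PatternFilteringTextInputFormat extends TextInputFormat {

    private final Pattern include;

    public PatternFilteringTextInputFormat(Path path, String includeRegex) {
        super(path);
        this.include = Pattern.compile(includeRegex);
    }

    @Override
    public boolean acceptFile(FileStatus fileStatus) {
        // keep the default filtering (hidden files etc.) and add a regex include filter
        return super.acceptFile(fileStatus)
            && include.matcher(fileStatus.getPath().getName()).matches();
    }
}
{code}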



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3319) Add or operator to CEP's pattern API

2016-06-16 Thread Robert Thorman (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334509#comment-15334509
 ] 

Robert Thorman commented on FLINK-3319:
---

Is this as simple as duplicating the AndFilterFunction and implementing the 
filter as an OR?  If so, I'd be glad to work on this one and get familiar with 
flink-cep.
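
A hedged sketch of what that duplication could look like, modeled on flink-cep's existing {{AndFilterFunction}}; the class name and its placement are assumptions rather than existing Flink code:

{code}
import org.apache.flink.api.common.functions.FilterFunction;

public class OrFilterFunction<T> implements FilterFunction<T> {

    private static final long serialVersionUID = 1L;

    private final FilterFunction<T> left;
    private final FilterFunction<T> right;

    public OrFilterFunction(FilterFunction<T> left, FilterFunction<T> right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public boolean filter(T value) throws Exception {
        // accept the event if either of the combined conditions accepts it
        return left.filter(value) || right.filter(value);
    }
}
{code}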

> Add or operator to CEP's pattern API
> 
>
> Key: FLINK-3319
> URL: https://issues.apache.org/jira/browse/FLINK-3319
> Project: Flink
>  Issue Type: Improvement
>  Components: CEP
>Affects Versions: 1.0.0
>Reporter: Till Rohrmann
>Priority: Minor
>
> Adding an {{or}} operator to CEP's pattern API would be beneficial. This 
> would considerably extend the set of supported patterns. The {{or}} operator 
> lets you define multiple succeeding pattern states for the next stage.
> {code}
> Pattern.begin("start").next("middle1").where(...).or("middle2").where(...)
> {code}
> In order to implement the {{or}} operator, one has to extend the {{Pattern}} 
> class. Furthermore, the {{NFACompiler}} has to be extended to generate two 
> resulting pattern states in case of an {{or}} operator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4075) ContinuousFileProcessingCheckpointITCase failed on Travis

2016-06-16 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334325#comment-15334325
 ] 

Ufuk Celebi commented on FLINK-4075:


Another instance 
https://s3.amazonaws.com/archive.travis-ci.org/jobs/138086467/log.txt

> ContinuousFileProcessingCheckpointITCase failed on Travis
> -
>
> Key: FLINK-4075
> URL: https://issues.apache.org/jira/browse/FLINK-4075
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Priority: Critical
>  Labels: test-stability
>
> The test case {{ContinuousFileProcessingCheckpointITCase}} failed on Travis.
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/137748004/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334195#comment-15334195
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/1856
  
@fhueske - Pushed with latest updates. Pls review. Thank you for guiding me 
through this.


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334190#comment-15334190
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user ramkrish86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r67385885
  
--- Diff: 
flink-scala/src/test/scala/org/apache/flink/api/operator/SelectByFunctionTest.scala
 ---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.operator
+
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.scala.{SelectByMaxFunction, 
SelectByMinFunction}
+import org.apache.flink.api.scala.typeutils.CaseClassTypeInfo
+import org.junit.{Assert, Test}
+
+/**
+  *
+  */
+class SelectByFunctionTest {
+
+  val tupleTypeInfo = new CaseClassTypeInfo(
+classOf[(Int, Long, String, Long, Int)], Array(), 
Array(BasicTypeInfo.INT_TYPE_INFO,
+  BasicTypeInfo.LONG_TYPE_INFO,
+  BasicTypeInfo.STRING_TYPE_INFO,
+  BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO),
+Array("_1", "_2","_3","_4","_5")) {
+
+override def createSerializer(config: ExecutionConfig):
+  TypeSerializer[(Int, Long, String, Long, Int)] = ???
+  }
+
+   private val bigger  = new Tuple5[Int, Long, String, Long, Int](10, 
100L, "HelloWorld", 200L, 20)
--- End diff --

I actually did it this way. But maybe because of my other definition of 
tupleTypeInfo, the assertion was not working correctly, even though I was getting 
the expected result. I was not aware of this 'implicitly' call; I just read about it. 


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #1856: FLINK-3650 Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread ramkrish86
Github user ramkrish86 commented on the issue:

https://github.com/apache/flink/pull/1856
  
@fhueske - Pushed with latest updates. Pls review. Thank you for guiding me 
through this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #1856: FLINK-3650 Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread ramkrish86
Github user ramkrish86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r67385885
  
--- Diff: 
flink-scala/src/test/scala/org/apache/flink/api/operator/SelectByFunctionTest.scala
 ---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.operator
+
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.scala.{SelectByMaxFunction, 
SelectByMinFunction}
+import org.apache.flink.api.scala.typeutils.CaseClassTypeInfo
+import org.junit.{Assert, Test}
+
+/**
+  *
+  */
+class SelectByFunctionTest {
+
+  val tupleTypeInfo = new CaseClassTypeInfo(
+classOf[(Int, Long, String, Long, Int)], Array(), 
Array(BasicTypeInfo.INT_TYPE_INFO,
+  BasicTypeInfo.LONG_TYPE_INFO,
+  BasicTypeInfo.STRING_TYPE_INFO,
+  BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO),
+Array("_1", "_2","_3","_4","_5")) {
+
+override def createSerializer(config: ExecutionConfig):
+  TypeSerializer[(Int, Long, String, Long, Int)] = ???
+  }
+
+   private val bigger  = new Tuple5[Int, Long, String, Long, Int](10, 
100L, "HelloWorld", 200L, 20)
--- End diff --

I actually did it this way. But maybe because of my other definition of 
tupleTypeInfo, the assertion was not working correctly, even though I was getting 
the expected result. I was not aware of this 'implicitly' call; I just read about it. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-4040) Same env.java.opts is applied for TM , JM and ZK

2016-06-16 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-4040:
--
Component/s: YARN Client

> Same env.java.opts is applied for TM , JM and ZK
> 
>
> Key: FLINK-4040
> URL: https://issues.apache.org/jira/browse/FLINK-4040
> Project: Flink
>  Issue Type: Bug
>  Components: Start-Stop Scripts, YARN Client
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Gyula Fora
>Priority: Critical
> Fix For: 1.1.0
>
>
> This makes it very hard to fine-tune the different processes or to set up 
> profiling on the TaskManagers.
> We should have dedicated config keys for the three processes.
> For example, if someone wants to set up JMX for the TaskManager, and the TM 
> and JM run on the same host, they will try to bind to the same JMX port and 
> fail.
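
A hedged sketch of what the proposal might look like in flink-conf.yaml; these per-process keys are hypothetical and do not exist yet:

{code}
# hypothetical per-process keys, sketched from the proposal above
env.java.opts.jobmanager: "-Xmx2048m"
env.java.opts.taskmanager: "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9999"
env.java.opts.zookeeper: "-Xmx512m"
{code}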



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3977) Subclasses of InternalWindowFunction must support OutputTypeConfigurable

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334087#comment-15334087
 ] 

ASF GitHub Bot commented on FLINK-3977:
---

GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/2118

[FLINK-3977] InternalWindowFunctions implement OutputTypeConfigurable and 
forward calls to wrapped functions.

- [X] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhueske/flink windowOutTConfig

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2118.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2118


commit 973ee7fd797bb5e92c5e664075e2e33d21eed28f
Author: Fabian Hueske 
Date:   2016-06-16T07:09:40Z

[FLINK-3977] InternalWindowFunctions implement OutputTypeConfigurable and 
forward calls to wrapped functions.




> Subclasses of InternalWindowFunction must support OutputTypeConfigurable
> 
>
> Key: FLINK-3977
> URL: https://issues.apache.org/jira/browse/FLINK-3977
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Fabian Hueske
>Priority: Critical
>
> Right now, if they wrap functions and a wrapped function implements 
> {{OutputTypeConfigurable}}, {{setOutputType}} is never called. This manifests 
> itself, for example, in FoldFunction on a window with evictor not working.
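
A hedged sketch of the forwarding idea in the PR title: a wrapper implements {{OutputTypeConfigurable}} and passes {{setOutputType}} on to the user function it wraps, if that function is itself {{OutputTypeConfigurable}}. The wrapper class name is illustrative, not the actual patch:

{code}
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.functions.Function;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.operators.OutputTypeConfigurable;

public class ForwardingWrapper<OUT> implements OutputTypeConfigurable<OUT> {

    private final Function wrappedFunction;

    public ForwardingWrapper(Function wrappedFunction) {
        this.wrappedFunction = wrappedFunction;
    }

    @Override
    @SuppressWarnings("unchecked")
    public void setOutputType(TypeInformation<OUT> outTypeInfo, ExecutionConfig executionConfig) {
        // forward the call only if the wrapped user function cares about its output type
        if (wrappedFunction instanceof OutputTypeConfigurable) {
            ((OutputTypeConfigurable<OUT>) wrappedFunction).setOutputType(outTypeInfo, executionConfig);
        }
    }
}
{code}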



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2118: [FLINK-3977] InternalWindowFunctions implement Out...

2016-06-16 Thread fhueske
GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/2118

[FLINK-3977] InternalWindowFunctions implement OutputTypeConfigurable and 
forward calls to wrapped functions.

- [X] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhueske/flink windowOutTConfig

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2118.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2118


commit 973ee7fd797bb5e92c5e664075e2e33d21eed28f
Author: Fabian Hueske 
Date:   2016-06-16T07:09:40Z

[FLINK-3977] InternalWindowFunctions implement OutputTypeConfigurable and 
forward calls to wrapped functions.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3827) Flink modules include unused dependencies

2016-06-16 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334076#comment-15334076
 ] 

Till Rohrmann commented on FLINK-3827:
--

I don't really know what the status of this issue is. We should check again 
after merging PR #2092.

> Flink modules include unused dependencies
> -
>
> Key: FLINK-3827
> URL: https://issues.apache.org/jira/browse/FLINK-3827
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
> Fix For: 1.1.0
>
>
> A quick look via {{mvn dependency:analyze}} revealed that many Flink modules 
> include dependencies which they don't really need. We should fix this for the 
> next {{1.1}} release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script

2016-06-16 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-4084:


 Summary: Add configDir parameter to CliFrontend and flink shell 
script
 Key: FLINK-4084
 URL: https://issues.apache.org/jira/browse/FLINK-4084
 Project: Flink
  Issue Type: Improvement
  Components: Command-line client
Affects Versions: 1.1.0
Reporter: Till Rohrmann
Priority: Minor


At the moment there is no way other than the environment variable 
FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if it 
is started via the flink shell script. In order to improve the user experience, 
I would propose to introduce a {{--configDir}} parameter which the user can use 
to specify a configuration directory more easily.
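
A hedged sketch of the proposed usage; {{--configDir}} is the proposal described above, not an existing option, and the paths are placeholders:

{code}
# hypothetical invocation once the proposed flag exists
./bin/flink run --configDir /path/to/alternative/conf /path/to/job.jar
{code}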



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-3942) Add support for INTERSECT

2016-06-16 Thread Jark Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu reassigned FLINK-3942:
--

Assignee: Jark Wu

> Add support for INTERSECT
> -
>
> Key: FLINK-3942
> URL: https://issues.apache.org/jira/browse/FLINK-3942
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Fabian Hueske
>Assignee: Jark Wu
>Priority: Minor
>
> Currently, the Table API and SQL do not support INTERSECT.
> INTERSECT can be executed as join on all fields.
> In order to add support for INTERSECT to the Table API and SQL we need to:
> - Implement a {{DataSetIntersect}} class that translates an INTERSECT into a 
> DataSet API program using a join on all fields.
> - Implement a {{DataSetIntersectRule}} that translates a Calcite 
> {{LogicalIntersect}} into a {{DataSetIntersect}}.
> - Extend the Table API (and validation phase) to provide an intersect() 
> method.
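
A hedged, standalone DataSet API sketch of the rewrite described above, expressing INTERSECT as a join on all fields followed by a distinct (sample data and field indices are illustrative):

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class IntersectAsJoin {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Integer, String>> left = env.fromElements(
            new Tuple2<>(1, "a"), new Tuple2<>(2, "b"), new Tuple2<>(2, "b"));
        DataSet<Tuple2<Integer, String>> right = env.fromElements(
            new Tuple2<>(2, "b"), new Tuple2<>(3, "c"));

        // join on all fields, keep the left side, remove duplicates
        left.join(right)
            .where(0, 1).equalTo(0, 1)
            .projectFirst(0, 1)
            .distinct()
            .print();
    }
}
{code}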



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3942) Add support for INTERSECT

2016-06-16 Thread Jark Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334042#comment-15334042
 ] 

Jark Wu commented on FLINK-3942:


I can try to work on it if nobody else is already working on it. 

> Add support for INTERSECT
> -
>
> Key: FLINK-3942
> URL: https://issues.apache.org/jira/browse/FLINK-3942
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Fabian Hueske
>Priority: Minor
>
> Currently, the Table API and SQL do not support INTERSECT.
> INTERSECT can be executed as join on all fields.
> In order to add support for INTERSECT to the Table API and SQL we need to:
> - Implement a {{DataSetIntersect}} class that translates an INTERSECT into a 
> DataSet API program using a join on all fields.
> - Implement a {{DataSetIntersectRule}} that translates a Calcite 
> {{LogicalIntersect}} into a {{DataSetIntersect}}.
> - Extend the Table API (and validation phase) to provide an intersect() 
> method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3839) Support wildcards in classpath parameters

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333992#comment-15333992
 ] 

ASF GitHub Bot commented on FLINK-3839:
---

Github user thormanrd commented on the issue:

https://github.com/apache/flink/pull/2114
  
Corrected the typo.  Gotta add FLINK to my dictionary.  


> Support wildcards in classpath parameters
> -
>
> Key: FLINK-3839
> URL: https://issues.apache.org/jira/browse/FLINK-3839
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ken Krugler
>Assignee: Robert Thorman
>Priority: Minor
>
> Currently you can only specify a single explicit jar with the CLI --classpath 
> file:// parameter. Java (since 1.6) has allowed you to use -cp <dir>/* as a 
> way of adding every file that ends in .jar in a directory.
> This would simplify things, e.g. when running on EMR you have to add roughly 
> 120 jars explicitly, but these are all located in just two directories.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2114: FLINK-3839 Added the extraction of jar files from a URL s...

2016-06-16 Thread thormanrd
Github user thormanrd commented on the issue:

https://github.com/apache/flink/pull/2114
  
Corrected the typo.  Gotta add FLINK to my dictionary.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4068) Move constant computations out of code-generated `flatMap` functions.

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333991#comment-15333991
 ] 

ASF GitHub Bot commented on FLINK-4068:
---

Github user wuchong commented on the issue:

https://github.com/apache/flink/pull/2102
  
After introducing `RexExecutor`, which makes `ReduceExpressionRules` take 
effect, many errors occurred.

1. The `cannot translate call AS($t0, $t1)` error is a Calcite bug, I think, and I 
created a related issue: 
[CALCITE-1295](https://issues.apache.org/jira/browse/CALCITE-1295).

2. We should replace 
[L69](https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/arithmetic.scala#L69-L73)
 with `relBuilder.call(SqlStdOperatorTable.CONCAT, l, cast)`, otherwise it will throw 
the following exception, because Calcite has no plus(String, String) method. 

```
java.lang.RuntimeException: while resolving method 'plus[class 
java.lang.String, class java.lang.String]' in class class 
org.apache.calcite.runtime.SqlFunctions

at org.apache.calcite.linq4j.tree.Types.lookupMethod(Types.java:345)
at org.apache.calcite.linq4j.tree.Expressions.call(Expressions.java:442)
at 
org.apache.calcite.adapter.enumerable.RexImpTable$BinaryImplementor.implement(RexImpTable.java:1640)
at 
org.apache.calcite.adapter.enumerable.RexImpTable.implementCall(RexImpTable.java:854)
at 
org.apache.calcite.adapter.enumerable.RexImpTable.implementNullSemantics(RexImpTable.java:843)
at 
org.apache.calcite.adapter.enumerable.RexImpTable.implementNullSemantics0(RexImpTable.java:756)
at 
org.apache.calcite.adapter.enumerable.RexImpTable.access$900(RexImpTable.java:181)
at 
org.apache.calcite.adapter.enumerable.RexImpTable$3.implement(RexImpTable.java:411)
at 
org.apache.calcite.adapter.enumerable.RexToLixTranslator.translateCall(RexToLixTranslator.java:535)
at 
org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate0(RexToLixTranslator.java:507)
at 
org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate(RexToLixTranslator.java:222)
at 
org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate0(RexToLixTranslator.java:472)
at 
org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate(RexToLixTranslator.java:222)
at 
org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate(RexToLixTranslator.java:217)
at 
org.apache.calcite.adapter.enumerable.RexToLixTranslator.translateList(RexToLixTranslator.java:700)
at 
org.apache.calcite.adapter.enumerable.RexToLixTranslator.translateProjects(RexToLixTranslator.java:192)
at 
org.apache.calcite.rex.RexExecutorImpl.compile(RexExecutorImpl.java:80)
at 
org.apache.calcite.rex.RexExecutorImpl.compile(RexExecutorImpl.java:59)
at 
org.apache.calcite.rex.RexExecutorImpl.reduce(RexExecutorImpl.java:118)
at 
org.apache.calcite.rel.rules.ReduceExpressionsRule.reduceExpressionsInternal(ReduceExpressionsRule.java:544)
at 
org.apache.calcite.rel.rules.ReduceExpressionsRule.reduceExpressions(ReduceExpressionsRule.java:455)
at 
org.apache.calcite.rel.rules.ReduceExpressionsRule.reduceExpressions(ReduceExpressionsRule.java:438)
at 
org.apache.calcite.rel.rules.ReduceExpressionsRule$CalcReduceExpressionsRule.onMatch(ReduceExpressionsRule.java:350)
at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:213)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:838)
at 
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
at 
org.apache.flink.api.table.BatchTableEnvironment.translate(BatchTableEnvironment.scala:250)
at 
org.apache.flink.api.java.table.BatchTableEnvironment.toDataSet(BatchTableEnvironment.scala:146)
at org.apache.flink.api.java.batch.table.ExpressionsITCase.testCom


3. The following error occurs when we convert `Trim` to a `RexNode`: we use an 
Integer to represent "LEADING", "TRAILING", "BOTH". Instead we should use 
`SqlTrimFunction.Flag`, but I haven't found out how to write SqlTrimFunction.Flag 
into a `RexNode`.

 ```
java.lang.ClassCastException: java.lang.Integer cannot be cast to 
org.apache.calcite.sql.fun.SqlTrimFunction$Flag

at 
org.apache.calcite.adapter.enumerable.RexImpTable$TrimImplementor.implement(RexImpTable.java:1448)
at 
org.apache.calcite.adapter.enumerable.RexImpTable.implementCall(RexImpTable.java:854)
at 
org.apache.calcite.adapter.enumerable.RexImpTable.implementNullSemantics(RexImpTable.java:843)
at 
org.apache.calcite.adapter.enumerable.RexImpTable.implementNullSemantics0(RexImpTable.java:756)
at 

[GitHub] flink issue #2102: [FLINK-4068] [tableAPI] Move constant computations out of...

2016-06-16 Thread wuchong
Github user wuchong commented on the issue:

https://github.com/apache/flink/pull/2102
  
After introducing `RexExecutor`, which makes `ReduceExpressionRules` take 
effect, many errors occurred.

1. The `cannot translate call AS($t0, $t1)` error is a Calcite bug, I think, and I 
created a related issue: 
[CALCITE-1295](https://issues.apache.org/jira/browse/CALCITE-1295).

2. We should replace 
[L69](https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/expressions/arithmetic.scala#L69-L73)
 with `relBuilder.call(SqlStdOperatorTable.CONCAT, l, cast)`, otherwise it will throw 
the following exception, because Calcite has no plus(String, String) method. 

```
java.lang.RuntimeException: while resolving method 'plus[class 
java.lang.String, class java.lang.String]' in class class 
org.apache.calcite.runtime.SqlFunctions

at org.apache.calcite.linq4j.tree.Types.lookupMethod(Types.java:345)
at org.apache.calcite.linq4j.tree.Expressions.call(Expressions.java:442)
at 
org.apache.calcite.adapter.enumerable.RexImpTable$BinaryImplementor.implement(RexImpTable.java:1640)
at 
org.apache.calcite.adapter.enumerable.RexImpTable.implementCall(RexImpTable.java:854)
at 
org.apache.calcite.adapter.enumerable.RexImpTable.implementNullSemantics(RexImpTable.java:843)
at 
org.apache.calcite.adapter.enumerable.RexImpTable.implementNullSemantics0(RexImpTable.java:756)
at 
org.apache.calcite.adapter.enumerable.RexImpTable.access$900(RexImpTable.java:181)
at 
org.apache.calcite.adapter.enumerable.RexImpTable$3.implement(RexImpTable.java:411)
at 
org.apache.calcite.adapter.enumerable.RexToLixTranslator.translateCall(RexToLixTranslator.java:535)
at 
org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate0(RexToLixTranslator.java:507)
at 
org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate(RexToLixTranslator.java:222)
at 
org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate0(RexToLixTranslator.java:472)
at 
org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate(RexToLixTranslator.java:222)
at 
org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate(RexToLixTranslator.java:217)
at 
org.apache.calcite.adapter.enumerable.RexToLixTranslator.translateList(RexToLixTranslator.java:700)
at 
org.apache.calcite.adapter.enumerable.RexToLixTranslator.translateProjects(RexToLixTranslator.java:192)
at 
org.apache.calcite.rex.RexExecutorImpl.compile(RexExecutorImpl.java:80)
at 
org.apache.calcite.rex.RexExecutorImpl.compile(RexExecutorImpl.java:59)
at 
org.apache.calcite.rex.RexExecutorImpl.reduce(RexExecutorImpl.java:118)
at 
org.apache.calcite.rel.rules.ReduceExpressionsRule.reduceExpressionsInternal(ReduceExpressionsRule.java:544)
at 
org.apache.calcite.rel.rules.ReduceExpressionsRule.reduceExpressions(ReduceExpressionsRule.java:455)
at 
org.apache.calcite.rel.rules.ReduceExpressionsRule.reduceExpressions(ReduceExpressionsRule.java:438)
at 
org.apache.calcite.rel.rules.ReduceExpressionsRule$CalcReduceExpressionsRule.onMatch(ReduceExpressionsRule.java:350)
at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:213)
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:838)
at 
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
at 
org.apache.flink.api.table.BatchTableEnvironment.translate(BatchTableEnvironment.scala:250)
at 
org.apache.flink.api.java.table.BatchTableEnvironment.toDataSet(BatchTableEnvironment.scala:146)
at org.apache.flink.api.java.batch.table.ExpressionsITCase.testCom


3. The following error occurs when we convert `Trim` to a `RexNode`: we use an 
Integer to represent "LEADING", "TRAILING", "BOTH". Instead we should use 
`SqlTrimFunction.Flag`, but I haven't found out how to write SqlTrimFunction.Flag 
into a `RexNode`.

 ```
java.lang.ClassCastException: java.lang.Integer cannot be cast to 
org.apache.calcite.sql.fun.SqlTrimFunction$Flag

at 
org.apache.calcite.adapter.enumerable.RexImpTable$TrimImplementor.implement(RexImpTable.java:1448)
at 
org.apache.calcite.adapter.enumerable.RexImpTable.implementCall(RexImpTable.java:854)
at 
org.apache.calcite.adapter.enumerable.RexImpTable.implementNullSemantics(RexImpTable.java:843)
at 
org.apache.calcite.adapter.enumerable.RexImpTable.implementNullSemantics0(RexImpTable.java:756)
at 
org.apache.calcite.adapter.enumerable.RexImpTable.access$900(RexImpTable.java:181)
at 
org.apache.calcite.adapter.enumerable.RexImpTable$3.implement(RexImpTable.java:411)
at 

[jira] [Resolved] (FLINK-3332) Provide an exactly-once Cassandra connector

2016-06-16 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-3332.
---
   Resolution: Fixed
Fix Version/s: 1.1.0

Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/a28a2d09

> Provide an exactly-once Cassandra connector
> ---
>
> Key: FLINK-3332
> URL: https://issues.apache.org/jira/browse/FLINK-3332
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
> Fix For: 1.1.0
>
>
> With FLINK-3311, we are adding a Cassandra connector to Flink.
> It would be good to also provide an "exactly-once" C* connector.
> I would like to first discuss how we are going to implement this in Flink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3311) Add a connector for streaming data into Cassandra

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333947#comment-15333947
 ] 

ASF GitHub Bot commented on FLINK-3311:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1771


> Add a connector for streaming data into Cassandra
> -
>
> Key: FLINK-3311
> URL: https://issues.apache.org/jira/browse/FLINK-3311
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Robert Metzger
>Assignee: Andrea Sella
> Fix For: 1.1.0
>
>
> We had users in the past asking for a Flink+Cassandra integration.
> It seems that there is a well-developed java client for connecting into 
> Cassandra: https://github.com/datastax/java-driver (ASL 2.0)
> There are also tutorials out there on how to start a local cassandra instance 
> (for the tests): 
> http://prettyprint.me/prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/index.html
> For the data types, I think we should support TupleX types, and map standard 
> java types to the respective cassandra types.
> In addition, it seems that there is an object mapper from DataStax to store 
> POJOs in Cassandra (there are annotations for defining the primary key and 
> types)
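
A hedged usage sketch of the streaming Cassandra sink added by this issue; the builder method names follow the merged connector as far as I know, so treat the exact names, keyspace, and query as illustrative:

{code}
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.cassandra.CassandraSink;

public class CassandraSinkExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Integer>> counts = env.fromElements(
            new Tuple2<>("flink", 1), new Tuple2<>("cassandra", 2));

        // tuple fields are bound positionally to the query placeholders
        CassandraSink.addSink(counts)
            .setQuery("INSERT INTO example.wordcount (word, count) VALUES (?, ?);")
            .setHost("127.0.0.1")
            .build();

        env.execute("Cassandra sink example");
    }
}
{code}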



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3311) Add a connector for streaming data into Cassandra

2016-06-16 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-3311.
---
   Resolution: Fixed
Fix Version/s: 1.1.0

Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/5930622e

Thank you for your contribution.

> Add a connector for streaming data into Cassandra
> -
>
> Key: FLINK-3311
> URL: https://issues.apache.org/jira/browse/FLINK-3311
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Robert Metzger
>Assignee: Andrea Sella
> Fix For: 1.1.0
>
>
> We had users in the past asking for a Flink+Cassandra integration.
> It seems that there is a well-developed java client for connecting into 
> Cassandra: https://github.com/datastax/java-driver (ASL 2.0)
> There are also tutorials out there on how to start a local cassandra instance 
> (for the tests): 
> http://prettyprint.me/prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/index.html
> For the data types, I think we should support TupleX types, and map standard 
> java types to the respective cassandra types.
> In addition, it seems that there is an object mapper from DataStax to store 
> POJOs in Cassandra (there are annotations for defining the primary key and 
> types)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #1771: [FLINK-3311/FLINK-3332] Add Cassandra connector

2016-06-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1771


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2041: [FLINK-3317] [cep] Introduce timeout handler to CE...

2016-06-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2041


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3317) Add timeout handler to CEP operator

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333934#comment-15333934
 ] 

ASF GitHub Bot commented on FLINK-3317:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2041


> Add timeout handler to CEP operator
> ---
>
> Key: FLINK-3317
> URL: https://issues.apache.org/jira/browse/FLINK-3317
> Project: Flink
>  Issue Type: Improvement
>  Components: CEP
>Affects Versions: 1.0.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Minor
>
> Currently, event sequences which exceed the defined pattern timeout will be 
> discarded. However, in some cases the user might be interested in getting to 
> know when such a timeout occurred to return a default value for these event 
> sequences.
> Thus, the pattern API should be extended to be able to define a timeout 
> handler. Furthermore, the {{NFA}} has to be extended to also return the 
> discarded event sequences. The {{CEPOperator}} would then call for every 
> discarded event sequence the timeout handler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3317) Add timeout handler to CEP operator

2016-06-16 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-3317.

Resolution: Fixed

Added via 57ef6c315ee7aa467d922dd4d1213dfd8bc74fb0

> Add timeout handler to CEP operator
> ---
>
> Key: FLINK-3317
> URL: https://issues.apache.org/jira/browse/FLINK-3317
> Project: Flink
>  Issue Type: Improvement
>  Components: CEP
>Affects Versions: 1.0.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Minor
>
> Currently, event sequences which exceed the defined pattern timeout will be 
> discarded. However, in some cases the user might be interested in getting to 
> know when such a timeout occurred to return a default value for these event 
> sequences.
> Thus, the pattern API should be extended to be able to define a timeout 
> handler. Furthermore, the {{NFA}} has to be extended to also return the 
> discarded event sequences. The {{CEPOperator}} would then call for every 
> discarded event sequence the timeout handler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3317) Add timeout handler to CEP operator

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333928#comment-15333928
 ] 

ASF GitHub Bot commented on FLINK-3317:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2041
  
Will be merging this PR.


> Add timeout handler to CEP operator
> ---
>
> Key: FLINK-3317
> URL: https://issues.apache.org/jira/browse/FLINK-3317
> Project: Flink
>  Issue Type: Improvement
>  Components: CEP
>Affects Versions: 1.0.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Minor
>
> Currently, event sequences which exceed the defined pattern timeout will be 
> discarded. However, in some cases the user might be interested in getting to 
> know when such a timeout occurred to return a default value for these event 
> sequences.
> Thus, the pattern API should be extended to be able to define a timeout 
> handler. Furthermore, the {{NFA}} has to be extended to also return the 
> discarded event sequences. The {{CEPOperator}} would then call for every 
> discarded event sequence the timeout handler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2041: [FLINK-3317] [cep] Introduce timeout handler to CEP opera...

2016-06-16 Thread tillrohrmann
Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2041
  
Will be merging this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3937) Make flink cli list, savepoint, cancel and stop work on Flink-on-YARN clusters

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333907#comment-15333907
 ] 

ASF GitHub Bot commented on FLINK-3937:
---

Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2085
  
I tried your code again on my testing machine and everything that didn't 
work in my last tests is now fixed.

I also quickly scanned the newly added commits.
I think the pull request is good to merge.


> Make flink cli list, savepoint, cancel and stop work on Flink-on-YARN clusters
> --
>
> Key: FLINK-3937
> URL: https://issues.apache.org/jira/browse/FLINK-3937
> Project: Flink
>  Issue Type: Improvement
>Reporter: Sebastian Klemke
>Assignee: Maximilian Michels
>Priority: Trivial
> Attachments: improve_flink_cli_yarn_integration.patch
>
>
> Currently, flink cli can't figure out JobManager RPC location for 
> Flink-on-YARN clusters. Therefore, list, savepoint, cancel and stop 
> subcommands are hard to invoke if you only know the YARN application ID. As 
> an improvement, I suggest adding a -yid <yarnApplicationId> option to the 
> mentioned subcommands that can be used together with -m yarn-cluster. Flink 
> cli would then retrieve JobManager RPC location from YARN ResourceManager.
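
A hedged sketch of the suggested usage; the {{-yid}} option is the proposal above, and the bracketed values are placeholders:

{code}
# hypothetical invocations once the proposed option exists
./bin/flink list -m yarn-cluster -yid <yarnApplicationId>
./bin/flink cancel -m yarn-cluster -yid <yarnApplicationId> <jobId>
{code}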



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2085: [FLINK-3937] programmatic resuming of clusters

2016-06-16 Thread rmetzger
Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2085
  
I tried your code again on my testing machine and everything that didn't 
work in my last tests is now fixed.

I also quickly scanned the newly added commits.
I think the pull request is good to merge.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4083) Use ClosureCleaner for Join where and equalTo

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333870#comment-15333870
 ] 

ASF GitHub Bot commented on FLINK-4083:
---

GitHub user StefanRRichter opened a pull request:

https://github.com/apache/flink/pull/2117

[FLINK-4083] [java,DataSet] Introduce closure cleaning in Join.where(…

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ x] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [x ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [x ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StefanRRichter/flink 4083-Closure_Cleaner_Join

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2117.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2117


commit 9ff6366599127ddd1f54bb50e73ba8584e509564
Author: Stefan Richter 
Date:   2016-06-16T13:26:12Z

[FLINK-4083] [java,DataSet] Introduce closure cleaning in Join.where() and 
Join.equaltTo()




> Use ClosureCleaner for Join where and equalTo
> -
>
> Key: FLINK-4083
> URL: https://issues.apache.org/jira/browse/FLINK-4083
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.3
>Reporter: Stefan Richter
>Assignee: Stefan Richter
>Priority: Minor
>
> When specifying a key selector in the where or equalTo clause of a Join, the 
> closure cleaner is not used. Same problem as FLINK-4078.
> {code}
> .join(ds)
>   .where(new KeySelector() {
>   @Override
>   public Integer getKey(CustomType value) {
>   return value.myInt;
>   }
>   })
>   .equalTo(new KeySelector(){
>   @Override
>   public Integer getKey(CustomType value) throws Exception {
>   return value.myInt;
>   }
>   });
> {code}
> The problem is that the KeySelector is an anonymous inner class and as such 
> as a reference to the outer object. Normally, this would be rectified by the 
> closure cleaner but the cleaner is not used in Join.where() and 
> Join.equalTo().
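
A hedged, standalone sketch of the mechanism involved (and a possible user-side workaround until the fix is merged): running the key selector through Flink's {{ClosureCleaner}} removes the implicit reference to the enclosing object. The example class is illustrative:

{code}
import org.apache.flink.api.java.ClosureCleaner;
import org.apache.flink.api.java.functions.KeySelector;

public class CleanKeySelectorExample {

    // this enclosing class is not serializable; an anonymous KeySelector defined
    // inside it implicitly references it via this$0
    public KeySelector<Integer, Integer> selector() {
        KeySelector<Integer, Integer> keySelector = new KeySelector<Integer, Integer>() {
            @Override
            public Integer getKey(Integer value) {
                return value; // does not actually use the enclosing instance
            }
        };
        // what Join.where()/equalTo() should do internally: null out the unused
        // enclosing reference and verify the selector is serializable
        ClosureCleaner.clean(keySelector, true);
        return keySelector;
    }

    public static void main(String[] args) {
        System.out.println(new CleanKeySelectorExample().selector().getClass().getName());
    }
}
{code}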



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2117: [FLINK-4083] [java,DataSet] Introduce closure clea...

2016-06-16 Thread StefanRRichter
GitHub user StefanRRichter opened a pull request:

https://github.com/apache/flink/pull/2117

[FLINK-4083] [java,DataSet] Introduce closure cleaning in Join.where(…

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ x] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [x ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [x ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StefanRRichter/flink 4083-Closure_Cleaner_Join

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2117.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2117


commit 9ff6366599127ddd1f54bb50e73ba8584e509564
Author: Stefan Richter 
Date:   2016-06-16T13:26:12Z

[FLINK-4083] [java,DataSet] Introduce closure cleaning in Join.where() and 
Join.equaltTo()




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333861#comment-15333861
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/1856
  
Hi @ramkrish86, PR looks quite good now. I had only a few minor comments. 
After those are fixed, the PR should be good to merge.


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-4019) Expose approximateArrivalTimestamp through the KinesisDeserializationSchema interface

2016-06-16 Thread Tzu-Li (Gordon) Tai (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai reassigned FLINK-4019:
--

Assignee: Tzu-Li (Gordon) Tai

> Expose approximateArrivalTimestamp through the KinesisDeserializationSchema 
> interface
> -
>
> Key: FLINK-4019
> URL: https://issues.apache.org/jira/browse/FLINK-4019
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Amazon's Record class also gives information about the timestamp of when 
> Kinesis successfully receives the record: 
> http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/kinesis/model/Record.html#getApproximateArrivalTimestamp().
> This should be useful info for users and should be exposed through the 
> deserialization schema.
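
For illustration, a sketch of how a consumer could surface this timestamp to a 
user-provided schema. Only {{Record.getApproximateArrivalTimestamp()}} is a real 
AWS SDK call; the {{TimestampAwareSchema}} interface and its method name below 
are hypothetical placeholders, not the actual {{KinesisDeserializationSchema}} 
signature.

{code}
import com.amazonaws.services.kinesis.model.Record;

// Illustrative sketch only; not Flink's actual deserialization schema interface.
public class ArrivalTimestampSketch {

    // Hypothetical callback that also receives the approximate arrival time.
    interface TimestampAwareSchema<T> {
        T deserialize(byte[] recordValue, long approxArrivalMillis) throws Exception;
    }

    static <T> T deserialize(Record record, TimestampAwareSchema<T> schema) throws Exception {
        // getApproximateArrivalTimestamp() is part of the AWS SDK Record class
        // and returns a java.util.Date.
        long approxArrivalMillis = record.getApproximateArrivalTimestamp().getTime();
        byte[] payload = new byte[record.getData().remaining()];
        record.getData().get(payload);
        return schema.deserialize(payload, approxArrivalMillis);
    }
}
{code}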



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4019) Expose approximateArrivalTimestamp through the KinesisDeserializationSchema interface

2016-06-16 Thread Tzu-Li (Gordon) Tai (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333858#comment-15333858
 ] 

Tzu-Li (Gordon) Tai commented on FLINK-4019:


Ok. I'll add this then along with the upcoming PR for reshard handling.

> Expose approximateArrivalTimestamp through the KinesisDeserializationSchema 
> interface
> -
>
> Key: FLINK-4019
> URL: https://issues.apache.org/jira/browse/FLINK-4019
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming Connectors
>Affects Versions: 1.1.0
>Reporter: Tzu-Li (Gordon) Tai
> Fix For: 1.1.0
>
>
> Amazon's Record class also gives information about the timestamp of when 
> Kinesis successfully receives the record: 
> http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/kinesis/model/Record.html#getApproximateArrivalTimestamp().
> This should be useful info for users and should be exposed through the 
> deserialization schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #1856: FLINK-3650 Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/1856
  
Hi @ramkrish86, PR looks quite good now. I had only a few minor comments. 
After those are fixed, the PR should be good to merge.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333853#comment-15333853
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r67350850
  
--- Diff: 
flink-scala/src/test/scala/org/apache/flink/api/operator/SelectByFunctionTest.scala
 ---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.operator
+
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.scala.{SelectByMaxFunction, 
SelectByMinFunction}
+import org.apache.flink.api.scala.typeutils.CaseClassTypeInfo
+import org.junit.{Assert, Test}
+
+/**
+  *
+  */
+class SelectByFunctionTest {
+
+  val tupleTypeInfo = new CaseClassTypeInfo(
+classOf[(Int, Long, String, Long, Int)], Array(), 
Array(BasicTypeInfo.INT_TYPE_INFO,
+  BasicTypeInfo.LONG_TYPE_INFO,
+  BasicTypeInfo.STRING_TYPE_INFO,
+  BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO),
+Array("_1", "_2","_3","_4","_5")) {
+
+override def createSerializer(config: ExecutionConfig):
+  TypeSerializer[(Int, Long, String, Long, Int)] = ???
+  }
+
+   private val bigger  = new Tuple5[Int, Long, String, Long, Int](10, 
100L, "HelloWorld", 200L, 20)
+   private val smaller = new Tuple5[Int, Long, String, Long, Int](5, 50L, 
"Hello", 50L, 15)
+
+   //Special case where only the last value determines if bigger or smaller
+   private val specialCaseBigger =
+ new Tuple5[Int, Long, String, Long, Int](10, 100L, "HelloWorld", 
200L, 17)
+   private val specialCaseSmaller  =
+ new Tuple5[Int, Long, String, Long, Int](5, 50L, "Hello", 50L, 17)
+
+  /**
+* This test validates whether the order of tuples has
+*
+* any impact on the outcome and if the bigger tuple is returned.
+*/
+  @Test
+  def testMaxByComparison(): Unit = {
+val a1 = Array(0)
+val maxByTuple = new SelectByMaxFunction(tupleTypeInfo, a1)
+  try {
+Assert.assertSame("SelectByMax must return bigger tuple",
+  bigger, maxByTuple.reduce(smaller, bigger))
+Assert.assertSame("SelectByMax must return bigger tuple",
+  bigger, maxByTuple.reduce(bigger, smaller))
+  } catch {
+case e : Exception =>
+  Assert.fail("No exception should be thrown while comapring both 
tuples")
+  }
+  }
+
+  // --- MAXIMUM FUNCTION TEST BELOW 
--
+
+  /**
+* This test cases checks when two tuples only differ in one value, but 
this value is not
+* in the fields list. In that case it should be seen as equal
+* and then the first given tuple (value1) should be returned by 
reduce().
+*/
+  @Test
+  def testMaxByComparisonSpecialCase1() : Unit = {
+val a1 = Array(0, 3)
+val maxByTuple = new SelectByMaxFunction(tupleTypeInfo, a1)
+
+try {
+  Assert.assertSame("SelectByMax must return the first given tuple",
+specialCaseBigger, maxByTuple.reduce(specialCaseBigger, bigger))
+  Assert.assertSame("SelectByMax must return the first given tuple",
+bigger, maxByTuple.reduce(bigger, specialCaseBigger))
+} catch {
+  case e : Exception => Assert.fail("No exception should be thrown " +
+"while comapring both tuples")
+}
+  }
+
+  /**
+* This test cases checks when two tuples only differ in one value.
+*/
+  @Test
+  def testMaxByComparisonSpecialCase2() : Unit = {
+val a1 = Array(0, 2, 1, 4, 3)
+val maxByTuple = new SelectByMaxFunction(tupleTypeInfo, 

[GitHub] flink pull request #1856: FLINK-3650 Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r67350850
  
--- Diff: 
flink-scala/src/test/scala/org/apache/flink/api/operator/SelectByFunctionTest.scala
 ---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.operator
+
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.scala.{SelectByMaxFunction, 
SelectByMinFunction}
+import org.apache.flink.api.scala.typeutils.CaseClassTypeInfo
+import org.junit.{Assert, Test}
+
+/**
+  *
+  */
+class SelectByFunctionTest {
+
+  val tupleTypeInfo = new CaseClassTypeInfo(
+classOf[(Int, Long, String, Long, Int)], Array(), 
Array(BasicTypeInfo.INT_TYPE_INFO,
+  BasicTypeInfo.LONG_TYPE_INFO,
+  BasicTypeInfo.STRING_TYPE_INFO,
+  BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO),
+Array("_1", "_2","_3","_4","_5")) {
+
+override def createSerializer(config: ExecutionConfig):
+  TypeSerializer[(Int, Long, String, Long, Int)] = ???
+  }
+
+   private val bigger  = new Tuple5[Int, Long, String, Long, Int](10, 
100L, "HelloWorld", 200L, 20)
+   private val smaller = new Tuple5[Int, Long, String, Long, Int](5, 50L, 
"Hello", 50L, 15)
+
+   //Special case where only the last value determines if bigger or smaller
+   private val specialCaseBigger =
+ new Tuple5[Int, Long, String, Long, Int](10, 100L, "HelloWorld", 
200L, 17)
+   private val specialCaseSmaller  =
+ new Tuple5[Int, Long, String, Long, Int](5, 50L, "Hello", 50L, 17)
+
+  /**
+* This test validates whether the order of tuples has
+*
+* any impact on the outcome and if the bigger tuple is returned.
+*/
+  @Test
+  def testMaxByComparison(): Unit = {
+val a1 = Array(0)
+val maxByTuple = new SelectByMaxFunction(tupleTypeInfo, a1)
+  try {
+Assert.assertSame("SelectByMax must return bigger tuple",
+  bigger, maxByTuple.reduce(smaller, bigger))
+Assert.assertSame("SelectByMax must return bigger tuple",
+  bigger, maxByTuple.reduce(bigger, smaller))
+  } catch {
+case e : Exception =>
+  Assert.fail("No exception should be thrown while comapring both 
tuples")
+  }
+  }
+
+  // --- MAXIMUM FUNCTION TEST BELOW 
--
+
+  /**
+* This test cases checks when two tuples only differ in one value, but 
this value is not
+* in the fields list. In that case it should be seen as equal
+* and then the first given tuple (value1) should be returned by 
reduce().
+*/
+  @Test
+  def testMaxByComparisonSpecialCase1() : Unit = {
+val a1 = Array(0, 3)
+val maxByTuple = new SelectByMaxFunction(tupleTypeInfo, a1)
+
+try {
+  Assert.assertSame("SelectByMax must return the first given tuple",
+specialCaseBigger, maxByTuple.reduce(specialCaseBigger, bigger))
+  Assert.assertSame("SelectByMax must return the first given tuple",
+bigger, maxByTuple.reduce(bigger, specialCaseBigger))
+} catch {
+  case e : Exception => Assert.fail("No exception should be thrown " +
+"while comapring both tuples")
+}
+  }
+
+  /**
+* This test cases checks when two tuples only differ in one value.
+*/
+  @Test
+  def testMaxByComparisonSpecialCase2() : Unit = {
+val a1 = Array(0, 2, 1, 4, 3)
+val maxByTuple = new SelectByMaxFunction(tupleTypeInfo, a1)
+try {
+  Assert.assertSame("SelectByMax must return bigger tuple",
+bigger, maxByTuple.reduce(specialCaseBigger, bigger))
+  Assert.assertSame("SelectByMax must return bigger tuple",
+

[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333840#comment-15333840
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r67349578
  
--- Diff: 
flink-scala/src/test/scala/org/apache/flink/api/operator/SelectByFunctionTest.scala
 ---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.operator
+
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.scala.{SelectByMaxFunction, 
SelectByMinFunction}
+import org.apache.flink.api.scala.typeutils.CaseClassTypeInfo
+import org.junit.{Assert, Test}
+
+/**
+  *
+  */
+class SelectByFunctionTest {
+
+  val tupleTypeInfo = new CaseClassTypeInfo(
+classOf[(Int, Long, String, Long, Int)], Array(), 
Array(BasicTypeInfo.INT_TYPE_INFO,
+  BasicTypeInfo.LONG_TYPE_INFO,
+  BasicTypeInfo.STRING_TYPE_INFO,
+  BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO),
+Array("_1", "_2","_3","_4","_5")) {
+
+override def createSerializer(config: ExecutionConfig):
+  TypeSerializer[(Int, Long, String, Long, Int)] = ???
+  }
+
+   private val bigger  = new Tuple5[Int, Long, String, Long, Int](10, 
100L, "HelloWorld", 200L, 20)
--- End diff --

you can also create Scala tuples like this:
```
private val bigger = (10, 100L, "HelloWorld", 200L, 20)
```


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #1856: FLINK-3650 Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r67349578
  
--- Diff: 
flink-scala/src/test/scala/org/apache/flink/api/operator/SelectByFunctionTest.scala
 ---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.operator
+
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.scala.{SelectByMaxFunction, 
SelectByMinFunction}
+import org.apache.flink.api.scala.typeutils.CaseClassTypeInfo
+import org.junit.{Assert, Test}
+
+/**
+  *
+  */
+class SelectByFunctionTest {
+
+  val tupleTypeInfo = new CaseClassTypeInfo(
+classOf[(Int, Long, String, Long, Int)], Array(), 
Array(BasicTypeInfo.INT_TYPE_INFO,
+  BasicTypeInfo.LONG_TYPE_INFO,
+  BasicTypeInfo.STRING_TYPE_INFO,
+  BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO),
+Array("_1", "_2","_3","_4","_5")) {
+
+override def createSerializer(config: ExecutionConfig):
+  TypeSerializer[(Int, Long, String, Long, Int)] = ???
+  }
+
+   private val bigger  = new Tuple5[Int, Long, String, Long, Int](10, 
100L, "HelloWorld", 200L, 20)
--- End diff --

you can also create Scala tuples like this:
```
private val bigger = (10, 100L, "HelloWorld", 200L, 20)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333839#comment-15333839
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r67349089
  
--- Diff: 
flink-scala/src/test/scala/org/apache/flink/api/operator/SelectByFunctionTest.scala
 ---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.operator
+
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.scala.{SelectByMaxFunction, 
SelectByMinFunction}
+import org.apache.flink.api.scala.typeutils.CaseClassTypeInfo
+import org.junit.{Assert, Test}
+
+/**
+  *
+  */
+class SelectByFunctionTest {
+
+  val tupleTypeInfo = new CaseClassTypeInfo(
--- End diff --

you can create `tupleTypeInfo` more easily like this: 

```
val tupleTypeInfo = implicitly[TypeInformation[(Int, Long, String, Long, Int)]]
  .asInstanceOf[TupleTypeInfoBase[(Int, Long, String, Long, Int)]]
```

if you add this `import org.apache.flink.api.scala._`


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #1856: FLINK-3650 Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r67349089
  
--- Diff: 
flink-scala/src/test/scala/org/apache/flink/api/operator/SelectByFunctionTest.scala
 ---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.operator
+
+import org.apache.flink.api.common.ExecutionConfig
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.scala.{SelectByMaxFunction, 
SelectByMinFunction}
+import org.apache.flink.api.scala.typeutils.CaseClassTypeInfo
+import org.junit.{Assert, Test}
+
+/**
+  *
+  */
+class SelectByFunctionTest {
+
+  val tupleTypeInfo = new CaseClassTypeInfo(
--- End diff --

you can create `tupleTypeInfo` more easily like this: 

```
val tupleTypeInfo = implicitly[TypeInformation[(Int, Long, String, Long, Int)]]
  .asInstanceOf[TupleTypeInfoBase[(Int, Long, String, Long, Int)]]
```

if you add this `import org.apache.flink.api.scala._`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4038) Impossible to set more than 1 JVM argument in env.java.opts

2016-06-16 Thread Gyula Fora (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333824#comment-15333824
 ] 

Gyula Fora commented on FLINK-4038:
---

Should I open a discussion thread on the mailing list maybe?

> Impossible to set more than 1 JVM argument in env.java.opts
> ---
>
> Key: FLINK-4038
> URL: https://issues.apache.org/jira/browse/FLINK-4038
> Project: Flink
>  Issue Type: Bug
>  Components: Start-Stop Scripts
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Gyula Fora
>Priority: Critical
> Fix For: 1.1.0
>
>
> The TaskManager start scripts fail when env.java.opts contains more than one 
> JVM option, due to:
> if [[ $FLINK_TM_MEM_PRE_ALLOCATE == "false" ]] && [ -z $FLINK_ENV_JAVA_OPTS 
> ]; then
> -z expects a single operand, but the unquoted variable expands to multiple 
> words when it contains more than one option, so the test fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #1856: FLINK-3650 Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r67345907
  
--- Diff: 
flink-scala/src/main/scala/org/apache/flink/api/scala/SelectByMaxFunction.scala 
---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.scala
+
+import org.apache.flink.api.common.functions.ReduceFunction
+import org.apache.flink.api.java.typeutils.TupleTypeInfoBase
+import org.apache.flink.annotation.Internal
+
+/**
+  * SelectByMaxFunction to work with Scala tuples
+  */
+@Internal
+class SelectByMaxFunction[T](t : TupleTypeInfoBase[T], fields : Array[Int])
+  extends ReduceFunction[T] {
+  for(f <- fields) {
+if (f < 0 || f >= t.getArity()) {
+  throw new IndexOutOfBoundsException(
+"SelectByMaxFunction field position " + f + " is out of range.")
+}
+
+// Check whether type is comparable
+if (!t.asInstanceOf[TupleTypeInfoBase[T]].getTypeAt(f).isKeyType()) {
--- End diff --

the cast is not necessary. `t` is already a `TupleTypeInfoBase`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4038) Impossible to set more than 1 JVM argument in env.java.opts

2016-06-16 Thread Gyula Fora (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333822#comment-15333822
 ] 

Gyula Fora commented on FLINK-4038:
---

I feel that these issues are rather easy to solve once we have a consensus on 
how the scripts should behave. On the other hand, these issues will have a 
major impact on deployments in the future, so I would like to hear the opinions 
of more experienced people.

> Impossible to set more than 1 JVM argument in env.java.opts
> ---
>
> Key: FLINK-4038
> URL: https://issues.apache.org/jira/browse/FLINK-4038
> Project: Flink
>  Issue Type: Bug
>  Components: Start-Stop Scripts
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Gyula Fora
>Priority: Critical
> Fix For: 1.1.0
>
>
> The TaskManager start scripts fail when env.java.opts contains more than one 
> JVM option, due to:
> if [[ $FLINK_TM_MEM_PRE_ALLOCATE == "false" ]] && [ -z $FLINK_ENV_JAVA_OPTS 
> ]; then
> -z expects a single operand, but the unquoted variable expands to multiple 
> words when it contains more than one option, so the test fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333801#comment-15333801
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r67346072
  
--- Diff: 
flink-scala/src/main/scala/org/apache/flink/api/scala/SelectByMinFunction.scala 
---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.scala
+
+import org.apache.flink.annotation.Internal
+import org.apache.flink.api.common.functions.ReduceFunction
+import org.apache.flink.api.java.typeutils.TupleTypeInfoBase
+
+/**
+  * SelectByMinFunction to work with Scala tuples
+  */
+@Internal
+class SelectByMinFunction[T](t : TupleTypeInfoBase[T], fields : Array[Int])
+  extends ReduceFunction[T] {
+  for(f <- fields) {
+if (f < 0 || f >= t.getArity()) {
+  throw new IndexOutOfBoundsException(
+"SelectByMinFunction field position " + f + " is out of range.")
+}
+
+// Check whether type is comparable
+if (!t.asInstanceOf[TupleTypeInfoBase[T]].getTypeAt(f).isKeyType()) {
--- End diff --

the cast is not necessary. `t` is already a `TupleTypeInfoBase`.


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-4067) Add version header to savepoints

2016-06-16 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi reassigned FLINK-4067:
--

Assignee: Ufuk Celebi

> Add version header to savepoints
> 
>
> Key: FLINK-4067
> URL: https://issues.apache.org/jira/browse/FLINK-4067
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.0.3
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
> Fix For: 1.1.0
>
>
> Adding a header with version information to savepoints ensures that we can 
> migrate savepoints between Flink versions in the future (for example when 
> changing internal serialization formats between versions).
> After talking with Till, we propose to add the following meta data:
> - Magic number (int): identify data as savepoint
> - Version (int): savepoint version (independent of Flink version)
> - Data Offset (int): specifies at which point the actual savepoint data 
> starts. With this, we can allow future Flink versions to add fields to the 
> header without breaking stuff, e.g. Flink 1.1 could read savepoints of Flink 
> 2.0.
> For Flink 1.0 savepoint support, we have to try reading the savepoints 
> without a header before failing if we don't find the magic number.
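
To make the proposal concrete, a minimal sketch of writing and reading such a 
header. The magic value, the 12-byte header size, and the fallback handling 
below are illustrative assumptions, not the format that was eventually 
committed.

{code}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch of the proposed savepoint header; constants and field order are placeholders.
public class SavepointHeaderSketch {

    static final int MAGIC_NUMBER = 0x53415650; // placeholder magic value ("SAVP")
    static final int VERSION = 1;               // savepoint version, independent of the Flink version

    static void writeHeader(DataOutputStream out, int dataOffset) throws IOException {
        out.writeInt(MAGIC_NUMBER); // identifies the data as a savepoint
        out.writeInt(VERSION);
        out.writeInt(dataOffset);   // where the actual savepoint data starts
    }

    // Returns the header version, or -1 for a header-less (Flink 1.0 style) savepoint.
    // Assumes the underlying stream supports mark/reset (e.g. a BufferedInputStream).
    static int tryReadHeader(DataInputStream in) throws IOException {
        in.mark(4);
        if (in.readInt() != MAGIC_NUMBER) {
            in.reset(); // no magic number: fall back to the legacy format instead of failing
            return -1;
        }
        int version = in.readInt();
        int dataOffset = in.readInt();
        // Newer versions may append extra header fields; skip ahead to the data.
        in.skipBytes(Math.max(0, dataOffset - 12)); // 12 = bytes of header read so far (assumption)
        return version;
    }
}
{code}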



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #1856: FLINK-3650 Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r67346072
  
--- Diff: 
flink-scala/src/main/scala/org/apache/flink/api/scala/SelectByMinFunction.scala 
---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.scala
+
+import org.apache.flink.annotation.Internal
+import org.apache.flink.api.common.functions.ReduceFunction
+import org.apache.flink.api.java.typeutils.TupleTypeInfoBase
+
+/**
+  * SelectByMinFunction to work with Scala tuples
+  */
+@Internal
+class SelectByMinFunction[T](t : TupleTypeInfoBase[T], fields : Array[Int])
+  extends ReduceFunction[T] {
+  for(f <- fields) {
+if (f < 0 || f >= t.getArity()) {
+  throw new IndexOutOfBoundsException(
+"SelectByMinFunction field position " + f + " is out of range.")
+}
+
+// Check whether type is comparable
+if (!t.asInstanceOf[TupleTypeInfoBase[T]].getTypeAt(f).isKeyType()) {
--- End diff --

the cast is not necessary. `t` is already a `TupleTypeInfoBase`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333799#comment-15333799
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r67345907
  
--- Diff: 
flink-scala/src/main/scala/org/apache/flink/api/scala/SelectByMaxFunction.scala 
---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.scala
+
+import org.apache.flink.api.common.functions.ReduceFunction
+import org.apache.flink.api.java.typeutils.TupleTypeInfoBase
+import org.apache.flink.annotation.Internal
+
+/**
+  * SelectByMaxFunction to work with Scala tuples
+  */
+@Internal
+class SelectByMaxFunction[T](t : TupleTypeInfoBase[T], fields : Array[Int])
+  extends ReduceFunction[T] {
+  for(f <- fields) {
+if (f < 0 || f >= t.getArity()) {
+  throw new IndexOutOfBoundsException(
+"SelectByMaxFunction field position " + f + " is out of range.")
+}
+
+// Check whether type is comparable
+if (!t.asInstanceOf[TupleTypeInfoBase[T]].getTypeAt(f).isKeyType()) {
--- End diff --

the cast is not necessary. `t` is already a `TupleTypeInfoBase`.


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333793#comment-15333793
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r67345503
  
--- Diff: 
flink-scala/src/main/scala/org/apache/flink/api/scala/GroupedDataSet.scala ---
@@ -356,6 +357,40 @@ class GroupedDataSet[T: ClassTag](
   }
 
   /**
+* Applies a special case of a reduce transformation `maxBy` on a 
grouped [[DataSet]]
+* The transformation consecutively calls a [[ReduceFunction]]
+* until only a single element remains which is the result of the 
transformation.
+* A ReduceFunction combines two elements into one new element of the 
same type.
+*
+* @param fields Keys taken into account for finding the minimum.
+* @return A [[ReduceOperator]] representing the minimum.
+*/
+  def maxBy(fields: Int*) : DataSet[T]  = {
+if (!set.getType().isTupleType) {
+  throw new InvalidProgramException("GroupedDataSet#maxBy(int...) only 
works on Tuple types.")
+}
+reduce(new 
SelectByMaxFunction[T](set.getType.asInstanceOf[TupleTypeInfoBase[T]],
+  fields.toArray))
+  }
+
+  /**
+* Applies a special case of a reduce transformation `minBy` on a 
grouped [[DataSet]].
+* The transformation consecutively calls a [[ReduceFunction]]
+* until only a single element remains which is the result of the 
transformation.
+* A ReduceFunction combines two elements into one new element of the 
same type.
+*
+* @param fields Keys taken into account for finding the minimum.
+* @return A [[ReduceOperator]] representing the minimum.
--- End diff --

The return type is not correct. You can remove `@param` and `@return` as 
these are not used for the other methods in this class either.


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333792#comment-15333792
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r67345484
  
--- Diff: 
flink-scala/src/main/scala/org/apache/flink/api/scala/GroupedDataSet.scala ---
@@ -356,6 +357,40 @@ class GroupedDataSet[T: ClassTag](
   }
 
   /**
+* Applies a special case of a reduce transformation `maxBy` on a 
grouped [[DataSet]]
+* The transformation consecutively calls a [[ReduceFunction]]
+* until only a single element remains which is the result of the 
transformation.
+* A ReduceFunction combines two elements into one new element of the 
same type.
+*
+* @param fields Keys taken into account for finding the minimum.
+* @return A [[ReduceOperator]] representing the minimum.
--- End diff --

The return type is not correct. You can remove `@param` and `@return` as 
these are not used for the other methods in this class either.


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #1856: FLINK-3650 Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r67345503
  
--- Diff: 
flink-scala/src/main/scala/org/apache/flink/api/scala/GroupedDataSet.scala ---
@@ -356,6 +357,40 @@ class GroupedDataSet[T: ClassTag](
   }
 
   /**
+* Applies a special case of a reduce transformation `maxBy` on a 
grouped [[DataSet]]
+* The transformation consecutively calls a [[ReduceFunction]]
+* until only a single element remains which is the result of the 
transformation.
+* A ReduceFunction combines two elements into one new element of the 
same type.
+*
+* @param fields Keys taken into account for finding the minimum.
+* @return A [[ReduceOperator]] representing the minimum.
+*/
+  def maxBy(fields: Int*) : DataSet[T]  = {
+if (!set.getType().isTupleType) {
+  throw new InvalidProgramException("GroupedDataSet#maxBy(int...) only 
works on Tuple types.")
+}
+reduce(new 
SelectByMaxFunction[T](set.getType.asInstanceOf[TupleTypeInfoBase[T]],
+  fields.toArray))
+  }
+
+  /**
+* Applies a special case of a reduce transformation `minBy` on a 
grouped [[DataSet]].
+* The transformation consecutively calls a [[ReduceFunction]]
+* until only a single element remains which is the result of the 
transformation.
+* A ReduceFunction combines two elements into one new element of the 
same type.
+*
+* @param fields Keys taken into account for finding the minimum.
+* @return A [[ReduceOperator]] representing the minimum.
--- End diff --

The return type is not correct. You can remove `@param` and `@return` as 
these are not used for the other methods in this class either.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #1856: FLINK-3650 Add maxBy/minBy to Scala DataSet API

2016-06-16 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r67345484
  
--- Diff: 
flink-scala/src/main/scala/org/apache/flink/api/scala/GroupedDataSet.scala ---
@@ -356,6 +357,40 @@ class GroupedDataSet[T: ClassTag](
   }
 
   /**
+* Applies a special case of a reduce transformation `maxBy` on a 
grouped [[DataSet]]
+* The transformation consecutively calls a [[ReduceFunction]]
+* until only a single element remains which is the result of the 
transformation.
+* A ReduceFunction combines two elements into one new element of the 
same type.
+*
+* @param fields Keys taken into account for finding the minimum.
+* @return A [[ReduceOperator]] representing the minimum.
--- End diff --

The return type is not correct. You can remove `@param` and `@return` as 
these are not used for the other methods in this class either.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2116: [FLINK-4078] [java, DataSet] Introduce missing cal...

2016-06-16 Thread StefanRRichter
GitHub user StefanRRichter opened a pull request:

https://github.com/apache/flink/pull/2116

[FLINK-4078] [java, DataSet] Introduce missing calls to closure cleaner

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [x] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [x] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StefanRRichter/flink 4024-closure_cleaner

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2116.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2116


commit 90ccdccfaab8b3261716e2f0cbf8deb846f86f08
Author: Stefan Richter 
Date:   2016-06-16T10:11:32Z

[FLINK-4078] [java, DataSet] Introduce missing calls to closure cleaner




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2114: FLINT-3839 Added the extraction of jar files from a URL s...

2016-06-16 Thread rmetzger
Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2114
  
Thank you for your contribution. There is a typo in the pull request title; 
it's important that it's correct so that our JIRA picks up the pull request.
Please also check out our contribution guidelines: 
http://flink.apache.org/contribute-code.html

They state in particular the following:
> No reformattings. Please keep reformatting of source files to a minimum. 
Diffs become unreadable if you (or your IDE automatically) remove or replace 
whitespaces, reformat code, or comments. Also, other patches that affect the 
same files become un-mergeable. Please configure your IDE such that code is not 
automatically reformatted. Pull requests with excessive or unnecessary code 
reformatting might be rejected.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (FLINK-3924) Remove protobuf shading from Kinesis connector

2016-06-16 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-3924:
-

Assignee: Robert Metzger

> Remove protobuf shading from Kinesis connector
> --
>
> Key: FLINK-3924
> URL: https://issues.apache.org/jira/browse/FLINK-3924
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>
> The Kinesis connector currently creates a fat jar with a custom protobuf 
> version (2.6.1), relocated into a different package.
> We need to build the fat jar to change the protobuf calls from the original 
> protobuf to the relocated one.
> Because Kinesis is licensed under the Amazon Software License (which is not 
> entirely compatible with the ASL 2.0), I don't want to deploy Kinesis 
> connector binaries to Maven Central with the releases. These binaries would 
> contain code from Amazon. It would be more than just linking to an (optional) 
> dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3924) Remove protobuf shading from Kinesis connector

2016-06-16 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333752#comment-15333752
 ] 

Robert Metzger commented on FLINK-3924:
---

I'm assigning this to myself

> Remove protobuf shading from Kinesis connector
> --
>
> Key: FLINK-3924
> URL: https://issues.apache.org/jira/browse/FLINK-3924
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Reporter: Robert Metzger
>
> The Kinesis connector currently creates a fat jar with a custom protobuf 
> version (2.6.1), relocated into a different package.
> We need to build the fat jar to change the protobuf calls from the original 
> protobuf to the relocated one.
> Because Kinesis is licensed under the Amazon Software License (which is not 
> entirely compatible with the ASL 2.0), I don't want to deploy Kinesis 
> connector binaries to Maven Central with the releases. These binaries would 
> contain code from Amazon. It would be more than just linking to an (optional) 
> dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3714) Add Support for "Allowed Lateness"

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333749#comment-15333749
 ] 

ASF GitHub Bot commented on FLINK-3714:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/2093
  
Could you please update the tests to add elements to `expectedOutput` in 
line with the `processElement` and `processWatermark` calls on the test 
harness? In the existing tests I did this, and it makes it easy to reason about 
what output should be produced in sequence. If the expected elements/watermarks 
are added after the other calls, you have to keep jumping back and forth when 
reading the test. Also, could you replace the `el`, `el2`, and so on variables 
with inline tuple creation, for the same reason of not having to jump back and 
forth when reading the tests?


> Add Support for "Allowed Lateness"
> --
>
> Key: FLINK-3714
> URL: https://issues.apache.org/jira/browse/FLINK-3714
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Kloudas
>
> As mentioned in 
> https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit#
>  we should add support for an allowed lateness setting.
> This includes several things:
>  - API for setting allowed lateness
>  - Dropping of late elements 
>  - Garbage collection of windows state/timers
> Depending on whether the {{WindowAssigner}} assigns windows based on event 
> time or processing time we have to adjust the GC behavior. For event-time 
> windows "allowed lateness" makes sense and we should garbage collect after 
> this expires. For processing-time windows "allowed lateness" does not make 
> sense and we should always GC window state/timers at the end timestamp of a 
> processing-time window. I think that we need a method for this on 
> {{WindowAssigner}} that allows to differentiate between event-time windows 
> and processing-time windows: {{boolean WindowAssigner.isEventTime()}}.
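
A sketch of how such a flag could drive the garbage-collection timestamp 
described above; the class and method names are illustrative, not the merged 
implementation (only {{TimeWindow.maxTimestamp()}} mirrors the actual API).

{code}
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

// Illustrative sketch of deriving the cleanup time from WindowAssigner.isEventTime().
class CleanupTimeSketch {

    static long cleanupTime(TimeWindow window, boolean isEventTime, long allowedLateness) {
        if (isEventTime) {
            // Event-time windows are kept until the allowed lateness has passed.
            return window.maxTimestamp() + allowedLateness;
        }
        // Processing-time windows: lateness does not apply, so clean up at the window end.
        return window.maxTimestamp();
    }
}
{code}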



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2093: [FLINK-3714] Add Support for "Allowed Lateness"

2016-06-16 Thread aljoscha
Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/2093
  
Could you please update the tests to add elements to `expectedOutput` in 
line with the `processElement` and `processWatermark` calls on the test 
harness? In the existing tests I did this, and it makes it easy to reason about 
what output should be produced in sequence. If the expected elements/watermarks 
are added after the other calls, you have to keep jumping back and forth when 
reading the test. Also, could you replace the `el`, `el2`, and so on variables 
with inline tuple creation, for the same reason of not having to jump back and 
forth when reading the tests?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-4083) Use ClosureCleaner for Join where and equalTo

2016-06-16 Thread Stefan Richter (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Richter updated FLINK-4083:
--
Description: 
When specifying a key selector in the where or equalTo clause of a Join, the 
closure cleaner is not used. Same problem as FLINK-4078.

{code}
.join(ds)
    .where(new KeySelector<CustomType, Integer>() {
        @Override
        public Integer getKey(CustomType value) {
            return value.myInt;
        }
    })
    .equalTo(new KeySelector<CustomType, Integer>() {
        @Override
        public Integer getKey(CustomType value) throws Exception {
            return value.myInt;
        }
    });
{code}

The problem is that the KeySelector is an anonymous inner class and as such has 
a reference to the outer object. Normally, this would be rectified by the 
closure cleaner but the cleaner is not used in Join.where() and Join.equalTo().

  was:
When specifying a key selector in the where or equalTo clause of a Join, the 
closure cleaner is not used. Same problem as FLINK-4078.

{code}
.join(ds)
    .where(new KeySelector<CustomType, Integer>() {
        @Override
        public Integer getKey(CustomType value) {
            return value.myInt;
        }
    })
    .equalTo(new KeySelector<CustomType, Integer>() {
        @Override
        public Integer getKey(CustomType value) throws Exception {
            return value.myInt;
        }
    });
{code}

The problem is that the KeySelector is an anonymous inner class and as such has 
a reference to the outer object. Normally, this would be rectified by the 
closure cleaner but the cleaner is not used in CoGroup.where().


> Use ClosureCleaner for Join where and equalTo
> -
>
> Key: FLINK-4083
> URL: https://issues.apache.org/jira/browse/FLINK-4083
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.3
>Reporter: Stefan Richter
>Assignee: Stefan Richter
>Priority: Minor
>
> When specifying a key selector in the where or equalTo clause of a Join, the 
> closure cleaner is not used. Same problem as FLINK-4078.
> {code}
> .join(ds)
>   .where(new KeySelector<CustomType, Integer>() {
>   @Override
>   public Integer getKey(CustomType value) {
>   return value.myInt;
>   }
>   })
>   .equalTo(new KeySelector<CustomType, Integer>(){
>   @Override
>   public Integer getKey(CustomType value) throws Exception {
>   return value.myInt;
>   }
>   });
> {code}
> The problem is that the KeySelector is an anonymous inner class and as such 
> has a reference to the outer object. Normally, this would be rectified by the 
> closure cleaner but the cleaner is not used in Join.where() and 
> Join.equalTo().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (FLINK-4083) Use ClosureCleaner for Join where and equalTo

2016-06-16 Thread Stefan Richter (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Richter updated FLINK-4083:
--
Comment: was deleted

(was: Same problem.)

> Use ClosureCleaner for Join where and equalTo
> -
>
> Key: FLINK-4083
> URL: https://issues.apache.org/jira/browse/FLINK-4083
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.3
>Reporter: Stefan Richter
>Assignee: Stefan Richter
>Priority: Minor
>
> When specifying a key selector in the where or equalTo clause of a Join, the 
> closure cleaner is not used. Same problem as FLINK-4078.
> {code}
> .join(ds)
>   .where(new KeySelector<CustomType, Integer>() {
>   @Override
>   public Integer getKey(CustomType value) {
>   return value.myInt;
>   }
>   })
>   .equalTo(new KeySelector<CustomType, Integer>(){
>   @Override
>   public Integer getKey(CustomType value) throws Exception {
>   return value.myInt;
>   }
>   });
> {code}
> The problem is that the KeySelector is an anonymous inner class and as such 
> has a reference to the outer object. Normally, this would be rectified by the 
> closure cleaner but the cleaner is not used in CoGroup.where().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4083) Use ClosureCleaner for Join where and equalTo

2016-06-16 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-4083:
-

 Summary: Use ClosureCleaner for Join where and equalTo
 Key: FLINK-4083
 URL: https://issues.apache.org/jira/browse/FLINK-4083
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.0.3
Reporter: Stefan Richter
Assignee: Stefan Richter
Priority: Minor


When specifying a key selector in the where or equalTo clause of a Join, the 
closure cleaner is not used. Same problem as FLINK-4078.

{code}
.join(ds)
    .where(new KeySelector<CustomType, Integer>() {
        @Override
        public Integer getKey(CustomType value) {
            return value.myInt;
        }
    })
    .equalTo(new KeySelector<CustomType, Integer>() {
        @Override
        public Integer getKey(CustomType value) throws Exception {
            return value.myInt;
        }
    });
{code}

The problem is that the KeySelector is an anonymous inner class and as such has 
a reference to the outer object. Normally, this would be rectified by the 
closure cleaner but the cleaner is not used in CoGroup.where().
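
Until the cleaner is wired into these methods, a user-side workaround is to 
avoid the anonymous inner class entirely. A sketch, with {{CustomType}} standing 
in for the POJO from the snippet above:

{code}
import java.io.Serializable;
import org.apache.flink.api.java.functions.KeySelector;

// Workaround sketch: a top-level (or static nested) KeySelector holds no reference
// to an enclosing instance, so nothing unrelated is captured in the closure.
public class MyIntKey implements KeySelector<CustomType, Integer> {
    @Override
    public Integer getKey(CustomType value) {
        return value.myInt;
    }
}

// Stand-in for the POJO used in the example above.
class CustomType implements Serializable {
    public int myInt;
}

// Usage (illustrative):
//   ds1.join(ds2).where(new MyIntKey()).equalTo(new MyIntKey());
{code}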



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2115: [FLINK-4017] [py] Add Aggregation support to Pytho...

2016-06-16 Thread GEOFBOT
GitHub user GEOFBOT opened a pull request:

https://github.com/apache/flink/pull/2115

[FLINK-4017] [py] Add Aggregation support to Python API

Adds Aggregation support in the Python API accessible through 
`.aggregate()` and `.agg_and()`. (I was unable to use `.and()` in Python 
because 'and' is a keyword.)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/GEOFBOT/flink FLINK-4017

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2115.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2115


commit f3836f1e3919ca450baa3f633d9ece538008b106
Author: Geoffrey Mon 
Date:   2016-06-02T16:10:59Z

[FLINK-4017] [py] Add Aggregation support to Python API

Assembles and applies a GroupReduceFunction using pre-defined
AggregationOperations.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4017) [py] Add Aggregation support to Python API

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333746#comment-15333746
 ] 

ASF GitHub Bot commented on FLINK-4017:
---

GitHub user GEOFBOT opened a pull request:

https://github.com/apache/flink/pull/2115

[FLINK-4017] [py] Add Aggregation support to Python API

Adds Aggregation support in the Python API accessible through 
`.aggregate()` and `.agg_and()`. (I was unable to use `.and()` in Python 
because 'and' is a keyword.)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/GEOFBOT/flink FLINK-4017

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2115.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2115


commit f3836f1e3919ca450baa3f633d9ece538008b106
Author: Geoffrey Mon 
Date:   2016-06-02T16:10:59Z

[FLINK-4017] [py] Add Aggregation support to Python API

Assembles and applies a GroupReduceFunction using pre-defined
AggregationOperations.




> [py] Add Aggregation support to Python API
> --
>
> Key: FLINK-4017
> URL: https://issues.apache.org/jira/browse/FLINK-4017
> Project: Flink
>  Issue Type: Improvement
>  Components: Python API
>Reporter: Geoffrey Mon
>Priority: Minor
>
> Aggregations are not currently supported in the Python API.
> I was getting started with setting up and working with Flink and figured this 
> would be a relatively simple task for me to get started with. Currently 
> working on this at https://github.com/geofbot/flink



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-4038) Impossible to set more than 1 JVM argument in env.java.opts

2016-06-16 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333740#comment-15333740
 ] 

Robert Metzger edited comment on FLINK-4038 at 6/16/16 1:12 PM:


[~gyfora] Can you work on FLINK-4038 - FLINK-4040 until the 1.1 release, or 
would you like somebody else to fix this?


was (Author: rmetzger):
Can you work on FLINK-4038 - FLINK-4040 until the 1.1 release, or would you 
like somebody else to fix this?

> Impossible to set more than 1 JVM argument in env.java.opts
> ---
>
> Key: FLINK-4038
> URL: https://issues.apache.org/jira/browse/FLINK-4038
> Project: Flink
>  Issue Type: Bug
>  Components: Start-Stop Scripts
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Gyula Fora
>Priority: Critical
> Fix For: 1.1.0
>
>
> The TaskManager start scripts fail when env.java.opts contains more than one 
> JVM option, due to:
> if [[ $FLINK_TM_MEM_PRE_ALLOCATE == "false" ]] && [ -z $FLINK_ENV_JAVA_OPTS ]; then
> -z expects a single argument, but the unquoted $FLINK_ENV_JAVA_OPTS expands to 
> several words, so the test fails as soon as more than one option is set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2093: [FLINK-3714] Add Support for "Allowed Lateness"

2016-06-16 Thread aljoscha
Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/2093#discussion_r67339362
  
--- Diff: 
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/WindowOperatorTest.java
 ---
@@ -917,6 +929,465 @@ public void testRestoreAndSnapshotAreInSync() throws Exception {
 		Assert.assertEquals(operator.processingTimeTimerTimestamps, otherOperator.processingTimeTimerTimestamps);
 	}
 
+	@Test
+	public void testLateness() throws Exception {
+		final int WINDOW_SIZE = 2;
+		final long LATENESS = 500;
+
+		TypeInformation<Tuple2<String, Integer>> inputType = TypeInfoParser.parse("Tuple2<String, Integer>");
+
+		ReducingStateDescriptor<Tuple2<String, Integer>> stateDesc = new ReducingStateDescriptor<>("window-contents",
+			new SumReducer(),
+			inputType.createSerializer(new ExecutionConfig()));
+
+		WindowOperator<String, Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple2<String, Integer>, TimeWindow> operator =
+			new WindowOperator<>(
+				TumblingEventTimeWindows.of(Time.of(WINDOW_SIZE, TimeUnit.SECONDS)),
+				new TimeWindow.Serializer(),
+				new TupleKeySelector(),
+				BasicTypeInfo.STRING_TYPE_INFO.createSerializer(new ExecutionConfig()),
+				stateDesc,
+				new InternalSingleValueWindowFunction<>(new PassThroughWindowFunction<String, TimeWindow, Tuple2<String, Integer>>()),
+				EventTimeTrigger.create(),
+				LATENESS);
+
+		OneInputStreamOperatorTestHarness<Tuple2<String, Integer>, Tuple2<String, Integer>> testHarness =
+			new OneInputStreamOperatorTestHarness<>(operator);
+
+		testHarness.configureForKeyedStream(new TupleKeySelector(), BasicTypeInfo.STRING_TYPE_INFO);
+
+		operator.setInputType(inputType, new ExecutionConfig());
+		testHarness.open();
+
+		long initialTime = 0L;
+		testHarness.processElement(new StreamRecord<>(new Tuple2<>("key2", 1), initialTime + 500));
+		testHarness.processWatermark(new Watermark(initialTime + 1500));
+
+		testHarness.processElement(new StreamRecord<>(new Tuple2<>("key2", 1), initialTime + 1300));
+
+		testHarness.processWatermark(new Watermark(initialTime + 2300));
+
+		// this will not be dropped because window.maxTimestamp() + allowedLateness > currentWatermark
+		testHarness.processElement(new StreamRecord<>(new Tuple2<>("key2", 1), initialTime + 1997));
+		testHarness.processWatermark(new Watermark(initialTime + 6000));
+
+		// this will be dropped because window.maxTimestamp() + allowedLateness < currentWatermark
+		testHarness.processElement(new StreamRecord<>(new Tuple2<>("key2", 1), initialTime + 1998));
+		testHarness.processWatermark(new Watermark(initialTime + 7000));
+
+		Tuple2<String, Integer> el1 = new Tuple2<>("key2", 2);
+		// the following is 1 and not 3 because the trigger fires and purges
+		Tuple2<String, Integer> el2 = new Tuple2<>("key2", 1);
+
+		ConcurrentLinkedQueue<Object> expected = new ConcurrentLinkedQueue<>();
+
+		expected.add(new Watermark(initialTime + 1500));
+		expected.add(new StreamRecord<>(el1, initialTime + 1999));
+
+		expected.add(new Watermark(initialTime + 2300));
+		expected.add(new StreamRecord<>(el2, initialTime + 1999));
+
+		expected.add(new Watermark(initialTime + 6000));
+		expected.add(new Watermark(initialTime + 7000));
+
+		TestHarnessUtil.assertOutputEqualsSorted("Output was not correct.", expected, testHarness.getOutput(), new Tuple2ResultSortComparator());
+		testHarness.close();
+	}
+
+	@Test
+	public void testDropDueToLatenessTumbling() throws Exception {
--- End diff --

No element is dropped in this test. If I'm not mistaken.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4038) Impossible to set more than 1 JVM argument in env.java.opts

2016-06-16 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333740#comment-15333740
 ] 

Robert Metzger commented on FLINK-4038:
---

Can you work on FLINK-4038 - FLINK-4040 until the 1.1 release, or would you 
like somebody else to fix this?

> Impossible to set more than 1 JVM argument in env.java.opts
> ---
>
> Key: FLINK-4038
> URL: https://issues.apache.org/jira/browse/FLINK-4038
> Project: Flink
>  Issue Type: Bug
>  Components: Start-Stop Scripts
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Gyula Fora
>Priority: Critical
> Fix For: 1.1.0
>
>
> The TaskManager start scripts fail when env.java.opts contains more than one 
> JVM option, due to:
> if [[ $FLINK_TM_MEM_PRE_ALLOCATE == "false" ]] && [ -z $FLINK_ENV_JAVA_OPTS ]; then
> -z expects a single argument, but the unquoted $FLINK_ENV_JAVA_OPTS expands to 
> several words, so the test fails as soon as more than one option is set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3714) Add Support for "Allowed Lateness"

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333725#comment-15333725
 ] 

ASF GitHub Bot commented on FLINK-3714:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/2093#discussion_r67339362
  
--- Diff: 
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/windowing/WindowOperatorTest.java
 ---
@@ -917,6 +929,465 @@ public void testRestoreAndSnapshotAreInSync() throws Exception {
 		Assert.assertEquals(operator.processingTimeTimerTimestamps, otherOperator.processingTimeTimerTimestamps);
 	}
 
+	@Test
+	public void testLateness() throws Exception {
+		final int WINDOW_SIZE = 2;
+		final long LATENESS = 500;
+
+		TypeInformation<Tuple2<String, Integer>> inputType = TypeInfoParser.parse("Tuple2<String, Integer>");
+
+		ReducingStateDescriptor<Tuple2<String, Integer>> stateDesc = new ReducingStateDescriptor<>("window-contents",
+			new SumReducer(),
+			inputType.createSerializer(new ExecutionConfig()));
+
+		WindowOperator<String, Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple2<String, Integer>, TimeWindow> operator =
+			new WindowOperator<>(
+				TumblingEventTimeWindows.of(Time.of(WINDOW_SIZE, TimeUnit.SECONDS)),
+				new TimeWindow.Serializer(),
+				new TupleKeySelector(),
+				BasicTypeInfo.STRING_TYPE_INFO.createSerializer(new ExecutionConfig()),
+				stateDesc,
+				new InternalSingleValueWindowFunction<>(new PassThroughWindowFunction<String, TimeWindow, Tuple2<String, Integer>>()),
+				EventTimeTrigger.create(),
+				LATENESS);
+
+		OneInputStreamOperatorTestHarness<Tuple2<String, Integer>, Tuple2<String, Integer>> testHarness =
+			new OneInputStreamOperatorTestHarness<>(operator);
+
+		testHarness.configureForKeyedStream(new TupleKeySelector(), BasicTypeInfo.STRING_TYPE_INFO);
+
+		operator.setInputType(inputType, new ExecutionConfig());
+		testHarness.open();
+
+		long initialTime = 0L;
+		testHarness.processElement(new StreamRecord<>(new Tuple2<>("key2", 1), initialTime + 500));
+		testHarness.processWatermark(new Watermark(initialTime + 1500));
+
+		testHarness.processElement(new StreamRecord<>(new Tuple2<>("key2", 1), initialTime + 1300));
+
+		testHarness.processWatermark(new Watermark(initialTime + 2300));
+
+		// this will not be dropped because window.maxTimestamp() + allowedLateness > currentWatermark
+		testHarness.processElement(new StreamRecord<>(new Tuple2<>("key2", 1), initialTime + 1997));
+		testHarness.processWatermark(new Watermark(initialTime + 6000));
+
+		// this will be dropped because window.maxTimestamp() + allowedLateness < currentWatermark
+		testHarness.processElement(new StreamRecord<>(new Tuple2<>("key2", 1), initialTime + 1998));
+		testHarness.processWatermark(new Watermark(initialTime + 7000));
+
+		Tuple2<String, Integer> el1 = new Tuple2<>("key2", 2);
+		// the following is 1 and not 3 because the trigger fires and purges
+		Tuple2<String, Integer> el2 = new Tuple2<>("key2", 1);
+
+		ConcurrentLinkedQueue<Object> expected = new ConcurrentLinkedQueue<>();
+
+		expected.add(new Watermark(initialTime + 1500));
+		expected.add(new StreamRecord<>(el1, initialTime + 1999));
+
+		expected.add(new Watermark(initialTime + 2300));
+		expected.add(new StreamRecord<>(el2, initialTime + 1999));
+
+		expected.add(new Watermark(initialTime + 6000));
+		expected.add(new Watermark(initialTime + 7000));
+
+		TestHarnessUtil.assertOutputEqualsSorted("Output was not correct.", expected, testHarness.getOutput(), new Tuple2ResultSortComparator());
+		testHarness.close();
+	}
+
+	@Test
+	public void testDropDueToLatenessTumbling() throws Exception {
--- End diff --

No element is dropped in this test. If I'm not mistaken.
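
For context, the feature exercised by these tests would be used roughly as follows 
from the DataStream API. This is only a hedged sketch: allowedLateness(Time) is the 
method this PR is adding, so its exact name and placement come from the change under 
review rather than a released API, and a real event-time job would also need a 
timestamp/watermark assigner on the input.

{code}
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class AllowedLatenessSketch {

	public static void main(String[] args) throws Exception {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
		env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

		// toy input; a real job would assign event timestamps and watermarks here
		DataStream<Tuple2<String, Integer>> input = env.fromElements(
			Tuple2.of("key2", 1), Tuple2.of("key2", 1), Tuple2.of("key2", 1));

		// tumbling 2s event-time windows that still accept elements arriving
		// up to 500 ms after the watermark has passed the end of the window
		DataStream<Tuple2<String, Integer>> summed = input
			.keyBy(0)
			.timeWindow(Time.seconds(2))
			.allowedLateness(Time.milliseconds(500))  // method added by this PR
			.sum(1);

		summed.print();
		env.execute("allowed lateness sketch");
	}
}
{code}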


> Add Support for "Allowed Lateness"
> --
>
> Key: FLINK-3714
> URL: https://issues.apache.org/jira/browse/FLINK-3714
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: 

[jira] [Updated] (FLINK-3967) Provide RethinkDB Sink for Flink

2016-06-16 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-3967:
--
Fix Version/s: (was: 1.1.0)

> Provide RethinkDB Sink for Flink
> 
>
> Key: FLINK-3967
> URL: https://issues.apache.org/jira/browse/FLINK-3967
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming, Streaming Connectors
>Affects Versions: 1.0.3
> Environment: All
>Reporter: Mans Singh
>Assignee: Mans Singh
>Priority: Minor
>  Labels: features
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Provide Sink to stream data from flink to rethink db.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2114: FLINK-3839 Added the extraction of jar files from ...

2016-06-16 Thread thormanrd
GitHub user thormanrd opened a pull request:

https://github.com/apache/flink/pull/2114

FLINK-3839 Added the extraction of jar files from a URL spec that is …

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed

…a directory in the form of file:///path/to/jars/<*>

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/thormanrd/flink feature/FLINK-3839

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2114.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2114


commit e71514574e6e7d5623eaf4e556d708c642ccb6f2
Author: Bob Thorman 
Date:   2016-06-16T12:48:53Z

FLINK-3839 Added the extraction of jar files from a URL spec that is a 
directory in the form of file:///path/to/jars/<*>




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3890) Deprecate streaming mode flag from Yarn CLI

2016-06-16 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333712#comment-15333712
 ] 

Robert Metzger commented on FLINK-3890:
---

This will be addressed by https://github.com/apache/flink/pull/2085

> Deprecate streaming mode flag from Yarn CLI
> ---
>
> Key: FLINK-3890
> URL: https://issues.apache.org/jira/browse/FLINK-3890
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
> Fix For: 1.1.0
>
>
> The {{-yst}} and {{-yarnstreaming}} parameter to the Yarn command-line is not 
> in use anymore since FLINK-3073 and should have been removed before the 1.0.0 
> release. I would suggest to mark the parameter as deprecated in the code and 
> not list it anymore in the help section. In one of the upcoming major 
> releases, we can remove it completely (which would give users an error if 
> they used the flag).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3875) Add a TableSink for Elasticsearch

2016-06-16 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-3875:
--
Fix Version/s: (was: 1.1.0)

> Add a TableSink for Elasticsearch
> -
>
> Key: FLINK-3875
> URL: https://issues.apache.org/jira/browse/FLINK-3875
> Project: Flink
>  Issue Type: New Feature
>Reporter: Fabian Hueske
>Priority: Minor
>
> Add a TableSink that writes data to Elasticsearch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3873) Add a Kafka TableSink with Avro serialization

2016-06-16 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-3873:
--
Fix Version/s: (was: 1.1.0)

> Add a Kafka TableSink with Avro serialization
> -
>
> Key: FLINK-3873
> URL: https://issues.apache.org/jira/browse/FLINK-3873
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Reporter: Fabian Hueske
>Priority: Minor
>
> Add a TableSink that writes Avro serialized data to Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3874) Add a Kafka TableSink with JSON serialization

2016-06-16 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-3874:
--
Fix Version/s: (was: 1.1.0)

> Add a Kafka TableSink with JSON serialization
> -
>
> Key: FLINK-3874
> URL: https://issues.apache.org/jira/browse/FLINK-3874
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Reporter: Fabian Hueske
>Priority: Minor
>
> Add a TableSink that writes JSON serialized data to Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3995) Properly Structure Test Utils and Dependencies

2016-06-16 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333701#comment-15333701
 ] 

Robert Metzger commented on FLINK-3995:
---

There is a pull request for this one, with the wrong JIRA id in the title: 
https://github.com/apache/flink/pull/2092

> Properly Structure Test Utils and Dependencies
> --
>
> Key: FLINK-3995
> URL: https://issues.apache.org/jira/browse/FLINK-3995
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 1.0.2
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 1.1.0
>
>
> All valuable test utils are only found in {{test-jars}}, but should be found 
> in the {{compile}} scope of the test util projects.
>   - TestLogger
>   - RetryRules
>   - MiniClusters
>   - TestEnvironments
>   - ...
> Additionally, we have dependencies where the {{compile}} scope of some 
> projects depends on {{test-jars}} of other projects. That can create problems 
> in some builds and with some tools.
> Here is how we can fix that:
>   - Create a {{flink-testutils-core}} project, which has the test utils 
> currently contained in the {{flink-core}} {{test-jar}} in the main scope. 
> That means the {{flink-core test-jar}} is not needed by other projects any 
> more.
>   - Make the Mini Cluster available in {{flink-test-utils}} main scope.
>   - To remove the test-jar dependency on {{flink-runtime}} from the 
> {{flink-test-utils}} project, we need to move the test actor classes to the 
> main scope in {{flink-runtime}}.
> This is related to FLINK-1827 (a followup).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4082) Add Setting for LargeRecordHandler

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333702#comment-15333702
 ] 

ASF GitHub Bot commented on FLINK-4082:
---

GitHub user aljoscha opened a pull request:

https://github.com/apache/flink/pull/2113

[FLINK-4082] Add Setting for enabling/disabling LargeRecordHandler

By default this is set to disabled because there are known issues when
users specify a custom TypeInformation.

R: @tillrohrmann (the `R:` is for review) I think you know how stuff should 
work there
CC: @StephanEwen 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/flink large-record-handler/setting

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2113.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2113


commit 183e5ab32951ae0ee8392efd147936bb4711d966
Author: Aljoscha Krettek 
Date:   2016-06-16T12:39:23Z

[FLINK-4082] Add Setting for enabling/disabling LargeRecordHandler

By default this is set to disabled because there are known issues when
users specify a custom TypeInformation.




> Add Setting for LargeRecordHandler
> --
>
> Key: FLINK-4082
> URL: https://issues.apache.org/jira/browse/FLINK-4082
> Project: Flink
>  Issue Type: Improvement
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> Now, this is always enabled but there are known problems when users specify a 
> custom {{TypeInformation}}. We should introduce a setting for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3951) Add Histogram Metric Type

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333703#comment-15333703
 ] 

ASF GitHub Bot commented on FLINK-3951:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2112
  
We decided to add the histogram metric, because users requested this 
feature. The important thing is that the histograms are not used by the runtime 
per default, because they indeed don't come for free. But if the user decides 
to use such a metric, then he should be able to explicitly register a histogram.

Additionally, we don't want to add a strict dependency on dropwizard from 
flink-core. Therefore, I introduced the `Histogram` interface which effectively 
hides the concrete implementation. So later we can easily swap the 
implementation out.

One of these implementations is the `DropwizardHistogramWrapper` which 
allows you to use Dropwizard histograms in Flink. The reason for this is that 
Dropwizard's `Histogram` has already a lot of functionality (different 
reservoirs for sampling streams, for example) and it would be a lot of 
duplicate work to reimplement it.

Since the assumption is that `Histograms` are a user metric, it seems to be 
ok for me to not ship an implementation of the interface with flink-core but to 
require the user to include an additional module such as 
flink-metrics-dropwizard. This module would add the dropwizard dependency 
anyway. If necessary, then we can also add our own histogram implementation 
later.

The PR does not introduce a Dropwizard compatibility layer. Dropwizard is 
only one of many possible implementations for our metrics (counter, gauge and 
histogram so far). It is up to the community to decide which other metrics to 
add.

I agree that the naming of `HistogramWrapper` is not so distinctive. I 
actually stuck to the naming scheme for `Counter` and `Gauge`, which are 
called `CounterWrapper` and `GaugeWrapper`. This can be easily changed. Do you 
have a proposal?

I'm sorry if you feel angry because you already did some of this work which 
was later removed. But please keep in mind that not everyone was involved in 
the metrics PR.
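
To make the layering described above concrete, here is a hedged sketch of the 
wrapper idea. Only the Dropwizard classes (com.codahale.metrics.Histogram and 
SlidingWindowReservoir) are real; the Flink-side Histogram interface and its 
method names are simplified assumptions for illustration, not the PR's actual 
signatures.

{code}
import com.codahale.metrics.SlidingWindowReservoir;

public class HistogramWrapperSketch {

	// Minimal stand-in for the Histogram interface the PR adds to flink-core.
	interface Histogram {
		void update(long value);
		long getCount();
	}

	// Adapter that lets a Dropwizard histogram back the Flink-side interface.
	static class DropwizardHistogramWrapper implements Histogram {

		private final com.codahale.metrics.Histogram dropwizardHistogram;

		DropwizardHistogramWrapper(com.codahale.metrics.Histogram dropwizardHistogram) {
			this.dropwizardHistogram = dropwizardHistogram;
		}

		@Override
		public void update(long value) {
			dropwizardHistogram.update(value);  // sampling/reservoir logic stays in Dropwizard
		}

		@Override
		public long getCount() {
			return dropwizardHistogram.getCount();
		}
	}

	public static void main(String[] args) {
		// pick any Dropwizard reservoir, e.g. a sliding window over the last 500 samples
		Histogram histogram = new DropwizardHistogramWrapper(
			new com.codahale.metrics.Histogram(new SlidingWindowReservoir(500)));

		for (long value = 0; value < 1000; value++) {
			histogram.update(value);
		}
		System.out.println("count = " + histogram.getCount());
	}
}
{code}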


> Add Histogram Metric Type
> -
>
> Key: FLINK-3951
> URL: https://issues.apache.org/jira/browse/FLINK-3951
> Project: Flink
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Stephan Ewen
>Assignee: Till Rohrmann
> Fix For: 1.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2112: [FLINK-3951] Add histogram metric type

2016-06-16 Thread tillrohrmann
Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2112
  
We decided to add the histogram metric, because users requested this 
feature. The important thing is that the histograms are not used by the runtime 
per default, because they indeed don't come for free. But if the user decides 
to use such a metric, then he should be able to explicitly register a histogram.

Additionally, we don't want to add a strict dependency on dropwizard from 
flink-core. Therefore, I introduced the `Histogram` interface which effectively 
hides the concrete implementation. So later we can easily swap the 
implementation out.

One of these implementations is the `DropwizardHistogramWrapper` which 
allows you to use Dropwizard histograms in Flink. The reason for this is that 
Dropwizard's `Histogram` has already a lot of functionality (different 
reservoirs for sampling streams, for example) and it would be a lot of 
duplicate work to reimplement it.

Since the assumption is that `Histograms` are a user metric, it seems to be 
ok for me to not ship an implementation of the interface with flink-core but to 
require the user to include an additional module such as 
flink-metrics-dropwizard. This module would add the dropwizard dependency 
anyway. If necessary, then we can also add our own histogram implementation 
later.

The PR does not introduce a Dropwizard compatibility layer. Dropwizard is 
only one of many possible implementations for our metrics (counter, gauge and 
histogram so far). It is up to the community to decide which other metrics to 
add.

I agree that the naming of `HistogramWrapper` is not so distinctive. I 
actually stuck to the naming scheme for `Counter` and `Gauge`, which are 
called `CounterWrapper` and `GaugeWrapper`. This can be easily changed. Do you 
have a proposal?

I'm sorry if you feel angry because you already did some of this work which 
was later removed. But please keep in mind that not everyone was involved in 
the metrics PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3395) Polishing the web UI

2016-06-16 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333700#comment-15333700
 ] 

Robert Metzger commented on FLINK-3395:
---

The PR opened by Stephan has the wrong issue id, so it's unrelated.

> Polishing the web UI
> 
>
> Key: FLINK-3395
> URL: https://issues.apache.org/jira/browse/FLINK-3395
> Project: Flink
>  Issue Type: Improvement
>  Components: Webfrontend
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> On the job properties page one must select an operator from the plan. 
> Elsewhere in the UI a list of operators is displayed and clicking the table 
> or the plan will reveal the requested information.
> A list of operators could likewise be added to the timeline page.
> The job exceptions page should display a "No exceptions" notification as done 
> elsewhere for when there is nothing to display.
> The job plan is not redrawn when the browser window is resized.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2113: [FLINK-4082] Add Setting for enabling/disabling La...

2016-06-16 Thread aljoscha
GitHub user aljoscha opened a pull request:

https://github.com/apache/flink/pull/2113

[FLINK-4082] Add Setting for enabling/disabling LargeRecordHandler

By default this is set to disabled because there are known issues when
users specify a custom TypeInformation.

R: @tillrohrmann (the `R:` is for review) I think you know how stuff should 
work there
CC: @StephanEwen 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/flink large-record-handler/setting

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2113.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2113


commit 183e5ab32951ae0ee8392efd147936bb4711d966
Author: Aljoscha Krettek 
Date:   2016-06-16T12:39:23Z

[FLINK-4082] Add Setting for enabling/disabling LargeRecordHandler

By default this is set to disabled because there are known issues when
users specify a custom TypeInformation.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3950) Add Meter Metric Type

2016-06-16 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333692#comment-15333692
 ] 

Robert Metzger commented on FLINK-3950:
---

[~Zentol] is this in the scope of the 1.1 release or for later?

> Add Meter Metric Type
> -
>
> Key: FLINK-3950
> URL: https://issues.apache.org/jira/browse/FLINK-3950
> Project: Flink
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Stephan Ewen
> Fix For: 1.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3933) Add an auto-type-extracting DeserializationSchema

2016-06-16 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-3933:
--
Component/s: Streaming Connectors

> Add an auto-type-extracting DeserializationSchema
> -
>
> Key: FLINK-3933
> URL: https://issues.apache.org/jira/browse/FLINK-3933
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming, Streaming Connectors
>Affects Versions: 1.0.0
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 1.1.0
>
>
> When creating a {{DeserializationSchema}}, people need to manually worry 
> about how to provide the produced type's {{TypeInformation}}.
> We should add a base utility {{AbstractDeserializationSchema}} that provides 
> that automatically via the type extractor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3933) Add an auto-type-extracting DeserializationSchema

2016-06-16 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-3933.
---
Resolution: Fixed

This has been merged in 
https://github.com/apache/flink/commit/92e1c82cc545b80c3f82e01a97708aa8d70b3806

> Add an auto-type-extracting DeserializationSchema
> -
>
> Key: FLINK-3933
> URL: https://issues.apache.org/jira/browse/FLINK-3933
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming, Streaming Connectors
>Affects Versions: 1.0.0
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 1.1.0
>
>
> When creating a {{DeserializationSchema}}, people need to manually worry 
> about how to provide the produced type's {{TypeInformation}}.
> We should add a base utility {{AbstractDeserializationSchema}} that provides 
> that automatically via the type extractor.
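
A hedged sketch of what such a base class could look like is shown below. It is 
simplified: the target class is passed to the constructor instead of being 
reflected from the subclass's generic signature, and the class name, package 
paths, and method set are assumptions based on the 1.x streaming API rather 
than the merged implementation.

{code}
import java.io.IOException;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;

/**
 * Sketch of a base schema: subclasses only implement deserialize(); the
 * produced TypeInformation is derived once via the type extractor.
 */
public abstract class TypeExtractingDeserializationSchema<T> implements DeserializationSchema<T> {

	private final TypeInformation<T> producedType;

	protected TypeExtractingDeserializationSchema(Class<T> type) {
		// let the type extractor build the TypeInformation instead of the user
		this.producedType = TypeExtractor.getForClass(type);
	}

	@Override
	public abstract T deserialize(byte[] message) throws IOException;

	@Override
	public boolean isEndOfStream(T nextElement) {
		return false;  // most schemas never signal end-of-stream
	}

	@Override
	public TypeInformation<T> getProducedType() {
		return producedType;
	}
}
{code}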



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4082) Add Setting for LargeRecordHandler

2016-06-16 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-4082:
---

 Summary: Add Setting for LargeRecordHandler
 Key: FLINK-4082
 URL: https://issues.apache.org/jira/browse/FLINK-4082
 Project: Flink
  Issue Type: Improvement
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek


Now, this is always enabled but there are known problems when users specify a 
custom {{TypeInformation}}. We should introduce a setting for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3895) Remove Docs for Program Packaging via Plans

2016-06-16 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-3895:
--
Fix Version/s: (was: 1.1.0)

> Remove Docs for Program Packaging via Plans
> ---
>
> Key: FLINK-3895
> URL: https://issues.apache.org/jira/browse/FLINK-3895
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.0.2
>Reporter: Stephan Ewen
>
> As a first step, we should remove the docs that describe packaging via plans, 
> in order to avoid confusion among users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3871) Add Kafka TableSource with Avro serialization

2016-06-16 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-3871:
--
Fix Version/s: (was: 1.1.0)

> Add Kafka TableSource with Avro serialization
> -
>
> Key: FLINK-3871
> URL: https://issues.apache.org/jira/browse/FLINK-3871
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Reporter: Fabian Hueske
>
> Add a Kafka TableSource which supports Avro serialized data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4021) Problem of setting autoread for netty channel when more tasks sharing the same Tcp connection

2016-06-16 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333683#comment-15333683
 ] 

Ufuk Celebi commented on FLINK-4021:


No, cleared the fix version.

> Problem of setting autoread for netty channel when more tasks sharing the 
> same Tcp connection
> -
>
> Key: FLINK-4021
> URL: https://issues.apache.org/jira/browse/FLINK-4021
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime
>Affects Versions: 1.0.2
>Reporter: Zhijiang Wang
>Assignee: Zhijiang Wang
>
> More than one task can share the same TCP connection for shuffling data.
> If a downstream task, say "A", has no memory segment available to read a 
> netty buffer from the network, it sets autoread to false for the channel.
> When task A fails or has segments available again, the netty handler is 
> notified to process the staged buffers first and then reset autoread to 
> true. But in some scenarios autoread is never set back to true.
> While processing the staged buffers, the handler first looks up the input 
> channel for each buffer; if the task owning that input channel has failed, 
> the decodeMsg method in PartitionRequestClientHandler returns false, which 
> means autoread is never reset to true.
> In summary: task "A" sets autoread to false because it has no available 
> segments, which leaves some buffers staged. If another task "B" that owns 
> one of the staged buffers fails in the meantime, task A's attempt to reset 
> autoread to true cannot complete because of task B's failure.
> I have fixed this problem in our application by adding a boolean parameter 
> to the decodeBufferOrEvent method to distinguish whether it is invoked by 
> the netty IO thread's channel read or by the staged-message handler task in 
> PartitionRequestClientHandler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3894) Remove Program Packaging via Plans

2016-06-16 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-3894:
--
Fix Version/s: (was: 1.1.0)

> Remove Program Packaging via Plans
> --
>
> Key: FLINK-3894
> URL: https://issues.apache.org/jira/browse/FLINK-3894
> Project: Flink
>  Issue Type: Improvement
>  Components: Command-line client
>Affects Versions: 1.0.2
>Reporter: Stephan Ewen
>
> Flink still supports a legacy way of packaging programs via implementing the 
> {{Program}} interface. 
> The reason for that was the support of the now removed **Record API**.
> We should remove that way to package a program as well, to reduce complexity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-4021) Problem of setting autoread for netty channel when more tasks sharing the same Tcp connection

2016-06-16 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-4021:
---
Fix Version/s: (was: 1.1.0)

> Problem of setting autoread for netty channel when more tasks sharing the 
> same Tcp connection
> -
>
> Key: FLINK-4021
> URL: https://issues.apache.org/jira/browse/FLINK-4021
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime
>Affects Versions: 1.0.2
>Reporter: Zhijiang Wang
>Assignee: Zhijiang Wang
>
> More than one task can share the same TCP connection for shuffling data.
> If a downstream task, say "A", has no memory segment available to read a 
> netty buffer from the network, it sets autoread to false for the channel.
> When task A fails or has segments available again, the netty handler is 
> notified to process the staged buffers first and then reset autoread to 
> true. But in some scenarios autoread is never set back to true.
> While processing the staged buffers, the handler first looks up the input 
> channel for each buffer; if the task owning that input channel has failed, 
> the decodeMsg method in PartitionRequestClientHandler returns false, which 
> means autoread is never reset to true.
> In summary: task "A" sets autoread to false because it has no available 
> segments, which leaves some buffers staged. If another task "B" that owns 
> one of the staged buffers fails in the meantime, task A's attempt to reset 
> autoread to true cannot complete because of task B's failure.
> I have fixed this problem in our application by adding a boolean parameter 
> to the decodeBufferOrEvent method to distinguish whether it is invoked by 
> the netty IO thread's channel read or by the staged-message handler task in 
> PartitionRequestClientHandler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3864) Yarn tests don't check for prohibited strings in log output

2016-06-16 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333680#comment-15333680
 ] 

Robert Metzger commented on FLINK-3864:
---

Have you started working on this?
Will this go into the 1.1 release?

> Yarn tests don't check for prohibited strings in log output
> ---
>
> Key: FLINK-3864
> URL: https://issues.apache.org/jira/browse/FLINK-3864
> Project: Flink
>  Issue Type: Bug
>  Components: Tests, YARN Client
>Affects Versions: 1.1.0, 1.0.2
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> {{YarnTestBase.runWithArgs(...)}} provides a parameter for strings which must 
> not appear in the log output. {{perJobYarnCluster}} and 
> {{perJobYarnClusterWithParallelism}} have "System.out)" prepended to the 
> prohibited strings; probably an artifact of older test code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3864) Yarn tests don't check for prohibited strings in log output

2016-06-16 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-3864:
--
Component/s: YARN Client

> Yarn tests don't check for prohibited strings in log output
> ---
>
> Key: FLINK-3864
> URL: https://issues.apache.org/jira/browse/FLINK-3864
> Project: Flink
>  Issue Type: Bug
>  Components: Tests, YARN Client
>Affects Versions: 1.1.0, 1.0.2
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> {{YarnTestBase.runWithArgs(...)}} provides a parameter for strings which must 
> not appear in the log output. {{perJobYarnCluster}} and 
> {{perJobYarnClusterWithParallelism}} have "System.out)" prepended to the 
> prohibited strings; probably an artifact of older test code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3827) Flink modules include unused dependencies

2016-06-16 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333677#comment-15333677
 ] 

Robert Metzger commented on FLINK-3827:
---

[~till.rohrmann] What's the status here? Do you still think we need to fix this 
for 1.1?
I guess some of the issues are fixed by 
https://github.com/apache/flink/pull/2092 ?

> Flink modules include unused dependencies
> -
>
> Key: FLINK-3827
> URL: https://issues.apache.org/jira/browse/FLINK-3827
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
> Fix For: 1.1.0
>
>
> A quick look via {{mvn dependency:analyze}} revealed that many Flink modules 
> include dependencies which they don't really need. We should fix this for the 
> next {{1.1}} release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3720) Add warm starts for models

2016-06-16 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-3720:
--
Fix Version/s: (was: 1.1.0)

> Add warm starts for models
> --
>
> Key: FLINK-3720
> URL: https://issues.apache.org/jira/browse/FLINK-3720
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Trevor Grant
>Assignee: Trevor Grant
>
> Add 'warm-start' to Iterative Solver. 
> - Make weight vector settable (this will allow for model saving/loading)
> - Make the iterator use an existing weight vector if available
> - Keep track of what iteration we're on for additional partial fits in SGD 
> (and anywhere else it makes sense). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

