[jira] [Commented] (FLINK-7009) dogstatsd mode in statsd reporter

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064262#comment-16064262
 ] 

ASF GitHub Bot commented on FLINK-7009:
---

Github user dbrinegar commented on the issue:

https://github.com/apache/flink/pull/4188
  
The check failure seems to be an Elasticsearch timeout; otherwise everything is green:

```
Tests run: 9, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 66.507 sec 
<<< FAILURE! - in 
org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkITCase

testDeprecatedIndexRequestBuilderVariant(org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkITCase)
  Time elapsed: 60.187 sec  <<< ERROR!
```

```
Caused by: java.lang.RuntimeException: Elasticsearch client is not 
connected to any Elasticsearch nodes!
at 
org.apache.flink.streaming.connectors.elasticsearch.Elasticsearch1ApiCallBridge.createClient(Elasticsearch1ApiCallBridge.java:105)
```


> dogstatsd mode in statsd reporter
> -
>
> Key: FLINK-7009
> URL: https://issues.apache.org/jira/browse/FLINK-7009
> Project: Flink
>  Issue Type: Improvement
>  Components: Metrics
>Affects Versions: 1.4.0
> Environment: org.apache.flink.metrics.statsd.StatsDReporter
>Reporter: David Brinegar
> Fix For: 1.4.0
>
>
> The current statsd reporter can only report a subset of Flink metrics, owing 
> to the manner in which Flink variables are handled, mainly around invalid 
> characters and overly long metric names.  As an option, it would be quite 
> useful to have a stricter dogstatsd-compliant output.  Dogstatsd metrics are 
> tagged, should be less than 200 characters including tag names and values, 
> and should be alphanumeric plus underbar, delimited by periods.  As a further 
> pragmatic restriction, negative and other invalid values should be ignored 
> rather than sent to the backend.  These restrictions play well with a broad 
> set of collectors and time series databases.
> This mode would:
> * convert output to ascii alphanumeric characters with underbar, delimited by 
> periods.  Runs of invalid characters within a metric segment would be 
> collapsed to a single underbar.
> * report all Flink variables as tags
> * compress overly long segments, say over 50 chars, to a symbolic 
> representation of the metric name, to preserve the unique metric time series 
> but avoid downstream truncation
> * compress 32-character Flink IDs such as tm_id, task_id, job_id, and 
> task_attempt_id to their first 8 characters, again preserving enough 
> distinction amongst metrics while trimming up to 96 characters from the metric
> * remove object references from names, such as the instance hash id of the 
> serializer
> * drop negative or invalid numeric values such as "n/a", "-1" which is used 
> for unknowns like JVM.Memory.NonHeap.Max, and "-9223372036854775808" which is 
> used for unknowns like currentLowWaterMark
> With these in place, it becomes quite reasonable to support LatencyGauge 
> metrics as well.
> One idea for symbolic compression is to take the first 10 valid characters 
> plus a hash of the long name.  For example, a value like this operator_name:
> {code:java}
> TriggerWindow(TumblingProcessingTimeWindows(5000), 
> ReducingStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.PojoSerializer@f3395ffa,
>  
> reduceFunction=org.apache.flink.streaming.examples.socket.SocketWindowWordCount$1@4201c465},
>  ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java-301))
> {code}
> would first drop the instance references.  The stable version would be:
>  
> {code:java}
> TriggerWindow(TumblingProcessingTimeWindows(5000), 
> ReducingStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.PojoSerializer,
>  
> reduceFunction=org.apache.flink.streaming.examples.socket.SocketWindowWordCount$1},
>  ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java-301))
> {code}
> and then the compressed name would be the first ten valid characters plus the 
> hash of the stable string:
> {code}
> TriggerWin_d8c007da
> {code}
> This is just one way of dealing with unruly default names; the main point is 
> to keep the metrics valid, avoid truncation, and allow aggregation along 
> other dimensions even if this particular dimension is hard to parse after 
> compression.  A sketch of one possible implementation follows.
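For illustration, a minimal Java sketch of the sanitize-and-compress idea described above. The helper names and the choice of CRC32 for the 8-hex-digit hash are assumptions, not necessarily what the PR implements:

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class MetricSegmentCompressor {

	/** Collapse runs of characters outside [A-Za-z0-9] to a single underbar. */
	static String sanitize(String segment) {
		return segment.replaceAll("[^A-Za-z0-9]+", "_");
	}

	/**
	 * Compress a segment longer than maxLength (assumed >= 10) to its first
	 * ten valid characters plus an 8-hex-digit hash of the stable name,
	 * yielding "TriggerWin_d8c007da"-style output.
	 */
	static String compress(String segment, int maxLength) {
		String clean = sanitize(segment);
		if (clean.length() <= maxLength) {
			return clean;
		}
		CRC32 crc = new CRC32();
		crc.update(clean.getBytes(StandardCharsets.UTF_8));
		return clean.substring(0, 10) + "_" + String.format("%08x", crc.getValue());
	}
}
{code}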






[jira] [Commented] (FLINK-6987) TextInputFormatTest fails when run in path containing spaces

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064232#comment-16064232
 ] 

ASF GitHub Bot commented on FLINK-6987:
---

Github user zhangminglei commented on a diff in the pull request:

https://github.com/apache/flink/pull/4168#discussion_r124175328
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
@@ -461,8 +463,9 @@ public LocatableInputSplitAssigner 
getInputSplitAssigner(FileInputSplit[] splits

// take the desired number of splits into account
minNumSplits = Math.max(minNumSplits, this.numSplits);
-   
-   final Path path = this.filePath;
+
+   final Path path = new 
Path(URLDecoder.decode(this.filePath.toString(), 
Charset.defaultCharset().name()));
--- End diff --

I have updated the code, please check again. There is one thing we should be 
concerned about, though (FYI, it is not very relevant to this issue here).

We can see that outputs **1** and **2** below are the same; that is, calling 
```toString()``` on the result of ```toURI()``` does not change it. But when 
used with ```flink.core.fs.Path```, I get outputs **3** and **4**. **It seems 
```new Path``` changes something.** I would expect that not to happen in this 
context; ```new Path(parentDirTest.toURI().toString())``` should return the 
same value as ```new Path(parentDirTest.toURI())```, but they are not equal.

```java
File parentDirTest = new File("D:\\projects\\hello world");

System.out.println(parentDirTest.toURI());
// 1. Output: file:/D:/projects/hello%20world/

System.out.println(parentDirTest.toURI().toString());
// 2. Output: file:/D:/projects/hello%20world/

System.out.println(new Path(parentDirTest.toURI()));
// 3. Output: file:/D:/projects/hello world/

System.out.println(new Path(parentDirTest.toURI().toString()));
// 4. Output: file:/D:/projects/hello%20world
```
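As an aside, a standalone sketch (my own illustration, not code from this PR) of why the decode step in the patch matters: keeping the percent-encoded URI string means a directory named "hello world" would otherwise be looked up as "hello%20world":

```java
import java.io.File;
import java.net.URLDecoder;
import java.nio.charset.Charset;

public class PathDecodeDemo {
	public static void main(String[] args) throws Exception {
		File dir = new File("D:\\projects\\hello world");
		String encoded = dir.toURI().toString();
		// encoded: file:/D:/projects/hello%20world/
		String decoded = URLDecoder.decode(encoded, Charset.defaultCharset().name());
		// decoded: file:/D:/projects/hello world/ -- matches the real directory name
		System.out.println(decoded);
	}
}
```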


> TextInputFormatTest fails when run in path containing spaces
> 
>
> Key: FLINK-6987
> URL: https://issues.apache.org/jira/browse/FLINK-6987
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 1.3.1
>Reporter: Timo Walther
>Assignee: mingleizhang
>
> The test {{TextInputFormatTest.testNestedFileRead}} fails if the path 
> contains spaces.
> Reason: "Test erroneous"
> I was building Flink on MacOS 10.12.5 and the folder was called "flink-1.3.1 
> 2".







[jira] [Commented] (FLINK-6939) Not store IterativeCondition with NFA state

2017-06-26 Thread Jark Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064216#comment-16064216
 ] 

Jark Wu commented on FLINK-6939:


Hi [~kkl0u], thank you for your suggestions on PR #4145; they all look good to 
me.

BTW, taking FLINK-6983 into account: FLINK-6983 wants to separate the 
storing/restoring of the States and the NFA, which means the States would only 
be restored/created once in the lifecycle of the CEP operator. In that case we 
cannot introduce {{ConditionRegistry}}, and many lines would be touched. I 
think it may make sense to implement this issue in FLINK-6983. What do you 
think [~kkl0u] [~dian.fu] [~dawidwys]?

> Not store IterativeCondition with NFA state
> ---
>
> Key: FLINK-6939
> URL: https://issues.apache.org/jira/browse/FLINK-6939
> Project: Flink
>  Issue Type: Sub-task
>  Components: CEP
>Reporter: Jark Wu
>Assignee: Jark Wu
> Fix For: 1.4.0
>
>
> Currently, the IterativeCondition is stored with the total NFA state and 
> de/serialized every time the NFA state is updated or read. This is a heavy 
> and unnecessary operation. In addition, it is a required feature for 
> FLINK-6938.
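A hypothetical sketch of the registry idea discussed above ({{ConditionRegistry}} is only a proposal in this thread, not an existing Flink class): keep the condition objects in a transient registry keyed by state name, so that only the NFA's positional state is de/serialized.

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.cep.pattern.conditions.IterativeCondition;

// Hypothetical: conditions live outside the serialized NFA state.
public class ConditionRegistry<T> {

	private final Map<String, IterativeCondition<T>> conditions = new HashMap<>();

	public void register(String stateName, IterativeCondition<T> condition) {
		conditions.put(stateName, condition);
	}

	public IterativeCondition<T> get(String stateName) {
		return conditions.get(stateName);
	}
}
{code}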





[jira] [Commented] (FLINK-6888) Can not determine TypeInformation of ACC type of AggregateFunction when ACC is a Scala case/tuple class

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064202#comment-16064202
 ] 

ASF GitHub Bot commented on FLINK-6888:
---

Github user wuchong commented on a diff in the pull request:

https://github.com/apache/flink/pull/4105#discussion_r124171844
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/api/TableEnvironment.scala
 ---
@@ -358,20 +358,27 @@ abstract class TableEnvironment(val config: 
TableConfig) {
 * Registers an [[AggregateFunction]] under a unique name. Replaces 
already existing
 * user-defined functions under this name.
 */
-  private[flink] def registerAggregateFunctionInternal[T: TypeInformation, 
ACC](
+  private[flink] def registerAggregateFunctionInternal[T: TypeInformation, 
ACC: TypeInformation](
   name: String, function: AggregateFunction[T, ACC]): Unit = {
 // check if class not Scala object
 checkNotSingleton(function.getClass)
 // check if class could be instantiated
 checkForInstantiation(function.getClass)
 
-val typeInfo: TypeInformation[_] = implicitly[TypeInformation[T]]
+val resultTypeInfo: TypeInformation[_] = implicitly[TypeInformation[T]]
+val accTypeInfo: TypeInformation[_] = implicitly[TypeInformation[ACC]]
 
 // register in Table API
 functionCatalog.registerFunction(name, function.getClass)
 
 // register in SQL API
-val sqlFunctions = createAggregateSqlFunction(name, function, 
typeInfo, typeFactory)
+val sqlFunctions = createAggregateSqlFunction(
--- End diff --

Ping @fhueske , what do you think about this? 


> Can not determine TypeInformation of ACC type of AggregateFunction when ACC 
> is a Scala case/tuple class
> ---
>
> Key: FLINK-6888
> URL: https://issues.apache.org/jira/browse/FLINK-6888
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Reporter: Jark Wu
>Assignee: Jark Wu
> Fix For: 1.4.0
>
>
> Currently the {{ACC}} TypeInformation of 
> {{org.apache.flink.table.functions.AggregateFunction[T, ACC]}} is extracted 
> using {{TypeInformation.of(Class)}}. When {{ACC}} is a Scala case class or 
> tuple class, the TypeInformation will fall back to {{GenericType}}, which 
> results in bad performance during state de/serialization. 
> I suggest extracting the ACC TypeInformation when 
> {{TableEnvironment.registerFunction()}} is called.
> Here is an example:
> {code}
> case class Accumulator(sum: Long, count: Long)
> class MyAgg extends AggregateFunction[Long, Accumulator] {
>   //Overloaded accumulate method
>   def accumulate(acc: Accumulator, value: Long): Unit = {
>   }
>   override def createAccumulator(): Accumulator = Accumulator(0, 0)
>   override def getValue(accumulator: Accumulator): Long = 1
> }
> {code}
> The {{Accumulator}} will be recognized as {{GenericType}}.







[jira] [Commented] (FLINK-6958) Async I/O timeout not work

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064200#comment-16064200
 ] 

ASF GitHub Bot commented on FLINK-6958:
---

GitHub user wuchong opened a pull request:

https://github.com/apache/flink/pull/4189

[FLINK-6958] [async] Async I/O hang when the source is bounded and collect 
timeout

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wuchong/flink async-io

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4189.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4189


commit 20456e24738c663f9f8afc71d426dbf5e6c73523
Author: Jark Wu 
Date:   2017-06-27T03:30:53Z

[FLINK-6958] [async] Async I/O hang when the source is bounded and collect 
timeout




> Async I/O timeout not work
> --
>
> Key: FLINK-6958
> URL: https://issues.apache.org/jira/browse/FLINK-6958
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.2.1
>Reporter: feng xiaojie
>Assignee: Jark Wu
>
> When using Async I/O with the UnorderedStreamElementQueue, the queue will 
> always be full if you don't call AsyncCollector.collect to ack the entries.
> The timeout should collect these entries when it triggers, but it doesn't work.
> Debugging shows that on timeout, resultFuture.completeExceptionally(error) is 
> called, but UnorderedStreamElementQueue.onCompleteHandler is not.
> This causes async I/O to hang forever.
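A simplified sketch of the pattern at issue (my own illustration with made-up class names, not Flink's actual implementation): the exceptional completion on timeout must also run the queue's completion handler, otherwise the entry is never dequeued and the operator blocks on a full queue.

{code:java}
import java.util.concurrent.CompletableFuture;

public class QueueEntry {

	private final CompletableFuture<String> resultFuture = new CompletableFuture<>();

	public void registerOnComplete(Runnable onCompleteHandler) {
		// whenComplete fires for both normal and exceptional completion, so
		// the timeout path below also triggers the handler that dequeues
		// this entry.
		resultFuture.whenComplete((value, error) -> onCompleteHandler.run());
	}

	public void timeout() {
		resultFuture.completeExceptionally(
			new RuntimeException("Async function call has timed out."));
	}
}
{code}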





[jira] [Commented] (FLINK-6958) Async I/O timeout not work

2017-06-26 Thread Jark Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064201#comment-16064201
 ] 

Jark Wu commented on FLINK-6958:


Thank you for the advice [~till.rohrmann]. I have created a PR for this; 
please correct me if I missed something.

> Async I/O timeout not work
> --
>
> Key: FLINK-6958
> URL: https://issues.apache.org/jira/browse/FLINK-6958
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.2.1
>Reporter: feng xiaojie
>Assignee: Jark Wu
>
> When using Async I/O with the UnorderedStreamElementQueue, the queue will 
> always be full if you don't call AsyncCollector.collect to ack the entries.
> The timeout should collect these entries when it triggers, but it doesn't work.
> Debugging shows that on timeout, resultFuture.completeExceptionally(error) is 
> called, but UnorderedStreamElementQueue.onCompleteHandler is not.
> This causes async I/O to hang forever.







[jira] [Commented] (FLINK-6969) Add support for deferred computation for group window aggregates

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064177#comment-16064177
 ] 

ASF GitHub Bot commented on FLINK-6969:
---

Github user wuchong commented on the issue:

https://github.com/apache/flink/pull/4183
  
Oh, one more thing: it would be great to add the documentation to the website.


> Add support for deferred computation for group window aggregates
> 
>
> Key: FLINK-6969
> URL: https://issues.apache.org/jira/browse/FLINK-6969
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: Fabian Hueske
>Assignee: sunjincheng
>
> Deferred computation is a strategy to deal with late arriving data and avoid 
> updates of previous results. Instead of computing a result as soon as it is 
> possible (i.e., when a corresponding watermark was received), deferred 
> computation adds a configurable amount of slack time in which late data is 
> accepted before the result is computed. For example, instead of computing a 
> tumbling window of 1 hour at each full hour, we can add a deferred 
> computation interval of 15 minutes to compute the result at a quarter past 
> each full hour.
> This approach adds latency but can reduce the number of updates, especially 
> in use cases where the user cannot influence the generation of watermarks. It 
> is also useful if the data is emitted to a system that cannot update results 
> (files or Kafka). The deferred computation interval should be configured via 
> the {{QueryConfig}}. A small sketch of the emission semantics follows.
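A minimal sketch of the proposed emission semantics (my own illustration; no such API exists in Flink yet): the result of a window [start, start + size) is emitted at the window end plus the configured deferral.

{code:java}
public final class DeferredEmission {

	// E.g. a 12:00-13:00 tumbling window with a 15 minute deferral is
	// emitted at 13:15, accepting late rows until then.
	public static long emitTime(long windowStart, long windowSizeMs, long deferralMs) {
		long windowEnd = windowStart + windowSizeMs;
		return windowEnd + deferralMs;
	}
}
{code}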







[jira] [Commented] (FLINK-6987) TextInputFormatTest fails when run in path containing spaces

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064174#comment-16064174
 ] 

ASF GitHub Bot commented on FLINK-6987:
---

Github user zhangminglei commented on a diff in the pull request:

https://github.com/apache/flink/pull/4168#discussion_r124168503
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
@@ -461,8 +463,9 @@ public LocatableInputSplitAssigner 
getInputSplitAssigner(FileInputSplit[] splits

// take the desired number of splits into account
minNumSplits = Math.max(minNumSplits, this.numSplits);
-   
-   final Path path = this.filePath;
+
+   final Path path = new 
Path(URLDecoder.decode(this.filePath.toString(), 
Charset.defaultCharset().name()));
--- End diff --

Oh, yeah, you are right: we should make the fewest changes needed for the 
test. The PR will be updated later today.


> TextInputFormatTest fails when run in path containing spaces
> 
>
> Key: FLINK-6987
> URL: https://issues.apache.org/jira/browse/FLINK-6987
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 1.3.1
>Reporter: Timo Walther
>Assignee: mingleizhang
>
> The test {{TextInputFormatTest.testNestedFileRead}} fails if the path 
> contains spaces.
> Reason: "Test erroneous"
> I was building Flink on MacOS 10.12.5 and the folder was called "flink-1.3.1 
> 2".







[jira] [Updated] (FLINK-7010) Lamdba expression in flatMap throws InvalidTypesException in DataSet

2017-06-26 Thread Fang Yong (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang Yong updated FLINK-7010:
-
Description: 
When I create an example and use lambda in flatMap as follows
{noformat}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> source = env.fromCollection(
Lists.newArrayList("hello", "flink", "test", "flat", "map", "lambda"));
DataSet<Tuple2<Integer, String>> tupled = source.flatMap((word, out) -> {
  int length = word.length();
  out.collect(Tuple2.of(length, word));
});
try {
  tupled.print();
} catch (Exception e) {
  throw new RuntimeException(e);
}
{noformat}

An InvalidTypesException was thrown and the exception stack is as follows:
{noformat}
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: The 
return type of function 'testFlatMap(FlatMapTest.java:20)' could not be 
determined automatically, due to type erasure. You can give type information 
hints by using the returns(...) method on the result of the transformation 
call, or by letting your function implement the 'ResultTypeQueryable' interface.
at org.apache.flink.api.java.DataSet.getType(DataSet.java:178)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:407)
at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: The 
generic type parameters of 'Collector' are missing. 
It seems that your compiler has not stored them into the .class file. 
Currently, only the Eclipse JDT compiler preserves the type information 
necessary to use the lambdas feature type-safely. 
See the documentation for more information about how to compile jobs containing 
lambda expressions.
at 
org.apache.flink.api.java.typeutils.TypeExtractor.validateLambdaGenericParameter(TypeExtractor.java:1653)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.validateLambdaGenericParameters(TypeExtractor.java:1639)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:573)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:188)
at org.apache.flink.api.java.DataSet.flatMap(DataSet.java:266)
{noformat}

The code at line 20 is
{noformat}
 DataSet<Tuple2<Integer, String>> tupled = source.flatMap((word, out) -> { 
{noformat}
When I use a FlatMapFunction instead of a lambda, it works fine.

  was:
When I create an example and use lambda in flatMap as follows
{noformat}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> source = env.fromCollection(
Lists.newArrayList("hello", "flink", "test", "flat", "map", "lambda"));
DataSet<Tuple2<Integer, String>> tupled = source.flatMap((word, out) -> {
  int length = word.length();
  out.collect(Tuple2.of(length, word));
});
try {
  tupled.print();
} catch (Exception e) {
  throw new RuntimeException(e);
}
{noformat}

An InvalidTypesException was thrown and the exception stack is as follows:
{noformat}
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: The 
return type of function 'testFlatMap(FlatMapTest.java:20)' could not be 
determined automatically, due to type erasure. You can give type information 
hints by using the returns(...) method on the result of the transformation 
call, or by letting your function implement the 'ResultTypeQueryable' interface.
at org.apache.flink.api.java.DataSet.getType(DataSet.java:178)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:407)
at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: The 
generic type parameters of 'Collector' are missing. 
It seems that your compiler has not stored them into the .class file. 
Currently, only the Eclipse JDT compiler preserves the type information 
necessary to use the lambdas feature type-safely. 
See the documentation for more information about how to compile jobs containing 
lambda expressions.
at 
org.apache.flink.api.java.typeutils.TypeExtractor.validateLambdaGenericParameter(TypeExtractor.java:1653)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.validateLambdaGenericParameters(TypeExtractor.java:1639)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:573)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:188)
at org.apache.flink.api.java.DataSet.flatMap(DataSet.java:266)


The code at line 20 is
{noformat}
 DataSet<Tuple2<Integer, String>> tupled = source.flatMap((word, out) -> { 
{noformat}
When I use a FlatMapFunction instead of a lambda, it works fine.


> Lamdba expression in flatMap throws InvalidTypesException in DataSet
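A minimal sketch of the workaround the exception message suggests (my own illustration using Flink's Java DataSet API, not code from the issue): give an explicit type hint with returns(...) to restore the type information lost to erasure.

{code:java}
import java.util.Arrays;

import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class FlatMapReturnsExample {

	public static void main(String[] args) throws Exception {
		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
		DataSet<String> source = env.fromCollection(
			Arrays.asList("hello", "flink", "test", "flat", "map", "lambda"));
		DataSet<Tuple2<Integer, String>> tupled = source
			.flatMap((String word, Collector<Tuple2<Integer, String>> out) ->
				out.collect(Tuple2.of(word.length(), word)))
			// the explicit hint restores the type information lost to erasure
			.returns(new TypeHint<Tuple2<Integer, String>>() {});
		tupled.print();
	}
}
{code}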

[jira] [Updated] (FLINK-7010) Lamdba expression in flatMap throws InvalidTypesException in DataSet

2017-06-26 Thread Fang Yong (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang Yong updated FLINK-7010:
-
Description: 
When I create an example and use lambda in flatMap as follows
{noformat}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> source = env.fromCollection(
Lists.newArrayList("hello", "flink", "test", "flat", "map", "lambda"));
DataSet<Tuple2<Integer, String>> tupled = source.flatMap((word, out) -> {
  int length = word.length();
  out.collect(Tuple2.of(length, word));
});
try {
  tupled.print();
} catch (Exception e) {
  throw new RuntimeException(e);
}
{noformat}

An InvalidTypesException was thrown and the exception stack is as follows:
{noformat}
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: The 
return type of function 'testFlatMap(FlatMapTest.java:20)' could not be 
determined automatically, due to type erasure. You can give type information 
hints by using the returns(...) method on the result of the transformation 
call, or by letting your function implement the 'ResultTypeQueryable' interface.
at org.apache.flink.api.java.DataSet.getType(DataSet.java:178)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:407)
at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: The 
generic type parameters of 'Collector' are missing. 
It seems that your compiler has not stored them into the .class file. 
Currently, only the Eclipse JDT compiler preserves the type information 
necessary to use the lambdas feature type-safely. 
See the documentation for more information about how to compile jobs containing 
lambda expressions.
at 
org.apache.flink.api.java.typeutils.TypeExtractor.validateLambdaGenericParameter(TypeExtractor.java:1653)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.validateLambdaGenericParameters(TypeExtractor.java:1639)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:573)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:188)
at org.apache.flink.api.java.DataSet.flatMap(DataSet.java:266)


The code at line 20 is
{noformat}
 DataSet<Tuple2<Integer, String>> tupled = source.flatMap((word, out) -> { 
{noformat}
When I use a FlatMapFunction instead of a lambda, it works fine.

  was:
When I create an example and use lambda in flatMap as follows
{noformat}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> source = env.fromCollection(
Lists.newArrayList("hello", "flink", "test", "flat", "map", "lambda"));
DataSet<Tuple2<Integer, String>> tupled = source.flatMap((word, out) -> {
  int length = word.length();
  out.collect(Tuple2.of(length, word));
});
try {
  tupled.print();
} catch (Exception e) {
  throw new RuntimeException(e);
}
{noformat}

An InvalidTypesException was thrown and the exception stack is as follows:
{noformat}
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: The 
return type of function 'testFlatMap(FlatMapTest.java:20)' could not be 
determined automatically, due to type erasure. You can give type information 
hints by using the returns(...) method on the result of the transformation 
call, or by letting your function implement the 'ResultTypeQueryable' interface.
at org.apache.flink.api.java.DataSet.getType(DataSet.java:178)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:407)
at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: The 
generic type parameters of 'Collector' are missing. 
It seems that your compiler has not stored them into the .class file. 
Currently, only the Eclipse JDT compiler preserves the type information 
necessary to use the lambdas feature type-safely. 
See the documentation for more information about how to compile jobs containing 
lambda expressions.
at 
org.apache.flink.api.java.typeutils.TypeExtractor.validateLambdaGenericParameter(TypeExtractor.java:1653)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.validateLambdaGenericParameters(TypeExtractor.java:1639)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:573)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:188)
at org.apache.flink.api.java.DataSet.flatMap(DataSet.java:266)
{noformat}

The code at line 20 is {{ DataSet<Tuple2<Integer, String>> tupled = 
source.flatMap((word, out) -> { }}
When I use a FlatMapFunction instead of a lambda, it works fine.


> Lamdba expression in flatMap throws InvalidTypesException in DataSet

[jira] [Updated] (FLINK-7010) Lamdba expression in flatMap throws InvalidTypesException in DataSet

2017-06-26 Thread Fang Yong (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang Yong updated FLINK-7010:
-
Description: 
When I create an example and use lambda in flatMap as follows
{noformat}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> source = env.fromCollection(
Lists.newArrayList("hello", "flink", "test", "flat", "map", "lambda"));
DataSet<Tuple2<Integer, String>> tupled = source.flatMap((word, out) -> {
  int length = word.length();
  out.collect(Tuple2.of(length, word));
});
try {
  tupled.print();
} catch (Exception e) {
  throw new RuntimeException(e);
}
{noformat}

An InvalidTypesException was thrown and the exception stack is as follows:
{noformat}
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: The 
return type of function 'testFlatMap(FlatMapTest.java:20)' could not be 
determined automatically, due to type erasure. You can give type information 
hints by using the returns(...) method on the result of the transformation 
call, or by letting your function implement the 'ResultTypeQueryable' interface.
at org.apache.flink.api.java.DataSet.getType(DataSet.java:178)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:407)
at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: The 
generic type parameters of 'Collector' are missing. 
It seems that your compiler has not stored them into the .class file. 
Currently, only the Eclipse JDT compiler preserves the type information 
necessary to use the lambdas feature type-safely. 
See the documentation for more information about how to compile jobs containing 
lambda expressions.
at 
org.apache.flink.api.java.typeutils.TypeExtractor.validateLambdaGenericParameter(TypeExtractor.java:1653)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.validateLambdaGenericParameters(TypeExtractor.java:1639)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:573)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:188)
at org.apache.flink.api.java.DataSet.flatMap(DataSet.java:266)
{noformat}

The code at line 20 is {{ DataSet<Tuple2<Integer, String>> tupled = 
source.flatMap((word, out) -> { }}
When I use a FlatMapFunction instead of a lambda, it works fine.

  was:
When I create an example and use lambda in flatMap as follows

{{ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> source = env.fromCollection(
Lists.newArrayList("hello", "flink", "test", "flat", "map", "lambda"));
DataSet<Tuple2<Integer, String>> tupled = source.flatMap((word, out) -> {
  int length = word.length();
  out.collect(Tuple2.of(length, word));
});
try {
  tupled.print();
} catch (Exception e) {
  throw new RuntimeException(e);
}}}

An InvalidTypesException was thrown and the exception stack is as follows:
{{
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: The 
return type of function 'testFlatMap(FlatMapTest.java:20)' could not be 
determined automatically, due to type erasure. You can give type information 
hints by using the returns(...) method on the result of the transformation 
call, or by letting your function implement the 'ResultTypeQueryable' interface.
at org.apache.flink.api.java.DataSet.getType(DataSet.java:178)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:407)
at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: The 
generic type parameters of 'Collector' are missing. 
It seems that your compiler has not stored them into the .class file. 
Currently, only the Eclipse JDT compiler preserves the type information 
necessary to use the lambdas feature type-safely. 
See the documentation for more information about how to compile jobs containing 
lambda expressions.
at 
org.apache.flink.api.java.typeutils.TypeExtractor.validateLambdaGenericParameter(TypeExtractor.java:1653)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.validateLambdaGenericParameters(TypeExtractor.java:1639)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:573)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:188)
at org.apache.flink.api.java.DataSet.flatMap(DataSet.java:266)
}}

The code at line 20 is {{ DataSet<Tuple2<Integer, String>> tupled = 
source.flatMap((word, out) -> { }}
When I use a FlatMapFunction instead of a lambda, it works fine.


> Lamdba expression in flatMap throws InvalidTypesException in DataSet
> 

[jira] [Created] (FLINK-7010) Lamdba expression in flatMap throws InvalidTypesException in DataSet

2017-06-26 Thread Fang Yong (JIRA)
Fang Yong created FLINK-7010:


 Summary: Lamdba expression in flatMap throws InvalidTypesException 
in DataSet
 Key: FLINK-7010
 URL: https://issues.apache.org/jira/browse/FLINK-7010
 Project: Flink
  Issue Type: Bug
Reporter: Fang Yong


When I create an example and use lambda in flatMap as follows
{{
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> source = env.fromCollection(
Lists.newArrayList("hello", "flink", "test", "flat", "map", "lambda"));
DataSet<Tuple2<Integer, String>> tupled = source.flatMap((word, out) -> {
  int length = word.length();
  out.collect(Tuple2.of(length, word));
});
try {
  tupled.print();
} catch (Exception e) {
  throw new RuntimeException(e);
}
}}
An InvalidTypesException was thrown and the exception stack is as follows:
{{
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: The 
return type of function 'testFlatMap(FlatMapTest.java:20)' could not be 
determined automatically, due to type erasure. You can give type information 
hints by using the returns(...) method on the result of the transformation 
call, or by letting your function implement the 'ResultTypeQueryable' interface.
at org.apache.flink.api.java.DataSet.getType(DataSet.java:178)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:407)
at org.apache.flink.api.java.DataSet.print(DataSet.java:1605)
Caused by: org.apache.flink.api.common.functions.InvalidTypesException: The 
generic type parameters of 'Collector' are missing. 
It seems that your compiler has not stored them into the .class file. 
Currently, only the Eclipse JDT compiler preserves the type information 
necessary to use the lambdas feature type-safely. 
See the documentation for more information about how to compile jobs containing 
lambda expressions.
at 
org.apache.flink.api.java.typeutils.TypeExtractor.validateLambdaGenericParameter(TypeExtractor.java:1653)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.validateLambdaGenericParameters(TypeExtractor.java:1639)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.getUnaryOperatorReturnType(TypeExtractor.java:573)
at 
org.apache.flink.api.java.typeutils.TypeExtractor.getFlatMapReturnTypes(TypeExtractor.java:188)
at org.apache.flink.api.java.DataSet.flatMap(DataSet.java:266)
}}

The code at line 20 is {{ DataSet<Tuple2<Integer, String>> tupled = 
source.flatMap((word, out) -> { }}
When I use a FlatMapFunction instead of a lambda, it works fine.





[jira] [Commented] (FLINK-6464) Metric name is not stable

2017-06-26 Thread David Brinegar (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064097#comment-16064097
 ] 

David Brinegar commented on FLINK-6464:
---

Nice find!  FLINK-7009 tries to address this by removing the instance ids and 
then using a hash of the remaining stable part of the string as a compressed 
metric name.  So the above would convert into something like 
"TriggerWin_abcdef12", which is at least the same every time you run the job, 
and short enough that metric systems can handle it without truncation or 
conversion problems.  In the end, though, it is only a shorter, more stable 
default name, not particularly readable in itself.  Thoughts?

> Metric name is not stable
> -
>
> Key: FLINK-6464
> URL: https://issues.apache.org/jira/browse/FLINK-6464
> Project: Flink
>  Issue Type: Bug
>  Components: DataStream API, Metrics
>Affects Versions: 1.2.0
>Reporter: Andrey
>
> Currently according to the documentation 
> (https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/metrics.html)
>  operator metrics constructed using the following pattern:
> , 
> For some operators, "operator_name" could contain default implementation of 
> toString method. For example:
> {code}
> TriggerWindow(TumblingProcessingTimeWindows(3000), 
> ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer@c65792d4},
>  xxx.Trigger@665fe457, WindowedStream.apply(WindowedStream.java:521)) -> 
> Sink: Unnamed
> {code}
> The part "@c65792d4" will be changed every time job is restarted/cancelled. 
> As a consequence it's not possible to store metrics for a long time.
> Expected:
> * ensure all operators return human readable, non-default names OR
> * change the way TriggerWindow generates its name.





[jira] [Updated] (FLINK-5486) Lack of synchronization in BucketingSink#handleRestoredBucketState()

2017-06-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-5486:
--
Description: 
Here is related code:

{code}
  
handlePendingFilesForPreviousCheckpoints(bucketState.pendingFilesPerCheckpoint);

  synchronized (bucketState.pendingFilesPerCheckpoint) {
bucketState.pendingFilesPerCheckpoint.clear();
  }
{code}

The handlePendingFilesForPreviousCheckpoints() call should be enclosed inside 
the synchronization block. Otherwise during the processing of 
handlePendingFilesForPreviousCheckpoints(), some entries of the map may be 
cleared.

  was:
Here is related code:
{code}
  
handlePendingFilesForPreviousCheckpoints(bucketState.pendingFilesPerCheckpoint);

  synchronized (bucketState.pendingFilesPerCheckpoint) {
bucketState.pendingFilesPerCheckpoint.clear();
  }
{code}

The handlePendingFilesForPreviousCheckpoints() call should be enclosed inside 
the synchronization block. Otherwise during the processing of 
handlePendingFilesForPreviousCheckpoints(), some entries of the map may be 
cleared.


> Lack of synchronization in BucketingSink#handleRestoredBucketState()
> 
>
> Key: FLINK-5486
> URL: https://issues.apache.org/jira/browse/FLINK-5486
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Reporter: Ted Yu
>
> Here is related code:
> {code}
>   
> handlePendingFilesForPreviousCheckpoints(bucketState.pendingFilesPerCheckpoint);
>   synchronized (bucketState.pendingFilesPerCheckpoint) {
> bucketState.pendingFilesPerCheckpoint.clear();
>   }
> {code}
> The handlePendingFilesForPreviousCheckpoints() call should be enclosed inside 
> the synchronization block. Otherwise during the processing of 
> handlePendingFilesForPreviousCheckpoints(), some entries of the map may be 
> cleared.
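A minimal sketch of the suggested fix (using the names from the quoted snippet): run the handling and the clear() under the same lock, so no entries can be cleared mid-processing.

{code:java}
synchronized (bucketState.pendingFilesPerCheckpoint) {
	handlePendingFilesForPreviousCheckpoints(bucketState.pendingFilesPerCheckpoint);
	bucketState.pendingFilesPerCheckpoint.clear();
}
{code}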





[jira] [Updated] (FLINK-5541) Missing null check for localJar in FlinkSubmitter#submitTopology()

2017-06-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-5541:
--
Description: 
{code}
  if (localJar == null) {
try {
  for (final URL url : ((ContextEnvironment) 
ExecutionEnvironment.getExecutionEnvironment())
  .getJars()) {
// TODO verify that there is only one jar
localJar = new File(url.toURI()).getAbsolutePath();
  }
} catch (final URISyntaxException e) {
  // ignore
} catch (final ClassCastException e) {
  // ignore
}
  }

  logger.info("Submitting topology " + name + " in distributed mode with 
conf " + serConf);
  client.submitTopologyWithOpts(name, localJar, topology);
{code}
Since the try block may encounter URISyntaxException / ClassCastException, we 
should check that localJar is not null before calling submitTopologyWithOpts().

  was:
{code}
  if (localJar == null) {
try {
  for (final URL url : ((ContextEnvironment) 
ExecutionEnvironment.getExecutionEnvironment())
  .getJars()) {
// TODO verify that there is only one jar
localJar = new File(url.toURI()).getAbsolutePath();
  }
} catch (final URISyntaxException e) {
  // ignore
} catch (final ClassCastException e) {
  // ignore
}
  }

  logger.info("Submitting topology " + name + " in distributed mode with 
conf " + serConf);
  client.submitTopologyWithOpts(name, localJar, topology);
{code}

Since the try block may encounter URISyntaxException / ClassCastException, we 
should check that localJar is not null before calling submitTopologyWithOpts().


> Missing null check for localJar in FlinkSubmitter#submitTopology()
> --
>
> Key: FLINK-5541
> URL: https://issues.apache.org/jira/browse/FLINK-5541
> Project: Flink
>  Issue Type: Bug
>  Components: Storm Compatibility
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
>   if (localJar == null) {
> try {
>   for (final URL url : ((ContextEnvironment) 
> ExecutionEnvironment.getExecutionEnvironment())
>   .getJars()) {
> // TODO verify that there is only one jar
> localJar = new File(url.toURI()).getAbsolutePath();
>   }
> } catch (final URISyntaxException e) {
>   // ignore
> } catch (final ClassCastException e) {
>   // ignore
> }
>   }
>   logger.info("Submitting topology " + name + " in distributed mode with 
> conf " + serConf);
>   client.submitTopologyWithOpts(name, localJar, topology);
> {code}
> Since the try block may encounter URISyntaxException / ClassCastException, we 
> should check that localJar is not null before calling 
> submitTopologyWithOpts().
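A minimal sketch of the suggested guard (using the names from the quoted snippet): fail fast instead of passing a null jar path to the client.

{code:java}
if (localJar == null) {
	// fail fast: the loop above found no usable jar URL
	throw new IllegalStateException(
		"Could not determine the jar of the topology: no jar URL found in the context environment.");
}
logger.info("Submitting topology " + name + " in distributed mode with conf " + serConf);
client.submitTopologyWithOpts(name, localJar, topology);
{code}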





[jira] [Updated] (FLINK-6105) Properly handle InterruptedException in HadoopInputFormatBase

2017-06-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-6105:
--
Description: 
When catching InterruptedException, we should throw InterruptedIOException 
instead of IOException.

The following example is from HadoopInputFormatBase :

{code}
try {
  splits = this.mapreduceInputFormat.getSplits(jobContext);
} catch (InterruptedException e) {
  throw new IOException("Could not get Splits.", e);
}
{code}
There may be other places where IOE is thrown.

  was:
When catching InterruptedException, we should throw InterruptedIOException 
instead of IOException.

The following example is from HadoopInputFormatBase :
{code}
try {
  splits = this.mapreduceInputFormat.getSplits(jobContext);
} catch (InterruptedException e) {
  throw new IOException("Could not get Splits.", e);
}
{code}
There may be other places where IOE is thrown.


> Properly handle InterruptedException in HadoopInputFormatBase
> -
>
> Key: FLINK-6105
> URL: https://issues.apache.org/jira/browse/FLINK-6105
> Project: Flink
>  Issue Type: Bug
>  Components: DataStream API
>Reporter: Ted Yu
>
> When catching InterruptedException, we should throw InterruptedIOException 
> instead of IOException.
> The following example is from HadoopInputFormatBase :
> {code}
> try {
>   splits = this.mapreduceInputFormat.getSplits(jobContext);
> } catch (InterruptedException e) {
>   throw new IOException("Could not get Splits.", e);
> }
> {code}
> There may be other places where IOE is thrown.
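A minimal sketch of the suggested change (same names as the quoted snippet): throw java.io.InterruptedIOException and restore the thread's interrupt flag.

{code:java}
try {
	splits = this.mapreduceInputFormat.getSplits(jobContext);
} catch (InterruptedException e) {
	// preserve the interrupt status and signal the interruption to callers
	Thread.currentThread().interrupt();
	InterruptedIOException ioe = new InterruptedIOException("Could not get Splits.");
	ioe.initCause(e);
	throw ioe;
}
{code}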





[jira] [Updated] (FLINK-6422) Unreachable code in FileInputFormat#createInputSplits

2017-06-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-6422:
--
Description: 
Here is related code:

{code}
if (minNumSplits < 1) {
  throw new IllegalArgumentException("Number of input splits has to be at 
least 1.");
}
...
final long maxSplitSize = (minNumSplits < 1) ? Long.MAX_VALUE : 
(totalLength / minNumSplits +
  (totalLength % minNumSplits == 0 ? 0 : 1));
{code}
minNumSplits cannot be less than 1 by the time maxSplitSize is assigned.

  was:
Here is related code:
{code}
if (minNumSplits < 1) {
  throw new IllegalArgumentException("Number of input splits has to be at 
least 1.");
}
...
final long maxSplitSize = (minNumSplits < 1) ? Long.MAX_VALUE : 
(totalLength / minNumSplits +
  (totalLength % minNumSplits == 0 ? 0 : 1));
{code}
minNumSplits cannot be less than 1 by the time maxSplitSize is assigned.


> Unreachable code in FileInputFormat#createInputSplits
> -
>
> Key: FLINK-6422
> URL: https://issues.apache.org/jira/browse/FLINK-6422
> Project: Flink
>  Issue Type: Bug
>  Components: Batch Connectors and Input/Output Formats
>Reporter: Ted Yu
>Assignee: mingleizhang
>Priority: Minor
>
> Here is related code:
> {code}
> if (minNumSplits < 1) {
>   throw new IllegalArgumentException("Number of input splits has to be at 
> least 1.");
> }
> ...
> final long maxSplitSize = (minNumSplits < 1) ? Long.MAX_VALUE : 
> (totalLength / minNumSplits +
>   (totalLength % minNumSplits == 0 ? 0 : 1));
> {code}
> minNumSplits cannot be less than 1 by the time maxSplitSize is assigned.
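A minimal sketch of the simplification (same names as the quoted snippet): the guard above guarantees minNumSplits >= 1, so the Long.MAX_VALUE branch is unreachable and the ternary can be dropped.

{code:java}
final long maxSplitSize = totalLength / minNumSplits
	+ (totalLength % minNumSplits == 0 ? 0 : 1);
{code}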





[jira] [Commented] (FLINK-6964) Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore

2017-06-26 Thread Cliff Resnick (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064079#comment-16064079
 ] 

Cliff Resnick commented on FLINK-6964:
--

[~srichter] Looks good from this end, all tests passed. Thanks!

> Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore
> --
>
> Key: FLINK-6964
> URL: https://issues.apache.org/jira/browse/FLINK-6964
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Reporter: Stefan Richter
>Assignee: Stefan Richter
>
> {{StandaloneCompletedCheckpointStore}} does not register shared states on 
> resume. However, for externalized checkpoints, it registers the checkpoint 
> from which it resumed; this checkpoint gets added to the completed checkpoint 
> store as part of resume.





[jira] [Updated] (FLINK-7003) "select * from" in Flink SQL should not flatten all fields in the table by default

2017-06-26 Thread Shuyi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuyi Chen updated FLINK-7003:
--
Description: 
Currently, CompositeRelDataType is extended from 
RelRecordType(StructKind.PEEK_FIELDS, ...).  In Calcite, StructKind.PEEK_FIELDS 
allows us to peek at fields of nested types. However, when we use "select * 
from", Calcite will flatten all nested fields that are marked as 
StructKind.PEEK_FIELDS in the table. 

For example, if the table structure *T* is as follows:

{code:java}
VARCHAR K0,
VARCHAR C1,
RecordType:peek_no_flattening(INTEGER C0, INTEGER C1) F0,
RecordType:peek_no_flattening(INTEGER C0, INTEGER C2) F1

{code}
The following query
{code:java}
Select * from T
{code}
should return (K0, C1, F0, F1) instead of (K0, C1, F0.C0, F0.C1, F1.C0, 
F1.C1), which is the current behavior.


  was:Currently, CompositeRelDataType extends 
RelRecordType(StructKind.PEEK_FIELDS, ...).  In Calcite, StructKind.PEEK_FIELDS 
allows us to peek into the fields of nested types. However, when we use "select * 
from", Calcite will flatten all nested fields that are marked as 
StructKind.PEEK_FIELDS in the table. 


> "select * from" in Flink SQL should not flatten all fields in the table by 
> default
> --
>
> Key: FLINK-7003
> URL: https://issues.apache.org/jira/browse/FLINK-7003
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Reporter: Shuyi Chen
>
> Currently, CompositeRelDataType extends 
> RelRecordType(StructKind.PEEK_FIELDS, ...).  In Calcite, 
> StructKind.PEEK_FIELDS allows us to peek into the fields of nested types. 
> However, when we use "select * from", Calcite will flatten all nested fields 
> that are marked as StructKind.PEEK_FIELDS in the table. 
> For example, if the table structure *T* is as follows:
> {code:java}
> VARCHAR K0,
> VARCHAR C1,
> RecordType:peek_no_flattening(INTEGER C0, INTEGER C1) F0,
> RecordType:peek_no_flattening(INTEGER C0, INTEGER C2) F1
> {code}
> The following query
> {code:java}
> Select * from T
> {code}
> should return (K0, C1, F0, F1) instead of (K0, C1, F0.C0, F0.C1, 
> F1.C0, F1.C1), which is the current behavior.





[jira] [Commented] (FLINK-7009) dogstatsd mode in statsd reporter

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063958#comment-16063958
 ] 

ASF GitHub Bot commented on FLINK-7009:
---

GitHub user dbrinegar opened a pull request:

https://github.com/apache/flink/pull/4188

[FLINK-7009] dogstatsd mode in statds reporter

* converts output to ascii alphanumeric characters with underbar,
delimited by periods
* reports all Flink variables as tags
* compresses overly long segments with a first-ten plus hash symbol
* compresses Flink ID values to first eight characters
* removes object references from names, for correctness
* drops negative and invalid values
* handles LatencyGauge values

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dbrinegar/flink dogstatsd

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4188.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4188


commit e8badbe2771b1e5f35c0b2c49d5ffde930a59acf
Author: David Brinegar 
Date:   2017-06-26T23:13:46Z

[FLINK-7009] dogstatsd mode in statds reporter

* converts output to ascii alphanumeric characters with underbar,
delimited by periods
* reports all Flink variables as tags
* compresses overly long segments with a first-ten plus hash symbol
* compresses Flink ID values to first eight characters
* removes object references from names, for correctness
* drops negative and invalid values
* handles LatencyGauge values




> dogstatsd mode in statsd reporter
> -
>
> Key: FLINK-7009
> URL: https://issues.apache.org/jira/browse/FLINK-7009
> Project: Flink
>  Issue Type: Improvement
>  Components: Metrics
>Affects Versions: 1.4.0
> Environment: org.apache.flink.metrics.statsd.StatsDReporter
>Reporter: David Brinegar
> Fix For: 1.4.0
>
>
> The current statsd reporter can only report a subset of Flink metrics owing 
> to the manner in which Flink variables are handled, mainly around invalid 
> characters and overly long metric names.  As an option, it would be quite 
> useful to have a stricter dogstatsd-compliant output.  Dogstatsd metrics are 
> tagged; they should be less than 200 characters including tag names and 
> values, use only alphanumeric characters plus underbar, and be delimited by 
> periods.  As a further pragmatic restriction, negative and other invalid 
> values should be ignored rather than sent to the backend.  These restrictions 
> play well with a broad set of collectors and time series databases.
> This mode would:
> * convert output to ascii alphanumeric characters with underbar, delimited by 
> periods.  Runs of invalid characters within a metric segment would be 
> collapsed to a single underbar.
> * report all Flink variables as tags
> * compress overly long segments, say over 50 chars, to a symbolic 
> representation of the metric name, to preserve the unique metric time series 
> but avoid downstream truncation
> * compress 32-character Flink IDs like tm_id, task_id, job_id, 
> task_attempt_id, to the first 8 characters, again to preserve enough 
> distinction amongst metrics while trimming up to 96 characters from the metric
> * remove object references from names, such as the instance hash id of the 
> serializer
> * drop negative or invalid numeric values such as "n/a", "-1" which is used 
> for unknowns like JVM.Memory.NonHeap.Max, and "-9223372036854775808" which is 
> used for unknowns like currentLowWaterMark
> With these in place, it becomes quite reasonable to support LatencyGauge 
> metrics as well.
> One idea for symbolic compression is to take the first 10 valid characters 
> plus a hash of the long name.  For example, a value like this operator_name:
> {code:java}
> TriggerWindow(TumblingProcessingTimeWindows(5000), 
> ReducingStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.PojoSerializer@f3395ffa,
>  
> reduceFunction=org.apache.flink.streaming.examples.socket.SocketWindowWordCount$1@4201c465},
>  ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java-301))
> {code}
> would first drop the instance references.  The stable version would be:
>  
> {code:java}
> TriggerWindow(TumblingProcessingTimeWindows(5000), 
> ReducingStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.PojoSerializer,
>  
> reduceFunction=org.apache.flink.streaming.examples.socket.SocketWindowWordCount$1},
>  ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java-301))
> {code}
> and then the compressed name would be the first ten valid characters plus the 
> hash of the stable string:
> {code}
> TriggerWin_d8c007da
> {code}

[GitHub] flink pull request #4188: [FLINK-7009] dogstatsd mode in statds reporter

2017-06-26 Thread dbrinegar
GitHub user dbrinegar opened a pull request:

https://github.com/apache/flink/pull/4188

[FLINK-7009] dogstatsd mode in statds reporter

* converts output to ascii alphanumeric characters with underbar,
delimited by periods
* reports all Flink variables as tags
* compresses overly long segments with a first-ten plus hash symbol
* compresses Flink ID values to first eight characters
* removes object references from names, for correctness
* drops negative and invalid values
* handles LatencyGauge values

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dbrinegar/flink dogstatsd

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4188.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4188


commit e8badbe2771b1e5f35c0b2c49d5ffde930a59acf
Author: David Brinegar 
Date:   2017-06-26T23:13:46Z

[FLINK-7009] dogstatsd mode in statds reporter

* converts output to ascii alphanumeric characters with underbar,
delimited by periods
* reports all Flink variables as tags
* compresses overly long segments with a first-ten plus hash symbol
* compresses Flink ID values to first eight characters
* removes object references from names, for correctness
* drops negative and invalid values
* handles LatencyGauge values






[jira] [Created] (FLINK-7009) dogstatsd mode in statsd reporter

2017-06-26 Thread David Brinegar (JIRA)
David Brinegar created FLINK-7009:
-

 Summary: dogstatsd mode in statsd reporter
 Key: FLINK-7009
 URL: https://issues.apache.org/jira/browse/FLINK-7009
 Project: Flink
  Issue Type: Improvement
  Components: Metrics
Affects Versions: 1.4.0
 Environment: org.apache.flink.metrics.statsd.StatsDReporter
Reporter: David Brinegar
 Fix For: 1.4.0


The current statsd reporter can only report a subset of Flink metrics owing to 
the manner in which Flink variables are handled, mainly around invalid 
characters and overly long metric names.  As an option, it would be quite useful 
to have a stricter dogstatsd-compliant output.  Dogstatsd metrics are tagged; 
they should be less than 200 characters including tag names and values, use 
only alphanumeric characters plus underbar, and be delimited by periods.  As a 
further pragmatic restriction, negative and other invalid values should be 
ignored rather than sent to the backend.  These restrictions play well with a 
broad set of collectors and time series databases.
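For reference, a dogstatsd-style line carries its tags inline after the value and type; the metric and tag values below are made up for illustration:

{code}
flink.taskmanager.Status.JVM.CPU.Load:0.12|g|#host:worker-1,tm_id:4a3b2c1d,job_id:9f8e7d6c
{code}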

This mode would:

* convert output to ascii alphanumeric characters with underbar, delimited by 
periods.  Runs of invalid characters within a metric segment would be collapsed 
to a single underbar.
* report all Flink variables as tags
* compress overly long segments, say over 50 chars, to a symbolic 
representation of the metric name, to preserve the unique metric time series 
but avoid downstream truncation
* compress 32-character Flink IDs like tm_id, task_id, job_id, task_attempt_id, 
to the first 8 characters, again to preserve enough distinction amongst metrics 
while trimming up to 96 characters from the metric
* remove object references from names, such as the instance hash id of the 
serializer
* drop negative or invalid numeric values such as "n/a", "-1" which is used for 
unknowns like JVM.Memory.NonHeap.Max, and "-9223372036854775808" which is used 
for unknowns like currentLowWaterMark

With these in place, it becomes quite reasonable to support LatencyGauge 
metrics as well.


One idea for symbolic compression is to take the first 10 valid characters plus 
a hash of the long name.  For example, a value like this operator_name:

{code:java}
TriggerWindow(TumblingProcessingTimeWindows(5000), 
ReducingStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.PojoSerializer@f3395ffa,
 
reduceFunction=org.apache.flink.streaming.examples.socket.SocketWindowWordCount$1@4201c465},
 ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java-301))
{code}

would first drop the instance references.  The stable version would be:
 
{code:java}
TriggerWindow(TumblingProcessingTimeWindows(5000), 
ReducingStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.PojoSerializer,
 
reduceFunction=org.apache.flink.streaming.examples.socket.SocketWindowWordCount$1},
 ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java-301))
{code}

and then the compressed name would be the first ten valid characters plus the 
hash of the stable string:

{code}
TriggerWin_d8c007da
{code}

This is just one way of dealing with unruly default names, the main point would 
be to preserve the metrics so they are valid, avoid truncation, and can be 
aggregated along other dimensions even if this particular dimension is hard to 
parse after the compression.
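A minimal sketch of the first-ten-plus-hash compression (the 50-character cutoff comes from the proposal above; the particular hash function is left open and the one below is only illustrative):

{code:java}
// Sketch only: compress a long, already-stabilized segment to
// <first 10 valid chars>_<hash>, e.g. "TriggerWin_d8c007da".
static String compressSegment(String stableName) {
    // collapse runs of invalid characters to a single underbar
    String valid = stableName.replaceAll("[^A-Za-z0-9_]+", "_");
    if (valid.length() <= 50) {
        return valid;                                         // short enough, keep as-is
    }
    String prefix = valid.substring(0, 10);                   // first ten valid characters
    String hash = Integer.toHexString(stableName.hashCode()); // any stable hash would do
    return prefix + "_" + hash;
}
{code}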





[jira] [Commented] (FLINK-6998) Kafka connector needs to expose metrics for failed/successful offset commits in the Kafka Consumer callback

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063927#comment-16063927
 ] 

ASF GitHub Bot commented on FLINK-6998:
---

GitHub user zhenzhongxu opened a pull request:

https://github.com/apache/flink/pull/4187

[FLINK-6998][Kafka Connector] Add kafka offset commit metrics in cons…

 add "kafkaCommitsSucceeded" and "kafkaCommitsFailed" metrics in 
KafkaConsumerThread class.
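For illustration, one way such counters could be driven from the consumer's asynchronous commit callback; the wiring below is a sketch under assumed names, not necessarily this PR's code:

{code:java}
import java.util.Map;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.MetricGroup;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.consumer.OffsetCommitCallback;
import org.apache.kafka.common.TopicPartition;

class CommitMetricsCallback implements OffsetCommitCallback {

    private final Counter commitsSucceeded;
    private final Counter commitsFailed;

    CommitMetricsCallback(MetricGroup group) {
        // metric names from this ticket; which group to register on is an assumption
        this.commitsSucceeded = group.counter("kafkaCommitsSucceeded");
        this.commitsFailed = group.counter("kafkaCommitsFailed");
    }

    @Override
    public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
        if (exception == null) {
            commitsSucceeded.inc(); // broker acknowledged the offset commit
        } else {
            commitsFailed.inc();    // commit failed; offsets are retried with a later commit
        }
    }
}
{code}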

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhenzhongxu/flink FLINK-6998

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4187.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4187


commit 7d3f0af6511d732c3cc2cb5231d76f9e8a12e684
Author: Zhenzhong Xu 
Date:   2017-06-26T22:51:09Z

[FLINK-6998][Kafka Connector] Add kafka offset commit metrics in consumer 
callback




> Kafka connector needs to expose metrics for failed/successful offset commits 
> in the Kafka Consumer callback
> ---
>
> Key: FLINK-6998
> URL: https://issues.apache.org/jira/browse/FLINK-6998
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Reporter: Zhenzhong Xu
>Assignee: Zhenzhong Xu
>
> Propose to add "kafkaCommitsSucceeded" and "kafkaCommitsFailed" metrics in 
> KafkaConsumerThread class.





[GitHub] flink pull request #4187: [FLINK-6998][Kafka Connector] Add kafka offset com...

2017-06-26 Thread zhenzhongxu
GitHub user zhenzhongxu opened a pull request:

https://github.com/apache/flink/pull/4187

[FLINK-6998][Kafka Connector] Add kafka offset commit metrics in cons…

 add "kafkaCommitsSucceeded" and "kafkaCommitsFailed" metrics in 
KafkaConsumerThread class.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhenzhongxu/flink FLINK-6998

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4187.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4187


commit 7d3f0af6511d732c3cc2cb5231d76f9e8a12e684
Author: Zhenzhong Xu 
Date:   2017-06-26T22:51:09Z

[FLINK-6998][Kafka Connector] Add kafka offset commit metrics in consumer 
callback






[jira] [Commented] (FLINK-6964) Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore

2017-06-26 Thread Cliff Resnick (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063768#comment-16063768
 ] 

Cliff Resnick commented on FLINK-6964:
--

[~srichter] So far, looks good! I need to leave early today but I'll hit it a 
few more times this evening just to be sure. 

> Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore
> --
>
> Key: FLINK-6964
> URL: https://issues.apache.org/jira/browse/FLINK-6964
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Reporter: Stefan Richter
>Assignee: Stefan Richter
>
> {{StandaloneCompletedCheckpointStore}} does not register shared states on 
> resume. However, for externalized checkpoints, it registers the checkpoint 
> from which it resumed. This checkpoint gets added to the completed checkpoint 
> store as part of resume.





[jira] [Commented] (FLINK-6964) Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore

2017-06-26 Thread Stefan Richter (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063653#comment-16063653
 ] 

Stefan Richter commented on FLINK-6964:
---

[~cre...@gmail.com] I think I have figured out the second problem. Updated my 
branch.

> Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore
> --
>
> Key: FLINK-6964
> URL: https://issues.apache.org/jira/browse/FLINK-6964
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Reporter: Stefan Richter
>Assignee: Stefan Richter
>
> {{StandaloneCompletedCheckpointStore}} does not register shared states on 
> resume. However, for externalized checkpoints, it registers the checkpoint 
> from which it resumed. This checkpoint gets added to the completed checkpoint 
> store as part of resume.





[jira] [Updated] (FLINK-6665) Pass a ScheduledExecutorService to the RestartStrategy

2017-06-26 Thread Lim Chee Hau (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lim Chee Hau updated FLINK-6665:

Description: 
Currently, the {{RestartStrategy}} is called when the {{ExecutionGraph}} should 
be restarted.

To facilitate delays before restarting, the strategy simply sleeps, blocking 
the thread that runs the ExecutionGraph's recovery method.

I suggest passing a {{ScheduledExecutorService}} to the {{RestartStrategy}} and 
let it schedule the restart call that way, avoiding any sleeps.
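A minimal sketch of the suggested shape, assuming the executor is simply handed to the strategy (the exact signature is the open design question here):

{code:java}
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.flink.runtime.executiongraph.ExecutionGraph;

// Sketch only: schedule the restart instead of sleeping in the recovery thread.
class FixedDelayRestartStrategy {

    private final long delayMillis;

    FixedDelayRestartStrategy(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    void restart(ExecutionGraph executionGraph, ScheduledExecutorService executor) {
        // the caller returns immediately; the executor triggers the actual
        // restart once the delay has elapsed
        executor.schedule(executionGraph::restart, delayMillis, TimeUnit.MILLISECONDS);
    }
}
{code}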

  was:
* Currently, the {{RestartStrategy}} is called when the {{ExecutionGraph}} 
should be restarted.

To facilitate delays before restarting, the strategy simply sleeps, blocking 
the thread that runs the ExecutionGraph's recovery method.

I suggest passing a {{ScheduledExecutorService}} to the {{RestartStrategy}} and 
let it schedule the restart call that way, avoiding any sleeps.


> Pass a ScheduledExecutorService to the RestartStrategy
> --
>
> Key: FLINK-6665
> URL: https://issues.apache.org/jira/browse/FLINK-6665
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Coordination
>Reporter: Stephan Ewen
> Fix For: 1.4.0
>
>
> Currently, the {{RestartStrategy}} is called when the {{ExecutionGraph}} 
> should be restarted.
> To facilitate delays before restarting, the strategy simply sleeps, blocking 
> the thread that runs the ExecutionGraph's recovery method.
> I suggest passing a {{ScheduledExecutorService}} to the {{RestartStrategy}} 
> and let it schedule the restart call that way, avoiding any sleeps.





[jira] [Updated] (FLINK-6665) Pass a ScheduledExecutorService to the RestartStrategy

2017-06-26 Thread Lim Chee Hau (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lim Chee Hau updated FLINK-6665:

Description: 
* Currently, the {{RestartStrategy}} is called when the {{ExecutionGraph}} 
should be restarted.

To facilitate delays before restarting, the strategy simply sleeps, blocking 
the thread that runs the ExecutionGraph's recovery method.

I suggest passing a {{ScheduledExecutorService}} to the {{RestartStrategy}} and 
let it schedule the restart call that way, avoiding any sleeps.

  was:
Currently, the {{RestartStrategy}} is called when the {{ExecutionGraph}} should 
be restarted.

To facilitate delays before restarting, the strategy simply sleeps, blocking 
the thread that runs the ExecutionGraph's recovery method.

I suggest passing a {{ScheduledExecutorService}} to the {{RestartStrategy}} and 
let it schedule the restart call that way, avoiding any sleeps.


> Pass a ScheduledExecutorService to the RestartStrategy
> --
>
> Key: FLINK-6665
> URL: https://issues.apache.org/jira/browse/FLINK-6665
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Coordination
>Reporter: Stephan Ewen
> Fix For: 1.4.0
>
>
> * Currently, the {{RestartStrategy}} is called when the {{ExecutionGraph}} 
> should be restarted.
> To facilitate delays before restarting, the strategy simply sleeps, blocking 
> the thread that runs the ExecutionGraph's recovery method.
> I suggest passing a {{ScheduledExecutorService}} to the {{RestartStrategy}} 
> and let it schedule the restart call that way, avoiding any sleeps.





[jira] [Created] (FLINK-7008) Update NFA state only when the NFA changes.

2017-06-26 Thread Kostas Kloudas (JIRA)
Kostas Kloudas created FLINK-7008:
-

 Summary: Update NFA state only when the NFA changes.
 Key: FLINK-7008
 URL: https://issues.apache.org/jira/browse/FLINK-7008
 Project: Flink
  Issue Type: Improvement
  Components: CEP
Affects Versions: 1.3.1
Reporter: Kostas Kloudas
 Fix For: 1.4.0


Currently, in the {{AbstractKeyedCEPPatternOperator.updateNFA()}} method, we 
update the NFA state every time the NFA is touched. This leads to redundant 
puts/gets to the state even when there are no changes to the NFA itself.
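One possible shape of the guard (the dirty-flag methods below are hypothetical names, used only for illustration):

{code:java}
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.cep.nfa.NFA;

class NFAUpdateGuard {
    // Sketch only: write the NFA back to the keyed state only when it changed.
    static <IN> void updateNFA(ValueState<NFA<IN>> nfaState, NFA<IN> nfa) throws Exception {
        if (nfa.isNFAChanged()) {    // hypothetical dirty flag set by NFA mutations
            nfa.resetNFAChanged();   // hypothetical reset after a successful write
            nfaState.update(nfa);
        }
        // an untouched NFA causes no redundant put into the state backend
    }
}
{code}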





[jira] [Updated] (FLINK-6998) Kafka connector needs to expose metrics for failed/successful offset commits in the Kafka Consumer callback

2017-06-26 Thread Zhenzhong Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenzhong Xu updated FLINK-6998:

Description: Propose to add "kafkaCommitsSucceeded" and 
"kafkaCommitsFailed" metrics in KafkaConsumerThread class.  (was: Propose to 
add "kafka-commits-succeeded" and "kafka-commits-failed" metrics in 
KafkaConsumerThread class.)

> Kafka connector needs to expose metrics for failed/successful offset commits 
> in the Kafka Consumer callback
> ---
>
> Key: FLINK-6998
> URL: https://issues.apache.org/jira/browse/FLINK-6998
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Reporter: Zhenzhong Xu
>Assignee: Zhenzhong Xu
>
> Propose to add "kafkaCommitsSucceeded" and "kafkaCommitsFailed" metrics in 
> KafkaConsumerThread class.





[jira] [Updated] (FLINK-6998) Kafka connector needs to expose metrics for failed/successful offset commits in the Kafka Consumer callback

2017-06-26 Thread Zhenzhong Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenzhong Xu updated FLINK-6998:

Description: Propose to add "kafka-commits-succeeded" and 
"kafka-commits-failed" metrics in KafkaConsumerThread class.  (was: Propose to 
add "successful-commits" and "failed-commits" metrics in KafkaConsumerThread 
class.)

> Kafka connector needs to expose metrics for failed/successful offset commits 
> in the Kafka Consumer callback
> ---
>
> Key: FLINK-6998
> URL: https://issues.apache.org/jira/browse/FLINK-6998
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Reporter: Zhenzhong Xu
>Assignee: Zhenzhong Xu
>
> Propose to add "kafka-commits-succeeded" and "kafka-commits-failed" metrics 
> in KafkaConsumerThread class.





[jira] [Commented] (FLINK-6418) Support for dynamic state changes in CEP patterns

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063496#comment-16063496
 ] 

ASF GitHub Bot commented on FLINK-6418:
---

Github user kl0u commented on the issue:

https://github.com/apache/flink/pull/4143
  
LGTM. +1 to merge.


> Support for dynamic state changes in CEP patterns
> -
>
> Key: FLINK-6418
> URL: https://issues.apache.org/jira/browse/FLINK-6418
> Project: Flink
>  Issue Type: Improvement
>  Components: CEP
>Affects Versions: 1.3.0
>Reporter: Elias Levy
>Assignee: Dawid Wysakowicz
>
> The Flink CEP library allows one to define an event pattern to match, where 
> the match condition can be determined programmatically via the {{where}} 
> method.  Flink 1.3 will introduce so-called iterative conditions, which allow 
> the predicate to look up events already matched by the pattern and thus be 
> conditional on them.
> 1.3 also introduces quantifier methods to the API, which allow one to 
> declaratively specify how many times a condition must be matched before 
> there is a state change.
> Alas, there are use cases where the quantifier must be determined dynamically 
> based on the events matched by the pattern so far.  Therefore, I propose 
> adding a new {{Pattern}} method: {{until}}.
> Like the new iterative variant of {{where}}, {{until}} would take a predicate 
> function and a context that provides access to events already matched.  But 
> whereas {{where}} determines if an event is accepted by the pattern, 
> {{until}} determines whether the pattern should move on to the next state.
> In our particular use case, we have a pattern where an event is matched a 
> number of times, but depending on the event type, the number (threshold) for 
> the pattern to match is different.  We could decompose the pattern into 
> multiple similar patterns, but that could be inefficient if we have many such 
> patterns.  If the functionality of {{until}} were available, we could make do 
> with a single pattern.
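A sketch of how the proposed {{until}} might look at the call site ({{Event}} and {{thresholdFor}} are placeholders for the user's own types and logic):

{code:java}
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.IterativeCondition;

// Sketch only: keep matching events until a per-event-type threshold is
// reached, rather than using a fixed declarative quantifier.
Pattern<Event, ?> pattern = Pattern.<Event>begin("alarms")
    .oneOrMore()
    .until(new IterativeCondition<Event>() {
        @Override
        public boolean filter(Event event, Context<Event> ctx) throws Exception {
            int matchedSoFar = 0;
            for (Event ignored : ctx.getEventsForPattern("alarms")) {
                matchedSoFar++;
            }
            // thresholdFor: hypothetical helper resolving the per-type threshold
            return matchedSoFar >= thresholdFor(event.getType());
        }
    });
{code}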





[GitHub] flink issue #4143: [FLINK-6418][cep] Support for dynamic state changes in CE...

2017-06-26 Thread kl0u
Github user kl0u commented on the issue:

https://github.com/apache/flink/pull/4143
  
LGTM. +1 to merge.




[jira] [Commented] (FLINK-6983) Do not serialize States with NFA

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063478#comment-16063478
 ] 

ASF GitHub Bot commented on FLINK-6983:
---

Github user kl0u commented on the issue:

https://github.com/apache/flink/pull/4172
  
Hi @dianfu , this branch seems to be broken: most of the migration tests 
fail when you run them locally. I will still check out the branch to get the 
general idea of the change, since it may give some ideas on how to handle 
the `IterativeCondition`s in this PR: 
https://github.com/apache/flink/pull/4145


> Do not serialize States with NFA
> 
>
> Key: FLINK-6983
> URL: https://issues.apache.org/jira/browse/FLINK-6983
> Project: Flink
>  Issue Type: Improvement
>  Components: CEP
>Reporter: Dawid Wysakowicz
>Assignee: Dian Fu
>






[GitHub] flink issue #4172: [FLINK-6983] [cep] Do not serialize States with NFA

2017-06-26 Thread kl0u
Github user kl0u commented on the issue:

https://github.com/apache/flink/pull/4172
  
Hi @dianfu , this branch seems to be broken: most of the migration tests 
fail when you run them locally. I will still check out the branch to get the 
general idea of the change, since it may give some ideas on how to handle 
the `IterativeCondition`s in this PR: 
https://github.com/apache/flink/pull/4145




[jira] [Assigned] (FLINK-5595) Add links to sub-sections in the left-hand navigation bar

2017-06-26 Thread Neelesh Srinivas Salian (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neelesh Srinivas Salian reassigned FLINK-5595:
--

Assignee: Neelesh Srinivas Salian  (was: Mike Winters)

> Add links to sub-sections in the left-hand navigation bar
> -
>
> Key: FLINK-5595
> URL: https://issues.apache.org/jira/browse/FLINK-5595
> Project: Flink
>  Issue Type: Improvement
>  Components: Project Website
>Reporter: Mike Winters
>Assignee: Neelesh Srinivas Salian
>Priority: Minor
>  Labels: newbie, website
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Some pages on the Flink project site (such as 
> http://flink.apache.org/introduction.html) include a table of contents at the 
> top. The sections from the ToC are not exposed in the left-hand nav when the 
> page is active, but this could be a useful addition, especially for longer, 
> content-heavy pages. 





[jira] [Commented] (FLINK-6981) Add shaded guava dependency

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063285#comment-16063285
 ] 

ASF GitHub Bot commented on FLINK-6981:
---

Github user zentol closed the pull request at:

https://github.com/apache/flink-shaded/pull/1


> Add shaded guava dependency
> ---
>
> Key: FLINK-6981
> URL: https://issues.apache.org/jira/browse/FLINK-6981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: 1.4.0
>
>






[GitHub] flink pull request #3270: [FLINK-4286] Have Kafka examples that use the Kafk...

2017-06-26 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/3270#discussion_r124046763
  
--- Diff: flink-examples/flink-examples-streaming/pom.xml ---
@@ -482,54 +502,19 @@ under the License.
[The XML markup of this hunk was stripped in the mail archive. It removes the maven-shade-plugin execution `fat-jar-kafka-example`: phase `package`, goal `shade`; `shadeTestJar`, `shadedArtifactAttached`, and `createDependencyReducedPom` all `false`; main class `org.apache.flink.streaming.examples.kafka.ReadFromKafka`; final name `Kafka`; an artifact-set include of `*`; and filter includes for `org/apache/flink/streaming/examples/kafka/**`, `org/apache/flink/streaming/**`, `org/apache/kafka/**`, `org/apache/curator/**`, `org/apache/zookeeper/**`, `org/apache/jute/**`, `org/I0Itec/**`, `jline/**`, `com/yammer/**`, and `kafka/**`.]
--- End diff --

What's the reasoning for deleting this section?




[jira] [Commented] (FLINK-4286) Have Kafka examples that use the Kafka 0.9 connector

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063284#comment-16063284
 ] 

ASF GitHub Bot commented on FLINK-4286:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/3270#discussion_r124046763
  
--- Diff: flink-examples/flink-examples-streaming/pom.xml ---
@@ -482,54 +502,19 @@ under the License.
[The XML markup of this hunk was stripped in the mail archive. It removes the maven-shade-plugin execution `fat-jar-kafka-example`: phase `package`, goal `shade`; `shadeTestJar`, `shadedArtifactAttached`, and `createDependencyReducedPom` all `false`; main class `org.apache.flink.streaming.examples.kafka.ReadFromKafka`; final name `Kafka`; an artifact-set include of `*`; and filter includes for `org/apache/flink/streaming/examples/kafka/**`, `org/apache/flink/streaming/**`, `org/apache/kafka/**`, `org/apache/curator/**`, `org/apache/zookeeper/**`, `org/apache/jute/**`, `org/I0Itec/**`, `jline/**`, `com/yammer/**`, and `kafka/**`.]
--- End diff --

What's the reasoning for deleting this section?


> Have Kafka examples that use the Kafka 0.9 connector
> 
>
> Key: FLINK-4286
> URL: https://issues.apache.org/jira/browse/FLINK-4286
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Dmitrii Kniazev
>Priority: Minor
>  Labels: starter
>
> The {{ReadFromKafka}} and {{WriteIntoKafka}} examples use the 0.8 connector, 
> and the built example jar is named {{Kafka.jar}} under 
> {{examples/streaming/}} in the distributed package.
> Since we have different connectors for different Kafka versions, it would be 
> good to have examples for different versions, and package them as 
> {{Kafka08.jar}} and {{Kafka09.jar}}.





[jira] [Commented] (FLINK-4286) Have Kafka examples that use the Kafka 0.9 connector

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063281#comment-16063281
 ] 

ASF GitHub Bot commented on FLINK-4286:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/3270#discussion_r124046687
  
--- Diff: flink-examples/flink-examples-streaming/pom.xml ---
@@ -360,6 +360,26 @@ under the License.
[The XML markup of this hunk was stripped in the mail archive. It adds an execution `Kafka09`: phase `package`, goal `jar`; final name `Kafka09`; manifest main class `org.apache.flink.streaming.examples.kafka.ReadFromKafka09`; includes limited to `org/apache/flink/streaming/examples/kafka/ReadFromKafka09.class` and `org/apache/flink/streaming/examples/kafka/WriteToKafka09.class`.]
--- End diff --

I don't think you can use the maven jar plugin for creating the kafka09 
example. This won't include the needed dependencies into the jar.


> Have Kafka examples that use the Kafka 0.9 connector
> 
>
> Key: FLINK-4286
> URL: https://issues.apache.org/jira/browse/FLINK-4286
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Dmitrii Kniazev
>Priority: Minor
>  Labels: starter
>
> The {{ReadFromKafka}} and {{WriteIntoKafka}} examples use the 0.8 connector, 
> and the built example jar is named {{Kafka.jar}} under 
> {{examples/streaming/}} in the distributed package.
> Since we have different connectors for different Kafka versions, it would be 
> good to have examples for different versions, and package them as 
> {{Kafka08.jar}} and {{Kafka09.jar}}.





[GitHub] flink pull request #3270: [FLINK-4286] Have Kafka examples that use the Kafk...

2017-06-26 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/3270#discussion_r124046687
  
--- Diff: flink-examples/flink-examples-streaming/pom.xml ---
@@ -360,6 +360,26 @@ under the License.
[The XML markup of this hunk was stripped in the mail archive. It adds an execution `Kafka09`: phase `package`, goal `jar`; final name `Kafka09`; manifest main class `org.apache.flink.streaming.examples.kafka.ReadFromKafka09`; includes limited to `org/apache/flink/streaming/examples/kafka/ReadFromKafka09.class` and `org/apache/flink/streaming/examples/kafka/WriteToKafka09.class`.]
--- End diff --

I don't think you can use the maven jar plugin for creating the kafka09 
example. This won't include the needed dependencies into the jar.




[jira] [Commented] (FLINK-6981) Add shaded guava dependency

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063278#comment-16063278
 ] 

ASF GitHub Bot commented on FLINK-6981:
---

Github user rmetzger commented on the issue:

https://github.com/apache/flink-shaded/pull/1
  
Thanks.

+1 to merge


> Add shaded guava dependency
> ---
>
> Key: FLINK-6981
> URL: https://issues.apache.org/jira/browse/FLINK-6981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: 1.4.0
>
>






[jira] [Commented] (FLINK-6221) Add Prometheus support to metrics

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063277#comment-16063277
 ] 

ASF GitHub Bot commented on FLINK-6221:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/3833
  
Finally found time to try this out, and it works great, merging.


> Add Prometheus support to metrics
> -
>
> Key: FLINK-6221
> URL: https://issues.apache.org/jira/browse/FLINK-6221
> Project: Flink
>  Issue Type: Improvement
>  Components: Metrics
>Affects Versions: 1.2.0
>Reporter: Joshua Griffith
>Assignee: Maximilian Bode
>Priority: Minor
>
> [Prometheus|https://prometheus.io/] is becoming popular for metrics and 
> alerting. It's possible to use 
> [statsd-exporter|https://github.com/prometheus/statsd_exporter] to load Flink 
> metrics into Prometheus but it would be far easier if Flink supported 
> Prometheus as a metrics reporter. A [dropwizard 
> client|https://github.com/prometheus/client_java/tree/master/simpleclient_dropwizard]
>  exists that could be integrated into the existing metrics system.





[GitHub] flink issue #3833: [FLINK-6221] Add PrometheusReporter

2017-06-26 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/3833
  
Finally found time to try this out, and it works great, merging.




[jira] [Commented] (FLINK-6388) Add support for DISTINCT into Code Generated Aggregations

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063271#comment-16063271
 ] 

ASF GitHub Bot commented on FLINK-6388:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3783#discussion_r124036796
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/CodeGenerator.scala
 ---
@@ -533,7 +600,25 @@ class CodeGenerator(
  |$reset
  |  }""".stripMargin
 }
-
+
+var existDistinct = false
+if (distinctAggsFlags.isDefined){
+  val distAggsFlags = distinctAggsFlags.get
+  for(i <- distAggsFlags.indices){
+if(distAggsFlags(i)){ existDistinct = true }
+  }
+}
+if(existDistinct){
+ val initReusMember = {
--- End diff --

`initReusMember` -> `initReuseMember`


> Add support for DISTINCT into Code Generated Aggregations
> -
>
> Key: FLINK-6388
> URL: https://issues.apache.org/jira/browse/FLINK-6388
> Project: Flink
>  Issue Type: Sub-task
>  Components: DataStream API
>Affects Versions: 1.3.0
>Reporter: Stefano Bortoli
>Assignee: Stefano Bortoli
> Fix For: 1.4.0
>
>
> We should support DISTINCT in Code Generated aggregation functions.





[jira] [Commented] (FLINK-6388) Add support for DISTINCT into Code Generated Aggregations

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063272#comment-16063272
 ] 

ASF GitHub Bot commented on FLINK-6388:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3783#discussion_r124030194
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/CodeGenerator.scala
 ---
@@ -296,6 +299,39 @@ class CodeGenerator(
   fields.mkString(", ")
 }
 
+def genInitialize(): String = {
+  
+  val sig: String = 
+j"""
+   |  public void initialize(
+   |org.apache.flink.api.common.functions.RuntimeContext ctx
+   |  )""".stripMargin
+
+  val initDist: String = if( distinctAggsFlags.isDefined ) {
+val statePackage = "org.apache.flink.api.common.state"
+val distAggsFlags = distinctAggsFlags.get
+  for(i <- distAggsFlags.indices) yield
+if(distAggsFlags(i)) {
+  val typeString = javaTypes(aggFields(i)(0))
--- End diff --

UDAGGs can have more than a single parameter.

Since the key can only be a single object, we have to put all arguments 
into a `TupleX` (I think the limitation of 25 fields is reasonable and we can 
throw an exception if we observe a DISTINCT UDAGG with more than 25 fields).
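A sketch of that keying scheme for a two-argument DISTINCT aggregate (all names illustrative): the reused tuple is the MapState key, the value is an occurrence count.

{code:java}
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.java.tuple.Tuple2;

// Sketch only: dedupe a two-argument aggregate via a reference-counting MapState.
class DistinctFilter {

    private final MapState<Tuple2<String, Integer>, Long> seen;     // distinct state
    private final Tuple2<String, Integer> reuseKey = new Tuple2<>(); // reused per record

    DistinctFilter(MapState<Tuple2<String, Integer>, Long> seen) {
        this.seen = seen;
    }

    /** Returns true iff the argument combination is new and should be accumulated. */
    boolean add(String arg0, Integer arg1) throws Exception {
        reuseKey.f0 = arg0;
        reuseKey.f1 = arg1;
        Long count = seen.get(reuseKey);
        seen.put(reuseKey, count == null ? 1L : count + 1L);
        return count == null;
    }
}
{code}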


> Add support for DISTINCT into Code Generated Aggregations
> -
>
> Key: FLINK-6388
> URL: https://issues.apache.org/jira/browse/FLINK-6388
> Project: Flink
>  Issue Type: Sub-task
>  Components: DataStream API
>Affects Versions: 1.3.0
>Reporter: Stefano Bortoli
>Assignee: Stefano Bortoli
> Fix For: 1.4.0
>
>
> We should support DISTINCT in Code Generated aggregation functions.





[jira] [Commented] (FLINK-6388) Add support for DISTINCT into Code Generated Aggregations

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063267#comment-16063267
 ] 

ASF GitHub Bot commented on FLINK-6388:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3783#discussion_r124029025
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/CodeGenerator.scala
 ---
@@ -296,6 +299,39 @@ class CodeGenerator(
   fields.mkString(", ")
 }
 
+def genInitialize(): String = {
+  
+  val sig: String = 
+j"""
+   |  public void initialize(
+   |org.apache.flink.api.common.functions.RuntimeContext ctx
+   |  )""".stripMargin
+
+  val initDist: String = if( distinctAggsFlags.isDefined ) {
+val statePackage = "org.apache.flink.api.common.state"
+val distAggsFlags = distinctAggsFlags.get
+  for(i <- distAggsFlags.indices) yield
--- End diff --

It would be good to share distinct state across aggregations if they are on 
the same fields, i.e., make distinct state distinct ;-). I would do this in a 
preprocessing step in the `generateAggregations` method.

It would also change the way we access the state, because we may 
increment / decrement a key only once per record (one call of `accumulate`) if 
the state is shared.
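A sketch of such a preprocessing step (all names illustrative): map each distinct aggregate's argument-field list to one shared state slot before generating code.

{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch only: aggregates that are DISTINCT over the same argument fields
// resolve to the same slot, so each key is counted once per record.
class DistinctStateSlots {
    static int[] assign(int[][] aggFields, boolean[] distinctFlags) {
        Map<String, Integer> slotByFields = new HashMap<>();
        int[] slotOfAgg = new int[aggFields.length];
        for (int i = 0; i < aggFields.length; i++) {
            if (distinctFlags[i]) {
                String key = Arrays.toString(aggFields[i]); // argument-field list as key
                slotOfAgg[i] = slotByFields.computeIfAbsent(key, k -> slotByFields.size());
            } else {
                slotOfAgg[i] = -1;                          // no distinct state needed
            }
        }
        return slotOfAgg;
    }
}
{code}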


> Add support for DISTINCT into Code Generated Aggregations
> -
>
> Key: FLINK-6388
> URL: https://issues.apache.org/jira/browse/FLINK-6388
> Project: Flink
>  Issue Type: Sub-task
>  Components: DataStream API
>Affects Versions: 1.3.0
>Reporter: Stefano Bortoli
>Assignee: Stefano Bortoli
> Fix For: 1.4.0
>
>
> We should support DISTINCT in Code Generated aggregation functions.





[jira] [Commented] (FLINK-6388) Add support for DISTINCT into Code Generated Aggregations

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063270#comment-16063270
 ] 

ASF GitHub Bot commented on FLINK-6388:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3783#discussion_r124037411
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/GeneratedAggregations.scala
 ---
@@ -20,16 +20,22 @@ package org.apache.flink.table.runtime.aggregate
 
 import org.apache.flink.api.common.functions.Function
 import org.apache.flink.types.Row
+import org.apache.flink.api.common.functions.RuntimeContext
 
 /**
   * Base class for code-generated aggregations.
   */
 abstract class GeneratedAggregations extends Function {
+  
+  /**
+* Initialize the state for the distinct aggregation check
+*
+* @param ctx the runtime context to retrieve and initialize the 
distinct states
+*/
+  def initialize(ctx: RuntimeContext)
 
   /**
-* Sets the results of the aggregations (partial or final) to the 
output row.
--- End diff --

Why would you like to change this comment?


> Add support for DISTINCT into Code Generated Aggregations
> -
>
> Key: FLINK-6388
> URL: https://issues.apache.org/jira/browse/FLINK-6388
> Project: Flink
>  Issue Type: Sub-task
>  Components: DataStream API
>Affects Versions: 1.3.0
>Reporter: Stefano Bortoli
>Assignee: Stefano Bortoli
> Fix For: 1.4.0
>
>
> We should support DISTINCT in Code Generated aggregation functions.





[jira] [Commented] (FLINK-6388) Add support for DISTINCT into Code Generated Aggregations

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063266#comment-16063266
 ] 

ASF GitHub Bot commented on FLINK-6388:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3783#discussion_r124028061
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/CodeGenerator.scala
 ---
@@ -296,6 +299,39 @@ class CodeGenerator(
   fields.mkString(", ")
 }
 
+def genInitialize(): String = {
+  
+  val sig: String = 
+j"""
+   |  public void initialize(
+   |org.apache.flink.api.common.functions.RuntimeContext ctx
+   |  )""".stripMargin
+
+  val initDist: String = if( distinctAggsFlags.isDefined ) {
+val statePackage = "org.apache.flink.api.common.state"
+val distAggsFlags = distinctAggsFlags.get
+  for(i <- distAggsFlags.indices) yield
--- End diff --

please add a space after `for`, `if`, `while`, etc.


> Add support for DISTINCT into Code Generated Aggregations
> -
>
> Key: FLINK-6388
> URL: https://issues.apache.org/jira/browse/FLINK-6388
> Project: Flink
>  Issue Type: Sub-task
>  Components: DataStream API
>Affects Versions: 1.3.0
>Reporter: Stefano Bortoli
>Assignee: Stefano Bortoli
> Fix For: 1.4.0
>
>
> We should support DISTINCT in Code Generated aggregation functions.





[jira] [Commented] (FLINK-6388) Add support for DISTINCT into Code Generated Aggregations

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063269#comment-16063269
 ] 

ASF GitHub Bot commented on FLINK-6388:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3783#discussion_r124037152
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/CodeGenerator.scala
 ---
@@ -533,7 +600,25 @@ class CodeGenerator(
  |$reset
  |  }""".stripMargin
 }
-
+
+var existDistinct = false
+if (distinctAggsFlags.isDefined){
+  val distAggsFlags = distinctAggsFlags.get
+  for(i <- distAggsFlags.indices){
+if(distAggsFlags(i)){ existDistinct = true }
+  }
+}
+if(existDistinct){
+ val initReusMember = {
--- End diff --

The code for the distinct deduplication and tuple reuse for 
multi-argument aggregate functions could go here.


> Add support for DISTINCT into Code Generated Aggregations
> -
>
> Key: FLINK-6388
> URL: https://issues.apache.org/jira/browse/FLINK-6388
> Project: Flink
>  Issue Type: Sub-task
>  Components: DataStream API
>Affects Versions: 1.3.0
>Reporter: Stefano Bortoli
>Assignee: Stefano Bortoli
> Fix For: 1.4.0
>
>
> We should support DISTINCT in Code Generated aggregation functions.





[GitHub] flink pull request #3783: [FLINK-6388] Add support for DISTINCT into Code Ge...

2017-06-26 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3783#discussion_r124037152
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/CodeGenerator.scala
 ---
@@ -533,7 +600,25 @@ class CodeGenerator(
  |$reset
  |  }""".stripMargin
 }
-
+
+var existDistinct = false
+if (distinctAggsFlags.isDefined){
+  val distAggsFlags = distinctAggsFlags.get
+  for(i <- distAggsFlags.indices){
+if(distAggsFlags(i)){ existDistinct = true }
+  }
+}
+if(existDistinct){
+ val initReusMember = {
--- End diff --

The code for the distinct deduplication and tuple reuse for 
multi-argument aggregate functions could go here.




[GitHub] flink pull request #3783: [FLINK-6388] Add support for DISTINCT into Code Ge...

2017-06-26 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3783#discussion_r124028061
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/CodeGenerator.scala
 ---
@@ -296,6 +299,39 @@ class CodeGenerator(
   fields.mkString(", ")
 }
 
+def genInitialize(): String = {
+  
+  val sig: String = 
+j"""
+   |  public void initialize(
+   |org.apache.flink.api.common.functions.RuntimeContext ctx
+   |  )""".stripMargin
+
+  val initDist: String = if( distinctAggsFlags.isDefined ) {
+val statePackage = "org.apache.flink.api.common.state"
+val distAggsFlags = distinctAggsFlags.get
+  for(i <- distAggsFlags.indices) yield
--- End diff --

please add a space after `for`, `if`, `while`, etc.




[jira] [Commented] (FLINK-6388) Add support for DISTINCT into Code Generated Aggregations

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063268#comment-16063268
 ] 

ASF GitHub Bot commented on FLINK-6388:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3783#discussion_r124033513
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/CodeGenerator.scala
 ---
@@ -335,15 +371,30 @@ class CodeGenerator(
 j"""
 |  public final void accumulate(
 |org.apache.flink.types.Row accs,
-|org.apache.flink.types.Row input)""".stripMargin
+|org.apache.flink.types.Row input) throws 
Exception""".stripMargin
 
+  val distAggsFlags: Array[Boolean] = distinctAggsFlags.getOrElse(new 
Array[Boolean](0))
   val accumulate: String = {
 for (i <- aggs.indices) yield
-  j"""
- |${aggs(i)}.accumulate(
- |  ((${accTypes(i)}) accs.getField($i)),
- |  ${parameters(i)});""".stripMargin
-  }.mkString("\n")
+  if (distinctAggsFlags.isDefined && distAggsFlags(i)){
+j"""
+   |  Long distValCount$i = (Long) 
distStateList[$i].get(${parameters(i)});
--- End diff --

We need to put the parameters into a tuple before accessing the map state. 
The tuple should be reused.


> Add support for DISTINCT into Code Generated Aggregations
> -
>
> Key: FLINK-6388
> URL: https://issues.apache.org/jira/browse/FLINK-6388
> Project: Flink
>  Issue Type: Sub-task
>  Components: DataStream API
>Affects Versions: 1.3.0
>Reporter: Stefano Bortoli
>Assignee: Stefano Bortoli
> Fix For: 1.4.0
>
>
> We should support DISTINCT in Code Generated aggregation functions.





[GitHub] flink pull request #3783: [FLINK-6388] Add support for DISTINCT into Code Ge...

2017-06-26 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3783#discussion_r124030194
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/CodeGenerator.scala
 ---
@@ -296,6 +299,39 @@ class CodeGenerator(
   fields.mkString(", ")
 }
 
+def genInitialize(): String = {
+  
+  val sig: String = 
+j"""
+   |  public void initialize(
+   |org.apache.flink.api.common.functions.RuntimeContext ctx
+   |  )""".stripMargin
+
+  val initDist: String = if( distinctAggsFlags.isDefined ) {
+val statePackage = "org.apache.flink.api.common.state"
+val distAggsFlags = distinctAggsFlags.get
+  for(i <- distAggsFlags.indices) yield
+if(distAggsFlags(i)) {
+  val typeString = javaTypes(aggFields(i)(0))
--- End diff --

UDAGGs can have more than a single parameter.

Since the key can only be a single object, we have to put all arguments 
into a `TupleX` (I think the limitation of 25 fields is reasonable and we can 
throw an exception if we observe a DISTINCT UDAGG with more than 25 fields).




[GitHub] flink pull request #3783: [FLINK-6388] Add support for DISTINCT into Code Ge...

2017-06-26 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3783#discussion_r124036796
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/CodeGenerator.scala
 ---
@@ -533,7 +600,25 @@ class CodeGenerator(
  |$reset
  |  }""".stripMargin
 }
-
+
+var existDistinct = false
+if (distinctAggsFlags.isDefined){
+  val distAggsFlags = distinctAggsFlags.get
+  for(i <- distAggsFlags.indices){
+if(distAggsFlags(i)){ existDistinct = true }
+  }
+}
+if(existDistinct){
+ val initReusMember = {
--- End diff --

`initReusMember` -> `initReuseMember`




[GitHub] flink pull request #3783: [FLINK-6388] Add support for DISTINCT into Code Ge...

2017-06-26 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3783#discussion_r124033513
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/CodeGenerator.scala
 ---
@@ -335,15 +371,30 @@ class CodeGenerator(
 j"""
 |  public final void accumulate(
 |org.apache.flink.types.Row accs,
-|org.apache.flink.types.Row input)""".stripMargin
+|org.apache.flink.types.Row input) throws 
Exception""".stripMargin
 
+  val distAggsFlags: Array[Boolean] = distinctAggsFlags.getOrElse(new 
Array[Boolean](0))
   val accumulate: String = {
 for (i <- aggs.indices) yield
-  j"""
- |${aggs(i)}.accumulate(
- |  ((${accTypes(i)}) accs.getField($i)),
- |  ${parameters(i)});""".stripMargin
-  }.mkString("\n")
+  if (distinctAggsFlags.isDefined && distAggsFlags(i)){
+j"""
+   |  Long distValCount$i = (Long) 
distStateList[$i].get(${parameters(i)});
--- End diff --

We need to put the parameters into a tuple before accessing the map state. 
The tuple should be reused.




[GitHub] flink pull request #3783: [FLINK-6388] Add support for DISTINCT into Code Ge...

2017-06-26 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3783#discussion_r124037411
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/GeneratedAggregations.scala
 ---
@@ -20,16 +20,22 @@ package org.apache.flink.table.runtime.aggregate
 
 import org.apache.flink.api.common.functions.Function
 import org.apache.flink.types.Row
+import org.apache.flink.api.common.functions.RuntimeContext
 
 /**
   * Base class for code-generated aggregations.
   */
 abstract class GeneratedAggregations extends Function {
+  
+  /**
+* Initialize the state for the distinct aggregation check
+*
+* @param ctx the runtime context to retrieve and initialize the 
distinct states
+*/
+  def initialize(ctx: RuntimeContext)
 
   /**
-* Sets the results of the aggregations (partial or final) to the 
output row.
--- End diff --

Why would you like to change this comment?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3783: [FLINK-6388] Add support for DISTINCT into Code Ge...

2017-06-26 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/3783#discussion_r124029025
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/codegen/CodeGenerator.scala
 ---
@@ -296,6 +299,39 @@ class CodeGenerator(
   fields.mkString(", ")
 }
 
+def genInitialize(): String = {
+  
+  val sig: String = 
+j"""
+   |  public void initialize(
+   |org.apache.flink.api.common.functions.RuntimeContext ctx
+   |  )""".stripMargin
+
+  val initDist: String = if( distinctAggsFlags.isDefined ) {
+val statePackage = "org.apache.flink.api.common.state"
+val distAggsFlags = distinctAggsFlags.get
+  for(i <- distAggsFlags.indices) yield
--- End diff --

It would be good to share distinct state across aggregations if they are on 
the same fields, i.e., make distinct state distinct ;-). I would do this in a 
preprocessing step in the `generateAggregations` method.

It would also change the way we access the state, because we may 
increment / decrement a key only once per record (i.e., per call of 
`accumulate`) if state is shared.
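
A sketch of that preprocessing step, with all names assumed:

```
import scala.collection.mutable

// Group DISTINCT aggregates by the fields they aggregate on, so aggregates
// over the same arguments share a single distinct state.
def shareDistinctState(
    aggFields: Array[Array[Int]],
    distinctFlags: Array[Boolean]): (Array[Int], Int) = {
  val stateIdByFields = mutable.LinkedHashMap[Seq[Int], Int]()
  val stateIdPerAgg = aggFields.indices.map { i =>
    if (distinctFlags(i))
      stateIdByFields.getOrElseUpdate(aggFields(i).toSeq, stateIdByFields.size)
    else
      -1 // non-distinct aggregates need no distinct state
  }.toArray
  (stateIdPerAgg, stateIdByFields.size)
}

// COUNT(DISTINCT a), SUM(DISTINCT a), MAX(b):
// shareDistinctState(Array(Array(0), Array(0), Array(1)), Array(true, true, false))
// returns (Array(0, 0, -1), 1): both distinct aggregates share state 0, so
// its key is incremented / decremented only once per record.
```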


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6981) Add shaded guava dependency

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063252#comment-16063252
 ] 

ASF GitHub Bot commented on FLINK-6981:
---

Github user zentol commented on the issue:

https://github.com/apache/flink-shaded/pull/1
  
@rmetzger I've addressed your comments.


> Add shaded guava dependency
> ---
>
> Key: FLINK-6981
> URL: https://issues.apache.org/jira/browse/FLINK-6981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: 1.4.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7004) Switch to Travis Trusty image

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063249#comment-16063249
 ] 

ASF GitHub Bot commented on FLINK-7004:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4182
  
huh, looks like there's a problem with hadoop2.6.5 + openjdk7. The 
`WordCountMapreduceITCase` fails for this profile, and also did so in a 
previous build (see #4167).
```
Running org.apache.flink.test.hadoop.mapreduce.WordCountMapreduceITCase
zip file closed
java.lang.IllegalStateException: zip file closed
at java.util.zip.ZipFile.ensureOpen(ZipFile.java:634)
at java.util.zip.ZipFile.getInputStream(ZipFile.java:347)
at java.util.jar.JarFile.getInputStream(JarFile.java:412)
at 
sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:162)
at java.net.URLClassLoader.getResourceAsStream(URLClassLoader.java:226)
at javax.xml.parsers.SecuritySupport$4.run(SecuritySupport.java:94)
at java.security.AccessController.doPrivileged(Native Method)
at 
javax.xml.parsers.SecuritySupport.getResourceAsStream(SecuritySupport.java:87)
at 
javax.xml.parsers.FactoryFinder.findJarServiceProvider(FactoryFinder.java:283)
at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:255)
at 
javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:121)
at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2467)
at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2444)
at 
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2361)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:968)
at 
org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2010)
at org.apache.hadoop.mapred.JobConf.&lt;init&gt;(JobConf.java:449)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:187)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:168)
at 
org.apache.flink.hadoopcompatibility.HadoopInputs.readHadoopFile(HadoopInputs.java:104)
at 
org.apache.flink.test.hadoop.mapreduce.WordCountMapreduceITCase.internalRun(WordCountMapreduceITCase.java:78)
at 
org.apache.flink.test.hadoop.mapreduce.WordCountMapreduceITCase.testProgram(WordCountMapreduceITCase.java:67)
at 
org.apache.flink.test.util.JavaProgramTestBase.testJobWithoutObjectReuse(JavaProgramTestBase.java:180)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
```

[GitHub] flink issue #4182: [FLINK-7004] Switch to Travis Trusty image

2017-06-26 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4182
  
huh, looks like there's a problem with hadoop2.6.5 + openjdk7. The 
`WordCountMapreduceITCase` fails for this profile, and also did so in a 
previous build (see #4167).
```
Running org.apache.flink.test.hadoop.mapreduce.WordCountMapreduceITCase
zip file closed
java.lang.IllegalStateException: zip file closed
at java.util.zip.ZipFile.ensureOpen(ZipFile.java:634)
at java.util.zip.ZipFile.getInputStream(ZipFile.java:347)
at java.util.jar.JarFile.getInputStream(JarFile.java:412)
at 
sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:162)
at java.net.URLClassLoader.getResourceAsStream(URLClassLoader.java:226)
at javax.xml.parsers.SecuritySupport$4.run(SecuritySupport.java:94)
at java.security.AccessController.doPrivileged(Native Method)
at 
javax.xml.parsers.SecuritySupport.getResourceAsStream(SecuritySupport.java:87)
at 
javax.xml.parsers.FactoryFinder.findJarServiceProvider(FactoryFinder.java:283)
at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:255)
at 
javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:121)
at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2467)
at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2444)
at 
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2361)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:968)
at 
org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2010)
at org.apache.hadoop.mapred.JobConf.&lt;init&gt;(JobConf.java:449)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:187)
at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:168)
at 
org.apache.flink.hadoopcompatibility.HadoopInputs.readHadoopFile(HadoopInputs.java:104)
at 
org.apache.flink.test.hadoop.mapreduce.WordCountMapreduceITCase.internalRun(WordCountMapreduceITCase.java:78)
at 
org.apache.flink.test.hadoop.mapreduce.WordCountMapreduceITCase.testProgram(WordCountMapreduceITCase.java:67)
at 
org.apache.flink.test.util.JavaProgramTestBase.testJobWithoutObjectReuse(JavaProgramTestBase.java:180)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature

[jira] [Commented] (FLINK-6990) Poor performance with Sliding Time Windows

2017-06-26 Thread Brice Bingman (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063245#comment-16063245
 ] 

Brice Bingman commented on FLINK-6990:
--

[~jark] Good to hear.  In the meantime I will look into using a ProcessFunction.

> Poor performance with Sliding Time Windows
> --
>
> Key: FLINK-6990
> URL: https://issues.apache.org/jira/browse/FLINK-6990
> Project: Flink
>  Issue Type: Improvement
>  Components: DataStream API, Streaming
>Affects Versions: 1.3.0
> Environment: OSX 10.11.4
> 2.8 GHz Intel Core i7
> 16 GB 1600 MHz DDR3
>Reporter: Brice Bingman
>
> I'm experiencing poor performance when using sliding time windows.  Here is a 
> simple example that performs poorly for me:
> {code:java}
> public class FlinkPerfTest {
>     public static void main(String[] args) throws Exception {
>         StreamExecutionEnvironment see =
>                 StreamExecutionEnvironment.getExecutionEnvironment();
>         // Streaming 10,000 events per second
>         see.addSource(new SourceFunction<TestObject>() {
>             transient ScheduledExecutorService executor;
>             @Override
>             public synchronized void run(final SourceContext<TestObject> ctx)
>                     throws Exception {
>                 executor = Executors.newSingleThreadScheduledExecutor();
>                 executor.scheduleAtFixedRate(new Runnable() {
>                     @Override
>                     public void run() {
>                         for (int k = 0; k < 10; k++) {
>                             for (int i = 0; i < 1000; i++) {
>                                 TestObject obj = new TestObject();
>                                 obj.setKey(k);
>                                 ctx.collect(obj);
>                             }
>                         }
>                     }
>                 }, 0, 1, TimeUnit.SECONDS);
>                 this.wait();
>             }
>             @Override
>             public synchronized void cancel() {
>                 executor.shutdown();
>                 this.notify();
>             }
>         }).keyBy("key")
>                 .window(SlidingProcessingTimeWindows.of(Time.minutes(10),
>                         Time.seconds(1)))
>                 .apply(new WindowFunction<TestObject, String, Tuple, TimeWindow>() {
>                     @Override
>                     public void apply(Tuple key, TimeWindow window,
>                             Iterable<TestObject> input, Collector<String> out)
>                             throws Exception {
>                         int count = 0;
>                         for (Object obj : input) {
>                             count++;
>                         }
>                         out.collect(key.getField(0) + ": " + count);
>                     }
>                 })
>                 .print();
>         see.execute();
>     }
> 
>     public static class TestObject {
>         private Integer key;
> 
>         public Integer getKey() {
>             return key;
>         }
> 
>         public void setKey(Integer key) {
>             this.key = key;
>         }
>     }
> }
> {code}
> When running this, flink periodically pauses for long periods of time.  I 
> would expect a steady stream of output at 1 second intervals.  For 
> comparison, you can switch to a count window of similar size which performs 
> just fine:
> {code:java}
>     .countWindow(60, 1000)
>     .apply(new WindowFunction<TestObject, String, Tuple, GlobalWindow>() {
>         @Override
>         public void apply(Tuple key, GlobalWindow window,
>                 Iterable<TestObject> input, Collector<String> out)
>                 throws Exception {
>             int count = 0;
>             for (Object obj : input) {
>                 count++;
>             }
>             out.collect(key.getField(0) + ": " + count);
>         }
>     })
> {code}
> I would expect the sliding time window to perform similarly to a count 
> window.  The sliding time window also uses significantly more cpu and memory 
> than the count window.  I would also expect resource consumption to be 
> similar.
> A possible cause could be that the SystemProcessingTimeService.TriggerTask is 
> locking with the checkpointLock which acts like a global lock.  There should 
> be a lock per key or preferably a lock-less solution.
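
With a 10-minute window sliding by 1 second each record is assigned to 
roughly 600 overlapping panes, so buffering every element in apply() 
multiplies the state accordingly. A hedged sketch (not from this ticket) of 
an incremental alternative that keeps only one count per pane:

{code}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

case class TestObject(key: Int)

// Incremental variant of the job above: the fold keeps one (key, count)
// pair per pane instead of buffering every TestObject until the pane fires.
def slidingCounts(stream: DataStream[TestObject]): DataStream[String] =
  stream
    .keyBy(_.key)
    .window(SlidingProcessingTimeWindows.of(Time.minutes(10), Time.seconds(1)))
    .fold(("", 0L)) { case ((_, cnt), obj) => (obj.key.toString, cnt + 1L) }
    .map { case (k, cnt) => s"$k: $cnt" }
{code}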



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6981) Add shaded guava dependency

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063243#comment-16063243
 ] 

ASF GitHub Bot commented on FLINK-6981:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink-shaded/pull/1#discussion_r124037728
  
--- Diff: flink-shaded-guava-18/pom.xml ---
@@ -0,0 +1,99 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-shaded</artifactId>
+        <version>1.0</version>
+        <relativePath>..</relativePath>
+    </parent>
+
+    <artifactId>flink-shaded-guava-18</artifactId>
+    <name>flink-shaded-guava-18</name>
+    <version>1.0-${guava.version}</version>
+
+    <packaging>jar</packaging>
+
+    <properties>
+        <guava.version>18.0</guava.version>
+    </properties>
+
+    <dependencies>
+        <dependency>
+            <groupId>com.google.guava</groupId>
+            <artifactId>guava</artifactId>
+            <version>${guava.version}</version>
+            <optional>true</optional>
--- End diff --

I remembered why I did it this way. When relying on the shade-plugin, the 
dependency is only hidden when working against the dependency-reduced pom, 
i.e., when pulling the dependency from Maven.

But in an earlier version this module was part of Flink. This caused the 
IDE to pick up guava as a transitive dependency, as did `mvn dependency:tree`.

We can change it now though.


> Add shaded guava dependency
> ---
>
> Key: FLINK-6981
> URL: https://issues.apache.org/jira/browse/FLINK-6981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: 1.4.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (FLINK-7003) "select * from" in Flink SQL should not flatten all fields in the table by default

2017-06-26 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-7003:
--
Component/s: Table API & SQL

> "select * from" in Flink SQL should not flatten all fields in the table by 
> default
> --
>
> Key: FLINK-7003
> URL: https://issues.apache.org/jira/browse/FLINK-7003
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Reporter: Shuyi Chen
>
> Currently, CompositeRelDataType is extended from 
> RelRecordType(StructKind.PEEK_FIELDS, ...).  In Calcite, 
> StructKind.PEEK_FIELDS would allow us to peek fields for nested types. 
> However, when we use "select * from", calcite will flatten all nested fields 
> that is marked as StructKind.PEEK_FIELDS in the table. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6981) Add shaded guava dependency

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063227#comment-16063227
 ] 

ASF GitHub Bot commented on FLINK-6981:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink-shaded/pull/1#discussion_r124035371
  
--- Diff: flink-shaded-guava-18/pom.xml ---
@@ -0,0 +1,99 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-shaded</artifactId>
+        <version>1.0</version>
+        <relativePath>..</relativePath>
+    </parent>
+
+    <artifactId>flink-shaded-guava-18</artifactId>
+    <name>flink-shaded-guava-18</name>
+    <version>1.0-${guava.version}</version>
+
+    <packaging>jar</packaging>
+
+    <properties>
+        <guava.version>18.0</guava.version>
+    </properties>
+
+    <dependencies>
+        <dependency>
+            <groupId>com.google.guava</groupId>
+            <artifactId>guava</artifactId>
+            <version>${guava.version}</version>
+            <optional>true</optional>
--- End diff --

If you enable `createDependencyReducedPom`, guava won't be exposed as a 
dependency either.
I would suggest marking this as a proper dependency (because, in fact, it IS 
a dependency of this module; also, we might be exploiting a bug in the shade 
plugin to include optional deps).


> Add shaded guava dependency
> ---
>
> Key: FLINK-6981
> URL: https://issues.apache.org/jira/browse/FLINK-6981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: 1.4.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6981) Add shaded guava dependency

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063225#comment-16063225
 ] 

ASF GitHub Bot commented on FLINK-6981:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink-shaded/pull/1#discussion_r124034795
  
--- Diff: flink-shaded-guava-18/pom.xml ---
@@ -0,0 +1,99 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-shaded</artifactId>
+        <version>1.0</version>
+        <relativePath>..</relativePath>
+    </parent>
+
+    <artifactId>flink-shaded-guava-18</artifactId>
+    <name>flink-shaded-guava-18</name>
+    <version>1.0-${guava.version}</version>
+
+    <packaging>jar</packaging>
+
+    <properties>
+        <guava.version>18.0</guava.version>
+    </properties>
+
+    <dependencies>
+        <dependency>
+            <groupId>com.google.guava</groupId>
+            <artifactId>guava</artifactId>
+            <version>${guava.version}</version>
+            <optional>true</optional>
+        </dependency>
+    </dependencies>
+
+    <build>
+        <plugins>
+            <plugin>
+                <groupId>org.apache.maven.plugins</groupId>
+                <artifactId>maven-shade-plugin</artifactId>
+                <executions>
+                    <execution>
+                        <id>shade-flink</id>
+                        <phase>package</phase>
+                        <goals>
+                            <goal>shade</goal>
+                        </goals>
+                        <configuration>
+                            <createDependencyReducedPom>false</createDependencyReducedPom>
+                            <artifactSet>
+                                <includes>
+                                    <include>com.google.guava:*</include>
+                                </includes>
+                            </artifactSet>
+                            <relocations>
+                                <relocation>
+                                    <pattern>com.google</pattern>
+                                    <shadedPattern>org.apache.flink.shaded.guava18.com.google</shadedPattern>
+                                    <excludes>
+                                        <exclude>com.google.protobuf.**</exclude>
+                                        <exclude>com.google.inject.**</exclude>
--- End diff --

yup, you're right. I just saw it in other guava shading configurations. 
Will change it.


> Add shaded guava dependency
> ---
>
> Key: FLINK-6981
> URL: https://issues.apache.org/jira/browse/FLINK-6981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: 1.4.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6981) Add shaded guava dependency

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063222#comment-16063222
 ] 

ASF GitHub Bot commented on FLINK-6981:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink-shaded/pull/1#discussion_r124034556
  
--- Diff: flink-shaded-guava-18/pom.xml ---
@@ -0,0 +1,99 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-shaded</artifactId>
+        <version>1.0</version>
+        <relativePath>..</relativePath>
+    </parent>
+
+    <artifactId>flink-shaded-guava-18</artifactId>
+    <name>flink-shaded-guava-18</name>
+    <version>1.0-${guava.version}</version>
+
+    <packaging>jar</packaging>
+
+    <properties>
+        <guava.version>18.0</guava.version>
+    </properties>
+
+    <dependencies>
+        <dependency>
+            <groupId>com.google.guava</groupId>
+            <artifactId>guava</artifactId>
+            <version>${guava.version}</version>
+            <optional>true</optional>
+        </dependency>
+    </dependencies>
+
+    <build>
+        <plugins>
+            <plugin>
+                <groupId>org.apache.maven.plugins</groupId>
+                <artifactId>maven-shade-plugin</artifactId>
+                <executions>
+                    <execution>
+                        <id>shade-flink</id>
+                        <phase>package</phase>
+                        <goals>
+                            <goal>shade</goal>
+                        </goals>
+                        <configuration>
+                            <createDependencyReducedPom>false</createDependencyReducedPom>
--- End diff --

Since guava is marked as optional the dependency isn't exposed.


> Add shaded guava dependency
> ---
>
> Key: FLINK-6981
> URL: https://issues.apache.org/jira/browse/FLINK-6981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: 1.4.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6981) Add shaded guava dependency

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063216#comment-16063216
 ] 

ASF GitHub Bot commented on FLINK-6981:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink-shaded/pull/1#discussion_r124033958
  
--- Diff: flink-shaded-guava-18/pom.xml ---
@@ -0,0 +1,99 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-shaded</artifactId>
+        <version>1.0</version>
+        <relativePath>..</relativePath>
+    </parent>
+
+    <artifactId>flink-shaded-guava-18</artifactId>
+    <name>flink-shaded-guava-18</name>
+    <version>1.0-${guava.version}</version>
+
+    <packaging>jar</packaging>
+
+    <properties>
+        <guava.version>18.0</guava.version>
+    </properties>
+
+    <dependencies>
+        <dependency>
+            <groupId>com.google.guava</groupId>
+            <artifactId>guava</artifactId>
+            <version>${guava.version}</version>
+            <optional>true</optional>
--- End diff --

so that it isn't exposed as a dependency. It's only required during 
compilation but not at runtime.


> Add shaded guava dependency
> ---
>
> Key: FLINK-6981
> URL: https://issues.apache.org/jira/browse/FLINK-6981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: 1.4.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6981) Add shaded guava dependency

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063210#comment-16063210
 ] 

ASF GitHub Bot commented on FLINK-6981:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink-shaded/pull/1#discussion_r124033232
  
--- Diff: flink-shaded-guava-18/pom.xml ---
@@ -0,0 +1,99 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-shaded</artifactId>
+        <version>1.0</version>
+        <relativePath>..</relativePath>
+    </parent>
+
+    <artifactId>flink-shaded-guava-18</artifactId>
+    <name>flink-shaded-guava-18</name>
+    <version>1.0-${guava.version}</version>
+
+    <packaging>jar</packaging>
+
+    <properties>
+        <guava.version>18.0</guava.version>
+    </properties>
+
+    <dependencies>
+        <dependency>
+            <groupId>com.google.guava</groupId>
+            <artifactId>guava</artifactId>
+            <version>${guava.version}</version>
+            <optional>true</optional>
+        </dependency>
+    </dependencies>
+
+    <build>
+        <plugins>
+            <plugin>
+                <groupId>org.apache.maven.plugins</groupId>
+                <artifactId>maven-shade-plugin</artifactId>
+                <executions>
+                    <execution>
+                        <id>shade-flink</id>
+                        <phase>package</phase>
+                        <goals>
+                            <goal>shade</goal>
+                        </goals>
+                        <configuration>
+                            <createDependencyReducedPom>false</createDependencyReducedPom>
+                            <artifactSet>
+                                <includes>
+                                    <include>com.google.guava:*</include>
+                                </includes>
+                            </artifactSet>
+                            <relocations>
+                                <relocation>
+                                    <pattern>com.google</pattern>
+                                    <shadedPattern>org.apache.flink.shaded.guava18.com.google</shadedPattern>
+                                    <excludes>
+                                        <exclude>com.google.protobuf.**</exclude>
+                                        <exclude>com.google.inject.**</exclude>
--- End diff --

afaik there are only the `com.google.common` packages in guava?


> Add shaded guava dependency
> ---
>
> Key: FLINK-6981
> URL: https://issues.apache.org/jira/browse/FLINK-6981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: 1.4.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6981) Add shaded guava dependency

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063209#comment-16063209
 ] 

ASF GitHub Bot commented on FLINK-6981:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink-shaded/pull/1#discussion_r124032886
  
--- Diff: flink-shaded-guava-18/pom.xml ---
@@ -0,0 +1,99 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-shaded</artifactId>
+        <version>1.0</version>
+        <relativePath>..</relativePath>
+    </parent>
+
+    <artifactId>flink-shaded-guava-18</artifactId>
+    <name>flink-shaded-guava-18</name>
+    <version>1.0-${guava.version}</version>
+
+    <packaging>jar</packaging>
+
+    <properties>
+        <guava.version>18.0</guava.version>
+    </properties>
+
+    <dependencies>
+        <dependency>
+            <groupId>com.google.guava</groupId>
+            <artifactId>guava</artifactId>
+            <version>${guava.version}</version>
+            <optional>true</optional>
+        </dependency>
+    </dependencies>
+
+    <build>
+        <plugins>
+            <plugin>
+                <groupId>org.apache.maven.plugins</groupId>
+                <artifactId>maven-shade-plugin</artifactId>
+                <executions>
+                    <execution>
+                        <id>shade-flink</id>
+                        <phase>package</phase>
+                        <goals>
+                            <goal>shade</goal>
+                        </goals>
+                        <configuration>
+                            <createDependencyReducedPom>false</createDependencyReducedPom>
--- End diff --

Why is this set to `false`? I think it has to be `true`, otherwise the 
module will expose guava as a dependency.


> Add shaded guava dependency
> ---
>
> Key: FLINK-6981
> URL: https://issues.apache.org/jira/browse/FLINK-6981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: 1.4.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6890) flink-dist Jar contains non-shaded Guava dependencies (built with Maven 3.0.5)

2017-06-26 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063207#comment-16063207
 ] 

Chesnay Schepler commented on FLINK-6890:
-

I couldn't reproduce this with 1.2.1 and maven 3.0.5. flink-dist did not 
contain any non-shaded guava classes.

> flink-dist Jar contains non-shaded Guava dependencies (built with Maven 3.0.5)
> --
>
> Key: FLINK-6890
> URL: https://issues.apache.org/jira/browse/FLINK-6890
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 1.3.0, 1.2.1
>Reporter: Tzu-Li (Gordon) Tai
>
> See original discussion on ML: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Guava-version-conflict-td13561.html.
> Running {{mvn dependency:tree}} for {{flink-dist}} did not reveal any Guava 
> dependencies.
> This was tested with Maven 3.0.5.
> {code}
> com/google/common/util/concurrent/Futures$CombinedFuture.class
> com/google/common/util/concurrent/Futures$CombinerFuture.class
> com/google/common/util/concurrent/Futures$FallbackFuture$1$1.class
> com/google/common/util/concurrent/Futures$FallbackFuture$1.class
> com/google/common/util/concurrent/Futures$FallbackFuture.class
> com/google/common/util/concurrent/Futures$FutureCombiner.class
> com/google/common/util/concurrent/Futures$ImmediateCancelledFuture.class
> com/google/common/util/concurrent/Futures$ImmediateFailedCheckedFuture.class
> com/google/common/util/concurrent/Futures$ImmediateFailedFuture.class
> com/google/common/util/concurrent/Futures$ImmediateFuture.class
> com/google/common/util/concurrent/Futures$ImmediateSuccessfulCheckedFuture.class
> com/google/common/util/concurrent/Futures$ImmediateSuccessfulFuture.class
> com/google/common/util/concurrent/Futures$MappingCheckedFuture.class
> com/google/common/util/concurrent/Futures$NonCancellationPropagatingFuture$1.class
> com/google/common/util/concurrent/Futures$NonCancellationPropagatingFuture.class
> com/google/common/util/concurrent/Futures$WrappedCombiner.class
> com/google/common/util/concurrent/Futures.class
> com/google/common/util/concurrent/JdkFutureAdapters$ListenableFutureAdapter$1.class
> com/google/common/util/concurrent/JdkFutureAdapters$ListenableFutureAdapter.class
> com/google/common/util/concurrent/JdkFutureAdapters.class
> com/google/common/util/concurrent/ListenableFuture.class
> com/google/common/util/concurrent/ListenableFutureTask.class
> com/google/common/util/concurrent/ListenableScheduledFuture.class
> com/google/common/util/concurrent/ListenerCallQueue$Callback.class
> com/google/common/util/concurrent/ListenerCallQueue.class
> com/google/common/util/concurrent/ListeningExecutorService.class
> com/google/common/util/concurrent/ListeningScheduledExecutorService.class
> com/google/common/util/concurrent/Monitor$Guard.class
> com/google/common/util/concurrent/Monitor.class
> com/google/common/util/concurrent/MoreExecutors$1.class
> com/google/common/util/concurrent/MoreExecutors$2.class
> com/google/common/util/concurrent/MoreExecutors$3.class
> com/google/common/util/concurrent/MoreExecutors$4.class
> com/google/common/util/concurrent/MoreExecutors$Application$1.class
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6981) Add shaded guava dependency

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063206#comment-16063206
 ] 

ASF GitHub Bot commented on FLINK-6981:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink-shaded/pull/1#discussion_r124032645
  
--- Diff: flink-shaded-guava-18/pom.xml ---
@@ -0,0 +1,99 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>org.apache.flink</groupId>
+        <artifactId>flink-shaded</artifactId>
+        <version>1.0</version>
+        <relativePath>..</relativePath>
+    </parent>
+
+    <artifactId>flink-shaded-guava-18</artifactId>
+    <name>flink-shaded-guava-18</name>
+    <version>1.0-${guava.version}</version>
+
+    <packaging>jar</packaging>
+
+    <properties>
+        <guava.version>18.0</guava.version>
+    </properties>
+
+    <dependencies>
+        <dependency>
+            <groupId>com.google.guava</groupId>
+            <artifactId>guava</artifactId>
+            <version>${guava.version}</version>
+            <optional>true</optional>
--- End diff --

Why is the dependency optional?


> Add shaded guava dependency
> ---
>
> Key: FLINK-6981
> URL: https://issues.apache.org/jira/browse/FLINK-6981
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
> Fix For: 1.4.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-7007) Add README to "flink-shaded.git" repository

2017-06-26 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-7007:
-

 Summary: Add README to "flink-shaded.git" repository
 Key: FLINK-7007
 URL: https://issues.apache.org/jira/browse/FLINK-7007
 Project: Flink
  Issue Type: Task
  Components: flink-shaded.git
Reporter: Robert Metzger


We should put a README file up there with a brief explanation of the purpose of 
the repo + some links to the project.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6950) Add ability to specify single jar files to be shipped to YARN

2017-06-26 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063197#comment-16063197
 ] 

Robert Metzger commented on FLINK-6950:
---

Are you interested in adding a pull request for supporting this?

> Add ability to specify single jar files to be shipped to YARN
> -
>
> Key: FLINK-6950
> URL: https://issues.apache.org/jira/browse/FLINK-6950
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Mikhail Pryakhin
>Priority: Minor
>
> When deploying a flink job on YARN it is not possible to specify multiple 
> yarnship folders. 
> Often when submitting a flink job, job dependencies and job resources are 
> located in different local folders and both of them should be shipped to YARN 
> cluster. 
> I think it would be great to have the ability to specify single jar files, 
> not just folders, to be shipped to the YARN cluster (via the 
> --yarnship-jars option). 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6742) Improve error message when savepoint migration fails due to task removal

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063193#comment-16063193
 ] 

ASF GitHub Bot commented on FLINK-6742:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4185
  
Thank you for taking a look, merging this.


> Improve error message when savepoint migration fails due to task removal
> 
>
> Key: FLINK-6742
> URL: https://issues.apache.org/jira/browse/FLINK-6742
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.3.0
>Reporter: Gyula Fora
>Assignee: Chesnay Schepler
>Priority: Minor
>  Labels: flink-rel-1.3.1-blockers
>
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.runtime.checkpoint.savepoint.SavepointV2.convertToOperatorStateSavepointV2(SavepointV2.java:171)
>   at 
> org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:75)
>   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1090)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4185: [FLINK-6742] Add eager checks for parallelism/chain-lengt...

2017-06-26 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4185
  
Thank you for taking a look, merging this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4016: [FLINK-6638] Allow overriding default for primitive Confi...

2017-06-26 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4016
  
cool, merging this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6638) Allow overriding default value for all types when retrieving ConfigOption

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063192#comment-16063192
 ] 

ASF GitHub Bot commented on FLINK-6638:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4016
  
cool, merging this.


> Allow overriding default value for all types when retrieving ConfigOption
> -
>
> Key: FLINK-6638
> URL: https://issues.apache.org/jira/browse/FLINK-6638
> Project: Flink
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>
> Currently there exists no equivalent to 
> {{Configuration#getString(ConfigOption configOption, String 
> overrideDefault)}} for types such as {{Integer}} or {{Boolean}}.
> If you want to use a non-String {{ConfigOption}} you either have to resort to 
> hacks, like taking the key and going through the non-ConfigOption path or 
> comparing the result with the default value of the option.
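
For illustration, the missing overload could look like this for {{Integer}} 
(a sketch with assumed semantics, not the merged API; it emulates the 
override-default by checking for the key first):

{code}
import org.apache.flink.configuration.{ConfigOption, Configuration}

// Use overrideDefault only when the key is absent from the configuration.
def getInteger(config: Configuration, option: ConfigOption[Integer], overrideDefault: Int): Int =
  if (config.containsKey(option.key())) config.getInteger(option)
  else overrideDefault
{code}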



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6994) Wrong base url in master docs

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063183#comment-16063183
 ] 

ASF GitHub Bot commented on FLINK-6994:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/4175


> Wrong base url in master docs
> -
>
> Key: FLINK-6994
> URL: https://issues.apache.org/jira/browse/FLINK-6994
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Timo Walther
>Assignee: Timo Walther
> Fix For: 1.4.0
>
>
> The base url of the master docs points to 1.3 instead of 1.4. At the moment 
> the menu items point to the latest stable release docs instead of the nightly 
> master docs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (FLINK-6994) Wrong base url in master docs

2017-06-26 Thread Timo Walther (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Walther resolved FLINK-6994.
-
   Resolution: Fixed
Fix Version/s: 1.4.0

Fixed in 1.4.0: cd8932f5de967092a39fde5f1ba468b2542e91da

> Wrong base url in master docs
> -
>
> Key: FLINK-6994
> URL: https://issues.apache.org/jira/browse/FLINK-6994
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Timo Walther
>Assignee: Timo Walther
> Fix For: 1.4.0
>
>
> The base url of the master docs points to 1.3 instead of 1.4. At the moment 
> the menu items point to the latest stable release docs instead of the nightly 
> master docs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4175: [FLINK-6994] [docs] Wrong base url in master docs

2017-06-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/4175


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-7006) Base class using POJOs for Gelly algorithms

2017-06-26 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-7006:
-

 Summary: Base class using POJOs for Gelly algorithms
 Key: FLINK-7006
 URL: https://issues.apache.org/jira/browse/FLINK-7006
 Project: Flink
  Issue Type: Sub-task
  Components: Gelly
Affects Versions: 1.4.0
Reporter: Greg Hogan
Assignee: Greg Hogan
Priority: Minor
 Fix For: 1.4.0


Gelly algorithms commonly have a {{Result}} class extending a {{Tuple}} type 
and implementing one of the {{Unary/Binary/TertiaryResult}} interfaces.

Add a {{Unary/Binary/TertiaryResultBase}} class implementing each interface and 
convert the {{Result}} classes to POJOs extending the base result classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6994) Wrong base url in master docs

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063156#comment-16063156
 ] 

ASF GitHub Bot commented on FLINK-6994:
---

Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/4175
  
+1


> Wrong base url in master docs
> -
>
> Key: FLINK-6994
> URL: https://issues.apache.org/jira/browse/FLINK-6994
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Timo Walther
>Assignee: Timo Walther
>
> The base url of the master docs points to 1.3 instead of 1.4. At the moment 
> the menu items point to the latest stable release docs instead of the nightly 
> master docs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink issue #4175: [FLINK-6994] [docs] Wrong base url in master docs

2017-06-26 Thread rmetzger
Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/4175
  
+1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4175: [FLINK-6994] [docs] Wrong base url in master docs

2017-06-26 Thread twalthr
Github user twalthr commented on the issue:

https://github.com/apache/flink/pull/4175
  
I will merge this now. I added your comments to FLINK-6995.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6994) Wrong base url in master docs

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063142#comment-16063142
 ] 

ASF GitHub Bot commented on FLINK-6994:
---

Github user twalthr commented on the issue:

https://github.com/apache/flink/pull/4175
  
I will merge this now. I added your comments to FLINK-6995.


> Wrong base url in master docs
> -
>
> Key: FLINK-6994
> URL: https://issues.apache.org/jira/browse/FLINK-6994
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Timo Walther
>Assignee: Timo Walther
>
> The base url of the master docs points to 1.3 instead of 1.4. At the moment 
> the menu items point to the latest stable release docs instead of the nightly 
> master docs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-6249) Distinct Aggregates for OVER window

2017-06-26 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063137#comment-16063137
 ] 

Fabian Hueske commented on FLINK-6249:
--

Hi [~rtudoran] and [~stefano.bortoli], I think we can close the four sub-JIRAs 
of this issue (FLINK-6250, FLINK-6251, FLINK-6252, and FLINK-6253) because the 
implementation of FLINK-6388 (adding distinct to the generated code) should 
cover all these cases.

What do you think?

Btw., Calcite 1.13.0 was also released, which should bring support for DISTINCT 
in OVER windows.



> Distinct Aggregates for OVER window
> ---
>
> Key: FLINK-6249
> URL: https://issues.apache.org/jira/browse/FLINK-6249
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Affects Versions: 1.3.0
>Reporter: radu
>  Labels: features, patch
>
> Time target: ProcTime/EventTime
> SQL targeted query examples:
> 
> Q1. Boundaries are expressed in windows and meant for the elements to be 
> aggregated
> Q1.1. `SELECT SUM( DISTINCT  b) OVER (ORDER BY procTime() ROWS BETWEEN 2 
> PRECEDING AND CURRENT ROW) FROM stream1`
> Q1.2. `SELECT SUM( DISTINCT  b) OVER (ORDER BY procTime() RANGE BETWEEN 
> INTERVAL '1' HOUR PRECEDING AND CURRENT ROW) FROM stream1`
> Q1.3. `SELECT SUM( DISTINCT  b) OVER (ORDER BY rowTime() ROWS BETWEEN 2 
> PRECEDING AND CURRENT ROW) FROM stream1`
> Q1.4. `SELECT SUM( DISTINCT  b) OVER (ORDER BY rowTime() RANGE BETWEEN 
> INTERVAL '1' HOUR PRECEDING AND CURRENT ROW) FROM stream1`
> General comments:
> -   The DISTINCT operation makes sense only within the context of windows or 
> some bounded structure. Otherwise the operation would keep an infinite 
> amount of data to ensure uniqueness and would not trigger for certain 
> functions (e.g. aggregates)
> -   We can consider as a sub-JIRA issue the implementation of DISTINCT for 
> UNBOUNDED sliding windows. However, there would be no control over the data 
> structure that keeps the seen data (to check that it is not re-processed). 
> -> This needs to be decided if we want to support it (to create appropriate 
> JIRA issues)
> => We will open sub-JIRA issues to extend the current functionality of 
> aggregates for the DISTINCT CASE   
> =>   Aggregations over distinct elements without any boundary (i.e. within 
> SELECT clause) do not make sense just as aggregations do not make sense 
> without groupings or windows.
> Description:
> 
> The DISTINCT operator requires processing the elements to ensure uniqueness. 
> Whether the operation is used to SELECT ALL distinct elements or to apply 
> typical aggregation functions over a set of elements, a collection of 
> elements must first be formed.
> This brings the need of using windows or grouping methods. Therefore the 
> distinct function will be implemented within windows. Depending on the type 
> of window definition there are several options:
> -   Main Scope: If distinct is applied as in the Q1 example for window 
> aggregations, then we either extend the implementation with distinct 
> aggregates (less preferred) or extend the sliding window aggregates 
> implementation in the processFunction with distinction identification 
> support (preferred). The latter option is preferred because a query can 
> carry multiple aggregates, including multiple aggregates that have the 
> distinct keyword set. Implementing the distinction between elements in the 
> process function avoids the need to duplicate the data structure that marks 
> what was seen across multiple aggregates. It also makes the implementation 
> more robust and resilient, as we can keep the data structure for marking 
> the seen elements in a state (MapState).
> Functionality example
> -
> We exemplify below the functionality of DISTINCT aggregation when working 
> with streams.
> `Query:  SELECT  sum(DISTINCT  a) OVER (ORDER BY procTime() ROWS BETWEEN 2 
> PRECEDING AND CURRENT ROW) FROM stream1`
> ||Proctime||IngestionTime(Event)||Stream1||Q3||
> |10:00:01| |(ab,1)|1|
> |10:05:00| |(aa,2)|3|
> |11:03:00| |(aa,2)|3|
> |11:09:00| |(aa,2)|2|
> |...| | | |
> Implementation option
> -
> Considering that the behavior depends on over window behavior, the 
> implementation will be done by reusing the existing implementation of the 
> over window functions - done based on processFunction. As mentioned in the 
> description section, there are 2 options to consider:
> 1)  Using distinct within the aggregates implementation by extending with 
> distinct aggregates implementation the current aggregates in Flink. For this 
> we define additional JIRA issues for each implementation to support the 
> distinct keyword.
> 2)  Using distinct for selection within the process logic when calling 

[jira] [Commented] (FLINK-6428) Add support DISTINCT in dataStream SQL

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063136#comment-16063136
 ] 

ASF GitHub Bot commented on FLINK-6428:
---

Github user sunjincheng121 closed the pull request at:

https://github.com/apache/flink/pull/3817


> Add support DISTINCT in dataStream SQL
> --
>
> Key: FLINK-6428
> URL: https://issues.apache.org/jira/browse/FLINK-6428
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: sunjincheng
>Assignee: sunjincheng
>
> Add support DISTINCT in dataStream SQL as follow:
> DATA:
> {code}
> (name, age)
> (kevin, 28),
> (sunny, 6),
> (jack, 6)
> {code}
> SQL:
> {code}
> SELECT DISTINCT age FROM MyTable"
> {code}
> RESULTS:
> {code}
> 28, 6
> {code}
> To DataStream:
> {code}
> inputDS
>   .keyBy() // KeyBy on all fields
>   .flatMap() //  Eliminate duplicate data
> {code}
> [~fhueske] do we need this feature?
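
A concrete sketch of that translation for the (name, age) example (hedged; 
it uses the Scala API's {{flatMapWithState}} for the duplicate check):

{code}
import org.apache.flink.streaming.api.scala._

// SELECT DISTINCT age: key by the projected value and emit it only the
// first time it is seen.
def distinctAges(input: DataStream[(String, Int)]): DataStream[Int] =
  input
    .map(_._2)
    .keyBy(x => x)
    .flatMapWithState[Int, Boolean] {
      case (age, None)    => (Seq(age), Some(true)) // first occurrence: emit
      case (age, Some(_)) => (Seq.empty, Some(true)) // duplicate: drop
    }
{code}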



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #3817: [FLINK-6428][table] Add support DISTINCT in dataSt...

2017-06-26 Thread sunjincheng121
Github user sunjincheng121 closed the pull request at:

https://github.com/apache/flink/pull/3817


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3817: [FLINK-6428][table] Add support DISTINCT in dataStream SQ...

2017-06-26 Thread sunjincheng121
Github user sunjincheng121 commented on the issue:

https://github.com/apache/flink/pull/3817
  
Sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-6428) Add support DISTINCT in dataStream SQL

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063134#comment-16063134
 ] 

ASF GitHub Bot commented on FLINK-6428:
---

Github user sunjincheng121 commented on the issue:

https://github.com/apache/flink/pull/3817
  
Sure.


> Add support DISTINCT in dataStream SQL
> --
>
> Key: FLINK-6428
> URL: https://issues.apache.org/jira/browse/FLINK-6428
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API & SQL
>Reporter: sunjincheng
>Assignee: sunjincheng
>
> Add support DISTINCT in dataStream SQL as follow:
> DATA:
> {code}
> (name, age)
> (kevin, 28),
> (sunny, 6),
> (jack, 6)
> {code}
> SQL:
> {code}
> SELECT DISTINCT age FROM MyTable"
> {code}
> RESULTS:
> {code}
> 28, 6
> {code}
> To DataStream:
> {code}
> inputDS
>   .keyBy() // KeyBy on all fields
>   .flatMap() //  Eliminate duplicate data
> {code}
> [~fhueske] do we need this feature?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7005) Optimization steps are missing for nested registered tables

2017-06-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063126#comment-16063126
 ] 

ASF GitHub Bot commented on FLINK-7005:
---

Github user sunjincheng121 commented on a diff in the pull request:

https://github.com/apache/flink/pull/4186#discussion_r124016791
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/ExpressionReductionTest.scala
 ---
@@ -422,4 +422,22 @@ class ExpressionReductionTest extends TableTestBase {
 util.verifyTable(result, expected)
   }
 
+  @Test
+  def testNestedTablesReduction(): Unit = {
+val util = streamTestUtil()
--- End diff --

I suggest adding a `batchTestUtil()` test case as well.


> Optimization steps are missing for nested registered tables
> ---
>
> Key: FLINK-7005
> URL: https://issues.apache.org/jira/browse/FLINK-7005
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Affects Versions: 1.3.0, 1.3.1
>Reporter: Timo Walther
>Assignee: Timo Walther
>
> Tables that are registered (implicitly or explicitly) do not pass the first 
> three optimization steps:
> - decorrelate
> - convert time indicators
> - normalize the logical plan
> E.g. this has the wrong plan right now:
> {code}
> val table = stream.toTable(tEnv, 'rowtime.rowtime, 'int, 'double, 'float, 
> 'bigdec, 'string)
> val table1 = tEnv.sql(s"""SELECT 1 + 1 FROM $table""") // not optimized
> val table2 = tEnv.sql(s"""SELECT myrt FROM $table1""")
> val results = table2.toAppendStream[Row]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] flink pull request #4186: [FLINK-7005] [table] Optimization steps are missin...

2017-06-26 Thread sunjincheng121
Github user sunjincheng121 commented on a diff in the pull request:

https://github.com/apache/flink/pull/4186#discussion_r124016791
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/ExpressionReductionTest.scala
 ---
@@ -422,4 +422,22 @@ class ExpressionReductionTest extends TableTestBase {
 util.verifyTable(result, expected)
   }
 
+  @Test
+  def testNestedTablesReduction(): Unit = {
+val util = streamTestUtil()
--- End diff --

I suggest adding a `batchTestUtil()` test case as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-7000) Add custom configuration for StreamExecutionEnvironment#createLocalEnvironment

2017-06-26 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-7000:
--
Component/s: Scala API
 DataStream API

> Add custom configuration for StreamExecutionEnvironment#createLocalEnvironment
> --
>
> Key: FLINK-7000
> URL: https://issues.apache.org/jira/browse/FLINK-7000
> Project: Flink
>  Issue Type: Improvement
>  Components: DataStream API, Scala API
>Reporter: Lim Chee Hau
>
> I was doing some local testing in a {{Scala}} environment; however, I found 
> that there is no straightforward way to add custom configuration to 
> {{StreamExecutionEnvironment}} by using the {{createLocalEnvironment}} 
> method. This can easily be achieved in a {{Java}} environment, since 
> {{StreamExecutionEnvironment}} in {{Java}} has 
> - {{createLocalEnvironment()}}
> - {{createLocalEnvironment(Int)}}
> - {{createLocalEnvironment(Int, Configuration)}}
> whereas Scala only has 2 of these 3 methods.
> Not sure if this is a missing implementation; if yes, I could create a PR 
> for this.
> Therefore the example in [Local 
> Execution|https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/local_execution.html]
>  could then make sense for Scala users as well:
> bq. The LocalEnvironment allows also to pass custom configuration values to 
> Flink.
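
For reference, a sketch of the missing Scala variant (signature mirrored from 
the Java API; the body is an assumption, not existing Flink code):

{code}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.environment.{StreamExecutionEnvironment => JavaEnv}
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

// Delegate to the Java factory that already accepts a Configuration.
def createLocalEnvironment(parallelism: Int, configuration: Configuration): StreamExecutionEnvironment =
  new StreamExecutionEnvironment(JavaEnv.createLocalEnvironment(parallelism, configuration))
{code}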



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

