[jira] [Commented] (FLINK-7608) LatencyGauge change to histogram metric

2018-01-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342946#comment-16342946
 ] 

ASF GitHub Bot commented on FLINK-7608:
---

Github user yew1eb commented on the issue:

https://github.com/apache/flink/pull/5161
  
Yes, in our production environment we report and store all metrics in an 
external time-series database for alerting and visualization.

When the job is started, we also store the edge structure of the job's 
logical plan.

For the latency metrics, we hope to include the operatorName instead of the 
operatorID in the tags, because the operatorID changes when the job is 
redeployed; the job's historical latency data then becomes hard to match, 
and no latency statistics appear in our visualization panel.



> LatencyGauge change to histogram metric
> 
>
> Key: FLINK-7608
> URL: https://issues.apache.org/jira/browse/FLINK-7608
> Project: Flink
>  Issue Type: Bug
>  Components: Metrics
>Reporter: Hai Zhou UTC+8
>Assignee: Hai Zhou UTC+8
>Priority: Major
> Fix For: 1.5.0
>
>
> I used the slf4jReporter [https://issues.apache.org/jira/browse/FLINK-4831] to 
> export metrics to the log file.
> I found:
> {noformat}
> -- Gauges 
> -
> ..
> zhouhai-mbp.taskmanager.f3fd3a269c8c3da4e8319c8f6a201a57.Flink Streaming 
> Job.Map.0.latency:
>  value={LatencySourceDescriptor{vertexID=1, subtaskIndex=-1}={p99=116.0, 
> p50=59.5, min=11.0, max=116.0, p95=116.0, mean=61.836}}
> zhouhai-mbp.taskmanager.f3fd3a269c8c3da4e8319c8f6a201a57.Flink Streaming 
> Job.Sink- Unnamed.0.latency: 
> value={LatencySourceDescriptor{vertexID=1, subtaskIndex=0}={p99=195.0, 
> p50=163.5, min=115.0, max=195.0, p95=195.0, mean=161.0}}
> ..
> {noformat}
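>
> A Histogram metric would let reporters export these percentiles as
> first-class fields instead of one map-valued Gauge string. A minimal sketch,
> assuming the flink-metrics-dropwizard wrapper; the LatencyMetrics class, the
> 500-sample window, and the wiring to the operator's MetricGroup are
> illustrative, not the actual patch:
> {code:java}
> import com.codahale.metrics.SlidingWindowReservoir;
> import org.apache.flink.dropwizard.metrics.DropwizardHistogramWrapper;
> import org.apache.flink.metrics.Histogram;
> import org.apache.flink.metrics.MetricGroup;
>
> class LatencyMetrics {
>     private final Histogram latencyHistogram;
>
>     LatencyMetrics(MetricGroup metricGroup) {
>         // Wrap a Dropwizard histogram (500-sample sliding window) as a Flink
>         // Histogram so reporters see min/max/mean/percentiles natively.
>         this.latencyHistogram = metricGroup.histogram(
>             "latency",
>             new DropwizardHistogramWrapper(
>                 new com.codahale.metrics.Histogram(new SlidingWindowReservoir(500))));
>     }
>
>     // Called once per received latency marker.
>     void report(long latencyMillis) {
>         latencyHistogram.update(latencyMillis);
>     }
> }
> {code}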



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



[jira] [Commented] (FLINK-8515) update RocksDBMapState to replace deprecated remove() with delete()

2018-01-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342802#comment-16342802
 ] 

ASF GitHub Bot commented on FLINK-8515:
---

Github user bowenli86 commented on the issue:

https://github.com/apache/flink/pull/5365
  
cc @StefanRRichter 


> update RocksDBMapState to replace deprecated remove() with delete()
> ---
>
> Key: FLINK-8515
> URL: https://issues.apache.org/jira/browse/FLINK-8515
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Affects Versions: 1.5.0
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.5.0
>
>
> Currently in RocksDBMapState:
> {code:java}
> @Override
> public void remove(UK userKey) throws IOException, RocksDBException {
>     byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
>     backend.db.remove(columnFamily, writeOptions, rawKeyBytes);
> }
> {code}
> remove() is deprecated; it should be replaced with delete(), as sketched below.
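>
> A minimal sketch of the replacement, assuming RocksDB's Java API, where
> delete(ColumnFamilyHandle, WriteOptions, byte[]) is the documented successor
> of the deprecated remove() overload:
> {code:java}
> @Override
> public void remove(UK userKey) throws IOException, RocksDBException {
>     byte[] rawKeyBytes = serializeUserKeyWithCurrentKeyAndNamespace(userKey);
>     // delete() replaces the deprecated remove() with identical semantics.
>     backend.db.delete(columnFamily, writeOptions, rawKeyBytes);
> }
> {code}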



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



[jira] [Issue Comment Deleted] (FLINK-6214) WindowAssigners do not allow negative offsets

2018-01-28 Thread Jelmer Kuperus (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jelmer Kuperus updated FLINK-6214:
--
Comment: was deleted

(was: Proposed pull request : https://github.com/apache/flink/pull/5376)

> WindowAssigners do not allow negative offsets
> -
>
> Key: FLINK-6214
> URL: https://issues.apache.org/jira/browse/FLINK-6214
> Project: Flink
>  Issue Type: Bug
>  Components: DataStream API
>Affects Versions: 1.2.0
>Reporter: Timo Walther
>Priority: Major
>
> Both the website and the JavaDoc promote 
> ".window(TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8)))", noting 
> that, "for example, in China you would have to specify an offset of 
> Time.hours(-8)". But neither the sliding nor the tumbling event-time assigner 
> allows the offset to be negative.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-6214) WindowAssigners do not allow negative offsets

2018-01-28 Thread Jelmer Kuperus (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342761#comment-16342761
 ] 

Jelmer Kuperus commented on FLINK-6214:
---

Proposed pull request : https://github.com/apache/flink/pull/5376

> WindowAssigners do not allow negative offsets
> -
>
> Key: FLINK-6214
> URL: https://issues.apache.org/jira/browse/FLINK-6214
> Project: Flink
>  Issue Type: Bug
>  Components: DataStream API
>Affects Versions: 1.2.0
>Reporter: Timo Walther
>Priority: Major
>
> Both the website and the JavaDoc promote 
> ".window(TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8)))", noting 
> that, "for example, in China you would have to specify an offset of 
> Time.hours(-8)". But neither the sliding nor the tumbling event-time assigner 
> allows the offset to be negative.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-6214) WindowAssigners do not allow negative offsets

2018-01-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342760#comment-16342760
 ] 

ASF GitHub Bot commented on FLINK-6214:
---

GitHub user jelmerk opened a pull request:

https://github.com/apache/flink/pull/5376

[FLINK-6214] WindowAssigners do not allow negative offsets

## What is the purpose of the change

The Javadoc of TumblingEventTimeWindows and TumblingProcessingTimeWindows 
suggests that it is possible to use negative offsets, but in practice this is 
not supported. This patch remedies that situation; see the sketch below.
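
For reference, a minimal sketch of a window-start computation that tolerates 
negative offsets; the method mirrors the helper on Flink's TimeWindow, but the 
body here is illustrative rather than the actual patch:

```java
// Start of the window containing `timestamp`, for windows of `windowSize`
// shifted by `offset`. Math.floorMod always returns a non-negative remainder,
// so a negative offset such as Time.hours(-8) (for UTC+8) works as well.
public static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) {
    return timestamp - Math.floorMod(timestamp - offset, windowSize);
}
```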

## Brief change log

- updated behavior of TumblingEventTimeWindows & 
TumblingProcessingTimeWindows

## Verifying this change

This change added tests and can be verified as follows:

- Added testWindowAssignmentWithNegativeOffset method in 
TumblingEventTimeWindowsTest & TumblingProcessingTimeWindowsTest

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): no
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
  - The serializers: no
  - The runtime per-record code paths (performance sensitive): no
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  - The S3 file system connector: no

## Documentation

  - Does this pull request introduce a new feature? no
  - If yes, how is the feature documented? not applicable


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jelmerk/flink negative_offsets

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5376.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5376


commit 07f5577d67155c86af72dac85f7c96b4fea06456
Author: Jelmer Kuperus 
Date:   2018-01-28T22:22:10Z

FLINK-6214: allow negative offsets




> WindowAssigners do not allow negative offsets
> -
>
> Key: FLINK-6214
> URL: https://issues.apache.org/jira/browse/FLINK-6214
> Project: Flink
>  Issue Type: Bug
>  Components: DataStream API
>Affects Versions: 1.2.0
>Reporter: Timo Walther
>Priority: Major
>
> Both the website and the JavaDoc promote 
> ".window(TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8)))", noting 
> that, "for example, in China you would have to specify an offset of 
> Time.hours(-8)". But neither the sliding nor the tumbling event-time assigner 
> allows the offset to be negative.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



[jira] [Commented] (FLINK-8484) Kinesis consumer re-reads closed shards on job restart

2018-01-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342608#comment-16342608
 ] 

ASF GitHub Bot commented on FLINK-8484:
---

Github user pluppens commented on the issue:

https://github.com/apache/flink/pull/5337
  
@tzulitai Is there anything more I can do from my side?


> Kinesis consumer re-reads closed shards on job restart
> --
>
> Key: FLINK-8484
> URL: https://issues.apache.org/jira/browse/FLINK-8484
> Project: Flink
>  Issue Type: Bug
>  Components: Kinesis Connector
>Affects Versions: 1.4.0, 1.3.2
>Reporter: Philip Luppens
>Assignee: Philip Luppens
>Priority: Blocker
>  Labels: bug, flink, kinesis
> Fix For: 1.3.3, 1.5.0, 1.4.1
>
>
> We’re using the connector to subscribe to streams varying from 1 to 100 
> shards, and used the kinesis-scaling-utils to dynamically scale the Kinesis 
> stream up and down during peak times. What we noticed is that, once we had 
> closed shards, any Flink job restart from a checkpoint or savepoint would 
> result in those shards being re-read from the event horizon, duplicating our 
> events.
>  
> We started checking the checkpoint state and found that the shards were 
> stored correctly with the proper sequence numbers (including for closed 
> shards), but that upon restart, the older closed shards would be read from 
> the event horizon, as if their restored state were ignored.
>  
> In the end, we believe we found the problem: in the FlinkKinesisConsumer’s 
> run() method, we try to match the shard returned by the KinesisDataFetcher 
> against the shard metadata from the restore point, but we do this via a 
> containsKey() call, which uses StreamShardMetadata’s equals() method. That 
> method compares all properties, including the endingSequenceNumber, which 
> might have changed between the restored checkpoint and our data fetch. The 
> equality check therefore fails, containsKey() returns false, and the shard 
> is re-read from the event horizon even though it was present in the restored 
> state.
>  
> We’ve created a workaround where we only check the shard id and stream name 
> when restoring the state of shards we’ve already seen, and this seems to 
> work correctly.
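>
> A minimal sketch of that workaround; the getter names on StreamShardMetadata
> are assumed to exist as written here:
> {code:java}
> // Match a discovered shard against restored state by stream name and shard
> // id only, ignoring mutable fields such as endingSequenceNumber that can
> // change when a shard is closed between checkpoint and restore.
> private static boolean isSameShard(StreamShardMetadata restored, StreamShardMetadata discovered) {
>     return restored.getStreamName().equals(discovered.getStreamName())
>         && restored.getShardId().equals(discovered.getShardId());
> }
> {code}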



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



[jira] [Commented] (FLINK-8434) The new yarn resource manager should take over the running task managers after failover

2018-01-28 Thread Gary Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342553#comment-16342553
 ] 

Gary Yao commented on FLINK-8434:
-

[~tiemsn] I think this is a duplicate of FLINK-7805. One of the tickets should 
be closed.

> The new yarn resource manager should take over the running task managers 
> after failover
> ---
>
> Key: FLINK-8434
> URL: https://issues.apache.org/jira/browse/FLINK-8434
> Project: Flink
>  Issue Type: Bug
>  Components: Cluster Management
>Affects Versions: 1.5.0
>Reporter: shuai.xu
>Assignee: shuai.xu
>Priority: Major
>  Labels: flip-6
>
> The application master, which contains the job master and the YARN resource 
> manager, may fail over while the job is running on YARN. The new resource 
> manager should take over the running task managers after it starts. But 
> currently the YarnResourceManager does not record the running containers in 
> workerNodeMap, so when task managers register with it, it rejects them.
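>
> A minimal sketch of one possible fix, assuming YARN's
> RegisterApplicationMasterResponse#getContainersFromPreviousAttempts() and
> Flink's YarnWorkerNode wrapper; the surrounding field names follow the
> ticket, but the wiring is illustrative:
> {code:java}
> // On AM restart, record containers surviving from the previous attempt in
> // workerNodeMap so that re-registering task managers are recognized rather
> // than rejected.
> for (Container container : registerResponse.getContainersFromPreviousAttempts()) {
>     ResourceID resourceId = new ResourceID(container.getId().toString());
>     workerNodeMap.put(resourceId, new YarnWorkerNode(container));
> }
> {code}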



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)