[GitHub] flink issue #3525: [FLINK-6020]add a random integer suffix to blob key to av...

2017-05-18 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3525
  
@tillrohrmann I've tried with your commit and the issue is resolved, 
thanks. Closing this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3525: [FLINK-6020]add a random integer suffix to blob ke...

2017-05-18 Thread WangTaoTheTonic
Github user WangTaoTheTonic closed the pull request at:

https://github.com/apache/flink/pull/3525


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3525: [FLINK-6020]add a random integer suffix to blob key to av...

2017-05-16 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3525
  
Thanks for your fix, i'll check in a day or two.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3525: [FLINK-6020]add a random integer suffix to blob key to av...

2017-05-08 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3525
  
That looks good to me. Looking forward to fix from @tillrohrmann. Thank you 
very much :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3525: [FLINK-6020]add a random integer suffix to blob key to av...

2017-05-08 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3525
  
For HA case, the blob server will upload jars to HDFS for recovery, and 
there's a cocurrent operations here too. I'm not sure if the solutions ou 
proposed can cover that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3727: [FLINK-6312]update curator version to 2.12.0 to av...

2017-05-04 Thread WangTaoTheTonic
Github user WangTaoTheTonic closed the pull request at:

https://github.com/apache/flink/pull/3727


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3709: [FLINK-6295]Update suspended ExecutionGraph to lower late...

2017-04-24 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3709
  
@StephanEwen Sure. The current fix is like a "pull", while what you suggest 
is a "push" way. Both them can fix just make difference in how the EGs being 
updated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3727: [FLINK-6312]update curator version to 2.12.0 to avoid pot...

2017-04-24 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3727
  
@StephanEwen All tests passed!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3745: [FLINK-6341]Don't let JM fall into infinite loop

2017-04-24 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3745
  
I see. Connection ID is added, please check if it's ok :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3745: [FLINK-6341]Don't let JM fall into infinite loop

2017-04-23 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3745
  
@tillrohrmann What problem it will bring if we access 
`currentResourceManager` from another thread? It is a variable in JobManager 
and can be shared across multi threads, right? The new added code just read it 
and there's no cocurrency problem coming, i think.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3709: [FLINK-6295]use LoadingCache instead of WeakHashMap to lo...

2017-04-23 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3709
  
I've testet and the function is ok. Please check if it's good to go, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3709: [FLINK-6295]use LoadingCache instead of WeakHashMap to lo...

2017-04-20 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3709
  
All right. I'll change as you suggest and verify the result. Thanks for 
comments and advise :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3745: [FLINK-6341]Don't let JM fall into infinite loop

2017-04-20 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/flink/pull/3745

[FLINK-6341]Don't let JM fall into infinite loop

When TaskManager register to JobManager, JM will send a 
"NotifyResourceStarted" message to kick off Resource Manager, then trigger a 
reconnection to resource manager through sending a 
"TriggerRegistrationAtJobManager".
When the ref of resource manager in JobManager is not None and the 
reconnection is to same resource manager, JobManager will go to a infinite 
message sending loop which will always sending himself a 
"ReconnectResourceManager" every 2 seconds.
We have already observed that phonomenon. More details, check how 
JobManager handles `ReconnectResourceManager`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/flink FLINK-6341

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3745.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3745


commit 8eb4dd42a71d9830c91b3db824e3133ce3d35c08
Author: WangTaoTheTonic <wangtao...@huawei.com>
Date:   2017-04-20T12:28:10Z

Don't let JM fall into infinite loop




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3709: [FLINK-6295]use LoadingCache instead of WeakHashMap to lo...

2017-04-20 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3709
  
My main concern is that the status showing in web doesn't match the actual 
state backend. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3709: [FLINK-6295]use LoadingCache instead of WeakHashMap to lo...

2017-04-20 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3709
  
I got it, but still have one question: what about the other state 
transition? Like when job is cancelling or failing or else? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3709: [FLINK-6295]use LoadingCache instead of WeakHashMap to lo...

2017-04-19 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3709
  
@zentol How do we know if a job requested is supended or not, as the status 
of jobs in backend is alway changing?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3727: [FLINK-6312]update curator version to 2.12.0 to avoid pot...

2017-04-19 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3727
  
Sure. Seems like it will take a little long time but i'll try my best :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3709: [FLINK-6295]use LoadingCache instead of WeakHashMap to lo...

2017-04-19 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3709
  
That means every time EGHolder received a request, it will check if the job 
status in request is suspended or not, right?  This will make cache in EGHolder 
unmeaningful.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3709: [FLINK-6295]use LoadingCache instead of WeakHashMap to lo...

2017-04-18 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3709
  
Ok i think i've got your point.

Now using WeakHashMap, we add entries when the map doesn't contain the 
requested EG id,  remove invalid entries when GC happens.

By adding `small 2-line branch` as you suggest, we add entries as same way 
as before, but check if a entry is valid when it's accessed by a handler, and 
update/remove it if it's invalid. Is it right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3414: [FLINK-5904][YARN]make jobmanager.heap.mb and taskmanager...

2017-04-18 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3414
  
Thanks. I've resolved conflicts. enjoy :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3709: [FLINK-6295]use LoadingCache instead of WeakHashMap to lo...

2017-04-18 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3709
  
I mean who's in charge of updating EGHolder? EGHolder itself or JobManager? 
EGHolder don't sense status changing of jobs until it queries from JobManager 
periodically.

If JobManager took the responsibility, so it will be a listenser design 
pattern, i guess? Would it be too complicated as now EGHolder is just a light 
weighted cache?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3709: [FLINK-6295]use LoadingCache instead of WeakHashMap to lo...

2017-04-18 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3709
  
In my opinion EGHolder is simply a cache which should not be assigned too 
complicated task.

If we add the check logic, how long it should be? Will other events 
afftects status of tasks? I believe there're more concerns if we added it. 

This fix only change internal data structures and decouple with both 
JobManager and web frontend. 

I am not sure why we are reducing usage of guava, but it sounds not a very 
good idea :(


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3709: [FLINK-6295]use LoadingCache instead of WeakHashMap to lo...

2017-04-18 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3709
  
I'm not a akka expert. As we observed, the status of cancelled tasks will 
be updated to running only when gc happens in JM.
Way to reproduce:
1. launch a flink job with ha mode
2. restart zookeeper(to make tasks failed)
3. after tasks recovered, check if status of tasks are running or 
cancelled(if there's gc happens, tasks' status showed in web frontend will be 
same with the actual states, or the tasks' status are delayed, may cause 
inconsistend with those in backend)

We oberved such phenomemon in yarn mode, and it is fixed after this patch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3709: [FLINK-6295]use LoadingCache instead of WeakHashMap to lo...

2017-04-18 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3709
  
@zentol No you're wrong.

If you take a look at `ExecutionGraphHolder`, you'll find the graphs in it 
are generated from message answered by JobManager, which means there's no 
reference from JobManager but only from handlers in netty web backend. Once 
there's no reference from those handlers, they would be garbage collected no 
matter the actual job is running or recovering.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3709: [FLINK-6295]use LoadingCache instead of WeakHashMap to lo...

2017-04-18 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3709
  
@zentol The execution graphs cached in `ExecutionGraphHolder`(which is 
backed by a WeakHashMap) will be evicted only when gc happens.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3709: [FLINK-6295]use LoadingCache instead of WeakHashMap to lo...

2017-04-18 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3709
  
@wenlong88 LoadingCache can also cache and evict data as WeakHashMap, as 
this implementation shows it will evict data every 30 seconds and fetch data if 
it doesn't contain the required key.

@zentol You're right. The data structures used doesn't matter, while what 
is showed in web frontend and how they are updated does.  I don't think user 
can tasks' stauts update only triggered by JobManager GC(which could be a very 
long time).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3727: [FLINK-6312]update curator version to 2.12.0 to av...

2017-04-17 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/flink/pull/3727

[FLINK-6312]update curator version to 2.12.0 to avoid potential block

As there's a Major 
bug([CURATOR-344](https://issues.apache.org/jira/browse/CURATOR-344)) in 
curator release used by flink, we need to update the release to 2.12.0 to avoid 
potential block in flink.

 (flink use recipes in checkpoint coordinator and we have already occurred 
problem in zookeeper failover when we're trying to fix 
[FLINK-6174](https://issues.apache.org/jira/browse/FLINK-6174))

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/flink FLINK-6312

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3727.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3727


commit 014b042006c0d6b1939e00c68e7d15ce8262400a
Author: WangTaoTheTonic <wangtao...@huawei.com>
Date:   2017-04-17T06:55:27Z

update curator version to 2.12.0 to avoid potential block




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3709: [FLINK-6295]use LoadingCache instead of WeakHashMa...

2017-04-11 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/flink/pull/3709

[FLINK-6295]use LoadingCache instead of WeakHashMap to lower latency

Now in ExecutionGraphHolder, which is used in many handlers, we use a 
WeakHashMap to cache ExecutionGraph(s), which is only sensitive to garbage 
collection.

The latency is too high when JVM do GC rarely, which will make status of 
jobs or its tasks unmatched with the real ones. (WE once observed that the web 
still shows tasks cancelled/failed, after the actual states of tasks coming 
back to normal for **30+ mins,** until a gc happened)

LoadingCache is a common used cache implementation from guava lib, we can 
use its time based eviction to lower latency of status update.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/flink FLINK-6295

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3709.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3709


commit d76ced06242623d150f9ad09205e2b92f910c1a1
Author: WangTaoTheTonic <wangtao...@huawei.com>
Date:   2017-04-11T11:48:52Z

use LoadingCache instead of WeakHashMap to lower latency




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3704: [FLINK-5756] Replace RocksDB dependency with FRocksDB

2017-04-09 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3704
  
Could you tell what modifications are done in "FRocksDB" and post the url 
of source code repository?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3525: [FLINK-6020]add a random integer suffix to blob key to av...

2017-04-05 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3525
  
hi stephan, could you help review? @StephanEwen 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3408: [FLINK-5903][YARN]respect taskmanager.numberOfTaskSlots a...

2017-03-29 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3408
  
@tillrohrmann 
After changing code the test results(both session and single mode) is like:

| Configurations   | #vcores of container(TM) | 
#slots of TM |
|  |  | 
 |
| -s/-ys 5,  yarn.containers.vcores: 4,  taskmanager.numberOfTaskSlots: 3 | 
4| 5|
| yarn.containers.vcores: 4,  taskmanager.numberOfTaskSlots: 3 | 4  
  | 3|
| yarn.containers.vcores: 4| 4| 1   
 |
| -s/-ys 5,  taskmanager.numberOfTaskSlots: 3 | 5| 
5|
| taskmanager.numberOfTaskSlots: 3 | 3| 3   
 |
| Nothing to specify, all use default  | 1| 1   
 |

Please check :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3599: [FLINK-6174][HA]introduce a SMARTER leader latch to make ...

2017-03-29 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3599
  
@wenlong88 Feel free to review, thanks :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3599: [FLINK-6174][HA]introduce a SMARTER leader latch to make ...

2017-03-29 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3599
  
@StephanEwen 
I've done the changes, which introduce a new smarter leader latch(the 
reason why i write a new class is that `handleStateChange` method is private in 
`LeaderLatch` and cannot be overrided) which will wait a connection timeout 
duration when connection to zookeeper is broken, instead of revoke leadership 
immediately.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3408: [FLINK-5903][YARN]respect taskmanager.numberOfTaskSlots a...

2017-03-29 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3408
  
All right. That makes sense. Let me rephrase that and please check if we 
are in same channel:

1. slots of taskmanager is decided by `-s/-ys` and 
`taskmanager.numberOfTaskSlots`, the former has higher priority; and 
2. the vcores of yarn container is decided by `yarn.containers.vcores`, 
which will use values of `-s/-ys` or `taskmanager.numberOfTaskSlots` if user 
doesn't set `yarn.containers.vcores` explicitly.

Is that right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3408: [FLINK-5903][YARN]respect taskmanager.numberOfTaskSlots a...

2017-03-28 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3408
  
@tillrohrmann I still cannot get your point entirely. don't this three 
configs(`-s/-ys`, `yarn.containers.vcores` and `taskmanager.numberOfTaskSlots` 
mean same thing? Do they have difference in usage except priority?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3617: [FLINK-6192]reuse zookeeper client created by CuratorFram...

2017-03-28 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3617
  
Can you please review https://github.com/apache/flink/pull/3408, by the 
way? @tillrohrmann 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3617: [FLINK-6192]reuse zookeeper client created by CuratorFram...

2017-03-27 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3617
  
Glad we have same idea :) @tillrohrmann 

I'll mark the JIRA duplicated now and close this PR as soon as you open the 
new one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3617: [FLINK-6192]reuse zookeeper client created by Cura...

2017-03-27 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/flink/pull/3617

[FLINK-6192]reuse zookeeper client created by CuratorFramework

Now in yarn mode, there're three places using zookeeper client(web monitor, 
jobmanager and resourcemanager) in ApplicationMaster/JobManager, while there're 
two in TaskManager. They create new one zookeeper client when they need them.

I believe there're more other places do the same thing, but in one JVM, one 
CuratorFramework is enough for connections to one zookeeper client, so we need 
a singleton to reuse them.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/flink FLINK-6192

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3617.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3617


commit 3d0c164a0d41fcd5c88fe7b959062827eb1a5909
Author: WangTaoTheTonic <wangtao...@huawei.com>
Date:   2017-03-27T07:43:01Z

reuse zookeeper client created by CuratorFramework




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3614: [FLINK-6189][YARN]Do not use yarn client config to...

2017-03-25 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/flink/pull/3614

[FLINK-6189][YARN]Do not use yarn client config to do sanity check

Now in client, if #slots is greater than then number of 
"yarn.nodemanager.resource.cpu-vcores" in yarn client config, the submission 
will be rejected.

It makes no sense as the actual vcores of node manager is decided in 
cluster side, but not in client side. If we don't set the config or don't set 
the right value of it(indeed this config is not a mandatory), it should not 
affect flink submission.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/flink FLINK-6189

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3614.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3614


commit eef3bba405557c6f7c55aee6983bb1bd9ade7501
Author: WangTaoTheTonic <wangtao...@huawei.com>
Date:   2017-03-25T10:00:35Z

Do not use yarn client config to do sanity check




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3599: [FLINK-6174][HA]introduce a new election service to make ...

2017-03-24 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3599
  
I don't think it's a good idea, as it can not solve the "split brain" issue 
too.

The key problem is that `LeaderLatch` in curator is too sensitive to 
connection state to Zookeeper(it will revoke leadership when connection to 
zookeeper is temporarily broke), and probably the best way is offerring a 
"duller" LeaderLatch, which can be also used in standalone cluster.

I did same work in our own private Spark release, let me see if it can be 
reused.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3408: [FLINK-5903][YARN]respect taskmanager.numberOfTaskSlots a...

2017-03-23 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3408
  
@tillrohrmann @StephanEwen 

The code was changed and I've verified the functions, could you please 
review this and merge it if it's good to go?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3599: [FLINK-6174][HA]introduce a new election service to make ...

2017-03-23 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3599
  
Thanks for your comments @wenlong88 .

I also gave a thought about adding retry logic when zk failover, but this 
part should modify `LeaderLatch` in curator, which is a 3rd party library, or 
we can only add a our private LeaderLatch through coping most parts of the 
implementation in curator.

Even with adding this AlwaysLeaderService, the JM failover can also go well 
as RM will start a new instance.

about FLIP-6, I'll check the solution and find if anything can help with 
this :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3599: [FLINK-6174][HA]introduce a new election service t...

2017-03-22 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/flink/pull/3599

[FLINK-6174][HA]introduce a new election service to make JobManager always 
available

Now in yarn mode, if we use zookeeper as high availability choice, it will 
create a election service to get a leader depending on zookeeper election.

When zookeeper leader crashes or the connection between JobManager and 
zookeeper instance was broken, JobManager's leadership will be revoked and send 
a Disconnect message to TaskManager, which will cancle all running tasks and 
make them waiting connection rebuild between JM and ZK.

In yarn mode, we have one and only JobManager(AM) in same time, and it 
should be alwasy leader instead of elected through zookeeper. We can introduce 
a new leader election service in yarn mode to achive that.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/flink FLINK-6174

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3599


commit a758ec6320b026fb5d767cdc190c29c8043838da
Author: WangTaoTheTonic <wangtao...@huawei.com>
Date:   2017-03-23T03:15:40Z

introduce a new election service to make JobManager always available




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3525: [FLINK-6020]add a random integer suffix to blob key to av...

2017-03-20 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3525
  
ping @StephanEwen 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3525: [FLINK-6020]add a random integer suffix to blob key to av...

2017-03-17 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3525
  
Right...I have same thought as you at the beginning and i've tried to make 
the move atomic but it has serveral side affect, like:
1. if we use this way to handle this, which means two job can share the 
same jar file in blobserver, it will be a problem when one of them being 
canceled and deleting its jars(now it seems like it doesn't do the delete, but 
it should do)
2. for job recovery(or other kind of recovery, i'm not sure, just observed 
the phenomenon) blob server will upload jars to hdfs using same name of local 
file. Even the two jobs share same jar in blob store, they will upload it twice 
at same time, which will cause file lease occuptation in hdfs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3525: [FLINK-6020]add a random integer suffix to blob key to av...

2017-03-17 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3525
  
The second rename will not fail, but make the file which written by the 
first corrupted, which will make the first job failed if the task is loading 
this jar.

by the way, the jar file will be uploaded to hdfs for recovery, and the 
uploading will fail too if there are more than two clients writing file with 
same name.

It is easy to reoccur. First launch a session with enough slots, then run a 
script contains many same job submitting, says there are 20 lines of "flink run 
../examples/steaming/WindowJoin.jar &". Make sure there's a "&" in end of each 
line to make them run in parallel.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3486: [FLINK-5981][SECURITY]make ssl version and cipher suites ...

2017-03-16 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3486
  
Fixed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2425: FLINK-3930 Added shared secret based authorization...

2017-03-15 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on a diff in the pull request:

https://github.com/apache/flink/pull/2425#discussion_r106335560
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/CookieHandler.java
 ---
@@ -0,0 +1,130 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.netty;
+
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import io.netty.channel.Channel;
+import io.netty.channel.ChannelHandlerContext;
+import io.netty.channel.ChannelInboundHandlerAdapter;
+import io.netty.handler.codec.MessageToMessageDecoder;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+public class CookieHandler {
+
+   public static class ClientCookieHandler extends 
ChannelInboundHandlerAdapter {
+
+   private final Logger LOG = 
LoggerFactory.getLogger(ClientCookieHandler.class);
+
+   private final String secureCookie;
+
+   final Charset DEFAULT_CHARSET = Charset.forName("utf-8");
+
+   public ClientCookieHandler(String secureCookie) {
+   this.secureCookie = secureCookie;
+   }
+
+   @Override
+   public void channelActive(ChannelHandlerContext ctx) throws 
Exception {
+   super.channelActive(ctx);
+   LOG.debug("In channelActive method of 
ClientCookieHandler");
+
+   if(this.secureCookie != null && 
this.secureCookie.length() != 0) {
+   LOG.debug("In channelActive method of 
ClientCookieHandler -> sending secure cookie");
+   final ByteBuf buffer = Unpooled.buffer(4 + 
this.secureCookie.getBytes(DEFAULT_CHARSET).length);
+   
buffer.writeInt(secureCookie.getBytes(DEFAULT_CHARSET).length);
+   
buffer.writeBytes(secureCookie.getBytes(DEFAULT_CHARSET));
+   ctx.writeAndFlush(buffer);
+   }
+   }
+   }
+
+   public static class ServerCookieDecoder extends 
MessageToMessageDecoder {
+
+   private final String secureCookie;
+
+   private final List channelList = new ArrayList<>();
+
+   private final Charset DEFAULT_CHARSET = 
Charset.forName("utf-8");
+
+   private final Logger LOG = 
LoggerFactory.getLogger(ServerCookieDecoder.class);
+
+   public ServerCookieDecoder(String secureCookie) {
+   this.secureCookie = secureCookie;
+   }
+
+   /**
+* Decode from one message to an other. This method will be 
called for each written message that can be handled
+* by this encoder.
+*
+* @param ctx the {@link ChannelHandlerContext} which this 
{@link MessageToMessageDecoder} belongs to
+* @param msg the message to decode to an other one
+* @param out the {@link List} to which decoded messages should 
be added
+* @throws Exception is thrown if an error accour
+*/
+   @Override
+   protected void decode(ChannelHandlerContext ctx, ByteBuf msg, 
List out) throws Exception {
+
+   LOG.debug("ChannelHandlerContext name: {}, channel: 
{}", ctx.name(), ctx.channel());
+
+   if(secureCookie == null || secureCookie.length() == 0) {
+   LOG.debug("Not validating secure cookie since 
the server configuration is not enabled to use cookie");
+   return;
+   }
+
+   LOG.debug("Going to decode the secure co

[GitHub] flink pull request #2425: FLINK-3930 Added shared secret based authorization...

2017-03-15 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on a diff in the pull request:

https://github.com/apache/flink/pull/2425#discussion_r106335331
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/CookieHandler.java
 ---
@@ -0,0 +1,130 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.netty;
+
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import io.netty.channel.Channel;
+import io.netty.channel.ChannelHandlerContext;
+import io.netty.channel.ChannelInboundHandlerAdapter;
+import io.netty.handler.codec.MessageToMessageDecoder;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+public class CookieHandler {
+
+   public static class ClientCookieHandler extends 
ChannelInboundHandlerAdapter {
+
+   private final Logger LOG = 
LoggerFactory.getLogger(ClientCookieHandler.class);
+
+   private final String secureCookie;
+
+   final Charset DEFAULT_CHARSET = Charset.forName("utf-8");
+
+   public ClientCookieHandler(String secureCookie) {
+   this.secureCookie = secureCookie;
+   }
+
+   @Override
+   public void channelActive(ChannelHandlerContext ctx) throws 
Exception {
+   super.channelActive(ctx);
+   LOG.debug("In channelActive method of 
ClientCookieHandler");
+
+   if(this.secureCookie != null && 
this.secureCookie.length() != 0) {
+   LOG.debug("In channelActive method of 
ClientCookieHandler -> sending secure cookie");
+   final ByteBuf buffer = Unpooled.buffer(4 + 
this.secureCookie.getBytes(DEFAULT_CHARSET).length);
+   
buffer.writeInt(secureCookie.getBytes(DEFAULT_CHARSET).length);
+   
buffer.writeBytes(secureCookie.getBytes(DEFAULT_CHARSET));
+   ctx.writeAndFlush(buffer);
+   }
+   }
+   }
+
+   public static class ServerCookieDecoder extends 
MessageToMessageDecoder {
+
+   private final String secureCookie;
+
+   private final List channelList = new ArrayList<>();
--- End diff --

Is it better to use `Set` instead of a `List` here? As it is mainly used 
for lookup.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3486: [FLINK-5981][SECURITY]make ssl version and cipher suites ...

2017-03-15 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3486
  
Since not merged, I've turned them around. Sorry for the carelessness :(


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3486: [FLINK-5981][SECURITY]make ssl version and cipher suites ...

2017-03-14 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3486
  
@vijikarthi I've checked the [JDK 
doc](http://docs.oracle.com/javase/8/docs/technotes/guides/security/StandardNames.html#ciphersuites)
 and not found any notes about combination of ssl version and ciper suites. 
About cihper suites, it says `The following list contains the standard JSSE 
cipher suite names. Over time, various groups have added additional cipher 
suites to the SSL/TLS namespace. `, so i think we better not add additional 
description about that and let user to follow JRE/JDK rules.

@StephanEwen Two comments you mentioned was fixed :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3486: [FLINK-5981][SECURITY]make ssl version and cipher suites ...

2017-03-14 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3486
  
thanks for review @vijikarthi. I will check if there are mismatch between 
protocols and cipher suites and document it if any.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3486: [FLINK-5981][SECURITY]make ssl version and cipher suites ...

2017-03-14 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3486
  
@StephanEwen I've added some test cases for testing new function and a 
ITCase to prove akka cannot accept more than one protocol setting. Let me know 
if there's better way to implementation :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3525: [FLINK-6020]add a random integer suffix to blob ke...

2017-03-13 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/flink/pull/3525

[FLINK-6020]add a random integer suffix to blob key to avoid naming 
conflicting

In yarn-cluster mode, if we submit one same job multiple times parallelly, 
the task will encounter class load problem and lease occuputation.

Because blob server stores user jars in name with generated sha1sum of 
those, first writes a temp file and move it to finalialize. For recovery it 
also will put them to HDFS with same file name.

In same time, when multiple clients sumit same job with same jar, the local 
jar files in blob server and those file on hdfs will be handled in multiple 
threads(BlobServerConnection), and impact each other.

I've found a way to solve this by adding a random integer suffix to blob 
key. Like changed here.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/flink FLINK-6020

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3525.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3525


commit 3d9f41afad9c831431b3c7bd0eb2a8006b92718e
Author: WangTaoTheTonic <wangtao...@huawei.com>
Date:   2017-03-13T11:52:36Z

add a random integer suffix to blob key to avoid naming conflicting




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3408: [FLINK-5903][YARN]respect taskmanager.numberOfTaskSlots a...

2017-03-12 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3408
  
ping @tillrohrmann 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3414: [FLINK-5904][YARN]make jobmanager.heap.mb and taskmanager...

2017-03-12 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3414
  
@tillrohrmann I've changed per comments. Mind reviewing again? Thanks :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3408: [FLINK-5903][YARN]respect taskmanager.numberOfTaskSlots a...

2017-03-08 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3408
  
I move the initiallization of this config to constructor of cluster 
descripter and restore the deleted configuration setting.
Please check if we are good with the usage of `YARN_VCORES`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3408: [FLINK-5903][YARN]respect taskmanager.numberOfTask...

2017-03-08 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on a diff in the pull request:

https://github.com/apache/flink/pull/3408#discussion_r104867661
  
--- Diff: 
flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java ---
@@ -537,7 +543,6 @@ public YarnClusterClient createCluster(
Preconditions.checkNotNull(userJarFiles, "User jar files should 
not be null.");
 
AbstractYarnClusterDescriptor yarnClusterDescriptor = 
createDescriptor(applicationName, cmdLine);
-   yarnClusterDescriptor.setFlinkConfiguration(config);
--- End diff --

all right


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3414: [FLINK-5904][YARN]make jobmanager.heap.mb and task...

2017-03-08 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on a diff in the pull request:

https://github.com/apache/flink/pull/3414#discussion_r104862993
  
--- Diff: 
flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java ---
@@ -306,12 +308,16 @@ public AbstractYarnClusterDescriptor 
createDescriptor(String defaultApplicationN
if (cmd.hasOption(JM_MEMORY.getOpt())) {
int jmMemory = 
Integer.valueOf(cmd.getOptionValue(JM_MEMORY.getOpt()));
yarnClusterDescriptor.setJobManagerMemory(jmMemory);
+   } else if 
(config.containsKey(ConfigConstants.JOB_MANAGER_HEAP_MEMORY_KEY)) {
+   
yarnClusterDescriptor.setJobManagerMemory(config.getInteger(ConfigConstants.JOB_MANAGER_HEAP_MEMORY_KEY,
 ConfigConstants.DEFAULT_JOB_MANAGER_HEAP_MEMORY));
--- End diff --

That's a good idea. we don't need to set it explicitly here if we init it 
in constructor :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3408: [FLINK-5903][YARN]respect taskmanager.numberOfTask...

2017-03-08 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on a diff in the pull request:

https://github.com/apache/flink/pull/3408#discussion_r104862308
  
--- Diff: 
flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java ---
@@ -317,6 +319,10 @@ public AbstractYarnClusterDescriptor 
createDescriptor(String defaultApplicationN
if (cmd.hasOption(SLOTS.getOpt())) {
int slots = 
Integer.valueOf(cmd.getOptionValue(SLOTS.getOpt()));
yarnClusterDescriptor.setTaskManagerSlots(slots);
+   } else if (config.containsKey(ConfigConstants.YARN_VCORES)) {
--- End diff --

the document says YARN_VCORE > slotsOfTaskManager > default value(1) shen 
they are set in file.
the parameters in command line will be used before those in config file, in 
some way it is a common sense :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3408: [FLINK-5903][YARN]respect taskmanager.numberOfTask...

2017-03-07 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on a diff in the pull request:

https://github.com/apache/flink/pull/3408#discussion_r104855322
  
--- Diff: 
flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java ---
@@ -317,6 +319,10 @@ public AbstractYarnClusterDescriptor 
createDescriptor(String defaultApplicationN
if (cmd.hasOption(SLOTS.getOpt())) {
int slots = 
Integer.valueOf(cmd.getOptionValue(SLOTS.getOpt()));
yarnClusterDescriptor.setTaskManagerSlots(slots);
+   } else if (config.containsKey(ConfigConstants.YARN_VCORES)) {
--- End diff --

And in document, there still has introduction of YARN_VCORES.

```
yarn.containers.vcores The number of virtual cores (vcores) per YARN 
container. By default, the 
number of vcores is set to the number of slots per TaskManager, if set, or 
to 1, otherwise.
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3414: [FLINK-5904][YARN]make jobmanager.heap.mb and task...

2017-03-07 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on a diff in the pull request:

https://github.com/apache/flink/pull/3414#discussion_r104854352
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/configuration/ConfigConstants.java ---
@@ -110,7 +110,12 @@
public static final String EXECUTION_RETRY_DELAY_KEY = 
"execution-retries.delay";

//  Runtime 
---
-   
+
+   /**
+* JVM heap size (in megabytes) for the JobManager
+*/
+   public static final String JOB_MANAGER_HEAP_MEMORY_KEY = 
"jobmanager.heap.mb";
--- End diff --

Nice. I've changed :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3414: [FLINK-5904][YARN]make jobmanager.heap.mb and task...

2017-03-07 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on a diff in the pull request:

https://github.com/apache/flink/pull/3414#discussion_r104853211
  
--- Diff: 
flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java ---
@@ -306,12 +308,16 @@ public AbstractYarnClusterDescriptor 
createDescriptor(String defaultApplicationN
if (cmd.hasOption(JM_MEMORY.getOpt())) {
int jmMemory = 
Integer.valueOf(cmd.getOptionValue(JM_MEMORY.getOpt()));
yarnClusterDescriptor.setJobManagerMemory(jmMemory);
+   } else if 
(config.containsKey(ConfigConstants.JOB_MANAGER_HEAP_MEMORY_KEY)) {
+   
yarnClusterDescriptor.setJobManagerMemory(config.getInteger(ConfigConstants.JOB_MANAGER_HEAP_MEMORY_KEY,
 ConfigConstants.DEFAULT_JOB_MANAGER_HEAP_MEMORY));
--- End diff --

do you mean we should only create `jobManagerMemoryMb` object but not init 
it with value?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3408: [FLINK-5903][YARN]respect taskmanager.numberOfTaskSlots a...

2017-03-07 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3408
  
I'm sorry about the commit message :(
next time I'll format it, as it's better not to squash.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3408: [FLINK-5903][YARN]respect taskmanager.numberOfTask...

2017-03-07 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on a diff in the pull request:

https://github.com/apache/flink/pull/3408#discussion_r104837998
  
--- Diff: 
flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java ---
@@ -537,7 +543,6 @@ public YarnClusterClient createCluster(
Preconditions.checkNotNull(userJarFiles, "User jar files should 
not be null.");
 
AbstractYarnClusterDescriptor yarnClusterDescriptor = 
createDescriptor(applicationName, cmdLine);
-   yarnClusterDescriptor.setFlinkConfiguration(config);
--- End diff --

@tillrohrmann Under what condition this configuration will contains values 
added programmatically? I've checked the codes and only found this config is 
initiallized from config file. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3408: [FLINK-5903][YARN]respect taskmanager.numberOfTask...

2017-03-07 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on a diff in the pull request:

https://github.com/apache/flink/pull/3408#discussion_r104836285
  
--- Diff: 
flink-yarn/src/main/java/org/apache/flink/yarn/cli/FlinkYarnSessionCli.java ---
@@ -317,6 +319,10 @@ public AbstractYarnClusterDescriptor 
createDescriptor(String defaultApplicationN
if (cmd.hasOption(SLOTS.getOpt())) {
int slots = 
Integer.valueOf(cmd.getOptionValue(SLOTS.getOpt()));
yarnClusterDescriptor.setTaskManagerSlots(slots);
+   } else if (config.containsKey(ConfigConstants.YARN_VCORES)) {
--- End diff --

@tillrohrmann You mean YARN_VCORES is deprecated now? After checking code I 
found there're two places where we still use it: 
[here](https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java#L317)
 and 
[here](https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnFlinkResourceManager.java#L339),
 especially the latter one is used for container request.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3486: [FLINK-5981][SECURITY]make ssl version and cipher ...

2017-03-07 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on a diff in the pull request:

https://github.com/apache/flink/pull/3486#discussion_r104828499
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/net/SSLUtils.java ---
@@ -55,6 +58,42 @@ public static boolean getSSLEnabled(Configuration 
sslConfig) {
}
 
/**
+* Sets SSl version and cipher suites for SSLServerSocket
+* @param socket
+*Socket to be handled
+* @param config
+*The application configuration
+*/
+   public static void setSSLVerAndCipherSuites(ServerSocket socket, 
Configuration config) {
+   if (socket instanceof SSLServerSocket) {
+   ((SSLServerSocket) 
socket).setEnabledProtocols(config.getString(
+   ConfigConstants.SECURITY_SSL_PROTOCOL,
+   
ConfigConstants.DEFAULT_SECURITY_SSL_PROTOCOL).split(","));
--- End diff --

By "do explicit handing", do you mean we should check if the elements in 
split resutls are both legal(must be one of `SSLv3, TLSv1, TLSv1.1`)?

BTW/FYI: This config also applies to akka, and it seems like akka only 
support single value setting but not multiple ones(It throws exception when I 
set the value to `TLSv1.1,TLSv1.2`).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3415: [FLINK-5916][YARN]make env.java.opts.jobmanager and env.j...

2017-03-07 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3415
  
@tillrohrmann Thanks for the review. I've added the logic checking if the 
config is non-empty and test cases :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3486: [FLINK-5981][SECURITY]make ssl version and cipher ...

2017-03-07 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/flink/pull/3486

[FLINK-5981][SECURITY]make ssl version and cipher suites work as configured

I configured ssl and start flink job, but found configured properties 
cannot apply properly:
```
akka port: only ciper suites apply right, ssl version not
blob server/netty server: both ssl version and ciper suites are not like 
what I configured
```
I've found out the reason why:

http://stackoverflow.com/questions/11504173/sslcontext-initialization (for 
blob server and netty server)
https://groups.google.com/forum/#!topic/akka-user/JH6bGnWE8kY(for akka ssl 
version, it's fixed in akka 2.4:https://github.com/akka/akka/pull/21078)

Configs:
```
security.ssl.protocol: TLSv1.1

security.ssl.algorithms: TLS_RSA_WITH_AES_128_CBC_SHA
```
**The scan results before:**

![before_blob_server](https://cloud.githubusercontent.com/assets/5276001/23655830/d37eb680-0371-11e7-952c-4a6514b1c42b.JPG)
**The scan results after fix:**

![after_blob_server](https://cloud.githubusercontent.com/assets/5276001/23655841/dfc09da0-0371-11e7-8486-bc807e877dff.JPG)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/flink FLINK-5981

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3486.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3486


commit c75c2e3f38e0a856ead1316223ad3d81061e4252
Author: WangTaoTheTonic <wangtao...@huawei.com>
Date:   2017-03-07T12:05:21Z

make ssl version and cipher suites work as configured




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3274: [FLINK-5723][UI]Use Used instead of Initial to make taskm...

2017-02-26 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3274
  
sure.

Fixed in 
 - 1.2.1 via e3e3c2a7f9c8dd8576e0e27b2efddb7ff42c7c0d 
 - 1.3.0 via 03e6c249156fbbfeef39397a70c70bb905469d09 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3274: [FLINK-5723][UI]Use Used instead of Initial to mak...

2017-02-26 Thread WangTaoTheTonic
Github user WangTaoTheTonic closed the pull request at:

https://github.com/apache/flink/pull/3274


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3415: [FLINK-5916][YARN]make env.java.opts.jobmanager an...

2017-02-24 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/flink/pull/3415

[FLINK-5916][YARN]make env.java.opts.jobmanager and 
env.java.opts.taskmanager working i…

Now only env.java.opts works in YARN mode, and it applies both to JM and 
TM. 

It's useful to make env.java.opts.jobmanager and env.java.opts.taskmanager 
working in YARN mode in addition, to support fine grained params setting.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/flink FLINK-5916

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3415.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3415


commit 63a4e82858109644c8c9f940fd891b9aea11090a
Author: WangTaoTheTonic <wangtao...@huawei.com>
Date:   2017-02-25T07:35:19Z

make env.java.opts.jobmanager and env.java.opts.taskmanager working in YARN 
mode




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3414: [FLINK-5904][YARN]make jobmanager.heap.mb and task...

2017-02-24 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/flink/pull/3414

[FLINK-5904][YARN]make jobmanager.heap.mb and taskmanager.heap.mb work in 
YARN mode

 I'm making these two configuration items same with "-yjm""-ytm" in yarn 
session and "-jm""-tm" in single job. Looks like it might get some 
misunderstanding.

We don't have a proper config item here. What's better idea do you suggest?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/flink FLINK-5904

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3414.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3414


commit b21d27ea99ffa613a8eb9c00a4f977b619b12418
Author: WangTaoTheTonic <wangtao...@huawei.com>
Date:   2017-02-25T04:19:43Z

make jobmanager.heap.mb and taskmanager.heap.mb work in YARN mode




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3408: [FLINK-5903][YARN]respect taskmanager.numberOfTask...

2017-02-24 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/flink/pull/3408

[FLINK-5903][YARN]respect taskmanager.numberOfTaskSlots and 
yarn.containers.vcores in YARN mode

Make sure taskmanager.numberOfTaskSlots and yarn.containers.vcores works in 
YARN mode. The priorities is: -s/-ys > yarn.containers.vcores > 
taskmanager.numberOfTaskSlots.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/flink FLINK-5903

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3408.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3408


commit 3515f9faf4b40bff7310a55b7094b52999525cbb
Author: WangTaoTheTonic <wangtao...@huawei.com>
Date:   2017-02-24T08:11:47Z

xxx




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-21 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3335
  
@greghogan I think it's more like an improvement rather than a new feature. 
Anyway I'll post to mailling list for discussion.

Thanks all guys :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-20 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3335
  
As sub dirs are created by different jobs/users under root directory, we 
keep it minimum(or configurable) at creation in order to keep the data safe.

When a user has needs of accessing checkpointing files of other users, 
we(admin or file owner) can give it right to access. This can be more flexible 
than setting ACLs in root directory and more fine grained, because each user 
can decide who can touch its checkpointing files ;)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-19 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3335
  
@greghogan I'm aware of that, but my concern is when lots of users store 
their checkpoint files under same root directory, it would be a burden for 
admin to set different ACLs for different needs, like user1 can read user2 and 
user3's files, while user2 can only read files of user1, while user3 would like 
read files of user4, while ...

Only set one ACL(like flink_admin) to allow one group to access all is not 
fine grained, as there is need that for some user (like user1), we only allow 
it to access some, not all, of sub directories(like sub directories user2 and 
user3 created).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-19 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3335
  
We are very close in scenario :)

My point is that multiple users would use same root directory to store 
their checkpoint files(creating single directory for each job is complex), 
which makes it very hard for admin to set a proper permissions for it.

Adding a configuration item is a very good idea. Would it be better if this 
configuration would be applied to the sub directories each job created? It will 
resolve isolation of access between different users' checkpoint file and also 
can be customized for migrating. @StephanEwen 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3274: [FLINK-5723][UI]Use Used instead of Initial to make taskm...

2017-02-18 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3274
  
@StephanEwen What do you think about this? Could you help merge this if 
you're ok with it? Thanks :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-17 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3335
  
I've read all guys and list preconditions and solutions for this directory 
permission setting. 

## Preconditions
1. Every flink job(session or single) can specify a directory storing 
checkpoint, called `state.backend.fs.checkpointdir`.
2. Different jobs can set same or different directories, which means their 
checkpoint files can be stored in one same or different directories, with 
**sub-dir** created with their own job-ids.
3. Jobs can be run by different users, and users has requirement that one 
could not read chp files written by another user, which will cause information 
leak.
4. In some condition(which is relatively rare, I think), as @StephanEwen 
said, users has need to access other users’ chp files for cloning/migrating 
jobs.
5. The chp files path is like: 
`hdfs://namenode:port/flink-checkpoints//chk-17/6ba7b810-9dad-11d1-80b4-00c04fd430c8`

## Solutions 
### Solution #1 (would not require changes)
1. Admins control permission of root directory via HDFS ACLs(set it like: 
user1 can read, user2 can only read, …).
2. This has two disadvantages: a) It is a huge burden for Admins to set 
different permissions for large number of users/groups); and b) sub-dirs 
inherited permissions from root directory, which means they are basically same, 
which make it hard to do fine grained control.
### Solution #2 (this proposal)
1. We don’t care what permission of the root dir is. It can be create 
while setup or job running, as long as it is available to use.
2. We control every sub-dir created by different jobs(which are submitted 
by different users, in most cases), and set it to a lower value(like “700”) 
to prevent it to be read by others.
3. If someone wanna migrate or clone jobs across users(again, this scenario 
is rare in my view), he should ask admins(normally HDFS admin) to add ACLs(or 
whatever) for this purpose.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3337: [FLINK-5825][UI]make the cite path relative to sho...

2017-02-16 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/flink/pull/3337

[FLINK-5825][UI]make the cite path relative to show it correctly

In yarn mode, the web frontend url is accessed from yarn in format like 
"http://spark-91-206:8088/proxy/application_1487122678902_0015/;, and the 
running job page's url is 
"http://spark-91-206:8088/proxy/application_1487122678902_0015/#/jobs/9440a129ea5899c16e7c1a7e8f2897b3;.
One .png file called "horizontal.png", which is very small can not be 
loaded in that mode, because in "index.styl" it is cited as absolute path.
We should make the path relative.

- [ ] Tests & Build
LATER I will paste difference between before and after.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/flink FLINK-5825

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3337.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3337


commit c2e2711b2f815a4aa1b9c8be2478c963c41da245
Author: WangTaoTheTonic <wangtao...@huawei.com>
Date:   2017-02-17T06:48:14Z

make the cite path relative to show it correctly




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-16 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3335
  
Hi @greghogan , I'm not sure I understand the relationship between HDFS 
ACLs and this change I proposed. Could you explain more specifically? Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3335: [FLINK-5818][Security]change checkpoint dir permission to...

2017-02-16 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3335
  
Hi Stephan,

You may have a little misunderstanding about this change. It only controls 
directories with job id (generated using UUID), but not the configured root 
checkpoint directory.  I agree with you that the root directory should be 
created or changed permission when setup, but setup would not be aware of these 
directories with job ids, which are created in runtime.

About Hadoop dependency, I admit I am using a convenient (let's say a hack 
way) to do the transition, as it need a bit more codes to do it standalone. I 
will change it if it's a problem :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3335: [FLINK-5818][Security]change checkpoint dir permis...

2017-02-16 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/flink/pull/3335

[FLINK-5818][Security]change checkpoint dir permission to 700

Now checkpoint directory is made w/o specified permission, so it is easy 
for another user to delete or read files under it, which will cause restore 
failure or information leak.

It's better to lower it down to 700.

- [x] Tests & Build
  - Functionality added by the pull request is covered by tests

![chp-filesystem-session](https://cloud.githubusercontent.com/assets/5276001/23019741/d753e8e0-f47e-11e6-9f2e-2cd35de35ef1.JPG)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/flink FLINK-5818

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3335.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3335


commit 02eef87dc2bfaa6737efad023916898719d34fe2
Author: WangTaoTheTonic <wangtao...@huawei.com>
Date:   2017-02-16T11:24:43Z

change checkpoint dir permission to 700




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3274: [FLINK-5723][UI]Use Used instead of Initial to make taskm...

2017-02-13 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3274
  
@zentol Is it ok to go?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3274: [FLINK-5723][UI]Use Used instead of Initial to make taskm...

2017-02-07 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3274
  
Thanks. That makes sense :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3274: [FLINK-5723][UI]Use Used instead of Initial to make taskm...

2017-02-07 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3274
  
Thanks @greghogan. Should I make another PR against branch `release-1.2`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3274: [FLINK-5723][UI]Use Used instead of Initial to make taskm...

2017-02-07 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3274
  
Travis timeout :) How to kick it up again?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3283: [FLINK-5729][EXAMPLES]add hostname option to be mo...

2017-02-07 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/flink/pull/3283

[FLINK-5729][EXAMPLES]add hostname option to be more convenient

"hostname" option will help users to get data from the right port, 
otherwise the example would fail easily due to connection refused, especially 
in yarn-session mode.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/flink hostname

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3283.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3283


commit 1a8117f8a70a6bd8e88b3a2cb245b4f9db4f9193
Author: WangTaoTheTonic <wangtao...@huawei.com>
Date:   2017-02-07T07:52:26Z

add hostname option to be more convenient




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3274: [FLINK-5723][UI]Use Used instead of Initial to make taskm...

2017-02-06 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3274
  
@zentol Thanks for review. I've changed .jade file, but it looks like CI is 
not up properly :(


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3274: [FLINK-5723][UI]Use Used instead of Initial to mak...

2017-02-06 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/flink/pull/3274

[FLINK-5723][UI]Use Used instead of Initial to make taskmanager tag more 
readable

Now in JobManager web fronted, the used memory of task managers is 
presented as "Initial" in table header, which actually means "memory used", 
from codes.

I'd like change it to be more readable, even it is trivial one.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/flink changeheader

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3274.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3274


commit 475e668aa1015c10cd69ffa88b6b0c4e3aeeb75f
Author: unknown <wangtaotheto...@163.com>
Date:   2017-02-06T15:18:38Z

Use Used instead of Initial to make taskmanager tag more readable




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3071: [FLINK-5417][DOCUMENTATION]correct the wrong config file ...

2017-01-16 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3071
  
Surely not :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3071: [FLINK-5417][DOCUMENTATION]correct the wrong config file ...

2017-01-16 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3071
  
I guess it is probably that the illustrator added sth.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3071: [FLINK-5417][DOCUMENTATION]correct the wrong config file ...

2017-01-13 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3071
  
Hi @zentol , I've updated the svg using Inkscape. 
Is the whitespace you refer to on the topest(the red highlighted part)? I 
think it's normal as the original one has them to. It would not affect view in 
documents.

![default](https://cloud.githubusercontent.com/assets/5276001/21953009/8eef0426-da67-11e6-850a-935268ad019e.JPG)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #3071: [FLINK-5417][DOCUMENTATION]correct the wrong config file ...

2017-01-05 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/flink/pull/3071
  
I use Illustrator‎ to edit svg file, which will add some header infos 
that cause CI failed. Is there any prefered svg editor?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #3071: [FLINK-5417][DOCUMENTATION]correct the wrong confi...

2017-01-05 Thread WangTaoTheTonic
GitHub user WangTaoTheTonic opened a pull request:

https://github.com/apache/flink/pull/3071

[FLINK-5417][DOCUMENTATION]correct the wrong config file name

As the config file name is conf/flink-conf.yaml, the usage 
"conf/flink-config.yaml" in document is wrong and easy to confuse user. We 
should correct them.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WangTaoTheTonic/flink outdate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3071.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3071


commit 8c927bda55fead6b4cc90f49151c19907ac3700f
Author: WangTaoTheTonic <wangtao...@huawei.com>
Date:   2017-01-06T04:12:31Z

fix the wrong config file name




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---