[GitHub] [flink] HuangZhenQiu commented on pull request #11541: [FLINK-15416][network] add task manager netty client retry mechenism

2020-06-21 Thread GitBox


HuangZhenQiu commented on pull request #11541:
URL: https://github.com/apache/flink/pull/11541#issuecomment-647147788


   @flinkbot run azure



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] HuangZhenQiu commented on pull request #11541: [FLINK-15416][network] add task manager netty client retry mechenism

2020-06-21 Thread GitBox


HuangZhenQiu commented on pull request #11541:
URL: https://github.com/apache/flink/pull/11541#issuecomment-647147774


   @flinkbot run travis
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] HuangZhenQiu commented on pull request #11541: [FLINK-15416][network] add task manager netty client retry mechenism

2020-06-20 Thread GitBox


HuangZhenQiu commented on pull request #11541:
URL: https://github.com/apache/flink/pull/11541#issuecomment-647079845


   @flinkbot run travis



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] HuangZhenQiu commented on pull request #11541: [FLINK-15416][network] add task manager netty client retry mechenism

2020-06-20 Thread GitBox


HuangZhenQiu commented on pull request #11541:
URL: https://github.com/apache/flink/pull/11541#issuecomment-647079689


   @flinkbot run azure



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] HuangZhenQiu commented on pull request #11541: [FLINK-15416][network] add task manager netty client retry mechenism

2020-06-19 Thread GitBox


HuangZhenQiu commented on pull request #11541:
URL: https://github.com/apache/flink/pull/11541#issuecomment-646864181


   @pnowojski @rkhachatryan 
   I think what @rkhachatryan proposed is the most concise and scalable 
solution. I rebased master with some small changes. Please take a look.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] HuangZhenQiu commented on pull request #11541: [FLINK-15416][network] add task manager netty client retry mechenism

2020-06-19 Thread GitBox


HuangZhenQiu commented on pull request #11541:
URL: https://github.com/apache/flink/pull/11541#issuecomment-646809118


   @rkhachatryan 
   I think the solution you proposed it much better. Let me update it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] HuangZhenQiu commented on pull request #11541: [FLINK-15416][network] add task manager netty client retry mechenism

2020-06-19 Thread GitBox


HuangZhenQiu commented on pull request #11541:
URL: https://github.com/apache/flink/pull/11541#issuecomment-646736438


   @pnowojski 
   If different threads are going to request 
   PartitionRequestClient for different connectionId, there is no resource 
conflicts. Thus, they can parallel run. But if more than two threads try to 
build PartitionRequestClient for the same connectionId, as @rkhachatryan 
pointed out the deadlock will happen.
   
   @pnowojski 
   Would you please help to trigger to Azure test?
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] HuangZhenQiu commented on pull request #11541: [FLINK-15416][network] add task manager netty client retry mechenism

2020-06-19 Thread GitBox


HuangZhenQiu commented on pull request #11541:
URL: https://github.com/apache/flink/pull/11541#issuecomment-646733511


   @rkhachatryan 
   Thanks for giving feedback on the PR. I think the 
   PR is to tackle different things compare to 
taskmanager.network.request-backoff.xxx. 
taskmanager.network.request-backoff.xxx is translated into delayMs in 
NettyPartitionRequestClient to request sub Partition. It is after the TCP 
connection is built. The retry added here is to make TCP connection creation 
more reliable.
   
   Did you test the latest version that has added synchronized on connectionID. 
It works locally from my side now. The Azure test has been not triggered it yet.
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] HuangZhenQiu commented on pull request #11541: [FLINK-15416][network] add task manager netty client retry mechenism

2020-06-14 Thread GitBox


HuangZhenQiu commented on pull request #11541:
URL: https://github.com/apache/flink/pull/11541#issuecomment-643845760


   @pnowojski 
   Yes, you are right. It is deadline for creating channel clients for the same 
connection id. To achieve a parallel connection building process for different 
connection id and also a synchronized way for the requests to the same 
connection id, I chose to synchronize connectionId in the 
createPartitionRequestClient function. It can guarantee the correctness with 
scarifying the scalability of the connection creation. Actually, we don't need 
to even use concurrent map. How do you think?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] HuangZhenQiu commented on pull request #11541: [FLINK-15416][network] add task manager netty client retry mechenism

2020-06-11 Thread GitBox


HuangZhenQiu commented on pull request #11541:
URL: https://github.com/apache/flink/pull/11541#issuecomment-642838317


   @pnowojski 
   Oh, I see. I didn't check the class level to ignore. how about putting 
ignore only in the method testResourceReleaseAfterInterruptedConnect?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] HuangZhenQiu commented on pull request #11541: [FLINK-15416][network] add task manager netty client retry mechenism

2020-06-04 Thread GitBox


HuangZhenQiu commented on pull request #11541:
URL: https://github.com/apache/flink/pull/11541#issuecomment-638957276


   @pnowojski For your comments.
   1) There is no test that is marked to ignore by this PR. Would you please 
double-check?
   2) Rebased master again.
   3) Add join logic to wait for all of the threads successfully executed.
   
   I will try my best to make the PR needs the requirement so that it will use 
less time to merge it later. As it is needed for our internal release, 
hopefully, it can be merged soon.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] HuangZhenQiu commented on pull request #11541: [FLINK-15416][network] add task manager netty client retry mechenism

2020-06-03 Thread GitBox


HuangZhenQiu commented on pull request #11541:
URL: https://github.com/apache/flink/pull/11541#issuecomment-638588430


   @pnowojski @zhijiangW I agree that the global lock will be concise in terms 
of implementation. But it will be big change in the class, I also prefer to 
leave it as a separate PR. According to your suggestions, I added a multiple 
thread test case. Please take a look when you have time.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org