[jira] [Commented] (FLINK-10469) FileChannel may not write the whole buffer in a single call to FileChannel.write(Buffer buffer)

2018-09-30 Thread Guowei Ma (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633622#comment-16633622
 ] 

Guowei Ma commented on FLINK-10469:
---

+1

> FileChannel may not write the whole buffer in a single call to 
> FileChannel.write(Buffer buffer)
> ---
>
> Key: FLINK-10469
> URL: https://issues.apache.org/jira/browse/FLINK-10469
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.4.1, 1.4.2, 1.5.3, 1.6.0, 1.6.1, 1.7.0, 1.5.4, 1.6.2
>Reporter: Yun Gao
>Priority: Major
>
> Currently, all calls to _FileChannel.write(ByteBuffer src)_ assume that 
> the method does not return before the whole buffer is written, like the one 
> in _AsynchronousFileIOChannel.write()._
>  
> However, this assumption does not hold in all environments. We have 
> encountered a case where only part of a buffer was written on a cluster with 
> a high IO load, and the target file got corrupted. 
>  
> To fix this issue, I think we should add a utility method to 
> org.apache.flink.util.IOUtils that writes the whole buffer in a 
> loop, and replace all calls to _FileChannel.write(ByteBuffer)_ with this 
> new method. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10469) FileChannel may not write the whole buffer in a single call to FileChannel.write(Buffer buffer)

2018-09-30 Thread Yun Gao (JIRA)
Yun Gao created FLINK-10469:
---

 Summary: FileChannel may not write the whole buffer in a single 
call to FileChannel.write(Buffer buffer)
 Key: FLINK-10469
 URL: https://issues.apache.org/jira/browse/FLINK-10469
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.5.4, 1.6.1, 1.6.0, 1.5.3, 1.4.2, 1.4.1, 1.7.0, 1.6.2
Reporter: Yun Gao


Currently, all calls to _FileChannel.write(ByteBuffer src)_ assume that 
the method does not return before the whole buffer is written, like the one in 
_AsynchronousFileIOChannel.write()._

However, this assumption does not hold in all environments. We have 
encountered a case where only part of a buffer was written on a cluster with a 
high IO load, and the target file got corrupted.

To fix this issue, I think we should add a utility method to 
org.apache.flink.util.IOUtils that writes the whole buffer in a loop, and 
replace all calls to _FileChannel.write(ByteBuffer)_ with this new 
method.
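The proposed fix can be sketched as a small loop. The helper name and signature below are assumptions for illustration, not the actual Flink API:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

public class WriteUtil {

    // Hypothetical sketch of the proposed IOUtils helper: a single write()
    // call may drain fewer bytes than src.remaining(), so keep writing
    // until the buffer is empty.
    public static void writeCompletely(WritableByteChannel channel, ByteBuffer src) throws IOException {
        while (src.hasRemaining()) {
            channel.write(src);
        }
    }
}
```

Callers would then replace direct `FileChannel.write(buffer)` invocations with `WriteUtil.writeCompletely(channel, buffer)`.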





[jira] [Updated] (FLINK-9084) Upgrade cassandra-driver-core to 3.6.0 to introduce BusyPoolException retries to Cassandra Connector

2018-09-30 Thread Jacob Park (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacob Park updated FLINK-9084:
--
Summary: Upgrade cassandra-driver-core to 3.6.0 to introduce 
BusyPoolException retries to Cassandra Connector  (was: Upgrade 
cassandra-driver-core to 3.4.0 to introduce BusyPoolException retries to 
Cassandra Connector)

> Upgrade cassandra-driver-core to 3.6.0 to introduce BusyPoolException retries 
> to Cassandra Connector
> 
>
> Key: FLINK-9084
> URL: https://issues.apache.org/jira/browse/FLINK-9084
> Project: Flink
>  Issue Type: Improvement
>  Components: Cassandra Connector
>Reporter: Jacob Park
>Assignee: Jacob Park
>Priority: Minor
>
> As per [https://datastax-oss.atlassian.net/browse/SPARKC-503], this exception 
> has been thrown by executeAsync() since at least 3.1.4: executeAsync() is 
> fully asynchronous, so it throws when the driver-side pool is busy. 
> This improvement has greatly increased the peak throughput of our Cassandra 
> Sink Connector. It also gives us all the benefits of the latest 
> driver version.
> I would like to contribute this feature back upstream.





[jira] [Comment Edited] (FLINK-6966) Add maxParallelism and UIDs to all operators generated by the Table API / SQL

2018-09-30 Thread Fan weiwen (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633442#comment-16633442
 ] 

Fan weiwen edited comment on FLINK-6966 at 9/30/18 5:04 PM:


This is my test SQL. I use a Redis sink, but it can be tested with Kafka or any other sink.
{code:java}
-- Asql
select
    reqIp as factorContenta,
    count(*) as eCount,
    60 * 60 as expire
from
    kafkasource
where
    uri is not null
group by
    hop(
        rowtime,
        interval '2' second,
        interval '60' minute
    ),
    reqIp
having
    count(*) >= 1

-- Bsql
select
    uid as factorContentb,
    count(*) as eCount,
    60 * 60 as expire
from
    kafkasource
where
    uri is not null
group by
    hop(
        rowtime,
        interval '2' second,
        interval '60' minute
    ),
    uid
having
    count(*) >= 1
{code}
This is my test data:
{code:java}
{
    "code" : "200",
    "reqIp" : "656.19.173.34",
    "t" : 1537950912546,
    "uid" : 6630519,
    "uri" : "/test"
}
{code}
First, the job contains only Asql. After starting Asql, Redis has the key 656.19.173.34.

Then stop Asql with a savepoint and delete the Redis key 656.19.173.34.

Now start Bsql from the savepoint: you will find that the key 656.19.173.34 is created again.

Redis ends up with both the key created by Asql and the key created by Bsql.

This is wrong: Bsql fetches data from Asql's state.



was (Author: fanweiwen):
This is my test SQL. I use a Redis sink, but it can be tested with Kafka or any other sink.
{code:java}
-- Asql
select
    reqIp as factorContenta,
    count(*) as eCount,
    60 * 60 as expire
from
    kafkasource
where
    uri is not null
group by
    hop(
        rowtime,
        interval '2' second,
        interval '60' minute
    ),
    reqIp
having
    count(*) >= 1

-- Bsql
select
    uid as factorContentb,
    count(*) as eCount,
    60 * 60 as expire
from
    kafkasource
where
    uri is not null
group by
    hop(
        rowtime,
        interval '2' second,
        interval '60' minute
    ),
    uid
having
    count(*) >= 1
{code}
This is my test data:
{code:java}
{
    "code" : "200",
    "reqIp" : "656.19.173.34",
    "t" : 1537950912546,
    "uid" : 6630519,
    "uri" : "/test"
}
{code}
First, the job contains only Asql. After starting Asql, Redis has the key 656.19.173.34.

Then stop Asql with a savepoint and delete the Redis key 656.19.173.34.

Now start Bsql from the savepoint: you will find that the key 656.19.173.34 is created again.

Redis ends up with both the key created by Asql and the key created by Bsql.

This is wrong.


> Add maxParallelism and UIDs to all operators generated by the Table API / SQL
> -
>
> Key: FLINK-6966
> URL: https://issues.apache.org/jira/browse/FLINK-6966
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Affects Versions: 1.4.0
>Reporter: Fabian Hueske
>Priority: Major
>
> At the moment, the Table API does not assign UIDs and the max parallelism to 
> operators (except for operators with parallelism 1).
> We should do that to avoid problems when rescaling or restarting jobs from 
> savepoints.






[jira] [Commented] (FLINK-6966) Add maxParallelism and UIDs to all operators generated by the Table API / SQL

2018-09-30 Thread Fan weiwen (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633442#comment-16633442
 ] 

Fan weiwen commented on FLINK-6966:
---

This is my test SQL. I use a Redis sink, but it can be tested with Kafka or any other sink.
{code:java}
-- Asql
select
    reqIp as factorContenta,
    count(*) as eCount,
    60 * 60 as expire
from
    xp_sso_biz_source
where
    uri is not null
group by
    hop(
        rowtime,
        interval '2' second,
        interval '60' minute
    ),
    reqIp
having
    count(*) >= 1

-- Bsql
select
    uid as factorContentb,
    count(*) as eCount,
    60 * 60 as expire
from
    xp_sso_biz_source
where
    uri is not null
group by
    hop(
        rowtime,
        interval '2' second,
        interval '60' minute
    ),
    uid
having
    count(*) >= 1
{code}
This is my test data:
{code:java}
{
    "code" : "200",
    "reqIp" : "656.19.173.34",
    "t" : 1537950912546,
    "uid" : 6630519,
    "uri" : "/test"
}
{code}
First, the job contains only Asql. After starting Asql, Redis has the key 656.19.173.34.

Then stop Asql with a savepoint and delete the Redis key 656.19.173.34.

Now start Bsql from the savepoint: you will find that the key 656.19.173.34 is created again.

Redis ends up with both the key created by Asql and the key created by Bsql.

This is wrong.


> Add maxParallelism and UIDs to all operators generated by the Table API / SQL
> -
>
> Key: FLINK-6966
> URL: https://issues.apache.org/jira/browse/FLINK-6966
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Affects Versions: 1.4.0
>Reporter: Fabian Hueske
>Priority: Major
>
> At the moment, the Table API does not assign UIDs and the max parallelism to 
> operators (except for operators with parallelism 1).
> We should do that to avoid problems when rescaling or restarting jobs from 
> savepoints.





[jira] [Commented] (FLINK-6966) Add maxParallelism and UIDs to all operators generated by the Table API / SQL

2018-09-30 Thread Fan weiwen (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633423#comment-16633423
 ] 

Fan weiwen commented on FLINK-6966:
---

[~hequn8128] It is not like this.

I think it is independent of the fields.

The UIDs are not unique.

This problem is very easy to reproduce; you can try it.

> Add maxParallelism and UIDs to all operators generated by the Table API / SQL
> -
>
> Key: FLINK-6966
> URL: https://issues.apache.org/jira/browse/FLINK-6966
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Affects Versions: 1.4.0
>Reporter: Fabian Hueske
>Priority: Major
>
> At the moment, the Table API does not assign UIDs and the max parallelism to 
> operators (except for operators with parallelism 1).
> We should do that to avoid problems when rescaling or restarting jobs from 
> savepoints.





[jira] [Commented] (FLINK-10468) Potential missing break for PARTITION_CUSTOM in OutputEmitter ctor

2018-09-30 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633416#comment-16633416
 ] 

Ted Yu commented on FLINK-10468:


That was why I started the issue title with "Potential".

Even if this is the case, assigning {{channels}} and then breaking would make 
the code easier for other people to understand.

Alternatively, a comment should be added stating that the fall-through is intentional.

> Potential missing break for PARTITION_CUSTOM in OutputEmitter ctor
> --
>
> Key: FLINK-10468
> URL: https://issues.apache.org/jira/browse/FLINK-10468
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Here is related code:
> {code}
> switch (strategy) {
> case PARTITION_CUSTOM:
>   extractedKeys = new Object[1];
> case FORWARD:
> {code}
> It seems a 'break' is missing prior to FORWARD case.
> {code}
> if (strategy == ShipStrategyType.PARTITION_CUSTOM && partitioner == null) 
> {
>   throw new NullPointerException("Partitioner must not be null when the 
> ship strategy is set to custom partitioning.");
> }
> {code}
> Since the above check is for PARTITION_CUSTOM, it seems we can place the 
> check in the switch statement.





[jira] [Commented] (FLINK-10468) Potential missing break for PARTITION_CUSTOM in OutputEmitter ctor

2018-09-30 Thread tison (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633413#comment-16633413
 ] 

tison commented on FLINK-10468:
---

I have just taken a look at that code and think the fall-through is intended: 
{{PARTITION_CUSTOM}} means partitioning with a custom partitioner, so it sets 
{{extractedKeys}}, but all of the strategies need {{channels}}.

However, I am not very familiar with this code and think [~zjwang], 
[~piwaniuk] and [~NicoK] would know more about it. If they see it the same 
way as I do, we can close this issue as Won't Fix.

> Potential missing break for PARTITION_CUSTOM in OutputEmitter ctor
> --
>
> Key: FLINK-10468
> URL: https://issues.apache.org/jira/browse/FLINK-10468
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Here is related code:
> {code}
> switch (strategy) {
> case PARTITION_CUSTOM:
>   extractedKeys = new Object[1];
> case FORWARD:
> {code}
> It seems a 'break' is missing prior to FORWARD case.
> {code}
> if (strategy == ShipStrategyType.PARTITION_CUSTOM && partitioner == null) 
> {
>   throw new NullPointerException("Partitioner must not be null when the 
> ship strategy is set to custom partitioning.");
> }
> {code}
> Since the above check is for PARTITION_CUSTOM, it seems we can place the 
> check in the switch statement.





[jira] [Assigned] (FLINK-10468) Potential missing break for PARTITION_CUSTOM in OutputEmitter ctor

2018-09-30 Thread tison (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tison reassigned FLINK-10468:
-

Assignee: (was: tison)

> Potential missing break for PARTITION_CUSTOM in OutputEmitter ctor
> --
>
> Key: FLINK-10468
> URL: https://issues.apache.org/jira/browse/FLINK-10468
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Here is related code:
> {code}
> switch (strategy) {
> case PARTITION_CUSTOM:
>   extractedKeys = new Object[1];
> case FORWARD:
> {code}
> It seems a 'break' is missing prior to FORWARD case.
> {code}
> if (strategy == ShipStrategyType.PARTITION_CUSTOM && partitioner == null) 
> {
>   throw new NullPointerException("Partitioner must not be null when the 
> ship strategy is set to custom partitioning.");
> }
> {code}
> Since the above check is for PARTITION_CUSTOM, it seems we can place the 
> check in the switch statement.





[jira] [Commented] (FLINK-6966) Add maxParallelism and UIDs to all operators generated by the Table API / SQL

2018-09-30 Thread Hequn Cheng (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633410#comment-16633410
 ] 

Hequn Cheng commented on FLINK-6966:


Hi [~fanweiwen], this task has not been started yet.
In your case, it seems that job B shares the same UIDs as job A, so job B can 
restore from job A's state if field b and field c have the same field type. 
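To illustrate why explicit, content-derived UIDs would prevent this kind of accidental state sharing, here is a hypothetical sketch (this is not Flink's actual UID scheme; the helper name and hashing choice are assumptions):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class UidSketch {

    // Illustration only: derive a stable operator UID from the operator's
    // logical description. Identical operators get identical UIDs across
    // runs, while structurally different queries (grouping on reqIp vs. uid)
    // get different UIDs and therefore cannot collide on restored state.
    static String stableUid(String operatorDescription) throws Exception {
        byte[] hash = MessageDigest.getInstance("SHA-256")
                .digest(operatorDescription.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            sb.append(String.format("%02x", hash[i]));
        }
        return sb.toString();
    }
}
```

With such UIDs assigned, restoring job B from job A's savepoint would leave B's operators without matching state instead of silently reusing A's.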

> Add maxParallelism and UIDs to all operators generated by the Table API / SQL
> -
>
> Key: FLINK-6966
> URL: https://issues.apache.org/jira/browse/FLINK-6966
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Affects Versions: 1.4.0
>Reporter: Fabian Hueske
>Priority: Major
>
> At the moment, the Table API does not assign UIDs and the max parallelism to 
> operators (except for operators with parallelism 1).
> We should do that to avoid problems when rescaling or restarting jobs from 
> savepoints.





[jira] [Assigned] (FLINK-10468) Potential missing break for PARTITION_CUSTOM in OutputEmitter ctor

2018-09-30 Thread tison (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-10468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tison reassigned FLINK-10468:
-

Assignee: tison

> Potential missing break for PARTITION_CUSTOM in OutputEmitter ctor
> --
>
> Key: FLINK-10468
> URL: https://issues.apache.org/jira/browse/FLINK-10468
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: tison
>Priority: Minor
>
> Here is related code:
> {code}
> switch (strategy) {
> case PARTITION_CUSTOM:
>   extractedKeys = new Object[1];
> case FORWARD:
> {code}
> It seems a 'break' is missing prior to FORWARD case.
> {code}
> if (strategy == ShipStrategyType.PARTITION_CUSTOM && partitioner == null) 
> {
>   throw new NullPointerException("Partitioner must not be null when the 
> ship strategy is set to custom partitioning.");
> }
> {code}
> Since the above check is for PARTITION_CUSTOM, it seems we can place the 
> check in the switch statement.





[jira] [Created] (FLINK-10468) Potential missing break for PARTITION_CUSTOM in OutputEmitter ctor

2018-09-30 Thread Ted Yu (JIRA)
Ted Yu created FLINK-10468:
--

 Summary: Potential missing break for PARTITION_CUSTOM in 
OutputEmitter ctor
 Key: FLINK-10468
 URL: https://issues.apache.org/jira/browse/FLINK-10468
 Project: Flink
  Issue Type: Bug
Reporter: Ted Yu


Here is related code:
{code}
switch (strategy) {
case PARTITION_CUSTOM:
  extractedKeys = new Object[1];
case FORWARD:
{code}
It seems a 'break' is missing prior to FORWARD case.
{code}
if (strategy == ShipStrategyType.PARTITION_CUSTOM && partitioner == null) {
  throw new NullPointerException("Partitioner must not be null when the 
ship strategy is set to custom partitioning.");
}
{code}
Since the above check is for PARTITION_CUSTOM, it seems we can place the check 
in the switch statement.
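The two options discussed (documenting the fall-through, or moving the null check into the case) can be combined in a minimal, hypothetical reconstruction of the constructor logic. This is a sketch, not the actual OutputEmitter code:

```java
public class EmitterSketch {

    enum ShipStrategyType { PARTITION_CUSTOM, FORWARD }

    Object[] extractedKeys;

    // Hypothetical reconstruction: the fall-through from PARTITION_CUSTOM is
    // intentional, because every strategy needs the shared setup below; the
    // comment and the relocated null check make that intent explicit.
    EmitterSketch(ShipStrategyType strategy, Object partitioner) {
        switch (strategy) {
            case PARTITION_CUSTOM:
                if (partitioner == null) {
                    throw new NullPointerException(
                        "Partitioner must not be null when the ship strategy is set to custom partitioning.");
                }
                extractedKeys = new Object[1];
                // intentional fall-through: channels setup is shared
            case FORWARD:
            default:
                // common initialization (e.g. channels) for every strategy
                break;
        }
    }
}
```

Either way, a reader no longer has to guess whether the missing `break` is a bug.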





[jira] [Commented] (FLINK-9932) If task executor offer slot to job master timeout the first time, the slot will leak

2018-09-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633307#comment-16633307
 ] 

ASF GitHub Bot commented on FLINK-9932:
---

shuai-xu opened a new pull request #6780: [FLINK-9932] [runtime] fix slot leak 
when task executor offer slot to job master timeout
URL: https://github.com/apache/flink/pull/6780
 
 
   
   ## What is the purpose of the change
   
   *This pull request fixes a leak of slots in the task executor when it 
fails to offer them to the job master due to an RPC timeout.*
   
   ## Verifying this change
   
 - *Add a test in TaskExecutorTest*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no )
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> If task executor offer slot to job master timeout the first time, the slot 
> will leak
> 
>
> Key: FLINK-9932
> URL: https://issues.apache.org/jira/browse/FLINK-9932
> Project: Flink
>  Issue Type: Bug
>  Components: Cluster Management
>Affects Versions: 1.5.0
>Reporter: shuai.xu
>Assignee: shuai.xu
>Priority: Major
>  Labels: pull-request-available
>
> When the task executor offers a slot to the job master, it first marks the 
> slot as active.
> If the offer call times out, the task executor tries to call 
> offerSlotsToJobManager again,
> but it only offers slots in the ALLOCATED state. As the slot has already 
> been marked ACTIVE, it will never be offered again, and this causes a slot leak.
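A minimal model of the described fix (names and structure are assumptions for illustration, not the actual TaskExecutor code): the retry path must consider ACTIVE slots as well as ALLOCATED ones, so a slot activated during a timed-out offer is re-offered instead of leaked.

```java
import java.util.ArrayList;
import java.util.List;

public class SlotOfferSketch {

    enum SlotState { FREE, ALLOCATED, ACTIVE }

    // Hypothetical model of offerSlotsToJobManager's retry: include slots
    // already marked ACTIVE, since a slot may have been activated during an
    // offer attempt whose acknowledgement was lost to an RPC timeout.
    static List<Integer> slotsToOffer(List<SlotState> slots) {
        List<Integer> offers = new ArrayList<>();
        for (int i = 0; i < slots.size(); i++) {
            SlotState s = slots.get(i);
            if (s == SlotState.ALLOCATED || s == SlotState.ACTIVE) {
                offers.add(i); // an ACTIVE slot would be leaked if skipped
            }
        }
        return offers;
    }
}
```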





[jira] [Updated] (FLINK-9932) If task executor offer slot to job master timeout the first time, the slot will leak

2018-09-30 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-9932:
--
Labels: pull-request-available  (was: )

> If task executor offer slot to job master timeout the first time, the slot 
> will leak
> 
>
> Key: FLINK-9932
> URL: https://issues.apache.org/jira/browse/FLINK-9932
> Project: Flink
>  Issue Type: Bug
>  Components: Cluster Management
>Affects Versions: 1.5.0
>Reporter: shuai.xu
>Assignee: shuai.xu
>Priority: Major
>  Labels: pull-request-available
>
> When the task executor offers a slot to the job master, it first marks the 
> slot as active.
> If the offer call times out, the task executor tries to call 
> offerSlotsToJobManager again,
> but it only offers slots in the ALLOCATED state. As the slot has already 
> been marked ACTIVE, it will never be offered again, and this causes a slot leak.





[GitHub] shuai-xu opened a new pull request #6780: [FLINK-9932] [runtime] fix slot leak when task executor offer slot to job master timeout

2018-09-30 Thread GitBox
shuai-xu opened a new pull request #6780: [FLINK-9932] [runtime] fix slot leak 
when task executor offer slot to job master timeout
URL: https://github.com/apache/flink/pull/6780
 
 
   
   ## What is the purpose of the change
   
   *This pull request fixes a leak of slots in the task executor when it 
fails to offer them to the job master due to an RPC timeout.*
   
   ## Verifying this change
   
 - *Add a test in TaskExecutorTest*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no )
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-6966) Add maxParallelism and UIDs to all operators generated by the Table API / SQL

2018-09-30 Thread Fan weiwen (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633255#comment-16633255
 ] 

Fan weiwen commented on FLINK-6966:
---

[~sunjincheng121] This is the same problem. Has it been solved?

I have a similar problem.

The job has two stages: one table and one sink. For example:

A

// view
select a, b, count(*) as acnt from kafka group by
    hop(
        rowtime,
        interval '2' second,
        interval '60' minute
    ),
    a,
    b

Table table = tableEnv.sqlQuery(sql);
tableEnv.registerTable(Aname, table);

// sink
sinksql = select a, b, acnt from Aname

Table table = tableEnv.sqlQuery(sinksql);
DataStream ds = tableEnv.toAppendStream(table, Row.class);
ds.addSink(...);

B

// view
sql = select a, c, count(*) as bnt from kafka group by
    hop(
        rowtime,
        interval '2' second,
        interval '60' minute
    ),
    a,
    c

Table table = tableEnv.sqlQuery(sql);
tableEnv.registerTable(Bname, table);

// sink
sinksql = select a, c, bnt from Bname

Table table = tableEnv.sqlQuery(sinksql);
DataStream ds = tableEnv.toAppendStream(table, Row.class);
ds.addSink(...);

-

Now start job A (B is not running).

Then stop A, cancelling with a savepoint.

Start B from that savepoint.

B's sink runs sinksql = select a, c, bnt from Bname.

The problem is that B's sink fetches data from A's state.

The state is not isolated, because no uid or operator id is assigned.

My English is poor; could we discuss in Chinese?


> Add maxParallelism and UIDs to all operators generated by the Table API / SQL
> -
>
> Key: FLINK-6966
> URL: https://issues.apache.org/jira/browse/FLINK-6966
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Affects Versions: 1.4.0
>Reporter: Fabian Hueske
>Priority: Major
>
> At the moment, the Table API does not assign UIDs and the max parallelism to 
> operators (except for operators with parallelism 1).
> We should do that to avoid problems when rescaling or restarting jobs from 
> savepoints.


