[jira] [Comment Edited] (FLINK-14170) Support hadoop < 2.7 with StreamingFileSink.BulkFormatBuilder

2019-12-10 Thread Bhagavan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992443#comment-16992443
 ] 

Bhagavan edited comment on FLINK-14170 at 12/10/19 11:10 AM:
-

[~aljoscha] I have a PR for this. Please assign this ticket to me


was (Author: dasbh):
[~aljoscha] I have a PR for this please assign this ticket to me

> Support hadoop < 2.7 with StreamingFileSink.BulkFormatBuilder
> -
>
> Key: FLINK-14170
> URL: https://issues.apache.org/jira/browse/FLINK-14170
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / FileSystem
>Affects Versions: 1.8.0, 1.8.1, 1.8.2, 1.9.0
>Reporter: Bhagavan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, StreamingFileSink is supported only with Hadoop >= 2.7, 
> irrespective of whether the row or bulk format builder is used. This 
> restriction exists because truncate is not supported in Hadoop < 2.7.
> However, BulkFormatBuilder does not use the truncate method to restore the 
> file, so restricting StreamingFileSink.BulkFormatBuilder to Hadoop >= 2.7 
> is not necessary.
> The requested improvement is to remove the precondition on 
> HadoopRecoverableWriter and allow BulkFormatBuilder (Parquet) to be used with 
> Hadoop 2.6 (most enterprises are still on CDH 5.x).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-14170) Support hadoop < 2.7 with StreamingFileSink.BulkFormatBuilder

2019-10-17 Thread John Lonergan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953637#comment-16953637
 ] 

John Lonergan edited comment on FLINK-14170 at 10/17/19 11:25 AM:
--

Hi, I disagree with your approach, Kostas, as it impacts time to market and 
adds a huge effort to fixing this critical bug.

This existing impl is an attempt at "fail early" and is a *nice to have 
feature*, 
however this implementation needlessly disables the product on 2.6 and is 
therefore a *major bug* for us users.



Can we split the discussion into 
1 remove the bug (ie the block)
2 other nice to have improvements



Re 1 - 

When removing the bug, the requester suggested an optional addition, i.e. 
including a simple NotImplementedException in the Flink code. This seems like 
a reasonable *but optional* compromise to improve the quality of the error 
message for any unfortunates who go via a code path that attempts to use 
truncate() on 2.6. That approach is a practical solution that satisfies both 
the need for correctness and helpfulness without completely blocking the use 
of this product for a large group of potential users, particularly the many 
slower-moving enterprises out here in the wild.

Let's not add additional barriers in the way of fixing the primary issue.
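
To make the "Re 1" suggestion concrete, here is a minimal sketch of moving the 
version check out of the constructor and into the one path that needs 
truncate. Everything in it is hypothetical: LazyTruncateGuardSketch, Writer, 
supportsTruncate, closeForCommit and recoverByTruncate are illustrative names, 
not Flink or Hadoop APIs, and the JDK's UnsupportedOperationException stands 
in for the NotImplementedException mentioned above so the example has no 
dependencies. It also assumes the version string starts with "major.minor".

```java
// Hypothetical sketch only: none of these names are Flink or Hadoop APIs.
final class LazyTruncateGuardSketch {

    /** Returns true if a version string starting with "major.minor" is at least 2.7. */
    static boolean supportsTruncate(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        return major > 2 || (major == 2 && minor >= 7);
    }

    /** Stand-in for a recoverable writer; the constructor performs no version check. */
    static final class Writer {
        private final String hadoopVersion;

        Writer(String hadoopVersion) {
            this.hadoopVersion = hadoopVersion;
        }

        /** Bulk-style path: finishes a part file without ever truncating. */
        String closeForCommit() {
            return "committed";
        }

        /** Row-style recovery path: the only place where truncate is required. */
        String recoverByTruncate() {
            if (!supportsTruncate(hadoopVersion)) {
                throw new UnsupportedOperationException(
                        "truncate() requires Hadoop 2.7 or later, but found " + hadoopVersion);
            }
            return "truncated";
        }
    }

    public static void main(String[] args) {
        Writer writer = new Writer("2.6.0");
        // The bulk-style path works on 2.6 because it never reaches the check.
        System.out.println(writer.closeForCommit());
        try {
            writer.recoverByTruncate();
        } catch (UnsupportedOperationException e) {
            // Only the truncate path fails, and it does so with a descriptive message.
            System.out.println(e.getMessage());
        }
    }
}
```

With this shape, a writer can be constructed on 2.6 and used for bulk-style 
commits, while only a call that actually needs truncate fails.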

Re 2 - Your points ... 

"we should fail at build time" - how is this possible? We don't specifically 
target Hadoop 2.6 or other versions.
"pre-flight time" - again, how is this possible? I've looked at this and it's 
pretty hard to see how that would work. I can't see a straightforward way 
(I suggest you make a proposal on how this would work).
"same strategy" - not needed to fix the bug; this is a separate problem that 
needs a separate ticket.


Re "time bomb waiting to explode" - hardly a reasonable description of the 
issue. It's not as if the first time I would run this code is in production. 
I'd discover the issue within an hour or so of writing my prototype or 
implementation, so it's not a big deal IMHO. And it's not a big deal at all if 
the helpful error message that the original request suggests were included in 
the solution.

---

Again, can I stress that we should separate the critical bug (this 2.6 check) 
from the other nice-to-haves.




was (Author: johnlon):
Hi, I disagree with your approach, Kostas, as it impacts time to market and 
adds a huge effort to fixing this critical bug.

This existing impl is an attempt at "fail early" and is a *nice to have 
feature*, 
however this implementation needlessly disables the product on 2.6 and is 
therefore a *major bug* for us users.



Can we split the discussion into 
1 remove the bug (ie the block)
2 other nice to have improvements



Re 1 - 

When removing the bug, the requester suggested an optional addition, i.e. 
including a simple NotImplementedException in the Flink code. This seems like 
a reasonable *but optional* compromise to improve the quality of the error 
message for any unfortunates who go via a code path that attempts to use 
truncate() on 2.6. That approach is a practical solution that satisfies both 
the need for correctness and helpfulness without completely blocking the use 
of this product for a large group of potential users, particularly the many 
slower-moving enterprises out here in the wild.

Let's not add additional barriers in the way of fixing the primary issue.

Re 2 - Your points ... 

"we should fail at build time" - how is this possible? We don't specifically 
target Hadoop 2.6 or other versions.
"pre-flight time" - again, how is this possible? I've looked at this and it's 
pretty hard to see how that would work. I can't see a straightforward way 
(I suggest you make a proposal on how this would work).
"same strategy" - not needed to fix the bug; this is a separate problem that 
needs a separate ticket.


Re "time bomb waiting to explode" - hardly a reasonable description of the 
issue. It's not as if the first time I would run this code is in production. 
I'd discover the issue within an hour or so of writing my prototype or 
implementation, so it's not a big deal IMHO. And it's not a big deal at all if 
the helpful error message that the original request suggests were included in 
the solution.

*Again, can I stress that we should separate the critical bug (this 2.6 check) 
from the other nice-to-haves*




[jira] [Comment Edited] (FLINK-14170) Support hadoop < 2.7 with StreamingFileSink.BulkFormatBuilder

2019-10-17 Thread John Lonergan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953637#comment-16953637
 ] 

John Lonergan edited comment on FLINK-14170 at 10/17/19 11:24 AM:
--

Hi, I disagree with your approach, Kostas, as it impacts time to market and 
adds a huge effort to fixing this critical bug.

This existing impl is an attempt at "fail early" and is a *nice to have 
feature*, 
however this implementation needlessly disables the product on 2.6 and is 
therefore a *major bug* for us users.



Can we split the discussion into 
1 remove the bug (ie the block)
2 other nice to have improvements



Re 1 - 

When removing the bug, the requester suggested an optional addition, i.e. 
including a simple NotImplementedException in the Flink code. This seems like 
a reasonable *but optional* compromise to improve the quality of the error 
message for any unfortunates who go via a code path that attempts to use 
truncate() on 2.6. That approach is a practical solution that satisfies both 
the need for correctness and helpfulness without completely blocking the use 
of this product for a large group of potential users, particularly the many 
slower-moving enterprises out here in the wild.

Let's not add additional barriers in the way of fixing the primary issue.

Re 2 - Your points ... 

"we should fail at build time" - how is this possible? We don't specifically 
target Hadoop 2.6 or other versions.
"pre-flight time" - again, how is this possible? I've looked at this and it's 
pretty hard to see how that would work. I can't see a straightforward way 
(I suggest you make a proposal on how this would work).
"same strategy" - not needed to fix the bug; this is a separate problem that 
needs a separate ticket.


Re "time bomb waiting to explode" - hardly a reasonable description of the 
issue. It's not as if the first time I would run this code is in production. 
I'd discover the issue within an hour or so of writing my prototype or 
implementation, so it's not a big deal IMHO. And it's not a big deal at all if 
the helpful error message that the original request suggests were included in 
the solution.

*Again, can I stress that we should separate the critical bug (this 2.6 check) 
from the other nice-to-haves*




was (Author: johnlon):
Hi, I disagree with that approach as it significantly impacts time to market 
and the effort of fixing the bug.

This existing impl is an attempt at "fail early" and is a *nice to have 
feature*, 
however this implementation needlessly disables the product on 2.6 and is 
therefore a *major bug* for us users.



Can we split the discussion into 
1 remove the bug (ie the block)
2 other nice to have improvements



Re 1 - 

When removing the bug, the requester suggested an optional addition, i.e. 
including a simple NotImplementedException in the Flink code. This seems like 
a reasonable *but optional* compromise to improve the quality of the error 
message for any unfortunates who go via a code path that attempts to use 
truncate() on 2.6. That approach is a practical solution that satisfies both 
the need for correctness and helpfulness without completely blocking the use 
of this product for a large group of potential users, particularly the many 
slower-moving enterprises out here in the wild.

Let's not add additional barriers in the way of fixing the primary issue.

Re 2 - Your points ... 

"we should fail at build time" - how is this possible? We don't specifically 
target Hadoop 2.6 or other versions.
"pre-flight time" - again, how is this possible? I've looked at this and it's 
pretty hard to see how that would work. I can't see a straightforward way 
(I suggest you make a proposal on how this would work).
"same strategy" - not needed to fix the bug; this is a separate problem that 
needs a separate ticket.


Re "time bomb waiting to explode" - hardly a reasonable description of the 
issue. It's not as if the first time I would run this code is in production. 
I'd discover the issue within an hour or so of writing my prototype or 
implementation, so it's not a big deal IMHO. And it's not a big deal at all if 
the helpful error message that the original request suggests were included in 
the solution.

*Again, can I stress that we should separate the critical bug (this 2.6 check) 
from the other nice-to-haves*


