Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

2021-01-03 Thread Xiaoqiao He
Hi Konstantin and Steve,

HDFS-15751 has been committed to the related branches, including 3.2.2. I will
prepare the 3.2.2 RC shortly. Please let me know if you run into any other
documentation or deployment issues. Thanks.

- He Xiaoqiao

On Wed, Dec 30, 2020 at 6:40 PM Xiaoqiao He  wrote:

> Hi Steve and Chao,
>
> Konstantin has updated HDFS-15751 and attached a patch documenting the msync
> API. Would you mind taking a look? I want to include this update in the 3.2.2
> release, but for now it seems there are not enough committers to review it.
> Please check when you have bandwidth. Thanks all.
>
> - He Xiaoqiao
>
> On Fri, Dec 25, 2020 at 5:40 AM Konstantin Shvachko 
> wrote:
>
>> Hi Steve,
>>
>> I created HDFS-15751 for documenting the msync API.
>> Would appreciate your suggestions.
>>
>> Stay safe,
>> --Konstantin
>>
>> On Mon, Dec 21, 2020 at 5:19 AM Steve Loughran 
>> wrote:
>>
>> >
>> >
>> > On Fri, 18 Dec 2020 at 23:29, Konstantin Shvachko
>> > wrote:
>> >
>> >> Hey Steve,
>> >>
>> >> Thanks for the references. I was reading but still need to understand
>> how
>> >> exactly this applies to msync.
>> >>
>> >
>> > mainly: pull it up and it becomes part of the broader API, so needs to
>> be
>> > specified in a way which can be understood by users and for
>> implementors of
>> > others stores: to give their own stores the same semantics.
>> >
>> > What does the HDFS one do?
>> >
>> >
>> >
>> >> Will come up with a plan and post it on a new jira.
>> >> Will make sure to create it under HADOOP and ping Hadoop Common list
>> for
>> >> visibility.
>> >>
>> >>
>> > thanks
>> >
>> >
>> >> You are right about ViewFS. The impl should make sure it calls msync()
>> on
>> >> all mount points that enabled observer reads.
>> >>
>> >>
>> > That's the kind of issue this process aims to resolve. Another is to
>> > identify where we have HDFS-layer "quirks" and at least document them
>> (e.g.
>> > how hdfs streams are thread safe, rename isn't Posix, ...) and list
>> what we
>> > know breaks if you don't re-implement
>> >
>>
>


Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

2020-12-30 Thread Xiaoqiao He
Hi Steve and Chao,

Konstantin has updated HDFS-15751 and attached a patch documenting the msync
API. Would you mind taking a look? I want to include this update in the 3.2.2
release, but for now it seems there are not enough committers to review it.
Please check when you have bandwidth. Thanks all.

- He Xiaoqiao

On Fri, Dec 25, 2020 at 5:40 AM Konstantin Shvachko 
wrote:

> Hi Steve,
>
> I created HDFS-15751 for documenting the msync API.
> Would appreciate your suggestions.
>
> Stay safe,
> --Konstantin
>
> On Mon, Dec 21, 2020 at 5:19 AM Steve Loughran 
> wrote:
>
> >
> >
> > On Fri, 18 Dec 2020 at 23:29, Konstantin Shvachko 
> > wrote:
> >
> >> Hey Steve,
> >>
> >> Thanks for the references. I was reading but still need to understand
> how
> >> exactly this applies to msync.
> >>
> >
> > mainly: pull it up and it becomes part of the broader API, so needs to be
> > specified in a way which can be understood by users and for implementors
> of
> > others stores: to give their own stores the same semantics.
> >
> > What does the HDFS one do?
> >
> >
> >
> >> Will come up with a plan and post it on a new jira.
> >> Will make sure to create it under HADOOP and ping Hadoop Common list for
> >> visibility.
> >>
> >>
> > thanks
> >
> >
> >> You are right about ViewFS. The impl should make sure it calls msync()
> on
> >> all mount points that enabled observer reads.
> >>
> >>
> > That's the kind of issue this process aims to resolve. Another is to
> > identify where we have HDFS-layer "quirks" and at least document them
> (e.g.
> > how hdfs streams are thread safe, rename isn't Posix, ...) and list what
> we
> > know breaks if you don't re-implement
> >
>


Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

2020-12-24 Thread Konstantin Shvachko
Hi Steve,

I created HDFS-15751 for documenting the msync API.
Would appreciate your suggestions.

Stay safe,
--Konstantin

On Mon, Dec 21, 2020 at 5:19 AM Steve Loughran  wrote:

>
>
> On Fri, 18 Dec 2020 at 23:29, Konstantin Shvachko 
> wrote:
>
>> Hey Steve,
>>
>> Thanks for the references. I was reading but still need to understand how
>> exactly this applies to msync.
>>
>
> mainly: pull it up and it becomes part of the broader API, so needs to be
> specified in a way which can be understood by users and for implementors of
> others stores: to give their own stores the same semantics.
>
> What does the HDFS one do?
>
>
>
>> Will come up with a plan and post it on a new jira.
>> Will make sure to create it under HADOOP and ping Hadoop Common list for
>> visibility.
>>
>>
> thanks
>
>
>> You are right about ViewFS. The impl should make sure it calls msync() on
>> all mount points that enabled observer reads.
>>
>>
> That's the kind of issue this process aims to resolve. Another is to
> identify where we have HDFS-layer "quirks" and at least document them (e.g.
> how hdfs streams are thread safe, rename isn't Posix, ...) and list what we
> know breaks if you don't re-implement
>


Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

2020-12-21 Thread Steve Loughran
On Fri, 18 Dec 2020 at 23:29, Konstantin Shvachko 
wrote:

> Hey Steve,
>
> Thanks for the references. I was reading but still need to understand how
> exactly this applies to msync.
>

Mainly: pull it up and it becomes part of the broader API, so it needs to be
specified in a way that can be understood by users and by implementors of
other stores, so that they can give their own stores the same semantics.

What does the HDFS one do?



> Will come up with a plan and post it on a new jira.
> Will make sure to create it under HADOOP and ping Hadoop Common list for
> visibility.
>
>
thanks


> You are right about ViewFS. The impl should make sure it calls msync() on
> all mount points that enabled observer reads.
>
>
That's the kind of issue this process aims to resolve. Another is to
identify where we have HDFS-layer "quirks" and at least document them (e.g.
how hdfs streams are thread safe, rename isn't POSIX, ...) and list what we
know breaks if you don't re-implement them.


Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

2020-12-18 Thread Konstantin Shvachko
Hey Steve,

Thanks for the references. I was reading but still need to understand how
exactly this applies to msync.
Will come up with a plan and post it on a new jira.
Will make sure to create it under HADOOP and ping Hadoop Common list for
visibility.

You are right about ViewFS. The impl should make sure it calls msync() on
all mount points that enabled observer reads.
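
A minimal sketch of that ViewFS behaviour, under stated assumptions: `MountedFs`, `ViewFsMsyncSketch` and `mount()` are hypothetical stand-ins written for illustration, not Hadoop classes, and the real implementation may look quite different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a mounted child file system; NOT the Hadoop FileSystem class.
interface MountedFs {
    String name();
    boolean observerReadsEnabled();
    void msync();
}

final class ViewFsMsyncSketch {

    // Calls msync() on every mount point that has observer reads enabled and
    // returns the names of the mounts that were synced, in mount order.
    static List<String> msyncAll(List<MountedFs> mounts) {
        List<String> synced = new ArrayList<>();
        for (MountedFs fs : mounts) {
            if (fs.observerReadsEnabled()) {
                fs.msync();
                synced.add(fs.name());
            }
        }
        return synced;
    }

    // Demo factory: builds a mount whose msync() does nothing.
    static MountedFs mount(final String name, final boolean observerReads) {
        return new MountedFs() {
            @Override public String name() { return name; }
            @Override public boolean observerReadsEnabled() { return observerReads; }
            @Override public void msync() { /* a real client would sync with its cluster here */ }
        };
    }

    public static void main(String[] args) {
        List<MountedFs> mounts = Arrays.asList(
                mount("hdfs-a", true), mount("object-store", false), mount("hdfs-b", true));
        System.out.println(msyncAll(mounts));
    }
}
```

Mounts without observer reads are skipped rather than treated as errors, which is one possible reading of "all mount points that enabled observer reads".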

--Konst

On Tue, Dec 15, 2020 at 11:15 AM Steve Loughran  wrote:

>
>
> On Sun, 13 Dec 2020 at 21:08, Konstantin Shvachko 
> wrote:
>
>> Hi Steve,
>>
>> I am not sure I fully understand what is broken here. It is not an
>> incompatible change, right?
>>
>
> The issue is that the FileSystem/FileContext APIs are something we have to
> maintain ~forever, so every API change needs to be
>
> - something we are happy with being there for the life of the class
> - defined strictly enough that people implementing other filesystems can
> re-implement without having to reverse-engineer HDFS and then conclude
> "that is what they meant to do". That's with a bit of
> - and with an AbstractFileSystemContractTest for implementors.
>
> That's it: define, specify, add a contract test rather than just something
> for HDFS.
>
>
>
>> Could you please explain what you think the process is.
>> Would be best if you could share a link to a document describing it.
>> I would be glad to follow up with tests and documentation that are needed.
>>
>>
> ideally,
> hadoop-common-project/hadoop-common/src/site/markdown/filesystem/extending.md
>
> Pulling up something from hdfs is different from saying "hey, new rename
> API!", but it's still time to actually define what it does so that not only
> can other people like me reimplement it, but to actually define it well
> enough that we can all see when there's a regression.
>
> Equally important: is there a way to test that it works?
>
> We've been using hasPathCapabilities() to probe for an FS having a given
> feature down a path; the idea is to let code check upfront for a feature
> before having to call it and catching an exception,
>
> We can add that for an API even if it has shipped already. For example,
> here is a PR to do exactly that for Syncable
> https://github.com/apache/hadoop/pull/2102
>
>
>
>> As you can see I proposed multiple solutions to the problem in the jira.
>> Seemed nobody was objecting, so I chose one and explained why.
>> I believe we call it lazy consensus.
>>
>
> I'm happy with lazy consensus, but can you involve more people? In
> particular. i was filed in an HDFS JIRA so it didn't surface in
> hadoop-common.
>
> If you'd done a HADOOP- JIRA "pull up msync into FileSystem API" or even
> just a note to hadoop-common saying "we need to do this" that would have
> been enough to start a discussion.
>
>  As it is I only noticed after some rebasing with a "hang on a minute.
> here's a new method who's behaviour doesn't seem to have defined other than
> 'whatever hdfs does'". Which, if you've ever tried to work out what
> rename() does, you'll recognise as danger.
>
> Anyway, to finish off, have a look at the extending.md doc and just add a
> new method definition in the filesystem.md saying what it is meant to do.
>
> Now: what about viewfs? maybe: For all mounted fileystems which declare
> their support, call msync()? Or just "call it and swallow all exceptions?"
>
>
>> Stay safe,
>>
>
>
> yeah: )
>
>> --Konstantin
>>
>
>>>


Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

2020-12-15 Thread Konstantin Shvachko
Hey Xiaoqiao,

HDFS-14272 was committed to all branches up to 2.10. The jira versions were
not updated properly.
I'll ping Chen for an update. He committed it in May.

Stay safe,
--Konstantin

On Tue, Dec 15, 2020 at 1:21 AM Xiaoqiao He  wrote:

> Hi All,
>
>
>> I'm just curious why this is included in the 3.2.2 release? HDFS-15567 is
>> tagged with 3.2.3 and the corresponding HDFS-14272 on server side is tagged
>> with 3.3.0.
>
>
> Have checked the fix version tag, I found there are 8 issues which do not
> include branch-3.2.2 correctly or both branch-3.2.2 and branch-3.2.3
> missed. And have updated them manually. Please have a look. Thanks.
> HADOOP-15691
> HDFS-15464
> HDFS-15478
> HDFS-15567
> HDFS-15574
> HDFS-15583
> HDFS-15628
> YARN-10430
>
> Regards,
> - He Xiaoqiao
>
> On Mon, Dec 14, 2020 at 5:08 AM Konstantin Shvachko 
> wrote:
>
>> Hi Steve,
>>
>> I am not sure I fully understand what is broken here. It is not an
>> incompatible change, right?
>> Could you please explain what you think the process is.
>> Would be best if you could share a link to a document describing it.
>> I would be glad to follow up with tests and documentation that are needed.
>>
>> As you can see I proposed multiple solutions to the problem in the jira.
>> Seemed nobody was objecting, so I chose one and explained why.
>> I believe we call it lazy consensus.
>>
>> Stay safe,
>> --Konstantin
>>
>> On Sun, Dec 13, 2020 at 10:22 AM Chao Sun  wrote:
>>
>>> > This is an API where it'd be ok to have a no-op if not implemented,
>>> correct? Or is there an requirement like Syncable that specific
>>> guarantees
>>> are met?
>>>
>>> Yes I think it's ok to leave it as no-op for other non-HDFS FS impls: it
>>> is
>>> only used by HDFS standby reads so far.
>>>
>>>
>>>
>>> On Sun, Dec 13, 2020 at 4:58 AM Steve Loughran 
>>> wrote:
>>>
>>> > This isn't worth holding up the RC. We'd just add something to the
>>> > release notes "use with caution". And if we can get what the API does
>>> > defined in a way which works, it shouldn't need changing.
>>> >
>>> > (which reminds me, I do need to check that RC out, don't I?)
>>> >
>>> > On Sun, 13 Dec 2020 at 09:00, Xiaoqiao He 
>>> wrote:
>>> >
>>> >> Thanks Steve very much for your discussion here.
>>> >>
>>> >> Leave some comments inline. Will focus on this thread to wait for the
>>> >> final
>>> >> conclusion to decide if we should prepare another release candidate of
>>> >> 3.2.2.
>>> >> Thanks Steve and Chao again for your warm discussions.
>>> >>
>>> >> On Sat, Dec 12, 2020 at 7:18 PM Steve Loughran
>>> >> 
>>> >> wrote:
>>> >>
>>> >> > Maybe it's not in the release; it's certainly in the 3.2 branch.
>>> Will
>>> >> check
>>> >> > further. If it's in the release I was thinking of adding a warning
>>> in
>>> >> the
>>> >> > notes "unstable API"; stable if invoked from DFSClient
>>> >>
>>> >> On Fri, 11 Dec 2020 at 18:21, Chao Sun  wrote:
>>> >> >
>>> >> > > I'm just curious why this is included in the 3.2.2 release?
>>> >> HDFS-15567 is
>>> >> > > tagged with 3.2.3 and the corresponding HDFS-14272 on server side
>>> is
>>> >> > tagged
>>> >> > > with 3.3.0.
>>> >> >
>>> >>
>>> >> Just checked that HDFS-15567 has been involved in Hadoop-3.2.2 RC4.
>>> IIRC,
>>> >> I
>>> >> have cut branch-3.2.2 in early October, at that time branch-3.2.3 has
>>> >> created but source code not freeze completely because several blocked
>>> >> issues reported and code freeze has done about mid October. Some
>>> issues
>>> >> which are tagged with 3.2.3 has also been involved in 3.2.2 during
>>> >> that period, include HDFS-15567. I will check them later, and make
>>> sure
>>> >> that we have mark the correct tags.
>>> >>
>>> >>
>>> >> > >
>>> >> > > > If it goes into FS/FC, what does it do for a viewfs with >1
>>> mounted
>>> >> > HDFS?
>>> >> > > Should it take path, msync(path) so that viewFS knows where to
>>> forward
>>> >> > it?
>>> >> > >
>>> >> > > The API shouldn't take any path - for viewFS I think it should
>>> call
>>> >> this
>>> >> > on
>>> >> > > all the child file systems. It might also need to handle the case
>>> >> where
>>> >> > > some downstream clusters support this capability while others
>>> don't.
>>> >> > >
>>> >> >
>>> >> > That's an extra bit of work for ViewFS then. It should probe for
>>> >> capability
>>> >> > and invoke as/when supported.
>>> >> >
>>> >> > >
>>> >> > > > Options
>>> >> > > 1. I roll HDFS-15567 back "please be follow process"
>>> >> > > 2. Someone does a followup patch with specification and contract
>>> test,
>>> >> > view
>>> >> > > FS. Add even more to the java
>>> >> > > 3. We do as per HADOOP-16898 into an MSyncable interface and then
>>> >> > > FileSystem & HDFS can implement. ViewFS and filterFS still need to
>>> >> pass
>>> >> > > through.
>>> >> > >
>>> >> > > I'm slightly in favor of the hasPathCapabilities approach and make
>>> >> this a
>>> >> > > mixin where FS impls can optionally support. Happy to hear what
>>> others
>>> >> > > think.
>>> >> > >

Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

2020-12-15 Thread Steve Loughran
On Sun, 13 Dec 2020 at 21:08, Konstantin Shvachko 
wrote:

> Hi Steve,
>
> I am not sure I fully understand what is broken here. It is not an
> incompatible change, right?
>

The issue is that the FileSystem/FileContext APIs are something we have to
maintain ~forever, so every API change needs to be

- something we are happy with being there for the life of the class
- defined strictly enough that people implementing other filesystems can
re-implement without having to reverse-engineer HDFS and then conclude
"that is what they meant to do". That's with a bit of
- and with an AbstractFileSystemContractTest for implementors.

That's it: define, specify, add a contract test rather than just something
for HDFS.



> Could you please explain what you think the process is.
> Would be best if you could share a link to a document describing it.
> I would be glad to follow up with tests and documentation that are needed.
>
>
ideally,
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/extending.md

Pulling up something from hdfs is different from saying "hey, new rename
API!", but it's still time to actually define what it does so that not only
can other people like me reimplement it, but to actually define it well
enough that we can all see when there's a regression.

Equally important: is there a way to test that it works?

We've been using hasPathCapabilities() to probe for an FS having a given
feature down a path; the idea is to let code check upfront for a feature
before having to call it and catch an exception.

We can add that for an API even if it has shipped already. For example,
here is a PR to do exactly that for Syncable
https://github.com/apache/hadoop/pull/2102
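
To make the probe-first idea concrete, here is a rough sketch. Everything in it is an assumption for illustration: `ProbedFs`, `FakeFs` and the capability string `example.capability.msync` are hypothetical names, not the Hadoop API, and the real probe takes a Hadoop `Path` rather than a `String`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ProbeSketch {

    // Hypothetical capability name, chosen only for this example.
    static final String MSYNC = "example.capability.msync";

    // Hypothetical stand-in for a file system that can be probed for features.
    interface ProbedFs {
        boolean hasCapability(String path, String capability);
        void msync();
    }

    // Probes first and calls msync() only when the store declares support.
    // Returns true if msync() was actually invoked.
    static boolean msyncIfSupported(ProbedFs fs, String path) {
        if (!fs.hasCapability(path, MSYNC)) {
            return false; // skipped: no exception to catch
        }
        fs.msync();
        return true;
    }

    // In-memory fake that records how often msync() was called.
    static final class FakeFs implements ProbedFs {
        private final Set<String> capabilities;
        int msyncCalls = 0;

        FakeFs(String... capabilities) {
            this.capabilities = new HashSet<>(Arrays.asList(capabilities));
        }

        @Override public boolean hasCapability(String path, String capability) {
            return capabilities.contains(capability);
        }

        @Override public void msync() { msyncCalls++; }
    }

    public static void main(String[] args) {
        FakeFs supported = new FakeFs(MSYNC);
        FakeFs unsupported = new FakeFs();
        System.out.println(msyncIfSupported(supported, "/data"));
        System.out.println(msyncIfSupported(unsupported, "/data"));
    }
}
```

The caller never has to catch an exception to find out that the feature is missing; it simply gets `false` back.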



> As you can see I proposed multiple solutions to the problem in the jira.
> Seemed nobody was objecting, so I chose one and explained why.
> I believe we call it lazy consensus.
>

I'm happy with lazy consensus, but can you involve more people? In
particular, it was filed in an HDFS JIRA, so it didn't surface in
hadoop-common.

If you'd done a HADOOP- JIRA "pull up msync into FileSystem API" or even
just a note to hadoop-common saying "we need to do this" that would have
been enough to start a discussion.

As it is, I only noticed after some rebasing, with a "hang on a minute,
here's a new method whose behaviour doesn't seem to have been defined other
than 'whatever hdfs does'". Which, if you've ever tried to work out what
rename() does, you'll recognise as danger.

Anyway, to finish off, have a look at the extending.md doc and just add a
new method definition in the filesystem.md saying what it is meant to do.

Now: what about viewfs? Maybe: for all mounted filesystems which declare
their support, call msync()? Or just "call it and swallow all exceptions"?


> Stay safe,
>


yeah: )

> --Konstantin
>

>>


Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

2020-12-15 Thread Xiaoqiao He
Hi All,


> I'm just curious why this is included in the 3.2.2 release? HDFS-15567 is
> tagged with 3.2.3 and the corresponding HDFS-14272 on server side is tagged
> with 3.3.0.


I have checked the fix-version tags and found 8 issues that either do not
include branch-3.2.2 correctly or are missing both branch-3.2.2 and
branch-3.2.3. I have updated them manually. Please have a look. Thanks.
HADOOP-15691
HDFS-15464
HDFS-15478
HDFS-15567
HDFS-15574
HDFS-15583
HDFS-15628
YARN-10430

Regards,
- He Xiaoqiao

On Mon, Dec 14, 2020 at 5:08 AM Konstantin Shvachko 
wrote:

> Hi Steve,
>
> I am not sure I fully understand what is broken here. It is not an
> incompatible change, right?
> Could you please explain what you think the process is.
> Would be best if you could share a link to a document describing it.
> I would be glad to follow up with tests and documentation that are needed.
>
> As you can see I proposed multiple solutions to the problem in the jira.
> Seemed nobody was objecting, so I chose one and explained why.
> I believe we call it lazy consensus.
>
> Stay safe,
> --Konstantin
>
> On Sun, Dec 13, 2020 at 10:22 AM Chao Sun  wrote:
>
>> > This is an API where it'd be ok to have a no-op if not implemented,
>> correct? Or is there an requirement like Syncable that specific guarantees
>> are met?
>>
>> Yes I think it's ok to leave it as no-op for other non-HDFS FS impls: it
>> is
>> only used by HDFS standby reads so far.
>>
>>
>>
>> On Sun, Dec 13, 2020 at 4:58 AM Steve Loughran 
>> wrote:
>>
>> > This isn't worth holding up the RC. We'd just add something to the
>> > release notes "use with caution". And if we can get what the API does
>> > defined in a way which works, it shouldn't need changing.
>> >
>> > (which reminds me, I do need to check that RC out, don't I?)
>> >
>> > On Sun, 13 Dec 2020 at 09:00, Xiaoqiao He 
>> wrote:
>> >
>> >> Thanks Steve very much for your discussion here.
>> >>
>> >> Leave some comments inline. Will focus on this thread to wait for the
>> >> final
>> >> conclusion to decide if we should prepare another release candidate of
>> >> 3.2.2.
>> >> Thanks Steve and Chao again for your warm discussions.
>> >>
>> >> On Sat, Dec 12, 2020 at 7:18 PM Steve Loughran
>> >> 
>> >> wrote:
>> >>
>> >> > Maybe it's not in the release; it's certainly in the 3.2 branch. Will
>> >> check
>> >> > further. If it's in the release I was thinking of adding a warning in
>> >> the
>> >> > notes "unstable API"; stable if invoked from DFSClient
>> >>
>> >> On Fri, 11 Dec 2020 at 18:21, Chao Sun  wrote:
>> >> >
>> >> > > I'm just curious why this is included in the 3.2.2 release?
>> >> HDFS-15567 is
>> >> > > tagged with 3.2.3 and the corresponding HDFS-14272 on server side
>> is
>> >> > tagged
>> >> > > with 3.3.0.
>> >> >
>> >>
>> >> Just checked that HDFS-15567 has been involved in Hadoop-3.2.2 RC4.
>> IIRC,
>> >> I
>> >> have cut branch-3.2.2 in early October, at that time branch-3.2.3 has
>> >> created but source code not freeze completely because several blocked
>> >> issues reported and code freeze has done about mid October. Some issues
>> >> which are tagged with 3.2.3 has also been involved in 3.2.2 during
>> >> that period, include HDFS-15567. I will check them later, and make sure
>> >> that we have mark the correct tags.
>> >>
>> >>
>> >> > >
>> >> > > > If it goes into FS/FC, what does it do for a viewfs with >1
>> mounted
>> >> > HDFS?
>> >> > > Should it take path, msync(path) so that viewFS knows where to
>> forward
>> >> > it?
>> >> > >
>> >> > > The API shouldn't take any path - for viewFS I think it should call
>> >> this
>> >> > on
>> >> > > all the child file systems. It might also need to handle the case
>> >> where
>> >> > > some downstream clusters support this capability while others
>> don't.
>> >> > >
>> >> >
>> >> > That's an extra bit of work for ViewFS then. It should probe for
>> >> capability
>> >> > and invoke as/when supported.
>> >> >
>> >> > >
>> >> > > > Options
>> >> > > 1. I roll HDFS-15567 back "please be follow process"
>> >> > > 2. Someone does a followup patch with specification and contract
>> test,
>> >> > view
>> >> > > FS. Add even more to the java
>> >> > > 3. We do as per HADOOP-16898 into an MSyncable interface and then
>> >> > > FileSystem & HDFS can implement. ViewFS and filterFS still need to
>> >> pass
>> >> > > through.
>> >> > >
>> >> > > I'm slightly in favor of the hasPathCapabilities approach and make
>> >> this a
>> >> > > mixin where FS impls can optionally support. Happy to hear what
>> others
>> >> > > think.
>> >> > >
>> >> >
>> >> > Mixins are great when FC and FS can both implement; makes it easier
>> to
>> >> code
>> >> > against either. All the filtering/aggregating FS's will have to
>> >> implement
>> >> > it, which means that presence of the interface doesn't guarantee
>> >> support.
>> >> >
>> >> > This is an API where it'd be ok to have a no-op if not implemented,
>> >> > correct? Or is there an requirement like Syncable that 

Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

2020-12-13 Thread Konstantin Shvachko
Hi Steve,

I am not sure I fully understand what is broken here. It is not an
incompatible change, right?
Could you please explain what you think the process is.
Would be best if you could share a link to a document describing it.
I would be glad to follow up with tests and documentation that are needed.

As you can see I proposed multiple solutions to the problem in the jira.
Seemed nobody was objecting, so I chose one and explained why.
I believe we call it lazy consensus.

Stay safe,
--Konstantin

On Sun, Dec 13, 2020 at 10:22 AM Chao Sun  wrote:

> > This is an API where it'd be ok to have a no-op if not implemented,
> correct? Or is there an requirement like Syncable that specific guarantees
> are met?
>
> Yes I think it's ok to leave it as no-op for other non-HDFS FS impls: it is
> only used by HDFS standby reads so far.
>
>
>
> On Sun, Dec 13, 2020 at 4:58 AM Steve Loughran 
> wrote:
>
> > This isn't worth holding up the RC. We'd just add something to the
> > release notes "use with caution". And if we can get what the API does
> > defined in a way which works, it shouldn't need changing.
> >
> > (which reminds me, I do need to check that RC out, don't I?)
> >
> > On Sun, 13 Dec 2020 at 09:00, Xiaoqiao He  wrote:
> >
> >> Thanks Steve very much for your discussion here.
> >>
> >> Leave some comments inline. Will focus on this thread to wait for the
> >> final
> >> conclusion to decide if we should prepare another release candidate of
> >> 3.2.2.
> >> Thanks Steve and Chao again for your warm discussions.
> >>
> >> On Sat, Dec 12, 2020 at 7:18 PM Steve Loughran
> >> 
> >> wrote:
> >>
> >> > Maybe it's not in the release; it's certainly in the 3.2 branch. Will
> >> check
> >> > further. If it's in the release I was thinking of adding a warning in
> >> the
> >> > notes "unstable API"; stable if invoked from DFSClient
> >>
> >> On Fri, 11 Dec 2020 at 18:21, Chao Sun  wrote:
> >> >
> >> > > I'm just curious why this is included in the 3.2.2 release?
> >> HDFS-15567 is
> >> > > tagged with 3.2.3 and the corresponding HDFS-14272 on server side is
> >> > tagged
> >> > > with 3.3.0.
> >> >
> >>
> >> Just checked that HDFS-15567 has been involved in Hadoop-3.2.2 RC4.
> IIRC,
> >> I
> >> have cut branch-3.2.2 in early October, at that time branch-3.2.3 has
> >> created but source code not freeze completely because several blocked
> >> issues reported and code freeze has done about mid October. Some issues
> >> which are tagged with 3.2.3 has also been involved in 3.2.2 during
> >> that period, include HDFS-15567. I will check them later, and make sure
> >> that we have mark the correct tags.
> >>
> >>
> >> > >
> >> > > > If it goes into FS/FC, what does it do for a viewfs with >1
> mounted
> >> > HDFS?
> >> > > Should it take path, msync(path) so that viewFS knows where to
> forward
> >> > it?
> >> > >
> >> > > The API shouldn't take any path - for viewFS I think it should call
> >> this
> >> > on
> >> > > all the child file systems. It might also need to handle the case
> >> where
> >> > > some downstream clusters support this capability while others don't.
> >> > >
> >> >
> >> > That's an extra bit of work for ViewFS then. It should probe for
> >> capability
> >> > and invoke as/when supported.
> >> >
> >> > >
> >> > > > Options
> >> > > 1. I roll HDFS-15567 back "please be follow process"
> >> > > 2. Someone does a followup patch with specification and contract
> test,
> >> > view
> >> > > FS. Add even more to the java
> >> > > 3. We do as per HADOOP-16898 into an MSyncable interface and then
> >> > > FileSystem & HDFS can implement. ViewFS and filterFS still need to
> >> pass
> >> > > through.
> >> > >
> >> > > I'm slightly in favor of the hasPathCapabilities approach and make
> >> this a
> >> > > mixin where FS impls can optionally support. Happy to hear what
> others
> >> > > think.
> >> > >
> >> >
> >> > Mixins are great when FC and FS can both implement; makes it easier to
> >> code
> >> > against either. All the filtering/aggregating FS's will have to
> >> implement
> >> > it, which means that presence of the interface doesn't guarantee
> >> support.
> >> >
> >> > This is an API where it'd be ok to have a no-op if not implemented,
> >> > correct? Or is there an requirement like Syncable that specific
> >> guarantees
> >> > are met?
> >> >
> >> > >
> >> > > Chao
> >> > >
> >> > >
> >> > > On Fri, Dec 11, 2020 at 9:00 AM Steve Loughran
> >> >  >> > > >
> >> > > wrote:
> >> > >
> >> > > > Silence from the  HDFS team
> >> > > >
> >> > > >
> >> > > > Hadoop 3.2.2 is in an RC; it has the new FS API call. I really
> don't
> >> > want
> >> > > > to veto the release just because someone pulled up a method
> without
> >> > doing
> >> > > > the due diligence.
> >> >
> >>
> >> Thanks Steve started this discussion here. I agree to roll back
> HDFS-15567
> >> if there are still some incompatible issues not resolved completely. And
> >> release will not be the blocked things here, I would like to prepare
> >> 

Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

2020-12-13 Thread Chao Sun
> This is an API where it'd be ok to have a no-op if not implemented,
> correct? Or is there a requirement like Syncable that specific guarantees
> are met?

Yes I think it's ok to leave it as no-op for other non-HDFS FS impls: it is
only used by HDFS standby reads so far.
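
For illustration only, the no-op option could be sketched like this. The class names are hypothetical and are not the Hadoop FileSystem hierarchy; `syncCount` is just a stand-in for whatever a real store would do when syncing.

```java
// Hypothetical base class: msync() does nothing unless a store overrides it.
abstract class BaseFsSketch {
    void msync() {
        // default no-op for stores that have nothing to synchronise
    }
}

// A store without standby/observer reads simply inherits the no-op.
final class PlainStoreSketch extends BaseFsSketch {
}

// A store that does support the call overrides it; the counter stands in
// for the real synchronisation work.
final class ObserverStoreSketch extends BaseFsSketch {
    int syncCount = 0;

    @Override
    void msync() {
        syncCount++;
    }
}

final class NoOpMsyncDemo {
    public static void main(String[] args) {
        BaseFsSketch plain = new PlainStoreSketch();
        ObserverStoreSketch observer = new ObserverStoreSketch();
        plain.msync();    // silently does nothing
        observer.msync(); // performs the (simulated) sync
        System.out.println(observer.syncCount);
    }
}
```

One consequence of this design is that a caller cannot tell from the call itself whether anything was synchronised, which is where a capability probe would help.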



On Sun, Dec 13, 2020 at 4:58 AM Steve Loughran  wrote:

> This isn't worth holding up the RC. We'd just add something to the
> release notes "use with caution". And if we can get what the API does
> defined in a way which works, it shouldn't need changing.
>
> (which reminds me, I do need to check that RC out, don't I?)
>
> On Sun, 13 Dec 2020 at 09:00, Xiaoqiao He  wrote:
>
>> Thanks Steve very much for your discussion here.
>>
>> Leave some comments inline. Will focus on this thread to wait for the
>> final
>> conclusion to decide if we should prepare another release candidate of
>> 3.2.2.
>> Thanks Steve and Chao again for your warm discussions.
>>
>> On Sat, Dec 12, 2020 at 7:18 PM Steve Loughran
>> 
>> wrote:
>>
>> > Maybe it's not in the release; it's certainly in the 3.2 branch. Will
>> check
>> > further. If it's in the release I was thinking of adding a warning in
>> the
>> > notes "unstable API"; stable if invoked from DFSClient
>>
>> On Fri, 11 Dec 2020 at 18:21, Chao Sun  wrote:
>> >
>> > > I'm just curious why this is included in the 3.2.2 release?
>> HDFS-15567 is
>> > > tagged with 3.2.3 and the corresponding HDFS-14272 on server side is
>> > tagged
>> > > with 3.3.0.
>> >
>>
>> Just checked that HDFS-15567 has been involved in Hadoop-3.2.2 RC4. IIRC,
>> I
>> have cut branch-3.2.2 in early October, at that time branch-3.2.3 has
>> created but source code not freeze completely because several blocked
>> issues reported and code freeze has done about mid October. Some issues
>> which are tagged with 3.2.3 has also been involved in 3.2.2 during
>> that period, include HDFS-15567. I will check them later, and make sure
>> that we have mark the correct tags.
>>
>>
>> > >
>> > > > If it goes into FS/FC, what does it do for a viewfs with >1 mounted
>> > HDFS?
>> > > Should it take path, msync(path) so that viewFS knows where to forward
>> > it?
>> > >
>> > > The API shouldn't take any path - for viewFS I think it should call
>> this
>> > on
>> > > all the child file systems. It might also need to handle the case
>> where
>> > > some downstream clusters support this capability while others don't.
>> > >
>> >
>> > That's an extra bit of work for ViewFS then. It should probe for
>> capability
>> > and invoke as/when supported.
>> >
>> > >
>> > > > Options
>> > > 1. I roll HDFS-15567 back "please be follow process"
>> > > 2. Someone does a followup patch with specification and contract test,
>> > view
>> > > FS. Add even more to the java
>> > > 3. We do as per HADOOP-16898 into an MSyncable interface and then
>> > > FileSystem & HDFS can implement. ViewFS and filterFS still need to
>> pass
>> > > through.
>> > >
>> > > I'm slightly in favor of the hasPathCapabilities approach and make
>> this a
>> > > mixin where FS impls can optionally support. Happy to hear what others
>> > > think.
>> > >
>> >
>> > Mixins are great when FC and FS can both implement; makes it easier to
>> code
>> > against either. All the filtering/aggregating FS's will have to
>> implement
>> > it, which means that presence of the interface doesn't guarantee
>> support.
>> >
>> > This is an API where it'd be ok to have a no-op if not implemented,
>> > correct? Or is there an requirement like Syncable that specific
>> guarantees
>> > are met?
>> >
>> > >
>> > > Chao
>> > >
>> > >
>> > > On Fri, Dec 11, 2020 at 9:00 AM Steve Loughran wrote:
>> > >
>> > > > Silence from the  HDFS team
>> > > >
>> > > >
>> > > > Hadoop 3.2.2 is in an RC; it has the new FS API call. I really don't
>> > want
>> > > > to veto the release just because someone pulled up a method without
>> > doing
>> > > > the due diligence.
>> >
>>
>> Thanks Steve started this discussion here. I agree to roll back HDFS-15567
>> if there are still some incompatible issues not resolved completely. And
>> release will not be the blocked things here, I would like to prepare
>> another RC if we would reach common agreement. To be honest, I think it is
>> better to involve Shvachko here.
>>
>>
>> > > > Is anyone in the HDFS going to do that due diligence or should we
>> > include
>> > > > something in the release notes "msync()" must be considered
>> unstable.
>> > > >
>> > > > Then we can do a proper msync().
>> > > >
>> > > > If it goes into FS/FC, what does it do for a viewfs with >1 mounted
>> > HDFS?
>> > > > Should it take path, msync(path) so that viewFS knows where to
>> forward
>> > > it?
>> > > >
>> > > > Alternatively: go with an MSync interface which those few FS which
>> > > > implement it (hdfs) can do that, and the fact that it doesn't have
>> doc
>> > or
>> > > > tests won't be a blocker any more?
>> > > >
>> > > > -steve
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Thu, 

Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

2020-12-13 Thread Steve Loughran
This isn't worth holding up the RC. We'd just add something to the
release notes "use with caution". And if we can get what the API does
defined in a way which works, it shouldn't need changing.

(which reminds me, I do need to check that RC out, don't I?)

On Sun, 13 Dec 2020 at 09:00, Xiaoqiao He  wrote:

> Thanks Steve very much for your discussion here.
>
> Leave some comments inline. Will focus on this thread to wait for the final
> conclusion to decide if we should prepare another release candidate of
> 3.2.2.
> Thanks Steve and Chao again for your warm discussions.
>
> On Sat, Dec 12, 2020 at 7:18 PM Steve Loughran wrote:
>
> > Maybe it's not in the release; it's certainly in the 3.2 branch. Will
> check
> > further. If it's in the release I was thinking of adding a warning in the
> > notes "unstable API"; stable if invoked from DFSClient
>
> On Fri, 11 Dec 2020 at 18:21, Chao Sun  wrote:
> >
> > > I'm just curious why this is included in the 3.2.2 release? HDFS-15567
> is
> > > tagged with 3.2.3 and the corresponding HDFS-14272 on server side is
> > tagged
> > > with 3.3.0.
> >
>
> Just checked that HDFS-15567 has been involved in Hadoop-3.2.2 RC4. IIRC, I
> have cut branch-3.2.2 in early October, at that time branch-3.2.3 has
> created but source code not freeze completely because several blocked
> issues reported and code freeze has done about mid October. Some issues
> which are tagged with 3.2.3 has also been involved in 3.2.2 during
> that period, include HDFS-15567. I will check them later, and make sure
> that we have mark the correct tags.
>
>
> > >
> > > > If it goes into FS/FC, what does it do for a viewfs with >1 mounted
> > HDFS?
> > > Should it take path, msync(path) so that viewFS knows where to forward
> > it?
> > >
> > > The API shouldn't take any path - for viewFS I think it should call
> this
> > on
> > > all the child file systems. It might also need to handle the case where
> > > some downstream clusters support this capability while others don't.
> > >
> >
> > That's an extra bit of work for ViewFS then. It should probe for
> capability
> > and invoke as/when supported.
> >
> > >
> > > > Options
> > > 1. I roll HDFS-15567 back "please be follow process"
> > > 2. Someone does a followup patch with specification and contract test,
> > view
> > > FS. Add even more to the java
> > > 3. We do as per HADOOP-16898 into an MSyncable interface and then
> > > FileSystem & HDFS can implement. ViewFS and filterFS still need to pass
> > > through.
> > >
> > > I'm slightly in favor of the hasPathCapabilities approach and make
> this a
> > > mixin where FS impls can optionally support. Happy to hear what others
> > > think.
> > >
> >
> > Mixins are great when FC and FS can both implement; makes it easier to
> code
> > against either. All the filtering/aggregating FS's will have to implement
> > it, which means that presence of the interface doesn't guarantee support.
> >
> > This is an API where it'd be ok to have a no-op if not implemented,
> > correct? Or is there an requirement like Syncable that specific
> guarantees
> > are met?
> >
> > >
> > > Chao
> > >
> > >
> > > > On Fri, Dec 11, 2020 at 9:00 AM Steve Loughran wrote:
> > >
> > > > Silence from the  HDFS team
> > > >
> > > >
> > > > Hadoop 3.2.2 is in an RC; it has the new FS API call. I really don't
> > want
> > > > to veto the release just because someone pulled up a method without
> > doing
> > > > the due diligence.
> >
>
> Thanks Steve started this discussion here. I agree to roll back HDFS-15567
> if there are still some incompatible issues not resolved completely. And
> release will not be the blocked things here, I would like to prepare
> another RC if we would reach common agreement. To be honest, I think it is
> better to involve Shvachko here.
>
>
> > > > Is anyone in the HDFS going to do that due diligence or should we
> > include
> > > > something in the release notes "msync()" must be considered unstable.
> > > >
> > > > Then we can do a proper msync().
> > > >
> > > > If it goes into FS/FC, what does it do for a viewfs with >1 mounted
> > HDFS?
> > > > Should it take path, msync(path) so that viewFS knows where to
> forward
> > > it?
> > > >
> > > > Alternatively: go with an MSync interface which those few FS which
> > > > implement it (hdfs) can do that, and the fact that it doesn't have
> doc
> > or
> > > > tests won't be a blocker any more?
> > > >
> > > > -steve
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, 10 Dec 2020 at 12:41, Steve Loughran 
> > > wrote:
> > > >
> > > > >
> > > > > Gosh, has it really been only since february since I last asked the
> > > HDFS
> > > > > dev list to stop adding anything to FileSystem/FileContext APIs
> > without
> > > > >
> > > > > * mentioning this on the hadoop-common list.
> > > > > * specifying what it does in filesystem.md
> > > > > * with a contract test
> > > > > * a new hasPathCapabilities probe. Throwing
> > > UnsupportedOperationException
> > > > > 

Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

2020-12-13 Thread Xiaoqiao He
Thanks very much, Steve, for the discussion here.

I have left some comments inline. I will follow this thread and wait for the
final conclusion before deciding whether to prepare another release candidate
for 3.2.2.
Thanks again, Steve and Chao, for the warm discussion.

On Sat, Dec 12, 2020 at 7:18 PM Steve Loughran 
wrote:

> Maybe it's not in the release; it's certainly in the 3.2 branch. Will check
> further. If it's in the release I was thinking of adding a warning in the
> notes "unstable API"; stable if invoked from DFSClient

On Fri, 11 Dec 2020 at 18:21, Chao Sun  wrote:
>
> > I'm just curious why this is included in the 3.2.2 release? HDFS-15567 is
> > tagged with 3.2.3 and the corresponding HDFS-14272 on server side is
> tagged
> > with 3.3.0.
>

I just checked: HDFS-15567 is included in Hadoop 3.2.2 RC4. IIRC, I cut
branch-3.2.2 in early October. At that time branch-3.2.3 had been created, but
the source code was not completely frozen because several blocking issues had
been reported; the code freeze was done around mid-October. Some issues tagged
with 3.2.3 were therefore also included in 3.2.2 during that period, including
HDFS-15567. I will check them later and make sure that they are marked with
the correct tags.


> >
> > > If it goes into FS/FC, what does it do for a viewfs with >1 mounted
> HDFS?
> > Should it take path, msync(path) so that viewFS knows where to forward
> it?
> >
> > The API shouldn't take any path - for viewFS I think it should call this
> on
> > all the child file systems. It might also need to handle the case where
> > some downstream clusters support this capability while others don't.
> >
>
> That's an extra bit of work for ViewFS then. It should probe for capability
> and invoke as/when supported.
>
> >
> > > Options
> > 1. I roll HDFS-15567 back "please be follow process"
> > 2. Someone does a followup patch with specification and contract test,
> view
> > FS. Add even more to the java
> > 3. We do as per HADOOP-16898 into an MSyncable interface and then
> > FileSystem & HDFS can implement. ViewFS and filterFS still need to pass
> > through.
> >
> > I'm slightly in favor of the hasPathCapabilities approach and make this a
> > mixin where FS impls can optionally support. Happy to hear what others
> > think.
> >
>
> Mixins are great when FC and FS can both implement; makes it easier to code
> against either. All the filtering/aggregating FS's will have to implement
> it, which means that presence of the interface doesn't guarantee support.
>
> This is an API where it'd be ok to have a no-op if not implemented,
> correct? Or is there an requirement like Syncable that specific guarantees
> are met?
>
> >
> > Chao
> >
> >
> > On Fri, Dec 11, 2020 at 9:00 AM Steve Loughran wrote:
> >
> > > Silence from the  HDFS team
> > >
> > >
> > > Hadoop 3.2.2 is in an RC; it has the new FS API call. I really don't
> want
> > > to veto the release just because someone pulled up a method without
> doing
> > > the due diligence.
>

Thanks, Steve, for starting this discussion. I agree to roll back HDFS-15567
if there are still incompatibility issues that are not completely resolved.
The release should not be the blocker here; I am happy to prepare another RC
once we reach agreement. To be honest, I think it would be better to involve
Shvachko here.


> > > Is anyone in the HDFS going to do that due diligence or should we
> include
> > > something in the release notes "msync()" must be considered unstable.
> > >
> > > Then we can do a proper msync().
> > >
> > > If it goes into FS/FC, what does it do for a viewfs with >1 mounted
> HDFS?
> > > Should it take path, msync(path) so that viewFS knows where to forward
> > it?
> > >
> > > Alternatively: go with an MSync interface which those few FS which
> > > implement it (hdfs) can do that, and the fact that it doesn't have doc
> or
> > > tests won't be a blocker any more?
> > >
> > > -steve
> > >
> > >
> > >
> > >
> > > On Thu, 10 Dec 2020 at 12:41, Steve Loughran 
> > wrote:
> > >
> > > >
> > > > Gosh, has it really been only since february since I last asked the
> > HDFS
> > > > dev list to stop adding anything to FileSystem/FileContext APIs
> without
> > > >
> > > > * mentioning this on the hadoop-common list.
> > > > * specifying what it does in filesystem.md
> > > > * with a contract test
> > > > * a new hasPathCapabilities probe. Throwing
> > UnsupportedOperationException
> > > > only lets people work out if it is unsupported through invocation.
> > Being
> > > > able to probe for it is better.
> > > > * ViewFS support.
> > > > * And, for any new API, one which works well for high-latency object
> > > > stores: returning Future and
> > Future
> > > > when > 1 result is returned
> > > >
> > > > This needs to hold even for pulling something up from HDFS. Because
> if
> > > > another FS wants to implement it, they need to know what it does, and
> > > have
> > > > tests to verify this. I say this as someone who has tried to document
> > > HDFS
> > > > 

Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

2020-12-12 Thread Steve Loughran
Maybe it's not in the release; it's certainly in the 3.2 branch. Will check
further. If it's in the release I was thinking of adding a warning in the
notes "unstable API"; stable if invoked from DFSClient

On Fri, 11 Dec 2020 at 18:21, Chao Sun  wrote:

> I'm just curious why this is included in the 3.2.2 release? HDFS-15567 is
> tagged with 3.2.3 and the corresponding HDFS-14272 on server side is tagged
> with 3.3.0.
>
> > If it goes into FS/FC, what does it do for a viewfs with >1 mounted HDFS?
> Should it take path, msync(path) so that viewFS knows where to forward it?
>
> The API shouldn't take any path - for viewFS I think it should call this on
> all the child file systems. It might also need to handle the case where
> some downstream clusters support this capability while others don't.
>

That's an extra bit of work for ViewFS then. It should probe for capability
and invoke as/when supported.
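
A minimal sketch of that probe-and-invoke behaviour, assuming a path-less
msync() as Chao suggests. Every type here (ChildFs, SyncingFs, NonSyncingFs,
MountTableFs) is a hypothetical stand-in invented for illustration; none of
them is a Hadoop class, and supportsMsync() only plays the role of the
capability probe discussed in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a mounted child file system; not a Hadoop type. */
interface ChildFs {
    /** Capability probe: true if this file system supports msync. */
    boolean supportsMsync();

    /** Synchronise client metadata state; throws if unsupported. */
    void msync();
}

/** Child that supports msync and counts invocations for illustration. */
final class SyncingFs implements ChildFs {
    int calls = 0;

    @Override public boolean supportsMsync() { return true; }

    @Override public void msync() { calls++; }
}

/** Child without msync support, e.g. a store with no such notion. */
final class NonSyncingFs implements ChildFs {
    @Override public boolean supportsMsync() { return false; }

    @Override public void msync() {
        throw new UnsupportedOperationException("msync");
    }
}

/** Hypothetical aggregating file system holding a mount table. */
final class MountTableFs {
    private final Map<String, ChildFs> mounts = new LinkedHashMap<>();

    void mount(String mountPoint, ChildFs fs) {
        mounts.put(mountPoint, fs);
    }

    /** Path-less msync: probe each child, invoke only where supported. */
    List<String> msync() {
        List<String> synced = new ArrayList<>();
        for (Map.Entry<String, ChildFs> entry : mounts.entrySet()) {
            ChildFs child = entry.getValue();
            if (child.supportsMsync()) {
                child.msync();
                synced.add(entry.getKey());
            }
        }
        return synced;
    }
}
```

Returning the list of mount points that were actually synced is one design
choice among several; a real implementation might instead log, or fail, when
some children lack the capability.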




>
> > Options
> 1. I roll HDFS-15567 back "please be follow process"
> 2. Someone does a followup patch with specification and contract test, view
> FS. Add even more to the java
> 3. We do as per HADOOP-16898 into an MSyncable interface and then
> FileSystem & HDFS can implement. ViewFS and filterFS still need to pass
> through.
>
> I'm slightly in favor of the hasPathCapabilities approach and make this a
> mixin where FS impls can optionally support. Happy to hear what others
> think.
>

Mixins are great when FC and FS can both implement; makes it easier to code
against either. All the filtering/aggregating FS's will have to implement
it, which means that presence of the interface doesn't guarantee support.

This is an API where it'd be ok to have a no-op if not implemented,
correct? Or is there a requirement like Syncable that specific guarantees
are met?
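
To make the "presence of the interface doesn't guarantee support" point
concrete, here is a small sketch. It assumes a mixin shaped like the MSyncable
idea from option 3 plus a no-op fallback; the no-op is an assumption, since
the question above is still open. MSyncable, PlainFs, SyncCapableFs and
PassThroughFs are hypothetical names, not existing Hadoop types.

```java
/** Hypothetical mixin modelled on the "MSyncable" idea in this thread. */
interface MSyncable {
    /** True only if a call to msync() will actually synchronise state. */
    boolean msyncSupported();

    /** Synchronise client state with the store; a no-op when unsupported. */
    void msync();
}

/** Hypothetical base class standing in for a file system without msync. */
class PlainFs { }

/** Hypothetical store with real msync support; counts calls for illustration. */
final class SyncCapableFs extends PlainFs implements MSyncable {
    int calls = 0;

    @Override public boolean msyncSupported() { return true; }

    @Override public void msync() { calls++; }
}

/** Filtering wrapper: it always implements the mixin, whatever it wraps. */
final class PassThroughFs extends PlainFs implements MSyncable {
    private final PlainFs inner;

    PassThroughFs(PlainFs inner) {
        this.inner = inner;
    }

    /** Support depends on the wrapped file system, not on this class. */
    @Override public boolean msyncSupported() {
        return inner instanceof MSyncable
            && ((MSyncable) inner).msyncSupported();
    }

    @Override public void msync() {
        if (msyncSupported()) {
            ((MSyncable) inner).msync();
        }
        // Otherwise deliberately a no-op (assumed behaviour, see lead-in).
    }
}
```

A PassThroughFs wrapping a PlainFs passes an `instanceof MSyncable` check yet
reports msyncSupported() == false, which is why callers would need the probe
rather than the type test.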


>
> Chao
>
>
>
> On Fri, Dec 11, 2020 at 9:00 AM Steve Loughran wrote:
>
> > Silence from the  HDFS team
> >
> >
> > Hadoop 3.2.2 is in an RC; it has the new FS API call. I really don't want
> > to veto the release just because someone pulled up a method without doing
> > the due diligence.
> >
> > Is anyone in the HDFS going to do that due diligence or should we include
> > something in the release notes "msync()" must be considered unstable.
> >
> > Then we can do a proper msync().
> >
> > If it goes into FS/FC, what does it do for a viewfs with >1 mounted HDFS?
> > Should it take path, msync(path) so that viewFS knows where to forward
> it?
> >
> > Alternatively: go with an MSync interface which those few FS which
> > implement it (hdfs) can do that, and the fact that it doesn't have doc or
> > tests won't be a blocker any more?
> >
> > -steve
> >
> >
> >
> >
> > On Thu, 10 Dec 2020 at 12:41, Steve Loughran 
> wrote:
> >
> > >
> > > Gosh, has it really been only since february since I last asked the
> HDFS
> > > dev list to stop adding anything to FileSystem/FileContext APIs without
> > >
> > > * mentioning this on the hadoop-common list.
> > > * specifying what it does in filesystem.md
> > > * with a contract test
> > > * a new hasPathCapabilities probe. Throwing
> UnsupportedOperationException
> > > only lets people work out if it is unsupported through invocation.
> Being
> > > able to probe for it is better.
> > > * ViewFS support.
> > > * And, for any new API, one which works well for high-latency object
> > > stores: returning Future and
> Future
> > > when > 1 result is returned
> > >
> > > This needs to hold even for pulling something up from HDFS. Because if
> > > another FS wants to implement it, they need to know what it does, and
> > have
> > > tests to verify this. I say this as someone who has tried to document
> > HDFS
> > > rename() semantics and gave up.
> > >
> > > It's really frustrating that every time someone does an FS API change
> > like
> > > this in the past (most recently HDFS-13616) I am the one who has to
> keep
> > > sending the reminders out, and then having to try and clean up/.
> > >
> > > So what now?
> > >
> > > Options
> > > 1. I roll HDFS-15567 back "please be follow process"
> > > 2. Someone does a followup patch with specification and contract test,
> > > view FS. Add even more to the java
> > > 3. We do as per HADOOP-16898 into an MSyncable interface and then
> > > FileSystem & HDFS can implement. ViewFS and filterFS still need to pass
> > > through.
> > >
> > > *If nobody is going to volunteer for the specification/test changes,
> I'm
> > > happy for the rollback. It'll remind people about process, *
> > >
> > > Pre-emptive Warning: No matter what we do for this patch, I will roll
> > back
> > > the next change which adds a new API if it's not accompanied by
> > > specification and tests.
> > >
> > > Unhappily yours,
> > >
> > > Steve
> > >
> >
>


Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

2020-12-11 Thread Chao Sun
I'm just curious why this is included in the 3.2.2 release? HDFS-15567 is
tagged with 3.2.3 and the corresponding HDFS-14272 on server side is tagged
with 3.3.0.

> If it goes into FS/FC, what does it do for a viewfs with >1 mounted HDFS?
Should it take path, msync(path) so that viewFS knows where to forward it?

The API shouldn't take any path - for viewFS I think it should call this on
all the child file systems. It might also need to handle the case where
some downstream clusters support this capability while others don't.

> Options
1. I roll HDFS-15567 back "please be follow process"
2. Someone does a followup patch with specification and contract test, view
FS. Add even more to the java
3. We do as per HADOOP-16898 into an MSyncable interface and then
FileSystem & HDFS can implement. ViewFS and filterFS still need to pass
through.

I'm slightly in favor of the hasPathCapabilities approach, making this a
mixin that FS impls can optionally support. Happy to hear what others
think.

Chao



On Fri, Dec 11, 2020 at 9:00 AM Steve Loughran 
wrote:

> Silence from the  HDFS team
>
>
> Hadoop 3.2.2 is in an RC; it has the new FS API call. I really don't want
> to veto the release just because someone pulled up a method without doing
> the due diligence.
>
> Is anyone in the HDFS going to do that due diligence or should we include
> something in the release notes "msync()" must be considered unstable.
>
> Then we can do a proper msync().
>
> If it goes into FS/FC, what does it do for a viewfs with >1 mounted HDFS?
> Should it take path, msync(path) so that viewFS knows where to forward it?
>
> Alternatively: go with an MSync interface which those few FS which
> implement it (hdfs) can do that, and the fact that it doesn't have doc or
> tests won't be a blocker any more?
>
> -steve
>
>
>
>
> On Thu, 10 Dec 2020 at 12:41, Steve Loughran  wrote:
>
> >
> > Gosh, has it really been only since february since I last asked the HDFS
> > dev list to stop adding anything to FileSystem/FileContext APIs without
> >
> > * mentioning this on the hadoop-common list.
> > * specifying what it does in filesystem.md
> > * with a contract test
> > * a new hasPathCapabilities probe. Throwing UnsupportedOperationException
> > only lets people work out if it is unsupported through invocation. Being
> > able to probe for it is better.
> > * ViewFS support.
> > * And, for any new API, one which works well for high-latency object
> > stores: returning Future and  Future
> > when > 1 result is returned
> >
> > This needs to hold even for pulling something up from HDFS. Because if
> > another FS wants to implement it, they need to know what it does, and
> have
> > tests to verify this. I say this as someone who has tried to document
> HDFS
> > rename() semantics and gave up.
> >
> > It's really frustrating that every time someone does an FS API change
> like
> > this in the past (most recently HDFS-13616) I am the one who has to keep
> > sending the reminders out, and then having to try and clean up/.
> >
> > So what now?
> >
> > Options
> > 1. I roll HDFS-15567 back "please be follow process"
> > 2. Someone does a followup patch with specification and contract test,
> > view FS. Add even more to the java
> > 3. We do as per HADOOP-16898 into an MSyncable interface and then
> > FileSystem & HDFS can implement. ViewFS and filterFS still need to pass
> > through.
> >
> > *If nobody is going to volunteer for the specification/test changes, I'm
> > happy for the rollback. It'll remind people about process, *
> >
> > Pre-emptive Warning: No matter what we do for this patch, I will roll
> back
> > the next change which adds a new API if it's not accompanied by
> > specification and tests.
> >
> > Unhappily yours,
> >
> > Steve
> >
>


Re: Regarding HDFS-15567. HDFS should expose msync() API to allow downstream applications call it explicitly

2020-12-11 Thread Steve Loughran
Silence from the  HDFS team


Hadoop 3.2.2 is in an RC; it has the new FS API call. I really don't want
to veto the release just because someone pulled up a method without doing
the due diligence.

Is anyone on the HDFS team going to do that due diligence, or should we include
something in the release notes: "msync() must be considered unstable"?

Then we can do a proper msync().

If it goes into FS/FC, what does it do for a viewfs with >1 mounted HDFS?
Should it take path, msync(path) so that viewFS knows where to forward it?
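
For comparison, a minimal sketch of what the msync(path) variant would need
from a view file system: resolving a path to exactly one mounted target. It
assumes mount points are plain string prefixes and that the longest matching
mount point wins; PathRoutingView and the target names are hypothetical and do
not reflect actual ViewFS code.

```java
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical mount table that routes a path to a single target. */
final class PathRoutingView {
    // Mount point -> name of the target file system (illustrative only).
    private final TreeMap<String, String> mounts = new TreeMap<>();

    void mount(String mountPoint, String target) {
        mounts.put(mountPoint, target);
    }

    /** Find the target whose mount point is the longest prefix of path. */
    Optional<String> resolve(String path) {
        String best = null;
        for (String mountPoint : mounts.keySet()) {
            String prefix = mountPoint.endsWith("/") ? mountPoint : mountPoint + "/";
            boolean matches = path.equals(mountPoint) || path.startsWith(prefix);
            if (matches && (best == null || mountPoint.length() > best.length())) {
                best = mountPoint;
            }
        }
        return Optional.ofNullable(best).map(mounts::get);
    }
}
```

With this shape, msync(path) would forward to the single resolved target, and
a path outside every mount point resolves to nothing.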

Alternatively: go with an MSync interface which those few FSs that support
it (hdfs) can implement, so that the fact that it doesn't have docs or
tests won't be a blocker any more?

-steve




On Thu, 10 Dec 2020 at 12:41, Steve Loughran  wrote:

>
> Gosh, has it really only been since February that I last asked the HDFS
> dev list to stop adding anything to FileSystem/FileContext APIs without
>
> * mentioning this on the hadoop-common list.
> * specifying what it does in filesystem.md
> * with a contract test
> * a new hasPathCapabilities probe. Throwing UnsupportedOperationException
> only lets people work out if it is unsupported through invocation. Being
> able to probe for it is better.
> * ViewFS support.
> * And, for any new API, one which works well for high-latency object
> stores: returning Future and  Future
> when > 1 result is returned
>
> This needs to hold even for pulling something up from HDFS. Because if
> another FS wants to implement it, they need to know what it does, and have
> tests to verify this. I say this as someone who has tried to document HDFS
> rename() semantics and gave up.
>
> It's really frustrating that every time someone has made an FS API change
> like this in the past (most recently HDFS-13616), I am the one who has had
> to keep sending the reminders out, and then try to clean up.
>
> So what now?
>
> Options
> 1. I roll HDFS-15567 back "please be follow process"
> 2. Someone does a followup patch with specification and contract test,
> view FS. Add even more to the java
> 3. We do as per HADOOP-16898 into an MSyncable interface and then
> FileSystem & HDFS can implement. ViewFS and filterFS still need to pass
> through.
>
> *If nobody is going to volunteer for the specification/test changes, I'm
> happy for the rollback. It'll remind people about process, *
>
> Pre-emptive Warning: No matter what we do for this patch, I will roll back
> the next change which adds a new API if it's not accompanied by
> specification and tests.
>
> Unhappily yours,
>
> Steve
>