Apache Hadoop qbt Report: branch-3.3+JDK8 on Linux/x86_64

2022-11-10 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-3.3-java8-linux-x86_64/82/

[Nov 4, 2022, 10:00:17 AM] (noreply) HADOOP-18484. Upgrade hsqldb to v2.7.1 to 
mitigate CVE-2022-41853  (#5101)
[Nov 5, 2022, 4:28:24 PM] (noreply) HADOOP-18515. Backport HADOOP-17612 to 
branch-3.3 (Upgrade ZooKeeper to 3.6.3 and Curator to 5.2.0) (#5097)
[Nov 7, 2022, 9:29:50 PM] (noreply) HADOOP-18519. Backport HDFS-15383 and 
HADOOP-17835 to branch-3.3 (#5112)
[Nov 8, 2022, 2:48:29 AM] (noreply) HADOOP-18520. Backport HADOOP-18427 and 
HADOOP-18452 to branch-3.3 (#5118)
[Nov 8, 2022, 1:35:42 PM] (Steve Loughran) HADOOP-18507. VectorIO FileRange 
type to support a "reference" field (#5076)
[Nov 8, 2022, 1:41:31 PM] (Steve Loughran) HADOOP-18517. ABFS: Add 
fs.azure.enable.readahead option to disable readahead (#5103)
[Nov 10, 2022, 5:37:09 AM] (noreply) HDFS-16811. Support 
DecommissionBackoffMonitor parameters reconfigurable (#5122)




-1 overall


The following subsystems voted -1:
blanks pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s):
   hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

Failed junit tests :

   hadoop.hdfs.TestDistributedFileSystem 
   hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy 
   hadoop.hdfs.server.namenode.TestRedudantBlocks 
   hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes 
  

   cc:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-3.3-java8-linux-x86_64/82/artifact/out/results-compile-cc-root.txt
 [48K]

   javac:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-3.3-java8-linux-x86_64/82/artifact/out/results-compile-javac-root.txt
 [376K]

   blanks:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-3.3-java8-linux-x86_64/82/artifact/out/blanks-eol.txt
 [14M]
  
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-3.3-java8-linux-x86_64/82/artifact/out/blanks-tabs.txt
 [2.0M]

   checkstyle:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-3.3-java8-linux-x86_64/82/artifact/out/results-checkstyle-root.txt
 [14M]

   pathlen:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-3.3-java8-linux-x86_64/82/artifact/out/results-pathlen.txt
 [16K]

   pylint:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-3.3-java8-linux-x86_64/82/artifact/out/results-pylint.txt
 [20K]

   shellcheck:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-3.3-java8-linux-x86_64/82/artifact/out/results-shellcheck.txt
 [20K]

   xml:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-3.3-java8-linux-x86_64/82/artifact/out/xml.txt
 [32K]

   javadoc:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-3.3-java8-linux-x86_64/82/artifact/out/results-javadoc-javadoc-root.txt
 [1.1M]

   unit:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-3.3-java8-linux-x86_64/82/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 [712K]

Powered by Apache Yetus 0.14.0-SNAPSHOT   https://yetus.apache.org


Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64

2022-11-10 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1040/

[Nov 9, 2022, 2:21:43 AM] (noreply) HADOOP-18502. MutableStat should return 0 
when there is no change (#5058)
[Nov 9, 2022, 11:18:31 AM] (noreply) HADOOP-18433. Fix main thread name for . 
(#4838)
[Nov 9, 2022, 6:25:10 PM] (noreply) YARN-11367. [Federation] Fix 
DefaultRequestInterceptorREST Client NPE. (#5100)




-1 overall


The following subsystems voted -1:
blanks hadolint pathlen xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s):
   hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

   cc:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1040/artifact/out/results-compile-cc-root.txt
 [96K]

   javac:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1040/artifact/out/results-compile-javac-root.txt
 [528K]

   blanks:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1040/artifact/out/blanks-eol.txt
 [14M]
  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1040/artifact/out/blanks-tabs.txt
 [2.0M]

   checkstyle:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1040/artifact/out/results-checkstyle-root.txt
 [13M]

   hadolint:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1040/artifact/out/results-hadolint.txt
 [8.0K]

   pathlen:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1040/artifact/out/results-pathlen.txt
 [16K]

   pylint:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1040/artifact/out/results-pylint.txt
 [20K]

   shellcheck:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1040/artifact/out/results-shellcheck.txt
 [24K]

   xml:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1040/artifact/out/xml.txt
 [24K]

   javadoc:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1040/artifact/out/results-javadoc-javadoc-root.txt
 [392K]

Powered by Apache Yetus 0.14.0-SNAPSHOT   https://yetus.apache.org

Re: [DISCUSS] Supporting partial file rewrite/compose

2022-11-10 Thread Ayush Saxena
Hi Wei-Chiu,

I think this got lost somewhere, or the discussion moved somewhere else; if
so, please loop me in as well. Just guessing why everyone is so quiet :-)

1. Apache Iceberg or other evolvable table format.


 Regarding the first point about Iceberg: the use case you mentioned may not
be the same as the one I have in mind, but I still want to actually rewrite
the data files. If I can save on rewriting complete files in Iceberg's
copy-on-write mode, I feel that would lead to a read performance improvement,
as I could ditch the merge-on-read mode; and write performance wouldn't
suffer, because I didn't rewrite the entire file, just removed some data from
the actual data files rather than maintaining it in a delete file. I `feel`
the performance should be more or less the same as writing a delete file.

Maybe another use case could be Hive ACID tables: this could help with
compactions and those delete-delta files and the like. I won't go deep into
that, but maybe...

2. GDPR compliance "the right to erasure"

That is what my use case looked like as well: just delete a record or a set
of records, where the records are stored in a table.

From the HDFS point of view, I think this isn't trivial, but I still feel it
is doable. *Do you have pointers on whether it is possible with the object
stores as well?* That is where my interest lies.

3. In-place erasure coding conversion.

Regarding the in-place conversion of replicated files to erasure coding: if I
remember correctly, there was a branch created for it and some patches were
committed, playing with some header or so (I have a faint memory of it). The
only issue I remember with the design was that if someone is reading a file
while it is replicated, and it gets converted into an erasure-coded file,
then a reader who refetches a block for some reason would fail. Maybe some
changes in DFSInputStream to handle this, or moving to DFSStripedInputStream
in such situations, might have solved it, but I guess the folks chasing it
left it midway for some reason. I feel no more than a week of effort remains,
if I remember correctly. I could be wrong...

Thoughts? What would be a good FS interface to support these requirements?


OK, I might be biased because of the one use case coming to mind, but: the
file name, the indexes of the rows, and optionally the row data itself (to
prevent deleting the wrong data; to be on the safe side, we can keep the
third parameter optional). Maybe an object like RowDeletes, which takes the
starting and ending index in the file and, optionally, the row data for it.
Just for reference on where this is coming from, the code is at [1]; a rough
sketch of what such an interface could look like follows just after the link.

[1]
https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/deletes/PositionDelete.java#L34-L38
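
For illustration only, a minimal sketch of what such a call could look like;
`RowDeletes` and `deleteRows` are hypothetical names for this discussion, not
an existing Hadoop API:

```java
import java.io.IOException;
import java.util.List;
import java.util.Optional;
import org.apache.hadoop.fs.Path;

// Hypothetical value object: one contiguous range of rows to drop from a file.
final class RowDeletes {
  final long startIndex;            // first row to delete (inclusive)
  final long endIndex;              // last row to delete (inclusive)
  final Optional<byte[]> rowData;   // optional copy of the rows, as a safety check

  RowDeletes(long startIndex, long endIndex, Optional<byte[]> rowData) {
    this.startIndex = startIndex;
    this.endIndex = endIndex;
    this.rowData = rowData;
  }
}

// Hypothetical FileSystem extension: rewrite only the blocks that cover the
// deleted rows, leaving the remaining blocks of the file untouched.
interface PartialRewrite {
  void deleteRows(Path file, List<RowDeletes> deletes) throws IOException;
}
```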

BTW Thanx for sharing the details!!!

-Ayush


On Sat, 8 Oct 2022 at 05:58, Wei-Chiu Chuang  wrote:

> There were a number of discussions that happened during ApacheCon. In the
> spirit of the Apache Way, I am taking the conversation online, sharing with
> the larger community and also capturing requirements. Credits to Owen who
> started this discussion.
>
> There are a number of scenarios where users want to partially rewrite file
> blocks, and it would make sense to create a file system API to make these
> operations efficient.
>
> 1. Apache Iceberg or other evolvable table format.
> These table formats need to update table schema. The underlying files are
> rewritten but only a subset of blocks are changed. It would be much more
> efficient if a new file can be composed using some of the existing file
> blocks.
>
> 2. GDPR compliance "the right to erasure"
> Files must be rewritten to remove a person's data at request. Again, this
> is efficient because only a small set of file blocks is updated.
>
> 3. In-place erasure coding conversion.
> I had a proposal to support atomically rewriting replicated files into
> erasure coded files. This can be the building block to support
> auto-tiering.
>
> Thoughts? What would be a good FS interface to support these requirements?
>
> For Ozone folks, Ritesh opened a jira: HDDS-7297
>  but I figured a larger
> conversation should happen so that we can take into the consideration of
> other FS implementations.
>
> Thanks,
> Weichiu
>


[jira] [Resolved] (HADOOP-18504) An unhandled NullPointerException in class KeyProvider

2022-11-10 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18504.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

yeah, marked as fixed. thanks

>  An unhandled NullPointerException in class KeyProvider
> ---
>
> Key: HADOOP-18504
> URL: https://issues.apache.org/jira/browse/HADOOP-18504
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.3.4
>Reporter: FuzzingTeam
>Assignee: FuzzingTeam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The code throws an unhandled NullPointerException when the method 
> *getBaseName* of KeyProvider.java is called with a null as input.
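
For context, a minimal sketch of the kind of guard involved, assuming the
usual behaviour of getBaseName (stripping the `@version` suffix from a
versioned key name); this is an illustration, not the exact committed patch:

```java
import java.io.IOException;
import java.util.Objects;

final class KeyProviderGuardSketch {
  // Illustrative guard only: fail fast with a clear message instead of
  // letting versionName.lastIndexOf('@') throw a bare NullPointerException.
  static String getBaseName(String versionName) throws IOException {
    Objects.requireNonNull(versionName, "versionName cannot be null");
    int div = versionName.lastIndexOf('@');
    if (div == -1) {
      throw new IOException("No version in key path " + versionName);
    }
    return versionName.substring(0, div);
  }
}
```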







Re: way forward for Winutils excision from `FileSystem`

2022-11-10 Thread Chris Nauroth
Symlink support on the local file system is still used. One example I can
think of is YARN container launch [1].

I would welcome removal of winutils, as already described in various JIRA
issues. I think the biggest challenge we'll have is testing of a transition
from winutils to the newer Java APIs. The contract tests help, but
historically there was also a tendency to break things in downstream
dependent projects.

I'd suggest taking this on piecemeal, transitioning small pieces of
FileSystem off of winutils one at a time.

[1]
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java#L1508-L1509

Chris Nauroth


On Thu, Nov 10, 2022 at 10:33 AM Wei-Chiu Chuang  wrote:

> >
> >
> >
> >   * Bare Naked Local File System v0.1.0 doesn't (yet) support symlinks
> > or the sticky bit.
> >
> ok to not support symlinks. The symlinks of HDFS are not being maintained
> and I am not aware of anything relying on it.
> So I assume people don't need it.
>
> Sticky bit would be useful, I guess.
>
> I suppose folks working at Microsoft would be more interested in this work?
> Last time I heard, Gautham and Inigo were revamping Hadoop's Windows
> support.
>
>
> >   * But the bigger issue is how to excise Winutils completely in the
> > existing Hadoop code. Winutils assumptions are hard-coded at a low
> > level across various classes—even code that has nothing to do with
> > the file system. The startup configuration for example calls
> > `StringUtils.equalsIgnoreCase("true", valueString)` which loads the
> > `StringUtils` class, which has a static reference to `Shell`, which
> > has a static block that checks for `WINUTILS_EXE`.
> >   * For the most part there should no longer even be a need for anything
> > but direct Java API access for the local file system. But muddling
> > things further, the existing `RawLocalFileSystem` implementation has
> > /four/ ways to access the local file system: Winutils, JNI calls,
> > shell access, and a "new" approach using "stat". The "stat" approach
> > has been switched off with a hard-coded `useDeprecatedFileStatus =
> > true` because of HADOOP-9652
> > .
> >   * Local file access is not contained within `RawLocalFileSystem` but
> > is scattered across other classes; `FileUtil.readLink()` for example
> > (which `RawLocalFileSystem` calls because of the deprecation issue
> > above) uses the shell approach without any option to change it.
> > (This implementation-specific decision should have been contained
> > within the `FileSystem` implementation itself.)
> >
> > In short, it's a mess that has accumulated over years and getting worse,
> > charging high interest on what at first was a small, self-contained
> > technical debt.
> >
> > I would welcome the opportunity to clean up this mess. I'm probably as
> > qualified as anyone to make the changes. This is one of my areas of
> > expertise: I was designing a full abstract file system interface (with
> > pure-Java from-scratch implementations for the local file system,
> > Subversion, and WebDAV—even the WebDAV HTTP implementation was from
> > scratch) around the time Apache Nutch was getting off the ground. Most
> > recently I've worked on the Hadoop `FileSystem` API contracting for
> > LinkedIn, discovering (what I consider to be) a huge bug in
> > ViewFilesystem, HADOOP-18525
> > .
> >
> > The cleanup should be done in several stages (e.g. consolidating
> > WinUtils access; replacing code with pure Java API calls; undeprecating
> > the new Stat code and relegating it to a different class, etc.).
> > Unfortunately it's not financially feasible for me to sit here for
> > several months and revamp the Hadoop `FileSystem` subsystem for fun
> > (even though I wish I could). Perhaps there is job opening at a company
> > related to Hadoop that would be interested in hiring me and devoting a
> > certain percentage of my time to fixing local `FileSystem` access. If
> > so, let me know where I should send my resume
> > .
> >
> > Otherwise let me know if any ideas for a way forward. If there proves to
> > be interest in GlobalMentor Hadoop Bare Naked Local FileSystem
> >  on GitHub
> > I'll try to maintain and improve it, but really what needs to be
> > revamped is the Hadoop codebase itself. I'll be happy when Hadoop is
> > fixed so that both Steve's code and my code are no longer needed.
> >
> > Garret
> >
>


Re: way forward for Winutils excision from `FileSystem`

2022-11-10 Thread Wei-Chiu Chuang
>
>
>
>   * Bare Naked Local File System v0.1.0 doesn't (yet) support symlinks
> or the sticky bit.
>
ok to not support symlinks. The symlinks of HDFS are not being maintained
and I am not aware of anything relying on it.
So I assume people don't need it.

Sticky bit would be useful, I guess.

I suppose folks working at Microsoft would be more interested in this work?
Last time I heard, Gautham and Inigo were revamping Hadoop's Windows
support.


>   * But the bigger issue is how to excise Winutils completely in the
> existing Hadoop code. Winutils assumptions are hard-coded at a low
> level across various classes—even code that has nothing to do with
> the file system. The startup configuration for example calls
> `StringUtils.equalsIgnoreCase("true", valueString)` which loads the
> `StringUtils` class, which has a static reference to `Shell`, which
> has a static block that checks for `WINUTILS_EXE`.
>   * For the most part there should no longer even be a need for anything
> but direct Java API access for the local file system. But muddling
> things further, the existing `RawLocalFileSystem` implementation has
> /four/ ways to access the local file system: Winutils, JNI calls,
> shell access, and a "new" approach using "stat". The "stat" approach
> has been switched off with a hard-coded `useDeprecatedFileStatus =
> true` because of HADOOP-9652
> .
>   * Local file access is not contained within `RawLocalFileSystem` but
> is scattered across other classes; `FileUtil.readLink()` for example
> (which `RawLocalFileSystem` calls because of the deprecation issue
> above) uses the shell approach without any option to change it.
> (This implementation-specific decision should have been contained
> within the `FileSystem` implementation itself.)
>
> In short, it's a mess that has accumulated over years and getting worse,
> charging high interest on what at first was a small, self-contained
> technical debt.
>
> I would welcome the opportunity to clean up this mess. I'm probably as
> qualified as anyone to make the changes. This is one of my areas of
> expertise: I was designing a full abstract file system interface (with
> pure-Java from-scratch implementations for the local file system,
> Subversion, and WebDAV—even the WebDAV HTTP implementation was from
> scratch) around the time Apache Nutch was getting off the ground. Most
> recently I've worked on the Hadoop `FileSystem` API contracting for
> LinkedIn, discovering (what I consider to be) a huge bug in
> ViewFilesystem, HADOOP-18525
> .
>
> The cleanup should be done in several stages (e.g. consolidating
> WinUtils access; replacing code with pure Java API calls; undeprecating
> the new Stat code and relegating it to a different class, etc.).
> Unfortunately it's not financially feasible for me to sit here for
> several months and revamp the Hadoop `FileSystem` subsystem for fun
> (even though I wish I could). Perhaps there is job opening at a company
> related to Hadoop that would be interested in hiring me and devoting a
> certain percentage of my time to fixing local `FileSystem` access. If
> so, let me know where I should send my resume
> .
>
> Otherwise let me know if any ideas for a way forward. If there proves to
> be interest in GlobalMentor Hadoop Bare Naked Local FileSystem
>  on GitHub
> I'll try to maintain and improve it, but really what needs to be
> revamped is the Hadoop codebase itself. I'll be happy when Hadoop is
> fixed so that both Steve's code and my code are no longer needed.
>
> Garret
>


way forward for Winutils excision from `FileSystem`

2022-11-10 Thread Garret Wilson
Steve Loughran and I have been discussing on Stack Overflow a way forward 
for removing the Winutils requirement from the local `FileSystem` 
implementations.


Hadoop's FileSystem API has a lot of *nix assumptions which originally 
made it not possible to implement in pure Java for local file system 
access. The current implementation essentially creates shell processes 
that invoke *nix commands in order to e.g. access permissions. To get 
this working on Windows, Steve created Winutils, a sort of Windows back-door 
subsystem of binary executables which must be installed separately 
(think a tiny .NET) and which Hadoop can invoke as a substitute for *nix 
calls. At the time it was no doubt a nifty quick workaround, but as a 
long-term solution it is horrible (for a long list of reasons which 
everyone here already knows, so I won't go into them; see HADOOP-13223 
and HADOOP-17839). There should be 
no need to install a separate set of executables maintained by a third 
party just to get Spark to write output to a local file on a Windows 
laptop, for example.


I have created the GlobalMentor Hadoop Bare Naked Local FileSystem, an 
implementation of `FileSystem` for the local file system that extends 
`LocalFileSystem`/`RawLocalFileSystem` and "undoes" the Winutils code by 
accessing pure Java API calls instead. It is available on Maven, and 
using it with Spark is as simple as including it as a dependency and 
specifying the implementation in the configuration, e.g. programmatically:


```java
// Build a local Spark session, then point Hadoop's file: scheme at the
// pure-Java implementation instead of the winutils-backed RawLocalFileSystem.
SparkSession spark = SparkSession.builder()
    .appName("Foo Bar")
    .master("local")
    .getOrCreate();
spark.sparkContext().hadoopConfiguration().setClass(
    "fs.file.impl", BareLocalFileSystem.class, FileSystem.class);
```

But Bare Naked Local File System is not the end of the story.

 * Bare Naked Local File System v0.1.0 doesn't (yet) support symlinks
   or the sticky bit.
 * But the bigger issue is how to excise Winutils completely in the
   existing Hadoop code. Winutils assumptions are hard-coded at a low
   level across various classes—even code that has nothing to do with
   the file system. The startup configuration for example calls
   `StringUtils.equalsIgnoreCase("true", valueString)` which loads the
   `StringUtils` class, which has a static reference to `Shell`, which
   has a static block that checks for `WINUTILS_EXE`.
 * For the most part there should no longer even be a need for anything
   but direct Java API access for the local file system. But muddling
   things further, the existing `RawLocalFileSystem` implementation has
   /four/ ways to access the local file system: Winutils, JNI calls,
   shell access, and a "new" approach using "stat". The "stat" approach
   has been switched off with a hard-coded `useDeprecatedFileStatus =
   true` because of HADOOP-9652.
 * Local file access is not contained within `RawLocalFileSystem` but
   is scattered across other classes; `FileUtil.readLink()` for example
   (which `RawLocalFileSystem` calls because of the deprecation issue
   above) uses the shell approach without any option to change it.
   (This implementation-specific decision should have been contained
   within the `FileSystem` implementation itself; a pure-Java
   alternative is sketched just after this list.)
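
To make the "pure Java API calls" stage concrete, here is a minimal sketch 
of the direction I mean; the class and method names are mine, it only covers 
two of the shell-backed operations, and Windows ACL handling is deliberately 
omitted:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

// Illustration of replacing winutils/shell code paths with java.nio.file.
final class PureJavaLocalFsSketch {

  // What FileUtil.readLink() does by forking a shell, done in-process;
  // returns "" on failure, mirroring the existing convention.
  static String readLink(File f) {
    try {
      return Files.readSymbolicLink(f.toPath()).toString();
    } catch (IOException | UnsupportedOperationException e) {
      return "";
    }
  }

  // Permission bits without invoking `ls` or winutils (POSIX systems only;
  // Windows would need an AclFileAttributeView-based equivalent).
  static Set<PosixFilePermission> permissions(String path) throws IOException {
    return Files.getPosixFilePermissions(Paths.get(path));
  }
}
```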

In short, it's a mess that has accumulated over the years and is getting 
worse, charging high interest on what at first was a small, self-contained 
technical debt.


I would welcome the opportunity to clean up this mess. I'm probably as 
qualified as anyone to make the changes. This is one of my areas of 
expertise: I was designing a full abstract file system interface (with 
pure-Java from-scratch implementations for the local file system, 
Subversion, and WebDAV—even the WebDAV HTTP implementation was from 
scratch) around the time Apache Nutch was getting off the ground. Most 
recently I've worked on the Hadoop `FileSystem` API contracting for 
LinkedIn, discovering (what I consider to be) a huge bug in 
ViewFilesystem, HADOOP-18525.


The cleanup should be done in several stages (e.g. consolidating 
WinUtils access; replacing code with pure Java API calls; undeprecating 
the new Stat code and relegating it to a different class, etc.). 
Unfortunately it's not financially feasible for me to sit here for 
several months and revamp the Hadoop `FileSystem` subsystem for fun 
(even though I wish I could). Perhaps there is a job opening at a company 
related to Hadoop that would be interested in hiring me and devoting a 
certain percentage of my time to fixing local `FileSystem` access. If 
so, let me know where I should send my resume.


Otherwise let me know if 

[jira] [Created] (HADOOP-18526) Leak of S3AInstrumentation instances via hadoop Metrics references

2022-11-10 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-18526:
---

 Summary: Leak of S3AInstrumentation instances via hadoop Metrics 
references
 Key: HADOOP-18526
 URL: https://issues.apache.org/jira/browse/HADOOP-18526
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.3.4
Reporter: Steve Loughran


A heap dump of a process running OOM shows that if a process creates and then 
destroys lots of S3AFS instances, it seems to run out of heap due to references 
to S3AInstrumentation and the IOStatisticsStore kept via the hadoop metrics 
registry.

It doesn't look like S3AInstrumentation.close() is being invoked in 
S3AFS.close(). It should be, with the IOStats being snapshotted to a local 
reference before this happens; that allows the stats of a closed fs to still 
be examined.

If you look at org.apache.hadoop.ipc.DecayRpcScheduler.MetricsProxy, it uses a 
WeakReference to refer back to the larger object. We should do the same for 
the abfs/s3a bindings, ideally as a template proxy class in hadoop-common that 
they can both use.
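
As a rough sketch of that weak-reference pattern (the class name is mine, and 
a real implementation would also unregister the proxy once the source is 
gone, which is omitted here):

```java
import java.lang.ref.WeakReference;
import org.apache.hadoop.metrics2.MetricsCollector;
import org.apache.hadoop.metrics2.MetricsSource;

// Register this proxy with the metrics system instead of the instrumentation
// object itself. The proxy holds only a WeakReference, so a closed filesystem
// instance (and its S3AInstrumentation) can be garbage collected.
public class WeakMetricsProxy implements MetricsSource {
  private final WeakReference<MetricsSource> delegate;

  public WeakMetricsProxy(MetricsSource source) {
    this.delegate = new WeakReference<>(source);
  }

  @Override
  public void getMetrics(MetricsCollector collector, boolean all) {
    MetricsSource source = delegate.get();
    if (source != null) {
      source.getMetrics(collector, all); // forward while the source is alive
    }
    // once collected, this proxy reports nothing; real code would also
    // unregister itself from the metrics system at this point
  }
}
```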









Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64

2022-11-10 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/

No changes




-1 overall


The following subsystems voted -1:
asflicense hadolint mvnsite pathlen unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.fs.TestFileUtil 
   hadoop.hdfs.server.datanode.TestDirectoryScanner 
   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys 
   
hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   hadoop.hdfs.server.federation.router.TestRouterNamenodeHeartbeat 
   hadoop.hdfs.server.federation.router.TestRouterQuota 
   hadoop.hdfs.server.federation.resolver.TestMultipleDestinationResolver 
   hadoop.hdfs.server.federation.resolver.order.TestLocalResolver 
   
hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceAllocator
 
   hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor 
   
hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceHandlerImpl
 
   hadoop.yarn.server.resourcemanager.TestClientRMService 
   
hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker
 
   hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter 
   hadoop.mapreduce.lib.input.TestLineRecordReader 
   hadoop.mapred.TestLineRecordReader 
   hadoop.tools.TestDistCpSystem 
   hadoop.yarn.sls.TestSLSRunner 
   hadoop.resourceestimator.solver.impl.TestLpSolver 
   hadoop.resourceestimator.service.TestResourceEstimatorService 
  

   cc:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/diff-compile-javac-root.txt
  [488K]

   checkstyle:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/diff-checkstyle-root.txt
  [14M]

   hadolint:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   mvnsite:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/patch-mvnsite-root.txt
  [572K]

   pathlen:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/diff-patch-pylint.txt
  [20K]

   shellcheck:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/diff-patch-shellcheck.txt
  [72K]

   whitespace:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/whitespace-eol.txt
  [12M]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/whitespace-tabs.txt
  [1.3M]

   javadoc:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/patch-javadoc-root.txt
  [40K]

   unit:

   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
  [220K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [432K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt
  [16K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
  [36K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt
  [20K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
  [76K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
  [116K]
   
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/841/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt
  [104K]