[jira] [Created] (HADOOP-14750) s3guard to provide better diags on ddb init failures

2017-08-08 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-14750:
---

 Summary: s3guard to provide better diags on ddb init failures
 Key: HADOOP-14750
 URL: https://issues.apache.org/jira/browse/HADOOP-14750
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: HADOOP-13345
Reporter: Steve Loughran
Priority: Minor


When you can't connect to DDB you get an Http exception; it'd be good to 
include more info here (table name & region in particular)
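
A minimal sketch of the kind of diagnostics meant here, assuming the low-level failure surfaces as a plain exception. The class and method names (DdbInitDiagnostics, translate) and the table/region values are hypothetical, not existing S3Guard code:

```java
import java.io.IOException;

/** Hypothetical helper: adds table name and region to a low-level failure. */
final class DdbInitDiagnostics {
  private DdbInitDiagnostics() {
  }

  /** Wraps the cause so the table and region appear in the message. */
  static IOException translate(String table, String region, Exception cause) {
    String message = String.format(
        "Failed to initialize DynamoDB table '%s' in region '%s': %s",
        table, region, cause);
    // Keep the original exception as the cause so its stack trace is not lost.
    return new IOException(message, cause);
  }

  public static void main(String[] args) {
    Exception low = new RuntimeException("Unable to execute HTTP request");
    System.out.println(translate("example-table", "eu-west-1", low).getMessage());
  }
}
```

Keeping the original exception as the cause preserves the HTTP-level detail while the message carries the context that is missing today.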



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: HADOOP-14163 proposal for new hadoop.apache.org

2017-08-08 Thread Allen Wittenauer

Something else to consider.  The main hadoop repo has precommit support.  I 
could easily see a quick and dirty maven pom.xml and dockerfile put in place to 
build the website against “patches” uploaded to JIRA or github.






Re: Question about how to best contribute

2017-08-08 Thread Chris Douglas
Lars-

Welcome!

As a mild refinement of enthusiasm for this proposal: when you
approach a "cleanup", please consider the cost of tracing the lineage
of changes in the codebase. Working on a project as large and
long-running as Hadoop, we often need to trace what motivated a
particular change using only the commit log and JIRA. Sifting through
cosmetic changes that obscure the reasoning behind a module is not worth
the aesthetic benefits of consistently formatted code. As a strawman:
hitting 100% checkstyle compliance would not improve our users'
experience, so please use your judgement.

As you point out, we're not going to maintain perfect discipline going
forward, either. Nitpicking our contributors beyond what is necessary
to keep the code legible discourages them from continuing as
contributors. As a general heuristic: the stricter the rule, the more
automation is required to enforce it. This prevents everyone from
burning out on minutiae.

All that said, if you propose a refactoring that makes it easier to
maintain code that's developed more vestigial parts than functional
ones (and we have more than a few of those), that is hugely valuable.
-C

On Mon, Aug 7, 2017 at 5:13 AM, Lars Francke  wrote:
> Hi,
>
> a few words about me: I've contributed to Hadoop (and its ecosystem[4]) in
> the past, am a Hive committer and have used Hadoop for 10 years now, so I'm
> not totally inexperienced. I'm earning my money as a Hadoop consultant so
> I've seen dozens of real-life clusters in my life.
>
> As part of a few recent client projects and now writing about Hadoop in a
> new project/book I'm digging into the source code to figure out some of the
> things that are not documented.
>
> But as part of this digging I'm seeing lots of warnings in the code,
> inconsistencies etc. and I'd like to contribute some fixes to this back to
> the community.
>
> I have been a long-time believer in good code quality and consistent code
> styles. This might affect people like me especially who do a lot of
> "drive-by" contributions as I'm not someone who looks at the code daily but
> comes across it reasonably often as part of client engagements. In those
> scenarios, it's very unhelpful to have inconsistent code & bad
> documentation.
>
> Two simple but concrete examples:
> * There's lots of "final" usages on variables and methods but no
> consistency. Was this done for particular reasons or personal preference?
> * Similarly, there's lots of things that are public or protected while they
> could in theory be private. This especially makes it very hard to reason
> about code.
>
> Judging from the current code there's lots of "unofficial" code styling
> and/or personal preference. The Wiki says[1] to follow the Sun
> guidelines[2] which have not been updated in almost 20 years. A new version
> is in the works and clarifies a lot of things[3]. I'm trying to get it
> published soon. I'd try to format according to the latter (that means among
> other things no "final" for local variables).
>
> I realize that I won't be able to single-handedly fix all of this
> especially as code gets contributed but if the community thinks it's
> worthwhile I'd still love to land a few cleanup patches. My experience in
> the past has been that it's hard to get attention to these things (which I
> fully understand as they take up someone's time to review & commit).
>
> So, this is my request for comments on these questions:
> * Is there any interest in this at all?
> ** "This" being patches for code style & things like FindBugs & Checkstyle
> warnings
> * Size of the patches: Rather one big patch or smaller ones (e.g. per file
> or package)
> * Anyone willing to help me with this? e.g. reviewing and committing? I'd
> be more than happy to bribe you with drinks, sweets, food or something else
>
> My plan is not to go through each and every file and fix every issue I see.
> But there are some specific areas I'm looking at in detail and there I'd
> love to contribute back.
>
> Thank you for reading!
>
> Cheers,
> Lars
>
> PS: Posting to common-dev only, not sure if I should cross post to hdfs-dev
> and yarn-dev as well?
>
> [1] 
> [2] <http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html>
> [3] 
> [4] <https://issues.apache.org/jira/issues/?filter=-1=reporter%20%3D%20lars_francke%20OR%20assignee%20%3D%20lars_francke>




Re: Question about how to best contribute

2017-08-08 Thread Lars Francke
Hi Akira,


> > So, this is my request for comments on these questions:
> > * Is there any interest in this at all?
> > ** "This" being patches for code style & things like FindBugs &
> Checkstyle
> > warnings
>
> Yes. I'm interested in this.
>
> > * Size of the patches: Rather one big patch or smaller ones (e.g. per
> file
> > or package)
>
> Per file or package is fine.
> Bigger patch makes review harder,
> and if your patch is big, you will need to rebase it frequently :(


Good point!


>
>
> > * Anyone willing to help me with this? e.g. reviewing and committing?
>
> Yes, please send an e-mail to the dev ML or ping me if your patches are not
> reviewed by anyone.


Excellent! Thank you so much, I'll certainly take you up on your offer!

Cheers,
Lars






Re: Question about how to best contribute

2017-08-08 Thread Lars Francke
Thanks John, another one for my list of things to look at.

On Mon, Aug 7, 2017 at 11:24 PM, John Zhuge  wrote:

> And check out HADOOP-12145
>  Organize and update
> CodeReviewChecklist wiki.
>
> Thanks, your contribution will be greatly appreciated!
>
>
> On Mon, Aug 7, 2017 at 5:53 AM, Steve Loughran 
> wrote:
>
>>
>> Hi Lars & Welcome!
>>
>> Maybe the first step here would be to look at those style guides and think
>> how to bring them up to date, especially with stuff like lambda-expressions
>> in Java 8, and modules forthcoming in Java 9, SLF4J logging, JUnit 4 ->
>> 5 testing, code instrumentation, diagnostics, log stability, etc.
>>
>> https://issues.apache.org/jira/browse/HADOOP-12143
>>
>> This is my go at doing this
>>
>> https://github.com/steveloughran/formality/blob/master/
>> styleguide/styleguide.md
>>
>>
>> I've not done any work on trying to get it in, more evolving it as how I
>> code & what I look for, especially in tests.
>>
>> If you want to take this on, it'd be nice. At the same time, I fear
>> there'd be push back if you turned up and started telling people what to
>> do. Collaborating with us all on the test code is a good place to start.
>>
>> We're also more relaxed about contributions to the less-core bits of the
>> system (things like HDFS, IPC, security and Yarn core are trouble). If
>> there's stuff outside that you want to take a go at helping clean up,
>> that'd be lower risk (example: object store connectors)
>>
>> -Steve
>>
>>
>>
>> On 7 Aug 2017, at 13:13, Lars Francke <lars.fran...@gmail.com> wrote:
>>
>> Hi,
>>
>> a few words about me: I've contributed to Hadoop (and its ecosystem[4])
>> in
>> the past, am a Hive committer and have used Hadoop for 10 years now, so I'm
>> not totally inexperienced. I'm earning my money as a Hadoop consultant so
>> I've seen dozens of real-life clusters in my life.
>>
>> As part of a few recent client projects and now writing about Hadoop in a
>> new project/book I'm digging into the source code to figure out some of
>> the
>> things that are not documented.
>>
>> But as part of this digging I'm seeing lots of warnings in the code,
>> inconsistencies etc. and I'd like to contribute some fixes to this back to
>> the community.
>>
>> I have been a long-time believer in good code quality and consistent code
>> styles. This might affect people like me especially who do a lot of
>> "drive-by" contributions as I'm not someone who looks at the code daily
>> but
>> comes across it reasonably often as part of client engagements. In those
>> scenarios, it's very unhelpful to have inconsistent code & bad
>> documentation.
>>
>> Two simple but concrete examples:
>> * There's lots of "final" usages on variables and methods but no
>> consistency. Was this done for particular reasons or personal preference?
>>
>> personal, though with a move to lambda-expressions, it matters a lot more. We
>> should really be marking all parameters as final at the very least.
>>
>>
>> * Similarly, there's lots of things that are public or protected while
>> they
>> could in theory be private. This especially makes it very hard to reason
>> about code.
>>
>> there's now a bit of fear of breaking things, but at the very least,
>> things could be protected or package-private more than they are.
>>
>>
>>
>> Judging from the current code there's lots of "unofficial" code styling
>> and/or personal preference. The Wiki says[1] to follow the Sun
>> guidelines[2] which have not been updated in almost 20 years. A new
>> version
>> is in the works and clarifies a lot of things[3]. I'm trying to get it
>> published soon. I'd try to format according to the latter (that means
>> among
>> other things no "final" for local variables).
>>
>> I realize that I won't be able to single-handedly fix all of this
>> especially as code gets contributed but if the community thinks it's
>> worthwhile I'd still love to land a few cleanup patches. My experience in
>> the past has been that it's hard to get attention to these things (which I
>> fully understand as they take up someone's time to review & commit).
>>
>> So, this is my request for comments on these questions:
>> * Is there any interest in this at all?
>> ** "This" being patches for code style & things like FindBugs & Checkstyle
>> warnings
>> * Size of the patches: Rather one big patch or smaller ones (e.g. per file
>> or package)
>> * Anyone willing to help me with this? e.g. reviewing and committing? I'd
>> be more than happy to bribe you with drinks, sweets, food or something
>> else
>>
>> My plan is not to go through each and every file and fix every issue I
>> see.
>> But there are some specific areas I'm looking at in detail and there I'd
>> love to contribute back.
>>
>> Thank you for reading!
>>
>> Cheers,
>> Lars
>>
>> PS: Posting to common-dev only, not sure if I should cross post to
>> hdfs-dev
>> and 

Re: Question about how to best contribute

2017-08-08 Thread Lars Francke
Thanks for the ideas Steve.

> Hi Lars & Welcome!
>
> Maybe the first step here would be to look at those style guides and think
> how to bring them up to date, especially with stuff like lambda-expressions
> in Java 8, and modules forthcoming in Java 9, SLF4J logging, JUnit 4 ->
> 5 testing, code instrumentation, diagnostics, log stability, etc.
>
> https://issues.apache.org/jira/browse/HADOOP-12143
>

Yeah most style guides are outdated but fortunately at least for the basic
Java stuff there's the updated version and I currently have a thread going
on the OpenJDK mailing list to hopefully get it published (<
http://mail.openjdk.java.net/pipermail/discuss/2017-July/004254.html>). It
already covers Java 8 (but not 9). And it obviously doesn't cover SLF4J
etc. but I've put the issue on my list of things to look at.


> This is my go at doing this
>
> https://github.com/steveloughran/formality/blob/
> master/styleguide/styleguide.md
>
>
> I've not done any work on trying to get it in, more evolving it as how I
> code & what I look for, especially in tests.
>
> If you want to take this on, it'd be nice. At the same time, I fear
> there'd be push back if you turned up and started telling people what to
> do. Collaborating with us all on the test code is a good place to start.
>

Hence me reaching out here to see what would be welcome and how.

Two things I'm sure of though:

1) We'll never be able to settle on a style that _everyone_ agrees with but in
my opinion that shouldn't stop us from adopting one. And my vote is for the
updated Java one just because of consistency. More projects will use that
one (or the Google one) just because they are popular and easily found on
Google.

2) There'll be plenty of patches going in ignoring all of the style
guidelines.



>
> We're also more relaxed about contributions to the less-core bits of the
> system (things like HDFS, IPC, security and Yarn core are trouble). If
> there's stuff outside that you want to take a go at helping clean up,
> that'd be lower risk (example: object store connectors)
>
> -Steve
>
>
>
> On 7 Aug 2017, at 13:13, Lars Francke  wrote:
>
> Hi,
>
> a few words about me: I've contributed to Hadoop (and its ecosystem[4]) in
> the past, am a Hive committer and have used Hadoop for 10 years now, so I'm
> not totally inexperienced. I'm earning my money as a Hadoop consultant so
> I've seen dozens of real-life clusters in my life.
>
> As part of a few recent client projects and now writing about Hadoop in a
> new project/book I'm digging into the source code to figure out some of the
> things that are not documented.
>
> But as part of this digging I'm seeing lots of warnings in the code,
> inconsistencies etc. and I'd like to contribute some fixes to this back to
> the community.
>
> I have been a long-time believer in good code quality and consistent code
> styles. This might affect people like me especially who do a lot of
> "drive-by" contributions as I'm not someone who looks at the code daily but
> comes across it reasonably often as part of client engagements. In those
> scenarios, it's very unhelpful to have inconsistent code & bad
> documentation.
>
> Two simple but concrete examples:
> * There's lots of "final" usages on variables and methods but no
> consistency. Was this done for particular reasons or personal preference?
>
>
> personal, though with a move to lambda-expressions, it matters a lot more. We
> should really be marking all parameters as final at the very least.
>
>
> * Similarly, there's lots of things that are public or protected while they
> could in theory be private. This especially makes it very hard to reason
> about code.
>
>
> there's now a bit of fear of breaking things, but at the very least,
> things could be protected or package-private more than they are.
>

Yeah. The "final" thing is mostly a style thing but this here is actually
costing me (and I assume others) lots of time. And - as you said - it makes
changing/deprecating/removing things hard because you have no idea how and
where things are being used.
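
On the "final" point, a minimal Java 8 sketch of why it matters more with lambdas: a lambda may only capture locals and parameters that are final or effectively final, so marking them makes that constraint visible. The class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Illustrative only: shows final parameters and locals captured by lambdas. */
final class FinalAndLambdas {
  private FinalAndLambdas() {
  }

  /** Builds one deferred description per name. */
  static List<Supplier<String>> describe(final List<String> names) {
    final List<Supplier<String>> result = new ArrayList<>();
    for (final String name : names) {
      // 'name' is captured here; reassigning it inside the loop body
      // would make this lambda a compile error.
      result.add(() -> "path=" + name);
    }
    return result;
  }
}
```

The explicit "final" is not required by the compiler in this case (effectively final is enough), which is why it is partly a style question.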

> Judging from the current code there's lots of "unofficial" code styling
> and/or personal preference. The Wiki says[1] to follow the Sun
> guidelines[2] which have not been updated in almost 20 years. A new version
> is in the works and clarifies a lot of things[3]. I'm trying to get it
> published soon. I'd try to format according to the latter (that means among
> other things no "final" for local variables).
>
> I realize that I won't be able to single-handedly fix all of this
> especially as code gets contributed but if the community thinks it's
> worthwhile I'd still love to land a few cleanup patches. My experience in
> the past has been that it's hard to get attention to these things (which I
> fully understand as they take up someone's time to review & commit).
>
> So, this is my request for comments on these questions:
> * Is there any interest in this at all?
> ** "This" being 

Re: HADOOP-14163 proposal for new hadoop.apache.org

2017-08-08 Thread Allen Wittenauer

> On Aug 8, 2017, at 12:36 AM, Akira Ajisaka  wrote:
> 
> Now I'm okay with not creating another repo.
> I'm thinking the following procedures may work:
> 
> 1. Create ./asf-site directory
> 2. Add the content of https://github.com/elek/hadoop-site-proposal to the 
> directory
> 3. Generate web pages and push them to asf-site branch
> 4. Create a CI job to run 3. automatically when ./asf-site directory is 
> changed


Yup.  To be more specific on the Jenkins part:

MultiSCM build. Build should be set to poll SCM, probably @daily or 
equally reasonable.

first SCM: clone hadoop/trunk to one dir
second SCM: clone hadoop/asf-site to another dir

(Letting Jenkins manage those dirs takes quite a bit of the work out of 
it)

Run a (modified?) form of create-release so that you get an exact 
replica of what a released site looks like.

Take site tarball and unpack it into asf-site/.../trunk (or current? or 
whatever?)

build main site then commit back to asf-site

commit an empty commit to asf-site to work around  INFRA-10751.  
Recommend comment be the git hash of the current hadoop/trunk

FWIW, what we do in Yetus is we actually have the src for the main 
yetus site as part of our source tree.  It gets built as part of the release.



Re: How can i understand parent is file

2017-08-08 Thread Steve Loughran

> On 8 Aug 2017, at 13:45, Fu, Yong  wrote:
> 
> When testing my code, I found that this case failed, but I can't understand why a
> parent can be a file (org.apache.hadoop.fs.s3a.ITestS3AMiscOperations).
> 

Is this AWS S3 or another S3-compatible endpoint?

The test expects the create call to fail with a FileAlreadyExistsException, that 
is: you can't create a file under a file. If the create call succeeds, there's a 
problem, because the HEAD on the parent entry wasn't interpreted as a file. 
Either it wasn't there, or it was mistaken for a directory.

Best fix here is to add some more assertions to see what's happening, then 
consider setting a breakpoint on the test in your IDE

> @Test(expected = FileAlreadyExistsException.class)
> public void testCreateNonRecursiveParentIsFile() throws IOException {
>  Path parent = path("/file.txt");
>  ContractTestUtils.touch(getFileSystem(), parent);

add
ContractTestUtils.assertIsFile(getFileSystem(), parent);

>  createNonRecursive(new Path(parent, "fail"));

add
ContractTestUtils.assertPathDoesNotExist(getFileSystem(),
    "child created under a file", new Path(parent, "fail"));

> }
> 
> 
> 







Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-08-08 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/486/

[Aug 7, 2017 7:45:46 AM] (sunilg) YARN-6951. Fix debug log when Resource 
Handler chain is enabled.
[Aug 7, 2017 9:47:33 AM] (aajisaka) HDFS-12198. Document missing namenode 
metrics that were added recently.
[Aug 7, 2017 9:56:00 AM] (aajisaka) YARN-6873. Moving logging APIs over to 
slf4j in
[Aug 7, 2017 10:25:40 AM] (aajisaka) YARN-6957. Moving logging APIs over to 
slf4j in
[Aug 7, 2017 11:30:10 AM] (kai.zheng) HDFS-12306. Add audit log for some 
erasure coding operations.
[Aug 7, 2017 5:25:52 PM] (xiao) HADOOP-14727. Socket not closed properly when 
reading Configurations
[Aug 7, 2017 6:32:21 PM] (wangda) YARN-4161. Capacity Scheduler : Assign single 
or multiple containers per
[Aug 7, 2017 10:05:10 PM] (arp) HDFS-12264. DataNode uses a deprecated method 
IoUtils#cleanup.
[Aug 7, 2017 11:58:29 PM] (subru) YARN-6955. Handle concurrent register AM 
requests in
[Aug 8, 2017 1:59:25 AM] (Arun Suresh) YARN-6920. Fix resource leak that 
happens during container
[Aug 8, 2017 4:31:28 AM] (cdouglas) HADOOP-14730. Support protobuf FileStatus 
in AdlFileSystem.




-1 overall


The following subsystems voted -1:
findbugs unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

FindBugs :

   module:hadoop-hdfs-project/hadoop-hdfs-client 
   Possible exposure of partially initialized object in org.apache.hadoop.hdfs.DFSClient.initThreadsNumForStripedReads(int) At DFSClient.java:[line 2906] 
   org.apache.hadoop.hdfs.server.protocol.SlowDiskReports.equals(Object) makes inefficient use of keySet iterator instead of entrySet iterator At SlowDiskReports.java:[line 105] 

FindBugs :

   module:hadoop-hdfs-project/hadoop-hdfs 
   Possible null pointer dereference in org.apache.hadoop.hdfs.qjournal.server.JournalNode.getJournalsStatus() due to return value of called method Dereferenced at JournalNode.java:[line 302] 
   org.apache.hadoop.hdfs.server.common.HdfsServerConstants$StartupOption.setClusterId(String) unconditionally sets the field clusterId At HdfsServerConstants.java:[line 193] 
   org.apache.hadoop.hdfs.server.common.HdfsServerConstants$StartupOption.setForce(int) unconditionally sets the field force At HdfsServerConstants.java:[line 217] 
   org.apache.hadoop.hdfs.server.common.HdfsServerConstants$StartupOption.setForceFormat(boolean) unconditionally sets the field isForceFormat At HdfsServerConstants.java:[line 229] 
   org.apache.hadoop.hdfs.server.common.HdfsServerConstants$StartupOption.setInteractiveFormat(boolean) unconditionally sets the field isInteractiveFormat At HdfsServerConstants.java:[line 237] 
   Possible null pointer dereference in org.apache.hadoop.hdfs.server.datanode.DataStorage.linkBlocksHelper(File, File, int, HardLink, boolean, File, List) due to return value of called method Dereferenced at DataStorage.java:[line 1339] 
   Possible null pointer dereference in org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldLegacyOIVImages(String, long) due to return value of called method Dereferenced at NNStorageRetentionManager.java:[line 258] 
   Useless condition:argv.length >= 1 at this point At DFSAdmin.java:[line 2100] 
   Useless condition:numBlocks == -1 at this point At ImageLoaderCurrent.java:[line 727] 

FindBugs :

   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager 
   Useless object stored in variable removedNullContainers of method org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.removeOrTrackCompletedContainersFromContext(List) At NodeStatusUpdaterImpl.java:[line 642] 
   

[jira] [Created] (HADOOP-14749) review s3guard docs & code prior to merge

2017-08-08 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-14749:
---

 Summary: review s3guard docs & code prior to merge
 Key: HADOOP-14749
 URL: https://issues.apache.org/jira/browse/HADOOP-14749
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: documentation, fs/s3
Affects Versions: HADOOP-13345
Reporter: Steve Loughran
Assignee: Steve Loughran


Pre-merge cleanup while it's still easy to do

* Read through all the docs, tune
* Diff the trunk/branch files to see if we can reduce the delta (and hence the 
changes)
* Review the new tests







[jira] [Created] (HADOOP-14748) Wasb input streams to implement CanUnbuffer

2017-08-08 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-14748:
---

 Summary: Wasb input streams to implement CanUnbuffer
 Key: HADOOP-14748
 URL: https://issues.apache.org/jira/browse/HADOOP-14748
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Steve Loughran


HBase relies on FileSystems implementing {{CanUnbuffer.unbuffer()}} to force input 
streams to free up remote connections (HBASE-9393). This works for HDFS, 
but not elsewhere.

WASB {{BlockBlobInputStream}} can implement this by closing the stream 
in {{closeBlobInputStream}}, so that it is re-opened later.







[jira] [Created] (HADOOP-14747) S3AInputStream to implement CanUnbuffer

2017-08-08 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-14747:
---

 Summary: S3AInputStream to implement CanUnbuffer
 Key: HADOOP-14747
 URL: https://issues.apache.org/jira/browse/HADOOP-14747
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.8.1
Reporter: Steve Loughran


HBase relies on FileSystems implementing {{CanUnbuffer.unbuffer()}} to force 
input streams to free up remote connections (HBASE-9393). This works for HDFS, 
but not elsewhere.

S3A input stream can implement {{CanUnbuffer.unbuffer()}} by closing the input 
stream and relying on lazy seek to reopen it on demand.

Needs
* Contract specification of unbuffer. As in "who added a new feature to 
filesystems but forgot to mention what it should do?"
* Contract test for filesystems which declare their support. 
* S3AInputStream to call {{closeStream()}} on a call to {{unbuffer()}}.
* Test case
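
A rough, self-contained sketch of the "close on unbuffer, reopen lazily" idea, using an in-memory byte array in place of a remote object. LazyReopenStream, Opener, over() and demo() are hypothetical names, not the S3AInputStream code; the real change would implement the Hadoop CanUnbuffer interface:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical stream that drops its connection on unbuffer(). */
final class LazyReopenStream extends InputStream {
  /** Stand-in for "open the remote object at this offset". */
  interface Opener {
    InputStream open(long position) throws IOException;
  }

  private final Opener opener;
  private InputStream wrapped;   // null while no connection is held
  private long pos;              // next byte to read
  private int opens;             // how many times the source was opened

  LazyReopenStream(Opener opener) {
    this.opener = opener;
  }

  @Override
  public int read() throws IOException {
    if (wrapped == null) {       // lazy (re)open at the current position
      wrapped = opener.open(pos);
      opens++;
    }
    int b = wrapped.read();
    if (b >= 0) {
      pos++;
    }
    return b;
  }

  /** Releases the underlying stream but keeps the read position. */
  void unbuffer() throws IOException {
    if (wrapped != null) {
      wrapped.close();
      wrapped = null;
    }
  }

  @Override
  public void close() throws IOException {
    unbuffer();
  }

  int openCount() {
    return opens;
  }

  static LazyReopenStream over(byte[] data) {
    return new LazyReopenStream(
        p -> new ByteArrayInputStream(data, (int) p, data.length - (int) p));
  }

  /** Reads, unbuffers, reads again: returns {first, second, openCount}. */
  static int[] demo() {
    try (LazyReopenStream s = over(new byte[] {1, 2, 3})) {
      int first = s.read();
      s.unbuffer();
      int second = s.read();
      return new int[] {first, second, s.openCount()};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The point the contract specification would need to pin down is visible here: after unbuffer() the stream stays usable and its position is unchanged, only the held resource is gone.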






[jira] [Resolved] (HADOOP-14745) s3a getFileStatus can't return expect result when existing a file and directory with the same name

2017-08-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-14745.
-
Resolution: Invalid

> s3a getFileStatus can't return expect result when existing a file and 
> directory with the same name
> --
>
> Key: HADOOP-14745
> URL: https://issues.apache.org/jira/browse/HADOOP-14745
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Yonger
>Assignee: Yonger
>
> {code}
> [ hadoop-aws]# /root/hadoop/s3cmd/s3cmd ls s3://test-aws-s3a/user/root/
>DIR   s3://test-aws-s3a/user/root/ccc/
> 2017-08-08 07:04 0   s3://test-aws-s3a/user/root/ccc
> {code}
> If we expect ccc to be a directory and check it in code:
> {code}
> Path test=new Path("ccc");
> fs.getFileStatus(test);
> {code}
> actually, it will tell us it is a file:
> {code}
> 2017-08-08 15:08:40,566 [JUnit-case1] DEBUG s3a.S3AFileSystem 
> (S3AFileSystem.java:getFileStatus(1576)) - Getting path status for 
> s3a://test-aws-s3a/user/root/ccc  (user/root/ccc)
> 2017-08-08 15:08:40,566 [JUnit-case1] DEBUG s3a.S3AFileSystem 
> (S3AStorageStatistics.java:incrementCounter(60)) - object_metadata_requests 
> += 1  ->  3
> 2017-08-08 15:08:40,580 [JUnit-case1] DEBUG s3a.S3AFileSystem 
> (S3AFileSystem.java:getFileStatus(1585)) - Found exact file: normal file
> {code}






How can i understand parent is file

2017-08-08 Thread Fu, Yong
When testing my code, I found that this case (in 
org.apache.hadoop.fs.s3a.ITestS3AMiscOperations) failed, but I can't understand 
why a parent can be a file.

@Test(expected = FileAlreadyExistsException.class)
public void testCreateNonRecursiveParentIsFile() throws IOException {
  Path parent = path("/file.txt");
  ContractTestUtils.touch(getFileSystem(), parent);
  createNonRecursive(new Path(parent, "fail"));
}
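
One way to read the test, sketched as a toy in-memory model: the test deliberately creates a file at the parent path, then expects a non-recursive create underneath it to be rejected. The class and method names below are hypothetical, and unchecked JDK exceptions stand in for Hadoop's FileAlreadyExistsException; this is not the S3A code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical in-memory model of a non-recursive create; not a real FileSystem. */
class NonRecursiveCreateSketch {
  // path -> true if the entry is a directory, false if it is a file
  private final Map<String, Boolean> entries = new HashMap<>();

  NonRecursiveCreateSketch() {
    entries.put("/", true); // the root directory always exists
  }

  /** Create an empty file, like the touch() call in the test. */
  void touch(String path) {
    entries.put(path, false);
  }

  boolean exists(String path) {
    return entries.containsKey(path);
  }

  /** Create a file under parent without creating any missing ancestors. */
  void createNonRecursive(String parent, String name) {
    Boolean parentIsDir = entries.get(parent);
    if (parentIsDir == null) {
      throw new NoSuchElementException("No such parent: " + parent);
    }
    if (!parentIsDir) {
      // The case the test expects: something exists at the parent path,
      // but it is a file, so nothing can be created beneath it.
      throw new IllegalStateException("Parent is a file: " + parent);
    }
    String child = parent.endsWith("/") ? parent + name : parent + "/" + name;
    entries.put(child, false);
  }
}
```

In this model, calling touch("/file.txt") and then createNonRecursive("/file.txt", "fail") is rejected because the parent path holds a file rather than a directory.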





[jira] [Created] (HADOOP-14746) Cut S3AOutputStream

2017-08-08 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-14746:
---

 Summary: Cut S3AOutputStream
 Key: HADOOP-14746
 URL: https://issues.apache.org/jira/browse/HADOOP-14746
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.8.1
Reporter: Steve Loughran
Priority: Minor


We've been happy with the new S3A BlockOutputStream, with better scale, 
performance, instrumentation & recovery. I propose cutting the 
older {{S3AOutputStream}} code entirely.






Re: Question about how to best contribute

2017-08-08 Thread Akira Ajisaka

Hi Lars,

Thank you for your questions!

> So, this is my request for comments on these questions:
> * Is there any interest in this at all?
> ** "This" being patches for code style & things like FindBugs & Checkstyle
> warnings

Yes. I'm interested in this.

> * Size of the patches: Rather one big patch or smaller ones (e.g. per file
> or package)

Per file or package is fine.
A bigger patch makes review harder,
and if your patch is big, you will need to rebase it frequently :(

> * Anyone willing to help me with this? e.g. reviewing and committing?

Yes, please send an e-mail to the dev ML or ping me if your patches are not 
reviewed by anyone.

Thanks,
Akira

On 2017/08/07 21:13, Lars Francke wrote:

Hi,

a few words about me: I've contributed to Hadoop (and its ecosystem[4]) in
the past, am a Hive committer and have used Hadoop for 10 years now, so I'm
not totally inexperienced. I'm earning my money as a Hadoop consultant so
I've seen dozens of real-life clusters in my life.

As part of a few recent client projects and now writing about Hadoop in a
new project/book I'm digging into the source code to figure out some of the
things that are not documented.

But as part of this digging I'm seeing lots of warnings in the code,
inconsistencies etc. and I'd like to contribute some fixes to this back to
the community.

I have been a long-time believer in good code quality and consistent code
styles. This might affect people like me especially who do a lot of
"drive-by" contributions as I'm not someone who looks at the code daily but
comes across it reasonably often as part of client engagements. In those
scenarios, it's very unhelpful to have inconsistent code & bad
documentation.

Two simple but concrete examples:
* There's lots of "final" usages on variables and methods but no
consistency. Was this done for particular reasons or personal preference?
* Similarly, there's lots of things that are public or protected while they
could in theory be private. This especially makes it very hard to reason
about code.

Judging from the current code there's lots of "unofficial" code styling
and/or personal preference. The Wiki says[1] to follow the Sun
guidelines[2] which have not been updated in almost 20 years. A new version
is in the works and clarifies a lot of things[3]. I'm trying to get it
published soon. I'd try to format according to the latter (that means among
other things no "final" for local variables).

I realize that I won't be able to single-handedly fix all of this
especially as code gets contributed but if the community thinks it's
worthwhile I'd still love to land a few cleanup patches. My experience in
the past has been that it's hard to get attention to these things (which I
fully understand as they take up someone's time to review & commit).

So, this is my request for comments on these questions:
* Is there any interest in this at all?
** "This" being patches for code style & things like FindBugs & Checkstyle
warnings
* Size of the patches: Rather one big patch or smaller ones (e.g. per file
or package)
* Anyone willing to help me with this? e.g. reviewing and committing? I'd
be more than happy to bribe you with drinks, sweets, food or something else

My plan is not to go through each and every file and fix every issue I see.
But there are some specific areas I'm looking at in detail and there I'd
love to contribute back.

Thank you for reading!

Cheers,
Lars

PS: Posting to common-dev only, not sure if I should cross post to hdfs-dev
and yarn-dev as well?

[1]
[2] <http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html>
[3]
[4] <https://issues.apache.org/jira/issues/?filter=-1=reporter%20%3D%20lars_francke%20OR%20assignee%20%3D%20lars_francke>






[jira] [Created] (HADOOP-14745) s3a getFileStatus can't return expect result when existing a file and directory with the same name

2017-08-08 Thread Yonger (JIRA)
Yonger created HADOOP-14745:
---

 Summary: s3a getFileStatus can't return expect result when 
existing a file and directory with the same name
 Key: HADOOP-14745
 URL: https://issues.apache.org/jira/browse/HADOOP-14745
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.8.0
Reporter: Yonger
Assignee: Yonger


[ hadoop-aws]# /root/hadoop/s3cmd/s3cmd ls s3://test-aws-s3a/user/root/
   DIR   s3://test-aws-s3a/user/root/ccc/
2017-08-08 07:04 0   s3://test-aws-s3a/user/root/ccc

If we expect ccc to be a directory and check it in code:
Path test=new Path("ccc");
fs.getFileStatus(test);

it actually reports a file:

2017-08-08 15:08:40,566 [JUnit-case1] DEBUG s3a.S3AFileSystem 
(S3AFileSystem.java:getFileStatus(1576)) - Getting path status for 
s3a://test-aws-s3a/user/root/ccc  (user/root/ccc)
2017-08-08 15:08:40,566 [JUnit-case1] DEBUG s3a.S3AFileSystem 
(S3AStorageStatistics.java:incrementCounter(60)) - object_metadata_requests += 
1  ->  3
2017-08-08 15:08:40,580 [JUnit-case1] DEBUG s3a.S3AFileSystem 
(S3AFileSystem.java:getFileStatus(1585)) - Found exact file: normal file
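
One way to picture the behaviour in the log, as a sketch only: assume the lookup probes the exact key first and only then the directory forms (a "key/" marker, then any object under "key/"). The class, enum, and method names are hypothetical and a set of strings stands in for the bucket; this is not the S3AFileSystem code.

```java
import java.util.Set;

/** Hypothetical model of a status lookup over flat object keys. */
class S3ProbeOrderSketch {
  enum Kind { FILE, DIRECTORY, NOT_FOUND }

  /**
   * Assumed probe order: exact key, then "key/" marker, then any child of "key/".
   * The first probe that matches decides the result.
   */
  static Kind status(Set<String> keys, String key) {
    if (keys.contains(key)) {
      return Kind.FILE;            // an object with exactly this name exists
    }
    String dirKey = key + "/";
    if (keys.contains(dirKey)) {
      return Kind.DIRECTORY;       // explicit directory marker object
    }
    for (String k : keys) {
      if (k.startsWith(dirKey)) {
        return Kind.DIRECTORY;     // implied directory: something lives under it
      }
    }
    return Kind.NOT_FOUND;
  }
}
```

Under that assumption, a bucket holding both "user/root/ccc" and "user/root/ccc/" reports FILE for "user/root/ccc", which matches the "Found exact file" line in the log; with only the "user/root/ccc/" object present, the same lookup reports DIRECTORY.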








Re: HADOOP-14163 proposal for new hadoop.apache.org

2017-08-08 Thread Akira Ajisaka

Thanks Allen for the comment.
Yes, it can be done without creating another git repo.

In the near future, I'd like to enable a CI job to push the site
to the asf-site branch automatically when the site source is changed.

If we have a separate repository for the web site:

Pro: The CI job can easily detect changes to the site source.
Con: We need to manage two different repositories.

Now I'm okay with not creating another repo.
I'm thinking the following procedure may work:

1. Create a ./asf-site directory
2. Add the content of https://github.com/elek/hadoop-site-proposal to the 
directory
3. Generate web pages and push them to the asf-site branch
4. Create a CI job to run step 3 automatically when the ./asf-site directory 
is changed

-Akira

On 2017/08/07 23:06, Allen Wittenauer wrote:



On Aug 7, 2017, at 3:53 AM, Akira Ajisaka  wrote:


I'll ask INFRA to create a git repository if there are no objections.


There's no need to create a git repo.  They just need to know to pull 
the website from the asf-site branch.


