[jira] [Updated] (HADOOP-13868) New defaults for S3A multi-part configuration

2019-08-13 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-13868:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> New defaults for S3A multi-part configuration
> -
>
> Key: HADOOP-13868
> URL: https://issues.apache.org/jira/browse/HADOOP-13868
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.7.0, 3.0.0-alpha1
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-13868.001.patch, HADOOP-13868.002.patch, 
> optimizing-multipart-s3a.sh
>
>
> I've been looking at a big performance regression when writing to S3 from 
> Spark that appears to have been introduced with HADOOP-12891.
> In the Amazon SDK, the default threshold for multi-part copies is 320x the 
> threshold for multi-part uploads (and the block size is 20x bigger), so I 
> don't think it's necessarily wise for us to have them be the same.
> I did some quick tests, and it seems the sweet spot where multi-part 
> copies start being faster is around 512 MB. It wasn't as significant, but 
> using 104857600 (Amazon's default) for the block size was also slightly better.
> I propose we do the following, although they're independent decisions:
> (1) Split the configuration. Ideally, I'd like to have 
> fs.s3a.multipart.copy.threshold and fs.s3a.multipart.upload.threshold (and 
> corresponding properties for the block size). But then there's the question 
> of what to do with the existing fs.s3a.multipart.* properties. Deprecate them? 
> Or leave them as a short-hand for configuring both, overridden by the more 
> specific properties?
> (2) Consider increasing the default values. In my tests, 256 MB seemed to be 
> where multipart uploads came into their own, and 512 MB was where multipart 
> copies started outperforming the alternative. Would be interested to hear 
> what other people have seen.
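The configuration split proposed in (1) can be sketched as follows. This is a minimal illustration only: fs.s3a.multipart.upload.threshold and fs.s3a.multipart.copy.threshold are the names proposed in this issue, not existing properties, and the fallback behavior shown (the shared short-hand overridden by the more specific properties) is just one of the options discussed above. A plain java.util.Properties stands in for a Hadoop Configuration to keep the sketch self-contained:

```java
import java.util.Properties;

/**
 * Sketch of the proposed split: operation-specific multipart properties
 * that fall back to the existing shared fs.s3a.multipart.threshold
 * short-hand. Property names and defaults are illustrative.
 */
public class MultipartConfig {
    static final long DEFAULT_UPLOAD_THRESHOLD = 256L * 1024 * 1024; // 256 MB
    static final long DEFAULT_COPY_THRESHOLD = 512L * 1024 * 1024;   // 512 MB

    /** Read a specific property, falling back to the shared short-hand, then a default. */
    static long threshold(Properties conf, String specific, long dflt) {
        String v = conf.getProperty(specific,
            conf.getProperty("fs.s3a.multipart.threshold"));
        return v == null ? dflt : Long.parseLong(v.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("fs.s3a.multipart.threshold", "134217728");      // shared: 128 MB
        conf.setProperty("fs.s3a.multipart.copy.threshold", "536870912"); // copy: 512 MB

        // The specific copy property overrides the shared short-hand...
        System.out.println(threshold(conf,
            "fs.s3a.multipart.copy.threshold", DEFAULT_COPY_THRESHOLD));   // 536870912
        // ...while uploads, with no specific property set, use the shared value.
        System.out.println(threshold(conf,
            "fs.s3a.multipart.upload.threshold", DEFAULT_UPLOAD_THRESHOLD)); // 134217728
    }
}
```

With this shape, existing configurations that only set the shared property keep working, while the new properties win when both are present.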



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13868) New defaults for S3A multi-part configuration

2019-07-18 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888395#comment-16888395
 ] 

Sean Mackrory commented on HADOOP-13868:


Bit-rot after only 2 1/2 years? Imagine that! Actually, the only part that 
doesn't apply cleanly is the documentation, and that's just because it's 
looking 100 lines away from where it should. Resubmitted as a pull request to 
verify a clean Yetus run, but as the patch is virtually identical, I'll assume 
your +1 still applies unless I hear otherwise.




[jira] [Commented] (HADOOP-15729) [s3a] stop treat fs.s3a.max.threads as the long-term minimum

2019-07-10 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882398#comment-16882398
 ] 

Sean Mackrory commented on HADOOP-15729:


Oh it was an obsolete import. Pull request created: 
https://github.com/apache/hadoop/pull/1075

> [s3a] stop treat fs.s3a.max.threads as the long-term minimum
> 
>
> Key: HADOOP-15729
> URL: https://issues.apache.org/jira/browse/HADOOP-15729
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-15729.001.patch, HADOOP-15729.002.patch
>
>
> A while ago the s3a connector started experiencing deadlocks because the AWS 
> SDK requires an unbounded threadpool. It places monitoring tasks on the work 
> queue before the tasks they wait on, so it's possible (it has even happened with 
> larger-than-default threadpools) for the executor to become permanently 
> saturated and deadlock.
> So we started giving an unbounded threadpool executor to the SDK, and using a 
> bounded, blocking threadpool service for everything else S3A needs (although 
> currently that's only in the S3ABlockOutputStream). fs.s3a.max.threads then 
> only limits this threadpool; however, we also specified fs.s3a.max.threads as 
> the number of core threads in the unbounded threadpool, which in hindsight is 
> pretty terrible.
> Currently those core threads do not time out, so this is actually setting a 
> sort of minimum. Once that many tasks have been submitted, the threadpool 
> will be locked at that number until it bursts beyond that, but it will only 
> spin down that far. If fs.s3a.max.threads is set reasonably high and someone 
> uses a bunch of S3 buckets, they could easily have thousands of idle threads 
> constantly.
> We should either not use fs.s3a.max.threads for the core pool size and 
> introduce a new configuration, or we should simply allow core threads to 
> time out. I'm reading the OpenJDK source now to see what subtle differences 
> there are between core threads and other threads when core threads can time out.
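If we go the allow-core-threads-to-time-out route, the change on java.util.concurrent.ThreadPoolExecutor is essentially one call. A minimal sketch; the core size of 96 and the 60-second keep-alive here are illustrative, not S3A's actual settings:

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreTimeoutDemo {
    /**
     * An "unbounded" executor in the cached-thread-pool style, but with a
     * non-zero core size, roughly like the pool handed to the SDK. Allowing
     * core threads to time out lets the pool spin down to zero idle threads
     * instead of pinning coreThreads idle threads forever.
     */
    static ThreadPoolExecutor newPool(int coreThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            coreThreads, Integer.MAX_VALUE,   // effectively unbounded max size
            60L, TimeUnit.SECONDS,            // idle threads exit after 60s
            new SynchronousQueue<Runnable>());
        // The one-line fix under discussion: core threads obey the
        // keep-alive timeout too (requires a keep-alive time > 0).
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newPool(96);
        System.out.println(pool.allowsCoreThreadTimeOut()); // true
        pool.shutdown();
    }
}
```

Note that allowCoreThreadTimeOut(true) throws IllegalArgumentException if the keep-alive time is zero, so the alternative (a separate core-size configuration) would be needed for pools configured without one.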






[jira] [Updated] (HADOOP-15729) [s3a] stop treat fs.s3a.max.threads as the long-term minimum

2019-07-10 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15729:
---
Attachment: (was: HADOOP-15729.002.patch)




[jira] [Updated] (HADOOP-15729) [s3a] stop treat fs.s3a.max.threads as the long-term minimum

2019-07-10 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15729:
---
Attachment: HADOOP-15729.002.patch




[jira] [Updated] (HADOOP-15729) [s3a] stop treat fs.s3a.max.threads as the long-term minimum

2019-07-10 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15729:
---
Status: Patch Available  (was: Open)

Resubmitting as checkstyle output has expired.




[jira] [Updated] (HADOOP-15729) [s3a] stop treat fs.s3a.max.threads as the long-term minimum

2019-07-10 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15729:
---
Status: Open  (was: Patch Available)




[jira] [Commented] (HADOOP-13980) S3Guard CLI: Add fsck check command

2019-07-08 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880424#comment-16880424
 ] 

Sean Mackrory commented on HADOOP-13980:


{quote}Export metadatastore and S3 bucket hierarchy{quote}
Is this different in some way from "export scan results in human-readable 
format"? I thought about maybe having a machine-readable export that we could 
import, if that might help with supportability. I've personally never seen a 
support issue that it would've helped with, but it's just something to think about...

{quote}Implement the fixing mechanism{quote}
We can probably break this into more subtasks. It would be best if the 
implementation had a sequence of specific "fixers" to address specific 
discrepancies: "fixMissingParents", "fixOutOfDateEntries", etc.
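As a rough sketch of what such a sequence of fixers could look like (the interface is hypothetical; only the fixer names come from the comment above, and a real fixer would repair MetadataStore entries rather than count strings):

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: run a fixed sequence of narrow "fixers", each
 *  addressing one class of S3/MetadataStore discrepancy found by fsck. */
public class FsckFixers {
    interface Fixer {
        String name();
        /** @return number of discrepancies this fixer repaired. */
        int fix(List<String> discrepancies);
    }

    /** Build a fixer that handles discrepancies tagged with a given prefix. */
    static Fixer named(String name, String handles) {
        return new Fixer() {
            public String name() { return name; }
            public int fix(List<String> discrepancies) {
                // Stand-in for a real repair: just count the entries
                // this fixer would have handled.
                return (int) discrepancies.stream()
                    .filter(d -> d.startsWith(handles)).count();
            }
        };
    }

    public static void main(String[] args) {
        List<Fixer> fixers = Arrays.asList(
            named("fixMissingParents", "missing-parent:"),
            named("fixOutOfDateEntries", "out-of-date:"));
        List<String> found = Arrays.asList(
            "missing-parent:/a/b", "out-of-date:/c", "missing-parent:/d/e");
        for (Fixer f : fixers) {
            System.out.println(f.name() + " repaired " + f.fix(found));
        }
    }
}
```

Structuring it this way would also let each fixer become its own subtask with its own tests.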

> S3Guard CLI: Add fsck check command
> ---
>
> Key: HADOOP-13980
> URL: https://issues.apache.org/jira/browse/HADOOP-13980
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Major
>
> As discussed in HADOOP-13650, we want to add an S3Guard CLI command which 
> compares S3 with MetadataStore, and returns a failure status if any 
> invariants are violated.






[jira] [Updated] (HADOOP-16396) Allow authoritative mode on a subdirectory

2019-07-03 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-16396:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks, [~gabor.bota]. I neglected to fix the whitespace nits you pointed out 
before merging. I filed HADOOP-16409 for that and another follow-up tweak.

> Allow authoritative mode on a subdirectory
> --
>
> Key: HADOOP-16396
> URL: https://issues.apache.org/jira/browse/HADOOP-16396
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-16396.001.patch, HADOOP-16396.002.patch, 
> HADOOP-16396.003.patch
>
>
> Let's allow authoritative mode to be applied only to a subset of a bucket. 
> This is coming primarily from a Hive warehousing use-case where Hive-managed 
> tables can benefit from query planning, but can't speak for the rest of the 
> bucket. This should be limited in scope and is not a general attempt to allow 
> configuration on a per-path basis, as configuration is currently done on a 
> per-process or per-bucket basis.
> I propose a new property (we could overload 
> fs.s3a.metadatastore.authoritative, but that seems likely to cause confusion 
> somewhere). A string would be allowed that would then be qualified in the 
> context of the FileSystem, and used to check if it is a prefix for a given 
> path. If it is, we act as though authoritative mode is enabled. If not, we 
> revert to the existing behavior of fs.s3a.metadatastore.authoritative (which 
> in practice will probably be false, the default, if the new property is in 
> use).
> Let's be clear about a few things:
> * Currently authoritative mode only short-cuts the process to avoid a 
> round-trip to S3 if we know it is safe to do so. This means that even when 
> authoritative mode is enabled for a bucket, if the metadata store does not 
> have a complete (or "authoritative") current listing cached, authoritative 
> mode still has no effect. This will still apply.
> * This will only apply to getFileStatus and listStatus, and internal calls to 
> their internal counterparts. No other API is currently using authoritative 
> mode to change behavior.
> * This will only apply to getFileStatus and listStatus calls INSIDE the 
> configured prefix. If there is a recursive listing on the parent of the 
> configured prefix, no change in behavior will be observed.
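As a rough model of the proposed check (simplified; the class and field names are illustrative, not the actual S3AFileSystem code, though allowAuthoritative is the name used later in this thread):

```java
public class AllowAuthoritative {
    final boolean metadataStoreAuthoritative; // existing fs.s3a.metadatastore.authoritative
    final String authoritativePrefix;         // qualified prefix from the new property, or null

    AllowAuthoritative(boolean storeWide, String prefix) {
        this.metadataStoreAuthoritative = storeWide;
        this.authoritativePrefix = prefix;
    }

    /** Auth mode applies store-wide if the existing flag is set; otherwise
     *  only for paths under the configured prefix. Even then, a listing is
     *  only served from the MetadataStore if it is marked authoritative. */
    boolean allowAuthoritative(String qualifiedPath) {
        if (metadataStoreAuthoritative) {
            return true; // existing behavior, unchanged
        }
        return authoritativePrefix != null
            && qualifiedPath.startsWith(authoritativePrefix);
    }

    public static void main(String[] args) {
        AllowAuthoritative auth =
            new AllowAuthoritative(false, "s3a://bucket/warehouse/");
        System.out.println(auth.allowAuthoritative("s3a://bucket/warehouse/t1")); // true
        System.out.println(auth.allowAuthoritative("s3a://bucket/scratch/x"));    // false
    }
}
```
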






[jira] [Created] (HADOOP-16409) Allow authoritative mode on non-qualified paths

2019-07-03 Thread Sean Mackrory (JIRA)
Sean Mackrory created HADOOP-16409:
--

 Summary: Allow authoritative mode on non-qualified paths
 Key: HADOOP-16409
 URL: https://issues.apache.org/jira/browse/HADOOP-16409
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Sean Mackrory
Assignee: Sean Mackrory


fs.s3a.authoritative.path currently requires a qualified URI (e.g. 
s3a://bucket/path), which is how I see this being used most immediately, but it 
also makes sense for someone to just be able to configure /path, if all of their 
buckets follow that pattern, or if they're providing configuration already in a 
bucket-specific context (e.g. job-level configs, etc.). We just need to qualify 
whatever is passed in to allowAuthoritative to make that work.
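A sketch of that qualification step (a hypothetical helper, not the actual implementation):

```java
import java.net.URI;

public class QualifyPath {
    /** Qualify a possibly-relative authoritative path against the
     *  filesystem's URI. Hypothetical helper for illustration only. */
    static String qualify(String configured, URI fsUri) {
        if (configured.startsWith("/")) {
            // Bare path: interpret it relative to whatever bucket this
            // FileSystem instance is bound to.
            return fsUri.getScheme() + "://" + fsUri.getHost() + configured;
        }
        return configured; // already a qualified URI such as s3a://bucket/path
    }

    public static void main(String[] args) {
        URI fs = URI.create("s3a://my-bucket/");
        System.out.println(qualify("/warehouse", fs));    // s3a://my-bucket/warehouse
        System.out.println(qualify("s3a://other/w", fs)); // s3a://other/w
    }
}
```
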

Also, in HADOOP-16396 Gabor pointed out a few whitespace nits that I neglected 
to fix before merging.






[jira] [Commented] (HADOOP-16396) Allow authoritative mode on a subdirectory

2019-07-01 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876283#comment-16876283
 ] 

Sean Mackrory commented on HADOOP-16396:


Submitted an iteration based on your feedback via PR. I moved 
allowAuthoritative into S3Guard, as I agree simplifying S3AFS.java is 
worthwhile. It's a bit messy because it needs 4 long parameter names or adding 
getters to S3AFS anyway. I deduplicated a couple of the calls. I'd feel better 
if we renamed "allowAuthoritativeMetadataStore" and 
"allowAuthoritativePaths" to "authMode" and "authModePaths". So far, "auth 
mode" is just a short-hand being thrown around - any objection to formalizing 
that in code?

Renamed unguardedFS to rawFS. I'd like to keep "fullyAuthFS" to distinguish 
between other "guarded" FS that have other varying settings for authoritative 
paths, etc. I have a rather unique need for the other get*FS functions that 
take a directory as an argument, I think, so I'd rather not factor those out.

I removed createNonAuthFS since I was indeed no longer using it.






[jira] [Commented] (HADOOP-16396) Allow authoritative mode on a subdirectory

2019-06-28 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875275#comment-16875275
 ] 

Sean Mackrory commented on HADOOP-16396:


Added a couple of other things:
* Ability to specify multiple authoritative paths, comma-delimited (this kind of 
assumes you don't have commas in your paths, which I wish were a safe 
assumption, but I've seen weird punctuation in paths before - not sure if 
it's worth doing more about that).
* Ensures specified paths are treated as directories and not prefixes (e.g. if 
you make /auth authoritative, /auth-plus-a-suffix is not included in that).
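A minimal sketch of those two additions (illustrative only, not the actual patch): splitting the comma-delimited setting, and normalizing each entry with a trailing slash so matching stops at a path-component boundary.

```java
import java.util.ArrayList;
import java.util.List;

public class AuthoritativePaths {
    /** Split the comma-delimited setting into directory prefixes, each
     *  normalized with a trailing slash so "/auth" cannot accidentally
     *  cover "/auth-plus-a-suffix". */
    static List<String> parse(String setting) {
        List<String> prefixes = new ArrayList<>();
        for (String p : setting.split(",")) {
            p = p.trim();
            if (!p.isEmpty()) {
                prefixes.add(p.endsWith("/") ? p : p + "/");
            }
        }
        return prefixes;
    }

    static boolean isAuthoritative(String path, List<String> prefixes) {
        if (!path.endsWith("/")) {
            path += "/"; // so "/auth" itself matches the "/auth/" prefix
        }
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> auth = parse("/auth,/warehouse/managed");
        System.out.println(isAuthoritative("/auth/table1", auth));        // true
        System.out.println(isAuthoritative("/auth-plus-a-suffix", auth)); // false
    }
}
```
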

One downside is that ITestS3GuardOutOfBandOperations had some functions for 
checking that listings did and didn't contain specific entries, which I 
refactored into S3ATestUtils. Part of this includes more generic messages in 
failures. I hope [~gabor.bota] is okay with that. This also changes 
.toString(), so IMO it warrants a Release Note when it goes in.




[jira] [Updated] (HADOOP-16396) Allow authoritative mode on a subdirectory

2019-06-28 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-16396:
---
Attachment: HADOOP-16396.003.patch




[jira] [Updated] (HADOOP-16396) Allow authoritative mode on a subdirectory

2019-06-28 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-16396:
---
Status: Patch Available  (was: Open)

> Allow authoritative mode on a subdirectory
> --
>
> Key: HADOOP-16396
> URL: https://issues.apache.org/jira/browse/HADOOP-16396
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-16396.001.patch, HADOOP-16396.002.patch, 
> HADOOP-16396.003.patch






[jira] [Commented] (HADOOP-16396) Allow authoritative mode on a subdirectory

2019-06-28 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875113#comment-16875113
 ] 

Sean Mackrory commented on HADOOP-16396:


I see what I'm missing: listStatus does write-backs; listFiles does not. This 
makes sense, because listFiles doesn't otherwise require us to pull enough data 
to populate complete rows in the metadata store. And it's okay because I'm 
fairly certain the Hive warehousing use-case mentioned above is primarily going 
to be using listStatus for query planning anyway. As a side note, maybe 
listFiles should somehow be enabled to populate partial rows that could fill in 
for future listFiles-like calls, even though they would be insufficient for 
future listStatus-like calls. Just a thought.

In any case - after switching from listFiles to listStatus in my tests while 
trying to trigger a write-back, my tests now pass.
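The distinction above can be modeled with a toy metadata store. Everything here is invented for illustration (the real stores are the DynamoDB and local implementations, with far richer state); the point is only that a full-listing write-back is what flips the "authoritative" bit, while partial entries never do.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model: only a full-listing write-back (what listStatus does) marks a
 * directory authoritative; partial entries (what a listFiles-style call could
 * write) leave the bit unset, so later listings still round-trip to S3.
 */
public class ToyMetadataStore {
    private final Map<String, Boolean> dirAuthoritative = new HashMap<>();

    void putFullListing(String dir)    { dirAuthoritative.put(dir, true); }
    void putPartialEntries(String dir) { dirAuthoritative.putIfAbsent(dir, false); }
    boolean canSkipS3(String dir)      { return dirAuthoritative.getOrDefault(dir, false); }

    public static void main(String[] args) {
        ToyMetadataStore ms = new ToyMetadataStore();
        ms.putPartialEntries("/warehouse/t1");             // listFiles-style write
        System.out.println(ms.canSkipS3("/warehouse/t1")); // false: must query S3
        ms.putFullListing("/warehouse/t1");                // listStatus write-back
        System.out.println(ms.canSkipS3("/warehouse/t1")); // true: listing cached
    }
}
```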

> Allow authoritative mode on a subdirectory
> --
>
> Key: HADOOP-16396
> URL: https://issues.apache.org/jira/browse/HADOOP-16396
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-16396.001.patch, HADOOP-16396.002.patch






[jira] [Updated] (HADOOP-16396) Allow authoritative mode on a subdirectory

2019-06-28 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-16396:
---
Attachment: HADOOP-16396.002.patch

> Allow authoritative mode on a subdirectory
> --
>
> Key: HADOOP-16396
> URL: https://issues.apache.org/jira/browse/HADOOP-16396
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-16396.001.patch, HADOOP-16396.002.patch






[jira] [Commented] (HADOOP-16396) Allow authoritative mode on a subdirectory

2019-06-26 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873692#comment-16873692
 ] 

Sean Mackrory commented on HADOOP-16396:


Attaching my current state, although I'm not done. My tests are failing: when I 
list a directory that we just listed a minute ago, it's still querying S3, even 
when authoritative mode should, to my understanding, be kicking in. The problem 
seems to be that the first listing doesn't perform a write-back, and sure 
enough the metadata store never considers that directory listing authoritative. 
I ran all the (non-scale) tests and traced them to confirm that at no point do 
any of them write back with authoritative=true. I thought we used to
have logic in listings that would conditionally do a write back, and I assumed 
that recent work would have included flipping the authoritative bit. Am I 
missing something? [~gabor.bota] [~ste...@apache.org]

> Allow authoritative mode on a subdirectory
> --
>
> Key: HADOOP-16396
> URL: https://issues.apache.org/jira/browse/HADOOP-16396
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-16396.001.patch






[jira] [Updated] (HADOOP-16396) Allow authoritative mode on a subdirectory

2019-06-26 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-16396:
---
Attachment: HADOOP-16396.001.patch

> Allow authoritative mode on a subdirectory
> --
>
> Key: HADOOP-16396
> URL: https://issues.apache.org/jira/browse/HADOOP-16396
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-16396.001.patch






[jira] [Created] (HADOOP-16396) Allow authoritative mode on a subdirectory

2019-06-26 Thread Sean Mackrory (JIRA)
Sean Mackrory created HADOOP-16396:
--

 Summary: Allow authoritative mode on a subdirectory
 Key: HADOOP-16396
 URL: https://issues.apache.org/jira/browse/HADOOP-16396
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Reporter: Sean Mackrory
Assignee: Sean Mackrory


Let's allow authoritative mode to be applied to only a subset of a bucket. This 
comes primarily from a Hive warehousing use-case: Hive-managed tables can 
benefit from authoritative listings during query planning, but Hive can't vouch 
for the rest of the bucket. This should be limited in scope and is not a 
general attempt to allow configuration on a per-path basis, as configuration is 
currently done on a per-process or per-bucket basis.

I propose a new property (we could overload fs.s3a.metadatastore.authoritative, 
but that seems likely to cause confusion somewhere). It would take a path 
string, which would be qualified in the context of the FileSystem and then 
checked as a prefix of any given path. If it is a prefix, we act as though 
authoritative mode is enabled. If not, we fall back to the existing 
fs.s3a.metadatastore.authoritative behavior (which in practice will probably be 
false, the default, if the new property is in use).

Let's be clear about a few things:
* Currently authoritative mode only short-cuts the process to avoid a 
round-trip to S3 if we know it is safe to do so. This means that even when 
authoritative mode is enabled for a bucket, if the metadata store does not have 
a complete (or "authoritative") current listing cached, authoritative mode 
still has no effect. This will still apply.
* This will only apply to getFileStatus and listStatus, and internal calls to 
their internal counterparts. No other API is currently using authoritative mode 
to change behavior.
* This will only apply to getFileStatus and listStatus calls INSIDE the 
configured prefix. If there is a recursive listing on the parent of the 
configured prefix, no change in behavior will be observed.






[jira] [Commented] (HADOOP-13551) hook up AwsSdkMetrics to hadoop metrics

2019-06-26 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873409#comment-16873409
 ] 

Sean Mackrory commented on HADOOP-13551:


As an update, every time someone has asked about this, I've pointed out the JVM 
system property that lets the SDK publish metrics directly to CloudWatch, and 
so far that's exceeded everyone's expectations. I'm not feeling any continued 
demand for this... But knowing about throttling would be nice - I don't know 
off the top of my head whether CloudWatch shows that.
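For reference, a sketch of the system property in question. The property name comes from the AWS SDK for Java v1; the option string here (region, queue size) is illustrative and should be checked against the SDK's metrics documentation. It must be set before any SDK class is loaded, so in practice it goes on the command line as a -D argument rather than in code:

```java
public class EnableSdkMetrics {
    public static void main(String[] args) {
        // Normally passed at JVM startup, e.g.
        //   -Dcom.amazonaws.sdk.enableDefaultMetrics=cloudwatchRegion=us-west-2
        // Setting it programmatically only works if no AWS SDK class has been
        // loaded yet in this JVM.
        System.setProperty("com.amazonaws.sdk.enableDefaultMetrics",
                "cloudwatchRegion=us-west-2,metricQueueSize=5000");
        System.out.println(
                System.getProperty("com.amazonaws.sdk.enableDefaultMetrics"));
    }
}
```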

> hook up AwsSdkMetrics to hadoop metrics
> ---
>
> Key: HADOOP-13551
> URL: https://issues.apache.org/jira/browse/HADOOP-13551
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Assignee: Sean Mackrory
>Priority: Major
>
> There's an API in {{com.amazonaws.metrics.AwsSdkMetrics}} to give access to 
> the internal metrics of the AWS libraries. We might want to get at those






[jira] [Updated] (HADOOP-15729) [s3a] stop treat fs.s3a.max.threads as the long-term minimum

2019-06-25 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15729:
---
Status: Patch Available  (was: Open)

> [s3a] stop treat fs.s3a.max.threads as the long-term minimum
> 
>
> Key: HADOOP-15729
> URL: https://issues.apache.org/jira/browse/HADOOP-15729
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-15729.001.patch, HADOOP-15729.002.patch
>
>
> A while ago the s3a connector started experiencing deadlocks because the AWS 
> SDK requires an unbounded threadpool. It places monitoring tasks on the work 
> queue before the tasks they wait on, so it's possible (has even happened with 
> larger-than-default threadpools) for the executor to become permanently 
> saturated and deadlock.
> So we started giving an unbounded threadpool executor to the SDK, and using a 
> bounded, blocking threadpool service for everything else S3A needs (although 
> currently that's only in the S3ABlockOutputStream). fs.s3a.max.threads then 
> only limits this threadpool; however, we also specified fs.s3a.max.threads as 
> the number of core threads in the unbounded threadpool, which in hindsight is 
> pretty terrible.
> Currently those core threads do not time out, so this effectively sets a 
> floor. Once that many tasks have been submitted, the threadpool will stay at 
> that size: it can burst beyond it, but it will never spin down any further. 
> If fs.s3a.max.threads is set reasonably high and someone uses a bunch of S3 
> buckets, they could easily have thousands of idle threads sitting around 
> constantly.
> We should either stop using fs.s3a.max.threads for the core pool size and 
> introduce a new configuration property, or simply allow core threads to time 
> out. I'm reading the OpenJDK source now to see what subtle differences there 
> are between core threads and other threads when core threads can time out.
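The "floor" behavior and the eventually proposed fix can be demonstrated with a plain ThreadPoolExecutor. This is a standalone sketch of the java.util.concurrent mechanics, not S3A's actual executor wiring; the pool sizes and timeouts are chosen only to make the effect visible quickly.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreThreadTimeoutDemo {
    public static void main(String[] args) throws InterruptedException {
        // Unbounded work queue: the pool never grows past corePoolSize, and
        // by default core threads never exit, so after one burst of 10 tasks,
        // 10 idle threads would be pinned forever.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, Integer.MAX_VALUE, 200, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
        // The fix: let core threads honor the keep-alive timeout too.
        pool.allowCoreThreadTimeOut(true);

        for (int i = 0; i < 10; i++) {
            pool.submit(() -> { });   // trivial burst of work
        }
        System.out.println("busy pool size: " + pool.getPoolSize()); // up to 10
        Thread.sleep(1000);  // stay idle for longer than the keep-alive
        System.out.println("idle pool size: " + pool.getPoolSize()); // spins down to 0
        pool.shutdown();
    }
}
```

Without the allowCoreThreadTimeOut(true) call, the second line would still report 10: that is exactly the long-term minimum the issue title describes.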






[jira] [Commented] (HADOOP-15729) [s3a] stop treat fs.s3a.max.threads as the long-term minimum

2019-06-25 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872630#comment-16872630
 ] 

Sean Mackrory commented on HADOOP-15729:


So I tried out some of the same 1 GB uploads in the same region. Could vary 
that more, but as it is I think I'm just seeing a lot of noise in the 
underlying performance. I *tended* to see that separating core thread and max 
thread counts resulted in worse performance - I mainly tested with 50 max 
threads, and any combination of 0 core threads, 10 core threads, prestarting or 
not, all *tended* to have very similar decreases in performance. I'm not sure 
how to explain that - except for the "tended" - once in a while with an 
identical configuration I'd see the same drop in performance (in the best case, 
upload takes ~25 seconds, when there was a drop in performance it was in the 
neighborhood of 35 - 45 seconds). So it'd be quite the task to do a 
statistically rigorous experiment on this...

So here's what I propose: what really shouldn't, and in my testing *tended* not 
to, have any impact on the short-term performance characteristics but would 
also completely solve the problem long-running processes have, is simply 
allowing core threads anywhere to time out. We already have a timeout 
configured, we just only use it for the BlockingThreadPoolExecutorService, and 
not for the unbounded threadpool we give to Amazon. I think this is the safe 
and right choice. Patch attached.

Side note: I'm seeing that I can't mv or cp on S3 because it says the 
destination exists when it doesn't, and I'm getting some Maven errors packaging 
stuff in S3AFileSystem because of JavaDoc errors (an errant <, and a missing 
param). I'll follow up on those in separate JIRAs.

> [s3a] stop treat fs.s3a.max.threads as the long-term minimum
> 
>
> Key: HADOOP-15729
> URL: https://issues.apache.org/jira/browse/HADOOP-15729
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-15729.001.patch, HADOOP-15729.002.patch






[jira] [Updated] (HADOOP-15729) [s3a] stop treat fs.s3a.max.threads as the long-term minimum

2019-06-25 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15729:
---
Attachment: HADOOP-15729.002.patch

> [s3a] stop treat fs.s3a.max.threads as the long-term minimum
> 
>
> Key: HADOOP-15729
> URL: https://issues.apache.org/jira/browse/HADOOP-15729
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-15729.001.patch, HADOOP-15729.002.patch






[jira] [Comment Edited] (HADOOP-15729) [s3a] stop treat fs.s3a.max.threads as the long-term minimum

2019-06-25 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872396#comment-16872396
 ] 

Sean Mackrory edited comment on HADOOP-15729 at 6/25/19 2:55 PM:
-

Having slept on it, I think we should go ahead and undeprecate 
fs.s3a.threads.core (I just saw in the code that we had it once) but have it 
default to 0. A non-zero default would either be dangerously high (I know that 
some have set fs.s3a.max.threads to low values to work around the problem I 
described above, and we can't have core > max) or uselessly low, at a value of 
1 or 2. But the more I think about this, the less averse I am to making it 
tunable right away, especially since the property has existed before but has 
been unused for some time.

edit: it's actually only deprecated in branch-2 for compatibility reasons. In 
trunk, it's gone entirely. This happened at the time of the 
BlockingThreadPoolExecutor implementation. 


was (Author: mackrorysd):
Having slept on it, I think we should go ahead and undeprecate 
fs.s3a.threads.core (just saw in the code we had it once) but have it default 
to 0. A non-zero default is dangerous because if anyone has set max threads to 
1 (and I know some people have set it to low values - albeit not THAT low - to 
work around the problem I described above) and suddenly core threads is, say, 
5, you'll suddenly start getting exceptions on upgrade (or we'll have to handle 
that exception and rather quietly fail over to 0 core threads, which doesn't 
seem like expected behavior). But the more I think about this the less averse I 
am to making it tunable right away, especially since the property has existed 
before but has been unused for some time.

edit: it's actually only deprecated in branch-2 for compatibility reasons. In 
trunk, it's gone entirely. This happened at the time of the 
BlockingThreadPoolExecutor implementation. 

> [s3a] stop treat fs.s3a.max.threads as the long-term minimum
> 
>
> Key: HADOOP-15729
> URL: https://issues.apache.org/jira/browse/HADOOP-15729
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-15729.001.patch






[jira] [Comment Edited] (HADOOP-15729) [s3a] stop treat fs.s3a.max.threads as the long-term minimum

2019-06-25 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872396#comment-16872396
 ] 

Sean Mackrory edited comment on HADOOP-15729 at 6/25/19 2:32 PM:
-

Having slept on it, I think we should go ahead and undeprecate 
fs.s3a.threads.core (just saw in the code we had it once) but have it default 
to 0. A non-zero default is dangerous because if anyone has set max threads to 
1 (and I know some people have set it to low values - albeit not THAT low - to 
work around the problem I described above) and suddenly core threads is, say, 
5, you'll suddenly start getting exceptions on upgrade (or we'll have to handle 
that exception and rather quietly fail over to 0 core threads, which doesn't 
seem like expected behavior). But the more I think about this the less averse I 
am to making it tunable right away, especially since the property has existed 
before but has been unused for some time.

edit: it's actually only deprecated in branch-2 for compatibility reasons. In 
trunk, it's gone entirely. This happened at the time of the 
BlockingThreadPoolExecutor implementation. 
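For reference, the exception risk described above is the JDK's own constructor check: ThreadPoolExecutor rejects a core pool size larger than the maximum. A small sketch (the 5/1 values mirror the hypothetical upgrade scenario above):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreVsMaxThreads {
    // Returns true when ThreadPoolExecutor refuses the (core, max) pair.
    static boolean rejects(int core, int max) {
        try {
            new ThreadPoolExecutor(core, max, 60L, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<>()).shutdown();
            return false;
        } catch (IllegalArgumentException e) {
            // Thrown when maximumPoolSize < corePoolSize (or either is invalid)
            return true;
        }
    }

    public static void main(String[] args) {
        // e.g. a user who set fs.s3a.max.threads=1, against a hypothetical
        // non-zero fs.s3a.threads.core default of 5
        System.out.println("core=5, max=1 rejected? " + rejects(5, 1));
    }
}
```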


was (Author: mackrorysd):
Having slept on it, I think we should go ahead and undeprecate 
fs.s3a.threads.core (just saw in the code we had it once) but have it default 
to 0. A non-zero default is dangerous because if anyone has set max threads to 
1 (and I know some people have set it to low values - albeit not THAT low - to 
work around the problem I described above) and suddenly core threads is, say, 
5, you'll suddenly start getting exceptions on upgrade (or we'll have to handle 
that exception and rather quietly fail over to 0 core threads, which doesn't 
seem like expected behavior). But the more I think about this the less averse I 
am to making it tunable right away, especially since the property has existed 
before but has been unused for some time.

> [s3a] stop treat fs.s3a.max.threads as the long-term minimum
> 
>
> Key: HADOOP-15729
> URL: https://issues.apache.org/jira/browse/HADOOP-15729
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-15729.001.patch
>
>
> A while ago the s3a connector started experiencing deadlocks because the AWS 
> SDK requires an unbounded threadpool. It places monitoring tasks on the work 
> queue before the tasks they wait on, so it's possible (has even happened with 
> larger-than-default threadpools) for the executor to become permanently 
> saturated and deadlock.
> So we started giving an unbounded threadpool executor to the SDK, and using a 
> bounded, blocking threadpool service for everything else S3A needs (although 
> currently that's only in the S3ABlockOutputStream). fs.s3a.max.threads then 
> only limits this threadpool, however we also specified fs.s3a.max.threads as 
> the number of core threads in the unbounded threadpool, which in hindsight is 
> pretty terrible.
> Currently those core threads do not timeout, so this is actually setting a 
> sort of minimum. Once that many tasks have been submitted, the threadpool 
> will be locked at that number until it bursts beyond that, but it will only 
> spin down that far. If fs.s3a.max.threads is set reasonably high and someone 
> uses a bunch of S3 buckets, they could easily have thousands of idle threads 
> constantly.
> We should either not use fs.s3a.max.threads for the corepool size and 
> introduce a new configuration, or we should simply allow core threads to 
> timeout. I'm reading the OpenJDK source now to see what subtle differences 
> there are between core threads and other threads if core threads can timeout.






[jira] [Commented] (HADOOP-15729) [s3a] stop treat fs.s3a.max.threads as the long-term minimum

2019-06-25 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872396#comment-16872396
 ] 

Sean Mackrory commented on HADOOP-15729:


Having slept on it, I think we should go ahead and undeprecate 
fs.s3a.threads.core (just saw in the code we had it once) but have it default 
to 0. A non-zero default is dangerous because if anyone has set max threads to 
1 (and I know some people have set it to low values - albeit not THAT low - to 
work around the problem I described above) and suddenly core threads is, say, 
5, you'll suddenly start getting exceptions on upgrade (or we'll have to handle 
that exception and rather quietly fail over to 0 core threads, which doesn't 
seem like expected behavior). But the more I think about this the less averse I 
am to making it tunable right away, especially since the property has existed 
before but has been unused for some time.

> [s3a] stop treat fs.s3a.max.threads as the long-term minimum
> 
>
> Key: HADOOP-15729
> URL: https://issues.apache.org/jira/browse/HADOOP-15729
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-15729.001.patch
>
>
> A while ago the s3a connector started experiencing deadlocks because the AWS 
> SDK requires an unbounded threadpool. It places monitoring tasks on the work 
> queue before the tasks they wait on, so it's possible (has even happened with 
> larger-than-default threadpools) for the executor to become permanently 
> saturated and deadlock.
> So we started giving an unbounded threadpool executor to the SDK, and using a 
> bounded, blocking threadpool service for everything else S3A needs (although 
> currently that's only in the S3ABlockOutputStream). fs.s3a.max.threads then 
> only limits this threadpool, however we also specified fs.s3a.max.threads as 
> the number of core threads in the unbounded threadpool, which in hindsight is 
> pretty terrible.
> Currently those core threads do not timeout, so this is actually setting a 
> sort of minimum. Once that many tasks have been submitted, the threadpool 
> will be locked at that number until it bursts beyond that, but it will only 
> spin down that far. If fs.s3a.max.threads is set reasonably high and someone 
> uses a bunch of S3 buckets, they could easily have thousands of idle threads 
> constantly.
> We should either not use fs.s3a.max.threads for the corepool size and 
> introduce a new configuration, or we should simply allow core threads to 
> timeout. I'm reading the OpenJDK source now to see what subtle differences 
> there are between core threads and other threads if core threads can timeout.






[jira] [Commented] (HADOOP-15729) [s3a] stop treat fs.s3a.max.threads as the long-term minimum

2019-06-24 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871830#comment-16871830
 ] 

Sean Mackrory commented on HADOOP-15729:


Attaching a patch with my proposed change. Unfortunately, it has a bigger impact 
on simple things like a 1 GB upload than I thought, although it's hard to be sure 
it's not noise. See below for the times to upload a 1 GB file from a machine in 
us-west-2 to a bucket / pre-created DynamoDB table in the same region. Maybe 
this is worth adding fs.s3a.core.threads with a moderate default value. 
Long-running processes (like HiveServer2) might access many buckets, so the 
total allowed by fs.s3a.max.threads grows absolutely out of control - core 
threads could still do the same unless the value is *much* lower, in which case 
you'd easily hit this performance regression anyway. I would suggest we just 
proceed and consider fs.s3a.core.threads if further performance testing reveals 
an issue. Thoughts?

Without change:
{code}
real    0m27.415s
user    0m25.128s
sys     0m6.377s

real    0m25.360s
user    0m25.081s
sys     0m6.368s

real    0m27.615s
user    0m25.296s
sys     0m6.015s

real    0m25.001s
user    0m25.408s
sys     0m6.717s

real    0m28.083s
user    0m24.764s
sys     0m5.774s

real    0m26.117s
user    0m25.192s
sys     0m5.867s
{code}

With change:
{code}
real    0m28.928s
user    0m24.182s
sys     0m5.699s

real    0m33.359s
user    0m25.508s
sys     0m6.407s

real    0m44.412s
user    0m24.565s
sys     0m6.226s

real    0m27.469s
user    0m25.326s
sys     0m6.142s

real    0m35.660s
user    0m25.206s
sys     0m6.154s

real    0m31.811s
user    0m25.042s
sys     0m6.057s
{code}
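Averaging the `real` times from the two sets of runs above gives a rough sense of the regression's magnitude:

```python
# Wall-clock ("real") seconds from the six runs in each configuration above.
without_change = [27.415, 25.360, 27.615, 25.001, 28.083, 26.117]
with_change = [28.928, 33.359, 44.412, 27.469, 35.660, 31.811]

def mean(xs):
    return sum(xs) / len(xs)

before, after = mean(without_change), mean(with_change)
print(f"before: {before:.1f}s  after: {after:.1f}s  "
      f"slowdown: {100 * (after / before - 1):.0f}%")
# → before: 26.6s  after: 33.6s  slowdown: 26%
```

Roughly a quarter slower on average, though the 44s outlier suggests noise is a real factor.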

> [s3a] stop treat fs.s3a.max.threads as the long-term minimum
> 
>
> Key: HADOOP-15729
> URL: https://issues.apache.org/jira/browse/HADOOP-15729
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-15729.001.patch
>
>
> A while ago the s3a connector started experiencing deadlocks because the AWS 
> SDK requires an unbounded threadpool. It places monitoring tasks on the work 
> queue before the tasks they wait on, so it's possible (has even happened with 
> larger-than-default threadpools) for the executor to become permanently 
> saturated and deadlock.
> So we started giving an unbounded threadpool executor to the SDK, and using a 
> bounded, blocking threadpool service for everything else S3A needs (although 
> currently that's only in the S3ABlockOutputStream). fs.s3a.max.threads then 
> only limits this threadpool, however we also specified fs.s3a.max.threads as 
> the number of core threads in the unbounded threadpool, which in hindsight is 
> pretty terrible.
> Currently those core threads do not timeout, so this is actually setting a 
> sort of minimum. Once that many tasks have been submitted, the threadpool 
> will be locked at that number until it bursts beyond that, but it will only 
> spin down that far. If fs.s3a.max.threads is set reasonably high and someone 
> uses a bunch of S3 buckets, they could easily have thousands of idle threads 
> constantly.
> We should either not use fs.s3a.max.threads for the corepool size and 
> introduce a new configuration, or we should simply allow core threads to 
> timeout. I'm reading the OpenJDK source now to see what subtle differences 
> there are between core threads and other threads if core threads can timeout.






[jira] [Updated] (HADOOP-15729) [s3a] stop treat fs.s3a.max.threads as the long-term minimum

2019-06-24 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15729:
---
Attachment: HADOOP-15729.001.patch

> [s3a] stop treat fs.s3a.max.threads as the long-term minimum
> 
>
> Key: HADOOP-15729
> URL: https://issues.apache.org/jira/browse/HADOOP-15729
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
> Attachments: HADOOP-15729.001.patch
>
>
> A while ago the s3a connector started experiencing deadlocks because the AWS 
> SDK requires an unbounded threadpool. It places monitoring tasks on the work 
> queue before the tasks they wait on, so it's possible (has even happened with 
> larger-than-default threadpools) for the executor to become permanently 
> saturated and deadlock.
> So we started giving an unbounded threadpool executor to the SDK, and using a 
> bounded, blocking threadpool service for everything else S3A needs (although 
> currently that's only in the S3ABlockOutputStream). fs.s3a.max.threads then 
> only limits this threadpool, however we also specified fs.s3a.max.threads as 
> the number of core threads in the unbounded threadpool, which in hindsight is 
> pretty terrible.
> Currently those core threads do not timeout, so this is actually setting a 
> sort of minimum. Once that many tasks have been submitted, the threadpool 
> will be locked at that number until it bursts beyond that, but it will only 
> spin down that far. If fs.s3a.max.threads is set reasonably high and someone 
> uses a bunch of S3 buckets, they could easily have thousands of idle threads 
> constantly.
> We should either not use fs.s3a.max.threads for the corepool size and 
> introduce a new configuration, or we should simply allow core threads to 
> timeout. I'm reading the OpenJDK source now to see what subtle differences 
> there are between core threads and other threads if core threads can timeout.






[jira] [Updated] (HADOOP-16211) Update guava to 27.0-jre in hadoop-project branch-3.2

2019-06-13 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-16211:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Update guava to 27.0-jre in hadoop-project branch-3.2
> -
>
> Key: HADOOP-16211
> URL: https://issues.apache.org/jira/browse/HADOOP-16211
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-16211-branch-3.2.001.patch, 
> HADOOP-16211-branch-3.2.002.patch, HADOOP-16211-branch-3.2.003.patch, 
> HADOOP-16211-branch-3.2.004.patch, HADOOP-16211-branch-3.2.005.patch, 
> HADOOP-16211-branch-3.2.006.patch
>
>
> com.google.guava:guava should be upgraded to 27.0-jre due to a new CVE: 
> CVE-2018-10237.
> This is a sub-task for branch-3.2 from HADOOP-15960 to track issues on that 
> particular branch. 






[jira] [Commented] (HADOOP-16211) Update guava to 27.0-jre in hadoop-project branch-3.2

2019-06-13 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863098#comment-16863098
 ] 

Sean Mackrory commented on HADOOP-16211:


Thanks for getting to the bottom of that. Will commit once I've verified a good 
build locally...

> Update guava to 27.0-jre in hadoop-project branch-3.2
> -
>
> Key: HADOOP-16211
> URL: https://issues.apache.org/jira/browse/HADOOP-16211
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-16211-branch-3.2.001.patch, 
> HADOOP-16211-branch-3.2.002.patch, HADOOP-16211-branch-3.2.003.patch, 
> HADOOP-16211-branch-3.2.004.patch, HADOOP-16211-branch-3.2.005.patch, 
> HADOOP-16211-branch-3.2.006.patch
>
>
> com.google.guava:guava should be upgraded to 27.0-jre due to a new CVE: 
> CVE-2018-10237.
> This is a sub-task for branch-3.2 from HADOOP-15960 to track issues on that 
> particular branch. 






[jira] [Updated] (HADOOP-16213) Update guava to 27.0-jre in hadoop-project branch-3.1

2019-06-13 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-16213:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Update guava to 27.0-jre in hadoop-project branch-3.1
> -
>
> Key: HADOOP-16213
> URL: https://issues.apache.org/jira/browse/HADOOP-16213
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Critical
> Attachments: HADOOP-16213-branch-3.1.001.patch, 
> HADOOP-16213-branch-3.1.002.patch, HADOOP-16213-branch-3.1.003.patch, 
> HADOOP-16213-branch-3.1.004.patch, HADOOP-16213-branch-3.1.005.patch, 
> HADOOP-16213-branch-3.1.006.patch
>
>
> com.google.guava:guava should be upgraded to 27.0-jre due to a new CVE: 
> CVE-2018-10237.
> This is a sub-task for branch-3.1 from HADOOP-15960 to track issues on that 
> particular branch. 






[jira] [Commented] (HADOOP-16213) Update guava to 27.0-jre in hadoop-project branch-3.1

2019-06-13 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863097#comment-16863097
 ] 

Sean Mackrory commented on HADOOP-16213:


Thanks for getting to the bottom of that. Build looks good to me otherwise. 
Committed!

> Update guava to 27.0-jre in hadoop-project branch-3.1
> -
>
> Key: HADOOP-16213
> URL: https://issues.apache.org/jira/browse/HADOOP-16213
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Critical
> Attachments: HADOOP-16213-branch-3.1.001.patch, 
> HADOOP-16213-branch-3.1.002.patch, HADOOP-16213-branch-3.1.003.patch, 
> HADOOP-16213-branch-3.1.004.patch, HADOOP-16213-branch-3.1.005.patch, 
> HADOOP-16213-branch-3.1.006.patch
>
>
> com.google.guava:guava should be upgraded to 27.0-jre due to a new CVE: 
> CVE-2018-10237.
> This is a sub-task for branch-3.1 from HADOOP-15960 to track issues on that 
> particular branch. 






[jira] [Commented] (HADOOP-16213) Update guava to 27.0-jre in hadoop-project branch-3.1

2019-06-07 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858775#comment-16858775
 ] 

Sean Mackrory commented on HADOOP-16213:


Same as HADOOP-16211 - +1 on the code, but there's a different set of 
hadoop-yarn failures we need to confirm are actually flaky.

> Update guava to 27.0-jre in hadoop-project branch-3.1
> -
>
> Key: HADOOP-16213
> URL: https://issues.apache.org/jira/browse/HADOOP-16213
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.1.0, 3.1.1, 3.1.2
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Critical
> Attachments: HADOOP-16213-branch-3.1.001.patch, 
> HADOOP-16213-branch-3.1.002.patch, HADOOP-16213-branch-3.1.003.patch, 
> HADOOP-16213-branch-3.1.004.patch, HADOOP-16213-branch-3.1.005.patch, 
> HADOOP-16213-branch-3.1.006.patch
>
>
> com.google.guava:guava should be upgraded to 27.0-jre due to a new CVE: 
> CVE-2018-10237.
> This is a sub-task for branch-3.1 from HADOOP-15960 to track issues on that 
> particular branch. 






[jira] [Commented] (HADOOP-16211) Update guava to 27.0-jre in hadoop-project branch-3.2

2019-06-07 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858767#comment-16858767
 ] 

Sean Mackrory commented on HADOOP-16211:


+1 on the patch, but I don't immediately recognize the hadoop-yarn test 
failures as likely flakes. Have you looked into that already? We need to 
confirm they're not caused by some change in Guava.

> Update guava to 27.0-jre in hadoop-project branch-3.2
> -
>
> Key: HADOOP-16211
> URL: https://issues.apache.org/jira/browse/HADOOP-16211
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-16211-branch-3.2.001.patch, 
> HADOOP-16211-branch-3.2.002.patch, HADOOP-16211-branch-3.2.003.patch, 
> HADOOP-16211-branch-3.2.004.patch, HADOOP-16211-branch-3.2.005.patch, 
> HADOOP-16211-branch-3.2.006.patch
>
>
> com.google.guava:guava should be upgraded to 27.0-jre due to a new CVE: 
> CVE-2018-10237.
> This is a sub-task for branch-3.2 from HADOOP-15960 to track issues on that 
> particular branch. 






[jira] [Updated] (HADOOP-16212) Update guava to 27.0-jre in hadoop-project branch-3.0

2019-06-03 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-16212:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed. As discussed offline: once the 3.x branches are done, we need to 
remember to re-email the other communities' mailing lists to remind them.

> Update guava to 27.0-jre in hadoop-project branch-3.0
> -
>
> Key: HADOOP-16212
> URL: https://issues.apache.org/jira/browse/HADOOP-16212
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-16212-branch-3.0.001.patch, 
> HADOOP-16212-branch-3.0.002.patch, HADOOP-16212-branch-3.0.003.patch, 
> HADOOP-16212-branch-3.0.004.patch, HADOOP-16212-branch-3.0.005.patch
>
>
> com.google.guava:guava should be upgraded to 27.0-jre due to a new CVE: 
> CVE-2018-10237.
> This is a sub-task for branch-3.0 from HADOOP-15960 to track issues on that 
> particular branch.






[jira] [Commented] (HADOOP-16212) Update guava to 27.0-jre in hadoop-project branch-3.0

2019-06-03 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854601#comment-16854601
 ] 

Sean Mackrory commented on HADOOP-16212:


+1. Will commit...


> Update guava to 27.0-jre in hadoop-project branch-3.0
> -
>
> Key: HADOOP-16212
> URL: https://issues.apache.org/jira/browse/HADOOP-16212
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-16212-branch-3.0.001.patch, 
> HADOOP-16212-branch-3.0.002.patch, HADOOP-16212-branch-3.0.003.patch, 
> HADOOP-16212-branch-3.0.004.patch, HADOOP-16212-branch-3.0.005.patch
>
>
> com.google.guava:guava should be upgraded to 27.0-jre due to a new CVE: 
> CVE-2018-10237.
> This is a sub-task for branch-3.0 from HADOOP-15960 to track issues on that 
> particular branch.






[jira] [Resolved] (HADOOP-16321) ITestS3ASSL+TestOpenSSLSocketFactory failing with java.lang.UnsatisfiedLinkErrors

2019-05-21 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory resolved HADOOP-16321.

Resolution: Fixed

> ITestS3ASSL+TestOpenSSLSocketFactory failing with 
> java.lang.UnsatisfiedLinkErrors
> -
>
> Key: HADOOP-16321
> URL: https://issues.apache.org/jira/browse/HADOOP-16321
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3, test
>Affects Versions: 3.3.0
> Environment: macos
>Reporter: Steve Loughran
>Assignee: Sahil Takiar
>Priority: Major
>
> the new test of HADOOP-16050  {{ITestS3ASSL}} is failing with 
> {{java.lang.UnsatisfiedLinkError}}






[jira] [Commented] (HADOOP-15999) S3Guard: Better support for out-of-band operations

2019-05-15 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840846#comment-16840846
 ] 

Sean Mackrory commented on HADOOP-15999:


{quote} Seems like a common case{quote}

Yeah that's actually the exact case that motivated this ticket, IIRC...

> S3Guard: Better support for out-of-band operations
> --
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, 
> HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, 
> HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, 
> HADOOP-15999.009.patch, out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.
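The reconciliation idea in the description (query both sides, trust the higher modification time) could be sketched roughly like this — `FileRecord` and the merge rule are hypothetical stand-ins, not the actual S3Guard interfaces:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FileRecord:
    # Minimal stand-in for a file status: modification time as returned
    # by S3, plus length. None on a side means "no entry there".
    mod_time: float
    length: int

def reconcile(s3: Optional[FileRecord],
              store: Optional[FileRecord]) -> Optional[FileRecord]:
    """Prefer whichever side reports the higher modification time.

    This gives up the short-circuited getFileStatus, but catches
    out-of-band overwrites/recreates that the MetadataStore missed.
    """
    if s3 is None:
        return store
    if store is None:
        return s3
    return s3 if s3.mod_time > store.mod_time else store

# An S3Guard-ed file overwritten out-of-band by a longer file: the
# newer, longer S3 copy wins over the stale MetadataStore entry.
newer = reconcile(FileRecord(200.0, 4096), FileRecord(100.0, 1024))
print(newer.length)  # → 4096
```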






[jira] [Comment Edited] (HADOOP-15999) S3Guard: Better support for out-of-band operations

2019-05-15 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840846#comment-16840846
 ] 

Sean Mackrory edited comment on HADOOP-15999 at 5/15/19 10:49 PM:
--

{quote} Seems like a common case{quote}

Yeah that's actually the exact case that motivated this ticket in the first 
place, IIRC...


was (Author: mackrorysd):
{quote} Seems like a common case{quote}

Yeah that's actually the exact case that motivated this ticket, IIRC...

> S3Guard: Better support for out-of-band operations
> --
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-15999-007.patch, HADOOP-15999.001.patch, 
> HADOOP-15999.002.patch, HADOOP-15999.003.patch, HADOOP-15999.004.patch, 
> HADOOP-15999.005.patch, HADOOP-15999.006.patch, HADOOP-15999.008.patch, 
> HADOOP-15999.009.patch, out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.






[jira] [Commented] (HADOOP-16281) ABFS: Rename operation, GetFileStatus before rename operation and throw exception on the driver side

2019-05-03 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832524#comment-16832524
 ] 

Sean Mackrory commented on HADOOP-16281:


[~DanielZhou] I actually hadn't thought of it being applicable to Azure because 
the WASB connector already had similar mechanisms, and ADLS Gen1 and Gen2 
already offer the required file-system semantics. But then I remembered you can 
turn off Hierarchical Namespace :) If people want to run HBase without that 
feature for some reason, then yes, I'd love to make sure it supports ABFS well.

> ABFS: Rename operation, GetFileStatus before rename operation and  throw 
> exception on the driver side
> -
>
> Key: HADOOP-16281
> URL: https://issues.apache.org/jira/browse/HADOOP-16281
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Da Zhou
>Assignee: Da Zhou
>Priority: Major
>
> ABFS should add the rename with options:
>  [https://github.com/apache/hadoop/pull/743]






[jira] [Updated] (HADOOP-16222) Fix new deprecations after guava 27.0 update in trunk

2019-04-24 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-16222:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Fix new deprecations after guava 27.0 update in trunk
> -
>
> Key: HADOOP-16222
> URL: https://issues.apache.org/jira/browse/HADOOP-16222
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-16222.001.patch, HADOOP-16222.002.patch, 
> HADOOP-16222.003.patch, HADOOP-16222.004.patch
>
>
> *Note*: this can be done after the guava update.
> There are a bunch of new deprecations after the guava update. We need to fix 
> those, because they will be removed in the next guava version (after 27).
> I created a separate jira for this from HADOOP-16210 because the jenkins 
> pre-commit test job (yetus) times out after 5 hours when running them 
> together. 
> {noformat}
> [WARNING] 
> /testptch/hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/SemaphoredDelegatingExecutor.java:[110,20]
>  [deprecation] immediateFailedCheckedFuture(X) in Futures has been 
> deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ZKUtil.java:[175,16]
>  [deprecation] toString(File,Charset) in Files has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/TestTableMapping.java:[44,9]
>  [deprecation] write(CharSequence,File,Charset) in Files has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/TestTableMapping.java:[67,9]
>  [deprecation] write(CharSequence,File,Charset) in Files has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/TestTableMapping.java:[131,9]
>  [deprecation] write(CharSequence,File,Charset) in Files has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/TestTableMapping.java:[150,9]
>  [deprecation] write(CharSequence,File,Charset) in Files has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/TestTableMapping.java:[169,9]
>  [deprecation] write(CharSequence,File,Charset) in Files has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestZKUtil.java:[134,9]
>  [deprecation] write(CharSequence,File,Charset) in Files has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestSecurityUtil.java:[437,9]
>  [deprecation] write(CharSequence,File,Charset) in Files has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSHAAdminMiniCluster.java:[211,26]
>  [deprecation] toString(File,Charset) in Files has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSHAAdminMiniCluster.java:[219,36]
>  [deprecation] toString(File,Charset) in Files has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/fpga/TestFpgaResourceHandler.java:[130,9]
>  [deprecation] append(CharSequence,File,Charset) in Files has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/fpga/TestFpgaResourceHandler.java:[352,9]
>  [deprecation] append(CharSequence,File,Charset) in Files has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java:[1161,18]
>  [deprecation] propagate(Throwable) in Throwables has been deprecated
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/ServiceTestUtils.java:[413,18]
>  [deprecation] propagate(Throwable) in Throwables has been deprecated
> {noformat}
> Maybe fix these module by module instead of in a single patch?
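The fixes themselves are mostly mechanical. A minimal sketch of the replacements using plain JDK equivalents (class and method names here are illustrative, not the actual patch; Guava's own suggested replacements, such as Files.asCharSink, would work as well):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class GuavaMigrationSketch {

    // Was: com.google.common.io.Files.write(CharSequence, File, Charset) (deprecated).
    public static void writeString(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    // Was: com.google.common.io.Files.toString(File, Charset) (deprecated).
    public static String readString(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Was: Throwables.propagate(t) (deprecated). The recommended pattern is to
    // rethrow unchecked throwables as-is and wrap checked ones.
    public static RuntimeException propagate(Throwable t) {
        if (t instanceof RuntimeException) {
            throw (RuntimeException) t;
        }
        if (t instanceof Error) {
            throw (Error) t;
        }
        throw new RuntimeException(t);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("migration", ".txt");
        writeString(tmp, "hello");
        System.out.println(readString(tmp));
        Files.deleteIfExists(tmp);
    }
}
```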




[jira] [Commented] (HADOOP-16222) Fix new deprecations after guava 27.0 update in trunk

2019-04-24 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825327#comment-16825327
 ] 

Sean Mackrory commented on HADOOP-16222:


+1. Note that the patch no longer applies cleanly - YARN-9495 already committed 
the findbugs-exclude.xml changes, so I'm only committing the rest of the patch.

> Fix new deprecations after guava 27.0 update in trunk
> -
>
> Key: HADOOP-16222
> URL: https://issues.apache.org/jira/browse/HADOOP-16222
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-16222.001.patch, HADOOP-16222.002.patch, 
> HADOOP-16222.003.patch, HADOOP-16222.004.patch
>
>




[jira] [Comment Edited] (HADOOP-16085) S3Guard: use object version or etags to protect against inconsistent read after replace/overwrite

2019-04-17 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820236#comment-16820236
 ] 

Sean Mackrory edited comment on HADOOP-16085 at 4/17/19 3:58 PM:
-

Left some feedback in-line on the pull-request (and for HADOOP-16221 too). Some 
more general thoughts:
* Have been discussing with [~ste...@apache.org] whether or not the FileStatus 
-> S3AFileStatus and schema changes should be separated out from the 
enforcement. I think the best argument for that is that it's a smaller change 
to get older clients to notify newer clients of changes whereas only the newer 
ones will enforce. The other factor mentioned is the desire for keeping S3Guard 
relatively storage-agnostic, but I honestly just don't see how we can do that 
and still have a robust solution. S3 is popular enough to warrant a custom 
solution that really does fix all the holes. Personally, I think we should just 
keep this change together.
* I don't suppose there's an interface we can rely on to provide getETag() and 
getVersionId(), is there? This is where Go's duck-typing would be nice so we 
could eliminate 2 (or more) of the args to every constructor call. Not a big 
deal. I have a small to-do list of other little things to look into, but as 
you'll see on the PR, the overwhelming majority of my feedback is pretty 
mechanical. I think overall this is looking like a good solid patch.

I am also getting some unit test failures running this in CDH. Will do some 
more test runs on the upstream base and with various parameters to see if I can 
narrow it down. I assume you've been running the tests with no problems?



> S3Guard: use object version or etags to protect against inconsistent read 
> after replace/overwrite
> -
>
> Key: HADOOP-16085
> URL: https://issues.apache.org/jira/browse/HADOOP-16085
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Ben Roling
>Assignee: Ben Roling
>Priority: Major
> Attachments: HADOOP-16085-003.patch, HADOOP-16085_002.patch, 
> HADOOP-16085_3.2.0_001.patch
>
>
> Currently S3Guard doesn't track S3 object versions.  If a file is written in 
> S3A with S3Guard and then subsequently overwritten, there is no protection 
> against the next reader seeing the old version of the file instead of the new 
> one.
> It seems like the S3Guard metadata could track the S3 object version.  When a 
> file is created or updated, the object version could be written to the 
> S3Guard metadata.  When a file is read, the read out of S3 could be performed 
> by object version, ensuring the correct version is retrieved.
> I don't have a lot of direct experience with this yet, but this is my 
> impression from looking through the code.  My organization is looking to 
> shift some datasets stored in HDFS over to S3 and is concerned about this 
> potential issue as there are some cases in our codebase that would do an 
> overwrite.
> I imagine this idea may have been considered before but I couldn't quite 
> track down any JIRAs discussing it.  If there is one, feel free to close this 
> with a reference to it.
> Am I understanding things correctly?  Is this idea feasible?  Any feedback 
> that could be provided would be appreciated.  We may consider crafting a 
> patch.
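The proposal can be illustrated with a toy model in which in-memory maps stand in for S3 and the S3Guard MetadataStore (all class and method names here are hypothetical; real S3 assigns the etag/version on PUT). On read, a recorded etag that no longer matches the stored object signals a stale or out-of-band change:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class EtagGuardSketch {
    // Hypothetical stand-ins for S3 and the S3Guard MetadataStore.
    private final Map<String, String[]> store = new HashMap<>();   // path -> {etag, data}
    private final Map<String, String> metadata = new HashMap<>();  // path -> expected etag

    public void write(String path, String data) {
        String etag = UUID.randomUUID().toString();     // "S3" assigns an etag on PUT
        store.put(path, new String[] {etag, data});
        metadata.put(path, etag);                       // record it in the metadata store
    }

    // Simulate an overwrite done without S3Guard (metadata store not updated).
    public void overwriteOutOfBand(String path, String data) {
        store.put(path, new String[] {UUID.randomUUID().toString(), data});
    }

    public String read(String path) throws IOException {
        String[] obj = store.get(path);
        String expected = metadata.get(path);
        if (expected != null && !expected.equals(obj[0])) {
            // The object no longer matches what the metadata store recorded.
            throw new IOException("etag mismatch for " + path
                + ": expected " + expected + " but found " + obj[0]);
        }
        return obj[1];
    }
}
```

A real implementation would instead issue the GET conditionally (by version id or If-Match etag), but the bookkeeping is the same.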





[jira] [Commented] (HADOOP-16246) Unbounded thread pool maximum pool size in S3AFileSystem TransferManager

2019-04-10 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815024#comment-16815024
 ] 

Sean Mackrory commented on HADOOP-16246:


Yeah it was actually almost the reverse, IIRC. A couple of layers of monitoring 
tasks would get pushed first, and then the tasks that they were monitoring 
would get pushed after that. It did seem terrible, so I'm happy to hear it's 
been rethought.

> Unbounded thread pool maximum pool size in S3AFileSystem TransferManager
> 
>
> Key: HADOOP-16246
> URL: https://issues.apache.org/jira/browse/HADOOP-16246
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Greg Kinman
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> I have something running in production that is running up against {{ulimit}} 
> trying to create {{s3a-transfer-unbounded}} threads.
> Relevant background: https://issues.apache.org/jira/browse/HADOOP-13826.
> Before that change, the thread pool used in the {{TransferManager}} had both 
> a reasonably small maximum pool size and work queue capacity.
> After that change, the thread pool has both a maximum pool size and work 
> queue capacity of {{Integer.MAX_VALUE}}.
> This seems like a pretty bad idea, because now we have, practically speaking, 
> no bound on the number of threads that might get created. I understand the 
> change was made in response to experiencing deadlocks and at the warning of 
> the documentation, which I will repeat here:
> {quote}It is not recommended to use a single threaded executor or a thread 
> pool with a bounded work queue as control tasks may submit subtasks that 
> can't complete until all sub tasks complete. Using an incorrectly configured 
> thread pool may cause a deadlock (I.E. the work queue is filled with control 
> tasks that can't finish until subtasks complete but subtasks can't execute 
> because the queue is filled).
> {quote}
> The documentation only warns against having a bounded _work queue_, not 
> against having a bounded _maximum pool size_. And this seems fine, as having 
> an unbounded work queue sounds ok. Having an unbounded maximum pool size, 
> however, does not.
> I will also note that this constructor is now deprecated and suggests using 
> {{TransferManagerBuilder}} instead, which by default creates a fixed thread 
> pool of size 10: 
> [https://github.com/aws/aws-sdk-java/blob/1.11.534/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/TransferManagerUtils.java#L59].
> I suggest we make a small change here and keep the maximum pool size at 
> {{maxThreads}}, which defaults to 10, while keeping the work queue as is 
> (unbounded).
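The suggested change amounts to a fixed-size pool feeding an unbounded queue. A sketch, assuming the defaults described above (the helper name is hypothetical, not the actual S3AFileSystem code; per the deadlock history in HADOOP-13826, this is only safe if control tasks never block on subtasks queued behind them):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedTransferPoolSketch {
    // Cap the pool at maxThreads (the SDK's TransferManagerBuilder defaults to a
    // fixed pool of 10) while leaving the work queue unbounded, as suggested.
    public static ThreadPoolExecutor newTransferPool(int maxThreads, long keepAliveSecs) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads,            // core == max: a fixed-size pool
            keepAliveSecs, TimeUnit.SECONDS,
            new LinkedBlockingQueue<>());      // unbounded queue, as before
        pool.allowCoreThreadTimeOut(true);     // let idle threads die off
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newTransferPool(10, 60);
        for (int i = 0; i < 100; i++) {
            pool.execute(() -> { });           // excess work queues instead of spawning threads
        }
        System.out.println("max pool size: " + pool.getMaximumPoolSize());
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```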






[jira] [Commented] (HADOOP-16246) Unbounded thread pool maximum pool size in S3AFileSystem TransferManager

2019-04-10 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814954#comment-16814954
 ] 

Sean Mackrory commented on HADOOP-16246:


{quote}The documentation only warns against having a bounded work queue, not 
against having a bounded maximum pool size. {quote}

The deadlocks we experienced were actually because the pool itself was 
completely saturated with tasks that depended on tasks that were in the queue. 
Unbounded queues wouldn't fix that. I haven't looked at TransferManagerBuilder, 
though - will take a look.

> Unbounded thread pool maximum pool size in S3AFileSystem TransferManager
> 
>
> Key: HADOOP-16246
> URL: https://issues.apache.org/jira/browse/HADOOP-16246
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Greg Kinman
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>






[jira] [Comment Edited] (HADOOP-16210) Update guava to 27.0-jre in hadoop-project trunk

2019-04-03 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809130#comment-16809130
 ] 

Sean Mackrory edited comment on HADOOP-16210 at 4/3/19 7:21 PM:


{quote}-1   javac   19m 7s  root generated 15 new + 1481 unchanged - 1 
fixed = 1496 total (was 1482) {quote}

I didn't notice this one before I pushed. We should eliminate those if we can 
while the issue is still fresh. Care to follow-up on that, [~gabor.bota]? 
(edit: actually Gabor already filed HADOOP-16222 for that, and it makes the 
issue of Yetus timing out before running all the tests worse)

I should also note that there was a [DISCUSS] thread on the mailing list about 
this, and there's been no opposition, only support. I committed to trunk only - 
since you're still testing downstream, we should keep this open so we can 
backport to those branches as they're confirmed ready.



> Update guava to 27.0-jre in hadoop-project trunk
> 
>
> Key: HADOOP-16210
> URL: https://issues.apache.org/jira/browse/HADOOP-16210
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Critical
> Attachments: HADOOP-16210.001.patch, 
> HADOOP-16210.002.findbugsfix.wip.patch, HADOOP-16210.002.patch, 
> HADOOP-16210.003.patch
>
>
> com.google.guava:guava should be upgraded to 27.0-jre due to new CVE's found 
> CVE-2018-10237.
> This is a sub-task for trunk from HADOOP-15960 to track issues with that 
> particular branch.







[jira] [Commented] (HADOOP-16210) Update guava to 27.0-jre in hadoop-project trunk

2019-04-03 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809125#comment-16809125
 ] 

Sean Mackrory commented on HADOOP-16210:


The failure is the libprotoc version mismatch I occasionally see pop up, and 
that's always flaky when I try to look at it...

+1 and pushed, as commented on the PR. I squashed the 2 commits to push, but I 
don't immediately see how you're supposed to close a PR that was merged 
out-of-band. Maybe it's the submitter that does that?

> Update guava to 27.0-jre in hadoop-project trunk
> 
>
> Key: HADOOP-16210
> URL: https://issues.apache.org/jira/browse/HADOOP-16210
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Critical
> Attachments: HADOOP-16210.001.patch, 
> HADOOP-16210.002.findbugsfix.wip.patch, HADOOP-16210.002.patch, 
> HADOOP-16210.003.patch
>
>






[jira] [Commented] (HADOOP-16210) Update guava to 27.0-jre in hadoop-project trunk

2019-04-01 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807140#comment-16807140
 ] 

Sean Mackrory commented on HADOOP-16210:


Beyond the findbugs issues, I'm supportive of this change. I also ran the Azure 
tests. There's more work to be done coordinating with downstream projects, but 
I think that can happen after it's submitted to trunk. We've got to commit at 
some point to really see what else breaks that we can't foresee. It's pretty 
dangerous to be as far behind on dependencies as we are with this one - even if 
we're not affected by specific vulnerabilities, IMO.

> Update guava to 27.0-jre in hadoop-project trunk
> 
>
> Key: HADOOP-16210
> URL: https://issues.apache.org/jira/browse/HADOOP-16210
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Critical
> Attachments: HADOOP-16210.001.patch, HADOOP-16210.002.patch
>
>






[jira] [Commented] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-02-22 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775467#comment-16775467
 ] 

Sean Mackrory commented on HADOOP-15625:


Just as a heads up for anyone else following along, the .003. patch looks like 
it's completely unrelated to the rest of this.

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP-15625-001.patch, HADOOP-15625-002.patch, 
> HADOOP-15625-003.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't notice it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.
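The three numbered steps above can be sketched in plain Java. This is an illustrative stand-in only (class and method names are hypothetical, not the actual S3A implementation):

```java
import java.io.IOException;

/** Sketch of etag-based change detection; names are illustrative only. */
public class EtagChangeDetector {
    private String expectedEtag; // etag captured from the first HEAD/GET

    /** Step 1: cache the etag seen on the first request for this object. */
    public void remember(String etag) {
        if (expectedEtag == null) {
            expectedEtag = etag;
        }
    }

    /** Steps 2-3: verify the etag of a later GET response and raise an
        IOException if the remote file changed during the read. */
    public void verify(String etagFromResponse, String key) throws IOException {
        if (expectedEtag != null && !expectedEtag.equals(etagFromResponse)) {
            throw new IOException("Remote file " + key
                + " changed during read: etag " + expectedEtag
                + " -> " + etagFromResponse);
        }
    }
}
```

As the description says, the dramatic failure is deliberate: a loud IOException beats silently mixing old and new bytes.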






[jira] [Commented] (HADOOP-15999) [s3a] Better support for out-of-band operations

2019-02-21 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774397#comment-16774397
 ] 

Sean Mackrory commented on HADOOP-15999:


FWIW I'm not seeing that test failure, but can't dig into it much myself right 
now.

> [s3a] Better support for out-of-band operations
> ---
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15999.001.patch, HADOOP-15999.002.patch, 
> HADOOP-15999.003.patch, HADOOP-15999.004.patch, HADOOP-15999.005.patch, 
> HADOOP-15999.006.patch, out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.
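The "use whichever one has the higher modified time" idea sketches out roughly as follows. This is a stand-in, not the real S3A code, and it assumes both sources report modification times on the same epoch-millis clock:

```java
/** Sketch of reconciling S3 vs MetadataStore answers by modification time.
    Illustrative only; real S3A FileStatus objects carry much more state. */
public class SourceOfTruth {
    /** Pick the modification time to trust; null means that source had no entry. */
    public static Long reconcile(Long s3ModTime, Long metadataStoreModTime) {
        if (s3ModTime == null) {
            return metadataStoreModTime;  // only the store knows the file
        }
        if (metadataStoreModTime == null) {
            return s3ModTime;             // out-of-band creation: trust S3
        }
        // Both answered: the higher mod time wins, so an out-of-band
        // overwrite in S3 is no longer masked by a stale store entry.
        return Math.max(s3ModTime, metadataStoreModTime);
    }
}
```

The cost, as noted, is an extra S3 round trip on every lookup unless authoritative mode lets the store answer alone.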






[jira] [Commented] (HADOOP-16101) Use lighter-weight alternatives to innerGetFileStatus where possible

2019-02-08 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763826#comment-16763826
 ] 

Sean Mackrory commented on HADOOP-16101:


The other detail here is that we don't think we need to do a HEAD at all before 
doing the actual GET work when reading.

> Use lighter-weight alternatives to innerGetFileStatus where possible
> 
>
> Key: HADOOP-16101
> URL: https://issues.apache.org/jira/browse/HADOOP-16101
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Sean Mackrory
>Priority: Major
>
> Discussion in HADOOP-15999 highlighted the heaviness of a full 
> innerGetFileStatus call, where many usages of it may need a lighter weight 
> fileExists, etc. Let's investigate usage of innerGetFileStatus and slim it 
> down where possible.






[jira] [Commented] (HADOOP-15999) [s3a] Better support for out-of-band operations

2019-02-08 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763800#comment-16763800
 ] 

Sean Mackrory commented on HADOOP-15999:


+1 on the patch. I filed HADOOP-16101 to make sure we don't lose Steve's 
suggestion of lighter-weight alternatives to innerGetFileStatus where possible.

{quote}any time we know we don't care 100% on being consistent{quote}

I was talking to someone recently about slimming down some of the checks done 
on create, etc. (or at least make them optional). Otherwise the idea of knowing 
we might be less consistent than S3 itself scares me more than a little. Feels 
like whenever we've done that someone invariably hit the use case we thought 
was fine.

> [s3a] Better support for out-of-band operations
> ---
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15999.001.patch, HADOOP-15999.002.patch, 
> out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.






[jira] [Created] (HADOOP-16101) Use lighter-weight alternatives to innerGetFileStatus where possible

2019-02-08 Thread Sean Mackrory (JIRA)
Sean Mackrory created HADOOP-16101:
--

 Summary: Use lighter-weight alternatives to innerGetFileStatus 
where possible
 Key: HADOOP-16101
 URL: https://issues.apache.org/jira/browse/HADOOP-16101
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Sean Mackrory


Discussion in HADOOP-15999 highlighted the heaviness of a full 
innerGetFileStatus call, where many usages of it may need a lighter weight 
fileExists, etc. Let's investigate usage of innerGetFileStatus and slim it down 
where possible.






[jira] [Commented] (HADOOP-15582) Document ABFS

2019-02-05 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761081#comment-16761081
 ] 

Sean Mackrory commented on HADOOP-15582:


I can probably take care of this next week.

> Document ABFS
> -
>
> Key: HADOOP-15582
> URL: https://issues.apache.org/jira/browse/HADOOP-15582
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: documentation, fs/azure
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Thomas Marquardt
>Priority: Major
>
> Add documentation for abfs under 
> {{hadoop-tools/hadoop-azure/src/site/markdown}}
> Possible topics include
> * intro to scheme
> * why abfs (link to MSDN, etc)
> * config options
> * switching from wasb/interop
> * troubleshooting
> testing.md should add a section on testing this stuff too.






[jira] [Commented] (HADOOP-16085) S3Guard: use object version to protect against inconsistent read after replace/overwrite

2019-02-01 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758726#comment-16758726
 ] 

Sean Mackrory commented on HADOOP-16085:


{quote}object version can be up to 1024 characters{quote}
I'm less concerned about the space taken up in the metadata store - the 
problems I'm remembering were due to the amount of space in the S3 bucket 
itself. It was with repeated test runs, so similar filenames were used many, 
many times and many versions had been kept. (That's not that realistic, but 
the more you care about read-after-update consistency, the more this would 
impact you.)

> S3Guard: use object version to protect against inconsistent read after 
> replace/overwrite
> 
>
> Key: HADOOP-16085
> URL: https://issues.apache.org/jira/browse/HADOOP-16085
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Ben Roling
>Priority: Major
> Attachments: HADOOP-16085_3.2.0_001.patch
>
>
> Currently S3Guard doesn't track S3 object versions.  If a file is written in 
> S3A with S3Guard and then subsequently overwritten, there is no protection 
> against the next reader seeing the old version of the file instead of the new 
> one.
> It seems like the S3Guard metadata could track the S3 object version.  When a 
> file is created or updated, the object version could be written to the 
> S3Guard metadata.  When a file is read, the read out of S3 could be performed 
> by object version, ensuring the correct version is retrieved.
> I don't have a lot of direct experience with this yet, but this is my 
> impression from looking through the code.  My organization is looking to 
> shift some datasets stored in HDFS over to S3 and is concerned about this 
> potential issue as there are some cases in our codebase that would do an 
> overwrite.
> I imagine this idea may have been considered before but I couldn't quite 
> track down any JIRAs discussing it.  If there is one, feel free to close this 
> with a reference to it.
> Am I understanding things correctly?  Is this idea feasible?  Any feedback 
> that could be provided would be appreciated.  We may consider crafting a 
> patch.
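The read-by-version idea, combined with the backward-compatibility concern raised in the comments (older S3Guard rows carry no version), can be sketched as below. Names are hypothetical; with the AWS SDK for Java the pinned read would become something like `new GetObjectRequest(bucket, key, versionId)`:

```java
import java.util.Optional;

/** Sketch of planning a versioned read from S3Guard metadata. Rows written
    before version tracking have no versionId and must gracefully fall back
    to an unversioned GET of the latest object. Illustrative only. */
public class VersionedReadPlanner {
    /** versionId to pin the GET to, or empty to read the latest object. */
    public static Optional<String> versionToRead(String versionIdFromMetadata) {
        return Optional.ofNullable(versionIdFromMetadata);
    }
}
```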






[jira] [Commented] (HADOOP-16085) S3Guard: use object version to protect against inconsistent read after replace/overwrite

2019-02-01 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758672#comment-16758672
 ] 

Sean Mackrory commented on HADOOP-16085:


Thanks for submitting a patch [~ben.roling]. Haven't had a chance to do a full 
review yet, but one of [~fabbri]'s comments was also high on my list of things 
to watch out for:
{quote}Backward / forward compatible with existing S3Guarded buckets and Dynamo 
tables.{quote}
Specifically, we need to gracefully deal with any row missing an object 
version. The other direction is easy - if this simply adds a new field, old 
code will ignore it and we'll continue to get the current behavior.

My other concern is that this requires enabling object versioning. I know 
[~fabbri] has done some testing with that and I think eventually hit issues. 
Was it just a matter of the space all the versions were taking up, or was it 
actually a performance problem once there was enough overhead?

> S3Guard: use object version to protect against inconsistent read after 
> replace/overwrite
> 
>
> Key: HADOOP-16085
> URL: https://issues.apache.org/jira/browse/HADOOP-16085
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Ben Roling
>Priority: Major
> Attachments: HADOOP-16085_3.2.0_001.patch
>
>
> Currently S3Guard doesn't track S3 object versions.  If a file is written in 
> S3A with S3Guard and then subsequently overwritten, there is no protection 
> against the next reader seeing the old version of the file instead of the new 
> one.
> It seems like the S3Guard metadata could track the S3 object version.  When a 
> file is created or updated, the object version could be written to the 
> S3Guard metadata.  When a file is read, the read out of S3 could be performed 
> by object version, ensuring the correct version is retrieved.
> I don't have a lot of direct experience with this yet, but this is my 
> impression from looking through the code.  My organization is looking to 
> shift some datasets stored in HDFS over to S3 and is concerned about this 
> potential issue as there are some cases in our codebase that would do an 
> overwrite.
> I imagine this idea may have been considered before but I couldn't quite 
> track down any JIRAs discussing it.  If there is one, feel free to close this 
> with a reference to it.
> Am I understanding things correctly?  Is this idea feasible?  Any feedback 
> that could be provided would be appreciated.  We may consider crafting a 
> patch.






[jira] [Commented] (HADOOP-16041) UserAgent string for ABFS

2019-01-29 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755525#comment-16755525
 ] 

Sean Mackrory commented on HADOOP-16041:


Committed. Confirmed, for everyone's reference, that the User-Agent now looks 
like this for me:
{code}Azure Blob FS/3.3.0-SNAPSHOT (JavaJRE 1.8.0_191; Linux 
4.15.0-43-generic){code}

All tests pass, except for this one that's always being weird (although I 
thought it was just WASB compat tests that were being weird and this is 
different):
{code}ITestGetNameSpaceEnabled.testNonXNSAccount:57->Assert.assertFalse:64->Assert.assertTrue:41->Assert.fail:88
 Expecting getIsNamespaceEnabled() return false{code}
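A User-Agent of that shape can be assembled from the Hadoop version plus JVM system properties. A sketch, where the hadoopVersion argument stands in for org.apache.hadoop.util.VersionInfo.getVersion() and the real ABFS client code may differ:

```java
/** Sketch of building a User-Agent like
    "Azure Blob FS/3.3.0-SNAPSHOT (JavaJRE 1.8.0_191; Linux 4.15.0-43-generic)".
    hadoopVersion would come from org.apache.hadoop.util.VersionInfo. */
public class UserAgentBuilder {
    public static String build(String hadoopVersion) {
        return String.format("Azure Blob FS/%s (JavaJRE %s; %s %s)",
            hadoopVersion,
            System.getProperty("java.version"),
            System.getProperty("os.name"),
            System.getProperty("os.version"));
    }
}
```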

> UserAgent string for ABFS
> -
>
> Key: HADOOP-16041
> URL: https://issues.apache.org/jira/browse/HADOOP-16041
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Shweta
>Assignee: Shweta
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-16041.001.patch, HADOOP-16041.002.patch, 
> HADOOP-16041.003.patch
>
>







[jira] [Comment Edited] (HADOOP-16041) UserAgent string for ABFS

2019-01-29 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755439#comment-16755439
 ] 

Sean Mackrory edited comment on HADOOP-16041 at 1/29/19 10:39 PM:
--

+1. Will commit. (edit: as briefly discussed offline - there is now a space 
after the slash that appears to be unintentional)


was (Author: mackrorysd):
+1. Will commit.

> UserAgent string for ABFS
> -
>
> Key: HADOOP-16041
> URL: https://issues.apache.org/jira/browse/HADOOP-16041
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Shweta
>Assignee: Shweta
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-16041.001.patch, HADOOP-16041.002.patch
>
>







[jira] [Commented] (HADOOP-16041) UserAgent string for ABFS

2019-01-29 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755439#comment-16755439
 ] 

Sean Mackrory commented on HADOOP-16041:


+1. Will commit.

> UserAgent string for ABFS
> -
>
> Key: HADOOP-16041
> URL: https://issues.apache.org/jira/browse/HADOOP-16041
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Shweta
>Assignee: Shweta
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-16041.001.patch, HADOOP-16041.002.patch
>
>







[jira] [Commented] (HADOOP-16041) UserAgent string for ABFS

2019-01-25 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752888#comment-16752888
 ] 

Sean Mackrory commented on HADOOP-16041:


Looks pretty good to me. Test failure is unrelated and probably just flaky 
given it's fuzzy logic to begin with. We probably want it to be "ABFS/" + 
VersionInfo..., though, as [~tmarquardt] suggested. I'm fairly certain that the 
term ABFS uniquely refers to the Hadoop driver, right? So that would be a nice 
clean way to distinguish Hadoop / ABFS from other ADLS Gen2 clients since the 
specific formatting of the Hadoop version can vary. I'd also be fine with 
"Azure Blob FS/" so that we simply replace the 1.0 with the actual Hadoop 
version (unless 1.0 referred to an API version or something?)
 

> UserAgent string for ABFS
> -
>
> Key: HADOOP-16041
> URL: https://issues.apache.org/jira/browse/HADOOP-16041
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Shweta
>Assignee: Shweta
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-16041.001.patch
>
>







[jira] [Commented] (HADOOP-15999) [s3a] Better support for out-of-band operations

2019-01-14 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742638#comment-16742638
 ] 

Sean Mackrory commented on HADOOP-15999:


After a preliminary review I think this looks pretty good. A couple of requests 
though:
* Let's fix the checkstyle issues if you're not already doing that
* Can we apply similar logic when getting directory listings? I think it'll 
be quick enough to just include as part of this JIRA.
I'll do another review when I'm less rushed & tired, but I don't see any other 
issues with the code right now. Good stuff - thank you.

> [s3a] Better support for out-of-band operations
> ---
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15999.001.patch, out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.






[jira] [Commented] (HADOOP-16041) UserAgent string for ABFS

2019-01-11 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740666#comment-16740666
 ] 

Sean Mackrory commented on HADOOP-16041:


In the case of CDH (and I believe Hortonworks / HDInsight) the version number 
includes an identification of the vendor and the vendor release, for example 
"3.0.0-cdh6.0.0". That's how the vendor has been identified in the other 
connectors as far as I'm aware - that's definitely all we did for ADLS Gen1 and 
I know the required information was still collected. In these Hadoop distros 
the user-configurable prefix is also exposed to end users, and has been used 
(not on Azure specifically in the cases I'm thinking of) to identify workloads 
from a particular large user or something like that. It's not as good a fit for 
identifying the vendor. With the Databricks model, where it's more of a 
service, it does make sense to identify themselves using the configured prefix 
(which is actually a suffix right now) unless their version string embeds 
enough information.

+1 to ABFS/, if that allows you to identify Hadoop vendors 
sufficiently well.

> UserAgent string for ABFS
> -
>
> Key: HADOOP-16041
> URL: https://issues.apache.org/jira/browse/HADOOP-16041
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Shweta
>Assignee: Shweta
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Commented] (HADOOP-16041) UserAgent string for ABFS

2019-01-10 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739970#comment-16739970
 ] 

Sean Mackrory commented on HADOOP-16041:


{quote}As/when a successor to htrace is added, making it an option to 
dynamically add trace info would be nice, though as that is per-request it gets 
a bit more complex, especially with a thread pool to service requests.{quote}

Yeah I'd suggest we tackle that as a separate feature when we have a clear idea 
of what that trace info would look like. Do you have like a compressed stack 
trace in mind, or more just a unique ID for each request or something?

> UserAgent string for ABFS
> -
>
> Key: HADOOP-16041
> URL: https://issues.apache.org/jira/browse/HADOOP-16041
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Shweta
>Assignee: Shweta
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Commented] (HADOOP-16041) UserAgent string for ABFS

2019-01-10 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739969#comment-16739969
 ] 

Sean Mackrory commented on HADOOP-16041:


{quote}Partner Service{quote}

To be clear, that's not part of the user agent string. I suspect you're getting 
that from verifyUserAgent, where it sets the configurable prefix to "Partner 
Service", but any Hadoop installation could set that to whatever they wanted. 
The Hadoop distribution in usage can actually be inferred from VersionInfo, 
which is how it's done in other cloud connectors already (and is actually 
what's motivating this change).

We should coordinate with [~tmarquardt] to make sure there isn't any intended 
future use of Azure Blob FS/1.0, if that's supposed to map to some API version 
or something. Since we don't have a distinct SDK, Hadoop *is* the client, so 
the Hadoop version makes sense to me. But we can preserve Azure Blob FS/1.0 if 
that carries any special meaning for Microsoft.

> UserAgent string for ABFS
> -
>
> Key: HADOOP-16041
> URL: https://issues.apache.org/jira/browse/HADOOP-16041
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Shweta
>Assignee: Shweta
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Comment Edited] (HADOOP-16041) UserAgent string for ABFS

2019-01-10 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739969#comment-16739969
 ] 

Sean Mackrory edited comment on HADOOP-16041 at 1/11/19 2:49 AM:
-

{quote}Partner Service{quote}

To be clear, that's not part of the user agent string. I suspect you're getting 
that from verifyUserAgent, where it sets the configurable prefix to "Partner 
Service", but any Hadoop installation could set that to whatever they wanted. 
The Hadoop distribution in usage can actually be inferred from VersionInfo, 
which is how it's done in other cloud connectors already (and is actually 
what's motivating this change).

We should coordinate with [~tmarquardt] and [~DanielZhou] to make sure there 
isn't any intended future use of Azure Blob FS/1.0, if that's supposed to map 
to some API version or something. Since we don't have a distinct SDK, Hadoop 
*is* the client, so the Hadoop version makes sense to me. But we can preserve 
Azure Blob FS/1.0 if that carries any special meaning for Microsoft.


was (Author: mackrorysd):
{quote}Partner Service{quote}

To be clear, that's not part of the user agent string. I suspect you're getting 
that from verifyUserAgent, where it sets the configurable prefix to "Partner 
Service", but any Hadoop installation could set that to whatever they wanted. 
The Hadoop distribution in usage can actually be inferred from VersionInfo, 
which is how it's done in other cloud connectors already (and is actually 
what's motivating this change).

We should coordinate with [~tmarquardt] to make sure there isn't any intended 
future use of Azure Blob FS/1.0, if that's supposed to map to some API version 
or something. Since we don't have a distinct SDK, Hadoop *is* the client, so 
the Hadoop version makes sense to me. But we can preserve Azure Blob FS/1.0 if 
that carries any special meaning for Microsoft.

> UserAgent string for ABFS
> -
>
> Key: HADOOP-16041
> URL: https://issues.apache.org/jira/browse/HADOOP-16041
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Shweta
>Assignee: Shweta
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Updated] (HADOOP-16027) [DOC] Effective use of FS instances during S3A integration tests

2019-01-10 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-16027:
---
   Resolution: Fixed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

> [DOC] Effective use of FS instances during S3A integration tests
> 
>
> Key: HADOOP-16027
> URL: https://issues.apache.org/jira/browse/HADOOP-16027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-16027.001.patch, HADOOP-16027.002.patch
>
>
> While fixing HADOOP-15819 we found that a closed fs got into the static fs 
> cache during testing, which caused other tests to fail when the tests were 
> running sequentially.
> We should document some best practices in the testing section on the s3 docs 
> with the following:
> {panel}
> Tests using FileSystems are fastest if they can recycle the existing FS 
> instance from the same JVM. If you do that, you MUST NOT close or do unique 
> configuration on them. If you want a guarantee of 100% isolation or an 
> instance with unique config, create a new instance
> which you MUST close in the teardown to avoid leakage of resources.
> Do not add FileSystem instances (with e.g 
> org.apache.hadoop.fs.FileSystem#addFileSystemForTesting) to the cache that 
> will be modified or closed during the test runs. This can cause other tests 
> to fail when using the same modified or closed FS instance. For more details 
> see HADOOP-15819.
> {panel}
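The rule quoted in the panel reduces to a simple lifecycle pattern. A sketch with a stand-in class for org.apache.hadoop.fs.FileSystem (illustrative only):

```java
import java.io.Closeable;

/** Sketch of the teardown rule quoted above: a FileSystem created with
    unique configuration must be closed by the test that created it.
    FakeFs is a stand-in for org.apache.hadoop.fs.FileSystem. */
public class FsLifecycle {
    static class FakeFs implements Closeable {
        boolean closed;
        @Override public void close() { closed = true; }
    }

    /** Create a per-test instance and guarantee close in teardown. */
    public static FakeFs runIsolatedTest() {
        FakeFs fs = new FakeFs();
        try {
            // ... test body using fs; must not modify or close the
            // shared cached instance instead ...
        } finally {
            fs.close(); // teardown: avoid leaking resources across tests
        }
        return fs;
    }
}
```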






[jira] [Commented] (HADOOP-16027) [DOC] Effective use of FS instances during S3A integration tests

2019-01-09 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738490#comment-16738490
 ] 

Sean Mackrory commented on HADOOP-16027:


+1, committed. Nit: added the empty lines between what appeared to be 
paragraphs.

> [DOC] Effective use of FS instances during S3A integration tests
> 
>
> Key: HADOOP-16027
> URL: https://issues.apache.org/jira/browse/HADOOP-16027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-16027.001.patch, HADOOP-16027.002.patch
>
>
> While fixing HADOOP-15819 we found that a closed fs got into the static fs 
> cache during testing, which caused other tests to fail when the tests were 
> running sequentially.
> We should document some best practices in the testing section on the s3 docs 
> with the following:
> {panel}
> Tests using FileSystems are fastest if they can recycle the existing FS 
> instance from the same JVM. If you do that, you MUST NOT close or do unique 
> configuration on them. If you want a guarantee of 100% isolation or an 
> instance with unique config, create a new instance
> which you MUST close in the teardown to avoid leakage of resources.
> Do not add FileSystem instances (with e.g 
> org.apache.hadoop.fs.FileSystem#addFileSystemForTesting) to the cache that 
> will be modified or closed during the test runs. This can cause other tests 
> to fail when using the same modified or closed FS instance. For more details 
> see HADOOP-15819.
> {panel}






[jira] [Updated] (HADOOP-15860) ABFS: Throw IllegalArgumentException when Directory/File name ends with a period(.)

2019-01-02 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15860:
---
Component/s: (was: fs/adl)
 fs/azure

> ABFS: Throw IllegalArgumentException when Directory/File name ends with a 
> period(.)
> ---
>
> Key: HADOOP-15860
> URL: https://issues.apache.org/jira/browse/HADOOP-15860
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Sean Mackrory
>Assignee: Shweta
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-15860.001.patch, HADOOP-15860.002.patch, 
> HADOOP-15860.003.patch, HADOOP-15860.004.patch, HADOOP-15860.005.patch, 
> HADOOP-15860.006.patch, trailing-periods.patch
>
>
> If you create a directory with a trailing period (e.g. '/test.') the period 
> is silently dropped, and will be listed as simply '/test'. '/test.test' 
> appears to work just fine.






[jira] [Updated] (HADOOP-15860) ABFS: Throw IllegalArgumentException when Directory/File name ends with a period(.)

2019-01-02 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15860:
---
   Resolution: Fixed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

> ABFS: Throw IllegalArgumentException when Directory/File name ends with a 
> period(.)
> ---
>
> Key: HADOOP-15860
> URL: https://issues.apache.org/jira/browse/HADOOP-15860
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/adl
>Affects Versions: 3.2.0
>Reporter: Sean Mackrory
>Assignee: Shweta
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-15860.001.patch, HADOOP-15860.002.patch, 
> HADOOP-15860.003.patch, HADOOP-15860.004.patch, HADOOP-15860.005.patch, 
> HADOOP-15860.006.patch, trailing-periods.patch
>
>
> If you create a directory with a trailing period (e.g. '/test.') the period 
> is silently dropped, and will be listed as simply '/test'. '/test.test' 
> appears to work just fine.






[jira] [Comment Edited] (HADOOP-15860) ABFS: Throw IllegalArgumentException when Directory/File name ends with a period(.)

2019-01-02 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732191#comment-16732191
 ] 

Sean Mackrory edited comment on HADOOP-15860 at 1/2/19 4:19 PM:


+1 (although nit: I added whitespace before {'s where it was missing).

I continue to see the "java.lang.AssertionError: Expecting 
getIsNamespaceEnabled() return false" failure, but I believe that is just 
because of features still rolling out to the service. I'm gonna blame the 
timeouts I saw on the car dealership I'm working from this morning - I've 
tested the previous patches enough to be very confident in this one. Thanks for 
the patch, Shweta.


was (Author: mackrorysd):
+1 (although nit: I added whitespace before {'s where it was missing).

I continue to see the "java.lang.AssertionError: Expecting 
getIsNamespaceEnabled() return false" failure, but I believe that is just 
because of features still rolling out to the service.

> ABFS: Throw IllegalArgumentException when Directory/File name ends with a 
> period(.)
> ---
>
> Key: HADOOP-15860
> URL: https://issues.apache.org/jira/browse/HADOOP-15860
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/adl
>Affects Versions: 3.2.0
>Reporter: Sean Mackrory
>Assignee: Shweta
>Priority: Major
> Attachments: HADOOP-15860.001.patch, HADOOP-15860.002.patch, 
> HADOOP-15860.003.patch, HADOOP-15860.004.patch, HADOOP-15860.005.patch, 
> HADOOP-15860.006.patch, trailing-periods.patch
>
>
> If you create a directory with a trailing period (e.g. '/test.') the period 
> is silently dropped, and will be listed as simply '/test'. '/test.test' 
> appears to work just fine.






[jira] [Commented] (HADOOP-15860) ABFS: Throw IllegalArgumentException when Directory/File name ends with a period(.)

2019-01-02 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732191#comment-16732191
 ] 

Sean Mackrory commented on HADOOP-15860:


+1 (although nit: I added whitespace before {'s where it was missing).

I continue to see the "java.lang.AssertionError: Expecting 
getIsNamespaceEnabled() return false" failure, but I believe that is just 
because of features still rolling out to the service.

> ABFS: Throw IllegalArgumentException when Directory/File name ends with a 
> period(.)
> ---
>
> Key: HADOOP-15860
> URL: https://issues.apache.org/jira/browse/HADOOP-15860
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/adl
>Affects Versions: 3.2.0
>Reporter: Sean Mackrory
>Assignee: Shweta
>Priority: Major
> Attachments: HADOOP-15860.001.patch, HADOOP-15860.002.patch, 
> HADOOP-15860.003.patch, HADOOP-15860.004.patch, HADOOP-15860.005.patch, 
> HADOOP-15860.006.patch, trailing-periods.patch
>
>
> If you create a directory with a trailing period (e.g. '/test.') the period 
> is silently dropped, and will be listed as simply '/test'. '/test.test' 
> appears to work just fine.






[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2019-01-02 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732172#comment-16732172
 ] 

Sean Mackrory commented on HADOOP-15819:


I wouldn't have thought so, as I think you have to go out of your way to do 
something you don't usually do to hit this, but then I don't fully understand 
why this was added in the first place. I think the patch originally came from 
you - do you happen to remember why we started adding the FS instance to the 
cache here? And if so, is that something that might occur to a future test 
writer as well?

> S3A integration test failures: FileSystem is closed! - without parallel test 
> run
> 
>
> Key: HADOOP-15819
> URL: https://issues.apache.org/jira/browse/HADOOP-15819
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.1.1
>Reporter: Gabor Bota
>Assignee: Adam Antal
>Priority: Critical
> Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, 
> HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, 
> S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip
>
>
> Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against 
> Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these 
> failures:
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob)
>   Time elapsed: 0.027 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob
> [ERROR] 
> testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.021 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.022 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest)
>   Time elapsed: 0.023 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob
> [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob)  
> Time elapsed: 0.039 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory
> [ERROR] 
> testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory)  
> Time elapsed: 0.014 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> {noformat}
> The big issue is that the tests are running in a serial manner - no test is 
> running on top of the other - so we should not see that the tests are failing 
> like this. The issue could be in how we handle 
> org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same 
> S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B 
> test will get the same FileSystem object from the cache and try to use it, 
> but it is closed.
> We see this a lot in our downstream testing too. It's not possible to tell 
> that the failed regression test result is an implementation issue in the 
> runtime code or a test implementation problem. 
> I've checked when and what closes the S3AFileSystem with a slightly modified 
> version of S3AFileSystem which logs the closers of the fs if an error should 
> occur. I'll attach this modified java file for reference. See the next 
> example of the result when it's running:
> {noformat}
> 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): 
> java.lang.RuntimeException: Using closed FS!.
>   at 
> org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73)
>   at 
> 

[jira] [Commented] (HADOOP-15860) ABFS: Throw IllegalArgumentException when Directory/File name ends with a period(.)

2018-12-27 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730004#comment-16730004
 ] 

Sean Mackrory commented on HADOOP-15860:


So the issue isn't that path is null, the issue is that path.length == 0. path 
references a valid String object, but it's trying to do .charAt(-1), which is 
invalid. I suspect that what's happening is that you're hitting root, at which 
point it's an empty string. Maybe you can just replace the original null check 
with "while (!path.isRoot())". path.getParent() is null at that point, but we 
won't even get that far if we have bounds checking problems. We also need to 
make sure you can run the ABFS tests.
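
The suggested fix can be sketched in plain Java. This is a simplified, string-based illustration (Hadoop's Path.isRoot()/getParent() are replaced with string equivalents, and the class/method names are hypothetical, not the actual ABFS code): the parent walk terminates at the root rather than on a null check, so charAt() is never called on the root's empty name.

```java
// Hypothetical sketch of the trailing-period validation discussed above.
// Uses plain strings instead of Hadoop's Path API; names are illustrative.
public class TrailingPeriodCheck {

    /** Throws IllegalArgumentException if any path component ends with '.'. */
    static void trailingPeriodCheck(String path) {
        String p = path;
        // Stop at the root ("/") instead of checking getParent() for null:
        // at the root the component name is the empty string, and
        // charAt(-1) would throw StringIndexOutOfBoundsException.
        while (!p.equals("/") && !p.isEmpty()) {
            int slash = p.lastIndexOf('/');
            String name = p.substring(slash + 1);
            if (!name.isEmpty() && name.charAt(name.length() - 1) == '.') {
                throw new IllegalArgumentException(
                    "ABFS does not allow files or directories to end with a dot: " + path);
            }
            p = (slash <= 0) ? "/" : p.substring(0, slash);  // walk up to the parent
        }
    }

    public static void main(String[] args) {
        trailingPeriodCheck("/test.test");   // valid: the dot is not trailing
        try {
            trailingPeriodCheck("/test.");
            throw new AssertionError("expected IllegalArgumentException");
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected /test. as expected");
        }
    }
}
```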

> ABFS: Throw IllegalArgumentException when Directory/File name ends with a 
> period(.)
> ---
>
> Key: HADOOP-15860
> URL: https://issues.apache.org/jira/browse/HADOOP-15860
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/adl
>Affects Versions: 3.2.0
>Reporter: Sean Mackrory
>Assignee: Shweta
>Priority: Major
> Attachments: HADOOP-15860.001.patch, HADOOP-15860.002.patch, 
> HADOOP-15860.003.patch, HADOOP-15860.004.patch, HADOOP-15860.005.patch, 
> trailing-periods.patch
>
>
> If you create a directory with a trailing period (e.g. '/test.') the period 
> is silently dropped, and will be listed as simply '/test'. '/test.test' 
> appears to work just fine.






[jira] [Resolved] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2018-12-27 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory resolved HADOOP-15819.

Resolution: Fixed

> S3A integration test failures: FileSystem is closed! - without parallel test 
> run
> 
>
> Key: HADOOP-15819
> URL: https://issues.apache.org/jira/browse/HADOOP-15819
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.1.1
>Reporter: Gabor Bota
>Assignee: Adam Antal
>Priority: Critical
> Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, 
> HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, 
> S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip
>
>
> Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against 
> Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these 
> failures:
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob)
>   Time elapsed: 0.027 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob
> [ERROR] 
> testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.021 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.022 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest)
>   Time elapsed: 0.023 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob
> [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob)  
> Time elapsed: 0.039 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory
> [ERROR] 
> testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory)  
> Time elapsed: 0.014 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> {noformat}
> The big issue is that the tests are running in a serial manner - no test is 
> running on top of the other - so we should not see that the tests are failing 
> like this. The issue could be in how we handle 
> org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same 
> S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B 
> test will get the same FileSystem object from the cache and try to use it, 
> but it is closed.
> We see this a lot in our downstream testing too. It's not possible to tell 
> that the failed regression test result is an implementation issue in the 
> runtime code or a test implementation problem. 
> I've checked when and what closes the S3AFileSystem with a slightly modified 
> version of S3AFileSystem which logs the closers of the fs if an error should 
> occur. I'll attach this modified java file for reference. See the next 
> example of the result when it's running:
> {noformat}
> 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): 
> java.lang.RuntimeException: Using closed FS!.
>   at 
> org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73)
>   at 
> org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474)
>   at 
> org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338)
>   at 
> org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193)
>   at 
> org.apache.hadoop.fs.s3a.ITestS3AClosedFS.setup(ITestS3AClosedFS.java:40)
>   at 

[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2018-12-27 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729682#comment-16729682
 ] 

Sean Mackrory commented on HADOOP-15819:


Committed. Thank you for the excellent work here [~adam.antal].

> S3A integration test failures: FileSystem is closed! - without parallel test 
> run
> 
>
> Key: HADOOP-15819
> URL: https://issues.apache.org/jira/browse/HADOOP-15819
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.1.1
>Reporter: Gabor Bota
>Assignee: Adam Antal
>Priority: Critical
> Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, 
> HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, 
> S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip
>
>
> Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against 
> Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these 
> failures:
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob)
>   Time elapsed: 0.027 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob
> [ERROR] 
> testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.021 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.022 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest)
>   Time elapsed: 0.023 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob
> [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob)  
> Time elapsed: 0.039 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory
> [ERROR] 
> testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory)  
> Time elapsed: 0.014 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> {noformat}
> The big issue is that the tests are running in a serial manner - no test is 
> running on top of the other - so we should not see that the tests are failing 
> like this. The issue could be in how we handle 
> org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same 
> S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B 
> test will get the same FileSystem object from the cache and try to use it, 
> but it is closed.
> We see this a lot in our downstream testing too. It's not possible to tell 
> that the failed regression test result is an implementation issue in the 
> runtime code or a test implementation problem. 
> I've checked when and what closes the S3AFileSystem with a slightly modified 
> version of S3AFileSystem which logs the closers of the fs if an error should 
> occur. I'll attach this modified java file for reference. See the next 
> example of the result when it's running:
> {noformat}
> 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): 
> java.lang.RuntimeException: Using closed FS!.
>   at 
> org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73)
>   at 
> org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474)
>   at 
> org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338)
>   at 
> org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193)
>   at 
> 

[jira] [Commented] (HADOOP-15860) ABFS: Throw IllegalArgumentException when Directory/File name ends with a period(.)

2018-12-27 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729681#comment-16729681
 ] 

Sean Mackrory commented on HADOOP-15860:


Cancel that - many tests fail. Need to add a check that the string isn't empty.

{code}
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.charAt(String.java:658)
at 
org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.trailingPeriodCheck(AzureBlobFileSystem.java:395)
{code}
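
The guard called for above is a one-liner: test for the empty string (the root's component name) before indexing, since "".charAt(-1) is exactly what raised the StringIndexOutOfBoundsException. A minimal sketch, with illustrative names rather than the actual AzureBlobFileSystem code:

```java
// Minimal sketch of the empty-string guard; names are hypothetical.
public class EmptyNameGuard {

    /** Returns true if the name ends with '.'; safe for the empty string. */
    static boolean endsWithPeriod(String name) {
        // Checking isEmpty() first prevents charAt(-1), which would throw
        // the StringIndexOutOfBoundsException reported in the stack trace.
        return !name.isEmpty() && name.charAt(name.length() - 1) == '.';
    }

    public static void main(String[] args) {
        System.out.println(endsWithPeriod(""));       // false, and no exception
        System.out.println(endsWithPeriod("test."));  // true
        System.out.println(endsWithPeriod("test"));   // false
    }
}
```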

> ABFS: Throw IllegalArgumentException when Directory/File name ends with a 
> period(.)
> ---
>
> Key: HADOOP-15860
> URL: https://issues.apache.org/jira/browse/HADOOP-15860
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/adl
>Affects Versions: 3.2.0
>Reporter: Sean Mackrory
>Assignee: Shweta
>Priority: Major
> Attachments: HADOOP-15860.001.patch, HADOOP-15860.002.patch, 
> HADOOP-15860.003.patch, HADOOP-15860.004.patch, trailing-periods.patch
>
>
> If you create a directory with a trailing period (e.g. '/test.') the period 
> is silently dropped, and will be listed as simply '/test'. '/test.test' 
> appears to work just fine.






[jira] [Commented] (HADOOP-15860) ABFS: Throw IllegalArgumentException when Directory/File name ends with a period(.)

2018-12-26 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729291#comment-16729291
 ] 

Sean Mackrory commented on HADOOP-15860:


+1, will commit.

> ABFS: Throw IllegalArgumentException when Directory/File name ends with a 
> period(.)
> ---
>
> Key: HADOOP-15860
> URL: https://issues.apache.org/jira/browse/HADOOP-15860
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/adl
>Affects Versions: 3.2.0
>Reporter: Sean Mackrory
>Assignee: Shweta
>Priority: Major
> Attachments: HADOOP-15860.001.patch, HADOOP-15860.002.patch, 
> HADOOP-15860.003.patch, HADOOP-15860.004.patch, trailing-periods.patch
>
>
> If you create a directory with a trailing period (e.g. '/test.') the period 
> is silently dropped, and will be listed as simply '/test'. '/test.test' 
> appears to work just fine.






[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2018-12-19 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725544#comment-16725544
 ] 

Sean Mackrory commented on HADOOP-15819:


Ah - FS.key not being set is the piece I was missing. I suppose the truly 
correct solution here would be to make sure FS.key is set properly, but I don't 
know that the benefit to that is worth a whole lot of your time. I'll hold off 
for a day in case [~ste...@apache.org] who wrote this in the first place 
disagrees. Otherwise +1 and I'll commit soon.

As a side note, I noticed StagingTestBase has a call to the same function, but 
obviously isn't causing the same issue.

> S3A integration test failures: FileSystem is closed! - without parallel test 
> run
> 
>
> Key: HADOOP-15819
> URL: https://issues.apache.org/jira/browse/HADOOP-15819
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.1.1
>Reporter: Gabor Bota
>Assignee: Adam Antal
>Priority: Critical
> Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, 
> HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, 
> S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip
>
>
> Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against 
> Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these 
> failures:
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob)
>   Time elapsed: 0.027 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob
> [ERROR] 
> testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.021 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.022 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest)
>   Time elapsed: 0.023 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob
> [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob)  
> Time elapsed: 0.039 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory
> [ERROR] 
> testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory)  
> Time elapsed: 0.014 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> {noformat}
> The big issue is that the tests are running in a serial manner - no test is 
> running on top of the other - so we should not see that the tests are failing 
> like this. The issue could be in how we handle 
> org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same 
> S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B 
> test will get the same FileSystem object from the cache and try to use it, 
> but it is closed.
> We see this a lot in our downstream testing too. It's not possible to tell 
> that the failed regression test result is an implementation issue in the 
> runtime code or a test implementation problem. 
> I've checked when and what closes the S3AFileSystem with a slightly modified 
> version of S3AFileSystem which logs the closers of the fs if an error should 
> occur. I'll attach this modified java file for reference. See the next 
> example of the result when it's running:
> {noformat}
> 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): 
> java.lang.RuntimeException: Using closed FS!.
>   at 
> 

[jira] [Comment Edited] (HADOOP-15860) ABFS: Throw IllegalArgumentException when Directory/File name ends with a period(.)

2018-12-19 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725538#comment-16725538
 ] 

Sean Mackrory edited comment on HADOOP-15860 at 12/20/18 2:59 AM:
--

So there's a checkstyle issue to fix (space before the '(') and a number of 
other minor whitespace issues like inconsistent spacing before '{'s. Can you 
also revisit the exception messages? Currently they're not entirely correct. 
For example, I'd make the following changes:
* "Cannot create a Path with names that end with a dot" -> "ABFS does not allow 
files or directories to end with a dot"
* "Path does not have a trailing period." -> "Attempt to create file that ended 
with a dot should throw IllegalArgumentException"

Also, it occurred to me that mkdirs is recursive. Someone could pass 
"/dir1/dir2./dir3/" when none of them exist and we wouldn't stop them. In 
mkdirs we should check each .parent() up to root.
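A minimal sketch of that check (hypothetical helper, not the actual ABFS code) could walk every component of the requested path and reject any that ends with a dot, so a recursive mkdirs of "/dir1/dir2./dir3/" fails even when none of the directories exist yet:

```java
/**
 * Sketch of the proposed validation: reject any path component that ends
 * with a period, regardless of whether the component already exists.
 * The helper name and the string-based path handling are assumptions for
 * illustration; the real code would work on org.apache.hadoop.fs.Path.
 */
public class TrailingPeriodCheck {
  static void validatePathComponents(String path) {
    for (String component : path.split("/")) {
      if (!component.isEmpty() && component.endsWith(".")) {
        throw new IllegalArgumentException(
            "ABFS does not allow files or directories to end with a dot: "
            + component);
      }
    }
  }

  public static void main(String[] args) {
    validatePathComponents("/dir1/dir2/dir3");   // passes silently
    try {
      validatePathComponents("/dir1/dir2./dir3/");
      throw new AssertionError("expected IllegalArgumentException");
    } catch (IllegalArgumentException expected) {
      System.out.println("rejected: " + expected.getMessage());
    }
  }
}
```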

When playing around with these tests to see what happened in various cases, I 
got a couple of exceptions for files already existing. There are two reasons for 
that:
* It might be that one test run was hitting data from the previous run. Let's 
replace our use of "new Path()" with "path()". It's a helper function in the 
parent classes that gives you a path inside a randomized subdirectory to help 
prevent this. All tests should actually do this, but let's just fix the new 
ones for now.
* Let's add a comment about the actual problem here. (a) Microsoft says you 
shouldn't do this, and (b) if you do it anyway, /this./path. and /this/path are 
treated as identical in some cases. This is potentially problematic if anyone 
messes with the tests, since the two paths in each of your tests would be equal. 
So let's have a clear comment about this and I'll elaborate in the commit 
message too.
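The path() suggestion in the first bullet can be sketched roughly like this (a hypothetical stand-in, not the real contract-test base class, which returns Hadoop Path objects):

```java
import java.util.UUID;

/**
 * Stand-in for the path() helper provided by the test base classes: each
 * instance (one per test run) roots its paths under a randomized
 * subdirectory, so data left behind by a previous run cannot collide with
 * the current one the way bare "new Path(...)" literals can.
 */
public class TestPathHelper {
  private final String runRoot = "/test/" + UUID.randomUUID();

  /** Return a per-run unique path for the given relative name. */
  public String path(String name) {
    return runRoot + "/" + name;
  }

  public static void main(String[] args) {
    TestPathHelper runA = new TestPathHelper();
    TestPathHelper runB = new TestPathHelper();
    // same logical name, different physical locations across runs
    System.out.println(runA.path("testFile"));
    System.out.println(runB.path("testFile"));
  }
}
```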


was (Author: mackrorysd):
So there's a checkstyle issue to fix (space before the '(') and a number of 
other minor whitespace issues like inconsistent spacing before '{'s. Can you 
also revisit the exception messages? Currently they're not entirely correct. 
For example, I'd make the following changes:
* "Cannot create a Path with names that end with a dot" -> "ABFS does not allow 
files or directories to end with a dot"
* "Path does not have a trailing period." -> "Attempt to create file that ended 
with a dot should throw IllegalArgumentException"
Also, it occurred to me that mkdirs is recursive. Someone could pass 
"/dir1/dir2./dir3/" when none of them exist and we wouldn't stop them.

When playing around with these tests to see what happened in various cases, I 
got a couple of exceptions for files already existing. There are two reasons for 
that:
* It might be that one test run was hitting data from the previous run. Let's 
replace our use of "new Path()" with "path()". It's a helper function in the 
parent classes that gives you a path inside a randomized subdirectory to help 
prevent this. All tests should actually do this, but let's just fix the new 
ones for now.
* Let's add a comment about the actual problem here. (a) Microsoft says you 
shouldn't do this, and (b) if you do it anyway, /this./path. and /this/path are 
treated as identical in some cases. This is potentially problematic if anyone 
messes with the tests, since the two paths in each of your tests would be equal. 
So let's have a clear comment about this and I'll elaborate in the commit 
message too.

> ABFS: Throw IllegalArgumentException when Directory/File name ends with a 
> period(.)
> ---
>
> Key: HADOOP-15860
> URL: https://issues.apache.org/jira/browse/HADOOP-15860
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/adl
>Affects Versions: 3.2.0
>Reporter: Sean Mackrory
>Assignee: Shweta
>Priority: Major
> Attachments: HADOOP-15860.001.patch, HADOOP-15860.002.patch, 
> trailing-periods.patch
>
>
> If you create a directory with a trailing period (e.g. '/test.') the period 
> is silently dropped, and will be listed as simply '/test'. '/test.test' 
> appears to work just fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15860) ABFS: Throw IllegalArgumentException when Directory/File name ends with a period(.)

2018-12-19 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725538#comment-16725538
 ] 

Sean Mackrory commented on HADOOP-15860:


So there's a checkstyle issue to fix (space before the '(') and a number of 
other minor whitespace issues like inconsistent spacing before '{'s. Can you 
also revisit the exception messages? Currently they're not entirely correct. 
For example, I'd make the following changes:
* "Cannot create a Path with names that end with a dot" -> "ABFS does not allow 
files or directories to end with a dot"
* "Path does not have a trailing period." -> "Attempt to create file that ended 
with a dot should throw IllegalArgumentException"
Also, it occurred to me that mkdirs is recursive. Someone could pass 
"/dir1/dir2./dir3/" when none of them exist and we wouldn't stop them.

When playing around with these tests to see what happened in various cases, I 
got a couple of exceptions for files already existing. There are two reasons for 
that:
* It might be that one test run was hitting data from the previous run. Let's 
replace our use of "new Path()" with "path()". It's a helper function in the 
parent classes that gives you a path inside a randomized subdirectory to help 
prevent this. All tests should actually do this, but let's just fix the new 
ones for now.
* Let's add a comment about the actual problem here. (a) Microsoft says you 
shouldn't do this, and (b) if you do it anyway, /this./path. and /this/path are 
treated as identical in some cases. This is potentially problematic if anyone 
messes with the tests, since the two paths in each of your tests would be equal. 
So let's have a clear comment about this and I'll elaborate in the commit 
message too.

> ABFS: Throw IllegalArgumentException when Directory/File name ends with a 
> period(.)
> ---
>
> Key: HADOOP-15860
> URL: https://issues.apache.org/jira/browse/HADOOP-15860
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/adl
>Affects Versions: 3.2.0
>Reporter: Sean Mackrory
>Assignee: Shweta
>Priority: Major
> Attachments: HADOOP-15860.001.patch, HADOOP-15860.002.patch, 
> trailing-periods.patch
>
>
> If you create a directory with a trailing period (e.g. '/test.') the period 
> is silently dropped, and will be listed as simply '/test'. '/test.test' 
> appears to work just fine.






[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2018-12-18 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724301#comment-16724301
 ] 

Sean Mackrory commented on HADOOP-15819:


The only failure I see with this patch is the bouncycastle one. I don't have an 
objection to removing this - it's only for a slight perf improvement on tests. 
I'm curious to know why this is resulting in a distinct cache entry. Is the URL 
slightly different at this point than it is in FS.newInstance() or something?

> S3A integration test failures: FileSystem is closed! - without parallel test 
> run
> 
>
> Key: HADOOP-15819
> URL: https://issues.apache.org/jira/browse/HADOOP-15819
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.1.1
>Reporter: Gabor Bota
>Assignee: Adam Antal
>Priority: Critical
> Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, 
> HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, 
> S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip
>
>
> Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against 
> Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these 
> failures:
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob)
>   Time elapsed: 0.027 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob
> [ERROR] 
> testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.021 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.022 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest)
>   Time elapsed: 0.023 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob
> [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob)  
> Time elapsed: 0.039 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory
> [ERROR] 
> testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory)  
> Time elapsed: 0.014 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> {noformat}
> The big issue is that the tests run serially - no test runs on top of 
> another - so we should not see the tests failing like this. The issue could 
> be in how we handle org.apache.hadoop.fs.FileSystem#CACHE - the tests use 
> the same S3AFileSystem, so if test A uses a FileSystem and closes it in 
> teardown, then test B will get the same FileSystem object from the cache and 
> try to use it, but it is closed.
> We see this a lot in our downstream testing too. It's not possible to tell 
> whether a failed regression test indicates an implementation issue in the 
> runtime code or a test implementation problem. 
> I've checked when and what closes the S3AFileSystem with a slightly modified 
> version of S3AFileSystem which logs the closers of the fs if an error 
> occurs. I'll attach this modified java file for reference. See the following 
> example of the output while it's running:
> {noformat}
> 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): 
> java.lang.RuntimeException: Using closed FS!.
>   at 
> org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73)
>   at 
> 

[jira] [Comment Edited] (HADOOP-15860) ABFS: Throw IllegalArgumentException when Directory/File name ends with a period(.)

2018-12-14 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721703#comment-16721703
 ] 

Sean Mackrory edited comment on HADOOP-15860 at 12/14/18 7:30 PM:
--

Also, instead of just catching the exceptions, we should ensure the test fails 
if no exception is thrown. e.g. the catch block changes a flag from false to 
true, and then we assert that it's true.
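The flag pattern described above looks like this, sketched against a hypothetical operation (the real tests would call the ABFS create/mkdirs APIs instead):

```java
/**
 * Minimal illustration of the suggested pattern: rather than silently
 * catching the expected exception, record that it occurred and assert it,
 * so the test fails when the exception is NOT thrown.
 */
public class ExpectExceptionPattern {
  // hypothetical operation standing in for the ABFS create call
  static void createFile(String name) {
    if (name.endsWith(".")) {
      throw new IllegalArgumentException("name ends with a dot: " + name);
    }
  }

  public static void main(String[] args) {
    boolean caught = false;
    try {
      createFile("data.");          // should throw
    } catch (IllegalArgumentException e) {
      caught = true;                // flip the flag instead of swallowing
    }
    if (!caught) {
      throw new AssertionError("expected IllegalArgumentException");
    }
    System.out.println("test passed: exception was thrown");
  }
}
```

(Using JUnit's built-in expected-exception support would express the same intent more tersely.)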

edit:

Just to add to my previous comments, 
ITestAbfsFileSystemContractMkdir>AbstractContractMkdirTest.testMkdirSlashHandling
 fails with this patch. Make sure you run all the ABFS tests to check for 
regressions.

I did some playing with these same APIs, trying to make a file that ended with a 
slash. It does actually fail explicitly on the server side (instead of silently 
dropping the character, as happens with trailing periods), and in other cases it 
is either blocked by other code or is actually valid (like when specifying a 
destination directory). So I think we can simply ignore trailing slashes.


was (Author: mackrorysd):
Also, instead of just catching the exceptions, we should ensure the test fails 
if no exception is thrown. e.g. the catch block changes a flag from false to 
true, and then we assert that it's true.

> ABFS: Throw IllegalArgumentException when Directory/File name ends with a 
> period(.)
> ---
>
> Key: HADOOP-15860
> URL: https://issues.apache.org/jira/browse/HADOOP-15860
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/adl
>Affects Versions: 3.2.0
>Reporter: Sean Mackrory
>Assignee: Shweta
>Priority: Major
> Attachments: HADOOP-15860.001.patch, trailing-periods.patch
>
>
> If you create a directory with a trailing period (e.g. '/test.') the period 
> is silently dropped, and will be listed as simply '/test'. '/test.test' 
> appears to work just fine.






[jira] [Commented] (HADOOP-15860) ABFS: Throw IllegalArgumentException when Directory/File name ends with a period(.)

2018-12-14 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721703#comment-16721703
 ] 

Sean Mackrory commented on HADOOP-15860:


Also, instead of just catching the exceptions, we should ensure the test fails 
if no exception is thrown. e.g. the catch block changes a flag from false to 
true, and then we assert that it's true.

> ABFS: Throw IllegalArgumentException when Directory/File name ends with a 
> period(.)
> ---
>
> Key: HADOOP-15860
> URL: https://issues.apache.org/jira/browse/HADOOP-15860
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/adl
>Affects Versions: 3.2.0
>Reporter: Sean Mackrory
>Assignee: Shweta
>Priority: Major
> Attachments: HADOOP-15860.001.patch, trailing-periods.patch
>
>
> If you create a directory with a trailing period (e.g. '/test.') the period 
> is silently dropped, and will be listed as simply '/test'. '/test.test' 
> appears to work just fine.






[jira] [Commented] (HADOOP-15860) ABFS: Throw IllegalArgumentException when Directory/File name ends with a period(.)

2018-12-13 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720899#comment-16720899
 ] 

Sean Mackrory commented on HADOOP-15860:


Thanks for the patch [~shwetayakkali]. For the most part it looks good, but I 
think we need to consider the trailing slash a little more carefully. A destination 
path absolutely can end in a trailing slash (e.g. mv /some-file 
/some-existing-directory/). On the other hand, I'm actually not sure how or if 
one can create a file that ends with a trailing slash. I suspect that the Path 
class already handles the trailing slash as much as we need it to, and we 
should just check for trailing periods. But we should dig and confirm that the 
edge cases here are working correctly.

> ABFS: Throw IllegalArgumentException when Directory/File name ends with a 
> period(.)
> ---
>
> Key: HADOOP-15860
> URL: https://issues.apache.org/jira/browse/HADOOP-15860
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/adl
>Affects Versions: 3.2.0
>Reporter: Sean Mackrory
>Assignee: Shweta
>Priority: Major
> Attachments: HADOOP-15860.001.patch, trailing-periods.patch
>
>
> If you create a directory with a trailing period (e.g. '/test.') the period 
> is silently dropped, and will be listed as simply '/test'. '/test.test' 
> appears to work just fine.






[jira] [Comment Edited] (HADOOP-15999) [s3a] Better support for out-of-band operations

2018-12-12 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719468#comment-16719468
 ] 

Sean Mackrory edited comment on HADOOP-15999 at 12/12/18 9:35 PM:
--

{quote}We might just want to make that configurable (separate config knob 
probably). If we are in "check both MS and S3" mode, we probably want a 
configurable or pluggable conflict policy.{quote}

Yeah - I also considered addressing the out-of-band deletes problem with a 
config (or 2) that governs whether we create and / or honor tombstones. But 
that's adding exposed complexity and isn't very elegant. If we can relatively 
easily just start comparing modification times, then we can fix all these use 
cases and offer 2 basic modes:

- S3Guard with authoritative mode, in which the MetadataStore is the source of 
truth and we can assume All The Things.
- S3Guard without authoritative mode, in which S3 is the source of truth. We 
will always be at least as up to date as S3 appears, and will fix list 
consistency as long as S3 doesn't give us evidence to the contrary (i.e. older 
modification times or the lack of an update entirely).

I feel very uncomfortable with the idea of some middle ground where S3Guard 
can't be the source of truth but we still treat it as one in some cases. It 
either has all the context or it doesn't, and if it doesn't, we're trading away 
correctness for some performance, which I think is the wrong trade-off.
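As a rough sketch of the non-authoritative rule (hypothetical types; the real code would work with S3AFileStatus and PathMetadata), the reconciliation step could simply prefer whichever source reports the higher modification time:

```java
/**
 * Sketch of "S3 is the source of truth without authoritative mode": given
 * a record from the MetadataStore and one from S3, trust whichever carries
 * the higher modification time. Entry is a made-up type for illustration.
 */
public class ModTimeReconciler {
  static final class Entry {
    final String source;
    final long modTime;
    final boolean deleted;
    Entry(String source, long modTime, boolean deleted) {
      this.source = source;
      this.modTime = modTime;
      this.deleted = deleted;
    }
  }

  /** null means "not found" in that source; otherwise return the fresher view. */
  static Entry reconcile(Entry fromMetadataStore, Entry fromS3) {
    if (fromMetadataStore == null) {
      return fromS3;
    }
    if (fromS3 == null) {
      return fromMetadataStore;
    }
    // newer evidence wins; ties go to S3, the source of truth here
    return fromS3.modTime >= fromMetadataStore.modTime
        ? fromS3 : fromMetadataStore;
  }

  public static void main(String[] args) {
    // tombstone in the store, but the file was re-created out of band later
    Entry tombstone = new Entry("metadatastore", 1000L, true);
    Entry recreated = new Entry("s3", 2000L, false);
    Entry winner = reconcile(tombstone, recreated);
    System.out.println(winner.source + " wins, deleted=" + winner.deleted);
  }
}
```

This is exactly the out-of-band delete-and-replace case from the issue description: the newer S3 entry overrides the store's tombstone.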


was (Author: mackrorysd):
{quote}We might just want to make that configurable (separate config knob 
probably). If we are in "check both MS and S3" mode, we probably want a 
configurable or pluggable conflict policy.{quote}

Yeah - I also considered addressing the out-of-band deletes problem with a 
config (or 2) that governs whether we create and / or honor tombstones. But 
that's adding exposed complexity and isn't very elegant. If we can relatively 
easily just start comparing modification times, then we can offer two basic modes:

- S3Guard with authoritative mode, in which the MetadataStore is the source of 
truth and we can assume All The Things.
- S3Guard without authoritative mode, in which S3 is the source of truth. We 
will always be at least as up to date as S3 appears, and will fix list 
consistency as long as S3 doesn't give us evidence to the contrary (i.e. older 
modification times or the lack of an update entirely).

> [s3a] Better support for out-of-band operations
> ---
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Gabor Bota
>Priority: Major
> Attachments: out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.






[jira] [Commented] (HADOOP-15999) [s3a] Better support for out-of-band operations

2018-12-12 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719468#comment-16719468
 ] 

Sean Mackrory commented on HADOOP-15999:


{quote}We might just want to make that configurable (separate config knob 
probably). If we are in "check both MS and S3" mode, we probably want a 
configurable or pluggable conflict policy.{quote}

Yeah - I also considered addressing the out-of-band deletes problem with a 
config (or 2) that governs whether we create and / or honor tombstones. But 
that's adding exposed complexity and isn't very elegant. If we can relatively 
easily just start comparing modification times, then we can offer two basic modes:

- S3Guard with authoritative mode, in which the MetadataStore is the source of 
truth and we can assume All The Things.
- S3Guard without authoritative mode, in which S3 is the source of truth. We 
will always be at least as up to date as S3 appears, and will fix list 
consistency as long as S3 doesn't give us evidence to the contrary (i.e. older 
modification times or the lack of an update entirely).

> [s3a] Better support for out-of-band operations
> ---
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Sean Mackrory
>Assignee: Gabor Bota
>Priority: Major
> Attachments: out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.






[jira] [Updated] (HADOOP-15988) Should be able to set empty directory flag to TRUE in DynamoDBMetadataStore#innerGet when using authoritative directory listings

2018-12-12 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15988:
---
   Resolution: Fixed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

Looks good to me. I did remove the iff -> if change you made between .001. and 
.002., as I believe that was intentional (if-and-only-if).

> Should be able to set empty directory flag to TRUE in 
> DynamoDBMetadataStore#innerGet when using authoritative directory listings
> 
>
> Key: HADOOP-15988
> URL: https://issues.apache.org/jira/browse/HADOOP-15988
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-15988.001.patch, HADOOP-15988.002.patch
>
>
> We have the following comment and implementation in DynamoDBMetadataStore:
> {noformat}
> // When this class has support for authoritative
> // (fully-cached) directory listings, we may also be able to answer
> // TRUE here.  Until then, we don't know if we have full listing or
> // not, thus the UNKNOWN here:
> meta.setIsEmptyDirectory(
> hasChildren ? Tristate.FALSE : Tristate.UNKNOWN);
> {noformat}
> We have authoritative listings now in dynamo since HADOOP-15621, so we should 
> resolve this comment, implement the solution and test it. 






[jira] [Commented] (HADOOP-15428) s3guard bucket-info will create s3guard table if FS is set to do this automatically

2018-12-12 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719117#comment-16719117
 ] 

Sean Mackrory commented on HADOOP-15428:


Yeah that's more direct. Tweaked it a bit and updated. I'm happy with it, so 
I'll resolve. Feel free to discuss if there's more feedback...

> s3guard bucket-info will create s3guard table if FS is set to do this 
> automatically
> ---
>
> Key: HADOOP-15428
> URL: https://issues.apache.org/jira/browse/HADOOP-15428
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-15428.001.patch
>
>
> If you call hadoop s3guard bucket-info on a bucket where the fs is set to 
> create a s3guard table on demand, then the DDB table is automatically 
> created. As a result
> the {{bucket-info -unguarded}} option cannot be used, and the call has 
> significant side effects (i.e. it can run up bills)






[jira] [Updated] (HADOOP-15428) s3guard bucket-info will create s3guard table if FS is set to do this automatically

2018-12-12 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15428:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> s3guard bucket-info will create s3guard table if FS is set to do this 
> automatically
> ---
>
> Key: HADOOP-15428
> URL: https://issues.apache.org/jira/browse/HADOOP-15428
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-15428.001.patch
>
>
> If you call hadoop s3guard bucket-info on a bucket where the fs is set to 
> create a s3guard table on demand, then the DDB table is automatically 
> created. As a result
> the {{bucket-info -unguarded}} option cannot be used, and the call has 
> significant side effects (i.e. it can run up bills)






[jira] [Updated] (HADOOP-15428) s3guard bucket-info will create s3guard table if FS is set to do this automatically

2018-12-12 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15428:
---
Release Note: If -unguarded flag is passed to `hadoop s3guard bucket-info`, 
it will now proceed with S3Guard disabled instead of failing if S3Guard is not 
already disabled.  (was: The -unguarded flag, passed to `hadoop s3guard 
bucket-info` will no proceed with S3Guard disabled instead of failing if 
S3Guard is not already disabled.)

> s3guard bucket-info will create s3guard table if FS is set to do this 
> automatically
> ---
>
> Key: HADOOP-15428
> URL: https://issues.apache.org/jira/browse/HADOOP-15428
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-15428.001.patch
>
>
> If you call hadoop s3guard bucket-info on a bucket where the fs is set to 
> create a s3guard table on demand, then the DDB table is automatically 
> created. As a result
> the {{bucket-info -unguarded}} option cannot be used, and the call has 
> significant side effects (i.e. it can run up bills)






[jira] [Updated] (HADOOP-15999) [s3a] Better support for out-of-band operations

2018-12-11 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15999:
---
Attachment: out-of-band-operations.patch

> [s3a] Better support for out-of-band operations
> ---
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Sean Mackrory
>Priority: Major
> Attachments: out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.






[jira] [Resolved] (HADOOP-15841) ABFS: change createRemoteFileSystemDuringInitialization default to true

2018-12-11 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory resolved HADOOP-15841.

Resolution: Won't Fix

> ABFS: change createRemoteFileSystemDuringInitialization default to true
> ---
>
> Key: HADOOP-15841
> URL: https://issues.apache.org/jira/browse/HADOOP-15841
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Major
>
> I haven't seen a way to create a working container (at least for the dfs 
> endpoint) except for setting 
> fs.azure.createRemoteFileSystemDuringInitialization=true. I personally don't 
> see that much of a downside to having it default to true, and it's a mild 
> inconvenience to remember to set it to true for some action to create a 
> container. I vaguely recall [~tmarquardt] considering changing this default 
> too.
> I propose we do it.






[jira] [Commented] (HADOOP-15999) [s3a] Better support for out-of-band operations

2018-12-11 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718241#comment-16718241
 ] 

Sean Mackrory commented on HADOOP-15999:


Attaching a test that illustrates some of the nuances of the issue when 
overwriting files. The deleted-file test doesn't fail with S3Guard, which 
surprises me, as I know this same basic scenario has caused problems. I need to 
dig deeper into why it doesn't reproduce that behavior at some point...

> [s3a] Better support for out-of-band operations
> ---
>
> Key: HADOOP-15999
> URL: https://issues.apache.org/jira/browse/HADOOP-15999
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Sean Mackrory
>Priority: Major
> Attachments: out-of-band-operations.patch
>
>
> S3Guard was initially done on the premise that a new MetadataStore would be 
> the source of truth, and that it wouldn't provide guarantees if updates were 
> done without using S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are performed on the data that can't reasonably involve S3Guard. 
> For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten by a longer file by some other tool. When 
> reading the file, only the length of the original file is read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we may currently only query the 
> MetadataStore in getFileStatus) and use whichever one has the higher modified 
> time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.






[jira] [Created] (HADOOP-15999) [s3a] Better support for out-of-band operations

2018-12-11 Thread Sean Mackrory (JIRA)
Sean Mackrory created HADOOP-15999:
--

 Summary: [s3a] Better support for out-of-band operations
 Key: HADOOP-15999
 URL: https://issues.apache.org/jira/browse/HADOOP-15999
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Mackrory


S3Guard was initially done on the premise that a new MetadataStore would be the 
source of truth, and that it wouldn't provide guarantees if updates were done 
without using S3Guard.

I've been seeing increased demand for better support for scenarios where 
operations are performed on the data that can't reasonably involve S3Guard. 
For example:
* A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
can't tell the difference between the new file and delete / list inconsistency 
and continues to treat the file as deleted.
* An S3Guard-ed file is overwritten by a longer file by some other tool. When 
reading the file, only the length of the original file is read.

We could possibly have smarter behavior here by querying both S3 and the 
MetadataStore (even in cases where we may currently only query the 
MetadataStore in getFileStatus) and use whichever one has the higher modified 
time.

This kills the performance boost we currently get in some workloads with the 
short-circuited getFileStatus, but we could keep it with authoritative mode 
which should give a larger performance boost. At least we'd get more 
correctness without authoritative mode and a clear declaration of when we can 
make the assumptions required to short-circuit the process. If we can't 
consider S3Guard the source of truth, we need to defer to S3 more.

We'd need to be extra sure of any locality / time zone issues if we start 
relying on mod_time more directly, but currently we're tracking the 
modification time as returned by S3 anyway.
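The reconciliation proposed above (query both S3 and the MetadataStore, then trust whichever reports the higher modification time) could be sketched roughly as follows. The names `FileEntry` and `reconcile` are illustrative stand-ins, not the actual S3A classes:

```java
/**
 * Hedged sketch of mod-time reconciliation between an S3 HEAD result
 * and a MetadataStore entry. The real S3AFileStatus carries much more
 * state (path, etag, version id, ...).
 */
public class ReconcileSketch {

    // Minimal stand-in for a file status record from either source.
    static final class FileEntry {
        final long length;
        final long modTime; // epoch millis, as returned by S3 / stored in DDB
        FileEntry(long length, long modTime) {
            this.length = length;
            this.modTime = modTime;
        }
    }

    /**
     * Pick whichever source has the higher modification time, so an
     * out-of-band overwrite (newer in S3) wins over a stale store entry.
     * Either side may be null (not found in that source).
     */
    static FileEntry reconcile(FileEntry fromS3, FileEntry fromStore) {
        if (fromS3 == null) {
            return fromStore;
        }
        if (fromStore == null) {
            return fromS3;
        }
        return fromS3.modTime >= fromStore.modTime ? fromS3 : fromStore;
    }

    public static void main(String[] args) {
        // Out-of-band overwrite: S3 holds a longer, newer file than the store.
        FileEntry store = new FileEntry(100, 1_000L);
        FileEntry s3 = new FileEntry(250, 2_000L);
        FileEntry winner = reconcile(s3, store);
        System.out.println(winner.length); // prints 250: the newer S3 copy wins
    }
}
```

With this rule the out-of-band overwrite wins because S3 reports the newer timestamp; the time-zone caveat above still applies, since the comparison is only sound if both timestamps come from comparable clocks.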






[jira] [Updated] (HADOOP-15845) s3guard init and destroy command will create/destroy tables if ddb.table & region are set

2018-12-11 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15845:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> s3guard init and destroy command will create/destroy tables if ddb.table & 
> region are set
> -
>
> Key: HADOOP-15845
> URL: https://issues.apache.org/jira/browse/HADOOP-15845
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.1
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15845.001.patch
>
>
> If you have s3guard set up with a table name and a region, then s3guard init 
> will automatically create the table, without you specifying a bucket or URI.
> I had expected the command just to print out its arguments, but it actually 
> did the init with the default bucket values.
> Even worse, `hadoop s3guard destroy` will destroy the table. 
> This is too dangerous to allow. The command must require either the name of a 
> bucket or an explicit DDB table URI.






[jira] [Updated] (HADOOP-15987) ITestDynamoDBMetadataStore should check if test ddb table set properly before initializing the test

2018-12-11 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15987:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed. Not worth obsessing too much over the wording of an exception that 
only occurs in tests. The update to the documentation is sufficiently clear.

> ITestDynamoDBMetadataStore should check if test ddb table set properly before 
> initializing the test
> ---
>
> Key: HADOOP-15987
> URL: https://issues.apache.org/jira/browse/HADOOP-15987
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15987.001.patch
>
>
> The jira covers the following:
> * We should assert that the table name is configured when 
> DynamoDBMetadataStore is used for testing, so the test should fail if it's 
> not configured.
> * We should assert that the test table is not the same as the production 
> table, as the test table could be modified and destroyed multiple times 
> during the test.
> * This behavior should be added to the testing docs.
> [Assume from junit 
> doc|http://junit.sourceforge.net/javadoc/org/junit/Assume.html]:
> {noformat}
> A set of methods useful for stating assumptions about the conditions in which 
> a test is meaningful. 
> A failed assumption does not mean the code is broken, but that the test 
> provides no useful information. 
> The default JUnit runner treats tests with failing assumptions as ignored.
> {noformat}
> A failed assert will cause test failure instead of just skipping the test.
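The assume-versus-assert distinction quoted above can be mimicked without a JUnit dependency. This is a hedged sketch of how a runner classifies the two outcomes, not the actual JUnit implementation:

```java
/**
 * Sketch of the assume-vs-assert distinction: a failed assumption marks a
 * test as skipped, while a failed assertion marks it as failed. This mimics
 * JUnit's behavior; it is not the real org.junit.Assume code.
 */
public class AssumeVsAssert {

    static class AssumptionViolated extends RuntimeException {
        AssumptionViolated(String message) { super(message); }
    }

    static void assumeTrue(String message, boolean condition) {
        if (!condition) {
            throw new AssumptionViolated(message);
        }
    }

    static void assertTrue(String message, boolean condition) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    /** Run a test body, classifying the outcome the way a JUnit runner would. */
    static String run(Runnable test) {
        try {
            test.run();
            return "PASSED";
        } catch (AssumptionViolated e) {
            return "SKIPPED"; // failed assumption: test provides no information
        } catch (AssertionError e) {
            return "FAILED";  // failed assertion: real test failure
        }
    }

    public static void main(String[] args) {
        final String testTable = null; // e.g. the test DDB table name is unset

        // With assume, the misconfigured test is quietly skipped.
        System.out.println(run(() ->
            assumeTrue("test table not configured", testTable != null)));
        // With assert, the same misconfiguration fails the run loudly,
        // which is the behavior this jira wants for the DDB table check.
        System.out.println(run(() ->
            assertTrue("test table not configured", testTable != null)));
    }
}
```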






[jira] [Updated] (HADOOP-15845) s3guard init and destroy command will create/destroy tables if ddb.table & region are set

2018-12-11 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-15845:
---
Release Note: `hadoop s3guard destroy` and `hadoop s3guard init` now 
require a MetadataStore URI or an S3A URI to be explicitly provided on the 
command-line to perform the action.

+1 - looks good to me and tests pass.

> s3guard init and destroy command will create/destroy tables if ddb.table & 
> region are set
> -
>
> Key: HADOOP-15845
> URL: https://issues.apache.org/jira/browse/HADOOP-15845
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.1
>Reporter: Steve Loughran
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15845.001.patch
>
>
> If you have s3guard set up with a table name and a region, then s3guard init 
> will automatically create the table, without you specifying a bucket or URI.
> I had expected the command just to print out its arguments, but it actually 
> did the init with the default bucket values.
> Even worse, `hadoop s3guard destroy` will destroy the table. 
> This is too dangerous to allow. The command must require either the name of a 
> bucket or an explicit DDB table URI.






[jira] [Commented] (HADOOP-15988) Should be able to set empty directory flag to TRUE in DynamoDBMetadataStore#innerGet when using authoritative directory listings

2018-12-10 Thread Sean Mackrory (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715805#comment-16715805
 ] 

Sean Mackrory commented on HADOOP-15988:


Good stuff. A few nits:
* Can you clean up the checkstyle issues that were raised?
* Most of the code between the two tests is shared. Can we refactor that into a 
single test that runs the same sequence with a different auth value and expected 
outcome? If that turns out to be messy for some reason it's not a deal-breaker, 
but it's worth a couple of minutes if that's all it takes.

+1 otherwise.
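A hedged sketch of the suggested refactor: one shared test body, parameterized by the authoritative flag and the expected outcome. `EmptyDirCheck`, `emptyDirFlag`, and the `Tristate` stand-in are illustrative, not the actual S3A test code:

```java
/**
 * Sketch of a single parameterized check for the empty-directory flag
 * behavior in HADOOP-15988. The Tristate enum here mirrors the idea of
 * org.apache.hadoop.fs.s3a.Tristate but is a local stand-in.
 */
public class EmptyDirCheck {

    enum Tristate { TRUE, FALSE, UNKNOWN }

    /** Stand-in for the empty-directory answer in DynamoDBMetadataStore#innerGet. */
    static Tristate emptyDirFlag(boolean hasChildren, boolean authoritative) {
        if (hasChildren) {
            return Tristate.FALSE;
        }
        // Only an authoritative (fully cached) listing can prove emptiness;
        // otherwise the store cannot know whether the listing is complete.
        return authoritative ? Tristate.TRUE : Tristate.UNKNOWN;
    }

    /** The shared test sequence, run once per (auth flag, expected) pair. */
    static void checkEmptyDir(boolean authoritative, Tristate expected) {
        Tristate actual = emptyDirFlag(false, authoritative);
        if (actual != expected) {
            throw new AssertionError("auth=" + authoritative
                + ": expected " + expected + " but got " + actual);
        }
        System.out.println("auth=" + authoritative + " -> " + actual);
    }

    public static void main(String[] args) {
        checkEmptyDir(true, Tristate.TRUE);     // prints auth=true -> TRUE
        checkEmptyDir(false, Tristate.UNKNOWN); // prints auth=false -> UNKNOWN
    }
}
```

In JUnit this would naturally become a parameterized test, with each (auth value, expected Tristate) pair supplied as test parameters instead of two near-identical test methods.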

> Should be able to set empty directory flag to TRUE in 
> DynamoDBMetadataStore#innerGet when using authoritative directory listings
> 
>
> Key: HADOOP-15988
> URL: https://issues.apache.org/jira/browse/HADOOP-15988
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
> Attachments: HADOOP-15988.001.patch
>
>
> We have the following comment and implementation in DynamoDBMetadataStore:
> {noformat}
> // When this class has support for authoritative
> // (fully-cached) directory listings, we may also be able to answer
> // TRUE here.  Until then, we don't know if we have full listing or
> // not, thus the UNKNOWN here:
> meta.setIsEmptyDirectory(
> hasChildren ? Tristate.FALSE : Tristate.UNKNOWN);
> {noformat}
> We have authoritative listings now in dynamo since HADOOP-15621, so we should 
> resolve this comment, implement the solution and test it. 





