[jira] [Resolved] (HADOOP-18680) Insufficient heap during full test runs in Docker container.

2023-04-03 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HADOOP-18680.

Target Version/s:   (was: 3.3.6)
  Resolution: Fixed

> Insufficient heap during full test runs in Docker container.
> 
>
> Key: HADOOP-18680
> URL: https://issues.apache.org/jira/browse/HADOOP-18680
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.6
>
>
> During verification of releases on the 3.3 line, I often run out of heap 
> during full test runs inside the Docker container. Let's increase the default 
> in {{MAVEN_OPTS}} to match trunk.
> Additionally, on trunk, the settings are different in Dockerfile vs. 
> Dockerfile_aarch64. We can align those.






[jira] [Updated] (HADOOP-18680) Insufficient heap during full test runs in Docker container.

2023-04-03 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-18680:
---
Fix Version/s: 3.4.0
   3.3.6

> Insufficient heap during full test runs in Docker container.
> 
>
> Key: HADOOP-18680
> URL: https://issues.apache.org/jira/browse/HADOOP-18680
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.6
>
>
> During verification of releases on the 3.3 line, I often run out of heap 
> during full test runs inside the Docker container. Let's increase the default 
> in {{MAVEN_OPTS}} to match trunk.
> Additionally, on trunk, the settings are different in Dockerfile vs. 
> Dockerfile_aarch64. We can align those.






[jira] [Updated] (HADOOP-18680) Insufficient heap during full test runs in Docker container.

2023-03-30 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-18680:
---
Description: 
During verification of releases on the 3.3 line, I often run out of heap during 
full test runs inside the Docker container. Let's increase the default in 
{{MAVEN_OPTS}} to match trunk.

Additionally, on trunk, the settings are different in Dockerfile vs. 
Dockerfile_aarch64. We can align those.

  was:During verification of releases on the 3.3 line, I often run out of heap 
during full test runs inside the Docker container. Let's increase the default 
in {{MAVEN_OPTS}} to match trunk.

Summary: Insufficient heap during full test runs in Docker container.  
(was: Insufficient heap during full test runs in Docker container on 
branch-3.3.)

> Insufficient heap during full test runs in Docker container.
> 
>
> Key: HADOOP-18680
> URL: https://issues.apache.org/jira/browse/HADOOP-18680
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: pull-request-available
>
> During verification of releases on the 3.3 line, I often run out of heap 
> during full test runs inside the Docker container. Let's increase the default 
> in {{MAVEN_OPTS}} to match trunk.
> Additionally, on trunk, the settings are different in Dockerfile vs. 
> Dockerfile_aarch64. We can align those.






[jira] [Created] (HADOOP-18680) Insufficient heap during full test runs in Docker container on branch-3.3.

2023-03-24 Thread Chris Nauroth (Jira)
Chris Nauroth created HADOOP-18680:
--

 Summary: Insufficient heap during full test runs in Docker 
container on branch-3.3.
 Key: HADOOP-18680
 URL: https://issues.apache.org/jira/browse/HADOOP-18680
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Reporter: Chris Nauroth
Assignee: Chris Nauroth


During verification of releases on the 3.3 line, I often run out of heap during 
full test runs inside the Docker container. Let's increase the default in 
{{MAVEN_OPTS}} to match trunk.






[jira] [Commented] (HADOOP-18677) Hadoop "current" documentation link broken after release 3.3.5.

2023-03-23 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704361#comment-17704361
 ] 

Chris Nauroth commented on HADOOP-18677:


[~ste...@apache.org], thanks for the review, and no worries. This was easy to 
miss. One of my browser sessions kept on showing the cached 3.3.4 content for a 
while.

> Hadoop "current" documentation link broken after release 3.3.5.
> ---
>
> Key: HADOOP-18677
> URL: https://issues.apache.org/jira/browse/HADOOP-18677
> Project: Hadoop Common
>  Issue Type: Task
>  Components: site
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> From hadoop.apache.org, access Documentation -> Current, leading to:
> https://hadoop.apache.org/docs/current/
> This results in a Forbidden response, seemingly since completion of the 3.3.5 
> release. (To see this, you might need to refresh your browser if it's still 
> serving a cached copy of the 3.3.4 docs from this link.)






[jira] [Created] (HADOOP-18677) Hadoop "current" documentation link broken after release 3.3.5.

2023-03-23 Thread Chris Nauroth (Jira)
Chris Nauroth created HADOOP-18677:
--

 Summary: Hadoop "current" documentation link broken after release 
3.3.5.
 Key: HADOOP-18677
 URL: https://issues.apache.org/jira/browse/HADOOP-18677
 Project: Hadoop Common
  Issue Type: Task
  Components: site
Reporter: Chris Nauroth
Assignee: Chris Nauroth


From hadoop.apache.org, access Documentation -> Current, leading to:

https://hadoop.apache.org/docs/current/

This results in a Forbidden response, seemingly since completion of the 3.3.5 
release. (To see this, you might need to refresh your browser if it's still 
serving a cached copy of the 3.3.4 docs from this link.)






[jira] [Resolved] (HADOOP-18582) No need to clean tmp files in distcp direct mode

2023-01-24 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HADOOP-18582.

Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> No need to clean tmp files in distcp direct mode
> 
>
> Key: HADOOP-18582
> URL: https://issues.apache.org/jira/browse/HADOOP-18582
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.3.4
>Reporter: 1kang
>Assignee: 1kang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> It is not necessary to run `cleanupTempFiles` while a distcp job commits in 
> direct mode, because there are no temp files in direct mode.
> This cleanup operation increases task execution time because it lists the 
> files in the target path. When the number of files in the target path is 
> very large, this operation is very slow.
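
A minimal, self-contained sketch of the guard this asks for (method and flag names are assumed for illustration; the real logic lives in distcp's commit path):

{code:java}
public class DirectModeCommitSketch {
  // Hypothetical stand-in for distcp's commit phase.
  static void commitJob(boolean directWrite) {
    if (!directWrite) {
      // Only needed when copies went through temp files; skipping it in
      // direct mode avoids listing the (possibly huge) target path.
      cleanupTempFiles();
    }
  }

  static void cleanupTempFiles() {
    // In real distcp this lists the target path, which is what makes it
    // slow when the target contains very many files.
  }

  public static void main(String[] args) {
    commitJob(true);  // direct mode: the expensive listing is skipped
  }
}
{code}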






[jira] [Moved] (HADOOP-18599) Expose `listStatus(Path path, String startFrom)` on `AzureBlobFileSystem`

2023-01-18 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth moved HDFS-16894 to HADOOP-18599:
---

  Component/s: fs/azure
   (was: fs/azure)
Fix Version/s: (was: 3.3.2)
  Key: HADOOP-18599  (was: HDFS-16894)
Affects Version/s: 3.3.4
   3.3.2
   (was: 3.3.2)
   (was: 3.3.4)
  Project: Hadoop Common  (was: Hadoop HDFS)

> Expose `listStatus(Path path, String startFrom)` on `AzureBlobFileSystem`
> -
>
> Key: HADOOP-18599
> URL: https://issues.apache.org/jira/browse/HADOOP-18599
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/azure
>Affects Versions: 3.3.4, 3.3.2
>Reporter: Thomas Newton
>Priority: Minor
>
> When working with Azure blob storage, listing operations can often be quite 
> slow, even on storage accounts with the hierarchical namespace. 
> This can be mitigated by listing only a specific subset of directories using 
> a function like 
> [https://hadoop.apache.org/docs/r3.3.4/api/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.html#listStatus-org.apache.hadoop.fs.Path-java.lang.String-org.apache.hadoop.fs.azurebfs.utils.TracingContext-]
> which accepts a `startFrom` argument and lists all files in order starting 
> from there.
> I'm wondering if we could add a method to `AzureBlobFileSystem`, something 
> like:
> ```
> public FileStatus[] listStatus(final Path f, final String startFrom) throws 
> IOException
> ```
> This exposes functionality that already exists on the underlying 
> `AzureBlobFileSystemStore`. My understanding from reading a bit of the code 
> is that users should mainly be dealing with `AzureBlobFileSystem`, and 
> `AzureBlobFileSystem` seems easier to use to me, hence the benefit of 
> exposing it there.
>  
> I'm very unfamiliar with Java, but I'm told that keeping strictly to 
> interfaces is strongly preferred. However, I can see some methods already on 
> `AzureBlobFileSystem` that do not belong to any interface (e.g. 
> `breakLease`), so I'm hoping it's acceptable to add a method like I 
> described for only the one `FileSystem` implementation.
>  
> The specific motivation for this is to unblock 
> [https://github.com/delta-io/delta/issues/1568]
> I would be willing to contribute this if maintainers think the plan is 
> reasonable. 
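
For illustration, a hedged sketch of how a caller might use the proposed overload. Note that `listStatus(Path, String)` does not exist on `AzureBlobFileSystem` today; this compiles only if the method is added as described above, and the path and start name are made up:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem;

public class StartFromListingDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumes fs.defaultFS points at an abfs:// URI.
    AzureBlobFileSystem abfs = (AzureBlobFileSystem) FileSystem.get(conf);
    // Proposed overload: list files in order, starting from the given name,
    // instead of listing the entire (possibly huge) directory.
    FileStatus[] statuses = abfs.listStatus(new Path("/data"), "part-00420");
    for (FileStatus st : statuses) {
      System.out.println(st.getPath());
    }
  }
}
{code}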






[jira] [Resolved] (HADOOP-18591) Fix a typo in Trash

2023-01-12 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HADOOP-18591.

Fix Version/s: 3.4.0
   3.2.5
   3.3.9
   Resolution: Fixed

> Fix a typo in Trash
> ---
>
> Key: HADOOP-18591
> URL: https://issues.apache.org/jira/browse/HADOOP-18591
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Reporter: xiaoping.huang
>Assignee: xiaoping.huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.5, 3.3.9
>
>







[jira] [Updated] (HADOOP-18591) Fix a typo in Trash

2023-01-12 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-18591:
---
Component/s: documentation

> Fix a typo in Trash
> ---
>
> Key: HADOOP-18591
> URL: https://issues.apache.org/jira/browse/HADOOP-18591
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Reporter: xiaoping.huang
>Assignee: xiaoping.huang
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Assigned] (HADOOP-18591) Fix a typo in Trash

2023-01-10 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HADOOP-18591:
--

Assignee: xiaoping.huang

> Fix a typo in Trash
> ---
>
> Key: HADOOP-18591
> URL: https://issues.apache.org/jira/browse/HADOOP-18591
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: xiaoping.huang
>Assignee: xiaoping.huang
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (HADOOP-18590) Publish SBOM artifacts

2023-01-09 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HADOOP-18590.

Fix Version/s: 3.4.0
   3.2.5
   3.3.9
 Assignee: Dongjoon Hyun
   Resolution: Fixed

> Publish SBOM artifacts
> --
>
> Key: HADOOP-18590
> URL: https://issues.apache.org/jira/browse/HADOOP-18590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.5, 3.3.9
>
>







[jira] [Resolved] (HADOOP-18587) upgrade to jettison 1.5.3 due to security issue

2023-01-06 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HADOOP-18587.

Fix Version/s: 3.4.0
   3.3.9
 Assignee: PJ Fanning
   Resolution: Fixed

I have committed this to trunk and branch-3.3, after resolving a minor merge 
conflict in LICENSE-binary.

@pjfanning , thank you for the contribution.

> upgrade to jettison 1.5.3 due to security issue
> ---
>
> Key: HADOOP-18587
> URL: https://issues.apache.org/jira/browse/HADOOP-18587
> Project: Hadoop Common
>  Issue Type: Task
>  Components: common
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> [https://github.com/advisories/GHSA-x27m-9w8j-5vcw]
>  
> [https://github.com/jettison-json/jettison/releases]
> v1.5.2 is flagged as fixing a CVE, but v1.5.3 was quickly released and 
> appears to fix some regressions caused by v1.5.2.
> Many Hadoop tests fail when Jettison 1.5.2 is used.






[jira] [Updated] (HADOOP-18587) upgrade to jettison 1.5.3 due to security issue

2023-01-05 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-18587:
---
Summary: upgrade to jettison 1.5.3 due to security issue  (was: upgrade to 
jettison 1.5.2 due to security issue)

> upgrade to jettison 1.5.3 due to security issue
> ---
>
> Key: HADOOP-18587
> URL: https://issues.apache.org/jira/browse/HADOOP-18587
> Project: Hadoop Common
>  Issue Type: Task
>  Components: common
>Reporter: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
>
> [https://github.com/advisories/GHSA-x27m-9w8j-5vcw]
>  
> [https://github.com/jettison-json/jettison/releases]
> v1.5.2 is flagged as fixing a CVE, but v1.5.3 was quickly released and 
> appears to fix some regressions caused by v1.5.2.
> Many Hadoop tests fail when Jettison 1.5.2 is used.






[jira] [Assigned] (HADOOP-18582) No need to clean tmp files in distcp direct mode

2022-12-22 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HADOOP-18582:
--

Assignee: 1kang

> No need to clean tmp files in distcp direct mode
> ---
>
> Key: HADOOP-18582
> URL: https://issues.apache.org/jira/browse/HADOOP-18582
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.3.4
>Reporter: 1kang
>Assignee: 1kang
>Priority: Major
>  Labels: pull-request-available
>
> It is not necessary to run `cleanupTempFiles` while a distcp job commits in 
> direct mode, because there are no temp files in direct mode.
> This cleanup operation increases task execution time because it lists the 
> files in the target path. When the number of files in the target path is 
> very large, this operation is very slow.






[jira] [Assigned] (HADOOP-11899) Remove dedicated checkstyle rules in hadoop-tools/hadoop-azure

2022-10-25 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HADOOP-11899:
--

Assignee: (was: Chris Nauroth)

As of today, the hadoop-tools/hadoop-azure codebase still diverges in several 
ways from the common Checkstyle ruleset. (One example is line length.) It still 
needs its own dedicated hadoop-tools/hadoop-azure/src/config/checkstyle.xml.

Consolidating this to the shared checkstyle.xml would require a mass code 
reformatting. I'm no longer actively maintaining hadoop-azure, so I'll have to 
unassign this and see if anyone else wants to pick it up. (I'd be happy to help 
with code review.)

> Remove dedicated checkstyle rules in hadoop-tools/hadoop-azure
> --
>
> Key: HADOOP-11899
> URL: https://issues.apache.org/jira/browse/HADOOP-11899
> Project: Hadoop Common
>  Issue Type: Task
>Affects Versions: 2.7.0
>Reporter: Gera Shegalov
>Priority: Minor
>
> HADOOP-11889 got rid of duplicate checkstyle rules. However, 
> hadoop-tools/hadoop-azure had some unique rules such as 160-column line 
> length. The purpose of this JIRA is to discuss whether the dedicated checkstyle 
> overrides are really required.






[jira] [Resolved] (HADOOP-17948) JAR in conflict with timestamp check causes AM errors

2022-07-20 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HADOOP-17948.

Resolution: Duplicate

I'm closing this as a duplicate of YARN-3606. I don't expect there will be 
changes made in this area, because the timestamp check has worked well for a 
fast and lightweight check of unexpected resource changes. YARN-3606 contains 
more discussion.

> JAR in conflict with timestamp check causes AM errors
> -
>
> Key: HADOOP-17948
> URL: https://issues.apache.org/jira/browse/HADOOP-17948
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 2.9.2
>Reporter: Michael Taylor
>Priority: Blocker
>
> After an init action pulls down a new JAR and the check of the JAR's 
> timestamp is performed [1], we can sometimes cause an incorrect error if the 
> timestamp does not match. To address this, you can apply workarounds like:
> # record the old timestamp at the beginning, before the connector is changed
> local -r old_file_time=$(date -r ${dataproc_common_lib_dir}/gcs-connector.jar 
> "+%m%d%H%M.00")
> # at end of script.
> touch -t "${old_file_time}" 
> touch -h -t "${old_file_time}" 
> Instead of checking the date, we should be performing version compatibility 
> tests.
>  
>  
> 1. 
> https://github.com/apache/hadoop/blob/release-2.7.3-RC2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java#L255-L258






[jira] [Resolved] (HADOOP-18300) Update Gson to 2.9.0

2022-06-22 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HADOOP-18300.

Fix Version/s: 3.4.0
   3.2.4
   3.3.9
 Hadoop Flags: Reviewed
   Resolution: Fixed

I have committed this to trunk, branch-3.3 and branch-3.2. [~medb], thank you 
for the contribution. [~ayushtkn], thank you for code reviewing.

> Update Gson to 2.9.0
> 
>
> Key: HADOOP-18300
> URL: https://issues.apache.org/jira/browse/HADOOP-18300
> Project: Hadoop Common
>  Issue Type: Task
>  Components: build
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.9
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Update to Gson 2.9.0, which has many 
> [fixes|https://github.com/google/gson/releases/tag/gson-parent-2.9.0] and is 
> backward-compatible as long as Java 7+ is used.






[jira] [Resolved] (HADOOP-13464) update GSON to 2.7+

2021-12-30 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HADOOP-13464.

Fix Version/s: 3.4.0
   3.2.4
   3.3.3
   Resolution: Fixed

+1. I have committed this to trunk, branch-3.3 and branch-3.2.

[~medb], thank you for the contribution.

> update GSON to 2.7+
> ---
>
> Key: HADOOP-13464
> URL: https://issues.apache.org/jira/browse/HADOOP-13464
> Project: Hadoop Common
>  Issue Type: Task
>  Components: build
>Reporter: Sean Busbey
>Assignee: Igor Dvorzhak
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Our GSON version is from ~3 years ago. Update to the latest release.
> Check the release notes to see if this is incompatible.






[jira] [Updated] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup

2021-09-03 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-15129:
---
Fix Version/s: 3.2.4
   3.3.2
   3.4.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

I have committed this to trunk, branch-3.3 and branch-3.2.  I didn't end up 
merging down to the 2.x line like I said I would, because I retested on 2.x, 
and the bug isn't present there.

[~Karthik Palaniappan], thank you for providing the original patch.  Thank you 
to all of the reviewers and [~ywskycn] for the final review.

> Datanode caches namenode DNS lookup failure and cannot startup
> --
>
> Key: HADOOP-15129
> URL: https://issues.apache.org/jira/browse/HADOOP-15129
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 2.8.2
> Environment: Google Compute Engine.
> I'm using Java 8, Debian 8, Hadoop 2.8.2.
>Reporter: Karthik Palaniappan
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> On startup, the Datanode creates an InetSocketAddress to register with each 
> namenode. Though there are retries on connection failure throughout the 
> stack, the same InetSocketAddress is reused.
> InetSocketAddress is an interesting class, because it resolves DNS names to 
> IP addresses on construction, and it is never refreshed. Hadoop re-creates an 
> InetSocketAddress in some cases just in case the remote IP has changed for a 
> particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472.
> Anyway, on startup, you can see the Datanode log: "Namenode...remains 
> unresolved" -- referring to the fact that DNS lookup failed.
> {code:java}
> 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Refresh request received for nameservices: null
> 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode 
> for null remains unresolved for ID null. Check your hdfs-site.xml file to 
> ensure namenodes are configured properly.
> 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Starting BPOfferServices for nameservices: 
> 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block pool  (Datanode Uuid unassigned) service to 
> cluster-32f5-m:8020 starting to offer service
> {code}
> The Datanode then proceeds to use this unresolved address, as it may work if 
> the DN is configured to use a proxy. Since I'm not using a proxy, it forever 
> prints out this message:
> {code:java}
> 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> {code}
> Unfortunately, the log doesn't contain the exception that triggered it, but 
> the culprit is actually in IPC Client: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444.
> This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 
> to give a clear error message when somebody misspells an address.
> However, the fix in HADOOP-7472 doesn't apply here, because that code happens 
> in Client#getConnection after the Connection is constructed.
> My proposed fix (will attach a patch) is to move this exception out of the 
> constructor and into a place that will trigger HADOOP-7472's logic to 
> re-resolve addresses. If the DNS failure was temporary, this will allow the 
> connection to succeed. If not, the connection will fail after ipc client 
> retries (default 10 seconds worth of retries).
> I want to fix this in ipc client rather than just in Datanode startup, as 
> this fixes temporary DNS issues for all of Hadoop.
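
The root cause described above is that java.net.InetSocketAddress resolves DNS only at construction time. A minimal, self-contained Java sketch of the re-resolution idea (hypothetical helper name; not the actual Client.java patch):

{code:java}
import java.net.InetSocketAddress;

public class ReResolveDemo {
  // Hypothetical helper mirroring the proposed fix: rebuild the address so
  // DNS is retried, instead of reusing a cached unresolved InetSocketAddress.
  static InetSocketAddress reResolveIfNeeded(InetSocketAddress addr) {
    if (addr.isUnresolved()) {
      // The constructor performs a fresh DNS lookup. If it still fails, the
      // returned address is again unresolved, and an outer retry loop (like
      // the ipc client's) can try again later instead of failing forever.
      return new InetSocketAddress(addr.getHostName(), addr.getPort());
    }
    return addr;
  }

  public static void main(String[] args) {
    InetSocketAddress addr =
        InetSocketAddress.createUnresolved("cluster-32f5-m", 8020);
    System.out.println("unresolved? " + addr.isUnresolved());  // true
    addr = reResolveIfNeeded(addr);   // retries DNS on each call
    System.out.println("unresolved? " + addr.isUnresolved());
  }
}
{code}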






[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup

2021-09-01 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408264#comment-17408264
 ] 

Chris Nauroth commented on HADOOP-15129:


[~ywskycn], thank you for the +1.  I'm planning to commit this but will give it 
a little more time in case the earlier reviewers still want to add feedback.

> Datanode caches namenode DNS lookup failure and cannot startup
> --
>
> Key: HADOOP-15129
> URL: https://issues.apache.org/jira/browse/HADOOP-15129
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 2.8.2
> Environment: Google Compute Engine.
> I'm using Java 8, Debian 8, Hadoop 2.8.2.
>Reporter: Karthik Palaniappan
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> On startup, the Datanode creates an InetSocketAddress to register with each 
> namenode. Though there are retries on connection failure throughout the 
> stack, the same InetSocketAddress is reused.
> InetSocketAddress is an interesting class, because it resolves DNS names to 
> IP addresses on construction, and it is never refreshed. Hadoop re-creates an 
> InetSocketAddress in some cases just in case the remote IP has changed for a 
> particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472.
> Anyway, on startup, you can see the Datanode log: "Namenode...remains 
> unresolved" -- referring to the fact that DNS lookup failed.
> {code:java}
> 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Refresh request received for nameservices: null
> 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode 
> for null remains unresolved for ID null. Check your hdfs-site.xml file to 
> ensure namenodes are configured properly.
> 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Starting BPOfferServices for nameservices: 
> 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block pool  (Datanode Uuid unassigned) service to 
> cluster-32f5-m:8020 starting to offer service
> {code}
> The Datanode then proceeds to use this unresolved address, as it may work if 
> the DN is configured to use a proxy. Since I'm not using a proxy, it forever 
> prints out this message:
> {code:java}
> 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> {code}
> Unfortunately, the log doesn't contain the exception that triggered it, but 
> the culprit is actually in IPC Client: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444.
> This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 
> to give a clear error message when somebody misspells an address.
> However, the fix in HADOOP-7472 doesn't apply here, because that code happens 
> in Client#getConnection after the Connection is constructed.
> My proposed fix (will attach a patch) is to move this exception out of the 
> constructor and into a place that will trigger HADOOP-7472's logic to 
> re-resolve addresses. If the DNS failure was temporary, this will allow the 
> connection to succeed. If not, the connection will fail after ipc client 
> retries (default 10 seconds worth of retries).
> I want to fix this in ipc client rather than just in Datanode startup, as 
> this fixes temporary DNS issues for all of Hadoop.






[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup

2021-08-28 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17406251#comment-17406251
 ] 

Chris Nauroth commented on HADOOP-15129:


I forgot to mention one other change I made from the prior patch.  I 
incorporated {{NetUtils.getHostname()}} into the exception message, but now I 
see Arpit raised a point in an earlier comment about multi-homed 
configurations.  If it's preferred, I'm happy to switch that back to {{null}} 
and leave it undetermined in the exception message.

> Datanode caches namenode DNS lookup failure and cannot startup
> --
>
> Key: HADOOP-15129
> URL: https://issues.apache.org/jira/browse/HADOOP-15129
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 2.8.2
> Environment: Google Compute Engine.
> I'm using Java 8, Debian 8, Hadoop 2.8.2.
>Reporter: Karthik Palaniappan
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> On startup, the Datanode creates an InetSocketAddress to register with each 
> namenode. Though there are retries on connection failure throughout the 
> stack, the same InetSocketAddress is reused.
> InetSocketAddress is an interesting class, because it resolves DNS names to 
> IP addresses on construction, and it is never refreshed. Hadoop re-creates an 
> InetSocketAddress in some cases just in case the remote IP has changed for a 
> particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472.
> Anyway, on startup, you can see the Datanode log: "Namenode...remains 
> unresolved" -- referring to the fact that DNS lookup failed.
> {code:java}
> 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Refresh request received for nameservices: null
> 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode 
> for null remains unresolved for ID null. Check your hdfs-site.xml file to 
> ensure namenodes are configured properly.
> 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Starting BPOfferServices for nameservices: 
> 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block pool  (Datanode Uuid unassigned) service to 
> cluster-32f5-m:8020 starting to offer service
> {code}
> The Datanode then proceeds to use this unresolved address, as it may work if 
> the DN is configured to use a proxy. Since I'm not using a proxy, it forever 
> prints out this message:
> {code:java}
> 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> {code}
> Unfortunately, the log doesn't contain the exception that triggered it, but 
> the culprit is actually in IPC Client: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444.
> This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 
> to give a clear error message when somebody misspells an address.
> However, the fix in HADOOP-7472 doesn't apply here, because that code happens 
> in Client#getConnection after the Connection is constructed.
> My proposed fix (will attach a patch) is to move this exception out of the 
> constructor and into a place that will trigger HADOOP-7472's logic to 
> re-resolve addresses. If the DNS failure was temporary, this will allow the 
> connection to succeed. If not, the connection will fail after ipc client 
> retries (default 10 seconds worth of retries).
> I want to fix this in ipc client rather than just in Datanode startup, as 
> this fixes temporary DNS issues for all of Hadoop.






[jira] [Comment Edited] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup

2021-08-27 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17406116#comment-17406116
 ] 

Chris Nauroth edited comment on HADOOP-15129 at 8/28/21, 4:10 AM:
--

This remains a problem for cloud infrastructure deployments, so I'd like to 
pick it up and see if we can get it completed.  I've sent a pull request with 
the following changes compared to the prior revision:

* Remove older code for a throw of {{UnknownHostException}}.  This lies outside 
the retry loop, so even though the earlier patch did the right thing by placing 
the throw inside the retry loop, this remaining code perpetuated the problem of 
an infinite unresolved host.
* Make minor formatting changes in the test to resolve Checkstyle issues 
flagged in the last Yetus run.

Additionally, I've confirmed testing of the patch in moderate-sized (200-node) 
Dataproc cluster deployments.

[~ste...@apache.org], [~arp], [~raviprak], [~ajayydv], and [~shahrs87], can we 
please work on getting this reviewed and committed?  I'm interested in merging 
this down to branch-3.3, branch-3.2, branch-2.10 and branch-2.9.  The patch 
as-is won't apply cleanly to 2.x.  If you approve, then I'll prepare separate 
pull requests for those branches.

Also, BTW, hello everyone.  :-)


was (Author: cnauroth):
This remains a problem for cloud infrastructure deployments, so I'd like to 
pick it up and see if we can get it completed.  I've sent a pull request with 
the following changes compared to the prior revision:

* Remove older code for a throw of {{UnknownHostException}}.  This lies outside 
the retry loop, so even though the earlier patch did the right thing by placing 
the throw inside the retry loop, this remaining code perpetuated the problem of 
an infinite unresolved host.
* Make minor formatting changes in the test to resolve Checkstyle issues 
flagged in the last Yetus run.

Additionally, I've confirmed testing of the patch in moderate-sized (200-node) 
Dataproc cluster deployments.

[~ste...@apache.org], [~arp], [~raviprak], [~ajayydv], and [~shahrs87], can we 
please work on getting this reviewed and committed?  I'm interested in merging 
this down to branch-3.3, branch-3.2, branch-2.10 and branch-2.10.  The patch 
as-is won't apply cleanly to 2.x.  If you approve, then I'll prepare separate 
pull requests for those branches.

Also, BTW, hello everyone.  :-)

> Datanode caches namenode DNS lookup failure and cannot startup
> --
>
> Key: HADOOP-15129
> URL: https://issues.apache.org/jira/browse/HADOOP-15129
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 2.8.2
> Environment: Google Compute Engine.
> I'm using Java 8, Debian 8, Hadoop 2.8.2.
>Reporter: Karthik Palaniappan
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> On startup, the Datanode creates an InetSocketAddress to register with each 
> namenode. Though there are retries on connection failure throughout the 
> stack, the same InetSocketAddress is reused.
> InetSocketAddress is an interesting class, because it resolves DNS names to 
> IP addresses on construction, and it is never refreshed. Hadoop re-creates an 
> InetSocketAddress in some cases just in case the remote IP has changed for a 
> particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472.
> Anyway, on startup, you can see the Datanode log: "Namenode...remains 
> unresolved" -- referring to the fact that DNS lookup failed.
> {code:java}
> 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Refresh request received for nameservices: null
> 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode 
> for null remains unresolved for ID null. Check your hdfs-site.xml file to 
> ensure namenodes are configured properly.
> 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Starting BPOfferServices for nameservices: 
> 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block pool  (Datanode Uuid unassigned) service to 
> cluster-32f5-m:8020 starting to offer service
> {code}
> The Datanode then proceeds to use this unresolved address, as it may work if 
> the DN is configured to use a proxy. Since I'm not using a proxy, it forever 
> prints out this message:
> {code:java}
> 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: 

[jira] [Commented] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup

2021-08-27 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406117#comment-17406117
 ] 

Chris Nauroth commented on HADOOP-15129:


Let's please also maintain credit for Karthik.  (I submitted my patch with a 
Co-authored-by tag.)

> Datanode caches namenode DNS lookup failure and cannot startup
> --
>
> Key: HADOOP-15129
> URL: https://issues.apache.org/jira/browse/HADOOP-15129
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 2.8.2
> Environment: Google Compute Engine.
> I'm using Java 8, Debian 8, Hadoop 2.8.2.
>Reporter: Karthik Palaniappan
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> On startup, the Datanode creates an InetSocketAddress to register with each 
> namenode. Though there are retries on connection failure throughout the 
> stack, the same InetSocketAddress is reused.
> InetSocketAddress is an interesting class, because it resolves DNS names to 
> IP addresses on construction, and it is never refreshed. Hadoop re-creates an 
> InetSocketAddress in some cases just in case the remote IP has changed for a 
> particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472.
> Anyway, on startup, you can see the Datanode log: "Namenode...remains 
> unresolved" -- referring to the fact that DNS lookup failed.
> {code:java}
> 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Refresh request received for nameservices: null
> 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode 
> for null remains unresolved for ID null. Check your hdfs-site.xml file to 
> ensure namenodes are configured properly.
> 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Starting BPOfferServices for nameservices: 
> 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block pool  (Datanode Uuid unassigned) service to 
> cluster-32f5-m:8020 starting to offer service
> {code}
> The Datanode then proceeds to use this unresolved address, as it may work if 
> the DN is configured to use a proxy. Since I'm not using a proxy, it forever 
> prints out this message:
> {code:java}
> 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> {code}
> Unfortunately, the log doesn't contain the exception that triggered it, but 
> the culprit is actually in IPC Client: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444.
> This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 
> to give a clear error message when somebody misspells an address.
> However, the fix in HADOOP-7472 doesn't apply here, because that code happens 
> in Client#getConnection after the Connection is constructed.
> My proposed fix (will attach a patch) is to move this exception out of the 
> constructor and into a place that will trigger HADOOP-7472's logic to 
> re-resolve addresses. If the DNS failure was temporary, this will allow the 
> connection to succeed. If not, the connection will fail after ipc client 
> retries (default 10 seconds worth of retries).
> I want to fix this in ipc client rather than just in Datanode startup, as 
> this fixes temporary DNS issues for all of Hadoop.






[jira] [Assigned] (HADOOP-15129) Datanode caches namenode DNS lookup failure and cannot startup

2021-08-27 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HADOOP-15129:
--

Assignee: Chris Nauroth  (was: Karthik Palaniappan)

This remains a problem for cloud infrastructure deployments, so I'd like to 
pick it up and see if we can get it completed.  I've sent a pull request with 
the following changes compared to the prior revision:

* Remove older code for a throw of {{UnknownHostException}}.  This lies outside 
the retry loop, so even though the earlier patch did the right thing by placing 
the throw inside the retry loop, this remaining code perpetuated the problem of 
an infinite unresolved host.
* Make minor formatting changes in the test to resolve Checkstyle issues 
flagged in the last Yetus run.

Additionally, I've confirmed testing of the patch in moderate-sized (200-node) 
Dataproc cluster deployments.

[~ste...@apache.org], [~arp], [~raviprak], [~ajayydv], and [~shahrs87], can we 
please work on getting this reviewed and committed?  I'm interested in merging 
this down to branch-3.3, branch-3.2, branch-2.10 and branch-2.10.  The patch 
as-is won't apply cleanly to 2.x.  If you approve, then I'll prepare separate 
pull requests for those branches.

Also, BTW, hello everyone.  :-)

> Datanode caches namenode DNS lookup failure and cannot startup
> --
>
> Key: HADOOP-15129
> URL: https://issues.apache.org/jira/browse/HADOOP-15129
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 2.8.2
> Environment: Google Compute Engine.
> I'm using Java 8, Debian 8, Hadoop 2.8.2.
>Reporter: Karthik Palaniappan
>Assignee: Chris Nauroth
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> On startup, the Datanode creates an InetSocketAddress to register with each 
> namenode. Though there are retries on connection failure throughout the 
> stack, the same InetSocketAddress is reused.
> InetSocketAddress is an interesting class, because it resolves DNS names to 
> IP addresses on construction, and it is never refreshed. Hadoop re-creates an 
> InetSocketAddress in some cases just in case the remote IP has changed for a 
> particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472.
> Anyway, on startup, you can see the Datanode log: "Namenode...remains 
> unresolved" -- referring to the fact that DNS lookup failed.
> {code:java}
> 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Refresh request received for nameservices: null
> 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode 
> for null remains unresolved for ID null. Check your hdfs-site.xml file to 
> ensure namenodes are configured properly.
> 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Starting BPOfferServices for nameservices: 
> 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block pool  (Datanode Uuid unassigned) service to 
> cluster-32f5-m:8020 starting to offer service
> {code}
> The Datanode then proceeds to use this unresolved address, as it may work if 
> the DN is configured to use a proxy. Since I'm not using a proxy, it forever 
> prints out this message:
> {code:java}
> 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> {code}
> Unfortunately, the log doesn't contain the exception that triggered it, but 
> the culprit is actually in IPC Client: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444.
> This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 
> to give a clear error message when somebody misspells an address.
> However, the fix in HADOOP-7472 doesn't apply here, because that code happens 
> in Client#getConnection after the Connection is constructed.
> My proposed fix (will attach a patch) is to move this exception out of the 
> constructor and into a place that will trigger HADOOP-7472's logic to 
> re-resolve addresses. 

[jira] [Commented] (HADOOP-14544) DistCp documentation for command line options is misaligned.

2017-06-19 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054379#comment-16054379
 ] 

Chris Nauroth commented on HADOOP-14544:


I believe the problem is specific to the 2.7 line, though I haven't reviewed 
comprehensively.

http://hadoop.apache.org/docs/stable/hadoop-distcp/DistCp.html

http://hadoop.apache.org/docs/r2.7.3/hadoop-distcp/DistCp.html


> DistCp documentation for command line options is misaligned.
> 
>
> Key: HADOOP-14544
> URL: https://issues.apache.org/jira/browse/HADOOP-14544
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.3
>Reporter: Chris Nauroth
>Priority: Minor
> Attachments: DistCp 2.7.3 Documentation.png
>
>
> In the DistCp documentation, the Command Line Options section appears to be 
> misaligned/incorrect in some of the Notes for release 2.7.3.  This is the 
> current stable version, so it's likely that users will land on this 
> version of the document.






[jira] [Updated] (HADOOP-14544) DistCp documentation for command line options is misaligned.

2017-06-19 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-14544:
---
Attachment: DistCp 2.7.3 Documentation.png

> DistCp documentation for command line options is misaligned.
> 
>
> Key: HADOOP-14544
> URL: https://issues.apache.org/jira/browse/HADOOP-14544
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.3
>Reporter: Chris Nauroth
>Priority: Minor
> Attachments: DistCp 2.7.3 Documentation.png
>
>
> In the DistCp documentation, the Command Line Options section appears to be 
> misaligned/incorrect in some of the Notes for release 2.7.3.  This is the 
> current stable version, so it's likely that users will land on this 
> version of the document.






[jira] [Created] (HADOOP-14544) DistCp documentation for command line options is misaligned.

2017-06-19 Thread Chris Nauroth (JIRA)
Chris Nauroth created HADOOP-14544:
--

 Summary: DistCp documentation for command line options is 
misaligned.
 Key: HADOOP-14544
 URL: https://issues.apache.org/jira/browse/HADOOP-14544
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.3
Reporter: Chris Nauroth
Priority: Minor


In the DistCp documentation, the Command Line Options section appears to be 
misaligned/incorrect in some of the Notes for release 2.7.3.  This is the 
current stable version, so it's likely that users will land on this version 
of the document.






[jira] [Comment Edited] (HADOOP-13726) Enforce that FileSystem initializes only a single instance of the requested FileSystem.

2017-04-12 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966754#comment-15966754
 ] 

Chris Nauroth edited comment on HADOOP-13726 at 4/13/17 12:57 AM:
--

My understanding is that with use of the Guava {{Cache}}, we'd effectively 
achieve per-FS locking granularity already, not a lock over the whole {{Cache}} 
instance, and we'd achieve it without changing state tracking around the FS 
instances.  Multiple threads attempting to retrieve the same FS would block 
waiting for the first thread to finish initialization, but threads retrieving a 
different FS could proceed concurrently.

If you're trying to achieve it without using the Guava {{Cache}} at all, then 
maybe the {{FileSystem#initialize}} call could be moved inside a 
{{synchronized}} block on the {{Cache.Key}} instance, and it still wouldn't 
need state tracking changes around the FS?


was (Author: cnauroth):
My understanding is that with use of the Guava {{Cache}}, we'd effectively 
achieve per-FS locking granularity already, not a lock over the whole {{Cache}} 
instance, and we'd achieve it without changing state tracking around the FS 
instances.  Multiple threads attempting to retrieve the same FS would block 
waiting for the first thread to finish initialization, but threads retrieving a 
different FS could proceed concurrently.

If you're trying to achieve it without using the Guava {{Cache}} at all, then 
maybe the {{FileSystem#initialize}} call could be moved inside a 
{{syncrhonized}} block on the {{Cache.Key}} instance, and it still wouldn't 
need state tracking changes around the FS?

> Enforce that FileSystem initializes only a single instance of the requested 
> FileSystem.
> ---
>
> Key: HADOOP-13726
> URL: https://issues.apache.org/jira/browse/HADOOP-13726
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Chris Nauroth
>Assignee: Manjunath Anand
>
> The {{FileSystem}} cache is intended to guarantee reuse of instances by 
> multiple call sites or multiple threads.  The current implementation does 
> provide this guarantee, but there is a brief race condition window during 
> which multiple threads could perform redundant initialization.  If the file 
> system implementation has expensive initialization logic, then this is 
> wasteful.  This issue proposes to eliminate that race condition and guarantee 
> initialization of only a single instance.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13726) Enforce that FileSystem initializes only a single instance of the requested FileSystem.

2017-04-12 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966952#comment-15966952
 ] 

Chris Nauroth commented on HADOOP-13726:


bq. ...maybe the {{FileSystem#initialize}} call could be moved inside a 
{{synchronized}} block on the Cache.Key instance...?

No, I'm wrong about this, because it's a different {{Key}} instance on each 
thread, and therefore multiple threads wouldn't be coordinating on the same 
lock.
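
To make that concrete, a minimal sketch with a hypothetical {{Key}} class (not 
the real {{FileSystem.Cache}} internals):

{code}
// Minimal sketch with a hypothetical Key class: each thread builds its own
// Key for the same URI, so synchronized (key) locks two different monitors
// and coordinates nothing.
import java.net.URI;

public class KeyLockDemo {
  static final class Key {
    final String scheme;
    final String authority;
    Key(URI uri) {
      this.scheme = uri.getScheme();
      this.authority = uri.getAuthority();
    }
    // equals()/hashCode() could make the keys compare equal, but intrinsic
    // locks are per-object, not per-equality-class.
  }

  public static void main(String[] args) {
    URI uri = URI.create("s3a://my-bucket/");
    Key k1 = new Key(uri); // as built by thread 1
    Key k2 = new Key(uri); // as built by thread 2
    System.out.println(k1 == k2); // false: synchronized (k1) and
                                  // synchronized (k2) do not exclude each other
  }
}
{code}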

> Enforce that FileSystem initializes only a single instance of the requested 
> FileSystem.
> ---
>
> Key: HADOOP-13726
> URL: https://issues.apache.org/jira/browse/HADOOP-13726
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Chris Nauroth
>Assignee: Manjunath Anand
>
> The {{FileSystem}} cache is intended to guarantee reuse of instances by 
> multiple call sites or multiple threads.  The current implementation does 
> provide this guarantee, but there is a brief race condition window during 
> which multiple threads could perform redundant initialization.  If the file 
> system implementation has expensive initialization logic, then this is 
> wasteful.  This issue proposes to eliminate that race condition and guarantee 
> initialization of only a single instance.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13726) Enforce that FileSystem initializes only a single instance of the requested FileSystem.

2017-04-12 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966754#comment-15966754
 ] 

Chris Nauroth commented on HADOOP-13726:


My understanding is that with use of the Guava {{Cache}}, we'd effectively 
achieve per-FS locking granularity already, not a lock over the whole {{Cache}} 
instance, and we'd achieve it without changing state tracking around the FS 
instances.  Multiple threads attempting to retrieve the same FS would block 
waiting for the first thread to finish initialization, but threads retrieving a 
different FS could proceed concurrently.

If you're trying to achieve it without using the Guava {{Cache}} at all, then 
maybe the {{FileSystem#initialize}} call could be moved inside a 
{{synchronized}} block on the {{Cache.Key}} instance, and it still wouldn't 
need state tracking changes around the FS?

> Enforce that FileSystem initializes only a single instance of the requested 
> FileSystem.
> ---
>
> Key: HADOOP-13726
> URL: https://issues.apache.org/jira/browse/HADOOP-13726
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Chris Nauroth
>Assignee: Manjunath Anand
>
> The {{FileSystem}} cache is intended to guarantee reuse of instances by 
> multiple call sites or multiple threads.  The current implementation does 
> provide this guarantee, but there is a brief race condition window during 
> which multiple threads could perform redundant initialization.  If the file 
> system implementation has expensive initialization logic, then this is 
> wasteful.  This issue proposes to eliminate that race condition and guarantee 
> initialization of only a single instance.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-14301) Deprecate SharedInstanceProfileCredentialsProvider in branch-2.

2017-04-12 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HADOOP-14301.

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0

+1 and committed to branch-2.  [~liuml07], thank you for the patch.

> Deprecate SharedInstanceProfileCredentialsProvider in branch-2.
> ---
>
> Key: HADOOP-14301
> URL: https://issues.apache.org/jira/browse/HADOOP-14301
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.9.0
>
> Attachments: HADOOP-14248-branch-2.002.patch
>
>
> [HADOOP-13727] added the {{SharedInstanceProfileCredentialsProvider}}, which 
> effectively reduces high number of connections to EC2 Instance Metadata 
> Service caused by {{InstanceProfileCredentialsProvider}}. That patch, in 
> order to prevent the throttling problem, defined new class 
> {{SharedInstanceProfileCredentialsProvider}} as a subclass of 
> {{InstanceProfileCredentialsProvider}}, which enforces creation of only a 
> single instance.
> Per [HADOOP-13050], we upgraded the AWS Java SDK. Since then, the 
> {{InstanceProfileCredentialsProvider}} in SDK code internally enforces a 
> singleton. {{SharedInstanceProfileCredentialsProvider}} can be deprecated as 
> of release 2.9.0.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14248) Retire SharedInstanceProfileCredentialsProvider in trunk.

2017-04-12 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-14248:
---
   Resolution: Fixed
 Hadoop Flags: Incompatible change,Reviewed  (was: Incompatible change)
Fix Version/s: 3.0.0-alpha3
   Status: Resolved  (was: Patch Available)

+1 and committed to both trunk and branch-2.  The new JIRA HADOOP-14301 tracks 
the branch-2 commit of the deprecation, for inclusion in 2.9.0.  I also made 
that a sub-task of HADOOP-13204, which is also targeted to 2.9.0.  For this 
one, I converted to a top-level issue, because it wouldn't make sense for an 
issue targeted to 3.0.0-alpha3 to be a sub-task of an issue targeted to 2.9.0.  
(If there is an umbrella S3A JIRA targeted to 3.x that I'm not aware of, please 
feel free to move it under there.)

Thank you for the patch, [~liuml07].


> Retire SharedInstanceProfileCredentialsProvider in trunk.
> -
>
> Key: HADOOP-14248
> URL: https://issues.apache.org/jira/browse/HADOOP-14248
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha3
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 3.0.0-alpha3
>
> Attachments: HADOOP-14248.000.patch, HADOOP-14248.001.patch, 
> HADOOP-14248-branch-2.001.patch, HADOOP-14248-branch-2.002.patch
>
>
> This is from the discussion in [HADOOP-13050].
> So [HADOOP-13727] added the SharedInstanceProfileCredentialsProvider, which 
> effectively reduces high number of connections to EC2 Instance Metadata 
> Service caused by InstanceProfileCredentialsProvider. That patch, in order to 
> prevent the throttling problem, defined new class 
> {{SharedInstanceProfileCredentialsProvider}} as a subclass of 
> {{InstanceProfileCredentialsProvider}}, which enforces creation of only a 
> single instance.
> Per [HADOOP-13050], we upgraded the AWS Java SDK. Since then, the 
> {{InstanceProfileCredentialsProvider}} in SDK code internally enforces a 
> singleton. That  confirms that our effort in [HADOOP-13727] makes 100% sense. 
> Meanwhile, {{SharedInstanceProfileCredentialsProvider}} can retire gracefully 
> in trunk branch.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14248) Retire SharedInstanceProfileCredentialsProvider in trunk.

2017-04-12 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-14248:
---
Issue Type: Improvement  (was: Bug)

> Retire SharedInstanceProfileCredentialsProvider in trunk.
> -
>
> Key: HADOOP-14248
> URL: https://issues.apache.org/jira/browse/HADOOP-14248
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha3
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HADOOP-14248.000.patch, HADOOP-14248.001.patch, 
> HADOOP-14248-branch-2.001.patch, HADOOP-14248-branch-2.002.patch
>
>
> This is from the discussion in [HADOOP-13050].
> So [HADOOP-13727] added the SharedInstanceProfileCredentialsProvider, which 
> effectively reduces high number of connections to EC2 Instance Metadata 
> Service caused by InstanceProfileCredentialsProvider. That patch, in order to 
> prevent the throttling problem, defined new class 
> {{SharedInstanceProfileCredentialsProvider}} as a subclass of 
> {{InstanceProfileCredentialsProvider}}, which enforces creation of only a 
> single instance.
> Per [HADOOP-13050], we upgraded the AWS Java SDK. Since then, the 
> {{InstanceProfileCredentialsProvider}} in SDK code internally enforces a 
> singleton. That  confirms that our effort in [HADOOP-13727] makes 100% sense. 
> Meanwhile, {{SharedInstanceProfileCredentialsProvider}} can retire gracefully 
> in trunk branch.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14248) Retire SharedInstanceProfileCredentialsProvider in trunk.

2017-04-12 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-14248:
---
Issue Type: Bug  (was: Sub-task)
Parent: (was: HADOOP-13204)

> Retire SharedInstanceProfileCredentialsProvider in trunk.
> -
>
> Key: HADOOP-14248
> URL: https://issues.apache.org/jira/browse/HADOOP-14248
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha3
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HADOOP-14248.000.patch, HADOOP-14248.001.patch, 
> HADOOP-14248-branch-2.001.patch, HADOOP-14248-branch-2.002.patch
>
>
> This is from the discussion in [HADOOP-13050].
> So [HADOOP-13727] added the SharedInstanceProfileCredentialsProvider, which 
> effectively reduces high number of connections to EC2 Instance Metadata 
> Service caused by InstanceProfileCredentialsProvider. That patch, in order to 
> prevent the throttling problem, defined new class 
> {{SharedInstanceProfileCredentialsProvider}} as a subclass of 
> {{InstanceProfileCredentialsProvider}}, which enforces creation of only a 
> single instance.
> Per [HADOOP-13050], we upgraded the AWS Java SDK. Since then, the 
> {{InstanceProfileCredentialsProvider}} in SDK code internally enforces a 
> singleton. That  confirms that our effort in [HADOOP-13727] makes 100% sense. 
> Meanwhile, {{SharedInstanceProfileCredentialsProvider}} can retire gracefully 
> in trunk branch.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14248) Retire SharedInstanceProfileCredentialsProvider in trunk.

2017-04-12 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-14248:
---
Summary: Retire SharedInstanceProfileCredentialsProvider in trunk.  (was: 
Retire SharedInstanceProfileCredentialsProvider in trunk; deprecate in branch-2)

> Retire SharedInstanceProfileCredentialsProvider in trunk.
> -
>
> Key: HADOOP-14248
> URL: https://issues.apache.org/jira/browse/HADOOP-14248
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha3
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HADOOP-14248.000.patch, HADOOP-14248.001.patch, 
> HADOOP-14248-branch-2.001.patch, HADOOP-14248-branch-2.002.patch
>
>
> This is from the discussion in [HADOOP-13050].
> So [HADOOP-13727] added the SharedInstanceProfileCredentialsProvider, which 
> effectively reduces high number of connections to EC2 Instance Metadata 
> Service caused by InstanceProfileCredentialsProvider. That patch, in order to 
> prevent the throttling problem, defined new class 
> {{SharedInstanceProfileCredentialsProvider}} as a subclass of 
> {{InstanceProfileCredentialsProvider}}, which enforces creation of only a 
> single instance.
> Per [HADOOP-13050], we upgraded the AWS Java SDK. Since then, the 
> {{InstanceProfileCredentialsProvider}} in SDK code internally enforces a 
> singleton. That  confirms that our effort in [HADOOP-13727] makes 100% sense. 
> Meanwhile, {{SharedInstanceProfileCredentialsProvider}} can retire gracefully 
> in trunk branch.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14301) Deprecate SharedInstanceProfileCredentialsProvider in branch-2.

2017-04-12 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-14301:
---
Attachment: HADOOP-14248-branch-2.002.patch

> Deprecate SharedInstanceProfileCredentialsProvider in branch-2.
> ---
>
> Key: HADOOP-14301
> URL: https://issues.apache.org/jira/browse/HADOOP-14301
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HADOOP-14248-branch-2.002.patch
>
>
> [HADOOP-13727] added the {{SharedInstanceProfileCredentialsProvider}}, which 
> effectively reduces high number of connections to EC2 Instance Metadata 
> Service caused by {{InstanceProfileCredentialsProvider}}. That patch, in 
> order to prevent the throttling problem, defined new class 
> {{SharedInstanceProfileCredentialsProvider}} as a subclass of 
> {{InstanceProfileCredentialsProvider}}, which enforces creation of only a 
> single instance.
> Per [HADOOP-13050], we upgraded the AWS Java SDK. Since then, the 
> {{InstanceProfileCredentialsProvider}} in SDK code internally enforces a 
> singleton. {{SharedInstanceProfileCredentialsProvider}} can be deprecated as 
> of release 2.9.0.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14301) Deprecate SharedInstanceProfileCredentialsProvider in branch-2.

2017-04-12 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966210#comment-15966210
 ] 

Chris Nauroth commented on HADOOP-14301:


This issue is spun off from HADOOP-14248 just for the sake of tracking separate 
release notes ("removal" vs. "deprecation") in different release lines.  The 
patch already went through pre-commit on HADOOP-14248, so we won't wait for 
pre-commit again here.

> Deprecate SharedInstanceProfileCredentialsProvider in branch-2.
> ---
>
> Key: HADOOP-14301
> URL: https://issues.apache.org/jira/browse/HADOOP-14301
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HADOOP-14248-branch-2.002.patch
>
>
> [HADOOP-13727] added the {{SharedInstanceProfileCredentialsProvider}}, which 
> effectively reduces high number of connections to EC2 Instance Metadata 
> Service caused by {{InstanceProfileCredentialsProvider}}. That patch, in 
> order to prevent the throttling problem, defined new class 
> {{SharedInstanceProfileCredentialsProvider}} as a subclass of 
> {{InstanceProfileCredentialsProvider}}, which enforces creation of only a 
> single instance.
> Per [HADOOP-13050], we upgraded the AWS Java SDK. Since then, the 
> {{InstanceProfileCredentialsProvider}} in SDK code internally enforces a 
> singleton. {{SharedInstanceProfileCredentialsProvider}} can be deprecated as 
> of release 2.9.0.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14301) Deprecate SharedInstanceProfileCredentialsProvider in branch-2.

2017-04-12 Thread Chris Nauroth (JIRA)
Chris Nauroth created HADOOP-14301:
--

 Summary: Deprecate SharedInstanceProfileCredentialsProvider in 
branch-2.
 Key: HADOOP-14301
 URL: https://issues.apache.org/jira/browse/HADOOP-14301
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Mingliang Liu
Assignee: Mingliang Liu


[HADOOP-13727] added the {{SharedInstanceProfileCredentialsProvider}}, which 
effectively reduces high number of connections to EC2 Instance Metadata Service 
caused by {{InstanceProfileCredentialsProvider}}. That patch, in order to 
prevent the throttling problem, defined new class 
{{SharedInstanceProfileCredentialsProvider}} as a subclass of 
{{InstanceProfileCredentialsProvider}}, which enforces creation of only a 
single instance.

Per [HADOOP-13050], we upgraded the AWS Java SDK. Since then, the 
{{InstanceProfileCredentialsProvider}} in SDK code internally enforces a 
singleton. {{SharedInstanceProfileCredentialsProvider}} can be deprecated as of 
release 2.9.0.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14248) Retire SharedInstanceProfileCredentialsProvider in trunk; deprecate in branch-2

2017-04-11 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964937#comment-15964937
 ] 

Chris Nauroth commented on HADOOP-14248:


[~liuml07], this looks good now, and I confirmed a full test run against 
US-west-2.

However, I'm now thinking that we need two separate JIRA issues, just for the 
sake of accurate tracking against target versions and release notes.  
HADOOP-14248 would track the removal from trunk (already covered by the current 
release note), and the new issue would be targeted to 2.9.0 with a different 
release note describing deprecation instead of removal.  (No need to repeat 
pre-commit.  I'd just comment on the new JIRA that pre-commit for branch-2 was 
already covered here.)

Do you think that makes sense?  If so, I'd be happy to be the JIRA janitor and 
finish off committing this.  :-)

> Retire SharedInstanceProfileCredentialsProvider in trunk; deprecate in 
> branch-2
> ---
>
> Key: HADOOP-14248
> URL: https://issues.apache.org/jira/browse/HADOOP-14248
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha3
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HADOOP-14248.000.patch, HADOOP-14248.001.patch, 
> HADOOP-14248-branch-2.001.patch, HADOOP-14248-branch-2.002.patch
>
>
> This is from the discussion in [HADOOP-13050].
> So [HADOOP-13727] added the SharedInstanceProfileCredentialsProvider, which 
> effectively reduces high number of connections to EC2 Instance Metadata 
> Service caused by InstanceProfileCredentialsProvider. That patch, in order to 
> prevent the throttling problem, defined new class 
> {{SharedInstanceProfileCredentialsProvider}} as a subclass of 
> {{InstanceProfileCredentialsProvider}}, which enforces creation of only a 
> single instance.
> Per [HADOOP-13050], we upgraded the AWS Java SDK. Since then, the 
> {{InstanceProfileCredentialsProvider}} in SDK code internally enforces a 
> singleton. That  confirms that our effort in [HADOOP-13727] makes 100% sense. 
> Meanwhile, {{SharedInstanceProfileCredentialsProvider}} can retire gracefully 
> in trunk branch.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14248) Retire SharedInstanceProfileCredentialsProvider in trunk; deprecate in branch-2

2017-04-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963537#comment-15963537
 ] 

Chris Nauroth commented on HADOOP-14248:


Hello [~liuml07].  This looks good overall.  I have a comment on the branch-2 
patch.

{code}
   private SharedInstanceProfileCredentialsProvider() {
-    super();
+    InstanceProfileCredentialsProvider.getInstance();
   }
{code}

I don't think this change is necessary.  The call to 
{{InstanceProfileCredentialsProvider#getInstance()}} returns an instance 
(always the same one now that we've upgraded the AWS SDK), but then it never 
saves a reference to that instance or does anything else with it.
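
For illustration only, here is a sketch of the shape being suggested, as a 
hypothetical standalone class rather than the actual patch (the real class 
subclasses the SDK provider): delegate to the SDK singleton at the point where 
credentials are requested, and leave the constructor empty.

{code}
// Sketch only; not the actual branch-2 code. The point: delegate to the SDK
// singleton where credentials are requested, rather than calling
// getInstance() in the constructor and discarding the result.
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.InstanceProfileCredentialsProvider;

@Deprecated
public final class SharedProviderSketch {
  private static final SharedProviderSketch INSTANCE = new SharedProviderSketch();

  private SharedProviderSketch() {
    // Nothing to do here: the SDK provider is itself a singleton now.
  }

  public static SharedProviderSketch getInstance() {
    return INSTANCE;
  }

  public AWSCredentials getCredentials() {
    return InstanceProfileCredentialsProvider.getInstance().getCredentials();
  }
}
{code}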

> Retire SharedInstanceProfileCredentialsProvider in trunk; deprecate in 
> branch-2
> ---
>
> Key: HADOOP-14248
> URL: https://issues.apache.org/jira/browse/HADOOP-14248
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha3
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HADOOP-14248.000.patch, HADOOP-14248.001.patch, 
> HADOOP-14248-branch-2.001.patch
>
>
> This is from the discussion in [HADOOP-13050].
> So [HADOOP-13727] added the SharedInstanceProfileCredentialsProvider, which 
> effectively reduces high number of connections to EC2 Instance Metadata 
> Service caused by InstanceProfileCredentialsProvider. That patch, in order to 
> prevent the throttling problem, defined new class 
> {{SharedInstanceProfileCredentialsProvider}} as a subclass of 
> {{InstanceProfileCredentialsProvider}}, which enforces creation of only a 
> single instance.
> Per [HADOOP-13050], we upgraded the AWS Java SDK. Since then, the 
> {{InstanceProfileCredentialsProvider}} in SDK code internally enforces a 
> singleton. That  confirms that our effort in [HADOOP-13727] makes 100% sense. 
> Meanwhile, {{SharedInstanceProfileCredentialsProvider}} can retire gracefully 
> in trunk branch.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13726) Enforce that FileSystem initializes only a single instance of the requested FileSystem.

2017-04-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963519#comment-15963519
 ] 

Chris Nauroth commented on HADOOP-13726:


Thank you, [~manju_hadoop]!  Your last comment looks to me like a good way to 
go.  Please feel free to attach a patch file as described in the 
[HowToContribute|https://wiki.apache.org/hadoop/HowToContribute] wiki page.

bq. ...if the thread which succeeded in getting the lock throws an exception 
during FileSystem initialization, then all other threads waiting for the result 
will get ExecutionException and would not retry serially...

It's good that you remapped the {{ExecutionException}} back to {{IOException}} 
in your example.  Typical callers are equipped to handle an {{IOException}}.  I 
think this is acceptable, as there has never been any stated contract around 
{{FileSystem#get}} retrying internally.  Calling code that wants to be 
resilient against transient failure already must have retry logic of its own.
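
For illustration, a minimal sketch of that remapping, using toy {{String}} 
keys and values rather than the real {{FileSystem}} types:

{code}
// Minimal sketch of unwrapping Guava's ExecutionException back to the
// IOException that FileSystem#get callers already handle.
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class FsCacheSketch {
  private final LoadingCache<String, String> cache = CacheBuilder.newBuilder()
      .build(new CacheLoader<String, String>() {
        @Override
        public String load(String key) throws IOException {
          // Expensive initialization would happen here; if it throws, every
          // thread waiting on this key sees an ExecutionException.
          return "fs-for-" + key;
        }
      });

  public String get(String key) throws IOException {
    try {
      return cache.get(key);
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause; // rethrow what the load threw
      }
      throw new IOException(cause); // wrap anything unexpected
    }
  }
}
{code}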

> Enforce that FileSystem initializes only a single instance of the requested 
> FileSystem.
> ---
>
> Key: HADOOP-13726
> URL: https://issues.apache.org/jira/browse/HADOOP-13726
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Chris Nauroth
>Assignee: Manjunath Anand
>
> The {{FileSystem}} cache is intended to guarantee reuse of instances by 
> multiple call sites or multiple threads.  The current implementation does 
> provide this guarantee, but there is a brief race condition window during 
> which multiple threads could perform redundant initialization.  If the file 
> system implementation has expensive initialization logic, then this is 
> wasteful.  This issue proposes to eliminate that race condition and guarantee 
> initialization of only a single instance.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14176) distcp reports beyond physical memory limits on 2.X

2017-03-15 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927201#comment-15927201
 ] 

Chris Nauroth commented on HADOOP-14176:


[~jrottinghuis], thanks for some great suggestions on this.  I agree that:

* If DistCp overrides {{mapreduce.map.memory.mb}}, then a corresponding 
override of {{mapreduce.map.java.opts}} would help.
* Considering the uber-task case and tuning 
{{yarn.app.mapreduce.am.resource.mb}} is a good idea.
* Configuration options for reducers aren't doing anything and could 
potentially be removed.

This will be a small patch that is unfortunately difficult to test 
comprehensively.  For example, does anyone have expectations that global 
changes to {{mapred.child.java.opts}} in mapred-site.xml would propagate down 
to DistCp?  That would stop happening after this change.

It looks like similar discussion concluded that MAPREDUCE-5785 was backward 
incompatible, and therefore it only went to the 3.x line.
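
To illustrate the first point above, a hedged sketch of pairing the two 
overrides on a job configuration; the -Xmx value is just an illustrative 
roughly-80% of the container size, not a value from any patch:

{code}
// Sketch, not the actual DistCp code: if a tool overrides the map container
// memory request, it should override the map JVM heap flag consistently,
// rather than relying on a global mapred.child.java.opts setting.
import org.apache.hadoop.conf.Configuration;

public class DistCpMemorySketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.setInt("mapreduce.map.memory.mb", 1024);    // container size in MB
    conf.set("mapreduce.map.java.opts", "-Xmx819m"); // heap sized to fit it
    System.out.println(conf.get("mapreduce.map.java.opts"));
  }
}
{code}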

> distcp reports beyond physical memory limits on 2.X
> ---
>
> Key: HADOOP-14176
> URL: https://issues.apache.org/jira/browse/HADOOP-14176
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HADOOP-14176-branch-2.001.patch, 
> HADOOP-14176-branch-2.002.patch
>
>
> When I run distcp, I get errors such as the following:
> {quote}
> 17/02/21 15:31:18 INFO mapreduce.Job: Task Id : 
> attempt_1487645941615_0037_m_03_0, Status : FAILED
> Container [pid=24661,containerID=container_1487645941615_0037_01_05] is 
> running beyond physical memory limits. Current usage: 1.1 GB of 1 GB physical 
> memory used; 4.0 GB of 5 GB virtual memory used. Killing container.
> Dump of the process-tree for container_1487645941615_0037_01_05 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 24661 24659 24661 24661 (bash) 0 0 108650496 301 /bin/bash -c 
> /usr/lib/jvm/java/bin/java -Djava.net.preferIPv4Stack=true 
> -Dhadoop.metrics.log.level=WARN  -Xmx2120m 
> -Djava.io.tmpdir=/mnt/disk4/yarn/usercache/hadoop/appcache/application_1487645941615_0037/container_1487645941615_0037_01_05/tmp
>  -Dlog4j.configuration=container-log4j.properties 
> -Dyarn.app.container.log.dir=/mnt/disk2/log/hadoop-yarn/containers/application_1487645941615_0037/container_1487645941615_0037_01_05
>  -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA 
> -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 192.168.1.208 
> 44048 attempt_1487645941615_0037_m_03_0 5 
> 1>/mnt/disk2/log/hadoop-yarn/containers/application_1487645941615_0037/container_1487645941615_0037_01_05/stdout
>  
> 2>/mnt/disk2/log/hadoop-yarn/containers/application_1487645941615_0037/container_1487645941615_0037_01_05/stderr
> |- 24665 24661 24661 24661 (java) 1766 336 4235558912 280699 
> /usr/lib/jvm/java/bin/java -Djava.net.preferIPv4Stack=true 
> -Dhadoop.metrics.log.level=WARN -Xmx2120m 
> -Djava.io.tmpdir=/mnt/disk4/yarn/usercache/hadoop/appcache/application_1487645941615_0037/container_1487645941615_0037_01_05/tmp
>  -Dlog4j.configuration=container-log4j.properties 
> -Dyarn.app.container.log.dir=/mnt/disk2/log/hadoop-yarn/containers/application_1487645941615_0037/container_1487645941615_0037_01_05
>  -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA 
> -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 192.168.1.208 
> 44048 attempt_1487645941615_0037_m_03_0 5
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
> {quote}
> Digging into the code, I find this happens because the distcp configuration 
> overrides mapred-site.xml:
> {code}
> <property>
>   <name>mapred.job.map.memory.mb</name>
>   <value>1024</value>
> </property>
> <property>
>   <name>mapred.job.reduce.memory.mb</name>
>   <value>1024</value>
> </property>
> {code}
> When mapreduce.map.java.opts and mapreduce.map.memory.mb are set in 
> mapred-default.xml with values larger than those in 
> distcp-default.xml, this error can occur.
> We should remove those two configurations from distcp-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-13726) Enforce that FileSystem initializes only a single instance of the requested FileSystem.

2017-03-15 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HADOOP-13726:
--

Assignee: Manjunath Anand

bq. I would like to work on this JIRA (if you don't mind)...

Thanks very much for taking it on!  I have assigned the issue to you.

bq. I think the main concern of Chris (and myself) is not the operations which 
fail, it's those that block for a long time before failing.

Yes, that's my concern.  If opening a socket for one kind of {{FileSystem}} 
blocked initialization of any other kind of {{FileSystem}} in the process, then 
that would be a performance regression from the current implementation.

bq. I could see that if the hash code is the same for, say, two similar keys 
which are passed to computeIfAbsent concurrently, then one of them waits for 
the other to complete, but if the hash codes of the keys are different then 
they don't block each other.

I appreciate the testing and measurement.  Unfortunately, it's difficult to 
build complete confidence with this kind of testing.  For example, if lock 
granularity is based on hash bucket within the hash table, such that 2 threads 
operating on 2 distinct {{FileSystem}} keys generate different hash codes, but 
land in the same hash bucket, then testing wouldn't expose that blocking unless 
we were lucky enough to get just the right hash codes.

I'd prefer that the JavaDocs have a concrete statement about locking 
granularity, but we don't have that.  Barring that, I think we're left with 
code inspection.  I read through {{computeIfAbsent}} again.  It's very tricky 
code, but I did notice that the mapping function can be called while holding a 
lock on an internal {{TreeBin}} class.

http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/8c93eb3fa1c0/src/share/classes/java/util/concurrent/ConcurrentHashMap.java#l2718

From what I understand of this code so far, I interpret that it is possible 
for one stalled initialization to block others that are attempting to insert 
to the same {{TreeBin}}.

bq. Can you or Chris Nauroth please provide an approximate initial value for 
the ConcurrentHashMap to be used.

I don't think it's easily predictable or consistent across different 
applications.  For something like a short-running Hive query, I expect 
relatively few {{FileSystem}} instances.  For something like Oozie, it's a 
long-running process that proxies activity to multiple {{FileSystem}} instances 
on behalf of multiple users.  Since {{UserGroupInformation}} is part of the 
cache key, there will be numerous unique {{FileSystem}} instances created and 
destroyed during the lifetime of the process, potentially causing the hash 
table to expand and shrink multiple times.  The current implementation just 
uses the default {{HashMap}} size by calling the default constructor.

Having dug into Guava's 
[{{LoadingCache#get}}|http://google.github.io/guava/releases/snapshot/api/docs/com/google/common/cache/LoadingCache.html#get-K-]
 more, this looks like it has the locking granularity that we want.

{quote}
If another call to get(K) or getUnchecked(K) is currently loading the value for 
key, simply waits for that thread to finish and returns its loaded value. Note 
that multiple threads can concurrently load values for distinct keys.
{quote}

That's exactly the behavior I was hoping to achieve.
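
As a quick, self-contained check of that quoted behavior (toy string keys 
standing in for the real cache key):

{code}
// Demonstration: a second caller for the same key waits for the in-flight
// load, while a caller for a distinct key proceeds concurrently.
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class PerKeyLoadingDemo {
  public static void main(String[] args) throws Exception {
    final LoadingCache<String, String> cache = CacheBuilder.newBuilder()
        .build(new CacheLoader<String, String>() {
          @Override
          public String load(String key) throws Exception {
            System.out.println(Thread.currentThread().getName()
                + " loading " + key);
            Thread.sleep(2000); // stand-in for slow FileSystem#initialize
            return "fs-for-" + key;
          }
        });

    Runnable s3a = () -> System.out.println(cache.getUnchecked("s3a://bucket"));
    Runnable hdfs = () -> System.out.println(cache.getUnchecked("hdfs://nn"));
    Thread t1 = new Thread(s3a, "t1");  // performs the s3a load
    Thread t2 = new Thread(s3a, "t2");  // waits on t1's load; no second load
    Thread t3 = new Thread(hdfs, "t3"); // loads hdfs concurrently with t1
    t1.start(); t2.start(); t3.start();
    t1.join(); t2.join(); t3.join();
  }
}
{code}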

bq. how do we pass the URI and conf from the getInternal method...

Oh no.  Now I'm stuck.  I don't have a good recommendation for this yet.  My 
initial reaction was to include the {{Configuration}} and {{URI}} in the 
{{Key}} object but omit it from {{hashCode()}} calculation.  Then, the 
{{CacheLoader}} could unpack what it needs from the key.  However, this would 
potentially cause the cache to hold on to {{Configuration}} instances much 
longer and bloat memory footprint.

I'll think on this more and let you know if I come up with anything.


> Enforce that FileSystem initializes only a single instance of the requested 
> FileSystem.
> ---
>
> Key: HADOOP-13726
> URL: https://issues.apache.org/jira/browse/HADOOP-13726
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Chris Nauroth
>Assignee: Manjunath Anand
>
> The {{FileSystem}} cache is intended to guarantee reuse of instances by 
> multiple call sites or multiple threads.  The current implementation does 
> provide this guarantee, but there is a brief race condition window during 
> which multiple threads could perform redundant initialization.  If the file 
> system implementation has expensive initialization logic, then this is 
> wasteful.  This issue proposes to eliminate that race condition and guarantee 
> initialization of only a single instance.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HADOOP-14169) Implement listStatusIterator, listLocatedStatus for ViewFs

2017-03-15 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927079#comment-15927079
 ] 

Chris Nauroth commented on HADOOP-14169:


+1 from me also, pending pre-commit run on the latest revision.  Thank you, 
[~xkrogen].

> Implement listStatusIterator, listLocatedStatus for ViewFs
> --
>
> Key: HADOOP-14169
> URL: https://issues.apache.org/jira/browse/HADOOP-14169
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: viewfs
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Minor
> Attachments: HADOOP-14169.000.patch, HADOOP-14169.001.patch, 
> HADOOP-14169.002.patch
>
>
> Similar to what HADOOP-11812 did for ViewFileSystem, currently ViewFs does 
> not respect optimizations to {{listStatusIterator}} or {{listLocatedStatus}}, 
> using the naive implementations within {{AbstractFileSystem}}. This can cause 
> performance issues if iterating over a large directory, especially if the 
> locations are also needed. This can be easily fixed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14038) Rename ADLS credential properties

2017-03-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905942#comment-15905942
 ] 

Chris Nauroth commented on HADOOP-14038:


bq. Is there any way to deprecate properties with dynamic names, e.g. 
{{dfs.adls..hostname}}?

Sorry, no, I am not aware of any convenient way to achieve this (barring the 
possibility of putting new features into {{Configuration}} to support it).

bq. It fixes the issue but means reloadConfiguration is called for every new 
instance.

I would prefer to avoid that because of the extra I/O reading the files and the 
extra XML parsing.  I thought every {{Configuration}} instance was sharing the 
same static/global {{DeprecationContext}}, so therefore wherever in the code we 
add new deprecations, it would propagate down to all live instances.  Am I 
missing something?
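
To illustrate the shared-state behavior I'm describing, a minimal sketch; the 
rename shown mirrors the ADLS case but is illustrative, not the exact 
deprecation table from the patch:

{code}
// Minimal sketch: Configuration.addDeprecations registers mappings in
// process-global state, so later lookups through any Configuration
// instance see them.
import org.apache.hadoop.conf.Configuration;

public class AdlDeprecationSketch {
  static {
    Configuration.addDeprecations(new Configuration.DeprecationDelta[] {
        new Configuration.DeprecationDelta(
            "dfs.adls.oauth2.access.token.provider.type", // deprecated name
            "fs.adl.oauth2.access.token.provider.type")   // replacement
    });
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    // Setting the deprecated key stores the value under the new key, because
    // the deprecation table is static state shared by all instances.
    conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential");
    System.out.println(conf.get("fs.adl.oauth2.access.token.provider.type"));
  }
}
{code}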

> Rename ADLS credential properties
> -
>
> Key: HADOOP-14038
> URL: https://issues.apache.org/jira/browse/HADOOP-14038
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/adl
>Affects Versions: 3.0.0-alpha3
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Minor
> Attachments: HADOOP-14038.001.patch, HADOOP-14038.002.patch, 
> HADOOP-14038.003.patch, HADOOP-14038.004.patch, HADOOP-14038.005.patch
>
>
> Add ADLS credential configuration properties to {{core-default.xml}}. 
> Set/document the default value for 
> {{dfs.adls.oauth2.access.token.provider.type}} to {{ClientCredential}}.
> Fix {{AdlFileSystem#getAccessTokenProvider}} which implies the provider type 
> is {{Custom}}.
> Fix several unit tests that set {{dfs.adls.oauth2.access.token.provider}} but 
> does not set {{dfs.adls.oauth2.access.token.provider.type}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13736) Change PathMetadata to hold S3AFileStatus instead of FileStatus.

2017-03-06 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898098#comment-15898098
 ] 

Chris Nauroth commented on HADOOP-13736:


I'm unclear on the current status of this patch.  Is it time for another review 
of revision 006?

> Change PathMetadata to hold S3AFileStatus instead of FileStatus.
> 
>
> Key: HADOOP-13736
> URL: https://issues.apache.org/jira/browse/HADOOP-13736
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HADOOP-13736.000.patch, 
> HADOOP-13736-HADOOP-13345.000.patch, HADOOP-13736-HADOOP-13345.001.patch, 
> HADOOP-13736-HADOOP-13345.002.patch, HADOOP-13736-HADOOP-13345.003.patch, 
> HADOOP-13736-HADOOP-13345.004.patch, HADOOP-13736-HADOOP-13345.005.patch, 
> HADOOP-13736-HADOOP-13345.006.patch, HADOOP-13736.wip-01.patch
>
>
> {{S3AFileStatus}} is implemented differently from {{FileStatus}}; for 
> instance, {{S3AFileStatus#isEmptyDirectory()}} is not implemented in 
> {{FileStatus}}. And {{access_time}}, {{block_replication}}, {{owner}}, 
> {{group}} and a few other fields are not meaningful in {{S3AFileStatus}}.  
> So in the scope of S3Guard, it should use {{S3AFileStatus}} instead 
> of {{FileStatus}} in {{PathMetadata}} to avoid casting the types back and 
> forth in S3A. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13726) Enforce that FileSystem initializes only a single instance of the requested FileSystem.

2017-03-06 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898071#comment-15898071
 ] 

Chris Nauroth commented on HADOOP-13726:


That's an interesting observation about {{computeIfAbsent}}, at least for the 
3.x line where we can use Java 8.  I am concerned about this statement from 
[{{computeIfAbsent}} 
JavaDocs|http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html#computeIfAbsent-K-java.util.function.Function-]:

{quote}
Some attempted update operations on this map by other threads may be blocked 
while computation is in progress, so the computation should be short and 
simple...
{quote}

Sometimes {{FileSystem}} initialization is neither short nor simple, involving 
things like network connections and authentication, all of which can suffer 
problematic failure modes like timeouts.  The current code prevents one blocked 
{{FileSystem}} initialization from stalling all other threads accessing the 
cache.  For example, if there is a blocked connection to s3a://my-bucket, then 
only threads attempting to access s3a://my-bucket get blocked.  Threads 
accessing a different {{FileSystem}}, such as hdfs://mylocalcluster can still 
make progress.

From the JavaDocs, I don't see a clear statement of the locking granularity, 
so I don't know if {{computeIfAbsent}} would preserve the current behavior.  
The [code for 
{{computeIfAbsent}}|http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/8c93eb3fa1c0/src/share/classes/java/util/concurrent/ConcurrentHashMap.java#l1643]
 is complex, and I don't have time right now to read it and understand the 
locking granularity.  (It also might be unwise to assume a particular locking 
implementation is common across all possible JVMs.)

This makes me skeptical of {{computeIfAbsent}} as a potential solution for this 
problem, but it's my first time digging this deeply into that method, so I 
might have more to learn here.
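
For reference, a minimal sketch (toy key and value types, not the real cache) 
of the {{computeIfAbsent}} approach under discussion:

{code}
// Sketch of the computeIfAbsent alternative. It does guarantee a single
// initialization per key, but the mapping function runs inside the map's
// internal locking, which is the granularity concern described above.
import java.util.concurrent.ConcurrentHashMap;

public class ComputeIfAbsentSketch {
  private final ConcurrentHashMap<String, String> cache =
      new ConcurrentHashMap<>();

  public String get(String key) {
    return cache.computeIfAbsent(key, k -> {
      // Stand-in for FileSystem#initialize: slow network and authentication
      // work here can block other updates that hash to the same bin.
      return "fs-for-" + k;
    });
  }
}
{code}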

Steve mentioned possibly Guava.  I believe 
[{{LoadingCache}}|https://google.github.io/guava/releases/11.0.2/api/docs/com/google/common/cache/LoadingCache.html]
 / 
[{{CacheLoader}}|https://google.github.io/guava/releases/11.0.2/api/docs/com/google/common/cache/CacheLoader.html]
 do basically what I described in my last comment.  We could potentially review 
that code in more detail to make sure it's a good fit.

> Enforce that FileSystem initializes only a single instance of the requested 
> FileSystem.
> ---
>
> Key: HADOOP-13726
> URL: https://issues.apache.org/jira/browse/HADOOP-13726
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Chris Nauroth
>
> The {{FileSystem}} cache is intended to guarantee reuse of instances by 
> multiple call sites or multiple threads.  The current implementation does 
> provide this guarantee, but there is a brief race condition window during 
> which multiple threads could perform redundant initialization.  If the file 
> system implementation has expensive initialization logic, then this is 
> wasteful.  This issue proposes to eliminate that race condition and guarantee 
> initialization of only a single instance.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14122) Add ADLS to hadoop-cloud-storage-project

2017-03-06 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897948#comment-15897948
 ] 

Chris Nauroth commented on HADOOP-14122:


[~jzhuge], my initial goal with HADOOP-13687 was to create the 
hadoop-cloud-storage artifact and also set the stage for migrating each cloud 
storage filesystem module under the hadoop-cloud-storage-project part of the 
build.  Because of that, my early revisions of the patch included a mass 
movement of the hadoop-azure-datalake codebase.  I backed off on that goal 
though, because HADOOP-13037 wasn't yet committed, and I didn't want to 
invalidate patches in progress there.

There is no reason not to include hadoop-azure-datalake in the combined 
hadoop-cloud-storage artifact.  We just didn't do it at the time, because the 
focus was more on S3A and WASB.

Managing this would get somewhat easier if we get hadoop-azure-datalake to 
branch-2 first.  I just commented on HADOOP-13037 stating that I am +1 for a 
merge to branch-2.

> Add ADLS to hadoop-cloud-storage-project
> 
>
> Key: HADOOP-14122
> URL: https://issues.apache.org/jira/browse/HADOOP-14122
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/adl
>Affects Versions: 3.0.0-alpha3
>Reporter: John Zhuge
>
> Add hadoop-azure-datalake to hadoop-cloud-storage-project.
> HADOOP-13687 did include hadoop-azure-datalake at one point.
> [~cnauroth], could you comment?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13037) Refactor Azure Data Lake Store as an independent FileSystem

2017-03-06 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897903#comment-15897903
 ] 

Chris Nauroth commented on HADOOP-13037:


+1 for backporting to branch-2.

> Refactor Azure Data Lake Store as an independent FileSystem
> ---
>
> Key: HADOOP-13037
> URL: https://issues.apache.org/jira/browse/HADOOP-13037
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/adl
>Reporter: Shrikant Naidu
>Assignee: Vishwajeet Dusane
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-13037-001.patch, HADOOP-13037-002.patch, 
> HADOOP-13037-003.patch, HADOOP-13037-004.patch, HADOOP-13037.005.patch, 
> HADOOP-13037.006.patch, HADOOP-13037 Proposal.pdf
>
>
> The jira proposes an improvement over HADOOP-12666 to remove webhdfs 
> dependencies from the ADL file system client and build out a standalone 
> client. At a high level, this approach would extend the Hadoop file system 
> class to provide an implementation for accessing Azure Data Lake. The scheme 
> used for accessing the file system will continue to be 
> adl://.azuredatalake.net/path/to/file. 
> The Azure Data Lake Cloud Store will continue to provide a webHDFS rest 
> interface. The client will  access the ADLS store using WebHDFS Rest APIs 
> provided by the ADLS store. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13946) Document how HDFS updates timestamps in the FS spec; compare with object stores

2017-03-06 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897890#comment-15897890
 ] 

Chris Nauroth commented on HADOOP-13946:


The content looks great.  I have a few minor copy editing comments.

{code}
* When a file is renamed, it's modification time is not changed, *but the source
  and destination directories have their modification times updated.
{code}

Replace with "its modification time".  I think you were going for italics with 
that '\*', but it's unclosed and rendering the actual '\*' into the page when I 
build the site.

{code}
  the default granularity is 1 houre. If the precision set to zero, access times
{code}

Change to "hours" and "precision is set to zero".

{code}
  `create()` call, or the actual the time which the PUT request was initiated.
{code}

Remove the extra "the".

{code}
 * `FileSystem.chmod()` may update modification times (example: Azure wasb://).
{code}

Markdown is rendering an actual hyperlink that targets "wasb://)", which does 
nothing when clicked.  Can this be suppressed or changed to a link to our WASB 
document?

After that, this patch will be ready to go.


> Document how HDFS updates timestamps in the FS spec; compare with object 
> stores
> ---
>
> Key: HADOOP-13946
> URL: https://issues.apache.org/jira/browse/HADOOP-13946
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation, fs
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-13946-001.patch, HADOOP-13946-002.patch, 
> HADOOP-13946-003.patch, HADOOP-13946-004.patch
>
>
> SPARK-17159 shows that the behavior of when HDFS updates timestamps isn't 
> well documented. Document these in the FS spec.
> I'm not going to add tests for this, as it is so very dependent on FS 
> implementations, as in "POSIX filesystems may behave differently from HDFS". 
> If someone knows what happens there, their contribution is welcome.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13998) initial s3guard preview

2017-02-06 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15854656#comment-15854656
 ] 

Chris Nauroth commented on HADOOP-13998:


+1 for proceeding with a trunk merge vote after resolving the linked issues.  
(I just added the HADOOP-14051 documentation fix mentioned in the last comment.)

> initial s3guard preview
> ---
>
> Key: HADOOP-13998
> URL: https://issues.apache.org/jira/browse/HADOOP-13998
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Steve Loughran
>
> JIRA to link in all the things we think are needed for a preview/merge into 
> trunk



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13589) S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.

2017-01-18 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-13589:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HADOOP-13345
   Status: Resolved  (was: Patch Available)

+1 for patch 005.  It looks good, ignoring the known test failures.  I 
committed this to the HADOOP-13345 feature branch.  [~ste...@apache.org], thank 
you for the patch.

> S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.
> ---
>
> Key: HADOOP-13589
> URL: https://issues.apache.org/jira/browse/HADOOP-13589
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Chris Nauroth
>Assignee: Steve Loughran
> Fix For: HADOOP-13345
>
> Attachments: HADOOP-13589-HADOOP-13345-001.patch, 
> HADOOP-13589-HADOOP-13345-002.patch, HADOOP-13589-HADOOP-13345-002.patch, 
> HADOOP-13589-HADOOP-13345-004.patch, HADOOP-13589-HADOOP-13345-005.patch
>
>
> With S3Guard enabled, S3A must continue to be functionally correct.  The best 
> way to verify this is to execute our existing S3A integration tests in a mode 
> with S3Guard enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13650) S3Guard: Provide command line tools to manipulate metadata store.

2017-01-17 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826793#comment-15826793
 ] 

Chris Nauroth commented on HADOOP-13650:


Thanks for investigating the test failures.  A separate JIRA sounds fine to me. 
 I'd be happy to help with code review and a test run on my machine to make 
sure it works.

> S3Guard: Provide command line tools to manipulate metadata store.
> -
>
> Key: HADOOP-13650
> URL: https://issues.apache.org/jira/browse/HADOOP-13650
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Fix For: HADOOP-13345
>
> Attachments: HADOOP-13650-HADOOP-13345.000.patch, 
> HADOOP-13650-HADOOP-13345.001.patch, HADOOP-13650-HADOOP-13345.002.patch, 
> HADOOP-13650-HADOOP-13345.003.patch, HADOOP-13650-HADOOP-13345.004.patch, 
> HADOOP-13650-HADOOP-13345.005.patch, HADOOP-13650-HADOOP-13345.006.patch, 
> HADOOP-13650-HADOOP-13345.007.patch, HADOOP-13650-HADOOP-13345.008.patch, 
> HADOOP-13650-HADOOP-13345.009.patch
>
>
> Similar systems like EMRFS have CLI tools to manipulate the metadata 
> store, i.e., create or delete a metadata store, or {{import}} and {{sync}} 
> file metadata between the metadata store and S3. 
> http://docs.aws.amazon.com//ElasticMapReduce/latest/ReleaseGuide/emrfs-cli-reference.html
> S3Guard should offer similar functionality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13589) S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.

2017-01-17 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826633#comment-15826633
 ] 

Chris Nauroth commented on HADOOP-13589:


The latest patch looks good to me.  Unfortunately, I can't fully verify it 
because of some test failures that are likely unrelated.  I commented on 
HADOOP-13650 to try to figure out what to do about those test failures.

> S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.
> ---
>
> Key: HADOOP-13589
> URL: https://issues.apache.org/jira/browse/HADOOP-13589
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Chris Nauroth
>Assignee: Steve Loughran
> Attachments: HADOOP-13589-HADOOP-13345-001.patch, 
> HADOOP-13589-HADOOP-13345-002.patch, HADOOP-13589-HADOOP-13345-002.patch, 
> HADOOP-13589-HADOOP-13345-004.patch, HADOOP-13589-HADOOP-13345-005.patch
>
>
> With S3Guard enabled, S3A must continue to be functionally correct.  The best 
> way to verify this is to execute our existing S3A integration tests in a mode 
> with S3Guard enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13650) S3Guard: Provide command line tools to manipulate metadata store.

2017-01-17 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826630#comment-15826630
 ] 

Chris Nauroth commented on HADOOP-13650:


I'm getting failures in {{TestS3GuardTool}}.  Is anyone else seeing this?

{code}
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.427 sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.s3guard.TestS3GuardTool
testInitDynamoDBMetadataStore(org.apache.hadoop.fs.s3a.s3guard.TestS3GuardTool)  Time elapsed: 1.377 sec  <<< FAILURE!
java.lang.AssertionError: expected:<0> but was:<-1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.fs.s3a.s3guard.TestS3GuardTool.testInitDynamoDBMetadataStore(TestS3GuardTool.java:112)
{code}

{code}
Must specify -e ENDPOINT or fs.s3a.s3guard.ddb.endpoint in conf.
{code}

I had specified {{fs.s3a.endpoint}}, but not {{fs.s3a.s3guard.ddb.endpoint}}, 
so I tried adding {{fs.s3a.s3guard.ddb.endpoint}}, and then I got a different 
error:

{code}
2017-01-17 10:59:59,696 [main] INFO  s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(148)) - create metadata store: dynamodb://s3guard_test_init_ddb_table scheme: dynamodb
2017-01-17 11:00:00,092 [main] INFO  json.JsonContent (JsonContent.java:parseJsonContent(74)) - Unable to parse HTTP response content
com.fasterxml.jackson.core.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: [B@2b5825fa; line: 1, column: 2]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1586)
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:521)
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:450)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2631)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:854)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:748)
at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3847)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3792)
at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2355)
at com.amazonaws.protocol.json.JsonContent.parseJsonContent(JsonContent.java:72)
at com.amazonaws.protocol.json.JsonContent.<init>(JsonContent.java:64)
at com.amazonaws.protocol.json.JsonContent.createJsonContent(JsonContent.java:54)
at com.amazonaws.http.JsonErrorResponseHandler.handle(JsonErrorResponseHandler.java:61)
at com.amazonaws.http.JsonErrorResponseHandler.handle(JsonErrorResponseHandler.java:33)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1495)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1167)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:586)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:1722)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:1698)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.describeTable(AmazonDynamoDBClient.java:1096)
at com.amazonaws.services.dynamodbv2.document.Table.describe(Table.java:130)
at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.initTable(DynamoDBMetadataStore.java:572)
at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.initialize(DynamoDBMetadataStore.java:270)
at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.initMetadataStore(S3GuardTool.java:171)
at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool$InitMetadata.run(S3GuardTool.java:257)
{code}

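For anyone hitting the same error, here is a minimal sketch of wiring up both endpoints in a test {{Configuration}}; the region-specific endpoint values below are illustrative assumptions, not values taken from this test run:

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch: set both the S3 endpoint and the S3Guard DynamoDB endpoint,
// matching the "Must specify -e ENDPOINT or fs.s3a.s3guard.ddb.endpoint"
// requirement quoted above. Endpoint values are illustrative; use the
// ones for your own region.
public class S3GuardEndpoints {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.endpoint", "s3.us-west-2.amazonaws.com");
    conf.set("fs.s3a.s3guard.ddb.endpoint", "dynamodb.us-west-2.amazonaws.com");
    System.out.println(conf.get("fs.s3a.s3guard.ddb.endpoint"));
  }
}
{code}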
[jira] [Commented] (HADOOP-13589) S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.

2017-01-11 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818945#comment-15818945
 ] 

Chris Nauroth commented on HADOOP-13589:


Reviewing patch 004, we still need to update {{NullMetadataStore#toString}} to 
avoid the unnecessary {{StringBuilder}}.  Also, some of the contract subclasses 
are now instantiating their own {{Configuration}} while others are calling 
{{super.createConfiguration()}} and then overriding specific properties.  I 
think the latter is better so that these suites would inherit changes in the 
configuration logic from the base class.
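For clarity, the pattern I'd prefer, sketched with illustrative class and property names (not taken from the patch):

{code}
import org.apache.hadoop.conf.Configuration;

// Rough sketch of the preferred pattern: extend the base class's
// configuration instead of instantiating a fresh one, so suites inherit
// any changes to the base configuration logic.
abstract class AbstractContractTestSketch {
  protected Configuration createConfiguration() {
    return new Configuration();
  }
}

class S3AContractSuiteSketch extends AbstractContractTestSketch {
  @Override
  protected Configuration createConfiguration() {
    Configuration conf = super.createConfiguration();
    // override only the specific properties this suite needs
    conf.setBoolean("fs.s3a.example.suite.option", true); // hypothetical key
    return conf;
  }
}
{code}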

This definitely works though for running the contract tests with S3Guard 
enabled and integrated with DynamoDB.  I did a test run both with and without 
the new profiles.

> S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.
> ---
>
> Key: HADOOP-13589
> URL: https://issues.apache.org/jira/browse/HADOOP-13589
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Chris Nauroth
>Assignee: Steve Loughran
> Attachments: HADOOP-13589-HADOOP-13345-001.patch, 
> HADOOP-13589-HADOOP-13345-002.patch, HADOOP-13589-HADOOP-13345-002.patch, 
> HADOOP-13589-HADOOP-13345-004.patch
>
>
> With S3Guard enabled, S3A must continue to be functionally correct.  The best 
> way to verify this is to execute our existing S3A integration tests in a mode 
> with S3Guard enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13589) S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.

2017-01-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15815904#comment-15815904
 ] 

Chris Nauroth commented on HADOOP-13589:


I just realized that this doesn't cover the contract tests.  To achieve that, 
we'd need to update every {{ITestS3A*}} class to call 
{{S3ATestUtils#maybeEnableS3Guard}}.  The repetition is unfortunate, but I 
don't see a way to avoid it.
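Concretely, each suite would repeat roughly this much code (sketch only; the base class and override point are assumptions, while {{S3ATestUtils#maybeEnableS3Guard}} is the hook named above):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.S3ATestUtils;

// Illustrative stand-in for the real contract test base class.
abstract class ContractBaseSketch {
  protected Configuration createConfiguration() {
    return new Configuration();
  }
}

// The same few lines would appear in every ITestS3A* class.
class ITestS3AExampleSketch extends ContractBaseSketch {
  @Override
  protected Configuration createConfiguration() {
    Configuration conf = super.createConfiguration();
    S3ATestUtils.maybeEnableS3Guard(conf); // no-op unless S3Guard is requested
    return conf;
  }
}
{code}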

> S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.
> ---
>
> Key: HADOOP-13589
> URL: https://issues.apache.org/jira/browse/HADOOP-13589
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Chris Nauroth
>Assignee: Steve Loughran
> Attachments: HADOOP-13589-HADOOP-13345-001.patch, 
> HADOOP-13589-HADOOP-13345-002.patch, HADOOP-13589-HADOOP-13345-002.patch
>
>
> With S3Guard enabled, S3A must continue to be functionally correct.  The best 
> way to verify this is to execute our existing S3A integration tests in a mode 
> with S3Guard enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13908) S3Guard: Existing tables may not be initialized correctly in DynamoDBMetadataStore

2017-01-09 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-13908:
---
   Resolution: Fixed
Fix Version/s: HADOOP-13345
   Status: Resolved  (was: Patch Available)

I committed this to the HADOOP-13345 feature branch.  [~liuml07], thank you for 
the patch.

> S3Guard: Existing tables may not be initialized correctly in 
> DynamoDBMetadataStore
> --
>
> Key: HADOOP-13908
> URL: https://issues.apache.org/jira/browse/HADOOP-13908
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: HADOOP-13345
>
> Attachments: HADOOP-13908-HADOOP-13345.000.patch, 
> HADOOP-13908-HADOOP-13345.001.patch, HADOOP-13908-HADOOP-13345.002.patch, 
> HADOOP-13908-HADOOP-13345.002.patch, HADOOP-13908-HADOOP-13345.003.patch, 
> HADOOP-13908-HADOOP-13345.004.patch, HADOOP-13908-HADOOP-13345.005.patch, 
> HADOOP-13908-HADOOP-13345.006.patch
>
>
> This was based on discussion in [HADOOP-13455]. Though we should not create 
> the table unless the config {{fs.s3a.s3guard.ddb.table.create}} is set to true, 
> we still have to get the existing table in 
> {{DynamoDBMetadataStore#initialize()}} and wait for it to become active 
> before any table/item operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13589) S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.

2017-01-09 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813087#comment-15813087
 ] 

Chris Nauroth commented on HADOOP-13589:


This looks great.  Thanks for picking it up.  I tested against us-west-2 with 
and without the new S3Guard options enabled.

Steve, do you want to skip the {{StringBuilder}} in 
{{NullMetadataStore#toString}}, since it always returns the same string 
literal?  With that and a pre-commit run, I think this patch will be ready to 
go.
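In other words, something like this sketch:

{code}
// The store always describes itself the same way, so the StringBuilder
// adds nothing; returning the literal is enough.
class NullMetadataStoreToStringSketch {
  @Override
  public String toString() {
    return "NullMetadataStore";
  }
}
{code}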

> S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.
> ---
>
> Key: HADOOP-13589
> URL: https://issues.apache.org/jira/browse/HADOOP-13589
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Chris Nauroth
>Assignee: Steve Loughran
> Attachments: HADOOP-13589-HADOOP-13345-001.patch
>
>
> With S3Guard enabled, S3A must continue to be functionally correct.  The best 
> way to verify this is to execute our existing S3A integration tests in a mode 
> with S3Guard enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13650) S3Guard: Provide command line tools to manipulate metadata store.

2017-01-09 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812476#comment-15812476
 ] 

Chris Nauroth commented on HADOOP-13650:


[~eddyxu], it looks like this patch needs a rebase.  Would you mind doing that 
before I take another look?  I'd prefer to be able to do a full distro build 
and test the sub-command in action while reviewing the code.

> S3Guard: Provide command line tools to manipulate metadata store.
> -
>
> Key: HADOOP-13650
> URL: https://issues.apache.org/jira/browse/HADOOP-13650
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HADOOP-13650-HADOOP-13345.000.patch, 
> HADOOP-13650-HADOOP-13345.001.patch, HADOOP-13650-HADOOP-13345.002.patch, 
> HADOOP-13650-HADOOP-13345.003.patch, HADOOP-13650-HADOOP-13345.004.patch
>
>
> Similar systems like EMRFS have CLI tools to manipulate the metadata 
> store, e.g., to create or delete the metadata store, or to {{import}} or {{sync}} 
> file metadata between the metadata store and S3. 
> http://docs.aws.amazon.com//ElasticMapReduce/latest/ReleaseGuide/emrfs-cli-reference.html
> S3Guard should offer similar functionality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13929) ADLS should not check in contract-test-options.xml

2017-01-09 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812470#comment-15812470
 ] 

Chris Nauroth commented on HADOOP-13929:


[~jzhuge], there are a lot of nice clean-ups here.  Thank you!  Unfortunately, 
I've lost my configurations for live testing against ADL, so I'm going to need 
to recreate that.  Meanwhile, [~vishwajeet.dusane] and [~ASikaria], if you get 
a test run completed with this patch before me, please comment and let me know.

> ADLS should not check in contract-test-options.xml
> --
>
> Key: HADOOP-13929
> URL: https://issues.apache.org/jira/browse/HADOOP-13929
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, test
>Affects Versions: 3.0.0-alpha2
>Reporter: John Zhuge
>Assignee: John Zhuge
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-13929.001.patch, HADOOP-13929.002.patch, 
> HADOOP-13929.003.patch, HADOOP-13929.004.patch, HADOOP-13929.005.patch, 
> HADOOP-13929.006.patch, HADOOP-13929.007.patch, HADOOP-13929.008.patch
>
>
> Should not check in the file {{contract-test-options.xml}}. Make sure the 
> file is excluded by {{.gitignore}}. Make sure ADLS {{index.md}} provides a 
> complete example of this file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13946) Document how HDFS updates timestamps in the FS spec; compare with object stores

2017-01-09 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812450#comment-15812450
 ] 

Chris Nauroth commented on HADOOP-13946:


bq. creation time? That of mkdir()?

I think I'm confused by the mentions of creation time.  We have mtime and atime 
in {{FileStatus}}.  AFAIK, the inode data structures in the NameNode don't 
track a separate notion of creation time, just mtime and atime.  Is there 
something I've missed?

bq. what operations on child entries update the modtime? mkdir, create, delete, 
rename. And which don't?

Yes, and to this please also add concat.

bq. chmod and set time calls?

chmod (and setacl) and setTimes do not alter any modification times, neither on 
the target path itself nor its parent.
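To make the last point concrete, a rough, hypothetical check against a live HDFS instance (sketch only; the path and permission value are illustrative):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class MtimeCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path("/tmp/mtime-check");
    Path file = new Path(dir, "f");
    fs.mkdirs(dir);
    fs.create(file).close();
    long fileMtime = fs.getFileStatus(file).getModificationTime();
    long dirMtime = fs.getFileStatus(dir).getModificationTime();
    // chmod on the file: per the comment above, neither the file's own
    // mtime nor the parent directory's mtime should change on HDFS.
    fs.setPermission(file, new FsPermission((short) 0640));
    System.out.println("file mtime unchanged: "
        + (fs.getFileStatus(file).getModificationTime() == fileMtime));
    System.out.println("dir mtime unchanged: "
        + (fs.getFileStatus(dir).getModificationTime() == dirMtime));
  }
}
{code}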

> Document how HDFS updates timestamps in the FS spec; compare with object 
> stores
> ---
>
> Key: HADOOP-13946
> URL: https://issues.apache.org/jira/browse/HADOOP-13946
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation, fs
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HADOOP-13946-001.patch, HADOOP-13946-002.patch
>
>
> SPARK-17159 shows that the behavior of when HDFS updates timestamps isn't 
> well documented. Document these in the FS spec.
> I'm not going to add tests for this, as it is so very dependent on FS 
> implementations, as in "POSIX filesystems may behave differently from HDFS". 
> If someone knows what happens there, their contribution is welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13908) S3Guard: Existing tables may not be initialized correctly in DynamoDBMetadataStore

2017-01-09 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812406#comment-15812406
 ] 

Chris Nauroth commented on HADOOP-13908:


[~liuml07], it looks like we'll need to rebase this patch after a commit today. 
 Once that's available, I'd be happy to do a final test run and commit the 
patch.

> S3Guard: Existing tables may not be initialized correctly in 
> DynamoDBMetadataStore
> --
>
> Key: HADOOP-13908
> URL: https://issues.apache.org/jira/browse/HADOOP-13908
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HADOOP-13908-HADOOP-13345.000.patch, 
> HADOOP-13908-HADOOP-13345.001.patch, HADOOP-13908-HADOOP-13345.002.patch, 
> HADOOP-13908-HADOOP-13345.002.patch, HADOOP-13908-HADOOP-13345.003.patch, 
> HADOOP-13908-HADOOP-13345.004.patch, HADOOP-13908-HADOOP-13345.005.patch
>
>
> This was based on discussion in [HADOOP-13455]. Though we should not create 
> the table unless the config {{fs.s3a.s3guard.ddb.table.create}} is set to true, 
> we still have to get the existing table in 
> {{DynamoDBMetadataStore#initialize()}} and wait for it to become active 
> before any table/item operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13336) S3A to support per-bucket configuration

2017-01-09 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812397#comment-15812397
 ] 

Chris Nauroth commented on HADOOP-13336:


I'm in favor of the approach, and there are already plenty of reviewers 
commenting, so I'll focus my code review energies elsewhere.  Thanks, Steve!

> S3A to support per-bucket configuration
> ---
>
> Key: HADOOP-13336
> URL: https://issues.apache.org/jira/browse/HADOOP-13336
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13336-006.patch, HADOOP-13336-007.patch, 
> HADOOP-13336-HADOOP-13345-001.patch, HADOOP-13336-HADOOP-13345-002.patch, 
> HADOOP-13336-HADOOP-13345-003.patch, HADOOP-13336-HADOOP-13345-004.patch, 
> HADOOP-13336-HADOOP-13345-005.patch, HADOOP-13336-HADOOP-13345-006.patch
>
>
> S3A now supports different regions by way of declaring the endpoint, but you 
> can't do things like read in one region and write back in another (e.g. a distcp 
> backup), because only one region can be specified in a configuration.
> If S3A supported region declaration in the URL, e.g. s3a://b1.frankfurt or 
> s3a://b2.seoul, then this would be possible. 
> Swift does this with a full filesystem binding/config: endpoints, username, 
> etc., in the XML file. Would we need to do that much? It'd be simpler 
> initially to use a domain suffix of the URL to set the region of a bucket from 
> the domain and have the AWS library sort the details out itself, maybe with 
> some config options for working with non-AWS infra



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13908) S3Guard: Existing tables may not be initialized correctly in DynamoDBMetadataStore

2017-01-06 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806195#comment-15806195
 ] 

Chris Nauroth commented on HADOOP-13908:


[~liuml07], thank you for the update.  I agree with proceeding with patch 005 
and possibly optimizing the table probe logic in scope of a different JIRA 
issue.  I will hold off committing in case Steve wants to respond again, since 
he made the earlier comment about preferring to avoid {{table.describe()}}.
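For reference, a hedged sketch of the initialization order in question (not the patch's code; error handling simplified). {{Table#waitForActive()}} is the AWS SDK v1 document-API call:

{code}
import java.io.IOException;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Table;

class TableInitSketch {
  // Fetch the existing table and block until it is ACTIVE before any
  // item operations; table creation would be gated separately on
  // fs.s3a.s3guard.ddb.table.create.
  static Table getActiveTable(DynamoDB dynamoDB, String tableName)
      throws IOException {
    Table table = dynamoDB.getTable(tableName);
    try {
      table.waitForActive();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted waiting for table " + tableName, e);
    }
    return table;
  }
}
{code}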

> S3Guard: Existing tables may not be initialized correctly in 
> DynamoDBMetadataStore
> --
>
> Key: HADOOP-13908
> URL: https://issues.apache.org/jira/browse/HADOOP-13908
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HADOOP-13908-HADOOP-13345.000.patch, 
> HADOOP-13908-HADOOP-13345.001.patch, HADOOP-13908-HADOOP-13345.002.patch, 
> HADOOP-13908-HADOOP-13345.002.patch, HADOOP-13908-HADOOP-13345.003.patch, 
> HADOOP-13908-HADOOP-13345.004.patch, HADOOP-13908-HADOOP-13345.005.patch
>
>
> This was based on discussion in [HADOOP-13455]. Though we should not create 
> the table unless the config {{fs.s3a.s3guard.ddb.table.create}} is set to true, 
> we still have to get the existing table in 
> {{DynamoDBMetadataStore#initialize()}} and wait for it to become active 
> before any table/item operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-13908) S3Guard: Existing tables may not be initialized correctly in DynamoDBMetadataStore

2017-01-06 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HADOOP-13908:
--

Assignee: Chris Nauroth  (was: Mingliang Liu)

> S3Guard: Existing tables may not be initialized correctly in 
> DynamoDBMetadataStore
> --
>
> Key: HADOOP-13908
> URL: https://issues.apache.org/jira/browse/HADOOP-13908
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Mingliang Liu
>Assignee: Chris Nauroth
> Attachments: HADOOP-13908-HADOOP-13345.000.patch, 
> HADOOP-13908-HADOOP-13345.001.patch, HADOOP-13908-HADOOP-13345.002.patch, 
> HADOOP-13908-HADOOP-13345.002.patch, HADOOP-13908-HADOOP-13345.003.patch, 
> HADOOP-13908-HADOOP-13345.004.patch
>
>
> This was based on discussion in [HADOOP-13455]. Though we should not create 
> the table unless the config {{fs.s3a.s3guard.ddb.table.create}} is set to true, 
> we still have to get the existing table in 
> {{DynamoDBMetadataStore#initialize()}} and wait for it to become active 
> before any table/item operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13908) S3Guard: Existing tables may not be initialized correctly in DynamoDBMetadataStore

2017-01-06 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-13908:
---
Assignee: Mingliang Liu  (was: Chris Nauroth)

> S3Guard: Existing tables may not be initialized correctly in 
> DynamoDBMetadataStore
> --
>
> Key: HADOOP-13908
> URL: https://issues.apache.org/jira/browse/HADOOP-13908
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HADOOP-13908-HADOOP-13345.000.patch, 
> HADOOP-13908-HADOOP-13345.001.patch, HADOOP-13908-HADOOP-13345.002.patch, 
> HADOOP-13908-HADOOP-13345.002.patch, HADOOP-13908-HADOOP-13345.003.patch, 
> HADOOP-13908-HADOOP-13345.004.patch
>
>
> This was based on discussion in [HADOOP-13455]. Though we should not create 
> the table unless the config {{fs.s3a.s3guard.ddb.table.create}} is set to true, 
> we still have to get the existing table in 
> {{DynamoDBMetadataStore#initialize()}} and wait for it to become active 
> before any table/item operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13931) S3AGuard: Use BatchWriteItem in DynamoDBMetadataStore#put()

2017-01-06 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-13931:
---
   Resolution: Fixed
Fix Version/s: HADOOP-13345
   Status: Resolved  (was: Patch Available)

I have committed this patch to the HADOOP-13345 feature branch.

bq. I found they failed last time because I ran the tests both on my local 
machine and the AWS EC2 vm using the same S3 bucket at the same time, which is 
not supported I believe. Sorry for the last confusing comment.

No worries!   Yes, we try to do our best on test isolation within a single mvn 
run so that multiple test suites can run in parallel, but we don't currently 
attempt to isolate for multiple concurrent mvn runs against the same bucket.  
For some test suites, like the root path tests and the multi-part purge tests, 
it would probably be impossible.

> S3AGuard: Use BatchWriteItem in DynamoDBMetadataStore#put()
> ---
>
> Key: HADOOP-13931
> URL: https://issues.apache.org/jira/browse/HADOOP-13931
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Rajesh Balamohan
>Assignee: Mingliang Liu
>Priority: Minor
> Fix For: HADOOP-13345
>
> Attachments: HADOOP-13931-HADOOP-13345.000.patch, 
> HADOOP-13931-HADOOP-13345.001.patch
>
>
> Using {{batchWriteItem}} might be more performant in 
> {{DynamoDBMetadataStore#put(DirListingMetadata meta)}} and  
> {{DynamoDBMetadataStore#put(PathMetadata meta)}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13929) ADLS should not check in contract-test-options.xml

2017-01-06 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805224#comment-15805224
 ] 

Chris Nauroth commented on HADOOP-13929:


I know John is making more changes, but here is my reply to the most recent 
comments.

Patch 005 deleted contract-test-options.xml, but kept it listed in .gitignore.  
I don't think it needs to remain in .gitignore, because there is no other file 
anywhere in the source tree named contract-test-options.xml, besides the ADL 
one that the patch deletes.

In general though, trying to achieve some commonality across these file names 
and using .gitignore entries that can cover all sub-modules globally sound like 
a great move for future-proofing this.

bq. Another question: can we add another property fs.contract.test.enabled 
(default true to be backwards compatible)?

I'm not sure I completely understand, but does the lack of a file system scheme 
(e.g. {{s3a}}) in this new property mean that it would be global, not tied to a 
specific file system's 
tests?  If it's global, then a weakness of this approach is that for developers 
running {{mvn verify}} at the root of the source tree or at the root of 
hadoop-cloud-storage-project, it would be all or nothing.  If a developer had 
credentials to AWS and Azure but not OpenStack, then they'd need to do 
something different to run just for the modules where they have credentials.

> ADLS should not check in contract-test-options.xml
> --
>
> Key: HADOOP-13929
> URL: https://issues.apache.org/jira/browse/HADOOP-13929
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, test
>Affects Versions: 3.0.0-alpha2
>Reporter: John Zhuge
>Assignee: John Zhuge
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-13929.001.patch, HADOOP-13929.002.patch, 
> HADOOP-13929.003.patch, HADOOP-13929.004.patch, HADOOP-13929.005.patch
>
>
> Should not check in the file {{contract-test-options.xml}}. Make sure the 
> file is excluded by {{.gitignore}}. Make sure ADLS {{index.md}} provides a 
> complete example of this file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13931) S3AGuard: Use BatchWriteItem in DynamoDBMetadataStore#put()

2017-01-05 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-13931:
---
Hadoop Flags: Reviewed

+1 pending pre-commit.

> S3AGuard: Use BatchWriteItem in DynamoDBMetadataStore#put()
> ---
>
> Key: HADOOP-13931
> URL: https://issues.apache.org/jira/browse/HADOOP-13931
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Rajesh Balamohan
>Assignee: Mingliang Liu
>Priority: Minor
> Attachments: HADOOP-13931-HADOOP-13345.000.patch, 
> HADOOP-13931-HADOOP-13345.001.patch
>
>
> Using {{batchWriteItem}} might be more performant in 
> {{DynamoDBMetadataStore#put(DirListingMetadata meta)}} and  
> {{DynamoDBMetadataStore#put(PathMetadata meta)}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13934) S3Guard: DynamoDBMetadataStore#move() could be throwing exception due to BatchWriteItem limits

2017-01-05 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-13934:
---
   Resolution: Fixed
Fix Version/s: HADOOP-13345
   Status: Resolved  (was: Patch Available)

I committed this to the HADOOP-13345 feature branch.  [~liuml07], thank you for 
the patch.

> S3Guard: DynamoDBMetadataStore#move() could be throwing exception due to 
> BatchWriteItem limits
> --
>
> Key: HADOOP-13934
> URL: https://issues.apache.org/jira/browse/HADOOP-13934
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Rajesh Balamohan
>Assignee: Mingliang Liu
>Priority: Minor
> Fix For: HADOOP-13345
>
> Attachments: HADOOP-13934-HADOOP-13345.000.patch, 
> HADOOP-13934-HADOOP-13345.001.patch, HADOOP-13934-HADOOP-13345.002.patch, 
> HADOOP-13934-HADOOP-13345.003.patch, HADOOP-13934-HADOOP-13345.004.patch, 
> HADOOP-13934-HADOOP-13345.005.patch
>
>
> When using {{DynamoDBMetadataStore}} with an insert-heavy Hive app, it 
> started throwing exceptions in {{DynamoDBMetadataStore::move}}. But with just 
> the following exception, it is relatively hard to debug the real issue on 
> the DynamoDB side. 
> {noformat}
> Caused by: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: 1 validation error detected: Value 
> '{ddb-table-name-334=[com.amazonaws.dynamodb.v20120810.WriteRequest@ca1da583, 
> com.amazonaws.dynamodb.v20120810.WriteRequest@ca1fc7cd, 
> com.amazonaws.dynamodb.v20120810.WriteRequest@ca4244e6, 
> com.amazonaws.dynamodb.v20120810.WriteRequest@ca2f58a9, 
> com.amazonaws.dynamodb.v20120810.WriteRequest@ca3525f8,
> ...
> ...
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1529)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1167)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:586)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573)
> at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445)
> at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:1722)
> at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:1698)
> at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.batchWriteItem(AmazonDynamoDBClient.java:668)
> at com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.doBatchWriteItem(BatchWriteItemImpl.java:111)
> at com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.batchWriteItem(BatchWriteItemImpl.java:52)
> at com.amazonaws.services.dynamodbv2.document.DynamoDB.batchWriteItem(DynamoDB.java:178)
> at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.move(DynamoDBMetadataStore.java:351)
> ... 28 more
> {noformat}
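For context on the limit itself: DynamoDB's BatchWriteItem accepts at most 25 write requests per call, so oversized batches trigger exactly this kind of validation error. A sketch of the chunking that keeps each call under the limit (illustrative helper, not the patch's code):

{code}
import java.util.ArrayList;
import java.util.List;

class BatchChunkerSketch {
  // Split a large list of write requests into sublists of at most
  // `size` entries (25 for BatchWriteItem), one sublist per call.
  static <T> List<List<T>> chunk(List<T> items, int size) {
    List<List<T>> chunks = new ArrayList<>();
    for (int i = 0; i < items.size(); i += size) {
      chunks.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
    }
    return chunks;
  }
}
{code}

Each chunk of at most 25 {{WriteRequest}} entries would then be passed to {{batchWriteItem}}, with any unprocessed items retried.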



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13908) S3Guard: Existing tables may not be initialized correctly in DynamoDBMetadataStore

2017-01-05 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-13908:
---
Hadoop Flags: Reviewed

+1 for the patch.  As described by Mingliang, there are some test failures 
already tracked elsewhere.

> S3Guard: Existing tables may not be initialized correctly in 
> DynamoDBMetadataStore
> --
>
> Key: HADOOP-13908
> URL: https://issues.apache.org/jira/browse/HADOOP-13908
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HADOOP-13908-HADOOP-13345.000.patch, 
> HADOOP-13908-HADOOP-13345.001.patch, HADOOP-13908-HADOOP-13345.002.patch, 
> HADOOP-13908-HADOOP-13345.002.patch, HADOOP-13908-HADOOP-13345.003.patch, 
> HADOOP-13908-HADOOP-13345.004.patch
>
>
> This was based on discussion in [HADOOP-13455]. Though we should not create 
> the table unless the config {{fs.s3a.s3guard.ddb.table.create}} is set to true, 
> we still have to get the existing table in 
> {{DynamoDBMetadataStore#initialize()}} and wait for it to become active 
> before any table/item operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13934) S3Guard: DynamoDBMetadataStore#move() could be throwing exception due to BatchWriteItem limits

2017-01-05 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-13934:
---
Hadoop Flags: Reviewed

+1 for the patch.  I think it would be fine to address the issues described in 
Mingliang's last comment in separate JIRAs.

> S3Guard: DynamoDBMetadataStore#move() could be throwing exception due to 
> BatchWriteItem limits
> --
>
> Key: HADOOP-13934
> URL: https://issues.apache.org/jira/browse/HADOOP-13934
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Rajesh Balamohan
>Assignee: Mingliang Liu
>Priority: Minor
> Attachments: HADOOP-13934-HADOOP-13345.000.patch, 
> HADOOP-13934-HADOOP-13345.001.patch, HADOOP-13934-HADOOP-13345.002.patch, 
> HADOOP-13934-HADOOP-13345.003.patch, HADOOP-13934-HADOOP-13345.004.patch, 
> HADOOP-13934-HADOOP-13345.005.patch
>
>
> When using {{DynamoDBMetadataStore}} with an insert-heavy Hive app, it 
> started throwing exceptions in {{DynamoDBMetadataStore::move}}. But with just 
> the following exception, it is relatively hard to debug the real issue on 
> the DynamoDB side. 
> {noformat}
> Caused by: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: 1 validation error detected: Value 
> '{ddb-table-name-334=[com.amazonaws.dynamodb.v20120810.WriteRequest@ca1da583, 
> com.amazonaws.dynamodb.v20120810.WriteRequest@ca1fc7cd, 
> com.amazonaws.dynamodb.v20120810.WriteRequest@ca4244e6, 
> com.amazonaws.dynamodb.v20120810.WriteRequest@ca2f58a9, 
> com.amazonaws.dynamodb.v20120810.WriteRequest@ca3525f8,
> ...
> ...
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1529)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1167)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:586)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573)
> at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445)
> at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:1722)
> at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:1698)
> at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.batchWriteItem(AmazonDynamoDBClient.java:668)
> at com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.doBatchWriteItem(BatchWriteItemImpl.java:111)
> at com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.batchWriteItem(BatchWriteItemImpl.java:52)
> at com.amazonaws.services.dynamodbv2.document.DynamoDB.batchWriteItem(DynamoDB.java:178)
> at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.move(DynamoDBMetadataStore.java:351)
> ... 28 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801890#comment-15801890
 ] 

Chris Nauroth commented on HADOOP-13345:


I would prefer to see HADOOP-13589 completed before merge.  Being able to run 
the existing S3A test suite with S3Guard enabled would help ensure that we're 
maintaining the existing semantics as much as possible as we iterate further on 
the code.

> S3Guard: Improved Consistency for S3A
> -
>
> Key: HADOOP-13345
> URL: https://issues.apache.org/jira/browse/HADOOP-13345
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/s3
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HADOOP-13345.prototype1.patch, 
> S3C-ConsistentListingonS3-Design.pdf, S3GuardImprovedConsistencyforS3A.pdf, 
> S3GuardImprovedConsistencyforS3AV2.pdf, s3c.001.patch
>
>
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a 
> stronger consistency model than what is currently offered.  The solution 
> coordinates with a strongly consistent external store to resolve 
> inconsistencies caused by the S3 eventual consistency model.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13336) S3A to support per-bucket configuration

2017-01-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801879#comment-15801879
 ] 

Chris Nauroth commented on HADOOP-13336:


+1 for the plan described in Steve's last comment.

Longer term, I wonder if we can find some commonality with other configuration 
prefix qualification use cases in the codebase, e.g. the {{DFSUtil}} methods 
for qualifying NameNode address configuration by nameservice in an HA or 
federated deployment.  Perhaps all such use cases could be handled by common 
utilities.  No need to worry about this in scope of this JIRA though.  We can 
always refactor later if we find commonality.
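To illustrate the kind of prefix qualification I mean, a sketch (the key layout is an assumption, not necessarily the patch's final scheme):

{code}
import org.apache.hadoop.conf.Configuration;

class PerBucketConfigSketch {
  // Resolve a per-bucket override such as
  // "fs.s3a.bucket.mybucket.endpoint" before falling back to the
  // global key, e.g. "fs.s3a.endpoint".
  static String getBucketOption(Configuration conf, String bucket, String baseKey) {
    String suffix = baseKey.substring("fs.s3a.".length()); // e.g. "endpoint"
    String perBucket = conf.getTrimmed("fs.s3a.bucket." + bucket + "." + suffix);
    return perBucket != null ? perBucket : conf.getTrimmed(baseKey);
  }
}
{code}

A common utility along these lines could then serve both the S3A case and the {{DFSUtil}}-style nameservice qualification.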

> S3A to support per-bucket configuration
> ---
>
> Key: HADOOP-13336
> URL: https://issues.apache.org/jira/browse/HADOOP-13336
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> S3A now supports different regions by way of declaring the endpoint, but you 
> can't do things like read in one region and write back in another (e.g. a distcp 
> backup), because only one region can be specified in a configuration.
> If S3A supported region declaration in the URL, e.g. s3a://b1.frankfurt or 
> s3a://b2.seoul, then this would be possible. 
> Swift does this with a full filesystem binding/config: endpoints, username, 
> etc., in the XML file. Would we need to do that much? It'd be simpler 
> initially to use a domain suffix of the URL to set the region of a bucket from 
> the domain and have the AWS library sort the details out itself, maybe with 
> some config options for working with non-AWS infra



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13736) Change PathMetadata to hold S3AFileStatus instead of FileStatus.

2017-01-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801864#comment-15801864
 ] 

Chris Nauroth commented on HADOOP-13736:


I agree that we should proceed with this patch, but it needs a rebase.  
[~eddyxu], can you please do that?  Thank you.

> Change PathMetadata to hold S3AFileStatus instead of FileStatus.
> 
>
> Key: HADOOP-13736
> URL: https://issues.apache.org/jira/browse/HADOOP-13736
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HADOOP-13736-HADOOP-13345.000.patch, 
> HADOOP-13736-HADOOP-13345.001.patch, HADOOP-13736.000.patch, 
> HADOOP-13736.wip-01.patch
>
>
> {{S3AFileStatus}} is implemented differently from {{FileStatus}}; for 
> instance, {{S3AFileStatus#isEmptyDirectory()}} is not implemented in 
> {{FileStatus}}. And {{access_time}}, {{block_replication}}, {{owner}}, 
> {{group}} and a few other fields are not meaningful in {{S3AFileStatus}}.  
> So in the scope of {{S3Guard}}, it should use {{S3AFileStatus}} instead 
> of {{FileStatus}} in {{PathMetadata}} to avoid casting the types back and 
> forth in S3A. 
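A sketch of the proposed type change (field and accessor shape are assumptions, not taken from the patch):

{code}
import org.apache.hadoop.fs.s3a.S3AFileStatus;

// PathMetadata holds the S3A-specific status directly, so S3A code
// need not downcast from FileStatus.
class PathMetadataSketch {
  private final S3AFileStatus fileStatus;

  PathMetadataSketch(S3AFileStatus fileStatus) {
    this.fileStatus = fileStatus;
  }

  S3AFileStatus getFileStatus() {
    return fileStatus;
  }
}
{code}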



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13922) Some modules have dependencies on hadoop-client jar removed by HADOOP-11804

2017-01-04 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15799654#comment-15799654
 ] 

Chris Nauroth commented on HADOOP-13922:


[~mblo], no worries, and thank you for coming back to confirm that the problem 
no longer repros.

> Some modules have dependencies on hadoop-client jar removed by HADOOP-11804
> ---
>
> Key: HADOOP-13922
> URL: https://issues.apache.org/jira/browse/HADOOP-13922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha2
>Reporter: Joe Pallas
>Assignee: Sean Busbey
>Priority: Blocker
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-13922.1.patch
>
>
> As discussed in [HADOOP-11804 comment 
> 15758048|https://issues.apache.org/jira/browse/HADOOP-11804?focusedCommentId=15758048=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15758048]
>  and following comments, there are still dependencies on the now-removed 
> hadoop-client jar.  The current code builds only because an obsolete snapshot 
> of the jar is found on the repository server.  Changing the project version 
> to something new exposes the problem.
> While the build currently dies at hadoop-tools/hadoop-sls, I'm seeing issues 
> with some Hadoop Client modules, too.
> I'm filing a new bug because I can't reopen HADOOP-11804.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13946) Document how HDFS updates timestamps in the FS spec; compare with object stores

2017-01-04 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15798723#comment-15798723
 ] 

Chris Nauroth commented on HADOOP-13946:


It's definitely possible that I messed up.  :-)  I've lost our email thread on 
this, but as I recall, the point about rename and handling modification time of 
directories came up in a follow-up discussion, so it would have been easy for 
that to get lost in the shuffle.

Thank you for working on the updates.

> Document how HDFS updates timestamps in the FS spec; compare with object 
> stores
> ---
>
> Key: HADOOP-13946
> URL: https://issues.apache.org/jira/browse/HADOOP-13946
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation, fs
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HADOOP-13946-001.patch
>
>
> SPARK-17159 shows that the behavior of when HDFS updates timestamps isn't 
> well documented. Document these in the FS spec.
> I'm not going to add tests for this, as it is so very dependent on FS 
> implementations, as in "POSIX filesystems may behave differently from HDFS". 
> If someone knows what happens there, their contribution is welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13946) Document how HDFS updates timestamps in the FS spec; compare with object stores

2017-01-03 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796303#comment-15796303
 ] 

Chris Nauroth commented on HADOOP-13946:


Sorry to come late to the review, but I would have liked to see a mention of 
how HDFS rename updates the modification time of both the source and the 
destination folder (though not the modification time of the renamed file 
itself).

Also, regarding this:
{quote}
Object stores have a significantly simpler view of time:
...
 + A file's modification time is always the same as its creation time.
{quote}

This makes it sound like this section covers all object stores, but the 
statement about modification time is not necessarily true universally.  For 
example, on WASB, the {{FileStatus}} on read is always populated with the last 
modified time field of the blob as reported by the Azure Storage service.  I 
think any kind of modification of the blob will result in a change in that 
value.  I specifically tested {{hadoop fs -chmod}} against WASB, and it updated 
the blob's modification time, which is different from HDFS.  Out-of-band blob 
modifications directly through the Azure Storage service, bypassing the 
{{FileSystem}} API, could be another source of perceived changes in the last 
modification time.

I expect this is not consistent across services, and therefore it's unlikely we 
can make accurate statements in the file system spec beyond just saying "it's 
different."  :-)

Please feel free to address this either by reverting and revising or filing a 
new JIRA to track an addendum.

Thanks!

> Document how HDFS updates timestamps in the FS spec; compare with object 
> stores
> ---
>
> Key: HADOOP-13946
> URL: https://issues.apache.org/jira/browse/HADOOP-13946
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation, fs
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HADOOP-13946-001.patch
>
>
> SPARK-17159 shows that the behavior of when HDFS updates timestamps isn't 
> well documented. Document these in the FS spec.
> I'm not going to add tests for this, as it is so very dependent on FS 
> implementations, as in "POSIX filesystems may behave differently from HDFS". 
> If someone knows what happens there, their contribution is welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-13946) Document how HDFS updates timestamps in the FS spec; compare with object stores

2017-01-03 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HADOOP-13946.

Resolution: Fixed

Reopened just to change resolution from Cannot Reproduce to Fixed.

> Document how HDFS updates timestamps in the FS spec; compare with object 
> stores
> ---
>
> Key: HADOOP-13946
> URL: https://issues.apache.org/jira/browse/HADOOP-13946
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation, fs
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HADOOP-13946-001.patch
>
>
> SPARK-17159 shows that the behavior of when HDFS updates timestamps isn't 
> well documented. Document these in the FS spec.
> I'm not going to add tests for this, as it is so very dependent on FS 
> implementations, as in "POSIX filesystems may behave differently from HDFS". 
> If someone knows what happens there, their contribution is welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Reopened] (HADOOP-13946) Document how HDFS updates timestamps in the FS spec; compare with object stores

2017-01-03 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reopened HADOOP-13946:


> Document how HDFS updates timestamps in the FS spec; compare with object 
> stores
> ---
>
> Key: HADOOP-13946
> URL: https://issues.apache.org/jira/browse/HADOOP-13946
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation, fs
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HADOOP-13946-001.patch
>
>
> SPARK-17159 shows that the behavior of when HDFS updates timestamps isn't 
> well documented. Document these in the FS spec.
> I'm not going to add tests for this, as it is so very dependent on FS 
> implementations, as in "POSIX filesystems may behave differently from HDFS". 
> If someone knows what happens there, their contribution is welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13922) Some modules have dependencies on hadoop-client jar removed by HADOOP-11804

2017-01-03 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-13922:
---
   Resolution: Fixed
Fix Version/s: 3.0.0-alpha2
   Status: Resolved  (was: Patch Available)

I have committed this to trunk.  [~busbey], thank you for contributing the fix.

[~mblo], I chose to commit this, because it fixes the immediate problem that is 
impacting some of us.  If you continue to reproduce issues running {{mvn 
eclipse:eclipse}}, then please feel free to file a new JIRA issue with the 
details.  Thank you.

> Some modules have dependencies on hadoop-client jar removed by HADOOP-11804
> ---
>
> Key: HADOOP-13922
> URL: https://issues.apache.org/jira/browse/HADOOP-13922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha2
>Reporter: Joe Pallas
>Assignee: Sean Busbey
>Priority: Blocker
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-13922.1.patch
>
>
> As discussed in [HADOOP-11804 comment 
> 15758048|https://issues.apache.org/jira/browse/HADOOP-11804?focusedCommentId=15758048=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15758048]
>  and following comments, there are still dependencies on the now-removed 
> hadoop-client jar.  The current code builds only because an obsolete snapshot 
> of the jar is found on the repository server.  Changing the project version 
> to something new exposes the problem.
> While the build currently dies at hadoop-tools/hadoop-sls, I'm seeing issues 
> with some Hadoop Client modules, too.
> I'm filing a new bug because I can't reopen HADOOP-11804.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13650) S3Guard: Provide command line tools to manipulate metadata store.

2016-12-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15788486#comment-15788486
 ] 

Chris Nauroth commented on HADOOP-13650:


Actually, part of my last comment was wrong.  Even though it would be a 
built-in, the hadoop-aws jars would only go onto the classpath when the user is 
invoking {{hadoop s3a}}.  I take back what I said about "not sure we really 
want it to be a built-in."  Following the recipe I described should work great. 
 Thanks!

> S3Guard: Provide command line tools to manipulate metadata store.
> -
>
> Key: HADOOP-13650
> URL: https://issues.apache.org/jira/browse/HADOOP-13650
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HADOOP-13650-HADOOP-13345.000.patch, 
> HADOOP-13650-HADOOP-13345.001.patch, HADOOP-13650-HADOOP-13345.002.patch
>
>
> Similar systems like EMRFS have CLI tools to manipulate the metadata 
> store, e.g., to create or delete the metadata store, or to {{import}} or {{sync}} 
> file metadata between the metadata store and S3. 
> http://docs.aws.amazon.com//ElasticMapReduce/latest/ReleaseGuide/emrfs-cli-reference.html
> S3Guard should offer similar functionality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13929) ADLS should not check in contract-test-options.xml

2016-12-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15788235#comment-15788235
 ] 

Chris Nauroth commented on HADOOP-13929:


Hello [~jzhuge].  Thank you for the patch.

I would prefer to see this match the layout we've established in places like 
hadoop-azure and hadoop-aws.  We could keep the contract-test-options.xml file 
as a placeholder, which then uses XInclude to include a credential file that is 
left out of source control by .gitignore.  The placeholder file is a spot to 
leave commented out samples of how to configure it.

> ADLS should not check in contract-test-options.xml
> --
>
> Key: HADOOP-13929
> URL: https://issues.apache.org/jira/browse/HADOOP-13929
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, test
>Affects Versions: 3.0.0-alpha2
>Reporter: John Zhuge
>Assignee: John Zhuge
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-13929.001.patch, HADOOP-13929.002.patch
>
>
> Should not check in the file {{contract-test-options.xml}}. Make sure the 
> file is excluded by {{.gitignore}}. Make sure ADLS {{index.md}} provides a 
> complete example of this file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13650) S3Guard: Provide command line tools to manipulate metadata store.

2016-12-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788216#comment-15788216
 ] 

Chris Nauroth commented on HADOOP-13650:


bq. Would you mind giving me advice on how to generate 
{{libexec/tools/hadoop-distcp.sh}}?

Hello [~eddyxu].  Here is a general recipe that should work:
# Create file hadoop-aws/src/main/shellprofile.d/hadoop-aws.sh (a rough sketch 
appears after this list).  The code of this shell profile would need to add 
hadoop-aws.jar and its dependencies to the classpath and also call something 
like {{hadoop_add_subcommand "s3a" "S3A Utilities"}} and define {{function 
hadoop_subcommand_s3a}}.
# Update hadoop-assemblies/src/main/resources/assemblies/hadoop-tools.xml so 
that the assembly copies the new file to libexec/shellprofile.d when building 
the distro.  You can probably copy and adapt the XML stanza that already does 
it for hadoop-distcp.  This is sufficient to land it into the distro as an 
optional shell profile, which individual deployments or users would have to 
enable explicitly.
# To turn it into a "built-in" like DistCp, which is always on the classpath 
and command set for every deployment, you would need an additional step.  
Edit hadoop-tools/hadoop-aws/pom.xml and add a copy-dependencies execution.  
You can probably copy and adapt the XML stanza that already does it for 
hadoop-distcp.  Look for "tools-builtin" in hadoop-tools/hadoop-distcp/pom.xml. 
 This will cause it to land in libexec/tools and make it a built-in.
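
To illustrate step 1, here is a rough sketch of such a shell profile, 
patterned on the existing tools profiles like hadoop-distcp.sh; the main 
class named at the end is a placeholder, not an existing class:

{code}
# hadoop-aws/src/main/shellprofile.d/hadoop-aws.sh -- sketch only.

if ! declare -f hadoop_subcommand_s3a >/dev/null 2>&1; then

  # Register the subcommand so it appears in the "hadoop" usage output.
  hadoop_add_subcommand "s3a" "S3A Utilities"

  function hadoop_subcommand_s3a
  {
    # Put hadoop-aws.jar and its dependencies on the classpath.
    hadoop_add_to_classpath_tools hadoop-aws
    # Placeholder entry point; the real CLI class would go here.
    HADOOP_CLASSNAME=org.apache.hadoop.fs.s3a.S3GuardTool
  }

fi
{code}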

However, I'm not sure we really want it to be a built-in.  Doing so would put 
extra jars onto the classpath for all Hadoop deployments, even for users that 
won't use S3A at all.  That would have the usual side effects of bloating the 
classpath and possibly causing dependency management challenges for end users 
that weren't expecting to receive those jars.

> S3Guard: Provide command line tools to manipulate metadata store.
> -
>
> Key: HADOOP-13650
> URL: https://issues.apache.org/jira/browse/HADOOP-13650
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HADOOP-13650-HADOOP-13345.000.patch, 
> HADOOP-13650-HADOOP-13345.001.patch, HADOOP-13650-HADOOP-13345.002.patch
>
>
> Similar systems like EMRFS have CLI tools to manipulate the metadata 
> store, e.g., create or delete the metadata store, or {{import}} and {{sync}} 
> file metadata between the metadata store and S3. 
> http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emrfs-cli-reference.html
> S3Guard should offer similar functionality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11804) Shaded Hadoop client artifacts and minicluster

2016-12-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788151#comment-15788151
 ] 

Chris Nauroth commented on HADOOP-11804:


I have +1'd the follow-up bug fix patch on HADOOP-13922, but I'm going to hold 
off committing since it's right before a holiday and others might be offline.  
I plan to commit it on Monday, 1/2.

> Shaded Hadoop client artifacts and minicluster
> --
>
> Key: HADOOP-11804
> URL: https://issues.apache.org/jira/browse/HADOOP-11804
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Reporter: Sean Busbey
>Assignee: Sean Busbey
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-11804.1.patch, HADOOP-11804.10.patch, 
> HADOOP-11804.11.patch, HADOOP-11804.12.patch, HADOOP-11804.13.patch, 
> HADOOP-11804.14.patch, HADOOP-11804.2.patch, HADOOP-11804.3.patch, 
> HADOOP-11804.4.patch, HADOOP-11804.5.patch, HADOOP-11804.6.patch, 
> HADOOP-11804.7.patch, HADOOP-11804.8.patch, HADOOP-11804.9.patch, 
> hadoop-11804-client-test.tar.gz
>
>
> make a hadoop-client-api and hadoop-client-runtime that i.e. HBase can use to 
> talk with a Hadoop cluster without seeing any of the implementation 
> dependencies.
> see proposal on parent for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13922) Some modules have dependencies on hadoop-client jar removed by HADOOP-11804

2016-12-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-13922:
---
Hadoop Flags: Reviewed

+1 for the patch.  I had a clean new dev box that was hitting this build 
problem, and I confirmed that this patch fixes it.  I think it's the right 
thing to do in the interest of compatibility with existing downstream consumers.

Since it's right before a holiday and interested people might be offline 
partying hard, I will hold off on the commit.  I plan to commit this on 
Monday, 1/2, unless I hear otherwise.

> Some modules have dependencies on hadoop-client jar removed by HADOOP-11804
> ---
>
> Key: HADOOP-13922
> URL: https://issues.apache.org/jira/browse/HADOOP-13922
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha2
>Reporter: Joe Pallas
>Assignee: Sean Busbey
>Priority: Blocker
> Attachments: HADOOP-13922.1.patch
>
>
> As discussed in [HADOOP-11804 comment 
> 15758048|https://issues.apache.org/jira/browse/HADOOP-11804?focusedCommentId=15758048&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15758048]
>  and following comments, there are still dependencies on the now-removed 
> hadoop-client jar.  The current code builds only because an obsolete snapshot 
> of the jar is found on the repository server.  Changing the project version 
> to something new exposes the problem.
> While the build currently dies at hadoop-tools/hadoop-sls, I'm seeing issues 
> with some Hadoop Client modules, too.
> I'm filing a new bug because I can't reopen HADOOP-11804.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13050) Upgrade to AWS SDK 10.11+

2016-11-22 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687951#comment-15687951
 ] 

Chris Nauroth commented on HADOOP-13050:


[~liuml07], thank you for reviewing.

{quote}
One last concern from the Jenkins run is that, HADOOP-13727 added the 
SharedInstanceProfileCredentialsProvider, which can be removed after this AWS 
SDK upgrade. Do you think we address it here, or separately?
{quote}

I'd suggest handling it in a separate JIRA, targeted only to 3.x (not 2.x).  
It's possible that users of 2.8 will start referring to 
{{SharedInstanceProfileCredentialsProvider}} in their configuration.  If it 
were removed in 2.9, then those configurations would break after a user 
upgrades from 2.8 to 2.9.  Waiting until 3.x for that kind of change would 
adhere to our compatibility guidelines and give those users more of a warning.
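
To make the risk concrete, a 2.8-era deployment could carry configuration 
like this (a sketch using the provider class HADOOP-13727 added); removing 
the class in 2.9 would break it on upgrade:

{code}
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>org.apache.hadoop.fs.s3a.SharedInstanceProfileCredentialsProvider</value>
</property>
{code}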

> Upgrade to AWS SDK 10.11+
> -
>
> Key: HADOOP-13050
> URL: https://issues.apache.org/jira/browse/HADOOP-13050
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build, fs/s3
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Attachments: HADOOP-13050-001.patch, HADOOP-13050-002.patch, 
> HADOOP-13050-branch-2-003.patch, HADOOP-13050-branch-2-004.patch, 
> HADOOP-13050-branch-2.002.patch, HADOOP-13050-branch-2.003.patch
>
>
> HADOOP-13044 highlights that AWS SDK 10.6, shipping in Hadoop 2.7+, doesn't 
> work on OpenJDK >= 8u60, because a change in the JDK broke the version of 
> Joda Time that the AWS SDK uses.
> Fix: update the SDK, though that implies also updating the HTTP components: 
> HADOOP-12767.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13687) Provide a unified dependency artifact that transitively includes the cloud storage modules shipped with Hadoop.

2016-11-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654431#comment-15654431
 ] 

Chris Nauroth commented on HADOOP-13687:


The Javadoc warnings are not new, and I will not address them as part of this 
patch.

The license warnings seem to be getting triggered on Eclipse-generated files, 
so I don't believe they are immediately addressable.

> Provide a unified dependency artifact that transitively includes the cloud 
> storage modules shipped with Hadoop.
> ---
>
> Key: HADOOP-13687
> URL: https://issues.apache.org/jira/browse/HADOOP-13687
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HADOOP-13687-branch-2.001.patch, 
> HADOOP-13687-branch-2.002.patch, HADOOP-13687-branch-2.003.patch, 
> HADOOP-13687-trunk.001.patch, HADOOP-13687-trunk.002.patch, 
> HADOOP-13687-trunk.003.patch, HADOOP-13687-trunk.004.patch, 
> HADOOP-13687-trunk.005.patch, HADOOP-13687-trunk.006.patch, 
> HADOOP-13687-trunk.006.patch
>
>
> Currently, downstream projects that want to integrate with different 
> Hadoop-compatible file systems like WASB and S3A need to list dependencies on 
> each one.  This creates an ongoing maintenance burden for those projects, 
> because they need to update their build whenever a new Hadoop-compatible file 
> system is introduced.  This issue proposes adding a new artifact that 
> transitively includes all Hadoop-compatible file systems.  Similar to 
> hadoop-client, this new artifact will consist of just a pom.xml listing the 
> individual dependencies.  Downstream users can depend on this artifact to 
> sweep in everything, and picking up a new file system in a future version 
> will be just a matter of updating the Hadoop dependency version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13687) Provide a unified dependency artifact that transitively includes the cloud storage modules shipped with Hadoop.

2016-11-08 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-13687:
---
Attachment: HADOOP-13687-trunk.006.patch

I'm attaching trunk revision 006, once again attempting to fix the version 
numbers.

> Provide a unified dependency artifact that transitively includes the cloud 
> storage modules shipped with Hadoop.
> ---
>
> Key: HADOOP-13687
> URL: https://issues.apache.org/jira/browse/HADOOP-13687
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HADOOP-13687-branch-2.001.patch, 
> HADOOP-13687-branch-2.002.patch, HADOOP-13687-branch-2.003.patch, 
> HADOOP-13687-trunk.001.patch, HADOOP-13687-trunk.002.patch, 
> HADOOP-13687-trunk.003.patch, HADOOP-13687-trunk.004.patch, 
> HADOOP-13687-trunk.005.patch, HADOOP-13687-trunk.006.patch
>
>
> Currently, downstream projects that want to integrate with different 
> Hadoop-compatible file systems like WASB and S3A need to list dependencies on 
> each one.  This creates an ongoing maintenance burden for those projects, 
> because they need to update their build whenever a new Hadoop-compatible file 
> system is introduced.  This issue proposes adding a new artifact that 
> transitively includes all Hadoop-compatible file systems.  Similar to 
> hadoop-client, this new artifact will consist of just a pom.xml listing the 
> individual dependencies.  Downstream users can depend on this artifact to 
> sweep in everything, and picking up a new file system in a future version 
> will be just a matter of updating the Hadoop dependency version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13687) Provide a unified dependency artifact that transitively includes the cloud storage modules shipped with Hadoop.

2016-11-08 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-13687:
---
Attachment: HADOOP-13687-trunk.005.patch

I'm uploading trunk revision 005 to correct the version numbers in the pom.xml 
files.

> Provide a unified dependency artifact that transitively includes the cloud 
> storage modules shipped with Hadoop.
> ---
>
> Key: HADOOP-13687
> URL: https://issues.apache.org/jira/browse/HADOOP-13687
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HADOOP-13687-branch-2.001.patch, 
> HADOOP-13687-branch-2.002.patch, HADOOP-13687-branch-2.003.patch, 
> HADOOP-13687-trunk.001.patch, HADOOP-13687-trunk.002.patch, 
> HADOOP-13687-trunk.003.patch, HADOOP-13687-trunk.004.patch, 
> HADOOP-13687-trunk.005.patch
>
>
> Currently, downstream projects that want to integrate with different 
> Hadoop-compatible file systems like WASB and S3A need to list dependencies on 
> each one.  This creates an ongoing maintenance burden for those projects, 
> because they need to update their build whenever a new Hadoop-compatible file 
> system is introduced.  This issue proposes adding a new artifact that 
> transitively includes all Hadoop-compatible file systems.  Similar to 
> hadoop-client, this new artifact will consist of just a pom.xml listing the 
> individual dependencies.  Downstream users can depend on this artifact to 
> sweep in everything, and picking up a new file system in a future version 
> will be just a matter of updating the Hadoop dependency version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13037) Azure Data Lake Client: Support Azure data lake as a file system in Hadoop

2016-11-07 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15644926#comment-15644926
 ] 

Chris Nauroth commented on HADOOP-13037:


[~chris.douglas] and [~vishwajeet.dusane], I advise setting 
fs.contract.is-blobstore to false.  This flag is now strictly informational and 
does not control behavior of any tests.  Since ADL is not a file system tree 
mapped onto a flat object store, it makes sense to set it to false.

HADOOP-13502 introduced two new flags to control test behavior previously 
controlled by fs.contract.is-blobstore.  For ADL, I believe the correct match 
to its semantics would be fs.contract.create-overwrites-directory=false and 
fs.contract.create-visibility-delayed=false.  The latter is certainly important 
to support HBase expectations, and I know you want ADL to be able to support 
HBase.  Running the ADL subclass of {{AbstractContractCreateTest}} would 
demonstrate if ADL is successfully implementing these semantics.
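
In contract-option terms, the ADL test resources would then carry something 
like this (a sketch; confirming these values is exactly what the contract 
test run would do):

{code}
<property>
  <name>fs.contract.is-blobstore</name>
  <value>false</value>
</property>
<property>
  <name>fs.contract.create-overwrites-directory</name>
  <value>false</value>
</property>
<property>
  <name>fs.contract.create-visibility-delayed</name>
  <value>false</value>
</property>
{code}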

FYI, I am about one week away from having a viable build environment, so I'll 
be delayed on testing the rebased patch that I mentioned.  HADOOP-13687 likely 
will get committed without moving ADL in the source tree.  We can likely 
accomplish that move within the scope of this JIRA instead.

> Azure Data Lake Client: Support Azure data lake as a file system in Hadoop
> --
>
> Key: HADOOP-13037
> URL: https://issues.apache.org/jira/browse/HADOOP-13037
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/azure, tools
>Reporter: Shrikant Naidu
>Assignee: Vishwajeet Dusane
> Fix For: 2.9.0
>
> Attachments: HADOOP-13037 Proposal.pdf, HADOOP-13037-001.patch, 
> HADOOP-13037-002.patch, HADOOP-13037-003.patch, HADOOP-13037-004.patch, 
> HADOOP-13037.005.patch
>
>
> The JIRA proposes an improvement over HADOOP-12666 to remove WebHDFS 
> dependencies from the ADL file system client and build out a standalone 
> client. At a high level, this approach would extend the Hadoop file system 
> class to provide an implementation for accessing Azure Data Lake. The scheme 
> used for accessing the file system will continue to be 
> adl://.azuredatalake.net/path/to/file. 
> The Azure Data Lake Cloud Store will continue to provide a WebHDFS REST 
> interface. The client will access the ADLS store using the WebHDFS REST 
> APIs provided by the store. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13687) Provide a unified dependency artifact that transitively includes the cloud storage modules shipped with Hadoop.

2016-11-07 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-13687:
---
Attachment: HADOOP-13687-trunk.004.patch

[~ste...@apache.org], this sounds like a good plan to me.  I think this is just 
a matter of taking the branch-2 patch and applying the same thing to trunk.  
I'm attaching a trunk patch rev 004 to test that out.

I am about a week away from having access to a viable build environment for 
testing a rebased HADOOP-13037 patch, so this is a good way to get the new 
hadoop-cloud-storage module into place more quickly.

> Provide a unified dependency artifact that transitively includes the cloud 
> storage modules shipped with Hadoop.
> ---
>
> Key: HADOOP-13687
> URL: https://issues.apache.org/jira/browse/HADOOP-13687
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HADOOP-13687-branch-2.001.patch, 
> HADOOP-13687-branch-2.002.patch, HADOOP-13687-branch-2.003.patch, 
> HADOOP-13687-trunk.001.patch, HADOOP-13687-trunk.002.patch, 
> HADOOP-13687-trunk.003.patch, HADOOP-13687-trunk.004.patch
>
>
> Currently, downstream projects that want to integrate with different 
> Hadoop-compatible file systems like WASB and S3A need to list dependencies on 
> each one.  This creates an ongoing maintenance burden for those projects, 
> because they need to update their build whenever a new Hadoop-compatible file 
> system is introduced.  This issue proposes adding a new artifact that 
> transitively includes all Hadoop-compatible file systems.  Similar to 
> hadoop-client, this new artifact will consist of just a pom.xml listing the 
> individual dependencies.  Downstream users can depend on this artifact to 
> sweep in everything, and picking up a new file system in a future version 
> will be just a matter of updating the Hadoop dependency version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13449) S3Guard: Implement DynamoDBMetadataStore.

2016-10-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620639#comment-15620639
 ] 

Chris Nauroth commented on HADOOP-13449:


bq. Also: that dynamo DB dependency MUST be at {{}} scope. We don't 
want to force it on people.

[~ste...@apache.org], the DynamoDB client adds no transitive dependencies that 
hadoop-aws is not already picking up through the AWS SDK.  Here we can see the 
only dependencies are on aws-java-sdk-s3 and aws-java-sdk-core:

https://github.com/aws/aws-sdk-java/blob/1.10.6/aws-java-sdk-dynamodb/pom.xml#L19-L45

Everything else is a test-only dependency.

In this case, I wonder if the right trade-off is for us to allow the 
dependency, so that downstream projects can pick up S3Guard functionality 
without needing to add the aws-java-sdk-dynamodb dependency explicitly.  Those 
projects would likely need to keep synchronized with whatever AWS SDK version 
number we're using in Hadoop, so as to avoid internal version conflicts around 
things like aws-java-sdk-core.
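
In pom.xml terms, allowing the dependency would simply mean declaring it at 
the default compile scope in hadoop-tools/hadoop-aws, roughly (a sketch; the 
version is assumed to be managed by the parent pom):

{code}
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-dynamodb</artifactId>
  <scope>compile</scope>
</dependency>
{code}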

> S3Guard: Implement DynamoDBMetadataStore.
> -
>
> Key: HADOOP-13449
> URL: https://issues.apache.org/jira/browse/HADOOP-13449
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Chris Nauroth
>Assignee: Mingliang Liu
> Attachments: HADOOP-13449-HADOOP-13345.000.patch, 
> HADOOP-13449-HADOOP-13345.001.patch
>
>
> Provide an implementation of the metadata store backed by DynamoDB.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13037) Azure Data Lake Client: Support Azure data lake as a file system in Hadoop

2016-10-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620624#comment-15620624
 ] 

Chris Nauroth commented on HADOOP-13037:


As a heads-up, my HADOOP-13687 patch is restructuring hadoop-azure-datalake 
under a new hadoop-cloud-storage module, which will serve as a unified 
dependency artifact that downstream projects can use to pick up all cloud 
storage file systems that ship within Hadoop.  Before I commit HADOOP-13687, 
I'm going to prepare a rebased version of the HADOOP-13037 patch.  This will 
just be a mechanical change of the source file paths in the patch, probably 
mostly automated with sed.

> Azure Data Lake Client: Support Azure data lake as a file system in Hadoop
> --
>
> Key: HADOOP-13037
> URL: https://issues.apache.org/jira/browse/HADOOP-13037
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/azure, tools
>Reporter: Shrikant Naidu
>Assignee: Vishwajeet Dusane
> Fix For: 2.9.0
>
> Attachments: HADOOP-13037 Proposal.pdf, HADOOP-13037-001.patch, 
> HADOOP-13037-002.patch, HADOOP-13037-003.patch
>
>
> The JIRA proposes an improvement over HADOOP-12666 to remove WebHDFS 
> dependencies from the ADL file system client and build out a standalone 
> client. At a high level, this approach would extend the Hadoop file system 
> class to provide an implementation for accessing Azure Data Lake. The scheme 
> used for accessing the file system will continue to be 
> adl://.azuredatalake.net/path/to/file. 
> The Azure Data Lake Cloud Store will continue to provide a WebHDFS REST 
> interface. The client will access the ADLS store using the WebHDFS REST 
> APIs provided by the store. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13687) Provide a unified dependency artifact that transitively includes the cloud storage modules shipped with Hadoop.

2016-10-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620622#comment-15620622
 ] 

Chris Nauroth commented on HADOOP-13687:


HADOOP-13037 is a significant patch to hadoop-azure-datalake, which would be 
invalidated by the restructuring I've done here.  Before I commit this patch, 
I'm going to be a good citizen and prepare a rebased revision of the current 
HADOOP-13037 patch.

> Provide a unified dependency artifact that transitively includes the cloud 
> storage modules shipped with Hadoop.
> ---
>
> Key: HADOOP-13687
> URL: https://issues.apache.org/jira/browse/HADOOP-13687
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HADOOP-13687-branch-2.001.patch, 
> HADOOP-13687-branch-2.002.patch, HADOOP-13687-branch-2.003.patch, 
> HADOOP-13687-trunk.001.patch, HADOOP-13687-trunk.002.patch, 
> HADOOP-13687-trunk.003.patch
>
>
> Currently, downstream projects that want to integrate with different 
> Hadoop-compatible file systems like WASB and S3A need to list dependencies on 
> each one.  This creates an ongoing maintenance burden for those projects, 
> because they need to update their build whenever a new Hadoop-compatible file 
> system is introduced.  This issue proposes adding a new artifact that 
> transitively includes all Hadoop-compatible file systems.  Similar to 
> hadoop-client, this new artifact will consist of just a pom.xml listing the 
> individual dependencies.  Downstream users can depend on this artifact to 
> sweep in everything, and picking up a new file system in a future version 
> will be just a matter of updating the Hadoop dependency version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13736) Change PathMetadata to hold S3AFileStatus instead of FileStatus.

2016-10-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-13736:
---
Hadoop Flags: Reviewed

[~eddyxu], I think I understand your point now.  With this in place, we'll 
still have to downcast, but it will just be a single downcast to 
{{MetadataStore<S3AFileStatus>}} done at initialization time.  Then, all 
subsequent operations can execute on {{S3AFileStatus}} instances without a 
downcast.

This won't provide a strong type safety guarantee, because there is no way to 
check the type parameter at runtime due to Java's type erasure.  However, it's 
still helpful for reducing the amount of boilerplate code spent on downcasts.
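
In code, the pattern would look roughly like this (a sketch; the factory 
method and accessor names are illustrative, not the actual patch):

{code}
// Sketch only.  The raw store comes back from a hypothetical factory; one
// unchecked cast fixes the type parameter at initialization time.
@SuppressWarnings("unchecked")  // unavoidable: the parameter is erased
MetadataStore<S3AFileStatus> store =
    (MetadataStore<S3AFileStatus>) createMetadataStore(conf);

// From here on, callers receive S3AFileStatus without further downcasts.
S3AFileStatus status = store.get(path).getFileStatus();
{code}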

+1 for the patch, pending resolution of Aaron's requests to get other patches 
committed first.

> Change PathMetadata to hold S3AFileStatus instead of FileStatus.
> 
>
> Key: HADOOP-13736
> URL: https://issues.apache.org/jira/browse/HADOOP-13736
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HADOOP-13736-HADOOP-13345.000.patch, 
> HADOOP-13736-HADOOP-13345.001.patch, HADOOP-13736.000.patch, 
> HADOOP-13736.wip-01.patch
>
>
> {{S3AFileStatus}} is implemented differently from {{FileStatus}}; for 
> instance, {{S3AFileStatus#isEmptyDirectory()}} is not implemented in 
> {{FileStatus}}. And {{access_time}}, {{block_replication}}, {{owner}}, 
> {{group}}, and a few other fields are not meaningful in {{S3AFileStatus}}.  
> So, in the scope of S3Guard, it should use {{S3AFileStatus}} instead 
> of {{FileStatus}} in {{PathMetadata}} to avoid casting the types back and 
> forth in S3A. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13651) S3Guard: S3AFileSystem Integration with MetadataStore

2016-10-30 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620602#comment-15620602
 ] 

Chris Nauroth commented on HADOOP-13651:


bq. I'll go study the FileSystem cache now (i.e. does it guarantee one instance 
per-bucket, or get close to that?)

Yes, the relevant piece to look at is the cache {{Key}} class inside 
{{FileSystem}}.  This data structure defines a composite key for entries in the 
cache:

{code}
/** FileSystem.Cache.Key */
static class Key {
  final String scheme;
  final String authority;
  final UserGroupInformation ugi;
  final long unique;   // an artificial way to make a key unique
{code}

The {{scheme}} will be "s3a", and the {{authority}} will be the S3 bucket, so 
the same instance is guaranteed to be reused for the same bucket, as long as 
the same user is running the code that allocates the {{FileSystem}}.  The 
{{unique}} field is an artificial cache buster used for callers that explicitly 
do not want to share an instance and instead request a unique one by calling 
{{FileSystem#newInstance}}.  Calling {{FileSystem#close}} evicts the instance 
from the cache.  There are some pretty big gotchas that can come up related to 
this {{FileSystem}} cache, but for the sake of this discussion, we can say that 
it works as expected.
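
A short illustration of that behavior (assuming one user throughout; the 
bucket name is hypothetical):

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class CacheDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Same (scheme, authority, ugi) key: the cache returns the same object.
    FileSystem fs1 = FileSystem.get(URI.create("s3a://mybucket/"), conf);
    FileSystem fs2 = FileSystem.get(URI.create("s3a://mybucket/"), conf);
    System.out.println(fs1 == fs2);   // true

    // newInstance uses the "unique" field to deliberately bust the cache.
    FileSystem fs3 = FileSystem.newInstance(URI.create("s3a://mybucket/"), conf);
    System.out.println(fs1 == fs3);   // false

    fs3.close();  // close() also evicts the instance from the cache
  }
}
{code}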

I don't have any objection to a plan of proceeding with this patch and 
converting to an instance per {{S3AFileSystem}} in a later patch if that's 
helpful for the development process.  We have the freedom to work that way on a 
feature branch.  However, I wonder if that's problematic for tests that access 
multiple buckets, like the tests that read from the public landsat-pds bucket.

> S3Guard: S3AFileSystem Integration with MetadataStore
> -
>
> Key: HADOOP-13651
> URL: https://issues.apache.org/jira/browse/HADOOP-13651
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
> Attachments: HADOOP-13651-HADOOP-13345.001.patch, 
> HADOOP-13651-HADOOP-13345.002.patch, HADOOP-13651-HADOOP-13345.003.patch
>
>
> Modify S3AFileSystem et al. to optionally use a MetadataStore for metadata 
> consistency and caching.
> Implementation should have minimal overhead when no MetadataStore is 
> configured.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13631) S3Guard: implement move() for LocalMetadataStore, add unit tests

2016-10-28 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-13631:
---
Hadoop Flags: Reviewed

+1, assuming [~eddyxu] thinks his feedback has been resolved satisfactorily.

I think this {{move}} interface does make sense based on our current goals with 
metadata backed by DynamoDB.  Eddy has a good point that some implementations 
might operate better by working with the source and destination path, like the 
existing interface.  I could imagine an implementation backed by a simplified 
NameNode or maybe NFS (though I'm generally wary of NFS high availability 
capabilities).  When that comes up, maybe the interface could evolve to include 
both the collections and the source and destination path, and the 
implementation can decide which parameters to use.
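
For reference, the collection-based shape under discussion looks roughly like 
this (a sketch; parameter names and exact types are illustrative, not a 
frozen contract):

{code}
void move(Collection<Path> pathsToDelete,
    Collection<PathMetadata> pathsToCreate) throws IOException;

// A later evolution could add the roots as well, letting implementations
// pick whichever parameters suit their backing store:
void move(Path src, Path dst,
    Collection<Path> pathsToDelete,
    Collection<PathMetadata> pathsToCreate) throws IOException;
{code}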

I think we'll have freedom to evolve the {{MetadataStore}} interface.  We're 
not planning to make it public with strict backward compatibility guarantees, 
so we can change it easily later.

> S3Guard: implement move() for LocalMetadataStore, add unit tests
> 
>
> Key: HADOOP-13631
> URL: https://issues.apache.org/jira/browse/HADOOP-13631
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
> Attachments: HADOOP-13631-HADOOP-13345.001.patch
>
>
> Building on HADOOP-13573 and HADOOP-13452, implement move() in 
> LocalMetadataStore and associated MetadataStore unit tests.
> (Making this a separate JIRA to break up work into decent-sized and 
> reviewable chunks.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org


