[jira] [Commented] (HADOOP-16434) Expose an api which provides current active configurations set in .xml files.

2019-07-17 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886782#comment-16886782
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-16434:
--

[~ashishdoneriya], there is a /conf end-point on every Hadoop daemon that 
should do what you want. If that works, please close this JIRA. Tx.
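
A minimal sketch of consuming that end-point from a program, in Java 11+. The host name, the port (8088 is the usual ResourceManager web port) and the class and method names are placeholders for illustration; the `format=json` query parameter is assumed to be accepted by the daemon's /conf servlet. On a secured cluster the end-point may require authentication, which this sketch does not handle.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Illustrative helper (not part of Hadoop) for reading a daemon's /conf end-point. */
final class ConfEndpoint {
    private ConfEndpoint() {}

    /** Builds the /conf URL; "json" and "xml" are the formats assumed here. */
    static URI confUri(String host, int port, String format) {
        return URI.create("http://" + host + ":" + port + "/conf?format=" + format);
    }

    /** Fetches the response body as a string using the JDK HTTP client. */
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ConfEndpoint <host> <port>
        if (args.length == 2) {
            System.out.println(fetch(confUri(args[0], Integer.parseInt(args[1]), "json")));
        } else {
            // Without arguments, only show the URL that would be requested.
            System.out.println(confUri("rm.example.com", 8088, "json"));
        }
    }
}
```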

> Expose an api which provides current active configurations set in .xml files.
> -
>
> Key: HADOOP-16434
> URL: https://issues.apache.org/jira/browse/HADOOP-16434
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.1.2
>Reporter: Ashish Doneriya
>Priority: Minor
>
> There is no REST API exposed by YARN timeline, ResourceManager, or NameNode 
> from which we could read the value of a configuration property. To view the 
> configurations of a cluster, we have to manually open the files 
> (yarn-site.xml, core-site.xml, hdfs-site.xml, mapred-site.xml) and read the 
> values. If you have Cloudera Manager or Ambari, then the only way to check the 
> values is to open the manager in a browser.
> Therefore, please create a REST API that returns all active configurations in 
> a Hadoop cluster so that they can be consumed directly by programs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15571) After HADOOP-13440, multiple filesystems/file-contexts created with the same Configuration object are forced to have the same umask

2018-07-03 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15571:
-
Status: Open  (was: Patch Available)

Tx for the +1 on the approach, [~xyao] and [~ste...@apache.org].

 Uploading a new patch with the test-case. It makes sure that
 - as long as no explicit FileContext.setUMask() calls are made, the conf 
updates are reflected
 - once an explicit API call is made, that takes precedence over any conf 
updates

> After HADOOP-13440, multiple filesystems/file-contexts created with the same 
> Configuration object are forced to have the same umask
> ---
>
> Key: HADOOP-15571
> URL: https://issues.apache.org/jira/browse/HADOOP-15571
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: HADOOP-15571.1.txt, HADOOP-15571.txt
>
>
> Ran into a super hard-to-debug issue due to this. [Edit: Turns out the same 
> issue as YARN-5749 that [~Tao Yang] ran into]
> h4. Issue
> Configuration conf = new Configuration();
>  fc1 = FileContext.getFileContext(uri1, conf);
>  fc2 = FileContext.getFileContext(uri2, conf);
>  fc1.setUMask(umask_for_fc1); // Screws up umask for fc2 also!
> This was not the case before HADOOP-13440.
> h4. Symptoms:
> h5. Scenario I ran into
> When trying to localize a HDFS directory (hdfs:///my/dir/1.txt), NodeManager 
> tries to replicate the directory structure on the local file-system 
> ($yarn-local-dirs/filecache/my/dir/1.txt).
> Now depending on whether NM has ever done a log-aggregation (completely 
> unrelated code that sets umask to be 137 for its own files on HDFS), the 
> directories /my and /my/dir on local-fs may have different permissions. In 
> the specific case where NM did log-aggregation, /my/dir was created with 137 
> umask and so localization of 1.txt completely failed due to absent directory 
> executable permissions!
> h5. Previous scenarios:
> We ran into this before in test-cases and instead of fixing the root-cause, 
> we just fixed the test-cases: YARN-5679 / YARN-5749






[jira] [Updated] (HADOOP-15571) After HADOOP-13440, multiple filesystems/file-contexts created with the same Configuration object are forced to have the same umask

2018-07-03 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15571:
-
Attachment: HADOOP-15571.1.txt

> After HADOOP-13440, multiple filesystems/file-contexts created with the same 
> Configuration object are forced to have the same umask
> ---
>
> Key: HADOOP-15571
> URL: https://issues.apache.org/jira/browse/HADOOP-15571
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: HADOOP-15571.1.txt, HADOOP-15571.txt
>
>
> Ran into a super hard-to-debug issue due to this. [Edit: Turns out the same 
> issue as YARN-5749 that [~Tao Yang] ran into]
> h4. Issue
> Configuration conf = new Configuration();
>  fc1 = FileContext.getFileContext(uri1, conf);
>  fc2 = FileContext.getFileContext(uri2, conf);
>  fc1.setUMask(umask_for_fc1); // Screws up umask for fc2 also!
> This was not the case before HADOOP-13440.
> h4. Symptoms:
> h5. Scenario I ran into
> When trying to localize a HDFS directory (hdfs:///my/dir/1.txt), NodeManager 
> tries to replicate the directory structure on the local file-system 
> ($yarn-local-dirs/filecache/my/dir/1.txt).
> Now depending on whether NM has ever done a log-aggregation (completely 
> unrelated code that sets umask to be 137 for its own files on HDFS), the 
> directories /my and /my/dir on local-fs may have different permissions. In 
> the specific case where NM did log-aggregation, /my/dir was created with 137 
> umask and so localization of 1.txt completely failed due to absent directory 
> executable permissions!
> h5. Previous scenarios:
> We ran into this before in test-cases and instead of fixing the root-cause, 
> we just fixed the test-cases: YARN-5679 / YARN-5749






[jira] [Updated] (HADOOP-15571) After HADOOP-13440, multiple filesystems/file-contexts created with the same Configuration object are forced to have the same umask

2018-07-03 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15571:
-
Status: Patch Available  (was: Open)

> After HADOOP-13440, multiple filesystems/file-contexts created with the same 
> Configuration object are forced to have the same umask
> ---
>
> Key: HADOOP-15571
> URL: https://issues.apache.org/jira/browse/HADOOP-15571
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: HADOOP-15571.1.txt, HADOOP-15571.txt
>
>
> Ran into a super hard-to-debug issue due to this. [Edit: Turns out the same 
> issue as YARN-5749 that [~Tao Yang] ran into]
> h4. Issue
> Configuration conf = new Configuration();
>  fc1 = FileContext.getFileContext(uri1, conf);
>  fc2 = FileContext.getFileContext(uri2, conf);
>  fc1.setUMask(umask_for_fc1); // Screws up umask for fc2 also!
> This was not the case before HADOOP-13440.
> h4. Symptoms:
> h5. Scenario I ran into
> When trying to localize a HDFS directory (hdfs:///my/dir/1.txt), NodeManager 
> tries to replicate the directory structure on the local file-system 
> ($yarn-local-dirs/filecache/my/dir/1.txt).
> Now depending on whether NM has ever done a log-aggregation (completely 
> unrelated code that sets umask to be 137 for its own files on HDFS), the 
> directories /my and /my/dir on local-fs may have different permissions. In 
> the specific case where NM did log-aggregation, /my/dir was created with 137 
> umask and so localization of 1.txt completely failed due to absent directory 
> executable permissions!
> h5. Previous scenarios:
> We ran into this before in test-cases and instead of fixing the root-cause, 
> we just fixed the test-cases: YARN-5679 / YARN-5749






[jira] [Assigned] (HADOOP-15571) After HADOOP-13440, multiple filesystems/file-contexts created with the same Configuration object are forced to have the same umask

2018-07-01 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned HADOOP-15571:


Assignee: Vinod Kumar Vavilapalli

Thanks for the review [~xyao]!
{quote}FileContext is assumed to be a per-process-wide setting. If uri1 and uri2 
are under different contexts, should they use different Configuration objects? 
This way, the existing logic will be able to handle it properly.
{quote}
That was my first approach too, but there are too many uses of FileContext (in 
YARN and potentially in downstream projects as well) and forcing all of them 
to create a new Configuration object is not right. Creating a new 
Configuration object per URI was not needed in 2.x.

> After HADOOP-13440, multiple filesystems/file-contexts created with the same 
> Configuration object are forced to have the same umask
> ---
>
> Key: HADOOP-15571
> URL: https://issues.apache.org/jira/browse/HADOOP-15571
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: HADOOP-15571.txt
>
>
> Ran into a super hard-to-debug issue due to this. [Edit: Turns out the same 
> issue as YARN-5749 that [~Tao Yang] ran into]
> h4. Issue
> Configuration conf = new Configuration();
>  fc1 = FileContext.getFileContext(uri1, conf);
>  fc2 = FileContext.getFileContext(uri2, conf);
>  fc1.setUMask(umask_for_fc1); // Screws up umask for fc2 also!
> This was not the case before HADOOP-13440.
> h4. Symptoms:
> h5. Scenario I ran into
> When trying to localize a HDFS directory (hdfs:///my/dir/1.txt), NodeManager 
> tries to replicate the directory structure on the local file-system 
> ($yarn-local-dirs/filecache/my/dir/1.txt).
> Now depending on whether NM has ever done a log-aggregation (completely 
> unrelated code that sets umask to be 137 for its own files on HDFS), the 
> directories /my and /my/dir on local-fs may have different permissions. In 
> the specific case where NM did log-aggregation, /my/dir was created with 137 
> umask and so localization of 1.txt completely failed due to absent directory 
> executable permissions!
> h5. Previous scenarios:
> We ran into this before in test-cases and instead of fixing the root-cause, 
> we just fixed the test-cases: YARN-5679 / YARN-5749






[jira] [Updated] (HADOOP-15571) After HADOOP-13440, multiple filesystems/file-contexts created with the same Configuration object are forced to have the same umask

2018-06-28 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15571:
-
Description: 
Ran into a super hard-to-debug issue due to this. [Edit: Turns out the same 
issue as YARN-5749 that [~Tao Yang] ran into]

h4. Issue

Configuration conf = new Configuration();
 fc1 = FileContext.getFileContext(uri1, conf);
 fc2 = FileContext.getFileContext(uri2, conf);
 fc1.setUMask(umask_for_fc1); // Screws up umask for fc2 also!

This was not the case before HADOOP-13440.
h4. Symptoms:
h5. Scenario I ran into

When trying to localize a HDFS directory (hdfs:///my/dir/1.txt), NodeManager 
tries to replicate the directory structure on the local file-system 
($yarn-local-dirs/filecache/my/dir/1.txt).

Now depending on whether NM has ever done a log-aggregation (completely 
unrelated code that sets umask to be 137 for its own files on HDFS), the 
directories /my and /my/dir on local-fs may have different permissions. In the 
specific case where NM did log-aggregation, /my/dir was created with 137 umask 
and so localization of 1.txt completely failed due to absent directory 
executable permissions!
h5. Previous scenarios:

We ran into this before in test-cases and instead of fixing the root-cause, we 
just fixed the test-cases: YARN-5679 / YARN-5749

  was:
Ran into a super hard-to-debug due to this.
h4. Issue

Configuration conf = new Configuration();
 fc1 = FileContext.getFileContext(uri1, conf);
 fc2 = FileContext.getFileContext(uri2, conf);
 fc1.setUMask(umask_for_fc1); // Screws up umask for fc2 also!

This was not the case before HADOOP-13440.
h4. Symptoms:
h5. Scenario I ran into

When trying to localize a HDFS directory (hdfs:///my/dir/1.txt), NodeManager 
tries to replicate the directory structure on the local file-system 
($yarn-local-dirs/filecache/my/dir/1.txt).

Now depending on whether NM has ever done a log-aggregation (completely 
unrelated code that sets umask to be 137 for its own files on HDFS), the 
directories /my and /my/dir on local-fs may have different permissions. In the 
specific case where NM did log-aggregation, /my/dir was created with 137 umask 
and so localization of 1.txt completely failed due to absent directory 
executable permissions!
h5. Previous scenarios:

We ran into this before in test-cases and instead of fixing the root-cause, we 
just fixed the test-cases: YARN-5679 / YARN-5749


> After HADOOP-13440, multiple filesystems/file-contexts created with the same 
> Configuration object are forced to have the same umask
> ---
>
> Key: HADOOP-15571
> URL: https://issues.apache.org/jira/browse/HADOOP-15571
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: HADOOP-15571.txt
>
>
> Ran into a super hard-to-debug issue due to this. [Edit: Turns out the same 
> issue as YARN-5749 that [~Tao Yang] ran into]
> h4. Issue
> Configuration conf = new Configuration();
>  fc1 = FileContext.getFileContext(uri1, conf);
>  fc2 = FileContext.getFileContext(uri2, conf);
>  fc1.setUMask(umask_for_fc1); // Screws up umask for fc2 also!
> This was not the case before HADOOP-13440.
> h4. Symptoms:
> h5. Scenario I ran into
> When trying to localize a HDFS directory (hdfs:///my/dir/1.txt), NodeManager 
> tries to replicate the directory structure on the local file-system 
> ($yarn-local-dirs/filecache/my/dir/1.txt).
> Now depending on whether NM has ever done a log-aggregation (completely 
> unrelated code that sets umask to be 137 for its own files on HDFS), the 
> directories /my and /my/dir on local-fs may have different permissions. In 
> the specific case where NM did log-aggregation, /my/dir was created with 137 
> umask and so localization of 1.txt completely failed due to absent directory 
> executable permissions!
> h5. Previous scenarios:
> We ran into this before in test-cases and instead of fixing the root-cause, 
> we just fixed the test-cases: YARN-5679 / YARN-5749






[jira] [Updated] (HADOOP-15571) After HADOOP-13440, multiple filesystems/file-contexts created with the same Configuration object are forced to have the same umask

2018-06-28 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15571:
-
Status: Patch Available  (was: Open)

Attached a patch that does the above and reverts YARN-5679.

> After HADOOP-13440, multiple filesystems/file-contexts created with the same 
> Configuration object are forced to have the same umask
> ---
>
> Key: HADOOP-15571
> URL: https://issues.apache.org/jira/browse/HADOOP-15571
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: HADOOP-15571.txt
>
>
> Ran into a super hard-to-debug issue due to this.
> h4. Issue
> Configuration conf = new Configuration();
>  fc1 = FileContext.getFileContext(uri1, conf);
>  fc2 = FileContext.getFileContext(uri2, conf);
>  fc1.setUMask(umask_for_fc1); // Screws up umask for fc2 also!
> This was not the case before HADOOP-13440.
> h4. Symptoms:
> h5. Scenario I ran into
> When trying to localize a HDFS directory (hdfs:///my/dir/1.txt), NodeManager 
> tries to replicate the directory structure on the local file-system 
> ($yarn-local-dirs/filecache/my/dir/1.txt).
> Now depending on whether NM has ever done a log-aggregation (completely 
> unrelated code that sets umask to be 137 for its own files on HDFS), the 
> directories /my and /my/dir on local-fs may have different permissions. In 
> the specific case where NM did log-aggregation, /my/dir was created with 137 
> umask and so localization of 1.txt completely failed due to absent directory 
> executable permissions!
> h5. Previous scenarios:
> We ran into this before in test-cases and instead of fixing the root-cause, 
> we just fixed the test-cases: YARN-5679 / YARN-5749






[jira] [Updated] (HADOOP-15571) After HADOOP-13440, multiple filesystems/file-contexts created with the same Configuration object are forced to have the same umask

2018-06-28 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15571:
-
Attachment: HADOOP-15571.txt

> After HADOOP-13440, multiple filesystems/file-contexts created with the same 
> Configuration object are forced to have the same umask
> ---
>
> Key: HADOOP-15571
> URL: https://issues.apache.org/jira/browse/HADOOP-15571
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: HADOOP-15571.txt
>
>
> Ran into a super hard-to-debug issue due to this.
> h4. Issue
> Configuration conf = new Configuration();
>  fc1 = FileContext.getFileContext(uri1, conf);
>  fc2 = FileContext.getFileContext(uri2, conf);
>  fc1.setUMask(umask_for_fc1); // Screws up umask for fc2 also!
> This was not the case before HADOOP-13440.
> h4. Symptoms:
> h5. Scenario I ran into
> When trying to localize a HDFS directory (hdfs:///my/dir/1.txt), NodeManager 
> tries to replicate the directory structure on the local file-system 
> ($yarn-local-dirs/filecache/my/dir/1.txt).
> Now depending on whether NM has ever done a log-aggregation (completely 
> unrelated code that sets umask to be 137 for its own files on HDFS), the 
> directories /my and /my/dir on local-fs may have different permissions. In 
> the specific case where NM did log-aggregation, /my/dir was created with 137 
> umask and so localization of 1.txt completely failed due to absent directory 
> executable permissions!
> h5. Previous scenarios:
> We ran into this before in test-cases and instead of fixing the root-cause, 
> we just fixed the test-cases: YARN-5679 / YARN-5749






[jira] [Commented] (HADOOP-15571) After HADOOP-13440, multiple filesystems/file-contexts created with the same Configuration object are forced to have the same umask

2018-06-28 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526904#comment-16526904
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-15571:
--

From my limited understanding of all of this:

This is a regression from 2.x to 3.x.

Essentially there were two ways to set umask in 2.x.
 # FileContext.setUMask(myUMask) // Previously per file-context
 # conf.set("fs.permissions.umask-mode", myUmask) // Global

In 3.x, there is only one. Either (the configuration from a file) or (any-call 
by code anywhere in the same JVM for any file-context) will set umask for *all* 
file-systems.

IMO and IIUC, we should support both - with the FileContext.setUMask() taking 
precedence if it was explicitly called. /cc [~ajisakaa], [~boky01], [~yufeigu], 
[~steve_l] for HADOOP-13440 & related patches.
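
The precedence proposed above can be sketched with a toy model in Java. This is not Hadoop code and not the attached patch: the `Ctx` class stands in for a FileContext, a plain `Map` stands in for the shared Configuration, and only octal umask strings are parsed. The point it illustrates is that an explicit setUMask() call is remembered per context and wins, while contexts without one keep following the shared conf.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of "explicit setUMask() wins over conf"; all names are hypothetical. */
final class UmaskPrecedence {
    static final String UMASK_KEY = "fs.permissions.umask-mode";
    static final int DEFAULT_UMASK = 0022;

    /** Stand-in for a FileContext that shares a configuration map with others. */
    static final class Ctx {
        private final Map<String, String> conf;
        private Integer explicitUmask; // null until setUMask() is called

        Ctx(Map<String, String> conf) {
            this.conf = conf;
        }

        /** Records an explicit umask for this context only. */
        void setUMask(int umask) {
            explicitUmask = umask;
        }

        /** Explicit value if set; otherwise the current conf value (octal). */
        int getUMask() {
            if (explicitUmask != null) {
                return explicitUmask;
            }
            String value = conf.get(UMASK_KEY);
            return value == null ? DEFAULT_UMASK : Integer.parseInt(value, 8);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        Ctx fc1 = new Ctx(conf);
        Ctx fc2 = new Ctx(conf);

        fc1.setUMask(0137);                       // explicit call on fc1 only
        conf.put(UMASK_KEY, "077");               // later conf update

        System.out.printf("fc1=%03o fc2=%03o%n", fc1.getUMask(), fc2.getUMask());
        // prints "fc1=137 fc2=077"
    }
}
```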

> After HADOOP-13440, multiple filesystems/file-contexts created with the same 
> Configuration object are forced to have the same umask
> ---
>
> Key: HADOOP-15571
> URL: https://issues.apache.org/jira/browse/HADOOP-15571
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Priority: Critical
>
> Ran into a super hard-to-debug issue due to this.
> h4. Issue
> Configuration conf = new Configuration();
>  fc1 = FileContext.getFileContext(uri1, conf);
>  fc2 = FileContext.getFileContext(uri2, conf);
>  fc1.setUMask(umask_for_fc1); // Screws up umask for fc2 also!
> This was not the case before HADOOP-13440.
> h4. Symptoms:
> h5. Scenario I ran into
> When trying to localize a HDFS directory (hdfs:///my/dir/1.txt), NodeManager 
> tries to replicate the directory structure on the local file-system 
> ($yarn-local-dirs/filecache/my/dir/1.txt).
> Now depending on whether NM has ever done a log-aggregation (completely 
> unrelated code that sets umask to be 137 for its own files on HDFS), the 
> directories /my and /my/dir on local-fs may have different permissions. In 
> the specific case where NM did log-aggregation, /my/dir was created with 137 
> umask and so localization of 1.txt completely failed due to absent directory 
> executable permissions!
> h5. Previous scenarios:
> We ran into this before in test-cases and instead of fixing the root-cause, 
> we just fixed the test-cases: YARN-5679 / YARN-5749






[jira] [Created] (HADOOP-15571) After HADOOP-13440, multiple filesystems/file-contexts created with the same Configuration object are forced to have the same umask

2018-06-28 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created HADOOP-15571:


 Summary: After HADOOP-13440, multiple filesystems/file-contexts 
created with the same Configuration object are forced to have the same umask
 Key: HADOOP-15571
 URL: https://issues.apache.org/jira/browse/HADOOP-15571
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli


Ran into a super hard-to-debug issue due to this.
h4. Issue

Configuration conf = new Configuration();
 fc1 = FileContext.getFileContext(uri1, conf);
 fc2 = FileContext.getFileContext(uri2, conf);
 fc1.setUMask(umask_for_fc1); // Screws up umask for fc2 also!

This was not the case before HADOOP-13440.
h4. Symptoms:
h5. Scenario I ran into

When trying to localize a HDFS directory (hdfs:///my/dir/1.txt), NodeManager 
tries to replicate the directory structure on the local file-system 
($yarn-local-dirs/filecache/my/dir/1.txt).

Now depending on whether NM has ever done a log-aggregation (completely 
unrelated code that sets umask to be 137 for its own files on HDFS), the 
directories /my and /my/dir on local-fs may have different permissions. In the 
specific case where NM did log-aggregation, /my/dir was created with 137 umask 
and so localization of 1.txt completely failed due to absent directory 
executable permissions!
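
To see why a 137 umask leaves a directory without executable permissions, here is a small worked example in Java. It assumes the directory is requested with mode 0777 and that the umask is applied in the usual POSIX way (mode & ~umask); the class and method names are made up for illustration.

```java
/** Worked example of applying a umask to a requested mode (illustrative names). */
final class UmaskDemo {
    private UmaskDemo() {}

    /** POSIX-style application of a umask: clear every bit set in the umask. */
    static int applyUmask(int requestedMode, int umask) {
        return requestedMode & ~umask & 0777;
    }

    /** Formats a mode as three octal digits. */
    static String toOctal(int mode) {
        return String.format("%03o", mode);
    }

    public static void main(String[] args) {
        int dirMode = applyUmask(0777, 0137);
        System.out.println(toOctal(dirMode));        // prints "640" (rw-r-----)
        System.out.println((dirMode & 0111) != 0);   // prints "false": no execute bit at all
    }
}
```

With mode 640 nobody, including the owner, can traverse the directory, which matches the failed localization of 1.txt described above.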
h5. Previous scenarios:

We ran into this before in test-cases and instead of fixing the root-cause, we 
just fixed the test-cases: YARN-5679 / YARN-5749






[jira] [Commented] (HADOOP-15518) Authentication filter calling handler after request already authenticated

2018-06-26 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524265#comment-16524265
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-15518:
--

bq. I have changed getUserPrincipal to getRemoteUser and this change seems 
working fine. 
That was my first solution too. [~kminder], would that work?

> Authentication filter calling handler after request already authenticated
> -
>
> Key: HADOOP-15518
> URL: https://issues.apache.org/jira/browse/HADOOP-15518
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.7.1
>Reporter: Kevin Minder
>Assignee: Kevin Minder
>Priority: Major
> Attachments: HADOOP-15518-001.patch
>
>
> The hadoop-auth AuthenticationFilter will invoke its handler even if a prior 
> successful authentication has occurred in the current request.  This 
> primarily affects situations where multiple authentication mechanisms have been 
> configured.  For example, when core-site.xml has 
> hadoop.http.authentication.type=kerberos and yarn-site.xml has 
> yarn.timeline-service.http-authentication.type=kerberos, the result is an 
> attempt to perform two Kerberos authentications for the same request.  This 
> in turn results in Kerberos triggering a replay attack detection.  The 
> javadocs for AuthenticationHandler 
> (https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/AuthenticationHandler.java)
>  indicate for the authenticate method that
> {quote}This method is invoked by the AuthenticationFilter only if the HTTP 
> client request is not yet authenticated.
> {quote}
> This does not appear to be the case in practice.
> I've created a patch and tested it on a limited number of functional use cases 
> (e.g. the timeline-service issue noted above).  If there is general agreement 
> that the change is valid, I'll add unit tests to the patch.
>  






[jira] [Commented] (HADOOP-14163) Refactor existing hadoop site to use more usable static website generator

2018-06-20 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518730#comment-16518730
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-14163:
--

Got pinged about this offline.

Thanks for keeping at it, [~elek]!

I think there are two road-blocks here:
 (1) Is the mechanism used to build the website good enough - mvn-site / 
hugo etc?
 (2) Is the new website good enough?

For (1), I just think we need more committer attention and get feedback rapidly 
on this Jira and get it in.

For (2), how about we do it in a different way in the interest of progress?
 - We create a hadoop.apache.org/new-site/ where this new site goes.
 - We then modify the existing web-site to say that there is a new 
site/experience that folks can click on a link and navigate to
 - As this new website matures and gets feedback & fixes, we finally pull the 
plug at a later point of time when we think we are good to go.

Thoughts?

> Refactor existing hadoop site to use more usable static website generator
> -
>
> Key: HADOOP-14163
> URL: https://issues.apache.org/jira/browse/HADOOP-14163
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: site
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HADOOP-14163-001.zip, HADOOP-14163-002.zip, 
> HADOOP-14163-003.zip, HADOOP-14163.004.patch, HADOOP-14163.005.patch, 
> HADOOP-14163.006.patch, HADOOP-14163.007.patch, HADOOP-14163.008.tar.gz, 
> HADOOP-14163.009.patch, HADOOP-14163.009.tar.gz, hadoop-site.tar.gz, 
> hadop-site-rendered.tar.gz
>
>
> From the dev mailing list:
> "Publishing can be attacked via a mix of scripting and revamping the darned 
> website. Forrest is pretty bad compared to the newer static site generators 
> out there (e.g. need to write XML instead of markdown, it's hard to review a 
> staging site because of all the absolute links, hard to customize, did I 
> mention XML?), and the look and feel of the site is from the 00s. We don't 
> actually have that much site content, so it should be possible to migrate to 
> a new system."
> This issue is to find a solution to migrate the old site to a new modern static 
> site generator using a more contemporary theme.
> Goals: 
>  * existing links should work (or at least be redirected)
>  * It should be easy to add more content required by a release automatically 
> (most probably by creating separate markdown files)






[jira] [Commented] (HADOOP-15527) loop until TIMEOUT before sending kill -9

2018-06-18 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516364#comment-16516364
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-15527:
--

bq. FYI: this patch is a bit fragile due to some assumptions made about the 
environment.
[~aw], I did try to minimize problems like that to my knowledge, but if you 
can point out the specific issues, I can fix them.

> loop until TIMEOUT before sending kill -9
> -
>
> Key: HADOOP-15527
> URL: https://issues.apache.org/jira/browse/HADOOP-15527
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: HADOOP-15527.1.txt, HADOOP-15527.2.txt, HADOOP-15527.txt
>
>
> I'm seeing that sometimes daemons keep running for a little while even after 
> "kill -9" from daemon-stop scripts.
> Debugging more, I see several instances of "ERROR: Unable to kill ${pid}".
> Saw this specifically with ResourceManager & NodeManager -  {{yarn --daemon 
> stop nodemanager}}. Though it is possible that other daemons may run into 
> this too.
> Saw this on both CentOS as well as Ubuntu.
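
The idea in the issue title, polling until a timeout before escalating to a forced kill, can be sketched in Java as follows. This is only an illustration of the control flow, not the shell patch attached to the issue: the class and method names, the 100 ms poll interval and the use of `ProcessHandle` (Java 9+) are assumptions. On Linux, `destroy()` typically sends SIGTERM and `destroyForcibly()` SIGKILL.

```java
import java.util.function.BooleanSupplier;

/** Sketch of "wait up to a timeout, then escalate"; names are hypothetical. */
final class StopWithTimeout {
    private StopWithTimeout() {}

    /**
     * Polls isAlive until it reports false or the timeout elapses.
     * Returns true if the process was seen to exit, false on timeout or interrupt.
     */
    static boolean waitForExit(BooleanSupplier isAlive, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (isAlive.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
        return true;
    }

    /** Requests a graceful stop, waits, and only then forces termination. */
    static boolean stop(ProcessHandle process, long timeoutMillis) {
        process.destroy();
        if (waitForExit(process::isAlive, timeoutMillis, 100)) {
            return true;
        }
        process.destroyForcibly();
        // A forced kill is not instantaneous either, so poll again before reporting failure.
        return waitForExit(process::isAlive, timeoutMillis, 100);
    }
}
```

Polling again after the forced kill is what avoids reporting "Unable to kill" for a process that is merely slow to disappear.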






[jira] [Updated] (HADOOP-15527) Sometimes daemons keep running even after "kill -9" from daemon-stop script

2018-06-12 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15527:
-
Status: Patch Available  (was: Open)

Addressing shellcheck issue.

> Sometimes daemons keep running even after "kill -9" from daemon-stop script
> ---
>
> Key: HADOOP-15527
> URL: https://issues.apache.org/jira/browse/HADOOP-15527
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Major
> Attachments: HADOOP-15527.1.txt, HADOOP-15527.2.txt, HADOOP-15527.txt
>
>
> I'm seeing that sometimes daemons keep running for a little while even after 
> "kill -9" from daemon-stop scripts.
> Debugging more, I see several instances of "ERROR: Unable to kill ${pid}".
> Saw this specifically with ResourceManager & NodeManager -  {{yarn --daemon 
> stop nodemanager}}. Though it is possible that other daemons may run into 
> this too.
> Saw this on both Centos as well as Ubuntu.






[jira] [Updated] (HADOOP-15527) Sometimes daemons keep running even after "kill -9" from daemon-stop script

2018-06-12 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15527:
-
Attachment: HADOOP-15527.2.txt

> Sometimes daemons keep running even after "kill -9" from daemon-stop script
> ---
>
> Key: HADOOP-15527
> URL: https://issues.apache.org/jira/browse/HADOOP-15527
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Major
> Attachments: HADOOP-15527.1.txt, HADOOP-15527.2.txt, HADOOP-15527.txt
>
>
> I'm seeing that sometimes daemons keep running for a little while even after 
> "kill -9" from daemon-stop scripts.
> Debugging more, I see several instances of "ERROR: Unable to kill ${pid}".
> Saw this specifically with ResourceManager & NodeManager -  {{yarn --daemon 
> stop nodemanager}}. Though it is possible that other daemons may run into 
> this too.
> Saw this on both Centos as well as Ubuntu.






[jira] [Updated] (HADOOP-15527) Sometimes daemons keep running even after "kill -9" from daemon-stop script

2018-06-12 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15527:
-
Status: Open  (was: Patch Available)

> Sometimes daemons keep running even after "kill -9" from daemon-stop script
> ---
>
> Key: HADOOP-15527
> URL: https://issues.apache.org/jira/browse/HADOOP-15527
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Major
> Attachments: HADOOP-15527.1.txt, HADOOP-15527.2.txt, HADOOP-15527.txt
>
>
> I'm seeing that sometimes daemons keep running for a little while even after 
> "kill -9" from daemon-stop scripts.
> Debugging more, I see several instances of "ERROR: Unable to kill ${pid}".
> Saw this specifically with ResourceManager & NodeManager -  {{yarn --daemon 
> stop nodemanager}}. Though it is possible that other daemons may run into 
> this too.
> Saw this on both Centos as well as Ubuntu.






[jira] [Updated] (HADOOP-15527) Sometimes daemons keep running even after "kill -9" from daemon-stop script

2018-06-12 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15527:
-
Status: Patch Available  (was: Open)

Uploaded a new patch addressing the complaints from Jenkins.

> Sometimes daemons keep running even after "kill -9" from daemon-stop script
> ---
>
> Key: HADOOP-15527
> URL: https://issues.apache.org/jira/browse/HADOOP-15527
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Major
> Attachments: HADOOP-15527.1.txt, HADOOP-15527.txt
>
>
> I'm seeing that sometimes daemons keep running for a little while even after 
> "kill -9" from daemon-stop scripts.
> Debugging more, I see several instances of "ERROR: Unable to kill ${pid}".
> Saw this specifically with ResourceManager & NodeManager -  {{yarn --daemon 
> stop nodemanager}}. Though it is possible that other daemons may run into 
> this too.
> Saw this on both Centos as well as Ubuntu.






[jira] [Updated] (HADOOP-15527) Sometimes daemons keep running even after "kill -9" from daemon-stop script

2018-06-12 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15527:
-
Status: Open  (was: Patch Available)

bq. In the batch file, hadoop_stop_daemon is renamed to 
hadoop_stop_daemon_changing_pid. Is this change necessary?
It's not necessary, but earlier there was only one test, so it was okay to 
have a generic name. We now have two tests, so I renamed it to disambiguate 
what each test is doing.

> Sometimes daemons keep running even after "kill -9" from daemon-stop script
> ---
>
> Key: HADOOP-15527
> URL: https://issues.apache.org/jira/browse/HADOOP-15527
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Major
> Attachments: HADOOP-15527.1.txt, HADOOP-15527.txt
>
>
> I'm seeing that sometimes daemons keep running for a little while even after 
> "kill -9" from daemon-stop scripts.
> Debugging more, I see several instances of "ERROR: Unable to kill ${pid}".
> Saw this specifically with ResourceManager & NodeManager -  {{yarn --daemon 
> stop nodemanager}}. Though it is possible that other daemons may run into 
> this too.
> Saw this on both Centos as well as Ubuntu.






[jira] [Updated] (HADOOP-15527) Sometimes daemons keep running even after "kill -9" from daemon-stop script

2018-06-12 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15527:
-
Attachment: HADOOP-15527.1.txt

> Sometimes daemons keep running even after "kill -9" from daemon-stop script
> ---
>
> Key: HADOOP-15527
> URL: https://issues.apache.org/jira/browse/HADOOP-15527
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Major
> Attachments: HADOOP-15527.1.txt, HADOOP-15527.txt
>
>
> I'm seeing that sometimes daemons keep running for a little while even after 
> "kill -9" from daemon-stop scripts.
> Debugging more, I see several instances of "ERROR: Unable to kill ${pid}".
> Saw this specifically with ResourceManager & NodeManager -  {{yarn --daemon 
> stop nodemanager}}. Though it is possible that other daemons may run into 
> this too.
> Saw this on both Centos as well as Ubuntu.






[jira] [Commented] (HADOOP-15527) Sometimes daemons keep running even after "kill -9" from daemon-stop script

2018-06-12 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509272#comment-16509272
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-15527:
--

Uploaded a patch that loops around for HADOOP_STOP_TIMEOUT after kill -9.

It also optimizes the wait before kill -9. Earlier, we would always 
unnecessarily wait for HADOOP_STOP_TIMEOUT after SIGTERM, even if the process 
disappeared in the meantime.
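
For illustration only, the stop sequence described in this comment could be sketched roughly as below. This is not the attached patch: the function names wait_for_exit and stop_with_timeout are hypothetical, the one-second polling interval is an assumption, and liveness is checked with "kill -0" instead of the "ps" check used in hadoop-functions.sh.

```shell
#!/usr/bin/env bash

# Hypothetical helper: poll once per second until the process is gone
# or the timeout (in seconds) has elapsed.
# Returns 0 if the process exited, 1 if it is still alive after the timeout.
wait_for_exit() {
  local pid=$1
  local timeout=$2
  local waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Hypothetical stop routine: SIGTERM, bounded wait, SIGKILL, bounded wait.
stop_with_timeout() {
  local pid=$1
  local timeout=$2

  # Ask politely first; return as soon as the process disappears
  # instead of always sleeping for the full timeout.
  kill "$pid" 2>/dev/null
  if wait_for_exit "$pid" "$timeout"; then
    return 0
  fi

  # Still alive: escalate, then keep checking for up to the timeout,
  # because the process may take a moment to vanish even after SIGKILL.
  kill -9 "$pid" 2>/dev/null
  if wait_for_exit "$pid" "$timeout"; then
    return 0
  fi

  echo "ERROR: Unable to kill $pid" >&2
  return 1
}
```

A caller would invoke it as stop_with_timeout "${pid}" "${HADOOP_STOP_TIMEOUT}". The design point is that both waits are bounded polls, so a process that exits quickly costs no extra time and one that lingers briefly after kill -9 is not reported as an error.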

> Sometimes daemons keep running even after "kill -9" from daemon-stop script
> ---
>
> Key: HADOOP-15527
> URL: https://issues.apache.org/jira/browse/HADOOP-15527
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Major
> Attachments: HADOOP-15527.txt
>
>
> I'm seeing that sometimes daemons keep running for a little while even after 
> "kill -9" from daemon-stop scripts.
> Debugging more, I see several instances of "ERROR: Unable to kill ${pid}".
> Saw this specifically with ResourceManager & NodeManager -  {{yarn --daemon 
> stop nodemanager}}. Though it is possible that other daemons may run into 
> this too.
> Saw this on both Centos as well as Ubuntu.






[jira] [Updated] (HADOOP-15527) Sometimes daemons keep running even after "kill -9" from daemon-stop script

2018-06-12 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15527:
-
Status: Patch Available  (was: Open)

> Sometimes daemons keep running even after "kill -9" from daemon-stop script
> ---
>
> Key: HADOOP-15527
> URL: https://issues.apache.org/jira/browse/HADOOP-15527
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Major
> Attachments: HADOOP-15527.txt
>
>
> I'm seeing that sometimes daemons keep running for a little while even after 
> "kill -9" from daemon-stop scripts.
> Debugging more, I see several instances of "ERROR: Unable to kill ${pid}".
> Saw this specifically with ResourceManager & NodeManager -  {{yarn --daemon 
> stop nodemanager}}. Though it is possible that other daemons may run into 
> this too.
> Saw this on both Centos as well as Ubuntu.






[jira] [Updated] (HADOOP-15527) Sometimes daemons keep running even after "kill -9" from daemon-stop script

2018-06-12 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15527:
-
Attachment: HADOOP-15527.txt

> Sometimes daemons keep running even after "kill -9" from daemon-stop script
> ---
>
> Key: HADOOP-15527
> URL: https://issues.apache.org/jira/browse/HADOOP-15527
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Major
> Attachments: HADOOP-15527.txt
>
>
> I'm seeing that sometimes daemons keep running for a little while even after 
> "kill -9" from daemon-stop scripts.
> Debugging more, I see several instances of "ERROR: Unable to kill ${pid}".
> Saw this specifically with ResourceManager & NodeManager -  {{yarn --daemon 
> stop nodemanager}}. Though it is possible that other daemons may run into 
> this too.
> Saw this on both Centos as well as Ubuntu.






[jira] [Commented] (HADOOP-15527) Sometimes daemons keep running even after "kill -9" from daemon-stop script

2018-06-11 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508695#comment-16508695
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-15527:
--

Here is the output from some debug logging I added to 
hadoop/libexec/hadoop-functions.sh, after also adding a while loop around the 
"ps" check.
{code}
=== 2018-06-10 00:43:31,754 vinodkv inside scripts sending SIGTERM
=== 2018-06-10 00:43:31,756 vinodkv inside scripts SIGTERM sent, sleeping
=== 2018-06-10 00:43:36,759 vinodkv inside scripts 3989960 still alive! sending sig-kill
=== 2018-06-10 00:43:36,797 vinodkv inside scripts sigkill sent
=== 2018-06-10 00:43:36,827 vinodkv inside scripts.. unable to kill 3989960
=== 2018-06-10 00:43:36,846 vinodkv inside scripts.. unable to kill 3989960
=== 2018-06-10 00:43:36,866 vinodkv inside scripts.. unable to kill 3989960
=== 2018-06-10 00:43:36,885 vinodkv inside scripts.. unable to kill 3989960
=== 2018-06-10 00:43:36,904 vinodkv inside scripts.. unable to kill 3989960
=== 2018-06-10 00:43:36,924 vinodkv inside scripts.. process 3989960 finally dead
{code}
{code}
=== 2018-06-10 00:48:00,884 vinodkv inside scripts sending SIGTERM
=== 2018-06-10 00:48:00,886 vinodkv inside scripts SIGTERM sent, sleeping
=== 2018-06-10 00:48:05,890 vinodkv inside scripts 3992747 still alive! sending sig-kill
=== 2018-06-10 00:48:05,898 vinodkv inside scripts sigkill sent
=== 2018-06-10 00:48:05,921 vinodkv inside scripts.. unable to kill 3992747
=== 2018-06-10 00:48:05,938 vinodkv inside scripts.. unable to kill 3992747
=== 2018-06-10 00:48:05,953 vinodkv inside scripts.. unable to kill 3992747
=== 2018-06-10 00:48:05,970 vinodkv inside scripts.. unable to kill 3992747
=== 2018-06-10 00:48:05,987 vinodkv inside scripts.. unable to kill 3992747
=== 2018-06-10 00:48:06,006 vinodkv inside scripts.. unable to kill 3992747
=== 2018-06-10 00:48:06,024 vinodkv inside scripts.. unable to kill 3992747
=== 2018-06-10 00:48:06,042 vinodkv inside scripts.. process 3992747 finally dead
{code}

It takes roughly 125-145 milliseconds for the RM to come down once a "kill -9" 
is sent.

This may be due to system load.

I don't have any other explanation as to why this is only happening now.
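
As a sanity check on the figure above, the delay between the "sigkill sent" line and the "finally dead" line in each log can be recomputed from the timestamps. The helper name ms_between is hypothetical, and the sketch assumes both timestamps fall on the same day.

```shell
#!/usr/bin/env bash

# Hypothetical helper: milliseconds between two HH:MM:SS,mmm timestamps.
# Assumes both timestamps are on the same day.
ms_between() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    # Split on ":" and "," into hours, minutes, seconds, milliseconds.
    split(a, x, "[:,]")
    split(b, y, "[:,]")
    ta = ((x[1] * 60 + x[2]) * 60 + x[3]) * 1000 + x[4]
    tb = ((y[1] * 60 + y[2]) * 60 + y[3]) * 1000 + y[4]
    print tb - ta
  }'
}

# First run: "sigkill sent" to "process 3989960 finally dead"
ms_between "00:43:36,797" "00:43:36,924"   # prints 127

# Second run: "sigkill sent" to "process 3992747 finally dead"
ms_between "00:48:05,898" "00:48:06,042"   # prints 144
```

Both values fall inside the 125-145 millisecond range quoted above.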

> Sometimes daemons keep running even after "kill -9" from daemon-stop script
> ---
>
> Key: HADOOP-15527
> URL: https://issues.apache.org/jira/browse/HADOOP-15527
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Major
>
> I'm seeing that sometimes daemons keep running for a little while even after 
> "kill -9" from daemon-stop scripts.
> Debugging more, I see several instances of "ERROR: Unable to kill ${pid}".
> Saw this specifically with ResourceManager & NodeManager -  {{yarn --daemon 
> stop nodemanager}}. Though it is possible that other daemons may run into 
> this too.
> Saw this on both Centos as well as Ubuntu.






[jira] [Created] (HADOOP-15527) Sometimes daemons keep running even after "kill -9" from daemon-stop script

2018-06-11 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created HADOOP-15527:


 Summary: Sometimes daemons keep running even after "kill -9" from 
daemon-stop script
 Key: HADOOP-15527
 URL: https://issues.apache.org/jira/browse/HADOOP-15527
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


I'm seeing that sometimes daemons keep running for a little while even after 
"kill -9" from daemon-stop scripts.

Debugging more, I see several instances of "ERROR: Unable to kill ${pid}".

Saw this specifically with ResourceManager & NodeManager -  {{yarn --daemon 
stop nodemanager}}. Though it is possible that other daemons may run into this 
too.

Saw this on both Centos as well as Ubuntu.






[jira] [Moved] (HADOOP-15501) [Umbrella] Upgrade efforts to Hadoop 3.x

2018-05-29 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli moved YARN-8347 to HADOOP-15501:


Target Version/s: 3.2.0, 3.1.1  (was: 3.1.1)
  Issue Type: Task  (was: Bug)
 Key: HADOOP-15501  (was: YARN-8347)
 Project: Hadoop Common  (was: Hadoop YARN)

> [Umbrella] Upgrade efforts to Hadoop 3.x
> 
>
> Key: HADOOP-15501
> URL: https://issues.apache.org/jira/browse/HADOOP-15501
> Project: Hadoop Common
>  Issue Type: Task
>Reporter: Sunil Govindan
>Priority: Major
>
> This is an umbrella ticket to manage all similar efforts to close gaps for 
> upgrade efforts to 3.x.






[jira] [Commented] (HADOOP-15406) hadoop-nfs dependencies for mockito and junit are not test scope

2018-05-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484252#comment-16484252
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-15406:
--

This broke TimelineService V1.5 - just filed YARN-8338, please see.

> hadoop-nfs dependencies for mockito and junit are not test scope
> 
>
> Key: HADOOP-15406
> URL: https://issues.apache.org/jira/browse/HADOOP-15406
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.3
>
> Attachments: HADOOP-15406.001.patch
>
>
> hadoop-nfs asks for mockito-all and junit for its unit tests but it does not 
> mark the dependency as being required only for tests.






[jira] [Updated] (HADOOP-15390) Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens

2018-04-18 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15390:
-
Priority: Critical  (was: Major)
Target Version/s: 3.1.1

Can you please add which patch originally caused this and which versions are 
affected?

Tentatively putting target-version as 3.1.1 till then.

> Yarn RM logs flooded by DelegationTokenRenewer trying to renew KMS tokens
> -
>
> Key: HADOOP-15390
> URL: https://issues.apache.org/jira/browse/HADOOP-15390
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HADOOP-15390.01.patch, HADOOP-15390.02.patch
>
>
> When looking at a recent issue with [~rkanter] and [~yufeigu], we found that 
> the RM log in a cluster was flooded by KMS token renewal errors below:
> {noformat}
> $ tail -9 hadoop-cmf-yarn-RESOURCEMANAGER.log
> 2018-04-11 11:34:09,367 WARN 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: 
> keyProvider null cannot renew dt.
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: 
> (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, 
> maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; 
> apps=[]], for []
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, 
> renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, 
> sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 
> ms, appId = []
> ...
> 2018-04-11 11:34:09,367 WARN 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer: 
> keyProvider null cannot renew dt.
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renewed delegation-token= [Kind: kms-dt, Service: KMSIP:16000, Ident: 
> (kms-dt owner=user, renewer=yarn, realUser=, issueDate=1522192283334, 
> maxDate=1522797083334, sequenceNumber=15108613, masterKeyId=2674);exp=0; 
> apps=[]], for []
> 2018-04-11 11:34:09,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renew Kind: kms-dt, Service: KMSIP:16000, Ident: (kms-dt owner=user, 
> renewer=yarn, realUser=, issueDate=1522192283334, maxDate=1522797083334, 
> sequenceNumber=15108613, masterKeyId=2674);exp=0; apps=[] in -1523446449367 
> ms, appId = []
> {noformat}
> Further inspection shows the KMS IP is from another cluster. The RM is before 
> HADOOP-14445, so needs to read from config. The config rightfully doesn't 
> have the other cluster's KMS configured.
> Although HADOOP-14445 will make this a non-issue by creating the provider 
> from token service, we should fix 2 things here:
> - KMS token renewer should throw instead of return 0. Returning 0 when not 
> able to renew shall be considered a bug in the renewer.
> - Yarn RM's {{DelegationTokenRenewer}} service should validate the return and 
> not go into this busy loop.






[jira] [Resolved] (HADOOP-13500) Concurrency issues when using Configuration iterator

2018-04-02 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved HADOOP-13500.
--
Resolution: Duplicate

Already fixed by HADOOP-13556. Closing as a dup. Please reopen if that isn't 
the case.

> Concurrency issues when using Configuration iterator
> 
>
> Key: HADOOP-13500
> URL: https://issues.apache.org/jira/browse/HADOOP-13500
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: conf
>Reporter: Jason Lowe
>Assignee: Ajay Kumar
>Priority: Major
>
> It is possible to encounter a ConcurrentModificationException while trying to 
> iterate a Configuration object.  The iterator method tries to walk the 
> underlying Property object without proper synchronization, so another thread 
> simultaneously calling the set method can trigger it.






[jira] [Resolved] (HADOOP-15208) DistCp to offer -xtrack option to save src/dest filesets as alternative to delete()

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved HADOOP-15208.
--
   Resolution: Duplicate
Fix Version/s: (was: 3.1.0)

> DistCp to offer -xtrack  option to save src/dest filesets as 
> alternative to delete()
> --
>
> Key: HADOOP-15208
> URL: https://issues.apache.org/jira/browse/HADOOP-15208
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15208-001.patch, HADOOP-15208-002.patch, 
> HADOOP-15208-002.patch, HADOOP-15208-003.patch
>
>
> There are opportunities to improve distcp delete performance and scalability 
> with object stores, but you need to test with production datasets to 
> determine if the optimizations work, don't run out of memory, etc.
> By adding the option to save the sequence files of source, dest listings, 
> people (myself included) can experiment with different strategies before 
> trying to commit one which doesn't scale






[jira] [Reopened] (HADOOP-15208) DistCp to offer -xtrack option to save src/dest filesets as alternative to delete()

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened HADOOP-15208:
--

Reopening and closing this as a dup of HADOOP-15209 instead, as I can't find 
any patch for this JIRA in 3.1.0. Please revert if this is incorrect.

> DistCp to offer -xtrack  option to save src/dest filesets as 
> alternative to delete()
> --
>
> Key: HADOOP-15208
> URL: https://issues.apache.org/jira/browse/HADOOP-15208
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HADOOP-15208-001.patch, HADOOP-15208-002.patch, 
> HADOOP-15208-002.patch, HADOOP-15208-003.patch
>
>
> There are opportunities to improve distcp delete performance and scalability 
> with object stores, but you need to test with production datasets to 
> determine if the optimizations work, don't run out of memory, etc.
> By adding the option to save the sequence files of source, dest listings, 
> people (myself included) can experiment with different strategies before 
> trying to commit one which doesn't scale






[jira] [Resolved] (HADOOP-14974) org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation fails in trunk

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved HADOOP-14974.
--
   Resolution: Duplicate
Fix Version/s: (was: 3.1.0)

> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
>  fails in trunk
> ---
>
> Key: HADOOP-14974
> URL: https://issues.apache.org/jira/browse/HADOOP-14974
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: John Zhuge
>Priority: Blocker
>
> {code}
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> QueueMetrics,q0=root already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:239)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueMetrics.forQueue(CSQueueMetrics.java:141)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.<init>(AbstractCSQueue.java:131)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.<init>(ParentQueue.java:90)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:267)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.initializeQueues(CapacitySchedulerQueueManager.java:158)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:639)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:331)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:391)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:756)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1152)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:317)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1313)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:161)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:140)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:136)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation.testExcessReservationThanNodeManagerCapacity(TestContainerAllocation.java:90)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}






[jira] [Reopened] (HADOOP-14974) org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation fails in trunk

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened HADOOP-14974:
--

HADOOP-14954 never made it to a release. And there's no other patch for this 
JIRA in the 3.1.0 release. Reopening and closing this instead as a dup.

> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
>  fails in trunk
> ---
>
> Key: HADOOP-14974
> URL: https://issues.apache.org/jira/browse/HADOOP-14974
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: John Zhuge
>Priority: Blocker
> Fix For: 3.1.0
>
>
> {code}
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> QueueMetrics,q0=root already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:239)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueMetrics.forQueue(CSQueueMetrics.java:141)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.<init>(AbstractCSQueue.java:131)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.<init>(ParentQueue.java:90)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:267)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.initializeQueues(CapacitySchedulerQueueManager.java:158)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:639)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:331)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:391)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:756)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1152)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:317)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1313)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:161)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:140)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.<init>(MockRM.java:136)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation.testExcessReservationThanNodeManagerCapacity(TestContainerAllocation.java:90)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}
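
The exception in the trace is raised when a metrics source name is registered a second time. As a minimal sketch of that failure mode, assuming a hypothetical SourceRegistry class (this is not Hadoop's DefaultMetricsSystem, and the reset() method is an illustrative assumption), a registry that rejects duplicate names behaves like this:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a process-wide metrics registry; not Hadoop's API. */
final class SourceRegistry {
    private final Set<String> names = new HashSet<>();

    /** Registers a source name, rejecting duplicates as in the trace above. */
    void register(String name) {
        if (!names.add(name)) {
            throw new IllegalStateException("Metrics source " + name + " already exists!");
        }
    }

    /** Clears all registrations, for example between test cases. */
    void reset() {
        names.clear();
    }
}
```

If a test were to create two schedulers against the same registry without a reset in between, the second registration of the root queue's source name would fail in this way.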






[jira] [Resolved] (HADOOP-14714) handle InternalError in bulk object delete through retries

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved HADOOP-14714.
--
   Resolution: Duplicate
Fix Version/s: (was: 3.1.0)

> handle InternalError in bulk object delete through retries
> --
>
> Key: HADOOP-14714
> URL: https://issues.apache.org/jira/browse/HADOOP-14714
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> There's some more detail appearing on HADOOP-11572 about the errors seen 
> here; sounds like it's large fileset related (or just probability working 
> against you). Most importantly: retries may make it go away. 
> Proposed: implement a retry policy.
> Issue: delete is not idempotent, not if someone else adds things.






[jira] [Reopened] (HADOOP-14714) handle InternalError in bulk object delete through retries

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened HADOOP-14714:
--

Reopening and closing this instead as a dup of HADOOP-13786 as I don't find any 
patch for this in the source-code. Please revert back if this is incorrect.

> handle InternalError in bulk object delete through retries
> --
>
> Key: HADOOP-14714
> URL: https://issues.apache.org/jira/browse/HADOOP-14714
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> There's some more detail appearing on HADOOP-11572 about the errors seen 
> here; sounds like it's large fileset related (or just probability working 
> against you). Most importantly: retries may make it go away. 
> Proposed: implement a retry policy.
> Issue: delete is not idempotent, not if someone else adds things.






[jira] [Updated] (HADOOP-14585) Ensure controls in-place to prevent clients with significant clock skews pruning aggressively

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-14585:
-
Fix Version/s: (was: 3.1.0)

Removing fix-version from this open JIRA - it is instead set during commit-time.

> Ensure controls in-place to prevent clients with significant clock skews 
> pruning aggressively
> -
>
> Key: HADOOP-14585
> URL: https://issues.apache.org/jira/browse/HADOOP-14585
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Sean Mackrory
>Priority: Minor
>
> From discussion on HADOOP-14499:
> {quote}
> bear in mind that we can't guarantee that the clocks of all clients are in 
> sync; you don't want a client whose TZ setting is wrong to aggressively prune 
> things. Had that happen in production with files in shared filestore. This is 
> why ant -diagnostics checks time consistency with temp files...
> {quote}
> {quote}
> temp files work on a shared FS. AWS is actually somewhat sensitive to clocks: 
> if your VM is too far out of time then auth actually fails, it's ~±15 
> minutes. There's some stuff in the Java SDK to actually calculate and adjust 
> clock skew, presumably parsing the timestamp of a failure, calculating the 
> difference and retrying. Which means that the field in SDKGlobalConfiguration 
> could help identify the difference between local time and AWS time.
> {quote}
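
One way to read the discussion quoted above is as a guard that refuses to prune when the local clock disagrees too much with a reference clock. The sketch below is only that: ClockSkewGuard, its method names and the tolerance are hypothetical, and where the reference time comes from (for example the timestamp of a temp file written to the shared store) is left to the caller.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical guard; names and the tolerance value are illustrative assumptions. */
final class ClockSkewGuard {
    private final Duration maxSkew;

    ClockSkewGuard(Duration maxSkew) {
        this.maxSkew = maxSkew;
    }

    /** Absolute difference between the local clock and a reference clock. */
    static Duration skew(Instant localNow, Instant referenceNow) {
        return Duration.between(referenceNow, localNow).abs();
    }

    /** Pruning is allowed only when the measured skew is within tolerance. */
    boolean mayPrune(Instant localNow, Instant referenceNow) {
        return skew(localNow, referenceNow).compareTo(maxSkew) <= 0;
    }
}
```

A client whose clock is an hour off would then skip pruning rather than delete entries that only look old from its point of view.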






[jira] [Updated] (HADOOP-14531) [Umbrella] Improve S3A error handling & reporting

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-14531:
-
Summary: [Umbrella] Improve S3A error handling & reporting  (was: Improve 
S3A error handling & reporting)

> [Umbrella] Improve S3A error handling & reporting
> -
>
> Key: HADOOP-14531
> URL: https://issues.apache.org/jira/browse/HADOOP-14531
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Fix For: 3.1.0
>
>
> Improve S3a error handling and reporting
> this includes
> # looking at error codes and translating to more specific exceptions
> # better retry logic where present
> # adding retry logic where not present
> # more diagnostics in exceptions 
> # docs
> Overall goals
> * things that can be retried and will go away are retried for a bit
> * things that don't go away when retried failfast (302, no auth, unknown 
> host, connection refused)
> * meaningful exceptions are built in translate exception
> * diagnostics are included, where possible
> * our troubleshooting docs are expanded with new failures we encounter
> AWS S3 error codes: 
> http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html
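
The split between "retry for a bit" and "failfast" in the goals above could be expressed as a small classifier. This is a sketch only: ErrorClassifier is hypothetical, and apart from 302 the status-code mapping is an assumption (401/403 standing in for "no auth", 500/503 treated as transient); it is not what S3A actually does.

```java
/** Hypothetical classifier; the mapping is illustrative, not S3A's actual code. */
final class ErrorClassifier {
    enum Action { RETRY, FAIL_FAST }

    /** Maps an HTTP status code to the action a caller should take. */
    static Action classify(int httpStatus) {
        switch (httpStatus) {
            case 500: // internal error: assumed transient
            case 503: // throttling: assumed to clear after backing off
                return Action.RETRY;
            case 302: // redirect: listed above as fail-fast
            case 401: // assumed to represent "no auth"
            case 403:
                return Action.FAIL_FAST;
            default:
                // Assumption: other server-side errors may clear, client-side ones will not.
                return httpStatus >= 500 ? Action.RETRY : Action.FAIL_FAST;
        }
    }
}
```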






[jira] [Reopened] (HADOOP-14381) S3AUtils.translateException to map 503 response to => throttling failure

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened HADOOP-14381:
--

Reopening and closing this instead as a dup of HADOOP-13786 as I don't find any 
patch for this in the source-code. Please revert back if this is incorrect.

> S3AUtils.translateException to map 503 response to => throttling failure
> ---
>
> Key: HADOOP-14381
> URL: https://issues.apache.org/jira/browse/HADOOP-14381
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> When AWS S3 returns "503", it means that the overall set of requests on a 
> part of an S3 bucket exceeds the permitted limit; the client(s) need to 
> throttle back or wait for some rebalancing to complete.
> The AWS SDK retries 3 times on a 503, but then throws it up. Our code doesn't 
> do anything with that other than create a generic {{AWSS3IOException}}.
> Proposed
> * add a new exception, {{AWSOverloadedException}}
> * raise it on a 503 from S3 (& for s3guard, on DDB complaints)
> * have it include a link to a wiki page on the topic, as well as the path
> * and any other diags
> Code talking to S3 may then be able to catch this and choose to react. Some 
> retry with exponential backoff is the obvious option. Failing, well, that 
> could trigger task reattempts at that part of the query, then job retry 
> —which will again fail, *unless the number of tasks run in parallel is 
> reduced*
> As this throttling is across all clients talking to the same part of a 
> bucket, fixing it is potentially a high level option. We can at least start 
> by reporting things better
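
The "retry with exponential backoff" option mentioned above might look roughly like the following. Everything here is hypothetical: ThrottledException stands in for the proposed exception, and Backoff, its parameters and the cap are illustrative choices rather than anything taken from the Hadoop source.

```java
import java.util.concurrent.Callable;

/** Hypothetical exception standing in for the proposed throttling exception. */
final class ThrottledException extends Exception {
    ThrottledException(String message) {
        super(message);
    }
}

final class Backoff {
    /** Delay for a zero-based attempt: base * 2^attempt, capped at maxMillis. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis << Math.min(attempt, 20); // bound the shift to avoid overflow
        return Math.min(delay, maxMillis);
    }

    /** Runs the call, sleeping and retrying on throttling, up to maxAttempts calls. */
    static <T> T retry(Callable<T> call, int maxAttempts, long baseMillis, long maxMillis)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (ThrottledException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // give up and let the caller decide how to react
                }
                Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
            }
        }
    }
}
```

Only the throttling exception is retried here; any other failure propagates immediately, which keeps the sketch in line with the fail-fast goals discussed elsewhere in this thread.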






[jira] [Resolved] (HADOOP-14381) S3AUtils.translateException to map 503 response to => throttling failure

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved HADOOP-14381.
--
   Resolution: Duplicate
Fix Version/s: (was: 3.1.0)

> S3AUtils.translateException to map 503 response to => throttling failure
> ---
>
> Key: HADOOP-14381
> URL: https://issues.apache.org/jira/browse/HADOOP-14381
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> When AWS S3 returns "503", it means that the overall set of requests on a 
> part of an S3 bucket exceeds the permitted limit; the client(s) need to 
> throttle back or wait for some rebalancing to complete.
> The AWS SDK retries 3 times on a 503, but then throws it up. Our code doesn't 
> do anything with that other than create a generic {{AWSS3IOException}}.
> Proposed
> * add a new exception, {{AWSOverloadedException}}
> * raise it on a 503 from S3 (& for s3guard, on DDB complaints)
> * have it include a link to a wiki page on the topic, as well as the path
> * and any other diags
> Code talking to S3 may then be able to catch this and choose to react. Some 
> retry with exponential backoff is the obvious option. Failing, well, that 
> could trigger task reattempts at that part of the query, then job retry 
> —which will again fail, *unless the number of tasks run in parallel is 
> reduced*
> As this throttling is across all clients talking to the same part of a 
> bucket, fixing it is potentially a high level option. We can at least start 
> by reporting things better






[jira] [Updated] (HADOOP-14325) [Umbrella] Stabilise S3A Server Side Encryption

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-14325:
-
Summary: [Umbrella] Stabilise S3A Server Side Encryption  (was: Stabilise 
S3A Server Side Encryption)

> [Umbrella] Stabilise S3A Server Side Encryption
> ---
>
> Key: HADOOP-14325
> URL: https://issues.apache.org/jira/browse/HADOOP-14325
> Project: Hadoop Common
>  Issue Type: Task
>  Components: documentation, fs/s3, test
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Priority: Major
> Fix For: 3.1.0
>
>
> Round off the S3 SSE encryption support with everything needed to safely ship 
> it.
> The core code is in, along with tests, so this covers the details
> * docs with examples, including JCEKS files
> * keeping secrets secret
> * any more tests, including scale ones (huge file, rename)
> * I'll add a KMS test to my (github) spark suite






[jira] [Resolved] (HADOOP-13205) S3A to support custom retry policies; failfast on unknown host

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved HADOOP-13205.
--
   Resolution: Duplicate
Fix Version/s: (was: 3.1.0)

> S3A to support custom retry policies; failfast on unknown host
> --
>
> Key: HADOOP-13205
> URL: https://issues.apache.org/jira/browse/HADOOP-13205
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> Noticed today that when connections are down, S3A retries on 
> UnknownHostExceptions logging noisily in the process.
> # it should be possible to define or customize retry policies for an FS 
> instance (fail fast, exponential backoff, etc)
> # we may want to explicitly have a fail-fast-if-offline retry policy, 
> catching the common connectivity ones.
> Testing will be fun here.
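
A fail-fast-if-offline policy of the kind described in the second point could be sketched as below. SimpleRetryPolicy and FailFastIfOffline are hypothetical names introduced for illustration (this is not Hadoop's retry API), and the choice of which java.net exceptions count as "common connectivity ones" is an assumption.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.UnknownHostException;

/** Hypothetical policy interface; not Hadoop's RetryPolicy API. */
interface SimpleRetryPolicy {
    boolean shouldRetry(IOException failure, int attemptsSoFar);
}

/** Fails fast on common connectivity errors, otherwise retries up to a limit. */
final class FailFastIfOffline implements SimpleRetryPolicy {
    private final int maxAttempts;

    FailFastIfOffline(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    @Override
    public boolean shouldRetry(IOException failure, int attemptsSoFar) {
        if (failure instanceof UnknownHostException
                || failure instanceof ConnectException
                || failure instanceof NoRouteToHostException) {
            return false; // the network is probably down; retrying only adds log noise
        }
        return attemptsSoFar < maxAttempts;
    }
}
```

Keeping the policy behind an interface is what would let a filesystem instance swap in a different one, such as exponential backoff.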






[jira] [Reopened] (HADOOP-13205) S3A to support custom retry policies; failfast on unknown host

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened HADOOP-13205:
--

Reopening and closing this instead as a dup of HADOOP-13786 as I don't find any 
patch for this JIRA in the source-code. Please revert back if this is incorrect.

> S3A to support custom retry policies; failfast on unknown host
> --
>
> Key: HADOOP-13205
> URL: https://issues.apache.org/jira/browse/HADOOP-13205
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Fix For: 3.1.0
>
>
> Noticed today that when connections are down, S3A retries on 
> UnknownHostExceptions logging noisily in the process.
> # it should be possible to define or customize retry policies for an FS 
> instance (fail fast, exponential backoff, etc)
> # we may want to explicitly have a fail-fast-if-offline retry policy, 
> catching the common connectivity ones.
> Testing will be fun here.






[jira] [Resolved] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved HADOOP-13811.
--
   Resolution: Duplicate
Fix Version/s: (was: 3.1.0)

> s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to 
> sanitize XML document destined for handler class
> -
>
> Key: HADOOP-13811
> URL: https://issues.apache.org/jira/browse/HADOOP-13811
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> Occasionally, getFileStatus() fails with a stack trace starting 
> with {{com.amazonaws.AmazonClientException: Failed to sanitize XML document 
> destined for handler class}}.






[jira] [Reopened] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened HADOOP-13811:
--

Reopening and closing this instead as a dup of HADOOP-13786 as I don't find any 
patch for this JIRA in the source-code. Please revert back if this is incorrect.

> s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to 
> sanitize XML document destined for handler class
> -
>
> Key: HADOOP-13811
> URL: https://issues.apache.org/jira/browse/HADOOP-13811
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> Occasionally, getFileStatus() fails with a stack trace starting 
> with {{com.amazonaws.AmazonClientException: Failed to sanitize XML document 
> destined for handler class}}.






[jira] [Reopened] (HADOOP-14303) Review retry logic on all S3 SDK calls, implement where needed

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened HADOOP-14303:
--

Reopening and closing this instead as a dup of HADOOP-13786 as I don't find any 
patch for this in the source-code. Please revert back if this is incorrect.

> Review retry logic on all S3 SDK calls, implement where needed
> --
>
> Key: HADOOP-14303
> URL: https://issues.apache.org/jira/browse/HADOOP-14303
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> AWS S3, IAM, KMS, DDB etc all throttle callers: the S3A code needs to handle 
> this without failing, as if it slows down its requests it can recover.
> 1. Look at all the places where we are calling S3A via the AWS SDK and make 
> sure we are retrying with some backoff & jitter policy, ideally something 
> unified. This must be more systematic than the case-by-case, 
> problem-by-problem strategy we are implicitly using.
> 2. Many of the AWS S3 SDK calls do implement retry (e.g. PUT/multipart PUT), 
> but we need to check the other parts of the process: login, initiate/complete 
> MPU, ...
> Related
> HADOOP-13811 Failed to sanitize XML document destined for handler class
> HADOOP-13664 S3AInputStream to use a retry policy on read failures
> This stuff is all hard to test. A key need is to be able to differentiate 
> recoverable throttle & network failures from unrecoverable problems like: 
> auth, network config (e.g. bad endpoint), etc.
> This may be the opportunity to add a faulting subclass of Amazon S3 client which 
> can be configured in IT Tests to fail at specific points. Ryan Blue's mock S3 
> client does this in HADOOP-13786, but it is for 100% mock. I'm thinking of 
> something with similar fault raising, but in front of the real S3A client 
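
The idea of fault raising "in front of the real S3A client" can be illustrated with a decorator. This is a sketch under assumptions: ObjectStore is a hypothetical one-method interface (the real AWS client API is far larger), and the fail-the-first-N-calls rule is just one possible way to configure where failures occur.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical minimal store interface; the real AWS client API is far larger. */
interface ObjectStore {
    void delete(String key) throws IOException;
}

/** Decorator that fails the first N calls, then delegates to the wrapped store. */
final class FaultInjectingStore implements ObjectStore {
    private final ObjectStore delegate;
    private final AtomicInteger failuresLeft;

    FaultInjectingStore(ObjectStore delegate, int failuresToInject) {
        this.delegate = delegate;
        this.failuresLeft = new AtomicInteger(failuresToInject);
    }

    @Override
    public void delete(String key) throws IOException {
        // Consume one pending failure if any remain; never drop below zero.
        if (failuresLeft.getAndUpdate(n -> n > 0 ? n - 1 : 0) > 0) {
            throw new IOException("injected failure for " + key);
        }
        delegate.delete(key);
    }
}
```

Because the wrapper delegates once its failures are used up, a test can check both that the retry logic fires and that the real operation eventually happens.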






[jira] [Resolved] (HADOOP-14303) Review retry logic on all S3 SDK calls, implement where needed

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved HADOOP-14303.
--
   Resolution: Duplicate
Fix Version/s: (was: 3.1.0)

> Review retry logic on all S3 SDK calls, implement where needed
> --
>
> Key: HADOOP-14303
> URL: https://issues.apache.org/jira/browse/HADOOP-14303
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> AWS S3, IAM, KMS, DDB etc all throttle callers: the S3A code needs to handle 
> this without failing, as if it slows down its requests it can recover.
> 1. Look at all the places where we are calling S3A via the AWS SDK and make 
> sure we are retrying with some backoff & jitter policy, ideally something 
> unified. This must be more systematic than the case-by-case, 
> problem-by-problem strategy we are implicitly using.
> 2. Many of the AWS S3 SDK calls do implement retry (e.g. PUT/multipart PUT), 
> but we need to check the other parts of the process: login, initiate/complete 
> MPU, ...
> Related
> HADOOP-13811 Failed to sanitize XML document destined for handler class
> HADOOP-13664 S3AInputStream to use a retry policy on read failures
> This stuff is all hard to test. A key need is to be able to differentiate 
> recoverable throttle & network failures from unrecoverable problems like: 
> auth, network config (e.g. bad endpoint), etc.
> This may be the opportunity to add a faulting subclass of Amazon S3 client which 
> can be configured in IT Tests to fail at specific points. Ryan Blue's mock S3 
> client does this in HADOOP-13786, but it is for 100% mock. I'm thinking of 
> something with similar fault raising, but in front of the real S3A client 






[jira] [Updated] (HADOOP-13271) Intermittent failure of TestS3AContractRootDir.testListEmptyRootDirectory

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-13271:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> Intermittent failure of TestS3AContractRootDir.testListEmptyRootDirectory
> -
>
> Key: HADOOP-13271
> URL: https://issues.apache.org/jira/browse/HADOOP-13271
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
>
> I'm seeing an intermittent failure of 
> {{TestS3AContractRootDir.testListEmptyRootDirectory}}
> The sequence deleteFiles(listStatus(Path("/"))) is failing because the 
> file to delete is root ...yet the code is passing in the children of /, not / 
> itself.
> hypothesis: when you call listStatus on an empty root dir, you get a file 
> entry back that says isFile, not isDirectory.
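
If the hypothesis above holds, a defensive caller could drop any listing entry that refers to the listed directory itself before deleting. The helper below is a sketch with plain strings; ListingFilter is a hypothetical name and it does not use Hadoop's Path or FileStatus types.

```java
import java.util.ArrayList;
import java.util.List;

final class ListingFilter {
    /** Returns the listed paths minus the directory itself (ignoring a trailing slash). */
    static List<String> childrenOnly(String dir, List<String> listed) {
        String self = normalize(dir);
        List<String> children = new ArrayList<>();
        for (String path : listed) {
            if (!normalize(path).equals(self)) {
                children.add(path);
            }
        }
        return children;
    }

    /** Strips one trailing slash, except from the root path "/". */
    private static String normalize(String path) {
        return path.length() > 1 && path.endsWith("/")
                ? path.substring(0, path.length() - 1)
                : path;
    }
}
```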






[jira] [Updated] (HADOOP-15095) S3a committer factory to warn when default FileOutputFormat committer is created

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15095:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> S3a committer factory to warn when default FileOutputFormat committer is 
> created
> 
>
> Key: HADOOP-15095
> URL: https://issues.apache.org/jira/browse/HADOOP-15095
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Steve Loughran
>Priority: Minor
>
> The S3ACommitterFactory should warn when the classic FileOutputCommitter is 
> used (i.e. the client is not configured to use a new one). Something like
> "this committer is neither fast nor guaranteed to be correct. See $URL" where 
> URL is a pointer to something (wiki? hadoop docs?).






[jira] [Updated] (HADOOP-13968) S3a FS to support "__magic" path for the special "unmaterialized" writes

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-13968:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> S3a FS to support "__magic" path for the special "unmaterialized" writes
> 
>
> Key: HADOOP-13968
> URL: https://issues.apache.org/jira/browse/HADOOP-13968
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> S3AFileSystem to add support for a special path, such as  
> {{.temp_pending_put/}} or similar, which, when used as the base of a path, 
> indicates that the file is actually to be saved to the parent dir, but only 
> via a delayed put commit operation.
> At the same time, we may need blocks on some normal fileIO ops under these 
> dirs, especially rename and delete, as this would cause serious problems 
> including data loss and large bills for pending data.
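
The path mapping described above (a file written under the special directory ends up in that directory's parent) can be sketched with plain string handling. MagicPaths is a hypothetical helper; the "__magic" element name is taken from the issue title, and this is an illustration of the idea rather than the S3A implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MagicPaths {
    static final String MAGIC = "__magic";

    /** True if any element of the path is the magic directory name. */
    static boolean isMagic(String path) {
        return Arrays.asList(path.split("/")).contains(MAGIC);
    }

    /** Final destination: the path with the first magic element removed. */
    static String finalDestination(String path) {
        List<String> kept = new ArrayList<>();
        boolean removed = false;
        for (String element : path.split("/", -1)) {
            if (!removed && element.equals(MAGIC)) {
                removed = true; // skip the marker so the file lands in the parent dir
                continue;
            }
            kept.add(element);
        }
        return String.join("/", kept);
    }
}
```

The same isMagic check is where blocks on rename and delete could be hooked in, since those are the operations the description singles out as dangerous under such directories.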






[jira] [Updated] (HADOOP-15241) transient failure of ITestCommitOperations.testCommitEmptyFile; fault injection related

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15241:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> transient failure of ITestCommitOperations.testCommitEmptyFile; fault 
> injection related
> ---
>
> Key: HADOOP-15241
> URL: https://issues.apache.org/jira/browse/HADOOP-15241
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Reporter: Steve Loughran
>Priority: Minor
>
> Parallel 12 thread run on a machine overloaded with other work; transient 
> failure of a test expecting fault injection to raise 2 exceptions before 
> success; a 3rd one was caught.
> {code}
> [ERROR] 
> testCommitEmptyFile(org.apache.hadoop.fs.s3a.commit.ITestCommitOperations)  
> Time elapsed: 10.805 s  <<< ERROR!
> org.apache.hadoop.fs.s3a.AWSServiceThrottledException: Completing multipart 
> commit on 
> fork-00010/test/DELAY_LISTING_ME/testCommitEmptyFile/empty-commit.txt: 
> com.amazonaws.AmazonServiceException: throttled count = 3 (Service: null; 
> Status Code: 503; Error Code: null; Request ID: null):null: throttled count = 
> 3 (Service: null; Status Code: 503; Error Code: null; Request ID: null)
> {code}






[jira] [Updated] (HADOOP-14971) Merge S3A committers into trunk

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-14971:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> Merge S3A committers into trunk
> ---
>
> Key: HADOOP-14971
> URL: https://issues.apache.org/jira/browse/HADOOP-14971
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-13786-040.patch, HADOOP-13786-041.patch
>
>
> Merge the HADOOP-13786 committer into trunk. This branch is being set up as a 
> github PR for review there & to keep it out of the mailboxes of the watchers on 
> the main JIRA






[jira] [Updated] (HADOOP-15269) S3 returning 400 on the directory /test/ GET of getFileStatus

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15269:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> S3 returning 400 on the directory /test/ GET of getFileStatus
> -
>
> Key: HADOOP-15269
> URL: https://issues.apache.org/jira/browse/HADOOP-15269
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
>
> Since Monday Feb 26, I'm getting intermittent failures of getFileStatus on a 
> directory
> # file path: {{/test}} is returning 404, as expected
> # directory path {{//test/}} is returning 400, so failing the entire operation
> S3 Ireland. 






[jira] [Updated] (HADOOP-14606) S3AInputStream: Handle http stream skip(n) skipping < n bytes in a forward seek

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-14606:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> S3AInputStream: Handle http stream skip(n) skipping < n bytes in a forward 
> seek
> ---
>
> Key: HADOOP-14606
> URL: https://issues.apache.org/jira/browse/HADOOP-14606
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Priority: Critical
>
> There are some hints in the InputStream docs that {{skip(n)}} may skip fewer 
> than n bytes. Codepaths only seem to do this if read() returns -1, meaning 
> end of stream is reached.
> If that happens when doing a forward seek via skip, then we have got our 
> numbers wrong and are in trouble. Look for a negative response, log @ ERROR 
> and revert to a close/reopen seek to an absolute position.
> *I have no evidence of this actually occurring*
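
A minimal sketch of the check described above, assuming plain java.io streams rather than the S3A classes. The names ForwardSeekSketch and skipFully are hypothetical and do not come from the issue; the fallback to a close/reopen seek is left to the caller, and the ERROR log is stood in for by a write to stderr.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration; not S3AInputStream code. */
final class ForwardSeekSketch {

  /**
   * Try to advance the stream by exactly n bytes using skip().
   * Returns true on success. A false result means the caller should
   * fall back to closing the stream and reopening at the absolute position.
   */
  static boolean skipFully(InputStream in, long n) throws IOException {
    long remaining = n;
    while (remaining > 0) {
      long skipped = in.skip(remaining);
      if (skipped < 0) {
        // The negative response the issue wants logged at ERROR.
        System.err.println("ERROR: skip() returned " + skipped);
        return false;
      }
      if (skipped == 0) {
        // No progress: probe with read() to tell end of stream from a short skip.
        if (in.read() < 0) {
          return false; // end of stream reached before the target offset
        }
        skipped = 1; // read() consumed one byte
      }
      remaining -= skipped;
    }
    return true;
  }
}
```

A caller would treat a false return as the signal to abandon the forward skip and reopen the stream at the target offset.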






[jira] [Updated] (HADOOP-14161) Failed to rename file in S3A during FileOutputFormat commitTask

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-14161:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> Failed to rename file in S3A during FileOutputFormat commitTask
> ---
>
> Key: HADOOP-14161
> URL: https://issues.apache.org/jira/browse/HADOOP-14161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.7.0, 2.7.1, 2.7.2, 2.7.3
> Environment: spark 2.0.2 with mesos
> hadoop 2.7.2
>Reporter: Luke Miner
>Priority: Minor
>
> I'm getting non deterministic rename errors while writing to S3 using spark 
> and hadoop. The proper permissions are set and this only happens 
> occasionally. It can happen on a job that is as simple as reading in json, 
> repartitioning and then writing out. After this failure occurs, the overall 
> job hangs indefinitely.
> {code}
> org.apache.spark.SparkException: Task failed while writing rows
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Failed to commit task
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$commitTask$1(WriterContainer.scala:275)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:257)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
> at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1348)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:258)
> ... 8 more
> Caused by: java.io.IOException: Failed to rename 
> S3AFileStatus{path=s3a://foo/_temporary/0/_temporary/attempt_201703081855_0018_m_000966_0/part-r-00966-615ed714-58c1-4b89-be56-e47966737c75.snappy.parquet;
>  isDirectory=false; length=111225342; replication=1; blocksize=33554432; 
> modification_time=1488999342000; access_time=0; owner=; group=; 
> permission=rw-rw-rw-; isSymlink=false} to 
> s3a://foo/part-r-00966-615ed714-58c1-4b89-be56-e47966737c75.snappy.parquet
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:415)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:539)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:502)
> at 
> org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
> at 
> org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
> at 
> org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitTask(WriterContainer.scala:211)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$commitTask$1(WriterContainer.scala:270)
> ... 13 more
> {code}






[jira] [Updated] (HADOOP-15191) Add Private/Unstable BulkDelete operations to supporting object stores for DistCP

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15191:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> Add Private/Unstable BulkDelete operations to supporting object stores for 
> DistCP
> -
>
> Key: HADOOP-15191
> URL: https://issues.apache.org/jira/browse/HADOOP-15191
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15191-001.patch, HADOOP-15191-002.patch, 
> HADOOP-15191-003.patch, HADOOP-15191-004.patch
>
>
> Large scale DistCP with the -delete option doesn't finish in a viable time 
> because of the final CopyCommitter doing a 1 by 1 delete of all missing 
> files. This isn't randomized (the list is sorted), and it's throttled by AWS.
> If bulk deletion of files was exposed as an API, distCP would do 1/1000 of 
> the REST calls, so not get throttled.
> Proposed: add an initially private/unstable interface for stores, 
> {{BulkDelete}} which declares a page size and offers a 
> {{bulkDelete(List)}} operation for the bulk deletion.
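
A rough sketch of how a caller might use such an interface, under stated assumptions: the interface shape below (BulkDeleteSketch, with string keys) is hypothetical, since the issue gives only the operation name and the idea of a declared page size. The 1/1000 figure in the description matches a page size of 1000.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shape of the proposed interface; the real signature is not given in the issue. */
interface BulkDeleteSketch {
  /** Maximum number of keys accepted by one bulk call. */
  int pageSize();

  /** Delete one page of keys in a single request. */
  void bulkDelete(List<String> keys);
}

/** Hypothetical caller that splits a long key list into pages. */
final class PagedDeleter {

  /** Deletes every key page by page and returns the number of bulk calls issued. */
  static int deleteAll(BulkDeleteSketch store, List<String> keys) {
    int page = store.pageSize();
    if (page <= 0) {
      throw new IllegalArgumentException("page size must be positive: " + page);
    }
    int calls = 0;
    for (int start = 0; start < keys.size(); start += page) {
      int end = Math.min(start + page, keys.size());
      // Copy the sublist so the store never sees a view of the caller's list.
      store.bulkDelete(new ArrayList<>(keys.subList(start, end)));
      calls++;
    }
    return calls;
  }
}
```

With a page size of 1000, deleting 2500 keys takes 3 bulk calls instead of 2500 single deletes.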






[jira] [Updated] (HADOOP-15210) Handle FNFE from S3Guard.getMetadataStore() in S3A initialize()

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15210:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> Handle FNFE from S3Guard.getMetadataStore() in S3A initialize()
> ---
>
> Key: HADOOP-15210
> URL: https://issues.apache.org/jira/browse/HADOOP-15210
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Minor
>
> {{S3Guard.getMetadataStore()}} throws FileNotFoundExceptions up, as the 
> comments say " rely on callers to catch and treat specially"
> S3A Filesystem doesn't do that, instead it will just fail 
> FileSystem.initialize; the FNFE is generated by DynamoDBMetadataStore.
> Are we happy with this? 
> Downgrading has some appeal: if you don't have the table, it will keep going. 
> But failures could be a sign of bad config, so maybe silent recovery is bad.






[jira] [Updated] (HADOOP-13664) S3AInputStream to use a retry policy on read failures

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-13664:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> S3AInputStream to use a retry policy on read failures
> -
>
> Key: HADOOP-13664
> URL: https://issues.apache.org/jira/browse/HADOOP-13664
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> {{S3AInputStream}} has some retry logic to handle failures on a read: log and 
> retry. We should move this over to a (possibly hard coded) RetryPolicy with 
> some sleep logic, so that longer-than-just-transient read failures can be 
> handled.
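
The idea can be sketched with the JDK alone. This is a stand-in under stated assumptions, not Hadoop's RetryPolicy API: the names ReadRetrySketch and readWithRetries, the attempt limit, and the linear sleep are all hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical stand-in for a retry policy around a read; not S3AInputStream code. */
final class ReadRetrySketch {

  /**
   * Runs the read, retrying on IOException up to maxAttempts times and
   * sleeping longer after each failure. Rethrows the last IOException
   * when every attempt has failed.
   */
  static <T> T readWithRetries(Callable<T> read, int maxAttempts, long sleepMillis)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return read.call();
      } catch (IOException e) {
        last = e;
        if (attempt < maxAttempts) {
          // Linear backoff: wait longer after each failed attempt.
          Thread.sleep(sleepMillis * attempt);
        }
      }
    }
    throw last;
  }
}
```

The sleep between attempts is what lets failures that outlast a single immediate retry be handled, which is the gap the description points at.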






[jira] [Updated] (HADOOP-13967) S3ABlockOutputStream to support plugin point for different multipart strategies

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-13967:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> S3ABlockOutputStream to support plugin point for different multipart 
> strategies
> ---
>
> Key: HADOOP-13967
> URL: https://issues.apache.org/jira/browse/HADOOP-13967
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> For 0-rename commits, we need to delay the final commit of a multipart PUT, 
> instead saving the data needed to build that commit into the s3 bucket.
> This means changes to {{S3ABlockOutputStream}} so that it can support 
> different policies on how to do this, "classic" and "delayed commit".
> Having this self contained means we can test it in isolation of anything else.
> I'm ignoring the old output stream...we will switch to fast output whenever a 
> special destination path is encountered






[jira] [Updated] (HADOOP-14800) eliminate double stack trace on some s3guard CLI failures

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-14800:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> eliminate double stack trace on some s3guard CLI failures
> -
>
> Key: HADOOP-14800
> URL: https://issues.apache.org/jira/browse/HADOOP-14800
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Priority: Minor
>
> {{s3guard destroy}} when there's no bucket ends up double-listing the stack 
> trace, which is somewhat confusing






[jira] [Updated] (HADOOP-14530) Translate AWS SSE-KMS missing key exception to something

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-14530:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> Translate AWS SSE-KMS missing key exception to something
> 
>
> Key: HADOOP-14530
> URL: https://issues.apache.org/jira/browse/HADOOP-14530
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
>
> When you use SSE-KMS and the ARN is invalid for that region, you get a 400 
> bad request exception + special error text "KMS.NotFoundException". This could 
> be a special exception






[jira] [Updated] (HADOOP-13973) S3A GET/HEAD requests failing: java.lang.IllegalStateException: Connection is not open/Connection pool shut down

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-13973:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> S3A GET/HEAD requests failing: java.lang.IllegalStateException: Connection is 
> not open/Connection pool shut down
> 
>
> Key: HADOOP-13973
> URL: https://issues.apache.org/jira/browse/HADOOP-13973
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: EC2 cluster
>Reporter: Rajesh Balamohan
>Assignee: Steve Loughran
>Priority: Major
>
> S3 requests failing with an error coming from Http client, 
> "java.lang.IllegalStateException: Connection is not open"
> Some online discussion implies that this is related to shared connection pool 
> shutdown & fixed in http client 4.4+. Hadoop & AWS SDK use v 4.5.2 so the fix 
> is in, we just need to make sure the pool is being set up right.
> There's a problem here of course: it may require moving to a later version of 
> the AWS SDK, with the consequences on Jackson, as seen in HADOOP-13050. 
> And that's if there is a patched version out there






[jira] [Updated] (HADOOP-14620) S3A authentication failure for regions other than us-east-1

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-14620:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> S3A authentication failure for regions other than us-east-1
> ---
>
> Key: HADOOP-14620
> URL: https://issues.apache.org/jira/browse/HADOOP-14620
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.7.3
>Reporter: Ilya Fourmanov
>Priority: Minor
> Attachments: s3-403.txt
>
>
> hadoop fs s3a:// operations fail authentication for s3 buckets hosted in 
> regions other than default us-east-1
> Steps to reproduce:
> # create s3 bucket in eu-west-1
> # Using IAM instance profile or fs.s3a.access.key/fs.s3a.secret.key run 
> following command:
> {code}
> hadoop --loglevel DEBUG fs -D fs.s3a.endpoint=s3.eu-west-1.amazonaws.com -ls 
> s3a://your-eu-west-1-hosted-bucket/
> {code}
> Expected behaviour:
> You will see listing of the bucket
> Actual behaviour:
> You will get a 403 Authentication Denied response from AWS S3.
> Reason is mismatch in string to sign as defined in 
> http://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html 
> provided by hadoop and expected by AWS. 
> If you use https://aws.amazon.com/code/199 to analyse StringToSignBytes 
> returned by AWS, you will see that AWS expects CanonicalizedResource to be in 
> form  
> /your-eu-west-1-hosted-bucket{color:red}.s3.eu-west-1.amazonaws.com{color}/.
> Hadoop provides it as /your-eu-west-1-hosted-bucket/
> Note that AWS documentation doesn't explicitly state that endpoint or full 
> dns address should be appended to CanonicalizedResource however practice 
> shows it is actually required.
> I've also submitted this to AWS for them to correct behaviour or 
> documentation.
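
To make the mismatch concrete, here is a small sketch of the string-to-sign layout from the REST authentication document linked above, for a plain GET with no Content-MD5, Content-Type or x-amz- headers. The class name SigV2Sketch, the secret key and the date are placeholders, not values from the report. The sketch only shows that the two CanonicalizedResource forms quoted in the report yield different signatures; it does not establish which form AWS expects.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical illustration of the signing layout; not Hadoop or AWS SDK code. */
final class SigV2Sketch {

  /** Verb, empty Content-MD5, empty Content-Type, date, then the resource. */
  static String stringToSign(String verb, String date, String canonicalizedResource) {
    return verb + "\n" + "\n" + "\n" + date + "\n" + canonicalizedResource;
  }

  /** Base64 of the HMAC-SHA1 of the string to sign, keyed with the secret key. */
  static String sign(String secretKey, String stringToSign) throws GeneralSecurityException {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
    byte[] digest = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
    return Base64.getEncoder().encodeToString(digest);
  }
}
```

Signing "/your-eu-west-1-hosted-bucket/" and "/your-eu-west-1-hosted-bucket.s3.eu-west-1.amazonaws.com/" with the same key and date gives two different signatures, so a server that computes one form will reject a client that sends the other.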






[jira] [Updated] (HADOOP-13969) S3A to support commit(path) operation, which commits all pending put commits in a path

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-13969:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> S3A to support commit(path) operation, which commits all pending put commits 
> in a path
> --
>
> Key: HADOOP-13969
> URL: https://issues.apache.org/jira/browse/HADOOP-13969
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> as well as creating and saving data with a pending-commit, s3a needs to add 
> the actual commit operation.
> this would scan a directory, take its pending commits, read them in and 
> execute them. 
> issue: what to do on failures?






[jira] [Updated] (HADOOP-13059) S3a over-reacts to potentially transient network problems in its init() logic

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-13059:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> S3a over-reacts to potentially transient network problems in its init() logic
> -
>
> Key: HADOOP-13059
> URL: https://issues.apache.org/jira/browse/HADOOP-13059
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-13059-001.patch
>
>
> If there's a reason for s3a not being able to connect to AWS, then the 
> constructor fails, even if this is a potentially transient event.
> This happens because the code to check for a bucket existing will relay the 
> exceptions.
> The constructor should catch IOEs against the remote FS, downgrade to warn 
> and let the code continue; it may fail later, but it may also recover.
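
A minimal sketch of the proposed behaviour, with hypothetical names throughout (LenientInitSketch, BucketProbe, Outcome); none of them come from S3A. The probe stands in for the bucket-existence check, and the warning is written to stderr in place of a logger.

```java
import java.io.IOException;

/** Hypothetical illustration of a lenient startup check; not S3AFileSystem code. */
final class LenientInitSketch {

  /** Stand-in for the remote bucket-existence check. */
  interface BucketProbe {
    boolean exists() throws IOException;
  }

  /** What the startup check learned about the bucket. */
  enum Outcome { VERIFIED, MISSING, DEFERRED }

  /**
   * Runs the probe. An IOException is downgraded to a warning and reported
   * as DEFERRED so initialization can continue; later calls may still fail.
   */
  static Outcome verifyBucket(BucketProbe probe) {
    try {
      return probe.exists() ? Outcome.VERIFIED : Outcome.MISSING;
    } catch (IOException e) {
      System.err.println("WARN: bucket check failed, continuing: " + e.getMessage());
      return Outcome.DEFERRED;
    }
  }
}
```

Returning a three-way outcome keeps a definite "bucket missing" answer separate from a network failure, so a caller can still fail fast on the former.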






[jira] [Updated] (HADOOP-15003) Merge S3A committers into trunk: Yetus patch checker

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15003:
-
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> Merge S3A committers into trunk: Yetus patch checker
> 
>
> Key: HADOOP-15003
> URL: https://issues.apache.org/jira/browse/HADOOP-15003
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-13786-041.patch, HADOOP-13786-042.patch, 
> HADOOP-13786-043.patch, HADOOP-13786-044.patch, HADOOP-13786-045.patch, 
> HADOOP-13786-046.patch, HADOOP-13786-047.patch, HADOOP-13786-048.patch, 
> HADOOP-13786-049.patch, HADOOP-13786-050.patch, HADOOP-13786-051.patch, 
> HADOOP-13786-052.patch, HADOOP-13786-053.patch, HADOOP-15033-testfix-1.diff
>
>
> This is a Yetus only JIRA created to have Yetus review the 
> HADOOP-13786/HADOOP-14971 patch as a .patch file, as the review PR 
> [https://github.com/apache/hadoop/pull/282] is stopping this happening in 
> HADOOP-14971.
> Reviews should go into the PR/other task






[jira] [Updated] (HADOOP-14961) Docker failed to build yetus/hadoop:0de40f0: Oracle JDK 8 is NOT installed

2018-02-12 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-14961:
-

Committer of this patch, please set the *Fix Version/s* for this JIRA.

> Docker failed to build yetus/hadoop:0de40f0: Oracle JDK 8 is NOT installed
> --
>
> Key: HADOOP-14961
> URL: https://issues.apache.org/jira/browse/HADOOP-14961
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build, test
>Affects Versions: 3.1.0
>Reporter: John Zhuge
>Priority: Major
>
> https://builds.apache.org/job/PreCommit-HADOOP-Build/13546/console 
> {noformat} 
> Downloading Oracle Java 8... 
> --2017-10-18 18:28:11-- 
> http://download.oracle.com/otn-pub/java/jdk/8u144-b01/090f390dda5b47b9b721c7dfaa008135/jdk-8u144-linux-x64.tar.gz
>  
> Resolving download.oracle.com (download.oracle.com)... 
> 23.59.190.131, 23.59.190.130 
> Connecting to download.oracle.com (download.oracle.com)|23.59.190.131|:80... 
> connected. 
> HTTP request sent, awaiting response... 302 Moved Temporarily 
> Location: 
> https://edelivery.oracle.com/otn-pub/java/jdk/8u144-b01/090f390dda5b47b9b721c7dfaa008135/jdk-8u144-linux-x64.tar.gz
>  [following] 
> --2017-10-18 18:28:11-- 
> https://edelivery.oracle.com/otn-pub/java/jdk/8u144-b01/090f390dda5b47b9b721c7dfaa008135/jdk-8u144-linux-x64.tar.gz
>  
> Resolving edelivery.oracle.com (edelivery.oracle.com)... 
> 23.39.16.136, 2600:1409:a:39c::2d3e, 2600:1409:a:39e::2d3e 
> Connecting to edelivery.oracle.com 
> (edelivery.oracle.com)|23.39.16.136|:443... connected. 
> HTTP request sent, awaiting response... 302 Moved 
> Temporarily 
> Location: 
> http://download.oracle.com/otn-pub/java/jdk/8u144-b01/090f390dda5b47b9b721c7dfaa008135/jdk-8u144-linux-x64.tar.gz?AuthParam=1508351411_3d448519d55b9741af15953ef5049a7c
>  [following] 
> --2017-10-18 18:28:11-- 
> http://download.oracle.com/otn-pub/java/jdk/8u144-b01/090f390dda5b47b9b721c7dfaa008135/jdk-8u144-linux-x64.tar.gz?AuthParam=1508351411_3d448519d55b9741af15953ef5049a7c
>  
> Connecting to download.oracle.com (download.oracle.com)|23.59.190.131|:80... 
> connected. 
> HTTP request sent, awaiting response... 404 Not Found 
> 2017-10-18 18:28:12 ERROR 404: Not Found. 
> download failed 
> Oracle JDK 8 is NOT installed. 
> {noformat}
> Looks like Oracle JDK 8u144 is no longer available for download using that 
> link. 8u151 and 8u152 are available.
> Many of the last 10 https://builds.apache.org/job/PreCommit-HADOOP-Build/ jobs 
> failed the same way, all on build hosts H1 and H6.
> [~aw] has a patch available in HADOOP-14816 "Update Dockerfile to use Xenial" 
> for a long term fix.






[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which breaks rolling upgrade

2017-12-08 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15059:
-
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

I just committed 005 patch to trunk, branch-3.0 and branch-3.0.0. Thanks 
[~jlowe] for the patch and the quick turn-around!

Thanks [~djp] for finding the issue, [~rchiang] for verifying the fix and 
[~daryn] for the reviews.

> 3.0 deployment cannot work with old version MR tar ball which breaks rolling 
> upgrade
> 
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, 
> HADOOP-15059.003.patch, HADOOP-15059.004.patch, HADOOP-15059.005.patch, 
> HADOOP-15059.006.patch
>
>
> I tried to deploy a 3.0 cluster with a 2.9 MR tar ball. The MR job failed 
> because of the following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to a token incompatibility change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0; otherwise all running MR applications will get stuck during/after 
> upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which breaks rolling upgrade

2017-12-06 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280875#comment-16280875
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-15059:
--

That makes sense. Then I can commit the previous patch. [~djp], can you give 
005 patch a spin? 

> 3.0 deployment cannot work with old version MR tar ball which breaks rolling 
> upgrade
> 
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, 
> HADOOP-15059.003.patch, HADOOP-15059.004.patch, HADOOP-15059.005.patch, 
> HADOOP-15059.006.patch
>
>
> I tried to deploy a 3.0 cluster with a 2.9 MR tar ball. The MR job failed 
> with the following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to a token incompatibility change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0; otherwise all running MR applications will get stuck during/after 
> upgrade.






[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-12-06 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15059:
-
Attachment: HADOOP-15059.006.patch

The patch looks good to me.

[~daryn], don't get scared by exception handling!

Uploading a new patch that is the same as Jason's but with SerializedFormat 
made static.

The test case failures reported here are unrelated. Though it is surprising how 
badly the unit tests are broken - I'll debug them independently and file bugs.

If Jenkins says okay, I'll commit this unless [~daryn] / [~jlowe] say no.

> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, 
> HADOOP-15059.003.patch, HADOOP-15059.004.patch, HADOOP-15059.005.patch, 
> HADOOP-15059.006.patch
>
>
> I tried to deploy a 3.0 cluster with a 2.9 MR tar ball. The MR job failed 
> with the following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to a token incompatibility change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0; otherwise all running MR applications will get stuck during/after 
> upgrade.






[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-12-06 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15059:
-
Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, 
> HADOOP-15059.003.patch, HADOOP-15059.004.patch, HADOOP-15059.005.patch, 
> HADOOP-15059.006.patch
>
>
> I tried to deploy a 3.0 cluster with a 2.9 MR tar ball. The MR job failed 
> with the following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to a token incompatibility change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0; otherwise all running MR applications will get stuck during/after 
> upgrade.






[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-12-06 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15059:
-
Status: Open  (was: Patch Available)

> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, 
> HADOOP-15059.003.patch, HADOOP-15059.004.patch, HADOOP-15059.005.patch
>
>
> I tried to deploy a 3.0 cluster with a 2.9 MR tar ball. The MR job failed 
> with the following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to a token incompatibility change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0; otherwise all running MR applications will get stuck during/after 
> upgrade.






[jira] [Commented] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-11-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16272093#comment-16272093
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-15059:
--

Tx for taking this up [~jlowe]!

bq. I'm attaching a patch that implements the "bridge release(s)" approach 
where the code supports reading the new format but will write the old format by 
default.
+1 for the bridging approach.

bq. The main drawback is that we don't get to easily leverage the benefits of 
the new format since it's not the default format.
I realized it's not just about changing the default format. 
ContainerLaunchContext.tokens in YARN is unfortunately a byte-buffer. Taking a 
protobuf, wrapping it into a byte-buffer and sending it to the RM is backwards 
to me. The right way to use this in YARN is to assume that the existing tokens 
field is old-style credentials and then add a new first-class protobuf based 
Credentials field.

The patch looks mostly good to me.

bq. minor suggestion of the private statics for version can be replaced with 
the enums
If we are no longer going to bump up this version in the protobuf world, +1 - 
there will be only two of these ever. In any case, these are private.

One other minor suggestion if you are doing the above. 
Credentials.SerializedFormat can be static.
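
As a rough illustration of the bridging approach and the enum suggestion above, here is a minimal Java sketch. It is not the code in any attached patch and not Hadoop's Credentials class: the class name TokenStoreSketch, the format names LEGACY and MODERN, the version bytes, and the payload layouts are all hypothetical. It only shows the shape of the idea, where the reader accepts both formats while the writer keeps emitting the old one unless the caller opts in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a "bridge release" serializer; not Hadoop code. */
class TokenStoreSketch {

  /**
   * Formats keyed by a version byte, replacing separate private version
   * constants. A nested enum is implicitly static in Java.
   */
  enum SerializedFormat {
    LEGACY((byte) 0),  // assumed old layout, still the default for writes
    MODERN((byte) 1);  // assumed new layout, readable but opt-in for writes

    final byte version;

    SerializedFormat(byte version) {
      this.version = version;
    }

    static SerializedFormat fromVersion(byte v) throws IOException {
      for (SerializedFormat f : values()) {
        if (f.version == v) {
          return f;
        }
      }
      throw new IOException("Unknown version " + v + " in token storage.");
    }
  }

  /** Default write path: always the old format, so old readers keep working. */
  static byte[] write(String secret) throws IOException {
    return write(secret, SerializedFormat.LEGACY);
  }

  /** Explicit write path: the caller chooses the format. */
  static byte[] write(String secret, SerializedFormat format) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeByte(format.version);
      if (format == SerializedFormat.LEGACY) {
        out.writeUTF(secret);  // stand-in for the old layout
      } else {
        byte[] payload = secret.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);  // stand-in for the new layout
        out.write(payload);
      }
    }
    return bytes.toByteArray();
  }

  /** Read path: dispatches on the version byte and accepts both formats. */
  static String read(byte[] data) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      SerializedFormat format = SerializedFormat.fromVersion(in.readByte());
      if (format == SerializedFormat.LEGACY) {
        return in.readUTF();
      }
      byte[] payload = new byte[in.readInt()];
      in.readFully(payload);
      return new String(payload, StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] byDefault = write("s3cret");
    System.out.println("default version byte: " + byDefault[0]);
    System.out.println("round trip: " + read(write("s3cret", SerializedFormat.MODERN)));
  }
}
```

With this shape, a reader that only knows the old format keeps working against a newer writer during a rolling upgrade, because the default output never changes; switching the default can wait until no such readers remain.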

> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, 
> HADOOP-15059.003.patch, HADOOP-15059.004.patch
>
>
> I tried to deploy a 3.0 cluster with a 2.9 MR tar ball. The MR job failed 
> with the following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to a token incompatibility change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0; otherwise all running MR applications will get stuck during/after 
> upgrade.






[jira] [Commented] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-11-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261483#comment-16261483
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-15059:
--

HADOOP-13123 looks related.

> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Priority: Blocker
>
> I tried to deploy a 3.0 cluster with a 2.9 MR tar ball. The MR job failed 
> with the following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to a token incompatibility change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0; otherwise all running MR applications will get stuck during/after 
> upgrade.






[jira] [Resolved] (HADOOP-15041) XInclude support in .xml configuration file is broken after "5eb7dbe9b31a45f57f2e1623aa1c9ce84a56c4d1" commit

2017-11-20 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved HADOOP-15041.
--
Resolution: Duplicate

Closing instead as a duplicate.

{code}
commit 9a89a3e0b09a2eae6aabadd510fa48c5e18c0dcc
Author: Arun Suresh 
Date:   Mon Nov 13 13:49:08 2017 -0800

Addendum patch for Configuration fix. (Jason Lowe via asuresh)

(cherry picked from commit 4e847d63a39268d8a34e0f22ddb5e40c5ef71e3a)
{code}

> XInclude support in .xml configuration file is broken after 
> "5eb7dbe9b31a45f57f2e1623aa1c9ce84a56c4d1" commit
> -
>
> Key: HADOOP-15041
> URL: https://issues.apache.org/jira/browse/HADOOP-15041
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: SammiChen
>Assignee: Arun Suresh
>Priority: Critical
>
> XInclude support in .xml configuration file is broken after following 
> check-in.  
> Since there is no JIRA number in the following commit message, create a new 
> JIRA to track the issue. 
> commit 5eb7dbe9b31a45f57f2e1623aa1c9ce84a56c4d1
> Author: Arun Suresh 
> Date:   Thu Nov 9 15:15:51 2017 -0800
> Fixing Job History Server Configuration parsing. (Jason Lowe via asuresh)
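
For readers unfamiliar with the feature under discussion, the sketch below shows what XInclude in a configuration file looks like, using only the JDK's DOM parser. It is an illustration under stated assumptions, not Hadoop's Configuration parsing code: the class XIncludeSketch, the file names main.xml and included.xml, and the property names are all made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical sketch of XInclude handling with the JDK parser; not Hadoop code. */
class XIncludeSketch {

  /** Writes a main config that pulls in a second file via xi:include. */
  static Path writeSampleConfig() throws IOException {
    Path dir = Files.createTempDirectory("xinclude-sketch");
    String included =
        "<?xml version=\"1.0\"?>\n"
        + "<configuration>\n"
        + "  <property><name>example.included</name><value>2</value></property>\n"
        + "</configuration>\n";
    String main =
        "<?xml version=\"1.0\"?>\n"
        + "<configuration xmlns:xi=\"http://www.w3.org/2001/XInclude\">\n"
        + "  <property><name>example.main</name><value>1</value></property>\n"
        + "  <xi:include href=\"included.xml\"/>\n"
        + "</configuration>\n";
    Files.write(dir.resolve("included.xml"), included.getBytes(StandardCharsets.UTF_8));
    Path mainFile = dir.resolve("main.xml");
    Files.write(mainFile, main.getBytes(StandardCharsets.UTF_8));
    return mainFile;
  }

  /** Counts property elements, with or without XInclude processing. */
  static int countProperties(Path configFile, boolean xincludeAware)
      throws IOException, ParserConfigurationException, SAXException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);           // XInclude processing needs namespaces
    factory.setXIncludeAware(xincludeAware);   // the switch that resolves xi:include
    Document doc = factory.newDocumentBuilder().parse(configFile.toFile());
    return doc.getElementsByTagName("property").getLength();
  }

  public static void main(String[] args) throws Exception {
    Path config = writeSampleConfig();
    System.out.println("without XInclude: " + countProperties(config, false));
    System.out.println("with XInclude:    " + countProperties(config, true));
  }
}
```

A parser that silently stops resolving xi:include elements would still load the main file without error, which is why this kind of regression shows up as missing properties and not as a parse failure.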






[jira] [Reopened] (HADOOP-15041) XInclude support in .xml configuration file is broken after "5eb7dbe9b31a45f57f2e1623aa1c9ce84a56c4d1" commit

2017-11-20 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened HADOOP-15041:
--

> XInclude support in .xml configuration file is broken after 
> "5eb7dbe9b31a45f57f2e1623aa1c9ce84a56c4d1" commit
> -
>
> Key: HADOOP-15041
> URL: https://issues.apache.org/jira/browse/HADOOP-15041
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: SammiChen
>Assignee: Arun Suresh
>Priority: Critical
>
> XInclude support in .xml configuration file is broken after following 
> check-in.  
> Since there is no JIRA number in the following commit message, create a new 
> JIRA to track the issue. 
> commit 5eb7dbe9b31a45f57f2e1623aa1c9ce84a56c4d1
> Author: Arun Suresh 
> Date:   Thu Nov 9 15:15:51 2017 -0800
> Fixing Job History Server Configuration parsing. (Jason Lowe via asuresh)






[jira] [Commented] (HADOOP-14847) Remove Guava Supplier and change to java Supplier in AMRMClient and AMRMClientAsync

2017-09-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159416#comment-16159416
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-14847:
--

[~haibochen], you forgot to set the fix-version.

> Remove Guava Supplier and change to java Supplier in AMRMClient and 
> AMRMClientAsync
> ---
>
> Key: HADOOP-14847
> URL: https://issues.apache.org/jira/browse/HADOOP-14847
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Blocker
> Attachments: HADOOP-14847.patch
>
>
> Remove the Guava library Supplier usage in user-facing APIs in 
> AMRMClient.java and AMRMClientAsync.java
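
The change described above amounts to using java.util.function.Supplier in place of the Guava Supplier type in method signatures. A minimal sketch of a polling helper written in that style follows; WaitForSketch, its waitFor method, and the parameters are hypothetical stand-ins and do not reproduce the AMRMClient API.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical polling helper that takes the JDK Supplier type; not Hadoop code. */
class WaitForSketch {

  /**
   * Polls the check until it reports true or the attempts run out.
   * Returns true if the condition was met, false otherwise.
   */
  static boolean waitFor(Supplier<Boolean> check, long sleepMillis, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (Boolean.TRUE.equals(check.get())) {
        return true;
      }
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();  // preserve the interrupt for callers
        return false;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    AtomicInteger polls = new AtomicInteger();
    // A lambda works directly because Supplier is a JDK functional interface.
    boolean ready = waitFor(() -> polls.incrementAndGet() >= 3, 1L, 10);
    System.out.println("ready=" + ready + ", polls=" + polls.get());
  }
}
```

Callers on Java 8 or later can pass a lambda or method reference here without having any particular Guava version on their classpath.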






[jira] [Commented] (HADOOP-14284) Shade Guava everywhere

2017-08-23 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139454#comment-16139454
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-14284:
--

bq. Essentially, if we're not shading the server-side for YARN / MapReduce then 
we ought not. For client-side stuff we should just tell people who want to not 
have a Guava conflict that they need to use the published shaded client jars.
I did miss this in this long thread. This is essentially saying that if you are 
upgrading to 3.0
 - Don't use hadoop-hdfs - this is our private interface. Use the client jar.
 - If you want to use your own custom guava / jackson / whatever, use the 
shaded client jars. Maybe always use the shaded client jars to future-proof 
yourselves.

Did I read that right, [~busbey]?

While this is a change to downstream users, I can get behind this as the one 
final step from 2.x to 3.x which will shield our users for all of the future.

> Shade Guava everywhere
> --
>
> Key: HADOOP-14284
> URL: https://issues.apache.org/jira/browse/HADOOP-14284
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha4
>Reporter: Andrew Wang
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
> Attachments: HADOOP-14238.pre001.patch, HADOOP-14284.002.patch, 
> HADOOP-14284.004.patch, HADOOP-14284.007.patch, HADOOP-14284.010.patch, 
> HADOOP-14284.012.patch
>
>
> HADOOP-10101 upgraded the guava version for 3.x to 21.
> Guava is broadly used by Java projects that consume our artifacts. 
> Unfortunately, these projects also consume our private artifacts like 
> {{hadoop-hdfs}}. They are also unlikely to be on the new shaded client introduced 
> by HADOOP-11804, currently only available in 3.0.0-alpha2.
> We should shade Guava everywhere to proactively avoid breaking downstreams. 
> This isn't a requirement for all dependency upgrades, but it's necessary for 
> known-bad dependencies like Guava.






[jira] [Commented] (HADOOP-14284) Shade Guava everywhere

2017-08-23 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139419#comment-16139419
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-14284:
--

Paging [~djp] for his opinion.

This is a long JIRA and there's been a lot of back-and-forth, so just 
reiterating. I am not a fan of this but if you want to shade everything because 
of the hadoop-hdfs module issue, you should definitely leave YARN and mapreduce 
out of this - see my comment above, *snip*
bq. I think there is one thing we should definitely do. YARN and mapreduce have 
always had separate client libraries. And the expectation of users has always 
been to not depend on the server jars. So you should definitely skip YARN 
server side modules and mapreduce non-client modules from shading.

> Shade Guava everywhere
> --
>
> Key: HADOOP-14284
> URL: https://issues.apache.org/jira/browse/HADOOP-14284
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha4
>Reporter: Andrew Wang
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
> Attachments: HADOOP-14238.pre001.patch, HADOOP-14284.002.patch, 
> HADOOP-14284.004.patch, HADOOP-14284.007.patch, HADOOP-14284.010.patch, 
> HADOOP-14284.012.patch
>
>
> HADOOP-10101 upgraded the guava version for 3.x to 21.
> Guava is broadly used by Java projects that consume our artifacts. 
> Unfortunately, these projects also consume our private artifacts like 
> {{hadoop-hdfs}}. They are also unlikely to be on the new shaded client introduced 
> by HADOOP-11804, currently only available in 3.0.0-alpha2.
> We should shade Guava everywhere to proactively avoid breaking downstreams. 
> This isn't a requirement for all dependency upgrades, but it's necessary for 
> known-bad dependencies like Guava.






[jira] [Commented] (HADOOP-14284) Shade Guava everywhere

2017-05-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001884#comment-16001884
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-14284:
--

bq. Isn't it better to just shade our final artifacts instead of shading 
individual libraries' jars? 
bq. Do you mean that we prepare a new project "hadoop-server-modules" and shade 
Guava and Curator inside it, like hadoop-client-modules? It sounds like a better 
approach to me. By adding a skipShade option here, we can overcome the build-time 
problem. Andrew Wang Sean Busbey Akira Ajisaka What do you think?
[~ozawa], I meant that we shade the leaf modules. For example, we shade 
hadoop-yarn-client module (instead of guava) completely so that the downstream 
users of hadoop-yarn-client only see one big fat jar with all the dependencies 
shaded. This way guava, log4j that hadoop-yarn-client depends on are completely 
invisible to users. The downside of this is that (a) the fat jar becomes, ahem, 
fat (b) some folks may want to control the versions explicitly. (a) is 
unavoidable. For (b), we could ship both slim and fat jars.

> Shade Guava everywhere
> --
>
> Key: HADOOP-14284
> URL: https://issues.apache.org/jira/browse/HADOOP-14284
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha3
>Reporter: Andrew Wang
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
> Attachments: HADOOP-14238.pre001.patch, HADOOP-14284.002.patch, 
> HADOOP-14284.004.patch, HADOOP-14284.007.patch, HADOOP-14284.010.patch, 
> HADOOP-14284.012.patch
>
>
> HADOOP-10101 upgraded the guava version for 3.x to 21.
> Guava is broadly used by Java projects that consume our artifacts. 
> Unfortunately, these projects also consume our private artifacts like 
> {{hadoop-hdfs}}. They also are unlikely on the new shaded client introduced 
> by HADOOP-11804, currently only available in 3.0.0-alpha2.
> We should shade Guava everywhere to proactively avoid breaking downstreams. 
> This isn't a requirement for all dependency upgrades, but it's necessary for 
> known-bad dependencies like Guava.






[jira] [Commented] (HADOOP-14284) Shade Guava everywhere

2017-05-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001837#comment-16001837
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-14284:
--

bq. Junping Du Vinod Kumar Vavilapalli could you share your opinion based on 
the information we shared after your comments? The looks large patch, however, 
it only changes the code of import statement and pom.xml. diff size is as 
follows as a reference:
[~ozawa], I think there is one thing we should definitely do. YARN and 
mapreduce have always had separate client libraries. And the expectation of 
users has always been to not depend on the server jars. So you should 
definitely skip YARN server side modules and mapreduce non-client modules from 
shading.

> Shade Guava everywhere
> --
>
> Key: HADOOP-14284
> URL: https://issues.apache.org/jira/browse/HADOOP-14284
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha3
>Reporter: Andrew Wang
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
> Attachments: HADOOP-14238.pre001.patch, HADOOP-14284.002.patch, 
> HADOOP-14284.004.patch, HADOOP-14284.007.patch, HADOOP-14284.010.patch, 
> HADOOP-14284.012.patch
>
>
> HADOOP-10101 upgraded the guava version for 3.x to 21.
> Guava is broadly used by Java projects that consume our artifacts. 
> Unfortunately, these projects also consume our private artifacts like 
> {{hadoop-hdfs}}. They also are unlikely on the new shaded client introduced 
> by HADOOP-11804, currently only available in 3.0.0-alpha2.
> We should shade Guava everywhere to proactively avoid breaking downstreams. 
> This isn't a requirement for all dependency upgrades, but it's necessary for 
> known-bad dependencies like Guava.






[jira] [Commented] (HADOOP-14284) Shade Guava everywhere

2017-05-02 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994109#comment-15994109
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-14284:
--

Are we sure this is the best approach? A 1.1MB patch just to shade Guava is 
downright scary. As we keep doing this for other libraries, I'm concerned that 
our code will become more brittle (changing imports everywhere) and that build 
times will explode.

Are there alternatives? Isn't it better to just shade our final artifacts 
instead of shading individual libraries' jars? I remember doing something of 
this sort in the very early stages of YARN.

> Shade Guava everywhere
> --
>
> Key: HADOOP-14284
> URL: https://issues.apache.org/jira/browse/HADOOP-14284
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha3
>Reporter: Andrew Wang
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
> Attachments: HADOOP-14238.pre001.patch, HADOOP-14284.002.patch, 
> HADOOP-14284.004.patch, HADOOP-14284.007.patch, HADOOP-14284.010.patch
>
>
> HADOOP-10101 upgraded the guava version for 3.x to 21.
> Guava is broadly used by Java projects that consume our artifacts. 
> Unfortunately, these projects also consume our private artifacts like 
> {{hadoop-hdfs}}. They also are unlikely on the new shaded client introduced 
> by HADOOP-11804, currently only available in 3.0.0-alpha2.
> We should shade Guava everywhere to proactively avoid breaking downstreams. 
> This isn't a requirement for all dependency upgrades, but it's necessary for 
> known-bad dependencies like Guava.






[jira] [Commented] (HADOOP-13543) [Umbrella] Analyse 2.8.0 and 3.0.0-alpha1 jdiff reports and fix any issues

2016-08-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447120#comment-15447120
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-13543:
--

bq. FWIW I'm working on a wrapper for Java ACC that provides more user-friendly 
API reports than JDiff. My WIP patch should already be usable.
Sure, once we have something ready there, we can compare and contrast the 
reports.

This JIRA's focus is on fixing the unnecessary incompatible changes that jdiff 
has already flagged, so as to unblock 2.8.0 and 3.0.0-alpha1.

> [Umbrella] Analyse 2.8.0 and 3.0.0-alpha1 jdiff reports and fix any issues
> --
>
> Key: HADOOP-13543
> URL: https://issues.apache.org/jira/browse/HADOOP-13543
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
>
> Now that we have fixed JDiff report generation for 2.8.0 and above, we should 
> analyse them.
> For the previous releases, I was applying the jdiff patches myself, and 
> analysed them offline. It's better to track them here now that the reports 
> are automatically getting generated.






[jira] [Updated] (HADOOP-13544) JDiff reports unncessarily show unannotated APIs and cause confusion while our javadocs only show annotated and public APIs

2016-08-25 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-13544:
-
Attachment: HADOOP-13544-20160825.txt

Here's a patch that fixes the problem. Now, just like our official javadocs, 
the JDiff report only looks at annotated APIs that are marked Public.

The patch is large because I also went back to the old 2.7.2 release and 
regenerated the jdiff files following this new convention.
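
The difference between the old and new conventions can be sketched as two 
filters over annotated classes. This is an illustration, not the patch itself: 
the @Public and @Private annotations below are local stand-ins declared only 
for this example (Hadoop's real annotations live in InterfaceAudience), and 
the class and method names are hypothetical.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: contrasts "everything not @Private" with
// "only @Public". The annotations are stand-ins declared for this example.
class AudienceFilterSketch {
    @Retention(RetentionPolicy.RUNTIME) @interface Public {}
    @Retention(RetentionPolicy.RUNTIME) @interface Private {}

    @Public static class PublicApi {}
    @Private static class PrivateApi {}
    static class UnannotatedApi {}

    // Old convention: report every class that is not marked @Private,
    // which also pulls in unannotated classes.
    static List<String> notPrivate(Class<?>... classes) {
        List<String> reported = new ArrayList<>();
        for (Class<?> c : classes) {
            if (!c.isAnnotationPresent(Private.class)) {
                reported.add(c.getSimpleName());
            }
        }
        return reported;
    }

    // New convention: report only classes explicitly marked @Public.
    static List<String> onlyPublic(Class<?>... classes) {
        List<String> reported = new ArrayList<>();
        for (Class<?> c : classes) {
            if (c.isAnnotationPresent(Public.class)) {
                reported.add(c.getSimpleName());
            }
        }
        return reported;
    }

    public static void main(String[] args) {
        Class<?>[] all = {PublicApi.class, PrivateApi.class, UnannotatedApi.class};
        System.out.println("not @Private: " + notPrivate(all));
        System.out.println("only @Public: " + onlyPublic(all));
    }
}
```

The unannotated class is the interesting case: the first filter reports it and 
the second does not, which is where the confusion in the reports came from.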

> JDiff reports unncessarily show unannotated APIs and cause confusion while 
> our javadocs only show annotated and public APIs
> ---
>
> Key: HADOOP-13544
> URL: https://issues.apache.org/jira/browse/HADOOP-13544
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Attachments: HADOOP-13544-20160825.txt
>
>
> Our javadocs only show annotated and @Public APIs (original JIRAs 
> HADOOP-7782, HADOOP-6658).
> But the jdiff shows all APIs that are not annotated @Private. This causes 
> confusion on how we read the reports and what APIs we really broke.






[jira] [Updated] (HADOOP-13544) JDiff reports unncessarily show unannotated APIs and cause confusion while our javadocs only show annotated and public APIs

2016-08-25 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-13544:
-
Status: Patch Available  (was: Open)

> JDiff reports unncessarily show unannotated APIs and cause confusion while 
> our javadocs only show annotated and public APIs
> ---
>
> Key: HADOOP-13544
> URL: https://issues.apache.org/jira/browse/HADOOP-13544
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Attachments: HADOOP-13544-20160825.txt
>
>
> Our javadocs only show annotated and @Public APIs (original JIRAs 
> HADOOP-7782, HADOOP-6658).
> But the jdiff shows all APIs that are not annotated @Private. This causes 
> confusion on how we read the reports and what APIs we really broke.






[jira] [Created] (HADOOP-13544) JDiff reports unncessarily show unannotated APIs and cause confusion while our javadocs only show annotated and public APIs

2016-08-24 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created HADOOP-13544:


 Summary: JDiff reports unncessarily show unannotated APIs and 
cause confusion while our javadocs only show annotated and public APIs
 Key: HADOOP-13544
 URL: https://issues.apache.org/jira/browse/HADOOP-13544
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker


Our javadocs only show annotated and @Public APIs (original JIRAs HADOOP-7782, 
HADOOP-6658).

But the jdiff shows all APIs that are not annotated @Private. This causes 
confusion on how we read the reports and what APIs we really broke.






[jira] [Updated] (HADOOP-13423) Run JDiff on trunk for Hadoop-Common and analyze results

2016-08-24 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-13423:
-
Issue Type: Sub-task  (was: Bug)
Parent: HADOOP-13543

> Run JDiff on trunk for Hadoop-Common and analyze results
> 
>
> Key: HADOOP-13423
> URL: https://issues.apache.org/jira/browse/HADOOP-13423
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: 3.0.0-alpha1-hadoop-common-jdiff.zip, 
> 3.0.0-alpha1-jdiff.zip
>
>
> We need to run JDiff and make sure the first 3.0.0 alpha release doesn't 
> include unnecessary API incompatible change.






[jira] [Created] (HADOOP-13543) [Umbrella] Analyse 2.8.0 and 3.0.0-alpha1 jdiff reports and fix any issues

2016-08-24 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created HADOOP-13543:


 Summary: [Umbrella] Analyse 2.8.0 and 3.0.0-alpha1 jdiff reports 
and fix any issues
 Key: HADOOP-13543
 URL: https://issues.apache.org/jira/browse/HADOOP-13543
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker


Now that we have fixed JDiff report generation for 2.8.0 and above, we should 
analyse them.

For the previous releases, I was applying the jdiff patches myself, and 
analysed them offline. It's better to track them here now that the reports are 
automatically getting generated.






[jira] [Commented] (HADOOP-10776) Open up already widely-used APIs for delegation-token fetching & renewal to ecosystem projects

2016-08-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431512#comment-15431512
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-10776:
--

And /cc'ing [~steve_l] too.

> Open up already widely-used APIs for delegation-token fetching & renewal to 
> ecosystem projects
> --
>
> Key: HADOOP-10776
> URL: https://issues.apache.org/jira/browse/HADOOP-10776
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Robert Joseph Evans
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Attachments: HADOOP-10776-20160822.txt
>
>
> Storm would like to be able to fetch delegation tokens and forward them on to 
> running topologies so that they can access HDFS (STORM-346).  But to do so we 
> need to open up access to some of APIs. 
> Most notably FileSystem.addDelegationTokens(), Token.renew, 
> Credentials.getAllTokens, and UserGroupInformation but there may be others.
> At a minimum adding in storm to the list of allowed API users. But ideally 
> making them public. Restricting access to such important functionality to 
> just MR really makes secure HDFS inaccessible to anything except MR, or tools 
> that reuse MR input formats.






[jira] [Updated] (HADOOP-10776) Open up already widely-used APIs for delegation-token fetching & renewal to ecosystem projects

2016-08-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-10776:
-
Summary: Open up already widely-used APIs for delegation-token fetching & 
renewal to ecosystem projects  (was: Open up Delegation token fetching and 
renewal to STORM (Possibly others))

> Open up already widely-used APIs for delegation-token fetching & renewal to 
> ecosystem projects
> --
>
> Key: HADOOP-10776
> URL: https://issues.apache.org/jira/browse/HADOOP-10776
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Robert Joseph Evans
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Attachments: HADOOP-10776-20160822.txt
>
>
> Storm would like to be able to fetch delegation tokens and forward them on to 
> running topologies so that they can access HDFS (STORM-346).  But to do so we 
> need to open up access to some of APIs. 
> Most notably FileSystem.addDelegationTokens(), Token.renew, 
> Credentials.getAllTokens, and UserGroupInformation but there may be others.
> At a minimum adding in storm to the list of allowed API users. But ideally 
> making them public. Restricting access to such important functionality to 
> just MR really makes secure HDFS inaccessible to anything except MR, or tools 
> that reuse MR input formats.






[jira] [Updated] (HADOOP-10776) Open up Delegation token fetching and renewal to STORM (Possibly others)

2016-08-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-10776:
-
Assignee: Vinod Kumar Vavilapalli
  Status: Patch Available  (was: Open)

> Open up Delegation token fetching and renewal to STORM (Possibly others)
> 
>
> Key: HADOOP-10776
> URL: https://issues.apache.org/jira/browse/HADOOP-10776
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Robert Joseph Evans
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Attachments: HADOOP-10776-20160822.txt
>
>
> Storm would like to be able to fetch delegation tokens and forward them on to 
> running topologies so that they can access HDFS (STORM-346).  But to do so we 
> need to open up access to some of APIs. 
> Most notably FileSystem.addDelegationTokens(), Token.renew, 
> Credentials.getAllTokens, and UserGroupInformation but there may be others.
> At a minimum adding in storm to the list of allowed API users. But ideally 
> making them public. Restricting access to such important functionality to 
> just MR really makes secure HDFS inaccessible to anything except MR, or tools 
> that reuse MR input formats.






[jira] [Updated] (HADOOP-10776) Open up Delegation token fetching and renewal to STORM (Possibly others)

2016-08-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-10776:
-
Attachment: HADOOP-10776-20160822.txt

Taking a quick crack at making some of the already very widely used 
security-related classes public.

The patch makes the following public:
 - Classes: AccessControlException, Credentials, UserGroupInformation, 
AuthorizationException, Token.TrivialRenewer, 
AbstractDelegationTokenIdentifier, AbstractDelegationTokenSecretManager
 - Methods: FileSystem.getCanonicalServiceName(), 
FileSystem.addDelegationTokens()

A couple of general notes:
 - I'd like to skip the evolving vs public discussion for now and focus only on 
visibility - so I just marked everything evolving.
 - I did a quick search and obviously there are a lot more classes that need 
more careful thinking. Unless I've missed some of the very obvious ones, I'd 
like to make progress on getting the current ones done first.

[~revans2], [~cnauroth], [~arpitagarwal], can one or more of you quickly look 
at this? Shouldn't take more than 5-10 minutes.

> Open up Delegation token fetching and renewal to STORM (Possibly others)
> 
>
> Key: HADOOP-10776
> URL: https://issues.apache.org/jira/browse/HADOOP-10776
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Robert Joseph Evans
>Priority: Blocker
> Attachments: HADOOP-10776-20160822.txt
>
>
> Storm would like to be able to fetch delegation tokens and forward them on to 
> running topologies so that they can access HDFS (STORM-346).  But to do so we 
> need to open up access to some of APIs. 
> Most notably FileSystem.addDelegationTokens(), Token.renew, 
> Credentials.getAllTokens, and UserGroupInformation but there may be others.
> At a minimum adding in storm to the list of allowed API users. But ideally 
> making them public. Restricting access to such important functionality to 
> just MR really makes secure HDFS inaccessible to anything except MR, or tools 
> that reuse MR input formats.






[jira] [Updated] (HADOOP-13524) mvn eclipse:eclipse generates .gitignore'able files

2016-08-19 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-13524:
-
Attachment: HADOOP-13524.txt

Simple patch adding these files to .gitignore.

> mvn eclipse:eclipse generates .gitignore'able files
> ---
>
> Key: HADOOP-13524
> URL: https://issues.apache.org/jira/browse/HADOOP-13524
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: HADOOP-13524.txt
>
>
> {code}
> $ git status
> On branch trunk
> Your branch is up-to-date with 'origin/trunk'.
> Untracked files:
>   (use "git add ..." to include in what will be committed)
> hadoop-build-tools/.externalToolBuilders/
> hadoop-build-tools/maven-eclipse.xml
> {code}






[jira] [Updated] (HADOOP-13524) mvn eclipse:eclipse generates .gitignore'able files

2016-08-19 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-13524:
-
Status: Patch Available  (was: Open)

> mvn eclipse:eclipse generates .gitignore'able files
> ---
>
> Key: HADOOP-13524
> URL: https://issues.apache.org/jira/browse/HADOOP-13524
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: HADOOP-13524.txt
>
>
> {code}
> $ git status
> On branch trunk
> Your branch is up-to-date with 'origin/trunk'.
> Untracked files:
>   (use "git add ..." to include in what will be committed)
> hadoop-build-tools/.externalToolBuilders/
> hadoop-build-tools/maven-eclipse.xml
> {code}






[jira] [Updated] (HADOOP-13524) mvn eclipse:eclipse generates .gitignore'able files

2016-08-19 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-13524:
-
Description: 
{code}
$ git status
On branch trunk
Your branch is up-to-date with 'origin/trunk'.
Untracked files:
  (use "git add ..." to include in what will be committed)

hadoop-build-tools/.externalToolBuilders/
hadoop-build-tools/maven-eclipse.xml
{code}

  was:
$ git status
On branch trunk
Your branch is up-to-date with 'origin/trunk'.
Untracked files:
  (use "git add ..." to include in what will be committed)

hadoop-build-tools/.externalToolBuilders/
hadoop-build-tools/maven-eclipse.xml


> mvn eclipse:eclipse generates .gitignore'able files
> ---
>
> Key: HADOOP-13524
> URL: https://issues.apache.org/jira/browse/HADOOP-13524
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: HADOOP-13524.txt
>
>
> {code}
> $ git status
> On branch trunk
> Your branch is up-to-date with 'origin/trunk'.
> Untracked files:
>   (use "git add ..." to include in what will be committed)
> hadoop-build-tools/.externalToolBuilders/
> hadoop-build-tools/maven-eclipse.xml
> {code}






[jira] [Created] (HADOOP-13524) mvn eclipse:eclipse generates .gitignore'able files

2016-08-19 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created HADOOP-13524:


 Summary: mvn eclipse:eclipse generates .gitignore'able files
 Key: HADOOP-13524
 URL: https://issues.apache.org/jira/browse/HADOOP-13524
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


$ git status
On branch trunk
Your branch is up-to-date with 'origin/trunk'.
Untracked files:
  (use "git add ..." to include in what will be committed)

hadoop-build-tools/.externalToolBuilders/
hadoop-build-tools/maven-eclipse.xml






[jira] [Updated] (HADOOP-13250) jdiff and dependency reports aren't linked in site web pages

2016-08-19 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-13250:
-
Assignee: Vinod Kumar Vavilapalli
Target Version/s: 2.8.0, 3.0.0-alpha1
Priority: Critical  (was: Major)

I had been thinking about this myself for a while and found that this JIRA 
already exists.

I will get this into 2.8.0 and 3.0.0-alpha1 for visibility, so I am marking it 
against these versions.

The next step is to include jdiff checking in per-patch validation - will file 
a JIRA for that too.

> jdiff and dependency reports aren't linked in site web pages
> 
>
> Key: HADOOP-13250
> URL: https://issues.apache.org/jira/browse/HADOOP-13250
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build, documentation
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
>
> Even though they are in the site tar ball (after HADOOP-13245), they aren't 
> actually reachable.






[jira] [Updated] (HADOOP-13428) Fix hadoop-common to generate jdiff

2016-08-19 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-13428:
-
Fix Version/s: 3.0.0-alpha1

Pushed this into branch-3.0.0-alpha1 too.

> Fix hadoop-common to generate jdiff
> ---
>
> Key: HADOOP-13428
> URL: https://issues.apache.org/jira/browse/HADOOP-13428
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HADOOP-13428-branch-2.8.005.patch, HADOOP-13428.1.patch, 
> HADOOP-13428.2.patch, HADOOP-13428.3.patch, HADOOP-13428.4.patch, 
> HADOOP-13428.5.patch, metric-system-temp-fix.patch
>
>
> Hadoop-common failed to generate JDiff. We need to fix that.






[jira] [Updated] (HADOOP-13428) Fix hadoop-common to generate jdiff

2016-08-19 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-13428:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Committed this to trunk, branch-2 and branch-2.8. Thanks [~leftnoteasy].

> Fix hadoop-common to generate jdiff
> ---
>
> Key: HADOOP-13428
> URL: https://issues.apache.org/jira/browse/HADOOP-13428
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: HADOOP-13428-branch-2.8.005.patch, HADOOP-13428.1.patch, 
> HADOOP-13428.2.patch, HADOOP-13428.3.patch, HADOOP-13428.4.patch, 
> HADOOP-13428.5.patch, metric-system-temp-fix.patch
>
>
> Hadoop-common failed to generate JDiff. We need to fix that.






[jira] [Commented] (HADOOP-13428) Fix hadoop-common to generate jdiff

2016-08-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429158#comment-15429158
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-13428:
--

The whitespace warnings are reported on 
hadoop-common-project/hadoop-common/src/site/resources/core-default.xml, which 
doesn't exist at all: not in the patch, not on trunk, not on branch-2. I also 
tried running mvn site to see if that file gets generated - nope. If this 
happens again, we'll have to investigate more, but I have hit a dead-end here.

Anyways, it is unrelated to the patch.

The latest patch looks good to me. +1, checking this in.

> Fix hadoop-common to generate jdiff
> ---
>
> Key: HADOOP-13428
> URL: https://issues.apache.org/jira/browse/HADOOP-13428
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: HADOOP-13428-branch-2.8.005.patch, HADOOP-13428.1.patch, 
> HADOOP-13428.2.patch, HADOOP-13428.3.patch, HADOOP-13428.4.patch, 
> HADOOP-13428.5.patch, metric-system-temp-fix.patch
>
>
> Hadoop-common failed to generate JDiff. We need to fix that.






[jira] [Commented] (HADOOP-13428) Fix hadoop-common to generate jdiff

2016-08-18 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427318#comment-15427318
 ] 

Vinod Kumar Vavilapalli commented on HADOOP-13428:
--

Glad to see progress on this. This approach looks good to me to work around the 
jdiff bug.

Can we use 2.7.2 as the base stable version instead of 2.7.3, as 2.7.3 is 
still under release?

Also, can you fix the whitespace issues before uploading the next patch?

> Fix hadoop-common to generate jdiff
> ---
>
> Key: HADOOP-13428
> URL: https://issues.apache.org/jira/browse/HADOOP-13428
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: HADOOP-13428.1.patch, HADOOP-13428.2.patch, 
> HADOOP-13428.3.patch, metric-system-temp-fix.patch
>
>
> Hadoop-common failed to generate JDiff. We need to fix that.





