[jira] [Work started] (HDDS-1132) Ozone serialization codec for Ozone S3 secret table

2019-03-06 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDDS-1132 started by Zsolt Venczel.
---
> Ozone serialization codec for Ozone S3 secret table
> ---
>
> Key: HDDS-1132
> URL: https://issues.apache.org/jira/browse/HDDS-1132
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager, S3
>Reporter: Elek, Marton
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: newbie
>
> HDDS-748/HDDS-864 introduced an option to use strongly typed metadata tables 
> and separated the serialization/deserialization logic into dedicated codec 
> implementations.
> HDDS-937 introduced a new S3 secret table which is not codec based.
> I propose to use codecs for this table.
> In OzoneMetadataManager the return value of getS3SecretTable() should be 
> changed from the raw byte-array based Table to a strongly typed Table. 
> The encoding/decoding logic of S3SecretValue should be registered around 
> ~OzoneMetadataManagerImpl:L204.
> As the codecs are type based we may need a wrapper class to encode the String 
> kerberos id with md5: class S3SecretKey(String name = kerberosId). Long term 
> we can modify S3SecretKey to support multiple keys for the same kerberos 
> id.
>  
>  
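For illustration, the codec wiring could look roughly like the sketch below. The 
class and method names follow the codec pattern introduced by HDDS-748/HDDS-864, 
but the serialized representation and accessors are assumptions, not the actual 
Ozone API:

{code}
// Hypothetical sketch: a codec that persists S3SecretValue in a strongly
// typed s3SecretTable (serialized representation is assumed).
public class S3SecretValueCodec implements Codec<S3SecretValue> {

  @Override
  public byte[] toPersistedFormat(S3SecretValue object) throws IOException {
    // serialize to the on-disk representation, e.g. protobuf bytes
    return object.getProtobuf().toByteArray();   // assumed accessor
  }

  @Override
  public S3SecretValue fromPersistedFormat(byte[] rawData) throws IOException {
    // parse the persisted representation back (assumed parser)
    return S3SecretValue.fromProtobuf(S3Secret.parseFrom(rawData));
  }
}

// registered next to the other codecs in OzoneMetadataManagerImpl:
codecRegistry.addCodec(S3SecretValue.class, new S3SecretValueCodec());
{code}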






[jira] [Assigned] (HDDS-1132) Ozone serialization codec for Ozone S3 secret table

2019-02-28 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel reassigned HDDS-1132:
---

Assignee: Zsolt Venczel

> Ozone serialization codec for Ozone S3 secret table
> ---
>
> Key: HDDS-1132
> URL: https://issues.apache.org/jira/browse/HDDS-1132
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Manager, S3
>Reporter: Elek, Marton
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: newbie
>
> HDDS-748/HDDS-864 introduced an option to use strongly typed metadata tables 
> and separated the serialization/deserialization logic into dedicated codec 
> implementations.
> HDDS-937 introduced a new S3 secret table which is not codec based.
> I propose to use codecs for this table.
> In OzoneMetadataManager the return value of getS3SecretTable() should be 
> changed from the raw byte-array based Table to a strongly typed Table. 
> The encoding/decoding logic of S3SecretValue should be registered around 
> ~OzoneMetadataManagerImpl:L204.
> As the codecs are type based we may need a wrapper class to encode the String 
> kerberos id with md5: class S3SecretKey(String name = kerberosId). Long term 
> we can modify S3SecretKey to support multiple keys for the same kerberos 
> id.
>  
>  






[jira] [Commented] (HDFS-14121) Log message about the old hosts file format is misleading

2018-12-14 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721319#comment-16721319
 ] 

Zsolt Venczel commented on HDFS-14121:
--

Good point [~templedf].

Will the legacy format be deprecated?

To steer users away from potentially unsupported formats a warning is probably 
more useful; otherwise I agree that info level is fine.
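For reference, the reworked flow described in this issue could look roughly 
like the sketch below (the helper names are assumptions, not the actual patch):

{code}
// Sketch: try the well-formed JSON format first, fall back to the legacy
// format, and report which parser actually succeeded.
try {
  return parseWellFormedFormat(hostsFile);                      // assumed helper
} catch (JsonProcessingException e) {
  DatanodeAdminProperties[] hosts = parseOldFormat(hostsFile);  // assumed helper
  LOG.info("{} has been parsed using the old hosts file format.", hostsFile);
  return hosts;
}
{code}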

> Log message about the old hosts file format is misleading
> -
>
> Key: HDFS-14121
> URL: https://issues.apache.org/jira/browse/HDFS-14121
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-14121.01.patch, HDFS-14121.02.patch
>
>
> In {{CombinedHostsFileReader.readFile()}} we have the following:
> {code}  LOG.warn("{} has invalid JSON format." +
>   "Try the old format without top-level token defined.", 
> hostsFile);{code}
> That message is trying to say that we tried parsing the hosts file as a 
> well-formed JSON file and failed, so we're going to try again assuming that 
> it's in the old badly-formed format.  What it actually says is that the hosts 
> file is bad, and the admin should try switching to the old format.  Those are 
> two very different things.
> While we're in there, we should refactor the logging so that instead of 
> reporting that we're going to try using a different parser (who the heck 
> cares?), we report that we had to use the old parser to successfully parse 
> the hosts file.






[jira] [Commented] (HDFS-14121) Log message about the old hosts file format is misleading

2018-12-13 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720240#comment-16720240
 ] 

Zsolt Venczel commented on HDFS-14121:
--

Thanks [~knanasi] and [~templedf] for the valuable feedback. I've tried to 
address your concerns in the latest patch!

> Log message about the old hosts file format is misleading
> -
>
> Key: HDFS-14121
> URL: https://issues.apache.org/jira/browse/HDFS-14121
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-14121.01.patch, HDFS-14121.02.patch
>
>
> In {{CombinedHostsFileReader.readFile()}} we have the following:
> {code}  LOG.warn("{} has invalid JSON format." +
>   "Try the old format without top-level token defined.", 
> hostsFile);{code}
> That message is trying to say that we tried parsing the hosts file as a 
> well-formed JSON file and failed, so we're going to try again assuming that 
> it's in the old badly-formed format.  What it actually says is that the hosts 
> file is bad, and the admin should try switching to the old format.  Those are 
> two very different things.
> While we're in there, we should refactor the logging so that instead of 
> reporting that we're going to try using a different parser (who the heck 
> cares?), we report that we had to use the old parser to successfully parse 
> the hosts file.






[jira] [Updated] (HDFS-14121) Log message about the old hosts file format is misleading

2018-12-13 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-14121:
-
Attachment: HDFS-14121.02.patch

> Log message about the old hosts file format is misleading
> -
>
> Key: HDFS-14121
> URL: https://issues.apache.org/jira/browse/HDFS-14121
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-14121.01.patch, HDFS-14121.02.patch
>
>
> In {{CombinedHostsFileReader.readFile()}} we have the following:
> {code}  LOG.warn("{} has invalid JSON format." +
>   "Try the old format without top-level token defined.", 
> hostsFile);{code}
> That message is trying to say that we tried parsing the hosts file as a 
> well-formed JSON file and failed, so we're going to try again assuming that 
> it's in the old badly-formed format.  What it actually says is that the hosts 
> file is bad, and the admin should try switching to the old format.  Those are 
> two very different things.
> While we're in there, we should refactor the logging so that instead of 
> reporting that we're going to try using a different parser (who the heck 
> cares?), we report that we had to use the old parser to successfully parse 
> the hosts file.






[jira] [Commented] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock

2018-12-13 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720142#comment-16720142
 ] 

Zsolt Venczel commented on HDFS-14101:
--

Thanks a lot [~mackrorysd] for the meaningful note and the commit!

Should we increase the corruption length, or avoid using randomness in unit 
tests altogether?

> Random failure of testListCorruptFilesCorruptedBlock
> 
>
> Key: HDFS-14101
> URL: https://issues.apache.org/jira/browse/HDFS-14101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.2.0, 3.0.3, 2.8.5
>Reporter: Kihwal Lee
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.0
>
> Attachments: HDFS-14101.01.patch, HDFS-14101.02.patch
>
>
> We've seen this occasionally.
> {noformat}
> java.lang.IllegalArgumentException: Negative position
>   at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755)
>   at org.apache.hadoop.hdfs.server.namenode.
>  
> TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105)
> {noformat}
> The test has a flaw.






[jira] [Commented] (HDFS-13843) RBF: When we add/update mount entry to multiple destinations, unable to see the order information in mount entry points and in federation router UI

2018-12-10 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714593#comment-16714593
 ] 

Zsolt Venczel commented on HDFS-13843:
--

Thanks for your feedback [~SoumyaPN]!
{quote}Sure. It doesn't. It was related to showing the info in the UI when two 
destinations are mounted to one NS: NS1->tmp1 and NS1->tmp2; then the UI was 
not showing the order.
{quote}
I might have misunderstood, but based on the description _"But order information 
like HASH, RANDOM is not displayed in mount entries and also not displayed in 
federation router UI."_ the order information was missing both from the mount 
entries *and* from the UI.
 I added the missing order information and fixed the UI, which had never 
displayed order information (not just in the case of multiple destinations).
Based on your reply I assume only the UI fix is needed. Am I right?

{quote}There is already one JIRA addressing it.
{quote}
Can you point me to that JIRA? Should this one be closed as a duplicate then?

You are right about the compatibility problems. Would adding a new command to 
retrieve the order list be more beneficial?

Best regards,
 Zsolt
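For context, the intended end state is that listing mount points also shows the 
order, roughly like the illustrative output below (column layout assumed, not 
captured from a real cluster):

{noformat}
$ hdfs dfsrouteradmin -ls /apps1
Mount Table Entries:
Source    Destinations                                        Owner  Group  Mode       Quota/Usage                   Order
/apps1    hacluster->/tmp1,hacluster->/tmp2,hacluster->/tmp3  hdfs   hdfs   rwxr-xr-x  [NsQuota: -/-, SsQuota: -/-]  RANDOM
{noformat}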

> RBF: When we add/update mount entry to multiple destinations, unable to see 
> the order information in mount entry points and in federation router UI
> ---
>
> Key: HDFS-13843
> URL: https://issues.apache.org/jira/browse/HDFS-13843
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Reporter: Soumyapn
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-13843.01.patch, HDFS-13843.02.patch
>
>
> *Scenario:*
> Execute the add/update commands below for a single mount entry, with a single 
> nameservice pointing to multiple destinations:
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1,/tmp2,/tmp3
>  # hdfs dfsrouteradmin -update /apps1 hacluster /tmp1,/tmp2,/tmp3 -order 
> RANDOM
> *Actual:* With the above commands, the mount entry is successfully updated.
> But order information like HASH, RANDOM is displayed neither in the mount 
> entries nor in the federation router UI. However, order information is 
> updated properly when there are multiple nameservices; the issue occurs with 
> a single nameservice having multiple destinations.
> *Expected:* 
> *Order information should be updated in the mount entries so that the user 
> knows which order has been set.*
>  






[jira] [Updated] (HDFS-13843) RBF: When we add/update mount entry to multiple destinations, unable to see the order information in mount entry points and in federation router UI

2018-12-10 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13843:
-
Summary: RBF: When we add/update mount entry to multiple destinations, 
unable to see the order information in mount entry points and in federation 
router UI  (was: RBF: show the order when listing mount points)

> RBF: When we add/update mount entry to multiple destinations, unable to see 
> the order information in mount entry points and in federation router UI
> ---
>
> Key: HDFS-13843
> URL: https://issues.apache.org/jira/browse/HDFS-13843
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Reporter: Soumyapn
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-13843.01.patch, HDFS-13843.02.patch
>
>
> *Scenario:*
> Execute the add/update commands below for a single mount entry, with a single 
> nameservice pointing to multiple destinations:
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1,/tmp2,/tmp3
>  # hdfs dfsrouteradmin -update /apps1 hacluster /tmp1,/tmp2,/tmp3 -order 
> RANDOM
> *Actual:* With the above commands, the mount entry is successfully updated.
> But order information like HASH, RANDOM is displayed neither in the mount 
> entries nor in the federation router UI. However, order information is 
> updated properly when there are multiple nameservices; the issue occurs with 
> a single nameservice having multiple destinations.
> *Expected:* 
> *Order information should be updated in the mount entries so that the user 
> knows which order has been set.*
>  






[jira] [Updated] (HDFS-13843) RBF: show the order when listing mount points

2018-12-10 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13843:
-
Attachment: HDFS-13843.02.patch

> RBF: show the order when listing mount points
> -
>
> Key: HDFS-13843
> URL: https://issues.apache.org/jira/browse/HDFS-13843
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Reporter: Soumyapn
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-13843.01.patch, HDFS-13843.02.patch
>
>
> *Scenario:*
> Execute the add/update commands below for a single mount entry, with a single 
> nameservice pointing to multiple destinations:
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1,/tmp2,/tmp3
>  # hdfs dfsrouteradmin -update /apps1 hacluster /tmp1,/tmp2,/tmp3 -order 
> RANDOM
> *Actual:* With the above commands, the mount entry is successfully updated.
> But order information like HASH, RANDOM is displayed neither in the mount 
> entries nor in the federation router UI. However, order information is 
> updated properly when there are multiple nameservices; the issue occurs with 
> a single nameservice having multiple destinations.
> *Expected:* 
> *Order information should be updated in the mount entries so that the user 
> knows which order has been set.*
>  






[jira] [Commented] (HDFS-13843) RBF: show the order when listing mount points

2018-12-10 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714526#comment-16714526
 ] 

Zsolt Venczel commented on HDFS-13843:
--

Thank you so much for the quick review [~elgoiri]!
I addressed your concerns in the latest patch!

> RBF: show the order when listing mount points
> -
>
> Key: HDFS-13843
> URL: https://issues.apache.org/jira/browse/HDFS-13843
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Reporter: Soumyapn
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-13843.01.patch, HDFS-13843.02.patch
>
>
> *Scenario:*
> Execute the add/update commands below for a single mount entry, with a single 
> nameservice pointing to multiple destinations:
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1,/tmp2,/tmp3
>  # hdfs dfsrouteradmin -update /apps1 hacluster /tmp1,/tmp2,/tmp3 -order 
> RANDOM
> *Actual:* With the above commands, the mount entry is successfully updated.
> But order information like HASH, RANDOM is displayed neither in the mount 
> entries nor in the federation router UI. However, order information is 
> updated properly when there are multiple nameservices; the issue occurs with 
> a single nameservice having multiple destinations.
> *Expected:* 
> *Order information should be updated in the mount entries so that the user 
> knows which order has been set.*
>  






[jira] [Updated] (HDFS-13843) RBF: show the order when listing mount points

2018-12-10 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13843:
-
Summary: RBF: show the order when listing mount points  (was: RBF: When we 
add/update mount entry to multiple destinations, unable to see the order 
information in mount entry points and in federation router UI)

> RBF: show the order when listing mount points
> -
>
> Key: HDFS-13843
> URL: https://issues.apache.org/jira/browse/HDFS-13843
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Reporter: Soumyapn
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-13843.01.patch
>
>
> *Scenario:*
> Execute the add/update commands below for a single mount entry, with a single 
> nameservice pointing to multiple destinations:
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1,/tmp2,/tmp3
>  # hdfs dfsrouteradmin -update /apps1 hacluster /tmp1,/tmp2,/tmp3 -order 
> RANDOM
> *Actual:* With the above commands, the mount entry is successfully updated.
> But order information like HASH, RANDOM is displayed neither in the mount 
> entries nor in the federation router UI. However, order information is 
> updated properly when there are multiple nameservices; the issue occurs with 
> a single nameservice having multiple destinations.
> *Expected:* 
> *Order information should be updated in the mount entries so that the user 
> knows which order has been set.*
>  






[jira] [Updated] (HDFS-13843) RBF: When we add/update mount entry to multiple destinations, unable to see the order information in mount entry points and in federation router UI

2018-12-09 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13843:
-
Attachment: HDFS-13843.01.patch
Status: Patch Available  (was: In Progress)

> RBF: When we add/update mount entry to multiple destinations, unable to see 
> the order information in mount entry points and in federation router UI
> ---
>
> Key: HDFS-13843
> URL: https://issues.apache.org/jira/browse/HDFS-13843
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Reporter: Soumyapn
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: RBF
> Attachments: HDFS-13843.01.patch
>
>
> *Scenario:*
> Execute the add/update commands below for a single mount entry, with a single 
> nameservice pointing to multiple destinations:
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1,/tmp2,/tmp3
>  # hdfs dfsrouteradmin -update /apps1 hacluster /tmp1,/tmp2,/tmp3 -order 
> RANDOM
> *Actual:* With the above commands, the mount entry is successfully updated.
> But order information like HASH, RANDOM is displayed neither in the mount 
> entries nor in the federation router UI. However, order information is 
> updated properly when there are multiple nameservices; the issue occurs with 
> a single nameservice having multiple destinations.
> *Expected:* 
> *Order information should be updated in the mount entries so that the user 
> knows which order has been set.*
>  






[jira] [Updated] (HDFS-14121) Log message about the old hosts file format is misleading

2018-12-03 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-14121:
-
Attachment: HDFS-14121.01.patch
Status: Patch Available  (was: Open)

> Log message about the old hosts file format is misleading
> -
>
> Key: HDFS-14121
> URL: https://issues.apache.org/jira/browse/HDFS-14121
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-14121.01.patch
>
>
> In {{CombinedHostsFileReader.readFile()}} we have the following:
> {code}  LOG.warn("{} has invalid JSON format." +
>   "Try the old format without top-level token defined.", 
> hostsFile);{code}
> That message is trying to say that we tried parsing the hosts file as a 
> well-formed JSON file and failed, so we're going to try again assuming that 
> it's in the old badly-formed format.  What it actually says is that the hosts 
> file is bad, and the admin should try switching to the old format.  Those are 
> two very different things.
> While we're in there, we should refactor the logging so that instead of 
> reporting that we're going to try using a different parser (who the heck 
> cares?), we report that we had to use the old parser to successfully parse 
> the hosts file.






[jira] [Assigned] (HDFS-14121) Log message about the old hosts file format is misleading

2018-12-03 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel reassigned HDFS-14121:


Assignee: Zsolt Venczel

> Log message about the old hosts file format is misleading
> -
>
> Key: HDFS-14121
> URL: https://issues.apache.org/jira/browse/HDFS-14121
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Zsolt Venczel
>Priority: Major
>
> In {{CombinedHostsFileReader.readFile()}} we have the following:
> {code}  LOG.warn("{} has invalid JSON format." +
>   "Try the old format without top-level token defined.", 
> hostsFile);{code}
> That message is trying to say that we tried parsing the hosts file as a 
> well-formed JSON file and failed, so we're going to try again assuming that 
> it's in the old badly-formed format.  What it actually says is that the hosts 
> file is bad, and the admin should try switching to the old format.  Those are 
> two very different things.
> While we're in there, we should refactor the logging so that instead of 
> reporting that we're going to try using a different parser (who the heck 
> cares?), we report that we had to use the old parser to successfully parse 
> the hosts file.






[jira] [Commented] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock

2018-11-29 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703048#comment-16703048
 ] 

Zsolt Venczel commented on HDFS-14101:
--

Thanks for the review [~ayushtkn] and for the valuable comment.

In the latest patch I tried to make the relation between the minimum file 
size and the size of the data used to corrupt the block clearer by using 
shared constants.

I hope this way it's more meaningful than a comment.

> Random failure of testListCorruptFilesCorruptedBlock
> 
>
> Key: HDFS-14101
> URL: https://issues.apache.org/jira/browse/HDFS-14101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.2.0, 3.0.3, 2.8.5
>Reporter: Kihwal Lee
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: newbie
> Attachments: HDFS-14101.01.patch, HDFS-14101.02.patch
>
>
> We've seen this occasionally.
> {noformat}
> java.lang.IllegalArgumentException: Negative position
>   at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755)
>   at org.apache.hadoop.hdfs.server.namenode.
>  
> TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105)
> {noformat}
> The test has a flaw.






[jira] [Updated] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock

2018-11-29 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-14101:
-
Attachment: HDFS-14101.02.patch

> Random failure of testListCorruptFilesCorruptedBlock
> 
>
> Key: HDFS-14101
> URL: https://issues.apache.org/jira/browse/HDFS-14101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.2.0, 3.0.3, 2.8.5
>Reporter: Kihwal Lee
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: newbie
> Attachments: HDFS-14101.01.patch, HDFS-14101.02.patch
>
>
> We've seen this occasionally.
> {noformat}
> java.lang.IllegalArgumentException: Negative position
>   at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755)
>   at org.apache.hadoop.hdfs.server.namenode.
>  
> TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105)
> {noformat}
> The test has a flaw.






[jira] [Assigned] (HDFS-12116) BlockReportTestBase#blockReport_08 and #blockReport_09 intermittently fail

2018-11-29 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel reassigned HDFS-12116:


Assignee: Zsolt Venczel

> BlockReportTestBase#blockReport_08 and #blockReport_09 intermittently fail
> --
>
> Key: HDFS-12116
> URL: https://issues.apache.org/jira/browse/HDFS-12116
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-12116.01.patch, HDFS-12116.02.patch, 
> HDFS-12116.03.patch, 
> TEST-org.apache.hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage.xml
>
>
> This seems to be long-standing, but the failure rate (~10%) is slightly 
> higher in dist-test runs using CDH.
> In both _08 and _09 tests:
> # an attempt is made to make a replica in {{TEMPORARY}}
>  state, by {{waitForTempReplica}}.
> # Once that's returned, the test goes on to verify that the block report 
> shows the correct pending replication blocks.
> But there's a race condition. If the replica is replicated between steps #1 
> and #2, {{getPendingReplicationBlocks}} could return 0 or 1, depending on how 
> many replicas are replicated, hence failing the test.
> Failures are seen on both {{TestNNHandlesBlockReportPerStorage}} and 
> {{TestNNHandlesCombinedBlockReport}}.
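One way to close such a race (a sketch only, not necessarily the committed fix) 
is to retry the assertion instead of sampling the counter once:

{code}
// Sketch (wiring assumed): poll until the pending replication count matches,
// so a replication landing between step 1 and step 2 cannot fail the test.
GenericTestUtils.waitFor(
    () -> cluster.getNamesystem().getPendingReplicationBlocks() == expected,
    100 /* check interval ms */, 10000 /* timeout ms */);
{code}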






[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate

2018-11-27 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700631#comment-16700631
 ] 

Zsolt Venczel commented on HDFS-13998:
--

Thanks for the clarification [~templedf].

[~brahmareddy] and [~vinayrpet]

In light of what [~xiaochen] replied here: 
[comment-16685723|https://issues.apache.org/jira/browse/HDFS-13998?focusedCommentId=16685723&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16685723]
and what [~templedf] summarized here: 
[comment-16688719|https://issues.apache.org/jira/browse/HDFS-13998?focusedCommentId=16688719&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16688719]
how do you think this issue should progress?

Thanks and best regards,
Zsolt

> ECAdmin NPE with -setPolicy -replicate
> --
>
> Key: HDFS-13998
> URL: https://issues.apache.org/jira/browse/HDFS-13998
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch, 
> HDFS-13998.03.patch
>
>
> HDFS-13732 tried to improve the output of the console tool. But we missed the 
> fact that for replication, {{getErasureCodingPolicy}} would return null.
> This jira is to fix it in ECAdmin, and add a unit test.
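A null-safe handling could look roughly like the sketch below (the shape of the 
fix is assumed here; the API names are from the HDFS client):

{code}
// Sketch: getErasureCodingPolicy returns null for a replicated path, so
// fall back to the replication policy name instead of dereferencing null.
ErasureCodingPolicy policy = dfs.getErasureCodingPolicy(path);
String name = (policy == null)
    ? SystemErasureCodingPolicies.getReplicationPolicy().getName()
    : policy.getName();
System.out.println("Set " + name + " erasure coding policy on " + path);
{code}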






[jira] [Commented] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock

2018-11-27 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700374#comment-16700374
 ] 

Zsolt Venczel commented on HDFS-14101:
--

Thanks [~kihwal] for reporting the issue!

DFSTestUtil.Builder creates files with a random size no larger than 512 bytes 
and no smaller than 1 byte.
{code}
 DFSTestUtil util = new DFSTestUtil.Builder().
  setName("testCorruptFilesCorruptedBlock").setNumFiles(2).
  setMaxLevels(1).setMaxSize(512).build();
{code}

Whenever the file size is 1 byte the test fails, as it tries to corrupt the 
block by writing a 2-byte buffer starting 2 bytes before the end of the file, 
i.e. at position -1.
The submitted patch should fix this (statistically the test failed about once 
per 512 runs, but it ran fine for 2000 runs with the patch).
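The fix can be sketched as follows (the constant name is assumed; the builder's 
minimum-size setter keeps the write position non-negative):

{code}
// Sketch: tie the minimum file size to the corruption length so that the
// write position (fileSize - CORRUPTION_LENGTH) can never go below zero.
final int CORRUPTION_LENGTH = 2;        // assumed shared constant
DFSTestUtil util = new DFSTestUtil.Builder().
    setName("testCorruptFilesCorruptedBlock").setNumFiles(2).
    setMaxLevels(1).
    setMinSize(CORRUPTION_LENGTH).      // files can no longer be 1 byte
    setMaxSize(512).build();
{code}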

> Random failure of testListCorruptFilesCorruptedBlock
> 
>
> Key: HDFS-14101
> URL: https://issues.apache.org/jira/browse/HDFS-14101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.2.0, 3.0.3, 2.8.5
>Reporter: Kihwal Lee
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: newbie
> Attachments: HDFS-14101.01.patch
>
>
> We've seen this occasionally.
> {noformat}
> java.lang.IllegalArgumentException: Negative position
>   at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755)
>   at org.apache.hadoop.hdfs.server.namenode.
>  
> TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105)
> {noformat}
> The test has a flaw.






[jira] [Updated] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock

2018-11-27 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-14101:
-
Attachment: HDFS-14101.01.patch

> Random failure of testListCorruptFilesCorruptedBlock
> 
>
> Key: HDFS-14101
> URL: https://issues.apache.org/jira/browse/HDFS-14101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.2.0, 3.0.3, 2.8.5
>Reporter: Kihwal Lee
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: newbie
> Attachments: HDFS-14101.01.patch
>
>
> We've seen this occasionally.
> {noformat}
> java.lang.IllegalArgumentException: Negative position
>   at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755)
>   at org.apache.hadoop.hdfs.server.namenode.
>  
> TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105)
> {noformat}
> The test has a flaw.






[jira] [Updated] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock

2018-11-27 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-14101:
-
Status: Patch Available  (was: In Progress)

> Random failure of testListCorruptFilesCorruptedBlock
> 
>
> Key: HDFS-14101
> URL: https://issues.apache.org/jira/browse/HDFS-14101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.3, 3.2.0, 2.8.5
>Reporter: Kihwal Lee
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: newbie
> Attachments: HDFS-14101.01.patch
>
>
> We've seen this occasionally.
> {noformat}
> java.lang.IllegalArgumentException: Negative position
>   at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755)
>   at org.apache.hadoop.hdfs.server.namenode.
>  
> TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105)
> {noformat}
> The test has a flaw.






[jira] [Work started] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock

2018-11-27 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-14101 started by Zsolt Venczel.

> Random failure of testListCorruptFilesCorruptedBlock
> 
>
> Key: HDFS-14101
> URL: https://issues.apache.org/jira/browse/HDFS-14101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.5
>Reporter: Kihwal Lee
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: newbie
>
> We've seen this occasionally.
> {noformat}
> java.lang.IllegalArgumentException: Negative position
>   at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755)
>   at org.apache.hadoop.hdfs.server.namenode.
>  
> TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105)
> {noformat}
> The test has a flaw.






[jira] [Updated] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock

2018-11-27 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-14101:
-
Target Version/s: 3.0.4, 3.3.0, 2.8.6, 3.2.1  (was: 2.8.6)

> Random failure of testListCorruptFilesCorruptedBlock
> 
>
> Key: HDFS-14101
> URL: https://issues.apache.org/jira/browse/HDFS-14101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.2.0, 3.0.3, 2.8.5
>Reporter: Kihwal Lee
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: newbie
>
> We've seen this occasionally.
> {noformat}
> java.lang.IllegalArgumentException: Negative position
>   at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755)
>   at org.apache.hadoop.hdfs.server.namenode.
>  
> TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105)
> {noformat}
> The test has a flaw.






[jira] [Updated] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock

2018-11-27 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-14101:
-
Affects Version/s: (was: 3.2.1)
   (was: 3.3.0)
   3.2.0

> Random failure of testListCorruptFilesCorruptedBlock
> 
>
> Key: HDFS-14101
> URL: https://issues.apache.org/jira/browse/HDFS-14101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.2.0, 3.0.3, 2.8.5
>Reporter: Kihwal Lee
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: newbie
>
> We've seen this occasionally.
> {noformat}
> java.lang.IllegalArgumentException: Negative position
>   at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755)
>   at org.apache.hadoop.hdfs.server.namenode.
>  
> TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105)
> {noformat}
> The test has a flaw.






[jira] [Updated] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock

2018-11-27 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-14101:
-
Affects Version/s: 3.2.1
   3.3.0
   3.0.3

> Random failure of testListCorruptFilesCorruptedBlock
> 
>
> Key: HDFS-14101
> URL: https://issues.apache.org/jira/browse/HDFS-14101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.3, 2.8.5, 3.3.0, 3.2.1
>Reporter: Kihwal Lee
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: newbie
>
> We've seen this occasionally.
> {noformat}
> java.lang.IllegalArgumentException: Negative position
>   at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755)
>   at org.apache.hadoop.hdfs.server.namenode.
>  
> TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105)
> {noformat}
> The test has a flaw.






[jira] [Assigned] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock

2018-11-27 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel reassigned HDFS-14101:


Assignee: Zsolt Venczel

> Random failure of testListCorruptFilesCorruptedBlock
> 
>
> Key: HDFS-14101
> URL: https://issues.apache.org/jira/browse/HDFS-14101
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.5
>Reporter: Kihwal Lee
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: newbie
>
> We've seen this occasionally.
> {noformat}
> java.lang.IllegalArgumentException: Negative position
>   at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755)
>   at org.apache.hadoop.hdfs.server.namenode.
>  
> TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105)
> {noformat}
> The test has a flaw.






[jira] [Resolved] (HDFS-14100) TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml fails due to missing dfs.image.string-tables.expanded from hdfs-defaults.xml

2018-11-26 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel resolved HDFS-14100.
--
Resolution: Invalid

The failure happened due to a local git issue.

> TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml fails due 
> to missing dfs.image.string-tables.expanded from hdfs-defaults.xml
> 
>
> Key: HDFS-14100
> URL: https://issues.apache.org/jira/browse/HDFS-14100
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
>
> After HDFS-13882 
> TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml requires 
> hdfs-defaults.xml to have dfs.image.string-tables.expanded added and 
> populated with a default value.






[jira] [Created] (HDFS-14100) TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml fails due to missing dfs.image.string-tables.expanded from hdfs-defaults.xml

2018-11-26 Thread Zsolt Venczel (JIRA)
Zsolt Venczel created HDFS-14100:


 Summary: 
TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml fails due 
to missing dfs.image.string-tables.expanded from hdfs-defaults.xml
 Key: HDFS-14100
 URL: https://issues.apache.org/jira/browse/HDFS-14100
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Zsolt Venczel
Assignee: Zsolt Venczel


After HDFS-13882 
TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml requires 
hdfs-defaults.xml to have dfs.image.string-tables.expanded added and populated 
with a default value.






[jira] [Comment Edited] (HDFS-14054) TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky

2018-11-14 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686550#comment-16686550
 ] 

Zsolt Venczel edited comment on HDFS-14054 at 11/14/18 2:50 PM:


The failure happened because FSEditLog.endCurrentLogSegment was not mocked 
early enough, which caused the edit log finalization to fail.

In very rare cases I've seen an NPE at line 573; that is handled as well.

Also, in very rare cases the waitForMillis for line 575 was not enough.


was (Author: zvenczel):
The failure happened because FSEditLog.endCurrentLogSegment was not mocked 
early enough, which caused the edit log finalization to fail.

In very rare cases I've seen an NPE at line 573; that is handled as well.

Also, in very rare cases the timeout for line 575 was not enough.

> TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and 
> testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky
> 
>
> Key: HDFS-14054
> URL: https://issues.apache.org/jira/browse/HDFS-14054
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: flaky-test
> Attachments: HDFS-14054.01.patch
>
>
> ---
>  T E S T S
> ---
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support 
> was removed in 8.0
> Running org.apache.hadoop.hdfs.TestLeaseRecovery2
> Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 68.971 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestLeaseRecovery2
> testHardLeaseRecoveryAfterNameNodeRestart2(org.apache.hadoop.hdfs.TestLeaseRecovery2)
>   Time elapsed: 4.375 sec  <<< FAILURE!
> java.lang.AssertionError: lease holder should now be the NN
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2(TestLeaseRecovery2.java:437)
> testHardLeaseRecoveryWithRenameAfterNameNodeRestart(org.apache.hadoop.hdfs.TestLeaseRecovery2)
>   Time elapsed: 4.339 sec  <<< FAILURE!
> java.lang.AssertionError: lease holder should now be the NN
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart(TestLeaseRecovery2.java:443)
> Results :
> Failed tests: 
>   
> TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2:437->hardLeaseRecoveryRestartHelper:520->checkLease:568
>  lease holder should now be the NN
>   
> TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart:443->hardLeaseRecoveryRestartHelper:520->checkLease:568
>  lease holder should now be the NN
> Tests run: 7, Failures: 2, Errors: 0, Skipped: 0






[jira] [Commented] (HDFS-14054) TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky

2018-11-14 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686550#comment-16686550
 ] 

Zsolt Venczel commented on HDFS-14054:
--

The failure happened because FSEditLog.endCurrentLogSegment was not mocked 
early enough, which caused the edit log finalization to fail.

In very rare cases I've seen an NPE at line 573; that is handled as well.

Also, in very rare cases the timeout for line 575 was not enough.
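The mocking idea can be sketched like this (the wiring is assumed, not the 
exact patch):

{code}
// Sketch: install the spy before the NameNode restart so that
// endCurrentLogSegment is already a no-op when finalization would run.
FSEditLog spyLog = Mockito.spy(cluster.getNameNode().getFSImage().getEditLog());
Mockito.doNothing().when(spyLog).endCurrentLogSegment(Mockito.anyBoolean());
DFSTestUtil.setEditLogForTesting(cluster.getNamesystem(), spyLog); // assumed helper
{code}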

> TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and 
> testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky
> 
>
> Key: HDFS-14054
> URL: https://issues.apache.org/jira/browse/HDFS-14054
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: flaky-test
> Attachments: HDFS-14054.01.patch
>
>
> ---
>  T E S T S
> ---
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support 
> was removed in 8.0
> Running org.apache.hadoop.hdfs.TestLeaseRecovery2
> Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 68.971 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestLeaseRecovery2
> testHardLeaseRecoveryAfterNameNodeRestart2(org.apache.hadoop.hdfs.TestLeaseRecovery2)
>   Time elapsed: 4.375 sec  <<< FAILURE!
> java.lang.AssertionError: lease holder should now be the NN
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2(TestLeaseRecovery2.java:437)
> testHardLeaseRecoveryWithRenameAfterNameNodeRestart(org.apache.hadoop.hdfs.TestLeaseRecovery2)
>   Time elapsed: 4.339 sec  <<< FAILURE!
> java.lang.AssertionError: lease holder should now be the NN
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart(TestLeaseRecovery2.java:443)
> Results :
> Failed tests: 
>   
> TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2:437->hardLeaseRecoveryRestartHelper:520->checkLease:568
>  lease holder should now be the NN
>   
> TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart:443->hardLeaseRecoveryRestartHelper:520->checkLease:568
>  lease holder should now be the NN
> Tests run: 7, Failures: 2, Errors: 0, Skipped: 0






[jira] [Updated] (HDFS-14054) TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky

2018-11-14 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-14054:
-
Attachment: HDFS-14054.01.patch

> TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and 
> testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky
> 
>
> Key: HDFS-14054
> URL: https://issues.apache.org/jira/browse/HDFS-14054
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: flaky-test
> Attachments: HDFS-14054.01.patch
>
>
> ---
>  T E S T S
> ---
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support 
> was removed in 8.0
> Running org.apache.hadoop.hdfs.TestLeaseRecovery2
> Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 68.971 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestLeaseRecovery2
> testHardLeaseRecoveryAfterNameNodeRestart2(org.apache.hadoop.hdfs.TestLeaseRecovery2)
>   Time elapsed: 4.375 sec  <<< FAILURE!
> java.lang.AssertionError: lease holder should now be the NN
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2(TestLeaseRecovery2.java:437)
> testHardLeaseRecoveryWithRenameAfterNameNodeRestart(org.apache.hadoop.hdfs.TestLeaseRecovery2)
>   Time elapsed: 4.339 sec  <<< FAILURE!
> java.lang.AssertionError: lease holder should now be the NN
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart(TestLeaseRecovery2.java:443)
> Results :
> Failed tests: 
>   
> TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2:437->hardLeaseRecoveryRestartHelper:520->checkLease:568
>  lease holder should now be the NN
>   
> TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart:443->hardLeaseRecoveryRestartHelper:520->checkLease:568
>  lease holder should now be the NN
> Tests run: 7, Failures: 2, Errors: 0, Skipped: 0






[jira] [Updated] (HDFS-14054) TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky

2018-11-14 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-14054:
-
Status: Patch Available  (was: In Progress)

> TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and 
> testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky
> 
>
> Key: HDFS-14054
> URL: https://issues.apache.org/jira/browse/HDFS-14054
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.3, 2.6.0
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: flaky-test
> Attachments: HDFS-14054.01.patch
>
>
> ---
>  T E S T S
> ---
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support 
> was removed in 8.0
> Running org.apache.hadoop.hdfs.TestLeaseRecovery2
> Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 68.971 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestLeaseRecovery2
> testHardLeaseRecoveryAfterNameNodeRestart2(org.apache.hadoop.hdfs.TestLeaseRecovery2)
>   Time elapsed: 4.375 sec  <<< FAILURE!
> java.lang.AssertionError: lease holder should now be the NN
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2(TestLeaseRecovery2.java:437)
> testHardLeaseRecoveryWithRenameAfterNameNodeRestart(org.apache.hadoop.hdfs.TestLeaseRecovery2)
>   Time elapsed: 4.339 sec  <<< FAILURE!
> java.lang.AssertionError: lease holder should now be the NN
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart(TestLeaseRecovery2.java:443)
> Results :
> Failed tests: 
>   
> TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2:437->hardLeaseRecoveryRestartHelper:520->checkLease:568
>  lease holder should now be the NN
>   
> TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart:443->hardLeaseRecoveryRestartHelper:520->checkLease:568
>  lease holder should now be the NN
> Tests run: 7, Failures: 2, Errors: 0, Skipped: 0






[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate

2018-11-12 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683575#comment-16683575
 ] 

Zsolt Venczel commented on HDFS-13998:
--

Thank you [~brahmareddy] for taking a look!

Please find my comments below:
{quote}IMHO, HDFS-13732 change might not require..? As admin will be aware of 
configured policy and these are admin commands.
{quote}
For supportability reasons, helping out administrators (there could be many) by 
displaying the actual outcome of their actions can be valuable.
We already support them with a warning message when the directory is not 
empty. I think this is also valuable despite its cost (a listStatus call is 
executed that adds an extra audit log entry and might also return up to 1000 
FileStatus entries by default if the directory is large enough).
{quote}Adding RPC can mislead

For concurrent calls and any error while getting the policy after setting.
{quote}
In this scenario not knowing the default might be even worse.
{quote}and Extra overhead as Ayush Saxena mentioned.

Audit log ( for debugging) and RPC call
{quote}
I think [~ayushtkn] and I have a common understanding here that the overhead 
would be worth it. [~ayushtkn], can you please comment?
{quote}If we really required why can't we do through getserverdefaults()(by 
adding EC field there).
{quote}
I think any change to the default EC policy would not be reflected in the 
server defaults on the client without config redistribution, which might also 
lead to confusion.

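For illustration, a minimal sketch of the null guard this jira asks for, with illustrative names ({{dfs}}, {{p}}); a sketch only, not the attached patch:
{code}
// getErasureCodingPolicy() returns null for a replicated path, so guard
// before calling getName() to avoid the NPE.
ErasureCodingPolicy ecPolicy = dfs.getErasureCodingPolicy(p);
String policyName = (ecPolicy == null)
    ? SystemErasureCodingPolicies.getReplicationPolicy().getName()
    : ecPolicy.getName();
System.out.println("Set " + policyName + " erasure coding policy on " + p);
{code}
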
> ECAdmin NPE with -setPolicy -replicate
> --
>
> Key: HDFS-13998
> URL: https://issues.apache.org/jira/browse/HDFS-13998
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch, 
> HDFS-13998.03.patch
>
>
> HDFS-13732 tried to improve the output of the console tool. But we missed the 
> fact that for replication, {{getErasureCodingPolicy}} would return null.
> This jira is to fix it in ECAdmin, and add a unit test.






[jira] [Work started] (HDFS-14054) TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky

2018-11-12 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-14054 started by Zsolt Venczel.

> TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and 
> testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky
> 
>
> Key: HDFS-14054
> URL: https://issues.apache.org/jira/browse/HDFS-14054
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: flaky-test
>
> ---
>  T E S T S
> ---
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support 
> was removed in 8.0
> Running org.apache.hadoop.hdfs.TestLeaseRecovery2
> Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 68.971 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestLeaseRecovery2
> testHardLeaseRecoveryAfterNameNodeRestart2(org.apache.hadoop.hdfs.TestLeaseRecovery2)
>   Time elapsed: 4.375 sec  <<< FAILURE!
> java.lang.AssertionError: lease holder should now be the NN
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2(TestLeaseRecovery2.java:437)
> testHardLeaseRecoveryWithRenameAfterNameNodeRestart(org.apache.hadoop.hdfs.TestLeaseRecovery2)
>   Time elapsed: 4.339 sec  <<< FAILURE!
> java.lang.AssertionError: lease holder should now be the NN
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520)
>   at 
> org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart(TestLeaseRecovery2.java:443)
> Results :
> Failed tests: 
>   
> TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2:437->hardLeaseRecoveryRestartHelper:520->checkLease:568
>  lease holder should now be the NN
>   
> TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart:443->hardLeaseRecoveryRestartHelper:520->checkLease:568
>  lease holder should now be the NN
> Tests run: 7, Failures: 2, Errors: 0, Skipped: 0






[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate

2018-11-10 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682428#comment-16682428
 ] 

Zsolt Venczel commented on HDFS-13998:
--

Thanks for the review [~templedf]!

As far as I can see, the extraneous " " has been there for some time now; it 
was added due to the line-length checkstyle limitation.
I reshuffled this code section a bit to make it look prettier.

> ECAdmin NPE with -setPolicy -replicate
> --
>
> Key: HDFS-13998
> URL: https://issues.apache.org/jira/browse/HDFS-13998
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch, 
> HDFS-13998.03.patch
>
>
> HDFS-13732 tried to improve the output of the console tool. But we missed the 
> fact that for replication, {{getErasureCodingPolicy}} would return null.
> This jira is to fix it in ECAdmin, and add a unit test.






[jira] [Updated] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate

2018-11-10 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13998:
-
Attachment: HDFS-13998.03.patch

> ECAdmin NPE with -setPolicy -replicate
> --
>
> Key: HDFS-13998
> URL: https://issues.apache.org/jira/browse/HDFS-13998
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch, 
> HDFS-13998.03.patch
>
>
> HDFS-13732 tried to improve the output of the console tool. But we missed the 
> fact that for replication, {{getErasureCodingPolicy}} would return null.
> This jira is to fix it in ECAdmin, and add a unit test.






[jira] [Updated] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate

2018-11-09 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13998:
-
Attachment: HDFS-13998.02.patch

> ECAdmin NPE with -setPolicy -replicate
> --
>
> Key: HDFS-13998
> URL: https://issues.apache.org/jira/browse/HDFS-13998
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch
>
>
> HDFS-13732 tried to improve the output of the console tool. But we missed the 
> fact that for replication, {{getErasureCodingPolicy}} would return null.
> This jira is to fix it in ECAdmin, and add a unit test.






[jira] [Commented] (HDFS-13985) Clearer error message for ReplicaNotFoundException

2018-11-09 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681145#comment-16681145
 ] 

Zsolt Venczel commented on HDFS-13985:
--

Thanks [~adam.antal] for the update.
I think patch 002 is good to go. +1 (non-binding)

> Clearer error message for ReplicaNotFoundException
> --
>
> Key: HDFS-13985
> URL: https://issues.apache.org/jira/browse/HDFS-13985
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: HDFS-13985.001.patch, HDFS-13985.002.patch
>
>
> The issue is that we came across a ReplicaNotFoundException in a bug report; 
> the most informative thing we could get was "Replica not found for 
> [ExtendedBlock]". Anyone investigating cases involving 
> ReplicaNotFoundExceptions has to review diagnostic bundles and dig through 
> logs, so as a starting point enhancing the exception message would speed up 
> this process and be beneficial in the long run.
> More concretely, it would be helpful if any of the following information were 
> displayed along with the exception: the file's name, replication factor or 
> block location.






[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate

2018-11-08 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679470#comment-16679470
 ] 

Zsolt Venczel commented on HDFS-13998:
--

[~ayushtkn], I completely agree that your proposed solution is more efficient!
It also leaves some room for race conditions that I was intending to close, but 
I would agree on dropping this concern.
What do you think, [~xiaochen]?

Best regards,
Zsolt

> ECAdmin NPE with -setPolicy -replicate
> --
>
> Key: HDFS-13998
> URL: https://issues.apache.org/jira/browse/HDFS-13998
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13998.01.patch
>
>
> HDFS-13732 tried to improve the output of the console tool. But we missed the 
> fact that for replication, {{getErasureCodingPolicy}} would return null.
> This jira is to fix it in ECAdmin, and add a unit test.






[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate

2018-11-07 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678544#comment-16678544
 ] 

Zsolt Venczel commented on HDFS-13998:
--

[~ayushtkn] thanks for sharing your concerns about setPolicy adding additional 
audit log entries.

My solution for HDFS-13732 added an additional getPolicy call to fetch the 
default policy, as there is no way for ECAdmin to know the NN's default 
settings precisely.
If you think this is not the preferred solution, we could think about extending 
the setPolicy RPC call to return the policy that was actually set, but that is 
a more involved change and should be tracked separately.
What do you think?

Best regards,
Zsolt

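To make the trade-off concrete, a hedged sketch of the set-then-get pattern described above, with illustrative names and assuming a null policy name selects the NameNode default:
{code}
dfs.setErasureCodingPolicy(p, null);   // assumed: null selects the NN default
// Second RPC (and an extra audit log entry); a concurrent setPolicy between
// the two calls can make this return a different policy, which is the race
// window mentioned above.
ErasureCodingPolicy applied = dfs.getErasureCodingPolicy(p);
System.out.println("Set "
    + (applied == null ? "replication" : applied.getName())
    + " erasure coding policy on " + p);
{code}
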
> ECAdmin NPE with -setPolicy -replicate
> --
>
> Key: HDFS-13998
> URL: https://issues.apache.org/jira/browse/HDFS-13998
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13998.01.patch
>
>
> HDFS-13732 tried to improve the output of the console tool. But we missed the 
> fact that for replication, {{getErasureCodingPolicy}} would return null.
> This jira is to fix it in ECAdmin, and add a unit test.






[jira] [Updated] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate

2018-11-07 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13998:
-
Attachment: HDFS-13998.01.patch
Status: Patch Available  (was: In Progress)

> ECAdmin NPE with -setPolicy -replicate
> --
>
> Key: HDFS-13998
> URL: https://issues.apache.org/jira/browse/HDFS-13998
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13998.01.patch
>
>
> HDFS-13732 tried to improve the output of the console tool. But we missed the 
> fact that for replication, {{getErasureCodingPolicy}} would return null.
> This jira is to fix it in ECAdmin, and add a unit test.






[jira] [Comment Edited] (HDFS-13985) Clearer error message for ReplicaNotFoundException

2018-11-07 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677893#comment-16677893
 ] 

Zsolt Venczel edited comment on HDFS-13985 at 11/7/18 9:23 AM:
---

Thanks for the patch [~adam.antal]!

I think the message content in line 43 is fine and is more meaningful. 
Extending the message for the *public ReplicaNotFoundException(ExtendedBlock 
b)* constructor with it makes sense.

I have some concerns with extending the message for the *public 
ReplicaNotFoundException(String msg)* constructor, as it has various use cases 
with various messages that could be distorted by this extension (a few 
examples are in FsDatasetImpl). What do you think?



was (Author: zvenczel):
Thanks for the patch [~adam.antal]!

I think the message content in line 43 is fine and should be more meaningful. 
Extending the message for the *public ReplicaNotFoundException(ExtendedBlock 
b)* constructor with it makes sense.

I have some concerns with extending the message for the *public 
ReplicaNotFoundException(String msg)* constructor, as it has various use cases 
with various messages that could be distorted by this extension (a few 
examples are in FsDatasetImpl). What do you think?


> Clearer error message for ReplicaNotFoundException
> --
>
> Key: HDFS-13985
> URL: https://issues.apache.org/jira/browse/HDFS-13985
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: HDFS-13985.001.patch
>
>
> The issue is that we came across a ReplicaNotFoundException in a bug report; 
> the most informative thing we could get was "Replica not found for 
> [ExtendedBlock]". Anyone investigating cases involving 
> ReplicaNotFoundExceptions has to review diagnostic bundles and dig through 
> logs, so as a starting point enhancing the exception message would speed up 
> this process and be beneficial in the long run.
> More concretely, it would be helpful if any of the following information were 
> displayed along with the exception: the file's name, replication factor or 
> block location.






[jira] [Commented] (HDFS-13985) Clearer error message for ReplicaNotFoundException

2018-11-07 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677893#comment-16677893
 ] 

Zsolt Venczel commented on HDFS-13985:
--

Thanks for the patch [~adam.antal]!

I think the message content in line 43 is fine and should be more meaningful. 
Extending the message for the *public ReplicaNotFoundException(ExtendedBlock 
b)* constructor with it makes sense.

I have some concerns with extending the message for the *public 
ReplicaNotFoundException(String msg)* constructor, as it has various use cases 
with various messages that could be distorted by this extension (a few 
examples are in FsDatasetImpl). What do you think?


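As a sketch of the split suggested above (enrich only the {{ExtendedBlock}} constructor and leave the {{String}} constructor untouched); the extra hint text is illustrative, not proposed wording:
{code}
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;

public class ReplicaNotFoundException extends IOException {
  // Enrich only the ExtendedBlock-based message.
  public ReplicaNotFoundException(ExtendedBlock b) {
    super("Replica not found for " + b
        + ". The replica may have been deleted or may never have been"
        + " finalized on this DataNode.");
  }

  // Callers pass tailored messages (see FsDatasetImpl), so do not decorate.
  public ReplicaNotFoundException(String msg) {
    super(msg);
  }
}
{code}
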
> Clearer error message for ReplicaNotFoundException
> --
>
> Key: HDFS-13985
> URL: https://issues.apache.org/jira/browse/HDFS-13985
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: HDFS-13985.001.patch
>
>
> The issue is that we came across a ReplicaNotFoundException in a bug report; 
> the most informative thing we could get was "Replica not found for 
> [ExtendedBlock]". Anyone investigating cases involving 
> ReplicaNotFoundExceptions has to review diagnostic bundles and dig through 
> logs, so as a starting point enhancing the exception message would speed up 
> this process and be beneficial in the long run.
> More concretely, it would be helpful if any of the following information were 
> displayed along with the exception: the file's name, replication factor or 
> block location.






[jira] [Created] (HDFS-14054) TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky

2018-11-07 Thread Zsolt Venczel (JIRA)
Zsolt Venczel created HDFS-14054:


 Summary: TestLeaseRecovery2: 
testHardLeaseRecoveryAfterNameNodeRestart2 and 
testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky
 Key: HDFS-14054
 URL: https://issues.apache.org/jira/browse/HDFS-14054
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.3, 2.6.0
Reporter: Zsolt Venczel
Assignee: Zsolt Venczel


---
 T E S T S
---
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was 
removed in 8.0
Running org.apache.hadoop.hdfs.TestLeaseRecovery2
Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 68.971 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.TestLeaseRecovery2
testHardLeaseRecoveryAfterNameNodeRestart2(org.apache.hadoop.hdfs.TestLeaseRecovery2)
  Time elapsed: 4.375 sec  <<< FAILURE!
java.lang.AssertionError: lease holder should now be the NN
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568)
at 
org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520)
at 
org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2(TestLeaseRecovery2.java:437)
testHardLeaseRecoveryWithRenameAfterNameNodeRestart(org.apache.hadoop.hdfs.TestLeaseRecovery2)
  Time elapsed: 4.339 sec  <<< FAILURE!
java.lang.AssertionError: lease holder should now be the NN
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568)
at 
org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520)
at 
org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart(TestLeaseRecovery2.java:443)
Results :
Failed tests: 
  
TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2:437->hardLeaseRecoveryRestartHelper:520->checkLease:568
 lease holder should now be the NN
  
TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart:443->hardLeaseRecoveryRestartHelper:520->checkLease:568
 lease holder should now be the NN
Tests run: 7, Failures: 2, Errors: 0, Skipped: 0
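
For context, a hedged sketch of the usual cure for this kind of flakiness: poll until the NameNode becomes the lease holder instead of asserting once after a fixed wait. Helper names follow the existing test utilities; this is an assumption, not the attached patch:
{code}
// Wait up to 10 seconds for the NN to take over the lease.
GenericTestUtils.waitFor(() -> {
  try {
    String holder = NameNodeAdapter
        .getLeaseHolderForPath(cluster.getNameNode(), fileStr);
    return HdfsServerConstants.NAMENODE_LEASE_HOLDER.equals(holder);
  } catch (Exception e) {
    return false;
  }
}, 100, 10000);
{code}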






[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate

2018-10-30 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669157#comment-16669157
 ] 

Zsolt Venczel commented on HDFS-13998:
--

[~ayushtkn] I have started working on it.

> ECAdmin NPE with -setPolicy -replicate
> --
>
> Key: HDFS-13998
> URL: https://issues.apache.org/jira/browse/HDFS-13998
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
>
> HDFS-13732 tried to improve the output of the console tool. But we missed the 
> fact that for replication, {{getErasureCodingPolicy}} would return null.
> This jira is to fix it in ECAdmin, and add a unit test.






[jira] [Work started] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate

2018-10-30 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-13998 started by Zsolt Venczel.

> ECAdmin NPE with -setPolicy -replicate
> --
>
> Key: HDFS-13998
> URL: https://issues.apache.org/jira/browse/HDFS-13998
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
>
> HDFS-13732 tried to improve the output of the console tool. But we missed the 
> fact that for replication, {{getErasureCodingPolicy}} would return null.
> This jira is to fix it in ECAdmin, and add a unit test.






[jira] [Commented] (HDFS-13860) Space character in the path is shown as "+" while creating dirs in WebHDFS

2018-10-18 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655009#comment-16655009
 ] 

Zsolt Venczel commented on HDFS-13860:
--

Thank you very much [~shashikant] for reporting the issue and providing a patch!

I went through the code and I find the fix to be fine; it makes WebHDFS 
behavior more consistent with the non-WebHDFS use cases.

One last inconsistency I found is that, with your patch, a path containing "+" 
will be transformed to a space, so a literal "+" has to be encoded as %2B 
instead.
In my opinion this is a compromise we can live with, especially if it's 
documented.

I'd +1 (non-binding) patch 01 with some documentation update.

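For context, a self-contained sketch of the standard java.net behavior behind this compromise:
{code}
import java.net.URLDecoder;
import java.net.URLEncoder;

public class PlusVsSpace {
  public static void main(String[] args) throws Exception {
    // application/x-www-form-urlencoded treats '+' as an encoded space,
    // so a literal '+' must travel as %2B to survive a round trip.
    System.out.println(URLEncoder.encode("file 1", "UTF-8")); // file+1
    System.out.println(URLDecoder.decode("file+1", "UTF-8")); // file 1
    System.out.println(URLEncoder.encode("file+1", "UTF-8")); // file%2B1
  }
}
{code}
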
> Space character in the path is shown as "+" while creating dirs in WebHDFS 
> ---
>
> Key: HDFS-13860
> URL: https://issues.apache.org/jira/browse/HDFS-13860
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.2.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13860.00.patch, HDFS-13860.01.patch
>
>
> $ ./hdfs dfs -mkdir webhdfs://127.0.0.1/tmp1/"file 1"
> 2018-08-23 15:16:08,258 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> $ ./hdfs dfs -ls webhdfs://127.0.0.1/tmp1
> 2018-08-23 15:16:21,244 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Found 1 items
> drwxr-xr-x   - sbanerjee hadoop          0 2018-08-23 15:16 
> webhdfs://127.0.0.1/tmp1/file+1






[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-10-11 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646073#comment-16646073
 ] 

Zsolt Venczel commented on HDFS-13697:
--

Thanks a lot [~daryn] for your reply!

Just a quick note:
{quote}3. If all tests are passing, the patch is flawed. I recall the tests 
codified bugs.{quote}
I took a look at the flawed tests and fixed them as far as I can tell.

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.09.patch, HDFS-13697.10.patch, HDFS-13697.11.patch, 
> HDFS-13697.12.patch, HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack 
> might not have doAs privileged execution call (in the DFSClient for example). 
> This results in losing the proxy user from UGI as UGI.getCurrentUser finds 
> no AccessControllerContext and does a re-login for the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205)
>  at 
> 

[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-09-11 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610349#comment-16610349
 ] 

Zsolt Venczel commented on HDFS-13697:
--

The above test failures should be unrelated, as they pass with the patch 
applied:
{code}
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-hdfs ---
[INFO] 
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hdfs.client.impl.TestBlockReaderLocal
[INFO] Tests run: 38, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.834 
s - in org.apache.hadoop.hdfs.client.impl.TestBlockReaderLocal
[INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner
[INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 115 s - 
in org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
{code}

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.09.patch, HDFS-13697.10.patch, HDFS-13697.11.patch, 
> HDFS-13697.12.patch, HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack 
> might not have doAs privileged execution call (in the DFSClient for example). 
> This results in losing the proxy user from UGI as UGI.getCurrentUser finds 
> no AccessControllerContext and does a re-login for the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> 

[jira] [Comment Edited] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-09-10 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609685#comment-16609685
 ] 

Zsolt Venczel edited comment on HDFS-13697 at 9/10/18 8:48 PM:
---

In my latest patch I fixed the TestEncryptionZonesWithKMS failure.

With the latest patch (12), all of the above failed tests have passed:
{code}
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-hdfs ---
[INFO] 
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts
[INFO] Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.35 s 
- in org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts
[INFO] Running org.apache.hadoop.hdfs.TestRollingUpgrade
[INFO] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 143.89 
s - in org.apache.hadoop.hdfs.TestRollingUpgrade
[INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 70.541 
s - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
[INFO] Running org.apache.hadoop.hdfs.server.balancer.TestBalancer
[INFO] Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 366.403 
s - in org.apache.hadoop.hdfs.server.balancer.TestBalancer
[INFO] Running org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
[INFO] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 129.862 
s - in org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 117, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
{code}


was (Author: zvenczel):
In my latest patch I fixed the TestEncryptionZonesWithKMS failure.

With the latest patch (11), all of the above failed tests have passed:
{code}
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-hdfs ---
[INFO] 
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts
[INFO] Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.35 s 
- in org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts
[INFO] Running org.apache.hadoop.hdfs.TestRollingUpgrade
[INFO] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 143.89 
s - in org.apache.hadoop.hdfs.TestRollingUpgrade
[INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 70.541 
s - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
[INFO] Running org.apache.hadoop.hdfs.server.balancer.TestBalancer
[INFO] Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 366.403 
s - in org.apache.hadoop.hdfs.server.balancer.TestBalancer
[INFO] Running org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
[INFO] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 129.862 
s - in org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 117, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
{code}

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.09.patch, HDFS-13697.10.patch, HDFS-13697.11.patch, 
> HDFS-13697.12.patch, HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack 
> might not have doAs privileged execution call (in the DFSClient for example). 
> This results in losing the proxy user from UGI as UGI.getCurrentUser finds 
> no AccessControllerContext and does a re-login for the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. 

[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-09-10 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609685#comment-16609685
 ] 

Zsolt Venczel commented on HDFS-13697:
--

In my latest patch I fixed the TestEncryptionZonesWithKMS failure.

With the latest patch (11), all of the above failed tests have passed:
{code}
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-hdfs ---
[INFO] 
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts
[INFO] Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.35 s 
- in org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts
[INFO] Running org.apache.hadoop.hdfs.TestRollingUpgrade
[INFO] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 143.89 
s - in org.apache.hadoop.hdfs.TestRollingUpgrade
[INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 70.541 
s - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
[INFO] Running org.apache.hadoop.hdfs.server.balancer.TestBalancer
[INFO] Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 366.403 
s - in org.apache.hadoop.hdfs.server.balancer.TestBalancer
[INFO] Running org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
[INFO] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 129.862 
s - in org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 117, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
{code}

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.09.patch, HDFS-13697.10.patch, HDFS-13697.11.patch, 
> HDFS-13697.12.patch, HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack 
> might not have doAs privileged execution call (in the DFSClient for example). 
> This results in losing the proxy user from UGI as UGI.getCurrentUser finds 
> no AccessControllerContext and does a re-login for the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> 

[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-09-10 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13697:
-
Attachment: HDFS-13697.12.patch

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.09.patch, HDFS-13697.10.patch, HDFS-13697.11.patch, 
> HDFS-13697.12.patch, HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack 
> might not have doAs privileged execution call (in the DFSClient for example). 
> This results in losing the proxy user from UGI as UGI.getCurrentUser finds 
> no AccessControllerContext and does a re-login for the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:205)
>  at 
> 

[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-09-10 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13697:
-
Attachment: HDFS-13697.11.patch

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.09.patch, HDFS-13697.10.patch, HDFS-13697.11.patch, 
> HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack 
> might not have doAs privileged execution call (in the DFSClient for example). 
> This results in losing the proxy user from UGI as UGI.getCurrentUser finds 
> no AccessControllerContext and does a re-login for the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:205)
>  at 
> 

[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-09-10 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609125#comment-16609125
 ] 

Zsolt Venczel commented on HDFS-13697:
--

Thank you so much [~xyao] for the review!

Please find my answers below:
{quote}Line 408: The KMSCP is used by both client (DFSClient) and server (NN). 
authMethod ==PROXY is not a reliable way to cover all the proxy user cases. We 
could change line 408-409 to if 
(UserGroupInformation.getCurrentUser().getRealUser()!=null)
{quote}
I updated this section in my latest patch (11) based on your suggestions.
{quote}Line 412: authMethod=TOKEN case
 Do we use the login user even if the current UGI has KMS delegation token?
{quote}
With the current approach we use the login user only if the authMethod at 
construction time was TOKEN. The potential issue that popped up for me is 
HADOOP-13381, which, as far as I can see, should no longer be a problem after 
getting rid of the KP cache. I'll try to double-check it though. What do you 
think?
{quote}Line 484: NIT: can we wrap this with getCachedUgi() similar to 
getDoAsUser() to make future change easier?
{quote}
I updated it as you suggested.
{quote}TestEncryptionZones.java
 Line 1340-1341: can be replaced with DFSTestUtil.mockDFSClientKeyProvider
{quote}
This is a good catch, thanks! After updating 
DFSTestUtil.mockDFSClientKeyProvider to cope with this scenario (I had to mock 
dfs.dfs as well), I could use it in additional scenarios too.

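A minimal sketch of the proxy-user check discussed above, relying on {{getRealUser()}} rather than the authMethod recorded at construction time; the method name is illustrative:
{code}
private UserGroupInformation getActualUgi() throws IOException {
  UserGroupInformation currentUgi = UserGroupInformation.getCurrentUser();
  // A non-null real user means we are inside a proxy-user (doAs) call,
  // regardless of which authMethod was seen when the provider was built.
  return currentUgi.getRealUser() != null
      ? currentUgi.getRealUser()
      : currentUgi;
}
{code}
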
> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.09.patch, HDFS-13697.10.patch, HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack 
> might not have doAs privileged execution call (in the DFSClient for example). 
> This results in losing the proxy user from UGI as UGI.getCurrentUser finds 
> no AccessControllerContext and does a re-login for the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at 

[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-09-06 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605831#comment-16605831
 ] 

Zsolt Venczel commented on HDFS-13744:
--

Thanks a lot [~mackrorysd] for the review and the fix!

I was a bit puzzled by the specification on how to escape a CRLF properly, as 
it's not specified exactly (there's an example that replaces it character by 
character, which is your approach, but there's another example here: 
https://tools.ietf.org/html/rfc2234#section-2.3).
From a usability perspective I think your approach is the best, as it clearly 
displays all special characters. For debugging purposes this is the most 
valuable.

Test failures are unrelated.



> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch, HDFS-13744.02.patch, 
> HDFS-13744.03.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * Delimited processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * 
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * 
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-08-31 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598669#comment-16598669
 ] 

Zsolt Venczel edited comment on HDFS-13697 at 8/31/18 12:20 PM:


Thanks [~xiaochen] for pointing out that KMSClientProvider.createConnection 
still had morphing. I removed it with patch 10.


was (Author: zvenczel):
Thanks [~xiaochen] for pointing out that KMSClientProvider.createConnection 
still had morphing I removed with patch 10.

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.09.patch, HDFS-13697.10.patch, HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack 
> might not have doAs privileged execution call (in the DFSClient for example). 
> This results in losing the proxy user from UGI as UGI.getCurrentUser finds 
> no AccessControllerContext and does a re-login for the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> 

[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-08-31 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598669#comment-16598669
 ] 

Zsolt Venczel commented on HDFS-13697:
--

Thanks [~xiaochen] for pointing out that KMSClientProvider.createConnection 
still had morphing I removed with patch 10.

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.09.patch, HDFS-13697.10.patch, HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack 
> might not have doAs privileged execution call (in the DFSClient for example). 
> This results in losing the proxy user from UGI as UGI.getCurrentUser finds 
> no AccessControllerContext and does a re-login for the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>  at 
> 

[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-08-31 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13697:
-
Attachment: HDFS-13697.10.patch

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.09.patch, HDFS-13697.10.patch, HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack 
> might not have doAs privileged execution call (in the DFSClient for example). 
> This results in losing the proxy user from UGI as UGI.getCurrentUser finds 
> no AccessControllerContext and does a re-login for the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:205)
>  at 
> 

[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-08-30 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597451#comment-16597451
 ] 

Zsolt Venczel commented on HDFS-13744:
--

Thank you very much for the review [~mackrorysd]. Please let me know if you 
have any preference about the direction this solution should take.
In my latest patch I added StringUtils.CR support as you suggested.
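To illustrate the escaping direction, a small sketch with a made-up helper name (not the patch itself):
{code:java}
// Replace CR and LF with the visible escape sequences used in the expected
// output quoted below ("%x0D" / "%x0A"), so a delimited OIV record stays on
// one line; the actual patch builds on the StringUtils.CR constant.
private static String escapeCrLf(String fileName) {
  return fileName.replace("\r", "%x0D").replace("\n", "%x0A");
}
{code}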

> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch, HDFS-13744.02.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * Delimited processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * 
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * 
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-08-30 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13744:
-
Attachment: HDFS-13744.02.patch

> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch, HDFS-13744.02.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * Delimited processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * 
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * 
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-08-29 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596512#comment-16596512
 ] 

Zsolt Venczel commented on HDFS-13697:
--

Thanks [~daryn] for the investigation and explanation and thanks [~xiaochen] 
for the continuous work and discussion!

In my latest patch (09) I addressed the following:
 * *KMSClientProvider*
 No more morphing... The doAsUser is calculated at construction time and that's 
it.
 * *TestKMS*
 Based on Xiao's findings I fixed the key provider creation in the 
doProxyUserTest function to correctly test key creation by proxy users.
 * *TestAclsEndToEnd*
 I think the main issue with this test suite was that it was using the mini 
cluster dfs client for all of its operations. As we stopped morphing, the 
problem surfaced, so I refactored the suite to use a truly end-to-end approach 
with a proper client and a proper client-side key provider. The bulk of the 
changes comes from introducing a service user that needed the appropriate ACLs 
for the various testing scenarios.

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.09.patch, HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack 
> might not have doAs privileged execution call (in the DFSClient for example). 
> This results in losing the proxy user from UGI as UGI.getCurrentUser finds 
> no AccessControllerContext and does a re-login for the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> 

[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-08-29 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13697:
-
Attachment: HDFS-13697.09.patch

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.09.patch, HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack 
> might not have doAs privileged execution call (in the DFSClient for example). 
> This results in losing the proxy user from UGI as UGI.getCurrentUser finds 
> no AccessControllerContext and does a re-login for the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:205)
>  at 
> 

[jira] [Updated] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints

2018-08-28 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13731:
-
Attachment: HDFS-13731.03.patch

> ReencryptionUpdater fails with ConcurrentModificationException during 
> processCheckpoints
> 
>
> Key: HDFS-13731
> URL: https://issues.apache.org/jira/browse/HDFS-13731
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, test
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13731-failure.log, HDFS-13731.01.patch, 
> HDFS-13731.02.patch, HDFS-13731.03.patch
>
>
> HDFS-12837 fixed some flakiness of Reencryption related tests. But as 
> [~zvenczel]'s comment, there are a few timeouts still. We should investigate 
> that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints

2018-08-27 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593715#comment-16593715
 ] 

Zsolt Venczel commented on HDFS-13731:
--

Thanks for the review [~xiaochen]!

In my latest patch I added the protection you suggested. I also did some 
additional analysis and found that the rest of the failure scenarios were due 
to the same ConcurrentModificationException, therefore I removed the changes 
from the tests.

I could not reproduce any failure after applying the patch on the latest 
trunk: 
http://dist-test.cloudera.org:80/job?job_id=hadoop-hdfs.zvenczel.1535377953.29868

> ReencryptionUpdater fails with ConcurrentModificationException during 
> processCheckpoints
> 
>
> Key: HDFS-13731
> URL: https://issues.apache.org/jira/browse/HDFS-13731
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, test
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13731-failure.log, HDFS-13731.01.patch, 
> HDFS-13731.02.patch
>
>
> HDFS-12837 fixed some flakiness of Reencryption related tests. But as 
> [~zvenczel]'s comment, there are a few timeouts still. We should investigate 
> that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints

2018-08-27 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13731:
-
Attachment: HDFS-13731.02.patch

> ReencryptionUpdater fails with ConcurrentModificationException during 
> processCheckpoints
> 
>
> Key: HDFS-13731
> URL: https://issues.apache.org/jira/browse/HDFS-13731
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, test
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13731-failure.log, HDFS-13731.01.patch, 
> HDFS-13731.02.patch
>
>
> HDFS-12837 fixed some flakiness of Reencryption related tests. But as 
> [~zvenczel]'s comment, there are a few timeouts still. We should investigate 
> that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13846) Safe blocks counter is not decremented correctly if the block is striped

2018-08-24 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591436#comment-16591436
 ] 

Zsolt Venczel edited comment on HDFS-13846 at 8/24/18 10:15 AM:


Thanks [~knanasi] for creating this issue, I think it's a great catch!

In the description the term "node" is a bit confusing to me. These are the 
"real data blocks" you are referring to, right?

I like how you extended the already available mocking approach in the unit 
tests.
 When I applied the test it was failing, but it did pass with your proposed 
fix, therefore I think it should be valid.

Overall I think it's a valid change, +1 (non-binding) from me.


was (Author: zvenczel):
Thanks [~knanasi] for creating this issue, I think it's a great catch!

In the description the term "node" is a bit confusing to me. These are the 
"real data blocks" you are referring to right?

I like how you extended the already available mocking approach in the unit 
tests.
When I applied the test they were failing but they did pass with your proposed 
fix therefore I think they should be valid.

Overall I think it's a valid change, +1 (non-binding) from me.

> Safe blocks counter is not decremented correctly if the block is striped
> 
>
> Key: HDFS-13846
> URL: https://issues.apache.org/jira/browse/HDFS-13846
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13846.001.patch
>
>
> In BlockManagerSafeMode class, the "safe blocks" counter is incremented if 
> the number of nodes containing the block equals to the number of data units 
> specified by the erasure coding policy, which looks like this in the code:
> {code:java}
> final int safe = storedBlock.isStriped() ?
> ((BlockInfoStriped)storedBlock).getRealDataBlockNum() : 
> safeReplication;
> if (storageNum == safe) {
>   this.blockSafe++;
> {code}
> But when it is decremented the code does not check if the block is striped or 
> not, just compares the number of nodes containing the block with 0 
> (safeReplication - 1) if the block is complete, which is not correct.
> {code:java}
> if (storedBlock.isComplete() &&
> blockManager.countNodes(b).liveReplicas() == safeReplication - 1) {
>   this.blockSafe--;
>   assert blockSafe >= 0;
>   checkSafeMode();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13846) Safe blocks counter is not decremented correctly if the block is striped

2018-08-24 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591436#comment-16591436
 ] 

Zsolt Venczel commented on HDFS-13846:
--

Thanks [~knanasi] for creating this issue, I think it's a great catch!

In the description the term "node" is a bit confusing to me. These are the 
"real data blocks" you are referring to, right?

I like how you extended the already available mocking approach in the unit 
tests.
When I applied the tests they were failing, but they did pass with your 
proposed fix, therefore I think they should be valid.

Overall I think it's a valid change, +1 (non-binding) from me.
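
For what it's worth, a sketch of the symmetric check the decrement path could use, reusing the names from the snippets quoted below (a direction only, not necessarily the committed patch):
{code:java}
// Mirror the increment path: for striped blocks compare against the real
// data block number instead of safeReplication.
final int safe = storedBlock.isStriped()
    ? ((BlockInfoStriped) storedBlock).getRealDataBlockNum()
    : safeReplication;
if (storedBlock.isComplete()
    && blockManager.countNodes(b).liveReplicas() == safe - 1) {
  this.blockSafe--;
  assert blockSafe >= 0;
  checkSafeMode();
}
{code}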

> Safe blocks counter is not decremented correctly if the block is striped
> 
>
> Key: HDFS-13846
> URL: https://issues.apache.org/jira/browse/HDFS-13846
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13846.001.patch
>
>
> In BlockManagerSafeMode class, the "safe blocks" counter is incremented if 
> the number of nodes containing the block equals to the number of data units 
> specified by the erasure coding policy, which looks like this in the code:
> {code:java}
> final int safe = storedBlock.isStriped() ?
> ((BlockInfoStriped)storedBlock).getRealDataBlockNum() : 
> safeReplication;
> if (storageNum == safe) {
>   this.blockSafe++;
> {code}
> But when it is decremented the code does not check if the block is striped or 
> not, just compares the number of nodes containing the block with 0 
> (safeReplication - 1) if the block is complete, which is not correct.
> {code:java}
> if (storedBlock.isComplete() &&
> blockManager.countNodes(b).liveReplicas() == safeReplication - 1) {
>   this.blockSafe--;
>   assert blockSafe >= 0;
>   checkSafeMode();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13752) fs.Path stores file path in java.net.URI causes big memory waste

2018-08-23 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590382#comment-16590382
 ] 

Zsolt Venczel commented on HDFS-13752:
--

Thanks for the patch [~b.maidics] and thanks for posting the review we talked 
about [~gabor.bota]!

A few additional thoughts from my side:
 * The Path class is used within all services of HDFS, e.g. the DataNode and 
NameNode, so the impact on these components would be tremendous. Introducing 
SoftReference in a NameNode would induce some unwanted GC behavior, especially 
in larger scale clusters (the small file problem would be even more imminent). 
This of course needs to be measured, therefore some initial metrics would be 
great.
 * toURI is used in 237 places across ~20 sub-components in Hadoop 2.7.6. In 
Hadoop trunk this number is much larger. Please revisit your calculations.

Giving the initial problem some thought, I could imagine something that lives 
on the client side only and introduces some caching, either by extending the 
Path class or by transforming it into something more convenient; a rough 
sketch follows.
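
For illustration only, a rough sketch of such a helper; the class is hypothetical and not an existing Hadoop API:
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.hadoop.fs.Path;

/** Hypothetical client-side cache that reuses Path instances for hot path strings. */
public final class PathCache {
  private static final int MAX_ENTRIES = 10_000;

  // Access-ordered LRU map; evicting the eldest entry keeps the footprint bounded.
  private static final Map<String, Path> CACHE =
      new LinkedHashMap<String, Path>(1024, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Path> eldest) {
          return size() > MAX_ENTRIES;
        }
      };

  public static synchronized Path get(String pathString) {
    return CACHE.computeIfAbsent(pathString, Path::new);
  }

  private PathCache() {
  }
}
{code}
This would only help callers that look up the same path strings repeatedly (like the Hive partition case above); it does nothing for the per-instance java.net.URI overhead itself.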

> fs.Path stores file path in java.net.URI causes big memory waste
> 
>
> Key: HDFS-13752
> URL: https://issues.apache.org/jira/browse/HDFS-13752
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 2.7.6
> Environment: Hive 2.1.1 and hadoop 2.7.6 
>Reporter: Barnabas Maidics
>Priority: Major
> Attachments: HDFS-13752.001.patch, HDFS-13752.002.patch, 
> HDFS-13752.003.patch, Screen Shot 2018-07-20 at 11.12.38.png, 
> heapdump-10partitions.html, measurement.pdf
>
>
> I was looking at HiveServer2 memory usage, and a big percentage of this was 
> because of org.apache.hadoop.fs.Path, where you store file paths in a 
> java.net.URI object. The URI implementation stores the same string in 3 
> different objects (see the attached image). In Hive when there are many 
> partitions this cause a big memory usage. In my particular case 42% of memory 
> was used by java.net.URI so it could be reduced to 14%. 
> I wonder if the community is open to replace it with a more memory efficient 
> implementation and what other things should be considered here? It can be a 
> huge memory improvement for Hadoop and for Hive as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints

2018-08-23 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13731:
-
Attachment: HDFS-13731.01.patch

> ReencryptionUpdater fails with ConcurrentModificationException during 
> processCheckpoints
> 
>
> Key: HDFS-13731
> URL: https://issues.apache.org/jira/browse/HDFS-13731
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, test
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13731-failure.log, HDFS-13731.01.patch
>
>
> HDFS-12837 fixed some flakiness of Reencryption related tests. But as 
> [~zvenczel]'s comment, there are a few timeouts still. We should investigate 
> that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints

2018-08-23 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13731:
-
Status: Patch Available  (was: In Progress)

> ReencryptionUpdater fails with ConcurrentModificationException during 
> processCheckpoints
> 
>
> Key: HDFS-13731
> URL: https://issues.apache.org/jira/browse/HDFS-13731
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, test
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13731-failure.log, HDFS-13731.01.patch
>
>
> HDFS-12837 fixed some flakiness of Reencryption related tests. But as 
> [~zvenczel]'s comment, there are a few timeouts still. We should investigate 
> that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints

2018-08-23 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590033#comment-16590033
 ] 

Zsolt Venczel commented on HDFS-13731:
--

With my patch applied the test passes: 
http://dist-test.cloudera.org:80/job?job_id=hadoop-hdfs.zvenczel.1535016717.29829

> ReencryptionUpdater fails with ConcurrentModificationException during 
> processCheckpoints
> 
>
> Key: HDFS-13731
> URL: https://issues.apache.org/jira/browse/HDFS-13731
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, test
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13731-failure.log
>
>
> HDFS-12837 fixed some flakiness of Reencryption related tests. But as 
> [~zvenczel]'s comment, there are a few timeouts still. We should investigate 
> that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints

2018-08-23 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590030#comment-16590030
 ] 

Zsolt Venczel commented on HDFS-13731:
--

While investigating the above timeouts I found the following concurrency issue:
 * while the ReencryptionUpdater.processCheckpoints method is executing and 
removing tasks from the task list
 * on a different thread, a new re-encryption task can be added to the same 
task list by calling ReencryptionHandler.submitCurrentBatch, which calls 
ZoneSubmissionTracker.addTask

My latest patch contains a proposal to prevent this.
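
The failure mode in isolation (a self-contained sketch with illustrative names, not the HDFS classes): one thread iterates and removes from a plain LinkedList while another thread appends to it, so the iterator fails fast.
{code:java}
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class CmeRepro {
  public static void main(String[] args) throws InterruptedException {
    List<Integer> tasks = new LinkedList<>();
    for (int i = 0; i < 1_000_000; i++) {
      tasks.add(i);
    }
    // Writer: keeps adding tasks, like submitCurrentBatch -> addTask.
    Thread writer = new Thread(() -> {
      for (int i = 0; i < 1_000_000; i++) {
        tasks.add(i); // structural modification without holding a lock
      }
    });
    writer.start();
    // Reader: iterates and removes, like processCheckpoints.
    try {
      Iterator<Integer> it = tasks.iterator();
      while (it.hasNext()) {
        it.next(); // usually fails fast once the writer interleaves
        it.remove();
      }
    } catch (ConcurrentModificationException e) {
      System.out.println("reproduced: " + e);
    }
    writer.join();
  }
}
{code}
Guarding both sides with a common lock, for example synchronizing on the task list in both addTask and processCheckpoints, is the general direction of the fix; the sketch above is only meant to show why the unguarded combination fails.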

I've attached the full log produced for the issue.

The important section where the *processCheckpoints* iterations are still 
running but a new ZoneSubmissionTracker task is being added:
{code:java}
2018-08-22 17:16:01,535 INFO  FSTreeTraverser - Submitted batch 
(start:/zones/zone/0, size:5) of zone 16387 to re-encrypt.
2018-08-22 17:16:01,535 INFO  ReencryptionHandler - Processing batched 
re-encryption for zone 16387, batch size 5, start:/zones/zone/0
2018-08-22 17:16:01,536 INFO  ReencryptionHandler - Completed re-encrypting one 
batch of 5 edeks from KMS, time consumed: 922873, start: /zones/zone/0.
2018-08-22 17:16:01,536 INFO  ReencryptionUpdater - Processing returned 
re-encryption task for zone /zones/zone(16387), batch size 5, 
start:/zones/zone/0
2018-08-22 17:16:01,536 DEBUG ReencryptionUpdater - Updating file xattrs for 
re-encrypting zone /zones/zone, starting at /zones/zone/0
2018-08-22 17:16:01,536 TRACE ReencryptionUpdater - Updating 16388 for 
re-encryption.
2018-08-22 17:16:01,536 TRACE ReencryptionUpdater - Updating 16389 for 
re-encryption.
2018-08-22 17:16:01,536 TRACE ReencryptionUpdater - Updating 16390 for 
re-encryption.
2018-08-22 17:16:01,536 TRACE ReencryptionUpdater - Updating 16391 for 
re-encryption.
2018-08-22 17:16:01,536 TRACE ReencryptionUpdater - Updating 16392 for 
re-encryption.
2018-08-22 17:16:01,536 INFO  ReencryptionUpdater - Updated xattrs on 5(5) 
files in zone /zones/zone for re-encryption, starting:/zones/zone/0.
2018-08-22 17:16:01,536 DEBUG ReencryptionUpdater - Updating re-encryption 
checkpoint with completed task. last: /zones/zone/4 size:5.
2018-08-22 17:16:01,536 INFO  FSTreeTraverser - Submitted batch 
(start:/zones/zone/5, size:5) of zone 16387 to re-encrypt.
2018-08-22 17:16:01,536 INFO  ReencryptionHandler - Processing batched 
re-encryption for zone 16387, batch size 5, start:/zones/zone/5
2018-08-22 17:16:01,537 ERROR ReencryptionUpdater - Re-encryption updater 
thread exiting.
java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:966)
at java.util.LinkedList$ListItr.remove(LinkedList.java:921)
at 
org.apache.hadoop.hdfs.server.namenode.ReencryptionUpdater.processCheckpoints(ReencryptionUpdater.java:411)
at 
org.apache.hadoop.hdfs.server.namenode.ReencryptionUpdater.processTask(ReencryptionUpdater.java:488)
at 
org.apache.hadoop.hdfs.server.namenode.ReencryptionUpdater.takeAndProcessTasks(ReencryptionUpdater.java:437)
at 
org.apache.hadoop.hdfs.server.namenode.ReencryptionUpdater.run(ReencryptionUpdater.java:264)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-08-22 17:16:01,537 INFO  ReencryptionHandler - Submission completed of 
zone 16387 for re-encryption.
{code}
Which results in cancelling the re-encryption tasks:
{code:java}
2018-08-22 17:16:51,612 INFO  ReencryptionUpdater - Cancelling 2 re-encryption 
tasks
...
2018-08-22 17:16:51,621 INFO  ReencryptionUpdater - Cancelling 2 re-encryption 
tasks
{code}
My uploaded patch fixes two other test related issues:
 * sometimes the fs.saveNamespace() call in testRestartAfterReencryptAndCheckpoint 
was slow, so we should wait for the operation to finish
 * the cancelFutureDuringReencryption method introduced a race condition, as at
{code:java}
callableRunning.set(true); Thread.sleep(Long.MAX_VALUE);{code}
between setting callableRunning to true and putting the thread to sleep a 
concurrent modification can happen in rare cases.

> ReencryptionUpdater fails with ConcurrentModificationException during 
> processCheckpoints
> 
>
> Key: HDFS-13731
> URL: https://issues.apache.org/jira/browse/HDFS-13731
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, test
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13731-failure.log
>
>
> HDFS-12837 fixed some flakiness of Reencryption-related tests. But as 
> [~zvenczel]'s comment notes, there are a few timeouts still. 

[jira] [Updated] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints

2018-08-23 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13731:
-
Attachment: HDFS-13731-failure.log

> ReencryptionUpdater fails with ConcurrentModificationException during 
> processCheckpoints
> 
>
> Key: HDFS-13731
> URL: https://issues.apache.org/jira/browse/HDFS-13731
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, test
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13731-failure.log
>
>
> HDFS-12837 fixed some flakiness of Reencryption-related tests. But as 
> [~zvenczel]'s comment notes, there are a few timeouts still. We should 
> investigate that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints

2018-08-23 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13731:
-
Summary: ReencryptionUpdater fails with ConcurrentModificationException 
during processCheckpoints  (was: Investigate TestReencryption timeouts)

> ReencryptionUpdater fails with ConcurrentModificationException during 
> processCheckpoints
> 
>
> Key: HDFS-13731
> URL: https://issues.apache.org/jira/browse/HDFS-13731
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, test
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Zsolt Venczel
>Priority: Major
>
> HDFS-12837 fixed some flakiness of Reencryption-related tests. But as 
> [~zvenczel]'s comment notes, there are a few timeouts still. We should 
> investigate that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDFS-13843) RBF: When we add/update mount entry to multiple destinations, unable to see the order information in mount entry points and in federation router UI

2018-08-22 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-13843 started by Zsolt Venczel.

> RBF: When we add/update mount entry to multiple destinations, unable to see 
> the order information in mount entry points and in federation router UI
> ---
>
> Key: HDFS-13843
> URL: https://issues.apache.org/jira/browse/HDFS-13843
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Reporter: Soumyapn
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: RBF
>
> *Scenario:*
> Execute the below add/update commands for a single mount entry for a single 
> nameservice pointing to multiple destinations. 
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1,/tmp2,/tmp3
>  # hdfs dfsrouteradmin -update /apps1 hacluster /tmp1,/tmp2,/tmp3 -order 
> RANDOM
> *Actual:* With the above commands, the mount entry is successfully updated, 
> but order information such as HASH or RANDOM is displayed neither in the 
> mount entries nor in the federation router UI. Order information is updated 
> properly when there are multiple nameservices; the issue occurs with a 
> single nameservice having multiple destinations.
> *Expected:* 
> *Order information should be updated in the mount entries so that the user 
> knows which order has been set.*
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13843) RBF: When we add/update mount entry to multiple destinations, unable to see the order information in mount entry points and in federation router UI

2018-08-22 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel reassigned HDFS-13843:


Assignee: Zsolt Venczel

> RBF: When we add/update mount entry to multiple destinations, unable to see 
> the order information in mount entry points and in federation router UI
> ---
>
> Key: HDFS-13843
> URL: https://issues.apache.org/jira/browse/HDFS-13843
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Reporter: Soumyapn
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: RBF
>
> *Scenario:*
> Execute the below add/update commands for a single mount entry for a single 
> nameservice pointing to multiple destinations. 
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1
>  # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1,/tmp2,/tmp3
>  # hdfs dfsrouteradmin -update /apps1 hacluster /tmp1,/tmp2,/tmp3 -order 
> RANDOM
> *Actual:* With the above commands, the mount entry is successfully updated, 
> but order information such as HASH or RANDOM is displayed neither in the 
> mount entries nor in the federation router UI. Order information is updated 
> properly when there are multiple nameservices; the issue occurs with a 
> single nameservice having multiple destinations.
> *Expected:* 
> *Order information should be updated in the mount entries so that the user 
> knows which order has been set.*
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-08-16 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582928#comment-16582928
 ] 

Zsolt Venczel commented on HDFS-13744:
--

I could not reproduce the above test failure with or without the patch; 
therefore it should be unrelated:
{code}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 74.774 s 
- in 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
{code}

> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * Delimited processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * 
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * 
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-08-16 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13744:
-
Status: Patch Available  (was: In Progress)

> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 3.0.3, 2.7.6, 2.8.4, 2.9.1, 2.6.5
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * Delimited processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * 
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * 
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-08-16 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13744:
-
Description: 
In certain cases, when control characters or white space is present in file or 
directory names, OIV tool processors can export data in a misleading format.

In the below examples we have EXAMPLE_NAME as a file and a directory name where 
the directory has a line feed character at the end (the actual production case 
has multiple line feeds and multiple spaces)
 * Delimited processor case:
 ** misleading example:
{code:java}
/user/data/EXAMPLE_NAME
,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
/user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
{code}

 * 
 ** expected example as suggested by 
[https://tools.ietf.org/html/rfc4180#section-2]:
{code:java}
"/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
"/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
{code}

 * XML processor case:
 ** misleading example:
{code:java}
479867791DIRECTORYEXAMPLE_NAME
1493033668294user:group:0775

113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
{code}

 * 
 ** expected example as specified in 
[https://www.w3.org/TR/REC-xml/#sec-line-ends]:
{code:java}
479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775

113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
{code}

 * JSON:
 The OIV Web Processor behaves correctly and produces the following:
{code:java}
{
  "FileStatuses": {
"FileStatus": [
  {
"fileId": 113632535,
"accessTime": 1494954320141,
"replication": 3,
"owner": "user",
"length": 520,
"permission": "674",
"blockSize": 134217728,
"modificationTime": 1472205657504,
"type": "FILE",
"group": "group",
"childrenNum": 0,
"pathSuffix": "EXAMPLE_NAME"
  },
  {
"fileId": 479867791,
"accessTime": 0,
"replication": 0,
"owner": "user",
"length": 0,
"permission": "775",
"blockSize": 0,
"modificationTime": 1493033668294,
"type": "DIRECTORY",
"group": "group",
"childrenNum": 0,
"pathSuffix": "EXAMPLE_NAME\n"
  }
]
  }
}
{code}

  was:
In certain cases, when control characters or white space is present in file or 
directory names, OIV tool processors can export data in a misleading format.

In the below examples we have EXAMPLE_NAME as a file and a directory name where 
the directory has a line feed character at the end (the actual production case 
has multiple line feeds and multiple spaces)
 * CSV processor case:
 ** misleading example:
{code:java}
/user/data/EXAMPLE_NAME
,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
/user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
{code}

 ** expected example as suggested by 
[https://tools.ietf.org/html/rfc4180#section-2]:
{code:java}
"/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
"/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
{code}

 * XML processor case:
 ** misleading example:
{code:java}
479867791DIRECTORYEXAMPLE_NAME
1493033668294user:group:0775

113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
{code}

 ** expected example as specified in 
[https://www.w3.org/TR/REC-xml/#sec-line-ends]:
{code:java}
479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775

113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
{code}

 * JSON:
 The OIV Web Processor behaves correctly and produces the following:
{code:java}
{
  "FileStatuses": {
"FileStatus": [
  {
"fileId": 113632535,
"accessTime": 1494954320141,
"replication": 3,
"owner": "user",
"length": 520,
"permission": "674",
"blockSize": 134217728,
"modificationTime": 1472205657504,
"type": "FILE",
"group": "group",
"childrenNum": 0,
"pathSuffix": "EXAMPLE_NAME"
  },
  {
"fileId": 479867791,
"accessTime": 0,
"replication": 0,
"owner": "user",
"length": 0,
"permission": "775",
"blockSize": 0,
"modificationTime": 1493033668294,
"type": "DIRECTORY",
"group": "group",
"childrenNum": 0,
"pathSuffix": "EXAMPLE_NAME\n"
  }
]
  }
}
{code}


> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744

[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-08-16 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582703#comment-16582703
 ] 

Zsolt Venczel commented on HDFS-13744:
--

After doing some more analysis it turns out that very few CSV and XML clients 
are following the LF character encoding specifications.

This can have the following impact:

* For the XML processor:
Escaping the LF character following the specification can prevent an XML 
parser from correctly reproducing a file name. It can also modify filenames 
when using the ReverseXML processor. *I would not recommend escaping here.*

* For the Delimited processor:
The output of the Delimited processor is handy for report creation and 
grepping, where a wrongly displayed filename or directory name containing an 
LF can cause more problems than an escaped LF character; therefore *I would 
recommend escaping in this scenario*.

In my uploaded patch I added escaping for the Delimited processor only.
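
As a rough illustration of what such escaping can look like (a sketch with a 
hypothetical helper, not the code in the attached patch):
{code:java}
// Sketch only: render control characters in a path visibly, e.g. LF (0x0A)
// as %x0A, before the path is written into a Delimited processor line.
public class DelimitedEscapeSketch {
  static String escapePath(String path) {
    StringBuilder sb = new StringBuilder(path.length());
    for (int i = 0; i < path.length(); i++) {
      char c = path.charAt(i);
      if (c < 0x20) {
        sb.append(String.format("%%x%02X", (int) c));
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // prints /user/data/EXAMPLE_NAME%x0A
    System.out.println(escapePath("/user/data/EXAMPLE_NAME\n"));
  }
}
{code}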

> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * CSV processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-08-16 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13744:
-
Attachment: HDFS-13744.01.patch

> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, tools
>Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Critical
> Attachments: HDFS-13744.01.patch
>
>
> In certain cases when control characters or white space is present in file or 
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name 
> where the directory has a line feed character at the end (the actual 
> production case has multiple line feeds and multiple spaces)
>  * CSV processor case:
>  ** misleading example:
> {code:java}
> /user/data/EXAMPLE_NAME
> ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  ** expected example as suggested by 
> [https://tools.ietf.org/html/rfc4180#section-2]:
> {code:java}
> "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
> 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
> "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
> 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
> {code}
>  * XML processor case:
>  ** misleading example:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME
> 1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  ** expected example as specified in 
> [https://www.w3.org/TR/REC-xml/#sec-line-ends]:
> {code:java}
> 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775
> 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
> {code}
>  * JSON:
>  The OIV Web Processor behaves correctly and produces the following:
> {code:java}
> {
>   "FileStatuses": {
> "FileStatus": [
>   {
> "fileId": 113632535,
> "accessTime": 1494954320141,
> "replication": 3,
> "owner": "user",
> "length": 520,
> "permission": "674",
> "blockSize": 134217728,
> "modificationTime": 1472205657504,
> "type": "FILE",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME"
>   },
>   {
> "fileId": 479867791,
> "accessTime": 0,
> "replication": 0,
> "owner": "user",
> "length": 0,
> "permission": "775",
> "blockSize": 0,
> "modificationTime": 1493033668294,
> "type": "DIRECTORY",
> "group": "group",
> "childrenNum": 0,
> "pathSuffix": "EXAMPLE_NAME\n"
>   }
> ]
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13744) OIV tool should better handle control characters present in file or directory names

2018-08-16 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13744:
-
Description: 
In certain cases, when control characters or white space is present in file or 
directory names, OIV tool processors can export data in a misleading format.

In the below examples we have EXAMPLE_NAME as a file and a directory name where 
the directory has a line feed character at the end (the actual production case 
has multiple line feeds and multiple spaces)
 * CSV processor case:
 ** misleading example:
{code:java}
/user/data/EXAMPLE_NAME
,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
/user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
{code}

 ** expected example as suggested by 
[https://tools.ietf.org/html/rfc4180#section-2]:
{code:java}
"/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 
16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
"/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
{code}

 * XML processor case:
 ** misleading example:
{code:java}
479867791DIRECTORYEXAMPLE_NAME
1493033668294user:group:0775

113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
{code}

 ** expected example as specified in 
[https://www.w3.org/TR/REC-xml/#sec-line-ends]:
{code:java}
479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775

113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
{code}

 * JSON:
 The OIV Web Processor behaves correctly and produces the following:
{code:java}
{
  "FileStatuses": {
"FileStatus": [
  {
"fileId": 113632535,
"accessTime": 1494954320141,
"replication": 3,
"owner": "user",
"length": 520,
"permission": "674",
"blockSize": 134217728,
"modificationTime": 1472205657504,
"type": "FILE",
"group": "group",
"childrenNum": 0,
"pathSuffix": "EXAMPLE_NAME"
  },
  {
"fileId": 479867791,
"accessTime": 0,
"replication": 0,
"owner": "user",
"length": 0,
"permission": "775",
"blockSize": 0,
"modificationTime": 1493033668294,
"type": "DIRECTORY",
"group": "group",
"childrenNum": 0,
"pathSuffix": "EXAMPLE_NAME\n"
  }
]
  }
}
{code}

  was:
In certain cases, when control characters or white space is present in file or 
directory names, OIV tool processors can export data in a misleading format.

In the below examples we have EXAMPLE_NAME as a file and a directory name where 
the directory has a line feed character at the end (the actual production case 
has multiple line feeds and multiple spaces)
 * CSV processor case:
 ** misleading example:
{code:java}
/user/data/EXAMPLE_NAME
,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
/user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 
10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
{code}
 ** expected example as suggested by 
[https://tools.ietf.org/html/rfc4180#section-2]:
{code:java}
"/user/data/EXAMPLE_NAME%x0D",0,2017-04-24 04:34,1969-12-31 
16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group
"/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 
10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group
{code}

 * XML processor case:
 ** misleading example:
{code:java}
479867791DIRECTORYEXAMPLE_NAME
1493033668294user:group:0775

113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
{code}
 ** expected example as specified in 
[https://www.w3.org/TR/REC-xml/#sec-line-ends]:
{code:java}
479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775

113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674
{code}

 * JSON:
 The OIV Web Processor behaves correctly and produces the following:
{code:java}
{
  "FileStatuses": {
"FileStatus": [
  {
"fileId": 113632535,
"accessTime": 1494954320141,
"replication": 3,
"owner": "user",
"length": 520,
"permission": "674",
"blockSize": 134217728,
"modificationTime": 1472205657504,
"type": "FILE",
"group": "group",
"childrenNum": 0,
"pathSuffix": "EXAMPLE_NAME"
  },
  {
"fileId": 479867791,
"accessTime": 0,
"replication": 0,
"owner": "user",
"length": 0,
"permission": "775",
"blockSize": 0,
"modificationTime": 1493033668294,
"type": "DIRECTORY",
"group": "group",
"childrenNum": 0,
"pathSuffix": "EXAMPLE_NAME\n"
  }
]
  }
}
{code}


> OIV tool should better handle control characters present in file or directory 
> names
> ---
>
> Key: HDFS-13744
>

[jira] [Commented] (HDFS-13732) Erasure Coding policy name is not coming when the new policy is set

2018-08-15 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581061#comment-16581061
 ] 

Zsolt Venczel commented on HDFS-13732:
--

Failed tests pass locally with or without the patch; therefore they should be 
unrelated:
{code}
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hdfs.client.impl.TestBlockReaderLocal
[INFO] Tests run: 38, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.179 
s - in org.apache.hadoop.hdfs.client.impl.TestBlockReaderLocal
[INFO] Running org.apache.hadoop.hdfs.TestSafeModeWithStripedFile
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 25.843 s 
- in org.apache.hadoop.hdfs.TestSafeModeWithStripedFile
[INFO] Running org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 40.357 s 
- in org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus
[INFO] Running 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.797 s 
- in org.apache.hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
[INFO] Running org.apache.hadoop.hdfs.TestMaintenanceState
[INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 350.793 
s - in org.apache.hadoop.hdfs.TestMaintenanceState
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 75, Failures: 0, Errors: 0, Skipped: 0
{code}

> Erasure Coding policy name is not coming when the new policy is set
> ---
>
> Key: HDFS-13732
> URL: https://issues.apache.org/jira/browse/HDFS-13732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, tools
>Affects Versions: 3.0.0
>Reporter: Soumyapn
>Assignee: Zsolt Venczel
>Priority: Trivial
> Attachments: EC_Policy.PNG, HDFS-13732.01.patch
>
>
> Scenario:
> If a new policy apart from the default EC policy is set for the HDFS 
> directory, then the console message comes as "Set default erasure coding 
> policy on "
> Expected output:
> It would be good if the EC policy name is displayed when the policy is set...
>  
> Actual output:
> Set default erasure coding policy on 
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13732) Erasure Coding policy name is not coming when the new policy is set

2018-08-15 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13732:
-
Status: Patch Available  (was: In Progress)

> Erasure Coding policy name is not coming when the new policy is set
> ---
>
> Key: HDFS-13732
> URL: https://issues.apache.org/jira/browse/HDFS-13732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, tools
>Affects Versions: 3.0.0
>Reporter: Soumyapn
>Assignee: Zsolt Venczel
>Priority: Trivial
> Attachments: EC_Policy.PNG, HDFS-13732.01.patch
>
>
> Scenario:
> If a new policy apart from the default EC policy is set for the HDFS 
> directory, then the console message comes as "Set default erasure coding 
> policy on "
> Expected output:
> It would be good if the EC policy name is displayed when the policy is set...
>  
> Actual output:
> Set default erasure coding policy on 
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13732) Erasure Coding policy name is not coming when the new policy is set

2018-08-15 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13732:
-
Attachment: HDFS-13732.01.patch

> Erasure Coding policy name is not coming when the new policy is set
> ---
>
> Key: HDFS-13732
> URL: https://issues.apache.org/jira/browse/HDFS-13732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, tools
>Affects Versions: 3.0.0
>Reporter: Soumyapn
>Assignee: Zsolt Venczel
>Priority: Trivial
> Attachments: EC_Policy.PNG, HDFS-13732.01.patch
>
>
> Scenario:
> If a new policy apart from the default EC policy is set for the HDFS 
> directory, then the console message comes as "Set default erasure coding 
> policy on "
> Expected output:
> It would be good if the EC policy name is displayed when the policy is set...
>  
> Actual output:
> Set default erasure coding policy on 
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13732) Erasure Coding policy name is not coming when the new policy is set

2018-08-14 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579574#comment-16579574
 ] 

Zsolt Venczel commented on HDFS-13732:
--

Hi [~SoumyaPN],

Thanks for reporting the issue!

Can you please extend the description of this issue with:
1) exact command you are executing
2) exact actual output
3) exact expected output

Many thanks,
Zsolt

> Erasure Coding policy name is not coming when the new policy is set
> ---
>
> Key: HDFS-13732
> URL: https://issues.apache.org/jira/browse/HDFS-13732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, tools
>Affects Versions: 3.0.0
>Reporter: Soumyapn
>Assignee: Zsolt Venczel
>Priority: Trivial
> Attachments: EC_Policy.PNG
>
>
> Scenario:
> If a new policy apart from the default EC policy is set for the HDFS 
> directory, then the console message comes as "Set default erasure coding 
> policy on "
> Expected output:
> It would be good if the EC policy name is displayed when the policy is set...
>  
> Actual output:
> Set default erasure coding policy on 
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-08-13 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578717#comment-16578717
 ] 

Zsolt Venczel edited comment on HDFS-13697 at 8/13/18 6:00 PM:
---

Test failures seem to be unrelated; I could not reproduce them locally with or 
without my patch: 
{code:java}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hdfs.TestReadStripedFileWithMissingBlocks
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 132.276 
s - in org.apache.hadoop.hdfs.TestReadStripedFileWithMissingBlocks
[INFO] Running 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 54.574 s 
- in 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
[INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner
[INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 129.645 
s - in org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner
[INFO] Running org.apache.hadoop.tracing.TestTracing
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.591 s 
- in org.apache.hadoop.tracing.TestTracing
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 17, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
{code}


was (Author: zvenczel):
Test failures seem to be unrelated; I could not reproduce them locally with or 
without my commit: 
{code:java}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hdfs.TestReadStripedFileWithMissingBlocks
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 132.276 
s - in org.apache.hadoop.hdfs.TestReadStripedFileWithMissingBlocks
[INFO] Running 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 54.574 s 
- in 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
[INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner
[INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 129.645 
s - in org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner
[INFO] Running org.apache.hadoop.tracing.TestTracing
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.591 s 
- in org.apache.hadoop.tracing.TestTracing
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 17, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
{code}

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack 
> might not have a doAs privileged execution call (in the DFSClient for 
> example). This results in losing the proxy user from UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> 

[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-08-13 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578717#comment-16578717
 ] 

Zsolt Venczel commented on HDFS-13697:
--

Test failures seem to be unrelated; I could not reproduce them locally with or 
without my commit: 
{code:java}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hdfs.TestReadStripedFileWithMissingBlocks
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 132.276 
s - in org.apache.hadoop.hdfs.TestReadStripedFileWithMissingBlocks
[INFO] Running 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 54.574 s 
- in 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
[INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner
[INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 129.645 
s - in org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner
[INFO] Running org.apache.hadoop.tracing.TestTracing
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.591 s 
- in org.apache.hadoop.tracing.TestTracing
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 17, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
{code}

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack 
> might not have a doAs privileged execution call (in the DFSClient for 
> example). This results in losing the proxy user from UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> 

[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-08-13 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578191#comment-16578191
 ] 

Zsolt Venczel commented on HDFS-13697:
--

Hi [~xiaochen],

Thanks a lot for working on this and providing your solution in the prelim 
patch.

While investigating the proposal to cache the ugi and prevent morphing, I came 
across the same set of failing tests your approach touched. The most 
interesting one is HDFS-9295 (full spectrum test by [~templedf]) in 
org.apache.hadoop.hdfs.TestAclsEndToEnd. This test suite covers all possible, 
expected variations of morphing, and I found that the following tests are not 
compatible with the cached ugi approach: 
testGoodWithWhitelistWithoutBlacklist, testGoodWithKeyAcls, 
testGoodWithWhitelist, testGoodWithKeyAclsWithoutBlacklist

As these use cases have been around for a while, I'd expect them to be widely 
used and hard to avoid. What do you think?

I've uploaded a new patch (v08) where I factored out the keyProvider injection 
for the DFSClient to happen via Mockito only.

Also reverted the TestKMS as you suggested leaving the HADOOP-13749 changes in.
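
For readers following along, here is a small illustrative sketch of the UGI 
behavior at stake (an editorial example, not part of the patch; the user name 
mirrors the example in the description below):
{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

// Inside doAs(), getCurrentUser() resolves to the proxy user; without a
// surrounding doAs() call only the login user is found, which is how the
// proxy identity can get lost on paths such as decryptEncryptedKey.
public class ProxyUserSketch {
  public static void main(String[] args) throws Exception {
    UserGroupInformation login = UserGroupInformation.getLoginUser();
    UserGroupInformation proxy =
        UserGroupInformation.createProxyUser("example_user", login);

    proxy.doAs((PrivilegedExceptionAction<Void>) () -> {
      // Inside doAs: the proxy user ("example_user") is the current user.
      System.out.println(UserGroupInformation.getCurrentUser());
      return null;
    });

    // Outside doAs: getCurrentUser() sees only the login user.
    System.out.println(UserGroupInformation.getCurrentUser());
  }
}
{code}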

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack 
> might not have a doAs privileged execution call (in the DFSClient for 
> example). This results in losing the proxy user from UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> 

[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-08-13 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13697:
-
Attachment: HDFS-13697.08.patch

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, 
> HDFS-13697.prelim.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack 
> might not have a doAs privileged execution call (in the DFSClient for 
> example). This results in losing the proxy user from UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:205)
>  at 
> 

[jira] [Commented] (HDFS-13770) dfsadmin -report does not always decrease "missing blocks (with replication factor 1)" metrics when file is deleted

2018-08-13 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578169#comment-16578169
 ] 

Zsolt Venczel commented on HDFS-13770:
--

Thanks for the update [~knanasi], patch v003 looks good, +1 (non-binding) from 
me.

> dfsadmin -report does not always decrease "missing blocks (with replication 
> factor 1)" metrics when file is deleted
> ---
>
> Key: HDFS-13770
> URL: https://issues.apache.org/jira/browse/HDFS-13770
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.7
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13770-branch-2.001.patch, 
> HDFS-13770-branch-2.002.patch, HDFS-13770-branch-2.003.patch
>
>
> The "Missing blocks (with replication factor 1)" metric is not always 
> decreased when a file is deleted.
> If a file is deleted, the remove function of UnderReplicatedBlocks can be 
> called with the wrong priority (UnderReplicatedBlocks.LEVEL). When it is 
> called with the wrong priority, the corruptReplOneBlocks metric is not 
> decreased, although the block is removed from the priority queue that 
> contains it.
> The corresponding code:
> {code:java}
> /** remove a block from an under replication queue */
> synchronized boolean remove(BlockInfo block,
>  int oldReplicas,
>  int oldReadOnlyReplicas,
>  int decommissionedReplicas,
>  int oldExpectedReplicas) {
>  final int priLevel = getPriority(oldReplicas, oldReadOnlyReplicas,
>  decommissionedReplicas, oldExpectedReplicas);
>  boolean removedBlock = remove(block, priLevel);
>  if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS &&
>  oldExpectedReplicas == 1 &&
>  removedBlock) {
>  corruptReplOneBlocks--;
>  assert corruptReplOneBlocks >= 0 :
>  "Number of corrupt blocks with replication factor 1 " +
>  "should be non-negative";
>  }
>  return removedBlock;
> }
> /**
>  * Remove a block from the under replication queues.
>  *
>  * The priLevel parameter is a hint of which queue to query
>  * first: if negative or >= \{@link #LEVEL} this shortcutting
>  * is not attempted.
>  *
>  * If the block is not found in the nominated queue, an attempt is made to
>  * remove it from all queues.
>  *
>  * Warning: This is not a synchronized method.
>  * @param block block to remove
>  * @param priLevel expected priority level
>  * @return true if the block was found and removed from one of the priority 
> queues
>  */
> boolean remove(BlockInfo block, int priLevel) {
>  if(priLevel >= 0 && priLevel < LEVEL
>  && priorityQueues.get(priLevel).remove(block)) {
>  NameNode.blockStateChangeLog.debug(
>  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block {}" +
>  " from priority queue {}", block, priLevel);
>  return true;
>  } else {
>  // Try to remove the block from all queues if the block was
>  // not found in the queue for the given priority level.
>  for (int i = 0; i < LEVEL; i++) {
>  if (i != priLevel && priorityQueues.get(i).remove(block)) {
>  NameNode.blockStateChangeLog.debug(
>  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block" +
>  " {} from priority queue {}", block, i);
>  return true;
>  }
>  }
>  }
>  return false;
> }
> {code}
> It is already fixed on trunk by this jira: HDFS-10999, but that ticket 
> introduces new metrics, which I think shouldn't be backported to branch-2.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13788) Update EC documentation about rack fault tolerance

2018-08-13 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578168#comment-16578168
 ] 

Zsolt Venczel commented on HDFS-13788:
--

Thanks [~xiaochen] for reporting this issue and thanks [~knanasi] for working 
on the patch.

The updated documentation seems to be fine. +1 (non-binding) from me.
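
For reference, the corrected minimum follows from how 
BlockPlacementPolicyRackFaultTolerant spreads the dataUnits + parityUnits 
blocks of a stripe as evenly as possible: with r racks, the most-loaded rack 
holds ceil((d+p)/r) blocks, and a single rack failure is survivable only while 
that count stays <= p. A hypothetical helper (not part of HDFS) that computes 
the bound:

{code:java}
// Hypothetical helper, not part of HDFS: minimum rack count for single-rack
// fault tolerance of an EC policy with d data and p parity units.
static int minRacks(int dataUnits, int parityUnits) {
  int stripeWidth = dataUnits + parityUnits;
  // the most-loaded rack holds ceil(stripeWidth / racks) blocks; we need
  // that to be <= parityUnits, so racks >= ceil(stripeWidth / parityUnits)
  return (stripeWidth + parityUnits - 1) / parityUnits;
}
// minRacks(6, 3) == 3: with 3 racks each holds 3 of the 9 blocks, and
// losing one rack loses 3 blocks, exactly what 3 parity units can rebuild
{code}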

> Update EC documentation about rack fault tolerance
> --
>
> Key: HDFS-13788
> URL: https://issues.apache.org/jira/browse/HDFS-13788
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: documentation, erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13788.001.patch
>
>
> From 
> http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html:
> {quote}
> For rack fault-tolerance, it is also important to have at least as many racks 
> as the configured EC stripe width. For EC policy RS (6,3), this means 
> minimally 9 racks, and ideally 10 or 11 to handle planned and unplanned 
> outages. For clusters with fewer racks than the stripe width, HDFS cannot 
> maintain rack fault-tolerance, but will still attempt to spread a striped 
> file across multiple nodes to preserve node-level fault-tolerance.
> {quote}
> Theoretical minimum is 3 racks, and ideally 9 or more, so the document should 
> be updated.
> (I didn't check timestamps, but this is probably because 
> {{BlockPlacementPolicyRackFaultTolerant}} wasn't completely done when 
> HDFS-9088 introduced this doc. Later there are also examples in 
> {{TestErasureCodingMultipleRacks}} that test this explicitly.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13770) dfsadmin -report does not always decrease "missing blocks (with replication factor 1)" metrics when file is deleted

2018-08-10 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576460#comment-16576460
 ] 

Zsolt Venczel commented on HDFS-13770:
--

Hi [~knanasi],

Thanks for working on this and thank you for the latest patch. Your changes 
look fine to me.

I've checked: the test fails without the fix and passes with the fix applied.

I found a few checkstyle issues in UnderReplicatedBlocks (line 265) and 
TestUnderReplicatedBlocks (lines 164, 175 and 186).
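
For what it's worth, the regression can be demonstrated directly against 
UnderReplicatedBlocks along these lines (a rough sketch with signatures 
approximated from the snippet quoted below; the test in the patch may differ):

{code:java}
// assumed to live in the same package as UnderReplicatedBlocks so the
// package-private add/remove methods are visible
UnderReplicatedBlocks queues = new UnderReplicatedBlocks();
BlockInfo block = new BlockInfoContiguous(new Block(1), (short) 1);

// 0 live replicas with 1 expected => queued as a corrupt, repl-factor-1 block
queues.add(block, 0, 0, 0, 1);
assertEquals(1, queues.getCorruptReplOneBlockSize());

// file deletion ends up calling remove with the out-of-range hint LEVEL;
// before the fix the block is removed but the counter is not decremented
queues.remove(block, UnderReplicatedBlocks.LEVEL);
assertEquals(0, queues.getCorruptReplOneBlockSize()); // fails without the fix
{code}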

Best regards,
Zsolt

> dfsadmin -report does not always decrease "missing blocks (with replication 
> factor 1)" metrics when file is deleted
> ---
>
> Key: HDFS-13770
> URL: https://issues.apache.org/jira/browse/HDFS-13770
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.7.7
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13770-branch-2.001.patch, 
> HDFS-13770-branch-2.002.patch
>
>
> The missing blocks (with replication factor 1) metric is not always decreased 
> when a file is deleted.
> If a file is deleted, the remove function of UnderReplicatedBlocks can be 
> called with the wrong priority (UnderReplicatedBlocks.LEVEL). If it is called 
> with the wrong priority, the corruptReplOneBlocks metric is not decreased, 
> but the block is still removed from the priority queue which contains it.
> The corresponding code:
> {code:java}
> /** remove a block from an under-replication queue */
> synchronized boolean remove(BlockInfo block,
>  int oldReplicas,
>  int oldReadOnlyReplicas,
>  int decommissionedReplicas,
>  int oldExpectedReplicas) {
>  final int priLevel = getPriority(oldReplicas, oldReadOnlyReplicas,
>  decommissionedReplicas, oldExpectedReplicas);
>  boolean removedBlock = remove(block, priLevel);
>  if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS &&
>  oldExpectedReplicas == 1 &&
>  removedBlock) {
>  corruptReplOneBlocks--;
>  assert corruptReplOneBlocks >= 0 :
>  "Number of corrupt blocks with replication factor 1 " +
>  "should be non-negative";
>  }
>  return removedBlock;
> }
> /**
>  * Remove a block from the under replication queues.
>  *
>  * The priLevel parameter is a hint of which queue to query
>  * first: if negative or >= \{@link #LEVEL} this shortcutting
>  * is not attempted.
>  *
>  * If the block is not found in the nominated queue, an attempt is made to
>  * remove it from all queues.
>  *
>  * Warning: This is not a synchronized method.
>  * @param block block to remove
>  * @param priLevel expected priority level
>  * @return true if the block was found and removed from one of the priority 
> queues
>  */
> boolean remove(BlockInfo block, int priLevel) {
>  if(priLevel >= 0 && priLevel < LEVEL
>  && priorityQueues.get(priLevel).remove(block)) {
>  NameNode.blockStateChangeLog.debug(
>  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block {}" +
>  " from priority queue {}", block, priLevel);
>  return true;
>  } else {
>  // Try to remove the block from all queues if the block was
>  // not found in the queue for the given priority level.
>  for (int i = 0; i < LEVEL; i++) {
>  if (i != priLevel && priorityQueues.get(i).remove(block)) {
>  NameNode.blockStateChangeLog.debug(
>  "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block" +
>  " {} from priority queue {}", block, i);
>  return true;
>  }
>  }
>  }
>  return false;
> }
> {code}
> It is already fixed on trunk by this jira: HDFS-10999, but that ticket 
> introduces new metrics, which I think shouldn't be backported to branch-2.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-08-06 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13697:
-
Attachment: HDFS-13697.07.patch

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch, HDFS-13697.07.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey, the call stack 
> might not have a doAs privileged execution call (in the DFSClient, for 
> example). This results in losing the proxy user from the UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
>  at 
> 

[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-08-06 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570178#comment-16570178
 ] 

Zsolt Venczel commented on HDFS-13697:
--

Thanks a lot [~xiaochen] for the support on this task. It's a challenging one 
indeed :)

Please find my answers below:
{quote}Ideally we want to do the same as DFSClient, where a ugi of 
{{UGI#getCurrentUser}} is just cached at construction time, and used for later 
auths. I tried that but it caused test failures in TestKMS with the 
{{doWebHDFSProxyUserTest}} tests and {{testTGTRenewal}} - for the sake of 
compatibility I think we can do something like this to allow the tests to pass.
{code:java}
// in KMSCP ctor
ugi = UserGroupInformation.getCurrentUser().getRealUser() == null ?
 UserGroupInformation.getCurrentUser() : 
 UserGroupInformation.getCurrentUser().getRealUser();
{code}
[~daryn] [~xyao] [~jnp] what do you think?
{quote}
The tests are failing because, with the above approach, we are not supporting 
the scenario where the user component provides new entitlements for KMS 
interactions through a doAs call (e.g. it calls the 'createConnection' 
function implicitly, with a proxy user provided in a doAs context). If we do 
want to be compatible, caching the UGI at construction time is not enough.
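
To illustrate the scenario (a hedged sketch; the variable names and the 
getMetadata call are mine, not the test code): the proxy user only exists 
inside the doAs context, so it cannot be captured once at construction time.

{code:java}
// sketch: provider created earlier, outside any doAs context
final KeyProvider keyProvider = dfsClient.getKeyProvider();

UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
    "example_user", UserGroupInformation.getLoginUser());

proxyUgi.doAs(new PrivilegedExceptionAction<Void>() {
  @Override
  public Void run() throws Exception {
    // inside the doAs, UGI.getCurrentUser() is the proxied example_user; the
    // provider must derive its doAs parameter from this context on each call
    keyProvider.getMetadata("encrypted_key"); // illustrative call
    return null;
  }
});
{code}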
{quote}
We don't need cachedProxyUgi, and getDoAsUser can figure things out from the 
ugi cached if we do the above
{quote}
I was trying to introduce some clean code here by defining explicitly under 
what circumstances we can have a cachedProxyUgi, and by this I also moved one 
computation to the constructor instead of repeating it at the getDoAsUser 
level. Does this make sense?
{quote}
ugiToUse doesn't seem necessary
{quote}
I was trying to make the code more meaningful; also, to support the 
above-mentioned proxy scenario, we still need to check whether the current 
call (currentUgi) introduces any proxy ugi.
{quote}
Could you explain why the setLoginUser lines were removed in TestKMS? I'd like 
to make sure existing tests pass as-is, if possible.
{quote}
I've reverted HADOOP-13749 and these lines were introduced by it. I'm not sure 
if it makes sense to set the login user even after the revert. What do you 
think?
{quote}
the new com.google imports should be placed next to other existing imports of 
that module.
{quote}
Thanks for checking, I've fixed it in my latest patch.
{quote}
I would not call the KeyProvider variable testKeyProvider - it's used for all 
purposes. Just the VisibleForTesting annotation on setKeyProvider would be 
enough, which you already have.
{quote}
Yes, it makes sense; I've fixed it in my latest patch. In the long run I might 
refactor these test cases to use Mockito to reduce production code complexity.
{quote}
The new patch's KeyProviderSupplier#isKeyProviderCreated doesn't seem 
necessary. We can't prevent the caller calling getKeyProvider after calling 
close here from that check. (We probably can add a guard in DFSClient to 
prevent all API calls after close, but that's separate from this jira.)
{quote}
KeyProviderSupplier#isKeyProviderCreated is the only way to know for sure 
whether the KeyProvider got instantiated or not. If we called 
keyProviderCache.get() in the close method, we might end up unnecessarily 
creating a KeyProvider.
I agree that we should take care of any post-closure calls separately.
{quote}
Although callers seem to have check about nullity of the provider, if DFSClient 
failed to create a key provider, it's preferred to throw immediately.
{quote}
I was trying to reproduce the behavior already present in the 
KeyProviderCache, which returned null and emitted warn-level log messages. 
Should we change that?

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey, the call stack 
> might not have a doAs privileged execution call (in the DFSClient, for 
> example). This results in losing the proxy user from the UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> 

[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-07-28 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13697:
-
Attachment: HDFS-13697.06.patch

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, 
> HDFS-13697.06.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey, the call stack 
> might not have a doAs privileged execution call (in the DFSClient, for 
> example). This results in losing the proxy user from the UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
>  at 
> 

[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-07-28 Thread Zsolt Venczel (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560799#comment-16560799
 ] 

Zsolt Venczel commented on HDFS-13697:
--

Thank you [~xiaochen] for doing the review, much appreciated!
 Please find my answers below:
{quote} * Per our discussion, we should be just caching UGI#getCurrentUser() at 
ctor.
 * Because doAsUser depends on the UGI, I think it would also make sense to 
cache that String at ctor.{quote}
In order to support proxy users and the functionality introduced by 
HADOOP-10698, the current user and the doAsUser string cannot be cached. 
HADOOP-10698 does an in-flight calculation as well, at line 385, which I was 
trying to consolidate in the getDoAsUser function to reuse logic. This 
requirement is also double-checked by the TestKMS#testProxyUserKerb and 
TestKMS#testProxyUserSimple tests.

Please let me know if I misunderstood the intentions in the code.
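
For context, the per-call computation is roughly the following (trimmed from 
KMSClientProvider; exact formatting may differ):

{code:java}
private String getDoAsUser() throws IOException {
  UserGroupInformation currentUgi = UserGroupInformation.getCurrentUser();
  // only pass a doAs parameter when the current context is a proxy-user one
  return (currentUgi.getAuthenticationMethod() ==
      UserGroupInformation.AuthenticationMethod.PROXY)
      ? currentUgi.getShortUserName() : null;
}
{code}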
{quote} * Do we really need the supplier? It seems for each client the 
keyprovider will only be created once. If so I'd suggest we avoid caching the 
Supplier here.
 * 
{code:java}
public KeyProvider getKeyProvider() {
  return provider==null ? keyProviderSupplier.get() : provider;
}
{code}
need to handle the race condition here that multiple threads calling this 
method may end up creating more than 1 provider.
{quote}
Suppliers#memoize caches the output of the supplier (the KeyProvider instance 
in this case), not the supplier itself. It also does this in a thread-safe 
way, so no more than one provider is created.
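
That is, something like the following (simplified; createKeyProvider stands in 
for the actual factory logic):

{code:java}
import com.google.common.base.Supplier;
import com.google.common.base.Suppliers;

// Simplified sketch: the first get() creates the KeyProvider and every later
// call returns the same cached instance; Guava handles the thread safety.
private final Supplier<KeyProvider> keyProviderSupplier =
    Suppliers.memoize(new Supplier<KeyProvider>() {
      @Override
      public KeyProvider get() {
        return createKeyProvider(conf); // assumed factory method
      }
    });
{code}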
{quote} * trivial, SafeModeAction changes are unrelated{quote}
Thanks for checking; I've removed it in the latest patch.
{quote}
 can do a VisibleForTesting setKeyProvider method, so TestEncryptionZones and 
TestReservedRawPaths don't have to be modified.
{quote}
Thanks for the hint, it was a remnant of a revert commit. I updated the patch 
as you suggested.

 

In the latest patch I added a check to close the keyProvider only if it was 
created, and made the test key provider more explicit by renaming it to 
"testKeyProvider".

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey, the call stack 
> might not have a doAs privileged execution call (in the DFSClient, for 
> example). This results in losing the proxy user from the UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> 

[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling

2018-07-27 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13697:
-
Summary: DFSClient should instantiate and cache KMSClientProvider using UGI 
at creation time for consistent UGI handling  (was: DFSClient should 
instantiate and cache KMSClientProvider at creation time for consistent UGI 
handling)

> DFSClient should instantiate and cache KMSClientProvider using UGI at 
> creation time for consistent UGI handling
> ---
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey, the call stack 
> might not have a doAs privileged execution call (in the DFSClient, for 
> example). This results in losing the proxy user from the UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:205)

[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider at creation time for consistent UGI handling

2018-07-27 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13697:
-
Attachment: HDFS-13697.05.patch

> DFSClient should instantiate and cache KMSClientProvider at creation time for 
> consistent UGI handling
> -
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey, the call stack 
> might not have a doAs privileged execution call (in the DFSClient, for 
> example). This results in losing the proxy user from the UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
>  at 
> org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1440)
>  

[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider at creation time for consistent UGI handling

2018-07-27 Thread Zsolt Venczel (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zsolt Venczel updated HDFS-13697:
-
Attachment: (was: HDFS-13697.05.patch)

> DFSClient should instantiate and cache KMSClientProvider at creation time for 
> consistent UGI handling
> -
>
> Key: HDFS-13697
> URL: https://issues.apache.org/jira/browse/HDFS-13697
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
> Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, 
> HDFS-13697.03.patch, HDFS-13697.04.patch
>
>
> While calling KeyProviderCryptoExtension decryptEncryptedKey, the call stack 
> might not have a doAs privileged execution call (in the DFSClient, for 
> example). This results in losing the proxy user from the UGI, as 
> UGI.getCurrentUser finds no AccessControllerContext and does a re-login for 
> the login user only.
> This can cause the following for example: if we have set up the oozie user to 
> be entitled to perform actions on behalf of example_user but oozie is 
> forbidden to decrypt any EDEK (for security reasons), due to the above issue, 
> example_user entitlements are lost from UGI and the following error is 
> reported:
> {code}
> [0] 
> SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] 
> JOB[0020905-180313191552532-oozie-oozi-W] 
> ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting 
> action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message 
> [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with 
> ACL name [encrypted_key]!!]
> org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not 
> authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!!
>  at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
>  at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199)
>  at 
> org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232)
>  at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
>  at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332)
>  at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>  at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User 
> [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name 
> [encrypted_key]!!
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>  at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>  at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:205)
>  at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
>  at 
> org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1440)
>  at 
> 
