[jira] [Work started] (HDDS-1132) Ozone serialization codec for Ozone S3 secret table
[ https://issues.apache.org/jira/browse/HDDS-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDDS-1132 started by Zsolt Venczel. --- > Ozone serialization codec for Ozone S3 secret table > --- > > Key: HDDS-1132 > URL: https://issues.apache.org/jira/browse/HDDS-1132 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Manager, S3 >Reporter: Elek, Marton >Assignee: Zsolt Venczel >Priority: Major > Labels: newbie > > HDDS-748/HDDS-864 introduced an option to use strongly typed metadata tables > and separated the serialization/deserialization logic into separate codec > implementations. > HDDS-937 introduced a new S3 secret table which is not codec based. > I propose to use codecs for this table. > In OzoneMetadataManager the return value of getS3SecretTable() should be > changed from the raw Table to a strongly typed Table. > The encoding/decoding logic of S3SecretValue should be registered in > ~OzoneMetadataManagerImpl:L204 > As the codecs are type based we may need a wrapper class to encode the String > kerberos id with md5: class S3SecretKey(String name = kerberosId). Long term > we can modify the S3SecretKey to support multiple keys for the same kerberos > id. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
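The codec idea described above can be sketched in a few lines. This is an illustrative stand-in only: the real Ozone Codec interface from HDDS-748/HDDS-864 differs in naming and registration details, and the S3SecretValue fields and the newline-based wire format below are assumptions, not the actual Ozone types.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch of a codec for the S3 secret table; the real
// Ozone Codec API (HDDS-748/HDDS-864) has different names and signatures.
public class S3SecretCodecSketch {

    // Hypothetical value type holding the kerberos id and the AWS secret.
    static final class S3SecretValue {
        final String kerberosId;
        final String awsSecret;
        S3SecretValue(String kerberosId, String awsSecret) {
            this.kerberosId = kerberosId;
            this.awsSecret = awsSecret;
        }
    }

    // Encode the value as "kerberosId\nawsSecret" in UTF-8.
    // (Assumes the kerberos id itself contains no newline.)
    static byte[] toPersistedFormat(S3SecretValue v) {
        return (v.kerberosId + "\n" + v.awsSecret).getBytes(StandardCharsets.UTF_8);
    }

    // Decode by splitting on the first newline.
    static S3SecretValue fromPersistedFormat(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        int i = s.indexOf('\n');
        return new S3SecretValue(s.substring(0, i), s.substring(i + 1));
    }

    public static void main(String[] args) {
        S3SecretValue v = new S3SecretValue("alice@EXAMPLE.COM", "secret123");
        S3SecretValue round = fromPersistedFormat(toPersistedFormat(v));
        // Round trip preserves both fields.
        System.out.println(round.kerberosId.equals(v.kerberosId)
            && round.awsSecret.equals(v.awsSecret));
    }
}
```

The MD5-based S3SecretKey wrapper mentioned in the description is left out here, since a hash is one-way and would not round-trip through a codec like the value does.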
[jira] [Assigned] (HDDS-1132) Ozone serialization codec for Ozone S3 secret table
[ https://issues.apache.org/jira/browse/HDDS-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel reassigned HDDS-1132: --- Assignee: Zsolt Venczel
[jira] [Commented] (HDFS-14121) Log message about the old hosts file format is misleading
[ https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721319#comment-16721319 ] Zsolt Venczel commented on HDFS-14121: -- Good point [~templedf]. Will the legacy format be deprecated? To steer users away from potentially unsupported formats, a warning is probably more useful; otherwise I agree that info level is fine. > Log message about the old hosts file format is misleading > - > > Key: HDFS-14121 > URL: https://issues.apache.org/jira/browse/HDFS-14121 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Daniel Templeton >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-14121.01.patch, HDFS-14121.02.patch > > > In {{CombinedHostsFileReader.readFile()}} we have the following: > {code} LOG.warn("{} has invalid JSON format." + > "Try the old format without top-level token defined.", > hostsFile);{code} > That message is trying to say that we tried parsing the hosts file as a > well-formed JSON file and failed, so we're going to try again assuming that > it's in the old badly-formed format. What it actually says is that the hosts > file is bad, and the admin should try switching to the old format. Those are > two very different things. > While we're in there, we should refactor the logging so that instead of > reporting that we're going to try using a different parser (who the heck > cares?), we report that we had to use the old parser to successfully > parse the hosts file.
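The refactoring the description asks for amounts to a control-flow change: attempt the well-formed parse silently, fall back to the legacy parser, and only log after the legacy parser has actually succeeded. The sketch below shows that flow; the two "parsers" are trivial stand-ins, not the real Jackson-based parsing in CombinedHostsFileReader.

```java
import java.util.Optional;

// Sketch of the proposed logging flow; parseWellFormed/parseLegacy are
// stand-ins for the real JSON parsing in CombinedHostsFileReader.readFile().
public class HostsFileParseSketch {

    // Pretend well-formed parse: succeeds only for input wrapped in [ ... ].
    static Optional<String> parseWellFormed(String content) {
        return content.startsWith("[") && content.endsWith("]")
            ? Optional.of(content) : Optional.empty();
    }

    // Pretend legacy parse: accepts a bare top-level JSON object sequence.
    static Optional<String> parseLegacy(String content) {
        return content.startsWith("{")
            ? Optional.of("[" + content + "]") : Optional.empty();
    }

    // Parse, falling back to the legacy format; log only what actually
    // happened, after the legacy parse succeeded, not what we are about to try.
    static String readFile(String hostsFile, String content) {
        Optional<String> parsed = parseWellFormed(content);
        if (parsed.isPresent()) {
            return parsed.get();
        }
        String legacy = parseLegacy(content)
            .orElseThrow(() -> new IllegalArgumentException(hostsFile + " is unreadable"));
        System.out.println("WARN: " + hostsFile
            + " is in the legacy format and was parsed with the old parser;"
            + " consider converting it to well-formed JSON.");
        return legacy;
    }

    public static void main(String[] args) {
        System.out.println(readFile("dfs.hosts", "{\"hostName\":\"h1\"}"));
    }
}
```

Whether that message should be warn or info is exactly the open question in the comment above; the point of the sketch is only that the message reports a completed fact, so it can no longer be read as advice to switch formats.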
[jira] [Commented] (HDFS-14121) Log message about the old hosts file format is misleading
[ https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720240#comment-16720240 ] Zsolt Venczel commented on HDFS-14121: -- Thanks [~knanasi] and [~templedf] for the valuable feedback. I've tried to address your concerns in the latest patch!
[jira] [Updated] (HDFS-14121) Log message about the old hosts file format is misleading
[ https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-14121: - Attachment: HDFS-14121.02.patch
[jira] [Commented] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock
[ https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720142#comment-16720142 ] Zsolt Venczel commented on HDFS-14101: -- Thanks a lot [~mackrorysd] for the helpful note and the commit! Should we increase the corruption length, or avoid random sizes altogether in unit tests? > Random failure of testListCorruptFilesCorruptedBlock > > > Key: HDFS-14101 > URL: https://issues.apache.org/jira/browse/HDFS-14101 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.2.0, 3.0.3, 2.8.5 >Reporter: Kihwal Lee >Assignee: Zsolt Venczel >Priority: Major > Labels: newbie > Fix For: 3.3.0 > > Attachments: HDFS-14101.01.patch, HDFS-14101.02.patch > > > We've seen this occasionally. > {noformat} > java.lang.IllegalArgumentException: Negative position > at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755) > at org.apache.hadoop.hdfs.server.namenode. > > TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105) > {noformat} > The test has a flaw.
[jira] [Commented] (HDFS-13843) RBF: When we add/update mount entry to multiple destinations, unable to see the order information in mount entry points and in federation router UI
[ https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714593#comment-16714593 ] Zsolt Venczel commented on HDFS-13843: -- Thanks for your feedback [~SoumyaPN]! {quote}Sure. It doesn't. It was related to showing the info in the UI when two destinations are mounted to one NS: NS1->tmp1 and NS1->tmp2; then the UI was not showing the order.{quote} I might have misunderstood, but based on the description _"But order information like HASH, RANDOM is not displayed in mount entries and also not displayed in federation router UI."_ the order information was missing from the mount entries *and* from the UI. I added the missing order information and fixed the UI, as it never displayed order information (not just in the case of multiple destinations). Based on your reply I assume only the UI fix is needed. Am I right? {quote}There is already one JIRA addressing it.{quote} Can you point me to that jira? Should this jira be closed as a duplicate then? You are right about compatibility problems. Would adding a new command that requires the order list be more beneficial? Best regards, Zsolt > RBF: When we add/update mount entry to multiple destinations, unable to see > the order information in mount entry points and in federation router UI > --- > > Key: HDFS-13843 > URL: https://issues.apache.org/jira/browse/HDFS-13843 > Project: Hadoop HDFS > Issue Type: Bug > Components: federation >Reporter: Soumyapn >Assignee: Zsolt Venczel >Priority: Major > Labels: RBF > Attachments: HDFS-13843.01.patch, HDFS-13843.02.patch > > > *Scenario:* > Execute the below add/update commands for a single mount entry for a single > nameservice pointing to multiple destinations. > # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1 > # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1,/tmp2,/tmp3 > # hdfs dfsrouteradmin -update /apps1 hacluster /tmp1,/tmp2,/tmp3 -order RANDOM > *Actual:* With the above commands, the mount entry is successfully updated. > But order information like HASH, RANDOM is not displayed in mount entries and > also not displayed in the federation router UI. However, order information is > updated properly when there are multiple nameservices. This issue is with a > single nameservice having multiple destinations. > *Expected:* > Order information should be updated in mount entries so that the user will > know which order has been set.
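A minimal sketch of what "showing the order when listing mount points" could look like; the DestinationOrder values match the ones named in the report (HASH, RANDOM), but the format method and field layout are illustrative assumptions, not the real MountTable API.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of including the order policy when printing a mount entry;
// method and field names are illustrative, not the real RBF MountTable code.
public class MountEntryPrintSketch {

    enum DestinationOrder { HASH, LOCAL, RANDOM, HASH_ALL }

    static String format(String src, String nameservice,
                         List<String> dests, DestinationOrder order) {
        // Only a multi-destination entry has a meaningful order to show.
        String orderPart = dests.size() > 1 ? " order:" + order : "";
        return src + " -> " + nameservice + dests + orderPart;
    }

    public static void main(String[] args) {
        // Mirrors the scenario above: one nameservice, three destinations, RANDOM order.
        System.out.println(format("/apps1", "hacluster",
            Arrays.asList("/tmp1", "/tmp2", "/tmp3"), DestinationOrder.RANDOM));
    }
}
```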
[jira] [Updated] (HDFS-13843) RBF: When we add/update mount entry to multiple destinations, unable to see the order information in mount entry points and in federation router UI
[ https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13843: - Summary: RBF: When we add/update mount entry to multiple destinations, unable to see the order information in mount entry points and in federation router UI (was: RBF: show the order when listing mount points)
[jira] [Updated] (HDFS-13843) RBF: show the order when listing mount points
[ https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13843: - Attachment: HDFS-13843.02.patch
[jira] [Commented] (HDFS-13843) RBF: show the order when listing mount points
[ https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714526#comment-16714526 ] Zsolt Venczel commented on HDFS-13843: -- Thank you so much for the quick review [~elgoiri]! I addressed your concerns in the latest patch!
[jira] [Updated] (HDFS-13843) RBF: show the order when listing mount points
[ https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13843: - Summary: RBF: show the order when listing mount points (was: RBF: When we add/update mount entry to multiple destinations, unable to see the order information in mount entry points and in federation router UI)
[jira] [Updated] (HDFS-13843) RBF: When we add/update mount entry to multiple destinations, unable to see the order information in mount entry points and in federation router UI
[ https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13843: - Attachment: HDFS-13843.01.patch Status: Patch Available (was: In Progress)
[jira] [Updated] (HDFS-14121) Log message about the old hosts file format is misleading
[ https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-14121: - Attachment: HDFS-14121.01.patch Status: Patch Available (was: Open)
[jira] [Assigned] (HDFS-14121) Log message about the old hosts file format is misleading
[ https://issues.apache.org/jira/browse/HDFS-14121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel reassigned HDFS-14121: Assignee: Zsolt Venczel
[jira] [Commented] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock
[ https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703048#comment-16703048 ] Zsolt Venczel commented on HDFS-14101: -- Thanks for the review [~ayushtkn] and for the valuable comment. In the latest patch I tried to make the relation between the minimum file size and the size of the data used to corrupt the block clearer by introducing shared constants. I hope this is more meaningful than a comment would be.
[jira] [Updated] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock
[ https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-14101: - Attachment: HDFS-14101.02.patch
[jira] [Assigned] (HDFS-12116) BlockReportTestBase#blockReport_08 and #blockReport_09 intermittently fail
[ https://issues.apache.org/jira/browse/HDFS-12116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel reassigned HDFS-12116: Assignee: Zsolt Venczel > BlockReportTestBase#blockReport_08 and #blockReport_09 intermittently fail > -- > > Key: HDFS-12116 > URL: https://issues.apache.org/jira/browse/HDFS-12116 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 0.22.0 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-12116.01.patch, HDFS-12116.02.patch, > HDFS-12116.03.patch, > TEST-org.apache.hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage.xml > > > This seems to be long-standing, but the failure rate (~10%) is slightly > higher in dist-test runs using CDH. > In both the _08 and _09 tests: > # an attempt is made to create a replica in the {{TEMPORARY}} > state, via {{waitForTempReplica}}. > # Once that's returned, the test goes on to verify that block reports show the > correct pending replication blocks. > But there's a race condition. If the replica is replicated between steps #1 > and #2, {{getPendingReplicationBlocks}} could return 0 or 1, depending on how > many replicas are replicated, hence failing the test. > Failures are seen on both {{TestNNHandlesBlockReportPerStorage}} and > {{TestNNHandlesCombinedBlockReport}}
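Races like the one described in steps #1 and #2 are usually hardened by polling the assertion until it holds within a timeout rather than checking it once. The helper below mirrors the spirit of Hadoop's GenericTestUtils.waitFor but is a self-contained stand-in, not the actual test-utility code or the fix applied to this issue.

```java
import java.util.function.BooleanSupplier;

// Sketch of polling a timing-dependent condition instead of asserting it
// once; a stand-in in the spirit of Hadoop's GenericTestUtils.waitFor.
public class WaitForSketch {

    // Poll check every intervalMs until it returns true or timeoutMs elapses.
    static boolean waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (check.getAsBoolean()) {
                return true;  // condition observed before the deadline
            }
            Thread.sleep(intervalMs);
        }
        return check.getAsBoolean();  // one last look after the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // A condition that becomes true after ~50 ms, standing in for a
        // replica count that settles only after replication finishes.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start > 50, 10, 1000);
        System.out.println(ok);
    }
}
```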
[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate
[ https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700631#comment-16700631 ] Zsolt Venczel commented on HDFS-13998: -- Thanks for the clarification [~templedf]. [~brahmareddy] and [~vinayrpet]: in light of what [~xiaochen] replied here: [comment-16685723|https://issues.apache.org/jira/browse/HDFS-13998?focusedCommentId=16685723=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16685723] and what [~templedf] summarized here: [comment-16688719|https://issues.apache.org/jira/browse/HDFS-13998?focusedCommentId=16688719=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16688719], what do you think: how should this issue progress? Thanks and best regards, Zsolt > ECAdmin NPE with -setPolicy -replicate > -- > > Key: HDFS-13998 > URL: https://issues.apache.org/jira/browse/HDFS-13998 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.2.0, 3.1.2 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch, > HDFS-13998.03.patch > > > HDFS-13732 tried to improve the output of the console tool. But we missed the > fact that for replication, {{getErasureCodingPolicy}} would return null. > This jira is to fix it in ECAdmin, and add a unit test. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
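The NPE in this issue comes from dereferencing the null that {{getErasureCodingPolicy}} returns for a replicated path. The guard can be sketched as below; the method name and output strings are illustrative, not the actual ECAdmin code or its exact console output.

```java
// Sketch of the null-guard: for a replicated path getErasureCodingPolicy()
// returns null, so the tool must not dereference the policy blindly.
public class EcPolicyPrintSketch {

    // Hypothetical helper: turn a possibly-null policy name into display text.
    static String describePolicy(String policyNameOrNull) {
        // null means "no EC policy object": the path uses plain replication.
        return policyNameOrNull == null
            ? "is unspecified"          // replicated; nothing to dereference
            : "is " + policyNameOrNull;
    }

    public static void main(String[] args) {
        System.out.println("EC policy " + describePolicy(null));
        System.out.println("EC policy " + describePolicy("RS-6-3-1024k"));
    }
}
```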
[jira] [Commented] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock
[ https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700374#comment-16700374 ] Zsolt Venczel commented on HDFS-14101: -- Thanks [~kihwal] for reporting the issue! DFSTestUtil.Builder creates files with a random size no larger than 512 bytes and no smaller than 1 byte. {code} DFSTestUtil util = new DFSTestUtil.Builder(). setName("testCorruptFilesCorruptedBlock").setNumFiles(2). setMaxLevels(1).setMaxSize(512).build(); {code} Whenever the file size is 1 byte the test fails, as it tries to corrupt a block by writing a 2-byte buffer starting 2 bytes before the end of the file, i.e. at position -1. The submitted patch should fix this (statistically the test failed about once per 512 runs, but ran fine for 2000 runs with the patch).
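The failure mode described above reduces to a few lines of arithmetic: FileChannel.write rejects a negative position, and the corruption offset goes negative for a 1-byte file. The constants below mirror the shared-constant idea mentioned for patch 02 but are illustrative, not the actual test code.

```java
// Sketch of the arithmetic behind the flake: corrupting CORRUPT_LEN bytes
// starting CORRUPT_LEN bytes before EOF requires fileSize >= CORRUPT_LEN.
public class CorruptPositionSketch {

    static final int CORRUPT_LEN = 2;              // bytes written over the block
    static final int MIN_FILE_SIZE = CORRUPT_LEN;  // smallest safe generated file

    // Position at which the corrupting write starts.
    static long corruptPosition(long fileSize) {
        return fileSize - CORRUPT_LEN;  // negative when fileSize == 1
    }

    public static void main(String[] args) {
        System.out.println(corruptPosition(1));    // -1: FileChannel.write rejects this
        System.out.println(corruptPosition(512));  // 510: fine
        System.out.println(corruptPosition(MIN_FILE_SIZE)); // 0: smallest safe size
    }
}
```

Tying the builder's minimum file size to the corruption length through a shared constant makes the test's precondition explicit instead of relying on the random size happening to exceed 1 byte.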
[jira] [Updated] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock
[ https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-14101: - Attachment: HDFS-14101.01.patch > Random failure of testListCorruptFilesCorruptedBlock > > > Key: HDFS-14101 > URL: https://issues.apache.org/jira/browse/HDFS-14101 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.2.0, 3.0.3, 2.8.5 >Reporter: Kihwal Lee >Assignee: Zsolt Venczel >Priority: Major > Labels: newbie > Attachments: HDFS-14101.01.patch > > > We've seen this occasionally. > {noformat} > java.lang.IllegalArgumentException: Negative position > at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755) > at org.apache.hadoop.hdfs.server.namenode. > > TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105) > {noformat} > The test has a flaw. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock
[ https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-14101: - Status: Patch Available (was: In Progress) > Random failure of testListCorruptFilesCorruptedBlock > > > Key: HDFS-14101 > URL: https://issues.apache.org/jira/browse/HDFS-14101 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.3, 3.2.0, 2.8.5 >Reporter: Kihwal Lee >Assignee: Zsolt Venczel >Priority: Major > Labels: newbie > Attachments: HDFS-14101.01.patch > > > We've seen this occasionally. > {noformat} > java.lang.IllegalArgumentException: Negative position > at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755) > at org.apache.hadoop.hdfs.server.namenode. > > TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105) > {noformat} > The test has a flaw. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock
[ https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-14101 started by Zsolt Venczel. > Random failure of testListCorruptFilesCorruptedBlock > > > Key: HDFS-14101 > URL: https://issues.apache.org/jira/browse/HDFS-14101 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.5 >Reporter: Kihwal Lee >Assignee: Zsolt Venczel >Priority: Major > Labels: newbie > > We've seen this occasionally. > {noformat} > java.lang.IllegalArgumentException: Negative position > at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755) > at org.apache.hadoop.hdfs.server.namenode. > > TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105) > {noformat} > The test has a flaw. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock
[ https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-14101: - Target Version/s: 3.0.4, 3.3.0, 2.8.6, 3.2.1 (was: 2.8.6) > Random failure of testListCorruptFilesCorruptedBlock > > > Key: HDFS-14101 > URL: https://issues.apache.org/jira/browse/HDFS-14101 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.2.0, 3.0.3, 2.8.5 >Reporter: Kihwal Lee >Assignee: Zsolt Venczel >Priority: Major > Labels: newbie > > We've seen this occasionally. > {noformat} > java.lang.IllegalArgumentException: Negative position > at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755) > at org.apache.hadoop.hdfs.server.namenode. > > TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105) > {noformat} > The test has a flaw. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock
[ https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-14101: - Affects Version/s: (was: 3.2.1) (was: 3.3.0) 3.2.0 > Random failure of testListCorruptFilesCorruptedBlock > > > Key: HDFS-14101 > URL: https://issues.apache.org/jira/browse/HDFS-14101 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.2.0, 3.0.3, 2.8.5 >Reporter: Kihwal Lee >Assignee: Zsolt Venczel >Priority: Major > Labels: newbie > > We've seen this occasionally. > {noformat} > java.lang.IllegalArgumentException: Negative position > at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755) > at org.apache.hadoop.hdfs.server.namenode. > > TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105) > {noformat} > The test has a flaw. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock
[ https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-14101: - Affects Version/s: 3.2.1 3.3.0 3.0.3 > Random failure of testListCorruptFilesCorruptedBlock > > > Key: HDFS-14101 > URL: https://issues.apache.org/jira/browse/HDFS-14101 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.3, 2.8.5, 3.3.0, 3.2.1 >Reporter: Kihwal Lee >Assignee: Zsolt Venczel >Priority: Major > Labels: newbie > > We've seen this occasionally. > {noformat} > java.lang.IllegalArgumentException: Negative position > at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755) > at org.apache.hadoop.hdfs.server.namenode. > > TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105) > {noformat} > The test has a flaw. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-14101) Random failure of testListCorruptFilesCorruptedBlock
[ https://issues.apache.org/jira/browse/HDFS-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel reassigned HDFS-14101: Assignee: Zsolt Venczel > Random failure of testListCorruptFilesCorruptedBlock > > > Key: HDFS-14101 > URL: https://issues.apache.org/jira/browse/HDFS-14101 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.5 >Reporter: Kihwal Lee >Assignee: Zsolt Venczel >Priority: Major > Labels: newbie > > We've seen this occasionally. > {noformat} > java.lang.IllegalArgumentException: Negative position > at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:755) > at org.apache.hadoop.hdfs.server.namenode. > > TestListCorruptFileBlocks.testListCorruptFilesCorruptedBlock(TestListCorruptFileBlocks.java:105) > {noformat} > The test has a flaw. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-14100) TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml fails due to missing dfs.image.string-tables.expanded from hdfs-defaults.xml
[ https://issues.apache.org/jira/browse/HDFS-14100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel resolved HDFS-14100. -- Resolution: Invalid The failure happened due to a local git issue. > TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml fails due > to missing dfs.image.string-tables.expanded from hdfs-defaults.xml > > > Key: HDFS-14100 > URL: https://issues.apache.org/jira/browse/HDFS-14100 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Major > > After HDFS-13882 > TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml requires > hdfs-defaults.xml to have dfs.image.string-tables.expanded added and > populated with a default value. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14100) TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml fails due to missing dfs.image.string-tables.expanded from hdfs-defaults.xml
Zsolt Venczel created HDFS-14100: Summary: TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml fails due to missing dfs.image.string-tables.expanded from hdfs-defaults.xml Key: HDFS-14100 URL: https://issues.apache.org/jira/browse/HDFS-14100 Project: Hadoop HDFS Issue Type: Bug Reporter: Zsolt Venczel Assignee: Zsolt Venczel After HDFS-13882 TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml requires hdfs-defaults.xml to have dfs.image.string-tables.expanded added and populated with a default value. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14054) TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky
[ https://issues.apache.org/jira/browse/HDFS-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686550#comment-16686550 ] Zsolt Venczel edited comment on HDFS-14054 at 11/14/18 2:50 PM: The failure happened due to FSEditLog.endCurrentLogSegment not being mocked early enough, which caused the edit log finalization to fail. In very rare cases I've seen an NPE at line 573, which is handled as well. Also, in very rare cases the waitForMillis for line 575 was not enough. was (Author: zvenczel): The failure happened due to FSEditLog.endCurrentLogSegment not being mocked early enough that had caused the edit log finalization to fail. In very rare cases I've seen NPE in line 573. that is handled as well. Also in very rare cases the timeout for line 575. was not enough. > TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and > testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky > > > Key: HDFS-14054 > URL: https://issues.apache.org/jira/browse/HDFS-14054 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0, 3.0.3 >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Major > Labels: flaky-test > Attachments: HDFS-14054.01.patch > > > --- > T E S T S > --- > OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support > was removed in 8.0 > Running org.apache.hadoop.hdfs.TestLeaseRecovery2 > Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 68.971 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestLeaseRecovery2 > testHardLeaseRecoveryAfterNameNodeRestart2(org.apache.hadoop.hdfs.TestLeaseRecovery2) > Time elapsed: 4.375 sec <<< FAILURE!
> java.lang.AssertionError: lease holder should now be the NN > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2(TestLeaseRecovery2.java:437) > testHardLeaseRecoveryWithRenameAfterNameNodeRestart(org.apache.hadoop.hdfs.TestLeaseRecovery2) > Time elapsed: 4.339 sec <<< FAILURE! > java.lang.AssertionError: lease holder should now be the NN > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart(TestLeaseRecovery2.java:443) > Results : > Failed tests: > > TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2:437->hardLeaseRecoveryRestartHelper:520->checkLease:568 > lease holder should now be the NN > > TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart:443->hardLeaseRecoveryRestartHelper:520->checkLease:568 > lease holder should now be the NN > Tests run: 7, Failures: 2, Errors: 0, Skipped: 0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14054) TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky
[ https://issues.apache.org/jira/browse/HDFS-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686550#comment-16686550 ] Zsolt Venczel commented on HDFS-14054: -- The failure happened due to FSEditLog.endCurrentLogSegment not being mocked early enough, which caused the edit log finalization to fail. In very rare cases I've seen an NPE at line 573, which is handled as well. Also, in very rare cases the timeout for line 575 was not enough. > TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and > testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky > > > Key: HDFS-14054 > URL: https://issues.apache.org/jira/browse/HDFS-14054 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0, 3.0.3 >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Major > Labels: flaky-test > Attachments: HDFS-14054.01.patch > > > --- > T E S T S > --- > OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support > was removed in 8.0 > Running org.apache.hadoop.hdfs.TestLeaseRecovery2 > Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 68.971 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestLeaseRecovery2 > testHardLeaseRecoveryAfterNameNodeRestart2(org.apache.hadoop.hdfs.TestLeaseRecovery2) > Time elapsed: 4.375 sec <<< FAILURE! > java.lang.AssertionError: lease holder should now be the NN > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2(TestLeaseRecovery2.java:437) > testHardLeaseRecoveryWithRenameAfterNameNodeRestart(org.apache.hadoop.hdfs.TestLeaseRecovery2) > Time elapsed: 4.339 sec <<< FAILURE!
> java.lang.AssertionError: lease holder should now be the NN > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart(TestLeaseRecovery2.java:443) > Results : > Failed tests: > > TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2:437->hardLeaseRecoveryRestartHelper:520->checkLease:568 > lease holder should now be the NN > > TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart:443->hardLeaseRecoveryRestartHelper:520->checkLease:568 > lease holder should now be the NN > Tests run: 7, Failures: 2, Errors: 0, Skipped: 0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
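Part of the fix discussed above was giving the polling wait more time. A generic poll-until-true helper of the kind such tests rely on can be sketched as follows; `WaitSketch.waitFor` is a hypothetical name modeled on the waitForMillis usage mentioned in the comment, not Hadoop's actual test API:

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a poll-until-true helper, similar in spirit to
// the waitForMillis wait that the comment says was occasionally too short.
class WaitSketch {

    // Polls check every intervalMs until it returns true or timeoutMs
    // elapses; returns the final value of the condition.
    static boolean waitFor(BooleanSupplier check, long intervalMs, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (check.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
        }
        return check.getAsBoolean(); // one last check at (or after) the deadline
    }
}
```

For a flaky assertion of this kind, the usual remedy is to pass a larger timeout rather than weaken the condition being checked.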
[jira] [Updated] (HDFS-14054) TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky
[ https://issues.apache.org/jira/browse/HDFS-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-14054: - Attachment: HDFS-14054.01.patch > TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and > testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky > > > Key: HDFS-14054 > URL: https://issues.apache.org/jira/browse/HDFS-14054 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0, 3.0.3 >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Major > Labels: flaky-test > Attachments: HDFS-14054.01.patch > > > --- > T E S T S > --- > OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support > was removed in 8.0 > Running org.apache.hadoop.hdfs.TestLeaseRecovery2 > Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 68.971 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestLeaseRecovery2 > testHardLeaseRecoveryAfterNameNodeRestart2(org.apache.hadoop.hdfs.TestLeaseRecovery2) > Time elapsed: 4.375 sec <<< FAILURE! > java.lang.AssertionError: lease holder should now be the NN > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2(TestLeaseRecovery2.java:437) > testHardLeaseRecoveryWithRenameAfterNameNodeRestart(org.apache.hadoop.hdfs.TestLeaseRecovery2) > Time elapsed: 4.339 sec <<< FAILURE! 
> java.lang.AssertionError: lease holder should now be the NN > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart(TestLeaseRecovery2.java:443) > Results : > Failed tests: > > TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2:437->hardLeaseRecoveryRestartHelper:520->checkLease:568 > lease holder should now be the NN > > TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart:443->hardLeaseRecoveryRestartHelper:520->checkLease:568 > lease holder should now be the NN > Tests run: 7, Failures: 2, Errors: 0, Skipped: 0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14054) TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky
[ https://issues.apache.org/jira/browse/HDFS-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-14054: - Status: Patch Available (was: In Progress) > TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and > testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky > > > Key: HDFS-14054 > URL: https://issues.apache.org/jira/browse/HDFS-14054 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.3, 2.6.0 >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Major > Labels: flaky-test > Attachments: HDFS-14054.01.patch > > > --- > T E S T S > --- > OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support > was removed in 8.0 > Running org.apache.hadoop.hdfs.TestLeaseRecovery2 > Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 68.971 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestLeaseRecovery2 > testHardLeaseRecoveryAfterNameNodeRestart2(org.apache.hadoop.hdfs.TestLeaseRecovery2) > Time elapsed: 4.375 sec <<< FAILURE! > java.lang.AssertionError: lease holder should now be the NN > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2(TestLeaseRecovery2.java:437) > testHardLeaseRecoveryWithRenameAfterNameNodeRestart(org.apache.hadoop.hdfs.TestLeaseRecovery2) > Time elapsed: 4.339 sec <<< FAILURE! 
> java.lang.AssertionError: lease holder should now be the NN > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart(TestLeaseRecovery2.java:443) > Results : > Failed tests: > > TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2:437->hardLeaseRecoveryRestartHelper:520->checkLease:568 > lease holder should now be the NN > > TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart:443->hardLeaseRecoveryRestartHelper:520->checkLease:568 > lease holder should now be the NN > Tests run: 7, Failures: 2, Errors: 0, Skipped: 0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate
[ https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683575#comment-16683575 ] Zsolt Venczel commented on HDFS-13998: -- Thank you [~brahmareddy] for taking a look! Please find my comments below: {quote}IMHO, HDFS-13732 change might not require..? As admin will be aware of configured policy and these are admin commands. {quote} For supportability reasons, helping administrators (there could be many) by displaying the actual outcome of their actions can be valuable. We also support them with a warning message when the directory is not empty. I think this is valuable as well despite its cost (a listStatus command is executed, which adds an extra audit log entry and might also return up to 1000 FileStatus entries by default if the directory is large enough). {quote}Adding RPC can mislead For concurrent calls and any error while getting the policy after setting. {quote} In that scenario, not knowing the default might be even worse. {quote}and Extra overhead as Ayush Saxena mentioned. Audit log ( for debugging) and RPC call {quote} I think [~ayushtkn] and I have a common understanding here that the overhead would be worth it. [~ayushtkn], can you please comment? {quote}If we really required why can't we do through getserverdefaults()(by adding EC field there). {quote} I think any change to the default EC policy would not be reflected in the serverdefaults on the client without config re-distribution, which might also lead to confusion. > ECAdmin NPE with -setPolicy -replicate > -- > > Key: HDFS-13998 > URL: https://issues.apache.org/jira/browse/HDFS-13998 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.2.0, 3.1.2 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch, > HDFS-13998.03.patch > > > HDFS-13732 tried to improve the output of the console tool.
But we missed the > fact that for replication, {{getErasureCodingPolicy}} would return null. > This jira is to fix it in ECAdmin, and add a unit test. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDFS-14054) TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky
[ https://issues.apache.org/jira/browse/HDFS-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-14054 started by Zsolt Venczel. > TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and > testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky > > > Key: HDFS-14054 > URL: https://issues.apache.org/jira/browse/HDFS-14054 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0, 3.0.3 >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Major > Labels: flaky-test > > --- > T E S T S > --- > OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support > was removed in 8.0 > Running org.apache.hadoop.hdfs.TestLeaseRecovery2 > Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 68.971 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestLeaseRecovery2 > testHardLeaseRecoveryAfterNameNodeRestart2(org.apache.hadoop.hdfs.TestLeaseRecovery2) > Time elapsed: 4.375 sec <<< FAILURE! > java.lang.AssertionError: lease holder should now be the NN > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2(TestLeaseRecovery2.java:437) > testHardLeaseRecoveryWithRenameAfterNameNodeRestart(org.apache.hadoop.hdfs.TestLeaseRecovery2) > Time elapsed: 4.339 sec <<< FAILURE! 
> java.lang.AssertionError: lease holder should now be the NN > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520) > at > org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart(TestLeaseRecovery2.java:443) > Results : > Failed tests: > > TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2:437->hardLeaseRecoveryRestartHelper:520->checkLease:568 > lease holder should now be the NN > > TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart:443->hardLeaseRecoveryRestartHelper:520->checkLease:568 > lease holder should now be the NN > Tests run: 7, Failures: 2, Errors: 0, Skipped: 0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate
[ https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682428#comment-16682428 ] Zsolt Venczel commented on HDFS-13998: -- Thanks for the review [~templedf]! As far as I can see, the extraneous " " has been there for some time now; it was added due to the line-length checkstyle limitation. I reshuffled this code section a bit to make it look prettier. > ECAdmin NPE with -setPolicy -replicate > -- > > Key: HDFS-13998 > URL: https://issues.apache.org/jira/browse/HDFS-13998 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.2.0, 3.1.2 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch, > HDFS-13998.03.patch > > > HDFS-13732 tried to improve the output of the console tool. But we missed the > fact that for replication, {{getErasureCodingPolicy}} would return null. > This jira is to fix it in ECAdmin, and add a unit test. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate
[ https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13998: - Attachment: HDFS-13998.03.patch > ECAdmin NPE with -setPolicy -replicate > -- > > Key: HDFS-13998 > URL: https://issues.apache.org/jira/browse/HDFS-13998 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.2.0, 3.1.2 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch, > HDFS-13998.03.patch > > > HDFS-13732 tried to improve the output of the console tool. But we missed the > fact that for replication, {{getErasureCodingPolicy}} would return null. > This jira is to fix it in ECAdmin, and add a unit test. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate
[ https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13998: - Attachment: HDFS-13998.02.patch > ECAdmin NPE with -setPolicy -replicate > -- > > Key: HDFS-13998 > URL: https://issues.apache.org/jira/browse/HDFS-13998 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.2.0, 3.1.2 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13998.01.patch, HDFS-13998.02.patch > > > HDFS-13732 tried to improve the output of the console tool. But we missed the > fact that for replication, {{getErasureCodingPolicy}} would return null. > This jira is to fix it in ECAdmin, and add a unit test. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13985) Clearer error message for ReplicaNotFoundException
[ https://issues.apache.org/jira/browse/HDFS-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681145#comment-16681145 ] Zsolt Venczel commented on HDFS-13985: -- Thanks [~adam.antal] for the update. I think patch 002 is good to go. +1 (non-binding) > Clearer error message for ReplicaNotFoundException > -- > > Key: HDFS-13985 > URL: https://issues.apache.org/jira/browse/HDFS-13985 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: HDFS-13985.001.patch, HDFS-13985.002.patch > > > The issue is that we came across a ReplicaNotFoundException in a bug report, > and the most informative thing we could get was "Replica not found for > [ExtendedBlock]". Anyone investigating cases involving > ReplicaNotFoundExceptions must review diagnostic bundles and dig through logs; > as a starting point, enhancing the exception message would speed up this > process and be beneficial in the long run. > More concretely, it would be helpful if any of the following information were > displayed along with the exception: the file's name, replication factor or block > location. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
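The enhancement requested in the issue above amounts to attaching file-level context to the exception text. A minimal sketch of such an enriched message follows; the `message` helper and its parameters are hypothetical, and the real ReplicaNotFoundException constructor may take different arguments:

```java
// Hypothetical sketch of an enriched "Replica not found" message.
// Field names and layout are illustrative; the actual patch may differ.
class ReplicaErrorSketch {

    // Appends the file name and replication factor to the bare
    // "Replica not found for <block>" text the issue complains about.
    static String message(String extendedBlock, String fileName, int replication) {
        return "Replica not found for " + extendedBlock
            + " (file=" + fileName + ", replication=" + replication + ")";
    }
}
```

The point of the extra fields is that a support engineer reading only the exception line can locate the affected file without correlating block IDs across DataNode logs.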
[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate
[ https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679470#comment-16679470 ] Zsolt Venczel commented on HDFS-13998: -- [~ayushtkn] I completely agree, your proposed solution is more efficient! It does still leave some gap for the race conditions I was also intending to close, but I would agree to drop that concern. What do you think [~xiaochen]? Best regards, Zsolt > ECAdmin NPE with -setPolicy -replicate
[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate
[ https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678544#comment-16678544 ] Zsolt Venczel commented on HDFS-13998: -- [~ayushtkn] thanks for sharing your concerns about setPolicy adding additional audit log entries. My solution for HDFS-13732 added an additional getPolicy call to fetch the default policy, as there is no way for ECAdmin to know the NN's default settings precisely. If you think this is not the preferred solution, we could consider extending the setPolicy RPC call to return the policy actually set, but that is a more involved change and should be tracked separately. What do you think? Best regards, Zsolt > ECAdmin NPE with -setPolicy -replicate
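To make the NPE scenario concrete, here is a minimal sketch of the required null handling. This is illustrative only, not the actual ECAdmin code: with -setPolicy -replicate, getErasureCodingPolicy returns null to mean "replicated", so the console output has to null-check before formatting the result.

```java
// Illustrative sketch, not the actual ECAdmin implementation: a null
// erasure coding policy means the path is replicated, not an error,
// so the tool must handle null before calling toString()/getName().
public class EcPolicyOutputSketch {
    static String describe(Object ecPolicy) {
        // null => replication rather than an erasure coding policy
        return ecPolicy == null
            ? "is unspecified (replication)"
            : "is " + ecPolicy.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(null));           // no NPE for -replicate
        System.out.println(describe("RS-6-3-1024k")); // hypothetical policy name
    }
}
```

The policy name above is a placeholder; the point is only that the null branch must exist before any member access on the returned policy.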
[jira] [Updated] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate
[ https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13998: - Attachment: HDFS-13998.01.patch Status: Patch Available (was: In Progress) > ECAdmin NPE with -setPolicy -replicate
[jira] [Comment Edited] (HDFS-13985) Clearer error message for ReplicaNotFoundException
[ https://issues.apache.org/jira/browse/HDFS-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677893#comment-16677893 ] Zsolt Venczel edited comment on HDFS-13985 at 11/7/18 9:23 AM: --- Thanks for the patch [~adam.antal]! I think the message content in line 43 is fine and is more meaningful. Extending the message for the *public ReplicaNotFoundException(ExtendedBlock b)* constructor with it makes sense. I have some concerns with extending the message for the *public ReplicaNotFoundException(String msg)* constructor as it has various use-cases having various messages that can be distorted by this message extension (a few examples in FsDatasetImpl). What do you think? was (Author: zvenczel): Thanks for the patch [~adam.antal]! I think the message content in line 43 is fine and should be more meaningful. Extending the message for the *public ReplicaNotFoundException(ExtendedBlock b)* constructor with it makes sense. I have some concerns with extending the message for the *public ReplicaNotFoundException(String msg)* constructor as it has various use-cases having various messages that can be distorted by this message extension (a few examples in FsDatasetImpl). What do you think? > Clearer error message for ReplicaNotFoundException
[jira] [Commented] (HDFS-13985) Clearer error message for ReplicaNotFoundException
[ https://issues.apache.org/jira/browse/HDFS-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677893#comment-16677893 ] Zsolt Venczel commented on HDFS-13985: -- Thanks for the patch [~adam.antal]! I think the message content in line 43 is fine and should be more meaningful. Extending the message for the *public ReplicaNotFoundException(ExtendedBlock b)* constructor with it makes sense. I have some concerns with extending the message for the *public ReplicaNotFoundException(String msg)* constructor as it has various use-cases having various messages that can be distorted by this message extension (a few examples in FsDatasetImpl). What do you think? > Clearer error message for ReplicaNotFoundException
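The two-constructor concern above can be sketched in miniature (this is a hypothetical toy class, not the real HDFS ReplicaNotFoundException): only the block-based constructor gains the richer message, while the free-form String constructor keeps caller-supplied messages untouched, so the varied messages built in places like FsDatasetImpl are not distorted.

```java
// Hypothetical miniature of the proposal, not the real HDFS class.
public class ReplicaNotFoundSketch extends RuntimeException {
    public static final String PREFIX = "Replica not found for ";

    // free-form messages pass through exactly as the caller wrote them
    public ReplicaNotFoundSketch(String msg) {
        super(msg);
    }

    // only the block-based constructor builds the extended message
    // (the "hint" parameter stands in for file name / replication / location)
    public ReplicaNotFoundSketch(Object block, String hint) {
        super(PREFIX + block + " (" + hint + ")");
    }
}
```

With this split, extending the block-based path cannot corrupt any existing call site that passes a pre-built String.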
[jira] [Created] (HDFS-14054) TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky
Zsolt Venczel created HDFS-14054: Summary: TestLeaseRecovery2: testHardLeaseRecoveryAfterNameNodeRestart2 and testHardLeaseRecoveryWithRenameAfterNameNodeRestart are flaky Key: HDFS-14054 URL: https://issues.apache.org/jira/browse/HDFS-14054 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.3, 2.6.0 Reporter: Zsolt Venczel Assignee: Zsolt Venczel

---
 T E S T S
---
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.hdfs.TestLeaseRecovery2
Tests run: 7, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 68.971 sec <<< FAILURE! - in org.apache.hadoop.hdfs.TestLeaseRecovery2
testHardLeaseRecoveryAfterNameNodeRestart2(org.apache.hadoop.hdfs.TestLeaseRecovery2) Time elapsed: 4.375 sec <<< FAILURE!
java.lang.AssertionError: lease holder should now be the NN
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568)
	at org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520)
	at org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2(TestLeaseRecovery2.java:437)
testHardLeaseRecoveryWithRenameAfterNameNodeRestart(org.apache.hadoop.hdfs.TestLeaseRecovery2) Time elapsed: 4.339 sec <<< FAILURE!
java.lang.AssertionError: lease holder should now be the NN
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.apache.hadoop.hdfs.TestLeaseRecovery2.checkLease(TestLeaseRecovery2.java:568)
	at org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:520)
	at org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart(TestLeaseRecovery2.java:443)
Results :
Failed tests:
  TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart2:437->hardLeaseRecoveryRestartHelper:520->checkLease:568 lease holder should now be the NN
  TestLeaseRecovery2.testHardLeaseRecoveryWithRenameAfterNameNodeRestart:443->hardLeaseRecoveryRestartHelper:520->checkLease:568 lease holder should now be the NN
Tests run: 7, Failures: 2, Errors: 0, Skipped: 0
[jira] [Commented] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate
[ https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669157#comment-16669157 ] Zsolt Venczel commented on HDFS-13998: -- [~ayushtkn] I started progressing with it. > ECAdmin NPE with -setPolicy -replicate
[jira] [Work started] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate
[ https://issues.apache.org/jira/browse/HDFS-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-13998 started by Zsolt Venczel. > ECAdmin NPE with -setPolicy -replicate
[jira] [Commented] (HDFS-13860) Space character in the path is shown as "+" while creating dirs in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-13860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655009#comment-16655009 ] Zsolt Venczel commented on HDFS-13860: -- Thank you very much [~shashikant] for reporting the issue and providing a patch! I went through the code and the fix looks fine; it makes WebHDFS behavior more consistent with the non-WebHDFS use cases. One remaining inconsistency I found is that, with your patch, a path containing "+" will be transformed to a space, so "%2B" should be used instead. In my opinion this is a compromise we can live with, especially if it's documented. I'd +1 (non-binding) patch 01 with some documentation update. > Space character in the path is shown as "+" while creating dirs in WebHDFS > --- > > Key: HDFS-13860 > URL: https://issues.apache.org/jira/browse/HDFS-13860 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.0, 3.2.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-13860.00.patch, HDFS-13860.01.patch > > > $ ./hdfs dfs -mkdir webhdfs://127.0.0.1/tmp1/"file 1" > 2018-08-23 15:16:08,258 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > $ ./hdfs dfs -ls webhdfs://127.0.0.1/tmp1 > 2018-08-23 15:16:21,244 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > Found 1 items > drwxr-xr-x - sbanerjee hadoop 0 2018-08-23 15:16 > webhdfs://127.0.0.1/tmp1/file+1
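The encoding behavior behind this can be reproduced with the JDK alone, no Hadoop involved: form-style URL encoding maps a space to '+', so a literal '+' in a path must itself be escaped as %2B or it will decode back to a space.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Demonstrates the space/plus ambiguity behind this issue:
// URLEncoder uses application/x-www-form-urlencoded rules, which
// encode a space as '+', so a real '+' must be sent as %2B.
public class PlusEncodingDemo {
    public static void main(String[] args) {
        // "file 1" is encoded as "file+1", which is what the listing shows
        System.out.println(URLEncoder.encode("file 1", StandardCharsets.UTF_8));
        // decoding cannot tell an encoded space from a literal plus...
        System.out.println(URLDecoder.decode("file+1", StandardCharsets.UTF_8));
        // ...so a real '+' has to be escaped as %2B to round-trip
        System.out.println(URLDecoder.decode("file%2B1", StandardCharsets.UTF_8));
    }
}
```

This is exactly the compromise discussed in the comment: accepting '+' as the wire form of a space is consistent, provided users know to write %2B for a literal plus.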
[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646073#comment-16646073 ] Zsolt Venczel commented on HDFS-13697: -- Thanks a lot [~daryn] for your reply! Just a quick note: {quote}3. If all tests are passing, the patch is flawed. I recall the tests codified bugs.{quote} I took a look at the flawed tests and fixed them as far as I can tell. > DFSClient should instantiate and cache KMSClientProvider using UGI at > creation time for consistent UGI handling > --- > > Key: HDFS-13697 > URL: https://issues.apache.org/jira/browse/HDFS-13697 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, > HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, > HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, > HDFS-13697.09.patch, HDFS-13697.10.patch, HDFS-13697.11.patch, > HDFS-13697.12.patch, HDFS-13697.prelim.patch > > > While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack > might not have a doAs privileged execution call (in the DFSClient for example). > This results in losing the proxy user from UGI as UGI.getCurrentUser finds > no AccessControllerContext and does a re-login for the login user only. > This can cause the following for example: if we have set up the oozie user to > be entitled to perform actions on behalf of example_user but oozie is > forbidden to decrypt any EDEK (for security reasons), due to the above issue, > example_user entitlements are lost from UGI and the following error is > reported: > {code} > [0] > SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] > JOB[0020905-180313191552532-oozie-oozi-W] > ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting > action [polling_dir_path]. 
ErrorType [ERROR], ErrorCode [FS014], Message > [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with > ACL name [encrypted_key]!!] > org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not > authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!! > at > org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463) > at > org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441) > at > org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523) > at > org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199) > at > org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) > at org.apache.oozie.command.XCommand.call(XCommand.java:286) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User > [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name > [encrypted_key]!! 
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157) > at > org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607) > at > org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565) > at > org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832) > at > org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209) > at > org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205) > at >
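The idea of the fix can be sketched independently of Hadoop (the class and method names below are illustrative, not the real KMSClientProvider API): capture the caller's identity once when the client is created, inside the doAs context, instead of re-resolving the "current user" at call time, where the doAs context may already be gone and the lookup falls back to the login user.

```java
import java.util.function.Supplier;

// Illustrative sketch of "cache the UGI at creation time": the identity
// is resolved once in the constructor and reused for every later call,
// so losing the doAs context later cannot silently swap in the login user.
public class CachingKmsClientSketch {
    private final String actingUser; // identity captured at creation time

    public CachingKmsClientSketch(Supplier<String> currentUserResolver) {
        // resolve the caller identity exactly once, while still inside
        // the privileged (doAs) context
        this.actingUser = currentUserResolver.get();
    }

    public String decryptEncryptedKeyAs() {
        // later calls reuse the captured identity instead of re-resolving it
        return actingUser;
    }

    public static void main(String[] args) {
        CachingKmsClientSketch client =
            new CachingKmsClientSketch(() -> "example_user");
        System.out.println(client.decryptEncryptedKeyAs());
    }
}
```

In the bug scenario above, a call-time lookup would return "oozie" once the privileged block has exited; the cached client keeps acting as "example_user".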
[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610349#comment-16610349 ] Zsolt Venczel commented on HDFS-13697: -- The above test failures should be unrelated as they are passing with the patch applied: {code} [INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-hdfs --- [INFO] [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.hdfs.client.impl.TestBlockReaderLocal [INFO] Tests run: 38, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.834 s - in org.apache.hadoop.hdfs.client.impl.TestBlockReaderLocal [INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner [INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 115 s - in org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner [INFO] [INFO] Results: [INFO] [INFO] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0 [INFO] {code} > DFSClient should instantiate and cache KMSClientProvider using UGI at > creation time for consistent UGI handling
[jira] [Comment Edited] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609685#comment-16609685 ] Zsolt Venczel edited comment on HDFS-13697 at 9/10/18 8:48 PM: --- In my latest patch I fixed the TestEncryptionZonesWithKMS failure. With the latest patch (12), all of the above failed tests have passed: {code} [INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-hdfs --- [INFO] [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts [INFO] Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.35 s - in org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts [INFO] Running org.apache.hadoop.hdfs.TestRollingUpgrade [INFO] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 143.89 s - in org.apache.hadoop.hdfs.TestRollingUpgrade [INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure [INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 70.541 s - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure [INFO] Running org.apache.hadoop.hdfs.server.balancer.TestBalancer [INFO] Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 366.403 s - in org.apache.hadoop.hdfs.server.balancer.TestBalancer [INFO] Running org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS [INFO] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 129.862 s - in org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS [INFO] [INFO] Results: [INFO] [INFO] Tests run: 117, Failures: 0, Errors: 0, Skipped: 0 [INFO] {code} was (Author: zvenczel): In my latest patch I fixed the TestEncryptionZonesWithKMS failure. With the latest patch (11), all of the above failed tests have passed: {code} [INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-hdfs --- [INFO] [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts [INFO] Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.35 s - in org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts [INFO] Running org.apache.hadoop.hdfs.TestRollingUpgrade [INFO] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 143.89 s - in org.apache.hadoop.hdfs.TestRollingUpgrade [INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure [INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 70.541 s - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure [INFO] Running org.apache.hadoop.hdfs.server.balancer.TestBalancer [INFO] Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 366.403 s - in org.apache.hadoop.hdfs.server.balancer.TestBalancer [INFO] Running org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS [INFO] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 129.862 s - in org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS [INFO] [INFO] Results: [INFO] [INFO] Tests run: 117, Failures: 0, Errors: 0, Skipped: 0 [INFO] {code} > DFSClient should instantiate and cache KMSClientProvider using UGI at > creation time for consistent UGI handling
[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609685#comment-16609685 ] Zsolt Venczel commented on HDFS-13697: -- In my latest patch I fixed the TestEncryptionZonesWithKMS failure. With the latest patch (11), all of the above failed tests have passed: {code} [INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-hdfs --- [INFO] [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts [INFO] Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.35 s - in org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts [INFO] Running org.apache.hadoop.hdfs.TestRollingUpgrade [INFO] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 143.89 s - in org.apache.hadoop.hdfs.TestRollingUpgrade [INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure [INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 70.541 s - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure [INFO] Running org.apache.hadoop.hdfs.server.balancer.TestBalancer [INFO] Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 366.403 s - in org.apache.hadoop.hdfs.server.balancer.TestBalancer [INFO] Running org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS [INFO] Tests run: 45, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 129.862 s - in org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS [INFO] [INFO] Results: [INFO] [INFO] Tests run: 117, Failures: 0, Errors: 0, Skipped: 0 [INFO] {code} > DFSClient should instantiate and cache KMSClientProvider using UGI at > creation time for consistent UGI handling
[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13697: - Attachment: HDFS-13697.12.patch > DFSClient should instantiate and cache KMSClientProvider using UGI at > creation time for consistent UGI handling > --- > > Key: HDFS-13697 > URL: https://issues.apache.org/jira/browse/HDFS-13697 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, > HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, > HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, > HDFS-13697.09.patch, HDFS-13697.10.patch, HDFS-13697.11.patch, > HDFS-13697.12.patch, HDFS-13697.prelim.patch > > > While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack > might not have a doAs privileged execution call (in the DFSClient, for example). > This results in losing the proxy user from UGI, as UGI.getCurrentUser finds > no AccessControllerContext and does a re-login for the login user only. > This can cause the following, for example: if we have set up the oozie user to > be entitled to perform actions on behalf of example_user, but oozie is > forbidden to decrypt any EDEK (for security reasons), then due to the above issue > example_user entitlements are lost from UGI and the following error is > reported: > {code} > [0] > SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] > JOB[0020905-180313191552532-oozie-oozi-W] > ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting > action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message > [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with > ACL name [encrypted_key]!!] > org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not > authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!! 
> at > org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463) > at > org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441) > at > org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523) > at > org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199) > at > org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) > at org.apache.oozie.command.XCommand.call(XCommand.java:286) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User > [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name > [encrypted_key]!! 
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157) > at > org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607) > at > org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565) > at > org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832) > at > org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209) > at > org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205) > at > org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94) > at > org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:205) > at >
[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13697: - Attachment: HDFS-13697.11.patch
[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609125#comment-16609125 ] Zsolt Venczel commented on HDFS-13697: -- Thank you so much [~xyao] for the review! Please find my answers below: {quote}Line 408: The KMSCP is used by both client (DFSClient) and server (NN). authMethod ==PROXY is not a reliable way to cover all the proxy user cases. We could change line 408-409 to if (UserGroupInformation.getCurrentUser().getRealUser()!=null) {quote} I updated this section in my latest patch (11) based on your suggestions. {quote}Line 412: authMethod=TOKEN case Do we use the login user even if the current UGI has KMS delegation token? {quote} With the current approach we use the login user only if the authMethod at construction time was TOKEN. The potential issue that popped up for me is HADOOP-13381, which, as far as I can see, should no longer be a problem after getting rid of the KP cache. I'll try to double-check it, though. What do you think? {quote}Line 484: NIT: can we wrap this with getCachedUgi() similar to getDoAsUser() to make future change easier? {quote} I updated it as you suggested. {quote}TestEncryptionZones.java Line 1340-1341: can be replaced with DFSTestUtil.mockDFSClientKeyProvider {quote} This is a good catch, thanks! After updating DFSTestUtil.mockDFSClientKeyProvider to cope with this scenario (I had to mock dfs.dfs as well) I could use it in additional scenarios, too. 
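The fix discussed in this thread amounts to capturing the caller's UGI once, when the KMS client is constructed, instead of resolving it lazily on every call. The sketch below models that pattern in plain Java; the Ugi and KmsClient classes are simplified, hypothetical stand-ins for Hadoop's UserGroupInformation and KMSClientProvider, not the real API:

```java
public class UgiCaptureSketch {

    /** Minimal UGI model: a user name plus an optional real (proxying) user. */
    static final class Ugi {
        final String user;
        final Ugi realUser;   // non-null means this UGI is a proxy-user UGI
        Ugi(String user, Ugi realUser) { this.user = user; this.realUser = realUser; }
    }

    /** Stands in for UGI.getCurrentUser(): may silently degrade to the login user. */
    static Ugi current;

    /** Buggy pattern: resolve the UGI lazily, on every call. */
    static String decryptLazily() {
        return "decrypted-as:" + current.user;
    }

    /** Fixed pattern: the client captures the UGI at construction time. */
    static final class KmsClient {
        private final Ugi cachedUgi;
        KmsClient() { this.cachedUgi = current; }   // capture once, up front
        String decrypt() { return "decrypted-as:" + cachedUgi.user; }
    }

    public static void main(String[] args) {
        Ugi oozie = new Ugi("oozie", null);
        current = new Ugi("example_user", oozie);   // oozie proxying example_user
        KmsClient client = new KmsClient();         // proxied identity captured here

        current = oozie;  // simulate the re-login that drops the proxy user

        System.out.println(decryptLazily());   // the bug: acts as oozie
        System.out.println(client.decrypt());  // the fix: still example_user
    }
}
```

This mirrors the reviewed change only in shape: the real patch additionally guards the capture with a getRealUser() != null check and a getCachedUgi() wrapper, as discussed above.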
[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names
[ https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605831#comment-16605831 ] Zsolt Venczel commented on HDFS-13744: -- Thanks a lot [~mackrorysd] for the review and the fix! I was a bit puzzled by the specification about how to escape a CRLF properly, as it's not specified exactly (there's an example that replaces it character by character, which is your approach, but there's another example here: https://tools.ietf.org/html/rfc2234#section-2.3). From a usability perspective I think your approach is the best as it clearly displays all special characters. For debugging purposes this is the most valuable. Test failures are unrelated. > OIV tool should better handle control characters present in file or directory > names > --- > > Key: HDFS-13744 > URL: https://issues.apache.org/jira/browse/HDFS-13744 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, tools >Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3 >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Critical > Attachments: HDFS-13744.01.patch, HDFS-13744.02.patch, > HDFS-13744.03.patch > > > In certain cases, when control characters or white space are present in file or > directory names, OIV tool processors can export data in a misleading format. 
> In the below examples we have EXAMPLE_NAME as a file and a directory name > where the directory has a line feed character at the end (the actual > production case has multiple line feeds and multiple spaces) > * Delimited processor case: > ** misleading example: > {code:java} > /user/data/EXAMPLE_NAME > ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group > /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 > 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group > {code} > * > ** expected example as suggested by > [https://tools.ietf.org/html/rfc4180#section-2]: > {code:java} > "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 > 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group > "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 > 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group > {code} > * XML processor case: > ** misleading example (the line feed at the end of the directory name breaks the element across lines): > {code:java} > <inode><id>479867791</id><type>DIRECTORY</type><name>EXAMPLE_NAME > </name><mtime>1493033668294</mtime><permission>user:group:0775</permission></inode> > <inode><id>113632535</id><type>FILE</type><name>EXAMPLE_NAME</name><replication>3</replication><mtime>1472205657504</mtime><atime>1494954320141</atime><preferredBlockSize>134217728</preferredBlockSize><permission>user:group:0674</permission></inode> > {code} > * > ** expected example as specified in > [https://www.w3.org/TR/REC-xml/#sec-line-ends]: > {code:java} > <inode><id>479867791</id><type>DIRECTORY</type><name>EXAMPLE_NAME&#xA;</name><mtime>1493033668294</mtime><permission>user:group:0775</permission></inode> > <inode><id>113632535</id><type>FILE</type><name>EXAMPLE_NAME</name><replication>3</replication><mtime>1472205657504</mtime><atime>1494954320141</atime><preferredBlockSize>134217728</preferredBlockSize><permission>user:group:0674</permission></inode> > {code} > * JSON: > The OIV Web Processor behaves correctly and produces the following: > {code:java} > { > "FileStatuses": { > "FileStatus": [ > { > "fileId": 113632535, > "accessTime": 1494954320141, > "replication": 3, > "owner": "user", > "length": 520, > "permission": "674", > "blockSize": 134217728, > "modificationTime": 1472205657504, > "type": "FILE", > "group": "group", > "childrenNum": 0, > "pathSuffix": "EXAMPLE_NAME" > }, > { > "fileId": 479867791, > "accessTime": 0, > "replication": 0, > "owner": "user", > "length": 0, > "permission": "775", > "blockSize": 0, > "modificationTime": 1493033668294, > "type": "DIRECTORY", > "group": "group", > "childrenNum": 0, > "pathSuffix": "EXAMPLE_NAME\n" > 
} > ] > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
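The delimited-processor fix suggested above follows RFC 4180 section 2: wrap any field containing a comma, double quote, CR, or LF in double quotes, and double any embedded quotes. A minimal sketch of that rule (the %x0A substitution shown in the ticket's expected output is an extra choice on top of the RFC, so it is left out here):

```java
public class CsvFieldEscaper {

    /**
     * Quote a CSV field per RFC 4180 section 2: fields containing commas,
     * double quotes, CR or LF are wrapped in double quotes, and embedded
     * double quotes are doubled. Other fields pass through unchanged.
     */
    static String escape(String field) {
        boolean needsQuoting = field.chars()
                .anyMatch(c -> c == ',' || c == '"' || c == '\r' || c == '\n');
        if (!needsQuoting) {
            return field;
        }
        return '"' + field.replace("\"", "\"\"") + '"';
    }

    public static void main(String[] args) {
        // A directory name with a trailing line feed no longer looks like
        // two separate rows once the field is quoted:
        System.out.println(escape("/user/data/EXAMPLE_NAME\n") + ",0,drwxrwxr-x+");
        System.out.println(escape("/user/data/EXAMPLE_NAME") + ",520,-rw-rwxr--+");
    }
}
```

A consumer that parses the output with an RFC 4180-aware reader then recovers the original name, line feed included, instead of a broken row.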
[jira] [Comment Edited] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598669#comment-16598669 ] Zsolt Venczel edited comment on HDFS-13697 at 8/31/18 12:20 PM: Thanks [~xiaochen] for pointing out that KMSClientProvider.createConnection still had morphing. I removed it with patch 10. was (Author: zvenczel): Thanks [~xiaochen] for pointing out that KMSClientProvider.createConnection still had morphing I removed with patch 10.
[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598669#comment-16598669 ] Zsolt Venczel commented on HDFS-13697: -- Thanks [~xiaochen] for pointing out that KMSClientProvider.createConnection still had morphing I removed with patch 10.
[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13697: - Attachment: HDFS-13697.10.patch
[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names
[ https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597451#comment-16597451 ] Zsolt Venczel commented on HDFS-13744: -- Thank you very much for the review [~mackrorysd]. Please let me know if you have any preference about the direction this solution should take. In my latest patch I added StringUtils.CR support as you suggested. > OIV tool should better handle control characters present in file or directory > names > --- > > Key: HDFS-13744 > URL: https://issues.apache.org/jira/browse/HDFS-13744 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, tools >Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3 >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Critical > Attachments: HDFS-13744.01.patch, HDFS-13744.02.patch > > > In certain cases when control characters or white space is present in file or > directory names OIV tool processors can export data in a misleading format. > In the below examples we have EXAMPLE_NAME as a file and a directory name > where the directory has a line feed character at the end (the actual > production case has multiple line feeds and multiple spaces) > * Delimited processor case: > ** misleading example: > {code:java} > /user/data/EXAMPLE_NAME > ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group > /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 > 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group > {code} > * > ** expected example as suggested by > [https://tools.ietf.org/html/rfc4180#section-2]: > {code:java} > "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 > 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group > "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 > 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group > {code} > * XML processor case: > ** misleading example: > {code:java} > 479867791DIRECTORYEXAMPLE_NAME > 1493033668294user:group:0775 > 
113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 > {code} > * > ** expected example as specified in > [https://www.w3.org/TR/REC-xml/#sec-line-ends]: > {code:java} > 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775 > 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 > {code} > * JSON: > The OIV Web Processor behaves correctly and produces the following: > {code:java} > { > "FileStatuses": { > "FileStatus": [ > { > "fileId": 113632535, > "accessTime": 1494954320141, > "replication": 3, > "owner": "user", > "length": 520, > "permission": "674", > "blockSize": 134217728, > "modificationTime": 1472205657504, > "type": "FILE", > "group": "group", > "childrenNum": 0, > "pathSuffix": "EXAMPLE_NAME" > }, > { > "fileId": 479867791, > "accessTime": 0, > "replication": 0, > "owner": "user", > "length": 0, > "permission": "775", > "blockSize": 0, > "modificationTime": 1493033668294, > "type": "DIRECTORY", > "group": "group", > "childrenNum": 0, > "pathSuffix": "EXAMPLE_NAME\n" > } > ] > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
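The escaping behavior the ticket asks for can be sketched in plain Java. This is a minimal, hypothetical helper (not the actual OIV processor code): per RFC 4180 a delimited field containing a comma, double quote, CR, or LF is wrapped in double quotes with embedded quotes doubled, and for XML output line-feed and carriage-return characters are emitted as the numeric character references the XML 1.0 spec defines.

```java
/**
 * Illustrative sketch of the escaping the OIV processors could apply to
 * file and directory names containing control characters. Class and method
 * names are hypothetical, not Hadoop API.
 */
public class PathEscaping {

  /** Quote a delimited-processor field per RFC 4180, section 2. */
  static String csvField(String s) {
    if (s.indexOf(',') < 0 && s.indexOf('"') < 0
        && s.indexOf('\n') < 0 && s.indexOf('\r') < 0) {
      return s;  // no special characters: emit the field as-is
    }
    // double any embedded quotes, then wrap the whole field in quotes
    return '"' + s.replace("\"", "\"\"") + '"';
  }

  /** Emit CR/LF as XML numeric character references (&#xD; / &#xA;). */
  static String xmlText(String s) {
    return s.replace("\r", "&#xD;").replace("\n", "&#xA;");
  }

  public static void main(String[] args) {
    String dir = "/user/data/EXAMPLE_NAME\n";  // trailing line feed
    // The quoted CSV field keeps the literal LF but is unambiguous to parsers.
    System.out.println(csvField(dir) + ",0,drwxrwxr-x+,user,group");
    System.out.println("<name>" + xmlText("EXAMPLE_NAME\n") + "</name>");
  }
}
```

With this, a consumer of the delimited output can no longer mistake the trailing line feed for a record boundary, which is exactly the misleading case shown above.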
[jira] [Updated] (HDFS-13744) OIV tool should better handle control characters present in file or directory names
[ https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13744: - Attachment: HDFS-13744.02.patch > OIV tool should better handle control characters present in file or directory > names > --- > > Key: HDFS-13744 > URL: https://issues.apache.org/jira/browse/HDFS-13744 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, tools >Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3 >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Critical > Attachments: HDFS-13744.01.patch, HDFS-13744.02.patch > > > In certain cases when control characters or white space is present in file or > directory names OIV tool processors can export data in a misleading format. > In the below examples we have EXAMPLE_NAME as a file and a directory name > where the directory has a line feed character at the end (the actual > production case has multiple line feeds and multiple spaces) > * Delimited processor case: > ** misleading example: > {code:java} > /user/data/EXAMPLE_NAME > ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group > /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 > 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group > {code} > * > ** expected example as suggested by > [https://tools.ietf.org/html/rfc4180#section-2]: > {code:java} > "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 > 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group > "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 > 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group > {code} > * XML processor case: > ** misleading example: > {code:java} > 479867791DIRECTORYEXAMPLE_NAME > 1493033668294user:group:0775 > 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 > {code} > * > ** expected example as specified in > [https://www.w3.org/TR/REC-xml/#sec-line-ends]: > {code:java} > 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775 > 
113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 > {code} > * JSON: > The OIV Web Processor behaves correctly and produces the following: > {code:java} > { > "FileStatuses": { > "FileStatus": [ > { > "fileId": 113632535, > "accessTime": 1494954320141, > "replication": 3, > "owner": "user", > "length": 520, > "permission": "674", > "blockSize": 134217728, > "modificationTime": 1472205657504, > "type": "FILE", > "group": "group", > "childrenNum": 0, > "pathSuffix": "EXAMPLE_NAME" > }, > { > "fileId": 479867791, > "accessTime": 0, > "replication": 0, > "owner": "user", > "length": 0, > "permission": "775", > "blockSize": 0, > "modificationTime": 1493033668294, > "type": "DIRECTORY", > "group": "group", > "childrenNum": 0, > "pathSuffix": "EXAMPLE_NAME\n" > } > ] > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596512#comment-16596512 ] Zsolt Venczel commented on HDFS-13697: -- Thanks [~daryn] for the investigation and explanation and thanks [~xiaochen] for the continuous work and discussion! In my latest patch (09) I addressed the following: * *KMSClientProvider* No more morphing... The doAsUser is calculated at construction time and that's it. * *TestKMS* Based on Xiao's findings I fixed the key provider creation in the doProxyUserTest function to correctly test key creation by proxy users. * *TestAclsEndToEnd* I think the main issue with this test suite was that it was using the mini cluster dfs client for all of its operations. As we stopped morphing the problem had surfaced therefore I refactored it to use a truly end-to-end approach by having a proper client and a proper, client side key provider. The fat part of the changes are due to introducing a service user that needed the appropriate ACLs for the various testing scenarios. > DFSClient should instantiate and cache KMSClientProvider using UGI at > creation time for consistent UGI handling > --- > > Key: HDFS-13697 > URL: https://issues.apache.org/jira/browse/HDFS-13697 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, > HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, > HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, > HDFS-13697.09.patch, HDFS-13697.prelim.patch > > > While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack > might not have doAs privileged execution call (in the DFSClient for example). > This results in loosing the proxy user from UGI as UGI.getCurrentUser finds > no AccessControllerContext and does a re-login for the login user only. 
> This can cause the following for example: if we have set up the oozie user to > be entitled to perform actions on behalf of example_user but oozie is > forbidden to decrypt any EDEK (for security reasons), due to the above issue, > example_user entitlements are lost from UGI and the following error is > reported: > {code} > [0] > SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] > JOB[0020905-180313191552532-oozie-oozi-W] > ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting > action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message > [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with > ACL name [encrypted_key]!!] > org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not > authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!! > at > org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463) > at > org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441) > at > org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523) > at > org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199) > at > org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) > at org.apache.oozie.command.XCommand.call(XCommand.java:286) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User > [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name > [encrypted_key]!! > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at >
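The essence of the fix described above is resolving the doAs user once, at construction time, instead of re-resolving the current user on every call. A language-level sketch of that pattern (plain Java with a ThreadLocal standing in for UGI.getCurrentUser(); none of this is Hadoop code):

```java
/**
 * Hypothetical stand-in demonstrating why capturing the caller at
 * construction time preserves the proxy user. CURRENT_USER simulates
 * UGI.getCurrentUser(); names are illustrative, not Hadoop API.
 */
public class ConstructionTimeContext {

  /** "Current user" as seen by the security layer at any given moment. */
  static final ThreadLocal<String> CURRENT_USER =
      ThreadLocal.withInitial(() -> "login-user");

  /** Resolves the caller on every call: if the privileged doAs context has
   *  unwound by then, the proxy user is silently lost. */
  static class PerCallClient {
    String effectiveUser() { return CURRENT_USER.get(); }
  }

  /** Resolves the caller once, at construction, and sticks with it — the
   *  direction this patch moves KMSClientProvider towards. */
  static class ConstructionTimeClient {
    private final String doAsUser;
    ConstructionTimeClient() { this.doAsUser = CURRENT_USER.get(); }
    String effectiveUser() { return doAsUser; }
  }

  public static void main(String[] args) {
    CURRENT_USER.set("example_user");            // inside a doAs block
    PerCallClient perCall = new PerCallClient();
    ConstructionTimeClient fixed = new ConstructionTimeClient();

    CURRENT_USER.set("oozie");                   // doAs context has unwound
    System.out.println(perCall.effectiveUser()); // proxy user lost
    System.out.println(fixed.effectiveUser());   // proxy user preserved
  }
}
```

This mirrors the Oozie failure in the description: the DECRYPT_EEK call arrives at the KMS as [oozie] instead of example_user because the lookup happened outside any privileged context.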
[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13697: - Attachment: HDFS-13697.09.patch > DFSClient should instantiate and cache KMSClientProvider using UGI at > creation time for consistent UGI handling > --- > > Key: HDFS-13697 > URL: https://issues.apache.org/jira/browse/HDFS-13697 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, > HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, > HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, > HDFS-13697.09.patch, HDFS-13697.prelim.patch > > > While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack > might not have doAs privileged execution call (in the DFSClient for example). > This results in loosing the proxy user from UGI as UGI.getCurrentUser finds > no AccessControllerContext and does a re-login for the login user only. > This can cause the following for example: if we have set up the oozie user to > be entitled to perform actions on behalf of example_user but oozie is > forbidden to decrypt any EDEK (for security reasons), due to the above issue, > example_user entitlements are lost from UGI and the following error is > reported: > {code} > [0] > SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] > JOB[0020905-180313191552532-oozie-oozi-W] > ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting > action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message > [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with > ACL name [encrypted_key]!!] > org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not > authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!! 
> at > org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463) > at > org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441) > at > org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523) > at > org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199) > at > org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) > at org.apache.oozie.command.XCommand.call(XCommand.java:286) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User > [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name > [encrypted_key]!! 
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157) > at > org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607) > at > org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565) > at > org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832) > at > org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209) > at > org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205) > at > org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94) > at > org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:205) > at >
[jira] [Updated] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints
[ https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13731: - Attachment: HDFS-13731.03.patch > ReencryptionUpdater fails with ConcurrentModificationException during > processCheckpoints > > > Key: HDFS-13731 > URL: https://issues.apache.org/jira/browse/HDFS-13731 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, test >Affects Versions: 3.0.0 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13731-failure.log, HDFS-13731.01.patch, > HDFS-13731.02.patch, HDFS-13731.03.patch > > > HDFS-12837 fixed some flakiness of Reencryption related tests. But as > [~zvenczel]'s comment, there are a few timeouts still. We should investigate > that. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints
[ https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593715#comment-16593715 ] Zsolt Venczel commented on HDFS-13731: -- Thanks for the review [~xiaochen]! In my latest patch I added the protection you suggested. I also did some additional analysis and found that the rest of the failure scenarios were due to the same ConcurrentModificationException, so I removed the changes from the tests. I could not reproduce any failure after applying the patch to the latest trunk: http://dist-test.cloudera.org:80/job?job_id=hadoop-hdfs.zvenczel.1535377953.29868 > ReencryptionUpdater fails with ConcurrentModificationException during > processCheckpoints > > > Key: HDFS-13731 > URL: https://issues.apache.org/jira/browse/HDFS-13731 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, test >Affects Versions: 3.0.0 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13731-failure.log, HDFS-13731.01.patch, > HDFS-13731.02.patch > > > HDFS-12837 fixed some flakiness of Reencryption related tests. But as > [~zvenczel]'s comment, there are a few timeouts still. We should investigate > that. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints
[ https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13731: - Attachment: HDFS-13731.02.patch > ReencryptionUpdater fails with ConcurrentModificationException during > processCheckpoints > > > Key: HDFS-13731 > URL: https://issues.apache.org/jira/browse/HDFS-13731 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, test >Affects Versions: 3.0.0 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13731-failure.log, HDFS-13731.01.patch, > HDFS-13731.02.patch > > > HDFS-12837 fixed some flakiness of Reencryption related tests. But as > [~zvenczel]'s comment, there are a few timeouts still. We should investigate > that. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13846) Safe blocks counter is not decremented correctly if the block is striped
[ https://issues.apache.org/jira/browse/HDFS-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591436#comment-16591436 ] Zsolt Venczel edited comment on HDFS-13846 at 8/24/18 10:15 AM: Thanks [~knanasi] for creating this issue, I think it's a great catch! In the description the term "node" is a bit confusing to me. These are the "real data blocks" you are referring to right? I like how you extended the already available mocking approach in the unit tests. When I applied the test it was failing but it did pass with your proposed fix therefore I think it should be valid. Overall I think it's a valid change, +1 (non-binding) from me. was (Author: zvenczel): Thanks [~knanasi] for creating this issue, I think it's a great catch! In the description the term "node" is a bit confusing to me. These are the "real data blocks" you are referring to right? I like how you extended the already available mocking approach in the unit tests. When I applied the test they were failing but they did pass with your proposed fix therefore I think they should be valid. Overall I think it's a valid change, +1 (non-binding) from me. > Safe blocks counter is not decremented correctly if the block is striped > > > Key: HDFS-13846 > URL: https://issues.apache.org/jira/browse/HDFS-13846 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.0 >Reporter: Kitti Nanasi >Assignee: Kitti Nanasi >Priority: Major > Attachments: HDFS-13846.001.patch > > > In BlockManagerSafeMode class, the "safe blocks" counter is incremented if > the number of nodes containing the block equals to the number of data units > specified by the erasure coding policy, which looks like this in the code: > {code:java} > final int safe = storedBlock.isStriped() ? 
> ((BlockInfoStriped)storedBlock).getRealDataBlockNum() : > safeReplication; > if (storageNum == safe) { > this.blockSafe++; > {code} > But when it is decremented the code does not check if the block is striped or > not, just compares the number of nodes containing the block with 0 > (safeReplication - 1) if the block is complete, which is not correct. > {code:java} > if (storedBlock.isComplete() && > blockManager.countNodes(b).liveReplicas() == safeReplication - 1) { > this.blockSafe--; > assert blockSafe >= 0; > checkSafeMode(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
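The proposed fix makes the decrement use the same striped-aware threshold as the increment. A self-contained sketch of that symmetric accounting (class and method names are illustrative, not the actual BlockManagerSafeMode API):

```java
/**
 * Sketch of symmetric safe-block accounting: a block's "safe" threshold is
 * the real data-block count for a striped (erasure-coded) block and
 * safeReplication for a replicated one, on BOTH increment and decrement.
 * Hypothetical names, not Hadoop code.
 */
public class SafeBlockCounter {
  private int blockSafe = 0;
  private final int safeReplication = 1;  // min replicas, replicated case

  /** Striped-aware threshold used by both paths — the heart of the fix. */
  private int safeThreshold(boolean striped, int realDataBlockNum) {
    return striped ? realDataBlockNum : safeReplication;
  }

  void onStorageAdded(boolean striped, int realDataBlockNum, int storageNum) {
    if (storageNum == safeThreshold(striped, realDataBlockNum)) {
      blockSafe++;
    }
  }

  /** The reported bug: the original decrement always compared against
   *  safeReplication - 1; here it mirrors the increment's threshold. */
  void onStorageRemoved(boolean striped, int realDataBlockNum, int liveReplicas) {
    if (liveReplicas == safeThreshold(striped, realDataBlockNum) - 1) {
      blockSafe--;
    }
  }

  int safeBlocks() { return blockSafe; }

  public static void main(String[] args) {
    SafeBlockCounter c = new SafeBlockCounter();
    // e.g. an RS-6-3 striped block: safe once 6 data units are reported...
    c.onStorageAdded(true, 6, 6);
    // ...and no longer safe once one unit is lost (6 -> 5 live).
    c.onStorageRemoved(true, 6, 5);
    System.out.println(c.safeBlocks());
  }
}
```

With the asymmetric original, losing a unit of a striped block (5 live, not 0) would never decrement the counter, leaving safe mode with a stale count.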
[jira] [Commented] (HDFS-13846) Safe blocks counter is not decremented correctly if the block is striped
[ https://issues.apache.org/jira/browse/HDFS-13846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591436#comment-16591436 ] Zsolt Venczel commented on HDFS-13846: -- Thanks [~knanasi] for creating this issue, I think it's a great catch! In the description the term "node" is a bit confusing to me. These are the "real data blocks" you are referring to right? I like how you extended the already available mocking approach in the unit tests. When I applied the test they were failing but they did pass with your proposed fix therefore I think they should be valid. Overall I think it's a valid change, +1 (non-binding) from me. > Safe blocks counter is not decremented correctly if the block is striped > > > Key: HDFS-13846 > URL: https://issues.apache.org/jira/browse/HDFS-13846 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.0 >Reporter: Kitti Nanasi >Assignee: Kitti Nanasi >Priority: Major > Attachments: HDFS-13846.001.patch > > > In BlockManagerSafeMode class, the "safe blocks" counter is incremented if > the number of nodes containing the block equals to the number of data units > specified by the erasure coding policy, which looks like this in the code: > {code:java} > final int safe = storedBlock.isStriped() ? > ((BlockInfoStriped)storedBlock).getRealDataBlockNum() : > safeReplication; > if (storageNum == safe) { > this.blockSafe++; > {code} > But when it is decremented the code does not check if the block is striped or > not, just compares the number of nodes containing the block with 0 > (safeReplication - 1) if the block is complete, which is not correct. 
> {code:java} > if (storedBlock.isComplete() && > blockManager.countNodes(b).liveReplicas() == safeReplication - 1) { > this.blockSafe--; > assert blockSafe >= 0; > checkSafeMode(); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13752) fs.Path stores file path in java.net.URI causes big memory waste
[ https://issues.apache.org/jira/browse/HDFS-13752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590382#comment-16590382 ] Zsolt Venczel commented on HDFS-13752: -- Thanks for the patch [~b.maidics] and thanks for posting the review we talked about [~gabor.bota]! A few additional thoughts from my side: * The Path class is used within all services of HDFS, e.g. the DataNode and NameNode, so the impact on these components would be tremendous. Introducing SoftReference in a NameNode would induce some unwanted GC behavior, especially in larger-scale clusters (the small-file problem would be even more imminent). This of course needs to be measured, so some initial metrics would be great. * toURI is used in Hadoop 2.7.6 in 237 places across ~20 sub-components. In Hadoop trunk this number is much larger. Please revisit your calculations. Thinking about the initial problem, I could imagine something that lives on the client side only and introduces some caching, by either extending the Path class or transforming it into something more convenient. > fs.Path stores file path in java.net.URI causes big memory waste > > > Key: HDFS-13752 > URL: https://issues.apache.org/jira/browse/HDFS-13752 > Project: Hadoop HDFS > Issue Type: Improvement > Components: fs >Affects Versions: 2.7.6 > Environment: Hive 2.1.1 and hadoop 2.7.6 >Reporter: Barnabas Maidics >Priority: Major > Attachments: HDFS-13752.001.patch, HDFS-13752.002.patch, > HDFS-13752.003.patch, Screen Shot 2018-07-20 at 11.12.38.png, > heapdump-10partitions.html, measurement.pdf > > > I was looking at HiveServer2 memory usage, and a big percentage of this was > because of org.apache.hadoop.fs.Path, where you store file paths in a > java.net.URI object. The URI implementation stores the same string in 3 > different objects (see the attached image). In Hive when there are many > partitions this causes big memory usage. 
In my particular case 42% of memory > was used by java.net.URI so it could be reduced to 14%. > I wonder if the community is open to replace it with a more memory efficient > implementation and what other things should be considered here? It can be a > huge memory improvement for Hadoop and for Hive as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
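The client-side caching direction suggested in the comment can be sketched as a simple interner that deduplicates equal path strings, so many Path objects over the same partitions share one backing String. This is an illustrative sketch, not a Hadoop class; a production version would hold values via SoftReference so the GC can reclaim them under pressure — exactly the NameNode-scale trade-off the comment warns about.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical client-side interner for path strings. Deduplicates equal
 * strings to one canonical instance; unbounded here for simplicity (a real
 * implementation would need soft/weak references or an eviction policy).
 */
public class PathInterner {
  private final Map<String, String> pool = new ConcurrentHashMap<>();

  /** Returns the canonical instance for all equal path strings. */
  String intern(String path) {
    String prior = pool.putIfAbsent(path, path);
    return prior != null ? prior : path;
  }

  public static void main(String[] args) {
    PathInterner interner = new PathInterner();
    // Two equal-but-distinct strings, as parsing two URIs would produce.
    String a = interner.intern(new String("/warehouse/db/tbl/part=1"));
    String b = interner.intern(new String("/warehouse/db/tbl/part=1"));
    System.out.println(a == b);  // true: one canonical instance retained
  }
}
```

This attacks the duplication across Path instances; it does not change the separate issue that a single java.net.URI keeps the same data in several internal string fields.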
[jira] [Updated] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints
[ https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13731: - Attachment: HDFS-13731.01.patch > ReencryptionUpdater fails with ConcurrentModificationException during > processCheckpoints > > > Key: HDFS-13731 > URL: https://issues.apache.org/jira/browse/HDFS-13731 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, test >Affects Versions: 3.0.0 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13731-failure.log, HDFS-13731.01.patch > > > HDFS-12837 fixed some flakiness of Reencryption related tests. But as > [~zvenczel]'s comment, there are a few timeouts still. We should investigate > that. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints
[ https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13731: - Status: Patch Available (was: In Progress) > ReencryptionUpdater fails with ConcurrentModificationException during > processCheckpoints > > > Key: HDFS-13731 > URL: https://issues.apache.org/jira/browse/HDFS-13731 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, test >Affects Versions: 3.0.0 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13731-failure.log, HDFS-13731.01.patch > > > HDFS-12837 fixed some flakiness of Reencryption related tests. But as > [~zvenczel]'s comment, there are a few timeouts still. We should investigate > that. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints
[ https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590033#comment-16590033 ] Zsolt Venczel commented on HDFS-13731: -- With my patch applied the test passes: http://dist-test.cloudera.org:80/job?job_id=hadoop-hdfs.zvenczel.1535016717.29829 > ReencryptionUpdater fails with ConcurrentModificationException during > processCheckpoints > > > Key: HDFS-13731 > URL: https://issues.apache.org/jira/browse/HDFS-13731 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, test >Affects Versions: 3.0.0 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13731-failure.log > > > HDFS-12837 fixed some flakiness of Reencryption related tests. But as > [~zvenczel]'s comment, there are a few timeouts still. We should investigate > that. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints
[ https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590030#comment-16590030 ] Zsolt Venczel commented on HDFS-13731:
--
While investigating the above timeouts I found the following concurrency issue:
* while the ReencryptionUpdater.processCheckpoints method is executing and removing tasks from the task list,
* on a different thread a new re-encryption task can be added to the same task list by ReencryptionHandler.submitCurrentBatch, which calls ZoneSubmissionTracker.addTask.

My latest patch contains a proposal to prevent this. I've attached the full log produced for the issue. The important section, where the *processCheckpoints* iteration is still running while a new ZoneSubmissionTracker task is being added:
{code:java}
2018-08-22 17:16:01,535 INFO FSTreeTraverser - Submitted batch (start:/zones/zone/0, size:5) of zone 16387 to re-encrypt.
2018-08-22 17:16:01,535 INFO ReencryptionHandler - Processing batched re-encryption for zone 16387, batch size 5, start:/zones/zone/0
2018-08-22 17:16:01,536 INFO ReencryptionHandler - Completed re-encrypting one batch of 5 edeks from KMS, time consumed: 922873, start: /zones/zone/0.
2018-08-22 17:16:01,536 INFO ReencryptionUpdater - Processing returned re-encryption task for zone /zones/zone(16387), batch size 5, start:/zones/zone/0
2018-08-22 17:16:01,536 DEBUG ReencryptionUpdater - Updating file xattrs for re-encrypting zone /zones/zone, starting at /zones/zone/0
2018-08-22 17:16:01,536 TRACE ReencryptionUpdater - Updating 16388 for re-encryption.
2018-08-22 17:16:01,536 TRACE ReencryptionUpdater - Updating 16389 for re-encryption.
2018-08-22 17:16:01,536 TRACE ReencryptionUpdater - Updating 16390 for re-encryption.
2018-08-22 17:16:01,536 TRACE ReencryptionUpdater - Updating 16391 for re-encryption.
2018-08-22 17:16:01,536 TRACE ReencryptionUpdater - Updating 16392 for re-encryption.
2018-08-22 17:16:01,536 INFO ReencryptionUpdater - Updated xattrs on 5(5) files in zone /zones/zone for re-encryption, starting:/zones/zone/0.
2018-08-22 17:16:01,536 DEBUG ReencryptionUpdater - Updating re-encryption checkpoint with completed task. last: /zones/zone/4 size:5.
2018-08-22 17:16:01,536 INFO FSTreeTraverser - Submitted batch (start:/zones/zone/5, size:5) of zone 16387 to re-encrypt.
2018-08-22 17:16:01,536 INFO ReencryptionHandler - Processing batched re-encryption for zone 16387, batch size 5, start:/zones/zone/5
2018-08-22 17:16:01,537 ERROR ReencryptionUpdater - Re-encryption updater thread exiting.
java.util.ConcurrentModificationException
	at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:966)
	at java.util.LinkedList$ListItr.remove(LinkedList.java:921)
	at org.apache.hadoop.hdfs.server.namenode.ReencryptionUpdater.processCheckpoints(ReencryptionUpdater.java:411)
	at org.apache.hadoop.hdfs.server.namenode.ReencryptionUpdater.processTask(ReencryptionUpdater.java:488)
	at org.apache.hadoop.hdfs.server.namenode.ReencryptionUpdater.takeAndProcessTasks(ReencryptionUpdater.java:437)
	at org.apache.hadoop.hdfs.server.namenode.ReencryptionUpdater.run(ReencryptionUpdater.java:264)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2018-08-22 17:16:01,537 INFO ReencryptionHandler - Submission completed of zone 16387 for re-encryption.
{code}
Which results in cancelling the re-encryption tasks:
{code:java}
2018-08-22 17:16:51,612 INFO ReencryptionUpdater - Cancelling 2 re-encryption tasks ...
2018-08-22 17:16:51,621 INFO ReencryptionUpdater - Cancelling 2 re-encryption tasks
{code}
My uploaded patch also fixes two other test-related issues:
* in testRestartAfterReencryptAndCheckpoint the fs.saveNamespace() call was sometimes slow, so the test should wait for the operation to finish
* the cancelFutureDuringReencryption method introduced a race condition, because at
{code:java}
callableRunning.set(true);
Thread.sleep(Long.MAX_VALUE);{code}
a concurrent modification can happen in rare cases between setting callableRunning to true and putting the thread to sleep.

> ReencryptionUpdater fails with ConcurrentModificationException during
> processCheckpoints
>
> Key: HDFS-13731
> URL: https://issues.apache.org/jira/browse/HDFS-13731
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: encryption, test
> Affects Versions: 3.0.0
> Reporter: Xiao Chen
> Assignee: Zsolt Venczel
> Priority: Major
> Attachments: HDFS-13731-failure.log
>
> HDFS-12837 fixed some flakiness of Reencryption related tests. But as
> [~zvenczel]'s comment, there are a few timeouts still.
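The failure mode and the locking fix can be sketched in isolation. The class and method names below are illustrative only, not the actual Hadoop code: a LinkedList iterator fails fast when the list is structurally modified behind its back, which is what a concurrent addTask during processCheckpoints does, and guarding both the producer and the consumer with the same monitor (as the patch proposes) prevents it:

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class ComodificationDemo {
    // Mimics the failure mode deterministically: the list is structurally
    // modified while an iterator is live, so ListItr.remove() fails fast
    // with ConcurrentModificationException (same frames as the stack trace).
    static boolean failsWithoutLocking() {
        List<String> tasks = new LinkedList<>();
        tasks.add("task-0");
        tasks.add("task-1");
        try {
            Iterator<String> it = tasks.iterator();
            while (it.hasNext()) {
                it.next();
                tasks.add("task-2"); // "addTask" arriving mid-iteration
                it.remove();         // throws: list changed under the iterator
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // The fix pattern: producer (addTask) and consumer (checkpoint
    // processing) synchronize on the same monitor, so iteration always
    // sees a quiescent list.
    static final Object lock = new Object();
    static final List<String> tasks = new LinkedList<>();

    static void addTask(String t) {
        synchronized (lock) {
            tasks.add(t);
        }
    }

    static int drainCompleted() {
        synchronized (lock) {
            int drained = 0;
            Iterator<String> it = tasks.iterator();
            while (it.hasNext()) {
                it.next();
                it.remove();
                drained++;
            }
            return drained;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsWithoutLocking()); // true: CME observed
        addTask("task-0");
        addTask("task-1");
        System.out.println(drainCompleted());      // 2: drained safely
    }
}
```

The same effect could also be had from a concurrent collection, but taking one shared lock keeps the existing LinkedList semantics intact.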
[jira] [Updated] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints
[ https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13731: - Attachment: HDFS-13731-failure.log > ReencryptionUpdater fails with ConcurrentModificationException during > processCheckpoints > > > Key: HDFS-13731 > URL: https://issues.apache.org/jira/browse/HDFS-13731 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, test >Affects Versions: 3.0.0 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13731-failure.log > > > HDFS-12837 fixed some flakiness of Reencryption related tests. But as > [~zvenczel]'s comment, there are a few timeouts still. We should investigate > that. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13731) ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints
[ https://issues.apache.org/jira/browse/HDFS-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13731: - Summary: ReencryptionUpdater fails with ConcurrentModificationException during processCheckpoints (was: Investigate TestReencryption timeouts) > ReencryptionUpdater fails with ConcurrentModificationException during > processCheckpoints > > > Key: HDFS-13731 > URL: https://issues.apache.org/jira/browse/HDFS-13731 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, test >Affects Versions: 3.0.0 >Reporter: Xiao Chen >Assignee: Zsolt Venczel >Priority: Major > > HDFS-12837 fixed some flakiness of Reencryption related tests. But as > [~zvenczel]'s comment, there are a few timeouts still. We should investigate > that. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDFS-13843) RBF: When we add/update mount entry to multiple destinations, unable to see the order information in mount entry points and in federation router UI
[ https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-13843 started by Zsolt Venczel.
> RBF: When we add/update mount entry to multiple destinations, unable to see
> the order information in mount entry points and in federation router UI
> ---
>
> Key: HDFS-13843
> URL: https://issues.apache.org/jira/browse/HDFS-13843
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: federation
> Reporter: Soumyapn
> Assignee: Zsolt Venczel
> Priority: Major
> Labels: RBF
>
> *Scenario:*
> Execute the below add/update commands for a single mount entry with a single
> nameservice pointing to multiple destinations:
> # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1
> # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1,/tmp2,/tmp3
> # hdfs dfsrouteradmin -update /apps1 hacluster /tmp1,/tmp2,/tmp3 -order RANDOM
>
> *Actual:* With the above commands, the mount entry is successfully updated,
> but order information such as HASH or RANDOM is not displayed in the mount
> entries and is also not displayed in the federation router UI. Order
> information is updated properly when there are multiple nameservices; this
> issue occurs with a single nameservice having multiple destinations.
>
> *Expected:*
> *Order information should be updated in the mount entries so that the user
> knows which order has been set.*
>
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-13843) RBF: When we add/update mount entry to multiple destinations, unable to see the order information in mount entry points and in federation router UI
[ https://issues.apache.org/jira/browse/HDFS-13843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel reassigned HDFS-13843:
Assignee: Zsolt Venczel
> RBF: When we add/update mount entry to multiple destinations, unable to see
> the order information in mount entry points and in federation router UI
> ---
>
> Key: HDFS-13843
> URL: https://issues.apache.org/jira/browse/HDFS-13843
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: federation
> Reporter: Soumyapn
> Assignee: Zsolt Venczel
> Priority: Major
> Labels: RBF
>
> *Scenario:*
> Execute the below add/update commands for a single mount entry with a single
> nameservice pointing to multiple destinations:
> # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1
> # hdfs dfsrouteradmin -add /apps1 hacluster /tmp1,/tmp2,/tmp3
> # hdfs dfsrouteradmin -update /apps1 hacluster /tmp1,/tmp2,/tmp3 -order RANDOM
>
> *Actual:* With the above commands, the mount entry is successfully updated,
> but order information such as HASH or RANDOM is not displayed in the mount
> entries and is also not displayed in the federation router UI. Order
> information is updated properly when there are multiple nameservices; this
> issue occurs with a single nameservice having multiple destinations.
>
> *Expected:*
> *Order information should be updated in the mount entries so that the user
> knows which order has been set.*
>
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names
[ https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582928#comment-16582928 ] Zsolt Venczel commented on HDFS-13744:
--
I could not reproduce the above test failure with or without the patch; therefore it should be unrelated:
{code}
[INFO] ---
[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 74.774 s - in org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0
[INFO]
{code}
> OIV tool should better handle control characters present in file or directory
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs, tools
> Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
> Reporter: Zsolt Venczel
> Assignee: Zsolt Venczel
> Priority: Critical
> Attachments: HDFS-13744.01.patch
>
> In certain cases when control characters or white space is present in file or
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name > where the directory has a line feed character at the end (the actual > production case has multiple line feeds and multiple spaces) > * Delimited processor case: > ** misleading example: > {code:java} > /user/data/EXAMPLE_NAME > ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group > /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 > 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group > {code} > * > ** expected example as suggested by > [https://tools.ietf.org/html/rfc4180#section-2]: > {code:java} > "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 > 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group > "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 > 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group > {code} > * XML processor case: > ** misleading example: > {code:java} > 479867791DIRECTORYEXAMPLE_NAME > 1493033668294user:group:0775 > 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 > {code} > * > ** expected example as specified in > [https://www.w3.org/TR/REC-xml/#sec-line-ends]: > {code:java} > 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775 > 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 > {code} > * JSON: > The OIV Web Processor behaves correctly and produces the following: > {code:java} > { > "FileStatuses": { > "FileStatus": [ > { > "fileId": 113632535, > "accessTime": 1494954320141, > "replication": 3, > "owner": "user", > "length": 520, > "permission": "674", > "blockSize": 134217728, > "modificationTime": 1472205657504, > "type": "FILE", > "group": "group", > "childrenNum": 0, > "pathSuffix": "EXAMPLE_NAME" > }, > { > "fileId": 479867791, > "accessTime": 0, > "replication": 0, > "owner": "user", > "length": 0, > "permission": "775", > "blockSize": 0, > "modificationTime": 1493033668294, > "type": "DIRECTORY", > "group": "group", > "childrenNum": 0, > "pathSuffix": "EXAMPLE_NAME\n" > 
} > ] > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13744) OIV tool should better handle control characters present in file or directory names
[ https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13744: - Status: Patch Available (was: In Progress) > OIV tool should better handle control characters present in file or directory > names > --- > > Key: HDFS-13744 > URL: https://issues.apache.org/jira/browse/HDFS-13744 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, tools >Affects Versions: 3.0.3, 2.7.6, 2.8.4, 2.9.1, 2.6.5 >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Critical > Attachments: HDFS-13744.01.patch > > > In certain cases when control characters or white space is present in file or > directory names OIV tool processors can export data in a misleading format. > In the below examples we have EXAMPLE_NAME as a file and a directory name > where the directory has a line feed character at the end (the actual > production case has multiple line feeds and multiple spaces) > * Delimited processor case: > ** misleading example: > {code:java} > /user/data/EXAMPLE_NAME > ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group > /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 > 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group > {code} > * > ** expected example as suggested by > [https://tools.ietf.org/html/rfc4180#section-2]: > {code:java} > "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 > 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group > "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 > 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group > {code} > * XML processor case: > ** misleading example: > {code:java} > 479867791DIRECTORYEXAMPLE_NAME > 1493033668294user:group:0775 > 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 > {code} > * > ** expected example as specified in > [https://www.w3.org/TR/REC-xml/#sec-line-ends]: > {code:java} > 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775 > 
113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 > {code} > * JSON: > The OIV Web Processor behaves correctly and produces the following: > {code:java} > { > "FileStatuses": { > "FileStatus": [ > { > "fileId": 113632535, > "accessTime": 1494954320141, > "replication": 3, > "owner": "user", > "length": 520, > "permission": "674", > "blockSize": 134217728, > "modificationTime": 1472205657504, > "type": "FILE", > "group": "group", > "childrenNum": 0, > "pathSuffix": "EXAMPLE_NAME" > }, > { > "fileId": 479867791, > "accessTime": 0, > "replication": 0, > "owner": "user", > "length": 0, > "permission": "775", > "blockSize": 0, > "modificationTime": 1493033668294, > "type": "DIRECTORY", > "group": "group", > "childrenNum": 0, > "pathSuffix": "EXAMPLE_NAME\n" > } > ] > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13744) OIV tool should better handle control characters present in file or directory names
[ https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13744: - Description: In certain cases when control characters or white space is present in file or directory names OIV tool processors can export data in a misleading format. In the below examples we have EXAMPLE_NAME as a file and a directory name where the directory has a line feed character at the end (the actual production case has multiple line feeds and multiple spaces) * Delimited processor case: ** misleading example: {code:java} /user/data/EXAMPLE_NAME ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group {code} * ** expected example as suggested by [https://tools.ietf.org/html/rfc4180#section-2]: {code:java} "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group {code} * XML processor case: ** misleading example: {code:java} 479867791DIRECTORYEXAMPLE_NAME 1493033668294user:group:0775 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 {code} * ** expected example as specified in [https://www.w3.org/TR/REC-xml/#sec-line-ends]: {code:java} 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 {code} * JSON: The OIV Web Processor behaves correctly and produces the following: {code:java} { "FileStatuses": { "FileStatus": [ { "fileId": 113632535, "accessTime": 1494954320141, "replication": 3, "owner": "user", "length": 520, "permission": "674", "blockSize": 134217728, "modificationTime": 1472205657504, "type": "FILE", "group": "group", "childrenNum": 0, "pathSuffix": "EXAMPLE_NAME" }, { "fileId": 479867791, "accessTime": 0, "replication": 0, 
"owner": "user", "length": 0, "permission": "775", "blockSize": 0, "modificationTime": 1493033668294, "type": "DIRECTORY", "group": "group", "childrenNum": 0, "pathSuffix": "EXAMPLE_NAME\n" } ] } } {code} was: In certain cases when control characters or white space is present in file or directory names OIV tool processors can export data in a misleading format. In the below examples we have EXAMPLE_NAME as a file and a directory name where the directory has a line feed character at the end (the actual production case has multiple line feeds and multiple spaces) * CSV processor case: ** misleading example: {code:java} /user/data/EXAMPLE_NAME ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group {code} ** expected example as suggested by [https://tools.ietf.org/html/rfc4180#section-2]: {code:java} "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group {code} * XML processor case: ** misleading example: {code:java} 479867791DIRECTORYEXAMPLE_NAME 1493033668294user:group:0775 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 {code} ** expected example as specified in [https://www.w3.org/TR/REC-xml/#sec-line-ends]: {code:java} 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 {code} * JSON: The OIV Web Processor behaves correctly and produces the following: {code:java} { "FileStatuses": { "FileStatus": [ { "fileId": 113632535, "accessTime": 1494954320141, "replication": 3, "owner": "user", "length": 520, "permission": "674", "blockSize": 134217728, "modificationTime": 1472205657504, "type": "FILE", "group": "group", "childrenNum": 0, "pathSuffix": "EXAMPLE_NAME" }, { "fileId": 479867791, 
"accessTime": 0, "replication": 0, "owner": "user", "length": 0, "permission": "775", "blockSize": 0, "modificationTime": 1493033668294, "type": "DIRECTORY", "group": "group", "childrenNum": 0, "pathSuffix": "EXAMPLE_NAME\n" } ] } } {code} > OIV tool should better handle control characters present in file or directory > names > --- > > Key: HDFS-13744
[jira] [Commented] (HDFS-13744) OIV tool should better handle control characters present in file or directory names
[ https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582703#comment-16582703 ] Zsolt Venczel commented on HDFS-13744:
--
After doing some more analysis it turns out that very few CSV and XML clients follow the LF character encoding specifications. This can have the following impact:
* For the XML processor: escaping the LF character according to the specification can prevent an XML parser from correctly reproducing a file name. It can also modify filenames when using the ReverseXML processor. *I would not recommend escaping here.*
* For the Delimited processor: the output of the Delimited processor is handy for report creation and grepping, where a wrongly displayed file or directory name containing an LF can cause more problems than the appearance of an escaped LF character; therefore *I would recommend escaping in this scenario*.

In my uploaded patch I added escaping for the Delimited processor only.
> OIV tool should better handle control characters present in file or directory
> names
> ---
>
> Key: HDFS-13744
> URL: https://issues.apache.org/jira/browse/HDFS-13744
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs, tools
> Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3
> Reporter: Zsolt Venczel
> Assignee: Zsolt Venczel
> Priority: Critical
> Attachments: HDFS-13744.01.patch
>
> In certain cases when control characters or white space is present in file or
> directory names OIV tool processors can export data in a misleading format.
> In the below examples we have EXAMPLE_NAME as a file and a directory name > where the directory has a line feed character at the end (the actual > production case has multiple line feeds and multiple spaces) > * CSV processor case: > ** misleading example: > {code:java} > /user/data/EXAMPLE_NAME > ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group > /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 > 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group > {code} > ** expected example as suggested by > [https://tools.ietf.org/html/rfc4180#section-2]: > {code:java} > "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 > 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group > "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 > 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group > {code} > * XML processor case: > ** misleading example: > {code:java} > 479867791DIRECTORYEXAMPLE_NAME > 1493033668294user:group:0775 > 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 > {code} > ** expected example as specified in > [https://www.w3.org/TR/REC-xml/#sec-line-ends]: > {code:java} > 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775 > 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 > {code} > * JSON: > The OIV Web Processor behaves correctly and produces the following: > {code:java} > { > "FileStatuses": { > "FileStatus": [ > { > "fileId": 113632535, > "accessTime": 1494954320141, > "replication": 3, > "owner": "user", > "length": 520, > "permission": "674", > "blockSize": 134217728, > "modificationTime": 1472205657504, > "type": "FILE", > "group": "group", > "childrenNum": 0, > "pathSuffix": "EXAMPLE_NAME" > }, > { > "fileId": 479867791, > "accessTime": 0, > "replication": 0, > "owner": "user", > "length": 0, > "permission": "775", > "blockSize": 0, > "modificationTime": 1493033668294, > "type": "DIRECTORY", > "group": "group", > "childrenNum": 0, > "pathSuffix": "EXAMPLE_NAME\n" > } > ] > } > } 
> {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
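The escaping recommended above for the Delimited processor can be illustrated with a minimal sketch. This is not the code from HDFS-13744.01.patch; the class name is hypothetical, and the %xNN notation is borrowed from the RFC 4180-style example in the issue description:

```java
public class DelimitedEscaper {
    // Replaces ASCII control characters in a path with a visible %xNN
    // escape, so that a line feed inside a file or directory name cannot
    // split one Delimited-processor record across two output lines.
    static String escapePath(String path) {
        StringBuilder sb = new StringBuilder(path.length());
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c < 0x20) {
                // e.g. '\n' (0x0A) becomes the literal text "%x0A"
                sb.append(String.format("%%x%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The directory name from the report, ending with a line feed.
        System.out.println(escapePath("/user/data/EXAMPLE_NAME\n"));
        // -> /user/data/EXAMPLE_NAME%x0A
    }
}
```

With this in place a grep for `EXAMPLE_NAME` matches exactly one record per inode, instead of one record being broken in two by the raw LF.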
[jira] [Updated] (HDFS-13744) OIV tool should better handle control characters present in file or directory names
[ https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13744: - Attachment: HDFS-13744.01.patch > OIV tool should better handle control characters present in file or directory > names > --- > > Key: HDFS-13744 > URL: https://issues.apache.org/jira/browse/HDFS-13744 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, tools >Affects Versions: 2.6.5, 2.9.1, 2.8.4, 2.7.6, 3.0.3 >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Critical > Attachments: HDFS-13744.01.patch > > > In certain cases when control characters or white space is present in file or > directory names OIV tool processors can export data in a misleading format. > In the below examples we have EXAMPLE_NAME as a file and a directory name > where the directory has a line feed character at the end (the actual > production case has multiple line feeds and multiple spaces) > * CSV processor case: > ** misleading example: > {code:java} > /user/data/EXAMPLE_NAME > ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group > /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 > 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group > {code} > ** expected example as suggested by > [https://tools.ietf.org/html/rfc4180#section-2]: > {code:java} > "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 > 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group > "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 > 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group > {code} > * XML processor case: > ** misleading example: > {code:java} > 479867791DIRECTORYEXAMPLE_NAME > 1493033668294user:group:0775 > 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 > {code} > ** expected example as specified in > [https://www.w3.org/TR/REC-xml/#sec-line-ends]: > {code:java} > 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775 > 
113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 > {code} > * JSON: > The OIV Web Processor behaves correctly and produces the following: > {code:java} > { > "FileStatuses": { > "FileStatus": [ > { > "fileId": 113632535, > "accessTime": 1494954320141, > "replication": 3, > "owner": "user", > "length": 520, > "permission": "674", > "blockSize": 134217728, > "modificationTime": 1472205657504, > "type": "FILE", > "group": "group", > "childrenNum": 0, > "pathSuffix": "EXAMPLE_NAME" > }, > { > "fileId": 479867791, > "accessTime": 0, > "replication": 0, > "owner": "user", > "length": 0, > "permission": "775", > "blockSize": 0, > "modificationTime": 1493033668294, > "type": "DIRECTORY", > "group": "group", > "childrenNum": 0, > "pathSuffix": "EXAMPLE_NAME\n" > } > ] > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13744) OIV tool should better handle control characters present in file or directory names
[ https://issues.apache.org/jira/browse/HDFS-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13744: - Description: In certain cases when control characters or white space is present in file or directory names OIV tool processors can export data in a misleading format. In the below examples we have EXAMPLE_NAME as a file and a directory name where the directory has a line feed character at the end (the actual production case has multiple line feeds and multiple spaces) * CSV processor case: ** misleading example: {code:java} /user/data/EXAMPLE_NAME ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group {code} ** expected example as suggested by [https://tools.ietf.org/html/rfc4180#section-2]: {code:java} "/user/data/EXAMPLE_NAME%x0A",0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group {code} * XML processor case: ** misleading example: {code:java} 479867791DIRECTORYEXAMPLE_NAME 1493033668294user:group:0775 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 {code} ** expected example as specified in [https://www.w3.org/TR/REC-xml/#sec-line-ends]: {code:java} 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 {code} * JSON: The OIV Web Processor behaves correctly and produces the following: {code:java} { "FileStatuses": { "FileStatus": [ { "fileId": 113632535, "accessTime": 1494954320141, "replication": 3, "owner": "user", "length": 520, "permission": "674", "blockSize": 134217728, "modificationTime": 1472205657504, "type": "FILE", "group": "group", "childrenNum": 0, "pathSuffix": "EXAMPLE_NAME" }, { "fileId": 479867791, "accessTime": 0, "replication": 0, "owner": 
"user", "length": 0, "permission": "775", "blockSize": 0, "modificationTime": 1493033668294, "type": "DIRECTORY", "group": "group", "childrenNum": 0, "pathSuffix": "EXAMPLE_NAME\n" } ] } } {code} was: In certain cases when control characters or white space is present in file or directory names OIV tool processors can export data in a misleading format. In the below examples we have EXAMPLE_NAME as a file and a directory name where the directory has a line feed character at the end (the actual production case has multiple line feeds and multiple spaces) * CSV processor case: ** misleading example: {code:java} /user/data/EXAMPLE_NAME ,0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group /user/data/EXAMPLE_NAME,2016-08-26 03:00,2017-05-16 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group {code} ** expected example as suggested by [https://tools.ietf.org/html/rfc4180#section-2]: {code:java} "/user/data/EXAMPLE_NAME%x0D",0,2017-04-24 04:34,1969-12-31 16:00,0,0,0,-1,-1,drwxrwxr-x+,user,group "/user/data/EXAMPLE_NAME",2016-08-26 03:00,2017-05-16 10:05,134217728,1,520,0,0,-rw-rwxr--+,user,group {code} * XML processor case: ** misleading example: {code:java} 479867791DIRECTORYEXAMPLE_NAME 1493033668294user:group:0775 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 {code} ** expected example as specified in [https://www.w3.org/TR/REC-xml/#sec-line-ends]: {code:java} 479867791DIRECTORYEXAMPLE_NAME#xA1493033668294user:group:0775 113632535FILEEXAMPLE_NAME314722056575041494954320141134217728user:group:0674 {code} * JSON: The OIV Web Processor behaves correctly and produces the following: {code:java} { "FileStatuses": { "FileStatus": [ { "fileId": 113632535, "accessTime": 1494954320141, "replication": 3, "owner": "user", "length": 520, "permission": "674", "blockSize": 134217728, "modificationTime": 1472205657504, "type": "FILE", "group": "group", "childrenNum": 0, "pathSuffix": "EXAMPLE_NAME" }, { "fileId": 479867791, "accessTime": 
0, "replication": 0, "owner": "user", "length": 0, "permission": "775", "blockSize": 0, "modificationTime": 1493033668294, "type": "DIRECTORY", "group": "group", "childrenNum": 0, "pathSuffix": "EXAMPLE_NAME\n" } ] } } {code} > OIV tool should better handle control characters present in file or directory > names > --- > > Key: HDFS-13744 >
[jira] [Commented] (HDFS-13732) Erasure Coding policy name is not coming when the new policy is set
[ https://issues.apache.org/jira/browse/HDFS-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581061#comment-16581061 ] Zsolt Venczel commented on HDFS-13732:
--
The failed tests pass locally with or without the patch; therefore they should be unrelated:
{code}
[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hdfs.client.impl.TestBlockReaderLocal
[INFO] Tests run: 38, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.179 s - in org.apache.hadoop.hdfs.client.impl.TestBlockReaderLocal
[INFO] Running org.apache.hadoop.hdfs.TestSafeModeWithStripedFile
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 25.843 s - in org.apache.hadoop.hdfs.TestSafeModeWithStripedFile
[INFO] Running org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 40.357 s - in org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus
[INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.797 s - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
[INFO] Running org.apache.hadoop.hdfs.TestMaintenanceState
[INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 350.793 s - in org.apache.hadoop.hdfs.TestMaintenanceState
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 75, Failures: 0, Errors: 0, Skipped: 0
{code}
> Erasure Coding policy name is not coming when the new policy is set
> ---
>
> Key: HDFS-13732
> URL: https://issues.apache.org/jira/browse/HDFS-13732
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: erasure-coding, tools
> Affects Versions: 3.0.0
> Reporter: Soumyapn
> Assignee: Zsolt Venczel
> Priority: Trivial
> Attachments: EC_Policy.PNG, HDFS-13732.01.patch
>
> Scenario:
> If a new policy, apart from the default EC policy, is set for the HDFS
> directory, then the console message is coming as "Set default erasure coding
> policy on "
>
> Expected output:
> It would be good if the EC policy name is displayed when the policy is set...
>
> Actual output:
> Set default erasure coding policy on
>
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13732) Erasure Coding policy name is not coming when the new policy is set
[ https://issues.apache.org/jira/browse/HDFS-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13732: - Status: Patch Available (was: In Progress)
[jira] [Updated] (HDFS-13732) Erasure Coding policy name is not coming when the new policy is set
[ https://issues.apache.org/jira/browse/HDFS-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13732: - Attachment: HDFS-13732.01.patch
[jira] [Commented] (HDFS-13732) Erasure Coding policy name is not coming when the new policy is set
[ https://issues.apache.org/jira/browse/HDFS-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579574#comment-16579574 ] Zsolt Venczel commented on HDFS-13732: -- Hi [~SoumyaPN], Thanks for reporting the issue! Can you please extend the description of this issue with: 1) exact command you are executing 2) exact actual output 3) exact expected output Many thanks, Zsolt
[jira] [Comment Edited] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578717#comment-16578717 ] Zsolt Venczel edited comment on HDFS-13697 at 8/13/18 6:00 PM: --- Test failures seem to be unrelated; I could not reproduce them locally with or without my patch: {code:java} [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.hdfs.TestReadStripedFileWithMissingBlocks [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 132.276 s - in org.apache.hadoop.hdfs.TestReadStripedFileWithMissingBlocks [INFO] Running org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy [INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 54.574 s - in org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy [INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner [INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 129.645 s - in org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner [INFO] Running org.apache.hadoop.tracing.TestTracing [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.591 s - in org.apache.hadoop.tracing.TestTracing [INFO] [INFO] Results: [INFO] [INFO] Tests run: 17, Failures: 0, Errors: 0, Skipped: 0 [INFO] {code} was (Author: zvenczel): Test failures seem to be unrelated, I could not reproduce locally with or without my commit. > DFSClient should instantiate and cache KMSClientProvider using UGI at > creation time for consistent UGI handling > --- > > Key: HDFS-13697 > URL: https://issues.apache.org/jira/browse/HDFS-13697 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, > HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, > HDFS-13697.06.patch, HDFS-13697.07.patch, HDFS-13697.08.patch, > HDFS-13697.prelim.patch > > > While calling KeyProviderCryptoExtension decryptEncryptedKey, the call stack > might not have a doAs privileged execution call (in the DFSClient, for example). > This results in losing the proxy user from UGI, as UGI.getCurrentUser finds > no AccessControllerContext and does a re-login for the login user only. > This can cause the following, for example: if we have set up the oozie user to > be entitled to perform actions on behalf of example_user, but oozie is > forbidden to decrypt any EDEK (for security reasons), then, due to the above issue, > example_user's entitlements are lost from UGI and the following error is > reported: > {code} > [0] > SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] > JOB[0020905-180313191552532-oozie-oozi-W] > ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting > action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message > [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with > ACL name [encrypted_key]!!] > org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not > authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!! > at > org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463) > at > org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441) > at > org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523) > at >
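The fix proposed in the issue summary — capture the UGI once when the client is created and replay every key-provider call inside doAs() — can be sketched with a minimal, self-contained model. MiniUgi and KmsClientSketch below are invented stand-ins, not the real Hadoop UserGroupInformation/KMSClientProvider API; they only illustrate how a missing doAs() context falls back to the login user and loses the proxy user:

```java
import java.util.concurrent.Callable;

// Illustrative model only: getCurrentUser() falls back to the login user
// when no doAs() context is active, which is exactly how the proxy user
// (example_user) gets lost and the call runs as the login user (oozie).
class MiniUgi {
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();
    private static final String LOGIN_USER = "oozie";
    private final String user;

    MiniUgi(String user) { this.user = user; }

    static String getCurrentUser() {
        String u = CONTEXT.get();
        return u != null ? u : LOGIN_USER; // no context: login user only
    }

    <T> T doAs(Callable<T> action) throws Exception {
        String previous = CONTEXT.get();
        CONTEXT.set(user); // install this UGI for the duration of the call
        try {
            return action.call();
        } finally {
            if (previous == null) CONTEXT.remove(); else CONTEXT.set(previous);
        }
    }
}

class KmsClientSketch {
    private final MiniUgi creationUgi; // cached once, at client creation time

    KmsClientSketch(MiniUgi creationUgi) { this.creationUgi = creationUgi; }

    // Without the fix: the call runs as whoever is "current" right now.
    String decryptAsCurrentUser() {
        return MiniUgi.getCurrentUser();
    }

    // With the fix: replay the call under the UGI cached at creation time.
    String decryptAsCreationUgi() throws Exception {
        return creationUgi.doAs(MiniUgi::getCurrentUser);
    }
}
```

Outside any doAs() scope, the un-cached path resolves to "oozie" (and hence fails the DECRYPT_EEK ACL check), while the cached-UGI path still resolves to "example_user".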
[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578717#comment-16578717 ] Zsolt Venczel commented on HDFS-13697: -- Test failures seem to be unrelated, I could not reproduce locally with or without my commit: {code:java} [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.hdfs.TestReadStripedFileWithMissingBlocks [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 132.276 s - in org.apache.hadoop.hdfs.TestReadStripedFileWithMissingBlocks [INFO] Running org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy [INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 54.574 s - in org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy [INFO] Running org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner [INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 129.645 s - in org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner [INFO] Running org.apache.hadoop.tracing.TestTracing [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.591 s - in org.apache.hadoop.tracing.TestTracing [INFO] [INFO] Results: [INFO] [INFO] Tests run: 17, Failures: 0, Errors: 0, Skipped: 0 [INFO] {code}
[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578191#comment-16578191 ] Zsolt Venczel commented on HDFS-13697: -- Hi [~xiaochen], Thanks a lot for working on this and providing your solution in the prelim patch. While investigating the proposal to cache the UGI and prevent morphing, I came across the same set of failing tests your approach touched. The most interesting one is HDFS-9295 (a full-spectrum test by [~templedf]) in org.apache.hadoop.hdfs.TestAclsEndToEnd. This test suite covers all possible, expected variations of morphing, and I found that the following tests are not compatible with the cached-UGI approach: testGoodWithWhitelistWithoutBlacklist, testGoodWithKeyAcls, testGoodWithWhitelist, testGoodWithKeyAclsWithoutBlacklist. As these use cases have been around for a while, I'd expect them to be widely used and hard to avoid. What do you think? I've uploaded a new patch (v08) where I factored out the keyProvider injection for the DFSClient to happen via Mockito only. I also reverted the TestKMS changes as you suggested, leaving the HADOOP-13749 changes in.
[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13697: - Attachment: HDFS-13697.08.patch
[jira] [Commented] (HDFS-13770) dfsadmin -report does not always decrease "missing blocks (with replication factor 1)" metrics when file is deleted
[ https://issues.apache.org/jira/browse/HDFS-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578169#comment-16578169 ] Zsolt Venczel commented on HDFS-13770: -- Thanks for the update [~knanasi], patch v003 looks good, +1 (non-binding) from me. > dfsadmin -report does not always decrease "missing blocks (with replication > factor 1)" metrics when file is deleted > --- > > Key: HDFS-13770 > URL: https://issues.apache.org/jira/browse/HDFS-13770 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.7.7 >Reporter: Kitti Nanasi >Assignee: Kitti Nanasi >Priority: Major > Attachments: HDFS-13770-branch-2.001.patch, > HDFS-13770-branch-2.002.patch, HDFS-13770-branch-2.003.patch > > > The "missing blocks (with replication factor 1)" metric is not always decreased > when a file is deleted. > If a file is deleted, the remove function of UnderReplicatedBlocks can be > called with the wrong priority (UnderReplicatedBlocks.LEVEL). If it is called > with the wrong priority, the corruptReplOneBlocks metric is not decreased, > even though the block is removed from the priority queue that contains it. > The corresponding code: > {code:java} > /** remove a block from an under-replication queue */ > synchronized boolean remove(BlockInfo block, > int oldReplicas, > int oldReadOnlyReplicas, > int decommissionedReplicas, > int oldExpectedReplicas) { > final int priLevel = getPriority(oldReplicas, oldReadOnlyReplicas, > decommissionedReplicas, oldExpectedReplicas); > boolean removedBlock = remove(block, priLevel); > if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS && > oldExpectedReplicas == 1 && > removedBlock) { > corruptReplOneBlocks--; > assert corruptReplOneBlocks >= 0 : > "Number of corrupt blocks with replication factor 1 " + > "should be non-negative"; > } > return removedBlock; > } > /** > * Remove a block from the under replication queues. > * > * The priLevel parameter is a hint of which queue to query > * first: if negative or >= \{@link #LEVEL} this shortcutting > * is not attempted. > * > * If the block is not found in the nominated queue, an attempt is made to > * remove it from all queues. > * > * Warning: This is not a synchronized method. > * @param block block to remove > * @param priLevel expected priority level > * @return true if the block was found and removed from one of the priority > queues > */ > boolean remove(BlockInfo block, int priLevel) { > if(priLevel >= 0 && priLevel < LEVEL > && priorityQueues.get(priLevel).remove(block)) { > NameNode.blockStateChangeLog.debug( > "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block {}" + > " from priority queue {}", block, priLevel); > return true; > } else { > // Try to remove the block from all queues if the block was > // not found in the queue for the given priority level. > for (int i = 0; i < LEVEL; i++) { > if (i != priLevel && priorityQueues.get(i).remove(block)) { > NameNode.blockStateChangeLog.debug( > "BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block" + > " {} from priority queue {}", block, i); > return true; > } > } > } > return false; > } > {code} > It is already fixed on trunk by HDFS-10999, but that ticket > introduces new metrics, which I think shouldn't be backported to branch-2. >
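The metric bug quoted above can be reproduced with a small model of the two-step removal logic. UnderReplicatedModel below is a simplified, invented stand-in (plain lists, String block names), not the real BlockManager code; the decrement inside the fallback scan mirrors the HDFS-10999-style fix, so the metric is adjusted even when the caller passed the "wrong" priority (LEVEL):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the bug: remove() called with priLevel == LEVEL still
// removes the block via the fallback scan, but the original code only
// decremented corruptReplOneBlocks when priLevel == QUEUE_WITH_CORRUPT_BLOCKS,
// so the metric was never decreased. Here the fallback scan accounts for it.
class UnderReplicatedModel {
    static final int LEVEL = 5;
    static final int QUEUE_WITH_CORRUPT_BLOCKS = 4;

    final List<List<String>> priorityQueues = new ArrayList<>();
    int corruptReplOneBlocks = 0;

    UnderReplicatedModel() {
        for (int i = 0; i < LEVEL; i++) {
            priorityQueues.add(new ArrayList<>());
        }
    }

    void addCorruptReplOne(String block) {
        priorityQueues.get(QUEUE_WITH_CORRUPT_BLOCKS).add(block);
        corruptReplOneBlocks++;
    }

    boolean remove(String block, int priLevel, int oldExpectedReplicas) {
        // Fast path: the hinted queue actually contains the block.
        if (priLevel >= 0 && priLevel < LEVEL
            && priorityQueues.get(priLevel).remove(block)) {
            if (priLevel == QUEUE_WITH_CORRUPT_BLOCKS && oldExpectedReplicas == 1) {
                corruptReplOneBlocks--;
            }
            return true;
        }
        // Fallback scan over all queues (taken when priLevel == LEVEL).
        for (int i = 0; i < LEVEL; i++) {
            if (i != priLevel && priorityQueues.get(i).remove(block)) {
                // The fix: adjust the metric here as well.
                if (i == QUEUE_WITH_CORRUPT_BLOCKS && oldExpectedReplicas == 1) {
                    corruptReplOneBlocks--;
                }
                return true;
            }
        }
        return false;
    }
}
```

With the original logic, a delete that reached remove() with priLevel == LEVEL left corruptReplOneBlocks stuck at its old value; with the fallback-scan decrement, the counter returns to zero.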
[jira] [Commented] (HDFS-13788) Update EC documentation about rack fault tolerance
[ https://issues.apache.org/jira/browse/HDFS-13788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578168#comment-16578168 ] Zsolt Venczel commented on HDFS-13788: -- Thanks [~xiaochen] for reporting this issue and thanks [~knanasi] for working on the patch. The updated documentation looks fine. +1 (non-binding) from me. > Update EC documentation about rack fault tolerance > -- > > Key: HDFS-13788 > URL: https://issues.apache.org/jira/browse/HDFS-13788 > Project: Hadoop HDFS > Issue Type: Task > Components: documentation, erasure-coding >Affects Versions: 3.0.0 >Reporter: Xiao Chen >Assignee: Kitti Nanasi >Priority: Major > Attachments: HDFS-13788.001.patch > > > From > http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html: > {quote} > For rack fault-tolerance, it is also important to have at least as many racks > as the configured EC stripe width. For EC policy RS (6,3), this means > minimally 9 racks, and ideally 10 or 11 to handle planned and unplanned > outages. For clusters with fewer racks than the stripe width, HDFS cannot > maintain rack fault-tolerance, but will still attempt to spread a striped > file across multiple nodes to preserve node-level fault-tolerance. > {quote} > The theoretical minimum is 3 racks, and ideally 9 or more, so the document should > be updated. > (I didn't check timestamps, but this is probably because > {{BlockPlacementPolicyRackFaultTolerant}} wasn't completely done when > HDFS-9088 introduced this doc. Later there are also examples in > {{TestErasureCodingMultipleRacks}} to test this explicitly.)
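The "theoretical minimum is 3 racks" claim for RS(6,3) can be checked with a back-of-the-envelope calculation. Assuming block units are spread as evenly as possible across racks, the cluster survives the loss of any single rack when the largest rack holds at most `parity` units, i.e. when ceil((data + parity) / racks) <= parity; the smallest such rack count is ceil((data + parity) / parity). This is an illustrative sketch of that arithmetic, not code from any patch:

```java
// Minimum rack count to tolerate a single whole-rack failure for an EC
// policy with `dataUnits` data blocks and `parityUnits` parity blocks,
// assuming units are placed as evenly as possible across racks.
class EcRackMath {
    static int minRacksForSingleRackFaultTolerance(int dataUnits, int parityUnits) {
        int totalUnits = dataUnits + parityUnits;
        // Smallest r with ceil(totalUnits / r) <= parityUnits, i.e.
        // ceil(totalUnits / parityUnits), computed as integer ceil division.
        return (totalUnits + parityUnits - 1) / parityUnits;
    }
}
```

For RS(6,3) this gives 3 racks (3 units per rack, and losing any one rack loses exactly the 3 units the parity can reconstruct), matching the minimum stated in the issue rather than the 9 racks claimed by the old documentation.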
[jira] [Commented] (HDFS-13770) dfsadmin -report does not always decrease "missing blocks (with replication factor 1)" metrics when file is deleted
[ https://issues.apache.org/jira/browse/HDFS-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576460#comment-16576460 ] Zsolt Venczel commented on HDFS-13770: -- Hi [~knanasi], Thanks for working on this and thank you for the latest patch. Your changes look fine to me. I've checked that the test fails without the fix and passes with the fix applied. I found a few checkstyle issues in UnderReplicatedBlocks (line 265) and TestUnderReplicatedBlocks (lines 164, 175 and 186). Best regards, Zsolt
[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13697: - Attachment: HDFS-13697.07.patch > DFSClient should instantiate and cache KMSClientProvider using UGI at > creation time for consistent UGI handling > --- > > Key: HDFS-13697 > URL: https://issues.apache.org/jira/browse/HDFS-13697 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zsolt Venczel >Assignee: Zsolt Venczel >Priority: Major > Attachments: HDFS-13697.01.patch, HDFS-13697.02.patch, > HDFS-13697.03.patch, HDFS-13697.04.patch, HDFS-13697.05.patch, > HDFS-13697.06.patch, HDFS-13697.07.patch > > > While calling KeyProviderCryptoExtension decryptEncryptedKey the call stack > might not have a doAs privileged execution call (in the DFSClient for example). > This results in losing the proxy user from UGI as UGI.getCurrentUser finds > no AccessControllerContext and does a re-login for the login user only. > This can cause the following for example: if we have set up the oozie user to > be entitled to perform actions on behalf of example_user but oozie is > forbidden to decrypt any EDEK (for security reasons), due to the above issue, > example_user entitlements are lost from UGI and the following error is > reported: > {code} > [0] > SERVER[xxx] USER[example_user] GROUP[-] TOKEN[] APP[Test_EAR] > JOB[0020905-180313191552532-oozie-oozi-W] > ACTION[0020905-180313191552532-oozie-oozi-W@polling_dir_path] Error starting > action [polling_dir_path]. ErrorType [ERROR], ErrorCode [FS014], Message > [FS014: User [oozie] is not authorized to perform [DECRYPT_EEK] on key with > ACL name [encrypted_key]!!] > org.apache.oozie.action.ActionExecutorException: FS014: User [oozie] is not > authorized to perform [DECRYPT_EEK] on key with ACL name [encrypted_key]!! 
> at > org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463) > at > org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:441) > at > org.apache.oozie.action.hadoop.FsActionExecutor.touchz(FsActionExecutor.java:523) > at > org.apache.oozie.action.hadoop.FsActionExecutor.doOperations(FsActionExecutor.java:199) > at > org.apache.oozie.action.hadoop.FsActionExecutor.start(FsActionExecutor.java:563) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) > at org.apache.oozie.command.XCommand.call(XCommand.java:286) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: org.apache.hadoop.security.authorize.AuthorizationException: User > [oozie] is not authorized to perform [DECRYPT_EEK] on key with ACL name > [encrypted_key]!! 
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:157) > at > org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:607) > at > org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:565) > at > org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:832) > at > org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:209) > at > org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$5.call(LoadBalancingKMSClientProvider.java:205) > at > org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94) > at > org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:205) > at > org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388) > at >
[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570178#comment-16570178 ] Zsolt Venczel commented on HDFS-13697: -- Thanks a lot [~xiaochen] for the support on this task. It's a challenging one indeed :) Please find my answers below: {quote}Ideally we want to do the same as DFSClient, where a ugi of {{UGI#getCurrentUser}} is just cached at construction time, and used for later auths. I tried that but it caused test failures in TestKMS with the {{doWebHDFSProxyUserTest}} tests and {{testTGTRenewal}} - for the sake of compatibility I think we can do something like this to allow the tests to pass. {code:java} // in KMSCP ctor ugi = UserGroupInformation.getCurrentUser().getRealUser() == null ? UserGroupInformation.getCurrentUser() : UserGroupInformation.getCurrentUser().getRealUser(); {code} [~daryn] [~xyao] [~jnp] what do you think? {quote} The tests are failing because with the above approach we do not support the scenario in which the user component provides new entitlements for KMS interactions through a doAs call (e.g. it calls the 'createConnection' function with a proxy user implicitly provided in a doAs context). If we do want to be compatible, caching the UGI at construction time is not enough. {quote} We don't need cachedProxyUgi, and getDoAsUser can figure things out from the ugi cached if we do the above {quote} I was trying to introduce some clean code here by defining explicitly under what circumstances we can have a cachedProxyUgi; this also moves one computation to the constructor instead of repeating it in getDoAsUser. Does this make sense? {quote} ugiToUse doesn't seem necessary {quote} I was trying to make the code more meaningful; also, to support the proxy scenario mentioned above, we still need to check whether the current call (currentUgi) introduces a proxy ugi. {quote} Could you explain why the setLoginUser lines were removed in TestKMS? 
I'd like to make sure existing tests pass as-is, if possible. {quote} I've reverted HADOOP-13749, and these lines were introduced by it. I'm not sure if it makes sense to set the login user even after the revert. What do you think? {quote} the new com.google imports should be placed next to other existing imports of that module. {quote} Thanks for checking, I've fixed it in my latest patch. {quote} I would not call the KeyProvider variable testKeyProvider - it's used for all purposes. Just the VisibleForTesting annotation on setKeyProvider would be enough, which you already have. {quote} Yes, it makes sense, I've fixed it in my latest patch. In the long run I might refactor these test cases to use Mockito to reduce production code complexity. {quote} The new patch's KeyProviderSupplier#isKeyProviderCreated doesn't seem necessary. We can't prevent the caller calling getKeyProvider after calling close here from that check. (We probably can add a guard in DFSClient to prevent all API calls after close, but that's separate from this jira.) {quote} KeyProviderSupplier#isKeyProviderCreated is the only way to know for sure whether the KeyProvider got instantiated. If we call keyProviderCache.get() in the close method we might end up creating a KeyProvider unnecessarily. I agree that we should take care of any post-closure calls separately. {quote} Although callers seem to have check about nullity of the provider, if DFSClient failed to create a key provider, it's preferred to throw immediately. {quote} I was trying to reproduce the existing behavior of KeyProviderCache, which returned null and emitted warn-level log messages. Should we change that? 
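The isKeyProviderCreated point above (close the provider only if it was ever instantiated, without forcing instantiation just to close it) can be illustrated with a minimal memoizing supplier. This is a hedged sketch in plain Java, not Guava's Suppliers#memoize and not Hadoop's KeyProviderSupplier; the class and method names are hypothetical.

```java
import java.util.function.Supplier;

/** Lazily creates a value once; exposes whether creation ever happened. */
class MemoizingSupplier<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private volatile boolean created;
    private T value;

    MemoizingSupplier(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized T get() {
        if (!created) {           // only the first caller runs the delegate
            value = delegate.get();
            created = true;
        }
        return value;
    }

    /** Lets a close() path skip creating the value just to close it. */
    boolean isCreated() {
        return created;
    }
}
```

A close() path would then call isCreated() first and only fetch and close the value when it returns true, which is the behavior the comment above argues for.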
[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13697: - Attachment: HDFS-13697.06.patch
[jira] [Commented] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560799#comment-16560799 ] Zsolt Venczel commented on HDFS-13697: -- Thank you [~xiaochen] for doing the review, much appreciated! Please find my answers below: {quote} * Per our discussion, we should be just caching UGI#getCurrentUser() at ctor. * Because doAsUser depends on the UGI, I think it would also make sense to cache that String at ctor.{quote} In order to support proxy users and the functionality introduced by HADOOP-10698, the current user and the doAsUser string cannot be cached. HADOOP-10698 does an in-flight calculation as well (at line 385) that I was trying to consolidate into the getDoAsUser function to reuse logic. This requirement is also verified by the TestKMS#testProxyUserKerb and TestKMS#testProxyUserSimple tests. Please let me know if I misunderstood the intentions in the code. {quote} * Do we really need the supplier? It seems for each client the keyprovider will only be created once. If so I'd suggest we avoid caching the Supplier here. * {code:java} public KeyProvider getKeyProvider() { return provider==null ? keyProviderSupplier.get() : provider; } {code} need to handle the race condition here that multiple threads calling this method may end up creating more than 1 provider. {quote} Suppliers#memoize caches the output of the supplier (the KeyProvider instance in this case), not the supplier itself. It also does this in a thread-safe way, so no more than one provider is created. {quote} * trivial, SafeModeAction changes are unrelated{quote} Thanks for checking; I've removed it in the latest patch. {quote} can do a VisibleForTesting setKeyProvider method, so TestEncryptionZones and TestReservedRawPaths don't have to be modified. {quote} Thanks for the hint, it was a remnant of a revert commit. I updated the patch as you suggested. 
In the latest patch I added a check to close the keyProvider only if it was created, and also made the test key provider more explicit by renaming it to "testKeyProvider".
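The real-user fallback from the KMSClientProvider constructor snippet quoted in the earlier comment (ugi = getCurrentUser().getRealUser() == null ? getCurrentUser() : getRealUser()) can be sketched with stand-in types. User and UgiResolver below are hypothetical illustrations, not Hadoop's UserGroupInformation: a proxy user carries a reference to its real (login) user, and the resolver falls back to it.

```java
/** Hypothetical stand-in for a UGI-like user; realUser is null for plain users. */
class User {
    private final String name;
    private final User realUser; // non-null only when this is a proxy user

    User(String name, User realUser) {
        this.name = name;
        this.realUser = realUser;
    }

    User getRealUser() { return realUser; }
    String getName() { return name; }
}

class UgiResolver {
    /** Proxy users resolve to their real user; plain users resolve to themselves. */
    static User resolve(User current) {
        return current.getRealUser() == null ? current : current.getRealUser();
    }
}
```

This is why caching the result at construction time loses information for the proxy scenario discussed above: the resolution deliberately discards the proxied identity and keeps only the real user.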
[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13697: - Summary: DFSClient should instantiate and cache KMSClientProvider using UGI at creation time for consistent UGI handling (was: DFSClient should instantiate and cache KMSClientProvider at creation time for consistent UGI handling)
[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13697: - Attachment: HDFS-13697.05.patch
[jira] [Updated] (HDFS-13697) DFSClient should instantiate and cache KMSClientProvider at creation time for consistent UGI handling
[ https://issues.apache.org/jira/browse/HDFS-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zsolt Venczel updated HDFS-13697: - Attachment: (was: HDFS-13697.05.patch)