Re: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk
Sorry, the formatting got messed up by my email client. Here it is again.

Dear Hadoop Community Members,

We had multiple community discussions, a few meetings in smaller groups and also jira discussions with respect to this thread. We express our gratitude for participation and valuable comments.

The key questions raised were the following:
1) How do the new block storage layer and OzoneFS benefit HDFS? We were asked to chalk out a roadmap towards the goal of a scalable namenode working with the new storage layer.
2) We were asked to provide a security design.
3) There were questions around stability, given Ozone brings in a large body of code.
4) Why can't they be separate projects forever, or merged in when production ready?

We have responded to all the above questions with detailed explanations and answers on the jira as well as in the discussions. We believe that should sufficiently address the community's concerns. Please see the summary below:

1) The new code base benefits HDFS scaling and a roadmap has been provided. Summary:
- The new block storage layer addresses the scalability of the block layer. We have shown how the existing NN can be connected to the new block layer and its benefits. We have shown 2 milestones; the 1st milestone is much simpler than the 2nd while giving almost the same scaling benefits. Originally we had proposed simply milestone 2, and the community felt that removing the FSN/BM lock was a fair amount of work and a simpler solution would be useful.
- We provide a new K-V namespace called Ozone FS with FileSystem/FileContext plugins to allow users to use the new system (a short sketch of what this looks like to an application follows this message). BTW Hive and Spark work very well on K-V namespaces on the cloud. This will facilitate stabilizing the new block layer.
- The new block layer has a new Netty-based protocol engine in the Datanode which, when stabilized, can be used by the old HDFS block layer. See details below on sharing of code.

2) Stability impact on the existing HDFS code base and code separation. The new block layer and OzoneFS are in modules that are separate from the old HDFS code - currently there are no calls from HDFS into Ozone except for the DN starting the new block layer module if configured to do so. It does not add instability (the instability argument has been raised many times). Over time as we share code, we will ensure that the old HDFS continues to remain stable. (For example, we plan to stabilize the new Netty-based protocol engine in the new block layer before sharing it with HDFS's old block layer.)

3) In the short term and medium term, the new system and HDFS will be used side-by-side by users: side-by-side in the short term for testing, and side-by-side in the medium term for actual production use till the new system has feature parity with old HDFS. During this time, sharing the DN daemon and admin functions between the two systems is operationally important:
- Sharing the DN daemon to avoid additional operational daemon lifecycle management.
- Common decommissioning of the daemon and DN: one place to decommission a node and its storage.
- Replacing failed disks and internally balancing capacity across disks - this needs to be done for both the current HDFS blocks and the new block-layer blocks.
- Balancer: we would like to use the same balancer and provide a common way to balance, plus common management of the bandwidth used for balancing.
- Security configuration setup - reuse the existing setup for DNs rather than a new one for an independent cluster.

4) Need to easily share the block layer code between the two systems when used side-by-side. Areas where sharing code is desired over time:
- Sharing the new block layer's Netty-based protocol engine for old HDFS DNs (a long-time sore issue for the HDFS block layer).
- Shallow data copy from the old system to the new system is practical only if within the same project and daemon; otherwise one has to deal with security settings and coordination across daemons. Shallow copy is useful as customers migrate from old to new.
- Shared disk scheduling in the future, and in the short term a single round robin rather than independent round robins.
While sharing code across projects is technically possible (anything is possible in software), it is significantly harder, typically requiring cleaner public APIs etc. Sharing within a project through internal APIs is often simpler (such as the protocol engine that we want to share).

5) The security design, including a threat model and the solution, has been posted.

6) Temporary separation and merge later: several of the comments in the jira have argued that we temporarily separate the two code bases for now and then later merge them when the new code is stable:
- If there is agreement to merge later, why bother separating now - there need to be good reasons to separate now. We have addressed the stability and separation of the new code from the existing code above.
- Merge
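As an illustration of the FileSystem/FileContext plugin point above, here is a minimal sketch of how an application could consume the new namespace through the generic Hadoop API. The scheme name, URI layout and implementation class are assumptions for illustration, not the final Ozone bindings.

{code:java}
// Illustrative only: applications written to the generic FileSystem API
// (as Hive and Spark are) need just a configuration change to run
// against the new K-V namespace.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OzoneFsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // fs.<scheme>.impl is the standard Hadoop plugin registration;
    // the scheme and class below are hypothetical
    conf.set("fs.o3fs.impl", "org.apache.hadoop.fs.ozone.OzoneFileSystem");
    FileSystem fs = FileSystem.get(URI.create("o3fs://bucket.volume/"), conf);
    try (FSDataOutputStream out = fs.create(new Path("/data/part-0000"))) {
      out.writeBytes("hello ozone");
    }
  }
}
{code}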
[jira] [Created] (HADOOP-15231) WavefrontSink for Hadoop Metrics2
Howard Yoo created HADOOP-15231:
-----------------------------------
Summary: WavefrontSink for Hadoop Metrics2
Key: HADOOP-15231
URL: https://issues.apache.org/jira/browse/HADOOP-15231
Project: Hadoop Common
Issue Type: Wish
Components: metrics
Reporter: Howard Yoo

Wavefront is a SaaS-based, large-scale, real-time metrics monitoring and analytics system capable of monitoring many different source systems. There are several sinks available in Hadoop to capture various metrics to external systems. The Wavefront data format follows a format similar to Graphite's, with the addition of native point tag support and a source value. The details are outlined here: [https://docs.wavefront.com/wavefront_data_format.html] It would be greatly helpful for Wavefront to have a native integration with Hadoop, using a WavefrontSink to collect metrics in its native data format.
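A sketch of what such a sink could look like against the Metrics2 MetricsSink plugin interface follows. The class name, package, configuration keys and plaintext socket handling are illustrative assumptions, not a Wavefront-provided integration; the emitted line follows the documented "name value epochSeconds source=host tag=value" shape, and the commons-configuration2 flavour of the plugin API is assumed.

{code:java}
package org.example.metrics; // hypothetical package

import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import org.apache.commons.configuration2.SubsetConfiguration;
import org.apache.hadoop.metrics2.AbstractMetric;
import org.apache.hadoop.metrics2.MetricsException;
import org.apache.hadoop.metrics2.MetricsRecord;
import org.apache.hadoop.metrics2.MetricsSink;
import org.apache.hadoop.metrics2.MetricsTag;

public class WavefrontSink implements MetricsSink {
  private Writer writer;
  private String source;

  @Override
  public void init(SubsetConfiguration conf) {
    String host = conf.getString("server_host", "localhost"); // hypothetical keys
    int port = conf.getInt("server_port", 2878);
    source = conf.getString("source", "hadoop");
    try {
      writer = new OutputStreamWriter(
          new Socket(host, port).getOutputStream(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new MetricsException("Error connecting to " + host + ":" + port, e);
    }
  }

  @Override
  public void putMetrics(MetricsRecord record) {
    long epochSeconds = record.timestamp() / 1000; // Wavefront expects seconds
    StringBuilder tags = new StringBuilder();
    for (MetricsTag tag : record.tags()) {
      if (tag.value() != null) { // point tags: key="value"
        tags.append(' ').append(tag.name()).append("=\"")
            .append(tag.value()).append('"');
      }
    }
    try {
      for (AbstractMetric metric : record.metrics()) {
        writer.write(record.context() + "." + record.name() + "."
            + metric.name() + " " + metric.value() + " " + epochSeconds
            + " source=" + source + tags + "\n");
      }
    } catch (IOException e) {
      throw new MetricsException("Error writing metric", e);
    }
  }

  @Override
  public void flush() {
    try {
      writer.flush();
    } catch (IOException e) {
      throw new MetricsException("Error flushing metrics", e);
    }
  }
}
{code}

It would then be wired up through the usual hadoop-metrics2.properties mechanism, e.g. a hypothetical entry like {{namenode.sink.wavefront.class=org.example.metrics.WavefrontSink}}.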
Re: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk
Dear Hadoop Community Members,

We had multiple community discussions, a few meetings in smaller groups and also jira discussions with respect to this thread. We express our gratitude for participation and valuable comments.

The key questions raised were the following:
1) How do the new block storage layer and OzoneFS benefit HDFS? We were asked to chalk out a roadmap towards the goal of a scalable namenode working with the new storage layer.
2) We were asked to provide a security design.
3) There were questions around stability, given Ozone brings in a large body of code.
4) Why can't they be separate projects forever, or merged in when production ready?

We have responded to all the above questions with detailed explanations and answers on the jira as well as in the discussions. We believe that should sufficiently address the community's concerns. Please see the summary below:

1) The new code base benefits HDFS scaling and a roadmap has been provided. Summary:
- The new block storage layer addresses the scalability of the block layer. We have shown how the existing NN can be connected to the new block layer and its benefits. We have shown 2 milestones; the 1st milestone is much simpler than the 2nd while giving almost the same scaling benefits. Originally we had proposed simply milestone 2, and the community felt that removing the FSN/BM lock was a fair amount of work and a simpler solution would be useful.
- We provide a new K-V namespace called Ozone FS with FileSystem/FileContext plugins to allow users to use the new system. BTW Hive and Spark work very well on K-V namespaces on the cloud. This will facilitate stabilizing the new block layer.
- The new block layer has a new Netty-based protocol engine in the Datanode which, when stabilized, can be used by the old HDFS block layer. See details below on sharing of code.

2) Stability impact on the existing HDFS code base and code separation. The new block layer and OzoneFS are in modules that are separate from the old HDFS code - currently there are no calls from HDFS into Ozone except for the DN starting the new block layer module if configured to do so. It does not add instability (the instability argument has been raised many times). Over time as we share code, we will ensure that the old HDFS continues to remain stable. (For example, we plan to stabilize the new Netty-based protocol engine in the new block layer before sharing it with HDFS's old block layer.)

3) In the short term and medium term, the new system and HDFS will be used side-by-side by users: side-by-side in the short term for testing, and side-by-side in the medium term for actual production use till the new system has feature parity with old HDFS. During this time, sharing the DN daemon and admin functions between the two systems is operationally important:
- Sharing the DN daemon to avoid additional operational daemon lifecycle management.
- Common decommissioning of the daemon and DN: one place to decommission a node and its storage.
- Replacing failed disks and internally balancing capacity across disks - this needs to be done for both the current HDFS blocks and the new block-layer blocks.
- Balancer: we would like to use the same balancer and provide a common way to balance, plus common management of the bandwidth used for balancing.
- Security configuration setup - reuse the existing setup for DNs rather than a new one for an independent cluster.

4) Need to easily share the block layer code between the two systems when used side-by-side. Areas where sharing code is desired over time:
- Sharing the new block layer's Netty-based protocol engine for old HDFS DNs (a long-time sore issue for the HDFS block layer).
- Shallow data copy from the old system to the new system is practical only if within the same project and daemon; otherwise one has to deal with security settings and coordination across daemons. Shallow copy is useful as customers migrate from old to new.
- Shared disk scheduling in the future, and in the short term a single round robin rather than independent round robins.
While sharing code across projects is technically possible (anything is possible in software), it is significantly harder, typically requiring cleaner public APIs etc. Sharing within a project through internal APIs is often simpler (such as the protocol engine that we want to share).

5) The security design, including a threat model and the solution, has been posted.

6) Temporary separation and merge later: several of the comments in the jira have argued that we temporarily separate the two code bases for now and then later merge them when the new code is stable:
- If there is agreement to merge later, why bother separating now - there need to be good reasons to separate now. We have addressed the stability and separation of the new code from the existing code above.
- Merging the new code back into HDFS later will be harder. The code and goals will diverge further. We will be taking on extra work to split and then take extra work to
[jira] [Created] (HADOOP-15230) org.apache.hadoop.metrics2.GraphiteSink is not implemented correctly
Howard Yoo created HADOOP-15230:
-----------------------------------
Summary: org.apache.hadoop.metrics2.GraphiteSink is not implemented correctly
Key: HADOOP-15230
URL: https://issues.apache.org/jira/browse/HADOOP-15230
Project: Hadoop Common
Issue Type: Bug
Components: metrics
Reporter: Howard Yoo

org.apache.hadoop.metrics2.GraphiteSink's implementation has certain problems that cause it to generate metrics incorrectly. The problem lies with lines 77 ~ 84 of the GraphiteSink java:

{code:java}
for (MetricsTag tag : record.tags()) {
    if (tag.value() != null) {
        metricsPathPrefix.append(".");
        metricsPathPrefix.append(tag.name());
        metricsPathPrefix.append("=");
        metricsPathPrefix.append(tag.value());
    }
}
{code}

It produces point tags as name=value pairs in the metrics. However, notice how the tags are added with '.' as the delimiter. Rather than using the '.' character, it should follow the convention mentioned in the latest Graphite docs of using the ';' character: [http://graphite.readthedocs.io/en/latest/tags.html] Also, the value is not properly escaped, meaning that if the value has a '.' character in it, Graphite can easily mistake it for a delimiter rather than part of the value. A prime example is when the value is a hostname or IP address:

{code:java}
metrics.example.Hostname=this.is.a.hostname.and.this.is.Metrics 10.0{code}

Since the value of the hostname contains '.', it is extremely hard for the receiving end to determine which part is the hostname and which part is the rest of the metric name. A good strategy is to convert any '.' character in the value to another character, such as '_'. However, the best way would be to follow the latest metrics convention of using ';'.
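For illustration, the two mitigations described above could look like the helpers below (a sketch, not a committed patch; option 2 follows the ';' tag syntax from the Graphite docs linked above):

{code:java}
import org.apache.hadoop.metrics2.MetricsRecord;
import org.apache.hadoop.metrics2.MetricsTag;

final class GraphiteTagFormat {
  /** Option 1: escape '.' so a value like "10.0.0.1" cannot be read as path segments. */
  static String escapeTagValue(String value) {
    return value.replace('.', '_');
  }

  /** Option 2: append tags with ';', so '.' inside values stays unambiguous,
   *  e.g. "metrics.example;Hostname=this.is.a.hostname". */
  static String taggedMetricName(String metricName, MetricsRecord record) {
    StringBuilder name = new StringBuilder(metricName);
    for (MetricsTag tag : record.tags()) {
      if (tag.value() != null) {
        name.append(';').append(tag.name()).append('=').append(tag.value());
      }
    }
    return name.toString();
  }
}
{code}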
[jira] [Resolved] (HADOOP-14530) Translate AWS SSE-KMS missing key exception to something
[ https://issues.apache.org/jira/browse/HADOOP-14530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-14530.
-------------------------------------
Resolution: Won't Fix
Fix Version/s: 3.1.0

There's nothing meaningful to change it to; documented the meaning instead (no key, key in different region)

> Translate AWS SSE-KMS missing key exception to something
> ---------------------------------------------------------
>
> Key: HADOOP-14530
> URL: https://issues.apache.org/jira/browse/HADOOP-14530
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Priority: Minor
> Fix For: 3.1.0
>
> when you use SSE-KMS and the ARN is invalid for that region, you get a 400 bad request exception + special error text "KMS.NotFoundException". This could be a special exception
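For illustration only, the kind of translation considered (and closed as Won't Fix) might have looked like this hypothetical helper keyed off the error text quoted above; this is not actual S3A code (AmazonServiceException is the AWS SDK v1 type):

{code:java}
import java.io.IOException;
import com.amazonaws.AmazonServiceException;

final class KmsErrorTranslation {
  static IOException translate(String path, AmazonServiceException e) {
    // 400 + "KMS.NotFoundException" marker: no such key, or key in another region
    if (e.getStatusCode() == 400
        && String.valueOf(e.getMessage()).contains("KMS.NotFoundException")) {
      return new IOException("SSE-KMS key not found working with " + path
          + ": no such key, or key registered in a different region", e);
    }
    return new IOException(e);
  }
}
{code}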
[jira] [Resolved] (HADOOP-14325) Stabilise S3A Server Side Encryption
[ https://issues.apache.org/jira/browse/HADOOP-14325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-14325.
-------------------------------------
Resolution: Fixed
Fix Version/s: 3.1.0

> Stabilise S3A Server Side Encryption
> ------------------------------------
>
> Key: HADOOP-14325
> URL: https://issues.apache.org/jira/browse/HADOOP-14325
> Project: Hadoop Common
> Issue Type: Task
> Components: documentation, fs/s3, test
> Affects Versions: 2.9.0
> Reporter: Steve Loughran
> Priority: Major
> Fix For: 3.1.0
>
> Round off the S3 SSE encryption support with everything needed to safely ship it. The core code is in, along with tests, so this covers the details:
> * docs with examples, including JCEKS files
> * keeping secrets secret
> * any more tests, including scale ones (huge file, rename)
> * I'll add a KMS test to my (github) spark suite
[jira] [Resolved] (HADOOP-14332) Document S3A SSE
[ https://issues.apache.org/jira/browse/HADOOP-14332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-14332.
-------------------------------------
Resolution: Duplicate
Assignee: Steve Loughran

> Document S3A SSE
> ----------------
>
> Key: HADOOP-14332
> URL: https://issues.apache.org/jira/browse/HADOOP-14332
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: documentation, fs/s3
> Affects Versions: 2.9.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
>
> Go into some detail about effective (secure) use of S3 SSE:
> * features, benefits, limitations
> * how to lock down a bucket
> * what to avoid (SSE-C; mixed-security buckets)
> * performance impact
> * what the new stack traces mean
[jira] [Created] (HADOOP-15228) S3A Retry policy to retry on NoResponseException
Steve Loughran created HADOOP-15228:
-----------------------------------
Summary: S3A Retry policy to retry on NoResponseException
Key: HADOOP-15228
URL: https://issues.apache.org/jira/browse/HADOOP-15228
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Affects Versions: 3.1.0
Reporter: Steve Loughran

Treat `org.apache.http.NoHttpResponseException: hwdev-rajesh-new2.s3.amazonaws.com:443 failed to respond` as something which can be retried on idempotent calls. Need to handle the shaded as well as the unshaded binding here.
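A sketch of the requested behaviour, assuming a standalone helper rather than the actual S3A retry policy classes; matching on the exception class name is one way to cover both the shaded and unshaded http client bindings:

{code:java}
import java.util.concurrent.Callable;

final class IdempotentRetry {
  static <T> T run(Callable<T> op, boolean idempotent, int maxAttempts)
      throws Exception {
    for (int attempt = 1; ; attempt++) {
      try {
        return op.call();
      } catch (Exception e) {
        // covers org.apache.http.NoHttpResponseException and its shaded copy
        boolean noResponse =
            e.getClass().getName().endsWith("NoHttpResponseException");
        // only retry this failure mode, and only for idempotent calls
        if (!noResponse || !idempotent || attempt >= maxAttempts) {
          throw e;
        }
        Thread.sleep(1000L * attempt); // simple linear backoff
      }
    }
  }
}
{code}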
[jira] [Created] (HADOOP-15227) add mapreduce.outputcommitter.factory.scheme.s3a to core-default
Steve Loughran created HADOOP-15227:
-----------------------------------
Summary: add mapreduce.outputcommitter.factory.scheme.s3a to core-default
Key: HADOOP-15227
URL: https://issues.apache.org/jira/browse/HADOOP-15227
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Affects Versions: 3.1.0
Reporter: Steve Loughran
Assignee: Steve Loughran

Need to add this property to core-default.xml. It's documented as being there, but it isn't.

{code}
<property>
  <name>mapreduce.outputcommitter.factory.scheme.s3a</name>
  <value>org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory</value>
  <description>
    The committer factory to use when writing data to S3A filesystems.
  </description>
</property>
{code}
Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/135/

[Feb 12, 2018 3:02:42 PM] (arp) HDFS-10453. ReplicationMonitor thread could stuck for long time due to
[Feb 12, 2018 4:55:44 PM] (brahma) HDFS-8693. Addendum patch to execute the command using UGI. Contributed
[Feb 12, 2018 8:30:42 PM] (jlowe) MAPREDUCE-7048. Uber AM can crash due to unknown task in statusUpdate.

-1 overall

The following subsystems voted -1: asflicense unit xml

The following subsystems voted -1 but were configured to be filtered/ignored: cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s): unit

Specific tests:

Unreaped Processes:
hadoop-common:1 hadoop-hdfs:28 bkjournal:5 hadoop-yarn-server-resourcemanager:1 hadoop-yarn-client:8 hadoop-yarn-applications-distributedshell:1 hadoop-mapreduce-client-app:2 hadoop-mapreduce-client-jobclient:13 hadoop-distcp:4 hadoop-extras:1

Failed junit tests:
hadoop.hdfs.server.datanode.TestReadOnlySharedStorage
hadoop.hdfs.TestDatanodeDeath
hadoop.hdfs.TestSetrepIncreasing
hadoop.hdfs.TestDataTransferProtocol
hadoop.hdfs.server.datanode.TestIncrementalBrVariations
hadoop.hdfs.TestDFSFinalize
hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime
hadoop.yarn.server.TestDiskFailures
hadoop.yarn.client.TestGetGroups
hadoop.yarn.client.TestResourceManagerAdministrationProtocolPBClientImpl
hadoop.mapred.TestJavaSerialization
hadoop.mapred.TestClientRedirect
hadoop.mapred.TestReduceFetch
hadoop.mapred.TestLocalJobSubmission
hadoop.mapreduce.security.TestBinaryTokenFile
hadoop.mapred.TestFileInputFormatPathFilter
hadoop.mapred.TestTextInputFormat
hadoop.mapreduce.security.TestMRCredentials
hadoop.fs.TestDFSIO
hadoop.mapred.TestJobSysDirWithDFS
hadoop.tools.TestIntegration
hadoop.tools.TestDistCpViewFs
hadoop.yarn.sls.appmaster.TestAMSimulator
hadoop.resourceestimator.solver.impl.TestLpSolver
hadoop.resourceestimator.service.TestResourceEstimatorService

Timed out junit tests:
org.apache.hadoop.log.TestLogLevel
org.apache.hadoop.hdfs.TestLeaseRecovery2
org.apache.hadoop.hdfs.TestDatanodeRegistration
org.apache.hadoop.hdfs.TestBlocksScheduledCounter
org.apache.hadoop.hdfs.TestDFSClientFailover
org.apache.hadoop.hdfs.TestDFSClientRetries
org.apache.hadoop.hdfs.web.TestWebHdfsTokens
org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream
org.apache.hadoop.hdfs.TestFileAppendRestart
org.apache.hadoop.hdfs.web.TestWebHdfsWithRestCsrfPreventionFilter
org.apache.hadoop.hdfs.TestSeekBug
org.apache.hadoop.hdfs.TestDatanodeReport
org.apache.hadoop.hdfs.web.TestWebHDFS
org.apache.hadoop.hdfs.web.TestWebHDFSXAttr
org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs
org.apache.hadoop.hdfs.TestTrashWithSecureEncryptionZones
org.apache.hadoop.hdfs.TestDFSRollback
org.apache.hadoop.hdfs.TestMiniDFSCluster
org.apache.hadoop.hdfs.TestDistributedFileSystem
org.apache.hadoop.hdfs.web.TestWebHDFSForHA
org.apache.hadoop.hdfs.TestBalancerBandwidth
org.apache.hadoop.hdfs.TestTrashWithEncryptionZones
org.apache.hadoop.hdfs.TestSetTimes
org.apache.hadoop.hdfs.TestDFSShell
org.apache.hadoop.hdfs.web.TestWebHDFSAcl
org.apache.hadoop.contrib.bkjournal.TestBootstrapStandbyWithBKJM
org.apache.hadoop.contrib.bkjournal.TestBookKeeperJournalManager
org.apache.hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
org.apache.hadoop.contrib.bkjournal.TestBookKeeperAsHASharedDir
org.apache.hadoop.contrib.bkjournal.TestBookKeeperSpeculativeRead
org.apache.hadoop.yarn.server.resourcemanager.TestRMHA
org.apache.hadoop.yarn.client.TestRMFailover
org.apache.hadoop.yarn.client.cli.TestYarnCLI
org.apache.hadoop.yarn.client.TestApplicationMasterServiceProtocolOnHA
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
org.apache.hadoop.yarn.client.api.impl.TestYarnClientWithReservation
org.apache.hadoop.yarn.client.api.impl.TestYarnClient
org.apache.hadoop.yarn.client.api.impl.TestAMRMClient
org.apache.hadoop.yarn.client.api.impl.TestNMClient
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup
org.apache.hadoop.mapreduce.v2.app.TestJobEndNotifier
[jira] [Resolved] (HADOOP-14077) Improve the patch of HADOOP-13119
[ https://issues.apache.org/jira/browse/HADOOP-14077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas resolved HADOOP-14077.
------------------------------------
Resolution: Fixed

This has already been part of a release. Please leave it resolved.

> Improve the patch of HADOOP-13119
> ---------------------------------
>
> Key: HADOOP-14077
> URL: https://issues.apache.org/jira/browse/HADOOP-14077
> Project: Hadoop Common
> Issue Type: Improvement
> Components: security
> Reporter: Yuanbo Liu
> Assignee: Yuanbo Liu
> Priority: Major
> Fix For: 3.0.0-alpha4
>
> Attachments: HADOOP-14077.001.patch, HADOOP-14077.002.patch, HADOOP-14077.003.patch
>
> For some links (such as "/jmx, /stack"), blocking the links in filter chain due to impersonation issue is not friendly for users. For example, user "sam" is not allowed to be impersonated by user "knox", and the link "/jmx" doesn't need any user to do authorization by default. It only needs user "knox" to do authentication, in this case, it's not right to block the access in SPNEGO filter. We intend to check impersonation permission when the method "getRemoteUser" of request is used, so that such kind of links ("/jmx, /stack") would not be blocked by mistake.
[jira] [Resolved] (HADOOP-14620) S3A authentication failure for regions other than us-east-1
[ https://issues.apache.org/jira/browse/HADOOP-14620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-14620.
-------------------------------------
Resolution: Works for Me
Fix Version/s: 3.1.0

> S3A authentication failure for regions other than us-east-1
> ------------------------------------------------------------
>
> Key: HADOOP-14620
> URL: https://issues.apache.org/jira/browse/HADOOP-14620
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.7.3
> Reporter: Ilya Fourmanov
> Priority: Minor
> Fix For: 3.1.0
>
> Attachments: s3-403.txt
>
> hadoop fs s3a:// operations fail authentication for s3 buckets hosted in regions other than default us-east-1
> Steps to reproduce:
> # create s3 bucket in eu-west-1
> # Using IAM instance profile or fs.s3a.access.key/fs.s3a.secret.key run following command:
> {code}
> hadoop --loglevel DEBUG -D fs.s3a.endpoint=s3.eu-west-1.amazonaws.com -ls s3a://your-eu-west-1-hosted-bucket/
> {code}
> Expected behaviour: You will see listing of the bucket
> Actual behaviour: You will get 403 Authentication Denied response for AWS S3.
> Reason is mismatch in string to sign as defined in http://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html provided by hadoop and expected by AWS.
> If you use https://aws.amazon.com/code/199 to analyse StringToSignBytes returned by AWS, you will see that AWS expects CanonicalizedResource to be in form /your-eu-west-1-hosted-bucket{color:red}.s3.eu-west-1.amazonaws.com{color}/. Hadoop provides it as /your-eu-west-1-hosted-bucket/
> Note that AWS documentation doesn't explicitly state that endpoint or full dns address should be appended to CanonicalizedResource, however practice shows it is actually required.
> I've also submitted this to AWS for them to correct behaviour or documentation.
[jira] [Resolved] (HADOOP-13308) S3A delete and rename may fail to preserve parent directory.
[ https://issues.apache.org/jira/browse/HADOOP-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-13308.
-------------------------------------
Resolution: Won't Fix

> S3A delete and rename may fail to preserve parent directory.
> -------------------------------------------------------------
>
> Key: HADOOP-13308
> URL: https://issues.apache.org/jira/browse/HADOOP-13308
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Chris Nauroth
> Priority: Minor
>
> When a file or directory is deleted or renamed in S3A, and the result of that operation makes the parent empty, S3A must store a fake directory (a pure metadata object) at the parent to indicate that the directory still exists. The logic for restoring fake directories is not resilient to a process death. This may cause a directory to vanish unexpectedly after a deletion or rename of its last child.
[jira] [Reopened] (HADOOP-14077) Improve the patch of HADOOP-13119
[ https://issues.apache.org/jira/browse/HADOOP-14077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang reopened HADOOP-14077:
--------------------------------

> Improve the patch of HADOOP-13119
> ---------------------------------
>
> Key: HADOOP-14077
> URL: https://issues.apache.org/jira/browse/HADOOP-14077
> Project: Hadoop Common
> Issue Type: Improvement
> Components: security
> Reporter: Yuanbo Liu
> Assignee: Yuanbo Liu
> Priority: Major
> Fix For: 3.0.0-alpha4
>
> Attachments: HADOOP-14077.001.patch, HADOOP-14077.002.patch, HADOOP-14077.003.patch
>
> For some links (such as "/jmx, /stack"), blocking the links in filter chain due to impersonation issue is not friendly for users. For example, user "sam" is not allowed to be impersonated by user "knox", and the link "/jmx" doesn't need any user to do authorization by default. It only needs user "knox" to do authentication, in this case, it's not right to block the access in SPNEGO filter. We intend to check impersonation permission when the method "getRemoteUser" of request is used, so that such kind of links ("/jmx, /stack") would not be blocked by mistake.
[jira] [Created] (HADOOP-15226) Über-JIRA: S3Guard Phase III: Hadoop 3.2 features
Steve Loughran created HADOOP-15226:
-----------------------------------
Summary: Über-JIRA: S3Guard Phase III: Hadoop 3.2 features
Key: HADOOP-15226
URL: https://issues.apache.org/jira/browse/HADOOP-15226
Project: Hadoop Common
Issue Type: Improvement
Components: fs/s3
Affects Versions: 3.0.0, 3.1.0
Reporter: Steve Loughran

S3Guard features/improvements/fixes for Hadoop 3.2
[jira] [Resolved] (HADOOP-13713) ITestS3AContractRootDir.testRmEmptyRootDirNonRecursive failing intermittently
[ https://issues.apache.org/jira/browse/HADOOP-13713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-13713.
-------------------------------------
Resolution: Cannot Reproduce
Target Version/s: 3.1.0

> ITestS3AContractRootDir.testRmEmptyRootDirNonRecursive failing intermittently
> ------------------------------------------------------------------------------
>
> Key: HADOOP-13713
> URL: https://issues.apache.org/jira/browse/HADOOP-13713
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Environment: s3 ireland
> Reporter: Steve Loughran
> Priority: Major
>
> intermittent failure of {{ITestS3AContractRootDir.testRmEmptyRootDirNonRecursive}} surfacing in HADOOP-12774 test run.
> This is a test which came in with HADOOP-12977, one test which deletes all children of the root dir, then verifies that they are gone. Although it tested happily during development, the sightings of two transient failures before it worked implied that it's either got some race condition with previous tests and/or maven build, or we are seeing listing inconsistency.
[jira] [Resolved] (HADOOP-13271) Intermittent failure of TestS3AContractRootDir.testListEmptyRootDirectory
[ https://issues.apache.org/jira/browse/HADOOP-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-13271.
-------------------------------------
Resolution: Cannot Reproduce
Fix Version/s: 3.1.0

> Intermittent failure of TestS3AContractRootDir.testListEmptyRootDirectory
> --------------------------------------------------------------------------
>
> Key: HADOOP-13271
> URL: https://issues.apache.org/jira/browse/HADOOP-13271
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3, test
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Priority: Minor
> Fix For: 3.1.0
>
> I'm seeing an intermittent failure of {{TestS3AContractRootDir.testListEmptyRootDirectory}}
> The sequence of deleteFiles(listStatus(Path("/"))) is failing because the file to delete is root ...yet the code is passing in the children of /, not / itself.
> hypothesis: when you call listStatus on an empty root dir, you get a file entry back that says isFile, not isDirectory.
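A minimal probe for that hypothesis might look like the sketch below (illustrative, not the actual contract test; the bucket name is a placeholder):

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RootListingProbe {
  public static void main(String[] args) throws Exception {
    // list an empty root directory and fail if any entry claims "/" is a file
    FileSystem fs = FileSystem.get(
        URI.create("s3a://your-test-bucket/"), new Configuration());
    for (FileStatus st : fs.listStatus(new Path("/"))) {
      if (st.getPath().isRoot() && st.isFile()) {
        throw new AssertionError("root listed as a file: " + st);
      }
    }
  }
}
{code}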
[jira] [Resolved] (HADOOP-14531) Improve S3A error handling & reporting
[ https://issues.apache.org/jira/browse/HADOOP-14531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-14531.
-------------------------------------
Resolution: Fixed
Fix Version/s: 3.1.0

> Improve S3A error handling & reporting
> --------------------------------------
>
> Key: HADOOP-14531
> URL: https://issues.apache.org/jira/browse/HADOOP-14531
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 2.8.1
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Blocker
> Fix For: 3.1.0
>
> Improve S3a error handling and reporting. This includes:
> # looking at error codes and translating to more specific exceptions
> # better retry logic where present
> # adding retry logic where not present
> # more diagnostics in exceptions
> # docs
> Overall goals:
> * things that can be retried and will go away are retried for a bit
> * things that don't go away when retried fail fast (302, no auth, unknown host, connection refused)
> * meaningful exceptions are built in translate exception
> * diagnostics are included, where possible
> * our troubleshooting docs are expanded with new failures we encounter
> AWS S3 error codes: http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html
[jira] [Resolved] (HADOOP-13973) S3A GET/HEAD requests failing: java.lang.IllegalStateException: Connection is not open/Connection pool shut down
[ https://issues.apache.org/jira/browse/HADOOP-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-13973.
-------------------------------------
Resolution: Cannot Reproduce
Fix Version/s: 3.1.0

Haven't seen this ourselves, and with the retry logic in reopen() in the input stream, if the error is translated into an IOE, it will be handled by the S3A retry policy.

> S3A GET/HEAD requests failing: java.lang.IllegalStateException: Connection is not open/Connection pool shut down
> -----------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-13973
> URL: https://issues.apache.org/jira/browse/HADOOP-13973
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Environment: EC2 cluster
> Reporter: Rajesh Balamohan
> Assignee: Steve Loughran
> Priority: Major
> Fix For: 3.1.0
>
> S3 requests failing with an error coming from the Http client, "java.lang.IllegalStateException: Connection is not open"
> Some online discussion implies that this is related to shared connection pool shutdown & fixed in http client 4.4+. Hadoop & AWS SDK use v4.5.2, so the fix is in; we just need to make sure the pool is being set up right.
> There's a problem here of course: it may require moving to a later version of the AWS SDK, with the consequences on jackson, as seen in HADOOP-13050. And that's if there is a patched version out there
[jira] [Created] (HADOOP-15225) mvn javadoc:test-javadoc goal throws cannot find symbol
Andras Bokor created HADOOP-15225:
---------------------------------
Summary: mvn javadoc:test-javadoc goal throws cannot find symbol
Key: HADOOP-15225
URL: https://issues.apache.org/jira/browse/HADOOP-15225
Project: Hadoop Common
Issue Type: Bug
Reporter: Andras Bokor
Assignee: Andras Bokor

{code:java}
hadoop/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestReflectionUtils.java:28: error: cannot find symbol
[WARNING] import static org.hamcrest.CoreMatchers.containsString;
[WARNING] ^
[WARNING] symbol: static containsString
[WARNING] location: class{code}

This happens because mockito-all includes Hamcrest classes, but a different version. Let's take TestReflectionUtils as an example: {{import static org.hamcrest.CoreMatchers.containsString;}} will result in an error. Somehow mvn javadoc:test-javadoc finds the CoreMatchers class bundled in mockito-all on the classpath, which has no containsString method. From Mockito 2 on, mockito-all is discontinued, so HADOOP-14178 will solve this. Once HADOOP-14178 is resolved, this can be closed as well.
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/

[Feb 12, 2018 2:25:22 AM] (wangda) YARN-7906. Fix mvn site fails with error: Multiple sources of package
[Feb 12, 2018 2:27:15 AM] (wangda) YARN-5848. Remove unnecessary public/crossdomain.xml from YARN UIv2 sub
[Feb 12, 2018 2:28:35 AM] (wangda) YARN-7697. NM goes down with OOM due to leak in log-aggregation. (Xuan
[Feb 12, 2018 2:29:37 AM] (wangda) YARN-7739. DefaultAMSProcessor should properly check customized resource
[Feb 12, 2018 3:13:00 PM] (stevel) HADOOP-15187. Remove ADL mock test dependency on REST call invoked from
[Feb 12, 2018 3:17:40 PM] (arp) HDFS-10453. ReplicationMonitor thread could stuck for long time due to
[Feb 12, 2018 3:27:43 PM] (jlowe) YARN-7917. Fix failing test
[Feb 12, 2018 4:44:34 PM] (brahma) HDFS-8693. Addendum patch to execute the command using UGI. Contributed
[Feb 12, 2018 7:21:09 PM] (jlowe) MAPREDUCE-7048. Uber AM can crash due to unknown task in statusUpdate.
[Feb 12, 2018 9:50:10 PM] (jlowe) YARN-7914. Fix exit code handling for short lived Docker containers.

-1 overall

The following subsystems voted -1: findbugs unit xml

The following subsystems voted -1 but were configured to be filtered/ignored: cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s): unit

Specific tests:

FindBugs:
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
org.apache.hadoop.yarn.api.records.Resource.getResources() may expose internal representation by returning Resource.resources At Resource.java:by returning Resource.resources At Resource.java:[line 234]

Failed junit tests:
hadoop.hdfs.TestDFSStripedOutputStreamWithFailure
hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070
hadoop.hdfs.TestDFSStripedOutputStreamWithFailure120
hadoop.hdfs.TestErasureCodingPoliciesWithRandomECPolicy
hadoop.hdfs.TestUnsetAndChangeDirectoryEcPolicy
hadoop.hdfs.web.TestWebHdfsTimeouts
hadoop.hdfs.TestDFSStripedOutputStreamWithFailure060
hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage
hadoop.yarn.client.api.impl.TestAMRMClientPlacementConstraints
hadoop.mapreduce.v2.TestMRJobs

cc: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/artifact/out/diff-compile-cc-root.txt [4.0K]
javac: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/artifact/out/diff-compile-javac-root.txt [280K]
checkstyle: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/artifact/out/diff-checkstyle-root.txt [17M]
pylint: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/artifact/out/diff-patch-pylint.txt [24K]
shellcheck: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/artifact/out/diff-patch-shellcheck.txt [20K]
shelldocs: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/artifact/out/diff-patch-shelldocs.txt [12K]
whitespace: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/artifact/out/whitespace-eol.txt [9.2M]
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/artifact/out/whitespace-tabs.txt [292K]
xml: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/artifact/out/xml.txt [4.0K]
findbugs: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-warnings.html [8.0K]
javadoc: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/artifact/out/diff-javadoc-javadoc-root.txt [760K]
unit: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [400K]
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt [48K]
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt [16K]
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt [84K]
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/690/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core.txt [8.0K]

Powered by Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org
[jira] [Created] (HADOOP-15224) build up md5 checksum as blocks are built in S3ABlockOutputStream; validate upload
Steve Loughran created HADOOP-15224:
-----------------------------------
Summary: build up md5 checksum as blocks are built in S3ABlockOutputStream; validate upload
Key: HADOOP-15224
URL: https://issues.apache.org/jira/browse/HADOOP-15224
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Affects Versions: 3.0.0
Reporter: Steve Loughran

[~rdblue] reports sometimes he sees corrupt data on S3. Given the MD5 checks from upload to S3, it's likelier to have happened in VM RAM, HDD or nearby. If the MD5 checksum for each block were built up as data was written to it, and checked against the etag, RAM/HDD storage of the saved blocks could be removed as sources of corruption. The obvious place would be {{org.apache.hadoop.fs.s3a.S3ADataBlocks.DataBlock}}.
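A sketch of the idea, assuming a block abstraction along the lines of the {{S3ADataBlocks.DataBlock}} class named above (the class and method names here are illustrative):

{code:java}
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestingBlock {
  private final MessageDigest md5;

  DigestingBlock() throws NoSuchAlgorithmException {
    md5 = MessageDigest.getInstance("MD5");
  }

  void write(byte[] b, int off, int len) {
    md5.update(b, off, len); // digest tracks exactly what the block stored
    // ... append to the real buffer/disk block here ...
  }

  /** Hex digest to compare against the S3 ETag. Note: only single-part
   *  uploads have an ETag that is a plain MD5 of the object. */
  String hexMd5() {
    StringBuilder sb = new StringBuilder();
    for (byte b : md5.digest()) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }
}
{code}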
[jira] [Resolved] (HADOOP-15213) JniBasedUnixGroupsNetgroupMapping.java and ShellBasedUnixGroupsNetgroupMapping.java use netgroup.substring(1)
[ https://issues.apache.org/jira/browse/HADOOP-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhirendra Khanka resolved HADOOP-15213.
---------------------------------------
Resolution: Not A Problem

> JniBasedUnixGroupsNetgroupMapping.java and ShellBasedUnixGroupsNetgroupMapping.java use netgroup.substring(1)
> --------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-15213
> URL: https://issues.apache.org/jira/browse/HADOOP-15213
> Project: Hadoop Common
> Issue Type: Improvement
> Components: security
> Environment: SUSE Linux Enterprise Server 11 (x86_64) VERSION = 11 PATCHLEVEL = 3
> Reporter: Dhirendra Khanka
> Priority: Minor
>
> Part of the code shown below is from the following 2 classes:
> org.apache.hadoop.security.JniBasedUnixGroupsNetgroupMapping.java
> {code:java}
> protected synchronized List<String> getUsersForNetgroup(String netgroup) {
>   String[] users = null;
>   try {
>     // JNI code does not expect '@' at the begining of the group name
>     users = getUsersForNetgroupJNI(netgroup.substring(1));
>   } catch (Exception e) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Error getting users for netgroup " + netgroup, e);
>     } else {
>       LOG.info("Error getting users for netgroup " + netgroup +
>           ": " + e.getMessage());
>     }
>   }
>   if (users != null && users.length != 0) {
>     return Arrays.asList(users);
>   }
>   return new LinkedList<String>();
> }{code}
> org.apache.hadoop.security.ShellBasedUnixGroupsNetgroupMapping.java
> {code:java}
> protected String execShellGetUserForNetgroup(final String netgroup)
>     throws IOException {
>   String result = "";
>   try {
>     // shell command does not expect '@' at the begining of the group name
>     result = Shell.execCommand(
>         Shell.getUsersForNetgroupCommand(netgroup.substring(1)));
>   } catch (ExitCodeException e) {
>     // if we didn't get the group - just return empty list;
>     LOG.warn("error getting users for netgroup " + netgroup, e);
>   }
>   return result;
> }
> {code}
> The comments in the code above show the input is expected to contain '@'; however, when executing the shell directly, the output has the form below, which does not contain any '@' symbol:
> {code:java}
> :~> getent netgroup mynetgroup1
> mynetgroup1 ( , a3xsds, ) ( , beekvkl, ) ( , redcuan, ) ( , uedfmst, ){code}
> I created test code with the substring call removed and ran it on the cluster using hadoop jar. The code returned netgroups correctly after the modification. I have limited knowledge of netgroups. The issue was discovered when hadoop.security.group.mapping = *org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback* was added to core-site.xml and it failed to apply netgroup access.
>
> Also see the debug output below showing the netgroup API calls in action:
> tdms@casatdhdp01master01:~> hdfs dfs -ls /user/tdms
> 18/02/09 09:47:30 DEBUG util.Shell: setsid exited with exit code 0
> 18/02/09 09:47:30 DEBUG conf.Configuration: parsing URL jar:file:/usr/hdp/2.5.3.0-37/hadoop/hadoop-common-2.7.3.2.5.3.0-37.jar!/core-default.xml
> 18/02/09 09:47:30 DEBUG conf.Configuration: parsing input stream sun.net.www.protocol.jar.JarURLConnection$JarURLInputStream@78186a70
> 18/02/09 09:47:30 DEBUG conf.Configuration: parsing URL file:/etc/hadoop/2.5.3.0-37/0/core-site.xml
> 18/02/09 09:47:30 DEBUG conf.Configuration: parsing input stream java.io.BufferedInputStream@15d9bc04
> 18/02/09 09:47:30 DEBUG security.SecurityUtil: Setting hadoop.security.token.service.use_ip to true
> 18/02/09 09:47:30 DEBUG util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
> 18/02/09 09:47:30 DEBUG security.Groups: Creating new Groups object
> 18/02/09 09:47:30 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
> 18/02/09 09:47:30 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
> 18/02/09 09:47:30 DEBUG security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
> 18/02/09 09:47:30 DEBUG security.JniBasedUnixGroupsNetgroupMapping: Using JniBasedUnixGroupsNetgroupMapping for Netgroup resolution
> 18/02/09 09:47:30 DEBUG security.JniBasedUnixGroupsNetgroupMappingWithFallback: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsNetgroupMapping
> 18/02/09 09:47:30 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsNetgroupMappingWithFallback; cacheTimeout=30; warningDeltaMs=5000
> 18/02/09 09:47:30 DEBUG security.UserGroupInformation: hadoop login
> 18/02/09 09:47:30 DEBUG security.UserGroupInformation: hadoop login commit
> 18/02/09 09:47:30 DEBUG security.UserGroupInformation: using local