[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268253#comment-16268253 ] Xiao Chen commented on HADOOP-14445:

Thanks all for the discussions on this, and Rushabh for a demonstration patch. I've read through the comments here as well as on HADOOP-14441. The patch would also take care of HADOOP-14134. To summarize and make sure my understanding is correct:
- We all agree the lack of token sharing is an issue.
- For HDFS clients, the kp uri is already provided by the NN (already done by HADOOP-14104).
- The token should be recognized from the UGI credentials when clients try to authenticate (the reason for HADOOP-14441).
- Token renewal currently reads configs. We want this to be read from the token instead, and make sure it's HA'ed (the reason for HADOOP-14134).
- Besides the ip:port of the KMS instance, there is also the information about whether it's http or https. We need a way to get this information during renewal/cancellation.
- Backwards compatibility: old clients should work with a new server; new clients should work with an old server.

Having a way to let clients work without reading configs feels to me like the better approach, and more in line with HADOOP-14104 and the 'transparent' name of HDFS encryption. A 'nameservice' solution doesn't seem to add much value over the current way of using the full kp uri, which is by itself a representation of the available services. For the added-KMS scenario, I think it's fine to let existing tokens use the instances in that token's uri - presumably this is better than deploying the config and restarting the client. For removed KMS instances, LBKMSCP handles it.

It seems to me the patch here has handled most of this pretty well. I plan to take a crack at this tomorrow to:
- rebase to trunk
- address some earlier comments
- modify with some of my thoughts: it seems we can use LBKMSCP to store the full provider uri; some tests to verify compatibility

If you have any comments, please let me know.

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
> Issue Type: Bug
> Components: documentation, kms
> Affects Versions: 2.8.0, 3.0.0-alpha1
> Reporter: Wei-Chiu Chuang
> Assignee: Rushabh S Shah
> Attachments: HADOOP-14445-branch-2.8.patch
>
> As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider does
> not share delegation tokens. (a client uses the KMS address/port as the key for
> the delegation token)
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
>   InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
>       url.getPort());
>   Text service = SecurityUtil.buildTokenService(serviceAddr);
>   dToken = creds.getToken(service);
> {code}
> But the KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation
> tokens too.
> Under HA, a KMS instance must verify the delegation token given by another
> KMS instance, by checking the shared secret used to sign the delegation
> token. To do this, all KMS instances must be able to retrieve the shared
> secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share
> delegation tokens.
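As an illustration of the direction discussed above - matching a token by the full provider URI rather than a single ip:port - here is a minimal sketch. The class, method, and URI below are hypothetical placeholders, not part of the attached patch:

{code:java}
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public class KmsTokenLookup {
  // Use the full LBKMSCP provider URI as the token service so the same
  // delegation token matches every KMS instance behind the load balancer,
  // and the scheme (http vs https) travels with the token. The URI is an
  // illustrative placeholder.
  static Token<?> findKmsToken(Credentials creds) {
    Text service =
        new Text("kms://https@kms01.example.com;kms02.example.com:9600/kms");
    return creds.getToken(service);
  }
}
{code}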
[jira] [Commented] (HADOOP-15070) add test to verify FileSystem and paths differentiate on user info
[ https://issues.apache.org/jira/browse/HADOOP-15070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268224#comment-16268224 ] Mingliang Liu commented on HADOOP-15070:

+1

> add test to verify FileSystem and paths differentiate on user info
> --
>
> Key: HADOOP-15070
> URL: https://issues.apache.org/jira/browse/HADOOP-15070
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs, test
> Affects Versions: 2.8.2
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
> Attachments: HADOOP-15070-001.patch
>
> Add a test to verify that userinfo data is (correctly) used to differentiate
> the entries in the FS cache, so they are treated as different filesystems.
> * This is critical for wasb, which uses the username to identify the
> container, in a path like {{wasb:contain...@stevel.azure.net}}. This works in
> Hadoop, but SPARK-22587 shows that it may not be followed everywhere (and
> given there's no documentation, who can fault them?)
> * AbstractFileSystem.checkPath looks suspiciously like its path validation
> just checks host, not authority. That needs a test too.
> * And we should cut the @LimitedPrivate(HDFS, Mapreduce) from
> Path.makeQualified. If MR needs it, it should be considered open to all apps
> using the Hadoop APIs. Until I looked at the code I thought it was...
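A minimal sketch of what such a cache test could look like. This is not the attached HADOOP-15070-001.patch; the wasb URIs are made up and assume the hadoop-azure connector is on the classpath and configured:

{code:java}
import static org.junit.Assert.assertNotSame;

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.junit.Test;

public class TestFsCacheUserInfo {
  // Two URIs differing only in userinfo must map to distinct FS cache
  // entries, i.e. FileSystem.get must return different instances.
  @Test
  public void testUserInfoDifferentiatesCacheEntries() throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs1 = FileSystem.get(
        new URI("wasb://alice@example.blob.core.windows.net/"), conf);
    FileSystem fs2 = FileSystem.get(
        new URI("wasb://bob@example.blob.core.windows.net/"), conf);
    assertNotSame("URIs differing only in userinfo must not share a cached FS",
        fs1, fs2);
  }
}
{code}

A real test would likely register a stub filesystem scheme instead of wasb so no network access or credentials are needed; the assertion is the same either way.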
[jira] [Comment Edited] (HADOOP-14699) Impersonation errors with UGI after second principal relogin
[ https://issues.apache.org/jira/browse/HADOOP-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239746#comment-16239746 ] Jeff Storck edited comment on HADOOP-14699 at 11/28/17 6:05 AM:

[~jnp] Please take a look at the [test code|https://github.com/jtstorck/ugi-test] I have provided. It shows a simplified scenario (inspired by a use case in NiFi) that causes the impersonation error. If two instantiations of the UGI class are used to represent two users, the impersonation error will occur on the relogin of the second user, provided that Hadoop is not configured to allow the impersonation. This use case of UGI occurs in NiFi when the Kerberos credentials in a Hadoop processor are changed from one user to another, with no intention of proxying a user.

was (Author: jtstorck):
[~jnp] Please take a look at the [test code|https://github.com/jtstorck/kerberos-examples/tree/master/hadoop/ugi-test] I have provided. It shows a simplified scenario (inspired by a use case in NiFi) that causes the impersonation error. If two instantiations of the UGI class are used to represent two users, the impersonation error will occur on the relogin of the second user, provided that Hadoop is not configured to allow the impersonation. This use case of UGI occurs in NiFi when the Kerberos credentials in a Hadoop processor are changed from one user to another, with no intention of proxying a user.

> Impersonation errors with UGI after second principal relogin
>
> Key: HADOOP-14699
> URL: https://issues.apache.org/jira/browse/HADOOP-14699
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Affects Versions: 2.6.2, 2.7.3, 2.8.1
> Reporter: Jeff Storck
>
> Multiple principals that are logged in using UGI instances that are
> instantiated from a UGI class loaded by the same classloader will encounter
> problems when the second principal attempts to relogin and perform an action
> using a UGI.doAs(). An impersonation will occur and the operation attempted
> by the second principal after relogging in will fail. There should not be an
> implicit attempt to impersonate the second principal through the first
> principal that logged in.
> I have created a GitHub project that exhibits the impersonation error with > brief instructions on how to set up for the test and run it: > https://github.com/jtstorck/ugi-test > {noformat}18:44:55.687 [pool-2-thread-2] WARN > h.u.u.ugirunnable.ugite...@example.com - Unexpected exception while > performing task for [ugite...@example.com (auth:KERBEROS)] > org.apache.hadoop.ipc.RemoteException: User: ugite...@example.com is not > allowed to impersonate ugite...@example.com > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1481) > at org.apache.hadoop.ipc.Client.call(Client.java:1427) > at org.apache.hadoop.ipc.Client.call(Client.java:1337) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:787) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335) > at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1700) > at > org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1436) > at > org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1433) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1448) > at >
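For readers skimming the thread, here is a condensed sketch of the reported scenario. This is not the linked ugi-test project; principals and keytab paths are placeholders:

{code:java}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class TwoPrincipalRelogin {
  public static void main(String[] args) throws Exception {
    // Two principals logged in through the same UGI class (same classloader).
    UserGroupInformation ugi1 = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        "user1@EXAMPLE.COM", "/etc/security/keytabs/user1.keytab");
    UserGroupInformation ugi2 = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        "user2@EXAMPLE.COM", "/etc/security/keytabs/user2.keytab");

    // Relogin of the second principal followed by a doAs() action: this is
    // where the report says the unexpected impersonation surfaces.
    ugi2.checkTGTAndReloginFromKeytab();
    ugi2.doAs((PrivilegedExceptionAction<Void>) () -> {
      FileSystem.get(new Configuration()).getFileStatus(new Path("/"));
      return null;
    });
  }
}
{code}

Per the report, the getFileStatus call above then fails with the "is not allowed to impersonate" RemoteException quoted in the description when proxying is not configured.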
[jira] [Updated] (HADOOP-14699) Impersonation errors with UGI after second principal relogin
[ https://issues.apache.org/jira/browse/HADOOP-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Storck updated HADOOP-14699: - Description: Multiple principals that are logged in using UGI instances that are instantiated from a UGI class loaded by the same classloader will encounter problems when the second principal attempts to relogin and perform an action using a UGI.doAs(). An impersonation will occur and the operation attempted by the second principal after relogging in will fail. There should not be an implicit attempt to impersonate the second principal through the first principal that logged in. I have created a GitHub project that exhibits the impersonation error with brief instructions on how to set up for the test and run it: https://github.com/jtstorck/ugi-test {noformat}18:44:55.687 [pool-2-thread-2] WARN h.u.u.ugirunnable.ugite...@example.com - Unexpected exception while performing task for [ugite...@example.com (auth:KERBEROS)] org.apache.hadoop.ipc.RemoteException: User: ugite...@example.com is not allowed to impersonate ugite...@example.com at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1481) at org.apache.hadoop.ipc.Client.call(Client.java:1427) at org.apache.hadoop.ipc.Client.call(Client.java:1337) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:787) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335) at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1700) at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1436) at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1433) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1448) at hadoop.ugitest.UgiTestMain$UgiRunnable.lambda$run$2(UgiTestMain.java:194) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807) at hadoop.ugitest.UgiTestMain$UgiRunnable.run(UgiTestMain.java:194) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745){noformat} was: Multiple principals that are logged in using UGI instances that are instantiated from a UGI class loaded by the same classloader will encounter problems when the second principal attempts to relogin and perform an action using a UGI.doAs(). An impersonation will occur and the operation attempted by the second principal after relogging in will fail. There should not be an implicit attempt to impersonate the second principal through the first principal that logged in. I have created a GitHub project that exhibits the impersonation error with brief instructions on how to set up for the test and run it: https://github.com/jtstorck/kerberos-examples/tree/master/hadoop/ugi-test {noformat}18:44:55.687 [pool-2-thread-2] WARN h.u.u.ugirunnable.ugite...@example.com - Unexpected
[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file
[ https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268148#comment-16268148 ] Yonger commented on HADOOP-14475:
-
[~mackrorysd] Thank you. I'm starting to understand your logic, and I think you are right.
{quote}
There's no guarantee that metrics source names would even be consistent among all JVMs for a given bucket, since they're assigned numbers in the order that they're created
{quote}
I can get the info I want by aggregating on the bucket instead of on the metrics source name, which is still not unique across multiple JVM processes.

> Metrics of S3A don't print out when enable it in Hadoop metrics property file
> --
>
> Key: HADOOP-14475
> URL: https://issues.apache.org/jira/browse/HADOOP-14475
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Environment: uname -a
> Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017
> x86_64 x86_64 x86_64 GNU/Linux
> cat /etc/issue
> Ubuntu 16.04.2 LTS \n \l
> Reporter: Yonger
> Assignee: Yonger
> Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch,
> HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch,
> HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch,
> HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14475.014.patch,
> HADOOP-14475.015.patch, HADOOP-14775.007.patch, failsafe-report-s3a-it.html,
> failsafe-report-s3a-scale.html, failsafe-report-scale.html,
> failsafe-report-scale.zip, s3a-metrics.patch1, stdout.zip
>
> *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
> #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #*.sink.influxdb.url=http:/xx
> #*.sink.influxdb.influxdb_port=8086
> #*.sink.influxdb.database=hadoop
> #*.sink.influxdb.influxdb_username=hadoop
> #*.sink.influxdb.influxdb_password=hadoop
> #*.sink.ingluxdb.cluster=c1
> *.period=10
> #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out
> I can't find the output file even when I run an MR job which should use s3.
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268098#comment-16268098 ] Ping Liu commented on HADOOP-14600:
---
Yes, Chris. I am verifying the patch. There is an issue I just found tonight in my Linux environment. In TestRawLocalFileSystemContract.testPermission(), the native call failed with {{java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat;}}. I'll look into it further. I guess it is due to my last change. I'll come back with an update.

> LocatedFileStatus constructor forces RawLocalFS to exec a process to get the
> permissions
>
> Key: HADOOP-14600
> URL: https://issues.apache.org/jira/browse/HADOOP-14600
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.7.3
> Environment: file:// in a dir with many files
> Reporter: Steve Loughran
> Assignee: Ping Liu
> Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch,
> HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch,
> HADOOP-14600.006.patch, HADOOP-14600.007.patch, HADOOP-14600.008.patch,
> HADOOP-14600.009.patch, TestRawLocalFileSystemContract.java
>
> Reported in SPARK-21137: a {{FileSystem.listStatus}} call really crawls
> against the local FS, because the {{FileStatus.getPermissions}} call forces
> {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI
> values.
> That is: what's a field lookup or even a no-op for every other FS is, on the
> local FS, a process exec/spawn, with all the costs. This gets expensive
> if you have many files.
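For context on the cost being removed, here is a sketch of reading POSIX permissions with a single library call instead of forking a process per file. It uses java.nio purely as an illustration of the general fix; it is not the NativeIO-based approach the attached patch takes:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFileAttributes;

public class StatWithoutFork {
  // One readAttributes() call replaces the "exec ls -ld and parse the
  // output" pattern described in the issue; no subprocess is spawned.
  public static void main(String[] args) throws IOException {
    Path p = Paths.get(args.length > 0 ? args[0] : ".");
    PosixFileAttributes attrs = Files.readAttributes(p, PosixFileAttributes.class);
    System.out.printf("%s %s %s%n",
        attrs.owner().getName(), attrs.group().getName(), attrs.permissions());
  }
}
{code}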
[jira] [Comment Edited] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file
[ https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268073#comment-16268073 ] Sean Mackrory edited comment on HADOOP-14475 at 11/28/17 4:07 AM:
--
[~iyonger] I don't believe I've changed anything that would affect whether or not the metrics source name shows up in the sink's output - I've only changed what the source name would be. I think the format even when I tested your original patch was pretty much as it is above, and did not include the source name, only the record name (which happened to be similar). If you want to aggregate by bucket, I would use the bucket field itself. There's no guarantee that metrics source names would even be consistent among all JVMs for a given bucket, since they're assigned numbers in the order that they're created - that would only be true if every JVM had accessed the exact same buckets in the exact same order - the assumption would break down as soon as a job didn't utilize the entire cluster or a node was down during a job, etc.

was (Author: mackrorysd):
[~iyonger] I don't believe I've changed anything that would affect whether or not the metrics source name shows up in the sink's output - I've only changed what the source name would be. I think the format even when I tested your original patch was pretty much as it is above, and did not include the source name, only the record name (which happened to be similar). If you want to aggregate by bucket, I would use the bucket field itself. There's no guarantee that metrics source names would even be consistent among all JVMs for a given bucket - that would only be true if every JVM had accessed the exact same buckets in the exact same order - the assumption would break down as soon as a job didn't utilize the entire cluster or a node was down during a job, etc.
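A simplified illustration of why creation-order numbering makes source names unstable across JVMs; the naming scheme below is hypothetical, not the exact S3A one:

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

public class SourceNaming {
  private static final AtomicInteger COUNTER = new AtomicInteger(0);

  // Each new S3A filesystem instance gets the next per-JVM number, so the
  // name encodes creation order within one process, not bucket identity:
  // "s3a-file-system-2" may be bucket A in one JVM and bucket B in another.
  static String nextSourceName() {
    return "s3a-file-system-" + COUNTER.incrementAndGet();
  }

  public static void main(String[] args) {
    System.out.println(nextSourceName()); // s3a-file-system-1
    System.out.println(nextSourceName()); // s3a-file-system-2
  }
}
{code}

This is why aggregating on a bucket tag carried in each record is more reliable than aggregating on the source name.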
[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file
[ https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268073#comment-16268073 ] Sean Mackrory commented on HADOOP-14475:

[~iyonger] I don't believe I've changed anything that would affect whether or not the metrics source name shows up in the sink's output - I've only changed what the source name would be. I think the format even when I tested your original patch was pretty much as it is above, and did not include the source name, only the record name (which happened to be similar). If you want to aggregate by bucket, I would use the bucket field itself. There's no guarantee that metrics source names would even be consistent among all JVMs for a given bucket - that would only be true if every JVM had accessed the exact same buckets in the exact same order - the assumption would break down as soon as a job didn't utilize the entire cluster or a node was down during a job, etc.
[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file
[ https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268051#comment-16268051 ] Yonger commented on HADOOP-14475:
-
[~mackrorysd] So our final record output format doesn't include any metrics source info? If so, I think that is unfriendly for building statistics charts, e.g. via InfluxDB+Grafana, because only fsId is unique in a record. In particular, there are multiple metrics sources registered with the same bucket (I don't know why, but they did exist in my test), whose output records can't easily be distinguished except by fsId, so a chart in Grafana becomes hard to read.
[jira] [Commented] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268003#comment-16268003 ] genericqa commented on HADOOP-15059:

| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 47s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 14s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 12m 11s{color} | {color:red} root generated 7 new + 7 unchanged - 0 fixed = 14 total (was 7) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 11s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 1s{color} | {color:orange} root: The patch generated 4 new + 364 unchanged - 7 fixed = 368 total (was 371) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 8m 54s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 59s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 38s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 22s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}111m 41s{color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15059 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12899517/HADOOP-15059.001.patch |
| Optional Tests | asflicense compile
[jira] [Commented] (HADOOP-13134) WASB's file delete still throwing Blob not found exception
[ https://issues.apache.org/jira/browse/HADOOP-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267968#comment-16267968 ] Zichen Sun commented on HADOOP-13134:
-
We had a similar issue reported a few times when deleting directories, and the workaround was to set fs.azure.flatlist.enable to true (introduced by https://issues.apache.org/jira/browse/HADOOP-13403), which skips the following:
{code:java}
if (!enableFlatListing) {
  // Currently at a depth of one, decrement the listing depth for
  // sub-directories.
  buildUpList(directory, fileMetadata, maxListingCount,
      maxListingDepth - 1);
}
{code}

> WASB's file delete still throwing Blob not found exception
> --
>
> Key: HADOOP-13134
> URL: https://issues.apache.org/jira/browse/HADOOP-13134
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 2.7.1
> Reporter: Lin Chan
> Assignee: Dushyanth
>
> WASB is still throwing blob not found exception as shown in the following
> stack. Need to catch that and convert to Boolean return code in WASB delete.
[jira] [Commented] (HADOOP-14699) Impersonation errors with UGI after second principal relogin
[ https://issues.apache.org/jira/browse/HADOOP-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267929#comment-16267929 ] Jeff Storck commented on HADOOP-14699:
--
[~jnp] I updated the test code and provided new instructions for reproducing the impersonation issue. The test code has been updated to provide per-principal task configuration, and it now writes files to HDFS rather than just retrieving status. Please let me know if you have any problems using the updated code. Thanks!
[jira] [Commented] (HADOOP-15039) move SemaphoredDelegatingExecutor to hadoop-common
[ https://issues.apache.org/jira/browse/HADOOP-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267909#comment-16267909 ] Genmao Yu commented on HADOOP-15039:

[~ste...@apache.org] Could you take another look, please?

> move SemaphoredDelegatingExecutor to hadoop-common
> --
>
> Key: HADOOP-15039
> URL: https://issues.apache.org/jira/browse/HADOOP-15039
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs, fs/oss, fs/s3
> Affects Versions: 3.0.0-beta1
> Reporter: Genmao Yu
> Assignee: Genmao Yu
> Priority: Minor
> Attachments: HADOOP-15039.001.patch, HADOOP-15039.002.patch,
> HADOOP-15039.003.patch
>
> Detailed discussions in HADOOP-14999 and HADOOP-15027.
> share {{SemaphoredDelegatingExecutor}} and move it to {{hadoop-common}}.
> cc [~ste...@apache.org]
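For readers new to the class, a concept sketch of what a semaphored delegating executor does; this is not the actual hadoop-common code, just the idea under discussion:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

public class BoundedExecutor {
  private final ExecutorService delegate;
  private final Semaphore permits;

  public BoundedExecutor(ExecutorService delegate, int maxInFlight) {
    this.delegate = delegate;
    this.permits = new Semaphore(maxInFlight);
  }

  // Blocks the submitting thread once maxInFlight tasks are queued or
  // running, which is how the object store clients throttle block uploads.
  public void submit(Runnable task) throws InterruptedException {
    permits.acquire();
    try {
      delegate.execute(() -> {
        try {
          task.run();
        } finally {
          permits.release(); // free a slot for blocked submitters
        }
      });
    } catch (RuntimeException e) {
      permits.release(); // submission itself failed; return the permit
      throw e;
    }
  }
}
{code}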
[jira] [Issue Comment Deleted] (HADOOP-15039) move SemaphoredDelegatingExecutor to hadoop-common
[ https://issues.apache.org/jira/browse/HADOOP-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated HADOOP-15039:
---
Comment: was deleted
(was: [~ste...@apache.org] take a look please.)
[jira] [Commented] (HADOOP-14898) Create official Docker images for development and testing features
[ https://issues.apache.org/jira/browse/HADOOP-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267894#comment-16267894 ] Miklos Szegedi commented on HADOOP-14898:
-
Thank you [~elek] for the patch. Could you provide a patch file named HADOOP-14898.000.patch against the latest trunk, so that the Jenkins job (asf license, etc.) can run on it?

> Create official Docker images for development and testing features
> ---
>
> Key: HADOOP-14898
> URL: https://issues.apache.org/jira/browse/HADOOP-14898
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Elek, Marton
> Assignee: Elek, Marton
> Attachments: HADOOP-14898.001.tar.gz, HADOOP-14898.002.tar.gz,
> HADOOP-14898.003.tgz
>
> This is the original mail from the mailing list:
> {code}
> TL;DR: I propose to create official hadoop images and upload them to the
> dockerhub.
> GOAL/SCOPE: I would like to improve the existing documentation with easy-to-use
> docker based recipes to start hadoop clusters with various configurations.
> The images could also be used to test experimental features. For example
> ozone could be tested easily with this compose file and configuration:
> https://gist.github.com/elek/1676a97b98f4ba561c9f51fce2ab2ea6
> Or even the configuration could be included in the compose file:
> https://github.com/elek/hadoop/blob/docker-2.8.0/example/docker-compose.yaml
> I would like to create separate example compose files for federation, ha,
> metrics usage, etc. to make it easier to try out and understand the features.
> CONTEXT: There is an existing Jira:
> https://issues.apache.org/jira/browse/HADOOP-13397
> But it's about a tool to generate production quality docker images (multiple
> types, in a flexible way). If no objections, I will create a separate issue
> to create simplified docker images for rapid prototyping and investigating
> new features. And register the branch to the dockerhub to create the images
> automatically.
> MY BACKGROUND: I have been working with docker based hadoop/spark clusters for quite a
> while and have run them successfully in different environments (kubernetes,
> docker-swarm, nomad-based scheduling, etc.) My work is available from here:
> https://github.com/flokkr but it can handle more complex use cases (eg.
> instrumenting java processes with btrace, or read/reload configuration from
> consul).
> And IMHO in the official hadoop documentation it's better to suggest using
> official apache docker images and not external ones (which could change).
> {code}
> The next list will enumerate the key decision points regarding docker
> image creation.
> A. automated dockerhub build / jenkins build
> Docker images could be built on the dockerhub (a branch pattern should be
> defined for a github repository and the location of the Docker files) or
> could be built on a CI server and pushed.
> The second one is more flexible (it's easier to create a matrix build, for
> example).
> The first one has the advantage that we can get an additional flag on the
> dockerhub that the build is automated (and built from the source by the
> dockerhub).
> The decision is easy as ASF supports the first approach: (see
> https://issues.apache.org/jira/browse/INFRA-12781?focusedCommentId=15824096&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15824096)
> B. source: binary distribution or source build
> The second question is about creating the docker image. One option is to
> build the software on the fly during the creation of the docker image; the
> other one is to use the binary releases.
> I suggest using the second approach as:
> 1. In that case the hadoop:2.7.3 image could contain exactly the same hadoop
> distribution as the downloadable one
> 2. We don't need to add development tools to the image, so the image can be
> smaller (which is important, as the goal for this image is getting
> started as fast as possible)
> 3. The docker definition will be simpler (and easier to maintain)
> Usually this approach is used in other projects (I checked Apache Zeppelin
> and Apache Nutch)
> C. branch usage
> The other question is the location of the Dockerfile. It could be on the
> official source-code branches (branch-2, trunk, etc.) or we can create
> separate branches for the dockerhub (eg. docker/2.7 docker/2.8 docker/3.0).
> For the first approach it's easier to find the docker images, but it's less
> flexible. For example, if we had a Dockerfile on the source code branches, it
> would have to be used for every release (for example, the Dockerfile from the tag
> release-3.0.0 would be used for the 3.0 hadoop docker image). In that case
> the release process is much harder: in case of a Dockerfile error (which
> could be tested on dockerhub only after the
[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file
[ https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267892#comment-16267892 ] genericqa commented on HADOOP-14475:

| (/) *{color:green}+1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 36s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 13s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 1 new + 19 unchanged - 0 fixed = 20 total (was 19) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 35s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m 7s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-14475 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12899510/HADOOP-14475.015.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 7cbf4148bb74 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d8923cd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/13753/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13753/testReport/ |
| Max. process+thread count | 325 (vs. ulimit of 5000) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13753/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org |
This
[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HADOOP-15059:
Attachment: HADOOP-15059.001.patch

Attaching a patch that has the container launch process write two token files: the legacy token format in the existing container_tokens file, and the new version 1 format in a new container_tokens-v1 file. I left the container localizer path alone, since localizers run the same code as the nodemanager and can therefore directly support the new v1 format as-is.

The basic idea is to tack the "-v1" suffix onto the legacy token pathname to form the new v1 token pathname. The container launcher and container executors both do this, so the interface between them did not have to change. The legacy path is passed between them to indicate where both token files can be located (once the suffix is applied to form the new token path). It's definitely not the cleanest, but it was relatively simple to implement. I refactored some names in the container start context to make it clearer which path is being used.

This needs a lot more testing, but I was able to run a sleep job on a simple security pseudo-distributed cluster and manually verified both container token files were being written and each was in the proper format. I also manually forced the launcher to omit the new environment variable for the version 1 file, forcing the UGI to load the legacy token file, and that worked as well. I have not yet had a chance to test the rolling-upgrade-with-tarball scenario or the native container-executor changes, but I thought it was far enough along to at least get some feedback. If others could take a look at the patch and/or take it for a test drive, that would be great.

> 3.0 deployment cannot work with old version MR tar ball which break rolling
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
> Issue Type: Bug
> Components: security
> Reporter: Junping Du
> Assignee: Jason Lowe
> Priority: Blocker
> Attachments: HADOOP-15059.001.patch
>
> I tried to deploy a 3.0 cluster with a 2.9 MR tar ball. The MR job failed
> with the following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
> at org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
> at org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220)
> at org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212)
> at org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
> at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
> at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
> at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
> at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
> at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
> at org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
> ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
> at org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
> at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
> ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to a token incompatibility change between 2.9 and 3.0. As we
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before
> we ship 3.0; otherwise all running MR applications will get stuck during/after
> an upgrade.
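A minimal sketch of the dual-format write described in the comment above, assuming the Credentials.SerializedFormat overload introduced along with the protobuf token format; the paths are illustrative, not the actual nodemanager layout:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;

public class DualTokenWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Credentials creds = new Credentials();

    // Old readers (e.g. a 2.x MR tarball) only understand the writable format.
    creds.writeTokenStorageFile(
        new Path("file:///tmp/container_tokens"), conf,
        Credentials.SerializedFormat.WRITABLE);
    // New readers pick up the protobuf-based version 1 format from the
    // "-v1" sibling file.
    creds.writeTokenStorageFile(
        new Path("file:///tmp/container_tokens-v1"), conf,
        Credentials.SerializedFormat.PROTOBUF);
  }
}
{code}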
[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HADOOP-15059:
Status: Patch Available (was: Open)
[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enabled in the Hadoop metrics property file
[ https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267865#comment-16267865 ] Sean Mackrory commented on HADOOP-14475: One other thing worth mentioning is the compatibility implications of removing fsUri and replacing it with bucket. This was basically unusable from a metrics2 perspective before, so... it's pretty tough to be broken by this change. > Metrics of S3A don't print out when enabled in the Hadoop metrics property file > -- > > Key: HADOOP-14475 > URL: https://issues.apache.org/jira/browse/HADOOP-14475 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 > Environment: uname -a > Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 > x86_64 x86_64 x86_64 GNU/Linux > cat /etc/issue > Ubuntu 16.04.2 LTS \n \l >Reporter: Yonger >Assignee: Yonger > Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch, > HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch, > HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch, > HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14475.014.patch, > HADOOP-14475.015.patch, HADOOP-14775.007.patch, failsafe-report-s3a-it.html, > failsafe-report-s3a-scale.html, failsafe-report-scale.html, > failsafe-report-scale.zip, s3a-metrics.patch1, stdout.zip > > > *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink > #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > #*.sink.influxdb.url=http:/xx > #*.sink.influxdb.influxdb_port=8086 > #*.sink.influxdb.database=hadoop > #*.sink.influxdb.influxdb_username=hadoop > #*.sink.influxdb.influxdb_password=hadoop > #*.sink.ingluxdb.cluster=c1 > *.period=10 > #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out > I can't find the output file even when I run an MR job which should use s3. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14475) Metrics of S3A don't print out when enabled in the Hadoop metrics property file
[ https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HADOOP-14475: --- Attachment: HADOOP-14475.015.patch Conflict presumably due to the s3a output committer. Relatively simple to merge, but I will need to retest. Some unit tests modified or created by the committer are failing locally. > Metrics of S3A don't print out when enabled in the Hadoop metrics property file > -- > > Key: HADOOP-14475 > URL: https://issues.apache.org/jira/browse/HADOOP-14475 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 > Environment: uname -a > Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 > x86_64 x86_64 x86_64 GNU/Linux > cat /etc/issue > Ubuntu 16.04.2 LTS \n \l >Reporter: Yonger >Assignee: Yonger > Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch, > HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch, > HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch, > HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14475.014.patch, > HADOOP-14475.015.patch, HADOOP-14775.007.patch, failsafe-report-s3a-it.html, > failsafe-report-s3a-scale.html, failsafe-report-scale.html, > failsafe-report-scale.zip, s3a-metrics.patch1, stdout.zip > > > *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink > #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > #*.sink.influxdb.url=http:/xx > #*.sink.influxdb.influxdb_port=8086 > #*.sink.influxdb.database=hadoop > #*.sink.influxdb.influxdb_username=hadoop > #*.sink.influxdb.influxdb_password=hadoop > #*.sink.ingluxdb.cluster=c1 > *.period=10 > #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out > I can't find the output file even when I run an MR job which should use s3. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enabled in the Hadoop metrics property file
[ https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267791#comment-16267791 ] genericqa commented on HADOOP-14475: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HADOOP-14475 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HADOOP-14475 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12899507/HADOOP-14475.014.patch | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13752/console | | Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Metrics of S3A don't print out when enabled in the Hadoop metrics property file > -- > > Key: HADOOP-14475 > URL: https://issues.apache.org/jira/browse/HADOOP-14475 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 > Environment: uname -a > Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 > x86_64 x86_64 x86_64 GNU/Linux > cat /etc/issue > Ubuntu 16.04.2 LTS \n \l >Reporter: Yonger >Assignee: Yonger > Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch, > HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch, > HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch, > HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14475.014.patch, > HADOOP-14775.007.patch, failsafe-report-s3a-it.html, > failsafe-report-s3a-scale.html, failsafe-report-scale.html, > failsafe-report-scale.zip, s3a-metrics.patch1, stdout.zip > > > *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink > #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > #*.sink.influxdb.url=http:/xx > #*.sink.influxdb.influxdb_port=8086 > #*.sink.influxdb.database=hadoop > #*.sink.influxdb.influxdb_username=hadoop > #*.sink.influxdb.influxdb_password=hadoop > #*.sink.ingluxdb.cluster=c1 > *.period=10 > #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out > I can't find the output file even when I run an MR job which should use s3. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14475) Metrics of S3A don't print out when enabled in the Hadoop metrics property file
[ https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HADOOP-14475: --- Attachment: HADOOP-14475.014.patch So I see the local hostname is already included in metrics - that makes sense. I've updated the patch to just use the bucket, not the entire URI. While I was looking at how wasb does this, I took a more specific look at how it names sources, records, etc. and made a few changes to be more in line with what they're doing - some of which are actually a return to how [~iyonger] had originally done it, which I had changed to be more brief. Consistency with wasb is a more compelling motivation than a shorter context name. So to be clear, a line in the metrics log used to look like this: {code} 1511208770680 s3afs.S3AMetrics1-bucket: Context=s3afs, FileSystemId=892b02bb-7b30-4ffe-80ca-3a9935e1d96e-bucket, fsURI=s3a://bucket/home/user/terasuite {code} But now looks like this: {code} 1511208770680 s3aFileSystem.s3aFileSystem: Context=s3aFileSystem, s3aFileSystemId=892b02bb-7b30-4ffe-80ca-3a9935e1d96e, bucket=bucket, {code} > Metrics of S3A don't print out when enabled in the Hadoop metrics property file > -- > > Key: HADOOP-14475 > URL: https://issues.apache.org/jira/browse/HADOOP-14475 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 > Environment: uname -a > Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 > x86_64 x86_64 x86_64 GNU/Linux > cat /etc/issue > Ubuntu 16.04.2 LTS \n \l >Reporter: Yonger >Assignee: Yonger > Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch, > HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch, > HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch, > HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14475.014.patch, > HADOOP-14775.007.patch, failsafe-report-s3a-it.html, > failsafe-report-s3a-scale.html, failsafe-report-scale.html, > failsafe-report-scale.zip, s3a-metrics.patch1, stdout.zip > > > *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink > #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > #*.sink.influxdb.url=http:/xx > #*.sink.influxdb.influxdb_port=8086 > #*.sink.influxdb.database=hadoop > #*.sink.influxdb.influxdb_username=hadoop > #*.sink.influxdb.influxdb_password=hadoop > #*.sink.ingluxdb.cluster=c1 > *.period=10 > #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out > I can't find the output file even when I run an MR job which should use s3. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-10054) ViewFsFileStatus.toString() is broken
[ https://issues.apache.org/jira/browse/HADOOP-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal reassigned HADOOP-10054: -- Assignee: Hanisha Koneru > ViewFsFileStatus.toString() is broken > - > > Key: HADOOP-10054 > URL: https://issues.apache.org/jira/browse/HADOOP-10054 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.0.5-alpha >Reporter: Paul Han >Assignee: Hanisha Koneru >Priority: Minor > > ViewFsFileStatus.toString() is broken. The following code snippet: > {code} > FileStatus stat = somefunc(); // somefunc() returns an instance of > ViewFsFileStatus > System.out.println("path:" + stat.getPath()); > System.out.println(stat.toString()); > {code} > produces the output: > {code} > path:viewfs://x.com/user/X/tmp-48 > ViewFsFileStatus{path=null; isDirectory=false; length=0; replication=0; > blocksize=0; modification_time=0; access_time=0; owner=; group=; > permission=rw-rw-rw-; isSymlink=false} > {code} > Note that "path=null" is not correct. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
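The shape of fix this suggests is to build {{toString()}} from the public accessors, which ViewFs overrides to return the resolved path, instead of printing the wrapped status's unset internal field. A hedged sketch, not the committed patch:
{code:title=Hedged sketch of an accessor-based toString()}
// Assumption: getPath() is overridden by ViewFsFileStatus to return the
// mounted viewfs:// path, so reporting via accessors avoids path=null.
@Override
public String toString() {
  return getClass().getSimpleName()
      + "{path=" + getPath()
      + "; isDirectory=" + isDirectory()
      + "; length=" + getLen()
      + "; owner=" + getOwner()
      + "; group=" + getGroup()
      + "; permission=" + getPermission()
      + "; isSymlink=" + isSymlink() + "}";
}
{code}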
[jira] [Commented] (HADOOP-14964) AliyunOSS: backport Aliyun OSS module to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267686#comment-16267686 ] Chris Douglas commented on HADOOP-14964: bq. The patch for 2.8.3 will be cherry picked from branch-2 Please cherry-pick from branch-2.9 to branch-2.8, then from branch-2.8 to branch-2.8.3 so the lineage is clear. The concerns from the release thread: # *Dependencies* Since the OSS artifact is not shaded, does it introduce dependencies that could conflict with versions used by clients? # *Upgrade* 2.9.0 does not contain Aliyun OSS. If we release 2.8.3, users cannot upgrade to 2.9.x without regressing on this feature. # *Timing* 2.8.3 contains critical bug fixes, so this needs to wrap up quickly if it will be included. # *Release policy* This should not contradict Hadoop's [compatibility guidelines|https://hadoop.apache.org/docs/r3.0.0-beta1/hadoop-project-dist/hadoop-common/Compatibility.html]. 1-3 are concerns that have a technical remediation: For (1), what are all the transitive dependencies for this module? If the SDK uses dependencies that already ship with 2.8.2, then we may not need shading, but any new, visible dependency needs to be added to the release notes and classify this as an incompatible change. If the number of new, visible dependencies is zero (by shading or reuse), then (1) is trivially satisfied. For (2), if we take the current branch-2.9 and cut a 2.9.1 release, that would tranquilize anxiety about the upgrade path. [~Sammi], would you be able to RM this? [~asuresh] and [~subru] may be able to help by providing pointers to release docs. (3) Is not a problem, as there are some blocker issues and the patch is available. As policy, (4) is stickier. Even releasing this with 2.9.1 doesn't strictly adhere to our rules, but it's better than adding another, active release branch. The case for 2.8.3 is more problematic. The cadence for 2.8.x will decelerate more rapidly than 2.9.x, so fixes to Aliyun OSS will be released less often. We may not do its users a favor by including an outdated client with their 2.8 clusters. Frankly, maintenance is also simpler when we disallow feature backports into patch releases, rather than discussing the merits of each one. This can't become a precedent; it takes too much time. Personally, if 1-3 are addressed and folks working on the module still want to go ahead with it, then I'd support a 2.8.3 release with Aliyun OSS. > AliyunOSS: backport Aliyun OSS module to branch-2 > - > > Key: HADOOP-14964 > URL: https://issues.apache.org/jira/browse/HADOOP-14964 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/oss >Reporter: Genmao Yu >Assignee: SammiChen > Fix For: 2.9.1 > > Attachments: HADOOP-14964-branch-2.000.patch, > HADOOP-14964-branch-2.8.000.patch, HADOOP-14964-branch-2.8.001.patch, > HADOOP-14964-branch-2.9.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which breaks rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned HADOOP-15059: --- Assignee: Jason Lowe Thanks for the feedback, Ray. I'll take a crack at implementing the two-token-file approach. > 3.0 deployment cannot work with old version MR tar ball which breaks rolling > upgrade > --- > > Key: HADOOP-15059 > URL: https://issues.apache.org/jira/browse/HADOOP-15059 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Junping Du >Assignee: Jason Lowe >Priority: Blocker > > I tried to deploy a 3.0 cluster with a 2.9 MR tar ball. The MR job failed > because of the following error: > {noformat} > 2017-11-21 12:42:50,911 INFO [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for > application appattempt_1511295641738_0003_01 > 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: > Unable to load native-hadoop library for your platform... using builtin-java > classes where applicable > 2017-11-21 12:42:51,118 FATAL [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > java.lang.RuntimeException: Unable to determine current user > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254) > at > org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:220) > at > org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:212) > at > org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638) > Caused by: java.io.IOException: Exception reading > /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689) > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252) > ... 4 more > Caused by: java.io.IOException: Unknown version 1 in token storage. > at > org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226) > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205) > ... 8 more > 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting > with status 1: java.lang.RuntimeException: Unable to determine current user > {noformat} > I think it is due to a token incompatibility change between 2.9 and 3.0. As we > claim "rolling upgrade" is supported in Hadoop 3, we should fix this before > we ship 3.0; otherwise all running MR applications will get stuck during/after the > upgrade. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
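For readers following along, the two-token-file idea is to have the 3.x side write the credentials in both serializations, so whichever format a 2.x or 3.x container expects, a readable file exists. A rough sketch, assuming the {{Credentials.SerializedFormat}} overload from HADOOP-12563; the file names are illustrative, not what the NodeManager actually writes:
{code:title=Hedged sketch of the two-token-file idea}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;

public class DualFormatTokenWriter {
  static void writeBothFormats(Credentials creds, Path dir, Configuration conf)
      throws IOException {
    // Old (Writable, version 0) format: readable by 2.x containers.
    creds.writeTokenStorageFile(new Path(dir, "container_tokens"), conf,
        Credentials.SerializedFormat.WRITABLE);
    // New (protobuf, version 1) format: readable by 3.x containers.
    creds.writeTokenStorageFile(new Path(dir, "container_tokens.pb"), conf,
        Credentials.SerializedFormat.PROTOBUF);
  }
}
{code}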
[jira] [Updated] (HADOOP-14964) AliyunOSS: backport Aliyun OSS module to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated HADOOP-14964: --- Resolution: Fixed Hadoop Flags: Reviewed Release Note: Aliyun OSS is widely used among China’s cloud users and this work implemented a new Hadoop compatible filesystem AliyunOSSFileSystem with oss:// scheme, similar to the s3a and azure support. (was: OSS is widely used among China’s cloud users and this work implemented a new Hadoop compatible filesystem AliyunOSSFileSystem with oss scheme, similar to the s3a and azure support. Currently, the feature is support in 2.9.1. ) Status: Resolved (was: Patch Available) I'm going to mark this as "Resolved", to track the fact that it's been committed to branch-2 and branch-2.9. I also changed the release note to omit the version information (this should be implied by context). > AliyunOSS: backport Aliyun OSS module to branch-2 > - > > Key: HADOOP-14964 > URL: https://issues.apache.org/jira/browse/HADOOP-14964 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/oss >Reporter: Genmao Yu >Assignee: SammiChen > Fix For: 2.9.1 > > Attachments: HADOOP-14964-branch-2.000.patch, > HADOOP-14964-branch-2.8.000.patch, HADOOP-14964-branch-2.8.001.patch, > HADOOP-14964-branch-2.9.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances
[ https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267351#comment-16267351 ] Xiao Chen commented on HADOOP-14445: Hi [~shahrs87] and [~daryn], Hope you had a great Thanksgiving. Any updates on this? > Delegation tokens are not shared between KMS instances > -- > > Key: HADOOP-14445 > URL: https://issues.apache.org/jira/browse/HADOOP-14445 > Project: Hadoop Common > Issue Type: Bug > Components: documentation, kms >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Wei-Chiu Chuang >Assignee: Rushabh S Shah > Attachments: HADOOP-14445-branch-2.8.patch > > > As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do > not share delegation tokens. (a client uses KMS address/port as the key for > delegation token) > {code:title=DelegationTokenAuthenticatedURL#openConnection} > if (!creds.getAllTokens().isEmpty()) { > InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(), > url.getPort()); > Text service = SecurityUtil.buildTokenService(serviceAddr); > dToken = creds.getToken(service); > {code} > But KMS doc states: > {quote} > Delegation Tokens > Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation > tokens too. > Under HA, A KMS instance must verify the delegation token given by another > KMS instance, by checking the shared secret used to sign the delegation > token. To do this, all KMS instances must be able to retrieve the shared > secret from ZooKeeper. > {quote} > We should either update the KMS documentation, or fix this code to share > delegation tokens. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267246#comment-16267246 ] Chris Douglas commented on HADOOP-14600: bq. replace `listLocatedStatus` call with `listStatusIterator` because it returns FileStatus rather than LocatedFileStatus and that doesn't trigger all the getPermission() mess at all Good point. Unfortunately, for everything accepted by the filter (which defaults to accepting everything, IIRC), we double the RPCs if the client subsequently asks for locations. That's bad for HDFS, but irrelevant to the local FS and object stores that don't report locality information. [~myapachejira], have you had a chance to verify the patch yet? > LocatedFileStatus constructor forces RawLocalFS to exec a process to get the > permissions > > > Key: HADOOP-14600 > URL: https://issues.apache.org/jira/browse/HADOOP-14600 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.7.3 > Environment: file:// in a dir with many files >Reporter: Steve Loughran >Assignee: Ping Liu > Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch, > HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch, > HADOOP-14600.006.patch, HADOOP-14600.007.patch, HADOOP-14600.008.patch, > HADOOP-14600.009.patch, TestRawLocalFileSystemContract.java > > > Reported in SPARK-21137. A {{FileSystem.listStatus}} call really crawls > against the local FS, because a {{FileStatus.getPermissions}} call forces > {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI > values. > That is: for every other FS, what's a field lookup or even a no-op, on the > local FS it's a process exec/spawn, with all the costs. This gets expensive > if you have many files. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
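To spell out the trade-off, a hedged sketch (not from the patch) of the {{listStatusIterator}} pattern: it avoids the per-file permission lookup on the local FS, but costs a second round trip per accepted file when the caller still wants locations on HDFS.
{code:title=Hedged sketch of the extra-RPC pattern}
import java.io.IOException;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class SplitScan {
  static void scan(FileSystem fs, Path dir) throws IOException {
    // Plain FileStatus: no exec-per-file on file://, but no locations either.
    RemoteIterator<FileStatus> it = fs.listStatusIterator(dir);
    while (it.hasNext()) {
      FileStatus st = it.next();
      if (st.isFile()) {
        // A second RPC per file on HDFS; cheap or meaningless on file://
        // and on object stores that fabricate locality.
        BlockLocation[] locs = fs.getFileBlockLocations(st, 0, st.getLen());
        // ... feed (st, locs) into split calculation ...
      }
    }
  }
}
{code}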
[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267241#comment-16267241 ] genericqa commented on HADOOP-14444: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HADOOP-14444 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HADOOP-14444 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12883510/HADOOP-14444.10.patch | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13751/console | | Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > New implementation of ftp and sftp filesystems > -- > > Key: HADOOP-14444 > URL: https://issues.apache.org/jira/browse/HADOOP-14444 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Affects Versions: 2.8.0 >Reporter: Lukas Waldmann >Assignee: Lukas Waldmann > Attachments: HADOOP-14444.10.patch, HADOOP-14444.2.patch, > HADOOP-14444.3.patch, HADOOP-14444.4.patch, HADOOP-14444.5.patch, > HADOOP-14444.6.patch, HADOOP-14444.7.patch, HADOOP-14444.8.patch, > HADOOP-14444.9.patch, HADOOP-14444.patch > > > The current implementation of the FTP and SFTP filesystems has severe limitations > and performance issues when dealing with a high number of files. My patch > solves those issues and integrates both filesystems in such a way that most of the > core functionality is common to both, thereby simplifying the > maintainability. > The core features: > * Support for HTTP/SOCKS proxies > * Support for passive FTP > * Support for explicit FTPS (SSL/TLS) > * Support of connection pooling - a new connection is not created for every > single command but reused from the pool. > For a huge number of files it shows an order-of-magnitude performance improvement > over non-pooled connections. > * Caching of directory trees. For ftp you always need to list the whole directory > whenever you ask for information about a particular file. > Again, for a huge number of files it shows an order-of-magnitude performance > improvement over non-cached connections. > * Support of keep-alive (NOOP) messages to avoid connection drops > * Support for Unix-style or regexp wildcard globs - useful for listing > particular files across a whole directory tree > * Support for reestablishing broken ftp data transfers - can happen > surprisingly often -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267229#comment-16267229 ] Steve Loughran commented on HADOOP-14444: - Quick review. Overall: looks like a good quality piece of code: well written, with tests, very nice docs, and lots of proxy support. * doesn't like applying to trunk; I had to fiddle with the poms, but there's also a clientkeystore file which I couldn't handle. Is that needed? Could it be generated? * doesn't compile either. Tests * why is the FTP test skipped on Windows? * we tend to use `setup()` `teardown()` as the @Before/@After operations in filesystem tests. Having standard names makes it more consistent when subclassing...and having >1 before/after method puts you into ambiguous ordering. Fix: change the names, subclass as appropriate, calling the superclass method as desired. * like what you've done with the mixin to reuse all the tests, but I'd prefer a name more unique to the FS than ContractTestBase. FTPContractTestMixin? * Never thought about having `AbstractFSContract createContract()` raise an IOE. We could add that to its signature (best in a separate JIRA) * You are importing the distcp tests but not using them. What's your plan there? Get this patch in and then add that as the next iteration? Docs * readme should go into src/site/org/apache/hadoop/ftpextended/index.md Misc minor points * need to rename AbstractFileSystem to a class name which isn't used elsewhere, e.g. AbstractFTPFileSystem * use try-with-resources around channel logic and have the implicit channel.close() do the disconnect * There are lots of opportunities to use subclasses of IOE where it is useful to provide more meaningful failures. * the style guidelines have conventions on import ordering we strive to maintain, especially on new code * hadoop code prefers a space after // in comments; a search & replace should fix that * org/apache/hadoop/fs/ftpextended/ftp/package-info.java should declare code as @Private+Unstable. Even if the FS is public, there's no API coming from this module, nor stability guarantees. * Unless it's going to leak passwords, error messages should try and include the filesystem URI in them. Why? It helps debugging when the job is working with >1 FS and all you have is a log to go on * When wrapping library exceptions (e.g. SFTP exceptions), always include the toString() value of the wrapped exception. It'll be the string most likely to make it to bug reports. * core-site.xml mentions s3 h3. Migration I don't know what the good migration story is here. Given how much better this is than the fairly basic ftp client there is today, we've no reason not to tell everyone to move to it. The hard part is how to do it: * this works best if the previous options still all work. h3. Security I'm moving to a world where we have to provide security audits of sensitive patches, which this is. What's the security mechanism here? * Is Configuration.getPassword() used to get secrets through JCEKS files? * I see that user:password is supported. I don't like this. I guess given it's only FTP it doesn't matter that much, but on SFTP it does * And on the topic of SFTP, what to do there? 
The docs will need a section on this, with the overall theme of telling people how to use it securely. > New implementation of ftp and sftp filesystems > -- > > Key: HADOOP-14444 > URL: https://issues.apache.org/jira/browse/HADOOP-14444 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Affects Versions: 2.8.0 >Reporter: Lukas Waldmann >Assignee: Lukas Waldmann > Attachments: HADOOP-14444.10.patch, HADOOP-14444.2.patch, > HADOOP-14444.3.patch, HADOOP-14444.4.patch, HADOOP-14444.5.patch, > HADOOP-14444.6.patch, HADOOP-14444.7.patch, HADOOP-14444.8.patch, > HADOOP-14444.9.patch, HADOOP-14444.patch > > > The current implementation of the FTP and SFTP filesystems has severe limitations > and performance issues when dealing with a high number of files. My patch > solves those issues and integrates both filesystems in such a way that most of the > core functionality is common to both, thereby simplifying the > maintainability. > The core features: > * Support for HTTP/SOCKS proxies > * Support for passive FTP > * Support for explicit FTPS (SSL/TLS) > * Support of connection pooling - a new connection is not created for every > single command but reused from the pool. > For a huge number of files it shows an order-of-magnitude performance improvement > over non-pooled connections. > * Caching of directory trees. For ftp you always need to list the whole directory > whenever you ask for information about a particular file. > Again, for a huge number of files it shows an order-of-magnitude performance > improvement over non-cached
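Two of the review points above can be shown in miniature. The channel type below is hypothetical (named {{FtpChannel}} purely for the sketch) and the config key is illustrative; the points are that an {{AutoCloseable}} whose {{close()}} does the disconnect lets callers use try-with-resources, and that {{Configuration.getPassword()}} lets secrets come from a JCEKS credential provider rather than inline XML.
{code:title=Hedged sketch: try-with-resources channel plus getPassword()}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

class FtpChannel implements AutoCloseable {          // hypothetical wrapper type
  void connect() { /* open, or borrow a pooled connection */ }
  @Override public void close() { disconnect(); }    // implicit disconnect
  private void disconnect() { /* return to the pool, or drop */ }
}

public class ChannelUsage {
  static void withChannel(Configuration conf) throws IOException {
    // Secrets via credential providers, not inline XML (illustrative key name):
    char[] password = conf.getPassword("fs.ftp.password.example.org");
    try (FtpChannel channel = new FtpChannel()) {
      channel.connect();
      // ... issue commands; close() disconnects even if an exception is thrown ...
    }
  }
}
{code}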
[jira] [Commented] (HADOOP-14976) Allow overriding HADOOP_SHELL_EXECNAME
[ https://issues.apache.org/jira/browse/HADOOP-14976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267146#comment-16267146 ] Arpit Agarwal commented on HADOOP-14976: Allen, do you have any comments on the updated patch or the test case? Thanks. > Allow overriding HADOOP_SHELL_EXECNAME > -- > > Key: HADOOP-14976 > URL: https://issues.apache.org/jira/browse/HADOOP-14976 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HADOOP-14976.01.patch, HADOOP-14976.02.patch, > HADOOP-14976.03.patch, HADOOP-14976.04.patch > > > Some Hadoop shell scripts infer their own name using this bit of shell magic: > {code} > 18 MYNAME="${BASH_SOURCE-$0}" > 19 HADOOP_SHELL_EXECNAME="${MYNAME##*/}" > {code} > e.g. see the > [hdfs|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs#L18] > script. > The inferred shell script name is later passed to _hadoop-functions.sh_ which > uses it to construct the names of some environment variables. E.g. when > invoking _hdfs datanode_, the options variable name is inferred as follows: > {code} > # HDFS + DATANODE + OPTS -> HDFS_DATANODE_OPTS > {code} > This works well if the calling script name is standard {{hdfs}} or {{yarn}}. > If a distribution renames the script to something like foo.bar, then the > variable names will be inferred as {{FOO.BAR_DATANODE_OPTS}}. This is not a > valid bash variable name. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enabled in the Hadoop metrics property file
[ https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267086#comment-16267086 ] Sean Mackrory commented on HADOOP-14475: {quote}I think we'd probably be better off with hostname than full fsURI. The URI will include the schema, and may, if we don't sanitise it properly, include user:password secrets. Using bucket only will keep things more consistent with other code{quote} Absolutely! I had thought about that but forgotten; changing to just use the hostname now. {quote}azure just give the registry name to the record{quote} Looks to me like azure tags it with the account name, container name, and a random ID unique to the instance. This is very similar to what we're doing. {quote}I think we don't need to add name.getHost to record name{quote} I agree. But why does the metrics source name need it then? Unless there may be multiple record types from the same source, I don't think I understand the purpose, so maybe I'm missing something. If we could include the local hostname (e.g. the cluster node this metric came from) that would be more useful... Any thoughts? > Metrics of S3A don't print out when enabled in the Hadoop metrics property file > -- > > Key: HADOOP-14475 > URL: https://issues.apache.org/jira/browse/HADOOP-14475 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 > Environment: uname -a > Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 > x86_64 x86_64 x86_64 GNU/Linux > cat /etc/issue > Ubuntu 16.04.2 LTS \n \l >Reporter: Yonger >Assignee: Yonger > Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch, > HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch, > HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch, > HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14775.007.patch, > failsafe-report-s3a-it.html, failsafe-report-s3a-scale.html, > failsafe-report-scale.html, failsafe-report-scale.zip, s3a-metrics.patch1, > stdout.zip > > > *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink > #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > #*.sink.influxdb.url=http:/xx > #*.sink.influxdb.influxdb_port=8086 > #*.sink.influxdb.database=hadoop > #*.sink.influxdb.influxdb_username=hadoop > #*.sink.influxdb.influxdb_password=hadoop > #*.sink.ingluxdb.cluster=c1 > *.period=10 > #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out > I can't find the output file even when I run an MR job which should use s3. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
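As a sketch of the direction being discussed (hedged; the names mirror the log lines quoted earlier, but this is not the committed patch), metrics2 lets the source attach tags such as the bucket, the per-instance id and, answering the question above, the local hostname:
{code:title=Hedged sketch of tagging the S3A metrics source}
import java.net.InetAddress;
import java.net.UnknownHostException;
import org.apache.hadoop.metrics2.lib.MetricsRegistry;

public class S3AMetricsTags {
  static MetricsRegistry buildRegistry(String bucket, String fsId)
      throws UnknownHostException {
    MetricsRegistry registry = new MetricsRegistry("s3aFileSystem");
    registry.tag("bucket", "Hosting bucket", bucket);
    registry.tag("s3aFileSystemId", "Unique filesystem instance id", fsId);
    // A per-node tag would identify which cluster node a record came from:
    registry.tag("hostname", "Local hostname",
        InetAddress.getLocalHost().getHostName());
    return registry;
  }
}
{code}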
[jira] [Commented] (HADOOP-15071) s3a troubleshooting docs to add a couple more failure modes
[ https://issues.apache.org/jira/browse/HADOOP-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267073#comment-16267073 ] genericqa commented on HADOOP-15071: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 26m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 6 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 2s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 38m 33s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HADOOP-15071 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12899437/HADOOP-15071-001.patch | | Optional Tests | asflicense mvnsite | | uname | Linux 0e75b2aea180 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 2bde3ae | | maven | version: Apache Maven 3.3.9 | | whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/13750/artifact/out/whitespace-eol.txt | | Max. process+thread count | 341 (vs. ulimit of 5000) | | modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13750/console | | Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. 
> s3a troubleshooting docs to add a couple more failure modes > --- > > Key: HADOOP-15071 > URL: https://issues.apache.org/jira/browse/HADOOP-15071 > Project: Hadoop Common > Issue Type: Sub-task > Components: documentation, fs/s3 >Affects Versions: 2.8.2 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-15071-001.patch > > > I've got some more troubleshooting entries to add -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15071) s3a troubleshooting docs to add a couple more failure modes
[ https://issues.apache.org/jira/browse/HADOOP-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15071: Attachment: HADOOP-15071-001.patch Patch 001: * running out of pooled connections * auth error on a PUT (while reads are OK) > s3a troubleshooting docs to add a couple more failure modes > --- > > Key: HADOOP-15071 > URL: https://issues.apache.org/jira/browse/HADOOP-15071 > Project: Hadoop Common > Issue Type: Sub-task > Components: documentation, fs/s3 >Affects Versions: 2.8.2 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-15071-001.patch > > > I've got some more troubleshooting entries to add -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15071) s3a troubleshooting docs to add a couple more failure modes
[ https://issues.apache.org/jira/browse/HADOOP-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15071: Status: Patch Available (was: Open) > s3a troubleshooting docs to add a couple more failure modes > --- > > Key: HADOOP-15071 > URL: https://issues.apache.org/jira/browse/HADOOP-15071 > Project: Hadoop Common > Issue Type: Sub-task > Components: documentation, fs/s3 >Affects Versions: 2.8.2 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-15071-001.patch > > > I've got some more troubleshooting entries to add -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14971) Merge S3A committers into trunk
[ https://issues.apache.org/jira/browse/HADOOP-14971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267005#comment-16267005 ] ASF GitHub Bot commented on HADOOP-14971: - Github user steveloughran closed the pull request at: https://github.com/apache/hadoop/pull/282 > Merge S3A committers into trunk > --- > > Key: HADOOP-14971 > URL: https://issues.apache.org/jira/browse/HADOOP-14971 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 3.1.0 > > Attachments: HADOOP-13786-040.patch, HADOOP-13786-041.patch > > > Merge the HADOOP-13786 committer into trunk. This branch is being set up as a > github PR for review there & to keep it out the mailboxes of the watchers on > the main JIRA -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15071) s3a troubleshooting docs to add a couple more failure modes
Steve Loughran created HADOOP-15071: --- Summary: s3a troubleshooting docs to add a couple more failure modes Key: HADOOP-15071 URL: https://issues.apache.org/jira/browse/HADOOP-15071 Project: Hadoop Common Issue Type: Sub-task Components: documentation, fs/s3 Affects Versions: 2.8.2 Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor I've got some more troubleshooting entries to add -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15070) add test to verify FileSystem and paths differentiate on user info
[ https://issues.apache.org/jira/browse/HADOOP-15070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266942#comment-16266942 ] genericqa commented on HADOOP-15070: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 50s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 31s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 37s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch generated 2 new + 8 unchanged - 6 fixed = 10 total (was 14) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 46s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 38s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 89m 25s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HADOOP-15070 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12899394/HADOOP-15070-001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 82086d40498b 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 2bde3ae | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/13749/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13749/testReport/ | | Max. process+thread count | 1767 (vs. ulimit of 5000) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13749/console | | Powered by | Apache Yetus
[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enabled in the Hadoop metrics property file
[ https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266866#comment-16266866 ] Yonger commented on HADOOP-14475: - Azure just gives the registry name to the record, which is confusing. We can't distinguish the different records if there is more than one related metric source registered within a process, as in the example I gave in previous comments. > Metrics of S3A don't print out when enabled in the Hadoop metrics property file > -- > > Key: HADOOP-14475 > URL: https://issues.apache.org/jira/browse/HADOOP-14475 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 > Environment: uname -a > Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 > x86_64 x86_64 x86_64 GNU/Linux > cat /etc/issue > Ubuntu 16.04.2 LTS \n \l >Reporter: Yonger >Assignee: Yonger > Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch, > HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch, > HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch, > HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14775.007.patch, > failsafe-report-s3a-it.html, failsafe-report-s3a-scale.html, > failsafe-report-scale.html, failsafe-report-scale.zip, s3a-metrics.patch1, > stdout.zip > > > *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink > #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > #*.sink.influxdb.url=http:/xx > #*.sink.influxdb.influxdb_port=8086 > #*.sink.influxdb.database=hadoop > #*.sink.influxdb.influxdb_username=hadoop > #*.sink.influxdb.influxdb_password=hadoop > #*.sink.ingluxdb.cluster=c1 > *.period=10 > #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out > I can't find the output file even when I run an MR job which should use s3. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266856#comment-16266856 ] Yonger commented on HADOOP-14943: - [~ste...@apache.org] I remember there were some discussions about how to configure the fake host list, such as returning the endpoint, the compute hosts, or a star; is this right? I am not sure whether I understand these points fully. I just tested these four cases with a 1TB dataset on query42 of TPC-DS; results are below (seconds): ||default localhost||endpoint||star||compute host list|| |16|16l 16|28| From this result, performance is equal in these cases except when returning the compute host list. > Add common getFileBlockLocations() emulation for object stores, including S3A > - > > Key: HADOOP-14943 > URL: https://issues.apache.org/jira/browse/HADOOP-14943 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, > HADOOP-14943-002.patch, HADOOP-14943-003.patch > > > It looks suspiciously like S3A isn't providing the partitioning data needed > in {{listLocatedStatus}} and {{getFileBlockLocations()}} needed to break up a > file by the blocksize. This will stop tools using the MRv1 APIs doing the > partitioning properly if the input format isn't doing its own split logic. > FileInputFormat in MRv2 is a bit more configurable about input split > calculation & will split up large files, but otherwise, the partitioning is > being done more by the default values of the executing engine, rather than > any config data from the filesystem about what its "block size" is. > NativeAzureFS does a better job; maybe that could be factored out to > hadoop-common and reused? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266856#comment-16266856 ] Yonger edited comment on HADOOP-14943 at 11/27/17 2:15 PM: --- [~ste...@apache.org] I remember there were some discussions about how to configure the fake host list, such as returning the endpoint, the compute hosts, or a star; is this right? I am not sure whether I understand these points fully. I just tested these four cases with a 1TB dataset on query42 of TPC-DS; results are below (seconds): ||default localhost||endpoint||star||compute host list|| |16|16|16|28| From this result, performance is equal in these cases except when returning the compute host list. was (Author: iyonger): [~ste...@apache.org] I remember there were some discussions about how to configure the fake host list, such as returning the endpoint, the compute hosts, or a star; is this right? I am not sure whether I understand these points fully. I just tested these four cases with a 1TB dataset on query42 of TPC-DS; results are below (seconds): ||default localhost||endpoint||star||compute host list|| |16|16l 16|28| From this result, performance is equal in these cases except when returning the compute host list. > Add common getFileBlockLocations() emulation for object stores, including S3A > - > > Key: HADOOP-14943 > URL: https://issues.apache.org/jira/browse/HADOOP-14943 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, > HADOOP-14943-002.patch, HADOOP-14943-003.patch > > > It looks suspiciously like S3A isn't providing the partitioning data needed > in {{listLocatedStatus}} and {{getFileBlockLocations()}} needed to break up a > file by the blocksize. This will stop tools using the MRv1 APIs doing the > partitioning properly if the input format isn't doing its own split logic. > FileInputFormat in MRv2 is a bit more configurable about input split > calculation & will split up large files, but otherwise, the partitioning is > being done more by the default values of the executing engine, rather than > any config data from the filesystem about what its "block size" is. > NativeAzureFS does a better job; maybe that could be factored out to > hadoop-common and reused? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
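For context on what the benchmark above exercises, a hedged sketch (not HADOOP-14943 itself): an object store has no real topology, so the emulation fabricates one location per "block" of the configured size, and the host array is exactly what was varied in the four cases ("localhost", the endpoint, "*", or the compute hosts).
{code:title=Hedged sketch of fake block location emulation}
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;

public class FakeLocations {
  static BlockLocation[] fakeLocations(FileStatus st, long blockSize,
      String[] hosts) {
    int count = (int) Math.max(1, (st.getLen() + blockSize - 1) / blockSize);
    BlockLocation[] locs = new BlockLocation[count];
    for (int i = 0; i < count; i++) {
      long offset = i * blockSize;
      long length = Math.min(blockSize, st.getLen() - offset);
      // names and hosts are both faked; nothing is actually local.
      locs[i] = new BlockLocation(hosts, hosts, offset, length);
    }
    return locs;
  }
}
{code}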
[jira] [Updated] (HADOOP-15070) add test to verify FileSystem and paths differentiate on user info
[ https://issues.apache.org/jira/browse/HADOOP-15070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15070: Attachment: HADOOP-15070-001.patch Patch 001: add a new comparison test verifying that cache keys include userInfo. Also: update TestFileSystemCaching to Java 7/8 idioms, factoring out common operations > add test to verify FileSystem and paths differentiate on user info > -- > > Key: HADOOP-15070 > URL: https://issues.apache.org/jira/browse/HADOOP-15070 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, test >Affects Versions: 2.8.2 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-15070-001.patch > > > Add a test to verify that userinfo data is (correctly) used to differentiate > the entries in the FS cache, so they are treated as different filesystems. > * This is critical for wasb, which uses the username to identify the > container, in a path like {{wasb:contain...@stevel.azure.net}}. This works in > Hadoop, but SPARK-22587 shows that it may not be followed everywhere (and > given there's no documentation, who can fault them?) > * AbstractFileSystem.checkPath looks suspiciously like its path validation > just checks host, not authority. That needs a test too. > * And we should cut the @LimitedPrivate(HDFS, Mapreduce) from > Path.makeQualified. If MR needs it, it should be considered open to all apps > using the Hadoop APIs. Until I looked at the code I thought it was... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
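A minimal sketch of the kind of assertion such a test makes (hedged; it reuses the {{cachedfile}} trick from the existing TestFileSystemCaching rather than quoting the patch): two URIs differing only in userInfo must yield two distinct cached instances.
{code:title=Hedged sketch of a userInfo cache-key test}
import static org.junit.Assert.assertNotSame;

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.junit.Test;

public class TestUserInfoCacheKeys {
  @Test
  public void testUserInfoDifferentiatesCacheEntries() throws Exception {
    Configuration conf = new Configuration();
    // Map a scratch scheme onto the local FS so no external service is needed.
    conf.set("fs.cachedfile.impl",
        FileSystem.getFileSystemClass("file", null).getName());
    FileSystem alice = FileSystem.get(new URI("cachedfile://alice@host/"), conf);
    FileSystem bob = FileSystem.get(new URI("cachedfile://bob@host/"), conf);
    assertNotSame("different userInfo must not share a cache entry", alice, bob);
  }
}
{code}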
[jira] [Updated] (HADOOP-15070) add test to verify FileSystem and paths differentiate on user info
[ https://issues.apache.org/jira/browse/HADOOP-15070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15070: Status: Patch Available (was: Open) > add test to verify FileSystem and paths differentiate on user info > -- > > Key: HADOOP-15070 > URL: https://issues.apache.org/jira/browse/HADOOP-15070 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, test >Affects Versions: 2.8.2 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-15070-001.patch > > > Add a test to verify that userinfo data is (correctly) used to differentiate > the entries in the FS cache, so that they are treated as different filesystems. > * This is critical for wasb, which uses the username to identify the > container, in a path like {{wasb:contain...@stevel.azure.net}}. This works in > Hadoop, but SPARK-22587 shows that it may not be followed everywhere (and > given there's no documentation, who can fault them?) > * AbstractFileSystem.checkPath looks suspiciously like its path validation > just checks the host, not the authority. That needs a test too. > * And we should cut the @LimitedPrivate(HDFS, Mapreduce) from > Path.makeQualified. If MR needs it, it should be considered open to all apps > using the Hadoop APIs. Until I looked at the code I thought it was... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15069) support git-secrets commit hook to keep AWS secrets out of git
[ https://issues.apache.org/jira/browse/HADOOP-15069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266708#comment-16266708 ] Steve Loughran commented on HADOOP-15069: - No, only those lines were autogenerated. The rest were built by trial and error: running the script and seeing what failed. The regexp and those strings are enough to keep the current source code and any new commits happy. The regexp didn't work for old repos, so I tried inserting the explicit strings, but eventually gave up. The key thing with this file is that if the user installs the git-secrets hook and registers the AWS patterns, then secrets are kept out of the source tree. > support git-secrets commit hook to keep AWS secrets out of git > -- > > Key: HADOOP-15069 > URL: https://issues.apache.org/jira/browse/HADOOP-15069 > Project: Hadoop Common > Issue Type: Sub-task > Components: build >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-15069-001.patch, HADOOP-15069-002.patch > > > The latest Uber breach looks like it involved AWS keys in git repos. > Nobody wants that, which is why Amazon provides > [git-secrets|https://github.com/awslabs/git-secrets]; a script you can use to > scan a repo and its history, *and* add as an automated check. > Anyone can set this up, but there are a few false positives in the scan, > mostly from longs and a few all-upper-case constants. These can all be added > to the list of allowed patterns. > Also: mention git-secrets in the AWS testing docs; say "use it" -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
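For anyone who wants to reproduce the setup locally, the upstream git-secrets workflow is roughly as follows. These are the standard commands from the git-secrets README; the extra allowed patterns shipped with the patch are not shown here:

{code}
# From inside a clone of the repo:
git secrets --install          # add the commit/commit-msg hooks to this repo
git secrets --register-aws     # register the stock AWS key/secret patterns

# One-off scans: the working tree, then the full history (slower, and the
# source of the false positives discussed above).
git secrets --scan
git secrets --scan-history

# False positives can be whitelisted explicitly:
git secrets --add --allowed '<pattern>'
{code}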
[jira] [Created] (HADOOP-15070) add test to verify FileSystem and paths differentiate on user info
Steve Loughran created HADOOP-15070: --- Summary: add test to verify FileSystem and paths differentiate on user info Key: HADOOP-15070 URL: https://issues.apache.org/jira/browse/HADOOP-15070 Project: Hadoop Common Issue Type: Improvement Components: fs, test Affects Versions: 2.8.2 Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Add a test to verify that userinfo data is (correctly) used to differentiate the entries in the FS cache, so that they are treated as different filesystems. * This is critical for wasb, which uses the username to identify the container, in a path like {{wasb:contain...@stevel.azure.net}}. This works in Hadoop, but SPARK-22587 shows that it may not be followed everywhere (and given there's no documentation, who can fault them?) * AbstractFileSystem.checkPath looks suspiciously like its path validation just checks the host, not the authority. That needs a test too. * And we should cut the @LimitedPrivate(HDFS, Mapreduce) from Path.makeQualified. If MR needs it, it should be considered open to all apps using the Hadoop APIs. Until I looked at the code I thought it was... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org