[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2017-11-27 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268253#comment-16268253
 ] 

Xiao Chen commented on HADOOP-14445:


Thanks all for the discussion on this, and Rushabh for the demonstration 
patch.
I've read through the comments on this one as well as HADOOP-14441. The patch 
would also take care of HADOOP-14134.

To summarize and make sure my understanding is correct:
- We all agree that token sharing is an issue.
- For HDFS clients, the kp uri is already provided by the NN (already done by 
HADOOP-14104).
- The token should be recognized from the UGI credentials when clients try to 
authenticate (the reason for HADOOP-14441).
- Token renewal currently reads configs. We'd want this to be read from the 
token, and make sure it's HA'ed (the reason for HADOOP-14134).
- Besides the ip:port of the KMS instance, there's also information about 
whether it's http or https. We'd need a way to get this information during 
renewal/cancellation.
- Backwards compatibility: old clients should work with new servers; new 
clients should work with old servers. 

Having a way to let clients work without reading configs feels to me to be a 
better approach, and more in line with HADOOP-14104 and the 'transparent' name 
of HDFS encryption. It feels to me a 'nameservice' solution doesn't add much 
value over the current way of using the full kp uri, which by itself is a 
representation of the available services. For the added-KMS scenario, I think 
it's fine to let existing tokens use the existing instances in that token's uri 
- presumably this is better than deploying the config and restarting the client. 
For removed KMS instances, LBKMSCP handles it.

It seems to me the patch here has handled most of this pretty well. I plan to 
take a crack at this tomorrow to:
- rebase to trunk
- address some earlier comments
- modify with some of my thoughts: it seems we can use LBKMSCP to store the full 
provider uri (a rough sketch of that idea is below); and add some tests to 
verify compatibility
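
To illustrate the last point, here is a minimal, hypothetical sketch (not part 
of the existing patch; the helper names are made up for discussion) of what 
storing the full provider uri in the token's service field could look like, so 
that renewal/cancellation can rebuild the provider from the token instead of 
reading configs:

{code:java}
import java.net.URI;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.Token;

// Hypothetical helpers, for discussion only: keep the full kp uri
// (e.g. kms://https@host1;host2:9600/kms) in the token service so the
// renewer/canceller needs no client-side config to find the KMS instances.
class KmsTokenServiceSketch {
  static void setServiceToProviderUri(Token<?> token, URI providerUri) {
    token.setService(new Text(providerUri.toString()));
  }

  static URI providerUriFromToken(Token<?> token) {
    return URI.create(token.getService().toString());
  }
}
{code}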

If you have any comments, please let me know. 

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation, kms
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: Rushabh S Shah
> Attachments: HADOOP-14445-branch-2.8.patch
>
>
> As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider does 
> not share delegation tokens (a client uses the KMS address/port as the key for 
> the delegation token).
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
> InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
> url.getPort());
> Text service = SecurityUtil.buildTokenService(serviceAddr);
> dToken = creds.getToken(service);
> {code}
> But KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation 
> tokens too.
> Under HA, A KMS instance must verify the delegation token given by another 
> KMS instance, by checking the shared secret used to sign the delegation 
> token. To do this, all KMS instances must be able to retrieve the shared 
> secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share 
> delegation tokens.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15070) add test to verify FileSystem and paths differentiate on user info

2017-11-27 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268224#comment-16268224
 ] 

Mingliang Liu commented on HADOOP-15070:


+1

> add test to verify FileSystem and paths differentiate on user info
> --
>
> Key: HADOOP-15070
> URL: https://issues.apache.org/jira/browse/HADOOP-15070
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 2.8.2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-15070-001.patch
>
>
> Add a test to verify that userinfo data is (correctly) used to differentiate 
> the entries in the FS cache, so they are treated as different filesystems.
> * This is critical for wasb, which uses the username to identify the 
> container, in a path like {{wasb:contain...@stevel.azure.net}}. This works in 
> Hadoop, but SPARK-22587 shows that it may not be followed everywhere (and 
> given there's no documentation, who can fault them?)
> * AbstractFileSystem.checkPath looks suspiciously like its path validation 
> just checks host, not authority. That needs a test too.
> * And we should cut the @LimitedPrivate(HDFS, Mapreduce) from 
> Path.makeQualified. If MR needs it, it should be considered open to all apps 
> using the Hadoop APIs. Until I looked at the code I thought it was...
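
As a rough illustration of the proposed test (a hedged sketch only; the scheme, 
names and assertion are placeholders, not the attached patch), two URIs 
differing only in userinfo should resolve to distinct cached FileSystem 
instances:

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.junit.Assert;
import org.junit.Test;

public class TestUserInfoInFsCacheSketch {
  @Test
  public void userInfoDifferentiatesCachedFileSystems() throws Exception {
    Configuration conf = new Configuration();
    // FileSystem.CACHE keys on scheme + authority (+ UGI); the authority
    // includes the userinfo, so these should be two distinct instances.
    FileSystem fsAlice = FileSystem.get(new URI("file://alice@localhost/"), conf);
    FileSystem fsBob = FileSystem.get(new URI("file://bob@localhost/"), conf);
    Assert.assertNotSame(
        "URIs differing only in userinfo must not share a cached FileSystem",
        fsAlice, fsBob);
  }
}
{code}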



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-14699) Impersonation errors with UGI after second principal relogin

2017-11-27 Thread Jeff Storck (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239746#comment-16239746
 ] 

Jeff Storck edited comment on HADOOP-14699 at 11/28/17 6:05 AM:


[~jnp] Please take a look at the [test 
code|https://github.com/jtstorck/ugi-test] I have provided.  It shows a 
simplified scenario (inspired by a use case in NiFi) that causes the 
impersonation error.  If two instantiations of the UGI class are used to 
represent two users, the impersonation error will occur on the relogin of the 
second user, provided that Hadoop is not configured to allow the impersonation. 
This use case of UGI occurs in NiFi when the Kerberos credentials in a Hadoop 
processor are changed from one user to another, with no intention of proxying a 
user.
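
For readers without the repo handy, here is a condensed, hypothetical sketch of 
the scenario (the principals, keytab paths and HDFS call are placeholders; the 
real code is in the linked project):

{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class TwoPrincipalSketch {
  public static void main(String[] args) throws Exception {
    // Two independent keytab logins through the same UGI class/classloader,
    // e.g. a processor whose credentials are changed from one user to another.
    UserGroupInformation first = UserGroupInformation
        .loginUserFromKeytabAndReturnUGI("usera@EXAMPLE.COM", "/path/to/usera.keytab");
    first.doAs((PrivilegedExceptionAction<Void>) () -> {
      FileSystem.get(new Configuration()).getFileStatus(new Path("/"));
      return null;
    });

    UserGroupInformation second = UserGroupInformation
        .loginUserFromKeytabAndReturnUGI("userb@EXAMPLE.COM", "/path/to/userb.keytab");
    // Relogin of the second principal, then the same call under doAs(); per
    // this report it fails with an impersonation error even though no
    // proxy-user behaviour was requested.
    second.checkTGTAndReloginFromKeytab();
    second.doAs((PrivilegedExceptionAction<Void>) () -> {
      FileSystem.get(new Configuration()).getFileStatus(new Path("/"));
      return null;
    });
  }
}
{code}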


was (Author: jtstorck):
[~jnp] Please take a look at the [test 
code|https://github.com/jtstorck/kerberos-examples/tree/master/hadoop/ugi-test] 
I have provided.  It shows a simplified scenario (inspired by a use case in 
NiFi) that causes the impersonation error.  If two instantiations of the UGI 
class are used to represent two users, the impersonation error will occur on 
the relogin of the second user, provided that Hadoop is not configured to allow 
the impersonation. This use case of UGI occurs in NiFi when the Kerberos 
credentials in a Hadoop processor are changed from one user to another, with no 
intention of proxying a user.

> Impersonation errors with UGI after second principal relogin
> 
>
> Key: HADOOP-14699
> URL: https://issues.apache.org/jira/browse/HADOOP-14699
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 2.6.2, 2.7.3, 2.8.1
>Reporter: Jeff Storck
>
> Multiple principals that are logged in using UGI instances that are 
> instantiated from a UGI class loaded by the same classloader will encounter 
> problems when the second principal attempts to relogin and perform an action 
> using a UGI.doAs().  An impersonation will occur and the operation attempted 
> by the second principal after relogging in will fail.  There should not be an 
> implicit attempt to impersonate the second principal through the first 
> principal that logged in.
> I have created  a GitHub project that exhibits the impersonation error with 
> brief instructions on how to set up for the test and run it: 
> https://github.com/jtstorck/ugi-test
> {noformat}18:44:55.687 [pool-2-thread-2] WARN  
> h.u.u.ugirunnable.ugite...@example.com - Unexpected exception while 
> performing task for [ugite...@example.com (auth:KERBEROS)]
> org.apache.hadoop.ipc.RemoteException: User: ugite...@example.com is not 
> allowed to impersonate ugite...@example.com
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1481)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1427)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1337)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:787)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335)
>   at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1700)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1436)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1433)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1448)
>   at 
> 

[jira] [Updated] (HADOOP-14699) Impersonation errors with UGI after second principal relogin

2017-11-27 Thread Jeff Storck (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Storck updated HADOOP-14699:
-
Description: 
Multiple principals that are logged in using UGI instances that are 
instantiated from a UGI class loaded by the same classloader will encounter 
problems when the second principal attempts to relogin and perform an action 
using a UGI.doAs().  An impersonation will occur and the operation attempted by 
the second principal after relogging in will fail.  There should not be an 
implicit attempt to impersonate the second principal through the first 
principal that logged in.

I have created  a GitHub project that exhibits the impersonation error with 
brief instructions on how to set up for the test and run it: 
https://github.com/jtstorck/ugi-test

{noformat}18:44:55.687 [pool-2-thread-2] WARN  
h.u.u.ugirunnable.ugite...@example.com - Unexpected exception while performing 
task for [ugite...@example.com (auth:KERBEROS)]
org.apache.hadoop.ipc.RemoteException: User: ugite...@example.com is not 
allowed to impersonate ugite...@example.com
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1481)
at org.apache.hadoop.ipc.Client.call(Client.java:1427)
at org.apache.hadoop.ipc.Client.call(Client.java:1337)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:787)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335)
at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1700)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1436)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1433)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1448)
at 
hadoop.ugitest.UgiTestMain$UgiRunnable.lambda$run$2(UgiTestMain.java:194)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at hadoop.ugitest.UgiTestMain$UgiRunnable.run(UgiTestMain.java:194)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745){noformat}

  was:
Multiple principals that are logged in using UGI instances that are 
instantiated from a UGI class loaded by the same classloader will encounter 
problems when the second principal attempts to relogin and perform an action 
using a UGI.doAs().  An impersonation will occur and the operation attempted by 
the second principal after relogging in will fail.  There should not be an 
implicit attempt to impersonate the second principal through the first 
principal that logged in.

I have created  a GitHub project that exhibits the impersonation error with 
brief instructions on how to set up for the test and run it: 
https://github.com/jtstorck/kerberos-examples/tree/master/hadoop/ugi-test

{noformat}18:44:55.687 [pool-2-thread-2] WARN  
h.u.u.ugirunnable.ugite...@example.com - Unexpected 

[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file

2017-11-27 Thread Yonger (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268148#comment-16268148
 ] 

Yonger commented on HADOOP-14475:
-

[~mackrorysd] Thank you.  I'm starting to understand your logic, and I think you 
are right.
{quote}
There's no guarantee that metrics source names would even be consistent among 
all JVMs for a given bucket, since they're assigned numbers in the order that 
they're created
{quote}
I can get the info I want by aggregating on the bucket instead of the metric 
source name, which is still not unique across multiple JVM processes.


> Metrics of S3A don't print out  when enable it in Hadoop metrics property file
> --
>
> Key: HADOOP-14475
> URL: https://issues.apache.org/jira/browse/HADOOP-14475
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: uname -a
> Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 
> x86_64 x86_64 x86_64 GNU/Linux
>  cat /etc/issue
> Ubuntu 16.04.2 LTS \n \l
>Reporter: Yonger
>Assignee: Yonger
> Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch, 
> HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch, 
> HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch, 
> HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14475.014.patch, 
> HADOOP-14475.015.patch, HADOOP-14775.007.patch, failsafe-report-s3a-it.html, 
> failsafe-report-s3a-scale.html, failsafe-report-scale.html, 
> failsafe-report-scale.zip, s3a-metrics.patch1, stdout.zip
>
>
> *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
> #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #*.sink.influxdb.url=http:/xx
> #*.sink.influxdb.influxdb_port=8086
> #*.sink.influxdb.database=hadoop
> #*.sink.influxdb.influxdb_username=hadoop
> #*.sink.influxdb.influxdb_password=hadoop
> #*.sink.ingluxdb.cluster=c1
> *.period=10
> #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out
> I can't find the output file even when I run an MR job which should use s3.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions

2017-11-27 Thread Ping Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268098#comment-16268098
 ] 

Ping Liu commented on HADOOP-14600:
---

Yes, Chris.  I am verifying the patch.  There is an issue I just found tonight in 
my Linux environment.  In TestRawLocalFileSystemContract.testPermission(), the 
native call failed with {{java.lang.UnsatisfiedLinkError: 
org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat;}}.
  I'll look into it further.  I guess it is due to my last change.  I'll come 
back with an update.

> LocatedFileStatus constructor forces RawLocalFS to exec a process to get the 
> permissions
> 
>
> Key: HADOOP-14600
> URL: https://issues.apache.org/jira/browse/HADOOP-14600
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.7.3
> Environment: file:// in a dir with many files
>Reporter: Steve Loughran
>Assignee: Ping Liu
> Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch, 
> HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch, 
> HADOOP-14600.006.patch, HADOOP-14600.007.patch, HADOOP-14600.008.patch, 
> HADOOP-14600.009.patch, TestRawLocalFileSystemContract.java
>
>
> Reported in SPARK-21137. A {{FileSystem.listStatus}} call really crawls 
> against the local FS, because the {{FileStatus.getPermission}} call forces 
> {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI 
> values.
> That is: what's a field lookup or even a no-op for every other FS is, on the 
> local FS, a process exec/spawn, with all the costs. This gets expensive 
> if you have many files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file

2017-11-27 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268073#comment-16268073
 ] 

Sean Mackrory edited comment on HADOOP-14475 at 11/28/17 4:07 AM:
--

[~iyonger] I don't believe I've changed anything that would affect whether or 
not the metrics source name shows up in the sink's output - I've only changed 
what the source name would be. I think the format even when I tested your 
original patch was pretty much as it is above, and did not include the source 
name, only the record name (which happened to be similar). If you're wanting to 
aggregate based on the bucket, I would use the bucket field itself. There's no 
guarantee that metrics source names would even be consistent among all JVMs for 
a given bucket, since they're assigned numbers in the order that they're 
created - that would only be true if every JVM had accessed the exact same 
buckets in the exact same order - the assumption would break down as soon as a 
job didn't utilize the entire cluster or a node was down during a job, etc.


was (Author: mackrorysd):
[~iyonger] I don't believe I've changed anything that would affect whether or 
not the metrics source name shows up in the sink's output - I've only changed 
what the source name would be. I think the format even when I tested your 
original patch was pretty much as it is above, and did not include the source 
name, only the record name (which happened to be similar). If you're wanting to 
aggregate based on the bucket, I would use the bucket field itself. There's no 
guarantee that metrics source names would even be consistent among all JVMs for 
a given bucket - that would only be true if every JVM had accessed the exact 
same buckets in the exact same order - the assumption would break down as soon 
as a job didn't utilize the entire cluster or a node was down during a job, etc.

> Metrics of S3A don't print out  when enable it in Hadoop metrics property file
> --
>
> Key: HADOOP-14475
> URL: https://issues.apache.org/jira/browse/HADOOP-14475
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: uname -a
> Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 
> x86_64 x86_64 x86_64 GNU/Linux
>  cat /etc/issue
> Ubuntu 16.04.2 LTS \n \l
>Reporter: Yonger
>Assignee: Yonger
> Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch, 
> HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch, 
> HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch, 
> HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14475.014.patch, 
> HADOOP-14475.015.patch, HADOOP-14775.007.patch, failsafe-report-s3a-it.html, 
> failsafe-report-s3a-scale.html, failsafe-report-scale.html, 
> failsafe-report-scale.zip, s3a-metrics.patch1, stdout.zip
>
>
> *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
> #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #*.sink.influxdb.url=http:/xx
> #*.sink.influxdb.influxdb_port=8086
> #*.sink.influxdb.database=hadoop
> #*.sink.influxdb.influxdb_username=hadoop
> #*.sink.influxdb.influxdb_password=hadoop
> #*.sink.ingluxdb.cluster=c1
> *.period=10
> #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out
> I can't find the output file even when I run an MR job which should use s3.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file

2017-11-27 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268073#comment-16268073
 ] 

Sean Mackrory commented on HADOOP-14475:


[~iyonger] I don't believe I've changed anything that would affect whether or 
not the metrics source name shows up in the sink's output - I've only changed 
what the source name would be. I think the format even when I tested your 
original patch was pretty much as it is above, and did not include the source 
name, only the record name (which happened to be similar). If you're wanting to 
aggregate based on the bucket, I would use the bucket field itself. There's no 
guarantee that metrics source names would even be consistent among all JVMs for 
a given bucket - that would only be true if every JVM had accessed the exact 
same buckets in the exact same order - the assumption would break down as soon 
as a job didn't utilize the entire cluster or a node was down during a job, etc.

> Metrics of S3A don't print out  when enable it in Hadoop metrics property file
> --
>
> Key: HADOOP-14475
> URL: https://issues.apache.org/jira/browse/HADOOP-14475
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: uname -a
> Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 
> x86_64 x86_64 x86_64 GNU/Linux
>  cat /etc/issue
> Ubuntu 16.04.2 LTS \n \l
>Reporter: Yonger
>Assignee: Yonger
> Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch, 
> HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch, 
> HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch, 
> HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14475.014.patch, 
> HADOOP-14475.015.patch, HADOOP-14775.007.patch, failsafe-report-s3a-it.html, 
> failsafe-report-s3a-scale.html, failsafe-report-scale.html, 
> failsafe-report-scale.zip, s3a-metrics.patch1, stdout.zip
>
>
> *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
> #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #*.sink.influxdb.url=http:/xx
> #*.sink.influxdb.influxdb_port=8086
> #*.sink.influxdb.database=hadoop
> #*.sink.influxdb.influxdb_username=hadoop
> #*.sink.influxdb.influxdb_password=hadoop
> #*.sink.ingluxdb.cluster=c1
> *.period=10
> #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out
> I can't find the output file even when I run an MR job which should use s3.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file

2017-11-27 Thread Yonger (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268051#comment-16268051
 ] 

Yonger commented on HADOOP-14475:
-

[~mackrorysd] So our final output record format doesn't include any metric 
source info? If so, I think it isn't friendly for building statistics charts, 
e.g. via InfluxDB+Grafana, because only fsId is unique in a record. In 
particular, there can be multiple metric sources registered with the same 
bucket (I don't know why, but they existed in my test), and their output 
records can't easily be distinguished except by fsId, which makes the chart in 
Grafana hard to read.

> Metrics of S3A don't print out  when enable it in Hadoop metrics property file
> --
>
> Key: HADOOP-14475
> URL: https://issues.apache.org/jira/browse/HADOOP-14475
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: uname -a
> Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 
> x86_64 x86_64 x86_64 GNU/Linux
>  cat /etc/issue
> Ubuntu 16.04.2 LTS \n \l
>Reporter: Yonger
>Assignee: Yonger
> Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch, 
> HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch, 
> HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch, 
> HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14475.014.patch, 
> HADOOP-14475.015.patch, HADOOP-14775.007.patch, failsafe-report-s3a-it.html, 
> failsafe-report-s3a-scale.html, failsafe-report-scale.html, 
> failsafe-report-scale.zip, s3a-metrics.patch1, stdout.zip
>
>
> *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
> #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #*.sink.influxdb.url=http:/xx
> #*.sink.influxdb.influxdb_port=8086
> #*.sink.influxdb.database=hadoop
> #*.sink.influxdb.influxdb_username=hadoop
> #*.sink.influxdb.influxdb_password=hadoop
> #*.sink.ingluxdb.cluster=c1
> *.period=10
> #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out
> I can't find the output file even when I run an MR job which should use s3.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-11-27 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268003#comment-16268003
 ] 

genericqa commented on HADOOP-15059:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
14s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
43s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 12m 11s{color} | 
{color:red} root generated 7 new + 7 unchanged - 0 fixed = 14 total (was 7) 
{color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 
11s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m  1s{color} | {color:orange} root: The patch generated 4 new + 364 unchanged 
- 7 fixed = 368 total (was 371) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
8m 54s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
59s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
38s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 22s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}111m 41s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.scheduler.TestContainerSchedulerQueuing
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15059 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12899517/HADOOP-15059.001.patch
 |
| Optional Tests |  asflicense  compile  

[jira] [Commented] (HADOOP-13134) WASB's file delete still throwing Blob not found exception

2017-11-27 Thread Zichen Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267968#comment-16267968
 ] 

Zichen Sun commented on HADOOP-13134:
-

We had a similar issue reported a few times when deleting directories, and the 
workaround was to set fs.azure.flatlist.enable to true (introduced by 
https://issues.apache.org/jira/browse/HADOOP-13403), which skips the following:
{code:java}
if (!enableFlatListing) {
  // Currently at a depth of one, decrement the listing depth for
  // sub-directories.
  buildUpList(directory, fileMetadata, maxListingCount, maxListingDepth - 1);
}
{code}
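
For completeness, a minimal sketch of the workaround in code (the same property 
can of course be set in core-site.xml instead):

{code:java}
import org.apache.hadoop.conf.Configuration;

class FlatListingWorkaroundSketch {
  static Configuration withFlatListing() {
    Configuration conf = new Configuration();
    // Enable flat listing so the recursive buildUpList() path above is skipped.
    conf.setBoolean("fs.azure.flatlist.enable", true);
    return conf;
  }
}
{code}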

> WASB's file delete still throwing Blob not found exception
> --
>
> Key: HADOOP-13134
> URL: https://issues.apache.org/jira/browse/HADOOP-13134
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 2.7.1
>Reporter: Lin Chan
>Assignee: Dushyanth
>
> WASB is still throwing blob not found exception as shown in the following 
> stack. Need to catch that and convert to Boolean return code in WASB delete.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14699) Impersonation errors with UGI after second principal relogin

2017-11-27 Thread Jeff Storck (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267929#comment-16267929
 ] 

Jeff Storck commented on HADOOP-14699:
--

[~jnp] I updated the test code and provided new instructions for reproducing 
the impersonation issue.  The test code has been updated to provide 
per-principal task configuration, and now writes files to HDFS rather than just 
retrieving status.  Please let me know if you have any problems using the 
updated code.  Thanks!

> Impersonation errors with UGI after second principal relogin
> 
>
> Key: HADOOP-14699
> URL: https://issues.apache.org/jira/browse/HADOOP-14699
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 2.6.2, 2.7.3, 2.8.1
>Reporter: Jeff Storck
>
> Multiple principals that are logged in using UGI instances that are 
> instantiated from a UGI class loaded by the same classloader will encounter 
> problems when the second principal attempts to relogin and perform an action 
> using a UGI.doAs().  An impersonation will occur and the operation attempted 
> by the second principal after relogging in will fail.  There should not be an 
> implicit attempt to impersonate the second principal through the first 
> principal that logged in.
> I have created  a GitHub project that exhibits the impersonation error with 
> brief instructions on how to set up for the test and run it: 
> https://github.com/jtstorck/kerberos-examples/tree/master/hadoop/ugi-test
> {noformat}18:44:55.687 [pool-2-thread-2] WARN  
> h.u.u.ugirunnable.ugite...@example.com - Unexpected exception while 
> performing task for [ugite...@example.com (auth:KERBEROS)]
> org.apache.hadoop.ipc.RemoteException: User: ugite...@example.com is not 
> allowed to impersonate ugite...@example.com
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1481)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1427)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1337)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:787)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335)
>   at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1700)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1436)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1433)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1448)
>   at 
> hadoop.ugitest.UgiTestMain$UgiRunnable.lambda$run$2(UgiTestMain.java:194)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>   at hadoop.ugitest.UgiTestMain$UgiRunnable.run(UgiTestMain.java:194)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Commented] (HADOOP-15039) move SemaphoredDelegatingExecutor to hadoop-common

2017-11-27 Thread Genmao Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267909#comment-16267909
 ] 

Genmao Yu commented on HADOOP-15039:


[~ste...@apache.org] Could you take another look, please?

> move SemaphoredDelegatingExecutor to hadoop-common
> --
>
> Key: HADOOP-15039
> URL: https://issues.apache.org/jira/browse/HADOOP-15039
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/oss, fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Attachments: HADOOP-15039.001.patch, HADOOP-15039.002.patch, 
> HADOOP-15039.003.patch
>
>
> Detailed discussions in HADOOP-14999 and HADOOP-15027.
> share {{SemaphoredDelegatingExecutor}} and move it to {{hadoop-common}}.
> cc [~ste...@apache.org] 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HADOOP-15039) move SemaphoredDelegatingExecutor to hadoop-common

2017-11-27 Thread Genmao Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Genmao Yu updated HADOOP-15039:
---
Comment: was deleted

(was: [~ste...@apache.org] take a look please.)

> move SemaphoredDelegatingExecutor to hadoop-common
> --
>
> Key: HADOOP-15039
> URL: https://issues.apache.org/jira/browse/HADOOP-15039
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/oss, fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Attachments: HADOOP-15039.001.patch, HADOOP-15039.002.patch, 
> HADOOP-15039.003.patch
>
>
> Detailed discussions in HADOOP-14999 and HADOOP-15027.
> share {{SemaphoredDelegatingExecutor}} and move it to {{hadoop-common}}.
> cc [~ste...@apache.org] 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14898) Create official Docker images for development and testing features

2017-11-27 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267894#comment-16267894
 ] 

Miklos Szegedi commented on HADOOP-14898:
-

Thank you [~elek] for the patch. Could you provide a patch file named 
HADOOP-14898.000.patch against the latest trunk, so that the Jenkins job (asf 
license, etc.) can run on it?

> Create official Docker images for development and testing features 
> ---
>
> Key: HADOOP-14898
> URL: https://issues.apache.org/jira/browse/HADOOP-14898
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Elek, Marton
>Assignee: Elek, Marton
> Attachments: HADOOP-14898.001.tar.gz, HADOOP-14898.002.tar.gz, 
> HADOOP-14898.003.tgz
>
>
> This is the original mail from the mailing list:
> {code}
> TL;DR: I propose to create official hadoop images and upload them to the 
> dockerhub.
> GOAL/SCOPE: I would like improve the existing documentation with easy-to-use 
> docker based recipes to start hadoop clusters with various configuration.
> The images also could be used to test experimental features. For example 
> ozone could be tested easily with these compose file and configuration:
> https://gist.github.com/elek/1676a97b98f4ba561c9f51fce2ab2ea6
> Or even the configuration could be included in the compose file:
> https://github.com/elek/hadoop/blob/docker-2.8.0/example/docker-compose.yaml
> I would like to create separated example compose files for federation, ha, 
> metrics usage, etc. to make it easier to try out and understand the features.
> CONTEXT: There is an existing Jira 
> https://issues.apache.org/jira/browse/HADOOP-13397
> But it’s about a tool to generate production quality docker images (multiple 
> types, in a flexible way). If no objections, I will create a separated issue 
> to create simplified docker images for rapid prototyping and investigating 
> new features. And register the branch to the dockerhub to create the images 
> automatically.
> MY BACKGROUND: I have been working with docker based hadoop/spark clusters for 
> quite a while and have run them successfully in different environments (kubernetes, 
> docker-swarm, nomad-based scheduling, etc.) My work is available from here: 
> https://github.com/flokkr but they could handle more complex use cases (eg. 
> instrumenting java processes with btrace, or read/reload configuration from 
> consul).
>  And IMHO in the official hadoop documentation it’s better to suggest to use 
> official apache docker images and not external ones (which could be changed).
> {code}
> The next list will enumerate the key decision points regarding to docker 
> image creating
> A. automated dockerhub build  / jenkins build
> Docker images could be built on the dockerhub (a branch pattern should be 
> defined for a github repository and the location of the Docker files) or 
> could be built on a CI server and pushed.
> The second one is more flexible (it's easier to create a matrix build, for 
> example)
> The first one had the advantage that we can get an additional flag on the 
> dockerhub that the build is automated (and built from the source by the 
> dockerhub).
> The decision is easy as ASF supports the first approach: (see 
> https://issues.apache.org/jira/browse/INFRA-12781?focusedCommentId=15824096=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15824096)
> B. source: binary distribution or source build
> The second question is about creating the docker image. One option is to 
> build the software on the fly during the creation of the docker image the 
> other one is to use the binary releases.
> I suggest to use the second approach as:
> 1. In that case the hadoop:2.7.3 image could contain exactly the same hadoop 
> distribution as the downloadable one
> 2. We don't need to add development tools to the image, so the image could be 
> smaller (which is important as the goal for this image is getting 
> started as fast as possible)
> 3. The docker definition will be simpler (and easier to maintain)
> Usually this approach is used in other projects (I checked Apache Zeppelin 
> and Apache Nutch)
> C. branch usage
> Other question is the location of the Docker file. It could be on the 
> official source-code branches (branch-2, trunk, etc.) or we can create 
> separated branches for the dockerhub (eg. docker/2.7 docker/2.8 docker/3.0)
> With the first approach it's easier to find the docker images, but it's less 
> flexible. For example, if we had a Dockerfile on the source code it should 
> be used for every release (for example the Docker file from the tag 
> release-3.0.0 should be used for the 3.0 hadoop docker image). In that case 
> the release process is much harder: in case of a Dockerfile error (which 
> could be tested on dockerhub only after the 

[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file

2017-11-27 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267892#comment-16267892
 ] 

genericqa commented on HADOOP-14475:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 13s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 1 
new + 19 unchanged - 0 fixed = 20 total (was 19) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
35s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m  7s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-14475 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12899510/HADOOP-14475.015.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7cbf4148bb74 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d8923cd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/13753/artifact/out/diff-checkstyle-hadoop-tools_hadoop-aws.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/13753/testReport/ |
| Max. process+thread count | 325 (vs. ulimit of 5000) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/13753/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This 

[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-11-27 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HADOOP-15059:

Attachment: HADOOP-15059.001.patch

Attaching a patch that has the container launch process write two token files, 
the legacy token format in the existing container_tokens file and the new 
version 1 format in a new container_tokens-v1 file.  I left the container 
localizer path alone since localizers are running the same code as the 
nodemanager and therefore can directly support the new v1 format as-is.

The basic idea is to tack on the "-v1" suffix to the legacy token pathname to 
form the new v1 token pathname.  The container launcher and container executors 
both do this, so the interface between them did not have to change.  The legacy 
path is passed between them to indicate where both token files can be located 
(once the suffix is applied to form the new token path).  It's definitely not 
the cleanest, but it was relatively simple to implement.  I refactored some 
names in the container start context to make it more clear which path is being 
used.
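
To make the naming scheme concrete, here is a rough sketch (illustrative only, 
not the patch itself; it assumes the SerializedFormat-aware Credentials 
overloads are available):

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;

class DualTokenFileSketch {
  // The v1 path is just the legacy path with the "-v1" suffix tacked on, so
  // both files can be located from the single legacy path passed between
  // the launcher and the container executors.
  static Path v1TokenPath(Path legacyTokenPath) {
    return new Path(legacyTokenPath.toString() + "-v1");
  }

  // Write the same credentials twice: the legacy (writable) format that old
  // readers understand, and the new format at the "-v1" path.
  static void writeBoth(Credentials creds, Path legacyPath, Configuration conf)
      throws IOException {
    creds.writeTokenStorageFile(legacyPath, conf,
        Credentials.SerializedFormat.WRITABLE);
    creds.writeTokenStorageFile(v1TokenPath(legacyPath), conf,
        Credentials.SerializedFormat.PROTOBUF);
  }
}
{code}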

This needs a lot more testing, but I was able to run a sleep job on a simple 
security pseudo-distributed cluster and manually verified both container token 
files were being written and each was the proper format.  I also manually 
forced the launcher to omit the new environment variable for the version 1 
file, forcing the UGI to load the legacy token file, and that worked as well.  
I have not had a chance yet to test the rolling-upgrade-with-tarball scenario 
nor the native container-executor changes, but I thought it was far enough 
along to at least get some feedback.

If others could take a look at the patch and/or take it for a test drive that 
would be great.


> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HADOOP-15059.001.patch
>
>
> I tried to deploy a 3.0 cluster with a 2.9 MR tar ball. The MR job failed 
> because of the following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to a token incompatibility change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0; otherwise all running MR applications will get stuck during/after 
> upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-11-27 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HADOOP-15059:

Status: Patch Available  (was: Open)

> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HADOOP-15059.001.patch
>
>
> I tried to deploy a 3.0 cluster with the 2.9 MR tar ball. The MR job failed 
> because of the following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to a token incompatibility change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0; otherwise all running MR applications will get stuck during/after 
> upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file

2017-11-27 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267865#comment-16267865
 ] 

Sean Mackrory commented on HADOOP-14475:


One other thing worth mentioning is the compatibility implications of removing 
fsUri and replacing it with bucket. This was basically unusable from a metrics2 
perspective before, so... it's pretty tough to be broken by this change.

> Metrics of S3A don't print out  when enable it in Hadoop metrics property file
> --
>
> Key: HADOOP-14475
> URL: https://issues.apache.org/jira/browse/HADOOP-14475
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: uname -a
> Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 
> x86_64 x86_64 x86_64 GNU/Linux
>  cat /etc/issue
> Ubuntu 16.04.2 LTS \n \l
>Reporter: Yonger
>Assignee: Yonger
> Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch, 
> HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch, 
> HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch, 
> HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14475.014.patch, 
> HADOOP-14475.015.patch, HADOOP-14775.007.patch, failsafe-report-s3a-it.html, 
> failsafe-report-s3a-scale.html, failsafe-report-scale.html, 
> failsafe-report-scale.zip, s3a-metrics.patch1, stdout.zip
>
>
> *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
> #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #*.sink.influxdb.url=http:/xx
> #*.sink.influxdb.influxdb_port=8086
> #*.sink.influxdb.database=hadoop
> #*.sink.influxdb.influxdb_username=hadoop
> #*.sink.influxdb.influxdb_password=hadoop
> #*.sink.ingluxdb.cluster=c1
> *.period=10
> #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out
> I can't find the output file even when I run an MR job which should use S3.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file

2017-11-27 Thread Sean Mackrory (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-14475:
---
Attachment: HADOOP-14475.015.patch

Conflict presumably due to s3a output committer. Relatively simple to merge - 
but will need to retest. Some unit tests modified or created by the committer 
are failing locally.

> Metrics of S3A don't print out  when enable it in Hadoop metrics property file
> --
>
> Key: HADOOP-14475
> URL: https://issues.apache.org/jira/browse/HADOOP-14475
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: uname -a
> Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 
> x86_64 x86_64 x86_64 GNU/Linux
>  cat /etc/issue
> Ubuntu 16.04.2 LTS \n \l
>Reporter: Yonger
>Assignee: Yonger
> Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch, 
> HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch, 
> HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch, 
> HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14475.014.patch, 
> HADOOP-14475.015.patch, HADOOP-14775.007.patch, failsafe-report-s3a-it.html, 
> failsafe-report-s3a-scale.html, failsafe-report-scale.html, 
> failsafe-report-scale.zip, s3a-metrics.patch1, stdout.zip
>
>
> *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
> #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #*.sink.influxdb.url=http:/xx
> #*.sink.influxdb.influxdb_port=8086
> #*.sink.influxdb.database=hadoop
> #*.sink.influxdb.influxdb_username=hadoop
> #*.sink.influxdb.influxdb_password=hadoop
> #*.sink.ingluxdb.cluster=c1
> *.period=10
> #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out
> I can't find the output file even when I run an MR job which should use S3.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file

2017-11-27 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267791#comment-16267791
 ] 

genericqa commented on HADOOP-14475:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HADOOP-14475 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-14475 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12899507/HADOOP-14475.014.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/13752/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Metrics of S3A don't print out  when enable it in Hadoop metrics property file
> --
>
> Key: HADOOP-14475
> URL: https://issues.apache.org/jira/browse/HADOOP-14475
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: uname -a
> Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 
> x86_64 x86_64 x86_64 GNU/Linux
>  cat /etc/issue
> Ubuntu 16.04.2 LTS \n \l
>Reporter: Yonger
>Assignee: Yonger
> Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch, 
> HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch, 
> HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch, 
> HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14475.014.patch, 
> HADOOP-14775.007.patch, failsafe-report-s3a-it.html, 
> failsafe-report-s3a-scale.html, failsafe-report-scale.html, 
> failsafe-report-scale.zip, s3a-metrics.patch1, stdout.zip
>
>
> *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
> #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #*.sink.influxdb.url=http:/xx
> #*.sink.influxdb.influxdb_port=8086
> #*.sink.influxdb.database=hadoop
> #*.sink.influxdb.influxdb_username=hadoop
> #*.sink.influxdb.influxdb_password=hadoop
> #*.sink.ingluxdb.cluster=c1
> *.period=10
> #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out
> I can't find the output file even when I run an MR job which should use S3.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file

2017-11-27 Thread Sean Mackrory (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-14475:
---
Attachment: HADOOP-14475.014.patch

So I see the local hostname is already included in metrics - that makes sense. 
I've updated the patch to just use the bucket, not the entire URI. While I was 
looking at how wasb does this, I also took a closer look at how it names 
sources, records, etc. and made a few changes to be more in line with what it 
is doing - some of which are actually a return to how [~iyonger] had 
originally done it, which I had changed to be more brief. Consistency with wasb 
is a more compelling motivation than a shorter context name.

So to be clear, a line in the metrics log used to look like this:
{code}
1511208770680 s3afs.S3AMetrics1-bucket: Context=s3afs, 
FileSystemId=892b02bb-7b30-4ffe-80ca-3a9935e1d96e-bucket, 
fsURI=s3a://bucket/home/user/terasuite
{code}

But now looks like this:
{code}
1511208770680 s3aFileSystem.s3aFileSystem: Context=s3aFileSystem, 
s3aFileSystemId=892b02bb-7b30-4ffe-80ca-3a9935e1d96e, bucket=bucket,
{code}

> Metrics of S3A don't print out  when enable it in Hadoop metrics property file
> --
>
> Key: HADOOP-14475
> URL: https://issues.apache.org/jira/browse/HADOOP-14475
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: uname -a
> Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 
> x86_64 x86_64 x86_64 GNU/Linux
>  cat /etc/issue
> Ubuntu 16.04.2 LTS \n \l
>Reporter: Yonger
>Assignee: Yonger
> Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch, 
> HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch, 
> HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch, 
> HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14475.014.patch, 
> HADOOP-14775.007.patch, failsafe-report-s3a-it.html, 
> failsafe-report-s3a-scale.html, failsafe-report-scale.html, 
> failsafe-report-scale.zip, s3a-metrics.patch1, stdout.zip
>
>
> *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
> #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #*.sink.influxdb.url=http:/xx
> #*.sink.influxdb.influxdb_port=8086
> #*.sink.influxdb.database=hadoop
> #*.sink.influxdb.influxdb_username=hadoop
> #*.sink.influxdb.influxdb_password=hadoop
> #*.sink.ingluxdb.cluster=c1
> *.period=10
> #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out
> I can't find the output file even when I run an MR job which should use S3.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-10054) ViewFsFileStatus.toString() is broken

2017-11-27 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reassigned HADOOP-10054:
--

Assignee: Hanisha Koneru

> ViewFsFileStatus.toString() is broken
> -
>
> Key: HADOOP-10054
> URL: https://issues.apache.org/jira/browse/HADOOP-10054
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 2.0.5-alpha
>Reporter: Paul Han
>Assignee: Hanisha Koneru
>Priority: Minor
>
> ViewFsFileStatus.toString is broken. Following code snippet :
> {code}
> FileStatus stat= somefunc(); // somefunc() returns an instance of 
> ViewFsFileStatus
> System.out.println("path:" + stat.getPath());
>   System.out.println(stat.toString());
> {code}
> produces the output:
> {code}
> path:viewfs://x.com/user/X/tmp-48
> ViewFsFileStatus{path=null; isDirectory=false; length=0; replication=0; 
> blocksize=0; modification_time=0; access_time=0; owner=; group=; 
> permission=rw-rw-rw-; isSymlink=false}
> {code}
> Note that "path=null" is not correct.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14964) AliyunOSS: backport Aliyun OSS module to branch-2

2017-11-27 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267686#comment-16267686
 ] 

Chris Douglas commented on HADOOP-14964:


bq. The patch for 2.8.3 will be cherry picked from branch-2
Please cherry-pick from branch-2.9 to branch-2.8, then from branch-2.8 to 
branch-2.8.3 so the lineage is clear.

The concerns from the release thread:
# *Dependencies* Since the OSS artifact is not shaded, does it introduce 
dependencies that could conflict with versions used by clients?
# *Upgrade* 2.9.0 does not contain Aliyun OSS. If we release 2.8.3, users 
cannot upgrade to 2.9.x without regressing on this feature.
# *Timing* 2.8.3 contains critical bug fixes, so this needs to wrap up quickly 
if it will be included.
# *Release policy* This should not contradict Hadoop's [compatibility 
guidelines|https://hadoop.apache.org/docs/r3.0.0-beta1/hadoop-project-dist/hadoop-common/Compatibility.html].

1-3 are concerns that have a technical remediation:
For (1), what are all the transitive dependencies for this module? If the SDK 
uses dependencies that already ship with 2.8.2, then we may not need shading, 
but any new, visible dependency needs to be added to the release notes, and we 
should classify this as an incompatible change. If the number of new, visible 
dependencies is zero (by shading or reuse), then (1) is trivially satisfied.
For (2), if we take the current branch-2.9 and cut a 2.9.1 release, that would 
tranquilize anxiety about the upgrade path. [~Sammi], would you be able to RM 
this? [~asuresh] and [~subru] may be able to help by providing pointers to 
release docs.
(3) Is not a problem, as there are some blocker issues and the patch is 
available.

As policy, (4) is stickier. Even releasing this with 2.9.1 doesn't strictly 
adhere to our rules, but it's better than adding another, active release 
branch. The case for 2.8.3 is more problematic. The cadence for 2.8.x will 
decelerate more rapidly than 2.9.x, so fixes to Aliyun OSS will be released 
less often. We may not do its users a favor by including an outdated client 
with their 2.8 clusters. Frankly, maintenance is also simpler when we disallow 
feature backports into patch releases, rather than discussing the merits of 
each one. This can't become a precedent; it takes too much time.

Personally, if 1-3 are addressed and folks working on the module still want to 
go ahead with it, then I'd support a 2.8.3 release with Aliyun OSS.

> AliyunOSS: backport Aliyun OSS module to branch-2
> -
>
> Key: HADOOP-14964
> URL: https://issues.apache.org/jira/browse/HADOOP-14964
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
>Assignee: SammiChen
> Fix For: 2.9.1
>
> Attachments: HADOOP-14964-branch-2.000.patch, 
> HADOOP-14964-branch-2.8.000.patch, HADOOP-14964-branch-2.8.001.patch, 
> HADOOP-14964-branch-2.9.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-11-27 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned HADOOP-15059:
---

Assignee: Jason Lowe

Thanks for the feedback, Ray.  I'll take a crack at implementing the two token 
file approach.

> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
>
> I tried to deploy a 3.0 cluster with the 2.9 MR tar ball. The MR job failed 
> because of the following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to a token incompatibility change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0; otherwise all running MR applications will get stuck during/after 
> upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14964) AliyunOSS: backport Aliyun OSS module to branch-2

2017-11-27 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-14964:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
Release Note: Aliyun OSS is widely used among China’s cloud users and this 
work implemented a new Hadoop compatible filesystem AliyunOSSFileSystem with 
oss:// scheme, similar to the s3a and azure support.  (was: OSS is widely used 
among China’s cloud users and this work implemented a new Hadoop compatible 
filesystem AliyunOSSFileSystem with oss scheme, similar to the s3a and azure 
support. Currently, the feature is support in 2.9.1. )
  Status: Resolved  (was: Patch Available)

I'm going to mark this as "Resolved", to track the fact that it's been 
committed to branch-2 and branch-2.9. I also changed the release note to omit 
the version information (this should be implied by context).

> AliyunOSS: backport Aliyun OSS module to branch-2
> -
>
> Key: HADOOP-14964
> URL: https://issues.apache.org/jira/browse/HADOOP-14964
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
>Assignee: SammiChen
> Fix For: 2.9.1
>
> Attachments: HADOOP-14964-branch-2.000.patch, 
> HADOOP-14964-branch-2.8.000.patch, HADOOP-14964-branch-2.8.001.patch, 
> HADOOP-14964-branch-2.9.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14445) Delegation tokens are not shared between KMS instances

2017-11-27 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267351#comment-16267351
 ] 

Xiao Chen commented on HADOOP-14445:


Hi [~shahrs87] and [~daryn],

Hope you had a great Thanksgiving.

Any updates on this?

> Delegation tokens are not shared between KMS instances
> --
>
> Key: HADOOP-14445
> URL: https://issues.apache.org/jira/browse/HADOOP-14445
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation, kms
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: Rushabh S Shah
> Attachments: HADOOP-14445-branch-2.8.patch
>
>
> As discovered in HADOOP-14441, KMS HA using LoadBalancingKMSClientProvider do 
> not share delegation tokens. (a client uses KMS address/port as the key for 
> delegation token)
> {code:title=DelegationTokenAuthenticatedURL#openConnection}
> if (!creds.getAllTokens().isEmpty()) {
> InetSocketAddress serviceAddr = new InetSocketAddress(url.getHost(),
> url.getPort());
> Text service = SecurityUtil.buildTokenService(serviceAddr);
> dToken = creds.getToken(service);
> {code}
> But KMS doc states:
> {quote}
> Delegation Tokens
> Similar to HTTP authentication, KMS uses Hadoop Authentication for delegation 
> tokens too.
> Under HA, A KMS instance must verify the delegation token given by another 
> KMS instance, by checking the shared secret used to sign the delegation 
> token. To do this, all KMS instances must be able to retrieve the shared 
> secret from ZooKeeper.
> {quote}
> We should either update the KMS documentation, or fix this code to share 
> delegation tokens.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions

2017-11-27 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267246#comment-16267246
 ] 

Chris Douglas commented on HADOOP-14600:


bq. replace `listLocatedStatus` call with `listStatusIterator` because it 
returns FileStatus rather than LocatedFileStatus and that doesn't trigger all 
the getPermission() mess at all
Good point. Unfortunately, for everything accepted by the filter (which 
defaults to accepting everything, IIRC), we double the RPCs if the client 
subsequently asks for locations. That's bad for HDFS, but irrelevant to the 
local FS and object stores that don't report locality information.
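
To make the cost difference concrete, a hedged sketch (not from the patch; the 
loop bodies are placeholders):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.fs.RemoteIterator;

class ListingCostSketch {
  /** listStatusIterator(): locations cost a second call per accepted file. */
  static void viaStatusIterator(FileSystem fs, Path dir, PathFilter filter)
      throws IOException {
    RemoteIterator<FileStatus> it = fs.listStatusIterator(dir);
    while (it.hasNext()) {
      FileStatus st = it.next();
      if (filter.accept(st.getPath())) {
        // Extra RPC per file on HDFS; cheap for the local FS / object stores.
        BlockLocation[] locs = fs.getFileBlockLocations(st, 0, st.getLen());
        // ... use st and locs ...
      }
    }
  }

  /** listLocatedStatus(): one pass, locations already attached. */
  static void viaLocatedStatus(FileSystem fs, Path dir, PathFilter filter)
      throws IOException {
    RemoteIterator<LocatedFileStatus> it = fs.listLocatedStatus(dir);
    while (it.hasNext()) {
      LocatedFileStatus st = it.next();
      if (filter.accept(st.getPath())) {
        BlockLocation[] locs = st.getBlockLocations();
        // ... use st and locs ...
      }
    }
  }
}
{code}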

[~myapachejira], have you had a chance to verify the patch, yet?

> LocatedFileStatus constructor forces RawLocalFS to exec a process to get the 
> permissions
> 
>
> Key: HADOOP-14600
> URL: https://issues.apache.org/jira/browse/HADOOP-14600
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.7.3
> Environment: file:// in a dir with many files
>Reporter: Steve Loughran
>Assignee: Ping Liu
> Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch, 
> HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch, 
> HADOOP-14600.006.patch, HADOOP-14600.007.patch, HADOOP-14600.008.patch, 
> HADOOP-14600.009.patch, TestRawLocalFileSystemContract.java
>
>
> Reported in SPARK-21137. A {{FileSystem.listStatus}} call really crawls 
> against the local FS, because the {{FileStatus.getPermission}} call forces 
> {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI 
> values.
> That is: for every other FS, what's a field lookup or even a no-op, on the 
> local FS it's a process exec/spawn, with all the costs. This gets expensive 
> if you have many files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems

2017-11-27 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267241#comment-16267241
 ] 

genericqa commented on HADOOP-1:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HADOOP-1 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-1 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12883510/HADOOP-1.10.patch 
|
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/13751/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-1
> URL: https://issues.apache.org/jira/browse/HADOOP-1
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
> Attachments: HADOOP-1.10.patch, HADOOP-1.2.patch, 
> HADOOP-1.3.patch, HADOOP-1.4.patch, HADOOP-1.5.patch, 
> HADOOP-1.6.patch, HADOOP-1.7.patch, HADOOP-1.8.patch, 
> HADOOP-1.9.patch, HADOOP-1.patch
>
>
> The current implementation of the FTP and SFTP filesystems has severe 
> limitations and performance issues when dealing with a high number of files. 
> My patch solves those issues and integrates both filesystems in such a way 
> that most of the core functionality is common to both, thereby simplifying 
> maintainability.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for explicit FTPS (SSL/TLS)
> * Support for connection pooling - a new connection is not created for every 
> single command but is reused from the pool.
> For a huge number of files this shows an order-of-magnitude performance 
> improvement over non-pooled connections.
> * Caching of directory trees. For FTP you always need to list the whole 
> directory whenever you ask for information about a particular file.
> Again, for a huge number of files this shows an order-of-magnitude performance 
> improvement over non-cached connections.
> * Support for keep-alive (NOOP) messages to avoid connection drops
> * Support for Unix-style or regexp wildcard globs - useful for listing 
> particular files across the whole directory tree
> * Support for re-establishing broken FTP data transfers - this can happen 
> surprisingly often



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems

2017-11-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267229#comment-16267229
 ] 

Steve Loughran commented on HADOOP-1:
-

quick review

Overall: looks like a good-quality piece of code: well written, with tests, very 
nice docs, and lots of proxy support.

* doesn't like applying to trunk; I had to fiddle with the poms, but there's 
also a clientkeystore file which I couldn't handle. Is that needed? Could it be 
generated?
* doesn't compile either. 


Tests

* why is the FTP test skipped on Windows?
* we tend to use `setup()` / `teardown()` as the @Before/@After operations in 
filesystems. Having standard names makes it more consistent when 
subclassing... and having >1 before/after method puts you into ambiguous 
ordering. Fix: change the names, subclass as appropriate, calling the 
superclass method as desired.
* like what you've done with the mixin to reuse all the tests, but I'd prefer a 
name more unique to the FS than ContractTestBase. FTPContractTestMixin?
* Never thought about having `AbstractFSContract createContract()` raise an 
IOE. We could add that to its signature (best in a separate JIRA)
* You are importing the distcp tests but not using them. What's your plan 
there? Get this patch in and then add that as the next iteration?

Docs

* readme should go into src/site/org/apache/hadoop/ftpextended/index.md


Misc minor points


* need to rename AbstractFileSystem to a class name which isn't used elsewhere, 
e.g. AbstractFTPFileSystem
* use try-with-resources around channel logic and have the implicit 
channel.close() do the disconnect (see the sketch after this list)
* There's lots of opportunities to use subclasses of IOE where it is useful to 
provide more meaningful failures. 
* the style guidelines have conventions on import ordering we strive to 
maintain, especially on new code
* hadoop code prefers a space after // in comments; a search & replace should 
fix that
* org/apache/hadoop/fs/ftpextended/ftp/package-info.java should declare code as 
@Private+Unstable. Even if the FS is public, there's no API coming from this 
module, nor stability guarantees.
* Unless it's going to leak passwords, error messages should try to include 
the filesystem URI in them. Why? It helps debugging when the job is working with 
>1 FS and all you have is a log to go on.
* When wrapping library exceptions (e.g SFTP exceptions), always include the 
toString() value of the wrapped exception. It'll be the string most likely to 
make it to bug reports.
* core-site.xml mentions s3
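
For the try-with-resources point, a hedged sketch of the shape I mean 
(ChannelLike is a stand-in for the patch's channel type, not its real name):
{code}
import java.io.IOException;

// Stand-in interface: close() is assumed to perform the disconnect.
interface ChannelLike extends AutoCloseable {
  void send(String command) throws IOException;
  @Override
  void close() throws IOException;
}

class ChannelUsageSketch {
  static void withChannel(ChannelLike channel) throws IOException {
    try (ChannelLike ch = channel) {
      ch.send("NOOP");
      // ... real work with the channel ...
    } // ch.close() (i.e. the disconnect) always runs here, even on exceptions
  }
}
{code}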


h3. Migration

I don't know what the good story for migration here is. Given how much better 
this is than the fairly basic ftp client there is today, we've no reason not to 
tell everyone to move to it. The hard part is how to do that:

* this works best if the previous options still all work.

h3. Security

I'm moving to a world where we have to provide security audits of sensitive 
patches, which this is. 

What's the security mechanism here? 
* Is Configuration.getPassword() used to get secrets through JCEKS files? (see 
the sketch below)
* I see that user:password is supported. I don't like this. I guess given its 
only FTP it doesn't matter that much, but on SFTP it does
* And on the topic of SFTP, what to do there?

The docs will need a section on this, with the overall theme being telling 
people how to use it securely.
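
On the Configuration.getPassword() question, the pattern I'd hope to see is 
roughly this (a sketch only; the property key below is made up for 
illustration):
{code}
import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;

class FtpPasswordSketch {
  static void connectExample(Configuration conf) throws IOException {
    // getPassword() consults any credential providers configured via
    // hadoop.security.credential.provider.path (e.g. a JCEKS keystore)
    // before falling back to the plain config value.
    char[] password = conf.getPassword("fs.ftpextended.example.password");
    if (password == null) {
      throw new IOException("No password configured");
    }
    try {
      // ... authenticate with the char[] rather than building a String ...
    } finally {
      Arrays.fill(password, '\0'); // don't keep the secret around
    }
  }
}
{code}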


> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-1
> URL: https://issues.apache.org/jira/browse/HADOOP-1
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
> Attachments: HADOOP-1.10.patch, HADOOP-1.2.patch, 
> HADOOP-1.3.patch, HADOOP-1.4.patch, HADOOP-1.5.patch, 
> HADOOP-1.6.patch, HADOOP-1.7.patch, HADOOP-1.8.patch, 
> HADOOP-1.9.patch, HADOOP-1.patch
>
>
> The current implementation of the FTP and SFTP filesystems has severe 
> limitations and performance issues when dealing with a high number of files. 
> My patch solves those issues and integrates both filesystems in such a way 
> that most of the core functionality is common to both, thereby simplifying 
> maintainability.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for explicit FTPS (SSL/TLS)
> * Support for connection pooling - a new connection is not created for every 
> single command but is reused from the pool.
> For a huge number of files this shows an order-of-magnitude performance 
> improvement over non-pooled connections.
> * Caching of directory trees. For FTP you always need to list the whole 
> directory whenever you ask for information about a particular file.
> Again, for a huge number of files it shows an order-of-magnitude performance 
> improvement over not cached 

[jira] [Commented] (HADOOP-14976) Allow overriding HADOOP_SHELL_EXECNAME

2017-11-27 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267146#comment-16267146
 ] 

Arpit Agarwal commented on HADOOP-14976:


Allen, do you have any comments on the updated patch or the test case? Thanks.

> Allow overriding HADOOP_SHELL_EXECNAME
> --
>
> Key: HADOOP-14976
> URL: https://issues.apache.org/jira/browse/HADOOP-14976
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HADOOP-14976.01.patch, HADOOP-14976.02.patch, 
> HADOOP-14976.03.patch, HADOOP-14976.04.patch
>
>
> Some Hadoop shell scripts infer their own name using this bit of shell magic:
> {code}
>  18 MYNAME="${BASH_SOURCE-$0}"
>  19 HADOOP_SHELL_EXECNAME="${MYNAME##*/}"
> {code}
> e.g. see the 
> [hdfs|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs#L18]
>  script.
> The inferred shell script name is later passed to _hadoop-functions.sh_ which 
> uses it to construct the names of some environment variables. E.g. when 
> invoking _hdfs datanode_, the options variable name is inferred as follows:
> {code}
> # HDFS + DATANODE + OPTS -> HDFS_DATANODE_OPTS
> {code}
> This works well if the calling script name is standard {{hdfs}} or {{yarn}}. 
> If a distribution renames the script to something like foo.bar, then the 
> variable names will be inferred as {{FOO.BAR_DATANODE_OPTS}}. This is not a 
> valid bash variable name.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file

2017-11-27 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267086#comment-16267086
 ] 

Sean Mackrory commented on HADOOP-14475:


{quote}I think we'd probably be better off with hostname than full fsURI. The 
URI will include the schema, and may, if we don't sanitise it properly, include 
user:password secrets. Using bucket only will keep things more consistent with 
other code{quote}

Absolutely! I had thought about that but forgotten; changing to just use the 
hostname now.
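
For reference, a hedged sketch of just the URI handling I mean (not the patch 
itself):
{code}
import java.net.URI;

class BucketTagSketch {
  public static void main(String[] args) {
    // Tag metrics with the URI host (the bucket) only, so the scheme and any
    // user:password userinfo in the filesystem URI never reach the sink.
    URI fsUri = URI.create("s3a://user:secret@my-bucket/datasets/terasuite");
    String bucket = fsUri.getHost();   // "my-bucket"
    System.out.println(bucket);
  }
}
{code}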

{quote}azure just give the registry name to the record{quote}

Looks to me like azure tags it with the account name, container name, and a 
random ID unique to the instance. This is very similar to what we're doing.

{quote}I think we don't need to add name.getHost to record name{quote}

I agree. But why does the metrics source name need it then? Unless there may be 
multiple record types from the same source I don't think I understand the 
purpose, so maybe I'm missing something. If we could include the local hostname 
(e.g. the cluster node this metric came from) that would be more useful... Any 
thoughts?

> Metrics of S3A don't print out  when enable it in Hadoop metrics property file
> --
>
> Key: HADOOP-14475
> URL: https://issues.apache.org/jira/browse/HADOOP-14475
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: uname -a
> Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 
> x86_64 x86_64 x86_64 GNU/Linux
>  cat /etc/issue
> Ubuntu 16.04.2 LTS \n \l
>Reporter: Yonger
>Assignee: Yonger
> Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch, 
> HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch, 
> HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch, 
> HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14775.007.patch, 
> failsafe-report-s3a-it.html, failsafe-report-s3a-scale.html, 
> failsafe-report-scale.html, failsafe-report-scale.zip, s3a-metrics.patch1, 
> stdout.zip
>
>
> *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
> #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #*.sink.influxdb.url=http:/xx
> #*.sink.influxdb.influxdb_port=8086
> #*.sink.influxdb.database=hadoop
> #*.sink.influxdb.influxdb_username=hadoop
> #*.sink.influxdb.influxdb_password=hadoop
> #*.sink.ingluxdb.cluster=c1
> *.period=10
> #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out
> I can't find the output file even when I run an MR job which should use S3.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15071) s3a troubleshooting docs to add a couple more failure modes

2017-11-27 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267073#comment-16267073
 ] 

genericqa commented on HADOOP-15071:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
26m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 6 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  2s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 38m 33s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15071 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12899437/HADOOP-15071-001.patch
 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 0e75b2aea180 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2bde3ae |
| maven | version: Apache Maven 3.3.9 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/13750/artifact/out/whitespace-eol.txt
 |
| Max. process+thread count | 341 (vs. ulimit of 5000) |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/13750/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> s3a troubleshooting docs to add a couple more failure modes
> ---
>
> Key: HADOOP-15071
> URL: https://issues.apache.org/jira/browse/HADOOP-15071
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: documentation, fs/s3
>Affects Versions: 2.8.2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-15071-001.patch
>
>
> I've got some more troubleshooting entries to add



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15071) s3a troubleshooting docs to add a couple more failure modes

2017-11-27 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15071:

Attachment: HADOOP-15071-001.patch

Patch 001
* running out of the pooled connections
* auth error on a PUT (but read OK)

> s3a troubleshooting docs to add a couple more failure modes
> ---
>
> Key: HADOOP-15071
> URL: https://issues.apache.org/jira/browse/HADOOP-15071
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: documentation, fs/s3
>Affects Versions: 2.8.2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-15071-001.patch
>
>
> I've got some more troubleshooting entries to add



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15071) s3a troubleshooting docs to add a couple more failure modes

2017-11-27 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15071:

Status: Patch Available  (was: Open)

> s3a troubleshooting docs to add a couple more failure modes
> ---
>
> Key: HADOOP-15071
> URL: https://issues.apache.org/jira/browse/HADOOP-15071
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: documentation, fs/s3
>Affects Versions: 2.8.2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-15071-001.patch
>
>
> I've got some more troubleshooting entries to add



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14971) Merge S3A committers into trunk

2017-11-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267005#comment-16267005
 ] 

ASF GitHub Bot commented on HADOOP-14971:
-

Github user steveloughran closed the pull request at:

https://github.com/apache/hadoop/pull/282


> Merge S3A committers into trunk
> ---
>
> Key: HADOOP-14971
> URL: https://issues.apache.org/jira/browse/HADOOP-14971
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Fix For: 3.1.0
>
> Attachments: HADOOP-13786-040.patch, HADOOP-13786-041.patch
>
>
> Merge the HADOOP-13786 committer into trunk. This branch is being set up as a 
> github PR for review there & to keep it out the mailboxes of the watchers on 
> the main JIRA



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15071) s3a troubleshooting docs to add a couple more failure modes

2017-11-27 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15071:
---

 Summary: s3a troubleshooting docs to add a couple more failure 
modes
 Key: HADOOP-15071
 URL: https://issues.apache.org/jira/browse/HADOOP-15071
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: documentation, fs/s3
Affects Versions: 2.8.2
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor


I've got some more troubleshooting entries to add



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15070) add test to verify FileSystem and paths differentiate on user info

2017-11-27 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266942#comment-16266942
 ] 

genericqa commented on HADOOP-15070:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  9m 
50s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 29s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 
31s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 37s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch 
generated 2 new + 8 unchanged - 6 fixed = 10 total (was 14) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 46s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
38s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 89m 25s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15070 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12899394/HADOOP-15070-001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 82086d40498b 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2bde3ae |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/13749/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/13749/testReport/ |
| Max. process+thread count | 1767 (vs. ulimit of 5000) |
| modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/13749/console |
| Powered by | Apache Yetus 

[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file

2017-11-27 Thread Yonger (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266866#comment-16266866
 ] 

Yonger commented on HADOOP-14475:
-

Azure just gives the registry name to the record, which is confusing. We can't 
distinguish the different records if more than one related metric source is 
registered within a process, as in the example I gave in previous comments.

> Metrics of S3A don't print out  when enable it in Hadoop metrics property file
> --
>
> Key: HADOOP-14475
> URL: https://issues.apache.org/jira/browse/HADOOP-14475
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: uname -a
> Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 
> x86_64 x86_64 x86_64 GNU/Linux
>  cat /etc/issue
> Ubuntu 16.04.2 LTS \n \l
>Reporter: Yonger
>Assignee: Yonger
> Attachments: HADOOP-14475-003.patch, HADOOP-14475.002.patch, 
> HADOOP-14475.005.patch, HADOOP-14475.006.patch, HADOOP-14475.008.patch, 
> HADOOP-14475.009.patch, HADOOP-14475.010.patch, HADOOP-14475.011.patch, 
> HADOOP-14475.012.patch, HADOOP-14475.013.patch, HADOOP-14775.007.patch, 
> failsafe-report-s3a-it.html, failsafe-report-s3a-scale.html, 
> failsafe-report-scale.html, failsafe-report-scale.zip, s3a-metrics.patch1, 
> stdout.zip
>
>
> *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
> #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #*.sink.influxdb.url=http:/xx
> #*.sink.influxdb.influxdb_port=8086
> #*.sink.influxdb.database=hadoop
> #*.sink.influxdb.influxdb_username=hadoop
> #*.sink.influxdb.influxdb_password=hadoop
> #*.sink.ingluxdb.cluster=c1
> *.period=10
> #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out
> I can't find the output file even when I run an MR job that should be using S3.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2017-11-27 Thread Yonger (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266856#comment-16266856
 ] 

Yonger commented on HADOOP-14943:
-

[~ste...@apache.org] I remember there was some discussion about how to configure 
the fake host list, such as returning the endpoint, the compute hosts, or a star; is 
this right? I am not sure whether I understand these points fully.

I just tested these four cases with a 1 TB dataset on query42 of TPC-DS; results are 
below (seconds):

||default localhost||endpoint||star||compute host list||
|16|16|16|28|

From this result, performance is equal in these cases except when returning the 
compute host list.
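As a rough sketch of the emulation under discussion (the helper name, the single fake 
host, and the port are assumptions, not the actual patch), block locations can be 
synthesized from the file length and block size:

{code:java}
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;

public class FakeBlockLocations {
  // Split a file into block-sized ranges, all reported as living on one fake host.
  public static BlockLocation[] fakeLocations(FileStatus status, String fakeHost) {
    long len = status.getLen();
    long blockSize = Math.max(status.getBlockSize(), 1);
    int blocks = (int) Math.max((len + blockSize - 1) / blockSize, 1);
    BlockLocation[] locations = new BlockLocation[blocks];
    String[] hosts = { fakeHost };
    String[] names = { fakeHost + ":9000" };   // host:port form; the port is arbitrary here
    for (int i = 0; i < blocks; i++) {
      long offset = i * blockSize;
      long length = Math.min(blockSize, Math.max(len - offset, 0));
      locations[i] = new BlockLocation(names, hosts, offset, length);
    }
    return locations;
  }
}
{code}

Returning the endpoint or the full compute host list instead of a single fake host would 
only change the {{hosts}}/{{names}} arrays; the offsets and lengths stay the same.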



> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch, HADOOP-14943-003.patch
>
>
> It looks suspiciously like S3A isn't providing the partitioning data needed 
> in {{listLocatedStatus}} and {{getFileBlockLocations()}} to break up a 
> file by the blocksize. This will stop tools using the MRv1 APIs doing the 
> partitioning properly if the input format isn't doing its own split logic.
> FileInputFormat in MRv2 is a bit more configurable about input split 
> calculation & will split up large files, but otherwise the partitioning is 
> being done more by the default values of the executing engine, rather than 
> any config data from the filesystem about what its "block size" is.
> NativeAzureFS does a better job; maybe that could be factored out to 
> hadoop-common and reused?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2017-11-27 Thread Yonger (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266856#comment-16266856
 ] 

Yonger edited comment on HADOOP-14943 at 11/27/17 2:15 PM:
---

[~ste...@apache.org] I remember there was some discussion about how to configure 
the fake host list, such as returning the endpoint, the compute hosts, or a star; is 
this right? I am not sure whether I understand these points fully.

I just tested these four cases with a 1 TB dataset on query42 of TPC-DS; results are 
below (seconds):

||default localhost||endpoint||star||compute host list||
|16|16|16|28|

From this result, performance is equal in these cases except when returning the 
compute host list.




was (Author: iyonger):
[~ste...@apache.org]I remember there are some discussion about how to configure 
the fake host list, such as returning endpoint, compute hosts and a star, is 
this right? I am not sure whether i understand these points totally.

I just test these four cases with 1TB dataset on query42 of TPC-DS, results are 
below(seconds):

||default localhost||endpoint||star||compute host list||
|16|16l 16|28|

>From this result, performance are equal in these cases except returning 
>compute host list.



> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch, HADOOP-14943-003.patch
>
>
> It looks suspiciously like S3A isn't providing the partitioning data needed 
> in {{listLocatedStatus}} and {{getFileBlockLocations()}} to break up a 
> file by the blocksize. This will stop tools using the MRv1 APIs doing the 
> partitioning properly if the input format isn't doing its own split logic.
> FileInputFormat in MRv2 is a bit more configurable about input split 
> calculation & will split up large files, but otherwise the partitioning is 
> being done more by the default values of the executing engine, rather than 
> any config data from the filesystem about what its "block size" is.
> NativeAzureFS does a better job; maybe that could be factored out to 
> hadoop-common and reused?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15070) add test to verify FileSystem and paths differentiate on user info

2017-11-27 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15070:

Attachment: HADOOP-15070-001.patch

 patch 001: add new comparison test that keys include userInfo. Also: update 
TestFileSystemCaching to Java 7/8, factor out common operations.
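A rough illustration of the kind of check presumably being added (the class, test name, 
and URIs here are hypothetical, not the patch's actual code):

{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.junit.Test;

import static org.junit.Assert.assertNotSame;

public class TestUserInfoDifferentiatesCache {

  @Test
  public void testUserInfoGivesDistinctCacheEntries() throws Exception {
    Configuration conf = new Configuration();
    // Two URIs differing only in userInfo: the cache key is built from the full
    // authority, so these should resolve to two separate FileSystem instances.
    FileSystem aliceFS = FileSystem.get(new URI("file://alice@localhost/"), conf);
    FileSystem bobFS = FileSystem.get(new URI("file://bob@localhost/"), conf);
    assertNotSame("expected different cache entries for different userInfo",
        aliceFS, bobFS);
  }
}
{code}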

> add test to verify FileSystem and paths differentiate on user info
> --
>
> Key: HADOOP-15070
> URL: https://issues.apache.org/jira/browse/HADOOP-15070
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 2.8.2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-15070-001.patch
>
>
> Add a test to verify that userinfo data is (correctly) used to differentiate 
> the entries in the FS cache, so they are treated as different filesystems.
> * This is critical for wasb, which uses the username to identify the 
> container, in a path like {{wasb:contain...@stevel.azure.net}}. This works in 
> Hadoop, but SPARK-22587 shows that it may not be followed everywhere (and 
> given there's no documentation, who can fault them?)
> * AbstractFileSystem.checkPath looks suspiciously like its path validation 
> just checks the host, not the authority. That needs a test too.
> * And we should cut the @LimitedPrivate(HDFS, Mapreduce) from 
> Path.makeQualified. If MR needs it, it should be considered open to all apps 
> using the Hadoop APIs. Until I looked at the code I thought it was...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15070) add test to verify FileSystem and paths differentiate on user info

2017-11-27 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15070:

Status: Patch Available  (was: Open)

> add test to verify FileSystem and paths differentiate on user info
> --
>
> Key: HADOOP-15070
> URL: https://issues.apache.org/jira/browse/HADOOP-15070
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 2.8.2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-15070-001.patch
>
>
> Add a test to verify that userinfo data is (correctly) used to differentiate 
> the entries in the FS cache, so they are treated as different filesystems.
> * This is critical for wasb, which uses the username to identify the 
> container, in a path like {{wasb:contain...@stevel.azure.net}}. This works in 
> Hadoop, but SPARK-22587 shows that it may not be followed everywhere (and 
> given there's no documentation, who can fault them?)
> * AbstractFileSystem.checkPath looks suspiciously like its path validation 
> just checks the host, not the authority. That needs a test too.
> * And we should cut the @LimitedPrivate(HDFS, Mapreduce) from 
> Path.makeQualified. If MR needs it, it should be considered open to all apps 
> using the Hadoop APIs. Until I looked at the code I thought it was...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15069) support git-secrets commit hook to keep AWS secrets out of git

2017-11-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266708#comment-16266708
 ] 

Steve Loughran commented on HADOOP-15069:
-

no, only those lines were autogenerated. The rest were built by trial and 
error: running the script and seeing what failed. The regexp and those strings 
are enough to keep the current source code and any new commits happy. The 
regexp didn't work for old repos, so I tried to insert the explicit strings, 
but eventually gave up.

The key thing with this file is that if the user installs the git-secrets hook and 
registers the AWS secrets, then they are kept out of the source.

> support git-secrets commit hook to keep AWS secrets out of git
> --
>
> Key: HADOOP-15069
> URL: https://issues.apache.org/jira/browse/HADOOP-15069
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-15069-001.patch, HADOOP-15069-002.patch
>
>
> The latest Uber breach looks like it involved AWS keys in git repos.
> Nobody wants that, which is why Amazon provides 
> [git-secrets|https://github.com/awslabs/git-secrets]: a script you can use to 
> scan a repo and its history, *and* add as an automated check.
> Anyone can set this up, but there are a few false positives in the scan, 
> mostly from longs and a few all-upper-case constants. These can all be added 
> to a .gitignore file.
> Also: mention git-secrets in the AWS testing docs; say "use it".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15070) add test to verify FileSystem and paths differentiate on user info

2017-11-27 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15070:
---

 Summary: add test to verify FileSystem and paths differentiate on 
user info
 Key: HADOOP-15070
 URL: https://issues.apache.org/jira/browse/HADOOP-15070
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs, test
Affects Versions: 2.8.2
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor


Add a test to verify that userinfo data is (correctly) used to differentiate 
the entries in the FS cache, so they are treated as different filesystems.

* This is critical for wasb, which uses the username to identify the 
container, in a path like {{wasb:contain...@stevel.azure.net}}. This works in 
Hadoop, but SPARK-22587 shows that it may not be followed everywhere (and given 
there's no documentation, who can fault them?)
* AbstractFileSystem.checkPath looks suspiciously like its path validation 
just checks the host, not the authority. That needs a test too.
* And we should cut the @LimitedPrivate(HDFS, Mapreduce) from 
Path.makeQualified. If MR needs it, it should be considered open to all apps 
using the Hadoop APIs. Until I looked at the code I thought it was...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org