[jira] [Resolved] (HDFS-16891) Avoid the overhead of copy-on-write exception list while loading inodes sub sections in parallel

2023-01-18 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-16891.
--
Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> Avoid the overhead of copy-on-write exception list while loading inodes sub 
> sections in parallel
> 
>
> Key: HDFS-16891
> URL: https://issues.apache.org/jira/browse/HDFS-16891
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.4
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> If we enable parallel loading and persisting of inodes from/to the fsimage, we 
> get the benefit of improved performance. However, while loading the sub-sections 
> INODE_DIR_SUB and INODE_SUB, if we encounter any errors, we use a copy-on-write 
> list to maintain the list of exceptions. Since our use case is not to iterate 
> over this list while executor threads are adding new elements to it, 
> using copy-on-write is a bit of an overhead for this use case.
> It would be better to synchronize adding new elements to the list rather than 
> having the list copy all elements over every time a new element is added.
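A minimal sketch of the proposed direction, with hypothetical class and method names (not the actual patch): collect exceptions from the loader threads in a synchronized list, whose add is a locked O(1) append, instead of a CopyOnWriteArrayList, which copies the whole backing array on every add.

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class ExceptionCollection {
  // Copy-on-write list: every add() copies the entire backing array.
  private final List<IOException> cowExceptions = new CopyOnWriteArrayList<>();

  // Synchronized wrapper: add() is a locked O(1) append, which is enough when
  // the list is only read after all loader threads have finished.
  private final List<IOException> syncExceptions =
      Collections.synchronizedList(new ArrayList<>());

  void recordFailure(IOException e) {
    syncExceptions.add(e);   // called concurrently by sub-section loader threads
  }

  List<IOException> failures() {
    return syncExceptions;   // read once, after the executor has been awaited
  }
}
{code}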






[jira] [Resolved] (HDFS-16887) Log start and end of phase/step in startup progress

2023-01-12 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-16887.
--
Fix Version/s: 3.4.0
   3.2.5
   3.3.9
   Resolution: Fixed

> Log start and end of phase/step in startup progress
> ---
>
> Key: HDFS-16887
> URL: https://issues.apache.org/jira/browse/HDFS-16887
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.5, 3.3.9
>
>
> As part of NameNode startup progress, we have multiple phases, and steps within 
> each phase, that are instantiated. While the startup progress view can be 
> queried for the current phase/step, having at least DEBUG logs for startup 
> progress would be helpful to identify when a particular step of 
> LOADING_FSIMAGE/SAVING_CHECKPOINT/LOADING_EDITS started and ended.
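A rough illustration of the kind of logging proposed, with hypothetical names (the real change would hook into the existing StartupProgress phases and steps):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class StartupStepLogging {
  private static final Logger LOG = LoggerFactory.getLogger(StartupStepLogging.class);

  // Hypothetical wrapper: log when a startup step begins and ends.
  void runStep(String phase, String step, Runnable body) {
    LOG.debug("Beginning of {} - {}", phase, step);
    long start = System.currentTimeMillis();
    try {
      body.run();
    } finally {
      LOG.debug("End of {} - {} (took {} ms)", phase, step,
          System.currentTimeMillis() - start);
    }
  }
}
{code}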






[jira] [Resolved] (HDFS-8510) Provide different timeout settings for hdfs dfsadmin -getDatanodeInfo.

2022-10-25 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-8510.
-
Resolution: Won't Fix

This is an old improvement proposal that I'm no longer planning on 
implementing. I'm going to close the issue. If anyone else would find it 
useful, please feel free to reopen and reassign. I'd be happy to help with code 
review.

> Provide different timeout settings for hdfs dfsadmin -getDatanodeInfo.
> --
>
> Key: HDFS-8510
> URL: https://issues.apache.org/jira/browse/HDFS-8510
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>
> During a rolling upgrade, an administrator runs {{hdfs dfsadmin 
> -getDatanodeInfo}} to check if a DataNode has stopped.  Currently, this 
> operation is subject to the RPC connection retries defined in 
> {{ipc.client.connect.max.retries}} and {{ipc.client.connect.retry.interval}}. 
>  This issue proposes adding separate configuration properties to control the 
> retries for this operation.
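For context, the general mechanism already available to a client is to override those two existing properties on its own Configuration; this sketch shows that mechanism only, not the separate per-command properties this issue proposed:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class QuickProbeConf {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Fewer, faster connect retries so an "is the DataNode stopped?" probe
    // fails quickly instead of waiting through the default retry schedule.
    conf.setInt("ipc.client.connect.max.retries", 1);
    conf.setInt("ipc.client.connect.retry.interval", 1000); // milliseconds
    System.out.println("max retries = "
        + conf.getInt("ipc.client.connect.max.retries", -1));
  }
}
{code}

The drawback, and the motivation for separate properties, is that changing these keys affects every RPC connection made by that client, not just the shutdown probe.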






[jira] [Resolved] (HDFS-4289) FsDatasetImpl#updateReplicaUnderRecovery throws errors validating replica byte count on Windows

2022-10-25 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-4289.
-
Resolution: Won't Fix

I'm no longer actively working on this. I no longer have easy access to a 
Windows environment to make Windows-specific changes, or even to confirm that 
this test failure still happens. It's a very old issue with no recent activity, 
so I'm going to assume it's no longer relevant and close it out. If it's still 
an ongoing issue that a Windows developer wants to pick up, please feel free to 
reopen and reassign.

> FsDatasetImpl#updateReplicaUnderRecovery throws errors validating replica 
> byte count on Windows
> ---
>
> Key: HDFS-4289
> URL: https://issues.apache.org/jira/browse/HDFS-4289
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: trunk-win
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
>
> {{FsDatasetImpl#updateReplicaUnderRecovery}} throws errors validating replica 
> byte count on Windows.  This can be seen by running 
> {{TestBalancerWithNodeGroup#testBalancerWithRackLocality}}, which fails on 
> Windows.






[jira] [Resolved] (HDFS-16623) IllegalArgumentException in LifelineSender

2022-06-10 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-16623.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

I have committed this to trunk, branch-3.3 and branch-3.2. [~xuzq_zander], 
thank you for the contribution.

> IllegalArgumentException in LifelineSender
> --
>
> Key: HDFS-16623
> URL: https://issues.apache.org/jira/browse/HDFS-16623
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In our production environment, an IllegalArgumentException occurred in the 
> LifelineSender on a DataNode that was undergoing GC at the time. 
> The buggy code is at line 1060 of BPServiceActor.java: the sleep 
> time can be negative.
> {code:java}
> while (shouldRun()) {
>   try {
>     if (lifelineNamenode == null) {
>       lifelineNamenode = dn.connectToLifelineNN(lifelineNnAddr);
>     }
>     sendLifelineIfDue();
>     Thread.sleep(scheduler.getLifelineWaitTime());
>   } catch (InterruptedException e) {
>     Thread.currentThread().interrupt();
>   } catch (IOException e) {
>     LOG.warn("IOException in LifelineSender for " + BPServiceActor.this, e);
>   }
> }
> {code}
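One hedged sketch of the shape of a fix (not necessarily the committed patch): clamp the computed wait time before sleeping, since Thread.sleep rejects negative arguments with IllegalArgumentException.

{code:java}
class LifelineSleepGuard {
  // Hypothetical helper: never pass a negative wait time to Thread.sleep.
  static void sleepAtLeastZero(long waitMillis) throws InterruptedException {
    Thread.sleep(Math.max(0L, waitMillis));
  }
}
{code}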






[jira] [Created] (HDFS-11995) HDFS Architecture documentation incorrectly describes writing to a local temporary file.

2017-06-19 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-11995:


 Summary: HDFS Architecture documentation incorrectly describes 
writing to a local temporary file.
 Key: HDFS-11995
 URL: https://issues.apache.org/jira/browse/HDFS-11995
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 3.0.0-alpha3
Reporter: Chris Nauroth


The HDFS Architecture documentation has a section titled "Staging" that 
describes clients writing to a local temporary file first before interacting 
with the NameNode to allocate file metadata.  This information is incorrect.  
(Perhaps it was correct a long time ago, but it is no longer accurate with 
respect to the current implementation.)






[jira] [Created] (HDFS-11063) Set NameNode RPC server handler thread name with more descriptive information about the RPC call.

2016-10-26 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-11063:


 Summary: Set NameNode RPC server handler thread name with more 
descriptive information about the RPC call.
 Key: HDFS-11063
 URL: https://issues.apache.org/jira/browse/HDFS-11063
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Chris Nauroth


We often run {{jstack}} on a NameNode process as a troubleshooting step if it 
is suffering high load or appears to be hanging.  By reading the stack trace, 
we can identify if a caller is blocked inside an expensive operation.  This 
would be even more helpful if we updated the RPC server handler thread name 
with more descriptive information about the RPC call.  This could include the 
calling user, the called RPC method, and the most significant argument to that 
method (most likely the path).
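A rough sketch of the idea with a hypothetical helper (the real change would live in the RPC handler loop): temporarily rename the handler thread while a call is in flight, so a jstack sample shows who called what on which path.

{code:java}
class HandlerNameDecorator {
  // Hypothetical: wrap one RPC call, exposing caller, method and path in jstack.
  static void runWithDescriptiveName(String user, String method, String path,
                                     Runnable call) {
    Thread handler = Thread.currentThread();
    String originalName = handler.getName();
    handler.setName(originalName + " [" + user + " " + method + " " + path + "]");
    try {
      call.run();
    } finally {
      handler.setName(originalName);  // restore so the pool's thread names stay stable
    }
  }
}
{code}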






[jira] [Created] (HDFS-11034) Provide a command line tool to clear decommissioned DataNode information from the NameNode without restarting.

2016-10-19 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-11034:


 Summary: Provide a command line tool to clear decommissioned 
DataNode information from the NameNode without restarting.
 Key: HDFS-11034
 URL: https://issues.apache.org/jira/browse/HDFS-11034
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Chris Nauroth


Information about decommissioned DataNodes remains tracked in the NameNode for 
the entire NameNode process lifetime.  Currently, the only way to clear this 
information is to restart the NameNode.  This issue proposes to add a way to 
clear this information online, without requiring a process restart.






[jira] [Resolved] (HDFS-6277) WebHdfsFileSystem#toUrl does not perform character escaping for rename

2016-09-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-6277.
-
Resolution: Won't Fix

This bug is present in the 1.x line, but not 2.x or 3.x.  I'm resolving this as 
Won't Fix, because 1.x is no longer under active maintenance.

> WebHdfsFileSystem#toUrl does not perform character escaping for rename 
> ---
>
> Key: HDFS-6277
> URL: https://issues.apache.org/jira/browse/HDFS-6277
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Ramya Sunil
>Assignee: Chris Nauroth
>
> Found this issue while testing HDFS-6141. WebHdfsFileSystem#toUrl  does not 
> perform character escaping for rename and causes the operation to fail. 
> This bug does not exist on 2.x
> For example: 
> $ hadoop dfs -rmr 'webhdfs://:/tmp/test dirname with spaces'
> Problem with Trash.Unexpected HTTP response: code=400 != 200, op=RENAME, 
> message=Bad Request. Consider using -skipTrash option
> rmr: Failed to move to trash: webhdfs://:/tmp/test dirname 
> with spaces
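The general shape of the missing behavior, as a sketch in plain java.net rather than the WebHDFS client code: the rename destination is a query parameter and needs percent-encoding before being placed in the URL (the URL layout below is illustrative only).

{code:java}
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class RenameUrlSketch {
  // Hypothetical: build a WebHDFS-style rename URL with an escaped destination.
  static String renameUrl(String hostPort, String src, String dst)
      throws UnsupportedEncodingException {
    return "http://" + hostPort + "/webhdfs/v1" + src
        + "?op=RENAME&destination=" + URLEncoder.encode(dst, "UTF-8");
  }
}
{code}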






[jira] [Resolved] (HDFS-10546) hadoop-hdfs-native-client fails distro build when trying to copy libhdfs binaries.

2016-06-17 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-10546.
--
Resolution: Duplicate

I just realized this duplicates HDFS-10353, which fixed the problem in trunk.  
We just need to cherry-pick that patch down to branch-2 and branch-2.8.  I'll 
cover it over there.

> hadoop-hdfs-native-client fails distro build when trying to copy libhdfs 
> binaries.
> --
>
> Key: HDFS-10546
> URL: https://issues.apache.org/jira/browse/HDFS-10546
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Blocker
>
> During the distro build, hadoop-hdfs-native-client copies the built libhdfs 
> binary artifacts for inclusion in the distro.  It references an incorrect 
> path though.  The copy fails and the build aborts.






[jira] [Created] (HDFS-10546) hadoop-hdfs-native-client fails distro build when trying to copy libhdfs binaries.

2016-06-17 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-10546:


 Summary: hadoop-hdfs-native-client fails distro build when trying 
to copy libhdfs binaries.
 Key: HDFS-10546
 URL: https://issues.apache.org/jira/browse/HDFS-10546
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Blocker


During the distro build, hadoop-hdfs-native-client copies the built libhdfs 
binary artifacts for inclusion in the distro.  It references an incorrect path 
though.  The copy fails and the build aborts.






[jira] [Resolved] (HDFS-10502) Enabled memory locking and now HDFS won't start up

2016-06-08 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-10502.
--
Resolution: Invalid

Hello [~machey].  I recommend taking these questions to the 
u...@hadoop.apache.org mailing list.  We use JIRA for tracking confirmed bugs 
and feature requests.  We use u...@hadoop.apache.org for usage advice and 
troubleshooting.

Regarding whether or not this is a recommended approach, I think it depends on 
a few other factors.  Is the intent to use these cached files from Hadoop 
workloads, such as MapReduce jobs or Hive queries?  If not, then I wonder if 
your use case might be better served by something more directly focused on 
general caching use cases, such as Redis or memcached.  If your use case does 
involve Hadoop integration, then certainly Centralized Cache Management is 
worth exploring.

Regarding the timeouts, I can tell from the exception that this is the 
heartbeat RPC sent from the DataNode to the NameNode.  I recommend 
investigating connectivity between the DataNode and the NameNode and examining 
the logs from both sides to try to determine if something is going wrong in the 
handling of the heartbeat message.  On one hand, a heartbeat timeout is not an 
error condition that is specific to Centralized Cache Management.  It could 
happen whether or not you're using that feature.  On the other hand, the 
heartbeat message does contain some optional information about the state of 
cache capacity and current usage at the DataNode.  That information would 
trigger special handling logic at the NameNode side, so I suppose there is a 
chance that something in that logic is hanging up the heartbeat handling.  
Investigating the logs might reveal more.

u...@hadoop.apache.org would be a good forum for further discussion of both of 
these topics.

> Enabled memory locking and now HDFS won't start up
> --
>
> Key: HDFS-10502
> URL: https://issues.apache.org/jira/browse/HDFS-10502
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.7.2
> Environment: RHEL 6.8
>Reporter: Chris Machemer
>
> My goal is to speed up reads.  I have about 500k small files (2k to 15k) and 
> I'm trying to use HDFS as a cache for serialized instances of java objects.
> I've written the code to construct and serialize all the objects out to HDFS, 
> and am now hoping to improve read performance, because accessing the objects 
> from disk-based storage is proving to be too slow for my application's SLAs.
> So my first question is, is using memory locking and hdfs cacheadmin pools 
> and directives the right way to go, to cache my objects into memory, or 
> should I create RAM disks, and do memory-based storage instead?
> If hdfs cacheadmin is the way to go (it's the path I'm going down so far), 
> then I need to figure out if what's happening is a bug or if I've configured 
> something wrong, because when I start up HDFS with a gig of memory locked 
> (both in limits.d for ulimit -l and also in hdfs-site.xml) and the server 
> starts up, and presumably tries to cache things into memory, I get hours and 
> hours of timeouts in the logs like this:
> 2016-06-08 07:42:50,856 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> IOException in offerService
> java.net.SocketTimeoutException: Call From stgb-fe1.litle.com/10.1.9.66 to 
> localhost:8020 failed on socket timeout exception: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/127.0.0.1:51647 remote=localhost/127.0.0.1:8020]; For more details 
> see:  http://wiki.apache.org/hadoop/SocketTimeout
>   at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1479)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>   at com.sun.proxy.$Proxy13.sendHeartbeat(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:153)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:554)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:653)
>   at 
> 

[jira] [Created] (HDFS-10438) When NameNode HA is configured to use the lifeline RPC server, it should log the address of that server.

2016-05-19 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-10438:


 Summary: When NameNode HA is configured to use the lifeline RPC 
server, it should log the address of that server.
 Key: HDFS-10438
 URL: https://issues.apache.org/jira/browse/HDFS-10438
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Reporter: KWON BYUNGCHANG
Assignee: Chris Nauroth
Priority: Minor


As reported by [~magnum]:

I have configured below
{code}
dfs.namenode.servicerpc-address.xdev.nn1=my.host.com:8040
dfs.namenode.lifeline.rpc-address.xdev.nn1=my.host.com:8041
{code}

The servicerpc port is 8040 and the lifeline port is 8041.
However, the ZKFC daemon logs using the servicerpc port, which may cause confusion.

thank you.

{code}
2016-05-19 19:18:40,566 WARN  ha.HealthMonitor 
(HealthMonitor.java:doHealthChecks(207)) - Service health check failed for 
NameNode at my.host.com/10.114.87.91:8040: The NameNode has no resources 
available
{code}






[jira] [Created] (HDFS-10437) ReconfigurationProtocol not covered by HDFSPolicyProvider.

2016-05-19 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-10437:


 Summary: ReconfigurationProtocol not covered by HDFSPolicyProvider.
 Key: HDFS-10437
 URL: https://issues.apache.org/jira/browse/HDFS-10437
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.8.0
Reporter: Chris Nauroth


The {{HDFSPolicyProvider}} class contains an entry for defining the security 
policy of each HDFS RPC protocol interface.  {{ReconfigurationProtocol}} is not 
listed currently.  This may indicate that reconfiguration functionality is not 
working correctly in secured clusters.
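For orientation, a hedged sketch of what such an entry looks like; the ACL key name below is a hypothetical example, not necessarily what a fix would use:

{code:java}
import org.apache.hadoop.hdfs.protocol.ReconfigurationProtocol;
import org.apache.hadoop.security.authorize.Service;

class PolicyEntrySketch {
  // Hypothetical entry of the kind HDFSPolicyProvider keeps in its services array.
  static final Service RECONFIGURATION_SERVICE =
      new Service("security.reconfiguration.protocol.acl", ReconfigurationProtocol.class);
}
{code}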






[jira] [Resolved] (HDFS-10373) HDFS ZKFC HealthMonitor Throw a Exception Cause AutoFailOver

2016-05-06 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-10373.
--
Resolution: Invalid

Hello [~piaoyu zhang].  This doesn't look like a bug.  If ZKFC cannot contact 
its peer NameNode for a successful health check RPC, then an HA failover is the 
expected behavior.  This looks like an operational problem in this environment 
that needs further investigation.  "Connection reset by peer" means the remote 
end (the NameNode) closed out the socket before sending the expected response 
data.  I recommend looking at the NameNode logs to see if anything unusual 
happened during the timeframe of the HA failover.  If you need further 
assistance, then consider sending an email to u...@hadoop.apache.org.  I hope 
this helps.

> HDFS ZKFC HealthMonitor Throw a Exception Cause AutoFailOver
> 
>
> Key: HDFS-10373
> URL: https://issues.apache.org/jira/browse/HDFS-10373
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover
>Affects Versions: 2.2.0
> Environment: CentOS6.5 Hadoop-2.2.0  
>Reporter: zhangyubiao
> Attachments: screenshot-1.png, 屏幕快照_2016-05-06_上午10.17.22.png
>
>
> HDFS ZKFC HealthMonitor Throw a Exception 
> 2016-05-05 02:00:59,475 WARN org.apache.hadoop.ha.HealthMonitor: 
> Transport-level exception trying to monitor health of NameNode at 
> XXX-XXX-XXX-hadoop.jd.local/172.22.17
> 1.XX:8021: Failed on local exception: java.io.IOException: Connection reset 
> by peer; Host Details : local host is: 
> "XXX-XXX-XXX-hadoop.jd.local/172.22.171.XX"; destinat
> ion host is: XXX-XXX-XXX-hadoop.jd.local":8021;
> Cause HA AutoFailOver






[jira] [Resolved] (HDFS-10359) Allow trigger block report from all datanodes

2016-05-04 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-10359.
--
Resolution: Won't Fix

I think there is momentum in this conversation towards a "Won't Fix" 
resolution, so I'm resolving the issue now.  [~Tao Jie], thank you for the 
discussion.  Even though this didn't lead to an enhancement, we appreciate the 
participation.

> Allow trigger block report from all datanodes
> -
>
> Key: HDFS-10359
> URL: https://issues.apache.org/jira/browse/HDFS-10359
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.0, 2.6.1
>Reporter: Tao Jie
>
> Since HDFS-7278 allows triggering a block report from one specific 
> datanode, it would be helpful to add an option to this command to trigger a 
> block report from all datanodes.
> The command might look like this:
> *hdfs dfsadmin -triggerBlockReport \[-incremental\] 
> *






[jira] [Created] (HDFS-10356) Ozone: Container server needs enhancements to control of bind address for greater flexibility and testability.

2016-05-02 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-10356:


 Summary: Ozone: Container server needs enhancements to control of 
bind address for greater flexibility and testability.
 Key: HDFS-10356
 URL: https://issues.apache.org/jira/browse/HDFS-10356
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Chris Nauroth


The container server, as implemented in class 
{{org.apache.hadoop.ozone.container.common.transport.server.XceiverServer}}, 
currently does not offer the same degree of flexibility as our other RPC 
servers for controlling the network interface and port used in the bind call.  
There is no "bind-host" property, so it is not possible to control the exact 
network interface selected.  If the requested port is different from the actual 
bound port (i.e. setting port to 0 in test cases), then there is no exposure of 
that actual bound port to clients.
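A minimal illustration of the testability point in plain java.net (not the actual container server): binding to port 0 picks an ephemeral port, and the server then has to expose the actual bound port back to callers such as test code.

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

class BindAddressSketch {
  // Hypothetical: bind to a configurable interface ("bind host") and port 0,
  // then report the real port the OS assigned.
  static int bindAndReportPort(String bindHost) throws IOException {
    try (ServerSocket server = new ServerSocket()) {
      server.bind(new InetSocketAddress(bindHost, 0));  // 0 = any free port
      return server.getLocalPort();                     // what tests need to know
    }
  }
}
{code}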






[jira] [Created] (HDFS-10351) Ozone: Optimize key writes to chunks by providing a bulk write implementation in ChunkOutputStream.

2016-04-29 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-10351:


 Summary: Ozone: Optimize key writes to chunks by providing a bulk 
write implementation in ChunkOutputStream.
 Key: HDFS-10351
 URL: https://issues.apache.org/jira/browse/HDFS-10351
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Chris Nauroth
Assignee: Chris Nauroth


HDFS-10268 introduced the {{ChunkOutputStream}} class as part of end-to-end 
integration of Ozone receiving key content and writing it to chunks in a 
container.  That patch provided an implementation of the mandatory single-byte 
write method.  We can improve performance by adding an implementation of the 
bulk write method too.
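A sketch of what the bulk method adds over the mandatory single-byte method, using a hypothetical staging buffer rather than the actual ChunkOutputStream internals: the array variant copies a whole range in one call instead of looping byte by byte.

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

class BulkWriteSketch extends OutputStream {
  // Hypothetical staging buffer standing in for the chunk being assembled.
  private final ByteBuffer chunkBuffer = ByteBuffer.allocate(4 * 1024 * 1024);

  @Override
  public void write(int b) throws IOException {
    chunkBuffer.put((byte) b);        // mandatory one-byte path
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    chunkBuffer.put(b, off, len);     // bulk path: one copy per call
  }
}
{code}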






[jira] [Created] (HDFS-10349) StorageContainerManager fails to compile after merge of HDFS-10312 maxDataLength enforcement.

2016-04-29 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-10349:


 Summary: StorageContainerManager fails to compile after merge of 
HDFS-10312 maxDataLength enforcement.
 Key: HDFS-10349
 URL: https://issues.apache.org/jira/browse/HDFS-10349
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ozone
Reporter: Chris Nauroth
Assignee: Chris Nauroth


HDFS-10312 introduced enforcement of a configurable maximum data length while 
deserializing large block reports.  This change broke compilation of 
{{StorageContainerManager}} on the HDFS-7240 feature branch, due to a 
constructor signature change in {{DatanodeProtocolServerSideTranslatorPB}}.






[jira] [Resolved] (HDFS-10322) DomianSocket error lead to more and more DataNode thread waiting

2016-04-26 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-10322.
--
Resolution: Duplicate

[~chenfolin], thank you for investigating this further.  I'm just updating 
status on this issue to indicate it's a duplicate of a prior issue.

> DomianSocket error lead to more and more DataNode thread waiting 
> -
>
> Key: HDFS-10322
> URL: https://issues.apache.org/jira/browse/HDFS-10322
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
>Reporter: ChenFolin
> Fix For: 2.6.4
>
>
> When short-circuit read is enabled and a DomainSocket broken-pipe error happens, the 
> DataNode will produce more and more waiting threads.
>  It is similar to HADOOP-11802, but I do not think they are the same problem, 
> because the DomainSocket thread is in the RUNNABLE state.
> stack log:
> "DataXceiver for client unix:/var/run/hadoop-hdfs/dn.50010 [Waiting for 
> operation #1]" daemon prio=10 tid=0x0278e000 nid=0x2bc6 waiting on 
> condition [0x7f2d6e4a5000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00061c493500> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:316)
>   at 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:394)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
>   at java.lang.Thread.run(Thread.java:745)
> =DomianSocketWatcher
> "Thread-759187" daemon prio=10 tid=0x0219c800 nid=0x8c56 runnable 
> [0x7f2dbe4cb000]
>java.lang.Thread.State: RUNNABLE
>   at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
>   at 
> org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:474)
>   at java.lang.Thread.run(Thread.java:745)
> ===datanode error log
> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
> datanode-:50010:DataXceiver error processing REQUEST_SHORT_CIRCUIT_SHM 
> operation src: unix:/var/run/hadoop-hdfs/dn.50010 dst: 
> java.net.SocketException: write(2) error: Broken pipe
> at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
> at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
> at 
> org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601)
> at 
> com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
> at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
> at 
> com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendShmSuccessResponse(DataXceiver.java:371)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:409)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:178)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:93)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
> at java.lang.Thread.run(Thread.java:745)





[jira] [Reopened] (HDFS-10322) DomianSocket error lead to more and more DataNode thread waiting

2016-04-26 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reopened HDFS-10322:
--



[jira] [Created] (HDFS-10312) Large block reports may fail to decode at NameNode due to 64 MB protobuf maximum length restriction.

2016-04-19 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-10312:


 Summary: Large block reports may fail to decode at NameNode due to 
64 MB protobuf maximum length restriction.
 Key: HDFS-10312
 URL: https://issues.apache.org/jira/browse/HDFS-10312
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Chris Nauroth
Assignee: Chris Nauroth


Our RPC server caps the maximum size of incoming messages at 64 MB by default.  
For exceptional circumstances, this can be uptuned using 
{{ipc.maximum.data.length}}.  However, for block reports, there is still an 
internal maximum length restriction of 64 MB enforced by protobuf.  (Sample 
stack trace to follow in comments.)  This issue proposes to apply the same 
override to our block list decoding, so that large block reports can proceed.
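The underlying mechanism, as a hedged sketch with hypothetical wiring (the real change applies the configured limit inside the block-report decoding path): protobuf's CodedInputStream enforces the 64 MB default, and a parser has to raise that limit explicitly to accept larger messages.

{code:java}
import com.google.protobuf.CodedInputStream;
import java.io.ByteArrayInputStream;

class BlockReportSizeLimitSketch {
  // Hypothetical: build a decoder whose size limit mirrors the configured
  // ipc.maximum.data.length instead of protobuf's 64 MB default.
  static CodedInputStream newDecoder(byte[] serialized, int maxDataLength) {
    CodedInputStream in =
        CodedInputStream.newInstance(new ByteArrayInputStream(serialized));
    in.setSizeLimit(maxDataLength);  // raise the default for very large block reports
    return in;
  }
}
{code}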





[jira] [Created] (HDFS-10268) Ozone: end-to-end integration for create/get volumes, buckets and keys.

2016-04-06 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-10268:


 Summary: Ozone: end-to-end integration for create/get volumes, 
buckets and keys.
 Key: HDFS-10268
 URL: https://issues.apache.org/jira/browse/HDFS-10268
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Reporter: Chris Nauroth
Assignee: Chris Nauroth


The HDFS-7240 feature branch now has the building blocks required to enable 
end-to-end functionality and testing for create/get volumes, buckets and keys.  
The scope of this patch is to complete the necessary integration in 
{{DistributedStorageHandler}} and related classes.





[jira] [Resolved] (HDFS-10257) Quick Thread Local Storage set-up has a small flaw

2016-04-05 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-10257.
--
Resolution: Not A Problem

Great, thanks for confirming [~stevebovy].  I'll go ahead and close this issue.

> Quick Thread Local Storage set-up has a small flaw
> --
>
> Key: HDFS-10257
> URL: https://issues.apache.org/jira/browse/HDFS-10257
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.6.4
> Environment: Linux 
>Reporter: Stephen Bovy
>Priority: Minor
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In jni_helper.c, in the getJNIEnv function, 
> the “THREAD_LOCAL_STORAGE_SET_QUICK(env);” macro is in the wrong 
> location; 
> it should precede “threadLocalStorageSet(env)” as follows: 
> THREAD_LOCAL_STORAGE_SET_QUICK(env);
> if (threadLocalStorageSet(env)) {
>   return NULL;
> }
> AND IN “thread_local_storage.h” the macro 
> “THREAD_LOCAL_STORAGE_SET_QUICK”
> should be as follows: 
> #ifdef HAVE_BETTER_TLS
>   #define THREAD_LOCAL_STORAGE_GET_QUICK() \
> static __thread JNIEnv *quickTlsEnv = NULL; \
> { \
>   if (quickTlsEnv) { \
> return quickTlsEnv; \
>   } \
> }
>   #define THREAD_LOCAL_STORAGE_SET_QUICK(env) \
> { \
>   quickTlsEnv = (env); \
>   return env; \
> }
> #else
>   #define THREAD_LOCAL_STORAGE_GET_QUICK()
>   #define THREAD_LOCAL_STORAGE_SET_QUICK(env)
> #endif





[jira] [Resolved] (HDFS-350) DFSClient more robust if the namenode is busy doing GC

2016-03-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-350.

Resolution: Not A Problem

I'm resolving this issue.  In current versions, the client is more robust to 
this kind of failure.  The RPC layer implements retry policies.  Retried 
operations are handled gracefully using either an inherently idempotent 
implementation of the RPC or the retry cache for at-most-once execution.  In 
the event of an extremely long GC, the client would either retry and succeed 
after completion of the GC, or in more extreme cases it would trigger an HA 
failover and the client would successfully issue its call to the new active 
NameNode.

> DFSClient more robust if the namenode is busy doing GC
> --
>
> Key: HDFS-350
> URL: https://issues.apache.org/jira/browse/HDFS-350
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> In the current code, if the client (writer) encounters an RPC error while 
> fetching a new block id from the namenode, it does not retry. It throws an 
> exception to the application. This becomes especially bad if the namenode is 
> in the middle of a GC and does not respond in time. The reason the client 
> throws an exception is because it does not know whether the namenode 
> successfully allocated a block for this file.
> One possible enhancement would be to make the client retry the addBlock RPC 
> if needed. The client can send the block list that it currently has. The 
> namenode can match the block list send by the client with what it has in its 
> own metadata and then send back a new blockid (or a previously allocated 
> blockid that the client had not yet received because the earlier RPC 
> timedout). This will make the client more robust!
> This works even when we support Appends because the namenode will *always* 
> verify that the client has the lease for the file in question.





[jira] [Created] (HDFS-9907) Exclude Ozone protobuf-generated classes from Findbugs analysis.

2016-03-04 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9907:
---

 Summary: Exclude Ozone protobuf-generated classes from Findbugs 
analysis.
 Key: HDFS-9907
 URL: https://issues.apache.org/jira/browse/HDFS-9907
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial


Pre-commit runs on the HDFS-7240 feature branch are currently flagging Ozone 
protobuf-generated classes with warnings.  These warnings aren't relevant, 
because we don't directly control the code generated by protoc.  We can exclude 
these classes in the Findbugs configuration, just like we do for other existing 
protobuf-generated classes.





[jira] [Resolved] (HDFS-9520) PeerCache evicts too frequently causing connection restablishments

2016-02-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9520.
-
Resolution: Won't Fix

I'm resolving this as Won't Fix as per prior discussion.  (Please feel free to 
reopen if there are further thoughts on configuration tuning.)

> PeerCache evicts too frequently causing connection restablishments
> --
>
> Key: HDFS-9520
> URL: https://issues.apache.org/jira/browse/HDFS-9520
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
> Attachments: HDFS-9520.png
>
>
> Env: 20 node setup
> dfs.client.socketcache.capacity = 16
> Issue:
> ==
> Monitored PeerCache and it was evicting lots of connections during close. Set 
> "dfs.client.socketcache.capacity=20" and tested again. Evictions still 
> happened. Screenshot of profiler is attached in the JIRA.
> Workaround:
> ===
> Temp fix was to set "dfs.client.socketcache.capacity=1000" to prevent 
> eviction. 
> Added more debug logs revealed that multimap.size() was 40 instead of 20. 
> LinkedListMultimap returns the total values instead of key size causing lots 
> of evictions.
> {code}
>if (capacity == multimap.size()) {
>   evictOldest();
> }
> {code}
> Should this be (capacity == multimap.keySet().size())  or is it expected that 
> the "dfs.client.socketcache.capacity" be set to very high value?
> \cc [~gopalv], [~sseth]
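The capacity-check quirk in the quoted snippet can be reproduced in isolation (a hedged sketch with Guava, not the PeerCache code): LinkedListMultimap#size counts key-value entries, while keySet().size() counts distinct keys, so checking capacity against size() evicts much earlier than intended.

{code:java}
import com.google.common.collect.LinkedListMultimap;

class MultimapSizeSketch {
  public static void main(String[] args) {
    LinkedListMultimap<String, Integer> peersByDatanode = LinkedListMultimap.create();
    peersByDatanode.put("dn1", 1);
    peersByDatanode.put("dn1", 2);   // second cached peer for the same DataNode
    peersByDatanode.put("dn2", 3);

    System.out.println(peersByDatanode.size());          // 3: total entries
    System.out.println(peersByDatanode.keySet().size()); // 2: distinct DataNodes
  }
}
{code}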





[jira] [Resolved] (HDFS-8943) Read apis in ByteRangeInputStream does not read all the bytes specified when chunked transfer-encoding is used in the server

2016-02-22 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-8943.
-
Resolution: Won't Fix

[~cmccabe], thank you for the reminder.  This is resolved as Won't Fix.

> Read apis in ByteRangeInputStream does not read all the bytes specified when 
> chunked transfer-encoding is used in the server
> 
>
> Key: HDFS-8943
> URL: https://issues.apache.org/jira/browse/HDFS-8943
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.1
>Reporter: Shradha Revankar
>Assignee: Shradha Revankar
> Attachments: HDFS-8943.000.patch
>
>
> With the default Webhdfs server implementation the read apis in 
> ByteRangeInputStream work as expected reading the correct number of bytes for 
> these apis :
> {{public int read(byte b[], int off, int len)}}
> {{public int read(long position, byte[] buffer, int offset, int length)}}
> But when a custom Webhdfs server implementation is plugged in which uses 
> chunked Transfer-encoding, these apis read only the first chunk. Simple fix 
> would be to loop and read till bytes specified similar to {{readfully()}}
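The "loop and read till bytes specified" fix reads roughly like this, as a generic sketch over any InputStream rather than the ByteRangeInputStream patch:

{code:java}
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ReadFullySketch {
  // Hypothetical: keep reading until 'len' bytes arrive, even if the server
  // delivers the body in several chunks.
  static void readFully(InputStream in, byte[] buffer, int off, int len)
      throws IOException {
    int total = 0;
    while (total < len) {
      int n = in.read(buffer, off + total, len - total);
      if (n < 0) {
        throw new EOFException("Stream ended after " + total + " of " + len + " bytes");
      }
      total += n;
    }
  }
}
{code}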





[jira] [Resolved] (HDFS-9798) TestHdfsNativeCodeLoader fails

2016-02-12 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9798.
-
Resolution: Duplicate

Hi [~ajisakaa].  A recent Yetus change is preventing pre-commit from building 
libhadoop.so before running the HDFS tests.  We're tracking the fix in 
YETUS-281, and there is a patch in progress.

> TestHdfsNativeCodeLoader fails
> --
>
> Key: HDFS-9798
> URL: https://issues.apache.org/jira/browse/HDFS-9798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Akira AJISAKA
>
> TestHdfsNativeCodeLoader fails intermittently in Jenkins.
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14473/testReport/org.apache.hadoop.fs/TestHdfsNativeCodeLoader/testNativeCodeLoaded/
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14475/testReport/org.apache.hadoop.fs/TestHdfsNativeCodeLoader/testNativeCodeLoaded/
> Error message
> {noformat}
> TestNativeCodeLoader: libhadoop.so testing was required, but libhadoop.so was 
> not loaded.  LD_LIBRARY_PATH = 
> ${env.LD_LIBRARY_PATH}:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/native/target/usr/local/lib:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/../../hadoop-common-project/hadoop-common/target/native/target/usr/local/lib
> {noformat}
> Stacktrace
> {noformat}
> java.lang.AssertionError: TestNativeCodeLoader: libhadoop.so testing was 
> required, but libhadoop.so was not loaded.  LD_LIBRARY_PATH = 
> ${env.LD_LIBRARY_PATH}:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/native/target/usr/local/lib:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/../../hadoop-common-project/hadoop-common/target/native/target/usr/local/lib
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.fs.TestHdfsNativeCodeLoader.testNativeCodeLoaded(TestHdfsNativeCodeLoader.java:46)
> {noformat}





[jira] [Created] (HDFS-9711) Integrate CSRF prevention filter in WebHDFS.

2016-01-27 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9711:
---

 Summary: Integrate CSRF prevention filter in WebHDFS.
 Key: HDFS-9711
 URL: https://issues.apache.org/jira/browse/HDFS-9711
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode, webhdfs
Reporter: Chris Nauroth
Assignee: Chris Nauroth


HADOOP-12691 introduced a filter in Hadoop Common to help REST APIs guard 
against cross-site request forgery attacks.  This issue tracks integration of 
that filter in WebHDFS.





[jira] [Reopened] (HDFS-6255) fuse_dfs will not adhere to ACL permissions in some cases

2016-01-19 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reopened HDFS-6255:
-

I have a theory about what is happening.  fuse_dfs is not specifically made 
aware of the HDFS ACLs.  It only has visibility into the basic permissions.  In 
the case of an ACL entry that widens access (i.e. grants access to a specific 
named user or group), if FUSE itself is enforcing access based solely on 
permissions, it might block access at the FUSE layer before even delegating to 
the NameNode.
fuse_dfs, but it would not be a security hole.  (The problem can only make 
access more restrictive, not more relaxed.)

I tried to confirm this in the FUSE code, but I wasn't successful, and I don't 
have time to look deeper right now.  I'm seeing some comments from various 
sources that FUSE is unaware of POSIX ACLs, but can be made aware of xattrs.  
This might mean there is a possibility of making it work with some code changes 
in fuse_dfs.

I'm not entirely sure this is feasible yet, but I'm going to reopen the issue 
and mark it as a new feature request.

> fuse_dfs will not adhere to ACL permissions in some cases
> -
>
> Key: HDFS-6255
> URL: https://issues.apache.org/jira/browse/HDFS-6255
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fuse-dfs
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Stephen Chu
>Assignee: Chris Nauroth
>
> As hdfs user, I created a directory /tmp/acl_dir/ and set permissions to 700. 
> Then I set a new acl group:jenkins:rwx on /tmp/acl_dir.
> {code}
> jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -getfacl /tmp/acl_dir
> # file: /tmp/acl_dir
> # owner: hdfs
> # group: supergroup
> user::rwx
> group::---
> group:jenkins:rwx
> mask::rwx
> other::---
> {code}
> Through the FsShell, the jenkins user can list /tmp/acl_dir as well as create 
> a file and directory inside.
> {code}
> [jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -touchz /tmp/acl_dir/testfile1
> [jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -mkdir /tmp/acl_dir/testdir1
> hdfs dfs -ls /tmp/acl[jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -ls /tmp/acl_dir/
> Found 2 items
> drwxr-xr-x   - jenkins supergroup  0 2014-04-17 19:11 
> /tmp/acl_dir/testdir1
> -rw-r--r--   1 jenkins supergroup  0 2014-04-17 19:11 
> /tmp/acl_dir/testfile1
> [jenkins@hdfs-vanilla-1 ~]$ 
> {code}
> However, as the same jenkins user, when I try to cd into /tmp/acl_dir using a 
> fuse_dfs mount, I get permission denied. Same permission denied when I try to 
> create or list files.
> {code}
> [jenkins@hdfs-vanilla-1 tmp]$ ls -l
> total 16
> drwxrwx--- 4 hdfsnobody 4096 Apr 17 19:11 acl_dir
> drwx-- 2 hdfsnobody 4096 Apr 17 18:30 acl_dir_2
> drwxr-xr-x 3 mapred  nobody 4096 Mar 11 03:53 mapred
> drwxr-xr-x 4 jenkins nobody 4096 Apr 17 07:25 testcli
> -rwx-- 1 hdfsnobody0 Apr  7 17:18 tf1
> [jenkins@hdfs-vanilla-1 tmp]$ cd acl_dir
> bash: cd: acl_dir: Permission denied
> [jenkins@hdfs-vanilla-1 tmp]$ touch acl_dir/testfile2
> touch: cannot touch `acl_dir/testfile2': Permission denied
> [jenkins@hdfs-vanilla-1 tmp]$ mkdir acl_dir/testdir2
> mkdir: cannot create directory `acl_dir/testdir2': Permission denied
> [jenkins@hdfs-vanilla-1 tmp]$ 
> {code}
> The fuse_dfs debug output doesn't show any error for the above operations:
> {code}
> unique: 18, opcode: OPENDIR (27), nodeid: 2, insize: 48
>unique: 18, success, outsize: 32
> unique: 19, opcode: READDIR (28), nodeid: 2, insize: 80
> readdir[0] from 0
>unique: 19, success, outsize: 312
> unique: 20, opcode: GETATTR (3), nodeid: 2, insize: 56
> getattr /tmp
>unique: 20, success, outsize: 120
> unique: 21, opcode: READDIR (28), nodeid: 2, insize: 80
>unique: 21, success, outsize: 16
> unique: 22, opcode: RELEASEDIR (29), nodeid: 2, insize: 64
>unique: 22, success, outsize: 16
> unique: 23, opcode: GETATTR (3), nodeid: 2, insize: 56
> getattr /tmp
>unique: 23, success, outsize: 120
> unique: 24, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 24, success, outsize: 120
> unique: 25, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 25, success, outsize: 120
> unique: 26, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 26, success, outsize: 120
> unique: 27, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 27, success, outsize: 120
> unique: 28, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 28, success, outsize: 120
> {code}
> In other scenarios, ACL permissions are enforced successfully. For example, 
> as hdfs user I create /tmp/acl_dir_2 and set permissions to 777. I then set 
> the acl user:jenkins:--- on the 

[jira] [Reopened] (HDFS-9569) Log the name of the fsimage being loaded for better supportability

2015-12-17 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reopened HDFS-9569:
-

I have reverted this patch from trunk, branch-2, branch-2.8 and branch-2.7.  
This patch introduced a test failure in 
{{TestDFSUpgradeFromImage#testUpgradeFromRel2ReservedImage}}.  The test expects 
to see an {{IllegalArgumentException}}, and then retry the upgrade with the 
option to rename reserved paths.  After this patch, the error handling masked 
the {{IllegalArgumentException}}, so the test no longer worked as expected.

> Log the name of the fsimage being loaded for better supportability
> --
>
> Key: HDFS-9569
> URL: https://issues.apache.org/jira/browse/HDFS-9569
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Trivial
>  Labels: supportability
> Fix For: 2.7.3
>
> Attachments: HDFS-9569.001.patch
>
>
> When NN starts to load fsimage, it does
> {code}
>  void loadFSImageFile(FSNamesystem target, MetaRecoveryContext recovery,
>   FSImageFile imageFile, StartupOption startupOption) throws IOException {
>   LOG.debug("Planning to load image :\n" + imageFile);
>   ..
> long txId = loader.getLoadedImageTxId();
> LOG.info("Loaded image for txid " + txId + " from " + curFile);
> {code}
> A debug msg is issued at the beginning with the fsimage file name, then at 
> the end an info msg is issued after loading.
> If the fsimage loading failed due to corrupted fsimage (see HDFS-9406), we 
> don't see the first msg. It'd be helpful to always be able to see from NN 
> logs what fsimage file it's loading.
> Two improvements:
> 1. Change the above debug to info
> 2. If exception happens when loading fsimage, be sure to report the fsimage 
> name being loaded in the error message.





[jira] [Created] (HDFS-9552) Document types of permission checks performed for HDFS operations.

2015-12-11 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9552:
---

 Summary: Document types of permission checks performed for HDFS 
operations.
 Key: HDFS-9552
 URL: https://issues.apache.org/jira/browse/HDFS-9552
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Chris Nauroth
Assignee: Chris Nauroth


The HDFS permissions guide discusses our use of a POSIX-like model with read, 
write and execute permissions associated with users, groups and the catch-all 
other class.  However, there is no documentation that describes exactly what 
permission checks are performed by user-facing HDFS operations.  This is a 
frequent source of questions, so it would be good to document this.





[jira] [Created] (HDFS-9534) Add CLI command to clear storage policy from a path.

2015-12-09 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9534:
---

 Summary: Add CLI command to clear storage policy from a path.
 Key: HDFS-9534
 URL: https://issues.apache.org/jira/browse/HDFS-9534
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Reporter: Chris Nauroth


The {{hdfs storagepolicies}} command has sub-commands for {{-setStoragePolicy}} 
and {{-getStoragePolicy}} on a path.  However, there is no 
{{-removeStoragePolicy}} to remove a previously set storage policy on a path.





[jira] [Created] (HDFS-9505) HDFS Architecture documentation needs to be refreshed.

2015-12-03 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9505:
---

 Summary: HDFS Architecture documentation needs to be refreshed.
 Key: HDFS-9505
 URL: https://issues.apache.org/jira/browse/HDFS-9505
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Chris Nauroth
Priority: Minor


The HDFS Architecture document is out of date with respect to the current 
design of the system.

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html

There are multiple false statements and omissions of recent features.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9495) Data node opens random port for HTTPServer, not configurable

2015-12-02 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9495.
-
Resolution: Duplicate

Hello, [~neha.bathra].  This issue is tracked in HDFS-9049, so I'm resolving 
this one as a duplicate.

> Data node opens random port for HTTPServer, not configurable
> 
>
> Key: HDFS-9495
> URL: https://issues.apache.org/jira/browse/HDFS-9495
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: neha
>
> Data node opens random port for HTTP Server which is not configurable 
> currently. Better to make it configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9483) Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured WebHDFS.

2015-11-30 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9483:
---

 Summary: Documentation does not cover use of "swebhdfs" as URL 
scheme for SSL-secured WebHDFS.
 Key: HDFS-9483
 URL: https://issues.apache.org/jira/browse/HDFS-9483
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Chris Nauroth


If WebHDFS is secured with SSL, then you can use "swebhdfs" as the scheme in a 
URL to access it.  The current documentation does not state this anywhere.
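
As a hypothetical usage sketch (host name, port, and path below are placeholders), accessing a TLS-protected WebHDFS endpoint only requires switching the URI scheme:

{code}
// Sketch: using "swebhdfs" for SSL-secured WebHDFS; "webhdfs" would use plain HTTP.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SWebHdfsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs =
        FileSystem.get(URI.create("swebhdfs://namenode.example.com:50470/"), conf);
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());
    }
  }
}
{code}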



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9471) Webhdfs not working with shell command when kerberos security+https is enabled.

2015-11-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9471.
-
Resolution: Not A Problem

[~surendrasingh], that's a good point about the documentation.  I filed 
HDFS-9483 to track a documentation improvement.  If you're interested in 
providing the documentation, please feel free to pick up that one.  I'm going 
to resolve this one.

> Webhdfs not working with shell command when kerberos security+https is 
> enabled.
> ---
>
> Key: HDFS-9471
> URL: https://issues.apache.org/jira/browse/HDFS-9471
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Blocker
> Attachments: HDFS-9471.01.patch
>
>
> *Client exception*
> {code}
> secure@host85:/opt/hdfsdata/HA/install/hadoop/namenode/bin> ./hdfs dfs -ls 
> webhdfs://x.x.x.x:50070/test
> 15/11/25 18:46:55 ERROR web.WebHdfsFileSystem: Unable to get HomeDirectory 
> from original File System
> java.net.SocketException: Unexpected end of file from server
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:792)
> {code}
> *Exception in namenode log*
> {code}
> 2015-11-26 11:03:18,231 WARN org.mortbay.log: EXCEPTION
> javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
> at 
> sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:710)
> at sun.security.ssl.InputRecord.read(InputRecord.java:527)
> at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:961)
> at 
> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1363)
> at 
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1391)
> at 
> sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1375)
> at 
> org.mortbay.jetty.security.SslSocketConnector$SslConnection.run(SslSocketConnector.java:708)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {code}
> This is because the URL scheme is hard-coded in 
> {{WebHdfsFileSystem.getTransportScheme()}}.

> {code}
>  /**
>* return the underlying transport protocol (http / https).
>*/
>   protected String getTransportScheme() {
> return "http";
>   }
> {code}
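
For illustration, the transport scheme only has to be supplied by an https-aware subclass; the sketch below shows the shape of such an override (the class that actually ships with Hadoop for this purpose is {{SWebHdfsFileSystem}}, so this is illustrative rather than new code to be written):

{code}
// Sketch: an https-capable subclass overrides the hard-coded transport scheme.
import org.apache.hadoop.hdfs.web.WebHdfsFileSystem;

public class HttpsWebHdfsFileSystem extends WebHdfsFileSystem {
  @Override
  public String getScheme() {
    return "swebhdfs";   // URI scheme clients put in their paths
  }

  @Override
  protected String getTransportScheme() {
    return "https";      // underlying transport instead of the default "http"
  }
}
{code}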



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HDFS-9370) TestDataNodeUGIProvider fails intermittently due to non-deterministic cache expiry.

2015-11-24 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reopened HDFS-9370:
-

> TestDataNodeUGIProvider fails intermittently due to non-deterministic cache 
> expiry.
> ---
>
> Key: HDFS-9370
> URL: https://issues.apache.org/jira/browse/HDFS-9370
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: HDFS-9370.001.patch, HDFS-9370.002.patch
>
>
> {{TestDataNodeUGIProvider}} has hard-coded sleep times waiting for background 
> expiration of entries in a Guava cache.  I have seen this test suite fail 
> intermittently, because expiration is not guaranteed to happen strictly on 
> the boundary of the period defined by the cache's expiration time.
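
One common way to remove this kind of timing dependency (a sketch only, not necessarily the fix that was committed) is to drive the Guava cache from a controllable {{Ticker}}, so a test advances time explicitly instead of sleeping:

{code}
// Sketch: deterministic cache expiry in a test using a manual Ticker.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import com.google.common.base.Ticker;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class DeterministicExpiryExample {
  /** Test-only ticker whose clock moves only when advance() is called. */
  static class ManualTicker extends Ticker {
    private final AtomicLong nanos = new AtomicLong();
    @Override public long read() { return nanos.get(); }
    void advance(long duration, TimeUnit unit) { nanos.addAndGet(unit.toNanos(duration)); }
  }

  public static void main(String[] args) {
    ManualTicker ticker = new ManualTicker();
    Cache<String, String> cache = CacheBuilder.newBuilder()
        .expireAfterWrite(10, TimeUnit.SECONDS)
        .ticker(ticker)
        .build();

    cache.put("ugi", "value");
    ticker.advance(11, TimeUnit.SECONDS);          // deterministically pass the window
    System.out.println(cache.getIfPresent("ugi")); // prints null: entry has expired
  }
}
{code}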



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9370) TestDataNodeUGIProvider fails intermittently due to non-deterministic cache expiry.

2015-11-24 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9370.
-
Resolution: Duplicate

> TestDataNodeUGIProvider fails intermittently due to non-deterministic cache 
> expiry.
> ---
>
> Key: HDFS-9370
> URL: https://issues.apache.org/jira/browse/HDFS-9370
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: HDFS-9370.001.patch, HDFS-9370.002.patch
>
>
> {{TestDataNodeUGIProvider}} has hard-coded sleep times waiting for background 
> expiration of entries in a Guava cache.  I have seen this test suite fail 
> intermittently, because expiration is not guaranteed to happen strictly on 
> the boundary of the period defined by the cache's expiration time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9459) hadoop-hdfs-native-client fails test build on Windows after transition to ctest.

2015-11-24 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9459:
---

 Summary: hadoop-hdfs-native-client fails test build on Windows 
after transition to ctest.
 Key: HDFS-9459
 URL: https://issues.apache.org/jira/browse/HDFS-9459
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build, test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Blocker


HDFS-9369 transitioned to usage of {{ctest}} for running the HDFS native tests. 
 This broke the {{mvn test}} build on Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9458) TestBackupNode always binds to port 50070, which can cause bind failures.

2015-11-24 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9458:
---

 Summary: TestBackupNode always binds to port 50070, which can 
cause bind failures.
 Key: HDFS-9458
 URL: https://issues.apache.org/jira/browse/HDFS-9458
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth


{{TestBackupNode}} does not override port settings to use a dynamically 
selected port for the NameNode HTTP server.  It uses the default of 50070 
defined in hdfs-default.xml.  This should be changed to select a dynamic port 
to avoid bind errors.
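
A rough sketch of the pattern, assuming the standard configuration key and a MiniDFSCluster-based test; port 0 asks the OS for any free port:

{code}
// Sketch: bind the NameNode HTTP server to an ephemeral port in a test so the
// default 50070 from hdfs-default.xml is never used.
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class DynamicHttpPortExample {
  public static void main(String[] args) throws Exception {
    HdfsConfiguration conf = new HdfsConfiguration();
    conf.set("dfs.namenode.http-address", "127.0.0.1:0");  // 0 = dynamically selected
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    try {
      System.out.println("NameNode HTTP port: "
          + cluster.getNameNode().getHttpAddress().getPort());
    } finally {
      cluster.shutdown();
    }
  }
}
{code}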



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9450) Fix failing HDFS tests on HDFS-7240 Ozone branch.

2015-11-23 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9450:
---

 Summary: Fix failing HDFS tests on HDFS-7240 Ozone branch.
 Key: HDFS-9450
 URL: https://issues.apache.org/jira/browse/HDFS-9450
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Reporter: Chris Nauroth


Several test failures have been introduced on the HDFS-7240 Ozone feature 
branch.  This issue tracks fixing those tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9443) Disabling HDFS client socket cache causes logging message printed to console for CLI commands.

2015-11-19 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9443:
---

 Summary: Disabling HDFS client socket cache causes logging message 
printed to console for CLI commands.
 Key: HDFS-9443
 URL: https://issues.apache.org/jira/browse/HDFS-9443
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Trivial


The HDFS client's socket cache can be disabled by setting 
{{dfs.client.socketcache.capacity}} to {{0}}.  When this is done, the 
{{PeerCache}} class logs an info-level message stating that the cache is 
disabled.  This message is getting printed to the console for CLI commands, 
which disrupts CLI output.  This issue proposes to downgrade to debug-level 
logging for this message.
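
A minimal sketch of the proposed change; the class and messages are placeholders rather than the actual PeerCache diff:

{code}
// Sketch: log the "cache disabled" condition at DEBUG so CLI output stays clean.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SocketCacheLogging {
  private static final Logger LOG = LoggerFactory.getLogger(SocketCacheLogging.class);

  void init(int capacity) {
    if (capacity == 0) {
      // Previously an INFO message, which CLI commands printed to the console.
      LOG.debug("SocketCache disabled (capacity = 0).");
    } else {
      LOG.debug("SocketCache enabled with capacity {}.", capacity);
    }
  }
}
{code}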



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-190) DataNode should be marked as final to prevent subclassing

2015-11-10 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-190.

Resolution: Won't Fix

We're now in a situation where the current codebase uses subclassing of 
{{DataNode}} for some tests.  There has been no activity on this issue for many 
years, so it looks unlikely that it would be implemented.  I'm closing it as 
won't fix.

> DataNode should be marked as final to prevent subclassing
> -
>
> Key: HDFS-190
> URL: https://issues.apache.org/jira/browse/HDFS-190
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Steve Loughran
>Priority: Minor
>
> Reviewing the DataNode core, it starts a thread in its constructor calling 
> back into the run() method. This is generally perceived as very dangerous: 
> if DataNode were ever subclassed, the subclass would start to be invoked 
> in the run() method before its own constructor had finished running.
> 1. Consider splitting the constructor from the start() operation.
> 2. If this cannot be changed, mark DataNode as final so nobody can subclass 
> it.  Though if the latter were done, it would be convenient to have a method 
> to let external management components poll for the health of the node, and to 
> pick up reasons for the node shutting down.
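
For illustration, a generic sketch of the constructor/start split suggested in option 1 (no HDFS classes involved; names are placeholders):

{code}
// Sketch: do not start a thread from the constructor, so "this" never escapes
// before subclass constructors finish; callers start the service explicitly.
public class SafeService implements Runnable {
  private Thread worker;   // created in start(), not in the constructor

  public SafeService() {
    // Field initialization only; no thread is started here, so a subclass
    // constructor always completes before run() can observe the object.
  }

  public synchronized void start() {
    if (worker == null) {
      worker = new Thread(this, "safe-service");
      worker.start();
    }
  }

  @Override
  public void run() {
    // service loop
  }

  public synchronized void stop() throws InterruptedException {
    if (worker != null) {
      worker.interrupt();
      worker.join();
      worker = null;
    }
  }
}
{code}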



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9409) DataNode shutdown does not guarantee full shutdown of all threads due to race condition.

2015-11-10 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9409:
---

 Summary: DataNode shutdown does not guarantee full shutdown of all 
threads due to race condition.
 Key: HDFS-9409
 URL: https://issues.apache.org/jira/browse/HDFS-9409
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Chris Nauroth


{{DataNode#shutdown}} is documented to return "only after shutdown is 
complete".  Even after completion of this method, it's possible that threads 
started by the DataNode are still running.  Race conditions in the shutdown 
sequence may cause it to skip stopping and joining the {{BPServiceActor}} 
threads.

This is likely not a big problem in normal operations, because these are daemon 
threads that won't block overall process exit.  It is more of a problem for 
tests, because it makes it impossible to write reliable assertions that these 
threads exited cleanly.  For large test suites, it can also cause an 
accumulation of unneeded threads, which might harm test performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9404) Findbugs issue reported in BlockRecoveryWorker$RecoveryTaskContiguous.recover()

2015-11-09 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9404.
-
Resolution: Duplicate

Hi [~yzhangal].  This is tracked in HDFS-9401.  Thanks!

> Findbugs issue reported in 
> BlockRecoveryWorker$RecoveryTaskContiguous.recover()
> ---
>
> Key: HDFS-9404
> URL: https://issues.apache.org/jira/browse/HDFS-9404
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Reporter: Yongjun Zhang
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/13431/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
> Reported:
> Code  Warning
> ECCall to 
> org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration.equals(org.apache.hadoop.hdfs.protocol.DatanodeInfo)
>  in 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover()
> Details
> EC_UNRELATED_TYPES: Call to equals() comparing different types
> This method calls equals(Object) on two references of different class types 
> and analysis suggests they will be to objects of different classes at 
> runtime. Further, examination of the equals methods that would be invoked 
> suggest that either this call will always return false, or else the equals 
> method is not symmetric (which is a property required by the contract for 
> equals in class Object).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9400) TestRollingUpgradeRollback fails on branch-2.

2015-11-07 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9400:
---

 Summary: TestRollingUpgradeRollback fails on branch-2.
 Key: HDFS-9400
 URL: https://issues.apache.org/jira/browse/HDFS-9400
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Chris Nauroth
Priority: Blocker


During a Jenkins pre-commit run on branch-2 for the HDFS-9394 patch, we noticed 
a pre-existing failure in {{TestRollingUpgradeRollback}}.  I have confirmed 
that this test is failing in branch-2 only.  It passes in trunk, and it passes 
in branch-2.7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9384) TestWebHdfsContentLength intermittently hangs and fails due to TCP conversation mismatch between client and server.

2015-11-05 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9384:
---

 Summary: TestWebHdfsContentLength intermittently hangs and fails 
due to TCP conversation mismatch between client and server.
 Key: HDFS-9384
 URL: https://issues.apache.org/jira/browse/HDFS-9384
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


{{TestWebHdfsContentLength}} runs a simple hand-coded HTTP server in a 
background thread to simulate some WebHDFS server responses.  In some 
environments (notably Windows), I have observed that the test can hang and fail 
intermittently.  The root cause is that the server fails to fully consume the 
client's input.  This causes a mismatch in the TCP conversation state, and 
ultimately the client side hangs, then aborts after the 60-second socket 
timeout.
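
A sketch of the usual fix for this class of test-server bug: read the request fully before writing the response, so the client's TCP conversation stays in sync. Everything below is illustrative and deliberately simplified:

{code}
// Sketch: a hand-coded test HTTP server drains the request before responding.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class DrainingTestServer {
  public static void main(String[] args) throws IOException {
    try (ServerSocket server = new ServerSocket(0)) {
      System.out.println("listening on port " + server.getLocalPort());
      try (Socket client = server.accept()) {
        InputStream in = client.getInputStream();
        // Read the request headers up to the blank line that terminates them.
        ByteArrayOutputStream headers = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
          headers.write(b);
          if (headers.toString("US-ASCII").endsWith("\r\n\r\n")) {
            break;
          }
        }
        // Drain any buffered request body so the client is not left blocked on
        // unconsumed bytes.  (A real server would honor Content-Length.)
        byte[] buf = new byte[4096];
        while (in.available() > 0) {
          in.read(buf);
        }
        OutputStream out = client.getOutputStream();
        out.write("HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
            .getBytes(StandardCharsets.US_ASCII));
        out.flush();
      }
    }
  }
}
{code}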



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9394) branch-2 hadoop-hdfs-client fails during FileSystem ServiceLoader initialization, because HftpFileSystem is missing.

2015-11-05 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9394:
---

 Summary: branch-2 hadoop-hdfs-client fails during FileSystem 
ServiceLoader initialization, because HftpFileSystem is missing.
 Key: HDFS-9394
 URL: https://issues.apache.org/jira/browse/HDFS-9394
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Reporter: Chris Nauroth
Priority: Critical


On branch-2, hadoop-hdfs-client contains a {{FileSystem}} service descriptor 
that lists {{HftpFileSystem}} and {{HsftpFileSystem}}.  These classes do not 
reside in hadoop-hdfs-client.  Instead, they reside in hadoop-hdfs.  If the 
application has hadoop-hdfs-client.jar on the classpath, but not 
hadoop-hdfs.jar, then this can cause a {{ServiceConfigurationError}}.
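
For illustration, the failure surfaces during {{ServiceLoader}} iteration; a sketch of a defensive probe (not a proposed change to {{FileSystem}} itself) looks like this:

{code}
// Sketch: iterating the FileSystem ServiceLoader defensively.  A descriptor
// entry whose class is missing from the classpath (e.g. HftpFileSystem listed
// by hadoop-hdfs-client but shipped in hadoop-hdfs.jar) appears here as a
// ServiceConfigurationError.
import java.util.Iterator;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;
import org.apache.hadoop.fs.FileSystem;

public class FileSystemLoaderProbe {
  public static void main(String[] args) {
    Iterator<FileSystem> it = ServiceLoader.load(FileSystem.class).iterator();
    while (true) {
      try {
        if (!it.hasNext()) {
          break;
        }
        System.out.println("loaded " + it.next().getClass().getName());
      } catch (ServiceConfigurationError e) {
        System.err.println("unloadable FileSystem implementation: " + e.getMessage());
      }
    }
  }
}
{code}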



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9378) hadoop-hdfs-client tests do not write logs.

2015-11-04 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9378:
---

 Summary: hadoop-hdfs-client tests do not write logs.
 Key: HDFS-9378
 URL: https://issues.apache.org/jira/browse/HDFS-9378
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


The tests that have been split into the hadoop-hdfs-client module are not 
writing any log output, because there is no src/test/resources/log4j.properties 
file in the module.  This makes it more difficult to troubleshoot test failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9370) TestDataNodeUGIProvider fails intermittently due to non-deterministic cache expiry.

2015-11-03 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9370:
---

 Summary: TestDataNodeUGIProvider fails intermittently due to 
non-deterministic cache expiry.
 Key: HDFS-9370
 URL: https://issues.apache.org/jira/browse/HDFS-9370
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


{{TestDataNodeUGIProvider}} has hard-coded sleep times waiting for background 
expiration of entries in a Guava cache.  I have seen this test suite fail 
intermittently, because expiration is not guaranteed to happen strictly on the 
boundary of the period defined by the cache's expiration time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9362) TestAuditLogger#testAuditLoggerWithCallContext assumes Unix line endings, fails on Windows.

2015-11-02 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9362:
---

 Summary: TestAuditLogger#testAuditLoggerWithCallContext assumes 
Unix line endings, fails on Windows.
 Key: HDFS-9362
 URL: https://issues.apache.org/jira/browse/HDFS-9362
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


{{TestAuditLogger#testAuditLoggerWithCallContext}} was added recently to 
exercise the new audit logging with caller context functionality.  The tests 
assume Unix line endings by hard-coding "\n" in asserts.  These tests fail on 
Windows.
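
A small sketch of the portable assertion pattern (test name and string contents are placeholders):

{code}
// Sketch: avoid hard-coding "\n"; build expectations from System.lineSeparator()
// or split on a pattern that accepts both Unix and Windows line endings.
import static org.junit.Assert.assertArrayEquals;
import org.junit.Test;

public class LineEndingAgnosticTest {
  @Test
  public void testAuditOutputIsPortable() {
    String audit = "cmd=getfileinfo" + System.lineSeparator()
        + "cmd=listStatus" + System.lineSeparator();
    String[] lines = audit.split("\\r?\\n");  // matches \n and \r\n
    assertArrayEquals(new String[] {"cmd=getfileinfo", "cmd=listStatus"}, lines);
  }
}
{code}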



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9311) Support optional offload of NameNode HA service health checks to a separate RPC server.

2015-10-26 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9311:
---

 Summary: Support optional offload of NameNode HA service health 
checks to a separate RPC server.
 Key: HDFS-9311
 URL: https://issues.apache.org/jira/browse/HDFS-9311
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Reporter: Chris Nauroth
Assignee: Chris Nauroth


When a NameNode is overwhelmed with load, it can lead to resource exhaustion of 
the RPC handler pools (both client-facing and service-facing).  Eventually, 
this blocks the health check RPC issued from ZKFC, which triggers a failover.  
Depending on fencing configuration, the former active NameNode may be killed.  
In an overloaded situation, the new active NameNode is likely to suffer the 
same fate, because client load patterns don't change after the failover.  This 
can degenerate into flapping between the 2 NameNodes without real recovery.  If 
a NameNode had been killed by fencing, then it would have to transition through 
safe mode, further delaying time to recovery.

This issue proposes a separate, optional RPC server at the NameNode for 
isolating the HA health checks.  These health checks are lightweight operations 
that do not suffer from contention issues on the namesystem lock or other 
shared resources.  Isolating the RPC handlers is sufficient to avoid this 
situation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9239) DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness

2015-10-13 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9239:
---

 Summary: DataNode Lifeline Protocol: an alternative protocol for 
reporting DataNode liveness
 Key: HDFS-9239
 URL: https://issues.apache.org/jira/browse/HDFS-9239
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Attachments: DataNode-Lifeline-Protocol.pdf

This issue proposes introduction of a new feature: the DataNode Lifeline 
Protocol.  This is an RPC protocol that is responsible for reporting liveness 
and basic health information about a DataNode to a NameNode.  Compared to the 
existing heartbeat messages, it is lightweight and not prone to resource 
contention problems that can harm accurate tracking of DataNode liveness 
currently.  The attached design document contains more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9138) TestDatanodeStartupFixesLegacyStorageIDs fails on Windows due to failure to unpack old image tarball that contains hard links

2015-09-24 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9138:
---

 Summary: TestDatanodeStartupFixesLegacyStorageIDs fails on Windows 
due to failure to unpack old image tarball that contains hard links
 Key: HDFS-9138
 URL: https://issues.apache.org/jira/browse/HDFS-9138
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


{{TestDatanodeStartupFixesLegacyStorageIDs#testUpgradeFrom22via26FixesStorageIDs}}
 uses a checked-in DataNode data directory that contains hard links.  The hard 
links cannot be handled correctly by the commons-compress library used in the 
Windows implementation of {{FileUtil#unTar}}.  The result is that the unpacked 
block files have 0 length, the block files reported to the NameNode are 
invalid, and therefore the mini-cluster never gets enough good blocks reported 
to leave safe mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9138) TestDatanodeStartupFixesLegacyStorageIDs fails on Windows due to failure to unpack old image tarball that contains hard links

2015-09-24 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9138.
-
Resolution: Not A Problem

Yes, my earlier HDFS-8554 patch fixed this already.  I was only seeing a test 
failure on an older branch.  I'm resolving this issue.

> TestDatanodeStartupFixesLegacyStorageIDs fails on Windows due to failure to 
> unpack old image tarball that contains hard links
> -
>
> Key: HDFS-9138
> URL: https://issues.apache.org/jira/browse/HDFS-9138
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: HDFS-9138.001.patch
>
>
> {{TestDatanodeStartupFixesLegacyStorageIDs#testUpgradeFrom22via26FixesStorageIDs}}
>  uses a checked-in DataNode data directory that contains hard links.  The 
> hard links cannot be handled correctly by the commons-compress library used 
> in the Windows implementation of {{FileUtil#unTar}}.  The result is that the 
> unpacked block files have 0 length, the block files reported to the NameNode 
> are invalid, and therefore the mini-cluster never gets enough good blocks 
> reported to leave safe mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-9136) TestDFSUpgrade leaks file descriptors.

2015-09-23 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-9136.
-
Resolution: Not A Problem

I was mistaken.  This actually got fixed as part of the test changes done in 
HDFS-8846.  I'm no longer seeing a problem after that patch.  I'll resolve this.

> TestDFSUpgrade leaks file descriptors.
> --
>
> Key: HDFS-9136
> URL: https://issues.apache.org/jira/browse/HDFS-9136
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
>
> HDFS-8480 introduced code in {{TestDFSUpgrade#testPreserveEditLogs}} that 
> opens edit log files and reads from them, but these files are never closed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9136) TestDFSUpgrade leaks file descriptors.

2015-09-23 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9136:
---

 Summary: TestDFSUpgrade leaks file descriptors.
 Key: HDFS-9136
 URL: https://issues.apache.org/jira/browse/HDFS-9136
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


HDFS-8480 introduced code in {{TestDFSUpgrade#testPreserveEditLogs}} that opens 
edit log files and reads from them, but these files are never closed.
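
For illustration, the leak-free pattern is just try-with-resources around the stream; the file name and read logic below are placeholders, not the actual test diff:

{code}
// Sketch: close the edit log stream even when an assertion or read fails.
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class EditLogReadExample {
  static long readFirstLong(String editsFile) throws IOException {
    try (DataInputStream in = new DataInputStream(new FileInputStream(editsFile))) {
      return in.readLong();  // placeholder for real edit-log parsing
    }                        // stream is closed even if the read throws
  }
}
{code}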



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9038) Reserved space is erroneously counted towards non-DFS used.

2015-09-08 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9038:
---

 Summary: Reserved space is erroneously counted towards non-DFS 
used.
 Key: HDFS-9038
 URL: https://issues.apache.org/jira/browse/HDFS-9038
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.7.1
Reporter: Chris Nauroth


HDFS-5215 changed the DataNode volume available space calculation to consider 
the reserved space held by the {{dfs.datanode.du.reserved}} configuration 
property.  As a side effect, reserved space is now counted towards non-DFS 
used.  I don't believe it was intentional to change the definition of non-DFS 
used.  This issue proposes restoring the prior behavior: do not count reserved 
space towards non-DFS used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-8748) ACL permission check does not union groups to determine effective permissions

2015-07-24 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-8748.
-
Resolution: Won't Fix

As I stated in my last comment, the high level design goal of HDFS ACLs was to 
match POSIX semantics as closely as possible.  I'm going to resolve this as 
won't fix, because the current implemented behavior matches the latest quote 
from the POSIX spec, even though it doesn't match the HDFS-4685 design doc.

[~scott_o], I really appreciate your diligence tracking down the relevant spec. 
 Thank you!

 ACL permission check does not union groups to determine effective permissions
 -

 Key: HDFS-8748
 URL: https://issues.apache.org/jira/browse/HDFS-8748
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.7.1
Reporter: Scott Opell
  Labels: acl, permission
 Attachments: HDFS_8748.patch


 In the ACL permission checking routine, the implemented named group section 
 does not match the design document.
 In the design document, it's shown in the pseudo-code that if the requester is 
 not the owner or a named user, then the applicable groups are unioned 
 together to form effective permissions for the requester.
 Instead, the current implementation will search for the first group that 
 grants access and will use that. It will not union the permissions together.
 Here is the design document's description of the desired behavior
 {quote}
 If the user is a member of the file's group or at least one group for which 
 there is a
 named group entry in the ACL, then effective permissions are calculated from 
 groups.
 This is the union of the file group permissions (if the user is a member of 
 the file group)
 and all named group entries matching the user's groups. For example, consider 
 a user
 that is a member of 2 groups: sales and execs. The user is not the file 
 owner, and the
 ACL contains no named user entries. The ACL contains named group entries for 
 both
 groups as follows: group:sales:r--, group:execs:-w-. In this case, 
 the user's effective
 permissions are rw-.
 {quote}
  
 ??https://issues.apache.org/jira/secure/attachment/12627729/HDFS-ACLs-Design-3.pdf
  page 10??
 The design document's algorithm matches that description:
 *Design Document Algorithm*
 {code:title=DesignDocument}
 if (user == fileOwner) {
 effectivePermissions = aclEntries.getOwnerPermissions()
 } else if (user ∈ aclEntries.getNamedUsers()) {
 effectivePermissions = aclEntries.getNamedUserPermissions(user)
 } else if (userGroupsInAcl != ∅) {
 effectivePermissions = ∅
 if (fileGroup ∈ userGroupsInAcl) {
 effectivePermissions = effectivePermissions ∪
 aclEntries.getGroupPermissions()
 }
 for ({group | group ∈ userGroupsInAcl}) {
 effectivePermissions = effectivePermissions ∪
 aclEntries.getNamedGroupPermissions(group)
 }
 } else {
 effectivePermissions = aclEntries.getOthersPermissions()
 }
 {code}
 ??https://issues.apache.org/jira/secure/attachment/12627729/HDFS-ACLs-Design-3.pdf
  page 9??
 The current implementation does NOT match the description.
 *Current Trunk*
 {code:title=FSPermissionChecker.java}
 // Use owner entry from permission bits if user is owner.
 if (getUser().equals(inode.getUserName())) {
   if (mode.getUserAction().implies(access)) {
 return;
   }
   foundMatch = true;
 }
 // Check named user and group entries if user was not denied by owner 
 entry.
 if (!foundMatch) {
  for (int pos = 0, entry; pos < aclFeature.getEntriesSize(); pos++) {
 entry = aclFeature.getEntryAt(pos);
 if (AclEntryStatusFormat.getScope(entry) == AclEntryScope.DEFAULT) {
   break;
 }
 AclEntryType type = AclEntryStatusFormat.getType(entry);
 String name = AclEntryStatusFormat.getName(entry);
 if (type == AclEntryType.USER) {
   // Use named user entry with mask from permission bits applied if 
 user
   // matches name.
   if (getUser().equals(name)) {
 FsAction masked = AclEntryStatusFormat.getPermission(entry).and(
 mode.getGroupAction());
 if (masked.implies(access)) {
   return;
 }
 foundMatch = true;
 break;
   }
 } else if (type == AclEntryType.GROUP) {
   // Use group entry (unnamed or named) with mask from permission bits
   // applied if user is a member and entry grants access.  If user is 
 a
   // member of multiple groups that have entries that grant access, 
 then
   // it doesn't matter which is chosen, so exit early after first 
 match.
   String group = name == null ? 

[jira] [Resolved] (HDFS-8761) Windows HDFS daemon - datanode.DirectoryScanner: Error compiling report (...) XXX is not a prefix of YYY

2015-07-20 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-8761.
-
Resolution: Not A Problem

Hello [~odelalleau].

I answered your question on Stack Overflow.  I'm pasting the answer here too.  
After using the techniques I described to configure a path with a drive spec, I 
expect you won't see these errors anymore.  In the future, the best forum for 
questions like this is the u...@hadoop.apache.org mailing list.

You can specify a drive spec in {{hadoop.tmp.dir}} in core-site.xml by 
prepending a '/' in front of the absolute path, and using '/' as the path 
separator instead of '\' for all path elements.  For example, if the desired 
absolute path is D:\tmp\hadoop, then it would look like this:

{code}
<property>
  <name>hadoop.tmp.dir</name>
  <value>/D:/tmp/hadoop</value>
</property>
{code}

The reason this works is that the default values for many of the HDFS 
directories are configured to be file://${hadoop.tmp.dir}/suffix.  See the 
default definitions of {{dfs.namenode.name.dir}}, {{dfs.datanode.data.dir}} and 
{{dfs.namenode.checkpoint.dir}} here:

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Substituting the above value for {{hadoop.tmp.dir}} yields a valid {{file:}} 
URL with a drive spec and no authority, which satisfies the requirements for 
the HDFS configuration.  It's important to use '/' instead of '\', because a 
bare unencoded '\' character is not valid in URL syntax.

http://www.ietf.org/rfc/rfc1738.txt

If you prefer not to rely on this substitution behavior, then it's also valid 
to override all configuration properties that make use of {{hadoop.tmp.dir}} 
within your hdfs-site.xml file.  Each value must be a full {{file:}} URL.  For 
example:

{code}
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///D:/tmp/hadoop/dfs/name</value>
</property>

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///D:/tmp/hadoop/dfs/data</value>
</property>

<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>file:///D:/tmp/hadoop/dfs/namesecondary</value>
</property>
{code}

You might find this more readable overall.

 Windows HDFS daemon - datanode.DirectoryScanner: Error compiling report (...) 
 XXX is not a prefix of YYY
 

 Key: HDFS-8761
 URL: https://issues.apache.org/jira/browse/HDFS-8761
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.7.1
 Environment: Windows 7, Java SDK 1.8.0_45
Reporter: Olivier Delalleau
Priority: Minor

 I'm periodically seeing errors like the one below output by the HDFS daemon 
 (started with start-dfs.cmd). This is with the default settings for data 
 location (=not specified in my hdfs-site.xml). I assume it may be fixable by 
 specifying a path with the drive letter in the config file, however I haven't 
 been able to do that (see 
 http://stackoverflow.com/questions/31353226/setting-hadoop-tmp-dir-on-windows-gives-error-uri-has-an-authority-component).
 15/07/11 17:29:57 ERROR datanode.DirectoryScanner: Error compiling report
 java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
 \tmp\hadoop-odelalleau\dfs\data is not a prefix of 
 D:\tmp\hadoop-odelalleau\dfs\data\current\BP-1474392971-10.128.22.110-1436634926842\current\finalized\subdir0\subdir0\blk_1073741825
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 at java.util.concurrent.FutureTask.get(FutureTask.java:192)
 at 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.getDiskReport(DirectoryScanner.java:566)
 at 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:425)
 at 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:406)
 at 
 org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:362)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-8752) OzoneHandler : Add Volume Interface to Data Node HTTP Server

2015-07-10 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-8752.
-
Resolution: Duplicate

 OzoneHandler : Add Volume Interface to Data Node HTTP Server
 

 Key: HDFS-8752
 URL: https://issues.apache.org/jira/browse/HDFS-8752
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: kanaka kumar avvaru
Assignee: kanaka kumar avvaru

 This JIRA proposes to enable the OzoneHandler path in the HTTP Info Server and 
 verify the volume REST API with the Local Storage Handler implementation using 
 MiniDFSCluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8593) Calculation of effective layout version mishandles comparison to current layout version in storage.

2015-06-12 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-8593:
---

 Summary: Calculation of effective layout version mishandles 
comparison to current layout version in storage.
 Key: HDFS-8593
 URL: https://issues.apache.org/jira/browse/HDFS-8593
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Chris Nauroth
Assignee: Chris Nauroth


HDFS-8432 introduced the concept of a minimum compatible layout version so that 
downgrade is applicable in a wider set of circumstances.  This includes logic 
for determining if the current layout version in storage is within the bounds 
of the minimum compatible layout version.  There is an inverted comparison in 
this logic, which can result in an incorrect calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8554) TestDatanodeLayoutUpgrade fails on Windows.

2015-06-06 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-8554:
---

 Summary: TestDatanodeLayoutUpgrade fails on Windows.
 Key: HDFS-8554
 URL: https://issues.apache.org/jira/browse/HDFS-8554
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth


{{TestDatanodeLayoutUpgrade}} fails on Windows due to some Linux-specific file 
system path handling and incorrect handling of hard links unpacked from a 
tarball.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8527) OzoneHandler: Integration of REST interface and container data pipeline back-end

2015-06-03 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-8527:
---

 Summary: OzoneHandler: Integration of REST interface and container 
data pipeline back-end
 Key: HDFS-8527
 URL: https://issues.apache.org/jira/browse/HDFS-8527
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chris Nauroth
Assignee: Chris Nauroth


This issue tracks development of OzoneHandler.  This is a component within the 
DataNode that receives inbound requests parsed from the REST interface, 
dispatches to the underlying storage container data pipeline, and then returns 
an appropriate response to the REST layer for translation to an outbound HTTP 
response.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-8503) fs.copyoLocalFile(src, dst);will have NPE and can not run successful in windows

2015-06-01 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-8503.
-
Resolution: Not A Problem

Yes, agreed.  Thank you for your analysis, [~brahmareddy].

 fs.copyoLocalFile(src, dst);will have NPE and can not run successful in 
 windows
 ---

 Key: HDFS-8503
 URL: https://issues.apache.org/jira/browse/HDFS-8503
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fs
Affects Versions: 2.7.0
Reporter: huangyitian
Assignee: surendra singh lilhore

 ==Test Code===
 //copy file from HDFS to local
 fs.copyToLocalFile(dst, new Path(copiedlocalfile));
 ==ERROR Exception==
 java.lang.NullPointerException
   at java.lang.ProcessBuilder.start(Unknown Source)
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:483)
   at org.apache.hadoop.util.Shell.run(Shell.java:456)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:815)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:798)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:740)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:224)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:208)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:304)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:292)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:325)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:393)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:365)
   at 
 org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)java.lang.NullPointerException
   at java.lang.ProcessBuilder.start(Unknown Source)
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:483)
   at org.apache.hadoop.util.Shell.run(Shell.java:456)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:815)
   at org.apache.hadoop.util.Shell.execCommand(She
   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2030)
   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1999)
   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1975)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8510) Provide different timeout settings for hdfs dfsadmin -getDatanodeInfo.

2015-06-01 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-8510:
---

 Summary: Provide different timeout settings for hdfs dfsadmin 
-getDatanodeInfo.
 Key: HDFS-8510
 URL: https://issues.apache.org/jira/browse/HDFS-8510
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Reporter: Chris Nauroth
Assignee: Chris Nauroth


During a rolling upgrade, an administrator runs {{hdfs dfsadmin 
-getDatanodeInfo}} to check if a DataNode has stopped.  Currently, this 
operation is subject to the RPC connection retries defined in 
{{ipc.client.connect.max.retries}} and {{ipc.client.connect.retry.interval}}.  
This issue proposes adding separate configuration properties to control the 
retries for this operation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8432) Introduce a minimum compatible layout version to allow downgrade in more rolling upgrade use cases.

2015-05-19 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-8432:
---

 Summary: Introduce a minimum compatible layout version to allow 
downgrade in more rolling upgrade use cases.
 Key: HDFS-8432
 URL: https://issues.apache.org/jira/browse/HDFS-8432
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode, rolling upgrades
Reporter: Chris Nauroth
Assignee: Chris Nauroth


Maintain the prior layout version during the upgrade window and reject attempts 
to use new features until after the upgrade has been finalized.  This 
guarantees that the prior software version can read the fsimage and edit logs 
if the administrator decides to downgrade.  This will make downgrade usable for 
the majority of NameNode layout version changes, which just involve 
introduction of new edit log operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-8419) chmod impact user's effective ACL

2015-05-18 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-8419.
-
Resolution: Not A Problem

Hello, [~sinago].  This behavior is by design.  The documentation mentions that 
running {{chmod}} on a file with an ACL actually changes the permissions on the 
mask entry, which in turn alters the effective permissions for all extended ACL 
entries.

http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#ACLs_Access_Control_Lists

This behavior matches with the POSIX ACL model.  The spec that we used as a 
reference during development goes into greater detail describing the motivation 
for the mask entry and its interaction with applications that are not 
ACL-aware, such as {{chmod}}.

http://users.suse.com/~agruen/acl/linux-acls/online/

If you want, you can control the mask entry directly by using {{setfacl -m}} 
and including a mask entry with the explicit permissions that you want.

 chmod impact user's effective ACL
 -

 Key: HDFS-8419
 URL: https://issues.apache.org/jira/browse/HDFS-8419
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.6.0
Reporter: zhouyingchao
Assignee: zhouyingchao

 I set a directory's ACL to assign rwx permission to user h_user1. Later, I 
 used chmod to change the group permission to r-x. I understand chmod on an 
 ACL-enabled file would only change the permission mask. The abnormal thing is 
 that the operation changes h_user1's effective ACL from rwx to r-x.
 Following are ACLs before any operaton:
 -
 # file: /grptest
 # owner: hdfs_tst_admin
 # group: supergroup
 user::rwx
 user:h_user1:rwx
 group::r-x
 mask::rwx
 other::---
 -
 Following are ACLs after chmod 750 /grptest
 -
 # file: /grptest
 # owner: hdfs_tst_admin
 # group: supergroup
 user::rwx
 user:h_user1:rwx  #effective:r-x
 group::r-x
 mask::r-x
 other::---
 -
 I'm wondering if this behavior is by design.  If not, I'd like to fix the 
 issue. Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-4868) Clean up error message when trying to snapshot using ViewFileSystem

2015-05-15 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-4868.
-
Resolution: Duplicate

[~rakeshr], thank you for the reminder.  I'm closing this as duplicate.

 Clean up error message when trying to snapshot using ViewFileSystem
 ---

 Key: HDFS-4868
 URL: https://issues.apache.org/jira/browse/HDFS-4868
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: snapshots
Affects Versions: 3.0.0
Reporter: Stephen Chu
Priority: Minor

 Snapshots aren't supported for the ViewFileSystem. When users try to create a 
 snapshot, they'll run into a message like the following:
 {code}
 schu-mbp:presentation schu$ hadoop fs -createSnapshot /user/schu
 -createSnapshot: Fatal internal error
 java.lang.UnsupportedOperationException: ViewFileSystem doesn't support 
 createSnapshot
   at org.apache.hadoop.fs.FileSystem.createSnapshot(FileSystem.java:2285)
   at 
 org.apache.hadoop.fs.shell.SnapshotCommands$CreateSnapshot.processArguments(SnapshotCommands.java:87)
   at 
 org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:194)
   at org.apache.hadoop.fs.shell.Command.run(Command.java:155)
   at org.apache.hadoop.fs.FsShell.run(FsShell.java:255)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at org.apache.hadoop.fs.FsShell.main(FsShell.java:305)
 {code}
 To make things more readable and avoid confusion, it would be helpful to 
 clean up the error message stacktrace and just state that ViewFileSystem 
 doesn't support createSnapshot, similar to what was done in HDFS-4846. The 
 fatal internal error message is a bit scary and it might be useful to 
 remove that message to avoid confusion from operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7316) DistCp cannot handle : colon in filename

2015-05-14 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-7316.
-
Resolution: Duplicate

This does look like a duplicate of HADOOP-3257.  I'm resolving.

 DistCp cannot handle : colon in filename
 --

 Key: HDFS-7316
 URL: https://issues.apache.org/jira/browse/HDFS-7316
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Paul Joslin

 Similar to HDFS-13.  If a source directory for DistCp contains a file with a 
 colon (':'), the file will not be copied.
 Example error message:
 java.lang.Exception: java.lang.IllegalArgumentException: Pathname 
 /user/pk1/RECORDS/MasterLink-pk1.gateway2.example.com:22.10:22:30 from 
 hdfs:/access01.mgt.gateway2.example.com:8020/user/pk1/RECORDS/MasterLink-pk1.gateway2.example.com:22.10:22:30
  is not a valid DFS filename.
 at 
 org.apache.hadoop.mapred.example.comJobRunner$Job.runTasks(LocalJobRunner.java:462)
 at 
 org.apache.hadoop.mapred.example.comJobRunner$Job.run(LocalJobRunner.java:522)
 Caused by: java.lang.IllegalArgumentException: Pathname 
 /user/pk1/RECORDS/MasterLink-pxj29.gateway2.example.com:22.10:22:30 from 
 hdfs:/access01.mgt.gateway2.example.com:8020/user/pk1/RECORDS/MasterLink-pk1.gateway2.example.com:22.10:22:30
  is not a valid DFS filename.
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:195)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:104)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1079)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1075)
 at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1075)
 at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:229)
 at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
 at 
 org.apache.hadoop.mapred.example.comJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8346) libwebhdfs build fails during link due to unresolved external symbols.

2015-05-07 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-8346:
---

 Summary: libwebhdfs build fails during link due to unresolved 
external symbols.
 Key: HDFS-8346
 URL: https://issues.apache.org/jira/browse/HDFS-8346
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: native
Affects Versions: 2.6.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth


The libwebhdfs build is currently broken due to various unresolved external 
symbols during link.  Multiple patches have introduced a few different forms of 
this breakage.  See comments for full details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8290) WebHDFS calls before namesystem initialization can cause NullPointerException.

2015-04-29 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-8290:
---

 Summary: WebHDFS calls before namesystem initialization can cause 
NullPointerException.
 Key: HDFS-8290
 URL: https://issues.apache.org/jira/browse/HDFS-8290
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.6.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


The NameNode has a brief window of time when the HTTP server has been 
initialized, but the namesystem has not been initialized.  During this window, 
a WebHDFS call can cause a {{NullPointerException}}.  We can catch this 
condition and return a more meaningful error.
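
A rough sketch of the guard described above; the field, method, and exception choice are placeholders rather than the committed patch:

{code}
// Sketch: fail fast with a clear message when a WebHDFS request arrives before
// the namesystem is initialized, instead of hitting a NullPointerException.
import java.io.IOException;

public class NamesystemGuard {
  private volatile Object namesystem;   // set once NameNode start-up completes

  Object getNamesystemChecked() throws IOException {
    Object ns = namesystem;
    if (ns == null) {
      throw new IOException("Namesystem has not been initialized yet; retry later.");
    }
    return ns;
  }
}
{code}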



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-8252) Fix test case failure in org.apache.hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate.testAppendOverTypeQuota

2015-04-25 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-8252.
-
Resolution: Duplicate

 Fix test case failure in 
 org.apache.hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate.testAppendOverTypeQuota
 

 Key: HDFS-8252
 URL: https://issues.apache.org/jira/browse/HDFS-8252
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula

 Quota by storage type : SSD on path : /TestAppendOverTypeQuota is exceeded. 
 quota = 1 B but space consumed = 1 KB
  at 
 org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuotaByStorageType(DirectoryWithQuotaFeature.java:227)
  at 
 org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:240)
  at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:874)
  at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyQuotaForUCBlock(FSNamesystem.java:2765)
  at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.prepareFileForAppend(FSNamesystem.java:2713)
  at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2686)
  at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2968)
  at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2939)
  at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:659)
  at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:418)
  at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2174)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2170)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2168)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8164) cTime is 0 in VERSION file for newly formatted NameNode.

2015-04-16 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-8164:
---

 Summary: cTime is 0 in VERSION file for newly formatted NameNode.
 Key: HDFS-8164
 URL: https://issues.apache.org/jira/browse/HDFS-8164
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.3-alpha
Reporter: Chris Nauroth
Priority: Minor


After formatting a NameNode and inspecting its VERSION file, the cTime property 
shows 0.  The value does get updated to current time during an upgrade, but I 
believe this is intended to be the creation time of the cluster, and therefore 
a value of 0 can cause confusion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7988) Replace usage of ExactSizeInputStream with LimitInputStream.

2015-03-24 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-7988:
---

 Summary: Replace usage of ExactSizeInputStream with 
LimitInputStream.
 Key: HDFS-7988
 URL: https://issues.apache.org/jira/browse/HDFS-7988
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chris Nauroth
Priority: Minor


HDFS has a class named {{ExactSizeInputStream}} used in the protobuf 
translation layer.  This class wraps another {{InputStream}}, but constraints 
the readable bytes to a specified length.  The functionality is nearly 
identical to {{LimitInputStream}} in Hadoop Common, with some differences in 
semantics regarding premature EOF.  This issue proposes to eliminate 
{{ExactSizeInputStream}} in favor of {{LimitInputStream}} to reduce the size of 
the codebase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7879) hdfs.dll does not export functions of the public libhdfs API.

2015-03-03 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-7879:
---

 Summary: hdfs.dll does not export functions of the public libhdfs 
API.
 Key: HDFS-7879
 URL: https://issues.apache.org/jira/browse/HDFS-7879
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build, libhdfs
Affects Versions: 2.6.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth


HDFS-573 enabled libhdfs to be built for Windows.  This did not include marking 
the public API functions for export in hdfs.dll though, effectively making 
dynamic linking scenarios impossible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7833) DataNode reconfiguration does not recalculate valid volumes required, based on configured failed volumes tolerated.

2015-02-24 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-7833:
---

 Summary: DataNode reconfiguration does not recalculate valid 
volumes required, based on configured failed volumes tolerated.
 Key: HDFS-7833
 URL: https://issues.apache.org/jira/browse/HDFS-7833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Chris Nauroth
Assignee: Lei (Eddy) Xu


DataNode reconfiguration never recalculates 
{{FsDatasetImpl#validVolsRequired}}.  This may cause incorrect behavior of the 
{{dfs.datanode.failed.volumes.tolerated}} property if reconfiguration causes 
the DataNode to run with a different total number of volumes.
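
As a rough illustration of the recalculation the fix needs (hypothetical class 
and method names, not the actual FsDatasetImpl code):

{code}
// Hypothetical helper, not the actual FsDatasetImpl code. The point is that
// the number of valid volumes required must be recomputed from the *current*
// volume count whenever reconfiguration adds or removes data directories.
class VolumeFailurePolicy {
  private final int failedVolumesTolerated;  // dfs.datanode.failed.volumes.tolerated
  private int validVolsRequired;

  VolumeFailurePolicy(int configuredVolumes, int failedVolumesTolerated) {
    this.failedVolumesTolerated = failedVolumesTolerated;
    recalculate(configuredVolumes);
  }

  // Should be invoked again after a reconfiguration changes the volume count.
  void recalculate(int configuredVolumes) {
    validVolsRequired = configuredVolumes - failedVolumesTolerated;
    if (validVolsRequired < 1) {
      throw new IllegalArgumentException(
          "failed.volumes.tolerated (" + failedVolumesTolerated
          + ") must be smaller than the number of volumes (" + configuredVolumes + ")");
    }
  }

  boolean hasEnoughValidVolumes(int currentValidVolumes) {
    return currentValidVolumes >= validVolsRequired;
  }
}
{code}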



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7773) Additional metrics in HDFS to be accessed via jmx.

2015-02-20 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-7773.
-
   Resolution: Fixed
Fix Version/s: 2.7.0

+1 for both the trunk and branch-2 patches.  I have finished committing 
these.  Anu, thank you for working on these new metrics.

 Additional metrics in HDFS to be accessed via jmx.
 --

 Key: HDFS-7773
 URL: https://issues.apache.org/jira/browse/HDFS-7773
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Reporter: Anu Engineer
Assignee: Anu Engineer
 Fix For: 2.7.0

 Attachments: hdfs-7773.001.patch, hdfs-7773.002.patch, 
 hdfs-7773.003.patch, hdfs-7773.branch-2.001.patch


 We would like to have the following metrics added to the DataNode and NameNode 
 to improve the Ambari dashboard (see the sketch after this list):
 1) DN disk i/o utilization
 2) DN network i/o utilization
 3) Namenode read operations 
 4) Namenode write operations
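
As a hedged illustration of how such counters are typically exposed over JMX 
through the Hadoop metrics2 library (the class and metric names below are 
hypothetical examples, not the metrics added by this patch):

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

// Hypothetical metrics source: the class and metric names are examples only.
@Metrics(about = "Example NameNode operation counters", context = "dfs")
class ExampleNameNodeMetrics {
  @Metric("Read operations served") MutableCounterLong readOps;
  @Metric("Write operations served") MutableCounterLong writeOps;

  static ExampleNameNodeMetrics create() {
    // Registering the source instantiates the annotated fields and exposes
    // them through the configured metrics sinks and JMX.
    return DefaultMetricsSystem.instance().register(
        "ExampleNameNodeMetrics", "Example counters", new ExampleNameNodeMetrics());
  }

  void incrReadOps()  { readOps.incr(); }
  void incrWriteOps() { writeOps.incr(); }
}
{code}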



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7815) Loop on 'blocks does not belong to any file'

2015-02-20 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-7815.
-
Resolution: Duplicate

 Loop on 'blocks does not belong to any file'
 

 Key: HDFS-7815
 URL: https://issues.apache.org/jira/browse/HDFS-7815
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.6.0
 Environment: small cluster on RetHat. 2 namenodes (HA),  6 datanodes 
 with 19TB disk for hdfs.
Reporter: Frode Halvorsen

 I am currently experiencing a looping situation:
 The namenode uses approximately 1:50 (min:sec) to log a massive amount of lines 
 stating that some blocks don't belong to any file. During this time, it's 
 unresponsive to any requests from datanodes, and if ZooKeeper had been 
 running, it would have taken the namenode down (ssh-fencing: kill).
 When it has finished the round, it starts to do some normal work and, among 
 other things, tells the datanode to delete the blocks. But before the 
 datanode has gotten around to deleting the blocks, and is about to report back 
 to the namenode, the namenode has started the next round of reporting the 
 same blocks that don't belong to any file. Thus, the datanode gets a timeout 
 when reporting block updates for the deleted blocks, and this, of course, 
 repeats itself over and over again... 
 There are actually two issues, I think:
 1 - the namenode gets totally unresponsive when reporting the blocks (could 
 this be a DEBUG line instead of an INFO line?)
 2 - the namenode seems to 'forget' that it has already reported those blocks 
 just 2-3 minutes ago...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7714) Simultaneous restart of HA NameNodes and DataNode can cause DataNode to register successfully with only one NameNode.

2015-01-30 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-7714:
---

 Summary: Simultaneous restart of HA NameNodes and DataNode can 
cause DataNode to register successfully with only one NameNode.
 Key: HDFS-7714
 URL: https://issues.apache.org/jira/browse/HDFS-7714
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Chris Nauroth


In an HA deployment, DataNodes must register with both NameNodes and send 
periodic heartbeats and block reports to both.  However, if NameNodes and 
DataNodes are restarted simultaneously, then this can trigger a race condition 
in registration.  The end result is that the {{BPServiceActor}} for one 
NameNode terminates, but the {{BPServiceActor}} for the other NameNode remains 
alive.  The DataNode process is then in a half-alive state where it only 
heartbeats and sends block reports to one of the NameNodes.  This could cause a 
loss of storage capacity after an HA failover.  The DataNode process would have 
to be restarted to resolve this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7635) Remove TestCorruptFilesJsp from branch-2.

2015-01-16 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-7635:
---

 Summary: Remove TestCorruptFilesJsp from branch-2.
 Key: HDFS-7635
 URL: https://issues.apache.org/jira/browse/HDFS-7635
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


HDFS-6252 removed corrupt_files.jsp, but there is still a test suite named 
{{TestCorruptFilesJsp}} in branch-2.  The tests attempt to call 
corrupt_files.jsp and fail on an HTTP 404 error response.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7635) Remove TestCorruptFilesJsp from branch-2.

2015-01-16 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-7635.
-
   Resolution: Fixed
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed

Arpit, thank you for the quick review.  I have committed this to branch-2.

 Remove TestCorruptFilesJsp from branch-2.
 -

 Key: HDFS-7635
 URL: https://issues.apache.org/jira/browse/HDFS-7635
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Fix For: 2.7.0

 Attachments: HDFS-7635-branch-2.001.patch


 HDFS-6252 removed corrupt_files.jsp, but there is still a test suite named 
 {{TestCorruptFilesJsp}} in branch-2.  The tests attempt to call 
 corrupt_files.jsp and fail on an HTTP 404 error response.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7636) Support sorting by columns in the Datanode Information tables of the NameNode web UI.

2015-01-16 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-7636:
---

 Summary: Support sorting by columns in the Datanode Information 
tables of the NameNode web UI.
 Key: HDFS-7636
 URL: https://issues.apache.org/jira/browse/HDFS-7636
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Chris Nauroth
Assignee: Chris Nauroth


During discussion of HDFS-7604, we mentioned that it would be nice to be able 
to sort the Datanode Information tables by count of Failed Volumes.  This issue 
proposes to implement sorting by column in these tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7636) Support sorting by columns in the Datanode Information tables of the NameNode web UI.

2015-01-16 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-7636.
-
Resolution: Duplicate

Oops, this is a duplicate of HDFS-6407.  Resolving.

 Support sorting by columns in the Datanode Information tables of the NameNode 
 web UI.
 -

 Key: HDFS-7636
 URL: https://issues.apache.org/jira/browse/HDFS-7636
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Chris Nauroth
Assignee: Chris Nauroth

 During discussion of HDFS-7604, we mentioned that it would be nice to be able 
 to sort the Datanode Information tables by count of Failed Volumes.  This 
 issue proposes to implement sorting by column in these tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7632) MiniDFSCluster configures DataNode data directories incorrectly if using more than 1 DataNode and more than 2 storage locations per DataNode.

2015-01-15 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-7632:
---

 Summary: MiniDFSCluster configures DataNode data directories 
incorrectly if using more than 1 DataNode and more than 2 storage locations per 
DataNode.
 Key: HDFS-7632
 URL: https://issues.apache.org/jira/browse/HDFS-7632
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Chris Nauroth
Assignee: Chris Nauroth


{{MiniDFSCluster}} allows the caller to set the number of storage locations per 
DataNode.  If this number is set higher than 2, and if the cluster is 
configured with more than 1 DataNode, then the calculation of the storage 
directory paths will be incorrect.  Multiple DataNodes will attempt to use the 
same storage directory, and one of them will fail while trying to acquire the 
file lock.
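
A rough sketch of the kind of indexing that avoids the collision (an 
illustrative helper, not MiniDFSCluster itself): the directory suffix has to be 
derived from a single global (datanode, storage) index rather than a scheme 
that assumes at most two storages per node.

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper, not MiniDFSCluster itself. Directory suffixes must be
// unique across every (datanode, storage) pair; deriving the suffix from a
// single global index guarantees that, whereas a scheme that assumes at most
// two storages per node reuses suffixes once storagesPerDatanode > 2.
class StorageDirLayout {
  static List<File> dirsFor(File base, int dnIndex, int storagesPerDatanode) {
    List<File> dirs = new ArrayList<File>();
    for (int j = 0; j < storagesPerDatanode; j++) {
      int globalIndex = dnIndex * storagesPerDatanode + j + 1;  // 1-based suffix
      dirs.add(new File(base, "data" + globalIndex));
    }
    return dirs;
  }
}
{code}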



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7604) Track and display failed DataNode storage locations in NameNode.

2015-01-12 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-7604:
---

 Summary: Track and display failed DataNode storage locations in 
NameNode.
 Key: HDFS-7604
 URL: https://issues.apache.org/jira/browse/HDFS-7604
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Reporter: Chris Nauroth
Assignee: Chris Nauroth


During heartbeats, the DataNode can report a list of its storage locations that 
have been taken out of service due to failure (such as due to a bad disk or a 
permissions problem).  The NameNode can track these failed storage locations 
and then report them in JMX and the NameNode web UI.
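
A minimal sketch of the NameNode-side bookkeeping this implies, with 
hypothetical names; the actual change would live in the existing heartbeat 
handling rather than in a standalone class like this.

{code}
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the bookkeeping described above.
class FailedStorageTracker {
  // DataNode UUID -> storage locations that node reported as failed
  private final Map<String, List<String>> failedByDatanode =
      new ConcurrentHashMap<String, List<String>>();

  // Called while processing a heartbeat that carries failed-storage info.
  void onHeartbeat(String datanodeUuid, Collection<String> failedLocations) {
    failedByDatanode.put(datanodeUuid, new ArrayList<String>(failedLocations));
  }

  // Values surfaced through JMX and the web UI.
  int totalFailedVolumes() {
    int total = 0;
    for (List<String> locations : failedByDatanode.values()) {
      total += locations.size();
    }
    return total;
  }

  List<String> failedLocationsOf(String datanodeUuid) {
    List<String> locations = failedByDatanode.get(datanodeUuid);
    return locations != null ? locations : Collections.<String>emptyList();
  }
}
{code}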



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-1213) Implement an Apache Commons VFS Driver for HDFS

2015-01-07 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-1213.
-
Resolution: Not a Problem

There is now an HDFS provider implemented in the Apache Commons VFS tree:

http://svn.apache.org/viewvc/commons/proper/vfs/trunk/core/src/main/java/org/apache/commons/vfs2/provider/hdfs/

I believe that means this jira is no longer needed, so I'm going to resolve it. 
 (Please feel free to reopen if I misunderstood.)
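
For anyone looking for a starting point, here is a small usage sketch against 
the commons-vfs2 HDFS provider linked above; the namenode host, port, and path 
are placeholders.

{code}
import org.apache.commons.vfs2.FileObject;
import org.apache.commons.vfs2.FileSystemManager;
import org.apache.commons.vfs2.VFS;

public class VfsHdfsExample {
  public static void main(String[] args) throws Exception {
    FileSystemManager manager = VFS.getManager();
    // Placeholder namenode address and path.
    FileObject file = manager.resolveFile("hdfs://namenode.example.com:8020/user/demo/input.txt");
    System.out.println("exists: " + file.exists());
    System.out.println("size:   " + file.getContent().getSize());
    file.close();
  }
}
{code}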

 Implement an Apache Commons VFS Driver for HDFS
 ---

 Key: HDFS-1213
 URL: https://issues.apache.org/jira/browse/HDFS-1213
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Reporter: Michael D'Amour
 Attachments: HADOOP-HDFS-Apache-VFS.patch, 
 pentaho-hdfs-vfs-TRUNK-SNAPSHOT-sources.tar.gz, 
 pentaho-hdfs-vfs-TRUNK-SNAPSHOT.jar


 We have an open source ETL tool (Kettle) which uses VFS for many input/output 
 steps/jobs.  We would like to be able to read/write HDFS from Kettle using 
 VFS.  
  
 I haven't been able to find anything out there other than it would be nice.
  
 I had some time a few weeks ago to begin writing a VFS driver for HDFS and we 
 (Pentaho) would like to be able to contribute this driver.  I believe it 
 supports all the major file/folder operations and I have written unit tests 
 for all of these operations.  The code is currently checked into an open 
 Pentaho SVN repository under the Apache 2.0 license.  There are some current 
 limitations, such as a lack of authentication (Kerberos), which appears to be 
 coming in 0.22.0; the driver supports username/password, but I just 
 can't use them yet.
 I will be attaching the code for the driver once the case is created.  The 
 project does not modify existing hadoop/hdfs source.
 Our JIRA case can be found at http://jira.pentaho.com/browse/PDI-4146



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7591) hdfs classpath command should support same options as hadoop classpath.

2015-01-07 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-7591:
---

 Summary: hdfs classpath command should support same options as 
hadoop classpath.
 Key: HDFS-7591
 URL: https://issues.apache.org/jira/browse/HDFS-7591
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: scripts
Reporter: Chris Nauroth






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7522) TestDecommission#testIncludeByRegistrationName sometimes timeout

2014-12-18 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-7522.
-
Resolution: Duplicate

 TestDecommission#testIncludeByRegistrationName sometimes timeout
 

 Key: HDFS-7522
 URL: https://issues.apache.org/jira/browse/HDFS-7522
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor

 From 
 https://builds.apache.org/job/Hadoop-hdfs-trunk/lastCompletedBuild/testReport/org.apache.hadoop.hdfs/TestDecommission/testIncludeByRegistrationName/
  :
 {code}
 java.lang.Exception: test timed out after 36 milliseconds
   at java.lang.Thread.sleep(Native Method)
   at 
 org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957)
 {code}
 In the test output, the following repeated over 300 times:
 {code}
 2014-12-13 18:44:29,910 ERROR datanode.DataNode 
 (BPServiceActor.java:run(836)) - Initialization failed for Block pool 
 BP-408261154-67.195.81.152-1418496249312 (Datanode Uuid null) service to 
 localhost/127.0.0.1:38607 Datanode denied communication with namenode because 
 the host is not in the include-list: DatanodeRegistration(127.0.0.1, 
 datanodeUuid=755318af-3336-462d-9bd7-2a7b966ee4f4, infoPort=45707, 
 infoSecurePort=0, ipcPort=46621, 
 storageInfo=lv=-56;cid=testClusterID;nsid=1154823031;c=0)
   at 
 org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196)
   at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92)
   at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121)
 2014-12-13 18:44:30,871 INFO  hdfs.TestDecommission 
 (TestDecommission.java:testIncludeByRegistrationName(956)) - Waiting for 
 datanode to come back
 2014-12-13 18:44:31,873 INFO  hdfs.TestDecommission 
 (TestDecommission.java:testIncludeByRegistrationName(956)) - Waiting for 
 datanode to come back
 2014-12-13 18:44:32,874 INFO  hdfs.TestDecommission 
 (TestDecommission.java:testIncludeByRegistrationName(956)) - Waiting for 
 datanode to come back
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7212) Huge number of BLOCKED threads rendering DataNodes useless

2014-12-11 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-7212.
-
Resolution: Duplicate

 Huge number of BLOCKED threads rendering DataNodes useless
 --

 Key: HDFS-7212
 URL: https://issues.apache.org/jira/browse/HDFS-7212
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.4.0
 Environment: PROD
Reporter: Istvan Szukacs

 There are 3000 - 8000 threads in each datanode JVM, blocking the entire VM 
 and rendering the service unusable, missing heartbeats and stopping data 
 access. The threads look like this:
 {code}
 3415 (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may 
 be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, 
 line=186 (Compiled frame)
 - 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() 
 @bci=1, line=834 (Interpreted frame)
 - 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node,
  int) @bci=67, line=867 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) @bci=17, 
 line=1197 (Interpreted frame)
 - java.util.concurrent.locks.ReentrantLock$NonfairSync.lock() @bci=21, 
 line=214 (Compiled frame)
 - java.util.concurrent.locks.ReentrantLock.lock() @bci=4, line=290 (Compiled 
 frame)
 - 
 org.apache.hadoop.net.unix.DomainSocketWatcher.add(org.apache.hadoop.net.unix.DomainSocket,
  org.apache.hadoop.net.unix.DomainSocketWatcher$Handler) @bci=4, line=286 
 (Interpreted frame)
 - 
 org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(java.lang.String,
  org.apache.hadoop.net.unix.DomainSocket) @bci=169, line=283 (Interpreted 
 frame)
 - 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(java.lang.String)
  @bci=212, line=413 (Interpreted frame)
 - 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(java.io.DataInputStream)
  @bci=13, line=172 (Interpreted frame)
 - 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(org.apache.hadoop.hdfs.protocol.datatransfer.Op)
  @bci=149, line=92 (Compiled frame)
 - org.apache.hadoop.hdfs.server.datanode.DataXceiver.run() @bci=510, line=232 
 (Compiled frame)
 - java.lang.Thread.run() @bci=11, line=744 (Interpreted frame)
 {code}
 Has anybody seen this before?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7505) Old hdfs .jsp pages need to be removed due to a security risk

2014-12-10 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-7505.
-
Resolution: Duplicate

 Old hdfs .jsp pages need to be removed due to a security risk
 -

 Key: HDFS-7505
 URL: https://issues.apache.org/jira/browse/HDFS-7505
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0, 2.4.1
Reporter: Michael Segel 
Priority: Critical

 During a penetration test, by manually entering the URL for 
 dfshealth.jsp, it's possible to circumvent security on the cluster. 
 The issue was found in Hortonworks 2.1 but it is believed to exist in all of 
 the Apache based distributions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7477) Replace ACLException with AccessControlException

2014-12-05 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-7477.
-
Resolution: Not a Problem

{{AclException}} is used to indicate an attempt to set an invalid ACL.  You'll 
see it thrown from places like {{AclTransformation}} and {{AclStorage}} that 
are involved in calculating a new ACL and setting it on an inode.  This 
exception is never used to indicate access denied.

{{FSPermissionChecker}} always throws {{AccessControlException}} to indicate 
access denied.  In the presence of an ACL, the exception comes from 
{{FSPermissionChecker#checkAccessAcl}}.  This correctly triggers an audit log 
entry.

Also, if we consider the ACL mutation operations like {{setAcl}}, they check 
{{FSPermissionChecker}} first before going into {{AclTransformation}} and 
{{AclStorage}}.  That means the {{AccessControlException}} would get thrown 
before any potential {{AclException}} is thrown, so again, we have correct 
audit logging behavior for those operations.

I don't believe there is anything to be done here, so I'm resolving this as Not 
a Problem.  Please feel free to reopen if you think I've misunderstood 
something and we do in fact have a bug.  Thanks!
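
To make the ordering concrete, here is a schematic sketch with made-up names 
(not the FSNamesystem code): the permission check, which throws 
{{AccessControlException}} and drives the audit log, runs before the ACL 
validation step that can throw {{AclException}}.

{code}
import java.util.List;

// Made-up exception types stand in for AccessControlException and AclException;
// the only point is the ordering of the two checks.
class AclCheckOrderingSketch {
  static class AccessDenied extends Exception {}
  static class InvalidAcl extends Exception {}

  void setAcl(String src, List<String> aclSpec, boolean callerHasPermission)
      throws AccessDenied, InvalidAcl {
    // 1. Permission check first: a denial is audited and surfaced as AccessDenied.
    if (!callerHasPermission) {
      audit(false, "setAcl", src);
      throw new AccessDenied();
    }
    // 2. Only then is the new ACL validated; a failure here means "bad ACL",
    //    never "access denied", so it is not an audit-log event.
    if (aclSpec.isEmpty()) {
      throw new InvalidAcl();
    }
    audit(true, "setAcl", src);
  }

  private void audit(boolean allowed, String cmd, String src) {
    System.out.println((allowed ? "allowed" : "denied") + " cmd=" + cmd + " src=" + src);
  }
}
{code}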

 Replace ACLException with AccessControlException
 

 Key: HDFS-7477
 URL: https://issues.apache.org/jira/browse/HDFS-7477
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Li Lu

 Currently many functions log to the audit log during failures only when 
 {{AccessControlException}} is thrown; thus no audit log entries are written if 
 {{AclException}} is thrown when the ACLs deny access.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-4552) For Hadoop 2.0.3; setting CLASSPATH=$(hadoop classpath) does not work, as opposed to 1.x versions

2014-12-02 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-4552.
-
Resolution: Duplicate

 For Hadoop 2.0.3; setting CLASSPATH=$(hadoop classpath) does not work, as 
 opposed to 1.x versions
 -

 Key: HDFS-4552
 URL: https://issues.apache.org/jira/browse/HDFS-4552
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs
Affects Versions: 2.0.3-alpha
 Environment: Ubuntu 12.04 32 bit, java version 1.7.0_03
 c++ application
Reporter: Shubhangi Garg

 I am writing an application in C++, which uses the API provided by libhdfs to 
 manipulate Hadoop DFS.
 I could run the application with 1.0.4 and 1.1.1, setting the classpath equal to 
 $(hadoop classpath).
 For Hadoop 2.0.3, setting CLASSPATH=$(hadoop classpath) does not load the 
 necessary classes required for libhdfs, as opposed to 1.x versions, giving the 
 following error:
 loadFileSystems error:
 (unable to get stack trace for java.lang.NoClassDefFoundError exception: 
 ExceptionUtils::getStackTrace error.)
 hdfsBuilderConnect(forceNewInstance=0, nn=default, port=0, 
 kerbTicketCachePath=(NULL), userName=(NULL)) error:
 (unable to get stack trace for java.lang.NoClassDefFoundError exception: 
 ExceptionUtils::getStackTrace error.)
 I tried loading the jar files with their full path specified (as opposed to 
 wildcard characters used in the classpath); and the application runs, but 
 gives the following warning:
 SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder.
 SLF4J: Defaulting to no-operation (NOP) logger implementation
 SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
 details.
 13/03/04 11:17:23 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7425) NameNode block deletion logging uses incorrect appender.

2014-11-21 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-7425:
---

 Summary: NameNode block deletion logging uses incorrect appender.
 Key: HDFS-7425
 URL: https://issues.apache.org/jira/browse/HDFS-7425
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth


The NameNode uses 2 separate Log4J appenders for tracking state changes.  The 
appenders are named org.apache.hadoop.hdfs.StateChange and 
BlockStateChange.  The intention of BlockStateChange is to separate more 
verbose block state change logging and allow it to be configured separately.  
In branch-2, there is some block state change logging that incorrectly goes to 
the org.apache.hadoop.hdfs.StateChange appender though.  The bug is not 
present in trunk.
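
As an illustration of the routing involved (a hedged sketch; the field names 
and messages below are examples, not the patch itself), the two log categories 
are simply loggers looked up by name, which is what lets them be configured 
independently:

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Illustrative only: field names and messages are examples, not the patch.
class StateChangeLogging {
  // Two loggers looked up by name, so their levels and appenders can be
  // configured independently in log4j.properties.
  static final Log stateChangeLog = LogFactory.getLog("org.apache.hadoop.hdfs.StateChange");
  static final Log blockStateChangeLog = LogFactory.getLog("BlockStateChange");

  void example(String blockName) {
    // Verbose per-block messages belong on the block logger...
    blockStateChangeLog.debug("BLOCK* addToInvalidates: " + blockName);
    // ...while namespace-level state changes go to the general one.
    stateChangeLog.info("DIR* completeFile: example");
  }
}
{code}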



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7425) NameNode block deletion logging uses incorrect appender.

2014-11-21 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-7425.
-
   Resolution: Fixed
Fix Version/s: 2.6.1
 Hadoop Flags: Reviewed

I committed this to branch-2 and branch-2.6.  Haohui, thank you for the code 
review.

 NameNode block deletion logging uses incorrect appender.
 

 Key: HDFS-7425
 URL: https://issues.apache.org/jira/browse/HDFS-7425
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Fix For: 2.6.1

 Attachments: HDFS-7425-branch-2.1.patch


 The NameNode uses 2 separate Log4J appenders for tracking state changes.  The 
 appenders are named org.apache.hadoop.hdfs.StateChange and 
 BlockStateChange.  The intention of BlockStateChange is to separate more 
 verbose block state change logging and allow it to be configured separately.  
 In branch-2, there is some block state change logging that incorrectly goes 
 to the org.apache.hadoop.hdfs.StateChange appender though.  The bug is not 
 present in trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7177) Add an option to include minimal ACL in getAclStatus return

2014-11-14 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-7177.
-
Resolution: Duplicate

 Add an option to include minimal ACL in getAclStatus return
 ---

 Key: HDFS-7177
 URL: https://issues.apache.org/jira/browse/HDFS-7177
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor

 Currently the 3 minimal ACL entries are not included in the returned value of 
 getAclStatus. {{FsShell}} gets them separately ({{FsPermission perm = 
 item.stat.getPermission();}}). It'd be useful to make it optional to include 
 them, so that external programs can get a complete view of the permissions.
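
A sketch of the extra step callers currently need, combining {{getAclStatus}} 
output with the permission bits from {{getFileStatus}}; these are standard 
FileSystem calls, and the path is a placeholder.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclStatus;
import org.apache.hadoop.fs.permission.FsPermission;

public class AclView {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/user/demo/file.txt");

    FileStatus stat = fs.getFileStatus(path);
    FsPermission perm = stat.getPermission();  // minimal owner/group/other bits
    AclStatus acl = fs.getAclStatus(path);     // extended ACL entries only

    System.out.println("permission bits: " + perm);
    for (AclEntry entry : acl.getEntries()) {
      System.out.println("acl entry: " + entry);
    }
  }
}
{code}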



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-6711) FSNamesystem#getAclStatus does not write to the audit log.

2014-11-14 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-6711.
-
Resolution: Duplicate

 FSNamesystem#getAclStatus does not write to the audit log.
 --

 Key: HDFS-6711
 URL: https://issues.apache.org/jira/browse/HDFS-6711
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Priority: Minor

 Consider writing an event to the audit log for the {{getAclStatus}} method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-3806) Assertion failed in TestStandbyCheckpoints.testBothNodesInStandbyState

2014-11-14 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-3806.
-
Resolution: Duplicate

I'm resolving this as duplicate of HDFS-3519.

 Assertion failed in TestStandbyCheckpoints.testBothNodesInStandbyState
 --

 Key: HDFS-3806
 URL: https://issues.apache.org/jira/browse/HDFS-3806
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
 Environment: Jenkins
Reporter: Trevor Robinson
Priority: Minor

 Failed in Jenkins build for unrelated issue (HDFS-3804): 
 https://builds.apache.org/job/PreCommit-HDFS-Build/3011/testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestStandbyCheckpoints/testBothNodesInStandbyState/
 {noformat}
 java.lang.AssertionError: Expected non-empty 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/fsimage_012
   at org.junit.Assert.fail(Assert.java:91)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImageTestUtil.assertNNHasCheckpoints(FSImageTestUtil.java:467)
   at 
 org.apache.hadoop.hdfs.server.namenode.ha.HATestUtil.waitForCheckpoint(HATestUtil.java:213)
   at 
 org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints.testBothNodesInStandbyState(TestStandbyCheckpoints.java:133)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

