[jira] [Created] (HDFS-13123) RBF: Add a balancer tool to move data across subclusters

2018-02-07 Thread Wei Yan (JIRA)
Wei Yan created HDFS-13123:
--

 Summary: RBF: Add a balancer tool to move data across subclusters 
 Key: HDFS-13123
 URL: https://issues.apache.org/jira/browse/HDFS-13123
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan


Following the discussion in HDFS-12615, this Jira tracks the effort to build 
a rebalancer tool, used by router-based federation to move data among 
subclusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-13122) FSImage should not update quota counts on ObserverNode

2018-02-07 Thread Erik Krogen (JIRA)
Erik Krogen created HDFS-13122:
--

 Summary: FSImage should not update quota counts on ObserverNode
 Key: HDFS-13122
 URL: https://issues.apache.org/jira/browse/HDFS-13122
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs, namenode
Reporter: Erik Krogen
Assignee: Erik Krogen


Currently in {{FSImage#loadEdits()}}, after applying a set of edits, we call
{code}
updateCountForQuota(target.getBlockManager().getStoragePolicySuite(), 
target.dir.rootDir);
{code}
to update the quota counts for the entire namespace, which can be very 
expensive. This makes sense if we are about to become the ANN, since we need 
valid quotas, but not on an ObserverNode which does not need to enforce quotas.

This is related to increasing the frequency with which the SbNN can tail edits 
from the ANN to decrease the lag time for transactions to appear on the 
Observer.
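A minimal sketch of the proposed guard (the NameNodeRole enum and the simplified loadEdits() below are illustrative assumptions, not the actual FSImage API):

```java
public class QuotaUpdateSketch {
    enum NameNodeRole { ACTIVE, STANDBY, OBSERVER }

    static int quotaRecomputations = 0;

    // Stand-in for the expensive updateCountForQuota() walk over the
    // entire namespace.
    static void updateCountForQuota() {
        quotaRecomputations++;
    }

    // After applying a batch of edits, only recompute quota counts on
    // nodes that may become active and therefore must enforce quotas.
    static void loadEdits(NameNodeRole role) {
        // ... apply edits to the namespace ...
        if (role != NameNodeRole.OBSERVER) {
            updateCountForQuota();
        }
    }

    public static void main(String[] args) {
        loadEdits(NameNodeRole.STANDBY);   // recomputes quotas
        loadEdits(NameNodeRole.OBSERVER);  // skips the recomputation
        if (quotaRecomputations != 1) {
            throw new AssertionError("observer should skip quota update");
        }
        System.out.println("quota recomputations: " + quotaRecomputations);
    }
}
```

The point is only that the skip decision can hinge on the node's role; how the real FSImage learns that role is a separate question.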






[jira] [Created] (HDFS-13121) NPE when request file descriptors when SC read

2018-02-07 Thread Gang Xie (JIRA)
Gang Xie created HDFS-13121:
---

 Summary: NPE when request file descriptors when SC read
 Key: HDFS-13121
 URL: https://issues.apache.org/jira/browse/HDFS-13121
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Gang Xie


Recently, we hit an issue where the DFSClient throws an NPE. In this case, the 
application process exceeded its limit on the maximum number of open files. 
When that happens, libhadoop never throws an exception but returns null for 
the fd request; requestFileDescriptors then uses the returned fds directly 
without any check, resulting in an NPE.

We need to add a null sanity check here.

 

{code}
private ShortCircuitReplicaInfo requestFileDescriptors(DomainPeer peer,
    Slot slot) throws IOException {
  ShortCircuitCache cache = clientContext.getShortCircuitCache();
  final DataOutputStream out =
      new DataOutputStream(new BufferedOutputStream(peer.getOutputStream()));
  SlotId slotId = slot == null ? null : slot.getSlotId();
  new Sender(out).requestShortCircuitFds(block, token, slotId, 1,
      failureInjector.getSupportsReceiptVerification());
  DataInputStream in = new DataInputStream(peer.getInputStream());
  BlockOpResponseProto resp = BlockOpResponseProto.parseFrom(
      PBHelperClient.vintPrefixed(in));
  DomainSocket sock = peer.getDomainSocket();
  failureInjector.injectRequestFileDescriptorsFailure();
  switch (resp.getStatus()) {
  case SUCCESS:
    byte buf[] = new byte[1];
    FileInputStream[] fis = new FileInputStream[2];
    sock.recvFileInputStreams(fis, buf, 0, buf.length);  // fds may be null here
    ShortCircuitReplica replica = null;
    try {
      ExtendedBlockId key =
          new ExtendedBlockId(block.getBlockId(), block.getBlockPoolId());
      if (buf[0] == USE_RECEIPT_VERIFICATION.getNumber()) {
        LOG.trace("Sending receipt verification byte for slot {}", slot);
        sock.getOutputStream().write(0);
      }
      replica = new ShortCircuitReplica(key, fis[0], fis[1],  // NPE if fds null
          cache, Time.monotonicNow(), slot);
{code}
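A hedged sketch of the proposed sanity check (the class and method shapes below are simplified stand-ins for the DFSClient internals, not the real signatures):

```java
import java.io.FileInputStream;
import java.io.IOException;

public class FdSanityCheckSketch {
    // Stand-in for sock.recvFileInputStreams(): when the process has hit
    // its open-file limit, libhadoop returns null instead of throwing.
    static FileInputStream[] recvFileInputStreams(boolean fdLimitHit) {
        return fdLimitHit ? null : new FileInputStream[2];
    }

    // Proposed fix: check for null before dereferencing the fds, and
    // surface a descriptive IOException instead of an NPE.
    static FileInputStream[] requestFileDescriptors(boolean fdLimitHit)
            throws IOException {
        FileInputStream[] fis = recvFileInputStreams(fdLimitHit);
        if (fis == null) {
            throw new IOException("Failed to receive file descriptors"
                + " (process open-file limit reached?)");
        }
        return fis;
    }

    public static void main(String[] args) throws IOException {
        try {
            requestFileDescriptors(true);
            throw new AssertionError("expected IOException for null fds");
        } catch (IOException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
        // Happy path: fds received, no exception.
        requestFileDescriptors(false);
    }
}
```

An IOException here lets the existing retry/fallback paths in the client handle the failure, rather than crashing on an unexplained NPE.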






Re: [DISCUSS] Meetup for HDFS tests and build infra

2018-02-07 Thread Chris Douglas
Created a poll [1] to inform scheduling. -C

[1]: https://doodle.com/poll/r22znitzae9apfbf

On Tue, Feb 6, 2018 at 3:09 PM, Chris Douglas  wrote:
> The HDFS build is not healthy. Many of the unit tests aren't actually
> run in Jenkins due to resource exhaustion, haven't been updated since
> build/test/data was the test temp dir, or are chronically unstable
> (I'm looking at you, TestDFSStripedOutputStreamWithFailure). The
> situation has deteriorated slowly, but we can't confidently merge
> patches, let alone significant features, when our CI infra is in this
> state.
>
> How would folks feel about a half to full-day meetup to work through
> patches improving this, specifically? We can improve tests,
> troubleshoot the build, and rev/commit existing patches. It would
> require some preparation, so the simultaneous attention is productive
> and not a coordination bottleneck. I started a wiki page for this [1],
> please add to it.
>
> If enough people can make time for this, say in 2-3 weeks, the project
> would certainly benefit. -C
>
> [1]: https://s.apache.org/ng3C




[jira] [Created] (HDFS-13120) Snapshot diff could be corrupted after concat

2018-02-07 Thread Xiaoyu Yao (JIRA)
Xiaoyu Yao created HDFS-13120:
-

 Summary: Snapshot diff could be corrupted after concat
 Key: HDFS-13120
 URL: https://issues.apache.org/jira/browse/HDFS-13120
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, snapshots
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao


The snapshot diff can be corrupted after concatenating files. This can lead to 
an AssertionError upon later DeleteSnapshot and getSnapshotDiff operations. 

For example, we have seen customers hit a stack trace similar to the one below, 
but while loading an edit entry of DeleteSnapshotOp. After investigation, we 
found this is a regression caused by HDFS-3689, where the snapshot diff is not 
fully cleaned up after concat. 

I will post a unit test to reproduce this, and a fix, shortly.

{code}
org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Element 
already exists: element=0.txt, CREATED=[0.txt, 1.txt, 2.txt]
at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:196)
at org.apache.hadoop.hdfs.util.Diff.create(Diff.java:216)
at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:463)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:205)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff.combinePosteriorAndCollectBlocks(DirectoryWithSnapshotFeature.java:162)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.AbstractINodeDiffList.deleteSnapshotDiff(AbstractINodeDiffList.java:100)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:728)
at 
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:830)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:237)
at 
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:292)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:321)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.deleteSnapshot(FSDirSnapshotOp.java:249)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteSnapshot(FSNamesystem.java:6566)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.deleteSnapshot(NameNodeRpcServer.java:1823)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.deleteSnapshot(ClientNamenodeProtocolServerSideTranslatorPB.java:1200)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1007)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:873)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:819)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2679)

{code} 








[jira] [Created] (HDFS-13119) RBF: manage unavailable clusters

2018-02-07 Thread Íñigo Goiri (JIRA)
Íñigo Goiri created HDFS-13119:
--

 Summary: RBF: manage unavailable clusters
 Key: HDFS-13119
 URL: https://issues.apache.org/jira/browse/HDFS-13119
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Íñigo Goiri


When a federated cluster has one of its subclusters down, operations that run 
in every subcluster ({{RouterRpcClient#invokeAll()}}) may consume all the RPC 
connections.
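One possible mitigation, sketched below purely for illustration (this is not how RouterRpcClient currently works), is to cap concurrent RPCs per subcluster so an unresponsive subcluster cannot exhaust the shared connection pool:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class SubclusterRpcLimiterSketch {
    private final Semaphore permits;

    SubclusterRpcLimiterSketch(int maxConcurrentRpcs) {
        this.permits = new Semaphore(maxConcurrentRpcs);
    }

    // Try to reserve a connection slot for this subcluster; give up
    // quickly instead of queueing behind an unavailable subcluster.
    boolean tryInvoke(Runnable rpc, long timeoutMs) throws InterruptedException {
        if (!permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false;  // subcluster likely unavailable; fail fast
        }
        try {
            rpc.run();
            return true;
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SubclusterRpcLimiterSketch limiter = new SubclusterRpcLimiterSketch(1);
        // Simulate an RPC stuck against a dead subcluster holding the permit...
        limiter.permits.acquire();
        // ...so a second caller fails fast instead of blocking indefinitely.
        boolean ok = limiter.tryInvoke(() -> {}, 10);
        if (ok) {
            throw new AssertionError("expected fast failure");
        }
        System.out.println("fast-failed while subcluster unavailable");
    }
}
```

The key property is that slots held against a dead subcluster are bounded, leaving the remaining connections free for healthy subclusters.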






Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2018-02-07 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/129/

[Feb 6, 2018 8:04:52 PM] (billie) YARN-7890. NPE during container relaunch. 
Contributed by Jason Lowe
[Feb 6, 2018 9:36:32 PM] (kihwal) HADOOP-15212. Add independent secret manager 
method for logging expired




-1 overall


The following subsystems voted -1:
unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Unreaped Processes :

   hadoop-common:1 
   hadoop-hdfs:22 
   bkjournal:7 
   hadoop-mapreduce-client-jobclient:13 
   hadoop-archives:1 
   hadoop-distcp:6 
   hadoop-extras:1 
   hadoop-gridmix:1 
   hadoop-yarn-applications-distributedshell:1 
   hadoop-yarn-client:6 
   hadoop-yarn-server-timelineservice:1 

Failed junit tests :

   hadoop.fs.http.server.TestHttpFSServerNoACLs 
   hadoop.ipc.TestMRCJCSocketFactory 
   hadoop.mapred.TestClusterMRNotification 
   hadoop.tools.TestIntegration 
   hadoop.tools.util.TestProducerConsumer 
   hadoop.tools.TestDistCpViewFs 
   hadoop.resourceestimator.solver.impl.TestLpSolver 
   hadoop.resourceestimator.service.TestResourceEstimatorService 
   hadoop.yarn.sls.appmaster.TestAMSimulator 
   hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime 
   hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication 

Timed out junit tests :

   org.apache.hadoop.log.TestLogLevel 
   org.apache.hadoop.hdfs.TestWriteRead 
   org.apache.hadoop.hdfs.TestDatanodeRegistration 
   org.apache.hadoop.hdfs.TestReservedRawPaths 
   org.apache.hadoop.hdfs.TestAclsEndToEnd 
   org.apache.hadoop.hdfs.TestFileCreation 
   org.apache.hadoop.hdfs.TestDatanodeDeath 
   org.apache.hadoop.hdfs.TestSafeMode 
   org.apache.hadoop.hdfs.TestBlockMissingException 
   org.apache.hadoop.hdfs.TestDFSClientRetries 
   org.apache.hadoop.hdfs.TestFileAppend2 
   org.apache.hadoop.hdfs.TestFileCorruption 
   org.apache.hadoop.hdfs.TestFileCreationDelete 
   org.apache.hadoop.hdfs.TestDFSAddressConfig 
   org.apache.hadoop.hdfs.TestSeekBug 
   org.apache.hadoop.hdfs.TestDFSInputStream 
   org.apache.hadoop.hdfs.TestRestartDFS 
   org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache 
   org.apache.hadoop.hdfs.TestDFSClientSocketSize 
   org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead 
   org.apache.hadoop.hdfs.TestDFSRollback 
   org.apache.hadoop.hdfs.TestDFSClientExcludedNodes 
   org.apache.hadoop.hdfs.TestAbandonBlock 
   org.apache.hadoop.contrib.bkjournal.TestBootstrapStandbyWithBKJM 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperJournalManager 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperAsHASharedDir 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperEditLogStreams 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperSpeculativeRead 
   org.apache.hadoop.contrib.bkjournal.TestCurrentInprogress 
   org.apache.hadoop.mapred.lib.TestDelegatingInputFormat 
   org.apache.hadoop.mapred.TestMRCJCFileInputFormat 
   org.apache.hadoop.mapred.TestClusterMapReduceTestCase 
   org.apache.hadoop.mapred.TestMRIntermediateDataEncryption 
   org.apache.hadoop.mapred.TestJobSysDirWithDFS 
   org.apache.hadoop.mapred.TestMRTimelineEventHandling 
   org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath 
   org.apache.hadoop.mapred.TestNetworkedJob 
   org.apache.hadoop.mapred.TestMiniMRClientCluster 
   org.apache.hadoop.mapred.TestReduceFetchFromPartialMem 
   org.apache.hadoop.mapred.TestReduceFetch 
   org.apache.hadoop.mapred.TestMROpportunisticMaps 
   org.apache.hadoop.tools.TestHadoopArchives 
   org.apache.hadoop.tools.TestDistCpWithAcls 
   org.apache.hadoop.tools.TestDistCpSync 
   org.apache.hadoop.tools.TestDistCpWithXAttrs 
   org.apache.hadoop.tools.TestDistCpSyncReverseFromTarget 
   org.apache.hadoop.tools.TestDistCpSystem 
   org.apache.hadoop.tools.TestDistCpSyncReverseFromSource 
   org.apache.hadoop.tools.TestCopyFiles 
   org.apache.hadoop.mapred.gridmix.TestSleepJob 
   org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell 
   org.apache.hadoop.yarn.client.api.impl.TestAMRMProxy 
   org.apache.hadoop.yarn.client.TestRMFailover 
   org.apache.hadoop.yarn.client.cli.TestYarnCLI 
   org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA 
   org.apache.hadoop.yarn.client.api.impl.TestYarnClientWithReservation 
   

[jira] [Resolved] (HDFS-13105) Make hadoop proxy user changes reconfigurable in Datanode

2018-02-07 Thread Mukul Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh resolved HDFS-13105.
--
Resolution: Not A Problem

As pointed out by [~kihwal] & [~rajive], -refreshSuperUserGroupsConfiguration 
provides a method to update proxy user information on the NN.

> Make hadoop proxy user changes reconfigurable in Datanode
> -
>
> Key: HDFS-13105
> URL: https://issues.apache.org/jira/browse/HDFS-13105
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>
> Currently, any change to add or delete a proxy user requires a DN restart, 
> incurring downtime. This jira proposes to make the proxy user configuration 
> reconfigurable via the ReconfigurationProtocol, so that the changes can take 
> effect without a DN restart. For details, please refer to 
> https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/Superusers.html.






[jira] [Reopened] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-07 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota reopened HDFS-11187:
---
  Assignee: Gabor Bota  (was: Wei-Chiu Chuang)

Reopening this to add the change to branch-2

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11187.001.patch, HDFS-11187.002.patch, 
> HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk, while holding the FsDatasetImpl lock, 
> for every reader. It is possible to optimize this by keeping an up-to-date 
> copy of the last partial chunk checksum in memory, reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.
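A rough sketch of the idea (the field and method names below are assumptions for illustration, not the actual replica API): the writer path keeps an in-memory copy of the last partial chunk checksum up to date, so the reader path avoids a disk read while holding the lock.

```java
import java.util.Arrays;

public class PartialChunkChecksumSketch {
    // In-memory copy of the last partial chunk checksum, refreshed by the
    // writer path whenever the replica's tail changes.
    private volatile byte[] lastPartialChunkChecksum;
    private int diskReads = 0;

    // Stand-in for reading the tail of the on-disk meta file.
    private byte[] readChecksumFromDisk() {
        diskReads++;
        return new byte[] {1, 2, 3, 4};
    }

    // Writer path: keep the cached copy current.
    void onTailUpdated(byte[] checksum) {
        lastPartialChunkChecksum = checksum.clone();
    }

    // Reader path: serve from memory; fall back to disk only when the
    // cache has not been populated yet.
    byte[] getLastPartialChunkChecksum() {
        byte[] cached = lastPartialChunkChecksum;
        return cached != null ? cached.clone() : readChecksumFromDisk();
    }

    public static void main(String[] args) {
        PartialChunkChecksumSketch r = new PartialChunkChecksumSketch();
        r.getLastPartialChunkChecksum();          // cache empty: hits disk
        r.onTailUpdated(new byte[] {9, 9, 9, 9}); // writer updates cache
        byte[] c = r.getLastPartialChunkChecksum();
        if (r.diskReads != 1 || !Arrays.equals(c, new byte[] {9, 9, 9, 9})) {
            throw new AssertionError("cached checksum should avoid disk read");
        }
        System.out.println("disk reads: " + r.diskReads);
    }
}
```

The hard part, as noted above, is keeping the cached value consistent with concurrent writers, which is why it warrants its own jira.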






[jira] [Created] (HDFS-13118) SnapshotDiffReport should provide the INode type

2018-02-07 Thread Ewan Higgs (JIRA)
Ewan Higgs created HDFS-13118:
-

 Summary: SnapshotDiffReport should provide the INode type
 Key: HDFS-13118
 URL: https://issues.apache.org/jira/browse/HDFS-13118
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Ewan Higgs


Currently, the snapshot diff report lists which inodes were added, removed, 
renamed, etc. But to see what an INode actually is, we need to access the 
underlying snapshot - which is cumbersome to do programmatically when the 
snapshot diff already has the information.
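A minimal sketch of what a report entry could carry (the INodeType enum and the entry shape below are hypothetical, not the existing SnapshotDiffReport API):

```java
public class DiffEntrySketch {
    enum DiffType { CREATE, DELETE, MODIFY, RENAME }
    enum INodeType { FILE, DIRECTORY, SYMLINK }

    // A diff report entry extended with the inode type, so callers need
    // not open the underlying snapshot just to learn what kind of inode
    // changed.
    static final class Entry {
        final DiffType diffType;
        final INodeType inodeType;
        final String path;

        Entry(DiffType diffType, INodeType inodeType, String path) {
            this.diffType = diffType;
            this.inodeType = inodeType;
            this.path = path;
        }

        @Override
        public String toString() {
            return diffType + " " + inodeType + " " + path;
        }
    }

    public static void main(String[] args) {
        Entry e = new Entry(DiffType.CREATE, INodeType.FILE, "/dir/0.txt");
        if (!e.toString().equals("CREATE FILE /dir/0.txt")) {
            throw new AssertionError();
        }
        System.out.println(e);
    }
}
```

Exposing the type directly in each entry would also keep the wire format self-describing for clients that never mount the snapshot.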


