from:"Konstantin Shvachko \(JIRA\)"

[jira] [Resolved] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2024-01-05 Thread Konstantin Shvachko (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-14305.

Resolution: Fixed

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Konstantin Shvachko
>Priority: Major
>  Labels: multi-sbnn
> Attachments: HDFS-14305-007.patch, HDFS-14305-008.patch, 
> HDFS-14305.001.patch, HDFS-14305.002.patch, HDFS-14305.003.patch, 
> HDFS-14305.004.patch, HDFS-14305.005.patch, HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNodes could have overlapping ranges 
> for serial numbers. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in the configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be a negative integer.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could land in a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When a collision happens, DataNodes could overwrite an existing key, which 
> will cause clients to fail with an {{InvalidToken}} error.
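
The overlap described above can be reproduced with a small standalone sketch. It is not the Hadoop code: the class and method names are made up for illustration, and {{Integer.MAX_VALUE}} is replaced by a parameter so the simplified value of 100 from the description can be used. It relies only on the fact that Java's % operator keeps the sign of the dividend.

```java
public class SerialRangeDemo {
  /** Applies the rotation formula quoted in the issue for one NameNode. */
  static int rotate(int serialNo, int maxValue, int numNNs, int nnIndex) {
    int intRange = maxValue / numNNs;
    int nnRangeStart = intRange * nnIndex;
    // Java's % keeps the sign of the dividend, so a negative serialNo
    // produces a negative remainder and the result can fall below nnRangeStart.
    return (serialNo % intRange) + nnRangeStart;
  }

  /** Returns {min, max} of the rotated value over every serialNo in [-maxValue, maxValue]. */
  static int[] range(int maxValue, int numNNs, int nnIndex) {
    int min = Integer.MAX_VALUE;
    int max = Integer.MIN_VALUE;
    for (int s = -maxValue; s <= maxValue; s++) {
      int r = rotate(s, maxValue, numNNs, nnIndex);
      min = Math.min(min, r);
      max = Math.max(max, r);
    }
    return new int[] {min, max};
  }

  public static void main(String[] args) {
    int[] nn1 = range(100, 2, 0);
    int[] nn2 = range(100, 2, 1);
    System.out.println("nn1 -> [" + nn1[0] + ", " + nn1[1] + "]"); // nn1 -> [-49, 49]
    System.out.println("nn2 -> [" + nn2[0] + ", " + nn2[1] + "]"); // nn2 -> [1, 99]
  }
}
```

Every value in [1, 49] is reachable by both NameNodes here, which is the collision window the issue describes.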



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

[jira] [Resolved] (HDFS-7612) TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir

2021-10-21 Thread Konstantin Shvachko (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-7612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-7612.
---
Fix Version/s: 3.2.4
   3.3.2
   2.10.2
   3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

I just committed this to the four active branches.
Congratulations [~mkuchenbecker]!

> TestOfflineEditsViewer.testStored() uses incorrect default value for cacheDir
> -
>
> Key: HDFS-7612
> URL: https://issues.apache.org/jira/browse/HDFS-7612
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.6.0
>Reporter: Konstantin Shvachko
>Assignee: Michael Kuchenbecker
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> final String cacheDir = System.getProperty("test.cache.data",
> "build/test/cache");
> {code}
> results in
> {{FileNotFoundException: build/test/cache/editsStoredParsed.xml (No such file 
> or directory)}}
> when {{test.cache.data}} is not set.
> I can see this failing while running in Eclipse.




[jira] [Resolved] (HDFS-16141) [FGL] Address permission related issues with File / Directory

2021-08-13 Thread Konstantin Shvachko (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-16141.

Fix Version/s: Fine-Grained Locking
 Hadoop Flags: Reviewed
   Resolution: Fixed

I just committed this to fgl branch. Thank you [~prasad-acit].

> [FGL] Address permission related issues with File / Directory
> -
>
> Key: HDFS-16141
> URL: https://issues.apache.org/jira/browse/HDFS-16141
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
>  Labels: pull-request-available
> Fix For: Fine-Grained Locking
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> After the FGL implementation (MKDIR & Create File), some existing UTs are 
> impacted and need to be addressed.
> Failed Tests:
> TestDFSPermission
> TestPermission
> TestFileCreation
> TestDFSMkdirs (Added tests)




[jira] [Resolved] (HDFS-16130) [FGL] Implement Create File with FGL

2021-07-23 Thread Konstantin Shvachko (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-16130.

Hadoop Flags: Reviewed
  Resolution: Fixed

I just committed this. Fixed a few checkstyle warnings.
Thank you [~prasad-acit].

> [FGL] Implement Create File with FGL
> 
>
> Key: HDFS-16130
> URL: https://issues.apache.org/jira/browse/HDFS-16130
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: Fine-Grained Locking
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Implement FGL for Create File.
> The Create API acquires the global lock at multiple stages. Acquire the respective 
> partitioned lock and continue the create operation.




[jira] [Resolved] (HDFS-16128) [FGL] Add support for saving/loading an FS Image for PartitionedGSet

2021-07-23 Thread Konstantin Shvachko (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-16128.

Fix Version/s: Fine-Grained Locking
 Hadoop Flags: Reviewed
   Resolution: Fixed

I just committed this. Thank you [~xinglin].

> [FGL] Add support for saving/loading an FS Image for PartitionedGSet
> 
>
> Key: HDFS-16128
> URL: https://issues.apache.org/jira/browse/HDFS-16128
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, namenode
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Major
>  Labels: pull-request-available
> Fix For: Fine-Grained Locking
>
>
> Add support to save Inodes stored in PartitionedGSet when saving an FS image 
> and load Inodes into PartitionedGSet from a saved FS image.
> h1. Saving FSImage
> *Original HDFS design*: iterate every inode in inodeMap and save them into 
> the FSImage file. 
> *FGL*: no change is needed here, since PartitionedGSet also provides an 
> iterator interface, to iterate over inodes stored in partitions. 
> h1. Loading an HDFS 
> *Original HDFS design*: it first loads the FSImage files and then loads edit 
> logs for recent changes. FSImage files contain different sections, including 
> INodeSections and INodeDirectorySections. An INodeSection contains serialized 
> Inode objects and the INodeDirectorySection contains the parent inode for an 
> Inode. When loading an FSImage, the system first loads INodeSections and then 
> loads the INodeDirectorySections, to set the parent inode for each inode. 
> After FSImage files are loaded, edit logs are then loaded. The edit log contains 
> recent changes to the filesystem, including Inode creation/deletion. For a 
> newly created INode, the parent inode is set before it is added to the 
> inodeMap.
> *FGL*: when adding an Inode into the PartitionedGSet, we need the parent 
> inode of that inode in order to determine which partition to store it in, 
> when NAMESPACE_KEY_DEPTH = 2. Thus, in FGL, when loading FSImage files, we 
> use a temporary LightweightGSet (inodeMapTemp) to store inodes. When 
> LoadFSImage is done, the parent inode for all existing inodes in FSImage 
> files is set. We can now move the inodes into a PartitionedGSet. Loading edit 
> logs can work as usual, as the parent inode for an inode is set before it is 
> added to the inodeMap. 
> In theory, PartitionedGSet can support storing inodes without their 
> parent inodes set. All these inodes will be stored in the 0th partition. However, 
> we decided to use a temporary LightweightGSet (inodeMapTemp) to store these 
> inodes, to make this case more transparent.
>  
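
The load order described above (inodes first, parents second, partitioning last) can be sketched as follows. This is an illustration under stated assumptions, not the FGL branch code: {{Node}}, {{partitionOf}} and the modulo partitioning rule are hypothetical stand-ins, and a plain {{HashMap}} plays the role of the temporary LightweightGSet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoPhaseLoadSketch {
  /** Hypothetical stand-in for an inode: an id plus a parent id that is set later. */
  static final class Node {
    final long id;
    Long parentId; // null until the directory section has been processed
    Node(long id) { this.id = id; }
  }

  /** Hypothetical partitioning rule: keyed by parent id; no parent means partition 0. */
  static int partitionOf(Node n, int numPartitions) {
    return n.parentId == null ? 0 : (int) Math.floorMod(n.parentId, (long) numPartitions);
  }

  static List<List<Node>> load(long[] inodeIds, long[][] childToParent, int numPartitions) {
    // Phase 1: inodes arrive without parents, so they go into a temporary map.
    Map<Long, Node> temp = new HashMap<>();
    for (long id : inodeIds) {
      temp.put(id, new Node(id));
    }
    // Phase 2: the directory section supplies the parent of each child.
    for (long[] link : childToParent) {
      temp.get(link[0]).parentId = link[1];
    }
    // Phase 3: parents are known now, so each inode can be placed in its partition.
    List<List<Node>> partitions = new ArrayList<>();
    for (int i = 0; i < numPartitions; i++) {
      partitions.add(new ArrayList<>());
    }
    for (Node n : temp.values()) {
      partitions.get(partitionOf(n, numPartitions)).add(n);
    }
    return partitions;
  }

  public static void main(String[] args) {
    long[] ids = {1, 2, 3, 4};
    long[][] links = {{2, 1}, {3, 1}, {4, 2}}; // {child, parent}
    List<List<Node>> parts = load(ids, links, 2);
    System.out.println(parts.get(0).size() + " " + parts.get(1).size());
  }
}
```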




[jira] [Resolved] (HDFS-16125) [FGL] Fix the iterator for PartitionedGSet

2021-07-16 Thread Konstantin Shvachko (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-16125.

Fix Version/s: Fine-Grained Locking
 Hadoop Flags: Reviewed
   Resolution: Fixed

+1 on the latest patch.
I just committed this to branch fgl, also re-based fgl to current trunk.
Thank you [~xinglin].

> [FGL] Fix the iterator for PartitionedGSet 
> ---
>
> Key: HDFS-16125
> URL: https://issues.apache.org/jira/browse/HDFS-16125
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, namenode
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: Fine-Grained Locking
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Iterator in PartitionedGSet would visit the first partition twice, since we 
> did not set the keyIterator to move to the first key during initialization.  
>  
> This is related to fgl: https://issues.apache.org/jira/browse/HDFS-14703
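
A minimal sketch of the kind of two-level iterator this issue concerns is shown below. It is not the PartitionedGSet code: the class name, the {{TreeMap}} of lists, and the {{collect}} helper are assumptions made for illustration. The point is the constructor, which takes the first partition through the key iterator so that the key cursor is already past it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeMap;

public class PartitionedIteratorSketch {
  static final class PartitionedIterator<E> implements Iterator<E> {
    private final TreeMap<String, List<E>> partitions;
    private final Iterator<String> keyIterator;
    private Iterator<E> current;

    PartitionedIterator(TreeMap<String, List<E>> partitions) {
      this.partitions = partitions;
      this.keyIterator = partitions.keySet().iterator();
      // Fetch the first partition via keyIterator.next() so the key cursor moves
      // past it. Starting from partitions.firstEntry() instead would leave
      // keyIterator on the first key, and that partition would be visited twice.
      this.current = keyIterator.hasNext()
          ? partitions.get(keyIterator.next()).iterator()
          : Collections.<E>emptyIterator();
    }

    @Override
    public boolean hasNext() {
      // Skip exhausted (or empty) partitions until an element is found.
      while (!current.hasNext() && keyIterator.hasNext()) {
        current = partitions.get(keyIterator.next()).iterator();
      }
      return current.hasNext();
    }

    @Override
    public E next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      return current.next();
    }
  }

  /** Drains the iterator into a list so the visit order can be inspected. */
  static List<Integer> collect(TreeMap<String, List<Integer>> partitions) {
    List<Integer> out = new ArrayList<>();
    Iterator<Integer> it = new PartitionedIterator<>(partitions);
    while (it.hasNext()) {
      out.add(it.next());
    }
    return out;
  }

  public static void main(String[] args) {
    TreeMap<String, List<Integer>> p = new TreeMap<>();
    p.put("a", Arrays.asList(1, 2));
    p.put("b", Arrays.asList(3));
    System.out.println(collect(p)); // [1, 2, 3]
  }
}
```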




[jira] [Created] (HDFS-16001) TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes

2021-04-28 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-16001:
--

 Summary: TestOfflineEditsViewer.testStored() fails reading 
negative value of FSEditLogOpCodes
 Key: HDFS-16001
 URL: https://issues.apache.org/jira/browse/HDFS-16001
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Reporter: Konstantin Shvachko







[jira] [Created] (HDFS-15915) Race condition with async edits logging due to updating txId outside of the namesystem log

2021-03-23 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15915:
--

 Summary: Race condition with async edits logging due to updating 
txId outside of the namesystem log
 Key: HDFS-15915
 URL: https://issues.apache.org/jira/browse/HDFS-15915
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs, namenode
Reporter: Konstantin Shvachko


{{FSEditLogAsync}} creates an {{FSEditLogOp}} and populates its fields inside 
{{FSNamesystem.writeLock}}. But one essential field, the transaction id of the 
edits op, remains unset until the time when the operation is scheduled for 
syncing. At that time {{beginTransaction()}} will set the 
{{FSEditLogOp.txid}} and increment the global transaction count. On a busy 
NameNode this event can fall outside the write lock. 
This causes problems for Observer reads. It can also potentially reshuffle 
transactions, so that Standby applies them in the wrong order.
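
One way to picture the ordering requirement is the sketch below, in which the transaction id is assigned while the write lock is still held. It is a simplified illustration and not a proposed patch: {{Op}}, {{logEdit}} and the lock field are hypothetical names, and the real edit log involves considerably more state.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TxIdUnderLockSketch {
  /** Hypothetical stand-in for an edit-log operation. */
  static final class Op {
    final String name;
    long txid = -1; // unset until assigned
    Op(String name) { this.name = name; }
  }

  private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();
  private long lastTxId = 0; // guarded by the write lock of nsLock

  /** Creates the op and assigns its txid before the write lock is released. */
  Op logEdit(String name) {
    nsLock.writeLock().lock();
    try {
      Op op = new Op(name);
      // The id is fixed here, so it reflects the order in which operations
      // were applied under the lock, not the order in which they are synced.
      op.txid = ++lastTxId;
      return op; // the op could now be handed to an asynchronous syncer
    } finally {
      nsLock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    TxIdUnderLockSketch log = new TxIdUnderLockSketch();
    System.out.println(log.logEdit("mkdir").txid);  // 1
    System.out.println(log.logEdit("create").txid); // 2
  }
}
```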




[jira] [Created] (HDFS-15849) ExpiredHeartbeats metric should be of Type.COUNTER

2021-02-22 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15849:
--

 Summary: ExpiredHeartbeats metric should be of Type.COUNTER
 Key: HDFS-15849
 URL: https://issues.apache.org/jira/browse/HDFS-15849
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: metrics
Reporter: Konstantin Shvachko


Currently the {{ExpiredHeartbeats}} metric has the default type, which makes it 
{{Type.GAUGE}}. It should be {{Type.COUNTER}} for proper graphing. See the 
discussion in HDFS-15808.




[jira] [Resolved] (HDFS-15632) AbstractContractDeleteTest should set recursive peremeter to true for recursive test cases.

2021-01-22 Thread Konstantin Shvachko (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-15632.

Fix Version/s: 3.2.3
   2.10.2
   3.1.5
   3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

I just committed this.
Thank you [~antn.kutuzov] for contributing.

> AbstractContractDeleteTest should set recursive peremeter to true for 
> recursive test cases.
> ---
>
> Key: HDFS-15632
> URL: https://issues.apache.org/jira/browse/HDFS-15632
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Anton Kutuzov
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{AbstractContractDeleteTest.testDeleteNonexistentPathRecursive()}} should 
> call {{delete(path, true)}} rather than {{false}}
> Also {{AbstractContractDeleteTest.testDeleteNonexistentPathNonRecursive()}} 
> has a wrong assert message. Should be {{"... attempting to non-recursively 
> delete ..."}}




[jira] [Resolved] (HDFS-954) There are two security packages in hdfs, should be one

2021-01-21 Thread Konstantin Shvachko (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-954.
--
Resolution: Won't Fix

Hey [~antn.kutuzov] this is a rather old jira.
I don't think it is a good idea to do repackaging at this point since it will 
make things harder to backport to older versions.
Closing as won't fix.

> There are two security packages in hdfs, should be one
> --
>
> Key: HDFS-954
> URL: https://issues.apache.org/jira/browse/HDFS-954
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Jakob Homan
>Priority: Major
>  Labels: newbie
>
> Currently the test source tree has both
> src/test/hdfs/org/apache/hadoop/hdfs/security with:
> SecurityTestUtil.java
> TestAccessToken.java
> TestClientProtocolWithDelegationToken.java
> and 
> src/test/hdfs/org/apache/hadoop/security with:
> TestDelegationToken.java
> TestGroupMappingServiceRefresh.java
> TestPermission.java
> These should be combined into one package and possibly some things moved to 
> common.




[jira] [Created] (HDFS-15751) Add documentation for msync() API to filesystem.md

2020-12-24 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15751:
--

 Summary: Add documentation for msync() API to filesystem.md
 Key: HDFS-15751
 URL: https://issues.apache.org/jira/browse/HDFS-15751
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko


HDFS-15567 introduced a new {{FileSystem}} call, {{msync()}}. It should be added to the 
API definitions.




[jira] [Resolved] (HDFS-15623) Respect configured values of rpc.engine

2020-11-06 Thread Konstantin Shvachko (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-15623.

Fix Version/s: 3.2.3
   2.10.2
   3.1.5
   3.4.0
   3.3.1
 Hadoop Flags: Reviewed
   Resolution: Fixed

I just committed to trunk and branches 3.3, 3.2, 3.1, 2.10.
Thank you [~hchaverri]

> Respect configured values of rpc.engine
> ---
>
> Key: HDFS-15623
> URL: https://issues.apache.org/jira/browse/HDFS-15623
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Hector Sandoval Chaverri
>Assignee: Hector Sandoval Chaverri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The HDFS Configuration allows users to specify the RPCEngine implementation 
> to use when communicating with Datanodes and Namenodes. However, the value is 
> overwritten to ProtobufRpcEngine.class in different classes. As an example in 
> NameNodeRpcServer:
> {{RPC.setProtocolEngine(conf, ClientNamenodeProtocolPB.class, 
> ProtobufRpcEngine.class);}}
> The configured value of {{rpc.engine.[protocolName]}} should be respected to 
> allow for other implementations of RPCEngine to be used.




[jira] [Created] (HDFS-15665) Balancer logging improvement

2020-11-02 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15665:
--

 Summary: Balancer logging improvement
 Key: HDFS-15665
 URL: https://issues.apache.org/jira/browse/HDFS-15665
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer & mover
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko


It would be good to have Balancer log all relevant configuration parameters on 
each iteration along with some data, which reflects its progress and the amount 
of resources it involves.




[jira] [Created] (HDFS-15632) AbstractContractDeleteTest should set recursive peremeter to true for recursive test cases.

2020-10-14 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15632:
--

 Summary: AbstractContractDeleteTest should set recursive peremeter 
to true for recursive test cases.
 Key: HDFS-15632
 URL: https://issues.apache.org/jira/browse/HDFS-15632
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.10.0
Reporter: Konstantin Shvachko


{{AbstractContractDeleteTest.testDeleteNonexistentPathRecursive()}} should call 
{{delete(path, true)}} rather than {{false}}
Also {{AbstractContractDeleteTest.testDeleteNonexistentPathNonRecursive()}} has 
a wrong assert message. Should be {{"... attempting to non-recursively delete 
..."}}




[jira] [Created] (HDFS-15567) [SBN Read] HDFS should expose msync() API to allow downstream applications call it explicetly.

2020-09-09 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15567:
--

 Summary: [SBN Read] HDFS should expose msync() API to allow 
downstream applications call it explicetly.
 Key: HDFS-15567
 URL: https://issues.apache.org/jira/browse/HDFS-15567
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, hdfs-client
Reporter: Konstantin Shvachko


Consistent reads from Standby introduced the {{msync()}} API (HDFS-13688), which 
updates the client's state ID with the current state of the Active NameNode to 
guarantee consistency of subsequent calls to an ObserverNode. Currently this 
API is exposed via {{DFSClient}} only, which makes it hard for applications to 
access {{msync()}}. One way is to use something like this:
{code}
// msync() is reachable only through the DFSClient of an HDFS file system.
if (fs instanceof DistributedFileSystem) {
  ((DistributedFileSystem) fs).getClient().msync();
}
{code}
This should be exposed both for {{FileSystem}} and {{FileContext}}.




[jira] [Created] (HDFS-15323) StandbyNode fails transition to active due to insufficient transaction tailing

2020-05-01 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15323:
--

 Summary: StandbyNode fails transition to active due to 
insufficient transaction tailing
 Key: HDFS-15323
 URL: https://issues.apache.org/jira/browse/HDFS-15323
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, qjm
Affects Versions: 2.7.7
Reporter: Konstantin Shvachko


StandbyNode is asked to {{transitionToActive()}}. If it has fallen too far behind in 
tailing journal transactions (from QJM), it can crash with an 
{{IllegalStateException}}.




[jira] [Created] (HDFS-15291) [SBN read] Implement CyclicalBlockingQueue to avoid requing RPC calls on Observer.

2020-04-20 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15291:
--

 Summary: [SBN read] Implement CyclicalBlockingQueue to avoid 
requing RPC calls on Observer.
 Key: HDFS-15291
 URL: https://issues.apache.org/jira/browse/HDFS-15291
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Konstantin Shvachko


RPC queue is currently based on {{LinkedBlockingQueue}}, which is FIFO.
For Observer we delay execution of a call if its lastSeenStateId is larger than 
the stateId of the Observer. The delay is implemented as re-queuing the call to 
the end of the queue. Re-queuing is not atomic. We can avoid moving elements in 
the queue by replacing {{LinkedBlockingQueue}} with a 
{{CyclicalBlockingQueue}}, so that instead of re-queuing we just move the head 
of the queue and the call automatically becomes the last.
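
The idea of moving the head instead of re-queuing can be sketched with a fixed-size ring buffer. This is only an illustration of the technique: the class below is hypothetical, non-blocking, and much simpler than what a real {{CyclicalBlockingQueue}} would need. Its {{rotate()}} method makes the head element the last one in a single step under one lock.

```java
public class CyclicQueueSketch<E> {
  private final Object[] slots;
  private int head = 0; // index of the oldest element
  private int size = 0; // number of stored elements

  public CyclicQueueSketch(int capacity) {
    slots = new Object[capacity];
  }

  /** Appends at the tail; returns false when the ring is full. */
  public synchronized boolean offer(E e) {
    if (size == slots.length) {
      return false;
    }
    slots[(head + size) % slots.length] = e;
    size++;
    return true;
  }

  /** Removes and returns the head, or null when empty. */
  @SuppressWarnings("unchecked")
  public synchronized E poll() {
    if (size == 0) {
      return null;
    }
    E e = (E) slots[head];
    slots[head] = null;
    head = (head + 1) % slots.length;
    size--;
    return e;
  }

  /** Makes the current head the last element without a separate poll and offer. */
  public synchronized void rotate() {
    if (size < 2) {
      return; // nothing to reorder
    }
    int tail = (head + size) % slots.length;
    if (tail != head) {
      // Ring is not full: copy the head into the free tail slot.
      slots[tail] = slots[head];
      slots[head] = null;
    }
    // When the ring is full, tail == head and advancing head is enough.
    head = (head + 1) % slots.length;
  }

  public static void main(String[] args) {
    CyclicQueueSketch<String> q = new CyclicQueueSketch<>(4);
    q.offer("a");
    q.offer("b");
    q.offer("c");
    q.rotate(); // "a" is delayed and becomes the last call
    System.out.println(q.poll() + " " + q.poll() + " " + q.poll()); // b c a
  }
}
```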




[jira] [Created] (HDFS-15290) NPE in HttpServer during NameNode startup

2020-04-20 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15290:
--

 Summary: NPE in HttpServer during NameNode startup
 Key: HDFS-15290
 URL: https://issues.apache.org/jira/browse/HDFS-15290
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.8
Reporter: Konstantin Shvachko


When NameNode starts, it first starts HttpServer, then starts loading fsImage 
and edits. While loading, the namesystem field in NameNode is null. I saw that a 
StandbyNode sends a checkpoint request, which fails with NPE because NNStorage 
is not instantiated yet.
We should check the NameNode startup status before accepting checkpoint 
requests.





[jira] [Created] (HDFS-15185) StartupProgress reports edits segments until the entire startup completes

2020-02-19 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15185:
--

 Summary: StartupProgress reports edits segments until the entire 
startup completes
 Key: HDFS-15185
 URL: https://issues.apache.org/jira/browse/HDFS-15185
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.10.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko


The Startup Progress page keeps reporting edits segments after the {{LOAD_EDITS}} 
stage is complete. New steps are added to StartupProgress during journal tailing 
until all startup phases are completed. This adds a lot of edits steps, since the 
{{SAFEMODE}} phase can take a long time on a large cluster.
With fast tailing the segments are small, but the number of them is large - 
160K. This makes the page load forever.




[jira] [Created] (HDFS-15166) Remove redundant field fStream in ByteStringLog

2020-02-12 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15166:
--

 Summary: Remove redundant field fStream in ByteStringLog
 Key: HDFS-15166
 URL: https://issues.apache.org/jira/browse/HDFS-15166
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.10.0
Reporter: Konstantin Shvachko


{{ByteStringLog.fStream}} is only used in {{init()}} method and can be replaced 
by a local variable.




[jira] [Created] (HDFS-15118) [SBN Read] Slow clients when Observer reads are enabled but there are no Observers on the cluster.

2020-01-13 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15118:
--

 Summary: [SBN Read] Slow clients when Observer reads are enabled 
but there are no Observers on the cluster.
 Key: HDFS-15118
 URL: https://issues.apache.org/jira/browse/HDFS-15118
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.10.0
Reporter: Konstantin Shvachko


We see substantial degradation in performance of HDFS clients, when Observer 
reads are enabled via {{ObserverReadProxyProvider}}, but there are no 
ObserverNodes on the cluster.




[jira] [Created] (HDFS-15111) start / stopStandbyServices() should log which service it is transitioning to/from.

2020-01-10 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15111:
--

 Summary: start / stopStandbyServices() should log which service it 
is transitioning to/from.
 Key: HDFS-15111
 URL: https://issues.apache.org/jira/browse/HDFS-15111
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs, logging
Affects Versions: 2.10.0
Reporter: Konstantin Shvachko


Trying to transition Observer to Standby state. Both {{stopStandbyServices()}} 
and {{startStandbyServices()}} log that they are stopping/starting Standby 
services.
# {{startStandbyServices()}} should log which state it is transitioning TO.
# {{stopStandbyServices()}} should log which state it is transitioning FROM.




[jira] [Created] (HDFS-15099) [SBN Read] getBlockLocations() should throw ObserverRetryOnActiveException on an attempt to change aTime on ObserverNode

2020-01-07 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15099:
--

 Summary: [SBN Read] getBlockLocations() should throw 
ObserverRetryOnActiveException on an attempt to change aTime on ObserverNode
 Key: HDFS-15099
 URL: https://issues.apache.org/jira/browse/HDFS-15099
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.10.0
Reporter: Konstantin Shvachko


The precision of updating an INode's aTime while executing 
{{getBlockLocations()}} is 1 hour by default. Updates cannot be handled by 
ObserverNode, so the call should be redirected to Active NameNode. In order to 
redirect to Active, the ObserverNode should throw 
{{ObserverRetryOnActiveException}}.
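
The check described above can be sketched in plain Java. Everything here is an assumption made for illustration: {{RetryOnActiveException}} is a local stand-in for Hadoop's {{ObserverRetryOnActiveException}}, the method names are hypothetical, and the comparison against the precision is a guess at the shape of the condition, using the 1 hour default mentioned in the issue.

```java
public class AccessTimeCheckSketch {
  /** Local stand-in for ObserverRetryOnActiveException (hypothetical). */
  static class RetryOnActiveException extends RuntimeException {
    RetryOnActiveException(String msg) {
      super(msg);
    }
  }

  /** 1 hour in milliseconds, the default precision stated in the issue. */
  static final long PRECISION_MS = 60L * 60L * 1000L;

  /** True when a read at time 'now' would have to write a new aTime. */
  static boolean needsAtimeUpdate(long storedAtime, long now, long precisionMs) {
    return precisionMs > 0 && now > storedAtime + precisionMs;
  }

  /** On an Observer, a required aTime update is turned into a retry on Active. */
  static void checkOnObserver(boolean isObserver, long storedAtime, long now) {
    if (isObserver && needsAtimeUpdate(storedAtime, now, PRECISION_MS)) {
      throw new RetryOnActiveException("aTime update required; retry on Active");
    }
  }

  public static void main(String[] args) {
    checkOnObserver(true, 0L, PRECISION_MS); // within precision: served locally
    try {
      checkOnObserver(true, 0L, PRECISION_MS + 1);
    } catch (RetryOnActiveException e) {
      System.out.println("redirected: " + e.getMessage());
    }
  }
}
```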




[jira] [Created] (HDFS-15076) Fix tests that hold FSDirectory lock, without holding FSNamesystem lock.

2019-12-20 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15076:
--

 Summary: Fix tests that hold FSDirectory lock, without holding 
FSNamesystem lock.
 Key: HDFS-15076
 URL: https://issues.apache.org/jira/browse/HDFS-15076
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Konstantin Shvachko


Three tests {{TestGetBlockLocations}}, {{TestFSNamesystem}}, 
{{TestDiskspaceQuotaUpdate}} use {{FSDirectory}} methods, which hold 
FSDirectory lock. They should also hold the global Namesystem lock.




[jira] [Created] (HDFS-15037) Encryption Zone operations should not block other RPC calls while retreivingencryption keys.

2019-12-06 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15037:
--

 Summary: Encryption Zone operations should not block other RPC 
calls while retreivingencryption keys.
 Key: HDFS-15037
 URL: https://issues.apache.org/jira/browse/HDFS-15037
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, namenode
Affects Versions: 2.10.0
Reporter: Konstantin Shvachko


I believe the intention was to avoid blocking other operations while 
retrieving keys while holding {{FSDirectory.dirLock}}. But in reality all other 
operations first enter {{FSNamesystemLock}}, then {{dirLock}}. So they are all 
blocked waiting for the key.
We see a substantial increase in RPC wait time ({{RpcQueueTimeAvgTime}}) on 
NameNode when encryption operations are intermixed with regular workloads.




[jira] [Created] (HDFS-15036) Active NameNode should not silently fail the image transfer

2019-12-06 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15036:
--

 Summary: Active NameNode should not silently fail the image 
transfer
 Key: HDFS-15036
 URL: https://issues.apache.org/jira/browse/HDFS-15036
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.10.0
Reporter: Konstantin Shvachko


Image transfer from Standby NameNode to Active silently fails on Active, 
without any logging and without notifying the receiver side.




[jira] [Created] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.

2019-11-26 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15017:
--

 Summary: Remove redundant import of AtomicBoolean in 
NameNodeConnector.
 Key: HDFS-15017
 URL: https://issues.apache.org/jira/browse/HDFS-15017
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover, hdfs
Affects Versions: 2.10.0
Reporter: Konstantin Shvachko


Should remove redundant import.
Looks like it is specific to branch 2.10. Trunk and 3.x branches don't have it.




[jira] [Created] (HDFS-15004) Refactor TestBalancer for faster execution.

2019-11-21 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-15004:
--

 Summary: Refactor TestBalancer for faster execution.
 Key: HDFS-15004
 URL: https://issues.apache.org/jira/browse/HDFS-15004
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs, test
Affects Versions: 2.10.0
Reporter: Konstantin Shvachko


{{TestBalancer}} is a big test by itself, and it is also a part of many other 
tests. Running these tests involves spinning up a {{MiniDFSCluster}} and shutting 
it down for every test case, which is inefficient. Many of the test cases can 
run using the same instance of {{MiniDFSCluster}}, but not all of them. It would be 
good to refactor the tests to optimize their running time.




[jira] [Resolved] (HDFS-14792) [SBN read] StandbyNode does not come out of safemode while adding new blocks.

2019-11-12 Thread Konstantin Shvachko (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-14792.

Fix Version/s: 2.10.1
   Resolution: Fixed

This turned out to be related to the same race condition between edits 
{{OP_ADD_BLOCK}} and IBRs of HDFS-14941. We do not see any delays in leaving 
safemode on StandbyNode after the HDFS-14941 fix.
Closing this as fixed.

> [SBN read] StandbyNode does not come out of safemode while adding new blocks.
> 
>
> Key: HDFS-14792
> URL: https://issues.apache.org/jira/browse/HDFS-14792
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Priority: Major
> Fix For: 2.10.1
>
>
> During startup StandbyNode reports that it needs additional X blocks to reach 
> the threshold 1.. Where X is changing up and down.
> This is because with fast tailing SBN adds new blocks from edits while DNs 
> have not reported replicas yet. Being in SafeMode SBN counts new blocks 
> towards the threshold and can stay in SafeMode for a long time.
> By design, the purpose of startup SafeMode is to disallow modifications of 
> the namespace and blocks map until all DN replicas are reported.




[jira] [Resolved] (HDFS-12943) Consistent Reads from Standby Node

2019-10-31 Thread Konstantin Shvachko (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-12943.

Fix Version/s: 2.10.0
   3.2.2
   3.1.4
   3.3.0
 Hadoop Flags: Reviewed
 Release Note: 
Observer is a new type of NameNode in addition to Active and Standby in HA 
settings. An Observer Node maintains a replica of the namespace, same as a Standby 
Node. It additionally allows execution of client read requests.
To ensure read-after-write consistency within a single client, a state ID is 
introduced in RPC headers. The Observer responds to the client request only 
after its own state has caught up with the client’s state ID, which it 
previously received from the Active NameNode.
Clients can explicitly invoke a new client protocol call msync(), which ensures 
that subsequent reads by this client from an Observer are consistent.
A new client-side ObserverReadProxyProvider is introduced to provide automatic 
switching between Active and Observer NameNodes for submitting respectively 
write and read requests.
   Resolution: Fixed

Closing this as Fixed. The feature has been tested, back-ported down to 2.10 
and released. A few remaining subtasks are being addressed as regular issues.
Added release notes. Please review if I missed anything.

_Thank you everybody for contributing to this effort._
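
The read-after-write rule from the release note can be sketched in a few lines of Java. This is a simplified illustration, not the Hadoop implementation: {{StateIdSketch}} and its methods are hypothetical names, and the blocking wait with a timeout is an assumption made to keep the example small. It shows only the invariant that a read is served once the Observer's applied state has reached the state ID the client carries.

```java
// Hypothetical sketch of the state ID check; not Hadoop code.
final class StateIdSketch {
  // Highest state ID this Observer has applied so far.
  private long lastAppliedStateId;

  // Called as the Observer applies edits from the journal.
  synchronized void applyEditsUpTo(long stateId) {
    lastAppliedStateId = Math.max(lastAppliedStateId, stateId);
    notifyAll();
  }

  // Returns true once the Observer has caught up with clientStateId,
  // or false if that does not happen within timeoutMillis.
  synchronized boolean awaitCaughtUp(long clientStateId, long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (lastAppliedStateId < clientStateId) {
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        return false;
      }
      try {
        wait(remaining);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```

A client that last saw state ID 5 from the Active would be served immediately by an Observer that has applied edits up to 5, and would have to wait (or be redirected) if it had seen state ID 6.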

> Consistent Reads from Standby Node
> --
>
> Key: HDFS-12943
> URL: https://issues.apache.org/jira/browse/HDFS-12943
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.0
>
> Attachments: ConsistentReadsFromStandbyNode.pdf, 
> ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, 
> HDFS-12943-002.patch, HDFS-12943-003.patch, HDFS-12943-004.patch, 
> TestPlan-ConsistentReadsFromStandbyNode.pdf
>
>
> StandbyNode in HDFS is a replica of the active NameNode. The states of the 
> NameNodes are coordinated via the journal. It is natural to consider 
> StandbyNode as a read-only replica. As with any replicated distributed system 
> the problem of stale reads should be resolved. Our main goal is to provide 
> reads from standby in a consistent way in order to enable a wide range of 
> existing applications running on top of HDFS.




[jira] [Resolved] (HDFS-14443) Throwing RemoteException in the time of Read Operation

2019-10-31 Thread Konstantin Shvachko (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-14443.

Resolution: Not A Problem

Resolving as not a problem. Please reopen if it is.

> Throwing RemoteException in the time of Read Operation
> --
>
> Key: HDFS-14443
> URL: https://issues.apache.org/jira/browse/HDFS-14443
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ranith Sardar
>Priority: Major
>
> 2019-04-19 20:54:59,178 DEBUG 
> org.apache.hadoop.io.retry.RetryInvocationHandler: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category WRITE is not supported in state observer. Visit 
> [https://s.apache.org/sbnn-error]
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1990)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1443)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.msync(NameNodeRpcServer.java:1372)
>  at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.msync(ClientNamenodeProtocolServerSideTranslatorPB.java:1929)
>  at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:531)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2791)
>  , while invoking $Proxy5.getFileInfo over 
> [host-*-*-*-*/*.*.*.*:6*5,host-*-*-*-*/*.*.*.*:**,host-*-*-*-*/*.*.*.*:6**5]. 
> Trying to failover immediately.
>  
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category WRITE is not supported in state observer. Visit 
> [https://s.apache.org/sbnn-error]
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)




[jira] [Resolved] (HDFS-14020) Emulate Observer node falling far behind the Active

2019-10-31 Thread Konstantin Shvachko (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-14020.

Resolution: Duplicate

Resolving as duplicate since HDFS-13873 introduced {{testObserverFallBehind()}} 
in {{TestMultiObserverNode}}, which serves the purpose. This has also been 
already tested on live clusters.

> Emulate Observer node falling far behind the Active
> ---
>
> Key: HDFS-14020
> URL: https://issues.apache.org/jira/browse/HDFS-14020
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Sherwood Zheng
>Assignee: Sherwood Zheng
>Priority: Major
>
> Emulate Observer node falling far behind the Active. Ensure readers switch over
> to another Observer instead of waiting for the lagging Observer to catch up. If
> there is only a single Observer, it should fall back to the Active.




[jira] [Reopened] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2019-09-25 Thread Konstantin Shvachko (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko reopened HDFS-14305:


Reopening this.
I think we should revert it before it gets into a release and becomes a liability 
by causing an incompatible change.

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14305.001.patch, HDFS-14305.002.patch, 
> HDFS-14305.003.patch, HDFS-14305.004.patch, HDFS-14305.005.patch, 
> HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNodes could have overlapping ranges 
> for serial numbers. For simplicity, let's assume {{Integer.MAX_VALUE}} is 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be any negative integer.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When the collision happens, DataNodes could overwrite an existing key which 
> will cause clients to fail because of {{InvalidToken}} error.
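
The overlap in the quoted description follows from Java's {{%}} operator keeping the sign of the dividend. The sketch below replays the formula with 100 standing in for {{Integer.MAX_VALUE}}, as in the description. {{SerialRangeDemo}} is a hypothetical class, and the {{Math.floorMod}} variant is only one possible way to keep the result non-negative; it is an assumption and not necessarily what any committed patch does.

```java
// Hypothetical demo; MAX stands in for Integer.MAX_VALUE as in the description.
final class SerialRangeDemo {
  static final int MAX = 100;

  // The formula quoted in the issue. A negative serialNo yields a
  // negative remainder, so the result can fall below nnRangeStart.
  static int rotate(int serialNo, int numNNs, int nnIndex) {
    int intRange = MAX / numNNs;
    int nnRangeStart = intRange * nnIndex;
    return (serialNo % intRange) + nnRangeStart;
  }

  // Assumed alternative: floorMod always returns a value in [0, intRange),
  // so each NameNode stays inside its own range.
  static int rotateNonNegative(int serialNo, int numNNs, int nnIndex) {
    int intRange = MAX / numNNs;
    int nnRangeStart = intRange * nnIndex;
    return Math.floorMod(serialNo, intRange) + nnRangeStart;
  }

  public static void main(String[] args) {
    System.out.println(rotate(-49, 2, 0));            // nn1 with a negative seed
    System.out.println(rotate(-49, 2, 1));            // nn2 with the same seed
    System.out.println(rotate(1, 2, 0));              // nn1 with seed 1
    System.out.println(rotateNonNegative(-49, 2, 1)); // nn2, floorMod variant
  }
}
```

With two NameNodes, {{nn2}} seeded with -49 and {{nn1}} seeded with 1 both produce serial number 1 under the original formula, which is the collision the issue describes.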




[jira] [Created] (HDFS-14794) reportBadBlock is rejected by Observer.

2019-08-28 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-14794:
--

 Summary: reportBadBlock is rejected by Observer.
 Key: HDFS-14794
 URL: https://issues.apache.org/jira/browse/HDFS-14794
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 2.10.0
Reporter: Konstantin Shvachko


{{reportBadBlock}} is rejected by Observer via StandbyException:
{code}StandbyException: Operation category WRITE is not supported in state 
{code}
We should investigate what the consequences of this are and whether we should 
treat {{reportBadBlock}} like IBRs. Note that {{reportBadBlock}} is a part of both 
{{ClientProtocol}} and {{DatanodeProtocol}}.





[jira] [Created] (HDFS-14793) BlockTokenSecretManager should LOG block token range it operates on.

2019-08-28 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-14793:
--

 Summary: BlockTokenSecretManager should LOG block token range it 
operates on.
 Key: HDFS-14793
 URL: https://issues.apache.org/jira/browse/HDFS-14793
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.10.0
Reporter: Konstantin Shvachko


At startup, log enough information to identify the range of block token keys 
for the NameNode. This should make it easier to debug issues with block tokens.




[jira] [Created] (HDFS-14792) [SBN read] StandbyNode does not come out of safemode while adding new blocks.

2019-08-28 Thread Konstantin Shvachko (Jira)

Konstantin Shvachko created HDFS-14792:
--

 Summary: [SBN read] StandbyNode does not come out of safemode while 
adding new blocks.
 Key: HDFS-14792
 URL: https://issues.apache.org/jira/browse/HDFS-14792
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 2.10.0
Reporter: Konstantin Shvachko


During startup StandbyNode reports that it needs additional X blocks to reach 
the threshold 1.. Where X is changing up and down.
This is because with fast tailing SBN adds new blocks from edits while DNs have 
not reported replicas yet. Being in SafeMode SBN counts new blocks towards the 
threshold and can stay in SafeMode for a long time.
By design, the purpose of startup SafeMode is to disallow modifications of the 
namespace and blocks map until all DN replicas are reported.




[jira] [Created] (HDFS-14734) [FGL] Introduce Latch Lock to replace Namesystem global lock.

2019-08-14 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14734:
--

 Summary: [FGL] Introduce Latch Lock to replace Namesystem global 
lock.
 Key: HDFS-14734
 URL: https://issues.apache.org/jira/browse/HDFS-14734
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Konstantin Shvachko


The concept of Latch Lock associates a separate lock with each partition of 
PartitionedGSet.
Define the order of acquiring locks on the partitions. Some operations will 
require holding locks on multiple partitions.
It is preferable to retain the global lock for some operations, such as rename.
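
The ordering requirement above can be illustrated with a small Java sketch. {{PartitionLocksSketch}} is a hypothetical class, not part of HDFS, and ordering by ascending partition index is an assumption; the issue only says that an order must be defined. Acquiring locks in one fixed global order is a standard way to avoid deadlock between operations that need several partitions.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: one lock per partition, acquired in ascending index order.
final class PartitionLocksSketch {
  private final ReentrantLock[] locks;

  PartitionLocksSketch(int numPartitions) {
    locks = new ReentrantLock[numPartitions];
    for (int i = 0; i < numPartitions; i++) {
      locks[i] = new ReentrantLock();
    }
  }

  // Locks the requested partitions in sorted order and returns that order,
  // so two operations touching the same partitions cannot deadlock.
  int[] lockAll(int... partitionIndexes) {
    int[] ordered = Arrays.stream(partitionIndexes).distinct().sorted().toArray();
    for (int i : ordered) {
      locks[i].lock();
    }
    return ordered;
  }

  // Releases in reverse acquisition order.
  void unlockAll(int[] ordered) {
    for (int i = ordered.length - 1; i >= 0; i--) {
      locks[ordered[i]].unlock();
    }
  }

  boolean isHeldByCurrentThread(int partition) {
    return locks[partition].isHeldByCurrentThread();
  }
}
```

An operation spanning partitions 3 and 1 would call {{lockAll(3, 1)}}, which locks 1 and then 3 regardless of the argument order.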





[jira] [Created] (HDFS-14733) [FGL] Introduce INode key.

2019-08-14 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14733:
--

 Summary: [FGL] Introduce INode key.
 Key: HDFS-14733
 URL: https://issues.apache.org/jira/browse/HDFS-14733
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Konstantin Shvachko


INode keys should satisfy the locality requirement.
Keys should be pluggable via a configuration parameter.




[jira] [Created] (HDFS-14732) [FGL] Introduce PartitionedGSet a new implementation of GSet.

2019-08-14 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14732:
--

 Summary: [FGL] Introduce PartitionedGSet a new implementation of 
GSet.
 Key: HDFS-14732
 URL: https://issues.apache.org/jira/browse/HDFS-14732
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Konstantin Shvachko


INodeMap and BlocksMap are currently represented by a hash table implemented as 
LightWeightGSet. For fine-grained locking it should be replaced by 
PartitionedGSet - a new implementation of GSet interface, which partitions 
INodes into ranges based on a key.
We should target static partitioning into a configurable number of ranges. This 
should allow avoiding the high level lock for RangeMap. It should not be a 
compromise on efficiency, because parallelism on a single node is bounded by 
the number of CPU cores.
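
As a rough illustration of static partitioning into a fixed number of key ranges, consider the Java sketch below. {{PartitionedSetSketch}} is a hypothetical class and does not implement Hadoop's GSet interface; numeric keys in a known range and equal-width ranges are assumptions made for brevity. Because the key-to-partition mapping is pure arithmetic, no shared range map has to be locked to find a partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the GSet interface from Hadoop.
final class PartitionedSetSketch<V> {
  private final long rangeWidth;
  private final List<Map<Long, V>> partitions = new ArrayList<>();

  // Keys are assumed to lie in [0, maxKey].
  PartitionedSetSketch(int numPartitions, long maxKey) {
    // Width chosen so that key / rangeWidth is always below numPartitions.
    this.rangeWidth = maxKey / numPartitions + 1;
    for (int i = 0; i < numPartitions; i++) {
      partitions.add(new HashMap<>());
    }
  }

  // Static mapping from key to partition index.
  int partitionOf(long key) {
    return (int) (key / rangeWidth);
  }

  V put(long key, V value) {
    return partitions.get(partitionOf(key)).put(key, value);
  }

  V get(long key) {
    return partitions.get(partitionOf(key)).get(key);
  }

  int sizeOf(int partition) {
    return partitions.get(partition).size();
  }
}
```

Neighboring keys land in the same partition, which is what makes a per-partition lock meaningful for keys that satisfy a locality requirement.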




[jira] [Created] (HDFS-14731) [FGL] Remove redundant locking on NameNode.

2019-08-14 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14731:
--

 Summary: [FGL] Remove redundant locking on NameNode.
 Key: HDFS-14731
 URL: https://issues.apache.org/jira/browse/HDFS-14731
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Konstantin Shvachko


Currently NameNode has two global locks: FSNamesystemLock and FSDirectoryLock. 
An analysis shows that a single FSNamesystemLock is sufficient to guarantee 
consistency of the NameNode state. FSDirectoryLock can be removed.




[jira] [Reopened] (HDFS-14303) check block directory logic not correct when there is only meta file, print no meaning warn log

2019-08-06 Thread Konstantin Shvachko (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDFS-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko reopened HDFS-14303:


Reopening for committing the addendum patch to other versions.

> check block directory logic not correct when there is only meta file, print 
> no meaning warn log
> ---
>
> Key: HDFS-14303
> URL: https://issues.apache.org/jira/browse/HDFS-14303
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.7.3, 3.2.0, 2.9.2, 2.8.5
> Environment: env free
>Reporter: qiang Liu
>Assignee: qiang Liu
>Priority: Minor
>  Labels: easy-fix
> Fix For: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-14303-addendum-01.patch, 
> HDFS-14303-addendum-02.patch, HDFS-14303-branch-2.005.patch, 
> HDFS-14303-branch-2.009.patch, HDFS-14303-branch-2.010.patch, 
> HDFS-14303-branch-2.015.patch, HDFS-14303-branch-2.017.patch, 
> HDFS-14303-branch-2.7.001.patch, HDFS-14303-branch-2.7.004.patch, 
> HDFS-14303-branch-2.7.006.patch, HDFS-14303-branch-2.9.011.patch, 
> HDFS-14303-branch-2.9.012.patch, HDFS-14303-branch-2.9.013.patch, 
> HDFS-14303-trunk.014.patch, HDFS-14303-trunk.015.patch, 
> HDFS-14303-trunk.016.patch, HDFS-14303-trunk.016.path, 
> HDFS-14303.branch-3.2.017.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> check block directory logic not correct when there is only meta file, print no 
> meaning warn log, e.g.:
>  WARN DirectoryScanner:? - Block: 1101939874 has to be upgraded to block 
> ID-based layout. Actual block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68,
>  expected block file path: 
> /data14/hadoop/data/current/BP-1461038173-10.8.48.152-1481686842620/current/finalized/subdir174/subdir68/subdir68




[jira] [Created] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2019-08-05 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14703:
--

 Summary: NameNode Fine-Grained Locking via Metadata Partitioning
 Key: HDFS-14703
 URL: https://issues.apache.org/jira/browse/HDFS-14703
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs, namenode
Reporter: Konstantin Shvachko


We aim to enable fine-grained locking by splitting the in-memory namespace 
into multiple partitions, each having a separate lock. This is intended to 
improve the performance of NameNode write operations.




[jira] [Created] (HDFS-14502) keepResults option in NNThroughputBenchmark should call saveNamespace()

2019-05-20 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14502:
--

 Summary: keepResults option in NNThroughputBenchmark should call 
saveNamespace()
 Key: HDFS-14502
 URL: https://issues.apache.org/jira/browse/HDFS-14502
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.7.6
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko


The {{-keepResults}} option is usually used to make it possible to rerun 
NNThroughputBenchmark with existing namespace state, e.g. first generate files 
with the {{create}} command and then run the {{fileStatus}} command on the 
generated files.
NNThroughputBenchmark should call {{saveNamespace()}} when the {{-keepResults}} 
option is specified. Otherwise NN startup takes a while since it needs to 
digest a large edits file starting from the empty image.




[jira] [Created] (HDFS-14494) Move Server logging of StateId inside receiveRequestState()

2019-05-15 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14494:
--

 Summary: Move Server logging of StateId inside 
receiveRequestState()
 Key: HDFS-14494
 URL: https://issues.apache.org/jira/browse/HDFS-14494
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Konstantin Shvachko


HDFS-14270 introduced logging of the client and server StateIds at trace level. 
Unfortunately one of the arguments, {{alignmentContext.getLastSeenStateId()}}, 
acquires a lock on FSEdits and is evaluated even if the trace logging level is 
disabled. I propose to move the logging message inside 
{{GlobalStateIdContext.receiveRequestState()}}, where {{clientStateId}} and 
{{serverStateId}} are already calculated and can be easily printed.
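
The underlying Java behavior is that method arguments are evaluated before the callee can check the log level. The sketch below demonstrates this with a hypothetical {{TraceLoggingDemo}} class and a counter standing in for the lock acquisition; it does not use Hadoop or any logging library, and the deferred variant with a supplier is just one general remedy, different from the relocation proposed in this issue.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

// Hypothetical demo; the counter stands in for acquiring a lock.
final class TraceLoggingDemo {
  static boolean traceEnabled = false;
  static final AtomicInteger lockAcquisitions = new AtomicInteger();

  // Stand-in for an expensive getter that takes a lock internally.
  static long getLastSeenStateId() {
    lockAcquisitions.incrementAndGet();
    return 42L;
  }

  // Eager: the caller computes stateId before this method runs,
  // so the cost is paid even when trace is disabled.
  static void traceEager(String msg, long stateId) {
    if (traceEnabled) {
      System.out.println(msg + stateId);
    }
  }

  // Deferred: the supplier is invoked only when trace is enabled.
  static void traceLazy(String msg, LongSupplier stateId) {
    if (traceEnabled) {
      System.out.println(msg + stateId.getAsLong());
    }
  }
}
```

With trace disabled, {{traceEager("id=", getLastSeenStateId())}} still increments the counter, while {{traceLazy("id=", TraceLoggingDemo::getLastSeenStateId)}} does not.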




[jira] [Created] (HDFS-14347) Restore a comment line mistakenly removed in ProtobufRpcEngine

2019-03-07 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14347:
--

 Summary: Restore a comment line mistakenly removed in 
ProtobufRpcEngine
 Key: HDFS-14347
 URL: https://issues.apache.org/jira/browse/HDFS-14347
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Konstantin Shvachko


HDFS-12977 mistakenly removed the following comment line in 
{{ProtobufRpcEngine.Server.Server()}}
{code}
- * the range of ports used when port is 0 (an ephemeral port)
+ * @param alignmentContext provides server state info on client responses
{code}
Let's put it back. Otherwise the comment doesn't make sense.




[jira] [Resolved] (HDFS-13781) Unit tests for standby reads.

2019-02-25 Thread Konstantin Shvachko (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDFS-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-13781.

   Resolution: Duplicate
Fix Version/s: HDFS-12943

This was handled by HDFS-13523, HDFS-13961, HDFS-13925, and a few other issues.
Resolving as duplicate.

> Unit tests for standby reads.
> -
>
> Key: HDFS-13781
> URL: https://issues.apache.org/jira/browse/HDFS-13781
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Shvachko
>Priority: Major
> Fix For: HDFS-12943
>
>
> Create more unit tests supporting the standby reads feature. Let's come up 
> with a list of tests that provide sufficient test coverage.




[jira] [Created] (HDFS-14170) Fix white spaces related to SBN reads.

2018-12-23 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14170:
--

 Summary: Fix white spaces related to SBN reads.
 Key: HDFS-14170
 URL: https://issues.apache.org/jira/browse/HDFS-14170
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko


This is to fix some checkstyle warnings, mostly white spaces, before merging 
the HDFS-12943 branch to trunk.




[jira] [Created] (HDFS-14162) Balancer should work with ObserverNode

2018-12-19 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14162:
--

 Summary: Balancer should work with ObserverNode
 Key: HDFS-14162
 URL: https://issues.apache.org/jira/browse/HDFS-14162
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Konstantin Shvachko


Balancer places a substantial RPC load on NameNode. It would be good to 
divert Balancer RPCs such as {{getBlocks()}} to ObserverNode. The main problem is 
that Balancer uses {{NamenodeProtocol}}, while ORPP currently supports only 
{{ClientProtocol}}.




[jira] [Resolved] (HDFS-14160) ObserverReadInvocationHandler should implement RpcInvocationHandler

2018-12-19 Thread Konstantin Shvachko (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDFS-14160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-14160.

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-12943

I just committed this.
Thanks for the review Chao.

> ObserverReadInvocationHandler should implement RpcInvocationHandler
> ---
>
> Key: HDFS-14160
> URL: https://issues.apache.org/jira/browse/HDFS-14160
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-14160-HDFS-12943.001.patch, 
> HDFS-14160-HDFS-12943.002.patch
>
>
> Currently ObserverReadInvocationHandler implements InvocationHandler.
> [As 
> mentioned|https://issues.apache.org/jira/browse/HDFS-14116?focusedCommentId=16710596&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16710596]
>  in HDFS-14116 this is the cause of Fsck failing with Observer.




[jira] [Created] (HDFS-14160) ObserverReadInvocationHandler should implement RpcInvocationHandler

2018-12-18 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14160:
--

 Summary: ObserverReadInvocationHandler should implement 
RpcInvocationHandler
 Key: HDFS-14160
 URL: https://issues.apache.org/jira/browse/HDFS-14160
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko


Currently ObserverReadInvocationHandler implements InvocationHandler.
[As 
mentioned|https://issues.apache.org/jira/browse/HDFS-14116?focusedCommentId=16710596&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16710596]
 in HDFS-14116 this is the cause of Fsck failing with Observer.




[jira] [Resolved] (HDFS-14116) ObserverReadProxyProvider should work with protocols other than ClientProtocol

2018-12-17 Thread Konstantin Shvachko (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-14116.

  Resolution: Fixed
Hadoop Flags: Reviewed

Pushed to HDFS-12943 branch. Thanks everybody.

> ObserverReadProxyProvider should work with protocols other than ClientProtocol
> --
>
> Key: HDFS-14116
> URL: https://issues.apache.org/jira/browse/HDFS-14116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-14116-HDFS-12943.000.patch, 
> HDFS-14116-HDFS-12943.001.patch, HDFS-14116-HDFS-12943.002.patch, 
> HDFS-14116-HDFS-12943.003.patch, HDFS-14116-HDFS-12943.004.patch, 
> HDFS-14116-HDFS-12943.005.patch
>
>
> Currently in {{ObserverReadProxyProvider}} constructor there is this line 
> {code}
> ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext);
> {code}
> This could potentially cause failure, because it is possible that factory 
> cannot be cast here. Specifically, 
> {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the 
> constructor will be called, and there are two paths that could call into this:
> (1).{{NameNodeProxies.createProxy}}
> (2).{{NameNodeProxiesClient.createFailoverProxyProvider}}
> (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses 
> {{NameNodeHAProxyFactory}} which cannot be cast to 
> {{ClientHAProxyFactory}}. This happens when, for example, running 
> NNThroughputBenchmark. To fix this we can at least:
> 1. introduce setAlignmentContext to HAProxyFactory which is the parent of 
> both ClientHAProxyFactory and NameNodeHAProxyFactory OR
> 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having an 
> if check with reflection. 
> Depending on whether it makes sense to have alignment context for the case (1) 
> calling code paths.
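
Option 2 from the description can be sketched with a plain type check. The types below are hypothetical stand-ins (note the {{Stub}} suffix) and not the Hadoop classes, and {{instanceof}} is used here as a simple form of the "if check" mentioned above. The sketch only shows how an unconditional cast can be replaced by a guarded one so that an unsupported factory type no longer throws {{ClassCastException}}.

```java
// Hypothetical stand-ins for the factory hierarchy named in the description.
interface HAProxyFactoryStub {}

final class ClientFactoryStub implements HAProxyFactoryStub {
  Object alignmentContext;

  void setAlignmentContext(Object ctx) {
    this.alignmentContext = ctx;
  }
}

final class NameNodeFactoryStub implements HAProxyFactoryStub {}

final class AlignmentSetupSketch {
  // Returns true if the context was applied, false if the factory type
  // does not support it (instead of failing on the cast).
  static boolean trySetAlignmentContext(HAProxyFactoryStub factory, Object ctx) {
    if (factory instanceof ClientFactoryStub) {
      ((ClientFactoryStub) factory).setAlignmentContext(ctx);
      return true;
    }
    return false;
  }
}
```

Option 1 would instead declare {{setAlignmentContext}} on the shared parent type, which removes the need for any type check at the call site.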




[jira] [Reopened] (HDFS-14116) ObserverReadProxyProvider should work with protocols other than ClientProtocol

2018-12-17 Thread Konstantin Shvachko (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko reopened HDFS-14116:


Reopening. Yes, I was looking at this patch and it is as [~vagarychen] said.
The {{AlignmentContext}} is set in {{ClientHAProxyFactory}}. It is sort of the 
source of truth there. ORPP needs that to function properly. 
In the end, NNThroughputBenchmark uses {{NameNodeHAProxyFactory}} to create 
essentially a non-HA proxy for a non-ClientProtocol interface. So it should use 
{{createNonHAProxy()}} rather than building ORPP.
So I propose to revert the patch and think about how we should fix it.
We should make {{NameNodeProxiesClient.createFailoverProxyProvider()}} return 
null if somebody tries to instantiate ORPP with a non-ClientProtocol interface. 
Then things should fall into place.
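
A rough sketch of the "return null for a non-ClientProtocol interface" idea follows. All types here are hypothetical placeholders named after the ones in this thread; the real {{createFailoverProxyProvider()}} has a different signature, so treat this as an illustration of the check only.

```java
// Hypothetical stand-ins for the protocol interfaces mentioned in the thread.
interface ClientProtocol {}

interface NamenodeProtocol {}

// Placeholder for ORPP; it only records which interface it was built for.
class ObserverReadProxyProviderSketch<T> {
  final Class<T> xface;

  ObserverReadProxyProviderSketch(Class<T> xface) {
    this.xface = xface;
  }
}

class ProviderFactorySketch {
  /** Returns null when ORPP is requested for an interface that is not ClientProtocol. */
  static <T> ObserverReadProxyProviderSketch<T> createObserverProvider(Class<T> xface) {
    if (!ClientProtocol.class.isAssignableFrom(xface)) {
      return null; // assumption: the caller then falls back to a non-HA proxy
    }
    return new ObserverReadProxyProviderSketch<>(xface);
  }

  public static void main(String[] args) {
    System.out.println(createObserverProvider(ClientProtocol.class) != null);
    System.out.println(createObserverProvider(NamenodeProtocol.class) == null);
  }
}
```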

> ObserverReadProxyProvider should work with protocols other than ClientProtocol
> --
>
> Key: HDFS-14116
> URL: https://issues.apache.org/jira/browse/HDFS-14116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-14116-HDFS-12943.000.patch, 
> HDFS-14116-HDFS-12943.001.patch, HDFS-14116-HDFS-12943.002.patch, 
> HDFS-14116-HDFS-12943.003.patch, HDFS-14116-HDFS-12943.004.patch
>
>
> Currently in {{ObserverReadProxyProvider}} constructor there is this line 
> {code}
> ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext);
> {code}
> This could potentially cause a failure, because the factory may not be 
> castable here. Specifically,  
> {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the 
> constructor will be called, and there are two paths that could call into this:
> (1).{{NameNodeProxies.createProxy}}
> (2).{{NameNodeProxiesClient.createFailoverProxyProvider}}
> (2) works fine because it always uses {{ClientHAProxyFactory}}, but (1) uses 
> {{NameNodeHAProxyFactory}}, which cannot be cast to 
> {{ClientHAProxyFactory}}. This happens when, for example, running 
> NNThroughputBenchmark. To fix this we can at least:
> 1. introduce setAlignmentContext to HAProxyFactory, which is the parent of 
> both ClientHAProxyFactory and NameNodeHAProxyFactory, OR
> 2. only call setAlignmentContext when the factory is a ClientHAProxyFactory by, 
> say, having an if check with reflection. 
> The choice depends on whether it makes sense to have an alignment context for 
> the case (1) calling code paths.




[jira] [Created] (HDFS-14149) Adjust annotations on new interfaces/classes for SBN reads.

2018-12-13 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14149:
--

 Summary: Adjust annotations on new interfaces/classes for SBN 
reads.
 Key: HDFS-14149
 URL: https://issues.apache.org/jira/browse/HDFS-14149
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-12943
Reporter: Konstantin Shvachko


Let's make sure that all new classes and interfaces
# have annotations, as some of them, like {{ObserverReadProxyProvider}}, don't
# are annotated as {{Private}} and {{Evolving}}, to allow room for 
changes




[jira] [Created] (HDFS-14131) Create user guide for "Consistent reads from Observer" feature.

2018-12-06 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14131:
--

 Summary: Create user guide for "Consistent reads from Observer" 
feature.
 Key: HDFS-14131
 URL: https://issues.apache.org/jira/browse/HDFS-14131
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: documentation
Affects Versions: HDFS-12943
Reporter: Konstantin Shvachko


The documentation should give an overview of the feature, explain configuration 
parameters, give an example of recommended deployment.
It should include the description of Fast Edits Tailing HDFS-13150, as this is 
required for efficient reads from Observer.




[jira] [Created] (HDFS-14130) Make ZKFC ObserverNode aware

2018-12-06 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14130:
--

 Summary: Make ZKFC ObserverNode aware
 Key: HDFS-14130
 URL: https://issues.apache.org/jira/browse/HDFS-14130
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HDFS-12943
Reporter: Konstantin Shvachko


Need to fix automatic failover with ZKFC. Currently it does not know about 
ObserverNodes and tries to convert them to SBNs.




[jira] [Resolved] (HDFS-14059) Test reads from standby on a secure cluster with Configured failover

2018-12-05 Thread Konstantin Shvachko (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDFS-14059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-14059.

Resolution: Done

Thanks [~zero45]! Some cool tests there with multiple observers. Great that there 
were no problems with DTs and failover.
We still have an outstanding issue to support automatic failover with ZKFC. 
Will create a jira for that.
Closing this one as done.

> Test reads from standby on a secure cluster with Configured failover
> 
>
> Key: HDFS-14059
> URL: https://issues.apache.org/jira/browse/HDFS-14059
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Shvachko
>Assignee: Plamen Jeliazkov
>Priority: Major
>
> Run standard HDFS tests to verify reading from ObserverNode on a secure HA 
> cluster with {{ConfiguredFailoverProxyProvider}}.




[jira] [Resolved] (HDFS-14058) Test reads from standby on a secure cluster with IP failover

2018-12-05 Thread Konstantin Shvachko (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDFS-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-14058.

Resolution: Done

Thanks, [~vagarychen]!
This was a lot of testing. With all related issues resolved and retested I 
think we can close this one.
Load testing and performance tuning should go into the next step.

> Test reads from standby on a secure cluster with IP failover
> 
>
> Key: HDFS-14058
> URL: https://issues.apache.org/jira/browse/HDFS-14058
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Shvachko
>Assignee: Chen Liang
>Priority: Major
> Attachments: dfsio_crs.no-crs.txt, dfsio_crs.with-crs.txt
>
>
> Run standard HDFS tests to verify reading from ObserverNode on a secure HA 
> cluster with {{IPFailoverProxyProvider}}.




[jira] [Created] (HDFS-14094) Fix the order of logging arguments in ObserverReadProxyProvider.

2018-11-21 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14094:
--

 Summary: Fix the order of logging arguments in 
ObserverReadProxyProvider.
 Key: HDFS-14094
 URL: https://issues.apache.org/jira/browse/HDFS-14094
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: logging
Affects Versions: HDFS-12943
Reporter: Konstantin Shvachko


[~zero45]'s finding from HDFS-14067:

In ObserverReadProxyProvider there is a warn message:
{code:java}
    LOG.warn("{} observers have failed for read request {}; also found " +
    "{} standby and {} active. Falling back to active.",
    failedObserverCount, standbyCount, activeCount, method.getName());
{code}
Seems the arguments are out of order; they should probably be {{failedObserverCount, 
method.getName(), standbyCount, activeCount}}.
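
The effect of the argument order can be seen with a small stand-in for SLF4J-style {{{}}} substitution. The {{format}} helper below is written for this example and is not the SLF4J implementation; the sample counts and the method name {{getFileInfo}} are made up.

```java
class LogArgOrderSketch {
  static final String PATTERN =
      "{} observers have failed for read request {}; also found "
      + "{} standby and {} active. Falling back to active.";

  /** Minimal stand-in for "{}" substitution: fills placeholders left to right. */
  static String format(String pattern, Object... args) {
    StringBuilder sb = new StringBuilder();
    int from = 0;
    for (Object arg : args) {
      int at = pattern.indexOf("{}", from);
      if (at < 0) {
        break; // more arguments than placeholders
      }
      sb.append(pattern, from, at).append(arg);
      from = at + 2;
    }
    return sb.append(pattern.substring(from)).toString();
  }

  public static void main(String[] args) {
    int failedObserverCount = 2;
    int standbyCount = 1;
    int activeCount = 1;
    String method = "getFileInfo"; // hypothetical value of method.getName()

    // Order as reported in the issue: the method name lands in the "active" slot.
    System.out.println(format(PATTERN, failedObserverCount, standbyCount, activeCount, method));
    // Proposed order: each value matches its placeholder.
    System.out.println(format(PATTERN, failedObserverCount, method, standbyCount, activeCount));
  }
}
```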




[jira] [Created] (HDFS-14059) Test reads from standby on a secure cluster with Configured failover

2018-11-08 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14059:
--

 Summary: Test reads from standby on a secure cluster with 
Configured failover
 Key: HDFS-14059
 URL: https://issues.apache.org/jira/browse/HDFS-14059
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov


Run standard HDFS tests to verify reading from ObserverNode on a secure HA 
cluster with {{ConfiguredFailoverProxyProvider}}.




[jira] [Created] (HDFS-14058) Test reads from standby on a secure cluster with IP failover

2018-11-08 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-14058:
--

 Summary: Test reads from standby on a secure cluster with IP 
failover
 Key: HDFS-14058
 URL: https://issues.apache.org/jira/browse/HDFS-14058
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Reporter: Konstantin Shvachko
Assignee: Chen Liang


Run standard HDFS tests to verify reading from ObserverNode on a secure HA 
cluster with {{IPFailoverProxyProvider}}.




[jira] [Reopened] (HDFS-12026) libhdfs++: Fix compilation errors and warnings when compiling with Clang

2018-10-19 Thread Konstantin Shvachko (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDFS-12026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko reopened HDFS-12026:


Reopening.
I think it is a blocker for release 3.2.
If there is no progress on this, I would recommend reverting.
Potentially the entire branch HDFS-8707; I didn't check how much of it 
relies on this change.

> libhdfs++: Fix compilation errors and warnings when compiling with Clang 
> -
>
> Key: HDFS-12026
> URL: https://issues.apache.org/jira/browse/HDFS-12026
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Anatoli Shein
>Assignee: Anatoli Shein
>Priority: Major
> Attachments: HDFS-12026.HDFS-8707.000.patch, 
> HDFS-12026.HDFS-8707.001.patch, HDFS-12026.HDFS-8707.002.patch, 
> HDFS-12026.HDFS-8707.003.patch, HDFS-12026.HDFS-8707.004.patch, 
> HDFS-12026.HDFS-8707.005.patch, HDFS-12026.HDFS-8707.006.patch, 
> HDFS-12026.HDFS-8707.007.patch, HDFS-12026.HDFS-8707.008.patch, 
> HDFS-12026.HDFS-8707.009.patch, HDFS-12026.HDFS-8707.010.patch
>
>
> Currently multiple errors and warnings prevent libhdfspp from being compiled 
> with clang. It should compile cleanly using flag:
> -std=c++11
> and also warning flags:
> -Weverything -Wno-c++98-compat -Wno-missing-prototypes 
> -Wno-c++98-compat-pedantic -Wno-padded -Wno-covered-switch-default 
> -Wno-missing-noreturn -Wno-unknown-pragmas -Wconversion -Werror




[jira] [Reopened] (HDFS-13974) Introduce the single Observer failure

2018-10-11 Thread Konstantin Shvachko (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDFS-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko reopened HDFS-13974:


> Introduce the single Observer failure
> -
>
> Key: HDFS-13974
> URL: https://issues.apache.org/jira/browse/HDFS-13974
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Sherwood Zheng
>Assignee: Sherwood Zheng
>Priority: Major
>
> Introduce the single Observer failure. Reads should be automatically 
> redirected
> to Active NameNode




[jira] [Resolved] (HDFS-13974) Introduce the single Observer failure

2018-10-11 Thread Konstantin Shvachko (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDFS-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-13974.

Resolution: Duplicate

Re-resolving as Duplicate

> Introduce the single Observer failure
> -
>
> Key: HDFS-13974
> URL: https://issues.apache.org/jira/browse/HDFS-13974
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Sherwood Zheng
>Assignee: Sherwood Zheng
>Priority: Major
>
> Introduce the single Observer failure. Reads should be automatically 
> redirected
> to Active NameNode




[jira] [Created] (HDFS-13961) TestObserverNode refactoring

2018-10-04 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-13961:
--

 Summary: TestObserverNode refactoring
 Key: HDFS-13961
 URL: https://issues.apache.org/jira/browse/HDFS-13961
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: HDFS-12943
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko


TestObserverNode combines unit tests for ObserverNode. The tests are of 
different types. I propose to split them into separate modules, factor out 
common methods, and optimize them so that MiniDFSCluster is started and shut 
down once for the entire test rather than for individual test cases.




[jira] [Resolved] (HDFS-13780) Postpone NameNode state discovery in ObserverReadProxyProvider until the first real RPC call.

2018-08-31 Thread Konstantin Shvachko (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDFS-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-13780.

   Resolution: Duplicate
Fix Version/s: HDFS-12943

I think it was incorporated, indeed.

> Postpone NameNode state discovery in ObserverReadProxyProvider until the 
> first real RPC call.
> -
>
> Key: HDFS-13780
> URL: https://issues.apache.org/jira/browse/HDFS-13780
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Chen Liang
>Priority: Major
> Fix For: HDFS-12943
>
>
> Currently {{ObserverReadProxyProvider}} during instantiation discovers 
> Observers by poking known NameNodes and checking their states. This rather 
> expensive process can be postponed until the first actual RPC call.
> This is an optimization.




[jira] [Created] (HDFS-13873) ObserverNode should reject read requests when it is too far behind.

2018-08-27 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-13873:
--

 Summary: ObserverNode should reject read requests when it is too 
far behind.
 Key: HDFS-13873
 URL: https://issues.apache.org/jira/browse/HDFS-13873
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode
Affects Versions: HDFS-12943
Reporter: Konstantin Shvachko


Add a server-side threshold for ObserverNode to reject read requests when it is 
too far behind.
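
One possible shape of such a check is sketched below. How "too far behind" is measured is not specified in this issue; the sketch assumes it is the gap, in transactions, between the state id the client has seen and the one the ObserverNode has applied. The class name and the {{maxLagTxns}} parameter are hypothetical.

```java
class ObserverLagCheckSketch {
  private final long maxLagTxns; // hypothetical server-side threshold

  ObserverLagCheckSketch(long maxLagTxns) {
    this.maxLagTxns = maxLagTxns;
  }

  /** True if the read should be rejected because the observer lags beyond the threshold. */
  boolean shouldReject(long clientLastSeenId, long observerLastAppliedId) {
    return clientLastSeenId - observerLastAppliedId > maxLagTxns;
  }

  public static void main(String[] args) {
    ObserverLagCheckSketch check = new ObserverLagCheckSketch(1000);
    System.out.println(check.shouldReject(5000, 4500)); // within threshold
    System.out.println(check.shouldReject(5000, 3000)); // too far behind
  }
}
```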




[jira] [Resolved] (HDFS-13782) ObserverReadProxyProvider should work with IPFailoverProxyProvider

2018-08-25 Thread Konstantin Shvachko (JIRA)



 [ 
https://issues.apache.org/jira/browse/HDFS-13782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-13782.

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: HDFS-12943
 Release Note: I just committed this.

> ObserverReadProxyProvider should work with IPFailoverProxyProvider
> --
>
> Key: HDFS-13782
> URL: https://issues.apache.org/jira/browse/HDFS-13782
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-13782-HDFS-12943.001.patch, 
> HDFS-13782-HDFS-12943.002.patch
>
>
> Currently {{ObserverReadProxyProvider}} is based on 
> {{ConfiguredFailoverProxyProvider}}. We should also be able to perform SBN reads 
> in case of {{IPFailoverProxyProvider}}.




[jira] [Created] (HDFS-13851) Remove AlignmentContext from AbstractNNFailoverProxyProvider

2018-08-22 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-13851:
--

 Summary: Remove AlignmentContext from 
AbstractNNFailoverProxyProvider
 Key: HDFS-13851
 URL: https://issues.apache.org/jira/browse/HDFS-13851
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-12943
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko


{{AlignmentContext}} is now a part of {{ObserverReadProxyProvider}}, we can 
remove it from the base class.




[jira] [Created] (HDFS-13848) Refactor NameNode failover proxy providers

2018-08-22 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-13848:
--

 Summary: Refactor NameNode failover proxy providers
 Key: HDFS-13848
 URL: https://issues.apache.org/jira/browse/HDFS-13848
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, hdfs-client
Affects Versions: 2.7.5
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko


Looking at NN failover proxy providers in the context of HDFS-13782 I noticed 
that {{ConfiguredFailoverProxyProvider}} and {{IPFailoverProxyProvider}} have a 
lot of common logic. We can move this common logic into 
{{AbstractNNFailoverProxyProvider}}, which simplifies things a lot.




[jira] [Created] (HDFS-13782) IPFailoverProxyProvider should work with SBN

2018-08-01 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-13782:
--

 Summary: IPFailoverProxyProvider should work with SBN
 Key: HDFS-13782
 URL: https://issues.apache.org/jira/browse/HDFS-13782
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Reporter: Konstantin Shvachko


Currently {{ObserverReadProxyProvider}} is based on 
{{ConfiguredFailoverProxyProvider}}. We should also be able to perform SBN reads 
in case of {{IPFailoverProxyProvider}}.




[jira] [Created] (HDFS-13781) Unit test for standby reads.

2018-08-01 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-13781:
--

 Summary: Unit test for standby reads.
 Key: HDFS-13781
 URL: https://issues.apache.org/jira/browse/HDFS-13781
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Reporter: Konstantin Shvachko


Create more unit tests supporting standby reads feature. Let's come up with a 
list of tests that provide sufficient test coverage.




[jira] [Created] (HDFS-13780) Postpone NameNode state discovery in ObserverReadProxyProvider until the first real RPC call.

2018-08-01 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-13780:
--

 Summary: Postpone NameNode state discovery in 
ObserverReadProxyProvider until the first real RPC call.
 Key: HDFS-13780
 URL: https://issues.apache.org/jira/browse/HDFS-13780
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Konstantin Shvachko


Currently {{ObserverReadProxyProvider}} during instantiation discovers 
Observers by poking known NameNodes and checking their states. This rather 
expensive process can be postponed until the first actual RPC call. 
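
The postponement amounts to lazy initialization. The sketch below assumes the expensive state check can be wrapped in a {{Supplier}}; the class, its {{probeCount}} field, and the NameNode names are hypothetical and only illustrate that the probe runs on the first call, not in the constructor.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

class LazyObserverDiscoverySketch {
  private final Supplier<List<String>> probe; // stand-in for poking NameNodes for their state
  private List<String> observers;             // stays null until the first call
  int probeCount = 0;

  LazyObserverDiscoverySketch(Supplier<List<String>> probe) {
    this.probe = probe; // no discovery at construction time
  }

  private synchronized List<String> observers() {
    if (observers == null) {
      observers = probe.get();
      probeCount++;
    }
    return observers;
  }

  /** Stand-in for an RPC call: discovery happens here, once, on first use. */
  String invoke(String method) {
    return method + " -> " + observers().get(0);
  }

  public static void main(String[] args) {
    LazyObserverDiscoverySketch p =
        new LazyObserverDiscoverySketch(() -> Arrays.asList("nn2", "nn3"));
    System.out.println(p.probeCount);
    System.out.println(p.invoke("getFileInfo"));
    System.out.println(p.invoke("listStatus"));
    System.out.println(p.probeCount);
  }
}
```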




[jira] [Created] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-01 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-13779:
--

 Summary: Implement performFailover logic for 
ObserverReadProxyProvider.
 Key: HDFS-13779
 URL: https://issues.apache.org/jira/browse/HDFS-13779
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Konstantin Shvachko


Currently {{ObserverReadProxyProvider}} inherits the {{performFailover()}} method 
from {{ConfiguredFailoverProxyProvider}}, which simply increments the index and 
switches over to another NameNode. The logic for ORPP should be smart enough to 
choose another observer; otherwise it can switch to an SBN, where reads are 
disallowed, or to an ANN, which defeats the purpose of reads from standby.
This was discussed in HDFS-12976.
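
The difference from plain index increment can be sketched as a search for the next NameNode that is in the observer state. The {{State}} enum and {{nextObserver}} helper are hypothetical and assume the provider already knows each NameNode's state.

```java
import java.util.Arrays;
import java.util.List;

class ObserverFailoverSketch {
  enum State { ACTIVE, STANDBY, OBSERVER }

  /** Index of the next OBSERVER after current (wrapping around), or -1 if there is no other one. */
  static int nextObserver(List<State> states, int current) {
    int n = states.size();
    for (int step = 1; step < n; step++) {
      int i = (current + step) % n;
      if (states.get(i) == State.OBSERVER) {
        return i;
      }
    }
    return -1; // caller would have to fall back, e.g. to the active NameNode
  }

  public static void main(String[] args) {
    List<State> states =
        Arrays.asList(State.ACTIVE, State.OBSERVER, State.STANDBY, State.OBSERVER);
    // Plain increment from index 1 would land on the standby at index 2.
    System.out.println(nextObserver(states, 1));
  }
}
```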




[jira] [Created] (HDFS-13778) In TestStateAlignmentContextWithHA replace artificial AlignmentContextProxyProvider with real ObserverReadProxyProvider.

2018-08-01 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-13778:
--

 Summary: In TestStateAlignmentContextWithHA replace artificial 
AlignmentContextProxyProvider with real ObserverReadProxyProvider.
 Key: HDFS-13778
 URL: https://issues.apache.org/jira/browse/HDFS-13778
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Reporter: Konstantin Shvachko


TestStateAlignmentContextWithHA uses an artificial 
AlignmentContextProxyProvider, which was temporarily needed for testing. Now that 
we have a real ObserverReadProxyProvider, it can replace ACPP. This is also 
useful for testing the ORPP.




[jira] [Created] (HDFS-13706) ClientGCIContext should be correctly named ClientGSIContext

2018-06-28 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-13706:
--

 Summary: ClientGCIContext should be correctly named 
ClientGSIContext
 Key: HDFS-13706
 URL: https://issues.apache.org/jira/browse/HDFS-13706
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Konstantin Shvachko


GSI stands for Global State Id. It's a client-side counterpart of NN's 
{{GlobalStateIdContext}}.




[jira] [Resolved] (HDFS-8675) IBRs from dead DNs go into infinite loop

2018-03-30 Thread Konstantin Shvachko (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-8675.
---
Resolution: Not A Problem

Looks like not a problem.

> IBRs from dead DNs go into infinite loop
> 
>
> Key: HDFS-8675
> URL: https://issues.apache.org/jira/browse/HDFS-8675
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Priority: Major
>
> If the DN sends an IBR after the NN declares it dead, the NN returns an IOE 
> of unregistered or dead.  The DN catches the IOE, ignores it, and infinitely 
> loops spamming the NN with retries.




[jira] [Resolved] (HDFS-11985) Intermittent unit test failures on 2.7.4 branch.

2018-03-30 Thread Konstantin Shvachko (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-11985.

   Resolution: Fixed
Fix Version/s: 2.7.6

These have been fixed elsewhere. I don't see those failures in 2.7.6 nightly 
builds anymore.

> Intermittent unit test failures on 2.7.4 branch.
> 
>
> Key: HDFS-11985
> URL: https://issues.apache.org/jira/browse/HDFS-11985
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.4
>Reporter: Konstantin Shvachko
>Priority: Major
> Fix For: 2.7.6
>
>
> Some unit tests are failing intermittently on Jenkins nightly builds for 
> branch-2.7.
> Here is the list of tests that failed more than once within the last week:
> * org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure.testUnderReplicationAfterVolFailure
> * org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten
> * org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport.testXceiverCount
> * org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot




[jira] [Created] (HDFS-12979) StandbyNode should upload FsImage to ObserverNode after checkpointing.

2018-01-02 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-12979:
--

 Summary: StandbyNode should upload FsImage to ObserverNode after 
checkpointing.
 Key: HDFS-12979
 URL: https://issues.apache.org/jira/browse/HDFS-12979
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Konstantin Shvachko


ObserverNode does not create checkpoints, so its fsimage file can get very old, 
making the bootstrap of ObserverNode too long. A StandbyNode should copy the latest 
fsimage to the ObserverNode(s) as well as to the ANN.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

[jira] [Created] (HDFS-12978) Fine-grained locking while consuming journal stream.

2018-01-02 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-12978:
--

 Summary: Fine-grained locking while consuming journal stream.
 Key: HDFS-12978
 URL: https://issues.apache.org/jira/browse/HDFS-12978
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Konstantin Shvachko


In the current implementation, the SBN consumes the entire segment of transactions 
under a single namesystem lock, which blocks reads for a long period of time 
until the segment is processed. We should break the locking into fine-grained 
chunks. In the extreme case, the lock should be released after each transaction 
is applied.
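
A minimal sketch of the chunked approach, assuming a {{ReentrantReadWriteLock}} as a stand-in for the namesystem lock and a plain array of transaction ids as the segment. The class, the {{chunkSize}} parameter (which must be positive), and the {{lockAcquisitions}} counter are hypothetical.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ChunkedApplySketch {
  final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  long lastAppliedTxId = 0;
  int lockAcquisitions = 0;

  /** Applies the segment in chunks, releasing the write lock between chunks so readers can get in. */
  void applySegment(long[] txIds, int chunkSize) {
    for (int start = 0; start < txIds.length; start += chunkSize) {
      int end = Math.min(start + chunkSize, txIds.length);
      lock.writeLock().lock();
      try {
        lockAcquisitions++;
        for (int i = start; i < end; i++) {
          lastAppliedTxId = txIds[i]; // stand-in for applying one edit to the namespace
        }
      } finally {
        lock.writeLock().unlock(); // readers may acquire the read lock here
      }
    }
  }

  public static void main(String[] args) {
    long[] segment = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    ChunkedApplySketch sbn = new ChunkedApplySketch();
    sbn.applySegment(segment, 3);
    System.out.println(sbn.lockAcquisitions + " lock acquisitions, last txId " + sbn.lastAppliedTxId);
  }
}
```

Setting {{chunkSize}} to 1 corresponds to the extreme case of releasing the lock after every transaction.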




[jira] [Created] (HDFS-12977) Add stateId to RPC headers.

2018-01-02 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-12977:
--

 Summary: Add stateId to RPC headers.
 Key: HDFS-12977
 URL: https://issues.apache.org/jira/browse/HDFS-12977
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ipc, namenode
Reporter: Konstantin Shvachko


stateId is a new field in the RPC headers of NameNode proto calls.
stateId is the journal transaction Id, which represents LastSeenId for the 
clients and LastWrittenId for NameNodes. See more in [reads from Standby design 
doc|https://issues.apache.org/jira/secure/attachment/12902925/ConsistentReadsFromStandbyNode.pdf].
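
On the client side, LastSeenId can be pictured as a running maximum of the stateIds received in response headers. The sketch below is an assumption about how such tracking might look; the class and method names are hypothetical and are not taken from the design doc.

```java
import java.util.concurrent.atomic.AtomicLong;

class ClientStateIdSketch {
  private final AtomicLong lastSeenStateId = new AtomicLong(Long.MIN_VALUE);

  /** Called with the stateId found in a response header; keeps the largest value seen. */
  void receiveResponseState(long serverStateId) {
    lastSeenStateId.accumulateAndGet(serverStateId, Math::max);
  }

  /** Value the client would put into the header of its next request. */
  long getLastSeenStateId() {
    return lastSeenStateId.get();
  }

  public static void main(String[] args) {
    ClientStateIdSketch client = new ClientStateIdSketch();
    client.receiveResponseState(5);
    client.receiveResponseState(3); // an older response must not move the id backwards
    client.receiveResponseState(9);
    System.out.println(client.getLastSeenStateId());
  }
}
```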




[jira] [Created] (HDFS-12976) Introduce StandbyReadProxyProvider

2018-01-02 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-12976:
--

 Summary: Introduce StandbyReadProxyProvider
 Key: HDFS-12976
 URL: https://issues.apache.org/jira/browse/HDFS-12976
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Konstantin Shvachko


{{StandbyReadProxyProvider}} should implement {{FailoverProxyProvider}} 
interface and be able to submit read requests to ANN and SBN(s).




[jira] [Created] (HDFS-12975) Changes to the NameNode to support reads from standby

2018-01-02 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-12975:
--

 Summary: Changes to the NameNode to support reads from standby
 Key: HDFS-12975
 URL: https://issues.apache.org/jira/browse/HDFS-12975
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Konstantin Shvachko


In order to support reads from standby, the NameNode needs changes to add an 
ObserverNode role, which turns off checkpointing and such.




[jira] [Created] (HDFS-12943) Consistent Reads from Standby Node

2017-12-19 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-12943:
--

 Summary: Consistent Reads from Standby Node
 Key: HDFS-12943
 URL: https://issues.apache.org/jira/browse/HDFS-12943
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs
Reporter: Konstantin Shvachko


StandbyNode in HDFS is a replica of the active NameNode. The states of the 
NameNodes are coordinated via the journal. It is natural to consider 
StandbyNode as a read-only replica. As with any replicated distributed system 
the problem of stale reads should be resolved. Our main goal is to provide 
reads from standby in a consistent way in order to enable a wide range of 
existing applications running on top of HDFS.




[jira] [Created] (HDFS-12884) BlockUnderConstructionFeature.truncateBlock should be of type BlockInfo

2017-12-04 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-12884:
--

 Summary: BlockUnderConstructionFeature.truncateBlock should be of 
type BlockInfo
 Key: HDFS-12884
 URL: https://issues.apache.org/jira/browse/HDFS-12884
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.7.4
Reporter: Konstantin Shvachko


{{BlockUnderConstructionFeature.truncateBlock}} type should be changed to 
{{BlockInfo}} from {{Block}}. {{truncateBlock}} is always assigned as 
{{BlockInfo}}, so this will avoid unnecessary casts.
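
The effect of the proposed type change can be sketched with simplified stand-in classes. These are not the real HDFS classes: {{SketchBlock}}, {{SketchBlockInfo}}, and both feature classes are hypothetical and only show why declaring the field with the type it is always assigned removes the downcast.

```java
class TruncateBlockTypeSketch {
  // Simplified stand-ins for illustration; not the real HDFS classes.
  static class SketchBlock {
    final long id;
    SketchBlock(long id) { this.id = id; }
  }

  static class SketchBlockInfo extends SketchBlock {
    final short replication;
    SketchBlockInfo(long id, short replication) {
      super(id);
      this.replication = replication;
    }
  }

  // Before: the field has the base type, so every reader must downcast.
  static class FeatureWithBaseType {
    private SketchBlock truncateBlock;
    void setTruncateBlock(SketchBlockInfo b) { truncateBlock = b; }
    short truncateReplication() {
      return ((SketchBlockInfo) truncateBlock).replication;
    }
  }

  // After: the field has the type it is always assigned, so no cast is needed.
  static class FeatureWithExactType {
    private SketchBlockInfo truncateBlock;
    void setTruncateBlock(SketchBlockInfo b) { truncateBlock = b; }
    short truncateReplication() {
      return truncateBlock.replication;
    }
  }

  public static void main(String[] args) {
    SketchBlockInfo info = new SketchBlockInfo(42L, (short) 3);
    FeatureWithBaseType before = new FeatureWithBaseType();
    FeatureWithExactType after = new FeatureWithExactType();
    before.setTruncateBlock(info);
    after.setTruncateBlock(info);
    System.out.println(before.truncateReplication() + " " + after.truncateReplication());
  }
}
```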




[jira] [Resolved] (HDFS-12638) Delete copy-on-truncate block along with the original block, when deleting a file being truncated

2017-11-30 Thread Konstantin Shvachko (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-12638.

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.4
   3.0.1
   2.9.1
   2.10.0
   3.1.0
   2.7.5

Just committed this into the following branches:
{code}
   3c57def..7998077  branch-2 -> branch-2
   7252e18..85eb32b  branch-2.7 -> branch-2.7
   eacccf1..19c18f7  branch-2.8 -> branch-2.8
   5a8a1e6..0f5ec01  branch-2.9 -> branch-2.9
   58d849b..def87db  branch-3.0 -> branch-3.0
   a63d19d..60fd0d7  trunk -> trunk
{code}
Thank you everybody for contributing.

> Delete copy-on-truncate block along with the original block, when deleting a 
> file being truncated
> -
>
> Key: HDFS-12638
> URL: https://issues.apache.org/jira/browse/HDFS-12638
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.8.2
>Reporter: Jiandan Yang 
>Assignee: Konstantin Shvachko
>Priority: Blocker
> Fix For: 2.7.5, 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-12638-branch-2.8.2.001.patch, HDFS-12638.002.patch, 
> HDFS-12638.003.patch, HDFS-12638.004.patch, OphanBlocksAfterTruncateDelete.jpg
>
>
> The active NameNode exited due to an NPE. I can confirm that the 
> BlockCollection passed in when creating ReplicationWork is null, but I do not 
> know why it is null. Looking at the history, I found that 
> [HDFS-9754|https://issues.apache.org/jira/browse/HDFS-9754] removed the check 
> for whether BlockCollection is null.
> The NN logs are as follows:
> {code:java}
> 2017-10-11 16:29:06,161 ERROR [ReplicationMonitor] org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: ReplicationMonitor thread received Runtime exception.
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:55)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1532)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1491)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3792)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3744)
>     at java.lang.Thread.run(Thread.java:834)
> {code}




[jira] [Resolved] (HDFS-9754) Avoid unnecessary getBlockCollection calls in BlockManager

2017-11-30 Thread Konstantin Shvachko (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-9754.
---
Resolution: Fixed

Resolving this, based on the discussion in HDFS-12638.
Filed HDFS-12880 instead.

> Avoid unnecessary getBlockCollection calls in BlockManager
> --
>
> Key: HDFS-9754
> URL: https://issues.apache.org/jira/browse/HDFS-9754
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.8.2, 3.0.0-alpha1, 2.9.0
>
> Attachments: HDFS-9754.000.patch, HDFS-9754.001.patch, 
> HDFS-9754.002.patch
>
>
> Currently BlockManager calls {{Namesystem#getBlockCollection}} in order to:
> 1. check if the block has already been abandoned
> 2. identify the storage policy of the block
> 3. meta save
> For #1 we can use BlockInfo's internal state instead of checking if the 
> corresponding file still exists.




[jira] [Created] (HDFS-12880) Disallow abandoned blocks in the BlocksMap

2017-11-30 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-12880:
--

 Summary: Disallow abandoned blocks in the BlocksMap
 Key: HDFS-12880
 URL: https://issues.apache.org/jira/browse/HDFS-12880
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.7.4
Reporter: Konstantin Shvachko


BlocksMap used to contain only valid blocks, that is, blocks belonging to a 
file. This issue is intended to restore that invariant. This was discussed in 
detail while fixing HDFS-12638.




[jira] [Reopened] (HDFS-9754) Avoid unnecessary getBlockCollection calls in BlockManager

2017-11-28 Thread Konstantin Shvachko (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko reopened HDFS-9754:
---

There is clearly value in the work done here.
I would rather revert the entire thing in order to unblock the 2.8 release, and 
then let people modify the patch.
Reopening this for now.

> Avoid unnecessary getBlockCollection calls in BlockManager
> --
>
> Key: HDFS-9754
> URL: https://issues.apache.org/jira/browse/HDFS-9754
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Fix For: 2.9.0, 3.0.0-alpha1, 2.8.2
>
> Attachments: HDFS-9754.000.patch, HDFS-9754.001.patch, 
> HDFS-9754.002.patch
>
>
> Currently BlockManager calls {{Namesystem#getBlockCollection}} in order to:
> 1. check if the block has already been abandoned
> 2. identify the storage policy of the block
> 3. meta save
> For #1 we can use BlockInfo's internal state instead of checking if the 
> corresponding file still exists.




[jira] [Created] (HDFS-12856) BlockReconstructionWork.chooseTargets() violates namesystem locking

2017-11-22 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-12856:
--

 Summary: BlockReconstructionWork.chooseTargets() violates 
namesystem locking
 Key: HDFS-12856
 URL: https://issues.apache.org/jira/browse/HDFS-12856
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.4
Reporter: Konstantin Shvachko


{{BlockReconstructionWork.chooseTargets()}} is called outside the namesystem 
lock, although it works with {{DatanodeDescriptor}} and 
{{DatanodeStorageInfo}}, which can change.




[jira] [Created] (HDFS-12855) Fsck violates namesystem locking

2017-11-22 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-12855:
--

 Summary: Fsck violates namesystem locking 
 Key: HDFS-12855
 URL: https://issues.apache.org/jira/browse/HDFS-12855
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.4
Reporter: Konstantin Shvachko


{{NamenodeFsck}} accesses {{FSNamesystem}} structures, such as INodes and 
BlockInfo, without holding a lock. See e.g. {{NamenodeFsck.blockIdCK()}}.
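
The general pattern being asked for can be sketched with a plain JDK read-write lock: every read of shared state happens between acquiring and releasing the read lock, so a concurrent writer cannot change it mid-lookup. This is a hedged illustration only; the class, the map, and the method names are hypothetical and do not reproduce the actual {{FSNamesystem}} or {{NamenodeFsck}} code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockedLookupSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Hypothetical shared state: block id -> owning file path.
  private final Map<Long, String> blockToFile = new HashMap<>();

  // Writers mutate the shared map only while holding the write lock.
  void addBlock(long blockId, String file) {
    lock.writeLock().lock();
    try {
      blockToFile.put(blockId, file);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Readers take the read lock for the whole lookup, so the answer is built
  // from a state that no writer is modifying at the same time.
  String describeBlock(long blockId) {
    lock.readLock().lock();
    try {
      String file = blockToFile.get(blockId);
      return file == null
          ? "block " + blockId + " does not exist"
          : "block " + blockId + " belongs to " + file;
    } finally {
      lock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    LockedLookupSketch ns = new LockedLookupSketch();
    ns.addBlock(1L, "/user/a/file");
    System.out.println(ns.describeBlock(1L));
    System.out.println(ns.describeBlock(2L));
  }
}
```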




[jira] [Created] (HDFS-12458) TestReencryptionWithKMS fails regularly

2017-09-14 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-12458:
--

 Summary: TestReencryptionWithKMS fails regularly
 Key: HDFS-12458
 URL: https://issues.apache.org/jira/browse/HDFS-12458
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: kms, test
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko


{{TestReencryptionWithKMS}} fails pretty often on Jenkins. Should fix it.




[jira] [Resolved] (HDFS-11896) Non-dfsUsed will be doubled on dead node re-registration

2017-07-27 Thread Konstantin Shvachko (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-11896.

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.3
   3.0.0-beta1
   2.7.4
   2.9.0

I just committed this to trunk, and branches 2, 2.8, and 2.7.
Thank you [~brahmareddy] and [~zhz].

> Non-dfsUsed will be doubled on dead node re-registration
> 
>
> Key: HDFS-11896
> URL: https://issues.apache.org/jira/browse/HDFS-11896
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Blocker
> Fix For: 2.9.0, 2.7.4, 3.0.0-beta1, 2.8.3
>
> Attachments: HDFS-11896-002.patch, HDFS-11896-003.patch, 
> HDFS-11896-004.patch, HDFS-11896-005.patch, HDFS-11896-006.patch, 
> HDFS-11896-007.patch, HDFS-11896-008.patch, HDFS-11896-branch-2.7-001.patch, 
> HDFS-11896-branch-2.7-002.patch, HDFS-11896-branch-2.7-003.patch, 
> HDFS-11896-branch-2.7-004.patch, HDFS-11896-branch-2.7-005.patch, 
> HDFS-11896-branch-2.7-006.patch, HDFS-11896-branch-2.7-008.patch, 
> HDFS-11896.patch
>
>
>  *Scenario:* 
> i) Make sure you have non-DFS data.
> ii) Stop the DataNode.
> iii) Wait until it becomes dead.
> iv) Restart it and check the non-DFS data.




[jira] [Created] (HDFS-11893) Fix TestDFSShell.testMoveWithTargetPortEmpty failure on branch-2.7

2017-05-26 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-11893:
--

 Summary: Fix TestDFSShell.testMoveWithTargetPortEmpty failure on 
branch-2.7
 Key: HDFS-11893
 URL: https://issues.apache.org/jira/browse/HDFS-11893
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.7.4
Reporter: Konstantin Shvachko


{{TestDFSShell.testMoveWithTargetPortEmpty()}} is consistently failing on 
branch-2.7.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

[jira] [Resolved] (HDFS-11078) NPE in LazyPersistFileScrubber

2017-05-26 Thread Konstantin Shvachko (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-11078.

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.2
   3.0.0-alpha3
   2.7.4
   2.9.0

I just committed this. Thank you [~elgoiri].

> NPE in LazyPersistFileScrubber
> --
>
> Key: HDFS-11078
> URL: https://issues.apache.org/jira/browse/HDFS-11078
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha3, 2.8.2
>
> Attachments: HDFS-11078.000.patch, HDFS-11078.001.patch, 
> HDFS-11078-branch-2.7.patch
>
>
> If a block is removed, it will be removed from the block map. When the 
> clearCorruptLazyPersistFiles() tries to delete the block, it may already be 
> deleted and generate a null pointer exception.
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:3820)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:3851)
>     at java.lang.Thread.run(Thread.java:745)




[jira] [Resolved] (HDFS-11867) Backport HDFS-6291 to branch 2.7

2017-05-24 Thread Konstantin Shvachko (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-11867.

  Resolution: Fixed
Hadoop Flags: Reviewed

Just committed this to branch-2.7. Thank you [~elgoiri]

> Backport HDFS-6291 to branch 2.7
> 
>
> Key: HDFS-11867
> URL: https://issues.apache.org/jira/browse/HDFS-11867
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
> Attachments: HDFS-6291-branch-2.7.patch
>
>





[jira] [Resolved] (HDFS-11731) Balancer.run() prints redundant included, excluded, source nodes.

2017-05-24 Thread Konstantin Shvachko (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-11731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-11731.

Resolution: Invalid

Yes, I agree there is no redundancy. Not sure now where I saw it. 
[~vrushalic] thank you for verifying. Closing.

> Balancer.run() prints redundant included, excluded, source nodes.
> -
>
> Key: HDFS-11731
> URL: https://issues.apache.org/jira/browse/HDFS-11731
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.8.0
>Reporter: Konstantin Shvachko
>  Labels: newbie
>
> Included, excluded, and source nodes are printed twice by the Balancer. First 
> as part of {{BalancerParameters.toString()}} in
> {code}
> LOG.info("parameters = " + p);
> {code}
> And then separately
> {code}
> LOG.info("included nodes = " + p.getIncludedNodes());
> LOG.info("excluded nodes = " + p.getExcludedNodes());
> LOG.info("source nodes = " + p.getSourceNodes());
> {code}
> The latter can be removed.




[jira] [Resolved] (HDFS-2538) option to disable fsck dots

2017-05-18 Thread Konstantin Shvachko (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved HDFS-2538.
---
Resolution: Fixed

Resolving it back. The incompatibility concern is valid. I am still thinking 
about whether we can / should include it. Sorry for the confusion.
Thanks for the patch [~elgoiri].

> option to disable fsck dots 
> 
>
> Key: HDFS-2538
> URL: https://issues.apache.org/jira/browse/HDFS-2538
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Allen Wittenauer
>Assignee: Mohammad Kamrul Islam
>Priority: Minor
>  Labels: newbie, release-blocker
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-2538.1.patch, HDFS-2538.2.patch, HDFS-2538.3.patch, 
> HDFS-2538-branch-0.20-security-204.patch, 
> HDFS-2538-branch-0.20-security-204.patch, HDFS-2538-branch-1.0.patch, 
> HDFS-2538-branch-2.7.patch
>
>
> this patch turns the dots during fsck off by default and provides an option 
> to turn them back on if you have a fetish for millions and millions of dots 
> on your terminal.  i haven't done any benchmarks, but i suspect fsck is now 
> 300% faster to boot.




[jira] [Created] (HDFS-11736) OIV tests should not write outside 'target' directory.

2017-05-01 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-11736:
--

 Summary: OIV tests should not write outside 'target' directory.
 Key: HDFS-11736
 URL: https://issues.apache.org/jira/browse/HDFS-11736
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Konstantin Shvachko


A few tests use {{Files.createTempDir()}} from the Guava package, but do not 
set the {{java.io.tmpdir}} system property. Thus the temp directory is created 
in unpredictable places and is not cleaned up by {{mvn clean}}.
This was probably introduced in {{TestOfflineImageViewer}} and then replicated 
in {{TestCheckpoint}}, {{TestStandbyCheckpoints}}.
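
One possible direction, sketched below, is to create scratch directories explicitly under the build output so that {{mvn clean}} removes them. The sketch swaps in the JDK's {{java.nio.file.Files.createTempDirectory}} in place of Guava's {{Files.createTempDir()}}; the class and method names, the {{target/test-tmp}} location, and the {{oiv-}} prefix are assumptions for illustration, not what the tests currently do.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TargetTempDirSketch {
  // Creates a uniquely named directory under the given base, creating the
  // base itself first if it does not exist yet.
  static Path createTempDirUnder(Path base, String prefix) {
    try {
      Files.createDirectories(base);
      return Files.createTempDirectory(base, prefix);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Assumes the working directory is the Maven module root.
    Path base = Paths.get("target", "test-tmp");
    Path dir = createTempDirUnder(base, "oiv-");
    System.out.println("created " + dir.toAbsolutePath());
  }
}
```

Setting {{java.io.tmpdir}} for the test JVM would be an alternative that keeps the existing Guava call; which is preferable depends on how the affected tests are run.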




[jira] [Created] (HDFS-11733) TestGetBlocks.getBlocksWithException() ignores datanode and size parameters.

2017-05-01 Thread Konstantin Shvachko (JIRA)

Konstantin Shvachko created HDFS-11733:
--

 Summary: TestGetBlocks.getBlocksWithException() ignores datanode 
and size parameters.
 Key: HDFS-11733
 URL: https://issues.apache.org/jira/browse/HDFS-11733
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover, test
Affects Versions: 2.6.1
Reporter: Konstantin Shvachko


{{TestGetBlocks.getBlocksWithException()}} has 3 parameters, but uses only one. 
So whatever callers think they pass in, it is ignored.
Looks like we should change it to use the parameters, but I am not sure how 
this will affect the test.



