[jira] [Reopened] (HDFS-13732) ECAdmin should print the policy name when an EC policy is set

2018-11-13 Thread Xiao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen reopened HDFS-13732:
--

> ECAdmin should print the policy name when an EC policy is set
> -------------------------------------------------------------
>
> Key: HDFS-13732
> URL: https://issues.apache.org/jira/browse/HDFS-13732
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, tools
>Affects Versions: 3.0.0
>Reporter: Soumyapn
>Assignee: Zsolt Venczel
>Priority: Trivial
> Fix For: 3.2.0
>
> Attachments: EC_Policy.PNG, HDFS-13732.01.patch
>
>
> Scenario:
> If a policy other than the default EC policy is set on an HDFS directory, 
> the console message still reads "Set default erasure coding policy on ".
> Expected output:
> It would be good if the EC policy name were displayed when the policy is set.
>  
> Actual output:
> Set default erasure coding policy on 
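
For illustration, a minimal sketch of the requested behavior, assuming the fix lands where ECAdmin prints its confirmation (method names follow the public DistributedFileSystem API, but treat this as a sketch, not the committed patch):

{code:java}
// Sketch only: print the concrete policy name rather than "default".
dfs.setErasureCodingPolicy(path, ecPolicyName);
// When no name was given, resolve what was actually applied.
String applied = (ecPolicyName != null)
    ? ecPolicyName
    : dfs.getErasureCodingPolicy(path).getName();
System.out.println("Set " + applied + " erasure coding policy on " + path);
{code}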






[jira] [Created] (HDFS-14039) ec -listPolicies doesn't show correct state for the default policy when the default is not RS(6,3)

2018-10-30 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-14039:


 Summary: ec -listPolicies doesn't show correct state for the 
default policy when the default is not RS(6,3)
 Key: HDFS-14039
 URL: https://issues.apache.org/jira/browse/HDFS-14039
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: erasure-coding
Affects Versions: 3.0.0
Reporter: Xiao Chen
Assignee: Kitti Nanasi


{noformat}
$ hdfs ec -listPolicies
Erasure Coding Policies:
ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, 
numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5], State=DISABLED
ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, 
numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2], State=DISABLED
ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, 
numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1], State=ENABLED
ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, 
Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], 
CellSize=1048576, Id=3], State=DISABLED
ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, 
numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4], State=DISABLED
$ hdfs ec -enablePolicy -policy XOR-2-1-1024k
Erasure coding policy XOR-2-1-1024k is enabled
$ hdfs ec -listPolicies
Erasure Coding Policies:
ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, 
numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5], State=DISABLED
ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, 
numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2], State=DISABLED
ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, 
numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1], State=ENABLED
ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, 
Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], 
CellSize=1048576, Id=3], State=DISABLED
ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, 
numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4], State=ENABLED
--
$ #set default to be RS-3-2 for dfs.namenode.ec.system.default.policy, and 
restart NN
(this seems to be what's triggering the failure)
---
$ hdfs ec -listPolicies
Erasure Coding Policies:
ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, 
numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5], State=DISABLED
ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, 
numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2], State=DISABLED
ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, 
numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1], State=ENABLED
ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, 
Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], 
CellSize=1048576, Id=3], State=DISABLED
ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, 
numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4], State=ENABLED
$ hdfs ec -enablePolicy -policy RS-3-2-1024k
Erasure coding policy RS-3-2-1024k is enabled
$ hdfs ec -listPolicies
Erasure Coding Policies:
ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, 
numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5], State=DISABLED
ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, 
numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2], State=DISABLED
ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, 
numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1], State=ENABLED
ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, 
Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], 
CellSize=1048576, Id=3], State=DISABLED
ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, 
numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4], State=ENABLED
{noformat}

The last two listings should show RS-3-2 as ENABLED, and RS-6-3 as DISABLED if 
it was not explicitly enabled before.
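
For illustration, one plausible shape of the expected semantics, assuming -listPolicies should treat the configured default policy as enabled (a hypothetical helper, not the committed fix):

{code:java}
// Hypothetical sketch: a policy is effectively ENABLED if it is in the
// enabled set or is the configured dfs.namenode.ec.system.default.policy.
static ErasureCodingPolicyState effectiveState(ErasureCodingPolicy policy,
    Set<String> enabledPolicies, String defaultPolicyName) {
  if (enabledPolicies.contains(policy.getName())
      || policy.getName().equals(defaultPolicyName)) {
    return ErasureCodingPolicyState.ENABLED;
  }
  return ErasureCodingPolicyState.DISABLED;
}
{code}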






[jira] [Created] (HDFS-14038) Expose HdfsDataOutputStreamBuilder to include Spark in LimitedPrivate

2018-10-30 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-14038:


 Summary: Expose HdfsDataOutputStreamBuilder to include Spark in 
LimitedPrivate
 Key: HDFS-14038
 URL: https://issues.apache.org/jira/browse/HDFS-14038
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Xiao Chen


In SPARK-25855 / 
https://github.com/apache/spark/pull/22881#issuecomment-434359237, Spark 
prefers to create its event log files with replication (instead of EC). 
Currently this requires casting or reflection to obtain a 
DistributedFileSystem object (or to use {{HdfsDataOutputStreamBuilder}}, the 
builder subclass it returns).

We should officially expose this for Spark's usage.
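
For reference, a sketch of the workaround described above (assuming current Hadoop 3.x method names; Spark's actual code may differ):

{code:java}
// Current workaround sketch: downcast to reach the HDFS-specific builder.
FileSystem fs = path.getFileSystem(conf);
FSDataOutputStream out;
if (fs instanceof DistributedFileSystem) {
  // replicate() forces a replicated file even under an EC directory policy.
  out = ((DistributedFileSystem) fs).createFile(path).replicate().build();
} else {
  out = fs.create(path);
}
{code}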






[jira] [Created] (HDFS-14027) DFSStripedOutputStream should implement both hsync methods

2018-10-24 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-14027:


 Summary: DFSStripedOutputStream should implement both hsync methods
 Key: HDFS-14027
 URL: https://issues.apache.org/jira/browse/HDFS-14027
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: erasure-coding
Affects Versions: 3.0.0
Reporter: Xiao Chen
Assignee: Xiao Chen


In an internal Spark investigation, it appears that when 
[EventLoggingListener|https://github.com/apache/spark/blob/7251be0c04f0380208e0197e559158a9e1400868/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L152-L155]
 writes to an EC file, readers may get exceptions or odd output. A 
sample exception is
{noformat}
hdfs dfs -cat /user/spark/applicationHistory/application_1540333573846_0003 | 
head -1
18/10/23 18:12:39 WARN impl.BlockReaderFactory: I/O error constructing remote 
block reader.
java.io.IOException: Got error, status=ERROR, status message opReadBlock 
BP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 received 
exception java.io.IOException:  Offset 0 and length 116161 don't match block 
BP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 ( blockLen 
110296 ), for OP_READ_BLOCK, self=/HOST_IP:48610, remote=/HOST2_IP:20002, for 
file /user/spark/applicationHistory/application_1540333573846_0003, for pool 
BP-1488936467-HOST_IP-154092519 block -9223372036854774960_1085
at 
org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:134)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:110)
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.checkSuccess(BlockReaderRemote.java:440)
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:408)
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:848)
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:744)
at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379)
at 
org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644)
at 
org.apache.hadoop.hdfs.DFSStripedInputStream.createBlockReader(DFSStripedInputStream.java:264)
at org.apache.hadoop.hdfs.StripeReader.readChunk(StripeReader.java:299)
at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:330)
at 
org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:326)
at 
org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:419)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:92)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:66)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:127)
at 
org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:101)
at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:96)
at 
org.apache.hadoop.fs.shell.Command.processPathInternal(Command.java:367)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331)
at 
org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:304)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:286)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:270)
at 
org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119)
at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:326)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:389)
18/10/23 18:12:39 WARN hdfs.DFSClient: Failed to connect to /HOST2_IP:20002 for 
blockBP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085
java.io.IOException: Got error, status=ERROR, status message opReadBlock 
BP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 received 
exception java.io.IOException:  Offset 0 and length 116161 don't match block 
BP-1488936467-HOST_IP-154092519:blk_-9223372036854774960_1085 ( blockLen 
110296 ), for OP_READ_BLOCK, self=/HOST_IP:48610, remote=/HOST2_IP:20002, for 
file /user/spark/applicationHistory/application_1540333573846_0003, for pool 
BP-1488936467-HOST_IP-154092519 block -9223372036854774960_1085

[jira] [Created] (HDFS-14021) TestReconstructStripedBlocksWithRackAwareness#testReconstructForNotEnoughRacks fails intermittently

2018-10-23 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-14021:


 Summary: 
TestReconstructStripedBlocksWithRackAwareness#testReconstructForNotEnoughRacks 
fails intermittently
 Key: HDFS-14021
 URL: https://issues.apache.org/jira/browse/HDFS-14021
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: erasure-coding, test
Affects Versions: 3.0.0
Reporter: Xiao Chen
Assignee: Xiao Chen


The test sometimes fails with:
{noformat}
java.lang.AssertionError: expected:<0> but was:<1>

at 
org.apache.hadoop.hdfs.server.blockmanagement.TestReconstructStripedBlocksWithRackAwarness.testReconstructForNotEnoughRacks(TestReconstructStripedBlocksWithRackAwareness.java:171)

{noformat}






[jira] [Created] (HDFS-13998) ECAdmin NPE with -setPolicy -replicate

2018-10-16 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13998:


 Summary: ECAdmin NPE with -setPolicy -replicate
 Key: HDFS-13998
 URL: https://issues.apache.org/jira/browse/HDFS-13998
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: erasure-coding
Affects Versions: 3.2.0, 3.1.2
Reporter: Xiao Chen
Assignee: Zsolt Venczel


HDFS-13732 improved the output of the console tool, but we missed the fact 
that for replication, {{getErasureCodingPolicy}} returns null.

This jira is to fix that in ECAdmin, and add a unit test.
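
For illustration, a minimal sketch of the null handling (not the committed patch):

{code:java}
// Sketch: -setPolicy -replicate leaves no EC policy on the path, so
// getErasureCodingPolicy returns null and must not be dereferenced.
ErasureCodingPolicy ecPolicy = dfs.getErasureCodingPolicy(path);
if (ecPolicy == null) {
  System.out.println("Set replication on " + path);
} else {
  System.out.println("Set " + ecPolicy.getName()
      + " erasure coding policy on " + path);
}
{code}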






[jira] [Created] (HDFS-13926) ThreadLocal aggregations for FileSystem.Statistics are incorrect with striped reads

2018-09-17 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13926:


 Summary: ThreadLocal aggregations for FileSystem.Statistics are 
incorrect with striped reads
 Key: HDFS-13926
 URL: https://issues.apache.org/jira/browse/HDFS-13926
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: erasure-coding
Affects Versions: 3.0.0
Reporter: Xiao Chen
Assignee: Xiao Chen


During some integration testing, [~nsheth] found that per-thread read stats 
for EC are incorrect. This is because striped reads are done asynchronously on 
worker threads, so the bytes they read are never accounted to the calling 
thread.
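
For illustration, one possible repair shape (hypothetical names; not the actual patch):

{code:java}
// FileSystem.Statistics keeps per-thread counters, so bytes read on EC
// worker threads never reach the caller's counter. One repair shape:
// have each chunk future report its byte count and account it here.
long totalRead = 0;
for (Future<Integer> chunk : chunkFutures) {  // hypothetical futures
  totalRead += chunk.get();
}
statistics.incrementBytesRead(totalRead);     // runs on the caller's thread
{code}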






[jira] [Created] (HDFS-13847) Clean up ErasureCodingPolicyManager

2018-08-22 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13847:


 Summary: Clean up ErasureCodingPolicyManager
 Key: HDFS-13847
 URL: https://issues.apache.org/jira/browse/HDFS-13847
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: erasure-coding
Affects Versions: 3.0.0
Reporter: Xiao Chen


The {{ErasureCodingPolicyManager}} class is declared LimitedPrivate for HDFS.

This doesn't seem to make sense: I have checked that all its usages are 
strictly within the hadoop-hdfs project.
According to our [compat 
guide|http://hadoop.apache.org/docs/r3.1.0/hadoop-project-dist/hadoop-common/Compatibility.html]:
{quote}
Within a component Hadoop developers are free to use Private and Limited 
Private APIs,
{quote}

We should tune this down to just Private.

This was identified because internal testing marked HDFS-13772 as 
incompatible, due to method signature changes on the ECPM class.
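
The change itself would be a one-line annotation swap, assuming the class is currently annotated like the first form:

{code:java}
// Before: other components claiming "HDFS" usage may depend on it.
@InterfaceAudience.LimitedPrivate({"HDFS"})
public final class ErasureCodingPolicyManager { /* ... */ }

// After: strictly internal, so signature changes are not incompatibilities.
@InterfaceAudience.Private
public final class ErasureCodingPolicyManager { /* ... */ }
{code}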






[jira] [Created] (HDFS-13788) Update EC documentation about rack fault tolerance

2018-08-02 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13788:


 Summary: Update EC documentation about rack fault tolerance
 Key: HDFS-13788
 URL: https://issues.apache.org/jira/browse/HDFS-13788
 Project: Hadoop HDFS
  Issue Type: Task
  Components: documentation, erasure-coding
Affects Versions: 3.0.0
Reporter: Xiao Chen
Assignee: Kitti Nanasi


From 
http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html:
{quote}
For rack fault-tolerance, it is also important to have at least as many racks 
as the configured EC stripe width. For EC policy RS (6,3), this means minimally 
9 racks, and ideally 10 or 11 to handle planned and unplanned outages. For 
clusters with fewer racks than the stripe width, HDFS cannot maintain rack 
fault-tolerance, but will still attempt to spread a striped file across 
multiple nodes to preserve node-level fault-tolerance.
{quote}
The theoretical minimum is 3 racks, and ideally 9 or more, so the document 
should be updated. (With rack-fault-tolerant placement spreading the 9 blocks 
of RS(6,3) evenly, 3 racks hold at most 3 blocks each, so losing a whole rack 
loses at most 3 blocks, which the 3 parity blocks can tolerate.)

(I didn't check timestamps, but this is probably because 
{{BlockPlacementPolicyRackFaultTolerant}} wasn't complete when HDFS-9088 
introduced this doc. Later, examples were added in 
{{TestErasureCodingMultipleRacks}} to test this explicitly.)






[jira] [Created] (HDFS-13741) Cosmetic code improvement in XAttrFormat

2018-07-17 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13741:


 Summary: Cosmetic code improvement in XAttrFormat
 Key: HDFS-13741
 URL: https://issues.apache.org/jira/browse/HDFS-13741
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Xiao Chen
Assignee: Daniel Templeton


In an offline review, [~templedf] had a comment about the following code 
snippet.

 

{code:java title=XAttrFormat.java}
static int toInt(XAttr.NameSpace namespace, String name) {
  long xattrStatusInt = 0; // <-- this can be combined with the line below

  xattrStatusInt = NAMESPACE.BITS
  .combine(namespace.ordinal(), xattrStatusInt);
  int nid = XAttrStorage.getNameSerialNumber(name);
  xattrStatusInt = NAME.BITS // <-- no line break necessary
  .combine(nid, xattrStatusInt);

  return (int) xattrStatusInt;
}
 {code}
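
Applying both comments, the snippet might look like this (illustrative):

{code:java}
static int toInt(XAttr.NameSpace namespace, String name) {
  // Initialization and first combine collapsed into one statement.
  long xattrStatusInt = NAMESPACE.BITS.combine(namespace.ordinal(), 0);
  int nid = XAttrStorage.getNameSerialNumber(name);
  xattrStatusInt = NAME.BITS.combine(nid, xattrStatusInt);
  return (int) xattrStatusInt;
}
{code}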






[jira] [Resolved] (HDFS-13690) Improve error message when creating encryption zone while KMS is unreachable

2018-07-16 Thread Xiao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen resolved HDFS-13690.
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0

Committed to trunk. Thanks for the contribution [~knanasi] !

> Improve error message when creating encryption zone while KMS is unreachable
> 
>
> Key: HDFS-13690
> URL: https://issues.apache.org/jira/browse/HDFS-13690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: encryption, hdfs, kms
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HDFS-13690.001.patch, HDFS-13690.002.patch, 
> HDFS-13690.003.patch, HDFS-13690.004.patch
>
>
> In failure testing, we stopped the KMS and then tried to run some encryption 
> related commands.
> {{hdfs crypto -createZone}} will complain with a short "RemoteException: 
> Connection refused." This message could be improved to explain that we cannot 
> connect to the KMSClientProvider.
> For example, {{hadoop key list}} while KMS is down will error:
> {code}
>  -bash-4.1$ hadoop key list
>  Cannot list keys for KeyProvider: 
> KMSClientProvider[http://hdfs-cdh5-vanilla-1.vpc.cloudera.com:16000/kms/v1/]: 
> Connection refusedjava.net.ConnectException: Connection refused
>  at java.net.PlainSocketImpl.socketConnect(Native Method)
>  at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>  at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>  at java.net.Socket.connect(Socket.java:579)
>  at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
>  at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
>  at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
>  at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
>  at sun.net.www.http.HttpClient.New(HttpClient.java:308)
>  at sun.net.www.http.HttpClient.New(HttpClient.java:326)
>  at 
> sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996)
>  at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
>  at 
> sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850)
>  at 
> org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:186)
>  at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:125)
>  at 
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:216)
>  at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.openConnection(DelegationTokenAuthenticatedURL.java:312)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$1.run(KMSClientProvider.java:397)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider$1.run(KMSClientProvider.java:392)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.createConnection(KMSClientProvider.java:392)
>  at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.getKeys(KMSClientProvider.java:479)
>  at 
> org.apache.hadoop.crypto.key.KeyShell$ListCommand.execute(KeyShell.java:286)
>  at org.apache.hadoop.crypto.key.KeyShell.run(KeyShell.java:79)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>  at org.apache.hadoop.crypto.key.KeyShell.main(KeyShell.java:513)
> {code}
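
For illustration, the kind of wrapping the description asks for, assuming it is added around the KMS connection attempt ({{conn}} and {{kmsUrl}} are hypothetical local names; this is a sketch, not the committed patch):

{code:java}
try {
  conn.connect();  // HttpURLConnection to the KMS endpoint
} catch (ConnectException ex) {
  // Name the provider we failed to reach instead of a bare "Connection refused".
  throw new IOException("Failed to connect to KMSClientProvider at " + kmsUrl
      + "; check that the KMS is running and reachable.", ex);
}
{code}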






[jira] [Created] (HDFS-13731) Investigate TestReencryption timeouts

2018-07-11 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13731:


 Summary: Investigate TestReencryption timeouts
 Key: HDFS-13731
 URL: https://issues.apache.org/jira/browse/HDFS-13731
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, test
Affects Versions: 3.0.0
Reporter: Xiao Chen


HDFS-12837 fixed some flakiness in the re-encryption related tests. But as 
[~zvenczel] commented, there are still a few timeouts. We should investigate 
those.






[jira] [Created] (HDFS-13721) NPE in DataNode due to uninitialized DiskBalancer

2018-07-05 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13721:


 Summary: NPE in DataNode due to uninitialized DiskBalancer
 Key: HDFS-13721
 URL: https://issues.apache.org/jira/browse/HDFS-13721
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, diskbalancer
Reporter: Xiao Chen
Assignee: Xiao Chen


{noformat}
2018-06-28 05:11:47,650 ERROR org.apache.hadoop.jmx.JMXJsonServlet: getting 
attribute DiskBalancerStatus of Hadoop:service=DataNode,name=DataNodeInfo threw 
an exception
javax.management.RuntimeMBeanException: java.lang.NullPointerException
 * TRACEBACK 4 *
 at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
 at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
 at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
 at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
 at org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:338)
 at org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:316)
 at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:210)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
 at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
 at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
 at 
org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:110)
 at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
 at 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1537)
 at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
 at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
 at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
 at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
 at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
 at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
 at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
 at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
 at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
 at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
 at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
 at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
 at org.eclipse.jetty.server.Server.handle(Server.java:534)
 at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
 at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
 at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
 at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
 at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
 at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
 at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
 at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
 at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
 at 
org.apache.hadoop.hdfs.server.datanode.DataNode.getDiskBalancerStatus(DataNode.java:3146)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
 at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
 at 
com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:193)
 at

[jira] [Created] (HDFS-13682) Cannot create encryption zone after KMS auth token expires

2018-06-14 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13682:


 Summary: Cannot create encryption zone after KMS auth token expires
 Key: HDFS-13682
 URL: https://issues.apache.org/jira/browse/HDFS-13682
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption, namenode
Affects Versions: 3.0.0
Reporter: Xiao Chen
Assignee: Xiao Chen
 Attachments: HDFS-13682.dirty.repro.patch

Our internal testing reported this behavior recently.
{noformat}
[root@nightly6x-1 ~]# sudo -u hdfs /usr/bin/kinit -kt /cdep/keytabs/hdfs.keytab 
hdfs -l 30d -r 30d
[root@nightly6x-1 ~]# sudo -u hdfs klist
Ticket cache: FILE:/tmp/krb5cc_994
Default principal: h...@gce.cloudera.com

Valid starting   Expires  Service principal
06/12/2018 03:24:09  07/12/2018 03:24:09  
krbtgt/gce.cloudera@gce.cloudera.com
[root@nightly6x-1 ~]# sudo -u hdfs hdfs crypto -createZone -keyName key77 -path 
/user/systest/ez
RemoteException: 
org.apache.hadoop.security.authentication.client.AuthenticationException: 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt)
{noformat}

Upon further investigation, this is because the KMS client (cached in the HDFS 
NN) cannot authenticate with the server after the authentication token (cached 
by the KMSClientProvider) expires, even though the HDFS client RPC has valid 
Kerberos credentials.
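
Purely illustrative of the failure mode described (all names hypothetical; the real KMSClientProvider logic is more involved):

{code:java}
// Hypothetical sketch: when the cached auth token has expired, drop it and
// re-authenticate with the caller's still-valid Kerberos credentials.
try {
  return doKmsCall(cachedAuthToken);
} catch (AuthenticationException e) {
  cachedAuthToken = null;                         // discard the expired token
  return doKmsCall(authenticateWithKerberos());   // hypothetical helper
}
{code}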






Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-06-08 Thread Xiao Chen
Thanks for the effort on this Yongjun.

+1 (binding)

   - Built from src
   - Deployed a pseudo distributed HDFS with KMS
   - Ran basic hdfs commands with encryption
   - Sanity checked webui and logs


-Xiao

On Fri, Jun 8, 2018 at 10:34 AM, Brahma Reddy Battula <
brahmareddy.batt...@hotmail.com> wrote:

> Thanks yongjun zhang for driving this release.
>
> +1 (binding).
>
>
> ---Built from the source
> ---Installed HA cluster
> ---Execute the basic shell commands
> ---Browsed the UI's
> ---Ran sample jobs like pi,wordcount
>
>
>
> 
> From: Yongjun Zhang 
> Sent: Friday, June 8, 2018 1:04 PM
> To: Allen Wittenauer
> Cc: Hadoop Common; Hdfs-dev; mapreduce-...@hadoop.apache.org;
> yarn-...@hadoop.apache.org
> Subject: Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)
>
> BTW, thanks Allen and Steve for discussing and suggesting fixes for the site
> build problem I hit earlier. Running the following step
>
> mvn install -DskipTests
>
> before doing the steps Nanda listed solved the problems.
>
> --Yongjun
>
>
>
>
> On Thu, Jun 7, 2018 at 6:15 PM, Yongjun Zhang  wrote:
>
> > Thank you all very much for the testing, feedback and discussion!
> >
> > I was able to build outside docker by following the steps Nanda
> > described, and I saw the same problem; I then tried 3.0.2, released a while
> > back, and it has the same issue.
> >
> > As Allen pointed out, it seems the steps to build the site are not correct.
> > I have not figured out the correct steps yet.
> >
> > At this point, I think this issue should not block the 3.0.3 release, while
> > at the same time we need to figure out the right steps to build the site.
> > Would you please let me know if you think differently?
> >
> > We only have the site build issue reported so far, and we don't have
> > enough PMC votes yet, so we need some more PMCs to help.
> >
> > Thanks again, and best regards,
> >
> > --Yongjun
> >
> >
> > On Thu, Jun 7, 2018 at 4:15 PM, Allen Wittenauer <
> a...@effectivemachines.com
> > > wrote:
> >
> >> > On Jun 7, 2018, at 11:47 AM, Steve Loughran 
> >> wrote:
> >> >
> >> > Actually, Yongjun has been really good at helping me get set up for a
> >> 2.7.7 release, including "things you need to do to get GPG working in
> the
> >> docker image”
> >>
> >> *shrugs* I use a different release script after some changes
> >> broke the in-tree version for building on OS X and I couldn’t get the
> fixes
> >> committed upstream.  So not sure what the problems are that you are
> hitting.
> >>
> >> > On Jun 7, 2018, at 1:08 PM, Nandakumar Vadivelu <
> >> nvadiv...@hortonworks.com> wrote:
> >> >
> >> > It will be helpful if we can get the correct steps, and also update
> the
> >> wiki.
> >> > https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+
> >> Release+Validation
> >>
> >> Yup. Looking forward to seeing it.
> >> -
> >> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> >> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> >>
> >>
> >
>


[jira] [Created] (HDFS-13655) Adding missing ClientProtocol methods to RBF

2018-06-04 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13655:


 Summary: Adding missing ClientProtocol methods to RBF
 Key: HDFS-13655
 URL: https://issues.apache.org/jira/browse/HDFS-13655
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Xiao Chen


As discussed with [~elgoiri], there are some HDFS methods that do not take a 
path as a parameter. We should make these work with federation.

The ones missing are:
 * Snapshots
 * Storage policies
 * Encryption zones
 * Cache pools

One reasonable way to have them work with federation is to 'list' each 
nameservice and concatenate the results. This can be done much the same way as 
{{refreshNodes()}}: query all the subclusters and aggregate the output (e.g., 
{{getDatanodeReport()}}).
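
For illustration, a sketch of the fan-out-and-concatenate idea, with hypothetical router-side helpers:

{code:java}
// Hypothetical sketch: fan a path-less call out to every subcluster and
// concatenate the results, as refreshNodes()/getDatanodeReport() do.
List<DatanodeInfo> all = new ArrayList<>();
for (String ns : getAllNameservices()) {          // hypothetical helper
  ClientProtocol client = getClientFor(ns);       // hypothetical helper
  Collections.addAll(all, client.getDatanodeReport(DatanodeReportType.ALL));
}
return all.toArray(new DatanodeInfo[0]);
{code}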






[jira] [Created] (HDFS-13642) Creating a file with block size smaller than EC policy's cell size should throw

2018-05-30 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13642:


 Summary: Creating a file with block size smaller than EC policy's 
cell size should throw
 Key: HDFS-13642
 URL: https://issues.apache.org/jira/browse/HDFS-13642
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: erasure-coding
Affects Versions: 3.0.0
Reporter: Xiao Chen
Assignee: Xiao Chen
 Attachments: HDFS-13642.01.patch

The following command causes an exception:
{noformat}
hadoop fs -Ddfs.block.size=349696 -put -f lineitem_sixblocks.parquet 
/test-warehouse/tmp123ec
{noformat}

{noformat}
18/05/25 16:00:59 WARN hdfs.DataStreamer: DataStreamer Exception
java.io.IOException: BlockSize 349696 < lastByteOffsetInBlock, #0: 
blk_-9223372036854574256_14634, packet seqno: 7 offsetInBlock: 349696 
lastPacketInBlock: false lastByteOffsetInBlock: 350208
  at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:729)
  at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:46)
18/05/25 16:00:59 WARN hdfs.DFSOutputStream: Failed: offset=4096, length=512, 
DFSStripedOutputStream:#0: failed, blk_-9223372036854574256_14634
java.io.IOException: BlockSize 349696 < lastByteOffsetInBlock, #0: 
blk_-9223372036854574256_14634, packet seqno: 7 offsetInBlock: 349696 
lastPacketInBlock: false lastByteOffsetInBlock: 350208
  at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:729)
  at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:46)
{noformat}

Then the streamer is confused and hangs.

The local file is under 6 MB; the HDFS file has an RS-3-2-1024k EC policy.

Credit to [~tarasbob] for reporting this issue.
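
For illustration, a fail-fast check of the kind the summary asks for, assuming it runs at create time before any streamer starts (not the committed patch):

{code:java}
// Illustrative validation: reject the create instead of letting the
// streamer hit "BlockSize < lastByteOffsetInBlock" later and hang.
if (ecPolicy != null && blockSize < ecPolicy.getCellSize()) {
  throw new IOException("Block size (=" + blockSize + ") is smaller than "
      + "the cell size (=" + ecPolicy.getCellSize() + ") of EC policy "
      + ecPolicy.getName());
}
{code}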






[jira] [Created] (HDFS-13540) DFSStripedInputStream should not allocate new buffers during close / unbuffer

2018-05-08 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13540:


 Summary: DFSStripedInputStream should not allocate new buffers 
during close / unbuffer
 Key: HDFS-13540
 URL: https://issues.apache.org/jira/browse/HDFS-13540
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Xiao Chen
Assignee: Xiao Chen


This was found in the same scenario where HDFS-13539 was caught.

There are two OOMs that look interesting:
{noformat}
FSDataInputStream#close error:
OutOfMemoryError: Direct buffer memoryjava.lang.OutOfMemoryError: Direct buffer 
memory
at java.nio.Bits.reserveMemory(Bits.java:694)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at 
org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:95)
at 
org.apache.hadoop.hdfs.DFSStripedInputStream.resetCurStripeBuffer(DFSStripedInputStream.java:118)
at 
org.apache.hadoop.hdfs.DFSStripedInputStream.closeCurrentBlockReaders(DFSStripedInputStream.java:205)
at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:672)
at 
org.apache.hadoop.hdfs.DFSStripedInputStream.close(DFSStripedInputStream.java:181)
at java.io.FilterInputStream.close(FilterInputStream.java:181)
{noformat}
and 
{noformat}
org/apache/hadoop/fs/FSDataInputStream#unbuffer failed: error:
OutOfMemoryError: Direct buffer memoryjava.lang.OutOfMemoryError: Direct buffer 
memory
at java.nio.Bits.reserveMemory(Bits.java:694)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at 
org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:95)
at 
org.apache.hadoop.hdfs.DFSStripedInputStream.resetCurStripeBuffer(DFSStripedInputStream.java:118)
at 
org.apache.hadoop.hdfs.DFSStripedInputStream.closeCurrentBlockReaders(DFSStripedInputStream.java:205)
at 
org.apache.hadoop.hdfs.DFSInputStream.unbuffer(DFSInputStream.java:1782)
at 
org.apache.hadoop.fs.StreamCapabilitiesPolicy.unbuffer(StreamCapabilitiesPolicy.java:48)
at 
org.apache.hadoop.fs.FSDataInputStream.unbuffer(FSDataInputStream.java:230)
{noformat}

As the stack traces show, {{resetCurStripeBuffer}} gets a buffer from the 
buffer pool. We could save that cost when the call is just a close or 
unbuffer.
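
For illustration, one way to thread the intent through (the flag name and field names are assumed from the stack traces; not necessarily the committed patch):

{code:java}
// Sketch: only take a buffer from the pool when we will actually read again;
// close/unbuffer paths would pass shouldAllocateBuf = false.
private void resetCurStripeBuffer(boolean shouldAllocateBuf) {
  if (curStripeBuf == null && shouldAllocateBuf) {
    curStripeBuf = BUFFER_POOL.getBuffer(useDirectBuffer(), bufSize);
  }
  if (curStripeBuf != null) {
    curStripeBuf.clear();
  }
}
{code}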






[jira] [Created] (HDFS-13539) DFSInputStream NPE when reportCheckSumFailure

2018-05-08 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13539:


 Summary: DFSInputStream NPE when reportCheckSumFailure
 Key: HDFS-13539
 URL: https://issues.apache.org/jira/browse/HDFS-13539
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Xiao Chen
Assignee: Xiao Chen
 Attachments: HDFS-13539.01.patch

We have seen the following exception with DFSStripedInputStream.
{noformat}
readDirect: FSDataInputStream#read error:
NullPointerException: java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:402)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:831)
at 
org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:147)
{noformat}
Line 402 is {{reportCheckSumFailure}}, and {{currentLocatedBlock}} is the only 
object there that can be null.

The original exception is masked by the NPE.
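
For illustration, a minimal guard (the argument list is assumed from trunk-era code; not necessarily the committed patch):

{code:java}
// Sketch: avoid masking the original exception with an NPE when no
// located block has been set yet.
if (currentLocatedBlock != null) {
  reportCheckSumFailure(corruptedBlocks,
      currentLocatedBlock.getLocations().length, true);
}
{code}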






[jira] [Reopened] (HDFS-13430) Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445

2018-05-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen reopened HDFS-13430:
--

> Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445
> ----------------------------------------------------------
>
> Key: HDFS-13430
> URL: https://issues.apache.org/jira/browse/HDFS-13430
> Project: Hadoop HDFS
>  Issue Type: Bug
>    Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 2.10.0, 2.8.4, 2.9.2
>
> Attachments: HDFS-13430.01.patch
>
>
> Unfortunately HADOOP-14445 had an HDFS test failure that's not caught in the 
> hadoop-common precommit runs.
> This is caught by our internal pre-commit using dist-test, and appears to be 
> the only failure.






[jira] [Resolved] (HDFS-13430) Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445

2018-05-07 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen resolved HDFS-13430.
--
   Resolution: Invalid
Fix Version/s: (was: 3.0.3)
   (was: 3.1.1)
   (was: 3.2.0)

> Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445
> ----------------------------------------------------------
>
> Key: HDFS-13430
> URL: https://issues.apache.org/jira/browse/HDFS-13430
> Project: Hadoop HDFS
>  Issue Type: Bug
>    Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 2.10.0, 2.8.4, 2.9.2
>
> Attachments: HDFS-13430.01.patch
>
>
> Unfortunately HADOOP-14445 had an HDFS test failure that's not caught in the 
> hadoop-common precommit runs.
> This is caught by our internal pre-commit using dist-test, and appears to be 
> the only failure.






Re: [VOTE] Release Apache Hadoop 3.0.2 (RC1)

2018-04-20 Thread Xiao Chen
Thanks Eddy for the effort!

+1 (binding)

   - Downloaded src tarball and verified checksums
   - Built from src
   - Started a pseudo distributed hdfs cluster
   - Verified basic hdfs operations work
   - Sanity checked webui and logs

Best,
-Xiao

On Fri, Apr 20, 2018 at 1:44 AM, 俊平堵  wrote:

> Thanks Lei for the work!
>
> +1 (binding), base on following verification work:
> - built succeed from source
> - verified signature
> - deployed a pseudo cluster and run some simple MR jobs (PI, sleep,
> terasort, etc.)
> - checked HDFS/YARN daemons' UI
> - Tried some rolling upgrade related features: MR over DistributedCache, NM
> Restart with work preserving, etc.
>
> Thanks,
>
> Junping
>
> 2018-04-17 7:59 GMT+08:00 Lei Xu :
>
> > Hi, All
> >
> > I've created release candidate RC-1 for Apache Hadoop 3.0.2, to
> > address missing source jars in the maven repository in RC-0.
> >
> > Thanks Ajay Kumar for spotting the error.
> >
> > Please note: this is an amendment for Apache Hadoop 3.0.1 release to
> > fix shaded jars in apache maven repository. The codebase of 3.0.2
> > release is the same as 3.0.1.  New bug fixes will be included in
> > Apache Hadoop 3.0.3 instead.
> >
> > The release page is:
> > https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0+Release
> >
> > New RC is available at: http://home.apache.org/~lei/hadoop-3.0.2-RC1/
> >
> > The git tag is release-3.0.2-RC1, and the latest commit is
> > 5c141f7c0f24c12cb8704a6ccc1ff8ec991f41ee, which is the same as RC-0.
> >
> > The maven artifacts are available at:
> > https://repository.apache.org/content/repositories/orgapachehadoop-1102/
> >
> > Please try the release, especially, *verify the maven artifacts*, and
> vote.
> >
> > The vote will run 5 days, ending 4/21/2018.
> >
> > Here is my +1.
> >
> > Best,
> >
> > -
> > To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> >
> >
>


[jira] [Created] (HDFS-13430) Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445

2018-04-11 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13430:


 Summary: Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445
 Key: HDFS-13430
 URL: https://issues.apache.org/jira/browse/HDFS-13430
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Xiao Chen
Assignee: Xiao Chen
 Attachments: HDFS-13430.01.patch

Unfortunately HADOOP-14445 had an HDFS test failure that's not caught in the 
hadoop-common precommit runs.

This is caught by our internal pre-commit using dist-test, and appears to be 
the only failure.






Re: [VOTE] Release Apache Hadoop 3.0.2 (RC0)

2018-04-09 Thread Xiao Chen
Thanks Eddy for the effort!

+1 (binding)

   - Downloaded src tarball and verified checksums
   - Built from src
   - Started a pseudo distributed hdfs cluster
   - Verified basic hdfs operations work
   - Sanity checked logs / webui

Best,
-Xiao


On Mon, Apr 9, 2018 at 11:28 AM, Eric Payne 
wrote:

> Thanks a lot for working to produce this release.
>
> +1 (binding)
> Tested the following:
> - built from source and installed on 6-node pseudo-cluster
> - tested Capacity Scheduler FairOrderingPolicy and FifoOrderingPolicy to
> determine that capacity was assigned as expected in each case
> - tested user weights with FifoOrderingPolicy to ensure that weights were
> assigned to users as expected.
>
> Eric Payne
>
>
>
>
>
>
> On Friday, April 6, 2018, 1:17:10 PM CDT, Lei Xu  wrote:
>
>
>
>
>
> Hi, All
>
> I've created release candidate RC-0 for Apache Hadoop 3.0.2.
>
> Please note: this is an amendment for Apache Hadoop 3.0.1 release to
> fix shaded jars in apache maven repository. The codebase of 3.0.2
> release is the same as 3.0.1.  New bug fixes will be included in
> Apache Hadoop 3.0.3 instead.
>
> The release page is:
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0+Release
>
> New RC is available at: http://home.apache.org/~lei/hadoop-3.0.2-RC0/
>
> The git tag is release-3.0.2-RC0, and the latest commit is
> 5c141f7c0f24c12cb8704a6ccc1ff8ec991f41ee
>
> The maven artifacts are available at
> https://repository.apache.org/content/repositories/orgapachehadoop-1096/
>
> Please try the release, especially, *verify the maven artifacts*, and vote.
>
> The vote will run 5 days, ending 4/11/2018.
>
> Thanks for everyone who helped to spot the error and proposed fixes!
>
> -
> To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
>
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>


Re: Hdfs build on branch-2 are failing.

2018-04-09 Thread Xiao Chen
Similar to Haibo's link, I found https://stackoverflow.com/a/22526046/1338884 
to be working.
Threw a patch at HADOOP-15375 to unblock branch-2.

I'm also not sure why this is passing for trunk.

-Xiao

On Thu, Apr 5, 2018 at 3:41 PM, Haibo Chen  wrote:

> Not sure why this did not show up in trunk. A quick google search takes me
> to bower-install-cert-untrusted-error
>
> On Thu, Apr 5, 2018 at 10:40 AM, Vrushali C 
> wrote:
>
> > Seeing the same for branch-2 yarn patches as well.
> >
> > > On Fri, Mar 30, 2018 at 12:54 PM, Rushabh Shah
> > > wrote:
> >
> > > Hi All,
> > > Recently couple of my hdfs builds failed on branch-2.
> > > Slave id# H9 
> > > Builds that failed:
> > > https://builds.apache.org/job/PreCommit-HDFS-Build/23737/console
> > > https://builds.apache.org/job/PreCommit-HDFS-Build/23735/console
> > >
> > > It failed with the following error:
> > >
> > > npm http GET https://registry.npmjs.org/bowernpm http GET
> > > https://registry.npmjs.org/bowernpm http GET
> > > https://registry.npmjs.org/bowernpm ERR! Error: CERT_UNTRUSTED
> > > npm ERR! at SecurePair. (tls.js:1370:32)
> > > npm ERR! at SecurePair.EventEmitter.emit (events.js:92:17)npm ERR!
> > > at SecurePair.maybeInitFinished (tls.js:982:10)npm ERR! at
> > > CleartextStream.read [as _read] (tls.js:469:13)
> > > npm ERR! at CleartextStream.Readable.read
> > > (_stream_readable.js:320:10)npm ERR! at EncryptedStream.write [as
> > > _write] (tls.js:366:25)npm ERR! at doWrite
> > > (_stream_writable.js:223:10)
> > > npm ERR! at writeOrBuffer (_stream_writable.js:213:5)npm ERR!
> > > at EncryptedStream.Writable.write (_stream_writable.js:180:11)
> > > npm ERR! at write (_stream_readable.js:583:24)npm ERR! If you need
> > > help, you may report this log at:
> > > npm ERR! 
> > > npm ERR! or email it to:
> > > npm ERR! npm ERR! System Linux
> > > 3.13.0-143-genericnpm ERR! command "/usr/bin/nodejs" "/usr/bin/npm"
> > > "install" "-g" "bower"
> > > npm ERR! cwd /rootnpm ERR! node -v v0.10.25npm ERR! npm -v 1.3.10npm
> ERR!
> > > npm ERR! Additional logging details can be found in:
> > > npm ERR! /root/npm-debug.log
> > > npm ERR! not ok code 0
> > >
> > >
> > >
> > > The certificate details on https://registry.npmjs.org/bower:
> > >
> > > Not valid before: Thursday, March 15, 2018 at 8:39:52 AM Central
> Daylight
> > > Time
> > > Not valid after: Saturday, June 13, 2020 at 2:06:17 PM Central Daylight
> > > Time
> > >
> > > Far from being an expert on ssl, do we need to change the truststore on
> > > slave also ?
> > >
> > > Appreciate if anyone can help fixing this.
> > >
> > >
> > > Thanks,
> > > Rushabh Shah.
> > >
> >
>


Re: [VOTE] Adopt HDSL as a new Hadoop subproject

2018-03-26 Thread Xiao Chen
+1

Thanks,
-Xiao

On Sun, Mar 25, 2018 at 9:07 PM, Akira Ajisaka 
wrote:

> +1
>
> Thanks,
> Akira
>
>
> On 2018/03/24 15:18, Lokesh Jain wrote:
>
>> +1 (non-binding)
>>
>> Thanks
>> Lokesh
>>
>>
>> -
>> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>>
>>
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


Re: [VOTE] Release Apache Hadoop 3.0.1 (RC1)

2018-03-19 Thread Xiao Chen
Thanks Eddy for the effort!

+1 (binding)

   - Downloaded src tarball and verified checksums
   - Built from src
   - Started a pseudo distributed hdfs cluster
   - Verified basic hdfs operations work
   - Verified hdfs encryption basic operations work
   - Sanity checked logs / webui


-Xiao

On Mon, Mar 19, 2018 at 10:45 AM, Lei Xu  wrote:

> Sure, Akira
>
> .mds files are uploaded to http://home.apache.org/~lei/hadoop-3.0.1-RC1/
>
> On Sun, Mar 18, 2018 at 6:04 PM, Akira Ajisaka
>  wrote:
> > Hi Lei,
> >
> > Would you provide SHA checksum files instead of MD5?
> > http://www.apache.org/dev/release-distribution#sigs-and-sums
> >
> > -Akira
> >
> >
> > On 2018/03/18 13:11, Lei Xu wrote:
> >>
> >> Hi, all
> >>
> >> I've created release candidate RC-1 for Apache Hadoop 3.0.1
> >>
> >> Apache Hadoop 3.0.1 will be the first bug fix release for Apache
> >> Hadoop 3.0 release. It includes 49 bug fixes and security fixes, which
> >> include 12
> >> blockers and 17 are critical.
> >>
> >> Please note:
> >> * HDFS-12990. Change default NameNode RPC port back to 8020. It makes
> >> incompatible changes to Hadoop 3.0.0.  After 3.0.1 releases, Apache
> >> Hadoop 3.0.0 will be deprecated due to this change.
> >>
> >> The release page is:
> >> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0+Release
> >>
> >> New RC is available at: http://home.apache.org/~lei/hadoop-3.0.1-RC1/
> >>
> >> The git tag is release-3.0.1-RC1, and the latest commit is
> >> 496dc57cc2e4f4da117f7a8e3840aaeac0c1d2d0
> >>
> >> The maven artifacts are available at:
> >> https://repository.apache.org/content/repositories/
> orgapachehadoop-1081/
> >>
> >> Please try the release and vote; the vote will run for the usual 5
> >> days, ending on 3/22/2017 6pm PST time.
> >>
> >> Thanks!
> >>
> >> -
> >> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> >> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >>
> >
>
>
>
> --
> Lei (Eddy) Xu
> Software Engineer, Cloudera
>
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


[jira] [Reopened] (HDFS-13164) File not closed if streamer fail with DSQuotaExceededException

2018-03-02 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen reopened HDFS-13164:
--

> File not closed if streamer fail with DSQuotaExceededException
> --------------------------------------------------------------
>
> Key: HDFS-13164
> URL: https://issues.apache.org/jira/browse/HDFS-13164
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Xiao Chen
>    Assignee: Xiao Chen
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1
>
> Attachments: HDFS-13164.01.patch, HDFS-13164.02.patch, 
> HDFS-13164.branch-2.8.01.patch
>
>
>  This is found during yarn log aggregation but theoretically could happen to 
> any client.
> If the dir's space quota is exceeded, the following would happen when a file 
> is created:
>  - client {{startFile}} rpc to NN, gets a {{DFSOutputStream}}.
>  - writing to the stream would trigger the streamer to {{getAdditionalBlock}} 
> rpc to NN, which would get the DSQuotaExceededException
>  - client closes the stream
>   
>  The fact that this would leave a 0-sized (or whatever size left in the 
> quota) file in HDFS is beyond the scope of this jira. However, the file would 
> be left in openforwrite status (shown in {{fsck -openforwrite)}} at least, 
> and could potentially leak leaseRenewer too.
> This is because in the close implementation,
>  # {{isClosed}} is first checked, and the close call will be a no-op if 
> {{isClosed == true}}.
>  # {{flushInternal}} checks {{isClosed}}, and throws the exception right away 
> if true
> {{isClosed}} does this: {{return closed || getStreamer().streamerClosed;}}
> When the disk quota is reached, {{getAdditionalBlock}} will throw when the 
> streamer calls. Because the streamer runs in a separate thread, at the time 
> the client calls close on the stream, the streamer may or may not have 
> reached the Quota exception. If it has, then due to #1, the close call on the 
> stream will be no-op. If it hasn't, then due to #2 the {{completeFile}} logic 
> will be skipped.
> {code:java}
> protected synchronized void closeImpl() throws IOException {
>   if (isClosed()) {
>     IOException e = lastException.getAndSet(null);
>     if (e == null)
>       return;
>     else
>       throw e;
>   }
>   try {
>     flushBuffer(); // flush from all upper layers
>     ...
>     flushInternal(); // flush all data to Datanodes
>     // get last block before destroying the streamer
>     ExtendedBlock lastBlock = getStreamer().getBlock();
>     try (TraceScope ignored =
>         dfsClient.getTracer().newScope("completeFile")) {
>       completeFile(lastBlock);
>     }
>   } catch (ClosedChannelException ignored) {
>   } finally {
>     closeThreads(true);
>   }
> }
> {code}
> Log snippets:
> {noformat}
> 2018-02-16 15:59:32,916 DEBUG org.apache.hadoop.hdfs.DFSClient: DataStreamer 
> Quota Exception
> org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota 
> of /DIR is exceeded: quota = 200 B = 1.91 MB but diskspace consumed = 
> 404139552 B = 385.42 MB
> at 
> org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyDiskspaceQuota(DirectoryWithQuotaFeature.java:149)
> at 
> org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:159)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:2124)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1991)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1966)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:463)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3896)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3484)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:686)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlo

[jira] [Created] (HDFS-13177) Investigate and fix DFSStripedOutputStream handling of DSQuotaExceededException

2018-02-21 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13177:


 Summary: Investigate and fix DFSStripedOutputStream handling of 
DSQuotaExceededException
 Key: HDFS-13177
 URL: https://issues.apache.org/jira/browse/HDFS-13177
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Xiao Chen
Assignee: Xiao Chen


This is the DFSStripedOutputStream equivalent of HDFS-13164






[jira] [Created] (HDFS-13164) File not closed if append fail with DSQuotaExceededException

2018-02-16 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13164:


 Summary: File not closed if append fail with 
DSQuotaExceededException
 Key: HDFS-13164
 URL: https://issues.apache.org/jira/browse/HDFS-13164
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.6.5
Reporter: Xiao Chen
Assignee: Xiao Chen


 This was found during YARN log aggregation, but could theoretically happen to 
any client.

If the dir's space quota is exceeded, the following would happen when a file is 
created:
 - client {{startFile}} rpc to NN, gets a {{DFSOutputStream}}.
 - writing to the stream would trigger the streamer to {{getAdditionalBlock}} 
rpc to NN, which would get the DSQuotaExceededException
 - client closes the stream
  
 The fact that this would leave a 0-sized (or whatever size was left in the quota) 
file in HDFS is beyond the scope of this jira. However, the file would be left 
in openforwrite status (shown in {{fsck -openforwrite}}) at least, and could 
potentially leak the LeaseRenewer too.
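As a minimal repro sketch (assuming fs.defaultFS points at an HDFS cluster; the /DIR path and sizes are illustrative, not from the original report):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants;

public class QuotaCloseRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path dir = new Path("/DIR");
    DistributedFileSystem fs =
        (DistributedFileSystem) dir.getFileSystem(conf);
    fs.mkdirs(dir);
    // Tiny space quota, so the streamer's addBlock RPC will hit
    // DSQuotaExceededException on the NameNode.
    fs.setQuota(dir, HdfsConstants.QUOTA_DONT_SET, 200L);
    FSDataOutputStream out = fs.create(new Path(dir, "file"));
    try {
      out.write(new byte[4 * 1024 * 1024]); // larger than the quota
      out.close(); // can be a silent no-op if the streamer already failed
    } catch (Exception e) {
      // The quota exception surfaces here, but the file may stay
      // open-for-write on the NN, visible via fsck -openforwrite.
      System.out.println("write/close failed: " + e);
    }
  }
}
{code}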

This is because in the close implementation,
 # {{isClosed}} is first checked, and the close call will be a no-op if 
{{isClosed == true}}.
 # {{flushInternal}} checks {{isClosed}}, and throws the exception right away 
if true

{{isClosed}} does this: {{return closed || getStreamer().streamerClosed;}}

When the disk quota is reached, {{getAdditionalBlock}} will throw when the 
streamer calls it. Because the streamer runs in a separate thread, at the time the 
client calls close on the stream, the streamer may or may not have reached the 
Quota exception. If it has, then due to #1, the close call on the stream will 
be no-op. If it hasn't, then due to #2 the {{completeFile}} logic will be 
skipped.
{code:java}
protected synchronized void closeImpl() throws IOException {
  if (isClosed()) {
    IOException e = lastException.getAndSet(null);
    if (e == null)
      return;
    else
      throw e;
  }
  try {
    flushBuffer(); // flush from all upper layers
    ...
    flushInternal(); // flush all data to Datanodes

    // get last block before destroying the streamer
    ExtendedBlock lastBlock = getStreamer().getBlock();

    try (TraceScope ignored =
        dfsClient.getTracer().newScope("completeFile")) {
      completeFile(lastBlock);
    }
  } catch (ClosedChannelException ignored) {
  } finally {
    closeThreads(true);
  }
}
{code}
Log snippets:
{noformat}
2018-02-16 15:59:32,916 DEBUG org.apache.hadoop.hdfs.DFSClient: DataStreamer 
Quota Exception
org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota 
of /DIR is exceeded: quota = 200 B = 1.91 MB but diskspace consumed = 
404139552 B = 385.42 MB
at 
org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyDiskspaceQuota(DirectoryWithQuotaFeature.java:149)
at 
org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:159)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:2124)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1991)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1966)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:463)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3896)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3484)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:686)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2220)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAcce

Re: [VOTE] Release Apache Hadoop 3.0.1 (RC0)

2018-02-16 Thread Xiao Chen
Thanks Eddy for driving this!

+1 (binding)

   - Downloaded src tarball and verified md5
   - Built from src
   - Started a pseudo distributed cluster
   - Verified basic hdfs operations work
   - Verified hdfs encryption basic operations work
   - Sanity checked logs

The wiki release page seems to have fewer items than the jira query (p1/p2,
fixed, 3.0.1) though...




-Xiao

On Thu, Feb 15, 2018 at 3:36 PM, Lei Xu  wrote:

> Hi, all
>
> I've created release candidate 0 for Apache Hadoop 3.0.1
>
> Apache Hadoop 3.0.1 will be the first bug fix release for the Apache
> Hadoop 3.0 release line. It includes 49 bug fixes, of which 10 are
> blockers and 8 are critical.
>
> Please note:
> * HDFS-12990. Change default NameNode RPC port back to 8020. It makes
> incompatible changes to Hadoop 3.0.0.  After 3.0.1 releases, Apache
> Hadoop 3.0.0 will be deprecated due to this change.
>
> The release page is:
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0+Release
>
> New RC is available at: http://home.apache.org/~lei/hadoop-3.0.1-RC0/
>
> The git tag is release-3.0.1-RC0, and the latest commit is
> 494d075055b52b0cc922bc25237e231bb3771c90
>
> The maven artifacts are available:
> https://repository.apache.org/content/repositories/orgapachehadoop-1078/
>
> Please try the release and vote; the vote will run for the usual 5
> days, ending on 2/20/2018 6pm PST.
>
> Thanks!
>
> --
> Lei (Eddy) Xu
>
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


[jira] [Created] (HDFS-13091) Remove tomcat from the Hadoop-auth test bundle

2018-01-30 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-13091:


 Summary: Remove tomcat from the Hadoop-auth test bundle
 Key: HDFS-13091
 URL: https://issues.apache.org/jira/browse/HDFS-13091
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Xiao Chen
Assignee: Xiao Chen


We have switched KMS and HttpFS from Tomcat to Jetty in 3.0. There appear to 
be some leftover tests in Hadoop-auth which were used for KMS / HttpFS 
coverage.

We should clean up the tests accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12990) Change default NameNode RPC port back to 8020

2018-01-04 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12990:


 Summary: Change default NameNode RPC port back to 8020
 Key: HDFS-12990
 URL: https://issues.apache.org/jira/browse/HDFS-12990
 Project: Hadoop HDFS
  Issue Type: Task
Affects Versions: 3.0.0
Reporter: Xiao Chen
Assignee: Xiao Chen
Priority: Critical


In HDFS-9427 (HDFS should not default to ephemeral ports), we changed all 
default ports to ephemeral ports, which is much appreciated by admins. As part 
of that change, we also modified the NN RPC port from the famous 8020 to 9820, 
to be closer to the other ports changed there.

With more integration going on, it appears that all the other ephemeral port 
changes are fine, but the NN RPC port change is painful for downstream on 
migrating to Hadoop 3. Some examples include:
# Hive table locations pointing to hdfs://nn:port/dir
# Downstream minicluster unit tests that assumed 8020
# Oozie workflows / downstream scripts that used 8020

This isn't a problem for HA URLs, since those do not include the port number. 
But considering the downstream impact, instead of requiring all of these to be 
updated, it would be a far better experience to leave the NN port unchanged. 
This will benefit Hadoop 3 adoption and ease unnecessary upgrade burdens.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12946) Add a tool to check rack configuration against EC policies

2017-12-19 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12946:


 Summary: Add a tool to check rack configuration against EC policies
 Key: HDFS-12946
 URL: https://issues.apache.org/jira/browse/HDFS-12946
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: erasure-coding
Reporter: Xiao Chen
Assignee: Xiao Chen


From testing we have seen setups with problematic racks / datanodes that 
cannot support basic EC usage. These are usually found only after the tests 
have failed.

We should provide a way to check this beforehand.

Some scenarios:
- not enough datanodes compared to EC policy's highest data+parity number
- not enough racks to satisfy BPPRackFaultTolerant
- highly uneven racks to satisfy BPPRackFaultTolerant
- highly uneven racks (so that BPP's considerLoad logic may exclude some busy 
nodes on the rack, resulting in #2)
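As a rough illustration, the first few scenarios above could be checked along these lines (a sketch only; the method and parameter names are placeholders, not the eventual tool's API):
{code:java}
/**
 * Sketch of a feasibility check for one EC policy. rackSizes holds the
 * number of datanodes per rack; dataUnits/parityUnits come from the policy.
 */
static String checkEcFeasibility(int dataUnits, int parityUnits,
    int numDataNodes, int[] rackSizes) {
  int totalUnits = dataUnits + parityUnits;
  if (numDataNodes < totalUnits) {
    return "Not enough datanodes: need " + totalUnits + ", have "
        + numDataNodes;
  }
  // BPPRackFaultTolerant places at most ceil(totalUnits / numRacks)
  // units of a block group on any single rack.
  int numRacks = rackSizes.length;
  int maxPerRack = (totalUnits + numRacks - 1) / numRacks;
  int placeable = 0;
  for (int size : rackSizes) {
    placeable += Math.min(size, maxPerRack);
  }
  if (placeable < totalUnits) {
    return "Racks too uneven: only " + placeable + " of " + totalUnits
        + " block group units can be placed rack-fault-tolerantly";
  }
  return "OK";
}
{code}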



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12933) Improve logging when DFSStripedOutputStream failed to read some blocks

2017-12-15 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12933:


 Summary: Improve logging when DFSStripedOutputStream failed to 
read some blocks
 Key: HDFS-12933
 URL: https://issues.apache.org/jira/browse/HDFS-12933
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: erasure-coding
Reporter: Xiao Chen
Priority: Minor


Currently if there are fewer DataNodes than the erasure coding policy's (# of 
data blocks + # of parity blocks), the client sees this:

{noformat}
09:18:24 17/12/14 09:18:24 WARN hdfs.DFSOutputStream: Cannot allocate parity 
block(index=13, policy=RS-10-4-1024k). Not enough datanodes? Exclude nodes=[]
09:18:24 17/12/14 09:18:24 WARN hdfs.DFSOutputStream: Block group <1> has 1 
corrupt blocks.
{noformat}

The 1st line is good. The 2nd line may be confusing to end users. We should 
investigate the error and make the message more general / accurate - maybe 
something like 'failed to read x blocks'.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0 RC1

2017-12-11 Thread Xiao Chen
+1 (binding)

- downloaded src tarball, verified md5
- built from source with jdk1.8.0_112
- started a pseudo cluster with hdfs and kms
- sanity checked encryption related operations working
- sanity checked webui and logs.

-Xiao

On Mon, Dec 11, 2017 at 6:10 PM, Aaron T. Myers  wrote:

> +1 (binding)
>
> - downloaded the src tarball and built the source (-Pdist -Pnative)
> - verified the checksum
> - brought up a secure pseudo distributed cluster
> - did some basic file system operations (mkdir, list, put, cat) and
> confirmed that everything was working
> - confirmed that the web UI worked
>
> Best,
> Aaron
>
> On Fri, Dec 8, 2017 at 12:31 PM, Andrew Wang 
> wrote:
>
> > Hi all,
> >
> > Let me start, as always, by thanking the efforts of all the contributors
> > who contributed to this release, especially those who jumped on the
> issues
> > found in RC0.
> >
> > I've prepared RC1 for Apache Hadoop 3.0.0. This release incorporates 302
> > fixed JIRAs since the previous 3.0.0-beta1 release.
> >
> > You can find the artifacts here:
> >
> > http://home.apache.org/~wang/3.0.0-RC1/
> >
> > I've done the traditional testing of building from the source tarball and
> > running a Pi job on a single node cluster. I also verified that the
> shaded
> > jars are not empty.
> >
> > Found one issue that create-release (probably due to the mvn deploy
> change)
> > didn't sign the artifacts, but I fixed that by calling mvn one more time.
> > Available here:
> >
> > https://repository.apache.org/content/repositories/orgapachehadoop-1075/
> >
> > This release will run the standard 5 days, closing on Dec 13th at 12:31pm
> > Pacific. My +1 to start.
> >
> > Best,
> > Andrew
> >
>


[jira] [Created] (HDFS-12872) EC Checksum broken when BlockAccessToken is enabled

2017-11-29 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12872:


 Summary: EC Checksum broken when BlockAccessToken is enabled
 Key: HDFS-12872
 URL: https://issues.apache.org/jira/browse/HDFS-12872
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: erasure-coding
Reporter: Xiao Chen
Assignee: Xiao Chen
Priority: Critical


It appears {{hdfs ec -checksum}} doesn't work when block access token is 
enabled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-20 Thread Xiao Chen
+1 (binding)

Thanks Andrew!


   - Verified md5 and built from source
   - Started a pseudo distributed cluster with KMS,
   - Performed basic hdfs operations plus encryption related operations
   - Verified logs and webui
   - Confidence from CDH testings (will let Andrew answer officially, but
   we have smokes and nightlies for various downstream projects)


-Xiao

On Mon, Nov 20, 2017 at 3:38 PM, Shane Kumpf 
wrote:

> Thanks, Andrew!
>
> +1 (non-binding)
>
> - Verified checksums and signatures
> - Deployed a single node cluster on CentOS 7.4 using the binary and source
> release
> - Ran hdfs commands
> - Ran pi and distributed shell using the default and docker runtimes
> - Verified the UIs
> - Verified the change log
>
> -Shane
>
>
> On Tue, Nov 14, 2017 at 2:34 PM, Andrew Wang 
> wrote:
>
> > Hi folks,
> >
> > Thanks as always to the many, many contributors who helped with this
> > release. I've created RC0 for Apache Hadoop 3.0.0. The artifacts are
> > available here:
> >
> > http://people.apache.org/~wang/3.0.0-RC0/
> >
> > This vote will run 5 days, ending on Nov 19th at 1:30pm Pacific.
> >
> > 3.0.0 GA contains 291 fixed JIRA issues since 3.0.0-beta1. Notable
> > additions include the merge of YARN resource types, API-based
> configuration
> > of the CapacityScheduler, and HDFS router-based federation.
> >
> > I've done my traditional testing with a pseudo cluster and a Pi job. My
> +1
> > to start.
> >
> > Best,
> > Andrew
> >
>


Re: [VOTE] Release Apache Hadoop 2.9.0 (RC2)

2017-11-13 Thread Xiao Chen
I have no issues now that RC3 is rolling. Thanks for the quick actions.

My point is I'm not sure that's even required. The MIT License's standard
format includes that copyright line https://en.wikipedia.org/wiki/MIT_License, and
combined with the licensing-howto I referred to earlier, HADOOP-15036 is not
necessary. Though IANAL.

-Xiao

On Mon, Nov 13, 2017 at 3:27 PM, Arun Suresh  wrote:

> Cancelling the RC2 to fix the LICENSE.txt issue.
> Will be rolling out an RC3 shortly. Given that the delta is just the
> LICENSE fix, we will be carrying over the votes from this thread as well.
>
> Cheers
> -Arun/Subru
>
> On Mon, Nov 13, 2017 at 2:59 PM, Arun Suresh  wrote:
>
>> Hi Xiao,
>> What Anu pointed out was that the copyright was missing in the LICENCE
>> not notice. Don't think we need to file LEGAL, since what we are doing is
>> similar to the other MIT dependencies.
>>
>> Cheers
>> -Arun
>>
>>
>>
>> On Mon, Nov 13, 2017 at 2:53 PM, Xiao Chen  wrote:
>>
>>> Thanks guys for pulling the RC and prompt reviews.
>>>
>>> From previous L&N discussions
>>> <https://issues.apache.org/jira/browse/HADOOP-12893?focusedC
>>> ommentId=15284739&page=com.atlassian.jira.plugin.system.issu
>>> etabpanels:comment-tabpanel#comment-15284739>
>>>  and http://www.apache.org/dev/licensing-howto.html#permissive-deps, my
>>> impression was BSD / MIT doesn't require a Notice. (Usually the copyright
>>> parts go to the notice). For this case, the trickiness comes from the
>>> License itself has this added line of copyright...
>>>
>>> Should we file a LEGAL jira to confirm first? There may (or may not) be
>>> other similar instances.
>>>
>>> -Xiao
>>>
>>> On Mon, Nov 13, 2017 at 2:34 PM, Arun Suresh  wrote:
>>>
>>> > Thanks for notifying Anu,
>>> > We've raised https://issues.apache.org/jira/browse/HADOOP-15036 to
>>> address
>>> > this and have included a patch.
>>> > Can you please take a look and possibly +1 if you are fine ?
>>> >
>>> > Cheers
>>> > -Arun
>>> >
>>> > On Mon, Nov 13, 2017 at 2:19 PM, Anu Engineer <
>>> aengin...@hortonworks.com>
>>> > wrote:
>>> >
>>> > > -1 (binding)
>>> > >
>>> > > Thank you for all the hard work on 2.9 series. Unfortunately, this
>>> is one
>>> > > of the times I have to -1 this release.
>>> > >
>>> > > Looks like HADOOP-14840 added a dependency on “oj! Algorithms -
>>> version
>>> > > 43.0”, but we have just added “oj! Algorithms - version 43.0” to the
>>> > > “LICENSE.txt”. The right addition to the LICENSE.txt should contain
>>> the
>>> > > original MIT License, especially “Copyright (c) 2003-2017
>>> Optimatika”.
>>> > >
>>> > > Please take a look at https://github.com/optimatika/
>>> > > ojAlgo/blob/master/LICENSE
>>> > >
>>> > > I am a +1 after this is fixed.
>>> > >
>>> > > Thanks
>>> > > Anu
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On 11/13/17, 9:50 AM, "Sunil G"  wrote:
>>> > >
>>> > > +1 (binding)
>>> > >
>>> > > Deployed cluster built from source.
>>> > >
>>> > >
>>> > >
>>> > >- Tested few cases in an HA cluster and tried to do failover
>>> by
>>> > > using
>>> > >rmadmin commands etc. This seems to work fine, including
>>> submitting
>>> > > apps.
>>> > >- I also tested many MR apps and all are running fine w/o any
>>> > > issues.
>>> > >- Majorly tested below feature sanity too (works fine)
>>> > >   - Application priority
>>> > >   - Application timeout
>>> > >- Tested basic NodeLabel scenarios.
>>> > >   - Added some labels to couple of nodes
>>> > >   - Verified old UI for labels
>>> > >   - Submitted apps to labelled cluster and it works fine.
>>> > >   - Also performed few cli commands related to nodelabel
>>> > >- Verified new YARN UI and accessed various pages when
>>> cluster was
>>> > > in
>>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC2)

2017-11-13 Thread Xiao Chen
Thanks guys for pulling the RC and prompt reviews.

From previous L&N discussions
and http://www.apache.org/dev/licensing-howto.html#permissive-deps, my
impression was BSD / MIT doesn't require a Notice. (Usually the copyright
parts go to the notice). For this case, the trickiness comes from the
License itself has this added line of copyright...

Should we file a LEGAL jira to confirm first? There may (or may not) be
other similar instances.

-Xiao

On Mon, Nov 13, 2017 at 2:34 PM, Arun Suresh  wrote:

> Thanks for notifying Anu,
> We've raised https://issues.apache.org/jira/browse/HADOOP-15036 to address
> this and have included a patch.
> Can you please take a look and possibly +1 if you are fine ?
>
> Cheers
> -Arun
>
> On Mon, Nov 13, 2017 at 2:19 PM, Anu Engineer 
> wrote:
>
> > -1 (binding)
> >
> > Thank you for all the hard work on 2.9 series. Unfortunately, this is one
> > of the times I have to -1 this release.
> >
> > Looks like HADOOP-14840 added a dependency on “oj! Algorithms - version
> > 43.0”, but we have just added “oj! Algorithms - version 43.0” to the
> > “LICENSE.txt”. The right addition to the LICENSE.txt should contain the
> > original MIT License, especially “Copyright (c) 2003-2017 Optimatika”.
> >
> > Please take a look at https://github.com/optimatika/
> > ojAlgo/blob/master/LICENSE
> >
> > I am a +1 after this is fixed.
> >
> > Thanks
> > Anu
> >
> >
> >
> >
> > On 11/13/17, 9:50 AM, "Sunil G"  wrote:
> >
> > +1 (binding)
> >
> > Deployed cluster built from source.
> >
> >
> >
> >- Tested few cases in an HA cluster and tried to do failover by
> > using
> >rmadmin commands etc. This seems to work fine, including submitting
> > apps.
> >- I also tested many MR apps and all are running fine w/o any
> > issues.
> >- Majorly tested below feature sanity too (works fine)
> >   - Application priority
> >   - Application timeout
> >- Tested basic NodeLabel scenarios.
> >   - Added some labels to couple of nodes
> >   - Verified old UI for labels
> >   - Submitted apps to labelled cluster and it works fine.
> >   - Also performed few cli commands related to nodelabel
> >- Verified new YARN UI and accessed various pages when cluster was
> > in
> >use. It seems fine to me.
> >
> >
> > Thanks all folks who participated in this release, appreciate the
> same!
> >
> > - Sunil
> >
> >
> > On Mon, Nov 13, 2017 at 3:01 AM Subru Krishnan 
> > wrote:
> >
> > > Hi Folks,
> > >
> > > Apache Hadoop 2.9.0 is the first release of Hadoop 2.9 line and
> will
> > be the
> > > starting release for Apache Hadoop 2.9.x line - it includes 30 New
> > Features
> > > with 500+ subtasks, 407 Improvements, 790 Bug fixes new fixed
> issues
> > since
> > > 2.8.2.
> > >
> > > More information about the 2.9.0 release plan can be found here:
> > > *
> > > https://cwiki.apache.org/confluence/display/HADOOP/
> > Roadmap#Roadmap-Version2.9
> > > <
> > > https://cwiki.apache.org/confluence/display/HADOOP/
> > Roadmap#Roadmap-Version2.9
> > > >*
> > >
> > > New RC is available at: http://home.apache.org/~
> > asuresh/hadoop-2.9.0-RC2/
> > > <
> > > http://www.google.com/url?q=http%3A%2F%2Fhome.apache.org%
> > 2F~asuresh%2Fhadoop-2.9.0-RC1%2F&sa=D&sntz=1&usg=
> > AFQjCNE7BF35IDIMZID3hPqiNglWEVsTpg
> > > >
> > >
> > > The RC tag in git is: release-2.9.0-RC2, and the latest commit id
> is:
> > > 1eb05c1dd48fbc9e4b375a76f2046a59103bbeb1.
> > >
> > > The maven artifacts are available via repository.apache.org at:
> > > https://repository.apache.org/content/repositories/
> > orgapachehadoop-1067/
> > > <
> > > https://www.google.com/url?q=https%3A%2F%2Frepository.
> > apache.org%2Fcontent%2Frepositories%2Forgapachehadoop-1066&sa=D&
> > sntz=1&usg=AFQjCNFcern4uingMV_sEreko_zeLlgdlg
> > > >
> > >
> > > Please try the release and vote; the vote will run for the usual 5
> > days,
> > > ending on Friday 17th November 2017 2pm PT time.
> > >
> > > We want to give a big shout out to Sunil, Varun, Rohith, Wangda,
> > Vrushali
> > > and Inigo for the extensive testing/validation which helped prepare
> > for
> > > RC2. Do report your results in this vote as it'll be very useful to
> > the
> > > entire community.
> > >
> > > Thanks,
> > > -Subru/Arun
> > >
> >
> >
> >
>


[jira] [Created] (HDFS-12726) BlockPlacementPolicyDefault's debugLoggingBuilder is not logged

2017-10-26 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12726:


 Summary: BlockPlacementPolicyDefault's debugLoggingBuilder is not 
logged
 Key: HDFS-12726
 URL: https://issues.apache.org/jira/browse/HDFS-12726
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: logging
Reporter: Xiao Chen
Assignee: Xiao Chen


While debugging HDFS-12725, I noticed that the {{BlockPlacementPolicyDefault}} 
class' {{debugLoggingBuilder}} does a lot of {{get}} and {{append}} calls, but 
the result is never {{toString}}'ed and passed to {{LOG.debug}}.
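For context, the pattern is roughly the following (a simplified sketch; the real class keeps the builder in a ThreadLocal, but the method shown here is illustrative):
{code:java}
// Simplified sketch of the pattern in BlockPlacementPolicyDefault.
private static final ThreadLocal<StringBuilder> debugLoggingBuilder =
    ThreadLocal.withInitial(StringBuilder::new);

private void chooseTargetsExample(String placementDetail) {
  // Many places do this while choosing targets...
  debugLoggingBuilder.get().append(placementDetail);
  // ...but without something like the block below, the accumulated text
  // is never emitted (the reported bug):
  if (LOG.isDebugEnabled()) {
    LOG.debug(debugLoggingBuilder.get().toString());
    debugLoggingBuilder.get().setLength(0); // reset for the next call
  }
}
{code}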



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-10-26 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12725:


 Summary: BlockPlacementPolicyRackFaultTolerant still fails with 
racks with very few nodes
 Key: HDFS-12725
 URL: https://issues.apache.org/jira/browse/HDFS-12725
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: erasure-coding
Affects Versions: 3.0.0
Reporter: Xiao Chen
Assignee: Xiao Chen


HDFS-12567 tries to fix the scenario where EC blocks may not be allocated in 
extremely rack-imbalanced cluster.

The added fall-back step of the fix could be improved to do a best-effort 
placement. This is more likely to happen in testing than in real clusters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-12686) Erasure coding system policy state is not correctly saved and loaded during real cluster restart

2017-10-24 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen resolved HDFS-12686.
--
Resolution: Duplicate

Since HDFS-12682 should be able to handle this, and it's pretty hard to split 
that work into 2 jiras due to protobuf changes, I'll resolve this as a dup.

Thanks Sammi for filing the jira and Wei-Chiu for checking!

> Erasure coding system policy state is not correctly saved and loaded during 
> real cluster restart
> 
>
> Key: HDFS-12686
> URL: https://issues.apache.org/jira/browse/HDFS-12686
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1
>Reporter: SammiChen
>Assignee: SammiChen
>Priority: Blocker
>  Labels: hdfs-ec-3.0-must-do
>
> Inspired by HDFS-12682, I found the system erasure coding policy state will 
> not be correctly saved and loaded in a real cluster, though there are unit 
> tests for this and they all pass with MiniCluster. That's because the 
> MiniCluster keeps the same static system erasure coding policy object after 
> the NN restart operation. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12682) ECAdmin -listPolicies will always show policy state as DISABLED

2017-10-18 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12682:


 Summary: ECAdmin -listPolicies will always show policy state as 
DISABLED
 Key: HDFS-12682
 URL: https://issues.apache.org/jira/browse/HDFS-12682
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: erasure-coding
Reporter: Xiao Chen
Assignee: Xiao Chen


On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as 
DISABLED.

{noformat}
[hdfs@nightly6x-1 root]$ hdfs ec -listPolicies
Erasure Coding Policies:
ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, 
numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED]
ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, 
numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED]
ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, 
numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED]
ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, 
Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], 
CellSize=1048576, Id=3, State=DISABLED]
ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, 
numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED]
[hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec
XOR-2-1-1024k
{noformat}

This is because when [deserializing 
protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942],
 the static instance of [SystemErasureCodingPolicies 
class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101]
 is first checked, and always returns the cached policy objects, which are 
created by default with state=DISABLED.
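The problematic shape, as a hedged sketch (not the exact PBHelperClient code; the fallback method name is hypothetical):
{code:java}
static ErasureCodingPolicy convertErasureCodingPolicy(
    HdfsProtos.ErasureCodingPolicyProto proto) {
  // The static system-policy cache is consulted first...
  ErasureCodingPolicy cached =
      SystemErasureCodingPolicies.getByID((byte) proto.getId());
  if (cached != null) {
    // ...and returned as-is, so the State carried by the proto is
    // dropped and callers always see the default DISABLED state.
    return cached;
  }
  return buildPolicyFromProto(proto); // hypothetical non-cached path
}
{code}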



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12668) MetricsSystemImpl should consistently check minicluster mode

2017-10-16 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12668:


 Summary: MetricsSystemImpl should consistently check minicluster 
mode
 Key: HDFS-12668
 URL: https://issues.apache.org/jira/browse/HDFS-12668
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Xiao Chen
Priority: Minor


Found this when writing some tests related to JvmMetrics.
It appears calling {{JvmMetrics.initSingleton}} twice in a minicluster works, 
but calling {{JvmMetrics.create}} twice doesn't.

This jira suggests to investigate whether this is intentional, and likely make 
the check of {{DefaultMetricsSystem.inMiniClusterMode()}} consistent in 
{{MetricsSystemImpl}} to ease testing.

{noformat}
org.apache.hadoop.metrics2.MetricsException: Metrics source JvmMetrics already 
exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
at 
org.apache.hadoop.metrics2.source.JvmMetrics.create(JvmMetrics.java:95)
{noformat}
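A minimal sketch of the asymmetry as reported (assuming the test sets minicluster mode, as MiniDFSCluster-based tests do):
{code:java}
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.source.JvmMetrics;

public class JvmMetricsTwice {
  public static void main(String[] args) {
    DefaultMetricsSystem.setMiniClusterMode(true);
    DefaultMetricsSystem.initialize("test");
    JvmMetrics.initSingleton("test", null); // fine
    JvmMetrics.initSingleton("test", null); // fine: returns the singleton
    JvmMetrics.create("test", null, DefaultMetricsSystem.instance()); // fine
    // Per the report above, a second create throws
    // "Metrics source JvmMetrics already exists!":
    JvmMetrics.create("test", null, DefaultMetricsSystem.instance());
  }
}
{code}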



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12642) Log block and datanode details in BlockRecoveryWorker

2017-10-11 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12642:


 Summary: Log block and datanode details in BlockRecoveryWorker
 Key: HDFS-12642
 URL: https://issues.apache.org/jira/browse/HDFS-12642
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Xiao Chen
Assignee: Xiao Chen


In a recent investigation, we have seen a weird block recovery issue, which was 
difficult to bring to a conclusion because of insufficient logs.

For the most critical part of the events, we see block recovery fail to 
{{commitBlockSynchronization}} on the NN, due to the block not being closed. 
This leaves the file open forever (for 1+ months).

The reason the block was not closed on the NN was that it is configured with 
{{dfs.namenode.replication.min}}=2, and only 1 replica was at the latest 
genstamp.

We were not able to tell why only 1 replica was at the latest genstamp.

From the primary node of the recovery (ps2204), {{initReplicaRecoveryImpl}} 
was called on each of the 7 DNs the block was ever placed on. All DNs but 
ps2204 and ps3765 failed because of genstamp comparison - that's expected. 
ps2204 and ps3765 got past the comparison (since no exceptions appear in their 
logs), but {{updateReplicaUnderRecovery}} only appeared to be called on ps3765.

This jira is to propose we log more details when {{BlockRecoveryWorker}} is 
about to call {{updateReplicaUnderRecovery}} on the DataNodes, so this could be 
figured out in the future.

{noformat}
$ grep "updateReplica:" ps2204.dn.log 
$ grep "updateReplica:" ps3765.dn.log 
hadoop-hdfs-datanode-ps3765.log.2:{"@timestamp":"2017-09-13T00:56:20.933Z","source_host":"ps3765.example.com","file":"FsDatasetImpl.java","method":"updateReplicaUnderRecovery","level":"INFO","line_number":"2512","thread_name":"IPC
 Server handler 6 on 
50020","@version":1,"logger_name":"org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl","message":"updateReplica:
 BP-550436645-17.142.147.13-1438988035284:blk_2172795728_1106150312, 
recoveryId=1107074793, length=65024, replica=ReplicaUnderRecovery, 
blk_2172795728_1106150312, RUR
$ grep "initReplicaRecovery:" ps2204.dn.log 
hadoop-hdfs-datanode-ps2204.log.1:{"@timestamp":"2017-09-13T00:56:20.691Z","source_host":"ps2204.example.com","file":"FsDatasetImpl.java","method":"initReplicaRecoveryImpl","level":"INFO","line_number":"2441","thread_name":"org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@5ae3cb26","@version":1,"logger_name":"org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl","message":"initReplicaRecovery:
 blk_2172795728_1106150312, recoveryId=1107074793, 
replica=ReplicaWaitingToBeRecovered, blk_2172795728_1106150312, RWR
hadoop-hdfs-datanode-ps2204.log.1:{"@timestamp":"2017-09-13T00:56:20.691Z","source_host":"ps2204.example.com","file":"FsDatasetImpl.java","method":"initReplicaRecoveryImpl","level":"INFO","line_number":"2497","thread_name":"org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@5ae3cb26","@version":1,"logger_name":"org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl","message":"initReplicaRecovery:
 changing replica state for blk_2172795728_1106150312 from RWR to 
RUR","class":"org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl","mdc":{}}
$ grep "initReplicaRecovery:" ps3765.dn.log 
hadoop-hdfs-datanode-ps3765.log.2:{"@timestamp":"2017-09-13T00:56:20.457Z","source_host":"ps3765.example.com","file":"FsDatasetImpl.java","method":"initReplicaRecoveryImpl","level":"INFO","line_number":"2441","thread_name":"IPC
 Server handler 5 on 
50020","@version":1,"logger_name":"org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl","message":"initReplicaRecovery:
 blk_2172795728_1106150312, recoveryId=1107074793, replica=ReplicaBeingWritten, 
blk_2172795728_1106150312, RBW
hadoop-hdfs-datanode-ps3765.log.2:{"@timestamp":"2017-09-13T00:56:20.457Z","source_host":"ps3765.example.com","file":"FsDatasetImpl.java","method":"initReplicaRe

[jira] [Created] (HDFS-12596) Add TestFsck#testFsckCorruptWhenOneReplicaIsCorrupt back to branch-2.7

2017-10-05 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12596:


 Summary: Add TestFsck#testFsckCorruptWhenOneReplicaIsCorrupt back 
to branch-2.7
 Key: HDFS-12596
 URL: https://issues.apache.org/jira/browse/HDFS-12596
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 2.7.4
Reporter: Xiao Chen
Assignee: Xiao Chen


See 
https://issues.apache.org/jira/browse/HDFS-11743?focusedCommentId=16186328&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16186328



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12578) TestDeadDatanode#testNonDFSUsedONDeadNodeReReg failing in branch-2.7

2017-10-02 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12578:


 Summary: TestDeadDatanode#testNonDFSUsedONDeadNodeReReg failing in 
branch-2.7
 Key: HDFS-12578
 URL: https://issues.apache.org/jira/browse/HDFS-12578
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Xiao Chen
Priority: Blocker


It appears {{TestDeadDatanode#testNonDFSUsedONDeadNodeReReg}} is consistently 
failing in branch-2.7. We should investigate and fix it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0-beta1 RC0

2017-09-28 Thread Xiao Chen
Thanks Andrew!

+1 (binding)

   - Verified md5 matches
   - Built from src tarball
   - Started pseudo-distributed hdfs cluster with kms, ran basic operations
   - Sanity checked logs and NN webui

P.S. Obviously Andrew meant to say Oct. 3rd, not November. :)

-Xiao

On Thu, Sep 28, 2017 at 5:04 PM, Andrew Wang 
wrote:

> Hi all,
>
> Let me start, as always, by thanking the many, many contributors who helped
> with this release! I've prepared an RC0 for 3.0.0-beta1:
>
> http://home.apache.org/~wang/3.0.0-beta1-RC0/
>
> This vote will run five days, ending on Nov 3rd at 5PM Pacific.
>
> beta1 contains 576 fixed JIRA issues comprising a number of bug fixes,
> improvements, and feature enhancements. Notable additions include the
> addition of YARN Timeline Service v2 alpha2, S3Guard, completion of the
> shaded client, and HDFS erasure coding pluggable policy support.
>
> I've done the traditional testing of running a Pi job on a pseudo cluster.
> My +1 to start.
>
> We're working internally on getting this run through our integration test
> rig. I'm hoping Vijay or Ray can ring in with a +1 once that's complete.
>
> Best,
> Andrew
>


[jira] [Reopened] (HDFS-8865) Improve quota initialization performance

2017-09-28 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen reopened HDFS-8865:
-

Sorry to reopen. Trying to backport this to branch-2.7 and branch-2.6, 
attaching patches...

> Improve quota initialization performance
> 
>
> Key: HDFS-8865
> URL: https://issues.apache.org/jira/browse/HDFS-8865
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-8865_branch-2.6.patch, HDFS-8865_branch-2.7.patch, 
> HDFS-8865.patch, HDFS-8865.v2.checkstyle.patch, HDFS-8865.v2.patch, 
> HDFS-8865.v3.patch
>
>
> After replaying edits, the whole file system tree is recursively scanned in 
> order to initialize the quota. For a big namespace, this can take a very long 
> time.  Since this is done during namenode failover, it also affects failover 
> latency.
> By using the Fork-Join framework, I was able to greatly reduce the 
> initialization time.  The following is the test result using the fsimage from 
> one of the big name nodes we have.
> || threads || seconds||
> | 1 (existing) | 55|
> | 1 (fork-join) | 68 |
> | 4 | 16 |
> | 8 | 8 |
> | 12 | 6 |
> | 16 | 5 |
> | 20 | 4 |



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12518) Re-encryption should handle task cancellation and progress better

2017-09-20 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12518:


 Summary: Re-encryption should handle task cancellation and 
progress better
 Key: HDFS-12518
 URL: https://issues.apache.org/jira/browse/HDFS-12518
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption
Affects Versions: 3.0.0-beta1
Reporter: Xiao Chen
Assignee: Xiao Chen


Re-encryption should handle task cancellation and progress tracking better in 
general.

In a recent internal report, a canceled re-encryption could lead to the 
progress of the zone being 'Processing' forever. Sending a new cancel command 
would make it complete, but new re-encryptions for the same zone wouldn't work 
because the canceled future is not removed.

This jira proposes to fix that, and enhance the current handling so a new 
command starts from a clean state.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12400) Provide a way for NN to drain the local key cache before re-encryption

2017-09-06 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12400:


 Summary: Provide a way for NN to drain the local key cache before 
re-encryption
 Key: HDFS-12400
 URL: https://issues.apache.org/jira/browse/HDFS-12400
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption
Affects Versions: 3.0.0-beta1
Reporter: Xiao Chen
Assignee: Xiao Chen


In HDFS-12359, a fix for the KMS ACLs required for re-encryption was done. As 
part of the fix,  the following code is used to make sure the local provider 
cache in the NN is drained.
{code:java}
if (dir.getProvider() instanceof CryptoExtension) {
  ((CryptoExtension) dir.getProvider()).drain(keyName);
}
{code}
This doesn't work, because the provider is a {{KeyProviderCryptoExtension}} 
instead of a {{CryptoExtension}} - the former is a composite of the latter.

Unfortunately the unit test didn't catch this, because it conveniently rolled 
the key from the NN's provider.
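Presumably the check needs to target the composite type instead, along these lines (a sketch, not necessarily the committed fix):
{code:java}
// KeyProviderCryptoExtension composes a CryptoExtension and delegates
// drain() to it, so checking for the composite type makes the cast work.
KeyProvider provider = dir.getProvider();
if (provider instanceof KeyProviderCryptoExtension) {
  ((KeyProviderCryptoExtension) provider).drain(keyName);
}
{code}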



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12383) Re-encryption updater should handle canceled tasks

2017-08-31 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12383:


 Summary: Re-encryption updater should handle canceled tasks 
 Key: HDFS-12383
 URL: https://issues.apache.org/jira/browse/HDFS-12383
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption
Affects Versions: 3.0.0-beta1
Reporter: Xiao Chen
Assignee: Xiao Chen


Seen an instance where the re-encryption updater exited due to an exception, 
and later tasks no longer execute. Logs below:
{noformat}
2017-08-31 09:54:08,104 INFO 
org.apache.hadoop.hdfs.server.namenode.EncryptionZoneManager: Zone 
/tmp/encryption-zone-3(16819) is submitted for re-encryption.
2017-08-31 09:54:08,104 INFO 
org.apache.hadoop.hdfs.server.namenode.ReencryptionHandler: Executing 
re-encrypt commands on zone 16819. Current zones:[zone:16787 state:Completed 
lastProcessed:null filesReencrypted:1 fileReencryptionFailures:0][zone:16813 
state:Completed lastProcessed:null filesReencrypted:1 
fileReencryptionFailures:0][zone:16819 state:Submitted lastProcessed:null 
filesReencrypted:0 fileReencryptionFailures:0]
2017-08-31 09:54:08,105 INFO 
org.apache.hadoop.hdfs.protocol.ReencryptionStatus: Zone 16819 starts 
re-encryption processing
2017-08-31 09:54:08,105 INFO 
org.apache.hadoop.hdfs.server.namenode.ReencryptionHandler: Re-encrypting zone 
/tmp/encryption-zone-3(id=16819)
2017-08-31 09:54:08,105 INFO 
org.apache.hadoop.hdfs.server.namenode.ReencryptionHandler: Submitted batch 
(start:/tmp/encryption-zone-3/data1, size:1) of zone 16819 to re-encrypt.
2017-08-31 09:54:08,105 INFO 
org.apache.hadoop.hdfs.server.namenode.ReencryptionHandler: Submission 
completed of zone 16819 for re-encryption.
2017-08-31 09:54:08,105 INFO 
org.apache.hadoop.hdfs.server.namenode.ReencryptionHandler: Processing batched 
re-encryption for zone 16819, batch size 1, start:/tmp/encryption-zone-3/data1
2017-08-31 09:54:08,979 INFO BlockStateChange: BLOCK* BlockManager: ask 
172.26.1.71:20002 to delete [blk_1073742291_1467]
2017-08-31 09:54:18,295 INFO 
org.apache.hadoop.hdfs.server.namenode.ReencryptionUpdater: Cancelling 1 
re-encryption tasks
2017-08-31 09:54:18,295 INFO 
org.apache.hadoop.hdfs.server.namenode.EncryptionZoneManager: Cancelled zone 
/tmp/encryption-zone-3(16819) for re-encryption.
2017-08-31 09:54:18,295 INFO 
org.apache.hadoop.hdfs.protocol.ReencryptionStatus: Zone 16819 completed 
re-encryption.
2017-08-31 09:54:18,296 INFO 
org.apache.hadoop.hdfs.server.namenode.ReencryptionHandler: Completed 
re-encrypting one batch of 1 edeks from KMS, time consumed: 10.19 s, start: 
/tmp/encryption-zone-3/data1.
2017-08-31 09:54:18,296 ERROR 
org.apache.hadoop.hdfs.server.namenode.ReencryptionUpdater: Re-encryption 
updater thread exiting.
java.util.concurrent.CancellationException
at java.util.concurrent.FutureTask.report(FutureTask.java:121)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.hadoop.hdfs.server.namenode.ReencryptionUpdater.takeAndProcessTasks(ReencryptionUpdater.java:404)
at 
org.apache.hadoop.hdfs.server.namenode.ReencryptionUpdater.run(ReencryptionUpdater.java:250)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}

Updater should be fixed to handle cancelled tasks.
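The shape of a fix in the task-draining loop might be (a hedged sketch; batchService, processTask and the loop condition are placeholders for the actual {{ReencryptionUpdater}} structure):
{code:java}
// Inside the updater's processing loop: a cancelled future must not
// propagate a CancellationException out of run() and kill the thread.
while (running) {
  Future<ReencryptionTask> completed = batchService.take();
  try {
    processTask(completed.get());
  } catch (CancellationException ce) {
    LOG.info("Skipping a cancelled re-encryption task.");
  }
}
{code}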



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12378) TestClientProtocolForPipelineRecovery#testZeroByteBlockRecovery fails on trunk

2017-08-30 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12378:


 Summary: 
TestClientProtocolForPipelineRecovery#testZeroByteBlockRecovery fails on trunk
 Key: HDFS-12378
 URL: https://issues.apache.org/jira/browse/HDFS-12378
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Xiao Chen


Saw on 
https://builds.apache.org/job/PreCommit-HDFS-Build/20928/testReport/org.apache.hadoop.hdfs/TestClientProtocolForPipelineRecovery/testZeroByteBlockRecovery/:


Error Message
{noformat}
Failed to replace a bad datanode on the existing pipeline due to no more good 
datanodes being available to try. (Nodes: 
current=[DatanodeInfoWithStorage[127.0.0.1:51925,DS-274e8cc9-280b-4370-b494-6a4f0d67ccf4,DISK]],
 
original=[DatanodeInfoWithStorage[127.0.0.1:51925,DS-274e8cc9-280b-4370-b494-6a4f0d67ccf4,DISK]]).
 The current failed datanode replacement policy is ALWAYS, and a client may 
configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' 
in its configuration.
{noformat}
Stacktrace
{noformat}
java.io.IOException: Failed to replace a bad datanode on the existing pipeline 
due to no more good datanodes being available to try. (Nodes: 
current=[DatanodeInfoWithStorage[127.0.0.1:51925,DS-274e8cc9-280b-4370-b494-6a4f0d67ccf4,DISK]],
 
original=[DatanodeInfoWithStorage[127.0.0.1:51925,DS-274e8cc9-280b-4370-b494-6a4f0d67ccf4,DISK]]).
 The current failed datanode replacement policy is ALWAYS, and a client may 
configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' 
in its configuration.
at 
org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1322)
at 
org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1388)
at 
org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1587)
at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1488)
at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1470)
at 
org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1274)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:684)
{noformat}
Standard Output
{noformat}
2017-08-30 18:02:37,714 [main] INFO  hdfs.MiniDFSCluster 
(MiniDFSCluster.java:(469)) - starting cluster: numNameNodes=1, 
numDataNodes=3
Formatting using clusterid: testClusterID
2017-08-30 18:02:37,716 [main] INFO  namenode.FSEditLog 
(FSEditLog.java:newInstance(224)) - Edit logging is async:false
2017-08-30 18:02:37,716 [main] INFO  namenode.FSNamesystem 
(FSNamesystem.java:(742)) - KeyProvider: null
2017-08-30 18:02:37,716 [main] INFO  namenode.FSNamesystem 
(FSNamesystemLock.java:(120)) - fsLock is fair: true
2017-08-30 18:02:37,716 [main] INFO  namenode.FSNamesystem 
(FSNamesystemLock.java:(136)) - Detailed lock hold time metrics enabled: 
false
2017-08-30 18:02:37,717 [main] INFO  namenode.FSNamesystem 
(FSNamesystem.java:(763)) - fsOwner = jenkins (auth:SIMPLE)
2017-08-30 18:02:37,717 [main] INFO  namenode.FSNamesystem 
(FSNamesystem.java:(764)) - supergroup  = supergroup
2017-08-30 18:02:37,717 [main] INFO  namenode.FSNamesystem 
(FSNamesystem.java:(765)) - isPermissionEnabled = true
2017-08-30 18:02:37,717 [main] INFO  namenode.FSNamesystem 
(FSNamesystem.java:(776)) - HA Enabled: false
2017-08-30 18:02:37,718 [main] INFO  common.Util 
(Util.java:isDiskStatsEnabled(395)) - 
dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO 
profiling
2017-08-30 18:02:37,718 [main] INFO  blockmanagement.DatanodeManager 
(DatanodeManager.java:(301)) - dfs.block.invalidate.limit: 
configured=1000, counted=60, effected=1000
2017-08-30 18:02:37,718 [main] INFO  blockmanagement.DatanodeManager 
(DatanodeManager.java:(309)) - 
dfs.namenode.datanode.registration.ip-hostname-check=true
2017-08-30 18:02:37,719 [main] INFO  blockmanagement.BlockManager 
(InvalidateBlocks.java:printBlockDeletionTime(76)) - 
dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
2017-08-30 18:02:37,719 [main] INFO  blockmanagement.BlockManager 
(InvalidateBlocks.java:printBlockDeletionTime(82)) - The block deletion will 
start around 2017 Aug 30 18:02:37
2017-08-30 18:02:37,719 [main] INFO  util.GSet 
(LightWeightGSet.java:computeCapacity(395)) - Computing capacity for map 
BlocksMap
2017-08-30 18:02:37,719 [main] INFO  util.GSet 
(LightWeightGSet.java:computeCapacity(396)) - VM type   = 64-bit
2017-08-30 18:02:37,720 [main] INFO  util.GSet 
(LightWeightGSet.java:computeCapacity(397)) - 2.0% max memory 1.8 GB = 36.4 MB
2017-08-30 18:02:37,720 [main] INFO  util.GSet 
(LightWeightGSet.java:computeCapacity(402)) - capacity  = 2^22 = 4194304 
entries
2017-08-30 18:02:37,726 [main] INFO  blockmanagement.BlockManager 
(BlockManager.java:createBlockTo

[jira] [Created] (HDFS-12369) Edit log corruption due to hard lease recovery of not-closed file

2017-08-28 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12369:


 Summary: Edit log corruption due to hard lease recovery of 
not-closed file
 Key: HDFS-12369
 URL: https://issues.apache.org/jira/browse/HDFS-12369
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Xiao Chen
Assignee: Xiao Chen


HDFS-6257 and HDFS-7707 worked hard to prevent corruption from combinations of 
client operations.

Recently, we have observed an NN unable to start with the following exception:
{noformat}
2017-08-17 14:32:18,418 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: 
Failed to start namenode.
java.io.FileNotFoundException: File does not exist: 
/home/Events/CancellationSurvey_MySQL/2015/12/31/.part-0.9nlJ3M
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:429)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:897)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:750)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:318)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1125)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:789)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
{noformat}

Quoting a nice analysis of the edits:
{quote}
In the edits logged about 1 hour later, we see this failing OP_CLOSE. The 
sequence in the edits shows the file going through:

  OPEN
  ADD_BLOCK
  CLOSE
  ADD_BLOCK # perhaps this was an append
  DELETE
  (about 1 hour later) CLOSE

It is interesting that there was no CLOSE logged before the delete.
{quote}

Grepping for that file name, it turns out the close was triggered by the lease 
reaching its hard limit.
{noformat}
2017-08-16 15:05:45,927 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
  Recovering [Lease.  Holder: DFSClient_NONMAPREDUCE_-1997177597_28, pending 
creates: 75], 
  src=/home/Events/CancellationSurvey_MySQL/2015/12/31/.part-0.9nlJ3M
2017-08-16 15:05:45,927 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* 
  internalReleaseLease: All existing blocks are COMPLETE, lease removed, file 
  /home/Events/CancellationSurvey_MySQL/2015/12/31/.part-0.9nlJ3M closed.
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12363) Possible NPE in BlockManager$StorageInfoDefragmenter#scanAndCompactStorages

2017-08-27 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12363:


 Summary: Possible NPE in 
BlockManager$StorageInfoDefragmenter#scanAndCompactStorages
 Key: HDFS-12363
 URL: https://issues.apache.org/jira/browse/HDFS-12363
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Xiao Chen
Assignee: Xiao Chen


Saw NN going down with NPE below:

{noformat}
ERROR org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Thread 
received Runtime exception.
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$StorageInfoDefragmenter.scanAndCompactStorages(BlockManager.java:3897)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$StorageInfoDefragmenter.run(BlockManager.java:3852)
at java.lang.Thread.run(Thread.java:745)
2017-08-21 22:14:05,303 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
status 1
2017-08-21 22:14:05,313 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
{noformat}

In that version, {{BlockManager}} code is:
{code}
3896  try {
3897   DatanodeStorageInfo storage = datanodeManager.
3898 getDatanode(datanodesAndStorages.get(i)).
3899getStorageInfo(datanodesAndStorages.get(i + 1));
3900if (storage != null) {
{code}
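The chained call is the null-unsafe part: {{getDatanode(...)}} can return null if the DN was removed between building {{datanodesAndStorages}} and the scan. A guarded version might look like this (sketch only):
{code:java}
// Null-safe variant: the DN may no longer be registered, so getDatanode
// can return null and must be checked before dereferencing.
DatanodeDescriptor node =
    datanodeManager.getDatanode(datanodesAndStorages.get(i));
DatanodeStorageInfo storage = (node == null) ? null
    : node.getStorageInfo(datanodesAndStorages.get(i + 1));
if (storage != null) {
  // proceed with scan/compact as before
}
{code}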



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12359) Re-encryption should operate with minimum KMS ACL requirements.

2017-08-25 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12359:


 Summary: Re-encryption should operate with minimum KMS ACL 
requirements.
 Key: HDFS-12359
 URL: https://issues.apache.org/jira/browse/HDFS-12359
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: encryption
Affects Versions: 3.0.0-beta1
Reporter: Xiao Chen
Assignee: Xiao Chen


This was caught during KMS ACL testing.

HDFS-10899 gets the current key versions from the KMS directly, which requires 
{{READ}} ACLs.
It also calls invalidateCache, which requires {{MANAGEMENT}} ACLs.

We should fix re-encryption so it does not require additional ACLs beyond 
those of the original encryption.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12347) TestBalancerRPCDelay#testBalancerRPCDelay fails consistently

2017-08-23 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12347:


 Summary: TestBalancerRPCDelay#testBalancerRPCDelay fails 
consistently
 Key: HDFS-12347
 URL: https://issues.apache.org/jira/browse/HDFS-12347
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0-beta1
Reporter: Xiao Chen


Seems to be failing consistently on trunk from yesterday-ish.

A sample failure is 
https://builds.apache.org/job/PreCommit-HDFS-Build/20824/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerRPCDelay/testBalancerRPCDelay/

Running locally failed with:
{noformat}

{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: Why aren't delegation token operations audit logged?

2017-08-15 Thread Xiao Chen
Thanks Allen for bringing more history to this, and Erik for the discussion.

True they're logged in NN logs, but with different log rotation and a different
level of convenience for investigation, so I can see value in adding these.
Will pursue the implementation in HDFS-12300 - more discussions welcome!

-Xiao

On Tue, Aug 15, 2017 at 8:10 AM, Erik Krogen  wrote:

> Given that the current audit log also includes the majority of read-only
> operations (getfileinfo, liststatus, etc.) it seems to me that the audit
> log's purpose has changed to be more of a record of both modifications and
> queries against the file system's metadata. The delegation token related
> operations match closely with what is currently in the audit log. Our team
> was also surprised to find that they were not currently present. Especially
> given that we have HDFS-6888 to limit the size of the audit log by omitting
> common operations, it does not seem harmful to add these token ops.
>
> Erik
>
> On 8/14/17, 5:44 PM, "Allen Wittenauer"  wrote:
>
> [You don't often get email from a...@apache.org. Learn why this is
> important at http://aka.ms/LearnAboutSenderIdentification.]
>
> On 2017-08-14 11:52, Xiao Chen  wrote:
>
> > When inspecting the code, I found that the following methods in
> > FSNamesystem are not audit logged:
>
> ...
>
> > I checked with ATM hoping for some history, but none was known to him.
> > Anyone know the reason these operations are not audit logged?
>
> The audit log was designed for keeping track of things that
> actually change the contents/metadata of the file system. Other HDFS
> operations were getting logged to the NN log or some other more appropriate
> location, to limit the noise.
>
> https://effectivemachines.com/2017/03/08/unofficial-history-
> of-the-hdfs-audit-log/
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>
>
>
> -
> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>
>


Re: Why aren't delegation token operations audit logged?

2017-08-14 Thread Xiao Chen
Thanks a lot Daryn! Filed https://issues.apache.org/jira/browse/HDFS-12300.


-Xiao

On Mon, Aug 14, 2017 at 12:46 PM, Daryn Sharp  wrote:

> I don't think there's a historical reason for not logging token ops, and
> have no objections to logging them – as long as the log line does not
> contain anything like the identifier/password.  My first thought was
> logging overhead but I checked our clusters and the rate of logging would
> be insignificant.
>
> Daryn
>
> On Mon, Aug 14, 2017 at 1:52 PM, Xiao Chen  wrote:
>
>> Hello,
>>
>> When inspecting the code, I found that the following methods in
>> FSNamesystem are not audit logged:
>>
>>- getDelegationToken
>>- renewDelegationToken
>>- cancelDelegationToken
>>
>> The audit log itself does have a logTokenTrackingId
>> <https://github.com/apache/hadoop/blob/branch-3.0.0-alpha4/
>> hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/
>> hadoop/hdfs/server/namenode/FSNamesystem.java#L7432>
>> field
>> to additionally log some details when a token is used for authentication.
>> But why aren't the token operations themselves audit logged?
>>
>> I checked with ATM hoping for some history, but none was known to him. Anyone
>> know the reason these operations are not audit logged?
>>
>> Thanks,
>> -Xiao
>>
>
>


[jira] [Created] (HDFS-12300) Audit-log delegation token related operations

2017-08-14 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12300:


 Summary: Audit-log delegation token related operations
 Key: HDFS-12300
 URL: https://issues.apache.org/jira/browse/HDFS-12300
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 0.22.0
Reporter: Xiao Chen
Assignee: Xiao Chen


When inspecting the code, I found that the following methods in FSNamesystem 
are not audit logged:
- getDelegationToken
- renewDelegationToken
- cancelDelegationToken

The audit log itself does have a logTokenTrackingId field to additionally log 
some details when a token is used for authentication.

After emailing the community, the consensus is that we should audit log these 
operations as well.
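
For illustration, here is a minimal standalone sketch of the proposed pattern 
(not FSNamesystem's actual code; the class, helper, and logger names here are 
invented): wrap each token operation so both success and failure are audit 
logged, without ever logging the token's identifier or password.

{code}
import java.io.IOException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TokenAuditSketch {
  private static final Logger AUDIT =
      LoggerFactory.getLogger("TokenAuditSketch.audit");

  interface TokenOp<T> {
    T run() throws IOException;
  }

  static <T> T withAudit(String cmd, String user, TokenOp<T> op)
      throws IOException {
    boolean success = false;
    try {
      T result = op.run();
      success = true;
      return result;
    } finally {
      // log the outcome only; never the token's identifier or password
      AUDIT.info("allowed={}\tugi={}\tcmd={}", success, user, cmd);
    }
  }

  public static void main(String[] args) throws IOException {
    String token = withAudit("getDelegationToken", "alice", () -> "opaque-token");
    System.out.println("got " + token);
  }
}
{code}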




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Why aren't delegation token operations audit logged?

2017-08-14 Thread Xiao Chen
Hello,

When inspecting the code, I found that the following methods in
FSNamesystem are not audit logged:

   - getDelegationToken
   - renewDelegationToken
   - cancelDelegationToken

The audit log itself does have a logTokenTrackingId field
to additionally log some details when a token is used for authentication.
But why aren't the token operations themselves audit logged?

I checked with ATM hoping for some history, but none was known to him. Anyone
know the reason these operations are not audit logged?

Thanks,
-Xiao


[jira] [Created] (HDFS-12261) Improve HdfsAdmin's javadoc regarding superusers.

2017-08-04 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12261:


 Summary: Improve HdfsAdmin's javadoc regarding superusers.
 Key: HDFS-12261
 URL: https://issues.apache.org/jira/browse/HDFS-12261
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Xiao Chen
Assignee: Xiao Chen
Priority: Minor


From [discussions 
of|https://issues.apache.org/jira/browse/HDFS-10899?focusedCommentId=16113267&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16113267] 
HDFS-10899, we should improve the javadoc accordingly.
{quote}
This is not actually true, this class is just for HDFS-specific operations. 
Putting "Admin" in the name is a misnomer, and since this continues to be 
confusing, maybe we should enhance the class javadoc to make this explicit.
{quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-9590) NPE in Storage$StorageDirectory#unlock()

2017-07-11 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen resolved HDFS-9590.
-
Resolution: Cannot Reproduce

Haven't seen this since, closing...

> NPE in Storage$StorageDirectory#unlock()
> 
>
> Key: HDFS-9590
> URL: https://issues.apache.org/jira/browse/HDFS-9590
> Project: Hadoop HDFS
>  Issue Type: Bug
>    Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9590.01.patch
>
>
> The code looks susceptible to race conditions in multi-threaded runs.
> {code}
> public void unlock() throws IOException {
>   if (this.lock == null)
> return;
>   this.lock.release();
>   lock.channel().close();
>   lock = null;
> }
> {code}
> This is called in a handful of places, and I don't see any protection. Shall 
> we add some synchronization mechanism? Not sure if I missed any design 
> assumptions here.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12116) BlockReportTestBase#blockReport_08 and #blockReport_09 intermittently fail

2017-07-10 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12116:


 Summary: BlockReportTestBase#blockReport_08 and #blockReport_09 
intermittently fail
 Key: HDFS-12116
 URL: https://issues.apache.org/jira/browse/HDFS-12116
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Xiao Chen
Assignee: Xiao Chen


This seems to be long-standing, but the failure rate (~10%) is slightly higher 
in dist-test runs using CDH.
In both _08 and _09 tests:
# an attempt is made to make a replica in {{TEMPORARY}} state, by 
{{waitForTempReplica}}.
# Once that's returned, the test goes on to verify that block reports show the 
correct pending replication blocks.

But there's a race condition. If the replica is replicated between steps #1 and 
#2, {{getPendingReplicationBlocks}} could return 0 or 1, depending on how many 
replicas are replicated, hence failing the test.
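
A minimal sketch of how the test could tolerate the race, assuming 
{{GenericTestUtils.waitFor}} and using a stand-in counter for the metric the 
real test reads from the NameNode: poll until the expected count is observed, 
instead of asserting in the middle of the race window.

{code}
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.hadoop.test.GenericTestUtils;

public class WaitForPendingSketch {
  // stand-in for the pending replication metric queried from the NameNode
  static final AtomicInteger pendingReplicationBlocks = new AtomicInteger(2);

  public static void main(String[] args)
      throws TimeoutException, InterruptedException {
    GenericTestUtils.waitFor(
        () -> pendingReplicationBlocks.get() == 2, // expected count
        100,     // check every 100 ms
        10000);  // give up after 10 s
  }
}
{code}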



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0-alpha4-RC0

2017-07-06 Thread Xiao Chen
Thanks Andrew!
+1 (non-binding)

   - Verified md5's, checked tarball sizes are reasonable
   - Built source tarball and deployed a pseudo-distributed cluster with
   hdfs/kms
   - Tested basic hdfs/kms operations
   - Sanity checked webuis/logs


-Xiao

On Wed, Jul 5, 2017 at 10:33 PM, John Zhuge  wrote:

> +1 (non-binding)
>
>
>- Verified checksums and signatures of the tarballs
>- Built source with native, Java 1.8.0_131 on Mac OS X 10.12.5
>- Cloud connectors:
>   - A few S3A integration tests
>   - A few ADL live unit tests
>- Deployed both binary and built source to a pseudo cluster, passed the
>following sanity tests in insecure, SSL, and SSL+Kerberos mode:
>   - HDFS basic and ACL
>   - DistCp basic
>   - WordCount (skipped in Kerberos mode)
>   - KMS and HttpFS basic
>
> Thanks Andrew for the great effort!
>
> On Wed, Jul 5, 2017 at 1:33 PM, Eric Payne  invalid>
> wrote:
>
> > Thanks Andrew.
> > I downloaded the source, built it, and installed it onto a pseudo
> > distributed 4-node cluster.
> >
> > I ran mapred and streaming test cases, including sleep and wordcount.
> > +1 (non-binding)
> > -Eric
> >
> >   From: Andrew Wang 
> >  To: "common-...@hadoop.apache.org" ; "
> > hdfs-dev@hadoop.apache.org" ; "
> > mapreduce-...@hadoop.apache.org" ; "
> > yarn-...@hadoop.apache.org" 
> >  Sent: Thursday, June 29, 2017 9:41 PM
> >  Subject: [VOTE] Release Apache Hadoop 3.0.0-alpha4-RC0
> >
> > Hi all,
> >
> > As always, thanks to the many, many contributors who helped with this
> > release! I've prepared an RC0 for 3.0.0-alpha4:
> >
> > http://home.apache.org/~wang/3.0.0-alpha4-RC0/
> >
> > The standard 5-day vote would run until midnight on Tuesday, July 4th.
> > Given that July 4th is a holiday in the US, I expect this vote might have
> > to be extended, but I'd like to close the vote relatively soon after.
> >
> > I've done my traditional testing of a pseudo-distributed cluster with a
> > single task pi job, which was successful.
> >
> > Normally my testing would end there, but I'm slightly more confident this
> > time. At Cloudera, we've successfully packaged and deployed a snapshot
> from
> > a few days ago, and run basic smoke tests. Some bugs found from this
> > include HDFS-11956, which fixes backwards compat with Hadoop 2 clients,
> and
> > the revert of HDFS-11696, which broke NN QJM HA setup.
> >
> > Vijay is working on a test run with a fuller test suite (the results of
> > which we can hopefully post soon).
> >
> > My +1 to start,
> >
> > Best,
> > Andrew
> >
> >
> >
> >
>
>
>
> --
> John
>


[jira] [Resolved] (HDFS-12073) Add option to oiv to print out more information about snapshots.

2017-06-30 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen resolved HDFS-12073.
--
Resolution: Duplicate

It seems HDFS-10506 did this; sorry, didn't find it before filing.

> Add option to oiv to print out more information about snapshots.
> 
>
> Key: HDFS-12073
> URL: https://issues.apache.org/jira/browse/HDFS-12073
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 2.6.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>
> OIV can print out snapshot diff sections with filediff/dirdiff nicely. But it 
> does not print out the snapshotCopy object, which can potentially show more 
> information.
> Let's add an option to also print those.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12073) Add option to oiv to print out more information about snapshots.

2017-06-30 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12073:


 Summary: Add option to oiv to print out more information about 
snapshots.
 Key: HDFS-12073
 URL: https://issues.apache.org/jira/browse/HDFS-12073
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: snapshots
Affects Versions: 2.6.0
Reporter: Xiao Chen
Assignee: Xiao Chen


OIV can print out snapshot diff sections with filediff/dirdiff nicely. But it 
does not print out the snapshotCopy object, which can potentially show more 
information.

Let's add an option to also print those.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: Jenkins failures

2017-06-01 Thread Xiao Chen
There were some maven-caused failures last night, which should be fixed by
INFRA-14261.
Saw another error just now, but not sure if that's just my run or something
global.

Thanks,

-Xiao

On Thu, Jun 1, 2017 at 10:12 AM, Anu Engineer 
wrote:

> Scratch that , it looks like Jenkins is just really slow in picking up the
> patches. Failures are all normal.
>
> Thanks
> Anu
>
>
>
>
>
> On 6/1/17, 10:04 AM, "Anu Engineer"  wrote:
>
> >Hi All,
> >
> >Looks like we are having failures in the Jenkins pipeline. Would someone
> with access to build machines be able to take a look ? Not able to see
> human readable build logs from builds.apache.org.
> >I can see a message saying builds have been broken since build #19584.
> >
> >Thanks in advance
> >Anu
> >
>


[jira] [Created] (HDFS-11904) Reuse iip in unprotectedRemoveXAttrs calls

2017-05-30 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-11904:


 Summary: Reuse iip in unprotectedRemoveXAttrs calls
 Key: HDFS-11904
 URL: https://issues.apache.org/jira/browse/HDFS-11904
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Xiao Chen
Assignee: Xiao Chen


In HDFS-10939, {{unprotectedSetXAttrs}} was optimized to use IIP instead of 
path string.
This jira is to do the same on {{unprotectedRemoveXAttrs}}.

No performance test was done specifically for this, but since the optimization 
is trivial, I think we should do it to save future effort.
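
A standalone, simplified sketch of the optimization pattern (the types and 
method names here are stand-ins, not HDFS's actual signatures): resolve the 
path string once, then pass the resolved object to callees.

{code}
public class IipSketch {
  static class INodesInPath {
    final String path;
    INodesInPath(String p) { path = p; }
  }

  // the expensive resolution happens once here
  static INodesInPath resolve(String path) {
    return new INodesInPath(path);
  }

  // before: every callee re-resolves the path string
  static void removeXAttrsByPath(String path) {
    INodesInPath iip = resolve(path);
    System.out.println("removing xattrs under " + iip.path);
  }

  // after: callers resolve once and reuse the iip
  static void removeXAttrs(INodesInPath iip) {
    System.out.println("removing xattrs under " + iip.path);
  }

  public static void main(String[] args) {
    removeXAttrs(resolve("/dir/file"));
  }
}
{code}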



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-11421) Make WebHDFS' ACLs RegEx configurable

2017-05-25 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen resolved HDFS-11421.
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.0

Committed to branch-2, thanks Harsh!

> Make WebHDFS' ACLs RegEx configurable
> -
>
> Key: HDFS-11421
> URL: https://issues.apache.org/jira/browse/HDFS-11421
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.6.0
>Reporter: Harsh J
>Assignee: Harsh J
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: HDFS-11421.000.patch, HDFS-11421-branch-2.000.patch, 
> HDFS-11421.branch-2.001.patch, HDFS-11421.branch-2.003.patch
>
>
> Part of HDFS-5608 added support for GET/SET ACLs over WebHDFS. This currently 
> identifies the passed arguments via a hard-coded regex that mandates certain 
> group and user naming styles.
> A similar limitation had existed before for CHOWN and other User/Group set 
> related operations of WebHDFS, where it was then made configurable via 
> HDFS-11391 + HDFS-4983.
> Such configurability should be allowed for the ACL operations too.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11876) Make WebHDFS' ACLs RegEx configurable Testing

2017-05-24 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-11876:


 Summary: Make WebHDFS' ACLs RegEx configurable Testing
 Key: HDFS-11876
 URL: https://issues.apache.org/jira/browse/HDFS-11876
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Xiao Chen
Assignee: Xiao Chen
Priority: Trivial
 Attachments: HDFS-11421.001.patch

See HDFS-11421, running branch-2 test here.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-11421) Make WebHDFS' ACLs RegEx configurable

2017-05-24 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen reopened HDFS-11421:
--

Thanks for chiming in, folks.

I have returned from paternity leave today, sorry for the delay in the past 
months.

I think the branch-2 patch is still good. It seems to me the patch applies OK 
but fails to compile, due to a line number change and a missing import.

I'm attaching a modified version of the branch-2 patch, which compiles and 
passes changed tests locally.

Given this has been a while, I'd like to be on the safe side and run a more 
thorough branch-2 test - creating a duplicate jira (that doesn't associate with 
github) to do so.

> Make WebHDFS' ACLs RegEx configurable
> -
>
> Key: HDFS-11421
> URL: https://issues.apache.org/jira/browse/HDFS-11421
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.6.0
>Reporter: Harsh J
>Assignee: Harsh J
> Fix For: 3.0.0-alpha3
>
> Attachments: HDFS-11421.000.patch, HDFS-11421-branch-2.000.patch
>
>
> Part of HDFS-5608 added support for GET/SET ACLs over WebHDFS. This currently 
> identifies the passed arguments via a hard-coded regex that mandates certain 
> group and user naming styles.
> A similar limitation had existed before for CHOWN and other User/Group set 
> related operations of WebHDFS, where it was then made configurable via 
> HDFS-11391 + HDFS-4983.
> Such configurability should be allowed for the ACL operations too.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-11379) DFSInputStream may infinite loop requesting block locations

2017-02-14 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen reopened HDFS-11379:
--

Sorry for reopening this. Attaching a branch-2.7 patch.

2.6 seems like it will be EOL'ed soon, so maybe we can skip it (there are also 
more conflicts, seemingly due to the lack of HDFS-7495). Thoughts?

> DFSInputStream may infinite loop requesting block locations
> ---
>
> Key: HDFS-11379
> URL: https://issues.apache.org/jira/browse/HDFS-11379
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha3
>
> Attachments: HDFS-11379.branch-2.patch, HDFS-11379.trunk.patch
>
>
> DFSInputStream creation caches file size and initial range of locations.  If 
> the file is truncated (or replaced) and the client attempts to read outside 
> the initial range, the client goes into a tight infinite loop requesting 
> locations for the nonexistent range.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11410) Use the cache when edit logging XAttrOps

2017-02-13 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-11410:


 Summary: Use the cache when edit logging XAttrOps
 Key: HDFS-11410
 URL: https://issues.apache.org/jira/browse/HDFS-11410
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Xiao Chen
Assignee: Xiao Chen


[~andrew.wang] recently had a comment on HDFS-10899:
{quote}
Looks like we aren't using the op cache in FSEditLog SetXAttrOp / 
RemoveXAttrOp. I think this is accidental, could you do some research? 
Particularly since we'll be doing a lot of SetXAttrOps, avoiding all that 
object allocation would be nice. This could be a separate JIRA.
{quote}

i.e. 
{code}
static SetXAttrOp getInstance() {
  return new SetXAttrOp();
}
{code}
v.s.
{code}
static AddOp getInstance(OpInstanceCache cache) {
  return (AddOp) cache.get(OP_ADD);
}
{code}

Seems we should fix these non-caching usages.
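
A minimal sketch of the cached variants, following the AddOp pattern quoted 
above (the OP_SET_XATTR / OP_REMOVE_XATTR opcode names are assumed):

{code}
static SetXAttrOp getInstance(OpInstanceCache cache) {
  return (SetXAttrOp) cache.get(OP_SET_XATTR);
}

static RemoveXAttrOp getInstance(OpInstanceCache cache) {
  return (RemoveXAttrOp) cache.get(OP_REMOVE_XATTR);
}
{code}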



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Git Infra problem?

2017-01-27 Thread Xiao Chen
Hello,

Saw some weird error when committing a jira this morning. Don't think I
broke it...
Filed https://issues.apache.org/jira/browse/INFRA-13412


Best,
-Xiao


[jira] [Resolved] (HDFS-11372) Increase test timeouts that are too aggressive.

2017-01-27 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen resolved HDFS-11372.
--
Resolution: Duplicate

> Increase test timeouts that are too aggressive.
> ---
>
> Key: HDFS-11372
> URL: https://issues.apache.org/jira/browse/HDFS-11372
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha3
>    Reporter: Xiao Chen
>Priority: Minor
>
> Seen these timeouts in some 
> [precommit|https://issues.apache.org/jira/browse/HDFS-10899?focusedCommentId=15838964&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15838964]
>  false positives; at a brief look I think this is likely due to the timeouts 
> being too small.
> - TestLeaseRecovery2
> - TestDataNodeVolumeFailure 
> Can't seem to find from jenkins which test method is at fault, but 
> TestLeaseRecovery2 has some 30-second timeout cases, and 
> TestDataNodeVolumeFailure has 1 10-second timeout case.
> We should make them at least 2 minutes, or maybe 10x local run time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-8377) Support HTTP/2 in datanode

2017-01-26 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen resolved HDFS-8377.
-
   Resolution: Fixed
Fix Version/s: (was: 3.0.0-alpha1)

Committed the revert to trunk and branch-2. Only keeping 2.8.0 as fix version 
here, to hopefully make it look cleaner.

The TestAclsEndToEnd failure is caused by HADOOP-13988; pinged there.
Thanks, [~Apache9]!

> Support HTTP/2 in datanode
> --
>
> Key: HDFS-8377
> URL: https://issues.apache.org/jira/browse/HDFS-8377
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.8.0
>
> Attachments: HDFS-8377.1.patch, HDFS-8377.2.patch, HDFS-8377.patch, 
> HDFS-8377.revert.branch-2.patch, HDFS-8377.revert.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11372) Increase test timeouts that are too aggressive.

2017-01-25 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-11372:


 Summary: Increase test timeouts that are too aggressive.
 Key: HDFS-11372
 URL: https://issues.apache.org/jira/browse/HDFS-11372
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: test
Affects Versions: 3.0.0-alpha3
Reporter: Xiao Chen
Priority: Minor


Seen these timeouts in some 
[precommit|https://issues.apache.org/jira/browse/HDFS-10899?focusedCommentId=15838964&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15838964]
 false positives; at a brief look I think this is likely due to the timeouts being too small.
- TestLeaseRecovery2
- TestDataNodeVolumeFailure 

Can't seem to find from jenkins which test method is at fault, but 
TestLeaseRecovery2 has some 30-second timeout cases, and 
TestDataNodeVolumeFailure has one 10-second timeout case.

We should make them at least 2 minutes, or maybe 10x local run time.
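
A minimal sketch of the proposed change, using JUnit 4's timeout parameter 
(the test name here is hypothetical):

{code}
import org.junit.Test;

public class TimeoutSketch {
  @Test(timeout = 120 * 1000) // was: timeout = 30 * 1000, too aggressive
  public void testLeaseRecoveryVariant() throws Exception {
    // test body unchanged; only the timeout is raised
  }
}
{code}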



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-8377) Support HTTP/2 in datanode

2017-01-25 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen reopened HDFS-8377:
-

Reopening to run pre-commit.

> Support HTTP/2 in datanode
> --
>
> Key: HDFS-8377
> URL: https://issues.apache.org/jira/browse/HDFS-8377
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HDFS-8377.1.patch, HDFS-8377.2.patch, HDFS-8377.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0-alpha2 RC0

2017-01-25 Thread Xiao Chen
Thanks Andrew and the community to work out the alpha2 RC!

+1 (non-binding)

   - Built the source tarball
   - Tested on a pseudo-distributed cluster, basic HDFS operations/sample
   pi job over HDFS encryption zone work.
   - Sanity checked NN and KMS webui
   - Sanity checked NN/DN/KMS logs.


-Xiao

On Wed, Jan 25, 2017 at 9:41 AM, Zhihai Xu  wrote:

> Thanks Andrew for creating release Hadoop 3.0.0-alpha2 RC0
> +1 ( non-binding)
>
> --Downloaded source and built from it.
> --Deployed on a pseudo-distributed cluster.
> --Ran sample MR jobs and tested with basics HDFS operations.
> --Did a sanity check for RM and NM UI.
>
> Best,
> zhihai
>
> On Wed, Jan 25, 2017 at 8:07 AM, Kuhu Shukla  >
> wrote:
>
> > +1 (non-binding)
> > * Built from source
> > * Deployed on a pseudo-distributed cluster (MAC)
> > * Ran wordcount and sleep jobs.
> >
> >
> > On Wednesday, January 25, 2017 3:21 AM, Marton Elek <
> > me...@hortonworks.com> wrote:
> >
> >
> >  Hi,
> >
> > I also did a quick smoketest with the provided 3.0.0-alpha2 binaries:
> >
> > TLDR; It works well
> >
> > Environment:
> >  * 5 hosts, docker based hadoop cluster, every component in separated
> > container (5 datanode/5 nodemanager/...)
> >  * Components are:
> >   * Hdfs/Yarn cluster (upgraded 2.7.3 to 3.0.0-alpha2 using the binary
> > package for vote)
> >   * Zeppelin 0.6.2/0.7.0-RC2
> >   * Spark 2.0.2/2.1.0
> >   * HBase 1.2.4 + zookeeper
> >   * + additional docker containers for configuration management and
> > monitoring
> > * No HA, no kerberos, no wire encryption
> >
> >  * HDFS cluster upgraded successfully from 2.7.3 (with about 200G data)
> >  * Imported 100G data to HBase successfully
> >  * Started Spark jobs to process 1G json from HDFS (using
> > spark-master/slave cluster). It worked even when I used the Zeppelin
> 0.6.2
> > + Spark 2.0.2 (with old hadoop client included). Obviously the old
> version
> > can't use the new Yarn cluster as the token file format has been changed.
> >  * I upgraded my setup to use Zeppelin 0.7.0-RC2/Spark 2.1.0(distribution
> > without hadoop)/hadoop 3.0.0-alpha2. It also worked well: processed the
> > same json files from HDFS with spark jobs (from zeppelin) using yarn
> > cluster (master: yarn deploy-mode: cluster)
> >  * Started spark jobs (with spark submit, master: yarn) to count records
> > from the hbase database: OK
> >  * Started example Mapreduce jobs from distribution over yarn. It was OK
> > but only with specific configuration (see bellow)
> >
> > So my overall impression that it works very well (at least with my
> > 'smalldata')
> >
> > Some notes (none of them are blocking):
> >
> > 1. To run the example mapreduce jobs I defined HADOOP_MAPRED_HOME at
> > command line:
> > ./bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha2.jar pi -Dyarn.app.mapreduce.am.env="HADOOP_MAPRED_HOME={{HADOOP_COMMON_HOME}}" -Dmapreduce.admin.user.env="HADOOP_MAPRED_HOME={{HADOOP_COMMON_HOME}}" 10 10
> >
> > And in the yarn-site:
> >
> > yarn.nodemanager.env-whitelist: JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,MAPRED_HOME_DIR
> >
> > I don't know the exact reason for the change, but 2.7.3 was more
> > user-friendly, as the example could be run without specific configuration.
> > For the same reason I didn't start the hbase mapreduce job with the hbase
> > command-line app (there may be some option for hbase to define
> > MAPRED_HOME_DIR as well, but by default I got a ClassNotFoundException for
> > one of the MR classes)
> >
> > 2. For the records: The logging and htrace classes are excluded from the
> > shaded hadoop client jar so I added it manually one by one to the spark
> > (spark 2.1.0 distribution without hadoop):
> >
> > RUN wget `cat url` -O spark.tar.gz && tar zxf spark.tar.gz && rm
> > spark.tar.gz && mv spark* spark
> > RUN cp /opt/hadoop/share/hadoop/client/hadoop-client-api-3.0.0-alpha2.jar /opt/spark/jars
> > RUN cp /opt/hadoop/share/hadoop/client/hadoop-client-runtime-3.0.0-alpha2.jar /opt/spark/jars
> > ADD https://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar /opt/spark/jars
> > ADD https://repo1.maven.org/maven2/org/apache/htrace/htrace-core4/4.1.0-incubating/htrace-core4-4.1.0-incubating.jar /opt/spark/jars
> > ADD https://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.10/slf4j-api-1.7.10.jar /opt/spark/jars/
> > ADD https://repo1.maven.org/maven2/log4j/log4j/1.2.17/log4j-1.2.17.jar /opt/spark/jars
> >
> > With these jar files spark 2.1.0 works well with the alpha2 version of
> > HDFS and YARN.
> >
> > 3. The message "Upgrade in progress. Not yet finalized." didn't disappear
> > from the namenode webui, but the cluster works well.
> >
> > Most probably I missed something, but it's a little bit confusing.
> >
> > (I checked the REST call; it is the jmx bean that reports it was not
> > yet finalized, the code

[jira] [Resolved] (HDFS-11366) Clean up old .ckpt files after saveNamespace

2017-01-24 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen resolved HDFS-11366.
--
Resolution: Duplicate

Looks like we already have HDFS-3716 in place to take care of this problem. 
Sorry, didn't find that earlier.

It's more aggressive than proposed here, but since the purge only happens after 
a successful checkpoint, the risk is low.

> Clean up old .ckpt files after saveNamespace
> 
>
> Key: HDFS-11366
> URL: https://issues.apache.org/jira/browse/HDFS-11366
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Affects Versions: 2.6.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>
> Checkpoints are done in the NN by writing to {{fsimage.ckpt_TXID}} files, and 
> renaming them to {{fsimage_TXID}} files upon success.
> If a checkpoint fails half way, the fsimage.ckpt_ file will be left on disk. 
> There is no logic to clean it up at all.
> After talking with [~atm], I understand the historical reason for not 
> immediately cleaning up those files, since they may be useful for disaster 
> recovery.
> But it feels like cleaning up those ckpt files after a successful checkpoint, 
> with a larger TXID threshold, is also safe to do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11366) Clean up old .ckpt files after saveNamespace

2017-01-24 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-11366:


 Summary: Clean up old .ckpt files after saveNamespace
 Key: HDFS-11366
 URL: https://issues.apache.org/jira/browse/HDFS-11366
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs, namenode
Affects Versions: 2.6.0
Reporter: Xiao Chen
Assignee: Xiao Chen


Checkpoints are done in the NN by writing to {{fsimage.ckpt_TXID}} files, and 
renaming them to {{fsimage_TXID}} files upon success.

If a checkpoint fails half way, the fsimage.ckpt_ file will be left on disk. 
There is no logic to clean it up at all.

After talking with [~atm], I understand the historical reason for not 
immediately cleaning up those files, since they may be useful for disaster 
recovery.

But it feels like cleaning up those ckpt files after a successful checkpoint, 
with a larger TXID threshold, is also safe to do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11275) Check groupEntryIndex and throw a helpful exception on failures when removing ACL.

2016-12-27 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-11275:


 Summary: Check groupEntryIndex and throw a helpful exception on 
failures when removing ACL.
 Key: HDFS-11275
 URL: https://issues.apache.org/jira/browse/HDFS-11275
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Xiao Chen
Assignee: Xiao Chen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11210) Enhance key rolling to be atomic

2016-12-05 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-11210:


 Summary: Enhance key rolling to be atomic
 Key: HDFS-11210
 URL: https://issues.apache.org/jira/browse/HDFS-11210
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: encryption, kms
Affects Versions: 2.6.5
Reporter: Xiao Chen
Assignee: Xiao Chen


To support re-encrypting EDEKs, we need to make sure that after a key is rolled, 
no old-version EDEKs are used anymore. This includes the various caches used 
when generating EDEKs.
This is not true currently, simply because there was no such requirement before.

This includes (see the sketch after this list):
- Client Provider(s), and corresponding cache(s).
When LoadBalancingKMSCP is used, we need to clear all KMSCPs.
- KMS server instance(s), and corresponding cache(s).
When KMS HA is configured with multiple KMS instances, only one will receive the 
{{rollNewVersion}} request; we need to make sure the other instances are rolled too.
- The Client instance inside the NN(s), and corresponding cache(s).
When {{hadoop key roll}} succeeds, the client provider inside the NN should be 
drained too.
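
As a sketch of the first, local piece of this, assuming the {{rollNewVersion}} 
and {{drain(keyName)}} calls on KeyProviderCryptoExtension (this only clears 
the local EDEK cache; coordinating the other KMS instances and the NN-side 
provider is the harder part this jira is about):

{code}
import org.apache.hadoop.crypto.key.KeyProvider;
import org.apache.hadoop.crypto.key.KeyProviderCryptoExtension;

public class RollAndDrainSketch {
  static void rollKey(KeyProvider provider, String keyName) throws Exception {
    KeyProviderCryptoExtension kp =
        KeyProviderCryptoExtension.createKeyProviderCryptoExtension(provider);
    kp.rollNewVersion(keyName); // create a new key version on the backing provider
    kp.drain(keyName);          // discard locally cached EDEKs of old versions
    kp.flush();
  }
}
{code}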



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11203) Rename handling during re-encrypt EDEK

2016-12-04 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-11203:


 Summary: Rename handling during re-encrypt EDEK
 Key: HDFS-11203
 URL: https://issues.apache.org/jira/browse/HDFS-11203
 Project: Hadoop HDFS
  Issue Type: Task
  Components: encryption
Reporter: Xiao Chen
Assignee: Xiao Chen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11159) Add reencryptEDEK interface for KMS

2016-11-18 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-11159:


 Summary: Add reencryptEDEK interface for KMS
 Key: HDFS-11159
 URL: https://issues.apache.org/jira/browse/HDFS-11159
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: encryption, kms
Reporter: Xiao Chen
Assignee: Xiao Chen


This is the KMS part. Please refer to HDFS-10899 for the design doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11120) TestEncryptionZones should waitActive

2016-11-08 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-11120:


 Summary: TestEncryptionZones should waitActive
 Key: HDFS-11120
 URL: https://issues.apache.org/jira/browse/HDFS-11120
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: test
Affects Versions: 2.8.0
Reporter: Xiao Chen
Priority: Minor


Happened to notice this.

{{TestEncryptionZones#setup}} didn't {{waitActive}} on the minicluster. There's 
also a test case that does an unnecessary waitActive:
{code}
cluster.restartNameNode(true);
cluster.waitActive();
{code}

We should fix this.
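
A minimal sketch of the proposed fix, assuming MiniDFSCluster's builder API: 
wait for the cluster in setup, and drop the redundant waitActive, since 
{{restartNameNode(true)}} already waits.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class WaitActiveSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build();
    cluster.waitActive();          // the missing call in setup
    cluster.restartNameNode(true); // true already means "wait until active"
    cluster.shutdown();
  }
}
{code}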



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11093) TestEncryptionZones#testStartFileRetry intermittently fails with timeout

2016-11-02 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-11093:


 Summary: TestEncryptionZones#testStartFileRetry intermittently 
fails with timeout
 Key: HDFS-11093
 URL: https://issues.apache.org/jira/browse/HDFS-11093
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Xiao Chen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-11091) Implement a getTrashRoot that does not fall-back

2016-11-01 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-11091:


 Summary: Implement a getTrashRoot that does not fall-back
 Key: HDFS-11091
 URL: https://issues.apache.org/jira/browse/HDFS-11091
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: Xiao Chen


From HDFS-10756's 
[discussion|https://issues.apache.org/jira/browse/HDFS-10756?focusedCommentId=15623755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15623755]:
{{getTrashRoot}} is supposed to return the trash dir, taking encryption zones 
into account. But if an error is encountered (e.g. an access control exception), 
it falls back to the default trash dir.

Although there is a warning message about this, it is still somewhat surprising 
behavior. The fall back was added by HDFS-9799 for compatibility reasons. This 
jira proposes adding a getTrashRoot that throws instead, which will actually be 
more user-friendly.
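
A hypothetical sketch of the proposed behavior (the method name and NN URI are 
invented): resolve the encryption-zone trash root and let exceptions propagate, 
instead of catching them and silently falling back to /user/<name>/.Trash.

{code}
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.protocol.EncryptionZone;
import org.apache.hadoop.security.UserGroupInformation;

public class StrictTrashRootSketch {
  static Path getTrashRootOrThrow(Configuration conf, Path path)
      throws IOException {
    HdfsAdmin admin = new HdfsAdmin(URI.create("hdfs://nn:8020"), conf);
    // may throw e.g. AccessControlException; no silent fall back here
    EncryptionZone ez = admin.getEncryptionZoneForPath(path);
    String user = UserGroupInformation.getCurrentUser().getShortUserName();
    return ez != null
        ? new Path(ez.getPath(), ".Trash/" + user)
        : new Path("/user/" + user + "/.Trash");
  }
}
{code}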



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-10423) Increase default value of httpfs maxHttpHeaderSize

2016-10-20 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen reopened HDFS-10423:
--

> Increase default value of httpfs maxHttpHeaderSize
> --
>
> Key: HDFS-10423
> URL: https://issues.apache.org/jira/browse/HDFS-10423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.6.4, 3.0.0-alpha1
>Reporter: Nicolae Popa
>Assignee: Nicolae Popa
>Priority: Minor
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-10423.01.patch, HDFS-10423.02.patch, 
> HDFS-10423.branch-2.patch, testing-after-HDFS-10423.txt, 
> testing-after-HDFS-10423_withCustomHeader4.txt, 
> testing-before-HDFS-10423.txt
>
>
> The Tomcat default value of maxHttpHeaderSize is 8k, which is too low for 
> certain Hadoop workloads in kerberos-enabled environments. This JIRA will 
> change it to 65536 in server.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: Listing large directories via WebHDFS

2016-10-19 Thread Xiao Chen
Hi Zhe,

Per my understanding, the runner in webhdfs goes to NamenodeWebHdfsMethods,
which eventually calls FSNamesystem#getListing. So it's still throttled on
the NN side. Up for discussion on the DDoS part...

Also, Andrew did some pagination features for webhdfs/httpfs via
https://issues.apache.org/jira/browse/HDFS-10784 and
https://issues.apache.org/jira/browse/HDFS-10823, to provide better control.
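
For completeness, the batched pattern on the RPC side looks roughly like the
sketch below, using the standard FileSystem#listStatusIterator (which pages
through the directory rather than fetching one huge array):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class BatchedListingSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    RemoteIterator<FileStatus> it = fs.listStatusIterator(new Path("/big/dir"));
    while (it.hasNext()) {
      // each underlying batch is a separate NN call, throttling the listing
      System.out.println(it.next().getPath());
    }
  }
}
{code}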

Best,

-Xiao

On Wed, Oct 19, 2016 at 2:08 PM, Zhe Zhang  wrote:

> Hi,
>
> The regular HDFS client (DistributedFileSystem) throttles the workload of
> listing large directories by dividing the work into batches, something like
> below:
> {code}
> // fetch the first batch of entries in the directory
> DirectoryListing thisListing = dfs.listPaths(
> src, HdfsFileStatus.EMPTY_NAME);
>  ..
> if (!thisListing.hasMore()) { // got all entries of the directory
>   FileStatus[] stats = new FileStatus[partialListing.length];
> {code}
>
> However, WebHDFS doesn't seem to have this batching logic.
> {code}
>   @Override
>   public FileStatus[] listStatus(final Path f) throws IOException {
> final HttpOpParam.Op op = GetOpParam.Op.LISTSTATUS;
> return new FsPathResponseRunner(op, f) {
>   @Override
>   FileStatus[] decodeResponse(Map json) {
>   
>   }
> }.run();
>   }
> {code}
>
> Am I missing anything? So a user can DDoS by {{hadoop fs -ls -R /}} via
> WebHDFS?
>


[jira] [Created] (HDFS-11009) Add a tool to reconstruct block meta file from CLI

2016-10-13 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-11009:


 Summary: Add a tool to reconstruct block meta file from CLI
 Key: HDFS-11009
 URL: https://issues.apache.org/jira/browse/HDFS-11009
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Xiao Chen
Assignee: Xiao Chen


If the block file is present on local disk but the meta file is missing, it's 
theoretically possible to manually restore the meta file and make the block 
readable again.
This jira proposes adding such a tool.
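
A rough standalone sketch of why this is feasible (NOT the real .meta layout; 
the actual file has a version and DataChecksum header, and the checksum type 
and chunk size must match the DataNode's configuration): recompute one 
checksum per 512-byte chunk of the block file.

{code}
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

public class MetaSketch {
  public static void main(String[] args) throws IOException {
    byte[] chunk = new byte[512]; // must match the DN's bytes-per-checksum
    try (InputStream in = new FileInputStream(args[0]);
         DataOutputStream out = new DataOutputStream(
             new FileOutputStream(args[0] + ".meta.sketch"))) {
      int n;
      while ((n = in.read(chunk)) > 0) {
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, n);
        out.writeInt((int) crc.getValue()); // one 4-byte checksum per chunk
      }
    }
  }
}
{code}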



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-10963) Reduce log level when network topology cannot find enough datanodes.

2016-10-04 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-10963:


 Summary: Reduce log level when network topology cannot find enough 
datanodes.
 Key: HDFS-10963
 URL: https://issues.apache.org/jira/browse/HDFS-10963
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Xiao Chen
Assignee: Xiao Chen
Priority: Minor
 Attachments: HDFS-10963.01.patch

Credit to [~qwertymaniac] who reported this.
{quote}
The change made by HDFS-10320 now causes all 1-rack clusters to keep printing 
below:
  WARN  org.apache.hadoop.net.NetworkTopology   
  Failed to find datanode (scope="" excludedScope="/default").
{quote}

This was added in HDFS-10320 to replace an exception that caused the NN to 
terminate; a warn log was added instead. But thinking about it more closely, 
this should be logged at debug level to reduce confusion.
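
A minimal sketch of the proposed change (logger and variable names assumed): 
demote the per-call warning to debug so single-rack clusters don't spam the log.

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LogLevelSketch {
  private static final Logger LOG = LoggerFactory.getLogger(LogLevelSketch.class);

  static void reportNoDatanode(String scope, String excludedScope) {
    // before: LOG.warn(...) fired on every failed lookup
    LOG.debug("Failed to find datanode (scope=\"{}\" excludedScope=\"{}\").",
        scope, excludedScope);
  }
}
{code}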



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-10918) Add a tool to get FileEncryptionInfo from CLI

2016-09-27 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-10918:


 Summary: Add a tool to get FileEncryptionInfo from CLI
 Key: HDFS-10918
 URL: https://issues.apache.org/jira/browse/HDFS-10918
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: encryption
Reporter: Xiao Chen
Assignee: Xiao Chen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-10899) Add functionality to re-encrypt EDEKs.

2016-09-23 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-10899:


 Summary: Add functionality to re-encrypt EDEKs.
 Key: HDFS-10899
 URL: https://issues.apache.org/jira/browse/HDFS-10899
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: encryption
Reporter: Xiao Chen
Assignee: Xiao Chen


Currently, when an encryption zone (EZ) key is rotated, it only takes effect 
for new EDEKs. We should provide a util to re-encrypt existing EDEKs after the 
EZ key rotation, for improved security.
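
A minimal sketch of the core primitive such a util would need, assuming a 
{{reencryptEncryptedKey}} call on KeyProviderCryptoExtension (decrypt the EDEK 
with the key version it was encrypted with, then re-encrypt it with the 
current key version, server-side):

{code}
import org.apache.hadoop.crypto.key.KeyProviderCryptoExtension;
import org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.EncryptedKeyVersion;

public class ReencryptSketch {
  static EncryptedKeyVersion reencrypt(KeyProviderCryptoExtension kp,
      EncryptedKeyVersion oldEdek) throws Exception {
    return kp.reencryptEncryptedKey(oldEdek);
  }
}
{code}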



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-10879) TestEncryptionZonesWithKMS#testReadWrite fails intermittently

2016-09-20 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-10879:


 Summary: TestEncryptionZonesWithKMS#testReadWrite fails 
intermittently
 Key: HDFS-10879
 URL: https://issues.apache.org/jira/browse/HDFS-10879
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Xiao Chen
Assignee: Xiao Chen


{noformat}
Error Message:
Key was rolled, versions should be different. Actual: test_key@0

Stack Trace:
java.lang.AssertionError: Key was rolled, versions should be different. Actual: 
test_key@0
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failEquals(Assert.java:185)
at org.junit.Assert.assertNotEquals(Assert.java:161)
at 
org.apache.hadoop.hdfs.TestEncryptionZones.testReadWrite(TestEncryptionZones.java:726)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-10875) Optimize du -x to cache intermediate result

2016-09-19 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-10875:


 Summary: Optimize du -x to cache intermediate result
 Key: HDFS-10875
 URL: https://issues.apache.org/jira/browse/HDFS-10875
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: snapshots
Affects Versions: 2.8.0
Reporter: Xiao Chen
Assignee: Xiao Chen


As [~jingzhao] pointed out in HDFS-8986, we can save a 
{{computeContentSummary4Snapshot}} call in 
{{INodeDirectory#computeContentSummary}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

2016-08-31 Thread Xiao Chen
+1 (non-binding)

Thanks Andrew for putting this up! Super excited to see a Hadoop 3 RC.

Verifications done on OS X:
- Verified md5 on all files
- Spot checked release notes and changes
- Built from source
- Verified LICENSE and NOTICE are correctly contained in the jars built
- Started pseudo-distributed HDFS, verified basic operations work.
- Started KMS service, verified basic KMS operations work, a simple
encryption zone works.
- Sanity checked NN webui.

Best,
-Xiao

On Wed, Aug 31, 2016 at 9:25 PM, Rakesh Radhakrishnan 
wrote:

> Thanks for getting this out.
>
> +1 (non-binding)
>
> - downloaded and built tarball from source
> - deployed HDFS-HA cluster and tested few EC file operations
> - executed few hdfs commands including EC commands
> - viewed basic UI
> - ran some of the sample jobs
>
>
> Best Regards,
> Rakesh
> Intel
>
> On Thu, Sep 1, 2016 at 6:19 AM, John Zhuge  wrote:
>
> > +1 (non-binding)
> >
> > - Build source with Java 1.8.0_101 on Centos 6.6 without native
> > - Verify license and notice using the shell script in HADOOP-13374
> > - Deploy a pseudo cluster
> > - Run basic dfs, distcp, ACL, webhdfs commands
> > - Run MapReduce workcount and pi examples
> > - Run balancer
> >
> > Thanks,
> > John
> >
> > John Zhuge
> > Software Engineer, Cloudera
> >
> > On Wed, Aug 31, 2016 at 11:46 AM, Gangumalla, Uma <
> > uma.ganguma...@intel.com>
> > wrote:
> >
> > > +1 (binding).
> > >
> > > Overall it's a great effort, Andrew. Thank you for putting all the
> > energy.
> > >
> > > Downloaded and built.
> > > Ran some sample jobs.
> > >
> > > I would love to see all this efforts will lead to get the GA from
> Hadoop
> > > 3.X soon.
> > >
> > > Regards,
> > > Uma
> > >
> > >
> > > On 8/30/16, 8:51 AM, "Andrew Wang"  wrote:
> > >
> > > >Hi all,
> > > >
> > > >Thanks to the combined work of many, many contributors, here's an RC0
> > for
> > > >3.0.0-alpha1:
> > > >
> > > >http://home.apache.org/~wang/3.0.0-alpha1-RC0/
> > > >
> > > >alpha1 is the first in a series of planned alpha releases leading up
> to
> > > >GA.
> > > >The objective is to get an artifact out to downstreams for testing and
> > to
> > > >iterate quickly based on their feedback. So, please keep that in mind
> > when
> > > >voting; hopefully most issues can be addressed by future alphas rather
> > > >than
> > > >future RCs.
> > > >
> > > >Sorry for getting this out on a Tuesday, but I'd still like this vote
> to
> > > >run the normal 5 days, thus ending Saturday (9/3) at 9AM PDT. I'll
> > extend
> > > >if we lack the votes.
> > > >
> > > >Please try it out and let me know what you think.
> > > >
> > > >Best,
> > > >Andrew
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
> > >
> > >
> >
>


[jira] [Resolved] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used

2016-08-11 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen resolved HDFS-10757.
--
Resolution: Duplicate

Agree with Arun. Closing this as a dup of HADOOP-13381. Feel free to 
reopen/comment if this is not true.

Thanks for reporting this, [~sershe].

> KMSClientProvider combined with KeyProviderCache can result in wrong UGI 
> being used
> ---
>
> Key: HDFS-10757
> URL: https://issues.apache.org/jira/browse/HDFS-10757
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Critical
>
> ClientContext::get gets the context from CACHE via a name based on a config 
> setting, then the KeyProviderCache stored in ClientContext gets the key 
> provider cached by URI from the configuration, too. These would return the 
> same KeyProvider regardless of the current UGI.
> KMSClientProvider caches the UGI (actualUgi) in ctor; that means in 
> particular that all the users of DFS with KMSClientProvider in a process will 
> get the KMS token (along with other credentials) of the first user, via the 
> above cache.
> Either KMSClientProvider shouldn't store the UGI, or one of the caches should 
> be UGI-aware, like the FS object cache.
> Side note: the comment in createConnection that purports to handle the 
> different UGI doesn't seem to cover what it says it covers. In our case, we 
> have two unrelated UGIs with no auth (createRemoteUser) with a bunch of 
> tokens, including a KMS token, added.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org


