[jira] [Comment Edited] (HDFS-11912) Add a snapshot unit test with randomized file IO operations
[ https://issues.apache.org/jira/browse/HDFS-11912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130850#comment-16130850 ]

George Huang edited comment on HDFS-11912 at 8/17/17 5:42 PM:
--

The test had a 10-minute timeout. However, it took more than 10 minutes to create 5000 test files in HDFS:

2017-08-17 03:12:54,076 [Thread-126] INFO common.Storage (Storage.java:tryLock(847)) - Lock on /testptch/hadoop/hadoop-hdf ...[truncated 9653305 chars]... wed=trueugi=jenkins (auth:SIMPLE) ip=/127.0.0.1 cmd=create src=/WITNESSDIR/1720/1719/1718/1717/1716/1715/1714/1713/file1720 dst=nullperm=jenkins:supergroup:rw-r--r-- proto=rpc
2017-08-17 03:22:52,582 [IPC Server handler 6 on 40751] INFO hdfs.StateChange (FSDirWriteFileOp.java:logAllocatedBlock(787)) - BLOCK* allocate blk_1073745266_4442, replicas=127.0.0.1:43960 for /WITNESSDIR/1720/1719/1718/1717/1716/1715/1714/1713/file1720
2017-08-17 03:22:52,583 [DataXceiver for client DFSClient_NONMAPREDUCE_1460336802_1 at /127.0.0.1:53834 [Receiving block BP-233349655-172.17.0.2-1502939568997:blk_1073745266_4442]] INFO datanode.DataNode (DataXceiver.java:writeBlock(742)) - Receiving BP-233349655-172.17.0.2-1502939568997:blk_1073745266_4442 src: /127.0.0.1:53834 dest: /127.0.0.1:43960
2017-08-17 03:22:52,586 [PacketResponder: BP-233349655-172.17.0.2-1502939568997:blk_1073745266_4442, type=LAST_IN_PIPELINE] INFO DataNode.clienttrace (BlockReceiver.java:finalizeBlock(1523)) - src: /127.0.0.1:53834, dest: /127.0.0.1:43960, bytes: 69, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_1460336802_1, offset: 0, srvID: ae9a2151-cb92-461b-8d73-cd9641184228, blockid: BP-233349655-172.17.0.2-1502939568997:blk_1073745266_4442, duration(ns): 1413179
2017-08-17 03:22:52,586 [PacketResponder: BP-233349655-172.17.0.2-1502939568997:blk_1073745266_4442, type=LAST_IN_PIPELINE] INFO datanode.DataNode (BlockReceiver.java:run(1496)) - PacketResponder: BP-233349655-172.17.0.2-1502939568997:blk_1073745266_4442, type=LAST_IN_PIPELINE terminating
2017-08-17 03:22:52,590 [IPC Server handler 4 on 40751] INFO hdfs.StateChange (FSNamesystem.java:completeFile(2755)) - DIR* completeFile: /WITNESSDIR/1720/1719/1718/1717/1716/1715/1714/1713/file1720 is closed by DFSClient_NONMAPREDUCE_1460336802_1
2017-08-17 03:22:52,600 [IPC Server handler 5 on 40751] INFO FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7512)) - allowed=true ugi=jenkins (auth:SIMPLE) ip=/127.0.0.1 cmd=getfileinfo src=/WITNESSDIR/1720/1719/1718/1717/1716/1715/1714/1713/file1720 dst=nullperm=null proto=rpc
2017-08-17 03:22:52,601 [Thread-173] INFO snapshot.TestRandomOpsWithSnapshots (TestRandomOpsWithSnapshots.java:createFiles(634)) - createFiles, file:

was (Author: ghuangups):
The test had a 10-minute timeout. However, setup-related operations took almost 10 minutes, leaving no time for the test to finish. The test starts at around 03:12, but only reaches the actual test at around 03:22: :(

2017-08-17 03:12:47,798 [main] INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:<init>(469)) - starting cluster: numNameNodes=1, numDataNodes=3
Formatting using clusterid: testClusterID
:
:
2017-08-17 03:12:54,076 [Thread-126] INFO common.Storage (Storage.java:tryLock(847)) - Lock on /testptch/hadoop/hadoop-hdf ...[truncated 9653305 chars]... wed=trueugi=jenkins (auth:SIMPLE) ip=/127.0.0.1 cmd=create src=/WITNESSDIR/1720/1719/1718/1717/1716/1715/1714/1713/file1720 dst=nullperm=jenkins:supergroup:rw-r--r-- proto=rpc
2017-08-17 03:22:52,582 [IPC Server handler 6 on 40751] INFO hdfs.StateChange
:
:

> Add a snapshot unit test with randomized file IO operations
> ---
>
> Key: HDFS-11912
> URL: https://issues.apache.org/jira/browse/HDFS-11912
> Project: Hadoop HDFS
> Issue Type: Test
> Components: hdfs
> Reporter: George Huang
> Assignee: George Huang
> Priority: Minor
> Labels: TestGap
> Attachments: HDFS-11912.001.patch, HDFS-11912.002.patch, HDFS-11912.003.patch, HDFS-11912.004.patch
>
> Adding a snapshot unit test with randomized file IO operations.
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-11912) Add a snapshot unit test with randomized file IO operations
[ https://issues.apache.org/jira/browse/HDFS-11912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129881#comment-16129881 ]

Manoj Govindassamy edited comment on HDFS-11912 at 8/17/17 4:18 AM:

[~ghuangups],
Thanks much for working on the patch revision. Checkstyle issues are mostly fixed. Just the last 4 are pending. The newly added unit test is still failing. Can you please take a look?

was (Author: manojg):
[~ghuangups],
Checkstyle issues are mostly fixed. Just the last 4 are pending. The newly added unit test is still failing. Can you please take a look?

> Add a snapshot unit test with randomized file IO operations
> ---
>
> Key: HDFS-11912
> URL: https://issues.apache.org/jira/browse/HDFS-11912
> Project: Hadoop HDFS
> Issue Type: Test
> Components: hdfs
> Reporter: George Huang
> Assignee: George Huang
> Priority: Minor
> Labels: TestGap
> Attachments: HDFS-11912.001.patch, HDFS-11912.002.patch, HDFS-11912.003.patch, HDFS-11912.004.patch
>
> Adding a snapshot unit test with randomized file IO operations.
[jira] [Comment Edited] (HDFS-11912) Add a snapshot unit test with randomized file IO operations
[ https://issues.apache.org/jira/browse/HDFS-11912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129736#comment-16129736 ]

George Huang edited comment on HDFS-11912 at 8/17/17 1:29 AM:
--

Hi Manoj,

Thank you so much for the comment. The test randomly generates the number of iterations to execute. It may time out if the overall operation takes too long. I've reduced the max number of iterations and run it locally many times without a timeout. Also fixed the checkstyle issues listed.

Many thanks,
George

was (Author: ghuangups):
The test randomly generates the number of iterations for the current run. The test may time out if the overall operation takes too long. I've reduced the max number of iterations and run it locally many times without a timeout.

> Add a snapshot unit test with randomized file IO operations
> ---
>
> Key: HDFS-11912
> URL: https://issues.apache.org/jira/browse/HDFS-11912
> Project: Hadoop HDFS
> Issue Type: Test
> Components: hdfs
> Reporter: George Huang
> Assignee: George Huang
> Priority: Minor
> Labels: TestGap
> Attachments: HDFS-11912.001.patch, HDFS-11912.002.patch, HDFS-11912.003.patch
>
> Adding a snapshot unit test with randomized file IO operations.
[jira] [Comment Edited] (HDFS-11912) Add a snapshot unit test with randomized file IO operations
[ https://issues.apache.org/jira/browse/HDFS-11912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033821#comment-16033821 ]

Manoj Govindassamy edited comment on HDFS-11912 at 6/1/17 10:27 PM:

Thanks for contributing this patch [~ghuangups]. Looks good overall. A few comments from a quick look. Will add more comments later.

In HDFS-9406, snapshot operations were believed to be causing metadata inconsistencies in the fsimage. Can you please try running this new test without the fix for HDFS-9406 and see if it can recreate the problem?

1.
{noformat}
if (randomNum > currentWeightSum &&
    randomNum <= (currentWeightSum + currentValue.getWeight())) {
  snapshotRandomOp = currentValue;
  break;
}
{noformat}
Shouldn't the check be just (randomNum < (currentWeightSum + currentValue.getWeight()))?

2.
{noformat}
private static MiniDFSCluster cluster;
private static DistributedFileSystem hdfs;
private static Random GENERATOR = null;
{noformat}
The above class members need not be static.

3. {{FileSystemOperations}} and {{SnapshotOperations}} are very similar except for enum values and weights. Code duplication here can be avoided if we can merge these two enums into one and expose proper methods.

4.
{noformat}
// Set
Random RANDOM = new Random();
long seed = RANDOM.nextLong();
GENERATOR = new Random(seed);
{noformat}
Is there any specific reason why a simple seed like System.currentTimeMillis() would not work here? Here the seed is generated from a Random, which in turn is not seeded. Also, RANDOM need not be all caps.

5.
{noformat}
int fileLen = new Random().nextInt(MAX_NUM_FILE_LENGTH);
createFiles(testDirString, fileLen);
{noformat}
The GENERATOR random can be used here instead of creating a new one.

6.
{noformat}
// Create files in a directory with random depth, ranging from 0-10.
for (int i = 0; i < TOTAL_BLOCKS; i += fileLength) {
{noformat}
Does TOTAL_BLOCKS here mean total files?

7.
{noformat}
private String GetNewPathString(String originalString,
{noformat}
Method name should be in camel case, like getNewPathString().

> Add a snapshot unit test with randomized file IO operations
> ---
>
> Key: HDFS-11912
> URL: https://issues.apache.org/jira/browse/HDFS-11912
> Project: Hadoop HDFS
> Issue Type: Test
> Components: hdfs
> Reporter: George Huang
> Priority: Minor
> Attachments: HDFS-11912.001.patch
>
> Adding a snapshot unit test with randomized file IO operations.
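
On review comments 1 and 4 above, here is a minimal sketch of weighted random selection driven by a single seeded generator whose seed is printed, so that a failing run could be replayed. This is not the code from the patch: the class name WeightedOpPicker, the Op enum, its values and its weights are all hypothetical. It also assumes randomNum is drawn uniformly from [0, total weight), in which case the single upper-bound check suggested in comment 1 is enough.

```java
import java.util.Random;

public class WeightedOpPicker {

  /** Hypothetical stand-in for the patch's operation enums; values and weights are illustrative. */
  public enum Op {
    CREATE_SNAPSHOT(5), DELETE_SNAPSHOT(3), RENAME_SNAPSHOT(2);

    private final int weight;

    Op(int weight) {
      this.weight = weight;
    }

    int getWeight() {
      return weight;
    }
  }

  /** Picks an op with probability weight / totalWeight, using the caller's generator. */
  public static Op pick(Random generator) {
    int totalWeight = 0;
    for (Op op : Op.values()) {
      totalWeight += op.getWeight();
    }
    // Assumption: randomNum is uniform in [0, totalWeight).
    int randomNum = generator.nextInt(totalWeight);
    int currentWeightSum = 0;
    for (Op op : Op.values()) {
      currentWeightSum += op.getWeight();
      // Only the upper bound needs checking; earlier ranges were already ruled out.
      if (randomNum < currentWeightSum) {
        return op;
      }
    }
    throw new IllegalStateException("no op selected; total weight must be positive");
  }

  public static void main(String[] args) {
    // A plain time-based seed, printed so the same sequence can be reproduced later.
    long seed = System.currentTimeMillis();
    System.out.println("Random seed: " + seed);
    Random generator = new Random(seed);
    for (int i = 0; i < 5; i++) {
      System.out.println(pick(generator));
    }
  }
}
```

Passing the one generator into every helper, rather than calling new Random() inside them as in comment 5, keeps the whole run a function of the printed seed.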