[jira] [Commented] (HDFS-10543) hdfsRead read stops at block boundary
[ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363214#comment-15363214 ]

Colin Patrick McCabe commented on HDFS-10543:
---------------------------------------------

One approach would be to try checking the behavior of the Java client and seeing if you can do something similar. It is not incorrect to avoid short reads, just potentially inefficient.

> hdfsRead read stops at block boundary
> -------------------------------------
>
>                 Key: HDFS-10543
>                 URL: https://issues.apache.org/jira/browse/HDFS-10543
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Xiaowei Zhu
>             Fix For: HDFS-8707
>
>         Attachments: HDFS-10543.HDFS-8707.000.patch, HDFS-10543.HDFS-8707.001.patch, HDFS-10543.HDFS-8707.002.patch, HDFS-10543.HDFS-8707.003.patch, HDFS-10543.HDFS-8707.004.patch
>
>
> Reproducer:
> char *buf2 = new char[file_info->mSize];
> memset(buf2, 0, (size_t)file_info->mSize);
> int ret = hdfsRead(fs, file, buf2, file_info->mSize);
> delete [] buf2;
> if(ret != file_info->mSize) {
>   std::stringstream ss;
>   ss << "tried to read " << file_info->mSize << " bytes. but read " << ret << " bytes";
>   ReportError(ss.str());
>   hdfsCloseFile(fs, file);
>   continue;
> }
> When it runs with a file ~1.4GB large, it will return an error like "tried to read 146890 bytes. but read 134217728 bytes". The HDFS cluster it runs against has a block size of 134217728 bytes. So it seems hdfsRead will stop at a block boundary. Looks like a regression. We should add a retry to continue reading across blocks in case of files with multiple blocks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10555) Unable to loadFSEdits due to a failure in readCachePoolInfo
[ https://issues.apache.org/jira/browse/HDFS-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363005#comment-15363005 ]

Colin Patrick McCabe commented on HDFS-10555:
---------------------------------------------

Thanks, [~umamaheswararao], [~jingzhao], and [~kihwal].

> Unable to loadFSEdits due to a failure in readCachePoolInfo
> -----------------------------------------------------------
>
>                 Key: HDFS-10555
>                 URL: https://issues.apache.org/jira/browse/HDFS-10555
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching, namenode
>    Affects Versions: 2.9.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Critical
>             Fix For: 2.9.0
>
>         Attachments: HDFS-10555-00.patch
>
>
> Recently some tests are failing, unable to loadFSEdits due to a failure in readCachePoolInfo.
> In this code in FSImageSerialization.java:
> {code}
> }
> if ((flags & ~0x2F) != 0) {
>   throw new IOException("Unknown flag in CachePoolInfo: " + flags);
> }
> {code}
> When all values of the CachePool variable are set to true, flags & ~0x2F turns out to be a non-zero value. So this condition fails due to the addition of 0x20 and the change of the mask from ~0x1F to ~0x2F.
> To fix this issue, we can change the mask value to ~0x3F.
[jira] [Commented] (HDFS-10548) Remove the long deprecated BlockReaderRemote
[ https://issues.apache.org/jira/browse/HDFS-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363003#comment-15363003 ]

Colin Patrick McCabe commented on HDFS-10548:
---------------------------------------------

Thanks for tackling this, guys. It is good to see this code duplication finally go away. Next target: {{BlockReaderLocalLegacy}}? I do think renaming {{BlockReaderRemote2}} will make merging code back to branch-2 more difficult -- you might want to reconsider that.

> Remove the long deprecated BlockReaderRemote
> --------------------------------------------
>
>                 Key: HDFS-10548
>                 URL: https://issues.apache.org/jira/browse/HDFS-10548
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>             Fix For: 3.0.0-alpha1
>
>         Attachments: HDFS-10548-v1.patch, HDFS-10548-v2.patch, HDFS-10548-v3.patch
>
>
> To lessen the maintenance burden, as raised in HDFS-8901, we suggest removing the {{BlockReaderRemote}} class that was deprecated a very long time ago.
> From the {{BlockReaderRemote}} header:
> {quote}
> * @deprecated this is an old implementation that is being left around
> * in case any issues spring up with the new {@link BlockReaderRemote2}
> * implementation.
> * It will be removed in the next release.
> {quote}
> From the {{BlockReaderRemote2}} class header:
> {quote}
> * This is a new implementation introduced in Hadoop 0.23 which
> * is more efficient and simpler than the older BlockReader
> * implementation. It should be renamed to BlockReaderRemote
> * once we are confident in it.
> {quote}
> So even further, after getting rid of the old class, we could rename as the comment suggested: BlockReaderRemote2 => BlockReaderRemote.
[jira] [Commented] (HDFS-10543) hdfsRead read stops at block boundary
[ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362997#comment-15362997 ]

Colin Patrick McCabe commented on HDFS-10543:
---------------------------------------------

Just to be clear, the existing HDFS Java client can return "short reads" that are less than what was requested, even when there is more remaining in the file. This is traditional in POSIX, and nearly all filesystems I'm aware of have these semantics. The justification is that applications may not want to wait a long time to fetch more bytes if there are some bytes available already that they can process. Applications that do want the full buffer can just call read() again. APIs like {{readFully}} exist to provide these semantics.

> hdfsRead read stops at block boundary
> -------------------------------------
>
>                 Key: HDFS-10543
>                 URL: https://issues.apache.org/jira/browse/HDFS-10543
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Xiaowei Zhu
>             Fix For: HDFS-8707
>
>         Attachments: HDFS-10543.HDFS-8707.000.patch, HDFS-10543.HDFS-8707.001.patch, HDFS-10543.HDFS-8707.002.patch, HDFS-10543.HDFS-8707.003.patch, HDFS-10543.HDFS-8707.004.patch
>
>
> Reproducer:
> char *buf2 = new char[file_info->mSize];
> memset(buf2, 0, (size_t)file_info->mSize);
> int ret = hdfsRead(fs, file, buf2, file_info->mSize);
> delete [] buf2;
> if(ret != file_info->mSize) {
>   std::stringstream ss;
>   ss << "tried to read " << file_info->mSize << " bytes. but read " << ret << " bytes";
>   ReportError(ss.str());
>   hdfsCloseFile(fs, file);
>   continue;
> }
> When it runs with a file ~1.4GB large, it will return an error like "tried to read 146890 bytes. but read 134217728 bytes". The HDFS cluster it runs against has a block size of 134217728 bytes. So it seems hdfsRead will stop at a block boundary. Looks like a regression. We should add a retry to continue reading across blocks in case of files with multiple blocks.
[jira] [Updated] (HDFS-9805) TCP_NODELAY not set before SASL handshake in data transfer pipeline
[ https://issues.apache.org/jira/browse/HDFS-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-9805: --- Resolution: Fixed Fix Version/s: 3.0.0-alpha1 Status: Resolved (was: Patch Available) > TCP_NODELAY not set before SASL handshake in data transfer pipeline > --- > > Key: HDFS-9805 > URL: https://issues.apache.org/jira/browse/HDFS-9805 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Gary Helmling >Assignee: Gary Helmling > Fix For: 3.0.0-alpha1 > > Attachments: HDFS-9805.002.patch, HDFS-9805.003.patch, > HDFS-9805.004.patch, HDFS-9805.005.patch > > > There are a few places in the DN -> DN block transfer pipeline where > TCP_NODELAY is not set before doing a SASL handshake: > * in {{DataNode.DataTransfer::run()}} > * in {{DataXceiver::replaceBlock()}} > * in {{DataXceiver::writeBlock()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9805) TCP_NODELAY not set before SASL handshake in data transfer pipeline
[ https://issues.apache.org/jira/browse/HDFS-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362983#comment-15362983 ] Colin Patrick McCabe commented on HDFS-9805: Thanks for the reminder, [~jzhuge]. I committed the patch last week, but JIRA went down before I could mark the ticket as resolved. I have committed this to trunk only for the moment. The backport to branch-2 looks like it might be a little tricky, and our next release will be 3.0 anyway. If anyone is interested in backporting to branch-2, please do and update the ticket. Cheers. > TCP_NODELAY not set before SASL handshake in data transfer pipeline > --- > > Key: HDFS-9805 > URL: https://issues.apache.org/jira/browse/HDFS-9805 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Gary Helmling >Assignee: Gary Helmling > Fix For: 3.0.0-alpha1 > > Attachments: HDFS-9805.002.patch, HDFS-9805.003.patch, > HDFS-9805.004.patch, HDFS-9805.005.patch > > > There are a few places in the DN -> DN block transfer pipeline where > TCP_NODELAY is not set before doing a SASL handshake: > * in {{DataNode.DataTransfer::run()}} > * in {{DataXceiver::replaceBlock()}} > * in {{DataXceiver::writeBlock()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10594) HDFS-4949 should support recursive cache directives
[ https://issues.apache.org/jira/browse/HDFS-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-10594:
----------------------------------------
    Summary: HDFS-4949 should support recursive cache directives  (was: CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory)

> HDFS-4949 should support recursive cache directives
> ---------------------------------------------------
>
>                 Key: HDFS-10594
>                 URL: https://issues.apache.org/jira/browse/HDFS-10594
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: caching
>    Affects Versions: 2.7.1
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>         Attachments: HDFS-10594.001.patch
>
>
> In {{CacheReplicationMonitor#rescanCacheDirectives}}, it should recursively rescan the path when the inode of the path is a directory. In this code:
> {code}
> } else if (node.isDirectory()) {
>   INodeDirectory dir = node.asDirectory();
>   ReadOnlyList children = dir.getChildrenList(Snapshot.CURRENT_STATE_ID);
>   for (INode child : children) {
>     if (child.isFile()) {
>       rescanFile(directive, child.asFile());
>     }
>   }
> }
> {code}
> With this logic, some inode files will be ignored when a child inode is itself a directory containing other child inode files. As a result, the child's child files which belong to this path will not be cached.
[jira] [Commented] (HDFS-9700) DFSClient and DFSOutputStream should set TCP_NODELAY on sockets for DataTransferProtocol
[ https://issues.apache.org/jira/browse/HDFS-9700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348970#comment-15348970 ] Colin Patrick McCabe commented on HDFS-9700: Hmm. I think it's confusing to use a configuration key for Hadoop RPC to configure something that isn't Hadoop RPC. We have tons of keys named with {{ipc}} and all of them relate to Hadoop RPC, not to DataTransferProtocol. {{ipc.client.connect.max.retries}}, {{ipc.server.listen.queue.size}}, {{ipc.client.connect.timeout}}, and so forth. There are valid cases where you might want a different configuration for RPC versus datatransferprotocol. For example, conservative users might also want to avoid turning on {{TCP_NODELAY}} for {{DataTransferProtocol}} since it is a new feature, and not as well tested as doing what we do currently. But since we have {{TCP_NODELAY}} on for RPC, they might want to keep that on. I agree that in the long term, {{TCP_NODELAY}} should be used for both. But that's an argument for removing the configuration altogether, not for making it do something other than what it's named. > DFSClient and DFSOutputStream should set TCP_NODELAY on sockets for > DataTransferProtocol > > > Key: HDFS-9700 > URL: https://issues.apache.org/jira/browse/HDFS-9700 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.7.1, 2.6.3 >Reporter: Gary Helmling >Assignee: Gary Helmling > Fix For: 2.8.0 > > Attachments: HDFS-9700-branch-2.7.002.patch, > HDFS-9700-branch-2.7.003.patch, HDFS-9700-v1.patch, HDFS-9700-v2.patch, > HDFS-9700.002.patch, HDFS-9700.003.patch, HDFS-9700.004.patch, > HDFS-9700_branch-2.7-v2.patch, HDFS-9700_branch-2.7.patch > > > In {{DFSClient.connectToDN()}} and > {{DFSOutputStream.createSocketForPipeline()}}, we never call > {{setTcpNoDelay()}} on the constructed socket before sending. In both cases, > we should respect the value of ipc.client.tcpnodelay in the configuration. 
> While this applies whether security is enabled or not, it seems to have a > bigger impact on latency when security is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-8940) Support for large-scale multi-tenant inotify service
[ https://issues.apache.org/jira/browse/HDFS-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348554#comment-15348554 ]

Colin Patrick McCabe commented on HDFS-8940:
--------------------------------------------

bq. You mean reading inotify messages from the SbNN? It's a very attractive idea from scalability angle. But how would we handle the staleness? The SbNN could be a few mins behind ANN right?

Sorry for the misunderstanding. I wasn't talking about HDFS HA. The point that I was making is that you don't want a single point of failure in whatever service you are using to fetch the events from HDFS and put them in Kafka. Perhaps you could also execute the code which fetches events in the context of Kafka itself somehow, to avoid creating a new service? I'm not familiar with the programming model there.

> Support for large-scale multi-tenant inotify service
> ----------------------------------------------------
>
>                 Key: HDFS-8940
>                 URL: https://issues.apache.org/jira/browse/HDFS-8940
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>         Attachments: Large-Scale-Multi-Tenant-Inotify-Service.pdf
>
>
> HDFS-6634 provides the core inotify functionality. We would like to extend that to provide a large-scale service that tens of thousands of clients can subscribe to.
[jira] [Commented] (HDFS-10328) Add per-cache-pool default replication num configuration
[ https://issues.apache.org/jira/browse/HDFS-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345791#comment-15345791 ] Colin Patrick McCabe commented on HDFS-10328: - Sorry for the breakage, [~kshukla]. HDFS-10555 should have fixed it-- check it out. > Add per-cache-pool default replication num configuration > > > Key: HDFS-10328 > URL: https://issues.apache.org/jira/browse/HDFS-10328 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching >Reporter: xupeng >Assignee: xupeng >Priority: Minor > Fix For: 2.3.0 > > Attachments: HDFS-10328.001.patch, HDFS-10328.002.patch, > HDFS-10328.003.patch, HDFS-10328.004.patch > > > For now, hdfs cacheadmin can not set a default replication num for cached > directive in the same cachepool. Each cache directive added in the same cache > pool should set their own replication num individually. > Consider this situation, we add daily hive table into cache pool "hive" .Each > time i should set the same replication num for every table directive in the > same cache pool. > I think we should enable setting a default replication num for a cachepool > that every cache directive in the pool can inherit replication configuration > from the pool. Also cache directive can override replication configuration > explicitly by calling "add & modify directive -replication" command from > cli. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-8940) Support for large-scale multi-tenant inotify service
[ https://issues.apache.org/jira/browse/HDFS-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345766#comment-15345766 ]

Colin Patrick McCabe commented on HDFS-8940:
--------------------------------------------

I think Kafka would be a great choice for scaling HDFS inotify. You would probably want an HA service for fetching HDFS inotify messages, and then just put them directly into Kafka. No serialization needed because it's already protobuf.

> Support for large-scale multi-tenant inotify service
> ----------------------------------------------------
>
>                 Key: HDFS-8940
>                 URL: https://issues.apache.org/jira/browse/HDFS-8940
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>         Attachments: Large-Scale-Multi-Tenant-Inotify-Service.pdf
>
>
> HDFS-6634 provides the core inotify functionality. We would like to extend that to provide a large-scale service that tens of thousands of clients can subscribe to.
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340868#comment-15340868 ]

Colin Patrick McCabe commented on HDFS-10301:
---------------------------------------------

The "you" in that sentence was targeted at you, [~shv]. I realized that [~redvine] wrote the patch, but I spoke imprecisely. Sorry for the confusion.

bq. This is her first encounter with HDFS community. Let's try to make it pleasant enough so that she wished to come back and work with us more.

To be honest, I don't think this is a very good newbie JIRA. It is clearly a very controversial issue, and it's also a very difficult piece of code with a lot of subtlety. Since you clearly have strong opinions about this JIRA, I believe it would be more appropriate for you to post patches implementing your ideas yourself. But that is up to you, of course.

> BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10301
>                 URL: https://issues.apache.org/jira/browse/HDFS-10301
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.1
>            Reporter: Konstantin Shvachko
>            Assignee: Vinitha Reddy Gankidi
>            Priority: Critical
>         Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, HDFS-10301.01.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report, and then sends the block report again. The NameNode, while processing these two reports at the same time, can interleave processing storages from different reports. This screws up the blockReportId field, which makes the NameNode think that some storages are zombie. Replicas from zombie storages are immediately removed, causing missing blocks.
[jira] [Commented] (HDFS-10448) CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with replication factors other than 1
[ https://issues.apache.org/jira/browse/HDFS-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340862#comment-15340862 ]

Colin Patrick McCabe commented on HDFS-10448:
---------------------------------------------

Committed to 2.8. Thanks, [~linyiqun]! Sorry for the delays in reviews.

> CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with replication factors other than 1
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10448
>                 URL: https://issues.apache.org/jira/browse/HDFS-10448
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching
>    Affects Versions: 2.7.1
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>             Fix For: 2.8.0
>
>         Attachments: HDFS-10448.001.patch
>
>
> The logic in {{CacheManager#checkLimit}} is not correct. This method does three things:
> First, it computes the needed bytes for the specific path:
> {code}
> CacheDirectiveStats stats = computeNeeded(path, replication);
> {code}
> But the param {{replication}} is not used here, and the bytesNeeded is just one replica's value:
> {code}
> return new CacheDirectiveStats.Builder()
>     .setBytesNeeded(requestedBytes)
>     .setFilesCached(requestedFiles)
>     .build();
> {code}
> Second, it multiplies by the replication when comparing against the limit size, because the method {{computeNeeded}} did not use replication:
> {code}
> pool.getBytesNeeded() + (stats.getBytesNeeded() * replication) > pool.getLimit()
> {code}
> Third, if we find the size is more than the limit value, we print warning info. It is divided by replication here, while {{stats.getBytesNeeded()}} is just one replica's value:
> {code}
> throw new InvalidRequestException("Caching path " + path + " of size "
>     + stats.getBytesNeeded() / replication + " bytes at replication "
>     + replication + " would exceed pool " + pool.getPoolName()
>     + "'s remaining capacity of "
>     + (pool.getLimit() - pool.getBytesNeeded()) + " bytes.");
> {code}
[jira] [Updated] (HDFS-10448) CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with replication factors other than 1
[ https://issues.apache.org/jira/browse/HDFS-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-10448:
----------------------------------------
        Resolution: Fixed
     Fix Version/s: 2.8.0
  Target Version/s: 2.8.0
            Status: Resolved  (was: Patch Available)

> CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with replication factors other than 1
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10448
>                 URL: https://issues.apache.org/jira/browse/HDFS-10448
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching
>    Affects Versions: 2.7.1
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>             Fix For: 2.8.0
>
>         Attachments: HDFS-10448.001.patch
>
>
> The logic in {{CacheManager#checkLimit}} is not correct. This method does three things:
> First, it computes the needed bytes for the specific path:
> {code}
> CacheDirectiveStats stats = computeNeeded(path, replication);
> {code}
> But the param {{replication}} is not used here, and the bytesNeeded is just one replica's value:
> {code}
> return new CacheDirectiveStats.Builder()
>     .setBytesNeeded(requestedBytes)
>     .setFilesCached(requestedFiles)
>     .build();
> {code}
> Second, it multiplies by the replication when comparing against the limit size, because the method {{computeNeeded}} did not use replication:
> {code}
> pool.getBytesNeeded() + (stats.getBytesNeeded() * replication) > pool.getLimit()
> {code}
> Third, if we find the size is more than the limit value, we print warning info. It is divided by replication here, while {{stats.getBytesNeeded()}} is just one replica's value:
> {code}
> throw new InvalidRequestException("Caching path " + path + " of size "
>     + stats.getBytesNeeded() / replication + " bytes at replication "
>     + replication + " would exceed pool " + pool.getPoolName()
>     + "'s remaining capacity of "
>     + (pool.getLimit() - pool.getBytesNeeded()) + " bytes.");
> {code}
[jira] [Updated] (HDFS-10448) CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with replication factors other than 1
[ https://issues.apache.org/jira/browse/HDFS-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-10448:
----------------------------------------
    Summary: CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with replication factors other than 1  (was: CacheManager#checkLimit always assumes a replication factor of 1)

> CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with replication factors other than 1
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10448
>                 URL: https://issues.apache.org/jira/browse/HDFS-10448
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching
>    Affects Versions: 2.7.1
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>         Attachments: HDFS-10448.001.patch
>
>
> The logic in {{CacheManager#checkLimit}} is not correct. This method does three things:
> First, it computes the needed bytes for the specific path:
> {code}
> CacheDirectiveStats stats = computeNeeded(path, replication);
> {code}
> But the param {{replication}} is not used here, and the bytesNeeded is just one replica's value:
> {code}
> return new CacheDirectiveStats.Builder()
>     .setBytesNeeded(requestedBytes)
>     .setFilesCached(requestedFiles)
>     .build();
> {code}
> Second, it multiplies by the replication when comparing against the limit size, because the method {{computeNeeded}} did not use replication:
> {code}
> pool.getBytesNeeded() + (stats.getBytesNeeded() * replication) > pool.getLimit()
> {code}
> Third, if we find the size is more than the limit value, we print warning info. It is divided by replication here, while {{stats.getBytesNeeded()}} is just one replica's value:
> {code}
> throw new InvalidRequestException("Caching path " + path + " of size "
>     + stats.getBytesNeeded() / replication + " bytes at replication "
>     + replication + " would exceed pool " + pool.getPoolName()
>     + "'s remaining capacity of "
>     + (pool.getLimit() - pool.getBytesNeeded()) + " bytes.");
> {code}
[jira] [Commented] (HDFS-10448) CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with replication factors other than 1
[ https://issues.apache.org/jira/browse/HDFS-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340851#comment-15340851 ]

Colin Patrick McCabe commented on HDFS-10448:
---------------------------------------------

Hi [~linyiqun],

Sorry, I misread the patch the first time around. You are indeed changing computeNeeded to take the replication factor into account, which seems like a better way to go. +1

> CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with replication factors other than 1
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10448
>                 URL: https://issues.apache.org/jira/browse/HDFS-10448
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: caching
>    Affects Versions: 2.7.1
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>         Attachments: HDFS-10448.001.patch
>
>
> The logic in {{CacheManager#checkLimit}} is not correct. This method does three things:
> First, it computes the needed bytes for the specific path:
> {code}
> CacheDirectiveStats stats = computeNeeded(path, replication);
> {code}
> But the param {{replication}} is not used here, and the bytesNeeded is just one replica's value:
> {code}
> return new CacheDirectiveStats.Builder()
>     .setBytesNeeded(requestedBytes)
>     .setFilesCached(requestedFiles)
>     .build();
> {code}
> Second, it multiplies by the replication when comparing against the limit size, because the method {{computeNeeded}} did not use replication:
> {code}
> pool.getBytesNeeded() + (stats.getBytesNeeded() * replication) > pool.getLimit()
> {code}
> Third, if we find the size is more than the limit value, we print warning info. It is divided by replication here, while {{stats.getBytesNeeded()}} is just one replica's value:
> {code}
> throw new InvalidRequestException("Caching path " + path + " of size "
>     + stats.getBytesNeeded() / replication + " bytes at replication "
>     + replication + " would exceed pool " + pool.getPoolName()
>     + "'s remaining capacity of "
>     + (pool.getLimit() - pool.getBytesNeeded()) + " bytes.");
> {code}
[jira] [Updated] (HDFS-10328) Add per-cache-pool default replication num configuration
[ https://issues.apache.org/jira/browse/HDFS-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10328: Resolution: Fixed Target Version/s: 2.9.0 Status: Resolved (was: Patch Available) Committed to 2.9. Thanks, [~xupener]. > Add per-cache-pool default replication num configuration > > > Key: HDFS-10328 > URL: https://issues.apache.org/jira/browse/HDFS-10328 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching >Reporter: xupeng >Assignee: xupeng >Priority: Minor > Fix For: 2.3.0 > > Attachments: HDFS-10328.001.patch, HDFS-10328.002.patch, > HDFS-10328.003.patch, HDFS-10328.004.patch > > > For now, hdfs cacheadmin can not set a default replication num for cached > directive in the same cachepool. Each cache directive added in the same cache > pool should set their own replication num individually. > Consider this situation, we add daily hive table into cache pool "hive" .Each > time i should set the same replication num for every table directive in the > same cache pool. > I think we should enable setting a default replication num for a cachepool > that every cache directive in the pool can inherit replication configuration > from the pool. Also cache directive can override replication configuration > explicitly by calling "add & modify directive -replication" command from > cli. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10548) Remove the long deprecated BlockReaderRemote
[ https://issues.apache.org/jira/browse/HDFS-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340836#comment-15340836 ] Colin Patrick McCabe commented on HDFS-10548: - I would love to see this class go away. It is truly a relic of another time, which lasted much longer than it should. I think the only remaining use case for it is when using sockets that don't have associated channels (SOCKS sockets don't, I think?) We should be able to create adaptors for those, though, assuming anyone even uses SOCKS with the DN any more. Unfortunately I don't have a lot of time to review this at the moment, though. > Remove the long deprecated BlockReaderRemote > > > Key: HDFS-10548 > URL: https://issues.apache.org/jira/browse/HDFS-10548 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Kai Zheng >Assignee: Kai Zheng > > To lessen the maintain burden like raised in HDFS-8901, suggest we remove > {{BlockReaderRemote}} class that's deprecated very long time ago. > From {{BlockReaderRemote}} header: > {quote} > * @deprecated this is an old implementation that is being left around > * in case any issues spring up with the new {@link BlockReaderRemote2} > * implementation. > * It will be removed in the next release. > {quote} > From {{BlockReaderRemote2}} class header: > {quote} > * This is a new implementation introduced in Hadoop 0.23 which > * is more efficient and simpler than the older BlockReader > * implementation. It should be renamed to BlockReaderRemote > * once we are confident in it. > {quote} > So even further, after getting rid of the old class, we could rename as the > comment suggested: BlockReaderRemote2 => BlockReaderRemote. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10328) Add per-cache-pool default replication num configuration
[ https://issues.apache.org/jira/browse/HDFS-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10328: Attachment: (was: HDFS-10328.004.patch) > Add per-cache-pool default replication num configuration > > > Key: HDFS-10328 > URL: https://issues.apache.org/jira/browse/HDFS-10328 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching >Reporter: xupeng >Assignee: xupeng >Priority: Minor > Fix For: 2.3.0 > > Attachments: HDFS-10328.001.patch, HDFS-10328.002.patch, > HDFS-10328.003.patch, HDFS-10328.004.patch > > > For now, hdfs cacheadmin can not set a default replication num for cached > directive in the same cachepool. Each cache directive added in the same cache > pool should set their own replication num individually. > Consider this situation, we add daily hive table into cache pool "hive" .Each > time i should set the same replication num for every table directive in the > same cache pool. > I think we should enable setting a default replication num for a cachepool > that every cache directive in the pool can inherit replication configuration > from the pool. Also cache directive can override replication configuration > explicitly by calling "add & modify directive -replication" command from > cli. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10328) Add per-cache-pool default replication num configuration
[ https://issues.apache.org/jira/browse/HDFS-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10328: Attachment: HDFS-10328.004.patch Reposting patch 004 (and rebasing on trunk) to get a Jenkins run > Add per-cache-pool default replication num configuration > > > Key: HDFS-10328 > URL: https://issues.apache.org/jira/browse/HDFS-10328 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching >Reporter: xupeng >Assignee: xupeng >Priority: Minor > Fix For: 2.3.0 > > Attachments: HDFS-10328.001.patch, HDFS-10328.002.patch, > HDFS-10328.003.patch, HDFS-10328.004.patch, HDFS-10328.004.patch > > > For now, hdfs cacheadmin can not set a default replication num for cached > directive in the same cachepool. Each cache directive added in the same cache > pool should set their own replication num individually. > Consider this situation, we add daily hive table into cache pool "hive" .Each > time i should set the same replication num for every table directive in the > same cache pool. > I think we should enable setting a default replication num for a cachepool > that every cache directive in the pool can inherit replication configuration > from the pool. Also cache directive can override replication configuration > explicitly by calling "add & modify directive -replication" command from > cli. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10328) Add per-cache-pool default replication num configuration
[ https://issues.apache.org/jira/browse/HDFS-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15336775#comment-15336775 ] Colin Patrick McCabe commented on HDFS-10328: - +1 pending jenkins. Thanks, [~xupener]. > Add per-cache-pool default replication num configuration > > > Key: HDFS-10328 > URL: https://issues.apache.org/jira/browse/HDFS-10328 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching >Reporter: xupeng >Assignee: xupeng >Priority: Minor > Fix For: 2.3.0 > > Attachments: HDFS-10328.001.patch, HDFS-10328.002.patch, > HDFS-10328.003.patch, HDFS-10328.004.patch > > > For now, hdfs cacheadmin can not set a default replication num for cached > directive in the same cachepool. Each cache directive added in the same cache > pool should set their own replication num individually. > Consider this situation, we add daily hive table into cache pool "hive" .Each > time i should set the same replication num for every table directive in the > same cache pool. > I think we should enable setting a default replication num for a cachepool > that every cache directive in the pool can inherit replication configuration > from the pool. Also cache directive can override replication configuration > explicitly by calling "add & modify directive -replication" command from > cli. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335100#comment-15335100 ] Colin Patrick McCabe commented on HDFS-10301: -
{code}
+    if (context.getTotalRpcs() == context.getCurRpc() + 1) {
+      long leaseId = this.getBlockReportLeaseManager().removeLease(node);
+      BlockManagerFaultInjector.getInstance().
+          removeBlockReportLease(node, leaseId);
     }
+    LOG.debug("Processing RPC with index " + context.getCurRpc()
+        + " out of total " + context.getTotalRpcs() + " RPCs in "
+        + "processReport 0x" +
+        Long.toHexString(context.getReportId()));
   }
{code}
This won't work in the presence of reordered RPCs. If the RPCs are reordered so that curRpc 1 arrives before curRpc 0, the lease will be removed and RPC 0 will be rejected.
{code}
    for (int r = 0; r < reports.length; r++) {
      final BlockListAsLongs blocks = reports[r].getBlocks();
      if (blocks != BlockListAsLongs.STORAGE_REPORT_ONLY) {
{code}
Using object equality to compare two {{BlockListAsLongs}} objects is very surprising to anyone reading the code. In general, I find the idea of overloading the block list to sometimes not be a block list to be very weird and surprising. If we are going to do it, it certainly needs a lot of comments in the code to explain what's going on. I think it would be clearer and less error-prone just to add an optional list of storage ID strings in the {{.proto}} file.
{code}
    if (nn.getFSImage().isUpgradeFinalized()) {
      Set<String> storageIDsInBlockReport = new HashSet<>();
      if (context.getTotalRpcs() == context.getCurRpc() + 1) {
        for (StorageBlockReport report : reports) {
          storageIDsInBlockReport.add(report.getStorage().getStorageID());
        }
        bm.removeZombieStorages(nodeReg, context, storageIDsInBlockReport);
      }
    }
{code}
This isn't going to work in the presence of reordered RPCs, is it? If curRpc 1 appears before curRpc 0, we'll never get into this clause at all and so zombies won't be removed. 
Considering you are so concerned that my patch didn't solve the interleaved and/or reordered RPC case, this seems like something you should solve. I also don't understand the rationale for ignoring zombies during an upgrade. Keep in mind that zombie storages can lead to data loss under some conditions (see HDFS-7960 for details). > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Vinitha Reddy Gankidi >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, > HDFS-10301.01.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can time out sending a block report. Then it > sends the block report again. Then NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
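The reordering hazard called out in the review comment above can be sketched as a toy model. This is a deliberate simplification for illustration, not the actual NameNode code: the lease is dropped as soon as the RPC whose index equals totalRpcs - 1 is processed, so under reordering a lower-indexed RPC that arrives late finds no lease and is rejected.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the lease check quoted above: the lease is removed as soon as
// the RPC with index totalRpcs - 1 is processed, so any lower-indexed RPC
// that arrives after it is rejected. Names are invented for illustration.
public class ReorderedRpcDemo {

    /** Returns the indices of the report RPCs that are accepted, in arrival order. */
    static List<Integer> accepted(int totalRpcs, int... arrivalOrder) {
        boolean leaseValid = true;
        List<Integer> accepted = new ArrayList<>();
        for (int curRpc : arrivalOrder) {
            if (!leaseValid) {
                continue; // lease already removed: this RPC is rejected
            }
            accepted.add(curRpc);
            if (totalRpcs == curRpc + 1) {
                leaseValid = false; // mirrors the condition in the patch hunk
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(accepted(2, 0, 1)); // in order: [0, 1]
        System.out.println(accepted(2, 1, 0)); // reordered: [1] -- RPC 0 lost
    }
}
```

The same shape of argument applies to the zombie-removal hunk: any logic gated on "this is the last RPC index" silently skips work when the last index is not the last arrival.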
[jira] [Updated] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-9466: --- Resolution: Fixed Fix Version/s: 2.8.0 Target Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to 2.8. Thanks, [~jojochuang]. > TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky > > > Key: HDFS-9466 > URL: https://issues.apache.org/jira/browse/HDFS-9466 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, hdfs-client >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Fix For: 2.8.0 > > Attachments: HDFS-9466.001.patch, HDFS-9466.002.patch, > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache-output.txt > > > This test is flaky and fails quite frequently in trunk. > Error Message > expected:<1> but was:<2> > Stacktrace > {noformat} > java.lang.AssertionError: expected:<1> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache$17.accept(TestShortCircuitCache.java:636) > at > org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.visit(ShortCircuitRegistry.java:395) > at > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.checkNumberOfSegmentsAndSlots(TestShortCircuitCache.java:631) > at > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.testDataXceiverCleansUpSlotsOnFailure(TestShortCircuitCache.java:684) > {noformat} > Thanks to [~xiaochen] for identifying the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10525) Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap
[ https://issues.apache.org/jira/browse/HDFS-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10525: Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) +1. Committed to 2.8. Thanks, [~xiaochen]. > Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap > --- > > Key: HDFS-10525 > URL: https://issues.apache.org/jira/browse/HDFS-10525 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.8.0 >Reporter: Xiao Chen >Assignee: Xiao Chen > Fix For: 2.8.0 > > Attachments: HDFS-10525.01.patch, HDFS-10525.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10525) Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap
[ https://issues.apache.org/jira/browse/HDFS-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333153#comment-15333153 ] Colin Patrick McCabe commented on HDFS-10525: - +1. Thanks, [~xiaochen]. > Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap > --- > > Key: HDFS-10525 > URL: https://issues.apache.org/jira/browse/HDFS-10525 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.8.0 >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-10525.01.patch, HDFS-10525.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10505) OIV's ReverseXML processor should support ACLs
[ https://issues.apache.org/jira/browse/HDFS-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10505: Resolution: Fixed Fix Version/s: 2.8.0 Target Version/s: 2.8.0 Status: Resolved (was: Patch Available) > OIV's ReverseXML processor should support ACLs > -- > > Key: HDFS-10505 > URL: https://issues.apache.org/jira/browse/HDFS-10505 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.8.0 >Reporter: Colin Patrick McCabe >Assignee: Surendra Singh Lilhore > Fix For: 2.8.0 > > Attachments: HDFS-10505-001.patch, HDFS-10505-002.patch > > > OIV's ReverseXML processor should support ACLs. Currently ACLs show up in > the fsimage.xml file, but we don't reconstruct them with ReverseXML. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10505) OIV's ReverseXML processor should support ACLs
[ https://issues.apache.org/jira/browse/HDFS-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333136#comment-15333136 ] Colin Patrick McCabe commented on HDFS-10505: - +1. Thanks, [~surendrasingh] > OIV's ReverseXML processor should support ACLs > -- > > Key: HDFS-10505 > URL: https://issues.apache.org/jira/browse/HDFS-10505 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.8.0 >Reporter: Colin Patrick McCabe >Assignee: Surendra Singh Lilhore > Attachments: HDFS-10505-001.patch, HDFS-10505-002.patch > > > OIV's ReverseXML processor should support ACLs. Currently ACLs show up in > the fsimage.xml file, but we don't reconstruct them with ReverseXML. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10525) Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap
[ https://issues.apache.org/jira/browse/HDFS-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330503#comment-15330503 ] Colin Patrick McCabe commented on HDFS-10525: - +1. Thanks, [~xiaochen]. > Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap > --- > > Key: HDFS-10525 > URL: https://issues.apache.org/jira/browse/HDFS-10525 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.8.0 >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-10525.01.patch, HDFS-10525.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10505) OIV's ReverseXML processor should support ACLs
[ https://issues.apache.org/jira/browse/HDFS-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330492#comment-15330492 ] Colin Patrick McCabe commented on HDFS-10505: - Thanks for this, [~surendrasingh]. It's good to see progress on supporting ACLs here! I am confused by the changes for setting {{latestStringId}} to 1, or special-casing {{null}} in {{registerStringId}}. If we are going to do "magical" things with special indexes in the string table, we need to document it somewhere. Actually, though, I would prefer to simply handle it without the magic. We know that a null entry for an ACL name simply means that the name was an empty string. You can see that in {{AclEntry.java}}: {code} String name = split[index]; if (!name.isEmpty()) { builder.setName(name); } {code} In ReverseXML, we should simply translate these {{null}} ACL names back into empty strings, and then the existing logic for handling the string table would work, with no magic. We also need a test case which has null ACL names, so that this code is being exercised. > OIV's ReverseXML processor should support ACLs > -- > > Key: HDFS-10505 > URL: https://issues.apache.org/jira/browse/HDFS-10505 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.8.0 >Reporter: Colin Patrick McCabe >Assignee: Surendra Singh Lilhore > Attachments: HDFS-10505-001.patch > > > OIV's ReverseXML processor should support ACLs. Currently ACLs show up in > the fsimage.xml file, but we don't reconstruct them with ReverseXML. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
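The suggestion in the comment above (translate a null ACL entry name back into an empty string so the ordinary string-table logic applies, with no magic index) can be sketched as follows. The class and method shown here are invented for illustration and are not the real {{OfflineImageReconstructor}} API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the null-to-empty-string translation suggested above. An unnamed
// ACL entry (null name) and an explicitly empty name intern to the same
// string-table slot, so no special-cased latestStringId handling is needed.
public class AclNameTableDemo {
    private final Map<String, Integer> stringTable = new HashMap<>();
    private int latestStringId = 0;

    /** Interns a name in the string table, treating null as the empty string. */
    int registerStringId(String name) {
        String key = (name == null) ? "" : name;
        return stringTable.computeIfAbsent(key, k -> ++latestStringId);
    }

    public static void main(String[] args) {
        AclNameTableDemo t = new AclNameTableDemo();
        System.out.println(t.registerStringId(null) == t.registerStringId(""));     // true
        System.out.println(t.registerStringId(null) == t.registerStringId("hive")); // false
    }
}
```

This matches the {{AclEntry.java}} behavior quoted above, where an empty name is simply never set on the builder, so null and "" are interchangeable on the way back in.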
[jira] [Commented] (HDFS-10525) Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap
[ https://issues.apache.org/jira/browse/HDFS-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330133#comment-15330133 ] Colin Patrick McCabe commented on HDFS-10525: - Thanks, [~xiaochen]. Can you add a {{LOG.debug}} to the "if" statement that talks about the block ID that is getting skipped? +1 once that's done. > Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap > --- > > Key: HDFS-10525 > URL: https://issues.apache.org/jira/browse/HDFS-10525 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.8.0 >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-10525.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322994#comment-15322994 ] Colin Patrick McCabe commented on HDFS-10301: - [~shv], comments about me "being on a -1 spree" are not constructive and they don't do anything to help the tone of the discussion. We've been talking about this since April and my views have been consistent the whole time. I have a solution, but I am open to other solutions as long as they don't have big disadvantages. bq. The whole approach of keeping the state for the block report processing on the NameNode is error-prone. It assumes at-once execution, and therefore when block reports interleave the BR-state gets messed up. Particularly, the BitSet used to mark storages, which have been processed, can be reset during interleaving multiple times and cannot be used to count storages in the report. In current implementation the messing-up of BR-state leads to false positive detection of a zombie storage and removal of a perfectly valid one. Block report processing is inherently about state. It is inherently stateful. It is a mechanism for the DN to synchronize its entire block state with the block state on the NN. Interleaved block reports are very bad news, even if this bug didn't exist, because they mean that the state on the NN will go "back in time" for some storages, rather than monotonically moving forward in time. This may lead the NN to make incorrect (and potentially irreversible) decisions like deleting a replica somewhere because it appears to exist on an old stale interleaved block report. Keep in mind that these old stale interleaved FBRs will override any incremental BRs that were sent in the meantime! Interleaved block reports also potentially indicate that the DNs are sending new full block reports before the last ones have been processed. 
So either our FBR retransmission mechanism is screwed up and is spewing a firehose of FBRs at an unresponsive NameNode (making it even more unresponsive, no doubt), or the NN can't process an FBR in the extremely long FBR sending period. Both of these explanations mean that you've got a cluster which has serious, serious problems and you should fix it right now. That's the reason why people are not taking this JIRA as seriously as they otherwise might-- because they know that interleaved FBRs mean that something is very wrong. And you are consistently ignoring this feedback and telling us how my patch is bad because it doesn't perform zombie storage elimination when FBRs get interleaved. bq. It seems that you don't or don't want to understand reasoning around adding separate storage reporting RPC call. At least you addressed it only by repeating your -1. For the third time. And did not respond to Zhe Zhang's proposal to merge the storage reporting RPC into one of the storage reports in the next jira. Given that and in order to move forward, we should look into making changes to the last BR RPC call, which should now also report all storages. I am fine with adding storage reporting to any of the existing FBR RPCs. What I am not fine with is adding another RPC which will create more load. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, > HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. 
Then NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
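The interleaving failure described in this issue can be modeled in a few lines. This is a deliberately simplified sketch, not the NameNode implementation: each storage is stamped with the id of the last report that touched it, and any storage whose stamp is not the newest report id is declared a zombie. Class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the zombie-detection bug: a DataNode's report times out and is
// resent with a new id; if the NameNode interleaves the two copies, a live
// storage can end up stamped with the old id and be falsely declared a zombie.
public class InterleavedReportDemo {

    /** Applies (reportId, storageId) events in arrival order; returns false zombies. */
    static List<String> zombies(long[] reportIds, String[] storages) {
        Map<String, Long> lastStamp = new LinkedHashMap<>();
        long newest = Long.MIN_VALUE;
        for (int i = 0; i < storages.length; i++) {
            lastStamp.put(storages[i], reportIds[i]); // stamp with this report's id
            newest = Math.max(newest, reportIds[i]);
        }
        List<String> zombies = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastStamp.entrySet()) {
            if (e.getValue() != newest) {
                zombies.add(e.getKey()); // stale stamp: falsely declared zombie
            }
        }
        return zombies;
    }

    public static void main(String[] args) {
        // Report 1 times out and is resent as report 2; processed back to back.
        System.out.println(zombies(new long[]{1, 1, 2, 2},
                new String[]{"s1", "s2", "s1", "s2"})); // []
        // Same two reports interleaved: s2's last stamp is the old report id.
        System.out.println(zombies(new long[]{1, 2, 2, 1},
                new String[]{"s1", "s1", "s2", "s2"})); // [s2]
    }
}
```

In the second run the perfectly valid storage s2 is reported as a zombie, which is exactly the replica-removal hazard the issue describes.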
[jira] [Commented] (HDFS-10506) OIV's ReverseXML processor cannot reconstruct some snapshot details
[ https://issues.apache.org/jira/browse/HDFS-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321320#comment-15321320 ] Colin Patrick McCabe commented on HDFS-10506: - From {{hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageReconstructor.java}}:
{code}
  private void processDirDiffEntry() throws IOException {
    LOG.debug("Processing dirDiffEntry");
    ...
    // TODO: add missing snapshotCopy field to XML
{code}
{code}
  private void processFileDiffEntry() throws IOException {
    LOG.debug("Processing fileDiffEntry");
    ...
      // TODO: missing snapshotCopy
      // TODO: missing blocks
      fileDiff.verifyNoRemainingKeys("fileDiff");
      bld.build().writeDelimitedTo(out);
    }
    expectTagEnd(SNAPSHOT_DIFF_SECTION_FILE_DIFF_ENTRY);
  }
{code}
> OIV's ReverseXML processor cannot reconstruct some snapshot details > --- > > Key: HDFS-10506 > URL: https://issues.apache.org/jira/browse/HDFS-10506 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.8.0 >Reporter: Colin Patrick McCabe > > OIV's ReverseXML processor cannot reconstruct some snapshot details. > Specifically, <fileDiff> should contain a <snapshotCopy> and <blocks> field, > but does not. <dirDiff> should contain a <snapshotCopy> field. OIV also > needs to be changed to emit these fields into the XML (they are currently > missing). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10505) OIV's ReverseXML processor should support ACLs
[ https://issues.apache.org/jira/browse/HDFS-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321317#comment-15321317 ] Colin Patrick McCabe commented on HDFS-10505: - From {{hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageReconstructor.java}}:
{code}
  private INodeSection.AclFeatureProto.Builder aclXmlToProto(Node acl)
      throws IOException {
    // TODO: support ACLs
    throw new IOException("ACLs are not supported yet.");
  }
{code}
> OIV's ReverseXML processor should support ACLs > -- > > Key: HDFS-10505 > URL: https://issues.apache.org/jira/browse/HDFS-10505 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 2.8.0 >Reporter: Colin Patrick McCabe > > OIV's ReverseXML processor should support ACLs. Currently ACLs show up in > the fsimage.xml file, but we don't reconstruct them with ReverseXML. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10506) OIV's ReverseXML processor cannot reconstruct some snapshot details
Colin Patrick McCabe created HDFS-10506: --- Summary: OIV's ReverseXML processor cannot reconstruct some snapshot details Key: HDFS-10506 URL: https://issues.apache.org/jira/browse/HDFS-10506 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe OIV's ReverseXML processor cannot reconstruct some snapshot details. Specifically, <fileDiff> should contain a <snapshotCopy> and <blocks> field, but does not. <dirDiff> should contain a <snapshotCopy> field. OIV also needs to be changed to emit these fields into the XML (they are currently missing). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10505) OIV's ReverseXML processor should support ACLs
Colin Patrick McCabe created HDFS-10505: --- Summary: OIV's ReverseXML processor should support ACLs Key: HDFS-10505 URL: https://issues.apache.org/jira/browse/HDFS-10505 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe OIV's ReverseXML processor should support ACLs. Currently ACLs show up in the fsimage.xml file, but we don't reconstruct them with ReverseXML. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-8061) Create an Offline FSImage Viewer tool
[ https://issues.apache.org/jira/browse/HDFS-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe resolved HDFS-8061. Resolution: Duplicate Fix Version/s: 2.8.0 Target Version/s: 2.8.0 I believe this is a duplicate of HDFS-9835. Feel free to reopen if there is more here not covered by that JIRA. > Create an Offline FSImage Viewer tool > - > > Key: HDFS-8061 > URL: https://issues.apache.org/jira/browse/HDFS-8061 > Project: Hadoop HDFS > Issue Type: New Feature > Components: namenode >Reporter: Mike Drob >Assignee: Lei (Eddy) Xu > Fix For: 2.8.0 > > > We already have a tool for converting edit logs to and from binary and xml. > The next logical step is to create an `oiv` (offline image viewer) that will > allow users to manipulate the FS Image. > When outputting to text, it might make sense to have two output formats - 1) > an XML that is easier to convert back to binary and 2) something that looks > like the output from the `tree` command. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-8306) Outputs Xattr in OIV XML format
[ https://issues.apache.org/jira/browse/HDFS-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-8306: --- Resolution: Duplicate Fix Version/s: 2.8.0 Target Version/s: 2.8.0 (was: ) Status: Resolved (was: Patch Available) We added xattrs in the OIV XML format in HDFS-9835. > Outputs Xattr in OIV XML format > --- > > Key: HDFS-8306 > URL: https://issues.apache.org/jira/browse/HDFS-8306 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.7.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Fix For: 2.8.0 > > Attachments: HDFS-8306.000.patch, HDFS-8306.001.patch, > HDFS-8306.002.patch, HDFS-8306.003.patch, HDFS-8306.004.patch, > HDFS-8306.005.patch, HDFS-8306.006.patch, HDFS-8306.007.patch, > HDFS-8306.008.patch, HDFS-8306.009.patch, HDFS-8306.debug0.patch, > HDFS-8306.debug1.patch > > > Currently, in the {{hdfs oiv}} XML outputs, not all fields of fsimage are > outputs. It makes inspecting {{fsimage}} from XML outputs less practical. > Also it prevents recovering a fsimage from XML file. > This JIRA is adding ACL and XAttrs in the XML outputs as the first step to > achieve the goal described in HDFS-8061. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317644#comment-15317644 ] Colin Patrick McCabe commented on HDFS-10301: - Sorry for the slow reply. I was on vacation. Like I said earlier, I am -1 on patch v4 because adding new RPCs is bad for NN scalability. I also think it's a much larger patch than needed. It doesn't make sense as an interim solution. Why don't we commit v5 and discuss improvements in a follow-on JIRA? So far there is no concrete argument against it other than the fact that it doesn't remove zombie storages in the case where BRs are interleaved. But we already know that BR interleaving is an extremely rare corner case-- otherwise you can bet that this JIRA would have attracted a lot more attention. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, > HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317629#comment-15317629 ] Colin Patrick McCabe commented on HDFS-9924: +1 for a feature branch > [umbrella] Asynchronous HDFS Access > --- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: AsyncHdfs20160510.pdf > > > This is an umbrella JIRA for supporting Asynchronous HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support asynchronous calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308997#comment-15308997 ] Colin Patrick McCabe commented on HDFS-10301: - bq. Vinitha's patch adds one RPC only in the case when block reports are sent in multiple RPCs. The case where block reports are sent in multiple RPCs is exactly the case where scalability is the most important, since it indicates that we have a large number of blocks. My patch adds no new RPCs. If we are going to take an alternate approach, it should not involve a performance regression. bq. Could you please review the patch. I did review the patch. I suggested adding an optional field in an existing RPC rather than adding a new RPC, and stated that I was -1 on adding new RPC load to the NN. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, > HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308980#comment-15308980 ] Colin Patrick McCabe edited comment on HDFS-9466 at 6/1/16 12:52 AM: - Thanks for the explanation. It sounds like the race condition is that the ShortCircuitRegistry on the DN needs to be informed about the client's decision that short-circuit is not working for the block, and this RPC takes time to arrive. That background process races with completing the TCP read successfully and checking the number of slots in the unit test.
{code}
public static interface Visitor {
-  void accept(HashMap<ShmId, RegisteredShm> segments,
+  boolean accept(HashMap<ShmId, RegisteredShm> segments,
     HashMultimap<ExtendedBlockId, Slot> slots);
}
{code}
I don't think it makes sense to change the return type of the visitor. While you might find a boolean convenient, some other potential users of the interface might not find it useful. Instead, just have your closure modify a {{final MutableBoolean}} declared nearby.
{code}
+}, 100, 1);
{code}
It seems like we could lower the latency here (perhaps check every 10 ms) and lengthen the timeout. Since the test timeouts are generally 60s, I don't think it makes sense to make this timeout shorter than that. +1 once that's addressed. Thanks, [~jojochuang]. Sorry for the delay in reviews. was (Author: cmccabe): Thanks for the explanation. It sounds like the race condition is that the ShortCircuitRegistry on the DN needs to be informed about the client's decision that short-circuit is not working for the block, and this RPC takes time to arrive. That background process races with completing the TCP read successfully and checking the number of slots in the unit test.
{code}
public static interface Visitor {
-  void accept(HashMap<ShmId, RegisteredShm> segments,
+  boolean accept(HashMap<ShmId, RegisteredShm> segments,
     HashMultimap<ExtendedBlockId, Slot> slots);
}
{code}
I don't think it makes sense to change the return type of the visitor. 
While you might find a boolean convenient, some other potential users of the interface would have no use for it. Instead, just have your closure modify a {{final MutableBoolean}} declared nearby. {code} +}, 100, 1); {code} No reason to make this shorter than the test limit, surely? +1 once that's addressed. Thanks, [~jojochuang]. Sorry for the delay in reviews. > TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky > > > Key: HDFS-9466 > URL: https://issues.apache.org/jira/browse/HDFS-9466 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, hdfs-client >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9466.001.patch, HDFS-9466.002.patch, > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache-output.txt > > > This test is flaky and fails quite frequently in trunk. > Error Message > expected:<1> but was:<2> > Stacktrace > {noformat} > java.lang.AssertionError: expected:<1> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache$17.accept(TestShortCircuitCache.java:636) > at > org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.visit(ShortCircuitRegistry.java:395) > at > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.checkNumberOfSegmentsAndSlots(TestShortCircuitCache.java:631) > at > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.testDataXceiverCleansUpSlotsOnFailure(TestShortCircuitCache.java:684) > {noformat} > Thanks to [~xiaochen] for identifying the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
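The fix the review asks for (keep {{accept}} returning void, and have the closure record its result in a mutable holder it captures) can be sketched as below. The Visitor here is a simplified stand-in for the real ShortCircuitRegistry interface, and java.util.concurrent.atomic.AtomicBoolean stands in for the commons-lang {{MutableBoolean}} named in the comment:

```java
import java.util.HashMap;
import java.util.concurrent.atomic.AtomicBoolean;

public class VisitorSketch {
    // Simplified stand-in for the registry's visitor: the return type
    // stays void, as the review requests.
    interface Visitor {
        void accept(HashMap<String, Integer> segments);
    }

    static void visit(Visitor v) {
        HashMap<String, Integer> segments = new HashMap<>();
        segments.put("shm-1", 1);   // pretend one shared-memory segment is registered
        v.accept(segments);
    }

    public static void main(String[] args) {
        // The closure writes its verdict into a captured mutable holder
        // instead of forcing a boolean return on every visitor implementation.
        final AtomicBoolean sawExpectedCount = new AtomicBoolean(false);
        visit(segments -> sawExpectedCount.set(segments.size() == 1));
        System.out.println(sawExpectedCount.get());  // prints "true"
    }
}
```

Other callers of the interface are unaffected, which is the point of the suggestion.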
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308980#comment-15308980 ] Colin Patrick McCabe commented on HDFS-9466: Thanks for the explanation. It sounds like the race condition is that the ShortCircuitRegistry on the DN needs to be informed about the client's decision that short-circuit is not working for the block, and this RPC takes time to arrive. That background process races with completing the TCP read successfully and checking the number of slots in the unit test.
{code}
public static interface Visitor {
-  void accept(HashMap<ShmId, RegisteredShm> segments,
+  boolean accept(HashMap<ShmId, RegisteredShm> segments,
     HashMultimap<ExtendedBlockId, Slot> slots);
}
{code}
I don't think it makes sense to change the return type of the visitor. While you might find a boolean convenient, some other potential users of the interface would have no use for it. Instead, just have your closure modify a {{final MutableBoolean}} declared nearby.
{code}
+}, 100, 1);
{code}
No reason to make this shorter than the test limit, surely? +1 once that's addressed. Thanks, [~jojochuang]. Sorry for the delay in reviews. > TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky > > > Key: HDFS-9466 > URL: https://issues.apache.org/jira/browse/HDFS-9466 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, hdfs-client >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9466.001.patch, HDFS-9466.002.patch, > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache-output.txt > > > This test is flaky and fails quite frequently in trunk. 
> Error Message > expected:<1> but was:<2> > Stacktrace > {noformat} > java.lang.AssertionError: expected:<1> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache$17.accept(TestShortCircuitCache.java:636) > at > org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.visit(ShortCircuitRegistry.java:395) > at > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.checkNumberOfSegmentsAndSlots(TestShortCircuitCache.java:631) > at > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.testDataXceiverCleansUpSlotsOnFailure(TestShortCircuitCache.java:684) > {noformat} > Thanks to [~xiaochen] for identifying the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10415) TestDistributedFileSystem#MyDistributedFileSystem attempts to set up statistics before initialize() is called
[ https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10415: Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to 2.8. > TestDistributedFileSystem#MyDistributedFileSystem attempts to set up > statistics before initialize() is called > - > > Key: HDFS-10415 > URL: https://issues.apache.org/jira/browse/HDFS-10415 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-10415-branch-2.000.patch, > HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch > > > {noformat} > Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem > testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem) Time > elapsed: 0.045 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790) > at > org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417) > at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084) > at > org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187) > at > org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217) > {noformat} > This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other > combinations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10415) TestDistributedFileSystem#MyDistributedFileSystem attempts to set up statistics before initialize() is called
[ https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308928#comment-15308928 ] Colin Patrick McCabe commented on HDFS-10415: - The subclass can change the configuration that gets passed to the superclass.
{code}
class SuperClass {
  SuperClass(Configuration conf) {
    ... initialize superclass part of the object ...
  }
}

class SubClass extends SuperClass {
  SubClass(Configuration conf) {
    super(changeConf(conf));
    ... initialize my part of the object ...
  }

  private static Configuration changeConf(Configuration conf) {
    Configuration nconf = new Configuration(conf);
    nconf.set("foo", "bar");
    return nconf;
  }
}
{code}
Having a separate init() method is a well-known antipattern. Initialization belongs in the constructor. The only time a separate init method is really necessary is if you're using a dialect of C++ that doesn't support exceptions. > TestDistributedFileSystem#MyDistributedFileSystem attempts to set up > statistics before initialize() is called > - > > Key: HDFS-10415 > URL: https://issues.apache.org/jira/browse/HDFS-10415 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Mingliang Liu > Attachments: HDFS-10415-branch-2.000.patch, > HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch > > > {noformat} > Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem > testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem) Time > elapsed: 0.045 sec <<< ERROR! 
> java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790) > at > org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417) > at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084) > at > org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187) > at > org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217) > {noformat} > This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other > combinations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
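The "separate init() is an antipattern" point above can be made concrete: a two-phase init leaves a window in which the object exists but its invariants do not hold, which is the same failure mode as this test's NPE (subclass state touched before initialize() ran). A minimal illustration with hypothetical classes, not Hadoop code:

```java
public class InitSketch {
    // Anti-pattern: a separate init() means the object is visible
    // before it is usable.
    static class WithInit {
        private String name;                        // null until init() is called
        void init(String n) { name = n; }
        int nameLength() { return name.length(); }  // NPE if init() was skipped
    }

    // Preferred: the constructor establishes all invariants up front,
    // so there is no partially-constructed window.
    static class WithCtor {
        private final String name;
        WithCtor(String n) { name = n; }
        int nameLength() { return name.length(); }
    }

    public static void main(String[] args) {
        System.out.println(new WithCtor("hdfs").nameLength());  // prints "4"
        try {
            new WithInit().nameLength();            // forgot init(): fails at the use site
        } catch (NullPointerException e) {
            System.out.println("NPE without init()");
        }
    }
}
```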
[jira] [Comment Edited] (HDFS-10415) TestDistributedFileSystem#MyDistributedFileSystem attempts to set up statistics before initialize() is called
[ https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308928#comment-15308928 ] Colin Patrick McCabe edited comment on HDFS-10415 at 6/1/16 12:09 AM: -- The subclass can change the configuration that gets passed to the superclass. {code} class SuperClass { SuperClass(Configuration conf) { ... initialize superclass part of the object ... } } class SubClass extends SuperClass { SubClass(Configuration conf) { super(changeConf(conf)); ... initialize my part of the object ... } private static Configuration changeConf(Configuration conf) { Configuration nconf = new Configuration(conf); nconf.set("foo", "bar"); return nconf; } } {code} Having a separate init() method is a well-known antipattern. Initialization belongs in the constructor. The only time a separate init method is really necessary is if you're using a dialect of C++ that doesn't support exceptions. was (Author: cmccabe): The subclass can change the configuration that gets passed to the superclass. class SuperClass { SuperClass(Configuration conf) { ... initialize superclass part of the object ... } } class SubClass extends SuperClass { SubClass(Configuration conf) { super(changeConf(conf)); ... initialize my part of the object ... } private static Configuration changeConf(Configuration conf) { Configuration nconf = new Configuration(conf); nconf.set("foo", "bar"); return nconf; } } Having a separate init() method is a well-known antipattern. Initialization belongs in the constructor. The only time a separate init method is really necessary is if you're using a dialect of C++ that doesn't support exceptions. 
> TestDistributedFileSystem#MyDistributedFileSystem attempts to set up > statistics before initialize() is called > - > > Key: HDFS-10415 > URL: https://issues.apache.org/jira/browse/HDFS-10415 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Mingliang Liu > Attachments: HDFS-10415-branch-2.000.patch, > HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch > > > {noformat} > Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem > testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem) Time > elapsed: 0.045 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790) > at > org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417) > at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084) > at > org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187) > at > org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217) > {noformat} > This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other > combinations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10415) TestDistributedFileSystem#MyDistributedFileSystem attempts to set up statistics before initialize() is called
[ https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10415: Summary: TestDistributedFileSystem#MyDistributedFileSystem attempts to set up statistics before initialize() is called (was: TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2) > TestDistributedFileSystem#MyDistributedFileSystem attempts to set up > statistics before initialize() is called > - > > Key: HDFS-10415 > URL: https://issues.apache.org/jira/browse/HDFS-10415 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Mingliang Liu > Attachments: HDFS-10415-branch-2.000.patch, > HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch > > > {noformat} > Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem > testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem) Time > elapsed: 0.045 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790) > at > org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417) > at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084) > at > org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187) > at > org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217) > {noformat} > This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other > combinations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10415) TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2
[ https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308894#comment-15308894 ] Colin Patrick McCabe commented on HDFS-10415: - It sounds like there are no strong objections to HDFS-10415.000.patch and HDFS-10415-branch-2.001.patch. Let's fix this unit test! We can improve this in a follow-on JIRA (personally, I like the idea of adding the initialization to the {{init}} method). But it's not worth blocking the unit test fix. +1. > TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2 > -- > > Key: HDFS-10415 > URL: https://issues.apache.org/jira/browse/HDFS-10415 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Mingliang Liu > Attachments: HDFS-10415-branch-2.000.patch, > HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch > > > {noformat} > Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem > testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem) Time > elapsed: 0.045 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790) > at > org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417) > at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084) > at > org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187) > at > org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217) > {noformat} > This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other > combinations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304702#comment-15304702 ] Colin Patrick McCabe commented on HDFS-10301: - [~redvine], the fact that you are having trouble with stale storages versus zombie storages is because your patch uses a separate mechanism to detect what storages exist on the DN. The existing code doesn't have this problem because the full block report itself acted as the record of what storages existed. This is one negative side effect of the more complex approach. Another negative side effect is that you are transmitting the same information about which storages are present multiple times. Despite these negatives, I'm still willing to review a patch that uses the more complicated method as long as you don't introduce extra RPCs. I agree that we should remove a stale storage if it doesn't appear in the full listing that gets sent. Just to be clear, I am -1 on a patch which adds extra RPCs. Perhaps you can send this listing in an optional field in the first RPC. [~daryn], I don't like the idea of "band-aiding" this issue rather than fixing it at the root. Throwing an exception on interleaved storage reports, or forbidding combined storage reports, seem like very brittle work-arounds that could easily be undone by someone making follow-on changes. I came up with patch 005 and the earlier patches as a very simple fix that could easily be backported. If you are interested in something simple, then please check it out... or at least give a reason for not checking it out. 
> BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, > HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303085#comment-15303085 ] Colin Patrick McCabe commented on HDFS-7240: bq. Correct me if I am wrong – before Andrew Wang's contribution, symlink was somehow working (based on Eli Collins's work). After Andrew's work, we had no choice but disable the symlink feature. It this sense, symlink became even worse. Anyway, Andrew/Eli, any plan to fix symlink? Symlinks were broken before Andrew started working on them. They had serious security, performance, and usability issues. If you are interested in learning more about the issues and helping to fix them, take a look at HADOOP-10019. They were disabled to avoid exposing people to serious security risks. In the meantime, I will note that you were one of the reviewers on the JIRA that initially introduced symlinks, HDFS-245, before Andrew or I had even started working on Hadoop. > Object store in HDFS > > > Key: HDFS-7240 > URL: https://issues.apache.org/jira/browse/HDFS-7240 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jitendra Nath Pandey >Assignee: Jitendra Nath Pandey > Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, > ozone_user_v0.pdf > > > This jira proposes to add object store capabilities into HDFS. > As part of the federation work (HDFS-1052) we separated block storage as a > generic storage layer. Using the Block Pool abstraction, new kinds of > namespaces can be built on top of the storage layer i.e. datanodes. > In this jira I will explore building an object store using the datanode > storage, but independent of namespace metadata. > I will soon update with a detailed design document. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302426#comment-15302426 ] Colin Patrick McCabe commented on HDFS-10301: - I never said that patch 004 introduced incompatible changes. I just argued that it was a bigger change than was necessary to fix the problem. All other things being equal, we would prefer a smaller change to a bigger one. The only argument you have given against my change is that it doesn't fix the problem in the case where full block reports are interleaved. But this is an extremely, extremely rare case, to the point where nobody else has even seen this problem in their cluster. I still think that patch 005 is an easier way to fix the problem. It's basically a simple bugfix to my original patch. However, if you want to do something more complex, I will review it. But I don't want to add any additional RPCs. We already have problems with NameNode performance and we should not be adding more RPCs when it's not needed. We can include the storage information in the first RPC of the block report as an optional field. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, > HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
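The backward-compatible approach suggested above (carry the storage list as an optional field in the first RPC of the block report, rather than adding a new RPC) can be sketched like this. The class and field names below are hypothetical stand-ins, not the actual Hadoop protobuf messages; the point is only that the receiver checks for the field's presence and falls back to the old behavior when an older DataNode omits it:

```java
import java.util.Arrays;
import java.util.List;

public class OptionalFieldSketch {
    // Hypothetical stand-in for an RPC message with an optional field,
    // mimicking protobuf's has-field presence check.
    static class BlockReportRequest {
        final List<String> storageIds;  // null when an older DataNode omits the field
        BlockReportRequest(List<String> storageIds) { this.storageIds = storageIds; }
        boolean hasStorageIds() { return storageIds != null; }
    }

    static String process(BlockReportRequest req) {
        if (!req.hasStorageIds()) {
            // Old software version: keep the existing zombie-detection path.
            return "legacy report: fall back to per-report zombie detection";
        }
        // New software version: the full storage listing arrives in-band,
        // with no extra RPC and no extra NameNode load.
        return "full storage list: " + req.storageIds;
    }

    public static void main(String[] args) {
        System.out.println(process(new BlockReportRequest(null)));
        System.out.println(process(new BlockReportRequest(Arrays.asList("DS-1", "DS-2"))));
    }
}
```

Because the field is optional, mixed-version clusters keep working during a rolling upgrade, which is the compatibility concern raised in the thread.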
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301560#comment-15301560 ] Colin Patrick McCabe commented on HDFS-7240: bq. [~szetszwo] wrote: I seem to recall that you got your committership by contributing the symlink feature, however, the symlink feature is still not working as of today. Why don't you fix it? I think you want to build up a good track record for yourself. [~andrew.wang] did not get his committership by contributing the symlink feature. By the time he was elected as a committer, he had contributed a system for efficiently storing and reporting high-percentile metrics, an API to expose disk location information to advanced HDFS clients, converted all remaining JUnit 3 HDFS tests to JUnit 4, and added symlink support to FileSystem. The last one was just contributing a new API to the FileSystem class, not implementing the symlink feature itself. You are probably thinking of [~eli], who became a committer partly by working on HDFS symlinks. > Object store in HDFS > > > Key: HDFS-7240 > URL: https://issues.apache.org/jira/browse/HDFS-7240 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jitendra Nath Pandey >Assignee: Jitendra Nath Pandey > Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, > ozone_user_v0.pdf > > > This jira proposes to add object store capabilities into HDFS. > As part of the federation work (HDFS-1052) we separated block storage as a > generic storage layer. Using the Block Pool abstraction, new kinds of > namespaces can be built on top of the storage layer i.e. datanodes. > In this jira I will explore building an object store using the datanode > storage, but independent of namespace metadata. > I will soon update with a detailed design document. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298675#comment-15298675 ] Colin Patrick McCabe commented on HDFS-10301: - Oh, sorry! I didn't realize we had added a new rule about attaching patches. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, > HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10301: Attachment: HDFS-10301.005.patch Rebasing patch 003 on trunk. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, > HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298565#comment-15298565 ] Colin Patrick McCabe commented on HDFS-10301: - Hi [~redvine], Thanks for your interest in this. I wish I could get more people interested in this JIRA-- it has been hard to raise interest, unfortunately. Just to clarify, you don't need to assign a JIRA to yourself in order to post a patch or suggest a solution. In general, when someone is actively working on a patch, you should ask before reassigning their JIRAs to yourself. A whole separate RPC just for reporting the storages which are present seems excessive. It will add additional load to the namenode. {code} if (node.leaseId == 0) { - LOG.warn("BR lease 0x{} is not valid for DN {}, because the DN " + - "is not in the pending set.", - Long.toHexString(id), dn.getDatanodeUuid()); - return false; + LOG.debug("DN {} is not in the pending set because BR with lease 0x{} was processed out of order", + dn.getDatanodeUuid(), Long.toHexString(id)); + return true; {code} The leaseId being 0 doesn't mean that the block report was processed out of order. If you manually trigger a block report with the {{hdfs dfsadmin \-triggerBlockReport}} command, it will also have lease id 0. Legacy block reports will also have lease ID 0. In general, your solution doesn't fix the problem during upgrade and is a much bigger patch, which is why I think HDFS-10301.003.patch should be committed and the RPC changes should be done in a follow-on JIRA. I do not see us backporting RPC changes to all the stable branches. 
> BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.01.patch, HDFS-10301.sample.patch, > zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
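The ambiguity discussed above -- a block report lease id of 0 can come from {{hdfs dfsadmin -triggerBlockReport}} or from a legacy DataNode, not only from out-of-order processing -- can be reduced to a small sketch. All names here are illustrative stand-ins, not the actual NameNode lease-manager code:

```java
public class LeaseCheckSketch {
    // Classify a block report by its lease id and whether the DN is in the
    // pending set. This is a toy model of the decision discussed above.
    static String classify(long leaseId, boolean inPendingSet) {
        if (leaseId == 0) {
            // Could be a manually triggered report or a legacy DataNode;
            // NOT proof that an earlier report was processed out of order.
            return "no-lease";
        }
        return inPendingSet ? "valid" : "stale-or-reordered";
    }

    public static void main(String[] args) {
        System.out.println(classify(0L, false));      // manual/legacy report
        System.out.println(classify(0x1234L, true));  // normal leased report
        System.out.println(classify(0x1234L, false)); // possibly reordered
    }
}
```

The point of the sketch is that the first branch is reachable for reasons unrelated to reordering, so treating {{leaseId == 0}} as "processed out of order" conflates distinct cases.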
[jira] [Assigned] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe reassigned HDFS-10301: --- Assignee: Colin Patrick McCabe (was: Vinitha Reddy Gankidi) > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.01.patch, HDFS-10301.sample.patch, > zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10448) CacheManager#checkLimit always assumes a replication factor of 1
[ https://issues.apache.org/jira/browse/HDFS-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298543#comment-15298543 ] Colin Patrick McCabe commented on HDFS-10448: - I think it should change {{computeNeeded}} to take replication into account, rather than modifying the code that calls {{computeNeeded}}. > CacheManager#checkLimit always assumes a replication factor of 1 > - > > Key: HDFS-10448 > URL: https://issues.apache.org/jira/browse/HDFS-10448 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-10448.001.patch > > > The logic in {{CacheManager#checkLimit}} is not correct. In this method, it > does with these three logic: > First, it will compute needed bytes for the specific path. > {code} > CacheDirectiveStats stats = computeNeeded(path, replication); > {code} > But the param {{replication}} is not used here. And the bytesNeeded is just > one replication's vaue. > {code} > return new CacheDirectiveStats.Builder() > .setBytesNeeded(requestedBytes) > .setFilesCached(requestedFiles) > .build(); > {code} > Second, then it should be multiply by the replication to compare the limit > size because the method {{computeNeeded}} was not used replication. > {code} > pool.getBytesNeeded() + (stats.getBytesNeeded() * replication) > > pool.getLimit() > {code} > Third, if we find the size was more than the limit value and then print > warning info. It divided by replication here, while the > {{stats.getBytesNeeded()}} was just one replication value. 
> {code} > throw new InvalidRequestException("Caching path " + path + " of size " > + stats.getBytesNeeded() / replication + " bytes at replication " > + replication + " would exceed pool " + pool.getPoolName() > + "'s remaining capacity of " > + (pool.getLimit() - pool.getBytesNeeded()) + " bytes."); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10448) CacheManager#checkLimit always assumes a replication factor of 1
[ https://issues.apache.org/jira/browse/HDFS-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10448: Summary: CacheManager#checkLimit always assumes a replication factor of 1 (was: CacheManager#checkLimit not correctly) > CacheManager#checkLimit always assumes a replication factor of 1 > - > > Key: HDFS-10448 > URL: https://issues.apache.org/jira/browse/HDFS-10448 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-10448.001.patch > > > The logic in {{CacheManager#checkLimit}} is not correct. In this method, it > does with these three logic: > First, it will compute needed bytes for the specific path. > {code} > CacheDirectiveStats stats = computeNeeded(path, replication); > {code} > But the param {{replication}} is not used here. And the bytesNeeded is just > one replication's vaue. > {code} > return new CacheDirectiveStats.Builder() > .setBytesNeeded(requestedBytes) > .setFilesCached(requestedFiles) > .build(); > {code} > Second, then it should be multiply by the replication to compare the limit > size because the method {{computeNeeded}} was not used replication. > {code} > pool.getBytesNeeded() + (stats.getBytesNeeded() * replication) > > pool.getLimit() > {code} > Third, if we find the size was more than the limit value and then print > warning info. It divided by replication here, while the > {{stats.getBytesNeeded()}} was just one replication value. 
> {code} > throw new InvalidRequestException("Caching path " + path + " of size " > + stats.getBytesNeeded() / replication + " bytes at replication " > + replication + " would exceed pool " + pool.getPoolName() > + "'s remaining capacity of " > + (pool.getLimit() - pool.getBytesNeeded()) + " bytes."); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10448) CacheManager#checkLimit not correctly
[ https://issues.apache.org/jira/browse/HDFS-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297068#comment-15297068 ] Colin Patrick McCabe commented on HDFS-10448: - This is a good find. I think that {{computeNeeded}} should take replication into account-- the fact that it doesn't currently is a bug. Then there would be no need to change the callers of {{computeNeeded}}. > CacheManager#checkLimit not correctly > -- > > Key: HDFS-10448 > URL: https://issues.apache.org/jira/browse/HDFS-10448 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-10448.001.patch > > > The logic in {{CacheManager#checkLimit}} is not correct. In this method, it > does with these three logic: > First, it will compute needed bytes for the specific path. > {code} > CacheDirectiveStats stats = computeNeeded(path, replication); > {code} > But the param {{replication}} is not used here. And the bytesNeeded is just > one replication's vaue. > {code} > return new CacheDirectiveStats.Builder() > .setBytesNeeded(requestedBytes) > .setFilesCached(requestedFiles) > .build(); > {code} > Second, then it should be multiply by the replication to compare the limit > size because the method {{computeNeeded}} was not used replication. > {code} > pool.getBytesNeeded() + (stats.getBytesNeeded() * replication) > > pool.getLimit() > {code} > Third, if we find the size was more than the limit value and then print > warning info. It divided by replication here, while the > {{stats.getBytesNeeded()}} was just one replication value. 
> {code} > throw new InvalidRequestException("Caching path " + path + " of size " > + stats.getBytesNeeded() / replication + " bytes at replication " > + replication + " would exceed pool " + pool.getPoolName() > + "'s remaining capacity of " > + (pool.getLimit() - pool.getBytesNeeded()) + " bytes."); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
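The fix suggested above -- have {{computeNeeded}} account for replication so that callers neither multiply nor divide by it -- can be sketched like this. The names are illustrative stand-ins, not the real {{CacheManager}} API:

```java
public class CacheLimitSketch {
    // Stand-in for the bytes needed by one replica of the cached path.
    static final long BYTES_PER_REPLICA = 100L;

    // Proposed behavior: replication is applied inside computeNeeded.
    static long computeNeeded(short replication) {
        return BYTES_PER_REPLICA * replication;
    }

    static boolean exceedsLimit(long poolBytesNeeded, long poolLimit,
                                short replication) {
        // No extra "* replication" at the call site, and no "/ replication"
        // when reporting the size -- computeNeeded already includes it.
        return poolBytesNeeded + computeNeeded(replication) > poolLimit;
    }

    public static void main(String[] args) {
        // 3 replicas of a 100-byte file need 300 bytes in total.
        System.out.println(exceedsLimit(0L, 250L, (short) 3)); // limit exceeded
        System.out.println(exceedsLimit(0L, 400L, (short) 3)); // fits
    }
}
```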
[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294073#comment-15294073 ] Colin Patrick McCabe commented on HDFS-9924: bq. Good point! On the other hand, programmers who are NOT familiar with Node.js may NOT want something that supports callback chaining. Callback chaining is an optional feature which nobody is forced to use. I don't see why anyone would prefer Future over CompletableFuture. bq. Also, you might not have noticed, supporting Future is a step toward supporting CompletableFuture. I don't see why supporting Future is a step towards supporting a different API. I think Hadoop has too many APIs with duplicate functionality already, and we should try to minimize the cognitive load on new developers. bq. I guess you might have misunderstood the release process. The release manager could include/exclude any feature as she/he pleases. Which 2.x release do you want this to become part of? > [umbrella] Asynchronous HDFS Access > --- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: AsyncHdfs20160510.pdf > > > This is an umbrella JIRA for supporting Asynchronous HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support asynchronous calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. 
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293822#comment-15293822 ] Colin Patrick McCabe commented on HDFS-7240: bq. Another question about reading the ApacheCon slides: the question "Why an Object Store" was well answered. How about "why an object store as part of HDFS"? IIUC Ozone is only leveraging a very small portion of HDFS code. Why should it be a part of HDFS instead of a separate project? That's a very good question. Why can't ozone be its own subproject within Hadoop? We could add a hadoop-ozone directory at the top level of the git repo. Ozone seems to be reusing very little of the HDFS code. For example, it doesn't store blocks the way the DataNode stores blocks. It doesn't run the HDFS NameNode. It doesn't use the HDFS client code. > Object store in HDFS > > > Key: HDFS-7240 > URL: https://issues.apache.org/jira/browse/HDFS-7240 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jitendra Nath Pandey >Assignee: Jitendra Nath Pandey > Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf > > > This jira proposes to add object store capabilities into HDFS. > As part of the federation work (HDFS-1052) we separated block storage as a > generic storage layer. Using the Block Pool abstraction, new kinds of > namespaces can be built on top of the storage layer i.e. datanodes. > In this jira I will explore building an object store using the datanode > storage, but independent of namespace metadata. > I will soon update with a detailed design document. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293809#comment-15293809 ] Colin Patrick McCabe commented on HDFS-9924: I have to agree with [~andrew.wang] that it makes more sense to put these changes in trunk than in branch-2. The Hadoop 2.8 release has been blocked for a very, very long time. There are tons of features in branch-2 that have been waiting for almost a year to be released. Adding yet another feature to branch-2, when we're so far behind on releases, doesn't make sense. Programmers who are familiar with Node.js will want something that supports callback chaining, like CompletableFuture, rather than something like the old-style Future API. If we target this at branch-3, we can use the jdk8 CompletableFuture. If we are going to backport this to branch-2, we should do it once the feature is done, rather than backporting bits and pieces as we go. This is especially true when there are still open questions about the API. > [umbrella] Asynchronous HDFS Access > --- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: AsyncHdfs20160510.pdf > > > This is an umbrella JIRA for supporting Asynchronous HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support asynchronous calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. 
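The callback-chaining point above can be illustrated with JDK 8's {{CompletableFuture}}: a follow-up action is attached without blocking a thread on {{get()}}. The {{renameAsync}} method below is a hypothetical stand-in for an asynchronous HDFS call, not the real API:

```java
import java.util.concurrent.CompletableFuture;

public class ChainingSketch {
    // Pretend async RPC: completes with whether the rename succeeded.
    static CompletableFuture<Boolean> renameAsync(String src, String dst) {
        return CompletableFuture.supplyAsync(() -> true);
    }

    public static void main(String[] args) {
        // With a plain Future, the only option here would be a blocking get();
        // CompletableFuture lets the continuation be chained instead.
        String result = renameAsync("/a", "/b")
            .thenApply(ok -> ok ? "renamed" : "failed")
            .join(); // join only at the very end, for the demo
        System.out.println(result);
    }
}
```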
[jira] [Commented] (HDFS-10415) TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2
[ https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289769#comment-15289769 ] Colin Patrick McCabe commented on HDFS-10415: - bq. As Steve Loughran's concern, if the stats has nothing to do with this unit test, we can consider avoiding it. I'm more favor of this approach. Sure. Thanks for the explanation. bq. there's another option, you know. Do the stats init in the constructor rather than initialize. There is no information used in setting up DFSClient.storageStatistics, its only ever written to once. Move it to the constructor and make final and maybe this problem will go away (maybe, mocks are a PITA) It seems like this would prevent us from using the Configuration object in the future when creating stats, right? I think we should keep this flexibility. This whole problem arises because the FileSystem constructor doesn't require a Configuration and it should, which leads to the "construct then initialize" idiom. If it just took a Configuration in the first place we could initialize everything in the constructor. grumble grumble > TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2 > -- > > Key: HDFS-10415 > URL: https://issues.apache.org/jira/browse/HDFS-10415 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.9.0 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Mingliang Liu > Attachments: HDFS-10415-branch-2.000.patch, > HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch > > > {noformat} > Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem > testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem) Time > elapsed: 0.045 sec <<< ERROR! 
> java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790) > at > org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417) > at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084) > at > org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187) > at > org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217) > {noformat} > This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other > combinations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10415) TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2
[ https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289312#comment-15289312 ] Colin Patrick McCabe commented on HDFS-10415: - Thanks for looking at this. So basically the problem is that we're attempting to do something in the constructor of our {{DistributedFileSystem}} subclass that requires that the FS already be initialized. Why not just override the {{initialize}} method with something like: {code} @Override public void initialize(URI uri, Configuration conf) throws IOException { super.initialize(uri, conf); statistics = new FileSystem.Statistics("myhdfs"); // can't mock finals } {code} That seems like the most natural fix since it's not doing "weird stuff" that we don't do outside unit tests. I don't feel strongly about this, though, any of the solutions proposed here seems like it would work. > TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2 > -- > > Key: HDFS-10415 > URL: https://issues.apache.org/jira/browse/HDFS-10415 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.9.0 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Mingliang Liu > Attachments: HDFS-10415-branch-2.000.patch, > HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch > > > {noformat} > Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem > testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem) Time > elapsed: 0.045 sec <<< ERROR! 
> java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790) > at > org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417) > at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084) > at > org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187) > at > org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217) > {noformat} > This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other > combinations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
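The "construct then initialize" idiom discussed above can be reduced to a minimal sketch: configuration-dependent state is only set in {{initialize}}, so a subclass constructor that touches it observes nulls. The class names are illustrative:

```java
public class InitIdiomSketch {
    static class Base {
        String conf; // set only in initialize(), never in the constructor

        void initialize(String conf) {
            this.conf = conf;
        }
    }

    static class Sub extends Base {
        Sub() {
            // Reading conf here would observe null -- the failure mode
            // behind the NullPointerException in the test above.
        }

        @Override
        void initialize(String conf) {
            super.initialize(conf);
            // Configuration-dependent setup belongs here, not in Sub().
        }
    }

    public static void main(String[] args) {
        Sub s = new Sub();
        s.initialize("myconf");
        System.out.println(s.conf);
    }
}
```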
[jira] [Commented] (HDFS-8829) Make SO_RCVBUF and SO_SNDBUF size configurable for DataTransferProtocol sockets and allow configuring auto-tuning
[ https://issues.apache.org/jira/browse/HDFS-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289239#comment-15289239 ] Colin Patrick McCabe commented on HDFS-8829: There was no need to make it public because it's only used by unit tests. Is there a reason why it should be public? > Make SO_RCVBUF and SO_SNDBUF size configurable for DataTransferProtocol > sockets and allow configuring auto-tuning > - > > Key: HDFS-8829 > URL: https://issues.apache.org/jira/browse/HDFS-8829 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.3.0, 2.6.0 >Reporter: He Tianyi >Assignee: He Tianyi > Fix For: 2.8.0 > > Attachments: HDFS-8829.0001.patch, HDFS-8829.0002.patch, > HDFS-8829.0003.patch, HDFS-8829.0004.patch, HDFS-8829.0005.patch, > HDFS-8829.0006.patch > > > {code:java} > private void initDataXceiver(Configuration conf) throws IOException { > // find free port or use privileged port provided > TcpPeerServer tcpPeerServer; > if (secureResources != null) { > tcpPeerServer = new TcpPeerServer(secureResources); > } else { > tcpPeerServer = new TcpPeerServer(dnConf.socketWriteTimeout, > DataNode.getStreamingAddr(conf)); > } > > tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE); > {code} > The last line sets SO_RCVBUF explicitly, thus disabling tcp auto-tuning on > some system. > Shall we make this behavior configurable? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
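The auto-tuning behavior behind this issue can be sketched with plain {{java.net}} sockets: setting SO_RCVBUF explicitly disables kernel auto-tuning, so a configured size of 0 can be treated as "leave the socket alone". The config handling below is an assumption for illustration, not the patch's actual key names:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class RcvBufSketch {
    // Apply a configured receive-buffer size; 0 or negative means
    // "do not touch SO_RCVBUF", preserving TCP auto-tuning.
    static void applyRcvBuf(ServerSocket ss, int configuredSize)
            throws IOException {
        if (configuredSize > 0) {
            ss.setReceiveBufferSize(configuredSize); // explicit size
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket ss = new ServerSocket(0)) {
            applyRcvBuf(ss, 0);           // auto-tuning preserved
            applyRcvBuf(ss, 128 * 1024);  // explicit 128 KiB buffer
            System.out.println(ss.getReceiveBufferSize() > 0);
        }
    }
}
```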
[jira] [Updated] (HDFS-10404) Fix formatting of CacheAdmin command usage help text
[ https://issues.apache.org/jira/browse/HDFS-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10404: Resolution: Fixed Fix Version/s: 2.9.0 Target Version/s: 2.9.0 Status: Resolved (was: Patch Available) > Fix formatting of CacheAdmin command usage help text > > > Key: HDFS-10404 > URL: https://issues.apache.org/jira/browse/HDFS-10404 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Fix For: 2.9.0 > > Attachments: HDFS-10404.001.patch, HDFS-10404.002.patch > > > In {{CacheAdmin}}, there are two places that not completely showing the cmd > usage message. > {code} > $ hdfs cacheadmin > Usage: bin/hdfs cacheadmin [COMMAND] > [-addDirective -path -pool [-force] > [-replication ] [-ttl ]] > [-modifyDirective -id [-path ] [-force] [-replication > ] [-pool ] [-ttl ]] > [-listDirectives [-stats] [-path ] [-pool ] [-id ] > [-removeDirective ] > [-removeDirectives -path ] > [-addPool [-owner ] [-group ] [-mode ] > [-limit ] [-maxTtl ] > {code} > The command {{-listDirectives}} and {{-addPool}} are not showing completely, > they are both lacking a ']' in the end of line. > In the {{CentralizedCacheManagement.md}}, there is also a similar problem. > The page of {{CentralizedCacheManagement}} can also showed this, > https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10404) Fix formatting of CacheAdmin command usage help text
[ https://issues.apache.org/jira/browse/HDFS-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10404: Summary: Fix formatting of CacheAdmin command usage help text (was: CacheAdmin command usage message not shows completely) > Fix formatting of CacheAdmin command usage help text > > > Key: HDFS-10404 > URL: https://issues.apache.org/jira/browse/HDFS-10404 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-10404.001.patch, HDFS-10404.002.patch > > > In {{CacheAdmin}}, there are two places that not completely showing the cmd > usage message. > {code} > $ hdfs cacheadmin > Usage: bin/hdfs cacheadmin [COMMAND] > [-addDirective -path -pool [-force] > [-replication ] [-ttl ]] > [-modifyDirective -id [-path ] [-force] [-replication > ] [-pool ] [-ttl ]] > [-listDirectives [-stats] [-path ] [-pool ] [-id ] > [-removeDirective ] > [-removeDirectives -path ] > [-addPool [-owner ] [-group ] [-mode ] > [-limit ] [-maxTtl ] > {code} > The command {{-listDirectives}} and {{-addPool}} are not showing completely, > they are both lacking a ']' in the end of line. > In the {{CentralizedCacheManagement.md}}, there is also a similar problem. > The page of {{CentralizedCacheManagement}} can also showed this, > https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10404) CacheAdmin command usage message not shows completely
[ https://issues.apache.org/jira/browse/HDFS-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287149#comment-15287149 ] Colin Patrick McCabe commented on HDFS-10404: - +1. Thanks, [~linyiqun]. > CacheAdmin command usage message not shows completely > - > > Key: HDFS-10404 > URL: https://issues.apache.org/jira/browse/HDFS-10404 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-10404.001.patch, HDFS-10404.002.patch > > > In {{CacheAdmin}}, there are two places that not completely showing the cmd > usage message. > {code} > $ hdfs cacheadmin > Usage: bin/hdfs cacheadmin [COMMAND] > [-addDirective -path -pool [-force] > [-replication ] [-ttl ]] > [-modifyDirective -id [-path ] [-force] [-replication > ] [-pool ] [-ttl ]] > [-listDirectives [-stats] [-path ] [-pool ] [-id ] > [-removeDirective ] > [-removeDirectives -path ] > [-addPool [-owner ] [-group ] [-mode ] > [-limit ] [-maxTtl ] > {code} > The command {{-listDirectives}} and {{-addPool}} are not showing completely, > they are both lacking a ']' in the end of line. > In the {{CentralizedCacheManagement.md}}, there is also a similar problem. > The page of {{CentralizedCacheManagement}} can also showed this, > https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10276) HDFS throws AccessControlException when checking for the existence of /a/b when /a is a file
[ https://issues.apache.org/jira/browse/HDFS-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10276: Summary: HDFS throws AccessControlException when checking for the existence of /a/b when /a is a file (was: Different results for exist call for file.ext/name) > HDFS throws AccessControlException when checking for the existence of /a/b > when /a is a file > > > Key: HDFS-10276 > URL: https://issues.apache.org/jira/browse/HDFS-10276 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kevin Cox >Assignee: Yuanbo Liu > Attachments: HDFS-10276.001.patch, HDFS-10276.002.patch, > HDFS-10276.003.patch, HDFS-10276.004.patch > > > Given you have a file {{/file}} an existence check for the path > {{/file/whatever}} will give different responses for different > implementations of FileSystem. > LocalFileSystem will return false while DistributedFileSystem will throw > {{org.apache.hadoop.security.AccessControlException: Permission denied: ..., > access=EXECUTE, ...}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10276) Different results for exist call for file.ext/name
[ https://issues.apache.org/jira/browse/HDFS-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281676#comment-15281676 ] Colin Patrick McCabe commented on HDFS-10276: - Thanks for this, [~yuanbo]. Sorry for the sometimes slow pace of reviews. If I understand correctly, the approach taken in this path is to change HDFS to throw an exception stating that the parent path is not a directory, rather than throwing an AccessControlException. So first of all, this sounds like an incompatible change. That's OK-- it just means this should probably go into branch-3 (trunk) rather than branch-2. Secondly, it seems like it would be better to make the modification inside {{FSPermissionChecker}}, rather than adding an external function. After all, this is a general problem, which affects more than just listDir. We also need to make sure that we are not giving away too much information about the filesystem. For example, if the user asks for {{/a/b/c}}, but does not have permission to list {{/a}}, we should not complain about {{/a/b}} not being a directory since that reveals privileged information. > Different results for exist call for file.ext/name > -- > > Key: HDFS-10276 > URL: https://issues.apache.org/jira/browse/HDFS-10276 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kevin Cox >Assignee: Yuanbo Liu > Attachments: HDFS-10276.001.patch, HDFS-10276.002.patch, > HDFS-10276.003.patch, HDFS-10276.004.patch > > > Given you have a file {{/file}} an existence check for the path > {{/file/whatever}} will give different responses for different > implementations of FileSystem. > LocalFileSystem will return false while DistributedFileSystem will throw > {{org.apache.hadoop.security.AccessControlException: Permission denied: ..., > access=EXECUTE, ...}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
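The information-disclosure concern above -- check traverse permission on an ancestor before revealing anything about what lies below it -- can be sketched with a toy namespace. None of these names correspond to the real {{FSPermissionChecker}}:

```java
import java.util.Map;

public class PathCheckSketch {
    // Toy namespace: path -> "dir" or "file"; a missing key means nonexistent.
    static final Map<String, String> ns = Map.of("/a", "file", "/d", "dir");
    // Toy permission table: may the caller EXECUTE (traverse) this inode?
    static final Map<String, Boolean> canTraverse =
        Map.of("/a", false, "/d", true);

    static String check(String parent, String child) {
        // Permission is checked FIRST, so a denied caller learns nothing
        // about the type or existence of anything under the parent.
        if (!canTraverse.getOrDefault(parent, false)) {
            return "AccessControlException";
        }
        if ("file".equals(ns.get(parent))) {
            // Only revealed once traversal permission is confirmed.
            return "ParentNotDirectoryException";
        }
        return ns.containsKey(parent + child) ? "exists" : "not found";
    }

    public static void main(String[] args) {
        System.out.println(check("/a", "/b")); // denied before /a is revealed as a file
        System.out.println(check("/d", "/x")); // traversal allowed; child missing
    }
}
```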
[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280809#comment-15280809 ] Colin Patrick McCabe commented on HDFS-9924: bq. For some case like network connection errors, if we do not throw exception until Future#get, the client could summit a large number of calls and the catch a lot of exceptions in Future#get. It is fail-fast if the client catch an exception in the first async call. That makes sense. Thanks for the explanation. bq. It actually throws AsyncCallLimitExceededException so the client can keep trying sending more requests by catching it. Hmm. Is there a way for the client to wait until more async calls are available, without polling? > [umbrella] Asynchronous HDFS Access > --- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: AsyncHdfs20160510.pdf > > > This is an umbrella JIRA for supporting Asynchronous HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support asynchronous calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
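One answer to the "wait without polling" question above is to gate submissions on a counting semaphore: {{acquire()}} blocks until a slot frees, so no busy loop and no exception-driven retry. This is a hedged sketch under assumed names ({{AsyncLimiter}} is not part of the HDFS client), not the actual implementation:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Sketch: cap outstanding async calls with a Semaphore so callers block
// until a slot frees, instead of catching AsyncCallLimitExceededException
// and retrying in a loop.
public class AsyncLimiter {
    private final Semaphore slots;
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    AsyncLimiter(int maxOutstanding) {
        slots = new Semaphore(maxOutstanding);
    }

    <T> CompletableFuture<T> submit(Callable<T> call)
            throws InterruptedException {
        slots.acquire();                       // waits, no busy polling
        CompletableFuture<T> f = new CompletableFuture<>();
        pool.execute(() -> {
            try {
                f.complete(call.call());
            } catch (Exception e) {
                f.completeExceptionally(e);
            } finally {
                slots.release();               // free the slot on completion
            }
        });
        return f;
    }

    public static void main(String[] args) throws Exception {
        AsyncLimiter limiter = new AsyncLimiter(2);
        CompletableFuture<Integer> f = limiter.submit(() -> 21 * 2);
        System.out.println(f.get());           // 42
        limiter.pool.shutdown();
    }
}
```

A throw-on-full variant would use {{tryAcquire()}} and raise immediately, giving callers both behaviors without polling.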
[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280617#comment-15280617 ] Colin Patrick McCabe commented on HDFS-9924: With regard to error handling, why not handle all errors as exceptions thrown from {{Future#get}}? Handling some errors in a different way because they happened "earlier" (let's say, on the client side rather than server side) forces the client to put error checking code in two places. Does the {{Future#get}} callback get made without holding any locks? Can other asynchronous calls be made from this context? {code} public boolean rename(Path src, Path dst) throws IOException { if (isAsynchronousMode()) { return getFutureDistributedFileSystem().rename(src, dst).get(); } else { ... //current implementation. } } {code} It seems concerning that we would have to make such a large change to the synchronous {{DistributedFileSystem}} code. This would also result in more GC load since we'd be creating lots of {{Future}} objects. Shouldn't it be possible to avoid this? I do not think having some kind of global async bit is a good idea. bq. In order to avoid client abusing the server by asynchronous calls. The RPC client should have a configurable limit in order to limit the outstanding asynchronous calls. The caller may be blocked if the number of outstanding calls hits the limit so that the caller is slowed down. Blocking the client seems like it could be problematic for code which expects to be asynchronous. There should be an option to throw an exception in this case. I also think that we could maintain a queue of async calls that we have not submitted to the IPC layer yet, to avoid being limited by issues at the IPC layer. bq. Support asynchronous FileContext (client API) {{AsynchronousFileSystem}} is a separate API from {{FileSystem}}. If there are issues with {{FileSystem}}, surely we can fix them in {{AsynchronousFileSystem}} rather than creating a fourth API? bq. 
Use Java 8’s new language feature in the API (client API). Given that Hadoop 3.x will probably be Java 8 (based on the mailing list discussion), why not just make the async API use jdk8's {{CompletableFuture}} from day 1, rather than hacking it in later? > [umbrella] Asynchronous HDFS Access > --- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: AsyncHdfs20160510.pdf > > > This is an umbrella JIRA for supporting Asynchronous HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support asynchronous calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
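The jdk8 suggestion above also ties back to the earlier error-handling point: with {{CompletableFuture}}, "early" client-side failures and server-side failures both surface at {{join()}}/{{get()}}, giving the caller one error path. A minimal sketch — {{rename}} here is a hypothetical stand-in, not the HDFS API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Sketch of a CompletableFuture-based async call: every failure, whether
// detected client-side or server-side, is reported through the future,
// so error handling lives in exactly one place.
public class CompletableSketch {
    static CompletableFuture<Boolean> rename(String src, String dst) {
        return CompletableFuture.supplyAsync(() -> {
            if (src.isEmpty()) {
                // client-side error: delivered via the future,
                // not thrown directly from rename()
                throw new IllegalArgumentException("empty source path");
            }
            return true; // pretend the server accepted the rename
        });
    }

    public static void main(String[] args) {
        // success path
        System.out.println(rename("/a", "/b").join());
        // failure path: the "early" error still arrives via the future
        try {
            rename("", "/b").join();
        } catch (CompletionException e) {
            System.out.println("caught: " + e.getCause().getMessage());
        }
    }
}
```

{{CompletableFuture}} also gives chaining ({{thenCompose}}, {{thenApply}}) for free, which a plain {{Future}} cannot, so starting from it avoids hacking callbacks in later.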
[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278651#comment-15278651 ] Colin Patrick McCabe commented on HDFS-9924: Hi all, I am +1 for doing this work, based on possible performance improvements we might see, and the need for a convenient asynchronous API for applications. However, I am concerned that there has been no design doc posted, but already code committed to trunk. I am -1 on committing anything more to trunk until we have a design document explaining how the API will work and what changes it will require in HDFS. Just to clarify, I would be fine on committing code to a feature branch without a design document, since we can review it later prior to the merge. However, it is concerning to see such a large feature proceed on trunk without either a branch or a design that the community can review. > [umbrella] Asynchronous HDFS Access > --- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > > This is an umbrella JIRA for supporting Asynchronous HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support asynchronous calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10377) CacheReplicationMonitor shutdown log message should use INFO level.
[ https://issues.apache.org/jira/browse/HDFS-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10377: Resolution: Fixed Fix Version/s: 2.6.5 Target Version/s: 2.6.5 Status: Resolved (was: Patch Available) > CacheReplicationMonitor shutdown log message should use INFO level. > --- > > Key: HDFS-10377 > URL: https://issues.apache.org/jira/browse/HDFS-10377 > Project: Hadoop HDFS > Issue Type: Improvement > Components: logging, namenode >Affects Versions: 2.6.5 >Reporter: Konstantin Shvachko >Assignee: Yiqun Lin > Labels: newbie > Fix For: 2.6.5 > > Attachments: HDFS-10377.001.patch > > > HDFS-7258 changed some log messages to DEBUG level from INFO. DEBUG level is > good for frequently logged messages, but the shutdown message is logged once > and should be INFO level the same as the startup. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10377) CacheReplicationMonitor shutdown log message should use INFO level.
[ https://issues.apache.org/jira/browse/HDFS-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278609#comment-15278609 ] Colin Patrick McCabe commented on HDFS-10377: - +1. Thanks, [~linyiqun]. > CacheReplicationMonitor shutdown log message should use INFO level. > --- > > Key: HDFS-10377 > URL: https://issues.apache.org/jira/browse/HDFS-10377 > Project: Hadoop HDFS > Issue Type: Improvement > Components: logging, namenode >Affects Versions: 2.6.5 >Reporter: Konstantin Shvachko >Assignee: Yiqun Lin > Labels: newbie > Attachments: HDFS-10377.001.patch > > > HDFS-7258 changed some log messages to DEBUG level from INFO. DEBUG level is > good for frequently logged messages, but the shutdown message is logged once > and should be INFO level the same as the startup. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271863#comment-15271863 ] Colin Patrick McCabe commented on HDFS-10301: - Thanks for looking at this, [~daryn]. I'm not sure about the approach you proposed, though. If interleaved full block reports really are very common for [~shv], it seems like throwing an exception when these are received would be problematic. It sounds like there might be some implementation concerns as well, although I didn't look at the patch. bq. [~shv] wrote: I don't think my approach requires RPC change, since the block-report RPC message already has all required structures in place. It should require only the processing logic change. Just to be clear. If what is being sent over the wire is changing, I would consider that an "RPC change." We can create an RPC change without modifying the {{.proto}} file-- for example, by choosing not to fill in some optional field, or filling in some other field. bq. Colin, it would have been good to have an interim solution, but it does not seem reasonable to commit a patch, which fixes one bug, while introducing another. The patch doesn't introduce any bugs. It does mean that we won't remove zombie storages when interleaved block reports are received. But we are not handling this correctly right now either, so that is not a regression. Like I said earlier, I think your approach is a good one, but I think we should get in the patch I posted here. It is a very small and non-disruptive change which doesn't alter what is sent over the wire. It can easily be backported to stable branches. Why don't we commit this patch, and then work on a follow-on with the RPC change and simplification that you proposed? 
> BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.01.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10328) Add per-cache-pool default replication num configuration
[ https://issues.apache.org/jira/browse/HDFS-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268954#comment-15268954 ] Colin Patrick McCabe commented on HDFS-10328: - Thanks for the patch, [~xupener]. {code} diff --git a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto index 7acb394..73db055 100644 --- a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto +++ b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto @@ -533,7 +533,8 @@ message CachePoolInfoProto { optional string groupName = 3; optional int32 mode = 4; optional int64 limit = 5; - optional int64 maxRelativeExpiry = 6; + optional uint32 defaultReplication = 6; + optional int64 maxRelativeExpiry = 7; } {code} Please be careful not to remove or change fields that already exist. In this case, you have moved maxRelativeExpiry from field 6 to field 7, which is an incompatible change. Instead, you should simply add your new field to the end. I suggest using something like this: {code} + optional uint32 defaultReplication = 6 [default=1]; {code} To avoid having to programmatically add a default of 1 in so many places. > Add per-cache-pool default replication num configuration > > > Key: HDFS-10328 > URL: https://issues.apache.org/jira/browse/HDFS-10328 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching >Reporter: xupeng >Assignee: xupeng >Priority: Minor > Fix For: 2.3.0 > > Attachments: HDFS-10328.001.patch, HDFS-10328.002.patch > > > For now, hdfs cacheadmin can not set a default replication num for cached > directive in the same cachepool. Each cache directive added in the same cache > pool should set their own replication num individually. 
> Consider this situation, we add daily hive table into cache pool "hive" .Each > time i should set the same replication num for every table directive in the > same cache pool. > I think we should enable setting a default replication num for a cachepool > that every cache directive in the pool can inherit replication configuration > from the pool. Also cache directive can override replication configuration > explicitly by calling "add & modify directive -replication" command from > cli. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
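The reason renumbering is incompatible, per the review comment above, is that the field number is baked into every value's on-the-wire tag. A small hand-rolled illustration (no protobuf dependency; the tag math follows the varint wire format):

```java
// Illustration of why moving maxRelativeExpiry from field 6 to field 7
// is an incompatible change: each encoded value is tagged with
// (fieldNumber << 3) | wireType, so old and new peers disagree about
// which field the bytes belong to.
public class ProtoTags {
    static final int WIRETYPE_VARINT = 0;

    static int tag(int fieldNumber) {
        return (fieldNumber << 3) | WIRETYPE_VARINT;
    }

    public static void main(String[] args) {
        int oldTag = tag(6);   // maxRelativeExpiry before the patch
        int newTag = tag(7);   // maxRelativeExpiry after the renumbering
        System.out.println(oldTag + " vs " + newTag);
        // Different tags on the wire: an unpatched reader would decode
        // field 6 as the new defaultReplication instead, corrupting
        // the expiry value. Appending new fields never changes old tags.
    }
}
```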
[jira] [Commented] (HDFS-10352) Allow users to get last access time of a given directory
[ https://issues.apache.org/jira/browse/HDFS-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266999#comment-15266999 ] Colin Patrick McCabe commented on HDFS-10352: - -1. As [~linyiqun] commented, the performance would be bad, because it is O(N) in terms of number of files in the directory. This also would be very confusing to operators, since it doesn't match the semantics of any other known filesystem or operating system. Finally, if users want to take the maximum value of all the entries in a directory, they can easily do this by calling listDir and computing the maximum themselves. This is just as (in)efficient as what is proposed here, and much cleaner. > Allow users to get last access time of a given directory > > > Key: HDFS-10352 > URL: https://issues.apache.org/jira/browse/HDFS-10352 > Project: Hadoop HDFS > Issue Type: Improvement > Components: fs >Affects Versions: 2.6.4 >Reporter: Eric Lin >Assignee: Lin Yiqun >Priority: Minor > > Currently FileStatus.getAccessTime() function will return 0 if path is a > directory, it would be ideal that if a directory path is passed, the code > will go through all the files under the directory and return the MAX access > time of all the files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
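The client-side alternative suggested in the -1 above — list the directory and take the maximum yourself — looks like the following. This sketch uses {{java.nio}} as a self-contained stand-in for {{FileSystem.listStatus}} plus {{FileStatus.getAccessTime}}; the HDFS version would be the same loop over {{FileStatus}} objects:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Client-side max access time over a directory's immediate children:
// same O(N) cost as the proposed server-side change, but without
// burdening the NameNode or changing filesystem semantics.
public class MaxAtime {
    static long maxAccessTime(Path dir) throws IOException {
        long max = 0;
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path p : ds) {
                BasicFileAttributes a =
                    Files.readAttributes(p, BasicFileAttributes.class);
                max = Math.max(max, a.lastAccessTime().toMillis());
            }
        }
        return max;  // 0 for an empty directory, matching getAccessTime()
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("atime");
        Files.createFile(dir.resolve("f1"));
        System.out.println(maxAccessTime(dir) > 0);
    }
}
```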
[jira] [Commented] (HDFS-10328) Add cache pool level replication managment
[ https://issues.apache.org/jira/browse/HDFS-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264841#comment-15264841 ] Colin Patrick McCabe commented on HDFS-10328: - Hi [~xupener], Interesting idea. However, this doesn't sound like "cache pool level replication management", since the replication management is still per-directive, even after this patch. This seems like adding a per-cache-pool default. If you agree, can you update the JIRA name and some of the names in the patch? > Add cache pool level replication managment > --- > > Key: HDFS-10328 > URL: https://issues.apache.org/jira/browse/HDFS-10328 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: caching >Reporter: xupeng >Assignee: xupeng >Priority: Minor > Fix For: 2.3.0 > > Attachments: HDFS-10328.001.patch > > > For now, hdfs cacheadmin can not set a replication num for cachepool. Each > cache directive added in the cache pool should set their own replication num > individually. > Consider this situation, we add daily hive table into cache pool "hive" .Each > time i should set the same replication num for every table directive in the > same cache pool. > I think we should enable setting a replication num for cachepool that every > cache directive in the pool can inherit replication configuration from the > pool. Also cache directive can override replication configuration explicitly > by calling "add & modify directive -replication" command from cli. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264772#comment-15264772 ] Colin Patrick McCabe commented on HDFS-10301: - bq. You can think of it as a new operation SyncStorages, which does just that - updates NameNode's knowledge of DN's storages. I combined this operation with the first br-RPC. One can combine it with any other call, same as you propose to combine it with the heartbeat. Except it seems a poor idea, since we don't want to wait for removal of thousands of replicas on a heartbeat. Thanks for explaining your proposal a little bit more. I agree that enumerating all the storages in the first block report RPC is a fairly simple way to handle this, and shouldn't add too much size to the FBR. It seems like a better idea than adding it to the heartbeat, like I proposed. In the short term, however, I would prefer the current patch, since it involves no RPC changes, and doesn't require all the DataNodes to be upgraded before it can work. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.01.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. 
[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics
[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260498#comment-15260498 ] Colin Patrick McCabe commented on HDFS-10175: - BTW, sorry for the last-minute-ness of this scheduling, [~liuml07] and [~steve_l]. Webex here at 10:30: HDFS-10175 webex Wednesday, April 27, 2016 10:30 am | Pacific Daylight Time (San Francisco, GMT-07:00) | 1 hr JOIN WEBEX MEETING https://cloudera.webex.com/cloudera/j.php?MTID=mebca25435f158dec71b2589561e71b29 Meeting number: 294 963 170 Meeting password: 1234 JOIN BY PHONE 1-650-479-3208 Call-in toll number (US/Canada) Access code: 294 963 170 Global call-in numbers: https://cloudera.webex.com/cloudera/globalcallin.php?serviceType=MC=45642173=0 > add per-operation stats to FileSystem.Statistics > > > Key: HDFS-10175 > URL: https://issues.apache.org/jira/browse/HDFS-10175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Ram Venkatesh >Assignee: Mingliang Liu > Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, > HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, > HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java > > > Currently FileSystem.Statistics exposes the following statistics: > BytesRead > BytesWritten > ReadOps > LargeReadOps > WriteOps > These are in-turn exposed as job counters by MapReduce and other frameworks. > There is logic within DfsClient to map operations to these counters that can > be confusing, for instance, mkdirs counts as a writeOp. > Proposed enhancement: > Add a statistic for each DfsClient operation including create, append, > createSymlink, delete, exists, mkdirs, rename and expose them as new > properties on the Statistics object. The operation-specific counters can be > used for analyzing the load imposed by a particular job on HDFS. > For example, we can use them to identify jobs that end up creating a large > number of files. 
> Once this information is available in the Statistics object, the app > frameworks like MapReduce can expose them as additional counters to be > aggregated and recorded as part of job summary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics
[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260487#comment-15260487 ] Colin Patrick McCabe commented on HDFS-10175: - Great. Let me add a webex > add per-operation stats to FileSystem.Statistics > > > Key: HDFS-10175 > URL: https://issues.apache.org/jira/browse/HDFS-10175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Ram Venkatesh >Assignee: Mingliang Liu > Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, > HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, > HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java > > > Currently FileSystem.Statistics exposes the following statistics: > BytesRead > BytesWritten > ReadOps > LargeReadOps > WriteOps > These are in-turn exposed as job counters by MapReduce and other frameworks. > There is logic within DfsClient to map operations to these counters that can > be confusing, for instance, mkdirs counts as a writeOp. > Proposed enhancement: > Add a statistic for each DfsClient operation including create, append, > createSymlink, delete, exists, mkdirs, rename and expose them as new > properties on the Statistics object. The operation-specific counters can be > used for analyzing the load imposed by a particular job on HDFS. > For example, we can use them to identify jobs that end up creating a large > number of files. > Once this information is available in the Statistics object, the app > frameworks like MapReduce can expose them as additional counters to be > aggregated and recorded as part of job summary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259529#comment-15259529 ] Colin Patrick McCabe commented on HDFS-10301: - bq. Hey Colin, I reviewed your patch more thoroughly. There is still a problem with interleaving reports. See updateBlockReportContext(). Suppose that block reports interleave like this:. Then br1-s2 will reset curBlockReportRpcsSeen since curBlockReportId is not the same as in the report, which will discard the bit set for s1 in br2-s1, and the count of rpcsSeen = 0 will be wrong for br2-s2. So possibly unreported (zombie) storages will not be removed. LMK if you see what I see. Thanks for looking at the patch. I agree that in the case of interleaving, zombie storages will not be removed. I don't consider that a problem, since we will eventually get a non-interleaved full block report that will do the zombie storage removal. In practice, interleaved block reports are extremely rare (we have never seen the problem described in this JIRA, after deploying to thousands of clusters). bq. May be we should go with a different approach for this problem. Single block report can be split into multiple RPCs. Within single block-report-RPC NameNode processes each storage under a lock, but then releases and re-acquires the lock for the next storage, so that multiple RPC reports can interleave due to multi-threading. Maybe I'm misunderstanding the proposal, but don't we already do all of this? We split block reports into multiple RPCs when the storage reports grow beyond a certain size. bq. Approach. DN should report full list of its storages in the first block-report-RPC. The NameNode first cleans up unreported storages and replicas belonging them, then start processing the rest of block reports as usually. So DataNodes explicitly report storages that they have, which eliminates NameNode guessing, which storage is the last in the block report RPC. 
What does the NameNode do if the DataNode is restarted while sending these RPCs, so that it never gets a chance to send all the storages that it claimed existed? It seems like you will get stuck and not be able to accept any new reports. Or, you can take the same approach the current patch does, and clear the current state every time you see a new ID (but then you can't do zombie storage elimination in the presence of interleaving.) One approach that avoids all these problems is to avoid doing zombie storage elimination during FBRs entirely, and do it instead during DN heartbeats (for example). DN heartbeats are small messages that are never split, and their processing is not interleaved with anything. We agree that the current patch solves the problem of storages falsely being declared as zombies, I hope. I think that's a good enough reason to get this patch in, and then think about alternate approaches later. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.01.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
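The interleaving behavior debated above can be reduced to a toy model. {{ReportTracker}} is a hypothetical simplification (the real NameNode tracks per-storage bitsets under {{updateBlockReportContext}}), but it shows the patch's trade-off: an interleaved report resets the state, so zombie removal is skipped rather than mis-fired:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the discussed behavior: track one "current" block report
// id; a storage arriving from a different report id resets the state.
// Zombie pruning only happens at the end of a complete, non-interleaved
// report, so interleaving delays removal instead of causing it falsely.
public class ReportTracker {
    long curId = -1;
    Set<String> storagesSeen = new HashSet<>();

    // Returns true when this storage completes a full, uninterrupted
    // report of all expected storages (i.e. it is safe to prune).
    boolean process(long reportId, String storage, int expected) {
        if (reportId != curId) {
            curId = reportId;
            storagesSeen.clear();   // interleaved report: start over
        }
        storagesSeen.add(storage);
        return storagesSeen.size() == expected;
    }

    public static void main(String[] args) {
        ReportTracker t = new ReportTracker();
        // br1-s1 arrives, then br2 interleaves with its own s1 and s2
        System.out.println(t.process(1, "s1", 2)); // false
        System.out.println(t.process(2, "s1", 2)); // false, state reset
        System.out.println(t.process(2, "s2", 2)); // true: safe to prune
    }
}
```

Under this model a retransmitted report never removes a live storage; the worst case is that removal waits for the next clean full block report, which matches the argument made in the comment.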
[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics
[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259223#comment-15259223 ] Colin Patrick McCabe commented on HDFS-10175: - Hi [~steve_l], does 10:30AM work tomorrow? Unfortunately I'll be out on Thursday and most of Friday, so if we can't do tomorrow we'd have to do Friday afternoon or early next week. > add per-operation stats to FileSystem.Statistics > > > Key: HDFS-10175 > URL: https://issues.apache.org/jira/browse/HDFS-10175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Ram Venkatesh >Assignee: Mingliang Liu > Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, > HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, > HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java > > > Currently FileSystem.Statistics exposes the following statistics: > BytesRead > BytesWritten > ReadOps > LargeReadOps > WriteOps > These are in-turn exposed as job counters by MapReduce and other frameworks. > There is logic within DfsClient to map operations to these counters that can > be confusing, for instance, mkdirs counts as a writeOp. > Proposed enhancement: > Add a statistic for each DfsClient operation including create, append, > createSymlink, delete, exists, mkdirs, rename and expose them as new > properties on the Statistics object. The operation-specific counters can be > used for analyzing the load imposed by a particular job on HDFS. > For example, we can use them to identify jobs that end up creating a large > number of files. > Once this information is available in the Statistics object, the app > frameworks like MapReduce can expose them as additional counters to be > aggregated and recorded as part of job summary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics
[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258726#comment-15258726 ] Colin Patrick McCabe commented on HDFS-10175: - bq. I see that, but stream-level counters are essential at least for the tests which verify forward and lazy seeks. Which means that yes, they do have to go into the 2.8.0 release. What I can do is set up the scope so that they are package private, then, in the test code, implement the assertions about metric-derived state into that package. I guess my hope here is that whatever mechanism we come up with is something that could easily be integrated into the upcoming 2.8 release. Since we have talked about requiring our new metrics to not modify existing stable public interfaces, that seems very reasonable. One thing that is a bit concerning about metrics2 is that I think people feel that this interface should be stable (i.e. don't remove or alter things once they're in), which would be a big constraint on us. Perhaps we could document that per-fs stats were \@Public \@Evolving rather than stable? bq. Regarding the metrics2 instrumentation in HADOOP-13028, I'm aggregating the stream statistics back into the metrics 2 data. That's something which isn't needed for the hadoop tests, but which I'm logging in spark test runs, such as (formatted for readability): Do we have any ideas about how Spark will consume these metrics in the longer term? Do they prefer to go through metrics2, for example? I definitely don't object to putting this kind of stuff in metrics2, but if we go that route, we have to accept that we'll just get global (or at best per-fs-type) statistics, rather than per-fs-instance statistics. Is that acceptable? So far, nobody has spoken up strongly in favor of per-fs-instance statistics. 
> add per-operation stats to FileSystem.Statistics > > > Key: HDFS-10175 > URL: https://issues.apache.org/jira/browse/HDFS-10175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Ram Venkatesh >Assignee: Mingliang Liu > Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, > HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, > HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java > > > Currently FileSystem.Statistics exposes the following statistics: > BytesRead > BytesWritten > ReadOps > LargeReadOps > WriteOps > These are in-turn exposed as job counters by MapReduce and other frameworks. > There is logic within DfsClient to map operations to these counters that can > be confusing, for instance, mkdirs counts as a writeOp. > Proposed enhancement: > Add a statistic for each DfsClient operation including create, append, > createSymlink, delete, exists, mkdirs, rename and expose them as new > properties on the Statistics object. The operation-specific counters can be > used for analyzing the load imposed by a particular job on HDFS. > For example, we can use them to identify jobs that end up creating a large > number of files. > Once this information is available in the Statistics object, the app > frameworks like MapReduce can expose them as additional counters to be > aggregated and recorded as part of job summary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
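The "@Public @Evolving rather than stable" idea above can be sketched with stand-in annotations. The real markers live in org.apache.hadoop.classification; the copies below just make the example self-contained, and the PerFsStatistics interface is purely hypothetical:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Stand-ins for Hadoop's interface-classification annotations (the real ones
// are in org.apache.hadoop.classification; these are only for illustration).
public class EvolvingStatsSketch {
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Public {}
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Evolving {}

    // A per-fs statistics interface marked Evolving: callers may use it, but
    // its shape may still change between minor releases, unlike Stable APIs.
    @Public @Evolving
    interface PerFsStatistics {
        long getBytesRead();
    }

    public static void main(String[] args) {
        // The marking is visible at runtime, so tooling can flag Evolving use.
        System.out.println(
            PerFsStatistics.class.isAnnotationPresent(Evolving.class)); // true
    }
}
```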
[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics
[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258706#comment-15258706 ] Colin Patrick McCabe commented on HDFS-10175: - bq. I prefer earlier (being in UK time and all); I could do the first half hour of the webex \[at 12:30pm\] How about 10:30AM PST to noon tomorrow (Wednesday)? > add per-operation stats to FileSystem.Statistics > > > Key: HDFS-10175 > URL: https://issues.apache.org/jira/browse/HDFS-10175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Ram Venkatesh >Assignee: Mingliang Liu > Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, > HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, > HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java > > > Currently FileSystem.Statistics exposes the following statistics: > BytesRead > BytesWritten > ReadOps > LargeReadOps > WriteOps > These are in-turn exposed as job counters by MapReduce and other frameworks. > There is logic within DfsClient to map operations to these counters that can > be confusing, for instance, mkdirs counts as a writeOp. > Proposed enhancement: > Add a statistic for each DfsClient operation including create, append, > createSymlink, delete, exists, mkdirs, rename and expose them as new > properties on the Statistics object. The operation-specific counters can be > used for analyzing the load imposed by a particular job on HDFS. > For example, we can use them to identify jobs that end up creating a large > number of files. > Once this information is available in the Statistics object, the app > frameworks like MapReduce can expose them as additional counters to be > aggregated and recorded as part of job summary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10305) Hdfs audit shouldn't log mkdir operaton if the directory already exists.
[ https://issues.apache.org/jira/browse/HDFS-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258603#comment-15258603 ] Colin Patrick McCabe commented on HDFS-10305: - bq. That's interesting then. hdfs dfs -mkdir /dirAlreadyExists returns a non-zero return code. I assumed a non-zero error code == a failed operation. Obviously I was wrong. A non-zero error code on the shell does indicate a failed operation. You can see that FsShell explicitly checks to see whether the path exists and exits with an error code if so. The code is in ./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Mkdir.java I don't think this has anything to do with what hdfs should put in the audit log, since in this case, FsShell doesn't even call mkdir. > Hdfs audit shouldn't log mkdir operaton if the directory already exists. > > > Key: HDFS-10305 > URL: https://issues.apache.org/jira/browse/HDFS-10305 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Minor > > Currently Hdfs audit logs mkdir operation even if the directory already > exists. > This creates confusion while analyzing audit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
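The shell-side behavior described above (FsShell checks for the path itself and exits non-zero without ever calling mkdir) can be sketched as follows. This is not the actual Mkdir.java code; it uses the local filesystem via java.nio as a stand-in, and the method name is hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the FsShell-style check: an existing directory yields a non-zero
// exit code *before* any mkdir call is made -- which is why no mkdir entry
// can appear in the NameNode audit log for this case.
public class MkdirSketch {
    static int mkdirExitCode(Path dir) throws IOException {
        if (Files.exists(dir)) {
            return 1;              // "File exists" -- no mkdir ever issued
        }
        Files.createDirectory(dir);
        return 0;                  // directory created
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("mkdirSketch");
        System.out.println(mkdirExitCode(tmp));              // 1: already exists
        System.out.println(mkdirExitCode(tmp.resolve("x"))); // 0: created
    }
}
```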
[jira] [Commented] (HDFS-10326) Disable setting tcp socket send/receive buffers for write pipelines
[ https://issues.apache.org/jira/browse/HDFS-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258592#comment-15258592 ] Colin Patrick McCabe commented on HDFS-10326: - bq. Some system may not support auto tuning, defaulting to a small window size (say 64k? which may make the scenario worse). Can you give a concrete example of a system where Hadoop is actually deployed which doesn't support auto-tuning? bq. I'd suggest we keep the configuration. Or maybe add another one, say dfs.socket.detect-auto-turning. When this is set to true (maybe turned on by default), socket buffer behavior depends on whether OS supports auto-tuning. If auto-tuning is not supported, use configured value automatically. Hmm. As far as I know, there is no way to detect auto-tuning. If there is, then we wouldn't need a new configuration... we could just set the appropriate value when no configuration was given. > Disable setting tcp socket send/receive buffers for write pipelines > --- > > Key: HDFS-10326 > URL: https://issues.apache.org/jira/browse/HDFS-10326 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Affects Versions: 2.6.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > > The DataStreamer and the Datanode use a hardcoded > DEFAULT_DATA_SOCKET_SIZE=128K for the send and receive buffers of a write > pipeline. Explicitly setting tcp buffer sizes disables tcp stack > auto-tuning. > The hardcoded value will saturate a 1Gb with 1ms RTT. 105Mbs at 10ms. > Paltry 11Mbs over a 100ms long haul. 10Gb networks are underutilized. > There should either be a configuration to completely disable setting the > buffers, or the the setReceiveBuffer and setSendBuffer should be removed > entirely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10305) Hdfs audit shouldn't log mkdir operaton if the directory already exists.
[ https://issues.apache.org/jira/browse/HDFS-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257383#comment-15257383 ] Colin Patrick McCabe commented on HDFS-10305: - Thanks, [~raviprak]. It's always good to have more people looking at this. Since the mkdir operation succeeded, it seems like it should be in the audit log, according to the policy set in HDFS-9395... perhaps I missed something. > Hdfs audit shouldn't log mkdir operaton if the directory already exists. > > > Key: HDFS-10305 > URL: https://issues.apache.org/jira/browse/HDFS-10305 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Minor > > Currently Hdfs audit logs mkdir operation even if the directory already > exists. > This creates confusion while analyzing audit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics
[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257074#comment-15257074 ] Colin Patrick McCabe commented on HDFS-10175: - We already have three statistics interfaces: 1. FileSystem#Statistics 2. DFSInputStream#ReadStatistics 3. metrics2 etc. #1 existed for a very long time and is tied into MR in the ways discussed above. I didn't create it, but I did implement the thread-local optimization, based on some performance issues we were having. I have to take the blame for adding #2, in HDFS-4698. At the time, the main focus was on ensuring we were doing short-circuit reads, which didn't really fit into #1. And like you, I felt that it was "very low-level stream behavior" that was decoupled from the rest of the stats. Of course #3 has been around a while, and is used more generally than just in our storage code. I understand your eagerness to get the s3 stats in, but I would rather not proliferate more statistics interfaces if possible. Once they're in, we really can't get rid of them, and it becomes very confusing and clunky. Are you guys free for a webex on Wednesday afternoon? Maybe 12:30pm to 2pm? > add per-operation stats to FileSystem.Statistics > > > Key: HDFS-10175 > URL: https://issues.apache.org/jira/browse/HDFS-10175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Ram Venkatesh >Assignee: Mingliang Liu > Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, > HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, > HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java > > > Currently FileSystem.Statistics exposes the following statistics: > BytesRead > BytesWritten > ReadOps > LargeReadOps > WriteOps > These are in-turn exposed as job counters by MapReduce and other frameworks. > There is logic within DfsClient to map operations to these counters that can > be confusing, for instance, mkdirs counts as a writeOp. 
> Proposed enhancement: > Add a statistic for each DfsClient operation including create, append, > createSymlink, delete, exists, mkdirs, rename and expose them as new > properties on the Statistics object. The operation-specific counters can be > used for analyzing the load imposed by a particular job on HDFS. > For example, we can use them to identify jobs that end up creating a large > number of files. > Once this information is available in the Statistics object, the app > frameworks like MapReduce can expose them as additional counters to be > aggregated and recorded as part of job summary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
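The proposed per-operation counters could be kept cheaply with name-keyed adders, in the spirit of the LongAdder suggestion made earlier in this thread. All class and method names below are hypothetical, not Hadoop's:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch of per-operation statistics keyed by name. LongAdder keeps
// increments cheap under contention, similar in effect to the thread-local
// optimization used for FileSystem#Statistics.
public class OpStatsSketch {
    private final ConcurrentMap<String, LongAdder> counters =
        new ConcurrentHashMap<>();

    void increment(String op) {
        counters.computeIfAbsent(op, k -> new LongAdder()).increment();
    }

    long get(String op) {
        LongAdder a = counters.get(op);
        return a == null ? 0 : a.sum();
    }

    public static void main(String[] args) {
        OpStatsSketch stats = new OpStatsSketch();
        stats.increment("mkdirs");   // mkdirs gets its own counter, rather
        stats.increment("mkdirs");   // than being lumped into writeOps
        stats.increment("rename");
        System.out.println(stats.get("mkdirs")); // 2
        System.out.println(stats.get("delete")); // 0
    }
}
```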
[jira] [Commented] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering
[ https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256861#comment-15256861 ] Colin Patrick McCabe commented on HDFS-10323: - Thanks for the detailed bug report, [~bpodgursky]. bq. 1) ViewFileSystem could forward deleteOnExit calls to the appropriate child FileSystem, and not hold onto that path itself. This would be an incompatible change, right? It seems like a lot of code calling {{FS#close}} might not work with this change. bq. 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all other FileSystems. This seems like the safest way to go. > transient deleteOnExit failure in ViewFileSystem due to close() ordering > > > Key: HDFS-10323 > URL: https://issues.apache.org/jira/browse/HDFS-10323 > Project: Hadoop HDFS > Issue Type: Bug > Components: federation >Reporter: Ben Podgursky > > After switching to using a ViewFileSystem, fs.deleteOnExit calls began > failing frequently, displaying this error on failure: > 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for > path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84 > Since FileSystem eats the error involved, it is difficult to be sure what the > error is, but I believe what is happening is that the ViewFileSystem’s child > FileSystems are being close()’d before the ViewFileSystem, due to the random > order ClientFinalizer closes FileSystems; so then when the ViewFileSystem > tries to close(), it tries to forward the delete() calls to the appropriate > child, and fails because the child is already closed. > I’m unsure how to write an actual Hadoop test to reproduce this, since it > involves testing behavior on actual JVM shutdown. 
However, I can verify that
> while
> {code:java}
> fs.deleteOnExit(randomTemporaryDir);
> {code}
> regularly (~50% of the time) fails to delete the temporary directory, this
> code:
> {code:java}
> ViewFileSystem viewfs = (ViewFileSystem) fs1;
> for (FileSystem fileSystem : viewfs.getChildFileSystems()) {
>   if (fileSystem.exists(randomTemporaryDir)) {
>     fileSystem.deleteOnExit(randomTemporaryDir);
>   }
> }
> {code}
> always successfully deletes the temporary directory on JVM shutdown.
> I am not very familiar with FileSystem inheritance hierarchies, but at first
> glance I see two ways to fix this behavior:
> 1) ViewFileSystem could forward deleteOnExit calls to the appropriate child
> FileSystem, and not hold onto that path itself.
> 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all
> other FileSystems.
> Would appreciate any thoughts of whether this seems accurate, and thoughts
> (or help) on the fix.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
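Fix idea (2) above, closing view filesystems before their children, can be sketched as follows. The Fs type is a hypothetical stand-in for Hadoop's FileSystem, and closeAll is only an illustration of the ordering, not the real FileSystem.Cache code:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch: at shutdown, close view filesystems first so they can still
// forward deleteOnExit work to their (still-open) children, then close the
// children -- unlike the random close order ClientFinalizer uses today.
public class CloseOrderSketch {
    interface Fs extends Closeable {
        boolean isView();
    }

    static void closeAll(List<Fs> cache) throws IOException {
        List<Fs> views = new ArrayList<>();
        List<Fs> children = new ArrayList<>();
        for (Fs fs : cache) {
            (fs.isView() ? views : children).add(fs);
        }
        for (Fs fs : views) {
            fs.close();   // views flush their pending work first
        }
        for (Fs fs : children) {
            fs.close();   // children closed only afterwards
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> order = new ArrayList<>();
        List<Fs> cache = new ArrayList<>();
        cache.add(new Fs() {
            public boolean isView() { return false; }
            public void close() { order.add("child"); }
        });
        cache.add(new Fs() {
            public boolean isView() { return true; }
            public void close() { order.add("view"); }
        });
        closeAll(cache);
        System.out.println(order); // [view, child]
    }
}
```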
[jira] [Commented] (HDFS-10305) Hdfs audit shouldn't log mkdir operaton if the directory already exists.
[ https://issues.apache.org/jira/browse/HDFS-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256858#comment-15256858 ] Colin Patrick McCabe commented on HDFS-10305: - bq. I believe the audit log was supposed to capture failed operations as well. I'd be inclined to close this JIRA as WON'T FIX I can see why you might think this, but no, the audit log should not capture failed operations. Check out the discussion at HDFS-9395 for more details about this. bq. It's not a failed operation if the directory already exists. Yeah, I agree. bq. Closing this as Won't Fix Sounds good. > Hdfs audit shouldn't log mkdir operaton if the directory already exists. > > > Key: HDFS-10305 > URL: https://issues.apache.org/jira/browse/HDFS-10305 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah >Priority: Minor > > Currently Hdfs audit logs mkdir operation even if the directory already > exists. > This creates confusion while analyzing audit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-10326) Disable setting tcp socket send/receive buffers for write pipelines
[ https://issues.apache.org/jira/browse/HDFS-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256848#comment-15256848 ] Colin Patrick McCabe commented on HDFS-10326: - bq. I think we can keep the configurability, but set the default to 0. I agree. The reason why the original patches didn't set the default was basically that we wanted to be conservative. Basically, we were adding the option to use auto-tuning, but not making it the default. If we strongly believe that auto-tuning should be the default, we should make these options default to 0 unless set by the admin. > Disable setting tcp socket send/receive buffers for write pipelines > --- > > Key: HDFS-10326 > URL: https://issues.apache.org/jira/browse/HDFS-10326 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Affects Versions: 2.6.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > > The DataStreamer and the Datanode use a hardcoded > DEFAULT_DATA_SOCKET_SIZE=128K for the send and receive buffers of a write > pipeline. Explicitly setting tcp buffer sizes disables tcp stack > auto-tuning. > The hardcoded value will saturate a 1Gb with 1ms RTT. 105Mbs at 10ms. > Paltry 11Mbs over a 100ms long haul. 10Gb networks are underutilized. > There should either be a configuration to completely disable setting the > buffers, or the the setReceiveBuffer and setSendBuffer should be removed > entirely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-10175) add per-operation stats to FileSystem.Statistics
[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256842#comment-15256842 ] Colin Patrick McCabe edited comment on HDFS-10175 at 4/25/16 7:20 PM: -- Thanks for commenting, [~steve_l]. It's great to see work on s3 stats. s3a has needed love for a while. Did you get a chance to look at my HDFS-10175.006.patch on this JIRA? It seems to address all of your concerns. It provides a standard API that every FileSystem can implement (not just s3, but HDFS, etc. etc.). Once we adopt jdk8, we can easily implement this API using {{java.util.concurrent.atomic.LongAdder}} if that proves to be more readable and/or efficient. bq. Don't break any existing filesystem code by adding new params to existing methods, etc. I agree. My patch doesn't add new params to any existing methods. bq. add the new code out of FileSystem I agree. That's why I separated {{StorageStatistics.java}} from {{FileSystem.java}}. {{FileContext}} should be able to use this API as well, simply by returning a {{StorageStatistics}} instance just like {{FileSystem}} does. bq. Use an int rather than an enum; lets filesystems add their own counters. I hereby reserve 0x200-0x255 for object store operations. Hmm. I'm not sure I follow. My patch identifies counters by name (string), not by an int, enum, or byte. This is necessary because different storage backends will want to track different things (s3a wants to track s3 PUTs, HDFS wants to track genstamp bump ops, etc. etc.). We should not try to create the "statistics type enum of doom" in some misguided attempt at space optimization. Consider the case of out-of-tree Filesystem implementations as well... they are not going to add entries to some enum of doom in hadoop-common. bq. public interface StatisticsSource { Map snapshot(); } I don't think an API that returns a map is the right approach for statistics. That map could get quite large. 
We already know that people love adding just one more statistic to things (and often for quite valid reasons). It's very difficult to \-1 a patch just because it bloats the statistics map more. Once this API exists, the natural progression will be people adding tons and tons of new entries to it. We should be prepared for this and use an API that doesn't choke if we have tons of stats. We shouldn't have to materialize everything all the time-- an iterator approach is smarter because it can be O(1) in terms of memory, no matter how many entries we have. I also don't think we need snapshot consistency for stats. It's a heavy burden for an implementation to carry (it basically requires some kind of materialization into a map, and probably synchronization to stop the world while the materialization is going on). And there is no user demand for it... the current FileSystem#Statistics interface doesn't have it, and nobody has asked for it. It seems like you are focusing on the ability to expose new stats to our metrics2 subsystem, while this JIRA originally focused on adding metrics that MapReduce could read at the end of a job. I think these two use-cases can be covered by the same API. We should try to hammer that out (probably as a HADOOP JIRA rather than an HDFS JIRA, as well). Do you think we should have a call about this or something? I know some folks who might be interested in testing the s3 metrics stuff, if there was a reasonable API to access it. was (Author: cmccabe): Thanks for commenting, [~steve_l]. It's great to see work on s3 stats. s3a has needed love for a while. Did you get a chance to look at my HDFS-10175.006.patch on this JIRA? It seems to address all of your concerns. It provides a standard API that every FileSystem can implement (not just s3, just HDFS, etc. etc.). Once we adopt jdk8, we can easily implement this API using {{java.util.concurrent.atomic.LongAdder}} if that proves to be more readable and/or efficient. bq. 
Don't break any existing filesystem code by adding new params to existing methods, etc. I agree. My patch doesn't add new params to any existing methods. bq. add the new code out of FileSystem I agree. That's why I separated {{StorageStatistics.java}} from {{FileSystem.java}}. {{FileContext}} should be able to use this API as well, simply by returning a {{StorageStatistics}} instance just like {{FileSystem}} does. bq. Use an int rather than an enum; lets filesystems add their own counters. I hereby reserve 0x200-0x255 for object store operations. Hmm. I'm not sure I follow. My patch identifies counters by name (string), not by an int, enum, or byte. This is necessary because different storage backends will want to track different things (s3a wants to track s3 PUTs, HDFS wants to track genstamp bump ops, etc. etc.). We should not try to create the "statistics type
[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics
[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256842#comment-15256842 ] Colin Patrick McCabe commented on HDFS-10175: - Thanks for commenting, [~steve_l]. It's great to see work on s3 stats. s3a has needed love for a while. Did you get a chance to look at my HDFS-10175.006.patch on this JIRA? It seems to address all of your concerns. It provides a standard API that every FileSystem can implement (not just s3, but HDFS, etc. etc.). Once we adopt jdk8, we can easily implement this API using {{java.util.concurrent.atomic.LongAdder}} if that proves to be more readable and/or efficient. bq. Don't break any existing filesystem code by adding new params to existing methods, etc. I agree. My patch doesn't add new params to any existing methods. bq. add the new code out of FileSystem I agree. That's why I separated {{StorageStatistics.java}} from {{FileSystem.java}}. {{FileContext}} should be able to use this API as well, simply by returning a {{StorageStatistics}} instance just like {{FileSystem}} does. bq. Use an int rather than an enum; lets filesystems add their own counters. I hereby reserve 0x200-0x255 for object store operations. Hmm. I'm not sure I follow. My patch identifies counters by name (string), not by an int, enum, or byte. This is necessary because different storage backends will want to track different things (s3a wants to track s3 PUTs, HDFS wants to track genstamp bump ops, etc. etc.). We should not try to create the "statistics type enum of doom" in some misguided attempt at space optimization. Consider the case of out-of-tree Filesystem implementations as well... they are not going to add entries to some enum of doom in hadoop-common. bq. public interface StatisticsSource { Map snapshot(); } I don't think an API that returns a map is the right approach for statistics. That map could get quite large. 
We already know that people love adding just one more statistic to things (and often for quite valid reasons). It's very difficult to -1 a patch just because it bloats the statistics map more. Once this API exists, the natural progression will be people adding tons and tons of new entries to it. We should be prepared for this and use an API that doesn't choke if we have tons of stats. We shouldn't have to materialize everything all the time-- an iterator approach is smarter because it can be O(1) in terms of memory, no matter how many entries we have. I also don't think we need snapshot consistency for stats. It's a heavy burden for an implementation to carry (it basically requires some kind of materialization into a map, and probably synchronization to stop the world while the materialization is going on). And there is no user demand for it... the current FileSystem#Statistics interface doesn't have it, and nobody has asked for it. It seems like you are focusing on the ability to expose new stats to our metrics2 subsystem, while this JIRA originally focused on adding metrics that MapReduce could read at the end of a job. I think these two use-cases can be covered by the same API. We should try to hammer that out (probably as a HADOOP JIRA rather than an HDFS JIRA, as well). Do you think we should have a call about this or something? I know some folks who might be interested in testing the s3 metrics stuff, if there was a reasonable API to access it. 
> add per-operation stats to FileSystem.Statistics > > > Key: HDFS-10175 > URL: https://issues.apache.org/jira/browse/HDFS-10175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Ram Venkatesh >Assignee: Mingliang Liu > Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, > HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, > HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java > > > Currently FileSystem.Statistics exposes the following statistics: > BytesRead > BytesWritten > ReadOps > LargeReadOps > WriteOps > These are in-turn exposed as job counters by MapReduce and other frameworks. > There is logic within DfsClient to map operations to these counters that can > be confusing, for instance, mkdirs counts as a writeOp. > Proposed enhancement: > Add a statistic for each DfsClient operation including create, append, > createSymlink, delete, exists, mkdirs, rename and expose them as new > properties on the Statistics object. The operation-specific counters can be > used for analyzing the load imposed by a particular job on HDFS. > For example, we can use them to identify jobs that end up creating a large > number of files. > Once this information is available in the Statistics object, the app > frameworks like
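The iterator-style API argued for above (O(1) extra memory, no stop-the-world snapshot) can be sketched as follows. The class and method names are illustrative stand-ins, not Hadoop's actual StorageStatistics API:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch: statistics are exposed one (name, value) entry at a time, read
// weakly consistently on demand, so nothing forces a full-map
// materialization no matter how many counters accumulate over time.
public class IterStatsSketch {
    static final class LongStatistic {
        final String name;
        final long value;
        LongStatistic(String name, long value) {
            this.name = name;
            this.value = value;
        }
    }

    private final ConcurrentHashMap<String, LongAdder> counters =
        new ConcurrentHashMap<>();

    void add(String name, long delta) {
        counters.computeIfAbsent(name, k -> new LongAdder()).add(delta);
    }

    Iterator<LongStatistic> getLongStatistics() {
        Iterator<Map.Entry<String, LongAdder>> it =
            counters.entrySet().iterator();
        return new Iterator<LongStatistic>() {
            public boolean hasNext() { return it.hasNext(); }
            public LongStatistic next() {
                Map.Entry<String, LongAdder> e = it.next();
                return new LongStatistic(e.getKey(), e.getValue().sum());
            }
        };
    }

    public static void main(String[] args) {
        IterStatsSketch s = new IterStatsSketch();
        s.add("s3Puts", 3);          // backend-specific counter names,
        s.add("genstampBumps", 1);   // as discussed above
        long total = 0;
        Iterator<LongStatistic> it = s.getLongStatistics();
        while (it.hasNext()) total += it.next().value;
        System.out.println(total); // 4
    }
}
```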
[jira] [Comment Edited] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256791#comment-15256791 ] Colin Patrick McCabe edited comment on HDFS-10301 at 4/25/16 6:58 PM: -- bq. [~shv] wrote: The last line is confusing, because it should have been 2, but its is 0 since br2 overridden lastBlockReportId for s1 and s2 . It's OK for it to be 0 here. It just means that we will not do the zombie storage elimination for these particular full block reports. Remember that interleaved block reports are an extremely rare case, and so are zombie storages. We can wait for the next FBR to do the zombie elimination. bq. I think this could be a simple fix for this jira, and we can discuss other approaches to zombie storage detection in the next issue. Current approach seems to be error prone. One way is to go with the retry cache as Jing Zhao suggested. Or there could be other ideas. The problem with a retry cache is that it uses up memory. We don't have an easy way to put an upper bound on the amount of memory that we need, except through adding complex logic to limit the number of full block reports accepted for a specific DataNode in a given time period. bq. This brought me to an idea. BR ids are monotonically increasing... The code for generating block report IDs is here:
{code}
private long generateUniqueBlockReportId() {
  // Initialize the block report ID the first time through.
  // Note that 0 is used on the NN to indicate "uninitialized", so we should
  // not send a 0 value ourselves.
  prevBlockReportId++;
  while (prevBlockReportId == 0) {
    prevBlockReportId = ThreadLocalRandom.current().nextLong();
  }
  return prevBlockReportId;
}
{code}
It's not monotonically increasing in the case where rollover occurs. While this is an extremely rare case, the consequences of getting it wrong would be extremely severe. So this might be possible as an incompatible change, but not a change in branch-2. 
Edit: another reason not to do this is because on restart, the DN could get a number lower than its previous one. We can't use IDs as epoch numbers unless we actually persist them to disk, like Paxos transaction IDs or HDFS edit log IDs. bq. [~walter.k.su] wrote: If BR is splitted into multipe RPCs, there's no interleaving natually because DN get the acked before it sends next RPC. Interleaving only exists if BR is not splitted. I agree bug need to be fixed from inside, It's just eliminating interleaving for good maybe not a bad idea, as it simplifies the problem, and is also a simple workaround for this jira. We don't document anywhere that interleaving doesn't occur. We don't have unit tests that it doesn't occur, and if we did, those unit tests might accidentally pass because of race conditions. Even if we eliminated interleaving for now, anyone changing the RPC code or the queuing code could easily re-introduce interleaving and this bug would come back. That's why I agree with [~shv]-- we should not focus on trying to remove interleaving. bq. [~shv] wrote: I think this could be a simple fix for this jira, and we can discuss other approaches to zombie storage detection in the next issue. Yeah, let's get in this fix and then talk about potential improvements in a follow-on jira. was (Author: cmccabe): bq. [~shv] wrote: The last line is confusing, because it should have been 2, but its is 0 since br2 overridden lastBlockReportId for s1 and s2 . It's OK for it to be 0 here. It just means that we will not do the zombie storage elimination for these particular full block reports. Remember that interleaved block reports are an extremely rare case, and so are zombie storages. We can wait for the next FBR to do the zombie elimination. bq. I think this could be a simple fix for this jira, and we can discuss other approaches to zombie storage detection in the next issue. Current approach seems to be error prone. 
One way is to go with the retry cache as Jing Zhao suggested. Or there could be other ideas. The problem with a retry cache is that it uses up memory. We don't have an easy way to put an upper bound on the amount of memory that we need, except through adding complex logic to limit the number of full block reports accepted for a specific DataNode in a given time period. bq. This brought me to an idea. BR ids are monotonically increasing... The code for generating block report IDs is here: {code} private long generateUniqueBlockReportId() { // Initialize the block report ID the first time through. // Note that 0 is used on the NN to indicate "uninitialized", so we should // not send a 0 value ourselves. prevBlockReportId++; while (prevBlockReportId == 0) { prevBlockReportId = ThreadLocalRandom.current().nextLong(); } return prevBlockReportId; } {code} It's not
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256791#comment-15256791 ] Colin Patrick McCabe commented on HDFS-10301: - bq. [~shv] wrote: The last line is confusing, because it should have been 2, but its is 0 since br2 overridden lastBlockReportId for s1 and s2 . It's OK for it to be 0 here. It just means that we will not do the zombie storage elimination for these particular full block reports. Remember that interleaved block reports are an extremely rare case, and so are zombie storages. We can wait for the next FBR to do the zombie elimination. bq. I think this could be a simple fix for this jira, and we can discuss other approaches to zombie storage detection in the next issue. Current approach seems to be error prone. One way is to go with the retry cache as Jing Zhao suggested. Or there could be other ideas. The problem with a retry cache is that it uses up memory. We don't have an easy way to put an upper bound on the amount of memory that we need, except through adding complex logic to limit the number of full block reports accepted for a specific DataNode in a given time period. bq. This brought me to an idea. BR ids are monotonically increasing... The code for generating block report IDs is here:
{code}
private long generateUniqueBlockReportId() {
  // Initialize the block report ID the first time through.
  // Note that 0 is used on the NN to indicate "uninitialized", so we should
  // not send a 0 value ourselves.
  prevBlockReportId++;
  while (prevBlockReportId == 0) {
    prevBlockReportId = ThreadLocalRandom.current().nextLong();
  }
  return prevBlockReportId;
}
{code}
It's not monotonically increasing in the case where rollover occurs. While this is an extremely rare case, the consequences of getting it wrong would be extremely severe. So this might be possible as an incompatible change, but not a change in branch-2. bq. 
[~walter.k.su] wrote: If a BR is split into multiple RPCs, there's no interleaving naturally, because the DN gets the ack before it sends the next RPC. Interleaving only exists if the BR is not split. I agree the bug needs to be fixed from the inside; it's just that eliminating interleaving for good may not be a bad idea, as it simplifies the problem and is also a simple workaround for this jira. We don't document anywhere that interleaving doesn't occur. We don't have unit tests verifying that it doesn't occur, and if we did, those unit tests might accidentally pass because of race conditions. Even if we eliminated interleaving for now, anyone changing the RPC code or the queuing code could easily re-introduce interleaving and this bug would come back. That's why I agree with [~shv]: we should not focus on trying to remove interleaving. bq. [~shv] wrote: I think this could be a simple fix for this jira, and we can discuss other approaches to zombie storage detection in the next issue. Yeah, let's get in this fix and then talk about potential improvements in a follow-on jira. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.01.patch, zombieStorageLogs.rtf > > > When the NameNode is busy, a DataNode can time out sending a block report. Then it > sends the block report again. The NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
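The rollover concern raised above can be illustrated with a minimal, self-contained sketch. This is not the HDFS code itself: the class name is invented, and the constant used when the counter lands on 0 stands in for the ThreadLocalRandom reseed.

```java
// Minimal sketch of why the generated block report IDs are not
// monotonically increasing across rollover: incrementing past
// Long.MAX_VALUE wraps to Long.MIN_VALUE, so the newer ID compares smaller.
public class BlockReportIdRollover {
    static long prevBlockReportId = Long.MAX_VALUE - 1;

    static long generateUniqueBlockReportId() {
        prevBlockReportId++;
        // 0 means "uninitialized" on the NN, so never return it.
        while (prevBlockReportId == 0) {
            prevBlockReportId = 12345L; // stand-in for ThreadLocalRandom.current().nextLong()
        }
        return prevBlockReportId;
    }

    public static void main(String[] args) {
        long first = generateUniqueBlockReportId();  // returns Long.MAX_VALUE
        long second = generateUniqueBlockReportId(); // wraps to Long.MIN_VALUE
        System.out.println(first > second); // prints "true": the newer report has the smaller ID
    }
}
```

Any NameNode-side logic that assumed a larger ID means a newer report would misorder reports once per wraparound, which is presumably why relying on monotonic IDs would be an incompatible change.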
[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics
[ https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254503#comment-15254503 ] Colin Patrick McCabe commented on HDFS-10175: - bq. Can I also note that as the @Public @Stable FileSystem is widely subclassed, with its protected statistics field accessed in those subclasses, nobody is allowed to take it or its current methods away. Thanks. Yeah, I agree. I would like to see us get more cautious about adding new things to {{FileSystem#Statistics}}, though, since I think it's not a good match for most of the new stats we're proposing here. bq. There's no per-thread tracking; it's collecting overall stats, rather than trying to add up the cost of a single execution, which is what per-thread stuff would, presumably, do. This is lower cost but still permits microbenchmark-style analysis of performance problems against S3a. It doesn't directly let you get results of a job, "34MB of data, 2000 stream aborts, 1998 backward seeks", which are the kind of things I'm curious about. Overall stats are lower cost in terms of memory consumption, and the cost to read (as opposed to update) a metric. They are higher cost in terms of the CPU consumed for each update of the metric. In particular, for applications that do a lot of stream operations from many different threads, updating an AtomicLong can become a performance bottleneck. One of the points that I was making above is that I think it's appropriate for some metrics to be tracked per-thread, but for others, we probably want to use AtomicLong or similar. I would expect that anything that leads to an S3 RPC could easily be tracked by an AtomicLong, since the overhead of the network activity would dwarf the AtomicLong update overhead. And we should have a common interface for getting this information that MR and stats consumers can use. bq. 
Maybe, and this would be nice, whatever is implemented here is (a) extensible to support some duration type too, at least in parallel, The interface here supports storing durations as 64-bit numbers of milliseconds, which seems good. It is up to the implementor of the statistic to determine what the 64-bit long represents (average duration in ms, median duration in ms, number of RPCs, etc.). This is similar to metrics2, JMX, etc., where you have basic types that can be used in a few different ways. bq. and (b) could be used as a back end by both Metrics2 and Coda Hale metrics registries. That way the slightly more expensive metric systems would have access to this more raw data. Sure. The difficult question is how metrics2 hooks up to metrics that are per-FS or per-stream. Should the output of metrics2 reflect the union of all existing FS and stream instances? Some applications open a very large number of streams, so it seems impractical for metrics2 to include all those streams in its output. > add per-operation stats to FileSystem.Statistics > > > Key: HDFS-10175 > URL: https://issues.apache.org/jira/browse/HDFS-10175 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Ram Venkatesh >Assignee: Mingliang Liu > Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, > HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, > HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java > > > Currently FileSystem.Statistics exposes the following statistics: > BytesRead > BytesWritten > ReadOps > LargeReadOps > WriteOps > These are in turn exposed as job counters by MapReduce and other frameworks. > There is logic within DfsClient to map operations to these counters that can > be confusing, for instance, mkdirs counts as a writeOp. 
> Proposed enhancement: > Add a statistic for each DfsClient operation including create, append, > createSymlink, delete, exists, mkdirs, rename and expose them as new > properties on the Statistics object. The operation-specific counters can be > used for analyzing the load imposed by a particular job on HDFS. > For example, we can use them to identify jobs that end up creating a large > number of files. > Once this information is available in the Statistics object, the app > frameworks like MapReduce can expose them as additional counters to be > aggregated and recorded as part of job summary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
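The AtomicLong-contention point from the comment above can be sketched as follows. The class and method names here are hypothetical, not Hadoop's actual Statistics API: `java.util.concurrent.atomic.LongAdder` stripes updates across per-thread cells, so hot multi-threaded update paths (like stream reads) contend far less than a single shared `AtomicLong`, at the cost of a slightly more expensive read.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Illustrative sketch only: a cheap counter holder where hot paths use
// LongAdder (low update contention, pricier reads) and RPC-bound paths use
// AtomicLong (the network round trip dwarfs the update cost).
public class OpCounters {
    private final LongAdder bytesRead = new LongAdder();   // hot path: many threads updating
    private final AtomicLong rpcCount = new AtomicLong();  // cold path: one RPC dwarfs the update

    public void onRead(int n) { bytesRead.add(n); }
    public void onRpc() { rpcCount.incrementAndGet(); }

    public long getBytesRead() { return bytesRead.sum(); } // sums the per-thread cells
    public long getRpcCount() { return rpcCount.get(); }
}
```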
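The per-operation counters proposed in the description could look roughly like this sketch. The operation names come from the description; the class itself is hypothetical and not the patch's actual implementation.

```java
import java.util.EnumMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of per-operation statistics: one counter per DfsClient
// operation, instead of folding everything into a single writeOps counter.
public class PerOpStatistics {
    public enum Op { CREATE, APPEND, CREATE_SYMLINK, DELETE, EXISTS, MKDIRS, RENAME }

    private final EnumMap<Op, AtomicLong> counters = new EnumMap<>(Op.class);

    public PerOpStatistics() {
        // Pre-populate so increment() never races on map insertion.
        for (Op op : Op.values()) {
            counters.put(op, new AtomicLong());
        }
    }

    public void increment(Op op) { counters.get(op).incrementAndGet(); }
    public long get(Op op) { return counters.get(op).get(); }
}
```

With counters like these, a framework such as MapReduce could aggregate them per job and, for example, flag jobs that create an unusually large number of files.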
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254350#comment-15254350 ] Colin Patrick McCabe commented on HDFS-10301: - Yeah, perhaps we should file a follow-on JIRA to optimize by removing the storage reports with an older ID when a newer one is received. The challenge will be implementing it efficiently: we probably need to move away from BlockingQueue and toward something with our own locking, and probably something other than plain Runnables. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.01.patch, zombieStorageLogs.rtf > > > When the NameNode is busy, a DataNode can time out sending a block report. Then it > sends the block report again. The NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
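A hedged sketch of the follow-on optimization mentioned in the comment: dropping queued reports from a DataNode once a newer retransmission arrives, using the queue's own lock rather than a BlockingQueue. All names here are hypothetical; real code would have to integrate with the NameNode's existing report processing.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical report queue: enqueue() supersedes any queued report from the
// same DataNode that carries an older block report ID, so a retransmitted
// full block report never sits behind its stale predecessor.
public class ReportQueue {
    public static final class Report {
        final String datanodeUuid;
        final long blockReportId;
        public Report(String datanodeUuid, long blockReportId) {
            this.datanodeUuid = datanodeUuid;
            this.blockReportId = blockReportId;
        }
    }

    private final Deque<Report> queue = new ArrayDeque<>();

    public synchronized void enqueue(Report r) {
        // Drop older retransmissions from the same DataNode before appending.
        Iterator<Report> it = queue.iterator();
        while (it.hasNext()) {
            Report q = it.next();
            if (q.datanodeUuid.equals(r.datanodeUuid)
                    && q.blockReportId < r.blockReportId) {
                it.remove();
            }
        }
        queue.addLast(r);
    }

    public synchronized int size() { return queue.size(); }
}
```

Note that this comparison assumes IDs from one DataNode increase between retransmissions, which, per the rollover discussion above, only holds away from the wraparound case.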
[jira] [Assigned] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe reassigned HDFS-10301: --- Assignee: Colin Patrick McCabe > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.01.patch, zombieStorageLogs.rtf > > > When the NameNode is busy, a DataNode can time out sending a block report. Then it > sends the block report again. The NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10301: Attachment: HDFS-10301.003.patch Added a unit test. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.01.patch, zombieStorageLogs.rtf > > > When the NameNode is busy, a DataNode can time out sending a block report. Then it > sends the block report again. The NameNode, while processing these two reports > at the same time, can interleave processing storages from different reports. > This screws up the blockReportId field, which makes the NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)