[jira] [Commented] (HDFS-10543) hdfsRead read stops at block boundary

2016-07-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363214#comment-15363214
 ] 

Colin Patrick McCabe commented on HDFS-10543:
-

One approach would be to try checking the behavior of the Java client and 
seeing if you can do something similar.  It is not incorrect to avoid short 
reads, just potentially inefficient.

> hdfsRead read stops at block boundary
> -
>
> Key: HDFS-10543
> URL: https://issues.apache.org/jira/browse/HDFS-10543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Xiaowei Zhu
> Fix For: HDFS-8707
>
> Attachments: HDFS-10543.HDFS-8707.000.patch, 
> HDFS-10543.HDFS-8707.001.patch, HDFS-10543.HDFS-8707.002.patch, 
> HDFS-10543.HDFS-8707.003.patch, HDFS-10543.HDFS-8707.004.patch
>
>
> Reproducer:
> char *buf2 = new char[file_info->mSize];
>   memset(buf2, 0, (size_t)file_info->mSize);
>   int ret = hdfsRead(fs, file, buf2, file_info->mSize);
>   delete [] buf2;
>   if(ret != file_info->mSize) {
> std::stringstream ss;
> ss << "tried to read " << file_info->mSize << " bytes. but read " << 
> ret << " bytes";
> ReportError(ss.str());
> hdfsCloseFile(fs, file);
> continue;
>   }
> When run against a file of roughly 1.4 GB, it returns an error like "tried to 
> read 146890 bytes. but read 134217728 bytes". The HDFS cluster it runs 
> against has a block size of 134217728 bytes, so it appears that hdfsRead 
> stops at a block boundary. This looks like a regression. We should add a 
> retry to continue reading across block boundaries for files with multiple 
> blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10555) Unable to loadFSEdits due to a failure in readCachePoolInfo

2016-07-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363005#comment-15363005
 ] 

Colin Patrick McCabe commented on HDFS-10555:
-

Thanks, [~umamaheswararao], [~jingzhao], and [~kihwal].

> Unable to loadFSEdits due to a failure in readCachePoolInfo
> ---
>
> Key: HDFS-10555
> URL: https://issues.apache.org/jira/browse/HDFS-10555
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, namenode
>Affects Versions: 2.9.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Fix For: 2.9.0
>
> Attachments: HDFS-10555-00.patch
>
>
> Recently some tests have been failing, unable to loadFSEdits due to a failure 
> in readCachePoolInfo.
> Here is the relevant code in FSImageSerialization.java:
> {code}
>   }
> if ((flags & ~0x2F) != 0) {
>   throw new IOException("Unknown flag in CachePoolInfo: " + flags);
> }
> {code}
> When all of the CachePool fields are set to true, {{flags & ~0x2F}} turns out 
> to be a non-zero value, so this condition fails. The failure was introduced by 
> the addition of the 0x20 flag and the change of the mask from ~0x1F to ~0x2F.
> To fix this issue, we could change the mask value to ~0x3F.
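A worked illustration of the mask arithmetic described above (a sketch based on 
the description, not the attached patch; the constant name is illustrative): 
0x2F is 0b101111, which drops the 0x10 bit, so a CachePoolInfo with all six 
flag bits set (0x3F) trips the "unknown flag" check.
{code}
// Cover every defined flag bit (0x01 .. 0x20) in the validity mask.
final int ALL_CACHE_POOL_FLAGS = 0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20; // == 0x3F
if ((flags & ~ALL_CACHE_POOL_FLAGS) != 0) {
  throw new IOException("Unknown flag in CachePoolInfo: " + flags);
}
{code}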



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10548) Remove the long deprecated BlockReaderRemote

2016-07-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363003#comment-15363003
 ] 

Colin Patrick McCabe commented on HDFS-10548:
-

Thanks for tackling this, guys.  It is good to see this code duplication 
finally go away.  Next target: {{BlockReaderLocalLegacy}}?

I do think renaming {{BlockReaderRemote2}} will make merging code back to 
branch-2 more difficult-- you might want to reconsider that.

> Remove the long deprecated BlockReaderRemote
> 
>
> Key: HDFS-10548
> URL: https://issues.apache.org/jira/browse/HDFS-10548
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-10548-v1.patch, HDFS-10548-v2.patch, 
> HDFS-10548-v3.patch
>
>
> To lessen the maintenance burden, as raised in HDFS-8901, we suggest removing 
> the {{BlockReaderRemote}} class, which was deprecated a very long time ago. 
> From the {{BlockReaderRemote}} header:
> {quote}
>  * @deprecated this is an old implementation that is being left around
>  * in case any issues spring up with the new {@link BlockReaderRemote2}
>  * implementation.
>  * It will be removed in the next release.
> {quote}
> From {{BlockReaderRemote2}} class header:
> {quote}
>  * This is a new implementation introduced in Hadoop 0.23 which
>  * is more efficient and simpler than the older BlockReader
>  * implementation. It should be renamed to BlockReaderRemote
>  * once we are confident in it.
> {quote}
> Going even further, after getting rid of the old class, we could do the 
> rename the comment suggests: BlockReaderRemote2 => BlockReaderRemote.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10543) hdfsRead read stops at block boundary

2016-07-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362997#comment-15362997
 ] 

Colin Patrick McCabe commented on HDFS-10543:
-

Just to be clear, the existing HDFS Java client can return "short reads" that 
are less than what was requested, even when there is more remaining in the 
file.  This is traditional in POSIX and nearly all filesystems I'm aware of 
have these semantics.  The justification is that applications may not want to 
wait a long time to fetch more bytes, if there are some bytes available already 
that they can process.  Applications that do want the full buffer can just call 
read() again.  APIs like {{readFully}} exist to provide these semantics.
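For reference, a minimal sketch of that read-again pattern over a plain 
java.io.InputStream (the HDFS client already provides this behavior via 
{{readFully}}; this is only an illustration of the semantics):
{code}
import java.io.IOException;
import java.io.InputStream;

// Keep calling read() until the requested length is satisfied or EOF is hit.
// A single short read is not an error; it just means "call read() again".
static int readAsMuchAsPossible(InputStream in, byte[] buf, int off, int len)
    throws IOException {
  int total = 0;
  while (total < len) {
    int n = in.read(buf, off + total, len - total);
    if (n < 0) {
      break;                      // end of stream reached
    }
    total += n;
  }
  return total;
}
{code}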

> hdfsRead read stops at block boundary
> -
>
> Key: HDFS-10543
> URL: https://issues.apache.org/jira/browse/HDFS-10543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Xiaowei Zhu
> Fix For: HDFS-8707
>
> Attachments: HDFS-10543.HDFS-8707.000.patch, 
> HDFS-10543.HDFS-8707.001.patch, HDFS-10543.HDFS-8707.002.patch, 
> HDFS-10543.HDFS-8707.003.patch, HDFS-10543.HDFS-8707.004.patch
>
>
> Reproducer:
> char *buf2 = new char[file_info->mSize];
>   memset(buf2, 0, (size_t)file_info->mSize);
>   int ret = hdfsRead(fs, file, buf2, file_info->mSize);
>   delete [] buf2;
>   if(ret != file_info->mSize) {
> std::stringstream ss;
> ss << "tried to read " << file_info->mSize << " bytes. but read " << 
> ret << " bytes";
> ReportError(ss.str());
> hdfsCloseFile(fs, file);
> continue;
>   }
> When run against a file of roughly 1.4 GB, it returns an error like "tried to 
> read 146890 bytes. but read 134217728 bytes". The HDFS cluster it runs 
> against has a block size of 134217728 bytes, so it appears that hdfsRead 
> stops at a block boundary. This looks like a regression. We should add a 
> retry to continue reading across block boundaries for files with multiple 
> blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9805) TCP_NODELAY not set before SASL handshake in data transfer pipeline

2016-07-05 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-9805:
---
   Resolution: Fixed
Fix Version/s: 3.0.0-alpha1
   Status: Resolved  (was: Patch Available)

> TCP_NODELAY not set before SASL handshake in data transfer pipeline
> ---
>
> Key: HDFS-9805
> URL: https://issues.apache.org/jira/browse/HDFS-9805
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-9805.002.patch, HDFS-9805.003.patch, 
> HDFS-9805.004.patch, HDFS-9805.005.patch
>
>
> There are a few places in the DN -> DN block transfer pipeline where 
> TCP_NODELAY is not set before doing a SASL handshake:
> * in {{DataNode.DataTransfer::run()}}
> * in {{DataXceiver::replaceBlock()}}
> * in {{DataXceiver::writeBlock()}}
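A minimal sketch of the fix being described, assuming a plain java.net.Socket 
(host and port are placeholders; the real change touches the three code paths 
listed above):
{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Enable TCP_NODELAY before any SASL traffic so the small handshake messages
// are not held back by Nagle's algorithm.
static Socket newDataTransferSocket(String host, int port) throws IOException {
  Socket sock = new Socket();
  sock.setTcpNoDelay(true);          // must be set before the SASL handshake
  sock.connect(new InetSocketAddress(host, port), 60_000);
  return sock;
}
{code}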



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9805) TCP_NODELAY not set before SASL handshake in data transfer pipeline

2016-07-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362983#comment-15362983
 ] 

Colin Patrick McCabe commented on HDFS-9805:


Thanks for the reminder, [~jzhuge].  I committed the patch last week, but JIRA 
went down before I could mark the ticket as resolved.

I have committed this to trunk only for the moment.  The backport to branch-2 
looks like it might be a little tricky, and our next release will be 3.0 
anyway.  If anyone is interested in backporting to branch-2, please do and 
update the ticket. Cheers.

> TCP_NODELAY not set before SASL handshake in data transfer pipeline
> ---
>
> Key: HDFS-9805
> URL: https://issues.apache.org/jira/browse/HDFS-9805
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-9805.002.patch, HDFS-9805.003.patch, 
> HDFS-9805.004.patch, HDFS-9805.005.patch
>
>
> There are a few places in the DN -> DN block transfer pipeline where 
> TCP_NODELAY is not set before doing a SASL handshake:
> * in {{DataNode.DataTransfer::run()}}
> * in {{DataXceiver::replaceBlock()}}
> * in {{DataXceiver::writeBlock()}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10594) HDFS-4949 should support recursive cache directives

2016-07-05 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10594:

Summary: HDFS-4949 should support recursive cache directives  (was: 
CacheReplicationMonitor should recursively rescan the path when the inode of 
the path is directory)

> HDFS-4949 should support recursive cache directives
> ---
>
> Key: HDFS-10594
> URL: https://issues.apache.org/jira/browse/HDFS-10594
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10594.001.patch
>
>
> In {{CacheReplicationMonitor#rescanCacheDirectives}}, it should recursively 
> rescan the path when the inode of the path is a directory. In this code:
> {code}
> } else if (node.isDirectory()) {
> INodeDirectory dir = node.asDirectory();
> ReadOnlyList<INode> children = dir
> .getChildrenList(Snapshot.CURRENT_STATE_ID);
> for (INode child : children) {
>   if (child.isFile()) {
> rescanFile(directive, child.asFile());
>   }
> }
>}
> {code}
> With this logic, some inode files are ignored when a child inode is itself a 
> directory that contains further files. As a result, the files in 
> subdirectories under the cached path never get cached.
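A rough sketch of the recursive walk the description asks for, using the names 
from the snippet above (illustrative only, not the attached patch):
{code}
// Recurse into child directories instead of handling only direct file
// children, so files in nested subdirectories of the cached path are
// rescanned as well.
private void rescanDirectory(CacheDirective directive, INodeDirectory dir) {
  for (INode child : dir.getChildrenList(Snapshot.CURRENT_STATE_ID)) {
    if (child.isFile()) {
      rescanFile(directive, child.asFile());
    } else if (child.isDirectory()) {
      rescanDirectory(directive, child.asDirectory());
    }
  }
}
{code}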



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9700) DFSClient and DFSOutputStream should set TCP_NODELAY on sockets for DataTransferProtocol

2016-06-24 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348970#comment-15348970
 ] 

Colin Patrick McCabe commented on HDFS-9700:


Hmm.  I think it's confusing to use a configuration key for Hadoop RPC to 
configure something that isn't Hadoop RPC.  We have tons of keys named with 
{{ipc}}, and all of them relate to Hadoop RPC, not to DataTransferProtocol: 
{{ipc.client.connect.max.retries}}, {{ipc.server.listen.queue.size}}, 
{{ipc.client.connect.timeout}}, and so forth.

There are valid cases where you might want a different configuration for RPC 
versus DataTransferProtocol.  For example, conservative users might also want 
to avoid turning on {{TCP_NODELAY}} for {{DataTransferProtocol}} since it is a 
new feature, and not as well tested as doing what we do currently.  But since 
we have {{TCP_NODELAY}} on for RPC, they might want to keep that on.

I agree that in the long term, {{TCP_NODELAY}} should be used for both.  But 
that's an argument for removing the configuration altogether, not for making it 
do something other than what it's named.
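To make the naming point concrete, a hypothetical sketch (the 
{{dfs.data.transfer.client.tcpnodelay}} key name is purely illustrative and 
does not exist; {{conf}} is a Hadoop Configuration and {{sock}} a socket used 
for DataTransferProtocol; the defaults shown are arbitrary for the sketch):
{code}
// Keep RPC and DataTransferProtocol independently tunable instead of reusing
// the ipc.* key for something that is not Hadoop RPC.
boolean rpcNoDelay = conf.getBoolean("ipc.client.tcpnodelay", false);
boolean dataNoDelay = conf.getBoolean(
    "dfs.data.transfer.client.tcpnodelay",  // illustrative key name only
    rpcNoDelay);                            // fall back to the RPC setting
sock.setTcpNoDelay(dataNoDelay);
{code}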

> DFSClient and DFSOutputStream should set TCP_NODELAY on sockets for 
> DataTransferProtocol
> 
>
> Key: HDFS-9700
> URL: https://issues.apache.org/jira/browse/HDFS-9700
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.7.1, 2.6.3
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 2.8.0
>
> Attachments: HDFS-9700-branch-2.7.002.patch, 
> HDFS-9700-branch-2.7.003.patch, HDFS-9700-v1.patch, HDFS-9700-v2.patch, 
> HDFS-9700.002.patch, HDFS-9700.003.patch, HDFS-9700.004.patch, 
> HDFS-9700_branch-2.7-v2.patch, HDFS-9700_branch-2.7.patch
>
>
> In {{DFSClient.connectToDN()}} and 
> {{DFSOutputStream.createSocketForPipeline()}}, we never call 
> {{setTcpNoDelay()}} on the constructed socket before sending.  In both cases, 
> we should respect the value of ipc.client.tcpnodelay in the configuration.
> While this applies whether security is enabled or not, it seems to have a 
> bigger impact on latency when security is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8940) Support for large-scale multi-tenant inotify service

2016-06-24 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348554#comment-15348554
 ] 

Colin Patrick McCabe commented on HDFS-8940:


bq. You mean reading inotify messages from the SbNN? It's a very attractive 
idea from scalability angle. But how would we handle the staleness? The SbNN 
could be a few mins behind ANN right?

Sorry for the misunderstanding.  I wasn't talking about HDFS HA.  The point 
that I was making is that you don't want a single point of failure in whatever 
service you are using to fetch the events from HDFS and put them in Kafka.  
Perhaps you could also execute the code which fetches events in the context of 
Kafka itself somehow, to avoid creating a new service?  I'm not familiar with 
the programming model there.

> Support for large-scale multi-tenant inotify service
> 
>
> Key: HDFS-8940
> URL: https://issues.apache.org/jira/browse/HDFS-8940
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
> Attachments: Large-Scale-Multi-Tenant-Inotify-Service.pdf
>
>
> HDFS-6634 provides the core inotify functionality. We would like to extend 
> that to provide a large-scale service that tens of thousands of clients can 
> subscribe to.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10328) Add per-cache-pool default replication num configuration

2016-06-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345791#comment-15345791
 ] 

Colin Patrick McCabe commented on HDFS-10328:
-

Sorry for the breakage, [~kshukla].  HDFS-10555 should have fixed it-- check it 
out.

> Add per-cache-pool default replication num configuration
> 
>
> Key: HDFS-10328
> URL: https://issues.apache.org/jira/browse/HDFS-10328
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching
>Reporter: xupeng
>Assignee: xupeng
>Priority: Minor
> Fix For: 2.3.0
>
> Attachments: HDFS-10328.001.patch, HDFS-10328.002.patch, 
> HDFS-10328.003.patch, HDFS-10328.004.patch
>
>
> For now, hdfs cacheadmin cannot set a default replication num for the cache 
> directives in a cache pool; each cache directive added to the same cache 
> pool has to set its own replication num individually. 
> Consider this situation: we add daily Hive tables into the cache pool "hive". 
> Each time, we have to set the same replication num for every table directive 
> in that pool.  
> I think we should allow setting a default replication num on a cachepool so 
> that every cache directive in the pool inherits its replication configuration 
> from the pool. A cache directive could still override the replication 
> configuration explicitly via the "add & modify directive -replication" 
> command from the cli.
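A hypothetical sketch of the inheritance rule being proposed, assuming a 
CacheDirectiveInfo {{info}} and its CachePool {{pool}} 
({{getDefaultReplication()}} is an illustrative accessor that the patch would 
add, not an existing API):
{code}
// A directive-level replication, if present, overrides the pool default.
Short directiveReplication = info.getReplication();       // may be null
short effectiveReplication = (directiveReplication != null)
    ? directiveReplication
    : pool.getDefaultReplication();                        // hypothetical per-pool default
{code}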



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8940) Support for large-scale multi-tenant inotify service

2016-06-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345766#comment-15345766
 ] 

Colin Patrick McCabe commented on HDFS-8940:


I think Kafka would be a great choice for scaling HDFS inotify.  You would 
probably want an HA service for fetching HDFS inotify messages, and then just 
put them directly into Kafka.  No serialization needed because it's already 
protobuf.
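A rough sketch of such a fetcher, assuming the standard HdfsAdmin inotify API 
and a Kafka producer (topic name and the string serialization are illustrative; 
a real service would forward the protobuf bytes and take care of HA and 
offset/txid tracking):
{code}
import java.io.IOException;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.Event;
import org.apache.hadoop.hdfs.inotify.EventBatch;
import org.apache.hadoop.hdfs.inotify.MissingEventsException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

static void pumpEventsToKafka(HdfsAdmin admin,
    KafkaProducer<String, String> producer)
    throws IOException, InterruptedException, MissingEventsException {
  DFSInotifyEventInputStream events = admin.getInotifyEventStream();
  while (!Thread.currentThread().isInterrupted()) {
    EventBatch batch = events.poll(1, TimeUnit.SECONDS);
    if (batch == null) {
      continue;                                   // no new edits yet
    }
    for (Event event : batch.getEvents()) {
      producer.send(new ProducerRecord<>("hdfs-inotify",
          Long.toString(batch.getTxid()), event.toString()));
    }
  }
}
{code}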

> Support for large-scale multi-tenant inotify service
> 
>
> Key: HDFS-8940
> URL: https://issues.apache.org/jira/browse/HDFS-8940
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
> Attachments: Large-Scale-Multi-Tenant-Inotify-Service.pdf
>
>
> HDFS-6634 provides the core inotify functionality. We would like to extend 
> that to provide a large-scale service that tens of thousands of clients can 
> subscribe to.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-06-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340868#comment-15340868
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

The "you" in that sentence was targetted at you, [~shv].  I realized that 
[~redvine] wrote the patch, but I spoke imprecisely.  Sorry for the confusion.

bq. This is her first encounter with HDFS community. Let's try to make it 
pleasant enough so that she wished to come back and work with us more.

To be honest, I don't think this is a very good newbie JIRA.  It is clearly a 
very controversial issue, and it's also a very difficult piece of code with a 
lot of subtlety.  Since you clearly have strong opinions about this JIRA, I 
believe it would be more appropriate for you to post patches implementing your 
ideas yourself.  But that is up to you, of course.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.01.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block 
> report, and then it sends the block report again. The NameNode, processing 
> these two reports at the same time, can interleave processing of storages 
> from different reports. This corrupts the blockReportId field, which makes 
> the NameNode think that some storages are zombies. Replicas from zombie 
> storages are immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10448) CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with replication factors other than 1

2016-06-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340862#comment-15340862
 ] 

Colin Patrick McCabe commented on HDFS-10448:
-

Committed to 2.8.  Thanks, [~linyiqun]!  Sorry for the delays in reviews.

> CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with 
> replication factors other than 1
> --
>
> Key: HDFS-10448
> URL: https://issues.apache.org/jira/browse/HDFS-10448
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Fix For: 2.8.0
>
> Attachments: HDFS-10448.001.patch
>
>
> The logic in {{CacheManager#checkLimit}} is not correct. The method does 
> three things:
> First, it computes the bytes needed for the given path.
> {code}
> CacheDirectiveStats stats = computeNeeded(path, replication);
> {code}
> But the {{replication}} param is not used there, so the bytesNeeded it 
> returns is the value for a single replica.
> {code}
> return new CacheDirectiveStats.Builder()
> .setBytesNeeded(requestedBytes)
> .setFilesCached(requestedFiles)
> .build();
> {code}
> Second, the limit check then has to multiply by the replication factor, 
> because {{computeNeeded}} did not apply it.
> {code}
> pool.getBytesNeeded() + (stats.getBytesNeeded() * replication) > 
> pool.getLimit()
> {code}
> Third, if the size exceeds the limit, it reports an error. The message 
> divides by replication here, even though {{stats.getBytesNeeded()}} already 
> corresponds to a single replica.
> {code}
>   throw new InvalidRequestException("Caching path " + path + " of size "
>   + stats.getBytesNeeded() / replication + " bytes at replication "
>   + replication + " would exceed pool " + pool.getPoolName()
>   + "'s remaining capacity of "
>   + (pool.getLimit() - pool.getBytesNeeded()) + " bytes.");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10448) CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with replication factors other than 1

2016-06-20 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10448:

  Resolution: Fixed
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0
  Status: Resolved  (was: Patch Available)

> CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with 
> replication factors other than 1
> --
>
> Key: HDFS-10448
> URL: https://issues.apache.org/jira/browse/HDFS-10448
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Fix For: 2.8.0
>
> Attachments: HDFS-10448.001.patch
>
>
> The logic in {{CacheManager#checkLimit}} is not correct. The method does 
> three things:
> First, it computes the bytes needed for the given path.
> {code}
> CacheDirectiveStats stats = computeNeeded(path, replication);
> {code}
> But the {{replication}} param is not used there, so the bytesNeeded it 
> returns is the value for a single replica.
> {code}
> return new CacheDirectiveStats.Builder()
> .setBytesNeeded(requestedBytes)
> .setFilesCached(requestedFiles)
> .build();
> {code}
> Second, the limit check then has to multiply by the replication factor, 
> because {{computeNeeded}} did not apply it.
> {code}
> pool.getBytesNeeded() + (stats.getBytesNeeded() * replication) > 
> pool.getLimit()
> {code}
> Third, if the size exceeds the limit, it reports an error. The message 
> divides by replication here, even though {{stats.getBytesNeeded()}} already 
> corresponds to a single replica.
> {code}
>   throw new InvalidRequestException("Caching path " + path + " of size "
>   + stats.getBytesNeeded() / replication + " bytes at replication "
>   + replication + " would exceed pool " + pool.getPoolName()
>   + "'s remaining capacity of "
>   + (pool.getLimit() - pool.getBytesNeeded()) + " bytes.");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10448) CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with replication factors other than 1

2016-06-20 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10448:

Summary: CacheManager#addInternal tracks bytesNeeded incorrectly when 
dealing with replication factors other than 1  (was: CacheManager#checkLimit  
always assumes a replication factor of 1)

> CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with 
> replication factors other than 1
> --
>
> Key: HDFS-10448
> URL: https://issues.apache.org/jira/browse/HDFS-10448
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10448.001.patch
>
>
> The logic in {{CacheManager#checkLimit}} is not correct. The method does 
> three things:
> First, it computes the bytes needed for the given path.
> {code}
> CacheDirectiveStats stats = computeNeeded(path, replication);
> {code}
> But the {{replication}} param is not used there, so the bytesNeeded it 
> returns is the value for a single replica.
> {code}
> return new CacheDirectiveStats.Builder()
> .setBytesNeeded(requestedBytes)
> .setFilesCached(requestedFiles)
> .build();
> {code}
> Second, the limit check then has to multiply by the replication factor, 
> because {{computeNeeded}} did not apply it.
> {code}
> pool.getBytesNeeded() + (stats.getBytesNeeded() * replication) > 
> pool.getLimit()
> {code}
> Third, if the size exceeds the limit, it reports an error. The message 
> divides by replication here, even though {{stats.getBytesNeeded()}} already 
> corresponds to a single replica.
> {code}
>   throw new InvalidRequestException("Caching path " + path + " of size "
>   + stats.getBytesNeeded() / replication + " bytes at replication "
>   + replication + " would exceed pool " + pool.getPoolName()
>   + "'s remaining capacity of "
>   + (pool.getLimit() - pool.getBytesNeeded()) + " bytes.");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10448) CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with replication factors other than 1

2016-06-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340851#comment-15340851
 ] 

Colin Patrick McCabe commented on HDFS-10448:
-

Hi [~linyiqun],

Sorry, I misread the patch the first time around.  You are indeed changing 
computeNeeded to take the replication factor into account, which seems like a 
better way to go.

+1
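For readers following along, the shape of the change being +1'd, sketched from 
the snippet in the description (the attached patch is authoritative):
{code}
// Inside computeNeeded(): account for the replication factor when building the
// stats, so checkLimit() and the error message need no extra multiply/divide.
return new CacheDirectiveStats.Builder()
    .setBytesNeeded(requestedBytes * replication)
    .setFilesCached(requestedFiles)
    .build();
{code}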

> CacheManager#addInternal tracks bytesNeeded incorrectly when dealing with 
> replication factors other than 1
> --
>
> Key: HDFS-10448
> URL: https://issues.apache.org/jira/browse/HDFS-10448
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10448.001.patch
>
>
> The logic in {{CacheManager#checkLimit}} is not correct. The method does 
> three things:
> First, it computes the bytes needed for the given path.
> {code}
> CacheDirectiveStats stats = computeNeeded(path, replication);
> {code}
> But the {{replication}} param is not used there, so the bytesNeeded it 
> returns is the value for a single replica.
> {code}
> return new CacheDirectiveStats.Builder()
> .setBytesNeeded(requestedBytes)
> .setFilesCached(requestedFiles)
> .build();
> {code}
> Second, the limit check then has to multiply by the replication factor, 
> because {{computeNeeded}} did not apply it.
> {code}
> pool.getBytesNeeded() + (stats.getBytesNeeded() * replication) > 
> pool.getLimit()
> {code}
> Third, if the size exceeds the limit, it reports an error. The message 
> divides by replication here, even though {{stats.getBytesNeeded()}} already 
> corresponds to a single replica.
> {code}
>   throw new InvalidRequestException("Caching path " + path + " of size "
>   + stats.getBytesNeeded() / replication + " bytes at replication "
>   + replication + " would exceed pool " + pool.getPoolName()
>   + "'s remaining capacity of "
>   + (pool.getLimit() - pool.getBytesNeeded()) + " bytes.");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10328) Add per-cache-pool default replication num configuration

2016-06-20 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10328:

  Resolution: Fixed
Target Version/s: 2.9.0
  Status: Resolved  (was: Patch Available)

Committed to 2.9.  Thanks, [~xupener].

> Add per-cache-pool default replication num configuration
> 
>
> Key: HDFS-10328
> URL: https://issues.apache.org/jira/browse/HDFS-10328
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching
>Reporter: xupeng
>Assignee: xupeng
>Priority: Minor
> Fix For: 2.3.0
>
> Attachments: HDFS-10328.001.patch, HDFS-10328.002.patch, 
> HDFS-10328.003.patch, HDFS-10328.004.patch
>
>
> For now, hdfs cacheadmin cannot set a default replication num for the cache 
> directives in a cache pool; each cache directive added to the same cache 
> pool has to set its own replication num individually. 
> Consider this situation: we add daily Hive tables into the cache pool "hive". 
> Each time, we have to set the same replication num for every table directive 
> in that pool.  
> I think we should allow setting a default replication num on a cachepool so 
> that every cache directive in the pool inherits its replication configuration 
> from the pool. A cache directive could still override the replication 
> configuration explicitly via the "add & modify directive -replication" 
> command from the cli.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10548) Remove the long deprecated BlockReaderRemote

2016-06-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340836#comment-15340836
 ] 

Colin Patrick McCabe commented on HDFS-10548:
-

I would love to see this class go away.  It is truly a relic of another time, 
one that lasted much longer than it should have.  I think the only remaining 
use case for it is sockets that don't have associated channels (SOCKS sockets 
don't, I think?).  We should be able to create adaptors for those, though, 
assuming anyone even uses SOCKS with the DN any more.  Unfortunately, I don't 
have a lot of time to review this at the moment.
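A minimal sketch of such an adaptor using plain JDK NIO (whether this is 
sufficient for the DN transfer path is exactly what a review would need to 
confirm):
{code}
import java.io.IOException;
import java.net.Socket;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

// Wrap a channel-less socket (e.g. one created through a SOCKS proxy) so that
// channel-based reader code can still be used with it.
static ReadableByteChannel readChannelFor(Socket sock) throws IOException {
  return Channels.newChannel(sock.getInputStream());
}

static WritableByteChannel writeChannelFor(Socket sock) throws IOException {
  return Channels.newChannel(sock.getOutputStream());
}
{code}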

> Remove the long deprecated BlockReaderRemote
> 
>
> Key: HDFS-10548
> URL: https://issues.apache.org/jira/browse/HDFS-10548
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Kai Zheng
>Assignee: Kai Zheng
>
> To lessen the maintenance burden, as raised in HDFS-8901, we suggest removing 
> the {{BlockReaderRemote}} class, which was deprecated a very long time ago. 
> From the {{BlockReaderRemote}} header:
> {quote}
>  * @deprecated this is an old implementation that is being left around
>  * in case any issues spring up with the new {@link BlockReaderRemote2}
>  * implementation.
>  * It will be removed in the next release.
> {quote}
> From {{BlockReaderRemote2}} class header:
> {quote}
>  * This is a new implementation introduced in Hadoop 0.23 which
>  * is more efficient and simpler than the older BlockReader
>  * implementation. It should be renamed to BlockReaderRemote
>  * once we are confident in it.
> {quote}
> Going even further, after getting rid of the old class, we could do the 
> rename the comment suggests: BlockReaderRemote2 => BlockReaderRemote.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10328) Add per-cache-pool default replication num configuration

2016-06-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10328:

Attachment: (was: HDFS-10328.004.patch)

> Add per-cache-pool default replication num configuration
> 
>
> Key: HDFS-10328
> URL: https://issues.apache.org/jira/browse/HDFS-10328
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching
>Reporter: xupeng
>Assignee: xupeng
>Priority: Minor
> Fix For: 2.3.0
>
> Attachments: HDFS-10328.001.patch, HDFS-10328.002.patch, 
> HDFS-10328.003.patch, HDFS-10328.004.patch
>
>
> For now, hdfs cacheadmin cannot set a default replication num for the cache 
> directives in a cache pool; each cache directive added to the same cache 
> pool has to set its own replication num individually. 
> Consider this situation: we add daily Hive tables into the cache pool "hive". 
> Each time, we have to set the same replication num for every table directive 
> in that pool.  
> I think we should allow setting a default replication num on a cachepool so 
> that every cache directive in the pool inherits its replication configuration 
> from the pool. A cache directive could still override the replication 
> configuration explicitly via the "add & modify directive -replication" 
> command from the cli.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10328) Add per-cache-pool default replication num configuration

2016-06-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10328:

Attachment: HDFS-10328.004.patch

Reposting patch 004 (and rebasing on trunk) to get a Jenkins run

> Add per-cache-pool default replication num configuration
> 
>
> Key: HDFS-10328
> URL: https://issues.apache.org/jira/browse/HDFS-10328
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching
>Reporter: xupeng
>Assignee: xupeng
>Priority: Minor
> Fix For: 2.3.0
>
> Attachments: HDFS-10328.001.patch, HDFS-10328.002.patch, 
> HDFS-10328.003.patch, HDFS-10328.004.patch, HDFS-10328.004.patch
>
>
> For now, hdfs cacheadmin cannot set a default replication num for the cache 
> directives in a cache pool; each cache directive added to the same cache 
> pool has to set its own replication num individually. 
> Consider this situation: we add daily Hive tables into the cache pool "hive". 
> Each time, we have to set the same replication num for every table directive 
> in that pool.  
> I think we should allow setting a default replication num on a cachepool so 
> that every cache directive in the pool inherits its replication configuration 
> from the pool. A cache directive could still override the replication 
> configuration explicitly via the "add & modify directive -replication" 
> command from the cli.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10328) Add per-cache-pool default replication num configuration

2016-06-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15336775#comment-15336775
 ] 

Colin Patrick McCabe commented on HDFS-10328:
-

+1 pending jenkins.  Thanks, [~xupener].

> Add per-cache-pool default replication num configuration
> 
>
> Key: HDFS-10328
> URL: https://issues.apache.org/jira/browse/HDFS-10328
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching
>Reporter: xupeng
>Assignee: xupeng
>Priority: Minor
> Fix For: 2.3.0
>
> Attachments: HDFS-10328.001.patch, HDFS-10328.002.patch, 
> HDFS-10328.003.patch, HDFS-10328.004.patch
>
>
> For now, hdfs cacheadmin cannot set a default replication num for the cache 
> directives in a cache pool; each cache directive added to the same cache 
> pool has to set its own replication num individually. 
> Consider this situation: we add daily Hive tables into the cache pool "hive". 
> Each time, we have to set the same replication num for every table directive 
> in that pool.  
> I think we should allow setting a default replication num on a cachepool so 
> that every cache directive in the pool inherits its replication configuration 
> from the pool. A cache directive could still override the replication 
> configuration explicitly via the "add & modify directive -replication" 
> command from the cli.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-06-16 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335100#comment-15335100
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

{code}
+if (context.getTotalRpcs() == context.getCurRpc() + 1) {
+  long leaseId = this.getBlockReportLeaseManager().removeLease(node);
+  BlockManagerFaultInjector.getInstance().
+  removeBlockReportLease(node, leaseId);
 }
+LOG.debug("Processing RPC with index " + context.getCurRpc()
++ " out of total " + context.getTotalRpcs() + " RPCs in "
++ "processReport 0x" +
+Long.toHexString(context.getReportId()));
   }
{code}
This won't work in the presence of reordered RPCs.  If the RPCs are reordered 
so that curRpc 1 arrives before curRpc 0, the lease will be removed and RPC 0 
will be rejected.

{code}
for (int r = 0; r < reports.length; r++) {
  final BlockListAsLongs blocks = reports[r].getBlocks();
  if (blocks != BlockListAsLongs.STORAGE_REPORT_ONLY) {
{code}
Using object equality to compare two {{BlockListAsLongs}} objects is very 
surprising to anyone reading the code.  In general, I find the idea of 
overloading the block list to sometimes not be a block list to be very weird 
and surprising.  If we are going to do it, it certainly needs a lot of comments 
in the code to explain what's going on.  I think it would be clearer and less 
error-prone just to add an optional list of storage ID strings in the 
{{.proto}} file.

{code}
if (nn.getFSImage().isUpgradeFinalized()) {
  Set<String> storageIDsInBlockReport = new HashSet<>();
  if (context.getTotalRpcs() == context.getCurRpc() + 1) {
for (StorageBlockReport report : reports) {
  storageIDsInBlockReport.add(report.getStorage().getStorageID());
}
bm.removeZombieStorages(nodeReg, context, storageIDsInBlockReport);
  }
}
{code}
This isn't going to work in the presence of reordered RPCs, is it?  If curRpc 1 
appears before curRpc 0, we'll never get into this clause at all and so zombies 
won't be removed.  Considering you are so concerned that my patch didn't solve 
the interleaved and/or reordered RPC case, this seems like something you should 
solve.  I also don't understand what the rationale for ignoring zombies during 
an upgrade is.  Keep in mind zombie storages can lead to data loss under some 
conditions (see HDFS-7960 for details).
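To make the reordering concern concrete, a hypothetical sketch (the per-report 
bookkeeping object is illustrative; the lease and fault-injector calls are the 
ones from the diff above):
{code}
// Track which RPC indices of a multi-RPC block report have been seen instead
// of assuming that the RPC with index totalRpcs - 1 is processed last. Only
// when every index has arrived is it safe to drop the lease and run
// zombie-storage detection.
BitSet seenRpcs = reportState.getOrCreate(node, context.getReportId()); // hypothetical state
seenRpcs.set(context.getCurRpc());
if (seenRpcs.cardinality() == context.getTotalRpcs()) {
  long leaseId = this.getBlockReportLeaseManager().removeLease(node);
  BlockManagerFaultInjector.getInstance().removeBlockReportLease(node, leaseId);
}
{code}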

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Vinitha Reddy Gankidi
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, 
> HDFS-10301.01.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block 
> report, and then it sends the block report again. The NameNode, processing 
> these two reports at the same time, can interleave processing of storages 
> from different reports. This corrupts the blockReportId field, which makes 
> the NameNode think that some storages are zombies. Replicas from zombie 
> storages are immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky

2016-06-16 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-9466:
---
  Resolution: Fixed
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0
  Status: Resolved  (was: Patch Available)

Committed to 2.8.  Thanks, [~jojochuang].

> TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
> 
>
> Key: HDFS-9466
> URL: https://issues.apache.org/jira/browse/HDFS-9466
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, hdfs-client
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Fix For: 2.8.0
>
> Attachments: HDFS-9466.001.patch, HDFS-9466.002.patch, 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache-output.txt
>
>
> This test is flaky and fails quite frequently in trunk.
> Error Message
> expected:<1> but was:<2>
> Stacktrace
> {noformat}
> java.lang.AssertionError: expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache$17.accept(TestShortCircuitCache.java:636)
>   at 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.visit(ShortCircuitRegistry.java:395)
>   at 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.checkNumberOfSegmentsAndSlots(TestShortCircuitCache.java:631)
>   at 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.testDataXceiverCleansUpSlotsOnFailure(TestShortCircuitCache.java:684)
> {noformat}
> Thanks to [~xiaochen] for identifying the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10525) Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap

2016-06-15 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10525:

   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

+1.

Committed to 2.8.  Thanks, [~xiaochen].

> Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap
> ---
>
> Key: HDFS-10525
> URL: https://issues.apache.org/jira/browse/HDFS-10525
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Fix For: 2.8.0
>
> Attachments: HDFS-10525.01.patch, HDFS-10525.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10525) Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap

2016-06-15 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333153#comment-15333153
 ] 

Colin Patrick McCabe commented on HDFS-10525:
-

+1.  Thanks, [~xiaochen].

> Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap
> ---
>
> Key: HDFS-10525
> URL: https://issues.apache.org/jira/browse/HDFS-10525
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-10525.01.patch, HDFS-10525.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10505) OIV's ReverseXML processor should support ACLs

2016-06-15 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10505:

  Resolution: Fixed
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0
  Status: Resolved  (was: Patch Available)

> OIV's ReverseXML processor should support ACLs
> --
>
> Key: HDFS-10505
> URL: https://issues.apache.org/jira/browse/HDFS-10505
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>Assignee: Surendra Singh Lilhore
> Fix For: 2.8.0
>
> Attachments: HDFS-10505-001.patch, HDFS-10505-002.patch
>
>
> OIV's ReverseXML processor should support ACLs.  Currently ACLs show up in 
> the fsimage.xml file, but we don't reconstruct them with ReverseXML.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10505) OIV's ReverseXML processor should support ACLs

2016-06-15 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333136#comment-15333136
 ] 

Colin Patrick McCabe commented on HDFS-10505:
-

+1.  Thanks, [~surendrasingh]

> OIV's ReverseXML processor should support ACLs
> --
>
> Key: HDFS-10505
> URL: https://issues.apache.org/jira/browse/HDFS-10505
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-10505-001.patch, HDFS-10505-002.patch
>
>
> OIV's ReverseXML processor should support ACLs.  Currently ACLs show up in 
> the fsimage.xml file, but we don't reconstruct them with ReverseXML.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10525) Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap

2016-06-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330503#comment-15330503
 ] 

Colin Patrick McCabe commented on HDFS-10525:
-

+1.  Thanks, [~xiaochen].

> Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap
> ---
>
> Key: HDFS-10525
> URL: https://issues.apache.org/jira/browse/HDFS-10525
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-10525.01.patch, HDFS-10525.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10505) OIV's ReverseXML processor should support ACLs

2016-06-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330492#comment-15330492
 ] 

Colin Patrick McCabe commented on HDFS-10505:
-

Thanks for this, [~surendrasingh].  It's good to see progress on supporting 
ACLs here!

I am confused by the changes for setting {{latestStringId}} to 1, or 
special-casing {{null}} in {{registerStringId}}.  If we are going to do 
"magical" things with special indexes in the string table, we need to document 
it somewhere.  Actually, though, I would prefer to simply handle it without the 
magic.  We know that a null entry for an ACL name simply means that the name 
was an empty string.  You can see that in {{AclEntry.java}}:
{code}
  String name = split[index];
  if (!name.isEmpty()) {
builder.setName(name);
  }
{code}

In ReverseXML, we should simply translate these {{null}} ACL names back into 
empty strings, and then the existing logic for handling the string table would 
work, with no magic.  We also need a test case which has null ACL names, so 
that this code is being exercised.
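A small sketch of the suggested translation (variable names are illustrative; 
{{registerStringId}} is the existing string-table helper mentioned above):
{code}
// A null ACL name in the XML simply means the original name was the empty
// string, so map it back before the ordinary string-table lookup.
String aclName = (xmlAclName == null) ? "" : xmlAclName;
int nameId = registerStringId(aclName);   // existing path, no special-casing
{code}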

> OIV's ReverseXML processor should support ACLs
> --
>
> Key: HDFS-10505
> URL: https://issues.apache.org/jira/browse/HDFS-10505
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-10505-001.patch
>
>
> OIV's ReverseXML processor should support ACLs.  Currently ACLs show up in 
> the fsimage.xml file, but we don't reconstruct them with ReverseXML.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10525) Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap

2016-06-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330133#comment-15330133
 ] 

Colin Patrick McCabe commented on HDFS-10525:
-

Thanks, [~xiaochen].  Can you add a {{LOG.debug}} to the "if" statement that 
talks about the block ID that is getting skipped?

+1 once that's done.
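Something along these lines would do (shape only; the variable names are 
illustrative, the actual patch defines the real ones):
{code}
if (cblock == null) {
  LOG.debug("Block " + blockId
      + " is no longer in the cached block map; skipping rescan of this entry.");
  continue;
}
{code}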

> Fix NPE in CacheReplicationMonitor#rescanCachedBlockMap
> ---
>
> Key: HDFS-10525
> URL: https://issues.apache.org/jira/browse/HDFS-10525
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-10525.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-06-09 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322994#comment-15322994
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

[~shv], comments about me "being on a -1 spree" are not constructive and they 
don't do anything to help the tone of the discussion.  We've been talking about 
this since April and my views have been consistent the whole time.  I have a 
solution, but I am open to other solutions as long as they don't have big 
disadvantages.

bq. The whole approach of keeping the state for the block report processing on 
the NameNode is error-prone. It assumes at-once execution, and therefore when 
block reports interleave the BR-state gets messed up. Particularly, the BitSet 
used to mark storages, which have been processed, can be reset during 
interleaving multiple times and cannot be used to count storages in the report. 
In current implementation the messing-up of BR-state leads to false positive 
detection of a zombie storage and removal of a perfectly valid one.

Block report processing is inherently stateful.  
It is a mechanism for the DN to synchronize its entire block state with the 
block state on the NN.  Interleaved block reports are very bad news, even if 
this bug didn't exist, because they mean that the state on the NN will go "back 
in time" for some storages, rather than monotonically moving forward in time.  
This may lead the NN to make incorrect (and potentially irreversible) decisions 
like deleting a replica somewhere because it appears to exist on an old stale 
interleaved block report.  Keep in mind that these old stale interleaved FBRs 
will override any incremental BRs that were sent in the meantime!

Interleaved block reports also potentially indicate that the DNs are sending 
new full block reports before the last ones have been processed.  So either our 
FBR retransmission mechanism is screwed up and is spewing a firehose of FBRs at 
an unresponsive NameNode (making it even more unresponsive, no doubt), or the 
NN can't process an FBR in the extremely long FBR sending period.  Both of 
these explanations mean that you've got a cluster which has serious, serious 
problems and you should fix it right now.

That's the reason why people are not taking this JIRA as seriously as they 
otherwise might-- because they know that interleaved FBRs mean that something 
is very wrong.  And you are consistently ignoring this feedback and telling us 
how my patch is bad because it doesn't perform zombie storage elimination when 
FBRs get interleaved.

bq. It seems that you don't or don't want to understand reasoning around adding 
separate storage reporting RPC call. At least you addressed it only by 
repeating your -1. For the third time. And did not respond to Zhe Zhang's 
proposal to merge the storage reporting RPC into one of the storage reports in 
the next jira.  Given that and in order to move forward, we should look into 
making changes to the last BR RPC call, which should now also report all 
storages.

I am fine with adding storage reporting to any of the existing FBR RPCs.  What 
I am not fine with is adding another RPC which will create more load.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block report, 
> and then it sends the report again. While processing these two reports at the 
> same time, the NameNode can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10506) OIV's ReverseXML processor cannot reconstruct some snapshot details

2016-06-08 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321320#comment-15321320
 ] 

Colin Patrick McCabe commented on HDFS-10506:
-

From 
{{hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageReconstructor.java}}:
{code}
private void processDirDiffEntry() throws IOException {
  LOG.debug("Processing dirDiffEntry");
...
// TODO: add missing snapshotCopy field to XML
{code}

{code}
private void processFileDiffEntry() throws IOException {
  LOG.debug("Processing fileDiffEntry");
...
// TODO: missing snapshotCopy
// TODO: missing blocks
fileDiff.verifyNoRemainingKeys("fileDiff");
bld.build().writeDelimitedTo(out);
  }
  expectTagEnd(SNAPSHOT_DIFF_SECTION_FILE_DIFF_ENTRY);
}
{code}

> OIV's ReverseXML processor cannot reconstruct some snapshot details
> ---
>
> Key: HDFS-10506
> URL: https://issues.apache.org/jira/browse/HDFS-10506
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>
> OIV's ReverseXML processor cannot reconstruct some snapshot details.  
> Specifically,  should contain a  and  field, 
> but does not.   should contain a  field.  OIV also 
> needs to be changed to emit these fields into the XML (they are currently 
> missing).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10505) OIV's ReverseXML processor should support ACLs

2016-06-08 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321317#comment-15321317
 ] 

Colin Patrick McCabe commented on HDFS-10505:
-

From 
{{hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageReconstructor.java}}:

{code}
  private INodeSection.AclFeatureProto.Builder aclXmlToProto(Node acl)
  throws IOException {
// TODO: support ACLs
throw new IOException("ACLs are not supported yet.");
  }
{code}

> OIV's ReverseXML processor should support ACLs
> --
>
> Key: HDFS-10505
> URL: https://issues.apache.org/jira/browse/HDFS-10505
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>
> OIV's ReverseXML processor should support ACLs.  Currently ACLs show up in 
> the fsimage.xml file, but we don't reconstruct them with ReverseXML.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10506) OIV's ReverseXML processor cannot reconstruct some snapshot details

2016-06-08 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-10506:
---

 Summary: OIV's ReverseXML processor cannot reconstruct some 
snapshot details
 Key: HDFS-10506
 URL: https://issues.apache.org/jira/browse/HDFS-10506
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe


OIV's ReverseXML processor cannot reconstruct some snapshot details.  
Specifically,  should contain a  and  field, 
but does not.   should contain a  field.  OIV also needs 
to be changed to emit these fields into the XML (they are currently missing).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10505) OIV's ReverseXML processor should support ACLs

2016-06-08 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-10505:
---

 Summary: OIV's ReverseXML processor should support ACLs
 Key: HDFS-10505
 URL: https://issues.apache.org/jira/browse/HDFS-10505
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 2.8.0
Reporter: Colin Patrick McCabe


OIV's ReverseXML processor should support ACLs.  Currently ACLs show up in the 
fsimage.xml file, but we don't reconstruct them with ReverseXML.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-8061) Create an Offline FSImage Viewer tool

2016-06-08 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe resolved HDFS-8061.

  Resolution: Duplicate
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0

I believe this is a duplicate of HDFS-9835.  Feel free to reopen if there is 
more here not covered by that JIRA.

> Create an Offline FSImage Viewer tool
> -
>
> Key: HDFS-8061
> URL: https://issues.apache.org/jira/browse/HDFS-8061
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Reporter: Mike Drob
>Assignee: Lei (Eddy) Xu
> Fix For: 2.8.0
>
>
> We already have a tool for converting edit logs to and from binary and xml. 
> The next logical step is to create an `oiv` (offline image viewer) that will 
> allow users to manipulate the FS Image.
> When outputting to text, it might make sense to have two output formats - 1) 
> an XML that is easier to convert back to binary and 2) something that looks 
> like the output from `tree` command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8306) Outputs Xattr in OIV XML format

2016-06-08 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-8306:
---
  Resolution: Duplicate
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0  (was: )
  Status: Resolved  (was: Patch Available)

We added xattrs in the OIV XML format in HDFS-9835.

> Outputs Xattr in OIV XML format
> ---
>
> Key: HDFS-8306
> URL: https://issues.apache.org/jira/browse/HDFS-8306
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Fix For: 2.8.0
>
> Attachments: HDFS-8306.000.patch, HDFS-8306.001.patch, 
> HDFS-8306.002.patch, HDFS-8306.003.patch, HDFS-8306.004.patch, 
> HDFS-8306.005.patch, HDFS-8306.006.patch, HDFS-8306.007.patch, 
> HDFS-8306.008.patch, HDFS-8306.009.patch, HDFS-8306.debug0.patch, 
> HDFS-8306.debug1.patch
>
>
> Currently, the {{hdfs oiv}} XML output does not include all fields of the 
> fsimage. This makes inspecting the {{fsimage}} via the XML output less 
> practical, and it prevents recovering an fsimage from the XML file.
> This JIRA adds ACLs and XAttrs to the XML output as a first step toward the 
> goal described in HDFS-8061.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-06-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317644#comment-15317644
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

Sorry for the slow reply.  I was on vacation.

Like I said earlier, I am -1 on patch v4 because adding new RPCs is bad for NN 
scalability.  I also think it's a much larger patch than needed.  It doesn't 
make sense as an interim solution.

Why don't we commit v5 and discuss improvements in a follow-on JIRA?  So far 
there is no concrete argument against it other than the fact that it doesn't 
remove zombie storages in the case where BRs are interleaved.  But we already 
know that BR interleaving is an extremely rare corner case-- otherwise you can 
bet that this JIRA would have attracted a lot more attention.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block report, 
> and then it sends the report again. While processing these two reports at the 
> same time, the NameNode can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-06-06 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317629#comment-15317629
 ] 

Colin Patrick McCabe commented on HDFS-9924:


+1 for a feature branch

> [umbrella] Asynchronous HDFS Access
> ---
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-05-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308997#comment-15308997
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

bq. Vinitha's patch adds one RPC only in the case when block reports are sent 
in multiple RPCs.

The case where block reports are sent in multiple RPCs is exactly the case 
where scalability is the most important, since it indicates that we have a 
large number of blocks.  My patch adds no new RPCs.  If we are going to take an 
alternate approach, it should not involve a performance regression.

bq. Could you please review the patch.

I did review the patch.  I suggested adding an optional field in an existing 
RPC rather than adding a new RPC, and stated that I was -1 on adding new RPC 
load to the NN.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block report, 
> and then it sends the report again. While processing these two reports at the 
> same time, the NameNode can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky

2016-05-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308980#comment-15308980
 ] 

Colin Patrick McCabe edited comment on HDFS-9466 at 6/1/16 12:52 AM:
-

Thanks for the explanation.  It sounds like the race condition is that the 
ShortCircuitRegistry on the DN needs to be informed about the client's decision 
that short-circuit is not working for the block, and this RPC takes time to 
arrive.  That background process races with completing the TCP read 
successfully and checking the number of slots in the unit test.

{code}
   public static interface Visitor {
-void accept(HashMap<ShmId, RegisteredShm> segments,
+boolean accept(HashMap<ShmId, RegisteredShm> segments,
    HashMultimap<ExtendedBlockId, Slot> slots);
   }
{code}
I don't think it makes sense to change the return type of the visitor.  While 
you might find a boolean convenient, some other potential users of the 
interface might not find it useful.  Instead, just have your closure modify a 
{{final MutableBoolean}} declared nearby.

{code}
+}, 100, 1);
{code}
It seems like we could lower the latency here (perhaps check every 10 ms) and 
lengthen the timeout.  Since the test timeouts are generally 60s, I don't think 
it makes sense to make this timeout shorter than that.
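
To make that concrete, here is the rough shape I have in mind -- a sketch only, 
assuming the Guava {{Supplier}}-based {{GenericTestUtils.waitFor}} overload and 
a commons-lang {{MutableBoolean}}; {{registry}}, {{expectedSegments}} and 
{{expectedSlots}} stand in for whatever the test already has in scope:
{code}
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    // The closure flips a MutableBoolean instead of changing the return
    // type of Visitor#accept.
    final MutableBoolean matches = new MutableBoolean(true);
    registry.visit(new ShortCircuitRegistry.Visitor() {
      @Override
      public void accept(HashMap<ShmId, RegisteredShm> segments,
          HashMultimap<ExtendedBlockId, Slot> slots) {
        if (segments.size() != expectedSegments ||
            slots.size() != expectedSlots) {
          matches.setValue(false);
        }
      }
    });
    return matches.booleanValue();
  }
}, 10, 60000);  // check every 10 ms, give up only after the usual 60 s
{code}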

+1 once that's addressed.  Thanks, [~jojochuang].  Sorry for the delay in 
reviews.


was (Author: cmccabe):
Thanks for the explanation.  It sounds like the race condition is that the 
ShortCircuitRegistry on the DN needs to be informed about the client's decision 
that short-circuit is not working for the block, and this RPC takes time to 
arrive.  That background process races with completing the TCP read 
successfully and checking the number of slots in the unit test.

{code}
   public static interface Visitor {
-void accept(HashMap<ShmId, RegisteredShm> segments,
+boolean accept(HashMap<ShmId, RegisteredShm> segments,
    HashMultimap<ExtendedBlockId, Slot> slots);
   }
{code}
I don't think it makes sense to change the return type of the visitor.  While 
you might find a boolean convenient, some other potential users of the 
interface would have no use for it.  Instead, just have your closure modify a 
{{final MutableBoolean}} declared nearby.

{code}
+}, 100, 1);
{code}
No reason to make this shorter than the test limit, surely?

+1 once that's addressed.  Thanks, [~jojochuang].  Sorry for the delay in 
reviews.

> TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
> 
>
> Key: HDFS-9466
> URL: https://issues.apache.org/jira/browse/HDFS-9466
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, hdfs-client
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-9466.001.patch, HDFS-9466.002.patch, 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache-output.txt
>
>
> This test is flaky and fails quite frequently in trunk.
> Error Message
> expected:<1> but was:<2>
> Stacktrace
> {noformat}
> java.lang.AssertionError: expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache$17.accept(TestShortCircuitCache.java:636)
>   at 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.visit(ShortCircuitRegistry.java:395)
>   at 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.checkNumberOfSegmentsAndSlots(TestShortCircuitCache.java:631)
>   at 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.testDataXceiverCleansUpSlotsOnFailure(TestShortCircuitCache.java:684)
> {noformat}
> Thanks to [~xiaochen] for identifying the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky

2016-05-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308980#comment-15308980
 ] 

Colin Patrick McCabe commented on HDFS-9466:


Thanks for the explanation.  It sounds like the race condition is that the 
ShortCircuitRegistry on the DN needs to be informed about the client's decision 
that short-circuit is not working for the block, and this RPC takes time to 
arrive.  That background process races with completing the TCP read 
successfully and checking the number of slots in the unit test.

{code}
   public static interface Visitor {
-void accept(HashMap<ShmId, RegisteredShm> segments,
+boolean accept(HashMap<ShmId, RegisteredShm> segments,
    HashMultimap<ExtendedBlockId, Slot> slots);
   }
{code}
I don't think it makes sense to change the return type of the visitor.  While 
you might find a boolean convenient, some other potential users of the 
interface would have no use for it.  Instead, just have your closure modify a 
{{final MutableBoolean}} declared nearby.

{code}
+}, 100, 1);
{code}
No reason to make this shorter than the test limit, surely?

+1 once that's addressed.  Thanks, [~jojochuang].  Sorry for the delay in 
reviews.

> TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
> 
>
> Key: HDFS-9466
> URL: https://issues.apache.org/jira/browse/HDFS-9466
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, hdfs-client
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-9466.001.patch, HDFS-9466.002.patch, 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache-output.txt
>
>
> This test is flaky and fails quite frequently in trunk.
> Error Message
> expected:<1> but was:<2>
> Stacktrace
> {noformat}
> java.lang.AssertionError: expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache$17.accept(TestShortCircuitCache.java:636)
>   at 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.visit(ShortCircuitRegistry.java:395)
>   at 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.checkNumberOfSegmentsAndSlots(TestShortCircuitCache.java:631)
>   at 
> org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.testDataXceiverCleansUpSlotsOnFailure(TestShortCircuitCache.java:684)
> {noformat}
> Thanks to [~xiaochen] for identifying the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10415) TestDistributedFileSystem#MyDistributedFileSystem attempts to set up statistics before initialize() is called

2016-05-31 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10415:

   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Committed to 2.8.

> TestDistributedFileSystem#MyDistributedFileSystem attempts to set up 
> statistics before initialize() is called
> -
>
> Key: HDFS-10415
> URL: https://issues.apache.org/jira/browse/HDFS-10415
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
> Environment: jenkins
>Reporter: Sangjin Lee
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-10415-branch-2.000.patch, 
> HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch
>
>
> {noformat}
> Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem
> testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem)  Time 
> elapsed: 0.045 sec  <<< ERROR!
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790)
>   at 
> org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417)
>   at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187)
>   at 
> org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217)
> {noformat}
> This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other 
> combinations. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10415) TestDistributedFileSystem#MyDistributedFileSystem attempts to set up statistics before initialize() is called

2016-05-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308928#comment-15308928
 ] 

Colin Patrick McCabe commented on HDFS-10415:
-

The subclass can change the configuration that gets passed to the superclass.

class SuperClass {
  SuperClass(Configuration conf) {
... initialize superclass part of the object ...
  }
}

class SubClass extends SuperClass {
  SubClass(Configuration conf) {
super(changeConf(conf));
... initialize my part of the object ...
  }

  private static Configuration changeConf(Configuration conf) {
Configuration nconf = new Configuration(conf);
nconf.set("foo", "bar");
return nconf;
  }
}

Having a separate init() method is a well-known antipattern.  Initialization 
belongs in the constructor.  The only time a separate init method is really 
necessary is if you're using a dialect of C++ that doesn't support exceptions.

> TestDistributedFileSystem#MyDistributedFileSystem attempts to set up 
> statistics before initialize() is called
> -
>
> Key: HDFS-10415
> URL: https://issues.apache.org/jira/browse/HDFS-10415
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
> Environment: jenkins
>Reporter: Sangjin Lee
>Assignee: Mingliang Liu
> Attachments: HDFS-10415-branch-2.000.patch, 
> HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch
>
>
> {noformat}
> Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem
> testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem)  Time 
> elapsed: 0.045 sec  <<< ERROR!
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790)
>   at 
> org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417)
>   at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187)
>   at 
> org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217)
> {noformat}
> This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other 
> combinations. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-10415) TestDistributedFileSystem#MyDistributedFileSystem attempts to set up statistics before initialize() is called

2016-05-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308928#comment-15308928
 ] 

Colin Patrick McCabe edited comment on HDFS-10415 at 6/1/16 12:09 AM:
--

The subclass can change the configuration that gets passed to the superclass.

{code}
class SuperClass {
  SuperClass(Configuration conf) {
... initialize superclass part of the object ...
  }
}

class SubClass extends SuperClass {
  SubClass(Configuration conf) {
super(changeConf(conf));
... initialize my part of the object ...
  }

  private static Configuration changeConf(Configuration conf) {
Configuration nconf = new Configuration(conf);
nconf.set("foo", "bar");
return nconf;
  }
}
{code}

Having a separate init() method is a well-known antipattern.  Initialization 
belongs in the constructor.  The only time a separate init method is really 
necessary is if you're using a dialect of C++ that doesn't support exceptions.


was (Author: cmccabe):
The subclass can change the configuration that gets passed to the superclass.

class SuperClass {
  SuperClass(Configuration conf) {
... initialize superclass part of the object ...
  }
}

class SubClass extends SuperClass {
  SubClass(Configuration conf) {
super(changeConf(conf));
... initialize my part of the object ...
  }

  private static Configuration changeConf(Configuration conf) {
Configuration nconf = new Configuration(conf);
nconf.set("foo", "bar");
return nconf;
  }
}

Having a separate init() method is a well-known antipattern.  Initialization 
belongs in the constructor.  The only time a separate init method is really 
necessary is if you're using a dialect of C++ that doesn't support exceptions.

> TestDistributedFileSystem#MyDistributedFileSystem attempts to set up 
> statistics before initialize() is called
> -
>
> Key: HDFS-10415
> URL: https://issues.apache.org/jira/browse/HDFS-10415
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
> Environment: jenkins
>Reporter: Sangjin Lee
>Assignee: Mingliang Liu
> Attachments: HDFS-10415-branch-2.000.patch, 
> HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch
>
>
> {noformat}
> Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem
> testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem)  Time 
> elapsed: 0.045 sec  <<< ERROR!
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790)
>   at 
> org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417)
>   at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187)
>   at 
> org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217)
> {noformat}
> This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other 
> combinations. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10415) TestDistributedFileSystem#MyDistributedFileSystem attempts to set up statistics before initialize() is called

2016-05-31 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10415:

Summary: TestDistributedFileSystem#MyDistributedFileSystem attempts to set 
up statistics before initialize() is called  (was: 
TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2)

> TestDistributedFileSystem#MyDistributedFileSystem attempts to set up 
> statistics before initialize() is called
> -
>
> Key: HDFS-10415
> URL: https://issues.apache.org/jira/browse/HDFS-10415
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
> Environment: jenkins
>Reporter: Sangjin Lee
>Assignee: Mingliang Liu
> Attachments: HDFS-10415-branch-2.000.patch, 
> HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch
>
>
> {noformat}
> Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem
> testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem)  Time 
> elapsed: 0.045 sec  <<< ERROR!
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790)
>   at 
> org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417)
>   at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187)
>   at 
> org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217)
> {noformat}
> This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other 
> combinations. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10415) TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2

2016-05-31 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308894#comment-15308894
 ] 

Colin Patrick McCabe commented on HDFS-10415:
-

It sounds like there are no strong objections to HDFS-10415.000.patch and 
HDFS-10415-branch-2.001.patch.  Let's fix this unit test!

We can improve this in a follow-on JIRA (personally, I like the idea of adding 
the initialization to the {{init}} method).  But it's not worth blocking the 
unit test fix.

+1.



> TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2
> --
>
> Key: HDFS-10415
> URL: https://issues.apache.org/jira/browse/HDFS-10415
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
> Environment: jenkins
>Reporter: Sangjin Lee
>Assignee: Mingliang Liu
> Attachments: HDFS-10415-branch-2.000.patch, 
> HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch
>
>
> {noformat}
> Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem
> testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem)  Time 
> elapsed: 0.045 sec  <<< ERROR!
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790)
>   at 
> org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417)
>   at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187)
>   at 
> org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217)
> {noformat}
> This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other 
> combinations. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-05-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304702#comment-15304702
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

[~redvine], you are running into trouble with stale storages versus zombie 
storages because your patch uses a separate mechanism to detect which 
storages exist on the DN.  The existing code doesn't have this problem because 
the full block report itself acted as the record of what storages existed.  
This is one negative side effect of the more complex approach.  Another 
negative side effect is that you are transmitting the same information about 
which storages are present multiple times.

Despite these negatives, I'm still willing to review a patch that uses the more 
complicated method as long as you don't introduce extra RPCs.  I agree that we 
should remove a stale storage if it doesn't appear in the full listing that 
gets sent.  Just to be clear, I am -1 on a patch which adds extra RPCs.  
Perhaps you can send this listing in an optional field in the first RPC.
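
To sketch the shape of that (illustrative only -- every name below is 
hypothetical rather than an existing Hadoop API), the NameNode-side handling 
could look roughly like this once the complete storage listing rides along on 
the first RPC of the report:
{code}
// Hypothetical sketch: prune storages only against the full listing carried
// by the first RPC of the block report, so no extra RPC is needed.
void pruneStaleStorages(Set<String> reportedStorageIds,
    Map<String, StorageState> knownStorages) {
  if (reportedStorageIds == null) {
    return;  // older DN that did not send the listing: change nothing
  }
  Iterator<Map.Entry<String, StorageState>> it =
      knownStorages.entrySet().iterator();
  while (it.hasNext()) {
    if (!reportedStorageIds.contains(it.next().getKey())) {
      // The DN no longer reports this storage, so it is safe to remove it
      // (and its replicas).
      it.remove();
    }
  }
}
{code}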

[~daryn], I don't like the idea of "band-aiding" this issue rather than fixing 
it at the root.  Throwing an exception on interleaved storage reports, or 
forbidding combined storage reports, seem like very brittle work-arounds that 
could easily be undone by someone making follow-on changes.  I came up with 
patch 005 and the earlier patches as a very simple fix that could easily be 
backported.  If you are interested in something simple, then please check it 
out... or at least give a reason for not checking it out.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block report, 
> and then it sends the report again. While processing these two reports at the 
> same time, the NameNode can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303085#comment-15303085
 ] 

Colin Patrick McCabe commented on HDFS-7240:


bq. Correct me if I am wrong – before Andrew Wang's contribution, symlink was 
somehow working (based on Eli Collins's work). After Andrew's work, we had no 
choice but disable the symlink feature. It this sense, symlink became even 
worse. Anyway, Andrew/Eli, any plan to fix symlink?

Symlinks were broken before Andrew started working on them.  They had serious 
security, performance, and usability issues.  If you are interested in learning 
more about the issues and helping to fix them, take a look at HADOOP-10019.  
They were disabled to avoid exposing people to serious security risks.  In the 
meantime, I will note that you were one of the reviewers on the JIRA that 
initially introduced symlinks, HDFS-245, before Andrew or I had even started 
working on Hadoop.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-05-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302426#comment-15302426
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

I never said that patch 004 introduced incompatible changes.  I just argued 
that it was a bigger change than was necessary to fix the problem.  All other 
things being equal, we would prefer a smaller change to a bigger one.  The only 
argument you have given against my change is that it doesn't fix the problem in 
the case where full block reports are interleaved.  But this is an extremely, 
extremely rare case, to the point where nobody else has even seen this problem 
in their cluster.

I still think that patch 005 is an easier way to fix the problem.  It's 
basically a simple bugfix to my original patch.  However, if you want to do 
something more complex, I will review it.  But I don't want to add any 
additional RPCs.  We already have problems with NameNode performance and we 
should not be adding more RPCs when it's not needed.  We can include the 
storage information in the first RPC of the block report as an optional field.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block report, 
> and then it sends the report again. While processing these two reports at the 
> same time, the NameNode can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-25 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301560#comment-15301560
 ] 

Colin Patrick McCabe commented on HDFS-7240:


bq. [~szetszwo] wrote: I seem to recall that you got your committership by 
contributing the symlink feature, however, the symlink feature is still not 
working as of today. Why don't you fix it? I think you want to build up a good 
track record for yourself.

[~andrew.wang] did not get his commitership by contributing the symlink 
feature.  By the time he was elected as a committer, he had contributed a 
system for efficiently storing and reporting high-percentile metrics and an API 
to expose disk location information to advanced HDFS clients, converted all 
remaining JUnit 3 HDFS tests to JUnit 4, and added symlink support to 
FileSystem.  The last one was just contributing a new API to the FileSystem 
class, not implementing the symlink feature itself.  You are probably thinking 
of [~eli], who became a committer partly by working on HDFS symlinks.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-05-24 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298675#comment-15298675
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

Oh, sorry!  I didn't realize we had added a new rule about attaching patches.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block report, 
> and then it sends the report again. While processing these two reports at the 
> same time, the NameNode can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-05-24 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10301:

Attachment: HDFS-10301.005.patch

Rebasing patch 003 on trunk.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, 
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block report, 
> and then it sends the report again. While processing these two reports at the 
> same time, the NameNode can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-05-24 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298565#comment-15298565
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

Hi [~redvine],

Thanks for your interest in this.  I wish I could get more people interested in 
this JIRA-- it has been hard to raise interest, unfortunately.

Just to clarify, you don't need to assign a JIRA to yourself in order to post a 
patch or suggest a solution.  In general, when someone is actively working on a 
patch, you should ask before reassigning their JIRAs to yourself.

A whole separate RPC just for reporting the storages which are present seems 
excessive.  It will add additional load to the namenode.

{code}
 if (node.leaseId == 0) {
-  LOG.warn("BR lease 0x{} is not valid for DN {}, because the DN " +
-   "is not in the pending set.",
-   Long.toHexString(id), dn.getDatanodeUuid());
-  return false;
+  LOG.debug("DN {} is not in the pending set because BR with lease 0x{} 
was processed out of order",
+  dn.getDatanodeUuid(), Long.toHexString(id));
+  return true;
{code}
The leaseId being 0 doesn't mean that the block report was processed out of 
order.  If you manually trigger a block report with the {{hdfs dfsadmin 
\-triggerBlockReport}} command, it will also have lease id 0.  Legacy block 
reports will also have lease ID 0.

In general, your solution doesn't fix the problem during upgrade and is a much 
bigger patch, which is why I think HDFS-10301.003.patch should be committed and 
the RPC changes should be done in a follow-on JIRA.  I do not see us 
backporting RPC changes to all the stable branches.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.01.patch, HDFS-10301.sample.patch, 
> zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block report, 
> and then it sends the report again. While processing these two reports at the 
> same time, the NameNode can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-05-24 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe reassigned HDFS-10301:
---

Assignee: Colin Patrick McCabe  (was: Vinitha Reddy Gankidi)

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.004.patch, HDFS-10301.01.patch, HDFS-10301.sample.patch, 
> zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block report, 
> and then it sends the report again. While processing these two reports at the 
> same time, the NameNode can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10448) CacheManager#checkLimit always assumes a replication factor of 1

2016-05-24 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298543#comment-15298543
 ] 

Colin Patrick McCabe commented on HDFS-10448:
-

I think it should change {{computeNeeded}} to take replication into account, 
rather than modifying the code that calls {{computeNeeded}}.
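
Roughly, the change I am picturing stays inside {{computeNeeded}} itself, at the 
return statement quoted below -- a sketch of the direction, not a reviewed patch:
{code}
// Account for replication once, inside computeNeeded(), so checkLimit() and
// any other caller can use bytesNeeded directly.
return new CacheDirectiveStats.Builder()
    .setBytesNeeded(requestedBytes * replication)
    .setFilesCached(requestedFiles)
    .build();
{code}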

> CacheManager#checkLimit  always assumes a replication factor of 1
> -
>
> Key: HDFS-10448
> URL: https://issues.apache.org/jira/browse/HDFS-10448
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10448.001.patch
>
>
> The logic in {{CacheManager#checkLimit}} is not correct. The method does three 
> things:
> First, it will compute needed bytes for the specific path.
> {code}
> CacheDirectiveStats stats = computeNeeded(path, replication);
> {code}
> But the param {{replication}} is not used here, so bytesNeeded reflects only a 
> single replica's value.
> {code}
> return new CacheDirectiveStats.Builder()
> .setBytesNeeded(requestedBytes)
> .setFilesCached(requestedFiles)
> .build();
> {code}
> Second, it should then multiply by the replication factor when comparing 
> against the pool limit, because {{computeNeeded}} did not account for replication.
> {code}
> pool.getBytesNeeded() + (stats.getBytesNeeded() * replication) > 
> pool.getLimit()
> {code}
> Third, when the size exceeds the limit, the error message divides by 
> replication, even though {{stats.getBytesNeeded()}} already reflects only a 
> single replica.
> {code}
>   throw new InvalidRequestException("Caching path " + path + " of size "
>   + stats.getBytesNeeded() / replication + " bytes at replication "
>   + replication + " would exceed pool " + pool.getPoolName()
>   + "'s remaining capacity of "
>   + (pool.getLimit() - pool.getBytesNeeded()) + " bytes.");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10448) CacheManager#checkLimit always assumes a replication factor of 1

2016-05-23 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10448:

Summary: CacheManager#checkLimit  always assumes a replication factor of 1  
(was: CacheManager#checkLimit  not correctly)

> CacheManager#checkLimit  always assumes a replication factor of 1
> -
>
> Key: HDFS-10448
> URL: https://issues.apache.org/jira/browse/HDFS-10448
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10448.001.patch
>
>
> The logic in {{CacheManager#checkLimit}} is not correct. The method does three 
> things:
> First, it will compute needed bytes for the specific path.
> {code}
> CacheDirectiveStats stats = computeNeeded(path, replication);
> {code}
> But the param {{replication}} is not used here, so bytesNeeded reflects only a 
> single replica's value.
> {code}
> return new CacheDirectiveStats.Builder()
> .setBytesNeeded(requestedBytes)
> .setFilesCached(requestedFiles)
> .build();
> {code}
> Second, it should then multiply by the replication factor when comparing 
> against the pool limit, because {{computeNeeded}} did not account for replication.
> {code}
> pool.getBytesNeeded() + (stats.getBytesNeeded() * replication) > 
> pool.getLimit()
> {code}
> Third, when the size exceeds the limit, the error message divides by 
> replication, even though {{stats.getBytesNeeded()}} already reflects only a 
> single replica.
> {code}
>   throw new InvalidRequestException("Caching path " + path + " of size "
>   + stats.getBytesNeeded() / replication + " bytes at replication "
>   + replication + " would exceed pool " + pool.getPoolName()
>   + "'s remaining capacity of "
>   + (pool.getLimit() - pool.getBytesNeeded()) + " bytes.");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10448) CacheManager#checkLimit not correctly

2016-05-23 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297068#comment-15297068
 ] 

Colin Patrick McCabe commented on HDFS-10448:
-

This is a good find.  I think that {{computeNeeded}} should take replication 
into account-- the fact that it doesn't currently is a bug.  Then there would 
be no need to change the callers of {{computeNeeded}}.

> CacheManager#checkLimit  not correctly
> --
>
> Key: HDFS-10448
> URL: https://issues.apache.org/jira/browse/HDFS-10448
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10448.001.patch
>
>
> The logic in {{CacheManager#checkLimit}} is not correct. The method performs 
> three steps:
> First, it computes the bytes needed for the given path.
> {code}
> CacheDirectiveStats stats = computeNeeded(path, replication);
> {code}
> But the param {{replication}} is not used here, so the bytesNeeded is just a 
> single replica's value.
> {code}
> return new CacheDirectiveStats.Builder()
> .setBytesNeeded(requestedBytes)
> .setFilesCached(requestedFiles)
> .build();
> {code}
> Second, the result is multiplied by the replication factor before comparing 
> against the limit, because {{computeNeeded}} did not apply replication.
> {code}
> pool.getBytesNeeded() + (stats.getBytesNeeded() * replication) > 
> pool.getLimit()
> {code}
> Third, if the size exceeds the limit it reports an error. The message divides 
> by replication here, while {{stats.getBytesNeeded()}} is already a 
> single-replica value.
> {code}
>   throw new InvalidRequestException("Caching path " + path + " of size "
>   + stats.getBytesNeeded() / replication + " bytes at replication "
>   + replication + " would exceed pool " + pool.getPoolName()
>   + "'s remaining capacity of "
>   + (pool.getLimit() - pool.getBytesNeeded()) + " bytes.");
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-05-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294073#comment-15294073
 ] 

Colin Patrick McCabe commented on HDFS-9924:


bq. Good point! On the other hand, programmers who are NOT familiar with 
Node.js may NOT want something that supports callback chaining.

Callback chaining is an optional feature which nobody is forced to use.  I 
don't see why anyone would prefer Future over CompletableFuture.
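
To illustrate why it matters that chaining is optional (plain JDK types only; 
{{executor}}, {{doRename}}, {{src}}, {{dst}} and {{LOG}} are stand-ins, not HDFS 
APIs):
{code}
// With a plain Future, the only way to consume the result is to block:
Callable<Boolean> task = () -> doRename(src, dst);
Future<Boolean> pending = executor.submit(task);
boolean renamed = pending.get();                 // blocks the calling thread

// A CompletableFuture still supports the same blocking get(), and callers
// who want chaining can chain without blocking:
CompletableFuture<Boolean> async =
    CompletableFuture.supplyAsync(() -> doRename(src, dst), executor);
async.thenAccept(ok -> LOG.info("rename finished: " + ok));
{code}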

bq. Also, you might not have noticed, supporting Future is a step toward 
supporting CompletableFuture.

I don't see why supporting Future is a step towards supporting a different API. 
 I think Hadoop has too many APIs with duplicate functionality already, and we 
should try to minimize the cognitive load on new developers.

bq. I guess you might have misunderstood the release process. The release 
manager could include/exclude any feature as she/he pleases.

Which 2.x release do you want this to become part of?

> [umbrella] Asynchronous HDFS Access
> ---
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293822#comment-15293822
 ] 

Colin Patrick McCabe commented on HDFS-7240:


bq. Another question about reading the ApacheCon slides: the question "Why an 
Object Store" was well answered. How about "why an object store as part of 
HDFS"? IIUC Ozone is only leveraging a very small portion of HDFS code. Why 
should it be a part of HDFS instead of a separate project?

That's a very good question.  Why can't ozone be its own subproject within 
Hadoop?  We could add a hadoop-ozone directory at the top level of the git 
repo.  Ozone seems to be reusing very little of the HDFS code.  For example, it 
doesn't store blocks the way the DataNode stores blocks.  It doesn't run the 
HDFS NameNode.  It doesn't use the HDFS client code.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-05-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293809#comment-15293809
 ] 

Colin Patrick McCabe commented on HDFS-9924:


I have to agree with [~andrew.wang] that it makes more sense to put these 
changes in trunk than in branch-2.  The Hadoop 2.8 release has been blocked for 
a very, very long time.  There are tons of features in branch-2 that have been 
waiting for almost a year to be released.  Adding yet another feature to 
branch-2, when we're so far behind on releases, doesn't make sense.

Programmers who are familiar with Node.js will want something that supports 
callback chaining, like CompletableFuture, rather than something like the 
old-style Future API.  If we target this at branch-3, we can use the jdk8 
CompletableFuture.

If we are going to backport this to branch-2, we should do it once the feature 
is done, rather than backporting bits and pieces as we go.  This is especially 
true when there are still open questions about the API.

> [umbrella] Asynchronous HDFS Access
> ---
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10415) TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2

2016-05-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289769#comment-15289769
 ] 

Colin Patrick McCabe commented on HDFS-10415:
-

bq. As Steve Loughran's concern, if the stats has nothing to do with this unit 
test, we can consider avoiding it. I'm more favor of this approach.

Sure.  Thanks for the explanation.

bq. there's another option, you know. Do the stats init in the constructor 
rather than initialize. There is no information used in setting up 
DFSClient.storageStatistics, its only ever written to once. Move it to the 
constructor and make final and maybe this problem will go away (maybe, mocks 
are a PITA)

It seems like this would prevent us from using the Configuration object in the 
future when creating stats, right?  I think we should keep this flexibility.

This whole problem arises because the FileSystem constructor doesn't require a 
Configuration and it should, which leads to the "construct then initialize" 
idiom.  If it just took a Configuration in the first place we could initialize 
everything in the constructor.  grumble grumble
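
Purely to illustrate the idiom (this is not a proposed change; {{uri}} and 
{{conf}} are assumed to be in scope):
{code}
// Today: construct, then initialize(conf), so conf-derived fields can't be final.
DistributedFileSystem fs = new DistributedFileSystem();
fs.initialize(uri, conf);

// If the constructor required a Configuration, everything could be set up once:
//   DistributedFileSystem fs = new DistributedFileSystem(uri, conf);  // hypothetical
{code}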

> TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2
> --
>
> Key: HDFS-10415
> URL: https://issues.apache.org/jira/browse/HDFS-10415
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.9.0
> Environment: jenkins
>Reporter: Sangjin Lee
>Assignee: Mingliang Liu
> Attachments: HDFS-10415-branch-2.000.patch, 
> HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch
>
>
> {noformat}
> Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem
> testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem)  Time 
> elapsed: 0.045 sec  <<< ERROR!
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790)
>   at 
> org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417)
>   at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187)
>   at 
> org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217)
> {noformat}
> This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other 
> combinations. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10415) TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2

2016-05-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289312#comment-15289312
 ] 

Colin Patrick McCabe commented on HDFS-10415:
-

Thanks for looking at this.

So basically the problem is that we're attempting to do something in the 
constructor of our {{DistributedFileSystem}} subclass that requires that the FS 
already be initialized.  Why not just override the {{initialize}} method with 
something like:

{code}
@Override
public void initialize(URI name, Configuration conf) throws IOException {
  super.initialize(name, conf);
  statistics = new FileSystem.Statistics("myhdfs"); // can't mock finals
}
{code}

That seems like the most natural fix, since it avoids doing "weird stuff" that 
we wouldn't do outside of unit tests.

I don't feel strongly about this, though, any of the solutions proposed here 
seems like it would work.

> TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2
> --
>
> Key: HDFS-10415
> URL: https://issues.apache.org/jira/browse/HDFS-10415
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.9.0
> Environment: jenkins
>Reporter: Sangjin Lee
>Assignee: Mingliang Liu
> Attachments: HDFS-10415-branch-2.000.patch, 
> HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch
>
>
> {noformat}
> Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec 
> <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem
> testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem)  Time 
> elapsed: 0.045 sec  <<< ERROR!
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790)
>   at 
> org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417)
>   at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187)
>   at 
> org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217)
> {noformat}
> This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other 
> combinations. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8829) Make SO_RCVBUF and SO_SNDBUF size configurable for DataTransferProtocol sockets and allow configuring auto-tuning

2016-05-18 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289239#comment-15289239
 ] 

Colin Patrick McCabe commented on HDFS-8829:


There was no need to make it public because it's only used by unit tests.  Is 
there a reason why it should be public?

> Make SO_RCVBUF and SO_SNDBUF size configurable for DataTransferProtocol 
> sockets and allow configuring auto-tuning
> -
>
> Key: HDFS-8829
> URL: https://issues.apache.org/jira/browse/HDFS-8829
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.3.0, 2.6.0
>Reporter: He Tianyi
>Assignee: He Tianyi
> Fix For: 2.8.0
>
> Attachments: HDFS-8829.0001.patch, HDFS-8829.0002.patch, 
> HDFS-8829.0003.patch, HDFS-8829.0004.patch, HDFS-8829.0005.patch, 
> HDFS-8829.0006.patch
>
>
> {code:java}
>   private void initDataXceiver(Configuration conf) throws IOException {
> // find free port or use privileged port provided
> TcpPeerServer tcpPeerServer;
> if (secureResources != null) {
>   tcpPeerServer = new TcpPeerServer(secureResources);
> } else {
>   tcpPeerServer = new TcpPeerServer(dnConf.socketWriteTimeout,
>   DataNode.getStreamingAddr(conf));
> }
> 
> tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);
> {code}
> The last line sets SO_RCVBUF explicitly, thus disabling tcp auto-tuning on 
> some system.
> Shall we make this behavior configurable?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10404) Fix formatting of CacheAdmin command usage help text

2016-05-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10404:

  Resolution: Fixed
   Fix Version/s: 2.9.0
Target Version/s: 2.9.0
  Status: Resolved  (was: Patch Available)

> Fix formatting of CacheAdmin command usage help text
> 
>
> Key: HDFS-10404
> URL: https://issues.apache.org/jira/browse/HDFS-10404
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Fix For: 2.9.0
>
> Attachments: HDFS-10404.001.patch, HDFS-10404.002.patch
>
>
> In {{CacheAdmin}}, there are two places where the command usage message is 
> not shown completely.
> {code}
> $ hdfs cacheadmin
> Usage: bin/hdfs cacheadmin [COMMAND]
>   [-addDirective -path  -pool  [-force] 
> [-replication ] [-ttl ]]
>   [-modifyDirective -id  [-path ] [-force] [-replication 
> ] [-pool ] [-ttl ]]
>   [-listDirectives [-stats] [-path ] [-pool ] [-id ]
>   [-removeDirective ]
>   [-removeDirectives -path ]
>   [-addPool  [-owner ] [-group ] [-mode ] 
> [-limit ] [-maxTtl ]
> {code}
> The commands {{-listDirectives}} and {{-addPool}} are not shown completely; 
> they are both missing a ']' at the end of the line.
> There is a similar problem in {{CentralizedCacheManagement.md}}, and the 
> published {{CentralizedCacheManagement}} page shows it as well: 
> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10404) Fix formatting of CacheAdmin command usage help text

2016-05-17 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10404:

Summary: Fix formatting of CacheAdmin command usage help text  (was: 
CacheAdmin command usage message not shows completely)

> Fix formatting of CacheAdmin command usage help text
> 
>
> Key: HDFS-10404
> URL: https://issues.apache.org/jira/browse/HDFS-10404
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10404.001.patch, HDFS-10404.002.patch
>
>
> In {{CacheAdmin}}, there are two places where the command usage message is 
> not shown completely.
> {code}
> $ hdfs cacheadmin
> Usage: bin/hdfs cacheadmin [COMMAND]
>   [-addDirective -path  -pool  [-force] 
> [-replication ] [-ttl ]]
>   [-modifyDirective -id  [-path ] [-force] [-replication 
> ] [-pool ] [-ttl ]]
>   [-listDirectives [-stats] [-path ] [-pool ] [-id ]
>   [-removeDirective ]
>   [-removeDirectives -path ]
>   [-addPool  [-owner ] [-group ] [-mode ] 
> [-limit ] [-maxTtl ]
> {code}
> The commands {{-listDirectives}} and {{-addPool}} are not shown completely; 
> they are both missing a ']' at the end of the line.
> There is a similar problem in {{CentralizedCacheManagement.md}}, and the 
> published {{CentralizedCacheManagement}} page shows it as well: 
> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10404) CacheAdmin command usage message not shows completely

2016-05-17 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287149#comment-15287149
 ] 

Colin Patrick McCabe commented on HDFS-10404:
-

+1.  Thanks, [~linyiqun].

> CacheAdmin command usage message not shows completely
> -
>
> Key: HDFS-10404
> URL: https://issues.apache.org/jira/browse/HDFS-10404
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10404.001.patch, HDFS-10404.002.patch
>
>
> In {{CacheAdmin}}, there are two places where the command usage message is 
> not shown completely.
> {code}
> $ hdfs cacheadmin
> Usage: bin/hdfs cacheadmin [COMMAND]
>   [-addDirective -path  -pool  [-force] 
> [-replication ] [-ttl ]]
>   [-modifyDirective -id  [-path ] [-force] [-replication 
> ] [-pool ] [-ttl ]]
>   [-listDirectives [-stats] [-path ] [-pool ] [-id ]
>   [-removeDirective ]
>   [-removeDirectives -path ]
>   [-addPool  [-owner ] [-group ] [-mode ] 
> [-limit ] [-maxTtl ]
> {code}
> The commands {{-listDirectives}} and {{-addPool}} are not shown completely; 
> they are both missing a ']' at the end of the line.
> There is a similar problem in {{CentralizedCacheManagement.md}}, and the 
> published {{CentralizedCacheManagement}} page shows it as well: 
> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10276) HDFS throws AccessControlException when checking for the existence of /a/b when /a is a file

2016-05-12 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10276:

Summary: HDFS throws AccessControlException when checking for the existence 
of /a/b when /a is a file  (was: Different results for exist call for 
file.ext/name)

> HDFS throws AccessControlException when checking for the existence of /a/b 
> when /a is a file
> 
>
> Key: HDFS-10276
> URL: https://issues.apache.org/jira/browse/HDFS-10276
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kevin Cox
>Assignee: Yuanbo Liu
> Attachments: HDFS-10276.001.patch, HDFS-10276.002.patch, 
> HDFS-10276.003.patch, HDFS-10276.004.patch
>
>
> Given you have a file {{/file}} an existence check for the path 
> {{/file/whatever}} will give different responses for different 
> implementations of FileSystem.
> LocalFileSystem will return false while DistributedFileSystem will throw 
> {{org.apache.hadoop.security.AccessControlException: Permission denied: ..., 
> access=EXECUTE, ...}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10276) Different results for exist call for file.ext/name

2016-05-12 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281676#comment-15281676
 ] 

Colin Patrick McCabe commented on HDFS-10276:
-

Thanks for this, [~yuanbo].  Sorry for the sometimes slow pace of reviews.

If I understand correctly, the approach taken in this patch is to change HDFS to 
throw an exception stating that the parent path is not a directory, rather than 
throwing an AccessControlException.

So first of all, this sounds like an incompatible change.  That's OK-- it just 
means this should probably go into branch-3 (trunk) rather than branch-2.

Secondly, it seems like it would be better to make the modification inside 
{{FSPermissionChecker}}, rather than adding an external function.  After all, 
this is a general problem, which affects more than just listDir.  We also need 
to make sure that we are not giving away too much information about the 
filesystem.  For example, if the user asks for {{/a/b/c}}, but does not have 
permission to list {{/a}}, we should not complain about {{/a/b}} not being a 
directory since that reveals privileged information.

> Different results for exist call for file.ext/name
> --
>
> Key: HDFS-10276
> URL: https://issues.apache.org/jira/browse/HDFS-10276
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kevin Cox
>Assignee: Yuanbo Liu
> Attachments: HDFS-10276.001.patch, HDFS-10276.002.patch, 
> HDFS-10276.003.patch, HDFS-10276.004.patch
>
>
> Given you have a file {{/file}} an existence check for the path 
> {{/file/whatever}} will give different responses for different 
> implementations of FileSystem.
> LocalFileSystem will return false while DistributedFileSystem will throw 
> {{org.apache.hadoop.security.AccessControlException: Permission denied: ..., 
> access=EXECUTE, ...}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-05-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280809#comment-15280809
 ] 

Colin Patrick McCabe commented on HDFS-9924:


bq. For some case like network connection errors, if we do not throw exception 
until Future#get, the client could summit a large number of calls and the catch 
a lot of exceptions in Future#get. It is fail-fast if the client catch an 
exception in the first async call.

That makes sense.  Thanks for the explanation.

bq. It actually throws AsyncCallLimitExceededException so the client can keep 
trying sending more requests by catching it.

Hmm.  Is there a way for the client to wait until more async calls are 
available, without polling?

> [umbrella] Asynchronous HDFS Access
> ---
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-05-11 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280617#comment-15280617
 ] 

Colin Patrick McCabe commented on HDFS-9924:


With regard to error handling, why not handle all errors as exceptions thrown 
from {{Future#get}}?  Handling some errors in a different way because they 
happened "earlier" (let's say, on the client side rather than server side) 
forces the client to put error checking code in two places.

Does the {{Future#get}} callback get made without holding any locks?  Can other 
asynchronous calls be made from this context?

{code}
public boolean rename(Path src, Path dst) throws IOException {
  if (isAsynchronousMode()) {
return getFutureDistributedFileSystem().rename(src, dst).get();
  } else {
... //current implementation.
  }
}
{code}
It seems concerning that we would have to make such a large change to the 
synchronous {{DistributedFileSystem}} code.  This would also result in more GC 
load since we'd be creating lots of {{Future}} objects.  Shouldn't it be 
possible to avoid this?  I do not think having some kind of global async bit is 
a good idea.

bq. In order to avoid client abusing the server by asynchronous calls. The RPC 
client should have a configurable limit in order to limit the outstanding 
asynchronous calls. The caller may be blocked if the number of outstanding 
calls hits the limit so that the caller is slowed down.

Blocking the client seems like it could be problematic for code which expects 
to be asynchronous.  There should be an option to throw an exception in this 
case.

I also think that we could maintain a queue of async calls that we have not 
submitted to the IPC layer yet, to avoid being limited by issues at the IPC 
layer.

bq. Support asynchronous FileContext (client API)

{{AsynchronousFileSystem}} is a separate API from {{FileSystem}}.  If there are 
issues with {{FileSystem}}, surely we can fix them in 
{{AsynchronousFileSystem}} rather than creating a fourth API?

bq. Use Java 8’s new language feature in the API (client API).

Given that Hadoop 3.x will probably be Java 8 (based on the mailing list 
discussion), why not just make the async API use jdk8's {{CompletableFuture}} 
from day 1, rather than hacking it in later?

> [umbrella] Asynchronous HDFS Access
> ---
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-05-10 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278651#comment-15278651
 ] 

Colin Patrick McCabe commented on HDFS-9924:


Hi all,

I am +1 for doing this work, based on possible performance improvements we 
might see, and the need for a convenient asynchronous API for applications.

However, I am concerned that there has been no design doc posted, but already 
code committed to trunk.  I am -1 on committing anything more to trunk until we 
have a design document explaining how the API will work and what changes it 
will require in HDFS.  Just to clarify, I would be fine on committing code to a 
feature branch without a design document, since we can review it later prior to 
the merge.  However, it is concerning to see such a large feature proceed on 
trunk without either a branch or a design that the community can review.

> [umbrella] Asynchronous HDFS Access
> ---
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10377) CacheReplicationMonitor shutdown log message should use INFO level.

2016-05-10 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10377:

  Resolution: Fixed
   Fix Version/s: 2.6.5
Target Version/s: 2.6.5
  Status: Resolved  (was: Patch Available)

> CacheReplicationMonitor shutdown log message should use INFO level.
> ---
>
> Key: HDFS-10377
> URL: https://issues.apache.org/jira/browse/HDFS-10377
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging, namenode
>Affects Versions: 2.6.5
>Reporter: Konstantin Shvachko
>Assignee: Yiqun Lin
>  Labels: newbie
> Fix For: 2.6.5
>
> Attachments: HDFS-10377.001.patch
>
>
> HDFS-7258 changed some log messages to DEBUG level from INFO. DEBUG level is 
> good for frequently logged messages, but the shutdown message is logged once 
> and should be INFO level the same as the startup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10377) CacheReplicationMonitor shutdown log message should use INFO level.

2016-05-10 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278609#comment-15278609
 ] 

Colin Patrick McCabe commented on HDFS-10377:
-

+1.  Thanks, [~linyiqun].

> CacheReplicationMonitor shutdown log message should use INFO level.
> ---
>
> Key: HDFS-10377
> URL: https://issues.apache.org/jira/browse/HDFS-10377
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging, namenode
>Affects Versions: 2.6.5
>Reporter: Konstantin Shvachko
>Assignee: Yiqun Lin
>  Labels: newbie
> Attachments: HDFS-10377.001.patch
>
>
> HDFS-7258 changed some log messages to DEBUG level from INFO. DEBUG level is 
> good for frequently logged messages, but the shutdown message is logged once 
> and should be INFO level the same as the startup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-05-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271863#comment-15271863
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

Thanks for looking at this, [~daryn].  I'm not sure about the approach you 
proposed, though.  If interleaved full block reports really are very common for 
[~shv], it seems like throwing an exception when these are received would be 
problematic.  It sounds like there might be some implementation concerns as 
well, although I didn't look at the patch.

bq. [~shv] wrote: I don't think my approach requires RPC change, since the 
block-report RPC message already has all required structures in place. It 
should require only the processing logic change.

Just to be clear.  If what is being sent over the wire is changing, I would 
consider that an "RPC change."  We can create an RPC change without modifying 
the {{.proto}} file-- for example, by choosing not to fill in some optional 
field, or filling in some other field.

bq. Colin, it would have been good to have an interim solution, but it does not 
seem reasonable to commit a patch, which fixes one bug, while introducing 
another.

The patch doesn't introduce any bugs.  It does mean that we won't remove zombie 
storages when interleaved block reports are received.  But we are not handling 
this correctly right now either, so that is not a regression.

Like I said earlier, I think your approach is a good one, but I think we should 
get in the patch I posted here.  It is a very small and non-disruptive change 
which doesn't alter what is sent over the wire.  It can easily be backported to 
stable branches.  Why don't we commit this patch, and then work on a follow-on 
with the RPC change and simplification that you proposed?

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. The NameNode, while processing these two reports 
> at the same time, can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10328) Add per-cache-pool default replication num configuration

2016-05-03 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268954#comment-15268954
 ] 

Colin Patrick McCabe commented on HDFS-10328:
-

Thanks for the patch, [~xupener].

{code}
diff --git 
a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
 
b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
index 7acb394..73db055 100644
--- 
a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
+++ 
b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
@@ -533,7 +533,8 @@ message CachePoolInfoProto {
   optional string groupName = 3;
   optional int32 mode = 4;
   optional int64 limit = 5;
-  optional int64 maxRelativeExpiry = 6;
+  optional uint32 defaultReplication = 6;
+  optional int64 maxRelativeExpiry = 7;
 }
{code}
Please be careful not to remove or change fields that already exist.  In this 
case, you have moved maxRelativeExpiry from field 6 to field 7, which is an 
incompatible change.  Instead, you should simply add your new field to the end.

I suggest appending the new field after {{maxRelativeExpiry}} (which stays at 
field 6) and giving it a protobuf default, something like this:
{code}
+  optional uint32 defaultReplication = 7 [default=1];
{code}

To avoid having to programmatically add a default of 1 in so many places.

> Add per-cache-pool default replication num configuration
> 
>
> Key: HDFS-10328
> URL: https://issues.apache.org/jira/browse/HDFS-10328
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching
>Reporter: xupeng
>Assignee: xupeng
>Priority: Minor
> Fix For: 2.3.0
>
> Attachments: HDFS-10328.001.patch, HDFS-10328.002.patch
>
>
> For now, hdfs cacheadmin cannot set a default replication num for the cache 
> directives in a cache pool. Each cache directive added to the same cache pool 
> has to set its own replication num individually. 
> Consider this situation: we add daily Hive tables into the cache pool "hive". 
> Each time I have to set the same replication num for every table directive in 
> the same cache pool. 
> I think we should allow setting a default replication num on a cache pool so 
> that every cache directive in the pool can inherit its replication 
> configuration from the pool. A cache directive can still override the 
> replication configuration explicitly via the "add & modify directive 
> -replication" commands from the cli.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10352) Allow users to get last access time of a given directory

2016-05-02 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266999#comment-15266999
 ] 

Colin Patrick McCabe commented on HDFS-10352:
-

-1.  As [~linyiqun] commented, the performance would be bad, because it is O(N) 
in terms of number of files in the directory.  This also would be very 
confusing to operators, since it doesn't match the semantics of any other known 
filesystem or operating system.  Finally, if users want to take the maximum 
value of all the entries in a directory, they can easily do this by calling 
listDir and computing the maximum themselves.  This is just as (in)efficient as 
what is proposed here, and much cleaner.
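
For reference, the client-side computation is just a few lines with the existing 
public API ({{fs}} and {{dirPath}} assumed to be in scope):
{code}
// Compute the maximum access time of the files directly under a directory.
long maxAccessTime = 0;
for (FileStatus entry : fs.listStatus(dirPath)) {
  if (entry.isFile()) {
    maxAccessTime = Math.max(maxAccessTime, entry.getAccessTime());
  }
}
{code}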

> Allow users to get last access time of a given directory
> 
>
> Key: HDFS-10352
> URL: https://issues.apache.org/jira/browse/HDFS-10352
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 2.6.4
>Reporter: Eric Lin
>Assignee: Lin Yiqun
>Priority: Minor
>
> Currently the FileStatus.getAccessTime() function will return 0 if the path is 
> a directory. It would be ideal if, when a directory path is passed, the code 
> went through all the files under the directory and returned the MAX access 
> time of all the files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10328) Add cache pool level replication managment

2016-04-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264841#comment-15264841
 ] 

Colin Patrick McCabe commented on HDFS-10328:
-

Hi [~xupener],

Interesting idea.  However, this doesn't sound like "cache pool level 
replication management", since the replication management is still 
per-directive, even after this patch.  This seems like adding a per-cache-pool 
default.  If you agree, can you update the JIRA name and some of the names in 
the patch?

> Add cache pool level replication managment 
> ---
>
> Key: HDFS-10328
> URL: https://issues.apache.org/jira/browse/HDFS-10328
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: caching
>Reporter: xupeng
>Assignee: xupeng
>Priority: Minor
> Fix For: 2.3.0
>
> Attachments: HDFS-10328.001.patch
>
>
> For now, hdfs cacheadmin cannot set a replication num for a cache pool. Each 
> cache directive added to the cache pool has to set its own replication num 
> individually. 
> Consider this situation: we add daily Hive tables into the cache pool "hive". 
> Each time I have to set the same replication num for every table directive in 
> the same cache pool. 
> I think we should allow setting a replication num on a cache pool so that every 
> cache directive in the pool can inherit its replication configuration from the 
> pool. A cache directive can still override the replication configuration 
> explicitly via the "add & modify directive -replication" commands from the cli.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264772#comment-15264772
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

bq. You can think of it as a new operation SyncStorages, which does just that - 
updates NameNode's knowledge of DN's storages. I combined this operation with 
the first br-RPC. One can combine it with any other call, same as you propose 
to combine it with the heartbeat. Except it seems a poor idea, since we don't 
want to wait for removal of thousands of replicas on a heartbeat.

Thanks for explaining your proposal a little bit more.  I agree that 
enumerating all the storages in the first block report RPC is a fairly simple 
way to handle this, and shouldn't add too much size to the FBR.  It seems like 
a better idea than adding it to the heartbeat, like I proposed.  In the short 
term, however, I would prefer the current patch, since it involves no RPC 
changes, and doesn't require all the DataNodes to be upgraded before it can 
work.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. The NameNode, while processing these two reports 
> at the same time, can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260498#comment-15260498
 ] 

Colin Patrick McCabe commented on HDFS-10175:
-

BTW, sorry for the last-minute-ness of this scheduling, [~liuml07] and 
[~steve_l].

Webex here at 10:30:

HDFS-10175 webex
Wednesday, April 27, 2016
10:30 am | Pacific Daylight Time (San Francisco, GMT-07:00) | 1 hr

JOIN WEBEX MEETING
https://cloudera.webex.com/cloudera/j.php?MTID=mebca25435f158dec71b2589561e71b29
Meeting number: 294 963 170 Meeting password: 1234

JOIN BY PHONE 1-650-479-3208 Call-in toll number (US/Canada) Access code: 294 
963 170 Global call-in numbers: 
https://cloudera.webex.com/cloudera/globalcallin.php?serviceType=MC=45642173=0
 

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260487#comment-15260487
 ] 

Colin Patrick McCabe commented on HDFS-10175:
-

Great.  Let me add a webex

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259529#comment-15259529
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

bq. Hey Colin, I reviewed your patch more thoroughly. There is still a problem 
with interleaving reports. See updateBlockReportContext(). Suppose that block 
reports interleave like this: . Then br1-s2 
will reset curBlockReportRpcsSeen since curBlockReportId is not the same as in 
the report, which will discard the bit set for s1 in br2-s1, and the count of 
rpcsSeen = 0 will be wrong for br2-s2. So possibly unreported (zombie) storages 
will not be removed. LMK if you see what I see.

Thanks for looking at the patch.  I agree that in the case of interleaving, 
zombie storages will not be removed.  I don't consider that a problem, since we 
will eventually get a non-interleaved full block report that will do the zombie 
storage removal.  In practice, interleaved block reports are extremely rare (we 
have never seen the problem described in this JIRA, after deploying to 
thousands of clusters).

bq. May be we should go with a different approach for this problem.  Single 
block report can be split into multiple RPCs. Within single block-report-RPC 
NameNode processes each storage under a lock, but then releases and re-acquires 
the lock for the next storage, so that multiple RPC reports can interleave due 
to multi-threading.

Maybe I'm misunderstanding the proposal, but don't we already do all of this?  
We split block reports into multiple RPCs when the storage reports grow beyond 
a certain size.

bq. Approach. DN should report full list of its storages in the first 
block-report-RPC. The NameNode first cleans up unreported storages and replicas 
belonging them, then start processing the rest of block reports as usually. So 
DataNodes explicitly report storages that they have, which eliminates NameNode 
guessing, which storage is the last in the block report RPC.

What does the NameNode do if the DataNode is restarted while sending these 
RPCs, so that it never gets a chance to send all the storages that it claimed 
existed?  It seems like you will get stuck and not be able to accept any new 
reports.  Or, you can take the same approach the current patch does, and clear 
the current state every time you see a new ID (but then you can't do zombie 
storage elimination in the presence of interleaving.)

One approach that avoids all these problems is to avoid doing zombie storage 
elimination during FBRs entirely, and do it instead during DN heartbeats (for 
example).  DN heartbeats are small messages that are never split, and their 
processing is not interleaved with anything.

We agree that the current patch solves the problem of storages falsely being 
declared as zombies, I hope.  I think that's a good enough reason to get this 
patch in, and then think about alternate approaches later.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it 
> sends the block report again. The NameNode, while processing these two reports 
> at the same time, can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259223#comment-15259223
 ] 

Colin Patrick McCabe commented on HDFS-10175:
-

Hi [~steve_l], does 10:30AM work tomorrow?  Unfortunately I'll be out on 
Thursday and most of Friday, so if we can't do tomorrow we'd have to do Friday 
afternoon or early next week.

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258726#comment-15258726
 ] 

Colin Patrick McCabe commented on HDFS-10175:
-

bq. I see that, but stream-level counters are essential at least for the tests 
which verify forward and lazy seeks. Which means that yes, they do have to go 
into the 2.8.0 release. What I can do is set up the scope so that they are 
package private, then, in the test code, implement the assertions about 
metric-derived state into that package.

I guess my hope here is that whatever mechanism we come up with is something 
that could easily be integrated into the upcoming 2.8 release.  Since we have 
talked about requiring our new metrics to not modify existing stable public 
interfaces, that seems very reasonable.

One thing that is a bit concerning about metrics2 is that I think people feel 
that this interface should be stable (i.e. don't remove or alter things once 
they're in), which would be a big constraint on us.  Perhaps we could document 
that per-fs stats were \@Public \@Evolving rather than stable?
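
To illustrate what that would look like (a rough sketch only; the class name here
is invented), the annotations from hadoop-annotations already express exactly that
contract:

{code:java}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

/**
 * Hypothetical per-FileSystem statistics holder.  Public so that
 * frameworks can read it, Evolving so that counters may still be
 * added, renamed, or dropped between minor releases.
 */
@InterfaceAudience.Public
@InterfaceStability.Evolving
public abstract class PerFsStorageStatistics {
  /** Return the current value of a named counter, or null if untracked. */
  public abstract Long getLong(String name);
}
{code}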

bq. Regarding the metrics2 instrumentation in HADOOP-13028, I'm aggregating the 
stream statistics back into the metrics 2 data. That's something which isn't 
needed for the hadoop tests, but which I'm logging in spark test runs, such as 
(formatted for readability):

Do we have any ideas about how Spark will consume these metrics in the longer 
term?  Do they prefer to go through metrics2, for example?  I definitely don't 
object to putting this kind of stuff in metrics2, but if we go that route, we 
have to accept that we'll just get global (or at best per-fs-type) statistics, 
rather than per-fs-instance statistics.  Is that acceptable?  So far, nobody 
has spoken up strongly in favor of per-fs-instance statistics.

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258706#comment-15258706
 ] 

Colin Patrick McCabe commented on HDFS-10175:
-

bq. I prefer earlier (being in UK time and all); I could do the first half hour 
of the webex \[at 12:30pm\]

How about 10:30AM PST to noon tomorrow (Wednesday)?

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10305) Hdfs audit shouldn't log mkdir operation if the directory already exists.

2016-04-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258603#comment-15258603
 ] 

Colin Patrick McCabe commented on HDFS-10305:
-

bq. That's interesting then. hdfs dfs -mkdir /dirAlreadyExists returns a 
non-zero return code. I assumed a non-zero error code == a failed operation. 
Obviously I was wrong.

A non-zero error code on the shell does indicate a failed operation.  You can 
see that FsShell explicitly checks to see whether the path exists and exits 
with an error code if so.  The code is in 
{{./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Mkdir.java}}.
I don't think this has anything to do with what HDFS should put in the audit 
log, since in this case, FsShell doesn't even call mkdir.
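
Roughly, the pattern in question looks like the sketch below (simplified, not the
literal Mkdir.java source): the shell fails locally when the target exists, so no
mkdir call ever reaches the NameNode in that case.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Simplified illustration of the FsShell behavior described above.
class MkdirSketch {
  static int run(FileSystem fs, Path dir) throws IOException {
    if (fs.exists(dir)) {
      System.err.println("mkdir: `" + dir + "': File exists");
      return 1;                      // non-zero exit code, no mkdir RPC sent
    }
    return fs.mkdirs(dir) ? 0 : 1;   // only here does mkdir reach the NameNode
  }
}
{code}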

> Hdfs audit shouldn't log mkdir operation if the directory already exists.
> 
>
> Key: HDFS-10305
> URL: https://issues.apache.org/jira/browse/HDFS-10305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Minor
>
> Currently Hdfs audit logs mkdir operation even if the directory already 
> exists.
> This creates confusion while analyzing audit logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10326) Disable setting tcp socket send/receive buffers for write pipelines

2016-04-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258592#comment-15258592
 ] 

Colin Patrick McCabe commented on HDFS-10326:
-

bq. Some systems may not support auto-tuning, defaulting to a small window size 
(say 64k?), which may make the scenario worse.

Can you give a concrete example of a system where Hadoop is actually deployed 
which doesn't support auto-tuning?

bq. I'd suggest we keep the configuration. Or maybe add another one, say 
dfs.socket.detect-auto-turning. When this is set to true (maybe turned on by 
default), socket buffer behavior depends on whether OS supports auto-tuning. If 
auto-tuning is not supported, use configured value automatically.

Hmm.  As far as I know, there is no way to detect auto-tuning.  If there is, 
then we wouldn't need a new configuration... we could just set the appropriate 
value when no configuration was given.
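
For what it's worth, the JDK socket API only exposes getters and setters for the
buffer sizes, so there is nothing to query about whether the kernel is
auto-tuning.  The practical convention is to use the configured value itself as
the signal, roughly as sketched below (the config key is invented for
illustration):

{code:java}
import java.io.IOException;
import java.net.Socket;
import org.apache.hadoop.conf.Configuration;

class SocketBufferSketch {
  // Hypothetical key name, for illustration only.
  static final String SEND_BUF_KEY = "dfs.example.socket.send.buffer.size";

  static void maybeSetSendBuffer(Configuration conf, Socket sock)
      throws IOException {
    int size = conf.getInt(SEND_BUF_KEY, 0);
    if (size > 0) {
      sock.setSendBufferSize(size);  // pinning the size disables auto-tuning
    }
    // size <= 0: don't touch the socket, leaving OS auto-tuning in effect
  }
}
{code}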

> Disable setting tcp socket send/receive buffers for write pipelines
> ---
>
> Key: HDFS-10326
> URL: https://issues.apache.org/jira/browse/HDFS-10326
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>
> The DataStreamer and the Datanode use a hardcoded 
> DEFAULT_DATA_SOCKET_SIZE=128K for the send and receive buffers of a write 
> pipeline.  Explicitly setting tcp buffer sizes disables tcp stack 
> auto-tuning.  
> The hardcoded value will saturate a 1Gb link at 1ms RTT, manages only 105Mbps at 
> 10ms, and a paltry 11Mbps over a 100ms long haul.  10Gb networks are underutilized.
> There should either be a configuration to completely disable setting the 
> buffers, or the setReceiveBuffer and setSendBuffer calls should be removed 
> entirely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10305) Hdfs audit shouldn't log mkdir operation if the directory already exists.

2016-04-25 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257383#comment-15257383
 ] 

Colin Patrick McCabe commented on HDFS-10305:
-

Thanks, [~raviprak].  It's always good to have more people looking at this.

Since the mkdir operation succeeded, it seems like it should be in the audit 
log, according to the policy set in HDFS-9395... perhaps I missed something.

> Hdfs audit shouldn't log mkdir operation if the directory already exists.
> 
>
> Key: HDFS-10305
> URL: https://issues.apache.org/jira/browse/HDFS-10305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Minor
>
> Currently Hdfs audit logs mkdir operation even if the directory already 
> exists.
> This creates confusion while analyzing audit logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-25 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257074#comment-15257074
 ] 

Colin Patrick McCabe commented on HDFS-10175:
-

We already have three statistics interfaces:
1. FileSystem#Statistics
2. DFSInputStream#ReadStatistics
3. metrics2 etc.

#1 existed for a very long time and is tied into MR in the ways discussed 
above.  I didn't create it, but I did implement the thread-local optimization, 
based on some performance issues we were having.

I have to take the blame for adding #2, in HDFS-4698.  At the time, the main 
focus was on ensuring we were doing short-circuit reads, which didn't really 
fit into #1.  And like you, I felt that it was "very low-level stream behavior" 
that was decoupled from the rest of the stats.

Of course #3 has been around a while, and is used more generally than just in 
our storage code.
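
For reference, interface #2 is reached today through the HDFS-specific stream
wrapper; a rough usage sketch (exact class locations vary a bit between releases)
looks like this:

{code:java}
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.hdfs.DFSInputStream;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

// Rough usage sketch for interface #2 (DFSInputStream#ReadStatistics).
class ReadStatsSketch {
  static void printReadStats(FSDataInputStream in) {
    if (in instanceof HdfsDataInputStream) {
      DFSInputStream.ReadStatistics stats =
          ((HdfsDataInputStream) in).getReadStatistics();
      System.out.println("total bytes read: " + stats.getTotalBytesRead());
      System.out.println("short-circuit bytes read: "
          + stats.getTotalShortCircuitBytesRead());
    }
  }
}
{code}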

I understand your eagerness to get the s3 stats in, but I would rather not 
proliferate more statistics interfaces if possible.  Once they're in, we really 
can't get rid of them, and it becomes very confusing and clunky.

Are you guys free for a webex on Wednesday afternoon?  Maybe 12:30pm to 2pm?

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering

2016-04-25 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256861#comment-15256861
 ] 

Colin Patrick McCabe commented on HDFS-10323:
-

Thanks for the detailed bug report, [~bpodgursky].

bq. 1) ViewFileSystem could forward deleteOnExit calls to the appropriate child 
FileSystem, and not hold onto that path itself.

This would be an incompatible change, right?  It seems like a lot of code 
calling {{FS#close}} might not work with this change.

bq. 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then 
all other FileSystems.

This seems like the safest way to go.
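
A rough sketch of what option 2 could amount to (hypothetical helper, not the
actual FileSystem.Cache code): close ViewFileSystem instances in a first pass so
their deleteOnExit paths can still be forwarded to live children, then close
everything else.

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.viewfs.ViewFileSystem;

// Hypothetical two-pass shutdown helper illustrating option 2 above.
class CloseOrderSketch {
  static void closeAll(List<FileSystem> cached) throws IOException {
    List<FileSystem> others = new ArrayList<FileSystem>();
    // First pass: close ViewFileSystems while their children are still open,
    // so forwarded deleteOnExit() calls can still succeed.
    for (FileSystem fs : cached) {
      if (fs instanceof ViewFileSystem) {
        fs.close();
      } else {
        others.add(fs);
      }
    }
    // Second pass: close the underlying filesystems.
    for (FileSystem fs : others) {
      fs.close();
    }
  }
}
{code}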

> transient deleteOnExit failure in ViewFileSystem due to close() ordering
> 
>
> Key: HDFS-10323
> URL: https://issues.apache.org/jira/browse/HDFS-10323
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Reporter: Ben Podgursky
>
> After switching to using a ViewFileSystem, fs.deleteOnExit calls began 
> failing frequently, displaying this error on failure:
> 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for 
> path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84
> Since FileSystem eats the error involved, it is difficult to be sure what the 
> error is, but I believe what is happening is that the ViewFileSystem’s child 
> FileSystems are being close()’d before the ViewFileSystem, due to the random 
> order ClientFinalizer closes FileSystems; so then when the ViewFileSystem 
> tries to close(), it tries to forward the delete() calls to the appropriate 
> child, and fails because the child is already closed.
> I’m unsure how to write an actual Hadoop test to reproduce this, since it 
> involves testing behavior on actual JVM shutdown.  However, I can verify that 
> while
> {code:java}
> fs.deleteOnExit(randomTemporaryDir);
> {code}
> regularly (~50% of the time) fails to delete the temporary directory, this 
> code:
> {code:java}
> ViewFileSystem viewfs = (ViewFileSystem) fs1;
> for (FileSystem fileSystem : viewfs.getChildFileSystems()) {
>   if (fileSystem.exists(randomTemporaryDir)) {
>     fileSystem.deleteOnExit(randomTemporaryDir);
>   }
> }
> {code}
> always successfully deletes the temporary directory on JVM shutdown.
> I am not very familiar with FileSystem inheritance hierarchies, but at first 
> glance I see two ways to fix this behavior:
> 1)  ViewFileSystem could forward deleteOnExit calls to the appropriate child 
> FileSystem, and not hold onto that path itself.
> 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all 
> other FileSystems.  
> Would appreciate any thoughts of whether this seems accurate, and thoughts 
> (or help) on the fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10305) Hdfs audit shouldn't log mkdir operation if the directory already exists.

2016-04-25 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256858#comment-15256858
 ] 

Colin Patrick McCabe commented on HDFS-10305:
-

bq. I believe the audit log was supposed to capture failed operations as well. 
I'd be inclined to close this JIRA as WON'T FIX

I can see why you might think this, but no, the audit log should not capture 
failed operations.  Check out the discussion at HDFS-9395 for more details 
about this.

bq. It's not a failed operation if the directory already exists.

Yeah, I agree.

bq. Closing this as Won't Fix

Sounds good.

> Hdfs audit shouldn't log mkdir operation if the directory already exists.
> 
>
> Key: HDFS-10305
> URL: https://issues.apache.org/jira/browse/HDFS-10305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Minor
>
> Currently Hdfs audit logs mkdir operation even if the directory already 
> exists.
> This creates confusion while analyzing audit logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10326) Disable setting tcp socket send/receive buffers for write pipelines

2016-04-25 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256848#comment-15256848
 ] 

Colin Patrick McCabe commented on HDFS-10326:
-

bq. I think we can keep the configurability, but set the default to 0.

I agree.  The reason why the original patches didn't set the default was 
basically that we wanted to be conservative.  Basically, we were adding the 
option to use auto-tuning, but not making it the default.  If we strongly 
believe that auto-tuning should be the default, we should make these options 
default to 0 unless set by the admin.

> Disable setting tcp socket send/receive buffers for write pipelines
> ---
>
> Key: HDFS-10326
> URL: https://issues.apache.org/jira/browse/HDFS-10326
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>
> The DataStreamer and the Datanode use a hardcoded 
> DEFAULT_DATA_SOCKET_SIZE=128K for the send and receive buffers of a write 
> pipeline.  Explicitly setting tcp buffer sizes disables tcp stack 
> auto-tuning.  
> The hardcoded value will saturate a 1Gb link at 1ms RTT, manages only 105Mbps at 
> 10ms, and a paltry 11Mbps over a 100ms long haul.  10Gb networks are underutilized.
> There should either be a configuration to completely disable setting the 
> buffers, or the setReceiveBuffer and setSendBuffer calls should be removed 
> entirely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-25 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256842#comment-15256842
 ] 

Colin Patrick McCabe edited comment on HDFS-10175 at 4/25/16 7:20 PM:
--

Thanks for commenting, [~steve_l].  It's great to see work on s3 stats.  s3a 
has needed love for a while.

Did you get a chance to look at my HDFS-10175.006.patch on this JIRA?  It seems 
to address all of your concerns.  It provides a standard API that every 
FileSystem can implement (not just s3 or HDFS, etc.).  Once we adopt 
jdk8, we can easily implement this API using 
{{java.util.concurrent.atomic.LongAdder}} if that proves to be more readable 
and/or efficient.

bq. Don't break any existing filesystem code by adding new params to existing 
methods, etc.

I agree.  My patch doesn't add new params to any existing methods.

bq. add the new code out of FileSystem

I agree.  That's why I separated {{StorageStatistics.java}} from 
{{FileSystem.java}}.  {{FileContext}} should be able to use this API as well, 
simply by returning a {{StorageStatistics}} instance just like {{FileSystem}} 
does.

bq. Use an int rather than an enum; lets filesystems add their own counters. I 
hereby reserve 0x200-0x255 for object store operations.

Hmm.  I'm not sure I follow.  My patch identifies counters by name (string), 
not by an int, enum, or byte.  This is necessary because different storage 
backends will want to track different things (s3a wants to track s3 PUTs, HDFS 
wants to track genstamp bump ops, etc. etc.).  We should not try to create the 
"statistics type enum of doom" in some misguided attempt at space optimization. 
 Consider the case of out-of-tree Filesystem implementations as well... they 
are not going to add entries to some enum of doom in hadoop-common.

bq. public interface StatisticsSource {  Map snapshot(); }

I don't think an API that returns a map is the right approach for statistics.  
That map could get quite large.   We already know that people love adding just 
one more statistic to things (and often for quite valid reasons).  It's very 
difficult to \-1 a patch just because it bloats the statistics map more.  Once 
this API exists, the natural progression will be people adding tons and tons of 
new entries to it.  We should be prepared for this and use an API that doesn't 
choke if we have tons of stats.  We shouldn't have to materialize everything 
all the time-- an iterator approach is smarter because it can be O(1) in terms 
of memory, no matter how many entries we have.

I also don't think we need snapshot consistency for stats.  It's a heavy burden 
for an implementation to carry (it basically requires some kind of 
materialization into a map, and probably synchronization to stop the world 
while the materialization is going on).  And there is no user demand for it... 
the current FileSystem#Statistics interface doesn't have it, and nobody has 
asked for it.

It seems like you are focusing on the ability to expose new stats to our 
metrics2 subsystem, while this JIRA originally focused on adding metrics that 
MapReduce could read at the end of a job.  I think these two use-cases can be 
covered by the same API.  We should try to hammer that out (probably as a 
HADOOP JIRA rather than an HDFS JIRA, as well).

Do you think we should have a call about this or something?  I know some folks 
who might be interested in testing the s3 metrics stuff, if there was a 
reasonable API to access it.


was (Author: cmccabe):
Thanks for commenting, [~steve_l].  It's great to see work on s3 stats.  s3a 
has needed love for a while.

Did you get a chance to look at my HDFS-10175.006.patch on this JIRA?  It seems 
to address all of your concerns.  It provides a standard API that every 
FileSystem can implement (not just s3 or HDFS, etc.).  Once we adopt 
jdk8, we can easily implement this API using 
{{java.util.concurrent.atomic.LongAdder}} if that proves to be more readable 
and/or efficient.

bq. Don't break any existing filesystem code by adding new params to existing 
methods, etc.

I agree.  My patch doesn't add new params to any existing methods.

bq. add the new code out of FileSystem

I agree.  That's why I separated {{StorageStatistics.java}} from 
{{FileSystem.java}}.  {{FileContext}} should be able to use this API as well, 
simply by returning a {{StorageStatistics}} instance just like {{FileSystem}} 
does.

bq. Use an int rather than an enum; lets filesystems add their own counters. I 
hereby reserve 0x200-0x255 for object store operations.

Hmm.  I'm not sure I follow.  My patch identifies counters by name (string), 
not by an int, enum, or byte.  This is necessary because different storage 
backends will want to track different things (s3a wants to track s3 PUTs, HDFS 
wants to track genstamp bump ops, etc. etc.).  We should not try to create the 
"statistics type 

[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-25 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256842#comment-15256842
 ] 

Colin Patrick McCabe commented on HDFS-10175:
-

Thanks for commenting, [~steve_l].  It's great to see work on s3 stats.  s3a 
has needed love for a while.

Did you get a chance to look at my HDFS-10175.006.patch on this JIRA?  It seems 
to address all of your concerns.  It provides a standard API that every 
FileSystem can implement (not just s3 or HDFS, etc.).  Once we adopt 
jdk8, we can easily implement this API using 
{{java.util.concurrent.atomic.LongAdder}} if that proves to be more readable 
and/or efficient.

bq. Don't break any existing filesystem code by adding new params to existing 
methods, etc.

I agree.  My patch doesn't add new params to any existing methods.

bq. add the new code out of FileSystem

I agree.  That's why I separated {{StorageStatistics.java}} from 
{{FileSystem.java}}.  {{FileContext}} should be able to use this API as well, 
simply by returning a {{StorageStatistics}} instance just like {{FileSystem}} 
does.

bq. Use an int rather than an enum; lets filesystems add their own counters. I 
hereby reserve 0x200-0x255 for object store operations.

Hmm.  I'm not sure I follow.  My patch identifies counters by name (string), 
not by an int, enum, or byte.  This is necessary because different storage 
backends will want to track different things (s3a wants to track s3 PUTs, HDFS 
wants to track genstamp bump ops, etc. etc.).  We should not try to create the 
"statistics type enum of doom" in some misguided attempt at space optimization. 
 Consider the case of out-of-tree Filesystem implementations as well... they 
are not going to add entries to some enum of doom in hadoop-common.

bq. public interface StatisticsSource {  Map snapshot(); }

I don't think an API that returns a map is the right approach for statistics.  
That map could get quite large.   We already know that people love adding just 
one more statistic to things (and often for quite valid reasons).  It's very 
difficult to -1 a patch just because it bloats the statistics map more.  Once 
this API exists, the natural progression will be people adding tons and tons of 
new entries to it.  We should be prepared for this and use an API that doesn't 
choke if we have tons of stats.  We shouldn't have to materialize everything 
all the time-- an iterator approach is smarter because it can be O(1) in terms 
of memory, no matter how many entries we have.

I also don't think we need snapshot consistency for stats.  It's a heavy burden 
for an implementation to carry (it basically requires some kind of 
materialization into a map, and probably synchronization to stop the world 
while the materialization is going on).  And there is no user demand for it... 
the current FileSystem#Statistics interface doesn't have it, and nobody has 
asked for it.

It seems like you are focusing on the ability to expose new stats to our 
metrics2 subsystem, while this JIRA originally focused on adding metrics that 
MapReduce could read at the end of a job.  I think these two use-cases can be 
covered by the same API.  We should try to hammer that out (probably as a 
HADOOP JIRA rather than an HDFS JIRA, as well).

Do you think we should have a call about this or something?  I know some folks 
who might be interested in testing the s3 metrics stuff, if there was a 
reasonable API to access it.
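
To make the shape of that API concrete, here is a minimal sketch of a name-keyed,
iterator-based statistics interface along these lines (names illustrative, not the
committed patch):

{code:java}
import java.util.Iterator;

// Illustrative shape of a name-keyed, iterator-based statistics API:
// counters are identified by string name, and callers walk an iterator
// instead of materializing a full snapshot map.
abstract class StorageStatisticsSketch {
  /** One named 64-bit counter value. */
  public static final class LongStatistic {
    private final String name;
    private final long value;
    public LongStatistic(String name, long value) {
      this.name = name;
      this.value = value;
    }
    public String getName() { return name; }
    public long getValue()  { return value; }
  }

  /** Iterate over all statistics; O(1) extra memory however many exist. */
  public abstract Iterator<LongStatistic> getLongStatistics();

  /** Look up a single statistic by name, or null if it is not tracked. */
  public abstract Long getLong(String name);
}
{code}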

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like 

[jira] [Comment Edited] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-25 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256791#comment-15256791
 ] 

Colin Patrick McCabe edited comment on HDFS-10301 at 4/25/16 6:58 PM:
--

bq. [~shv] wrote: The last line is confusing, because it should have been 2, 
but its is 0 since br2 overridden lastBlockReportId for s1 and s2 .

It's OK for it to be 0 here.  It just means that we will not do the zombie 
storage elimination for these particular full block reports.  Remember that 
interleaved block reports are an extremely rare case, and so are zombie 
storages.  We can wait for the next FBR to do the zombie elimination.

bq. I think this could be a simple fix for this jira, and we can discuss other 
approaches to zombie storage detection in the next issue. Current approach 
seems to be error prone. One way is to go with the retry cache as Jing Zhao 
suggested. Or there could be other ideas.

The problem with a retry cache is that it uses up memory.  We don't have an 
easy way to put an upper bound on the amount of memory that we need, except 
through adding complex logic to limit the number of full block reports accepted 
for a specific DataNode in a given time period.

bq. This brought me to an idea. BR ids are monotonically increasing...
The code for generating block report IDs is here:
{code}
  private long generateUniqueBlockReportId() {
    // Initialize the block report ID the first time through.
    // Note that 0 is used on the NN to indicate "uninitialized", so we should
    // not send a 0 value ourselves.
    prevBlockReportId++;
    while (prevBlockReportId == 0) {
      prevBlockReportId = ThreadLocalRandom.current().nextLong();
    }
    return prevBlockReportId;
  }
{code}

It's not monotonically increasing in the case where rollover occurs.  While 
this is an extremely rare case, the consequences of getting it wrong would be 
extremely severe.  So this might be possible as an incompatible change, but not 
a change in branch-2.

Edit: another reason not to do this is because on restart, the DN could get a 
number lower than its previous one.  We can't use IDs as epoch numbers unless 
we actually persist them to disk, like Paxos transaction IDs or HDFS edit log 
IDs.

bq. [~walter.k.su] wrote: If the BR is split into multiple RPCs, there's no 
interleaving naturally, because the DN gets the ack before it sends the next RPC. 
Interleaving only exists if the BR is not split. I agree the bug needs to be fixed 
from the inside; it's just that eliminating interleaving for good may not be a bad 
idea, as it simplifies the problem and is also a simple workaround for this jira.

We don't document anywhere that interleaving doesn't occur.  We don't have unit 
tests that it doesn't occur, and if we did, those unit tests might accidentally 
pass because of race conditions.  Even if we eliminated interleaving for now, 
anyone changing the RPC code or the queuing code could easily re-introduce 
interleaving and this bug would come back.  That's why I agree with [~shv]-- we 
should not focus on trying to remove interleaving.

bq. [~shv] wrote: I think this could be a simple fix for this jira, and we can 
discuss other approaches to zombie storage detection in the next issue.

Yeah, let's get in this fix and then talk about potential improvements in a 
follow-on jira.


was (Author: cmccabe):
bq. [~shv] wrote: The last line is confusing, because it should have been 2, 
but its is 0 since br2 overridden lastBlockReportId for s1 and s2 .

It's OK for it to be 0 here.  It just means that we will not do the zombie 
storage elimination for these particular full block reports.  Remember that 
interleaved block reports are an extremely rare case, and so are zombie 
storages.  We can wait for the next FBR to do the zombie elimination.

bq. I think this could be a simple fix for this jira, and we can discuss other 
approaches to zombie storage detection in the next issue. Current approach 
seems to be error prone. One way is to go with the retry cache as Jing Zhao 
suggested. Or there could be other ideas.

The problem with a retry cache is that it uses up memory.  We don't have an 
easy way to put an upper bound on the amount of memory that we need, except 
through adding complex logic to limit the number of full block reports accepted 
for a specific DataNode in a given time period.

bq. This brought me to an idea. BR ids are monotonically increasing...
The code for generating block report IDs is here:
{code}
  private long generateUniqueBlockReportId() {
    // Initialize the block report ID the first time through.
    // Note that 0 is used on the NN to indicate "uninitialized", so we should
    // not send a 0 value ourselves.
    prevBlockReportId++;
    while (prevBlockReportId == 0) {
      prevBlockReportId = ThreadLocalRandom.current().nextLong();
    }
    return prevBlockReportId;
  }
{code}

It's not 

[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-25 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256791#comment-15256791
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

bq. [~shv] wrote: The last line is confusing, because it should have been 2, 
but its is 0 since br2 overridden lastBlockReportId for s1 and s2 .

It's OK for it to be 0 here.  It just means that we will not do the zombie 
storage elimination for these particular full block reports.  Remember that 
interleaved block reports are an extremely rare case, and so are zombie 
storages.  We can wait for the next FBR to do the zombie elimination.

bq. I think this could be a simple fix for this jira, and we can discuss other 
approaches to zombie storage detection in the next issue. Current approach 
seems to be error prone. One way is to go with the retry cache as Jing Zhao 
suggested. Or there could be other ideas.

The problem with a retry cache is that it uses up memory.  We don't have an 
easy way to put an upper bound on the amount of memory that we need, except 
through adding complex logic to limit the number of full block reports accepted 
for a specific DataNode in a given time period.

bq. This brought me to an idea. BR ids are monotonically increasing...
The code for generating block report IDs is here:
{code}
  private long generateUniqueBlockReportId() {
    // Initialize the block report ID the first time through.
    // Note that 0 is used on the NN to indicate "uninitialized", so we should
    // not send a 0 value ourselves.
    prevBlockReportId++;
    while (prevBlockReportId == 0) {
      prevBlockReportId = ThreadLocalRandom.current().nextLong();
    }
    return prevBlockReportId;
  }
{code}

It's not monotonically increasing in the case where rollover occurs.  While 
this is an extremely rare case, the consequences of getting it wrong would be 
extremely severe.  So this might be possible as an incompatible change, but not 
a change in branch-2.

bq. [~walter.k.su] wrote: If the BR is split into multiple RPCs, there's no 
interleaving naturally, because the DN gets the ack before it sends the next RPC. 
Interleaving only exists if the BR is not split. I agree the bug needs to be fixed 
from the inside; it's just that eliminating interleaving for good may not be a bad 
idea, as it simplifies the problem and is also a simple workaround for this jira.

We don't document anywhere that interleaving doesn't occur.  We don't have unit 
tests that it doesn't occur, and if we did, those unit tests might accidentally 
pass because of race conditions.  Even if we eliminated interleaving for now, 
anyone changing the RPC code or the queuing code could easily re-introduce 
interleaving and this bug would come back.  That's why I agree with [~shv]-- we 
should not focus on trying to remove interleaving.

bq. [~shv] wrote: I think this could be a simple fix for this jira, and we can 
discuss other approaches to zombie storage detection in the next issue.

Yeah, let's get in this fix and then talk about potential improvements in a 
follow-on jira.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report. Then it 
> sends the block report again. The NameNode, while processing these two reports 
> at the same time, can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes the NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10175) add per-operation stats to FileSystem.Statistics

2016-04-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254503#comment-15254503
 ] 

Colin Patrick McCabe commented on HDFS-10175:
-

bq. Can I also note that as the @Public @Stable FileSystem is widely 
subclassed, with its protected statistics field accessed in those subclasses, 
nobody is allowed to take it or its current methods away. Thanks.

Yeah, I agree.  I would like to see us get more cautious about adding new 
things to {{FileSystem#Statistics}}, though, since I think it's not a good 
match for most of the new stats we're proposing here.

bq. There's no per-thread tracking; it's collecting overall stats, rather than 
trying to add up the cost of a single execution, which is what per-thread stuff 
would presumably do. This is lower cost but still permits microbenchmark-style 
analysis of performance problems against S3a. It doesn't directly let you get 
results of a job, "34MB of data, 2000 stream aborts, 1998 backward seeks" which 
are the kind of things I'm curious about.

Overall stats are lower cost in terms of memory consumption, and the cost to 
read (as opposed to update) a metric.  They are higher cost in terms of the CPU 
consumed for each update of the metric.  In particular, for applications that 
do a lot of stream operations from many different threads, updating an 
AtomicLong can become a performance bottleneck.

One of the points that I was making above is that I think it's appropriate for 
some metrics to be tracked per-thread, but for others, we probably want to use 
AtomicLong or similar.  I would expect that anything that led to an s3 RPC 
would be appropriate to be tracked by an AtomicLong very easily, since the 
overhead of the network activity would dwarf the AtomicLong update overhead.  
And we should have a common interface for getting this information that MR and 
stats consumers can use.
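
As a small illustration of the trade-off (sketch only): a shared AtomicLong is
cheap to read but every increment contends on a single value, while a JDK 8
LongAdder stripes updates across cells and pays slightly more at read time.

{code:java}
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Illustrative comparison of the two counter styles discussed above.
class CounterStylesSketch {
  // Cheap to read, but concurrent increments all contend on one value.
  private final AtomicLong rpcCount = new AtomicLong();

  // Striped internally; increments from many threads scale better,
  // while sum() does a little more work at read time (JDK 8+).
  private final LongAdder bytesRead = new LongAdder();

  void onRpc()       { rpcCount.incrementAndGet(); }
  void onRead(int n) { bytesRead.add(n); }

  long rpcs()        { return rpcCount.get(); }
  long bytes()       { return bytesRead.sum(); }
}
{code}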

bq. Maybe, and this would be nice, whatever is implemented here is (a) 
extensible to support some duration type too, at least in parallel, 

The interface here supports storing durations as 64-bit numbers of 
milliseconds, which seems good.  It is up to the implementor of the statistic 
to determine what the 64-bit long represents (average duration in ms, median 
duration in ms, number of RPCs, etc. etc.)  This is similar to metrics2 and 
jmx, etc. where you have basic types that can be used in a few different ways.

bq. and (b) could be used as a back end by both Metrics2 and Coda Hale metrics 
registries. That way the slightly more expensive metric systems would have 
access to this more raw data.

Sure.  The difficult question is how metrics2 hooks up to metrics which are per 
FS or per-stream.  Should the output of metrics2 reflect the union of all 
existing FS and stream instances?  Some applications open a very large number 
of streams, so it seems impractical for metrics2 to include all those streams 
in its output.

> add per-operation stats to FileSystem.Statistics
> 
>
> Key: HDFS-10175
> URL: https://issues.apache.org/jira/browse/HDFS-10175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Ram Venkatesh
>Assignee: Mingliang Liu
> Attachments: HDFS-10175.000.patch, HDFS-10175.001.patch, 
> HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, 
> HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. 
> There is logic within DfsClient to map operations to these counters that can 
> be confusing, for instance, mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, 
> createSymlink, delete, exists, mkdirs, rename and expose them as new 
> properties on the Statistics object. The operation-specific counters can be 
> used for analyzing the load imposed by a particular job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large 
> number of files.
> Once this information is available in the Statistics object, the app 
> frameworks like MapReduce can expose them as additional counters to be 
> aggregated and recorded as part of job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254350#comment-15254350
 ] 

Colin Patrick McCabe commented on HDFS-10301:
-

Yeah, perhaps we should file a follow-on JIRA to optimize by removing the 
storage reports with an older ID when a newer one was received.  The challenge 
will be implementing it efficiently-- we probably need to move away from 
BlockingQueue and towards something with our own locking.  And probably 
something other than plain Runnables.
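
Purely as a sketch of what that follow-on could look like (names invented, and
ignoring the ID-rollover caveat discussed earlier): key the pending reports by
DataNode and let a newer report supersede an older queued one, which needs a
structure with its own locking rather than a plain BlockingQueue of Runnables.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: keep at most the newest pending full block
// report per DataNode, discarding stale retransmissions before they
// are ever processed.
class PendingReportsSketch<R> {
  private static final class Entry<T> {
    final long reportId;
    final T report;
    Entry(long reportId, T report) { this.reportId = reportId; this.report = report; }
  }

  private final Map<String, Entry<R>> pending = new HashMap<String, Entry<R>>();

  /** Keep the report unless a newer one for the same DataNode is already queued. */
  synchronized boolean offer(String datanodeUuid, long reportId, R report) {
    Entry<R> cur = pending.get(datanodeUuid);
    if (cur != null && cur.reportId >= reportId) {
      return false;                                 // stale retransmission, drop it
    }
    pending.put(datanodeUuid, new Entry<R>(reportId, report));  // supersede older
    return true;
  }

  /** Remove and return the pending report for a DataNode, if any. */
  synchronized R poll(String datanodeUuid) {
    Entry<R> e = pending.remove(datanodeUuid);
    return e == null ? null : e.report;
  }
}
{code}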

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report. Then it 
> sends the block report again. The NameNode, while processing these two reports 
> at the same time, can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes the NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe reassigned HDFS-10301:
---

Assignee: Colin Patrick McCabe

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Assignee: Colin Patrick McCabe
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report. Then it 
> sends the block report again. The NameNode, while processing these two reports 
> at the same time, can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes the NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order

2016-04-21 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10301:

Attachment: HDFS-10301.003.patch

added a unit test

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> 
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.1
>Reporter: Konstantin Shvachko
>Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report. Then it 
> sends the block report again. The NameNode, while processing these two reports 
> at the same time, can interleave processing of storages from different reports. 
> This screws up the blockReportId field, which makes the NameNode think that some 
> storages are zombie. Replicas from zombie storages are immediately removed, 
> causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   3   4   5   6   7   8   9   10   >