[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524812#comment-14524812
 ] 

Hadoop QA commented on HDFS-6093:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12635714/hdfs-6093-4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10615/console |


This message was automatically generated.

 Expose more caching information for debugging by users
 --

 Key: HDFS-6093
 URL: https://issues.apache.org/jira/browse/HDFS-6093
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: caching
Affects Versions: 2.4.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-6093-1.patch, hdfs-6093-2.patch, hdfs-6093-3.patch, 
 hdfs-6093-4.patch


 When users submit a new cache directive, it's unclear if the NN has 
 recognized it and is actively trying to cache it, or if it's hung for some 
 other reason. It'd be nice to expose a pending caching/uncaching count the 
 same way we expose pending replication work.
 It'd also be nice to display the aggregate cache capacity and usage in 
 dfsadmin -report, since we already have it as a metric and expose it 
 per-DN in report output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524765#comment-14524765
 ] 

Hadoop QA commented on HDFS-6093:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12635714/hdfs-6093-4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10595/console |


This message was automatically generated.



[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-20 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942169#comment-13942169
 ] 

Arpit Agarwal commented on HDFS-6093:
-

+1 from me, nice diagnosability improvements.



[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942306#comment-13942306
 ] 

Colin Patrick McCabe commented on HDFS-6093:


{code}
With the locking issues resolved, is it okay to just leave it with a single set 
of variables? I could switch it over to AtomicLongs or something, but I think 
it's all under the FSN lock anyway.
{code}

I think putting more things under the big lock is the wrong direction to go.  
In particular, we will eventually need to release the big lock from time to 
time while doing the CacheReplicationMonitor scan.  When we do that, having 
just one set of counters is not going to work.  It seems simple enough just to 
have a {{CacheManager#Counters}} object with its own lock, and set it at the 
end of the scan.  There are other ways to do this too (atomics, etc.)

This would also make it easier to modify the pending cache count in 
{{processCacheReportImpl}}.  It's easy to understand the concept of modifying a 
copy of the stats, harder to understand all the locking interactions of 
modifying the counter that the CRM is actually using.  At least for me.
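
A minimal sketch of the separate-counters idea, assuming a hypothetical {{PendingCounters}} class. The comment names {{CacheManager#Counters}} but does not spell out its shape, so every name below is illustrative and not taken from the patch. The scan accumulates its totals privately and publishes them in one step under a small private lock, so readers do not wait on the FSN lock and do not observe a half-finished scan.

```java
// Hypothetical sketch: class, field, and method names are illustrative only.
final class PendingCounters {
  /** Immutable snapshot of the pending counts from one completed scan. */
  static final class Snapshot {
    final long pendingCaching;
    final long pendingUncaching;

    Snapshot(long pendingCaching, long pendingUncaching) {
      this.pendingCaching = pendingCaching;
      this.pendingUncaching = pendingUncaching;
    }
  }

  // A private lock, deliberately separate from the big FSN lock.
  private final Object lock = new Object();
  private Snapshot published = new Snapshot(0, 0);

  /** Called once at the end of a scan with the totals the scan accumulated. */
  void publish(long pendingCaching, long pendingUncaching) {
    Snapshot next = new Snapshot(pendingCaching, pendingUncaching);
    synchronized (lock) {
      published = next;
    }
  }

  /** Called by readers such as metrics or the web UI. */
  Snapshot read() {
    synchronized (lock) {
      return published;
    }
  }
}
```

Because the snapshot is immutable, a cache report handler could also build a modified copy and publish it without touching the counters the scan is using.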

With regard to the {{processCacheReportImpl}} changes, I think there are still 
some issues here.  I don't like the fact that we are now potentially allocating 
a TreeMap of size NUM_PENDING_UNCACHED blocks in every cache report.  There are 
a few different ways to handle this without a huge memory blowup.  The simplest 
is probably to remove the final on {{DatanodeDescriptor#pendingUncached}}.  
Then you just create a new list in {{processCacheReportImpl}}, and selectively 
add the still-need-to-be-uncached blocks to that.  Then at the end, you throw 
away the old list and make {{DatanodeDescriptor}} use the new list.
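
The list-swap approach described above might look roughly like this. It is only a sketch: {{DatanodeSketch}} is a hypothetical stand-in for {{DatanodeDescriptor}}, block IDs are plain {{Long}} values, and the pending structure is assumed to support cheap membership tests. Only blocks that are both pending uncache and still reported as cached are copied, so the new collection is never larger than the report.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for DatanodeDescriptor; not the real HDFS class.
final class DatanodeSketch {
  // Not final, so a cache report can replace the whole collection at once.
  private Set<Long> pendingUncached = new LinkedHashSet<>();

  void addPendingUncached(long blockId) {
    pendingUncached.add(blockId);
  }

  Set<Long> getPendingUncached() {
    return pendingUncached;
  }

  /**
   * A block stays pending-uncached only while the datanode still reports it
   * as cached. Everything else is dropped when the old set is discarded.
   */
  void processCacheReport(Iterable<Long> reportedCachedBlockIds) {
    Set<Long> stillPending = new LinkedHashSet<>();
    for (Long blockId : reportedCachedBlockIds) {
      if (pendingUncached.contains(blockId)) {
        stillPending.add(blockId);
      }
    }
    pendingUncached = stillPending; // throw away the old set
  }
}
```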

+1 once all that is addressed



[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-19 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941108#comment-13941108
 ] 

Andrew Wang commented on HDFS-6093:
---

Hi Arpit and Colin, thanks for reviewing. New patch is up. Addressed your 
feedback except the following, and I also fixed a logging issue I found:

bq. update CentralizedCacheManagement.html in the docs?

Added a short blurb. A nice follow-on JIRA would be an FAQ for debugging 
caching, since it can be tricky right now.

bq. display the pending caching/uncaching counts in the output of 'dfsadmin 
-report'?

I think dfsadmin -report is more about usage statistics than replication work. 
Having the pending stats as a metric and on the webUI means it should still be 
easy enough to access.

bq. Was stillPendingUncached introduced to fix a bug?

This is required because cache reports just tell you what's cached, not also 
what was uncached. So, we need to compute a diff to update pendingUncached 
correctly.

bq. ternary statement code nit

I prefer not to use ternary statements, so I'd like to leave it as is if that's 
okay.

bq. decouple the counter(s) that can be read from the CRM from the counters 
that the CRM uses internally

With the locking issues resolved, is it okay to just leave it with a single set 
of variables? I could switch it over to AtomicLongs or something, but I think 
it's all under the FSN lock anyway.

bq. colin Incrementally updating the pendingUncached list and stats is a nice 
idea, but it seems too ambitious for 2.4 at this point. 

I'm okay bumping this to 2.5 if you'd rather not put this in 2.4, but I think 
this all works now with the locking fixed.



[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941115#comment-13941115
 ] 

Hadoop QA commented on HDFS-6093:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12635665/hdfs-6093-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6445//console

This message is automatically generated.



[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941166#comment-13941166
 ] 

Hadoop QA commented on HDFS-6093:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12635673/hdfs-6093-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6446//console

This message is automatically generated.



[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941424#comment-13941424
 ] 

Hadoop QA commented on HDFS-6093:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12635714/hdfs-6093-4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6452//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6452//console

This message is automatically generated.



[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-15 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936087#comment-13936087
 ] 

Colin Patrick McCabe commented on HDFS-6093:


bq. Arpit said: In addition to reducing the timeout as you suggested, can we 
add some explanation to the command output, or update 
CentralizedCacheManagement.html in the docs?

I agree, we could add a short comment to the docs about this.  Now that the 
timeout has been reduced, there should be much less discrepancy between the 
output of the two commands, of course.

Taking a more detailed look at the patch now.

{code}
+  public FsStatus getCacheStatus() throws IOException {
{code}

I know it seems clever to reuse the same class for getStatus and 
getCacheStatus, but it could become a problem if someone later adds more fields 
to getStatus that don't apply to getCacheStatus.  I think we need our own type 
for this, to maintain sanity in the future.  It's not that much code.
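
For illustration, a dedicated type could be as small as the following. This is a sketch under assumptions: the name {{CacheStatusSketch}} and its two fields are hypothetical, chosen to mirror the capacity and usage figures discussed in this issue, and are not what the patch defines.

```java
// Hypothetical value type; the name and fields are illustrative only.
final class CacheStatusSketch {
  private final long capacity; // total cache capacity, in bytes
  private final long used;     // bytes currently cached

  CacheStatusSketch(long capacity, long used) {
    if (capacity < 0 || used < 0) {
      throw new IllegalArgumentException("capacity and used must be non-negative");
    }
    this.capacity = capacity;
    this.used = used;
  }

  long getCapacity() {
    return capacity;
  }

  long getUsed() {
    return used;
  }

  /** Remaining cache space, clamped at zero. */
  long getRemaining() {
    return Math.max(0L, capacity - used);
  }
}
```

Keeping this separate from the file system status type means fields added to one later do not leak into the other.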

{code}
   public long getMissingBlocksCount() throws IOException {
+    statistics.incrementReadOps(1);
     return dfs.getMissingBlocksCount();
{code}

Can we put the non-caching-related incrementReadOps changes in their own JIRA?  
It may seem like a trivial change, but it's kind of distracting from this JIRA. 
 Also I'm not sure I understand when we're supposed to increment this...

{code}
  /**
   * Number of replicas pending caching.
   */
  private long numPendingCaching;
  /**
   * Number of replicas pending uncaching.
   */
  private long numPendingUncaching;
{code}

Could use a linebreak after {{numPendingCaching}} for consistency.

Like I said earlier, I'd prefer to decouple the counter(s) that can be read 
from the CRM from the counters that the CRM uses internally during the scan.  
Using the same variable for both just invites bugs like the one Arpit pointed 
out, where rescan zeroes the counter outside the lock.

{code}
[CacheManager#processCacheReportImpl changes]
{code}

Incrementally updating the pendingUncached list and stats is a nice idea, but 
it seems too ambitious for 2.4 at this point.  Now that the CRM interval is 30 
seconds, it shouldn't be too bad to just wait for the CRM to update its stats 
and the lists.  Additionally, we don't even know that monitor is non-null at 
this point, so there is an NPE here, I think.  Let's leave this out and revisit 
it later.



[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935723#comment-13935723
 ] 

Colin Patrick McCabe commented on HDFS-6093:


This looks good overall.  I think rather than protect 
{{CacheReplicationMonitor#numPendingCaching}} with the FSN lock, it would be 
better to make it an Atomic64 that we swap in at the end of the rescan.  That 
way we're not baking in the assumption that the rescan thread holds the FSN 
lock for the whole duration of the rescan.  It also would minimize the time we 
spend blocking waiting for the FSN lock in the MBean stuff.
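
One way to read this suggestion, sketched with the standard {{java.util.concurrent.atomic.AtomicLong}} ("Atomic64" is taken here to mean a 64-bit atomic). The class and method names are hypothetical and not from the patch. The rescan thread counts into a private field and stores the final value once the scan completes, so readers need no lock at all.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of publishing a rescan result through an atomic.
final class RescanCounterSketch {
  // Value visible to readers (for example an MBean getter).
  private final AtomicLong numPendingCaching = new AtomicLong(0);

  // Working counter; assumed to be touched only by the rescan thread.
  private long scanPendingCaching;

  void beginRescan() {
    scanPendingCaching = 0; // readers still see the previous scan's value
  }

  void countPendingReplica() {
    scanPendingCaching++;
  }

  void endRescan() {
    numPendingCaching.set(scanPendingCaching); // publish in one step
  }

  long getPendingCachingCount() {
    return numPendingCaching.get();
  }
}
```

With this shape, readers keep seeing the last completed scan's count while a new scan is in progress, whether or not the rescan thread holds the FSN lock throughout.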



[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-14 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935758#comment-13935758
 ] 

Arpit Agarwal commented on HDFS-6093:
-

Hi Andrew,

I just tried out your patch and I think there is some mismatch between the 
output of {{dfsAdmin -report}} and {{cacheadmin -listPools}}.

This is with a single NN/single DN pseudocluster on Centos 6.5.

I ran the following commands:
- bin/hdfs cacheadmin -addPool pool1 -limit 1073741824
- bin/hdfs cacheadmin -addDirective -path /f1 -pool pool1

This says FILES_CACHED is zero.
{code}
$ bin/hdfs cacheadmin -listPools -stats
Found 1 result.
NAME   OWNER     GROUP     MODE        LIMIT       MAXTTL  BYTES_NEEDED  BYTES_CACHED  BYTES_OVERLIMIT  FILES_NEEDED  FILES_CACHED
pool1  aagarwal  aagarwal  rwxr-xr-x   1073741824  never   1048576       0             0                1             0
{code}

However this says cache used is 1MB. 
{code}
aagarwal@arrow ~/deploy2/hadoop-3.0.0-SNAPSHOT$ bin/hdfs dfsadmin -report
Configured Capacity: 49202208768 (45.82 GB)
Present Capacity: 39676268544 (36.95 GB)
DFS Remaining: 39675179008 (36.95 GB)
DFS Used: 1089536 (1.04 MB)
DFS Used%: 0.00%

Configured Cache Capacity: 268435456 (256 MB)
Present Cache Capacity: 268435456 (256 MB)
Cache Remaining: 267386880 (255 MB)
Cache Used: 1048576 (1 MB)
Cache Used%: 0.39%
{code}

I did not see any error messages related to caching in the DN/NN logs.
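
As a quick sanity check, the cache figures in the report above are internally consistent. The sketch below recomputes "Cache Remaining" and "Cache Used%" from the capacity and used values. The helper class is hypothetical and only reproduces the arithmetic, not how dfsadmin formats its output.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the derived cache figures.
final class CacheReportMath {
  /** Remaining bytes = capacity minus used. */
  static long remaining(long capacityBytes, long usedBytes) {
    return capacityBytes - usedBytes;
  }

  /** Used percentage with two decimal places, e.g. "0.39%". */
  static String usedPercent(long capacityBytes, long usedBytes) {
    return String.format(Locale.ROOT, "%.2f%%", 100.0 * usedBytes / capacityBytes);
  }
}
```

With a capacity of 268435456 bytes and 1048576 bytes used, this gives 267386880 bytes remaining and 0.39% used, matching the report. The 1 MB in use also equals the pool's BYTES_NEEDED, so the datanode side has already cached the file even though listPools has not caught up.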



[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935771#comment-13935771
 ] 

Colin Patrick McCabe commented on HDFS-6093:


Hi Arpit,

It takes time for the values reported by dfsAdmin -report and cacheadmin 
-listPools to converge, since dfsAdmin comes from information taken from the DN 
heartbeat, and listPools comes from information taken from the 
CacheReplicationMonitor.  Try waiting 5 or 10 minutes.  We might want to 
shorten the default for {{dfs.namenode.path.based.cache.retry.interval.ms}} for 
this reason.



[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-14 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935770#comment-13935770
 ] 

Andrew Wang commented on HDFS-6093:
---

Hey Arpit,

So the confusing thing about these stats is how the pool and directive stats 
are only updated when the CacheReplicationMonitor runs (default every 5 mins). 
The datanode-level stats are updated on the heartbeat, so much more frequently. I 
think if you wait for a CRM run, it'll then show up in listPools.

I was considering lowering the default CRM interval for this reason, maybe to 1 
min or 30s.



[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935777#comment-13935777
 ] 

Colin Patrick McCabe commented on HDFS-6093:


bq. I was considering lowering the default CRM interval for this reason, maybe 
to 1 min or 30s, for this reason.

Yeah, maybe we should set it to 30 seconds for now to get a better user 
experience.  We can always raise it if a performance issue emerges on a big 
cluster.



[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935791#comment-13935791
 ] 

Colin Patrick McCabe commented on HDFS-6093:


sorry, meant to write dfs.namenode.path.based.cache.refresh.interval.ms



[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935797#comment-13935797
 ] 

Colin Patrick McCabe commented on HDFS-6093:


I filed HDFS-6106 to reduce the defaults a bit.



[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-14 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935858#comment-13935858
 ] 

Arpit Agarwal commented on HDFS-6093:
-

Thanks Andrew/Colin, the values did converge! 

wrt the patch:
# In addition to reducing the timeout as you suggested, can we add some 
explanation to the command output, or update CentralizedCacheManagement.html in 
the docs? Additionally does it make sense to display the pending 
caching/uncaching counts in the output of 'dfsadmin -report'? This would make 
it clear right away that there are some pending cache operations.
# {{CacheReplicationMonitor#rescan}} resets the counters to zero outside the 
write lock. It should be moved inside the lock else readers might see blips 
with the counters intermittently going to zero.
# Was {{stillPendingUncached}} introduced to fix a bug?

Minor code style comment: {{getPendingCachingCount}} can be condensed to 
{code}
return (monitor != null ? monitor.getPendingCachingCount() : 0);
{code}

Same with {{getPendingUncachingCount}}.

Change looks good otherwise.
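
The second point in the list above can be sketched as follows, assuming a {{ReentrantReadWriteLock}} as a stand-in for the FSN lock and hypothetical class and method names. Because the reset and the recount both happen while the write lock is held, a reader holding the read lock sees either the previous total or the new one, never a transient zero.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; the lock stands in for the FSN lock.
final class LockedRescanSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private long numPendingCaching;

  /** Recounts pending replicas; pendingFound simulates the scan result. */
  void rescan(int pendingFound) {
    lock.writeLock().lock();
    try {
      numPendingCaching = 0; // reset inside the lock, not before taking it
      for (int i = 0; i < pendingFound; i++) {
        numPendingCaching++;
      }
    } finally {
      lock.writeLock().unlock();
    }
  }

  long getPendingCachingCount() {
    lock.readLock().lock();
    try {
      return numPendingCaching;
    } finally {
      lock.readLock().unlock();
    }
  }
}
```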

 Expose more caching information for debugging by users
 --

 Key: HDFS-6093
 URL: https://issues.apache.org/jira/browse/HDFS-6093
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: caching
Affects Versions: 2.4.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-6093-1.patch


 When users submit a new cache directive, it's unclear if the NN has 
 recognized it and is actively trying to cache it, or if it's hung for some 
 other reason. It'd be nice to expose a pending caching/uncaching count the 
 same way we expose pending replication work.
 It'd also be nice to display the aggregate cache capacity and usage in 
 dfsadmin -report, since we already have it as a metric and expose it 
 per-DN in report output.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-11 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931265#comment-13931265
 ] 

Arpit Agarwal commented on HDFS-6093:
-

+1 for the idea, thanks Andrew! :-)

I'll try to review this tomorrow if no one else has got to it by then.



[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users

2014-03-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931375#comment-13931375
 ] 

Hadoop QA commented on HDFS-6093:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634074/hdfs-6093-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6378//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6378//console

This message is automatically generated.
