[jira] [Created] (HDFS-11421) Make WebHDFS' ACLs RegEx configurable

2017-02-16 Thread Harsh J (JIRA)
Harsh J created HDFS-11421:
--

 Summary: Make WebHDFS' ACLs RegEx configurable
 Key: HDFS-11421
 URL: https://issues.apache.org/jira/browse/HDFS-11421
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: webhdfs
Reporter: Harsh J
Assignee: Harsh J


Part of HDFS-5608 added support for GET/SET ACLs over WebHDFS. This currently 
identifies the passed arguments via a hard-coded regex that mandates certain 
group and user naming styles.

A similar limitation existed earlier for CHOWN and other user/group-setting 
operations of WebHDFS, where it was made configurable via 
HDFS-11391 + HDFS-4983.

Such configurability should be allowed for the ACL operations too.
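
For illustration, a minimal sketch of the intended change; the property key and 
the fallback regex below are assumptions for this sketch, not taken from any 
actual patch:

{code}
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configuration;

class ConfigurableAclPattern {
  private static volatile Pattern aclPermissionPattern;

  // Assumed key name and illustrative default; the real values would come
  // from the eventual patch.
  static void init(Configuration conf) {
    String regex = conf.get("dfs.webhdfs.acl.provider.permission.pattern",
        "^(default:)?(user|group|mask|other):[\\w.-]*:([rwx-]{3})?(,.*)?$");
    aclPermissionPattern = Pattern.compile(regex);
  }

  static boolean isValidAclSpec(String aclSpec) {
    return aclPermissionPattern.matcher(aclSpec).matches();
  }
}
{code}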



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)




[jira] [Resolved] (HDFS-2569) DN decommissioning quirks

2016-11-24 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-2569.
---
Resolution: Cannot Reproduce
  Assignee: (was: Harsh J)

Cannot quite reproduce this on current versions.

> DN decommissioning quirks
> -
>
> Key: HDFS-2569
> URL: https://issues.apache.org/jira/browse/HDFS-2569
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 0.23.0
>Reporter: Harsh J
>
> Decommissioning a node behaves slightly oddly in 0.23+:
> The steps I did:
> - Start HDFS via {{hdfs namenode}} and {{hdfs datanode}}. 1-node cluster.
> - Zero files/blocks, so I go ahead and exclude-add my DN and do {{hdfs 
> dfsadmin -refreshNodes}}
> - I see the following log in NN tails, which is fine:
> {code}
> 11/11/20 09:28:10 INFO util.HostsFileReader: Setting the includes file to 
> 11/11/20 09:28:10 INFO util.HostsFileReader: Setting the excludes file to 
> build/test/excludes
> 11/11/20 09:28:10 INFO util.HostsFileReader: Refreshing hosts 
> (include/exclude) list
> 11/11/20 09:28:10 INFO util.HostsFileReader: Adding 192.168.1.23 to the list 
> of hosts from build/test/excludes
> {code}
> - However, DN log tail gets no new messages. DN still runs.
> - The dfshealth.jsp page shows this table, which makes no sense -- why is 
> there 1 live and 1 dead?:
> |Live Nodes|1 (Decommissioned: 1)|
> |Dead Nodes|1 (Decommissioned: 0)|
> |Decommissioning Nodes|0|
> - The live nodes page shows this, meaning DN is still up and heartbeating but 
> is decommissioned:
> |Node|Last Contact|Admin State|
> |192.168.1.23|0|Decommissioned|
> - The dead nodes page shows this, and the link to the DN is broken because the 
> port is linked as -1. Also, showing 'false' for decommissioned makes no sense 
> when the live nodes page shows that it is already decommissioned:
> |Node|Decommissioned|
> |192.168.1.23|false|
> Investigating if this is a quirk only observed when the DN had 0 blocks on it 
> in sum total.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Created] (HDFS-11012) Unnecessary INFO logging on DFSClients for InvalidToken

2016-10-14 Thread Harsh J (JIRA)
Harsh J created HDFS-11012:
--

 Summary: Unnecessary INFO logging on DFSClients for InvalidToken
 Key: HDFS-11012
 URL: https://issues.apache.org/jira/browse/HDFS-11012
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.5.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


In situations where a DFSClient would receive an InvalidToken exception (as 
described at [1]), a single retry is automatically made (as observed at [2]). 
However, we still print an INFO message into the DFSClient's logger even though 
the message is expected in some scenarios. This should ideally be a DEBUG level 
message to avoid confusion.

If the retry or the retried attempt fails, the final clause handles it anyway 
and prints out a proper WARN (as seen at [3]) so the INFO is unnecessary.

[1] - 
https://github.com/apache/hadoop/blob/release-2.7.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L1330-L1356
[2] - 
https://github.com/apache/hadoop/blob/release-2.7.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L649-L651
 and 
https://github.com/apache/hadoop/blob/release-2.7.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L1163-L1170
[3] - 
https://github.com/apache/hadoop/blob/release-2.7.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L652-L658
 and 
https://github.com/apache/hadoop/blob/release-2.7.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L1171-L1177
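
For illustration only, a rough sketch of the suggested level change; the class, 
method and message text here are placeholders, not the actual DFSInputStream 
code:

{code}
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class TokenRetryLogging {
  private static final Log LOG = LogFactory.getLog(TokenRetryLogging.class);

  void onInvalidToken(IOException cause, boolean willRetry) {
    if (willRetry) {
      // Expected case: a single automatic retry follows, so DEBUG is enough.
      LOG.debug("Access token was invalid; fetching a new one and retrying", cause);
    } else {
      // The retried attempt also failed: worth a WARN, as the final clause does today.
      LOG.warn("Failed to connect even after refetching the access token", cause);
    }
  }
}
{code}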



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Resolved] (HDFS-6542) WebHDFSFileSystem doesn't transmit desired checksum type

2016-03-30 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-6542.
---
Resolution: Duplicate

I missed this JIRA when searching before I filed HDFS-10237, but now noticed 
via association to HADOOP-8240.

Since I've already posted a patch on HDFS-10237 and there's no ongoing 
work/assignee here, am marking this as a duplicate of HDFS-10237.

Sorry for the extra noise!

> WebHDFSFileSystem doesn't transmit desired checksum type
> 
>
> Key: HDFS-6542
> URL: https://issues.apache.org/jira/browse/HDFS-6542
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: Andrey Stepachev
>Priority: Minor
>
> Currently DFSClient has the ability to specify the desired checksum type. This 
> behaviour is controlled by the dfs.checksum.type parameter, settable by the 
> client. It works with the hdfs:// filesystem, but does not work with webhdfs. 
> It fails because webhdfs uses the default checksum type initialised by the 
> server-side instance of DFSClient.
> For example, https://issues.apache.org/jira/browse/HADOOP-8240 does not work 
> with webhdfs.
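
For reference, a minimal client-side example of the behaviour described above; 
the URI argument and path are placeholders:

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumTypeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("dfs.checksum.type", "CRC32C");   // client's desired checksum type

    // Takes effect for an hdfs:// URI; per this report the setting is
    // ignored for a webhdfs:// URI, where the server-side default wins.
    FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
    Path p = new Path("/tmp/checksum-example");
    fs.create(p).close();
    System.out.println(fs.getFileChecksum(p));
  }
}
{code}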



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9949) Testcase for catching DN UUID regeneration regression

2016-03-12 Thread Harsh J (JIRA)
Harsh J created HDFS-9949:
-

 Summary: Testcase for catching DN UUID regeneration regression
 Key: HDFS-9949
 URL: https://issues.apache.org/jira/browse/HDFS-9949
 Project: Hadoop HDFS
  Issue Type: Test
Affects Versions: 2.6.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


In the following scenario, in releases without HDFS-8211, the DN may regenerate 
its UUIDs unintentionally.

0. Consider a DN with two disks {{/data1/dfs/dn,/data2/dfs/dn}}
1. Stop DN
2. Unmount the second disk, {{/data2/dfs/dn}}
3. Create (in the scenario, this was an accident) /data2/dfs/dn on the root path
4. Start DN
5. DN now considers /data2/dfs/dn empty so formats it, but during the format it 
uses {{datanode.getDatanodeUuid()}} which is null until register() is called.
> 6. As a result, after the directory loading, {{datanode.checkDatanodeUuid()}} 
> gets called with its condition satisfied, and it generates a new UUID 
> which is written to all disks: {{/data1/dfs/dn/current/VERSION}} and 
> {{/data2/dfs/dn/current/VERSION}}.
7. Stop DN (in the scenario, this was when the mistake of unmounted disk was 
realised)
8. Mount the second disk back again {{/data2/dfs/dn}}, causing the {{VERSION}} 
file to be the original one again on it (mounting masks the root path that we 
last generated upon).
> 9. DN fails to start up because it finds mismatched UUIDs between the two disks

> The DN should not generate a new UUID if one of the storage disks already holds 
> the older one.

HDFS-8211 unintentionally fixes this by changing the 
{{datanode.getDatanodeUuid()}} function to rely on the {{DataStorage}} 
representation of the UUID vs. the {{DatanodeID}} object which only gets 
available (non-null) _after_ the registration.

It'd still be good to add a direct test case to the above scenario that passes 
on trunk and branch-2, but fails on branch-2.7 and lower, so we can catch a 
regression around this in future.
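
A minimal sketch of the guard being argued for, using made-up method and 
parameter names rather than the actual DataNode/DataStorage code:

{code}
class DatanodeUuidGuard {
  // If any already-loaded volume carries a UUID, reuse it; only generate a
  // fresh one when no volume has an identity yet (a genuine first start).
  static String resolveDatanodeUuid(java.util.List<String> perVolumeUuids) {
    for (String uuid : perVolumeUuids) {
      if (uuid != null && !uuid.isEmpty()) {
        return uuid;
      }
    }
    return java.util.UUID.randomUUID().toString();
  }
}
{code}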



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-8475) Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no length prefix available

2016-03-09 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-8475.
---
Resolution: Not A Bug

I don't see a bug reported here - the report says the write was done with a 
single replica and that the single replica was manually corrupted.

Please post to u...@hadoop.apache.org for problems observed in usage.

If you plan to reopen this, please post precise steps of how the bug may be 
reproduced.

I'd recommend looking at your NN and DN logs to trace further on what's 
happening.

> Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no 
> length prefix available
> 
>
> Key: HDFS-8475
> URL: https://issues.apache.org/jira/browse/HDFS-8475
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Vinod Valecha
>Priority: Blocker
>
> Scenario:
> =
> write a file
> corrupt block manually
> Exception stack trace- 
> 2015-05-24 02:31:55.291 INFO [T-33716795] 
> [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] Exception in 
> createBlockOutputStream
> java.io.EOFException: Premature EOF: no length prefix available
> at 
> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1155)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
> [5/24/15 2:31:55:291 UTC] 02027a3b DFSClient I 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer createBlockOutputStream 
> Exception in createBlockOutputStream
>  java.io.EOFException: Premature EOF: no 
> length prefix available
> at 
> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1155)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
> 2015-05-24 02:31:55.291 INFO [T-33716795] 
> [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] Abandoning 
> BP-176676314-10.108.106.59-1402620296713:blk_1404621403_330880579
> [5/24/15 2:31:55:291 UTC] 02027a3b DFSClient I 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer nextBlockOutputStream 
> Abandoning BP-176676314-10.108.106.59-1402620296713:blk_1404621403_330880579
> 2015-05-24 02:31:55.299 INFO [T-33716795] 
> [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] Excluding datanode 
> 10.108.106.59:50010
> [5/24/15 2:31:55:299 UTC] 02027a3b DFSClient I 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer nextBlockOutputStream 
> Excluding datanode 10.108.106.59:50010
> 2015-05-24 02:31:55.300 WARNING [T-33716795] 
> [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /var/db/opera/files/B4889CCDA75F9751DDBB488E5AAB433E/BE4DAEF290B7136ED6EF3D4B157441A2/BE4DAEF290B7136ED6EF3D4B157441A2-4.pag
>  could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
> [5/24/15 2:31:55:300 UTC] 02027a3b DFSClient W 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer run DataStreamer Exception
>  
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /var/db/opera/files/B4889CCDA75F9751DDBB488E5AAB433E/BE4DAEF290B7136ED6EF3D4B157441A2/BE4DAEF290B7136ED6EF3D4B157441A2-4.pag
>  could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
> at 
> 

[jira] [Resolved] (HDFS-8298) HA: NameNode should not shut down completely without quorum, doesn't recover from temporary network outages

2015-11-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-8298.
---
Resolution: Invalid

Closing out - for specific identified improvements (such as log improvements, 
or ideas about improving unclear root-causing), please log a more direct JIRA.

> HA: NameNode should not shut down completely without quorum, doesn't recover 
> from temporary network outages
> ---
>
> Key: HDFS-8298
> URL: https://issues.apache.org/jira/browse/HDFS-8298
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ha, HDFS, namenode, qjm
>Affects Versions: 2.6.0
>Reporter: Hari Sekhon
>
> In an HDFS HA setup, if there is a temporary problem contacting the journal 
> nodes (e.g. a network interruption), the NameNode shuts down entirely, when it 
> should instead go into a standby mode so that it can stay online and retry 
> to achieve quorum later.
> If both NameNodes shut themselves off like this then even after the temporary 
> network outage is resolved, the entire cluster remains offline indefinitely 
> until operator intervention, whereas it could have self-repaired after 
> re-contacting the journalnodes and re-achieving quorum.
> {code}2015-04-15 15:59:26,900 FATAL namenode.FSEditLog 
> (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: flush failed for 
> required journal (JournalAndStre
> am(mgr=QJM to [:8485, :8485, :8485], stream=QuorumOutputStream 
> starting at txid 54270281))
> java.io.IOException: Interrupted waiting 2ms for a quorum of nodes to 
> respond.
> at 
> org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:134)
> at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
> at 
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
> at 
> org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
> at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:639)
> at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:388)
> at java.lang.Thread.run(Thread.java:745)
> 2015-04-15 15:59:26,901 WARN  client.QuorumJournalManager 
> (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at 
> txid 54270281
> 2015-04-15 15:59:26,904 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - 
> Exiting with status 1
> 2015-04-15 15:59:27,001 INFO  namenode.NameNode (StringUtils.java:run(659)) - 
> SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NameNode at /
> /{code}
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-6674) UserGroupInformation.loginUserFromKeytab will hang forever if keytab file length is less than 6 byte.

2015-09-25 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-6674.
---
Resolution: Invalid

The hang, if still valid, appears to be a fault in the underlying Java 
libraries. There is nothing HDFS can control about this; the bug 
instead needs to be reported to the Oracle/OpenJDK communities with a 
test case.

> UserGroupInformation.loginUserFromKeytab will hang forever if keytab file 
> length  is less than 6 byte.
> --
>
> Key: HDFS-6674
> URL: https://issues.apache.org/jira/browse/HDFS-6674
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.0.1-alpha
>Reporter: liuyang
>Priority: Minor
>
> The jstack is as follows:
>java.lang.Thread.State: RUNNABLE
>   at java.io.FileInputStream.available(Native Method)
>   at java.io.BufferedInputStream.available(BufferedInputStream.java:399)
>   - locked <0x000745585330> (a 
> sun.security.krb5.internal.ktab.KeyTabInputStream)
>   at sun.security.krb5.internal.ktab.KeyTab.load(KeyTab.java:257)
>   at sun.security.krb5.internal.ktab.KeyTab.<init>(KeyTab.java:97)
>   at sun.security.krb5.internal.ktab.KeyTab.getInstance0(KeyTab.java:124)
>   - locked <0x000745586560> (a java.lang.Class for 
> sun.security.krb5.internal.ktab.KeyTab)
>   at sun.security.krb5.internal.ktab.KeyTab.getInstance(KeyTab.java:157)
>   at javax.security.auth.kerberos.KeyTab.takeSnapshot(KeyTab.java:119)
>   at 
> javax.security.auth.kerberos.KeyTab.getEncryptionKeys(KeyTab.java:192)
>   at 
> javax.security.auth.kerberos.JavaxSecurityAuthKerberosAccessImpl.keyTabGetEncryptionKeys(JavaxSecurityAuthKerberosAccessImpl.java:36)
>   at 
> sun.security.jgss.krb5.Krb5Util.keysFromJavaxKeyTab(Krb5Util.java:381)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:701)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:584)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:784)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:203)
>   at javax.security.auth.login.LoginContext$5.run(LoginContext.java:721)
>   at javax.security.auth.login.LoginContext$5.run(LoginContext.java:719)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:718)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:590)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:679)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-4224) The dncp_block_verification log can be compressed

2015-09-15 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4224.
---
Resolution: Invalid

Invalid after HDFS-7430

> The dncp_block_verification log can be compressed
> -
>
> Key: HDFS-4224
> URL: https://issues.apache.org/jira/browse/HDFS-4224
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Priority: Minor
>
> On some systems, I noticed that when the scanner runs, the 
> dncp_block_verification.log.curr file under the block pool gets quite large 
> (several GBs). Although this is rolled away, we could also configure 
> compression upon it (a codec that may work without natives, would be a good 
> default) and save on I/O and space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-237) Better handling of dfsadmin command when namenode is slow

2015-09-06 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-237.
--
Resolution: Later

This older JIRA is a bit stale given the multiple changes that went into the 
RPC side. Follow HADOOP-9640 and related JIRAs instead for more recent work.

bq. a separate rpc queue

This is supported today via the servicerpc-address configs (typically set to 
8022, and strongly recommended for HA modes).
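
For reference, a minimal example of pointing a configuration at the separate 
service RPC endpoint mentioned above; the hostname is a placeholder:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class ServiceRpcAddressExample {
  public static void main(String[] args) {
    Configuration conf = new HdfsConfiguration();
    // Daemons and admin commands use this endpoint, keeping them off the
    // (possibly congested) client RPC queue.
    conf.set("dfs.namenode.servicerpc-address", "nn1.example.com:8022");
    System.out.println(conf.get("dfs.namenode.servicerpc-address"));
  }
}
{code}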

> Better handling of dfsadmin command when namenode is slow
> -
>
> Key: HDFS-237
> URL: https://issues.apache.org/jira/browse/HDFS-237
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Koji Noguchi
>
> Probably when hitting HADOOP-3810, Namenode became unresponsive.  Large time 
> spent in GC.
> All dfs/dfsadmin command were timing out.
> WebUI was coming up after waiting for a long time.
> Maybe setting a long timeout would have made the dfsadmin command go through.
> But it would be nice to have a separate queue/handler which doesn't compete 
> with regular rpc calls.
> All I wanted to do was dfsadmin -safemode enter, dfsadmin -finalizeUpgrade ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8516) The 'hdfs crypto -listZones' should not print an extra newline at end of output

2015-06-02 Thread Harsh J (JIRA)
Harsh J created HDFS-8516:
-

 Summary: The 'hdfs crypto -listZones' should not print an extra 
newline at end of output
 Key: HDFS-8516
 URL: https://issues.apache.org/jira/browse/HDFS-8516
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


It currently prints an extra newline (TableListing already adds a newline to the 
end of the table string).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7306) can't decommission w/under construction blocks

2015-04-01 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-7306.
---
Resolution: Duplicate

This should be resolved via HDFS-5579.

 can't decommission w/under construction blocks
 --

 Key: HDFS-7306
 URL: https://issues.apache.org/jira/browse/HDFS-7306
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Allen Wittenauer

 We need a way to decommission a node with open blocks.  Now that HDFS 
 supports append, this should be do-able.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-3349) DFSAdmin fetchImage command should initialize security credentials

2015-03-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-3349.
---
  Resolution: Cannot Reproduce
Target Version/s:   (was: 2.0.0-alpha)

Trying this without credentials throws the proper response back (No tgt). I 
think this is stale given Aaron's comment as well; marking as resolved.

 DFSAdmin fetchImage command should initialize security credentials
 --

 Key: HDFS-3349
 URL: https://issues.apache.org/jira/browse/HDFS-3349
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Priority: Minor

 The `hdfs dfsadmin -fetchImage' command should fetch the fsimage using the 
 appropriate credentials if security is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-2360) Ugly stacktrace when quota exceeds

2015-03-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-2360.
---
Resolution: Not a Problem

The last line of the command output (excluding the log and its stack trace from 
the WARN) does today print the base reason, which should catch the eye 
clearly:

{code}
put: The DiskSpace quota of /testDir is exceeded: quota = 1024 B = 1 KB but 
diskspace consumed = 402653184 B = 384 MB
{code}

Resolving this, as it should be clear enough. To get rid of the WARN, the client 
logger can be nullified, but I think the catch layer is too generic today to 
turn this particular message off without causing other impact (for other 
use-cases and troubles).

As always though, feel free to reopen with any counter-point.
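
For reference, one way to nullify the client logger as mentioned above, assuming 
the log4j 1.x setup used by the Hadoop 2.x shell commands; note that this 
silences all DFSClient WARNs, not just the quota one, which is exactly the 
trade-off described:

{code}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class SilenceDfsClientWarn {
  public static void main(String[] args) {
    // Raise the DFSClient logger threshold so the DataStreamer WARN (and its
    // stack trace) is suppressed; only the final "quota exceeded" line remains.
    Logger.getLogger("org.apache.hadoop.hdfs.DFSClient").setLevel(Level.ERROR);
    // ... perform the put/copy from here ...
  }
}
{code}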

 Ugly stacktrace when quota exceeds
 --

 Key: HDFS-2360
 URL: https://issues.apache.org/jira/browse/HDFS-2360
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 0.23.0
Reporter: Rajit Saha
Priority: Minor

 Would it be better to catch the exception and show a small, reasonable message 
 to the user when they exceed the quota?
 $hdfs  dfs -mkdir testDir
 $hdfs  dfsadmin -setSpaceQuota 191M  testDir
 $hdfs dfs -count -q testDir
 none  inf  200278016  200278016  1  0  0 
 hdfs://<NN hostname:port>/user/hdfsqa/testDir
 $hdfs dfs -put /etc/passwd /user/hadoopqa/testDir 
 11/09/19 08:08:15 WARN hdfs.DFSClient: DataStreamer Exception
 org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota 
 of /user/hdfsqa/testDir is exceeded:
 quota=191.0m diskspace consumed=768.0m
 at 
 org.apache.hadoop.hdfs.server.namenode.INodeDirectoryWithQuota.verifyQuota(INodeDirectoryWithQuota.java:159)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:1609)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1383)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:370)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.allocateBlock(FSNamesystem.java:1681)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1476)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:389)
 at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:365)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1496)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1492)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1490)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
 Method)
 at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
 at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1100)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:972)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:454)
 Caused by: org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The 
 DiskSpace quota of /user/hdfsqa/testDir is
 exceeded: quota=191.0m diskspace consumed=768.0m
 at 
 org.apache.hadoop.hdfs.server.namenode.INodeDirectoryWithQuota.verifyQuota(INodeDirectoryWithQuota.java:159)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:1609)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1383)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:370)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.allocateBlock(FSNamesystem.java:1681)
 at 
 

[jira] [Resolved] (HDFS-5740) getmerge file system shell command needs error message for user error

2015-03-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-5740.
---
Resolution: Not a Problem

This is no longer an issue on branch-2 and trunk today. The command accepts a 
collection of files now, and prepares the output accordingly.

 getmerge file system shell command needs error message for user error
 -

 Key: HDFS-5740
 URL: https://issues.apache.org/jira/browse/HDFS-5740
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 1.1.2
 Environment: {noformat}[jpfuntner@h58 tmp]$ cat /etc/redhat-release
 Red Hat Enterprise Linux Server release 6.0 (Santiago)
 [jpfuntner@h58 tmp]$ hadoop version
 Hadoop 1.1.2.21
 Subversion  -r 
 Compiled by jenkins on Thu Jan 10 03:38:39 PST 2013
 From source with checksum ce0aa0de785f572347f1afee69c73861{noformat}
Reporter: John Pfuntner
Priority: Minor

 I naively tried a {{getmerge}} operation but it didn't seem to do anything 
 and there was no error message:
 {noformat}[jpfuntner@h58 tmp]$ hadoop fs -mkdir /user/jpfuntner/tmp
 [jpfuntner@h58 tmp]$ num=0; while [ $num -lt 5 ]; do echo file$num | hadoop 
 fs -put - /user/jpfuntner/tmp/file$num; let num=num+1; done
 [jpfuntner@h58 tmp]$ ls -A
 [jpfuntner@h58 tmp]$ hadoop fs -getmerge /user/jpfuntner/tmp/file* files.txt
 [jpfuntner@h58 tmp]$ ls -A
 [jpfuntner@h58 tmp]$ hadoop fs -ls /user/jpfuntner/tmp
 Found 5 items
 -rw---   3 jpfuntner hdfs  6 2014-01-08 17:37 
 /user/jpfuntner/tmp/file0
 -rw---   3 jpfuntner hdfs  6 2014-01-08 17:37 
 /user/jpfuntner/tmp/file1
 -rw---   3 jpfuntner hdfs  6 2014-01-08 17:37 
 /user/jpfuntner/tmp/file2
 -rw---   3 jpfuntner hdfs  6 2014-01-08 17:37 
 /user/jpfuntner/tmp/file3
 -rw---   3 jpfuntner hdfs  6 2014-01-08 17:37 
 /user/jpfuntner/tmp/file4
 [jpfuntner@h58 tmp]$ {noformat}
 It was pointed out to me that I made a mistake and my source should have been 
 a directory not a set of regular files.  It works if I use the directory:
 {noformat}[jpfuntner@h58 tmp]$ hadoop fs -getmerge /user/jpfuntner/tmp/ 
 files.txt
 [jpfuntner@h58 tmp]$ ls -A
 files.txt  .files.txt.crc
 [jpfuntner@h58 tmp]$ cat files.txt
 file0
 file1
 file2
 file3
 file4
 [jpfuntner@h58 tmp]$ {noformat}
 I think the {{getmerge}} command should issue an error message to let the 
 user know they made a mistake.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-4494) Confusing exception for unresolvable hdfs host with security enabled

2015-03-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4494.
---
  Resolution: Done
Target Version/s: 2.1.0-beta, 3.0.0  (was: 3.0.0, 2.1.0-beta)

This seems resolved now (as of 2.6.0):

{code}
[root@host ~]# hdfs getconf -confKey hadoop.security.authentication
kerberos
[root@host ~]# hadoop fs -ls hdfs://asdfsdfsdf/
-ls: java.net.UnknownHostException: asdfsdfsdf
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [path ...]
{code}

Marking as Done.

 Confusing exception for unresolvable hdfs host with security enabled
 

 Key: HDFS-4494
 URL: https://issues.apache.org/jira/browse/HDFS-4494
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Priority: Minor

 {noformat}
 $ hadoop fs -ls hdfs://unresolvable-host
 ls: Can't replace _HOST pattern since client address is null
 {noformat}
 It's misleading because it's not even related to the client's address.  It'd 
 be a bit more informative to see something like {{UnknownHostException: 
 unresolvable-host}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-4290) Expose an event listener interface in DFSOutputStreams for block write pipeline status changes

2015-03-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4290.
---
Resolution: Later

Specific problems/use-cases driving this need haven't been brought up in the 
past years. Resolving as Later for now.

 Expose an event listener interface in DFSOutputStreams for block write 
 pipeline status changes
 --

 Key: HDFS-4290
 URL: https://issues.apache.org/jira/browse/HDFS-4290
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor

 I've noticed HBase periodically polls the current status of block replicas 
 for its HLog files via the API presented by HDFS-826.
 It would perhaps be better for such clients if they could register a listener 
 instead. The listener(s) can be sent an event when things change in the 
 last open block (e.g. a DN falls and no replacement is found). This 
 would avoid a periodic, parallel polling loop in such clients and be 
 more efficient.
 Just a thought :)
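
To make the idea concrete, a purely hypothetical listener shape; no such 
interface exists in the current API:

{code}
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public interface PipelineListener {
  /** Called when the write pipeline of the last open block changes. */
  void onPipelineChange(String src, DatanodeInfo[] newPipeline);

  /** Called when a datanode drops out and no replacement could be found. */
  void onReplicaLoss(String src, DatanodeInfo failedNode);
}
{code}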



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-3621) Add a main method to HdfsConfiguration, for debug purposes

2015-03-16 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-3621.
---
Resolution: Won't Fix

Thanks for the work Plamen!

 Add a main method to HdfsConfiguration, for debug purposes
 --

 Key: HDFS-3621
 URL: https://issues.apache.org/jira/browse/HDFS-3621
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Assignee: Plamen Jeliazkov
Priority: Trivial
  Labels: newbie
 Attachments: HDFS-3621.patch


 Just like Configuration has a main() func that dumps XML out for debug 
 purposes, we should have a similar function under the HdfsConfiguration class 
 that does the same. This is useful in testing out app classpath setups at 
 times.
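
A rough sketch of what such a main() could look like (illustrative only, since 
this was closed as Won't Fix):

{code}
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class DumpHdfsConfiguration {
  public static void main(String[] args) throws Exception {
    // Loads hdfs-default.xml/hdfs-site.xml and dumps the merged XML, mirroring
    // what org.apache.hadoop.conf.Configuration's own main() does.
    new HdfsConfiguration().writeXml(System.out);
  }
}
{code}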



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7899) Improve EOF error message

2015-03-06 Thread Harsh J (JIRA)
Harsh J created HDFS-7899:
-

 Summary: Improve EOF error message
 Key: HDFS-7899
 URL: https://issues.apache.org/jira/browse/HDFS-7899
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.6.0
Reporter: Harsh J
Priority: Minor


Currently, a DN disconnection for reasons other than a connection timeout or 
a connection-refused error, such as an EOF as a result of rejection or another 
network fault, is reported in this manner:

{code}
WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /x.x.x.x: for 
block, add to deadNodes and continue. java.io.EOFException: Premature EOF: no 
length prefix available 
java.io.EOFException: Premature EOF: no length prefix available 
at 
org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
 
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:392)
 
at 
org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:137)
 
at 
org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1103) 
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:538) 
at 
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750) 
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794) 
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:602) 
{code}

This is not very clear to a user (it WARNs at the hdfs-client). It could likely 
be improved with a more diagnosable message, or at least the direct reason 
rather than a bare EOF.
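
For illustration, a sketch of the kind of wrapped message the report asks for; 
the helper and wording are made up:

{code}
import java.io.EOFException;
import java.io.IOException;
import java.net.InetSocketAddress;

class ReadErrorMessages {
  static IOException describe(EOFException cause, InetSocketAddress datanode) {
    return new IOException("Got a premature end-of-stream while setting up the "
        + "block read from " + datanode + "; the datanode may have rejected or "
        + "dropped the connection during the handshake", cause);
  }
}
{code}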



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-5688) Wire-encription in QJM

2015-03-05 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-5688.
---
Resolution: Cannot Reproduce

 Wire-encription in QJM
 --

 Key: HDFS-5688
 URL: https://issues.apache.org/jira/browse/HDFS-5688
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, journal-node, security
Affects Versions: 2.2.0
Reporter: Juan Carlos Fernandez
  Labels: security
 Attachments: core-site.xml, hdfs-site.xml, jaas.conf, journal.xml, 
 namenode.xml, ssl-client.xml, ssl-server.xml


 When HA is implemented with QJM and Kerberos, it is not possible to enable 
 wire encryption.
 If the property hadoop.rpc.protection is set to something other than 
 authentication, it doesn't work properly, giving the error:
 ERROR security.UserGroupInformation: PriviledgedActionException 
 as:principal@REALM (auth:KERBEROS) cause:javax.security.sasl.SaslException: 
 No common protection layer between client and server
 With NFS as shared storage everything works like a charm
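
For reference, the setting under discussion, shown programmatically; valid 
values are authentication, integrity and privacy:

{code}
import org.apache.hadoop.conf.Configuration;

public class RpcProtectionExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("hadoop.rpc.protection", "privacy");  // request RPC wire encryption
    System.out.println(conf.get("hadoop.rpc.protection"));
  }
}
{code}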



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7752) Improve description for dfs.namenode.num.extra.edits.retained and dfs.namenode.num.checkpoints.retained properties on hdfs-default.xml

2015-02-20 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-7752.
---
  Resolution: Fixed
   Fix Version/s: 2.7.0
Target Version/s:   (was: 2.7.0)

Thanks Wellington! I've committed this to branch-2 and trunk.

 Improve description for dfs.namenode.num.extra.edits.retained and 
 dfs.namenode.num.checkpoints.retained properties on hdfs-default.xml
 --

 Key: HDFS-7752
 URL: https://issues.apache.org/jira/browse/HDFS-7752
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.6.0
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil
Priority: Minor
 Fix For: 2.7.0

 Attachments: HDFS-7752.patch, HDFS-7752.patch


 The current description for the dfs.namenode.num.extra.edits.retained and 
 dfs.namenode.num.checkpoints.retained properties in hdfs-default.xml is not 
 clear on how many and which files will be kept in the namenode's metadata 
 directory. 
 For dfs.namenode.num.checkpoints.retained, it's not clear that it applies 
 to the number of fsimage_* files.
 For dfs.namenode.num.extra.edits.retained, it's not clear the value set 
 indirectly applies to edits_* files, and how the configured value 
 translates into the number of edit files to be kept.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7580) NN - JN communication should use reusable authentication methods

2015-01-04 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-7580.
---
Resolution: Invalid

Looking at the JDK sources there's no way to programmatically configure the KDC 
timeouts, so resolving this as invalid as there's nothing we can really do at 
our end.

I'll just make a krb5.conf change.

 NN - JN communication should use reusable authentication methods
 -

 Key: HDFS-7580
 URL: https://issues.apache.org/jira/browse/HDFS-7580
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: journal-node, namenode
Affects Versions: 2.5.0
Reporter: Harsh J

 It appears that NNs talk to JNs via general SaslRPC in secure mode, causing 
 all requests to be carried out with a kerberos authentication. This can cause 
 delays and occasionally NN failures if the KDC used does not respond in its 
 default timeout period (30s, whereas the QJM writes come with default of 20s).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7532) dncp_block_verification.log.prev too large

2015-01-01 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-7532.
---
Resolution: Duplicate

Should be eventually fixed via HDFS-7430.

Yes, you may shut down the affected DN temporarily, delete these files, and 
start it back up.

 dncp_block_verification.log.prev too large
 --

 Key: HDFS-7532
 URL: https://issues.apache.org/jira/browse/HDFS-7532
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Arti Wadhwani
Priority: Blocker

 Hi, 
 Using hadoop version: Hadoop 2.0.0-cdh4.7.0, we can see that on one datanode, 
 dncp_block_verification.log.prev is too large. 
 Is it safe to delete this file? 
 {noformat}
 -rw-r--r-- 1 hdfs hdfs 1166438426181 Oct 31 09:34 
 dncp_block_verification.log.prev
 -rw-r--r-- 1 hdfs hdfs 138576163 Dec 15 22:16 
 dncp_block_verification.log.curr
 {noformat}
 This is similar to HDFS-6114 but that is for dncp_block_verification.log.curr 
 file. 
 Thanks,
 Arti Wadhwani



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7580) NN - JN communication should use reusable authentication methods

2015-01-01 Thread Harsh J (JIRA)
Harsh J created HDFS-7580:
-

 Summary: NN - JN communication should use reusable authentication 
methods
 Key: HDFS-7580
 URL: https://issues.apache.org/jira/browse/HDFS-7580
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: journal-node, namenode
Affects Versions: 2.5.0
Reporter: Harsh J


It appears that NNs talk to JNs via general SaslRPC in secure mode, causing all 
requests to be carried out with a kerberos authentication. This can cause 
delays and occasionally NN failures if the KDC used does not respond in its 
default timeout period (30s, whereas the QJM writes come with default of 20s).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern

2014-12-18 Thread Harsh J (JIRA)
Harsh J created HDFS-7546:
-

 Summary: Document, and set an accepting default for 
dfs.namenode.kerberos.principal.pattern
 Key: HDFS-7546
 URL: https://issues.apache.org/jira/browse/HDFS-7546
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: security
Reporter: Harsh J
Priority: Minor


This config is used in the SaslRpcClient, and the lack of a default breaks 
cross-realm trust principals being used at clients.

Current location: 
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309

The config should be documented and the default should be set to * to preserve 
the prior-to-introduction behaviour.
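
For reference, what a client can set explicitly today when talking to a NameNode 
in a trusted foreign realm; the report asks for * to become the documented 
default:

{code}
import org.apache.hadoop.conf.Configuration;

public class CrossRealmClientConf {
  public static Configuration acceptAnyServerRealm() {
    Configuration conf = new Configuration();
    // Accept NameNode principals from any realm instead of rejecting
    // principals that do not match the local default.
    conf.set("dfs.namenode.kerberos.principal.pattern", "*");
    return conf;
  }
}
{code}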



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7501) TransactionsSinceLastCheckpoint can be negative on SBNs

2014-12-09 Thread Harsh J (JIRA)
Harsh J created HDFS-7501:
-

 Summary: TransactionsSinceLastCheckpoint can be negative on SBNs
 Key: HDFS-7501
 URL: https://issues.apache.org/jira/browse/HDFS-7501
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Harsh J
Priority: Trivial


The metric TransactionsSinceLastCheckpoint is derived as FSEditLog.txid minus 
NNStorage.mostRecentCheckpointTxId.

In Standby mode, the former does not increment beyond the loaded or 
last-when-active value, but the latter does change due to checkpoints done 
regularly in this mode. Thereby, the SBN will eventually end up showing 
negative values for TransactionsSinceLastCheckpoint.

This is not an issue as the metric only makes sense to be monitored on the 
Active NameNode, but we should perhaps just show the value 0 by detecting if 
the NN is in SBN state, as a negative number is confusing to view 
within a chart that tracks it.
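
For illustration, the arithmetic of the proposal as a standalone sketch; the 
names are not the real FSNamesystem ones:

{code}
class CheckpointMetricSketch {
  static long transactionsSinceLastCheckpoint(long lastWrittenTxId,
      long mostRecentCheckpointTxId, boolean isStandby) {
    long delta = lastWrittenTxId - mostRecentCheckpointTxId;
    // On a standby the checkpoint txid keeps advancing while the write txid
    // does not, so the raw delta can go negative; report 0 there instead.
    return isStandby ? Math.max(0L, delta) : delta;
  }
}
{code}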



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7290) Add HTTP response code to the HttpPutFailedException message

2014-10-25 Thread Harsh J (JIRA)
Harsh J created HDFS-7290:
-

 Summary: Add HTTP response code to the HttpPutFailedException 
message
 Key: HDFS-7290
 URL: https://issues.apache.org/jira/browse/HDFS-7290
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.5.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


If the TransferFsImage#uploadImageFromStorage(…) call fails for some reason, we 
try to print back the reason of the connection failure.

We currently only grab connection.getResponseMessage(…) and use that as our 
exception's lone string, but this can often be empty if there was no real 
response message from the connection end. However, the failures always have a 
code, so we should also ensure to print the error code returned, for at least a 
partial hint.
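
A sketch of the suggested message construction (not the committed patch); it 
simply appends the numeric status to whatever response message exists:

{code}
import java.io.IOException;
import java.net.HttpURLConnection;

class PutFailureReporting {
  static IOException httpPutFailed(HttpURLConnection connection) throws IOException {
    int code = connection.getResponseCode();
    String reason = connection.getResponseMessage();  // may be null or empty
    return new IOException("Image upload failed with HTTP status " + code
        + ((reason == null || reason.isEmpty()) ? "" : " (" + reason + ")"));
  }
}
{code}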



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-3534) LeaseExpiredException on NameNode if file is moved while being created.

2014-03-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-3534.
---

Resolution: Not A Problem

As explained in above comments, this is expected behaviour. Resolving.

 LeaseExpiredException on NameNode if file is moved while being created.
 ---

 Key: HDFS-3534
 URL: https://issues.apache.org/jira/browse/HDFS-3534
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.2, 0.20.205.0
Reporter: Mitesh Singh Jat

 If a file (big_file.txt size=512MB) being created (or uploaded) on hdfs, and 
 a rename (fs -mv) of that file is done. Then following exception occurs:-
 {noformat}
 12/06/13 08:56:42 WARN hdfs.DFSClient: DataStreamer Exception: 
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
 /user/mitesh/temp/big_file.txt File does not exist. [Lease.  Holder: 
 DFSClient_-2105467303, pendingcreates: 1]
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1604)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1595)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1511)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:685)
 at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1082)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
 at org.apache.hadoop.ipc.Client.call(Client.java:1066)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
 at $Proxy6.addBlock(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
 at $Proxy6.addBlock(Unknown Source)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3324)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3188)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2300(DFSClient.java:2406)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2646)
 12/06/13 08:56:42 WARN hdfs.DFSClient: Error Recovery for block 
 blk_-5525713112321593595_679317395 bad datanode[0] nodes == null
 12/06/13 08:56:42 WARN hdfs.DFSClient: Could not get block locations. Source 
 file /user/mitesh/temp/big_file.txt - Aborting...
 ...
 {noformat}
 Whereas this issue is not seen on *Hadoop 0.23*.
 I have used following shell script to simulate the issue.
 {code:title=run_parallely.sh}
 #!/bin/bash
 hadoop=hadoop
 filename=big_file.txt
 dest=/user/mitesh/temp/$filename
 dest2=/user/mitesh/xyz/$filename
 ## Clean up
 hadoop fs -rm -skipTrash $dest
 hadoop fs -rm -skipTrash $dest2
 ## Copy big_file.txt onto hdfs
 hadoop fs -put $filename $dest > cmd1.log 2>&1 &
 ## sleep until entry is created, hoping copying is not finished
 until $(hadoop fs -test -e $dest)
 do
 sleep 1
 done
 ## Now move
 hadoop fs -mv $dest $dest2 > cmd2.log 2>&1
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-160) namenode fails to run on ppc

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-160.
--

Resolution: Cannot Reproduce

This has likely gone stale now, and no similar reports have been received (PPC 
may be the reason?) - closing this one out as Cannot Reproduce.

 namenode fails to run on ppc
 

 Key: HDFS-160
 URL: https://issues.apache.org/jira/browse/HDFS-160
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: PowerPC using Fedora 9 (all updates) and gcj-1.5.0.0
Reporter: Fabian Deutsch
Priority: Minor
 Attachments: build.log, hadoop-env.sh, hadoop-site.xml, 
 java.hprof.txt, jdb-namenode-QUIT.log, netstat.log


 Hadoop starts, but eats 100% CPU. Data- and SecondaryNameNodes cannot 
 connect. No jobs were run; I am just trying to start the daemon using 
 bin/start-dfs.sh.
 Using the same simple configuration on an x86-arch - also using Fedora 9 and 
 gcj-1.5.0.0 - works perfectly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-114) Remove code related to OP_READ_METADATA from DataNode

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-114.
--

Resolution: Duplicate

 Remove code related to OP_READ_METADATA from DataNode
 -

 Key: HDFS-114
 URL: https://issues.apache.org/jira/browse/HDFS-114
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: All
Reporter: Lohit Vijayarenu
Priority: Minor

 HADOOP-2797 removed OP_READ_METADATA. But there is code still in DataNode for 
 this. We could remove this and the corresponding datanode metrics associated 
 with it.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-156) namenode doesn't start if group id cannot be resolved to name

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-156.
--

Resolution: Duplicate

Fixed indirectly via HADOOP-4656's change set. The 'id' command is now used 
instead of 'groups' when looking up user memberships.

 namenode doesn't start if group id cannot be resolved to name
 -

 Key: HDFS-156
 URL: https://issues.apache.org/jira/browse/HDFS-156
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: Linux n510 2.6.22-3-686 #1 SMP Mon Nov 12 08:32:57 UTC 
 2007 i686 GNU/Linux
 Java:
 java version 1.5.0_14
 Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_14-b03)
 Java HotSpot(TM) Client VM (build 1.5.0_14-b03, mixed mode, sharing)
 PAM: ldap
Reporter: Andrew Gudkov
Assignee: Patrick Winters
Priority: Minor
 Attachments: groupname.patch


 The NameNode fails to start because the unix group name for my user cannot be 
 resolved. First, the system threw a rather obscure message:
 {quote}
 ERROR dfs.NameNode (NameNode.java:main(856)) - java.lang.NullPointerException
 at org.apache.hadoop.dfs.FSNamesystem.close(FSNamesystem.java:428)
 at org.apache.hadoop.dfs.FSNamesystem.init(FSNamesystem.java:237)
 at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:130)
 at org.apache.hadoop.dfs.NameNode.init(NameNode.java:175)
 at org.apache.hadoop.dfs.NameNode.init(NameNode.java:161)
 at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
 at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)
 {quote}
 I traversed through stack trace entries, and found (FSNamesystem:237) this 
 code
 {quote}
 233   FSNamesystem(NameNode nn, Configuration conf) throws IOException {
 234     try {
 235       initialize(nn, conf);
 236     } catch(IOException e) {
 237       close();
 238       throw e;
 239     }
 240   }
 {quote}
 Inserting e.printStackTrace() gave me the following:
 {quote}
 dfs.NameNodeMetrics (NameNodeMetrics.java:init(76)) - Initializing 
 NameNodeMeterics using context 
 object:org.apache.hadoop.metrics.spi.NullContext
 java.io.IOException: javax.security.auth.login.LoginException: Login failed: 
 id: cannot find name for group ID 1040
 at 
 org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
 at 
 org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:268)
 at 
 org.apache.hadoop.dfs.FSNamesystem.setConfigurationParameters(FSNamesystem.java:330)
 at 
 org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:249)
 at org.apache.hadoop.dfs.FSNamesystem.init(FSNamesystem.java:235)
 at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:130)
 at org.apache.hadoop.dfs.NameNode.init(NameNode.java:175)
 at org.apache.hadoop.dfs.NameNode.init(NameNode.java:161)
 at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
 at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)
 at 
 org.apache.hadoop.dfs.FSNamesystem.setConfigurationParameters(FSNamesystem.java:332)
 at 
 org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:249)
 at org.apache.hadoop.dfs.FSNamesystem.init(FSNamesystem.java:235)
 at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:130)
 at org.apache.hadoop.dfs.NameNode.init(NameNode.java:175)
 at org.apache.hadoop.dfs.NameNode.init(NameNode.java:161)
 at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
 at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)
 {quote}
 And this is true - command groups returns the same - id: cannot find name 
 for group ID 1040.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-184) SecondaryNameNode doCheckpoint() renames current directory before asking NameNode to rollEditLog()

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-184.
--

Resolution: Not A Problem

This doesn't appear to be a problem today, especially after the new edits and 
fsimage retention style, as rollEditLog is done first, before any other local 
operation.

Likely gone stale. Closing out for now as 'Not A Problem' (anymore).

 SecondaryNameNode doCheckpoint() renames current directory before asking 
 NameNode to rollEditLog()
 --

 Key: HDFS-184
 URL: https://issues.apache.org/jira/browse/HDFS-184
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lohit Vijayarenu
Priority: Minor

 In SecondaryNameNode, the doCheckpoint() function invokes _startCheckpoint()_ 
 before calling _namenode.rollEditLog()_.
 _startCheckpoint()_ internally invokes _CheckpointStorage::startCheckpoint()_, 
 which renames current to lastcheckpoint.tmp. If the call to the namenode fails, 
 would we then redo the above step, renaming an empty current directory in the 
 next iteration? Should we remove it only after we know the namenode has 
 successfully rolled edits?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-174) GnuWin32 coreutils df output causes DF to throw

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-174.
--

Resolution: Not A Problem

HDFS currently has proper Windows environment support without relying on local 
unix-like tools to be available.

Resolving as 'Not a Problem' (anymore).

 GnuWin32 coreutils df output causes DF to throw
 ---

 Key: HDFS-174
 URL: https://issues.apache.org/jira/browse/HDFS-174
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Albert Strasheim
Priority: Minor

 The output from GnuWin32's coreutils's df looks like this:
 C:\Program Files\GnuWin32\bin> df -k C:\hadoop-0.13.0
 Filesystem   1K-blocks  Used Available Use% Mounted on
 df: `NTFS': No such file or directory
 - 96124924  86288848   9836076  90% C:\
 This causes DF's parsing to fail with the following exception:
 Exception in thread "main" java.io.IOException: df: `NTFS': No such file or 
 directory
   at org.apache.hadoop.fs.DF.doDF(DF.java:65)
   at org.apache.hadoop.fs.DF.<init>(DF.java:54)
   at org.apache.hadoop.fs.DF.main(DF.java:168)
 Fixing this would be useful since it might allow for Hadoop to be used 
 without installing Cygwin.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-285) limit concurrent connections(data serving thread) in one datanode

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-285.
--

Resolution: Not A Problem

This has likely gone stale (probably addressed at a higher level via Raghu's 
earliest comments).

Having seen some pretty large HBase region sets on several clusters, and never 
having faced the described stack-limit OOME (though having faced the transceiver 
limits), I think this is likely no longer an issue.

Closing out as 'Not a Problem' (anymore).

 limit concurrent connections(data serving thread) in one datanode
 -

 Key: HDFS-285
 URL: https://issues.apache.org/jira/browse/HDFS-285
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Luo Ning
Priority: Minor

 I'm here after HADOOP-2341 and HADOOP-2346. In my HBase environment, the many 
 open MapFiles cause a datanode OOME (stack memory), because of the 2000+ data 
 serving threads in the datanode process.
 Although HADOOP-2346 implements timeouts, there are situations where many 
 connections get created before the read timeout (default 6 min) is reached; 
 HBase, for example, opens all its files on regionserver startup. 
 Limiting concurrent connections (data serving threads) would make the datanode 
 more stable, and I think it could be done in 
 SocketIOWithTimeout$SelectorPool#select (see the sketch below):
 1. In SelectorPool#select, record all waiting SelectorInfo instances in a 
 list at the beginning, and remove them after 'Selector#select' is done.
 2. Before the real 'select', do a limit check; if the limit is reached, close 
 the first SelectorInfo.
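 As a rough illustration of the intent only (not the actual SelectorPool change; 
 the class name and the semaphore-based approach are made up for this sketch), a 
 hard cap on concurrent data-serving threads could look like this:
 {code}
 import java.util.concurrent.Semaphore;

 // Illustrative sketch: bound the number of concurrent block-serving threads so
 // a flood of idle client connections cannot exhaust thread-stack memory.
 public class BoundedDataServerPool {
   private final Semaphore slots;

   public BoundedDataServerPool(int maxConcurrentServers) {
     this.slots = new Semaphore(maxConcurrentServers);
   }

   public void serve(final Runnable xceiver) {
     if (!slots.tryAcquire()) {
       // The proposal above would instead close the oldest waiting SelectorInfo;
       // rejecting outright keeps the sketch simple.
       throw new IllegalStateException("Too many concurrent data-serving threads");
     }
     new Thread(new Runnable() {
       public void run() {
         try {
           xceiver.run();
         } finally {
           slots.release();
         }
       }
     }).start();
   }
 }
 {code}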



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-345) DataNode to send block reports to multiple namenodes?

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-345.
--

Resolution: Implemented

This is pretty close to the HDFS HA mechanism available in current versions.

Resolving as 'Implemented'.

 DataNode to send block reports to multiple namenodes?
 -

 Key: HDFS-345
 URL: https://issues.apache.org/jira/browse/HDFS-345
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Marco Nicosia
Priority: Minor

 I have this theory that I could test the memory footprint of a new version of 
 the Hadoop namenode, without interrupting a running instance. We could shut 
 down the secondary namenode process, and run a new version of the namenode 
 code on the image file found on the secondary namenode server.
 But just running on the image file wouldn't be enough. It'd be great if I 
 could get a real feel by having all the block reports also make their way to 
 my fake namenode.
 Would it be possible for datanodes to report to two different namenodes, even 
 if only one is the active, live namenode? (I understand that this wouldn't 
 work if the format of the block report, or worse, the rpc layer, were 
 incompatible.)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-372) DataNode should reuse delBlockFromDisk

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-372.
--

Resolution: Not A Problem

The invalidation of blocks was deliberately moved onto the async disk deletion 
services to allow large delete operations without blocking other operations.

The remainder of the deletes (except for unfinalizing a block) appear to be 
special cases (missing block files while the meta file continues to exist) under 
the FSDataSet implementation, and delBlockFromDisk wouldn't apply to them.

Likely gone stale. Closing out as 'Not a Problem'.

 DataNode should reuse delBlockFromDisk
 --

 Key: HDFS-372
 URL: https://issues.apache.org/jira/browse/HDFS-372
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Hairong Kuang
Priority: Minor

 FSDataSet should reuse delBlockFromDisk where it should/can be used, like in 
 invalidateBlock.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-386) NameNode webUI should show the config it is running with.

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-386.
--

Resolution: Duplicate

 NameNode webUI should show the config it is running with.
 -

 Key: HDFS-386
 URL: https://issues.apache.org/jira/browse/HDFS-386
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Lohit Vijayarenu
Priority: Minor

 It would be good if Namenode webUI also showed the config it is running with. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-431) port fuse-dfs existing autoconf to hadoop project's autoconf infrastructure

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-431.
--

Resolution: Invalid

This is now Invalid since we've moved on to using CMake as the build framework 
instead.

 port fuse-dfs existing autoconf to hadoop project's autoconf infrastructure
 ---

 Key: HDFS-431
 URL: https://issues.apache.org/jira/browse/HDFS-431
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: fuse-dfs
Reporter: Pete Wyckoff
Priority: Minor

 Although fuse-dfs has its own autoconf macros and such, it would be better to 
 use one set of macros, and in some places the macros could be improved.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-803) eclipse-files target needs to depend on 'ivy-retrieve-test'

2014-02-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-803.
--

Resolution: Not A Problem

Resolving as 'Not a Problem' (anymore) as we've long since moved onto using 
Maven instead of ant on trunk and on the 2.x stable releases.

 eclipse-files target needs to depend on 'ivy-retrieve-test'
 ---

 Key: HDFS-803
 URL: https://issues.apache.org/jira/browse/HDFS-803
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Reporter: Konstantin Boudnik
Priority: Minor
 Attachments: hdfs-803.patch


 When {{ant eclipse-files}} is executed, only the common jars are guaranteed to 
 be pulled in. To pull the test jars, one needs to manually run {{ant 
 ivy-retrieve-test}} first.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-2892) Some of property descriptions are not given(hdfs-default.xml)

2014-01-27 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-2892.
---

  Resolution: Invalid
Target Version/s:   (was: 2.0.0-alpha, 3.0.0)

Resolving as Invalid as these were user questions.

 Some of property descriptions are not given(hdfs-default.xml) 
 --

 Key: HDFS-2892
 URL: https://issues.apache.org/jira/browse/HDFS-2892
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 0.23.0
Reporter: Brahma Reddy Battula
Priority: Trivial

 Hi. I took the 0.23.0 release from 
 http://hadoop.apache.org/common/releases.html#11+Nov%2C+2011%3A+release+0.23.0+available
 and went through all the properties provided in hdfs-default.xml. Some of the 
 property descriptions are not given. It would be better to give a description 
 of each property and its usage (how to configure it); also, only MapReduce 
 related jars are provided. Please check the following two configurations.
  *No Description*
 {noformat}
 <property>
   <name>dfs.datanode.https.address</name>
   <value>0.0.0.0:50475</value>
 </property>
 <property>
   <name>dfs.namenode.https-address</name>
   <value>0.0.0.0:50470</value>
 </property>
 {noformat}
 It would be better to mention example usage (what to configure, and in what 
 format/syntax) in the description. Here I did not get what 'default' means - 
 whether it is the name of a network interface or something else:
  <property>
    <name>dfs.datanode.dns.interface</name>
    <value>default</value>
    <description>The name of the Network Interface from which a data node 
    should report its IP address.</description>
  </property>
 The following property is commented out. If it is not supported, it would be 
 better to remove it.
 <property>
   <name>dfs.cluster.administrators</name>
   <value>ACL for the admins</value>
   <description>This configuration is used to control who can access the
   default servlets in the namenode, etc.</description>
 </property>
  A small clarification on the following property: if some value is configured, 
 will the NN stay in safe mode for up to this much time? May I know the usage 
 of the following property?
 <property>
   <name>dfs.blockreport.initialDelay</name>
   <value>0</value>
   <description>Delay for first block report in seconds.</description>
 </property>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-198) org.apache.hadoop.dfs.LeaseExpiredException during dfs write

2014-01-20 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-198.
--

Resolution: Not A Problem

This one has gone very stale, and we have not seen any well-substantiated reports 
of lease renewals going amiss during long-waiting tasks recently. Marking as 'Not 
a Problem' (anymore). If there's a proper new report of this behaviour, please 
file a new JIRA with the newer data.

[~bugcy013] - Your problem is pretty different from what OP appears to have 
reported in an older version. Your problem arises out of MR tasks not utilising 
an attempt ID based directory (which Hive appears to do sometimes), in which 
case two different running attempts (out of speculative exec. or otherwise) can 
cause one of them to run into this error as a result of the file overwrite. 
Best to investigate further on a mailing list rather than here.

 org.apache.hadoop.dfs.LeaseExpiredException during dfs write
 

 Key: HDFS-198
 URL: https://issues.apache.org/jira/browse/HDFS-198
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client, namenode
Reporter: Runping Qi

 Many long running cpu intensive map tasks failed due to 
 org.apache.hadoop.dfs.LeaseExpiredException.
 See [a comment 
 below|https://issues.apache.org/jira/browse/HDFS-198?focusedCommentId=12910298page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12910298]
  for the exceptions from the log:



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5802) NameNode does not check for inode type before traversing down a path

2014-01-20 Thread Harsh J (JIRA)
Harsh J created HDFS-5802:
-

 Summary: NameNode does not check for inode type before traversing 
down a path
 Key: HDFS-5802
 URL: https://issues.apache.org/jira/browse/HDFS-5802
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Trivial


This came up during the discussion on a forum at 
http://community.cloudera.com/t5/Batch-Processing-and-Workflow/Permission-denied-access-EXECUTE-on-getting-the-status-of-a-file/m-p/5049#M162
 surrounding an fs.exists(…) check running on a path /foo/bar, where /foo is a 
file and not a directory.

In such a case, the NameNode yields a user-confusing message of {{Permission 
denied: user=foo, access=EXECUTE, inode=/foo:foo:foo:-rw-r--r--}} instead of 
realising, and clearly saying, that /foo is a file and not a directory before it 
tries to traverse further down to locate the requested path.
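
For reference, a minimal repro sketch (the paths are illustrative, and the 
confusing error assumes permission checking is enabled); the exists() call ends 
up "traversing" through the file inode:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExistsThroughFile {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    fs.create(new Path("/foo")).close();  // /foo is a plain file, mode -rw-r--r--
    // The NameNode treats /foo as an ancestor directory to traverse and fails
    // the EXECUTE check, instead of reporting that /foo is not a directory.
    System.out.println(fs.exists(new Path("/foo/bar")));
  }
}
{code}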



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5189) Rename the CorruptBlocks metric to CorruptReplicas

2013-09-11 Thread Harsh J (JIRA)
Harsh J created HDFS-5189:
-

 Summary: Rename the CorruptBlocks metric to CorruptReplicas
 Key: HDFS-5189
 URL: https://issues.apache.org/jira/browse/HDFS-5189
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.1.0-beta
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


The NameNode increments a CorruptBlocks metric even if only one of the block's
replicas is reported corrupt (genuine checksum fail, or even if a
replica has a bad genstamp). In cases where this is incremented, fsck
still reports a healthy state.

This is confusing to users and causes false alarms, as they feel this is the 
metric to be monitored (instead of MissingBlocks). The metric is really trying to 
report only corrupt replicas, not whole blocks, and ought to be renamed.

FWIW, dfsadmin -report prints the proper string "Blocks with corrupt 
replicas:" for this count.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-5046) Hang when add/remove a datanode into/from a 2 datanode cluster

2013-07-31 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-5046.
---

Resolution: Not A Problem

bq. a). decommission progress hangs and the status always be 'Waiting DataNode 
status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /', the 
decommission continues and will be completed finally.

Step (a) points to both your problem and its solution: you have files
being created with repl=3 on a 2-DN cluster, which will prevent
decommission. This is not a bug.

 Hang when add/remove a datanode into/from a 2 datanode cluster
 --

 Key: HDFS-5046
 URL: https://issues.apache.org/jira/browse/HDFS-5046
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 1.1.1
 Environment: Red Hat Enterprise Linux Server release 5.3, 64 bit
Reporter: sam liu

 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And, in 
 hdfs-site.xml, set the 'dfs.replication' to 2
 2. Add node dn3 into the cluster as a new datanode, and did not change the 
 'dfs.replication' value in hdfs-site.xml and keep it as 2
 note: step 2 passed
 3. Decommission dn3 from the cluster
 Expected result: dn3 could be decommissioned successfully
 Actual result:
 a). decommission progress hangs and the status always be 'Waiting DataNode 
 status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /', the 
 decommission continues and will be completed finally.
 b). However, if the initial cluster includes >= 3 datanodes, this issue won't 
 be encountered when adding/removing another datanode. For example, if I set up 
 a cluster with 3 datanodes, then I can successfully add a 4th datanode to it, 
 and can then also successfully remove the 4th datanode from the cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4991) HDFSSeek API fails to seek to position when file is opened in write mode.

2013-07-14 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4991.
---

Resolution: Invalid

First off, please do not use the JIRA as a QA medium. The HDFS project 
provides and maintains a development list at hdfs-dev@hadoop.apache.org you 
should mail to with such questions. Only file valid issues on the JIRA please.

On to your question: HDFS has no random-write feature; it does not support this 
yet, hence there exists no API. If you plan to add such a feature, a design 
document for your implementation idea and discussion on the hdfs-dev@ lists is 
very welcome. Merely adding an API will not solve this - you will first need to 
understand why it is a limitation at the architecture level currently.

Resolving as invalid. Please use lists for general QA.

 HDFSSeek API fails to seek to position when file is opened in write mode.
 -

 Key: HDFS-4991
 URL: https://issues.apache.org/jira/browse/HDFS-4991
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs
Affects Versions: 0.20.1
 Environment: Redhat Linux
Reporter: Dayakar Reddy

 Hi,
 The hdfsSeek API fails to seek to a position when the file is opened in write 
 mode. I read in the documentation that hdfsSeek is only supported when the file 
 is opened in read mode.
 We have a requirement to replace a file residing in the Hadoop environment.
 Is there any possibility of hdfsSeek being supported when a file is opened in 
 write mode?
 Regards,
 Dayakar

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4983) Numeric usernames do not work with WebHDFS FS

2013-07-11 Thread Harsh J (JIRA)
Harsh J created HDFS-4983:
-

 Summary: Numeric usernames do not work with WebHDFS FS
 Key: HDFS-4983
 URL: https://issues.apache.org/jira/browse/HDFS-4983
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Harsh J


Per the file 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/resources/UserParam.java,
 the DOMAIN pattern is set to: {{^[A-Za-z_][A-Za-z0-9._-]*[$]?$}}.

Given this, using a username such as 123 seems to fail for some reason (tried 
on insecure setup):

{code}
[123@host-1 ~]$ whoami
123
[123@host-1 ~]$ hadoop fs -fs webhdfs://host-2.domain.com -ls /
-ls: Invalid value: 123 does not belong to the domain 
^[A-Za-z_][A-Za-z0-9._-]*[$]?$
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [path ...]
{code}
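
For illustration, a small standalone check of the hard-coded pattern (the relaxed 
pattern at the end is only an example of what a configurable override might 
accept, not the actual fix):

{code}
import java.util.regex.Pattern;

public class UserParamPatternCheck {
  public static void main(String[] args) {
    // Hard-coded DOMAIN pattern from UserParam: the first character must be a
    // letter or underscore, so the purely numeric username "123" is rejected.
    Pattern domain = Pattern.compile("^[A-Za-z_][A-Za-z0-9._-]*[$]?$");
    System.out.println(domain.matcher("123").matches());   // false
    System.out.println(domain.matcher("harsh").matches()); // true

    // An example relaxed pattern that also admits leading digits.
    Pattern relaxed = Pattern.compile("^[A-Za-z0-9_][A-Za-z0-9._-]*[$]?$");
    System.out.println(relaxed.matcher("123").matches());  // true
  }
}
{code}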

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4936) Handle overflow condition for txid going over Long.MAX_VALUE

2013-06-25 Thread Harsh J (JIRA)
Harsh J created HDFS-4936:
-

 Summary: Handle overflow condition for txid going over 
Long.MAX_VALUE
 Key: HDFS-4936
 URL: https://issues.apache.org/jira/browse/HDFS-4936
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor


Hat tip to [~fengdon...@gmail.com] for the question that led to this (on 
mailing lists).

I hacked up my local NN's txids manually to go very large (close to max) and 
decided to try out if this causes any harm. I basically bumped up the freshly 
formatted files' starting txid to 9223372036854775805 (and ensured image 
references the same by hex-editing it):

{code}
➜  current  ls
VERSION
fsimage_9223372036854775805.md5
fsimage_9223372036854775805
seen_txid
➜  current  cat seen_txid
9223372036854775805
{code}

NameNode started up as expected.

{code}
13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded in 0 
seconds.
13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid 
9223372036854775805 from 
/temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at 
9223372036854775806
{code}

I could create a bunch of files and do regular ops (with the txid counting up to 
well past the long max). I created over 10 files, just to make it go well over 
Long.MAX_VALUE.

Quitting NameNode and restarting fails though, with the following error:

{code}
13/06/25 18:31:08 INFO namenode.FileJournalManager: Recovering unfinalized 
segments in 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current
13/06/25 18:31:08 INFO namenode.FileJournalManager: Finalizing edits file 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_inprogress_9223372036854775806
 - 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_9223372036854775806-9223372036854775807
13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: Gap in transactions. Expected to be able to read up until 
at least txid 9223372036854775806 but unable to find any edit logs containing 
txid -9223372036854775808
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1194)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1152)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:616)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:267)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:592)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:435)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:397)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:399)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:433)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:590)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1141)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
{code}

Looks like we also lose some edits when we restart, as noted by the finalized 
edits filename:

{code}
VERSION
edits_9223372036854775806-9223372036854775807
fsimage_9223372036854775805
fsimage_9223372036854775805.md5
seen_txid
{code}

It seems like we won't be able to handle the case where the txid overflows. It's 
a very, very large number, so that's not an immediate concern, but it seemed 
worthy of a report.
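
For reference, the wrap-around itself is just silent Java long overflow, which is 
why the "Gap in transactions" error above reports txid -9223372036854775808:

{code}
public class TxidOverflow {
  public static void main(String[] args) {
    long txid = Long.MAX_VALUE;      // 9223372036854775807
    System.out.println(txid + 1);    // -9223372036854775808 (silent wrap)
    // A guard such as Math.addExact(txid, 1) (Java 8+) would instead throw
    // ArithmeticException, which is one way an overflow check could be added.
  }
}
{code}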

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4936) Handle overflow condition for txid going over Long.MAX_VALUE

2013-06-25 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4936.
---

Resolution: Not A Problem

 Handle overflow condition for txid going over Long.MAX_VALUE
 

 Key: HDFS-4936
 URL: https://issues.apache.org/jira/browse/HDFS-4936
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor

 Hat tip to [~fengdon...@gmail.com] for the question that led to this (on 
 mailing lists).
 I hacked up my local NN's txids manually to go very large (close to max) and 
 decided to try out if this causes any harm. I basically bumped up the freshly 
 formatted files' starting txid to 9223372036854775805 (and ensured image 
 references the same by hex-editing it):
 {code}
 ➜  current  ls
 VERSION
 fsimage_9223372036854775805.md5
 fsimage_9223372036854775805
 seen_txid
 ➜  current  cat seen_txid
 9223372036854775805
 {code}
 NameNode started up as expected.
 {code}
 13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded in 0 
 seconds.
 13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid 
 9223372036854775805 from 
 /temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
 13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at 
 9223372036854775806
 {code}
 I could create a bunch of files and do regular ops (with the txid counting up 
 to well past the long max). I created over 10 files, just to make it go well 
 over Long.MAX_VALUE.
 Quitting NameNode and restarting fails though, with the following error:
 {code}
 13/06/25 18:31:08 INFO namenode.FileJournalManager: Recovering unfinalized 
 segments in 
 /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current
 13/06/25 18:31:08 INFO namenode.FileJournalManager: Finalizing edits file 
 /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_inprogress_9223372036854775806
  - 
 /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_9223372036854775806-9223372036854775807
 13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
 java.io.IOException: Gap in transactions. Expected to be able to read up 
 until at least txid 9223372036854775806 but unable to find any edit logs 
 containing txid -9223372036854775808
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1194)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1152)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:616)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:267)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:592)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:435)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:397)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:399)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:433)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:590)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1141)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
 {code}
 Looks like we also lose some edits when we restart, as noted by the finalized 
 edits filename:
 {code}
 VERSION
 edits_9223372036854775806-9223372036854775807
 fsimage_9223372036854775805
 fsimage_9223372036854775805.md5
 seen_txid
 {code}
 It seems like we won't be able to handle the case where the txid overflows. It's 
 a very, very large number, so that's not an immediate concern, but it seemed 
 worthy of a report.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-2316) [umbrella] WebHDFS: a complete FileSystem implementation for accessing HDFS over HTTP

2013-05-24 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-2316.
---

  Resolution: Fixed
Target Version/s:   (was: 0.22.1)

 [umbrella] WebHDFS: a complete FileSystem implementation for accessing HDFS 
 over HTTP
 -

 Key: HDFS-2316
 URL: https://issues.apache.org/jira/browse/HDFS-2316
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: webhdfs
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
  Labels: critical-0.22.0
 Fix For: 1.0.0, 0.23.1

 Attachments: test-webhdfs, test-webhdfs-0.20s, 
 WebHdfsAPI20111020.pdf, WebHdfsAPI2003.pdf, WebHdfsAPI2011.pdf


 We current have hftp for accessing HDFS over HTTP.  However, hftp is a 
 read-only FileSystem and does not provide write accesses.
 In HDFS-2284, we propose to have WebHDFS for providing a complete FileSystem 
 implementation for accessing HDFS over HTTP.  The is the umbrella JIRA for 
 the tasks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4630) Datanode is going OOM due to small files in hdfs

2013-03-24 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4630.
---

Resolution: Invalid

Closing again per Suresh's comment, as this is by design and you're merely 
required to raise your heap to accommodate more files (and thereby, blocks). 
Please also see HDFS-4465 and HDFS-4461 on optimizations of this.

 Datanode is going OOM due to small files in hdfs
 

 Key: HDFS-4630
 URL: https://issues.apache.org/jira/browse/HDFS-4630
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha
 Environment: Ubuntu, Java 1.6
Reporter: Ankush Bhatiya
Priority: Blocker

 Hi, 
 We have very small files (sizes ranging from 10KB to 1MB) in our HDFS, and the 
 number of files is in the tens of millions. Due to this, both the namenode and 
 the datanode go out of memory very frequently. When we analyse the heap dump of 
 the datanode, most of the memory is used by ReplicaMap. 
 Can we use EhCache or something similar so as not to store all the data in 
 memory? 
 Thanks
 Ankush

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4624) eclipse plugin for hadoop 2.0.0-alpha

2013-03-21 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4624.
---

Resolution: Invalid

 eclipse plugin for hadoop 2.0.0-alpha
 -

 Key: HDFS-4624
 URL: https://issues.apache.org/jira/browse/HDFS-4624
 Project: Hadoop HDFS
  Issue Type: Wish
  Components: federation
 Environment: ubuntu 12.04, java 1.7, 
Reporter: Sreevatson

 Is there an Eclipse plugin available for hadoop 2.0.0-alpha? I am currently 
 working on a project to devise a solution for the small-files problem, and I am 
 using HDFS federation. I want to integrate our web server with HDFS, so I need 
 an Eclipse plugin for this version. Please help me out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4509) Provide a way to ask Balancer to exclude certain DataNodes in its computation and/or work.

2013-02-28 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4509.
---

Resolution: Duplicate

Dupe of HDFS-4420.

 Provide a way to ask Balancer to exclude certain DataNodes in its computation 
 and/or work.
 --

 Key: HDFS-4509
 URL: https://issues.apache.org/jira/browse/HDFS-4509
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Reporter: Harsh J
Priority: Minor

 This is particularly useful in clusters that have a split between DNs used for 
 general purposes and DNs used specifically for HBase RSes. By asking the 
 balancer to exclude the DNs that the RSes run on, it's possible to avoid 
 impacting HBase's local-read performance, and the balancing of these nodes can 
 be deferred to a later time.
 An alternate, and perhaps simpler approach would be to make the Balancer 
 file-aware and ask it to skip a specific directory's file's blocks (i.e. that 
 of /hbase for example).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4508) Two minor improvements to the QJM Deployment docs

2013-02-17 Thread Harsh J (JIRA)
Harsh J created HDFS-4508:
-

 Summary: Two minor improvements to the QJM Deployment docs
 Key: HDFS-4508
 URL: https://issues.apache.org/jira/browse/HDFS-4508
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.0.3-alpha
Reporter: Harsh J
Priority: Minor


Suggested by ML user Azurry, the docs at 
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html#Deployment_details
 can be improved for two specific lines:

{quote}
* If you have already formatted the NameNode, or are converting a 
non-HA-enabled cluster to be HA-enabled, you should now copy over the contents 
of your NameNode metadata directories to the other, unformatted NameNode by 
running the command hdfs namenode -bootstrapStandby on the unformatted 
NameNode. Running this command will also ensure that the JournalNodes (as 
configured by dfs.namenode.shared.edits.dir) contain sufficient edits 
transactions to be able to start both NameNodes.
* If you are converting a non-HA NameNode to be HA, you should run the command 
hdfs -initializeSharedEdits, which will initialize the JournalNodes with the 
edits data from the local NameNode edits directories.
{quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4509) Provide a way to ask Balancer to exclude certain DataNodes in its computation and/or work.

2013-02-17 Thread Harsh J (JIRA)
Harsh J created HDFS-4509:
-

 Summary: Provide a way to ask Balancer to exclude certain 
DataNodes in its computation and/or work.
 Key: HDFS-4509
 URL: https://issues.apache.org/jira/browse/HDFS-4509
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Reporter: Harsh J
Priority: Minor


This is particularly useful in clusters that have a split between DNs used for 
general purposes and DNs used specifically for HBase RSes. By asking the 
balancer to exclude the DNs that the RSes run on, it's possible to avoid 
impacting HBase's local-read performance, and the balancing of these nodes can 
be deferred to a later time.

An alternate, and perhaps simpler approach would be to make the Balancer 
file-aware and ask it to skip a specific directory's file's blocks (i.e. that 
of /hbase for example).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-976) Hot Standby for NameNode

2013-02-08 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-976.
--

Resolution: Duplicate

A working HDFS HA mode has been implemented via HDFS-1623. Closing this one out 
as a 'dupe'.

 Hot Standby for NameNode
 

 Key: HDFS-976
 URL: https://issues.apache.org/jira/browse/HDFS-976
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Reporter: dhruba borthakur
Assignee: Dmytro Molkov
 Attachments: 0001-0.20.3_rc2-AvatarNode.patch, AvatarNode.20.patch, 
 AvatarNodeDescription.txt, AvatarNode.patch, AvatarPatch.2.patch


 This is a place holder to share our code and experiences about implementing a 
 Hot Standby for the HDFS NameNode for hadoop 0.20. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4449) When a decommission is awaiting closure of live blocks, show the block IDs on the NameNode's UI report

2013-01-28 Thread Harsh J (JIRA)
Harsh J created HDFS-4449:
-

 Summary: When a decommission is awaiting closure of live blocks, 
show the block IDs on the NameNode's UI report
 Key: HDFS-4449
 URL: https://issues.apache.org/jira/browse/HDFS-4449
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Harsh J
Assignee: Harsh J


It is rather common for people to complain about 'DN decommission' hangs because 
of live blocks waiting to be completed by some app (certain HBase specifics in 
particular cause a file to stay open for a longer time, compared with MR/etc.).

While they can see a count of the blocks that are live, we should add some more 
detail to that view. In particular, add the list of live blocks waiting to be 
closed, so that a user may better understand why it is hung and also be able to 
trace the blocks back to files manually if needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3801) Provide a way to disable browsing of files from the web UI

2013-01-25 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-3801.
---

Resolution: Won't Fix

Hi Suresh and others,

Yes I agree, we can close this. It is better to go with a filter.

 Provide a way to disable browsing of files from the web UI
 --

 Key: HDFS-3801
 URL: https://issues.apache.org/jira/browse/HDFS-3801
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
 Attachments: HDFS-3801.patch


 A few times we've had requests from users who wish to disable browsing of the 
 filesystem in the web UI completely, while keeping other servlet 
 functionality enabled (such as fsck, etc.). Right now, the cheap way to do 
 this is by blocking out the DN web port (50075) from access by clients, but 
 that also hampers HFTP transfers.
 We should instead provide a toggle config for the JSPs to use and disallow 
 browsing if the toggle's enabled. The config can be true by default, to not 
 change the behavior.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4425) NameNode low on available disk space

2013-01-22 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4425.
---

Resolution: Invalid

The Apache JIRA is not for user help but only for confirmed bug reports. Please 
send usage help requests such as your questions to u...@hadoop.apache.org.

I'm resolving this as Invalid; let's carry forward on your email thread instead. 
Many have already answered you there. The key config to tweak the default is 
dfs.namenode.resource.du.reserved.
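
For reference, a minimal sketch of the threshold semantics only (the real check 
lives in the NameNode's resource checker, not in user code; the path argument 
here is illustrative):

{code}
import java.io.File;

public class DuReservedCheck {
  public static void main(String[] args) {
    long reserved = 100L * 1024 * 1024;  // default 104857600 bytes, as in the log
    File volume = new File(args.length > 0 ? args[0] : "/");
    long available = volume.getUsableSpace();
    if (available < reserved) {
      // Matches the WARN above: at this point the NameNode enters safe mode.
      System.out.println("Available " + available + " is below reserve " + reserved);
    }
  }
}
{code}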

 NameNode low on available disk space
 

 Key: HDFS-4425
 URL: https://issues.apache.org/jira/browse/HDFS-4425
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: project
Priority: Critical

 Hi,
 The Namenode switches into safe mode when it has low disk space on the root fs 
 /, and I have to manually run a command to leave it. Below are the log messages 
 for low space on the root / fs. Is there any parameter with which I can reduce 
 the reserved amount?
 2013-01-21 01:22:52,217 WARN 
 org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space 
 available on volume '/dev/mapper/vg_lv_root' is 10653696, which is below the 
 configured reserved amount 104857600
 2013-01-21 01:22:52,218 WARN 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on 
 available disk space. Entering safe mode.
 2013-01-21 01:22:52,218 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe 
 mode is ON.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4255) Useless stacktrace shown in DN when there's an error writing a block

2012-12-01 Thread Harsh J (JIRA)
Harsh J created HDFS-4255:
-

 Summary: Useless stacktrace shown in DN when there's an error 
writing a block
 Key: HDFS-4255
 URL: https://issues.apache.org/jira/browse/HDFS-4255
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Priority: Minor


The DN sometimes logs these, especially when it's asked to shut down while 
there's ongoing write activity. The stacktrace is absolutely useless and may be 
improved, and the message it comes as part of is an INFO, which should not be 
the case when a stacktrace is necessary to print (indicative of trouble).

{code}
2012-12-01 19:10:23,167 INFO  datanode.DataNode (BlockReceiver.java:run(955)) - 
PacketResponder: 
BP-1493454111-192.168.2.1-1354369220726:blk_-8775461920430955284_1002, 
type=HAS_DOWNSTREAM_IN_PIPELINE
java.io.EOFException: Premature EOF: no length prefix available
at 
org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:116)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:905)
at java.lang.Thread.run(Thread.java:680)
{code}

Full scenario log in comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4259) Improve pipeline DN replacement failure message

2012-12-01 Thread Harsh J (JIRA)
Harsh J created HDFS-4259:
-

 Summary: Improve pipeline DN replacement failure message
 Key: HDFS-4259
 URL: https://issues.apache.org/jira/browse/HDFS-4259
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Priority: Minor


The current message shown is something such as below:

bq. Failed to add a datanode. User may turn off this feature by setting 
X.policy in configuration, where the current policy is Y. (Nodes: 
current=[foo], original=[bar])

This reads as though failing is a feature (the intention, and the reason we hit 
this, aren't indicated strongly), and can be improved.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4069) File mode bits of some scripts in rpm package are incorrect

2012-10-17 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4069.
---

Resolution: Won't Fix

 File mode bits of some scripts in rpm package are incorrect
 ---

 Key: HDFS-4069
 URL: https://issues.apache.org/jira/browse/HDFS-4069
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: scripts
Affects Versions: 1.0.3, 1.1.0
 Environment: Fedora 17 3.3.4-5.fc17.x86_64, OpenJDK Runtime 
 Environment 1.7.0_06-icedtea, Rackspace Cloud
Reporter: Haoquan Wang
Priority: Minor
  Labels: patch
   Original Estimate: 1h
  Remaining Estimate: 1h

 These scripts should have execute permission (755). This only happens with the 
 rpm package; the deb package does not have this problem.
 {noformat}-rw-r--r--. 1 root root  2143 Oct  4 22:12 /usr/sbin/slaves.sh
 -rw-r--r--. 1 root root  1166 Oct  4 22:12 /usr/sbin/start-all.sh
 -rw-r--r--. 1 root root  1065 Oct  4 22:12 /usr/sbin/start-balancer.sh
 -rw-r--r--. 1 root root  1745 Oct  4 22:12 /usr/sbin/start-dfs.sh
 -rw-r--r--. 1 root root  1145 Oct  4 22:12 /usr/sbin/start-jobhistoryserver.sh
 -rw-r--r--. 1 root root  1259 Oct  4 22:12 /usr/sbin/start-mapred.sh
 -rw-r--r--. 1 root root  1119 Oct  4 22:12 /usr/sbin/stop-all.sh
 -rw-r--r--. 1 root root  1116 Oct  4 22:12 /usr/sbin/stop-balancer.sh
 -rw-r--r--. 1 root root  1246 Oct  4 22:12 /usr/sbin/stop-dfs.sh
 -rw-r--r--. 1 root root  1131 Oct  4 22:12 /usr/sbin/stop-jobhistoryserver.sh
 -rw-r--r--. 1 root root  1168 Oct  4 22:12 /usr/sbin/stop-mapred.sh
 -rw-r--r--. 1 root root  4210 Oct  4 22:12 
 /usr/sbin/update-hadoop-env.sh{noformat} 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4002) Tool-ize OfflineImageViewer and make sure it returns proper return codes upon exit

2012-10-03 Thread Harsh J (JIRA)
Harsh J created HDFS-4002:
-

 Summary: Tool-ize OfflineImageViewer and make sure it returns 
proper return codes upon exit
 Key: HDFS-4002
 URL: https://issues.apache.org/jira/browse/HDFS-4002
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor


We should make OfflineImageViewer structured (code-wise) in the same way as 
OfflineEditsViewer is. In particular, OIV must implement the Tool interface, and 
must return proper exit codes upon success/failure. Right now, it returns 0 for 
both successful and unsuccessful parses.
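
A hedged sketch of the intended shape (not the actual OIV code; the parse step is 
a placeholder): implementing Tool lets ToolRunner feed run()'s result back as the 
process exit code.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class OfflineImageViewerTool extends Configured implements Tool {
  @Override
  public int run(String[] args) {
    boolean parsedOk = args.length > 0;  // placeholder for the real image parse
    return parsedOk ? 0 : 1;             // non-zero on failure, unlike today
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new OfflineImageViewerTool(), args));
  }
}
{code}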

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3968) TestPersistBlocks seems to fail intermittently

2012-09-23 Thread Harsh J (JIRA)
Harsh J created HDFS-3968:
-

 Summary: TestPersistBlocks seems to fail intermittently
 Key: HDFS-3968
 URL: https://issues.apache.org/jira/browse/HDFS-3968
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 3.0.0
Reporter: Harsh J


Received on HADOOP-8158:

{code}
-1 core tests. The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.TestPersistBlocks
{code}

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1503//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1503//console

But the test seems to pass on my local build.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-2530) Add testcases for -n option of FSshell cat

2012-09-22 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-2530.
---

Resolution: Invalid

Hi Xie,

Please post the patch along with the parent JIRA itself, to keep the commits 
single for this new feature.

 Add testcases for -n option of FSshell cat
 --

 Key: HDFS-2530
 URL: https://issues.apache.org/jira/browse/HDFS-2530
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 0.24.0
Reporter: XieXianshan
Priority: Trivial
 Attachments: HDFS-2530.patch


 Add test cases for HADOOP-7795.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3886) Shutdown requests can possibly check for checkpoint issues (corrupted edits) and save a good namespace copy before closing down?

2012-09-01 Thread Harsh J (JIRA)
Harsh J created HDFS-3886:
-

 Summary: Shutdown requests can possibly check for checkpoint 
issues (corrupted edits) and save a good namespace copy before closing down?
 Key: HDFS-3886
 URL: https://issues.apache.org/jira/browse/HDFS-3886
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor


HDFS-3878 sort of gives me this idea. Aside from having a method to download it 
to a different location, we could also lock up the namesystem (or deactivate the 
client RPC server) and save the namespace before we complete the shutdown.

The init.d/shutdown scripts would have to work with this somehow though, so as 
not to kill -9 it while it is in progress. Also, the new image could be stored in 
a shutdown.chkpt directory, so it does not interfere with the regular dirs but 
still allows easier recovery.

Obviously this will still not work if all directories are broken. So maybe we 
could have some configs to tackle that as well?

I haven't thought this through, so let me know what part is wrong to do :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3801) Provide a way to disable browsing of the files from the web UI

2012-08-13 Thread Harsh J (JIRA)
Harsh J created HDFS-3801:
-

 Summary: Provide a way to disable browsing of the files from the 
web UI
 Key: HDFS-3801
 URL: https://issues.apache.org/jira/browse/HDFS-3801
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor


A few times we've had requests from users who wish to disable browsing of the 
filesystem in the web UI completely, while keeping other servlet functionality 
enabled (such as fsck, etc.). Right now, the cheap way to do this is by 
blocking out the DN web port (50075) from access by clients, but that also 
hampers HFTP transfers.

We should instead provide a toggle config for the JSPs to use and disallow 
browsing if the toggle's enabled. The config can be true by default, to not 
change the behavior.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-3647) Backport HDFS-2868 (Add number of active transfer threads to the DataNode status) to branch-1

2012-07-13 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-3647.
---

  Resolution: Fixed
   Fix Version/s: 1.2.0
Target Version/s:   (was: 1.2.0)
Hadoop Flags: Reviewed

Thanks Todd, I've committed this backport to branch-1.

 Backport HDFS-2868 (Add number of active transfer threads to the DataNode 
 status) to branch-1
 -

 Key: HDFS-3647
 URL: https://issues.apache.org/jira/browse/HDFS-3647
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, performance
Affects Versions: 0.20.2
Reporter: Steve Hoffman
Assignee: Harsh J
 Fix For: 1.2.0

 Attachments: HDFS-3647.patch, Screen Shot 2012-07-14 at 12.41.07 
 AM.png


 Not sure if this is in a newer version of Hadoop, but in CDH3u3 it isn't 
 there.
 There is a lot of mystery surrounding how large to set 
 dfs.datanode.max.xcievers.  Most people say to just up it to 4096, but given 
 that exceeding this will cause an HBase RegionServer shutdown (see Lars' blog 
 post here: http://www.larsgeorge.com/2012/03/hadoop-hbase-and-xceivers.html), 
 it would be nice if we could expose the current count via the built-in 
 metrics framework (most likely under dfs).  In this way we could watch it to 
 see if we have it set too high, too low, time to bump it up, etc.
 Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3628) The dfsadmin -setBalancerBandwidth command does not check for superuser privileges

2012-07-10 Thread Harsh J (JIRA)
Harsh J created HDFS-3628:
-

 Summary: The dfsadmin -setBalancerBandwidth command does not check 
for superuser privileges
 Key: HDFS-3628
 URL: https://issues.apache.org/jira/browse/HDFS-3628
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, name-node
Affects Versions: 0.23.0, 0.20.205.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Blocker


The changes from HDFS-2202 failed to add a checkSuperuserPrivilege() call, and 
hence any user (not admins alone) can reset the balancer bandwidth across the 
cluster if they wish to.
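
A hedged sketch of the missing guard (the surrounding class here is a stub for 
illustration, not the actual NameNode RPC server code):

{code}
import java.io.IOException;

public class SetBalancerBandwidthGuard {
  // Stub: the real implementation rejects non-superuser callers.
  void checkSuperuserPrivilege() throws IOException { }

  // Stub: the real implementation propagates the new bandwidth to all datanodes.
  void applyBandwidth(long bytesPerSecond) { }

  public void setBalancerBandwidth(long bytesPerSecond) throws IOException {
    checkSuperuserPrivilege();   // the call the HDFS-2202 change missed
    applyBandwidth(bytesPerSecond);
  }
}
{code}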

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3621) Add a main method to HdfsConfiguration, for debug purposes

2012-07-09 Thread Harsh J (JIRA)
Harsh J created HDFS-3621:
-

 Summary: Add a main method to HdfsConfiguration, for debug purposes
 Key: HDFS-3621
 URL: https://issues.apache.org/jira/browse/HDFS-3621
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Harsh J
Priority: Trivial


Just like Configuration has a main() func that dumps XML out for debug 
purposes, we should have a similar function under the HdfsConfiguration class 
that does the same. This is useful in testing out app classpath setups at times.
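
A minimal sketch of the idea (the class name here is illustrative; the actual 
change would live in HdfsConfiguration itself):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class HdfsConfigurationDump {
  public static void main(String[] args) throws Exception {
    // HdfsConfiguration pulls in hdfs-default.xml/hdfs-site.xml, so dumping it
    // shows the effective HDFS config, mirroring Configuration's debug main().
    Configuration conf = new HdfsConfiguration();
    conf.writeXml(System.out);
  }
}
{code}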

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3611) NameNode prints unnecessary WARNs about edit log normally skipping a few bytes

2012-07-08 Thread Harsh J (JIRA)
Harsh J created HDFS-3611:
-

 Summary: NameNode prints unnecessary WARNs about edit log normally 
skipping a few bytes
 Key: HDFS-3611
 URL: https://issues.apache.org/jira/browse/HDFS-3611
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Trivial


The NameNode currently prints warnings of this form at every startup, even if 
there's really no trouble. For instance, the one below is from a NN startup on a 
freshly formatted namespace.

{code}
12/07/08 20:00:22 WARN namenode.EditLogInputStream: skipping 1048563 bytes at 
the end of edit log  
'/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/data/current/edits_003-003':
 reached txid 3 out of 3
{code}

If this skipping is not really a cause for warning, we should not log it at the 
WARN level but at INFO or even DEBUG. That avoids users getting unnecessarily 
concerned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3612) Single namenode image directory config warning can be improved

2012-07-08 Thread Harsh J (JIRA)
Harsh J created HDFS-3612:
-

 Summary: Single namenode image directory config warning can be 
improved
 Key: HDFS-3612
 URL: https://issues.apache.org/jira/browse/HDFS-3612
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Trivial


Currently, if you configure the NameNode to run with just one 
dfs.namenode.name.dir directory, it prints:

{code}
12/07/08 20:00:22 WARN namenode.FSNamesystem: Only one dfs.namenode.name.dir 
directory configured , beware data loss!{code}

We can improve this in a few ways as it is slightly ambiguous:
# Fix the punctuation spacing; there's always a space after a punctuation mark 
but never before one.
# Perhaps the message is better printed with a reason why it may cause a scare 
of data loss. For instance, we can print "Detected a single storage directory 
in dfs.namenode.name.dir configuration. Beware of data loss due to lack of 
redundant storage directories." or so.
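
Putting both suggestions together, the logging call could end up looking 
roughly like this (a sketch; the exact wording and the way the directory count 
is obtained are illustrative):

{code}
// Sketch: fixed punctuation plus a reason for why a single
// dfs.namenode.name.dir is risky.
if (namespaceDirs.size() == 1) {
  LOG.warn("Detected a single storage directory in dfs.namenode.name.dir"
      + " configuration. Beware of data loss due to lack of redundant"
      + " storage directories!");
}
{code}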

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3613) GSet prints some INFO level values, which aren't really very useful to all

2012-07-08 Thread Harsh J (JIRA)
Harsh J created HDFS-3613:
-

 Summary: GSet prints some INFO level values, which aren't really 
very useful to all
 Key: HDFS-3613
 URL: https://issues.apache.org/jira/browse/HDFS-3613
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Harsh J
Priority: Trivial


The following has long been seen in the NameNode, but I have never seen it 
being valued by anyone other than an HDFS developer:

{code}
12/07/08 20:00:22 INFO util.GSet: VM type   = 64-bit
12/07/08 20:00:22 INFO util.GSet: 2% max memory = 19.75 MB
12/07/08 20:00:22 INFO util.GSet: capacity  = 2^21 = 2097152 entries
12/07/08 20:00:22 INFO util.GSet: recommended=2097152, actual=2097152
{code}

Let's switch it down to DEBUG.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-1125) Removing a datanode (failed or decommissioned) should not require a namenode restart

2012-07-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-1125.
---

Resolution: Duplicate

Resolved via HDFS-1773. It was in the version after the one Allen tried above, 
I think; that's why he may not have seen it? Please reopen if not.

 Removing a datanode (failed or decommissioned) should not require a namenode 
 restart
 

 Key: HDFS-1125
 URL: https://issues.apache.org/jira/browse/HDFS-1125
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.20.2
Reporter: Alex Loddengaard
Priority: Blocker

 I've heard of several Hadoop users using dfsadmin -report to monitor the 
 number of dead nodes, and alert if that number is not 0.  This mechanism 
 tends to work pretty well, except when a node is decommissioned or fails, 
 because then the namenode requires a restart for said node to be entirely 
 removed from HDFS.  More details here:
 http://markmail.org/search/?q=decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode#query:decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode+page:1+mid:7gwqwdkobgfuszb4+state:results
 Removal from the exclude file and a refresh should get rid of the dead node.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3567) Provide a way to enforce clearing of trash data immediately

2012-06-26 Thread Harsh J (JIRA)
Harsh J created HDFS-3567:
-

 Summary: Provide a way to enforce clearing of trash data 
immediately
 Key: HDFS-3567
 URL: https://issues.apache.org/jira/browse/HDFS-3567
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 3.0.0
Reporter: Harsh J
Priority: Minor


As discussed at http://search-hadoop.com/m/r1lMa13eN7O, it would be good to 
have a dfsadmin sub-command (or similar) that admins can use to trigger the 
trash emptier from the NameNode, instead of waiting for the trash clearance 
interval to pass. This can come in handy when attempting to quickly delete 
data in a filling-up cluster.
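
Until such a sub-command exists, the effect can be approximated per-user from 
the client side via the Trash API (a sketch under that assumption; it only 
clears the calling user's trash, which is exactly the gap a NameNode-driven 
dfsadmin command would close):

{code}
// Sketch: drop existing trash checkpoints now and roll the current trash
// contents into a fresh checkpoint, instead of waiting for fs.trash.interval.
Configuration conf = new Configuration();
Trash trash = new Trash(FileSystem.get(conf), conf);
trash.expunge();     // deletes all old trash checkpoints (throws IOException)
trash.checkpoint();  // checkpoints whatever currently sits in .Trash/Current
{code}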

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3570) Balancer shouldn't rely on DFS Space Used % as that ignores non-DFS used space

2012-06-26 Thread Harsh J (JIRA)
Harsh J created HDFS-3570:
-

 Summary: Balancer shouldn't rely on DFS Space Used % as that 
ignores non-DFS used space
 Key: HDFS-3570
 URL: https://issues.apache.org/jira/browse/HDFS-3570
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor


Report from a user here: 
https://groups.google.com/a/cloudera.org/d/msg/cdh-user/pIhNyDVxdVY/b7ENZmEvBjIJ,
 post archived at http://pastebin.com/eVFkk0A0

This user had a specific DN that had a large non-DFS usage among dfs.data.dirs, 
and very little DFS usage (which is computed against total possible capacity). 

Balancer apparently only looks at the DFS usage, and does not consider that 
non-DFS usage may also be high on a DN/cluster. Hence, it thinks that if a DFS 
usage report from a DN is only 8%, it's got a lot of free space to write more 
blocks, when that isn't true, as shown by the case of this user. It went on 
scheduling writes to the DN to balance it out, but the DN simply can't accept 
any more blocks as a result of its disks' state.

I think it would be better if we _computed_ the actual utilization based on 
{{(100-(actual remaining space))/(capacity)}}, as opposed to the current {{(dfs 
used)/(capacity)}}. Thoughts?

This isn't very critical, however, because it is very rare to see DN space 
being used for non-DN data, but it does expose a valid bug.
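
One way to read the two computations (a sketch; the field names stand in for 
the values a DN reports in its usage report):

{code}
// Current behaviour (sketch): only DFS-used bytes count towards utilization,
// so heavy non-DFS usage on the same volumes stays invisible to the Balancer.
double dfsUsedPercent = 100.0 * dfsUsed / capacity;

// Proposed (sketch): derive utilization from what actually remains free,
// which implicitly accounts for non-DFS usage as well.
double actualUsedPercent = 100.0 * (capacity - remaining) / capacity;
{code}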

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3560) Enable automatic NameNode name-directory restore by default

2012-06-24 Thread Harsh J (JIRA)
Harsh J created HDFS-3560:
-

 Summary: Enable automatic NameNode name-directory restore by 
default
 Key: HDFS-3560
 URL: https://issues.apache.org/jira/browse/HDFS-3560
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 3.0.0
Reporter: Harsh J
Priority: Minor


HADOOP-4885 and several of its friends added in this feature since 0.21 
(versions of these are also in 1.x).

However, the feature is currently disabled by default. Since we've had it in 
use for a long time now, we should enable it by default (with any side-changes 
if necessary) - it is a helpful feature, and since it has been working well for 
several users now without any issues, I do not see why it should remain turned 
off by default.
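
For reference, the toggle in question is, to the best of my knowledge, 
dfs.namenode.name.dir.restore, which today has to be switched on explicitly 
(a sketch):

{code}
// Sketch: opting in to automatic restore of previously failed name
// directories today; this issue proposes making true the default.
Configuration conf = new HdfsConfiguration();
conf.setBoolean("dfs.namenode.name.dir.restore", true);
{code}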

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (HDFS-3522) If NN is in safemode, it should throw SafeModeException when getBlockLocations has zero locations

2012-06-12 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reopened HDFS-3522:
---


Reopening issue until comments and test are both addressed, just to have it 
noticed :)

 If NN is in safemode, it should throw SafeModeException when 
 getBlockLocations has zero locations
 -

 Key: HDFS-3522
 URL: https://issues.apache.org/jira/browse/HDFS-3522
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 3.0.0
Reporter: Brandon Li
Assignee: Brandon Li
 Fix For: 2.0.1-alpha

 Attachments: HDFS-3522.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-3522) If NN is in safemode, it should throw SafeModeException when getBlockLocations has zero locations

2012-06-12 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-3522.
---

Resolution: Fixed

I am good with this change. Resolving. Thanks Nicholas for catching my blunder 
:)

 If NN is in safemode, it should throw SafeModeException when 
 getBlockLocations has zero locations
 -

 Key: HDFS-3522
 URL: https://issues.apache.org/jira/browse/HDFS-3522
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 3.0.0
Reporter: Brandon Li
Assignee: Brandon Li
 Fix For: 2.0.1-alpha

 Attachments: HDFS-3522.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3475) Make the replication monitor multipliers configurable

2012-05-30 Thread Harsh J (JIRA)
Harsh J created HDFS-3475:
-

 Summary: Make the replication monitor multipliers configurable
 Key: HDFS-3475
 URL: https://issues.apache.org/jira/browse/HDFS-3475
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial


BlockManager currently hardcodes the following two constants:

{code}
private static final int INVALIDATE_WORK_PCT_PER_ITERATION = 32;
private static final int REPLICATION_WORK_MULTIPLIER_PER_ITERATION = 2;
{code}

These are used to throttle/limit the amount of deletion and 
replication-to-other-DN work done per heartbeat interval of a live DN.

Not many have had reasons to want these changed so far, but there have been a 
few requests I've faced over the past year from a variety of clusters I've 
helped maintain. I think with the improvements in disks and networks that have 
already started to be rolled out in production environments out there, changing 
these may start making sense to some.

Let's at least make it advanced-configurable with proper docs that warn 
adequately, with the defaults being what they are today. With hard-coded 
values, it comes down to a recompile for admins, which is not something they 
may like.

Please let me know your thoughts.
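
A sketch of what the change could look like inside BlockManager; the 
configuration key names below are hypothetical placeholders to be settled in 
review:

{code}
// Sketch: read the throttles from configuration, keeping today's hardcoded
// values as defaults so behaviour is unchanged unless an admin opts in.
this.invalidateWorkPctPerIteration = conf.getInt(
    "dfs.namenode.invalidate.work.pct.per.iteration", 32);
this.replicationWorkMultiplierPerIteration = conf.getInt(
    "dfs.namenode.replication.work.multiplier.per.iteration", 2);
{code}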

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3476) Correct the default used in TestDFSClientRetries.busyTest() after HDFS-3462

2012-05-30 Thread Harsh J (JIRA)
Harsh J created HDFS-3476:
-

 Summary: Correct the default used in 
TestDFSClientRetries.busyTest() after HDFS-3462
 Key: HDFS-3476
 URL: https://issues.apache.org/jira/browse/HDFS-3476
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: test
Affects Versions: 3.0.0
Reporter: Harsh J
Priority: Minor


Per Konstantin on HDFS-3462, the current default value specified in the changes 
made there is 0, and it should instead be the proper transceivers-count default.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3358) Specify explicitly that the NN UI status total is talking of persistent objects on heap.

2012-05-03 Thread Harsh J (JIRA)
Harsh J created HDFS-3358:
-

 Summary: Specify explicitly that the NN UI status total is talking 
of persistent objects on heap.
 Key: HDFS-3358
 URL: https://issues.apache.org/jira/browse/HDFS-3358
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 2.0.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial


The NN shows, on its web UI, something like:

{{223 files and directories, 138 blocks = 361 total.}}

Followed by heap stats.

We should clarify that this line is talking of objects and is related to the 
heap summaries. Perhaps just being explicit in Java terms would be nicer:

{{223 files and directories, 138 blocks = 361 total objects.}}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3366) Some stacktraces are now too lengthy and sometimes no good

2012-05-03 Thread Harsh J (JIRA)
Harsh J created HDFS-3366:
-

 Summary: Some stacktraces are now too lengthy and sometimes no good
 Key: HDFS-3366
 URL: https://issues.apache.org/jira/browse/HDFS-3366
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Harsh J
Priority: Minor


This is a high-on-nitpick ticket for the benefit of troubleshooting.

This is partially related to all the PB-changes we've had. And also partially 
related to Java/JVMs.

Take a case of an AccessControlException, which is pretty common in the HDFS 
permissions layer. We now get, due to several more calls added at the RPC 
layer for PB (or maybe something else, if I am mistaken):
{code}
Caused by: org.apache.hadoop.security.AccessControlException: 
org.apache.hadoop.security.AccessControlException: Permission denied: 
user=yarn, access=WRITE, inode=/:hdfs:supergroup:drwxr-xr-x
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:186)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:135)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4204)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4175)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2565)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2529)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:640)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:412)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42618)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:448)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1204)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1655)

at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:205)
at $Proxy10.mkdirs(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:165)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:84)
at $Proxy10.mkdirs(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:430)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1717)
... 9 more
{code}

The "... 9 more" is what I was looking for, to identify the caller to debug 
on/find the exact directory. However, it now gets eaten away because just the 
mkdir-to-exception trace itself has grown quite a bit. Comparing this to 0.20, 
we had far fewer calls there, which helped us see at least the real caller of 
mkdirs.

I'm actually not sure what causes Java to print "... X more" in this form of 
exception print, but if that's controllable I am all in favor of increasing its 
amount for HDFS (using new default java opts?), so that when an exception does 
occur, we don't get a nearly-unusable stacktrace.
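
For context, the "... N more" ellipsis comes from Throwable's cause-printing 
logic: frames that a cause shares with its enclosing trace are collapsed, and 
can be recovered from the enclosing exception's frames printed above it. A tiny 
standalone demo, unrelated to HDFS code:

{code}
// Sketch: reproduce the "... N more" collapsing outside of Hadoop.
public class StackDemo {
  static void inner() { throw new IllegalStateException("boom"); }

  static void outer() {
    try {
      inner();
    } catch (IllegalStateException e) {
      throw new RuntimeException("wrapped", e);
    }
  }

  public static void main(String[] args) {
    // The cause's trace ends with "... 1 more": the frame it shares with the
    // enclosing RuntimeException's trace (main) is collapsed, not lost.
    outer();
  }
}
{code}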

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-3366) Some stacktraces are now too lengthy and sometimes no good

2012-05-03 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-3366.
---

Resolution: Invalid

Actually, never mind. The 
http://stackoverflow.com/questions/1043378/print-full-call-stack-on-printstacktrace
 posts explain it all. And indeed, the docs are right.

This is an invalid ticket. Please excuse the noise. There's no trouble :)

 Some stacktraces are now too lengthy and sometimes no good
 --

 Key: HDFS-3366
 URL: https://issues.apache.org/jira/browse/HDFS-3366
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Harsh J
Priority: Minor

 This is a high-on-nitpick ticket for the benefit of troubleshooting.
 This is partially related to all the PB-changes we've had. And also partially 
 related to Java/JVMs.
 Take a case of an AccessControlException, which is pretty common in HDFS 
 permissions layer. We  now get, due to several more calls added at the RPC 
 layer for PB (or maybe something else, if am mistaken):
 {code}
 Caused by: org.apache.hadoop.security.AccessControlException: 
 org.apache.hadoop.security.AccessControlException: Permission denied: 
 user=yarn, access=WRITE, inode=/:hdfs:supergroup:drwxr-xr-x
   at 
 org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:186)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:135)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4204)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4175)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2565)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2529)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:640)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:412)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42618)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:448)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1204)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1655)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:205)
   at $Proxy10.mkdirs(Unknown Source)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:165)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:84)
   at $Proxy10.mkdirs(Unknown Source)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:430)
   at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1717)
   ... 9 more
 {code}
 The 9 more is what I was looking for, to identify the caller to debug 
 on/find the exact directory. However it now gets eaten away cause just the 
 mkdir-to-exception trace itself has grown quite a bit. Comparing this to 
 0.20, we have much fewer calls and that helps us see at least the real caller 
 of mkdirs.
 I'm actually not sure what causes Java to print ... X more in these form of 
 exception prints, but if thats controllable am all in favor of increasing its 
 amount for HDFS (using new default java opts?). So that when an exception 
 does occur, we don't get a nearly-unusable stacktrace.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HDFS-2349) DN should log a WARN, not an INFO when it detects a corruption during block transfer

2011-09-19 Thread Harsh J (JIRA)
DN should log a WARN, not an INFO when it detects a corruption during block 
transfer


 Key: HDFS-2349
 URL: https://issues.apache.org/jira/browse/HDFS-2349
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.20.204.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial
 Fix For: 0.24.0


Currently, in DataNode.java, we have:

{code}

  LOG.info("Can't replicate block " + block
      + " because on-disk length " + onDiskLength
      + " is shorter than NameNode recorded length " + block.getNumBytes());

{code}

This log is better off as a WARN as it indicates (and also reports) a 
corruption.
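
The requested change is essentially a one-level bump (sketch):

{code}
// Sketch: same message, surfaced at WARN since a shorter-than-recorded
// on-disk length points at (and reports) a corrupt replica.
LOG.warn("Can't replicate block " + block
    + " because on-disk length " + onDiskLength
    + " is shorter than NameNode recorded length " + block.getNumBytes());
{code}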

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2311) With just 1 block on the HDFS cluster, NN exits safemode on startup immediately.

2011-09-05 Thread Harsh J (JIRA)
With just 1 block on the HDFS cluster, NN exits safemode on startup immediately.


 Key: HDFS-2311
 URL: https://issues.apache.org/jira/browse/HDFS-2311
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.203.0
Reporter: Harsh J
Priority: Minor


This is because:

(int) ((1 Block) * (0.999f Threshold Default Pct)) == 0, in which case 
SafeModeInfo's mode checks make a simple, direct exit of the safemode.

Faulty code is possibly in FSNamesystem#setBlockTotal. This is a non-major 
issue since with 2 blocks it would work fine, and it will work fine with a 1.0f 
Threshold Pct too.
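
The arithmetic is easy to reproduce in isolation (a sketch of the rounding, not 
the actual SafeModeInfo code):

{code}
// Sketch: the int cast truncates, so a 1-block namespace needs 0 blocks to
// satisfy the 0.999f threshold and safemode is exited immediately.
int blockTotal = 1;
float threshold = 0.999f;
int blockThreshold = (int) (blockTotal * threshold); // == 0
boolean waitsForBlocks = blockThreshold > 0;         // false => immediate exit
{code}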

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-112) ClusterTestDFS fails

2011-07-17 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-112.
--

Resolution: Not A Problem

This JIRA has grown stale over the years and needs to be closed. The test 
framework has changed considerably since '06.

With the current mini clusters, giving a hosts array is possible for different 
hostnamed daemons, and by all the tests it carries, it does appear to work 
alright if you want to use it for such purposes.

 ClusterTestDFS fails
 

 Key: HDFS-112
 URL: https://issues.apache.org/jira/browse/HDFS-112
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: local workstation (windows) 
Reporter: alan wootton
Assignee: Sameer Paranjpye
 Attachments: ClusterTestFixes.patch, fix_clustertestdfs.patch


 The dfs unit tests, from the ant target 'cluster' have been failing. 
 (ClusterTestDFSNamespaceLogging, ClusterTestDFS). I don't know if anyone but 
 me cares about these tests, but I do. I would like to write better tests for 
 dns. I think we all need that.
 They have been partially broken since  test.dfs.same.host.targets.allowed 
 went away and replication ceased for these tests. 
 They got really broken when NameNode stopped automatically formatting itself .
 Since they seem to be ignored, I took the liberty of changing how they work.
 The main thing is, you must put this into your hosts file:
 127.0.0.1   localhost0
 127.0.0.1   localhost1
 127.0.0.1   localhost2
 127.0.0.1   localhost3
 127.0.0.1   localhost4
 127.0.0.1   localhost5
 127.0.0.1   localhost6
 127.0.0.1   localhost7
 127.0.0.1   localhost8
 127.0.0.1   localhost9
 127.0.0.1   localhost10
 127.0.0.1   localhost11
 127.0.0.1   localhost12
 127.0.0.1   localhost13
 127.0.0.1   localhost14
 127.0.0.1   localhost15
 This way you can start DataNodes, and TaskTrackers (up to 16 of them) with 
 unique hostnames.
 Also, I changed all the places that used to call 
 InetAddress.getLocalHost().getHostName() to get it from a new method in 
 Configuration (this issue is the same as 
 http://issues.apache.org/jira/browse/HADOOP-197 ).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-70) Data node should shutdown when a critical error is returned by the name node

2011-07-17 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-70?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-70.
-

Resolution: Won't Fix

HADOOP-266 was resolved as a Won't Fix and the DN currently works OK with the 
way it analyzes the exception classnames and determines if it has to shut down.

Marking this one as Won't Fix as well, following HADOOP-266 :)

 Data node should shutdown when a critical error is returned by the name node
 --

 Key: HDFS-70
 URL: https://issues.apache.org/jira/browse/HDFS-70
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Konstantin Shvachko
Assignee: Sameer Paranjpye
Priority: Minor

 Currently data node does not distinguish between critical and non critical 
 exceptions.
 Any exception is treated as a signal to sleep and then try again. See
 org.apache.hadoop.dfs.DataNode.run()
 This is happening because RPC always throws the same RemoteException.
 In some cases (like UnregisteredDatanodeException, IncorrectVersionException) 
 the data 
 node should shutdown rather than retry.
 This logic naturally belongs to the 
 org.apache.hadoop.dfs.DataNode.offerService()
 but can be reasonably implemented (without examining the 
 RemoteException.className 
 field) after HADOOP-266 (2) is fixed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-339) Periodically move blocks from full nodes to those with space

2011-07-17 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-339.
--

Resolution: Not A Problem

The introduction of the DFS Balancer (post '06, apparently) provides the 
feature of moving blocks around and balancing the DFS DNs.

I personally do not think it's a good idea to add a monitor to the NN for 
auto-triggering balancers, since it would use up bandwidth without the 
user/admin ever knowing about it. One could surely write an external tool that 
achieves this monitoring and runs separately, though.

Resolving as Not-A-Problem, but do reopen if you feel strongly that the NN 
would really benefit from an additional service such as this.

  Periodically move blocks from full nodes to those with space
 -

 Key: HDFS-339
 URL: https://issues.apache.org/jira/browse/HDFS-339
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Bryan Pendleton
Assignee: Sameer Paranjpye

 Continuance of Hadoop-386. The patch to that issue makes it possible to 
 redistribute blocks (change replication up, wait for replication to succeed, 
 then lower replication again). However, this requires a lot more space, is 
 not automatic, and doesn't respect a reasonable I/O limit. I have actually 
 had MapReduce jobs fail from block missing execptions after having recently 
 changed the replication level (from 3 to 4, with no underreplications to 
 start with) because the datanodes were too slow responding to requests while 
 performing the necessary replications.
 A good fix to this problem would be a low-priority thread on the NameNode 
 that schedules low-priority replications of blocks on over-full machines, 
 followed by the removal of the extra replications. It might be worth having a 
 specific prototocol for asking for these low-priority copies to happen in the 
 datanodes, so that they continue to service (and be available to service) 
 normal block requests.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-251) Automatically increase replication of often used files/blocks

2011-07-17 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-251.
--

Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/HDFS-782 (closing this one, 
since the other has more discussions in it regarding dynamic replication)

 Automatically increase replication of often used files/blocks
 -

 Key: HDFS-251
 URL: https://issues.apache.org/jira/browse/HDFS-251
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Johan Oskarsson
Assignee: Sameer Paranjpye

 It would be interesting to see if a patch that makes the namenode save the 
 number of times a certain
 file (or block if possible) is used. And then increase the replication of 
 these files to increase performance.
 Any ideas on how to implement?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-128) downloading a file from dfs using the WI, using firefox, creates local files that start with a '-'

2011-07-17 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-128.
--

Resolution: Not A Problem

This is more of a browser issue. For instance, Chrome chooses to replace the 
filename's {{/}} with {{_}} instead of a {{-}} like Safari and Firefox use.

I do not think HDFS can/should take an action here and tweak filenames while 
sending things out (if you see the headers, the filename is intact with {{/}}s).

 downloading a file from dfs using the WI, using firefox, creates local files 
 that start with a '-'
 --

 Key: HDFS-128
 URL: https://issues.apache.org/jira/browse/HDFS-128
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yoram Arnon
Assignee: Sameer Paranjpye
Priority: Minor

 '/' characters are converted to '-' when downloading a file from dfs using 
 the WI. That's a good thing.
 But using firefox, where file names can not be modified when saving to disk, 
 creates local files that start with a '-', which is inconvenient on some OS's.
 the first '/' character should be dropped rather than converted to a '-'

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-186) directory browser can not list all the entries for a large directory

2011-07-17 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-186.
--

Resolution: Not A Problem

I just loaded 2000 files to test this. The web browser loads the list just fine.

Even with larger numbers, I think it'd only take time depending on the NN's 
responsiveness. It is probably best not to browse such directories page by page 
in the browser, and to get to things directly via the URLs instead.

Resolving as not a problem. (This was filed circa '07)

 directory browser can not  list all the entries for a large directory
 -

 Key: HDFS-186
 URL: https://issues.apache.org/jira/browse/HDFS-186
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: IE Firefox Safari
Reporter: Hairong Kuang
Assignee: Sameer Paranjpye

 When browsing a large directory, for example, one with 500 files, web browser 
 is not able to display all the entries. Instead, it stops loading the page in 
 the middle.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2111) Add tests for ensuring that the DN will start with a few bad data directories (Part 1 of testing DiskChecker)

2011-06-28 Thread Harsh J (JIRA)
Add tests for ensuring that the DN will start with a few bad data directories 
(Part 1 of testing DiskChecker)
-

 Key: HDFS-2111
 URL: https://issues.apache.org/jira/browse/HDFS-2111
 Project: Hadoop HDFS
  Issue Type: Test
  Components: data-node, test
Affects Versions: 0.23.0
Reporter: Harsh J
Assignee: Harsh J
 Fix For: 0.23.0


Add tests to ensure that given multiple data dirs, if a single one is bad, the 
DN should still start up.

This is to check DiskChecker's functionality as used in instantiating DataNodes.
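
The underlying check these tests would exercise is roughly DiskChecker.checkDir 
(a sketch of the idea, with illustrative paths, not the eventual test code):

{code}
// Sketch: a data dir that cannot be created/read/written fails the check;
// the DN is expected to skip that volume and still start with the rest.
File goodDir = new File("/data/1/dfs/dn");
File badDir  = new File("/data/2/dfs/dn");   // e.g. an unwritable mount
try {
  DiskChecker.checkDir(goodDir);             // passes: directory is usable
  DiskChecker.checkDir(badDir);              // throws DiskErrorException
} catch (DiskChecker.DiskErrorException dee) {
  // DataNode startup treats this as "one bad volume", not a fatal error,
  // as long as at least one configured data dir remains healthy.
}
{code}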

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (HDFS-1454) Update the documentation to reflect true client caching strategy

2011-06-05 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reopened HDFS-1454:
---


@Eli - Failed to actually verify the entire document. The patch was intended 
to only remove staging, but I should've fixed other things as well, I guess.

 Update the documentation to reflect true client caching strategy
 

 Key: HDFS-1454
 URL: https://issues.apache.org/jira/browse/HDFS-1454
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation, hdfs client
Affects Versions: 0.20.2
Reporter: Jeff Hammerbacher
Assignee: Harsh J
 Fix For: 0.22.0

 Attachments: HDFS-1454.r1.diff


 As noted on the mailing list 
 (http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201010.mbox/%3CAANLkTi=2csK+aY05bTOuO-UZv=o4w6ox2pq4nxgpd...@mail.gmail.com%3E),
  the Staging section of 
 http://hadoop.apache.org/hdfs/docs/r0.21.0/hdfs_design.html#Data+Organization 
 is out of date.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-1454) Update the documentation to reflect true client caching strategy

2011-06-05 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-1454.
---

Resolution: Fixed

Re-resolving per Todd's suggestion.

Opened: HDFS-2036

 Update the documentation to reflect true client caching strategy
 

 Key: HDFS-1454
 URL: https://issues.apache.org/jira/browse/HDFS-1454
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation, hdfs client
Affects Versions: 0.20.2
Reporter: Jeff Hammerbacher
Assignee: Harsh J
 Fix For: 0.22.0

 Attachments: HDFS-1454-reop.r1.diff, HDFS-1454-reop.r1.diff, 
 HDFS-1454.r1.diff


 As noted on the mailing list 
 (http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201010.mbox/%3CAANLkTi=2csK+aY05bTOuO-UZv=o4w6ox2pq4nxgpd...@mail.gmail.com%3E),
  the Staging section of 
 http://hadoop.apache.org/hdfs/docs/r0.21.0/hdfs_design.html#Data+Organization 
 is out of date.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-2036) Revise the HDFS design documentation

2011-06-05 Thread Harsh J (JIRA)
Revise the HDFS design documentation


 Key: HDFS-2036
 URL: https://issues.apache.org/jira/browse/HDFS-2036
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.20.2
Reporter: Harsh J
 Attachments: HDFS-2036.r1.diff

Although HDFS-1454 covered one change related to the staging feature, I think 
it would be a better idea to revise the entire document once more for any stale 
info it may carry (which could mislead new adopters).

Attached is one fix that corrects the default packet size (was: 4 KB, is: 64 KB).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira