[jira] [Reopened] (HADOOP-10434) Is it possible to use "df" to calculate the dfs usage instead of "du"
[ https://issues.apache.org/jira/browse/HADOOP-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened HADOOP-10434: -- Reopening to close as Duplicate rather than Fixed. > Is it possible to use "df" to calculate the dfs usage instead of "du" > - > > Key: HADOOP-10434 > URL: https://issues.apache.org/jira/browse/HADOOP-10434 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.3.0 >Reporter: MaoYuan Xian >Priority: Minor > Labels: BB2015-05-TBR > Attachments: HADOOP-10434-1.patch > > > When we run the datanode on a machine with a big disk volume, we find that the du > operations from org.apache.hadoop.fs.DU's DURefreshThread cost a lot of disk > performance. > As we use the whole disk for hdfs storage, it is possible to calculate the volume > usage via the "df" command. Is it necessary to add a "df" option for usage > calculation in hdfs > (org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice)?
[jira] [Resolved] (HADOOP-10434) Is it possible to use "df" to calculate the dfs usage instead of "du"
[ https://issues.apache.org/jira/browse/HADOOP-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-10434. -- Resolution: Duplicate > Is it possible to use "df" to calculate the dfs usage instead of "du" > - > > Key: HADOOP-10434 > URL: https://issues.apache.org/jira/browse/HADOOP-10434 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.3.0 >Reporter: MaoYuan Xian >Priority: Minor > Labels: BB2015-05-TBR > Fix For: 2.8.0 > > Attachments: HADOOP-10434-1.patch > > > When we run the datanode on a machine with a big disk volume, we find that the du > operations from org.apache.hadoop.fs.DU's DURefreshThread cost a lot of disk > performance. > As we use the whole disk for hdfs storage, it is possible to calculate the volume > usage via the "df" command. Is it necessary to add a "df" option for usage > calculation in hdfs > (org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice)?
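For reference, a minimal sketch of the df-style calculation being proposed, using Java's File API rather than shelling out to df; this is illustrative only, not the contents of HADOOP-10434-1.patch:

{code}
import java.io.File;

// df-style usage estimate for a volume dedicated to HDFS: one cheap
// filesystem stat instead of a recursive du walk over every block file.
public class DfStyleUsage {
  public static long estimateUsed(File volumeRoot) {
    return volumeRoot.getTotalSpace() - volumeRoot.getFreeSpace();
  }

  public static void main(String[] args) {
    System.out.println(estimateUsed(new File(args[0])) + " bytes used");
  }
}
{code}

As the reporter notes, this estimate is only accurate when the whole partition is dedicated to HDFS storage.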
[jira] [Created] (HADOOP-13817) Add a finite shell command timeout to ShellBasedUnixGroupsMapping
Harsh J created HADOOP-13817: Summary: Add a finite shell command timeout to ShellBasedUnixGroupsMapping Key: HADOOP-13817 URL: https://issues.apache.org/jira/browse/HADOOP-13817 Project: Hadoop Common Issue Type: Improvement Components: security Affects Versions: 2.6.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor The ShellBasedUnixGroupsMapping runs various {{id}} commands via the ShellCommandExecutor modules without a timeout set (it is set to 0, which implies infinite). If this command hangs for a long time on the OS end due to an unresponsive groups backend or other reasons, it also blocks the handlers that use it on the NameNode (or other services that use this class). That inadvertently causes odd timeout troubles on the client end, where it is forced to retry (only to likely run into such hangs again with every attempt, until at least one command returns). It would be helpful to have a finite command timeout after which we may give up on the command and return the result equivalent of no groups found.
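A rough sketch of what such a bounded lookup could look like, assuming the existing {{Shell.ShellCommandExecutor}} constructor that takes a timeout in milliseconds; the timeout value and fallback behaviour here are illustrative, not the committed patch:

{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.util.Shell.ShellCommandExecutor;

public class TimedGroupLookup {
  private static final long TIMEOUT_MS = 60_000L; // assumed default

  // Run "id -gn <user>" with a finite timeout; on expiry, return the
  // equivalent of "no groups found" instead of blocking the RPC handler.
  public static List<String> getGroups(String user) throws IOException {
    ShellCommandExecutor executor = new ShellCommandExecutor(
        new String[] {"id", "-gn", user}, null, null, TIMEOUT_MS);
    try {
      executor.execute();
    } catch (IOException e) {
      if (executor.isTimedOut()) {
        return Collections.emptyList();
      }
      throw e;
    }
    return Arrays.asList(executor.getOutput().trim().split("\\s+"));
  }
}
{code}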
[jira] [Resolved] (HADOOP-8134) DNS claims to return a hostname but returns a PTR record in some cases
[ https://issues.apache.org/jira/browse/HADOOP-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-8134. - Resolution: Not A Problem Assignee: (was: Harsh J) This hasn't proven to be a problem of late. Closing as stale. > DNS claims to return a hostname but returns a PTR record in some cases > -- > > Key: HADOOP-8134 > URL: https://issues.apache.org/jira/browse/HADOOP-8134 > Project: Hadoop Common > Issue Type: Bug > Components: util >Affects Versions: 0.23.0 >Reporter: Harsh J >Priority: Minor > > Per Shrijeet on HBASE-4109: > {quote} > If you are using an interface anything other than 'default' (literally that > keyword), DNS.java's getDefaultHost will return a string which will have a > trailing period at the end. It seems the javadoc of reverseDns in DNS.java (see > below) is conflicting with what that function is actually doing. > It is returning a PTR record while it claims it returns a hostname. The PTR > record always has a period at the end, RFC: > http://irbs.net/bog-4.9.5/bog47.html > We make calls to DNS.getDefaultHost at more than one place and treat that as > the actual hostname. > Quoting HRegionServer for example > String machineName = DNS.getDefaultHost(conf.get( > "hbase.regionserver.dns.interface", "default"), conf.get( > "hbase.regionserver.dns.nameserver", "default")); > We may want to sanitize the string returned from the DNS class. Or better, we can > take the path of overhauling the way we do DNS name matching all over. > {quote} > While HBase has worked around the issue, we should fix the methods that > aren't doing what they intended. > 1. We fix the method. This may be an 'incompatible change'. But I do not know > who outside of us uses the DNS classes. > 2. We fix HDFS's DN at the calling end, because that is affected by the > trailing period in its reporting back to the NN as well (just affects NN->DN > weblinks, non-critical). > For 2, we can close this and open an HDFS JIRA. > Thoughts?
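A one-line sketch of the sanitization that callers (or the DNS methods themselves) could apply before treating the result as a hostname; illustrative, not the committed fix:

{code}
// A PTR record is fully qualified and ends with a period; strip it so the
// value behaves like the plain hostname the javadoc promises.
static String stripTrailingDot(String fqdn) {
  return fqdn.endsWith(".") ? fqdn.substring(0, fqdn.length() - 1) : fqdn;
}
{code}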
[jira] [Resolved] (HADOOP-7505) EOFException in RPC stack should have a nicer error message
[ https://issues.apache.org/jira/browse/HADOOP-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-7505. - Resolution: Duplicate Assignee: (was: Harsh J) This seems to have been taken care of (in part) via HADOOP-7346 > EOFException in RPC stack should have a nicer error message > --- > > Key: HADOOP-7505 > URL: https://issues.apache.org/jira/browse/HADOOP-7505 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Affects Versions: 0.23.0 >Reporter: Eli Collins >Priority: Minor > > Lots of user logs involve a user running mismatched versions, and for some > reason or another, they get an EOFException instead of a proper version mismatch > exception. We should be able to catch this at appropriate points, and show a > nicer exception message explaining that it's a possible version mismatch, or > that they're trying to connect to the incorrect port.
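The kind of wrapping being asked for might look like this sketch (a hypothetical catch site in the IPC client, not what HADOOP-7346 actually committed):

{code}
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class FriendlyEof {
  // Sketch: wrap the bare EOFException with a likelier explanation before
  // it bubbles up to the user.
  static int readResponse(DataInputStream in, String server) throws IOException {
    try {
      return in.readInt();
    } catch (EOFException eof) {
      throw new IOException("Unexpected end of stream while talking to "
          + server + ". This often indicates a client/server version"
          + " mismatch, or a connection to the wrong port.", eof);
    }
  }
}
{code}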
[jira] [Resolved] (HADOOP-8579) Websites for HDFS and MapReduce both send users to video training resource which is non-public
[ https://issues.apache.org/jira/browse/HADOOP-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-8579. - Resolution: Not A Problem Assignee: (was: Harsh J) This does not appear to be a problem after the project re-merge. > Websites for HDFS and MapReduce both send users to video training resource > which is non-public > -- > > Key: HADOOP-8579 > URL: https://issues.apache.org/jira/browse/HADOOP-8579 > Project: Hadoop Common > Issue Type: Bug > Environment: website >Reporter: David L. Willson >Priority: Minor > Original Estimate: 2h > Remaining Estimate: 2h > > The main pages for HDFS and MapReduce send new users to an unavailable training > resource. > These two pages: > http://hadoop.apache.org/mapreduce/ > http://hadoop.apache.org/hdfs/ > Link to this page: > http://vimeo.com/3584536 > That page is not public, not shared with all registered Vimeo users, and I > see nothing indicating how to ask for access to the resource. > Please make the vids public, or remove the link of disappointment.
[jira] [Resolved] (HADOOP-8863) Eclipse plugin may not be working on Juno due to changes in it
[ https://issues.apache.org/jira/browse/HADOOP-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-8863. - Resolution: Won't Fix Assignee: (was: Harsh J) The Eclipse plugin has been formally removed. > Eclipse plugin may not be working on Juno due to changes in it > -- > > Key: HADOOP-8863 > URL: https://issues.apache.org/jira/browse/HADOOP-8863 > Project: Hadoop Common > Issue Type: Bug > Components: contrib/eclipse-plugin >Affects Versions: 1.2.0 >Reporter: Harsh J > > We need to debug/investigate why it is so.
[jira] [Created] (HADOOP-13515) Redundant transitionToActive call can cause a NameNode to crash
Harsh J created HADOOP-13515: Summary: Redundant transitionToActive call can cause a NameNode to crash Key: HADOOP-13515 URL: https://issues.apache.org/jira/browse/HADOOP-13515 Project: Hadoop Common Issue Type: Bug Components: ha Affects Versions: 2.5.0 Reporter: Harsh J Priority: Minor The situation in parts is similar to HADOOP-8217, but the cause is different and so is the result. Consider this situation: - At the beginning NN1 is Active, NN2 is Standby - ZKFC1 faces a ZK disconnect (not a session timeout, just a socket disconnect) and thereby reconnects {code} 2016-08-11 07:00:46,068 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 4000ms for sessionid 0x4566f0c97500bd9, closing socket connection and attempting reconnect 2016-08-11 07:00:46,169 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session disconnected. Entering neutral mode... … 2016-08-11 07:00:46,610 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected. {code} - The reconnection on the ZKFC1 triggers the elector code, and the elector re-run finds that NN1 should be the new active (a redundant decision cause NN1 is already active) {code} 2016-08-11 07:00:46,615 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced... 2016-08-11 07:00:46,630 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: … 2016-08-11 07:00:46,630 INFO org.apache.hadoop.ha.ActiveStandbyElector: But old node has our own data, so don't need to fence it. {code} - The ZKFC1 sets the new ZK data, and fires a NN1 RPC call of transitionToActive {code} 2016-08-11 07:00:46,630 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /hadoop-ha/nameservice1/ActiveBreadCrumb to indicate that the local node is the most recent active... 2016-08-11 07:00:46,649 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 175: Call -> nn01/10.10.10.10:8022: transitionToActive {reqInfo { reqSource: REQUEST_BY_ZKFC }} {code} - At the same time as the transitionToActive call is in progress at NN1, but not complete yet, the ZK session of ZKFC1 is timed out by ZK Quorum, and a watch notification is sent to ZKFC2 {code} 2016-08-11 07:01:00,003 DEBUG org.apache.zookeeper.ClientCnxn: Got notification sessionid:0x4566f0c97500bde 2016-08-11 07:01:00,004 DEBUG org.apache.zookeeper.ClientCnxn: Got WatchedEvent state:SyncConnected type:NodeDeleted path:/hadoop-ha/nameservice1/ActiveStandbyElectorLock for sessionid 0x4566f0c97500bde {code} - ZKFC2 responds by marking NN2 as standby, which succeeds (NN hasn't handled transitionToActive call yet due to busy status, but has handled transitionToStandby before it) {code} 2016-08-11 07:01:00,013 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced... 2016-08-11 07:01:00,018 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at nn01/10.10.10.10:8022 2016-08-11 07:01:00,020 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 412: Call -> nn01/10.10.10.10:8022: transitionToStandby {reqInfo { reqSource: REQUEST_BY_ZKFC }} 2016-08-11 07:01:03,880 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: transitionToStandby took 3860ms {code} - ZKFC2 then marks NN2 as active, and NN2 begins its transition (is in midst of it, not done yet at this point) {code} 2016-08-11 07:01:03,894 INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at nn02/11.11.11.11:8022 active... 
2016-08-11 07:01:03,895 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 412: Call -> nn02/11.11.11.11:8022: transitionToActive {reqInfo { reqSource: REQUEST_BY_ZKFC }} … {code} {code} 2016-08-11 07:01:09,558 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state … 2016-08-11 07:01:19,968 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing edit logs at txnid 5635 {code} - At the same time in parallel NN1 processes the transitionToActive requests finally, and becomes active {code} 2016-08-11 07:01:13,281 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state … 2016-08-11 07:01:19,599 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing edit logs at txnid 5635 … 2016-08-11 07:01:19,602 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 5635 {code} - NN2's active transition fails as a result of this parallel active transition on NN1 which has completed right before it tries to take over {code} 2016-08-11 07:01:19,968 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing edit logs at txnid 5635 2016-08-11 07:01:22,799 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Error encountered requiring NN
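One way to harden against this class of race, sketched loosely below; this is an illustration of the general idea (epoch-style request fencing so a NameNode can reject a stale, redundant transition request), not the fix that was eventually committed:

{code}
// Hypothetical guard: tag each ZKFC election win with a monotonically
// increasing epoch, and have the NameNode reject transition requests whose
// epoch is not newer than the last one it honored.
public class TransitionGuard {
  private long lastHonoredEpoch = -1;

  public synchronized void transitionToActive(long requestEpoch)
      throws java.io.IOException {
    if (requestEpoch <= lastHonoredEpoch) {
      throw new java.io.IOException(
          "Rejecting stale transitionToActive (epoch " + requestEpoch
              + " <= " + lastHonoredEpoch + ")");
    }
    lastHonoredEpoch = requestEpoch;
    // ... proceed with the actual active-state transition ...
  }
}
{code}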
[jira] [Created] (HADOOP-13056) Print expected values when rejecting a server's determined principal
Harsh J created HADOOP-13056: Summary: Print expected values when rejecting a server's determined principal Key: HADOOP-13056 URL: https://issues.apache.org/jira/browse/HADOOP-13056 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.5.0 Reporter: Harsh J Assignee: Harsh J Priority: Trivial When a service principal that a client constructs from the server address does not match a provided pattern or the configured principal property, the error is very uninformative about what the specific cause is. Currently the only error printed, in both cases, is: {code} java.lang.IllegalArgumentException: Server has invalid Kerberos principal: hdfs/host.internal@REALM {code}
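A sketch of the more informative message being asked for; the parameter names are taken from the description above and the exact wording is illustrative:

{code}
// Sketch: include the expected values alongside the rejected principal.
static void rejectPrincipal(String serverPrincipal, String pattern,
    String configuredPrincipal) {
  throw new IllegalArgumentException(
      "Server has invalid Kerberos principal: " + serverPrincipal
          + "; expected it to match pattern '" + pattern
          + "' or configured principal '" + configuredPrincipal + "'");
}
{code}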
[jira] [Created] (HADOOP-13051) Test for special characters in path being respected during globPaths
Harsh J created HADOOP-13051: Summary: Test for special characters in path being respected during globPaths Key: HADOOP-13051 URL: https://issues.apache.org/jira/browse/HADOOP-13051 Project: Hadoop Common Issue Type: Test Components: fs Affects Versions: 3.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor On {{branch-2}}, the below is the (incorrect) behaviour today, where paths with special characters get dropped during globStatus calls: {code} bin/hdfs dfs -mkdir /foo bin/hdfs dfs -touchz /foo/foo1 bin/hdfs dfs -touchz $'/foo/foo1\r' bin/hdfs dfs -ls /foo -rw-r--r-- 3 harsh supergroup 0 2016-04-22 17:35 /foo/foo1 -rw-r--r-- 3 harsh supergroup 0 2016-04-22 17:35 /foo/foo1^M bin/hdfs dfs -ls '/foo/*' -rw-r--r-- 3 harsh supergroup 0 2016-04-22 17:35 /foo/foo1 {code} Whereas trunk has the right behaviour, subtly fixed via the pattern library change of HADOOP-12436: {code} bin/hdfs dfs -mkdir /foo bin/hdfs dfs -touchz /foo/foo1 bin/hdfs dfs -touchz $'/foo/foo1\r' bin/hdfs dfs -ls /foo -rw-r--r-- 3 harsh supergroup 0 2016-04-22 17:35 /foo/foo1 -rw-r--r-- 3 harsh supergroup 0 2016-04-22 17:35 /foo/foo1^M bin/hdfs dfs -ls '/foo/*' -rw-r--r-- 3 harsh supergroup 0 2016-04-22 17:35 /foo/foo1 -rw-r--r-- 3 harsh supergroup 0 2016-04-22 17:35 /foo/foo1^M {code} (I've placed a ^M explicitly to indicate the presence of the intentional hidden character) We should still add a simple test-case to cover this situation for future regressions.
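A simple regression test along the proposed lines could look like this sketch (JUnit; the class and method names are illustrative and the FileSystem setup is assumed):

{code}
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class TestGlobSpecialCharacters {
  // Assumed to be initialized against a test cluster (e.g. in a @Before
  // method with a MiniDFSCluster); omitted here for brevity.
  private FileSystem fs;

  @Test
  public void testGlobRetainsSpecialCharacterPaths() throws Exception {
    fs.mkdirs(new Path("/foo"));
    fs.create(new Path("/foo/foo1")).close();
    fs.create(new Path("/foo/foo1\r")).close(); // intentional trailing CR
    FileStatus[] matches = fs.globStatus(new Path("/foo/*"));
    // Both files must survive the glob, including the CR-suffixed one.
    assertEquals(2, matches.length);
  }
}
{code}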
[jira] [Created] (HADOOP-12970) Intermittent signature match failures in S3AFileSystem due to connection closure
Harsh J created HADOOP-12970: Summary: Intermittent signature match failures in S3AFileSystem due to connection closure Key: HADOOP-12970 URL: https://issues.apache.org/jira/browse/HADOOP-12970 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Affects Versions: 2.7.0 Reporter: Harsh J Assignee: Harsh J S3AFileSystem's use of the {{ObjectMetadata#clone()}} method inside the {{copyFile}} implementation may fail in circumstances where the connection used for obtaining the metadata is closed by the server (i.e. the response carries a {{Connection: close}} header). Due to this header not being stripped away when the {{ObjectMetadata}} is created, and due to us cloning it for use in the next {{CopyObjectRequest}}, it causes the request to use {{Connection: close}} headers as a part of itself. This causes signer-related exceptions because the client now includes the {{Connection}} header as part of the {{SignedHeaders}}, but the S3 server does not receive the same value for it ({{Connection}} headers are likely stripped away before the S3 Server tries to match signature hashes), causing a failure like below: {code} 2016-03-29 19:59:30,120 DEBUG [s3a-transfer-shared--pool1-t35] org.apache.http.wire: >> "Authorization: AWS4-HMAC-SHA256 Credential=XXX/20160329/eu-central-1/s3/aws4_request, SignedHeaders=accept-ranges;connection;content-length;content-type;etag;host;last-modified;user-agent;x-amz-acl;x-amz-content-sha256;x-amz-copy-source;x-amz-date;x-amz-metadata-directive;x-amz-server-side-encryption;x-amz-version-id, Signature=MNOPQRSTUVWXYZ[\r][\n]" … com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we calculated does not match the signature you provided. Check your key and signing method. (Service: Amazon S3; Status Code: 403; Error Code: SignatureDoesNotMatch; Request ID: ABC), S3 Extended Request ID: XYZ {code} This is intermittent because the S3 Server does not always add a {{Connection: close}} directive in its response, but whenever we receive it AND we clone it, the above exception happens for the copy request. The copy request is often used in the context of FileOutputCommitter, when a lot of the MR attempt files on the {{s3a://}} destination filesystem are to be moved to their parent directories post-commit. I've also submitted a fix upstream with the AWS Java SDK to strip out the {{Connection}} headers when dealing with {{ObjectMetadata}}, which is pending acceptance and release at: https://github.com/aws/aws-sdk-java/pull/669, but until that release is available and can be used by us, we'll need to work around the clone approach by manually excluding the {{Connection}} header (not straightforward due to the {{metadata}} object being private with no mutable access). We can remove such a change in future when there's a release available with the upstream fix.
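Until the SDK fix lands, the described workaround could look roughly like this sketch, built only on the public {{ObjectMetadata}} accessors (not necessarily the exact committed code):

{code}
import java.util.Map;

import com.amazonaws.services.s3.model.ObjectMetadata;

public class MetadataCopier {
  // Copy ObjectMetadata for reuse in a CopyObjectRequest, dropping the
  // hop-by-hop Connection header that breaks request signing.
  static ObjectMetadata copyWithoutConnection(ObjectMetadata source) {
    ObjectMetadata copy = new ObjectMetadata();
    for (Map.Entry<String, Object> e : source.getRawMetadata().entrySet()) {
      if (!"Connection".equalsIgnoreCase(e.getKey())) {
        copy.setHeader(e.getKey(), e.getValue());
      }
    }
    copy.setUserMetadata(source.getUserMetadata());
    return copy;
  }
}
{code}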
[jira] [Created] (HADOOP-12894) Add yarn.app.mapreduce.am.log.level to mapred-default.xml
Harsh J created HADOOP-12894: Summary: Add yarn.app.mapreduce.am.log.level to mapred-default.xml Key: HADOOP-12894 URL: https://issues.apache.org/jira/browse/HADOOP-12894 Project: Hadoop Common Issue Type: Improvement Components: documentation Affects Versions: 2.9.0 Reporter: Harsh J Assignee: Harsh J Priority: Trivial
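The addition would presumably be a standard mapred-default.xml entry along these lines; the description text here is a sketch, with INFO being the code-side default as far as I know:

{code}
<property>
  <name>yarn.app.mapreduce.am.log.level</name>
  <value>INFO</value>
  <description>The logging level for the MR ApplicationMaster.</description>
</property>
{code}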
[jira] [Created] (HADOOP-12549) Extend HDFS-7546 default generically to all pattern lookups
Harsh J created HADOOP-12549: Summary: Extend HDFS-7546 default generically to all pattern lookups Key: HADOOP-12549 URL: https://issues.apache.org/jira/browse/HADOOP-12549 Project: Hadoop Common Issue Type: Improvement Components: ipc, security Affects Versions: 2.7.1 Reporter: Harsh J Assignee: Harsh J Priority: Minor In HDFS-7546 we added an hdfs-default.xml property to bring back the regular behaviour of trusting all principals (as was the case before HADOOP-9789). However, the change only targeted HDFS users, and also only those that used the default-loading mechanism of the Configuration class (i.e. not {{new Configuration(false)}} users). I'd like to propose adding the same default to the generic RPC client code also, so the default affects all forms of clients equally.
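For context, the HDFS-side default added by HDFS-7546 looks like the entry below; the proposal is to honor an equivalent default in the generic RPC client code too. The HDFS property shown is the existing one; the description wording is a sketch:

{code}
<property>
  <name>dfs.namenode.kerberos.principal.pattern</name>
  <value>*</value>
  <description>Accept any server principal by default, restoring the
  pre-HADOOP-9789 behaviour for clients connecting across realms.</description>
</property>
{code}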
[jira] [Resolved] (HADOOP-9461) JobTracker and NameNode both grant delegation tokens to non-secure clients
[ https://issues.apache.org/jira/browse/HADOOP-9461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-9461. - Resolution: Won't Fix Not an issue on trunk/branch-2. JobTracker and NameNode both grant delegation tokens to non-secure clients -- Key: HADOOP-9461 URL: https://issues.apache.org/jira/browse/HADOOP-9461 Project: Hadoop Common Issue Type: Bug Components: security Reporter: Harsh J Assignee: Harsh J Priority: Minor If one looks at the MAPREDUCE-1516-added logic in JobTracker.java's isAllowedDelegationTokenOp() method, and applies the non-secure states of UGI.isSecurityEnabled == false and authMethod == SIMPLE, the return result is true when the intention is false (due to the short-circuited conditionals). This allows non-secure JobClients to easily request and use DelegationTokens, and causes unwanted errors to be printed in the JobTracker when the renewer attempts to run. Ideally such clients ought to get an error if they request a DT in non-secure mode. HDFS in trunk and branch-1 both have the same problem. Trunk MR (HistoryServer) and YARN are, however, unaffected due to a simpler, inlined logic instead of reuse of this faulty method. Note that fixing this will break Oozie today, due to the merged logic of OOZIE-734. Oozie will require a fix as well if this is to be fixed in branch-1. As a result, I'm going to mark this as an Incompatible Change.
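A sketch of the stricter check being described; this is an illustrative rewrite of the faulty method's logic, not the committed patch:

{code}
import java.io.IOException;

import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.UserGroupInformation.AuthenticationMethod;

public class DelegationTokenOpCheck {
  static boolean isAllowedDelegationTokenOp(AuthenticationMethod authMethod)
      throws IOException {
    // Reject outright when security is off, instead of falling through the
    // short-circuited conditionals and returning true.
    if (!UserGroupInformation.isSecurityEnabled()) {
      return false;
    }
    return authMethod == AuthenticationMethod.KERBEROS
        || authMethod == AuthenticationMethod.KERBEROS_SSL
        || authMethod == AuthenticationMethod.CERTIFICATE;
  }
}
{code}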
[jira] [Created] (HADOOP-11512) Use getTrimmedStrings when reading serialization keys
Harsh J created HADOOP-11512: Summary: Use getTrimmedStrings when reading serialization keys Key: HADOOP-11512 URL: https://issues.apache.org/jira/browse/HADOOP-11512 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 2.6.0 Reporter: Harsh J Priority: Minor In the file {{hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/serializer/SerializationFactory.java}}, we grab the IO_SERIALIZATIONS_KEY config via Configuration#getStrings(…), which does not trim the input. This could cause confusing user issues if someone manually overrides the key in the XML files/Configuration object without using the dynamic approach. The call should instead use Configuration#getTrimmedStrings(…), so the whitespace is trimmed before the class names are searched for on the classpath.
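The proposed change is essentially a one-line swap in SerializationFactory; a diff-style sketch:

{code}
// Before: whitespace in a manually-edited value breaks classpath lookups.
// String[] serializations = conf.getStrings(
//     CommonConfigurationKeys.IO_SERIALIZATIONS_KEY);

// After: trim each entry before resolving the class names.
String[] serializations = conf.getTrimmedStrings(
    CommonConfigurationKeys.IO_SERIALIZATIONS_KEY);
{code}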
[jira] [Created] (HADOOP-11488) Difference in default connection timeout for S3A FS
Harsh J created HADOOP-11488: Summary: Difference in default connection timeout for S3A FS Key: HADOOP-11488 URL: https://issues.apache.org/jira/browse/HADOOP-11488 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Affects Versions: 2.6.0 Reporter: Harsh J Priority: Minor The core-default.xml defines fs.s3a.connection.timeout as 5000, and the code under hadoop-tools/hadoop-aws defines it as 50000. We should update the former to 50s so it gets applied properly, as we're also noticing that 5s is often too low, especially in cases such as large DistCp operations (which fail with {{Read timed out}} errors from the S3 service).
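The fix would then be to align the core-default.xml entry with the code; a sketch:

{code}
<property>
  <name>fs.s3a.connection.timeout</name>
  <value>50000</value>
  <description>Socket connection timeout in milliseconds.</description>
</property>
{code}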
[jira] [Created] (HADOOP-11224) Improve error messages for all permission related failures
Harsh J created HADOOP-11224: Summary: Improve error messages for all permission related failures Key: HADOOP-11224 URL: https://issues.apache.org/jira/browse/HADOOP-11224 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.2.0 Reporter: Harsh J Priority: Trivial If a bad file create request fails, you get a juicy error that almost fully self-describes the reason: {code}Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode=/:hdfs:supergroup:drwxr-xr-x{code} However, if a setPermission fails, one only gets a vague: {code}Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied{code} It would be nicer if all forms of permission failures logged the accessed inode and its current ownership and permissions in the same way.
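A sketch of the richer message a failed setPermission could raise, mirroring what the create path already prints; the helper and parameter names are illustrative:

{code}
import org.apache.hadoop.security.AccessControlException;

public class PermissionErrors {
  // Include the inode and its owner/group/mode in every permission failure,
  // not just in the create/write path.
  static void throwAccessDenied(String user, String access, String path,
      String owner, String group, String mode) throws AccessControlException {
    throw new AccessControlException(
        "Permission denied: user=" + user + ", access=" + access
            + ", inode=" + path + ":" + owner + ":" + group + ":" + mode);
  }
}
{code}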
[jira] [Resolved] (HADOOP-8719) Workaround for kerberos-related log errors upon running any hadoop command on OSX
[ https://issues.apache.org/jira/browse/HADOOP-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-8719. - Resolution: Fixed When this was committed, OSX was not a targeted platform for security or native support. If that has changed recently, let's revert this fix over a new JIRA - I see no issues with doing that. The fix here merely got rid of a verbose warning appearing unnecessarily on unsecured pseudo-distributed clusters running on OSX. Re-resolving. Thanks! Workaround for kerberos-related log errors upon running any hadoop command on OSX - Key: HADOOP-8719 URL: https://issues.apache.org/jira/browse/HADOOP-8719 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.0-alpha Environment: Mac OS X 10.7, Java 1.6.0_26 Reporter: Jianbin Wei Priority: Trivial Fix For: 3.0.0 Attachments: HADOOP-8719.patch, HADOOP-8719.patch, HADOOP-8719.patch, HADOOP-8719.patch When starting Hadoop on OS X 10.7 (Lion) using start-all.sh, Hadoop logs the following errors: 2011-07-28 11:45:31.469 java[77427:1a03] Unable to load realm info from SCDynamicStore Hadoop does seem to function properly despite this. The workaround takes only 10 minutes. There are numerous discussions about this: googling "Unable to load realm mapping info from SCDynamicStore" returns 1770 hits, each with many discussions. Assuming each discussion takes only 5 minutes, a 10-minute fix can save ~150 hours. This does not count the wider searching around this issue and its solution/workaround, which can easily hit (wasted) thousands of hours!
[jira] [Resolved] (HADOOP-10707) support bzip2 in python avro tool
[ https://issues.apache.org/jira/browse/HADOOP-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-10707. -- Resolution: Invalid Moved to AVRO-1527 support bzip2 in python avro tool - Key: HADOOP-10707 URL: https://issues.apache.org/jira/browse/HADOOP-10707 Project: Hadoop Common Issue Type: Improvement Components: tools Reporter: Eustache Priority: Minor Labels: avro The Python tool to decode avro files is currently missing support for bzip2 compression.
[jira] [Created] (HADOOP-10572) Example NFS mount command must pass noacl as it isn't supported by the server yet
Harsh J created HADOOP-10572: Summary: Example NFS mount command must pass noacl as it isn't supported by the server yet Key: HADOOP-10572 URL: https://issues.apache.org/jira/browse/HADOOP-10572 Project: Hadoop Common Issue Type: Improvement Components: nfs Affects Versions: 2.4.0 Reporter: Harsh J Priority: Trivial Use of the documented default mount command results in the below server-side WARN log event, because the client tries to locate the ACL program (#100227): {code} 12:26:11.975 AM TRACE org.apache.hadoop.oncrpc.RpcCall Xid:-1114380537, messageType:RPC_CALL, rpcVersion:2, program:100227, version:3, procedure:0, credential:(AuthFlavor:AUTH_NONE), verifier:(AuthFlavor:AUTH_NONE) 12:26:11.976 AM TRACE org.apache.hadoop.oncrpc.RpcProgram NFS3 procedure #0 12:26:11.976 AM WARN org.apache.hadoop.oncrpc.RpcProgram Invalid RPC call program 100227 {code} The client mount command must pass {{noacl}} to avoid this.
[jira] [Resolved] (HADOOP-10002) Tool's config option wouldn't work on secure clusters
[ https://issues.apache.org/jira/browse/HADOOP-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-10002. -- Resolution: Duplicate Fix Version/s: 2.0.3-alpha Sorry about the noise. This should be fixed by HADOOP-9021 - turns out I wasn't looking at the right 2.0.x sources when debugging this. Tool's config option wouldn't work on secure clusters - Key: HADOOP-10002 URL: https://issues.apache.org/jira/browse/HADOOP-10002 Project: Hadoop Common Issue Type: Bug Components: security, util Affects Versions: 2.0.0-alpha Reporter: Harsh J Priority: Minor Fix For: 2.0.3-alpha The Tool framework provides a way for clients to run without classpath *-site.xml configs, by letting users pass a -conf file to parse into the app's Configuration object. In a secure cluster config setup, such a runner will not work because the UserGroupInformation.isSecurityEnabled() check, which is used in Server.java to determine what form of communication to use, statically loads a {{new Configuration()}} object to inspect whether security is turned on during its initialization; this ignores the application config object, tries to load from the classpath, and ends up loading non-secure defaults.
[jira] [Created] (HADOOP-10002) Tool's config option wouldn't work on secure clusters
Harsh J created HADOOP-10002: Summary: Tool's config option wouldn't work on secure clusters Key: HADOOP-10002 URL: https://issues.apache.org/jira/browse/HADOOP-10002 Project: Hadoop Common Issue Type: Bug Components: security, util Affects Versions: 2.0.6-alpha Reporter: Harsh J Priority: Minor The Tool framework provides a way for clients to run without classpath *-site.xml configs, by letting users pass a -conf file to parse into the app's Configuration object. In a secure cluster config setup, such a runner will not work because the UserGroupInformation.isSecurityEnabled() check, which is used in Server.java to determine what form of communication to use, statically loads a {{new Configuration()}} object to inspect whether security is turned on during its initialization; this ignores the application config object, tries to load from the classpath, and ends up loading non-secure defaults.
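One client-side workaround, sketched below, is to hand the Tool's Configuration to UGI explicitly before any RPC is made, so the static security check sees the -conf contents; the class name here is hypothetical:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.util.Tool;

// Sketch: a Tool base class that pushes its (possibly -conf supplied)
// Configuration into UserGroupInformation before touching secured services.
public abstract class SecureAwareTool extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    UserGroupInformation.setConfiguration(conf);
    return doRun(args);
  }

  protected abstract int doRun(String[] args) throws Exception;
}
{code}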
[jira] [Reopened] (HADOOP-9878) getting rid of all the 'bin/../' from all the paths
[ https://issues.apache.org/jira/browse/HADOOP-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened HADOOP-9878: - getting rid of all the 'bin/../' from all the paths --- Key: HADOOP-9878 URL: https://issues.apache.org/jira/browse/HADOOP-9878 Project: Hadoop Common Issue Type: Improvement Components: conf Reporter: kaveh minooie Priority: Trivial Fix For: 2.1.0-beta Original Estimate: 1m Remaining Estimate: 1m By simply replacing line 34 of libexec/hadoop-config.sh from: {code} export HADOOP_PREFIX=`dirname $this`/.. {code} to {code} export HADOOP_PREFIX=$( cd $config_bin/..; pwd -P ) {code} we can eliminate all the annoying 'bin/../' from the library paths and make the output of commands like ps a lot more readable, not to mention that the OS would do just a bit less work as well. I can post a patch for it as well if it is needed.
[jira] [Resolved] (HADOOP-9878) getting rid of all the 'bin/../' from all the paths
[ https://issues.apache.org/jira/browse/HADOOP-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-9878. - Resolution: Duplicate getting rid of all the 'bin/../' from all the paths --- Key: HADOOP-9878 URL: https://issues.apache.org/jira/browse/HADOOP-9878 Project: Hadoop Common Issue Type: Improvement Components: conf Reporter: kaveh minooie Priority: Trivial Original Estimate: 1m Remaining Estimate: 1m By simply replacing line 34 of libexec/hadoop-config.sh from: {code} export HADOOP_PREFIX=`dirname $this`/.. {code} to {code} export HADOOP_PREFIX=$( cd $config_bin/..; pwd -P ) {code} we can eliminate all the annoying 'bin/../' from the library paths and make the output of commands like ps a lot more readable, not to mention that the OS would do just a bit less work as well. I can post a patch for it as well if it is needed.
[jira] [Resolved] (HADOOP-9346) Upgrading to protoc 2.5.0 fails the build
[ https://issues.apache.org/jira/browse/HADOOP-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-9346. - Resolution: Duplicate Thanks for pinging, Ravi. I'd discussed with Alejandro that this could be closed. Looks like we added a dupe link but failed to close. Closing now. Upgrading to protoc 2.5.0 fails the build - Key: HADOOP-9346 URL: https://issues.apache.org/jira/browse/HADOOP-9346 Project: Hadoop Common Issue Type: Task Components: build Affects Versions: 3.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Labels: protobuf Attachments: HADOOP-9346.patch Reported over the Impala lists; one of the errors received is: {code} src/hadoop-common-project/hadoop-common/target/generated-sources/java/org/apache/hadoop/ha/proto/ZKFCProtocolProtos.java:[104,37] cannot find symbol. symbol: class Parser location: package com.google.protobuf {code} Worth looking into as we'll eventually someday bump our protobuf deps.
[jira] [Created] (HADOOP-9861) Invert ReflectionUtils' stack trace
Harsh J created HADOOP-9861: --- Summary: Invert ReflectionUtils' stack trace Key: HADOOP-9861 URL: https://issues.apache.org/jira/browse/HADOOP-9861 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.0.5-alpha Reporter: Harsh J Often an MR task (as an example) may fail at the configure stage due to a misconfiguration or the like, and the only thing a user gets, by virtue of MR pulling limited bytes of the diagnostic error data, is the top part of the stacktrace: {code} java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) {code} This is absolutely useless to a user, who then goes ahead and blames the framework for having an issue, rather than thinking (non-intuitively) to go see the whole task log for the full trace, especially the last part. Hundreds of times it's been a mere class that's missing, etc., but there's just too much pain involved here to troubleshoot. It would be much, much better if we inverted the trace. For example, here's what Hive could return back if we did so, for a random trouble I pulled from the web: {code} java.lang.RuntimeException: Error in configuring object Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:64) at java.lang.String.valueOf(String.java:2826) at java.lang.StringBuilder.append(StringBuilder.java:115) at org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:110) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407) at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:563) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:100) ... 22 more {code} This way the user can at least be sure what part's really failing, and not get lost trying to work their way through reflection utils and upwards/downwards.
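A loose sketch of the inversion idea; this is a hypothetical helper, not committed code:

{code}
// Skip intermediate reflective wrappers so the root cause appears directly
// under the top-level message, as in the Hive example above.
static RuntimeException invert(Throwable wrapper) {
  Throwable root = wrapper;
  while (root.getCause() != null) {
    root = root.getCause();
  }
  return new RuntimeException("Error in configuring object", root);
}
{code}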
[jira] [Created] (HADOOP-9567) Provide auto-renewal for keytab based logins
Harsh J created HADOOP-9567: --- Summary: Provide auto-renewal for keytab based logins Key: HADOOP-9567 URL: https://issues.apache.org/jira/browse/HADOOP-9567 Project: Hadoop Common Issue Type: Improvement Components: security Affects Versions: 2.0.0-alpha Reporter: Harsh J Priority: Minor We do a renewal for cached tickets (obtained via kinit before using a Hadoop application), but we explicitly seem to avoid doing a renewal for keytab-based logins (done from within the client code), when we could do that as well via a similar thread.
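A sketch of the proposed renewal thread, assuming a keytab login was already performed via UserGroupInformation.loginUserFromKeytab(...); the interval is an assumption, and a real patch would derive it from the ticket lifetime:

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.security.UserGroupInformation;

public class KeytabRenewer {
  public static void start() {
    ScheduledExecutorService renewer =
        Executors.newSingleThreadScheduledExecutor();
    renewer.scheduleWithFixedDelay(() -> {
      try {
        // No-op unless the TGT is near expiry, so calling often is cheap.
        UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
      } catch (Exception e) {
        System.err.println("Keytab re-login attempt failed: " + e);
      }
    }, 1, 1, TimeUnit.HOURS);
  }
}
{code}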
[jira] [Resolved] (HADOOP-9510) DU command should provide a -h flag to display a more human readable format.
[ https://issues.apache.org/jira/browse/HADOOP-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-9510. - Resolution: Not A Problem This is already available in the revamped shell apps under 2.x releases today; see http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#du DU command should provide a -h flag to display a more human readable format. Key: HADOOP-9510 URL: https://issues.apache.org/jira/browse/HADOOP-9510 Project: Hadoop Common Issue Type: Improvement Reporter: Corey J. Nolet Priority: Minor Would be useful to have the sizes print out as 500M or 3.4G instead of bytes only.
[jira] [Resolved] (HADOOP-9496) Bad merge of HADOOP-9450 on branch-2 breaks all bin/hadoop calls that need HADOOP_CLASSPATH
[ https://issues.apache.org/jira/browse/HADOOP-9496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-9496. - Resolution: Fixed Fix Version/s: 2.0.5-beta Committed revision 1471230 to fix this properly. Bad merge of HADOOP-9450 on branch-2 breaks all bin/hadoop calls that need HADOOP_CLASSPATH Key: HADOOP-9496 URL: https://issues.apache.org/jira/browse/HADOOP-9496 Project: Hadoop Common Issue Type: Bug Components: bin Affects Versions: 2.0.5-beta Reporter: Gopal V Assignee: Harsh J Priority: Critical Fix For: 2.0.5-beta Attachments: HADOOP-9496.patch Merge of HADOOP-9450 to branch-2 is broken for hadoop-config.sh on trunk http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh?r1=1453486&r2=1469214&pathrev=1469214 vs on branch-2 http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh?r1=1390222&r2=1469215 This is breaking all hadoop client code which needs HADOOP_CLASSPATH to be set correctly.
[jira] [Created] (HADOOP-9461) JobTracker and NameNode both grant delegation tokens to non-secure clients
Harsh J created HADOOP-9461: --- Summary: JobTracker and NameNode both grant delegation tokens to non-secure clients Key: HADOOP-9461 URL: https://issues.apache.org/jira/browse/HADOOP-9461 Project: Hadoop Common Issue Type: Bug Components: security Reporter: Harsh J Assignee: Harsh J Priority: Minor If one looks at the MAPREDUCE-1516-added logic in JobTracker.java's isAllowedDelegationTokenOp() method, and applies the non-secure states of UGI.isSecurityEnabled == false and authMethod == SIMPLE, the return result is true when the intention is false (due to the short-circuited conditionals). This allows non-secure JobClients to easily request and use DelegationTokens, and causes unwanted errors to be printed in the JobTracker when the renewer attempts to run. Ideally such clients ought to get an error if they request a DT in non-secure mode. HDFS in trunk and branch-1 both have the same problem. Trunk MR (HistoryServer) and YARN are, however, unaffected due to a simpler, inlined logic instead of reuse of this faulty method. Note that fixing this will break Oozie today, due to the merged logic of OOZIE-734. Oozie will require a fix as well if this is to be fixed in branch-1. As a result, I'm going to mark this as an Incompatible Change.
[jira] [Resolved] (HADOOP-2781) Hadoop/Groovy integration
[ https://issues.apache.org/jira/browse/HADOOP-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-2781. - Resolution: Won't Fix Closing per the comment below, as this has been inactive for a couple of years now: bq. Grool was a dead end. Possible alternatives (given FlumeJava's mention): Apache Crunch - http://crunch.apache.org and/or Cascading - http://cascading.org. Hadoop/Groovy integration - Key: HADOOP-2781 URL: https://issues.apache.org/jira/browse/HADOOP-2781 Project: Hadoop Common Issue Type: New Feature Environment: Any Reporter: Ted Dunning Attachments: trunk.tgz This is a place-holder issue to hold the initial release of the Groovy integration for Hadoop. The goal is to be able to write very simple map-reduce programs in just a few lines of code in a functional style. Word count should be less than 5 lines of code!
[jira] [Created] (HADOOP-9424) The hadoop jar invocation should include the passed jar on the classpath as a whole
Harsh J created HADOOP-9424: --- Summary: The hadoop jar invocation should include the passed jar on the classpath as a whole Key: HADOOP-9424 URL: https://issues.apache.org/jira/browse/HADOOP-9424 Project: Hadoop Common Issue Type: Bug Components: util Affects Versions: 2.0.3-alpha Reporter: Harsh J Assignee: Harsh J Priority: Minor When you have a case such as this: {{X.jar - Classes = Main, Foo}} {{Y.jar - Classes = Bar}} With implementation details such as: * Main references Bar and invokes a public, static method on it. * Bar does a class lookup to find Foo (Class.forName(Foo)). Then when you do a {{HADOOP_CLASSPATH=Y.jar hadoop jar X.jar Main}}, Bar's method fails with a ClassNotFound exception because of the way RunJar runs. RunJar extracts the passed jar and includes its contents on the ClassLoader of its current thread, but the {{Class.forName(…)}} call from another class does not check that class loader and hence cannot find the class, as it's not on any classpath it is aware of. The hadoop jar script should ideally add the passed jar argument to the CLASSPATH before RunJar is invoked, for this above case to pass.
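For completeness, the failing lookup can also be made to work from the library side by consulting the thread's context class loader, which RunJar does populate; a sketch using the Foo/Bar names from the description above:

{code}
// Inside Bar (shipped in Y.jar): resolve Foo against the context class
// loader that RunJar populates with X.jar's extracted contents.
static Class<?> findFoo() throws ClassNotFoundException {
  return Class.forName("Foo", true,
      Thread.currentThread().getContextClassLoader());
}
{code}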
[jira] [Resolved] (HADOOP-6942) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/HADOOP-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-6942. - Resolution: Duplicate Fixed via MAPREDUCE-1938. Closing as dupe. Ability for having user's classes take precedence over the system classes for tasks' classpath -- Key: HADOOP-6942 URL: https://issues.apache.org/jira/browse/HADOOP-6942 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: 0.22.0 Reporter: Krishna Ramachandran Attachments: HADOOP-6942.y20.patch, hadoop-common-6942.patch Fix the bin/hadoop script to facilitate MAPREDUCE-1938.
[jira] [Created] (HADOOP-9346) Upgrading to protoc 2.5.0 fails the build
Harsh J created HADOOP-9346: --- Summary: Upgrading to protoc 2.5.0 fails the build Key: HADOOP-9346 URL: https://issues.apache.org/jira/browse/HADOOP-9346 Project: Hadoop Common Issue Type: Task Reporter: Harsh J Priority: Minor Reported over the Impala lists; one of the errors received is: {code} src/hadoop-common-project/hadoop-common/target/generated-sources/java/org/apache/hadoop/ha/proto/ZKFCProtocolProtos.java:[104,37] cannot find symbol. symbol: class Parser location: package com.google.protobuf {code} Worth looking into as we'll eventually someday bump our protobuf deps.
[jira] [Created] (HADOOP-9322) LdapGroupsMapping doesn't seem to set a timeout for its directory search
Harsh J created HADOOP-9322: --- Summary: LdapGroupsMapping doesn't seem to set a timeout for its directory search Key: HADOOP-9322 URL: https://issues.apache.org/jira/browse/HADOOP-9322 Project: Hadoop Common Issue Type: Improvement Components: security Affects Versions: 2.0.3-alpha Reporter: Harsh J Priority: Minor We don't appear to be setting a timeout via http://docs.oracle.com/javase/6/docs/api/javax/naming/directory/SearchControls.html#setTimeLimit(int) before we search with http://docs.oracle.com/javase/6/docs/api/javax/naming/directory/DirContext.html#search(javax.naming.Name,%20java.lang.String,%20javax.naming.directory.SearchControls). This may occasionally lead to some unwanted NN pauses due to lock-holding on the operations that do group lookups. It is better to define a timeout than to rely on 0 (infinite wait).
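A sketch of the proposed change; the 10-second figure is an assumed default that a real patch would make configurable:

{code}
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

public class BoundedLdapSearch {
  // Bound the directory search instead of the default 0 (infinite wait).
  static NamingEnumeration<SearchResult> searchWithTimeout(DirContext ctx,
      String baseDn, String filter) throws NamingException {
    SearchControls controls = new SearchControls();
    controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
    controls.setTimeLimit(10_000); // milliseconds
    return ctx.search(baseDn, filter, controls);
  }
}
{code}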
[jira] [Reopened] (HADOOP-9241) DU refresh interval is not configurable
[ https://issues.apache.org/jira/browse/HADOOP-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened HADOOP-9241: - Thanks Nicholas; I have reverted HADOOP-9241 from trunk and branch-2. I will attach a proper patch now. DU refresh interval is not configurable --- Key: HADOOP-9241 URL: https://issues.apache.org/jira/browse/HADOOP-9241 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.2-alpha Reporter: Harsh J Assignee: Harsh J Priority: Trivial Fix For: 2.0.3-alpha Attachments: HADOOP-9241.patch While the {{DF}} class's refresh interval is configurable, the {{DU}}'s isn't. We should ensure both be configurable.
[jira] [Created] (HADOOP-9257) HADOOP-9241 changed DN's default DU interval to 1m instead of 10m accidentally
Harsh J created HADOOP-9257: --- Summary: HADOOP-9241 changed DN's default DU interval to 1m instead of 10m accidentally Key: HADOOP-9257 URL: https://issues.apache.org/jira/browse/HADOOP-9257 Project: Hadoop Common Issue Type: Bug Components: util Affects Versions: 2.0.3-alpha Reporter: Harsh J Assignee: Harsh J Suresh caught this on HADOOP-9241: {quote} Even for trivial jiras, I suggest getting the code review done before committing the code. Such changes are easy and quick to review. In this patch, did DU interval become 1 minute instead of 10 minutes? {code}
-    this(path, 600000L);
-    //10 minutes default refresh interval
+    this(path, conf.getLong(CommonConfigurationKeys.FS_DU_INTERVAL_KEY,
+        CommonConfigurationKeys.FS_DU_INTERVAL_DEFAULT));

+  /** See <a href="{@docRoot}/../core-default.html">core-default.xml</a> */
+  public static final String FS_DU_INTERVAL_KEY = "fs.du.interval";
+  /** Default value for FS_DU_INTERVAL_KEY */
+  public static final long FS_DU_INTERVAL_DEFAULT = 60000;
{code} {quote}
[jira] [Created] (HADOOP-9241) DU refresh interval is not configurable
Harsh J created HADOOP-9241: --- Summary: DU refresh interval is not configurable Key: HADOOP-9241 URL: https://issues.apache.org/jira/browse/HADOOP-9241 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.2-alpha Reporter: Harsh J Priority: Trivial While the {{DF}} class's refresh interval is configurable, the {{DU}}'s isn't. We should ensure both be configurable.
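Once configurable, the knob is set like any other core-site property; fs.du.interval with a 10-minute (600000 ms) default is what the eventual change used, though the description text here is a sketch:

{code}
<property>
  <name>fs.du.interval</name>
  <value>600000</value>
  <description>Disk-usage (du) refresh interval in milliseconds.</description>
</property>
{code}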
[jira] [Created] (HADOOP-9243) Some improvements to the mailing lists webpage for lowering unrelated content rate
Harsh J created HADOOP-9243: --- Summary: Some improvements to the mailing lists webpage for lowering unrelated content rate Key: HADOOP-9243 URL: https://issues.apache.org/jira/browse/HADOOP-9243 Project: Hadoop Common Issue Type: Improvement Components: documentation Reporter: Harsh J Priority: Minor From Steve on HADOOP-9329: {quote} * could you add a bit of text to say user@ is not the place to discuss installation problems related to any third party products that install some variant of Hadoop on people's desktops and servers. You're the one who ends up having to bounce off all the CDH-related queries - it would help you too. * For the new Invalid JIRA link to paste into JIRA issues about this, I point to the distributions and Commercial support page on the wiki - something similar on the mailing lists page would avoid having to put any specific vendor links into the mailing lists page, and support a higher/more open update process. See http://wiki.apache.org/hadoop/InvalidJiraIssues {quote}
[jira] [Created] (HADOOP-9239) Move the general@ description to the end of lists in the mailing lists web page
Harsh J created HADOOP-9239: --- Summary: Move the general@ description to the end of lists in the mailing lists web page Key: HADOOP-9239 URL: https://issues.apache.org/jira/browse/HADOOP-9239 Project: Hadoop Common Issue Type: Improvement Components: documentation Reporter: Harsh J Priority: Minor We have users unnecessarily subscribing to and abusing the general@ list, mainly because of its presence as the first option on the page http://hadoop.apache.org/mailing_lists.html, and secondarily because of its name. This is to at least address the first cause, which is bringing growing pain to its subscribers. Let's move it to the bottom of the presented list of lists.
[jira] [Resolved] (HADOOP-8274) In pseudo or cluster mode under Cygwin, the tasktracker cannot create a new job because of a symlink problem.
[ https://issues.apache.org/jira/browse/HADOOP-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-8274. - Resolution: Won't Fix For Windows, since the mainstream branch does not support it actively, I am closing this as a Won't Fix. I'm certain the same issue does not happen on the branch-1-win 1.x branch (or the branch-trunk-win branch), and I urge you to use that instead if you wish to continue using Windows for development or other usage. Find the Windows-optimized sources at http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1-win/ or http://svn.apache.org/repos/asf/hadoop/common/branches/branch-trunk-win/. In pseudo or cluster mode under Cygwin, the tasktracker cannot create a new job because of a symlink problem. - Key: HADOOP-8274 URL: https://issues.apache.org/jira/browse/HADOOP-8274 Project: Hadoop Common Issue Type: Bug Affects Versions: 0.20.205.0, 1.0.0, 1.0.1, 0.22.0 Environment: windows7+cygwin 1.7.11-1+jdk1.6.0_31+hadoop 1.0.0 Reporter: tim.wu The standalone mode is OK. But in pseudo or cluster mode, it always throws errors, even if I just run the wordcount example. HDFS works fine, but the tasktracker cannot create threads (JVMs) for new jobs. It is empty under /logs/userlogs/job-/attempt-/. The reason appears to be that on Windows, Java cannot recognize a symlink to a folder as a folder. The detailed description is as follows, == First, the error log of the tasktracker is like: == 12/03/28 14:35:13 INFO mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201203280212_0005_m_-1386636958 12/03/28 14:35:13 INFO mapred.JvmManager: JVM Runner jvm_201203280212_0005_m_-1386636958 spawned. 12/03/28 14:35:17 INFO mapred.JvmManager: JVM Not killed jvm_201203280212_0005_m_-1386636958 but just removed 12/03/28 14:35:17 INFO mapred.JvmManager: JVM : jvm_201203280212_0005_m_-1386636958 exited with exit code -1. Number of tasks it ran: 0 12/03/28 14:35:17 WARN mapred.TaskRunner: attempt_201203280212_0005_m_02_0 : Child Error java.io.IOException: Task process exit with nonzero status of -1. 
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258) 12/03/28 14:35:21 INFO mapred.TaskTracker: addFreeSlot : current free slots : 2 12/03/28 14:35:24 INFO mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201203280212_0005_m_02_1 task's state:UNASSIGNED 12/03/28 14:35:24 INFO mapred.TaskTracker: Trying to launch : attempt_201203280212_0005_m_02_1 which needs 1 slots 12/03/28 14:35:24 INFO mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201203280212_0005_m_02_1 which needs 1 slots 12/03/28 14:35:24 WARN mapred.TaskLog: Failed to retrieve stdout log for task: attempt_201203280212_0005_m_02_0 java.io.FileNotFoundException: D:\cygwin\home\timwu\hadoop-1.0.0\logs\userlogs\job_201203280212_0005\attempt_201203280212_0005_m_02_0\log.index (The system cannot find the path specified) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.<init>(FileInputStream.java:120) at org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:102) at org.apache.hadoop.mapred.TaskLog.getAllLogsFileDetails(TaskLog.java:188) at org.apache.hadoop.mapred.TaskLog$Reader.<init>(TaskLog.java:423) at org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:81) at org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:296) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at
[jira] [Resolved] (HADOOP-7386) Support concatenated bzip2 files
[ https://issues.apache.org/jira/browse/HADOOP-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-7386. - Resolution: Duplicate Thanks for confirming! Resolving as dupe. Support concatenated bzip2 files Key: HADOOP-7386 URL: https://issues.apache.org/jira/browse/HADOOP-7386 Project: Hadoop Common Issue Type: Improvement Reporter: Allen Wittenauer Assignee: Karthik Kambatla HADOOP-6835 added the framework and direct support for concatenated gzip files. We should do the same for bzip files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-8301) Common (hadoop-tools) side of MAPREDUCE-4172
[ https://issues.apache.org/jira/browse/HADOOP-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-8301. - Resolution: Won't Fix Patches were too broad and have gone stale. Will address these forms of issues over separate, smaller JIRAs in the future. Closing out parent JIRA MAPREDUCE-4172, and hence closing this out as well. Common (hadoop-tools) side of MAPREDUCE-4172 Key: HADOOP-8301 URL: https://issues.apache.org/jira/browse/HADOOP-8301 Project: Hadoop Common Issue Type: Task Components: build Affects Versions: 3.0.0 Reporter: Harsh J Assignee: Harsh J Patches on MAPREDUCE-4172 (for MR-relevant projects) that require running off of the Hadoop Common project for Hadoop QA. One sub-task per hadoop-tools submodule will be added here for reviews. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-9091) Allow daemon startup when at least 1 (or configurable) disk is in an OK state.
[ https://issues.apache.org/jira/browse/HADOOP-9091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-9091. - Resolution: Fixed This feature is already available in all our current releases via the DN volume failure toleration properties. Please see https://issues.apache.org/jira/browse/HDFS-1592. Resolving as not a problem. Please update to a release that includes it to have this addressed in your environment. Allow daemon startup when at least 1 (or configurable) disk is in an OK state. -- Key: HADOOP-9091 URL: https://issues.apache.org/jira/browse/HADOOP-9091 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 0.20.2 Reporter: Jelle Smet Labels: features, hadoop The given example is for datanode disk definitions, but it should be applicable to all configurations where a list of disks is provided. I have multiple local disks defined for a datanode:
<property>
  <name>dfs.data.dir</name>
  <value>/data/01/dfs/dn,/data/02/dfs/dn,/data/03/dfs/dn,/data/04/dfs/dn,/data/05/dfs/dn,/data/06/dfs/dn</value>
  <final>true</final>
</property>
When one of those disks breaks and is unmounted, the mountpoint (such as /data/03 in this example) becomes a regular directory which doesn't have the permissions and directory structure Hadoop is expecting. When this situation happens, the datanode fails to restart because of this, while we actually have enough disks in an OK state to proceed. The only way around this is to alter the configuration and omit that specific disk configuration. In my opinion, it would be more practical to let Hadoop daemons start when at least 1 disk/partition in the provided list is in a usable state. This prevents having to roll out custom configurations for systems which temporarily have a disk (and therefore directory layout) missing. This might also be made configurable, so that at least X partitions out of the available ones must be in an OK state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
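The toleration knob referenced above is the dfs.datanode.failed.volumes.tolerated property from the HDFS-1592 line of work. A minimal sketch of setting it programmatically (the value 2 is only an example; in practice it is set in hdfs-site.xml):
{code}
// Minimal sketch: tolerate up to 2 failed dfs.data.dir volumes at startup.
// The property name is the one introduced by the HDFS-1592 work; the value
// here is an example only.
import org.apache.hadoop.conf.Configuration;

public class VolumeTolerationExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The DataNode starts as long as no more than 2 of the configured
    // data directories are in a failed state.
    conf.setInt("dfs.datanode.failed.volumes.tolerated", 2);
    System.out.println(conf.get("dfs.datanode.failed.volumes.tolerated"));
  }
}
{code}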
[jira] [Resolved] (HADOOP-9066) Sorting for FileStatus[]
[ https://issues.apache.org/jira/browse/HADOOP-9066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-9066. - Resolution: Invalid Since HADOOP-8934 is already adding FileStatus data-based sorting in a place that matters, and this JIRA seems to just add a simple example of utilizing FileStatus comparators, am resolving this as Invalid at the moment, as the example isn't of much value (given that the Javadoc for FileStatus is already clear, and there's no use-case for this in MR, etc.) so far. Sorting for FileStatus[] Key: HADOOP-9066 URL: https://issues.apache.org/jira/browse/HADOOP-9066 Project: Hadoop Common Issue Type: Improvement Environment: java7, RedHat9, Hadoop 0.20.2, eclipse-jee-juno-linux-gtk.tar.gz Reporter: david king Labels: patch Attachments: ConcreteFileStatusAscComparable.java, ConcreteFileStatusDescComparable.java, FileStatusComparable.java, FileStatusTool.java, TestFileStatusTool.java I will submit a patch with a FileStatusTool that can be used to sort FileStatus[] via a Comparator; the Comparator can not only be custom-implemented, but the example code can also be used as-is. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
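For reference, sorting a FileStatus[] needs no special tooling; a plain java.util.Comparator does the job. A minimal sketch (not taken from the attached files) that sorts a listing by modification time:
{code}
// Minimal sketch: sort a directory listing by modification time.
import java.util.Arrays;
import java.util.Comparator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SortFileStatusExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus[] statuses = fs.listStatus(new Path(args[0]));
    Arrays.sort(statuses, new Comparator<FileStatus>() {
      @Override
      public int compare(FileStatus a, FileStatus b) {
        return Long.compare(a.getModificationTime(), b.getModificationTime());
      }
    });
    for (FileStatus s : statuses) {
      System.out.println(s.getModificationTime() + "\t" + s.getPath());
    }
  }
}
{code}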
[jira] [Created] (HADOOP-9068) Reuse (and not duplicate) globbing logic between FileSystem and FileContext
Harsh J created HADOOP-9068: --- Summary: Reuse (and not duplicate) globbing logic between FileSystem and FileContext Key: HADOOP-9068 URL: https://issues.apache.org/jira/browse/HADOOP-9068 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.0.0-alpha Reporter: Harsh J FileSystem's globbing code is currently duplicated in FileContext.Util class. We should reuse the implementation rather than maintain two pieces of it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9023) HttpFs is too restrictive on usernames
Harsh J created HADOOP-9023: --- Summary: HttpFs is too restrictive on usernames Key: HADOOP-9023 URL: https://issues.apache.org/jira/browse/HADOOP-9023 Project: Hadoop Common Issue Type: Bug Reporter: Harsh J HttpFs tries to use UserProfile.USER_PATTERN to match all usernames before a doAs impersonation function. This regex is too strict for most usernames, as it disallows any special character at all. We should relax it more or ditch needing to match things there. WebHDFS currently has no such limitations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
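To illustrate the class of problem (the pattern below is a made-up stand-in, not the actual UserProfile.USER_PATTERN): a strict alphanumeric regex rejects usernames that are perfectly legal on most systems.
{code}
// Illustration only: a hypothetical strict pattern, NOT the HttpFs regex.
import java.util.regex.Pattern;

public class UserPatternDemo {
  public static void main(String[] args) {
    Pattern strict = Pattern.compile("^[a-z_][a-z0-9_]*$"); // assumed pattern
    System.out.println(strict.matcher("alice").matches());        // true
    System.out.println(strict.matcher("svc.etl-user").matches()); // false: '.' and '-' rejected
  }
}
{code}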
[jira] [Resolved] (HADOOP-8927) org.apache.hadoop.hive.jdbc.HiveDriver loads outside of Map Reduce but fails on Map reduce
[ https://issues.apache.org/jira/browse/HADOOP-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-8927. - Resolution: Not A Problem org.apache.hadoop.hive.jdbc.HiveDriver loads outside of Map Reduce but fails on Map reduce -- Key: HADOOP-8927 URL: https://issues.apache.org/jira/browse/HADOOP-8927 Project: Hadoop Common Issue Type: Bug Components: conf, tools Affects Versions: 2.0.2-alpha Reporter: VJ Priority: Minor -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8876) SequenceFile default compression is RECORD, not BLOCK
Harsh J created HADOOP-8876: --- Summary: SequenceFile default compression is RECORD, not BLOCK Key: HADOOP-8876 URL: https://issues.apache.org/jira/browse/HADOOP-8876 Project: Hadoop Common Issue Type: Improvement Components: io Affects Versions: 2.0.0-alpha Reporter: Harsh J Currently both the SequenceFile writer and the MR defaults for SequenceFile compression default to RECORD type compression, while most recommendations are to use BLOCK for smaller end sizes instead. Should we not change the default? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
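Regardless of what the default becomes, a writer can request BLOCK compression explicitly. A minimal sketch using the long-standing createWriter overload:
{code}
// Minimal sketch: explicitly request BLOCK compression instead of relying
// on the RECORD default when creating a SequenceFile writer.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class BlockCompressedWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path(args[0]), Text.class, IntWritable.class,
        SequenceFile.CompressionType.BLOCK); // explicit, since default is RECORD
    try {
      writer.append(new Text("key"), new IntWritable(1));
    } finally {
      writer.close();
    }
  }
}
{code}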
[jira] [Created] (HADOOP-8863) Eclipse plugin may not be working on Juno due to changes in it
Harsh J created HADOOP-8863: --- Summary: Eclipse plugin may not be working on Juno due to changes in it Key: HADOOP-8863 URL: https://issues.apache.org/jira/browse/HADOOP-8863 Project: Hadoop Common Issue Type: Bug Components: contrib/eclipse-plugin Affects Versions: 1.2.0 Reporter: Harsh J Assignee: Harsh J We need to debug/investigate why it is so. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-7941) NoClassDefFoundError while running distcp/archive
[ https://issues.apache.org/jira/browse/HADOOP-7941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-7941. - Resolution: Cannot Reproduce Doesn't seem to be a problem anymore; both the hadoop and mapred scripts run these fine. Resolving as Cannot Reproduce (anymore). NoClassDefFoundError while running distcp/archive - Key: HADOOP-7941 URL: https://issues.apache.org/jira/browse/HADOOP-7941 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 0.23.1 Reporter: Ramya Sunil bin/hadoop distcp {noformat} Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/tools/DistCp Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.tools.DistCp at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:248) Could not find the main class: org.apache.hadoop.tools.DistCp. Program will exit. {noformat} Same is the case while running 'bin/hadoop archive' -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8864) Addendum to HADOOP-8840: Add a coloring case for +0 results too.
Harsh J created HADOOP-8864: --- Summary: Addendum to HADOOP-8840: Add a coloring case for +0 results too. Key: HADOOP-8864 URL: https://issues.apache.org/jira/browse/HADOOP-8864 Project: Hadoop Common Issue Type: Bug Reporter: Harsh J Assignee: Harsh J Attachments: HADOOP-8864.patch Noticed on MAPREDUCE-3223 that we failed to cover coloring the +0 case we print sometimes for doc-only patches. These can be colored green too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-8864) Addendum to HADOOP-8840: Add a coloring case for +0 results too.
[ https://issues.apache.org/jira/browse/HADOOP-8864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-8864. - Resolution: Fixed Fix Version/s: 3.0.0 Since this was a trivial addition, I went ahead and committed it to trunk. Addendum to HADOOP-8840: Add a coloring case for +0 results too. Key: HADOOP-8864 URL: https://issues.apache.org/jira/browse/HADOOP-8864 Project: Hadoop Common Issue Type: Improvement Reporter: Harsh J Assignee: Harsh J Priority: Trivial Fix For: 3.0.0 Attachments: HADOOP-8864.patch Noticed on MAPREDUCE-3223 that we failed to cover coloring the +0 case we print sometimes for doc-only patches. These can be colored green too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8838) Colorize the test-patch output sent to JIRA
Harsh J created HADOOP-8838: --- Summary: Colorize the test-patch output sent to JIRA Key: HADOOP-8838 URL: https://issues.apache.org/jira/browse/HADOOP-8838 Project: Hadoop Common Issue Type: Improvement Components: build Reporter: Harsh J Assignee: Harsh J Priority: Trivial It would be helpful to mark the -1s in red and +1s in green. Helps avoid missing stuff like the findbugs warnings, etc., that we've been bitten by. Also helps run through the results faster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8839) test-patch's -1 on @author tag presence doesn't cause a -1 to the overall result
Harsh J created HADOOP-8839: --- Summary: test-patch's -1 on @author tag presence doesn't cause a -1 to the overall result Key: HADOOP-8839 URL: https://issues.apache.org/jira/browse/HADOOP-8839 Project: Hadoop Common Issue Type: Bug Components: build Reporter: Harsh J Priority: Trivial As observed on HADOOP-8838. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8840) Fix the test-patch colorizer to cover all sorts of +1 lines.
Harsh J created HADOOP-8840: --- Summary: Fix the test-patch colorizer to cover all sorts of +1 lines. Key: HADOOP-8840 URL: https://issues.apache.org/jira/browse/HADOOP-8840 Project: Hadoop Common Issue Type: Bug Components: build Reporter: Harsh J Assignee: Harsh J As noticed by Jason on HADOOP-8838, I missed some of the entries that needed to be colorized. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8844) Add a plaintext fs -text test-case
Harsh J created HADOOP-8844: --- Summary: Add a plaintext fs -text test-case Key: HADOOP-8844 URL: https://issues.apache.org/jira/browse/HADOOP-8844 Project: Hadoop Common Issue Type: Test Components: fs Affects Versions: 2.0.0-alpha Reporter: Harsh J The TestDFSShell's textTest(…) currently tests all sorts of binary and compressed files, but doesn't test plaintext files. We should add one test for plaintext as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8845) When looking for parent paths info, globStatus must filter out non-directory elements to avoid an AccessControlException
Harsh J created HADOOP-8845: --- Summary: When looking for parent paths info, globStatus must filter out non-directory elements to avoid an AccessControlException Key: HADOOP-8845 URL: https://issues.apache.org/jira/browse/HADOOP-8845 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Harsh J A brief description from my colleague Stephen Fritz who helped discover it: {quote}
[root@node1 ~]# su - hdfs
-bash-4.1$ echo "My Test String" > testfile -- just a text file, for testing below
-bash-4.1$ hadoop dfs -mkdir /tmp/testdir -- create a directory
-bash-4.1$ hadoop dfs -mkdir /tmp/testdir/1 -- create a subdirectory
-bash-4.1$ hadoop dfs -put testfile /tmp/testdir/1/testfile -- put the test file in the subdirectory
-bash-4.1$ hadoop dfs -put testfile /tmp/testdir/testfile -- put the test file in the directory
-bash-4.1$ hadoop dfs -lsr /tmp/testdir
drwxr-xr-x - hdfs hadoop 0 2012-09-25 06:52 /tmp/testdir/1
-rw-r--r-- 3 hdfs hadoop 15 2012-09-25 06:52 /tmp/testdir/1/testfile
-rw-r--r-- 3 hdfs hadoop 15 2012-09-25 06:52 /tmp/testdir/testfile
All files are where we expect them...OK, let's try reading
-bash-4.1$ hadoop dfs -cat /tmp/testdir/testfile
My Test String -- success!
-bash-4.1$ hadoop dfs -cat /tmp/testdir/1/testfile
My Test String -- success!
-bash-4.1$ hadoop dfs -cat /tmp/testdir/*/testfile
My Test String -- success!
Note that we used an '*' in the cat command, and it correctly found the subdirectory '/tmp/testdir/1', and ignored the regular file '/tmp/testdir/testfile'
-bash-4.1$ exit
logout
[root@node1 ~]# su - testuser -- let's try it as a different user:
[testuser@node1 ~]$ hadoop dfs -lsr /tmp/testdir
drwxr-xr-x - hdfs hadoop 0 2012-09-25 06:52 /tmp/testdir/1
-rw-r--r-- 3 hdfs hadoop 15 2012-09-25 06:52 /tmp/testdir/1/testfile
-rw-r--r-- 3 hdfs hadoop 15 2012-09-25 06:52 /tmp/testdir/testfile
[testuser@node1 ~]$ hadoop dfs -cat /tmp/testdir/testfile
My Test String -- good
[testuser@node1 ~]$ hadoop dfs -cat /tmp/testdir/1/testfile
My Test String -- so far so good
[testuser@node1 ~]$ hadoop dfs -cat /tmp/testdir/*/testfile
cat: org.apache.hadoop.security.AccessControlException: Permission denied: user=testuser, access=EXECUTE, inode=/tmp/testdir/testfile:hdfs:hadoop:-rw-r--r--
{quote} Essentially, we hit an ACE with access=EXECUTE on the file /tmp/testdir/testfile because we tried to access /tmp/testdir/testfile/testfile as a path. This shouldn't happen, as testfile is a file and not a parent path to be looked up under. Surprisingly, the superuser avoids hitting the error as a result of bypassing permissions, but whether it is fine to let it be like that or not can be taken up on another JIRA. This JIRA targets a client-side fix to not cause such /path/file/dir kinds of lookups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
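A rough sketch of the client-side idea (illustrative only, not the committed patch): when expanding the parent portion of the glob, skip non-directory matches so a plain file is never probed as a parent path.
{code}
// Illustrative sketch: expand /tmp/testdir/*/testfile by hand, filtering
// out non-directories before probing for the child, which avoids the ACE.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobDirsOnly {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus[] parents = fs.globStatus(new Path("/tmp/testdir/*"));
    if (parents == null) {
      return; // nothing matched the parent glob
    }
    for (FileStatus parent : parents) {
      if (!parent.isDirectory()) {
        continue; // a plain file cannot contain testfile; probing it ACEs
      }
      Path child = new Path(parent.getPath(), "testfile");
      if (fs.exists(child)) {
        System.out.println(child);
      }
    }
  }
}
{code}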
[jira] [Reopened] (HADOOP-7698) jsvc target fails on x86_64
[ https://issues.apache.org/jira/browse/HADOOP-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened HADOOP-7698: - Dang, I forgot to see this was 1.x related. Reopened to check 1.x now. jsvc target fails on x86_64 --- Key: HADOOP-7698 URL: https://issues.apache.org/jira/browse/HADOOP-7698 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 0.20.205.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HADOOP-7698-1.patch, HADOOP-7968.patch Recent changes to the build.xml determine which jsvc file to download based on the os.arch. It maps various arch values to i386 or x86_64. However, it notably doesn't consider x86_64 to be x86_64. The result is that the download fails because {{os-arch}} doesn't expand. {code} build.xml:2626: Can't get http://archive.apache.org/dist/commons/daemon/binaries/1.0.2/linux/commons-daemon-1.0.2-bin-linux-${os-arch}.tar.gz {code} This breaks {{test-patch}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-7698) jsvc target fails on x86_64
[ https://issues.apache.org/jira/browse/HADOOP-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-7698. - Resolution: Fixed Fix Version/s: 1.2.0 jsvc target fails on x86_64 --- Key: HADOOP-7698 URL: https://issues.apache.org/jira/browse/HADOOP-7698 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 0.20.205.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Fix For: 1.2.0 Attachments: HADOOP-7698-1.patch, HADOOP-7968.patch Recent changes to the build.xml determine which jsvc file to download based on the os.arch. It maps various arch values to i386 or x86_64. However, it notably doesn't consider x86_64 to be x86_64. The result is that the download fails because {{os-arch}} doesn't expand. {code} build.xml:2626: Can't get http://archive.apache.org/dist/commons/daemon/binaries/1.0.2/linux/commons-daemon-1.0.2-bin-linux-${os-arch}.tar.gz {code} This breaks {{test-patch}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-7542) Change XML format to 1.1 to add support for serializing additional characters
[ https://issues.apache.org/jira/browse/HADOOP-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-7542. - Resolution: Won't Fix Release Note: (was: Changes the Configuration file's XML format to 1.1 from 1.0, which adds support for serializing additional separator characters.) Thanks Steve. For the original reason of the textoutputformat separator config, I've filed MAPREDUCE-4677 for getting that done. Change XML format to 1.1 to add support for serializing additional characters - Key: HADOOP-7542 URL: https://issues.apache.org/jira/browse/HADOOP-7542 Project: Hadoop Common Issue Type: Improvement Components: conf Affects Versions: 0.20.2 Reporter: Suhas Gogate Assignee: Michael Katzenellenbogen Attachments: HADOOP-7542-v1.patch, MAPREDUCE-109.patch, MAPREDUCE-109-v2.patch, MAPREDUCE-109-v3.patch, MAPREDUCE-109-v4.patch The feature added by this JIRA has a problem when setting values containing characters that are invalid in XML, e.g. Ctrl-A: mapred.textoutputformat.separator = \u0001, i.e. String delim = "\u0001"; conf.set("mapred.textoutputformat.separator", delim); The job client serializes the jobconf with mapred.textoutputformat.separator set to \u0001 (Ctrl-A), and the problem happens when it is de-serialized (read back) by the job tracker, where it encounters an invalid XML character. The test for this feature, testFormatWithCustomSeparator(), does not serialize the jobconf after adding the separator as Ctrl-A and hence does not detect the specific problem. Here is an exception: 08/12/06 01:40:50 INFO mapred.FileInputFormat: Total input paths to process : 1 org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.RuntimeException: org.xml.sax.SAXParseException: Character reference #1 is an invalid XML character. at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:961) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:864) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:832) at org.apache.hadoop.conf.Configuration.get(Configuration.java:291) at org.apache.hadoop.mapred.JobConf.getJobPriority(JobConf.java:1163) at org.apache.hadoop.mapred.JobInProgress.init(JobInProgress.java:179) at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1783) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888) at org.apache.hadoop.ipc.Client.call(Client.java:715) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216) at org.apache.hadoop.mapred.$Proxy1.submitJob(Unknown Source) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026) at -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
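The round trip is easy to reproduce against a Configuration object alone; a hedged sketch (the exact failure point may vary by JDK and XML parser, but the invalid character reference is the culprit described above):
{code}
// Reproduction sketch: a Ctrl-A stored in a Configuration survives
// serialization as an XML character reference that XML 1.0 parsers reject.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.apache.hadoop.conf.Configuration;

public class CtrlACharRoundTrip {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(false);
    conf.set("mapred.textoutputformat.separator", "\u0001");

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    conf.writeXml(out); // what the job client does when writing job.xml

    Configuration reread = new Configuration(false);
    reread.addResource(new ByteArrayInputStream(out.toByteArray()));
    // Expected to fail while parsing, as in the SAXParseException above.
    System.out.println(reread.get("mapred.textoutputformat.separator"));
  }
}
{code}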
[jira] [Resolved] (HADOOP-7725) fix test-patch so that Jenkins can accept patches to the hadoop-tools module.
[ https://issues.apache.org/jira/browse/HADOOP-7725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-7725. - Resolution: Duplicate Fix Version/s: (was: 0.24.0) The hadoop-tools and others are all covered via HADOOP-8308 now. Marking as dupe of that. Reopen if I am incorrect. fix test-patch so that Jenkins can accept patches to the hadoop-tools module. - Key: HADOOP-7725 URL: https://issues.apache.org/jira/browse/HADOOP-7725 Project: Hadoop Common Issue Type: Improvement Components: build Affects Versions: 0.23.0, 0.24.0 Reporter: Alejandro Abdelnur Basically, test-patch.sh needs some tinkering to recognize hadoop-tools-project alongside common/mapreduce/hdfs. It also needs changes to compile and run tests in hadoop-tools projects on patch submission. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-7855) Improve DiskChecker javadocs
[ https://issues.apache.org/jira/browse/HADOOP-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-7855. - Resolution: Duplicate Fix Version/s: (was: 0.24.0) Dupe of HADOOP-7856 (Accidental dual-submit). Improve DiskChecker javadocs Key: HADOOP-7855 URL: https://issues.apache.org/jira/browse/HADOOP-7855 Project: Hadoop Common Issue Type: Bug Components: util Reporter: Eli Collins Labels: noob The javadocs for DiskChecker#checkDir(File dir) trail off; they look like they weren't completed, and should be. While checkDir(File) uses java.io.File to check if a dir actually is writable, the version of checkDir that takes an FsPermission uses FsAction#implies, which doesn't actually check if a dir is writable (e.g. it passes on a read-only file system). So switching from one version to the other can cause unexpected bugs. Let's call this out explicitly in the javadocs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-7884) test-patch seems to fail when a patch goes across projects (common/hdfs/mapreduce) or touches hadoop-assemblies/hadoop-dist.
[ https://issues.apache.org/jira/browse/HADOOP-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-7884. - Resolution: Not A Problem Fix Version/s: (was: 0.24.0) No longer a problem after HADOOP-8308. test-patch seems to fail when a patch goes across projects (common/hdfs/mapreduce) or touches hadoop-assemblies/hadoop-dist. Key: HADOOP-7884 URL: https://issues.apache.org/jira/browse/HADOOP-7884 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 0.24.0 Reporter: Alejandro Abdelnur Take HDFS-2178 for example: the patch applies cleanly, but test-patch fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-7840) Cleanup unnecessary exceptions thrown and unnecessary casts
[ https://issues.apache.org/jira/browse/HADOOP-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-7840. - Resolution: Invalid If we remove those throws IOException bits from the methods you've changed, won't we inadvertently affect DistributedFileSystem, which does override these methods and makes use of the throws specification? We'll be breaking the API if I am not wrong. For example, I tried removing throws IOException from FileSystem, and DistributedFileSystem immediately complained at the overridden method because it broke compatibility. I'm closing this as Invalid at this point, but please reopen if I got something wrong. Cleanup unnecessary exceptions thrown and unnecessary casts --- Key: HADOOP-7840 URL: https://issues.apache.org/jira/browse/HADOOP-7840 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 0.24.0 Reporter: Hari Mankude Assignee: Hari Mankude Priority: Minor Attachments: hadoop-7840.trunk.patch Cleanup build warnings. It is the file in the hadoop-common subtree for HDFS-2564. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
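The compatibility point is plain Java language behavior, shown below with hypothetical classes rather than the actual FileSystem/DistributedFileSystem pair: narrowing a parent's throws clause breaks any override that still declares the checked exception.
{code}
import java.io.IOException;

class Base {
  // If this 'throws IOException' were removed...
  public void rename() throws IOException {
  }
}

class Sub extends Base {
  // ...this override would no longer compile: an overriding method cannot
  // declare broader checked exceptions than the method it overrides.
  @Override
  public void rename() throws IOException {
    throw new IOException("remote failure");
  }
}

public class ThrowsNarrowingDemo {
  public static void main(String[] args) {
    try {
      new Sub().rename();
    } catch (IOException e) {
      System.out.println("Caught: " + e.getMessage());
    }
  }
}
{code}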
[jira] [Created] (HADOOP-8833) fs -text should make sure to call inputstream.seek(0) before using input stream
Harsh J created HADOOP-8833: --- Summary: fs -text should make sure to call inputstream.seek(0) before using input stream Key: HADOOP-8833 URL: https://issues.apache.org/jira/browse/HADOOP-8833 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.0.2-alpha Reporter: Harsh J Assignee: Harsh J From Muddy Dixon on HADOOP-8449: Hi We found a change in the order of the switch and guard blocks in {code}private InputStream forMagic(Path p, FileSystem srcFs) throws IOException{code} Because of this change, the return value of {code}codec.createInputStream(i){code} changes if a codec exists. Old:
{code}
private InputStream forMagic(Path p, FileSystem srcFs) throws IOException {
  FSDataInputStream i = srcFs.open(p);
  // check codecs
  CompressionCodecFactory cf = new CompressionCodecFactory(getConf());
  CompressionCodec codec = cf.getCodec(p);
  if (codec != null) {
    return codec.createInputStream(i);
  }
  switch(i.readShort()) {
    // cases
  }
}
{code}
New:
{code}
private InputStream forMagic(Path p, FileSystem srcFs) throws IOException {
  FSDataInputStream i = srcFs.open(p);
  switch(i.readShort()) { // === index (or pointer) processes!!
    // cases
    default: {
      // Check the type of compression instead, depending on Codec class's
      // own detection methods, based on the provided path.
      CompressionCodecFactory cf = new CompressionCodecFactory(getConf());
      CompressionCodec codec = cf.getCodec(p);
      if (codec != null) {
        return codec.createInputStream(i);
      }
      break;
    }
  }
  // File is non-compressed, or not a file container we know.
  i.seek(0);
  return i;
}
{code}
The fix is to call i.seek(0) before we use i anywhere. I missed that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-7726) eclipse target does not build with 0.24.0
[ https://issues.apache.org/jira/browse/HADOOP-7726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-7726. - Resolution: Cannot Reproduce (This has nothing to do with the eclipse-plugin) No longer an issue on trunk/2.x branches. Please see http://wiki.apache.org/hadoop/HowToContribute to import sources into eclipse. eclipse target does not build with 0.24.0 - Key: HADOOP-7726 URL: https://issues.apache.org/jira/browse/HADOOP-7726 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 0.24.0 Environment: Fedora 15 Reporter: Tim Broberg I'm new to hadoop, java, and eclipse, so please forgive me if I jumble multiple issues together or mistake the symptoms of one problem for a separate issue. Attempting to follow the build instructions from http://wiki.apache.org/hadoop/EclipseEnvironment, the following commands are to be executed:
1 - mvn test -DskipTests
2 - mvn eclipse:eclipse -DdownloadSources=true -DdownloadJavadocs=true
3 - cd hdfs; ant compile eclipse
4 - cd ../; cd mapreduce; ant compile eclipse
A - If mvn test -DskipTests is used for #1, #2 fails with [ERROR] Failed to execute goal on project hadoop-yarn-common: Could not resolve dependencies for project org.apache.hadoop:hadoop-yarn-common:jar:0.24.0-SNAPSHOT: Per Luke Lu's suggestion, mvn install -DskipTests -P-cbuild instead of step #1 cleared up this issue.
B - For steps #3 and #4, there are no hdfs or mapreduce subdirectories. These appear to have been renamed hadoop-hdfs-project and hadoop-mapreduce-project.
C - For step #3, if I then go to hadoop-hdfs-project instead and perform ant compile eclipse, no build.xml file is found - Buildfile: build.xml does not exist!
D - For step #4, if I go to hadoop-mapreduce-project and do ant compile eclipse, a set of errors much like #A is produced: [ivy:resolve] :: [ivy:resolve] :: UNRESOLVED DEPENDENCIES :: [ivy:resolve] :: [ivy:resolve] :: org.apache.hadoop#hadoop-yarn-server-common;0.24.0-SNAPSHOT: not found [ivy:resolve] :: org.apache.hadoop#hadoop-mapreduce-client-core;0.24.0-SNAPSHOT: not found [ivy:resolve] :: org.apache.hadoop#hadoop-yarn-common;0.24.0-SNAPSHOT: not found [ivy:resolve] ::
E - If I ignore these issues and import the projects generated in step #2, I get a bunch of errors related to the lack of an M2_REPO definition. This variable needs to be added by the build scripts or documented in the wiki.
F - Once that is resolved, eclipse shows hundreds of errors and warnings starting with AvroRecord cannot be resolved to a type.
Thanks so much for your work on this, but it needs a little more effort in documentation and/or development before it is usable again. Thanks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (HADOOP-8362) Improve exception message when Configuration.set() is called with a null key or value
[ https://issues.apache.org/jira/browse/HADOOP-8362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened HADOOP-8362: - Hi Suresh, Looks like this wasn't committed? I'm going ahead and committing it in. Reopening for until it is done. Improve exception message when Configuration.set() is called with a null key or value - Key: HADOOP-8362 URL: https://issues.apache.org/jira/browse/HADOOP-8362 Project: Hadoop Common Issue Type: Improvement Components: conf Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon Assignee: madhukara phatak Priority: Trivial Labels: newbie Fix For: 3.0.0 Attachments: HADOOP-8362-1.patch, HADOOP-8362-2.patch, HADOOP-8362-3.patch, HADOOP-8362-4.patch, HADOOP-8362-5.patch, HADOOP-8362-6.patch, HADOOP-8362-7.patch, HADOOP-8362-8.patch, HADOOP-8362.9.patch, HADOOP-8362.patch Currently, calling Configuration.set(...) with a null value results in a NullPointerException within Properties.setProperty. We should check for null key/value and throw a better exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-8362) Improve exception message when Configuration.set() is called with a null key or value
[ https://issues.apache.org/jira/browse/HADOOP-8362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-8362. - Resolution: Fixed Fix Version/s: (was: 3.0.0) 2.0.1-alpha Target Version/s: (was: 2.0.0-alpha) Committed to branch-2 and trunk. Thanks Madhukara and Suresh! Improve exception message when Configuration.set() is called with a null key or value - Key: HADOOP-8362 URL: https://issues.apache.org/jira/browse/HADOOP-8362 Project: Hadoop Common Issue Type: Improvement Components: conf Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon Assignee: madhukara phatak Priority: Trivial Labels: newbie Fix For: 2.0.1-alpha Attachments: HADOOP-8362-1.patch, HADOOP-8362-2.patch, HADOOP-8362-3.patch, HADOOP-8362-4.patch, HADOOP-8362-5.patch, HADOOP-8362-6.patch, HADOOP-8362-7.patch, HADOOP-8362-8.patch, HADOOP-8362.10.patch, HADOOP-8362.9.patch, HADOOP-8362.patch Currently, calling Configuration.set(...) with a null value results in a NullPointerException within Properties.setProperty. We should check for null key/value and throw a better exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
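The committed patch lives in the attachments above; as a flavor of the idea, here is a minimal external sketch (names and messages are illustrative, not the committed code) that validates before Properties.setProperty can NPE:
{code}
// Illustrative guard around Configuration.set(): reject nulls with a
// descriptive message instead of a bare NullPointerException.
import org.apache.hadoop.conf.Configuration;

public class SafeConfSet {
  static void set(Configuration conf, String name, String value) {
    if (name == null) {
      throw new IllegalArgumentException("Property name must not be null");
    }
    if (value == null) {
      throw new IllegalArgumentException(
          "The value of property " + name + " must not be null");
    }
    conf.set(name, value);
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    set(conf, "fs.defaultFS", "hdfs://example-nn:8020"); // hypothetical host; fine
    set(conf, "fs.defaultFS", null); // throws a clear IllegalArgumentException
  }
}
{code}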
[jira] [Created] (HADOOP-8597) FsShell's Text command should be able to read avro data files
Harsh J created HADOOP-8597: --- Summary: FsShell's Text command should be able to read avro data files Key: HADOOP-8597 URL: https://issues.apache.org/jira/browse/HADOOP-8597 Project: Hadoop Common Issue Type: New Feature Components: fs Affects Versions: 2.0.0-alpha Reporter: Harsh J Similar to SequenceFiles are Apache Avro's DataFiles. Since these are getting popular as a data format, perhaps it would be useful if {{fs -text}} were to add some support for reading them, like it reads SequenceFiles. Should be easy since Avro is already a dependency and provides the required classes. Up for discussion is the output we ought to emit. Avro DataFiles aren't as simple as text, nor do they have the singular key-value pair structure of SequenceFiles. They usually contain a set of fields defined as a record, and the usual text output, as available from avro-tools via http://avro.apache.org/docs/current/api/java/org/apache/avro/tool/DataFileReadTool.html, is in proper JSON format. I think we should use the JSON format as the output, rather than a delimited form, for there are many complex structures in Avro and JSON is the easiest and least-work way to display them (Avro supports JSON dumping by itself). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
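A minimal sketch of the proposed output mode, leaning on Avro's own JSON rendering (a generic record's toString() emits JSON); this is an illustration of the idea, not the eventual FsShell implementation:
{code}
// Sketch: print each record of an Avro data file as one JSON line.
import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class AvroToJson {
  public static void main(String[] args) throws Exception {
    DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
        new File(args[0]), new GenericDatumReader<GenericRecord>());
    try {
      for (GenericRecord record : reader) {
        System.out.println(record); // toString() of a generic record is JSON
      }
    } finally {
      reader.close();
    }
  }
}
{code}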
[jira] [Created] (HADOOP-8588) SerializationFactory shouldn't throw a NullPointerException if the serializations list is empty
Harsh J created HADOOP-8588: --- Summary: SerializationFactory shouldn't throw a NullPointerException if the serializations list is empty Key: HADOOP-8588 URL: https://issues.apache.org/jira/browse/HADOOP-8588 Project: Hadoop Common Issue Type: Improvement Components: io Affects Versions: 2.0.0-alpha Reporter: Harsh J Priority: Minor The SerializationFactory throws an NPE if CommonConfigurationKeys.IO_SERIALIZATIONS_KEY is set to an empty list in the config. It should rather print a WARN log indicating the serializations list is empty, and start up without any valid serialization classes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-5555) JobClient should provide an API to return the job names of jobs
[ https://issues.apache.org/jira/browse/HADOOP-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-5555. - Resolution: Not A Problem The JobClient provides both Job and RunningJob returns via some of its cluster-connecting methods, which in turn provide an API to retrieve the Job Name string already. Hence, this has already been fixed. For the 'hadoop job -list' enhancement to show the same, see MAPREDUCE-4424 instead (which I just forked out). Resolving as Not a Problem (anymore). JobClient should provide an API to return the job names of jobs --- Key: HADOOP-5555 URL: https://issues.apache.org/jira/browse/HADOOP-5555 Project: Hadoop Common Issue Type: Improvement Reporter: Runping Qi Currently, there seems to be no way to get the job name of a job from its job id. The JobClient should provide a way to do so. The command line hadoop job -list should also return the job names. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
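For anyone looking for the API path referred to above, a minimal sketch (error handling elided):
{code}
// Sketch: resolve a job's name from its id via the JobClient API.
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class PrintJobName {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());
    RunningJob job = client.getJob(JobID.forName(args[0]));
    if (job != null) {
      System.out.println(job.getJobName());
    }
  }
}
{code}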
[jira] [Resolved] (HADOOP-6817) SequenceFile.Reader can't read gzip format compressed sequence file which produce by a mapreduce job without native compression library
[ https://issues.apache.org/jira/browse/HADOOP-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-6817. - Resolution: Duplicate This is being addressed via HADOOP-8582. SequenceFile.Reader can't read gzip format compressed sequence file which produce by a mapreduce job without native compression library --- Key: HADOOP-6817 URL: https://issues.apache.org/jira/browse/HADOOP-6817 Project: Hadoop Common Issue Type: Bug Components: io Affects Versions: 0.20.2 Environment: Cluster: CentOS 5, jdk1.6.0_20; Client: Mac SnowLeopard, jdk1.6.0_20 Reporter: Wenjun Huang A Hadoop job outputs a gzip-compressed sequence file (whether record-compressed or block-compressed). The client program uses SequenceFile.Reader to read this sequence file; when reading, the client program shows the following exceptions: 2090 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2091 [main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor Exception in thread "main" java.io.EOFException at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207) at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197) at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136) at java.util.zip.GZIPInputStream.init(GZIPInputStream.java:58) at java.util.zip.GZIPInputStream.init(GZIPInputStream.java:68) at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.init(GzipCodec.java:92) at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.init(GzipCodec.java:101) at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:170) at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:180) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1428) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1417) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1412) at com.shiningware.intelligenceonline.taobao.mapreduce.HtmlContentSeqOutputView.main(HtmlContentSeqOutputView.java:28) I studied the code in the org.apache.hadoop.io.SequenceFile.Reader.init method and read:
// Initialize... *not* if this we are constructing a temporary Reader
if (!tempReader) {
  valBuffer = new DataInputBuffer();
  if (decompress) {
    valDecompressor = CodecPool.getDecompressor(codec);
    valInFilter = codec.createInputStream(valBuffer, valDecompressor);
    valIn = new DataInputStream(valInFilter);
  } else {
    valIn = valBuffer;
  }
The problem seems to be caused by valBuffer = new DataInputBuffer();, because GzipCodec.createInputStream creates an instance of GzipInputStream whose constructor creates an instance of the ResetableGZIPInputStream class. When ResetableGZIPInputStream's constructor calls its base class java.util.zip.GZIPInputStream's constructor, it tries to read the empty valBuffer and gets no content, so it throws an EOFException. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-8577) The RPC must have failed proxyUser (auth:SIMPLE) via realus...@hadoop.apache.org (auth:SIMPLE)
[ https://issues.apache.org/jira/browse/HADOOP-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-8577. - Resolution: Invalid The JIRA is to track issues with the project, not for user/dev-help. Please ask your question on common-dev[at]hadoop.apache.org mailing lists instead, and refrain from posting general questions on the JIRA. Thanks! :) P.s. The issue is your OS. Fix your /etc/hosts to use the right format of IP FQDN ALIAS, instead of IP ALIAS FQDN. In any case, please mail the right user/dev group. See http://hadoop.apache.org/mailing_lists.html The RPC must have failed proxyUser (auth:SIMPLE) via realus...@hadoop.apache.org (auth:SIMPLE) -- Key: HADOOP-8577 URL: https://issues.apache.org/jira/browse/HADOOP-8577 Project: Hadoop Common Issue Type: Bug Components: test Environment: Ubuntu 11 JDK 1.7 Maven 3.0.4 Reporter: chandrashekhar Kotekar Priority: Minor Original Estimate: 12h Remaining Estimate: 12h Hi, I have downloaded maven source code today itself and tried test it. I did following steps : 1) mvn clean 2) mvn compile 3) mvn test After 3rd step one step failed. Stack trace of failed test is as follows : Failed tests: testRealUserIPNotSpecified(org.apache.hadoop.security.TestDoAsEffectiveUser): The RPC must have failed proxyUser (auth:SIMPLE) via realus...@hadoop.apache.org (auth:SIMPLE) testWithDirStringAndConf(org.apache.hadoop.fs.shell.TestPathData): checking exist testPartialAuthority(org.apache.hadoop.fs.TestFileSystemCanonicalization): expected:myfs://host.a.b:123 but was:myfs://host.a:123 testFullAuthority(org.apache.hadoop.fs.TestFileSystemCanonicalization): expected:null but was:java.lang.IllegalArgumentException: Wrong FS: myfs://host/file, expected: myfs://host.a.b testShortAuthorityWithDefaultPort(org.apache.hadoop.fs.TestFileSystemCanonicalization): expected:myfs://host.a.b:123 but was:myfs://host:123 testPartialAuthorityWithDefaultPort(org.apache.hadoop.fs.TestFileSystemCanonicalization): expected:myfs://host.a.b:123 but was:myfs://host.a:123 testShortAuthority(org.apache.hadoop.fs.TestFileSystemCanonicalization): expected:myfs://host.a.b:123 but was:myfs://host:123 testIpAuthorityWithOtherPort(org.apache.hadoop.fs.TestFileSystemCanonicalization): expected:myfs://127.0.0.1:456 but was:myfs://localhost:456 testAuthorityFromDefaultFS(org.apache.hadoop.fs.TestFileSystemCanonicalization): expected:myfs://host.a.b:123 but was:myfs://host:123 testFullAuthorityWithDefaultPort(org.apache.hadoop.fs.TestFileSystemCanonicalization): expected:null but was:java.lang.IllegalArgumentException: Wrong FS: myfs://host/file, expected: myfs://host.a.b:123 testShortAuthorityWithOtherPort(org.apache.hadoop.fs.TestFileSystemCanonicalization): expected:myfs://host.a.b:456 but was:myfs://host:456 testPartialAuthorityWithOtherPort(org.apache.hadoop.fs.TestFileSystemCanonicalization): expected:myfs://host.a.b:456 but was:myfs://host.a:456 testFullAuthorityWithOtherPort(org.apache.hadoop.fs.TestFileSystemCanonicalization): expected:null but was:java.lang.IllegalArgumentException: Wrong FS: myfs://host:456/file, expected: myfs://host.a.b:456 testIpAuthority(org.apache.hadoop.fs.TestFileSystemCanonicalization): expected:myfs://127.0.0.1:123 but was:myfs://localhost:123 testIpAuthorityWithDefaultPort(org.apache.hadoop.fs.TestFileSystemCanonicalization): expected:myfs://127.0.0.1:123 but was:myfs://localhost:123 Tests in error: testUnqualifiedUriContents(org.apache.hadoop.fs.shell.TestPathData): `d1': No such file or directory I am newbie in Hadoop 
source code world. Please help me in building the Hadoop source code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8570) Bzip2Codec should accept .bz files too
Harsh J created HADOOP-8570: --- Summary: Bzip2Codec should accept .bz files too Key: HADOOP-8570 URL: https://issues.apache.org/jira/browse/HADOOP-8570 Project: Hadoop Common Issue Type: Improvement Components: io Affects Versions: 2.0.0-alpha, 1.0.0 Reporter: Harsh J The default extension reported for Bzip2Codec today is .bz2. This causes it not to pick up .bz files as Bzip2Codec files. Although the extension is not very popular today, it is still mentioned as a valid extension in the bunzip manual and we should support it. We should change the Bzip2Codec default extension to .bz, or we should add support for an extension list to allow for better detection across various aliases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
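The detection gap is easy to see with CompressionCodecFactory, which resolves codecs purely by file extension; a small demonstration:
{code}
// Demonstration: ".bz2" resolves to a codec, ".bz" currently resolves to none.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class BzExtensionCheck {
  public static void main(String[] args) {
    CompressionCodecFactory factory =
        new CompressionCodecFactory(new Configuration());
    CompressionCodec bz2 = factory.getCodec(new Path("file.bz2"));
    CompressionCodec bz = factory.getCodec(new Path("file.bz"));
    System.out.println("file.bz2 -> "
        + (bz2 == null ? "no codec" : bz2.getClass().getSimpleName()));
    System.out.println("file.bz  -> "
        + (bz == null ? "no codec" : bz.getClass().getSimpleName()));
  }
}
{code}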
[jira] [Resolved] (HADOOP-3450) Add tests to Local Directory Allocator for asserting their URI-returning capability
[ https://issues.apache.org/jira/browse/HADOOP-3450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-3450. - Resolution: Fixed Fix Version/s: 2.0.1-alpha Target Version/s: (was: 2.0.1-alpha, 3.0.0) Committed to trunk and branch-2. Thank you Sho! Add tests to Local Directory Allocator for asserting their URI-returning capability --- Key: HADOOP-3450 URL: https://issues.apache.org/jira/browse/HADOOP-3450 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 0.17.0 Reporter: Ari Rabkin Assignee: Sho Shimauchi Priority: Minor Labels: newbie Fix For: 2.0.1-alpha Attachments: HADOOP-3450.txt Original comment: {quote}Local directory allocator returns a bare path, without a URI specifier. This means that calling Path.getFileSystem will do the wrong thing with the returned path. Should really stick a file:// in front. Also it's test cases need to be improved to make sure this class works fine. {quote} Only the latter needed to be done (see below for discussion). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8531) SequenceFile Writer can throw out a better error if a serializer isn't available
Harsh J created HADOOP-8531: --- Summary: SequenceFile Writer can throw out a better error if a serializer isn't available Key: HADOOP-8531 URL: https://issues.apache.org/jira/browse/HADOOP-8531 Project: Hadoop Common Issue Type: Improvement Reporter: Harsh J Priority: Trivial Currently, if the provided Key/Value class lacks a proper serializer in the loaded config for the SequenceFile.Writer, we get an NPE as the null return goes unchecked. Hence we get: {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1163) at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1079) at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.init(SequenceFile.java:1331) at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:271) {code} We can provide a better message + exception in such cases. This is slightly related to MAPREDUCE-2584. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
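The unchecked null comes from the SerializationFactory lookup; a small demonstration (java.lang.Thread is just an arbitrary class with no registered serialization):
{code}
// Demonstration: no registered Serialization accepts an arbitrary class,
// so the lookup returns null, which SequenceFile.Writer then trips over.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.serializer.Serialization;
import org.apache.hadoop.io.serializer.SerializationFactory;

public class MissingSerializerDemo {
  public static void main(String[] args) {
    SerializationFactory factory = new SerializationFactory(new Configuration());
    Serialization<Thread> s = factory.getSerialization(Thread.class);
    System.out.println(s); // prints null; an IOException naming the class would be clearer
  }
}
{code}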
[jira] [Created] (HADOOP-8532) [Configuration] Increase or make variable substitution depth configurable
Harsh J created HADOOP-8532: --- Summary: [Configuration] Increase or make variable substitution depth configurable Key: HADOOP-8532 URL: https://issues.apache.org/jira/browse/HADOOP-8532 Project: Hadoop Common Issue Type: Improvement Components: conf Affects Versions: 2.0.0-alpha Reporter: Harsh J We've had some users recently complain that the default MAX_SUBST hardcoded limit of 20 isn't sufficient for their substitution needs, and they wished it were configurable rather than having to resort to workarounds such as using smaller temporary substitutions and then building the fuller value from them. We should consider raising the default, or provide a way to make it configurable instead. Related: HIVE-2021 changed something similar for their HiveConf classes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
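For context, each ${var} reference consumes one expansion step when Configuration.get() resolves a value; chains deeper than the hardcoded limit are cut off. A small demo of the chaining involved:
{code}
// Demo: two levels of substitution resolve fine; MAX_SUBST (20) caps how
// deep such chains may go before expansion stops.
import org.apache.hadoop.conf.Configuration;

public class SubstDepthDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("base", "/data");
    conf.set("level1", "${base}/a");
    conf.set("level2", "${level1}/b");
    System.out.println(conf.get("level2")); // prints /data/a/b
  }
}
{code}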
[jira] [Resolved] (HADOOP-3421) Requirements for a Resource Manager for Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-3421. - Resolution: Duplicate Resolving as dupe of MAPREDUCE-279. Although, this is much better doc-wise and serves as a good reference. Please reopen if I missed something that the other didn't provide (and was the goal here). Requirements for a Resource Manager for Hadoop -- Key: HADOOP-3421 URL: https://issues.apache.org/jira/browse/HADOOP-3421 Project: Hadoop Common Issue Type: New Feature Reporter: Vivek Ratan This is a proposal to extend the scheduling functionality of Hadoop to allow sharing of large clusters without the use of HOD. We're suffering from performance issues with HOD and not finding it the right model for running jobs. We have concluded that a native Hadoop Resource Manager would be more useful to many people if it supported the features we need for sharing clusters across large groups and organizations. Below are the key requirements for a Resource Manager for Hadoop. First, some terminology used in this writeup: * *RM*: Resource Manager. What we're building. * *MR*: Map Reduce. * A *job* is an MR job for now, but can be any request. Jobs are submitted by users to the Grid. MR jobs are made up of units of computation called *tasks*. * A grid has a variety of *resources* of different *capacities* that are allocated to tasks. For the the early version of the grid, the only resource considered is a Map or Reduce slot, which can execute a task. Each slot can run one or more tasks. Later versions may look at resources such as local temporary storage or CPUs. * *V1*: version 1. Some features are simplified for V1. h3. Orgs, queues, users, jobs Organizations (*Orgs*) are distinct entities for administration, configuration, billing and reporting purposes. *Users* belong to Orgs. Orgs have *queues* of jobs, where a queue represents a collection of jobs that share some scheduling criteria. * *1.1.* For V1, each queue will belong to one Org and each Org will have one queue. * *1.2.* Jobs are submitted to queues. A single job can be submitted to only one queue. It follows that a job will have a user and an Org associated with it. * *1.3.* A user can belong to multiple Orgs and can potentially submit jobs to multiple queues. * *1.4.* Orgs are guaranteed a fraction of the capacity of the grid (their 'guaranteed capacity') in the sense that a certain capacity of resources will be at their disposal. All jobs submitted to the queues of an Org will have access to the capacity guaranteed to the Org. ** Note: it is expected that the sum of the guaranteed capacity of each Org should equal the resources in the Grid. If the sum is lower, some resources will not be used. If the sum is higher, the RM cannot maintain guarantees for all Orgs. * *1.5.* At any given time, free resources can be allocated to any Org beyond their guaranteed capacity. For example this may be in the proportion of guaranteed capacities of various Orgs or some other way. However, these excess allocated resources can be reclaimed and made available to another Org in order to meet its capacity guarantee. * *1.6.* N minutes after an org reclaims resources, it should have all its reserved capacity available. Put another way, the system will guarantee that excess resources taken from an Org will be restored to it within N minutes of its need for them. * *1.7.* Queues have access control. Queues can specify which users are (not) allowed to submit jobs to it. 
A user's job submission will be rejected if the user does not have access rights to the queue. h3. Job capacity * *2.1.* Users will just submit jobs to the Grid. They do not need to specify the capacity required for their jobs (i.e. how many parallel tasks the job needs). [Most MR jobs are elastic and do not require a fixed number of parallel tasks to run - they can run with as little or as much task parallelism as they can get. This amount of task parallelism is usually limited by the number of mappers required (which is computed by the system and not by the user) or the amount of free resources available in the grid. In most cases, the user wants to just submit a job and let the system take care of utilizing as many or as little resources as it can.] h3. Priorities * *3.1.* Jobs can optionally have priorities associated with them. For V1, we support the same set of priorities available to MR jobs today. * *3.2.* Queues can optionally support priorities for jobs. By default, a queue does not support priorities, in which case it will ignore (with a warning) any priority levels specified by jobs submitted to it. If
[jira] [Reopened] (HADOOP-3444) Implementing a Resource Manager (V1) for Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened HADOOP-3444: - Implementing a Resource Manager (V1) for Hadoop --- Key: HADOOP-3444 URL: https://issues.apache.org/jira/browse/HADOOP-3444 Project: Hadoop Common Issue Type: New Feature Reporter: Vivek Ratan Attachments: RMArch-V1.jpg HADOOP-3421 lists the requirements for a Resource Manager for Hadoop. This Jira tracks its implementation. It is expected that this Jira will be used to keep track of various other Jiras that will be opened towards implementing Version 1 of the Resource Manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-3444) Implementing a Resource Manager (V1) for Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-3444. - Resolution: Fixed MAPREDUCE-279 has covered this. Resolving as dupe, same as its parent. Implementing a Resource Manager (V1) for Hadoop --- Key: HADOOP-3444 URL: https://issues.apache.org/jira/browse/HADOOP-3444 Project: Hadoop Common Issue Type: New Feature Reporter: Vivek Ratan Attachments: RMArch-V1.jpg HADOOP-3421 lists the requirements for a Resource Manager for Hadoop. This Jira tracks its implementation. It is expected that this Jira will be used to keep track of various other Jiras that will be opened towards implementing Version 1 of the Resource Manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-3444) Implementing a Resource Manager (V1) for Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-3444. - Resolution: Duplicate (Re-resolving as dupe) Implementing a Resource Manager (V1) for Hadoop --- Key: HADOOP-3444 URL: https://issues.apache.org/jira/browse/HADOOP-3444 Project: Hadoop Common Issue Type: New Feature Reporter: Vivek Ratan Attachments: RMArch-V1.jpg HADOOP-3421 lists the requirements for a Resource Manager for Hadoop. This Jira tracks its implementation. It is expected that this Jira will be used to keep track of various other Jiras that will be opened towards implementing Version 1 of the Resource Manager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8435) Propdel all svn:mergeinfo
Harsh J created HADOOP-8435: --- Summary: Propdel all svn:mergeinfo Key: HADOOP-8435 URL: https://issues.apache.org/jira/browse/HADOOP-8435 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Harsh J Assignee: Harsh J TortoiseSVN/some versions of svn have added several mergeinfo props to Hadoop's svn files/dirs (list below). We should propdel that unneeded property, and fix it up. This otherwise causes pain to those who backport with a simple root-dir-down command (svn merge -c num url/path). We should also make sure to update the HowToCommit page to advise against mergeinfo additions, to prevent this from recurring. Files affected are, from my propdel revert output earlier today: {code}
Reverted '.'
Reverted 'hadoop-hdfs-project'
Reverted 'hadoop-hdfs-project/hadoop-hdfs'
Reverted 'hadoop-hdfs-project/hadoop-hdfs/src/test/hdfs'
Reverted 'hadoop-hdfs-project/hadoop-hdfs/src/main/java'
Reverted 'hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/datanode'
Reverted 'hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs'
Reverted 'hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/secondary'
Reverted 'hadoop-hdfs-project/hadoop-hdfs/src/main/native'
Reverted 'hadoop-mapreduce-project'
Reverted 'hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site'
Reverted 'hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt'
Reverted 'hadoop-mapreduce-project/conf'
Reverted 'hadoop-mapreduce-project/CHANGES.txt'
Reverted 'hadoop-mapreduce-project/src/test/mapred'
Reverted 'hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/hdfs'
Reverted 'hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/fs'
Reverted 'hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/ipc'
Reverted 'hadoop-mapreduce-project/src/contrib'
Reverted 'hadoop-mapreduce-project/src/contrib/eclipse-plugin'
Reverted 'hadoop-mapreduce-project/src/contrib/block_forensics'
Reverted 'hadoop-mapreduce-project/src/contrib/index'
Reverted 'hadoop-mapreduce-project/src/contrib/data_join'
Reverted 'hadoop-mapreduce-project/src/contrib/build-contrib.xml'
Reverted 'hadoop-mapreduce-project/src/contrib/vaidya'
Reverted 'hadoop-mapreduce-project/src/contrib/build.xml'
Reverted 'hadoop-mapreduce-project/src/java'
Reverted 'hadoop-mapreduce-project/src/webapps/job'
Reverted 'hadoop-mapreduce-project/src/c++'
Reverted 'hadoop-mapreduce-project/src/examples'
Reverted 'hadoop-mapreduce-project/hadoop-mapreduce-examples'
Reverted 'hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml'
Reverted 'hadoop-mapreduce-project/bin'
Reverted 'hadoop-common-project'
Reverted 'hadoop-common-project/hadoop-common'
Reverted 'hadoop-common-project/hadoop-common/src/test/core'
Reverted 'hadoop-common-project/hadoop-common/src/main/java'
Reverted 'hadoop-common-project/hadoop-common/src/main/docs'
Reverted 'hadoop-common-project/hadoop-auth'
Reverted 'hadoop-project'
Reverted 'hadoop-project/src/site'
{code} Proposed fix (from http://stackoverflow.com/questions/767418/remove-unnecessary-svnmergeinfo-properties): {code}
svn propdel svn:mergeinfo -R
svn revert .
svn commit -m "appropriate message"
{code} (To be done on both branch-2 and trunk) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8395) Text shell command unnecessarily demands that a SequenceFile's key class be WritableComparable
Harsh J created HADOOP-8395: --- Summary: Text shell command unnecessarily demands that a SequenceFile's key class be WritableComparable Key: HADOOP-8395 URL: https://issues.apache.org/jira/browse/HADOOP-8395 Project: Hadoop Common Issue Type: Bug Components: util Affects Versions: 2.0.0 Reporter: Harsh J Priority: Trivial Text, from the Display set of shell commands (hadoop fs -text), has a strict subclass check requiring the key class loaded from a sequence file's header to be a subclass of WritableComparable. The sequence file writer itself has no such check (one can create sequence files with just plain Writable keys; Comparable is needed only for the sequence file's sorter, which not all of them use), and hence it's not reasonable for the Text command to carry it either. We should relax the check and simply look for Writable, not WritableComparable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
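A minimal sketch of the relaxed check described above; the class and method names here are illustrative and not the actual Display command code:
{code}
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Illustrative only; the real check lives in the shell's Display (-text) command.
class KeyClassCheck {
  static void checkKeyClass(Class<?> keyClass) throws IOException {
    // Before: WritableComparable.class.isAssignableFrom(keyClass)
    if (!Writable.class.isAssignableFrom(keyClass)) {
      throw new IOException(keyClass.getName() + " is not a Writable");
    }
  }
}
{code}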
[jira] [Created] (HADOOP-8359) Clear up javadoc warnings in hadoop-common-project
Harsh J created HADOOP-8359: --- Summary: Clear up javadoc warnings in hadoop-common-project Key: HADOOP-8359 URL: https://issues.apache.org/jira/browse/HADOOP-8359 Project: Hadoop Common Issue Type: Task Components: conf Affects Versions: 2.0.0 Reporter: Harsh J Priority: Trivial Javadocs added in HADOOP-8172 have introduced two new javadoc warnings. Should be easy to fix these (just missing #s for method refs). {code}
[WARNING] Javadoc Warnings
[WARNING] /Users/harshchouraria/Work/code/apache/hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java:334: warning - Tag @link: missing '#': addDeprecation(String key, String newKey)
[WARNING] /Users/harshchouraria/Work/code/apache/hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java:285: warning - Tag @link: missing '#': addDeprecation(String key, String newKey,
[WARNING] String customMessage)
{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
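For context, the fix here is mechanical: a member reference inside a {{@link}} tag needs a leading '#'. A minimal illustration, not the actual Configuration.java javadoc:
{code}
/** Minimal illustration of the javadoc fix; not the real Configuration class. */
public class JavadocLinkExample {
  /**
   * Correct form, with the '#' marking a member reference:
   * {@link #addDeprecation(String, String)}
   * Writing the same reference without the '#' produces the
   * "Tag @link: missing '#'" warning quoted above.
   */
  public void addDeprecation(String key, String newKey) { }
}
{code}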
[jira] [Resolved] (HADOOP-8323) Revert HADOOP-7940
[ https://issues.apache.org/jira/browse/HADOOP-8323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-8323. - Resolution: Won't Fix Target Version/s: (was: 3.0.0, 2.0.0) Actually, I looked at the API, and this would only impact Text usage iff clear() is called on it. I do not think we should revert this. Clear must work as intended - and clear the byte array state inside. There wouldn't be any other way to free the memory if we didn't do this. I do not see clear() being used in MR directly. So the largest length is still maintained, but that's not an issue (except that clear may be called for memory gains if the user wants that). I'm resolving this as Won't Fix (Won't revert). But if I've missed addressing something, please reopen. Revert HADOOP-7940 -- Key: HADOOP-8323 URL: https://issues.apache.org/jira/browse/HADOOP-8323 Project: Hadoop Common Issue Type: Bug Components: io Affects Versions: 2.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Critical Per [~jdonofrio]'s comments on HADOOP-7940, we should revert it as it has caused a performance regression (for scenarios where Text is reused, popular in MR). The clear() works as intended, as the API also offers a current length API. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
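To make the behavior under discussion concrete, a small sketch against the public Text API; it shows only the externally visible semantics, while what happens to the backing array internally is exactly the HADOOP-7940 question:
{code}
import org.apache.hadoop.io.Text;

public class TextClearDemo {
  public static void main(String[] args) {
    Text t = new Text("hello world");
    System.out.println(t.getLength()); // 11
    t.clear();                         // logical contents cleared
    System.out.println(t.getLength()); // 0
    t.set("again");                    // the reuse pattern popular in MR
    System.out.println(t);             // again
  }
}
{code}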
[jira] [Reopened] (HADOOP-8323) Revert HADOOP-7940
[ https://issues.apache.org/jira/browse/HADOOP-8323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened HADOOP-8323: - Revert HADOOP-7940 -- Key: HADOOP-8323 URL: https://issues.apache.org/jira/browse/HADOOP-8323 Project: Hadoop Common Issue Type: Bug Components: io Affects Versions: 2.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Critical Per [~jdonofrio]'s comments on HADOOP-7940, we should revert it as it has caused a performance regression (for scenarios where Text is reused, popular in MR). The clear() works as intended, as the API also offers a current length API. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-3977) SequenceFile.Writer reopen (hdfs append)
[ https://issues.apache.org/jira/browse/HADOOP-3977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-3977. - Resolution: Duplicate A fresher effort is ongoing at HADOOP-7139 (Resolving as duplicate) SequenceFile.Writer reopen (hdfs append) Key: HADOOP-3977 URL: https://issues.apache.org/jira/browse/HADOOP-3977 Project: Hadoop Common Issue Type: Improvement Components: io Reporter: Karl Wettin Assignee: Karl Wettin Priority: Minor Attachments: HADOOP-3977.txt, HADOOP-3977.txt Allows for reopening and appending to a SequenceFile -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
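As a pointer for readers landing here: HADOOP-7139 later added an appendIfExists option to the SequenceFile writer. A small usage sketch under that assumption (the path is hypothetical, and the exact option name should be checked against your Hadoop version):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Sketch: reopen-and-append via the appendIfExists option from HADOOP-7139.
public class SeqAppend {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (SequenceFile.Writer w = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(new Path("/tmp/demo.seq")), // hypothetical path
        SequenceFile.Writer.keyClass(Text.class),
        SequenceFile.Writer.valueClass(IntWritable.class),
        SequenceFile.Writer.appendIfExists(true))) {
      w.append(new Text("k"), new IntWritable(1));
    }
  }
}
{code}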
[jira] [Created] (HADOOP-8301) Common (hadoop-tools) side of MAPREDUCE-4172
Harsh J created HADOOP-8301: --- Summary: Common (hadoop-tools) side of MAPREDUCE-4172 Key: HADOOP-8301 URL: https://issues.apache.org/jira/browse/HADOOP-8301 Project: Hadoop Common Issue Type: Task Components: build Affects Versions: 3.0.0 Reporter: Harsh J Assignee: Harsh J Patches from MAPREDUCE-4172 (for MR-relevant projects) that need to run off of the Hadoop Common project for Hadoop QA. One sub-task per hadoop-tools submodule will be added here for reviews. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8302) Clean up hadoop-rumen
Harsh J created HADOOP-8302: --- Summary: Clean up hadoop-rumen Key: HADOOP-8302 URL: https://issues.apache.org/jira/browse/HADOOP-8302 Project: Hadoop Common Issue Type: Sub-task Components: build Affects Versions: 3.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Clean up a bunch of existing javac warnings in hadoop-rumen module. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-8303) Clean up hadoop-streaming
Harsh J created HADOOP-8303: --- Summary: Clean up hadoop-streaming Key: HADOOP-8303 URL: https://issues.apache.org/jira/browse/HADOOP-8303 Project: Hadoop Common Issue Type: Sub-task Components: build Affects Versions: 3.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Minor Clean up a bunch of existing javac warnings in hadoop-streaming module. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-7431) Test DiskChecker's functionality in identifying bad directories (Part 2 of testing DiskChecker)
[ https://issues.apache.org/jira/browse/HADOOP-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-7431. - Resolution: Not A Problem See my earlier comment. This was already covered. Test DiskChecker's functionality in identifying bad directories (Part 2 of testing DiskChecker) --- Key: HADOOP-7431 URL: https://issues.apache.org/jira/browse/HADOOP-7431 Project: Hadoop Common Issue Type: Test Components: test, util Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Labels: test Fix For: 0.23.0 Add a test for the DiskChecker#checkDir method used in other projects (HDFS). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-1922) The task output promotion exception handler should include the IOException in the diagnostic message
[ https://issues.apache.org/jira/browse/HADOOP-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-1922. - Resolution: Fixed This is now taken care of by the OutputCommitter framework. Around the time this was opened, though, it was addressed by the refactors at HADOOP-1874. The task output promotion exception handler should include the IOException in the diagnostic message Key: HADOOP-1922 URL: https://issues.apache.org/jira/browse/HADOOP-1922 Project: Hadoop Common Issue Type: Bug Reporter: Owen O'Malley Assignee: Devaraj Das When the JobTracker fails to promote output, it should have a more detailed error message that includes the exception that was thrown by the FileSystem operation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
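The fix pattern being asked for is simply to fold the thrown exception into the diagnostic string. A hedged sketch (the helper and its names are hypothetical, not the actual JobTracker code):
{code}
import java.io.IOException;
import org.apache.hadoop.util.StringUtils;

public class DiagnosticExample {
  // Hypothetical helper: include the causing IOException in the message.
  static String promoteFailureMessage(String taskId, IOException cause) {
    return "Failed to promote output for " + taskId + ": "
        + StringUtils.stringifyException(cause);
  }
}
{code}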
[jira] [Resolved] (HADOOP-1769) Possible StackOverflowError in FileSystem.get(Uri uri, Configuration conf) method
[ https://issues.apache.org/jira/browse/HADOOP-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-1769. - Resolution: Cannot Reproduce Doesn't look like it's a problem anymore. Here's a simple test to verify (there may already exist something like this, but I did not look): {code}
public void testStack() throws IOException, URISyntaxException {
  Configuration conf = new Configuration();
  String url = "/";
  URI uri = new URI(url);
  assertEquals(null, uri.getScheme());
  FileSystem fs = FileSystem.get(uri, conf);
}
{code} Marking as 'Cannot Reproduce' (now). Possible StackOverflowError in FileSystem.get(Uri uri, Configuration conf) method - Key: HADOOP-1769 URL: https://issues.apache.org/jira/browse/HADOOP-1769 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 0.14.0 Reporter: Thomas Friol When calling the method FileSystem.get(URI uri, Configuration conf) with a URI without a scheme - StackOverflowError {noformat}
Exception in thread "Main Thread" java.lang.StackOverflowError:
at java.util.regex.Matcher.<init>(Matcher.java:201)
at java.util.regex.Pattern.matcher(Pattern.java:879)
at org.apache.hadoop.conf.Configuration.substituteVars(Configuration.java:182)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:247)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:90)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:143)
at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:118)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:90)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:143)
at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:118)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:90)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:143)
at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:118)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:90)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:143)
at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:118)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:90)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:143)
at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:118)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:90)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:143)
at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:118)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:90)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:143)
at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:118)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:90)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:143)
{noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-2221) Configuration.toString is broken
[ https://issues.apache.org/jira/browse/HADOOP-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-2221. - Resolution: Not A Problem Nicholas - Yep, looks invalid given the current state (0.23/trunk) of Configuration. Resources and default resources are now loaded during toString ops: {code}
@Override
public String toString() {
  StringBuilder sb = new StringBuilder();
  sb.append("Configuration: ");
  if (loadDefaults) {
    toString(defaultResources, sb);
    if (resources.size() > 0) {
      sb.append(", ");
    }
  }
  toString(resources, sb);
  return sb.toString();
}
{code} Closing as Not-a-problem (anymore). Configuration.toString is broken Key: HADOOP-2221 URL: https://issues.apache.org/jira/browse/HADOOP-2221 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 0.15.0 Reporter: Arun C Murthy Assignee: Arun C Murthy Attachments: HADOOP-2221_1_2007117.patch {{Configuration.toString}} doesn't string-ify the {{Configuration.resources}} field which was added in HADOOP-785. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-3291) Add StackWritable and QueueWritable classes
[ https://issues.apache.org/jira/browse/HADOOP-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-3291. - Resolution: Not A Problem I feel these are better suited as 3rd-party packages, or under projects like Mahout where they may be utilized as utility classes. However, feel free to reopen if you feel they add good value to Hadoop common itself. Add StackWritable and QueueWritable classes --- Key: HADOOP-3291 URL: https://issues.apache.org/jira/browse/HADOOP-3291 Project: Hadoop Common Issue Type: New Feature Components: io Environment: All Reporter: Dennis Kubes Assignee: Dennis Kubes Attachments: HADOOP-3291-1-20080421.patch Adds Writable classes for FIFO Queue and LIFO Stack data structures. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
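For reference, a hypothetical sketch of what such a QueueWritable might have looked like; this class was never added to Hadoop common, and all names here are illustrative:
{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Queue;
import org.apache.hadoop.io.Writable;

// Hypothetical FIFO-queue Writable of ints; never part of Hadoop common.
public class IntQueueWritable implements Writable {
  private final Queue<Integer> queue = new ArrayDeque<>();

  public void add(int v) { queue.add(v); }
  public Integer poll() { return queue.poll(); }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(queue.size());
    for (int v : queue) {
      out.writeInt(v); // FIFO order is preserved on the wire
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    queue.clear();
    int n = in.readInt();
    for (int i = 0; i < n; i++) {
      queue.add(in.readInt());
    }
  }
}
{code}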
[jira] [Resolved] (HADOOP-6342) Create a script to squash a common, hdfs, and mapreduce tarball into a single hadoop tarball
[ https://issues.apache.org/jira/browse/HADOOP-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-6342. - Resolution: Not A Problem Fix Version/s: (was: 0.21.1) 0.22.0 Looks like this was never marked as resolved after Tom's earlier comment that https://issues.apache.org/jira/browse/HADOOP-6846 had fixed it in 0.22. Create a script to squash a common, hdfs, and mapreduce tarball into a single hadoop tarball Key: HADOOP-6342 URL: https://issues.apache.org/jira/browse/HADOOP-6342 Project: Hadoop Common Issue Type: New Feature Components: build Reporter: Owen O'Malley Assignee: Owen O'Malley Priority: Blocker Fix For: 0.22.0 Attachments: HADOOP-6342.2.patch, HADOOP-6342.patch, h-6342.patch, tar-munge, tar-munge It would be convenient for the transition if we had a script to take a set of common, hdfs, and mapreduce tarballs and merge them into a single tarball. This is intended just to help users who don't want to transition to split projects for deployment immediately. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-516) Eclipse-based GUI: DFS explorer and basic Map/Reduce job launcher
[ https://issues.apache.org/jira/browse/HADOOP-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-516. Resolution: Fixed The plugin seems to have integrated this feature already. Should this still be open, please reopen it. Eclipse-based GUI: DFS explorer and basic Map/Reduce job launcher - Key: HADOOP-516 URL: https://issues.apache.org/jira/browse/HADOOP-516 Project: Hadoop Common Issue Type: New Feature Environment: Eclipse 3.2 JDK 1.5 Reporter: Frédéric Bertin Attachments: hdfsExplorer.zip, hdfsExplorer2.zip To increase productivity in our current project (which makes heavy use of Hadoop), we wrote a small Eclipse-based GUI application which basically consists of 2 views:
* a HDFS explorer adapted from the Eclipse filesystem explorer example. For now, it includes the following features:
o classical tree-based browsing interface, with directory content being detailed in a 3-column table (file name, file size, file type)
o refresh button
o delete file or directory (with confirm dialog): select files in the tree or table and click the Delete button
o rename file or directory: simple click on the file in the table, type the new name and validate
o open file with system editor: select the file in the table and click the Open button (works on Windows, not on Linux)
o internal drag & drop
o external drag & drop from the local filesystem to the HDFS (the opposite doesn't work)
* a MapReduce *very* simple job launcher:
o select the job XML configuration file
o run the job
o kill the job
o visualize map and reduce progress with progress bars
o open a browser on the Hadoop job tracker web interface
INSTALLATION NOTES:
- Eclipse 3.2
- JDK 1.5
- import the archive in Eclipse
- copy your hadoop conf file (hadoop-default.xml in src folder) - this step should be moved into the GUI later
- right-click on the project and Run As - Eclipse Application
- enjoy...
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-323) IO Exception at LocalFileSystem.renameRaw, when running Nutch nightly builds (0.8-dev).
[ https://issues.apache.org/jira/browse/HADOOP-323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-323. Resolution: Invalid Was fixed a long time ago, but wasn't closed. Doesn't apply today; I run LJRunner and it never really complains in any run about things like these. Closing as Invalid (now). IO Exception at LocalFileSystem.renameRaw, when running Nutch nightly builds (0.8-dev). --- Key: HADOOP-323 URL: https://issues.apache.org/jira/browse/HADOOP-323 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 0.3.2 Environment: Windows XP + CygWin Reporter: KuroSaka TeruHiko IO Exception at LocalFileSystem.renameRaw, when running Nutch nightly builds (0.8-dev). Please see the detailed descriptions in: http://issues.apache.org/jira/browse/NUTCH-266 Not knowing how to reclassify an existing bug, I am opening this new bug under Hadoop. The version number is 0.3.3, but because I don't see it in the jira list, I chose the closest matching version. The Nutch-with-GUI build was running with hadoop-0.2 but stopped running, exhibiting the same symptom with other nightly builds, when switched to use hadoop-0.3.3. I checked fs as component, but this bug could also be caused by the order in which jobs are scheduled, I suspect. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-466) Startup scripts will not start instances of Hadoop daemons w/different configs w/o setting separate PID directories
[ https://issues.apache.org/jira/browse/HADOOP-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HADOOP-466. Resolution: Fixed Fix Version/s: 0.20.0 This problem indeed exists if one doesn't use HADOOP_IDENT_STRING, but that is a better workaround than adding a dependency on md5sum and the like (or do we already use it?). I think this may be resolved as fixed with the availability of HADOOP_IDENT_STRING to work around with. Workaround (tested to work in 0.20.2): {code}
# To start a second DN on the same machine, with a separate config:
HADOOP_IDENT_STRING=$USER-DN2 hadoop-daemon.sh --config /conf/dn2 start datanode
HADOOP_IDENT_STRING=$USER-DN2 hadoop-daemon.sh --config /conf/dn2 stop datanode
# These manage the PIDs as well, and will not complain that stuff is already running.
{code} Startup scripts will not start instances of Hadoop daemons w/different configs w/o setting separate PID directories --- Key: HADOOP-466 URL: https://issues.apache.org/jira/browse/HADOOP-466 Project: Hadoop Common Issue Type: Improvement Components: conf Affects Versions: 0.5.0 Reporter: Vetle Roeim Fix For: 0.20.0 Attachments: hadoop-466.diff Configuration directories can be specified by either setting HADOOP_CONF_DIR or using the --config command line option. However, the hadoop-daemon.sh script will not start the daemons unless the PID directory is separate for each configuration. The issue is that the code for generating PID filenames does not depend on the configuration directory. While the PID directory can be changed in hadoop-env.sh, it seems a little unnecessary to have this restriction. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira