[jira] [Created] (HADOOP-16526) Support LDAP authentication (bind) via GSSAPI
Todd Lipcon created HADOOP-16526: Summary: Support LDAP authentication (bind) via GSSAPI Key: HADOOP-16526 URL: https://issues.apache.org/jira/browse/HADOOP-16526 Project: Hadoop Common Issue Type: Improvement Components: security Reporter: Todd Lipcon Currently the LDAP group mapping provider only supports simple (user/password) authentication. In some cases it's more convenient to use GSSAPI (Kerberos) authentication here, particularly when the server doing the mapping is already using a keytab provided by the same instance (e.g. IPA or AD). We should provide a configuration to turn on GSSAPI and put the right UGI 'doAs' calls in place to ensure an appropriate Subject in those calls. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
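A minimal sketch of the JNDI environment such a GSSAPI bind might use. This is illustrative only: the class and method names are hypothetical, the actual LdapGroupsMapping configuration keys are not shown, and in Hadoop the bind itself would run inside a UGI doAs() so the keytab login's Subject carries the Kerberos credentials.

```java
import java.util.Hashtable;
import javax.naming.Context;

// Sketch: JNDI environment for an LDAP bind via SASL/GSSAPI rather than
// simple authentication. With "GSSAPI" selected, no principal/password
// entries are needed; credentials come from the thread's Kerberos Subject.
class GssapiLdapEnv {
    static Hashtable<String, String> buildEnv(String ldapUrl) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, ldapUrl);
        env.put(Context.SECURITY_AUTHENTICATION, "GSSAPI");
        return env;
    }
}
```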
[jira] [Created] (HADOOP-16525) LDAP group mapping should include primary posix group
Todd Lipcon created HADOOP-16525: Summary: LDAP group mapping should include primary posix group Key: HADOOP-16525 URL: https://issues.apache.org/jira/browse/HADOOP-16525 Project: Hadoop Common Issue Type: Improvement Reporter: Todd Lipcon Assignee: Todd Lipcon When configuring LdapGroupsMapping against FreeIPA, the current implementation searches for groups which have the user listed as a member. This catches all "secondary" groups but misses the user's primary group (typically the same name as their username). We should include a search for a group matching the user's primary gidNumber in the group search.
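The extra search could key on the user's primary gidNumber attribute. The sketch below only builds illustrative filter strings, not LdapGroupsMapping's actual queries; the objectClass and attribute names are the usual posixGroup schema, assumed here.

```java
// Sketch of the two group searches (illustrative filters):
// the existing member-based search plus a lookup of the group whose
// gidNumber equals the user's primary gidNumber.
class PrimaryGroupFilter {
    static String memberFilter(String userDn) {
        return "(&(objectClass=posixGroup)(member=" + userDn + "))";
    }
    static String primaryGroupFilter(String gidNumber) {
        return "(&(objectClass=posixGroup)(gidNumber=" + gidNumber + "))";
    }
}
```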
[jira] [Created] (HADOOP-16179) hadoop-common pom should not depend on kerb-simplekdc
Todd Lipcon created HADOOP-16179: Summary: hadoop-common pom should not depend on kerb-simplekdc Key: HADOOP-16179 URL: https://issues.apache.org/jira/browse/HADOOP-16179 Project: Hadoop Common Issue Type: Improvement Reporter: Todd Lipcon The hadoop-common pom currently has a dependency on kerb-simplekdc. In fact, the only classes used from Kerby are in kerb-core and kerb-util (which is a transitive dependency from kerb-core). Depending on kerb-simplekdc pulls a bunch of other unnecessary classes into the hadoop-common classpath.
[jira] [Created] (HADOOP-16011) OsSecureRandom very slow compared to other SecureRandom implementations
Todd Lipcon created HADOOP-16011: Summary: OsSecureRandom very slow compared to other SecureRandom implementations Key: HADOOP-16011 URL: https://issues.apache.org/jira/browse/HADOOP-16011 Project: Hadoop Common Issue Type: Bug Components: security Reporter: Todd Lipcon In looking at performance of a workload which creates a lot of short-lived remote connections to a secured DN, [~philip] and I found very high system CPU usage. We tracked it down to reads from /dev/random, which are incurred by the DN using CryptoCodec.generateSecureRandom to generate a transient session key and IV for AES encryption. In the case that the OpenSSL codec is not enabled, the above code falls through to the JDK SecureRandom implementation, which performs reasonably. However, OpenSSLCodec defaults to using OsSecureRandom, which reads all random data from /dev/random rather than doing something more efficient like initializing a CSPRNG from a small seed. I wrote a simple JMH benchmark to compare various approaches when running with concurrency 10: testHadoop - using CryptoCodec; testNewSecureRandom - using 'new SecureRandom()' each iteration; testSha1PrngNew - using the SHA1PRNG explicitly, new instance each iteration; testSha1PrngShared - using a single shared instance of SHA1PRNG; testSha1PrngThread - using a thread-specific instance of SHA1PRNG
{code:java}
Benchmark                        Mode        Score  Units
MyBenchmark.testHadoop           thrpt     1293.000  ops/s  [with libhadoop.so]
MyBenchmark.testHadoop           thrpt   461515.697  ops/s  [without libhadoop.so]
MyBenchmark.testNewSecureRandom  thrpt    43413.640  ops/s
MyBenchmark.testSha1PrngNew      thrpt   395515.000  ops/s
MyBenchmark.testSha1PrngShared   thrpt   164488.713  ops/s
MyBenchmark.testSha1PrngThread   thrpt  4295123.210  ops/s
{code}
In other words, the presence of the OpenSSL acceleration slows down this code path by 356x. And, compared to the optimal (thread-local Sha1Prng) it's 3321x slower.
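The fastest variant in the benchmark, a thread-specific SHA1PRNG, can be sketched as below. This is a minimal illustration of the idea, not the CryptoCodec code path; each thread gets its own instance, so threads neither contend on a shared lock nor block on /dev/random for every key.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Sketch: one SHA1PRNG per thread via ThreadLocal.
class ThreadLocalPrng {
    private static final ThreadLocal<SecureRandom> PRNG =
        ThreadLocal.withInitial(() -> {
            try {
                return SecureRandom.getInstance("SHA1PRNG");
            } catch (NoSuchAlgorithmException e) {
                throw new RuntimeException(e);
            }
        });

    static byte[] randomBytes(int n) {
        byte[] buf = new byte[n];
        PRNG.get().nextBytes(buf);
        return buf;
    }
}
```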
[jira] [Created] (HADOOP-15566) Remove HTrace support
Todd Lipcon created HADOOP-15566: Summary: Remove HTrace support Key: HADOOP-15566 URL: https://issues.apache.org/jira/browse/HADOOP-15566 Project: Hadoop Common Issue Type: Improvement Reporter: Todd Lipcon The HTrace incubator project has voted to retire itself and won't be making further releases. The Hadoop project currently has various hooks with HTrace. It seems in some cases (eg HDFS-13702) these hooks have had measurable performance overhead. Given these two factors, I think we should consider removing the HTrace integration. If there is someone willing to do the work, replacing it with OpenTracing might be a better choice since there is an active community.
[jira] [Created] (HADOOP-15564) Classloading Shell should not run a subprocess
Todd Lipcon created HADOOP-15564: Summary: Classloading Shell should not run a subprocess Key: HADOOP-15564 URL: https://issues.apache.org/jira/browse/HADOOP-15564 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 3.0.0 Reporter: Todd Lipcon The 'Shell' class has a static member isSetsidSupported which, in order to initialize, forks out a subprocess. Various other parts of the code reference Shell.WINDOWS. For example, the StringUtils class has such a reference. This means that, during startup, a seemingly fast call like Configuration.getBoolean() ends up class-loading StringUtils, which class-loads Shell, which forks out a subprocess. I couldn't measure any big improvement by fixing this, but it seemed surprising, to say the least.
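One way to defer such a probe is the lazy-holder idiom: the expensive work runs only when the value is first asked for, not when the enclosing class is loaded by an unrelated caller. This sketch uses a stand-in flag instead of actually forking a setsid subprocess; names are illustrative, not Shell's.

```java
// Sketch of the holder idiom: Holder is not class-loaded (and runProbe not
// run) until isSetsidSupported() is first called.
class LazyProbe {
    static volatile boolean probeRan = false;

    private static class Holder {
        static final boolean SETSID_SUPPORTED = runProbe();
    }

    private static boolean runProbe() {
        probeRan = true;  // stand-in for forking "setsid" and checking its exit code
        return true;
    }

    static boolean isSetsidSupported() {
        return Holder.SETSID_SUPPORTED;
    }
}
```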
[jira] [Created] (HADOOP-15557) Crypto streams should not crash when mis-used
Todd Lipcon created HADOOP-15557: Summary: Crypto streams should not crash when mis-used Key: HADOOP-15557 URL: https://issues.apache.org/jira/browse/HADOOP-15557 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 3.2.0 Reporter: Todd Lipcon In general, the non-positional read APIs for streams in Hadoop Common are meant to be used by only a single thread at a time. It would not make much sense to have concurrent multi-threaded access to seek+read because they modify the stream's file position. Multi-threaded access on input streams can be done using positional read APIs. Multi-threaded access on output streams probably never makes sense. In the case of DFSInputStream, the positional read APIs are marked synchronized, so that even when misused, no strange exceptions are thrown. The results are just somewhat undefined in that it's hard for a thread to know which position was read from. However, when running on an encrypted file system, the results are much worse: since CryptoInputStream's read methods are not marked synchronized, the caller can get strange ByteBuffer exceptions or even a JVM crash due to concurrent use and free of underlying OpenSSL Cipher buffers. The crypto stream wrappers should be made more resilient to such misuse, for example by: (a) making the read methods safer by making them synchronized (so they have the same behavior as DFSInputStream) or (b) trying to detect concurrent access to these methods and throwing ConcurrentModificationException so that the user is alerted to their probable misuse.
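Option (b) can be sketched with a compare-and-set guard around the read path: a second thread entering mid-read fails fast with ConcurrentModificationException instead of corrupting shared cipher buffers. All names here are illustrative, not CryptoInputStream's, and the byte-copy body is a stand-in for the real decrypt-and-read.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: detect concurrent use of a single-threaded read path.
class GuardedReader {
    private final AtomicBoolean inRead = new AtomicBoolean(false);
    private int pos = 0;

    int read(byte[] src, byte[] dst) {
        if (!inRead.compareAndSet(false, true)) {
            throw new ConcurrentModificationException("concurrent read detected");
        }
        try {
            // Stand-in for the real decrypting read against 'pos'.
            int n = Math.min(dst.length, src.length - pos);
            System.arraycopy(src, pos, dst, 0, n);
            pos += n;
            return n;
        } finally {
            inRead.set(false);
        }
    }
}
```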
[jira] [Created] (HADOOP-15554) Improve JIT performance for Configuration parsing
Todd Lipcon created HADOOP-15554: Summary: Improve JIT performance for Configuration parsing Key: HADOOP-15554 URL: https://issues.apache.org/jira/browse/HADOOP-15554 Project: Hadoop Common Issue Type: Improvement Components: conf, performance Affects Versions: 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon In investigating a performance regression for small tasks between Hadoop 2 and Hadoop 3, we found that the amount of time spent in JIT was significantly higher. Using jitwatch we were able to determine that, due to a combination of switching from DOM to SAX style parsing and just having more configuration key/value pairs, Configuration.loadResource is now getting compiled with the C2 compiler and taking quite some time. Breaking that very large function up into several smaller ones and eliminating some redundant bits of code improves the JIT performance measurably.
[jira] [Created] (HADOOP-15551) Avoid use of Java8 streams in Configuration.addTags
Todd Lipcon created HADOOP-15551: Summary: Avoid use of Java8 streams in Configuration.addTags Key: HADOOP-15551 URL: https://issues.apache.org/jira/browse/HADOOP-15551 Project: Hadoop Common Issue Type: Improvement Components: performance Affects Versions: 3.2 Reporter: Todd Lipcon Assignee: Todd Lipcon Configuration.addTags oddly uses Arrays.stream instead of a more conventional mechanism. When profiling a simple program that uses Configuration, I found that addTags was taking tens of millis of CPU to do very little work the first time it's called, accounting for ~8% of total profiler samples in my program.
{code}
[9] 4.52% 253 self: 0.00% 0 java/lang/invoke/MethodHandleNatives.linkCallSite
[9] 3.71% 208 self: 0.00% 0 java/lang/invoke/MethodHandleNatives.linkMethodHandleConstant
{code}
I don't know much about the implementation details of the Streams stuff, but it seems it's probably meant more for cases with very large arrays or somesuch. Switching to a normal Set.addAll() call eliminates this from the profile.
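The two approaches side by side, as a stand-in for Configuration.addTags (method names are illustrative): the plain bulk add avoids the first-call invokedynamic call-site linking cost of the streams machinery while producing the same set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch: stream-based add vs. a conventional bulk add.
class TagLoading {
    static Set<String> addTagsViaStream(String[] tags) {
        Set<String> set = new HashSet<>();
        Arrays.stream(tags).forEach(set::add);   // roughly what addTags did
        return set;
    }

    static Set<String> addTagsViaAddAll(String[] tags) {
        Set<String> set = new HashSet<>();
        set.addAll(Arrays.asList(tags));         // conventional replacement
        return set;
    }
}
```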
[jira] [Created] (HADOOP-15550) Avoid static initialization of ObjectMappers
Todd Lipcon created HADOOP-15550: Summary: Avoid static initialization of ObjectMappers Key: HADOOP-15550 URL: https://issues.apache.org/jira/browse/HADOOP-15550 Project: Hadoop Common Issue Type: Bug Components: performance Affects Versions: 3.2.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Various classes statically initialize an ObjectMapper READER instance. This ends up doing a bunch of class-loading of Jackson libraries that can add up to a fair amount of CPU, even if the reader ends up not being used. This is particularly the case with WebHdfsFileSystem, which is class-loaded by a serviceloader even when unused in a particular job. We should lazy-init these members instead of doing so as a static class member.
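A sketch of the lazy-init shape: ExpensiveReader stands in for a Jackson ObjectMapper/ObjectReader whose construction class-loads a large library, and a holder class defers construction until the reader is first used. The counter exists only to make the deferral observable.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: lazy holder instead of a static READER field.
class LazyMapper {
    static final AtomicInteger constructions = new AtomicInteger();

    static class ExpensiveReader {
        ExpensiveReader() { constructions.incrementAndGet(); }
    }

    private static class Holder {
        static final ExpensiveReader READER = new ExpensiveReader();
    }

    // First call triggers construction; later calls reuse the instance.
    static ExpensiveReader reader() { return Holder.READER; }
}
```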
[jira] [Created] (HADOOP-15549) Upgrade to commons-configuration 2.1 regresses task CPU consumption
Todd Lipcon created HADOOP-15549: Summary: Upgrade to commons-configuration 2.1 regresses task CPU consumption Key: HADOOP-15549 URL: https://issues.apache.org/jira/browse/HADOOP-15549 Project: Hadoop Common Issue Type: Bug Components: metrics Affects Versions: 3.0.2 Reporter: Todd Lipcon Assignee: Todd Lipcon HADOOP-13660 upgraded from commons-configuration 1.x to 2.x. commons-configuration is used when parsing the metrics configuration properties file. The new builder API used in the new version apparently makes use of a bunch of very bloated reflection and classloading nonsense to achieve the same goal, and this results in a regression of >100ms of CPU time as measured by a program which simply initializes DefaultMetricsSystem. This isn't a big deal for long-running daemons, but for MR tasks which might only run a few seconds on poorly-tuned jobs, this can be noticeable.
[jira] [Resolved] (HADOOP-9545) Improve logging in ActiveStandbyElector
[ https://issues.apache.org/jira/browse/HADOOP-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-9545. - Resolution: Won't Fix > Improve logging in ActiveStandbyElector > --- > > Key: HADOOP-9545 > URL: https://issues.apache.org/jira/browse/HADOOP-9545 > Project: Hadoop Common > Issue Type: Improvement > Components: auto-failover, ha >Affects Versions: 2.1.0-beta >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Minor > > The ActiveStandbyElector currently logs a lot of stuff at DEBUG level which > would be useful for troubleshooting. We've seen one instance in the wild of a > ZKFC thinking it should be in standby state when in fact it won the election, > but the logging is insufficient to understand why. I'd like to bump most of > the existing DEBUG logs to INFO and add some additional logs as well.
[jira] [Resolved] (HADOOP-10859) Native implementation of java Checksum interface
[ https://issues.apache.org/jira/browse/HADOOP-10859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-10859. -- Resolution: Won't Fix No plans to work on this. > Native implementation of java Checksum interface > > > Key: HADOOP-10859 > URL: https://issues.apache.org/jira/browse/HADOOP-10859 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Minor > > Some parts of our code such as IFileInputStream/IFileOutputStream use the > java Checksum interface to calculate/verify checksums. Currently we don't > have a native implementation of these. For CRC32C in particular, we can get a > very big speedup with a native implementation.
[jira] [Created] (HADOOP-10882) Move DirectBufferPool into common util
Todd Lipcon created HADOOP-10882: Summary: Move DirectBufferPool into common util Key: HADOOP-10882 URL: https://issues.apache.org/jira/browse/HADOOP-10882 Project: Hadoop Common Issue Type: Task Components: util Affects Versions: 2.6.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor MAPREDUCE-2841 uses a direct buffer pool to pass data back and forth between native and Java code. The branch has an implementation which appears to be derived from the one in HDFS. Instead of copy-pasting, we should move the HDFS DirectBufferPool into Common so that MR can make use of it.
[jira] [Created] (HADOOP-10859) Native implementation of java Checksum interface
Todd Lipcon created HADOOP-10859: Summary: Native implementation of java Checksum interface Key: HADOOP-10859 URL: https://issues.apache.org/jira/browse/HADOOP-10859 Project: Hadoop Common Issue Type: Improvement Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Some parts of our code such as IFileInputStream/IFileOutputStream use the java Checksum interface to calculate/verify checksums. Currently we don't have a native implementation of these. For CRC32C in particular, we can get a very big speedup with a native implementation.
[jira] [Created] (HADOOP-10855) Allow Text to be read with a known length
Todd Lipcon created HADOOP-10855: Summary: Allow Text to be read with a known length Key: HADOOP-10855 URL: https://issues.apache.org/jira/browse/HADOOP-10855 Project: Hadoop Common Issue Type: Improvement Components: io Affects Versions: 2.6.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor For the native task work (MAPREDUCE-2841) it is useful to be able to store strings in a different fashion than the default (varint-prefixed) serialization. We should provide a read method in Text which takes an already-known length to support this use case while still providing Text objects back to the user.
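The proposed API shape, sketched against plain java.io types rather than Hadoop's Text class (the method name is illustrative, not the one eventually committed): the caller already knows the byte length from the container format, so no varint prefix is read.

```java
import java.io.DataInput;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Sketch: read a UTF-8 string of a length known from the surrounding format.
class KnownLengthText {
    static String readWithKnownLength(DataInput in, int len) {
        try {
            byte[] buf = new byte[len];
            in.readFully(buf);  // exactly 'len' bytes, no length prefix
            return new String(buf, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```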
[jira] [Created] (HADOOP-10288) Explicit reference to Log4JLogger breaks non-log4j users
Todd Lipcon created HADOOP-10288: Summary: Explicit reference to Log4JLogger breaks non-log4j users Key: HADOOP-10288 URL: https://issues.apache.org/jira/browse/HADOOP-10288 Project: Hadoop Common Issue Type: Bug Components: util Affects Versions: 2.4.0 Reporter: Todd Lipcon Assignee: Todd Lipcon In HttpRequestLog, we make an explicit reference to the Log4JLogger class for an instanceof check. If the log4j implementation isn't actually on the classpath, the instanceof check throws NoClassDefFoundError instead of returning false. This means that dependent projects that don't use log4j can no longer embed HttpServer -- typically this is an issue when they use MiniDFSCluster as part of their testing.
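One way to make such a check safe, sketched below as an assumption rather than the fix actually committed: compare class names up the hierarchy instead of using a hard instanceof, so log4j need never be loaded. This sketch only walks superclasses, which suffices when the target is a concrete class like Log4JLogger; it does not check interfaces.

```java
// Sketch: instanceof-by-name, avoiding NoClassDefFoundError when the target
// class is absent from the classpath.
class SafeInstanceOf {
    static boolean isInstanceOf(Object obj, String className) {
        for (Class<?> c = obj.getClass(); c != null; c = c.getSuperclass()) {
            if (c.getName().equals(className)) {
                return true;
            }
        }
        return false;
    }
}
```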
[jira] [Resolved] (HADOOP-10199) Precommit Admin build is not running because no previous successful build is available
[ https://issues.apache.org/jira/browse/HADOOP-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-10199. -- Resolution: Fixed Hadoop Flags: Reviewed Precommit Admin build is not running because no previous successful build is available -- Key: HADOOP-10199 URL: https://issues.apache.org/jira/browse/HADOOP-10199 Project: Hadoop Common Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Priority: Blocker Attachments: HADOOP-10199.patch It seems at some point the builds started failing for an unknown reason and eventually the last successful was rolled off. At that point the precommit builds started failing because they pull an artifact from the last successful build.
[jira] [Resolved] (HADOOP-10200) Fix precommit script patch_tested.txt fallback option
[ https://issues.apache.org/jira/browse/HADOOP-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-10200. -- Resolution: Fixed Hadoop Flags: Reviewed Fix precommit script patch_tested.txt fallback option - Key: HADOOP-10200 URL: https://issues.apache.org/jira/browse/HADOOP-10200 Project: Hadoop Common Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HADOOP-10200.patch HADOOP-10199 created a fallback option when there is no successful artifact. However that fallback option used the jenkins lastBuild build indicator. It appears that does not mean the last completed build, but strictly the last build, which in this context is the current build. The current build is running so it doesn't have any artifacts.
[jira] [Resolved] (HADOOP-9765) Precommit Admin job chokes on issues without an attachment
[ https://issues.apache.org/jira/browse/HADOOP-9765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-9765. - Resolution: Fixed Hadoop Flags: Reviewed Precommit Admin job chokes on issues without an attachment -- Key: HADOOP-9765 URL: https://issues.apache.org/jira/browse/HADOOP-9765 Project: Hadoop Common Issue Type: Bug Components: build Reporter: Brock Noland Assignee: Brock Noland Attachments: HADOOP-9765.patch, HADOOP-9765.patch Check out this file: https://builds.apache.org/job/PreCommit-Admin/lastSuccessfulBuild/artifact/patch_tested.txt It has corrupt data:
{noformat}
HIVE-4877HDFS-5010,12593214
HIVE-4877HBASE-8693,12593082
HIVE-4877YARN-919,12593107
YARN-905,12593225
HIVE-4877HBASE-8752,12588069
{noformat}
which resulted in the Hive precommit job being called with the ISSUE_NUM of 5010, 8693, 919, and 8752. Looking at the script and some output I pulled from the last run, it looks like it gets hosed up when there is a JIRA which is PA but doesn't have an attachment (as ZK-1402 is currently sitting). For example: This is the bad data the script is encountering:
{noformat}
$ grep -A 2 'ZOOKEEPER-1402' patch_available2.elements
ZOOKEEPER-1402
HBASE-8348
id=12592318
{noformat}
This is where it screws up:
{noformat}
$ awk '{ printf %s, $0 }' patch_available2.elements | sed -e s/\W*id=\/,/g | perl -pe s/\/\n/g | grep ZOOKEEPER-1402
ZOOKEEPER-1402HBASE-8348 ,12592318
{noformat}
[jira] [Created] (HADOOP-9908) Fix NPE when versioninfo properties file is missing
Todd Lipcon created HADOOP-9908: --- Summary: Fix NPE when versioninfo properties file is missing Key: HADOOP-9908 URL: https://issues.apache.org/jira/browse/HADOOP-9908 Project: Hadoop Common Issue Type: Bug Components: util Affects Versions: 2.1.0-beta, 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hadoop-9908.txt When running tests in Eclipse I ran into an NPE in VersionInfo since the version info properties file didn't properly make it to the classpath. This is because getResourceAsStream can return null if the file is not found. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-8336) LocalFileSystem Does not seek to the correct location when Checksumming is off.
[ https://issues.apache.org/jira/browse/HADOOP-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-8336. - Resolution: Duplicate Resolved as duplicate of HADOOP-9307, since I'm pretty sure that solved this issue. LocalFileSystem Does not seek to the correct location when Checksumming is off. --- Key: HADOOP-8336 URL: https://issues.apache.org/jira/browse/HADOOP-8336 Project: Hadoop Common Issue Type: Bug Components: fs Reporter: Elliott Clark Assignee: Todd Lipcon Attachments: branch-1-test.txt Hbase was seeing an issue when trying to read data from a local filesystem instance with setVerifyChecksum(false). On debugging into it, the seek on the file was seeking to the checksum block index, but since checksumming was off that was the incorrect location.
[jira] [Created] (HADOOP-9898) Set SO_KEEPALIVE on all our sockets
Todd Lipcon created HADOOP-9898: --- Summary: Set SO_KEEPALIVE on all our sockets Key: HADOOP-9898 URL: https://issues.apache.org/jira/browse/HADOOP-9898 Project: Hadoop Common Issue Type: Bug Components: ipc, net Affects Versions: 3.0.0 Reporter: Todd Lipcon Priority: Minor We recently saw an issue where network issues between slaves and the NN caused ESTABLISHED TCP connections to pile up and leak on the NN side. It looks like the RST packets were getting dropped, which meant that the client thought the connections were closed, while they hung open forever on the server. Setting the SO_KEEPALIVE option on our sockets would prevent this kind of leak from going unchecked.
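At the Java level this is a one-liner per socket; the helper below is an illustrative sketch (not Hadoop's actual IPC code), showing SO_KEEPALIVE being enabled so the kernel probes idle connections and eventually reaps ones whose peer has silently disappeared.

```java
import java.net.Socket;
import java.net.SocketException;

// Sketch: enable SO_KEEPALIVE on a socket before use.
class KeepAlive {
    static Socket withKeepAlive(Socket s) {
        try {
            s.setKeepAlive(true);
            return s;
        } catch (SocketException e) {
            throw new RuntimeException(e);
        }
    }
}
```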
[jira] [Created] (HADOOP-9707) Fix register lists for crc32c inline assembly
Todd Lipcon created HADOOP-9707: --- Summary: Fix register lists for crc32c inline assembly Key: HADOOP-9707 URL: https://issues.apache.org/jira/browse/HADOOP-9707 Project: Hadoop Common Issue Type: Bug Components: util Affects Versions: 3.0.0, 2.1.0-beta Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor The inline assembly used for the crc32 instructions has an incorrect clobber list: the computed CRC values are in-out variables and thus need to use the matching constraint syntax in the clobber list. This doesn't seem to cause a problem now in Hadoop, but may break in a different compiler version which allocates registers differently, or may break when the same code is used in another context.
[jira] [Created] (HADOOP-9618) Add thread which detects JVM pauses
Todd Lipcon created HADOOP-9618: --- Summary: Add thread which detects JVM pauses Key: HADOOP-9618 URL: https://issues.apache.org/jira/browse/HADOOP-9618 Project: Hadoop Common Issue Type: New Feature Components: util Affects Versions: 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Often times users struggle to understand what happened when a long JVM pause (GC or otherwise) causes things to malfunction inside a Hadoop daemon. For example, a long GC pause while logging an edit to the QJM may cause the edit to timeout, or a long GC pause may make other IPCs to the NameNode timeout. We should add a simple thread which loops on 1-second sleeps, and if the sleep ever takes significantly longer than 1 second, log a WARN. This will make GC pauses obvious in logs.
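The detector loop can be sketched as below (a minimal illustration; the threshold, thread name, and log message are assumptions, not the eventual JvmPauseMonitor implementation): sleep about a second, measure how long the sleep actually took, and warn when the overshoot exceeds a threshold, since a long GC or VM-level pause stretches the sleep.

```java
// Sketch of a JVM pause detector thread.
class PauseDetector {
    // Pure helper: did a sleep overshoot its requested duration by more
    // than the threshold?
    static boolean isPause(long actualNanos, long requestedNanos, long thresholdNanos) {
        return actualNanos - requestedNanos > thresholdNanos;
    }

    static Thread start(long sleepMs, long thresholdMs) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                long before = System.nanoTime();
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException e) {
                    return;
                }
                long took = System.nanoTime() - before;
                if (isPause(took, sleepMs * 1_000_000L, thresholdMs * 1_000_000L)) {
                    System.err.println("WARN: detected pause of ~" + (took / 1_000_000L) + "ms");
                }
            }
        }, "jvm-pause-monitor");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```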
[jira] [Created] (HADOOP-9608) ZKFC should abort if it sees an unrecognized NN become active
Todd Lipcon created HADOOP-9608: --- Summary: ZKFC should abort if it sees an unrecognized NN become active Key: HADOOP-9608 URL: https://issues.apache.org/jira/browse/HADOOP-9608 Project: Hadoop Common Issue Type: Bug Components: ha Affects Versions: 3.0.0 Reporter: Todd Lipcon We recently had an issue where one NameNode and ZKFC was updated to a new configuration/IP address but the ZKFC on the other node was not rebooted. Then, next time a failover occurred, the second ZKFC was not able to become active because the data in the ActiveBreadCrumb didn't match the data in its own configuration:
{code}
org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.IllegalArgumentException: Unable to determine service address for namenode ''
{code}
To prevent this from happening, whenever the ZKFC sees a new NN become active, it should check that it's properly able to instantiate a ServiceTarget for it, and if not, abort (since this ZKFC wouldn't be able to handle a failover successfully).
[jira] [Created] (HADOOP-9601) Support native CRC on byte arrays
Todd Lipcon created HADOOP-9601: --- Summary: Support native CRC on byte arrays Key: HADOOP-9601 URL: https://issues.apache.org/jira/browse/HADOOP-9601 Project: Hadoop Common Issue Type: Improvement Components: performance, util Affects Versions: 3.0.0 Reporter: Todd Lipcon When we first implemented the Native CRC code, we only did so for direct byte buffers, because these correspond directly to native heap memory and thus make it easy to access via JNI. We'd generally assumed that accessing byte[] arrays from JNI was not efficient enough, but now that I know more about JNI I don't think that's true -- we just need to make sure that the critical sections where we lock the buffers are short.
[jira] [Created] (HADOOP-9545) Improve logging in ActiveStandbyElector
Todd Lipcon created HADOOP-9545: --- Summary: Improve logging in ActiveStandbyElector Key: HADOOP-9545 URL: https://issues.apache.org/jira/browse/HADOOP-9545 Project: Hadoop Common Issue Type: Improvement Components: auto-failover, ha Affects Versions: 2.0.5-beta Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor The ActiveStandbyElector currently logs a lot of stuff at DEBUG level which would be useful for troubleshooting. We've seen one instance in the wild of a ZKFC thinking it should be in standby state when in fact it won the election, but the logging is insufficient to understand why. I'd like to bump most of the existing DEBUG logs to INFO and add some additional logs as well.
[jira] [Created] (HADOOP-9420) Add percentile or max metric for rpcQueueTime, processing time
Todd Lipcon created HADOOP-9420: --- Summary: Add percentile or max metric for rpcQueueTime, processing time Key: HADOOP-9420 URL: https://issues.apache.org/jira/browse/HADOOP-9420 Project: Hadoop Common Issue Type: Bug Components: ipc, metrics Affects Versions: 2.0.3-alpha Reporter: Todd Lipcon Currently, we only export averages for rpcQueueTime and rpcProcessingTime. These metrics are most useful when looking at timeouts and slow responses, which in my experience are often caused by momentary spikes in load, which won't show up in averages over the 15+ second time intervals often used by metrics systems. We should collect at least the max queuetime and processing time over each interval, or the percentiles if it's not too expensive.
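The "max per interval" half of the request can be sketched as below (illustrative names, not Hadoop's metrics2 classes): every RPC records its processing time, and the metrics snapshot reads and resets the max, so a short spike survives into the reporting interval instead of being averaged away.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: track the maximum recorded value per reporting interval.
class IntervalMax {
    private final AtomicLong max = new AtomicLong(0);

    void record(long millis) {
        max.accumulateAndGet(millis, Math::max);
    }

    // Called once per metrics interval: return the max and reset for the next.
    long snapshotAndReset() {
        return max.getAndSet(0);
    }
}
```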
[jira] [Created] (HADOOP-9399) protoc maven plugin doesn't work on mvn 3.0.2
Todd Lipcon created HADOOP-9399: --- Summary: protoc maven plugin doesn't work on mvn 3.0.2 Key: HADOOP-9399 URL: https://issues.apache.org/jira/browse/HADOOP-9399 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hadoop-9399.txt On my machine with mvn 3.0.2, I get a ClassCastException trying to use the maven protoc plugin. The issue seems to be that mvn 3.0.2 sees the List<File> parameter, doesn't see the generic type argument, and stuffs Strings inside instead. So, we get ClassCastException trying to use the objects as Files.
[jira] [Created] (HADOOP-9358) Auth failed log should include exception string
Todd Lipcon created HADOOP-9358: --- Summary: Auth failed log should include exception string Key: HADOOP-9358 URL: https://issues.apache.org/jira/browse/HADOOP-9358 Project: Hadoop Common Issue Type: Bug Components: ipc, security Affects Versions: 3.0.0, 2.0.4-beta Reporter: Todd Lipcon Assignee: Todd Lipcon Currently, when authentication fails, we see a WARN message like: {code} 2013-02-28 22:49:03,152 WARN ipc.Server (Server.java:saslReadAndProcess(1056)) - Auth failed for 1.2.3.4:12345:null {code} This is not useful to understand the underlying cause. The WARN entry should additionally include the exception text, eg: {code} 2013-02-28 22:49:03,152 WARN ipc.Server (Server.java:saslReadAndProcess(1056)) - Auth failed for 1.2.3.4:12345:null (GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Request is a replay (34))]) {code}
[jira] [Created] (HADOOP-9307) BufferedFSInputStream.read returns wrong results after certain seeks
Todd Lipcon created HADOOP-9307: --- Summary: BufferedFSInputStream.read returns wrong results after certain seeks Key: HADOOP-9307 URL: https://issues.apache.org/jira/browse/HADOOP-9307 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.0.2-alpha, 1.1.1 Reporter: Todd Lipcon Assignee: Todd Lipcon After certain sequences of seek/read, BufferedFSInputStream can silently return data from the wrong part of the file. Further description in first comment below.
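For context on the kind of invariant involved, here is a minimal sketch (hypothetical class, not the actual BufferedFSInputStream code) of position accounting in a buffered seekable stream: the logical position is the underlying position minus the unread buffered bytes, and a seek outside the buffered window must invalidate the buffer rather than serve stale data.

```java
/** Sketch of correct position accounting in a buffered seekable stream. */
class BufferedSeekSketch {
    private final byte[] file;      // stands in for the underlying raw stream
    private long underlyingPos;     // position of the "raw" stream
    private byte[] buf = new byte[0];
    private int bufPos;             // next unread index within buf
    private static final int BUF_SIZE = 4;

    BufferedSeekSketch(byte[] file) { this.file = file; }

    /** Logical position = raw position minus bytes buffered but not yet read. */
    public long getPos() { return underlyingPos - (buf.length - bufPos); }

    public void seek(long target) {
        long bufStart = underlyingPos - buf.length;
        if (target >= bufStart && target < underlyingPos) {
            bufPos = (int) (target - bufStart);   // seek within the buffer
        } else {
            buf = new byte[0]; bufPos = 0;        // invalidate the stale buffer
            underlyingPos = target;
        }
    }

    public int read() {
        if (bufPos == buf.length) {               // buffer exhausted: refill
            int n = (int) Math.min(BUF_SIZE, file.length - underlyingPos);
            if (n <= 0) return -1;
            buf = new byte[n];
            System.arraycopy(file, (int) underlyingPos, buf, 0, n);
            underlyingPos += n; bufPos = 0;
        }
        return buf[bufPos++] & 0xff;
    }
}
```

The bug class this JIRA describes is exactly a violation of this invariant: after certain seeks, reads are served from a buffer that no longer corresponds to the logical position.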
[jira] [Created] (HADOOP-9150) Unnecessary DNS resolution attempts for logical URIs
Todd Lipcon created HADOOP-9150: --- Summary: Unnecessary DNS resolution attempts for logical URIs Key: HADOOP-9150 URL: https://issues.apache.org/jira/browse/HADOOP-9150 Project: Hadoop Common Issue Type: Bug Components: ha Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Todd Lipcon Priority: Critical In the FileSystem code, we accidentally try to DNS-resolve the logical name before it is converted to an actual domain name. In some DNS setups, this can cause a big slowdown - eg in one misconfigured cluster we saw a 2-3x drop in terasort throughput, since every task wasted a lot of time waiting for slow "not found" responses from DNS.
[jira] [Created] (HADOOP-9112) test-patch should -1 for @Tests without a timeout
Todd Lipcon created HADOOP-9112: --- Summary: test-patch should -1 for @Tests without a timeout Key: HADOOP-9112 URL: https://issues.apache.org/jira/browse/HADOOP-9112 Project: Hadoop Common Issue Type: Improvement Reporter: Todd Lipcon With our current test running infrastructure, if a test with no timeout set runs too long, it triggers a surefire-wide timeout, which for some reason doesn't show up as a failed test in the test-patch output. Given that, we should require that all tests have a timeout set, and have test-patch enforce this with a simple check.
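The "simple check" could amount to flagging @Test annotations that carry no timeout attribute. A sketch of the idea (illustrative only, not the actual test-patch implementation):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: count @Test annotations lacking a timeout attribute. */
class TimeoutCheck {
    // Matches "@Test" with an optional parenthesized attribute list.
    private static final Pattern TEST_ANNOTATION =
        Pattern.compile("@Test\\s*(\\(([^)]*)\\))?");

    /** Returns the number of @Test occurrences with no timeout= attribute. */
    public static int countMissingTimeouts(String source) {
        Matcher m = TEST_ANNOTATION.matcher(source);
        int missing = 0;
        while (m.find()) {
            String args = m.group(2);
            if (args == null || !args.contains("timeout")) missing++;
        }
        return missing;
    }
}
```

test-patch would -1 the patch whenever the count is nonzero for any touched test file.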
[jira] [Created] (HADOOP-9106) Allow configuration of IPC connect timeout
Todd Lipcon created HADOOP-9106: --- Summary: Allow configuration of IPC connect timeout Key: HADOOP-9106 URL: https://issues.apache.org/jira/browse/HADOOP-9106 Project: Hadoop Common Issue Type: Improvement Components: ipc Affects Versions: 3.0.0 Reporter: Todd Lipcon Currently the connection timeout in Client.setupConnection() is hard-coded to 20 seconds. This is unreasonable in some scenarios, such as HA failover, if we want a faster failover time. We should allow this to be configured per-client.
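The proposed change amounts to reading the timeout from Configuration, with the historical hard-coded 20 seconds as the default. A minimal sketch, using a plain Map to stand in for Configuration; the key name "ipc.client.connect.timeout" is illustrative:

```java
import java.util.Map;

/** Sketch: per-client configurable IPC connect timeout with a 20s default. */
class ConnectTimeout {
    public static final int DEFAULT_CONNECT_TIMEOUT_MS = 20000;

    /** Stand-in for Configuration.getInt(key, defaultValue). */
    public static int getTimeout(Map<String, String> conf) {
        String v = conf.get("ipc.client.connect.timeout");
        return v == null ? DEFAULT_CONNECT_TIMEOUT_MS : Integer.parseInt(v);
    }
}
```

Client.setupConnection() would then pass this value to the socket connect call instead of the constant.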
[jira] [Created] (HADOOP-8929) Add toString for SampleQuantiles
Todd Lipcon created HADOOP-8929: --- Summary: Add toString for SampleQuantiles Key: HADOOP-8929 URL: https://issues.apache.org/jira/browse/HADOOP-8929 Project: Hadoop Common Issue Type: Improvement Components: metrics Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Todd Lipcon The new SampleQuantiles class is useful in the context of benchmarks, but currently there is no way to print it out outside the context of a metrics sink. It would be nice to have a convenient way to stringify it for logging, etc.
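A convenient stringification might look like the following sketch; the output format and method name are illustrative, not the committed SampleQuantiles format:

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: human-readable rendering of quantile estimates for logging. */
class QuantileSummary {
    /** Render a quantile->value map, eg {0.50: 12} as "50th %ile = 12". */
    public static String format(Map<Double, Long> quantiles) {
        if (quantiles.isEmpty()) return "[no samples]";
        StringBuilder sb = new StringBuilder("[");
        // TreeMap sorts by quantile so output order is stable.
        for (Map.Entry<Double, Long> e : new TreeMap<>(quantiles).entrySet()) {
            if (sb.length() > 1) sb.append(", ");
            sb.append(Math.round(e.getKey() * 100)).append("th %ile = ")
              .append(e.getValue());
        }
        return sb.append("]").toString();
    }
}
```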
[jira] [Created] (HADOOP-8905) Add metrics for HTTP Server
Todd Lipcon created HADOOP-8905: --- Summary: Add metrics for HTTP Server Key: HADOOP-8905 URL: https://issues.apache.org/jira/browse/HADOOP-8905 Project: Hadoop Common Issue Type: Improvement Components: metrics Affects Versions: 3.0.0 Reporter: Todd Lipcon Currently we don't expose any metrics about the HTTP server. It would be useful to be able to monitor the following:
- Number of threads currently actively serving servlet requests
- Total number of requests served
- Perhaps break down time/count by endpoint (eg /jmx, /conf, various JSPs)
This becomes more important as http-based protocols like webhdfs become more common.
[jira] [Created] (HADOOP-8889) Upgrade to Surefire 2.12.3
Todd Lipcon created HADOOP-8889: --- Summary: Upgrade to Surefire 2.12.3 Key: HADOOP-8889 URL: https://issues.apache.org/jira/browse/HADOOP-8889 Project: Hadoop Common Issue Type: Improvement Components: build, test Affects Versions: 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hadoop-8889.txt Surefire 2.12.3 has a couple improvements which are helpful for us. In particular, it fixes http://jira.codehaus.org/browse/SUREFIRE-817 which has been aggravating in the past.
[jira] [Created] (HADOOP-8894) GenericTestUtils.waitFor should dump thread stacks on timeout
Todd Lipcon created HADOOP-8894: --- Summary: GenericTestUtils.waitFor should dump thread stacks on timeout Key: HADOOP-8894 URL: https://issues.apache.org/jira/browse/HADOOP-8894 Project: Hadoop Common Issue Type: Improvement Reporter: Todd Lipcon Assignee: Todd Lipcon Many tests use this utility to wait for a condition to become true. In the event that it times out, we should dump all the thread stack traces, in case the timeout was due to a deadlock. This should make it easier to debug scenarios like HDFS-4001.
[jira] [Created] (HADOOP-8855) SSL-based image transfer does not work when Kerberos is disabled
Todd Lipcon created HADOOP-8855: --- Summary: SSL-based image transfer does not work when Kerberos is disabled Key: HADOOP-8855 URL: https://issues.apache.org/jira/browse/HADOOP-8855 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 3.0.0, 2.0.2-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor In SecurityUtil.openSecureHttpConnection, we first check {{UserGroupInformation.isSecurityEnabled()}}. However, this only checks the kerberos config, which is independent of {{hadoop.ssl.enabled}}. Instead, we should check {{HttpConfig.isSecure()}}. Credit to Wing Yew Poon for discovering this bug.
[jira] [Created] (HADOOP-8786) HttpServer continues to start even if AuthenticationFilter fails to init
Todd Lipcon created HADOOP-8786: --- Summary: HttpServer continues to start even if AuthenticationFilter fails to init Key: HADOOP-8786 URL: https://issues.apache.org/jira/browse/HADOOP-8786 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.1-alpha, 1.2.0, 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon As seen in HDFS-3904, if the AuthenticationFilter fails to initialize, the web server will continue to start up. We need to check for context initialization errors after starting the server.
[jira] [Created] (HADOOP-8757) Metrics should disallow names with invalid characters
Todd Lipcon created HADOOP-8757: --- Summary: Metrics should disallow names with invalid characters Key: HADOOP-8757 URL: https://issues.apache.org/jira/browse/HADOOP-8757 Project: Hadoop Common Issue Type: Improvement Components: metrics Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Todd Lipcon Priority: Minor Just spent a couple hours trying to figure out why a metric I added didn't show up in JMX, only to eventually realize it was because I had a whitespace in the property name. This didn't cause any errors to be logged -- the metric just didn't show up in JMX. We should check that the name is valid and log an error, or replace invalid characters with something like an underscore.
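Both proposed behaviors (validate-and-log, or sanitize) can be sketched with a simple character whitelist. The allowed character set shown here is an assumption, not what Hadoop's metrics system actually enforces:

```java
import java.util.regex.Pattern;

/** Sketch: validate metric names, or sanitize them with underscores. */
class MetricNames {
    // Assumed whitelist: letters, digits, underscore, hyphen.
    private static final Pattern VALID = Pattern.compile("[A-Za-z0-9_-]+");

    public static boolean isValid(String name) {
        return VALID.matcher(name).matches();
    }

    /** Replace every character outside the whitelist with an underscore. */
    public static String sanitize(String name) {
        return name.replaceAll("[^A-Za-z0-9_-]", "_");
    }
}
```

With a check like this, the whitespace in the property name described above would either be rejected with a logged error or silently mapped to an underscore, and the metric would still appear in JMX.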
[jira] [Resolved] (HADOOP-8031) Configuration class fails to find embedded .jar resources; should use URL.openStream()
[ https://issues.apache.org/jira/browse/HADOOP-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-8031. - Resolution: Fixed Re-resolving this since Ahmed is addressing my issue in HADOOP-8749 Configuration class fails to find embedded .jar resources; should use URL.openStream() -- Key: HADOOP-8031 URL: https://issues.apache.org/jira/browse/HADOOP-8031 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 2.0.0-alpha Reporter: Elias Ross Assignee: Elias Ross Fix For: 2.2.0-alpha Attachments: 0001-fix-HADOOP-7982-class-loader.patch, HADOOP-8031-part2.patch, HADOOP-8031.patch, hadoop-8031.txt While running a hadoop client within RHQ (monitoring software) using its classloader, I see this: 2012-02-07 09:15:25,313 INFO [ResourceContainer.invoker.daemon-2] (org.apache.hadoop.conf.Configuration)- parsing jar:file:/usr/local/rhq-agent/data/tmp/rhq-hadoop-plugin-4.3.0-SNAPSHOT.jar6856622641102893436.classloader/hadoop-core-0.20.2+737+1.jar7204287718482036191.tmp!/core-default.xml 2012-02-07 09:15:25,318 ERROR [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Failed to start component for Resource[id=16290, type=NameNode, key=NameNode:/usr/lib/hadoop-0.20, name=NameNode, parent=vg61l01ad-hadoop002.apple.com] from synchronized merge. org.rhq.core.clientapi.agent.PluginContainerException: Failed to start component for resource Resource[id=16290, type=NameNode, key=NameNode:/usr/lib/hadoop-0.20, name=NameNode, parent=vg61l01ad-hadoop002.apple.com]. 
Caused by: java.lang.RuntimeException: core-site.xml not found at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1308) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1228) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1169) at org.apache.hadoop.conf.Configuration.set(Configuration.java:438) This is because the URL jar:file:/usr/local/rhq-agent/data/tmp/rhq-hadoop-plugin-4.3.0-SNAPSHOT.jar6856622641102893436.classloader/hadoop-core-0.20.2+737+1.jar7204287718482036191.tmp!/core-default.xml cannot be found by DocumentBuilder (doesn't understand it). (Note: the logs are for an old version of the Configuration class, but the new version has the same code.) The solution is to obtain the resource stream directly from the URL object itself. That is to say: {code}
 URL url = getResource((String)name);
-if (url != null) {
-  if (!quiet) {
-    LOG.info("parsing " + url);
-  }
-  doc = builder.parse(url.toString());
-}
+doc = builder.parse(url.openStream());
{code} Note: I have a full patch pending approval at Apple for this change, including some cleanup.
[jira] [Reopened] (HADOOP-8031) Configuration class fails to find embedded .jar resources; should use URL.openStream()
[ https://issues.apache.org/jira/browse/HADOOP-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reopened HADOOP-8031: - I confirmed that reverting this patch locally restored the old behavior. If we can't maintain the old behavior, we should at least mark this as an incompatible change. But I bet it's doable to both fix it and have relative xincludes. Configuration class fails to find embedded .jar resources; should use URL.openStream() -- Key: HADOOP-8031 URL: https://issues.apache.org/jira/browse/HADOOP-8031 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 2.0.0-alpha Reporter: Elias Ross Assignee: Elias Ross Fix For: 2.2.0-alpha Attachments: 0001-fix-HADOOP-7982-class-loader.patch, HADOOP-8031.patch, hadoop-8031.txt
[jira] [Created] (HADOOP-8624) ProtobufRpcEngine should log all RPCs if TRACE logging is enabled
Todd Lipcon created HADOOP-8624: --- Summary: ProtobufRpcEngine should log all RPCs if TRACE logging is enabled Key: HADOOP-8624 URL: https://issues.apache.org/jira/browse/HADOOP-8624 Project: Hadoop Common Issue Type: Improvement Components: ipc Affects Versions: 3.0.0, 2.2.0-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Since all RPC requests/responses are now ProtoBufs, it's easy to add a TRACE level logging output for ProtobufRpcEngine that actually shows the full content of all calls. This is very handy especially when writing/debugging unit tests, but might also be useful to enable at runtime for short periods of time to debug certain production issues.
[jira] [Created] (HADOOP-8608) Add Configuration API for parsing time durations
Todd Lipcon created HADOOP-8608: --- Summary: Add Configuration API for parsing time durations Key: HADOOP-8608 URL: https://issues.apache.org/jira/browse/HADOOP-8608 Project: Hadoop Common Issue Type: Improvement Components: conf Affects Versions: 3.0.0 Reporter: Todd Lipcon Hadoop has a lot of configurations which specify durations or intervals of time. Unfortunately these different configurations have little consistency in units - eg some are in milliseconds, some in seconds, and some in minutes. This makes it difficult for users to configure, since they have to always refer back to docs to remember the unit for each property. The proposed solution is to add an API like {{Configuration.getTimeDuration}} which allows the user to specify the units with a suffix. For example, 10ms, 10s, 10m, 10h, or even 10d. For backwards-compatibility, if the user does not specify a unit, the API can specify the default unit, and warn the user that they should specify an explicit unit instead.
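The proposed API can be sketched as follows. This is an illustration of the idea (parse a unit suffix, fall back to a caller-supplied default unit for bare numbers), not the implementation Hadoop later committed:

```java
import java.util.concurrent.TimeUnit;

/** Sketch of Configuration.getTimeDuration-style parsing. */
class TimeDurations {
    /**
     * Parse a value like "10ms", "10s", "10m", "10h" or "10d" into the
     * requested unit; a bare number is interpreted in defaultUnit.
     */
    public static long parse(String value, TimeUnit defaultUnit, TimeUnit wanted) {
        String v = value.trim();
        TimeUnit unit = defaultUnit;
        int cut = v.length();
        if (v.endsWith("ms"))     { unit = TimeUnit.MILLISECONDS; cut -= 2; }
        else if (v.endsWith("s")) { unit = TimeUnit.SECONDS;      cut -= 1; }
        else if (v.endsWith("m")) { unit = TimeUnit.MINUTES;      cut -= 1; }
        else if (v.endsWith("h")) { unit = TimeUnit.HOURS;        cut -= 1; }
        else if (v.endsWith("d")) { unit = TimeUnit.DAYS;         cut -= 1; }
        long raw = Long.parseLong(v.substring(0, cut));
        return wanted.convert(raw, unit);
    }
}
```

The backwards-compatibility warning would fire on the bare-number branch, where defaultUnit is used.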
[jira] [Resolved] (HADOOP-8557) Core Test failed in jenkins for patch pre-commit
[ https://issues.apache.org/jira/browse/HADOOP-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-8557. - Resolution: Duplicate Resolving as dup of HADOOP-8537 Core Test failed in jenkins for patch pre-commit Key: HADOOP-8557 URL: https://issues.apache.org/jira/browse/HADOOP-8557 Project: Hadoop Common Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: Junping Du Priority: Blocker In the jenkins PreCommit build history (https://builds.apache.org/job/PreCommit-HADOOP-Build/), the following tests failed for all recent patches (builds 1164, 1166, 1168, 1170): org.apache.hadoop.ha.TestZKFailoverController.testGracefulFailover org.apache.hadoop.ha.TestZKFailoverController.testOneOfEverything org.apache.hadoop.io.file.tfile.TestTFileByteArrays.testOneBlock org.apache.hadoop.io.file.tfile.TestTFileByteArrays.testOneBlockPlusOneEntry org.apache.hadoop.io.file.tfile.TestTFileByteArrays.testThreeBlocks org.apache.hadoop.io.file.tfile.TestTFileJClassComparatorByteArrays.testOneBlock org.apache.hadoop.io.file.tfile.TestTFileJClassComparatorByteArrays.testOneBlockPlusOneEntry org.apache.hadoop.io.file.tfile.TestTFileJClassComparatorByteArrays.testThreeBlocks
[jira] [Resolved] (HADOOP-8423) MapFile.Reader.get() crashes jvm or throws EOFException on Snappy or LZO block-compressed data
[ https://issues.apache.org/jira/browse/HADOOP-8423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-8423. - Resolution: Fixed Fix Version/s: 1.2.0 Committed to branch-1 for 1.2. Thanks for backporting, Harsh. MapFile.Reader.get() crashes jvm or throws EOFException on Snappy or LZO block-compressed data -- Key: HADOOP-8423 URL: https://issues.apache.org/jira/browse/HADOOP-8423 Project: Hadoop Common Issue Type: Bug Components: io Affects Versions: 0.20.2 Environment: Linux 2.6.32.23-0.3-default #1 SMP 2010-10-07 14:57:45 +0200 x86_64 x86_64 x86_64 GNU/Linux Reporter: Jason B Assignee: Todd Lipcon Fix For: 1.2.0, 2.1.0-alpha Attachments: HADOOP-8423-branch-1.patch, HADOOP-8423-branch-1.patch, MapFileCodecTest.java, hadoop-8423.txt I am using Cloudera distribution cdh3u1. When trying to check native codecs for better decompression performance such as Snappy or LZO, I ran into issues with random access using MapFile.Reader.get(key, value) method. First call of MapFile.Reader.get() works but a second call fails. Also I am getting different exceptions depending on number of entries in a map file. With LzoCodec and 10 record file, jvm gets aborted. At the same time the DefaultCodec works fine for all cases, as well as record compression for the native codecs. I created a simple test program (attached) that creates map files locally with sizes of 10 and 100 records for three codecs: Default, Snappy, and LZO. 
(The test requires the corresponding native library to be available.) The summary of problems is given below:
Map Size: 100, Compression: RECORD
DefaultCodec: OK
SnappyCodec: OK
LzoCodec: OK
Map Size: 10, Compression: RECORD
DefaultCodec: OK
SnappyCodec: OK
LzoCodec: OK
Map Size: 100, Compression: BLOCK
DefaultCodec: OK
SnappyCodec: java.io.EOFException at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:114)
LzoCodec: java.io.EOFException at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:114)
Map Size: 10, Compression: BLOCK
DefaultCodec: OK
SnappyCodec: java.lang.NoClassDefFoundError: Ljava/lang/InternalError at org.apache.hadoop.io.compress.snappy.SnappyDecompressor.decompressBytesDirect(Native Method)
LzoCodec: jvm abort:
# A fatal error has been detected by the Java Runtime Environment:
# SIGSEGV (0xb) at pc=0x2b068ffcbc00, pid=6385, tid=47304763508496
# JRE version: 6.0_21-b07
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0-b17 mixed mode linux-amd64 )
# Problematic frame:
# C [liblzo2.so.2+0x13c00] lzo1x_decompress+0x1a0
[jira] [Created] (HADOOP-8590) Backport HADOOP-7318 (MD5Hash factory should reset the digester it returns) to branch-1
Todd Lipcon created HADOOP-8590: --- Summary: Backport HADOOP-7318 (MD5Hash factory should reset the digester it returns) to branch-1 Key: HADOOP-8590 URL: https://issues.apache.org/jira/browse/HADOOP-8590 Project: Hadoop Common Issue Type: Bug Components: io Affects Versions: 1.0.3 Reporter: Todd Lipcon I ran into this bug on branch-1 today, it seems like we should backport it.
[jira] [Created] (HADOOP-8537) Two TFile tests failing recently
Todd Lipcon created HADOOP-8537: --- Summary: Two TFile tests failing recently Key: HADOOP-8537 URL: https://issues.apache.org/jira/browse/HADOOP-8537 Project: Hadoop Common Issue Type: Bug Components: io Affects Versions: 3.0.0 Reporter: Todd Lipcon TestTFileJClassComparatorByteArrays and TestTFileByteArrays are failing in some recent patch builds (seems to have started in the middle of May). These tests previously failed in HADOOP-7111 - perhaps something regressed there?
[jira] [Resolved] (HADOOP-8529) Error while formatting the namenode in hadoop single node setup in windows
[ https://issues.apache.org/jira/browse/HADOOP-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-8529. - Resolution: Invalid Please inquire with the user mailing lists for questions like this. JIRA is meant for bug/task tracking. Error while formatting the namenode in hadoop single node setup in windows -- Key: HADOOP-8529 URL: https://issues.apache.org/jira/browse/HADOOP-8529 Project: Hadoop Common Issue Type: Task Components: conf Affects Versions: 1.0.3 Environment: Windows XP using Cygwin Reporter: Narayana Karteek Priority: Blocker Attachments: capture8.bmp Original Estimate: 5h Remaining Estimate: 5h Hi, I tried to configure hadoop 1.0.3. I added all libs from the share folder to the lib directory, but I still get the error while formatting the namenode: $ ./hadoop namenode -format java.lang.NoClassDefFoundError: Caused by: java.lang.ClassNotFoundException: at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) . Program will exit.in class: Exception in thread main
[jira] [Created] (HADOOP-8497) Shell needs a way to list amount of physical consumed space in a directory
Todd Lipcon created HADOOP-8497: --- Summary: Shell needs a way to list amount of physical consumed space in a directory Key: HADOOP-8497 URL: https://issues.apache.org/jira/browse/HADOOP-8497 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 2.0.0-alpha, 1.0.3, 3.0.0 Reporter: Todd Lipcon Assignee: Andy Isaacson Currently, there is no way to see the physical consumed space for a directory. du lists the logical (pre-replication) space, and fs -count only displays the consumed space when a quota is set. This makes it hard for administrators to set a quota on a directory, since they have no way to determine a reasonable value.
[jira] [Created] (HADOOP-8410) SPNEGO filter should have better error messages when not fully configured
Todd Lipcon created HADOOP-8410: --- Summary: SPNEGO filter should have better error messages when not fully configured Key: HADOOP-8410 URL: https://issues.apache.org/jira/browse/HADOOP-8410 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.0.0 Reporter: Todd Lipcon Priority: Minor I upgraded to a build which includes SPNEGO, but neglected to configure dfs.web.authentication.kerberos.principal. This resulted in the following error: 12/05/18 14:46:20 INFO server.KerberosAuthenticationHandler: Login using keytab //home/todd/confs/conf.pseudo.security//hdfs.keytab, for principal ${dfs.web.authentication.kerberos.principal} 12/05/18 14:46:20 WARN mortbay.log: failed SpnegoFilter: javax.servlet.ServletException: javax.security.auth.login.LoginException: Unable to obtain password from user Instead, it should give an error that the principal needs to be configured. Even better would be if we could default to HTTP/_HOST@default realm.
[jira] [Created] (HADOOP-8404) etc/hadoop in binary tarball missing hadoop-env.sh
Todd Lipcon created HADOOP-8404: --- Summary: etc/hadoop in binary tarball missing hadoop-env.sh Key: HADOOP-8404 URL: https://issues.apache.org/jira/browse/HADOOP-8404 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.0 Reporter: Todd Lipcon todd@todd-w510:~/releases/hadoop-2.0.0-alpha$ ls etc/hadoop/ core-site.xml hdfs-site.xml httpfs-signature.secret slaves yarn-env.sh hadoop-metrics2.properties httpfs-env.sh httpfs-site.xml ssl-client.xml.example yarn-site.xml hadoop-metrics.properties httpfs-log4j.properties log4j.properties ssl-server.xml.example
[jira] [Created] (HADOOP-8405) ZKFC tests leak ZK instances
Todd Lipcon created HADOOP-8405: --- Summary: ZKFC tests leak ZK instances Key: HADOOP-8405 URL: https://issues.apache.org/jira/browse/HADOOP-8405 Project: Hadoop Common Issue Type: Bug Components: auto-failover, test Affects Versions: Auto Failover (HDFS-3042) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: hadoop-8405.txt The ZKFC code wasn't previously terminating the ZK connection in all cases where it should (eg after a failed startup or after formatting ZK). This didn't cause a problem for CLI usage, since the process exited afterwards, but caused the test results to get clouded with a lot of "Reconnecting to ZK" messages, which make the logs hard to read.
[jira] [Resolved] (HADOOP-8405) ZKFC tests leak ZK instances
[ https://issues.apache.org/jira/browse/HADOOP-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-8405. - Resolution: Fixed Fix Version/s: Auto Failover (HDFS-3042) Hadoop Flags: Reviewed Committed to branch, thanks Eli. ZKFC tests leak ZK instances Key: HADOOP-8405 URL: https://issues.apache.org/jira/browse/HADOOP-8405 Project: Hadoop Common Issue Type: Bug Components: auto-failover, test Affects Versions: Auto Failover (HDFS-3042) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: Auto Failover (HDFS-3042) Attachments: hadoop-8405.txt The ZKFC code wasn't previously terminating the ZK connection in all cases where it should (eg after a failed startup or after formatting ZK). This didn't cause a problem for CLI usage, since the process exited afterwards, but caused the test results to get clouded with a lot of "Reconnecting to ZK" messages, which make the logs hard to read.
[jira] [Created] (HADOOP-8406) CompressionCodecFactory.CODEC_PROVIDERS iteration is thread-unsafe
Todd Lipcon created HADOOP-8406: --- Summary: CompressionCodecFactory.CODEC_PROVIDERS iteration is thread-unsafe Key: HADOOP-8406 URL: https://issues.apache.org/jira/browse/HADOOP-8406 Project: Hadoop Common Issue Type: Bug Components: io Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon CompressionCodecFactory defines CODEC_PROVIDERS as: {code} private static final ServiceLoader<CompressionCodec> CODEC_PROVIDERS = ServiceLoader.load(CompressionCodec.class); {code} but this is a lazy collection which is thread-unsafe to iterate. We either need to synchronize when we iterate over it, or we need to materialize it during class-loading time by copying to a non-lazy collection.
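The materialize-at-class-load option can be sketched as follows. This is an illustrative sketch, not the actual Hadoop fix; the nested CompressionCodec interface is a hypothetical stand-in so the example is self-contained. ServiceLoader's iterator lazily instantiates providers, so sharing one loader across threads is unsafe; copying into an unmodifiable List once, in a static initializer, makes all later iteration thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.ServiceLoader;

public class CodecProviders {
    // Hypothetical stand-in for org.apache.hadoop.io.compress.CompressionCodec.
    public interface CompressionCodec {}

    private static final List<CompressionCodec> CODEC_PROVIDERS;
    static {
        // Drain the lazy ServiceLoader exactly once, during class loading,
        // so no thread ever iterates the loader itself.
        List<CompressionCodec> codecs = new ArrayList<>();
        for (CompressionCodec codec : ServiceLoader.load(CompressionCodec.class)) {
            codecs.add(codec);
        }
        CODEC_PROVIDERS = Collections.unmodifiableList(codecs);
    }

    public static List<CompressionCodec> getProviders() {
        return CODEC_PROVIDERS;
    }

    public static void main(String[] args) {
        // No providers are registered in this standalone example,
        // so the materialized list is simply empty.
        System.out.println(CODEC_PROVIDERS.size());
    }
}
```

The trade-off versus synchronizing each iteration is that provider classes are instantiated eagerly, but iteration afterwards needs no locking at all.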
[jira] [Resolved] (HADOOP-8276) Auto-HA: add config for java options to pass to zkfc daemon
[ https://issues.apache.org/jira/browse/HADOOP-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-8276. - Resolution: Fixed Fix Version/s: Auto Failover (HDFS-3042) Hadoop Flags: Reviewed Auto-HA: add config for java options to pass to zkfc daemon --- Key: HADOOP-8276 URL: https://issues.apache.org/jira/browse/HADOOP-8276 Project: Hadoop Common Issue Type: Improvement Components: scripts Affects Versions: Auto Failover (HDFS-3042) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: Auto Failover (HDFS-3042) Attachments: hadoop-8276.txt Currently the zkfc daemon is started without any ability to specify java options for it. We should add a flag so heap size, etc. can be specified.
[jira] [Created] (HADOOP-8397) NPE thrown when IPC layer gets an EOF reading a response
Todd Lipcon created HADOOP-8397: --- Summary: NPE thrown when IPC layer gets an EOF reading a response Key: HADOOP-8397 URL: https://issues.apache.org/jira/browse/HADOOP-8397 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 2.0.0 Reporter: Todd Lipcon Priority: Critical When making a call on an IPC connection where the other end has shut down, I see the following exception: Caused by: java.lang.NullPointerException at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:852) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:781) from the lines: {code} RpcResponseHeaderProto response = RpcResponseHeaderProto.parseDelimitedFrom(in); int callId = response.getCallId(); {code} This is because parseDelimitedFrom() returns null in the case that the next thing to be read on the stream is an EOF.
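A sketch of the obvious fix: check for the null sentinel and convert it into an EOFException before dereferencing the response. The parser below is a hypothetical stand-in for protobuf's parseDelimitedFrom(), which likewise returns null (rather than throwing) when the stream is already at EOF; the class and method names are illustrative, not the actual Client.java code.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class EofGuard {
    // Stand-in for Message.parseDelimitedFrom(in): returns the next byte
    // as an Integer, or null when the stream is already at EOF.
    static Integer parseDelimitedFrom(InputStream in) throws IOException {
        int b = in.read();
        return (b == -1) ? null : b;
    }

    // The fix: translate the null into an EOFException so callers see a
    // meaningful error instead of an NPE on response.getCallId().
    static int readCallId(InputStream in) throws IOException {
        Integer response = parseDelimitedFrom(in);
        if (response == null) {
            throw new EOFException("Connection closed while reading response");
        }
        return response;
    }

    public static void main(String[] args) throws IOException {
        try {
            readCallId(new ByteArrayInputStream(new byte[0]));
        } catch (EOFException e) {
            System.out.println("EOF detected: " + e.getMessage());
        }
    }
}
```

An EOFException is an IOException, so the existing connection-cleanup paths that catch IOE handle it without further changes.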
[jira] [Created] (HADOOP-8362) Improve exception message when Configuration.set() is called with a null key or value
Todd Lipcon created HADOOP-8362: --- Summary: Improve exception message when Configuration.set() is called with a null key or value Key: HADOOP-8362 URL: https://issues.apache.org/jira/browse/HADOOP-8362 Project: Hadoop Common Issue Type: Improvement Components: conf Affects Versions: 2.0.0 Reporter: Todd Lipcon Priority: Trivial Currently, calling Configuration.set(...) with a null value results in a NullPointerException within Properties.setProperty. We should check for null key/value and throw a better exception.
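A minimal sketch of the suggested check. The class name, messages, and structure here are hypothetical stand-ins; the real Configuration.set() carries additional logic (deprecation handling, overlays, etc.). The point is simply to validate both arguments before Properties.setProperty can throw a bare NullPointerException.

```java
import java.util.Properties;

// Illustrative stand-in for org.apache.hadoop.conf.Configuration.
public class Conf {
    private final Properties props = new Properties();

    public void set(String name, String value) {
        // Fail fast with an informative message instead of letting
        // Properties.setProperty throw an anonymous NPE.
        if (name == null) {
            throw new IllegalArgumentException("Property name must not be null");
        }
        if (value == null) {
            throw new IllegalArgumentException(
                "The value of property " + name + " must not be null");
        }
        props.setProperty(name, value);
    }

    public String get(String name) {
        return props.getProperty(name);
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        conf.set("fs.defaultFS", "hdfs://nn:8020");
        try {
            conf.set("some.key", null);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```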
[jira] [Created] (HADOOP-8344) Improve test-patch to make it easier to find javadoc warnings
Todd Lipcon created HADOOP-8344: --- Summary: Improve test-patch to make it easier to find javadoc warnings Key: HADOOP-8344 URL: https://issues.apache.org/jira/browse/HADOOP-8344 Project: Hadoop Common Issue Type: Improvement Components: build, test Reporter: Todd Lipcon Priority: Minor Often I have to spend a lot of time digging through logs to find javadoc warnings as the result of a test-patch. Similar to the improvement made in HADOOP-8339, we should do the following: - test-patch should only run javadoc on modules that have changed - the OK_JAVADOC exclusions should be per-project rather than cross-project - rather than just have a number, we should check in the actual list of warnings to ignore and then fuzzy-match the patch warnings against the exclude list.
[jira] [Resolved] (HADOOP-8279) Auto-HA: Allow manual failover to be invoked from zkfc.
[ https://issues.apache.org/jira/browse/HADOOP-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-8279. - Resolution: Fixed Hadoop Flags: Reviewed Committed to branch, thanks Aaron. Auto-HA: Allow manual failover to be invoked from zkfc. --- Key: HADOOP-8279 URL: https://issues.apache.org/jira/browse/HADOOP-8279 Project: Hadoop Common Issue Type: Improvement Components: auto-failover, ha Affects Versions: Auto Failover (HDFS-3042) Reporter: Mingjie Lai Assignee: Todd Lipcon Fix For: Auto Failover (HDFS-3042) Attachments: hadoop-8279.txt, hadoop-8279.txt, hadoop-8279.txt, hadoop-8279.txt, hadoop-8279.txt HADOOP-8247 introduces a configure flag to prevent potential status inconsistency between zkfc and namenode, by making auto and manual failover mutually exclusive. However, as described in section 2.7.2 of the design doc at HDFS-2185, we should allow manual and auto failover to co-exist, by: - adding some rpc interfaces at zkfc - manual failover shall be triggered by haadmin, and handled by zkfc if auto failover is enabled.
[jira] [Created] (HADOOP-8340) SNAPSHOT build versions should compare as less than their eventual final release
Todd Lipcon created HADOOP-8340: --- Summary: SNAPSHOT build versions should compare as less than their eventual final release Key: HADOOP-8340 URL: https://issues.apache.org/jira/browse/HADOOP-8340 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor We recently added a utility function to compare two version strings, based on splitting on '.'s and comparing each component. However, it considers a version like 2.0.0-SNAPSHOT as being greater than 2.0.0. This isn't right, since SNAPSHOT builds come before the final release.
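The desired ordering can be sketched as below. This is an illustrative implementation of the behavior the report asks for, not the actual Hadoop utility: split on '.', compare components numerically, and break ties so that a -SNAPSHOT build of X.Y.Z sorts before the final X.Y.Z release.

```java
public class VersionCompare {
    private static final String SNAPSHOT = "-SNAPSHOT";

    private static String strip(String v) {
        return v.endsWith(SNAPSHOT)
            ? v.substring(0, v.length() - SNAPSHOT.length())
            : v;
    }

    // Returns <0, 0, or >0 like Comparable.compareTo.
    public static int compareVersions(String a, String b) {
        String[] as = strip(a).split("\\.");
        String[] bs = strip(b).split("\\.");
        for (int i = 0; i < Math.max(as.length, bs.length); i++) {
            // Missing components compare as 0, so "2.0" == "2.0.0".
            int ai = i < as.length ? Integer.parseInt(as[i]) : 0;
            int bi = i < bs.length ? Integer.parseInt(bs[i]) : 0;
            if (ai != bi) {
                return Integer.compare(ai, bi);
            }
        }
        // Numeric components are equal: a SNAPSHOT sorts before the release.
        return Boolean.compare(b.endsWith(SNAPSHOT), a.endsWith(SNAPSHOT));
    }

    public static void main(String[] args) {
        System.out.println(compareVersions("2.0.0-SNAPSHOT", "2.0.0"));  // negative
    }
}
```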
[jira] [Created] (HADOOP-8315) Support SASL-authenticated ZooKeeper in ActiveStandbyElector
Todd Lipcon created HADOOP-8315: --- Summary: Support SASL-authenticated ZooKeeper in ActiveStandbyElector Key: HADOOP-8315 URL: https://issues.apache.org/jira/browse/HADOOP-8315 Project: Hadoop Common Issue Type: Improvement Components: auto-failover, ha Affects Versions: Auto Failover (HDFS-3042) Reporter: Todd Lipcon Currently, if you try to use SASL-authenticated ZK with the ActiveStandbyElector, you run into a couple of issues: 1) We hit ZOOKEEPER-1437 - we need to wait until we see SaslAuthenticated before we can make any requests 2) We currently throw a fatalError when we see the SaslAuthenticated callback on the connection watcher We need to wait for ZK-1437 upstream, and then upgrade to the fixed version for #1. For #2 we just need to add a case there and ignore it.
[jira] [Resolved] (HADOOP-8306) ZKFC: improve error message when ZK is not running
[ https://issues.apache.org/jira/browse/HADOOP-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-8306. - Resolution: Fixed Fix Version/s: Auto Failover (HDFS-3042) Hadoop Flags: Reviewed Committed to branch, thanks Eli ZKFC: improve error message when ZK is not running -- Key: HADOOP-8306 URL: https://issues.apache.org/jira/browse/HADOOP-8306 Project: Hadoop Common Issue Type: Improvement Components: auto-failover, ha Affects Versions: Auto Failover (HDFS-3042) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: Auto Failover (HDFS-3042) Attachments: hadoop-8306.txt Currently if you start the ZKFC without starting ZK, you get an ugly stack trace. We should improve the error message and give it a unique exit code.
[jira] [Created] (HADOOP-8306) ZKFC: improve error message when ZK is not running
Todd Lipcon created HADOOP-8306: --- Summary: ZKFC: improve error message when ZK is not running Key: HADOOP-8306 URL: https://issues.apache.org/jira/browse/HADOOP-8306 Project: Hadoop Common Issue Type: Improvement Components: auto-failover, ha Affects Versions: Auto Failover (HDFS-3042) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Currently if you start the ZKFC without starting ZK, you get an ugly stack trace. We should improve the error message and give it a unique exit code.
[jira] [Created] (HADOOP-7632) NPE in copyToLocal
NPE in copyToLocal -- Key: HADOOP-7632 URL: https://issues.apache.org/jira/browse/HADOOP-7632 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 0.23.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.23.0
[todd@c0309 hadoop-trunk-home]$ ./bin/hadoop fs -copyToLocal /hbase/.META./1028785192/ /tmp/meta/
copyToLocal: Fatal internal error
java.lang.NullPointerException
 at org.apache.hadoop.fs.shell.PathData.getPathDataForChild(PathData.java:182)
 at org.apache.hadoop.fs.shell.CommandWithDestination.processPaths(CommandWithDestination.java:115)
 at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:329)
 at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:302)
 at org.apache.hadoop.fs.shell.CommandWithDestination.processPaths(CommandWithDestination.java:116)
 at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:272)
 at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:255)
 at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:239)
 at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:105)
 at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:185)
 at org.apache.hadoop.fs.shell.Command.run(Command.java:149)
[jira] [Created] (HADOOP-7627) Improve MetricsAsserts to give more understandable output on failure
Improve MetricsAsserts to give more understandable output on failure Key: HADOOP-7627 URL: https://issues.apache.org/jira/browse/HADOOP-7627 Project: Hadoop Common Issue Type: Improvement Components: metrics, test Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor In developing a test case that uses MetricsAsserts, I had two issues: 1) the error output in the case that an assertion failed does not currently give any information as to the _actual_ value of the metric 2) there is no way to retrieve the metric variable (eg to assert that the sum of a metric over all DNs is equal to some value) This JIRA is to improve this test class to fix the above issues.
[jira] [Created] (HADOOP-7613) invalid common pom breaking HDFS build
invalid common pom breaking HDFS build -- Key: HADOOP-7613 URL: https://issues.apache.org/jira/browse/HADOOP-7613 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 0.24.0 Reporter: Todd Lipcon Priority: Blocker Fix For: 0.24.0
[WARNING] The POM for org.apache.hadoop:hadoop-common:jar:0.24.0-20110905.195220-41 is invalid, transitive dependencies (if any) will not be available: 5 problems were encountered while building the effective model for org.apache.hadoop:hadoop-common:0.24.0-SNAPSHOT
[ERROR] 'dependencies.dependency.version' for asm:asm:jar is missing. @
[ERROR] 'dependencies.dependency.version' for com.sun.jersey:jersey-core:jar is missing. @
[ERROR] 'dependencies.dependency.version' for com.sun.jersey:jersey-json:jar is missing. @
[ERROR] 'dependencies.dependency.version' for com.sun.jersey:jersey-server:jar is missing. @
[ERROR] 'dependencies.dependency.version' for org.apache.hadoop:hadoop-auth:jar is missing. @
[jira] [Resolved] (HADOOP-7559) Broken Link On Web Access Source View
[ https://issues.apache.org/jira/browse/HADOOP-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-7559. - Resolution: Fixed Fixed. Should propagate soon. Thanks for the watchful eye, Nong :) Broken Link On Web Access Source View - Key: HADOOP-7559 URL: https://issues.apache.org/jira/browse/HADOOP-7559 Project: Hadoop Common Issue Type: Bug Reporter: Nong Li Assignee: Todd Lipcon Priority: Minor Labels: website From this page, http://hadoop.apache.org/hdfs/version_control.html, the link to the Web Access (readonly), http://svn.apache.org/viewvc/hadoop/common/trunk/hdfs/, is broken.
[jira] [Created] (HADOOP-7587) hadoop-auth module has 4 findbugs warnings
hadoop-auth module has 4 findbugs warnings -- Key: HADOOP-7587 URL: https://issues.apache.org/jira/browse/HADOOP-7587 Project: Hadoop Common Issue Type: Bug Components: build, security Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Alejandro Abdelnur Priority: Critical Precommit builds are all assigning a findbugs -1 due to the following four findbugs warnings: Found 4 Findbugs warnings (/home/jenkins/jenkins-slave/workspace/PreCommit-HADOOP-Build/trunk/hadoop-common-project/hadoop-auth/target/findbugsXml.xml) Due to some issues with the current hudson setup, I'm not sure what the warnings are, but it's clear that's where they're coming from.
[jira] [Created] (HADOOP-7584) Truncate stack traces coming out of RPC
Truncate stack traces coming out of RPC --- Key: HADOOP-7584 URL: https://issues.apache.org/jira/browse/HADOOP-7584 Project: Hadoop Common Issue Type: Improvement Reporter: Todd Lipcon Currently stack traces logged and sent back as part of RPC responses have a lot of cruft on them from the internals of the IPC stack. These aren't particularly useful for users or developers - it would be nicer to truncate these portions of the trace.
[jira] [Reopened] (HADOOP-7559) Broken Link On Web Access Source View
[ https://issues.apache.org/jira/browse/HADOOP-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reopened HADOOP-7559: - Codebase is somewhat migratory this week... reopening Broken Link On Web Access Source View - Key: HADOOP-7559 URL: https://issues.apache.org/jira/browse/HADOOP-7559 Project: Hadoop Common Issue Type: Bug Reporter: Nong Li Assignee: Todd Lipcon Priority: Minor Labels: website From this page, http://hadoop.apache.org/hdfs/version_control.html, the link to the Web Access (readonly), http://svn.apache.org/viewvc/hadoop/common/trunk/hdfs/, is broken.
[jira] [Created] (HADOOP-7562) Update asf-mailer.conf for new trees
Update asf-mailer.conf for new trees Key: HADOOP-7562 URL: https://issues.apache.org/jira/browse/HADOOP-7562 Project: Hadoop Common Issue Type: Bug Reporter: Todd Lipcon Priority: Critical Since the mavenization, we have renamed various directories inside hadoop/trunk. The asf-mailer.conf file in the infrastructure SVN needs to be updated for these new paths in order to fix the commit mails and the triggering of the git mirror.
[jira] [Resolved] (HADOOP-7559) Broken Link On Web Access Source View
[ https://issues.apache.org/jira/browse/HADOOP-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-7559. - Resolution: Fixed Thanks for reporting the broken link. I just updated the page to point to http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-hdfs/ - it should propagate within an hour or so. Broken Link On Web Access Source View - Key: HADOOP-7559 URL: https://issues.apache.org/jira/browse/HADOOP-7559 Project: Hadoop Common Issue Type: Bug Reporter: Nong Li Assignee: Todd Lipcon Priority: Minor Labels: website From this page, http://hadoop.apache.org/hdfs/version_control.html, the link to the Web Access (readonly), http://svn.apache.org/viewvc/hadoop/common/trunk/hdfs/, is broken.
[jira] [Created] (HADOOP-7545) common -tests jar should not include properties and configs
common -tests jar should not include properties and configs --- Key: HADOOP-7545 URL: https://issues.apache.org/jira/browse/HADOOP-7545 Project: Hadoop Common Issue Type: Bug Components: build, test Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 0.23.0 This is the cause of HDFS-2242. The -tests jar generated from the common build should only include the test classes, and not the test resources.
[jira] [Created] (HADOOP-7529) Possible deadlock in metrics2
Possible deadlock in metrics2 - Key: HADOOP-7529 URL: https://issues.apache.org/jira/browse/HADOOP-7529 Project: Hadoop Common Issue Type: Bug Components: metrics Affects Versions: 0.23.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.23.0 Attachments: metrics-deadlock.png Lock cycle detected by jcarder between MetricsSystemImpl and DefaultMetricsSystem
[jira] [Created] (HADOOP-7487) DF should throw a more reasonable exception when mount cannot be determined
DF should throw a more reasonable exception when mount cannot be determined --- Key: HADOOP-7487 URL: https://issues.apache.org/jira/browse/HADOOP-7487 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 0.23.0 Reporter: Todd Lipcon Currently, when using the DF class to determine the mount corresponding to a given directory, it will throw the generic exception "Expecting a line not the end of stream" if it can't determine the mount (for example if the directory doesn't exist). This error message should be improved in several ways: # If the dir to check doesn't exist, we can see that before even execing df, and throw a better exception (or behave better by chopping path components until it exists) # Rather than parsing the lines out of df's stdout, collect the whole output, and then parse. So, if df returns a non-zero exit code, we can avoid trying to parse the empty result # If there's a success exit code, and we still can't parse it (eg incompatible OS), we should include the unparseable line in the exception message.
[jira] [Created] (HADOOP-7440) HttpServer.getParameterValues throws NPE for missing parameters
HttpServer.getParameterValues throws NPE for missing parameters --- Key: HADOOP-7440 URL: https://issues.apache.org/jira/browse/HADOOP-7440 Project: Hadoop Common Issue Type: Bug Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 If the requested parameter was not specified in the request, the raw request's getParameterValues function returns null. Thus, trying to access {{unquoteValue.length}} throws NPE.
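A sketch of the fix: guard against the servlet API's null return for a missing parameter before dereferencing it. The HttpServletRequest is simulated here with a Map so the example is self-contained, and the quoting helper is a hypothetical stand-in (trim), not the real HtmlQuoting logic.

```java
import java.util.HashMap;
import java.util.Map;

public class ParamGuard {
    // Stand-in for the wrapped request's getParameterValues: per the servlet
    // spec, a missing parameter yields null rather than an empty array.
    static String[] getParameterValues(Map<String, String[]> rawRequest,
                                       String name) {
        String[] unquoteValue = rawRequest.get(name);  // null if absent
        if (unquoteValue == null) {
            return null;  // propagate null instead of NPE-ing on .length
        }
        String[] result = new String[unquoteValue.length];
        for (int i = 0; i < unquoteValue.length; i++) {
            result[i] = unquoteValue[i].trim();  // stand-in for unquoting
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String[]> request = new HashMap<>();
        request.put("present", new String[]{" value "});
        System.out.println(getParameterValues(request, "missing"));    // null, no NPE
        System.out.println(getParameterValues(request, "present")[0]); // value
    }
}
```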
[jira] [Created] (HADOOP-7428) IPC connection is orphaned with null 'out' member
IPC connection is orphaned with null 'out' member - Key: HADOOP-7428 URL: https://issues.apache.org/jira/browse/HADOOP-7428 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon We had a situation in which a JT ended up in a state where a certain user could not submit a job, due to an NPE on the following line in {{sendParam}}: {code} synchronized (Connection.this.out) { {code} Looking at the code, my guess is that an RTE was thrown in setupIOstreams, which only catches IOE. This could leave the connection in a half-setup state which is never cleaned up and also cannot perform IPCs.
[jira] [Created] (HADOOP-7403) Fix SVN urls on site after unsplit
Fix SVN urls on site after unsplit -- Key: HADOOP-7403 URL: https://issues.apache.org/jira/browse/HADOOP-7403 Project: Hadoop Common Issue Type: Bug Affects Versions: site Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hadoop-7403.txt We need to update the version control URLs on the site
[jira] [Resolved] (HADOOP-7106) Re-organize hadoop subversion layout
[ https://issues.apache.org/jira/browse/HADOOP-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HADOOP-7106. - Resolution: Fixed Hadoop Flags: [Reviewed] I think all the pieces of this are complete now, so marking resolved. Thanks to the many people who contributed: Nigel, Owen, Doug, Ian, Jukka, etc. Re-organize hadoop subversion layout Key: HADOOP-7106 URL: https://issues.apache.org/jira/browse/HADOOP-7106 Project: Hadoop Common Issue Type: Improvement Components: build Reporter: Nigel Daley Assignee: Todd Lipcon Priority: Blocker Fix For: 0.22.0 Attachments: HADOOP-7106-auth.patch, HADOOP-7106-auth.patch, HADOOP-7106-auth.patch, HADOOP-7106-git.sh, HADOOP-7106-git.sh, HADOOP-7106.sh, HADOOP-7106.sh, HADOOP-7106.sh, HADOOP-7106.sh, HADOOP-7106.sh, HADOOP-7106.sh, HADOOP-7106.sh, HADOOP-7106.sh, gitk-example.png, mailer-conf.diff As discussed on general@ at http://tinyurl.com/4q6lhxm
[jira] [Created] (HADOOP-7376) DataTransfer Protocol using protobufs
DataTransfer Protocol using protobufs - Key: HADOOP-7376 URL: https://issues.apache.org/jira/browse/HADOOP-7376 Project: Hadoop Common Issue Type: Sub-task Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon We've been talking about this for a long time... would be nice to use something like protobufs or Thrift for some of our wire protocols. I knocked together a prototype of DataTransferProtocol on top of proto bufs that seems to work.
[jira] [Created] (HADOOP-7379) Add ability to include Protobufs in ObjectWritable
Add ability to include Protobufs in ObjectWritable -- Key: HADOOP-7379 URL: https://issues.apache.org/jira/browse/HADOOP-7379 Project: Hadoop Common Issue Type: Improvement Components: io, ipc Affects Versions: 0.23.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.23.0 Per HDFS-2060, it would make it easier to piecemeal switch to protocol buffer based data structures in the wire protocol if we could intermix the two. The IPC framework currently provides the concept of engines for RPC, but that doesn't easily allow mixed types within the same framework for ease of transition. I'd like to add the cases to ObjectWritable to handle subclasses of {{Message}}, the superclass of codegenned protobufs.
[jira] [Created] (HADOOP-7346) Send back nicer error to clients using outdated IPC version
Send back nicer error to clients using outdated IPC version --- Key: HADOOP-7346 URL: https://issues.apache.org/jira/browse/HADOOP-7346 Project: Hadoop Common Issue Type: Improvement Components: ipc Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.22.0 When an older Hadoop version tries to contact a newer Hadoop version across an IPC protocol version bump, the client currently just gets a non-useful error message like EOFException. Instead, the IPC server code can speak just enough of prior IPC protocols to send back a fatal message indicating the version mismatch.
[jira] [Created] (HADOOP-7335) Force entropy to come from non-true random for tests
Force entropy to come from non-true random for tests Key: HADOOP-7335 URL: https://issues.apache.org/jira/browse/HADOOP-7335 Project: Hadoop Common Issue Type: Improvement Components: build, test Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Passing the system property {{-Djava.security.egd=file:///dev/urandom}} forces the JVM to seed its PRNG from non-true random (/dev/urandom) instead of the true random (/dev/random). This makes the tests run faster, since without it they often hang waiting for entropy while Jetty is initializing. We should turn this on for the test targets by default, so developers/hudson boxes don't have to make this change system-wide or use workarounds like rngtools.
[jira] [Created] (HADOOP-7332) Deadlock in IPC
Deadlock in IPC --- Key: HADOOP-7332 URL: https://issues.apache.org/jira/browse/HADOOP-7332 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 0.22.0 Reporter: Todd Lipcon Fix For: 0.22.0 Saw this during a run of TestIPC on 0.22 branch:
[junit] Java stack information for the threads listed above:
[junit] ===
[junit] IPC Client (47) connection to /0:0:0:0:0:0:0:0:48853 from an unknown user:
[junit] at org.apache.hadoop.ipc.Client$ParallelResults.callComplete(Client.java:879)
[junit] - waiting to lock 0xf599ef88 (a org.apache.hadoop.ipc.Client$ParallelResults)
[junit] at org.apache.hadoop.ipc.Client$ParallelCall.callComplete(Client.java:862)
[junit] at org.apache.hadoop.ipc.Client$Call.setException(Client.java:185)
[junit] - locked 0xf59e2818 (a org.apache.hadoop.ipc.Client$ParallelCall)
[junit] at org.apache.hadoop.ipc.Client$Connection.cleanupCalls(Client.java:843)
[junit] at org.apache.hadoop.ipc.Client$Connection.close(Client.java:832)
[junit] - locked 0xf59d8a90 (a org.apache.hadoop.ipc.Client$Connection)
[junit] at org.apache.hadoop.ipc.Client$Connection.run(Client.java:708)
[junit] Thread-242:
[junit] at org.apache.hadoop.ipc.Client$Connection.markClosed(Client.java:788)
[junit] - waiting to lock 0xf59d8a90 (a org.apache.hadoop.ipc.Client$Connection)
[junit] at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:742)
[junit] at org.apache.hadoop.ipc.Client.call(Client.java:1109)
[junit] - locked 0xf599ef88 (a org.apache.hadoop.ipc.Client$ParallelResults)
[junit] at org.apache.hadoop.ipc.TestIPC$ParallelCaller.run(TestIPC.java:135)
[jira] [Created] (HADOOP-7334) test-patch should check for hard tabs
test-patch should check for hard tabs - Key: HADOOP-7334 URL: https://issues.apache.org/jira/browse/HADOOP-7334 Project: Hadoop Common Issue Type: Improvement Components: build, test Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Our coding guidelines say that hard tabs are disallowed in the Hadoop code, but they sometimes sneak in (there are about 280 in the common codebase at the moment). We should run a simple check for this in the test-patch process so it's harder for them to sneak in.
[jira] [Reopened] (HADOOP-7284) Trash and shell's rm does not work for viewfs
[ https://issues.apache.org/jira/browse/HADOOP-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reopened HADOOP-7284: - Trash and shell's rm does not work for viewfs - Key: HADOOP-7284 URL: https://issues.apache.org/jira/browse/HADOOP-7284 Project: Hadoop Common Issue Type: Bug Reporter: Sanjay Radia Assignee: Sanjay Radia Fix For: 0.23.0 Attachments: trash1.patch, trash2.patch, trash3.patch, trash4.patch, trash5.patch, trash6.patch, trash7.patch
[jira] [Created] (HADOOP-7318) MD5Hash factory should reset the digester it returns
MD5Hash factory should reset the digester it returns Key: HADOOP-7318 URL: https://issues.apache.org/jira/browse/HADOOP-7318 Project: Hadoop Common Issue Type: Bug Components: io Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 0.22.0 Attachments: hadoop-7318.txt Currently the getDigest() method in MD5Hash does not reset the digester it returns. Since it's a thread-local, this means that a previous aborted usage of the same digester could leave some state around. For example, if the secondary namenode receives an IOException while transferring the image, and does another image transfer with the same thread, it will think it has received an invalid digest.
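The fix can be sketched as resetting the thread-local digester every time it is handed out, so an earlier aborted use on the same thread cannot leave partial state behind. This is illustrative code under that assumption, not the actual MD5Hash implementation.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestFactory {
    // One MD5 digester per thread, as in the report.
    private static final ThreadLocal<MessageDigest> DIGESTER =
        ThreadLocal.withInitial(() -> {
            try {
                return MessageDigest.getInstance("MD5");
            } catch (NoSuchAlgorithmException e) {
                throw new RuntimeException(e);  // MD5 is always available
            }
        });

    public static MessageDigest getDigester() {
        MessageDigest digest = DIGESTER.get();
        digest.reset();  // the fix: discard state from any aborted prior use
        return digest;
    }

    public static void main(String[] args) {
        getDigester().update((byte) 1);          // simulate an aborted transfer
        byte[] afterAbort = getDigester().digest();  // reset drops the partial update
        byte[] fresh = getDigester().digest();       // digest of empty input
        System.out.println(java.util.Arrays.equals(afterAbort, fresh));  // true
    }
}
```

Note that MessageDigest.digest() also resets the instance on completion, so only the aborted (non-completed) path leaks state, which is exactly the scenario the issue describes.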
[jira] [Created] (HADOOP-7317) RPC.stopProxy doesn't actually close proxy
RPC.stopProxy doesn't actually close proxy -- Key: HADOOP-7317 URL: https://issues.apache.org/jira/browse/HADOOP-7317 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 0.22.0 Reporter: Todd Lipcon Fix For: 0.22.0 While investigating HDFS-1965, it turned out that the reference-counting done in WritableRpcEngine.ClientCache doesn't map one-to-one with open TCP connections. This means it's easy to accidentally leave TCP connections open longer than expected, so long as the client has any other connections open at all.
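The reference-counting pattern at issue can be sketched as follows. This is a simplified model with hypothetical names, not the actual WritableRpcEngine code; the point is that stopping one proxy only decrements a count, and the underlying TCP connection survives while any count is nonzero:

```java
import java.util.HashMap;
import java.util.Map;

public class ClientCache {
    // One shared client (and TCP connection) per address, reference-counted.
    private final Map<String, Integer> refCounts = new HashMap<>();

    // Each proxy created for the same address bumps the shared count.
    synchronized void getClient(String address) {
        refCounts.merge(address, 1, Integer::sum);
    }

    // stopProxy() only decrements; the connection backing the client
    // is actually closed only when the count reaches zero.
    synchronized boolean stopClient(String address) {
        int count = refCounts.merge(address, -1, Integer::sum);
        if (count <= 0) {
            refCounts.remove(address);
            return true;  // connection actually closed
        }
        return false;     // other proxies keep it open
    }
}
```

So a caller that "closes" its proxy may still leave the socket open indefinitely if any other proxy to the same address exists.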
[jira] [Created] (HADOOP-7312) core-default.xml lists configuration version as 0.21
core-default.xml lists configuration version as 0.21 Key: HADOOP-7312 URL: https://issues.apache.org/jira/browse/HADOOP-7312 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 0.22.0 Reporter: Todd Lipcon Priority: Minor Fix For: 0.22.0 This key was added in HADOOP-6233, though it appears unused. I suppose it's somewhat useful for diagnosing whether someone has old versions of core-default.xml on the classpath. Either way it should probably be updated to say 0.22 in the branch and 0.23 in trunk.
[jira] [Created] (HADOOP-7300) Configuration methods that return collections are inconsistent about mutability
Configuration methods that return collections are inconsistent about mutability --- Key: HADOOP-7300 URL: https://issues.apache.org/jira/browse/HADOOP-7300 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.22.0 Attachments: hadoop-7300.txt In particular, getTrimmedStringCollection seems to return an immutable collection, whereas getStringCollection returns a mutable one. IMO we should always return mutable collections since these methods by definition are doing copies.
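The consistent behavior argued for above can be sketched like this. These are simplified stand-ins for the Configuration methods, assuming comma-separated values; both return a freshly copied, mutable collection:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

public class ConfCollections {
    // Split on commas, returning a mutable copy the caller may modify.
    static Collection<String> getStringCollection(String value) {
        return new ArrayList<>(Arrays.asList(value.split(",")));
    }

    // Same, but trim whitespace and drop empty entries; still mutable,
    // since the method is copying anyway.
    static Collection<String> getTrimmedStringCollection(String value) {
        Collection<String> out = new ArrayList<>();
        for (String s : value.split(",")) {
            String t = s.trim();
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}
```

With both methods backed by ArrayList, callers can add or remove entries from either result without an UnsupportedOperationException.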
[jira] [Created] (HADOOP-7298) Add test utility for writing multi-threaded tests
Add test utility for writing multi-threaded tests - Key: HADOOP-7298 URL: https://issues.apache.org/jira/browse/HADOOP-7298 Project: Hadoop Common Issue Type: Test Components: test Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.22.0 A lot of our tests spawn off multiple threads in order to check various synchronization issues, etc. It's often tedious to write these kinds of tests because you have to manually propagate exceptions back to the main thread, etc. In HBase we have developed a testing utility which makes writing these kinds of tests much easier. I'd like to copy that utility into Hadoop so we can use it here as well.
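The core of such a utility is collecting failures from worker threads and rethrowing them on the main thread. A minimal sketch with hypothetical names; the actual HBase utility is richer than this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class TestContext {
    private final List<Throwable> errors = new CopyOnWriteArrayList<>();
    private final List<Thread> threads = new ArrayList<>();

    // Run a task on its own thread, recording any failure it throws.
    void addThread(Runnable task) {
        Thread t = new Thread(() -> {
            try {
                task.run();
            } catch (Throwable e) {
                errors.add(e);
            }
        });
        threads.add(t);
        t.start();
    }

    // Join all workers, then rethrow the first recorded failure on the
    // calling thread so the test actually fails instead of passing silently.
    void checkException() throws Exception {
        for (Thread t : threads) {
            t.join();
        }
        if (!errors.isEmpty()) {
            throw new AssertionError("worker thread failed", errors.get(0));
        }
    }
}
```

Without this pattern, an assertion failure inside a spawned thread kills only that thread and the test still reports success.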
[jira] [Created] (HADOOP-7287) Configuration deprecation mechanism doesn't work properly for GenericOptionsParser/Tools
Configuration deprecation mechanism doesn't work properly for GenericOptionsParser/Tools Key: HADOOP-7287 URL: https://issues.apache.org/jira/browse/HADOOP-7287 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 0.22.0 Reporter: Todd Lipcon Priority: Blocker Fix For: 0.22.0 Attachments: hadoop-7287-testcase.txt For example, you can't use -D options on the hadoop fs command line to specify the deprecated names of configuration options. The issue is the ordering:
- the JVM starts
- GenericOptionsParser creates a Configuration object and calls set() for each of the options specified on the command line
- DistributedFileSystem or another class eventually instantiates HdfsConfiguration, which adds the deprecations
- some class calls conf.get(new key) and sees the default instead of the value set on the command line
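The ordering problem above can be modeled in a few lines. This is a deliberately simplified, hypothetical MiniConf, not the real Configuration machinery; the key point is that registering a deprecation late must re-map values already set under the old name, or command-line -D options are silently lost:

```java
import java.util.HashMap;
import java.util.Map;

public class MiniConf {
    private final Map<String, String> props = new HashMap<>();
    private final Map<String, String> deprecations = new HashMap<>(); // old -> new

    void set(String key, String value) {
        // If a deprecation is already known, write through to the new key.
        props.put(deprecations.getOrDefault(key, key), value);
    }

    String get(String key) {
        return props.get(key);
    }

    // A deprecation registered after set() calls (as HdfsConfiguration
    // does) must migrate values already stored under the old name.
    void addDeprecation(String oldKey, String newKey) {
        deprecations.put(oldKey, newKey);
        String existing = props.remove(oldKey);
        if (existing != null) {
            props.put(newKey, existing);
        }
    }
}
```

If addDeprecation only updated the mapping table without migrating existing entries, a value set under the old key before the deprecation was registered would never be visible under the new key.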
[jira] [Created] (HADOOP-7279) Static web resources should not be duplicated between projects
Static web resources should not be duplicated between projects -- Key: HADOOP-7279 URL: https://issues.apache.org/jira/browse/HADOOP-7279 Project: Hadoop Common Issue Type: Bug Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: 0.22.0 Right now, some of the common resources (e.g. hadoop.css) are duplicated between common and dependent projects, since HDFS and MR can't actually load these resources out of the common static webapp. I'd like to rename common's static webapp to common/static, and host it at a different URL in the dependent projects.
[jira] [Reopened] (HADOOP-7227) Remove protocol version check at proxy creation in Hadoop RPC.
[ https://issues.apache.org/jira/browse/HADOOP-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reopened HADOOP-7227: - Remove protocol version check at proxy creation in Hadoop RPC. -- Key: HADOOP-7227 URL: https://issues.apache.org/jira/browse/HADOOP-7227 Project: Hadoop Common Issue Type: Improvement Components: ipc Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: 0.23.0 Attachments: HADOOP-7227.2.patch, HADOOP-7227.3.patch, HADOOP-7227.4.patch Currently when a proxy is created for a protocol, there is a round trip of messages to check the protocol version. The protocol version is not checked in any subsequent rpc, which could be a problem if the server restarts with a new protocol version. Both this issue and the additional round-trip at proxy creation can be avoided if we add the protocol version to every rpc, and the server checks the protocol version for every call.
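The per-call check proposed above can be sketched as follows. These are hypothetical names and a stripped-down call path; the real RPC header carries more fields:

```java
public class RpcServer {
    private final long serverVersion;

    RpcServer(long serverVersion) {
        this.serverVersion = serverVersion;
    }

    // Every call carries the client's protocol version; the server
    // rejects mismatches on each call instead of checking once at
    // proxy creation, so a server restart with a new version is
    // detected immediately and the creation-time round trip goes away.
    String call(long clientVersion, String method) {
        if (clientVersion != serverVersion) {
            throw new IllegalStateException(
                "version mismatch: client " + clientVersion
                + " vs server " + serverVersion);
        }
        return "ok:" + method;
    }
}
```

The extra cost is one long per request, which is cheap compared to the round trip the creation-time check required.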
[jira] Created: (HADOOP-7189) Add ability to enable 'debug' property in JAAS configuration
Add ability to enable 'debug' property in JAAS configuration Key: HADOOP-7189 URL: https://issues.apache.org/jira/browse/HADOOP-7189 Project: Hadoop Common Issue Type: Improvement Components: security Affects Versions: 0.23.0 Reporter: Todd Lipcon Priority: Minor Occasionally users run into weird "Unable to login" messages. Unfortunately, JAAS obscures the underlying exception message in many cases because it thinks leaking the exception might itself be insecure. Enabling the debug option in the JAAS configuration gets it to dump the underlying issue and makes troubleshooting this kind of problem easier.
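For reference, the debug option is set per login-module entry in the JAAS configuration. A hypothetical example entry (the entry name, keytab path, and principal below are illustrative, not Hadoop's actual values):

```
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytab/nn.service.keytab"
  principal="nn/host.example.com@EXAMPLE.COM"
  debug=true;
};
```

With debug=true, the Krb5LoginModule prints the underlying Kerberos errors to stdout instead of swallowing them behind a generic login failure.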
[jira] Created: (HADOOP-7183) WritableComparator.get should not cache comparator objects
WritableComparator.get should not cache comparator objects -- Key: HADOOP-7183 URL: https://issues.apache.org/jira/browse/HADOOP-7183 Project: Hadoop Common Issue Type: Bug Affects Versions: 0.22.0 Reporter: Todd Lipcon Priority: Blocker Fix For: 0.22.0 HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.
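The hazard and the fix can be sketched like this. The names are hypothetical and the comparator is deliberately trivial; the point is that a comparator holding per-instance scratch state must not be handed out from a shared cache:

```java
import java.util.Comparator;

public class ComparatorCache {
    // A comparator with per-instance state: it reuses an internal
    // buffer between compares, so two threads sharing one instance
    // would trample each other's scratch data.
    static class BufferedComparator implements Comparator<byte[]> {
        @SuppressWarnings("unused")
        private final byte[] scratch = new byte[64];  // per-instance state

        @Override
        public int compare(byte[] a, byte[] b) {
            // (a real implementation would deserialize into 'scratch')
            return Integer.compare(a.length, b.length);
        }
    }

    // Constructing a fresh instance per call side-steps the
    // thread-safety problem that caching shared instances introduced.
    static Comparator<byte[]> get() {
        return new BufferedComparator();
    }
}
```

Per-call construction trades a small allocation for correctness; a cache would only be safe for comparators known to be stateless.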
[jira] Created: (HADOOP-7172) SecureIO should not check owner on non-secure clusters that have no native support
SecureIO should not check owner on non-secure clusters that have no native support -- Key: HADOOP-7172 URL: https://issues.apache.org/jira/browse/HADOOP-7172 Project: Hadoop Common Issue Type: Bug Components: io, security Affects Versions: 0.22.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.22.0 The SecureIOUtils.openForRead function currently uses a racy stat/open combo if security is disabled and the native libraries are not available. This ends up shelling out to "ls -ld", which is very slow. We've seen this cause significant performance regressions on clusters that match this profile. Since the racy permissions check doesn't buy us any security anyway, we should just fall back to a normal open without any stat() at all, if we can't use the native support to do it efficiently.
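The proposed behavior can be sketched as a three-way branch. The flags and names here are hypothetical simplifications; the real SecureIOUtils consults UGI and native JNI code rather than taking booleans:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class SecureOpen {
    static FileInputStream openForRead(File f, String expectedOwner,
                                       boolean securityEnabled,
                                       boolean nativeAvailable)
            throws IOException {
        if (!securityEnabled || expectedOwner == null) {
            // Insecure cluster: a racy stat/open buys no security,
            // so just open the file directly with no stat() at all.
            return new FileInputStream(f);
        }
        if (nativeAvailable) {
            // Secure path: fstat() the already-open fd via native code
            // and verify the owner race-free (owner check elided here).
            return new FileInputStream(f);
        }
        // Secure but no native support: the old code shelled out to
        // "ls -ld" (slow and still racy); refusing outright is shown
        // here, though the right policy may differ.
        throw new IOException("native support required for owner check");
    }
}
```

On the insecure path this is just a plain open, which removes the per-file fork/exec cost that caused the regressions described above.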