Re: [DISCUSS] About creation of Hadoop Thirdparty repository for shaded artifacts
I'm very unhappy with this direction. In particular, I don't think git is a good place for distribution of binary artifacts. Furthermore, the PMC shouldn't be releasing anything without a release vote.

I'd propose that we make a thirdparty module that contains the *source* of the pom files used to build the relocated jars. This should absolutely be treated as a last resort for the (mostly Google) projects that regularly break binary compatibility (e.g. Protobuf & Guava). In terms of naming, I'd propose something like:

org.apache.hadoop.thirdparty.protobuf2_5
org.apache.hadoop.thirdparty.guava28

In particular, I think we absolutely need to include the version of the underlying project. On the other hand, since we should not be shading *everything*, we can drop the leading com.google. The Hadoop project can make releases of the thirdparty module:

    groupId:    org.apache.hadoop
    artifactId: hadoop-thirdparty-protobuf25
    version:    1.0

Note that the version has to be the Hadoop thirdparty release number, which is part of why you need to have the underlying version in the artifact name. These we can push to Maven Central as new releases from Hadoop.

Thoughts?
.. Owen

On Fri, Sep 27, 2019 at 8:38 AM Vinayakumar B wrote:
> Hi All,
>
> I wanted to discuss the separate repo for thirdparty dependencies which we
> need to shade and include in Hadoop components' jars.
>
> Apologies for the big text ahead, but this needs clear explanation!!
>
> Right now the most needed such dependency is protobuf. The protobuf
> dependency was not upgraded past 2.5.0 for fear that downstream builds,
> which depend on the transitive protobuf dependency coming from Hadoop's
> jars, might fail with the upgrade. Protobuf does not guarantee source
> compatibility, though it guarantees wire compatibility between versions.
> Because of this, a version upgrade may cause breakage in known and unknown
> (private?) downstreams.
>
> To tackle this, we came up with the following proposal in HADOOP-13363.
>
> Luckily, as far as I know, no APIs, either public to users or between
> Hadoop processes, directly use protobuf classes in signatures. (If any
> exist, please let us know.)
>
> Proposal:
>
> 1. Create artifact(s) which contain the shaded dependencies. All such
> shading/relocation will be done with the known prefix
> **org.apache.hadoop.thirdparty.**.
> 2. Starting with the protobuf jar (ex:
> o.a.h.thirdparty:hadoop-shaded-protobuf), all **com.google.protobuf**
> classes will be relocated as
> **org.apache.hadoop.thirdparty.com.google.protobuf**.
> 3. Hadoop modules which need protobuf as a dependency will add this shaded
> artifact as a dependency (ex: o.a.h.thirdparty:hadoop-shaded-protobuf).
> 4. All previous usages of "com.google.protobuf" will be relocated to
> "org.apache.hadoop.thirdparty.com.google.protobuf" in the code and
> committed. Please note, this replacement is one-time, directly in the
> source code, NOT during compile and package.
> 5. Once all usages of "com.google.protobuf" are relocated, Hadoop no
> longer cares which version of the original "protobuf-java" is in the
> dependency tree.
> 6. Keep "protobuf-java:2.5.0" in the dependency tree so as not to break
> downstreams, but Hadoop itself will use the latest protobuf present in
> "o.a.h.thirdparty:hadoop-shaded-protobuf".
>
> 7. Coming back to the separate repo, the following are the main reasons
> for keeping the shaded dependency artifact in a separate repo instead of a
> submodule:
>
> 7a. These artifacts need not be built all the time. They need to be built
> only when there is a change in the dependency version or the build process.
> 7b. If added as a submodule in the Hadoop repo, maven-shade-plugin:shade
> will execute only in the package phase. That means "mvn compile" or
> "mvn test-compile" would fail: the artifact would not yet contain the
> relocated classes, only the original ones, resulting in a compilation
> failure. The workaround, building the thirdparty submodule first and
> excluding it from other executions, is a complex process compared to
> keeping it in a separate repo.
> 7c. The separate repo will be a subproject of Hadoop, using the same
> HADOOP jira project, with different versioning prefixed with "thirdparty-"
> (ex: thirdparty-1.0.0).
> 7d. The separate repo will have the same release process as Hadoop.
>
> HADOOP-13363 (https://issues.apache.org/jira/browse/HADOOP-13363) is an
> umbrella jira tracking the changes for the protobuf upgrade.
>
> PR (https://github.com/apache/hadoop-thirdparty/pull/1) has been raised
> for the separate repo creation in HADOOP-16595
> (https://issues.apache.org/jira/browse/HADOOP-16595).
>
> Please provide your inputs on the proposal and review the PR to proceed.
>
> -Thanks,
> Vinay
>
> On Fri, Sep 27, 2019 at 11:54 AM Vinod Kumar Vavilapalli
> <vino...@apache.org> wrote:
>
> > Moving the
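For readers unfamiliar with the mechanics being debated, here is a minimal sketch of the kind of maven-shade-plugin relocation configuration the proposal describes. It is illustrative only, not the actual hadoop-thirdparty pom; the plugin version is omitted and the coordinates are the example ones used above.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <!-- shade only runs at package time, which is why point 7b above notes
           that "mvn compile" in a sibling module could not see relocated classes -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.apache.hadoop.thirdparty.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```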
Re: [VOTE] Moving Submarine to a separate Apache project proposal
Since you don't have any Apache Members, I'll join to provide Apache oversight.

.. Owen

On Fri, Sep 6, 2019 at 1:38 PM Owen O'Malley wrote:
> +1 for moving to a new project.
>
> On Sat, Aug 31, 2019 at 10:19 PM Wangda Tan wrote:
>> Hi all,
>>
>> As we discussed in the previous thread [1], I just moved the spin-off
>> proposal to CWIKI and completed all TODO parts:
>>
>> https://cwiki.apache.org/confluence/display/HADOOP/Submarine+Project+Spin-Off+to+TLP+Proposal
>>
>> If you're interested in learning more, please review the proposal and let
>> me know if you have any questions/suggestions. It will be sent to the
>> board once the vote passes. (Please note that the previous voting thread
>> [2], to move Submarine to a separate GitHub repo, is a necessary step
>> toward moving Submarine to a separate Apache project but not a sufficient
>> one, so I sent two separate voting threads.)
>>
>> Please let me know if I missed anyone in the proposal, and reply if you'd
>> like to be included in the project.
>>
>> This vote runs for 7 days and will be concluded at Sep 7th, 11 PM PDT.
>>
>> Thanks,
>> Wangda Tan
>>
>> [1] https://lists.apache.org/thread.html/4a2210d567cbc05af92c12aa6283fd09b857ce209d537986ed800029@%3Cyarn-dev.hadoop.apache.org%3E
>> [2] https://lists.apache.org/thread.html/6e94469ca105d5a15dc63903a541bd21c7ef70b8bcff475a16b5ed73@%3Cyarn-dev.hadoop.apache.org%3E
Re: [VOTE] Moving Submarine to a separate Apache project proposal
+1 for moving to a new project.

On Sat, Aug 31, 2019 at 10:19 PM Wangda Tan wrote:
> Hi all,
>
> As we discussed in the previous thread [1], I just moved the spin-off
> proposal to CWIKI and completed all TODO parts:
>
> https://cwiki.apache.org/confluence/display/HADOOP/Submarine+Project+Spin-Off+to+TLP+Proposal
>
> If you're interested in learning more, please review the proposal and let
> me know if you have any questions/suggestions. It will be sent to the
> board once the vote passes. (Please note that the previous voting thread
> [2], to move Submarine to a separate GitHub repo, is a necessary step
> toward moving Submarine to a separate Apache project but not a sufficient
> one, so I sent two separate voting threads.)
>
> Please let me know if I missed anyone in the proposal, and reply if you'd
> like to be included in the project.
>
> This vote runs for 7 days and will be concluded at Sep 7th, 11 PM PDT.
>
> Thanks,
> Wangda Tan
>
> [1] https://lists.apache.org/thread.html/4a2210d567cbc05af92c12aa6283fd09b857ce209d537986ed800029@%3Cyarn-dev.hadoop.apache.org%3E
> [2] https://lists.apache.org/thread.html/6e94469ca105d5a15dc63903a541bd21c7ef70b8bcff475a16b5ed73@%3Cyarn-dev.hadoop.apache.org%3E
Re: [VOTE] Merging branch HDFS-7240 to trunk
This discussion seems to have died down, coming closer to consensus but without a resolution. I'd like to propose the following compromise:

* HDSL becomes a subproject of Hadoop.
* HDSL will release separately from Hadoop. Hadoop releases will not contain HDSL, and vice versa.
* HDSL will get its own jira instance so that the release tags stay separate.
* On trunk (as opposed to release branches), HDSL will be a separate module in Hadoop's source tree. This will enable the HDSL team to work on their trunk and the Hadoop trunk without making releases for every change.
* Hadoop's trunk will only build HDSL if a non-default profile is enabled (see the sketch below).
* When Hadoop creates a release branch, the RM will delete the HDSL module from the branch.
* HDSL will have their own Yetus checks and won't cause failures in the Hadoop patch check.

I think this accomplishes most of the goals of encouraging HDSL development while minimizing the potential for disruption of HDFS development.

Thoughts? Andrew, Jitendra, & Sanjay?

Thanks,
   Owen
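A hedged sketch of the optional-profile idea from the list above. The profile id and module name are hypothetical, but Maven profiles can conditionally add modules like this, so a hypothetical `mvn install -Phdsl` would build HDSL while a plain `mvn install` would skip it:

```xml
<!-- In the hypothetical root pom: the HDSL module is only built when -Phdsl is given. -->
<profiles>
  <profile>
    <id>hdsl</id>
    <modules>
      <module>hadoop-hdsl-project</module>
    </modules>
  </profile>
</profiles>
```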
Re: [VOTE] Merging branch HDFS-7240 to trunk
Hi Joep,

On Tue, Mar 6, 2018 at 6:50 PM, J. Rottinghuis wrote:
> Obviously when people do want to use Ozone, then having it in the same repo
> is easier. The flip side is that, separate top-level project in the same
> repo or not, it adds to the Hadoop releases.

Apache projects are about the group of people who are working together. There is a large overlap between the team working on HDFS and Ozone, which is a lot of the motivation to keep project overhead to a minimum and not start a new project.

Using the same releases or separate releases is a distinct choice. Many Apache projects, such as Commons and Maven, have multiple artifacts that release independently. In Hive, we have two sub-projects that release independently: Hive Storage API, and Hive. One thing we did during that split to minimize the challenges to the developers was that Storage API and Hive share the same master branch. However, since they have different releases, they have their own release branches and release numbers.

> If there is a change in Ozone and a new release is needed, it would have to
> wait for a Hadoop release. Ditto if there is a Hadoop release and there is
> an issue with Ozone. The case that one could turn off Ozone through a Maven
> profile works only to some extent.
> If we have done a 3.x release with Ozone in it, would it make sense to do
> a 3.y release with y>x without Ozone in it? That would be weird.

Actually, if Ozone is marked as unstable/evolving (we should actually have an even stronger warning for a feature preview), we could remove it in a 3.x. If a user picks up a feature before it is stable, we try to provide a stable platform, but mistakes happen. Introducing an incompatible change to the Ozone API between 3.1 and 3.2 wouldn't be good, but it wouldn't be the end of the world.

.. Owen
Re: [VOTE] Merging branch HDFS-7240 to trunk
On Thu, Mar 1, 2018 at 11:03 PM, Andrew Wang wrote:
> Owen mentioned making a Hadoop subproject; we'd have to hash out what
> exactly this means (I assume a separate repo still managed by the Hadoop
> project), but I think we could make this work if it's more attractive than
> incubation or a new TLP.

Ok, there are multiple levels of sub-projects that all make sense:

- Same source tree, same releases - examples like HDFS & YARN.
- Same master branch, separate releases and release branches - Hive's Storage API vs Hive. It is in the source tree for the master branch, but has distinct releases and release branches.
- Separate source, separate releases - Apache Commons.

There are advantages and disadvantages to each. I'd propose that we use the same source, same release pattern for Ozone. Note that we tried, and later reverted, doing Common, HDFS, and YARN as separate source, separate release because it was too much trouble.

I like Daryn's idea of putting it as a top-level directory in Hadoop and making sure that nothing in Common, HDFS, or YARN depends on it. That way if a Release Manager doesn't think it is ready for release, it can be trivially removed before the release.

One thing about using the same releases: Sanjay and Jitendra are signing up to make much more regular bugfix and minor releases in the near future. For example, they'll need to make 3.2 relatively soon to get this released, and then 3.3 somewhere in the next 3 to 6 months. That would be good for the project. Hadoop needs more regular releases and fewer big-bang releases.

.. Owen
Re: [VOTE] Merging branch HDFS-7240 to trunk
I think it would be good to get this in sooner rather than later, but I have some thoughts:

1. It is hard to tell what has changed. git rebase -i tells me the branch has 722 commits, and the rebase failed with a conflict. It would really help if you rebased onto current trunk.
2. I think Ozone would be a good Hadoop subproject, but it should be outside of HDFS.
3. CBlock, which is also coming in this merge, would benefit from more separation from HDFS.
4. What are the new transitive dependencies that Ozone, HDSL, and CBlock add to the clients? The servers matter too, but the client dependencies have a huge impact on our users.
5. Have you checked the new dependencies for compatibility with the ASL?

On Thu, Mar 1, 2018 at 2:45 PM, Clay B. wrote:
> Oops, retrying now subscribed to more than solely yarn-dev.
>
> -Clay
>
> On Wed, 28 Feb 2018, Clay B. wrote:
>
>> +1 (non-binding)
>>
>> I have walked through the code and find it very compelling as a user; I
>> really look forward to seeing the Ozone code mature and it maturing HDFS
>> features together. The points which excite me as an eight-year HDFS user
>> are:
>>
>> * Excitement for making the datanode a storage technology container - this
>> patch clearly brings fresh thought to HDFS, keeping it from growing stale.
>>
>> * Ability to build upon a shared storage infrastructure for diverse
>> loads: I do not want to have "stranded" storage capacity or have to
>> manage competing storage systems on the same disks (and further I want
>> the metrics datanodes can provide me today, so I do not have to
>> instrument two systems or evolve their instrumentation separately).
>>
>> * Looking forward to supporting object-sized files!
>>
>> * Moves HDFS in the right direction to test out new block management
>> techniques for scaling HDFS. I am really excited to see the Raft
>> integration; I hope it opens a new era in Hadoop, matching modern systems
>> design with new consistency and replication options in our ever more
>> distributed ecosystem.
>>
>> -Clay
>>
>> On Mon, 26 Feb 2018, Jitendra Pandey wrote:
>>
>>> Dear folks,
>>> We would like to start a vote to merge the HDFS-7240 branch into
>>> trunk. The context can be reviewed in the DISCUSSION thread and in the
>>> jiras (see references below).
>>>
>>> HDFS-7240 introduces the Hadoop Distributed Storage Layer (HDSL), which
>>> is a distributed, replicated block layer.
>>> The old HDFS namespace and NN can be connected to this new block layer
>>> as we have described in HDFS-10419.
>>> We also introduce a key-value namespace called Ozone built on HDSL.
>>>
>>> The code is in a separate module and is turned off by default. In a
>>> secure setup, HDSL and Ozone daemons cannot be started.
>>>
>>> The detailed documentation is available at
>>> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Distributed+Storage+Layer+and+Applications
>>>
>>> I will start with my vote.
>>> +1 (binding)
>>>
>>> Discussion Thread:
>>> https://s.apache.org/7240-merge
>>> https://s.apache.org/4sfU
>>>
>>> Jiras:
>>> https://issues.apache.org/jira/browse/HDFS-7240
>>> https://issues.apache.org/jira/browse/HDFS-10419
>>> https://issues.apache.org/jira/browse/HDFS-13074
>>> https://issues.apache.org/jira/browse/HDFS-13180
>>>
>>> Thanks
>>> jitendra
>>>
>>> DISCUSSION THREAD SUMMARY:
>>>
>>> On 2/13/18, 6:28 PM, "sanjay Radia" wrote:
>>>
>>> Sorry, the formatting got messed up by my email client. Here it is again.
>>>
>>> Dear Hadoop Community Members,
>>>
>>> We had multiple community discussions, a few meetings in smaller groups
>>> and also jira discussions with respect to this thread. We express our
>>> gratitude for participation and valuable comments.
>>>
>>> The key questions raised were the following:
>>> 1) How do the new block storage layer and OzoneFS benefit HDFS? We were
>>> asked to chalk out a roadmap towards the goal of a scalable namenode
>>> working with the new storage layer.
>>> 2) We were asked to provide a security design.
>>> 3) There were questions around stability, given Ozone brings in a large
>>> body of code.
>>> 4) Why can't they be separate projects forever, or merged in when
>>> production ready?
>>>
>>> We have responded to all the above questions with detailed explanations
>>> and answers on the jira as well as in the discussions. We believe that
>>> should sufficiently address the community's concerns.
>>>
>>> Please see the summary below:
>>>
>>> 1) The new code base benefits HDFS scaling and a roadmap has been
>>> provided.
Re: Shuffler logic implementation
That is under your application's control. Define a class that implements Partitioner (https://hadoop.apache.org/docs/r2.6.3/api/org/apache/hadoop/mapreduce/Partitioner.html) and set the name of the class in your job's configuration using job.setPartitionerClass(...).

.. Owen

On Tue, Mar 14, 2017 at 2:51 PM, Pushparaj Motamari wrote:
> Hi,
>
> I want to understand the implementation in the code which assigns a
> particular reducer to particular keys. I mean the code which provides the
> logic for assigning a key to a reducer, to which mappers will send their
> key/value pairs after mapping. Will it assign based on
> hash(key) % (number of reducers)?
>
> Regards
>
> Pushparaj
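To make the answer concrete, here is a minimal sketch of a custom partitioner for the new (org.apache.hadoop.mapreduce) API; the class name and key/value types are illustrative. It reproduces the hash(key) % (number of reducers) behavior the question asks about, which is also what the default HashPartitioner does:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class MyPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    // Mask the sign bit so the result is non-negative, then take the
    // remainder by the reducer count: hash(key) % (number of reducers).
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

// In the job setup:
//   job.setPartitionerClass(MyPartitioner.class);
```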
Re: Moving to JDK7, JDK8 and new major releases
On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur t...@cloudera.com wrote:
> After reading this thread and thinking a bit about it, I think it should be
> OK to move up to JDK7 in Hadoop.

I agree with Alejandro. Changing the minimum JDK is not an incompatible change and is fine in the 2 branch. (Although I think it would *not* be appropriate for a patch release.) Of course we need to do it with forethought and testing, but moving off of JDK 6, which is EOL'ed, is a good thing.

Moving to Java 8 as a minimum seems much too aggressive and I would push back on that.

I also think that we need to let the dust settle on the Hadoop 2 line for a while before we talk about Hadoop 3. It seems that it has only been in the last 6 months that Hadoop 2 adoption has reached the mainstream users. Our user community needs time to digest the changes in Hadoop 2.x before we fracture the community by starting to discuss Hadoop 3 releases.

.. Owen
[jira] [Created] (MAPREDUCE-5490) MapReduce doesn't set the environment variable for children processes
Owen O'Malley created MAPREDUCE-5490:
----------------------------------------

             Summary: MapReduce doesn't set the environment variable for children processes
                 Key: MAPREDUCE-5490
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5490
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 1.2.1
            Reporter: Owen O'Malley
            Assignee: Owen O'Malley

Currently, MapReduce uses the command line argument to pass the classpath to the child. This breaks if the process forks a child that needs the same classpath. Such a case happens in Hive when it uses map-side joins. I propose that we make MapReduce in branch-1 use the CLASSPATH environment variable like YARN does.
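The distinction matters because environment variables, unlike command-line arguments, are inherited by grandchild processes. A small illustrative sketch (SomeChildMain is a hypothetical class, not anything in Hadoop) of why the environment-variable approach survives a further fork:

```java
import java.io.IOException;

public class ForkExample {
  public static void main(String[] args) throws IOException {
    // A child started this way inherits our environment by default, so a
    // CLASSPATH set in the environment flows through to *its* children too.
    // A classpath passed as "java -cp ..." would stop at the first child.
    ProcessBuilder pb = new ProcessBuilder("java", "SomeChildMain");
    String cp = System.getenv("CLASSPATH");
    if (cp != null) {
      pb.environment().put("CLASSPATH", cp); // explicit, though inherited anyway
    }
    pb.inheritIO().start();
  }
}
```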
[jira] [Created] (MAPREDUCE-5202) Revert MAPREDUCE-4397 to avoid using incorrect config files
Owen O'Malley created MAPREDUCE-5202:
----------------------------------------

             Summary: Revert MAPREDUCE-4397 to avoid using incorrect config files
                 Key: MAPREDUCE-5202
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5202
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Owen O'Malley
            Assignee: Owen O'Malley

MAPREDUCE-4397 added the capability to switch the location of the taskcontroller.cfg file, which weakens security.
[jira] [Resolved] (MAPREDUCE-5202) Revert MAPREDUCE-4397 to avoid using incorrect config files
[ https://issues.apache.org/jira/browse/MAPREDUCE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved MAPREDUCE-5202.
--------------------------------------
    Resolution: Fixed

I reverted the previous patch on branch-1, branch-1.1, and branch-1.2.

> Revert MAPREDUCE-4397 to avoid using incorrect config files
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-5202
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5202
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>
> MAPREDUCE-4397 added the capability to switch the location of the
> taskcontroller.cfg file, which weakens security.
Re: Release numbering for branch-2 releases
I think that using -(alpha,beta) tags on the release versions is a really bad idea. All releases should follow the strictly numeric (Major.Minor.Patch) pattern that we've used for all of the releases except the 2.0.x ones.

-- Owen

On Mon, Feb 4, 2013 at 11:53 AM, Stack st...@duboce.net wrote:
> On Mon, Feb 4, 2013 at 10:46 AM, Arun C Murthy a...@hortonworks.com wrote:
>> Would it be better to have 2.0.3-alpha, 2.0.4-beta and then make 2.1 a
>> stable release?
>
> This way we just have one series (2.0.x) which is not suitable for general
> consumption. That contains the versioning damage to the 2.0.x set. This is
> an improvement over the original proposal where we let the versioning
> mayhem run out to 2.3.
>
> Thanks Arun,
> St.Ack
[jira] [Created] (MAPREDUCE-4601) Windows CMD processor doesn't use double quotes
Owen O'Malley created MAPREDUCE-4601:
----------------------------------------

             Summary: Windows CMD processor doesn't use double quotes
                 Key: MAPREDUCE-4601
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4601
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 1-win
            Reporter: Owen O'Malley
            Assignee: Owen O'Malley

Currently, the task launch script under Windows matches Linux and double quotes all of the values. Unfortunately, the Windows CMD processor doesn't need, and doesn't ignore, the double quotes. The main symptom of this is that the CLASSPATH loses the first and last entries.

{code}
set CLASSPATH="c:\foo;c:\bar;c:\baz"
{code}

results in having '"c:\foo', 'c:\bar', and 'c:\baz"' on the classpath. Of those three, only 'c:\bar' is valid.
[jira] [Created] (MAPREDUCE-4505) Create a combiner bypass path for keys with a single value
Owen O'Malley created MAPREDUCE-4505:
----------------------------------------

             Summary: Create a combiner bypass path for keys with a single value
                 Key: MAPREDUCE-4505
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4505
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: performance, task
            Reporter: Owen O'Malley

It would help optimize a lot of cases where there aren't a lot of replicated keys if the framework would bypass the deserialize/combine/serialize step for keys that only have a single value.
[jira] [Created] (MAPREDUCE-4232) Make the distributed cache tests easier to diagnose
Owen O'Malley created MAPREDUCE-4232:
----------------------------------------

             Summary: Make the distributed cache tests easier to diagnose
                 Key: MAPREDUCE-4232
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4232
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: distributed-cache, test
            Reporter: Owen O'Malley
            Assignee: Owen O'Malley

We currently require that the test environment:
* Have a umask of 0022.
* Have a world-readable basedir (including parents).

It would be good to check for those before bothering to run the tests.
Re: Reduce output is strange
On Tue, Apr 3, 2012 at 8:01 AM, Pedro Costa psdc1...@gmail.com wrote:
> If I want to compare 2 sequence files to see if they are the same, how do I
> compare them?

From the command line, you can textify the files with:

  hadoop fs -text myfile.seq

Of course, if you are using the API you can iterate through the two sequence files and compare them row by row.

-- Owen
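For the API route, here is a rough sketch of a row-by-row comparison. It assumes both files use Writable keys and values whose classes implement equals() sensibly (Text, IntWritable, etc.), and it uses the old-style Reader constructor from this era:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SeqFileCompare {
  public static boolean sameContents(Configuration conf, Path a, Path b) throws Exception {
    SequenceFile.Reader ra = new SequenceFile.Reader(a.getFileSystem(conf), a, conf);
    SequenceFile.Reader rb = new SequenceFile.Reader(b.getFileSystem(conf), b, conf);
    try {
      Writable ka = (Writable) ReflectionUtils.newInstance(ra.getKeyClass(), conf);
      Writable va = (Writable) ReflectionUtils.newInstance(ra.getValueClass(), conf);
      Writable kb = (Writable) ReflectionUtils.newInstance(rb.getKeyClass(), conf);
      Writable vb = (Writable) ReflectionUtils.newInstance(rb.getValueClass(), conf);
      while (true) {
        boolean hasA = ra.next(ka, va);
        boolean hasB = rb.next(kb, vb);
        if (hasA != hasB) return false;   // different number of records
        if (!hasA) return true;           // both exhausted, every row matched
        if (!ka.equals(kb) || !va.equals(vb)) return false;
      }
    } finally {
      ra.close();
      rb.close();
    }
  }
}
```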
Re: Reduce output is strange
On Tue, Apr 3, 2012 at 8:25 AM, Pedro Costa psdc1...@gmail.com wrote:
> What I want to ask is:
> - how do I read the values from sequence files that are block compressed,
> record compressed, or uncompressed?

You use the SequenceFile.Reader class.

> - how do I know if the sequence file is block compressed, record
> compressed, or uncompressed?

You use the SequenceFile.Reader class.

> - how do I know if it's a sequence file or a text file?

SequenceFiles always have "SEQ" followed by the version in the first 4 bytes.

-- Owen
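A small sketch pulling those answers together; the Reader works out the layout itself (it checks the "SEQ" magic and version mentioned above) and exposes the compression style through isCompressed()/isBlockCompressed():

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class SeqFileInfo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path p = new Path(args[0]);
    FileSystem fs = p.getFileSystem(conf);
    // Throws an IOException if the file doesn't start with the SEQ magic.
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, p, conf);
    try {
      System.out.println("key class:        " + reader.getKeyClassName());
      System.out.println("value class:      " + reader.getValueClassName());
      System.out.println("compressed:       " + reader.isCompressed());
      System.out.println("block compressed: " + reader.isBlockCompressed());
    } finally {
      reader.close();
    }
  }
}
```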
Re: [RESULT] - [VOTE] Rename hadoop branches post hadoop-1.x
On Wed, Mar 28, 2012 at 5:11 PM, Doug Cutting cutt...@apache.org wrote:
> On 03/28/2012 12:39 PM, Owen O'Malley wrote:
>> [ ... ] So the RM of the 2 branch needs to make the call of what should be
>> 2.1 vs 3.0.
>
> I thought these were community decisions, not RM decisions, no?

What to release is the RM's decision, which is then voted on by the community. We tried voting on which features to include and it led to no releases for two years. I think our users are better served by having good usable releases.

-- Owen
Re: [RESULT] - [VOTE] Rename hadoop branches post hadoop-1.x
I disagree. Trunk should become branch-3 once someone wants to start stabilizing it. Arun is going to need the minor versions for when he adds features. In X.Y.Z:

Z = bug fixes
Y = minor release (compatible, adds features)
X = major release (incompatible)

So from branch-2 will come branch-2.0, with tags for 2.0.0, 2.0.1, etc. New features will go into branch-2, which will become branch-2.1, branch-2.2, and so on.

-- Owen
Re: [RESULT] - [VOTE] Rename hadoop branches post hadoop-1.x
On Wed, Mar 28, 2012 at 12:32 PM, Todd Lipcon t...@cloudera.com wrote:
> But new features also go to trunk. And if none of our new features are
> incompatible, why do we anticipate that trunk is 3.0?

Let's imagine that we already had a 2.0.0 release. Now we want to add features like HA. The only place to put that is in 2.1.0. On the other hand, you don't want to pull *ALL* of the changes from trunk. That is way too much scope. So the RM of the 2 branch needs to make the call of what should be 2.1 vs 3.0.

-- Owen
Re: svn commit: r1304067 - in /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project: ./ bin/ conf/ hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/ hadoop-mapreduce-exam
To me, I'd much rather have the human-readable description of what is being fixed, and I couldn't care less about which subversion commit it corresponds to. I'd be all for using the CHANGES.txt description as the commit message for both trunk and the branches.

-- Owen
[jira] [Created] (MAPREDUCE-3773) Add queue metrics with buckets for job run times
Add queue metrics with buckets for job run times
-------------------------------------------------

                 Key: MAPREDUCE-3773
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3773
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
          Components: jobtracker
            Reporter: Owen O'Malley
            Assignee: Owen O'Malley

It would be nice to have queue metrics that reflect the number of jobs in each queue that have been running for different ranges of time. Reasonable time ranges are probably 0-1 hr, 1-5 hr, 5-24 hr, and 24+ hrs, but they should be configurable.
[jira] [Created] (MAPREDUCE-3495) Remove my personal email address from the pipes build file.
Remove my personal email address from the pipes build file.
------------------------------------------------------------

                 Key: MAPREDUCE-3495
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3495
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: build
            Reporter: Owen O'Malley
            Assignee: Owen O'Malley

When I first wrote the pipes autoconf/automake stuff, I incorrectly put my email address in the AC_INIT line, which means if something goes wrong, you get:

{quote}
configure: WARNING: ## Report this to my-email ##
{quote}
[jira] [Created] (MAPREDUCE-2977) ResourceManager needs to renew and cancel tokens associated with a job
ResourceManager needs to renew and cancel tokens associated with a job
----------------------------------------------------------------------

                 Key: MAPREDUCE-2977
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2977
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 0.23.0
            Reporter: Owen O'Malley
            Priority: Blocker

The JobTracker currently manages tokens for the applications, and the ResourceManager needs the same functionality.
[jira] [Created] (MAPREDUCE-2946) TaskTrackers fail at startup
TaskTrackers fail at startup
----------------------------

                 Key: MAPREDUCE-2946
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2946
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: tasktracker
    Affects Versions: 0.20.205.0
            Reporter: Owen O'Malley
             Fix For: 0.20.205.0

Upgrading from 0.20.204.0 to 0.20.205.0-SNAPSHOT, the TaskTrackers refused to start because the cleanup failed. I was able to start the TaskTrackers by deleting the mapred local dirs across the cluster. I was running with the Linux task controller and security turned on.
[jira] [Resolved] (MAPREDUCE-2946) TaskTrackers fail at startup
[ https://issues.apache.org/jira/browse/MAPREDUCE-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved MAPREDUCE-2946.
--------------------------------------
    Resolution: Invalid

I forgot to chmod the task-controller to setuid. Sorry for the noise.

> TaskTrackers fail at startup
> ----------------------------
>
>                 Key: MAPREDUCE-2946
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2946
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.20.205.0
>            Reporter: Owen O'Malley
>             Fix For: 0.20.205.0
>
> Upgrading from 0.20.204.0 to 0.20.205.0-SNAPSHOT, the TaskTrackers refused
> to start because the cleanup failed. I was able to start the TaskTrackers
> by deleting the mapred local dirs across the cluster. I was running with
> the Linux task controller and security turned on.
Re: [jira] [Created] (MAPREDUCE-2911) Hamster: Hadoop And Mpi on the same cluSTER
On Wed, Aug 31, 2011 at 7:22 PM, Josh Patterson j...@cloudera.com wrote:
> Do we have a list of all MR2 frameworks being worked on currently beyond
> MPI and Spark?

Giraph is also going to port over:

https://issues.apache.org/jira/browse/GIRAPH-13

-- Owen
[jira] [Resolved] (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved MAPREDUCE-1943.
--------------------------------------
    Resolution: Fixed

> Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1943
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Mahadev konar
>            Assignee: Mahadev konar
>             Fix For: 0.20.203.0
>         Attachments: MAPREDUCE-1943-0.20-yahoo.patch, MAPREDUCE-1943-0.20-yahoo.patch, MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, MAPREDUCE-1943-yahoo-hadoop-0.20S.patch
>
> We have come across issues in production clusters wherein users abuse
> counters, status-report messages, and split sizes. One such case was when
> one of the users had 100 million counters. This leads to the JobTracker
> going out of memory and becoming unresponsive. In this jira I am proposing
> to put sane limits on the status report length, the number of counters, and
> the size of block locations returned by the input split.
[jira] [Resolved] (MAPREDUCE-2846) a small % of all tasks fail with DefaultTaskController
[ https://issues.apache.org/jira/browse/MAPREDUCE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved MAPREDUCE-2846.
--------------------------------------
       Resolution: Fixed
    Fix Version/s: 0.23.0
                   0.20.204.0
     Hadoop Flags: [Reviewed]

I just committed this.

> a small % of all tasks fail with DefaultTaskController
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-2846
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2846
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task, task-controller, tasktracker
>    Affects Versions: 0.20.204.0
>            Reporter: Allen Wittenauer
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.20.204.0, 0.23.0
>         Attachments: sync-trunk.patch, sync.patch
>
> After upgrading our test 0.20.203 grid to 0.20.204-rc2, we ran terasort to
> verify operation. While the job completed successfully, approx 10% of the
> tasks failed with task runner execution errors and the inability to create
> symlinks for attempt logs.
[jira] [Resolved] (MAPREDUCE-2688) rpm should only require the same major version as common and hdfs
[ https://issues.apache.org/jira/browse/MAPREDUCE-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved MAPREDUCE-2688.
--------------------------------------
       Resolution: Duplicate
    Fix Version/s: 0.23.0

This was fixed by HDFS-2156.

> rpm should only require the same major version as common and hdfs
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2688
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2688
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>             Fix For: 0.23.0
>
> The rpm should only require the same version of common and hdfs be
> installed.
[jira] [Created] (MAPREDUCE-2688) rpm should only require the same major version as common and hdfs
rpm should only require the same major version as common and hdfs
------------------------------------------------------------------

                 Key: MAPREDUCE-2688
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2688
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Owen O'Malley

The rpm should only require the same version of common and hdfs be installed.
Re: mappers
Look in the job history file. It has a line for each event of the job, including task start and finish.

-- Owen

On Jun 26, 2011, at 2:17 AM, Keren Ouaknine ker...@gmail.com wrote:
> Hello,
>
> I am looking for the actual number of mappers on each machine for a job. I
> know how to configure the max number (mapred.tasktracker.map.tasks.maximum
> in the mapred-site.xml file), but not the actual number of mappers that
> were running for a completed job. Any idea where I can find this data?
>
> Thanks,
> Keren
>
> --
> Keren Ouaknine
> Cell: +972 54 2565404
> Web: www.kereno.com
[jira] [Resolved] (MAPREDUCE-587) Stream test TestStreamingExitStatus fails with Out of Memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved MAPREDUCE-587.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 0.23.0

This is already committed to trunk.

> Stream test TestStreamingExitStatus fails with Out of Memory
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-587
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-587
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>         Environment: OS/X, 64-bit x86 iMac, 4GB RAM.
>            Reporter: Steve Loughran
>            Assignee: Amar Kamat
>            Priority: Minor
>             Fix For: 0.23.0
>         Attachments: MAPREDUCE-587-v1.0.patch, mr-587-yahoo-y20-v1.0.patch, mr-587-yahoo-y20-v1.1.patch
>
> contrib/streaming tests are failing a test with an Out of Memory error on
> an OS/X Mac - the same problem does not surface on Linux.
[jira] [Created] (MAPREDUCE-2506) Create a compatible interface for frameworks that need to clone MapReduce context objects.
Create a compatible interface for frameworks that need to clone MapReduce context objects.
-------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-2506
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2506
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Owen O'Malley

In 0.21 we moved the org.apache.hadoop.mapreduce context objects to interfaces. That made the APIs much better, but broke backwards compatibility for frameworks that need to clone them.
[jira] Created: (MAPREDUCE-2359) Distributed cache doesn't use non-default FileSystems correctly
Distributed cache doesn't use non-default FileSystems correctly
---------------------------------------------------------------

                 Key: MAPREDUCE-2359
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2359
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Owen O'Malley
            Assignee: Krishna Ramachandran
             Fix For: 0.20.100

We are passing fs.default.name as viewfs:/// in core-site.xml on the Oozie server. Our default namenode in the configuration is also viewfs:///. We are using an hdfs:// path in our application. It gives the following error:

IllegalArgumentException: Wrong FS: hdfs://nn/user/strat_ci/oozie-oozi/002-110217014830452-oozie-oozi-W/hadoop1--map-reduce/map-reduce-launcher.jar, expected: viewfs:/
[jira] Created: (MAPREDUCE-2360) Pig fails when using non-default FileSystem
Pig fails when using non-default FileSystem
-------------------------------------------

                 Key: MAPREDUCE-2360
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2360
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: client
            Reporter: Owen O'Malley
             Fix For: 0.20.100

The job client strips the file system from the user's job jar, which causes breakage when it isn't the default file system.
[jira] Created: (MAPREDUCE-2361) Distributed Cache is not adding files to class paths correctly
Distributed Cache is not adding files to class paths correctly
--------------------------------------------------------------

                 Key: MAPREDUCE-2361
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2361
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: task
            Reporter: Owen O'Malley
            Assignee: Chris Douglas

I am trying to add files to the classpath using DistributedCache.addFileToClassPath. If the file path is a relative path like /user/dir1/dir2/a.jar, everything is OK: if I try to get these files from the classpath using DistributedCache.getFileClassPaths, it returns the path correctly. However, if I use a path like hdfs://nn:7877/user/dir1/dir2/a.jar and try to get the classpath files using DistributedCache.getFileClassPaths, it returns 3 entries:

hdfs
//nn
7877/user/dir1/dir2/a.jar
[jira] Created: (MAPREDUCE-2362) Unit test failures: TestBadRecords and TestTaskTrackerMemoryManager
Unit test failures: TestBadRecords and TestTaskTrackerMemoryManager
-------------------------------------------------------------------

                 Key: MAPREDUCE-2362
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2362
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: test
            Reporter: Owen O'Malley
            Assignee: Greg Roelofs
             Fix For: 0.20.100

Fix unit-test failures: TestBadRecords (NPE due to rearranged MapTask code) and TestTaskTrackerMemoryManager (need hostname in output-string pattern).
[jira] Created: (MAPREDUCE-2363) Bad error messages for queues without acls
Bad error messages for queues without acls
------------------------------------------

                 Key: MAPREDUCE-2363
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2363
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: contrib/capacity-sched
            Reporter: Owen O'Malley
            Assignee: Dick King
             Fix For: 0.20.100

When a queue is built without any access rights, the error message is very bad.
[jira] Created: (MAPREDUCE-2364) Shouldn't hold lock on rjob while localizing resources.
Shouldn't hold lock on rjob while localizing resources.
--------------------------------------------------------

                 Key: MAPREDUCE-2364
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2364
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: tasktracker
    Affects Versions: 0.20.100
            Reporter: Owen O'Malley
            Assignee: Devaraj Das
             Fix For: 0.20.100

There is a deadlock while localizing resources on the TaskTracker.
[jira] Created: (MAPREDUCE-2365) Add counters for FileInputFormat (BYTES_READ) and FileOutputFormat (BYTES_WRITTEN)
Add counters for FileInputFormat (BYTES_READ) and FileOutputFormat (BYTES_WRITTEN)
-----------------------------------------------------------------------------------

                 Key: MAPREDUCE-2365
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2365
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Owen O'Malley

MAP_INPUT_BYTES and MAP_OUTPUT_BYTES will be computed using the difference between FileSystem counters before and after each next(K,V) and collect/write op. In case compression is being used, these counters will represent the compressed data sizes. The uncompressed size will not be available. This is not a direct back-port of 5710. (Counters will be computed in MapTask instead of in individual RecordReaders.)

0.20.100:
New API - MAP_INPUT_BYTES will be computed using this method.
Old API - MAP_INPUT_BYTES will remain unchanged.

0.23:
New API - MAP_INPUT_BYTES will be computed using this method.
Old API - MAP_INPUT_BYTES is likely to use this method.
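The before/after delta technique described here is simple but easy to misread, so here is a toy, self-contained illustration of it. Nothing below is actual MapTask code; the counter and record reader are stand-ins:

```java
import java.util.concurrent.atomic.AtomicLong;

public class CounterDeltaDemo {
  // Stand-in for the filesystem bytes-read counter.
  static final AtomicLong fsBytesRead = new AtomicLong();

  // Stand-in for RecordReader.next(K, V): pretend each record costs 128 bytes.
  static boolean nextRecord() {
    fsBytesRead.addAndGet(128);
    return true;
  }

  public static void main(String[] args) {
    long mapInputBytes = 0;
    for (int i = 0; i < 3; i++) {
      long before = fsBytesRead.get();
      nextRecord();
      // Charge only the bytes that this next() call consumed.
      mapInputBytes += fsBytesRead.get() - before;
    }
    System.out.println("MAP_INPUT_BYTES = " + mapInputBytes); // prints 384
  }
}
```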
[jira] Created: (MAPREDUCE-2366) TaskTracker can't retrieve stdout and stderr from web UI
TaskTracker can't retrieve stdout and stderr from web UI
--------------------------------------------------------

                 Key: MAPREDUCE-2366
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2366
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: tasktracker
            Reporter: Owen O'Malley
            Assignee: Dick King
             Fix For: 0.20.100

Problem where the task browser UI can't retrieve the stdxxx printouts of streaming jobs that abend in the unix code, in the common case where the containing job doesn't reuse JVMs.
[jira] Created: (MAPREDUCE-2355) Add an out of band heartbeat damper
Add an out of band heartbeat damper
-----------------------------------

                 Key: MAPREDUCE-2355
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2355
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: jobtracker
            Reporter: Owen O'Malley
            Assignee: Arun C Murthy

We should have a configurable knob to throttle how many out-of-band heartbeats are sent.
[jira] Created: (MAPREDUCE-2357) When extending inputsplit (non-FileSplit), all exceptions are ignored
When extending inputsplit (non-FileSplit), all exceptions are ignored
---------------------------------------------------------------------

                 Key: MAPREDUCE-2357
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2357
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: task
            Reporter: Owen O'Malley
            Assignee: Luke Lu
             Fix For: 0.20.100

If you're using a custom RecordReader/InputFormat setup and using an InputSplit that does NOT extend FileSplit, then any exceptions you throw in your RecordReader.nextKeyValue() function are silently ignored.
[jira] Created: (MAPREDUCE-2358) MapReduce assumes HDFS as the default filesystem
MapReduce assumes HDFS as the default filesystem
------------------------------------------------

                 Key: MAPREDUCE-2358
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2358
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Owen O'Malley
            Assignee: Krishna Ramachandran
             Fix For: 0.20.100

Mapred assumes hdfs as the default fs even when defined otherwise.
[jira] Created: (MAPREDUCE-2262) Capacity Scheduler unit tests fail with class not found
Capacity Scheduler unit tests fail with class not found
--------------------------------------------------------

                 Key: MAPREDUCE-2262
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2262
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: contrib/capacity-sched
            Reporter: Owen O'Malley
            Assignee: Owen O'Malley
             Fix For: 0.20.3

Currently the ivy.xml file for the capacity scheduler doesn't include commons-cli, leading to class-not-found exceptions.
Re: Where ask questions about MapReduce source code?
On Wed, Dec 22, 2010 at 3:29 PM, Pedro Costa psdc1...@gmail.com wrote:
> Hi,
>
> I would like to understand some parts of the MapReduce source code, and I
> don't know where to ask. Should I ask here?

Yes.
[jira] Created: (MAPREDUCE-2188) The new API MultithreadedMapper doesn't call the initialize method of the RecordReader
The new API MultithreadedMapper doesn't call the initialize method of the RecordReader
---------------------------------------------------------------------------------------

                 Key: MAPREDUCE-2188
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2188
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Owen O'Malley
            Assignee: Owen O'Malley
             Fix For: 0.22.0

The wrapping RecordReader in the MultithreadedMapper is never initialized. With HADOOP-6685, this becomes a problem because ReflectionUtils.copy requires a non-null configuration.
Re: Trunk build failing with ivy errors
On Wed, Nov 10, 2010 at 3:48 PM, Todd Lipcon t...@cloudera.com wrote:
> Tom has discovered that bumping the log4j version to 1.2.16 instead of
> 1.2.15 fixes the issue... should we just do that?

I think that sounds reasonable.

-- Owen
[jira] Resolved: (MAPREDUCE-2164) MapredTestDriver.java compilation fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved MAPREDUCE-2164.
--------------------------------------
    Resolution: Cannot Reproduce

It compiles for me at this point. If it still fails for you, please reopen.

> MapredTestDriver.java compilation fails on trunk
> -------------------------------------------------
>
>                 Key: MAPREDUCE-2164
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2164
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Giridharan Kesavan
>            Priority: Critical
>
> compile-mapred-test:
>     [mkdir] Created dir: /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/build/test/mapred/classes
>     [mkdir] Created dir: /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/build/test/mapred/testjar
>     [mkdir] Created dir: /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/build/test/mapred/testshell
>     [javac] Compiling 319 source files to /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/build/test/mapred/classes
>     [javac] /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/src/test/mapred/org/apache/hadoop/test/MapredTestDriver.java:21: cannot find symbol
>     [javac] symbol  : class TestSequenceFile
>     [javac] location: package org.apache.hadoop.io
>     [javac] import org.apache.hadoop.io.TestSequenceFile;
>     [javac]        ^
>     [javac] /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/src/test/mapred/org/apache/hadoop/test/MapredTestDriver.java:59: cannot find symbol
>     [javac] symbol  : class TestSequenceFile
>     [javac] location: class org.apache.hadoop.test.MapredTestDriver
>     [javac] pgd.addClass("testsequencefile", TestSequenceFile.class,
>     [javac]              ^
>     [javac] Note: Some input files use or override a deprecated API.
>     [javac] Note: Recompile with -Xlint:deprecation for details.
>     [javac] 2 errors
Re: hadoop.job.ugi backwards compatibility
On Sep 13, 2010, at 4:23 PM, Todd Lipcon wrote:
> I agree that keeping API compatibility for UGI was probably impossible, and
> respect that. But it would certainly be very easy to do a patch like the
> following:
>
>   JobClient(Configuration conf) {
>     if (conf.get("hadoop.job.ugi") != null && !UserGroupInformation.isSecurityEnabled()) {
>       LOG.warn("Stop being evil. Don't use hadoop.job.ugi! RAAWR");
>       UserGroupInformation.createRemoteUser(...).doAs() {
>         create proxy
>       }
>     } else {
>       create normal RPC proxy;
>     }
>   }

My problem is threefold:

1. It isn't one or two spots. It is a *lot* of spots. Doing it inconsistently would be far worse than useless.
2. Having two different authentication paths dramatically increases the chance for bugs.
3. The previously mentioned badness where the API semantics dramatically change with the value of a config variable that isn't there to enable backwards compatibility.

Furthermore, the upside is really small, consisting of only the users that have:

1. developed internal servers that handle multiple users,
2. on Hadoop 0.20,
3. never plan on turning on security,
4. are interested in moving to 0.21 or 0.22, and
5. aren't willing to do the straightforward fixes to their code.

-- Owen
Re: hadoop.job.ugi backwards compatibility
Moving the discussion over to the more appropriate mapreduce-dev.

On Mon, Sep 13, 2010 at 9:08 AM, Todd Lipcon t...@cloudera.com wrote:
> 1) Groups resolution happens on the server side, where it used to happen on
> the client. Thus, all Hadoop users must exist on the NN/JT machines in
> order for group mapping to succeed (or the user must write a custom group
> mapper).

There is a plugin that performs the group lookup. See HADOOP-4656. There is no requirement for having the user accounts on the NN/JT, although that is the easiest approach. It is not recommended that the users be allowed to log in.

I think it is important that turning security on and off doesn't drastically change the semantics or protocols. That will become much, much harder to support downstream.

> 2) The hadoop.job.ugi parameter is ignored - instead the user has to use
> the new UGI.createRemoteUser("foo").doAs() API, even in simple security.

User code that counts on hadoop.job.ugi working will be horribly broken once you turn on security. Turning security on and off should not involve testing all of your applications. It is unfortunate that we ever used the configuration value as the user, but continuing to support it will make our users' code much, much more brittle.

-- Owen
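For reference, a minimal sketch of the doAs API mentioned above; the username and path are made up:

```java
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class DoAsExample {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser("alice");
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
      public Void run() throws Exception {
        FileSystem fs = FileSystem.get(conf);
        // Everything in here runs as "alice", not as the process user.
        fs.listStatus(new Path("/user/alice"));
        return null;
      }
    });
  }
}
```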
Re: hadoop.job.ugi backwards compatibility
On Mon, Sep 13, 2010 at 10:05 AM, Todd Lipcon t...@cloudera.com wrote:
> This is not MR-specific, since the strangely named hadoop.job.ugi
> determines HDFS permissions as well.

Yeah, after I hit send, I realized that I should have used common-dev. This is really a dev issue.

> "or the user must write a custom group mapper" above refers to this plugin
> capability. But I think most users do not want to spend the time to write
> (or even set up) such a plugin beyond the default shell-based mapping
> service.

Sure, which is why it is easiest to just have the (hopefully disabled) user accounts on the JT/NN. Any install of 100+ nodes should be using HADOOP-6864 to avoid the fork in the JT/NN.

> As someone who spends an awful lot of time doing downstream support of lots
> of different clusters, I actually disagree.

Normal applications never need to do doAs. They run as the default user. This only comes up in servers that deal with multiple users. In *that* context, it sucks having servers that only work in non-secure mode. If some server X only works without security, that sucks. Doing doAs isn't harder, it is just different. Having two different semantic models *will* cause lots of grief.

-- Owen
Re: hadoop.job.ugi backwards compatibility
On Mon, Sep 13, 2010 at 11:10 AM, Todd Lipcon t...@cloudera.com wrote:
> Yep, but there are plenty of 10 node clusters out there that do important
> work at small startups or single-use-case installations, too. We need to
> provide scalability and security features that work for the 100+ node
> clusters but also not leave the beginners in the dust.

10 node clusters are an important use case, but creating the user accounts on those clusters is very easy because of the few users. Furthermore, if the accounts aren't there, it just means the users have no groups, which for a single-use system with security turned off isn't the end of the world.

> But I think there are plenty of people out there who have built small
> webapps, shell scripts, cron jobs, etc that use hadoop.job.ugi on some
> shared account to impersonate other users.

I'd be surprised. At Yahoo, the primary problem came with people screen-scraping the JobTracker http. With security turned off that isn't an issue.

Again, it isn't hard; just the evolving interface of UserGroupInformation changed. With security, we tried really hard to maintain backwards compatibility and succeeded for the vast (99%+) majority of the users.

> Perhaps I am estimating incorrectly - that's why I wanted this discussion
> on a user-facing list rather than a dev-facing list. Obviously the pointer
> is there for them to follow into the rabbit hole of the dev lists. *grin*
>
> Another example use case that I do a lot on non-secure clusters is:
>
>   hadoop fs -Dhadoop.job.ugi=hadoop,hadoop <something I want to do as a superuser>
>
> The permissions model we have in 0.20 obviously isn't secure, but it's
> nice to avoid accidental mistakes, and making it easy to sudo like that is
> handy.

It might make sense to add a new switch (-user?) to hadoop fs that does a doAs before doing the shell command. You could even make it fancy and try to be a proxy user if security is turned on.

> Regardless of our particular opinions, isn't our policy that we cannot
> break API compatibility between versions without a one-version deprecation
> period?

There wasn't a way to keep UGI stable. It was a broken design before the security work. It is marked evolving so we try to minimize breakage, but it isn't prohibited.

-- Owen
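A sketch of what the hypothetical -user switch could do internally. The method and class names are made up, but createRemoteUser, createProxyUser, getLoginUser, and isSecurityEnabled are the real UserGroupInformation calls:

```java
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public class SudoHelper {
  /** Pick an identity for a hypothetical "hadoop fs -user <name>" switch. */
  public static UserGroupInformation chooseUser(String user) throws IOException {
    if (UserGroupInformation.isSecurityEnabled()) {
      // With Kerberos on, impersonation has to go through the proxy-user
      // mechanism, and the real (logged-in) user needs proxy privileges
      // configured on the cluster.
      return UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser());
    }
    // With security off, a simple remote user is enough.
    return UserGroupInformation.createRemoteUser(user);
  }
}
```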
[jira] Resolved: (MAPREDUCE-2046) A input split cannot be less than a dfs block
[ https://issues.apache.org/jira/browse/MAPREDUCE-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved MAPREDUCE-2046.
--------------------------------------
    Resolution: Cannot Reproduce

This isn't true. InputSplits can be arbitrarily sized by the InputFormat. In mapred.TextInputFormat, if you set the number of maps very high, you will generate a large number of maps. In the new mapreduce.lib.input.TextInputFormat, there are knobs that set the minimum and maximum split size.

> A input split cannot be less than a dfs block
> ----------------------------------------------
>
>                 Key: MAPREDUCE-2046
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2046
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Namit Jain
>
> I ran into this while testing some hive features. Whether we use
> hiveinputformat or combinehiveinputformat, a split cannot be less than a
> dfs block size. This is a problem if we want to increase the block size for
> older data to reduce memory consumption for the name node. It would be
> useful if the input split was independent of the dfs block size.
[jira] Resolved: (MAPREDUCE-2007) Is it possible that use ArrayList or other type instead Iterable when use reduce(Object, Iterable, Context)?
[ https://issues.apache.org/jira/browse/MAPREDUCE-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved MAPREDUCE-2007. -- Resolution: Won't Fix The framework can't assume that all of the values fit into memory, so it is not possible to make the API require a List object. If you are just counting values, you should consider replacing the value with an integer and implement a combiner that adds the counts together. It will be much more efficient. Look at the word count example for an example of how to do this. If you just need the first N values, just iterate through the values you need and return from the reduce method. There is no need to exhaust the iterator. Is it possible that use ArrayList or other type instead Iterable when use reduce(Object, Iterable, Context)? -- Key: MAPREDUCE-2007 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2007 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.20.2 Reporter: Hui Wen Han Fix For: 0.20.2 1) Sometimes We only need get the elements count of the input values of Reducer task, but we have to iterate all the input values to calculate it. 2) Sometimes We only need get a few elements (for example top n,last n ,or random ) from the input values of Reducer task, if it can use ArrayList or other type instead Iterable when use reduce(Object, Iterable, Context),it 's more conveniency. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
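The counting pattern described above, as a minimal sketch modeled on the word count example (new API; class name and job wiring are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // The map emits (key, 1). Registered as both combiner and reducer, this
    // class adds partial counts map-side, so the reduce never has to hold
    // (or even see) all of the raw values.
    public class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      private final IntWritable result = new IntWritable();
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
          sum += v.get();
        }
        result.set(sum);
        context.write(key, result);
      }
    }

    // job.setCombinerClass(CountReducer.class);
    // job.setReducerClass(CountReducer.class);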
[jira] Created: (MAPREDUCE-1786) Add method to support pre-partitioned data
Add method to support pre-partitioned data -- Key: MAPREDUCE-1786 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1786 Project: Hadoop Map/Reduce Issue Type: Improvement Components: task Reporter: Owen O'Malley Assignee: Owen O'Malley There are some applications where the map wants to partition the data itself. This happens in Pipes, if the user has a C++ partitioner. It would make sense to support it in streaming too. There is also use case where the Java partitioner needs the context object to update counters, etc. This jira is only about adding the method to the mapreduce Java API. The Pipes interface can be updated in a follow up Jira. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Un-deprecate the old MapReduce API?
On the various pieces, I think:

0.20: -0 for removing the deprecation, +1 for improving the deprecation message with links to the corresponding class.

0.21: new core api should be stable except for Job and Cluster; new library code should be evolving. -1 for removing the deprecation, we need to

0.22: all of the new api should be stable and the old api deprecated.

> Currently there is almost no way to write a moderately complex MR job that doesn't spew deprecation warnings.

That is false in 0.21. -- Owen
[jira] Created: (MAPREDUCE-1669) The imported JSON credentials should support binary secrets.
The imported JSON credentials should support binary secrets. Key: MAPREDUCE-1669 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1669 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Owen O'Malley Currently, we support adding a file with secrets to a job. It can either be in binary or JSON, but the JSON format assumes that all of the secrets are UTF-8, which is often false. We should pick a format that allows binary data to be included. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
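One way to satisfy this is to base64-encode binary secrets before embedding them in JSON; an illustrative sketch (not necessarily the format this jira will settle on), using the commons-codec library Hadoop already ships:

    import org.apache.commons.codec.binary.Base64;

    byte[] secret = new byte[] { 0x00, (byte) 0xFF, 0x42 };   // arbitrary binary secret
    String jsonSafe = Base64.encodeBase64String(secret);      // pure ASCII, safe in JSON
    byte[] roundTripped = Base64.decodeBase64(jsonSafe);      // original bytes recovered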
[ANNOUNCEMENT] Pig & Hadoop Contributors Workshop at Yahoo!
Hello Hadoop Contributors, The Yahoo Hadoop Development team would like to invite you to a Contributors Workshop for Hadoop Core (HDFS & Map-Reduce) and another for Pig on the day following the Hadoop Summit. The purpose of the workshops is to collectively discuss challenges, concerns and future ideas around Hadoop and Pig technologies.
When: June 30th 2010 @ 10:00 am – 3:00 pm
Where: Yahoo! Building C, Classroom 5 @ 701 First Avenue, Sunnyvale CA 94089
If you have any suggestions for agenda items, please suggest them on the relevant developers list. Owen O'Malley has volunteered to help organize the Hadoop Core meeting and Alan Gates is doing the same for Pig. Please RSVP by sending an email to hadoopcontributorr...@yahoo-inc.com before May 30th if you plan to attend. See you all at the Hadoop Summit – June 29th, http://www.hadoopsummit.org/ Looking forward to meeting more of you! Eric Baldeschwieler Owen O'Malley PS We would be happy to provide space for any of the other Hadoop sub-projects as well! If you are interested in organizing such a workshop, please email us at hadoopcontributorr...@yahoo-inc.com with WORKSHOP ORGANIZER (project) in the subject line. I'll send a separate email to their dev lists with this invitation also.
Re: [ANNOUNCEMENT] Pig & Hadoop Contributors Workshop at Yahoo!
On Mar 25, 2010, at 10:20 AM, Owen O'Malley wrote:
> Please RSVP by sending an email to hadoopcontributorr...@yahoo-inc.com before May 30th if you plan to attend.
The proper email is: hadoopcontribu...@yahoo-inc.com. Sorry for the confusion. -- Owen
[jira] Created: (MAPREDUCE-1566) Need to add a mechanism to import tokens and secrets into a submitted job.
Need to add a mechanism to import tokens and secrets into a submitted job. -- Key: MAPREDUCE-1566 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1566 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.22.0 We need to include tokens and secrets into a submitted job. I propose adding a configuration attribute that when pointed at a token storage file will include the tokens and secrets from that token storage file. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
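Submission would then look something like this sketch (the attribute name mapreduce.job.credentials.binary is my assumption for illustration, as is the path):

    import org.apache.hadoop.conf.Configuration;

    // Inside a job driver: point the job at a token storage file whose
    // tokens and secrets get merged into the job at submission time.
    Configuration conf = new Configuration();
    conf.set("mapreduce.job.credentials.binary", "/user/owen/job.tokens");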
[jira] Created: (MAPREDUCE-1567) Sharing Credentials between JobConfs leads to unintentional sharing of credentials
Sharing Credentials between JobConfs leads to unintentional sharing of credentials -- Key: MAPREDUCE-1567 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1567 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.22.0 Currently, if code does new JobConf(jobConf), it will share the Credentials. That leads to unintentional sharing. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
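A sketch of the unintentional sharing (alias and secret are illustrative; whether the copy aliases or clones the Credentials is exactly what this jira is about):

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;

    JobConf first = new JobConf();
    JobConf second = new JobConf(first);   // copy constructor

    // If the copy shares the Credentials object instead of cloning it,
    // a secret added to one job silently shows up in the other:
    second.getCredentials().addSecretKey(new Text("alias"), "secret".getBytes());
    // first.getCredentials().getSecretKey(new Text("alias")) is now non-null too.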
[jira] Created: (MAPREDUCE-1528) TokenStorage should not be static
TokenStorage should not be static - Key: MAPREDUCE-1528 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1528 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Owen O'Malley Currently, TokenStorage is a singleton. This doesn't work for some use cases, such as Oozie. I think that each Job should have a TokenStorage that is associated with it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1515) need to pass down java5 and forrest home variables
need to pass down java5 and forrest home variables -- Key: MAPREDUCE-1515 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1515 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.22.0 Attachments: m-1515.patch Currently, the build script doesn't pass down the variables for java5 and forrest, so the build breaks unless they are on the command line. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
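Until the build forwards them, the workaround is to pass the properties explicitly on the command line (property names as used by the Hadoop docs build of this era; the paths are placeholders):

    ant docs -Djava5.home=/usr/local/jdk1.5 -Dforrest.home=/usr/local/apache-forrest-0.8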
[jira] Created: (MAPREDUCE-1503) Push HADOOP-6551 into MapReduce
Push HADOOP-6551 into MapReduce --- Key: MAPREDUCE-1503 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1503 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Owen O'Malley We need to throw readable exceptions instead of returning false. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
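The HADOOP-6551 pattern, roughly (a hypothetical helper for illustration):

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Instead of letting a bare "false" propagate with no context, fail
    // with an exception that says what operation failed and on what paths.
    static void renameOrThrow(FileSystem fs, Path from, Path to) throws IOException {
      if (!fs.rename(from, to)) {
        throw new IOException("Could not rename " + from + " to " + to);
      }
    }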
[jira] Created: (MAPREDUCE-1470) Move Delegation token into Common so that we can use it for MapReduce also
Move Delegation token into Common so that we can use it for MapReduce also -- Key: MAPREDUCE-1470 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1470 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Owen O'Malley Assignee: Owen O'Malley We need to update one reference for map/reduce when we move the hdfs delegation tokens. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1462) Enable context-specific and stateful serializers in MapReduce
Enable context-specific and stateful serializers in MapReduce - Key: MAPREDUCE-1462 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1462 Project: Hadoop Map/Reduce Issue Type: New Feature Components: task Reporter: Owen O'Malley Assignee: Owen O'Malley Although the current serializer framework is powerful, within the context of a job it is limited to picking a single serializer for a given class. Additionally, Avro generic serialization can make use of additional configuration/state such as the schema. (Most other serialization frameworks including Writable, Jute/Record IO, Thrift, Avro Specific, and Protocol Buffers only need the object's class name to deserialize the object.) With the goal of keeping the easy things easy and maintaining backwards compatibility, we should be able to allow applications to use context specific (eg. map output key) serializers in addition to the current type based ones that handle the majority of the cases. Furthermore, we should be able to support serializer specific configuration/metadata in a type safe manner without cluttering up the base API with a lot of new methods that will confuse new users. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1440) MapReduce should use the short form of the user names
MapReduce should use the short form of the user names - Key: MAPREDUCE-1440 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1440 Project: Hadoop Map/Reduce Issue Type: Improvement Components: security Reporter: Owen O'Malley Fix For: 0.22.0 To minimize disruption on MapReduce, we should use the local names (ie. omalley) rather than the long names (ie. omal...@apache.org) as the basis for the username in MapReduce. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
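The long-to-short mapping is driven by configurable rules from the security work; a sketch of the rule syntax (the realm and rule here are illustrative):

    import org.apache.hadoop.conf.Configuration;

    Configuration conf = new Configuration();
    // Map omalley@APACHE.ORG -> omalley, then fall back to the default rules.
    conf.set("hadoop.security.auth_to_local",
        "RULE:[1:$1@$0](.*@APACHE\\.ORG)s/@.*//\n" +
        "DEFAULT");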
[jira] Resolved: (MAPREDUCE-1385) Make changes to MapReduce for the new UserGroupInformation APIs (HADOOP-6299)
[ https://issues.apache.org/jira/browse/MAPREDUCE-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved MAPREDUCE-1385. -- Resolution: Fixed Hadoop Flags: [Incompatible change, Reviewed] I just committed this. Thanks, Devaraj! Make changes to MapReduce for the new UserGroupInformation APIs (HADOOP-6299) - Key: MAPREDUCE-1385 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1385 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.22.0 Attachments: mr-6299.3.patch, mr-6299.7.patch, mr-6299.8.patch, mr-6299.patch This is about moving the MapReduce code to use the new UserGroupInformation API as described in HADOOP-6299. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Moving HDFS raid package from HDFS repository to MAPREDUCE repository
The issue is that we need to avoid loops in the project dependencies. Therefore, the order has to go: Common -> HDFS -> MapReduce. The problem is that RAID needs MapReduce and therefore can't be put into HDFS. -- Owen
[jira] Reopened: (MAPREDUCE-1126) shuffle should use serialization to get comparator
[ https://issues.apache.org/jira/browse/MAPREDUCE-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley reopened MAPREDUCE-1126: -- -1 to this massive API change without much more dialog. The scope of the patch was much larger than the description. shuffle should use serialization to get comparator -- Key: MAPREDUCE-1126 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1126 Project: Hadoop Map/Reduce Issue Type: Improvement Components: task Reporter: Doug Cutting Assignee: Aaron Kimball Fix For: 0.22.0 Attachments: MAPREDUCE-1126.2.patch, MAPREDUCE-1126.3.patch, MAPREDUCE-1126.4.patch, MAPREDUCE-1126.5.patch, MAPREDUCE-1126.6.patch, MAPREDUCE-1126.patch Currently the key comparator is defined as a Java class. Instead we should use the Serialization API to create key comparators. This would permit, e.g., Avro-based comparators to be used, permitting efficient sorting of complex data types without having to write a RawComparator in Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
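For context, the mechanism being generalized: today a sort order is a registered Java class, typically built on WritableComparator; a minimal sketch of the status quo (reversed Text order, wiring shown for the new API):

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparator;

    public class ReverseTextComparator extends WritableComparator {
      public ReverseTextComparator() {
        super(Text.class, true);   // true: create instances for the default compare
      }
      @Override
      public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return -super.compare(b1, s1, l1, b2, s2, l2);   // invert the natural order
      }
    }

    // job.setSortComparatorClass(ReverseTextComparator.class);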
[jira] Created: (MAPREDUCE-1274) The completed job web ui urls include full path names to the local file system on the JobTracker.
The completed job web ui urls include full path names to the local file system on the JobTracker. - Key: MAPREDUCE-1274 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1274 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Affects Versions: 0.21.0 Reporter: Owen O'Malley Priority: Blocker Fix For: 0.21.0 Currently, the web ui for MapReduce in 0.21.0-dev includes a path to a local file in the url: http://jt.foo.com:50030/jobdetailshistory.jsp?jobid=job_200912012129_0001&logFile=file%3A%2Fopt%2Flocal%2Fowen%2Fhadoop%2Frun%2Flogs%2Fhistory%2Fdone%2Fjob_200912012129_0001_oom This implies a security bug where the user uses logFile=/etc/passwd or some other annoying trick. I suspect the answer is applying MAPREDUCE-1185 back to 0.21. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Reopened: (MAPREDUCE-1244) eclipse-plugin fails with missing dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley reopened MAPREDUCE-1244: -- We need to apply this to 0.21 also. eclipse-plugin fails with missing dependencies -- Key: MAPREDUCE-1244 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1244 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.22.0 Reporter: Giridharan Kesavan Assignee: Giridharan Kesavan Fix For: 0.21.0, 0.22.0 Attachments: mapred-1244.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1241) JobTracker should not crash when mapred-queues.xml does not exist
JobTracker should not crash when mapred-queues.xml does not exist - Key: MAPREDUCE-1241 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1241 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Owen O'Malley Priority: Blocker Fix For: 0.21.0, 0.22.0 Currently, if you bring up the JobTracker on an old configuration directory, it gets a NullPointerException looking for the mapred-queues.xml file. It should just assume a default queue and continue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Ideas for dynamic change reducer task number ?
On Nov 22, 2009, at 4:48 PM, Jeff Zhang wrote:
> My concern is that it is just like hard code to use conf.setNumReduceTasks on the configuration. It is not flexible, so my idea is that adding an interface to change the reducer number dynamically according the different size of input data set.
You misunderstand. I meant doing something like:

  public class MyInputFormat extends TextInputFormat {
    public InputSplit[] getSplits(JobConf conf, int numSplits) throws IOException {
      InputSplit[] result = ...;   // compute the splits
      long size = ...;             // compute total size of input
      conf.setNumReduceTasks((int) Math.max(6, size / (10L << 30)));  // one reduce per 10G
      return result;
    }
  }

I haven't checked the code to make sure it will work, but I believe it will. -- Owen
Re: Ideas for dynamic change reducer task number ?
I'd suggest trying to do conf.setNumReduceTasks on the configuration passed to the InputFormat in getSplits. It will probably just work. -- Owen
[jira] Resolved: (MAPREDUCE-1091) TaskTrackers only work with same build as the JobTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved MAPREDUCE-1091. -- Resolution: Won't Fix TaskTrackers only work with same build as the JobTracker Key: MAPREDUCE-1091 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1091 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.21.0 Reporter: Arun C Murthy Fix For: 0.21.0 Currently tasktrackers check to ensure that they are the same build as the JobTracker and bail-out if not. This is too restrictive - in the past we've had similar complaints: HADOOP-5203. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Welcome Konstantin Boudnik as a qa committer!
The Hadoop PMC has voted to make Cos a QA committer on Common, HDFS, and MapReduce. I'd like to welcome Cos as the newest committer. -- Owen
[jira] Resolved: (MAPREDUCE-1014) After the 0.21 branch, MapReduce trunk doesn't compile
[ https://issues.apache.org/jira/browse/MAPREDUCE-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved MAPREDUCE-1014. -- Resolution: Fixed I updated the common and hdfs jars with the current ones. After the 0.21 branch, MapReduce trunk doesn't compile -- Key: MAPREDUCE-1014 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1014 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0 Reporter: Devaraj Das Assignee: Ravi Gummadi Priority: Blocker Fix For: 0.22.0 When ant is run, the build fails with compilation problems. The first of that is:

compile-mapred-classes:
  [taskdef] log4j:ERROR Could not instantiate class [org.apache.hadoop.metrics.jvm.EventCounter].
  [taskdef] java.lang.ClassNotFoundException: org.apache.hadoop.metrics.jvm.EventCounter
  [taskdef] at org.apache.tools.ant.AntClassLoader.findClassInComponents(AntClassLoader.java:1383)
  [taskdef] at org.apache.tools.ant.AntClassLoader.findClass(AntClassLoader.java:1324)
  [taskdef] at org.apache.tools.ant.AntClassLoader.loadClass(AntClassLoader.java:1072)
  [taskdef] at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
  [taskdef] at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:402)
  [taskdef] at java.lang.Class.forName0(Native Method)
  [taskdef] at java.lang.Class.forName(Class.java:169)
  [taskdef] at org.apache.log4j.helpers.Loader.loadClass(Loader.java:179)
  [taskdef] at org.apache.log4j.helpers.OptionConverter.instantiateByClassName(OptionConverter.java:320)
  [taskdef] at org.apache.log4j.helpers.OptionConverter.instantiateByKey(OptionConverter.java:121)
  [taskdef] at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:664)
  [taskdef] at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:647)
  [taskdef] at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:544)
  [taskdef] at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:440)
  [taskdef] at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:476)

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1026) Shuffle should be secure
Shuffle should be secure Key: MAPREDUCE-1026 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026 Project: Hadoop Map/Reduce Issue Type: Improvement Components: security Reporter: Owen O'Malley Assignee: Devaraj Das Since the user's data is available via http from the TaskTrackers, we should require a job-specific secret to access it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
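The natural construction is an HMAC over the request computed with a per-job secret; a sketch of the idea (not necessarily what the eventual patch will do):

    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;
    import org.apache.commons.codec.binary.Base64;

    // The reduce signs its shuffle request with the job secret; the
    // TaskTracker recomputes the HMAC and rejects mismatches.
    static String signShuffleUrl(byte[] jobSecret, String url) throws Exception {
      Mac mac = Mac.getInstance("HmacSHA1");
      mac.init(new SecretKeySpec(jobSecret, "HmacSHA1"));
      return Base64.encodeBase64String(mac.doFinal(url.getBytes("UTF-8")));
    }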
Re: [VOTE] Commit MAPREDUCE-728 to Hadoop 0.21
On Sep 18, 2009, at 10:24 PM, Hong Tang wrote: Given the circumstances, I would like to request a vote to commit MAPREDUCE-728 to Hadoop 0.21. Mumak has already found a couple bugs in map/reduce and promises to find more. I think this is a good low-risk addition to 0.21. +1 -- Owen
[jira] Created: (MAPREDUCE-1016) Make the format of the Job History be JSON instead of Avro binary
Make the format of the Job History be JSON instead of Avro binary - Key: MAPREDUCE-1016 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1016 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Owen O'Malley Fix For: 0.21.0, 0.22.0 I forgot that one of the features that would be nice is to offload the job history display from the JobTracker. That will be a lot easier if the job history is stored in JSON. Therefore, I think we should change the storage now to prevent incompatibilities later. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
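With Avro the change is mostly a matter of which encoder is used against the same schema; a sketch in the modern Avro API (the 2009-era API spelled these factories differently):

    import java.io.ByteArrayOutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.Encoder;
    import org.apache.avro.io.EncoderFactory;

    // Same schema, same datum writer; only the encoder differs.
    static byte[] historyEventToJson(Schema schema, GenericRecord event) throws Exception {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      Encoder enc = EncoderFactory.get().jsonEncoder(schema, out);  // vs. binaryEncoder(out, null)
      new GenericDatumWriter<GenericRecord>(schema).write(event, enc);
      enc.flush();
      return out.toByteArray();
    }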
Re: branching mapred
On Sep 15, 2009, at 9:31 AM, Steve Loughran wrote: I've created a little branch where I've synced up my lifecycle-aware services with the moved bits http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MAPREDUCE-233/ +1
[jira] Created: (MAPREDUCE-954) The new interface's Context objects should be interfaces
The new interface's Context objects should be interfaces Key: MAPREDUCE-954 URL: https://issues.apache.org/jira/browse/MAPREDUCE-954 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Reporter: Owen O'Malley Fix For: 0.21.0 When I was doing HADOOP-1230, I was persuaded to make the Context objects as classes. I think that was a serious mistake. It caused a lot of information leakage into the public classes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
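The leakage point, sketched with hypothetical shapes (not the real MapReduce signatures):

    // As a class, internal state becomes part of the public surface and
    // every subclass inherits and depends on it:
    public abstract class MapContextAsClass<KEYIN, VALUEIN> {
      protected Object reader;   // implementation detail, now exposed
      public abstract KEYIN getCurrentKey();
    }

    // As an interface, only the contract is public; the implementation
    // behind it can change freely:
    public interface MapContextAsInterface<KEYIN, VALUEIN> {
      KEYIN getCurrentKey();
      VALUEIN getCurrentValue();
    }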
[jira] Resolved: (MAPREDUCE-421) mapred pipes might return exit code 0 even when failing
[ https://issues.apache.org/jira/browse/MAPREDUCE-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved MAPREDUCE-421. - Resolution: Fixed Hadoop Flags: [Reviewed] I realized that this was difficult to test. I just committed this. Thanks, Christian! mapred pipes might return exit code 0 even when failing --- Key: MAPREDUCE-421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-421 Project: Hadoop Map/Reduce Issue Type: Bug Components: pipes Reporter: Christian Kunz Assignee: Christian Kunz Fix For: 0.20.1 Attachments: MAPREDUCE-421.patch up to hadoop 0.18.3 org.apache.hadoop.mapred.JobShell ensured that 'hadoop jar' returns non-zero exit code when the job fails. This is no longer true after moving this to org.apache.hadoop.util.RunJar. Pipes jobs submitted through cli never returned proper exit code. The main methods in org.apache.hadoop.util.RunJar. and org.apache.hadoop.mapred.pipes.Submitter should be modified to return an exit code similar to how org.apache.hadoop.mapred.JobShell did it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
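The fix pattern, sketched with ToolRunner (MyTool and the job wiring are illustrative; the point is that main exits non-zero when the job fails):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyTool extends Configured implements Tool {
      public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "example");
        // ... set input/output paths, mapper, reducer ...
        return job.waitForCompletion(true) ? 0 : 1;   // non-zero on failure
      }
      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyTool(), args));
      }
    }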
[jira] Created: (MAPREDUCE-917) Remove getInputCounter and getOutputCounter from Contexts
Remove getInputCounter and getOutputCounter from Contexts - Key: MAPREDUCE-917 URL: https://issues.apache.org/jira/browse/MAPREDUCE-917 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Affects Versions: 0.21.0 Reporter: Owen O'Malley Assignee: Amareshwari Sriramadasu Priority: Blocker Fix For: 0.21.0 The getInputCounter and getOutputCounter methods need to be removed from the new mapreduce APIs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-693) Conf files not moved to done subdirectory after JT restart
[ https://issues.apache.org/jira/browse/MAPREDUCE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved MAPREDUCE-693. - Resolution: Cannot Reproduce Fix Version/s: (was: 0.20.1) Conf files not moved to done subdirectory after JT restart Key: MAPREDUCE-693 URL: https://issues.apache.org/jira/browse/MAPREDUCE-693 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.1 Reporter: Ramya R Priority: Minor Attachments: MAPREDUCE-693-v1.1-branch-0.20.patch, MAPREDUCE-693-v1.2-branch-0.20.patch After MAPREDUCE-516, when a job is submitted and the JT is restarted (before job files have been written) and the job is killed after recovery, the conf files fail to be moved to the done subdirectory. The exact scenario to reproduce this issue is:
* Submit a job
* Restart JT before anything is written to the job files
* Kill the job
* The old conf files remain in the history folder and fail to be moved to done subdirectory
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-777) A method for finding and tracking jobs from the new API
A method for finding and tracking jobs from the new API --- Key: MAPREDUCE-777 URL: https://issues.apache.org/jira/browse/MAPREDUCE-777 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Owen O'Malley We need to create a replacement interface for the JobClient API in the new interface. In particular, the user needs to be able to query and track jobs that were launched by other processes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
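A sketch of what such a client could look like, using the Cluster/JobStatus shapes that later releases settled on (method names indicative, not settled by this jira):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Cluster;
    import org.apache.hadoop.mapreduce.JobStatus;

    // e.g., inside a monitoring tool's main():
    Cluster cluster = new Cluster(new Configuration());
    // Query and track jobs launched by other processes.
    for (JobStatus status : cluster.getAllJobStatuses()) {
      System.out.println(status.getJobID() + " " + status.getState());
    }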
[jira] Reopened: (MAPREDUCE-716) org.apache.hadoop.mapred.lib.db.DBInputformat not working with oracle
[ https://issues.apache.org/jira/browse/MAPREDUCE-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley reopened MAPREDUCE-716: - Assignee: evanand Sorry, I thought the other jira was still open. org.apache.hadoop.mapred.lib.db.DBInputformat not working with oracle - Key: MAPREDUCE-716 URL: https://issues.apache.org/jira/browse/MAPREDUCE-716 Project: Hadoop Map/Reduce Issue Type: Bug Environment: Java 1.6, HAdoop0.19.0, Linux..Oracle, Reporter: evanand Assignee: evanand Attachments: HADOOP-5482.20-branch.patch, HADOOP-5482.patch, HADOOP-5482.trunk.patch Original Estimate: 24h Remaining Estimate: 24h org.apache.hadoop.mapred.lib.db.DBInputformat not working with oracle. The out of the box implementation of the Hadoop is working properly with mysql/hsqldb, but NOT with oracle. Reason is DBInputformat is implemented with mysql/hsqldb specific query constructs like LIMIT, OFFSET. FIX: building a database provider specific logic based on the database providername (which we can get using connection). I HAVE ALREADY IMPLEMENTED IT FOR ORACLE...READY TO CHECK_IN CODE -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
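Concretely, the same page of rows needs different SQL per provider (illustrative queries; Oracle of this era has no LIMIT/OFFSET):

    // MySQL / HSQLDB style paging, as DBInputFormat generates today:
    String mysql =
        "SELECT id, name FROM employees ORDER BY id LIMIT 50 OFFSET 100";

    // Oracle equivalent using ROWNUM, which a provider-specific
    // record reader would need to generate instead:
    String oracle =
        "SELECT * FROM (SELECT a.*, ROWNUM rnum FROM"
        + " (SELECT id, name FROM employees ORDER BY id) a"
        + " WHERE ROWNUM <= 150) WHERE rnum > 100";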
[jira] Created: (MAPREDUCE-726) Move the mapred script to map/reduce
Move the mapred script to map/reduce Key: MAPREDUCE-726 URL: https://issues.apache.org/jira/browse/MAPREDUCE-726 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Owen O'Malley Assignee: Dick King The mapred script should be moved to mapreduce from Common. This is the parallel of HADOOP-6123. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-712) TextWritter example is CPU bound!!
[ https://issues.apache.org/jira/browse/MAPREDUCE-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved MAPREDUCE-712. - Resolution: Invalid 16 maps on 8 cpus running gzip is expected to completely saturate cpu. This is not a bug!!! Also check to see if you were using the native codec. If you are using the Java codec, it will be very slow and cpu bound. TextWritter example is CPU bound!! -- Key: MAPREDUCE-712 URL: https://issues.apache.org/jira/browse/MAPREDUCE-712 Project: Hadoop Map/Reduce Issue Type: Improvement Components: task Affects Versions: 0.20.1, 0.21.0 Environment: ~200 nodes cluster Each node has the following configuration: Processors: 2 x Xeon L5420 2.50GHz (8 cores) - Harpertown C0, 64-bit, quad-core (8 CPUs) 4 Disks 16 GB RAM Linux 2.6 Hadoop version: trunk Reporter: Khaled Elmeleegy Running the RandomTextWritter example job (from the examples jar) pegs the machines' CPUs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Reopened: (MAPREDUCE-712) TextWritter example is CPU bound!!
[ https://issues.apache.org/jira/browse/MAPREDUCE-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley reopened MAPREDUCE-712: - I notice now that you didn't have compression. I wonder how much time you were spending in gc with such small heaps. That might explain the cpu load. TextWritter example is CPU bound!! -- Key: MAPREDUCE-712 URL: https://issues.apache.org/jira/browse/MAPREDUCE-712 Project: Hadoop Map/Reduce Issue Type: Improvement Components: task Affects Versions: 0.20.1, 0.21.0 Environment: ~200 nodes cluster Each node has the following configuration: Processors: 2 x Xeon L5420 2.50GHz (8 cores) - Harpertown C0, 64-bit, quad-core (8 CPUs) 4 Disks 16 GB RAM Linux 2.6 Hadoop version: trunk Reporter: Khaled Elmeleegy Running the RandomTextWritter example job (from the examples jar) pegs the machines' CPUs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: WELCOME to mapreduce-dev@hadoop.apache.org
On Jul 5, 2009, at 1:23 PM, shruti jain wrote: hi everyone, I am trying to do svn checkout with ssh: svn checkout svn+ssh://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-common-trunk But it asks for a password. What should I do ? Use http instead of svn+ssh. -- Owen
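That is, the same path checked out over plain http:

    svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-common-trunk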