Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)
+1. From a user's viewpoint, the recent improvements to test-patch have made my work really efficient: quick feedback by skipping unnecessary tests, automated build environment setup via Docker support, automated patch download from JIRA, automated shellcheck and whitespace checking, and so on. I believe it is worth spreading these ideas, as a TLP, to other projects facing the same problems, such as a long QA process.

2015-06-16 15:08 GMT+09:00 Chris Douglas cdoug...@apache.org: +1 A separate project sounds great. It'd be great to have more standard tooling across the ecosystem. As a practical matter, how should projects consume releases? -C

On Mon, Jun 15, 2015 at 4:47 PM, Sean Busbey bus...@cloudera.com wrote: Oof. I had meant to push on this again, but life got in the way and now the June board meeting is upon us. Sorry, everyone. In the event that this ends up contentious, hopefully one of the copied communities can give us a branch to work in. I know everyone is busy, so here's the short version of this email: I'd like to move some of the code currently in Hadoop (test-patch) into a new TLP focused on QA tooling. I'm not sure what the best format for priming this conversation is. ORC filled in the incubator project proposal template, but I'm not sure how much that confused the issue. So to start, I'll just write what I'm hoping we can accomplish in general terms here. All software development projects that are community-based (that is, accepting outside contributions) face a common QA problem for vetting incoming contributions. Hadoop is fortunate enough to be sufficiently popular that the weight of the problem drove tool development (i.e., test-patch). That tool is generalizable enough that a bunch of other TLPs have adopted their own forks. Unfortunately, in most projects this kind of QA work is an enabler rather than a primary concern, so the tooling is often worked on ad hoc and few shared improvements happen across projects.
Since the tooling itself is never a primary concern, any progress made is rarely reused outside of ASF projects. Over the last couple of months a few of us have been working on generalizing the tooling present in the Hadoop code base (because it was the most mature out of all those in the various projects), and it's reached a point where we think we can start bringing on other downstream users. This means we need to start establishing things like a release cadence and to grow the new contributors we have so they can handle more project responsibility. Personally, I think that means it's time to move out from under Hadoop to drive things as our own community. Eventually, I hope the community can help draw in a group of folks traditionally underrepresented in ASF projects, namely QA and operations folks. I think test-patch by itself has enough scope to justify a project. Having a solid set of build tools that are customizable to fit the norms of different software communities is a bunch of work. Making it work well in both the context of automated test systems like Jenkins and for individual developers is even more work. We could easily also take over maintenance of things like shelldocs, since test-patch is the primary consumer of that currently, but it's generally useful tooling. In addition to test-patch, I think the proposed project has some future growth potential. Given some adoption of test-patch to prove utility, the project could build on the ties it makes to start building tools to help projects do their own longer-run testing. Note that I'm talking about the tools to build QA processes and not a particular set of tested components. Specifically, I think the ChaosMonkey work that's in HBase should be generalizable as a fault injection framework (either based on that code or something like it).
Doing this for arbitrary software is obviously very difficult, and a part of easing that will be to make (and then favor) tooling to allow projects to have operational glue that looks the same. Namely, the shell work that's been done in hadoop-functions.sh would be a great foundational layer that could bring good daemon-handling practices to a whole slew of software projects. In the event that these frameworks and tools get adopted by parts of the Hadoop ecosystem, that could make the job of, e.g., Bigtop substantially easier. I've reached out to a few folks who have been involved in the current test-patch work or expressed interest in helping out on getting it used in other projects. Right now, the proposed PMC would be (alphabetical by last name):
* Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds pmc, sqoop pmc, all-around Jenkins expert)
* Sean Busbey (ASF member, accumulo pmc, hbase pmc)
* Nick Dimiduk (hbase pmc, phoenix pmc)
* Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
* Andrew Purtell (ASF
Re: Protocol Buffers version
On Jun 16, 2015, at 2:54 AM, Steve Loughran ste...@hortonworks.com wrote: One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.

To be ruthless, that's not enough reason to upgrade branch-2, due to the transitive pain it causes all the way down. Not in branch-2, but certainly in trunk.
Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)
I think this is a great idea! Having just gone through the process of getting Phoenix up to speed with precommits, it would be really nice to have a place to go other than forking/hacking someone else's work. For the same project, I recently integrated its first daemon service. This meant adding a bunch of servicey Python code (multi-platform support is required) which I only sort of trust. Again, it would be great to have an explicit resource for this kind of thing in the ecosystem. I expect Calcite and Kylin will be following along shortly. Since we're tossing out names, how about Apache Bootstrap? It's a meta-project to help other projects get off the ground, after all. -n

On Monday, June 15, 2015, Sean Busbey bus...@cloudera.com wrote: Oof. I had meant to push on this again but life got in the way and now the June board meeting is upon us. Sorry everyone. In the event that this ends up contentious, hopefully one of the copied communities can give us a branch to work in. I know everyone is busy, so here's the short version of this email: I'd like to move some of the code currently in Hadoop (test-patch) into a new TLP focused on QA tooling. I'm not sure what the best format for priming this conversation is. ORC filled in the incubator project proposal template, but I'm not sure how much that confused the issue. So to start, I'll just write what I'm hoping we can accomplish in general terms here. All software development projects that are community based (that is, accepting outside contributions) face a common QA problem for vetting in-coming contributions. Hadoop is fortunate enough to be sufficiently popular that the weight of the problem drove tool development (i.e. test-patch). That tool is generalizable enough that a bunch of other TLPs have adopted their own forks. Unfortunately, in most projects this kind of QA work is an enabler rather than a primary concern, so often the tooling is worked on ad-hoc and little shared improvements happen across projects.
[remainder of Sean's quoted proposal snipped; the full text is quoted elsewhere in this thread]
Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)
+1 on the idea. It would be great if tests for dependency management, multiple branches, and distributed environments could be done in the project. One discussion point is how Hadoop depends on Yetus, including the development cycles. It's a good time to rethink what can be done to make Hadoop better. Thanks, - Tsuyoshi

On Tue, Jun 16, 2015 at 8:47 AM, Sean Busbey bus...@cloudera.com wrote: Oof. I had meant to push on this again but life got in the way and now the June board meeting is upon us. Sorry everyone. In the event that this ends up contentious, hopefully one of the copied communities can give us a branch to work in. I know everyone is busy, so here's the short version of this email: I'd like to move some of the code currently in Hadoop (test-patch) into a new TLP focused on QA tooling. I'm not sure what the best format for priming this conversation is. ORC filled in the incubator project proposal template, but I'm not sure how much that confused the issue. So to start, I'll just write what I'm hoping we can accomplish in general terms here. All software development projects that are community based (that is, accepting outside contributions) face a common QA problem for vetting in-coming contributions. Hadoop is fortunate enough to be sufficiently popular that the weight of the problem drove tool development (i.e. test-patch). That tool is generalizable enough that a bunch of other TLPs have adopted their own forks. Unfortunately, in most projects this kind of QA work is an enabler rather than a primary concern, so often the tooling is worked on ad-hoc and little shared improvements happen across projects. Since the tooling itself is never a primary concern, any made is rarely reused outside of ASF projects.
[remainder of Sean's quoted proposal snipped; the full text is quoted elsewhere in this thread]
[jira] [Created] (HADOOP-12093) test-patch findbugs fails on branch-based pre-commit runs
Sangjin Lee created HADOOP-12093: Summary: test-patch findbugs fails on branch-based pre-commit runs Key: HADOOP-12093 URL: https://issues.apache.org/jira/browse/HADOOP-12093 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 3.0.0 Reporter: Sangjin Lee On our branch development JIRAs (YARN-2928), we are starting to see findbugs checks fail consistently. The relevant message:
{noformat}
findbugs baseline for /home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build
Running findbugs in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice
/home/jenkins/tools/maven/latest/bin/mvn clean test findbugs:findbugs -DskipTests -DhadoopPatchProcess > /home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/patchprocess/YARN-2928FindBugsOutputhadoop-yarn-server-timelineservice.txt 2>&1
Exception in thread "main" java.io.FileNotFoundException: /home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/patchprocess/YARN-2928FindbugsWarningshadoop-yarn-server-timelineservice.xml (No such file or directory)
	at java.io.FileInputStream.open(Native Method)
	at java.io.FileInputStream.<init>(FileInputStream.java:146)
	at edu.umd.cs.findbugs.SortedBugCollection.progessMonitoredInputStream(SortedBugCollection.java:1231)
	at edu.umd.cs.findbugs.SortedBugCollection.readXML(SortedBugCollection.java:308)
	at edu.umd.cs.findbugs.SortedBugCollection.readXML(SortedBugCollection.java:295)
	at edu.umd.cs.findbugs.workflow.Filter.main(Filter.java:712)
Pre-patch YARN-2928 findbugs is broken?
{noformat}
See YARN-3706 and YARN-3792 for instance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
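The failure above boils down to the branch run expecting a pre-patch findbugs warnings XML that was never generated. A minimal sketch of the missing guard (function name, messages, and flow are illustrative, not the actual test-patch code):

```shell
#!/usr/bin/env bash
# Hypothetical guard for the pre-commit findbugs step: report a setup problem
# instead of letting the Filter tool crash with a FileNotFoundException.
check_findbugs() {
  # Args: pre-patch baseline warnings XML, post-patch warnings XML.
  local baseline="$1" new="$2"
  if [[ ! -f "${baseline}" ]]; then
    echo "pre-patch findbugs baseline missing: ${baseline}" >&2
    return 1
  fi
  # With a baseline present, the real tooling would diff the two warning sets.
  echo "comparing ${baseline} against ${new}"
}
```

The point is simply that a branch-based run needs to either generate the baseline itself or fail with a clear message when it is absent.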
Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)
Since a couple of people have brought it up: I think the release question is probably one of the big question marks. Other than tarballs, how does something like this actually get used downstream? For test-patch in particular, I have a few thoughts on this.

Short term:
* Projects that want to move RIGHT NOW would modify their Jenkins jobs to check out the Yetus repo (preferably at a well-known tag or branch) in one directory and their project repo in another directory. Then it’s just a matter of passing the correct flags to test-patch. This is pretty much how I’ve been personally running test-patch for about 6 months now. Under Jenkins, we’ve seen this work with NiFi (incubating) already.
* Create a stub version of test-patch that projects could check into their repo, replacing the existing test-patch. This stub version would git clone from either ASF or GitHub and then execute test-patch accordingly on demand. With the correct smarts, it could make sure it has a cached version to prevent continual clones.

Longer term:
* I’ve been toying with the idea of (ab)using Java repos and packaging as a transportation layer, either in addition to or in combination with something like a maven plugin. Something like this would clearly be better for offline usage and/or lowering network traffic.

It’s probably worth pointing out that plugins can get pulled in from outside the Yetus dir structure, so project-specific bits can remain in those projects. This would mean that, e.g., if Ambari decides they want to change the dependency ordering such that ambari-metrics always gets built first, that’s completely doable without the Yetus project getting involved. This is particularly relevant for things like the Dockerfile, where projects would almost certainly want to dictate their build- and test-time dependencies.
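The "stub" idea could look something like the following minimal sketch. Everything here is illustrative assumption, not the real Yetus interface: the function names, the cache location, and the assumption that a test-patch.sh sits at the repo root.

```shell
#!/usr/bin/env bash
# Hypothetical stub that a project could check in instead of a full copy of
# test-patch: clone the QA tooling once, then reuse the cached checkout.

ensure_yetus() {
  # Args: source repo URL (or local path), local cache directory.
  local repo="$1" cache="$2"
  if [[ -d "${cache}/.git" ]]; then
    # Refresh the cache, but tolerate being offline: a stale copy still works.
    git -C "${cache}" fetch --quiet || true
  else
    git clone --quiet "${repo}" "${cache}"
  fi
}

run_test_patch() {
  # Fetch (or reuse) the tooling, then pass all remaining flags through to it.
  local repo="$1" cache="$2"; shift 2
  ensure_yetus "${repo}" "${cache}"
  "${cache}/test-patch.sh" "$@"
}
```

A Jenkins job or an individual developer would then invoke something like `run_test_patch <repo-url> ~/.cache/yetus <flags>`, with the actual flag names depending on the real test-patch interface.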
[jira] [Created] (HADOOP-12094) TestCount Fails
Akira AJISAKA created HADOOP-12094: -- Summary: TestCount Fails Key: HADOOP-12094 URL: https://issues.apache.org/jira/browse/HADOOP-12094 Project: Hadoop Common Issue Type: Bug Components: test Reporter: Akira AJISAKA TestCount#processPathWithQuotasByQTVH and TestCount#processPathWithQuotasByStorageTypesHeader fail on trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HADOOP-12094) TestCount fails
[ https://issues.apache.org/jira/browse/HADOOP-12094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA resolved HADOOP-12094. Resolution: Duplicate TestCount fails --- Key: HADOOP-12094 URL: https://issues.apache.org/jira/browse/HADOOP-12094 Project: Hadoop Common Issue Type: Bug Components: test Reporter: Akira AJISAKA Attachments: org.apache.hadoop.fs.shell.TestCount.txt TestCount#processPathWithQuotasByQTVH and TestCount#processPathWithQuotasByStorageTypesHeader fail on trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)
I think it's good to have a general build/test process projects can share, so +1 to pulling it out. You should get help from others. Regarding incubation: it is a lot of work, especially for something that's more of an in-house tool than an artifact to release and redistribute. Can't you just use Apache Labs or the build project's repo to work on this? If you do want to incubate, we may want to nominate the hadoop project as the monitoring PMC, rather than incubator@. -steve

On 16 Jun 2015, at 17:59, Allen Wittenauer a...@altiscale.com wrote: Since a couple of people have brought it up: I think the release question is probably one of the big question marks. Other than tar balls, how does something like this actually get used downstream? For test-patch, in particular, I have a few thoughts on this: Short term: * Projects that want to move RIGHT NOW would modify their Jenkins jobs to checkout from the Yetus repo (preferably at a well known tag or branch) in one directory and their project repo in another directory. Then it’s just a matter of passing the correct flags to test-patch. This is pretty much how I’ve been personally running test-patch for about 6 months now. Under Jenkins, we’ve seen this work with NiFi (incubating) already. * Create a stub version of test-patch that projects could check into their repo, replacing the existing test-patch. This stub version would git clone from either ASF or github and then execute test-patch accordingly on demand. With the correct smarts, it could make sure it has a cached version to prevent continual clones. Longer term: * I’ve been toying with the idea of (ab)using Java repos and packaging as a transportation layer, either in addition or in combination with something like a maven plugin. Something like this would clearly be better for offline usage and/or to lower the network traffic.
[remainder of Allen's quoted message snipped; the full text appears earlier in the thread]
Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)
I'm going to try responding to several things at once here, so apologies if I miss anyone, and sorry for the long email. :)

On Tue, Jun 16, 2015 at 3:44 PM, Steve Loughran ste...@hortonworks.com wrote: I think it's good to have a general build/test process projects can share, so +1 to pulling it out. You should get help from others. regarding incubation, it is a lot of work, especially for something that's more of an in-house tool than an artifact to release and redistribute. You can't just use apache labs or the build project's repo to work on this? if you do want to incubate, we may want to nominate the hadoop project as the monitoring PMC, rather than incubator@. -steve

Important note: we're proposing a board resolution that would directly pull this code base out into a new TLP; there'd be no incubator, we'd just continue building community and start making releases. The proposed PMC believes the tooling we're talking about has direct applicability to projects well outside of the ASF. Lots of other open source projects run on community contributions and have a general need for better QA tools. Given that problem set and the presence of a community working to solve it, there's no reason this needs to be treated as an in-house build project. We certainly want to be useful to ASF projects, and given our current optimization for ASF infra, getting them on board will certainly be easier, but we're not limited to that (and our current prerequisites, a CI tool and JIRA or GitHub, are pretty broadly available).

On Tue, Jun 16, 2015 at 10:13 AM, Nick Dimiduk ndimi...@apache.org wrote: Since we're tossing out names, how about Apache Bootstrap? It's a meta-project to help other projects get off the ground, after all.

There's already a web development framework named Bootstrap[1]. It's also used by several ASF projects, so I think it best to avoid the confusion. The name is, of course, up to the proposed PMC.
As a bit of background, the current name Yetus fulfills Allen's desire to have something shell-related and my desire to have a project that starts with Y (there are currently no ASF projects that start with Y). The universe of names that satisfy both is very small, AFAICT. I did a brief suitability search and didn't find any blockers.

On Tue, Jun 16, 2015 at 11:59 AM, Allen Wittenauer a...@altiscale.com wrote: [quoted release discussion snipped; the full text appears earlier in the thread]

It's important that the project follow ASF guidelines on publishing releases[2]. So long as we publish releases to the distribution directory I think we'd be fine having folks work off of the corresponding tag. I'm not sure there's much reason to do that, however.
A Jenkins job can just as easily grab a release tarball as a git tag, and we're not talking about a large amount of stuff. The kind of build setup that Chris N mentioned is also totally doable now that there's a build description DSL for Jenkins[3]. For individual developers, I don't see any reason we can't package things up as a tool, similar to how findbugs or shellcheck work. We can make OS packages (or Homebrew for OS X) if we want to make stand-alone installation on developer machines really easy. Those same packages could be installed on the ASF build machines, provided some ASF project wanted to make use of Yetus. Having releases will incur some turnaround time for when folks want to see fixes, but that's a trade-off around release cadence we can work out longer term. I would like to have one or two projects that can work off of the bleeding-edge repo, but we'd have to get that to mesh with foundation policy. My gut tells me we should be
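The release-tarball consumption path mentioned above could be as small as the following sketch in a Jenkins shell step. The function name and the piping of curl into tar are my own illustration; the release URL layout is whatever the project ends up publishing.

```shell
#!/usr/bin/env bash
# Hypothetical helper for a Jenkins build step: download a pinned release
# tarball and unpack it into a target directory before running the QA tooling.
fetch_and_unpack() {
  # Args: tarball URL (any scheme curl understands), destination directory.
  local url="$1" dest="$2"
  mkdir -p "${dest}"
  # -f: fail on HTTP errors; -sSL: quiet, show errors, follow redirects.
  curl -fsSL "${url}" | tar xzf - -C "${dest}"
}

# A job would then pin a version, e.g. (URL is a placeholder):
#   fetch_and_unpack "https://example.org/releases/yetus-X.Y.Z-bin.tar.gz" ./yetus
```

Pinning an exact release URL this way gives the job reproducible tooling without any git checkout at all.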
Re: [DISCUSS] project for pre-commit patch testing (was Re: upstream jenkins build broken?)
+1 A separate project sounds great. It'd be great to have more standard tooling across the ecosystem. As a practical matter, how should projects consume releases? -C

On Mon, Jun 15, 2015 at 4:47 PM, Sean Busbey bus...@cloudera.com wrote: Oof. I had meant to push on this again, but life got in the way and now the June board meeting is upon us. Sorry, everyone. In the event that this ends up contentious, hopefully one of the copied communities can give us a branch to work in. I know everyone is busy, so here's the short version of this email: I'd like to move some of the code currently in Hadoop (test-patch) into a new TLP focused on QA tooling. I'm not sure what the best format for priming this conversation is. ORC filled in the incubator project proposal template, but I'm not sure how much that confused the issue. So to start, I'll just write what I'm hoping we can accomplish in general terms here. All software development projects that are community-based (that is, accepting outside contributions) face a common QA problem for vetting incoming contributions. Hadoop is fortunate enough to be sufficiently popular that the weight of the problem drove tool development (i.e., test-patch). That tool is generalizable enough that a bunch of other TLPs have adopted their own forks. Unfortunately, in most projects this kind of QA work is an enabler rather than a primary concern, so the tooling is often worked on ad hoc and few shared improvements happen across projects. Since the tooling itself is never a primary concern, any progress made is rarely reused outside of ASF projects. Over the last couple of months a few of us have been working on generalizing the tooling present in the Hadoop code base (because it was the most mature out of all those in the various projects), and it's reached a point where we think we can start bringing on other downstream users.
This means we need to start establishing things like a release cadence and to grow the new contributors we have so they can handle more project responsibility. Personally, I think that means it's time to move out from under Hadoop to drive things as our own community. Eventually, I hope the community can help draw in a group of folks traditionally underrepresented in ASF projects, namely QA and operations folks. I think test-patch by itself has enough scope to justify a project. Having a solid set of build tools that are customizable to fit the norms of different software communities is a bunch of work. Making it work well in both the context of automated test systems like Jenkins and for individual developers is even more work. We could easily also take over maintenance of things like shelldocs, since test-patch is the primary consumer of that currently, but it's generally useful tooling. In addition to test-patch, I think the proposed project has some future growth potential. Given some adoption of test-patch to prove utility, the project could build on the ties it makes to start building tools to help projects do their own longer-run testing. Note that I'm talking about the tools to build QA processes and not a particular set of tested components. Specifically, I think the ChaosMonkey work that's in HBase should be generalizable as a fault injection framework (either based on that code or something like it). Doing this for arbitrary software is obviously very difficult, and a part of easing that will be to make (and then favor) tooling to allow projects to have operational glue that looks the same. Namely, the shell work that's been done in hadoop-functions.sh would be a great foundational layer that could bring good daemon-handling practices to a whole slew of software projects. In the event that these frameworks and tools get adopted by parts of the Hadoop ecosystem, that could make the job of, e.g., Bigtop substantially easier.
I've reached out to a few folks who have been involved in the current test-patch work or expressed interest in helping out on getting it used in other projects. Right now, the proposed PMC would be (alphabetical by last name):
* Andrew Bayer (ASF member, incubator pmc, bigtop pmc, flume pmc, jclouds pmc, sqoop pmc, all-around Jenkins expert)
* Sean Busbey (ASF member, accumulo pmc, hbase pmc)
* Nick Dimiduk (hbase pmc, phoenix pmc)
* Chris Nauroth (ASF member, incubator pmc, hadoop pmc)
* Andrew Purtell (ASF member, incubator pmc, bigtop pmc, hbase pmc, phoenix pmc)
* Allen Wittenauer (hadoop committer)
That PMC gives us several members and a bunch of folks familiar with the ASF. Combined with the code already existing in Apache spaces, I think that gives us sufficient justification for a direct board proposal. The planned project name is Apache Yetus. It's an archaic genus of sea snail, and most of our project will be focused on shell scripts. N.b.: this does not mean that the Hadoop community would _have_ to rely on the new TLP, but I hope that once we have a release
Re: Apache Hadoop 2.7 Windows 7 x64 - Failing Tests TestKerberosAuthenticator
Hi All, I just cleared the Kerberos tickets cached on my machine using "klist purge" before starting the build. This solved the issue for me. Hope that helps anyone else facing a similar issue. Regards, Neeraj

On Tue, 16/6/15, Neeraj Vaidya neeraj.vai...@yahoo.co.in wrote:
Subject: Re: Apache Hadoop 2.7 Windows 7 x64 - Failing Tests TestKerberosAuthenticator To: common-dev@hadoop.apache.org common-dev@hadoop.apache.org Date: Tuesday, 16 June, 2015, 6:43 AM

Hi, Can you please help me with my issue described below? I am resending this email as I probably sent the first one before my subscription to this list was confirmed. Sorry about that. Regards, Neeraj

On 15/06/2015, at 4:05 PM, Neeraj Vaidya neeraj.vai...@yahoo.co.in wrote:
Hi, I have been trying to build Hadoop 2.7 on my Windows 7 64-bit laptop. I have installed all the pre-requisites mentioned in the BUILDING.txt file. However, when my build reaches the tests for the hadoop-auth module, it keeps failing with errors related to timeouts in the Kerberos authentication tests. See the SNIPPET below. The surefire report for this test is attached herewith. Can you please let me know if/where I am going wrong? I have used the following command to build:

mvn package -Pdist -Pdocs -Psrc -Dtar

SNIPPET OF ERROR PRINTED ON SCREEN
Running org.apache.hadoop.security.authentication.client.TestAuthenticatedURL
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.361 sec - in org.apache.hadoop.security.authentication.client.TestAuthenticatedURL
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.security.authentication.client.TestKerberosAuthenticator
Tests run: 14, Failures: 0, Errors: 10, Skipped: 0, Time elapsed: 701.875 sec FAILURE! - in org.apache.hadoop.security.authentication.client.TestKerberosAuthenticator
testNotAuthenticated[0](org.apache.hadoop.security.authentication.client.TestKerberosAuthenticator) Time elapsed: 70.586 sec ERROR!
java.lang.Exception: test timed out after 6 milliseconds
    at sun.security.krb5.Credentials.acquireDefaultNativeCreds(Native Method)
    at sun.security.krb5.Credentials.acquireDefaultCreds(Credentials.java:427)
    at sun.security.krb5.Credentials.acquireTGTFromCache(Credentials.java:295)
    at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:665)
    at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
    at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
    at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
    at sun.security.jgss.GSSUtil.login(GSSUtil.java:255)
    at sun.security.jgss.krb5.Krb5Util.getTicket(Krb5Util.java:158)
    at sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:335)
    at sun.security.jgss.krb5.Krb5InitCredential$1.run(Krb5InitCredential.java:331)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.security.jgss.krb5.Krb5InitCredential.getTgt(Krb5InitCredential.java:330)
    at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:145)
    at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
    at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
    at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
    at sun.security.jgss.spnego.SpNegoContext.GSS_initSecContext(SpNegoContext.java:875)
    at sun.security.jgss.spnego.SpNegoContext.initSecContext(SpNegoContext.java:317)
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
    at
[jira] [Created] (HADOOP-12091) Issues with directories handling
Gil Vernik created HADOOP-12091: --- Summary: Issues with directories handling Key: HADOOP-12091 URL: https://issues.apache.org/jira/browse/HADOOP-12091 Project: Hadoop Common Issue Type: Improvement Components: fs/swift Reporter: Gil Vernik Assignee: Gil Vernik

OpenStack Swift doesn't have a notion of directories. In Swift everything is an object, stored in some container, which belongs to an account. The current implementation contains a lot of code that handles directory structures in Swift, in particular code that treats a zero-length object as a directory. While that might be correct in certain cases, there are also many cases where this directory handling causes problems and significantly reduces performance. For example, if a Swift container has dozens of objects and one of them has zero length, then the Swift driver thinks it's a directory and reports it to the upper layer as a directory. In consequence, this leads to various exceptions and crashes on the client side / upper Hadoop layer. The purpose of this JIRA is to make directory handling in the driver optional and configurable. The driver will behave the same by default, but there will be a configurable option that disables directory handling, so that everything is an object, even those with zero length. This covers cases where clients don't care about directory structures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
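The configurable switch proposed above might look something like the following in core-site.xml. This is purely illustrative: the property name and description here are invented for the sketch, and the actual key would be settled in the JIRA.

```xml
<!-- Hypothetical property: the real key, if any, would be defined by HADOOP-12091. -->
<property>
  <name>fs.swift.directories.enabled</name>
  <value>false</value>
  <description>When false, the Swift driver never interprets zero-length
  objects as directories; every object is treated as a plain object.</description>
</property>
```

The default (true) would preserve the current behavior, so existing deployments would be unaffected unless they opt in.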
[jira] [Created] (HADOOP-12092) Issues with sub-directories in Swift
Gil Vernik created HADOOP-12092: --- Summary: Issues with sub-directories in Swift Key: HADOOP-12092 URL: https://issues.apache.org/jira/browse/HADOOP-12092 Project: Hadoop Common Issue Type: Improvement Components: fs/swift Reporter: Gil Vernik Assignee: Gil Vernik

OpenStack Swift doesn't have a notion of directories or sub-directories. In Swift everything is an object, stored in a container, which belongs to an account. OpenStack Swift allows objects to have delimiters, and users can then access and filter those objects using a delimiter. A very good explanation appears here: http://docs.rackspace.com/files/api/v1/cf-devguide/content/Pseudo-Hierarchical_Folders_Directories-d1e1580.html The current driver has a lot of code that creates nested directories as zero-length objects. While that might be needed in some cases, in general this is wrong when working with Swift and significantly hurts the performance of the driver. The goal of this JIRA is to make sub-directory generation a configurable option. There will be an option to disable sub-directory generation, which will greatly improve performance. Example: a client performs PUT account/container/a/b/c/d/e/f/g.txt and the driver is configured not to use sub-directories in Swift; then only one object, a/b/c/d/e/f/g.txt, will be generated in the container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Build failed in Jenkins: Hadoop-Common-trunk #1528
See https://builds.apache.org/job/Hadoop-Common-trunk/1528/changes

Changes:

[arp] HDFS-8607. TestFileCorruption doesn't work as expected. (Contributed by Walter Su)
[vinodkv] HADOOP-12001. Fixed LdapGroupsMapping to include configurable Posix UID and GID attributes during the search. Contributed by Patrick White.
[vinodkv] HADOOP-12001. Moving CHANGES.txt up into 2.8.
[aajisaka] MAPREDUCE-6363. [NNBench] Lease mismatch error when running with multiple mappers. Contributed by Brahma Reddy Battula.
[aajisaka] MAPREDUCE-6396. TestPipeApplication fails by NullPointerException. Contributed by Brahma Reddy Battula.
[szetszwo] HDFS-8576. Lease recovery should return true if the lease can be released and the file can be closed. Contributed by J.Andreina
[szetszwo] HDFS-8540. Mover should exit with NO_MOVE_BLOCK if no block can be moved. Contributed by surendra singh lilhore
[szetszwo] Move HDFS-8540 to 2.8 in CHANGES.txt.
[szetszwo] HDFS-8361. Choose SSD over DISK in block placement.
[ozawa] YARN-3711. Documentation of ResourceManager HA should explain configurations about listen addresses. Contributed by Masatake Iwasaki.
[wheat9] HDFS-8592. SafeModeException never get unwrapped. Contributed by Haohui Mai.
[devaraj] YARN-3789. Improve logs for LeafQueue#activateApplications(). Contributed
--
[...truncated 5202 lines...]
Running org.apache.hadoop.crypto.TestCryptoStreamsNormal
Tests run: 14, Failures: 0, Errors: 0, Skipped: 8, Time elapsed: 6.804 sec - in org.apache.hadoop.crypto.TestCryptoStreamsNormal
Running org.apache.hadoop.crypto.random.TestOsSecureRandom
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.581 sec - in org.apache.hadoop.crypto.random.TestOsSecureRandom
Running org.apache.hadoop.crypto.random.TestOpensslSecureRandom
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.175 sec - in org.apache.hadoop.crypto.random.TestOpensslSecureRandom
Running org.apache.hadoop.crypto.TestCryptoStreamsWithJceAesCtrCryptoCodec
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 23.063 sec - in org.apache.hadoop.crypto.TestCryptoStreamsWithJceAesCtrCryptoCodec
Running org.apache.hadoop.crypto.TestOpensslCipher
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.198 sec - in org.apache.hadoop.crypto.TestOpensslCipher
Running org.apache.hadoop.crypto.TestCryptoStreamsWithOpensslAesCtrCryptoCodec
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 21.699 sec - in org.apache.hadoop.crypto.TestCryptoStreamsWithOpensslAesCtrCryptoCodec
Running org.apache.hadoop.crypto.TestCryptoStreams
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 21.259 sec - in org.apache.hadoop.crypto.TestCryptoStreams
Running org.apache.hadoop.service.TestServiceLifecycle
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.372 sec - in org.apache.hadoop.service.TestServiceLifecycle
Running org.apache.hadoop.service.TestCompositeService
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.246 sec - in org.apache.hadoop.service.TestCompositeService
Running org.apache.hadoop.service.TestGlobalStateChangeListener
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.212 sec - in org.apache.hadoop.service.TestGlobalStateChangeListener
Running org.apache.hadoop.ha.TestActiveStandbyElectorRealZK
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.489 sec - in org.apache.hadoop.ha.TestActiveStandbyElectorRealZK
Running org.apache.hadoop.ha.TestHealthMonitor
Exception: java.lang.RuntimeException thrown from the UncaughtExceptionHandler in thread Health Monitor for DummyHAService #3
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.114 sec - in org.apache.hadoop.ha.TestHealthMonitor
Running org.apache.hadoop.ha.TestZKFailoverController
Tests run: 19, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 43.268 sec - in org.apache.hadoop.ha.TestZKFailoverController
Running org.apache.hadoop.ha.TestZKFailoverControllerStress
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 95.151 sec - in org.apache.hadoop.ha.TestZKFailoverControllerStress
Running org.apache.hadoop.ha.TestActiveStandbyElector
Tests run: 23, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.933 sec - in org.apache.hadoop.ha.TestActiveStandbyElector
Running org.apache.hadoop.ha.TestSshFenceByTcpPort
Tests run: 4, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 3.71 sec - in org.apache.hadoop.ha.TestSshFenceByTcpPort
Running org.apache.hadoop.ha.TestHAAdmin
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.445 sec - in org.apache.hadoop.ha.TestHAAdmin
Running org.apache.hadoop.ha.TestFailoverController
Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.507 sec - in org.apache.hadoop.ha.TestFailoverController
Running org.apache.hadoop.ha.TestShellCommandFencer
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0,
Build failed in Jenkins: Hadoop-common-trunk-Java8 #230
See https://builds.apache.org/job/Hadoop-common-trunk-Java8/230/changes

Changes:

[arp] HDFS-8607. TestFileCorruption doesn't work as expected. (Contributed by Walter Su)
[vinodkv] HADOOP-12001. Fixed LdapGroupsMapping to include configurable Posix UID and GID attributes during the search. Contributed by Patrick White.
[vinodkv] HADOOP-12001. Moving CHANGES.txt up into 2.8.
[aajisaka] MAPREDUCE-6363. [NNBench] Lease mismatch error when running with multiple mappers. Contributed by Brahma Reddy Battula.
[aajisaka] MAPREDUCE-6396. TestPipeApplication fails by NullPointerException. Contributed by Brahma Reddy Battula.
[szetszwo] HDFS-8576. Lease recovery should return true if the lease can be released and the file can be closed. Contributed by J.Andreina
[szetszwo] HDFS-8540. Mover should exit with NO_MOVE_BLOCK if no block can be moved. Contributed by surendra singh lilhore
[szetszwo] Move HDFS-8540 to 2.8 in CHANGES.txt.
[szetszwo] HDFS-8361. Choose SSD over DISK in block placement.
[ozawa] YARN-3711. Documentation of ResourceManager HA should explain configurations about listen addresses. Contributed by Masatake Iwasaki.
[wheat9] HDFS-8592. SafeModeException never get unwrapped. Contributed by Haohui Mai.
[devaraj] YARN-3789. Improve logs for LeafQueue#activateApplications(). Contributed
--
[...truncated 5580 lines...]
Running org.apache.hadoop.io.TestEnumSetWritable
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.431 sec - in org.apache.hadoop.io.TestEnumSetWritable
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.io.TestMapWritable
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.185 sec - in org.apache.hadoop.io.TestMapWritable
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.io.TestBooleanWritable
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.185 sec - in org.apache.hadoop.io.TestBooleanWritable
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.io.TestBytesWritable
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.183 sec - in org.apache.hadoop.io.TestBytesWritable
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.io.TestSequenceFile
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.373 sec - in org.apache.hadoop.io.TestSequenceFile
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.io.TestTextNonUTF8
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.162 sec - in org.apache.hadoop.io.TestTextNonUTF8
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.io.TestObjectWritableProtos
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.275 sec - in org.apache.hadoop.io.TestObjectWritableProtos
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.io.TestDefaultStringifier
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.328 sec - in org.apache.hadoop.io.TestDefaultStringifier
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.io.retry.TestRetryProxy
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.189 sec - in org.apache.hadoop.io.retry.TestRetryProxy
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.io.retry.TestDefaultRetryPolicy
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.337 sec - in org.apache.hadoop.io.retry.TestDefaultRetryPolicy
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.io.retry.TestFailoverProxy
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.035 sec - in org.apache.hadoop.io.retry.TestFailoverProxy
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.io.nativeio.TestNativeIO
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.587 sec - in org.apache.hadoop.io.nativeio.TestNativeIO
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.io.nativeio.TestSharedFileDescriptorFactory
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.186 sec - in org.apache.hadoop.io.nativeio.TestSharedFileDescriptorFactory
Java HotSpot(TM) 64-Bit Server VM warning: ignoring
Re: Maven always detects changes - Is this a Docker 'feature'?
Your clocks are probably confused. ant -diagnostics actually measures clock drift between System.currentTimeMillis() and the timestamps coming off the tmp dir. You should do the same with files touched in target/

On 15 Jun 2015, at 23:31, Colin P. McCabe cmcc...@apache.org wrote:
Hi Darrell, Sorry, I'm not familiar with this feature of Maven. Perhaps try asking on the Apache Maven mailing list? best, Colin

On Fri, May 22, 2015 at 8:34 AM, Darrell Taylor darrell.tay...@gmail.com wrote:
Hi, Is it normal behaviour for Maven to detect changes when I run tests with no changes? e.g.

$ mvn test -Dtest=TestDFSShell -nsu -o
...
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hadoop-hdfs ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 576 source files to /home/darrell/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/classes
...

Then I run the same command again without touching anything else and it compiles everything again. It's getting rather tedious. I am running this from inside the Docker container. Any help appreciated. Thanks, Darrell.
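The clock-drift check Steve describes can be reproduced without ant. The following is a minimal sketch of the idea (not Hadoop, Maven, or ant code): touch a probe file and compare its filesystem mtime against the current system time. Inside a Docker container, point it at the build's target/ directory.

```python
# Sketch: measure drift between the system clock and a directory's
# filesystem timestamps, similar in spirit to what ant -diagnostics reports.
import os
import tempfile
import time


def clock_drift_seconds(directory):
    """Touch a probe file and compare its mtime to the current system time."""
    path = os.path.join(directory, "clock-drift-probe")
    with open(path, "w"):
        pass
    try:
        drift = os.path.getmtime(path) - time.time()
    finally:
        os.remove(path)
    return drift


if __name__ == "__main__":
    # A large positive drift means file mtimes land in the future, which can
    # make Maven think sources are newer than their compiled classes and
    # trigger "Changes detected - recompiling the module!" on every run.
    print("drift: %.3f s" % clock_drift_seconds(tempfile.gettempdir()))
```

On a healthy host the drift should be a small fraction of a second; a drift of minutes between the container and a mounted volume would explain the constant recompiles.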
[jira] [Resolved] (HADOOP-12091) Issues with directories handling in Swift
[ https://issues.apache.org/jira/browse/HADOOP-12091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-12091. - Resolution: Duplicate

Issues with directories handling in Swift - Key: HADOOP-12091 URL: https://issues.apache.org/jira/browse/HADOOP-12091 Project: Hadoop Common Issue Type: Improvement Components: fs/swift Reporter: Gil Vernik Assignee: Gil Vernik

OpenStack Swift doesn't have a notion of directories. In Swift everything is an object, stored in some container, which belongs to an account. The current implementation contains a lot of code that handles directory structures in Swift, in particular code that treats a zero-length object as a directory. While that might be correct in certain cases, there are also many cases where this directory handling causes problems and significantly reduces performance. For example, if a Swift container has dozens of objects and one of them has zero length, then the Swift driver thinks it's a directory and reports it to the upper layer as a directory. In consequence, this leads to various exceptions and crashes on the client side / upper Hadoop layer. The purpose of this JIRA is to make directory handling in the driver optional and configurable. The driver will behave the same by default, but there will be a configurable option that disables directory handling, so that everything is an object, even those with zero length. This covers cases where clients don't care about directory structures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Protocol Buffers version
On 15 Jun 2015, at 22:31, Colin P. McCabe cmcc...@apache.org wrote:
On Mon, Jun 15, 2015 at 7:24 AM, Allen Wittenauer a...@altiscale.com wrote:
On Jun 12, 2015, at 1:03 PM, Alan Burlison alan.burli...@oracle.com wrote:
On 14/05/2015 18:41, Chris Nauroth wrote:

As a reminder though, the community probably would want to see a strong justification for the upgrade in terms of features or performance or something else. Right now, I'm not seeing a significant benefit for us based on my reading of their release notes. I think it's worthwhile to figure this out first. Otherwise, there is a risk that any testing work turns out to be a wasted effort.

One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.

To be ruthless, that's not enough reason to upgrade branch-2, due to the transitive pain it makes all the way down.

That's a pretty good reason. Some of us had a discussion at Summit about effectively forking protobuf and making it an Apache TLP. This would give us a chance to get out from under Google's blind spot, guarantee better compatibility across the ecosystem, etc, etc. It is sounding more and more like that's really what needs to happen.

I agree that it would be nice if the protobuf project avoided making backwards-incompatible API changes within a minor release. But in practice, we have had the same issues with Jackson, Guava, jets3t, and other dependencies. Nearly every important Hadoop dependency has made backwards-incompatible API changes within a minor release of the dependency... and that's one reason we are using such old versions of everything. I don't think PB deserves to be singled out as much as it has been.

I think it does deserve it, as it was such an all-or-nothing change. Guava, well, we may keep it at 11.0, but we've made sure there are no classes used which aren't in the latest versions. Even where we depend on artifacts which need later versions (curator-2.7.1), we've addressed the version problem by verifying that you can actually rebuild curator with guava-11.0 with everything working (curator-x-discovery doesn't compile, but we don't use that). So we know that unless a bit of curator uses reflection, we can run it against 11.x. And if someone wants to use a later version of Guava + hadoop-common, they can swap it in and Hadoop will still work. Which is important, as on Java 8u45+ you do need a recent Guava.

In contrast, protobuf needed a coordinated update across everything: every project which had checked in their generated protobuf files had to rebuild and check in, which guarantees they could no longer work with protobuf 2.4.

Jackson? Its brokenness wasn't so obvious: if we'd known, I wouldn't have let it get updated. It's now on the risk list and I don't see us updating that for a long time.

I think the work going on now to implement CLASSPATH isolation in Hadoop will really be beneficial here because we will be able to upgrade without worrying about these problems.

+1
Re: What is the limit to the number of properties set in the configuration object
they also get sent over the wire with things like job submissions, so can make things slower.

in my little grumpy project, https://github.com/steveloughran/grumpy , I actually stuck the groovy scripts into the config files as strings, so they'd be submitted with the jobs; the mapper/reducer would simply read the config, parse it as a method under the mapper context, then run it: https://github.com/steveloughran/grumpy/blob/master/src/main/groovy/org/apache/hadoop/grumpy/scripted/ScriptedMapper.groovy

On 15 Jun 2015, at 22:35, Colin P. McCabe cmcc...@apache.org wrote:
Much like zombo.com, the only limit is yourself. But huge Configuration objects are going to be really inefficient, so I would look elsewhere for storing lots of data. best, Colin

On Fri, Jun 12, 2015 at 7:30 PM, Sitaraman Vilayannur vrsitaramanietfli...@gmail.com wrote:
Thanks Allen, what is the total size limit? Sitaraman

On Fri, Jun 12, 2015 at 10:53 PM, Allen Wittenauer a...@altiscale.com wrote:
On Jun 12, 2015, at 12:37 AM, Sitaraman Vilayannur vrsitaramanietfli...@gmail.com wrote:
Hi, What is the limit on the number of properties that can be set using set(String s1, String s2) on the Configuration object for Hadoop? Is this limit configurable, and if so, what is the maximum that can be set?

It's a limit on the total size of the conf, not on the number of properties. In general, you shouldn't pack it full of stuff, as calling Configuration is expensive. Use a side-input/distributed cache file for mass quantities of bits.
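To see why a property-stuffed Configuration hurts: Hadoop serializes the configuration as XML and ships it with things like job submissions, so the wire size grows with every property. The sketch below models that cost with a plain dict-to-XML round trip; it has no Hadoop dependency, and the XML shape merely mimics (is not) Hadoop's actual writer.

```python
# Sketch: model the serialized size of a Hadoop-style configuration as the
# number of properties grows. Illustrative only; not Hadoop's real serializer.
import xml.etree.ElementTree as ET


def serialized_size(props):
    """Return the byte length of a <configuration> document holding props."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return len(ET.tostring(root))


if __name__ == "__main__":
    small = serialized_size({"key.%d" % i: "v" * 10 for i in range(10)})
    big = serialized_size({"key.%d" % i: "v" * 10 for i in range(10000)})
    # Every extra property is paid for again on each submission that
    # carries the conf over the wire.
    print("10 props: %d bytes, 10000 props: %d bytes" % (small, big))
```

This is the cost Allen and Colin are pointing at: the payload scales linearly with the property count, which is why a side-input or distributed cache file is the better home for bulk data.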