Re: [DISCUSS] Migrate hadoop from log4j1 to log4j2

2022-01-20 Thread Andrew Purtell
Just to clarify: I think you want to upgrade to Log4J2 (or switch to LogBack) 
as a strategy for new releases, but you have the option in maintenance releases 
to use Reload4J to maintain Appender API and operational compatibility, and 
users who want to minimize risks in production while mitigating the security 
issues will prefer that. 

> On Jan 20, 2022, at 8:59 AM, Andrew Purtell  wrote:
> 
> Reload4J has fixed all of those CVEs without requiring an upgrade. 
> 
>> On Jan 20, 2022, at 5:56 AM, Duo Zhang  wrote:
>> 
>> There are 3 new CVEs reported recently for log4j1[1][2][3], so I think it
>> is time to speed up the log4j2 migration work[4].
>> 
>> You can see the discussion on the jira issue[4]: our goal is to fully
>> migrate to log4j2, and the biggest blocker at the moment is log4j2's lack
>> of support for the "log4j.rootLogger=INFO,Console" grammar. I've already
>> started a discussion thread on the log4j dev mailing list[5], the response
>> is optimistic, and I've filed an issue for log4j2[6], but I do not think it
>> can be addressed and released soon. If we want to fully migrate to
>> log4j2 now, then we must either introduce new environment variables or
>> split the old HADOOP_ROOT_LOGGER variable in the startup scripts. Considering
>> the complexity of our current startup scripts, that work is not easy, and it
>> will also break lots of other hadoop deployment systems if they do not use
>> our startup scripts...
>> 
>> So after reconsidering the current situation, I prefer that we use the
>> log4j1.2 bridge to remove the log4j1 dependency first, and once LOG4J2-3341
>> is addressed and released, we start the full migration to log4j2. Of course
>> the log4j1.2 bridge has problems of its own: TaskLogAppender,
>> ContainerLogAppender and ContainerRollingLogAppender inherit from
>> FileAppender and RollingFileAppender in log4j1, which are not part of the
>> log4j1.2 bridge. But at least we could copy their source code into hadoop,
>> since we do have WriterAppender in the log4j1.2 bridge, and these two parent
>> classes do not have related CVEs.
>> 
>> Thoughts? For me I would like us to make a new 3.4.x release line to remove
>> the log4j1 dependencies ASAP.
>> 
>> Thanks.
>> 
>> 1. https://nvd.nist.gov/vuln/detail/CVE-2022-23302
>> 2. https://nvd.nist.gov/vuln/detail/CVE-2022-23305
>> 3. https://nvd.nist.gov/vuln/detail/CVE-2022-23307
>> 4. https://issues.apache.org/jira/browse/HADOOP-16206
>> 5. https://lists.apache.org/thread/gvfb3jkg6t11cyds4jmpo7lrswmx28w3
>> 6. https://issues.apache.org/jira/browse/LOG4J2-3341
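
(For illustration, a minimal sketch of the "copy the appender source" idea
described above — assuming the log4j-1.2-api bridge keeps providing
org.apache.log4j.WriterAppender with its log4j1 signatures; the class name
here is hypothetical, not the actual Hadoop patch:)

    import java.io.FileWriter;
    import java.io.IOException;
    import org.apache.log4j.PatternLayout;
    import org.apache.log4j.WriterAppender;

    // A file-backed appender built only on classes the bridge does ship,
    // standing in for the FileAppender that the bridge does not.
    public class CopiedFileAppender extends WriterAppender {
        public CopiedFileAppender(String fileName) throws IOException {
            setLayout(new PatternLayout("%d %p %c: %m%n"));
            // true = append mode, matching log4j1 FileAppender's default.
            setWriter(new FileWriter(fileName, true));
        }
    }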




Re: [DISCUSS] A final minor release off branch-2?

2017-11-15 Thread Andrew Purtell
> From recent classpath isolation work, I was surprised to find out that
> many of our downstream projects (HBase, Tez, etc.) are still consuming many
> non-public, server-side APIs of Hadoop, not to mention the projects/products
> outside of the hadoop ecosystem. Our API compatibility tests do not (and
> should not) cover these cases and situations. We can claim that a new major
> release shouldn't be responsible for these private API changes.

Would you consider filing HBase JIRAs for what are in your opinion the
worst offenses? We can at least take a look.



On Wed, Nov 15, 2017 at 1:37 AM, Junping Du  wrote:

> Thanks Vinod for bringing up this discussion, which is just in time.
>
> I agree with most responses that option C is not a good choice, as our
> community bandwidth is precious and we should focus development, testing and
> deployment on a very limited set of mainstream branches. Of course, we should
> still follow the Apache way and allow any interested committer to roll
> his/her own release for specific requirements beyond the mainstream releases.
>
> I am not biased toward option A or B (I will discuss this later), but I think
> a bridge release for upgrading to and back from 3.x is very necessary.
> The reasons are obvious:
> 1. Given the lessons learned from the previous migration from 1.x to
> 2.x, no matter how careful we tend to be, there is still a chance that some
> level of compatibility (source, binary, configuration, etc.) gets broken in
> the migration to a new major release. Some of these incompatibilities can
> only be identified at runtime, after the GA release is widely deployed in
> production clusters - we have tons of downstream projects and numerous
> configurations, and we cannot cover them all with in-house deployment and
> test.
>
> 2. From recent classpath isolation work, I was surprised to find out that
> many of our downstream projects (HBase, Tez, etc.) are still consuming many
> non-public, server-side APIs of Hadoop, not to mention the projects/products
> outside of the hadoop ecosystem. Our API compatibility tests do not (and
> should not) cover these cases and situations. We can claim that a new major
> release shouldn't be responsible for these private API changes. But given
> the possibility of breaking existing applications in some way, users could
> be very hesitant to migrate to a 3.x release if there is no safe solution to
> roll back.
>
> 3. Besides incompatibilities, it is also possible for new hadoop releases to
> have performance regressions (lower throughput, higher latency, slower job
> runs, bigger memory footprint or even memory leaks, etc.).
> While the performance impact of migration (if any) could be negligible to
> some users, other users could be very sensitive and wish to roll back if it
> happens on their production cluster.
>
> As Andrew mentioned in earlier email threads, some work has been done to
> verify rolling upgrade from 2.x to 3.0 (just curious: which 2.x release was
> the upgrade tested from? 2.8.2, or 2.9.0 which is still being released?).
> But I am not aware of any work being done now to test downgrade
> from 3.0 to 2.x (correct me if I missed any work). If users hit any of the
> three situations I mentioned above, then we should give them the chance to
> roll back if they are really conservative about these unexpected side
> effects of upgrading. Given this, we should have a bridge release to cover
> safe rollback from 3.0 (rolling or not). I am not sure whether it
> should be 2.9.x or 2.10.x for now (we can just call it the 2.BR release),
> because we are not yet sure exactly what changes we should include to
> support rollback from 3.0. We can defer this decision until we have better
> ideas.
>
> Summary for my two cents:
> - No more feature releases should happen on branch-2. 2.9 or 2.10 should be
> the last minor release (community mainstream) on branch-2
>
> - A bridge release is necessary for safe upgrade/downgrade to 3.x
>
> - We can decide later whether 2.10 is necessary, when the scope of the
> bridge release is clearer.
>
>
> Thanks,
>
> Junping
>
> 
> From: Andrew Wang 
> Sent: Tuesday, November 14, 2017 2:25 PM
> To: Wangda Tan
> Cc: Steve Loughran; Vinod Kumar Vavilapalli; Kai Zheng; Arun Suresh;
> common-...@hadoop.apache.org; yarn-...@hadoop.apache.org; Hdfs-dev;
> mapreduce-dev@hadoop.apache.org
> Subject: Re: [DISCUSS] A final minor release off branch-2?
>
> To follow up on my earlier email, I don't think there's need for a bridge
> release given that we've successfully tested rolling upgrade from 2.x to
> 3.0.0. I expect we'll keep making improvements to smooth over any
> additional incompatibilities found, but there isn't a requirement that a
> user upgrade to a bridge release before upgrading to 3.0.
>
> Otherwise, I don't have a strong opinion about when to discontinue branch-2
> releases. Historically, a release line is 

Re: [VOTE] Release Apache Hadoop 2.7.3 RC1

2016-08-18 Thread Andrew Purtell
> What is a realistic strategy for us to evolve the HDFS audit log in a
> backward-compatible way?  If the API is essentially any form of ad-hoc
> scripting, then for any proposed audit log format change, I can find a
> reason to veto it on grounds of backward incompatibility.

Yeah, when log scraping is the only way to get at the information, the API
surface expands to cover all manner of ad-hoc scripting.

Not sure moving away from emitting audit information in log lines would be
operator friendly. That's a tough one. Just about everything in the
ecosystem emits audit information as log lines. If Hadoop switches strategy
and becomes a one-off doing something different, this would be painful.

Assuming log lines will remain the way we receive audit events from
Hadoop/HDFS, please consider freezing any changes to audit logging today,
developing a formal specification, adding the specification to the
documentation, and then taking care not to break the specification between
releases. Because audit logging from the NN comes from low-level places in
FSNameSystem, this is going to constrain maintenance and refactoring of that
and related code, so with my software-maintainer hat on I feel your pain in
advance. You'll want to hash out what level of compatibility you'd like to
offer. I'd recommend changing it only in major releases.
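
(To make the "log emissions are your API" point concrete, here is a minimal
sketch of the kind of scraper that ends up freezing the format — assuming
the usual whitespace-separated key=value fields such as ugi=, ip=, cmd=; the
exact field set varies by release:)

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class AuditLineParser {
        // Matches key=value pairs; a value runs to the next whitespace.
        private static final Pattern KV = Pattern.compile("(\\w+)=(\\S*)");

        public static Map<String, String> parse(String line) {
            Map<String, String> fields = new LinkedHashMap<>();
            Matcher m = KV.matcher(line);
            while (m.find()) {
                fields.put(m.group(1), m.group(2));
            }
            return fields;
        }

        public static void main(String[] args) {
            Map<String, String> f = parse(
                "allowed=true ugi=alice ip=/10.0.0.1 cmd=open src=/f dst=null");
            // Renaming a field, or switching ip= to IPv6 syntax, silently
            // breaks every consumer that assumes the old shape.
            System.out.println(f.get("cmd") + " from " + f.get("ip"));
        }
    }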

On Thu, Aug 18, 2016 at 10:04 AM, Chris Nauroth <cnaur...@hortonworks.com>
wrote:

> Andrew, thanks for adding your perspective on this.
>
>
> What is a realistic strategy for us to evolve the HDFS audit log in a
> backward-compatible way?  If the API is essentially any form of ad-hoc
> scripting, then for any proposed audit log format change, I can find a
> reason to veto it on grounds of backward incompatibility.
>
> - I can’t add a new field on the end, because that would break an awk
> script that uses $NF expecting to find a specific field.
> - I can’t prepend a new field, because that would break a "cut -f1"
> expecting to find the timestamp.
> - HDFS can’t add any new features, because someone might have written a
> script that does "exit 1" if it finds an unexpected RPC in the "cmd=" field.
> - Hadoop is not allowed to add full IPv6 support, because someone might
> have written a script that looks at the "ip=" field and parses it by IPv4
> syntax.
>
> On the CLI, a potential solution for evolving the output is to preserve
> the old format by default and only enable the new format if the user
> explicitly passes a new argument.  What should we do for the audit log?
> Configuration flags in hdfs-site.xml?  (That of course adds its own brand
> of complexity.)
>
>
> I’m particularly interested to hear potential solutions from people like
> Andrew and Allen who have been most vocal about the need for a stable
> format.  Without a solution, this unfortunately devolves into the format
> being frozen within a major release line.
>
> We could benefit from getting a patch on the compatibility doc that
> addresses the HDFS audit log specifically.
>
> --Chris Nauroth
>
> On 8/18/16, 8:47 AM, "Andrew Purtell" <andrew.purt...@gmail.com> wrote:
>
> An incompatible API change is developer unfriendly. An incompatible
> behavioral change is operator unfriendly. Historically, one dimension of
> incompatibility has had a lot more mindshare than the other. It's great
> that this might be changing for the better.
>
> Where I work, when we move from one Hadoop 2.x minor to another we
> always spend time updating our deployment plans, alerting, log scraping,
> and related things due to changes. Some are debatable as to whether they
> qualify for the 'incompatible' designation. I think the audit logging change
> that triggered this discussion is a good example of one that does. If you
> want to audit HDFS actions, those log emissions are your API. (Inotify
> doesn't offer access control events.) One has to code regular expressions to
> parse them and reverse engineer under what circumstances an audit line is
> emitted so you can make assumptions about what transpired. Change either
> and you might break someone's automation for meeting industry or legal
> compliance obligations. Not a trivial matter. If you don't operate Hadoop
> in production you might not realize the implications of such a change. Glad
> to see Hadoop has the community diversity to recognize it in some cases.
>
> > On Aug 18, 2016, at 6:57 AM, Junping Du <j...@hortonworks.com> wrote:
> >
> > I think Allen's previous comments are very misleading.
> > In my understanding, only incompatible API (RPC, CLIs, WebService,
> etc.) shouldn't land on branch-2, but other incompatible behaviors (logs,
> audit-log, daemon's restart, e

Re: [VOTE] Release Apache Hadoop 2.7.3 RC1

2016-08-18 Thread Andrew Purtell
An incompatible API change is developer unfriendly. An incompatible behavioral
change is operator unfriendly. Historically, one dimension of incompatibility
has had a lot more mindshare than the other. It's great that this might be
changing for the better.

Where I work, when we move from one Hadoop 2.x minor to another we always spend
time updating our deployment plans, alerting, log scraping, and related things
due to changes. Some are debatable as to whether they qualify for the
'incompatible' designation. I think the audit logging change that triggered
this discussion is a good example of one that does. If you want to audit HDFS
actions, those log emissions are your API. (Inotify doesn't offer access
control events.) One has to code regular expressions to parse them and reverse
engineer under what circumstances an audit line is emitted so you can make
assumptions about what transpired. Change either and you might break someone's
automation for meeting industry or legal compliance obligations. Not a trivial
matter. If you don't operate Hadoop in production you might not realize the
implications of such a change. Glad to see Hadoop has the community diversity
to recognize it in some cases.

> On Aug 18, 2016, at 6:57 AM, Junping Du  wrote:
> 
> I think Allen's previous comments are very misleading.
> In my understanding, only incompatible API changes (RPC, CLIs, WebService,
> etc.) shouldn't land on branch-2, while other incompatible behaviors (logs,
> audit-log, daemon restart, etc.) should be handled flexibly for landing.
> Otherwise, how could 52 issues (https://s.apache.org/xJk5) marked with
> incompatible-changes have landed on branch-2 after the 2.2.0 release? Most
> of them are already released.
> 
> Thanks,
> 
> Junping
> 
> From: Vinod Kumar Vavilapalli 
> Sent: Wednesday, August 17, 2016 9:29 PM
> To: Allen Wittenauer
> Cc: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
> yarn-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> Subject: Re: [VOTE] Release Apache Hadoop 2.7.3 RC1
> 
> I always look at CHANGES.txt entries for incompatible-changes and this JIRA 
> obviously wasn’t there.
> 
> Anyways, this shouldn’t be in any of branch-2.* as committers there clearly 
> mentioned that this is an incompatible change.
> 
> I am reverting the patch from branch-2* .
> 
> Thanks
> +Vinod
> 
>> On Aug 16, 2016, at 9:29 PM, Allen Wittenauer  
>> wrote:
>> 
>> 
>> 
>> -1
>> 
>> HDFS-9395 is an incompatible change:
>> 
>> a) Why is it not marked as such in the changes file?
>> b) Why is an incompatible change in a micro release, much less a minor?
>> c) Where is the release note for this change?
>> 
>> 
>>> On Aug 12, 2016, at 9:45 AM, Vinod Kumar Vavilapalli  
>>> wrote:
>>> 
>>> Hi all,
>>> 
>>> I've created a release candidate RC1 for Apache Hadoop 2.7.3.
>>> 
>>> As discussed before, this is the next maintenance release to follow up 
>>> 2.7.2.
>>> 
>>> The RC is available for validation at: 
>>> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC1/ 
>>> 
>>> 
>>> The RC tag in git is: release-2.7.3-RC1
>>> 
>>> The maven artifacts are available via repository.apache.org at
>>> https://repository.apache.org/content/repositories/orgapachehadoop-1045/
>>> 
>>> 
>>> The release-notes are inside the tar-balls at location
>>> hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I
>>> hosted this at home.apache.org/~vinodkv/hadoop-2.7.3-RC1/releasenotes.html
>>> for your quick perusal.
>>> 
>>> As you may have noted,
>>> - a few issues with RC0 forced an RC1 [1]
>>> - a very long fix-cycle for the License & Notice issues (HADOOP-12893) 
>>> caused 2.7.3 (along with every other Hadoop release) to slip by quite a 
>>> bit. This release's related discussion thread is linked below: [2].
>>> 
>>> Please try the release and vote; the vote will run for the usual 5 days.
>>> 
>>> Thanks,
>>> Vinod
>>> 
>>> [1] [VOTE] Release Apache Hadoop 2.7.3 RC0: 
>>> https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/index.html#26106 
>>> 
>>> [2]: 2.7.3 release plan: 
>>> https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html 
>>> 

Re: 2.7.3 release plan

2016-04-01 Thread Andrew Purtell
As a downstream consumer of Apache Hadoop 2.7.x releases, I expect we would
patch the release to revert HDFS-8791 before pushing it out to production.
For what it's worth.


On Fri, Apr 1, 2016 at 11:23 AM, Andrew Wang 
wrote:

> One other thing I wanted to bring up regarding HDFS-8791, we haven't
> backported the parallel DN upgrade improvement (HDFS-8578) to branch-2.6.
> HDFS-8578 is a very important related fix since otherwise upgrade will be
> very slow.
>
> On Thu, Mar 31, 2016 at 10:35 AM, Andrew Wang 
> wrote:
>
> > As I expressed on HDFS-8791, I do not want to include this JIRA in a
> > maintenance release. I've only seen it crop up on a handful of our
> > customers' clusters, and large users like Twitter and Yahoo that seem to
> > be more affected are also the most able to patch this change in
> > themselves.
> >
> > Layout upgrades are quite disruptive, and I don't think it's worth
> > breaking upgrade and downgrade expectations when it doesn't affect the
> (in
> > my experience) vast majority of users.
> >
> > Vinod seemed to have a similar opinion in his comment on HDFS-8791, but
> > will let him elaborate.
> >
> > Best,
> > Andrew
> >
> > On Thu, Mar 31, 2016 at 9:11 AM, Sean Busbey 
> wrote:
> >
> >> As of 2 days ago, there were already 135 jiras associated with 2.7.3,
> >> if *any* of them end up introducing a regression the inclusion of
> >> HDFS-8791 means that folks will have cluster downtime in order to back
> >> things out. If that happens to any substantial number of downstream
> >> folks, or any particularly vocal downstream folks, then it is very
> >> likely we'll lose the remaining trust of operators for rolling out
> >> maintenance releases. That's a pretty steep cost.
> >>
> >> Please do not include HDFS-8791 in any 2.6.z release. Folks having to
> >> be aware that an upgrade from e.g. 2.6.5 to 2.7.2 will fail is an
> >> unreasonable burden.
> >>
> >> I agree that this fix is important, I just think we should either cut
> >> a version of 2.8 that includes it or find a way to do it that gives an
> >> operational path for rolling downgrade.
> >>
> >> On Thu, Mar 31, 2016 at 10:10 AM, Junping Du 
> wrote:
> >> > Thanks for bringing up this topic, Sean.
> >> > When I released our latest Hadoop release 2.6.4, the patch for
> >> > HDFS-8791 hadn't been committed yet, which is why we didn't discuss
> >> > this earlier.
> >> > I remember that in the JIRA discussion we treated this layout change as
> >> > a Blocker bug fixing a significant performance regression, not as a
> >> > normal performance improvement. And I believe the HDFS community
> >> > already did their best, with care and patience, to deliver the fix and
> >> > other related patches (like the upgrade fix in HDFS-8578). Take
> >> > HDFS-8578 as an example: you can see 30+ rounds of patch review back
> >> > and forth by senior committers, not to mention the outstanding
> >> > performance test data in HDFS-8791.
> >> > I would trust our HDFS committers' judgement to land HDFS-8791 on
> >> > 2.7.3. However, that needs final confirmation from Vinod, who serves as
> >> > RM for branch-2.7. In addition, I don't see any blocker issue
> >> > preventing it from going into 2.6.5 now.
> >> > Just my 2 cents.
> >> >
> >> > Thanks,
> >> >
> >> > Junping
> >> >
> >> > 
> >> > From: Sean Busbey 
> >> > Sent: Thursday, March 31, 2016 2:57 PM
> >> > To: hdfs-...@hadoop.apache.org
> >> > Cc: Hadoop Common; yarn-...@hadoop.apache.org;
> >> mapreduce-dev@hadoop.apache.org
> >> > Subject: Re: 2.7.3 release plan
> >> >
> >> > A layout change in a maintenance release sounds very risky. I saw some
> >> > discussion on the JIRA about those risks, but the consensus seemed to
> >> > be "we'll leave it up to the 2.6 and 2.7 release managers." I thought
> >> > we did RMs per release rather than per branch? No one claiming to be a
> >> > release manager ever spoke up AFAICT.
> >> >
> >> > Should this change be included? Should it go into a special 2.8
> >> > release as mentioned in the ticket?
> >> >
> >> > On Thu, Mar 31, 2016 at 1:45 AM, Akira AJISAKA
> >> >  wrote:
> >> >> Thank you Vinod!
> >> >>
> >> >> FYI: 2.7.3 will be a bit of a special release.
> >> >>
> >> >> HDFS-8791 bumped up the datanode layout version,
> >> >> so rolling downgrade from 2.7.3 to 2.7.[0-2]
> >> >> is impossible. We can roll back instead.
> >> >>
> >> >> https://issues.apache.org/jira/browse/HDFS-8791
> >> >>
> >>
> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
> >> >>
> >> >> Regards,
> >> >> Akira
> >> >>
> >> >>
> >> >> On 3/31/16 08:18, Vinod Kumar Vavilapalli wrote:
> >> >>>
> >> >>> Hi all,
> >> >>>
> >> >>> Got nudged about 2.7.3. Was previously waiting for 2.6.4 to go out
> >> (which
> >> >>> did go out mid February). Got a little busy since.
> >> >>>
> >> >>> Following up the 2.7.2 

[jira] [Created] (MAPREDUCE-5657) [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments

2013-11-27 Thread Andrew Purtell (JIRA)
Andrew Purtell created MAPREDUCE-5657:
-

 Summary: [JDK8] Fix Javadoc errors caused by incorrect or illegal 
tags in doc comments
 Key: MAPREDUCE-5657
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5657
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0, 2.3.0
Reporter: Andrew Purtell
Priority: Minor
 Attachments: 5657-branch-2.patch, 5657-trunk.patch

Javadoc is stricter by default in JDK8 and will error out on malformed or
illegal tags found in doc comments. Although tagged as JDK8, all of the
required changes are generic Javadoc cleanups.
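
For illustration, a hypothetical example of the class of cleanup involved
(JDK8's doclint rejects malformed HTML and illegal tags that JDK7 tolerated):

{noformat}
public class JavadocExample {
  /**
   * BAD under JDK8: errors on the unescaped '<' and '&' and on the
   * self-closing tag. Returns x if x < y & y > 0.<p/>
   */
  public static int before(int x, int y) { return x; }

  /**
   * FIXED: entities escaped, legal tag form used.
   * Returns x if x &lt; y &amp; y &gt; 0.
   * <p>
   */
  public static int after(int x, int y) { return x; }
}
{noformat}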



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: [VOTE] Release Apache Hadoop 2.0.4-alpha

2013-04-12 Thread Andrew Purtell
I find that branch-2.0.4-alpha won't compile for me.
o.a.h.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler is
missing an import for ResourceRequest, or ResourceRequest is not available
on the branch.
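
(For reference, a sketch of the missing import — assuming ResourceRequest
still lives in its usual package on that branch:)

    import org.apache.hadoop.yarn.api.records.ResourceRequest;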


On Thu, Apr 11, 2013 at 4:27 PM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:


 Talked to Arun offline and merged this into 2.0.4-alpha. Also fixed
 CHANGES.txt.

 Thanks,
 +Vinod

 On Apr 10, 2013, at 9:10 PM, Alejandro Abdelnur wrote:

  I've committed HADOOP-9471 to trunk and branch-2 and closed the JIRA with
  fixedVersion 2.0.5.
 
  If this JIRA makes it to 2.0.4 we need to update CHANGES.txt in
  trunk/branch-2 and the fixedVersion in the JIRA.
 
  Thx.
 
 
  On Tue, Apr 9, 2013 at 8:39 PM, Arun C Murthy a...@hortonworks.com
 wrote:
 
  Folks,
 
  I've created a release candidate (rc0) for hadoop-2.0.4-alpha that I
 would
  like to release.
 
  This is a bug-fix release which solves a number of issues discovered
  during integration testing of the full-stack.
 
  The RC is available at:
  http://people.apache.org/~acmurthy/hadoop-2.0.4-alpha-rc0/
  The RC tag in svn is here:
 
 http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.4-alpha-rc0
 
  The maven artifacts are available via repository.apache.org.
 
  Please try the release and vote; the vote will run for the usual 7 days.
 
  thanks,
  Arun
 
  P.S. Many thanks are in order - Roman/Cos and rest of BigTop community
 for
  helping to find a number of integration issues, Ted Yu for
 co-ordinating on
  HBase, Alejandro for co-ordinating on Oozie,
 Vinod/Sid/Alejandro/Xuan/Daryn
  and rest of devs for quickly jumping and fixing these.
 
 
  --
  Arun C. Murthy
  Hortonworks Inc.
  http://hortonworks.com/
 
 
 
 
 
  --
  Alejandro




-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: [VOTE] Release Apache Hadoop 2.0.4-alpha

2013-04-12 Thread Andrew Purtell
Thanks Roman, I'll use the tarball.

On Friday, April 12, 2013, Roman Shaposhnik wrote:

 On Fri, Apr 12, 2013 at 12:32 PM, Andrew Purtell 
 apurt...@apache.orgjavascript:;
 wrote:
  I find that branch-2.0.4-alpha won't compile for me.
  o.a.h.yarn.server.resourcemanager.schduler.fifo.TestFifoScheduler is
  missing an import for ResourceRequest or ResourceRequest is not available
  on the branch.

 Hm. RC1 as posted by Arun today compiles fine for me on these arcs:

 http://bigtop01.cloudera.org:8080/view/Bigtop-trunk/job/Bigtop-trunk-Hadoop/

 In fact I wanted to give it a test run later today.

 Thanks,
 Roman.



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: [Vote] Merge branch-trunk-win to trunk

2013-03-25 Thread Andrew Purtell
Noticed this too. Simply a 'public' modifier is missing, but it's unclear
how this could not have been caught prior to check-in.
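
(A sketch of the one-line fix the compile error below implies — assuming
Path.WINDOWS is the usual os.name check; this is not the committed patch:)

    // in org.apache.hadoop.fs.Path
    -  static final boolean WINDOWS =
    +  public static final boolean WINDOWS =
           System.getProperty("os.name").startsWith("Windows");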


On Mon, Mar 25, 2013 at 9:17 PM, Konstantin Boudnik c...@apache.org wrote:

 It doesn't look like any progress has been made on the ticket below in the
 last 3 weeks. And now branch-2 can't be compiled because of


 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java:[895,15]
 WINDOWS is not public in org.apache.hadoop.fs.Path; cannot be accessed from
 outside package

 That's exactly why I was -1'ing this...
   Cos

 On Mon, Mar 04, 2013 at 05:41PM, Matt Foley wrote:
  Thanks, gentlemen.  I've opened and taken responsibility for
  https://issues.apache.org/jira/browse/HADOOP-9359.  Giri Kesavan has
 agreed
  to help with the parts that require Jenkins admin access.
 
  Thanks,
  --Matt
 
 
 
  On Mon, Mar 4, 2013 at 5:00 PM, Konstantin Shvachko 
 shv.had...@gmail.comwrote:
 
   +1 on the merge.
  
   I am glad we agreed.
   Having Jira to track the CI effort is a good idea.
  
   Thanks,
   --Konstantin
  
   On Mon, Mar 4, 2013 at 3:29 PM, Matt Foley mfo...@hortonworks.com
 wrote:
Thanks.  I agree Windows -1's in test-patch should not block commits.
   
--Matt
   
   
   
On Mon, Mar 4, 2013 at 2:30 PM, Konstantin Shvachko 
   shv.had...@gmail.com
wrote:
   
On Mon, Mar 4, 2013 at 12:22 PM, Matt Foley mfo...@hortonworks.com
 
wrote:
 Konstantine, you have voted -1, and stated some requirements
 before
 you'll
 withdraw that -1.  As I plan to do work to fulfill those
   requirements, I
 want to make sure that what I'm proposing will, in fact, satisfy
 you.
 That's why I'm asking, if we implement full test-patch
 integration
   for
 Windows, does it seem to you that that would provide adequate
 support?
   
Yes.
   
 I have learned not to presume that my interpretation is correct.
  My
 interpretation of item #1 is that test-patch provides pre-commit
   build,
 so
 it would satisfy item #1.  But rather than assuming that I am
 interpreting
 it correctly, I simply want your agreement that it would, or if
 not,
 clarification why it won't.
   
I agree it will satisfy my item #1.
I did not agree in my previous email, but I changed my mind based on
the latest discussion. I have to explain why now.
 I was proposing a nightly build because I did not want a pre-commit build
 for Windows to block commits to Linux. But if people are fine just ignoring
 -1s for the Windows part of the build it should be good.
   
 Regarding item #2, it is also my interpretation that test-patch
   provides
 an
 on-demand (perhaps 20-minutes deferred) Jenkins build and unit
 test,
 with
 logs available to the developer, so it would satisfy item #2.  But
 rather
 than assuming that I am interpreting it correctly, I simply want
 your
 agreement that it would, or if not, clarification why it won't.
   
It will satisfy my item #2 in the following way:
I can duplicate your pre-commit build for Windows and add an input
parameter, which would let people run the build on their patches
chosen from local machine rather than attaching them to Jiras.
   
Thanks,
--Konstantin
   
 In agile terms, you are the Owner of these requirements.  Please
 give
   me
 owner feedback as to whether my proposed work sounds like it will
 satisfy
 the requirements.

 Thank you,
 --Matt


 On Sun, Mar 3, 2013 at 12:16 PM, Konstantin Shvachko
 shv.had...@gmail.com
 wrote:

  Didn't I explain in detail what I am asking for?

 Thanks,
 --Konst

 On Sun, Mar 3, 2013 at 11:08 AM, Matt Foley 
 mfo...@hortonworks.com
 wrote:
  Hi Konstantin,
  I'd like to point out two things:
  First, I already committed in this thread (email of Thu, Feb
 28,
   2013
  at
  6:01 PM) to providing CI for Windows builds.  So please stop
 acting
  like
  I'm
  resisting this idea or something.
  Second, you didn't answer my question, you just kvetched about
 the
  phrasing.
  So I ask again:
 
  Will providing full test-patch integration (pre-commit build
 and
  unit
  test
  triggered by Jira Patch Available state) satisfy your
 request for
  functionality #1 and #2?  Yes or no, please.
 
  Thanks,
  --Matt
 
 
  On Sat, Mar 2, 2013 at 7:32 PM, Konstantin Shvachko
  shv.had...@gmail.com
  wrote:
 
  Hi Matt,
 
  On Sat, Mar 2, 2013 at 12:32 PM, Matt Foley 
   mfo...@hortonworks.com
  wrote:
   Konstantin,
   I would like to explore what it would take to remove this
   perceived
   impediment --
 
  Glad you decided to explore. Thank you.
 
   although I reserve the right to argue that this is not
   pre-requisite to merging the cross-platform 

Re: [Vote] Merge branch-trunk-win to trunk

2013-03-25 Thread Andrew Purtell
Sorry, that was my error selecting the wrong reply option.


On Mon, Mar 25, 2013 at 10:25 PM, Konstantin Shvachko
shv.had...@gmail.comwrote:

 Andrew, this used to be on all -dev lists. Let's keep it that way.

 To the point.
 Does this mean that people are silently porting windows changes to
 branch-2?
 New features on a branch should be voted first, no?

 Thanks,
 --Konstantin


 On Mon, Mar 25, 2013 at 1:36 PM, Andrew Purtell apurt...@apache.org
 wrote:
  Noticed this too. Simply a 'public' modifier is missing, but it's unclear
  how this could not have been caught prior to check-in.
 
 
Re: Release numbering for branch-2 releases

2013-02-01 Thread Andrew Purtell
On Fri, Feb 1, 2013 at 2:34 AM, Tom White t...@cloudera.com wrote:

 Possibly the reason for Stack's consternation is that this is a
 Hadoop-specific versioning scheme, rather than a standard one like
 Semantic Versioning (http://semver.org/) which is more widely
 understood.


If I can offer an alternate and likely more accurate divination, I think
it's the idea of having API incompatibility (also protocol incompatibility)
with each 2.x release.

The preference I believe is for API incompatibilities /
protocol incompatibilities to trigger a major release increment rather than
be rolled into the 2.x branch. Alternatively, I think I can anticipate the
concerns, but have you considered introducing feature flags into the RPC
protocols? Protobuf is a tagged format; by design, readers can deal with
missing or unexpected optional fields as long as sender and receiver can
negotiate a lingua franca (feature flags are one way).
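
(A minimal sketch of the feature-flag idea, with hypothetical names: each
side advertises a bitmask of capabilities and both sides use only the
intersection. Carrying such a mask in an optional protobuf field is
backward-compatible, since old readers simply ignore it:)

    import java.util.EnumSet;

    public class FeatureFlags {
        enum Feature { BASE_RPC, NEW_AUDIT_FIELDS, IPV6_ADDRESSES }

        // Serialize our capabilities as a bitmask for the RPC handshake.
        static long encode(EnumSet<Feature> features) {
            long mask = 0;
            for (Feature f : features) mask |= 1L << f.ordinal();
            return mask;
        }

        // The negotiated protocol is the intersection of both masks.
        static EnumSet<Feature> negotiate(long ours, long theirs) {
            EnumSet<Feature> common = EnumSet.noneOf(Feature.class);
            for (Feature f : Feature.values()) {
                if ((ours & theirs & (1L << f.ordinal())) != 0) {
                    common.add(f);
                }
            }
            return common;
        }
    }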

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


[jira] [Created] (MAPREDUCE-4769) Pipes build problem with recent OpenSSL libs

2012-11-02 Thread Andrew Purtell (JIRA)
Andrew Purtell created MAPREDUCE-4769:
-

 Summary: Pipes build problem with recent OpenSSL libs
 Key: MAPREDUCE-4769
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4769
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: pipes
Affects Versions: 3.0.0, 2.0.3-alpha
 Environment: CentOS 6.2 x86_64
Reporter: Andrew Purtell


Seems to be a problem with CMake not figuring out that the linker also needs
-lcrypto with recent OpenSSL. Observed on two CentOS 6 build servers after
'yum update' pulled down an openssl-devel update.

{noformat}
 [exec] Linking CXX executable examples/pipes-sort
 [exec] /usr/bin/cmake -E cmake_link_script CMakeFiles/pipes-sort.dir/li
 [exec] /usr/bin/c++-g -Wall -O2 -D_REENTRANT -D_FILE_OFFSET_BITS=64
 [exec] /usr/bin/ld: libhadooppipes.a(HadoopPipes.cc.o): undefined refer
 [exec] /usr/bin/ld: note: 'BIO_ctrl' is defined in DSO /lib64/libcrypto
 [exec] /lib64/libcrypto.so.10: could not read symbols: Invalid operatio
 [exec] collect2: ld returned 1 exit status
 [exec] make[3]: *** [examples/pipes-sort] Error 1
 [exec] make[2]: *** [CMakeFiles/pipes-sort.dir/all] Error 2
 [exec] make[1]: *** [all] Error 2
{noformat}

This works around the problem:

{noformat}
diff --git hadoop-tools/hadoop-pipes/src/CMakeLists.txt hadoop-tools/hadoop-
index a1ee97d..29cfba7 100644
--- hadoop-tools/hadoop-pipes/src/CMakeLists.txt
+++ hadoop-tools/hadoop-pipes/src/CMakeLists.txt
@@ -71,5 +71,6 @@ add_library(hadooppipes STATIC
 )
 target_link_libraries(hadooppipes
 ${OPENSSL_LIBRARIES}
+crypto
 pthread
 )

{noformat}

Builds with -Pnative won't complete without this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


HA MRv1 JobTracker?

2012-06-16 Thread Andrew Purtell
We are planning to run a next generation of Hadoop ecosystem components in
our production in a few months. We plan to use HDFS 2.0 for the HA NameNode
work. The platform will also include YARN but its use will be experimental.
So we'll be running something equivalent to the CDH MR1 package to support
production workloads for I'd guess a year.

We have heard a rumor regarding the existence of a version of the MR1
Jobtracker that persists state to Zookeeper such that failover to a new
instance is fast and doesn't lose job state. I'd like to be aspirational
and aim for an HA MR1 Jobtracker to complement the HA namenode. Even if no
such existing code is available, we might adapt existing classes in the MR1
Jobtracker into models/proxies of state in zookeeper. For clusters of our
size (in the 100s-of-nodes range) this could be workable. Also, the MR
client could possibly use ZK for failover, like the HDFS client.
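
(To make the idea concrete, a minimal sketch of persisting job state to
ZooKeeper with the plain client API — the znode layout and serialization are
hypothetical, not the rumored implementation:)

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class JobStateStore {
        private final ZooKeeper zk;

        public JobStateStore(String quorum) throws Exception {
            // 30s session timeout; watch events ignored for brevity.
            // Assumes the parent path /jobtracker/jobs already exists.
            this.zk = new ZooKeeper(quorum, 30000, event -> { });
        }

        // Create or overwrite the serialized state of one job, so a
        // failed-over Jobtracker can reload it instead of losing the job.
        public void saveJob(String jobId, byte[] state) throws Exception {
            String path = "/jobtracker/jobs/" + jobId;
            try {
                zk.create(path, state, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                          CreateMode.PERSISTENT);
            } catch (KeeperException.NodeExistsException e) {
                zk.setData(path, state, -1); // -1 matches any version
            }
        }
    }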

First, I'm trying to find out whether such code already exists, if anyone
knows. Otherwise, we may try building this, and so I'd also like to get a
sense of any interest in usage or dev collaboration.

Best regards,

- Andy




-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


HBASE-5966: MapReduce based tests broken on Hadoop 2.0.0-alpha

2012-05-08 Thread Andrew Purtell
Dear mapreduce-dev,

If one of you enterprising fellows feels like taking a look, perhaps
the errors we are seeing trying to run HBase's MapReduce based tests
atop Hadoop 2.0.0-alpha look to have a familiar cause?

https://issues.apache.org/jira/browse/HBASE-5966

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)


[jira] [Created] (MAPREDUCE-3654) MiniMRYarnCluster should set MASTER_ADDRESS to local

2012-01-10 Thread Andrew Purtell (Created) (JIRA)
MiniMRYarnCluster should set MASTER_ADDRESS to local
--

 Key: MAPREDUCE-3654
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3654
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 0.23.1
Reporter: Andrew Purtell


I needed to make the attached change in order for MiniMRCluster based HBase 
tests to get past job client initialization.
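
A sketch of the kind of change meant here (the actual patch is attached;
MRConfig.MASTER_ADDRESS is assumed to be the relevant key):

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.MRConfig;

// Fragment: point the job client at the local framework master so
// MiniMRYarnCluster based tests can get past client initialization.
Configuration conf = new Configuration();
conf.set(MRConfig.MASTER_ADDRESS, "local");
{noformat}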

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Research projects for hadoop

2011-09-09 Thread Andrew Purtell
Both Hadoop and virtualization are means to an end. That end is to consolidate 
workloads traditionally deployed to separate servers so the average utilization 
and ROI of a given server increases.

Companies looking to consolidate data-intensive computation may be better
served by moving to Hadoop infrastructure than by a virtualization project.
Let me give you an example:

 From: Saikat Kanjilal [mailto:sxk1...@hotmail.com]
 By assigning a virtual machine to a datanode, we effectively isolate
 the datanode from the load on the machine caused by other processes, making
 the datanode more responsive/reliable.


One can set up virtual partitions of CPU and RAM resources that can be fairly 
independent, but attempting to stack I/O intensive workloads on top of each 
other via virtualization is a recipe for lower performance, negative ROI, 
and dissatisfied users.

Best regards,


   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)


- Original Message -
 From: Segel, Mike mse...@navteq.com
 To: common-...@hadoop.apache.org common-...@hadoop.apache.org; 
 mapreduce-dev@hadoop.apache.org mapreduce-dev@hadoop.apache.org
 Cc: 
 Sent: Friday, September 9, 2011 10:45 AM
 Subject: RE: Research projects for hadoop
 
 Why would you want to take a perfectly good machine and then try to 
 virtualize 
 it?
 I mean if I have 4 quad core cpus, I can run a lot of simultaneous map tasks.
 However, if I virtualize the box, I lose at least 1 core per VM, so I end up
 with 4 nodes that have less capability and performance than I would have
 under my original box.
 
 
 -Original Message-
 From: Saikat Kanjilal [mailto:sxk1...@hotmail.com]
 Sent: Friday, September 09, 2011 10:59 AM
 To: common-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
 Subject: Research projects for hadoop
 
 
 Hi Folks, I was looking through the following wiki page:
 http://wiki.apache.org/hadoop/HadoopResearchProjects and was wondering if
 there's been any work done (or any interest to do work) on the following
 topics:

 Integration of Virtualization (such as Xen) with Hadoop tools
 How does one integrate sandboxing of arbitrary user code in C++ and other
 languages in a VM such as Xen with the Hadoop framework? How does this
 interact with SGE, Torque, Condor? As each individual machine has more and
 more cores/cpus, it makes sense to partition each machine into multiple
 virtual machines. That gives us a number of benefits:
 - By assigning a virtual machine to a datanode, we effectively isolate the
 datanode from the load on the machine caused by other processes, making the
 datanode more responsive/reliable.
 - With multiple virtual machines on each machine, we can lower the
 granularity of hod scheduling units, making it possible to schedule multiple
 tasktrackers on the same machine, improving the overall utilization of the
 whole cluster.
 - With virtualization, we can easily snapshot a virtual cluster before
 releasing it, making it possible to re-activate the same cluster in the
 future and start to work from the snapshot.

 Provisioning of long running Services via HOD
 Work on a computation model for services on the grid. The model would
 include:
 - Various tools for defining clients and servers of the service, and at the
 least a C++ and Java instantiation of the abstractions
 - Logical definitions of how to partition work onto a set of servers, i.e. a
 generalized shard implementation
 - A few useful abstractions like locks (exclusive and RW, fairness), leader
 election, transactions
 - Various communication models for groups of servers belonging to a
 service, such as broadcast, unicast, etc.
 - Tools for assuring QoS, reliability, managing pools of servers for a
 service with spares, etc.
 - Integration with HDFS for persistence, as well as access to local
 filesystems
 - Integration with ZooKeeper so that applications can use the namespace

 I would like to either help out with a design for the above or prototype
 code; please let me know if and what the process may be to move forward
 with this.
 Regards
 