[jira] [Resolved] (MAPREDUCE-5458) Jobhistory server (and probably others) throws HTTP 500 error if keytab fails

2018-09-01 Thread Allen Wittenauer (JIRA)


[ https://issues.apache.org/jira/browse/MAPREDUCE-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved MAPREDUCE-5458.
-----------------------------------------
Resolution: Won't Fix

> Jobhistory server (and probably others) throws HTTP 500 error if keytab fails
> ------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5458
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5458
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 2.1.0-beta
>    Reporter: Allen Wittenauer
>Priority: Major
>
> I had a situation where the job history didn't renew its kerberos credentials 
> (still verifying that problem).  If a user connects to the web UI at a point 
> when the server can't talk to HDFS, it shows the user a 500 error rather than 
> giving something meaningful.






[jira] [Resolved] (MAPREDUCE-6313) Audit/optimize tests in hadoop-mapreduce-client-jobclient

2018-09-01 Thread Allen Wittenauer (JIRA)


[ https://issues.apache.org/jira/browse/MAPREDUCE-6313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved MAPREDUCE-6313.
-----------------------------------------
Resolution: Won't Fix

> Audit/optimize tests in hadoop-mapreduce-client-jobclient
> ----------------------------------------------------------
>
> Key: MAPREDUCE-6313
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6313
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>    Reporter: Allen Wittenauer
>Assignee: nijel
>Priority: Major
>  Labels: newbie
>
> The tests in this package take an extremely long time to run, with some tests 
> taking 15-20 minutes on their own.  It would be worthwhile to audit and 
> optimize the tests in this package in order to reduce patch testing time, or 
> perhaps even to split the package up.






[jira] [Resolved] (MAPREDUCE-6378) convergence error in hadoop-streaming during mvn install

2018-09-01 Thread Allen Wittenauer (JIRA)


[ https://issues.apache.org/jira/browse/MAPREDUCE-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved MAPREDUCE-6378.
-----------------------------------------
Resolution: Won't Fix

> convergence error in hadoop-streaming during mvn install
> ---------------------------------------------------------
>
> Key: MAPREDUCE-6378
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6378
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>    Reporter: Allen Wittenauer
>Priority: Major
>
> Running mvn install in the hadoop-tools/hadoop-streaming directory results in 
> a dependency convergence error.  See comments.






[jira] [Resolved] (MAPREDUCE-6469) libnativetask lacks header files and documentation

2018-09-01 Thread Allen Wittenauer (JIRA)


[ https://issues.apache.org/jira/browse/MAPREDUCE-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved MAPREDUCE-6469.
-----------------------------------------
  Resolution: Won't Fix
Target Version/s:   (was: )

> libnativetask lacks header files and documentation
> --------------------------------------------------
>
> Key: MAPREDUCE-6469
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6469
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>    Reporter: Allen Wittenauer
>Priority: Blocker
>
> The MR native task library appears to have no header files included in the 
> maven package and no documentation generated for mvn site.






[jira] [Resolved] (MAPREDUCE-6699) hadoop-mapred unit tests for dynamic commands

2018-09-01 Thread Allen Wittenauer (JIRA)


[ https://issues.apache.org/jira/browse/MAPREDUCE-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved MAPREDUCE-6699.
-----------------------------------------
Resolution: Won't Fix

> hadoop-mapred unit tests for dynamic commands
> ---------------------------------------------
>
> Key: MAPREDUCE-6699
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6699
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: scripts, test
>    Reporter: Allen Wittenauer
>Priority: Major
>
> This is a holdover from HADOOP-12930, dynamic subcommands.  Currently there 
> are no unit tests for this in mapred, and there really should be.






[jira] [Resolved] (MAPREDUCE-6691) move the shell code out of hadoop-mapreduce-project

2018-09-01 Thread Allen Wittenauer (JIRA)


[ https://issues.apache.org/jira/browse/MAPREDUCE-6691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved MAPREDUCE-6691.
-----------------------------------------
Resolution: Won't Fix

> move the shell code out of hadoop-mapreduce-project
> ---------------------------------------------------
>
> Key: MAPREDUCE-6691
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6691
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: scripts, test
>    Reporter: Allen Wittenauer
>Priority: Major
>
> We need to move the shell code out of hadoop-mapreduce-project so that we can 
> properly build test code.






[jira] [Resolved] (MAPREDUCE-6934) downlink.data is written to CWD

2018-09-01 Thread Allen Wittenauer (JIRA)


[ https://issues.apache.org/jira/browse/MAPREDUCE-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved MAPREDUCE-6934.
-----------------------------------------
Resolution: Won't Fix

> downlink.data is written to CWD
> -------------------------------
>
> Key: MAPREDUCE-6934
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6934
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: pipes
>Affects Versions: 3.0.0-beta1
>    Reporter: Allen Wittenauer
>Priority: Minor
>
> When using Pipes, the downlink.data stream is written to the current working 
> directory.  This is a bit of a problem when running MR jobclient tests in 
> parallel, as the file is written outside of target.






Re: [DISCUSS] Alpha Release of Ozone

2018-08-08 Thread Allen Wittenauer



> On Aug 8, 2018, at 12:56 PM, Anu Engineer  wrote:
> 
>> Has anyone verified that a Hadoop release doesn't have _any_ of the extra 
>> ozone bits that are sprinkled outside the maven modules?
> As far as I know that is the state, we have had multiple Hadoop releases 
> after ozone has been merged. So far no one has reported Ozone bits leaking 
> into Hadoop. If we find something like that, it would be a bug.

There hasn't been a release from a branch where Ozone has been merged 
yet. The first one will be 3.2.0.  Running create-release off of trunk 
presently shows bits of Ozone in dev-support, hadoop-dist, and elsewhere in the 
Hadoop source tar ball. 

So, consider this as a report. IMHO, cutting an Ozone release prior to 
a Hadoop release is ill-advised given the distribution impact and the requirements 
of the merge vote.  
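
(Anyone who wants to reproduce the report can run a quick check along these 
lines; the tarball name is illustrative:

    tar tzf hadoop-3.2.0-SNAPSHOT-src.tar.gz | grep -i ozone

Anything that prints is Ozone content riding along in the Hadoop source 
artifact.)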



Re: [DISCUSS] Alpha Release of Ozone

2018-08-08 Thread Allen Wittenauer


Given that there are some Ozone components spread out past the core maven 
modules, is the plan to release a Hadoop Trunk + Ozone tar ball or is more work 
going to go into segregating the Ozone components prior to release? Has anyone 
verified that a Hadoop release doesn't have _any_ of the extra ozone bits that 
are sprinkled outside the maven modules?



Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-06-07 Thread Allen Wittenauer
> On Jun 7, 2018, at 11:47 AM, Steve Loughran  wrote:
> 
> Actually, Yongjun has been really good at helping me get set up for a 2.7.7 
> release, including “things you need to do to get GPG working in the docker 
> image”

*shrugs* I use a different release script after some changes broke the 
in-tree version for building on OS X and I couldn’t get the fixes committed 
upstream.  So I’m not sure what problems you are hitting.

> On Jun 7, 2018, at 1:08 PM, Nandakumar Vadivelu  
> wrote:
> 
> It will be helpful if we can get the correct steps, and also update the wiki.
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Release+Validation

Yup. Looking forward to seeing it. 



Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-06-07 Thread Allen Wittenauer


> On Jun 7, 2018, at 3:46 AM, Lokesh Jain  wrote:
> 
> Hi Yongjun
> 
> I followed Nanda’s steps and I see the same issues as reported by Nanda.


This situation is looking like an excellent opportunity for PMC members to 
mentor people on how the build works, since it’s apparent that three days later, 
no one has mentioned that those steps aren’t the ones to build the complete 
website, and haven’t been since at least 2.4.






Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2018-05-15 Thread Allen Wittenauer

> On May 15, 2018, at 10:16 AM, Chris Douglas  wrote:
> 
> They've been failing for a long time. It can't install bats, and
> that's fatal? -C


The bats error is new and causes the build to fail in a way that still 
produces the email output.  For the past few months, it hadn’t been producing 
email output at all because the builds have been timing out.  (The last ‘good’ 
report was Feb 26.)  Since no one [*] is paying attention to them enough to 
notice, I figured it was better to free up the cycles for the rest of the ASF. 

* - I noticed a while back, but for various reasons I’ve mostly moved to only 
working on Hadoop things where I’m getting paid.



Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2018-05-15 Thread Allen Wittenauer


FYI:

I’m going to disable the branch-2 nightly jobs.



Re: [NOTIFICATION] Hadoop trunk rebased

2018-04-27 Thread Allen Wittenauer

Did the patch that fixes the mountain of maven warnings get missed?

> On Apr 26, 2018, at 11:52 PM, Akira Ajisaka  wrote:
> 
> + common-dev and mapreduce-dev
> 
> On 2018/04/27 6:23, Owen O'Malley wrote:
>> As we discussed in hdfs-dev@hadoop, I did a force push to Hadoop's trunk to
>> replace the Ozone merge with a rebase.
>> That means that you'll need to rebase your branches.
>> .. Owen





Fwd: Apache Hadoop qbt Report: trunk+JDK8 on Windows/x64

2018-03-15 Thread Allen Wittenauer

For my part of the HDFS bug bash, I’ve gotten the ASF Windows build 
working again. Starting tomorrow, results will be sent to the *-dev lists.

A few notes:

* It only runs the unit tests.  There’s not much point in running the other 
Yetus plugins since those are covered by the Linux one and this build is slow 
enough as it is.

* There are two types of ASF build nodes: Windows Server 2012 and Windows 
Server 2016. This job can run on both and will use whichever one has a free 
slot.

* It ALWAYS applies HADOOP-14667.05.patch prior to running.  As a result, this 
is only set up for trunk with no parameterization to run other branches.

* The URI handling for file paths in hadoop-common and elsewhere is pretty 
broken on Windows, so many many many unit tests are failing and I wouldn't be 
surprised if Windows hadoop installs are horked as a result.

* Runtime is about 12-13 hours with many tests taking significantly longer than 
their UNIX counterparts.  My guess is that this is caused by winutils.  Changing 
from winutils to Java 7 API calls would get this more in line and be a 
significant performance boost for Windows clients/servers as well.

Have fun.

=============================================

For more details, see https://builds.apache.org/job/hadoop-trunk-win/406/ 


[Mar 14, 2018 6:26:58 PM] (xyao) HDFS-13251. Avoid using hard coded datanode 
data dirs in unit tests.
[Mar 14, 2018 8:05:24 PM] (jlowe) MAPREDUCE-7064. Flaky test
[Mar 14, 2018 8:14:36 PM] (inigoiri) HDFS-13198. RBF: RouterHeartbeatService 
throws out CachedStateStore
[Mar 14, 2018 8:36:53 PM] (wangda) Revert "HADOOP-13707. If kerberos is enabled 
while HTTP SPNEGO is not
[Mar 14, 2018 10:47:56 PM] (fabbri) HADOOP-15278 log s3a at info. Contributed 
by Steve Loughran.




-1 overall


The following subsystems voted -1:
   unit


The following subsystems are considered long running:
(runtime bigger than 1h 00m 00s)
   unit


Specific tests:

   Failed CTEST tests :

  test_test_libhdfs_threaded_hdfs_static 

   Failed junit tests :

  hadoop.crypto.TestCryptoStreamsWithOpensslAesCtrCryptoCodec 
  hadoop.fs.contract.rawlocal.TestRawlocalContractAppend 
  hadoop.fs.TestFsShellCopy 
  hadoop.fs.TestFsShellList 
  hadoop.fs.TestLocalFileSystem 
  hadoop.http.TestHttpServer 
  hadoop.http.TestHttpServerLogs 
  hadoop.io.compress.TestCodec 
  hadoop.io.nativeio.TestNativeIO 
  hadoop.ipc.TestSocketFactory 
  hadoop.metrics2.impl.TestStatsDMetrics 
  hadoop.metrics2.sink.TestRollingFileSystemSinkWithLocal 
  hadoop.security.TestSecurityUtil 
  hadoop.security.TestShellBasedUnixGroupsMapping 
  hadoop.security.token.TestDtUtilShell 
  hadoop.util.TestNativeCodeLoader 
  hadoop.fs.TestWebHdfsFileContextMainOperations 
  hadoop.hdfs.client.impl.TestBlockReaderLocalLegacy 
  hadoop.hdfs.crypto.TestHdfsCryptoStreams 
  hadoop.hdfs.qjournal.client.TestQuorumJournalManager 
  hadoop.hdfs.qjournal.server.TestJournalNode 
  hadoop.hdfs.qjournal.server.TestJournalNodeSync 
  hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks 
  hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped 
  hadoop.hdfs.server.blockmanagement.TestNameNodePrunesMissingStorages 
  hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks 
  hadoop.hdfs.server.blockmanagement.TestReplicationPolicy 
  hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl 
  hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles 
  hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistLockedMemory 
  hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistPolicy 
  hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement 
  hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery 
  hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyWriter 
  hadoop.hdfs.server.datanode.fsdataset.impl.TestProvidedImpl 
  hadoop.hdfs.server.datanode.fsdataset.impl.TestSpaceReservation 
  hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica 
  hadoop.hdfs.server.datanode.TestBlockPoolSliceStorage 
  hadoop.hdfs.server.datanode.TestBlockRecovery 
  hadoop.hdfs.server.datanode.TestBlockScanner 
  hadoop.hdfs.server.datanode.TestDataNodeFaultInjector 
  hadoop.hdfs.server.datanode.TestDataNodeMetrics 
  hadoop.hdfs.server.datanode.TestDataNodeUUID 
  hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure 
  hadoop.hdfs.server.datanode.TestDirectoryScanner 
  hadoop.hdfs.server.datanode.TestHSync 
  hadoop.hdfs.server.datanode.web.TestDatanodeHttpXFrame 
  hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand 
  hadoop.hdfs.server.diskbalancer.TestDiskBalancerRPC 
  hadoop.hdfs.server.federation.router.TestRouterAdminCLI 
  hadoop.hdfs.server.mover.TestStorageMover 
  hadoop.hdfs.server.nam

Re: [ANNOUNCE] Apache Hadoop 3.0.0 GA is released

2017-12-18 Thread Allen Wittenauer

It’s significantly more concerning that 3.0.0-beta1 doesn’t show up here:

http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-common/release/index.html

It looks like they are missing from the source tag too.  I wonder what else is 
missing.


> On Dec 18, 2017, at 11:15 AM, Andrew Wang  wrote:
> 
> Moving general@ to BCC,
> 
> The main page and releases posts on hadoop.apache.org are pretty clear
> about this being a diff from beta1, am I missing something? Pasted below:
> 
> After four alpha releases and one beta release, 3.0.0 is generally
> available. 3.0.0 consists of 302 bug fixes, improvements, and other
> enhancements since 3.0.0-beta1. All together, 6242 issues were fixed as
> part of the 3.0.0 release series since 2.7.0.
> 
> Users are encouraged to read the overview of major changes in 3.0.0. The GA 
> release notes and changelog detail the changes since 3.0.0-beta1.
> 
> 
> 
> On Mon, Dec 18, 2017 at 10:32 AM, Arpit Agarwal 
> wrote:
> 
>> That makes sense for Beta users but most of our users will be upgrading
>> from a previous GA release and the changelog will mislead them. The webpage
>> does not mention this is a delta from the beta release.
>> 
>> 
>> 
>> 
>> 
>> *From: *Andrew Wang 
>> *Date: *Friday, December 15, 2017 at 10:36 AM
>> *To: *Arpit Agarwal 
>> *Cc: *general , "common-...@hadoop.apache.org"
>> , "yarn-...@hadoop.apache.org" <
>> yarn-...@hadoop.apache.org>, "mapreduce-dev@hadoop.apache.org" <
>> mapreduce-dev@hadoop.apache.org>, "hdfs-...@hadoop.apache.org" <
>> hdfs-...@hadoop.apache.org>
>> *Subject: *Re: [ANNOUNCE] Apache Hadoop 3.0.0 GA is released
>> 
>> 
>> 
>> Hi Arpit,
>> 
>> 
>> 
>> If you look at the release announcements, it's made clear that the
>> changelog for 3.0.0 is diffed based on beta1. This is important since users
>> need to know what's different from the previous 3.0.0-* releases if they're
>> upgrading.
>> 
>> 
>> 
>> I agree there's additional value to making combined release notes, but
>> it'd be something additive rather than replacing what's there.
>> 
>> 
>> 
>> Best,
>> 
>> Andrew
>> 
>> 
>> 
>> On Fri, Dec 15, 2017 at 8:27 AM, Arpit Agarwal 
>> wrote:
>> 
>> 
>> Hi Andrew,
>> 
>> Thank you for all the hard work on this release. I was out the last few
>> days and didn’t get a chance to evaluate RC1 earlier.
>> 
>> The changelog looks incorrect. E.g. This gives an impression that there
>> are just 5 incompatible changes in 3.0.0.
>> http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-common/release/3.0.0/CHANGES.3.0.0.html
>> 
>> I assume you only counted 3.0.0 changes in this log excluding
>> alphas/betas. However, users shouldn’t have to manually compile
>> incompatibilities by summing up a/b release notes. Can we fix the changelog
>> after the fact?
>> 
>> 
>> 
>> 
>> On 12/14/17, 10:45 AM, "Andrew Wang"  wrote:
>> 
>>Hi all,
>> 
>>I'm pleased to announce that Apache Hadoop 3.0.0 is generally available
>>(GA).
>> 
>>3.0.0 GA consists of 302 bug fixes, improvements, and other
>> enhancements
>>since 3.0.0-beta1. This release marks a point of quality and stability
>> for
>>the 3.0.0 release line, and users of earlier 3.0.0-alpha and -beta
>> releases
>>are encouraged to upgrade.
>> 
>>Looking back, 3.0.0 GA is the culmination of over a year of work on the
>>3.0.0 line, starting with 3.0.0-alpha1 which was released in September
>>2016. Altogether, 3.0.0 incorporates 6,242 changes since 2.7.0.
>> 
>>    Users are encouraged to read the overview of major changes in 3.0.0. 
>>    The GA release notes 
>>    <http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-common/release/3.0.0/RELEASENOTES.3.0.0.html> 
>>    and changelog 
>>    <http://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-common/release/3.0.0/CHANGES.3.0.0.html> 
>>    detail the changes since 3.0.0-beta1.
>> 
>>The ASF press release provides additional color and highlights some of
>> the
>>major features:
>> 
>>https://globenewswire.com/news-release/2017/12/14/1261879/0/en/The-Apache-Software-Foundation-Announces-Apache-Hadoop-v3-0-0-General-Availability.html
>> 
>>Let me end by thanking the many, many contributors who helped with this
>>release line. We've only had three major releases in Hadoop's 10 year
>>history, and this is our biggest major release ever. It's an incredible
>>accomplishment for our community, and I'm proud to have worked with
>> all of
>>you.
>> 
>>Best,
>>Andrew
>> 

Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-30 Thread Allen Wittenauer

> On Nov 30, 2017, at 1:07 AM, Rohith Sharma K S  
> wrote:
> 
> 
> > If ATSv1 isn’t replaced by ATSv2, then why is it marked deprecated?
> Ideally it should not be. Can you point out where it is marked as deprecated? 
> If it is in the historyserver daemon start, that change was made very long back 
> when the timeline server was added. 


Ahh, I see where all the problems lie.  No one is paying attention to the 
deprecation message because it’s kind of oddly worded:

* It really means “don’t use ‘yarn historyserver’; use ‘yarn timelineserver’ ” 
* ‘yarn historyserver’ was removed from the documentation in 2.7.0
* ‘yarn historyserver’ doesn’t appear in the yarn usage output
* ‘yarn timelineserver’ runs the exact same class

There’s no reason for ‘yarn historyserver’ to exist in 3.x.  Just run ‘yarn 
timelineserver’ instead.



Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-25 Thread Allen Wittenauer

> On Nov 21, 2017, at 2:16 PM, Vinod Kumar Vavilapalli  
> wrote:
> 
>>> - $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start historyserver doesn't even 
>>> work. Not just deprecated in favor of timelineserver as was advertised.
>> 
>>  This works for me in trunk and the bash code doesn’t appear to have 
>> changed in a very long time.  Probably something local to your install.  (I 
>> do notice that the deprecation message says “starting” which is awkward when 
>> the stop command is given though.)  Also: is the deprecation message even 
>> true at this point?
> 
> 
> Sorry, I mischaracterized the problem.
> 
> The real issue is that I cannot use this command line when the MapReduce 
> JobHistoryServer is already started on the same machine.

The specific string is:

hadoop-${HADOOP_IDENT_STRING}-${HADOOP_SUBCMD}.pid

More specifically, the pid handling code will conflict if the following 
are true:

* same machine (obviously)
* same subcommand name
* same HADOOP_IDENT_STRING: which by default is the user name of 
whatever starts it… but was designed to be overridden way back in hadoop 0.X.

… which means for most production setups, this is probably not a real 
problem.


> So, it looks like in shell-scripts, there can ever be only one daemon of a 
> given name, irrespective of which daemon scripts are invoked.

Correct.  Naming multiple, different daemons the same thing is 
extremely anti-user.   In fact, I thought this was originally about the “other” 
history server.

> 
> We need to figure out two things here
>  (a) The behavior of this command. Clearly, it will conflict with the 
> MapReduce JHS - only one of them can be started on the same node.

… by the same user, by default.  Started by a different user or 
different HADOOP_IDENT_STRING, it will come up just fine.

>  (b) We need to figure out if this V1 TimelineService should even be supported 
> given ATSv2.

If ATSv1 isn’t replaced by ATSv2, then why is it marked deprecated?

> On Nov 22, 2017, at 9:45 AM, Brahma Reddy Battula  wrote:
> 
> 1) Change the name
> 2) Create PID based on the CLASS Name, here applicationhistoryserver and 
> jobhistoryserver
> 3) Use same as branch-2.9..i.e suffixing with mapred or yarn
> 
> 
> @allen, any thoughts on this..?

Using the classname works in this instance, but just as we saw with the 
router daemons, people tend to use the same class names when building different 
components. It also means that if different daemons can be started in different 
ways from the same class dependent upon options, this conflict will still 
exist.  Also, with dynamic commands, it is very possible to run the same daemon 
from multiple start points.

As part of this discussion, I think it’s important to recognize:

a) This is likely to be primarily impacting developers.
b) We’re talking about two daemons where one has been deprecated.
c) Calling two different daemons “history server” is just awful from an end 
user perspective.
d) There is already a workaround in place if one absolutely needs to run both 
on the same node as the same user, just as people do with datanode and 
nodemanager today; a sketch follows.
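
For the record, the workaround in (d) is just the ident override described 
above; roughly (the ident values here are only an illustration):

    # the default HADOOP_IDENT_STRING is the invoking user, so both daemons
    # would race for hadoop-$USER-historyserver.pid; distinct ident strings
    # keep the pid files apart
    HADOOP_IDENT_STRING=mapred mapred --daemon start historyserver
    HADOOP_IDENT_STRING=yarn yarn --daemon start historyserver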






Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-20 Thread Allen Wittenauer

The original release script and instructions broke the build up into 
three or so steps. When I rewrote it, I kept that same model. It’s probably 
time to re-think that.  In particular, it should probably be one big step that 
even does the maven deploy.  There’s really no harm in doing that given that 
there is still a manual step to release the deployed jars into the production 
area.

We just need to:

a) add an option to do deploy instead of just install; if create-release is in 
asf mode, always activate deploy
b) pull the maven settings.xml file (and only the maven settings file… we don’t 
want the repo!) into the docker build environment (rough sketch below)
c) consolidate the mvn steps

This has the added benefit of greatly speeding up the build by removing 
several passes.
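
For (b), the mount can be as narrow as a single read-only bind of the settings 
file; something like this (paths illustrative, not the actual create-release 
code):

    docker run -v "${HOME}/.m2/settings.xml:/home/${USER}/.m2/settings.xml:ro" ...

so the local repository itself never leaks into the build environment.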

Probably not a small change, but I’d have to look at the code.  I’m on 
a plane tomorrow morning though.

Also:

>> 
>> Major
>> - The previously supported way of being able to use different tar-balls
>> for different sub-modules is completely broken - common and HDFS tar.gz are
>> completely empty.
>> 
> 
> Is this something people use? I figured that the sub-tarballs were a relic
> from the project split, and nowadays Hadoop is one project with one release
> tarball. I actually thought about getting rid of these extra tarballs since
> they add extra overhead to a full build.

I’m guessing no one noticed the tar errors when running mvn -Pdist.  
Not sure when they started happening.

> >   - When did we stop putting CHANGES files into the source artifacts?
> 
> CHANGES files were removed by 
> https://issues.apache.org/jira/browse/HADOOP-11792

To be a bit more specific about it, the maven assembly for source only 
includes things (more or less) that are part of the git repo.  When CHANGES.txt 
was removed from the source tree, it also went away from the tar ball.  This 
isn’t too much of an issue in practice though given the notes are put up on the 
web, part of the binary tar ball, and can be generated by following the 
directions in BUILDING.txt.  I don’t remember if Hadoop uploads them into the 
dist area, but if not, it probably should.

> - $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start historyserver doesn't even 
> work. Not just deprecated in favor of timelineserver as was advertised.

This works for me in trunk and the bash code doesn’t appear to have 
changed in a very long time.  Probably something local to your install.  (I do 
notice that the deprecation message says “starting” which is awkward when the 
stop command is given though.)  Also: is the deprecation message even true at 
this point?

>> - Cannot enable new UI in YARN because it is under a non-default
>> compilation flag. It should be on by default.
>> 
> 
> The yarn-ui profile has always been off by default, AFAIK. It's documented
> to turn it on in BUILDING.txt for release builds, and we do it in
> create-release.
> 
> IMO not a blocker. I think it's also more of a dev question (do we want to
> do this on every YARN build?) than a release one.

-1 on making yarn-ui always build.

For what is effectively an optional component (the old UI is still 
there), its heavy dependency requirements make it a special burden outside of 
the Docker container.  If it can be changed such that it either always 
downloads the necessary bits (regardless of the OS/chipset!) and/or doesn’t 
kill the maven build if those bits can’t be found  (i.e., truly optional), then 
I’d be less opposed.  (and, actually, quite pleased because then the docker 
image build would be significantly faster.)
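
(For reference, the yarn-ui profile is opt-in and BUILDING.txt enables it 
explicitly when cutting a release, along the lines of:

    mvn package -Pdist,yarn-ui -DskipTests

which is exactly the dependency-heavy step I’d rather keep optional.)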






Re: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk

2017-11-03 Thread Allen Wittenauer

> On Nov 3, 2017, at 12:08 PM, Stack  wrote:
> 
> On Sat, Oct 28, 2017 at 2:00 PM, Konstantin Shvachko 
> wrote:
> 
>> It is an interesting question whether Ozone should be a part of Hadoop.
> 
> I don't see a direct answer to this question. Is there one? Pardon me if
> I've not seen it but I'm interested in the response.

+1

Given:

* a completely different set of config files (ozone-site.xml, etc)
* package name is org.apache.hadoop.ozone, not 
org.apache.hadoop.hdfs.ozone

… it doesn’t really seem to want to be part of HDFS, much less Hadoop.

Plus hadoop-hdfs-project/hadoop-hdfs is already a battle zone when it comes to 
unit tests, dependencies, etc [*]

At a minimum, it should at least be using its own maven modules for a 
lot of the bits, generating its own maven jars, so that we can split this 
functionality up at build/test time.

At a higher level, this feels a lot like the design decisions that were 
made around yarn-native-services.  This feature is either part of HDFS or it’s 
not. Pick one.  Doing both is incredibly confusing for everyone outside of the 
branch.



Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Allen Wittenauer

> On Oct 24, 2017, at 4:10 PM, Andrew Wang  wrote:
> 
> FWIW we've been running branch-3.0 unit tests successfully internally, though 
> we have separate jobs for Common, HDFS, YARN, and MR. The failures here are 
> probably a property of running everything in the same JVM, which I've found 
> problematic in the past due to OOMs.

Last time I looked, surefire was configured to launch unit tests in 
different JVMs.  But that might only be true in trunk.  Or maybe only for some 
of the subprojects.  
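
(For anyone who wants to rule the shared-JVM theory in or out locally, forcing 
fresh forks is a one-liner with the standard surefire knobs; the module path is 
just an example:

    mvn test -DforkCount=1 -DreuseForks=false -pl hadoop-hdfs-project/hadoop-hdfs

If the failures disappear when forks aren’t reused, it’s almost certainly state 
or memory accumulating in a shared JVM.)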



Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Allen Wittenauer

My plan is currently to:

*  switch some of Hadoop’s Yetus jobs over to my branch with the YETUS-561 
patch to test it out. 
* if the tests work, work on getting YETUS-561 committed to yetus master
* switch jobs back to ASF yetus master either post-YETUS-561 or without it if 
it doesn’t work
* go back to working on something else, regardless of the outcome


> On Oct 24, 2017, at 2:55 PM, Chris Douglas  wrote:
> 
> Sean/Junping-
> 
> Ignoring the epistemology, it's a problem. Let's figure out what's
> causing memory to balloon and then we can work out the appropriate
> remedy.
> 
> Is this reproducible outside the CI environment? To Junping's point,
> would YETUS-561 provide more detailed information to aid debugging? -C
> 
> On Tue, Oct 24, 2017 at 2:50 PM, Junping Du  wrote:
>> In general, the "solid evidence" of a memory leak comes from analysis of 
>> heapdumps, jstack, gc logs, etc. In many cases, we can locate/conclude which 
>> piece of code is leaking memory from the analysis.
>> 
>> Unfortunately, I cannot find any conclusion from the previous comments, and they 
>> don't even tell which daemons/components of HDFS consume unexpectedly high 
>> memory. Doesn't sound like a solid bug report to me.
>> 
>> 
>> 
>> Thanks,
>> 
>> 
>> Junping
>> 
>> 
>> 
>> From: Sean Busbey 
>> Sent: Tuesday, October 24, 2017 2:20 PM
>> To: Junping Du
>> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; 
>> mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
>> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>> 
>> Just curious, Junping what would "solid evidence" look like? Is the 
>> supposition here that the memory leak is within HDFS test code rather than 
>> library runtime code? How would such a distinction be shown?
>> 
>> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
>> Allen,
>> Do we have any solid evidence to show that the HDFS unit tests going through 
>> the roof are due to a serious memory leak in HDFS? Normally, I don't expect 
>> memory leaks to be identified in our UTs - mostly, it (test jvm gone) is just 
>> because of test or deployment issues.
>> Unless there is concrete evidence, my concern about a serious memory leak 
>> in HDFS on 2.8 is relatively low given some companies (Yahoo, Alibaba, 
>> etc.) have deployed 2.8 on large production environments for months. 
>> Non-serious memory leaks (like forgetting to close a stream in a non-critical 
>> path, etc.) and other non-critical bugs always happen here and there, and 
>> we have to live with them.
>> 
>> Thanks,
>> 
>> Junping
>> 
>> 
>> From: Allen Wittenauer <a...@effectivemachines.com>
>> Sent: Tuesday, October 24, 2017 8:27 AM
>> To: Hadoop Common
>> Cc: Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
>> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>> 
>>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer <a...@effectivemachines.com> wrote:
>>> 
>>> 
>>> 
>>> With no other information or access to go on, my current hunch is that one 
>>> of the HDFS unit tests is ballooning in memory size.  The easiest way to 
>>> kill a Linux machine is to eat all of the RAM, thanks to overcommit and 
>>> that's what this "feels" like.
>>> 
>>> Someone should verify if 2.8.2 has the same issues before a release goes 
>>> out ...
>> 
>> 
>>FWIW, I ran 2.8.2 last night and it has the same problems.
>> 
>>Also: the node didn't die!  Looking through the workspace (so the 
>> next run will destroy them), two sets of logs stand out:
>> 
>> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>> 
>>and
>> 
>> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
>> 
>> It looks like my hunch is correct:  RAM use in the HDFS unit tests is 
>> going through the roof.  It's also interesting how MANY log files there are. 
>> Is surefire not picking up that jobs are dying?  Maybe not if memory is 
>> getting tight.
>> 
>>Anywa

Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Allen Wittenauer

> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer  
> wrote:
> 
> 
> 
> With no other information or access to go on, my current hunch is that one of 
> the HDFS unit tests is ballooning in memory size.  The easiest way to kill a 
> Linux machine is to eat all of the RAM, thanks to overcommit and that’s what 
> this “feels” like.
> 
> Someone should verify if 2.8.2 has the same issues before a release goes out …


FWIW, I ran 2.8.2 last night and it has the same problems.

Also: the node didn’t die!  Looking through the workspace (so the next 
run will destroy them), two sets of logs stand out:

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt

and

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/

It looks like my hunch is correct:  RAM use in the HDFS unit tests is 
going through the roof.  It’s also interesting how MANY log files there are.  
Is surefire not picking up that jobs are dying?  Maybe not if memory is getting 
tight. 

Anyway, at this point, branch-2.8 and higher are probably fubar’d. 
Additionally, I’ve filed YETUS-561 so that Yetus-controlled Docker containers 
can have their RAM limits set in order to prevent more nodes going catatonic.
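
(Roughly the idea behind YETUS-561; the numbers are only an illustration:

    docker run --memory=16g --memory-swap=16g ...

With a hard cap in place, a runaway test run gets OOM-killed inside the 
container instead of taking the whole build node down with it.)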






Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-23 Thread Allen Wittenauer


With no other information or access to go on, my current hunch is that one of 
the HDFS unit tests is ballooning in memory size.  The easiest way to kill a 
Linux machine is to eat all of the RAM, thanks to overcommit and that’s what 
this “feels” like.

Someone should verify if 2.8.2 has the same issues before a release goes out …


> On Oct 23, 2017, at 12:38 PM, Subramaniam V K  wrote:
> 
> Hi Allen,
> 
> I had set up the build (or intended to) in anticipation of the 2.9 release. Thanks 
> for fixing the configuration!
> 
> We did face HDFS tests timeouts in branch-2 when run together but 
> individually the tests pass:
> https://issues.apache.org/jira/browse/HDFS-12620
> 
> Folks in HDFS, can you please take a look at HDFS tests in branch-2 as we are 
> not able to get even a single Yetus run to complete due to multiple test 
> failures/timeout.
> 
> Thanks,
> Subru
> 
> On Mon, Oct 23, 2017 at 11:26 AM, Vrushali C  wrote:
> Hi Allen,
> 
> I have filed https://issues.apache.org/jira/browse/YARN-7380 for the
> timeline service findbugs warnings.
> 
> thanks
> Vrushali
> 
> 
> On Mon, Oct 23, 2017 at 11:14 AM, Allen Wittenauer wrote:
> 
> >
> > I’m really confused why this causes the Yahoo! QA boxes to go catatonic
> > (!?!) during the run.  As in, never come back online, probably in a kernel
> > panic. It’s pretty consistently in hadoop-hdfs, so something is going wrong
> > there… is branch-2 hdfs behaving badly?  Someone needs to run the
> > hadoop-hdfs unit tests to see what is going on.
> >
> > It’s probably worth noting that findbugs says there is a problem in the
> > timeline server hbase code.  Someone should probably verify + fix that
> > issue.





Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-23 Thread Allen Wittenauer

I’m really confused why this causes the Yahoo! QA boxes to go catatonic (!?!) 
during the run.  As in, never come back online, probably in a kernel panic. 
It’s pretty consistently in hadoop-hdfs, so something is going wrong there… is 
branch-2 hdfs behaving badly?  Someone needs to run the hadoop-hdfs unit tests 
to see what is going on.

It’s probably worth noting that findbugs says there is a problem in the 
timeline server hbase code.  Someone should probably verify + fix that issue.






Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-21 Thread Allen Wittenauer

To whoever set this up:

There was a job config problem where the Jenkins branch parameter wasn’t passed 
to Yetus.  Therefore both of these reports have been against trunk.  I’ve fixed 
this job (as well as the other jobs) to honor that parameter.  I’ve kicked off 
a new run with these changes.




> On Oct 21, 2017, at 9:58 AM, Apache Jenkins Server 
>  wrote:
> 
> For more details, see 
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/
> 
> [Oct 20, 2017 9:27:59 PM] (stevel) HADOOP-14942. DistCp#cleanup() should 
> check whether jobFS is null.
> [Oct 21, 2017 12:19:29 AM] (subru) YARN-6871. Add additional deSelects params 
> in
> 
> 
> 
> 
> -1 overall
> 
> 
> The following subsystems voted -1:
>asflicense unit
> 
> 
> The following subsystems voted -1 but
> were configured to be filtered/ignored:
>cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace
> 
> 
> The following subsystems are considered long running:
> (runtime bigger than 1h  0m  0s)
>unit
> 
> 
> Specific tests:
> 
>Failed junit tests :
> 
>   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure100 
>   hadoop.hdfs.TestReadStripedFileWithMissingBlocks 
>   hadoop.hdfs.server.namenode.ha.TestPipelinesFailover 
>   hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency 
>   hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler 
>   hadoop.yarn.server.resourcemanager.TestApplicationMasterService 
>   hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA 
>   hadoop.yarn.server.resourcemanager.TestRMHAForNodeLabels 
>   hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue 
>   
> hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
>  
>   hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler 
>   hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher 
>   hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler 
>   hadoop.yarn.server.resourcemanager.TestRMHA 
>   
> hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification 
>   
> hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched 
>   hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps 
>   hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl 
>   
> hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation 
>   hadoop.yarn.server.resourcemanager.TestRMHATimelineCollectors 
>   hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA 
>   hadoop.yarn.server.resourcemanager.TestReservationSystemWithRMHA 
>   
> hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesSchedulerActivities
>  
>   
> hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler 
>   
> hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerLazyPreemption
>  
>   
> hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior
>  
>   hadoop.yarn.server.TestDiskFailures 
> 
>Timed out junit tests :
> 
>   
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.conf.TestZKConfigurationStore
>  
>   
> org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore 
>   org.apache.hadoop.yarn.server.resourcemanager.TestLeaderElectorService 
>   org.apache.hadoop.mapred.pipes.TestPipeApplication 
> 
> 
>   cc:
> 
>   
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-compile-cc-root.txt
>   [4.0K]
> 
>   javac:
> 
>   
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-compile-javac-root.txt
>   [284K]
> 
>   checkstyle:
> 
>   
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-checkstyle-root.txt
>   [17M]
> 
>   pylint:
> 
>   
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-patch-pylint.txt
>   [20K]
> 
>   shellcheck:
> 
>   
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-patch-shellcheck.txt
>   [20K]
> 
>   shelldocs:
> 
>   
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-patch-shelldocs.txt
>   [12K]
> 
>   whitespace:
> 
>   
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/whitespace-eol.txt
>   [8.5M]
>   
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/whitespace-tabs.txt
>   [292K]
> 
>   javadoc:
> 
>   
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/diff-javadoc-javadoc-root.txt
>   [760K]
> 
>   unit:
> 
>   
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>   [308K]
>   
> https://builds.apache.or

YARN native services Re: 2017-10-06 Hadoop 3 release status update

2017-10-09 Thread Allen Wittenauer

> On Oct 6, 2017, at 5:51 PM, Eric Yang  wrote:
> yarn application -deploy -f spec.json
> yarn application -stop 
> yarn application -restart 
> yarn application -remove 
> 
> and
> 
> yarn application -list will display both application list from RM as well as 
> docker services?

IMO, that makes much more sense. [*] I’m trying to think of a reason why 
I’d care if something was using this API or not.  It’s not like users can’t run 
whatever they want as part of their job now.  The break out is really only 
necessary so I have an idea if something is running that is using the REST API 
daemon. But more on that later….

> I think the development team was concerned that command structure overload 
> between batch applications and long running services.  In my view, there is 
> no difference, they are all applications.  The only distinction is the 
> launching and shutdown of services may be different from batch jobs.  I think 
> user can get used to these command structures without creating additional 
> command grouping.

I pretty much agree.  In fact, I’d love to see ‘yarn application’ even 
replace ‘yarn jar’. One Interface To Rule Them All.

I was under the impression (and, maybe this was my misunderstanding. if 
so, sorry) that “the goal” for this first pass was to integrate the existing 
Apache Slider functionality into YARN.  As it stands, I don’t think those goals 
have been met.  It doesn’t seem to be much different than just writing a shell 
profile to call slider directly:

---
function yarn_subcommand_service
{
   exec slider "$@"
}


(or whatever). Plus doing it this way, one gets the added benefit of the 
SIGNIFICANTLY better documentation. (Seriously: well done that team)

From an outside perspective, the extra daemon for running the REST API 
seems like the point where it should have clicked that the project was going 
off the rails and missing the whole “integration” aspect. Integrating the REST API into the 
RM from day one and the command separation would have also stuck out. If the RM 
runs the REST API, it now becomes a problem of “how does a user launch more 
than just a jar easily?” A problem that Hadoop has had since nearly day one.  
Redefining the “application” subcommand sounds like a reasonable way to move 
forward on that problem while also dropping the generic sounding "service" 
subcommand. 

But all that said, it feels like direct integration was avoided from 
the beginning and I’m unclear as to why. Take this line from the quick start 
documentation: 

"Start all the hadoop components HDFS, YARN as usual.”

a) This sentence is pretty much a declaration that this feature 
set isn’t part of “YARN”. 
b) Minimally, this should link to ClusterSetup. 

Anyway, yes, please work on removing all of these extra adoption 
barriers and increased workload on admin teams with Yet Another Daemon to 
monitor and collect metrics. 

Thanks!

[*] - I’m reminded of a conversation I had with a PMC member a year or three ago 
about HDFS. They proudly, almost defiantly, stated that the HDFS command 
structure is such because it resembles the protocols and that was great. Guess 
what: users don’t care about how something is implemented, much less the 
protocols that are used to drive it. They care about consistency, EOU, and all 
those feel-good things that make applications a joy to use. They have more 
important stuff to do. Copying the protocols onto the command line only helps 
the person who wrote it and no one else. It’s hard not to walk away from 
playing with YARN in this branch seeing those same anti-user behaviors.






Re: 2017-10-06 Hadoop 3 release status update

2017-10-06 Thread Allen Wittenauer

> On Oct 6, 2017, at 1:31 PM, Andrew Wang  wrote:
> 
>   - Still waiting on Allen to review YARN native services feature.

Fake news.  

I’m still -1 on it, at least prior to a patch that was posted late 
yesterday. I’ll probably have a chance to play with it early next week.


Key problems:

* still haven’t been able to bring up the dns daemon due to lacking 
documentation

* it really needs better naming and command structures.  When put into 
the larger YARN context, it’s very problematic:

$ yarn --daemon start resourcemanager

vs.

$ yarn --daemon start apiserver 

if you awoke from a deep sleep inside a cave, which one 
would you expect to “start YARN”? Made worse by the fact that the feature is 
called “YARN services” all over the place.

$ yarn service foo

… what does this even mean?

It would be great if other outsiders really looked hard at this branch 
to give the team feedback.   Once it gets released, it’s gonna be too late to 
change it….

As a sidenote:

It’d be great if the folks working on YARN spent some time 
consolidating daemons.  With this branch, it now feels like we’re approaching 
the double digit area of daemons to turn on all the features.  It’s well past 
ridiculous, especially considering we still haven’t replaced the MRJHS’s 
feature set to the point we can turn it off.





Re: qbt is failiing///RE: Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-09-19 Thread Allen Wittenauer

> On Sep 19, 2017, at 6:35 AM, Brahma Reddy Battula 
>  wrote:
> 
> qbt is failing from two days with following errors, any idea on this..?

Nothing to be too concerned about.

This is what it looks like when a build server gets bounced or crashes. 
The INFRA team knows our jobs take forever, so they rarely wait for them to 
finish if they are doing upgrades.  They’ve been doing that work lately; you can 
follow the action on builds@.








Re: [VOTE] Merge yarn-native-services branch into trunk

2017-09-13 Thread Allen Wittenauer

> On Sep 8, 2017, at 9:25 AM, Jian He  wrote:
> 
> Hi Allen,
> The documentations are committed. Please check QuickStart.md and others in 
> the same folder.
> YarnCommands.md doc is updated to include new commands.
> DNS default port is also documented. 
> Would you like to give a look and see if it address your concerns ?

Somewhat. Greatly improved, but there’s still way too much “we’re 
working on this” and “here’s a link to a JIRA” and just general brokenness 
going on.

Here are some examples from concepts.  Concepts!  The document I’d expect 
to give me very basic “when we talk about X, we mean Y” definitions:

"A host of scheduling features are being developed to support long running 
services.”

Yeah, ok?  How is this a concept?

  or

"[YARN-3998](https://issues.apache.org/jira/browse/YARN-3998) 
implements a retry-policy to let NM re-launch a service container when it 
fails.”


The patch itself went through nine revisions and a long discussion. 
Would an end user care about the details in that JIRA?  

If the answer to the last question is YES, then the documentation has 
failed.  The whole point of documentation is so they don’t have to go digging 
into the details of the implementation, the decision process that got us there, 
etc.  If they care enough about the details, they’ll run through the changelog 
and click on the JIRA link there.  If the summary line of the changelog isn’t 
obvious, well… then we need better summaries.

etc, etc.

...

The sleep example is nice.  Now, let’s see a non-toy example:  multiple 
instances of Apache httpd or MariaDB or something real and not from the Hadoop 
echo chamber (e.g., non-JVM-based).  If this is for “native” services, this 
shouldn’t be a problem, right?  Give a real example and users will buy what 
you’re selling.  I also think writing the docs and providing an example of 
doing something big and outside the team’s comfort zone will clarify where end 
users are going to need more help than what’s being provided.  Getting a 
MariaDB instance or three up will help tremendously here.

Which reminds me: something the documentation doesn’t cover is storage. 
What happens to it, where does it come from, etc, etc.  That’s an important 
detail that I didn’t see covered.  (I may have missed it.)  

…

Why are there directions to enable other, partially unrelated services 
in here?  Shouldn’t there be pointers to their specific documentation?  Is the 
expectation that if the requirements for those other services change that 
contributors will need to update multiple documents?

"Start the DNS server”

Just… yikes.

a) yarn classname … This is not how we do user-facing things. 
The fact it’s not really possible for a *daemon* to be put in the 
YarnCommands.md doc should be a giant red flag that something isn’t going 
correctly here.
b) no jsvc support for something that is strongly hinted at 
wanting to run privileged = an instant -1 for failing basic security practices. 
There’s zero reason for it to be running continually as root.
c) If this had been hooked into the shell scripts 
appropriately, logs, user switching, etc. would have come for free.
d) Where’s stop?  Right. Since it’s outside the scripts, there 
is no pid support, so one has to do all of that manually….


Given:

 "3. Supports reverse lookups (name based on IP). Note, this works only 
for Docker containers.”

then:

"It should not be used as a fully-functional corporate DNS.”

Scratch corporate.  It’s not a fully functional DNS server if it can’t do 
reverse lookups.  (Which, ironically, means it’s not suitable for use with 
Apache Hadoop, given it requires both fwd and rev DNS ...)
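
(Easy enough to demonstrate; the host and service names here are made up:

    dig @dns.example.com app.user.cluster.example.com A   # forward lookup: resolves
    dig @dns.example.com -x 10.0.0.5                      # PTR: Docker containers only

Point the PTR query at any non-Docker address and it comes back empty.)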






Re: [VOTE] Merge yarn-native-services branch into trunk

2017-09-06 Thread Allen Wittenauer

> On Sep 5, 2017, at 6:23 PM, Jian He  wrote:
> 
>>  If it doesn’t have all the bells and whistles, then it shouldn’t be on 
>> port 53 by default.
> Sure, I’ll change the default port to not use 53 and document it.
>>  *how* is it getting launched on a privileged port? It sounds like the 
>> expectation is to run “command” as root.   *ALL* of the previous daemons in 
>> Hadoop that needed a privileged port used jsvc.  Why isn’t this one? These 
>> questions matter from a security standpoint.  
> Yes, it is running as “root” to be able to use the privileged port. The DNS 
> server is not yet integrated with the hadoop script. 
> 
>> Check the output.  It’s pretty obviously borked:
> Thanks for pointing out. Missed this when rebasing onto trunk.


Please correct me if I’m wrong, but the current summary of the branch, 
post these changes, looks like:

* A bunch of mostly new Java code that may or may not have 
javadocs (post-revert YARN-6877, still working out HADOOP-14835)
* ~1/3 of the docs are roadmap/TBD
* ~1/3 of the docs are for an optional DNS daemon that has no 
end user hook to start it
* ~1/3 of the docs are for a REST API that comes from some 
undefined daemon (apiserver?)
* Two new, but undocumented, subcommands to yarn
* There are no docs for admins or users on how to actually 
start or use this completely new/separate/optional feature

How are outside people (e.g., non-branch committers) supposed to test 
this new feature under these conditions?



Re: [VOTE] Merge yarn-native-services branch into trunk

2017-09-05 Thread Allen Wittenauer

> On Sep 5, 2017, at 3:12 PM, Gour Saha  wrote:
> 
> 2) Lots of markdown problems in the NativeServicesDiscovery.md document.
> This includes things like ‘yarnsite.xml’ (missing a dash.)
> 
> The md patch uploaded to YARN-5244 had some special chars. I fixed those
> in YARN-7161.


It’s a lot more than just special chars, I think.  Even github (which 
has a way better markdown processor than what we’re using for the site docs) is 
having trouble rendering it:

https://github.com/apache/hadoop/blob/51c39c4261236ab714fe0ec8d00753dc4c6406ee/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/native-services/NativeServicesDiscovery.md

e.g., all of those ‘###’ are likely missing a space.
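
If that's all it is, an untested one-liner along these lines would fix the 
whole file (assumes GNU or BSD sed with -E):

sed -i.bak -E 's/^(#+)([^# ])/\1 \2/' NativeServicesDiscovery.md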
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Merge yarn-native-services branch into trunk

2017-09-05 Thread Allen Wittenauer

> On Sep 5, 2017, at 2:53 PM, Jian He  wrote:
> 
>> Based on the documentation, this doesn’t appear to be a fully functional DNS 
>> server as an admin would expect (e.g., BIND, Knot, whatever).  Where’s 
>> forwarding? How do I setup notify? Are secondaries even supported? etc, etc.
> 
> It seems like this is a rehash of some of the discussion you and others had 
> on the JIRA. The DNS here is a thin layer backed by service registry. My 
> understanding from the JIRA is that there are no claims that this is already 
> a DNS with all the bells and whistles - its goal is mainly to expose dynamic 
> services running on YARN as end-points. Clearly, this is an optional daemon, 
> if the provided feature set is deemed insufficient, an alternative solution 
> can be plugged in by specific admins because the DNS piece is completely 
> decoupled from the rest of native-services. 

If it doesn’t have all the bells and whistles, then it shouldn’t be on 
port 53 by default. It should also be documented that one *can’t* do these 
things.  If the standard config is likely to be a “real” server on port 53 
either acting as a secondary to the YARN one or at least able to forward 
queries to it, then these need to get documented.  As it stands, operations 
folks are going to be taken completely by surprise by some relatively random 
process sitting on a very well established port.
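
For example, here's the sort of stanza an operations team would end up writing 
for a "real" BIND server in front of the YARN one; the zone name, address, and 
port are all assumptions:

# append an illustrative forward zone to the resolver's config
cat >> /etc/named.conf <<'EOF'
zone "ycluster.example.com" {
  type forward;
  forward only;
  forwarders { 127.0.0.1 port 5353; };
};
EOF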

>> In fact:  was this even tested on port 53? How does this get launched such 
>> that it even has access to open port 53?  I don’t see any calls to use the 
>> secure daemon code in the shell scripts. Is there any jsvc voodoo or is it 
>> just “run X as root”?
> 
> Yes, we have tested this DNS server on port 53 on a cluster by running the 
> DNS server as root user. The port is clearly configurable, so the admin has 
> two options. Run as root + port 53. Run as non-root + non-privileged port. We 
> tested and left it as port 53 to keep it on a standard DNS port. It is 
> already documented as such though I can see that part can be improved a 
> little.

*how* is it getting launched on a privileged port? It sounds like the 
expectation is to run “command” as root.   *ALL* of the previous daemons in 
Hadoop that needed a privileged port used jsvc.  Why isn’t this one? These 
questions matter from a security standpoint.  
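
As a reminder, this is roughly all it takes with the existing secure daemons 
(datanode shown; a sketch, not verbatim config): jsvc binds the privileged port 
as root and then drops to the service user.

# in hadoop-env.sh (3.x variable names; 2.x used HADOOP_SECURE_DN_USER)
export HDFS_DATANODE_SECURE_USER=hdfs
export JSVC_HOME=/usr/bin    # wherever jsvc is installed
# started as root; jsvc handles the privilege drop:
hdfs --daemon start datanode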

>>  4) Post-merge, yarn usage information is broken.  This is especially 
>> bad since it doesn’t appear that YarnCommands was ever updated to include 
>> the new sub-commands.
> 
> The “yarn” usage command is working for me. What do you mean?

Check the output.  It’s pretty obviously borked:

===snip===

Daemon Commands:

nodemanager  run a nodemanager on each worker
proxyserver  run the web app proxy server
resourcemanager  run the ResourceManager
router   run the Router daemon
timelineserver   run the timeline server

Run a service Commands:

service  run a service

Run yarn-native-service rest server Commands:

apiserverrun yarn-native-service rest server


===snip===

> Yeah, looks like some previous features also forgot to update YarnCommands.md 
> for the new subcommands.

Likely.  But I was actually interested in playing with this one to 
compare it to the competition.  [Lucky you. ;) ]  But with pretty much zero 
documentation….



-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Merge yarn-native-services branch into trunk

2017-09-05 Thread Allen Wittenauer

> On Aug 31, 2017, at 8:33 PM, Jian He  wrote:
> I would like to call a vote for merging yarn-native-services to trunk.

1) Did I miss it or is there no actual end-user documentation on how to 
use this?  I see 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/native-services/NativeServicesIntro.md,
 but that’s not particularly useful.  It looks like there are daemons that need 
to get started, based on other documentation?  How?  What do I configure? Is 
there a command to use to say “go do native for this job”?  I honestly have no 
idea how to make this do anything because most of the docs appear to be either 
TBD or expect me to read through a ton of JIRAs.  

2) Lots of markdown problems in the NativeServicesDiscovery.md 
document.  This includes things like ‘yarnsite.xml’ (missing a dash.)  Also, 
I’m confused why it’s called that when the title is YARN DNS, but whatever.

3) The default port for the DNS server should NOT be 53 if typical 
deployments need to specify an alternate port.  Based on the documentation, 
this doesn’t appear to be a fully functional DNS server as an admin would expect 
(e.g., BIND, Knot, whatever).  Where’s forwarding? How do I setup notify? Are 
secondaries even supported? etc, etc. In fact:  was this even tested on port 
53? How does this get launched such that it even has access to open port 53?  I 
don’t see any calls to use the secure daemon code in the shell scripts. Is 
there any jsvc voodoo or is it just “run X as root”?

4) Post-merge, yarn usage information is broken.  This is especially 
bad since it doesn’t appear that YarnCommands was ever updated to include the 
new sub-commands.

At this point in time:

-1 on 3.0.0-beta1
-0 on trunk



-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



YARN javadoc failures Re: [DISCUSS] Branches and versions for Hadoop 3

2017-09-01 Thread Allen Wittenauer

> On Aug 28, 2017, at 9:58 AM, Allen Wittenauer  
> wrote:
>   The automation only goes so far.  At least while investigating Yetus 
> bugs, I've seen more than enough blatant and purposeful ignored errors and 
> warnings that I'm not convinced it will be effective. ("That javadoc compile 
> failure didn't come from my patch!"  Um, yes, yes it did.) PR for features 
> has greatly trumped code correctness for a few years now.


I'm psychic.

Looks like YARN-6877 is crashing JDK8 javadoc.  Maven stops processing 
and errors out before even giving a build error/success. Reverting the patch 
makes things work again. Anyway, Yetus caught it, warned about it continuously, 
but it was still committed.  


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Allen Wittenauer

> On Aug 28, 2017, at 12:41 PM, Jason Lowe  wrote:
> 
> I think this gets back to the "if it's worth committing" part.

This brings us back to my original question:

"Doesn't this place an undue burden on the contributor with the first 
incompatible patch to prove worthiness?  What happens if it is decided that 
it's not good enough?"

The answer, if I understand your position, is then at least a maybe 
leaning towards yes: a patch that, prior to this branching policy change, 
would have gone in without any notice now has a higher burden (i.e., major 
feature) to prove worthiness ... and in the process eliminates a whole class of 
contributors and empowers others. Thus my concern ...

> As you mentioned, people are already breaking compatibility left and right as 
> it is, which is why I wondered if it was really any better in practice.  
> Personally I'd rather find out about a major breakage sooner than later, 
> since if trunk remains an active area of development at all times it's more 
> likely the community will sit up and take notice when something crazy goes 
> in.  In the past, trunk was not really an actively deployed area for over 5 
> years, and all sorts of stuff went in without people really being aware of it.

Given the general acknowledgement that the compatibility guidelines are 
mostly useless in reality, maybe the answer is really that we're doing releases 
all wrong.  Would it necessarily be a bad thing if we moved to a model where 
incompatible changes are released gradually instead of in one big batch every seven years?

Yes, I lived through the "walking on glass" days at Yahoo! and realize 
what I'm saying.  But I also think the rate of incompatible changes has slowed 
tremendously.  Entire groups of APIs aren't getting tossed out every week 
anymore.

> It sounds like we agree on that part but disagree on the specifics of how to 
> help trunk remain active.

Yup, and there is nothing wrong with that. ;)

>  Given that historically trunk has languished for years I was hoping this 
> proposal would help reduce the likelihood of it happening again.  If we 
> eventually decide that cutting branch-3 now makes more sense then I'll do 
> what I can to make that work well, but it would be good to see concrete 
> proposals on how to avoid the problems we had with it over the last 6 years.


Yup, agree. But proposals rarely seem to get much actual traction. 
(It's kind of fun reading the Hadoop bylaws and compatibility guidelines and 
old [VOTE] threads to realize how much stuff doesn't actually happen despite 
everyone generally agreeing that abc is a good idea.)  To circle back a bit, I do 
also agree that automation has a role to play.

 Before anyone accuses me of being a hypocrite, or implies it (and I'm 
sure someone eventually will, privately if not publicly), I'm sure some folks 
don't realize I've been working on this set of problems from a different angle 
for the past few years.

There are a handful of people that know I was going to attempt to do a 
3.x release a few years ago. [Andrew basically beat me to it. :) ] But I ran 
into the release process.  What a mess.  Way too much manual work, lots of 
undocumented bits, violations of ASF rules (!), etc., etc.  We've all heard the 
complaints.

My hypothesis:  if the release process itself is easier, then getting a 
release based on trunk is easier too. The more we automate, the more 
non-vendors ("non traditional release managers"?) will be willing to roll 
releases.  The more people that feel comfortable rolling a release, the more 
likelihood releases will happen.  The more likelihood of releases happening, 
the greater the chance trunk has of getting out the door.

That turned into years' worth of fixing and automating lots of stuff 
that was continually complained about but never fixed:  release notes, 
changes.txt, chunks of the build process, chunks of the release tar ball 
process, fixing consistency, etc.  Some of that became a part of Yetus, some of 
it didn't.  Some of that work leaked into branch-2 at some point. Many probably 
don't know why this stuff was happening.  Then there were the people that 
claimed I was "wasting my time" and that I should be focusing on "more 
important" things.  (Press release features, I'm assuming.)

So, yes, I'd like to see proposals, but I'd also like to challenge the 
community at large to spend more time on these build processes.  There's a 
tremendous amount of cruft and our usage of maven is still nearly primordial in 
implementation. (Shout out to Marton Elek who has some great although ambitious 
ideas.)  

Also kudos to Andrew for putting create-release and a lot of my other 
changes through their paces in the early days.  When he publicly stepped up to 
do the release, I don't know if he realized what he was walking into... 
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Allen Wittenauer

> On Aug 25, 2017, at 1:23 PM, Jason Lowe  wrote:
> 
> Allen Wittenauer wrote:
>  
> > Doesn't this place an undue burden on the contributor with the first 
> > incompatible patch to prove worthiness?  What happens if it is decided that 
> > it's not good enough?
> 
> It is a burden for that first, "this can't go anywhere else but 4.x" change, 
> but arguably that should not be a change done lightly anyway.  (Or any other 
> backwards-incompatible change for that matter.)  If it's worth committing 
> then I think it's perfectly reasonable to send out the dev announce that 
> there's reason for trunk to diverge from 3.x, cut branch-3, and move on.  
> This is no different than Andrew's recent announcement that there's now a 
> need for separating trunk and the 3.0 line based on what's about to go in.

So, by this definition, as soon as a patch comes in to remove deprecated 
bits there will be no issue with a branch-3 getting created, correct?

>  Otherwise if past trunk behavior is any indication, it ends up mostly 
> enabling people to commit to just trunk, forgetting that the thing they are 
> committing is perfectly valid for branch-3. 

I'm not sure there was any "forgetting" involved.  We likely wouldn't 
be talking about 3.x at all if it wasn't for the code diverging enough.

> > Given the number of committers that openly ignore discussions like this, 
> > who is going to verify that incompatible changes don't get in?
>  
> The same entities who are verifying other bugs don't get in, i.e.: the 
> committers and the Hadoop QA bot running the tests.
>  Yes, I know that means it's inevitable that compatibility breakages will 
> happen, and we can and should improve the automation around compatibility 
> testing when possible.

The automation only goes so far.  At least while investigating Yetus 
bugs, I've seen more than enough blatant and purposeful ignored errors and 
warnings that I'm not convinced it will be effective. ("That javadoc compile 
failure didn't come from my patch!"  Um, yes, yes it did.) PR for features has 
greatly trumped code correctness for a few years now.

In any case, I'm specifically thinking of the folks that commit maybe one 
or two patches a year.  They generally don't pay attention to *any* of this 
stuff, and it doesn't seem like many people are actually paying attention to 
what gets committed until it breaks their universe.

>  But I don't think there's a magic bullet for preventing all compatibility 
> bugs from being introduced, just like there isn't one for preventing general 
> bugs.  Does having a trunk branch separate but essentially similar to 
> branch-3 make this any better?

Yes: it's been the process for over a decade now.  Unless there is some 
outreach done, it is almost a guarantee that someone will commit something to 
trunk they shouldn't because they simply won't know (or care?) the process has 
changed.  

> > Longer term:  what is the PMC doing to make sure we start doing major 
> > releases in a timely fashion again?  In other words, is this really an 
> > issue if we shoot for another major in (throws dart) 2 years?
> 
> If we're trying to do semantic versioning

FWIW: Hadoop has *never* done semantic versioning. A large percentage 
of our minors should really have been majors. 

> then we shouldn't have a regular cadence for major releases unless we have a 
> regular cadence of changes that break compatibility.  

But given that we don't follow semantic versioning

> I'd hope that's not something we would strive towards.  I do agree that we 
> should try to be better about shipping releases, major or minor, in a more 
> timely manner, but I don't agree that we should cut 4.0 simply based on a 
> duration since the last major release.

... the only thing we're really left with is (technically) time, either 
in the form of a volunteer saying "hey, I've got time to cut a release" or "my 
employer has a corporate goal based upon a feature in this release".   I would 
*love* for the PMC to define a policy or guidelines that says the community 
should strive for a major after x  incompatible changes, a minor after y 
changes, a micro after z fixes.  Even if it doesn't have any teeth, it would at 
least give people hope that their contributions won't be lost in the dustbin of 
history and may actually push others to work on getting a release out.  (Hadoop 
has made people committers based upon features that have never gotten into a 
stable release.  Needless to say, most of those people no longer contribute 
actively, if at all.)

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-25 Thread Allen Wittenauer

> On Aug 25, 2017, at 10:36 AM, Andrew Wang  wrote:

> Until we need to make incompatible changes, there's no need for
> a Hadoop 4.0 version.

Some questions:

Doesn't this place an undue burden on the contributor with the first 
incompatible patch to prove worthiness?  What happens if it is decided that 
it's not good enough?

How many will it take before the dam will break?  Or is there a 
timeline going to be given before trunk gets set to 4.x?  

Given the number of committers that openly ignore discussions like 
this, who is going to verify that incompatible changes don't get in?

Longer term:  what is the PMC doing to make sure we start doing major 
releases in a timely fashion again?  In other words, is this really an issue if 
we shoot for another major in (throws dart) 2 years?
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: Branch merges and 3.0.0-beta1 scope

2017-08-22 Thread Allen Wittenauer
We should avoid turning this into a replay of Apache Hadoop 2.6.0 (and 
to a lesser degree, 2.7.0 and 2.8.0) where a bunch of last minute 
“experimental” features derail stability for a significantly long period of 
time.
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6934) downlink.data is written to CWD

2017-08-04 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created MAPREDUCE-6934:
---

 Summary: downlink.data is written to CWD
 Key: MAPREDUCE-6934
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6934
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: pipes
Affects Versions: 3.0.0-beta1
Reporter: Allen Wittenauer
Priority: Minor


When using Pipes, the downlink.data stream is written to the current working 
directory.  This is a bit of a problem when running MR jobclient tests in 
parallel, as the file is written outside of target.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-2320) RAID DistBlockFixer should limit pending jobs instead of pending files

2017-08-01 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved MAPREDUCE-2320.
-
Resolution: Won't Fix

MR RAID has been replaced by HDFS EC in modern versions of Hadoop.

> RAID DistBlockFixer should limit pending jobs instead of pending files
> --
>
> Key: MAPREDUCE-2320
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2320
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Affects Versions: 0.20.2, 0.20.3
>Reporter: Ramkumar Vadali
>Assignee: Ramkumar Vadali
>Priority: Minor
>
> DistBlockFixer limits the number of files being fixed simultaneously to avoid 
> an unlimited backlog. This limits the number of parallel jobs though, and if 
> one job has a long running task, it prevents newer jobs being started. 
> Instead, it should have a limit on running jobs. That way, one long running 
> task will not block other jobs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-2267) Parallelize reading of blocks within a stripe

2017-08-01 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved MAPREDUCE-2267.
-
Resolution: Won't Fix

MR RAID has been replaced by HDFS EC in modern versions of Hadoop.

> Parallelize reading of blocks within a stripe
> -
>
> Key: MAPREDUCE-2267
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2267
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Affects Versions: 0.22.0
>Reporter: Ramkumar Vadali
>Assignee: Ramkumar Vadali
> Attachments: MAPREDUCE-2267.1.patch, MAPREDUCE-2267.2.patch, 
> MAPREDUCE-2267.3.patch, MAPREDUCE-2267.4.patch, MAPREDUCE-2267.patch
>
>
> RAID code has several instances where several blocks of data have to be read 
> to perform an operation. For example, computing a parity block requires 
> reading the blocks of the source file. Similarly, generating a fixed block 
> requires reading a parity block and the good blocks from the source file. 
> These read operations proceed sequentially currently. RAID code should use a 
> thread pool to increase the parallelism and thus reduce latency.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-2189) RAID Parallel traversal needs to synchronize stats

2017-08-01 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved MAPREDUCE-2189.
-
Resolution: Won't Fix

MR RAID has been replaced by HDFS EC in modern versions of Hadoop.

> RAID Parallel traversal needs to synchronize stats
> --
>
> Key: MAPREDUCE-2189
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2189
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid
>Reporter: Ramkumar Vadali
>Assignee: Ramkumar Vadali
> Attachments: MAPREDUCE-2189.patch
>
>
> The implementation of multi-threaded directory traversal does not update 
> stats in a thread-safe manner



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: Are binary artifacts are part of a release?

2017-07-31 Thread Allen Wittenauer

... that doesn't contradict anything I said.  

> On Jul 31, 2017, at 7:23 PM, Konstantin Shvachko  wrote:
> 
> The issue was discussed on several occasions in the past.
> Took me a while to dig this out as an example:
> http://mail-archives.apache.org/mod_mbox/hadoop-general/20.mbox/%3C4EB0827C.6040204%40apache.org%3E
> 
> Doug Cutting:
> "Folks should not primarily evaluate binaries when voting. The ASF primarily 
> produces and publishes source-code
> so voting artifacts should be optimized for evaluation of that."
> 
> Thanks,
> --Konst
> 
> On Mon, Jul 31, 2017 at 4:51 PM, Allen Wittenauer 
>  wrote:
> 
> > On Jul 31, 2017, at 4:18 PM, Andrew Wang  wrote:
> >
> > Forking this off to not distract from release activities.
> >
> > I filed https://issues.apache.org/jira/browse/LEGAL-323 to get clarity on 
> > the matter. I read the entire webpage, and it could be improved one way or 
> > the other.
> 
> 
> IANAL, my read has always led me to believe:
> 
> * An artifact is anything that is uploaded to dist.a.o and 
> repository.a.o
> * A release consists of one or more artifacts ("Releases are, 
> by definition, anything that is published beyond the group that owns it. In 
> our case, that means any publication outside the group of people on the 
> product dev list.")
> * One of those artifacts MUST be source
> * (insert voting rules here)
> * They must be built on a machine in control of the RM
> * There are no exceptions for alpha, nightly, etc
> * (various other requirements)
> 
> i.e., release != artifact; it's more like release = 
> artifact * n.
> 
> Do you have to have binaries?  No (e.g., Apache SpamAssassin has no 
> binaries to create).  But if you place binaries in dist.a.o or 
> repository.a.o, they are effectively part of your release and must follow the 
> same rules.  (Votes, etc.)
> 
> 


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: Are binary artifacts are part of a release?

2017-07-31 Thread Allen Wittenauer

> On Jul 31, 2017, at 4:18 PM, Andrew Wang  wrote:
> 
> Forking this off to not distract from release activities.
> 
> I filed https://issues.apache.org/jira/browse/LEGAL-323 to get clarity on the 
> matter. I read the entire webpage, and it could be improved one way or the 
> other.


IANAL, my read has always led me to believe:

* An artifact is anything that is uploaded to dist.a.o and 
repository.a.o
* A release consists of one or more artifacts ("Releases are, 
by definition, anything that is published beyond the group that owns it. In our 
case, that means any publication outside the group of people on the product dev 
list.")
* One of those artifacts MUST be source
* (insert voting rules here)
* They must be built on a machine in control of the RM
* There are no exceptions for alpha, nightly, etc
* (various other requirements)

i.e., release != artifact; it's more like release = 
artifact * n.

Do you have to have binaries?  No (e.g., Apache SpamAssassin has no 
binaries to create).  But if you place binaries in dist.a.o or repository.a.o, 
they are effectively part of your release and must follow the same rules.  
(Votes, etc.)


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.7.4 (RC0)

2017-07-31 Thread Allen Wittenauer

> On Jul 31, 2017, at 11:20 AM, Konstantin Shvachko  
> wrote:
> 
> https://wiki.apache.org/hadoop/HowToReleasePreDSBCR

FYI:

If you are using ASF Jenkins to create an ASF release artifact, 
it's pretty much an automatic vote failure as any such release is in violation 
of ASF policy.


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: Pre-Commit build is failing

2017-07-25 Thread Allen Wittenauer

Again: just grab the .gitignore file from trunk and update it in 
branch-2.7. It hasn't been touched (outside of one patch) in years.  The 
existing jobs should then work. 
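
Something like this, assuming a clone with both branches available:

git checkout branch-2.7
git checkout trunk -- .gitignore
git commit -m 'Update .gitignore from trunk'
git push origin branch-2.7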

The rest of this stuff, yes, I know, and yes, it's intentional.  The 
directory structure was inherited from the original jobs that Nigel set up with 
the old version of test-patch.  Maybe some day I'll fix it.  But that's a 
project for a different day.  Fixing it means taking down the 
patch testing for Hadoop while I work it out.  You'll notice that all of the 
other Yetus jobs for Hadoop have a much different layout.




> On Jul 25, 2017, at 7:24 PM, suraj acharya  wrote:
> 
> Hi,
> 
> Seems like the issue was an incorrect/unclean checkout.
> I made a few changes[1] to the directories the checkout happens into, and it is 
> now running. 
> Of course, this build[2] will take some time to run, but at the moment, it is 
> running maven install.
> 
> I am not sure who sets up/manages the Jenkins job for HDFS and don't want to 
> change that, but I will keep the dummy job around for a couple of days in 
> case anyone wants to see.
> Also, I see that you all were using the master branch of Yetus. If there is no 
> patch present there that is of importance, then I would recommend using the 
> latest stable release, version 0.5.0.
> 
> If you have more questions, feel free to ping dev@yetus.
> Hope this helps.
> 
> [1]: https://builds.apache.org/job/PreCommit-HDFS-Build-Suraj-Copy/configure
> [2]: https://builds.apache.org/job/PreCommit-HDFS-Build-Suraj-Copy/12/console
> 
> -Suraj Acharya
> 
> On Tue, Jul 25, 2017 at 6:57 PM, suraj acharya  wrote:
> For anyone looking. I created another job here. [1].
> Set it with debug to see the issue.
> The error is being seen here[2].
> From the looks of it, the way the checkout is happening is not 
> very clean.
> I will continue to look at it, but in case anyone wants to jump in.
> 
> [1] : https://builds.apache.org/job/PreCommit-HDFS-Build-Suraj-Copy/
> [2] : https://builds.apache.org/job/PreCommit-HDFS-Build-Suraj-Copy/11/console
> 
> -Suraj Acharya
> 
> On Tue, Jul 25, 2017 at 6:28 PM, Konstantin Shvachko  
> wrote:
> Hi Yetus developers,
> 
> We cannot build Hadoop branch-2.7 anymore. Here is a recent example of a
> failed build:
> https://builds.apache.org/job/PreCommit-HDFS-Build/20409/console
> 
> It seems the build is failing because Yetus cannot apply the patch from the
> jira.
> 
> ERROR: HDFS-11896 does not apply to branch-2.7.
> 
> As far as I understand this is a Yetus problem. Probably in 0.3.0.
> I can apply this patch successfully, but Yetus test-patch.sh script clearly
> failed to apply. Cannot say why because Yetus does not report it.
> I also ran Hadoop's test-patch.sh script locally and it passed successfully
> on branch-2.7.
> 
> Could anybody please take a look and help fix the build.
> This would be very helpful for the release (2.7.4) process.
> 
> Thanks,
> --Konst
> 
> On Mon, Jul 24, 2017 at 10:41 PM, Konstantin Shvachko 
> wrote:
> 
> > Or should we backport the entire HADOOP-11917
> > <https://issues.apache.org/jira/browse/HADOOP-11917> ?
> >
> > Thanks,
> > --Konst
> >
> > On Mon, Jul 24, 2017 at 6:56 PM, Konstantin Shvachko wrote:
> >
> >> Allen,
> >>
> >> Should we add "patchprocess/" to .gitignore, is that the problem for 2.7?
> >>
> >> Thanks,
> >> --Konstantin
> >>
> >> On Fri, Jul 21, 2017 at 6:24 PM, Konstantin Shvachko <
> >> shv.had...@gmail.com> wrote:
> >>
> >>> What stuff? Is there a jira?
> >>> It did work like a week ago. Is it a new Yetus requirement.
> >>> Anyways I can commit a change to fix the build on our side.
> >>> Just need to know what is missing.
> >>>
> >>> Thanks,
> >>> --Konst
> >>>
> >>> On Fri, Jul 21, 2017 at 5:50 PM, Allen Wittenauer <
> >>> a...@effectivemachines.com> wrote:
> >>>
> >>>>
> >>>> > On Jul 21, 2017, at 5:46 PM, Konstantin Shvachko <
> >>>> shv.had...@gmail.com> wrote:
> >>>> >
> >>>> > + d...@yetus.apache.org
> >>>> >
> >>>> > Guys, could you please take a look. Seems like Yetus problem with
> >>>> > pre-commit build for branch-2.7.
> >>>>
> >>>>
> >>>> branch-2.7 is missing stuff in .gitignore.
> >>>
> >>>
> >>>
> >>
> >
> 
> 


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: About 2.7.4 Release

2017-05-02 Thread Allen Wittenauer

Is there any reason to not Close -alpha1+resolved state JIRAs?  It's been quite 
a while, and those definitely should not be getting re-opened anymore.  What about 
-alpha2's that are also resolved?
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: About 2.7.4 Release

2017-05-01 Thread Allen Wittenauer

> On May 1, 2017, at 2:27 PM, Andrew Wang  wrote:
> I believe I asked about this on dev-yetus a while back. I'd prefer that the 
> presence of the fix version be sufficient to indicate whether a JIRA is 
> included in a release branch. Yetus requires that the JIRA be resolved as 
> "Fixed" to show up, which is why we are in our current situation.

We can't do this because Hadoop is the only project I've seen that 
sets Fix Version at close time.  Everyone else sets fix version in place 
of target version (which is a custom field, IIRC).



-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: About 2.7.4 Release

2017-04-26 Thread Allen Wittenauer

> On Apr 25, 2017, at 12:35 AM, Akira Ajisaka  wrote:
> > Maybe we should create a jira to track this?
> 
> I think now either way (reopen or create) is fine.
> 
> Release doc maker creates change logs by fetching information from JIRA, so 
> reopening the tickets should be avoided when a release process is in progress.
> 

Keep in mind that the release documentation is part of the build 
process.  Users who are doing their own builds will have incomplete 
documentation if we keep re-opening JIRAs after a release.  At one point, JIRA 
was configured to refuse re-opening after a release is cut.  I'm not sure why 
it stopped doing that, but it might be time to see if we can re-enable that 
functionality.


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: HADOOP-14316: Switching from Findbugs to Spotbugs

2017-04-19 Thread Allen Wittenauer

> On Apr 19, 2017, at 10:52 AM, Wei-Chiu Chuang  wrote:
> That sounds scary. Would you mind sharing the list of bugs that spotbugs 
> found? Sounds like some of them may warrant new blocker JIRAs for Hadoop 3.


I've added the list to the JIRA.
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



HADOOP-14316: Switching from Findbugs to Spotbugs

2017-04-19 Thread Allen Wittenauer
Hey gang.

HADOOP-14316 enables the spotbugs back-end for the findbugs front-end.  
Spotbugs (https://spotbugs.github.io/) is the fork of findbugs that the 
community and some of the major contributors have made to move findbugs 
forward.  It is geared towards JDK8 and JDK9. 

Before I commit, I wanted to give a heads up to the community about 
this change.

After committal, there will be (approx) 62 new findbugs issues that 
will pop up in the source tree.  My quick pass over a handful of them indicates 
that a good number are legit/not false positives (and one of them, well, we got 
lucky it's an API that no one updates/uses). It will take the community to fix 
up those 62 problems, either by actually fixing them (better) or exempting them 
(not great, but sometimes required).
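
The front-end goals don't change with the back-end switch; e.g., to regenerate 
and check the report for a single module (module path is just an example):

mvn -pl hadoop-common-project/hadoop-common findbugs:findbugs findbugs:check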

Thanks.
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-04-17 Thread Allen Wittenauer
Looks like someone reset HEAD back to Mar 31. 

Sent from my iPad

> On Apr 16, 2017, at 12:08 AM, Apache Jenkins Server 
>  wrote:
> 
> For more details, see 
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/378/
> 
> 
> 
> 
> 
> -1 overall
> 
> 
> The following subsystems voted -1:
>docker
> 
> 
> Powered by Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org
> 
> 
> 
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6875) mapred-site.xml in bin tarball lacks a license

2017-04-05 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created MAPREDUCE-6875:
---

 Summary: mapred-site.xml in bin tarball lacks a license
 Key: MAPREDUCE-6875
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6875
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Reporter: Allen Wittenauer
Priority: Blocker


The mapred-site.xml file needs a license.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [DISCUSS] Changing the default class path for clients

2017-04-03 Thread Allen Wittenauer

1.0.4:

"Prints the class path needed to get the Hadoop jar and the required 
libraries.”

 2.8.0 and 3.0.0:

"Prints the class path needed to get the Hadoop jar and the required 
libraries. If called without arguments, then prints the classpath set up by the 
command scripts, which is likely to contain wildcards in the classpath entries.”

I would take that to mean “what gives me all the public APIs?”  Which, 
by definition, should all be in hadoop-client-runtime (with the possible 
exception of the DistributedFileSystem Quota APIs, since for some reason those 
are marked public.) 

Let me ask it a different way:

Why should ‘yarn jar’, ‘mapred jar’, ‘hadoop distcp’, ‘hadoop fs’, etc, 
etc, etc, have anything but hadoop-client-runtime as the provided jar? Yes, 
some things might break, but given this is 3.0, some changes should be expected 
anyway. Given the definition above "needed to get the Hadoop jar and the 
required libraries”  switching this over seems correct.  
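
For reference, the current behavior is easy to inspect from a shell:

hadoop classpath          # classpath as set up by the command scripts (wildcards)
hadoop classpath --glob   # same list with the wildcards expanded per entry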


> On Apr 3, 2017, at 10:37 AM, Esteban Gutierrez  wrote:
> 
> 
> I agree with Andrew too. Users have relied for years on `hadoop classpath` 
> for their scripts to launch jobs or other tools; perhaps not the best idea to 
> change the behavior without providing a proper deprecation path.
> 
> thanks!
> esteban.
> 
> --
> Cloudera, Inc.
> 
> 
> On Mon, Apr 3, 2017 at 10:26 AM, Andrew Wang  wrote:
> What's the current contract for `hadoop classpath`? Would it be safer to
> introduce `hadoop userclasspath` or similar for this behavior?
> 
> I'm betting that changing `hadoop classpath` will lead to some breakages,
> so I'd prefer to make this new behavior opt-in.
> 
> Best,
> Andrew
> 
> On Mon, Apr 3, 2017 at 9:04 AM, Allen Wittenauer 
> wrote:
> 
> >
> > This morning I had a bit of a shower thought:
> >
> > With the new shaded hadoop client in 3.0, is there any reason the
> > default classpath should remain the full blown jar list?  e.g., shouldn’t
> > ‘hadoop classpath’ just return configuration, user supplied bits (e.g.,
> > HADOOP_USER_CLASSPATH, etc), HADOOP_OPTIONAL_TOOLS, and
> > hadoop-client-runtime? We’d obviously have to add some plumbing for daemons
> > and the capability for the user to get the full list, but that should be
> > trivial.
> >
> > Thoughts?
> > -
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >
> >
> 


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[DISCUSS] Changing the default class path for clients

2017-04-03 Thread Allen Wittenauer

This morning I had a bit of a shower thought:

With the new shaded hadoop client in 3.0, is there any reason the 
default classpath should remain the full blown jar list?  e.g., shouldn’t 
‘hadoop classpath’ just return configuration, user supplied bits (e.g., 
HADOOP_USER_CLASSPATH, etc), HADOOP_OPTIONAL_TOOLS, and hadoop-client-runtime? 
We’d obviously have to add some plumbing for daemons and the capability for the 
user to get the full list, but that should be trivial.  

Thoughts?
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: Can we update protobuf's version on trunk?

2017-03-28 Thread Allen Wittenauer

> On Mar 28, 2017, at 5:09 PM, Chris Douglas  wrote:
> 
> I haven't seen data identifying PB as a bottleneck, but the
> non-x86/non-Linux and dev setup arguments may make this worthwhile. -C

FWIW, we have the same problem with leveldbjni-all. (See the ASF 
PowerPC build logs.)  I keep meaning to spend time on the maven build to actually 
download and install it, since a) the project appears to be never headed for a 
release and b) it's not an optional component in YARN for some reason.  
Potentially in combination with moving from leveldbjni-all to just leveldbjni.



-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [RESULT] [VOTE] Release Apache Hadoop 2.8.0 (RC3)

2017-03-23 Thread Allen Wittenauer

Just a heads up.  Looks like someone removed the Finish Date off of 2.8.0 in JIRA. 
 It needs to be put back to match what is in the artifacts that we voted on.
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.8.0 (RC3)

2017-03-21 Thread Allen Wittenauer

> On Mar 21, 2017, at 10:12 AM, Andrew Wang  wrote:
> 
> I poked around a bit. The 3.0.0-alpha2 binary tarball is only 246M and has
> more changes than 2.8.0.


Not to disclaim any other potential issues, but it's worth noting 3.x de-dupes 
jar files as part of the packaging process.  So it's not exactly an 
apples-to-apples comparison. (Although I think the new yarn-ui made the 
significant loss in excess jars moot.  Without that, I'd expect 3.x to be about 
half the size.)
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: About 2.7.4 Release

2017-03-08 Thread Allen Wittenauer

> On Mar 8, 2017, at 1:54 PM, Allen Wittenauer  
> wrote:
> 
>   This is already possible:
>   * don’t use --asfrelease
>   * use --sign, --native, and, if appropriate for your platform, 
> --docker and --dockercache


Oh yeah, I forgot about this:

https://effectivemachines.com/2016/08/16/building-your-own-apache-hadoop-distribution/



-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: About 2.7.4 Release

2017-03-08 Thread Allen Wittenauer

> On Mar 8, 2017, at 10:55 AM, Marton Elek  wrote:
> 
> I think the main point here is the testing of the release script, not the 
> creation of the official release.

… except the Hadoop PMC was doing exactly this from 2.3.0 up until 
recently. Which means we have a few years' worth of releases that are 
effectively untrustworthy despite being signed.  One of the (many) reasons I 
rewrote the release process was to get Hadoop back in line with ASF policy.  
Given the massive turnover in committers, I don’t want us to repeat the same 
mistakes (like we usually do).

> I think there should be an option to configure the release tool to use a 
> forked github repo and/or a private playground nexus instead of the official 
> apache repos. In this case it would be easy to test the tool regularly, even 
> by a non-committer (or even from Jenkins). But it would be just a smoke test 
> of the release script…

This is already possible:
* don’t use --asfrelease
* use --sign, --native, and, if appropriate for your platform, 
--docker and --dockercache


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: About 2.7.4 Release

2017-03-07 Thread Allen Wittenauer

> On Mar 7, 2017, at 2:51 PM, Andrew Wang  wrote:
> I think it'd be nice to
> have a nightly Jenkins job that builds an RC,

Just a reminder that any such build cannot be used for an actual 
release:

http://www.apache.org/legal/release-policy.html#owned-controlled-hardware



-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0-alpha2 RC0

2017-01-23 Thread Allen Wittenauer

> On Jan 23, 2017, at 8:50 PM, Chris Douglas  wrote:
> 
> Thanks for all your work on this, Andrew. It's great to see the 3.x
> series moving forward.
> 
> If you were willing to modify the release notes and add the LICENSE to
> the jar, we don't need to reset the clock on the VOTE, IMO.

FWIW, I wrote a new version of the verify-license-files tool and attached it to 
HADOOP-13374.  This version actually verifies that the license and notice files 
in jars and wars match the one in the base of the (tarball) distribution.

ERROR: hadoop-client-api-3.0.0-alpha3-SNAPSHOT.jar: Missing a LICENSE file
ERROR: hadoop-client-api-3.0.0-alpha3-SNAPSHOT.jar: No valid NOTICE found

WARNING: hadoop-client-minicluster-3.0.0-alpha3-SNAPSHOT.jar: Found 5 LICENSE 
files (0 were valid)
ERROR: hadoop-client-minicluster-3.0.0-alpha3-SNAPSHOT.jar: No valid LICENSE 
found
WARNING: hadoop-client-minicluster-3.0.0-alpha3-SNAPSHOT.jar: Found 3 NOTICE 
files (0 were valid)
ERROR: hadoop-client-minicluster-3.0.0-alpha3-SNAPSHOT.jar: No valid NOTICE 
found

ERROR: hadoop-client-runtime-3.0.0-alpha3-SNAPSHOT.jar: No valid LICENSE found
ERROR: hadoop-client-runtime-3.0.0-alpha3-SNAPSHOT.jar: No valid NOTICE found

> What's the issue with the minicluster jar [1]? I tried to reproduce,
> but had no issues with 1.8.0_92-b14.

minicluster is kind of weird on filesystems that are case-insensitive, like 
OS X's default HFS+.

$  jar tf hadoop-client-minicluster-3.0.0-alpha3-SNAPSHOT.jar | grep -i license
LICENSE.txt
license/
license/LICENSE
license/LICENSE.dom-documentation.txt
license/LICENSE.dom-software.txt
license/LICENSE.sax.txt
license/NOTICE
license/README.dom.txt
license/README.sax.txt
LICENSE
Grizzly_THIRDPARTYLICENSEREADME.txt



The problem here is that there is a 'license' directory and a file called 
'LICENSE'.  If this gets extracted by jar via jar xf, it will fail.  unzip can 
be made to extract it via an option like -o.  To make matters worse, none of 
these license files match the one in the generated tarball. :(
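
If you need to get at the contents on one of those filesystems, something like 
this works around it by letting later entries overwrite earlier ones (the 
output directory name is arbitrary):

unzip -o hadoop-client-minicluster-3.0.0-alpha3-SNAPSHOT.jar -d minicluster-contents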



-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release cadence and EOL

2017-01-23 Thread Allen Wittenauer

> On Jan 21, 2017, at 7:08 PM, Karthik Kambatla  wrote:
> 
>   3. RM: some method to madness. Junping, for instance, is trying to roll
>   a release with 2300 patches. It is a huge time investment. (Thanks again,
>   Junping.) Smaller releases are easier to manage. A target release cadence,
>   coupled with a process that encourages volunteering, IMO would lead to more
>   committers doing releases.


In the case of 2.8.0, that's on the original RM and the "back port fever" 
that afflicts way too many committers.  2.8.0 has been sitting in a separate 
branch for over a year.  Of *course* it is going to be a disaster.  If the 
original RM had said "I don't have time, someone take over" after 3 months of 
it being left idle, or another committer had felt as though they could take it 
over, or everyone hadn't dumped everything they can into every release 
possible, it wouldn't be nearly as bad.

Not only do we need to encourage volunteering, but we also need to 
encourage people to relinquish control. If the PMC wants to enact a cadence, 
then they also must be willing to step in when an RM is unresponsive and 
request someone else take over.  A message every three months saying "Yes, I'm 
still working on it." doesn't really help anyone, including the RM.


> To conclude, the biggest value I see is us (the community) agreeing on good
> practices for our releases and working towards them. Writing it down somewhere
> makes it a little more formal like the compatibility stuff, even if it is
> not enforceable.

So it's exactly like the compatibility stuff. ;)


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0-alpha2 RC0

2017-01-22 Thread Allen Wittenauer

> On Jan 22, 2017, at 9:05 PM, Allen Wittenauer  
> wrote:
> 
> 
> 
> 
> 
>> On Jan 20, 2017, at 2:36 PM, Andrew Wang  wrote:
>> 
>> http://home.apache.org/~wang/3.0.0-alpha2-RC0/
> 
>   There are quite a few JIRA issues that need release notes.
> 


One other thing, before I forget... I'm not sure the 
hadoop-client-minicluster jar is getting built properly.  If you look inside, 
you'll find a real mishmash of things, including files and directories with the 
same names but different cases.  This means it won't extract properly on OS X.  
(jar xf on that jar file literally stack traces on my El Capitan machine. Neat!)
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0-alpha2 RC0

2017-01-22 Thread Allen Wittenauer




> On Jan 20, 2017, at 2:36 PM, Andrew Wang  wrote:
> 
> http://home.apache.org/~wang/3.0.0-alpha2-RC0/

There are quite a few JIRA issues that need release notes.


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [Continued] [Release thread] 2.8.0 release activities

2017-01-20 Thread Allen Wittenauer
If you ran mvn clean at any point in your repo between create-release and mvn 
deploy, you'll need to start over by running create-release again.  create-release 
leaves things in a state where mvn deploy is ready to go, with no clean 
necessary.
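
In other words, the ordering that works looks roughly like this (the exact 
deploy flags come from the HowToRelease wiki; shown here from memory):

dev-support/bin/create-release --asfrelease
# no 'mvn clean' here -- it wipes the state create-release left behind
mvn deploy -Psign -DskipTests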


> On Jan 20, 2017, at 11:12 AM, Junping Du  wrote:
> 
> Yes. I did maven deploy in the root directory before closing the staging 
> repository. If this is the only suspect, I can drop the repository and do mvn 
> deploy again.
> 
> 
> Thanks,
> 
> 
> Junping
> 
> 
> From: Andrew Wang 
> Sent: Friday, January 20, 2017 10:48 AM
> To: Junping Du
> Cc: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
> mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
> Subject: Re: [Continued] [Release thread] 2.8.0 release activities
> 
> You can check the error message by clicking on it, a bunch like this:
> 
> Missing Signature: 
> '/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.8.0/hadoop-mapreduce-client-jobclient-2.8.0-tests.jar.asc'
>  does not exist for 'hadoop-mapreduce-client-jobclient-2.8.0-tests.jar'.
> 
> Did you maven deploy all the signature files?
> 
> On Fri, Jan 20, 2017 at 2:04 AM, Junping Du 
> <j...@hortonworks.com> wrote:
> Hi,
>I have successfully built the release bit on branch-2.8.0 by following: 
> https://wiki.apache.org/hadoop/HowToRelease step by step. However, when try 
> to close the "Staging Repositories" at Nexus page 
> (https://repository.apache.org/#stagingRepositories), I found our repository 
> - orgapachehadoop-1051 cannot be closed due to some signature validation 
> failed and one Apache rule failed. I never met this problem before in 
> previous releasing 2.6.3 and 2.6.4. Is this related to our recent changes on 
> release process/tools (docker based)? Any ideas or thoughts on how to fix the 
> problem?
> 
> Thanks,
> 
> Junping
> 
> 
> From: Junping Du <j...@hortonworks.com>
> Sent: Thursday, January 19, 2017 6:46 PM
> To: common-...@hadoop.apache.org; 
> hdfs-...@hadoop.apache.org; 
> mapreduce-dev@hadoop.apache.org; 
> yarn-...@hadoop.apache.org
> Cc: Varun Vasudev
> Subject: Re: [Continued] [Release thread] 2.8.0 release activities
> 
> According to Varun's offline email, the security fixes have landed on 
> branch-2, branch-2.8, and branch-2.8.0.
> I was kicking off a new RC build (RC1), and will publish it for vote soon. In 
> the meantime, please mark the fix version as 2.8.1 for any new commits landed on 
> branch-2.8, and don't commit anything to branch-2.8.0 at this moment. Thanks!
> 
> Cheers,
> 
> Junping
> 
> 
> From: Junping Du <j...@hortonworks.com>
> Sent: Wednesday, January 18, 2017 3:26 PM
> To: common-...@hadoop.apache.org; 
> hdfs-...@hadoop.apache.org; 
> mapreduce-dev@hadoop.apache.org; 
> yarn-...@hadoop.apache.org
> Cc: Varun Vasudev
> Subject: Re: [Continued] [Release thread] 2.8.0 release activities
> 
> Hi folks,
> In the past one or two weeks, we found some new blockers coming in on 
> branch-2.8, like: YARN-6068 (log aggregation get stuck when NM restart with 
> work preserving enabled) and YARN-6072 (RM unable to start in secure mode). 
> Both of them are fixed now (YARN-6072 is fixed by Ajith and I fixed 
> YARN-6068), and I started the RC build process early this week. As 
> we have significant build tools/process changes (docker based) in 2.8 
> compared with 2.6/2.7, it took me a while to get familiar with them and 
> finally get a successful build of 2.8.0-RC0 last night.
> I already pushed the RC0 tag publicly, which is a prerequisite step before RC 
> voting. However, in the meantime, I was pinged by Varun Vasudev that there 
> is a known vulnerability in container_executor that was identified and 
> discussed in hadoop security email threads - looks like YARN-5704 fixed part 
> of it, but the remaining part - privilege escalation via /proc/self/environ - 
> is not fixed yet. So most likely I have to withdraw our 2.8.0 RC0, although I 
> haven't announced it publicly for a vote yet. I will wait for this issue to 
> get fixed before preparing a new release candidate. As the RC0 tag cannot be 
> reverted after being pushed to apache, our next release candidate will start 
> from RC1.
> As I mentioned in an earlier email, 2.8.0 is a very big release (2300+ commits 
> since 2.7.3), and I am glad that we are almost there. Thanks everyone for 
> being patient and contributing to the release work. Please let me know if you 
> have more comments or suggestions.
> 
> Thanks,
> 
> Junping
> 
> From: Junping Du <j...@hortonworks.com>
> Sent: Wednesday, J

Re: [VOTE] Release cadence and EOL

2017-01-18 Thread Allen Wittenauer

> On Jan 18, 2017, at 11:21 AM, Chris Trezzo  wrote:
> 
> Thanks Sangjin for pushing this forward! I have a few questions:

These are great questions, because I know I'm not seeing a whole lot of 
substance in this vote.  The way to EOL software in the open source universe is 
with new releases and aging it out.  If someone wants to be an RE for a new 
branch-1 release, more power to them.  As volunteers to the ASF, we're not on 
the hook to provide much actual support.  This feels more like a vendor play 
than a community one.  But if the PMC wants to vote on it, whatever.  It won't 
be the first bylaw that doesn't really mean much.

> 1. What is the definition of end-of-life for a release in the hadoop
> project? My current understanding is as follows: When a release line
> reaches end-of-life, there are no more planned releases for that line.
> Committers are no longer responsible for back-porting bug fixes to the line
> (including fixed security vulnerabilities) and it is essentially
> unmaintained.

Just a point of clarification.  There is no policy that says that 
committers must back port.  It's up to the individual committers to push a 
change onto any particular branch. Therefore, this vote doesn't really change 
anything in terms of committer responsibilities here.

> 2. How do major releases affect the end-of-life proposal? For example, how
> does a new minor release in the next major release affect the end-of-life
> of minor releases in a previous major release? Is it possible to have a
> maintained 2.x release if there is a 3.3 release?

I'm looking forward to seeing this answer too, given that 2.7.0 is 
probably past the 2 year mark, 2.8.0 has seemingly been in a holding pattern 
for over a year, and the next 3.0.0 alpha should be RSN.

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.6.5 (RC1)

2016-10-06 Thread Allen Wittenauer

> On Oct 6, 2016, at 1:39 PM, Akira Ajisaka  wrote:
> 
> > It wasn't 'renamed' to jenkins, prior releases were actually built by and 
> > on the Jenkins infrastructure. Which was a very very bad idea:  it's 
> > insecure and pretty much against ASF policy.
> 
> Sorry for the confusion. I should not have used the word 'rename'.
> What I meant is that "would you change the name to 'jenkins' by using the 
> Jenkins infra?"


To reiterate, building on the Jenkins servers is a violation of ASF release 
policy and the PMC pretty much has a duty to vote -1 on any such release.  

http://www.apache.org/dev/release.html#owned-controlled-hardware

--snip--

Must releases be built on hardware owned and controlled by the committer?



Practically speaking, when a release consists of anything beyond an archive 
(e.g., tarball or zip file) of a source control tag, the only practical way to 
validate that archive is to build it locally; manually inspecting generated 
files (especially binary files) is not feasible. So, basically, "Yes".

--snip--

The ASF build servers are multi-user and run many many many untested code 
bases.  It would be extremely easy to inject class files into any running 
compile (Docker-ized or otherwise).   This means that any build that comes from 
those servers should be considered untrusted, especially from a release 
perspective.

For 3.x releases, I rewrote create-release and added the --asfrelease option to 
specifically provide a way for us to get consistent release candidates 
regardless of who builds them.  It should also speed up the release process 
since it automates a lot of the previously manual steps, such as signing.
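A minimal sketch of the invocation described above; the script's in-tree path 
is an assumption, and only the --asfrelease flag itself is confirmed in this 
thread:

---snip---
# run from a clean checkout of the branch being released
dev-support/bin/create-release --asfrelease
---snip---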



-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.6.5 (RC1)

2016-10-06 Thread Allen Wittenauer

> On Oct 5, 2016, at 10:35 PM, Akira Ajisaka  wrote:
> Can we rename it?
> 
> AFAIK, hadoop releases were built by hortonmu in 2014 and the name was later 
> changed to jenkins.

That's not how that works.

It's literally storing the id of the person who built the classes.  It 
wasn't 'renamed' to jenkins, prior releases were actually built by and on the 
Jenkins infrastructure. Which was a very very bad idea:  it's insecure and 
pretty much against ASF policy.
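For reference, the recorded builder id is what surfaces in 'hadoop version' 
output; the shape is roughly as below, with the values shown being purely 
illustrative rather than from any actual release:

---snip---
$ hadoop version
Hadoop 2.6.5
...
Compiled by jenkins on 2016-10-02T23:43Z
---snip---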
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2016-09-13 Thread Allen Wittenauer

> On Sep 13, 2016, at 7:31 AM, Apache Jenkins Server 
>  wrote:
> 
> For more details, see 
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/163/
> 
>   unit:
> 
> 
>   
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/163/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-nativetask.txt
>   [124K]

I've got a fix for this in MAPREDUCE-6743. It'd be great if someone 
could review it.

Thanks.
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[RESULTS] Re: [VOTE] Merge HADOOP-13341

2016-09-12 Thread Allen Wittenauer

The vote passes with 3 +1 binding votes.

I'll be merging this later today.

Thanks everyone!



> On Sep 7, 2016, at 6:44 AM, Allen Wittenauer  
> wrote:
> 
> 
>   I’d like to call for a vote to run for 5 days (ending Mon, Sep 12, 2016 at 
> 7AM PT) to merge the HADOOP-13341 feature branch into trunk. This branch was 
> developed exclusively by me.  As usual with large shell script changes, it's 
> been broken up into several smaller commits to make it easier to read.  The 
> core of the functionality is almost entirely in hadoop-functions.sh with the 
> majority of the rest of the new additions either being documentation or test 
> code. In addition, large swaths of code are removed from the hadoop, hdfs, 
> mapred, and yarn executables.
> 
>   Here's a quick summary:
> 
> * makes the rules around _OPTS consistent across all the projects
> * makes it possible to provide custom _OPTS for every hadoop, hdfs, mapred, 
> and yarn subcommand
> * with the exception of deprecations, removes all of the custom daemon _OPTS 
> handling sprinkled around the hadoop, hdfs, mapred, and yarn subcommands
> * removes the custom handling of HADOOP_CLIENT_OPTS and makes it 
> consistent for non-daemon subcommands
> * makes the _USER blocker consistent with _OPTS as well as providing better 
> documentation around this feature's existence.  Note that this is an 
> incompatible change against -alpha1.
> * by consolidating all of this code, makes it possible to finally fix a good 
> chunk of the "directory name containing spaces blows up the bash code" 
> problems that have been around since the beginning of the project
> 
>   Thanks!
> 
> 
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> 


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Merge HADOOP-13341

2016-09-09 Thread Allen Wittenauer

> On Sep 9, 2016, at 2:15 PM, Anu Engineer  wrote:
> 
> +1, Thanks for the effort. It brings in a world of consistency to the hadoop 
> vars; and as usual reading your bash code was very educative.

Thanks!

There's still a handful of HDFS and MAPRED vars that begin with HADOOP, 
but those should be trivial to knock out after a pattern has been established.

> I had a minor suggestion though. Since we have classified the _OPTS into 
> client and daemon opts, for new people it is hard to know which of these 
> subcommands are daemons vs. client commands.  Maybe we can add a special char 
> in the help message to indicate which are daemons, or just document it? The 
> only way I know right now is to look at the appropriate script and see if 
> HADOOP_SUBCMD_SUPPORTDAEMONIZATION is set to true.


That's a great suggestion.  Would it be better if the usage output was 
more like:

---snip---
Usage: hdfs [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]

  OPTIONS is none or any of:

--buildpaths   attempt to add class files from build tree
--config dir   Hadoop config directory
--daemon (start|status|stop)   operate on a daemon
--debugturn on shell script debug mode
--help usage information
--hostnames list[,of,host,names]   hosts to use in worker mode
--hosts filename   list of hosts to use in worker mode
--loglevel level   set the log4j level for this command
--workers  turn on worker mode

  SUBCOMMAND is one of:


Clients:
cacheadmin   configure the HDFS cache
classpath    prints the class path needed to get the hadoop jar 
and the required libraries
crypto   configure HDFS encryption zones
...

Daemons:
balancer     run a cluster balancing utility
datanode     run a DFS datanode
namenode     run the DFS name node
...
---snip---

We do something similar in Apache Yetus, and it shouldn't be too hard to 
do in Apache Hadoop. We couldn't read SUPPORTDAEMONIZATION to place things 
automatically, but as long as people put their new commands in the correct 
section in hadoop_usage, it should work.
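As a rough sketch, the registration side could look something like the 
following in shell; the helper names below follow the hadoop-functions.sh 
naming style but are assumptions, not the actual implementation:

---snip---
# hadoop_usage registers each subcommand with a section tag; the usage
# generator then groups the Clients and Daemons sections when printing
hadoop_usage()
{
  hadoop_add_subcommand "cacheadmin" client "configure the HDFS cache"
  hadoop_add_subcommand "namenode" daemon "run the DFS name node"
  hadoop_generate_usage "${HADOOP_SHELL_EXECNAME}" true
}
---snip---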


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Merge HADOOP-13341

2016-09-08 Thread Allen Wittenauer

> On Sep 8, 2016, at 2:50 AM, Steve Loughran  wrote:
> 
> I'm trying to do the review effort here even though I don't know detailed 
> bash, as I expect I don't know any less than others, and what better way to 
> learn than reviewing code written by people that do know bash? 

Just a heads up that I'm using bash variable references. While not 
exactly rare, they are uncommon.  [We already use them in lots of places in 
the shell code, so no new ground is being broken.]
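For anyone unfamiliar with the feature: ${!name} expands the variable whose 
name is stored in name. A minimal sketch of the _OPTS lookup pattern follows; 
the variable and input names here are illustrative, not the branch's exact 
code:

---snip---
program=hdfs subcmd=namenode                 # illustrative inputs
# build the variable name, e.g. HDFS_NAMENODE_OPTS
optsvar="$(tr '[:lower:]' '[:upper:]' <<< "${program}_${subcmd}")_OPTS"
if [[ -n "${!optsvar}" ]]; then              # ${!optsvar} is the variable reference
  HADOOP_OPTS="${HADOOP_OPTS} ${!optsvar}"
fi
---snip---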

> Could you submit a PR of that HADOOP-13341 branch, so I can review it there.

Sure.  https://github.com/apache/hadoop/pull/126 has been opened.

Thanks!
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[VOTE] Merge HADOOP-13341

2016-09-07 Thread Allen Wittenauer

I’d like to call for a vote to run for 5 days (ending Mon, Sep 12, 2016 at 
7AM PT) to merge the HADOOP-13341 feature branch into trunk. This branch was 
developed exclusively by me.  As usual with large shell script changes, it's 
been broken up into several smaller commits to make it easier to read.  The 
core of the functionality is almost entirely in hadoop-functions.sh with the 
majority of the rest of the new additions either being documentation or test 
code. In addition, large swaths of code are removed from the hadoop, hdfs, 
mapred, and yarn executables.

Here's a quick summary:

* makes the rules around _OPTS consistent across all the projects
* makes it possible to provide custom _OPTS for every hadoop, hdfs, mapred, and 
yarn subcommand
* with the exception of deprecations, removes all of the custom daemon _OPTS 
handling sprinkled around the hadoop, hdfs, mapred, and yarn subcommands
* removes the custom handling of HADOOP_CLIENT_OPTS and makes it 
consistent for non-daemon subcommands
* makes the _USER blocker consistent with _OPTS as well as providing better 
documentation around this feature's existence.  Note that this is an 
incompatible change against -alpha1.
* by consolidating all of this code, makes it possible to finally fix a good 
chunk of the "directory name containing spaces blows up the bash code" problems 
that have been around since the beginning of the project

Thanks!
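To make the first two bullets in the summary concrete, a sketch of what the 
consistent (command)_(subcommand)_OPTS convention lets a user put in 
hadoop-env.sh; the values are examples only:

---snip---
# JVM options for one daemon subcommand
export HDFS_NAMENODE_OPTS="-Xms4g -Xmx4g"
# JVM options for one non-daemon subcommand, e.g. 'mapred distcp'
export MAPRED_DISTCP_OPTS="-Xmx2g"
---snip---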


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

2016-09-01 Thread Allen Wittenauer

> On Sep 1, 2016, at 3:18 PM, Allen Wittenauer  
> wrote:
> 
> 
>> On Sep 1, 2016, at 2:57 PM, Andrew Wang  wrote:
>> 
>> Steve requested a git hash for this release. This led us into a brief
>> discussion of our use of git tags, wherein we realized that although
>> release tags are immutable (start with "rel/"), RC tags are not. This is
>> based on the HowToRelease instructions.
> 
>   We should probably embed the git hash in one of the files that gets gpg 
> signed.  That's an easy change to create-release.


(Well, one more easily accessible than 'hadoop version')
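A sketch of how that step could look inside create-release; the file name and 
environment variables are assumptions:

---snip---
# record the exact commit the RC was built from, then sign the record
git rev-parse HEAD > "hadoop-${HADOOP_VERSION}-${RC_LABEL}.githash"
gpg --armor --detach-sign "hadoop-${HADOOP_VERSION}-${RC_LABEL}.githash"
---snip---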
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

2016-09-01 Thread Allen Wittenauer

> On Sep 1, 2016, at 2:57 PM, Andrew Wang  wrote:
> 
> Steve requested a git hash for this release. This led us into a brief
> discussion of our use of git tags, wherein we realized that although
> release tags are immutable (start with "rel/"), RC tags are not. This is
> based on the HowToRelease instructions.

We should probably embed the git hash in one of the files that gets gpg 
signed.  That's an easy change to create-release.




-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[DISCUSS] HADOOP-13341 Merge Request

2016-08-31 Thread Allen Wittenauer

Before requesting a merge vote, I'd like for folks to take a look at 
HADOOP-13341.  This branch changes how the vast majority of the _OPTS variables 
work in various ways, making things easier for devs and users by helping to 
make the rules consistent.  It also clarifies/cleans up how the _USER variables 
work.  It is probably worthwhile pointing out that this work is also required if we 
ever want to make spaces in file paths work properly (see HADOOP-13365, where 
I'm attempting to fix that too... ugh.).

Also, most of the patch is test code, comments, and documentation... so while 
it's a large patch, there's not much actual code. :)

Thanks.
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

2016-08-30 Thread Allen Wittenauer

> On Aug 30, 2016, at 2:20 PM, Eric Badger  wrote:
> 
> Well that's embarrassing. I had accidentally slightly renamed my 
> log4j.properties file in my conf directory, so it was there, just not being 
> read.

Nah.  You were just testing out the shell rewrite's ability to detect a 
common error. ;) 

BTW, something else.. instead of doing env|grep HADOOP, you can do 
'hadoop envvars' to get most of the good stuff.
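The two approaches side by side; 'hadoop envvars' exists in the rewritten 
shell code, though the exact set of variables it prints is version-dependent:

---snip---
$ env | grep HADOOP      # everything HADOOP_* currently in the environment
$ hadoop envvars         # the computed values the scripts actually use
---snip---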
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

2016-08-30 Thread Allen Wittenauer

> On Aug 30, 2016, at 2:06 PM, Eric Badger  
> wrote:
> 
> 
> WARNING: log4j.properties is not found. HADOOP_CONF_DIR may be incomplete.

^^


> 
> After running the above command, the RM UI showed a successful job, but as 
> you can see, I did not have anything printed onto the command line. Hopefully 
> this is just a misconfiguration on my part, but I figured that I would point 
> it out just in case.


It gave you a very important message in the output ...


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

2016-08-30 Thread Allen Wittenauer

> On Aug 30, 2016, at 10:17 AM, Zhe Zhang  wrote:
> 
> Thanks Andrew for the great work! It's really exciting to finally see a
> Hadoop 3 RC.
> 
> I noticed CHANGES and RELEASENOTES markdown files which were not in
> previous RCs like 2.7.3. What are good tools to verify them? I tried
> reading them on IntelliJ but format looks odd.


The site tarball has them converted to HTML.  I've also re-run the 
versions that I keep on my gitlab account.  (Since the data comes from JIRA, 
the content should be the same but the format and ordering might be different 
since I use the master branch of Yetus.) 
https://gitlab.com/_a__w_/eco-release-metadata/tree/master/HADOOP/3.0.0-alpha1

It also looks like IntelliJ has a few different markdown plug-ins.  
You'll want one that supports what is generally referred to as MultiMarkdown or 
Github-Flavored Markdown (GFM) since releasedocmaker uses the table extension 
format found in that specification.  (It's an extremely common extension so I'm 
sure one of them supports it.)
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2016-08-26 Thread Allen Wittenauer

> On Aug 26, 2016, at 7:55 AM, Apache Jenkins Server 
>  wrote:
> 
> 
>Failed CTEST tests :
> 
>   test_test_libhdfs_threaded_hdfs_static 
>   test_test_libhdfs_zerocopy_hdfs_static 


Something here likely broke these tests:

[Aug 24, 2016 7:47:52 AM] (aajisaka) HADOOP-13538. Deprecate getInstance and 
initialize methods with Path in
[Aug 24, 2016 1:46:47 PM] (daryn) HDFS-10762. Pass IIP for file status related 
methods
[Aug 24, 2016 1:57:23 PM] (kai.zheng) HDFS-8905. Refactor 
DFSInputStream#ReaderStrategy. Contributed by Kai
[Aug 24, 2016 2:17:05 PM] (kai.zheng) MAPREDUCE-6578. Add support for HDFS 
heterogeneous storage testing to
[Aug 24, 2016 2:40:51 PM] (jlowe) MAPREDUCE-6761. Regression when handling 
providers - invalid
[Aug 24, 2016 5:14:46 PM] (xiao) HADOOP-13396. Allow pluggable audit loggers in 
KMS. Contributed by Xiao
[Aug 24, 2016 8:21:08 PM] (kihwal) HDFS-10772. Reduce byte/string conversions 
for get listing. Contributed
[Aug 25, 2016 1:55:00 AM] (aajisaka) MAPREDUCE-6767. TestSlive fails after a 
common change. Contributed by
[Aug 25, 2016 4:54:57 AM] (aajisaka) HADOOP-13534. Remove unused 
TrashPolicy#getInstance and initialize code.




-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.7.3 RC1

2016-08-17 Thread Allen Wittenauer

Touching the audit log is *extremely* dangerous from a compatibility 
perspective.  It is easily the most machine-processed log in Hadoop (with the 
second likely being the fsck log).  In particular, this comment tells me that 
we are almost certainly going to break users:

"Some audit logs ( for non-ACE failures ) will go missing. So this 
change needs to be marked as Incompatible, for heads-up."

If that means what I think it means (the ordering of checks is going to 
make previously logged errors disappear in lieu of other, new messages showing 
up first), that is going to cause massive problems for users who are looking 
for a particular entry. Worse, while the JIRA was marked incompatible, there 
are absolutely zero hints to end users (changes file, release notes) that this 
could potentially break their universe without digging into the comments of 
said JIRA.  That's not a heads up, that's a landmine.

It's also arguable that this is actually a bug fix.  A lot of the 
assumptions made in that JIRA about the audit log's original intent are 
completely wrong. Better yet, a lot of the justification is around another 
unmarked, incompatible change that was introduced in the 2.x timeline.

Even if one disagrees and still views this as a bug fix:  it's still an 
incompatible change.  Users are justifiably angry when we don't warn them about 
breakages and this is a great example of that.  

> On Aug 17, 2016, at 6:15 AM, Junping Du  wrote:
> 
> From my quick understanding, HDFS-9395 is more like a bug fix and improvement 
> for audit logging rather than an incompatible change. We marked it incompatible 
> probably because the audit log behavior could be corrected/updated in some 
> exception cases. I think it still belongs in the 2.7.3 scope. 
> Kuhu and Kihwal, any comments here?
> 
> 
> Thanks,
> 
> Junping 
> 
> From: Allen Wittenauer 
> Sent: Wednesday, August 17, 2016 5:29 AM
> To: common-...@hadoop.apache.org
> Cc: hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org; 
> mapreduce-dev@hadoop.apache.org
> Subject: Re: [VOTE] Release Apache Hadoop 2.7.3 RC1
> 
> -1
> 
> HDFS-9395 is an incompatible change:
> 
> a) Why is it not marked as such in the changes file?
> b) Why is there an incompatible change in a micro release, much less a minor?
> c) Where is the release note for this change?
> 
> 
>> On Aug 12, 2016, at 9:45 AM, Vinod Kumar Vavilapalli  
>> wrote:
>> 
>> Hi all,
>> 
>> I've created a release candidate RC1 for Apache Hadoop 2.7.3.
>> 
>> As discussed before, this is the next maintenance release to follow up 2.7.2.
>> 
>> The RC is available for validation at: 
>> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC1/
>> 
>> The RC tag in git is: release-2.7.3-RC1
>> 
>> The maven artifacts are available via repository.apache.org at 
>> https://repository.apache.org/content/repositories/orgapachehadoop-1045/
>> 
>> The release-notes are inside the tar-balls at location 
>> hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I 
>> hosted this at home.apache.org/~vinodkv/hadoop-2.7.3-RC1/releasenotes.html 
>> for your quick perusal.
>> 
>> As you may have noted,
>> - few issues with RC0 forced a RC1 [1]
>> - a very long fix-cycle for the License & Notice issues (HADOOP-12893) 
>> caused 2.7.3 (along with every other Hadoop release) to slip by quite a bit. 
>> This release's related discussion thread is linked below: [2].
>> 
>> Please try the release and vote; the vote will run for the usual 5 days.
>> 
>> Thanks,
>> Vinod
>> 
>> [1] [VOTE] Release Apache Hadoop 2.7.3 RC0: 
>> https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/index.html#26106
>> [2]: 2.7.3 release plan: 
>> https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html 
>> (also http://markmail.org/thread/6yv2fyrs4jlepmmr)
> 
> 
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> 


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.7.3 RC1

2016-08-16 Thread Allen Wittenauer


-1

HDFS-9395 is an incompatible change:

a) Why is it not marked as such in the changes file?
b) Why is there an incompatible change in a micro release, much less a minor?
c) Where is the release note for this change?


> On Aug 12, 2016, at 9:45 AM, Vinod Kumar Vavilapalli  
> wrote:
> 
> Hi all,
> 
> I've created a release candidate RC1 for Apache Hadoop 2.7.3.
> 
> As discussed before, this is the next maintenance release to follow up 2.7.2.
> 
> The RC is available for validation at: 
> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC1/
> 
> The RC tag in git is: release-2.7.3-RC1
> 
> The maven artifacts are available via repository.apache.org at 
> https://repository.apache.org/content/repositories/orgapachehadoop-1045/
> 
> The release-notes are inside the tar-balls at location 
> hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I hosted 
> this at home.apache.org/~vinodkv/hadoop-2.7.3-RC1/releasenotes.html for 
> your quick perusal.
> 
> As you may have noted,
> - few issues with RC0 forced a RC1 [1]
> - a very long fix-cycle for the License & Notice issues (HADOOP-12893) caused 
> 2.7.3 (along with every other Hadoop release) to slip by quite a bit. This 
> release's related discussion thread is linked below: [2].
> 
> Please try the release and vote; the vote will run for the usual 5 days.
> 
> Thanks,
> Vinod
> 
> [1] [VOTE] Release Apache Hadoop 2.7.3 RC0: 
> https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/index.html#26106
> [2]: 2.7.3 release plan: 
> https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [Release thread] 2.6.5 release activities

2016-08-15 Thread Allen Wittenauer

> On Aug 12, 2016, at 8:19 AM, Junping Du  wrote:
> 
>  In this community, we are so aggressive about dropping Java 7 support in the 
> 3.0.x release. So why are we so conservative about continuing to release new 
> bits that support Java 6?

I don't view a group of people putting bug fixes into a micro release 
as particularly conservative.  If a group within the community wasn't 
interested in doing it, 2.6.5 wouldn't be happening.

But let's put the releases into context, because I think it tells a 
more interesting story.

* hadoop 2.6.x = EOLed JREs (6,7) 
* hadoop 2.7 -> hadoop 2.x = transitional (7,8)
* hadoop 3.x = JRE 8
* hadoop 4.x = JRE 9 

There are groups of people still using JDK6 and they want bug fixes in 
a maintenance release.  Boom, there's 2.6.x.

Hadoop 3.x has been pushed off for years for "reasons".  So we still 
have releases coming off of branch-2.  If 2.7 had been released as 3.x, this 
chart would look less weird. But it wasn't, thus 2.x has this weird wart in the 
middle that supports both JDK7 and JDK8. Given the public policy and roadmaps of 
at least one major vendor at the time of this writing, we should expect to see 
JDK7 support for at least the next two years after 3.x appears. Bang, there's 
2.x, where x is some large number.

Then there is the future.  People using JRE 8 want to use newer 
dependencies.  A reasonable request. Some of these dependency updates won't 
work with JRE 7.   We can't do that in hadoop 2.x in any sort of compatible way 
without breaking the universe. (Tons of JIRAs on this point.) This means we can 
only do it in 3.x (re: Hadoop Compatibility Guidelines).  Kapow, there's 3.x.

The log4j community has stated that v1 won't work with JDK9. In turn, 
this means we'll need to upgrade to v2 at some point.  Upgrading to v2 will 
break the log4j properties file (and maybe other things?). Another incompatible 
change and it likely won't appear until Apache Hadoop v4 unless someone takes 
the initiative to fix it before v3 hits store shelves.  This makes JDK9 the 
likely target for Apache Hadoop v4.  

Having major release cadences tied to JRE updates isn't necessarily a 
bad thing: it a) forces the community to actually stop beating around the bush 
on majors and b) makes it relatively easy to determine what the schedule looks 
like, at least to some degree.





-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [Release thread] 2.6.5 release activities

2016-08-11 Thread Allen Wittenauer

> On Aug 11, 2016, at 8:10 AM, Junping Du  wrote:
> 
> Allen, to be clear, I am not against any branch release effort here. However,

"I'm not an X but "

> as RM for the previous releases 2.6.3 and 2.6.4, I feel I have a 
> responsibility to take care of branch-2.6 together with the other RMs (Vinod 
> and Sangjin) on this branch and understand the current gap - especially, to 
> get consensus from the community on the future plan for 2.6.x.
> Our bylaws give anyone the freedom to do release work, but they do not take 
> away our right to raise reasonable questions/concerns about any release plan. 
> As you mentioned below, people can potentially fire up a branch-1 release 
> effort. But if you announced a release plan tomorrow for branch-1, I cannot 
> imagine nobody questioning that effort. Can you? 

From previous discussions I've seen around releases, I think it 
would depend upon which employee from which vendor raised the question.

> Let's keep the discussion on releasing 2.6.5 more technical. IMO, to make the 
> 2.6.5 release more reasonable, shouldn't we check the following questions 
> first?
> 1. Do we have any significant issues that should land in 2.6.5 compared with 
> 2.6.4?
> 2. If so, are there any technical reasons (like: the upgrade does not go 
> smoothly, performance degradation, incompatibility with downstream projects, 
> etc.) that stop our users from moving from 2.6.4 to 2.7.2/2.7.3?
> I believe having good answers to these questions can make our release plan 
> more reasonable to the whole community. More thoughts?

I think these questions are moot though:

* Hadoop 2.6 is the last release to support JDK6.   That sort of ends any 
questions around moving to 2.7. 

* There are always bugs in software that can benefit from getting fixes.  Given 
the JDK6 issue, yes, of course there are reasons why someone may want a 2.6.5.

* If a company/vendor is willing to fund people to work on a release, I'd much 
rather they do that work in the ASF than off on their own somewhere.  This way 
the community as a whole benefits.



-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [Release thread] 2.6.5 release activities

2016-08-11 Thread Allen Wittenauer

> On Aug 11, 2016, at 5:59 AM, Junping Du  wrote:
> 
>  These comments are more like wishes and do not give much clarification 
> of the needs. I would like to hear more specific reasons for not moving to 
> the 2.7.x releases but preferring to upgrade to 2.6.5. If the only reason is 
> expectation management, I think we should declare 2.6.5 the last branch-2.6 
> release after this release work; otherwise people would expect us to maintain 
> this branch forever, which is impossible and unnecessary. Thoughts?

The bylaws[*] are such that if community members want to spend their 
time working on a branch, there isn't much to prevent that other than the PMC 
voting down the release of that branch or removing the committers working on 
that branch.  As has been pointed out to me many times, one can't dictate where 
others spend their volunteer time.  If they want to spend their efforts on 
branch-2.6, they can.  If that comes at the detriment of releases around 
branch-2.7 or branch-2.8 or even trunk, then so be it. Technically, someone 
could still fire up a branch-1 release.  Given the numbers of committers and 
PMC members as listed on the main ASF website (not the list on project one), we 
should have more than enough people to do all this work anyway.

* - of course, there are a few bylaws that aren't really enforced, so maybe even 
this isn't true?
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [DISCUSS] The order of classpath isolation work and updating/shading dependencies on trunk

2016-07-25 Thread Allen Wittenauer

> On Jul 25, 2016, at 1:16 PM, Sangjin Lee  wrote:
> 
> Also:  right now, the non-Linux and/or non-x86 platforms have to supply their 
> own leveldbjni jar (or at least the C level library?) in order to make YARN 
> even functional.  How is that going to work with the class path manipulation?
> 
> First, the native libraries are orthogonal to this. They're not governed by 
> the java classpath.
> 
> For those platforms where users/admins need to provide their own LevelDB 
> libraries, the only requirement would be to add them to the 
> share/hadoop/.../lib directory. I don't think we would ask end users of the 
> clusters to bring in their own LevelDB library as it would not be an end-user 
> concern. I assume the administrators of clusters (still users but not end 
> users) would add it to the clusters. The classpath isolation doesn't really 
> have an impact on that.
> 

$ jar tf leveldbjni-all-1.8.jar | grep native
META-INF/native/
META-INF/native/linux32/
META-INF/native/linux32/libleveldbjni.so
META-INF/native/linux64/
META-INF/native/linux64/libleveldbjni.so
META-INF/native/osx/
META-INF/native/osx/libleveldbjni.jnilib
META-INF/native/windows32/
META-INF/native/windows32/leveldbjni.dll
META-INF/native/windows64/
META-INF/native/windows64/leveldbjni.dll
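A sketch of how an admin on a non-bundled platform might confirm the gap 
before supplying their own build; the lib directory shown is an assumption:

---snip---
# list the architectures bundled in the jar, then compare with this host
jar tf leveldbjni-all-1.8.jar | grep '^META-INF/native/'
uname -sm
# if this platform is absent, a locally built leveldbjni jar needs to be
# dropped into the YARN lib dir, e.g. share/hadoop/yarn/lib/
---snip---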



-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [DISCUSS] The order of classpath isolation work and updating/shading dependencies on trunk

2016-07-22 Thread Allen Wittenauer

> On Jul 22, 2016, at 5:47 PM, Zheng, Kai  wrote:
> 
> For the leveldb thing, wouldn't we have an alternative option in Java for the 
> platforms where leveldb isn't supported yet, for whatever reason? IMO, the 
> native library is best used for optimization and production performance. For 
> development and pure-Java platforms, a pure-Java approach should still be 
> provided and used by default. That is to say, if no Hadoop native code is 
> used, all the functionality should still work and not break. 

Yes and no.  I can certainly understand some high-end features being 
tied to native libraries, simply because system programming with Java is like 
being a touch typist with your nose.  

That said, absolutely key functionality should definitely work. Take a 
look at the last Linux/ppc64le report that was emailed to these very lists a 
few days ago [1]:


https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/30/artifact/out/console-report.html

Almost all of those YARN failures are due to MiniYARN trying to 
initiate leveldb as part of the service startup but can't because the embedded 
shared library is the wrong hardware architecture. Rather than catch the 
exception and do something else, the code just blows up in a very dramatic 
fashion. That should translate into YARN is completely busted and unusable 
without doing some very weird workarounds.

To get us back on topic:  the class path isolation work absolutely 
cannot make this situation worse.  We either need to make sure end users can 
replace/modify Hadoop's dependencies if they require native libraries or work 
harder on making multiplatform stuff better supported.  The nightly PowerPC 
builds should help tremendously towards this goal. [2]

1 - While I greatly appreciate the OpenPOWER Foundation getting the ASF access 
to these boxes -- Mesos and Hadoop are both actively using them -- It'd be 
great if they were more reliable so we could get a report every day of the 
week. :(

2 - At some point, I'll set up a manually triggered precommit job to test 
patches.  But until both boxes are online and available on a consistent basis, 
it just isn't worth the effort.
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: Setting JIRA fix versions for 3.0.0 releases

2016-07-22 Thread Allen Wittenauer

> On Jul 22, 2016, at 7:16 PM, Andrew Wang  wrote:
> 
> Does this mean you find our current system of listing a JIRA as being fixed 
> in both a 2.6.x and 2.7.x to be confusing?

Nope.  I'm only confused when there isn't a .0 release in the fix line. 
 When I see 2.6.x and 2.7.x I know that it was back ported to those branches.  
If I don't see a .0, I figure it's either a mistake or something that was 
already fixed by another change in that major/minor branch.  It's almost always 
the former, however.

> FWIW, my usecase is normally not "what is the earliest release that has this 
> fix?" but rather "is this fix in this release?". If it's easy to query the 
> latter, you can also determine the former. Some kind of query tool could help 
> here.

It literally becomes a grep if people commit the release data into the 
source tree, the release data is correct, etc:

$ mvn install site  -Preleasedocs -Pdocs -DskipTests
$ grep issueid 
hadoop-common-project/hadoop-common/src/site/markdown/release/*/CHANGES*

We should probably update the release process to make sure that *in 
progress* release data is also committed when a .0 is cut.  That's likely 
missing. Another choice would be to modify the pom to that runs releasedocmaker 
to use a range rather than single version, but that gets a bit tricky with 
release dates, how big of a range, etc.  Not impossible, just tricky.  Probably 
needs to be script that gets run as part of create-release, maybe?

(In reality, I do this grep against my git repo that generates the 
change log data automatically.  This way it is always up-to-date and not 
dependent upon release data being committed.  But that same grep could be done 
with a JQL query just as easily.)
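For instance, a JQL query of the hypothetical form below answers the same 
question from the JIRA side; the issue key and version are placeholders:

---snip---
project = HADOOP AND key = HADOOP-12345 AND fixVersion = "2.8.0"
---snip---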

> For the release notes, am I correct in interpreting this as:
> 
> * diff a.0.0 from the previous x.y.0 release
> * diff a.b.0  from the previous a.0.0 or a.b.0 release
> * diff a.b.c from the previous a.b.0 or a.b.c release

Pretty much yes.

> Ray pointed me at the changelogs of a few other enterprise software products, 
> and this strategy seems pretty common. I like it.

It's extremely common, to the point that listing every fix against every 
release it touched is, at least to me, weird and extremely unconventional.

> I realize now that this means a lot more JIRAs will need the 2.8.0 fix 
> version, since they only have 2.6.x and 2.7.x.

Yup.

>   This makes the fix rules actually pretty easy:  the lowest a.b.0 release 
> and all non-.0 releases.
> 
> I think this needs to be amended to handle the case of multiple major release 
> branches, since we could have something committed for both 2.9.0 and 3.1.0. 
> So "lowest a.b.0 release within each major version"?

Yeah, switching to effectively trunk-based development makes the rules 
harder.  It's one of the reasons why the two big enterprisey companies I worked 
at prior to working on Hadoop didn't really do trunk-based for the vast 
majority of projects.  They always cut a branch (or equivalent for that SCM) to 
delineate a break.   Given the amount of ex-Sun folks involved in the early 
days of Hadoop, our pre-existing development processes very much reflect that 
culture.

> This was true previously (no releases from trunk, trunk is versioned a.0.0), 
> but now that trunk is essentially a minor release branch, its fix version 
> needs to be treated as such.

Yeah, I misspoke a bit when dealing with a head-of-tree model.  
3.0.0-alpha1 will generate different notes than 3.0.0-alpha2, obviously. Every 
3.0.0-(label) release is effectively a major version in that case.



-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [DISCUSS] The order of classpath isolation work and updating/shading dependencies on trunk

2016-07-22 Thread Allen Wittenauer

But if I don't use ApplicationClassLoader, my java app is basically screwed 
then, right?

Also:  right now, the non-Linux and/or non-x86 platforms have to supply their 
own leveldbjni jar (or at least the C level library?) in order to make YARN 
even functional.  How is that going to work with the class path manipulation?


> On Jul 22, 2016, at 9:57 AM, Sangjin Lee  wrote:
> 
> The work on HADOOP-13070 and the ApplicationClassLoader are generic and go 
> beyond YARN. It can be used in any JVM that uses hadoop. The current use 
> cases are MR containers, hadoop's RunJar (as in "hadoop jar"), and the YARN 
> node manager auxiliary services. I'm not sure if that's what you were asking, 
> but I hope it helps.
> 
> Regards,
> Sangjin
> 
> On Fri, Jul 22, 2016 at 9:16 AM, Sean Busbey  wrote:
> My work on HADOOP-11804 *only* helps processes that sit outside of YARN. :)
> 
> On Fri, Jul 22, 2016 at 10:48 AM, Allen Wittenauer
>  wrote:
> >
> > Does any of this work actually help processes that sit outside of YARN?
> >
> >> On Jul 21, 2016, at 12:29 PM, Sean Busbey  wrote:
> >>
> >> thanks for bringing this up! big +1 on upgrading dependencies for 3.0.
> >>
> >> I have an updated patch for HADOOP-11804 ready to post this week. I've
> >> been updating HBase's master branch to try to make use of it, but
> >> could use some other reviews.
> >>
> >> On Thu, Jul 21, 2016 at 4:30 AM, Tsuyoshi Ozawa  wrote:
> >>> Hi developers,
> >>>
> >>> I'd like to discuss how to make progress on dependency
> >>> management in the Apache Hadoop trunk code, since there has been lots of
> >>> work on updating dependencies in parallel. Summarizing the recent work and
> >>> activities as follows:
> >>>
> >>> 0) Currently, we have merged minimum update dependencies for making
> >>> Hadoop JDK-8 compatible(compilable and runnable on JDK-8).
> >>> 1) After that, some people suggest that we should update the other
> >>> dependencies on trunk (e.g. protobuf, netty, jackson, etc.).
> >>> 2) In parallel, Sangjin and Sean are working on classpath isolation:
> >>> HADOOP-13070, HADOOP-11804 and HADOOP-11656.
> >>>
> >>> The main problems we try to solve in the activities above are as follows:
> >>>
> >>> * 1) tries to solve dependency hell between user-level jar and
> >>> system(Hadoop)-level jar.
> >>> * 2) tries to solve updating old libraries.
> >>>
> >>> IIUC, 1) and 2) look unrelated, but they are in fact related. 2) tries
> >>> to separate the class loaders for client-side dependencies and
> >>> server-side dependencies in Hadoop, so we can change the policy for
> >>> updating libraries after doing 2). We can also decide which libraries
> >>> can be shaded after 2).
> >>>
> >>> Hence, IMHO, a straightforward way to go is doing 2) first.
> >>> After that, we can update both client-side and server-side
> >>> dependencies based on new policy(maybe we should discuss what kind of
> >>> incompatibility is acceptable, and the others are not).
> >>>
> >>> Thoughts?
> >>>
> >>> Thanks,
> >>> - Tsuyoshi
> >>>
> >>> -
> >>> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> >>> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
> >>>
> >>
> >>
> >>
> >> --
> >> busbey
> >>
> >> -
> >> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> >> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >>
> >
> >
> > -
> > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
> >
> 
> 
> 
> --
> busbey
> 
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> 
> 


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: Setting JIRA fix versions for 3.0.0 releases

2016-07-22 Thread Allen Wittenauer

From the perspective of an end user who is reading multiple versions' 
listings at once, listing the same JIRA being fixed in multiple releases is 
totally confusing, especially now that release notes are actually readable.  
"So which version was it ACTUALLY fixed in?" is going to be the question. It'd 
be worthwhile for folks to actually build, say, trunk and look at the release 
notes section of the site build to see how these things are presented in 
aggregate before coming to any conclusions.  Just viewing a single version's 
output will likely give a skewed perspective.  (Or, I suppose you can read 
https://gitlab.com/_a__w_/eco-release-metadata/tree/master/HADOOP too, but the 
sort order is "wrong" for web viewing.)

My read of the HowToCommit fix rules is that they were written from the 
perspective of how we typically use branches to cut releases. In other words, 
the changes and release notes for 2.6.x, where x>0, 2.7.y, where y>0, will 
likely not be fully present/complete in 2.8.0 so wouldn't actually reflect the 
entirety of, say, the 2.7.4 release if 2.7.4 and 2.8.0 are being worked in 
parallel.   This in turn means the changes and release notes become orthogonal 
once the minor release branch is cut. This is also important because there is 
no guarantee that a change made in, say, 2.7.4 is actually in 2.8.0 because the 
code may have changed to the point that the fix isn't needed or wanted.

From an automation perspective, I took the perspective that this means 
that the a.b.0 release notes are expected to be committed to all non-released 
major branches.  So trunk will have release notes for 2.7.0, 2.8.0, 2.9.0, etc 
but not from 2.7.1, 2.8.1, or 2.9.1.  This makes the fix rules actually pretty 
easy:  the lowest a.b.0 release and all non-.0 releases.  trunk, as always, is 
only listed if that is the only place where it was committed. (i.e., the lowest 
a.b.0 release happens to be the highest one available.)
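To make that concrete with a hypothetical issue: a fix committed to branch-2.6, 
branch-2.7, branch-2.8, and trunk while 2.6.5 and 2.7.4 are the in-progress 
maintenance releases would carry fix versions 2.6.5, 2.7.4, and 2.8.0 (the 
non-.0 maintenance releases plus the lowest unreleased a.b.0), with no trunk 
version listed since the change also landed on lower branches.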

I suspect people are feeling confused or think the rules need to be 
changed mainly because a) we have a lot more branches getting RE work than ever 
before in Hadoop's history and b) 2.8.0 has been hanging out in an unreleased 
branch for ~7 months.  [The PMC should probably vote to kill that branch and 
just cut a new 2.8.0 based off of the current top of branch-2. I think that'd 
go a long way to clearing the confusion as well as actually making 2.8.0 
relevant again for those that still want to work on branch-2.]

Also:

> Assuming the semantic versioning (http://semver.org) as
> our baseline thinking, 

We don't use semantic versioning and you'll find zero references to it 
in any Apache Hadoop documentation.  If we were following semver, even in the 
loosest sense, 2.7.0 should have been 3.0.0 with the JRE upgrade requirement. 
(which, ironically, is still causing issues with folks moving things between 
2.6 and 2.7+, see the other thread about the Dockerfile.) In a stricter sense, 
we should be on v11 or something, given the amount of incompatible changes 
throughout branch-2's history.


> On Jul 22, 2016, at 11:44 AM, Andrew Wang  wrote:
> 
>> 
>> 
>>> I am also not quite sure I understand the rationale of what's in the
>> HowToCommit wiki. Assuming the semantic versioning (http://semver.org) as
>> our baseline thinking, having concurrent release streams alone breaks the
>> principle. And that is *regardless of* how we line up individual releases
>> in time (2.6.4 v. 2.7.3). Semantic versioning means 2.6.z < 2.7.* where *
>> is any number. Therefore, the moment we have any new 2.6.z release after
>> 2.7.0, the rule is broken and remains that way. Timing of subsequent
>> releases is somewhat irrelevant.
>> 
>> From a practical standpoint, I would love to know whether a certain patch
>> has been backported to a specific version. Thus, I would love to see fix
>> version enumerating all the releases that the JIRA went into. Basically the
>> more disclosure, the better. That would also make it easier for us
>> committers to see the state of the porting and identify issues like being
>> ported to 2.6.x but not to 2.7.x. What do you think? Should we revise our
>> policy?
>> 
>> 
> I also err towards more fix versions. Based on our branching strategy of
> branch-x -> branch-x.y -> branch->x.y.z, I think this means that the
> changelog will identify everything since the previous
> last-version-component of the branch name. So 2.6.5 diffs against 2.6.4,
> 2.8.0 diffs against 2.7.0, 3.0.0 against 2.0.0. This makes it more
> straightforward for users to determine what changelogs are important, based
> purely on the version number.
> 
> I agree with Sangjin that the #1 question that the changelogs should
> address is whether a certain patch is present in a version. For this
> usecase, it's better to have duplicate info than to omit something.
> 
> To answer "what's new", I think that's answered by the manually curated
> release notes, like 

Re: [DISCUSS] The order of classpath isolation work and updating/shading dependencies on trunk

2016-07-22 Thread Allen Wittenauer

Does any of this work actually help processes that sit outside of YARN?

> On Jul 21, 2016, at 12:29 PM, Sean Busbey  wrote:
> 
> thanks for bringing this up! big +1 on upgrading dependencies for 3.0.
> 
> I have an updated patch for HADOOP-11804 ready to post this week. I've
> been updating HBase's master branch to try to make use of it, but
> could use some other reviews.
> 
> On Thu, Jul 21, 2016 at 4:30 AM, Tsuyoshi Ozawa  wrote:
>> Hi developers,
>> 
>> I'd like to discuss how to make progress on dependency
>> management in the Apache Hadoop trunk code, since there has been lots of
>> work on updating dependencies in parallel. Summarizing the recent work and
>> activities as follows:
>> 
>> 0) Currently, we have merged minimum update dependencies for making
>> Hadoop JDK-8 compatible(compilable and runnable on JDK-8).
>> 1) After that, some people suggest that we should update the other
>> dependencies on trunk (e.g. protobuf, netty, jackson, etc.).
>> 2) In parallel, Sangjin and Sean are working on classpath isolation:
>> HADOOP-13070, HADOOP-11804 and HADOOP-11656.
>> 
>> The main problems we try to solve in the activities above are as follows:
>> 
>> * 1) tries to solve dependency hell between user-level jar and
>> system(Hadoop)-level jar.
>> * 2) tries to solve updating old libraries.
>> 
>> IIUC, 1) and 2) look unrelated, but they are in fact related. 2) tries
>> to separate the class loaders for client-side dependencies and
>> server-side dependencies in Hadoop, so we can change the policy for
>> updating libraries after doing 2). We can also decide which libraries
>> can be shaded after 2).
>> 
>> Hence, IMHO, a straightforward way to go is doing 2) first.
>> After that, we can update both client-side and server-side
>> dependencies based on new policy(maybe we should discuss what kind of
>> incompatibility is acceptable, and the others are not).
>> 
>> Thoughts?
>> 
>> Thanks,
>> - Tsuyoshi
>> 
>> -
>> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>> 
> 
> 
> 
> -- 
> busbey
> 
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> 


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6743) nativetask unit tests need to provide usable output

2016-07-22 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created MAPREDUCE-6743:
---

 Summary: nativetask unit tests need to provide usable output
 Key: MAPREDUCE-6743
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6743
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: nativetask
Affects Versions: 3.0.0-alpha1
Reporter: Allen Wittenauer


Currently, hadoop-mapreduce-client-nativetask creates an nttest binary which 
provides only an exit code to determine failure.  This means there is no 
output generated by the Jenkins run to actually debug or provide hints as to 
what failed.  Given that nttest is written with gtest, it should be configured 
to spit out either JUnit or TAP output, which can then be used to provide 
further analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-6742) Test

2016-07-22 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved MAPREDUCE-6742.
-
Resolution: Not A Problem

> Test
> 
>
> Key: MAPREDUCE-6742
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6742
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Prashanth G B
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [DICUSS] Upgrading Guice to 4.0(HADOOP-12064)

2016-06-29 Thread Allen Wittenauer

> On Jun 29, 2016, at 10:16 AM, Tsuyoshi Ozawa  wrote:
> 
> No objections here?


I talked to a handful of non-committer ISV-type folks yesterday and the general 
consensus was that there is an expectation that 3.x will have all the 
dependencies updated to something relatively modern if we can't either remove 
them from the client path[*] or shade them.  So I think it's a safe bet that 
there basically won't be any major complaints about upgrades.


* - We should be able to use the same trick that is happening in hadoop-tools 
to do this (effectively lazy loading). I'm not sure it's worth the 
time/effort/politics to make that sort of a change.
-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2016-06-18 Thread Allen Wittenauer



> On Jun 17, 2016, at 7:04 AM, Apache Jenkins Server 
>  wrote:
> 
> For more details, see 
> https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/66/
> 

I suspect this change:

> [Jun 16, 2016 11:17:06 AM] (vinayakumarb) HDFS-10256. Use 
> GenericTestUtils.getTestDir method in tests for

broke these tests:

> Specific tests:
> 
>Failed CTEST tests :
> 
>   test_test_libhdfs_threaded_hdfs_static 
>   test_test_libhdfs_zerocopy_hdfs_static 
>   test_test_native_mini_dfs 



-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



ASF OS X Build Infrastructure

2016-05-19 Thread Allen Wittenauer

Some of you may not know that the ASF actually does have an OS X 
machine (a Mac mini, so it’s not a speed demon) in the build infrastructure.  
While messing around with getting all? of the trunk jobs reconfigured to do 
Java 8 and separate maven repos, I noticed that this box tends to sit idle most 
of the day. Why take advantage of it?  Therefore, I also setup two jobs for us 
to use to help alleviate the “I don’t have access to anything but Linux” excuse 
when writing code that may not work in a portable manner.

Jobs #1:

https://builds.apache.org/view/H-L/view/Hadoop/job/Precommit-HADOOP-OSX

This basically runs Apache Yetus precommit with quite a few of the 
unnecessary tests disabled.  For example, there’s no point in running 
checkstyle.  Note that this job takes the *full* JIRA issue id as input.  So 
‘HADOOP-9902’ not ‘9902’.  This allows for one Jenkins job to be used for all 
the Hadoop sub-projects (HADOOP, HDFS, MR, YARN).  “But my code is on github 
and I don’t want to upload a patch!”  I haven’t tested it, but it should also 
take a URL, so just add a .diff to the end of your github compare URL and put 
that in the issue box.  It hypothetically should work.
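For example, a compare URL of the hypothetical form below (the fork and branch 
names are placeholders) is what would go in the issue box:

---snip---
https://github.com/apache/hadoop/compare/trunk...alice:HADOOP-9902.diff
---snip---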

Job #2:

I’m still hammering on this one because the email notifications aren’t 
working to my satisfaction plus we have some extremely Linux-specific code in 
YARN… but 


https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-trunk-osx-java8/

… is a “build the world” job similar to what is currently running under 
the individual sub projects.  (This actually makes it one of the few “build 
everything” jobs we have running. Most of the other jobs only build that 
particular sub project.).  It does not run the full unit test suite and it also 
does not build all of the native code.  This gives us a place to start on our 
journey of making Hadoop actually, truly run everywhere.  (Interesting side 
note: It’s been *extremely* consistent in what fails vs. the Linux build hosts.)

At some point, likely after YETUS-390 is complete, I’ll switch this job 
over to be run by Apache Yetus in qbt mode so that it’s actually easier to 
track failures across all dirs.  A huge advantage over raw maven commands.

Happy testing everyone.

NOTE: if you don’t have access to launch jobs on builds.apache.org, 
you’ll need to send a request to private@.  The Apache Hadoop PMC has the keys 
to give access to folks.



-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: Hadoop CI with alternate architectures.

2016-05-18 Thread Allen Wittenauer


That’s really a question for infrastructure-...@apache.org.  They 
manage the ASF build infrastructure which Apache Hadoop and lots of other 
projects utilize.  (Bigtop uses something custom, which I think is funded by 
Cloudera.)

Once it is registered with builds.apache.org, it’s just a matter of us 
enabling a Jenkins job for it. I suspect we’ll likely do it like we have done 
other architectures in the past: set up a nightly job that people will need to 
be cognizant of.  Adding it to precommit would likely be too much and/or require 
some significant Yetus work to support reports from multiple hosts.

FWIW:

* We have a Mac mini which, ironically, I’m seeing if I can 
get a nightly job running on while I type this.
* We have a dedicated Windows box which appears to be 
completely screwed up. 
* In the past, I think it was IBM that hinted that they would 
be willing to give ASF a PowerPC box but that offer seems to have disappeared.

> On May 18, 2016, at 7:52 AM, Tsuyoshi Ozawa  wrote:
> 
> Hi Asanjar,
> 
> Thanks for your contribution! I'm ashamed to say, but I don't know how
> to change the build machine.
> 
> Adding "to" Allen, who is a PMC of Yetus project. Hey Allen, do you
> know how to add Power based Jenkins slave(s) to Apache Hadoop CI?
> 
> Thanks,
> - Tsuyoshi
> 
> On Wed, May 18, 2016 at 11:18 PM, MrAsanjar .  wrote:
>> moving up.. :)
>> 
>> On Thu, May 12, 2016 at 12:45 AM, MrAsanjar .  wrote:
>> 
>>> I am writing this email to reduce mishaps similar to the issue reported by
>>> JIRA https://issues.apache.org/jira/browse/HADOOP-11505. In a nutshell,
>>> an x86-specific
>>> performance enhancement broke the Hadoop build on the Power and SPARC architectures.
>>> To avoid similar issues in future, could I offer my help here, as a
>>> OpenPOWER foundation member.
>>> For example, we could contribute a Power based Jenkins slave(s) to Apache
>>> Hadoop CI. As we  have successfully done similar contribution to Apache
>>> Bigtop CI in past
>>> https://ci.bigtop.apache.org/computer/docker-slave-ppc-1/. Thus, we could
>>> catch any regressions earlier in Hadoop  development cycle. I'd appreciate
>>> community's guidance on this.
>>> 
> 
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> 


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: [DISCUSS] Set minimum version of Hadoop 3 to JDK8 (HADOOP-11858)

2016-05-17 Thread Allen Wittenauer

Thanks to Karthik’s +1, HADOOP-13161 was committed last night and appears to be 
working.

I’ve got almost the entire list of stuff finished.  I’m now going to go 
through the java8 jobs and actually sync them up, since they all do slightly 
different things. :/


> On May 16, 2016, at 8:02 PM, Allen Wittenauer  wrote:
> 
>   OK, it looks like if someone commits HADOOP-13161, then trunk only uses 
> JDK8 and branch-2 will use JDK7 and JDK8 during precommit with no changes 
> required to Apache Yetus. :D
> 
> 
>> On May 16, 2016, at 5:38 PM, Allen Wittenauer 
>>  wrote:
>> 
>> 
>> There’s a bunch of stuff that needs to happen at the Jenkins level:
>> 
>> * Kill off the JDK7 trunk builds for HADOOP, HDFS, MAPRED, YARN
>> * Remove JDK7 from pre-commit for HADOOP, HDFS, MAPRED, YARN
>> 
>> One thing that needs to happen in the Apache Yetus project:
>> * Wait until YETUS-369 has been written and committed to re-enable JDK7 for 
>> pre-commit  (This effectively means that *ALL* JDK7 testing will *ONLY* be 
>> happening in the regularly scheduled builds)
>> 
>> One thing that really should happen in the Apache Hadoop project:
>> * Remove JDK7 from trunk Dockerfile
>> 
>> I’ll start banging on this stuff over the next few days.
>> 
>> 
>>> On May 16, 2016, at 3:58 PM, Andrew Wang  wrote:
>>> 
>>> Very happy to announce that we've committed HADOOP-11858. I'm looking
>>> forward to writing my first lambda in Java. I also attached a video to the
>>> JIRA so we can all relive this moment in Hadoop development history.
>>> 
>>> It sounds like there's some precommit work to align test-patch with this
>>> change. I'm hoping Allen will take point on this, but ping me if I can be
>>> of any assistance.
>>> 
>>> On Thu, May 12, 2016 at 11:53 AM, Li Lu  wrote:
>>> 
>>>> I’d like to bring YARN-4977 to attention for using Java 8. HADOOP-13083
>>>> does the maven change and in yarn-api there are ~5000 javadoc warnings.
>>>> 
>>>> Li Lu
>>>> 
>>>>> On May 10, 2016, at 08:32, Akira AJISAKA 
>>>> wrote:
>>>>> 
>>>>> Hi developers,
>>>>> 
>>>>> Before cutting 3.0.0-alpha RC, I'd like to drop JDK7 support in trunk.
>>>>> Given this is a critical change, I'm thinking we should get the
>>>> consensus first.
>>>>> 
>>>>> One concern I think is, when the minimum version is set to JDK8, we need
>>>> to configure Jenkins to disable multi JDK test only in trunk.
>>>>> 
>>>>> Any thoughts?
>>>>> 
>>>>> Thanks,
>>>>> Akira
>>>>> 
>>>>> -
>>>>> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
>>>>> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>> 
> 
> 
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> 


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org


