Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Eli Collins
On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran ste...@hortonworks.com wrote:

 On 9 April 2014 23:52, Eli Collins e...@cloudera.com wrote:

 
 
  For the sake of this discussion we should separate the runtime from
  the programming APIs. Users are already migrating to the java7 runtime
  for most of the reasons listed below (support, performance, bugs,
  etc), and the various distributions cert their Hadoop 2 based
  distributions on java7.  This gives users many of the benefits of
  java7, without forcing users off java6. Ie Hadoop does not need to
  switch to the java7 programming APIs to make sure everyone has a
  supported runtime.
 
 
 +1: you can use Java 7 today; I'm not sure how tested Java 8 is


  The question here is really about when Hadoop, and the Hadoop
  ecosystem (since adjacent projects often end up in the same classpath)
  start using the java7 programming APIs and therefore break
  compatibility with java6 runtimes. I think our java6 runtime users
  would consider dropping support for their java runtime in an update of
  a major release to be an incompatible change (the binaries stop
  working on their current jvm).


 do you mean major 2.x -> 3.y or minor 2.x -> 2.(x+1) here?


I mean 2.x -> 2.(x+1), i.e. I'm running the 2.4 stable and upgrading to 2.5.




  That may be worth it if we can
  articulate sufficient value to offset the cost (they have to upgrade
  their environment, might make rolling upgrades stop working, etc), but
  I've not yet heard an argument that articulates the value relative to
  the cost.  Eg upgrading to the java7 APIs allows us to pull in
  dependencies with new major versions, but only if those dependencies
  don't break compatibility (which is likely given that our classpaths
  aren't so isolated), and, realistically, only if the entire Hadoop
  stack moves to java7 as well




  (eg we have to recompile HBase to
  generate v1.7 binaries even if they stick on API v1.6). I'm not aware
  of a feature, bug etc that really motivates this.
 
  I don't see that being needed unless we move up to new java7+ only
 libraries and HBase needs to track this.

  The big recompile to work issue is google guava, which is troublesome
 enough I'd be tempted to say can we drop it entirely



  An alternate approach is to keep the current stable release series
  (v2.x) as is, and start using new APIs in trunk (for v3). This will be
  a major upgrade for Hadoop and therefore an incompatible change like
  this is to be expected (it would be great if this came with additional
  changes to better isolate classpaths and dependencies from each
  other). It allows us to continue to support multiple types of users
  with different branches, vs forcing all users onto a new version. It
  of course means that 2.x users will not get the benefits of the new
  API, but it's unclear what those benefits are given they can already
  get the benefits of adopting the newer java runtimes today.
 
 
 
 I'm (personally) +1 to this, I also think we should plan to do the switch
 some time this year to not only get the benefits, but discover the costs



Agree
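
A minimal sketch of what "using the java7 programming APIs" means here (an
illustration, not code from the thread): compiling the class below requires a
1.7 source/target, so the resulting binaries no longer load on a Java 6 JVM,
which is exactly the compatibility break for java6 runtime users described
above.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;  // StandardCharsets is new in Java 7
    import java.nio.file.Files;                // java.nio.file is new in Java 7
    import java.nio.file.Paths;

    public class Java7OnlyExample {
      public static void main(String[] args) throws IOException {
        // try-with-resources is Java 7 language syntax
        try (BufferedReader r = Files.newBufferedReader(
            Paths.get("example.txt"), StandardCharsets.UTF_8)) {
          System.out.println(r.readLine());
        }
      }
    }

Code compiled against the Java 6 APIs with a 1.6 target, by contrast, runs on
either runtime, which is the state branch-2 preserves.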






Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Eli Collins
On Thu, Apr 10, 2014 at 6:49 AM, Raymie Stata rst...@altiscale.com wrote:

 I think the problem to be solved here is to define a point in time
 when the average Hadoop contributor can start using Java7 dependencies
 in their code.

 The "use Java7 dependencies in trunk(/branch3)" plan, by itself, does
 not solve this problem.  The average Hadoop contributor wants to see
 their contributions make it into a stable release in a predictable
 amount of time.  Putting code with a Java7 dependency into trunk means
 the exact opposite: there is no timeline to a stable release.  So most
 contributors will stay away from Java7 dependencies, despite the
 nominal policy that they're allowed in trunk.  (And the few that do
 use Java7 dependencies are people who do not value releasing code into
 stable releases, which arguably could lead to a situation where the
 Java7-dependent code in trunk is, on average, on the buggy side.)

 I'm not saying the branch2-in-the-future plan is the only way to
 solve the problem of putting Java7 dependencies on a known time-table,
 but at least it solves it.  Is there another solution?


All good reasons for why we should start thinking about a plan for v3. The
points above pertain to any features for trunk that break compatibility,
not just ones that use new Java APIs.  We shouldn't permit incompatible
changes to merge to v2 just because we don't yet have a timeline for v3; we
should figure out the latter. This also motivates finishing the work to isolate
dependencies between Hadoop code, other framework code, and user code.

Let's speak less abstractly: are there particular features or new
dependencies that you would like to contribute (or see contributed) that
require using the Java 1.7 APIs?  Breaking compat in v2 or rolling a v3
release are both non-trivial, not something I suspect we'd want to do just
because it would be, for example, nicer to have a newer version of Jetty.

Thanks,
Eli







 On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran ste...@hortonworks.com
 wrote:
  On 9 April 2014 23:52, Eli Collins e...@cloudera.com wrote:
 
 
 
  For the sake of this discussion we should separate the runtime from
  the programming APIs. Users are already migrating to the java7 runtime
  for most of the reasons listed below (support, performance, bugs,
  etc), and the various distributions cert their Hadoop 2 based
  distributions on java7.  This gives users many of the benefits of
  java7, without forcing users off java6. Ie Hadoop does not need to
  switch to the java7 programming APIs to make sure everyone has a
  supported runtime.
 
 
  +1: you can use Java 7 today; I'm not sure how tested Java 8 is
 
 
  The question here is really about when Hadoop, and the Hadoop
  ecosystem (since adjacent projects often end up in the same classpath)
  start using the java7 programming APIs and therefore break
  compatibility with java6 runtimes. I think our java6 runtime users
  would consider dropping support for their java runtime in an update of
  a major release to be an incompatible change (the binaries stop
  working on their current jvm).
 
 
   do you mean major 2.x -> 3.y or minor 2.x -> 2.(x+1) here?
 
 
  That may be worth it if we can
  articulate sufficient value to offset the cost (they have to upgrade
  their environment, might make rolling upgrades stop working, etc), but
  I've not yet heard an argument that articulates the value relative to
  the cost.  Eg upgrading to the java7 APIs allows us to pull in
  dependencies with new major versions, but only if those dependencies
  don't break compatibility (which is likely given that our classpaths
  aren't so isolated), and, realistically, only if the entire Hadoop
  stack moves to java7 as well
 
 
 
 
  (eg we have to recompile HBase to
  generate v1.7 binaries even if they stick on API v1.6). I'm not aware
  of a feature, bug etc that really motivates this.
 
  I don't see that being needed unless we move up to new java7+ only
  libraries and HBase needs to track this.
 
   The big recompile to work issue is google guava, which is troublesome
  enough I'd be tempted to say can we drop it entirely
 
 
 
  An alternate approach is to keep the current stable release series
  (v2.x) as is, and start using new APIs in trunk (for v3). This will be
  a major upgrade for Hadoop and therefore an incompatible change like
  this is to be expected (it would be great if this came with additional
  changes to better isolate classpaths and dependencies from each
  other). It allows us to continue to support multiple types of users
  with different branches, vs forcing all users onto a new version. It
  of course means that 2.x users will not get the benefits of the new
   API, but it's unclear what those benefits are given they can already
  get the benefits of adopting the newer java runtimes today.
 
 
 
  I'm (personally) +1 to this, I also think we should plan to do the switch
  some time this year to not only get the benefits, but discover

Re: Plans of moving towards JDK7 in trunk

2014-04-09 Thread Eli Collins
 to branch-2. I guess this all
  depends on when we see ourselves shipping Hadoop-3. Any ideas on that?
 
 
  On Tue, Apr 8, 2014 at 9:19 AM, Eli Collins e...@cloudera.com wrote:
 
   On Tue, Apr 8, 2014 at 2:00 AM, Ottenheimer, Davi
   davi.ottenhei...@emc.com wrote:
From: Eli Collins [mailto:e...@cloudera.com]
Sent: Monday, April 07, 2014 11:54 AM
   
   
IMO we should not drop support for Java 6 in a minor update of a
  stable
release (v2).  I don't think the larger Hadoop user base would
 find it
acceptable that upgrading to a minor update caused their systems to
  stop
working because they didn't upgrade Java. There are people still
  getting
support for Java 6. ...
   
Thanks,
Eli
   
Hi Eli,
   
Technically you are correct those with extended support get critical
   security fixes for 6 until the end of 2016. I am curious whether many
 of
   those are in the Hadoop user base. Do you know? My guess is the vast
   majority are within Oracle's official public end of life, which was
 over
  12
   months ago. Even Premier support ended Dec 2013:
   
http://www.oracle.com/technetwork/java/eol-135779.html
   
The end of Java 6 support carries much risk. It has to be
 considered in
   terms of serious security vulnerabilities such as CVE-2013-2465 with
 CVSS
   score 10.0.
   
http://www.cvedetails.com/cve/CVE-2013-2465/
   
Since you mentioned caused systems to stop as an example of what
  would
   be a concern to Hadoop users, please note the CVE-2013-2465
 availability
   impact:
   
Complete (There is a total shutdown of the affected resource. The
   attacker can render the resource completely unavailable.)
   
This vulnerability was patched in Java 6 Update 51, but post end of
   life. Apple pushed out the update specifically because of this
   vulnerability (http://support.apple.com/kb/HT5717) as did some other
   vendors privately, but for the majority of people using Java 6 means
 they
   have a ticking time bomb.
   
Allowing it to stay should be considered in terms of accepting the
  whole
   risk posture.
   
  
   There are some who get extended support, but I suspect many just have
    an if-it's-not-broke mentality when it comes to production deployments.
   The current code supports both java6 and java7 and so allows these
   people to remain compatible, while enabling others to upgrade to the
   java7 runtime. This seems like the right compromise for a stable
   release series. Again, absolutely makes sense for trunk (ie v3) to
   require java7 or greater.
  
 




 --
 Best regards,

- Andy

 Problems worthy of attack prove their worth by hitting back. - Piet Hein
 (via Tom White)


Re: Plans of moving towards JDK7 in trunk

2014-04-08 Thread Eli Collins
On Tue, Apr 8, 2014 at 2:00 AM, Ottenheimer, Davi
davi.ottenhei...@emc.com wrote:
 From: Eli Collins [mailto:e...@cloudera.com]
 Sent: Monday, April 07, 2014 11:54 AM


 IMO we should not drop support for Java 6 in a minor update of a stable
 release (v2).  I don't think the larger Hadoop user base would find it
 acceptable that upgrading to a minor update caused their systems to stop
 working because they didn't upgrade Java. There are people still getting
 support for Java 6. ...

 Thanks,
 Eli

 Hi Eli,

 Technically you are correct, those with extended support get critical security 
 fixes for 6 until the end of 2016. I am curious whether many of those are in 
 the Hadoop user base. Do you know? My guess is the vast majority are within 
 Oracle's official public end of life, which was over 12 months ago. Even 
 Premier support ended Dec 2013:

 http://www.oracle.com/technetwork/java/eol-135779.html

 The end of Java 6 support carries much risk. It has to be considered in terms 
 of serious security vulnerabilities such as CVE-2013-2465 with CVSS score 
 10.0.

 http://www.cvedetails.com/cve/CVE-2013-2465/

 Since you mentioned "caused systems to stop" as an example of what would be a 
 concern to Hadoop users, please note the CVE-2013-2465 availability impact:

 Complete (There is a total shutdown of the affected resource. The attacker 
 can render the resource completely unavailable.)

 This vulnerability was patched in Java 6 Update 51, but post end of life. 
 Apple pushed out the update specifically because of this vulnerability 
 (http://support.apple.com/kb/HT5717) as did some other vendors privately, but 
 for the majority of people, using Java 6 means they have a ticking time bomb.

 Allowing it to stay should be considered in terms of accepting the whole risk 
 posture.


There are some who get extended support, but I suspect many just have
an if-it's-not-broke mentality when it comes to production deployments.
The current code supports both java6 and java7 and so allows these
people to remain compatible, while enabling others to upgrade to the
java7 runtime. This seems like the right compromise for a stable
release series. Again, absolutely makes sense for trunk (ie v3) to
require java7 or greater.


Re: Plans of moving towards JDK7 in trunk

2014-04-07 Thread Eli Collins
On Sat, Apr 5, 2014 at 12:54 PM, Raymie Stata rst...@altiscale.com wrote:
 To summarize the thread so far:

 a) Java7 is already a supported compile- and runtime environment for
 Hadoop branch2 and trunk
 b) Java6 must remain a supported compile- and runtime environment for
 Hadoop branch2
 c) (b) implies that branch2 must stick to Java6 APIs

 I wonder if point (b) should be revised.  We could immediately
 deprecate Java6 as a runtime (and thus compile-time) environment for
 Hadoop.  We could end support for it in some published time frame
 (perhaps 3Q2014).  That is, we'd say that all future 2.x releases past
 some date would not be guaranteed to run on Java6.  This would set us
 up for using Java7 APIs into branch2.

IMO we should not drop support for Java 6 in a minor update of a
stable release (v2).  I don't think the larger Hadoop user base would
find it acceptable that upgrading to a minor update caused their
systems to stop working because they didn't upgrade Java. There are
people still getting support for Java 6. For the same reason, the
various distributions will not want to drop support in a minor update
of their products also, and since distros are using the Apache v2.x
update releases as the basis for their updates it would mean they have
to stop shipping v2.x updates, which makes it harder to collaborate
upstream.

Your point with regard to testing and releasing trunk is valid, though
we need to address that anyway, outside the context of Java versions.

Thanks,
Eli


Re: [VOTE] Release Apache Hadoop 2.4.0

2014-04-04 Thread Eli Collins
+1  for another rc.  There have been quite a few issues found (a handful
marked blocker) and this is only the first release candidate; it seems
like the point of having multiple release candidates is to iterate
with another one that addresses the major issues found with the
previous one.

On Fri, Apr 4, 2014 at 5:06 PM, Gera Shegalov g...@shegalov.com wrote:
 I built the release from the rc tag, enabled timeline history service and
 ran a sleep job on a pseudo-distributed cluster.

 I encourage another rc, for 2.4.0 (non-binding)

 1) Despite the discussion on YARN-1701, timeline AHS still sets
 yarn.timeline-service.generic-application-history.fs-history-store.uri to a
 location under ${hadoop.log.dir} that is meant for local file system, but
 uses it on HDFS by default.

 2) Critical patch for WebHdfs/Hftp to fix the filesystem contract HDFS-6143
 is not included

 3) Several patches that already proved themselves useful for diagnostics in
 production and have been available for some months are still not included.
 MAPREDUCE-5044/YARN-1515 is the most obvious example. Our users need to see
 where the task container JVM got stuck when it was timed out by AM.

 Thanks,

 Gera




 On Fri, Apr 4, 2014 at 3:51 PM, Azuryy azury...@gmail.com wrote:

 Arun,

 Do you mean you will cut another RC for 2.4?


 Sent from my iPhone5s

   On April 5, 2014, at 3:50, Arun C. Murthy a...@hortonworks.com wrote:
 
  Thanks for helping Tsuyoshi. Pls mark them as Blockers and set the
 fix-version to 2.4.1.
 
  Thanks again.
 
  Arun
 
 
  On Apr 3, 2014, at 11:38 PM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com
 wrote:
 
  Hi,
 
  Updated a test result log based on the result of 2.4.0-rc0:
  https://gist.github.com/oza/9965197
 
  IMO, there are some blockers to be fixed:
  * MAPREDUCE-5815(TestMRAppMaster failure)
  * YARN-1872(TestDistributedShell failure)
  * HDFS: TestSymlinkLocalFSFileSystem failure on Linux (I cannot find
  JIRA about this failure)
 
  Now I'm checking the problem reported by Azuryy.
 
  Thanks,
  - Tsuyoshi
 
  On Fri, Apr 4, 2014 at 8:55 AM, Tsuyoshi OZAWA 
 ozawa.tsuyo...@gmail.com wrote:
  Hi,
 
   Ran tests and confirmed that some tests (TestSymlinkLocalFSFileSystem)
 fail.
  The log of the test failure is as follows:
 
  https://gist.github.com/oza/9965197
 
  Should we fix or disable the feature?
 
  Thanks,
  - Tsuyoshi
 
  On Mon, Mar 31, 2014 at 6:22 PM, Arun C Murthy a...@hortonworks.com
 wrote:
  Folks,
 
  I've created a release candidate (rc0) for hadoop-2.4.0 that I would
 like to get released.
 
  The RC is available at:
 http://people.apache.org/~acmurthy/hadoop-2.4.0-rc0
  The RC tag in svn is here:
 https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0-rc0
 
  The maven artifacts are available via repository.apache.org.
 
  Please try the release and vote; the vote will run for the usual 7
 days.
 
  thanks,
  Arun
 
  --
  Arun C. Murthy
  Hortonworks Inc.
  http://hortonworks.com/
 
 
 
 
 
 
  --
  - Tsuyoshi
 
 
 
  --
  - Tsuyoshi
 



Re: Edit permissions for my Hadoop wiki account

2014-01-03 Thread Eli Collins
Added you both.

On Fri, Jan 3, 2014 at 4:58 PM, Arpit Agarwal aagar...@hortonworks.com wrote:
 Could some kind admin do the same for my account too?

 My Hadoop wiki username is ArpitAgarwal

 Thanks!


 On Fri, Jan 3, 2014 at 4:54 PM, Andrew Wang andrew.w...@cloudera.com wrote:

 Hi all,

 Could someone give my wiki account edit permissions? Username is
 AndrewWang.

 Thanks,
 Andrew




Re: branch development for HADOOP-9639

2013-12-06 Thread Eli Collins
+1, good idea

Thanks for contributing Sangjin.


On Mon, Dec 2, 2013 at 11:47 AM, Sangjin Lee sj...@apache.org wrote:
 We have been having discussions on HADOOP-9639 (shared cache for jars) and
 the proposed design there for some time now. We are going to start work on
 this and have it vetted and reviewed by the community. I have just filed
 some more implementation JIRAs for this feature: YARN-1465, MAPREDUCE-5662,
 YARN-1466, YARN-1467

 Rather than working privately in our corner and sharing a big patch at the
 end, I'd like to explore the idea of developing on a branch in the public
 to foster more public feedback. Recently the Hadoop PMC has passed the
 change to the bylaws to allow for branch committers (
 http://mail-archives.apache.org/mod_mbox/hadoop-general/201307.mbox/%3CCACO5Y4y7HZnn3BS-ZyCVfv-UBcMudeQhndr2vqg%3DXqE1oBiQvQ%40mail.gmail.com%3E),
 and I think it would be a good model for this development.

 I'd like to propose a branch development and a branch committer status for
 a couple of us who are going to work on this per bylaw. Could you please
 let me know what you think?

  Thanks,
 Sangjin


Re: Wiki Permissions

2013-11-12 Thread Eli Collins
I added your username, you should be able to edit now.

On Tue, Nov 12, 2013 at 1:00 PM, Billy Watson williamrwat...@gmail.com wrote:
 Sorry, my username is WilliamWatson.

 William Watson
 Software Engineer
 (904) 705-7056 PCS


  On Tue, Nov 12, 2013 at 4:00 PM, Billy Watson williamrwat...@gmail.com wrote:

 Hi,

 I'd like to get permissions to edit the wiki to both put my company name
 on the PoweredBy page and to improve the HBase REST docs.



 William Watson
 Software Engineer
 (904) 705-7056 PCS



[jira] [Resolved] (HADOOP-10034) optimize same-filesystem symlinks by doing resolution server-side

2013-10-18 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HADOOP-10034.
--

Resolution: Duplicate

Dupe of HDFS-932

 optimize same-filesystem symlinks by doing resolution server-side
 -

 Key: HADOOP-10034
 URL: https://issues.apache.org/jira/browse/HADOOP-10034
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Reporter: Colin Patrick McCabe

 We should optimize same-filesystem symlinks by doing resolution server-side 
 rather than client side, as discussed on HADOOP-9780.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Managing docs with hadoop-1 hadoop-2

2013-10-18 Thread Eli Collins
On Fri, Oct 18, 2013 at 2:10 PM, Arun C Murthy a...@hortonworks.com wrote:

 Folks,

  Currently http://hadoop.apache.org/docs/stable/ points to hadoop-1. With
 hadoop-2 going GA, should we just point that to hadoop-2?

  Couple of options:
  # Have stable1/stable2 links:
    http://hadoop.apache.org/docs/stable1 -> hadoop-1.x
    http://hadoop.apache.org/docs/stable2 -> hadoop-2.x


+1,   would also make:
 current -> stable2 (since v2 is the latest)
 stable -> stable1 (for compatibility)

Thanks,
Eli


 # Just point stable to hadoop-2 and create something new for hadoop-1:
    http://hadoop.apache.org/docs/hadoop1 -> hadoop-1.x
    http://hadoop.apache.org/docs/stable -> hadoop-2.x

 We have similar requirements for *current* link too.

 Thoughts?

 thanks,
 Arun

 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/






[jira] [Created] (HADOOP-10055) FileSystemShell.apt.vm doc has typo numRepicas

2013-10-16 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-10055:


 Summary: FileSystemShell.apt.vm doc has typo numRepicas 
 Key: HADOOP-10055
 URL: https://issues.apache.org/jira/browse/HADOOP-10055
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.2.0
Reporter: Eli Collins
Priority: Trivial


HDFS-5139 added numRepicas to FileSystemShell.apt.vm, should be numReplicas.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: streaming documentation in Hadoop 2?

2013-10-14 Thread Eli Collins
This is MAPREDUCE-4282

On Mon, Oct 14, 2013 at 3:28 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
 Doc existed in MR1 http://hadoop.apache.org/docs/stable/streaming.html, but
 it looks like it and a bunch of other stuff (e.g. Rumen and the MapReduce
 Tutorial) weren't ported over.


 On Mon, Oct 14, 2013 at 3:20 PM, Eli Collins e...@cloudera.com wrote:

 It probably just needs doc, I'd go ahead and file a jira for it. The
 wiki content here could be a good starting point.

 On Mon, Oct 14, 2013 at 2:56 PM, Sandy Ryza sandy.r...@cloudera.com
 wrote:
  Hi All,
 
  I noticed that the hadoop streaming documentation does not exist in the
  Hadoop 2 source tree, and also cannot be found on the internet.   Is this
  on purpose?  I found this wiki page
  http://wiki.apache.org/hadoop/HadoopStreaming - is that where doc is
  supposed to go?  As this page isn't tied to a specific version, how does
 it
  work if new options are added?
 
  thanks,
  -Sandy



Re: symlink support in Hadoop 2 GA

2013-09-18 Thread Eli Collins
On Wed, Sep 18, 2013 at 5:45 AM, Steve Loughran ste...@hortonworks.com wrote:

 On 18 September 2013 12:53, Alejandro Abdelnur t...@cloudera.com wrote:

  On Wed, Sep 18, 2013 at 11:29 AM, Steve Loughran ste...@hortonworks.com
  wrote:
 
   I'm reluctant about this because, as well as delaying the release, we are
  going
   to find problems all the way up the stack -which will require a
   choreographed set of changes. Given the grief of the protobuf update, I
   don't want to go near that just before the final release.
  
 
   Well, I would use the exact same argument used for protobuf (whose only
   complication was getting protoc 2.5.0 on the jenkins boxes and
   communicating to developers to do the same; other than that we didn't hit
   any other issue AFAIK) ...
 

 protobuf was traumatic at build time, as I recall because it was neither
  forwards nor backwards compatible. Those of us trying to build different
 branches had to choose which version to have on the path, or set up scripts
 to do the switching. HBase needed rebuilding, so did other things. And I
 still have the pain of downloading and installing protoc on all Linux VMs I
 build up going forward, until apt-get and yum have protoc 2.5 artifacts.

  This means it was very painful for developers, added a lot of late-breaking
 pain to the developers, but it had one key feature that gave it an edge: it
 was immediately obvious where you had a problem as things didn't compile or
 classload without linkage problems. No latent bugs, unless protobuf 2.5 has
 them internally -for which we have to rely on google's release testing to
 have found.

 That is a lot simpler to regression test than adding any new feature to
 HDFS and seeing what breaks -as that is something that only surfaces out in
 the field. Which is why I think it's too late in the 2.1 release timetable
 to add symlinks. We've had a 2.1-beta out there, we've got feedback. Fix
 those problems that are show stoppers, but don't add more stuff. Which is
 precisely why I have not been pushing in any of my recent changes. I may
 seem ruthless arguing against symlinks -but I'm not being inconsistent with
 my own commit history. The only two things I've put in branch-2.1 since
 beta-1 were a separate log for the Configuration deprecation warnings and a
 patch to the POM for a java7 build on OSX: and they weren't even my
 patches.


 -Steve

 (One of these days I should volunteer to be the release manager and it'll
 be obvious that Arun is being quite amenable to all the other developers)



 
  IMO, it makes more sense to do this change during the beta rather than
 when
  GA. That gives us more flexibility to iron out things if necessary.
 
 
 I'm arguing this change can go into the beta of the successor to 2.1 -not
 GA.


What does this change refer to?  Symlinks are already in 2.1, and the
existing semantics create problems for programs (eg see the pig
example in HADOOP-9912)
that we need to resolve.  I don't think "do nothing" is an option for 2.2 GA.

Thanks,
Eli










Re: symlink support in Hadoop 2 GA

2013-09-17 Thread Eli Collins
(Looping in Arun since this impacts 2.x releases)

I updated the versions on HADOOP-8040 and sub-tasks to reflect where
the changes have landed. All of these changes (modulo HADOOP-9417)
were merged to branch-2.1 and are in the 2.1.0 release.

While symlinks are in 2.1.0 I don't think we can really claim they're
ready until issues like HADOOP-9912 are resolved, and they are
supported in the shell, distcp and WebHDFS/HttpFS/Hftp (these are not
esoteric!).  Someone can create a symlink with FileSystem causing
someone else's distcp job to fail. Unlikely given they're not exposed
outside the Java API but still not great.   Ideally this work would
have been done on a feature branch and then merged when complete, but
that's water under the bridge.
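
A minimal sketch of the failure mode described above (illustrative only; the
paths are hypothetical): one client creates a symlink through the FileSystem
API, and an older client that assumes every FileStatus is a plain file or
directory has to be taught to check for links.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SymlinkSurprise {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Client A: creates a (possibly dangling) symlink under a job's input dir.
        fs.createSymlink(new Path("/data/archive/part-00000"),  // target
                         new Path("/jobs/input/link-to-part"),  // link
                         false);                                // createParent

        // Client B: written before symlinks existed, it only expects files and
        // directories; without the isSymlink() check it would try to open or
        // recurse into the link and fail if the target is missing.
        FileStatus[] matches = fs.globStatus(new Path("/jobs/input/*"));
        if (matches != null) {
          for (FileStatus st : matches) {
            if (st.isSymlink()) {
              continue;
            }
            System.out.println((st.isDirectory() ? "dir  " : "file ") + st.getPath());
          }
        }
      }
    }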

I see the following options:

1. Fixup the current symlink support so that symlinks are ready for
2.2 (GA), or at least the public APIs. This means the APIs will be in
GA from the get-go, so while the functionality might not be fully baked we
don't have to worry about incompatible changes like FileStatus#isDir()
changing behavior in 2.3 or a later update.  The downside is this will
take at least a couple weeks (to resolve HADOOP-9912 and potentially
implement the remaining pieces) and so may impact the 2.2 release
timing. This option means 2.2 won't remove the new APIs introduced in
2.1.  We'd want to spin a 2.1.2 beta with the new API changes so we
don't introduce new APIs in the beta to GA transition.

2. Revert symlinks from branch-2.1-beta and branch-2. Finish up the
work in trunk (or a feature branch) and merge for a subsequent 2.x
update.  While this helps get us to GA faster it would be preferable
to get an API change like this in for 2.2 GA since they may be
disruptive to introduce in an update (eg see example in #1). And of
course our users would like symlinks functionality in the GA release.
This option would mean 2.2 is incompatible with 2.1 because it's
dropping the new APIs, not ideal for a beta to GA transition.

3. Revert and punt symlinks to 3.x.  IMO should be the last resort.

If we have sufficient time I think option #1 would be best.  What do
others think?

Thanks,
Eli


On Mon, Sep 16, 2013 at 6:49 PM, Andrew Wang andrew.w...@cloudera.com wrote:
 Hi all,

 I wanted to broadcast plans for putting the FileSystem symlinks work
 (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I think
 it's pretty important we get it in since it's not a compatible change; if
 it misses the GA train, we're not going to have symlinks until the next
 major release.

 However, we're still dealing with ongoing issues revealed via testing.
 There's user-code out there that only handles files and directories and
 will barf when given a symlink (perhaps a dangling one!). See HADOOP-9912
 for a nice example where globStatus returning symlinks broke Pig; some of
 us had a conference call to talk it through, and one definite conclusion
 was that this wasn't solvable in a generally compatible manner.

 There are also still some gaps in symlink support right now. For example,
 the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need symlink
 resolution, and tooling like the FsShell and Distcp still need to be
 updated as well.

 So, there's definitely work to be done, but there are a lot of users
 interested in the feature, and symlinks really should be in GA. Would
 appreciate any thoughts/input on the matter.

 Thanks,
 Andrew


Re: What's the status of IPv6 support?

2013-08-16 Thread Eli Collins
Hey Andrew,

I'm not aware of anyone working on IPv6 support.

The following looks up to date and may be helpful:
http://wiki.apache.org/hadoop/HadoopIPv6

Thanks,
Eli

On Fri, Aug 16, 2013 at 8:27 AM, Andrew Pennebaker
apenneba...@42six.com wrote:
 When will Hadoop get better IPv6 support? It would be nice if Ubuntu users
 didn't have to disable IPv6 in order to get Hadoop working.


Re: What's the status of IPv6 support?

2013-08-16 Thread Eli Collins
On Friday, August 16, 2013, Steve Loughran wrote:

 On 16 August 2013 10:05, Andrew Pennebaker 
  apenneba...@42six.com
 wrote:

  Thanks for the link!
 
  I understand a lack of support *for* IPv6, what I don't understand is why
  IPv6 must be disabled in order for Hadoop to work. On systems with both
  IPv4 and IPv6, I thought IPv4 apps could ignore IPv6 as if it weren't
  available. I suppose that's not quite right.
 


 historically the JVM IPv6 support on Linux has been pretty troublesome -I
 don't know if it's better now, because we all avoid it.

 I'm not sure you have to disable it as long as all java processes get
 started with -Djava.net.preferIPv4Stack=true as a JVM arg


That's correct, I've run Hadoop clusters with ipv6 enabled using these
options (they were checked in by default awhile back).
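
(As an aside, java.net.preferIPv4Stack is an ordinary JVM system property, so
whether a running process actually has it set can be verified from Java; the
snippet below is illustrative only. In practice the flag is passed as a JVM
argument when the daemon is launched rather than set programmatically.)

    public class PreferIPv4Check {
      public static void main(String[] args) {
        // Prints "true" only if the JVM was started with
        // -Djava.net.preferIPv4Stack=true; the JVM default is false.
        System.out.println("java.net.preferIPv4Stack = "
            + System.getProperty("java.net.preferIPv4Stack", "false (default)"));
      }
    }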




 If you don't say that then the JVM keeps trying to create sockets at the
 IPv6 level and then fallback to IPv4 if it can't connect, which adds
 overhead to all connections.

 -steve


 
  On Fri, Aug 16, 2013 at 12:44 PM, Eli Collins 
   e...@cloudera.com
 wrote:
 
   Hey Andrew,
  
   I'm not aware of anyone working on IPv6 support.
  
   The following looks up to date and may be helpful:
   http://wiki.apache.org/hadoop/HadoopIPv6
  
   Thanks,
   Eli
  
   On Fri, Aug 16, 2013 at 8:27 AM, Andrew Pennebaker
    apenneba...@42six.com wrote:
When will Hadoop get better IPv6 support? It would be nice if Ubuntu
   users
didn't have to disable IPv6 in order to get Hadoop working.
  
 




Re: Java 7 and Hadoop

2013-08-02 Thread Eli Collins
You should be able to login to builds.apache.org using your Apache
credentials and create a job.  I'd copy the Hadoop trunk job and just
update the JDK dropdown to JDK 1.7 (latest).  You can ping me today
if that doesn't work.

On Fri, Aug 2, 2013 at 10:55 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
 How do we go about setting up builds on jdk7?  Couldn't figure out how to
 do it on Jenkins, so I'm assuming it either requires PMC or Infra?

 -Sandy


 On Thu, Jul 25, 2013 at 10:43 AM, Santhosh M S 
  santhosh_mut...@yahoo.com wrote:

 Resurrecting an old thread - now that JDK 6 is EOL should we switch over
 to JDK 7 across the board?

 Thoughts?

 Thanks,
 Santhosh


 
  From: Eli Collins e...@cloudera.com
 To: common-dev@hadoop.apache.org
 Sent: Tuesday, July 31, 2012 6:19 PM
 Subject: Re: Java 7 and Hadoop


 We should officially support it in Hadoop (one piece of BIGTOP-458
 which is for supporting it across the whole stack).
 Now that HADOOP-8370 (fixes native compilation) is in the full tarball
 should work. A good next step would be updating JAVA_HOME on one of
 the Hadoop jenkins jobs to use jdk7.


 On Tue, Jul 31, 2012 at 9:17 AM, Thomas Graves tgra...@yahoo-inc.com
 wrote:
  I've seen more and more people using java 7.  We are also planning to
 move
  to java 7 due to the eol of java 6 that Scott referenced.
 
  What are folks thoughts on making it officially supported by Hadoop?  Is
  there a process for this or is it simply updating the wiki Eli mentioned
  after sufficient testing?
 
  Thanks,
  Tom
 
 
  On 4/26/12 4:25 PM, Eli Collins e...@cloudera.com wrote:
 
  Hey Scott,
 
  Nice.  Please update this page with your experience when you get a
 chance:
  http://wiki.apache.org/hadoop/HadoopJavaVersions
 
  Thanks,
  Eli
 
 
  On Thu, Apr 26, 2012 at 2:03 PM, Scott Carey sc...@richrelevance.com
 wrote:
  Java 7 update 4 has been released.  It is even available for MacOS X
 from
  Oracle:
 
  http://www.oracle.com/technetwork/java/javase/downloads/jdk-7u4-downloads-1591156.html
 
  Java 6 will reach end of life in about 6 months.   After that point,
 there
  will be no more public updates from Oracle for Java 6, even security
 updates.
  https://blogs.oracle.com/henrik/entry/updated_java_6_eol_date
  You can of course pay them for updates or build your own OpenJDK.
 
  The entire Hadoop ecosystem needs to test against Java 7 JDKs this
 year.  I
  will be testing some small clusters of ours with JDK 7 in about a
 month, and
  my internal projects will start using Java 7 features shortly after.
 
 
  See the JDK roadmap:
 
 http://blogs.oracle.com/javaone/resource/java_keynote/slide_15_full_size.gif
  https://blogs.oracle.com/java/entry/moving_java_forward_java_strategy
 
 
 


Re: Hadoop 2.0 - org.apache.hadoop.fs.Hdfs vs. DistributedFileSystem?

2013-06-18 Thread Eli Collins
Hey Steve,

That's correct, see HADOOP-6223 for the history.  However, per Andrew
I don't think it's realistic to expect people to migrate off
FileSystem for a while (I filed HADOOP-6446 well over three years
ago).

The unfortunate consequence of the earlier decision to have parallel
interfaces, rather than transitioning one over time, is that people
effectively need to implement multiple backends - one that
gets used by clients of FileSystem, and one for clients of
FileContext.  Implementing in only one place significantly limits
adoption of the feature or file system because they can't be
effectively adopted in practice unless they're available to old and
new clients  (for example, this is why symlinks are getting backported
to FileSystem from FileContext).

Thanks,
Eli
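
A minimal side-by-side sketch of the two parallel client interfaces discussed
above (illustrative only; the path is hypothetical): older clients call
FileSystem directly, while newer clients go through FileContext, which
delegates to an AbstractFileSystem implementation such as
org.apache.hadoop.fs.Hdfs.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileContext;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TwoClientApis {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path p = new Path("/user/example/file.txt");

        // Older, all-in-one interface; DistributedFileSystem extends this class.
        FileSystem fs = FileSystem.get(conf);
        FileStatus viaFileSystem = fs.getFileStatus(p);

        // Newer split interface; the backend it resolves to is an
        // AbstractFileSystem (e.g. Hdfs for the hdfs:// scheme).
        FileContext fc = FileContext.getFileContext(conf);
        FileStatus viaFileContext = fc.getFileStatus(p);

        System.out.println(viaFileSystem.getLen() == viaFileContext.getLen());
      }
    }

A plugin implemented against only one of the two is invisible to clients of
the other, which is the double-implementation cost described above.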

On Tue, Jun 18, 2013 at 11:15 AM, Stephen Watt sw...@redhat.com wrote:
 Hi Folks

 My understanding is that from Hadoop 2.0 onwards the AbstractFileSystem is 
 now the strategic class to extend for writing Hadoop FileSystem plugins. This 
 is a departure from previous versions where one would extend the FileSystem 
 class. This seems to be reinforced by the hadoop-default.xml for Hadoop 2.0 
 in the Apache Wiki 
 (http://hadoop.apache.org/docs/r2.0.2-alpha/hadoop-project-dist/hadoop-common/core-default.xml)
  which shows fs.AbstractFileSystem.hdfs.impl being set to 
 org.apache.hadoop.fs.Hdfs

 Is my assertion correct? Do we have community consensus around this? i.e. 
 Beyond the apache distro, are the commercial distros (Intel, Hortonworks, 
 Cloudera, WanDisco, EMC Pivotal, etc.) using org.apache.hadoop.fs.Hdfs as 
 their filesystem plugin for HDFS? What does one lose by using the 
 DistributedFileSystem class instead of the Hdfs class?

 Regards
 Steve Watt

 - Original Message -
 From: Andrew Wang andrew.w...@cloudera.com
 To: common-dev@hadoop.apache.org
 Cc: Milind Bhandarkar mbhandar...@gopivotal.com, shv hadoop 
 shv.had...@gmail.com, Steve Loughran ste...@hortonworks.com, Kun Ling 
 erlv5...@gmail.com, Roman Shaposhnik shaposh...@gmail.com, Andrew 
 Purtell apurt...@apache.org, cdoug...@apache.org, jayh...@cs.ucsc.edu, 
 Sanjay Radia san...@hortonworks.com
 Sent: Friday, June 14, 2013 1:32:38 PM
 Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop 
 FileSystems + Workshop

 Hey Steve,

 I agree that it's confusing. FileSystem and FileContext are essentially two
 parallel sets of interfaces for accessing filesystems in Hadoop.
 FileContext splits the interface and shared code with AbstractFileSystem,
 while FileSystem is all-in-one. If you're looking for the AFS equivalents
 to DistributedFileSystem and LocalFileSystem, see Hdfs and LocalFs.

 Realistically, FileSystem isn't going to be deprecated and removed any time
 soon. There are lots of 3rd-party FileSystem implementations, and most apps
 today use FileSystem (including many HDFS internals, like trash and the
 shell).

 When I read the wiki page, I figured that the mention of AFS was
 essentially a typo, since everyone's been steaming ahead with FileSystem.
 Standardizing FileSystem makes total sense to me, I just wanted to confirm
 that plan.

 Best,
 Andrew


 On Fri, Jun 14, 2013 at 9:38 AM, Stephen Watt sw...@redhat.com wrote:

 This is a good point Andrew. The hangout was actually the first time I'd
 heard about the AbstractFileSystem class. I've been doing some further
 analysis on the source in Hadoop 2.0 and when I look at the Hadoop 2.0
 implementation of DistributedFileSystem and LocalFileSystem class they
 extend the FileSystem class and not AbstractFileSystem. I would imagine if
 the plan for Hadoop 2.0 is to build FileSystem implementations using the
 AbstractFileSystem, then those two would use it, so I'm a bit confused.

 Perhaps I'm looking in the wrong place? Sanjay (or anyone else), could you
 clarify this for us?

 Regards
 Steve Watt

 - Original Message -
 From: Andrew Wang andrew.w...@cloudera.com
 To: common-dev@hadoop.apache.org
 Cc: mbhandar...@gopivotal.com, shv hadoop shv.had...@gmail.com,
 ste...@hortonworks.com, erlv5...@gmail.com, shaposh...@gmail.com,
 apurt...@apache.org, cdoug...@apache.org, jayh...@cs.ucsc.edu,
 san...@hortonworks.com
 Sent: Monday, June 10, 2013 5:14:16 PM
 Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop
 FileSystems + Workshop

 Thanks for the summary Steve, very useful.

 I'm wondering a bit about the point on testing AbstractFileSystem rather
 than FileSystem. While these are both wrappers for DFSClient, they're
 pretty different in terms of the APIs they expose. Furthermore, AFS is not
 actually a client-facing API; clients interact with an AFS through
 FileContext.

 I ask because I did some work trying to unify the symlink tests for both
 FileContext and FileSystem (HADOOP-9370 and HADOOP-9355). Subtle things
 like the default mkdir semantics are different; you can see some of the
 contortions in HADOOP-9370. I ultimately ended up just adhering to 

Re: Cannot update PoweredBy wiki page

2013-05-30 Thread Eli Collins
Hey Adam,

If you create a user ID for the Hadoop wiki and send it to me I'll
update this page which will give you perms.

http://wiki.apache.org/hadoop/ContributorsGroup

On Thu, May 30, 2013 at 3:26 PM, Adam Kawa kawa.a...@gmail.com wrote:
 Could someone please add me? I promise not to break anything ;)


 2013/5/31 Konstantin Boudnik c...@apache.org

 Someone with proper karma needs to add you into ACL for this page.

 On Thu, May 30, 2013 at 11:14PM, Adam Kawa wrote:
  Hi,
 
  When uploading new content (and information about my company), I got the
  exception
   "Sorry, can not save page because rubbelloselotto.de is not allowed in
   this wiki."
 
  How could I solve it?
 
  Kind regards,
  Adam



Re: [PROPOSAL] change in bylaws to remove Release Plan vote

2013-05-21 Thread Eli Collins
+1  thanks Matt.


On Tue, May 21, 2013 at 2:10 PM, Matt Foley ma...@apache.org wrote:

 Hi all,
 This has been a side topic in several email threads recently.  Currently we
 have an ambiguity.  We have a tradition in the dev community that any
 committer can create a branch, and propose release candidates from it.  Yet
 the Hadoop bylaws say that releases have to be planned in advance, the plan
 needs to be voted on, and presumably can be denied.

 Apache policies (primarily here http://www.apache.org/dev/release.html
  and here http://www.apache.org/foundation/voting.html, with
 non-normative commentary
 here
 http://incubator.apache.org/guides/releasemanagement.html#best-practice)
 are very clear on how Releases have to be approved, and our bylaws are
 consistent with those policies.  But Apache policies don't say anything
 I've found about Release Plans, nor about voting on Release Plans.

 I propose the following change, to remove Release Plan votes, and give a
 simple definition of Release Manager role.  I'm opening discussion with
 this proposal, and will put it to a vote if we seem to be getting
  consensus.  Here are the changes I suggest in the Bylaws
  (http://hadoop.apache.org/bylaws.html) document:

 ===

 1. In the Decision Making : Actions section of the Bylaws, the
 following text is removed:

  * Release Plan

 Defines the timetable and actions for a release. The plan also nominates a
 Release Manager.

 Lazy majority of active committers


 2. In the Roles and Responsibilities section of the Bylaws, an additional
 role is defined:

  * Release Manager

 A Release Manager (RM) is a committer who volunteers to produce a Release
 Candidate according to
  HowToRelease (https://wiki.apache.org/hadoop/HowToRelease).
  The RM shall publish a Release Plan on the *common-dev@* list stating the
 branch from which they intend to make a Release Candidate, at least one
 week before they do so. The RM is responsible for building consensus around
 the content of the Release Candidate, in order to achieve a successful
 Product Release vote.

 ===

 Please share your views.
 Best regards,
 --Matt (long-time release manager)



Re: [VOTE] Plan to create release candidate for 0.23.8

2013-05-17 Thread Eli Collins
+1

On Friday, May 17, 2013, Thomas Graves wrote:

 Hello all,

 We've had a few critical issues come up in 0.23.7 that I think warrants a
 0.23.8 release. The main one is MAPREDUCE-5211.  There are a couple of
 other issues that I want finished up and get in before we spin it.  Those
 include HDFS-3875, HDFS-4805, and HDFS-4835.  I think those are on track
 to finish up early next week.   So I hope to spin 0.23.8 soon after this
 vote completes.

 Please vote '+1' to approve this plan. Voting will close on Friday May
 24th at 2:00pm PDT.

 Thanks,
 Tom Graves




Re: [VOTE] - Release 2.0.5-beta

2013-05-15 Thread Eli Collins
On Wed, May 15, 2013 at 1:29 PM, Matt Foley mfo...@hortonworks.com wrote:

  Arun, not sure whether your Yes to all already covered this, but I'd
 like
  to throw in support for the compatibility guidelines being a blocker.

 +1 to that.  Definitely an overriding concern for me.


+1  Likewise.   Would be great to get more eyeballs on Karthik's patch
on HADOOP-9517 if people haven't reviewed it already.


Re: Use hadoop.relaxed.worker.version.check to allow versions in the same major version?

2013-04-26 Thread Eli Collins
Hey Karthik,

We already support this for HDFS, see HDFS-2983 (Relax the build
version check to permit rolling upgrades within a release).
We should do so for Yarn as well, MAPREDUCE-4150 tracks this.  I don't
think Ahmed is working on it so would be great for you or someone else
to take it.

Thanks,
Eli

On Fri, Apr 26, 2013 at 10:21 AM, Karthik Kambatla ka...@cloudera.com wrote:
 Hi devs,

 Given that we have API compatibility within a major release, I was
 wondering if it would make sense to augment the worker version check to
 allow workers from a different minor/point release in the same major
 release? The motivation for this is rolling upgrade within a major release.

 We could use the same property hadoop.relaxed.worker.version.check for this
 or add another property. Thoughts?

 Thanks
 Karthik
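
A minimal sketch of how the property named above might be used (assuming it
behaves as the boolean switch its name suggests; in a real deployment it would
live in the cluster's site configuration rather than be set in code):

    import org.apache.hadoop.conf.Configuration;

    public class RelaxedVersionCheck {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Opt in to the relaxed build-version check so that workers from a
        // different minor/point release of the same major version can join,
        // e.g. during a rolling upgrade.
        conf.setBoolean("hadoop.relaxed.worker.version.check", true);
        System.out.println(
            conf.getBoolean("hadoop.relaxed.worker.version.check", false));
      }
    }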


Re: Heads up - 2.0.5-beta

2013-04-26 Thread Eli Collins
On Fri, Apr 26, 2013 at 11:15 AM, Arun C Murthy a...@hortonworks.com wrote:

 On Apr 25, 2013, at 7:31 PM, Roman Shaposhnik wrote:

 On Thu, Apr 25, 2013 at 6:34 PM, Arun C Murthy a...@hortonworks.com wrote:

 With that in mind, I really want to make a serious push to lock down APIs 
 and wire-protocols for hadoop-2.0.5-beta.
 Thus, we can confidently support hadoop-2.x in a compatible manner in the 
 future. So, it's fine to add new features,
 but please ensure that all APIs are frozen for hadoop-2.0.5-beta

 Arun, since it sounds like you have a pretty definite idea
 in mind for what you want 'beta' label to actually mean,
 could you, please, share the exact criteria?

 Sorry, I'm not sure if this is exactly what you are looking for but, as I 
  mentioned above, the primary aim would be to make the final set of required 
  API/wire-protocol changes so that we can call it a 'beta', i.e. once 
  2.0.5-beta ships, users and downstream projects can be confident about forward 
  compatibility in the hadoop-2.x line. Obviously, we might discover a blocker bug 
 post 2.0.5 which *might* necessitate an unfortunate change - but that should 
 be an outstanding exception.

Arun, Suresh,

Mind reviewing the following page Karthik put together on
compatibility?   http://wiki.apache.org/hadoop/Compatibility

I think we should do something similar to what Sanjay proposed in
HADOOP-5071 for Hadoop v2.   If we get on the same page on
compatibility terms/APIs then we can quickly draft the policy, at
least for the things we've already got consensus on.  I think our new
developers, users, downstream projects, and partners would really
appreciate us making this clear.  If people like the content we can
move it to the Hadoop website and maintain it in svn like the bylaws.

The reason I think we need to do so is because there's been confusion
about what types of compatibility we promise and some open questions
which I'm not sure everyone is clear on. Examples:
- Are we going to preserve Hadoop v3 clients against v2 servers now
that we have protobuf support?  (I think so..)
- Can we break rolling upgrade of daemons in updates post GA? (I don't
think so..)
- Do we disallow HDFS metadata changes that require an HDFS upgrade in
an update? (I think so..)
- Can we remove methods from v2 and v2 updates that were deprecated in
v0.20-22?  (Unclear)
- Will we preserve binary compatibility for MR2 going forward? (I think so..)
- Does the ability to support multiple versions of MR simultaneously
via MR2 change the MR API compatibility story? (I don't think so..)
- Are the RM protocols sufficiently stable to disallow incompatible
changes potentially required by non-MR projects? (Unclear, most large
Yarn deployments I'm aware of are running 0.23, not v2 alphas)

I'm also not sure there's currently consensus on what an incompatible
change is. For example, I think HADOOP-9151 is incompatible because it
broke client/server wire compatibility with previous releases and any
change that breaks wire compatibility is incompatible.  Suresh felt it
was not an incompatible change because it did not affect API
compatibility (ie PB is not considered part of the API) and the change
occurred while v2 is in alpha.  Not sure we need to go through the
whole exercise of what's allowed in an alpha and beta (water under the
bridge, hopefully), but I do think we should clearly define an
incompatible change.  It's fine that v2 has been a bit wild wild west
in the alpha development stage but I think we need to get a little
more rigorous.

Thanks,
Eli


Re: Heads up - 2.0.5-beta

2013-04-26 Thread Eli Collins
On Fri, Apr 26, 2013 at 2:42 PM, Suresh Srinivas sur...@hortonworks.com wrote:
 Eli, I will post a more detailed reply soon. But one small correction:


 I'm also not sure there's currently consensus on what an incompatible
 change is. For example, I think HADOOP-9151 is incompatible because it
 broke client/server wire compatibility with previous releases and any
 change that breaks wire compatibility is incompatible.  Suresh felt it
 was not an incompatible change because it did not affect API
 compatibility (ie PB is not considered part of the API) and the change
 occurred while v2 is in alpha.


 This is not correct. I did not say it was not an incompatible change.
  It was indeed an incompatible wire protocol change. My argument was that,
  given the phase of development we were in, we could not mark the wire
  protocol as stable and rule out incompatible changes. But once 2.0.5-beta
  is out, as we had discussed earlier, we should not make further incompatible
  changes to the wire protocol.

Sorry for the confusion, I misinterpreted your comments on the jira
(specifically, "This is an incompatible change: I disagree." and "see
my argument about why this is not incompatible.") to indicate
that you thought it was not incompatible.




 --
 http://hortonworks.com/download/


Fwd: Compatibility in Apache Hadoop

2013-04-22 Thread Eli Collins
On Mon, Apr 22, 2013 at 5:42 PM, Steve Loughran ste...@hortonworks.com wrote:

 On 22 April 2013 14:00, Karthik Kambatla ka...@cloudera.com wrote:

  Hadoop devs,
 
 
  This doc does not intend to propose new policies. The idea is to have one
  document that outlines the various compatibility concerns (lots of areas
  beyond API compatibility), captures the respective policies that exist, and
   if we want to define policies for the items where it’s not clear, we have
  something to iterate on.
 
  The first draft just lists the types of compatibility. In the next step, we
  can add existing policies and subsequently work towards policies for
  others.
 
 
 I don't see -yet- a definition of compatible at the API signature level vs
 semantics level.

  The @interface attributes say these methods are
  internal/external/stable/unstable (there's also @VisibleForTesting, which
  comes out of Guava, yes?).

 There's a separate issue that says we make some guarantee that the
  behaviour of an interface remains consistent over versions, which is hard
 to do without some rigorous definition of what the expected behaviour of an
 implementation should be.


Good point, Steve.  I've assumed the semantics of the API had to
respect the attribute (eg changing the semantics of FileSystem#close
would be an incompatible change, since this is a public/stable API,
even if the new semantics are arguably better).  But you're right,
unless we've actually defined what the semantics of the APIs are it's
hard to say if we've materially changed them.  How about adding a new
section on the page and calling that out explicitly?
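
For reference, a minimal sketch of the audience/stability attributes Steve
refers to (the classes here are hypothetical; the annotations themselves come
from org.apache.hadoop.classification):

    import org.apache.hadoop.classification.InterfaceAudience;
    import org.apache.hadoop.classification.InterfaceStability;

    @InterfaceAudience.Public    // intended for use by any downstream project
    @InterfaceStability.Stable   // incompatible changes only at major releases
    public class ExampleClientApi {

      // Internal surface: no compatibility promise is made for it.
      @InterfaceAudience.Private
      @InterfaceStability.Unstable
      public static class InternalHelper {
      }
    }

These attributes constrain audience and signatures; whether the semantics
behind a public/stable signature may change is the separate question being
discussed here.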

In practice I think we'll have to take semantics case by case, clearly
define the semantics we care about better in the javadocs (for the
major end user-facing classes at least, calling out both intended
behavior and behavior that's meant to be undefined) and using
individual judgement elsewhere.  For example, HDFS-4156 changed
DataInputStream#seek to throw an IOE if you seek to a negative offset,
instead of succeeding then resulting in an NPE on the next access.
That's an incompatible change in terms of semantics, but not semantics
intended by the author, or likely semantics programs depend on.
However if a change made FileSystem#close three times slower, this
perhaps a smaller semantic change (eg doesn't change what exceptions
get thrown) but probably much less tolerable for end users.

In any case, even if we get an 80% solution to the semantics issue
we'll probably be in good shape for v2 GA if we can sort out the
remaining topics.   See any other topics missing?   Once the overall
outline is in shape it make sense to annotate the page with the
current policy (if there's already consensus on one), and identifying
areas where we need to come up with a policy or are leaving TBD.
Currently this is a source of confusion for new developers, some
downstream projects and users.

Thanks,
Eli


Re: [Vote] Merge branch-trunk-win to trunk

2013-02-27 Thread Eli Collins
Bobby raises some good questions.  A related one: since most current
developers won't add Windows support for new features that are
platform specific, is it assumed that Windows development will either
lag, or will people actively work on keeping Windows up to date with the
latest?  And vice versa in case Windows support is implemented first.

Is there a jira for resolving the outstanding TODOs in the code base
(similar to HDFS-2148)?  Looks like this merge doesn't introduce many
which is great (just did a quick diff and grep).

Thanks,
Eli

On Wed, Feb 27, 2013 at 8:17 AM, Robert Evans ev...@yahoo-inc.com wrote:
 After this is merged in is Windows still going to be a second class
 citizen but happens to work for more than just development or is it a
 fully supported platform where if something breaks it can block a release?
  How do we as a community intend to keep Windows support from breaking?
 We don't have any Jenkins slaves to be able to run nightly tests to
 validate everything still compiles/runs.  This is not a blocker for me
 because we often rely on individuals and groups to test Hadoop, but I do
 think we need to have this discussion before we put it in.

 --Bobby

 On 2/26/13 4:55 PM, Suresh Srinivas sur...@hortonworks.com wrote:

I had posted heads up about merging branch-trunk-win to trunk on Feb 8th.
I
am happy to announce that we are ready for the merge.

Here is a brief recap on the highlights of the work done:
- Command-line scripts for the Hadoop surface area
- Mapping the HDFS permissions model to Windows
- Abstracted and reconciled mismatches around differences in Path
semantics
in Java and Windows
- Native Task Controller for Windows
- Implementation of a Block Placement Policy to support cloud
environments,
more specifically Azure.
- Implementation of Hadoop native libraries for Windows (compression
codecs, native I/O)
- Several reliability issues, including race-conditions, intermittent test
failures, resource leaks.
- Several new unit test cases written for the above changes

Please find the details of the work in CHANGES.branch-trunk-win.txt -
Common changes http://bit.ly/Xe7Ynv, HDFS changes http://bit.ly/13QOSo9,
and YARN and MapReduce changes http://bit.ly/128zzMt. This is the work
ported from branch-1-win to a branch based on trunk.

For details of the testing done, please see the thread -
http://bit.ly/WpavJ4. Merge patch for this is available on HADOOP-8562
https://issues.apache.org/jira/browse/HADOOP-8562.

This was a large undertaking that involved developing code, testing the
entire Hadoop stack, including scale tests. This is made possible only
with
the contribution from many many folks in the community. Following people
contributed to this work: Ivan Mitic, Chuan Liu, Ramya Sunil, Bikas Saha,
Kanna Karanam, John Gordon, Brandon Li, Chris Nauroth, David Lao, Sumadhur
Reddy Bolli, Arpit Agarwal, Ahmed El Baz, Mike Liddell, Jing Zhao, Thejas
Nair, Steve Maine, Ganeshan Iyer, Raja Aluri, Giridharan Kesavan, Ramya
Bharathi Nimmagadda, Daryn Sharp, Arun Murthy, Tsz-Wo Nicholas Sze, Suresh
Srinivas and Sanjay Radia. There are many others who contributed as well
providing feedback and comments on numerous jiras.

The vote will run for seven days and will end on March 5, 6:00PM PST.

Regards,
Suresh




On Thu, Feb 7, 2013 at 6:41 PM, Mahadevan Venkatraman
mah...@microsoft.com wrote:

 It is super exciting to look at the prospect of these changes being
merged
 to trunk. Having Windows as one of the supported Hadoop platforms is a
 fantastic opportunity both for the Hadoop project and Microsoft
customers.

 This work began around a year back when a few of us started with a basic
 port of Hadoop on Windows. Since then, the Hadoop team at Microsoft has
 made significant progress in the following areas:
 (PS: Some of these items are already included in Suresh's email, but
 including again for completeness)

 - Command-line scripts for the Hadoop surface area
 - Mapping the HDFS permissions model to Windows
 - Abstracted and reconciled mismatches around differences in Path
 semantics in Java and Windows
 - Native Task Controller for Windows
 - Implementation of a Block Placement Policy to support cloud
 environments, more specifically Azure.
 - Implementation of Hadoop native libraries for Windows (compression
 codecs, native I/O) - Several reliability issues, including
 race-conditions, intermittent test failures, resource leaks.
 - Several new unit test cases written for the above changes

 In the process, we have closely engaged with the Apache open source
 community and have got great support and assistance from the community
in
 terms of contributing fixes, code review comments and commits.

 In addition, the Hadoop team at Microsoft has also made good progress in
 other projects including Hive, Pig, Sqoop, Oozie, HCat and HBase. Many
of
 these changes have already been committed to the respective trunks with
 help from various committers and contributors. It is great to see the
 

Re: Heads up - merge branch-trunk-win to trunk

2013-02-07 Thread Eli Collins
Thanks for the update Suresh.  Has any testing been done on the branch on
Linux aside from running the unit tests?

Thanks,
Eli


On Thu, Feb 7, 2013 at 5:42 PM, Suresh Srinivas sur...@hortonworks.com wrote:

 The support for Hadoop on Windows was proposed in
 HADOOP-8079 https://issues.apache.org/jira/browse/HADOOP-8079 almost
 a year ago. The goal was to make Hadoop natively integrated, full-featured,
 and performance and scalability tuned on Windows Server or Windows Azure.
 We are happy to announce that a lot of progress has been made in this
 regard.

 Initial work started in a feature branch, branch-1-win, based on branch-1.
 The details related to the work done in the branch can be seen in
 CHANGES.txt
 http://svn.apache.org/viewvc/hadoop/common/branches/branch-1-win/CHANGES.branch-1-win.txt?view=markup
 .
 This work has been ported to a branch, branch-trunk-win, based on trunk.
 Merge patch for this is available on
 HADOOP-8562 https://issues.apache.org/jira/browse/HADOOP-8562
 .

 Highlights of the work done so far:
 1. Necessary changes in Hadoop to run natively on Windows. These changes
 handle differences in platforms related to path names, process/task
 management etc.
 2. Addition of winutils tools for managing file permissions and ownership,
 user group mapping, hardlinks, symbolic links, chmod, disk utilization, and
 process/task management.
 3. Added cmd scripts equivalent to existing shell scripts hadoop-daemon.sh,
 start and stop scripts.
 4. Addition of a block placement policy implementation to support cloud
 environments, more specifically Azure.

 We are very close to wrapping up the work in branch-trunk-win and getting
 ready for a merge. Currently the merge patch is passing close to 100% of
 unit tests on Linux. Soon I will call for a vote to merge this branch into
 trunk.

 Next steps:
 1. Call for vote to merge branch-trunk-win to trunk, when the work
 completes and precommit build is clean.
 2. Start a discussion on adding Jenkins precommit builds on windows and how
 to integrate that with the existing commit process.

 Let me know if you have any questions.

 Regards,
 Suresh



Re: [VOTE] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

2012-11-30 Thread Eli Collins
-1, 0, -1

IIUC the only platform we plan to add support for that we can't easily
support today (w/o an emulation layer like cygwin) is Windows, and it
seems like making the bash scripts simpler and having parallel bat
files is IMO a better approach.

On Sat, Nov 24, 2012 at 12:13 PM, Matt Foley ma...@apache.org wrote:
 For discussion, please see previous thread [PROPOSAL] introduce Python as
 build-time and run-time dependency for Hadoop and throughout Hadoop stack.

 This vote consists of three separate items:

 1. Contributors shall be allowed to use Python as a platform-independent
 scripting language for build-time tasks, and add Python as a build-time
 dependency.
 Please vote +1, 0, -1.

 2. Contributors shall be encouraged to use Maven tasks in combination with
 either plug-ins or Groovy scripts to do cross-platform build-time tasks,
 even under ant in Hadoop-1.
 Please vote +1, 0, -1.

 3. Contributors shall be allowed to use Python as a platform-independent
 scripting language for run-time tasks, and add Python as a run-time
 dependency.
 Please vote +1, 0, -1.

 Note that voting -1 on #1 and +1 on #2 essentially REQUIRES contributors to
 use Maven plug-ins or Groovy as the only means of cross-platform build-time
 tasks, or to simply continue using platform-dependent scripts as is being
 done today.

 Vote closes at 12:30pm PST on Saturday 1 December.
 -
 Personally, my vote is +1, +1, +1.
 I think #2 is preferable to #1, but still has many unknowns in it, and
 until those are worked out I don't want to delay moving to cross-platform
 scripts for build-time tasks.

 Best regards,
 --Matt


[jira] [Resolved] (HADOOP-8968) Add a flag to completely disable the worker version check

2012-10-26 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HADOOP-8968.
-

  Resolution: Fixed
Hadoop Flags: Reviewed

I've committed this. Thanks Tucu.

 Add a flag to completely disable the worker version check
 -

 Key: HADOOP-8968
 URL: https://issues.apache.org/jira/browse/HADOOP-8968
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 1.2.0

 Attachments: HADOOP-8968.patch, HADOOP-8968.patch, HADOOP-8968.patch, 
 HADOOP-8968.patch, HADOOP-8968.patch


 The current logic in the TaskTracker and the DataNode to allow a relaxed 
 version check with the JobTracker and NameNode works only if the versions of 
 Hadoop are exactly the same.
 We should add a switch to disable version checking completely, to enable 
 rolling upgrades between compatible versions (typically patch versions).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[DISCUSS] remove packaging

2012-10-15 Thread Eli Collins
Hey guys,

Heads up: I filed HADOOP-8925 to remove the packaging from trunk and
branch-2.  The packages are not currently being built, were never
updated for MR2/YARN, I'm not aware of anyone planning to do this work
or maintain them, etc. No sense in letting them continue to bit rot in
the code base.

Thanks,
Eli


Re: [DISCUSS] remove packaging

2012-10-15 Thread Eli Collins
Hey Bobby,

That's correct, I mean the packages directories in common, hdfs, and
MR top-level directories, which contain the debs and RPMs.  I'm not
opposed to someone re-working/contributing new code as long as they're
maintained. Don't think there's any sense in keeping the branch-1 code
in trunk/branch-2 since it doesn't support the same daemons, isn't
maintained etc.

Thanks,
Eli

On Mon, Oct 15, 2012 at 10:52 AM, Robert Evans ev...@yahoo-inc.com wrote:
 Eli,

 By packaging I assume that you mean the RPM/Deb packages and not the tar.gz.  
 If that is the case I have no problem with them being removed because as you 
 said in the JIRA BigTop is already providing a working alternative.  If 
 someone else wants to step up to maintain them I also don't have a problem 
 with them staying, so long as they become a part of HADOOP-8914 (Automating 
 the release build).

 --Bobby


 -Original Message-
 From: Eli Collins [mailto:e...@cloudera.com]
 Sent: Monday, October 15, 2012 10:33 AM
 To: common-dev@hadoop.apache.org
 Subject: [DISCUSS] remove packaging

 Hey guys,

 Heads up: I filed HADOOP-8925 to remove the packaging from trunk and
 branch-2.  The packages are not currently being built, were never
 updated for MR2/YARN, I'm not aware of anyone planning to do this work
 or maintain them, etc. No sense in letting them continue to bit rot in
 the code base.

 Thanks,
 Eli


[jira] [Created] (HADOOP-8931) Add Java version to startup message

2012-10-15 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8931:
---

 Summary: Add Java version to startup message 
 Key: HADOOP-8931
 URL: https://issues.apache.org/jira/browse/HADOOP-8931
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Trivial


I often look at logs and have to track down the java version they were run 
with; it would be useful if we logged this as part of the startup message.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8925) Remove packaging

2012-10-12 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8925:
---

 Summary: Remove packaging
 Key: HADOOP-8925
 URL: https://issues.apache.org/jira/browse/HADOOP-8925
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins


Per discussion on HADOOP-8809, now that Bigtop is a TLP and supports Hadoop v2, 
let's remove the Hadoop packaging from trunk and branch-2. We should remove it 
anyway since it is no longer part of the build post mavenization, was not updated 
post MR1 (there's no MR2/YARN packaging), and is not maintained.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-8809) RPMs should skip useradds if the users already exist

2012-10-12 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HADOOP-8809.
-

Resolution: Won't Fix

Filed HADOOP-8925.

 RPMs should skip useradds if the users already exist
 

 Key: HADOOP-8809
 URL: https://issues.apache.org/jira/browse/HADOOP-8809
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Affects Versions: 1.0.3
Reporter: Steve Loughran
Priority: Minor

 The hadoop.spec preinstall script creates users, but it does this even if 
 they already exist. This may cause problems if the installation already 
 has those users with different uids. A check with {{id}} can avoid this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8914) Automate release builds

2012-10-10 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8914:
---

 Summary: Automate release builds
 Key: HADOOP-8914
 URL: https://issues.apache.org/jira/browse/HADOOP-8914
 Project: Hadoop Common
  Issue Type: Task
Reporter: Eli Collins


Hadoop releases are currently created manually by the RM (following 
http://wiki.apache.org/hadoop/HowToRelease), which means various aspects of the 
build are ad hoc, eg what tool chain was used to compile the java and native 
code varies from release to release. Other steps can be inconsistent since 
they're done manually eg recently the checksums for an RC were incorrect. Let's 
use the jenkins toolchain and create a job that automates creating release 
builds so that the only manual thing about releasing is publishing to mvn 
central.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [VOTE] Hadoop-1.0.4-rc0

2012-10-09 Thread Eli Collins
On Tue, Oct 9, 2012 at 1:02 PM, Matt Foley mfo...@hortonworks.com wrote:
 Hi Eli,
 Thanks for the suggestion.  Looks like this has gotten fleshed out a little
 more since I started doing releases.

 I've had my key posted at MIT since the beginning.  I've now also uploaded
 it to the PGP Global Directory, and uploaded the key fingerprint to my
 profile at id.apache.org.

 However, when I tried to commit it to
 KEYS http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS,
 I got error svn:  access to
 '/repos/asf/!svn/act/69e4489f-fcdc-45b3-9a14-637b3a078b13' forbidden.
  Also, at http://svn.apache.org/repos/asf/hadoop/common/dist/readme.txt it
 says:


 To generate the KEYS file, use:

 % wget https://people.apache.org/keys/group/hadoop.asc > KEYS

 which would seem to argue against simply committing changes to KEYS.  Yet
 the file at https://people.apache.org/keys/group/hadoop.asc
 is considerably behind the file at
 http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS

 Any idea what the correct resolution of this is?  Or should I ping
 infra@ for the correct instructions?

Bobby, you just updated the KEYS file right? Did you run into any of
these issues?

Thanks,
Eli


Re: Need to add fs shim to use QFS

2012-10-05 Thread Eli Collins
Hey Thilee,

Thanks for contributing. We don't process pull requests on the git
mirrors; please upload a patch against trunk and branch-1 if you'd
like this included in Hadoop 1.x and 2.x releases. More info here:
http://wiki.apache.org/hadoop/HowToContribute

Thanks,
Eli

On Fri, Oct 5, 2012 at 10:27 AM, Thilee Subramaniam
thi...@quantcast.com wrote:
 We at Quantcast have released QFS 1.0 (Quantcast File System) to open
 source. This is based on the KFS 0.5 (Kosmos Distributed File System),
 a C++ distributed filesystem implementation. KFS plugs into Apache
 Hadoop via the 'kfs' shim that is part of Hadoop codebase.

 QFS has added support for permissions, and also, provides fault tolerance
 through Reed-Solomon encoding as well as replication. There are also a
 number of performance and stability improvements, including a rewrite of
 the client library to allow parallel concurrent I/Os. Going forward, new
 releases of KFS will come from QFS.

 The open source release of QFS is at http://quantcast.github.com/qfs

 QFS plugs into Apache Hadoop the same way KFS does. Currently, one would
 apply the patches or JARs from the QFS source tree onto Apache Hadoop to
 make Hadoop use QFS. The patch for Apache Hadoop 1.0.X can be found at
 https://github.com/quantcast/qfs/blob/master/hadoop/hadoop-1.0.X.patch

 In order to make the integration seamless, we would like to add a 'qfs'
 shim to Apache Hadoop so that the current active branches (1.0.X, 2.X.X,
 0.23.X) of Apache Hadoop can use QFS.

  Towards this, I've submitted an ASF JIRA feature ticket (HADOOP-8885) under
  the hadoop-common project, and sent a pull request with the QFS shim changes
  to https://github.com/apache/hadoop-common/tree/branch-1.0.2

 I will subsequently submit pull requests to the other active Hadoop
 branches.

  If you have any questions, I will be happy to answer or provide more
  details on QFS.

 - Thilee



[jira] [Created] (HADOOP-8886) Remove KFS support

2012-10-05 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8886:
---

 Summary: Remove KFS support
 Key: HADOOP-8886
 URL: https://issues.apache.org/jira/browse/HADOOP-8886
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Eli Collins
Assignee: Eli Collins


KFS is no longer maintained (it is replaced by QFS, which HADOOP-8885 is adding), 
so let's remove it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [VOTE] Hadoop-1.0.4-rc0

2012-10-04 Thread Eli Collins
+1

Thanks Matt.  I verified the signatures and ran some basic MR jobs.

Btw, it would be good to update the public KEYS repo with your key as
well.  See http://wiki.apache.org/hadoop/HowToRelease.

Thanks,
Eli


On Thu, Oct 4, 2012 at 1:59 PM, Matt Foley ma...@apache.org wrote:
 Hi,
 There has been a request from several PMC members for a maintenance release
 of hadoop-1.0.
 Please download and test this release candidate, Hadoop-1.0.4-rc0.

 It is available at http://people.apache.org/~mattf/hadoop-1.0.4-rc0/
 or from the Nexus maven repo.

 Release notes are at
 http://people.apache.org/~mattf/hadoop-1.0.4-rc0/releasenotes.html

 Vote will run for 1 week as usual, terminating at 2pm PDT, Thur 11 Oct 2012.

 Thank you,
 --Matt Foley
 Release Manager


[jira] [Created] (HADOOP-8873) Port HADOOP-8175 (Add mkdir -p flag) to branch-1

2012-10-02 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8873:
---

 Summary: Port HADOOP-8175 (Add mkdir -p flag) to branch-1
 Key: HADOOP-8873
 URL: https://issues.apache.org/jira/browse/HADOOP-8873
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 1.0.0
Reporter: Eli Collins


Per HADOOP-8551 let's port the mkdir -p option to branch-1 for a 1.x release to 
help users transition to the new shell behavior. In Hadoop 2.x mkdir currently 
requires the -p option to create parent directories but a program that 
specifies it won't work on 1.x since it doesn't support this option.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8856) SecurityUtil#openSecureHttpConnection should use an authenticated URL even if kerberos is not enabled

2012-09-26 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8856:
---

 Summary: SecurityUtil#openSecureHttpConnection should use an 
authenticated URL even if kerberos is not enabled
 Key: HADOOP-8856
 URL: https://issues.apache.org/jira/browse/HADOOP-8856
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 2.0.2-alpha
Reporter: Eli Collins
Assignee: Eli Collins


HADOOP-8581 updated openSecureHttpConnection to use an ssl factory, however we 
only use it if kerberos security is enabled, so we'll fail to use it if SSL is 
enabled but kerberos/SPNEGO are not. This manifests itself as the 2NN failing 
to checkpoint.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-8856) SecurityUtil#openSecureHttpConnection should use an authenticated URL even if kerberos is not enabled

2012-09-26 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HADOOP-8856.
-

Resolution: Duplicate

Dupe of HADOOP-8855

 SecurityUtil#openSecureHttpConnection should use an authenticated URL even if 
 kerberos is not enabled
 -

 Key: HADOOP-8856
 URL: https://issues.apache.org/jira/browse/HADOOP-8856
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 2.0.2-alpha
Reporter: Eli Collins
Assignee: Eli Collins

 HADOOP-8581 updated openSecureHttpConnection to use an ssl factory, however 
 we only use it if kerberos security is enabled, so we'll fail to use it if 
 SSL is enabled but kerberos/SPNEGO are not. This manifests itself as the 2NN 
 failing to checkpoint.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8857) hadoop.http.authentication.signature.secret.file should be created if the configured file does not exist

2012-09-26 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8857:
---

 Summary: hadoop.http.authentication.signature.secret.file should 
be created if the configured file does not exist
 Key: HADOOP-8857
 URL: https://issues.apache.org/jira/browse/HADOOP-8857
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Priority: Minor


AuthenticationFilterInitializer#initFilter fails if the configured 
{{hadoop.http.authentication.signature.secret.file}} does not exist, eg:

{noformat}
java.lang.RuntimeException: Could not read HTTP signature secret file: 
/var/lib/hadoop-hdfs/hadoop-http-auth-signature-secret
{noformat}

Creating /var/lib/hadoop-hdfs/hadoop-http-auth-signature-secret (populated with 
a string) fixes the issue. Per the auth docs, "If a secret is not provided a 
random secret is generated at start up time.", which sounds like it means the 
file should be generated at startup with a random secret, which doesn't seem 
to be the case. Also, the instructions in the docs should be clearer in this 
regard.
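
A minimal sketch of the workaround described above (assumptions: the path is
the one from the error message and the secret value is arbitrary; this is the
manual step, not the proposed fix):

    import java.io.FileWriter;
    import java.math.BigInteger;
    import java.security.SecureRandom;

    // Pre-create the secret file with a random string so
    // AuthenticationFilterInitializer#initFilter can read it at startup.
    public class CreateHttpAuthSecret {
      public static void main(String[] args) throws Exception {
        String secret = new BigInteger(130, new SecureRandom()).toString(32);
        FileWriter w = new FileWriter(
            "/var/lib/hadoop-hdfs/hadoop-http-auth-signature-secret");
        try {
          w.write(secret);
        } finally {
          w.close();
        }
      }
    }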

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8859) Improve SecurityUtil#openSecureHttpConnection javadoc

2012-09-26 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8859:
---

 Summary: Improve SecurityUtil#openSecureHttpConnection javadoc 
 Key: HADOOP-8859
 URL: https://issues.apache.org/jira/browse/HADOOP-8859
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation, security
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins


Per HADOOP-8855 SecurityUtil#openSecureHttpConnection is not SPNEGO specific 
since it supports other authenticators so we should update the javadoc to 
reflect that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Commits breaking compilation of MR 'classic' tests

2012-09-25 Thread Eli Collins
How about adding this step to the MR PreCommit jenkins job so it's run
as part of test-patch?

On Tue, Sep 25, 2012 at 7:48 PM, Arun C Murthy a...@hortonworks.com wrote:
 Committers,

  As most people are aware, the MapReduce 'classic' tests (in 
 hadoop-mapreduce-project/src/test) still need to built using ant since they 
 aren't mavenized yet.

  I've seen several commits (and 2 within the last hour i.e. MAPREDUCE-3681 
 and MAPREDUCE-3682) which lead me to believe developers/committers aren't 
 checking for this.

  Henceforth, with all changes, before committing, please do run:
  $ mvn install
  $ cd hadoop-mapreduce-project
  $ ant veryclean all-jars -Dresolvers=internal

  These instructions were already in 
 http://wiki.apache.org/hadoop/HowToReleasePostMavenization and I've just 
 updated http://wiki.apache.org/hadoop/HowToContribute.

 thanks,
 Arun



[jira] [Created] (HADOOP-8812) ExitUtil#terminate should print Exception#toString

2012-09-14 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8812:
---

 Summary: ExitUtil#terminate should print Exception#toString 
 Key: HADOOP-8812
 URL: https://issues.apache.org/jira/browse/HADOOP-8812
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.2-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor


Per Steve's feedback, ExitUtil#terminate should print Exception#toString 
rather than use getMessage, as the latter may return null.  
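
The difference is easy to demonstrate; a small sketch (not from the Hadoop
code base):

    // getMessage() is null for exceptions constructed without a message,
    // while toString() always includes at least the class name.
    public class MessageVsToString {
      public static void main(String[] args) {
        Exception e = new NullPointerException();  // no message supplied
        System.out.println(e.getMessage());        // prints: null
        System.out.println(e.toString());          // prints: java.lang.NullPointerException
      }
    }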

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8801) ExitUtil#terminate should capture the exception stack trace

2012-09-13 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8801:
---

 Summary: ExitUtil#terminate should capture the exception stack 
trace
 Key: HADOOP-8801
 URL: https://issues.apache.org/jira/browse/HADOOP-8801
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hadoop-8801.txt

ExitUtil#terminate(status,Throwable) should capture and log the stack trace of 
the given throwable. This will help debug issues like HDFS-3933.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8804) Improve Web UIs when the wildcard address is used

2012-09-13 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8804:
---

 Summary: Improve Web UIs when the wildcard address is used
 Key: HADOOP-8804
 URL: https://issues.apache.org/jira/browse/HADOOP-8804
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.0-alpha, 1.0.0
Reporter: Eli Collins
Priority: Minor


When IPC addresses are bound to the wildcard (ie the default config) the NN, JT 
(and probably RM etc) Web UIs are a little goofy. Eg "0 Hadoop Map/Reduce 
Administration" and "NameNode '0.0.0.0:18021' (active)". Let's improve them.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8807) Update README and website to reflect HADOOP-8662

2012-09-13 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8807:
---

 Summary: Update README and website to reflect HADOOP-8662
 Key: HADOOP-8807
 URL: https://issues.apache.org/jira/browse/HADOOP-8807
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Reporter: Eli Collins


HADOOP-8662 removed the various tabs from the website. Our top-level README.txt 
and the generated docs refer to them (eg hadoop.apache.org/core, /hdfs etc). 
Let's fix that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: test-patch and native code compile

2012-09-06 Thread Eli Collins
Yea, we want jenkins to run with native. How about making native
optional in test-patch via a flag and updating the jenkins jobs to use
it?

On Thu, Sep 6, 2012 at 7:25 AM, Alejandro Abdelnur t...@cloudera.com wrote:
 Makes sense, though the Jenkins runs should continue to run w/ native, right?

 On Thu, Sep 6, 2012 at 12:49 AM, Hemanth Yamijala yhema...@gmail.com wrote:
 Hi,

 The test-patch script in Hadoop source runs a native compile with the
 patch. On platforms like MAC, there are issues with the native
 compile. For e.g we run into HADOOP-7147 that has been resolved as
 Won't fix.

 Hence, should we have a switch in test-patch to not run native compile
 ? Could open a JIRA and fix, if that's OK ?

 Thanks
 hemanth



 --
 Alejandro


Re: Branch 2 release names

2012-09-05 Thread Eli Collins
On Tue, Sep 4, 2012 at 11:55 AM, Owen O'Malley omal...@apache.org wrote:
 While cleaning up the subversion branches, I thought more about the
 branch 2 release names. I'm concerned if we backtrack and reuse
 release numbers it will be extremely confusing to users. It also
 creates problems for tools like Maven that parse version numbers and
  expect a left to right release numbering scheme (eg. 2.1.1-alpha >
  2.1.0). It also seems better to keep on the 2.0.x minor release until
 after we get a GA release off of the 2.0 branch.

 Therefore, I'd like to propose:
  1. rename branch-2.0.1-alpha -> branch-2.0
 2. delete branch-2.1.0-alpha
 3. stabilizing goes into branch-2.0 until it gets to GA
 4. features go into branch-2 and will be branched into branch-2.1 later
 5. The release tags can have the alpha/beta tags on them.

 Thoughts?


branch-2.0.1-alpha is pretty far behind branch-2 at this point, both
in terms of features merged to branch-2 (eg no auto failover or hsync)
and bug fixes (iiuc it is just 2.0 plus a couple changes). From my
hdfs pov the branch doesn't seem worth maintaining.  I'd tweak this as
follows:

1. delete branch-2.1.0-alpha
2. rename branch-2 -> branch-2.0 some time after 0.23.3 is released
3. stabilizing goes into branch-2.0 until it gets to GA
4. features go into branch-2 and will be branched into branch-2.1 later
5. The release tags can have the alpha/beta tags on them.

On the hdfs side most trunk work is for branch-2, so we're already
merging trunk -> branch-2; delaying the third merge would help, and
we're using feature branches for the big stuff (HDFS-3077) so they're
being isolated that way.

Thanks,
Eli


[jira] [Created] (HADOOP-8769) Tests failures on the ARM hosts

2012-09-05 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8769:
---

 Summary: Tests failures on the ARM hosts 
 Key: HADOOP-8769
 URL: https://issues.apache.org/jira/browse/HADOOP-8769
 Project: Hadoop Common
  Issue Type: Test
  Components: build
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins


I created a [jenkins job|https://builds.apache.org/job/Hadoop-trunk-ARM] that 
runs on the ARM machines. The local build is now working and running tests 
(thanks Gavin!); however, there are 40 test failures, and it looks like most are 
due to host configuration issues. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-8722) Update BUILDING.txt with latest snappy info

2012-08-30 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HADOOP-8722.
-

  Resolution: Fixed
   Fix Version/s: 2.2.0-alpha
Target Version/s:   (was: 2.2.0-alpha)
Hadoop Flags: Reviewed

I've committed this, thanks Colin!

 Update BUILDING.txt with latest snappy info
 ---

 Key: HADOOP-8722
 URL: https://issues.apache.org/jira/browse/HADOOP-8722
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.2.0-alpha
Reporter: Eli Collins
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 2.2.0-alpha

 Attachments: HADOOP-8722.002.patch, HADOOP-8722.003.patch, 
 HADOOP-8722.004.patch, hadoop-8722.txt


 HADOOP-8620 changed the default of snappy.lib from snappy.prefix/lib to 
 empty.  This, in turn, means that you can't use {{-Dbundle.snappy}} without 
 setting {{-Dsnappy.lib}}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8752) Update website to reflect merged committer lists

2012-08-30 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8752:
---

 Summary: Update website to reflect merged committer lists
 Key: HADOOP-8752
 URL: https://issues.apache.org/jira/browse/HADOOP-8752
 Project: Hadoop Common
  Issue Type: Task
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor


Let's update the website to reflect the merged committer list 
(http://s.apache.org/Owx), ie one top-level credits section next to the PMC 
section. 


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8744) Fix the OSX native build

2012-08-28 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8744:
---

 Summary: Fix the OSX native build
 Key: HADOOP-8744
 URL: https://issues.apache.org/jira/browse/HADOOP-8744
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Affects Versions: 2.2.0-alpha
Reporter: Eli Collins


Per HADOOP-8737 let's fix the OSX native build, which was working as of 
HADOOP-3659 (v 0.21).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8740) Build target to generate findbugs html output

2012-08-27 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8740:
---

 Summary: Build target to generate findbugs html output
 Key: HADOOP-8740
 URL: https://issues.apache.org/jira/browse/HADOOP-8740
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build
Reporter: Eli Collins


It would be useful if there was a build target or flag to generate findbugs 
output. It would depend on {{mvn compile findbugs:findbugs}} and run 
{{$FINDBUGS_HOME/bin/convertXmlToText -html ../path/to/findbugsXml.xml 
findbugs.html}} to generate findbugs.html in the target directory.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8722) Update BUILDING.txt to indicate snappy.lib is required

2012-08-22 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8722:
---

 Summary: Update BUILDING.txt to indicate snappy.lib is required
 Key: HADOOP-8722
 URL: https://issues.apache.org/jira/browse/HADOOP-8722
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation
Reporter: Eli Collins
Priority: Minor


HADOOP-8620 changed the default of snappy.lib from snappy.prefix/lib to 
empty, which means we need to update the following in BUILDING.txt to indicate 
it needs to be set (or restore the default to snappy.prefix/lib). 

{noformat}
  * Use -Dsnappy.prefix=(/usr/local)  -Dbundle.snappy=(false) to compile
{noformat}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8710) Remove ability for users to easily run the trash emptier

2012-08-17 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8710:
---

 Summary: Remove ability for users to easily run the trash emptier
 Key: HADOOP-8710
 URL: https://issues.apache.org/jira/browse/HADOOP-8710
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.0.0-alpha, 1.0.0
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hadoop-8710.txt

Users can currently run the emptier via {{hadoop org.apache.hadoop.fs.Trash}}, 
which seems error prone as there's nothing in that command that suggests it 
runs the emptier and nothing that asks you before deleting the trash for all 
users (that the current user is capable of deleting). Given that the trash 
emptier runs server side (eg on the NN) let's remove the ability to easily run 
it client side.  Marking as an incompatible change since someone expecting the 
hadoop command with this class specified to empty trash will no longer be able 
to (they'll need to create their own class that does this).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: S3 FS tests broken?

2012-08-14 Thread Eli Collins
Trevor,
Forgot to ask, since you can reproduce this can you confirm and see
why S3Conf.get is returning null for test.fs.s3.name?

On Mon, Aug 13, 2012 at 6:35 PM, Eli Collins e...@cloudera.com wrote:
 Passes for me locally, and the precondition that's failing (passing
 null to Conf#set) from the backtrace looks like the null is coming
 from:

 S3Conf.set(FS_DEFAULT_NAME_DEFAULT, S3Conf.get(test.fs.s3.name));

 which is set in core-site.xml so something strange is going on.
 HADOOP-6296 looks related btw.


 On Mon, Aug 13, 2012 at 6:04 PM, Trevor tre...@scurrilous.com wrote:
 Anyone know why these tests have started failing? It happens for me locally
 and it just happened in Jenkins:
 https://builds.apache.org/job/PreCommit-HADOOP-Build/1288/

 I don't see any obvious changes recently that would cause it.

 Tests in error:
   testCreateFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property
 value must not be null

 testCreateFileWithNullName(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null
   testCreateExistingFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null

 testCreateFileInNonExistingDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null
   testCreateDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null

 testMkdirsFailsForSubdirectoryOfExistingFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null
   testIsDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null
   testDeleteFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property
 value must not be null

 testDeleteNonExistingFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null

 testDeleteNonExistingFileInDir(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null
   testDeleteDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null

 testDeleteNonExistingDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null
   testModificationTime(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null
   testFileStatus(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property
 value must not be null

 testGetFileStatusThrowsExceptionForNonExistentFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null

 testListStatusThrowsExceptionForNonExistentFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null
   testListStatus(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property
 value must not be null
   testBlockSize(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)
   testFsStatus(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testWorkingDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)
   testMkdirs(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testMkdirsFailsForSubdirectoryOfExistingFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testGetFileStatusThrowsExceptionForNonExistentFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testListStatusThrowsExceptionForNonExistentFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)
   testListStatus(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testWriteReadAndDeleteEmptyFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testWriteReadAndDeleteHalfABlock(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testWriteReadAndDeleteOneBlock(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testWriteReadAndDeleteOneAndAHalfBlocks(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testWriteReadAndDeleteTwoBlocks(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)
   testOverwrite(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testWriteInNonExistentDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testDeleteNonExistentFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testDeleteRecursively(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testDeleteEmptyDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testRenameNonExistentPath(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testRenameFileMoveToNonExistentDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testRenameFileMoveToExistingDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testRenameFileAsExistingFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testRenameFileAsExistingDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testRenameDirectoryMoveToNonExistentDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testRenameDirectoryMoveToExistingDirectory(org.apache.hadoop.fs.s3

[jira] [Created] (HADOOP-8689) Make trash a server side configuration option

2012-08-13 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8689:
---

 Summary: Make trash a server side configuration option
 Key: HADOOP-8689
 URL: https://issues.apache.org/jira/browse/HADOOP-8689
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins


Per ATM's suggestion in HADOOP-8598 for v2 let's make {{fs.trash.interval}} 
configured server side rather than client side. The 
{{fs.trash.checkpoint.interval}} option is already server side as the emptier 
runs in the NameNode.
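
A sketch of what server side means here (assumption: the NameNode would
consult its own value of fs.trash.interval rather than trusting whatever the
client was configured with; this is not the actual patch):

    import org.apache.hadoop.conf.Configuration;

    // Trash policy driven by the server's configuration.
    class ServerSideTrashSketch {
      static boolean isTrashEnabled(Configuration serverConf) {
        // Interval is in minutes; 0 (the default) disables trash.
        long intervalMinutes = serverConf.getLong("fs.trash.interval", 0);
        return intervalMinutes > 0;
      }
    }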

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8690) Shell may remove a file without going to trash even if skipTrash is not enabled

2012-08-13 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8690:
---

 Summary: Shell may remove a file without going to trash even if 
skipTrash is not enabled
 Key: HADOOP-8690
 URL: https://issues.apache.org/jira/browse/HADOOP-8690
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Priority: Minor


Delete.java contains the following comment:

{noformat}
// TODO: if the user wants the trash to be used but there is any
// problem (ie. creating the trash dir, moving the item to be deleted,
// etc), then the path will just be deleted because moveToTrash returns
// false and it falls thru to fs.delete.  this doesn't seem right
{noformat}

If Trash#moveToAppropriateTrash returns false FsShell will delete the path even 
if skipTrash is not enabled. The comment isn't quite right, as some of these 
failure scenarios result in exceptions, not a false return value, and in the 
case of an exception we don't unconditionally delete the path. 
TrashPolicy#moveToTrash states that it only returns false if the item is 
already in the trash or trash is disabled, and the expected behavior for these 
cases is to just delete the path. However TrashPolicyDefault#moveToTrash also 
returns false if there's a problem creating the trash directory, so for this 
case I think we should throw an exception rather than return false.

I also question the behavior of just deleting when the item is already in the 
trash as it may have changed since previously put in the trash and not been 
checkpointed yet. Seems like in this case we should move it to trash but with a 
file name suffix.
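
A rough sketch of the control flow being described (simplified, not the
actual Delete.java code, and with error handling trimmed):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.Trash;

    // A false return from moveToAppropriateTrash falls through to a real
    // delete, which is only safe if false really means "trash disabled or
    // item already in trash".
    class DeleteSketch {
      static void delete(FileSystem fs, Path path, Configuration conf,
          boolean recursive) throws IOException {
        boolean movedToTrash = Trash.moveToAppropriateTrash(fs, path, conf);
        if (!movedToTrash) {
          fs.delete(path, recursive);
        }
      }
    }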

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: S3 FS tests broken?

2012-08-13 Thread Eli Collins
Passes for me locally, and the precondition that's failing (passing
null to Conf#set) from the backtrace looks like the null is coming
from:

S3Conf.set(FS_DEFAULT_NAME_DEFAULT, S3Conf.get(test.fs.s3.name));

which is set in core-site.xml so something strange is going on.
HADOOP-6296 looks related btw.
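
A small sketch of the kind of guard that would make the failure clearer (a
fragment reusing the S3Conf and FS_DEFAULT_NAME_DEFAULT names from the line
above, not a proposed patch):

    // Fail with a clear message when test.fs.s3.name is missing instead of
    // tripping the null-value precondition inside Configuration#set.
    String s3Uri = S3Conf.get("test.fs.s3.name");
    if (s3Uri == null) {
      throw new IllegalStateException(
          "test.fs.s3.name is not set; is the test core-site.xml on the classpath?");
    }
    S3Conf.set(FS_DEFAULT_NAME_DEFAULT, s3Uri);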


On Mon, Aug 13, 2012 at 6:04 PM, Trevor tre...@scurrilous.com wrote:
 Anyone know why these tests have started failing? It happens for me locally
 and it just happened in Jenkins:
 https://builds.apache.org/job/PreCommit-HADOOP-Build/1288/

 I don't see any obvious changes recently that would cause it.

 Tests in error:
   testCreateFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property
 value must not be null

 testCreateFileWithNullName(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null
   testCreateExistingFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null

 testCreateFileInNonExistingDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null
   testCreateDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null

 testMkdirsFailsForSubdirectoryOfExistingFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null
   testIsDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null
   testDeleteFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property
 value must not be null

 testDeleteNonExistingFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null

 testDeleteNonExistingFileInDir(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null
   testDeleteDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null

 testDeleteNonExistingDirectory(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null
   testModificationTime(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null
   testFileStatus(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property
 value must not be null

 testGetFileStatusThrowsExceptionForNonExistentFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null

 testListStatusThrowsExceptionForNonExistentFile(org.apache.hadoop.fs.TestS3_LocalFileContextURI):
 Property value must not be null
   testListStatus(org.apache.hadoop.fs.TestS3_LocalFileContextURI): Property
 value must not be null
   testBlockSize(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)
   testFsStatus(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testWorkingDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)
   testMkdirs(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testMkdirsFailsForSubdirectoryOfExistingFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testGetFileStatusThrowsExceptionForNonExistentFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testListStatusThrowsExceptionForNonExistentFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)
   testListStatus(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testWriteReadAndDeleteEmptyFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testWriteReadAndDeleteHalfABlock(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testWriteReadAndDeleteOneBlock(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testWriteReadAndDeleteOneAndAHalfBlocks(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testWriteReadAndDeleteTwoBlocks(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)
   testOverwrite(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testWriteInNonExistentDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testDeleteNonExistentFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testDeleteRecursively(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testDeleteEmptyDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testRenameNonExistentPath(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testRenameFileMoveToNonExistentDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testRenameFileMoveToExistingDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testRenameFileAsExistingFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testRenameFileAsExistingDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testRenameDirectoryMoveToNonExistentDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testRenameDirectoryMoveToExistingDirectory(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 testRenameDirectoryAsExistingFile(org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract)

 

[jira] [Created] (HADOOP-8687) Bump log4j to version 1.2.17

2012-08-12 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8687:
---

 Summary: Bump log4j to version 1.2.17
 Key: HADOOP-8687
 URL: https://issues.apache.org/jira/browse/HADOOP-8687
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor


Let's bump log4j from 1.2.15 to version 1.2.17. It and 1.2.16 are maintenance 
releases with good fixes that also remove some jar dependencies (javamail, jmx, 
jms).

http://logging.apache.org/log4j/1.2/changes-report.html#a1.2.17

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: fs.local.block.size vs file.blocksize

2012-08-11 Thread Eli Collins
Hi Ellis,

fs.local.block.size is the default FileSystem block size. Note however
that most file systems (like HDFS, see DistributedFileSystem) override
this; eg when using HDFS the default block size is configured with
dfs.blocksize, which defaults to 64mb.

Note in v1 the default block size for hdfs was 64mb as well
(configured via dfs.block.size, which dfs.blocksize replaces).
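
A small sketch of how the relevant keys are read (assumptions: the key names
are the v2 ones mentioned above, and the 64mb defaults are just illustrative):

    import org.apache.hadoop.conf.Configuration;

    // fs.local.block.size covers the generic/local FileSystem default,
    // while HDFS overrides it with dfs.blocksize.
    public class BlockSizeSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        long localBlockSize = conf.getLong("fs.local.block.size", 64 * 1024 * 1024);
        long hdfsBlockSize = conf.getLong("dfs.blocksize", 64 * 1024 * 1024);
        System.out.println("local=" + localBlockSize + " hdfs=" + hdfsBlockSize);
      }
    }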

Thanks,
Eli

On Sat, Aug 11, 2012 at 7:55 AM, Ellis H. Wilson III el...@cse.psu.edu wrote:
 Hi guys and gals,

 I originally posted a version of this question on the user list on a few
 days ago to no response, so I thought perhaps it delved a bit too far into
 the nitty-gritty to warrant one.  My apologies for cross-listing.

 Can someone please briefly summarize the difference between these two
 parameters?  I do not see deprecated warnings for fs.local.block.size when I
 run with it set.  Furthermore, and I'm unsure if this is related, I see two
 copies of what is effectively RawLocalFileSystem.java (the other is
 local/RawLocalFs.java).  It appears that the one in local/ is for the old
 abstract FileSystem class, whereas RawLocalFileSystem.java uses the new
 abstract class.  Perhaps this is the root cause of the two parameters?  Or
 does file.blocksize simply control the abstract class or some such thing?

 The practical answers I really need to get a handle on are the following:
 1. Is the default for the file:// filesystem boosted to a 64MB blocksize in
 Hadoop 2.0?  It was only 32MB in Hadoop 1.0, but it's not 100% clear to me
 that it is now a full 64MB.  The core-site.xml docs online suggest it's been
 boosted.
 2. If I alter the blocksize of file://, is it correct to presume that also
 will impact the shuffle block-size since that data goes locally?

 Thanks!

 ellis


Re: adding bin and net categories to JIRA/HADOOP

2012-08-06 Thread Eli Collins
Works for me. I've been adding missing ones that make sense (eg webhdfs).

On Mon, Aug 6, 2012 at 11:34 AM, Steve Loughran ste...@hortonworks.com wrote:
 There aren't bin and net categories in JIRA, yet often bugs go against the
 code there?

 Should I add them?


Re: Where can I find old Hadoop Common Release Binaries?

2012-08-04 Thread Eli Collins
Hey Carlos,

We don't release common separately from mr and hdfs. You can download
the hadoop binaries from this site (or preferably its mirrors).

http://apache.claz.org/hadoop/common

Eg the 0.20.2 binary is here:
http://apache.claz.org/hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz

Thanks,
Eli

On Sat, Aug 4, 2012 at 8:01 PM, Carlos Andrade carlosvia...@gmail.com wrote:
 Dears,

 I am part of a multi institutional group that is doing some data analysis
 on different open source projects and we selected Hadoop Commons to be part
 of the data set. I would very much appreciate if one of you could confirm
 if Hadoop Commons only available binaries are located at
 http://www.apache.org/dyn/closer.cgi/hadoop/common/

  I noticed that for some releases, like hadoop-0.20.2, the mirror doesn't
  seem to have the binaries available.


 Carlos Andrade
 http://carlosandrade.co


[jira] [Created] (HADOOP-8642) io.native.lib.available only controls zlib

2012-08-02 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8642:
---

 Summary: io.native.lib.available only controls zlib
 Key: HADOOP-8642
 URL: https://issues.apache.org/jira/browse/HADOOP-8642
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins


Per core-default.xml, {{io.native.lib.available}} indicates "Should native 
hadoop libraries, if present, be used"; however it looks like it only affects 
zlib. Since we always load the native library this means we may use native 
libraries even if io.native.lib.available is set to false.

Let's make the flag work as advertised - rather than always loading the native 
hadoop library, we only attempt to load the library (and report that native is 
available) if this flag is set. Since io.native.lib.available defaults to true 
the default behavior should remain unchanged (except that now we won't actually 
try to load the library if this flag is disabled).
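
A rough sketch of the proposed behavior (illustrative only, not the actual 
NativeCodeLoader code): gate the load attempt on the flag, and only report 
native as available when the flag is true and the load succeeded.

{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch: only try to load libhadoop when
// io.native.lib.available is true (the default), and report availability
// accordingly, so disabling the flag really disables native code.
public final class NativeLibGateSketch {
  private static boolean loaded = false;

  public static synchronized boolean isNativeAvailable(Configuration conf) {
    boolean useNative = conf.getBoolean("io.native.lib.available", true);
    if (useNative && !loaded) {
      try {
        System.loadLibrary("hadoop");
        loaded = true;
      } catch (UnsatisfiedLinkError e) {
        loaded = false;
      }
    }
    return useNative && loaded;
  }
}
{code}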





Re: Java 7 and Hadoop

2012-07-31 Thread Eli Collins
We should officially support it in Hadoop (one piece of BIGTOP-458,
which is for supporting it across the whole stack).
Now that HADOOP-8370 (which fixes native compilation) is in, the full
tarball build should work. A good next step would be updating JAVA_HOME
on one of the Hadoop jenkins jobs to use jdk7.


On Tue, Jul 31, 2012 at 9:17 AM, Thomas Graves tgra...@yahoo-inc.com wrote:
 I've seen more and more people using java 7.  We are also planning to move
 to java 7 due to the eol of java 6 that Scott referenced.

 What are folks thoughts on making it officially supported by Hadoop?  Is
 there a process for this or is it simply updating the wiki Eli mentioned
 after sufficient testing?

 Thanks,
 Tom


 On 4/26/12 4:25 PM, Eli Collins e...@cloudera.com wrote:

 Hey Scott,

 Nice.  Please update this page with your experience when you get a chance:
 http://wiki.apache.org/hadoop/HadoopJavaVersions

 Thanks,
 Eli


 On Thu, Apr 26, 2012 at 2:03 PM, Scott Carey sc...@richrelevance.com wrote:
 Java 7 update 4 has been released.  It is even available for MacOS X from
 Oracle:
 http://www.oracle.com/technetwork/java/javase/downloads/jdk-7u4-downloads-159
 1156.html

 Java 6 will reach end of life in about 6 months.   After that point, there
 will be no more public updates from Oracle for Java 6, even security 
 updates.
 https://blogs.oracle.com/henrik/entry/updated_java_6_eol_date
 You can of course pay them for updates or build your own OpenJDK.

 The entire Hadoop ecosystem needs to test against Java 7 JDKs this year.  I
 will be testing some small clusters of ours with JDK 7 in about a month, and
 my internal projects will start using Java 7 features shortly after.


 See the JDK roadmap:
 http://blogs.oracle.com/javaone/resource/java_keynote/slide_15_full_size.gif
 https://blogs.oracle.com/java/entry/moving_java_forward_java_strategy





Re: Build failed in Jenkins: Hadoop-Common-trunk #481

2012-07-24 Thread Eli Collins
I fixed up the command jenkins was running to hopefully fix this error.


On Tue, Jul 24, 2012 at 2:27 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 See https://builds.apache.org/job/Hadoop-Common-trunk/481/changes

 Changes:

 [eli] HDFS-3709. TestStartup tests still binding to the ephemeral port. 
 Contributed by Eli Collins

 [bobby] MAPREDUCE-3893. allow capacity scheduler configs max-apps and 
 max-am-pct per queue (tgraves via bobby)

 [todd] HDFS-3697. Enable fadvise readahead by default. Contributed by Todd 
 Lipcon.

 --
 [...truncated 18195 lines...]
 [DEBUG]   (s) reportsDirectory = 
 https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/target/surefire-reports
 [DEBUG]   (s) runOrder = filesystem
 [DEBUG]   (s) session = org.apache.maven.execution.MavenSession@98f352
 [DEBUG]   (s) skip = false
 [DEBUG]   (s) skipTests = false
 [DEBUG]   (s) systemPropertyVariables = {hadoop.log.dir=null, 
 hadoop.tmp.dir=null, java.net.preferIPv4Stack=true, 
 java.security.egd=file:///dev/urandom, 
 java.security.krb5.conf=https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/src/test/resources/krb5.conf,
  test.build.classes=null, 
 test.build.data=https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/target/test-dir,
  
 test.build.dir=https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/target/test-dir,
  test.build.webapps=null, test.cache.data=null}
 [DEBUG]   (s) testClassesDirectory = 
 https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/target/test-classes
 [DEBUG]   (s) testFailureIgnore = false
 [DEBUG]   (s) testNGArtifactName = org.testng:testng
 [DEBUG]   (s) testSourceDirectory = 
 https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/src/test/java
 [DEBUG]   (s) trimStackTrace = true
 [DEBUG]   (s) useFile = true
 [DEBUG]   (s) useManifestOnlyJar = true
 [DEBUG]   (s) useSystemClassLoader = true
 [DEBUG]   (s) useUnlimitedThreads = false
 [DEBUG]   (s) workingDirectory = 
 https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples
 [DEBUG] -- end configuration --
 [INFO] No tests to run.
 [INFO] Surefire report directory: 
 https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/target/surefire-reports
 [DEBUG] dummy:dummy:jar:1.0 (selected for null)
 [DEBUG]   org.apache.maven.surefire:surefire-booter:jar:2.12:compile 
 (selected for compile)
 [DEBUG] org.apache.maven.surefire:surefire-api:jar:2.12:compile (selected 
 for compile)
 [DEBUG] Adding to surefire booter test classpath: 
 /home/jenkins/.m2/repository/org/apache/maven/surefire/surefire-booter/2.12/surefire-booter-2.12.jar
  Scope: compile
 [DEBUG] Adding to surefire booter test classpath: 
 /home/jenkins/.m2/repository/org/apache/maven/surefire/surefire-api/2.12/surefire-api-2.12.jar
  Scope: compile
 [DEBUG] Setting system property [java.net.preferIPv4Stack]=[true]
 [DEBUG] Setting system property 
 [java.security.krb5.conf]=[https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/src/test/resources/krb5.conf]
 [DEBUG] Setting system property [tar]=[true]
 [DEBUG] Setting system property 
 [test.build.data]=[https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/target/test-dir]
 [DEBUG] Setting system property 
 [user.dir]=[https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples]
 [DEBUG] Setting system property 
 [localRepository]=[/home/jenkins/.m2/repository]
 [DEBUG] Setting system property 
 [test.build.dir]=[https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/target/test-dir]
 [DEBUG] Setting system property 
 [basedir]=[https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples]
 [DEBUG] Setting system property [java.security.egd]=[file:///dev/urandom]
 [DEBUG] Using JVM: /home/jenkins/tools/java/jdk1.6.0_27/jre/bin/java
 [DEBUG] Setting environment variable 
 [LD_LIBRARY_PATH]=[/home/jenkins/tools/java/jdk1.6.0_27/jre/lib/i386/server:/home/jenkins/tools/java/jdk1.6.0_27/jre/lib/i386:/home/jenkins/tools/java/jdk1.6.0_27/jre/../lib/i386:https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/target/native/target/usr/local/lib]
 [DEBUG] Setting environment variable [MALLOC_ARENA_MAX]=[4]
 [DEBUG] dummy:dummy:jar:1.0 (selected for null)
 [DEBUG]   org.apache.maven.surefire:surefire-junit3:jar:2.12:test (selected 
 for test)
 [DEBUG] org.apache.maven.surefire:surefire-api:jar:2.12:test (selected 
 for test)
 [DEBUG] Adding

Re: 10 failures when run test on hadoop-common-project

2012-07-23 Thread Eli Collins
Hi Yanbo,

You can look at the individual tests output (eg
hadoop-common-project/hadoop-common/target/surefire-reports/org.apache.hadoop.net.TestTableMapping.txt
for TestTableMapping and so on) for an indication as to why the test
failed.

Thanks,
Eli

On Mon, Jul 23, 2012 at 3:50 AM, Yanbo Liang yanboha...@gmail.com wrote:
 Hi All,

 I just ran the test code of hadoop-common-project at revision 1364560.
 It produced 10 FAILURES after I typed mvn test under the
 hadoop-common-project folder.
 But the latest build on the Jenkins server is Hadoop-Common-trunk #480, and
 it shows no test failures or errors.

 Are there bugs in the test cases, or did I do something wrong?

 I list the FAILURES that I have encountered:

 Running org.apache.hadoop.net.TestTableMapping
 Tests run: 5, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.619 sec
  FAILURE!

 Running org.apache.hadoop.net.TestNetUtils
 Tests run: 38, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.562 sec
  FAILURE!


 Running org.apache.hadoop.util.TestDiskChecker
 Tests run: 14, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 0.875 sec
  FAILURE!

 Running org.apache.hadoop.ha.TestShellCommandFencer
 Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.158 sec
  FAILURE!

 Running org.apache.hadoop.security.TestSecurityUtil
 Tests run: 15, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.756 sec
  FAILURE!

 Running org.apache.hadoop.fs.TestFSMainOperationsLocalFileSystem
 Tests run: 49, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.371 sec
  FAILURE!

 Running org.apache.hadoop.fs.viewfs.TestViewFsTrash
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.601 sec
  FAILURE!

 Running org.apache.hadoop.fs.viewfs.TestFSMainOperationsLocalFileSystem
 Tests run: 98, Failures: 0, Errors: 98, Skipped: 0, Time elapsed: 2.437 sec
  FAILURE!

 Running org.apache.hadoop.fs.TestFileUtil
 Tests run: 10, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.9 sec
  FAILURE!

 Running org.apache.hadoop.fs.TestLocalDirAllocator
 Tests run: 27, Failures: 9, Errors: 0, Skipped: 0, Time elapsed: 1.152 sec
  FAILURE!


[jira] [Reopened] (HADOOP-8431) Running distcp wo args throws IllegalArgumentException

2012-07-23 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins reopened HADOOP-8431:
-


I just tried this and confirmed it still fails with the latest build. In the 
future please try to reproduce the issue before you close it.

hadoop-3.0.0-SNAPSHOT $ ./bin/hadoop distcp
12/07/23 19:21:48 ERROR tools.DistCp: Invalid arguments: 
java.lang.IllegalArgumentException: Target path not specified
at org.apache.hadoop.tools.OptionsParser.parse(OptionsParser.java:86)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:102)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:368)
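
One possible shape for a fix (a hypothetical sketch, not the actual DistCp 
code): catch the parse failure at the entry point and print usage instead of 
letting the exception escape with a stack trace.

{code}
// Hypothetical sketch of the desired behavior: an empty/invalid argument
// list prints usage and exits non-zero rather than dumping a stack trace.
public class UsageSketch {
  static void run(String[] args) {
    if (args.length < 2) {
      throw new IllegalArgumentException("Target path not specified");
    }
    // would kick off the copy here
  }

  public static void main(String[] args) {
    try {
      run(args);
    } catch (IllegalArgumentException e) {
      System.err.println("Invalid arguments: " + e.getMessage());
      System.err.println("Usage: distcp OPTIONS [source_path...] <target_path>");
      System.exit(-1);
    }
  }
}
{code}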


 Running distcp wo args throws IllegalArgumentException
 --

 Key: HADOOP-8431
 URL: https://issues.apache.org/jira/browse/HADOOP-8431
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
  Labels: newbie

 Running distcp w/o args results in the following:
 {noformat}
 hadoop-3.0.0-SNAPSHOT $ ./bin/hadoop distcp
 12/05/23 18:49:04 ERROR tools.DistCp: Invalid arguments: 
 java.lang.IllegalArgumentException: Target path not specified
   at org.apache.hadoop.tools.OptionsParser.parse(OptionsParser.java:86)
   at org.apache.hadoop.tools.DistCp.run(DistCp.java:102)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.tools.DistCp.main(DistCp.java:368)
 Invalid arguments: Target path not specified
 {noformat}





[jira] [Created] (HADOOP-8616) ViewFS configuration requires a trailing slash

2012-07-23 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8616:
---

 Summary: ViewFS configuration requires a trailing slash
 Key: HADOOP-8616
 URL: https://issues.apache.org/jira/browse/HADOOP-8616
 Project: Hadoop Common
  Issue Type: Bug
  Components: viewfs
Affects Versions: 2.0.0-alpha, 0.23.0
Reporter: Eli Collins


If the viewfs config doesn't have a trailing slash, commands like the following 
fail:

{noformat}
bash-3.2$ hadoop fs -ls
-ls: Can not create a Path from an empty string
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [path ...]
{noformat}

We hit this problem with the following configuration because hdfs://ha-nn-uri 
does not have a trailing /.

{noformat}
  <property>
    <name>fs.viewfs.mounttable.foo.link./nameservices/ha-nn-uri</name>
    <value>hdfs://ha-nn-uri</value>
  </property>
{noformat}
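
For reference, the same mount table entry with the trailing slash added, which 
avoids the error (i.e. the workaround until the config handling is made more 
forgiving):

{noformat}
  <property>
    <name>fs.viewfs.mounttable.foo.link./nameservices/ha-nn-uri</name>
    <value>hdfs://ha-nn-uri/</value>
  </property>
{noformat}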





Common and hdfs Jenkins jobs now running tests

2012-07-19 Thread Eli Collins
Hey gang,

The tests were disabled on the common and hdfs jenkins jobs for some
reason. This was hiding test failures in the tests that are not run by
test-patch (eg hadoop-dist, see HDFS-3690).

I've re-enabled the tests on these jobs and filed HADOOP-8610 to get
test-patch on Hadoop to run the root projects (eg hadoop-tools).

Thanks,
Eli


[jira] [Created] (HADOOP-8598) Server-side Trash

2012-07-15 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8598:
---

 Summary: Server-side Trash
 Key: HADOOP-8598
 URL: https://issues.apache.org/jira/browse/HADOOP-8598
 Project: Hadoop Common
  Issue Type: New Feature
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Critical


There are a number of problems with Trash that continue to result in permanent 
data loss for users. The primary reasons trash is not used:

- Trash is configured client-side and not enabled by default.
- Trash is shell-only. FileSystem, WebHDFS, HttpFs, etc never use trash.
- If trash fails, for example, because we can't create the trash directory or 
the move itself fails, trash is bypassed and the data is deleted.

Trash was designed as a feature to help end users via the shell, however in my 
experience the primary use of trash is to help administrators implement data 
retention policies (this was also the motivation for HADOOP-7460).  One could 
argue that (periodic read-only) snapshots are a better solution to this 
problem, however snapshots are not slated for Hadoop 2.x and trash is 
complementary to snapshots (and backup) - eg you may create and delete data 
within your snapshot or backup window - so it makes sense to revisit trash's 
design. I think it's worth bringing trash's functionality in line with what 
users need.

I propose we enable trash on a per-filesystem basis and implement it 
server-side. Ie trash becomes an HDFS feature enabled by administrators. 
Because the trash emptier lives in HDFS and users already have a per-filesystem 
trash directory we're mostly there already. The design preference from 
HADOOP-2514 was for trash to be implemented in user code however (a) in light 
of these problems, (b) we have a lot more user-facing APIs than the shell and 
(c) clients increasingly span file systems (via federation and symlinks) this 
design choice makes less sense. This is why we already use a per-filesystem 
trash/home directory instead of the user's client-configured one - otherwise 
trash would not work because renames can't span file systems.

In short, HDFS trash would work similarly to how it does today, the difference 
is that client delete APIs would result in a rename into trash (ala 
TrashPolicyDefault#moveToTrash) if trash is enabled. Like today it would be 
renamed to the trash directory on the file system where the file being removed 
resides. The primary difference is that enablement and policy are configured 
server-side by administrators and are applied regardless of the API used to access 
the filesystem. The one exception to this is that I think we should continue to 
support the explicit skipTrash shell option. The rationale for skipTrash 
(HADOOP-6080) is that a move to trash may fail in cases where a rm may not, if 
a user has a home directory quota and does a rmr /tonsOfData, for example. 
Without a way to bypass this the user has no way (unless we revisit quotas, 
permissions or trash paths) to remove a directory they have permissions to 
remove without getting their quota adjusted by an admin. The skip trash API can 
be implemented by adding an explicit FileSystem API that bypasses trash and 
modifying the shell to use it when skipTrash is enabled. Given that users must 
explicitly specify skipTrash the API is less error prone. We could have the 
shell ask for confirmation and annotate the API private to FsShell to discourage 
programmatic use. This is not ideal but can be done compatibly (unlike 
redefining quotas, permissions or trash paths).
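
As a rough illustration of the proposal (a hypothetical sketch only -- the 
class, paths and wiring here are made up, not HDFS code), every delete API 
would funnel through a server-side check that renames into the per-filesystem 
trash unless trash is disabled or skipTrash was explicitly requested:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch of server-side trash; names and layout are illustrative.
class ServerSideTrashSketch {
  private final FileSystem fs;
  private final boolean trashEnabled;   // configured server-side by admins

  ServerSideTrashSketch(FileSystem fs, boolean trashEnabled) {
    this.fs = fs;
    this.trashEnabled = trashEnabled;
  }

  // All client delete paths (shell, WebHDFS, HttpFs, FileSystem) land here.
  boolean delete(Path p, boolean recursive, boolean skipTrash) throws IOException {
    if (trashEnabled && !skipTrash) {
      // Rename into the trash directory on the same filesystem,
      // ala TrashPolicyDefault#moveToTrash.
      Path trashCurrent = new Path(fs.getHomeDirectory(), ".Trash/Current");
      fs.mkdirs(trashCurrent);
      return fs.rename(p, new Path(trashCurrent, p.getName()));
    }
    // Trash disabled, or the caller explicitly asked to bypass it.
    return fs.delete(p, recursive);
  }
}
{code}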

In terms of compatibility, while this proposal is technically an incompatible 
change (client side configuration that disables trash and uses skipTrash with a 
previous FsShell release will now both be ignored if server-side trash is 
enabled, and non-HDFS file systems would need to make similar changes) I think 
it's worth targeting for Hadoop 2.x given that the new semantics preserve the 
current semantics. In 2.x I think we should preserve FsShell based trash and 
support both it and server-side trash (defaults to disabled). For trunk/3.x I 
think we should remove the FsShell based trash entirely and enable server-side 
trash by default.





[jira] [Created] (HADOOP-8594) Upgrade to findbugs 2

2012-07-13 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8594:
---

 Summary: Upgrade to findbugs 2
 Key: HADOOP-8594
 URL: https://issues.apache.org/jira/browse/HADOOP-8594
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins


Harsh recently ran findbugs 2 (instead of 1.3.9 which is what jenkins runs) and 
it showed thousands of warnings (they've made a lot of progress in findbugs 
releases). We should upgrade to findbugs 2 and fix these. 





[jira] [Created] (HADOOP-8595) Create security page in the docs and update the ASF page to link to it

2012-07-13 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8595:
---

 Summary: Create security page in the docs and update the ASF page 
to link to it
 Key: HADOOP-8595
 URL: https://issues.apache.org/jira/browse/HADOOP-8595
 Project: Hadoop Common
  Issue Type: Task
Reporter: Eli Collins


We should (1) create a http://hadoop.apache.org/security.html page with info on how 
to report a security problem and a pointer to secur...@hadoop.apache.org, and (2) 
get it linked from the main ASF page: 
http://www.apache.org/security/projects.html.





[jira] [Created] (HADOOP-8596) TestFileAppend4#testCompleteOtherLeaseHoldersFile times out

2012-07-13 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8596:
---

 Summary: TestFileAppend4#testCompleteOtherLeaseHoldersFile times 
out
 Key: HADOOP-8596
 URL: https://issues.apache.org/jira/browse/HADOOP-8596
 Project: Hadoop Common
  Issue Type: Task
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins


Saw TestFileAppend4#testCompleteOtherLeaseHoldersFile time out on a recent 
jenkins run. Perhaps we need to bump the timeout?


{noformat}
Error Message

test timed out after 6 milliseconds
Stacktrace

java.lang.Exception: test timed out after 6 milliseconds
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1186)
at java.lang.Thread.join(Thread.java:1239)
at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.join(BPServiceActor.java:477)
at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.join(BPOfferService.java:259)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.shutDownAll(BlockPoolManager.java:117)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1101)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1343)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1323)
at 
org.apache.hadoop.hdfs.TestFileAppend4.testCompleteOtherLeaseHoldersFile(TestFileAppend4.java:289)
{noformat}
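
If bumping the timeout is the right fix, it is just the JUnit annotation on the 
test (sketch below; the value is illustrative, not a recommendation):

{code}
import org.junit.Test;

public class TimeoutSketch {
  // Sketch: raise the per-test timeout via the annotation.
  @Test(timeout = 120000)
  public void testCompleteOtherLeaseHoldersFile() throws Exception {
    // test body unchanged
  }
}
{code}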






[jira] [Resolved] (HADOOP-7836) TestSaslRPC#testDigestAuthMethodHostBasedToken fails with hostname localhost.localdomain

2012-07-12 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HADOOP-7836.
-

  Resolution: Fixed
   Fix Version/s: 1.2.0
Target Version/s:   (was: 1.1.0)

I've committed this, thanks Daryn!

Do we need a jira for the same test forward ported to trunk?

 TestSaslRPC#testDigestAuthMethodHostBasedToken fails with hostname 
 localhost.localdomain
 

 Key: HADOOP-7836
 URL: https://issues.apache.org/jira/browse/HADOOP-7836
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc, test
Affects Versions: 1.1.0
Reporter: Eli Collins
Priority: Minor
 Fix For: 1.2.0

 Attachments: HADOOP-7836.patch, hadoop-7836.txt


 TestSaslRPC#testDigestAuthMethodHostBasedToken fails on branch-1 on some 
 hosts.
 null expected:<localhost[]> but was:<localhost[.localdomain]>
 junit.framework.ComparisonFailure: null expected:<localhost[]> but 
 was:<localhost[.localdomain]>
 null expected:<[localhost]> but was:<[eli-thinkpad]>
 junit.framework.ComparisonFailure: null expected:<[localhost]> but 
 was:<[eli-thinkpad]>





[jira] [Created] (HADOOP-8592) Hadoop-auth should use o.a.h.util.Time methods instead of System#currentTimeMillis

2012-07-12 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8592:
---

 Summary: Hadoop-auth should use o.a.h.util.Time methods instead of 
System#currentTimeMillis
 Key: HADOOP-8592
 URL: https://issues.apache.org/jira/browse/HADOOP-8592
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Priority: Minor


HDFS-3641 moved HDFS' Time methods to common so they can be used by MR (and 
eventually others). We should replace uses of System#currentTimeMillis in hadoop-auth 
with Time#now (or Time#monotonicNow when computing intervals, eg to sleep).
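
For example (a small sketch, assuming the o.a.h.util.Time methods added by 
HDFS-3641):

{code}
import org.apache.hadoop.util.Time;

public class TimeSketch {
  public static void main(String[] args) throws InterruptedException {
    // Wall-clock timestamp, replacing System.currentTimeMillis():
    long created = Time.now();

    // Intervals should use the monotonic clock so wall-clock jumps
    // (e.g. NTP corrections) don't skew them:
    long start = Time.monotonicNow();
    Thread.sleep(10);
    long elapsedMs = Time.monotonicNow() - start;
    System.out.println(created + " " + elapsedMs);
  }
}
{code}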





[jira] [Resolved] (HADOOP-8587) HarFileSystem access of harMetaCache isn't threadsafe

2012-07-11 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HADOOP-8587.
-

  Resolution: Fixed
   Fix Version/s: 2.0.1-alpha
  1.2.0
Target Version/s:   (was: 1.2.0, 2.0.1-alpha)
Hadoop Flags: Reviewed

Thanks for the review Daryn. I've committed this to trunk and merged to 
branch-2 and branch-1.

 HarFileSystem access of harMetaCache isn't threadsafe
 -

 Key: HADOOP-8587
 URL: https://issues.apache.org/jira/browse/HADOOP-8587
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 1.0.0, 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor
 Fix For: 1.2.0, 2.0.1-alpha

 Attachments: hadoop-8587-b1.txt, hadoop-8587.txt, hadoop-8587.txt


 HarFileSystem's use of the static harMetaCache map is not threadsafe. Credit 
 to Todd for pointing this out.





Re: a bug in ViewFs tests

2012-07-11 Thread Eli Collins
Hey Andrew.  I'd open a new jira, and thanks for the find btw!

Thanks,
Eli

On Wed, Jul 11, 2012 at 4:08 PM, Andrey Klochkov
akloch...@griddynamics.com wrote:
 Hi,
 I noticed that the fix done in HADOOP-8036 (failing ViewFs tests) was
 reverted later when resolving HADOOP-8129, so the bug exists both in
 0.23 and 2.0. I'm going to provide an alternative fix. Should I reopen
 HADOOP-8036 or create a new one instead? Thanks.

 --
 Andrey Klochkov


[jira] [Resolved] (HADOOP-8584) test-patch.sh should not immediately exit when no tests are added or modified

2012-07-10 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HADOOP-8584.
-

   Resolution: Fixed
Fix Version/s: 3.0.0
 Hadoop Flags: Reviewed

I've committed this to trunk.

 test-patch.sh should not immediately exit when no tests are added or modified
 -

 Key: HADOOP-8584
 URL: https://issues.apache.org/jira/browse/HADOOP-8584
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 3.0.0

 Attachments: HADOOP-8584.001.patch


 test-patch.sh should not immediately exit when no tests are added or modified.
 Although it's good to note whether or not a patch introduces or modifies 
 tests, it's not good to abort the Jenkins patch process if it did not.





[jira] [Created] (HADOOP-8587) HarFileSystem access of harMetaCache isn't threadsafe

2012-07-10 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8587:
---

 Summary: HarFileSystem access of harMetaCache isn't threadsafe
 Key: HADOOP-8587
 URL: https://issues.apache.org/jira/browse/HADOOP-8587
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins


HarFileSystem's use of the static harMetaCache map is not threadsafe. Credit to 
Todd for pointing this out.
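
One straightforward way to fix this class of problem (a sketch, not necessarily 
the patch that was committed): back the static cache with a concurrent map and 
use putIfAbsent so concurrent initializers race safely.

{code}
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: thread-safe static cache keyed by archive URI. The value type
// is a placeholder for the actual har metadata class.
class HarMetaCacheSketch {
  private static final Map<URI, Object> harMetaCache =
      new ConcurrentHashMap<URI, Object>();

  static Object getOrCreate(URI archive) {
    Object meta = harMetaCache.get(archive);
    if (meta == null) {
      Object created = new Object();            // load the metadata here
      Object prev = harMetaCache.putIfAbsent(archive, created);
      meta = (prev == null) ? created : prev;   // first writer wins
    }
    return meta;
  }
}
{code}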





[jira] [Resolved] (HADOOP-8554) KerberosAuthenticator should use the configured principal

2012-07-06 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HADOOP-8554.
-

Resolution: Invalid

You're right, thanks for the explanation, I didn't realize the principal config 
was server-side only. Also, the reason I hit this with webhdfs and not hftp is 
that hftp doesn't support SPNEGO.

 KerberosAuthenticator should use the configured principal
 -

 Key: HADOOP-8554
 URL: https://issues.apache.org/jira/browse/HADOOP-8554
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 1.0.0, 2.0.0-alpha, 2.0.1-alpha, 3.0.0
Reporter: Eli Collins
  Labels: security, webconsole

 In KerberosAuthenticator we construct the principal as follows:
 {code}
 String servicePrincipal = "HTTP/" + KerberosAuthenticator.this.url.getHost();
 {code}
 Seems like we should use the configured 
 hadoop.http.authentication.kerberos.principal instead right?
 I hit this issue as a distcp using webhdfs://localhost fails because 
 HTTP/localhost is not in the kerb DB but using webhdfs://eli-thinkpad works 
 because HTTP/eli-thinkpad is (and is my configured principal). distcp using 
 Hftp://localhost with the same config works so it looks like this check is 
 webhdfs specific for some reason (webhdfs is using spnego and hftp is not?).





[jira] [Created] (HADOOP-8565) AuthenticationFilter#doFilter warns unconditionally when using SPNEGO

2012-07-05 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8565:
---

 Summary: AuthenticationFilter#doFilter warns unconditionally when 
using SPNEGO 
 Key: HADOOP-8565
 URL: https://issues.apache.org/jira/browse/HADOOP-8565
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.0-alpha, 1.0.3
Reporter: Eli Collins


The following code in AuthenticationFilter#doFilter throws 
AuthenticationException (and warns) unconditionally because 
KerberosAuthenticator#authenticate returns null if SPNEGO is used.

{code}
  token = authHandler.authenticate(httpRequest, httpResponse);
  ...
  if (token != null) { ... } else {
throw new AuthenticationException
  }
{code}
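
A sketch of the kind of change that would avoid the unconditional warning 
(hypothetical, not the actual filter code): treat a null token as an error only 
when the handler has not already sent a response, since during SPNEGO the 
handler legitimately returns null after writing the negotiation challenge.

{code}
// Hypothetical sketch, mirroring the snippet above: a null token is fine if
// the auth handler already committed a response (e.g. the SPNEGO challenge).
  token = authHandler.authenticate(httpRequest, httpResponse);
  if (token != null) {
    // proceed with the filter chain as before
  } else if (!httpResponse.isCommitted()) {
    throw new AuthenticationException("Anonymous request rejected");
  }
  // else: SPNEGO negotiation round-trip in progress, nothing to warn about
{code}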






[jira] [Created] (HADOOP-8568) DNS#reverseDns fails on IPv6 addresses

2012-07-05 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8568:
---

 Summary: DNS#reverseDns fails on IPv6 addresses
 Key: HADOOP-8568
 URL: https://issues.apache.org/jira/browse/HADOOP-8568
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins


DNS#reverseDns assumes hostIp is a v4 address (4 parts separated by dots) and 
blows up if given a v6 address:

{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
at org.apache.hadoop.net.DNS.reverseDns(DNS.java:79)
at org.apache.hadoop.net.DNS.getHosts(DNS.java:237)
at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:340)
at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:358)
at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:337)
at org.apache.hadoop.hbase.master.HMaster.init(HMaster.java:235)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1649)
{noformat}
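
For reference, a sketch of building the reverse-lookup (PTR) name for both 
address families (illustrative only, not the actual DNS.java fix): IPv4 
reverses the four octets under in-addr.arpa, while IPv6 reverses the 32 
nibbles of the address under ip6.arpa.

{code}
import java.net.Inet6Address;
import java.net.InetAddress;

public class ReverseNameSketch {
  // Build the PTR query name for an address, handling both v4 and v6.
  static String reverseName(InetAddress addr) {
    byte[] bytes = addr.getAddress();
    StringBuilder sb = new StringBuilder();
    if (addr instanceof Inet6Address) {
      // IPv6: low nibble then high nibble of each byte, last byte first.
      for (int i = bytes.length - 1; i >= 0; i--) {
        int b = bytes[i] & 0xff;
        sb.append(Integer.toHexString(b & 0x0f)).append('.')
          .append(Integer.toHexString((b >> 4) & 0x0f)).append('.');
      }
      return sb.append("ip6.arpa").toString();
    }
    // IPv4: reverse the dotted quads under in-addr.arpa.
    for (int i = bytes.length - 1; i >= 0; i--) {
      sb.append(bytes[i] & 0xff).append('.');
    }
    return sb.append("in-addr.arpa").toString();
  }

  public static void main(String[] args) throws Exception {
    // 10.1.2.3    -> 3.2.1.10.in-addr.arpa
    System.out.println(reverseName(InetAddress.getByName("10.1.2.3")));
    // 2001:db8::1 -> nibble-reversed name ending in ip6.arpa
    System.out.println(reverseName(InetAddress.getByName("2001:db8::1")));
  }
}
{code}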






[jira] [Created] (HADOOP-8554) KerberosAuthenticator should use the configured principal

2012-07-03 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8554:
---

 Summary: KerberosAuthenticator should use the configured principal
 Key: HADOOP-8554
 URL: https://issues.apache.org/jira/browse/HADOOP-8554
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 1.0.0
Reporter: Eli Collins


In KerberosAuthenticator we construct the principal as follows:

{code}
String servicePrincipal = "HTTP/" + KerberosAuthenticator.this.url.getHost();
{code}

Seems like we should use the configured 
hadoop.http.authentication.kerberos.principal instead right?

I hit this issue as a distcp using webhdfs://localhost fails because 
HTTP/localhost is not in the kerb DB but using webhdfs://eli-thinkpad works 
because HTTP/eli-thinkpad is (and is my configured principal). distcp using 
Hftp://localhost with the same config works so it looks like this check is 
webhdfs specific for some reason (webhdfs is using spnego and hftp is not?).





[jira] [Resolved] (HADOOP-8546) missing hdfs user guide and command pages from hadoop project page

2012-07-02 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HADOOP-8546.
-

Resolution: Duplicate

Per HDFS-3458 the forrest docs needed to be ported to APT.

 missing hdfs user guide and command pages from hadoop project page  
 

 Key: HADOOP-8546
 URL: https://issues.apache.org/jira/browse/HADOOP-8546
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Jason Shih
Priority: Trivial

 Can't find the webmaster contact, so filing this issue in JIRA.
 The two following links are missing from the Hadoop project page; maybe you 
 can help restore them. Thanks 
 http://hadoop.apache.org/common/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs_user_guide.html
  
 http://hadoop.apache.org/common/docs/current/hadoop-project-dist/hadoop-common/commands_manual.html
 Cheers,
 Jason





Re: Clarification about 2.* branches

2012-06-06 Thread Eli Collins
On Wed, Jun 6, 2012 at 2:10 PM, Eli Collins e...@cloudera.com wrote:
 Hey Vinod,

 Out of curiosity, why delete the branch?  Might make spelunking for
 svn revs that correspond to a release hard.  If we're deleting old
 release branches might as well delete the others (branch-0.23.0,
 branch-0.23.0-rc0, branch-0.23.1, branch-0.23.2, etc) as well?

Oops, shouldn't have listed branch-0.23.2 as 0.23.2 isn't out yet.



 Thanks,
 Eli

 On Wed, Jun 6, 2012 at 1:52 PM, Vinod Kumar Vavilapalli
 vino...@hortonworks.com wrote:


 I checked with Nicholas also who did the only merge into branch-2.0.0-alpha 
 after the release and got a confirmation that I can delete the branch.

 Removing the branch now.

 Thanks,
 +Vinod


 On Jun 5, 2012, at 5:45 PM, Arun C Murthy wrote:

 +1 for blowing away branch-2.0.0-alpha now, we have the release tag if 
 necessary.

 thanks,
 Arun

 On Jun 5, 2012, at 4:40 PM, Vinod Kumar Vavilapalli wrote:

 Hi,

 I see two branches for 2.* now: branch-2 and branch-2.0.0-alpha and two 
 sections in CHANGES.txt: 2.0.1-alpha unreleased and 2.0.0-alpha released. 
 It is a little confusing, seems like branch-2.0.0-alpha was a staging 
 branch for the release and can be thrown away.

 Anyone committing patches to both branch-2 and branch-2.0.0-alpha? If not, 
 I'll blow away the later and rename the section in CHANGES.txt to branch-2.

 Thanks,
 +Vinod

 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/





[jira] [Resolved] (HADOOP-8430) Backport new FileSystem methods introduced by HADOOP-8014 to branch-1

2012-06-04 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HADOOP-8430.
-

  Resolution: Fixed
   Fix Version/s: 1.1.0
Target Version/s:   (was: 1.1.0)

Thanks Daryn. I've committed this to branch-1 and merged to branch-1.1

 Backport new FileSystem methods introduced by HADOOP-8014 to branch-1 
 --

 Key: HADOOP-8430
 URL: https://issues.apache.org/jira/browse/HADOOP-8430
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 1.1.0

 Attachments: hadoop-8430-1.txt


 Per HADOOP-8422 let's backport the new FileSystem methods from HADOOP-8014 to 
 branch-1 so users can transition over in Hadoop 1.x releases, which helps 
 upstream projects like HBase work against federation (see HBASE-6067). 





Hadoop view on Jenkins

2012-06-04 Thread Eli Collins
Hey gang,

fyi, I created a view in jenkins so you can see all the enabled Hadoop builds:
https://builds.apache.org/view/Hadoop

Thanks,
Eli


[jira] [Created] (HADOOP-8463) hadoop.security.auth_to_local needs a key definition and doc

2012-05-31 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8463:
---

 Summary: hadoop.security.auth_to_local needs a key definition and 
doc 
 Key: HADOOP-8463
 URL: https://issues.apache.org/jira/browse/HADOOP-8463
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins


hadoop.security.auth_to_local should be defined in 
CommonConfigurationKeysPublic.java, and the uses of the raw string in 
common and hdfs should be updated to use the key. 
Its definition in core-site.xml should also be updated with a description.
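
Roughly, the key definition would look like the existing entries in that file 
(sketch; the constant name is illustrative):

{code}
// In CommonConfigurationKeysPublic.java (constant name illustrative):
public static final String HADOOP_SECURITY_AUTH_TO_LOCAL =
    "hadoop.security.auth_to_local";

// Call sites then reference the key rather than the raw string, eg:
//   conf.get(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTH_TO_LOCAL);
{code}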






Re: cmake

2012-05-23 Thread Eli Collins
On Wed, May 23, 2012 at 8:58 AM, Owen O'Malley omal...@apache.org wrote:
 After a quick look at cmake, it seems reasonable. Please use feature tests
 rather than OS tests. (By that I mean that if Solaris needs foobar.h and
 RHEL needs sys/foobar.h to get the definition for foobar, the code should
 ifdef on whether it needs sys/foobar.h, not on whether it is Solaris.)

 On a related note, on what platforms does the native code currently compile
 and work?

 I've compiled and tested on:
  RHEL  CentOS 5
  RHEL  CentOS 6

 I've heard of it compiling on:
  Solaris

 What other ones are out there?

Also multiple versions of OUL, SLES, Debian, and Ubuntu.


[jira] [Created] (HADOOP-8430) Backport new FileSystem methods introduced by HADOOP-8014 to branch-1

2012-05-23 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8430:
---

 Summary: Backport new FileSystem methods introduced by HADOOP-8014 
to branch-1 
 Key: HADOOP-8430
 URL: https://issues.apache.org/jira/browse/HADOOP-8430
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Eli Collins
Assignee: Eli Collins


Per HADOOP-8422 let's backport the new FileSystem methods from HADOOP-8014 to 
branch-1 so users can transition over in Hadoop 1.x releases, which helps 
upstream projects like HBase work against federation (see HBASE-6067). 





[jira] [Created] (HADOOP-8431) Running distcp wo args throws IllegalArgumentException

2012-05-23 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8431:
---

 Summary: Running distcp wo args throws IllegalArgumentException
 Key: HADOOP-8431
 URL: https://issues.apache.org/jira/browse/HADOOP-8431
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins


Running distcp w/o args results in the following:

{noformat}
hadoop-3.0.0-SNAPSHOT $ ./bin/hadoop distcp
12/05/23 18:49:04 ERROR tools.DistCp: Invalid arguments: 
java.lang.IllegalArgumentException: Target path not specified
at org.apache.hadoop.tools.OptionsParser.parse(OptionsParser.java:86)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:102)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:368)
Invalid arguments: Target path not specified
{noformat}





[jira] [Resolved] (HADOOP-7069) Replace forrest with supported framework

2012-05-21 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HADOOP-7069.
-

   Resolution: Fixed
Fix Version/s: (was: 0.24.0)
   2.0.0

 Replace forrest with supported framework
 

 Key: HADOOP-7069
 URL: https://issues.apache.org/jira/browse/HADOOP-7069
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation
Reporter: Jakob Homan
 Fix For: 2.0.0


 It's time to burn down the forrest.  Apache forrest, which is used to 
 generate the documentation for all three subprojects, has not had a release 
 in several years (0.8, the version we use was released April 18, 2007), and 
 requires JDK5, which was EOL'ed in November 2009.  Since it doesn't seem 
 likely Forrest will be developed any more, and JDK5 is not shipped with 
 recent OSX versions, or included by default in most linux distros, we should 
 look to find a new documentation system and convert the current docs to it.





[jira] [Created] (HADOOP-8422) FileSystem#getDefaultBlockSize

2012-05-21 Thread Eli Collins (JIRA)
Eli Collins created HADOOP-8422:
---

 Summary: FileSystem#getDefaultBlockSize 
 Key: HADOOP-8422
 URL: https://issues.apache.org/jira/browse/HADOOP-8422
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Eli Collins








Re: cmake

2012-05-21 Thread Eli Collins
+1   having a build tool that supports multiple platforms is worth the
dependency. I've also had good experiences with cmake.


On Mon, May 21, 2012 at 6:00 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:
 Hi all,

 We'd like to use CMake instead of autotools to build native (C/C++) code in
 Hadoop.  There are a lot of reasons to want to do this.  For one thing, it is
 not feasible to use autotools on the Windows platform, because it depends on
 UNIX shell scripts, the m4 macro processor, and some other pieces of
 infrastructure which are not present on Windows.

 For another thing, CMake builds are substantially simpler and faster, because
 there is only one layer of generated code.  With autotools, you have automake
 generating m4 code which autoconf reads, which it uses to generate a UNIX 
 shell
 script, which then generates another UNIX shell script, which eventually
 generates Makefiles.  CMake simply generates Makefiles out of CMakeLists.txt
 files-- much simpler to understand and debug, and much faster.
 CMake is a lot easier to learn.

 automake error messages can be very, very confusing.  This is because you are
 essentially debugging a pile of shell scripts and macros, rather than a
 coherent whole.  So you see error messages like autoreconf: cannot empty
 /tmp/ar0.4849: Is a directory or Can't locate object method path via
 package Autom4te... and so forth.  CMake error messages come from the CMake
 application and they almost always immediately point you to the problem.

 From a build point of view, the net result of adopting CMake would be that you
 would no longer need automake and related programs installed to build the
 native parts of Hadoop.  Instead, you would need CMake installed.  CMake is
 packaged by Red Hat, even in RHEL5, so it shouldn't be difficult to install
 locally.  It's also available for Mac OS X and Windows, as I mentioned 
 earlier.

 The JIRA for this work is at https://issues.apache.org/jira/browse/HADOOP-8368
 Thanks for reading.

 sincerely,
 Colin

