Re: [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

2012-11-22 Thread Steve Loughran
On 22 November 2012 02:40, Chris Nauroth cnaur...@hortonworks.com wrote:


 It seems like the trickiest issue is preservation of permissions and
 symlinks in tar files.  I suspect that any JVM-based solution like custom
 Maven plugins, Groovy, or jtar would be limited in this respect.  According
 to Ant documentation, it's a JDK limitation, so I suspect all of these
 would have the same problem.  I haven't tried any of them though.  (If
 there was a feasible solution, then Ant likely would have incorporated it
 long ago.)  If anyone wants to try though, we might learn something from
 that.

 Thank you,
 --Chris


You are limited by what File.canRead(), canWrite() and canExecute() tell you.

The absence of a way to detect file permissions in Java is because of the
lowest-common-denominator approach of the Java filesystem APIs, which have
to support FAT32 (odd case logic, no permissions or symlinks), NTFS (odd
case logic, ACLs rather than permissions, symlinks historically very hard
to create) and HFS+ (a case-insensitive Unix filesystem!) as well as
classic Unix filesystems.
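
For contrast, here is a rough sketch of what the same check looks like in
Python, where the stat module exposes the full POSIX mode bits and symlinks
are detectable (the paths here are invented for illustration):

#!/usr/bin/env python
# Rough sketch: read the full POSIX mode bits and spot symlinks, which is
# the information java.io.File does not hand you. Paths are hypothetical.
import os
import stat

def describe(path):
    st = os.lstat(path)                    # lstat so symlinks aren't followed
    perms = stat.S_IMODE(st.st_mode)       # e.g. 0755, 0644
    return oct(perms), stat.S_ISLNK(st.st_mode)

if __name__ == '__main__':
    for p in ('bin/hadoop', 'conf/hadoop-env.sh'):
        if os.path.lexists(p):
            print("%s %s" % (p, describe(p)))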

Ant's tarfileset elements in the tar task let you specify permissions on
the filesets you pull into the tar; the permissions are applied
cross-platform, which is the other reason why you declare them there: you
can generate proper tar files even if you build on a Windows box.

Symlinks are problematic: even detecting them cross-platform is pretty
unreliable. To really handle them you'd need to add a new symlinkfileset
entity for the tar task that would take the link declaration. I can imagine
how to do that, and if it were stuck into the Hadoop tools JAR it wouldn't
even depend on a new version of Ant.
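
By way of comparison, here is a rough sketch of the tar side using Python's
stdlib tarfile module, where you can declare the mode and add symlink
entries explicitly, whatever filesystem the build runs on (the file and
archive names are invented):

#!/usr/bin/env python
# Sketch only: build a tar with declared permissions and a symlink entry.
import tarfile

def add_with_mode(tar, name, arcname, mode):
    # add a regular file, forcing its permission bits in the archive
    info = tar.gettarinfo(name, arcname=arcname)
    info.mode = mode                      # declared, not read from the FS
    with open(name, 'rb') as f:
        tar.addfile(info, f)

def add_symlink(tar, arcname, target):
    # add a symlink entry without needing one on the local filesystem
    info = tarfile.TarInfo(arcname)
    info.type = tarfile.SYMTYPE
    info.linkname = target
    tar.addfile(info)

if __name__ == '__main__':
    tar = tarfile.open('hadoop-dist.tar.gz', 'w:gz')
    add_with_mode(tar, 'bin/hadoop', 'hadoop-dist/bin/hadoop', 0o755)
    add_symlink(tar, 'hadoop-dist/conf', 'etc/hadoop')
    tar.close()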

Maven just adds extra layers in the way.

-Steve


Re: [PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack

2012-11-22 Thread Steve Loughran
On 21 November 2012 19:15, Matt Foley ma...@apache.org wrote:

 This discussion started in


 Those of us involved in the branch-1-win port of Hadoop to Windows without
 use of Cygwin have faced the issue of frequent use of shell scripts
 throughout the system, both at build time (e.g. the utility
 saveVersion.sh) and at run time (config files like hadoop-env.sh and the
 start/stop scripts in bin/*).  Similar usages exist throughout the Hadoop
 stack, in all projects.

 The vast majority of these shell scripts do not do anything platform
 specific; they can be expressed in a posix-conforming way.  Therefore, it
 seems to us that it makes sense to start using a cross-platform scripting
 language, such as python, in place of shell for these purposes.  For those
 rare occasions where platform-specific functionality really is needed,
 python also supports quite a lot of platform-specific functionality on both
 Linux and Windows; but where that is inadequate, one could still
 conditionally invoke a platform-specific module written in shell (for
 Linux/*nix) or powershell or bat (for Windows).

 The primary motive for moving to a cross-platform scripting language is
 maintainability.  The alternative would be to maintain two complete suites
 of scripts, one for Linux and one for Windows (and perhaps others in the
 future).  We want to avoid the need to update dual modules in two different
 languages when functionality changes, especially given that many Linux
 developers are not familiar with powershell or bat, and many Windows
 developers are not familiar with shell or bash.
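
The conditional fallback described above only takes a few lines of Python;
a rough sketch, with the helper-script names invented for illustration:

import os
import subprocess

def set_native_acls(path):
    # hypothetical platform-specific step that plain Python can't cover
    if os.name == 'nt':
        cmd = ['powershell', '-File', 'set-acls.ps1', path]
    else:
        cmd = ['sh', 'set-acls.sh', path]
    return subprocess.call(cmd)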


I'd argue that a lot of Hadoop java developers aren't that familiar with
bash. It's only in the last six months that I've come to hate it properly.

In the Ant project, it was the launcher scripts that had the worst
bug-report:line ratio, thanks to:
 - variations in .sh behaviour, especially under Cygwin, but also on shells
that weren't bash (AIX, ...)
 - the need for the entire Unix command set to do real work
 - variants in the parameters/behaviour of those commands between Linux and
other widely used Unix systems (e.g. OS X)
 - lack of inclusion of the .sh scripts in the JUnit test suite
 - lack of understanding of bash.

In the Ant project we added a Python launcher in, what, 2001, based on the
Perl launcher supplied by one steve_l@users.sourceforge.


 For run-time, there is likely to be a lot more discussion.  Lots of folks,
 including me, aren't real happy with use of active scripts for
 configuration, and various others, including I believe some of the Bigtop
 folks, have issues with the way the start/stop scripts work.  Nevertheless,
 all those scripts exist today and are widely used.  And they present an
 impediment to porting to Windows-without-cygwin.


They're a maintenance and support cost on Unix too: too many scripts (even
more in YARN), weakly-nondeterministic logic for loading env variables,
especially between init.d and bin/hadoop, and not much in the way of
diagnostics. And, as with Ant, a relatively under-comprehended language
with no unit test coverage.

I'd replace the bash logic with Python for Unix development and maintenance
alone. You could put your logic into a shared Python module in
/usr/lib/hadoop/bin and have PyUnit test the inner functions as part of the
build and test process (Jenkins).
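
Something like this is all it would take; a minimal sketch, with the
module, function and test names invented, of one inner function plus its
PyUnit coverage:

# hadoop_env.py: hypothetical shared module, e.g. under /usr/lib/hadoop/bin
import os
import unittest

def build_classpath(entries):
    # join classpath entries with the platform separator, dropping
    # empties and duplicates while keeping order
    seen, result = set(), []
    for entry in entries:
        if entry and entry not in seen:
            seen.add(entry)
            result.append(entry)
    return os.pathsep.join(result)

class BuildClasspathTest(unittest.TestCase):
    def test_drops_empty_and_duplicate_entries(self):
        cp = build_classpath(['a.jar', '', 'b.jar', 'a.jar'])
        self.assertEqual(cp, os.pathsep.join(['a.jar', 'b.jar']))

if __name__ == '__main__':
    unittest.main()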



 Nothing about run-time use of scripts has changed significantly over the
 past three years, and I don't think we should hold up the Windows port
 while we have a huge discussion about issues that veer dangerously into
 religious/aesthetic domains. It would be fun to have that discussion, but I
 don't want this decision to be dependent on it!


With YARN it's got more complex: more env variables to set, and more
support calls when they aren't.


 So I propose that we go ahead and also approve python as a run-time
 dependency, and allow the inclusion of python scripts in place of current
 shell-based functionality.  The unpleasant alternative is to spawn a bunch
 of powershell scripts in parallel to the current shell scripts, with a very
 negative impact on maintainability.  The Windows port must, after all, be
 allowed to proceed.


+1 to any vote to allow .py at run time as a new feature

=0 to ripping out and replacing the existing .sh scripts with Python code:
even though I don't like the scripts, replacing them could be traumatic
downstream.

+1 to a gradual migration to .py for new code, starting with the YARN
scripts.


Please remove me from mailing list

2012-11-22 Thread Ajay Sharma
Dear Admin,

 

Please remove my email address “aja...@hotmail.com” from the
“common-dev@hadoop.apache.org” mailing list.

 

Regards.

Ajay Sharma


Sent from Windows Mail