Re: Yet another patch + how to check for stuff in shell scripts

2008-07-07 Thread Raphael Geissert
Russ Allbery wrote:

 Raphael Geissert [EMAIL PROTECTED]
 writes:
 
 Of course, $LEADIN isn't suitable for every check lintian is doing.  My
 main point was that several checks are actually very prone to false
 positives and false negatives.
 
 Yeah, that's part of the problem and one of the things that makes the code
 hard to maintain: some of the existing patterns are actually intentionally
 tuned to avoid false positives that we found in practice, whereas some are
 just randomly the way they are for no particularly good reason, and
 there's no good way to tell the difference.
 
 Lintian has a very effective feedback mechanism for false positives and I
 regularly look through lintian.d.o trying to find them and fix them, but
 it has no such feedback mechanism for false negatives.

Maybe experimental tags can be (ab)used with tags where 'greedier' regexes
are used, resulting in more data that can be used to improve the regexes.

 
 For a lot of the checks (such as the unescaped period in invoke-rc.d), the
 false positive risk is just theoretical since no one does that.  Which
 doesn't mean we shouldn't improve it, of course, if we can without
 creating problems for ourselves.
 
 I tried to design the module to be lintian friendly and I've even
 started to implement it in checks/scripts; but I have to confess that
 some of the checks being done right after the bashisms checks in
 checks/scripts are painful and require special care.
 
 Yeah, there's a lot of that in Lintian.
 
 The idea is to also perform all, or at least most, of the non-bashisms
 checks also in Devscripts::Checkbashisms and later separate the results
 and emit the necessary tags. That's what the add_hash method is for.
 
 I'll have to see the patch to really comment.  Please do bear in mind that
 we have to keep things running in stable, which means that adding new
 package dependencies, while possible, is annoying to deal with.  I'm not
 sure which way would be best -- to have devscripts depend on lintian and
 use a module it provides, to have lintian depend on devscripts and use a
 module it provides, or to package an independent package like Frank did
 with libparse-debianchangelog-perl (although that's a special case since
 it's also used by other things, like packages.debian.org).

I was actually thinking about shipping a copy of the module in lintian
itself. Because I don't think lintian wants to depend on devscripts, the
same way devscripts doesn't want to depend on another package just for
checkbashisms to work.

 
 What I think Lintian really needs is a module that can do the following:
 
 * Read a shell script and provide the caller with a stream of logical
   lines, dealing with things like backslash escapes so that each caller
   doesn't have to.
 
 * Recognize heredocs and filter them out of the command stream so that the
   caller only sees the contents of heredocs as arguments and not as
   possible separate commands.

Both already done in some way by checkbashisms.

 
 * Recognize conditionals so that the caller can get simple information
   like is this line inside a conditional or would it always execute?.
   This is important in Lintian in several places for false positive
   suppression; we often can assume that if code is conditional, the
   maintainer probably knows what they're doing and we shouldn't bug them
   about something we're unsure about.

IMO this is one of the most complex and complicated things to do, as it
requires more than a regexes logic.

 
 * Recognize common patterns that Lintian needs to be concerned with, such
   as whether a conditional probes for the existence of a command using the
   regular which / command -v methods.  There are a bunch of stock shell
   script parsing patterns like that in Lintian right now.

Something like $LEADIN?

 
 * Optionally do simple canonicalization on commands before returning them
   to the caller for analysis, doing things like removing unnecessary
   quotes or filtering out command prefixes that don't affect whether the
   command would run (such as your time example, or more commonly, things
   like @ and - in Makefiles).

Already done by checkbashisms.

 
 * Parse commands into a list of command and arguments so that code can do
   simple things like ask was the command given this argument without
   doing error-prone regex matching on the whole command line.

Not impossible, but won't be easy either.

 
 You could incorporate bashism checks into the same module, although to
 some extent a bashism checker is a consumer of this sort of parsing
 framework.

Sure; but I have to say that checkbashisms is currently are both things
together and the perl module allows bashisms + whatever you want.

 
 This is a chunk of work, yes, but bits of Lintian already do a lot of
 this, and the checkbashisms code in particular already does quite a lot.
 Working through the API to make sure the code is clean and readable is
 probably the hardest part.  My off-the-cuff 

Re: Yet another patch + how to check for stuff in shell scripts

2008-07-07 Thread Russ Allbery
Raphael Geissert [EMAIL PROTECTED] writes:
 Russ Allbery wrote:

 What I think Lintian really needs is a module that can do the following:
 
 * Read a shell script and provide the caller with a stream of logical
   lines, dealing with things like backslash escapes so that each caller
   doesn't have to.
 
 * Recognize heredocs and filter them out of the command stream so that the
   caller only sees the contents of heredocs as arguments and not as
   possible separate commands.

 Both already done in some way by checkbashisms.

Well, yes and no.  checkbashisms does this internally for its own checks,
but doesn't provide the necessary API to callers.  If your new module does
that, that's great!  But I want to be able to do these things *without*
checking for bashisms as well (obvious example: a maintainer script
written in bash), and I want to be able to do it from check scripts
separate from checks/scripts (which means I don't want to keep redoing the
bashism checks each time it's called).

So having a separate parsing layer is, I think, a requirement for Lintian
going forward to be able to do script checks in a sane fashion.  If we get
that via your work on checkbashisms, that's fantastic -- it just needs to
provide the necessary API.

 * Recognize conditionals so that the caller can get simple information
   like is this line inside a conditional or would it always execute?.
   This is important in Lintian in several places for false positive
   suppression; we often can assume that if code is conditional, the
   maintainer probably knows what they're doing and we shouldn't bug them
   about something we're unsure about.

 IMO this is one of the most complex and complicated things to do, as it
 requires more than a regexes logic.

I believe you are overestimating the difficulty.

 * Recognize common patterns that Lintian needs to be concerned with, such
   as whether a conditional probes for the existence of a command using the
   regular which / command -v methods.  There are a bunch of stock shell
   script parsing patterns like that in Lintian right now.

 Something like $LEADIN?

I don't think it's related, but I might be missing something.  Look at
checks/menus for an example of the kind of code that needs to be made
generic.

 * Parse commands into a list of command and arguments so that code can do
   simple things like ask was the command given this argument without
   doing error-prone regex matching on the whole command line.

 Not impossible, but won't be easy either.

Here too I think you're seriously overestimating the difficulty.

 Sure; but I have to say that checkbashisms is currently are both things
 together and the perl module allows bashisms + whatever you want.

If it provides an API to do what's mentioned above, that's great.

-- 
Russ Allbery ([EMAIL PROTECTED])   http://www.eyrie.org/~eagle/


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: Yet another patch + how to check for stuff in shell scripts

2008-07-06 Thread Russ Allbery
Russ Allbery [EMAIL PROTECTED] writes:

 --- a/checks/scripts
 +++ b/checks/scripts
 @@ -697,9 +697,12 @@ while (SCRIPTS) {
   $line =~ s/(^|[^\\\'](?:)*)\(?:\\.|[^\\\])+\/$1/g;
   for my $re (@bashism_regexs) {
   if ($line =~ m/($re)/) {
 - $found = 1;
   ($match) = m/($re)/;
 - last;
 + unless ($match =~ m/^\s*time\s*$/ 
 + Dep::implies($deps{depends}, 
 Dep::parse('time'))) {
 + $found = 1;
 + last;
 + }
   }
   }
   }

 This is surely wrong.  It gives a free pass on any bashisms in a line
 *ending* in time if there's a dependency on time.

Oh, no, I see now.  You're matching inside the bashism match.  Hm, yeah,
that will probably work, although it feels potentially fragile... but I
can't imagine any other bashism that would match that regex.

However, I do think the code would be cleaner if that were pulled out into
a separate check.  The Dep::implies just seems to be in an odd location
doing it that way, and accumulating exceptions inside that loop feels like
the wrong direction to go for long-term code maintainability.

-- 
Russ Allbery ([EMAIL PROTECTED])   http://www.eyrie.org/~eagle/


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: Yet another patch + how to check for stuff in shell scripts

2008-07-06 Thread Raphael Geissert
Russ Allbery wrote:

 Raphael Geissert [EMAIL PROTECTED]
 writes:
 
 Besides that patch, I'd like to know the opinion of others about how
 shell scripts should be checked. At the moment checks/scripts now makes
 use of $LEADIN as a separator/command identifier, but other checks also
 checking for stuff in shell script don't.

 This caused me troubles when I prefixed the invoke-rc.d calls in
 testset/scripts/debian/postinst with 'time'. Doing that caused most, if
 not all, of checks/init.d results to be completely incorrect because it
 failed to detect the invoke-rc.d calls.
 
 Given that running invoke-rc.d under time in a Debian maintainer script
 doesn't seem like sane behavior, I don't think I have a problem with
 that specific case.  However, in general, there are a lot of shell script
 checks in Lintian that try to figure out what's being run, some of which
 dealing with conditionals, and they're not done in a consistent way.
 
 I would really like to see this solved by taking a higher-level look at
 shell parsing and writing a module that has some deeper understanding of
 conditionals or at least a structure where we can apply that.  Just
 applying $LEADIN is going to be wrong in a lot of cases.  I think we need

At least on checkbashisms it has not only reduced the number of false
positives, but even reduced the number of false negatives; because it knows
where to start looking for a command.

 a documented module with a clearer picture of what it can and can't do
 which understands some of the common conditional constructs, such as
 checking whether a command exists, and tracks things like if/fi nesting.

Well, that would be even more complex and complicated; and if it were so
easy I can be sure that someone would have already written such a module.

 
 Although that problem could not be avoided just by using $LEADIN
 (because 'time' is not in the list of separators, so someone please add
 it) similar problems can.
 
 I don't think it makes sense to add time to $LEADIN in general, since time
 changes the nature of the command similar to adding env in front of
 something; it forces the command to be external rather than a built-in,

It depends on whether the built-in time is used or not. If the built-in is
used:

$ time [ 1 == 1 ]; echo $?

real0m0.000s
user0m0.000s
sys 0m0.000s
0

If not:

$ /usr/bin/time [ 1 == 1 ]; echo $?
[: ==: binary operator expected
Command exited with non-zero status 2
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
2

 which means that many of the things we check for become okay, such as echo
 with flags or the full syntax of kill.
 

It again depends, but I see your point.

Cheers,
-- 
Atomo64 - Raphael

Please avoid sending me Word, PowerPoint or Excel attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: Yet another patch + how to check for stuff in shell scripts

2008-07-06 Thread Russ Allbery
Raphael Geissert [EMAIL PROTECTED] writes:
 Russ Allbery wrote:

 Given that running invoke-rc.d under time in a Debian maintainer script
 doesn't seem like sane behavior, I don't think I have a problem with
 that specific case.  However, in general, there are a lot of shell script
 checks in Lintian that try to figure out what's being run, some of which
 dealing with conditionals, and they're not done in a consistent way.

 I would really like to see this solved by taking a higher-level look at
 shell parsing and writing a module that has some deeper understanding
 of conditionals or at least a structure where we can apply that.  Just
 applying $LEADIN is going to be wrong in a lot of cases.  I think we
 need

 At least on checkbashisms it has not only reduced the number of false
 positives, but even reduced the number of false negatives; because it
 knows where to start looking for a command.

Sorry, I wasn't clear.  Adding $LEADIN to all the other shell parsing code
in Lintian is something that I think we want to think about very
carefully, and I wish we had a higher-level approach.  It's certainly fine
for the bashism code.  I'm nervous about it as a general approach to all
of the shell parsing code in lintian, such as the invoke-rc.d check and
the whole ton of other checks that look at what maintainer scripts do or
don't run.

Regardless of which way we do this, it really does need to be in a module
rather than embedded direction in the checks/* scripts, so that we have
one place that does the work consistently and which we can change if we
need to in order to cope with additional cases.  (We need that anyway in
order to do bashism checks on debian/rules and on other scripts than
maintainer scripts.)

 a documented module with a clearer picture of what it can and can't do
 which understands some of the common conditional constructs, such as
 checking whether a command exists, and tracks things like if/fi
 nesting.

 Well, that would be even more complex and complicated; and if it were so
 easy I can be sure that someone would have already written such a
 module.

Well, what I'm trying to get at is that the current situation is fragile,
and making it more fragile is not appealing.  From a maintainability
standpoint, the current shell parsing in Lintian is some of the hardest
code to both understand and keep working.

Parsing shell scripts with regexes is extremely painful.

The best solution for Debian as a whole, IMO, would be to develop a mini
command language that could do 90% of what maintainer scripts do and leave
shell in the same bucket that Perl is in now: only necessary for weird
cases where the maintainer can then be presumed to know what they're
doing.  But that's obviously outside the scope of what Lintian can do.

-- 
Russ Allbery ([EMAIL PROTECTED])   http://www.eyrie.org/~eagle/


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: Yet another patch + how to check for stuff in shell scripts

2008-07-06 Thread Raphael Geissert
Russ Allbery wrote:

 Raphael Geissert [EMAIL PROTECTED]
 writes:
 Russ Allbery wrote:
 
 Given that running invoke-rc.d under time in a Debian maintainer script
 doesn't seem like sane behavior, I don't think I have a problem with
 that specific case.  However, in general, there are a lot of shell
 script checks in Lintian that try to figure out what's being run, some
 of which dealing with conditionals, and they're not done in a consistent
 way.
 
 I would really like to see this solved by taking a higher-level look at
 shell parsing and writing a module that has some deeper understanding
 of conditionals or at least a structure where we can apply that.  Just
 applying $LEADIN is going to be wrong in a lot of cases.  I think we
 need
 
 At least on checkbashisms it has not only reduced the number of false
 positives, but even reduced the number of false negatives; because it
 knows where to start looking for a command.
 
 Sorry, I wasn't clear.  Adding $LEADIN to all the other shell parsing code
 in Lintian is something that I think we want to think about very
 carefully, and I wish we had a higher-level approach.  It's certainly fine

Of course, $LEADIN isn't suitable for every check lintian is doing.
My main point was that several checks are actually very prone to false
positives and false negatives.

 for the bashism code.  I'm nervous about it as a general approach to all
 of the shell parsing code in lintian, such as the invoke-rc.d check and
 the whole ton of other checks that look at what maintainer scripts do or
 don't run.
 
 Regardless of which way we do this, it really does need to be in a module
 rather than embedded direction in the checks/* scripts, so that we have
 one place that does the work consistently and which we can change if we
 need to in order to cope with additional cases.  (We need that anyway in
 order to do bashism checks on debian/rules and on other scripts than
 maintainer scripts.)

I've already converted checkbashisms into a perl module, and its interface.
I sent a copy of both today to Adam so he can take a look at them; there are
still two or three TODO's, but all are extras which the original
checkbashisms doesn't have.

I tried to design the module to be lintian friendly and I've even started to
implement it in checks/scripts; but I have to confess that some of the
checks being done right after the bashisms checks in checks/scripts are
painful and require special care.

The idea is to also perform all, or at least most, of the non-bashisms
checks also in Devscripts::Checkbashisms and later separate the results and
emit the necessary tags. That's what the add_hash method is for.

 
 a documented module with a clearer picture of what it can and can't do
 which understands some of the common conditional constructs, such as
 checking whether a command exists, and tracks things like if/fi
 nesting.
 
 Well, that would be even more complex and complicated; and if it were so
 easy I can be sure that someone would have already written such a
 module.
 
 Well, what I'm trying to get at is that the current situation is fragile,
 and making it more fragile is not appealing.  From a maintainability
 standpoint, the current shell parsing in Lintian is some of the hardest
 code to both understand and keep working.
 
 Parsing shell scripts with regexes is extremely painful.

Sure it is :)

 
 The best solution for Debian as a whole, IMO, would be to develop a mini
 command language that could do 90% of what maintainer scripts do and leave
 shell in the same bucket that Perl is in now: only necessary for weird
 cases where the maintainer can then be presumed to know what they're
 doing.  But that's obviously outside the scope of what Lintian can do.
 

Weren't shell scripts supposed to do that? :) 
IMHO using yet another command language is *not* the solution.

Cheers,
-- 
Atomo64 - Raphael

Please avoid sending me Word, PowerPoint or Excel attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Re: Yet another patch + how to check for stuff in shell scripts

2008-07-06 Thread Russ Allbery
Raphael Geissert [EMAIL PROTECTED] writes:

 Of course, $LEADIN isn't suitable for every check lintian is doing.  My
 main point was that several checks are actually very prone to false
 positives and false negatives.

Yeah, that's part of the problem and one of the things that makes the code
hard to maintain: some of the existing patterns are actually intentionally
tuned to avoid false positives that we found in practice, whereas some are
just randomly the way they are for no particularly good reason, and
there's no good way to tell the difference.

Lintian has a very effective feedback mechanism for false positives and I
regularly look through lintian.d.o trying to find them and fix them, but
it has no such feedback mechanism for false negatives.

For a lot of the checks (such as the unescaped period in invoke-rc.d), the
false positive risk is just theoretical since no one does that.  Which
doesn't mean we shouldn't improve it, of course, if we can without
creating problems for ourselves.

 I tried to design the module to be lintian friendly and I've even
 started to implement it in checks/scripts; but I have to confess that
 some of the checks being done right after the bashisms checks in
 checks/scripts are painful and require special care.

Yeah, there's a lot of that in Lintian.

 The idea is to also perform all, or at least most, of the non-bashisms
 checks also in Devscripts::Checkbashisms and later separate the results
 and emit the necessary tags. That's what the add_hash method is for.

I'll have to see the patch to really comment.  Please do bear in mind that
we have to keep things running in stable, which means that adding new
package dependencies, while possible, is annoying to deal with.  I'm not
sure which way would be best -- to have devscripts depend on lintian and
use a module it provides, to have lintian depend on devscripts and use a
module it provides, or to package an independent package like Frank did
with libparse-debianchangelog-perl (although that's a special case since
it's also used by other things, like packages.debian.org).

What I think Lintian really needs is a module that can do the following:

* Read a shell script and provide the caller with a stream of logical
  lines, dealing with things like backslash escapes so that each caller
  doesn't have to.

* Recognize heredocs and filter them out of the command stream so that the
  caller only sees the contents of heredocs as arguments and not as
  possible separate commands.

* Recognize conditionals so that the caller can get simple information
  like is this line inside a conditional or would it always execute?.
  This is important in Lintian in several places for false positive
  suppression; we often can assume that if code is conditional, the
  maintainer probably knows what they're doing and we shouldn't bug them
  about something we're unsure about.

* Recognize common patterns that Lintian needs to be concerned with, such
  as whether a conditional probes for the existence of a command using the
  regular which / command -v methods.  There are a bunch of stock shell
  script parsing patterns like that in Lintian right now.

* Optionally do simple canonicalization on commands before returning them
  to the caller for analysis, doing things like removing unnecessary
  quotes or filtering out command prefixes that don't affect whether the
  command would run (such as your time example, or more commonly, things
  like @ and - in Makefiles).

* Parse commands into a list of command and arguments so that code can do
  simple things like ask was the command given this argument without
  doing error-prone regex matching on the whole command line.

You could incorporate bashism checks into the same module, although to
some extent a bashism checker is a consumer of this sort of parsing
framework.

This is a chunk of work, yes, but bits of Lintian already do a lot of
this, and the checkbashisms code in particular already does quite a lot.
Working through the API to make sure the code is clean and readable is
probably the hardest part.  My off-the-cuff estimate is that it would take
me about eight hours of work, but I've been working on Policy recently and
need to work on Kerberos packages next and won't have those eight hours
for probably a month or two at least.

Once we had something like this, a ton of code in Lintian could be
simplified by using it, far more broadly than just check/scripts.  We look
for patterns in maintainer scripts in all sorts of places in Lintian, and
do it in a different way almost every time right now.

 Weren't shell scripts supposed to do that? :) 
 IMHO using yet another command language is *not* the solution.

Well, I obviously don't agree, but unless I or someone else has a chance
to write the code and specification for how it would work and see what it
looks like, it's an academic argument.  :)

-- 
Russ Allbery ([EMAIL PROTECTED])   http://www.eyrie.org/~eagle/


-- 
To