Re: Yet another patch + how to check for stuff in shell scripts
Russ Allbery wrote: Raphael Geissert [EMAIL PROTECTED] writes: Of course, $LEADIN isn't suitable for every check lintian is doing. My main point was that several checks are actually very prone to false positives and false negatives. Yeah, that's part of the problem and one of the things that makes the code hard to maintain: some of the existing patterns are actually intentionally tuned to avoid false positives that we found in practice, whereas some are just randomly the way they are for no particularly good reason, and there's no good way to tell the difference. Lintian has a very effective feedback mechanism for false positives and I regularly look through lintian.d.o trying to find them and fix them, but it has no such feedback mechanism for false negatives. Maybe experimental tags can be (ab)used with tags where 'greedier' regexes are used, resulting in more data that can be used to improve the regexes. For a lot of the checks (such as the unescaped period in invoke-rc.d), the false positive risk is just theoretical since no one does that. Which doesn't mean we shouldn't improve it, of course, if we can without creating problems for ourselves. I tried to design the module to be lintian friendly and I've even started to implement it in checks/scripts; but I have to confess that some of the checks being done right after the bashisms checks in checks/scripts are painful and require special care. Yeah, there's a lot of that in Lintian. The idea is to also perform all, or at least most, of the non-bashisms checks also in Devscripts::Checkbashisms and later separate the results and emit the necessary tags. That's what the add_hash method is for. I'll have to see the patch to really comment. Please do bear in mind that we have to keep things running in stable, which means that adding new package dependencies, while possible, is annoying to deal with. I'm not sure which way would be best -- to have devscripts depend on lintian and use a module it provides, to have lintian depend on devscripts and use a module it provides, or to package an independent package like Frank did with libparse-debianchangelog-perl (although that's a special case since it's also used by other things, like packages.debian.org). I was actually thinking about shipping a copy of the module in lintian itself. Because I don't think lintian wants to depend on devscripts, the same way devscripts doesn't want to depend on another package just for checkbashisms to work. What I think Lintian really needs is a module that can do the following: * Read a shell script and provide the caller with a stream of logical lines, dealing with things like backslash escapes so that each caller doesn't have to. * Recognize heredocs and filter them out of the command stream so that the caller only sees the contents of heredocs as arguments and not as possible separate commands. Both already done in some way by checkbashisms. * Recognize conditionals so that the caller can get simple information like is this line inside a conditional or would it always execute?. This is important in Lintian in several places for false positive suppression; we often can assume that if code is conditional, the maintainer probably knows what they're doing and we shouldn't bug them about something we're unsure about. IMO this is one of the most complex and complicated things to do, as it requires more than a regexes logic. * Recognize common patterns that Lintian needs to be concerned with, such as whether a conditional probes for the existence of a command using the regular which / command -v methods. There are a bunch of stock shell script parsing patterns like that in Lintian right now. Something like $LEADIN? * Optionally do simple canonicalization on commands before returning them to the caller for analysis, doing things like removing unnecessary quotes or filtering out command prefixes that don't affect whether the command would run (such as your time example, or more commonly, things like @ and - in Makefiles). Already done by checkbashisms. * Parse commands into a list of command and arguments so that code can do simple things like ask was the command given this argument without doing error-prone regex matching on the whole command line. Not impossible, but won't be easy either. You could incorporate bashism checks into the same module, although to some extent a bashism checker is a consumer of this sort of parsing framework. Sure; but I have to say that checkbashisms is currently are both things together and the perl module allows bashisms + whatever you want. This is a chunk of work, yes, but bits of Lintian already do a lot of this, and the checkbashisms code in particular already does quite a lot. Working through the API to make sure the code is clean and readable is probably the hardest part. My off-the-cuff
Re: Yet another patch + how to check for stuff in shell scripts
Raphael Geissert [EMAIL PROTECTED] writes: Russ Allbery wrote: What I think Lintian really needs is a module that can do the following: * Read a shell script and provide the caller with a stream of logical lines, dealing with things like backslash escapes so that each caller doesn't have to. * Recognize heredocs and filter them out of the command stream so that the caller only sees the contents of heredocs as arguments and not as possible separate commands. Both already done in some way by checkbashisms. Well, yes and no. checkbashisms does this internally for its own checks, but doesn't provide the necessary API to callers. If your new module does that, that's great! But I want to be able to do these things *without* checking for bashisms as well (obvious example: a maintainer script written in bash), and I want to be able to do it from check scripts separate from checks/scripts (which means I don't want to keep redoing the bashism checks each time it's called). So having a separate parsing layer is, I think, a requirement for Lintian going forward to be able to do script checks in a sane fashion. If we get that via your work on checkbashisms, that's fantastic -- it just needs to provide the necessary API. * Recognize conditionals so that the caller can get simple information like is this line inside a conditional or would it always execute?. This is important in Lintian in several places for false positive suppression; we often can assume that if code is conditional, the maintainer probably knows what they're doing and we shouldn't bug them about something we're unsure about. IMO this is one of the most complex and complicated things to do, as it requires more than a regexes logic. I believe you are overestimating the difficulty. * Recognize common patterns that Lintian needs to be concerned with, such as whether a conditional probes for the existence of a command using the regular which / command -v methods. There are a bunch of stock shell script parsing patterns like that in Lintian right now. Something like $LEADIN? I don't think it's related, but I might be missing something. Look at checks/menus for an example of the kind of code that needs to be made generic. * Parse commands into a list of command and arguments so that code can do simple things like ask was the command given this argument without doing error-prone regex matching on the whole command line. Not impossible, but won't be easy either. Here too I think you're seriously overestimating the difficulty. Sure; but I have to say that checkbashisms is currently are both things together and the perl module allows bashisms + whatever you want. If it provides an API to do what's mentioned above, that's great. -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/ -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Re: Yet another patch + how to check for stuff in shell scripts
Russ Allbery [EMAIL PROTECTED] writes: --- a/checks/scripts +++ b/checks/scripts @@ -697,9 +697,12 @@ while (SCRIPTS) { $line =~ s/(^|[^\\\'](?:)*)\(?:\\.|[^\\\])+\/$1/g; for my $re (@bashism_regexs) { if ($line =~ m/($re)/) { - $found = 1; ($match) = m/($re)/; - last; + unless ($match =~ m/^\s*time\s*$/ + Dep::implies($deps{depends}, Dep::parse('time'))) { + $found = 1; + last; + } } } } This is surely wrong. It gives a free pass on any bashisms in a line *ending* in time if there's a dependency on time. Oh, no, I see now. You're matching inside the bashism match. Hm, yeah, that will probably work, although it feels potentially fragile... but I can't imagine any other bashism that would match that regex. However, I do think the code would be cleaner if that were pulled out into a separate check. The Dep::implies just seems to be in an odd location doing it that way, and accumulating exceptions inside that loop feels like the wrong direction to go for long-term code maintainability. -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/ -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Re: Yet another patch + how to check for stuff in shell scripts
Russ Allbery wrote: Raphael Geissert [EMAIL PROTECTED] writes: Besides that patch, I'd like to know the opinion of others about how shell scripts should be checked. At the moment checks/scripts now makes use of $LEADIN as a separator/command identifier, but other checks also checking for stuff in shell script don't. This caused me troubles when I prefixed the invoke-rc.d calls in testset/scripts/debian/postinst with 'time'. Doing that caused most, if not all, of checks/init.d results to be completely incorrect because it failed to detect the invoke-rc.d calls. Given that running invoke-rc.d under time in a Debian maintainer script doesn't seem like sane behavior, I don't think I have a problem with that specific case. However, in general, there are a lot of shell script checks in Lintian that try to figure out what's being run, some of which dealing with conditionals, and they're not done in a consistent way. I would really like to see this solved by taking a higher-level look at shell parsing and writing a module that has some deeper understanding of conditionals or at least a structure where we can apply that. Just applying $LEADIN is going to be wrong in a lot of cases. I think we need At least on checkbashisms it has not only reduced the number of false positives, but even reduced the number of false negatives; because it knows where to start looking for a command. a documented module with a clearer picture of what it can and can't do which understands some of the common conditional constructs, such as checking whether a command exists, and tracks things like if/fi nesting. Well, that would be even more complex and complicated; and if it were so easy I can be sure that someone would have already written such a module. Although that problem could not be avoided just by using $LEADIN (because 'time' is not in the list of separators, so someone please add it) similar problems can. I don't think it makes sense to add time to $LEADIN in general, since time changes the nature of the command similar to adding env in front of something; it forces the command to be external rather than a built-in, It depends on whether the built-in time is used or not. If the built-in is used: $ time [ 1 == 1 ]; echo $? real0m0.000s user0m0.000s sys 0m0.000s 0 If not: $ /usr/bin/time [ 1 == 1 ]; echo $? [: ==: binary operator expected Command exited with non-zero status 2 0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+190minor)pagefaults 0swaps 2 which means that many of the things we check for become okay, such as echo with flags or the full syntax of kill. It again depends, but I see your point. Cheers, -- Atomo64 - Raphael Please avoid sending me Word, PowerPoint or Excel attachments. See http://www.gnu.org/philosophy/no-word-attachments.html -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Re: Yet another patch + how to check for stuff in shell scripts
Raphael Geissert [EMAIL PROTECTED] writes: Russ Allbery wrote: Given that running invoke-rc.d under time in a Debian maintainer script doesn't seem like sane behavior, I don't think I have a problem with that specific case. However, in general, there are a lot of shell script checks in Lintian that try to figure out what's being run, some of which dealing with conditionals, and they're not done in a consistent way. I would really like to see this solved by taking a higher-level look at shell parsing and writing a module that has some deeper understanding of conditionals or at least a structure where we can apply that. Just applying $LEADIN is going to be wrong in a lot of cases. I think we need At least on checkbashisms it has not only reduced the number of false positives, but even reduced the number of false negatives; because it knows where to start looking for a command. Sorry, I wasn't clear. Adding $LEADIN to all the other shell parsing code in Lintian is something that I think we want to think about very carefully, and I wish we had a higher-level approach. It's certainly fine for the bashism code. I'm nervous about it as a general approach to all of the shell parsing code in lintian, such as the invoke-rc.d check and the whole ton of other checks that look at what maintainer scripts do or don't run. Regardless of which way we do this, it really does need to be in a module rather than embedded direction in the checks/* scripts, so that we have one place that does the work consistently and which we can change if we need to in order to cope with additional cases. (We need that anyway in order to do bashism checks on debian/rules and on other scripts than maintainer scripts.) a documented module with a clearer picture of what it can and can't do which understands some of the common conditional constructs, such as checking whether a command exists, and tracks things like if/fi nesting. Well, that would be even more complex and complicated; and if it were so easy I can be sure that someone would have already written such a module. Well, what I'm trying to get at is that the current situation is fragile, and making it more fragile is not appealing. From a maintainability standpoint, the current shell parsing in Lintian is some of the hardest code to both understand and keep working. Parsing shell scripts with regexes is extremely painful. The best solution for Debian as a whole, IMO, would be to develop a mini command language that could do 90% of what maintainer scripts do and leave shell in the same bucket that Perl is in now: only necessary for weird cases where the maintainer can then be presumed to know what they're doing. But that's obviously outside the scope of what Lintian can do. -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/ -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Re: Yet another patch + how to check for stuff in shell scripts
Russ Allbery wrote: Raphael Geissert [EMAIL PROTECTED] writes: Russ Allbery wrote: Given that running invoke-rc.d under time in a Debian maintainer script doesn't seem like sane behavior, I don't think I have a problem with that specific case. However, in general, there are a lot of shell script checks in Lintian that try to figure out what's being run, some of which dealing with conditionals, and they're not done in a consistent way. I would really like to see this solved by taking a higher-level look at shell parsing and writing a module that has some deeper understanding of conditionals or at least a structure where we can apply that. Just applying $LEADIN is going to be wrong in a lot of cases. I think we need At least on checkbashisms it has not only reduced the number of false positives, but even reduced the number of false negatives; because it knows where to start looking for a command. Sorry, I wasn't clear. Adding $LEADIN to all the other shell parsing code in Lintian is something that I think we want to think about very carefully, and I wish we had a higher-level approach. It's certainly fine Of course, $LEADIN isn't suitable for every check lintian is doing. My main point was that several checks are actually very prone to false positives and false negatives. for the bashism code. I'm nervous about it as a general approach to all of the shell parsing code in lintian, such as the invoke-rc.d check and the whole ton of other checks that look at what maintainer scripts do or don't run. Regardless of which way we do this, it really does need to be in a module rather than embedded direction in the checks/* scripts, so that we have one place that does the work consistently and which we can change if we need to in order to cope with additional cases. (We need that anyway in order to do bashism checks on debian/rules and on other scripts than maintainer scripts.) I've already converted checkbashisms into a perl module, and its interface. I sent a copy of both today to Adam so he can take a look at them; there are still two or three TODO's, but all are extras which the original checkbashisms doesn't have. I tried to design the module to be lintian friendly and I've even started to implement it in checks/scripts; but I have to confess that some of the checks being done right after the bashisms checks in checks/scripts are painful and require special care. The idea is to also perform all, or at least most, of the non-bashisms checks also in Devscripts::Checkbashisms and later separate the results and emit the necessary tags. That's what the add_hash method is for. a documented module with a clearer picture of what it can and can't do which understands some of the common conditional constructs, such as checking whether a command exists, and tracks things like if/fi nesting. Well, that would be even more complex and complicated; and if it were so easy I can be sure that someone would have already written such a module. Well, what I'm trying to get at is that the current situation is fragile, and making it more fragile is not appealing. From a maintainability standpoint, the current shell parsing in Lintian is some of the hardest code to both understand and keep working. Parsing shell scripts with regexes is extremely painful. Sure it is :) The best solution for Debian as a whole, IMO, would be to develop a mini command language that could do 90% of what maintainer scripts do and leave shell in the same bucket that Perl is in now: only necessary for weird cases where the maintainer can then be presumed to know what they're doing. But that's obviously outside the scope of what Lintian can do. Weren't shell scripts supposed to do that? :) IMHO using yet another command language is *not* the solution. Cheers, -- Atomo64 - Raphael Please avoid sending me Word, PowerPoint or Excel attachments. See http://www.gnu.org/philosophy/no-word-attachments.html -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Re: Yet another patch + how to check for stuff in shell scripts
Raphael Geissert [EMAIL PROTECTED] writes: Of course, $LEADIN isn't suitable for every check lintian is doing. My main point was that several checks are actually very prone to false positives and false negatives. Yeah, that's part of the problem and one of the things that makes the code hard to maintain: some of the existing patterns are actually intentionally tuned to avoid false positives that we found in practice, whereas some are just randomly the way they are for no particularly good reason, and there's no good way to tell the difference. Lintian has a very effective feedback mechanism for false positives and I regularly look through lintian.d.o trying to find them and fix them, but it has no such feedback mechanism for false negatives. For a lot of the checks (such as the unescaped period in invoke-rc.d), the false positive risk is just theoretical since no one does that. Which doesn't mean we shouldn't improve it, of course, if we can without creating problems for ourselves. I tried to design the module to be lintian friendly and I've even started to implement it in checks/scripts; but I have to confess that some of the checks being done right after the bashisms checks in checks/scripts are painful and require special care. Yeah, there's a lot of that in Lintian. The idea is to also perform all, or at least most, of the non-bashisms checks also in Devscripts::Checkbashisms and later separate the results and emit the necessary tags. That's what the add_hash method is for. I'll have to see the patch to really comment. Please do bear in mind that we have to keep things running in stable, which means that adding new package dependencies, while possible, is annoying to deal with. I'm not sure which way would be best -- to have devscripts depend on lintian and use a module it provides, to have lintian depend on devscripts and use a module it provides, or to package an independent package like Frank did with libparse-debianchangelog-perl (although that's a special case since it's also used by other things, like packages.debian.org). What I think Lintian really needs is a module that can do the following: * Read a shell script and provide the caller with a stream of logical lines, dealing with things like backslash escapes so that each caller doesn't have to. * Recognize heredocs and filter them out of the command stream so that the caller only sees the contents of heredocs as arguments and not as possible separate commands. * Recognize conditionals so that the caller can get simple information like is this line inside a conditional or would it always execute?. This is important in Lintian in several places for false positive suppression; we often can assume that if code is conditional, the maintainer probably knows what they're doing and we shouldn't bug them about something we're unsure about. * Recognize common patterns that Lintian needs to be concerned with, such as whether a conditional probes for the existence of a command using the regular which / command -v methods. There are a bunch of stock shell script parsing patterns like that in Lintian right now. * Optionally do simple canonicalization on commands before returning them to the caller for analysis, doing things like removing unnecessary quotes or filtering out command prefixes that don't affect whether the command would run (such as your time example, or more commonly, things like @ and - in Makefiles). * Parse commands into a list of command and arguments so that code can do simple things like ask was the command given this argument without doing error-prone regex matching on the whole command line. You could incorporate bashism checks into the same module, although to some extent a bashism checker is a consumer of this sort of parsing framework. This is a chunk of work, yes, but bits of Lintian already do a lot of this, and the checkbashisms code in particular already does quite a lot. Working through the API to make sure the code is clean and readable is probably the hardest part. My off-the-cuff estimate is that it would take me about eight hours of work, but I've been working on Policy recently and need to work on Kerberos packages next and won't have those eight hours for probably a month or two at least. Once we had something like this, a ton of code in Lintian could be simplified by using it, far more broadly than just check/scripts. We look for patterns in maintainer scripts in all sorts of places in Lintian, and do it in a different way almost every time right now. Weren't shell scripts supposed to do that? :) IMHO using yet another command language is *not* the solution. Well, I obviously don't agree, but unless I or someone else has a chance to write the code and specification for how it would work and see what it looks like, it's an academic argument. :) -- Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/ -- To