Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-12 Thread Jim Pryor
On Tue, Jan 12, 2010 at 02:29:35PM +0100, Xyne wrote:

> That doesnt work for overridden variables in split packages because they
> are set inside the packaging function(s).

Yes, right, good point. That answers a question I asked in another
message.

> Even without that to consider, you cannot blindly trust the variable
> declaration section of PKGBUILDs uploaded to the AUR.

Yes, exactly, that's why I was thinking of exploits your method might
still be vulnerable to unless you take special steps to catch them.

-- 
Jim Pryor
prof...@jimpryor.net


Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-12 Thread Xyne
> If you're willing to trust the variable declaration part of the
> PKGBUILD, then yeah it'd be easy to execute just that part. You don't
> even need to cut out the build() function, since executing the whole
> thing would only declare and not run that function. All you'd need to do
> is to add some "echo"s at the end of the wrapper function you've
> constructed, and execute the wrapper function.


That doesnt work for overridden variables in split packages because they
are set inside the packaging function(s). Anything which could
selectively execute blocks of code inside of functions to get the
values of the variables would be far more complicated than this
approach and probably far more exploitable.

Even without that to consider, you cannot blindly trust the variable
declaration section of PKGBUILDs uploaded to the AUR.


Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-12 Thread Xyne
> Indeed. Perhaps Allan would be interested on this for his makepkg test  
> suite, although maybe more in the concept since the test suite us in  
> python.

It would be trivial to conver this to Python. I will probably do that
myself if there seems to be enough interest in it.


Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-12 Thread Xyne
> I was brainstorming to think of possible exploits. It looks like this is
> valid syntax:
> 
> echo normal stuff
> exit 0
> any funky stuff I want
> pkgver=#$#%$%%^&^...@#$$@^ } more funky stuff {
> 
> Running bash -n on that gives 0. Now there's not necessarily anything
> wrong here---unless your parser doesn't stop parsing at the exit command.
> If it goes past that, then maybe exploits could be introduced, because
> we wouldn't be entitled to the assumption that the rest of the code is
> valid syntax.
> 
> -- 
> Jim Pryor

I haven't tested that but I don't think it would be an issue. As long
as it doesn't break out of the function declaration, it shoulld work
and afaik, you can include "exit" inside a function. I'm not a Bash
expert though, so correct me if I'm wrong.


Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-11 Thread Jim Pryor
On Sat, Jan 09, 2010 at 09:23:56PM +0100, Xyne wrote:

> I first check the PKGBUILD with "/bin/bash -n PKGBUILD". If this
> command exits without error then the PKGBUILD contains valid syntax,
> most importantly it does not contain extra closing brackets ("}").
> 
> This lets me wrap the entire PKGBUILD in a function, e.g.
> pkgbuild () {
> 
> }
> 
> I can then source the file with Bash without executing any code. The
> previous check with "bash -n" guarantees that the PKGBUILD can not
> escape the wrapping function. Because all code is inside a function,
> sourcing the file does not execute any code at all.
> 
> Bash simply parses the file and stores the code itself in the
> "pkgbuild" function, which itself contains other variables and
> functions (e.g. package_foo, build). Because the code has not been
> executed, the variables have not been expanded/interpolated and thus
> still contain things such s "http://example.com/$pkgname-$pkgver.tar";,
> which is why it must still be intepolated by the parser.

I was brainstorming to think of possible exploits. It looks like this is
valid syntax:

echo normal stuff
exit 0
any funky stuff I want
pkgver=#$#%$%%^&^...@#$$@^ } more funky stuff {

Running bash -n on that gives 0. Now there's not necessarily anything
wrong here---unless your parser doesn't stop parsing at the exit command.
If it goes past that, then maybe exploits could be introduced, because
we wouldn't be entitled to the assumption that the rest of the code is
valid syntax.

-- 
Jim Pryor
prof...@jimpryor.net


Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-11 Thread Jim Pryor
On Tue, Jan 12, 2010 at 07:52:54AM +0800, Sebastian Nowicki wrote:

> >Bash simply parses the file and stores the code itself in the
> >"pkgbuild" function, which itself contains other variables and
> >functions (e.g. package_foo, build). Because the code has not been
> >executed, the variables have not been expanded/interpolated and thus
> >still contain things such s "http://example.com/$pkgname-$pkgver.tar";,
> >which is why it must still be intepolated by the parser.
> 
> It seems I did understand it, I just forgot assignments don't get
> interpreted. I suppose there's no way to get bash to execute the
> assignments but not the code? Perhaps filtering the function
> definitions from set output. I havn't looked at the output of set
> so, again, I'm shooting in the dark here.

If you're willing to trust the variable declaration part of the
PKGBUILD, then yeah it'd be easy to execute just that part. You don't
even need to cut out the build() function, since executing the whole
thing would only declare and not run that function. All you'd need to do
is to add some "echo"s at the end of the wrapper function you've
constructed, and execute the wrapper function.

But this would essentially be the same as making a temp copy of the PKGBUILD,
and adding some "echo"s at the end of the file, then sourcing the file.

Xyne doesn't want to trust the PKGBUILDs that far, he's only using bash
to format the code canonically, then he's regexping the canonical form
instead of executing any of it.

-- 
Jim Pryor
prof...@jimpryor.net


Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-11 Thread Sebastian Nowicki



On 10/01/2010, at 4:23, Xyne  wrote:


It is quite a clever idea. I haven't seen this approach before. I
haven't looked at it thoroughly, but it looks like you're simply
sourcing the PKGBUILD with some trickery not to execute the code. Why
then the need for further parsing? Does `set` produce "raw" bash,  
e.g.
'source=("https://localhost/$pkgname.tgz";)'? It seems like bash  
should

be able to do it itself. If that were the case, the parser would be
extremely reliable (definitely more so than mine). There are still
some "safety" issues involved, although maybe not for your purposes.
One major thing is infinite loops - there's no way to break them. I'm
sure this parser will be very useful when such things aren't an  
issue.

Bash simply parses the file and stores the code itself in the
"pkgbuild" function, which itself contains other variables and
functions (e.g. package_foo, build). Because the code has not been
executed, the variables have not been expanded/interpolated and thus
still contain things such s "http://example.com/$pkgname-$pkgver.tar";,
which is why it must still be intepolated by the parser.


It seems I did understand it, I just forgot assignments don't get  
interpreted. I suppose there's no way to get bash to execute the  
assignments but not the code? Perhaps filtering the function  
definitions from set output. I havn't looked at the output of set so,  
again, I'm shooting in the dark here.




The advantage of this method is that "set" will print out the
"pkgbuild" function and its contents in a canonical form, e.g. all
assignments to a variable are on a single line, if/then/else  
statements

follow a single format, etc.


That is very handy indeed.


Let me repeat that my method does not run any code in the
PKGBUILD. I've tested this by including an infinite loop at the top of
the file and it was not executed. I actually believe that this method
provides a perfectly safe and potentially very reliable method of
retrieving all metadata in the PKGBUILD with very little dependencies
and considerable portability.


Indeed. Perhaps Allan would be interested on this for his makepkg test  
suite, although maybe more in the concept since the test suite us in  
python.





Regards,
Xyne


Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-09 Thread Xyne
Loui Chang wrote:

> Wow this is quite clever. It definitely would make the job of parsing
> much easier. Thanks for the explanation.

:)

I intend to flesh out the parser as special cases pop up. As already
mentioned, there will be limits to what it can do depending on whether
the packager uses command output to set variables, but perhaps Arch
could eventually impose a de facto standard for PKGBUILDs using the
parser itself as the standard, i.e. if the PKGBUILD metadata gets
past the parser, the PKGBUILD itself would be considered invalid. In
that case, the parser would support tricks such as

[ "$ARCH" == "x86_64" ] && depends=('foo' 'bar')

I want to be very clear that I am NOT suggesting that my parser become
the standard, only that a parser based on this approach _could_ become
one.

Also note that this is really a method on its own that just happens to
be implemented in Perl in this case. If you look at the code, you will
see that it could very quickly be adapted to Python (and thus Django),
or PHP, or just about anything.


Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-09 Thread Loui Chang
On Sat 09 Jan 2010 21:23 +0100, Xyne wrote:
> You haven't fully understood how it works so I hope you don't mind if I
> try to explain it again.
> 
> I first check the PKGBUILD with "/bin/bash -n PKGBUILD". If this
> command exits without error then the PKGBUILD contains valid syntax,
> most importantly it does not contain extra closing brackets ("}").
> [...]

Wow this is quite clever. It definitely would make the job of parsing
much easier. Thanks for the explanation.



Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-09 Thread Xyne
> It is quite a clever idea. I haven't seen this approach before. I  
> haven't looked at it thoroughly, but it looks like you're simply  
> sourcing the PKGBUILD with some trickery not to execute the code. Why  
> then the need for further parsing? Does `set` produce "raw" bash, e.g.  
> 'source=("https://localhost/$pkgname.tgz";)'? It seems like bash should  
> be able to do it itself. If that were the case, the parser would be  
> extremely reliable (definitely more so than mine). There are still  
> some "safety" issues involved, although maybe not for your purposes.  
> One major thing is infinite loops - there's no way to break them. I'm  
> sure this parser will be very useful when such things aren't an issue.

You haven't fully understood how it works so I hope you don't mind if I
try to explain it again.

I first check the PKGBUILD with "/bin/bash -n PKGBUILD". If this
command exits without error then the PKGBUILD contains valid syntax,
most importantly it does not contain extra closing brackets ("}").

This lets me wrap the entire PKGBUILD in a function, e.g.
pkgbuild () {

}

I can then source the file with Bash without executing any code. The
previous check with "bash -n" guarantees that the PKGBUILD can not
escape the wrapping function. Because all code is inside a function,
sourcing the file does not execute any code at all.

Bash simply parses the file and stores the code itself in the
"pkgbuild" function, which itself contains other variables and
functions (e.g. package_foo, build). Because the code has not been
executed, the variables have not been expanded/interpolated and thus
still contain things such s "http://example.com/$pkgname-$pkgver.tar";,
which is why it must still be intepolated by the parser.

The advantage of this method is that "set" will print out the
"pkgbuild" function and its contents in a canonical form, e.g. all
assignments to a variable are on a single line, if/then/else statements
follow a single format, etc.

This makes it possible to easily parse the assignments themselves, in
the order that they occur, without haing to consider all variations of
valid whitespace in statements. The parser simply needs to recognize
Bash syntax for things such as string substitutions, but this is a
relatively limted set so it is not difficult to handle all such cases.
The output of "set" also guarantees that you have a representation of
all variable assignments (in sequential order, and within their local
environment) so you have all the information that you need to
interpolate them. You could even handle command output if you wish,
using a command white-list to make sure that no trickery is used to run
malicious code.

Let me repeat that my method does not run any code in the
PKGBUILD. I've tested this by including an infinite loop at the top of
the file and it was not executed. I actually believe that this method
provides a perfectly safe and potentially very reliable method of
retrieving all metadata in the PKGBUILD with very little dependencies
and considerable portability.


Regards,
Xyne


Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-09 Thread Sebastian Nowicki


On 09/01/2010, at 2:50 AM, Xyne wrote:


What was the problem with that from Sebastian which was discussed
earlier on the mailing lists, IRCs ? How does it know more ?



I don't know. I wrote this because I needed a PKGBUILD parser in Perl
for Bauerbill. Maybe it's better, maybe it's worse. I posted it here  
in
case someone finds it useful per se, or wishes to take any of the  
ideas

from it and use them to iimprove other parsers.


It is quite a clever idea. I haven't seen this approach before. I  
haven't looked at it thoroughly, but it looks like you're simply  
sourcing the PKGBUILD with some trickery not to execute the code. Why  
then the need for further parsing? Does `set` produce "raw" bash, e.g.  
'source=("https://localhost/$pkgname.tgz";)'? It seems like bash should  
be able to do it itself. If that were the case, the parser would be  
extremely reliable (definitely more so than mine). There are still  
some "safety" issues involved, although maybe not for your purposes.  
One major thing is infinite loops - there's no way to break them. I'm  
sure this parser will be very useful when such things aren't an issue.



Hmmm, after briefly reviewing the messages, I can mention that my
parser:
* doesn't depent on Yacc/Lex
* supports split packages already
* handles multiline assignments
* supports interpolation and string substitutions


For the record pkgparse does support split packages and word  
substitutions (though it's primitive atm, i.e. only $foo works,  
modifiers like ${foo##bar} don't). The major problem is with multiline  
assignments, but once that get's fixed, most PKGBUILDs should be parse- 
able. It probably won't depend on yacc/lex anymore either, but it will  
depend on Lemon/Ragel, as that's the direction it seems to be  
going :P. It's a compile-time dependency though, so it's not really a  
reason not to use it. To use it in perl you'd have to make perl  
bindings, which would require compilation anyway.




Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-08 Thread Xyne
> What was the problem with that from Sebastian which was discussed
> earlier on the mailing lists, IRCs ? How does it know more ?
> 

I don't know. I wrote this because I needed a PKGBUILD parser in Perl
for Bauerbill. Maybe it's better, maybe it's worse. I posted it here in
case someone finds it useful per se, or wishes to take any of the ideas
from it and use them to iimprove other parsers.

Hmmm, after briefly reviewing the messages, I can mention that my
parser:
* doesn't depent on Yacc/Lex
* supports split packages already
* handles multiline assignments
* supports interpolation and string substitutions


Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-08 Thread Laszlo Papp
On Fri, Jan 8, 2010 at 6:06 PM, Xyne  wrote:
> Hi,
>
> There was no response on the pacman-dev list but someone here might
> find this potentially useful:
>
> http://mailman.archlinux.org/pipermail/pacman-dev/2010-January/010322.html
>
> It's written in Perl but it could easily be adapted to Python. It
> handles Bash variable interpolation and string substitutions so far.
> I'll add support for other PKGBUILD "tricks"as I come across them, e.g.
> setting dependencies by architecture. The post linked above includes
> simplified JSON output from the split kernel26 package.
>
> Regards,
> Xyne
>

What was the problem with that from Sebastian which was discussed
earlier on the mailing lists, IRCs ? How does it know more ?

Best Regards,
Laszlo Papp


[aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-08 Thread Xyne
Hi,

There was no response on the pacman-dev list but someone here might
find this potentially useful:

http://mailman.archlinux.org/pipermail/pacman-dev/2010-January/010322.html

It's written in Perl but it could easily be adapted to Python. It
handles Bash variable interpolation and string substitutions so far.
I'll add support for other PKGBUILD "tricks"as I come across them, e.g.
setting dependencies by architecture. The post linked above includes
simplified JSON output from the split kernel26 package.

Regards,
Xyne