Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-09 Thread Sebastian Nowicki


On 09/01/2010, at 2:50 AM, Xyne wrote:


What was the problem with the one from Sebastian which was discussed
earlier on the mailing lists and IRC? How does it know more?



I don't know. I wrote this because I needed a PKGBUILD parser in Perl
for Bauerbill. Maybe it's better, maybe it's worse. I posted it here in
case someone finds it useful per se, or wishes to take any of the ideas
from it and use them to improve other parsers.


It is quite a clever idea. I haven't seen this approach before. I  
haven't looked at it thoroughly, but it looks like you're simply  
sourcing the PKGBUILD with some trickery not to execute the code. Why  
then the need for further parsing? Does `set` produce raw bash, e.g.  
'source=(https://localhost/$pkgname.tgz)'? It seems like bash should  
be able to do it itself. If that were the case, the parser would be  
extremely reliable (definitely more so than mine). There are still  
some safety issues involved, although maybe not for your purposes.  
One major thing is infinite loops - there's no way to break them. I'm  
sure this parser will be very useful when such things aren't an issue.



Hmmm, after briefly reviewing the messages, I can mention that my
parser:
* doesn't depend on Yacc/Lex
* supports split packages already
* handles multiline assignments
* supports interpolation and string substitutions


For the record pkgparse does support split packages and word  
substitutions (though it's primitive atm, i.e. only $foo works,  
modifiers like ${foo##bar} don't). The major problem is with multiline  
assignments, but once that gets fixed, most PKGBUILDs should be  
parseable. It probably won't depend on yacc/lex anymore either, but it will  
depend on Lemon/Ragel, as that's the direction it seems to be  
going :P. It's a compile-time dependency though, so it's not really a  
reason not to use it. To use it in perl you'd have to make perl  
bindings, which would require compilation anyway.




Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-09 Thread Xyne
 It is quite a clever idea. I haven't seen this approach before. I  
 haven't looked at it thoroughly, but it looks like you're simply  
 sourcing the PKGBUILD with some trickery not to execute the code. Why  
 then the need for further parsing? Does `set` produce raw bash, e.g.  
 'source=(https://localhost/$pkgname.tgz)'? It seems like bash should  
 be able to do it itself. If that were the case, the parser would be  
 extremely reliable (definitely more so than mine). There are still  
 some safety issues involved, although maybe not for your purposes.  
 One major thing is infinite loops - there's no way to break them. I'm  
 sure this parser will be very useful when such things aren't an issue.

You haven't fully understood how it works so I hope you don't mind if I
try to explain it again.

I first check the PKGBUILD with /bin/bash -n PKGBUILD. If this
command exits without error then the PKGBUILD contains valid syntax,
most importantly it does not contain extra closing brackets (}).
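
A minimal sketch of this syntax check, written in Python rather than the
Perl of the parser under discussion. The helper name has_valid_syntax and
both sample texts are assumptions made for illustration; only the
bash -n call comes from the message.

```python
import subprocess
import tempfile
from pathlib import Path


def has_valid_syntax(pkgbuild_text: str) -> bool:
    """Return True if `bash -n` accepts the text.

    The -n option makes bash read commands without executing them.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "PKGBUILD"
        path.write_text(pkgbuild_text)
        result = subprocess.run(
            ["bash", "-n", str(path)], capture_output=True, text=True
        )
        return result.returncode == 0


# An ordinary file with balanced braces (hypothetical sample).
balanced = "pkgname=foo\nbuild() {\n  make\n}\n"

# An attempt to close the wrapping function early and reopen a new one.
# Wrapped in "pkgbuild () { ... }" it would be valid bash, but on its own
# the stray closing brace is a syntax error.
escaping = ":\n}\necho escaped\nother() {\n:\n"

print(has_valid_syntax(balanced))   # True
print(has_valid_syntax(escaping))   # False
```

The second sample is the case the check exists for: it only becomes
valid once the wrapper is added, so testing the unwrapped file first
filters it out.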

This lets me wrap the entire PKGBUILD in a function, e.g.
pkgbuild () {
PKGBUILD
}

I can then source the file with Bash without executing any code. The
previous check with bash -n guarantees that the PKGBUILD can not
escape the wrapping function. Because all code is inside a function,
sourcing the file does not execute any code at all.

Bash simply parses the file and stores the code itself in the
pkgbuild function, which itself contains other variables and
functions (e.g. package_foo, build). Because the code has not been
executed, the variables have not been expanded/interpolated and thus
still contain things such as http://example.com/$pkgname-$pkgver.tar,
which is why they must still be interpolated by the parser.

The advantage of this method is that set will print out the
pkgbuild function and its contents in a canonical form, e.g. all
assignments to a variable are on a single line, if/then/else statements
follow a single format, etc.
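
The check, wrap and source steps described above can be sketched as
follows, again in Python. This is an illustration under assumptions, not
the parser's actual code: the sample PKGBUILD and the helper name
canonical_form are made up, and declare -f pkgbuild is used in place of
set so that only the wrapping function is printed rather than every
shell variable. The timeout is a safety net for the sketch itself.

```python
import subprocess
import tempfile
from pathlib import Path

# Hypothetical PKGBUILD. The loop at the top would hang the shell if
# sourcing the wrapped file executed any of this code.
SAMPLE = '''while true; do :; done
pkgname=foo
pkgver=1.0
source=("http://example.com/$pkgname-$pkgver.tar")
build() {
  make
}
'''


def canonical_form(pkgbuild_text: str) -> str:
    """Wrap the text in a function, source it, and return bash's printout."""
    with tempfile.TemporaryDirectory() as tmp:
        plain = Path(tmp) / "PKGBUILD"
        plain.write_text(pkgbuild_text)
        # Reject anything that fails the syntax check before wrapping it.
        subprocess.run(["bash", "-n", str(plain)], check=True, capture_output=True)

        wrapped = Path(tmp) / "wrapped.sh"
        wrapped.write_text("pkgbuild () {\n" + pkgbuild_text + "\n}\n")
        # Sourcing only defines pkgbuild; declare -f prints its body in
        # bash's normalized layout.
        result = subprocess.run(
            ["bash", "-c", 'source "$1" && declare -f pkgbuild', "_", str(wrapped)],
            check=True, capture_output=True, text=True, timeout=10,
        )
        return result.stdout


print(canonical_form(SAMPLE))
```

In the printed function the assignments appear one per line with the
variable references still unexpanded, which is the input the
interpolation step works from.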

This makes it possible to easily parse the assignments themselves, in
the order that they occur, without having to consider all variations of
valid whitespace in statements. The parser simply needs to recognize
Bash syntax for things such as string substitutions, but this is a
relatively limited set so it is not difficult to handle all such cases.
The output of set also guarantees that you have a representation of
all variable assignments (in sequential order, and within their local
environment) so you have all the information that you need to
interpolate them. You could even handle command output if you wish,
using a command white-list to make sure that no trickery is used to run
malicious code.
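
As a rough illustration of the interpolation step, the Python sketch
below walks canonical assignment lines in order and expands $name and
${name} references from values seen earlier. It is deliberately
simplified: the sample lines, the helper name interpolate and both
regular expressions are assumptions, and arrays, modifiers such as
${foo##bar}, local scopes and command output are not handled.

```python
import re

# Lines as they might appear in the canonical printout (hypothetical sample).
CANONICAL = '''    pkgname=foo;
    pkgver=1.0;
    url="http://example.com/$pkgname-${pkgver}.tar";
'''

# name=value with an optional trailing semicolon.
ASSIGNMENT = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)=(.*?);?\s*$')
# Either ${name} or $name.
REFERENCE = re.compile(r'\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))')


def interpolate(canonical: str) -> dict:
    """Return variable values with earlier assignments substituted in."""
    values = {}

    def lookup(ref):
        # Unset variables expand to the empty string, as they would in bash.
        return values.get(ref.group(1) or ref.group(2), "")

    for line in canonical.splitlines():
        match = ASSIGNMENT.match(line)
        if not match:
            continue
        name, raw = match.groups()
        if len(raw) >= 2 and raw[0] == "'" and raw[-1] == "'":
            # Single-quoted strings are literal; no expansion.
            values[name] = raw[1:-1]
            continue
        if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
            raw = raw[1:-1]
        values[name] = REFERENCE.sub(lookup, raw)
    return values


result = interpolate(CANONICAL)
print(result["url"])   # http://example.com/foo-1.0.tar
```

Because the lines are processed in order, a later reassignment of
pkgver would only affect the values computed after it, which matches
the sequential-order point made above.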

Let me repeat that my method does not run any code in the
PKGBUILD. I've tested this by including an infinite loop at the top of
the file and it was not executed. I actually believe that this method
provides a perfectly safe and potentially very reliable method of
retrieving all metadata in the PKGBUILD with very few dependencies
and considerable portability.


Regards,
Xyne


Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-09 Thread Loui Chang
On Sat 09 Jan 2010 21:23 +0100, Xyne wrote:
 You haven't fully understood how it works so I hope you don't mind if I
 try to explain it again.
 
 I first check the PKGBUILD with /bin/bash -n PKGBUILD. If this
 command exits without error then the PKGBUILD contains valid syntax,
 most importantly it does not contain extra closing brackets (}).
 [...]

Wow this is quite clever. It definitely would make the job of parsing
much easier. Thanks for the explanation.



Re: [aur-dev] patch for AUR about setting the DEFAULT_LANG

2010-01-09 Thread Loui Chang
On Mon 30 Nov 2009 17:27 +0800, Athurg Gooth wrote:
 When I ported a Chinese version of the AUR, I found this bug: once I set a
 default language to something (e.g. zh_CN) by changing the DEFAULT_LANG macro
 definition in web/lib/config.inc, it doesn't work, and the page for that
 language (here zh_CN) does not show its native strings.

DEFAULT_LANG was supposed to indicate the language that strings in the
code are written in, so that if someone asked for 'en' then the code
wouldn't look for en.po and come up with an error.

I think your idea makes more sense though.

 Then I went back to check whether I had made a spelling mistake. But I found
 that the developers have told us that the options couldn't be changed (in
 web/lib/config.inc, line 48). So I think maybe it's a bug which hasn't been
 fixed.
 
 After I checked all the code about language setting, I think I found the
 reason. We have two problems which cause that bug.
 
 First, in .../web/lib/aur.inc, between line 296 and line 298. Even when
 $LANG==DEFAULT_LANG, we should include the $LANG.po file, because once
 DEFAULT_LANG isn't English, we also need to translate the strings.
 So I suggest adding an 'else' branch after line 298 to include
 DEFAULT_LANG.po, such as: else{include_once(DEFAULT_LANG.".po");}
 
 Second, in .../web/lib/translator.inc, between line 52 and line 62. The
 reason is the same as I said above. If we haven't set a $LANG var,
 $LANG will be set to DEFAULT_LANG. But DEFAULT_LANG doesn't mean
 English. Even if $LANG hasn't been set, $_t may be set (see First
 above) when including from DEFAULT_LANG.po. We should also translate them.
 So I think we should remove the 'else' keyword, so that the 'else' branch
 always runs.
 
 By the way, is the function include_lang() in .../web/lib/translator.inc,
 between line 32 and line 40, an old language function? Maybe
 we should remove it.

Indeed. I think we can remove it now.

 I am preparing to make a mirror of the AUR for Chinese users. How could I get
 the databases and files from aur.archlinux.org? Or can't I make a mirror
 of it?

It's great to hear that people are playing with the AUR code.
Thanks for the patch. I've applied a slightly modified version. It has
helped reveal some redundant code that we could eliminate too. Please
let us know more about your ideas about mirroring the AUR.

Thanks and cheers!
Sorry about the delay.



Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.

2010-01-09 Thread Xyne
Loui Chang wrote:

 Wow this is quite clever. It definitely would make the job of parsing
 much easier. Thanks for the explanation.

:)

I intend to flesh out the parser as special cases pop up. As already
mentioned, there will be limits to what it can do depending on whether
the packager uses command output to set variables, but perhaps Arch
could eventually impose a de facto standard for PKGBUILDs using the
parser itself as the standard, i.e. if the PKGBUILD metadata does not get
past the parser, the PKGBUILD itself would be considered invalid. In
that case, the parser would support tricks such as

[ $ARCH == x86_64 ] && depends=('foo' 'bar')

I want to be very clear that I am NOT suggesting that my parser become
the standard, only that a parser based on this approach _could_ become
one.

Also note that this is really a method on its own that just happens to
be implemented in Perl in this case. If you look at the code, you will
see that it could very quickly be adapted to Python (and thus Django),
or PHP, or just about anything.