Re: Output of dpkg-scanpackages as XML

2005-01-13 Thread Frank S. Thomas
On Thursday 06 January 2005 02:14, Sven Mueller wrote: I would appreciate it if you could make your scripts available once they are (nearly;-)) finished. Ok, I nearly finished it :) It is written in Python and is called dctrl2xml. Binary and source packages are apt-getable from my private

Re: Output of dpkg-scanpackages as XML

2005-01-13 Thread William Ballard
On Thu, Jan 13, 2005 at 08:14:31PM +0100, Frank S. Thomas wrote: I could successfully convert sid's binary-i386/source package files into XML. However, because dctrl2xml builds up a dom tree of all packages in these files, this is horribly slow. There may also be some cases where dctrl2xml
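One way around the cost of building a DOM tree for every package is to stream the records and write each one out as soon as it is read. The following is only a minimal sketch of that idea, not how dctrl2xml works: the function names `iter_records` and `write_xml` and the `<packages>`/`<entry>` element names are assumptions chosen for illustration, and it assumes every field name is also a valid XML element name.

```python
import io
from xml.sax.saxutils import escape


def iter_records(lines):
    """Yield one dict per control-file paragraph (paragraphs are blank-line separated)."""
    record, last = {}, None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # Blank line: the current paragraph is complete.
            if record:
                yield record
            record, last = {}, None
        elif line[0] in " \t" and last is not None:
            # Continuation line: append to the previous field's value.
            record[last] += "\n" + line[1:]
        elif ":" in line:
            name, _, value = line.partition(":")
            last = name.strip()
            record[last] = value.strip()
    if record:
        yield record


def write_xml(lines, out):
    """Write one <entry> per record without keeping earlier records in memory."""
    out.write("<packages>\n")
    for rec in iter_records(lines):
        out.write("  <entry>\n")
        for name, value in rec.items():
            # escape() replaces &, < and > so the value is safe as element text.
            out.write(f"    <{name}>{escape(value)}</{name}>\n")
        out.write("  </entry>\n")
    out.write("</packages>\n")


if __name__ == "__main__":
    sample = "Package: foo\nVersion: 1.0\n\nPackage: bar\nVersion: 2\n"
    buf = io.StringIO()
    write_xml(io.StringIO(sample), buf)
    print(buf.getvalue())
```

Because only one record is held at a time, memory use stays flat regardless of how many package records the input contains.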

Re: Output of dpkg-scanpackages as XML

2005-01-13 Thread William Ballard
On Thu, Jan 13, 2005 at 10:06:25PM +0100, Frank S. Thomas wrote: Using Python's DOM is (as far as I can see) the easiest and most robust way to accomplish the task. It works, but it is slow when it processes a file with more than 8000 package records. But this was never my goal, I only wanted to

Re: Output of dpkg-scanpackages as XML

2005-01-07 Thread Antti-Juhani Kaijanaho
On 20050105T163207-0500, William Ballard wrote: echo 'packagesentry' zcat /a/dists/latest/binary-i386/Packages.gz | \ grep-dctrl . | sed -r -e 's/(Description): (.+)/\1Short-Description\2\/Short-DescriptionLong-Decription![CDATA[/' \ -e 's/([^:]+): (.+)/\1\2\/\1/' \ -e

Re: Output of dpkg-scanpackages as XML

2005-01-07 Thread William Ballard
On Fri, Jan 07, 2005 at 05:36:12PM +0200, Antti-Juhani Kaijanaho wrote: Not that I'm not flattered by the fact that you use grep-dctrl, but ... Is it really your intent here to filter out those packages whose packages record does not contain a literal dot? Sounds like a quite puzzling

Re: Output of dpkg-scanpackages as XML

2005-01-06 Thread Sam Watkins
On Wed, Jan 05, 2005 at 11:24:46PM +, David Given wrote: It's still an ad-hoc solution, though; does anyone know of versions of the standard textutils that know about Unicode? the Plan 9 ones would use utf-8, but I suppose they're not POSIX.

Output of dpkg-scanpackages as XML

2005-01-05 Thread Frank S. Thomas
Hi, I want to publish on my homepage a list of the packages that are in my private package repository. It would therefore be useful if I could convert the output of 'dpkg-scanpackages' and 'dpkg-scansources' into XML, so that I only have to write the appropriate XSLT stylesheets. Is there any

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread William Ballard
On Wed, Jan 05, 2005 at 09:57:26PM +0100, Frank S. Thomas wrote: Hi, I want to publish on my homepage a list of the packages that are in my private package repository. It would therefore be useful if I could convert the output of 'dpkg-scanpackages' and 'dpkg-scansources' into XML, so that I

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread William Ballard
On Wed, Jan 05, 2005 at 04:21:38PM -0500, William Ballard wrote: If you want to include the short or long descriptions you'd have to wrap those fields in CDATA tags, so you'd need an extra sed expression to handle that. This outputs all fields and splits the short and long descriptions: echo

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread William Ballard
On Wed, Jan 05, 2005 at 04:32:07PM -0500, William Ballard wrote: echo '/Long-Description/entry/packages' ^^^ Should have closed the CDATA tag here. The short description tag should probably be wrapped in CDATA too. If any package descriptions contain ]], it'll break it. You should

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread Justin Pryzby
On Wed, Jan 05, 2005 at 04:42:32PM -0500, William Ballard wrote: On Wed, Jan 05, 2005 at 04:32:07PM -0500, William Ballard wrote: echo '/Long-Description/entry/packages' ^^^ Should have closed the CDATA tag here. The short description tag should probably be wrapped in CDATA

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread William Ballard
On Wed, Jan 05, 2005 at 04:44:45PM -0500, Justin Pryzby wrote: On Wed, Jan 05, 2005 at 04:42:32PM -0500, William Ballard wrote: Is there a unicode shell which does all piping in Unicode? cmd.exe in NT has a switch that does all piping in Unicode Does it make a difference? Shouldn't a pipe

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread David Given
William Ballard wrote: [...] Of course you're right. But building XML with shell commands was always a lot easier when I could count on all shell output being 2-byte Unicode. It was a neat bit of magic, ascii and utf-8 text files would get turned into Unicode and I'd pipe them to cscript.exe and

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread William Ballard
On Wed, Jan 05, 2005 at 10:36:53PM +, David Given wrote: iconv is your friend: zcat Packages.gz | iconv -f utf8 -t ucs2-le | cscript In our case we're using sed. Is sed unicode-aware? (As an aside a lot of the commands you use in NT are builtins to cmd.exe and under this switch they

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread David Given
William Ballard wrote: [...] But back to Linux. $echo hi | iconv -f utf8 -t unicode | grep hi (no output) Not surprised; grep understands ASCII, AFAIK, so what you've just sent to it is: $ echo hi | iconv -f utf8 -t unicode | od -t x1 000 ff fe 68 00 69 00 0a 00 It can't find an 'h' and an
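The byte dump quoted above can be reproduced in a few lines of Python. This is a small illustrative sketch, assuming Python 3.8 or later (for the separator argument to `bytes.hex`); the function name `utf16_with_bom` is made up for the example. It shows why a byte-oriented search for `hi` fails on UTF-16 data: the letters are separated by zero bytes.

```python
import codecs


def utf16_with_bom(text):
    """Encode text as little-endian UTF-16 with a leading byte order mark."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


data = utf16_with_bom("hi\n")

# Same bytes as in the od output: ff fe 68 00 69 00 0a 00
print(data.hex(" "))

# A byte-wise search for the ASCII string finds nothing ...
print(b"hi" in data)

# ... but decoding first makes the text searchable again.
print("hi" in data.decode("utf-16"))
```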

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread Sven Mueller
William Ballard wrote on 05/01/2005 22:42: On Wed, Jan 05, 2005 at 04:32:07PM -0500, William Ballard wrote: echo '/Long-Description/entry/packages' ^^^ Should have closed the CDATA tag here. The short description tag should probably be wrapped in CDATA too. If any package descriptions

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread William Ballard
On Thu, Jan 06, 2005 at 12:28:47AM +0100, Sven Mueller wrote: <CDATA>\2<\/CDATA> <![CDATA[This is the correct syntax for CDATA. It can contain embedded < and > characters.]]> The only thing it can't contain is ]]>. You didn't use an actual CDATA node; you used an element named CDATA. You can leave out the
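The one sequence a CDATA section cannot contain is its own terminator, `]]>`. A common workaround is to end the section in the middle of that sequence and immediately open a new one. The helper below is a minimal sketch of that workaround; the name `cdata` is hypothetical and not taken from any script in this thread.

```python
import xml.etree.ElementTree as ET


def cdata(text):
    """Wrap text in a CDATA section, splitting any embedded ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


if __name__ == "__main__":
    tricky = "a ]]> b <x> & y"
    doc = "<Description>" + cdata(tricky) + "</Description>"
    # A conforming parser joins the adjacent sections back into the original text.
    print(ET.fromstring(doc).text == tricky)
```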

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread William Ballard
On Wed, Jan 05, 2005 at 07:11:06PM -0500, William Ballard wrote: You can leave out the CDATA element. Forgot to mention parsers are not obligated to respect whitespace and newlines unless it's in a <![CDATA[ ]]> tag, though they usually do (and they have flags to control this). It's like the

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread Frank S. Thomas
On Wednesday 05 January 2005 22:21, William Ballard wrote: This is trivial with grep-dctrl and sed. For example: Thanks, for making me aware of grep-dctrl. echo 'packagesentry' zcat /a/dists/latest/binary-i386/Packages.gz | \ grep-dctrl -sPackage,Version . | \ sed -r -e

Re: Output of dpkg-scanpackages as XML

2005-01-05 Thread Sven Mueller
Frank S. Thomas wrote on 06/01/2005 01:46: Thanks so far. I think I'll write my own PHP script that will output a more structured XML document, so that I can create hyperlinks from the packages in the 'Depends' field and parse the quasi field 'Homepage'. I would appreciate it if you could make
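Turning the 'Depends' field into hyperlinks first requires splitting it into individual package names. The following is a rough sketch in Python rather than PHP, under the assumption that the field uses the usual comma-separated list with `|` for alternatives and an optional version constraint in parentheses; the function name `parse_depends` is hypothetical, and architecture qualifiers are simply ignored.

```python
import re

# Package name followed by an optional "(constraint)" part.
_DEP = re.compile(r"\s*([a-z0-9][a-z0-9+.\-]*)\s*(?:\(([^)]*)\))?")


def parse_depends(value):
    """Return a list of groups; each group lists alternative (name, constraint) pairs."""
    groups = []
    for clause in value.split(","):
        alternatives = []
        for alt in clause.split("|"):
            match = _DEP.match(alt)
            if match:
                alternatives.append((match.group(1), match.group(2)))
        if alternatives:
            groups.append(alternatives)
    return groups


if __name__ == "__main__":
    for group in parse_depends("libc6 (>= 2.3.2), python | python2.3"):
        print(group)
```

Each package name in the result can then be emitted as its own XML element or link target, with the constraint kept as an attribute or child element as the stylesheet requires.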
