Bug#486425: ITP: bomstrip -- strip Byte-Order Marks from UTF-8 text files
On Mon, Jun 16, 2008 at 12:28:52PM +0300, Lars Wirzenius wrote:
> On Mon, 2008-06-16 at 12:14 +0300, Peter Pentchev wrote:
> > Hm. Okay, so maybe the two command-line utilities and the collection
> > might be separated. IMHO, the collection *is* still useful on its
> > own :) If others share this opinion, I may either create two separate
> > packages, or just remove the command-line utilities and file a
> > wishlist bug against coreutils or textutils or something like that.
> > How does that strike you? What do others think?
>
> I wish to ask this question: is packaging this collection directly
> useful to Debian users? How? If the task is to remove BOMs from files,
> then surely that should be served by a one-line sed or perl script
> (plus manual page, copyright license file, etc, so a total cost of
> about 200 KiB). I'm not saying that the collection should not be
> packaged, I'm just finding it hard to imagine it being useful as a
> Debian package.

Yes, I agree with your reasoning. Okay then, so the new plan is to
package the bomstrip and bomstrip-files utilities and the manual page,
and leave developers who are interested in the various implementations
to take a look at the source package.

As to the packaging, though, I wonder. The sed and awk implementations
of bomstrip have problems with files that do not end in a newline.
Thus, I would prefer to package either the Perl or the C implementation.
The C implementation is a bit larger (ELF and stuff), but the Perl
implementation has a run-time dependency on, well, Perl :) My personal
preference would be towards the C version to avoid the unnecessary
dependency, but if people feel that 5 KB is too big for such a utility,
I'll package up the Perl version instead :)

In another message, Bryan Donlan wrote:
> Would the collection really be useful in /binary/ form, however?
> If the goal is to show how easy it is to write, installing a bunch of
> functionally identical
> /usr/bin/bomstrip.{c,ada,cplusplus,haskell,ocaml} binaries won't
> demonstrate much :)

Errr, that would have been true if the package only distributed the
compiled versions; however, the whole point of my original ITP was to
install all the implementations into /usr/share/bomstrip/source/ and
only one executable as /usr/bin/bomstrip :) (and a helper tool as
/usr/bin/bomstrip-files, and manpages for both)

But, yep, as I wrote above, the plan now is to only package the tools
and the manpages.

Thanks to everyone for the comments!

G'luck,
Peter

-- 
Peter Pentchev [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]
PGP key: http://people.FreeBSD.org/~roam/roam.key.asc
Key fingerprint FDBA FD79 C26F 3C51 C95E DF9E ED18 B68D 1619 4553
This would easier understand fewer had omitted.
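The newline problem mentioned above for the sed and awk versions is presumably a side effect of those tools working line by line. A filter that works on raw bytes never looks at line endings at all. The following is a minimal sketch of that idea in Python (one of the languages in the upstream collection); it is not the upstream implementation, and the name strip_bom_stream and the chunk size are assumptions made for illustration.

```python
import io

# The UTF-8 encoding of U+FEFF, i.e. the three-byte BOM.
UTF8_BOM = b"\xef\xbb\xbf"


def strip_bom_stream(src, dst, chunk_size=65536):
    """Copy the binary stream src to dst, dropping one leading UTF-8 BOM.

    Hypothetical helper: src must be a buffered binary stream whose
    read(n) returns n bytes unless end of input is reached (true for
    io.BytesIO and for buffered binary file objects).
    """
    head = src.read(len(UTF8_BOM))
    if head != UTF8_BOM:
        # No BOM (or input shorter than three bytes): pass it through.
        dst.write(head)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)


# Demonstration: the input has a BOM and no trailing newline.
src = io.BytesIO(UTF8_BOM + b"last line without newline")
dst = io.BytesIO()
strip_bom_stream(src, dst)
print(dst.getvalue())  # prints b'last line without newline'
```

Because the data is copied byte for byte after the first three bytes, a missing final newline is preserved exactly as it was in the input. To use this as a command-line filter, one would pass sys.stdin.buffer and sys.stdout.buffer as src and dst.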
Bug#486425: ITP: bomstrip -- strip Byte-Order Marks from UTF-8 text files
On Mon, Jun 16, 2008 at 03:08:02AM +0300, Peter Pentchev wrote:
> * Package name: bomstrip
>   Programming Lang: Awk, Brainf*ck, C, C++, Forth, Haskell, OCaml,
>   Ook!, Pascal, PHP, Perl, PostScript, Python, Ruby, sed, Unlambda

All these programming languages got me wondering. Apparently the same
program is implemented in all these languages. But you only need one to
get the desired functionality. Also, I see the sed variant is just a
one-liner. Perhaps it is better if this functionality is merged with a
package like coreutils or recode, if it is not already there someway.

-- 
Met vriendelijke groet / with kind regards,
Guus Sliepen [EMAIL PROTECTED]
Bug#486425: ITP: bomstrip -- strip Byte-Order Marks from UTF-8 text files
On Mon, Jun 16, 2008 at 10:55:47AM +0200, Guus Sliepen wrote:
> On Mon, Jun 16, 2008 at 03:08:02AM +0300, Peter Pentchev wrote:
> > * Package name: bomstrip
> >   Programming Lang: Awk, Brainf*ck, C, C++, Forth, Haskell, OCaml,
> >   Ook!, Pascal, PHP, Perl, PostScript, Python, Ruby, sed, Unlambda
>
> All these programming languages got me wondering. Apparently the same
> program is implemented in all these languages. But you only need one
> to get the desired functionality. Also, I see the sed variant is just
> a one-liner. Perhaps it is better if this functionality is merged with
> a package like coreutils or recode, if it is not already there someway.

As the author writes on his website, the whole point of the bomstrip
project being a collection of implementations is more of a social /
political goal of spreading the word, showing how easy it is, bringing
attention to the broken UTF-8 text files that some programs generate,
and so on. IMHO, the distribution also serves as a nice way to
demonstrate a simple (well, admittedly, a *very* simple :)) task done
in various languages.

Hm. Okay, so maybe the two command-line utilities and the collection
might be separated. IMHO, the collection *is* still useful on its
own :) If others share this opinion, I may either create two separate
packages, or just remove the command-line utilities and file a wishlist
bug against coreutils or textutils or something like that.

How does that strike you? What do others think?

G'luck,
Peter

-- 
Peter Pentchev [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]
PGP key: http://people.FreeBSD.org/~roam/roam.key.asc
Key fingerprint FDBA FD79 C26F 3C51 C95E DF9E ED18 B68D 1619 4553
Do you think anybody has ever had *precisely this thought* before?
Bug#486425: ITP: bomstrip -- strip Byte-Order Marks from UTF-8 text files
On Mon, 2008-06-16 at 12:14 +0300, Peter Pentchev wrote:
> Hm. Okay, so maybe the two command-line utilities and the collection
> might be separated. IMHO, the collection *is* still useful on its
> own :) If others share this opinion, I may either create two separate
> packages, or just remove the command-line utilities and file a
> wishlist bug against coreutils or textutils or something like that.
> How does that strike you? What do others think?

I wish to ask this question: is packaging this collection directly
useful to Debian users? How? If the task is to remove BOMs from files,
then surely that should be served by a one-line sed or perl script
(plus manual page, copyright license file, etc, so a total cost of
about 200 KiB). I'm not saying that the collection should not be
packaged, I'm just finding it hard to imagine it being useful as a
Debian package.
Bug#486425: ITP: bomstrip -- strip Byte-Order Marks from UTF-8 text files
On Mon, Jun 16, 2008 at 5:14 AM, Peter Pentchev [EMAIL PROTECTED] wrote:
> On Mon, Jun 16, 2008 at 10:55:47AM +0200, Guus Sliepen wrote:
> > On Mon, Jun 16, 2008 at 03:08:02AM +0300, Peter Pentchev wrote:
> > > * Package name: bomstrip
> > >   Programming Lang: Awk, Brainf*ck, C, C++, Forth, Haskell, OCaml,
> > >   Ook!, Pascal, PHP, Perl, PostScript, Python, Ruby, sed, Unlambda
> >
> > All these programming languages got me wondering. Apparently the
> > same program is implemented in all these languages. But you only
> > need one to get the desired functionality. Also, I see the sed
> > variant is just a one-liner. Perhaps it is better if this
> > functionality is merged with a package like coreutils or recode, if
> > it is not already there someway.
>
> As the author writes on his website, the whole point of the bomstrip
> project being a collection of implementations is more of a social /
> political goal of spreading the word, showing how easy it is, bringing
> attention to the broken UTF-8 text files that some programs generate,
> and so on. IMHO, the distribution also serves as a nice way to
> demonstrate a simple (well, admittedly, a *very* simple :)) task done
> in various languages.
>
> Hm. Okay, so maybe the two command-line utilities and the collection
> might be separated. IMHO, the collection *is* still useful on its
> own :) If others share this opinion, I may either create two separate
> packages, or just remove the command-line utilities and file a
> wishlist bug against coreutils or textutils or something like that.
>
> How does that strike you? What do others think?

Would the collection really be useful in /binary/ form, however? If the
goal is to show how easy it is to write, installing a bunch of
functionally identical
/usr/bin/bomstrip.{c,ada,cplusplus,haskell,ocaml} binaries won't
demonstrate much :)
Bug#486425: ITP: bomstrip -- strip Byte-Order Marks from UTF-8 text files
Package: wnpp
Severity: wishlist
Owner: Peter Pentchev [EMAIL PROTECTED]

* Package name    : bomstrip
  Version         : 8
  Upstream Author : Mechiel Lukkien [EMAIL PROTECTED]
* URL             : http://www.xs4all.nl/~mechiel/projects/bomstrip/
* License         : public domain
  Programming Lang: Awk, Brainf*ck, C, C++, Forth, Haskell, OCaml, Ook!,
                    Pascal, PHP, Perl, PostScript, Python, Ruby, sed,
                    Unlambda
  Description     : Strip Byte-Order Marks from UTF-8 text files

The bomstrip distribution is a collection of filters stripping the
three-byte Byte-Order Mark from UTF-8 text - in UTF-8, the BOM is not
even needed, and it is often actually harmful.

More information about the bomstrip distribution may be found on the
author's site, http://www.xs4all.nl/~mechiel/projects/bomstrip/

What I intend to package is bomstrip-8 with a couple of my own changes
as listed on the http://devel.ringlet.net/textproc/bomstrip/ webpage;
most probably, the bomstrip-8-roam-06 version, if I don't come up with
anything more in the meantime :) Of course, the changes have been sent
to the upstream author, and if he decides to release a new version,
I'll package it instead :)

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.18-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968) (ignored: LC_ALL set to C)
Shell: /bin/sh linked to /bin/bash
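For readers unfamiliar with the three-byte mark the description refers to: in UTF-8 the BOM is the encoding of U+FEFF, the bytes EF BB BF, and stripping it means dropping those bytes when they appear at the very start of the data. A minimal sketch in Python follows; it is an illustration only, not code from the bomstrip distribution, and the function name strip_bom is an assumption.

```python
import codecs


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (hypothetical helper).

    Only one BOM at the very start is removed; the same byte sequence
    appearing anywhere else is left untouched.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# codecs.BOM_UTF8 is the three bytes EF BB BF, the UTF-8 form of U+FEFF.
assert codecs.BOM_UTF8 == "\ufeff".encode("utf-8") == b"\xef\xbb\xbf"

print(strip_bom(b"\xef\xbb\xbfhello"))  # prints b'hello'
print(strip_bom(b"hello"))              # prints b'hello'
```

When the data is going to be decoded to text anyway, Python's standard "utf-8-sig" codec does the same job on decoding: bytes.decode("utf-8-sig") skips a leading BOM if one is present.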