Bug#486425: ITP: bomstrip -- strip Byte-Order Marks from UTF-8 text files

2008-06-17 Thread Peter Pentchev
On Mon, Jun 16, 2008 at 12:28:52PM +0300, Lars Wirzenius wrote:
 On Mon, 2008-06-16 at 12:14 +0300, Peter Pentchev wrote:
  Hm.  Okay, so maybe the two command-line utilities and the collection
  might be separated.  IMHO, the collection *is* still useful on its own :)
  If others share this opinion, I may either create two separate packages,
  or just remove the command-line utilities and file a wishlist bug
  against coreutils or textutils or something like that.  How does that
  strike you?  What do others think?
 
 I wish to ask this question: is packaging this collection directly
 useful to Debian users? How? If the task is to remove BOMs from files,
 then surely that should be served by a one-line sed or perl script (plus
 manual page, copyright license file, etc, so a total cost of about 200
 KiB).
 
 I'm not saying that the collection should not be packaged, I'm just
 finding it hard to imagine it being useful as a Debian package.

Yes, I agree with your reasoning.  Okay then, so the new plan is
to package the bomstrip and bomstrip-files utilities and the manual
page, and leave developers who are interested in the various
implementations to take a look at the source package.

As to the packaging, though, I wonder.  The sed and awk implementations
of bomstrip have problems with files that do not end in a newline.
Thus, I would prefer to package either the Perl or the C implementation.
The C implementation is a bit larger (ELF and stuff), but the Perl
implementation has a run-time dependency on, well, Perl :)
My personal preference would be towards the C version to avoid
the unnecessary dependency, but if people feel that 5 KB is too big
for such a utility, I'll package up the Perl version instead :)
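
One way to sidestep newline handling altogether is to treat the input as bytes rather than lines: look only at the first three bytes and copy everything else verbatim. The following is a minimal Python sketch of that approach, assuming buffered binary file-like objects. It is not the Perl or C code from the bomstrip distribution, and the name bomstrip_stream is hypothetical.

```python
import io
import shutil

# The UTF-8 encoding of U+FEFF, i.e. the three-byte BOM.
BOM = b"\xef\xbb\xbf"


def bomstrip_stream(src, dst):
    """Copy the binary stream src to dst, dropping a leading UTF-8 BOM.

    Only the first three bytes are inspected; the rest is copied
    unchanged, so a missing final newline stays missing.
    """
    head = src.read(len(BOM))
    if head != BOM:
        # No BOM (or input shorter than three bytes): keep what was read.
        dst.write(head)
    shutil.copyfileobj(src, dst)


# Demonstration on an in-memory "file" that does not end in a newline.
source = io.BytesIO(BOM + b"last line without newline")
target = io.BytesIO()
bomstrip_stream(source, target)
print(target.getvalue())
```

Used as a command-line filter, the same function could be called with sys.stdin.buffer and sys.stdout.buffer as its two arguments.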

In another message, Bryan Donlan wrote:
 Would the collection really be useful in /binary/ form, however? If
 the goal is to show how easy it is to write, installing a bunch of
 functionally identical
 /usr/bin/bomstrip.{c,ada,cplusplus,haskell,ocaml} binaries won't
 demonstrate much :)

Errr, that would have been true if the package only distributed
the compiled versions; however, the whole point of my original ITP
was to install all the implementations into /usr/share/bomstrip/source/
and only one executable as /usr/bin/bomstrip :) (and a helper tool as
/usr/bin/bomstrip-files, and manpages for both)

But, yep, as I wrote above, the plan now is to only package the tools
and the manpages.

Thanks to everyone for the comments!

G'luck,
Peter

-- 
Peter Pentchev  [EMAIL PROTECTED][EMAIL PROTECTED][EMAIL PROTECTED]
PGP key: http://people.FreeBSD.org/~roam/roam.key.asc
Key fingerprint FDBA FD79 C26F 3C51 C95E  DF9E ED18 B68D 1619 4553
This would easier understand fewer had omitted.




Bug#486425: ITP: bomstrip -- strip Byte-Order Marks from UTF-8 text files

2008-06-16 Thread Guus Sliepen
On Mon, Jun 16, 2008 at 03:08:02AM +0300, Peter Pentchev wrote:

 * Package name: bomstrip
   Programming Lang: Awk, Brainf*ck, C, C++, Forth, Haskell, OCaml, Ook!,
                     Pascal, PHP, Perl, PostScript, Python, Ruby, sed,
                     Unlambda

All these programming languages got me wondering. Apparently the same
program is implemented in all these languages. But you only need one to
get the desired functionality. Also, I see the sed variant is just a
one-liner. Perhaps it is better if this functionality is merged with a
package like coreutils or recode, if it is not already there in some way.

-- 
Met vriendelijke groet / with kind regards,
  Guus Sliepen [EMAIL PROTECTED]




Bug#486425: ITP: bomstrip -- strip Byte-Order Marks from UTF-8 text files

2008-06-16 Thread Peter Pentchev
On Mon, Jun 16, 2008 at 10:55:47AM +0200, Guus Sliepen wrote:
 On Mon, Jun 16, 2008 at 03:08:02AM +0300, Peter Pentchev wrote:
 
  * Package name: bomstrip
    Programming Lang: Awk, Brainf*ck, C, C++, Forth, Haskell, OCaml, Ook!,
                      Pascal, PHP, Perl, PostScript, Python, Ruby, sed,
                      Unlambda
 
 All these programming languages got me wondering. Apparently the same
 program is implemented in all these languages. But you only need one to
 get the desired functionality. Also, I see the sed variant is just a
 one-liner. Perhaps it is better if this functionality is merged with a
 package like coreutils or recode, if it is not already there in some way.

As the author writes on his website, the whole point of the bomstrip
project being a collection of implementations is more of a social /
political goal of spreading the word, showing how easy it is,
bringing attention to the broken UTF-8 text files that some programs
generate, and so on.

IMHO, the distribution also serves as a nice way to demonstrate
a simple (well, admittedly, a *very* simple :)) task done in various
languages.

Hm.  Okay, so maybe the two command-line utilities and the collection
might be separated.  IMHO, the collection *is* still useful on its own :)
If others share this opinion, I may either create two separate packages,
or just remove the command-line utilities and file a wishlist bug
against coreutils or textutils or something like that.  How does that
strike you?  What do others think?

G'luck,
Peter

-- 
Peter Pentchev  [EMAIL PROTECTED][EMAIL PROTECTED][EMAIL PROTECTED]
PGP key: http://people.FreeBSD.org/~roam/roam.key.asc
Key fingerprint FDBA FD79 C26F 3C51 C95E  DF9E ED18 B68D 1619 4553
Do you think anybody has ever had *precisely this thought* before?




Bug#486425: ITP: bomstrip -- strip Byte-Order Marks from UTF-8 text files

2008-06-16 Thread Lars Wirzenius
On Mon, 2008-06-16 at 12:14 +0300, Peter Pentchev wrote:
 Hm.  Okay, so maybe the two command-line utilities and the collection
 might be separated.  IMHO, the collection *is* still useful on its own :)
 If others share this opinion, I may either create two separate packages,
 or just remove the command-line utilities and file a wishlist bug
 against coreutils or textutils or something like that.  How does that
 strike you?  What do others think?

I wish to ask this question: is packaging this collection directly
useful to Debian users? How? If the task is to remove BOMs from files,
then surely that should be served by a one-line sed or perl script (plus
manual page, copyright license file, etc, so a total cost of about 200
KiB).

I'm not saying that the collection should not be packaged, I'm just
finding it hard to imagine it being useful as a Debian package.





-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#486425: ITP: bomstrip -- strip Byte-Order Marks from UTF-8 text files

2008-06-16 Thread Bryan Donlan
On Mon, Jun 16, 2008 at 5:14 AM, Peter Pentchev [EMAIL PROTECTED] wrote:
 On Mon, Jun 16, 2008 at 10:55:47AM +0200, Guus Sliepen wrote:
 On Mon, Jun 16, 2008 at 03:08:02AM +0300, Peter Pentchev wrote:

  * Package name: bomstrip
    Programming Lang: Awk, Brainf*ck, C, C++, Forth, Haskell, OCaml, Ook!,
                      Pascal, PHP, Perl, PostScript, Python, Ruby, sed,
                      Unlambda

 All these programming languages got me wondering. Apparently the same
 program is implemented in all these languages. But you only need one to
 get the desired functionality. Also, I see the sed variant is just a
 one-liner. Perhaps it is better if this functionality is merged with a
 package like coreutils or recode, if it is not already there in some way.

 As the author writes on his website, the whole point of the bomstrip
 project being a collection of implementations is more of a social /
 political goal of spreading the word, showing how easy it is,
 bringing attention to the broken UTF-8 text files that some programs
 generate, and so on.

 IMHO, the distribution also serves as a nice way to demonstrate
 a simple (well, admittedly, a *very* simple :)) task done in various
 languages.

 Hm.  Okay, so maybe the two command-line utilities and the collection
 might be separated.  IMHO, the collection *is* still useful on its own :)
 If others share this opinion, I may either create two separate packages,
 or just remove the command-line utilities and file a wishlist bug
 against coreutils or textutils or something like that.  How does that
 strike you?  What do others think?

Would the collection really be useful in /binary/ form, however? If
the goal is to show how easy it is to write, installing a bunch of
functionally identical
/usr/bin/bomstrip.{c,ada,cplusplus,haskell,ocaml} binaries won't
demonstrate much :)






Bug#486425: ITP: bomstrip -- strip Byte-Order Marks from UTF-8 text files

2008-06-15 Thread Peter Pentchev
Package: wnpp
Severity: wishlist
Owner: Peter Pentchev [EMAIL PROTECTED]

* Package name    : bomstrip
  Version         : 8
  Upstream Author : Mechiel Lukkien [EMAIL PROTECTED]
* URL             : http://www.xs4all.nl/~mechiel/projects/bomstrip/
* License         : public domain
  Programming Lang: Awk, Brainf*ck, C, C++, Forth, Haskell, OCaml, Ook!,
                    Pascal, PHP, Perl, PostScript, Python, Ruby, sed,
                    Unlambda
  Description     : Strip Byte-Order Marks from UTF-8 text files

The bomstrip distribution is a collection of filters stripping
the three-byte Byte-Order Mark from UTF-8 text - in UTF-8, the BOM
is not even needed, and it is often actually harmful.  More information
about the bomstrip distribution may be found on the author's site,
http://www.xs4all.nl/~mechiel/projects/bomstrip/
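
To illustrate what such a filter does, here is a minimal Python sketch. It is not taken from the bomstrip distribution, and the function name strip_bom is hypothetical. It relies only on the fact that the UTF-8 BOM is the code point U+FEFF encoded as the three bytes EF BB BF, which Python's standard library exposes as codecs.BOM_UTF8.

```python
import codecs

# U+FEFF encoded in UTF-8: the three bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM.

    Input that does not start with the BOM is returned unchanged.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


print(strip_bom(UTF8_BOM + b"hello\n"))  # b'hello\n'
print(strip_bom(b"hello\n"))             # b'hello\n'
```

Only a BOM at the very start of the data is removed; the same byte sequence appearing later in the text is left alone.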

What I intend to package is bomstrip-8 with a couple of my own changes
as listed on the http://devel.ringlet.net/textproc/bomstrip/ webpage;
most probably, the bomstrip-8-roam-06 version, if I don't come up
with anything more in the meantime :)  Of course, the changes have
been sent to the upstream author, and if he decides to release a new
version, I'll package it instead :)

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.18-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968) (ignored: LC_ALL set to C)
Shell: /bin/sh linked to /bin/bash

