Bug#549233: docbook-to-man: Does not accept (some) (unicode) characters

2020-03-10 Thread Agustin Martin
On Thu, Feb 27, 2020 at 03:48:44PM +0100, Agustin Martin wrote:
> 
> I recently tried to play with linuxdoc and utf-8 documents and ran into the
> same problem,
> 
> onsgmls: ... 01.precmdout:1559:71:E: non SGML character number 141
> 
> This time I was lucky and a web search pointed me to
> https://bugzilla.redhat.com/show_bug.cgi?id=66179. After that suggestion, 
> 
> SP_CHARSET_FIXED=yes SP_ENCODING=xml sgml2html FAQ-CervanTeX-utf8.sgml

Better, for UTF-8:

$ SP_CHARSET_FIXED=yes SP_ENCODING=utf-8 sgml2html FAQ-CervanTeX-utf8.sgml
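
For anyone calling the tools from a script rather than a shell, the same two
variables can be passed through the environment of the child process. The
following is only a sketch: the helper names opensp_utf8_env and
run_with_utf8 are made up for this example, and it assumes the tool being
run is one that hands its input to OpenSP.

```python
import os
import subprocess


def opensp_utf8_env(base=None):
    """Return a copy of the environment with the two OpenSP variables set.

    `base` defaults to the current process environment; it is copied, never
    modified in place.
    """
    env = dict(os.environ if base is None else base)
    env["SP_CHARSET_FIXED"] = "yes"
    env["SP_ENCODING"] = "utf-8"
    return env


def run_with_utf8(command):
    """Run `command` (a list of arguments) with the UTF-8 settings applied.

    The return code is left for the caller to inspect (check=False).
    """
    return subprocess.run(command, env=opensp_utf8_env(), check=False)
```

With sgml2html installed, run_with_utf8(["sgml2html", "FAQ-CervanTeX-utf8.sgml"])
would correspond to the command line above.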

-- 
Agustin



Bug#549233: docbook-to-man: Does not accept (some) (unicode) characters

2020-02-27 Thread Agustin Martin
On Sat, Aug 25, 2018 at 10:02:27AM +0200, Helge Kreutzmann wrote:
> reopen 549233
> found 549233 1:2.0.0-42
> severity 549233 minor
> thanks
> 
> Hello Chris,
> On Mon, Aug 20, 2018 at 10:27:11AM +, Debian Bug Tracking System wrote:
> > This is an automatic notification regarding your Bug report
> > which was filed against the docbook-to-man package:
> > 
> > #549233: docbook-to-man: Does not accept (some) (unicode) characters
> > 
> > > It appears that docbook-to-man is not UTF-8 ready. If you compile the
> > > attached man page "as is" then you'll get the following error:
> > > /usr/bin/nsgmls:demo.man.sgml:60:6:E: non SGML character number 156
> > > /usr/bin/nsgmls:demo.man.sgml:60:6: open elements: REFENTRY REFSECT1[1] 
> > > PARA[1] (#PCDATA[1])
> > > /usr/bin/nsgmls:demo.man.sgml:62:9:E: non SGML character number 159
> > > /usr/bin/nsgmls:demo.man.sgml:62:9: open elements: REFENTRY REFSECT1[1] 
> > > PARA[1] (#PCDATA[1])
> > 
> > This is no longer reproducible; so closing :)
> 
> Well, in my environment (current testing) it is:
> helge@samd:~/download$ recode latin1..utf8 demo.man.sgml
> helge@samd:~/download$ file *.sgml
> demo.man.sgml:   HTML document, UTF-8 Unicode text
> helge@samd:~/download$ docbook-to-man demo.man.sgml > demo.1
> /usr/bin/nsgmls:demo.man.sgml:60:6:E: non SGML character number 156
> /usr/bin/nsgmls:demo.man.sgml:60:6: open elements: REFENTRY REFSECT1[1] 
> PARA[1] (#PCDATA[1])
> /usr/bin/nsgmls:demo.man.sgml:62:9:E: non SGML character number 159
> /usr/bin/nsgmls:demo.man.sgml:62:9: open elements: REFENTRY REFSECT1[1] 
> PARA[1] (#PCDATA[1])
> 
> The same error happens with the file from Paul, and the output is the same
> for both. (I did not see his e-mail earlier, because he did not CC me and
> addressed only the bug.)

Hi,

I recently tried to play with linuxdoc and utf-8 documents and ran into the
same problem,

onsgmls: ... 01.precmdout:1559:71:E: non SGML character number 141

This time I was lucky and a web search pointed me to
https://bugzilla.redhat.com/show_bug.cgi?id=66179. After that suggestion, 

SP_CHARSET_FIXED=yes SP_ENCODING=xml sgml2html FAQ-CervanTeX-utf8.sgml

made those messages disappear with OpenSP. I am including this in
linuxdoc-tools as part of preliminary UTF-8 support, and it may be of help here.

> > > Interestingly, some characters (like "ü") are accepted without
> > > problems while others (Ü,ß) yield the above errors.

Maybe it complains about only one byte of the multi-byte representation,
a byte that does not occur in the encoding of the lowercase characters.
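
A quick way to look at this is to list the UTF-8 bytes of the three
characters. The sketch below does that and flags any byte in the range
128-159. Treating that range as "non SGML" is an assumption about the SGML
declaration in use, not something read from the parser; the function names
are made up for this example.

```python
# Sketch: list the UTF-8 bytes of a few characters and flag those in the
# 128-159 range. That this range is what the parser rejects is an assumption.
NON_SGML_RANGE = range(128, 160)


def utf8_bytes(char):
    """Return the UTF-8 encoding of `char` as a list of byte values."""
    return list(char.encode("utf-8"))


def flagged_bytes(char):
    """Return the UTF-8 bytes of `char` that fall inside NON_SGML_RANGE."""
    return [b for b in utf8_bytes(char) if b in NON_SGML_RANGE]


# U+00FC is "ü", U+00DC is "Ü", U+00DF is "ß".
for char in ("\u00fc", "\u00dc", "\u00df"):
    print(f"U+{ord(char):04X}", utf8_bytes(char), "flagged:", flagged_bytes(char))
```

Under that assumption, "ü" encodes to 195 188 and nothing is flagged, while
"Ü" (195 156) and "ß" (195 159) each have a second byte in the flagged range.
The values 156 and 159 are the character numbers in the reported errors.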

-- 
Agustin



Bug#549233: docbook-to-man: Does not accept (some) (unicode) characters

2017-08-03 Thread Paul Hardy
Helge,

I looked at this bug report because I was looking into other things
related to DocBook and man pages.  I think this is not a bug.

If you look at the file you attached with "less", you will see
characters such as "" and "".  Those are the hexadecimal
values of Latin1 characters, not parts of UTF-8 characters.  Maybe you
accidentally attached the wrong file; I don't know.  That would
explain why there are no complaints when you try to convert this as a
Latin1 document though.

I am attaching a version of your document re-encoded as UTF-8 for you
to experiment with.  I did not try to process it.

I also changed the "doctype" word on the first line to "DOCTYPE"; I
think it is always supposed to be upper-case even if some tools don't
complain.

Because this is a DocBook file, you could try giving the filename the
suffix ".xml" and insert this as a first line in the file to declare
its encoding as UTF-8:

 

You might also need to modify the system identifier in the DOCTYPE line.

In summary, I think the problem is that the document you attached is
not a valid UTF-8 document.  There might be other reasons why it is
also not a valid SGML document.
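
Whether a file is valid UTF-8 can be checked without any SGML tooling. The
following is a minimal sketch using Python's strict UTF-8 decoder; the
function name first_invalid_utf8 is made up for this example, and the sample
bytes are constructed here, not taken from the attached file.

```python
def first_invalid_utf8(data):
    """Return the offset of the first byte that is not valid UTF-8.

    Returns None when `data` (a bytes object) decodes cleanly.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# The same word encoded two ways: Latin-1 stores "ü" as the single byte 0xFC,
# which cannot start a UTF-8 sequence.
latin1_sample = "F\u00fcr".encode("latin-1")
utf8_sample = "F\u00fcr".encode("utf-8")
print(first_invalid_utf8(latin1_sample), first_invalid_utf8(utf8_sample))
```

For a real file, the bytes would come from open(path, "rb").read().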

The "emacs" editor will recognize SGML documents as such if the
filename ends in ".sgml".  I do not know how well it validates general
SGML files though.  If you end the filename in ".xml", you can enable
Nxml mode in emacs.  See
https://www.emacswiki.org/emacs/UsingNxmlModeWithDocBook for more
information.  Note that some older emacs instructions might say you
need to download the Nxml package and install it for emacs, but recent
versions of emacs include it.

I do not know what other validation tools are available for general
SGML files on Debian, apart from xsltproc.

I am not the maintainer of this package, and DocBook 4.1 is very, very
old at this point, but I am posting this in case it helps you resolve
this bug.  Hopefully, this is also enough information for you to
decide to close the bug.

Good luck,


Paul Hardy
[Attachment: the demo man page re-encoded as UTF-8. The archive stripped its
markup; the document is the demo man page attached to the original report of
2009-10-01.]




Bug#549233: docbook-to-man: Does not accept (some) (unicode) characters

2009-10-01 Thread Helge Kreutzmann
Package: docbook-to-man
Version: 1:2.0.0-27
Severity: important
Tags: l10n

It appears that docbook-to-man is not UTF-8 ready. If you compile the
attached man page "as is" then you'll get the following errors:
/usr/bin/nsgmls:demo.man.sgml:60:6:E: non SGML character number 156
/usr/bin/nsgmls:demo.man.sgml:60:6: open elements: REFENTRY REFSECT1[1] PARA[1] (#PCDATA[1])
/usr/bin/nsgmls:demo.man.sgml:62:9:E: non SGML character number 159
/usr/bin/nsgmls:demo.man.sgml:62:9: open elements: REFENTRY REFSECT1[1] PARA[1] (#PCDATA[1])
(The man page looks fine, though.)

Interestingly, some characters (like "ü") are accepted without
problems while others (Ü, ß) yield the above errors.

If you recode the file to latin1, the errors vanish. (Note that in my
UTF-8 environment the generated man page then appears broken, because all
umlauts and ß seem to be silently removed; this can be fixed by recoding
the generated man page back to UTF-8.)
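
The two recoding steps of this workaround can be sketched as follows. This is
only an illustration of the byte-level round trip; the recode function here
is a stand-in written for this example, and the sample text is made up from
words in the attached page.

```python
def recode(data, source, target):
    """Decode `data` (bytes) from `source` and re-encode it as `target`."""
    return data.decode(source).encode(target)


# Sample text containing both problem characters, stored as UTF-8.
sample = "EINF\u00dcHRUNG, Testing \u00df.".encode("utf-8")

# Step 1: recode the input to Latin-1 before processing it.
as_latin1 = recode(sample, "utf-8", "latin-1")

# Step 2: recode the result back to UTF-8 for a UTF-8 terminal.
round_trip = recode(as_latin1, "latin-1", "utf-8")

print(len(sample), len(as_latin1), round_trip == sample)
```

In Latin-1 each of the two characters is a single byte above 159, so no byte
in the range 128-159 remains in the input.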

-- System Information:
Debian Release: 5.0.3
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: powerpc (ppc)

Kernel: Linux 2.6.24.3-grsec
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages docbook-to-man depends on:
ii  docbook  4.5-4           standard SGML representation syste
ii  libc6    2.7-18          GNU C Library: Shared libraries
ii  sp       1.3.4-1.2.1-47  James Clark's SGML parsing tools

docbook-to-man recommends no packages.

docbook-to-man suggests no packages.

-- no debconf information
-- 
  Dr. Helge Kreutzmann deb...@helgefjell.de
   Dipl.-Phys.   http://www.helgefjell.de/debian.php
64bit GNU powered gpg signed mail preferred
   Help keep free software libre: http://www.ffii.de/
<!doctype refentry PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [

  <!ENTITY dhfirstname "<firstname>FIXME</firstname>">
  <!ENTITY dhsurname "<surname>Niedermeyer</surname>">
  <!ENTITY dhdate "<date>FIXME 2009</date>">
  <!ENTITY dhsection "<manvolnum>1</manvolnum>">
  <!ENTITY dhemail "<email>chias...@bsi.bund.de</email>">
  <!ENTITY dhusername "Max Mustermann">
  <!ENTITY dhucpackage "<refentrytitle>CHIASMUS</refentrytitle>">
  <!ENTITY dhpackage "demo">

  <!ENTITY debian "<productname>Debian</productname>">
  <!ENTITY gnu "<acronym>GNU</acronym>">
  <!ENTITY gpl "&gnu; <acronym>GPL</acronym>">
  <!ENTITY demo "<command>&dhpackage;</command>">
]>
<refentry>
 <refentryinfo>
<address>
  &dhemail;
   </address>

  <author>
&dhfirstname; &dhsurname;
  </author>
  <copyright>
<year>FIXME 2009</year>
   <holder>&dhusername;</holder>
  </copyright>
   &dhdate;

 </refentryinfo>
 <refmeta>
&dhucpackage;

&dhsection;

 </refmeta>
 <refnamediv>
  <refname>&dhpackage;</refname>
  <refpurpose>Demo for docbook-to-man problems</refpurpose>
 </refnamediv>
 <refsynopsisdiv>
<cmdsynopsis sepchar=" ">
   &demo;
 <arg choice="plain"> -hilfe</arg>
 </cmdsynopsis>
 <cmdsynopsis>
   &demo;
 <arg choice="plain"> -beispiel</arg>
 </cmdsynopsis>
 </refsynopsisdiv>
 <refsect1>
  <title>BESCHREIBUNG</title>
  <para>
  &demo; is some programme </para>

<para>
Für Hinweise zur Sicherheit siehe "HINWEISE", für erste Schritte siehe
"EINFÜHRUNG" und für weiter Anwendungsbeispiele "BEISPIELE".

Testing ß.
</para>

 </refsect1>
 <refsect1>
  <title>OPTIONEN</title>
  <para>
Zwischen einer Option und dem zugehörigen Parameter können, müssen aber keine
Leerzeichen stehen. Die Optionen können in beliebiger Reihenfolge angegeben
werden. Wird eine Option mehrfach angegeben, so wird nur das letzte Auftreten
der Option (bzw. der zugehörige Parameter) ausgewertet. Wildcards werden von
chiasmus nicht unterstützt.
</para>

  <variablelist>
   <varlistentry>
<term><option>-hilfe</option></term>
<listitem>
 <para>
   Gibt eine kurze Hilfe aus.
 </para>
</listitem>
   </varlistentry>

   <varlistentry>
  <term><option>-beispiel</option></term>
<listitem>
 <para>
   Gibt Beispiele zur Verwendung von Chiasmus aus.
 </para>
</listitem>
   </varlistentry>

   <varlistentry>
  <term><option>-m</option> <option>something</option></term>
<listitem>
<para>
some text
 </para>

 <para>
Beispiele:
 </para>
 <orderedlist numeration="loweralpha" continuation='restarts'>
 <listitem>
 <para>
 a. This is something
 </para>
   </listitem>
<listitem>
<para>
b. Something else
</para>
</listitem>
<listitem>
 <para>
 c. Even more so
 </para>
</listitem>
<listitem>
 <para>
 d. A fourth item
 </para>
</listitem>
<listitem>
 <para>
 e. A fifth item
 </para>
 </listitem>
 </orderedlist>
</listitem>
 </varlistentry>

   <varlistentry>
<term><option>-q</option> <option>something</option></term>
<listitem>
<para>
Does somthing more
 </para>
 <para>
 Beispiele:
 </para>
  <orderedlist numeration="loweralpha" continuation='restarts'>
  <listitem>
  <para>
  a. Should be a) (restarted)