[bug #65232] Russian hyphenation is not working

2024-08-19 Thread G. Branden Robinson
Update of bug #65232 (group groff):

             Assigned to: None => gbranden


___

Message sent via Savannah
https://savannah.gnu.org/



[bug #65232] Russian hyphenation is not working

2024-02-05 Thread Robin Haberkorn
Follow-up Comment #5, bug#65232 (group groff):

[comment #4 comment #4:]
>
> [comment #3 comment #3:]
> > After switching from pdfroff (-Tps) to pdfmom (-Tpdf), hyphenation
> > suddenly works fine.
>
> Glad to hear it.
>
I forgot to mention that I also had to install a new version of the
LiberationSerif fonts, as the ones I was using previously apparently weren't
fully compatible with gropdf. For instance, some space characters were not
displayed correctly.

> > Moreover, it will even work with UTF8 input (-Kutf-8), even though that
> > causes other glitches.
>
> What glitches are you seeing?
>
With -Kutf-8, link texts generated by .pdfhref were sometimes missing
seemingly random characters.

> The input is converted from UTF-8 to KOI8-R.  The hyphenation patterns are
> defined in terms of KOI8-R code points.  The formatter (GNU _troff_) decides
> where the hyphens should go and performs the breaks.  The formatter converts
> the input characters into internal data structures called "nodes" that do not
> use an externally visible encoding.  Then, when generating device-independent
> output, each glyph node is converted to a device-independent special
> character command _if_ the output device supports its code point.  (If it
> doesn't, you get a warning like "special character 'u0413' not defined".)
>
Are you telling me that pdfmom is actually converting my text to KOI8-R
internally after noticing that I passed -mru?
This is obviously not the case: I tried to print some Cyrillic using .tm, and
it comes out as Unicode escapes, as would be expected after the sources are run
through preconv.







[bug #65232] Russian hyphenation is not working

2024-02-03 Thread G. Branden Robinson
Update of bug#65232 (group groff):

                  Status: Need Info => Invalid
             Open/Closed: Open => Closed

___

Follow-up Comment #4:


[comment #3 comment #3:]
> After switching from pdfroff (-Tps) to pdfmom (-Tpdf), hyphenation suddenly
> works fine.

Glad to hear it.

> Moreover, it will even work with UTF8 input (-Kutf-8), even though that
> causes other glitches.

What glitches are you seeing?

> I have no idea why it can hyphenate Unicode escapes.

The input is converted from UTF-8 to KOI8-R.  The hyphenation patterns are
defined in terms of KOI8-R code points.  The formatter (GNU _troff_) decides
where the hyphens should go and performs the breaks.  The formatter converts
the input characters into internal data structures called "nodes" that do not
use an externally visible encoding.  Then, when generating device-independent
output, each glyph node is converted to a device-independent special
character command _if_ the output device supports its code point.  (If it
doesn't, you get a warning like "special character 'u0413' not defined".)
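
To make the relationship between the two encodings concrete, here is a
minimal Python sketch.  It is not groff code, and the helper names
koi8r_byte and special_char_name are hypothetical.  It only shows that a
Cyrillic letter occupies a single byte in KOI8-R and how a name of the form
uXXXX, as in the warning quoted above, is derived from the Unicode code
point.  It makes no claim about how GNU troff represents nodes internally.

```python
def koi8r_byte(ch: str) -> int:
    """Return the single KOI8-R byte value for one character.

    Raises UnicodeEncodeError if the character has no KOI8-R mapping.
    """
    encoded = ch.encode("koi8_r")
    if len(encoded) != 1:
        raise ValueError("expected exactly one character")
    return encoded[0]


def special_char_name(ch: str) -> str:
    """Return a 'uXXXX' style name built from the Unicode code point."""
    return f"u{ord(ch):04X}"


if __name__ == "__main__":
    letter = "\u0413"  # CYRILLIC CAPITAL LETTER GHE
    print(hex(koi8r_byte(letter)))        # the one-byte KOI8-R value
    print(special_char_name(letter))      # the Unicode-based name
    print(len(letter.encode("utf-8")))    # number of bytes in UTF-8
```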

> pdfroff should perhaps be marked as deprecated or pdfmom should outright
> replace it.

We are pondering that notion in bug #63827.  Would you like to be added to the
CC list of that ticket?
 
> From my perspective, you can close this ticket.

Thanks for following up!  Closing.






[bug #65232] Russian hyphenation is not working

2024-02-03 Thread Robin Haberkorn
Follow-up Comment #3, bug#65232 (group groff):

After switching from pdfroff (-Tps) to pdfmom (-Tpdf), hyphenation suddenly
works fine.

Moreover, it will even work with UTF8 input (-Kutf-8), even though that causes
other glitches. I have no idea why it can hyphenate Unicode escapes.

`pdfmom --roff -spdf` generally works much better than pdfroff. This includes
TOC recollation, which can finally be done without manual psselect-ing thanks
to .pdfswitchtopage.

pdfroff should perhaps be marked as deprecated or pdfmom should outright
replace it.

From my perspective, you can close this ticket.






[bug #65232] Russian hyphenation is not working

2024-01-31 Thread Robin Haberkorn
Follow-up Comment #2, bug#65232 (group groff):

Hello Branden!

I am not quite sure what additional info you need. I attached a test case. You
can reproduce it. No matter what font size or hyphenation mode, I cannot get
it to hyphenate.

Hyphenation *does* work when formatting for -Tutf8. The same is true for the
Махновщина text from the mailing list post. Furthermore, I do not
understand why the Махновщина text given in UTF-8 can be hyphenated
correctly at all. I thought that hyphenation would only work in KOI8-R.






[bug #65232] Russian hyphenation is not working

2024-01-30 Thread G. Branden Robinson
Update of bug#65232 (group groff):

                  Status: None => Need Info

___

Follow-up Comment #1:

Hi Robin,

[comment #0 original submission:]
> I cannot get Russian hyphenation to work on a HEAD build of Groff. As far as
> I understand, it should be enough to -mru.

It should.

> It should even enable hyphenation mode 8 by default.

That is _not_ my understanding.  The automated test assumes that loading the
"ru" package will set the hyphenation mode to 1 or 2.


$ grep -A2 mru tmac/tests/localization-works.sh 
output=$(printf "%s\n" "$input" | "$groff" -Tascii -P-cbou -mru 2>&1)
echo 'checking raw troff with -mru' >&2
echo "$output" | grep -Fqx '.hy=1' || wail

--
output=$(printf "%s\n" "$input" | "$groff" -Tascii -P-cbou -me -mru 2>&1)
echo 'checking -me with -mru' >&2
echo "$output" | grep -Fqx '.hy=2' || wail

--
output=$(printf "%s\n" "$input" | "$groff" -Tascii -P-cbou -ms -mru 2>&1)
echo 'checking -ms with -mru' >&2
echo "$output" | grep -Fqx '.hy=2' || wail

--
output=$(printf "%s\n" "$input" | "$groff" -Tascii -P-cbou -rcR=0 -man -mru 2>&1)
echo 'checking -man with -rcR=0 -mru' >&2
echo "$output" | grep -Fqx '.hy=2' || wail

--
output=$(printf "%s\n" "$input" | "$groff" -Tascii -P-cbou -rcR=1 -man -mru 2>&1)
echo 'checking -man with -rcR=1 -mru' >&2
echo "$output" | grep -Fqx '.hy=1' || wail


And indeed that's what "ru.tmac" sets up.


.\" Set up hyphenation.
.
.\" Russian hyphenation (\lefthyphenmin=2, \righthyphenmin=2)
.nr \*[locale]*hyphenation-mode-base 1
.nr \*[locale]*hyphenation-mode-trap 2
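
For readers comparing the values 1, 2, and 8 discussed in this thread: the
argument to .hy is, per the groff documentation, a sum of flag values.  The
following Python helper is a hypothetical illustration, not part of groff.
The flag descriptions are paraphrased from the manual and are an assumption
that should be checked against the installed groff version.

```python
# Restriction flags for the .hy request, paraphrased from the groff manual.
# ASSUMPTION: wording and exact semantics may differ between groff versions.
HY_RESTRICTIONS = {
    2: "no hyphenation of the last word on a page or column",
    4: "no break before the last two characters of a word",
    8: "no break after the first two characters of a word",
    16: "break allowed before the last character of a word",
    32: "break allowed after the first character of a word",
}


def describe_hy(mode):
    """Return a list of human-readable descriptions for a .hy value."""
    if mode == 0:
        return ["hyphenation disabled"]
    # Any nonzero value turns hyphenation on; the other bits adjust it.
    parts = ["hyphenation enabled"]
    for bit, text in HY_RESTRICTIONS.items():
        if mode & bit:
            parts.append(text)
    return parts


if __name__ == "__main__":
    for value in (1, 2, 8):
        print(value, describe_hy(value))
```

Read this way, the values 1 and 2 that the test expects and the value 8 from
the original report differ only in which restriction is applied.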


But this automated test does not include any Russian-language text that we
check for correct hyphenation.  None was available.  If you can supply some,
that would be helpful.

> I tried setting HY and .hy manually, without any success.
>
> My source file is UTF-8, converted to KOI8 using iconv, but I also included
> the preconverted KOI8 file in case you don't have a working iconv. btw.
> that's a very useful hack, as it preserves misc. codepoints as unicode
> character escapes.
>
> You have to install LiberationSerif, for instance using install-fonts.sh.
>
> The command line to build the example used is:
>
> iconv -f UTF-8 -t KOI8-R --unicode-subst='\[u%04X]' hyphen-utf8.ms | groff -Tpdf -ms -mru >hyphen-koi8.pdf


Here's a thread from last March when we were first landing this change.

https://lists.gnu.org/archive/html/groff/2023-03/msg00100.html

You might compare your results with those we were getting at the time.

Regards,
Branden






[bug #65232] Russian hyphenation is not working

2024-01-30 Thread Robin Haberkorn

         Summary: Russian hyphenation is not working
           Group: GNU roff
       Submitter: rhaberkorn
       Submitted: Wed 31 Jan 2024 02:48:42
        Category: Macro - others/general
        Severity: 3 - Normal
      Item Group: Incorrect behaviour
          Status: None
         Privacy: Public
     Assigned to: None
     Open/Closed: Open
 Discussion Lock: Any
 Planned Release: None


___

Follow-up Comments:


---
Date: Wed 31 Jan 2024 02:48:42   By: Robin Haberkorn
I cannot get Russian hyphenation to work on a HEAD build of Groff. As far as I
understand, it should be enough to -mru. It should even enable hyphenation
mode 8 by default.

I tried setting HY and .hy manually, without any success.

My source file is UTF-8, converted to KOI8 using iconv, but I also included the
preconverted KOI8 file in case you don't have a working iconv. btw. that's a
very useful hack, as it preserves misc. codepoints as unicode character
escapes.

You have to install LiberationSerif, for instance using install-fonts.sh.

The command line to build the example used is:


iconv -f UTF-8 -t KOI8-R --unicode-subst='\[u%04X]' hyphen-utf8.ms | groff -Tpdf -ms -mru >hyphen-koi8.pdf
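
In case the local iconv does not accept --unicode-subst, the same conversion
can be approximated in Python.  This is a minimal sketch, not a drop-in
replacement for iconv: the error handler name groff_escape and the function
to_koi8r are hypothetical names chosen for illustration.  Characters that
KOI8-R cannot represent are written as \[uXXXX] escapes, mirroring the
substitution pattern in the command above.

```python
import codecs


def _groff_escape(err):
    """Replace each unencodable character with a \\[uXXXX] escape."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    bad = err.object[err.start:err.end]
    replacement = "".join(f"\\[u{ord(ch):04X}]" for ch in bad)
    # Resume encoding right after the characters that were replaced.
    return replacement, err.end


# Register the handler under a hypothetical name so str.encode can use it.
codecs.register_error("groff_escape", _groff_escape)


def to_koi8r(text: str) -> bytes:
    """Encode text as KOI8-R, escaping anything KOI8-R lacks."""
    return text.encode("koi8_r", errors="groff_escape")


if __name__ == "__main__":
    # U+2014 (em dash) has no KOI8-R mapping, so it becomes an escape.
    sample = "\u0422\u0438\u0440\u0435 \u2014 \u0437\u043d\u0430\u043a"
    print(to_koi8r(sample))
```

To process a whole file, read it as UTF-8, pass the text to to_koi8r, and
write the resulting bytes to a file that is then given to groff as before.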








___
File Attachments:


---
Name: hyphen.tar.gz  Size: 111 KiB


