Re: [PATCH groff] tmac/hyphenex.en: add patterns for sequestrate & its derivates

2024-12-17 Thread onf
On Wed Dec 18, 2024 at 2:15 AM CET, onf wrote:
> On Wed Dec 18, 2024 at 12:14 AM CET, Tadziu Hoffmann wrote:
> > There are a few other words that don't follow the pattern.
> > "filtrate" is the fluid that has been filtered, but I don't
> > think "to filtrate" is a valid word.  And "orientation" is
> > the act or result of orienting, not "orientating".
>
> I didn't mean to imply that it's right this way (although according to
> Oxford, it is[1]). I was just pointing out that it didn't sound wrong
> to me (whereas your examples do; I don't know why).

By the way, although there is no entry for "filtrate", there is in fact
an entry for "orientate", and it's not marked "non-standard" either:
  orien-tate verb (BrE) = ORIENT



Re: [PATCH groff] tmac/hyphenex.en: add patterns for sequestrate & its derivates

2024-12-17 Thread onf
On Wed Dec 18, 2024 at 12:14 AM CET, Tadziu Hoffmann wrote:
> > With that said, not being a native speaker, if I had to turn "sequestration"
> > into a verb, I would say "sequestrate" too and it would sound right to me...
>
> There are a few other words that don't follow the pattern.
> "filtrate" is the fluid that has been filtered, but I don't
> think "to filtrate" is a valid word.  And "orientation" is
> the act or result of orienting, not "orientating".

I didn't mean to imply that it's right this way (although according to
Oxford, it is[1]). I was just pointing out that it didn't sound wrong
to me (whereas your examples do; I don't know why).

Looking the word up again shows that sequester has another, related
meaning ("to keep a jury together [somewhere to prevent] them from
talking to other people [...]") which sequestrate does not have.

~ onf

[1] The dictionary labels grammatically incorrect words such as "gonna"
with "non-standard", but sequestrate is not labeled in ANY way,
indicating it's not incorrect in any way (according to them).



Re: [PATCH groff] tmac/hyphenex.en: add patterns for sequestrate & its derivates

2024-12-17 Thread Tadziu Hoffmann


> With that said, not being a native speaker, if I had to turn "sequestration"
> into a verb, I would say "sequestrate" too and it would sound right to me...

There are a few other words that don't follow the pattern.
"filtrate" is the fluid that has been filtered, but I don't
think "to filtrate" is a valid word.  And "orientation" is
the act or result of orienting, not "orientating".





Re: [PATCH groff] tmac/hyphenex.en: add patterns for sequestrate & its derivates

2024-12-17 Thread onf
Hi Branden,

On Tue Dec 17, 2024 at 8:29 PM CET, G. Branden Robinson wrote:
> [...]
> (Did you hear that the Siberian traps appear to be roaring to life?[1]
> Many of us under the age of 60 can look forward to dying of heat stroke.)

Not really, but I know there are many places where the permafrost is melting,
so it doesn't surprise me. Frankly I don't read such news much; the little
I read every now and then is more than sad enough. I guess what's worst is
that all the purported solutions just result in more pollution and
environmental destruction without really solving anything (except for
income for those pushing them, obviously).

> > I am not sure about "sequestrated" and especially about
> > "sequestrating",
>
> I'm dubious about "sequestrate" itself, and therefore even more so of
> these derived forms.  One or two other cases exist of UK English getting
> carried away with reduplicative affixes on verbs, but I can't summon any
> to mind right now.  More common is the pointless suffixing of "-al" to
> "make" an adjective out of a word ending in "-ic" that _already is_ an
> adjective, like "ironical".  UK English just loves this form of
> morphologic excess.  I blame proximity to France.

It's funny because I would expect such stuff to come from American English
given how many of its speakers can't even distinguish between "its" and
"it's" or even "your" and "you're" :)

With that said, not being a native speaker, if I had to turn "sequestration"
into a verb, I would say "sequestrate" too and it would sound right to me...

> > I have modified your script into the following to be in line with the
> > way I set up hyphenation:
> >   #!/bin/sh
> >   printf '.mso %s.tmac\n.ll 1Z\n\\&%s\n' "$1" "$2" |
>
> One _Z_?  What _is_ this unit?  And why isn't the formatter complaining
> about it?

Oh well, that's what I get for trying to write something 'smart'
from memory. I was trying to get the `z` unit, but what I really
meant to say was this:
  .ll \n[.H]u

Anyway, you're right: groff complains if I say `1z`, but not when I say `1Z`.
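
A sketch of the wrapper with that correction applied, assuming the rest of
the pipeline stays as posted. The function name `hyphen_input` is made up
for illustration. `\n[.H]` is the register holding the output device's
horizontal motion quantum in basic units, so the line length becomes as
short as the device allows.

```shell
#!/bin/sh
# hyphen_input MACROFILE WORD  (hypothetical helper name)
# Emit roff input that loads MACROFILE.tmac and sets the line length to
# one horizontal motion quantum, so the formatter breaks WORD wherever
# hyphenation permits.
hyphen_input() {
    printf '.mso %s.tmac\n.ll \\n[.H]u\n\\&%s\n' "$1" "$2"
}

# Run the formatter only when it is available.
if command -v nroff >/dev/null 2>&1; then
    hyphen_input en sequestration |
    nroff -ww -Wbreak |
    sed -E '/^$/d' |
    tr -d '\n' && echo
fi
```

Keeping the input generation in its own function makes it easy to inspect
what is actually sent to the formatter before piping it anywhere.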

> >   nroff -ww -Wbreak |
> >   sed -E '/^$/d' |
>
> `-E` is, I think, unnecessary here, since `^` and `$` as zero-width
> anchoring atoms are both valid POSIX BREs, not reserved to EREs.  FYI.

Frankly, I don't care. I have a habit of using -E on anything that doesn't
default to ERE, because the last thing I want is accidentally breaking a
working script by changing some regex in a way that makes it no longer work
with BRE and forgetting to add the -E flag.

In my mind, I always want ERE behavior, so I give it the -E flag.
Then I don't have to remember all the differences between the two
just to be able to tell when I need to add the flag. This can be
especially frustrating when GNU extends BRE to include ERE features
such as the `+` quantifier.
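
To make the difference concrete, here is a small sketch. In a POSIX BRE an
unescaped `+` is an ordinary character, while in an ERE it is a quantifier;
the anchors `^` and `$` behave the same in both. The sample input is
invented for the demonstration.

```shell
#!/bin/sh
# Sample input (made up): a run of letters, a literal "a+", an empty line.
sample() { printf 'aaa\na+\n\nb\n'; }

# BRE: "+" is literal, so only the line "a+" matches.
bre=$(sample | sed -n '/^a+$/p')

# ERE: "+" means "one or more", so only the line "aaa" matches.
ere=$(sample | sed -n -E '/^a+$/p')

# Deleting empty lines needs only anchors; both flavours agree.
plain=$(sample | sed '/^$/d')
extended=$(sample | sed -E '/^$/d')

printf 'BRE matched: %s\nERE matched: %s\n' "$bre" "$ere"
[ "$plain" = "$extended" ] && echo 'anchors behave identically'
```

The same pattern text selects different lines depending on the flag, which
is the kind of silent change the habit of always passing -E guards against.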

> > It hyphenates correctly, too:
> >   se‐ques‐tra‐tion
> > 
> > However, I have a file where hyphenation is set up like this:
> >   .mso en.tmac
> >   .de HY
> >   . hy 4
> >   ..
> > 
> > (the macro HY is used after .nh to re-enable hyphenation.)
>
> [...]
>
> > But when I put:
> >   .hw se-ques-tra-tion
> > after the above requests at the top of the document, it does.
> > 
> > I have no idea what might cause this behavior. Running groff with
> > -ww does not reveal anything hyphenation-related.
>
> I think something might be misconfigured in your installation.  :(

Yeah, my macros. (:

To expand on the very brief explanation I provided in my previous reply,
I had this:
  .so mac.tmac
  .mso en.tmac
  .de HY
  . hy 4
  ..

The mac.tmac file contains my version of the Mk macros, which set up
hyphenation parameters for Czech. The following lines override those
parameters with English ones.

Well, except Mk has this great property of initializing itself only
after you use one of its macros, so that the Czech hyphenation
parameters which I configured in Mk's init were loaded AFTER the
English ones, not before.

I fixed it for now by manually initializing Mk just after loading it.
I will likely get rid of this initialization behavior altogether in
the future.
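
The ordering problem can be reproduced in miniature. The sketch below is
not the Mk package: the `PP` macro, the `initialized` register, and the
`.hy 0` standing in for the Czech settings are all invented for
illustration. It reports the `.hy` register on standard error before and
after the first macro call.

```shell
#!/bin/sh
# order_demo (hypothetical name): emit a roff document in which a macro
# package defers its initialization until one of its macros is first used.
order_demo() {
    cat <<'EOF'
.\" Stand-in for mac.tmac: initialization happens on first use.
.nr initialized 0
.de PP
.  if !\\n[initialized] \{\
.    nr initialized 1
.    hy 0
.  \}
.  br
..
.\" Document preamble: settings meant to override the package's.
.hy 4
.tm before first macro call: .hy=\n[.hy]
.PP
.tm after first macro call: .hy=\n[.hy]
EOF
}

# Format only when nroff is available; the .tm messages go to stderr.
if command -v nroff >/dev/null 2>&1; then
    order_demo | nroff >/dev/null
fi
```

The second message shows whatever the deferred initialization left behind,
regardless of what the preamble requested. Calling the macro once before
the `.hy 4` line corresponds to the workaround of initializing manually
right after loading.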

~ onf



Re: [PATCH groff] tmac/hyphenex.en: add patterns for sequestrate & its derivates

2024-12-17 Thread G. Branden Robinson
Hi onf,

At 2024-12-17T19:48:24+0100, onf wrote:
> On Tue Dec 17, 2024 at 7:00 PM CET, G. Branden Robinson wrote:
> > Is that a standard English word?  "Sequester" is; sometimes used in
> > U.S. criminal procedure to refer to a process of isolating a jury
> > during its deliberations.  I think I've also seen it in fiscal
> > contexts.
> >
> > "sequester, sequestered, sequestering" would all be standard.
> >
> > [...]
> >
> > Hmm.  "sequestration" _does_ seem standard to me, though.
> 
> From Oxford Advanced Learner's Dictionary of Current English, 6th ed.:
>   se-ques-trate (also se-ques-ter) verb
>    (law) to take control of sb's property or ASSETS until a debt has
>    been paid
>    -> se-ques-tra-tion noun
> 
> The word has gained another meaning since this book came out in the
> phrase "carbon sequestration", which britannica.com defines as
> "the long-term storage of carbon in plants, soils, geologic formations,
> and the ocean."

Yes, I'm familiar with that form of the word (as noted above) and this
application of it.  (Did you hear that the Siberian traps appear to be
roaring to life?[1]  Many of us under the age of 60 can look forward to
dying of heat stroke.)

> I am not sure about "sequestrated" and especially about
> "sequestrating",

I'm dubious about "sequestrate" itself, and therefore even more so of
these derived forms.  One or two other cases exist of UK English getting
carried away with reduplicative affixes on verbs, but I can't summon any
to mind right now.  More common is the pointless suffixing of "-al" to
"make" an adjective out of a word ending in "-ic" that _already is_ an
adjective, like "ironical".  UK English just loves this form of
morphologic excess.  I blame proximity to France.

But I digress...

> but I have added them anyway as they seem theoretically possible and I
> didn't want to risk they wouldn't hyphenate correctly.

Then the thing to do is put appropriate `hw` requests in your troffrc
file, into the document, or into a file that your document sources.  GNU
troff's hyphenation exception files are not a good first location to
site hyphenations of nonstandard words.
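
As a sketch of the last option, a document can source a small file of its
own exceptions. The file names `local-hyphen.roff` and `doc.roff` are
hypothetical; the break points are the ones from the Oxford entry quoted
above.

```shell
#!/bin/sh
# Work in a scratch directory so nothing existing is overwritten.
dir=$(mktemp -d) || exit 1

# Per-project hyphenation exceptions (hypothetical file name).
cat >"$dir/local-hyphen.roff" <<'EOF'
.\" Break points as given in the dictionary entry quoted in this thread.
.hw se-ques-trate se-ques-tra-tion
EOF

# A document that sources the exceptions before any text is formatted.
cat >"$dir/doc.roff" <<'EOF'
.so local-hyphen.roff
.ll 10n
sequestrate sequestration
EOF

# Format from inside the directory so the relative .so path resolves.
if command -v nroff >/dev/null 2>&1; then
    (cd "$dir" && nroff -ww doc.roff | sed '/^$/d')
fi
echo "scratch files left in $dir"
```

Keeping the exceptions in a separate file lets several documents share
them without touching the files that ship with groff.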

> > If TeX doesn't handle this word, I'm inclined to advise that a
> > document do so itself with the `hw` request.
> 
> I dunno. I don't have TeX installed.

I do, but don't know enough TeX to write a counterpart to my "hyphen"
script for it without doing a lot of homework first.  Maybe someone else
here does.

> I have modified your script into the following to be in line with the
> way I set up hyphenation:
>   #!/bin/sh
>   printf '.mso %s.tmac\n.ll 1Z\n\\&%s\n' "$1" "$2" |

One _Z_?  What _is_ this unit?  And why isn't the formatter complaining
about it?

("What are you animals doing in my head?  Why is Private Pyle out of his
bunk after lights out?  Why is Private Pyle holding that weapon?  Why
aren't you stomping Private Pyle's guts out?")

I see I have more work to do on Savannah #64240.

https://savannah.gnu.org/bugs/?64240

>   nroff -ww -Wbreak |
>   sed -E '/^$/d' |

`-E` is, I think, unnecessary here, since `^` and `$` as zero-width
anchoring atoms are both valid POSIX BREs, not reserved to EREs.  FYI.

>   tr -d '\n' && echo
> 
> It hyphenates correctly, too:
>   se‐ques‐tra‐tion
> 
> However, I have a file where hyphenation is set up like this:
>   .mso en.tmac
>   .de HY
>   . hy 4
>   ..
> 
> (the macro HY is used after .nh to re-enable hyphenation.)

Seems reasonable.

> ...and the word "sequestration" simply does not hyphenate.

Hmm.  I can't reproduce this.

$ cat EXPERIMENTS/onf-hyphen.roff
.ll 10n
.na
.mso en.tmac
.de HY
.  hy 4
..
sequestration
sequestration
.nh
sequestration
sequestration
.HY
sequestration
sequestration
.pl \n[nl]u
$ nroff -ww -Wbreak EXPERIMENTS/onf-hyphen.roff
sequestra‐
tion se‐
questra‐
tion
sequestration
sequestration
sequestra‐
tion se‐
questra‐
tion

I get the same results with my working copy and with groff 1.23.0.

I even get the same results with groff 1.22.4, with this expected
additional diagnostic.

troff: EXPERIMENTS/onf-hyphen.roff:3: warning: can't find macro file 'en.tmac'

We didn't have "en.tmac" back then.

> But when I put:
>   .hw se-ques-tra-tion
> after the above requests at the top of the document, it does.
> 
> I have no idea what might cause this behavior. Running groff with
> -ww does not reveal anything hyphenation-related.

I think something might be misconfigured in your installation.  :(

What version of groff are you running?  (Down to the commit, if
necessary.  `groff --version` should disclose this information.)

I can try a build of that exact same commit, run it, and maybe we can
compare `pev` and/or `phw` request output.

Regards,
Branden

[1] https://www.theguardian.com/world/2024/dec/10/arctic-tundra-carbon-shift




Re: [PATCH groff] tmac/hyphenex.en: add patterns for sequestrate & its derivates

2024-12-17 Thread onf
On Tue Dec 17, 2024 at 7:48 PM CET, onf wrote:
> > But groff also breaks it just fine for me.
> >
> > $ hyphen sequestration
> > se‐ques‐tra‐tion
> >
> > $ cat ~/bin/hyphen
> > [...]
>
> However, I have a file where hyphenation is set up like this:
>   .mso en.tmac
>   .de HY
>   . hy 4
>   ..
>
> (the macro HY is used after .nh to re-enable hyphenation.)
>
> ...and the word "sequestration" simply does not hyphenate.
> But when I put:
>   .hw se-ques-tra-tion
> after the above requests at the top of the document, it does.
>
> I have no idea what might cause this behavior. Running groff with
> -ww does not reveal anything hyphenation-related.

Ugh. The hyphenation settings were being overridden by another macro which
was being triggered after the above.

Thanks for the assistance, and sorry for bothering you.

~ onf



Re: [PATCH groff] tmac/hyphenex.en: add patterns for sequestrate & its derivates

2024-12-17 Thread onf
Hi Branden,

On Tue Dec 17, 2024 at 7:00 PM CET, G. Branden Robinson wrote:
> Is that a standard English word?  "Sequester" is; sometimes used in
> U.S. criminal procedure to refer to a process of isolating a jury during
> its deliberations.  I think I've also seen it in fiscal contexts.
>
> "sequester, sequestered, sequestering" would all be standard.
>
> [...]
>
> Hmm.  "sequestration" _does_ seem standard to me, though.

From Oxford Advanced Learner's Dictionary of Current English, 6th ed.:
  se-ques-trate (also se-ques-ter) verb
   (law) to take control of sb's property or ASSETS until a debt has
   been paid
   -> se-ques-tra-tion noun

The word has gained another meaning since this book came out in the
phrase "carbon sequestration", which britannica.com defines as
"the long-term storage of carbon in plants, soils, geologic formations,
and the ocean."

I am not sure about "sequestrated" and especially about "sequestrating",
but I have added them anyway as they seem theoretically possible and I
didn't want to risk they wouldn't hyphenate correctly.

> Does TeX break these?  Our hyphenation patterns, including the
> exceptions, come from TeX.
>
> If TeX doesn't handle this word, I'm inclined to advise that a document
> do so itself with the `hw` request.

I dunno. I don't have TeX installed.

> But groff also breaks it just fine for me.
>
> $ hyphen sequestration
> se‐ques‐tra‐tion
>
> $ cat ~/bin/hyphen
> [...]

I have modified your script into the following to be in line with the way
I set up hyphenation:
  #!/bin/sh
  printf '.mso %s.tmac\n.ll 1Z\n\\&%s\n' "$1" "$2" |
  nroff -ww -Wbreak |
  sed -E '/^$/d' |
  tr -d '\n' && echo

It hyphenates correctly, too:
  se‐ques‐tra‐tion

However, I have a file where hyphenation is set up like this:
  .mso en.tmac
  .de HY
  . hy 4
  ..

(the macro HY is used after .nh to re-enable hyphenation.)

...and the word "sequestration" simply does not hyphenate.
But when I put:
  .hw se-ques-tra-tion
after the above requests at the top of the document, it does.

I have no idea what might cause this behavior. Running groff with
-ww does not reveal anything hyphenation-related.

~ onf



Re: [PATCH groff] tmac/hyphenex.en: add patterns for sequestrate & its derivates

2024-12-17 Thread G. Branden Robinson
Hi onf,

At 2024-12-17T18:18:39+0100, onf wrote:
> ---
> These words currently don't hyphenate at all with en.tmac.

Is that a standard English word?  "Sequester" is; sometimes used in
U.S. criminal procedure to refer to a process of isolating a jury during
its deliberations.  I think I've also seen it in fiscal contexts.

"sequester, sequestered, sequestering" would all be standard.

Does TeX break these?  Our hyphenation patterns, including the
exceptions, come from TeX.

If TeX doesn't handle this word, I'm inclined to advise that a document
do so itself with the `hw` request.

Hmm.  "sequestration" _does_ seem standard to me, though.

But groff also breaks it just fine for me.

$ hyphen sequestration
se‐ques‐tra‐tion

$ cat ~/bin/hyphen
#!/bin/sh

: ${HY:=4}

for W
do
    printf ".hy $HY\n.ll 1u\n%s\n" "$W" | nroff -Wbreak | sed '/^$/d' \
        | tr -d '\n'
    echo
done

# vim:set ai et sw=4 ts=4 tw=80:
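
The `: ${HY:=4}` line gives the script a default that can be overridden
from the environment, for instance `HY=1 hyphen word`. A minimal sketch of
just that idiom, with a made-up function name:

```shell
#!/bin/sh
# hy_request (hypothetical name): print the .hy request the script would
# send, using 4 unless HY is already set to a non-empty value.
# The parentheses run the body in a subshell so the caller's HY is untouched.
hy_request() (
    : "${HY:=4}"
    printf '.hy %s\n' "$HY"
)

(unset HY; hy_request)   # no value supplied: the default applies
(HY=1; hy_request)       # caller-supplied mode is kept
```

The `:` builtin does nothing itself; it exists here only so the parameter
expansion, and with it the assignment, is evaluated.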

>  tmac/hyphenex.en | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/tmac/hyphenex.en b/tmac/hyphenex.en
> index 768c0af9d..bd7303613 100644
> --- a/tmac/hyphenex.en
> +++ b/tmac/hyphenex.en
> @@ -59,6 +59,10 @@
>    ring-leaders
>    round-table
>    round-tables
> +  se-ques-tra-te
> +  se-ques-tra-ted
> +  se-ques-tra-ting
> +  se-ques-tra-tion
>    single-space
>    single-spaced
>    single-spacing
> -- 
> 2.47.0

Regards,
Branden

