On Tue, Dec 3, 2024 at 03:58:20PM -0500, Bruce Momjian wrote:
> On Tue, Dec 3, 2024 at 09:05:45PM +0100, Peter Eisentraut wrote:
> > On 26.11.24 20:04, Bruce Momjian wrote:
> > > %.pdf: %.fo $(ALL_IMAGES)
> > > - $(FOP) -fo $< -pdf $@
> > > + LANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
> > > + awk 'B
On Tue, Dec 3, 2024 at 09:03:37PM +0100, Peter Eisentraut wrote:
> On 03.12.24 04:13, Bruce Momjian wrote:
> > On Mon, Dec 2, 2024 at 09:33:39PM -0500, Tom Lane wrote:
> > > Bruce Momjian writes:
> > > > Now that we have a warning about non-emittable characters in the PDF
> > > > build, do you w
On Tue, Dec 3, 2024 at 09:05:45PM +0100, Peter Eisentraut wrote:
> On 26.11.24 20:04, Bruce Momjian wrote:
> > %.pdf: %.fo $(ALL_IMAGES)
> > - $(FOP) -fo $< -pdf $@
> > + LANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
> > + awk 'BEGIN { warn = 0 } { print }/not available in font/ { warn = 1 }
>
On 26.11.24 20:04, Bruce Momjian wrote:
%.pdf: %.fo $(ALL_IMAGES)
- $(FOP) -fo $< -pdf $@
+ LANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
+ awk 'BEGIN { warn = 0 } { print }/not available in font/ { warn = 1 }
\
+ END { if (warn != 0) print("\nFound characters that cannot be
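The awk filter in the patch above does two things. It passes FOP's combined output through unchanged, and it sets a flag when a line contains "not available in font". The Python sketch below mirrors that logic so it can be exercised outside make. Some details are assumptions: the `run` entry point, the message wording (the original awk message is truncated), and the non-zero exit status. It also assumes that running FOP under `LANG=C` keeps the warning text in English so the substring match stays stable.

```python
from typing import Iterable, TextIO

# Substring taken from the awk pattern in the patch; running FOP under
# LANG=C is assumed to keep this message in English.
MISSING_GLYPH_MARKER = "not available in font"


def filter_fop_output(lines: Iterable[str], out: TextIO) -> bool:
    """Echo every line unchanged (awk's `{ print }`) and return True if any
    line contained the missing-glyph warning (awk's `warn` flag)."""
    warn = False
    for line in lines:
        out.write(line)
        if MISSING_GLYPH_MARKER in line:
            warn = True
    return warn


def run(stdin: Iterable[str], stdout: TextIO, stderr: TextIO) -> int:
    """Hypothetical entry point; a wrapper script would call
    sys.exit(run(sys.stdin, sys.stdout, sys.stderr))."""
    if filter_fop_output(stdin, stdout):
        # Wording is illustrative; the original awk message is truncated.
        stderr.write("\nFOP reported characters missing from the selected font.\n")
        return 1  # failing the build here is an assumption
    return 0
```

In a shell pipeline without `set -o pipefail`, the recipe's exit status is that of the last command. So the filter, not FOP, decides whether make sees a failure.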
On 03.12.24 04:13, Bruce Momjian wrote:
On Mon, Dec 2, 2024 at 09:33:39PM -0500, Tom Lane wrote:
Bruce Momjian writes:
Now that we have a warning about non-emittable characters in the PDF
build, do you want me to put back the Latin1 characters in the SGML
files or leave them as HTML entities?
On Mon, Dec 2, 2024 at 09:33:39PM -0500, Tom Lane wrote:
> Bruce Momjian writes:
> > Now that we have a warning about non-emittable characters in the PDF
> > build, do you want me to put back the Latin1 characters in the SGML
> > files or leave them as HTML entities?
>
> I think going forward we
Bruce Momjian writes:
> Now that we have a warning about non-emittable characters in the PDF
> build, do you want me to put back the Latin1 characters in the SGML
> files or leave them as HTML entities?
I think going forward we're going to be putting in people's names
in UTF8 --- I was certainly
On Tue, Nov 5, 2024 at 10:08:17AM +0100, Peter Eisentraut wrote:
> On 02.11.24 14:18, Bruce Momjian wrote:
> > On Sat, Nov 2, 2024 at 12:02:12PM +0900, Tatsuo Ishii wrote:
> > > > Yes, we _allow_ LATIN1 characters in the SGML docs, but I replaced the
> > > > LATIN1 characters we had with HTML ent
On Tue, Nov 26, 2024 at 02:04:15PM -0500, Bruce Momjian wrote:
> On Tue, Nov 26, 2024 at 12:41:37PM -0500, Tom Lane wrote:
> > Bruce Momjian writes:
> > > On Tue, Nov 26, 2024 at 11:43:02AM -0500, Tom Lane wrote:
> > >> I don't think this patch is doing anything I want at all.
> >
> > > Gee, I ki
On Tue, Nov 26, 2024 at 12:41:37PM -0500, Tom Lane wrote:
> Bruce Momjian writes:
> > On Tue, Nov 26, 2024 at 11:43:02AM -0500, Tom Lane wrote:
> >> I don't think this patch is doing anything I want at all.
>
> > Gee, I kind of liked the patch, but maybe you didn't like the additional
> > complex
Bruce Momjian writes:
> On Tue, Nov 26, 2024 at 11:43:02AM -0500, Tom Lane wrote:
>> I don't think this patch is doing anything I want at all.
> Gee, I kind of liked the patch, but maybe you didn't like the additional
> complexity to check the PDF output twice, once on input (complex) and
> once
On Tue, Nov 26, 2024 at 11:43:02AM -0500, Tom Lane wrote:
> Bruce Momjian writes:
> > Do we want to add this complexity?
>
> I don't think this patch is doing anything I want at all.
Gee, I kind of liked the patch, but maybe you didn't like the additional
complexity to check the PDF output twice
Bruce Momjian writes:
> Do we want to add this complexity?
I don't think this patch is doing anything I want at all.
regards, tom lane
On Tue, Nov 26, 2024 at 06:25:13PM +0900, Tatsuo Ishii wrote:
> I have looked into the patches.
> > %.pdf: %.fo $(ALL_IMAGES)
> > - $(FOP) -fo $< -pdf $@
> > + CLANG=C $(FOP) -fo $< -pdf $@ 2>&1 | \
>
> Shouldn't "CLANG" be "LANG"?
Yes, probably.
> > + awk 'BEGIN{err=0}{print}/not availab
I have looked into the patches.
> Subject: [PATCH v3 1/3] Disallow characters that cannot be displayed in PDF
>
> ---
> doc/src/sgml/Makefile | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
> index a04c532b53..18bf87d031
On Mon, 18 Nov 2024 22:07:40 -0500
Bruce Momjian wrote:
> On Tue, Nov 19, 2024 at 11:29:07AM +0900, Yugo NAGATA wrote:
> > On Mon, 18 Nov 2024 16:04:20 -0500
> > > So, the failure of ligatures is usually caused by not using the right
> > > Adobe Font Metric (AFM) file, I think. I have seen fault
On Tue, Nov 19, 2024 at 11:29:07AM +0900, Yugo NAGATA wrote:
> On Mon, 18 Nov 2024 16:04:20 -0500
> > So, the failure of ligatures is usually caused by not using the right
> > Adobe Font Metric (AFM) file, I think. I have seen faulty ligature
> > rendering in PDFs but was always able to fix it by u
On Mon, 18 Nov 2024 16:04:20 -0500
Bruce Momjian wrote:
> On Mon, Nov 11, 2024 at 10:02:15PM +0900, Yugo Nagata wrote:
> > On Tue, 5 Nov 2024 10:08:17 +0100
> > Peter Eisentraut wrote:
> >
> >
> > > >> So you convert LATIN1 characters to HTML entities so that it's easier
> > > >> to detect non
On Mon, Nov 11, 2024 at 10:02:15PM +0900, Yugo Nagata wrote:
> On Tue, 5 Nov 2024 10:08:17 +0100
> Peter Eisentraut wrote:
>
>
> > >> So you convert LATIN1 characters to HTML entities so that it's easier
> > >> to detect non-LATIN1 characters in the SGML docs? If my
> > >> understanding is co
On Tue, 5 Nov 2024 10:08:17 +0100
Peter Eisentraut wrote:
> >> So you convert LATIN1 characters to HTML entities so that it's easier
> >> to detect non-LATIN1 characters in the SGML docs? If my
> >> understanding is correct, it can be also achieved by using some tools
> >> like:
> >>
> >> ico
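The tool-based check suggested above amounts to finding characters that have no Latin-1 (ISO 8859-1) encoding. The sketch below does that in Python. It is not the tool named in the truncated message. The function name is hypothetical, and it assumes the input has already been decoded as UTF-8.

```python
from typing import Iterator, Tuple


def non_latin1_chars(text: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (line_number, column, character) for every character with no
    Latin-1 encoding, i.e. any code point above U+00FF."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0xFF:
                yield lineno, col, ch
```

Latin-1 characters such as "é" pass; an en dash (U+2013) on line 2, column 3 of `"café\nx – y"` would be reported.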
On 02.11.24 14:18, Bruce Momjian wrote:
On Sat, Nov 2, 2024 at 12:02:12PM +0900, Tatsuo Ishii wrote:
Yes, we _allow_ LATIN1 characters in the SGML docs, but I replaced the
LATIN1 characters we had with HTML entities, so there are none
currently.
I think it is too easy for non-Latin1 UTF8 to cr
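Replacing literal Latin-1 characters with HTML entities can be done mechanically. The Python sketch below uses the standard library's `html.entities.codepoint2name` table. It assumes the named entities (such as `&eacute;`) are declared for the documentation's document type, and the function name is hypothetical. It leaves anything outside U+00A0..U+00FF untouched, so a separate check can still flag it.

```python
from html.entities import codepoint2name


def latin1_to_entities(text: str) -> str:
    """Replace each non-ASCII Latin-1 character (U+00A0..U+00FF) with its
    named HTML entity; all other characters are returned unchanged."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xA0 <= cp <= 0xFF and cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")
        else:
            out.append(ch)
    return "".join(out)
```

For example, `latin1_to_entities("Jos\u00e9")` gives `Jos&eacute;`, and a no-break space becomes `&nbsp;`.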
On Sat, Nov 2, 2024 at 12:02:12PM +0900, Tatsuo Ishii wrote:
> > Yes, we _allow_ LATIN1 characters in the SGML docs, but I replaced the
> > LATIN1 characters we had with HTML entities, so there are none
> > currently.
> >
> > I think it is too easy for non-Latin1 UTF8 to creep into our SGML docs
> Yes, we _allow_ LATIN1 characters in the SGML docs, but I replaced the
> LATIN1 characters we had with HTML entities, so there are none
> currently.
>
> I think it is too easy for non-Latin1 UTF8 to creep into our SGML docs
> so I added a cron job on my server to alert me when non-ASCII characte
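The cron job's actual script is not shown in the thread. A periodic scan of that kind might look like the hypothetical Python sketch below. It reports every line containing any byte outside 7-bit ASCII, and the file patterns are an assumption.

```python
from pathlib import Path
from typing import List, Sequence, Tuple


def find_non_ascii(root: Path,
                   patterns: Sequence[str] = ("*.sgml", "*.xml", "*.svg"),
                   ) -> List[Tuple[str, int, str]]:
    """Return (path, line_number, line) for every line under `root` whose
    raw bytes include anything outside 7-bit ASCII."""
    hits = []
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            with path.open("rb") as f:
                for lineno, raw in enumerate(f, start=1):
                    if not raw.isascii():
                        text = raw.decode("utf-8", "replace").rstrip("\r\n")
                        hits.append((str(path), lineno, text))
    return hits
```

A cron wrapper could print the hits and send mail only when the list is non-empty.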
On Sat, Nov 2, 2024 at 07:27:00AM +0900, Tatsuo Ishii wrote:
> > On Wed, Oct 16, 2024 at 09:54:57AM -0400, Bruce Momjian wrote:
> >> On Wed, Oct 16, 2024 at 10:00:15AM +0200, Peter Eisentraut wrote:
> >> > On 15.10.24 23:51, Bruce Momjian wrote:
> >> > > On Tue, Oct 15, 2024 at 05:27:49PM -0400, T
Hi Bruce,
> On Wed, Oct 16, 2024 at 09:54:57AM -0400, Bruce Momjian wrote:
>> On Wed, Oct 16, 2024 at 10:00:15AM +0200, Peter Eisentraut wrote:
>> > On 15.10.24 23:51, Bruce Momjian wrote:
>> > > On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
>> > > > Bruce Momjian writes:
>> > > > > W
On Wed, Oct 16, 2024 at 09:54:57AM -0400, Bruce Momjian wrote:
> On Wed, Oct 16, 2024 at 10:00:15AM +0200, Peter Eisentraut wrote:
> > On 15.10.24 23:51, Bruce Momjian wrote:
> > > On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
> > > > Bruce Momjian writes:
> > > > > Well, we can only u
On Wed, Oct 16, 2024 at 09:58:23AM +0200, Peter Eisentraut wrote:
> On 15.10.24 23:51, Bruce Momjian wrote:
> > > I don't see why we need to enforce this at this level. Whatever
> > > downstream
> > > toolchain has requirements about which characters are allowed will
> > > complain
> > > if it e
On Wed, Oct 16, 2024 at 10:00:15AM +0200, Peter Eisentraut wrote:
> On 15.10.24 23:51, Bruce Momjian wrote:
> > On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
> > > Bruce Momjian writes:
> > > > Well, we can only use Latin-1, so the idea is that we will be explicit
> > > > about specify
On 15.10.24 23:51, Bruce Momjian wrote:
On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
Bruce Momjian writes:
Well, we can only use Latin-1, so the idea is that we will be explicit
about specifying Latin-1 only as HTML entities, rather than letting
non-Latin-1 creep in as UTF8. We c
On 15.10.24 23:51, Bruce Momjian wrote:
I don't see why we need to enforce this at this level. Whatever downstream
toolchain has requirements about which characters are allowed will complain
if it encounters a character it doesn't like.
Uh, the PDF build does not complain if you pass it a non-
On Tue, Oct 15, 2024 at 05:59:05PM -0400, Tom Lane wrote:
> Bruce Momjian writes:
> > On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
> >> That policy would cause substantial problems with contributor names
> >> in the release notes. I agree with Peter that we don't need this.
> >> Catc
Bruce Momjian writes:
> On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
>> That policy would cause substantial problems with contributor names
>> in the release notes. I agree with Peter that we don't need this.
>> Catching otherwise-invisible characters seems sufficient.
> Uh, why can
On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
> Bruce Momjian writes:
> > Well, we can only use Latin-1, so the idea is that we will be explicit
> > about specifying Latin-1 only as HTML entities, rather than letting
> > non-Latin-1 creep in as UTF8. We can exclude certain UTF8 or SGM
On Tue, Oct 15, 2024 at 11:08:15PM +0200, Peter Eisentraut wrote:
> On 15.10.24 22:37, Bruce Momjian wrote:
> > > I don't understand the point of this. Maybe it's okay to try to detect
> > > certain "hidden" whitespace characters, like in the case that started this
> > > thread. But I don't see t
Bruce Momjian writes:
> Well, we can only use Latin-1, so the idea is that we will be explicit
> about specifying Latin-1 only as HTML entities, rather than letting
> non-Latin-1 creep in as UTF8. We can exclude certain UTF8 or SGML files
> if desired.
That policy would cause substantial problem
On 15.10.24 22:37, Bruce Momjian wrote:
I don't understand the point of this. Maybe it's okay to try to detect
certain "hidden" whitespace characters, like in the case that started this
thread. But I don't see the value in prohibiting all non-ASCII characters,
as is being proposed here.
Well,
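A narrower alternative is to reject only "hidden" characters, like the non-breaking space that started this thread, while allowing other non-ASCII text. The sketch below shows that approach in Python. The character set is illustrative, not exhaustive or agreed upon.

```python
import unicodedata
from typing import Iterator, Tuple

# Illustrative set of characters that render as blank (or not at all)
# and are easy to paste in by accident.
HIDDEN_CHARS = {
    "\u00a0",  # NO-BREAK SPACE
    "\u00ad",  # SOFT HYPHEN
    "\u2007",  # FIGURE SPACE
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u202f",  # NARROW NO-BREAK SPACE
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (BOM)
}


def find_hidden(text: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (line_number, column, Unicode name) for each hidden character."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in HIDDEN_CHARS:
                yield lineno, col, unicodedata.name(ch)
```

Contributor names such as "José" pass, while `work_mem` followed by a no-break space is reported with the character's name.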
On Tue, Oct 15, 2024 at 10:34:16PM +0200, Peter Eisentraut wrote:
> On 15.10.24 18:54, Bruce Momjian wrote:
> > > I agree with encoding non-Latin1 characters and disallowing non-ASCII
> > > characters totally.
> > >
> > > I found your patch includes fixes in *.svg files, so how about checking
> >
On 15.10.24 18:54, Bruce Momjian wrote:
I agree with encoding non-Latin1 characters and disallowing non-ASCII
characters totally.
I found your patch includes fixes in *.svg files, so how about also checking
them with check-non-ascii? Also, I think it is better to use perl instead
of grep because n
On Tue, Oct 15, 2024 at 10:10:36AM +0900, Yugo NAGATA wrote:
> Hi Bruce,
>
> On Mon, 14 Oct 2024 16:31:11 -0400
> Bruce Momjian wrote:
>
> > On Mon, Oct 14, 2024 at 03:05:35PM -0400, Bruce Momjian wrote:
> > > I did some more research and was able to clarify our behavior in
> > > release.sgml:
>
Hi Bruce,
On Mon, 14 Oct 2024 16:31:11 -0400
Bruce Momjian wrote:
> On Mon, Oct 14, 2024 at 03:05:35PM -0400, Bruce Momjian wrote:
> > I did some more research and was able to clarify our behavior in
> > release.sgml:
>
> I have specified some more details in my patched version:
>
> We
On Mon, Oct 14, 2024 at 03:05:35PM -0400, Bruce Momjian wrote:
> I did some more research and was able to clarify our behavior in
> release.sgml:
I have specified some more details in my patched version:
We can only use Latin1 characters, not all UTF8 characters,
because some rende
On Fri, Oct 11, 2024 at 12:36:53PM +0900, Yugo NAGATA wrote:
> On Fri, 11 Oct 2024 12:16:50 +0900 (JST)
> Tatsuo Ishii wrote:
>
> > > We can check for non-ASCII letters in SGML/XML files by preparing an "allowlist"
> > > that contains lines which are allowed to have non-ascii characters,
> > > although thi
On Fri, 11 Oct 2024 12:16:50 +0900 (JST)
Tatsuo Ishii wrote:
> > We can check for non-ASCII letters in SGML/XML files by preparing an "allowlist"
> > that contains lines which are allowed to have non-ascii characters,
> > although this list will need to be maintained when lines in it are modified.
> > I've
> We can check for non-ASCII letters in SGML/XML files by preparing an "allowlist"
> that contains lines which are allowed to have non-ascii characters,
> although this list will need to be maintained when lines in it are modified.
> I've attached a patch to add a simple Perl script to do this.
I doubt it r
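The attached Perl script is not shown here. The allowlist idea can be sketched in Python as below, assuming that allowlist entries are exact line contents compared after trimming leading and trailing spaces and tabs. Under that assumption, any edit to an allowlisted line also requires an allowlist update, which is the maintenance cost mentioned above. The function name and sample lines are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def check_with_allowlist(lines: Iterable[str],
                         allowed: Set[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) for each non-ASCII line not in `allowed`.

    Only spaces and tabs are trimmed before comparison: str.strip() with no
    argument would also remove a trailing U+00A0 and hide it."""
    violations = []
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if text.isascii():
            continue
        if text.strip(" \t") not in allowed:
            violations.append((lineno, text))
    return violations
```

An allowlisted contributor-name line passes even when re-indented. A stray no-break space on any other line is reported.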
On Thu, 10 Oct 2024 16:00:41 +0900 (JST)
Tatsuo Ishii wrote:
> > Bruce Momjian writes:
> >> Can we use Unicode in the SGML files?
> >
> > I believe we've been doing it for contributors' names that require
> > non-ASCII letters, but not in any other places.
>
> We have non-ASCII letters in char
> On 9 Oct 2024, at 04:49, Tatsuo Ishii wrote:
> Besides nbsp, there are tons of confusing Unicode
> characters out there. For example there are many "hyphen like
> characters".
In the field of internet security, using characters which look alike is known as
homograph attacks, where for example a
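One narrow slice of the look-alike problem is "hyphen like characters". These can be found with Python's standard `unicodedata` module, as sketched below. The rule is an illustration: flag Unicode dash punctuation (category `Pd`) other than ASCII HYPHEN-MINUS, plus MINUS SIGN, which is category `Sm`. Catching look-alikes in general would need confusables data such as that published with Unicode Technical Standard #39.

```python
import unicodedata
from typing import Iterator, Tuple


def dash_lookalikes(text: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (index, character, Unicode name) for dash-like characters other
    than ASCII '-' (U+002D HYPHEN-MINUS)."""
    for i, ch in enumerate(text):
        if ch == "-":
            continue
        # Pd covers HYPHEN, EN DASH, EM DASH, etc.; MINUS SIGN is Sm.
        if unicodedata.category(ch) == "Pd" or ch == "\u2212":
            yield i, ch, unicodedata.name(ch)
```

In `"a-b\u2010c\u2212d"`, the ASCII hyphen is accepted. U+2010 HYPHEN at index 3 and U+2212 MINUS SIGN at index 5 are reported.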
> Bruce Momjian writes:
>> Can we use Unicode in the SGML files?
>
> I believe we've been doing it for contributors' names that require
> non-ASCII letters, but not in any other places.
We have non-ASCII letters in charset.sgml too, to show some examples
of collation.
Best regards,
--
Tatsuo I
Bruce Momjian writes:
> Can we use Unicode in the SGML files?
I believe we've been doing it for contributors' names that require
non-ASCII letters, but not in any other places.
regards, tom lane
On Wed, Oct 9, 2024 at 11:49:29AM +0900, Tatsuo Ishii wrote:
> >> On Mon, Sep 30, 2024 at 11:59:48AM +0200, Daniel Gustafsson wrote:
> >> > > On 30 Sep 2024, at 11:03, Tatsuo Ishii wrote:
> >> > >
> >> > I think there's an unnecessary underscore in config.sgml.
> >> > >
> >> > > I was wron
> On Mon, 7 Oct 2024 15:45:54 -0400
> Bruce Momjian wrote:
>
>> On Mon, Sep 30, 2024 at 11:59:48AM +0200, Daniel Gustafsson wrote:
>> > > On 30 Sep 2024, at 11:03, Tatsuo Ishii wrote:
>> > >
>> > I think there's an unnecessary underscore in config.sgml.
>> > >
>> > > I was wrong. The part
On Mon, 7 Oct 2024 15:45:54 -0400
Bruce Momjian wrote:
> On Mon, Sep 30, 2024 at 11:59:48AM +0200, Daniel Gustafsson wrote:
> > > On 30 Sep 2024, at 11:03, Tatsuo Ishii wrote:
> > >
> > I think there's an unnecessary underscore in config.sgml.
> > >
> > > I was wrong. The particular byte
Hi Daniel, Yugo,
>> On 8 Oct 2024, at 02:03, Tatsuo Ishii wrote:
>>> On Tue, 1 Oct 2024 22:20:55 +0900
>>> Yugo Nagata wrote:
>
>>> I've attached an updated patch.
>>> I added the comment to explain why Perl is used instead of grep or sed.
>>
>> Looks good to me. If there's no objection, I wil
> On 8 Oct 2024, at 02:03, Tatsuo Ishii wrote:
>> On Tue, 1 Oct 2024 22:20:55 +0900
>> Yugo Nagata wrote:
>> I've attached an updated patch.
>> I added the comment to explain why Perl is used instead of grep or sed.
>
> Looks good to me. If there's no objection, I will commit this to
> master b
> On Tue, 1 Oct 2024 22:20:55 +0900
> Yugo Nagata wrote:
>
>> On Tue, 1 Oct 2024 15:16:52 +0900
>> Yugo NAGATA wrote:
>>
>> > On Tue, 01 Oct 2024 10:33:50 +0900 (JST)
>> > Tatsuo Ishii wrote:
>> >
>> > > >> That's because non-breaking space (nbsp) is not encoded as 0xa0 in
>> > > >> UTF-8. nb
On Mon, Sep 30, 2024 at 11:59:48AM +0200, Daniel Gustafsson wrote:
> > On 30 Sep 2024, at 11:03, Tatsuo Ishii wrote:
> >
> I think there's an unnecessary underscore in config.sgml.
> >
> > I was wrong. The particular byte sequences just looked like an underscore
> > in my editor but the byte seq
On Tue, 1 Oct 2024 22:20:55 +0900
Yugo Nagata wrote:
> On Tue, 1 Oct 2024 15:16:52 +0900
> Yugo NAGATA wrote:
>
> > On Tue, 01 Oct 2024 10:33:50 +0900 (JST)
> > Tatsuo Ishii wrote:
> >
> > > >> That's because non-breaking space (nbsp) is not encoded as 0xa0 in
> > > >> UTF-8. nbsp in UTF-8 is
On Tue, 1 Oct 2024 15:16:52 +0900
Yugo NAGATA wrote:
> On Tue, 01 Oct 2024 10:33:50 +0900 (JST)
> Tatsuo Ishii wrote:
>
> > >> That's because non-breaking space (nbsp) is not encoded as 0xa0 in
> > >> UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code
> > >> point in Unicode
On Tue, 01 Oct 2024 10:33:50 +0900 (JST)
Tatsuo Ishii wrote:
> >> That's because non-breaking space (nbsp) is not encoded as 0xa0 in
> >> UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code
> >> point in Unicode. i.e. U+00A0).
> >> So grep -P "[\xC2\xA0]" should work to detect
> On Mon, 30 Sep 2024 17:23:24 +0900 (JST)
> Tatsuo Ishii wrote:
>
>> >> I think there's an unnecessary underscore in config.sgml.
>> >> Attached patch fixes it.
>> >
>> > I could not apply the patch with an error.
>> >
>> > error: patch failed: doc/src/sgml/config.sgml:9380
>> > error: doc/s
>> That's because non-breaking space (nbsp) is not encoded as 0xa0 in
>> UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code
>> point in Unicode. i.e. U+00A0).
>> So grep -P "[\xC2\xA0]" should work to detect nbsp.
>
> `LC_ALL=C grep -P "\xC2\xA0"` works for my environment.
> (
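Why the bracketed pattern is wrong: U+00A0 is stored in UTF-8 as the two bytes `C2 A0`. A pattern must match that sequence in order. A bracket expression like `[\xC2\xA0]` matches either byte on its own. The Python sketch below demonstrates this with byte-level regular expressions. These are meant to approximate `LC_ALL=C grep -P`, which treats input as raw bytes. In a UTF-8 locale, `\xA0` would instead denote the code point U+00A0.

```python
import re

NBSP = "\u00a0"

# U+00A0 NO-BREAK SPACE is encoded in UTF-8 as the two bytes C2 A0.
assert NBSP.encode("utf-8") == b"\xc2\xa0"

sequence = re.compile(rb"\xc2\xa0")   # the two-byte sequence, in order
bracket = re.compile(rb"[\xc2\xa0]")  # EITHER byte on its own


def matches(pattern, s: str) -> bool:
    """Search the UTF-8 bytes of `s`, as a byte-oriented grep would."""
    return pattern.search(s.encode("utf-8")) is not None
```

The bracketed form gives false positives. "à" (U+00E0) is encoded as `C3 A0`, and "©" (U+00A9) as `C2 A9`. Both match `[\xC2\xA0]`, but neither contains a no-break space.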
On Mon, 30 Sep 2024 20:07:31 +0900 (JST)
Tatsuo Ishii wrote:
> >> I wonder if it would be worth adding a check for this like we have for tabs?
>
> +1.
>
> >> The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
> >> (doing so made me realize we don't have an equivalent meso
>> I wonder if it would be worth adding a check for this like we have for tabs?
+1.
>> The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
>> (doing so made me realize we don't have an equivalent meson target).
>
> Your patch couldn't detect 0xA0 in config.sgml in my machin
On Mon, 30 Sep 2024 11:59:48 +0200
Daniel Gustafsson wrote:
> > On 30 Sep 2024, at 11:03, Tatsuo Ishii wrote:
> >
> I think there's an unnecessary underscore in config.sgml.
> >
> > I was wrong. The particular byte sequences just looked like an underscore
> > in my editor but the byte sequence
> On 30 Sep 2024, at 11:03, Tatsuo Ishii wrote:
>
I think there's an unnecessary underscore in config.sgml.
>
> I was wrong. The particular byte sequences just looked like an underscore
> in my editor but the byte sequence is actually 0xc2a0, which must be a
> "non breaking space" encoded in UTF
On Mon, 30 Sep 2024 18:03:44 +0900 (JST)
Tatsuo Ishii wrote:
> >>> I think there's an unnecessary underscore in config.sgml.
>
> I was wrong. The particular byte sequences just looked like an underscore
> in my editor but the byte sequence is actually 0xc2a0, which must be a
> "non breaking space" en
On Mon, 30 Sep 2024 17:23:24 +0900 (JST)
Tatsuo Ishii wrote:
> >> I think there's an unnecessary underscore in config.sgml.
> >> Attached patch fixes it.
> >
> > I could not apply the patch with an error.
> >
> > error: patch failed: doc/src/sgml/config.sgml:9380
> > error: doc/src/sgml/confi
>>> I think there's an unnecessary underscore in config.sgml.
I was wrong. The particular byte sequences just looked like an underscore
in my editor but the byte sequence is actually 0xc2a0, which must be a
"non breaking space" encoded in UTF-8. I guess someone mistakenly
inserted a non breaking space wh
>> I think there's an unnecessary underscore in config.sgml.
>> Attached patch fixes it.
>
> I could not apply the patch with an error.
>
> error: patch failed: doc/src/sgml/config.sgml:9380
> error: doc/src/sgml/config.sgml: patch does not apply
Strange. I have no problem applying the patch h
On Mon, 30 Sep 2024 15:34:04 +0900 (JST)
Tatsuo Ishii wrote:
> I think there's an unnecessary underscore in config.sgml.
> Attached patch fixes it.
I could not apply the patch with an error.
error: patch failed: doc/src/sgml/config.sgml:9380
error: doc/src/sgml/config.sgml: patch does not app
I think there's an unnecessary underscore in config.sgml.
Attached patch fixes it.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0aec11f443..08173ecb5c 1