Re: Canonicalization of URLs on our website

2024-05-08 Thread Ralph Goers
The IDE lets you view the page you are editing but generally it doesn’t show 
you the nav or the rest of the site. I am also not confident that everything 
will be styled exactly the same as it will appear in the browser.
So, in short, the IDE is great for editing but I like to do my reviews on a 
local site.

Ralph

> On May 8, 2024, at 9:50 AM, Piotr P. Karwasz  wrote:
> 
> Hi Volkan,
> 
> On Wed, 8 May 2024 at 14:57, Volkan Yazıcı  wrote:
>> In my opinion,
>> 
>>   - Being able to view `target/site` with just using my browser and
>>   nothing else is super convenient. The development experience is much
>>   smoother.
> 
> If we use a non-default `html_extension_style`, you can still preview
> the website locally. You just need to modify the Antora Playbook
> (AFAIK Antora does not support variable substitution in the playbook).
> We can also set `html_extension_style: default` in our Git repo and
> configure the CI to modify it when it builds the staging site.
> 
> I usually use my IDE to preview the documentation pages, so for me it
> is not a problem.
> 
> Anyway I value consistency more than one particular solution. If we
> prefer to stick with the `default` style, for SEO purposes we should
> ensure that all our internal links end up in `.html`. This mainly
> applies to index files.
> 
> Piotr



Re: Canonicalization of URLs on our website

2024-05-08 Thread Piotr P. Karwasz
Hi Volkan,

On Wed, 8 May 2024 at 14:57, Volkan Yazıcı  wrote:
> In my opinion,
>
>   - Being able to view `target/site` with just using my browser and
>   nothing else is super convenient. The development experience is much
>   smoother.

If we use a non-default `html_extension_style`, you can still preview
the website locally. You just need to modify the Antora Playbook
(AFAIK Antora does not support variable substitution in the playbook).
We can also set `html_extension_style: default` in our Git repo and
configure the CI to modify it when it builds the staging site.
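
As a rough illustration of the CI idea, here is a minimal Python sketch
that switches the style in the playbook text before a staging build. The
function name `set_extension_style` and the sample playbook are
hypothetical. It assumes the playbook contains exactly one
`html_extension_style` line, and it edits the text with a regular
expression instead of a YAML parser:

```python
import re

# The three values discussed in this thread.
VALID_STYLES = {"default", "drop", "indexify"}

# Matches a line such as "  html_extension_style: default" and captures
# everything up to (but not including) the current value.
_STYLE_LINE = re.compile(r"^(\s*html_extension_style:\s*)\S+", re.MULTILINE)


def set_extension_style(playbook_text: str, style: str) -> str:
    """Return the playbook text with `html_extension_style` set to `style`.

    Hypothetical helper: it assumes exactly one such line exists.
    """
    if style not in VALID_STYLES:
        raise ValueError(f"unknown style: {style!r}")
    new_text, count = _STYLE_LINE.subn(lambda m: m.group(1) + style, playbook_text)
    if count != 1:
        raise ValueError(f"expected one html_extension_style line, found {count}")
    return new_text


# Hypothetical playbook fragment as it could be committed to Git.
committed = "urls:\n  html_extension_style: default\n"
staging = set_extension_style(committed, "drop")
```

The CI job would write `staging` to a temporary playbook and build from
that, so the committed file keeps the `default` style for local previews.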

I usually use my IDE to preview the documentation pages, so for me it
is not a problem.

Anyway, I value consistency more than one particular solution. If we
prefer to stick with the `default` style, for SEO purposes we should
ensure that all our internal links end in `.html`. This mainly
applies to index files.

Piotr


Re: Canonicalization of URLs on our website

2024-05-08 Thread Ralph Goers
Volkan,

I completely agree with you. I prefer to review my site changes either in 
target/site or by doing a deploy to a local directory. In both cases I use the 
file protocol to view it.

Ralph

> On May 8, 2024, at 5:55 AM, Volkan Yazıcı  wrote:
> 
> All non-default `html_extension_style` options require to run a web server.
> 
> In my opinion,
> 
>   - Being able to view `target/site` with just using my browser and
>   nothing else is super convenient. The development experience is much
>   smoother.
>   - None of the advantages you cited for switching from `/foo.html` to
>   `/foo`, `/foo/index.html`, etc. is worth the trouble/complexity it will
>   introduce.
> 
> In short, I am not inclined to change the current path naming scheme. That
> said, I don't want to sound bossy. I would appreciate it if others can join
> the discussion with their arguments.
> 
> On Wed, May 8, 2024 at 10:22 AM Piotr P. Karwasz 
> wrote:
> 
>> Hi all,
>> 
>> On Sun, 21 Apr 2024 at 20:19, Volkan Yazıcı  wrote:
>>>   1. Could you show us the Antora configuration option you mentioned
>>>   and how we can use it to achieve what you propose?
>> 
>> I found the perfect Antora setting: `html_extension_style`[1].
>> 
>> The option I am proposing corresponds to the `drop` style:
>> 
>> * a `/foo/bar.html` file will be referenced as `foo/bar`,
>> * a `/foo/index.html` file will be referenced as `foo/`.
>> 
>> The `indexify` style is very similar, but it always uses a trailing
>> `/` for the file names.
>> 
>> I see both pros and cons for the two styles:
>> 
>> ## `indexify` style
>> 
>> Pros:
>> * Doesn't make a difference between "normal" HTML files and folders.
>> If we transform `foo.html` into `foo/index.html` and add subpages, the
>> URL remains always `foo/`.
>> * We restore the old URLs like `/log4j/2.x/log4j-api/` that became
>> `/log4j/2.x/log4j-api.html`.
>> * Works on every HTTP server (even Python's).
>> 
>> Cons:
>> * We need a lot of HTTP redirects like
>> `/log4j/2.x/manual/configuration.html` ->
>> `/log4j/2.x/manual/configuration/`
>> 
>> ## `drop` style
>> 
>> Pros:
>> * We don't need redirects for the current pages, only a global rewrite
>> rule that states that we prefer to omit the `.html` suffix.
>> * It is shorter than the `indexify` style.
>> * It is easier to implement on already compiled pages: no need to
>> move/rename files.
>> 
>> Cons:
>> * If `foo.html` becomes `foo/index.html` the canonical URL changes
>> from `foo` to `foo/`. However the redirect from the old to the new URL
>> is done automatically by most servers.
>> * It doesn't work with all web servers, but it works with Apache HTTP
>> Server.
>> 
>> What do you think about adopting the `drop` style?
>> 
>> Piotr
>> 
>> PS: Javadoc also can use the `drop` style. See e.g. Jakarta drops the
>> `.html` (and apparently capital letters) from its Javadoc.
>> 
>> [1]
>> https://docs.antora.org/antora/latest/playbook/urls-html-extension-style/
>> [2]
>> https://jakarta.ee/specifications/servlet/6.0/apidocs/jakarta.servlet/jakarta/servlet/filter
>> 



Re: Canonicalization of URLs on our website

2024-05-08 Thread Volkan Yazıcı
All non-default `html_extension_style` options require running a web server.

In my opinion,

   - Being able to view `target/site` with just using my browser and
   nothing else is super convenient. The development experience is much
   smoother.
   - None of the advantages you cited for switching from `/foo.html` to
   `/foo`, `/foo/index.html`, etc. is worth the trouble/complexity it will
   introduce.

In short, I am not inclined to change the current path naming scheme. That
said, I don't want to sound bossy. I would appreciate it if others can join
the discussion with their arguments.

On Wed, May 8, 2024 at 10:22 AM Piotr P. Karwasz 
wrote:

> Hi all,
>
> On Sun, 21 Apr 2024 at 20:19, Volkan Yazıcı  wrote:
> >1. Could you show us the Antora configuration option you mentioned
> >and how we can use it to achieve what you propose?
>
> I found the perfect Antora setting: `html_extension_style`[1].
>
> The option I am proposing corresponds to the `drop` style:
>
> * a `/foo/bar.html` file will be referenced as `foo/bar`,
> * a `/foo/index.html` file will be referenced as `foo/`.
>
> The `indexify` style is very similar, but it always uses a trailing
> `/` for the file names.
>
> I see both pros and cons for the two styles:
>
> ## `indexify` style
>
> Pros:
> * Doesn't make a difference between "normal" HTML files and folders.
> If we transform `foo.html` into `foo/index.html` and add subpages, the
> URL remains always `foo/`.
> * We restore the old URLs like `/log4j/2.x/log4j-api/` that became
> `/log4j/2.x/log4j-api.html`.
> * Works on every HTTP server (even Python's).
>
> Cons:
> * We need a lot of HTTP redirects like
> `/log4j/2.x/manual/configuration.html` ->
> `/log4j/2.x/manual/configuration/`
>
> ## `drop` style
>
> Pros:
> * We don't need redirects for the current pages, only a global rewrite
> rule that states that we prefer to omit the `.html` suffix.
> * It is shorter than the `indexify` style.
> * It is easier to implement on already compiled pages: no need to
> move/rename files.
>
> Cons:
> * If `foo.html` becomes `foo/index.html` the canonical URL changes
> from `foo` to `foo/`. However the redirect from the old to the new URL
> is done automatically by most servers.
> * It doesn't work with all web servers, but it works with Apache HTTP
> Server.
>
> What do you think about adopting the `drop` style?
>
> Piotr
>
> PS: Javadoc also can use the `drop` style. See e.g. Jakarta drops the
> `.html` (and apparently capital letters) from its Javadoc.
>
> [1]
> https://docs.antora.org/antora/latest/playbook/urls-html-extension-style/
> [2]
> https://jakarta.ee/specifications/servlet/6.0/apidocs/jakarta.servlet/jakarta/servlet/filter
>


Re: Canonicalization of URLs on our website

2024-05-08 Thread Piotr P. Karwasz
Hi all,

On Sun, 21 Apr 2024 at 20:19, Volkan Yazıcı  wrote:
> 1. Could you show us the Antora configuration option you mentioned
> and how we can use it to achieve what you propose?

I found the perfect Antora setting: `html_extension_style`[1].

The option I am proposing corresponds to the `drop` style:

* a `/foo/bar.html` file will be referenced as `foo/bar`,
* a `/foo/index.html` file will be referenced as `foo/`.

The `indexify` style is very similar, but it always uses a trailing
`/` for the file names.
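
To make the mapping concrete, here is a small Python sketch of how a
generated file path would be referenced under each style, based only on
the description above. The function `page_url` is hypothetical, expects
absolute paths, and is not Antora's implementation, which may differ in
corner cases:

```python
HTML = ".html"


def page_url(path: str, style: str = "default") -> str:
    """Map an absolute file path such as "/foo/bar.html" to its URL."""
    if style not in ("default", "drop", "indexify"):
        raise ValueError(f"unknown style: {style!r}")
    # The default style references the file as-is; non-HTML files are
    # never rewritten.
    if style == "default" or not path.endswith(HTML):
        return path
    stem = path[: -len(HTML)]
    # Index pages are referenced by their folder in both styles.
    if stem.endswith("/index"):
        return stem[: -len("index")]
    # `drop` only removes the extension, `indexify` also adds a slash.
    return stem if style == "drop" else stem + "/"
```

For example, `page_url("/foo/bar.html", "drop")` gives `/foo/bar`, while
`page_url("/foo/bar.html", "indexify")` gives `/foo/bar/`.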

I see both pros and cons for the two styles:

## `indexify` style

Pros:
* Doesn't make a difference between "normal" HTML files and folders.
If we transform `foo.html` into `foo/index.html` and add subpages, the
URL remains always `foo/`.
* We restore the old URLs like `/log4j/2.x/log4j-api/` that became
`/log4j/2.x/log4j-api.html`.
* Works on every HTTP server (even Python's).

Cons:
* We need a lot of HTTP redirects like
`/log4j/2.x/manual/configuration.html` ->
`/log4j/2.x/manual/configuration/`
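
To get a feeling for how many redirects that means, the list could be
generated from the paths of the currently published pages. This is only
a sketch: the function `indexify_redirects` is hypothetical and the
input list would have to come from a scan of the existing site.

```python
def indexify_redirects(old_paths):
    """Return {old_path: new_path} for pages that move under `indexify`."""
    redirects = {}
    for old in old_paths:
        # Index pages keep their file location (`foo/index.html`), so the
        # old path still resolves and no redirect is generated here.
        if not old.endswith(".html") or old.endswith("/index.html"):
            continue
        # `/a/b.html` is published as `/a/b/index.html`, i.e. `/a/b/`.
        redirects[old] = old[: -len(".html")] + "/"
    return redirects


example = indexify_redirects([
    "/log4j/2.x/manual/configuration.html",
    "/log4j/2.x/index.html",
])
```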

## `drop` style

Pros:
* We don't need redirects for the current pages, only a global rewrite
rule that states that we prefer to omit the `.html` suffix.
* It is shorter than the `indexify` style.
* It is easier to implement on already compiled pages: no need to
move/rename files.

Cons:
* If `foo.html` becomes `foo/index.html` the canonical URL changes
from `foo` to `foo/`. However the redirect from the old to the new URL
is done automatically by most servers.
* It doesn't work with all web servers, but it works with Apache HTTP Server.

What do you think about adopting the `drop` style?

Piotr

PS: Javadoc can also use the `drop` style. For example, Jakarta drops the
`.html` (and apparently capital letters) from its Javadoc[2].

[1] https://docs.antora.org/antora/latest/playbook/urls-html-extension-style/
[2] https://jakarta.ee/specifications/servlet/6.0/apidocs/jakarta.servlet/jakarta/servlet/filter


Re: Canonicalization of URLs on our website

2024-04-22 Thread Piotr P. Karwasz
Hi Volkan,

On Sun, 21 Apr 2024 at 20:19, Volkan Yazıcı  wrote:
>
> I have a couple of questions Piotr:
>
> 1. Could you show us the Antora configuration option you mentioned
> and how we can use it to achieve what you propose?

There are `:relfileprefix:` and `:relfilesuffix:` attributes[1] that
can be used to achieve that effect.
The Antora website[2] itself hides the extensions of the files.

> 2. Are you suggesting that all `foo.html` pages should be converted to
> `foo/`?

Initially I thought about `foo` (i.e. keep `foo.html` as file name and
use `mod_negotiation`), but `foo/` (i.e. putting the HTML in
`foo/index.html`) seems to be a more widespread practice.

The usual argument for using `foo` instead of `foo.html` is that it
hides the technology used to generate the site. While for `foo.php`
and `foo.jsp` this makes sense, for `foo.html` I find this argument
bogus, since websites will always use HTML as output format.

On the other hand there is another advantage for using `foo` instead
of `foo.html`: if in the future you create subpages `foo/bar` and
`foo/baz`, their URLs will be obtained by simply appending a word to
their "parent" URL.

Piotr

[1] https://docs.asciidoctor.org/asciidoc/latest/macros/inter-document-xref/#mapping-references-to-a-different-structure
[2] https://docs.antora.org/antora/latest/page/page-links/


Re: Canonicalization of URLs on our website

2024-04-21 Thread Ralph Goers
I have always viewed index.html as a special case. When navigating to the root 
of a site, l.a.o/log4j/2.x/ should be sufficient, as it should default to 
index.html. However, the “real” URL includes index.html. 

Other pages should always be whatever.html IMO. 

Ralph

> On Apr 20, 2024, at 1:17 PM, Piotr P. Karwasz  wrote:
> 
> Hi,
> 
> I scanned our https://logging.apache.org/ website and found out that
> the internal hyperlinks between our pages are not consistent. For
> example links to:
> 
> https://logging.apache.org/log4j/2.x/
> 
> might appear in hyperlinks with an URI path of:
> 
> * `/log4j/2.x` (which causes a 301 HTTP redirect),
> * `/log4j/2.x/`,
> * `/log4j/2.x/index.html`.
> 
> This lack of uniformity can cause several problems:
> 
> * search engines might treat those 3 links as equivalent, but not necessarily.
> * if an `index.html` file is moved, we need to provide a redirect for
> all 3 alternatives: a recent example is
> `/log4j/2.x/log4j-1.2-api/index.html` that was moved to
> `/log4j2/2.x/log4j-1.2-api.html`.
> * for the rare people that actually look at the URL of a page, it
> doesn't seem coherent.
> 
> So I would propose to adopt only one of the 3 alternatives and stick
> to it as much as possible? Which one should we choose?
> 
> The simplest one (`/log4j/2.x/index.html`) does not require a Web
> server and can be viewed locally and can be viewed using the `file:`
> scheme in a browser. However I find it less elegant than
> `/log4j/2.x/`.
> Antora is probably able to generate both versions through some
> configuration option, so choosing `/log4j/2.x/` does not preclude the
> possibility to generate a different version to check the web site
> locally.
> 
> Another canonicalization we might apply regards trailing `.html`
> extensions in the URL. The current website supports both:
> 
> * `/log4j2/log4j-api`,
> * `/log4j2/log4j-api.html`.
> 
> through `mod_negotiation`. Should we use the version with a trailing
> `.html` or without it? The `https://apache.org/` website hides the
> `.html` extension in most the links.
> 
> Piotr



Re: Canonicalization of URLs on our website

2024-04-21 Thread Volkan Yazıcı
I have a couple of questions Piotr:

   1. Could you show us the Antora configuration option you mentioned
   and how we can use it to achieve what you propose?
   2. Are you suggesting that all `foo.html` pages should be converted to
   `foo/`?


On Sat, Apr 20, 2024 at 10:17 PM Piotr P. Karwasz 
wrote:

> Hi,
>
> I scanned our https://logging.apache.org/ website and found out that
> the internal hyperlinks between our pages are not consistent. For
> example links to:
>
> https://logging.apache.org/log4j/2.x/
>
> might appear in hyperlinks with an URI path of:
>
> * `/log4j/2.x` (which causes a 301 HTTP redirect),
> * `/log4j/2.x/`,
> * `/log4j/2.x/index.html`.
>
> This lack of uniformity can cause several problems:
>
> * search engines might treat those 3 links as equivalent, but not
> necessarily.
> * if an `index.html` file is moved, we need to provide a redirect for
> all 3 alternatives: a recent example is
> `/log4j/2.x/log4j-1.2-api/index.html` that was moved to
> `/log4j2/2.x/log4j-1.2-api.html`.
> * for the rare people that actually look at the URL of a page, it
> doesn't seem coherent.
>
> So I would propose to adopt only one of the 3 alternatives and stick
> to it as much as possible? Which one should we choose?
>
> The simplest one (`/log4j/2.x/index.html`) does not require a Web
> server and can be viewed locally and can be viewed using the `file:`
> scheme in a browser. However I find it less elegant than
> `/log4j/2.x/`.
> Antora is probably able to generate both versions through some
> configuration option, so choosing `/log4j/2.x/` does not preclude the
> possibility to generate a different version to check the web site
> locally.
>
> Another canonicalization we might apply regards trailing `.html`
> extensions in the URL. The current website supports both:
>
> * `/log4j2/log4j-api`,
> * `/log4j2/log4j-api.html`.
>
> through `mod_negotiation`. Should we use the version with a trailing
> `.html` or without it? The `https://apache.org/` website hides the
> `.html` extension in most the links.
>
> Piotr
>


Re: Canonicalization of URLs on our website

2024-04-20 Thread Piotr P. Karwasz
Hi Gary,

On Sun, 21 Apr 2024 at 00:02, Gary Gregory  wrote:
>
> I agree with Piotr. I prefer the simplest solution, pointing to
> `index.html`, no guessing required.

Personally I prefer the shortest one:

* no www,
* no `index.html`,
* no `.html`.

Piotr


Re: Canonicalization of URLs on our website

2024-04-20 Thread Gary Gregory
I agree with Piotr. I prefer the simplest solution, pointing to
`index.html`, no guessing required.

Gary

On Sat, Apr 20, 2024 at 4:17 PM Piotr P. Karwasz
 wrote:
>
> Hi,
>
> I scanned our https://logging.apache.org/ website and found out that
> the internal hyperlinks between our pages are not consistent. For
> example links to:
>
> https://logging.apache.org/log4j/2.x/
>
> might appear in hyperlinks with an URI path of:
>
> * `/log4j/2.x` (which causes a 301 HTTP redirect),
> * `/log4j/2.x/`,
> * `/log4j/2.x/index.html`.
>
> This lack of uniformity can cause several problems:
>
> * search engines might treat those 3 links as equivalent, but not necessarily.
> * if an `index.html` file is moved, we need to provide a redirect for
> all 3 alternatives: a recent example is
> `/log4j/2.x/log4j-1.2-api/index.html` that was moved to
> `/log4j2/2.x/log4j-1.2-api.html`.
> * for the rare people that actually look at the URL of a page, it
> doesn't seem coherent.
>
> So I would propose to adopt only one of the 3 alternatives and stick
> to it as much as possible? Which one should we choose?
>
> The simplest one (`/log4j/2.x/index.html`) does not require a Web
> server and can be viewed locally and can be viewed using the `file:`
> scheme in a browser. However I find it less elegant than
> `/log4j/2.x/`.
> Antora is probably able to generate both versions through some
> configuration option, so choosing `/log4j/2.x/` does not preclude the
> possibility to generate a different version to check the web site
> locally.
>
> Another canonicalization we might apply regards trailing `.html`
> extensions in the URL. The current website supports both:
>
> * `/log4j2/log4j-api`,
> * `/log4j2/log4j-api.html`.
>
> through `mod_negotiation`. Should we use the version with a trailing
> `.html` or without it? The `https://apache.org/` website hides the
> `.html` extension in most the links.
>
> Piotr


Canonicalization of URLs on our website

2024-04-20 Thread Piotr P. Karwasz
Hi,

I scanned our https://logging.apache.org/ website and found out that
the internal hyperlinks between our pages are not consistent. For
example links to:

https://logging.apache.org/log4j/2.x/

might appear in hyperlinks with a URI path of:

* `/log4j/2.x` (which causes a 301 HTTP redirect),
* `/log4j/2.x/`,
* `/log4j/2.x/index.html`.

This lack of uniformity can cause several problems:

* search engines might treat those 3 links as equivalent, but not necessarily.
* if an `index.html` file is moved, we need to provide a redirect for
all 3 alternatives: a recent example is
`/log4j/2.x/log4j-1.2-api/index.html` that was moved to
`/log4j2/2.x/log4j-1.2-api.html`.
* for the rare people that actually look at the URL of a page, it
doesn't seem coherent.

So I propose that we adopt only one of the 3 alternatives and stick
to it as much as possible. Which one should we choose?
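
Whichever form we pick, the rewriting of internal links could be
automated. Below is a minimal Python sketch; the function `canonicalize`
and its `form` parameter are hypothetical, and it assumes we know which
paths are directories, since `/log4j/2.x` alone cannot be told apart
from an extension-less file:

```python
def canonicalize(path, directories, form="slash"):
    """Rewrite the three spellings of a directory link to a single one.

    `directories` holds known directory paths without a trailing slash.
    `form` is "slash" for `/dir/` or "index" for `/dir/index.html`.
    """
    if form not in ("slash", "index"):
        raise ValueError(f"unknown form: {form!r}")
    if path.endswith("/index.html"):
        base = path[: -len("index.html")]
    elif path.endswith("/"):
        base = path
    elif path in directories:
        base = path + "/"
    else:
        # Not a directory link: leave it untouched.
        return path
    return base + "index.html" if form == "index" else base


dirs = {"/log4j/2.x"}
variants = ["/log4j/2.x", "/log4j/2.x/", "/log4j/2.x/index.html"]
short_form = {canonicalize(p, dirs) for p in variants}
long_form = {canonicalize(p, dirs, form="index") for p in variants}
```

Both `short_form` and `long_form` end up with a single element, so the
same tool would work regardless of the alternative we choose.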

The simplest one (`/log4j/2.x/index.html`) does not require a Web
server and can be viewed locally using the `file:` scheme in a
browser. However I find it less elegant than `/log4j/2.x/`.
Antora is probably able to generate both versions through some
configuration option, so choosing `/log4j/2.x/` does not preclude the
possibility to generate a different version to check the web site
locally.

Another canonicalization we might apply regards trailing `.html`
extensions in the URL. The current website supports both:

* `/log4j2/log4j-api`,
* `/log4j2/log4j-api.html`.

through `mod_negotiation`. Should we use the version with a trailing
`.html` or without it? The `https://apache.org/` website hides the
`.html` extension in most of the links.

Piotr