Re: Auto-checking dead links in the manual (was: http: links in the manual)

2022-08-22 Thread Ihor Radchenko
Max Nikulin writes:

> I hope that Selenium is currently overkill. However, more sites are
> starting to use anti-DDoS shields like Cloudflare, and an HTTP client
> may be banned just because it does not fetch other resources like JS
> scripts.

Such links should be considered dead for the purposes of the Org manual.
We must not link to websites that cannot be opened without running
non-free JS. This is in line with the GNU documentation standards.

> I do not have a patch, just an idea: an export backend that ignores
> everything besides links and either sends requests from Lisp code or
> generates a file for another tool.
>
> #+attr_linklint: ...
>
> may be used to specify a regexp that the target page is expected to
> contain. There are some complications, e.g. "info:" links have special
> code that generates HTML with a URL derived from the original path. So
> it may be more robust to parse the HTML document (without checking the
> text of the linked document).

Yes, the most robust way would be to simply extract the links from the
HTML version of the manual and test them using whatever method is
appropriate.
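
As a rough illustration of the extraction step, here is a minimal Python
sketch that collects link targets from an HTML export. The names
`LinkCollector` and `collect_links` and the base URL used to resolve
relative links are assumptions made for this example; they are not part
of any existing Org tooling.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> elements (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop the #fragment part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def collect_links(html_text, base_url):
    """Return the sorted, unique http(s) links found in html_text."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return sorted(
        {url for url in collector.links
         if url.startswith(("http://", "https://"))}
    )
```

The resulting list can then be handed to wget or any other checker.
Working on the exported HTML sidesteps link types such as "info:" whose
final URL is only known after export.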

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92



Re: Auto-checking dead links in the manual (was: http: links in the manual)

2022-08-22 Thread Hendursaga
> I hope that Selenium is currently overkill

Me too, although the WebDriver protocol itself is less bloated than
Selenium. Personally, I use Etaoin[1] for anything WebDriver-related:
it's pretty compact and Lisp-y, and you can easily run unit tests with
Emacs. As for anything ready-made for cleaning up dead links, I'm not
aware of any, unfortunately.

[1] https://github.com/clj-commons/etaoin



Re: Auto-checking dead links in the manual (was: http: links in the manual)

2022-08-22 Thread Max Nikulin

On 22/08/2022 09:46, Ihor Radchenko wrote:
> Juan Manuel Macías writes:
>
>> Maybe, instead of repairing the links manually, we could think of some
>> code that would do this work periodically, and also check the health of
>> the links, running a url request on each link and returning a list of
>> broken links. I don't know if it is possible to do something like that
>> in Elisp, as I don't have much experience with web and link issues. I
>> think there are also external tools, like Selenium Web Driver, but my
>> experience with it is very limited (I use Selenium from time to time
>> when I want to take a screenshot of a web page).
>
> This is a good idea.
>
> Selenium is probably overkill, since we should not link to JS-only
> websites from the manual anyway. What we can do instead is add a make
> target that uses something like wget.
>
> Patches are welcome!


I hope that Selenium is currently overkill. However, more sites are
starting to use anti-DDoS shields like Cloudflare, and an HTTP client
may be banned just because it does not fetch other resources like JS
scripts.


I do not have a patch, just an idea: an export backend that ignores
everything besides links and either sends requests from Lisp code or
generates a file for another tool.


#+attr_linklint: ...

may be used to specify a regexp that the target page is expected to
contain. There are some complications, e.g. "info:" links have special
code that generates HTML with a URL derived from the original path. So
it may be more robust to parse the HTML document (without checking the
text of the linked document).
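
To make the "expected regexp" part of the idea concrete, here is a small
Python sketch. It does not parse any `#+attr_linklint:` syntax, which is
left unspecified above; it simply assumes that the expectations have
already been gathered into a mapping from URL to regexp and that the page
text has already been fetched. The function names and the example URLs
are hypothetical.

```python
import re


def page_matches(body, expected_pattern):
    """Return True if body contains a match for expected_pattern."""
    return re.search(expected_pattern, body) is not None


def unexpected_pages(pages, expectations):
    """Return the URLs whose fetched text lacks the expected pattern.

    pages maps URL -> fetched page text (assumed to be fetched elsewhere);
    expectations maps URL -> regexp the page is expected to contain.
    A URL missing from pages is treated as an empty page.
    """
    return [
        url
        for url, pattern in expectations.items()
        if not page_matches(pages.get(url, ""), pattern)
    ]
```

A check like this would catch pages that still answer with status 200 but
no longer hold the intended content, such as a parked domain.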






Auto-checking dead links in the manual (was: http: links in the manual)

2022-08-21 Thread Ihor Radchenko
Juan Manuel Macías writes:

>> Max Nikulin to emacs-orgmode. [PATCH] org-manual.org: Update links to
>> MathJax docs. Sun, 3 Oct 2021 23:17:46 +0700.
>> https://list.orgmode.org/sjcl3b$gsr$1...@ciao.gmane.io
>>
>> In the particular case of docs.mathjax.org, I am unsure whether the
>> mild preference for http: over https: is a mistake in the server
>> configuration. I do not mind "https:" there; either variant is better
>> than the old broken link.
>
> Maybe, instead of repairing the links manually, we could think of some
> code that would do this work periodically, and also check the health of
> the links, running a url request on each link and returning a list of
> broken links. I don't know if it is possible to do something like that
> in Elisp, as I don't have much experience with web and link issues. I
> think there are also external tools, like Selenium Web Driver, but my
> experience with it is very limited (I use Selenium from time to time
> when I want to take a screenshot of a web page).

This is a good idea.

Selenium is probably overkill, since we should not link to JS-only
websites from the manual anyway. What we can do instead is add a make
target that uses something like wget.

Patches are welcome!




Re: http: links in the manual

2022-08-21 Thread Juan Manuel Macías
Max Nikulin writes:

> One may get no response when trying to fix a link.
>
> Max Nikulin to emacs-orgmode. [PATCH] org-manual.org: Update links to
> MathJax docs. Sun, 3 Oct 2021 23:17:46 +0700.
> https://list.orgmode.org/sjcl3b$gsr$1...@ciao.gmane.io
>
> In the particular case of docs.mathjax.org, I am unsure whether the
> mild preference for http: over https: is a mistake in the server
> configuration. I do not mind "https:" there; either variant is better
> than the old broken link.

Maybe, instead of repairing the links manually, we could think of some
code that would do this work periodically, and also check the health of
the links, running a url request on each link and returning a list of
broken links. I don't know if it is possible to do something like that
in Elisp, as I don't have much experience with web and link issues. I
think there are also external tools, like Selenium Web Driver, but my
experience with it is very limited (I use Selenium from time to time
when I want to take a screenshot of a web page).
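
For what it is worth, the "request each link and return the broken ones"
part does not need a browser. Below is a minimal sketch in Python rather
than Elisp, using only the standard library. The names `fetch_status` and
`broken_links`, the User-Agent string, and the choice to count only 2xx
answers as healthy are assumptions of this example.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None on a network error."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return None


def broken_links(urls, fetch=fetch_status):
    """Return (url, status) pairs for links that did not answer with 2xx."""
    broken = []
    for url in urls:
        status = fetch(url)
        if status is None or not 200 <= status < 300:
            broken.append((url, status))
    return broken
```

The `fetch` argument is only there so that the loop can be tried without
network access. Some servers reject HEAD requests, so a real checker
would probably retry with GET before reporting a link as broken.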

Best regards,

Juan Manuel 



http: links in the manual

2022-08-20 Thread Max Nikulin

On 20/08/2022 12:51, Ihor Radchenko wrote:
> Note that we still have a number of http links in the manual. One may
> want to fix them.


One may get no response when trying to fix a link.

Max Nikulin to emacs-orgmode. [PATCH] org-manual.org: Update links to 
MathJax docs. Sun, 3 Oct 2021 23:17:46 +0700. 
https://list.orgmode.org/sjcl3b$gsr$1...@ciao.gmane.io


In the particular case of docs.mathjax.org, I am unsure whether the mild
preference for http: over https: is a mistake in the server
configuration. I do not mind "https:" there; either variant is better
than the old broken link.
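
To see how many http: links remain, something like the following Python
sketch could list them together with an https: candidate to try by hand.
The regexp is a deliberately simple assumption (it stops at whitespace,
brackets, quotes, and a closing parenthesis, and may keep a trailing
period), and the function name `http_links` is made up for this example.

```python
import re

# Plain http: URLs, up to whitespace or a character that usually ends a link.
HTTP_LINK = re.compile(r"http://[^\s\]\[<>\"')]+")


def http_links(text):
    """Return (line_number, url, https_candidate) for each http: link."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in HTTP_LINK.finditer(line):
            url = match.group(0)
            found.append((number, url, "https://" + url[len("http://"):]))
    return found
```

Whether the https: candidate actually works still has to be checked for
each site, as the docs.mathjax.org case shows.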