Bug#1003089: man-db has become prohibitively slow

2022-01-23 Thread Colin Watson
Control: tag -1 fixed-upstream
On Sun, Jan 23, 2022 at 06:11:19PM +0100, Steinar H. Gunderson wrote:
> On Sat, Jan 22, 2022 at 12:41:56AM +0000, Colin Watson wrote:
> >> Technically, UTF-8 validation can be done at a few gigabytes per second
> >> per core:
> >>
> >>
> >>

Bug#1003089: man-db has become prohibitively slow

2022-01-23 Thread Steinar H. Gunderson
On Sat, Jan 22, 2022 at 12:41:56AM +0000, Colin Watson wrote:
>> Technically, UTF-8 validation can be done at a few gigabytes per second
>> per core:
>>
>>
>> https://lemire.me/blog/2018/05/16/validating-utf-8-strings-using-as-little-as-0-7-cycles-per-byte/
>>
>> but that is probably

Bug#1003089: man-db has become prohibitively slow

2022-01-21 Thread Colin Watson
On Fri, Jan 21, 2022 at 11:38:56PM +0100, Steinar H. Gunderson wrote:
> On Fri, Jan 21, 2022 at 09:48:06PM +0000, Colin Watson wrote:
> > So the current behaviour isn't a bug as such, but there's definitely
> > room for optimization here: when operating in-process, and in the common
> > case where

Bug#1003089: man-db has become prohibitively slow

2022-01-21 Thread Steinar H. Gunderson
On Fri, Jan 21, 2022 at 09:48:06PM +0000, Colin Watson wrote:
> So the current behaviour isn't a bug as such, but there's definitely
> room for optimization here: when operating in-process, and in the common
> case where the target encoding is UTF-8, the UTF-8 to UTF-8 trial
> decoding path could

Bug#1003089: man-db has become prohibitively slow

2022-01-21 Thread Colin Watson
On Thu, Jan 20, 2022 at 08:26:40PM +0100, Steinar H. Gunderson wrote:
> On Mon, Jan 17, 2022 at 10:46:29PM +0000, Colin Watson wrote:
> > test_manfile (which despite the name is not a test function) calls
> > find_name with file!="-" and encoding=NULL; that causes find_name to
> > call

Bug#1003089: man-db has become prohibitively slow

2022-01-20 Thread Steinar H. Gunderson
On Mon, Jan 17, 2022 at 10:46:29PM +0000, Colin Watson wrote:
> test_manfile (which despite the name is not a test function) calls
> find_name with file!="-" and encoding=NULL; that causes find_name to
> call get_page_encoding, which always returns something non-NULL
> ("ISO-8859-1" for English

Bug#1003089: man-db has become prohibitively slow

2022-01-18 Thread Steinar H. Gunderson
On Mon, Jan 17, 2022 at 10:46:29PM +0000, Colin Watson wrote:
> Mild preference for GitLab MR comments, but I'm not that fussy.
I dropped some comments (I've never used the interface before, but it seemed to work OK). What I haven't done is to download the patch and test performance myself; I'll

Bug#1003089: man-db has become prohibitively slow

2022-01-17 Thread Colin Watson
On Mon, Jan 17, 2022 at 09:15:14PM +0100, Steinar H. Gunderson wrote:
> On Mon, Jan 17, 2022 at 04:10:02AM +0000, Colin Watson wrote:
> > We definitely do need to sort out encoding conversion, though. Although
> > UTF-8 has been recommended for many years, policy still allows "the
> > usual

Bug#1003089: man-db has become prohibitively slow

2022-01-17 Thread Steinar H. Gunderson
On Mon, Jan 17, 2022 at 04:10:02AM +0000, Colin Watson wrote:
> Significant progress! See the end of this email.
Thanks for dealing with this. I'm deleting most of your text, but be sure that I've read it :-) And I understand most of it (although I of course disagree at some points, I think

Bug#1003089: man-db has become prohibitively slow

2022-01-16 Thread Colin Watson
Significant progress! See the end of this email.
On Tue, Jan 04, 2022 at 10:21:57AM +0100, Steinar H. Gunderson wrote:
> On Mon, Jan 03, 2022 at 10:34:08PM +0000, Colin Watson wrote:
> > This part is already covered in https://bugs.debian.org/696503. I admit
> > that this was last updated ten

Bug#1003089: man-db has become prohibitively slow

2022-01-04 Thread Steinar H. Gunderson
On Tue, Jan 04, 2022 at 10:28:51PM +0100, Steinar H. Gunderson wrote:
> I made a straw man to test whether this was really true, and it turns it is.
> See the attached patch,
Now actually attached.
/* Steinar */
-- 
Homepage: https://www.sesse.net/
--- man-db-2.9.4.orig/lib/sandbox.c
+++

Bug#1003089: man-db has become prohibitively slow

2022-01-04 Thread Steinar H. Gunderson
On Tue, Jan 04, 2022 at 10:21:58AM +0100, Steinar H. Gunderson wrote:
> I took a look at mandb's profile in perf, and even after turning off
> libseccomp, it appears that perhaps 10–11% of its time is spent doing real
> work (decompression, lexer, malloc, character set conversion). The rest is
>

Bug#1003089: man-db has become prohibitively slow

2022-01-04 Thread Steinar H. Gunderson
On Mon, Jan 03, 2022 at 10:34:08PM +0000, Colin Watson wrote:
> This part is already covered in https://bugs.debian.org/696503. I admit
> that this was last updated ten years ago; I really need to get back to
> that and get it to work one way or another. :-/
Ah, well, that wasn't so easy to

Bug#1003089: man-db has become prohibitively slow

2022-01-03 Thread Colin Watson
On Mon, Jan 03, 2022 at 08:34:46PM +0100, Steinar H. Gunderson wrote:
> I've noticed that during upgrades to bullseye, the man-db trigger now seems to
> take a very long time; in fact, it frequently seems to be a significant time
> of
> the total time of the full-upgrade (depending, of course, on

Bug#1003089: man-db has become prohibitively slow

2022-01-03 Thread Steinar H. Gunderson
Package: man-db
Version: 2.9.4-2
Severity: normal
Tags: upstream
Hi Colin,
I've noticed that during upgrades to bullseye, the man-db trigger now seems to take a very long time; in fact, it frequently seems to be a significant time of the total time of the full-upgrade (depending, of course, on