[PATCH] extindex: support --no-multi-pack-index

2024-04-28 Thread Eric Wong
git multi-pack-index files were creating swap storms and OOM-ing on my system; so providing an option to disable it seems prudent given the minor startup time regression. --- Documentation/public-inbox-extindex.pod | 13 + Documentation/public-inbox-index.pod| 7 +++

[PATCH] t/imap_searchqp: hopefully fix test reliability

2024-04-28 Thread Eric Wong
Localizing assignments to *STDERR doesn't seem to always work with scalar (String) IO objects. Fortunately, doing actual dup2 redirects always seems reliable, so do that instead of attempting to understand why PerlIO sometimes fails with the assignment. --- t/imap_searchqp.t | 15 ++-
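
The descriptor-level redirect described above can be illustrated outside Perl. This is a minimal Python sketch, not the code from the patch: `capture_stderr` is a hypothetical helper name, and it assumes a POSIX-like system where fd 2 is stderr. It swaps the underlying descriptor with dup2 rather than reassigning a language-level handle.

```python
import os
import sys
import tempfile


def capture_stderr(func):
    """Run func() with fd 2 redirected to a temp file; return the bytes written.

    Hypothetical helper for illustration only.
    """
    sys.stderr.flush()
    saved_fd = os.dup(2)              # keep a handle on the real stderr
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 2)      # fd 2 now refers to the temp file
        try:
            func()
            sys.stderr.flush()
        finally:
            os.dup2(saved_fd, 2)      # restore the original stderr
            os.close(saved_fd)
        tmp.seek(0)                   # the file offset is shared with fd 2
        return tmp.read()
```

Because the redirect happens at the descriptor level, it also captures output from code that writes to fd 2 directly.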

ActivityPub vs email - cultural differences

2024-04-28 Thread Eric Wong
I've been looking into AP off and on over the last year. While the technical side is certainly feasible, I'm not sure how to deal with the cultural differences... * AP allows limited HTML, sometimes sourced from Markdown; but tags seem supported by all implementations. Even snac2 (a

Re: filtering stable patches in lore queries

2024-04-27 Thread Eric Wong
"Jason A. Donenfeld" wrote: > Hi, > > Greg and Sasha add the "X-stable: review" to their patch bombs, with the > intention that people will be able to filter these out should they > desire to do so. For example, I usually want all threads that match code > I care about, but I don't regularly

Re: [RFC] altid: start supporting indexfilter type (was: Alternate permalink URLs)

2024-04-27 Thread Eric Wong
Ping. Trying your gentoo address, since I'm wondering if your lack of response was due to an address I hadn't seen before. Any thoughts on this RFC for configuring additional header indices? I was just reminded of this by someone else...

[PATCH 2/4] search: remove auto-start for async_mset

2024-04-26 Thread Eric Wong
Only public-facing daemons use it, currently, and all public-facing daemons will pre-spawn it as early as feasible. lei will need it eventually to handle queries requiring C++, but I'm not certain what path to take with lei, yet... --- lib/PublicInbox/Search.pm | 1 - 1 file changed, 1

[PATCH 1/4] test_common: don't needlessly rebuild C++ Xapian helper

2024-04-26 Thread Eric Wong
We should almost always be calling `check_build' instead of `build'. Using ccache masked some of the overhead from this, but various linker implementations are still slow. --- lib/PublicInbox/TestCommon.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git

[PATCH 0/4] more xap_helper updates

2024-04-26 Thread Eric Wong
Eric Wong (4): test_common: don't needlessly rebuild C++ Xapian helper search: remove auto-start for async_mset xap_helper: reopen logs in daemons xap_helper: implement alarm(2)-based timeout lib/PublicInbox/Daemon.pm | 37 --- lib/PublicInbox/Search.pm | 1

[PATCH 3/4] xap_helper: reopen logs in daemons

2024-04-26 Thread Eric Wong
When read-only daemons reopen log files via SIGUSR1, be sure to propagate it to Xapian helper processes to ensure old log files can be closed and archived. --- lib/PublicInbox/Daemon.pm | 37 +--- lib/PublicInbox/TestCommon.pm | 11 +++ lib/PublicInbox/XapHelper.pm |

[PATCH 4/4] xap_helper: implement alarm(2)-based timeout

2024-04-26 Thread Eric Wong
alarm(2) delivering SIGALRM seems sufficient for Xapian since Xapian doesn't block signals (which would necessitate the use of SIGKILL via RLIMIT_CPU hard limit). When Xapian gets stuck in `D' state on slow storage, SIGKILL would not make a difference, either (at least not on Linux). Relying on
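
As a rough illustration of an alarm(2)-based timeout, here is a Python sketch. It is not the xap_helper implementation: `QueryTimeout` and `run_with_timeout` are hypothetical names, and raising an exception from the handler is an assumption about how the caller wants to react. It only works on Unix and in the main thread.

```python
import signal
import time


class QueryTimeout(Exception):
    """Raised when the guarded call runs past its time budget (hypothetical)."""


def _on_alarm(signum, frame):
    raise QueryTimeout("query exceeded its time budget")


def run_with_timeout(func, seconds):
    """Call func(), interrupting it with SIGALRM after `seconds` seconds."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)             # arm the timer
    try:
        return func()
    finally:
        signal.alarm(0)               # always disarm the timer
        signal.signal(signal.SIGALRM, old_handler)
```

As the message notes, this only helps when the signal can actually be delivered; a process stuck in uninterruptible `D' state will not see it.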

[PATCH 4/5] search: async_mset: pass resource errors to callback

2024-04-25 Thread Eric Wong
We need to be able to handle resource limitation errors in public-facing daemons. --- lib/PublicInbox/Search.pm | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm index 60d12dbf..b7732ae5 100644 ---

[PATCH 5/5] daemon: share and allow configuring Xapian helpers

2024-04-25 Thread Eric Wong
Xapian helper processes are disabled by default once again. However, they can be enabled via the new `-X INTEGER' parameter. One big positive is the Xapian helpers being spawned by the top-level daemon means they can be shared freely across all workers for improved load balancing and memory

[PATCH 2/5] www: mbox*: use Perl 5.12

2024-04-25 Thread Eric Wong
We were already silently relying on v5.10 features (`//') and all the regexps to work correctly with v5.12 unicode_strings. --- lib/PublicInbox/Mbox.pm | 2 +- lib/PublicInbox/MboxGz.pm | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/Mbox.pm

[PATCH 3/5] send_cmd4: make `tries' a per-call parameter

2024-04-25 Thread Eric Wong
While existing callers (lei, *-index, -watch) are private, we should not be blocking the event loop in public-facing servers when we hit ETOOMANYREFS, ENOMEM, or ENOBUFS. --- lib/PublicInbox/CmdIPC4.pm | 12 ++-- lib/PublicInbox/Spawn.pm | 12 +++-

[PATCH 0/5] xap_helper stuff for public daemons

2024-04-25 Thread Eric Wong
1 and 2 are trivial fixes. 3 and 4 make failures more graceful when dealing with resource exhaustion. 5 allows configuring the Xapian helper processes from the top-level daemon and reverts to disabling helpers by default. Eric Wong (5): t/cindex: require DBD::SQLite for now www: mbox*: use

[PATCH 1/5] t/cindex: require DBD::SQLite for now

2024-04-25 Thread Eric Wong
Technically it's not required, but -compact blindly requires DBD::SQLite at the moment since it was designed for inboxes in mind. Furthermore, cindex isn't useful at the moment without inboxes to associate with, and inboxes can't be indexed without SQLite. --- t/cindex.t | 2 +- 1 file changed,

BSD make problem [was: [PATCH 5/5] www: wire up search ...]

2024-04-25 Thread Eric Wong
Eric Wong wrote: > diff --git a/MANIFEST b/MANIFEST > index 4c974338..fb175e5f 100644 > --- a/MANIFEST > +++ b/MANIFEST > @@ -382,6 +382,8 @@ lib/PublicInbox/XapClient.pm > lib/PublicInbox/XapHelper.pm > lib/PublicInbox/XapHelperCxx.pm > lib/PublicInbox/Xapcmd.pm >

[PATCH 6/5] xap_helper: PERL_INLINE_DIRECTORY fallback for JAOT build

2024-04-24 Thread Eric Wong
systemd setups may use role accounts (e.g. `news') with XDG_CACHE_HOME unset and a non-existent HOME directory which the user has no permission to create. In those cases, fallback to using PERL_INLINE_DIRECTORY if available for building the just-ahead-of-time C++ binary. ---

[PATCH 4/5] mbox: hoist out refill_result_ids

2024-04-24 Thread Eric Wong
This makes upcoming changes easier to understand. --- lib/PublicInbox/Mbox.pm | 32 ++-- 1 file changed, 14 insertions(+), 18 deletions(-) diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm index 52f88ae3..ac565df9 100644 --- a/lib/PublicInbox/Mbox.pm +++

[PATCH 5/5] www: wire up search to use async xap_helper

2024-04-24 Thread Eric Wong
The C++ version of xap_helper will allow more complex and expensive queries. Both the Perl and C++-only version will allow offloading search into a separate process which can be killed via ITIMER_REAL or RLIMIT_CPU in the face of overload. The xap_helper `mset' command wrapper is simplified to

[PATCH 3/5] xap_helper: drop terms+data from `mset' command

2024-04-24 Thread Eric Wong
Retrieving Xapian document terms, data (and possibly values) and transferring to the Perl side would be an increase in complexity and I/O on both the Perl and C++ sides. It would require more I/O in C++ and transient memory use on the Perl side where slow mset iteration gives an opportunity to

[PATCH 1/5] searchview: get rid of unused adump callback arg

2024-04-24 Thread Eric Wong
It hasn't been used since 2016 when we started working on improved streamability of gigantic responses. Fixes: 95d4bf7aded4 (atom: switch to getline/close for response bodies, 2016-12-03) --- lib/PublicInbox/SearchView.pm | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git

[PATCH 2/5] xap_helper.h: remove _SC_NPROCESSORS_ONLN default

2024-04-24 Thread Eric Wong
It's never straightforward to pick an ideal number of processes for anything and Xapian helper processes are no exception since there may be massive disparities in CPU count and I/O performance. So default to a single worker for now in the C++ version since that's the default for the

[PATCH 0/5] www: start using xap_helper process

2024-04-24 Thread Eric Wong
Putting Xapian work into subprocesses will allow us to implement proper timeouts and ultra-expensive queries without harming unrelated queries. It's on by default right now, but I think it's better to keep it off by default to avoid tripping up existing process monitoring tools. Eric Wong (5

[PATCH] doc: strongly recommend MALLOC_MMAP_THRESHOLD_=131072 for glibc

2024-04-18 Thread Eric Wong
The 131072 byte lower bound was the old default before the sliding mmap window was introduced in modern glibc malloc. While the sliding mmap window was intended to be faster by reducing syscalls, zeroing and kernel overhead, it is also prone to fragmentation from allocation patterns seen in

Re: Sharing lei searches

2024-04-18 Thread Eric Wong
Gonsolo wrote: > Hi! > > Is there an easy way to share a lei configuration for a different computer? > Right now I'm relying on the following (clumsy) workflow: > > 1. lei edit-search on computer A, copy and mail to myself > 2. Dummy lei q on computer B. > 3. lei edit-search on computer B, use

sub prototypes aren't enough...

2024-04-17 Thread Eric Wong
Eric Wong wrote: > v2 fixes an incorrect call to add_uniq_timer. Sometimes I wish Perl > could have more static type||arg checking, but it's probably still > better than other scripting languages... Fwiw, this would work for all current callers, AFAIK: diff --git a/lib/PublicIn

[PATCH v2 3/4] lei/store: stop shard workers + cat-file on idle

2024-04-17 Thread Eric Wong
Schedule a timer to stop shard workers and the git-cat-file process after a `barrier' command. This allows us to save some memory again when the lei-daemon is idle but preserves the fork overhead reduction when issuing many commands in parallel or in quick succession. --- v2 fixes an incorrect

[PATCH 2/4] lei: use ->barrier to commit to lei/store

2024-04-16 Thread Eric Wong
barrier (synchronous checkpoint) is better than ->done with parallel lei commands being issued (via '&' or different terminals), since repeatedly stopping and restarting processes doesn't play nicely with expensive tasks like `lei reindex'. This introduces a slight regression in maintaining more

[PATCH 3/4] lei/store: stop shard workers + cat-file on idle

2024-04-16 Thread Eric Wong
Schedule a timer to stop shard workers and the git-cat-file process after a `barrier' command. This allows us to save some memory again when the lei-daemon is idle but preserves the fork overhead reduction when issuing many commands in parallel or in quick succession. ---

[PATCH 1/4] v2 + lei/store: always wait for fast-import checkpoint

2024-04-16 Thread Eric Wong
Since data going to git is the most important, always ensure data is written to git before attempting to write anything to SQLite or Xapian. --- lib/PublicInbox/LeiStore.pm | 4 +--- lib/PublicInbox/V2Writable.pm | 8 +--- 2 files changed, 2 insertions(+), 10 deletions(-) diff --git

[PATCH 0/4] lei parallelism fixes

2024-04-16 Thread Eric Wong
This series allows `lei reindex' to run in parallel with other lei commands which write to lei/store. Eric Wong (4): v2 + lei/store: always wait for fast-import checkpoint lei: use ->barrier to commit to lei/store lei/store: stop shard workers + cat-file on idle lei: use async barr

[PATCH 4/4] lei: use async barrier for --import-before

2024-04-16 Thread Eric Wong
Write barriers can take a long time to finish, especially when commands are issued in parallel. So handle it asynchronously without blocking lei-daemon by making EOFpipe a little more flexible by supporting arguments to the callback function. This is another step towards improving parallel use

[PATCH] doc: note MALLOC_MMAP_THRESHOLD_ as a potential workaround

2024-04-15 Thread Eric Wong
Large string processing + concurrency + caching/memoization really brings out the worst in glibc malloc :< --- Documentation/public-inbox-tuning.pod | 5 - examples/public-inbox-netd@.service | 5 - 2 files changed, 8 insertions(+), 2 deletions(-) diff --git

Re: downloading t.mbox.gz messages are not sorted in expected order

2024-04-13 Thread Eric Wong
Jacob Keller wrote: > > > On 4/11/2024 3:42 PM, Konstantin Ryabitsev wrote: > > On Thu, Apr 11, 2024 at 03:32:43PM -0700, Jacob Keller wrote: > >> I sometimes download patch series off of public inbox hosted servers to > >> apply with git-am. Occasionally I have found that these do not apply >

[PATCH 2/3] io: avoid redundant waitpid in DESTROY

2024-04-12 Thread Eric Wong
We shouldn't attempt to reap a process again after it's been reaped asynchronously in the SIGCHLD handler. Noticed while working on changes to get lei/store to use checkpointing. --- lib/PublicInbox/IO.pm | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git

[PATCH 0/3] some lei fixes

2024-04-12 Thread Eric Wong
Some trivial fixes I noticed while (still) working on getting lei to use checkpoint to improve parallelism. Eric Wong (3): lei_remote: solver supports uncommitted blobs io: avoid redundant waitpid in DESTROY lei: remove leftover debugging message lib/PublicInbox/IO.pm| 10

[PATCH 1/3] lei_remote: solver supports uncommitted blobs

2024-04-12 Thread Eric Wong
This should improve `lei blob' and `lei rediff' functionality for folks relying on `lei index' and allows future work to improve parallelism via checkpointing in lei/store. --- lib/PublicInbox/LeiRemote.pm | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git

[PATCH 3/3] lei: remove leftover debugging message

2024-04-12 Thread Eric Wong
Noticed while working on other things... Fixes: 299aac294ec3 (lei: do label/keyword parsing in optparse, 2023-10-02) --- lib/PublicInbox/LEI.pm | 2 -- 1 file changed, 2 deletions(-) diff --git a/lib/PublicInbox/LEI.pm b/lib/PublicInbox/LEI.pm index 06592358..5b46686a 100644 ---

Re: [PATCH] lei q: support --thread-id=$MSGID || -T $MSGID

2024-04-12 Thread Eric Wong
Štěpán Němec wrote: > Eric Wong wrote: > > + is $lei_out, '', 'no results on unlrelated thread'; > ^ > s/unlrelated/unrelated/ Thanks, squashed: diff --git a/t/psgi_v2.t b/t/psgi_v2.t index 56a6ae8e..d5c328f0 100644 --

[PATCH] doc: mknews: fix warnings when generating NEWS.html

2024-04-12 Thread Eric Wong
We need these values in the PSGI $env to generate the cache key, even if we're not linkifying anything. Fixes: 48cbe0c3 (www: linkify inbox addresses in To/Cc headers, 2024-01-09) --- Documentation/mknews.perl | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/mknews.perl

Re: lei-up doesn't output replies to matching thread

2024-04-11 Thread Eric Wong
Josh Steadmon wrote: > Hello again, > > I'm having trouble where `lei up --all` is not outputting threaded > replies, despite the fact that the saved search requests them. I noticed > the problem on this Git thread: I think this needs the notmuch-style subquery support I mentioned here:

[PATCH] lei q: support --thread-id=$MSGID || -T $MSGID

2024-04-11 Thread Eric Wong
This adds support for the "POST /$INBOX/$MSGID/?x=m?q=..." added last year to support per-thread searches 764035c83 (www: support POST /$INBOX/$MSGID/?x=m=, 2023-03-30) This only supports instances of public-inbox since 764035c83, but unfortunately there hasn't been a release since then. ---

[PATCH] lei blob: fix attachment extraction for unimported||inflight

2024-04-11 Thread Eric Wong
Noticed while trying to make other reliability improvements to lei... --- lib/PublicInbox/LeiBlob.pm | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/lib/PublicInbox/LeiBlob.pm b/lib/PublicInbox/LeiBlob.pm index 127cc81e..00697097 100644 ---

Re: sample robots.txt to reduce WWW load

2024-04-08 Thread Eric Wong
Eric Wong wrote: > I've unleashed the bots again and let them run rampant on the > https://80x24.org/lore/ HTML pages. Will need to add malloc > tracing on my own to generate reproducible results to prove it's > worth adding to glibc malloc... Unfortunately, mwrap w/ tracing is expe

Re: [RFT] syscall: set default constants for Inline::C platforms

2024-04-08 Thread Eric Wong
Gaelan Steele wrote: > > > On Apr 8, 2024, at 10:48 AM, Eric Wong wrote: > > > >> I’m not enough of a Perl person to fully untangle this. As > >> best I can tell, the intent is that non-Linux/BSD OSes should > >> still work with Inline::C, but this

[PATCH] www: speed up global manifest.js.gz w/ "all" extindex

2024-04-08 Thread Eric Wong
By reducing internal event loop iterations, this brings 300+ inboxes down from ~32ms to ~27ms. It should also be more consistent on servers with busy event loops since all the Xapian DB traffic happens at once, theoretically improving cache utilization. --- lib/PublicInbox/ConfigIter.pm | 7 +++

[RFT] syscall: set default constants for Inline::C platforms

2024-04-08 Thread Eric Wong
Gaelan Steele wrote: > Unfortunately this patch broke public-inbox on Darwin: > > Bareword "SIZEOF_cmsghdr" not allowed while "strict subs" in use at > /tmp/public-inbox/lib/PublicInbox/Syscall.pm line 456. > BEGIN not safe after errors--compilation aborted at >

public-inbox.org/git/*/s/ endpoints struggling

2024-04-07 Thread Eric Wong
The blob reconstruction endpoints (aka the WWW version of `lei blob') have been hammered by bots while I'm gathering data to reproduce (and hopefully fix) fragmentation problems in dlmalloc-based allocators. The data gathering is silly expensive all around and I'm on old and slow hardware and

Re: v1.9.0 : `ls-search' is not an lei command

2024-04-04 Thread Eric Wong
Josh Steadmon wrote: > Hello all, > > I recently had to reinstall my work machine, and after doing so I now > get an error when running `lei ls-search`: > > `ls-search' is not an lei command > > This happens with both version 1.9.0-1+build1 of the Debian "lei" > package, and with version 1.9.0

Re: sample robots.txt to reduce WWW load

2024-04-03 Thread Eric Wong
Konstantin Ryabitsev wrote: > On Mon, Apr 01, 2024 at 01:21:45PM +0000, Eric Wong wrote: > > Performance is still slow, and crawler traffic patterns tend to > > do bad things with caches at all levels, so I've regretfully had > > to experiment with robots.txt to mitigate

sample robots.txt to reduce WWW load

2024-04-01 Thread Eric Wong
Performance is still slow, and crawler traffic patterns tend to do bad things with caches at all levels, so I've regretfully had to experiment with robots.txt to mitigate performance problems. The /s/ solver endpoint remains expensive but commit 8d6a50ff2a44 (www: use a dedicated limiter for blob

[PATCH 2/3] treewide: avoid getpid() for OnDestroy checks

2024-04-01 Thread Eric Wong
getpid() isn't cached by glibc nowadays and system calls are more expensive due to CPU vulnerability mitigations. To ensure we switch to the new semantics properly, introduce a new `on_destroy' function to simplify callers. Furthermore, most OnDestroy correctness is often tied to the process
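
One way to guard destructor callbacks against running in forked children without calling getpid() each time is a fork-generation counter. The Python sketch below is an assumption about the general technique, not the patch's code: `_fork_gen`, `_bump_fork_gen` and this `OnDestroy` class are hypothetical names. It relies on `os.register_at_fork` (Unix, Python 3.7+).

```python
import os

# Bumped once in every forked child; the parent's value never changes.
_fork_gen = 0


def _bump_fork_gen():
    global _fork_gen
    _fork_gen += 1


os.register_at_fork(after_in_child=_bump_fork_gen)


class OnDestroy:
    """Run callback when the object is destroyed, but only in the process
    that created it (hypothetical illustration)."""

    def __init__(self, callback):
        self._callback = callback
        self._gen = _fork_gen         # remember which "generation" owns us

    def __del__(self):
        # No getpid() here: comparing two integers is enough.
        if self._gen == _fork_gen:
            self._callback()
```

A child that inherits the object sees a different generation and skips the callback, while the creating process still runs it.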

[PATCH 1/3] lock: get rid of PID guard

2024-04-01 Thread Eric Wong
PID guards for OnDestroy will be the default in an upcoming change. In the meantime, LeiMirror was the only user and didn't actually need it. --- lib/PublicInbox/LeiMirror.pm | 2 +- lib/PublicInbox/Lock.pm | 8 2 files changed, 5 insertions(+), 5 deletions(-) diff --git

[PATCH 0/3] treewide: getpid() syscall reduction

2024-04-01 Thread Eric Wong
problem a while ago. Eric Wong (3): lock: get rid of PID guard treewide: avoid getpid() for OnDestroy checks treewide: avoid getpid for more ownership checks lib/PublicInbox/CodeSearchIdx.pm | 29 +++ lib/PublicInbox/DS.pm | 9 lib/PublicInbox

[PATCH 3/3] treewide: avoid getpid for more ownership checks

2024-04-01 Thread Eric Wong
There are still some places where on_destroy isn't suitable. This gets rid of getpid() calls in most of those cases to reduce syscall costs and clean up syscall trace output. --- lib/PublicInbox/DSKQXS.pm | 7 --- lib/PublicInbox/Daemon.pm | 26 +-

Re: How to initialize Git repos

2024-03-29 Thread Eric Wong
Felix Lechner wrote: > Hi, > > When I edit the config file manually, i.e. without public-inbox-init, You can still use -init, it should be idempotent > I see errors until messages are added, such as: > > ($INBOX_DIR/description missing) for v1, it's the same as $GIT_DIR/description

Re: [PATCH (RFC) 2/2] INSTALL: try to be less confusing about optional modules

2024-03-18 Thread Eric Wong
Štěpán Němec wrote: > On Sat, 16 Mar 2024 21:27:56 + > Eric Wong wrote: > > > Štěpán Němec wrote: > >> The difference between the "numerous optional modules" > >> section (containing only two modules) and the "everything > >> else o

Re: [PATCH 1/2] Fix some typos and language nits in docs and comments

2024-03-16 Thread Eric Wong
thanks, pushed

Re: [PATCH (RFC) 2/2] INSTALL: try to be less confusing about optional modules

2024-03-16 Thread Eric Wong
Štěpán Němec wrote: > The difference between the "numerous optional modules" > section (containing only two modules) and the "everything > else optional" section was unclear (to me, at least). > Just put both under a single heading. > +++ b/INSTALL > @@ -58,7 +58,8 @@ Where "deb" indicates

Re: [RFC] daemon: experimental PREALLOC_NR env knob

2024-03-14 Thread Eric Wong
Abandoning this idea for now... jemalloc via LD_PRELOAD seems to be working out to deal with fragmentation. I hope to be able to steal jemalloc ideas for glibc malloc to give a better out-of-the-box experience on GNU/Linux systems.

Re: Lei exception

2024-03-14 Thread Eric Wong
please keep meta@public-inbox.org in the Cc: Gonsolo wrote: > > Which version of public-inbox is this? It looks like 1.9 based > > on the line number[1]. > > Package from Ubuntu/Debian. > > lei 1.9.0-1 (Ubuntu) > 936c275178dfc6908577487ce97d3a83c58c5449 PublicInbox/LeiSearch.pm OK. > >

[PATCH] doc: update release notes, marketing, and install

2024-03-13 Thread Eric Wong
INSTALL now covers more of lei since I'm less uncomfortable about it for 2.0 and points users towards the install/ helpers if installing from source. --- Documentation/RelNotes/v2.0.0.wip | 46 -- Documentation/marketing.txt | 13 +- INSTALL |

Re: Lei exception

2024-03-13 Thread Eric Wong
Gonsolo wrote: > Hi! > > Since a few days I'm getting the following error when running "lei up --all": > > 10770 lei2mail 1 (nshard=3) 8b214638a3a05e3d9f2345a632a5eed0de7f9ab6: > Exception: Document 22720 not found at > /usr/share/perl5/PublicInbox/LeiSearch.pm line 68. > > Is there anything I

Re: mda tool not delivering

2024-03-12 Thread Eric Wong
Felix Lechner wrote: > Hi, > > What's a good place to ask questions about running public-inbox-mda, > please? This is the only place :> > I can't get public-inbox-mda to import on-disk messages. They end up in > the emergency Maildir. My configuration and a sample message are below. >

[PATCH 4/4] codesearch: deduplicate $git->{nick} field

2024-03-11 Thread Eric Wong
While PublicInbox::Config is responsible for some instances of setting $git->{nick}, more PublicInbox::Git objects may be created from loading the cindex and we should do our best to reuse that memory, too. Followup-to: 84ed7ec1c887 (dedupe inbox names, coderepo nicks + git dirs, 2024-03-04) ---

[PATCH 0/4] memory reductions for WWW + solver

2024-03-11 Thread Eric Wong
nd 2/4 with Devel::Mwrap and noticed 4/4 while working on 2/4. 3/4 is just a doc update but I've been successfully using jemalloc on my lore+gko mirror for a week or two, now (and I plan to experiment with making glibc||dlmalloc more resistant to fragmentation) Eric Wong (4): www: use a dedicated l

[PATCH 2/4] codesearch: deduplicate {ibx_score} name pairs

2024-03-11 Thread Eric Wong
With my current mirror of lore + gko, this saves over 300K allocations and brings the allocation count in this area down to under 5K. The reduction in AV refs saves around 45MB RAM according to measurements done live via Devel::Mwrap. --- lib/PublicInbox/CodeSearch.pm | 11 +-- 1 file

[PATCH 3/4] doc: tuning: note reduced fragmentation w/ jemalloc

2024-03-11 Thread Eric Wong
I may be mistaken, but I suspect the reason jemalloc handles long-lived processes better than glibc is due to granularity reduction being scaled to larger size classes. This can waste 20% of an individual allocation, but increases the likelihood of reuse (without splitting/consolidating into

[PATCH 1/4] www: use a dedicated limiter for blob solver

2024-03-11 Thread Eric Wong
Wrap the entire solver command chain with a dedicated limiter. The normal limiter is designed for longer-lived commands or ones which serve a single HTTP request (e.g. git-http-backend or cgit) and not effective for short memory + CPU intensive commands used for solver. Each overall solver

[PATCH] listener: don't loop on errors

2024-03-11 Thread Eric Wong
Fortunately, this only affects `--multi-accept=' users, with `--multi-accept=-1' users getting infinite loops. I noticed this when EMFILE was reached on my setup, but any error should cause us to give up accept(2) (at least temporarily) and allow work for other items in the event loop to be

[PATCH] import: fix handling of init.defaultBranch

2024-03-10 Thread Eric Wong
Eric Wong wrote: > Will try to fix ASAP but offline problems persist :< Fortunately it turned out to be a simple fix :x ---8<--- Subject: [PATCH] import: fix handling of init.defaultBranch We must chomp the newline in the branch name if it's set. Reported-by: Rob Herring Lin
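
The fix amounts to stripping the trailing newline from the configured branch name before using it. A minimal Python sketch of that idea follows; `parse_default_branch` is a hypothetical helper, and both the assumption that the raw value comes from command output ending in a newline and the "master" fallback are illustrative.

```python
def parse_default_branch(raw, fallback="master"):
    """Return the branch name from raw config output, without its newline.

    Falls back to `fallback` when the setting is unset (empty output).
    """
    branch = raw.rstrip("\n")         # the equivalent of Perl's chomp
    return branch if branch else fallback
```

Without the strip, a name such as "main\n" would end up embedded in a ref name like refs/heads/main followed by a stray newline.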

Re: crash on git-fast-import

2024-03-10 Thread Eric Wong
Rob Herring wrote: > Most Recent Commands Before Crash > - > blob > mark :1 > data 12367 > get-mark :1 > reset refs/heads/main Ah, it looks like init.defaultBranch != master is completely broken :x You should be able to workaround the problem by

[PATCH 2/2] import: croak (instead of die) on write failures

2024-03-08 Thread Eric Wong
This allows accurate reporting of the error location and can be made to dump a Perl backtrace via PERL5OPT='-MCarp=verbose'. Noticed while tracking down fast-import failures. Link: https://public-inbox.org/meta/cal_jsqk7p4gjlpyvzxnecymxt4j6ah5f3pz1rqdhxmystg3...@mail.gmail.com/ ---

[PATCH 1/2] lei: prevent empty {bytes} field in saved search

2024-03-08 Thread Eric Wong
Noticed while tracking down fast-import crash bug report. Link: https://public-inbox.org/meta/cal_jsqk7p4gjlpyvzxnecymxt4j6ah5f3pz1rqdhxmystg3...@mail.gmail.com/ --- lib/PublicInbox/LeiSearch.pm | 2 ++ lib/PublicInbox/LeiToMail.pm | 1 + lib/PublicInbox/OverIdx.pm | 6 +- 3 files

[PATCH 0/2] fixes noticed while tracking down fast-import failures

2024-03-08 Thread Eric Wong
I'm still trying to track down the cause of fast-import failures[1], but this series presents some useful fixes regardless. Link: https://public-inbox.org/meta/cal_jsqk7p4gjlpyvzxnecymxt4j6ah5f3pz1rqdhxmystg3...@mail.gmail.com/ Eric Wong (2): lei: prevent empty {bytes} field in saved search

Re: crash on git-fast-import

2024-03-07 Thread Eric Wong
Rob Herring wrote: > fatal: Expected committer but didn't get one > fast-import: dumping crash report to > /home/rob/.local/share/lei/store/local/0.git/fast_import_crash_99633 Are you able to share the contents of that crash file? > E: import done: write to fast-import failed: Illegal seek at >

[PATCH] dedupe inbox names, coderepo nicks + git dirs

2024-03-04 Thread Eric Wong
Inbox names, coderepo nicks, git_dir values are used heavily as hash keys by the read-only coderepo WWW pieces. Relying on CoW for mutable scalars on newer Perl doesn't work well since CoW for those scalars are limited to 256 CoW references and blow past that number when mapping thousands of

[RFC] daemon: experimental PREALLOC_NR env knob

2024-02-24 Thread Eric Wong
As I've observed using the mwrap-perl LD_PRELOAD wrapper, permanent 4080-byte arenas allocated by Perl late in the process will impede consolidation of freed adjacent blocks. In long-lived processes, this fragmentation from immortal arenas near the "wilderness"[1] area can force excessive memory

[PATCH 1/2] eml: avoid anonymous __WARN__ sub for encode/decode

2024-02-13 Thread Eric Wong
Repeatedly allocating an anonymous sub is an expensive operation and a potential source of leaks in older Perl. Instead, `local'-ize a global and use a permanent sub to workaround the old Encode 2.87..3.12 leak. --- lib/PublicInbox/Eml.pm | 18 +++--- 1 file changed, 11

[PATCH 2/2] eml: reuse ->decode buffer

2024-02-13 Thread Eric Wong
It's not really relevant at the moment, but a sufficiently smart implementation could eventually save some memory here. Perl already optimizes in-place sort (@x = sort @x), so there's precedent for a potential future where a Perl implementation could generally optimize in-place operations for

[PATCH 0/2] eml: allocation reductions

2024-02-13 Thread Eric Wong
1/2 is obvious, 2/2 is aspirational dream territory... (been dreaming up a faster, alternative run-time for Perl :P) Eric Wong (2): eml: avoid anonymous __WARN__ sub for encode/decode eml: reuse ->decode buffer lib/PublicInbox/Eml.pm | 22 +- 1 file changed,

[PATCH 2/3] doc: config: cgit=rewrite isn't implemented, yet

2024-02-13 Thread Eric Wong
It'll probably be done for another release, I doubt most cgit users are willing to completely replace it with our coderepo viewer just yet... --- Documentation/public-inbox-config.pod | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/public-inbox-config.pod

[PATCH 0/3] www cgit support updates

2024-02-13 Thread Eric Wong
I only noticed the fix for 1/3 because I wasn't using a custom-patched+built cgit for the first time, ever; but our cgit support was unusable for non-default config files without it. I only noticed 3/3 while checking the result of 2/3. Eric Wong (3): www: cgit: support non-standard cgitrc

[PATCH 1/3] www: cgit: support non-standard cgitrc locations

2024-02-13 Thread Eric Wong
If publicinbox.cgitrc is set in the config file, we'll ensure cgit sees it as CGIT_CONFIG since the configured publicinbox.cgitrc knob may not be the default path the cgit.cgi binary was configured to use. Furthermore, we'll respect CGIT_CONFIG in the environment if publicinbox.cgitrc is unset in

[PATCH 3/3] doc: fix formatting for CLI switch aliases

2024-02-13 Thread Eric Wong
`=item' elements in Pod need to be surrounded by empty lines. It's an unfortunate waste of vertical space, but Pod is still better than *roff and usually available out-of-the-box. --- Documentation/lei-q.pod | 2 ++ Documentation/public-inbox-clone.pod | 1 +

[PATCH 1/2] xap_helper_cxx: -O2 optimize read-only files by default

2024-02-12 Thread Eric Wong
While fast build times from -O0 are critical to my sanity when actively working on C++, the files installed via package managers or `make install' aren't likely to change frequently. In that case, expensive -O2 optimizations make sense since the 10-20s saved from a single large --join more than

[PATCH 0/2] xap_helper C++ fixes

2024-02-12 Thread Eric Wong
I suppose -O2 build times isn't the worst thing for release users even though I can't stand it while hacking... Eric Wong (2): xap_helper_cxx: -O2 optimize read-only files by default codesearch: generate_cxx: drop unused variables lib/PublicInbox/CodeSearch.pm | 1 - lib/PublicInbox

[PATCH 2/2] codesearch: generate_cxx: drop unused variables

2024-02-12 Thread Eric Wong
We are just using the odd ref+deref (`${\...}') syntax and don't need to calculate line numbers ourselves, nowadays. --- lib/PublicInbox/CodeSearch.pm | 1 - 1 file changed, 1 deletion(-) diff --git a/lib/PublicInbox/CodeSearch.pm b/lib/PublicInbox/CodeSearch.pm index 48697cdc..1f95a726 100644

Re: [PATCH 1/2] viewvcs: parallelize commit display

2024-02-12 Thread Eric Wong
Eric Wong wrote: > times down to ~6s (still slow). Avoiding patch generation for > root commits (like cgit also does) brings it down to a few > hundred milliseconds on a public-facing server. Actually, the "(like cgit also does)" bit is wrong: cgit generates a full patch,

[PATCH 0/2] viewvcs improvements

2024-02-12 Thread Eric Wong
Major speedups for root commits, and a smaller speedup for giant non-root commits too large to show. 2/2 fixes some long-standing HTML generation bugs :x Eric Wong (2): viewvcs: parallelize commit display viewvcs: HTML fixes for commits lib/PublicInbox/ViewVCS.pm | 104

[PATCH 2/2] viewvcs: HTML fixes for commits

2024-02-12 Thread Eric Wong
The "patch is too large to show" text is now broken by an to prevent it from being confused as part of a commit message (or having somebody intentionally insert that text in a commit message to confuse readers). A missing is also necessary before the tag for the related commit search form. ---

[PATCH 1/2] viewvcs: parallelize commit display

2024-02-12 Thread Eric Wong
Similar to commit cbe2548c91859dfb923548ea85d8531b90d53dc3 (www_coderepo: use OnDestroy to render summary view, 2023-04-09), we can rely on OnDestroy and Qspawn to run dependencies in a structured way and with some extra parallelism for SMP users. Perl (as opposed to POSIX sh) allows us to easily

[PATCH] www: quiet errors for git-{archive,http-backend}

2024-02-09 Thread Eric Wong
SIGPIPE (13) can be quite common with unreliable connections and impatient clients, so just ignore them. --- lib/PublicInbox/GitHTTPBackend.pm | 1 + lib/PublicInbox/RepoSnapshot.pm | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/lib/PublicInbox/GitHTTPBackend.pm
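
A sketch of the idea in Python: decide whether a child's exit is worth logging, treating death by SIGPIPE as routine. `should_log_failure` is a hypothetical name, and it assumes the `subprocess` convention where a negative return code means the child was killed by that signal number.

```python
import signal


def should_log_failure(returncode):
    """Return True if a child's exit status deserves an error log entry."""
    if returncode == 0:
        return False                  # clean exit
    if returncode == -signal.SIGPIPE:
        return False                  # client went away mid-stream; routine
    return True
```

Other signals and non-zero exit codes still get logged, so real failures stay visible.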

Re: lei up can't fetch new thread messages when searching by mid

2024-02-09 Thread Eric Wong
Pratyush Yadav wrote: yeah, known problem, whole thread from last year... https://public-inbox.org/meta/20230330112951.M493025@dcvr/T/ > Is there a way to properly subscribe to an email thread? I suppose I can > do query the subject instead but that can also run into some corner >

[PATCH v2] view: decode In-Reply-To comments added by some MUAs

2024-02-09 Thread Eric Wong
Štěpán Němec wrote: > Eric Wong wrote: > > Subject: [PATCH] view: decode In-Reply-To comments added by Gnus > Or just "some MUAs"? Who knows who else... Yeah, I wouldn't be surprised if there were more... ---8<--- Subject: [PATCH] view: decode In-Reply-To comments

[PATCH] view: decode In-Reply-To comments added by Gnus

2024-02-07 Thread Eric Wong
I noticed it while scanning for something else. --- lib/PublicInbox/View.pm | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm index 697535ff..2aee5f05 100644 ---

[PATCH v2] daemon: quiet Email::Address::XS warnings properly

2024-02-07 Thread Eric Wong
Eric Wong wrote: > index a2c1ed6e..7aee5c72 100644 > --- a/lib/PublicInbox/Daemon.pm > +++ b/lib/PublicInbox/Daemon.pm > @@ -698,7 +701,6 @@ sub run { > # localize GCF2C for tests: > local $PublicInbox::GitAsyncCat::GCF2C; > local $PublicInbox::Git::async_wa

[PATCH] daemon: quiet Email::Address::XS warnings properly

2024-02-02 Thread Eric Wong
Setting $SIG{__WARN__} at the top-level no longer has any effect since we localize $SIG{__WARN__} when entering ->event_step on a per-listener basis. Fixes: 60d262483a4d (daemon: use per-listener SIG{__WARN__} callbacks, 2022-08-08) --- lib/PublicInbox/Daemon.pm | 6 -- 1 file changed, 4

[PATCH v2] pop3d: support fcntl locks on OpenBSD i386

2024-02-01 Thread Eric Wong
Štěpán Němec wrote: > Eric Wong wrote: > > > AFAIK File::FcntlLock isn't packaged for OpenBSD, > > https://openports.pl/path/devel/p5-File-FcntlLock > https://ftp.openbsd.org/pub/OpenBSD/snapshots/packages/amd64/p5-File-FcntlLock-0.22.tgz > > (apparently only adde
