[PATCH 4/4] ensure Xapian and SQLite are still optional for v1 tests

2018-04-06 Eric Wong (Contractor, The Linux Foundation)
Xapian is size-intensive and SQLite is not strictly necessary for v1.
---
 script/public-inbox-compact   | 2 +-
 scripts/import_vger_from_mbox | 2 +-
 t/convert-compact.t           | 2 +-
 t/v2mirror.t                  | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git

[PATCH 2/4] nntp: set Xref across multiple inboxes

2018-04-06 Eric Wong (Contractor, The Linux Foundation)
Noted by Jonathan Corbet in https://lwn.net/Articles/748184/
---
 lib/PublicInbox/NNTP.pm | 43 ---
 t/nntp.t                |  6 --
 2 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index
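For readers unfamiliar with the header named in the subject: RFC 3977 describes Xref as a server name followed by one `newsgroup:article-number` location per group that carries the article. The sketch below only illustrates that wire format. It is not the code from this patch, and the function name, server name, and group names are hypothetical.

```python
def format_xref(server, locations):
    """Build an Xref header line in the RFC 3977 layout.

    `locations` is an iterable of (newsgroup, article_number) pairs,
    one per group in which the same message appears.
    Hypothetical helper for illustration only.
    """
    parts = [f"{group}:{num}" for group, num in locations]
    return "Xref: " + " ".join([server] + parts)


if __name__ == "__main__":
    # A message present in two (made-up) inboxes under different numbers.
    print(format_xref("news.example.org", [("inbox.a", 5), ("inbox.b", 12)]))
```

A message stored in several inboxes thus gets a single header listing every group and its per-group article number.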

[PATCH 8/8] msgmap: speed up minmax with separate queries

2018-04-06 Eric Wong (Contractor, The Linux Foundation)
This significantly improves the performance of the NNTP GROUP command with 2.7 million messages from over 250ms to 700us. SQLite is weird about this, but at least there's a way to optimize it.
---
 lib/PublicInbox/Msgmap.pm | 10 +++---
 t/perf-nntpd.t            | 13 +
 2 files
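The "separate queries" idea in the subject line can be sketched as follows. SQLite documents a MIN/MAX optimization for queries containing a single `min()` or `max()` aggregate over an indexed column; a query asking for both at once is generally not covered by it and may scan the table. The sketch uses Python's `sqlite3` module rather than the project's Perl. The table name `msgmap`, the columns `num` and `mid`, and the helper names are assumptions for illustration, not taken from the patch.

```python
import sqlite3


def build_demo(n=1000):
    """Create an in-memory table shaped like a message-number map (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE msgmap ("
        "num INTEGER PRIMARY KEY AUTOINCREMENT, "
        "mid TEXT UNIQUE NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO msgmap (mid) VALUES (?)",
        ((f"msg{i}@example.com",) for i in range(n)),
    )
    return conn


def minmax_combined(conn):
    """One query with both aggregates; may fall back to a full scan."""
    return conn.execute("SELECT MIN(num), MAX(num) FROM msgmap").fetchone()


def minmax_separate(conn):
    """Two single-aggregate queries; each can be answered by one index lookup."""
    lo = conn.execute("SELECT MIN(num) FROM msgmap").fetchone()[0]
    hi = conn.execute("SELECT MAX(num) FROM msgmap").fetchone()[0]
    return lo, hi


if __name__ == "__main__":
    conn = build_demo()
    print(minmax_separate(conn))
    # EXPLAIN QUERY PLAN shows how SQLite intends to run each form.
    for sql in ("SELECT MIN(num), MAX(num) FROM msgmap",
                "SELECT MIN(num) FROM msgmap"):
        print(sql, conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall())
```

Both forms return the same pair. The difference is only in how much of the table SQLite has to visit, which is what matters at millions of rows.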

[PATCH 3/8] over: remove forked subprocess

2018-04-06 Eric Wong (Contractor, The Linux Foundation)
Since the overview stuff is a synchronization point anyways, move it into the main V2Writable process and allow us to drop a bunch of code. This is another step towards making Xapian optional for v2. In other words, the fan-out point is moved and the Xapian partitions no longer need to

[PATCH 4/8] v2writable: reduce barriers

2018-04-06 Eric Wong (Contractor, The Linux Foundation)
Since we handle the overview info synchronously, we only need barriers in tests, now. We will use asynchronous checkpoints to sync less-important Xapian data. For data deduplication, this requires us to hoist out the cat-blob support in ::Import for reading uncommitted data in git.
---

[PATCH 7/8] store less data in the Xapian document

2018-04-06 Eric Wong (Contractor, The Linux Foundation)
Since we only query the SQLite over DB for OVER/XOVER, we do not need to waste space storing the To/Cc/:bytes/:lines fields or the XNUM term. We only use From/Subject/References/Message-ID/:blob in various places of the PSGI code. For reindexing, we will take advantage of docid stability in

[PATCH 1/8] psgi: ensure /$INBOX/$MESSAGE_ID/T/ endpoint is chronological

2018-04-06 Eric Wong (Contractor, The Linux Foundation)
We only need to call get_thread beyond 1000 messages for fetching entire mboxes. It's probably too much for the HTML display otherwise.
---
 lib/PublicInbox/Mbox.pm |  6 +++---
 lib/PublicInbox/Over.pm | 18 +++---
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git