[PATCH] convert: preserve indexlevel on conversions

2020-02-08 Thread Eric Wong
We don't want to blow up users' storage too badly when converting v1 to v2, or break because they don't have Xapian bindings installed. --- script/public-inbox-convert | 9 + 1 file changed, 9 insertions(+) diff --git a/script/public-inbox-convert b/script/public-inbox-convert index 906001c

[PATCH 2/1] t/multi-mid: skip properly w/o DBD::SQLite

2020-02-08 Thread Eric Wong
SearchIdx always requires DBD::SQLite, so only require it after we've passed `require_mods(qw(DBD::SQLite))'. --- t/multi-mid.t | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/multi-mid.t b/t/multi-mid.t index 94c0e0a2..df865efb 100644 --- a/t/multi-mid.t +++ b/t/multi-mid.t

[ANNOUNCE] public-inbox 1.3.0

2020-02-09 Thread Eric Wong
Many internal improvements to improve the developer experience, long-term maintainability, ease-of-installation and compatibility. There are also several bugfixes. Some of the internal improvements involve avoiding Perl startup time in tests. "make check" now runs about 50% faster than before, an

[PATCH] t/msg_iter: test for X-UNKNOWN charset from Alpine

2020-02-13 Thread Eric Wong
A long overdue test for behavior established in 2016. Fixes: 1b28cc7f00a866cb ("view: try assuming UTF-8 for bogus charsets") --- MANIFEST | 1 + t/msg_iter.t | 20 t/x-unknown-alpine.eml | 21 + 3 files changed, 42 insertions(+)

[PATCH 5/8] view,searchview: avoid smsg method calls when using SQLite/Xapian

2020-02-15 Thread Eric Wong
We already pre-populate the hashref when loading $smsg (PublicInbox::SearchMsg) objects out of over.sqlite3 or Xapian, so making expensive method calls isn't necessary in those cases. We only need to use the method calls when SQLite or Xapian are not available or are being populated (such as durin

[PATCH 3/8] view: dump_topics: better naming of top Subject

2020-02-15 Thread Eric Wong
We use `$top' in other places, so name it to `$top_subj' consistently for `$subj' and `$prev_subj' comparisons down the function. --- lib/PublicInbox/View.pm | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm index 45c

[PATCH 4/8] view: cleanup topic accumulation and dumping

2020-02-15 Thread Eric Wong
Avoid needlessly normalizing the subject when dumping, since it's pushed into the @$topic array during accumulation in normalized form. We can also safely treat $smsg as a hashref and avoid calling "->ds" as a method since we know we've got that loaded via Over||Search and won't have to use Email:

[PATCH 2/8] view: single id="t" for multi-Subject messages

2020-02-15 Thread Eric Wong
While multi-Subject messages are unfortunate, try not to generate confusing/invalid HTML with multiple elements having the same HTML id attribute. --- lib/PublicInbox/View.pm | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/lib/PublicInbox/View.pm b/lib/PublicInb

[PATCH 7/8] view: escape ampersand in Message-IDs

2020-02-15 Thread Eric Wong
We need to escape ampersands (and some other characters for href attributes), so introduce a `mid_href' sub to do just that. '<', '>' and '"' were always escaped, so there's no risk of tag or attribute injection, but creative Message-IDs could cause confusion for some parsers and generate invalid
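
A minimal Python sketch of the escaping idea described in this patch (the project itself is written in Perl). The `mid_href` below is a hypothetical stand-in that only borrows the name, not the project's implementation, and the set of characters left unescaped is an assumption:

```python
from html import escape
from urllib.parse import quote


def mid_href(mid: str) -> str:
    """Return a Message-ID made safe for use inside an href attribute."""
    # Percent-encode everything outside a conservative safe set, so that
    # '&', '<', '>', '"' and '/' all become %XX sequences.
    # The safe set is an assumption made for this sketch.
    encoded = quote(mid, safe="@.-_~+=")
    # HTML-escape as a second layer; after quote() this changes nothing,
    # but it documents that the value ends up in an attribute.
    return escape(encoded, quote=True)


print(mid_href('a&b<c>"d@example.com'))
```

Percent-encoding the ampersand keeps a creative Message-ID from being read as a query-string separator or as the start of an HTML entity.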

[PATCH 0/8] some view cleanups and minor bugfixes

2020-02-15 Thread Eric Wong
Pretty insignificant, but the diffstat makes me happy :> Eric Wong (8): view: remove mhref arg from multipart_text_as_html view: single id="t" for multi-Subject messages view: dump_topics: better naming of top Subject view: cleanup topic accumulation and dumping view,sear

[PATCH 1/8] view: remove mhref arg from multipart_text_as_html

2020-02-15 Thread Eric Wong
No point in passing something on stack only to stash it into the $ctx which holds most other parameters used for rendering the HTML. --- lib/PublicInbox/View.pm | 14 +++--- lib/PublicInbox/WwwAtomStream.pm | 3 ++- xt/perf-msgview.t| 3 ++- 3 files changed, 11 i

[PATCH 8/8] view: remove last Hval->new caller

2020-02-15 Thread Eric Wong
The object-oriented Hval API turned out to be less useful and more clunky than I envisioned years ago, so get rid of it. We'll no longer strip trailing whitespace from From: headers in the HTML display, but I doubt anybody cares. --- lib/PublicInbox/Hval.pm | 21 - lib/PublicIn

[PATCH 6/8] view: escape Subject HTML directly

2020-02-15 Thread Eric Wong
No need to use the over-engineered Hval OO API when the subject is already normalized and there's no trailing spaces because of normalization. --- lib/PublicInbox/View.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm index 033

[PATCH] viewdiff: do not generate "a=" parameter if "b=" matches

2020-02-16 Thread Eric Wong
Long URLs waste bandwidth and redundant query parameters make caching more difficult and expensive. Fixes: ddec19694cbf0e1d ("viewdiff: rewrite and simplify") --- lib/PublicInbox/ViewDiff.pm | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/lib/PublicInbox/ViewDiff.pm b/l
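
The idea can be sketched in Python as follows, assuming `a=` and `b=` carry the pre-image and post-image file names of a diff. The function name `diff_query` is hypothetical and this is not the ViewDiff.pm code:

```python
from urllib.parse import urlencode


def diff_query(path_a: str, path_b: str) -> str:
    """Build a query string, omitting a= when it would repeat b=."""
    params = {"b": path_b}
    if path_a != path_b:
        # Only renames need the extra parameter.
        params["a"] = path_a
    return urlencode(params)


print(diff_query("lib/Foo.pm", "lib/Foo.pm"))
print(diff_query("old.txt", "new.txt"))
```

With a single canonical URL per blob, caches see fewer distinct keys for the same content.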

[PATCH] doc: design_www: document solver endpoint

2020-02-16 Thread Eric Wong
The blob regeneration (solving) part has been stable and performant for over a year with no problems, even with web crawlers constantly hitting it without needing rate limits. All the other stuff is open to bikeshedding (as long as my crappy hardware supports it :P) --- Documentation/design_www.t

[PATCH] view: shorten life of MIME object for permalink

2020-02-17 Thread Eric Wong
We don't need to hold onto the Email::MIME object across multiple WwwResponse->getline calls, instead we can stuff the rendered HTML of the first (and hopefully only) message of the buffer into ctx->{-html_tip}. --- lib/PublicInbox/View.pm | 58 - 1 file cha

Re: [PATCH] view: shorten life of MIME object for permalink

2020-02-17 Thread Eric Wong
Pushed as commit 9703d80efd848f582e5b265db1958e0f143d8712 I expect this to be significant in high-concurrency situations. There's more changes on the horizon to further reduce memory usage of the WWW interface :> -- unsubscribe: one-click, see List-Unsubscribe header archive: https://public-inbox.

[PATCH] doc: improve wording of "inbox" vs "repository"

2020-02-22 Thread Eric Wong
Since v2 inboxes contain multiple git repositories, avoid the use of the word "repository" when referring to inboxes as a whole in most places. --- Documentation/public-inbox-convert.pod | 6 +++--- Documentation/public-inbox-daemon.pod| 3 +-- Documentation/public-inbox-index.pod | 6 ++

Re: [PATCH] doc: improve wording of "inbox" vs "repository"

2020-02-23 Thread Eric Wong
Kyle Meyer wrote: > Eric Wong writes: > > > Since v2 inboxes contain multiple git repositories, avoid the > > use of the word "repository" when referring to inboxes as a > > whole in most places. > [...] > > diff --git a/TODO b/TODO > > index 9

[PATCH] doc: technical: document data structures

2020-02-23 Thread Eric Wong
Can't code without data structures, and we emphasize data over code just about everywhere. --- Documentation/technical/data_structures.txt | 228 MANIFEST| 1 + 2 files changed, 229 insertions(+) create mode 100644 Documentation/technical

[PATCH 2/1] INSTALL: s/repositories/inboxes/

2020-02-23 Thread Eric Wong
Since v2 inboxes can be made of several git repositories, consistently call them "inboxes", instead. --- INSTALL | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/INSTALL b/INSTALL index bf1c821a..7d14ca55 100644 --- a/INSTALL +++ b/INSTALL @@ -21,9 +21,9 @@ Requirements pub

[PATCH] searchview: set obfuscation inbox properly

2020-02-23 Thread Eric Wong
We never lookup `$ctx->{-obfuscate}' anywhere, as the correct key is `$ctx->{-obfs_ibx}' since some of the address obfuscation stuff is inbox-specific. Note: some of the obfuscation stuff still needs tests, but it's low-priority at the moment since I don't think it's a good feature after all. ---

[PATCH 0/2] import_vger_from_mbox improvements

2020-02-23 Thread Eric Wong
There are lots of mboxes out there :) Eric Wong (2): import_vger_from_mbox: drop redundant "use" statements import_vger_from_mbox: add --filter parameter scripts/import_vger_from_mbox | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-)

[PATCH 2/2] import_vger_from_mbox: add --filter parameter

2020-02-23 Thread Eric Wong
It shouldn't be hard to make this into a more generic importer not specific to vger lists. --- scripts/import_vger_from_mbox | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/scripts/import_vger_from_mbox b/scripts/import_vger_from_mbox index b3aceb6b..d1ce7231 100644 --- a/sc

[PATCH 1/2] import_vger_from_mbox: drop redundant "use" statements

2020-02-23 Thread Eric Wong
PublicInbox::InboxWritable takes care of those imports. --- scripts/import_vger_from_mbox | 3 --- 1 file changed, 3 deletions(-) diff --git a/scripts/import_vger_from_mbox b/scripts/import_vger_from_mbox index 0e5ba6b4..b3aceb6b 100644 --- a/scripts/import_vger_from_mbox +++ b/scripts/import_vge

[PATCH 2/3] viewdiff: remove optional CR handling

2020-02-23 Thread Eric Wong
The only caller of `flush_diff' is `add_text_body', and that already did CRLF conversion on the text part. The regexps in SolverGit still need to preserve CR, however, since that actually applies patches (instead of rendering them), and we need to preserve CRLF patches for CRLF files. --- lib/Pub

[PATCH 1/3] hval: ascii_html: drop CRLF => LF conversion

2020-02-23 Thread Eric Wong
Instead, we add CRLF conversion to the only remaining place which needs it, ViewVCS. This saves many redundant ops in many places. The only other place where this mattered was in View::add_text_body, but we already started doing CRLF conversions when we added diff parsing and link generation fo

[PATCH 3/3] examples/nginx_proxy: convert CRLF to LF

2020-02-23 Thread Eric Wong
It was the only file in our tree which had CRLF line endings, so make it consistent with the rest. --- examples/nginx_proxy | 48 ++-- 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/examples/nginx_proxy b/examples/nginx_proxy index 38e60643.

[PATCH 0/3] avoid redundant CRLF handling

2020-02-23 Thread Eric Wong
Redundant ops waste cycles and make the code more difficult to follow. And 3/3 is an overdue cleanup which can also serve as an impromptu test for solver... Eric Wong (3): hval: ascii_html: drop CRLF => LF conversion viewdiff: remove optional CR handling examples/nginx_proxy: convert C

[PATCH 0/2] v2writable: reduce smsg->{mime} impact

2020-02-24 Thread Eric Wong
improve some v2writable behaviors while we're at it. Eric Wong (2): v2writable: make remove return-compatible w/ Import::remove v2writable: lookup_content => content_exists lib/PublicInbox/V2Writable.pm | 34 -- t/v2writable.t| 7 +--

[PATCH 2/2] v2writable: lookup_content => content_exists

2020-02-24 Thread Eric Wong
It only needs to return a boolean, since none of the current callers care about the return value. Thus avoid a hash table assignment and use of `$smsg->{mime}', here. --- lib/PublicInbox/V2Writable.pm | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/lib/PublicInbox/

[PATCH] v2writable: lookup_content => content_exists

2020-02-24 Thread Eric Wong
It only needs to return a boolean, since none of the current callers care about the return value, so avoid a hash assignment and use of `$smsg->{mime}', here. --- lib/PublicInbox/V2Writable.pm | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/lib/PublicInbox/V2Writabl

[PATCH 1/2] v2writable: make remove return-compatible w/ Import::remove

2020-02-24 Thread Eric Wong
Import::remove is a documented interface, and the return value of the V2Writable work-alike should try to be compatible with what Import implements. --- lib/PublicInbox/V2Writable.pm | 23 +-- t/v2writable.t| 7 +-- 2 files changed, 18 insertions(+), 12 del

[RFC] msgtime: do not require tz offset with Date::Parse fallback

2020-02-25 Thread Eric Wong
Leah Neukirchen wrote: > Hi, > > I've recently imported some sizable archives (~100k messages) of old > mailing lists and noticed some slight inconveniences: Thanks for the reports, will answer 2. separately. > 1) RFC5322/822 invalid Date: headers should be parsed more gracefully > > Some old

weird From: lines [was: Two small issues when importing old archives]

2020-02-25 Thread Eric Wong
Leah Neukirchen wrote: > 2) Weird From: lines crash the whole import > > From: "=?iso-8859-1?Q?Jochen_K=FCpper?= > This funny line broke import_maildir: > > fatal: Missing > in ident string: =?iso-8859-1?Q?Jochen_K=FCpper?= usenet > <"=?iso-8859-1?Q?Jochen_K=FCpper?= 1101853296 > +0100 > fa

[PATCH] doc: 1.4.0 release notes update

2020-02-25 Thread Eric Wong
Perhaps 1.4.0 will be a small release, after all (and also smaller in terms of memory use :) --- Documentation/RelNotes/v1.4.0.eml | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/Documentation/RelNotes/v1.4.0.eml b/Documentation/RelNotes/v1.4.0.eml index 6b1bc86e..0ebf8d6

[PATCH] searchview: improve naming and simplify hash override

2020-02-25 Thread Eric Wong
`%over' could be confused for the overview SQLite DB instance, so call it `%override', instead. There's also no need to write a loop to override a hash when the language can do it for us. --- lib/PublicInbox/SearchView.pm | 10 +++--- 1 file changed, 3 insertions(+), 7 deletions(-) diff --gi
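
For readers less familiar with the idiom, here is the same "let the language merge the hash" idea as a Python sketch. The names `ctx` and `override` are illustrative only; the Perl patch itself is not shown in this listing:

```python
def apply_override(ctx: dict, override: dict) -> dict:
    """Return a copy of ctx with keys from override taking precedence."""
    # Later unpackings win, so no explicit loop over override is needed.
    return {**ctx, **override}


base = {"limit": 200, "order": "date"}
print(apply_override(base, {"limit": 10}))
```

The original mapping is left untouched, which keeps the override local to one call.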

[PATCH] import: drop '<' and '>' characters in addresses

2020-02-26 Thread Eric Wong
Eric Wong wrote: > Leah Neukirchen wrote: > > 2) Weird From: lines crash the whole import > > > > From: "=?iso-8859-1?Q?Jochen_K=FCpper?= > > > This funny line broke import_maildir: > > > > fatal: Missing > in ident string: =?iso-8859-

[PATCH] doc: design_www: document offline friendliness

2020-02-27 Thread Eric Wong
This isn't anything new and has been a part of the design since the beginning, but it may not be apparent to some folks. --- Documentation/design_www.txt | 17 + 1 file changed, 17 insertions(+) diff --git a/Documentation/design_www.txt b/Documentation/design_www.txt index 240fa50

[PATCH] INSTALL: update for 1.3.0+, clarify IO::Compress

2020-02-27 Thread Eric Wong
IO::Compress is required for v2 inboxes and overview indices, after all, but it is often pulled in by other packages (HTTP::Message via Plack::Test). --- INSTALL | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/INSTALL b/INSTALL index 7d14ca55..4f0217a3 100644 --- a/INST

[pushed] msgtime: assume +0000 if TZ missing when using Date::Parse

2020-03-01 Thread Eric Wong
Eric Wong wrote: > Leah Neukirchen wrote: > > 1) RFC5322/822 invalid Date: headers should be parsed more gracefully > > > > Some old mails had Date: headers without time zones, e.g. > > Date: Sat, 27 Sep 1997 10:02:32 > > > > This results in public-i
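
A minimal Python sketch of the "assume +0000 when the zone is missing" behavior, using the standard library rather than Perl's Date::Parse. The function name is hypothetical:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_date_assume_utc(value: str) -> datetime:
    """Parse a Date: header value, treating a missing zone as +0000."""
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:
        # No usable zone in the header: assume UTC instead of local time.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


print(parse_date_assume_utc("Sat, 27 Sep 1997 10:02:32").isoformat())
```

Pinning the zone makes the result independent of the timezone of the machine doing the import.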

[PATCH] INSTALL: refer to the proper Debian version

2020-03-02 Thread Eric Wong
Debian 10.0 was released July 2019, so update our documentation to reflect that. While we're at it, fixup a broken footnote reference for Inline::C, too. --- INSTALL | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/INSTALL b/INSTALL index 4f0217a3..3984df71 100644 --- a/IN

[PATCH] spawn: correctly handle error code

2020-03-03 Thread Eric Wong
Both the C and pure Perl implementations of `pi_fork_exec' return `-1' on error, not `undef'. --- lib/PublicInbox/Spawn.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/PublicInbox/Spawn.pm b/lib/PublicInbox/Spawn.pm index 2d9f734c..ad6be187 100644 --- a/lib/PublicInbox/Sp

[PATCH] git: remove POSIX::dup2 import

2020-03-03 Thread Eric Wong
We rely on spawn/popen_rd for redirects, nowadays. --- lib/PublicInbox/Git.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm index 7eaaeb8b..9c96b3f0 100644 --- a/lib/PublicInbox/Git.pm +++ b/lib/PublicInbox/Git.pm @@ -9,7 +9,7 @

[PATCH] index: use git commit times on missing Date/Received

2020-03-04 Thread Eric Wong
When indexing messages without Date: and/or Received: headers, fall back to using timestamps originally recorded by git in the commit object. This allows git mirrors to preserve the import datestamp and timestamp of a message according to what was fed into git, instead of blindly falling back to t

Re: [PATCH] index: use git commit times on missing Date/Received

2020-03-04 Thread Eric Wong
Erm... sent prematurely :x Warns in tests. -- unsubscribe: one-click, see List-Unsubscribe header archive: https://public-inbox.org/meta/

[PATCH] searchmsg: allow lines (and bytes) to be zero

2020-03-07 Thread Eric Wong
We will occasionally see legit messages with zero lines, be sure we index that count for NNTP clients. I'm not sure about bytes being zero (aside from purged messages), but we should've dealt with that earlier up the stack. --- lib/PublicInbox/SearchMsg.pm | 4 ++-- t/v2mirror.t |

[PATCH] daemon: remove unused $parent_pipe variable

2020-03-07 Thread Eric Wong
We can just create a ParentPipe and let PublicInbox::DS manage its life cycle. While we're at it, favor `\&coderef' over `*coderef' so we're explicit about it being a code ref and not some other ref type. --- lib/PublicInbox/Daemon.pm | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff

Re: How to force stricter threading

2020-03-11 Thread Eric Wong
Konstantin Ryabitsev wrote: > Hello: > > I think public-inbox currently does some heuristic-based threading, > which may actually not be that useful. For example: > > https://lore.kernel.org/linux-renesas-soc/20200217101741.3758-1-geert+rene...@glider.be/ > > None of the [PATCH] messages have

Re: setting up mailman2 and public-inbox

2020-03-11 Thread Eric Wong
Luke Kenneth Casson Leighton wrote: > eric, hi, > > we're having difficulty understanding how to deploy public-inbox in a > way that very simply and as a top and only priority records email in a > public inbox, for the purposes of having it in a git repository, when > that email is coming in via

inbox indexing wishlist [was: [TOPIC 16/17] “I want a reviewer”]

2020-03-14 Thread Eric Wong
Jeff King wrote: > On Fri, Mar 13, 2020 at 09:25:31PM +0000, Eric Wong wrote: > > > > 6. Peff: this is all possible on the mailing list. I see things that look > > > interesting, and have a to do folder. If someone replies, I’ll take it off > > > the list. Once a

[PATCH] http: fix RFC conformance w.r.t. message length

2020-03-16 Thread Eric Wong
We need to favor "Transfer-Encoding: chunked" over the value of the Content-Length header. We should also reject bogus, duplicate and/or unreasonable values for both these, since they can trigger unexpected behavior when combined with other HTTP parsers in proxies such as varnish, nginx, haproxy,
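
The framing rule can be sketched in Python as below. This is an illustration of the precedence described above, not the PublicInbox::HTTP code. The function name, the return shape and the `MAX_CONTENT_LENGTH` cap are assumptions, and only a single `chunked` coding is accepted for simplicity:

```python
import re

MAX_CONTENT_LENGTH = 1 << 30  # hypothetical cap, not taken from the patch


def body_framing(headers):
    """Decide how a request body is delimited.

    `headers` is a list of (name, value) pairs. Returns ("chunked", None),
    ("length", n) or ("none", 0); raises ValueError for requests that
    should be rejected.
    """
    te = [v.strip().lower() for k, v in headers
          if k.lower() == "transfer-encoding"]
    cl = [v.strip() for k, v in headers if k.lower() == "content-length"]

    # Validate Content-Length even if chunked ends up taking precedence.
    if len(cl) > 1:
        raise ValueError("duplicate Content-Length")
    if cl and re.fullmatch(r"[0-9]+", cl[0]) is None:
        raise ValueError("bogus Content-Length")
    if cl and int(cl[0]) > MAX_CONTENT_LENGTH:
        raise ValueError("unreasonable Content-Length")

    if te:
        if te != ["chunked"]:
            raise ValueError("unsupported or duplicate Transfer-Encoding")
        # chunked wins; Content-Length is ignored for framing.
        return ("chunked", None)
    if cl:
        return ("length", int(cl[0]))
    return ("none", 0)


print(body_framing([("Transfer-Encoding", "chunked"), ("Content-Length", "5")]))
```

Rejecting ambiguous requests outright means a front-end proxy and the backend cannot disagree about where one request ends and the next begins.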

Re: up and running, integrated with exim4 mta

2020-03-18 Thread Eric Wong
lkcl wrote: > hi eric we have things running, hooray, i thought you might appreciate > it is a little different > http://inbox.libre-riscv.org/libre-riscv-dev/new.html Good to know! Btw, if you have DBD::SQLite (and optionally, Search::Xapian), you can run `public-inbox-index $INBOX_DIR' to get

Re: How to force stricter threading

2020-03-19 Thread Eric Wong
Konstantin Ryabitsev wrote: > Hello: > > I think public-inbox currently does some heuristic-based threading, > which may actually not be that useful. For example: > > https://lore.kernel.org/linux-renesas-soc/20200217101741.3758-1-geert+rene...@glider.be/ > > None of the [PATCH] messages have

[PATCH] doc: standards: add references to RFC 5322 (and RFC 822)

2020-03-19 Thread Eric Wong
RFC 5322 is the latest one in this line, but much documentation and even command-line options in other programs (e.g. git) refer to RFC 2822 or even RFC 822. --- Documentation/standards.perl | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/Documentation/standards.perl b/Docum

Re: How to force stricter threading

2020-03-19 Thread Eric Wong
Eric Wong wrote: > So the "Patchwork summary for: linux-renesas-soc" message: > > https://lore.kernel.org/linux-renesas-soc/158229483332.12219.5639020605006542672.git-patchwork-summ...@kernel.org/raw > > has the following header: > > References: <20200217101

[PATCH 1/6] www: update ->preload for newer modules

2020-03-19 Thread Eric Wong
We'll also avoid explicitly loading standard library modules like POSIX and Digest::SHA, here; instead we load our own modules and let those load whatever non-PublicInbox:: modules they need. --- lib/PublicInbox/WWW.pm | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/

[PATCH 0/6] daemon: reduce fragmentation via preload

2020-03-19 Thread Eric Wong
I'm wondering if WWW should just preload by default. I'm not sure if anybody uses public-inbox.cgi (or should be using it :P). It's not like we don't ship public-inbox-httpd; and any PSGI implementation could be used for smaller inboxes (or powerful-enough hardware). Eric Won

[PATCH 2/6] wwwlisting: favor "use" over require

2020-03-19 Thread Eric Wong
"use" is also evaluated earlier than "require", so it is favorable for compile-only checking. --- lib/PublicInbox/WwwListing.pm | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/WwwListing.pm b/lib/PublicInbox/WwwListing.pm index c063fca6..a8aecaf7 100644 ---

[PATCH 6/6] viewdiff: favor `qr' to precompile regexps

2020-03-19 Thread Eric Wong
We can also avoid `o' regexp modifier, since it isn't recommended by Perl upstream, anymore (although we don't have any bugs or unintended behavior because of it). --- lib/PublicInbox/ViewDiff.pm | 47 - 1 file changed, 26 insertions(+), 21 deletions(-) diff --
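
The Python analogue of precompiling with `qr` is a module-level `re.compile`. The sketch below is illustrative only: the pattern matches a unified-diff hunk header and is not copied from ViewDiff.pm:

```python
import re

# Compiled once at import time, like a `qr//' object held in a variable.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str):
    """Return (old_start, old_len, new_start, new_len) or None."""
    m = HUNK_HEADER.match(line)
    if m is None:
        return None
    old_start, old_len, new_start, new_len = m.groups()
    # An omitted length means a single line.
    return (int(old_start), int(old_len or 1),
            int(new_start), int(new_len or 1))


print(parse_hunk_header("@@ -21,9 +21,9 @@ Requirements"))
```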

[PATCH 3/6] wwwlisting: avoid lazy loading JSON module

2020-03-19 Thread Eric Wong
We already lazy-load WwwListing for the CGI script, and hiding another layer of lazy-loading makes things difficult to do WWW->preload. We want long-lived processes to do all long-lived allocations up front to avoid fragmentation in the allocator, but we'll still support short-lived processes by l

[PATCH 5/6] daemon: do more immortal allocations up front

2020-03-19 Thread Eric Wong
Doing immortal allocations late can cause those allocations to end up in places where it fragments the heap. So do more things up front for long-lived daemons. --- lib/PublicInbox/NNTPD.pm | 4 lib/PublicInbox/WWW.pm | 21 + 2 files changed, 21 insertions(+), 4 deletio

[PATCH 4/6] www: avoid `state' usage to perform allocations up-front

2020-03-19 Thread Eric Wong
We want WWW->preload to get as many immortal allocations done as possible, and the `state' feature from Perl 5.10 prevents that. --- lib/PublicInbox/SolverGit.pm | 13 +++-- lib/PublicInbox/ViewDiff.pm | 6 +++--- 2 files changed, 10 insertions(+), 9 deletions(-) diff --git a/lib/Public

[PATCH] examples/*.psgi: add examples for -httpd

2020-03-19 Thread Eric Wong
public-inbox-httpd should work with any PSGI files, so make it more apparent to people reading .psgi examples. --- examples/cgit.psgi | 5 - examples/highlight.psgi| 4 examples/newswww.psgi | 5 - examples/public-inbox.psgi | 5 + 4 files changed, 17 insertions(+

Re: up and running, integrated with exim4 mta

2020-03-19 Thread Eric Wong
lkcl wrote: > On Thu, Mar 19, 2020 at 3:06 AM Eric Wong wrote: > > > a section to disable spam and also adding the listid to the config is > > > critical otherwise public-inbox-mda fails silently. > > > > There's also '--no-precheck' on the com

[PATCH 7/9] v2: pass smsg in more places

2020-03-20 Thread Eric Wong
We can pass fewer order-dependent args to V2Writable::do_idx and SearchIdxShard::index_raw by passing the smsg object, instead. --- lib/PublicInbox/SearchIdxShard.pm | 27 ++-- lib/PublicInbox/V2Writable.pm | 52 +++ 2 files changed, 42 insertions(+), 37

[PATCH 0/9] preserve time and date of initial commit

2020-03-20 Thread Eric Wong
mess left in 1 and 2, Finally, patch 9 fixes the corner-case-of-corner-cases for dealing with multi-MID messages which require a one-off queue to store the git commit/author times instead of overloading msgmap. Eric Wong (9): index: use git commit times on missing Date/Received v2writable: pres

[PATCH 9/9] v2: SDBM-based multi Message-ID queue

2020-03-20 Thread Eric Wong
This lets us store author and committer times for deferred indexing messages with ambiguous Message-IDs. This allows us to reproducibly reindex messages with the git commit and author times when a rare message lacks Received and/or Date headers while having ambiguous Message-IDs. --- MANIFEST

[PATCH 5/9] overidx: parse_references: less error-prone args

2020-03-20 Thread Eric Wong
Favor `$smsg->{mid}' instead of `$mid0' to reduce parameters down-the-line, but favor passing the Email::MIME::Header object around instead of relying on the bloat-prone `$smsg->{mime}' and calling ->header_obj on it. --- lib/PublicInbox/OverIdx.pm | 8 +++- lib/PublicInbox/SearchIdx.pm | 2

[PATCH 4/9] smsg: to_doc_data: use existing fields

2020-03-20 Thread Eric Wong
No need to pass extra parameters to this method, since smsg has universal meanings for {blob} and {mid}. --- lib/PublicInbox/OverIdx.pm | 2 +- lib/PublicInbox/SearchIdx.pm | 4 +++- lib/PublicInbox/Smsg.pm | 7 +++ 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/lib/Publ

[PATCH 8/9] *idx: pass smsg in even more places

2020-03-20 Thread Eric Wong
We can finally get rid of the awkward, ad-hoc use of V2Writable, SearchIdx, and OverIdx args for passing {cotime} and {autime} between classes. We'll still use those git time fields internally within V2Writable and SearchIdx for (re)indexing, but that's not worth avoiding as a fallback. --- lib/P

[PATCH 6/9] *idx: pass $smsg in more places instead of many args

2020-03-20 Thread Eric Wong
We can pass blessed PublicInbox::Smsg objects to internal indexing APIs instead of having long parameter lists in some places. The end goal is to avoid parsing redundant information each step of the way and hopefully make things more understandable. --- lib/PublicInbox/OverIdx.pm| 14 +++-

[PATCH 2/9] v2writable: preserve timestamps from import

2020-03-20 Thread Eric Wong
While v2 indexing is triggered immediately after writing the commit to the git repository, there may be a gap between when PublicInbox::Import generates a timestamp and when PublicInbox::SearchIdx sees the message. So follow the mirror indexing behavior and take the to-be-indexed (time|date)stamps

[PATCH 3/9] rename PublicInbox::SearchMsg => PublicInbox::Smsg

2020-03-20 Thread Eric Wong
Since the introduction of over.sqlite3, SearchMsg is not tied to our search functionality in any way, so stop confusing ourselves and future hackers by just calling it "PublicInbox::Smsg". Add a missing "use" in ExtMsg while we're at it. --- Documentation/mknews.perl | 4 ++--

[PATCH 1/9] index: use git commit times on missing Date/Received

2020-03-20 Thread Eric Wong
When indexing messages without Date: and/or Received: headers, fall back to using timestamps originally recorded by git in the commit object. This allows git mirrors to preserve the import datestamp and timestamp of a message according to what was fed into git, instead of blindly falling back to t
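
A rough Python sketch of the fallback, under stated assumptions: the Date: header falls back to the git author time and the Received: header to the git committer time. That pairing, the function names and the return shape are all assumptions made for illustration, not details confirmed by the snippet:

```python
from datetime import timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def _epoch(value):
    """Parse an RFC 5322 date string to epoch seconds, or None."""
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def msg_times(raw: str, autime: int, cotime: int):
    """Return (datestamp, timestamp), falling back to git times."""
    msg = message_from_string(raw)
    date = msg["Date"]
    ds = _epoch(date) if date else None
    received = msg["Received"]
    # The date of a Received: header follows its last ';'.
    ts = _epoch(received.rsplit(";", 1)[-1].strip()) if received else None
    return (ds if ds is not None else autime,
            ts if ts is not None else cotime)


print(msg_times("Subject: no dates here\n\nbody\n", 1583280000, 1583283600))
```

Because the fallback values come from the commit object, a mirror that reindexes the same git history arrives at the same timestamps.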

[PATCH 0/2] wwwlisting: fixup warnings :x

2020-03-20 Thread Eric Wong
I made tests run so quickly that I missed some warnings :x Eric Wong (2): wwwlisting: use first successfully loaded JSON module t/www_listing: avoid 'once' warnings lib/PublicInbox/WwwListing.pm | 2 +- t/www_listing.t | 2 +- 2 files changed, 2 insertions(+), 2

[PATCH 1/2] wwwlisting: use first successfully loaded JSON module

2020-03-20 Thread Eric Wong
And not the last... I only noticed this since JSON::PP::Boolean was spewing redefinition warnings via overload.pm Fixes: 8fb8fc52420ef669 ("wwwlisting: avoid lazy loading JSON module") --- lib/PublicInbox/WwwListing.pm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/Publi
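
The intended "first one wins" behavior, sketched in Python with `importlib`. The candidate names in the usage line are placeholders, not the Perl JSON modules the patch deals with:

```python
import importlib


def first_available(candidates):
    """Import and return the first module that loads, or None."""
    for name in candidates:
        try:
            # Returning here stops the loop at the first success, so
            # later candidates are never loaded.
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


mod = first_available(["no_such_json_module_xyz", "json", "pickle"])
print(mod.__name__)
```

Stopping at the first success also avoids loading several implementations into one process, which is what produced the redefinition warnings mentioned above.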

[PATCH 2/2] t/www_listing: avoid 'once' warnings

2020-03-20 Thread Eric Wong
We reach into the WwwListing package directly to retrieve that JSON encoder/decoder object, and we can't rely on `use' since WwwListing loading may fail if Plack is missing. --- t/www_listing.t | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/www_listing.t b/t/www_listing.t in

[PATCH 01/11] qspawn: reinstate filter support, add gzip filter

2020-03-20 Thread Eric Wong
We'll be supporting gzipped dumps from sqlite3(1) for altid files in future commits. In the future (and if we survive), we may replace Plack::Middleware::Deflater with our own GzipFilter to work better with asynchronous responses without relying on memory-intensive anonymous subs. --- MANIFEST

[PATCH 10/11] altid: warn about non-word prefixes

2020-03-20 Thread Eric Wong
We only support searching on prefixes matching /\A\w+\z/ because Xapian requires ':' to delimit the prefix and splits on spaces without quotes. I've also verified Xapian supports multibyte UTF-8 characters, underscores, and bare numbers as search prefixes, so there's no need to restrict it beyond
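
The prefix check translates to Python roughly as follows. `re.fullmatch` plays the role of the `\A...\z` anchors, and Python's `\w` is Unicode-aware for `str`. The function name is hypothetical:

```python
import re

_PREFIX_RE = re.compile(r"\w+")


def valid_altid_prefix(prefix: str) -> bool:
    """True if prefix consists only of word characters."""
    # fullmatch anchors at both ends and, like \z, rejects a trailing
    # newline.
    return _PREFIX_RE.fullmatch(prefix) is not None


for candidate in ("gmane", "gmane:", "my prefix", "123", "list_id"):
    print(candidate, valid_altid_prefix(candidate))
```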

[PATCH 08/11] search: clobber -user_pfx on query parser initialization

2020-03-20 Thread Eric Wong
While we don't currently reinitialize the query parser for the lifetime of a PublicInbox::Search object and have no plans to, it's incorrect to be appending to an existing array in case we reinitialize the query parser in the future. --- lib/PublicInbox/Search.pm | 2 +- 1 file changed, 1 insert

[PATCH 11/11] www: add endpoint to retrieve altid dumps

2020-03-20 Thread Eric Wong
This ensures all our indexed data, including data from altid searches (e.g. "gmane:$ARTNUM") is retrievable. It uses a "POST" request to avoid wasting cycles when invoked by crawlers, since it could potentially be several megabytes of data not indexable by search engines. --- MANIFEST

[PATCH 05/11] wwwstream: oneshot sets content-length

2020-03-20 Thread Eric Wong
PublicInbox::HTTP will chunk, otherwise, and that's extra overhead which isn't needed. --- lib/PublicInbox/WwwStream.pm | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm index fceef745..985e0262 100644 ---
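
A sketch of what a one-shot response looks like in PSGI's three-element form, written in Python for illustration. The function name and default content type are assumptions and this is not the WwwStream code:

```python
def oneshot(status: int, body: bytes,
            content_type: str = "text/html; charset=UTF-8"):
    """Build a complete PSGI-style [status, headers, body] response."""
    headers = [
        ("Content-Type", content_type),
        # A known length lets the server skip chunked encoding.
        ("Content-Length", str(len(body))),
    ]
    return [status, headers, [body]]


print(oneshot(404, b"<html>not found</html>"))
```

Since the whole body is already in memory, its length is known up front and the server has no reason to frame it in chunks.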

[PATCH 00/11] www: export SQLite altid dumps

2020-03-20 Thread Eric Wong
To improve reproducibility in mirrors, altid dumps can be exported via "POST /$INBOX_URL/$prefix.sql.gz". $prefix is something like "gmane" (though the search prefix is "gmane:" with a colon). Eric Wong (11): qspawn: reinstate filter support, add gzip filter

[PATCH 09/11] wwwtext: show thread endpoint w/ indexlevel=basic

2020-03-20 Thread Eric Wong
And show contact info when there's no indexing, at all. Installations where Xapian is too expensive can still support threading since it only depends on SQLite, so we need to inform users of what's available. --- lib/PublicInbox/WwwText.pm | 10 +- 1 file changed, 9 insertions(+), 1 deleti

[PATCH 07/11] qspawn: handle ENOENT (and other errors on exec)

2020-03-20 Thread Eric Wong
As sqlite3(1) and other executables may become unavailable or uninstalled while a daemon runs, we need to gracefully handle errors in those cases. --- lib/PublicInbox/Qspawn.pm | 58 ++- t/httpd-corner.psgi | 7 + t/httpd-corner.t | 25 ++
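
In Python terms, the failure mode looks like this. It is a sketch using `subprocess`, with a hypothetical function name and return shape, and it has nothing to do with Qspawn's actual interface:

```python
import subprocess
import sys


def run_or_error(argv):
    """Run argv and return (status, body), surviving a failed exec."""
    try:
        proc = subprocess.run(argv, stdout=subprocess.PIPE,
                              stderr=subprocess.DEVNULL, check=False)
    except OSError as e:
        # FileNotFoundError (ENOENT), PermissionError (EACCES), ...
        return (500, ("exec failed: %s" % e.strerror).encode())
    if proc.returncode != 0:
        return (500, b"command failed")
    return (200, proc.stdout)


print(run_or_error(["/nonexistent/definitely-missing-binary"]))
print(run_or_error([sys.executable, "-c", "print('ok')"]))
```

A long-running daemon cannot assume that a binary present at startup is still installed hours later, so the error is converted into a response instead of an unhandled exception.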

[PATCH 06/11] mbox: need_gzip uses WwwStream::oneshot

2020-03-20 Thread Eric Wong
This makes the error page more consistent. Not that it really matters since Compress::Raw::Zlib and IO::Compress packages have been distributed with Perl since 5.10.x. Of course, zlib itself is also a dependency of git. --- lib/PublicInbox/Mbox.pm | 16 +++- 1 file changed, 7 inserti

[PATCH 04/11] extmsg: use WwwResponse::oneshot

2020-03-20 Thread Eric Wong
No reason to use the ->getline interface for small responses. --- lib/PublicInbox/ExtMsg.pm| 4 ++-- lib/PublicInbox/WwwStream.pm | 7 --- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/lib/PublicInbox/ExtMsg.pm b/lib/PublicInbox/ExtMsg.pm index 44884ad2..74a95cf9 100644 --

[PATCH 02/11] gzipfilter: lazy allocate the deflate context

2020-03-20 Thread Eric Wong
zlib contexts are memory-intensive, particularly when used for compression. Since the gzip filter may be sitting in a limiter queue for a long period, delay the allocation until we actually have data to translate, and not a moment sooner. --- lib/PublicInbox/GzipFilter.pm | 15 ++- 1 file c
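Lazy allocation of a compression context can be sketched in Python as below. This is an illustration of the technique, not a translation of GzipFilter.pm; the class name `LazyGzipFilter` and its methods are assumptions made for the example.

```python
import zlib


class LazyGzipFilter:
    """Gzip filter that allocates its zlib context on first real write."""

    def __init__(self):
        self._ctx = None  # no zlib state is held while the filter is queued

    def write(self, data: bytes) -> bytes:
        if not data:
            return b""
        if self._ctx is None:
            # wbits=31 selects gzip framing (16 + a 15-bit window).
            self._ctx = zlib.compressobj(wbits=31)
        return self._ctx.compress(data)

    def close(self) -> bytes:
        if self._ctx is None:
            # Nothing was ever written, so nothing is emitted here; a caller
            # needing a valid empty gzip stream must handle that case itself.
            return b""
        out = self._ctx.flush()
        self._ctx = None
        return out


f = LazyGzipFilter()
idle = f._ctx is None  # True: nothing allocated before the first write
blob = f.write(b"hello ") + f.write(b"world") + f.close()
```

The trade-off is one extra branch per write in exchange for holding no compression state while the filter waits in a queue.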

[PATCH 03/11] wwwstream: introduce oneshot API to avoid ->getline

2020-03-20 Thread Eric Wong
The ->getline API is only useful for limiting memory use when streaming responses containing multiple emails or log messages. However it's unnecessary complexity and overhead for callers (PublicInbox::HTTP) when there's only a single message. --- lib/PublicInbox/ViewVCS.pm | 8 +--- lib/Pub

[PATCH v2] t/www_listing: avoid 'once' warnings

2020-03-20 Thread Eric Wong
We reach into the WwwListing package directly to retrieve that JSON encoder/decoder object, and we can't rely on `use' since WwwListing loading may fail if Plack is missing. --- *sigh* v1 was wrong :x t/www_listing.t | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/t/www

[PATCH] t/msgtime: skip test if timezone isn't UTC

2020-03-20 Thread Eric Wong
Date::Parse falls back to using the local timezone when it's missing from an email, so only test in a reasonable TZ (UTC) for server software. --- t/msgtime.t | 4 1 file changed, 4 insertions(+) diff --git a/t/msgtime.t b/t/msgtime.t index 7c95e547..3f09fb4e 100644 --- a/t/msgtime.t +++ b/t

[PATCH 0/2] daemon: SIGUSR2 fixes

2020-03-22 Thread Eric Wong
I noticed we never had tests for SIGUSR2, so I started writing them and fixed two bugs. Eric Wong (2): daemon: fix SIGUSR2 upgrade with -W0 (no workers) daemon: unlink .oldbin PID file correctly lib/PublicInbox/Daemon.pm | 7 ++- t/httpd-unix.t| 100

[PATCH 1/2] daemon: fix SIGUSR2 upgrade with -W0 (no workers)

2020-03-22 Thread Eric Wong
Disabling workers via `-W0' blesses the contents of the @listeners array, so we need to ensure we call fcntl on the GLOB ref in ->{sock}. Add tests to ensure USR2 works regardless of whether workers are enabled or not. --- lib/PublicInbox/Daemon.pm | 3 ++ t/httpd-unix.t| 99

[PATCH 2/2] daemon: unlink .oldbin PID file correctly

2020-03-22 Thread Eric Wong
We need to track the PID file having ".oldbin" appended to it while a SIGUSR2 upgrade is in progress and ensure it is unlinked on SIGQUIT. --- lib/PublicInbox/Daemon.pm | 4 ++-- t/httpd-unix.t| 1 + 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/lib/PublicInbox/Daemon.
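The bookkeeping problem described here can be sketched in Python. This assumes, for illustration only, that the upgrade renames the PID file to a ".oldbin" name; the actual Daemon.pm logic is not shown in this listing and may differ. The class `PidFile` and its method names are hypothetical.

```python
import os
import tempfile


class PidFile:
    """Track the current on-disk name of a PID file across an upgrade."""

    def __init__(self, path):
        self.path = path
        with open(path, "w") as fh:
            fh.write(f"{os.getpid()}\n")

    def begin_upgrade(self):
        # Assumed behavior: the old process's PID file gains ".oldbin".
        old = self.path + ".oldbin"
        os.rename(self.path, old)
        self.path = old  # remember the new name so unlink() finds it

    def unlink(self):
        try:
            os.unlink(self.path)
        except FileNotFoundError:
            pass  # already gone; nothing to clean up


tmpdir = tempfile.mkdtemp()
pf = PidFile(os.path.join(tmpdir, "daemon.pid"))
pf.begin_upgrade()
renamed = os.path.exists(pf.path)
pf.unlink()  # what a shutdown signal handler would do
leftover = os.listdir(tmpdir)
```

Without updating the tracked path, the final unlink would target the original name and leave the ".oldbin" file behind.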

Re: p-i Gentoo package

2020-03-22 Thread Eric Wong
Thomas Schneider wrote: > Hi, > > I’ve created a package for Gentoo to ease installing public-inbox. It > is located in my overlay[0], which can be activated with `layman -a qsx` > or `eselect repository enable qsx`. Thanks Thomas. I don't know much about Gentoo these days, but hopefully it wa

[PATCH 3/2 (squash)] t/httpd-unix: fix race in SIGUSR2 test

2020-03-24 Thread Eric Wong
We need to stop workers in the old process, check the socket and ensure $new_pid is ready to receive signals before killing it. --- t/httpd-unix.t | 6 ++ 1 file changed, 6 insertions(+) diff --git a/t/httpd-unix.t b/t/httpd-unix.t index 939431f4..a0fe1e31 100644 --- a/t/httpd-unix.t +++ b/t/

[PATCH 2/3] wwwtext: show altid instructions in config

2020-03-26 Thread Eric Wong
Exposing altid dumps will help ensure total reproducibility of existing instances. AFAIK, sqlite3(1) can't execute arbitrary code, so it's not quite as fashionable as the "curl | bash" stuff the cool people are doing these days :P --- lib/PublicInbox/WwwText.pm | 20 ++-- 1 f

[PATCH 3/3] wwwaltid: inform users to use POST instead of GET

2020-03-26 Thread Eric Wong
Seeing the example config linkified, some users may inevitably try to follow it in a browser with a GET request. Provide a helpful message to inform users to use POST instead of attempting to treat /$INBOX/$ALTID.sql.gz as a Message-Id. --- lib/PublicInbox/WWW.pm | 2 ++ lib/PublicInbox/

[PATCH 0/3] wwwaltid: add pointers for usability

2020-03-26 Thread Eric Wong
Provide helpful hints and pointers in the existing config example to reproduce altid DBs when mirroring. Eric Wong (3): inbox: altid_map becomes a method wwwtext: show altid instructions in config wwwaltid: inform users to use POST instead of GET lib/PublicInbox/Inbox.pm| 15
