1/2 fixes a bug I found while checking over the blob OID indexing code
Eric Wong (2):
searchidx: drop redundant decl in index_git_blob_id
cindex: index full (40/64 char) hex blob OIDs
lib/PublicInbox/CodeSearchIdx.pm | 15 +--
lib/PublicInbox/SearchIdx.pm | 1 -
t/cindex.t
This future-proofs the index against git auto-abbreviation
needing more characters as the repo grows. It'll be useful for
joining against inboxes using dfpre.
As with emails, we'll continue indexing abbreviated blob OIDs
down to 7 hex characters so a SHA-1 git repo will have all
abbreviations of
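To illustrate the prefix scheme, here is a minimal C sketch, assuming every length from 7 hex characters through the full OID (40 for SHA-1, 64 for SHA-256) is indexed. It is not public-inbox's code (the real change is in the Perl modules listed in the diffstat above); `emit_oid_prefixes' and `track_longest' are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback invoked once per indexed prefix. */
typedef void (*prefix_cb)(const char *prefix, size_t len, void *arg);

/*
 * Call cb (if non-NULL) for every prefix of a full hex OID from
 * min_len characters up to the full length (40 hex chars for SHA-1,
 * 64 for SHA-256).  Returns the number of prefixes emitted, or 0 if
 * the OID is shorter than min_len or longer than 64 characters.
 */
static size_t emit_oid_prefixes(const char *oid, size_t min_len,
				prefix_cb cb, void *arg)
{
	size_t full = strlen(oid), n = 0;
	char buf[65]; /* a 64-char SHA-256 OID plus the NUL terminator */

	if (full < min_len || full >= sizeof(buf))
		return 0;
	for (size_t len = min_len; len <= full; len++) {
		memcpy(buf, oid, len);
		buf[len] = '\0';
		if (cb)
			cb(buf, len, arg);
		n++;
	}
	return n;
}

/* Example callback: remember the longest prefix length seen. */
static void track_longest(const char *prefix, size_t len, void *arg)
{
	size_t *longest = arg;

	(void)prefix; /* unused in this example */
	if (len > *longest)
		*longest = len;
}
```

For a SHA-1 OID this yields 34 terms (lengths 7 through 40), so a lookup by any abbreviation git might print, however long auto-abbreviation grows, still finds the blob.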
2/4 probably affects NetBSD and OpenBSD, too, but tests don't
always fail...
Eric Wong (4):
t/xap_helper: make sendmsg errors more obvious
xap_helper.h: fix non-assignable stderr case
tests: note kevent+tmpfs failures on DragonFly <= 6.4
xap_helper: enable stderr assignment on Dragon
I forgot to set TMPDIR=/path/to/non-tmpfs again.
---
lib/PublicInbox/TestCommon.pm | 23 ++-
t/dir_idle.t | 7 +--
t/kqnotify.t | 2 +-
3 files changed, 28 insertions(+), 4 deletions(-)
diff --git a/lib/PublicInbox/TestCommon.pm
By ignoring SIGPIPE, we hit our own error path and emit an informative
error message instead of dying abruptly and requiring somebody to run
`echo $?' to see the child status from their shell.
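To show the mechanism in isolation, a minimal C sketch (the actual change is a one-line edit to t/xap_helper.t): with SIGPIPE ignored, writing to a pipe whose reader has gone away fails with EPIPE, so the program reaches its own error path and can print a message instead of being killed by the signal. `write_to_closed_pipe' is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Ignore SIGPIPE (process-wide), then write to a pipe with no reader.
 * write(2) fails with EPIPE instead of the process dying, so we can
 * report the error ourselves.  Returns the errno of the failure, or 0
 * if the write unexpectedly succeeded.
 */
static int write_to_closed_pipe(void)
{
	int fds[2];
	ssize_t n;

	if (signal(SIGPIPE, SIG_IGN) == SIG_ERR || pipe(fds) != 0)
		return errno;
	close(fds[0]); /* no reader left */
	n = write(fds[1], "x", 1);
	if (n < 0) {
		int err = errno;

		fprintf(stderr, "write: %s\n", strerror(err));
		close(fds[1]);
		return err;
	}
	close(fds[1]);
	return 0;
}
```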
---
t/xap_helper.t | 1 +
1 file changed, 1 insertion(+)
diff --git a/t/xap_helper.t b/t/xap_helper.t
It looks like DragonFly inherited this from FreeBSD to
allow us to save some syscalls.
---
lib/PublicInbox/xap_helper.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/PublicInbox/xap_helper.h b/lib/PublicInbox/xap_helper.h
index c1ab66f3..1f8c426b 100644
---
I mixed up "flush" with "close" :x
Fixes: 87b7f633f241 (xap_helper: implement mset endpoint for WWW, IMAP, etc...)
---
lib/PublicInbox/xap_helper.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/PublicInbox/xap_helper.h b/lib/PublicInbox/xap_helper.h
index
Kyle Meyer wrote:
> Eric Wong writes:
> > +Treat the name of the public inbox as it's unqualified URL when
>
> s/it's/its/
Thanks, will push this fix out:
---8<--
Subject: [PATCH] doc: config: fix grammar for nameIsUrl
Reported-by: Kyle Meyer
Link: https://pub
As with mail search, a cindex may be updated while WWW is
serving requests. Thus we must reopen the Xapian DB when
the revision we're using becomes stale.
---
v2: avoid reintroducing load_ct as noted in
https://public-inbox.org/meta/20231130213641.M35664@dcvr/
lib/PublicInbox/CodeSearch.pm |
Eric Wong wrote:
> +++ b/lib/PublicInbox/CodeSearch.pm
> @@ -242,15 +247,21 @@ sub paths2roots {
> \%ret;
> }
>
> +sub load_ct { # retry_reopen cb
> + my ($self, $git_dir) = @_;
> + my @ids = docids_of_git_dir $self, $git_dir or return;
> + for
We no longer fork after cidx_init, so there's no need to spend
CPU cycles on the getpid() syscall, especially since it's no
longer cached on glibc while syscalls are also more expensive
these days due to CPU vulnerability mitigations.
---
lib/PublicInbox/CodeSearchIdx.pm | 22
We no longer vivify the intermediate $ibx->{-hide} hashref,
instead we use $ibx->{-hide_$KEY} directly. This avoids
an intermediate hashref and extra hash table lookups.
---
lib/PublicInbox/CodeSearch.pm | 2 +-
lib/PublicInbox/Inbox.pm | 8 ++--
lib/PublicInbox/WwwListing.pm | 2 +-
3
This is a convenient (and slightly memory-saving) alternative to
specifying a `publicinbox.*.url' entry for every single inbox
when using publicinbox.wwwListing.
---
Documentation/public-inbox-config.pod | 19 ++-
lib/PublicInbox/WwwListing.pm | 21 +
2
For inboxes associated with an extindex (currently only the
special "all" one), we can share the git process across
all those inboxes unambiguously when retrieving full SHA-1
blobs.
The comment for my proposed patch is also out-of-date as that
git speedup has been a part of git since 2.33.
---
As with mail search, a cindex may be updated while WWW is
serving requests. Thus we must reopen the Xapian DB when
the revision we're using becomes stale.
---
lib/PublicInbox/CodeSearch.pm | 25 +++--
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git
This brings a no-op -cindex scan of a git.kernel.org mirror
down from 70s to 10s with a hot cache on a busy machine.
CPU-intensive SHA-256 fingerprinting of the `git show-ref'
result can be parallelized on shard workers. Future changes can
move more of the initial scan setup phase into shard
When setting up stdin for commands, the write_file API is
convenient enough nowadays to not be worth having special
support with process spawning.
When reading stdout of commands, we should probably be using
utf8_maybe everywhere since there'll always be legacy encodings
in git repos.
Reading
It's entirely possible for public inboxes to have zero patches
in them, so the number of match slots may not match the
number of joined ekeys.
---
lib/PublicInbox/CodeSearch.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/PublicInbox/CodeSearch.pm
We no longer trigger git cleanups from the Inbox package since
`git cat-file' users have their own cleanup to support git
coderepos not associated with any inbox.
This change means we unconditionally expire SQLite and Xapian
FDs and some internal caches regardless of git activity. The
old logic
It saves some code in case we keep libgit2 around.
---
lib/PublicInbox/Gcf2.pm | 16
lib/PublicInbox/Git.pm | 27 ++-
2 files changed, 18 insertions(+), 25 deletions(-)
diff --git a/lib/PublicInbox/Gcf2.pm b/lib/PublicInbox/Gcf2.pm
index
Notable changes:
10/15 provides a huge speedup which will hopefully make
future developments faster.
12/15 probably obsoletes libgit2 for extindex "all" users.
13/15 can save some memory with many inboxes while making
configuration easier.
Eric Wong (15):
cindex: fix store_repo+r
This will allow WWW to use a combined LeiALE-like
thing to reduce git processes.
---
lib/PublicInbox/CodeSearch.pm| 27 --
lib/PublicInbox/CodeSearchIdx.pm | 161 +--
2 files changed, 127 insertions(+), 61 deletions(-)
diff --git
We only use it as a boolean flag, and there's no need to waste
space for common, non-error cases.
---
lib/PublicInbox/CodeSearchIdx.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/CodeSearchIdx.pm b/lib/PublicInbox/CodeSearchIdx.pm
index
This fixes the case where we're running both SHA-256 and SHA-1.
There are no tests for SHA-256 yet, but the bug is pretty obvious
upon reading the code.
---
lib/PublicInbox/CodeSearchIdx.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/PublicInbox/CodeSearchIdx.pm
It's possible to update the fingerprint for a given repo when we
have no commits to index on because they were already done for
another repo. Thus we'll always vivify $repo_ctx->{active}
before calling store_repo since $active may've been undef.
---
lib/PublicInbox/CodeSearchIdx.pm | 6 +++---
1
Explicitly drop support for "\n" in git coderepo pathnames as
we do other stuff. Gcf2 (our libgit2 helper) was always
broken with "\n" in pathnames, and I'm not sure if cgit config
files work with them, either. Dealing with newline characters
requires extra complexity that I'm not willing to
Štěpán Němec wrote:
>
> I apologize for the late response.
No worries, I still have mails in other places from months
ago I've been meaning to get to :x
> On Mon, 23 Oct 2023 19:58:18 +0000
> Eric Wong wrote:
>
> > Thanks for the info. Just curious, what
Thanks, both patches in this series pushed
Konstantin Ryabitsev wrote:
> On Tue, Nov 28, 2023 at 06:20:03PM +0000, Eric Wong wrote:
> > Though being able to find unanswered threads could be helpful.
>
> Note, I'm not saying it's not a cool feature. :) However, I imagine people
> would be more interested in searching
Konstantin Ryabitsev wrote:
> On Tue, Nov 28, 2023 at 05:35:09PM +0000, Eric Wong wrote:
> > > I understand the reasoning, but I'm not sure we should be trying too hard
> > > to
> > > make public-inbox a patch tracking platform. What makes lei great is
This ensures the /all/ extindex can have auto-associations
with coderepos just like normal inboxes do.
---
lib/PublicInbox/CodeSearch.pm | 9 +
1 file changed, 9 insertions(+)
diff --git a/lib/PublicInbox/CodeSearch.pm b/lib/PublicInbox/CodeSearch.pm
index 7c0dd063..5c5774cf 100644
---
--no-import-before skips importing entire messages, not just
keywords, so it can cause permanent data loss if -o is pointed
to precious data.
---
Documentation/lei-q.pod | 5 +++--
lib/PublicInbox/LEI.pm | 1 +
t/lei-q-kw.t| 19 ---
3 files changed, 20
We need to load the proper package and fully-qualify the sub
call since we shouldn't load Hval in lei. Some users use this
feature even if it's broken, oh well :<
---
lib/PublicInbox/MailDiff.pm | 7 ++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/MailDiff.pm
Found by tidy(1) while dealing with other stuff.
---
lib/PublicInbox/MailDiff.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/PublicInbox/MailDiff.pm b/lib/PublicInbox/MailDiff.pm
index 89284e39..e4e262ef 100644
--- a/lib/PublicInbox/MailDiff.pm
+++
Well, I actually found the mail_diff bugs while looking into
micro-optimizing -cindex.
Eric Wong (4):
lei q: fix --no-import-before completion + docs
www: mail_diff: fix optional address obfuscation
www: mail_diff: add final newline before diffing
www: mail_diff: add missing tag
This gets rid of the "\ No newline at end of file"
since it's distracting noise.
---
lib/PublicInbox/MailDiff.pm | 2 +-
t/lei-mail-diff.t | 1 +
t/psgi_v2.t | 1 +
3 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/MailDiff.pm
Konstantin Ryabitsev wrote:
> On Tue, Nov 28, 2023 at 12:10:28AM +0000, Eric Wong wrote:
> > Would they be useful?
> >
> > It's not currently possible to quickly search for whether or not
> > a term (e.g. patchid:) is present in a Xapian document. Having
> >
It ought to help a bit with organization since xap_helper.h
is getting somewhat large and we'll need new endpoints to
support WWW, lei, and whatever else needs to come.
---
MANIFEST| 1 +
lib/PublicInbox/XapHelperCxx.pm | 10 +-
lib/PublicInbox/xap_helper.h|
Code search will require SCM_RIGHTS, and Inline::C on BSDs
probably isn't too onerous a dependency for new features as
all the ones I've tested have it packaged.
Furthermore, requiring SCM_RIGHTS isn't far off since OpenBSD's
Perl is patched to route the `syscall' perlop through libc[1],
while
Absolute pathnames of git coderepos are stored in the cindex,
but we should favor paths relative to $ENV{PWD} since it
respects symlinks in the hierarchy.
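One way to implement the "favor $PWD" rule, sketched in C under the assumption that it mirrors what `pwd -L' does: trust $PWD only when it is absolute and still refers to the same directory as `.' (same device and inode), otherwise fall back to the physical path from getcwd(3). The real code is Perl using $ENV{PWD}; `logical_cwd' is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Return the logical working directory: $PWD if it is absolute and
 * names the same directory as ".", otherwise the physical path.
 * The logical path keeps any symlinks the user cd'ed through.
 * The caller frees the result; returns NULL on failure.
 */
static char *logical_cwd(void)
{
	const char *pwd = getenv("PWD");
	struct stat a, b;

	if (pwd && pwd[0] == '/' && stat(pwd, &a) == 0 &&
	    stat(".", &b) == 0 &&
	    a.st_dev == b.st_dev && a.st_ino == b.st_ino)
		return strdup(pwd);
	/* getcwd(NULL, 0) allocating the buffer is a glibc/BSD extension */
	return getcwd(NULL, 0);
}
```

Storing paths built from the logical directory means a symlink such as /srv/cindex -> /mnt/old-disk can later be repointed at new storage without the recorded paths going stale.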
Respecting symlinks makes it easier to migrate cindex to
new storage as old storage wears out and to relocate the
storage device onto another
Only worktrees need to use `git rev-parse --git-path', so avoid
the spawn overhead of a new process. With the SolverGit.pm
limit on coderepo scans disabled and scanning over 800 git repos
for git@vger matches, this reduces xt/solver.t times by
roughly 25%.
---
lib/PublicInbox/Git.pm | 17
File::Spec->abs2rel doesn't touch the filesystem at all when
given an absolute base arg ($env->{PATH_INFO}), so we can rely
on it to generate relative links that work with the `mount'
from Plack::Builder and also people running `wget -r' mirrors.
---
lib/PublicInbox/Hval.pm | 12 +++-
1
We don't want hundreds of git cat-file processes for coderepos
lingering around.
---
lib/PublicInbox/Git.pm | 7 ++-
lib/PublicInbox/SolverGit.pm | 3 +++
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index
The HTML is still extremely rough, but links seem to be mostly
working...
---
MANIFEST | 1 +
lib/PublicInbox/CodeSearch.pm | 8 +++
lib/PublicInbox/RepoList.pm| 39 ++
lib/PublicInbox/WwwCoderepo.pm | 3 +++
Data::Dumper+B::Deparse seems fast enough to generate cache keys
with, so this makes updating and developing tests easier (as
opposed to forcing the developer to change the identifier). The
main downside is we'll have to deal with cache expiration, but
"make clean" seems overly aggressive already
Accepting @ARGV without switches ends up being ambiguous with
optional parameters for --join and --show. Requiring users to
specify `--join=' or `--show=' is a bit awkward (as it is with
-clone --objstore= and the like, but that is historical baggage
we need to carry at this point...)
---
<5 minutes if
done frequently.
New performance problem: solver could definitely be smarter
about dealing with common roots/groups. For the longest time,
I've only had 1 coderepo per inbox; having hundreds is wacky.
Actual searching against the cindex isn't done, yet, but
that's kinda straightf
The C++ version will allow us to take full advantage of Xapian's
APIs for better queries, and the Perl bindings version can still
be advantageous in the future since we'll be able to support
timeouts effectively.
---
MANIFEST| 1 +
Makefile.PL | 8
This becomes noticeable when loading lots of coderepos on
my local mirror of git.kernel.org now that we can load repos
from cindex.
---
lib/PublicInbox/Git.pm | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index
This is a major step in solving the problem of having to
manually associate hundreds/thousands of coderepos with
hundreds/thousands of public-inboxes to power solver
(and more).
---
lib/PublicInbox/CodeSearch.pm| 153 +--
lib/PublicInbox/CodeSearchIdx.pm | 42
We store the full path name and xap_terms already removes
the `P' character, so the loop and substr calls are a
no-op replacing `/' with `/'.
---
lib/PublicInbox/CodeSearch.pm | 1 -
1 file changed, 1 deletion(-)
diff --git a/lib/PublicInbox/CodeSearch.pm b/lib/PublicInbox/CodeSearch.pm
index
We don't want to be accessing uninitialized variables on
process teardown since much of our control flow revolves
around DESTROY for dependency handling.
---
lib/PublicInbox/CodeSearchIdx.pm | 5 +
1 file changed, 5 insertions(+)
diff --git a/lib/PublicInbox/CodeSearchIdx.pm
Would they be useful?
It's not currently possible to quickly search for whether or not
a term (e.g. patchid:) is present in a Xapian document. Having
the ability to do so would make it easier to find non-patch messages,
or easily filter down to cover letters, bot replies, etc...
Thus adding
While MTAs seem to stop '\0' from appearing in headers, users
fetching archives via git remain susceptible to having '\0' land
in archives. So we'll filter them out of Xapian and SQLite DBs
to avoid interoperability problems with CLI tools since there are no
known messages in lore or any of my
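A minimal sketch of the filtering step in C, assuming NUL bytes are simply dropped rather than replaced (the message doesn't say which); `strip_nul' is a hypothetical helper, not public-inbox's actual code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Remove every NUL byte from buf[0..len) in place, keeping the order
 * of the remaining bytes.  Returns the new length.
 */
static size_t strip_nul(char *buf, size_t len)
{
	size_t out = 0;

	if (!memchr(buf, '\0', len))
		return len; /* common case: nothing to filter */
	for (size_t i = 0; i < len; i++)
		if (buf[i] != '\0')
			buf[out++] = buf[i];
	return out;
}
```

The memchr(3) fast path keeps the common, NUL-free case to a single scan.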
Thanks, pushed as commit 577e421a0815e66f965bd4317adad5aeea3cc52a
with your Tested-By (sent privately in <87leaj4ea1@collabora.com>)
It's not async-signal-safe and the glibc implementation uses
malloc via asnprintf. In practice, it's not a problem unless the
kernel OOMs and the write(2) to the self-pipe fails.
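For illustration, a minimal C sketch of a handler that sticks to async-signal-safe calls (this is not the actual xap_helper.h code): it writes the signal number to a self-pipe, preserves errno, and records a failed write in a flag so the main loop can report it later, outside the handler, with strerror(3). `self_pipe', `on_signal' and `setup_self_pipe' are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <unistd.h>

static int self_pipe[2] = { -1, -1 };
static volatile sig_atomic_t pipe_write_failed;

/*
 * Only async-signal-safe calls here: write(2) to the self-pipe.
 * errno is saved and restored so the interrupted code isn't confused,
 * and a failure sets a flag instead of calling strerror(3).
 */
static void on_signal(int signo)
{
	int saved_errno = errno;
	unsigned char b = (unsigned char)signo;

	if (write(self_pipe[1], &b, 1) != 1)
		pipe_write_failed = 1;
	errno = saved_errno;
}

/* Create the self-pipe and install the handler; returns 0 on success. */
static int setup_self_pipe(int signo)
{
	struct sigaction sa = { .sa_flags = 0 };

	if (pipe(self_pipe) != 0)
		return -1;
	sigemptyset(&sa.sa_mask);
	sa.sa_handler = on_signal;
	return sigaction(signo, &sa, NULL);
}
```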
---
lib/PublicInbox/xap_helper.h | 29 -
1 file changed, 12 insertions(+), 17 deletions(-)
Already pushed out since I forgot which VM I was on :x
Eric Wong (2):
xap_helper: avoid strerror(3) inside signal handler
xap_helper.h: avoid some off_t vs size_t problems
lib/PublicInbox/xap_helper.h | 59 ++--
1 file changed, 30 insertions(+), 29 deletions(-)
We'll introduce a helper to cast off_t to size_t consistently
for mmap/munmap/calloc calls which require size_t. Also, an
extra check for multiplication overflow can be helpful just
in case we end up with a gigantic file roots file.
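A sketch of what such helpers could look like in C; `off2size' and `mul_size' are hypothetical names and the real xap_helper.h helper may differ. The conversion rejects negative sizes and values larger than SIZE_MAX (possible on 32-bit systems with a 64-bit off_t), and the multiplication check refuses to wrap around.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Convert a file size (off_t) to size_t for mmap/munmap/calloc.
 * Returns 0 on success; -1 with errno = EOVERFLOW if the value is
 * negative or does not fit in size_t.
 */
static int off2size(off_t off, size_t *out)
{
	if (off < 0 || (uintmax_t)off > SIZE_MAX) {
		errno = EOVERFLOW;
		return -1;
	}
	*out = (size_t)off;
	return 0;
}

/*
 * Compute nmemb * size, returning -1 with errno = EOVERFLOW instead
 * of silently wrapping around.
 */
static int mul_size(size_t nmemb, size_t size, size_t *out)
{
	if (size != 0 && nmemb > SIZE_MAX / size) {
		errno = EOVERFLOW;
		return -1;
	}
	*out = nmemb * size;
	return 0;
}
```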
---
lib/PublicInbox/xap_helper.h | 30
Ricardo Cañuelo wrote:
> where the '&' character is escaped in the text of the tag but
> not in the href attributes. Shouldn't these be escaped as well? If so,
> the fix should be most likely located in WwwAtomStream.pm:atom_header().
Thanks for the bug report. Yes, '&' should be escaped,
The LibreSSL 3.7.2 on my OpenBSD 7.3 VM seems to return 7 bytes of
junk data before EOF/ECONNRESET when a client attempts to write
plain-text to a TLS socket.
---
t/nntpd-tls.t | 8 ++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/t/nntpd-tls.t b/t/nntpd-tls.t
index
Reset gets called on END{} anyways to work around DBI lifetime
problems, so there's no need to call it near exit. We can't
replace calls to POSIX::_exit with `exit' to force END{} to
run just yet, as there are still some lingering destruction
ordering problems on newer DBI and/or Perls.
---
6/7 ought to fix another hang in t/lei-q-save.t when writing to
v2 outputs.
Much of this stuff will be relevant to code search since Xapian
searches will be moved to C++ (if available) to support features
which aren't usable from Perl bindings and allow more
predictable performance anyways.
Eric
The long-term plan is to share non-blocking read buffering logic
with HTTP/NNTP/IMAP/POP3 and also XapClient.
---
lib/PublicInbox/Gcf2Client.pm | 1 -
lib/PublicInbox/Git.pm| 59 ++-
lib/PublicInbox/IO.pm | 53 ++-
3
This ensures our tests actually test the -j0 and -j1 cases
properly.
---
lib/PublicInbox/XapClient.pm | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/PublicInbox/XapClient.pm b/lib/PublicInbox/XapClient.pm
index 7737e30d..1f9ddccc 100644
--- a/lib/PublicInbox/XapClient.pm
+++
Reset gets called on END{} anyways to work around DBI lifetime
problems, so there's no need to call it near exit.
We'll also replace many calls to POSIX::_exit with the normal
`exit'. This ensures END{} gets called since all of our
destructors are fork-safe nowadays so POSIX::_exit is
While the {inflight} array should be tied to the IO object even
more tightly, that's not an easy task with our current code. So
take some small steps by introducing a gcf_inflight helper to
validate the ownership of the process and to drain the inflight
array via the awaitpid callback.
This
As with our popen_* uses, we can simplify callers by using
attach_pid to handle automatic reaping upon close.
---
lib/PublicInbox/CodeSearchIdx.pm | 10 ++
lib/PublicInbox/XapClient.pm | 4 +++-
t/xap_helper.t | 3 +--
3 files changed, 6 insertions(+), 11
No need to waste memory bandwidth when we can just rely on
the preprocessor to load the header.
---
lib/PublicInbox/XapHelperCxx.pm | 10 +++---
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/lib/PublicInbox/XapHelperCxx.pm b/lib/PublicInbox/XapHelperCxx.pm
index
This also reduces repetition in the setup code.
---
lib/PublicInbox/XapClient.pm| 4 +---
lib/PublicInbox/XapHelperCxx.pm | 1 +
t/xap_helper.t | 2 +-
3 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/lib/PublicInbox/XapClient.pm b/lib/PublicInbox/XapClient.pm
We can't assume signals are blocked when neither signalfd nor
EVFILT_SIGNAL are in use. So just return an empty result so
the caller can recalculate the timeout.
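The retry contract can be sketched in C with poll(2); PublicInbox::DSPoll is Perl, so `poll_once' is a hypothetical illustration rather than its code. On EINTR the wrapper reports zero ready descriptors, and the caller's loop recomputes its timeout before polling again.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>

/*
 * Wait for events, treating EINTR as "nothing ready yet".  Returning 0
 * lets the caller recalculate its timeout (timers may have expired
 * while a signal handler ran) before polling again.
 * Returns the number of ready fds, 0 on timeout or EINTR, -1 on error.
 */
static int poll_once(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
	int n = poll(fds, nfds, timeout_ms);

	if (n < 0 && errno == EINTR)
		return 0; /* empty result: caller recomputes timeout, retries */
	return n;
}
```

Returning an empty result rather than retrying inside the wrapper keeps the timeout calculation in one place, the caller's loop.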
I found this bug while making xt/httpd-async-stream.t
use our event loop to reap processes but have abandoned
that effort for now
Eric Wong (3):
http: fix HTTP/1.1 pipelining during long async requests
select+poll: have caller retry on EINTR
ds: long_step: eliminate redundant fileno call
lib/PublicInbox/DS.pm | 1 -
lib/PublicInbox/DSPoll.pm | 6 ++--
lib/PublicInbox/HTTP.pm | 17 +-
lib/PublicInbox
We already stash the associated FD for reporting at startup and
don't need to call `fileno' again. Found via manual code
inspection while considering the effort to make async {forward}
from PublicInbox::HTTP more like the generic long_response API
and {long_cb} field used by IMAP/NNTP/POP3.
---
We must not attempt to read request bodies from the HTTP client
while processing a long request since that drains pipelined
requests. The NNTP/IMAP/POP3 event_step callbacks follow the
same behavior when {long_cb} is present from ->long_response.
This bug has little real-world consequence since
Eric Wong wrote:
> Konstantin Ryabitsev wrote:
> > I'm quite happy to not require libgit2 -- I've always found it easier to
> > just
> > use git plumbing commands even if this requires exec'ing an external
> > executable.
Sorry, I forget, individual inboxes via
Eric Wong wrote:
> Štěpán Němec wrote:
> > Eric Wong wrote:
> > > +Rerun deduplication on messages of with the given Message-ID or
> >^^^
> > not so fast :-P
>
> Thanks. Will s/of // when I commit when more awake.
Štěpán Němec wrote:
> Eric Wong wrote:
> >
> > I'm also wondering if it's necessary to have a blurb about NOT
> > supporting comma-delimited Message-IDs on the CLI, since some
> > strange Message-IDs may have a comma in them.
>
> I think the description is a
We don't want the milter to munge List-Unsubscribe headers from
external (incoming) mlmmj lists, only lists hosted on the server
running unsubscribe.milter.
Adding support for an allow_domains file should've been enough,
but this further restricts the milter to only operating on Postfix
Yes, that was valid Perl syntax :x
---
t/cindex-join.t | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/t/cindex-join.t b/t/cindex-join.t
index 0972afa4..2836eb6c 100644
--- a/t/cindex-join.t
+++ b/t/cindex-join.t
@@ -41,7 +41,7 @@ EOM
while (my ($url, $v, $ng) =
Štěpán Němec wrote:
> Eric Wong wrote:
> > +++ b/Documentation/public-inbox-extindex.pod
> > @@ -47,6 +47,20 @@ C set to C and their respective Xapian
> > public-inboxes where cross-posting is common, this allows
> > significant space savings on Xapian indices.
>
`reset' means we want to ignore existing join data, while
the default (non-reset) means we perform an incremental
join while taking into account existing (fuzzy) join data.
---
lib/PublicInbox/CodeSearchIdx.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
We've had it since v1.7.0 when -extindex was introduced,
but it was never documented outside of commit messages.
---
Documentation/public-inbox-extindex.pod | 26 +
1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/Documentation/public-inbox-extindex.pod
This fixes t/lei-q-save.t getting stuck since $self->{ale} is
already gone by the time DESTROY gets called.
---
lib/PublicInbox/LeiSavedSearch.pm | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/lib/PublicInbox/LeiSavedSearch.pm b/lib/PublicInbox/LeiSavedSearch.pm
For users hosting read-only mirrors (via clone|fetch) and feeding
inboxes via -watch
---
I'm also considering a `fetchonly' directive for -learn/-mda,
too; but I think overloading watch can coexist with that...
Documentation/public-inbox-watch.pod | 5 -
lib/PublicInbox/Watch.pm
We only care about error checking when stdout is an mbox output
pointed to a pathname. This is noticeable with `lei up' with
multiple non-mbox* destinations. We'll also ensure throwing
exceptions to trigger lei->x_it from lei->do_env results in the
epoll/kqueue watch being discarded, otherwise
We'll also disable GC since fetch/clone already leaves us with
packs.
---
Will squash this into 3/3
t/cindex-join.t | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/t/cindex-join.t b/t/cindex-join.t
index fad30d93..0972afa4 100644
--- a/t/cindex-join.t
+++
3/3 fleshes out more join functionality, including storing the
join data in compressed JSON as Xapian metadata and loading it
as a Perl hash won't be excessive (compared to having 30-50k
inbox names+paths in memory).
Eric Wong (3):
cindex: avoid unneeded and redundant `local' calls
doc/cindex
We only set $MAX_SIZE at startup, and there's no need to
use a local $self->{roots} for the per-repo roots array.
---
lib/PublicInbox/CodeSearchIdx.pm | 13 ++---
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/lib/PublicInbox/CodeSearchIdx.pm
The association data is just stored as deflated JSON in Xapian
metadata keys of shard[0] for now. It should be reasonably
compact and fit in memory for now since we'll assume sane,
non-malicious git coderepo history, for now.
The new cindex-join.t test requires TEST_REMOTE_JOIN=1 to be
set in
There's no point in duplicating --no-fsync documentation across
manpages. --dangerous can be useful for reducing SSD wear, so
add a pointer to it as well.
---
Documentation/public-inbox-cindex.pod | 7 +--
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git
Informal benchmarks show a rough 5% indexing improvement on an
SMP system when there are idle cores due to Xapian shards being
I/O bound (since `git patch-id' is mainly CPU bound).
This is only parallelized on a per-patch basis. Further
increasing parallelism would increase complexity and
I encountered the odd lack of `return' while chasing Gcf2 bugs
on CentOS 7.x which resulted in commit 7d06b126e939
("gcf2: fix autodie usage for older Perl") and commit e618c7654794
("gcf2client: add alias for PublicInbox::Git::fail") before
realizing the lack of `return' here wasn't the culprit
We want to use the filenames tail will watch, not the number of
args passed to the `tail_f' subroutine.
Fixes: 9231d2e7b93f (tests: map CLOFORK->FD_CLOEXEC temporarily for `tail -f')
---
lib/PublicInbox/TestCommon.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
"Robin H. Johnson" wrote:
> Hi,
>
> This is more of a feature request / request for pointers on how to tweak
> the design to support something, and it might be suited to maintaining
> as a local patch.
Since the indexing internals are somewhat in flux and tied to
Xapian and Perl, I'm happy to
"Robin H. Johnson" wrote:
> The date is based on arrival time at the archive ingest.
>
> For some of the very old lists, we do have a list of message-ids that we
> know existed but aren't captured in the archive, and those mails have
> been added to the old locations if they are ever found
"Robin H. Johnson" wrote:
> Hi!
>
> Writing to see about work in converting Gentoo's (now-broken) other
> archives web interface over into using public-inbox instead.
>
> This is the first of a few questions/bumps along the way.
>
> For historical reasons on the scaling side, the archive
Stale entries from newsgroup name changes (including adding
a `publicinbox..newsgroup' entry when none existed
before) can wreak havoc during a --reindex. So give the
hint to users about running -extindex with --gc to clean
up stale entries.
---
Documentation/public-inbox-extindex.pod | 5 +++--
Konstantin Ryabitsev wrote:
> I'm quite happy to not require libgit2 -- I've always found it easier to just
> use git plumbing commands even if this requires exec'ing an external
> executable.
Yeah, I don't have libgit2 installed on most of my systems, either.
Hoping git itself eventually makes
We need to consistently check the exit code of pigz|gzip|xz|bzip2
when writing to compressed mboxes (or bad storage).
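A minimal C sketch of the check (lei itself is Perl; `pclose_checked' is a hypothetical name): after writing through popen(3), the wait status from pclose(3) is the only reliable sign that the compressor succeeded, since it can fail after all our writes appear to have gone through, e.g. when the disk fills up.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/*
 * Close a stream opened with popen(3) and check the child's exit
 * status.  Returns 0 only if the child exited normally with status 0.
 */
static int pclose_checked(FILE *fp, const char *cmd)
{
	int status = pclose(fp);

	if (status == -1) {
		perror("pclose");
		return -1;
	}
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
		fprintf(stderr, "`%s' failed (wait status %d)\n", cmd, status);
		return -1;
	}
	return 0;
}
```

For example, `popen("gzip -c >out.mbox.gz", "w")' followed by writes and pclose_checked() catches a gzip that died on ENOSPC.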
---
lib/PublicInbox/LeiConvert.pm | 4 ++--
lib/PublicInbox/LeiToMail.pm | 11 +++
lib/PublicInbox/LeiXSearch.pm | 9 +
3 files changed, 14 insertions(+), 10
We've always forced LeiToMail to only have one process for v2
outputs anyways since v2 has its own sharding and IPC. Thus we
can use the single LeiToMail process directly to avoid extra IPC
overhead.
---
lib/PublicInbox/LeiConvert.pm | 7 ++-
lib/PublicInbox/LeiToMail.pm | 19
working on 3/4.
Eric Wong (4):
lei: fix idempotent STDERR redirect in workers
lei convert: fix repeat and idempotent v2 output
lei: avoid extra fork for v2 outputs
lei q|up|convert: common finish_output to detect errors
lib/PublicInbox/LEI.pm | 2 +-
lib/PublicInbox/LeiConvert.pm