There's a lot of crap in archives and git-fast-import
accepts empty names and email addresses for authors
just fine.
---
lib/PublicInbox/Import.pm | 27 +++
1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
There are a lot of weird characters which show up in LKML archives
which we did not support before. Furthermore, allow spaces
before the '>' in the From: line as at least some non-spam
poster used it.
---
lib/PublicInbox/Address.pm | 3 ++-
t/address.t | 5 +++--
2 files changed, 5
This will allow easier compatibility with v2 code, which will
introduce content_id as the unique identifier.
The old "XMID" becomes "XM" as a free text searchable term.
"Q" becomes "XMID" as a boolean prefix.
There are no user-visible changes in this, but there needs to
be a schema version bump
The mboxes I got from cregit have two spaces after the email
address, while the "git format-patch" output I'm used to dealing
with only has one space.
It's still a "strict" match in that it checks for something
resembling a timestamp, but it relaxes the number of spaces
between the email address
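
A minimal sketch of the relaxed match described above. The pattern is hypothetical and is not the project's actual regex: it accepts one or more spaces after the sender token but still insists on something resembling a timestamp.

```perl
use strict;
use warnings;

# Hypothetical pattern: "From ", a sender token, ONE OR MORE spaces,
# then something resembling an asctime(3) timestamp.
my $FROM_LINE = qr/\AFrom \S+ +[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d\d:\d\d:\d\d \d{4}/;

sub is_from_line {
    my ($line) = @_;
    return $line =~ $FROM_LINE ? 1 : 0;
}

# One space and two spaces after the sender both match
print is_from_line('From a@example.com Mon Sep 17 00:00:00 2001'), "\n";
print is_from_line('From a@example.com  Thu Jan  1 00:00:00 1970'), "\n";
# Body text that merely starts with "From " does not
print is_from_line('From the start, this was slow'), "\n";
```

Keeping the timestamp check is what stops ordinary body lines beginning with "From " from being mistaken for message separators.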
Check for this before doing the Xapian-based v2 importer.
---
t/import.t | 25 -
1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/t/import.t b/t/import.t
index fb6238e..92c82b9 100644
--- a/t/import.t
+++ b/t/import.t
@@ -6,7 +6,10 @@ use Test::More;
use
Big lists are orders of magnitude more efficient with v2.
---
scripts/import_vger_from_mbox | 24 ++--
1 file changed, 6 insertions(+), 18 deletions(-)
diff --git a/scripts/import_vger_from_mbox b/scripts/import_vger_from_mbox
index 3fa5c77..6ea2ca5 100644
---
Wrap "get-mark" and "checkpoint" commands for git-fast-import
while documenting/cementing parts of the API.
---
lib/PublicInbox/Import.pm | 28 ++--
t/import.t | 4 +++-
2 files changed, 25 insertions(+), 7 deletions(-)
diff --git
I decided not to copy the notmuch implementation regarding
serialization of integers to Xapian metadata.
---
lib/PublicInbox/SearchIdx.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 0ee0779..fa5057f 100644
Call order will need to change a bit since this is going to be
tied to Xapian
---
MANIFEST | 1 +
lib/PublicInbox/ContentId.pm | 30 ++
lib/PublicInbox/Import.pm | 74 +++-
3 files changed, 91 insertions(+), 14
Hostnames can contain '-' and this allows public-inbox-watch(1)
to work on machines which generate Maildir files with '-' in
them.
---
lib/PublicInbox/WatchMaildir.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/PublicInbox/WatchMaildir.pm
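
As a rough illustration of the hostname issue, here is a hypothetical Maildir basename check of the form time.unique.hostname[:2,flags]. The pattern is an assumption made for this sketch, not the one in WatchMaildir.pm; its only point is that the hostname character class includes '-'.

```perl
use strict;
use warnings;

# Hypothetical basename pattern; the hostname part allows '-' and '.'
my $MAILDIR_NAME = qr/\A\d+\.[^.:\/]+\.[A-Za-z0-9_.-]+(?::2,[A-Za-z]*)?\z/;

sub looks_like_maildir_name {
    my ($name) = @_;
    return $name =~ $MAILDIR_NAME ? 1 : 0;
}

print looks_like_maildir_name('1520000000.M1P2.mail-01.example.com:2,S'), "\n";
print looks_like_maildir_name('1520000000.M1P2.host'), "\n";
print looks_like_maildir_name('README'), "\n";
```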
plagued the initial v1 design.
There's also a couple of small fixes along the way to make it
tolerate some crap in older archives.
The search indexer and content-based deduplication will
still need to be worked on.
Eric Wong (Contractor, The Linux Foundation) (17):
AUTHORS: add The Linux
The parallelization requires splitting Msgmap, text+term
indexing, and thread-linking out into separate processes.
git-fast-import is fast, so we don't bother parallelizing it.
Msgmap (SQLite) and thread-linking (Xapian) must be serialized
because they rely on monotonically increasing numbers
It is less confusing without the clobber assignment; and
PublicInbox::MIME exists to work around bugs in older
Email::MIME (which is in Debian 9 (stretch))
---
scripts/import_vger_from_mbox | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/scripts/import_vger_from_mbox
This is too slow, currently. Working with only 2017 LKML
archives:
git-only: ~1 minute
git + SQLite: ~12 minutes
git + Xapian + SQLite: ~45 minutes
So yes, it looks like we'll need to parallelize Xapian indexing,
at least.
---
lib/PublicInbox/Import.pm | 1 +
We want to reduce the time the main V2Writable process
spends writing to the pipe, as the main process itself is
the primary source of contention.
While we're at it, always flush after writing to ensure
the child sees it at once. (Grr... Perl doesn't use writev)
---
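
One way to read the change above, sketched under assumptions: build the whole message in memory, hand it to the pipe with a single print, and flush immediately. The helper name and the message fields are hypothetical, and the pipe is read back in the same process only to keep the example self-contained.

```perl
use strict;
use warnings;
use IO::Handle;

pipe(my $r, my $w) or die "pipe: $!";

# Join all fields into one buffer so the writer issues a single print,
# then flush so the reader sees the data right away.
sub send_msg {
    my ($fh, @fields) = @_;
    my $buf = join("\n", @fields) . "\n";
    print $fh $buf or die "print: $!";
    $fh->flush or die "flush: $!";
}

send_msg($w, 'index', 'deadbeef', '42');

# Without the flush above, these reads would block on an empty pipe
my @got = map { scalar <$r> } 1 .. 3;
chomp @got;
print "@got\n";
```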
Despite email not existing until 1971, "Jan 1, 1970 00:00:00"
seems like a common default timestamp for some test emails
to use as a Date: header.
---
lib/PublicInbox/Import.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/PublicInbox/Import.pm
This will let us quickly test between v2 and v1 inboxes.
---
scripts/import_vger_from_mbox | 24 +++-
1 file changed, 19 insertions(+), 5 deletions(-)
diff --git a/scripts/import_vger_from_mbox b/scripts/import_vger_from_mbox
index d30e8a3..abc2d37 100644
---
Since we'll be adding new repositories to the `alternates' file
in git, we must restart the `git cat-file --batch' process as
git currently does not detect changes to the alternates file
in long-running cat-file processes.
Don't bother with the `--batch-check' process since we won't be
using it
In general, they are, but there's no way for a general-purpose
mail server to enforce that. This is a step in allowing us
to handle more corner cases which existing lists throw at us.
---
lib/PublicInbox/ExtMsg.pm | 2 +-
lib/PublicInbox/Search.pm | 14 --
Wrap the old Import package to enable creating new repos based
on size thresholds. This is better than relying on time-based
rotation as LKML traffic seems to be increasing.
---
MANIFEST | 1 +
lib/PublicInbox/Git.pm | 12 +++
lib/PublicInbox/Import.pm | 9
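
Size-based rotation can be sketched roughly as follows. The helper, the state hash and the 1000-byte threshold are all hypothetical; the snippet only shows the idea of rolling over to a new repository once the current one would exceed a size limit, independent of time.

```perl
use strict;
use warnings;

# Hypothetical helper: decide which numbered repo the next message
# goes into, starting a new one once the current one is too large.
sub pick_repo {
    my ($state, $msg_bytes, $max_bytes) = @_;
    if ($state->{bytes} > 0 && $state->{bytes} + $msg_bytes > $max_bytes) {
        $state->{repo}++;       # start a new repository
        $state->{bytes} = 0;
    }
    $state->{bytes} += $msg_bytes;
    return $state->{repo};
}

my $state = { repo => 0, bytes => 0 };
my @repos = map { pick_repo($state, $_, 1000) } (400, 400, 400, 1200, 10);
print "@repos\n";
```

An oversized message still gets a repository of its own in this sketch, since rotation only happens when the current repository already holds data.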
This likely has no real world implications, though, as we
fall back to Msgmap lookups anyways.
Broken since commit 7eeadcb62729b0efbcb53cd9b7b181897c92cf9a
("search: remove unnecessary abstractions and functionality")
---
lib/PublicInbox/ExtMsg.pm | 2 +-
1 file changed, 1 insertion(+), 1
rors and bugs quickly in the PSGI interface.
For sorting, relying on the Date: header seems unreliable as
kernel developers seem more prone to having bad clocks than
other lists I've imported. I'll probably switch the internal
timestamps to use the Received: date as a result.
Eric Wong (Contract
It works around some bugs in older Email::MIME which we'll
find useful.
---
lib/PublicInbox/MIME.pm | 2 ++
lib/PublicInbox/SearchIdx.pm| 2 --
lib/PublicInbox/V2Writable.pm | 2 --
lib/PublicInbox/WatchMaildir.pm | 2 --
lib/PublicInbox/WwwAttach.pm | 3 +--
This should give us an idea of how much a problem deduplication
will be.
---
lib/PublicInbox/SearchIdx.pm | 6 --
lib/PublicInbox/V2Writable.pm | 2 +-
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index
We need to ensure Xapian transaction commits are made to remote
partitions before associated commits hit the skeleton DB.
This causes unnecessary commits to be made to the skeleton DB;
but they're mostly harmless. Further work will be necessary
to ensure proper ordering and avoidance of
Probably unnecessary, but set binmode for consistency across
platforms.
---
lib/PublicInbox/SearchIdxPart.pm | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lib/PublicInbox/SearchIdxPart.pm b/lib/PublicInbox/SearchIdxPart.pm
index 64e5263..63fc704 100644
---
Iterating through a list of documents while modifying them does
not seem to be supported in Xapian and it can trigger
DatabaseCorruptError exceptions. This only worked with past
datasets out of dumb luck. With the work-in-progress "v2"
public-inbox layout, this problem might become more visible
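
The usual way around this class of problem is to finish the read-only pass first and modify afterwards. The sketch below uses a plain Perl hash as a stand-in for the database (an assumption for illustration; no Xapian calls are made).

```perl
use strict;
use warnings;

# A plain hash stands in for the database here
my %db = (1 => 'a', 2 => 'b', 3 => 'c');

# Phase 1: read-only pass, collecting the IDs that need changing
my @todo = grep { $db{$_} ne 'b' } sort { $a <=> $b } keys %db;

# Phase 2: modify using the saved list, with no iteration in progress
for my $id (@todo) {
    $db{$id} = uc $db{$id};
}

print join(',', map { "$_=$db{$_}" } sort { $a <=> $b } keys %db), "\n";
```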
Fortunately, Xapian multiple database support makes things
easier but we still need to handle the skeleton DB separately.
---
lib/PublicInbox/Inbox.pm | 21 +
lib/PublicInbox/Search.pm | 42 --
2 files changed, 57 insertions(+), 6
I added these while chasing down the DatabaseCorruptError
exceptions which turned out to be caused by Xapian DB
modifications during iteration.
---
lib/PublicInbox/SearchIdxSkeleton.pm | 8 ++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/SearchIdxSkeleton.pm
A different Xapian DB requires the use of a different Enquire
object. This is necessary for get_thread and thread skeleton
to work in the PSGI UI.
---
lib/PublicInbox/Search.pm | 13 +++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/Search.pm
Otherwise, references and thread linking don't happen
across subject mismatches. Oops, this is important.
---
lib/PublicInbox/SearchIdxThread.pm | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/PublicInbox/SearchIdxThread.pm
b/lib/PublicInbox/SearchIdxThread.pm
index fd133d1..57bb293
Interchangeably using "all", "skel", "threader", etc. was
confusing. Standardize on the "skeleton" term to describe
this class since it's also used for retrieval of basic headers.
---
MANIFEST | 2 +-
lib/PublicInbox/Search.pm |
Relying more on Xapian requires retrying reopens in more
places to ensure it does not fall down and show errors to
the user.
---
lib/PublicInbox/Inbox.pm | 8 +---
lib/PublicInbox/Search.pm | 3 ++-
2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/lib/PublicInbox/Inbox.pm
Any Xapian DB is subject to the same errors and retries.
Perhaps in the future this can be made more granular to avoid
unnecessary reopens.
---
lib/PublicInbox/Search.pm | 8 +++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
This was adding a needless newline into doc_data
---
lib/PublicInbox/SearchIdxPart.pm | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/PublicInbox/SearchIdxPart.pm b/lib/PublicInbox/SearchIdxPart.pm
index 63fc704..e37887a 100644
--- a/lib/PublicInbox/SearchIdxPart.pm
+++
OFFSET in SQLite gets painful to deal with. Instead,
rely on timestamps (from Received:) for pagination.
This also sets us up for more precise Date searching
in case we want it.
---
lib/PublicInbox/Feed.pm | 25 ++
lib/PublicInbox/Inbox.pm | 4 +--
lib/PublicInbox/Over.pm | 24
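
Timestamp-based pagination can be sketched with an in-memory list standing in for the SQLite table. The field names and the helper are hypothetical; the comment shows the rough SQL shape the idea corresponds to.

```perl
use strict;
use warnings;

# In-memory stand-in for the overview table, newest first.
# Rough SQL shape (column names are hypothetical):
#   SELECT ... WHERE ts < ? ORDER BY ts DESC LIMIT ?
my @rows = map { +{ num => $_, ts => 1000 + $_ * 10 } } reverse 1 .. 7;

sub page_before {
    my ($rows, $before_ts, $limit) = @_;
    my @page = grep { !defined($before_ts) || $_->{ts} < $before_ts } @$rows;
    splice(@page, $limit) if @page > $limit;
    return \@page;
}

my $p1 = page_before(\@rows, undef, 3);          # first page
my $p2 = page_before(\@rows, $p1->[-1]{ts}, 3);  # continue after the last ts seen
print join(' ', map { $_->{num} } @$p1, @$p2), "\n";
```

The next page is located by the last timestamp seen rather than by counting skipped rows, so its cost does not grow with the page number. Messages sharing a timestamp would need a tiebreaker column, which this sketch leaves out.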
be improved across the board and
commands like XOVER/XHDR are over 20x faster than they were
in v1 with Xapian.
There's also a few cleanups and code simplifications
which should make future work easier.
Eric Wong (Contractor, The Linux Foundation) (7):
t/thread-all.t: modernize test to support
We worked around the default range/termination conditions of
long_response in many cases to reduce calls to SQLite or Xapian.
So continue that trend and become more like the PSGI API
which doesn't force callers to specify an article range or
work inside a loop.
---
lib/PublicInbox/Msgmap.pm |
id_batch had an overly complicated interface, replace it
with id_batch which is simpler and takes advantage of
selectcol_arrayref in DBI. This allows simplification of
callers and the diffstat agrees with me.
---
lib/PublicInbox/Mbox.pm | 11 ++-
lib/PublicInbox/Msgmap.pm | 19
There'll be more performance-related tests in the future.
---
MANIFEST | 2 +-
t/perf-threading.t | 32
t/thread-all.t | 32
3 files changed, 33 insertions(+), 33 deletions(-)
create mode 100644 t/perf-threading.t
We'll be adding more tests in the same vein as this
to improve NNTP performance.
---
t/thread-all.t | 24 +---
1 file changed, 9 insertions(+), 15 deletions(-)
diff --git a/t/thread-all.t b/t/thread-all.t
index d4e8c1f..820fba8 100644
--- a/t/thread-all.t
+++ b/t/thread-all.t
Some of this jankiness was from early performance problems
and they turned out to be unnecessary measures.
---
lib/PublicInbox/OverIdx.pm | 9 -
lib/PublicInbox/V2Writable.pm | 3 +--
script/public-inbox-compact | 13 +
3 files changed, 6 insertions(+), 19 deletions(-)
Dscho found this useful for finding matching git commits based
on AuthorDate in git. Add it to the overview DB format, too;
so in the future we can support v2 repos without Xapian.
https://public-inbox.org/git/nycvar.qro.7.76.6.1804041821420...@zvavag-6oxh6da.rhebcr.pbec.zvpebfbsg.pbz
This hopefully helps for people who try to understand
this design.
---
lib/PublicInbox/V2Writable.pm | 54 +--
1 file changed, 27 insertions(+), 27 deletions(-)
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index
Xapian is size-intensive and SQLite is not strictly necessary for v1.
---
script/public-inbox-compact | 2 +-
scripts/import_vger_from_mbox | 2 +-
t/convert-compact.t | 2 +-
t/v2mirror.t | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git
Noted by Jonathan Corbet in https://lwn.net/Articles/748184/
---
lib/PublicInbox/NNTP.pm | 43 ---
t/nntp.t | 6 --
2 files changed, 32 insertions(+), 17 deletions(-)
diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index
This significantly improves the performance of the NNTP GROUP
command with 2.7 million messages from over 250ms to 700us.
SQLite is weird about this, but at least there's a way to
optimize it.
---
lib/PublicInbox/Msgmap.pm | 10 +++---
t/perf-nntpd.t | 13 +
2 files
Since the overview stuff is a synchronization point anyways,
move it into the main V2Writable process and allow us to
drop a bunch of code. This is another step towards making
Xapian optional for v2.
In other words, the fan-out point is moved and the Xapian
partitions no longer need to
Since we handle the overview info synchronously, we only need
barriers in tests, now. We will use asynchronous checkpoints
to sync less-important Xapian data.
For data deduplication, this requires us to hoist out the
cat-blob support in ::Import for reading uncommitted data
in git.
---
Since we only query the SQLite over DB for OVER/XOVER, we do not
need to waste space storing fields To/Cc/:bytes/:lines or the
XNUM term. We only use From/Subject/References/Message-ID/:blob
in various places of the PSGI code.
For reindexing, we will take advantage of docid stability
in
We only need to call get_thread beyond 1000 messages for
fetching entire mboxes. It's probably too much for the HTML
display otherwise.
---
lib/PublicInbox/Mbox.pm | 6 +++---
lib/PublicInbox/Over.pm | 18 +++---
2 files changed, 14 insertions(+), 10 deletions(-)
diff --git
The partition count can change if public-inbox-compact runs
while public-inbox-watch or public-inbox-index is running.
---
lib/PublicInbox/V2Writable.pm | 44 ---
1 file changed, 29 insertions(+), 15 deletions(-)
diff --git a/lib/PublicInbox/V2Writable.pm
There are enough gmane links out there in the wild that it makes sense
to maintain support for these mappings.
---
MANIFEST | 1 +
lib/PublicInbox/AltId.pm | 20 +++---
lib/PublicInbox/Filter/RubyLang.pm | 22 ++-
t/altid_v2.t
Let's not scare users when they encounter files that are supposed
to be there. Then, preserve the journal and pipe.lock, even if
they're supposedly unused due to us holding the inbox-wide lock.
---
script/public-inbox-compact | 9 +
1 file changed, 9 insertions(+)
diff --git
$mset->size is probably more obvious than relying on a tied
array and saves us a line.
---
lib/PublicInbox/SearchView.pm | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index c789795..d038dfc 100644
---
Eric Wong (Contractor, The Linux Foundation) (7):
v2writable: recount partitions after acquiring lock
searchmsg: remove unused `tid' and `path' methods
search: remove unnecessary OP_AND of query
mbox: do not sort search results
searchview: minor cleanup
support altid mechanism for v2
These internal attributes are not exposed and no longer
used in our APIs.
---
lib/PublicInbox/SearchMsg.pm | 5 -
1 file changed, 5 deletions(-)
diff --git a/lib/PublicInbox/SearchMsg.pm b/lib/PublicInbox/SearchMsg.pm
index 6c0780e..d43853a 100644
--- a/lib/PublicInbox/SearchMsg.pm
+++
Sorting large msets is a waste when it comes to mboxes
since MUAs should thread and sort them as the user desires.
This forces us to rework each of the mbox download mechanisms
to be more independent of each other, but might make things
easier to reason about.
---
lib/PublicInbox/Mbox.pm | 139
This was vestigial code from the switch to the overview DB
---
lib/PublicInbox/Search.pm | 1 -
1 file changed, 1 deletion(-)
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index eca2b0f..4e014f4 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -216,7
The Xapian partitions will trigger the removal anyways.
Test this and fix some description/spelling errors
while we're at it.
---
lib/PublicInbox/V2Writable.pm | 1 -
t/v2writable.t | 6 --
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git
This allows us to emulate the display of thread-aware MUAs when
multiple messages share the same Message-ID. This also is a
place where "public-inbox-index --reindex" is useful to fix
existing messages and no schema version bump is necessary.
---
lib/PublicInbox/SearchIdx.pm | 13 +++--
We do not need to rewrite old commits unaffected by the object_id
purge, only newer commits. This was a state management bug :x
We will also return the new commit ID of rewritten history to
aid in incremental indexing of mirrors for the next change.
---
lib/PublicInbox/Import.pm | 25
---
script/public-inbox-init | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/script/public-inbox-init b/script/public-inbox-init
index d6a6482..3ef6c3b 100755
--- a/script/public-inbox-init
+++ b/script/public-inbox-init
@@ -5,7 +5,7 @@
# Initializes a public-inbox, basically
Xapian may become unhappy if a DB is modified during iteration:
nntp://news.gmane.org/20180228004400.gu12...@survex.com
---
lib/PublicInbox/V2Writable.pm | 46 +--
1 file changed, 27 insertions(+), 19 deletions(-)
diff --git a/lib/PublicInbox/V2Writable.pm
This increases indexing time by around 10% but roughly
halves memory usage of an -index process.
We will probably make this tunable in the future for people
with bigger/smaller machines.
---
lib/PublicInbox/SearchIdx.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
Gigantic feeds probably make some clients unhappy,
clamp it to what it was in the past.
Fixes: b9534449ecce2c59 ("view: avoid offset during pagination")
---
lib/PublicInbox/Feed.pm | 2 +-
lib/PublicInbox/View.pm | 7 +++
2 files changed, 4 insertions(+), 5 deletions(-)
diff --git
--no-renumber does not allow merging, and merging is not ideal
for reindexing, either.
---
script/public-inbox-compact | 15 ++-
t/convert-compact.t | 2 --
2 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/script/public-inbox-compact
While hunting duplicates, I noticed a leading '-' in some
Message-IDs as a result of RFC4648 encoding. While '-' seems
allowed by RFC5322 and URL-friendly (RFC4648), they are uncommon
and make using Message-IDs as arguments for command-line tools
more difficult. So prefix them with a datestamp
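
A minimal sketch of the datestamp prefix, under stated assumptions: the helper name, the SHA-256 digest input and the '@example' suffix are placeholders and not the project's actual scheme. It shows how a URL-safe base64 digest (which may start with '-') ends up behind a UTC datestamp.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_base64);
use POSIX qw(strftime);

# Hypothetical helper for generating a Message-ID from message content
sub generated_mid {
    my ($content, $epoch_secs) = @_;
    my $b64 = sha256_base64($content);
    $b64 =~ tr!+/!-_!;    # RFC 4648 URL-safe alphabet; may start with '-'
    my $stamp = strftime('%Y%m%d%H%M%S', gmtime($epoch_secs));
    return $stamp . '.' . $b64 . '@example';
}

print generated_mid("hello world!\n", 0), "\n";
```

Because the result always begins with a digit, it cannot be mistaken for a command-line switch when passed as an argument.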
Hopefully the final round of patches, most notably [7/12]
improving deduplication and [9/12] to avoid causing difficulties
for NNTP readers on v1 inboxes.
And a few more bugfixes along the way...
Eric Wong (Contractor, The Linux Foundation) (12):
feed: respect feedmax, again
v1: remove
We generally do not want git to waste time finding abbreviations
and we do not want the possibility of them becoming ambiguous
over time, either.
---
lib/PublicInbox/Feed.pm | 2 +-
lib/PublicInbox/V2Writable.pm | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git
We do not want phrase searches to cross between independent
fields (filenames/Message-ID vs bodies)
---
lib/PublicInbox/SearchIdx.pm | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 2239c90..6e44887 100644
---
I'm not sure how useful this view is, but it exists for now.
---
lib/PublicInbox/SearchIdx.pm | 1 +
t/psgi_v2.t | 11 +--
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index fd76627..0b1dc21
git fast-import and the main V2Writable process combined take
about one CPU, so avoid having too many Xapian partitions which
cause unnecessary I/O contention.
---
lib/PublicInbox/V2Writable.pm | 9 ++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git
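
The sizing idea can be sketched as below. The helper name and the optional cap are assumptions for illustration: one CPU is left for git fast-import plus the main writer process, and the count never drops below one partition.

```perl
use strict;
use warnings;

# Hypothetical sizing rule for the number of Xapian partitions
sub xapian_partitions {
    my ($ncpu, $max) = @_;
    my $n = $ncpu - 1;                         # reserve one CPU
    $n = 1 if $n < 1;                          # always at least one
    $n = $max if defined($max) && $n > $max;   # optional cap (assumption)
    return $n;
}

print join(' ', map { xapian_partitions($_, 4) } 1, 2, 4, 16), "\n";
```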
In case people were running old buggy versions from 2016...
(and -convert should probably clean those up, eventually)
---
lib/PublicInbox/Import.pm | 3 +++
1 file changed, 3 insertions(+)
diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index c7a96e1..b25427e 100644
---
Searching across different inboxes is expensive without
SQLite (or Xapian) installed, so avoid doing expensive tree
lookups in git. Since SQLite is required for Xapian
support anyways, we won't need to check Xapian, either.
Sites without SQLite installed will simply 404 if somebody
requests a
"LIKE" in SQLite (and other SQL implementations I've seen) is
expensive with nearly 3 million messages in the archives.
This caused some partial Message-ID lookups to take over 600ms
on my workstation (~300ms on a faster Xeon). Cut that to
under 30ms on average on my workstation by relying
We'll be ensuring we can run more of the HTTP and all of the
NNTP interface with only SQLite (and not Xapian) installed
in the future.
---
t/over.t | 3 ++-
t/v1-add-remove-add.t | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/t/over.t b/t/over.t
index
We use the actual Inbox object everywhere else and don't
need the name of the inbox separated from the object.
---
lib/PublicInbox/WWW.pm | 1 -
1 file changed, 1 deletion(-)
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 7bf866f..f5ed271 100644
--- a/lib/PublicInbox/WWW.pm
I forget this endpoint is still accessible (even if not linked).
This also simplifies new.html all around and removes some unused
clutter from the old days while we're at it.
---
lib/PublicInbox/Feed.pm | 82 -
t/psgi_v2.t | 6
2
This needs tests and further refinement, but current tests pass.
---
lib/PublicInbox/Mbox.pm | 12 ++---
lib/PublicInbox/SearchMsg.pm | 7 +++
lib/PublicInbox/View.pm | 107 ++-
lib/PublicInbox/WWW.pm | 7 +--
t/psgi_v2.t
We do not need many of these, anymore.
---
lib/PublicInbox/SearchView.pm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index 55c588c..6e537b4 100644
--- a/lib/PublicInbox/SearchView.pm
+++
Since we need to handle messages with multiple and duplicate
Message-ID headers, our thread skeleton display must account
for that.
Since we have a "preferred" Message-ID in case of conflicts,
use it as the UUID in an Atom feed so readers do not get
confused by conflicts.
---
This should help us detect bugs sooner in case we have
space waste problems.
---
lib/PublicInbox/SearchIdx.pm | 4
1 file changed, 4 insertions(+)
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 7ac16ec..446cfb0 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++
Some test coverage is better than none, here.
---
t/psgi_v2.t | 22 ++
1 file changed, 22 insertions(+)
diff --git a/t/psgi_v2.t b/t/psgi_v2.t
index 6a2ea5b..7389798 100644
--- a/t/psgi_v2.t
+++ b/t/psgi_v2.t
@@ -43,6 +43,7 @@ $mime->body_set("hello world!\n");
my @warn;
This gives more up-to-date data and allows us
to avoid reopening in more places ourselves.
---
lib/PublicInbox/Search.pm | 5 +
t/psgi_v2.t | 10 +++---
2 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/lib/PublicInbox/Search.pm
e-ID.
I'm not sure if messages with the same Message-ID should
be rendered as relatives, or if they should only be grouped
together in the same rootset (as if their "Subject:" matches)
but not rendered as ancestors.
Eric Wong (Contractor, The Linux Foundation) (11):
import: consolida
We'll be making sure V2Writable uses this.
---
lib/PublicInbox/InboxWritable.pm | 71
lib/PublicInbox/SearchIdx.pm | 80
t/search.t | 28 +--
3 files changed, 93 insertions(+), 86 deletions(-)
diff
Ensure -convert and -compact do not make repositories
unreadable on live servers.
---
lib/PublicInbox/InboxWritable.pm | 5 +-
lib/PublicInbox/V2Writable.pm | 26
script/public-inbox-compact | 110 +--
script/public-inbox-convert | 23 +--
We have Git::qx nowadays.
---
t/v2writable.t | 8 ++--
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/t/v2writable.t b/t/v2writable.t
index 0eda432..7abb14f 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -43,13 +43,9 @@ if ('ensure git configs are correct') {
This bug was hidden due to timing problems with eatmydata or
running with tmpfs for TMPDIR.
---
script/public-inbox-convert | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/script/public-inbox-convert b/script/public-inbox-convert
index 2b0a385..e6fb4f5 100755
---
Eric Wong (Contractor, The Linux Foundation) (4):
convert: avoid redundant "done\n" statement for fast-import
search: move permissions handling to InboxWritable
t/v2writable: use simplify permissions reading
v2: respect core.sharedRepository in git configs
lib/P
This was causing errors while attempting to load messages via
the WWW interface while mass-importing LKML. While we're at it,
remove unnecessary eval from lookup_article.
---
lib/PublicInbox/Search.pm | 46 ++
1 file changed, 22 insertions(+), 24
I mainly focus on -watch for mirroring busy mailing lists, but
using -mda should remain an option.
---
MANIFEST | 1 +
lib/PublicInbox/Emergency.pm | 3 ++-
script/public-inbox-mda | 14 +--
t/v2mda.t | 59
And we do not want to start making confused repos if somebody
leaves out "-V2" the second time around.
---
lib/PublicInbox/V2Writable.pm | 1 -
script/public-inbox-init | 17 -
t/init.t | 32 ++--
3 files changed, 46
I suppose people will use -mda for large mailing lists, too
(as opposed to -watch); so it now supports v2. After importing
a giant archive into a v2 inbox, the new -compact wrapper
can consolidate and reduce Xapian space (and FD) overhead.
Eric Wong (Contractor, The Linux Foundation) (4
The only Xapian term which should be unique is the NNTP article
number; so we no longer need find_unique_doc_id.
---
lib/PublicInbox/Search.pm | 24
1 file changed, 8 insertions(+), 16 deletions(-)
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index
We do not need this subroutine for read-only use in Search.pm
---
lib/PublicInbox/Search.pm | 8
lib/PublicInbox/SearchIdx.pm | 8
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index 7d42aaa..6f5e062
Purging existing messages is fairly straightforward since we can
take advantage of Xapian and lookup the git object_id with it.
Unfortunately, purging an already "removed" message (which is
no longer in Xapian) is not as easy and we'll need to expose
->purge_oids to purge by the git object_id
The Message-ID mapped to an NNTP article number is stronger,
so we will favor that for attachment lookups.
---
lib/PublicInbox/Inbox.pm | 12 +---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 47b8630..4c7305f
Otherwise I would forget and be tempted to remove them.
---
lib/PublicInbox/SearchMsg.pm | 4
1 file changed, 4 insertions(+)
diff --git a/lib/PublicInbox/SearchMsg.pm b/lib/PublicInbox/SearchMsg.pm
index e314fed..e55d401 100644
--- a/lib/PublicInbox/SearchMsg.pm
+++