Re: Mail archives in Git using ssoma (Docker image)

2016-09-07 Thread Eric Wong
David Bremner  wrote:
> Eric Wong  writes:
> > For mirroring existing lists, I started using public-inbox-watch
> > which currently watches Maildirs.  The config knobs are sorta
> > documented from my announcement to git@vger:
> >
> > https://public-inbox.org/git/20160710004813.ga20...@dcvr.yhbt.net/
> > http://hjrcffqmbrq6wope.onion/git/20160710004813.ga20...@dcvr.yhbt.net/
> >
> > Initial import (w/o spamassassin) was done with
> > scripts/import_vger_from_mbox in the source:
> >
> > torsocks git clone http://hjrcffqmbrq6wope.onion/public-inbox
> > git clone https://public-inbox.org/ public-inbox
> > git clone git://repo.or.cz/public-inbox
> >
> 
> FWIW, I already have a Maildir with a complete (and updated) archive of the
> list (and only that) for use of nmbug. So at the risk of putting all eggs in
> one basket, perhaps public-inbox-watch could watch that maildir.

Yes, public-inbox-watch(1) is probably preferable for any subscriber to
start archiving the notmuch list.  I just pushed out some POD manpages
which should probably help (along with the existing INSTALL doc):

   https://public-inbox.org/meta/20160907004907.1479-...@80x24.org/

public-inbox-overview(7) should be a good starting point of ways
to start mirroring/hosting.  Please feel free to ask me directly
and/or m...@public-inbox.org if you need clarification or help.
I'm scatterbrained and tend to omit things when writing
documentation (it's hard to tell what a reader wants to know :x)
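
For the Maildir case discussed above, a minimal sketch of the watch settings might look like the following. It assumes the `watch` and `watchheader` keys described in public-inbox-watch(1); the Maildir path, the repository path and the List-Id value are placeholders, not taken from this thread. A scratch file stands in for ~/.public-inbox/config so the commands can be tried without touching a real setup.

```shell
#!/bin/sh
# Sketch: point public-inbox-watch at an existing Maildir.
# Assumptions: both paths and the List-Id value are placeholders.
set -e

# Scratch file standing in for ~/.public-inbox/config.
CONF="$(mktemp)"

git config -f "$CONF" publicinbox.notmuch.address notmuch@notmuchmail.org
git config -f "$CONF" publicinbox.notmuch.mainrepo /path/to/notmuch.git
# Maildir to watch for new messages (hypothetical path).
git config -f "$CONF" publicinbox.notmuch.watch maildir:/path/to/maildir
# Only import messages carrying this header (assumed List-Id value).
git config -f "$CONF" publicinbox.notmuch.watchheader \
    'List-Id:<notmuch.notmuchmail.org>'

# Show the resulting section.
git config -f "$CONF" --get-regexp '^publicinbox\.notmuch\.'
```

With the same keys in the real config file, running public-inbox-watch should pick up the Maildir; check the manpage for your version before relying on the key names.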


Anyways, thanks for notmuch (and being GPL-3.0+)!  I'm not a
user myself(*), but I've found the notmuch source to be a good
place to steal Xapian usage examples from for public-inbox :>



(*) I have trouble with Maildir-only scalability and
still use gzipped mbox for old mail.
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: Mail archives in Git using ssoma (Docker image)

2016-08-21 Thread Eric Wong
"W. Trevor King"  wrote:
> On Sun, Aug 21, 2016 at 12:08:52PM +, Eric Wong wrote:
> > "W. Trevor King"  wrote:
> > > This is the ssoma archive (with the data in it).  I just set up a
> > > basic HTTP archive (following [1]) based on a Docker image [2] (Gentoo
> > > doesn't package all the Perl dependencies public-inbox needs).
> > 
> > Ugh, that sucks (sorry, not a fan of Docker).
> > 
> > What's missing from Gentoo?
> 
> Gentoo doesn't package (or I couldn't find the package for)
> Encode::MIME::Header or Mail::Thread.  I tried installing things from
> CPAN, but ran into a compile-time error from the ‘cpan’ invocation and
> gave up ;).  I can try and reproduce the error if you're curious, but
> I don't have it handy at the moment.

Encode::MIME::Header is distributed with perl itself on Debian and also
the stock upstream install.  Not sure if there's an option you missed or
disabled.

Which perl version do you use?

perl 5.14 on Debian wheezy even seems to have it.  I actually
still want everything to work on 5.8, since that seems to be
the de-facto baseline in the wild.


Mail::Thread is one .pm, and I'll probably replace it with
something (same algorithm) which can use half the memory by
avoiding wrapper object abstractions (it's probably the biggest
memory hog at the moment).

lib/PublicInbox/Thread.pm already has 3 monkey patches to work around
upstream bugs in Mail::Thread.  It's dead upstream, and not available on
FreeBSD, either.

> > >   $ git config -f srv/notmuch.git/config publicinbox.http http://tremily.us
> > >   $ git config -f srv/notmuch.git/config publicinbox.email notmuch@notmuchmail.org
> > 
> > That should probably be:
> > 
> > ; based on your [3]
> > git config -f srv/notmuch.git/config \
> > publicinbox.notmuch.url http://tremily.us/notmuch
> > 
> > git config -f srv/notmuch.git/config \
> > publicinbox.notmuch.address notmuch@notmuchmail.org
> > 
> > ; this is crucial for all the public-inbox-* tools
> > git config -f srv/notmuch.git/config \
> > publicinbox.notmuch.mainrepo /path/to/notmuch.git
> 
> I was using these in the Dockerfile's CMD:
> 
>   (cd /srv;
>    for NAME in *;
>    do
>      CONF="/srv/${NAME}/config";
>      public-inbox-init "${NAME}" "/srv/${NAME}" \
>        $(git config -f "${CONF}" publicinbox.http) \
>        $(git config -f "${CONF}" publicinbox.email);
>    done) && …
> 
> Are you saying that I can skip the ~/.public-inbox/config entries
> set up by public-inbox-init if I set publicinbox.{name}.* in the ssoma
> repository's config?  That would be nice.

Erm, sorry, no, I mean ~/.public-inbox/config as the "git config -f"
arg in the above commands.  Your original config was
meaningless in the context of public-inbox itself; I don't
recall public-inbox relying on $GIT_DIR/config much (if at all)
outside of standard git things.

Using ~/.public-inbox/config is required for multi-inbox lookups
(since you normally run MDA w/o args)

You can also override ~/.public-inbox/config by setting the
PI_CONFIG env (like GIT_CONFIG).
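
As a sketch of what that means in practice, the three settings quoted above can be written to whatever file PI_CONFIG names. The keys and values are the ones from this thread; defaulting to a scratch file is only a choice made for this example so that it can be run safely.

```shell
#!/bin/sh
# Sketch: write the per-inbox settings into the file public-inbox reads,
# not into the repository's own config.
set -e

# Honour an existing PI_CONFIG; otherwise use a scratch file
# (assumption made for this example only).
PI_CONFIG="${PI_CONFIG:-$(mktemp)}"
export PI_CONFIG

git config -f "$PI_CONFIG" publicinbox.notmuch.url http://tremily.us/notmuch
git config -f "$PI_CONFIG" publicinbox.notmuch.address notmuch@notmuchmail.org
git config -f "$PI_CONFIG" publicinbox.notmuch.mainrepo /path/to/notmuch.git

# Show the entries the public-inbox-* tools would look up.
git config -f "$PI_CONFIG" --list
```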

> I don't see a point to having {name} in ssoma-config settings though,
> since you're already in a single bucket by that point (using
> publicinbox.{name}.* makes sense in the multi-bucket
> ~/.public-inbox/config).
> 
> > > It's not updating automatically yet, but that will probably look
> > > like:
> > > 
> > > 1. Pull new mbox [4].
> > > 2. Import into notmuch-archives [5].
> > > 3. Re-run public-inbox-index (this could probably be via ‘docker exec …’).
> > > 
> > > But I'll have to test that to confirm.  And ideally we'd be using
> > > ssoma-mda or similar directly, instead of going through mbox, but I'd
> > > rather get the official headers on the stored mail than be efficient
> > > ;).
> > 
> > For mirroring existing lists, I started using public-inbox-watch
> > which currently watches Maildirs.
> 
> If I had a Maildir locally, I'd just use procmail and push new
> messages into ssoma-mda.  I'm using the import script because my local
> mail has “how we delivered this to Trevor” headers (which I don't want
> to add) but the downloaded mbox has “how we delivered this to
> notmuch@notmuchmail.org” (which seems like a better fit for a shared
> ssoma repo).

I don't mind extra/different headers.  The majority of messages in
public-inbox.org/git/ were delivered to gmane;
recent ones are delivered to me, and some holes were filled in by
Jeff King's archives.  All of our mail systems add different
headers.

> > I recommend public-inbox-watch for mirroring existing lists (such as
> > what I did with git@vger) but public-inbox-mda for self-hosted lists
> > (such as m...@public-inbox.org).
> 
> Why is that?  Procmail + public-inbox-mda (or my Python ssoma-mda fork
> [1,2]) seems simpler and equally effective if you want to insert a
> message that your mail system is delivering locally.

-watch is usable for importing big archives or bursts 

Re: Mail archives in Git using ssoma (Docker image)

2016-08-21 Thread W. Trevor King
On Sun, Aug 21, 2016 at 12:08:52PM +, Eric Wong wrote:
> "W. Trevor King"  wrote:
> > This is the ssoma archive (with the data in it).  I just set up a
> > basic HTTP archive (following [1]) based on a Docker image [2] (Gentoo
> > doesn't package all the Perl dependencies public-inbox needs).
> 
> Ugh, that sucks (sorry, not a fan of Docker).
> 
> What's missing from Gentoo?

Gentoo doesn't package (or I couldn't find the package for)
Encode::MIME::Header or Mail::Thread.  I tried installing things from
CPAN, but ran into a compile-time error from the ‘cpan’ invocation and
gave up ;).  I can try and reproduce the error if you're curious, but
I don't have it handy at the moment.

> >   $ git config -f srv/notmuch.git/config publicinbox.http http://tremily.us
> >   $ git config -f srv/notmuch.git/config publicinbox.email notmuch@notmuchmail.org
> 
> That should probably be:
> 
>   ; based on your [3]
>   git config -f srv/notmuch.git/config \
>   publicinbox.notmuch.url http://tremily.us/notmuch
> 
>   git config -f srv/notmuch.git/config \
>   publicinbox.notmuch.address notmuch@notmuchmail.org
> 
>   ; this is crucial for all the public-inbox-* tools
>   git config -f srv/notmuch.git/config \
>   publicinbox.notmuch.mainrepo /path/to/notmuch.git

I was using these in the Dockerfile's CMD:

  (cd /srv;
   for NAME in *;
   do
     CONF="/srv/${NAME}/config";
     public-inbox-init "${NAME}" "/srv/${NAME}" \
       $(git config -f "${CONF}" publicinbox.http) \
       $(git config -f "${CONF}" publicinbox.email);
   done) && …

Are you saying that I can skip the ~/.public-inbox/config entries
set up by public-inbox-init if I set publicinbox.{name}.* in the ssoma
repository's config?  That would be nice.

I don't see a point to having {name} in ssoma-config settings though,
since you're already in a single bucket by that point (using
publicinbox.{name}.* makes sense in the multi-bucket
~/.public-inbox/config).

> > It's not updating automatically yet, but that will probably look
> > like:
> > 
> > 1. Pull new mbox [4].
> > 2. Import into notmuch-archives [5].
> > 3. Re-run public-inbox-index (this could probably be via ‘docker exec …’).
> > 
> > But I'll have to test that to confirm.  And ideally we'd be using
> > ssoma-mda or similar directly, instead of going through mbox, but I'd
> > rather get the official headers on the stored mail than be efficient
> > ;).
> 
> For mirroring existing lists, I started using public-inbox-watch
> which currently watches Maildirs.

If I had a Maildir locally, I'd just use procmail and push new
messages into ssoma-mda.  I'm using the import script because my local
mail has “how we delivered this to Trevor” headers (which I don't want
to add) but the downloaded mbox has “how we delivered this to
notmuch@notmuchmail.org” (which seems like a better fit for a shared
ssoma repo).

> I recommend public-inbox-watch for mirroring existing lists (such as
> what I did with git@vger) but public-inbox-mda for self-hosted lists
> (such as m...@public-inbox.org).

Why is that?  Procmail + public-inbox-mda (or my Python ssoma-mda fork
[1,2]) seems simpler and equally effective if you want to insert a
message that your mail system is delivering locally.

> > One shift from Gmane's mid.gmane.org/… is that the public-inbox UI
> > Message-ID lookup is per-bucket, and public-inbox seems to be
> > encouraging per-list buckets.
> 
> The public-inbox-nntpd interface supports mid lookups across all
> inboxes in that instance; so it should be doable in the WWW
> interface, too.  Either way, I think it has to be O(n) where (n) is
> the number of Xapian DBs, though.

I'm more concerned about the interface, and less about the
implementation (which can be improved later).  The (n) lookups are
trivially parallelizable, and you can always add a Message-ID →
buckets lookup table if (n) lookups turns out to be too slow.
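
A minimal sketch of such a lookup table, assuming a flat tab-separated file, is shown below. The file layout, the sample Message-IDs and the `mid_lookup` helper are all hypothetical; nothing here reflects how public-inbox stores or searches messages.

```shell
#!/bin/sh
# Sketch: a flat-file Message-ID -> bucket lookup table.
# The table layout and the mid_lookup helper are hypothetical.
set -e

TABLE="$(mktemp)"

# One "<Message-ID><TAB><bucket>" pair per line; a Message-ID posted to
# several lists simply appears on several lines.
printf '%s\t%s\n' \
    'a@example.com' notmuch \
    'b@example.com' git \
    'a@example.com' meta >"$TABLE"

# Print every bucket that holds the given Message-ID, one per line.
mid_lookup() {
    # Pass the ID through the environment so awk does not interpret
    # backslashes in it.
    MID="$1" awk -F '\t' '$1 == ENVIRON["MID"] { print $2 }' "$TABLE"
}

mid_lookup 'a@example.com'
```

A single scan of one table replaces (n) per-bucket queries, at the cost of keeping the table in sync whenever a bucket gains a message.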

Cheers,
Trevor

[1]: id:20141107190321.gl23...@odin.tremily.us
[2]: id:af679af8257e250ac606e35a1307ad02907b8426.1413663212.git.wk...@tremily.us
     http://public-inbox.org/meta/af679af8257e250ac606e35a1307ad02907b8426.1413663212.git.wk...@tremily.us/t/#u

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy




Re: Mail archives in Git using ssoma (Docker image)

2016-08-21 Thread David Bremner
Eric Wong  writes:


> For mirroring existing lists, I started using public-inbox-watch
> which currently watches Maildirs.  The config knobs are sorta
> documented from my announcement to git@vger:
>
> https://public-inbox.org/git/20160710004813.ga20...@dcvr.yhbt.net/
> http://hjrcffqmbrq6wope.onion/git/20160710004813.ga20...@dcvr.yhbt.net/
>
> Initial import (w/o spamassassin) was done with
> scripts/import_vger_from_mbox in the source:
>
> torsocks git clone http://hjrcffqmbrq6wope.onion/public-inbox
> git clone https://public-inbox.org/ public-inbox
> git clone git://repo.or.cz/public-inbox
>

FWIW, I already have a Maildir with a complete (and updated) archive of the
list (and only that) for use of nmbug. So at the risk of putting all eggs in
one basket, perhaps public-inbox-watch could watch that maildir.

d


Re: Mail archives in Git using ssoma (Docker image)

2016-08-21 Thread W. Trevor King
On Sat, Aug 20, 2016 at 09:36:31PM -0700, W. Trevor King wrote:
> [2]: git://tremily.us/notmuch-archives.git

This is the ssoma archive (with the data in it).  I just set up a
basic HTTP archive (following [1]) based on a Docker image [2] (Gentoo
doesn't package all the Perl dependencies public-inbox needs).
Dockerfile for rebuilding the image is in [2].  I'm currently hosting
the archives (HTTP only) at [3].  Spinning that up from the Docker
image looks like:

  $ mkdir srv
  $ git clone --bare git://tremily.us/notmuch-archives.git srv/notmuch
  $ echo 'Notmuch -- Just an email system' >srv/notmuch.git/description
  $ git config -f srv/notmuch.git/config publicinbox.http http://tremily.us
  $ git config -f srv/notmuch.git/config publicinbox.email notmuch@notmuchmail.org
  $ docker run --name notmuch-archives -d -p 80:8080 -v ${PWD}/srv/:/srv/ \
      wking/public-inbox

(although I'm using -p ###:8080 and have an Nginx reverse-proxy in
front).  It's not updating automatically yet, but that will probably
look like:

1. Pull new mbox [4].
2. Import into notmuch-archives [5].
3. Re-run public-inbox-index (this could probably be via ‘docker exec …’).

But I'll have to test that to confirm.  And ideally we'd be using
ssoma-mda or similar directly, instead of going through mbox, but I'd
rather get the official headers on the stored mail than be efficient
;).
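
The three steps might be wired together roughly as follows. This is an untested sketch that only prints the commands unless DRY_RUN=0 is set. `import-mbox` is a hypothetical stand-in for the import script from [5], and the path handed to public-inbox-index is an assumption based on the /srv mount above.

```shell
#!/bin/sh
# Sketch of the three update steps.  By default each command is printed
# instead of executed; set DRY_RUN=0 to run them for real.
# Assumptions: import-mbox is a placeholder for the import script, and
# the argument passed to public-inbox-index is a guess.
set -e

MBOX_URL=https://notmuchmail.org/archives/notmuch.mbox
CONTAINER=notmuch-archives
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

update_archive() {
    # 1. Pull the new mbox.
    run curl -fsS -o notmuch.mbox "$MBOX_URL"
    # 2. Import into the ssoma repository (hypothetical importer).
    run import-mbox notmuch.mbox srv/notmuch
    # 3. Re-index inside the running container.
    run docker exec "$CONTAINER" public-inbox-index /srv/notmuch
}

update_archive
```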

One shift from Gmane's mid.gmane.org/… is that the public-inbox UI
Message-ID lookup is per-bucket, and public-inbox seems to be
encouraging per-list buckets.

And while I feel like I had a good grasp of the ssoma format two years
ago, I know very little about Perl and public-inbox.  I'm sure you
could set up a public-inbox host that is more efficient than what's
currently in my Docker image.

Cheers,
Trevor

[1]: http://public-inbox.org/INSTALL
[2]: https://hub.docker.com/r/wking/public-inbox/
[3]: http://tremily.us/notmuch/
[4]: https://notmuchmail.org/archives/notmuch.mbox
[5]: id:20160821043631.ga2...@odin.tremily.us
