On 12/01/2010 21:04, Magnus Hagander wrote:
On Tue, Jan 12, 2010 at 20:56, Matteo Beccati <p...@beccati.com> wrote:
On 12/01/2010 10:30, Magnus Hagander wrote:
The problem is usually with strange looking emails with 15 different
MIME types. If we can figure out the proper way to render that, the
rest really is just a SMOP.
Yeah, I was expecting some, but all the messages I've looked at seemed to be
working ok.
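To illustrate the rendering problem with multi-part messages discussed above, here is a minimal sketch, assuming Python's standard email package rather than anything AOX provides. The function name pick_display_body and the SAMPLE message are hypothetical; the sketch only shows how one might choose a single displayable part, preferring text/plain over text/html.

```python
from email import message_from_string, policy

# Hypothetical sample: a multipart/alternative message with the HTML part first.
SAMPLE = (
    "From: someone@example.org\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b1"\n'
    "\n"
    "--b1\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<p>hello</p>\n"
    "--b1\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "hello\n"
    "--b1--\n"
)


def pick_display_body(raw_message: str) -> str:
    """Return the text to display, preferring text/plain over text/html."""
    # policy.default makes the parser return an EmailMessage, which has get_body().
    msg = message_from_string(raw_message, policy=policy.default)
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is None:
        # No text part at all (for example, an attachment-only message).
        return ""
    return body.get_content()


if __name__ == "__main__":
    print(pick_display_body(SAMPLE).strip())
```

Messages from old or unusual MUAs may still need special handling (broken charsets, nested multiparts), so this is only a starting point for testing against the older archives.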
Have you been looking at old or new messages? Try grabbing a couple of
MBOX files off archives.postgresql.org from several years back, you're
more likely to find weird MUAs then, I think.
Both. pgsql-hackers and -general are subscribed and getting new emails
and pgsql-www is just an import of the archives:
http://archives.beccati.org/pgsql-www/by/date (sorry, no paging)
(just fixed a 500 error that was caused by the fact that I've been
playing with the db a bit and a required helper table was missing)
(BTW, for something to actually be used In Production (TM), we want
something that uses one of our existing frameworks. So don't go
overboard in code-wise implementations on something else - proof of
concept on something else is always ok, of course)
OK, that's something I didn't know, even though I expected some kind of
limitations. Could you please elaborate a bit more (i.e. where to find
info)?
Well, the framework we're moving towards is built on top of django, so
that would be a good first start.
There is also whatever the commitfest thing is built on, but I'm told
that's basically no framework.
I'm afraid that's outside of my expertise. But I can get as far as
having a proof of concept and the required queries / php code.
Having played with it, here's my feedback about AOX:
pros:
- seemed to be working reliably;
- does most of the dirty job of parsing emails, splitting parts, etc
- highly normalized schema
- thread support (partial?)
A killer will be if that thread support is enough. If we have to build
that completely ourselves, it'll take a lot more work.
Looks like we need to populate a helper table with hierarchy
information, unless Abhijit has a better idea and knows how to get it
from the aox main schema.
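As a rough illustration of what populating such a hierarchy helper table could involve, here is a minimal sketch. It assumes each message's parent has already been extracted (for example from the In-Reply-To header) into a plain mapping; it does not use the AOX schema, and the names ThreadRow and build_thread_rows are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class ThreadRow(NamedTuple):
    """One row of a hypothetical thread helper table."""
    message_id: str
    parent_id: Optional[str]  # None if the parent is unknown or not archived
    root_id: str              # top-most known ancestor
    depth: int                # 0 for thread starters


def build_thread_rows(parents: Dict[str, Optional[str]]) -> List[ThreadRow]:
    """Derive (parent, root, depth) for every message from a child -> parent map."""
    rows = []
    for mid, parent in parents.items():
        # A parent only counts if it is itself present in the archive.
        known_parent = parent if parent in parents and parent != mid else None
        current, depth, seen = mid, 0, {mid}
        # Walk up the chain until there is no known parent; 'seen' guards
        # against reference loops in malformed headers.
        while True:
            up = parents.get(current)
            if up is None or up not in parents or up in seen:
                break
            seen.add(up)
            current = up
            depth += 1
        rows.append(ThreadRow(mid, known_parent, current, depth))
    return rows


if __name__ == "__main__":
    sample = {"a": None, "b": "a", "c": "b", "d": "missing"}
    for row in build_thread_rows(sample):
        print(row)
```

Replies whose parent is not in the archive are treated as thread starters here; a real implementation would probably also consult the References header before giving up.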
cons:
- directly publishing the live email feed might not be desirable
Why not?
The scenario I was thinking of was the creation of a static snapshot and
potential inconsistencies that might occur if the threads get updated
during that time.
- queries might end up being a bit complicated for simple tasks
As long as we don't have to hit them too often, which is solvable
with caching. And we do have a pretty good RDBMS to run the queries on
:)
True :)
I don't think you can trust the NNTP gateway now or in the past,
messages are sometimes lost there. The mbox files are as complete as
anything we'll ever get.
Importing the whole pgsql-www archive with a perl script that bounces
messages via SMTP took about 30m. Maybe there's even a way to skip SMTP, I
haven't looked into it that much.
Um, yes. There is an MBOX import tool.
Cool.
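For checking an archive before importing it, the mbox files can also be read directly, without going through SMTP. The following is a minimal sketch using Python's standard mailbox module; it is not the AOX import tool, and the function name iter_mbox_headers is hypothetical.

```python
import mailbox
from typing import Iterator, Tuple


def iter_mbox_headers(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (Message-ID, Subject) for every message in an mbox file."""
    # create=False: raise an error instead of silently creating a missing file.
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            yield (message.get("Message-ID", ""), message.get("Subject", ""))
    finally:
        box.close()
```

Counting the yielded entries per month and comparing against what ends up in the database would be one way to spot lost messages.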
With all that said, I can't promise anything as it all depends on how much
spare time I have, but I can proceed with the evaluation if you think it's
useful. I have a feeling that AOX is not truly the right tool for the job,
but we might be able to customise it to suit our needs. Are there any other
requirements that weren't specified?
Well, I think we want to avoid customizing it. Using a custom
frontend, sure. But we don't want to end up customizing the
parser/backend. That's the road to unmaintainability.
Sure. I guess my wording wasn't right... I was more thinking about
adding new tables, materialized views or whatever else might be missing
to make it fit our purpose.
Cheers
--
Matteo Beccati
Development & Consulting - http://www.beccati.com/
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers