RE: controlling STARTTLS by IP address

2016-07-14 Thread M. Balridge
Quoting Michael Fox :

> > Seems like your firewall could redirect to a different port that doesn't
> > offer starttls.
> 
> Yes, of course.  But that would require multiple ports, making the client
> configuration cumbersome and error-prone.

It looks like there's an internal Dovecot solution, so all's well. 

I just thought to remind people that with some firewalls, there's always a way
to perform "silent" redirections using the DNAT target in the PREROUTING chain
of the nat table, i.e.:

iptables -t nat -A PREROUTING -i ${EXTIF} -s ${NOTLSSOURCES} -p tcp --dport 110 \
 --syn -j DNAT --to-destination ${DOVECOT}:${NOTLSPOP3PORT}

If you're using a Linux iptables firewall, you wouldn't need to expose the
different port to the client, but would make use of the NAT subsystem to
redirect the connection from certain IP#s->POP3 to the service port where
you've denied TLS.

No client would need to be made aware of the "secret" ${NOTLSPOP3PORT}, and in
fact, the firewall would continue to DROP packets sent to it from elsewhere if
you have a default-deny policy in effect.

=R=


Re: Scaling to 10 Million IMAP sessions on a single server

2017-02-22 Thread M. Balridge
Quoting Ruga :

> Comparison of Dovecot, Uwash, Courier, Cyrus and M-Box:
> http://www.isode.com/whitepapers/mbox-benchmark.html

Wow.  That comparison is only 11.5 years old.

The "default" file system of reiserfs and gcc-3.3 were dead giveaways. 

I suspect Dovecot's changed a tad since that test.

=R=


Re: Minor patches for builds against ancient platforms

2017-06-09 Thread M. Balridge

> >> Warning: Transaction log file /home/luser/mail/.imap/INBOX/dovecot.index.log
> >> was locked for 95 seconds (rotating while syncing)
> 
> Timo recently explained to me it's probably caused by slow I/O or
> processing.  This explanation is consistent with my observation that
> the users who get these messages have jumbo mailboxes.

I do know that this little box of horrors has 200-300MB mbox INBOXes on an
ext3 filesystem formatted in 2005.  I am very nervous about converting them to
Maildir at this point.  If I could get someone (or something) to the site and
replace it with something much more suitable, I could have these people join
the 21st Century.

> You can disable dotlock altogether, but I don't think this is what you
> meant.  You can use locking method "dotlock_try" rather than "dotlock"
> -- the former will ignore quota/permissions problems and plow on.
> (It still logs it.)  You could also align luser's mail folder group with
> luser's GID, which is usually what I do.

What I meant was, are certain types of filenames "blocked" by policy from
being created via IMAP commands?  I'm sure I could run a few tests to answer
this for myself, or better still, go through Timo's code.

> Maybe locks are created even when files don't exist as there may
> be a race condition where another process is creating/deleting it at
> the same time.

Sure, I could see that.  In reading through the locking code of sendmail,
dovecot, UW-IMAPD, and procmail, it's clear that locking files under UNIX is
chaotic and filled with no small amount of voodoo.  And naturally, opinions
and implementations radically disagree. (How sensible it was of Timo to make
it a RUNTIME configuration option in dovecot.)

I appreciate your reply, Joseph.

48 hours after switching half of the userbase to dovecot, I am not seeing any
serious problems, and already people are more often than not very pleased by
the improved performance and responsiveness.

One last problem area is that many users have soft-links to mailboxes located
on a second drive, but these never appear in folder enumeration lists or they
appear grayed out in SeaMonkey/Thunderbird.

I've tried just symbolically linking to directories containing other mboxes,
but sometimes it works, and sometimes it doesn't.  I wonder if there's
paranoia checking in the code that follows symbolic links to ensure that
uids/gids of the "owning" directory and the linked-to directory (or files
within it) are the same.

I'm still trying to absorb all of the documentation for dovecot.

=M=


Minor patches for builds against ancient platforms

2017-06-08 Thread M. Balridge
I was recently asked to upgrade some neolithic aged software (UW-IMAP,
sendmail 8.12.x, apache 1.3, amongst other horrors).

The box is physically remote, so an aggressive "new flush" wasn't an option. 
I've been able to upgrade the compiler to gcc-3.4, openssl to 1.0.2k, glibc,
php to something in the 5.4-branch, etc.

I have CLucene working, even.

I know I should take a shotgun to the box and retire it. It's a NORTHWOOD P4, no
less, with only 1.5GB RAM and 74GB of SCSI-160 storage.

*BUT* that isn't my call to make, as much as I'd like to do the right thing.

When Life(tm) hands you incredibly sour and bitter oranges, the best you can
do aside from making a Palmetto Punch, is perhaps traditional Cochinita pibil
the way they do in the Yucatan.

I ran across two main problems, the first of which struck during the build.

Amazingly enough, I was able to update pcre, gettext, openssl, textcat, and
other libraries to modern versions without too much pain and suffering.

1) In src/lib/compat.h there is a definition for p(read|write) that conflicts
with the one in /usr/include/unistd.h

On this box, there is a macro appended to the definition (to control whether
or not THROW is defined in C++ "mode").  This is regulated by using the macro
__THROW.  I assume this is anachronistic.

2) There was an odd overflow bug in the quota module. (Yes, would you believe
that user quotas are used + enforced on this Frankenbox?)  I assume it's a
rarely seen issue because few Dovecot users compile the software in caves on
computers powered by horse-pulled generator wheels.  I suspect Timo's seen
more Abominable Snowcreatures in Espoo than systems like these.

Simply adding an explicit 64 bit (unsigned) type to the constant multipliers
seemed to address this. Of these two patches, this is probably the most "safe"
and thus likely to be accepted into the main branch of code.
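
The wrap-around can be illustrated with a little arithmetic. This is only a sketch, not the patch itself: it assumes the limit is held in kilobytes and multiplied by 1024 in a 32-bit unsigned type, and it assumes a configured limit of 5500000 kB (a hypothetical figure, chosen because it reproduces the 1305696 limit in the unpatched doveadm output further down). Python integers never overflow, so the 32-bit behaviour is simulated with a mask.

```python
# Simulate C unsigned multiplication at two widths. Python ints are
# arbitrary precision, so the wrap-around is applied by hand with a mask.
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def kb_to_bytes_32bit(kilobytes):
    """kB -> bytes as a 32-bit unsigned multiply would compute it."""
    return (kilobytes * 1024) & MASK32


def kb_to_bytes_64bit(kilobytes):
    """kB -> bytes with a 64-bit unsigned multiplier (the idea behind the fix)."""
    return (kilobytes * 1024) & MASK64


limit_kb = 5_500_000  # hypothetical limit, not taken from the real config

wrapped_kb = kb_to_bytes_32bit(limit_kb) // 1024
correct_kb = kb_to_bytes_64bit(limit_kb) // 1024
print(wrapped_kb, correct_kb)  # 1305696 5500000
```

With the wrapped limit, a mailbox holding 3365836 kB appears to be at 257% of quota, which would explain why every delivery was refused.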

Thanks for the great software, as always, Timo. It's a testimony to your
design and implementation acumen that software you've written in 2017 still
runs on machines that went obsolete in 480 B.C.E.

I am trying to track down one possible issue that could be locking-related,
which causes some mailbox open operations to seem to take longer than they
should. Log entries like:

> Warning: Transaction log file /home/luser/mail/.imap/INBOX/dovecot.index.log
> was locked for 95 seconds (rotating while syncing)

> Warning: Transaction log file /home/luser/mail/.imap/INBOX/dovecot.index.log
> was locked for 92 seconds (rotating while syncing)

I am using sendmail 8.15.2 (HASFLOCK not defined) and procmail 3.22 (Locking
strategies: dotlocking, fcntl(), lockf(), flock())

I also see odd errors while using SeaMonkey clients:

imap(luser): Error: fchown(/home/luser/mail/.subscriptions.lock,
group=501(coregroup)) failed: Operation not permitted (egid=200(users), group
based on /home/luser/mail - see http://wiki2.dovecot.org/Errors/ChgrpNoPerm)

imap(luser): Error: file_dotlock_open() failed with subscription file
/home/luser/mail/.subscriptions: Operation not permitted

.subscriptions doesn't exist either as a file or a directory in the named
directories.

Is there a "filter" against dot-files being opened within the bowels of dovecot?

Onto the "meat" of this "bug" report:

Dovecot: dovecot-2.2.30.2 
Slackware 9 (with most of the core libs upgraded to the latest possible)
Kernel: 2.4.35-ow2

Configure command: CC=gcc-3.4 CXX=g++-3.4 \
CFLAGS='-O2 -march=pentium4 -mtune=pentium4 -fPIC -fPIE \
 -fomit-frame-pointer -fstack-protector-all -D_FORTIFY_SOURCE=2' \
CXXFLAGS='-O2 -march=pentium4 -mtune=pentium4 -fPIC -fPIE \
 -fomit-frame-pointer -fstack-protector-all -D_FORTIFY_SOURCE=2' \
CPPFLAGS=-I/dev/shm/libstemmer_c/include \
LDFLAGS='-L/dev/shm/libstemmer_c -z relro -z now' \
./configure --prefix=/usr --with-ssldir=/etc/ssl --localstatedir=/var \
--sysconfdir=/etc/dovecot --with-bzlib --with-libcap --with-lz4 \
--with-textcat --with-stemmer --with-sql=yes --with-cdb \
--with-shadow --with-libwrap --with-moduledir=/usr/lib/dovecot \
--with-icu --with-lucene --with-sqlite --with-sql=yes

Build fix patch (mismatching prototype): https://pastebin.com/GS3a2DPX
Quota Overflow Fix Patch: https://pastebin.com/gsSXmkz9

Dovecot configuration: https://pastebin.com/JX43feFw

Without the patch:

# doveadm quota get -u luser
Quota name  Type    Value   Limit   %
User quota  STORAGE 3365836 1305696 257
Group quota STORAGE 0       -       0

(All attempts to add mail to any folder fail with a quota error.)

With the patch:

# doveadm quota get -u luser
Quota name  Type  Value   Limit %
User quota  STORAGE 3364608 55061
Group quota STORAGE   0   - 0

Thanks,
=M=


Re: Minor patches for builds against ancient platforms

2017-06-13 Thread M. Balridge

Timo Sirainen inscribed:

Have you set mbox_very_dirty_syncs=yes? That should be helpful.


Oh, that sounded like a risky option.

I do have mbox_dirty_syncs enabled.

Are there still "safety checks" with the extra down-and-dirty sync option?

Joseph Tam-a-lyne wrote:
> doveadm user $user
>
> which will supply the second half: it will spit out the UID, GID, home
> and mail directories of a user as specified by dovecot's
> configuration.

Yes, that outputs the UID/GID/location of user mail, which can feed a 
tool to audit and/or change directory permissions to conform to 
expectations.
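
A minimal sketch of such an audit tool follows. It assumes the expected UID, GID and mail directory have already been copied from the "doveadm user" output and are passed in by hand; the function name and the "no access for others" rule are illustrative choices, not anything Dovecot prescribes.

```python
import os
import stat


def audit_mail_tree(root, uid, gid):
    """Return (path, problem) pairs for entries that don't match expectations."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            st = os.lstat(path)
            if stat.S_ISLNK(st.st_mode):
                continue  # skip symlinks; their targets need a separate pass
            if st.st_uid != uid:
                problems.append((path, "owner is %d, expected %d" % (st.st_uid, uid)))
            if st.st_gid != gid:
                problems.append((path, "group is %d, expected %d" % (st.st_gid, gid)))
            if st.st_mode & 0o007:
                problems.append((path, "accessible to others"))
    return problems
```

Something like audit_mail_tree("/home/luser/mail", 1000, 200) would then list the entries to fix; the numeric IDs here are placeholders. The sketch only reports and leaves any chown/chmod to the admin.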



This is a consequence of writing secure software: it employs least
privilege so that a fault will not result in someone being able to
mess around with someone else's mail (or indices).  GID can also
govern access to shared mailboxes.


Sure, sure, I understand the notion, as I aspire towards "least 
privilege necessary" designs in my own software. In this case, it seemed 
that the software was throwing an error when it failed to do something 
most unprivileged processes cannot do: change the group ownership of an 
object to a group of which you're not a member.
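
Whether that group change can succeed is easy to predict in advance: a non-root process may only hand a file to its effective GID or to one of its supplementary groups. The sketch below checks that condition for a directory; the function name is made up for illustration, and it says nothing about how Dovecot itself decides which group to use.

```python
import os


def can_chgrp_to_directory_group(directory):
    """True if this process could give a new file the directory's group."""
    dir_gid = os.stat(directory).st_gid
    if os.geteuid() == 0:
        return True  # root may assign any group
    # Otherwise the group must be one the process already belongs to.
    return dir_gid == os.getegid() or dir_gid in os.getgroups()
```

Run as the mail user against ~/mail, a False result would correspond to the "Operation not permitted" seen in the fchown() log entry.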


I would certainly want log entries, sure... but an outright failure when 
ownership/u+ permissions are otherwise supportive of the operation in 
question?


I appreciate the fact my questions (and Piltdown Box) are probably 
noising up your list, and yet you're still both giving me the time of day.


My thanks, once again,
=M=


Re: Minor patches for builds against ancient platforms

2017-06-14 Thread M. Balridge
I've gone through and recompiled sendmail (enabling HASFLOCK) and procmail
(disabling lockf()) to harmonise the locking strategies, as it seems various
authors of email software over the years have pontificated with great force
and wind about which locking strategy was truly FUBARed and which was not.  

Naturally, different authors came to different conclusions, whilst sparing no
small amount of verbiage to lash out against platforms which committed the
most heinous crimes, and those whose turds are manna from heaven.

I've settled on flock (and dot-lock for writes), since NFS isn't used on
yourcavesgotmail.com.
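
For anyone wondering what that pairing looks like in practice, here is a minimal sketch of an mbox append guarded by a dotlock plus flock(). It is an illustration under simplifying assumptions, not how sendmail, procmail or Dovecot implement it: there is no stale-lock handling and no "From " line escaping, and the retry counts are arbitrary.

```python
import fcntl
import os
import time
from contextlib import contextmanager


@contextmanager
def dotlock(path, retries=50, delay=0.1):
    """Hold path + '.lock', created atomically with O_EXCL, for the block."""
    lock_path = path + ".lock"
    for _ in range(retries):
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            break
        except FileExistsError:
            time.sleep(delay)  # someone else holds the dotlock; wait and retry
    else:
        raise TimeoutError("could not create " + lock_path)
    os.close(fd)
    try:
        yield lock_path
    finally:
        os.unlink(lock_path)


def append_message(mbox_path, message):
    """Append bytes to an mbox while holding both the dotlock and flock()."""
    with dotlock(mbox_path):
        with open(mbox_path, "ab") as mbox:
            fcntl.flock(mbox, fcntl.LOCK_EX)  # blocks until the lock is granted
            try:
                mbox.write(message)
                mbox.flush()
            finally:
                fcntl.flock(mbox, fcntl.LOCK_UN)
```

The point of harmonising is that every writer takes the same locks in the same order; a program that only uses fcntl() would not necessarily be held off by the flock() taken here.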

Since I have allowed a limited use of UW-imapd, which has Crispinisms (R.I.P.,
dear Mark) of its own, including an unyielding embrace of flock() over
fcntl(), and I was NOT going to jump through the many hoops to re-build that
janky code even if I could find the myriad patches I need to apply to do so, I
chose the course of least pain - which still managed to involve bone knives
being inserted under my fingernails all the same. (That I would willingly do
this to myself should give you legitimate concern for my sanity, never mind
permission to keep molesting your INBOXes.)

*BUT* in a variation of the aphorism "all things taste better with butter",
all email access on hardware from the Pleistocene is better with Dovecot:
faster, smoother, and more functional all-around.

Practically moribund webmail (Horde/IMP) users who had given up on it are
saying they've been able to use it for the first time in years.

I cannot help but recall the housewife from _The Castle_ who, when asked how
she made some "extraordinarily great" food item, always gave the same reply:
scooped it out of the tub.



Because of Timo's and others' hard work, all I had to do was scoop the code
out of the tarball and compile + install it.

Once I resolve the symlinked mailboxes issue, I'll be able to walk away from
this completely. (Prayze beasyllabub!)

Quoting Joseph Tam-Thank-You-Ma'am :

> > One last problem area is that many users have soft-links to mailboxes
> > located on a second drive, but these never appear in folder enumeration
> > lists or they appear grayed out in SeaMonkey/Thunderbird. 
>
> It works for me.  From what I see, the ownership of the symlink is
> ignored; it's the underlying file that counts.  Maybe a subscription
> issue?

I've tried changing how I symbolically linked the mailboxes, i.e., creating a
sub-directory that is symlinked into the user's mail/ directory versus
symbolically linking the mbox files themselves, etc. No dice. Permissions are
fine. I've even resorted to changing the index locking strategy, to no avail.

Whether in Horde/IMP or current SeaMonkey, the "top-level" (symbolic link
itself) shows up, but doesn't show any sub-folders of any kind.

Folders that are NOT symbolically linked work perfectly, and have various
levels of hierarchy that are selectable as expected. Nothing appears in the 
logs.

$ cd ~/mail
$ ls -l
-rw---  1  2411625 Dec 16 09:12 Dovecot
lrwxrwxrwx  1   21 Jun 13 18:01 OldMail -> /u2/usermail/luser
-rwx--  8 4096 Jan  1 12:09 "Open Source Projects"

I've (rm ~/mail/.subscriptions && touch ~/mail/.subscriptions) to flush any
subscriptions file issues.

The permissions seem fine. The dreaded pirate UW-IMAPD displays them without
incident of any kind.  When I have the user switch to UW-IMAPD under Horde,
for example, the folders are fully available as expected.

$ ls -l / | grep u2
drwxr-xr-x  16 root root 4096 Jun  8 18:20 u2

$ ls -l /u2
drwxr-x---  9 4096 Jun 14 17:07 usermail

$ ls -l /u2/usermail
drwx--  5 4096 Jun  7 01:11 luser

$ ls -l /u2/usermail/luser
drwx--  2 4096 Jun  7 01:11 OLD_INBOX
drwx--  2 4096 Jun  6 11:33 lists
drwx--  2 4096 Jun  7 01:11 saved-emails

$ ls -l /u2/usermail/luser/OLD_INBOX/
-rw---  1 367270796 Nov 18  2016 INBOX_2016_01_to_08

Is there a subtle interaction with mail_full_filesystem_access settings, or
similar that might be getting in the way?

Other data: there are fs quotas on / but not /u2.  That shouldn't matter, but
I will concede that I'm not a little ignorant about such things.

How might I go about further debugging this?  I've tried to manually doveadm
index those mailboxes, which doesn't give me any errors, but it also returns
far too quickly to give me the impression that it's done anything.  Same result.

When I issue IMAP commands to enumerate the mailboxes directly, I get:

$ telnet localhost 10143
127.0.0.1...
Connected to localhost.
Escape character is '^]'.
* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE
STARTTLS AUTH=PLAIN AUTH=DIGEST-MD5 AUTH=CRAM-MD5] IMAP ready.

A login luser **
A OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE
SORT SORT=DISPLAY THREAD=REFERENCES 

Re: Minor patches for builds against ancient platforms

2017-06-16 Thread M. Balridge
Quoting Joseph Tam :

> If this is output on the dovecot server itself so there's no mismatch
> in pathnames.  Have you checked whether the dovecot user can traverse
> all the way from / to /u2/usermail/luser/

The dovecot user, as in the "dummy" user dovecot uses for sandboxing, or the
UID of the user logged in via IMAP through dovecot?

No, the (dovecot user) doesn't have access to the director(ies), but the
logged-in users DO.  That's no different from the mail directories in /home/*,
though, which are 0700/owned-by-their-respective-users.

I have confirmed by using "su -" to the various UIDs that they can fully
access the mailboxes behind the symlinked directories.
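
The same check can be scripted rather than done by hand with "su -". The sketch below walks every directory from / down to the resolved target and reports those that a given UID and set of GIDs cannot search. It is a simplified model that looks only at the classic mode bits (no ACLs, no capabilities), the function name is invented for illustration, and it should be run by a user who can stat() every component.

```python
import os
import stat


def blocked_directories(path, uid, gids):
    """List directories on the way to `path` that (uid, gids) cannot search."""
    target = os.path.realpath(path)  # follow symlinks such as ~/mail/OldMail
    chain = []
    current = target
    while True:
        chain.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent

    blocked = []
    for directory in reversed(chain):
        st = os.stat(directory)
        if not stat.S_ISDIR(st.st_mode):
            continue  # the final component may be an mbox file
        if uid == 0:
            continue  # root bypasses permission checks
        if st.st_uid == uid:
            allowed = st.st_mode & stat.S_IXUSR
        elif st.st_gid in gids:
            allowed = st.st_mode & stat.S_IXGRP
        else:
            allowed = st.st_mode & stat.S_IXOTH
        if not allowed:
            blocked.append(directory)
    return blocked
```

An empty list for the logged-in user's UID and groups would suggest that plain permissions are not what stops the symlinked folders from being listed.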

> I'm thinking no .subscription would be better.

Done, but it makes no difference.

> Dovecot does have chroot-ing stuff that might impede symlink following:
> 
>   https://wiki.dovecot.org/Chrooting

I'm not running dovecot chrooted.  That's a bear of a very different species,
and one I'd not care to wrestle for this sort of setup. Maybe in a "vpopmail"
type of situation where dovecot only runs as a delegate UID where there are no
real system UIDs/GIDs for the users in question.

There's no dovecotian "AllowSymlink" analogue to Apache's FollowSymLinks
directive, I assume?  I scoured through the documentation, but didn't see
anything, but it's not the first time I've missed things in documentation.

> It seems you have more basic file access problems.

I suspect so, but it's a strange one, because (al)pine and UW-imapd have
accessed these mailboxes without any issues for many years, as much as it
comes as a shock that such decrepit software could ever be accused of
performing correctly.

> Nothing with verbose logging (set mail_debug = yes)?  Try the simple case
> 
>   ln -s /u2/usermail/luser/OLD_INBOX/INBOX_2016_01_to_08 \
>     ~luser/mail/testbox

Will do. My thanks, Joseph.

=M=


Re: Minor patches for builds against ancient platforms

2017-06-11 Thread M. Balridge
David "Show Me The Vintage!" McGuire wrote:

>   I for one am finding this thread extremely entertaining.  I have to
> wonder how you'd sound if you came across a machine that was actually
> OLD. ;)

Well, I am fond of "old" hardware, which may still be on the wrong side of the
New/Old divide for some of you: DECSYSTEM-20s and VAX 11/780s were the first
"big" systems I ran across that I admired a great deal as an assembly language
programmer and software engineer in general. 

[ IBM System/3X0 systems (and EBCDIC) were horrors, though POWER & PowerPC
were very interesting for me. ] My first assembly language was Z80, though,
which spoiled me for the more primitive CPUs like 6502.

I still maintain some respect for the old iron that could seemingly gracefully
handle 200+ users, many of them hammering the system with compile jobs,
Mandelbrot set "renders", or other geek-driven nonsense, without going off
into the weeds the way a 2017 system does when you do something trivial like
enumerate a large directory of files/folders recursively.

Joseph of Tam (I am) wrote:

> If all you're concerned about is dovecot dot-files, you can place the 
> indices somewhere else other than the user's filespace.

When I manually created the .subscriptions file(s) in the right places, the
error message went away, and the functionality seems to work in SeaMonkey 
clients.

I wonder if it's a combination of permissions (even though the mail
directories are all owned by their respective users), or dovecot settings... 
On that note, has anyone written a tool that "harmonises" users' mail
directories' permissions - ideally reading the dovecot configuration to assess
where *THE* mail directories are actually used by dovecot?  I was surprised by
the pickiness of the group ownership/permissions issues, though reflecting on
things, I can see why you'd at least want some logging by default for those
conditions.

His Timoness boomed:

> On 9 Jun 2017, at 5.03, M. Balridge <dove...@r.paypc.com> wrote:
>> 1) In src/lib/compat.h there is a definition for p(read|write) 
>> that conflicts with the one in /usr/include/unistd.h [...]
> 
> I don't know about this. Anyway, can't apply this patch since it 
> likely fails elsewhere.

Fair enough. I knew this was unlikely to be accepted for multiple reasons,
never mind a ferociously high potential-pain:reward ratio.

I'm happy to help in my insignificant way, re: the second patch.

Do many people use filesystem quotas with dovecot, do you think?

> I think it's just doing a lot of work on the mbox file itself 
> (reading/writing/rewriting). Would be nice of course if it logged 
> more information, but mbox format is a bit too legacy to spend 
> much time on improving.

I suspect the (heavy) use of procmail on Herr Frankbox is contributing to
either some lock "confusion" *OR* triggering dovecot to do "expensive" mbox
re-read/syncs or something?

There are mail-mulching scriptlets in the global procmail (tied to spamassassin
results). Some daintily direct the dross to the /dev/null paradise in the sky.
Some "consume" the mail and redirect them to one of two or three folders
within mail/, while the "rest" allow procmail to append it to INBOX. 

My question is:

Is there a smarter way to do the "delivery" so that the dovecot system is
"informed" of an append (or excision), obviating or at least reducing the need
to perform more costly re-syncs (or timeouts awaiting a lock break)?

I anticipate a thundering herd declaiming that procmail is the spawn of Satan,
Hitler, and He-who-shall-not-be-named. As someone who was responsible for much
of them (nearing 10 years ago, now), I don't disagree with that view.

I don't have the budget or mandate to bring slivers of Elysium to this
downtrodden backwater of technology. I would expect that any use of procmail
with dovecot's "special" mail storage formats would *REQUIRE* the use of
"deliver" or some other tool to properly incorporate new mail into a dovecot 
hive.

My thanks as always,
=M= 


Re: Retrieving mail from read-only mdbox

2017-05-31 Thread M. Balridge
Quoting Mark Moseley :

> I've tried using IMAP with mail_location pointed at the snapshot, but,
> though I can get a listing of emails in the mailbox, the fetch fails when
> dovecot can't write-lock dovecot.index.log.

I'm surprised that dovecot would even try to write-lock a write-protected
file/directory, though I can appreciate the situation where a file may be in a
directory that is writable by some UID other than the one dovecot is running as.

Is there an unsafe control over lock_method similar to Samba's fake oplocks
setting in Dovecot?

If anyone wants some good "horror" writing, a perusal of Jeremy Allison's
write-up on the schizophrenic approaches to file-locking is worthy of your time.

https://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2.html

=M=


Re: [trees-plugin] - Dovecot index gets corrupted, when using maildir and recievend and accessing mail at the same time

2018-08-11 Thread M. Balridge
Quoting Joseph Tam :

> Another privacy plugin that assumes the server operator is unmotivated or
> respects your privacy anyways, and won't just skim your password right off
> the top to look at your mail.  A vault with steel walls and a dirt floor.

*SIGH* As usual, you're right on the money, Joseph.

I used to let things like this "slide", but somewhat recently I've had some
clients badgering me to implement something like this. It takes longer than it
should to explain how pointless the exercise is.

Given that:

1) Email transactions, from submission, to delivery, to final reception by a
MUA, are done with plaintext contents. Those who want security, will undergo
the additional steps and hassles with using PGP to encrypt the contents,
providing the only demonstrably secure (against "Evil SysAdmins") means of
cloaking your content. The submission, delivery, and final reception is still
performed as "plaintext", albeit with an attachment that is encrypted, a
process done (and undone) by the ultimate endpoint clients.

2) Even if the "Evil SysAdmin" doesn't scribble all of the users' passphrases
into a log, it's trivial to capture mail with various tools, many of which were
hastily cobbled together during the fad of implementing Sarbanes-Oxley Act
(SOX) compliance on mail servers: tools like "milter-bcc" and friends, which
automatically clone all email submitted to or arriving through SMTP. It
doesn't matter if
your SMTP software implements 65,536 Jiggabyte Key Quantum-Computing-Resistant
crypto, when it has the decrypted contents in its spool.

I imagine this is an exercise in buzzword collection, and to be seen to be
"doing something" to improve security and/or privacy.

If privacy is desired, there are only end-to-end encryption/signature schemes
to ensure anything at all, and even there we're at the mercy of mathematical
gods greater than we.

Looking to a "magical" oracle on your server to do it for you, whilst keeping
all of the leaky, plaintext, and promiscuous protocols (DSN, bounces,
intermediate MXer hosts that eruct contents to various envelope addresses,
etc) that will betray you behind your back without a moment's notice is a
Fool's Errand.

Think it over.

=M=



Re: Storing Messages in the cloud

2018-07-10 Thread M. Balridge
Quoting Dave McGuire :

> On 07/10/2018 09:23 AM, dcl...@list.jmatt.net wrote:
> >> A colleague asked me if it was possible for Dovecot to store messages
> >> in the cloud.
> >
> > Does he have a more specific description of what he wants than "in the
> > cloud", or does he just like using buzzwords? - From a user perspective,
> > I would say that Dovecot, or any other IMAP server, already stores
> > messages "in the cloud".  They are on a remote server, accessible from
> > any location by any device that has a functional IMAP client.
> 
>   I'm glad someone else said it. ;)  When I read the OP's message I just
> sat there shaking my head.  As for his colleague, the local McDonald's
> is hiring.

While I chuckled as well, I imagine what Jerry might be asking is how to use a
remote file storage protocol or even a key-value storage backend, or something
else that simply provides for remote back-end storage.

I suppose this is for those shops who do not want to spill any potentially
confidential data - it's easy to store all of the data remotely with symmetric
encryption so it's opaque to the remote provider - but do not or cannot deal
with tasks and maintenance of backing up and maintaining the data and storage
equipment to support its operations.

I also imagine the hard-working crew at Timo, Inc. have such things for sale
to facilitate this with security and good performance, but they won't be
inexpensive.

See: http://www.dovecot.fi/products/

"Dovecot Object Storage" is probably what you're asking about, but you know
the drill with software that doesn't have displayed pricing: you probably
cannot afford it.

=M=



Re: Solr

2019-01-02 Thread M. Balridge


> The main problem is : After some time of indexing from Dovecot, Dovecot
> returns errors (invalid SID, etc...) and Solr return "out of range
> indexes" errors

I've been watching the progress of this thread with no small concern, mainly
because I've been tasked with providing a server-side email search facility
with a budget and manpower level that comes down to mainly *1*, i.e., me.

I was expecting, given the strongly worded language about "just use
lucene/SOLR" and "ignore squat", that I should invest time + effort into this
JAVA nightmare that is SOLR.

I started with squat and another word-indexor system that used out-of-band
(not a dovecot plugin) software to provide rapid (sub-second) searches through
tens-of-GB-scale mailboxes.

Unlike what I was led to believe, the squat indexes worked surprisingly well,
once you sorted out the odd resource-size (ulimit-related) limitations (vsz &
friends). I did notice that the "worst-case" search performance had worryingly
high O(x) increases in time, but I'd not seen anything that was a
dealbreaker. It goes without saying that various substring searches worked as
expected, for the most part.

My experiences with SOLR were similar to Messr. Moreau's: lots of startup
errors with provided schemata files. Lots of JAVA nonsense issues. Lots of
sensitivity to WHICH Java runtime, etc, etc. I finally fixated a specific JVM,
version of SOLR, and dovecot to find the "best" working combination, only to
find that the searches didn't work out as expected. I expected to be able to
do date-ranging based searches. Didn't work. I expected to search CONTENTS of
emails, and despite many days of tweaks, I couldn't get it to index even the
basics like filenames/types of attachments, so I could expose
attachment-based searching to my users.

So, without rancour or antipathy, I ask the entire list: has ANYONE gotten a
Dovecot/solr-fts-plugin setup to work that provides as a BASELINE, all of the
following functionality:

1) The ability to search for a string within any of the structured fields
(from/subject) that returns correct results?

2) The ability to search for any string within the BODY of emails, including
the MIME attachment boundaries?

3) The ability to do "ranging" searches for structures within emails that
decompose to "dates" or other simple-numeric data?

OPTIONALLY, and this is probably way outside of the scope of the above,
despite the fact that it's listed as a "selling point" of SOLR versus other
full text search engines:

4) The ability to do searches against any attachments that are able to be
post-processed and hyper-indexed by SOLR+Tika?
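
To make points 1) to 3) concrete, this is roughly how those searches look at the IMAP level, using the standard RFC 3501 search keys (FROM, SUBJECT, BODY, SINCE, BEFORE). The helper names are made up for this sketch, and it only builds the criteria string; whether a given key is answered by the FTS backend or by Dovecot itself is something to verify against your own setup.

```python
import datetime

_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(day):
    """Format a date as RFC 3501 expects, e.g. 1-Jan-2019 (locale-independent)."""
    return "%d-%s-%d" % (day.day, _MONTHS[day.month - 1], day.year)


def imap_quote(text):
    """Wrap text in an IMAP quoted string, escaping backslash and double quote."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_search(sender=None, subject=None, body=None, since=None, before=None):
    """Combine criteria into one SEARCH argument; IMAP ANDs the keys together."""
    parts = []
    if sender:
        parts += ["FROM", imap_quote(sender)]
    if subject:
        parts += ["SUBJECT", imap_quote(subject)]
    if body:
        parts += ["BODY", imap_quote(body)]
    if since:
        parts += ["SINCE", imap_date(since)]
    if before:
        parts += ["BEFORE", imap_date(before)]
    return " ".join(parts) if parts else "ALL"


print(build_search(subject="quota", since=datetime.date(2018, 1, 1),
                   before=datetime.date(2019, 1, 1)))
# SUBJECT "quota" SINCE 1-Jan-2018 BEFORE 1-Jan-2019
```

The resulting string can be handed to Python's imaplib as the criterion of IMAP4.search(None, ...) after selecting a mailbox, which makes it easy to compare results with and without an FTS plugin enabled. Quoted strings only suit plain ASCII terms without line breaks.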

-

SOLR seems to have "brand cachet", so presumably it actually works (for 
somebody).

Dovecot has not a little "brand cachet", and for me, I have innate faith and
trust in Timo and his software. I am no stranger to the "costs" of "free"
software, in that you sacrifice your own blood, sweat, and tears just to get
these disparate pieces to work together.

I *DO* respect that Timo has to keep the lights (and sauna) on in Finland.
Maybe there's a super-secret (no advertised prices, "carrier-only" price list)
with _Dovecot, Oy_ wherein the above ARE actually available for something less
than 6.022 x 10^23 Euros per centi-second of licencing fees.

But please, level with us faithful users.  Does this morass of Java B.S.
actually work, and if not, please just deprecate and remove this moribund
software, and stop trying to bury the only FTS plugin many of us HAVE actually
gotten to work.  (Pretty please?)

I respect that Messr. Moreau has made an earnest effort to get this JAVA B.S.
to actually work, as I have. 

He persevered where I'd given up. He's vocal about it, and now I'm chiming in
that this ornate collection of switchblades only cuts those who try to use them.

Respectfully,
=M=



Re: Mailing list address harvested for spamming

2018-12-01 Thread M. Balridge
Quoting dovecot-...@deemzed.uk:

> Not to stir the pot, but I notice my email address has recently been
> harvested from this list for spamming purposes. This email address is
> unique and not used for anything else.
> 
> I'd distinguish this from spam sent to the mailing list itself, which is
> obviously different.
> 
> Is there anything further that could be done to prevent this?

It's practically impossible to "police" everyone who signs up for a mailing
list to ensure they do so with honest or constructive intentions. In addition,
copies of this mailing list are archived by various online search engines and
indexors, from content maintained or published by the list operators.

You're already using unique mail addresses, which is a sensible strategy, and
one I use myself. In fact, I use a scheme whereby I don't need to change or
update any back-end settings to deal with a multitude of unique and ad-hoc
specified addresses for every vendor/supplier and interaction point I deal with.

In short, if you use a public mailing list, expect that the address you use
for it will be discovered and abused by the nefarious marketeers of the High
Bit Seas.

Cordially,
=Malcky=



Re: Solr connection timeout hardwired to 60s

2019-04-04 Thread M. Balridge via dovecot


> I'm a denizen of the solr-u...@lucene.apache.org mailing list.
> [...]
> Here's a wiki page that I wrote about that topic.  This wiki is going
> away next month, but for now you can still access it:
> 
> https://wiki.apache.org/solr/SolrPerformanceProblems

That's a great resource, Shawn.

I am about to put together a test case to provide a comprehensive FTS setup
around Dovecot with a goal towards exposing proximity keyword searching, with
email silos containing tens of terabytes (most of the "bulk" is represented by
attachments, each of which get processed down to plaintext, if possible).
Figure thousands of users with decades of email (80,000 to 750,000 emails per
user).

My main background is in software engineering (C/C++/Python/Assembler), but I
have been forced into system admin tasks during many stretches of my work. I
do vividly remember the tedium of dealing with JAVA and GC, tuning it to avoid
stalls, and its ravenous appetite for RAM. 

It looks like those problems are still with us, many versions later.  For
corporations with infinite budgets, throwing lots of crazy money at the
problem is "fine" (>1TB RAM, all PCIe SSDs, etc), but I am worried that I will
be shoved forcefully into a wall of having to spend a fortune just to keep FTS
performing reasonably well before I even get to the 10,000 user mark.

I realise the only way to keep performance reasonable is to heavily shard the
index database, but I am concerned about how well the process works in
practice without needing a great deal of sysadmin hand-holding. I would
ideally prefer the decisions of how/where to shard be based on
volume/heuristics than something that is done manually. I realise that a human
will be necessary to add more hardware to the pools, but what are my options
for scaling the system by orders of magnitude?

What is a general rule of thumb for RAM and SSD disk requirements as a
fraction of indexed document hive size to keep query performance at 200ms or
less? How do people deal with the JAVA GC world-stoppages, other than simply
doubling or tripling every instance?

I am wondering how well alternatives to Solr work in these situations
(ElasticSearch, Xapian, and any others I may have missed).

Regards,

=M=