Re: Solr -> Xapian ?

2019-01-22 Thread Joan Moreau via dovecot
 
Q1: get_last_uid -> Is this the last UID indexed (which may not be the greatest
value), or the greatest value (which may not be the latest)? (The code of
existing plugins is unclear about this; Solr looks for the greatest, for instance.)


All the mails are always supposed to be indexed from the beginning to the last 
indexed mail. If there's a gap, indexer first indexes all the missing mails. So 
the latest UID is supposed to be the greatest UID. (Supporting out-of-order 
indexing would be rather difficult to keep track of.)


Q2: When indexing an email, the data is not passed by "build_key". Why so? What is the
link with "build_more"?


The idea is that it calls something like:

- build_key(type=hdr, hdr_name=From)
- build_more("t...@iki.fi")
- build_key(type=hdr, hdr_name=Subject)
- build_more("Re: Solr -> Xapian ?")
- build_key(type=body_part)
- build_more("message body piece")
- build_more("message body piece2")
...
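A minimal sketch of how a backend might consume that call sequence, assuming a toy in-memory document. Every name here (sketch_build_key, sketch_build_more, struct sketch_doc) is hypothetical and is not Dovecot's fts-api; the point is only that build_key() carries no message text and just selects the field, while each following build_more() appends one chunk to that field.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins, not Dovecot's declarations: they only mirror
   the build_key()/build_more() call order described above. */
enum sketch_key_type { SKETCH_KEY_HDR, SKETCH_KEY_BODY_PART };

struct sketch_key {
	enum sketch_key_type type;
	const char *hdr_name; /* used only for SKETCH_KEY_HDR */
};

#define SKETCH_MAX_FIELDS 8

struct sketch_field {
	char name[32];
	char text[256];
};

struct sketch_doc {
	struct sketch_field fields[SKETCH_MAX_FIELDS];
	int count;
};

/* build_key: open a new field for a header or body part. No message
   text arrives here; it only says what the following data belongs to. */
static int sketch_build_key(struct sketch_doc *doc,
			    const struct sketch_key *key)
{
	struct sketch_field *f;

	if (doc->count >= SKETCH_MAX_FIELDS)
		return -1;
	f = &doc->fields[doc->count++];
	snprintf(f->name, sizeof(f->name), "%s",
		 key->type == SKETCH_KEY_HDR ? key->hdr_name : "body");
	f->text[0] = '\0';
	return 0;
}

/* build_more: append one chunk of text to the field opened by the most
   recent build_key. It may be called several times per key. */
static int sketch_build_more(struct sketch_doc *doc, const char *data)
{
	struct sketch_field *f;
	size_t used, len;

	if (doc->count == 0)
		return -1; /* no build_key has been called yet */
	f = &doc->fields[doc->count - 1];
	used = strlen(f->text);
	len = strlen(data);
	if (used + len >= sizeof(f->text))
		return -1; /* this sketch does not grow its buffers */
	memcpy(f->text + used, data, len + 1);
	return 0;
}

/* Helper for inspecting the result: the text collected for a field. */
static const char *sketch_field_text(const struct sketch_doc *doc,
				     const char *name)
{
	int i;

	for (i = 0; i < doc->count; i++) {
		if (strcmp(doc->fields[i].name, name) == 0)
			return doc->fields[i].text;
	}
	return NULL;
}
```

Calling build_key(From), build_more(address), build_key(body), build_more(piece), build_more(piece2) leaves two fields, with both body pieces concatenated under "body".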


Q3: Searching/lookup: the header to look in (which must at least cover "cc,
to, from, subject, body") does not appear in the 'struct' data. Where can I find it?


lookup() gets struct mail_search_arg *args, which contains the entire IMAP 
SEARCH query. This could be used for more or less complex query builders.

In case of a single header search, you should have args->args->hdr_field_name contain 
the header name and args->args->value.str contain the content you're searching for.
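As an illustration, assuming a cut-down hypothetical struct that carries only the two fields named above, a single header search could be turned into a "field:term" string for the index. The struct, the enum and the query string format are all made up for this sketch; a real Xapian backend would build whatever its own term prefixes require.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, cut-down mirror of the two fields mentioned above; the
   real struct mail_search_arg has many more members. */
enum sketch_arg_type { SKETCH_SEARCH_HEADER, SKETCH_SEARCH_BODY };

struct sketch_search_arg {
	enum sketch_arg_type type;
	const char *hdr_field_name; /* set for header searches */
	struct {
		const char *str;    /* the text being searched for */
	} value;
};

/* Turn one search argument into a "field:term" string. Returns -1 on
   error or if out[] is too small. */
static int sketch_arg_to_query(const struct sketch_search_arg *arg,
			       char *out, size_t out_size)
{
	char field[64];
	const char *src;
	size_t i;
	int n;

	if (arg->type == SKETCH_SEARCH_HEADER)
		src = arg->hdr_field_name;
	else if (arg->type == SKETCH_SEARCH_BODY)
		src = "body";
	else
		return -1;
	if (src == NULL || arg->value.str == NULL)
		return -1;

	/* Header names are case-insensitive, so fold them to lower case.
	   Names longer than 63 bytes are truncated in this sketch. */
	for (i = 0; src[i] != '\0' && i + 1 < sizeof(field); i++)
		field[i] = (char)tolower((unsigned char)src[i]);
	field[i] = '\0';

	n = snprintf(out, out_size, "%s:%s", field, arg->value.str);
	return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```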


Q4: Refresh: this is very unclear. How could there not be the "latest"
view of the index? What is the real meaning of this function?


In case of Xapian it might not matter if it automatically refreshes its indexes 
between each query. But with some other indexes this could happen:

- IMAP session is opened
- IMAP SEARCH is run, which opens and searches the index
- a new mail is delivered to the mailbox and indexed
- IMAP SEARCH is run. Without refresh() it doesn't see the newly indexed mail 
and doesn't include it in the search results.
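The scenario above can be simulated with a reader that keeps the snapshot it opened until refresh() is called. This is a hypothetical standalone model (struct sketch_index and struct sketch_reader are invented here), not any backend's code. For a Xapian backend, Xapian::Database::reopen() looks like the natural call to make inside refresh(), though that is an assumption rather than something stated in this thread.

```c
#include <stdint.h>

/* Hypothetical on-disk index. The indexer process bumps `generation`
   each time it commits newly indexed mails. */
struct sketch_index {
	uint32_t generation;
	uint32_t last_uid;
};

/* An IMAP process's open view of the index: a snapshot taken when it
   was opened, which does not change by itself. */
struct sketch_reader {
	const struct sketch_index *index;
	uint32_t generation;
	uint32_t visible_last_uid;
};

static void sketch_reader_open(struct sketch_reader *r,
			       const struct sketch_index *index)
{
	r->index = index;
	r->generation = index->generation;
	r->visible_last_uid = index->last_uid;
}

/* Done by the indexer, e.g. after a new mail was delivered. */
static void sketch_index_commit(struct sketch_index *index, uint32_t uid)
{
	index->last_uid = uid;
	index->generation++;
}

/* refresh(): re-open the snapshot only if the index changed since it
   was taken. Returns 1 if the view was updated, 0 if already current. */
static int sketch_reader_refresh(struct sketch_reader *r)
{
	if (r->generation == r->index->generation)
		return 0;
	r->generation = r->index->generation;
	r->visible_last_uid = r->index->last_uid;
	return 1;
}

/* Whether a search through this reader can see the given UID. */
static int sketch_reader_sees(const struct sketch_reader *r, uint32_t uid)
{
	return uid <= r->visible_last_uid;
}
```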


Q5: Rescan: is it just about removing all indexes for a specific mailbox?


It's run when "doveadm fts rescan" is run manually. Usually that's only run 
manually to fix up some brokenness. So it's intended to verify that the current mailbox 
contents match the FTS indexes:
- If there are any mails in FTS index that no longer exist in the actual 
mailbox, delete those mails from FTS
- If FTS is missing any mails in the middle of the mailbox, make sure that the 
next mailbox indexing will index those missing mails. I think currently this 
basically means reindexing all the mails since the first missing mail, even the 
mails that are already in the index.

fts-lucene implements this, but other FTS backends are lazy and simply rebuild 
all mails. Actually fts-solr is bad because it doesn't even delete the extra 
mails.
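The two rescan rules above can be sketched as a merge of two sorted UID lists. This is a hypothetical standalone function, not fts-lucene's code: it collects the UIDs that exist only in the FTS index (to be deleted) and, if any mail is missing, reports a last UID just below the first gap, so that the next indexing run re-feeds everything from there, including mails after the gap that are already indexed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output of a rescan: which UIDs to delete from the FTS
   index and which value get_last_uid() should report afterwards. */
struct sketch_rescan_result {
	size_t expunge_count;  /* entries written to expunge[] */
	uint32_t new_last_uid; /* indexing resumes at new_last_uid + 1 */
};

/* mbox[] (UIDs existing in the mailbox) and fts[] (UIDs present in the
   FTS index) must both be sorted ascending without duplicates.
   expunge[] must have room for fts_count entries. */
static struct sketch_rescan_result
sketch_rescan(const uint32_t *mbox, size_t mbox_count,
	      const uint32_t *fts, size_t fts_count, uint32_t *expunge)
{
	struct sketch_rescan_result res = { 0, 0 };
	uint32_t first_missing = 0; /* 0 = nothing missing */
	uint32_t last_common = 0;
	size_t i = 0, j = 0;

	while (i < mbox_count || j < fts_count) {
		if (j == fts_count || (i < mbox_count && mbox[i] < fts[j])) {
			/* In the mailbox but not in FTS. */
			if (first_missing == 0)
				first_missing = mbox[i];
			i++;
		} else if (i == mbox_count || fts[j] < mbox[i]) {
			/* In FTS but no longer in the mailbox: delete it. */
			expunge[res.expunge_count++] = fts[j];
			j++;
		} else {
			/* Present in both. */
			last_common = fts[j];
			i++;
			j++;
		}
	}
	/* Reindex everything from the first missing mail onwards. */
	res.new_last_uid = first_missing != 0 ? first_missing - 1 : last_common;
	return res;
}
```

With mailbox UIDs 1..6 and FTS UIDs {1, 2, 4, 5, 9}, UID 9 is expunged and the reported last UID drops to 2, so UIDs 3..6 are fed again.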

Q6: lookup_multi: isn't the function the same for all plugins (see below)?
And finally, for fts_backend__lookup_multi, why is that backend-dependent?


This function is called only when searching in virtual folders. So for
example the virtual "All mails" folder, which would contain all mails in
all folders. In that case the boxes[] would contain a list of user's all
folders, except Trash and Spam. If lookup_multi() isn't implemented
(left to NULL), the search is run separately via lookup() for each
folder. With lookup_multi() there can be just one lookup, and the
backend can filter only the wanted folders and return them directly. So
it's an optimization for FTS indexes that support user-global searches
rather than only per-folder searches.


static int
fts_backend_xapian_lookup_multi(struct fts_backend *_backend,
				struct mailbox *const boxes[],
				struct mail_search_arg *args,
				enum fts_lookup_flags flags,
				struct fts_multi_result *result)
{
	unsigned int i;

	/* Run the single-folder lookup() once per mailbox in the
	   NULL-terminated boxes[] array. This assumes result->box_results
	   already holds one struct fts_result per mailbox; allocating it
	   is not shown here. */
	for (i = 0; boxes[i] != NULL; i++) {
		if (fts_backend_xapian_lookup(_backend, boxes[i], args, flags,
					      &result->box_results[i]) < 0)
			return -1;
	}
	return 0;
}


See fts_backend_lookup_multi() - if you leave lookup_multi=NULL it
basically does this.
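By contrast, the optimization described earlier, where a backend with one user-global index answers a virtual-folder search in a single pass, could look roughly like this. The types (struct sketch_hit, struct sketch_box_result) and the fixed-size result buffer are hypothetical; this is not how any existing plugin stores its data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hit from a single per-user index: the folder a matching
   mail lives in and its UID there. */
struct sketch_hit {
	uint32_t box_id;
	uint32_t uid;
};

#define SKETCH_MAX_UIDS 32

/* One result slot per wanted folder; box_id is filled in by the caller. */
struct sketch_box_result {
	uint32_t box_id;
	uint32_t uids[SKETCH_MAX_UIDS];
	size_t count;
};

/* One pass over the user-global hits, keeping only the folders listed
   in results[] (for example, every folder except Trash and Spam).
   Returns -1 if a result slot overflows. */
static int sketch_lookup_multi(const struct sketch_hit *hits, size_t n_hits,
			       struct sketch_box_result *results,
			       size_t n_results)
{
	size_t i, k;

	for (k = 0; k < n_results; k++)
		results[k].count = 0;
	for (i = 0; i < n_hits; i++) {
		for (k = 0; k < n_results; k++) {
			if (results[k].box_id != hits[i].box_id)
				continue;
			if (results[k].count == SKETCH_MAX_UIDS)
				return -1;
			results[k].uids[results[k].count++] = hits[i].uid;
			break;
		}
	}
	return 0;
}
```

Hits from a folder that is not listed (say box 2 standing in for Trash) are simply skipped.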


For "rescan" and "optimize", wouldn't it be the Dovecot core that indicates
which mails are to be dismissed (expunged), or asks again for a particular UID (or all of them) to be indexed? Why would
the backend be aware of the transactions on the mailbox?


rescan() is about fixing up a more or less broken index, or simply to
verify that it's all ok. So core doesn't know what messages exist in the
FTS index and can't request specific reindexing or expunging. I guess an
alternative API could have been to have functions that iterate through
all mails in the index, and use that to implement rescan in core. Now
thinking about it, that sounds like a simpler and better way.

optimize() is currently done only when explicitly running "doveadm fts
optimize", which requests running a slower index optimization. 

Re: Solr -> Xapian ?

2019-01-13 Thread Timo Sirainen
On 13 Jan 2019, at 10.45, Joan Moreau via dovecot  wrote:
> 
> Now I can see in the logs that Dovecot calls
> fts_backend_xapian_update_set_mailbox with box == NULL several times. Why?
> 
fts-api.h says:

/* Switch to updating the specified mailbox. box may also be set to NULL to
   make sure the previous mailbox won't tried to be accessed anymore. */
void fts_backend_update_set_mailbox(struct fts_backend_update_context *ctx,
struct mailbox *box);

So it's just telling you that you can close/free any stuff related to that 
mailbox.
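A minimal sketch of what handling box == NULL could look like, assuming the backend keeps some per-mailbox state in its update context. The structs are hypothetical stand-ins, not Dovecot's struct mailbox or update context; and while the declaration quoted above returns void, this sketch returns int so it can report an allocation failure.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, just enough to show the state handling. */
struct sketch_mailbox {
	const char *name;
};

struct sketch_update_ctx {
	const struct sketch_mailbox *box; /* mailbox being updated, or NULL */
	char *box_state;                  /* per-mailbox resources */
};

/* Release everything tied to the current mailbox. */
static void sketch_release_box(struct sketch_update_ctx *ctx)
{
	free(ctx->box_state);
	ctx->box_state = NULL;
	ctx->box = NULL;
}

/* Switch to `box`. box == NULL only means "stop using the previous
   mailbox", so per-mailbox state is freed and nothing new is opened. */
static int sketch_update_set_mailbox(struct sketch_update_ctx *ctx,
				     const struct sketch_mailbox *box)
{
	size_t len;

	if (ctx->box == box)
		return 0;
	sketch_release_box(ctx);
	if (box == NULL)
		return 0;

	len = strlen(box->name);
	ctx->box_state = malloc(len + 1);
	if (ctx->box_state == NULL)
		return -1;
	memcpy(ctx->box_state, box->name, len + 1);
	ctx->box = box;
	return 0;
}
```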
>> Additionally, my logic is that the backend stores one database per mailbox
>> in /xapian-indexes (in the "root" dir of the user); the name of the database
>> is the GUID of the mailbox.
>> 
>> For INBOX, that works perfectly: the database is properly created and the
>> backend starts indexing all emails.
>> 
>> For other folders, somehow, the process cannot access that (root) folder.
>> 
>> Am I missing something?
>> 

This is a bit ambiguous, because some people mean mailbox=folder and others 
mean mailbox=user account, and GUID can also be the internal Dovecot folder 
GUID, or a GUID of the user.

I'd recommend using a single database per user anyway.



Re: Solr -> Xapian ?

2019-01-13 Thread Joan Moreau via dovecot
Because fts_squat is set to be deleted.

Xapian and similar libraries offer a very easy interface for FTS.

(and basically, I have done it already) 


On 2019-01-07 18:31, Michael Slusarz wrote:

Maybe a dumb question (I admit I haven't followed this thread very closely)... 

But why are you writing a new FTS driver?  If squat allegedly does everything you need it to do, why don't you just take that plugin and fix it up to do what you need?  That seems way easier than trying to create a FTS driver from scratch. 

michael 

On January 7, 2019 at 7:05 AM Joan Moreau via dovecot  wrote: 

Hi 

Can anyone answer specifically? 

Q1: get_last_uid -> Is this the last UID indexed (which may not be the greatest value), or the greatest value (which may not be the latest)? (The code of existing plugins is unclear about this; Solr looks for the greatest, for instance.) 

Q2: When indexing an email, the data is not passed by "build_key". Why so? What is the link with "build_more"? 

Q3: Searching/lookup: the header to look in (which must at least cover "cc, to, from, subject, body") does not appear in the 'struct' data. Where can I find it? 

Q4: Refresh: this is very unclear. How could there not be the "latest" view of the index? What is the real meaning of this function? 

Q5: Rescan: is it just about removing all indexes for a specific mailbox? 


Q6: lookup_multi: isn't the function the same for all plugins (see below)?

Re: Solr -> Xapian ?

2019-01-13 Thread Joan Moreau via dovecot

I found the solution to this using
seq_range_array_add(&result->definite_uids, uid); 
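ARRAY_TYPE(seq_range) holds ranges of UIDs (struct seq_range, with seq1 and seq2), not one UID per slot, which is most likely why writing a single uint32_t into slot i did not produce usable results, while seq_range_array_add() inserts a UID and merges it into the ranges. A standalone sketch of that merging behaviour, with a hypothetical fixed-size array in place of Dovecot's dynamic one:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as Dovecot's struct seq_range: an inclusive UID range. */
struct sketch_seq_range {
	uint32_t seq1, seq2;
};

#define SKETCH_MAX_RANGES 64

/* Hypothetical fixed-size stand-in for ARRAY_TYPE(seq_range). */
struct sketch_range_array {
	struct sketch_seq_range r[SKETCH_MAX_RANGES];
	size_t count;
};

/* Add one UID, keeping the ranges sorted, non-overlapping and merged
   when adjacent. Returns -1 if the fixed-size array is full. */
static int sketch_seq_range_add(struct sketch_range_array *a, uint32_t uid)
{
	size_t i = 0;

	/* Skip ranges ending at least two below uid; they can't absorb it. */
	while (i < a->count && a->r[i].seq2 < uid && uid - a->r[i].seq2 > 1)
		i++;

	if (i < a->count && uid >= a->r[i].seq1 && uid <= a->r[i].seq2)
		return 0; /* already present */

	if (i < a->count && uid == a->r[i].seq2 + 1) {
		/* Extend range i upwards; merge with range i+1 if they touch. */
		a->r[i].seq2 = uid;
		if (i + 1 < a->count && a->r[i + 1].seq1 == uid + 1) {
			a->r[i].seq2 = a->r[i + 1].seq2;
			memmove(&a->r[i + 1], &a->r[i + 2],
				(a->count - i - 2) * sizeof(a->r[0]));
			a->count--;
		}
		return 0;
	}

	if (i < a->count && uid + 1 == a->r[i].seq1) {
		a->r[i].seq1 = uid; /* extend range i downwards */
		return 0;
	}

	/* Insert a new single-UID range at position i. */
	if (a->count == SKETCH_MAX_RANGES)
		return -1;
	memmove(&a->r[i + 1], &a->r[i], (a->count - i) * sizeof(a->r[0]));
	a->r[i].seq1 = uid;
	a->r[i].seq2 = uid;
	a->count++;
	return 0;
}
```

Adding 5, 3, 4, 10, 1, 2 in that order ends up as just two ranges, 1-5 and 10-10.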


Now I can see in the logs that Dovecot calls
fts_backend_xapian_update_set_mailbox with box == NULL several times. Why? 

Thank you 


On 2019-01-12 21:40, Joan Moreau via dovecot wrote:

I somehow fixed the folder issue (it seems to have been Unix permissions, after too many tests).

Getting back to the "fts_results" structure: 

I am trying: 


i_array_init(&(result->definite_uids),r->size);
i_array_init(&(result->maybe_uids),0); 


uint32_t uid;
for(i=0;i<r->size;i++)
{
  try
  {
    uid=atol(backend->dbr->get_document(r->data[i]).get_value(1).c_str());
    i_warning("Result UID=%d",uid);
    array_idx_set(&(result->definite_uids),i,&uid);
  }
  catch(Xapian::Error e)
  {
    i_warning(e.get_msg().c_str());
  }
} 

I can see in the log that the UIDs are properly found in the Xapian database, but no results are transmitted to Dovecot or to the IMAP client (Roundcube in my case).

Help please :) 


Re: Solr -> Xapian ?

2019-01-12 Thread Joan Moreau via dovecot

I somehow fixed the folder issue (it seems to have been Unix permissions, after too many
tests). 

Getting back to the "fts_results" structure: 

I am trying: 


i_array_init(&(result->definite_uids),r->size);
i_array_init(&(result->maybe_uids),0); 


uint32_t uid;
for(i=0;i<r->size;i++)
{
  try
  {
    uid=atol(backend->dbr->get_document(r->data[i]).get_value(1).c_str());
    i_warning("Result UID=%d",uid);
    array_idx_set(&(result->definite_uids),i,&uid);
  }
  catch(Xapian::Error e)
  {
    i_warning(e.get_msg().c_str());
  }
} 


I can see in the log that the UIDs are properly found in the Xapian database, but
no results are transmitted to Dovecot or to the IMAP client (Roundcube
in my case).

Help please :) 


On 2019-01-12 18:15, Joan Moreau wrote:

Additionally, my logic is that the backend stores one database per mailbox in /xapian-indexes (in the "root" dir of the user); the name of the database is the GUID of the mailbox. 

For INBOX, that works perfectly: the database is properly created and the backend starts indexing all emails. 

For other folders, somehow, the process cannot access that (root) folder. 

Am I missing something? 


Re: Solr -> Xapian ?

2019-01-12 Thread Joan Moreau via dovecot

Additionally, my logic is that the backend stores one database per
mailbox in /xapian-indexes (in the "root" dir of the user); the name of
the database is the GUID of the mailbox. 


For INBOX, that works perfectly: the database is properly created and
the backend starts indexing all emails. 


For other folders, somehow, the process cannot access that (root)
folder. 

Am I missing something? 


On 2019-01-12 17:37, Joan Moreau wrote:

Thank you 

Now, for the results 

I see the member of fts_result is: 


ARRAY_TYPE(seq_range) definite_uids;

I have the UIDs as an array of uint32_t * 

How do I put my UIDs into this "definite_uids"? Obviously this is not a simple array/pointer. How do I say something similar to result->definite_uids[1]=my_uid? 


Re: Solr -> Xapian ?

2019-01-12 Thread Joan Moreau via dovecot
Thank you 

Now, for the results 

I see the member of fts_result is: 


ARRAY_TYPE(seq_range) definite_uids;

I have the UIDs as an array of uint32_t * 


How do I put my UIDs into this "definite_uids"? Obviously this is not a
simple array/pointer. How do I say something similar to
result->definite_uids[1]=my_uid? 


On 2019-01-12 10:25, Timo Sirainen wrote:

On 11 Jan 2019, at 21.23, Joan Moreau via dovecot  wrote: 


The below patch resolves the compilation error

$ diff -p compat.h compat.h.joan 
*** compat.h 2019-01-11 20:21:00.726625427 +0100

--- compat.h.joan 2019-01-11 20:14:41.729109919 +0100
*** struct iovec;
*** 202,207 
--- 202,211 
ssize_t i_my_writev(int fd, const struct iovec *iov, int iov_len);
#endif

+ #ifdef __cplusplus
+ extern "C" {
+ #endif


You should put this extern "C" into the C++ file you're creating. See for 
example how fts-lucene/lucene-wrapper.cc does this.


1 - What does "subargs" represent in mail_search_args?


It's set only for SEARCH_OR and SEARCH_SUB. So for example:

SEARCH TEXT foo TEXT bar TEXT baz

results in:

type=SEARCH_SUB
value.subargs = (
{ type=SEARCH, value.str="foo" },
{ type=SEARCH, value.str="bar" },
{ type=SEARCH, value.str="baz" },
)

Or similarly if there's SEARCH OR foo OR TEXT bar TEXT baz or some other 
combination of OR/ANDs.


2 - For rescan: who is responsible for passing the new emails again? Is
the Dovecot core sending all the emails to index again, or shall the FTS
backend somehow access the mailbox and read all emails? Wouldn't just
saying "delete all indexes and get_last_uid is now 0" be the easy way? Or
must the FTS backend process all emails (and block the current thread, as a
mailbox may be quite large)?


The next indexing run is responsible for it. If you return get_last_uid=0, then 
indexer starts feeding you all mails. So fts backend doesn't have to know about 
it.


3 - For get_last_uid: this uncertainty is very unclear. "If there is a
gap, then indexer first indexes all the missing" -> this means that at a
certain point, the indexer may be rebuilding a previous email, so the *last*
UID is something different from the max. And how does the indexer know whether
there is a gap without calling the FTS backend (which it does not, as there is
no function for that)?


I mean if get_last_uid() returns for example 100, it means that UIDs 1..100 
have been indexed by the FTS backend. It's possible that at this point there 
are already mails with UIDs 101..200 in the folder. So when UID=201 is 
delivered, indexer notices that FTS backend has only UIDs 1..100 indexed so 
far, and starts feeding it UIDs 101..201 in that order.

You can implement get_last_uid() simply by keeping track of it in 
dovecot.index* files, similar to how Lucene and Solr already do it with 
fts_index_get_header() / fts_index_set_header(). They also have a fallback that 
if the index doesn't have the last_uid value, they do a slower search from the 
Lucene/Solr index to find the last UID.

Re: Solr -> Xapian ?

2019-01-12 Thread Timo Sirainen
On 11 Jan 2019, at 21.23, Joan Moreau via dovecot  wrote:
> 
> The below patch resolves the compilation error
> 
> $ diff -p compat.h compat.h.joan 
> *** compat.h 2019-01-11 20:21:00.726625427 +0100
> --- compat.h.joan 2019-01-11 20:14:41.729109919 +0100
> *** struct iovec;
> *** 202,207 
> --- 202,211 
> ssize_t i_my_writev(int fd, const struct iovec *iov, int iov_len);
> #endif
> 
> + #ifdef __cplusplus
> + extern "C" {
> + #endif
> 

You should put this extern "C" into the C++ file you're creating. See for 
example how fts-lucene/lucene-wrapper.cc does this.

> 1 - What does "subargs" represent in mail_search_args?

It's set only for SEARCH_OR and SEARCH_SUB. So for example:

SEARCH TEXT foo TEXT bar TEXT baz

results in:

type=SEARCH_SUB
value.subargs = (
  { type=SEARCH, value.str="foo" },
  { type=SEARCH, value.str="bar" },
  { type=SEARCH, value.str="baz" },
)

Or similarly if there's SEARCH OR foo OR TEXT bar TEXT baz or some other 
combination of OR/ANDs.
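To make the tree shape concrete, here is a sketch of a recursive walk that renders such a structure as a query string, ANDing SEARCH_SUB children and ORing SEARCH_OR children. The node struct (with a `next` sibling pointer and a value that is either a string or a child list) and the output syntax are hypothetical; they only imitate the dump above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down node shaped like the dump above. */
enum sketch_type { SKETCH_SUB, SKETCH_OR, SKETCH_TEXT };

struct sketch_arg {
	enum sketch_type type;
	const struct sketch_arg *next;        /* next sibling in the list */
	struct {
		const char *str;                  /* SKETCH_TEXT */
		const struct sketch_arg *subargs; /* SKETCH_SUB / SKETCH_OR */
	} value;
};

/* Append s to the NUL-terminated string in out[]; -1 if it won't fit. */
static int sketch_append(char *out, size_t cap, const char *s)
{
	size_t used = strlen(out), len = strlen(s);

	if (used + len >= cap)
		return -1;
	memcpy(out + used, s, len + 1);
	return 0;
}

/* Recursively render one node: SUB children are ANDed, OR children are
   ORed. out[] must start as an empty string. */
static int sketch_build_query(const struct sketch_arg *arg,
			      char *out, size_t cap)
{
	const struct sketch_arg *child;
	const char *op;

	switch (arg->type) {
	case SKETCH_TEXT:
		return sketch_append(out, cap, arg->value.str);
	case SKETCH_SUB:
		op = " AND ";
		break;
	case SKETCH_OR:
		op = " OR ";
		break;
	default:
		return -1;
	}

	if (sketch_append(out, cap, "(") < 0)
		return -1;
	for (child = arg->value.subargs; child != NULL; child = child->next) {
		if (child != arg->value.subargs &&
		    sketch_append(out, cap, op) < 0)
			return -1;
		if (sketch_build_query(child, out, cap) < 0)
			return -1;
	}
	return sketch_append(out, cap, ")");
}
```

The foo/bar/baz example renders as "(foo AND bar AND baz)"; nesting an OR node inside gives, for instance, "(foo AND (bar OR baz))".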
 
> 2 - For rescan: who is responsible for passing the new emails again? Is
> the Dovecot core sending all the emails to index again, or shall the FTS
> backend somehow access the mailbox and read all emails? Wouldn't just
> saying "delete all indexes and get_last_uid is now 0" be the easy way? Or
> must the FTS backend process all emails (and block the current thread, as a
> mailbox may be quite large)?

The next indexing run is responsible for it. If you return get_last_uid=0, then 
indexer starts feeding you all mails. So fts backend doesn't have to know about 
it.

> 3 - For get_last_uid: this uncertainty is very unclear. "If there is a
> gap, then indexer first indexes all the missing" -> this means that at a
> certain point, the indexer may be rebuilding a previous email, so the *last*
> UID is something different from the max. And how does the indexer know whether
> there is a gap without calling the FTS backend (which it does not, as there is
> no function for that)?

I mean if get_last_uid() returns for example 100, it means that UIDs 1..100 
have been indexed by the FTS backend. It's possible that at this point there 
are already mails with UIDs 101..200 in the folder. So when UID=201 is 
delivered, indexer notices that FTS backend has only UIDs 1..100 indexed so 
far, and starts feeding it UIDs 101..201 in that order.
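The example above boils down to a simple selection step. The function below is hypothetical (the real selection happens in the Dovecot indexer, not in the backend); it only shows that the backend's answer to get_last_uid() decides where feeding resumes, and that returning 0 makes every mail get fed again.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical indexer step: given what the backend's get_last_uid()
   reported, pick every mailbox UID above it, in ascending order.
   uids[] must be sorted ascending; out[] needs room for count entries.
   Returns the number of UIDs written to out[]. */
static size_t sketch_uids_to_index(uint32_t last_uid, const uint32_t *uids,
				   size_t count, uint32_t *out)
{
	size_t i, n = 0;

	for (i = 0; i < count; i++) {
		if (uids[i] > last_uid)
			out[n++] = uids[i];
	}
	return n;
}
```

With UIDs 1..201 in the folder and last_uid = 100, it selects the 101 UIDs 101..201.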

You can implement get_last_uid() simply by keeping track of it in 
dovecot.index* files, similar to how Lucene and Solr already do it with 
fts_index_get_header() / fts_index_set_header(). They also have a fallback that 
if the index doesn't have the last_uid value, they do a slower search from the 
Lucene/Solr index to find the last UID.
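A sketch of that pattern: read last_uid from a small cached header and, only when it is missing, fall back to scanning the index for the highest stored UID (which equals the last indexed UID, since indexing proceeds in order). struct sketch_fts_header and both functions are hypothetical stand-ins for fts_index_get_header()/fts_index_set_header(), whose real signatures are not reproduced here. Writing the fallback result back into the header is this sketch's choice, not something the thread states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the small FTS header kept in dovecot.index*. */
struct sketch_fts_header {
	bool have_last_uid;
	uint32_t last_uid;
};

/* Slow fallback: scan every UID stored in the FTS index for the maximum. */
static uint32_t sketch_scan_max_uid(const uint32_t *index_uids, size_t n)
{
	uint32_t max = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (index_uids[i] > max)
			max = index_uids[i];
	}
	return max;
}

static uint32_t sketch_get_last_uid(struct sketch_fts_header *hdr,
				    const uint32_t *index_uids, size_t n)
{
	if (!hdr->have_last_uid) {
		/* Header missing: do it the slow way, then cache the result. */
		hdr->last_uid = sketch_scan_max_uid(index_uids, n);
		hdr->have_last_uid = true;
	}
	return hdr->last_uid;
}

/* Called after a batch of mails up to `uid` has been committed. */
static void sketch_set_last_uid(struct sketch_fts_header *hdr, uint32_t uid)
{
	hdr->last_uid = uid;
	hdr->have_last_uid = true;
}
```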



Re: Solr -> Xapian ?

2019-01-11 Thread fauno
On 04/01/19 at 03:20, Joan Moreau via dovecot wrote:
> What about considering linking Dovecot with the Xapian libraries instead of
> going through the nightmare of Solr?
> https://xapian.org/features

Given that notmuch already does a good job at indexing email (although it
only supports maildirs, afaik), wouldn't it be simpler to write a plugin
for running notmuch searches from Dovecot?

https://notmuchmail.org/



Re: Solr -> Xapian ?

2019-01-11 Thread Joan Moreau via dovecot
../../../src/lib/compat.h:208:20: error: conflicting declaration of 
'ssize_t i_my_pwrite(int, const void*, size_t, __off_t)' with 'C' 
linkage 
# define pwrite i_my_pwrite 

Any help welcome 

Hi, 

I figured out the "namespace" issue 

Remaining questions are : 

1 - What does "subargs" represent in mail_search_args? 

2 - For rescan: who is responsible for passing the new emails again? Is 
the Dovecot core sending all the emails to index again, or shall the FTS 
backend somehow access the mailbox and read all emails? Wouldn't just 
saying "delete all indexes and get_last_uid is now 0" be the easy way? Or 
must the FTS backend process all emails (and block the current thread, as a 
mailbox may be quite large)? 

3 - For get_last_uid: this uncertainty is very unclear. "If there is a 
gap, then indexer first indexes all the missing" -> this means that at a 
certain point, the indexer may be rebuilding a previous email, so the *last* 
UID is something different from the max. And how does the indexer know whether 
there is a gap without calling the FTS backend (which it does not, as there is 
no function for that)? 

4 - How do I update configure.ac & additional files to add a 
"--with-xapian" option, which will test for libxapian presence and add it to the 
build? 

Thank you 

On 2019-01-08 04:24, Timo Sirainen wrote: 

On 7 Jan 2019, at 16.05, Joan Moreau via dovecot < dovecot@dovecot.org> 
wrote: 
Hi 

Can anyone answer specifically? 

Q1: get_last_uid -> Is this the last UID indexed (which may not be the 
greatest value), or the greatest value (which may not be the latest)? (The 
code of existing plugins is unclear about this; Solr looks for the 
greatest, for instance.) 
All the mails are always supposed to be indexed from the beginning to 
the last indexed mail. If there's a gap, indexer first indexes all the 
missing mails. So the latest UID is supposed to be the greatest UID. 
(Supporting out-of-order indexing would be rather difficult to keep 
track of.) 

Q2: When indexing an email, the data is not passed by "build_key". Why 
so? What is the link with "build_more"? 
The idea is that it calls something like: 

- build_key(type=hdr, hdr_name=From) 
- build_more(" t...@iki.fi") 
- build_key(type=hdr, hdr_name=Subject) 
- build_more("Re: Solr -> Xapian ?") 
- build_key(type=body_part) 
- build_more("message body piece") 
- build_more("message body piece2") 
... 

Q3: Searching/lookup: the header to look in (which must at 
least cover "cc, to, from, subject, body") does not appear in the 
'struct' data. Where can I find it? 
lookup() gets struct mail_search_arg *args, which contains the entire 
IMAP SEARCH query. This could be used for more or less complex query 
builders. 

In case of a single header search, you should have 
args->args->hdr_field_name contain the header name and 
args->args->value.str contain the content you're searching for. 

Q4: Refresh: this is very unclear. How could there not be the 
"latest" view of the index? What is the real meaning of this function? 
In case of Xapian it might not matter if it automatically refreshes its 
indexes between each query. But with some other indexes this could 
happen: 

- IMAP session is opened 
- IMAP SEARCH is run, which opens and searches the index 
- a new mail is delivered to the mailbox and indexed 
- IMAP SEARCH is run. Without refresh() it doesn't see the newly 
indexed mail and doesn't include it in the search results. 

Q5: Rescan: is it just about removing all indexes for a specific 
mailbox? 
It's run when "doveadm fts rescan" is run manually. Usually that's only 
run manually to fix up some brokenness. So it's intended to verify that 
the current mailbox contents match the FTS indexes: 
- If there are any mails in FTS index that no longer exist in the 
actual mailbox, delete those mails from FTS 
- If FTS is missing any mails in the middle of the mailbox, make sure 
that the next mailbox indexing will index those missing mails. I think 
currently this basically means reindexing all the mails since the first 
missing mail, even the mails that are already in the index. 

fts-lucene implements this, but other FTS backends are lazy and simply 
rebuild all mails. Actually fts-solr is bad because it doesn't even 
delete the extra mails. 

Q6: lookup_multi: isn't the function the same for all plugins (see 
below)? And finally, for fts_backend__lookup_multi, why is that 
backend-dependent? 
This function is called only when searching in virtual folders. So for 
example the virtual "All mails" folder, which would contain all mails in 
all folders. In that case the boxes[] would contain a list of user's all 
folders, except Trash and Spam. If lookup_multi() isn't implemented 
(left to NULL), the search is run separately via lookup() for each 
folder. With lookup_multi() there can be jus

Re: Solr -> Xapian ?

2019-01-11 Thread Joan Moreau via dovecot

There is no point in a separate plugin; the purpose is to replace
squat as the default FTS (Solr being a nightmare).


On 2019-01-11 18:23, Aki Tuomi wrote:

I would recommend making this a standalone plugin for now instead of trying to keep it in core fts.  

Aki 

On 11 January 2019 at 18:40 Joan Moreau via dovecot < dovecot@dovecot.org> wrote: 

I managed to deal with the namespace issue (updated Makefile.am) 

However, I get: 

../../../src/lib/compat.h:207:19: error: conflicting declaration of 
'ssize_t i_my_pread(int, void*, size_t, __off_t)' with 'C' linkage 
# define pread i_my_pread 
^~ 
../../../src/lib/compat.h:210:9: note: previous declaration with 'C++' 
linkage 
ssize_t i_my_pread(int fd, void *buf, size_t count, off_t offset); 
^~ 
../../../src/lib/compat.h:208:20: error: conflicting declaration of 
'ssize_t i_my_pwrite(int, const void*, size_t, __off_t)' with 'C' 
linkage 
# define pwrite i_my_pwrite 

Any help welcome 

Hi, 

I figured out the "namespace" issue 

Remaining questions are : 

1 - WHat does represent "subargs" in mail_search_args 

2 - For rescan: who is responsible for passing the emails again? Does
the Dovecot core send all the emails to index again, or should the FTS
backend somehow access the mailbox and read all the emails itself?
Wouldn't simply saying "delete the whole index, get_last_uid is now 0"
be the easy way? Or must the FTS backend process all the emails (and
block the current thread, as a mailbox may be quite large)?

3 - For get_last_uid: this is still very unclear. "If there is a
gap, then indexer first indexes all the missing" -> this means that at
some point the indexer may be rebuilding an older email, so the *last*
UID is something different from the max. And how does the indexer know
whether there is a gap without calling the FTS backend (which it does
not, as there is no function for that)?

4 - How do I update configure.ac and the other build files to add a
"--with-xapian" option that tests for libxapian and adds it to the
build?

Thank you 

On 2019-01-08 04:24, Timo Sirainen wrote: 

On 7 Jan 2019, at 16.05, Joan Moreau via dovecot <dovecot@dovecot.org>
wrote:
Hi

Anyone able to answer specifically?

Q1 : get_last_uid -> Is this the last UID indexed (which may not be the
greatest value), or the greatest value (which may not be the latest)? (The
code of existing plugins is unclear about this; Solr looks for the
greatest, for instance.)
All the mails are always supposed to be indexed from the beginning to 
the last indexed mail. If there's a gap, indexer first indexes all the 
missing mails. So the latest UID is supposed to be the greatest UID. 
(Supporting out-of-order indexing would be rather difficult to keep 
track of.) 

Q2 : When indexing an email, the data is not passed to "build_key". Why
so? What is the link with "build_more"?
The idea is that it calls something like: 

- build_key(type=hdr, hdr_name=From) 
- build_more("t...@iki.fi")
- build_key(type=hdr, hdr_name=Subject) 
- build_more("Re: Solr -> Xapian ?") 
- build_key(type=body_part) 
- build_more("message body piece") 
- build_more("message body piece2") 
... 

Q3 : Searching/Lookup : The header to search in (which must at least
support "cc, to, from, subject, body") does not appear in the 'struct'
data. Where do I find it?
lookup() gets struct mail_search_arg *args, which contains the entire 
IMAP SEARCH query. This could be used for more or less complex query 
builders. 

In case of a single header search, you should have 
args->args->hdr_field_name contain the header name and 
args->args->value.str contain the content you're searching for. 

Q4 : Refresh : this is very unclear. How come there would not be the
"latest" view of the index? What is the real meaning of this function?
In case of Xapian it might not matter if it automatically refreshes its 
indexes between each query. But with some other indexes this could 
happen: 

- IMAP session is opened 
- IMAP SEARCH is run, which opens and searches the index 
- a new mail is delivered to the mailbox and indexed 
- IMAP SEARCH is run. Without refresh() it doesn't see the newly 
indexed mail and doesn't include it in the search results. 

Q5 : Rescan : is it just about removing all indexes for a specific
mailbox?
It's run when "doveadm fts rescan" is run manually. Usually that's only 
run manually to fix up some brokenness. So it's intended to verify that 
the current mailbox contents match the FTS indexes: 
- If there are any mails in FTS index that no longer exist in the 
actual mailbox, delete those mails from FTS 
- If FTS is missing any mails in the middle of the mailbox, make sure 
that the next mailbox indexing will index those missing mails. I think 
currently this basically means reindexing all the mails since the first 
missing 

Re: Solr -> Xapian ?

2019-01-11 Thread Aki Tuomi


 
 
  
I would recommend making this a standalone plugin for now instead of trying to keep it in core fts.

Aki

On 11 January 2019 at 18:40 Joan Moreau via dovecot <dovecot@dovecot.org> wrote:

I managed to deal with the namespace issue (updated makefile.am)

However, I now get:

../../../src/lib/compat.h:207:19: error: conflicting declaration of
'ssize_t i_my_pread(int, void*, size_t, __off_t)' with 'C' linkage
# define pread i_my_pread
^~
../../../src/lib/compat.h:210:9: note: previous declaration with 'C++'
linkage
ssize_t i_my_pread(int fd, void *buf, size_t count, off_t offset);
^~
../../../src/lib/compat.h:208:20: error: conflicting declaration of
'ssize_t i_my_pwrite(int, const void*, size_t, __off_t)' with 'C'
linkage
# define pwrite i_my_pwrite

Any help welcome

Hi,

I figured out the "namespace" issue

Remaining questions are:

1 - What does "subargs" represent in mail_search_args?

2 - For rescan: who is responsible for passing the emails again? Does
the Dovecot core send all the emails to index again, or should the FTS
backend somehow access the mailbox and read all the emails itself?
Wouldn't simply saying "delete the whole index, get_last_uid is now 0"
be the easy way? Or must the FTS backend process all the emails (and
block the current thread, as a mailbox may be quite large)?

3 - For get_last_uid: this is still very unclear. "If there is a
gap, then indexer first indexes all the missing" -> this means that at
some point the indexer may be rebuilding an older email, so the *last*
UID is something different from the max. And how does the indexer know
whether there is a gap without calling the FTS backend (which it does
not, as there is no function for that)?

4 - How do I update configure.ac and the other build files to add a
"--with-xapian" option that tests for libxapian and adds it to the
build?

Thank you

On 2019-01-08 04:24, Timo Sirainen wrote:

On 7 Jan 2019, at 16.05, Joan Moreau via dovecot <dovecot@dovecot.org>
wrote:
Hi

Anyone able to answer specifically?

Q1 : get_last_uid -> Is this the last UID indexed (which may not be the
greatest value), or the greatest value (which may not be the latest)? (The
code of existing plugins is unclear about this; Solr looks for the
greatest, for instance.)
All the mails are always supposed to be indexed from the beginning to
the last indexed mail. If there's a gap, indexer first indexes all the
missing mails. So the latest UID is supposed to be the greatest UID.
(Supporting out-of-order indexing would be rather difficult to keep
track of.)

Q2 : When indexing an email, the data is not passed to "build_key". Why
so? What is the link with "build_more"?
The idea is that it calls something like:

- build_key(type=hdr, hdr_name=From)
- build_more("t...@iki.fi")
- build_key(type=hdr, hdr_name=Subject)
- build_more("Re: Solr -> Xapian ?")
- build_key(type=body_part)
- build_more("message body piece")
- build_more("message body piece2")
...

Q3 : Searching/Lookup : The header to search in (which must at least
support "cc, to, from, subject, body") does not appear in the 'struct'
data. Where do I find it?
lookup() gets struct mail_search_arg *args, which contains the entire
IMAP SEARCH query. This could be used for more or less complex query
builders.

In case of a single header search, you should have
args->args->hdr_field_name contain the header name and
args->args->value.str contain the content you're searching for.

Q4 : Refresh : this is very unclear. How come there would not be the
"latest" view of the index? What is the real meaning of this function?
In case of Xapian it might not matter if it automatically refreshes its
indexes between

Re: Solr -> Xapian ?

2019-01-11 Thread Joan Moreau via dovecot

I managed to deal with the namespace issue (updated makefile.am)

However, I now get:


../../../src/lib/compat.h:207:19: error: conflicting declaration of
'ssize_t i_my_pread(int, void*, size_t, __off_t)' with 'C' linkage
# define pread i_my_pread
^~
../../../src/lib/compat.h:210:9: note: previous declaration with 'C++'
linkage
ssize_t i_my_pread(int fd, void *buf, size_t count, off_t offset);
^~
../../../src/lib/compat.h:208:20: error: conflicting declaration of
'ssize_t i_my_pwrite(int, const void*, size_t, __off_t)' with 'C'
linkage
# define pwrite i_my_pwrite 

Any help welcome 

Hi, 

I figured out the "namespace" issue 

Remaining questions are : 

1 - What does "subargs" represent in mail_search_args?


2 - For rescan: who is responsible for passing the emails again? Does
the Dovecot core send all the emails to index again, or should the FTS
backend somehow access the mailbox and read all the emails itself?
Wouldn't simply saying "delete the whole index, get_last_uid is now 0"
be the easy way? Or must the FTS backend process all the emails (and
block the current thread, as a mailbox may be quite large)?


3 - For get_last_uid: this is still very unclear. "If there is a
gap, then indexer first indexes all the missing" -> this means that at
some point the indexer may be rebuilding an older email, so the *last*
UID is something different from the max. And how does the indexer know
whether there is a gap without calling the FTS backend (which it does
not, as there is no function for that)?

4 - How do I update configure.ac and the other build files to add a
"--with-xapian" option that tests for libxapian and adds it to the
build?

Thank you 


On 2019-01-08 04:24, Timo Sirainen wrote:

On 7 Jan 2019, at 16.05, Joan Moreau via dovecot wrote:
Hi

Anyone able to answer specifically?

Q1 : get_last_uid -> Is this the last UID indexed (which may not be the
greatest value), or the greatest value (which may not be the latest)? (The
code of existing plugins is unclear about this; Solr looks for the
greatest, for instance.)
All the mails are always supposed to be indexed from the beginning to
the last indexed mail. If there's a gap, indexer first indexes all the
missing mails. So the latest UID is supposed to be the greatest UID.
(Supporting out-of-order indexing would be rather difficult to keep
track of.)

Q2 : When indexing an email, the data is not passed to "build_key". Why
so? What is the link with "build_more"?
The idea is that it calls something like:

- build_key(type=hdr, hdr_name=From)
- build_more("t...@iki.fi")
- build_key(type=hdr, hdr_name=Subject)
- build_more("Re: Solr -> Xapian ?")
- build_key(type=body_part)
- build_more("message body piece")
- build_more("message body piece2")
...

Q3 : Searching/Lookup : The header to search in (which must at least
support "cc, to, from, subject, body") does not appear in the 'struct'
data. Where do I find it?
lookup() gets struct mail_search_arg *args, which contains the entire
IMAP SEARCH query. This could be used for more or less complex query
builders.

In case of a single header search, you should have
args->args->hdr_field_name contain the header name and
args->args->value.str contain the content you're searching for.

Q4 : Refresh : this is very unclear. How come there would not be the
"latest" view of the index? What is the real meaning of this function?
In case of Xapian it might not matter if it automatically refreshes its
indexes between each query. But with some other indexes this could
happen:

- IMAP session is opened
- IMAP SEARCH is run, which opens and searches the index
- a new mail is delivered to the mailbox and indexed
- IMAP SEARCH is run. Without refresh() it doesn't see the newly
indexed mail and doesn't include it in the search results.

Q5 : Rescan : is it just about removing all indexes for a specific
mailbox?
It's run when "doveadm fts rescan" is run manually. Usually that's only
run manually to fix up some brokenness. So it's intended to verify that
the current mailbox contents match the FTS indexes:
- If there are any mails in FTS index that no longer exist in the
actual mailbox, delete those mails from FTS
- If FTS is missing any mails in the middle of the mailbox, make sure
that the next mailbox indexing will index those missing mails. I think
currently this basically means reindexing all the mails since the first
missing mail, even the mails that are already in the index.

fts-lucene implements this, but other FTS backends are lazy and simply
rebuild all mails. Actually fts-solr is bad because it doesn't even
delete the extra mails.

Q6 : lookup_multi : isn't the function the same for all plugins (see
below)? And finally, for fts_backend__lookup_multi, why is that
backend dependent?
This function is called only when searching in virtual folders. So for
example the

Re: Solr -> Xapian ?

2019-01-11 Thread Joan Moreau via dovecot
Also, 

1 - What does "subargs" represent in mail_search_args?


2 - I wrote my first code, and the error I get when compiling within the
Dovecot build is:


"In file included from fts-xapian-plugin.c:4:
fts-xapian-plugin.h:6:1: error: unknown type name 'using'; did you mean
'uint'?
using namespace std;" 


If I remove this, the Xapian library also complains about the
"namespace" keyword:


In file included from /usr/include/xapian.h:47,
from fts-backend-xapian.c:11:
/usr/include/xapian/types.h:31:1: error: unknown type name 'namespace';
did you mean 'i_isspace'?
namespace Xapian { 

Can someone shed some light on this?

Thanks 


On 2019-01-09 09:58, Joan Moreau via dovecot wrote:

Ok. 

Additional questions:

- For rescan: who is responsible for passing the emails again? Does the Dovecot core send all the emails to index again, or should the FTS backend somehow access the mailbox and read all the emails itself? Wouldn't simply saying "delete the whole index, get_last_uid is now 0" be the easy way? Or must the FTS backend process all the emails (and block the current thread, as a mailbox may be quite large)?


- For get_last_uid: this is still very unclear. "If there is a gap, then
indexer first indexes all the missing" -> this means that at some point the indexer
may be rebuilding an older email, so the *last* UID is something different from the max.
And how does the indexer know whether there is a gap without calling the FTS backend
(which it does not, as there is no function for that)?

On 2019-01-08 04:24, Timo Sirainen wrote: 
On 7 Jan 2019, at 16.05, Joan Moreau via dovecot wrote:
Hi

Anyone able to answer specifically?

Q1 : get_last_uid -> Is this the last UID indexed (which may not be the greatest value), or the greatest value (which may not be the latest)? (The code of existing plugins is unclear about this; Solr looks for the greatest, for instance.)
All the mails are always supposed to be indexed from the beginning to the last indexed mail. If there's a gap, indexer first indexes all the missing mails. So the latest UID is supposed to be the greatest UID. (Supporting out-of-order indexing would be rather difficult to keep track of.)


Q2 : When indexing an email, the data is not passed to "build_key". Why so? What is the link with "build_more"?
The idea is that it calls something like:


- build_key(type=hdr, hdr_name=From)
- build_more("t...@iki.fi")
- build_key(type=hdr, hdr_name=Subject)
- build_more("Re: Solr -> Xapian ?")
- build_key(type=body_part)
- build_more("message body piece")
- build_more("message body piece2")
...

Q3 : Searching/Lookup : The header to search in (which must at least support "cc, to, from, subject, body") does not appear in the 'struct' data. Where do I find it?
lookup() gets struct mail_search_arg *args, which contains the entire IMAP SEARCH query. This could be used for more or less complex query builders.


In case of a single header search, you should have args->args->hdr_field_name contain 
the header name and args->args->value.str contain the content you're searching for.

Q4 : Refresh : this is very unclear. How come there would not be the "latest" view of the index? What is the real meaning of this function?
In case of Xapian it might not matter if it automatically refreshes its indexes between each query. But with some other indexes this could happen:


- IMAP session is opened
- IMAP SEARCH is run, which opens and searches the index
- a new mail is delivered to the mailbox and indexed
- IMAP SEARCH is run. Without refresh() it doesn't see the newly indexed mail 
and doesn't include it in the search results.

Q5 : Rescan : is it just about removing all indexes for a specific mailbox?
It's run when "doveadm fts rescan" is run manually. Usually that's only run manually to fix up some brokenness. So it's intended to verify that the current mailbox contents match the FTS indexes:

- If there are any mails in FTS index that no longer exist in the actual 
mailbox, delete those mails from FTS
- If FTS is missing any mails in the middle of the mailbox, make sure that the 
next mailbox indexing will index those missing mails. I think currently this 
basically means reindexing all the mails since the first missing mail, even the 
mails that are already in the index.

fts-lucene implements this, but other FTS backends are lazy and simply rebuild 
all mails. Actually fts-solr is bad because it doesn't even delete the extra 
mails.

Q6 : lookup_multi : isn't the function the same for all plugins (see below)?
And finally, for fts_backend__lookup_multi, why is that backend dependent?


This function is called only when searching in virtual folders. So for
example the virtual "All mails" folder, which would contain all mails in
all folders. In that case the boxes[] would contain a list of all the user's
folders, except Tras

Re: Solr -> Xapian ?

2019-01-09 Thread Joan Moreau via dovecot
Ok. 

Additional questions:


- For rescan: who is responsible for passing the emails again? Does
the Dovecot core send all the emails to index again, or should the FTS
backend somehow access the mailbox and read all the emails itself?
Wouldn't simply saying "delete the whole index, get_last_uid is now 0"
be the easy way? Or must the FTS backend process all the emails (and
block the current thread, as a mailbox may be quite large)?


- For get_last_uid: this is still very unclear. "If there is a
gap, then indexer first indexes all the missing" -> this means that at
some point the indexer may be rebuilding an older email, so the *last*
UID is something different from the max. And how does the indexer know
whether there is a gap without calling the FTS backend (which it does
not, as there is no function for that)?

On 2019-01-08 04:24, Timo Sirainen wrote:

On 7 Jan 2019, at 16.05, Joan Moreau via dovecot wrote:

Hi

Anyone able to answer specifically?

Q1 : get_last_uid -> Is this the last UID indexed (which may not be the
greatest value), or the greatest value (which may not be the latest)? (The code of
existing plugins is unclear about this; Solr looks for the greatest, for instance.)


All the mails are always supposed to be indexed from the beginning to the last 
indexed mail. If there's a gap, indexer first indexes all the missing mails. So 
the latest UID is supposed to be the greatest UID. (Supporting out-of-order 
indexing would be rather difficult to keep track of.)


Q2 : When indexing an email, the data is not passed to "build_key". Why so? What is the
link with "build_more"?


The idea is that it calls something like:

- build_key(type=hdr, hdr_name=From)
- build_more("t...@iki.fi")
- build_key(type=hdr, hdr_name=Subject)
- build_more("Re: Solr -> Xapian ?")
- build_key(type=body_part)
- build_more("message body piece")
- build_more("message body piece2")
...


Q3 : Searching/Lookup : The header to search in (which must at least support "cc,
to, from, subject, body") does not appear in the 'struct' data. Where do I find it?


lookup() gets struct mail_search_arg *args, which contains the entire IMAP 
SEARCH query. This could be used for more or less complex query builders.

In case of a single header search, you should have args->args->hdr_field_name contain 
the header name and args->args->value.str contain the content you're searching for.


Q4 : Refresh : this is very unclear. How come there would not be the "latest"
view of the index? What is the real meaning of this function?


In case of Xapian it might not matter if it automatically refreshes its indexes 
between each query. But with some other indexes this could happen:

- IMAP session is opened
- IMAP SEARCH is run, which opens and searches the index
- a new mail is delivered to the mailbox and indexed
- IMAP SEARCH is run. Without refresh() it doesn't see the newly indexed mail 
and doesn't include it in the search results.


Q5 : Rescan : is it just about removing all indexes for a specific mailbox?


It's run when "doveadm fts rescan" is run manually. Usually that's only run 
manually to fix up some brokenness. So it's intended to verify that the current mailbox 
contents match the FTS indexes:
- If there are any mails in FTS index that no longer exist in the actual 
mailbox, delete those mails from FTS
- If FTS is missing any mails in the middle of the mailbox, make sure that the 
next mailbox indexing will index those missing mails. I think currently this 
basically means reindexing all the mails since the first missing mail, even the 
mails that are already in the index.

fts-lucene implements this, but other FTS backends are lazy and simply rebuild 
all mails. Actually fts-solr is bad because it doesn't even delete the extra 
mails.

Q6 : lookup_multi : isn't the function the same for all plugins (see below)?
And finally, for fts_backend__lookup_multi, why is that backend dependent?


This function is called only when searching in virtual folders. So for
example the virtual "All mails" folder, which would contain all mails in
all folders. In that case the boxes[] would contain a list of all the user's
folders, except Trash and Spam. If lookup_multi() isn't implemented
(left to NULL), the search is run separately via lookup() for each
folder. With lookup_multi() there can be just one lookup, and the
backend can filter only the wanted folders and return them directly. So
it's an optimization for FTS indexes that support user-global searches
rather than only per-folder searches.


static int fts_backend_xapian_lookup_multi(struct fts_backend *_backend, struct 
mailbox *const boxes[], struct mail_search_arg *args, enum fts_lookup_flags 
flags, struct fts_multi_result *result)
{

int i=0;

while(boxes[i]!=NULL)
{
if(fts_backen

Re: Solr -> Xapian ?

2019-01-07 Thread Timo Sirainen
On 7 Jan 2019, at 16.05, Joan Moreau via dovecot wrote:
> 
> Hi
> 
> Anyone able to answer specifically?
> 
> Q1 : get_last_uid -> Is this the last UID indexed (which may not be the
> greatest value), or the greatest value (which may not be the latest)? (The code
> of existing plugins is unclear about this; Solr looks for the greatest, for
> instance.)

All the mails are always supposed to be indexed from the beginning to the last 
indexed mail. If there's a gap, indexer first indexes all the missing mails. So 
the latest UID is supposed to be the greatest UID. (Supporting out-of-order 
indexing would be rather difficult to keep track of.)
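Under that contract a single number is enough for the indexer to know where to resume. A minimal C sketch of the idea follows; first_unindexed() and the plain sorted UID array are hypothetical stand-ins (not Dovecot's indexer code), and it assumes the backend's get_last_uid() reports the highest UID it has stored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not Dovecot's indexer code.
   uids: the mailbox UIDs in ascending order.
   last_uid: what the backend's get_last_uid() reported.
   Returns the position of the first mail that still needs indexing.
   Because indexing is assumed to happen in ascending UID order without
   gaps, every UID <= last_uid is already in the FTS index. */
size_t first_unindexed(const uint32_t *uids, size_t count, uint32_t last_uid)
{
	size_t lo = 0, hi = count;

	/* binary search for the first UID greater than last_uid */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (uids[mid] <= last_uid)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```

For example, with mailbox UIDs 1, 2, 5, 7, 9 and last_uid 5, indexing would resume at UID 7.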

> Q2 : When indexing an email, the data is not passed to "build_key". Why so?
> What is the link with "build_more"?

The idea is that it calls something like:

 - build_key(type=hdr, hdr_name=From)
 - build_more("t...@iki.fi")
 - build_key(type=hdr, hdr_name=Subject)
 - build_more("Re: Solr -> Xapian ?")
 - build_key(type=body_part)
 - build_more("message body piece")
 - build_more("message body piece2")
 ...
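The call sequence above can be sketched as a tiny accumulator: set_build_key() selects the field that the following chunks belong to, and build_more() appends chunks, which may be arbitrarily small. All names here (sketch_doc, sketch_set_build_key(), sketch_build_more()) and the fixed-size buffers are hypothetical simplifications, not Dovecot's fts_backend_update_context API; a real Xapian backend would presumably hand the text to Xapian's TermGenerator instead of storing it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not Dovecot's real types. */
enum sketch_key_type { SKETCH_KEY_HDR, SKETCH_KEY_BODY_PART };

#define SKETCH_MAX_FIELDS 8
#define SKETCH_MAX_TEXT 256

struct sketch_field {
	char name[32];                 /* header name, or "body" */
	char text[SKETCH_MAX_TEXT];    /* concatenated build_more() chunks */
	size_t len;
};

struct sketch_doc {
	struct sketch_field fields[SKETCH_MAX_FIELDS];
	size_t count;
	struct sketch_field *cur;      /* target of build_more(), NULL if none */
};

void sketch_doc_init(struct sketch_doc *doc)
{
	memset(doc, 0, sizeof(*doc));
}

/* Like set_build_key(): select (or create) the field that the following
   build_more() chunks belong to. Returning false means "skip this key". */
bool sketch_set_build_key(struct sketch_doc *doc, enum sketch_key_type type,
			  const char *hdr_name)
{
	const char *name = type == SKETCH_KEY_BODY_PART ? "body" : hdr_name;
	size_t i;

	for (i = 0; i < doc->count; i++) {
		if (strcmp(doc->fields[i].name, name) == 0) {
			doc->cur = &doc->fields[i];
			return true;
		}
	}
	if (doc->count == SKETCH_MAX_FIELDS ||
	    strlen(name) >= sizeof(doc->fields[0].name)) {
		doc->cur = NULL;
		return false;
	}
	doc->cur = &doc->fields[doc->count++];
	strcpy(doc->cur->name, name);
	return true;
}

/* Like build_more(): append one (possibly small) chunk to the current
   field. Returns 0 if ok, -1 if the chunk cannot be stored. */
int sketch_build_more(struct sketch_doc *doc, const unsigned char *data,
		      size_t size)
{
	struct sketch_field *field = doc->cur;

	if (field == NULL || field->len + size >= SKETCH_MAX_TEXT)
		return -1;
	memcpy(field->text + field->len, data, size);
	field->len += size;
	field->text[field->len] = '\0';
	return 0;
}
```

Because build_more() may deliver one header or body part in several calls, the sketch concatenates chunks per field rather than treating each call as a complete value.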

> Q3 : Searching/Lookup : The header to search in (which must at least support
> "cc, to, from, subject, body") does not appear in the 'struct' data.
> Where do I find it?

lookup() gets struct mail_search_arg *args, which contains the entire IMAP 
SEARCH query. This could be used for more or less complex query builders.

In case of a single header search, you should have args->args->hdr_field_name 
contain the header name and args->args->value.str contain the content you're 
searching for.
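Beyond the single-header case, the args list can be walked recursively. The sketch below renders a simplified argument tree into a query string. sketch_search_arg is a hypothetical stand-in that only mimics the fields mentioned above (hdr_field_name, value.str) plus next and value.subargs; the assumption that OR and parenthesized sub-expressions keep their children in value.subargs (the "subargs" asked about in this thread) should be checked against Dovecot's mail-search.h (SEARCH_OR, SEARCH_SUB and friends), and real code would also need to handle negation and the other argument types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct mail_search_arg. */
enum sketch_arg_type { SKETCH_HEADER, SKETCH_BODY, SKETCH_OR, SKETCH_SUB };

struct sketch_search_arg {
	struct sketch_search_arg *next;  /* siblings are ANDed together */
	enum sketch_arg_type type;
	const char *hdr_field_name;      /* used by SKETCH_HEADER */
	struct {
		const char *str;                    /* text searched for */
		struct sketch_search_arg *subargs;  /* children of OR/SUB */
	} value;
};

/* Append s to the NUL-terminated buf; false if it does not fit. */
static bool sketch_append(char *buf, size_t size, const char *s)
{
	size_t len = strlen(buf), n = strlen(s);

	if (len + n >= size)
		return false;
	memcpy(buf + len, s, n + 1);
	return true;
}

/* Render a list of args, joined by sep, e.g.
   "from:timo (subject:xapian OR body:solr)". False on overflow. */
bool sketch_render(const struct sketch_search_arg *arg, const char *sep,
		   char *buf, size_t size)
{
	bool first = true;

	for (; arg != NULL; arg = arg->next, first = false) {
		if (!first && !sketch_append(buf, size, sep))
			return false;
		switch (arg->type) {
		case SKETCH_HEADER:
			if (!sketch_append(buf, size, arg->hdr_field_name) ||
			    !sketch_append(buf, size, ":") ||
			    !sketch_append(buf, size, arg->value.str))
				return false;
			break;
		case SKETCH_BODY:
			if (!sketch_append(buf, size, "body:") ||
			    !sketch_append(buf, size, arg->value.str))
				return false;
			break;
		case SKETCH_OR:
		case SKETCH_SUB:
			/* recurse into the children, OR-joined or AND-joined */
			if (!sketch_append(buf, size, "(") ||
			    !sketch_render(arg->value.subargs,
					   arg->type == SKETCH_OR ? " OR " : " ",
					   buf, size) ||
			    !sketch_append(buf, size, ")"))
				return false;
			break;
		}
	}
	return true;
}
```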

> Q4 : Refresh : this is very unclear. How come there would not be the "latest"
> view of the index? What is the real meaning of this function?

In case of Xapian it might not matter if it automatically refreshes its indexes 
between each query. But with some other indexes this could happen:

 - IMAP session is opened
 - IMAP SEARCH is run, which opens and searches the index
 - a new mail is delivered to the mailbox and indexed
 - IMAP SEARCH is run. Without refresh() it doesn't see the newly indexed mail 
and doesn't include it in the search results.
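The scenario above comes down to a search session caching its view of the index. Below is a minimal C model, assuming a hypothetical in-memory index with a generation counter that every write bumps; sketch_view_refresh() plays the role of refresh(). None of these names are Dovecot or Xapian APIs.

```c
#include <stdint.h>

#define SKETCH_INDEX_MAX 64

/* Hypothetical in-memory model of an FTS index; every write bumps
   `generation`. */
struct sketch_index {
	uint32_t uids[SKETCH_INDEX_MAX];
	unsigned int count;
	unsigned int generation;
};

/* What a search session saw when it opened (or last refreshed) the index. */
struct sketch_view {
	const struct sketch_index *index;
	unsigned int count;
	unsigned int generation;
};

/* Called by the delivery/indexing side. Returns -1 if the index is full. */
int sketch_index_add(struct sketch_index *index, uint32_t uid)
{
	if (index->count == SKETCH_INDEX_MAX)
		return -1;
	index->uids[index->count++] = uid;
	index->generation++;
	return 0;
}

void sketch_view_open(struct sketch_view *view,
		      const struct sketch_index *index)
{
	view->index = index;
	view->count = index->count;
	view->generation = index->generation;
}

/* Plays the role of refresh(): pick up changes written after the view was
   opened. Returns 1 if the view changed, 0 if it was already current. */
int sketch_view_refresh(struct sketch_view *view)
{
	if (view->generation == view->index->generation)
		return 0;
	view->count = view->index->count;
	view->generation = view->index->generation;
	return 1;
}

/* A lookup only sees the mails that were visible when the view was taken. */
int sketch_view_contains(const struct sketch_view *view, uint32_t uid)
{
	unsigned int i;

	for (i = 0; i < view->count; i++) {
		if (view->index->uids[i] == uid)
			return 1;
	}
	return 0;
}
```

A backend whose library already re-reads the index on every query could implement refresh() as a no-op, which is the Xapian case Timo mentions as a possibility.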

> Q5 : Rescan : is it just about removing all indexes for a specific mailbox?

It's run when "doveadm fts rescan" is run manually. Usually that's only run 
manually to fix up some brokenness. So it's intended to verify that the current 
mailbox contents match the FTS indexes:
 - If there are any mails in FTS index that no longer exist in the actual 
mailbox, delete those mails from FTS
 - If FTS is missing any mails in the middle of the mailbox, make sure that the 
next mailbox indexing will index those missing mails. I think currently this 
basically means reindexing all the mails since the first missing mail, even the 
mails that are already in the index.

fts-lucene implements this, but other FTS backends are lazy and simply rebuild 
all mails. Actually fts-solr is bad because it doesn't even delete the extra 
mails.
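The two rescan duties can be expressed as one merge over two sorted UID lists. This is a hypothetical sketch, not fts-lucene's implementation: sketch_rescan() reports index UIDs that no longer exist in the mailbox, and returns the highest UID up to which the index is complete, which a backend could store so that the next get_last_uid() makes the indexer restart at the first missing mail.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rescan helper; both UID arrays must be sorted ascending.
   Index UIDs that no longer exist in the mailbox are written to expunge_r
   (capacity >= index_count) so the caller can delete them from the index.
   Returns the highest UID up to which every mailbox mail is indexed;
   mails after the first gap get indexed again on the next indexing run. */
uint32_t sketch_rescan(const uint32_t *index_uids, size_t index_count,
		       const uint32_t *box_uids, size_t box_count,
		       uint32_t *expunge_r, size_t *expunge_count_r)
{
	size_t i = 0, j = 0;
	uint32_t last_uid = 0;
	int gap_found = 0;

	*expunge_count_r = 0;
	while (i < index_count || j < box_count) {
		if (j == box_count ||
		    (i < index_count && index_uids[i] < box_uids[j])) {
			/* indexed, but no longer in the mailbox */
			expunge_r[(*expunge_count_r)++] = index_uids[i++];
		} else if (i == index_count || box_uids[j] < index_uids[i]) {
			/* in the mailbox, but missing from the index */
			gap_found = 1;
			j++;
		} else {
			/* present in both */
			if (!gap_found)
				last_uid = box_uids[j];
			i++;
			j++;
		}
	}
	return last_uid;
}
```

Mails after the first gap that are already in the index will be indexed again, matching Timo's description, so the backend has to tolerate (or first delete) those duplicates.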

> Q6 : lookup_multi : isn't the function the same for all plugins (see below)?
>> And finally, for fts_backend__lookup_multi, why is that backend
>> dependent?

This function is called only when searching in virtual folders. So for example 
the virtual "All mails" folder, which would contain all mails in all folders. 
In that case the boxes[] would contain a list of all the user's folders, except 
Trash and Spam. If lookup_multi() isn't implemented (left to NULL), the search 
is run separately via lookup() for each folder. With lookup_multi() there can 
be just one lookup, and the backend can filter only the wanted folders and 
return them directly. So it's an optimization for FTS indexes that support 
user-global searches rather than only per-folder searches.
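For a backend with one index per user, that optimization amounts to a single query followed by splitting the hits per folder. The sketch assumes hypothetical types (sketch_hit, sketch_box_result) with a numeric folder id; in Dovecot the per-folder results would instead go into the box_results array of struct fts_multi_result.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_UIDS 32

/* Hypothetical hit from a per-user (all folders) index. */
struct sketch_hit {
	uint32_t box_id;
	uint32_t uid;
};

/* Hypothetical per-folder result; box_id is filled in by the caller. */
struct sketch_box_result {
	uint32_t box_id;
	uint32_t uids[SKETCH_MAX_UIDS];
	size_t count;
};

/* One query against the global index produced `hits`. Distribute them to
   the folders that were asked for (e.g. everything except Trash and Spam)
   and ignore hits in other folders. Hits beyond SKETCH_MAX_UIDS per folder
   are dropped in this sketch; real code would grow the array instead.
   Returns the number of hits assigned to a folder. */
size_t sketch_split_hits(const struct sketch_hit *hits, size_t hit_count,
			 struct sketch_box_result *results, size_t result_count)
{
	size_t h, r, assigned = 0;

	for (r = 0; r < result_count; r++)
		results[r].count = 0;
	for (h = 0; h < hit_count; h++) {
		for (r = 0; r < result_count; r++) {
			struct sketch_box_result *res = &results[r];

			if (res->box_id != hits[h].box_id)
				continue;
			if (res->count < SKETCH_MAX_UIDS) {
				res->uids[res->count++] = hits[h].uid;
				assigned++;
			}
			break;
		}
	}
	return assigned;
}
```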

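>> static int fts_backend_xapian_lookup_multi(struct fts_backend *_backend,
>> 		struct mailbox *const boxes[], struct mail_search_arg *args,
>> 		enum fts_lookup_flags flags, struct fts_multi_result *result)
>> {
>> 	unsigned int i, count;
>> 
>> 	/* box_results is a box=NULL-terminated array allocated from
>> 	   result->pool, one entry per searched mailbox */
>> 	for (count = 0; boxes[count] != NULL; count++) ;
>> 	result->box_results = p_new(result->pool, struct fts_result, count + 1);
>> 
>> 	for (i = 0; i < count; i++) {
>> 		struct fts_result *box_result = &result->box_results[i];
>> 
>> 		box_result->box = boxes[i];
>> 		p_array_init(&box_result->definite_uids, result->pool, 32);
>> 		p_array_init(&box_result->maybe_uids, result->pool, 32);
>> 		p_array_init(&box_result->scores, result->pool, 32);
>> 		if (fts_backend_xapian_lookup(_backend, boxes[i], args, flags,
>> 					      box_result) < 0)
>> 			return -1;
>> 	}
>> 	return 0;
>> }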

See fts_backend_lookup_multi() - if you leave lookup_multi=NULL it basically 
does this.

>> For "rescan" and "optimize", wouldn't it be the Dovecot core that indicates
>> which mails are to be dismissed (expunged), or re-asks for indexing a
>> particular (or every) UID? Why would the backend be aware of the
>> transactions on the mailbox???

rescan() is about fixing up a more or less broken index, or simply to verify 
that it'

Re: Solr -> Xapian ?

2019-01-07 Thread Michael Slusarz
Maybe a dumb question (I admit I haven't followed this thread very closely)...

But why are you writing a new FTS driver?  If squat allegedly does everything 
you need it to do, why don't you just take that plugin and fix it up to do what 
you need?  That seems way easier than trying to create an FTS driver from 
scratch.

michael


> On January 7, 2019 at 7:05 AM Joan Moreau via dovecot wrote:
> 
> 
> Hi
> 
> Anyone able to answer specifically?
> 
> Q1 : get_last_uid -> Is this the last UID indexed (which may not be the
> greatest value), or the greatest value (which may not be the latest)? (The code
> of existing plugins is unclear about this; Solr looks for the greatest, for
> instance.)
> 
> Q2 : When indexing an email, the data is not passed to "build_key". Why
> so? What is the link with "build_more"?
> 
> Q3 : Searching/Lookup : The header to search in (which must at least
> support "cc, to, from, subject, body") does not appear in the 'struct' data.
> Where do I find it?
> 
> Q4 : Refresh : this is very unclear. How come there would not be the
> "latest" view of the index? What is the real meaning of this function?
> 
> Q5 : Rescan : is it just about removing all indexes for a specific
> mailbox?
> 
> Q6 : lookup_multi : isn't the function the same for all plugins (see
> below)?
> 


Re: Solr -> Xapian ?

2019-01-07 Thread Joan Moreau via dovecot
Hi 

Anyone able to answer specifically?


Q1 : get_last_uid -> Is this the last UID indexed (which may not be the
greatest value), or the greatest value (which may not be the latest)? (The
code of existing plugins is unclear about this; Solr looks for the
greatest, for instance.)


Q2 : When indexing an email, the data is not passed to "build_key". Why
so? What is the link with "build_more"?


Q3 : Searching/Lookup : The header to search in (which must at least
support "cc, to, from, subject, body") does not appear in the 'struct'
data. Where do I find it?


Q4 : Refresh : this is very unclear. How come there would not be the
"latest" view of the index? What is the real meaning of this function?


Q5 : Rescan : is it just about removing all indexes for a specific
mailbox?


Q6 : lookup_multi : isn't the function the same for all plugins (see
below)?

Thank you


On 2019-01-06 16:50, Joan Moreau via dovecot wrote:

And finally, for fts_backend__lookup_multi, why is that backend dependent?

Wouldn't the function below be the same for any backend?

Waiting for your feedback on all those questions

Thank you 

JM 

- 


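static int fts_backend_xapian_lookup_multi(struct fts_backend *_backend,
		struct mailbox *const boxes[], struct mail_search_arg *args,
		enum fts_lookup_flags flags, struct fts_multi_result *result)
{
	unsigned int i, count;

	/* box_results is a box=NULL-terminated array allocated from
	   result->pool, one entry per searched mailbox */
	for (count = 0; boxes[count] != NULL; count++) ;
	result->box_results = p_new(result->pool, struct fts_result, count + 1);

	for (i = 0; i < count; i++) {
		struct fts_result *box_result = &result->box_results[i];

		box_result->box = boxes[i];
		p_array_init(&box_result->definite_uids, result->pool, 32);
		p_array_init(&box_result->maybe_uids, result->pool, 32);
		p_array_init(&box_result->scores, result->pool, 32);
		if (fts_backend_xapian_lookup(_backend, boxes[i], args, flags,
					      box_result) < 0)
			return -1;
	}
	return 0;
}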

On 2019-01-06 16:31, Joan Moreau via dovecot wrote: 


For fts_backend_xxx_lookup, where is it specified in which field (to, cc,
subject, body, from, all) to look up?

On 2019-01-06 16:03, Joan Moreau wrote: 

For "rescan" and "optimize", wouldn't it be the Dovecot core that indicates which mails are to be dismissed (expunged), or re-asks for indexing a particular (or every) UID? Why would the backend be aware of the transactions on the mailbox??? 


There is already "fts_backend_xxx_update_expunge", so I believe the management
of the expunged messages is *NOT* in the backend, right?

On 2019-01-06 15:41, Joan Moreau wrote: 

Also, for fts_backend_solr_update_set_build_key -> where is the data (of the hdr_name or the body)? 

On 2019-01-06 14:10, Joan Moreau wrote: 

For the "last uid" -> this is not the last one added, but the maximum UID among the indexed emails, right? 

On 2019-01-06 11:53, Joan Moreau via dovecot wrote: 

Thank you 

I still don't get the "build_key" function. The email (body, headers, ... and the UID) is the one (and only) thing to index. What "key" is that function referring to? Or is the "key" here the actual email? 

On 2019-01-06 08:43, Stephan Bosch wrote: 


On 06/01/2019 at 01:00, Joan Moreau wrote:
Anyone willing to explain those functions?

Most notably "get_last_uid"
From src/plugins/fts/fts-api.h:


/* Get the last_uid for the mailbox. */
int fts_backend_get_last_uid(struct fts_backend *backend, struct mailbox *box,
uint32_t *last_uid_r);

The Solr sources (src/plugins/fts-solr/fts-backend-solr.c:213) tell me this
returns the last UID added to the index for the given mailbox and FTS index.

"set_build_key" 
From src/plugins/fts/fts-api.h:


/* Switch to building index for specified key. If backend doesn't want to
index this key, it can return FALSE and caller will skip to next key. */
bool fts_backend_update_set_build_key(struct fts_backend_update_context *ctx,
const struct fts_backend_build_key *key);

Same file provides outline of what a build_key is.

"build_more" , 
/* Add more content to the index for the currently specified build key.

Non-BODY_PART_BINARY data must contain only full valid UTF-8 characters,
but it doesn't need to be NUL-terminated. size contains the data size in
bytes, not characters. This function may be called many times and the data
block sizes may be small. Backend returns 0 if ok, -1 if build should be
aborted. */
int fts_backend_update_build_more(struct fts_backend_update_context *ctx,
const unsigned char *data, size_t size);

You should look at the sources of a few backends like squat and solr to get a 
feel of what exactly this is doing.

what is refresh versus rescan ? 
From fts-api.h:


/* Refresh index to make sure we see latest changes from lookups.
Returns 0 if ok, -1 if error. */
int fts_backend_refresh(struct fts_backend *backend);
/* Go through the entire index and make sure all mails are indexed,
and delete any extra mails in the index. */
int fts_backend_rescan(struct fts_backend *backend);

Regards,

Stepham

On January 5, 2019 14:23:10 Joan Moreau via dovecot  wrote:

Thank Stephan

I basically need to know the role/description of each of the functions of the 
fts_backend:

struct fts_backend fts_backend_xapian = {
.name = "xapian",
.flags = FTS_BACKEND_FLAG_NORMALIZE_INPUT,*-> what other flags ?*

{
fts_backend_xapian_alloc,

Re: Solr -> Xapian ?

2019-01-06 Thread Joan Moreau via dovecot

And finally, for fts_backend__lookup_multi, why is that backend-dependent?

Wouldn't the function below be the same for any backend?

Waiting for your feedback on all those questions.

Thank you 

JM 

- 


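static int fts_backend_xapian_lookup_multi(struct fts_backend *_backend,
        struct mailbox *const boxes[], struct mail_search_arg *args,
        enum fts_lookup_flags flags, struct fts_multi_result *result)
{
    unsigned int i, count;

    /* boxes[] is NULL-terminated */
    for (count = 0; boxes[count] != NULL; count++) ;

    /* result->box_results is expected to be an array terminated by an
       entry with box == NULL, so allocate count+1 zeroed entries from
       the result pool (p_new() zero-fills) */
    result->box_results = p_new(result->pool, struct fts_result, count + 1);

    for (i = 0; i < count; i++) {
        struct fts_result *box_result = &result->box_results[i];

        box_result->box = boxes[i];
        p_array_init(&box_result->definite_uids, result->pool, 32);
        p_array_init(&box_result->maybe_uids, result->pool, 32);
        p_array_init(&box_result->scores, result->pool, 32);
        /* lookup() fills one mailbox's result, passed by pointer */
        if (fts_backend_xapian_lookup(_backend, boxes[i], args, flags,
                                      box_result) < 0)
            return -1;
    }
    return 0;
}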

Re: Solr -> Xapian ?

2019-01-06 Thread Joan Moreau via dovecot

For fts_backend_xxx_lookup, where is it specified in which field (to, cc,
subject, body, from, all) to look up?

Re: Solr -> Xapian ?

2019-01-06 Thread Joan Moreau via dovecot

For "rescan" and "optimize", wouldn't it be the Dovecot core that
indicates which messages are to be dismissed (expunged), or asks again for
a particular UID (or all of them) to be indexed? Why would the backend be
aware of the transactions on the mailbox?


There is already "fts_backend_xxx_update_expunge", so I believe the
management of expunged messages is *NOT* in the backend, right?

Re: Solr -> Xapian ?

2019-01-05 Thread Joan Moreau via dovecot

Also, for fts_backend_solr_update_set_build_key: where is the data (of
the hdr_name or the body)?


Re: Solr -> Xapian ?

2019-01-05 Thread Joan Moreau via dovecot

For the "last uid": this is not the last UID added, but the maximum
UID among the indexed emails, right?


Re: Solr -> Xapian ?

2019-01-05 Thread Joan Moreau via dovecot
Thank you 


I still don't get the "build_key" function. The email (body, headers,
... and the UID) is the one (and only) thing to index. What "key" is that
function referring to? Or is the "key" here the actual email?


Re: Solr -> Xapian ?

2019-01-05 Thread Stephan Bosch



On 06/01/2019 at 01:00, Joan Moreau wrote:

Anyone willing to explain those functions ?

Most notably "get_last_uid"


From src/plugins/fts/fts-api.h:

/* Get the last_uid for the mailbox. */
int fts_backend_get_last_uid(struct fts_backend *backend, struct mailbox *box,
                             uint32_t *last_uid_r);

The solr sources ( src/plugins/fts-solr/fts-backend-solr.c:213) tell me 
this returns the last UID added to the index for the given mailbox and 
FTS index.



"set_build_key"


From src/plugins/fts/fts-api.h:

/* Switch to building index for specified key. If backend doesn't want to
   index this key, it can return FALSE and caller will skip to next key. */
bool fts_backend_update_set_build_key(struct fts_backend_update_context *ctx,
                                      const struct fts_backend_build_key *key);

Same file provides outline of what a build_key is.


"build_more" ,


/* Add more content to the index for the currently specified build key.
   Non-BODY_PART_BINARY data must contain only full valid UTF-8 characters,
   but it doesn't need to be NUL-terminated. size contains the data size in
   bytes, not characters. This function may be called many times and the data
   block sizes may be small. Backend returns 0 if ok, -1 if build should be
   aborted. */
int fts_backend_update_build_more(struct fts_backend_update_context *ctx,
                                  const unsigned char *data, size_t size);

You should look at the sources of a few backends like squat and solr to 
get a feel of what exactly this is doing.



what is refresh versus rescan ?


From fts-api.h:

/* Refresh index to make sure we see latest changes from lookups.
   Returns 0 if ok, -1 if error. */
int fts_backend_refresh(struct fts_backend *backend);
/* Go through the entire index and make sure all mails are indexed,
   and delete any extra mails in the index. */
int fts_backend_rescan(struct fts_backend *backend);

Regards,


Stephan





On January 5, 2019 14:23:10 Joan Moreau via dovecot wrote:



Thanks Stephan

I basically need to know the role/description of each of the 
functions of the fts_backend:



struct fts_backend fts_backend_xapian = {
.name = "xapian",
.flags = FTS_BACKEND_FLAG_NORMALIZE_INPUT, /* -> what other flags? */

{
fts_backend_xapian_alloc,
fts_backend_xapian_init,
fts_backend_xapian_deinit,
fts_backend_xapian_get_last_uid,
fts_backend_xapian_update_init,
fts_backend_xapian_update_deinit,
fts_backend_xapian_update_set_mailbox,
fts_backend_xapian_update_expunge,
fts_backend_xapian_update_set_build_key,
fts_backend_xapian_update_unset_build_key,
fts_backend_xapian_update_build_more,
fts_backend_xapian_refresh,
fts_backend_xapian_rescan,
fts_backend_xapian_optimize,
fts_backend_default_can_lookup,
fts_backend_xapian_lookup,
fts_backend_xapian_lookup_multi,
fts_backend_xapian_lookup_done
}
};


Thank you

On 2019-01-05 08:49, Stephan Bosch wrote:



On 04/01/2019 at 11:17, Joan Moreau via dovecot wrote:


Why not, but please guide me through the core structure (mandatory
functions, etc.) of a typical Dovecot FTS plugin.


The Dovecot API documentation is not exhaustive everywhere, but the 
basics are documented. The remaining questions can be answered by 
looking at examples found in similar plugins or the relevant API 
sources.


I know of one FTS plugin not written by Dovecot developers:

https://github.com/atkinsj/fts-elasticsearch

If you really wish to do something like this, just go ahead. It will 
not be a small effort though. As soon as you have concrete 
questions, we can help you (don't expect rapid responses though).


Regards,

Stephan.








Re: Solr -> Xapian ?

2019-01-05 Thread Joan Moreau via dovecot




Anyone willing to explain those functions ?

Most notably "get_last_uid", "set_build_key" and "build_more". And what is
refresh versus rescan?




On January 5, 2019 14:23:10 Joan Moreau via dovecot  
wrote:

Thank Stephan
I basically need to know the role/description of each of the functions of 
the fts_backend:


struct fts_backend fts_backend_xapian = {
    .name = "xapian",
    .flags = FTS_BACKEND_FLAG_NORMALIZE_INPUT, /* what other flags? */
    {
        fts_backend_xapian_alloc,
        fts_backend_xapian_init,
        fts_backend_xapian_deinit,
        fts_backend_xapian_get_last_uid,
        fts_backend_xapian_update_init,
        fts_backend_xapian_update_deinit,
        fts_backend_xapian_update_set_mailbox,
        fts_backend_xapian_update_expunge,
        fts_backend_xapian_update_set_build_key,
        fts_backend_xapian_update_unset_build_key,
        fts_backend_xapian_update_build_more,
        fts_backend_xapian_refresh,
        fts_backend_xapian_rescan,
        fts_backend_xapian_optimize,
        fts_backend_default_can_lookup,
        fts_backend_xapian_lookup,
        fts_backend_xapian_lookup_multi,
        fts_backend_xapian_lookup_done
    }
};

Thank you
On 2019-01-05 08:49, Stephan Bosch wrote:


On 04/01/2019 at 11:17, Joan Moreau via dovecot wrote:


Why not, but please guide me on the core structure (mandatory functions, 
etc.) of a typical Dovecot FTS plugin
The Dovecot API documentation is not exhaustive everywhere, but the basics 
are documented. The remaining questions can be answered by looking at 
examples found in similar plugins or the relevant API sources.


I know of one FTS plugin not written by Dovecot developers:

https://github.com/atkinsj/fts-elasticsearch

If you really wish to do something like this, just go ahead. It will not be 
a small effort though. As soon as you have concrete questions, we can help 
you (don't expect rapid responses though).


Regards,

Stephan.





Re: Solr -> Xapian ?

2019-01-05 Thread Joan Moreau via dovecot


Anyone willing to explain those functions ?

On January 5, 2019 14:23:10 Joan Moreau via dovecot wrote:

Thanks, Stephan
I basically need to know the role/description of each of the functions of 
the fts_backend:


struct fts_backend fts_backend_xapian = {
    .name = "xapian",
    .flags = FTS_BACKEND_FLAG_NORMALIZE_INPUT, /* what other flags? */
    {
        fts_backend_xapian_alloc,
        fts_backend_xapian_init,
        fts_backend_xapian_deinit,
        fts_backend_xapian_get_last_uid,
        fts_backend_xapian_update_init,
        fts_backend_xapian_update_deinit,
        fts_backend_xapian_update_set_mailbox,
        fts_backend_xapian_update_expunge,
        fts_backend_xapian_update_set_build_key,
        fts_backend_xapian_update_unset_build_key,
        fts_backend_xapian_update_build_more,
        fts_backend_xapian_refresh,
        fts_backend_xapian_rescan,
        fts_backend_xapian_optimize,
        fts_backend_default_can_lookup,
        fts_backend_xapian_lookup,
        fts_backend_xapian_lookup_multi,
        fts_backend_xapian_lookup_done
    }
};

Thank you
On 2019-01-05 08:49, Stephan Bosch wrote:


On 04/01/2019 at 11:17, Joan Moreau via dovecot wrote:


Why not, but please guide me on the core structure (mandatory functions, 
etc.) of a typical Dovecot FTS plugin
The Dovecot API documentation is not exhaustive everywhere, but the basics 
are documented. The remaining questions can be answered by looking at 
examples found in similar plugins or the relevant API sources.


I know of one FTS plugin not written by Dovecot developers:

https://github.com/atkinsj/fts-elasticsearch

If you really wish to do something like this, just go ahead. It will not be 
a small effort though. As soon as you have concrete questions, we can help 
you (don't expect rapid responses though).


Regards,

Stephan.




Re: Solr -> Xapian ?

2019-01-04 Thread Joan Moreau via dovecot
Thanks, Stephan


I basically need to know the role/description of each of the functions
of the fts_backend: 


struct fts_backend fts_backend_xapian = {
    .name = "xapian",
    .flags = FTS_BACKEND_FLAG_NORMALIZE_INPUT, /* WHAT OTHER FLAGS? */

    {
        fts_backend_xapian_alloc,
        fts_backend_xapian_init,
        fts_backend_xapian_deinit,
        fts_backend_xapian_get_last_uid,
        fts_backend_xapian_update_init,
        fts_backend_xapian_update_deinit,
        fts_backend_xapian_update_set_mailbox,
        fts_backend_xapian_update_expunge,
        fts_backend_xapian_update_set_build_key,
        fts_backend_xapian_update_unset_build_key,
        fts_backend_xapian_update_build_more,
        fts_backend_xapian_refresh,
        fts_backend_xapian_rescan,
        fts_backend_xapian_optimize,
        fts_backend_default_can_lookup,
        fts_backend_xapian_lookup,
        fts_backend_xapian_lookup_multi,
        fts_backend_xapian_lookup_done
    }
};
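On refresh: a later reply in this thread explains that it matters for indexes that keep a fixed view once opened. If an IMAP session opens the index, a new mail is then delivered and indexed, and the session searches again without refresh(), the new mail is missing from the results. The sketch below models only that snapshot behaviour; all types and names are hypothetical. For Xapian, `Xapian::Database::reopen()` is probably the comparable operation, but that mapping is an assumption, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared index: other processes bump doc_count as they index. */
struct sketch_shared_index {
    uint32_t doc_count;
};

/* A reader works on a snapshot taken when it was opened or refreshed. */
struct sketch_reader {
    const struct sketch_shared_index *index;
    uint32_t visible_docs;
};

static void sketch_reader_open(struct sketch_reader *reader,
                               const struct sketch_shared_index *index)
{
    reader->index = index;
    reader->visible_docs = index->doc_count;
}

/* refresh(): replace the old snapshot so later lookups see newly indexed mail. */
static void sketch_reader_refresh(struct sketch_reader *reader)
{
    reader->visible_docs = reader->index->doc_count;
}
```

If the underlying library already reopens its index before every query, the backend's refresh() may have little or nothing to do.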

Thank you


On 2019-01-05 08:49, Stephan Bosch wrote:

On 04/01/2019 at 11:17, Joan Moreau via dovecot wrote:


Why not, but please guide me on the core structure (mandatory functions, 
etc.) of a typical Dovecot FTS plugin

The Dovecot API documentation is not exhaustive everywhere, but the basics are 
documented. The remaining questions can be answered by looking at examples 
found in similar plugins or the relevant API sources.

I know of one FTS plugin not written by Dovecot developers:

https://github.com/atkinsj/fts-elasticsearch

If you really wish to do something like this, just go ahead. It will not be a 
small effort though. As soon as you have concrete questions, we can help you 
(don't expect rapid responses though).

Regards,

Stephan.

Re: Solr -> Xapian ?

2019-01-04 Thread Stephan Bosch



On 04/01/2019 at 11:17, Joan Moreau via dovecot wrote:


Why not, but please guide me on the core structure (mandatory 
functions, etc.) of a typical Dovecot FTS plugin


The Dovecot API documentation is not exhaustive everywhere, but the 
basics are documented. The remaining questions can be answered by 
looking at examples found in similar plugins or the relevant API sources.


I know of one FTS plugin not written by Dovecot developers:

https://github.com/atkinsj/fts-elasticsearch

If you really wish to do something like this, just go ahead. It will not 
be a small effort though. As soon as you have concrete questions, we can 
help you (don't expect rapid responses though).


Regards,

Stephan.




Re: Solr -> Xapian ?

2019-01-04 Thread Joan Moreau via dovecot
Also, I need a description of the functions the backend is expected to implement:


struct fts_backend fts_backend_xapian = {
    .name = "xapian",
    .flags = FTS_BACKEND_FLAG_NORMALIZE_INPUT, /* WHAT OTHER FLAGS? */

    {
        fts_backend_xapian_alloc,
        fts_backend_xapian_init,
        fts_backend_xapian_deinit,
        fts_backend_xapian_get_last_uid,
        fts_backend_xapian_update_init,
        fts_backend_xapian_update_deinit,
        fts_backend_xapian_update_set_mailbox,
        fts_backend_xapian_update_expunge,
        fts_backend_xapian_update_set_build_key,
        fts_backend_xapian_update_unset_build_key,
        fts_backend_xapian_update_build_more,
        fts_backend_xapian_refresh,
        fts_backend_xapian_rescan,
        fts_backend_xapian_optimize,
        fts_backend_default_can_lookup,
        fts_backend_xapian_lookup,
        fts_backend_xapian_lookup_multi,
        fts_backend_xapian_lookup_done
    }
};

On 2019-01-04 20:33, Joan Moreau via dovecot wrote:

Yes but: 

1 - Is there documentation of the main objects (fts_backend, mail_user, mailbox, etc.)?

2 - Which functions are mandatory?

3 - Search: presumably the FTS backend receives several parameters: the keyword(s), the user & mailbox, and the fields (to, from, body, etc.) to be included in the search. Which function of the plugin is called for this?

4 - Indexing: what is the logic? Does the FTS core just ask "index this email of this mailbox", or is it up to the plugin to sort out which emails it has already indexed?
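Question 4 is answered in a later reply in this thread: mails are always indexed in order from the first mail up to the last indexed one, and any gap is filled first, so the latest indexed UID is also the greatest. The backend therefore only has to report that UID via get_last_uid, and the indexer works out which mails come after it. A minimal sketch of that arithmetic, assuming the mailbox UIDs are available as a sorted array; `sketch_first_unindexed` is a hypothetical helper, not a Dovecot function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the position of the first UID in uids[] (sorted ascending) that is
   greater than last_uid, i.e. the first mail still to be indexed.
   Returns count when everything is already indexed. */
static size_t sketch_first_unindexed(const uint32_t *uids, size_t count,
                                     uint32_t last_uid)
{
    size_t lo = 0, hi = count;

    /* Binary search for the first element > last_uid. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (uids[mid] <= last_uid)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

Because indexing is assumed to be in UID order with no gaps, a single number is enough state; supporting out-of-order indexing would need per-mail bookkeeping instead.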

Thank you 

On 2019-01-04 18:49, admin wrote: 
A starting point would be to have a look at the current FTS plugins: 

https://github.com/dovecot/core/tree/master/src/plugins/fts-solr 
and 
https://github.com/dovecot/core/tree/master/src/plugins/fts-squat 

-M 

On Friday, 04.01.2019, 18:17 +0800, Joan Moreau via dovecot wrote:

Why not, but please guide me on the core structure (mandatory functions, etc.) of a typical Dovecot FTS plugin

On 2019-01-04 17:20, Aki Tuomi wrote: 
I hope you are aware that "linking with Xapian" requires somewhat more work than just -lxapian in linker? If you or someone feels like writing fts_xapian, go for it. 


Aki

On 04 January 2019 at 08:20 Joan Moreau via dovecot  wrote:

What about considering linking Dovecot with the Xapian libraries instead of
going down the Solr nightmare?


https://xapian.org/features

On 2019-01-02 17:10, John Tulp wrote:

On Wed, 2019-01-02 at 00:59 -0800, M. Balridge wrote:

The main problem is: after some time of indexing from Dovecot, Dovecot
returns errors (invalid SID, etc.) and Solr returns "out of range
indexes" errors.

I've been watching the progress of this thread with no small concern, mainly
because I've been tasked with providing a server-side email search facility
with a budget and manpower level that comes down to mainly *1*, i.e., me.

I was expecting, given the strongly worded language about "just use
lucene/SOLR" and "ignore squat", that I should invest time + effort into this
JAVA nightmare that is SOLR.

I started with squat and another word-indexor system that used out-of-band
(not a dovecot plugin) software to provide rapid (sub-second) searches through
tens-of-GB-scale mailboxes.

Contrary to what I was led to believe, the squat indexes worked surprisingly
well, once you sorted out the odd resource-size (ulimit-related) limitations
(vsz & friends). I did notice the "worst-case" search performance showed
worryingly high O(x) increases in time, but I'd not seen anything that was a
dealbreaker. It goes without saying that various substring searches worked as
expected, for the most part.

My experiences with SOLR were similar to Messr. Moreau's: lots of startup
errors with provided schemata files. Lots of JAVA nonsense issues. Lots of
sensitivity to WHICH Java runtime, etc, etc. I finally fixated a specific JVM,
version of SOLR, and dovecot to find the "best" working combination, only to
find that the searches didn't work out as expected. I expected to be able to
do date-ranging based searches. Didn't work. I expected to search CONTENTS of
emails, and despite many days of tweaks, I couldn't get it to index even the
basics like filenames/types of attachments, so I could expose
attachment-based searching to my users.

So, without rancour or antipathy, I ask the entire list: has ANYONE gotten a
Dovecot/solr-fts-plugin setup to work that provides as a BASELINE, all of the
following functionality:

1) The ability to search for a string within any of the structured fields
(from/subject) that returns correct results?

2) The ability to search for any string within the BODY of emails, including
the MIME attachment boundaries?

3) The ability to do "ranging" searches for structures within emails that
decompose to "dates" or other simple-numeric data?

OPTIONALLY, and this is probably way outside of the scope of the above,
despite the fact that it's listed as a "selling point" of SOLR versus other
full text search engines:

4) The ability to do searches against any attachments that are able to be
post-processed and hyper-indexed by SOLR+Tika?

-

SOLR 

Re: Solr -> Xapian ?

2019-01-04 Thread Joan Moreau via dovecot
Yes but: 


1 - Is there documentation of the main objects (fts_backend,
mail_user, mailbox, etc.)?

2 - Which functions are mandatory?


3 - Search: presumably the FTS backend receives several parameters: the
keyword(s), the user & mailbox, and the fields (to, from, body, etc.)
to be included in the search. Which function of the plugin is called for
this?


4 - Indexing: what is the logic? Does the FTS core just ask "index
this email of this mailbox", or is it up to the plugin to sort out
which emails it has already indexed?
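Question 3 is answered in a later reply in this thread: lookup() receives the entire IMAP SEARCH query as `struct mail_search_arg *args`, and for a single header search the header name is in `args->args->hdr_field_name` and the searched text in `args->args->value.str`. The sketch below turns a simplified, hypothetical version of such an argument into a "field:value" query string. The struct, the enum, the lowercasing and the output format are all assumptions for illustration, not Dovecot's or Xapian's API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical simplification of one search argument; the real
   struct mail_search_arg has many more fields. */
enum sketch_arg_type { SKETCH_ARG_HEADER, SKETCH_ARG_BODY };

struct sketch_search_arg {
    enum sketch_arg_type type;
    const char *hdr_field_name; /* e.g. "From"; only used for header searches */
    const char *value_str;      /* the text being searched for */
};

/* Writes "field:value" into buf, lowercasing the field so that "FROM" and
   "From" map to the same (hypothetical) index field. */
static int sketch_format_query(const struct sketch_search_arg *arg,
                               char *buf, size_t size)
{
    char field[64];
    const char *src = arg->type == SKETCH_ARG_HEADER ?
        arg->hdr_field_name : "body";
    size_t i;

    for (i = 0; src[i] != '\0' && i < sizeof(field) - 1; i++)
        field[i] = (char)tolower((unsigned char)src[i]);
    field[i] = '\0';
    return snprintf(buf, size, "%s:%s", field, arg->value_str);
}
```

A real lookup() would also have to walk the rest of the argument list (AND/OR/NOT sub-arguments) rather than looking only at the first argument.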

Thank you 


On 2019-01-04 18:49, admin wrote:

A starting point would be to have a look at the current FTS plugins: 

https://github.com/dovecot/core/tree/master/src/plugins/fts-solr 
and 
https://github.com/dovecot/core/tree/master/src/plugins/fts-squat 

-M 

On Friday, 04.01.2019, 18:17 +0800, Joan Moreau via dovecot wrote:

Why not, but please guide me on the core structure (mandatory functions, etc.) of a typical Dovecot FTS plugin

On 2019-01-04 17:20, Aki Tuomi wrote: 
I hope you are aware that "linking with Xapian" requires somewhat more work than just -lxapian in linker? If you or someone feels like writing fts_xapian, go for it. 


Aki

On 04 January 2019 at 08:20 Joan Moreau via dovecot  wrote:

What about considering linking Dovecot with the Xapian libraries instead of
going down the Solr nightmare?


https://xapian.org/features

On 2019-01-02 17:10, John Tulp wrote:

On Wed, 2019-01-02 at 00:59 -0800, M. Balridge wrote:

The main problem is: after some time of indexing from Dovecot, Dovecot
returns errors (invalid SID, etc.) and Solr returns "out of range
indexes" errors.

I've been watching the progress of this thread with no small concern, mainly
because I've been tasked with providing a server-side email search facility
with a budget and manpower level that comes down to mainly *1*, i.e., me.

I was expecting, given the strongly worded language about "just use
lucene/SOLR" and "ignore squat", that I should invest time + effort into this
JAVA nightmare that is SOLR.

I started with squat and another word-indexor system that used out-of-band
(not a dovecot plugin) software to provide rapid (sub-second) searches through
tens-of-GB-scale mailboxes.

Contrary to what I was led to believe, the squat indexes worked surprisingly
well, once you sorted out the odd resource-size (ulimit-related) limitations
(vsz & friends). I did notice the "worst-case" search performance showed
worryingly high O(x) increases in time, but I'd not seen anything that was a
dealbreaker. It goes without saying that various substring searches worked as
expected, for the most part.

My experiences with SOLR were similar to Messr. Moreau's: lots of startup
errors with provided schemata files. Lots of JAVA nonsense issues. Lots of
sensitivity to WHICH Java runtime, etc, etc. I finally fixated a specific JVM,
version of SOLR, and dovecot to find the "best" working combination, only to
find that the searches didn't work out as expected. I expected to be able to
do date-ranging based searches. Didn't work. I expected to search CONTENTS of
emails, and despite many days of tweaks, I couldn't get it to index even the
basics like filenames/types of attachments, so I could expose
attachment-based searching to my users.

So, without rancour or antipathy, I ask the entire list: has ANYONE gotten a
Dovecot/solr-fts-plugin setup to work that provides as a BASELINE, all of the
following functionality:

1) The ability to search for a string within any of the structured fields
(from/subject) that returns correct results?

2) The ability to search for any string within the BODY of emails, including
the MIME attachment boundaries?

3) The ability to do "ranging" searches for structures within emails that
decompose to "dates" or other simple-numeric data?

OPTIONALLY, and this is probably way outside of the scope of the above,
despite the fact that it's listed as a "selling point" of SOLR versus other
full text search engines:

4) The ability to do searches against any attachments that are able to be
post-processed and hyper-indexed by SOLR+Tika?

-

SOLR seems to have "brand cachet", so presumably it actually works (for 
somebody).

Dovecot has not a little "brand cachet", and for me, I have innate faith and
trust in Timo and his software. I am no stranger to the "costs" of "free"
software, in that you sacrifice your own blood, sweat, and tears just to get
these disparate pieces to work together.

I *DO* respect that Timo has to keep the lights (and sauna) on in Finland.
Maybe there's a super-secret (no advertised prices, "carrier-only" price list)
with _Dovecot, Oy_ wherein the above ARE actually available for something less
than 6.022 x 10^23 Euros per centi-second of licencing fees.

But please, level with us faithful users.  Does this morass of Java B.S.
actually work, and if not, please just deprecate and remove this moribund
software, and stop trying to bury the 

Re: Solr -> Xapian ?

2019-01-04 Thread admin
A starting point would be to have a look at the current FTS plugins:

https://github.com/dovecot/core/tree/master/src/plugins/fts-solr
and
https://github.com/dovecot/core/tree/master/src/plugins/fts-squat
-M

On Friday, 04.01.2019, 18:17 +0800, Joan Moreau via dovecot wrote:
> Why not, but please guide me on the core structure (mandatory
> functions, etc.) of a typical Dovecot FTS plugin
>
> On 2019-01-04 17:20, Aki Tuomi wrote:
> > I hope you are aware that "linking with Xapian" requires somewhat
> > more work than just -lxapian in linker? If you or someone feels
> > like writing fts_xapian, go for it. 
> > 
> > Aki
> > 
> > 
> > > On 04 January 2019 at 08:20 Joan Moreau via dovecot <dovecot@dovecot.org> wrote:
> > > 
> > > 
> > > What about considering linking Dovecot with the Xapian libraries
> > > instead of going down the Solr nightmare?
> > > 
> > > https://xapian.org/features
> > > 
> > > On 2019-01-02 17:10, John Tulp wrote:
> > > 
> > > 
> > > > On Wed, 2019-01-02 at 00:59 -0800, M. Balridge wrote:
> > > >
> > > > The main problem is: after some time of indexing from Dovecot,
> > > > Dovecot returns errors (invalid SID, etc.) and Solr returns "out of
> > > > range indexes" errors.
> > > >
> > > > I've been watching the progress of this thread with no small
> > > > concern, mainly because I've been tasked with providing a
> > > > server-side email search facility with a budget and manpower level
> > > > that comes down to mainly *1*, i.e., me.
> > > > 
> > > > I was expecting, given the strongly worded language about "just use
> > > > lucene/SOLR" and "ignore squat", that I should invest time + effort
> > > > into this JAVA nightmare that is SOLR.
> > > > 
> > > > I started with squat and another word-indexor system that used
> > > > out-of-band (not a dovecot plugin) software to provide rapid
> > > > (sub-second) searches through tens-of-GB-scale mailboxes.
> > > > 
> > > > Contrary to what I was led to believe, the squat indexes worked
> > > > surprisingly well, once you sorted out the odd resource-size
> > > > (ulimit-related) limitations (vsz & friends). I did notice the
> > > > "worst-case" search performance showed worryingly high O(x)
> > > > increases in time, but I'd not seen anything that was a
> > > > dealbreaker. It goes without saying that various substring
> > > > searches worked as expected, for the most part.
> > > > 
> > > > My experiences with SOLR were similar to Messr. Moreau's: lots of
> > > > startup errors with provided schemata files. Lots of JAVA nonsense
> > > > issues. Lots of sensitivity to WHICH Java runtime, etc, etc. I
> > > > finally fixated a specific JVM, version of SOLR, and dovecot to
> > > > find the "best" working combination, only to find that the
> > > > searches didn't work out as expected. I expected to be able to do
> > > > date-ranging based searches. Didn't work. I expected to search
> > > > CONTENTS of emails, and despite many days of tweaks, I couldn't get
> > > > it to index even the basics like filenames/types of attachments,
> > > > so I could expose attachment-based searching to my users.
> > > > 
> > > > So, without rancour or antipathy, I ask the entire list: has ANYONE
> > > > gotten a Dovecot/solr-fts-plugin setup to work that provides as a
> > > > BASELINE, all of the following functionality:
> > > > 
> > > > 1) The ability to search for a string within any of the structured
> > > > fields (from/subject) that returns correct results?
> > > > 
> > > > 2) The ability to search for any string within the BODY of emails,
> > > > including the MIME attachment boundaries?
> > > > 
> > > > 3) The ability to do "ranging" searches for structures within
> > > > emails that decompose to "dates" or other simple-numeric data?
> > > > 
> > > > OPTIONALLY, and this is probably way outside of the scope of the
> > > > above, despite the fact that it's listed as a "selling point" of
> > > > SOLR versus other full text search engines:
> > > > 
> > > > 4) The ability to do searches against any attachments that are
> > > > able to be post-processed and hyper-indexed by SOLR+Tika?
> > > > 
> > > > -
> > > > 
> > > > SOLR seems to have "brand cachet", so presumably it actually works
> > > > (for somebody).
> > > > 
> > > > Dovecot has not a little "brand cachet", and for me, I have innate
> > > > faith and trust in Timo and his software. I am no stranger to the
> > > > "costs" of "free" software, in that you sacrifice your own blood,
> > > > sweat, and tears just to get these disparate pieces to work
> > > > together.
> > > > 
> > > > I *DO* respect that Timo has to keep the lights (and sauna) on in
> > > > Finland. Maybe there's a super-secret (no advertised prices,
> > > > "carrier-only" price list)
> > > > 

Re: Solr -> Xapian ?

2019-01-04 Thread Joan Moreau via dovecot

Why not, but please guide me on the core structure (mandatory
functions, etc.) of a typical Dovecot FTS plugin


On 2019-01-04 17:20, Aki Tuomi wrote:

I hope you are aware that "linking with Xapian" requires somewhat more work than just -lxapian in linker? If you or someone feels like writing fts_xapian, go for it. 


Aki

On 04 January 2019 at 08:20 Joan Moreau via dovecot  wrote:

What about considering linking Dovecot with the Xapian libraries instead of
going down the Solr nightmare?


https://xapian.org/features

On 2019-01-02 17:10, John Tulp wrote:

On Wed, 2019-01-02 at 00:59 -0800, M. Balridge wrote:

The main problem is: after some time of indexing from Dovecot, Dovecot
returns errors (invalid SID, etc.) and Solr returns "out of range
indexes" errors.

I've been watching the progress of this thread with no small concern, mainly
because I've been tasked with providing a server-side email search facility
with a budget and manpower level that comes down to mainly *1*, i.e., me.

I was expecting, given the strongly worded language about "just use
lucene/SOLR" and "ignore squat", that I should invest time + effort into this
JAVA nightmare that is SOLR.

I started with squat and another word-indexor system that used out-of-band
(not a dovecot plugin) software to provide rapid (sub-second) searches through
tens-of-GB-scale mailboxes.

Contrary to what I was led to believe, the squat indexes worked surprisingly
well, once you sorted out the odd resource-size (ulimit-related) limitations
(vsz & friends). I did notice the "worst-case" search performance showed
worryingly high O(x) increases in time, but I'd not seen anything that was a
dealbreaker. It goes without saying that various substring searches worked as
expected, for the most part.

My experiences with SOLR were similar to Messr. Moreau's: lots of startup
errors with provided schemata files. Lots of JAVA nonsense issues. Lots of
sensitivity to WHICH Java runtime, etc, etc. I finally fixated a specific JVM,
version of SOLR, and dovecot to find the "best" working combination, only to
find that the searches didn't work out as expected. I expected to be able to
do date-ranging based searches. Didn't work. I expected to search CONTENTS of
emails, and despite many days of tweaks, I couldn't get it to index even the
basics like filenames/types of attachments, so I could expose
attachment-based searching to my users.

So, without rancour or antipathy, I ask the entire list: has ANYONE gotten a
Dovecot/solr-fts-plugin setup to work that provides as a BASELINE, all of the
following functionality:

1) The ability to search for a string within any of the structured fields
(from/subject) that returns correct results?

2) The ability to search for any string within the BODY of emails, including
the MIME attachment boundaries?

3) The ability to do "ranging" searches for structures within emails that
decompose to "dates" or other simple-numeric data?

OPTIONALLY, and this is probably way outside of the scope of the above,
despite the fact that it's listed as a "selling point" of SOLR versus other
full text search engines:

4) The ability to do searches against any attachments that are able to be
post-processed and hyper-indexed by SOLR+Tika?

-

SOLR seems to have "brand cachet", so presumably it actually works (for 
somebody).

Dovecot has not a little "brand cachet", and for me, I have innate faith and
trust in Timo and his software. I am no stranger to the "costs" of "free"
software, in that you sacrifice your own blood, sweat, and tears just to get
these disparate pieces to work together.

I *DO* respect that Timo has to keep the lights (and sauna) on in Finland.
Maybe there's a super-secret (no advertised prices, "carrier-only" price list)
with _Dovecot, Oy_ wherein the above ARE actually available for something less
than 6.022 x 10^23 Euros per centi-second of licencing fees.

But please, level with us faithful users.  Does this morass of Java B.S.
actually work, and if not, please just deprecate and remove this moribund
software, and stop trying to bury the only FTS plugin many of us HAVE actually
gotten to work.  (Pretty please?)

I respect that Messr. Moreau has made an earnest effort to get this JAVA B.S.
to actually work, as I have. 


He persevered where I'd given up. He's vocal about it, and now I'm chiming in
that this ornate collection of switchblades only cuts those who try to use them.

Respectfully,
=M=

Fascinating...

SOLR says the following are powered by SOLR...

https://wiki.apache.org/solr/PublicServers

Perhaps if you could find out from that list which of them are using
SOLR in conjunction with Dovecot...

food for thought...

Re: Solr -> Xapian ?

2019-01-04 Thread Aki Tuomi
I hope you are aware that "linking with Xapian" requires somewhat more work 
than just -lxapian in linker? If you or someone feels like writing fts_xapian, 
go for it. 

Aki

> On 04 January 2019 at 08:20 Joan Moreau via dovecot wrote:
> 
> 
> What about considering linking Dovecot with the Xapian libraries instead of
> going down the Solr nightmare?
> 
> https://xapian.org/features
> 
> On 2019-01-02 17:10, John Tulp wrote:
> 
> > On Wed, 2019-01-02 at 00:59 -0800, M. Balridge wrote:
> >
> > The main problem is: after some time of indexing from Dovecot, Dovecot
> > returns errors (invalid SID, etc.) and Solr returns "out of range
> > indexes" errors.
> >
> > I've been watching the progress of this thread with no small concern, mainly
> > because I've been tasked with providing a server-side email search facility
> > with a budget and manpower level that comes down to mainly *1*, i.e., me.
> > 
> > I was expecting, given the strongly worded language about "just use
> > lucene/SOLR" and "ignore squat", that I should invest time + effort into
> > this JAVA nightmare that is SOLR.
> > 
> > I started with squat and another word-indexor system that used out-of-band
> > (not a dovecot plugin) software to provide rapid (sub-second) searches
> > through tens-of-GB-scale mailboxes.
> > 
> > Contrary to what I was led to believe, the squat indexes worked surprisingly
> > well, once you sorted out the odd resource-size (ulimit-related) limitations
> > (vsz & friends). I did notice the "worst-case" search performance showed
> > worryingly high O(x) increases in time, but I'd not seen anything that was a
> > dealbreaker. It goes without saying that various substring searches worked
> > as expected, for the most part.
> > 
> > My experiences with SOLR were similar to Messr. Moreau's: lots of startup
> > errors with provided schemata files. Lots of JAVA nonsense issues. Lots of
> > sensitivity to WHICH Java runtime, etc, etc. I finally fixated a specific
> > JVM, version of SOLR, and dovecot to find the "best" working combination,
> > only to find that the searches didn't work out as expected. I expected to be
> > able to do date-ranging based searches. Didn't work. I expected to search
> > CONTENTS of emails, and despite many days of tweaks, I couldn't get it to
> > index even the basics like filenames/types of attachments, so I could expose
> > attachment-based searching to my users.
> > 
> > So, without rancour or antipathy, I ask the entire list: has ANYONE gotten a
> > Dovecot/solr-fts-plugin setup to work that provides as a BASELINE, all of
> > the following functionality:
> > 
> > 1) The ability to search for a string within any of the structured fields
> > (from/subject) that returns correct results?
> > 
> > 2) The ability to search for any string within the BODY of emails, including
> > the MIME attachment boundaries?
> > 
> > 3) The ability to do "ranging" searches for structures within emails that
> > decompose to "dates" or other simple-numeric data?
> > 
> > OPTIONALLY, and this is probably way outside of the scope of the above,
> > despite the fact that it's listed as a "selling point" of SOLR versus other
> > full text search engines:
> > 
> > 4) The ability to do searches against any attachments that are able to be
> > post-processed and hyper-indexed by SOLR+Tika?
> > 
> > -
> > 
> > SOLR seems to have "brand cachet", so presumably it actually works (for
> > somebody).
> > 
> > Dovecot has not a little "brand cachet", and for me, I have innate faith and
> > trust in Timo and his software. I am no stranger to the "costs" of "free"
> > software, in that you sacrifice your own blood, sweat, and tears just to get
> > these disparate pieces to work together.
> > 
> > I *DO* respect that Timo has to keep the lights (and sauna) on in Finland.
> > Maybe there's a super-secret (no advertised prices, "carrier-only" price
> > list) with _Dovecot, Oy_ wherein the above ARE actually available for
> > something less than 6.022 x 10^23 Euros per centi-second of licencing fees.
> > 
> > But please, level with us faithful users.  Does this morass of Java B.S.
> > actually work, and if not, please just deprecate and remove this moribund
> > software, and stop trying to bury the only FTS plugin many of us HAVE
> > actually gotten to work.  (Pretty please?)
> > 
> > I respect that Messr. Moreau has made an earnest effort to get this JAVA
> > B.S. to actually work, as I have. 
> > 
> > He persevered where I'd given up. He's vocal about it, and now I'm chiming
> > in that this ornate collection of switchblades only cuts those who try to
> > use them.
> > 
> > Respectfully,
> > =M=
> Fascinating...
> 
> SOLR says the following are powered by SOLR...
> 
> https://wiki.apache.org/solr/PublicServers
> 
> Perhaps if you could find out from that list which of them are using
> SOLR in conjunction with Dovecot...
> 
> food for thought...


Solr -> Xapian ?

2019-01-03 Thread Joan Moreau via dovecot

What about considering linking Dovecot with the Xapian libraries instead of
going down the Solr nightmare?


https://xapian.org/features

On 2019-01-02 17:10, John Tulp wrote:


On Wed, 2019-01-02 at 00:59 -0800, M. Balridge wrote:

The main problem is: after some time of indexing from Dovecot, Dovecot
returns errors (invalid SID, etc.) and Solr returns "out of range
indexes" errors.

I've been watching the progress of this thread with no small concern, mainly
because I've been tasked with providing a server-side email search facility
with a budget and manpower level that comes down to mainly *1*, i.e., me.

I was expecting, given the strongly worded language about "just use
lucene/SOLR" and "ignore squat", that I should invest time + effort into this
JAVA nightmare that is SOLR.

I started with squat and another word-indexor system that used out-of-band
(not a dovecot plugin) software to provide rapid (sub-second) searches through
tens-of-GB-scale mailboxes.

Contrary to what I was led to believe, the squat indexes worked surprisingly
well, once you sorted out the odd resource-size (ulimit-related) limitations
(vsz & friends). I did notice the "worst-case" search performance showed
worryingly high O(x) increases in time, but I'd not seen anything that was a
dealbreaker. It goes without saying that various substring searches worked as
expected, for the most part.

My experiences with SOLR were similar to Messr. Moreau's: lots of startup
errors with provided schemata files. Lots of JAVA nonsense issues. Lots of
sensitivity to WHICH Java runtime, etc, etc. I finally fixated a specific JVM,
version of SOLR, and dovecot to find the "best" working combination, only to
find that the searches didn't work out as expected. I expected to be able to
do date-ranging based searches. Didn't work. I expected to search CONTENTS of
emails, and despite many days of tweaks, I couldn't get it to index even the
basics like filenames/types of attachments, so I could expose
attachment-based searching to my users.

So, without rancour or antipathy, I ask the entire list: has ANYONE gotten a
Dovecot/solr-fts-plugin setup to work that provides, as a BASELINE, all of the
following functionality:

1) The ability to search for a string within any of the structured fields
(from/subject) that returns correct results?

2) The ability to search for any string within the BODY of emails, including
the MIME attachment boundaries?

3) The ability to do "ranging" searches for structures within emails that
decompose to "dates" or other simple-numeric data?

OPTIONALLY, and this is probably way outside of the scope of the above,
despite the fact that it's listed as a "selling point" of SOLR versus other
full text search engines:

4) The ability to do searches against any attachments that are able to be
post-processed and hyper-indexed by SOLR+Tika?

-

SOLR seems to have "brand cachet", so presumably it actually works (for 
somebody).

Dovecot has not a little "brand cachet", and for me, I have innate faith and
trust in Timo and his software. I am no stranger to the "costs" of "free"
software, in that you sacrifice your own blood, sweat, and tears just to get
these disparate pieces to work together.

I *DO* respect that Timo has to keep the lights (and sauna) on in Finland.
Maybe there's a super-secret price list (no advertised prices, "carrier-only")
with _Dovecot Oy_ wherein the above ARE actually available for something less
than 6.022 x 10^23 Euros per centi-second of licencing fees.

But please, level with us faithful users.  Does this morass of Java B.S.
actually work, and if not, please just deprecate and remove this moribund
software, and stop trying to bury the only FTS plugin many of us HAVE actually
gotten to work.  (Pretty please?)

I respect that Mr. Moreau has made an earnest effort to get this JAVA B.S.
to actually work, as I have.


He persevered where I'd given up. He's vocal about it, and now I'm chiming in
that this ornate collection of switchblades only cuts those who try to use them.

Respectfully,
=M=

Fascinating...

SOLR says the following are powered by SOLR...

https://wiki.apache.org/solr/PublicServers

Perhaps if you could find out from that list which of them are using
SOLR in conjunction with Dovecot...

food for thought...