Re: [Evolution-hackers] Sqlite cache for address-book storage in EDS

2011-03-21 Thread Matthias Braun
On Monday, 14.03.2011, at 08:40 -0400, Adam Tauno Williams wrote:
 On Mon, 2011-03-14 at 18:57 +0530, Chenthill Palanisamy wrote:
  On Mon, Mar 14, 2011 at 3:53 PM, Adam Tauno Williams
  awill...@opengroupware.us wrote:
   On Mon, 2011-03-14 at 10:09 +0530, Chenthill Palanisamy wrote:
   On Thu, Mar 10, 2011 at 6:54 PM, Matthew Barnes mbar...@redhat.com 
   wrote:
Okay, this might be a long shot but I'm gonna throw it out there 
anyway:
would it make sense to look at using Xapian to index a directory of raw
vCards?
   I am not sure if it's worth doing this for the address book. I am just
   assuming that address-book content may not be as large as mail data.
   The only address-book data that would be big enough would be the GAL
   (Exchange) and the System Address Book (GroupWise).
   This is a self-fulfilling prophecy;  I and others have tried to have
   large address books... which doesn't work... so address books remain
   small.
  I agree; the *only* should be removed from my third sentence, since
  there could be other address books.
  Thinking about Xapian for the address book, I am still not convinced.
  In the mailer, one often searches on various fields such as sender,
  subject, recipients, or full text, and Xapian is said to work much
  better there.
  I don't have any profiling data as such; that is just what I have
  heard from multiple people.
  But for address books, the most frequently used searches would be based
  on name and e-mail. Even if the address book has 21k or more contacts,
  a DB with good indexing should perform well. The information stored
  will be small compared to mail content. Well, these are just my
  observations; are there any other cases I am missing?
 
 This makes sense to me [I've no idea really how it is currently
 implemented or what the practical alternatives are].  But funny side
 note: if I just walk the DAV collection and save all the vcf files to a
 directory ... a simple Python script can parse each file [using the
 vobject module], compare the values against the criteria, and report which items
 match... an order of magnitude faster than Evolution.  But the reason
 for this is mentioned below.
 
   I have a CardDAV/GroupDAV collection of ~21,000 contacts I'd love to
   have access to via Evolution's WebDAV address book.  But anything more
   than a thousand or so gets to be unbearably slow.
  AFAIR, there are some UI issues involved here which should be dealt
  with separately.
 
 True,  most importantly [at least for WebDAV address books] why the @^$*@
 does it issue a PROPFIND to the server to enumerate the collection at
 every search?!  Just search the data you have;  it really seems like
 updating / synchronizing the collection and searching the collection
 should be independent events.
 
 I suppose I should get around to filing a bug about that.

The problem here is that contacts on the server could obviously have
changed since the last search. How else can you detect this? If the
PROPFIND results include an ETag, that would be a good way to speed
things up.
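A rough Python sketch of the ETag approach: a PROPFIND with Depth: 1 that asks for the DAV:getetag property (a standard WebDAV property from RFC 4918) returns a multistatus document, and only members whose ETag differs from the cached one need to be downloaded again. The function names are hypothetical, the HTTP request itself is not shown, and skipping hrefs that end in "/" is only a heuristic for leaving out the collection itself.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Set, Tuple

DAV = "{DAV:}"  # ElementTree's notation for the WebDAV XML namespace


def etags_from_multistatus(xml_text: str) -> Dict[str, Optional[str]]:
    """Map each member href in a PROPFIND multistatus body to its DAV:getetag."""
    root = ET.fromstring(xml_text)
    etags: Dict[str, Optional[str]] = {}
    for response in root.findall(DAV + "response"):
        href = (response.findtext(DAV + "href") or "").strip()
        # Heuristic: skip the collection itself (its href usually ends in "/").
        if not href or href.endswith("/"):
            continue
        etag = response.findtext(f"{DAV}propstat/{DAV}prop/{DAV}getetag")
        etags[href] = etag.strip() if etag else None
    return etags


def plan_sync(cached: Dict[str, Optional[str]],
              remote: Dict[str, Optional[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (hrefs to download again, hrefs to drop from the local cache)."""
    # A member without an ETag cannot be compared, so it is always re-fetched.
    to_fetch = {href for href, etag in remote.items()
                if etag is None or cached.get(href) != etag}
    to_delete = set(cached) - set(remote)
    return to_fetch, to_delete
```

With ~21,000 contacts, a sync would then transfer only the changed cards plus one PROPFIND listing, rather than every vCard.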

Apart from that, I could add a timeout, so that the server is only
queried every N minutes and queries in between are answered from the
cache only...
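The timeout idea, which also keeps searching separate from talking to the server, could look roughly like this minimal Python sketch. ThrottledAddressBook and the fetch_all callback are hypothetical stand-ins for whatever the backend does to download the collection; the clock is injectable only so the behaviour can be tested.

```python
import time
from typing import Callable, Dict, List, Optional


class ThrottledAddressBook:
    """Answers searches from a local cache; queries the server at most every min_interval seconds."""

    def __init__(self, fetch_all: Callable[[], Dict[str, str]],
                 min_interval: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_all = fetch_all        # hypothetical: returns {href: vCard text}
        self._min_interval = min_interval  # the "N minutes", expressed in seconds
        self._clock = clock
        self._cards: Dict[str, str] = {}
        self._last_sync: Optional[float] = None

    def maybe_sync(self) -> bool:
        """Refresh the cache if the interval has elapsed; return True if the server was queried."""
        now = self._clock()
        if self._last_sync is not None and now - self._last_sync < self._min_interval:
            return False
        self._cards = self._fetch_all()
        self._last_sync = now
        return True

    def search(self, needle: str) -> List[str]:
        """Case-insensitive substring search over the cached vCards."""
        self.maybe_sync()
        needle = needle.lower()
        return sorted(href for href, card in self._cards.items() if needle in card.lower())
```

A real backend would likely combine this with the ETag check mentioned above, so that a sync downloads only changed cards.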

Anyway, I'd be happy to get more input from people on the server side. I
wrote it and used it only for my own contact collection, which is ~200
contacts on an Apache mod_dav server, because that was enough for me, and
I never managed (nor wanted) to set up one of the big groupware servers
just for myself.

___
evolution-hackers mailing list
evolution-hackers@gnome.org
To change your list options or unsubscribe, visit ...
http://mail.gnome.org/mailman/listinfo/evolution-hackers


Re: [Evolution-hackers] Sqlite cache for address-book storage in EDS

2011-03-14 Thread Adam Tauno Williams
On Mon, 2011-03-14 at 10:09 +0530, Chenthill Palanisamy wrote:
 On Thu, Mar 10, 2011 at 6:54 PM, Matthew Barnes mbar...@redhat.com wrote:
  Okay, this might be a long shot but I'm gonna throw it out there anyway:
  would it make sense to look at using Xapian to index a directory of raw
  vCards?
 I am not sure if it's worth doing this for the address book. I am just
 assuming that address-book content may not be as large as mail data.
 The only address-book data that would be big enough would be the GAL
 (Exchange) and the System Address Book (GroupWise).

This is a self-fulfilling prophecy;  I and others have tried to have
large address books... which doesn't work... so address books remain
small.

I have a CardDAV/GroupDAV collection of ~21,000 contacts I'd love to
have access to via Evolution's WebDAV address book.  But anything more
than a thousand or so gets to be unbearably slow.



Re: [Evolution-hackers] Sqlite cache for address-book storage in EDS

2011-03-14 Thread Chenthill Palanisamy
On Mon, Mar 14, 2011 at 3:53 PM, Adam Tauno Williams
awill...@opengroupware.us wrote:
 On Mon, 2011-03-14 at 10:09 +0530, Chenthill Palanisamy wrote:
 On Thu, Mar 10, 2011 at 6:54 PM, Matthew Barnes mbar...@redhat.com wrote:
  Okay, this might be a long shot but I'm gonna throw it out there anyway:
  would it make sense to look at using Xapian to index a directory of raw
  vCards?
 I am not sure if it's worth doing this for the address book. I am just
 assuming that address-book content may not be as large as mail data.
 The only address-book data that would be big enough would be the GAL
 (Exchange) and the System Address Book (GroupWise).

 This is a self-fulfilling prophecy;  I and others have tried to have
 large address books... which doesn't work... so address books remain
 small.
I agree; the *only* should be removed from my third sentence, since
there could be other address books.
Thinking about Xapian for the address book, I am still not convinced.

In the mailer, one often searches on various fields such as sender,
subject, recipients, or full text, and Xapian is said to work much
better there.
I don't have any profiling data as such; that is just what I have
heard from multiple people.

But for address books, the most frequently used searches would be based
on name and e-mail. Even if the address book has 21k or more contacts,
a DB with good indexing should perform well. The information stored
will be small compared to mail content. Well, these are just my
observations; are there any other cases I am missing?
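The claim that name and e-mail lookups are well served by an ordinary indexed database can be illustrated with Python's built-in sqlite3 module. In this sketch the table layout and function names are hypothetical: lowercased copies of the name and e-mail are indexed, and a prefix search is written as a half-open range [lo, hi), a form SQLite's query planner can typically answer from an index instead of scanning every row.

```python
import sqlite3
from typing import List, Optional, Tuple


def create_summary(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS contacts (
            uid      TEXT PRIMARY KEY,
            name_lc  TEXT NOT NULL,   -- lowercased full name
            email_lc TEXT             -- lowercased primary e-mail
        );
        CREATE INDEX IF NOT EXISTS contacts_name  ON contacts(name_lc);
        CREATE INDEX IF NOT EXISTS contacts_email ON contacts(email_lc);
    """)


def add_contact(conn: sqlite3.Connection, uid: str, full_name: str,
                email: Optional[str]) -> None:
    conn.execute("INSERT OR REPLACE INTO contacts VALUES (?, ?, ?)",
                 (uid, full_name.lower(), email.lower() if email else None))


def prefix_bounds(prefix: str) -> Tuple[str, str]:
    if not prefix:
        raise ValueError("empty prefix")
    lo = prefix.lower()
    # Bump the last character: every string starting with `lo` sorts in [lo, hi).
    hi = lo[:-1] + chr(ord(lo[-1]) + 1)
    return lo, hi


def search_prefix(conn: sqlite3.Connection, prefix: str) -> List[str]:
    """UIDs whose name or e-mail starts with `prefix` (case-insensitive)."""
    lo, hi = prefix_bounds(prefix)
    rows = conn.execute(
        "SELECT uid FROM contacts WHERE name_lc >= ? AND name_lc < ? "
        "UNION "
        "SELECT uid FROM contacts WHERE email_lc >= ? AND email_lc < ? "
        "ORDER BY uid",
        (lo, hi, lo, hi))
    return [uid for (uid,) in rows]
```

Prefix matching on name and e-mail is roughly what autocompletion in a composer needs; full-text search over notes or addresses would need something else, such as SQLite's FTS extension or Xapian.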


 I have a CardDAV/GroupDAV collection of ~21,000 contacts I'd love to
 have access to via Evolution's WebDAV address book.  But anything more
 than a thousand or so gets to be unbearably slow.
AFAIR, there are some UI issues involved here which should be dealt
with separately.

- Chenthill.



Re: [Evolution-hackers] Sqlite cache for address-book storage in EDS

2011-03-14 Thread Adam Tauno Williams
On Mon, 2011-03-14 at 18:57 +0530, Chenthill Palanisamy wrote:
 On Mon, Mar 14, 2011 at 3:53 PM, Adam Tauno Williams
 awill...@opengroupware.us wrote:
  On Mon, 2011-03-14 at 10:09 +0530, Chenthill Palanisamy wrote:
  On Thu, Mar 10, 2011 at 6:54 PM, Matthew Barnes mbar...@redhat.com wrote:
   Okay, this might be a long shot but I'm gonna throw it out there anyway:
   would it make sense to look at using Xapian to index a directory of raw
   vCards?
  I am not sure if it's worth doing this for the address book. I am just
  assuming that address-book content may not be as large as mail data.
  The only address-book data that would be big enough would be the GAL
  (Exchange) and the System Address Book (GroupWise).
  This is a self-fulfilling prophecy;  I and others have tried to have
  large address books... which doesn't work... so address books remain
  small.
 I agree; the *only* should be removed from my third sentence, since
 there could be other address books.
 Thinking about Xapian for the address book, I am still not convinced.
 In the mailer, one often searches on various fields such as sender,
 subject, recipients, or full text, and Xapian is said to work much
 better there.
 I don't have any profiling data as such; that is just what I have
 heard from multiple people.
 But for address books, the most frequently used searches would be based
 on name and e-mail. Even if the address book has 21k or more contacts,
 a DB with good indexing should perform well. The information stored
 will be small compared to mail content. Well, these are just my
 observations; are there any other cases I am missing?

This makes sense to me [I've no idea really how it is currently
implemented or what the practical alternatives are].  But funny side
note: if I just walk the DAV collection and save all the vcf files to a
directory ... a simple Python script can parse each file [using the
vobject module], compare the values against the criteria, and report which items
match... an order of magnitude faster than Evolution.  But the reason
for this is mentioned below.
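The script described above uses the vobject module. As a dependency-free approximation, this Python sketch scans a directory of .vcf files with a deliberately simplified parser: it unfolds continuation lines, drops property parameters and group prefixes, and assumes one vCard per file with no quoted-printable or base64 values. The function names are hypothetical and this is not vobject's API.

```python
import os
from typing import Dict, Iterator, List


def unfold(text: str) -> List[str]:
    """Join folded lines: a line starting with a space or tab continues the previous one."""
    lines: List[str] = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def parse_vcard(text: str) -> Dict[str, List[str]]:
    """Return {PROPERTY: [values]} for a single vCard; parameters are discarded."""
    props: Dict[str, List[str]] = {}
    for line in unfold(text):
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        # "EMAIL;TYPE=work" -> "EMAIL", "item1.EMAIL" -> "EMAIL"
        name = name.split(";", 1)[0].split(".")[-1].upper()
        props.setdefault(name, []).append(value)
    return props


def search_directory(path: str, field: str, needle: str) -> Iterator[str]:
    """Yield names of .vcf files whose `field` contains `needle` (case-insensitive)."""
    needle = needle.lower()
    for entry in sorted(os.listdir(path)):
        if not entry.endswith(".vcf"):
            continue
        with open(os.path.join(path, entry), encoding="utf-8") as fh:
            props = parse_vcard(fh.read())
        if any(needle in value.lower() for value in props.get(field.upper(), [])):
            yield entry
```

This is a linear scan over every file, so its speed comes from avoiding network round trips on each search, not from any indexing.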

  I have a CardDAV/GroupDAV collection of ~21,000 contacts I'd love to
  have access to via Evolution's WebDAV address book.  But anything more
  than a thousand or so gets to be unbearably slow.
 AFAIR, there are some UI issues involved here which should be dealt
 with separately.

True,  most importantly [at least for WebDAV address books] why the @^$*@
does it issue a PROPFIND to the server to enumerate the collection at
every search?!  Just search the data you have;  it really seems like
updating / synchronizing the collection and searching the collection
should be independent events.

I suppose I should get around to filing a bug about that.



Re: [Evolution-hackers] Sqlite cache for address-book storage in EDS

2011-03-13 Thread Chenthill Palanisamy
On Thu, Mar 10, 2011 at 6:54 PM, Matthew Barnes mbar...@redhat.com wrote:

 On Thu, 2011-03-10 at 08:13 +0100, Milan Crha wrote:
  do not forget that the DB cache is compiled conditionally, because some
  distros do not ship libdb. Using SQLite for this was mentioned months
  ago, but no one found the time to actually do it, so go for it.

 Also, as far as I know there are still licensing issues between Berkeley
 DB's Sleepycat license and the [L]GPL, which is how libebackend was born.

 https://bugzilla.gnome.org/show_bug.cgi?id=465374

 I'm +1 on dumping Berkeley DB.


  Just keep two things in mind:
  - using binary storage for this kind of data is bad for cases where
    the binary file breaks, either due to an update/downgrade of the
    library providing access to it, or simply due to a crash. It's not so
    critical with Camel, as SQLite holds only summary data there, but if
    you want to store the real data in it as well, then it can be a
    problem. There are already people having trouble recovering their
    data from address-book storage, so if you are going to change it, it
    would be good to think of that from the beginning. It would be good to
    store raw vCards in some plain text file(s) which will be indexed by
    an SQLite summary. These plain text file(s) will then be easy to
    import into Evolution if something goes wrong, and by erasing the
    SQLite file the user will not lose any valuable data. (I'm thinking of
    a flat maildir approach here.)

 Milan raises a good point about binary formats versus text.  It would be
 good for the raw data to remain human readable.
Yes, it makes sense to store it that way. If we can index the data in an
SQLite summary and store the vCards the way we store individual mail data,
that should be sufficient.


 Okay, this might be a long shot but I'm gonna throw it out there anyway:
 would it make sense to look at using Xapian to index a directory of raw
 vCards?
I am not sure if it's worth doing this for the address book. I am just
assuming that address-book content may not be as large as mail data.
The only address-book data that would be big enough would be the GAL
(Exchange) and the System Address Book (GroupWise).
I think SQLite should suffice for indexing this.


 We've been talking about moving to notmuch [1] for mail indexing, and
 notmuch is built on Xapian.  Trying out Xapian for address books might
 be a good test drive for using it with mail.
To be honest, I won't have that much time to test this for the
address book. Jony was trying to compare the performance of SQLite and
notmuch indexing for mail; any updates there, Jony?

- Chenthill.

 The catch is, Xapian is written in C++.  So we'd likely have to hand
 write our own GObject bindings for it in C.  That's what makes it a long
 shot.  But we could look to notmuch or even WebKit/GTK+ for examples of
 binding C++ to C.  My C++ is rusty but I still have my Stroustrup text
 book.


 [1] http://notmuchmail.org/



Re: [Evolution-hackers] Sqlite cache for address-book storage in EDS

2011-03-10 Thread Matthew Barnes
On Thu, 2011-03-10 at 08:13 +0100, Milan Crha wrote:
 do not forget that the DB cache is compiled conditionally, because some
 distros do not ship libdb. Using SQLite for this was mentioned months
 ago, but no one found the time to actually do it, so go for it.

Also, as far as I know there are still licensing issues between Berkeley
DB's Sleepycat license and the [L]GPL, which is how libebackend was born.

https://bugzilla.gnome.org/show_bug.cgi?id=465374

I'm +1 on dumping Berkeley DB.


 Just keep two things in mind:
 - using binary storage for this kind of data is bad for cases where
   the binary file breaks, either due to an update/downgrade of the
   library providing access to it, or simply due to a crash. It's not so
   critical with Camel, as SQLite holds only summary data there, but if
   you want to store the real data in it as well, then it can be a
   problem. There are already people having trouble recovering their
   data from address-book storage, so if you are going to change it, it
   would be good to think of that from the beginning. It would be good to
   store raw vCards in some plain text file(s) which will be indexed by
   an SQLite summary. These plain text file(s) will then be easy to
   import into Evolution if something goes wrong, and by erasing the
   SQLite file the user will not lose any valuable data. (I'm thinking of
   a flat maildir approach here.)

Milan raises a good point about binary formats versus text.  It would be
good for the raw data to remain human readable.

Okay, this might be a long shot but I'm gonna throw it out there anyway:
would it make sense to look at using Xapian to index a directory of raw
vCards?

We've been talking about moving to notmuch [1] for mail indexing, and
notmuch is built on Xapian.  Trying out Xapian for address books might
be a good test drive for using it with mail.

The catch is, Xapian is written in C++.  So we'd likely have to hand
write our own GObject bindings for it in C.  That's what makes it a long
shot.  But we could look to notmuch or even WebKit/GTK+ for examples of
binding C++ to C.  My C++ is rusty but I still have my Stroustrup text
book.


[1] http://notmuchmail.org/



Re: [Evolution-hackers] Sqlite cache for address-book storage in EDS

2011-03-09 Thread Milan Crha
On Thu, 2011-03-10 at 12:09 +0530, Chenthill Palanisamy wrote:
 file, groupwise, exchange use EBookBackendDBCache.

Hi,
do not forget that the DB cache is compiled conditionally, because some
distros do not ship libdb. Using SQLite for this was mentioned months
ago, but no one found the time to actually do it, so go for it.

Just keep two things in mind:
- using binary storage for this kind of data is bad for cases where
  the binary file breaks, either due to an update/downgrade of the
  library providing access to it, or simply due to a crash. It's not so
  critical with Camel, as SQLite holds only summary data there, but if
  you want to store the real data in it as well, then it can be a
  problem. There are already people having trouble recovering their
  data from address-book storage, so if you are going to change it, it
  would be good to think of that from the beginning. It would be good to
  store raw vCards in some plain text file(s) which will be indexed by
  an SQLite summary. These plain text file(s) will then be easy to
  import into Evolution if something goes wrong, and by erasing the
  SQLite file the user will not lose any valuable data. (I'm thinking of
  a flat maildir approach here.)

- be able to store custom values in the summary: backends may need
  to keep their own notes in the summary, so make that possible. As
  these might not be as critical as the contact information itself,
  it should be fine to store them in the summary only.
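Both points (raw vCards as plain files with a rebuildable SQLite summary, plus backend-private values kept only in the summary) can be sketched in Python with the standard sqlite3 module. Everything here is hypothetical: the class name, the file layout (one <uid>.vcf per contact, which assumes UIDs are safe file names), and the naive line parser that only picks up FN and EMAIL. It is not how EBookBackendDBCache or any other EDS code works.

```python
import os
import sqlite3
from typing import List, Optional


class VCardStore:
    """Raw vCards are plain files in `root`; SQLite holds only a rebuildable summary."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)
        self.db = sqlite3.connect(os.path.join(root, "summary.db"))
        self.db.executescript("""
            CREATE TABLE IF NOT EXISTS summary (
                uid TEXT PRIMARY KEY, full_name TEXT, email TEXT);
            CREATE INDEX IF NOT EXISTS summary_name ON summary(full_name);
            -- backend-private notes, stored in the summary only
            CREATE TABLE IF NOT EXISTS backend_keys (name TEXT PRIMARY KEY, value TEXT);
        """)

    def _path(self, uid: str) -> str:
        return os.path.join(self.root, uid + ".vcf")  # assumes uid is a safe file name

    def _index(self, uid: str, text: str) -> None:
        fields = {}
        for line in text.splitlines():
            name, sep, value = line.partition(":")
            if sep:
                # keep the first value of each property; drop parameters like ";TYPE=work"
                fields.setdefault(name.split(";")[0].upper(), value)
        self.db.execute("INSERT OR REPLACE INTO summary VALUES (?, ?, ?)",
                        (uid, fields.get("FN"), fields.get("EMAIL")))

    def put(self, uid: str, vcard_text: str) -> None:
        tmp = self._path(uid) + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            fh.write(vcard_text)
        os.replace(tmp, self._path(uid))  # the plain file is the source of truth
        self._index(uid, vcard_text)
        self.db.commit()

    def rebuild_summary(self) -> None:
        """Recreate the summary from the raw files, e.g. after summary.db was erased."""
        self.db.execute("DELETE FROM summary")
        for entry in sorted(os.listdir(self.root)):
            if entry.endswith(".vcf"):
                with open(os.path.join(self.root, entry), encoding="utf-8") as fh:
                    self._index(entry[:-len(".vcf")], fh.read())
        self.db.commit()

    def find_by_name(self, full_name: str) -> List[str]:
        rows = self.db.execute(
            "SELECT uid FROM summary WHERE full_name = ? ORDER BY uid", (full_name,))
        return [uid for (uid,) in rows]

    def set_backend_key(self, name: str, value: str) -> None:
        self.db.execute("INSERT OR REPLACE INTO backend_keys VALUES (?, ?)", (name, value))
        self.db.commit()

    def get_backend_key(self, name: str) -> Optional[str]:
        row = self.db.execute(
            "SELECT value FROM backend_keys WHERE name = ?", (name,)).fetchone()
        return row[0] if row else None

    def close(self) -> None:
        self.db.close()
```

Deleting summary.db and calling rebuild_summary() restores every contact from the .vcf files; only the backend-private keys are lost, which matches the idea that those notes are less critical than the contacts.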
Bye,
Milan
