Re: [Evolution-hackers] EBookBackendSqliteDB comments

2011-05-06 Thread sean finney
Hi Milan,

On Fri, May 06, 2011 at 08:56:10AM +0200, Milan Crha wrote:
  As I already said to seanus on IRC, I will be evaluating the performance
  of having vcards as files vs. having them in the db, and then choose the
  one which would be best. So the code for both will be there and we can
  choose between them after testing. I was also thinking of providing
  it as an option for the backends to choose once I complete the testing.
  So what we discussed stays the same :)
 
 This is not only about performance, my main concerns are these:
 a) if something fails with db file, user's data are safe

 b) users can take their contacts anytime and import them on another
 machine, in case of hard disk crash, partial backup or anything like
 that

I think we should stop and consider two different motivations for this
API.  (1) Local addressbook (2) Local cache of remote addressbook.  For
case (1), I agree that having the items split out could be useful and
a good safeguard against any db corruption (though my experience thus far
with sqlite is fairly positive).

For case (2), I would say if there's a problem with the file just nuke
it and reload it from the remote store.  Since you can guarantee that
you can get a working copy of the info, you can then rely on the existing
UI (or sqlite, or the remote service, or whatever), for exporting the
contacts.  It is a *cache* after all :)

So for something like GAL (or any cached-from-remote addressbooks),
I think it makes a lot of sense to *not* split out the contacts, at
least as long as performance doesn't suffer by having more items in
the sqlitedb file.

 c) folders.db files tend to grow indefinitely. That's another point
 why I do not like one file per account.

I'd like to clarify a detail of the API from having looked over it wrt
evo-mapi: it's designed so that it can be used one file per account, by
creating a single db file and specifying the folder as an API parameter
in all calls.

But this means you could always create multiple db instances at different
file locations, one per folder, and just use a junk FOLDER (or similar)
name for the folder.  Having looked over the current evo-mapi code, I
think you'd want to do something like that.

Of course if you think that there should *never* be a case where it's used
as one db per account, then rethinking the API would make sense, but otherwise
nothing is lost by keeping it; it gives you a way to do both.

 An example: my evo-mapi account has 4 addressbooks (one is GAL). I would
 really prefer to have them separated, not in one large file. Not talking

And that should be possible, see above.

 about possible (even unlikely) UID clashes between separate
 addressbooks. Will it also mean that each local addressbook will be
 stored in one large db? Please do not do that.

The underlying db should deal with stuff like UID clashes, agreed.  I
think the current API does so, though I'm not convinced it's the best
way.  Currently, you have:

const gchar *stmt = "CREATE TABLE IF NOT EXISTS folders"
                    " ( folder_id TEXT PRIMARY KEY,"
                    "   folder_name TEXT,"
                    "   sync_data TEXT,"
                    "   bdata1 TEXT, bdata2 TEXT,"
                    "   bdata3 TEXT)";

stmt = sqlite3_mprintf ("CREATE TABLE IF NOT EXISTS %Q"
                        " ( uid TEXT PRIMARY KEY,"
                        "   nickname TEXT, full_name TEXT,"
                        "   given_name TEXT, family_name TEXT,"
                        "   email_1 TEXT, email_2 TEXT,"
                        "   email_3 TEXT, email_4 TEXT,"
                        "   vcard TEXT)", folderid);

which AIUI means a table named after every folder.  Therefore the UIDs
are already internally partitioned and will not conflict.  WRT normalizing
the database, I would suggest something more like:

const gchar *stmt = "CREATE TABLE IF NOT EXISTS folders"
                    " ( folder_id TEXT PRIMARY KEY,"
                    "   folder_name TEXT,"
                    "   sync_data TEXT,"
                    "   bdata1 TEXT, bdata2 TEXT,"
                    "   bdata3 TEXT)";

stmt = sqlite3_mprintf ("CREATE TABLE IF NOT EXISTS contacts"
                        " ( folder_id INT,"
                        "   uid TEXT,"
                        "   nickname TEXT, full_name TEXT,"
                        "   given_name TEXT, family_name TEXT,"
                        "   email_1 TEXT, email_2 TEXT,"
                        "   email_3 TEXT, email_4 TEXT,"

Re: [Evolution-hackers] EBookBackendSqliteDB comments

2011-05-05 Thread sean finney
Hi!

On Thu, May 05, 2011 at 11:20:45AM +0530, Chenthill wrote:
   * No backend _get_contact/_get_contacts equivalent.  Should be
 easily implemented.
 _get_vcard_string == _get_contact, i have not added an API return
 EContact to let the callers decide whether they want to parse the string
 to EContact.

Ah, yes, I think that would work fine.

 i have not observed any use cases for get_contacts needed by the
 backends. _book_backend_sqlitedb_search would serve the
 _get_contact_list API in the backend and also for querying using a
 search query to fetch the contact list.

Right, so I think that whole bullet point could be discarded.

   * _add_contact/_remove_contact should be renamed to 
 _add_contacts/_remove_contacts to be consistent with other backend
 methods that take lists.
 Makes sense as it already acts on multiple contacts.
 
   * but also having a _add_contact/_remove_contact that takes just a uid
 (similar to other backends) would be useful
 remove_contacts already takes only uid. I do not know how far
 _add_contact with just the uid would be helpful. Which backend would
 need it ?

Okay, I think I worded this one poorly.  What I meant was having the
singular form of _add_contacts/_remove_contacts (that doesn't use
a GSList but instead a single contact).  So that the calling application
doesn't need to make a 1-item list every time some async callback
acts on a single contact.

   * if folder metadata is going to be free-form, it could be better to have
 a key-value table ( folder_id_id int, key_name text, value text ) rather
 than arbitrarily numbered text/binary fields.
 I was thinking of allowing the backends to store key value pairs using a
 bdata column which could be populated with xml key-value data. Would
 it be a good idea?

My own preference would be for something leaner and not requiring XML,
since it would be embedding one structured/serialized data (xml) within
another (sqlite column), which I suspect would result in code more
complicated than it needed to be (getting/setting and
serializing/unserializing vs just getting/setting, esp with multiple
threads, is what jumps to mind).

But I don't have a particularly strong feeling on this, and it's probably
never going to be close enough to the critical path to matter.
It's just more of a gut feeling about how the metadata would be used,
and how it might be simpler/safer/cleaner/faster on the implementation
side if key/value storage were used to reflect the key/value API, unless
there's a pressing reason to have XML.

But I will defer to what you and the other evo folks think, since ultimately
the caller shouldn't be too concerned with the implementation details,
as long as the API provides the key/value functionality.

  @chen: I don't know how active you plan to be on this, but if you're looking
  to offload any work, I can pick up anything that results from the above if
  you like.  Just let me know!
 The work is almost over, but will let you know once i finish the testing
 and you can directly make changes if you require anything more there :)

Okay, sounds like a plan!



sean

-- 
___
evolution-hackers mailing list
evolution-hackers@gnome.org
To change your list options or unsubscribe, visit ...
http://mail.gnome.org/mailman/listinfo/evolution-hackers


Re: [Evolution-hackers] EBookBackendSqliteDB comments

2011-05-05 Thread sean finney
On Thu, May 05, 2011 at 12:23:01PM +0530, Chenthill wrote:
  Be sure that parsing bdata is a pain, and always will be,
  especially when you are already in a database world, where tables
  and relations between them are pretty common and natural.
 This is the reason I was wondering whether it would be a good idea to have
 an abstract API to store extended (apart from sync_data, populated
 columns etc.) key-value pairs if the backend needs. This can form the
 xml and store it as bdata. Now the bdata would not be exposed to the
 callers. Is there any other better way to do this ?

Forgive the rusty SQL, but assuming you have a single db with
multiple folders in it, something like:

create table folder_kvdata (
folder_id_id int references folders(folder_id),
keyname text,
keyval text
);

?  With this it would be pretty trivial to fetch single values
as well as enumerate/update/delete all keys/values for a folder.
If the caller needed something more complicated than a single value, 
an xml object or whatever else could be embedded on an as-needed basis.

  If I recall correctly then populated and last_modified were also
  stored as keys in the background, but backend could drop them
  accidentally, when accessing through keys directly. It sometimes can
  be considered a benefit, but it usually isn't. If I have specialized API
  to access these keys, then I should use it exclusively. I think.
 For the commonly used keys such as the above we would have specialized
 API's and they would be having separate columns on a per-folder basis.

Yeah, I think it would be a good idea to clearly break them out from
the general k/v pairs, to avoid conflicts and special-casing any code.

  I recall us chatting about this on IRC or somewhere one day and one
  point was that the contacts will not be stored in a binary form, but
  rather as separate files. What Sean wrote earlier sounds like you
  changed your mind in this point. I do not think it's a good idea, see
  how often the sqlite folders.db file in camel is broken, and users are
  advised to delete it. Will they lose all their contacts in such a
  situation?
 As I already said to seanus on IRC, I will be evaluating the performance
 of having vcards as files vs. having them in the db, and then choose the
 one which would be best. So the code for both will be there and we can
 choose between them after testing. I was also thinking of providing
 it as an option for the backends to choose once I complete the testing.
 So what we discussed stays the same :)

W.r.t. a performance standpoint, I will be testing against a Global
Address List of somewhere around 60k entries, so that should give a
pretty good idea :)

I think Milan also had concerns with regard to stability/fragility,
with corrupting databases, etc.  But I don't think that the split-out
option is immune to these types of problems either (and there may be
even further problems, since we would be home-rolling that solution as
opposed to relying on a well-tested API/DB).


sean


[Evolution-hackers] EBookBackendSqliteDB comments

2011-05-04 Thread sean finney
Hi Everyone,

I spoke with chen on IRC this morning and got a hint about a preliminary
implementation of EBookBackendSqliteDB sitting in -ews.  Since there
would be some benefit in something like this making its way to
a common place that could be used by -mapi as well, I thought I'd do
a quick feasibility review to see what problems there might be.

Questions/comments/suggestions follow.  Please let me know what you
think!

 * No backend _get_contact/_get_contacts equivalent.  Should be
   easily implemented.
 * _add_contact/_remove_contact should be renamed to 
   _add_contacts/_remove_contacts to be consistent with other backend
   methods that take lists.
 * but also having a _add_contact/_remove_contact that takes just a uid
   (similar to other backends) would be useful
 * -mapi seems to use one cache per-profile-per-folder, but the sqlitedb
   backend takes these as calling parameters.  Not really a problem and
   I think there may be reasons to have one cache db anyway, so this is
   just more of an observation.
 * _get/_set/_delete interfaces are needed for cache metadata (last modified,
   etc).
 * if folder metadata is going to be free-form, it could be better to have
   a key-value table ( folder_id_id int, key_name text, value text ) rather
   than arbitrarily numbered text/binary fields.
 * not sure of this one: given there may be multithreaded access to the db,
   do we need to provide any external big locks on reads/writes?  maybe
   the built in sqlite stuff is sufficient.
 * not sure of this one: beyond the COMMIT statements, should there be
   something to periodically sync the db besides the backend finalize method?
   Unsure whether commit is sufficient to get a consistent on-disk state in
   case of a crash, etc.
 * do we need a set_populated/is_populated equivalent?  or maybe that could
   be solved in the cases it's needed with metadata.
 * do we need a set_time/get_time equivalent?  or maybe that could
   be solved in the cases it's needed with metadata.

@chen: I don't know how active you plan to be on this, but if you're looking
to offload any work, I can pick up anything that results from the above if
you like.  Just let me know!


Sean



Re: [Evolution-hackers] Fedora builds with 2.32.2+ patches

2011-04-08 Thread sean finney
Hi David,

On Thu, Apr 07, 2011 at 01:07:38PM +0100, David Woodhouse wrote:
 Personally, no. I'd rather ignore MAPI completely and get on with the
 implementation of evolution-ews.

Understandable, though as we've discussed on IRC we don't really have
the option of using that here, at least for another couple quarters.

 
   I have quite the patch queue (maybe 10-20 patches) that I'm managing
  locally for various backported fixes there.
 
 Sounds like you would be in a good position to do it though.

Because I'm not a gnome dev, I (a) don't have push access, and (b)
am a bit hesitant to go against Milan's wishes, since he's the dev
who is primarily keeping things up for -mapi and has made his stance
pretty clear.  I only brought it up because it seemed like there might
be a change in that stance, and if so I'd be happy to share my currently
unshared fixes in .32.

Then again, now that 3.0 is released I may try again to get something
rolled together based on that since there are already a number of api
breaks making backports difficult for .32, and it seems there are lots
more in the pipe for 3.1.


sean


Re: [Evolution-hackers] Fedora builds with 2.32.2+ patches

2011-04-08 Thread sean finney
Hi David,

On Fri, Apr 08, 2011 at 09:08:28AM +0100, David Woodhouse wrote:
 You're more than welcome to use git.infradead.org if you want. But even
 if Milan sees the 2.32 branch as being dead and doesn't want to spend
 any of his own time on it (and nobody can blame him for that), I would
 hope that he wouldn't try to obstruct *you* if you feel you need to do
 so.

Well it would be nice to get them *somewhere*, anyway, since it does feel
silly that there are a number of distros and organizations in the same
situation who are forced to basically do the same work and have no way
to cooperate.  Maybe we can fix something out-of-band from this discussion
then, and leave it as an internal decision for the evo team whether or
not to include them.

  Then again, now that 3.0 is released I may try again to get something
  rolled together based on that since there are already a number of api
  breaks making backports difficult for .32, and it seems there are lots
  more in the pipe for 3.1.
 
 Certainly, my point in maintaining fixes for 2.32 was *not* to
 discourage people from upgrading. So if 3.0 is a viable option for you
 then please do go ahead.

I'd certainly like to upgrade if possible to stay relatively current,
but also have implementation constraints about installation size and
compatibility with being run from older gnome desktops.  And last time
I tried (about a month pre-release) it didn't pan out so well.


Sean



Re: [Evolution-hackers] Fedora builds with 2.32.2+ patches

2011-04-07 Thread sean finney
Hi David,

On Thu, Apr 07, 2011 at 11:33:22AM +0100, David Woodhouse wrote:
 Once this passes muster, I'll push these patches (probably *without* the
 NTLM bits, if you're looking closely at what I included) to the
 gnome-2-32 branches and perhaps start doing a 'final call' for 2.32.3
 candidate bugs/patches.

Are there any plans to do the same for -mapi?  I have quite
the patch queue (maybe 10-20 patches) that I'm managing locally for
various backported fixes there.


sean


[Evolution-hackers] Very poor performance and hangs with ema addressbook factory

2011-03-16 Thread sean finney
Hi all,

First off, a quick intro for those who haven't already met me in IRC:

I'm consulting with the IT department of a large corporation who are
evaluating evolution-mapi as a basis for a native linux mail client
for use in large scale deployment.  In the past month or two I've been
hanging out on IRC (pestering Milan, mostly :D ), doing a fair amount of
QA and a little bit of hacking when necessary.  At this point I'm pretty
optimistic that we'll be able to move forward with this, which is awesome,
though there are a couple of issues remaining with show-stopper
status here.


So on that note, I'd like to talk about the awful, awful performance of
the addressbook factory in EMA.  I've filed a bug[1], and was pointed
at the ongoing discussion about backend caches on this list, and
thought I'd join in and find out what needs to be done to get this
fixed in a reasonable (and hopefully backportable-to-.32) manner.  Since
this is more architectural in nature, I figure here is a better place
to discuss it rather than there.

For those too lazy to follow the links, this is the issue from what I can
tell so far:

 * evolution+mapi fetches the entire Global Address List (GAL) from Active
   Directory[2], and caches[3] it as an XML file
 * while fetching, the entire application will often block/hang
 * while fetching, the addressbook-factory will monopolize one or more
   cores at 100%.
 * on any access to contacts information from this list (contacts pane,
   autocomplete, etc), this data is loaded by the backend factory in its
   entirety.
 * seemingly (haven't read the code well enough), this data is searched
   linearly for matches.  or... it's really slow anyway.
 * if the frontend visits the contacts pane (where ALL contacts will be
   shown), or the To/CC type buttons, all information from the GAL is
   loaded *also* into the frontend, doubling memory usage.

Some ballpark idea of the sizes we're talking about here:

~61k contact entries in the GAL
~40MB xml file containing cached contacts
~500 MB RAM usage in the addressbook factory backend
~500 MB RAM usage in the evolution frontend
~5-10 minutes of evolution being entirely hung/unresponsive (as in greyed
out by the window manager, even).

I don't have any profiling output but have a very strong suspicion from
just poking via gdb that the majority of the time is spent doing various
things with the in-memory loaded xml file.  I think this would be a great
place to either (a) replace the xml file with an sqlite database, or (b)
split out the xml file into individual xml files and/or vcards and have
a rebuildable sqlite index.

So my question to the list is: is anyone already working on something similar
to this, somewhere else?  otherwise, any opinions for how it ought to be done?


sean


[1] https://bugzilla.gnome.org/show_bug.cgi?id=644817
[2] the entire fetching is needed for stuff like autocompletion to work, i've
been told.
[3] the term cache is not really appropriate here, as it's not a cache,
it's a replica, but i digress...