Re: [Evolution-hackers] libdb performance issue? (was: Re: libebook: errors when using asynchronous contact addition/removal functions)
On Fri, May 13, 2011 at 03:44:08PM +0200, Patrick Ohly wrote:
> I tried the LD_PRELOAD=libeatmydata.so workaround suggested in the mail
> above and it does avoid the problem.

If eatmydata removes the bottleneck, then it's likely that either (a) each operation corresponds to an fsync/fdatasync/sync call (i.e. each update triggers some kind of commit/flush), or (b) the database file is opened O_SYNC, with a similar result. I don't know eds or libdb well enough to say whether eds is doing something out of the ordinary with the API, whether this is "working as designed", or whether there's a way to remove the synchronous behavior.

> Is there anyone around who understand libdb well enough to shed some
> light on this? What is a proper fix?

Again from the armchair here (I have not even looked at the code, feel free to LART me as a result): if there's no way to avoid the synchronous writes with the libdb API, I guess you could consider batching the async additions together and submitting them as a group? Alternatively, maybe this is another argument for moving forward with the sqlite backend support? :)

sean

___
evolution-hackers mailing list
evolution-hackers@gnome.org
To change your list options or unsubscribe, visit ...
http://mail.gnome.org/mailman/listinfo/evolution-hackers
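A minimal sketch of the batching idea at the plain POSIX level, assuming hypothesis (a) above (one forced flush per operation) is what dominates. `append_records` is a hypothetical helper written for illustration, not part of eds or libdb; how to get the same grouping through libdb's own transaction API would still need to be checked against its documentation. The batch size is the only knob: it decides how many records share one `fsync()`.

```c
#define _XOPEN_SOURCE 700  /* expose fsync() and mkstemp() under strict -std= modes */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of eds or libdb): append the n strings in
 * `records` to `fd`, forcing them to disk with one fsync() per `batch`
 * records rather than one per record.  Returns the number of fsync() calls
 * made, or -1 on a write/fsync error. */
static int append_records(int fd, const char *const *records, size_t n,
                          size_t batch)
{
    int syncs = 0;
    size_t pending = 0;   /* records written since the last fsync() */

    if (batch == 0)
        batch = 1;        /* treat 0 as "sync after every record" */

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(records[i]);

        /* a short write is treated as an error to keep the sketch small */
        if (write(fd, records[i], len) != (ssize_t) len)
            return -1;

        if (++pending == batch) {
            if (fsync(fd) != 0)
                return -1;
            syncs++;
            pending = 0;
        }
    }

    /* flush the last, partially filled batch */
    if (pending > 0) {
        if (fsync(fd) != 0)
            return -1;
        syncs++;
    }
    return syncs;
}
```

With five records, `batch = 1` costs five `fsync()` calls and `batch = 5` costs one. The trade-off is that records written since the last `fsync()` can be lost in a crash, which matters less for a cache that can be re-fetched from the server.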
Re: [Evolution-hackers] EBookBackendSqliteDB comments
Hi Milan,

On Fri, May 06, 2011 at 08:56:10AM +0200, Milan Crha wrote:
> > As I already said seanus on irc, I will be evaluating the performance
> > between having vcards as files Vs having it in db and then choose the
> > one which would be best. So the code for both will be there and we can
> > choose between them over after testing. I was also thinking of providing
> > it as an option for the backends to choose once i complete the testing..
> > So what we discussed stays the same :)
>
> This is not only about performance, my main concerns are these:
> a) if something fails with db file, user's data are safe
> b) users can take their contacts anytime and import them on another
> machine, in case of hard disk crash, partial backup or anything like
> that

I think we should stop and consider two different motivations for this API:

(1) Local addressbook
(2) Local cache of a remote addressbook

For case (1), I agree that having the items split out could be useful and a good safeguard against any db corruption (though my experience thus far with sqlite is fairly positive).

For case (2), I would say that if there's a problem with the file, just nuke it and reload it from the remote store. Since you can always get a "working copy" of the info back, you can rely on the existing UI (or sqlite, or the remote service, or whatever) for exporting the contacts. It is a *cache* after all :) So for something like GAL (or any addressbook cached from a remote source), I think it makes a lot of sense *not* to split out the contacts, at least as long as performance doesn't suffer from having more items in the sqlitedb file.

> c) folders.db files tend to grow "indefinitely". That's another point
> why I do not like "one file per account".

Having looked over the API with evo-mapi in mind, I'd like to clarify one detail: it's designed so that it *can* be used "one file per account", by creating a single db file and passing the "folder" as a parameter in every call.
But this also means you could create multiple db instances at different file locations, one per folder, and just use a junk "FOLDER" (or similar) name as the folder parameter. Having looked over the current evo-mapi code, I think you'd want to do something like that. Of course, if you think there should *never* be a case where one db is used per account, then rethinking the API would make sense; otherwise nothing is lost by keeping it, since it gives you a way to do both.

> An example: my evo-mapi account has 4 addressbooks (one is GAL). I would
> really prefer to have them separated, not in one large file. Not talking

And that should be possible, see above.

> about possible (even unlikely) UID clashes between separate
> addressbooks. Will it also mean that each local addressbook will be
> stored in one large db? Please do not do that.

The underlying db should deal with stuff like UID clashes, agreed. I think the current API already does, though I'm not convinced it's the best way. Currently, you have:

    const gchar *stmt = "CREATE TABLE IF NOT EXISTS folders \
                         ( folder_id TEXT PRIMARY KEY, \
                           folder_name TEXT, \
                           sync_data TEXT, \
                           bdata1 TEXT, bdata2 TEXT, \
                           bdata3 TEXT)";

    stmt = sqlite3_mprintf ("CREATE TABLE IF NOT EXISTS %Q \
                             ( uid TEXT PRIMARY KEY, \
                               nickname TEXT, full_name TEXT, \
                               given_name TEXT, family_name TEXT, \
                               email_1 TEXT, email_2 TEXT, \
                               email_3 TEXT, email_4 TEXT, \
                               vcard TEXT)", folderid);

which AIUI means one table named after each folder. Therefore the UIDs are already partitioned internally and will not conflict.

WRT normalizing the database, I would suggest something more like:

    const gchar *stmt = "CREATE TABLE IF NOT EXISTS folders \
                         ( folder_id TEXT PRIMARY KEY, \
                           folder_name TEXT, \
                           sync_data TEXT, \
                           bdata1 TEXT, bdata2 TEXT, \
                           bdata3 TEXT)";

    stmt = sqlite3_mprintf ("CREATE TABLE IF NOT EXISTS contacts \
                             ( folder_id INT, uid TEXT, \
                               nickname TEXT, full_name TEXT, \
                               given_name TEXT, family_name TEXT, \
                               email_1 TEXT, email_2 TEXT, \
Re: [Evolution-hackers] EBookBackendSqliteDB comments
On Thu, May 05, 2011 at 12:23:01PM +0530, Chenthill wrote:
> > Be sure that parsing bdata is a pain, and always will,
> > especially when you already are in a database world, where are tables
> > and relations between them pretty common and nature.
> This is the reason I was thinking whether it would be good idea to have
> a abstract API to store extended (apart from sync_data, populated
> columns etc.) key-value pairs if the backend needs. This can form the
> xml and store it as bdata. Now the bdata would not be exposed to the
> callers. Is there any other better way to do this ?

Forgive the rusty SQL, but assuming you have a single db with multiple folders in it, something like the following?

    create table folder_kvdata (
        folder_id text references folders(folder_id),
        keyname text,
        keyval text
    );

With this it would be pretty trivial to fetch single values as well as enumerate/update/delete all keys/values for a folder. If the caller needed something more complicated than a single value, an XML object or whatever else could be embedded on an as-needed basis.

> > If I recall correctly then "populated" and "last_modified" were also
> > stored as keys in the background, but backend could drop them
> > accidentally, when accessing through keys "directly". It sometimes can
> > be considered a benefit, but it usually isn't. If I have specialized API
> > to access these keys, then I should use it exclusively. I think.
> For the commonly used keys such as the above we would have specialized
> API's and they would be having separate columns on a per-folder basis.

Yeah, I think it would be a good idea to clearly break them out from the "general" k/v pairs, to avoid conflicts and any special-casing in the code.

> > I recall us chatting about this on IRC or somewhere one day and one
> > point was that the contacts will not be stored in a binary form, but
> > rather as separate files. What Sean wrote earlier sounds like you
> > changed your mind in this point.
> > I do not think it's a good idea, see
> > how often the sqlite folders.db file in camel is broken, and users are
> > advised to delete it. Will they lose all their contacts in such
> > situation?
> As I already said seanus on irc, I will be evaluating the performance
> between having vcards as files Vs having it in db and then choose the
> one which would be best. So the code for both will be there and we can
> choose between them over after testing. I was also thinking of providing
> it as an option for the backends to choose once i complete the testing..
> So what we discussed stays the same :)

From a performance standpoint, I will be testing against a Global Address List of around 60k entries, so that should give a pretty good idea :)

I think Milan also had concerns about stability/fragility, i.e. corrupted databases, etc. But I don't think the "split out" option is immune to these types of problems either (and it may have further problems of its own, since we would be home-rolling that solution instead of relying on a well-tested API/DB).

sean
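A minimal in-memory sketch of the semantics the `folder_kvdata` table above would give: one value per (folder, key) pair, single-value get/set, per-key delete, and delete-all or enumerate for a folder. Every name here (`kv_set`, `kv_get`, and so on) is hypothetical, and the fixed-size array only stands in for the table. In SQLite itself, adding `PRIMARY KEY (folder_id, keyname)` and writing with `INSERT OR REPLACE` would give the same update-in-place behaviour that `kv_set` models.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of the proposed folder_kvdata table: one row
 * per (folder_id, keyname) pair.  The fixed capacity only keeps the sketch
 * short; the real storage would be the SQLite table itself. */
#define KV_MAX_ROWS 64
#define KV_MAX_LEN  64

struct kv_row {
    int  used;
    char folder_id[KV_MAX_LEN];
    char keyname[KV_MAX_LEN];
    char keyval[KV_MAX_LEN];
};

struct kv_store {
    struct kv_row rows[KV_MAX_ROWS];
};

static void kv_init(struct kv_store *s)
{
    memset(s, 0, sizeof *s);
}

/* Does row r belong to folder and (unless key is NULL) have this key? */
static int kv_match(const struct kv_row *r, const char *folder, const char *key)
{
    return r->used && strcmp(r->folder_id, folder) == 0 &&
           (key == NULL || strcmp(r->keyname, key) == 0);
}

static struct kv_row *kv_find(struct kv_store *s, const char *folder, const char *key)
{
    for (size_t i = 0; i < KV_MAX_ROWS; i++)
        if (kv_match(&s->rows[i], folder, key))
            return &s->rows[i];
    return NULL;
}

/* Insert or update: (folder, key) is unique, like a composite primary key.
 * Returns 0 on success, -1 if the store is full or a string is too long. */
static int kv_set(struct kv_store *s, const char *folder, const char *key, const char *val)
{
    if (strlen(folder) >= KV_MAX_LEN || strlen(key) >= KV_MAX_LEN ||
        strlen(val) >= KV_MAX_LEN)
        return -1;

    struct kv_row *r = kv_find(s, folder, key);
    for (size_t i = 0; r == NULL && i < KV_MAX_ROWS; i++)
        if (!s->rows[i].used)
            r = &s->rows[i];              /* first free slot */
    if (r == NULL)
        return -1;

    strcpy(r->folder_id, folder);
    strcpy(r->keyname, key);
    strcpy(r->keyval, val);
    r->used = 1;
    return 0;
}

/* Fetch a single value, or NULL if the key is not set for this folder. */
static const char *kv_get(struct kv_store *s, const char *folder, const char *key)
{
    const struct kv_row *r = kv_find(s, folder, key);
    return r ? r->keyval : NULL;
}

/* Delete one key, or every key of the folder when key is NULL.
 * Returns the number of rows removed. */
static size_t kv_delete(struct kv_store *s, const char *folder, const char *key)
{
    size_t removed = 0;
    for (size_t i = 0; i < KV_MAX_ROWS; i++)
        if (kv_match(&s->rows[i], folder, key)) {
            s->rows[i].used = 0;
            removed++;
        }
    return removed;
}

/* Count the keys stored for a folder (stand-in for enumeration). */
static size_t kv_count(const struct kv_store *s, const char *folder)
{
    size_t n = 0;
    for (size_t i = 0; i < KV_MAX_ROWS; i++)
        if (kv_match(&s->rows[i], folder, NULL))
            n++;
    return n;
}
```

Because each folder's keys are just rows filtered by `folder_id`, clearing a folder is a single `kv_delete(store, folder, NULL)` call, with no embedded XML to parse or rewrite.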
Re: [Evolution-hackers] EBookBackendSqliteDB comments
Hi!

On Thu, May 05, 2011 at 11:20:45AM +0530, Chenthill wrote:
> > * No backend _get_contact/_get_contacts equivalent. Should be
> >   easily implemented.
> _get_vcard_string ==> _get_contact, i have not added an API return
> EContact to let the callers decide whether they want to parse the string
> to EContact.

Ah, yes, I think that would work fine.

> i have not observed any use cases for get_contacts needed by the
> backends. _book_backend_sqlitedb_search would serve the
> _get_contact_list API in the backend and also for querying using a
> search query to fetch the contact list.

Right, so I think that whole bullet point could be discarded.

> > * _add_contact/_remove_contact should be renamed to
> >   _add_contacts/_remove_contacts to be consistent with other backend
> >   methods that take lists.
> Makes sense as it already acts on multiple contacts.
>
> > * but also having a _add_contact/_remove_contact that takes just a uid
> >   (similar to other backends) would be useful
> remove_contacts already takes only uid. I do not know how far
> _add_contact with just the uid would be helpful. Which backend would
> need it ?

Okay, I think I worded this one poorly. What I meant was having a "singular" form of _add_contacts/_remove_contacts that takes a single contact instead of a GSList, so that the calling application doesn't need to build a one-item list every time some async callback acts on a single contact.

> > * if folder metadata is going to be free-form, it could be better to have
> >   a key->value table ( folder_id_id int, key_name text, value text ) rather
> >   than arbitrarily numbered text/binary fields.
> I was thinking of allowing the backends to store key value pairs using a
> bdata column which could be populated with xml key-value data. Would be
> it be good idea ?
My own preference would be for something leaner that doesn't require XML, since it would mean embedding one structured/serialized format (XML) within another (an sqlite column), which I suspect would make the code more complicated than it needs to be (getting/setting plus serializing/unserializing vs. just getting/setting, especially with multiple threads, is what jumps to mind). But I don't have a particularly strong feeling on this, and it's probably never going to be on the critical path enough to matter anyway. It's more of a gut feeling about how the metadata would be used: unless there's a pressing reason to have XML, it might be simpler/safer/cleaner/faster on the implementation side if key/value storage were used to back the key/value API.

But I will defer to what you and the other evo folks think, since ultimately the caller shouldn't be too concerned with the implementation details, as long as the API provides the key/value functionality.

> > @chen: I don't know how active you plan to be on this, but if you're looking
> > to offload any work, I can pick up anything that results from the above if
> > you like. Just let me know!
> The work is almost over, but will let you know once i finish the testing
> and you can directly make changes if you require anything more there :)

Okay, sounds like a plan!

sean
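A sketch of the "singular" convenience form described in the reply above, assuming a list-taking function like `_remove_contacts`. To keep it compilable without GLib, `struct slist` is a hypothetical stand-in with the same `data`/`next` layout as `GSList`, and `cache_remove_contacts`/`cache_remove_contact` are illustrative names, not the actual EBookBackendSqliteDB API. The wrapper builds a one-node list on the stack, so callers neither allocate nor free anything.

```c
#include <stddef.h>

/* Hypothetical stand-in for GLib's GSList (same data/next layout) so the
 * sketch compiles without GLib; real code would use GSList directly. */
struct slist {
    void *data;
    struct slist *next;
};

/* Hypothetical cache handle; here it only counts removals. */
struct cache {
    size_t removed;
};

/* Plural form: acts on every UID in the list.  Returns 0 on success,
 * -1 if the list contains a NULL UID. */
static int cache_remove_contacts(struct cache *c, const struct slist *uids)
{
    for (const struct slist *l = uids; l != NULL; l = l->next) {
        if (l->data == NULL)
            return -1;
        /* a real backend would delete the row for this UID here */
        c->removed++;
    }
    return 0;
}

/* Singular convenience form: wraps one UID in a stack-allocated,
 * one-node list and forwards to the plural form. */
static int cache_remove_contact(struct cache *c, const char *uid)
{
    struct slist node = { (void *) uid, NULL };
    return cache_remove_contacts(c, &node);
}
```

The same pattern would apply to a singular `_add_contact` forwarding to `_add_contacts`.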
[Evolution-hackers] EBookBackendSqliteDB comments
Hi Everyone,

I spoke with chen on IRC this morning and was pointed at a preliminary implementation of EBookBackendSqliteDB sitting in -ews. Since there would be some benefit in having something like this make its way to a common place where -mapi could use it as well, I thought I'd do a quick feasibility review to see what problems there might be. Questions/comments/suggestions follow. Please let me know what you think!

* No backend _get_contact/_get_contacts equivalent. Should be easily implemented.
* _add_contact/_remove_contact should be renamed to _add_contacts/_remove_contacts to be consistent with other backend methods that take lists.
* but also having a _add_contact/_remove_contact that takes just a uid (similar to other backends) would be useful
* -mapi seems to use one cache per-profile-per-folder, but the sqlitedb backend takes these as call parameters. Not really a problem, and I think there may be reasons to have one cache db anyway, so this is more of an observation.
* _get/_set/_delete interfaces are needed for cache metadata (last modified, etc.).
* if folder metadata is going to be free-form, it could be better to have a key->value table ( folder_id_id int, key_name text, value text ) rather than arbitrarily numbered text/binary fields.
* not sure of this one: given there may be multithreaded access to the db, do we need to provide any external "big locks" on reads/writes? Maybe the built-in sqlite locking is sufficient.
* not sure of this one: beyond the COMMIT statements, should there be something to periodically sync the db, other than the backend "finalize" method? I'm unsure whether COMMIT is sufficient to get a consistent on-disk state in case of a crash, etc.
* do we need a set_populated/is_populated equivalent? Or maybe that could be solved with metadata in the cases where it's needed.
* do we need a set_time/get_time equivalent? Or maybe that could be solved with metadata in the cases where it's needed.
@chen: I don't know how active you plan to be on this, but if you're looking to offload any work, I can pick up anything that results from the above if you like. Just let me know!

Sean
Re: [Evolution-hackers] Fedora builds with 2.32.2+ patches
Hi David,

On Fri, Apr 08, 2011 at 09:08:28AM +0100, David Woodhouse wrote:
> You're more than welcome to use git.infradead.org if you want. But even
> if Milan sees the 2.32 branch as being dead and doesn't want to spend
> any of his own time on it (and nobody can blame him for that), I would
> hope that he wouldn't try to obstruct *you* if you feel you need to do
> so.

Well, it would be nice to get them *somewhere*, anyway, since it does feel silly that a number of distros and organizations in the same situation are forced to do basically the same work with no way to cooperate. Maybe we can arrange something out-of-band from this discussion, then, and leave it as an internal decision for the evo team whether or not to include them.

> > Then again, now that 3.0 is released I may try again to get something
> > rolled together based on that since there are already a number of api
> > breaks making backports difficult for .32, and it seems there are lots
> > more in the pipe for 3.1.
>
> Certainly, my point in maintaining fixes for 2.32 was *not* to
> discourage people from upgrading. So if 3.0 is a viable option for you
> then please do go ahead.

I'd certainly like to upgrade if possible to stay relatively current, but I also have implementation constraints around installation size and compatibility with being run from older GNOME desktops. And the last time I tried (about a month before the release) it didn't pan out so well.

Sean
Re: [Evolution-hackers] Fedora builds with 2.32.2+ patches
Hi David,

On Thu, Apr 07, 2011 at 01:07:38PM +0100, David Woodhouse wrote:
> Personally, no. I'd rather ignore MAPI completely and get on with the
> implementation of evolution-ews.

Understandable, though as we've discussed on IRC, we don't really have the option of using that here, at least for another couple of quarters.

> > I have quite the patch queue (maybe 10-20 patches) that I'm managing
> > locally for various backported fixes there.
>
> Sounds like you would be in a good position to do it though.

Because I'm not a GNOME dev, I (a) don't have push access, and (b) am a bit hesitant to go against Milan's wishes, since he's the dev who is primarily keeping things up for -mapi and has made his stance pretty clear. I only brought it up because it seemed like that stance might be changing, and if so I'd be happy to share my currently unshared fixes for .32.

Then again, now that 3.0 is released I may try again to get something rolled together based on that, since there are already a number of API breaks making backports difficult for .32, and it seems there are lots more in the pipe for 3.1.

sean
Re: [Evolution-hackers] Fedora builds with 2.32.2+ patches
Hi David,

On Thu, Apr 07, 2011 at 11:33:22AM +0100, David Woodhouse wrote:
> Once this passes muster, I'll push these patches (probably *without* the
> NTLM bits, if you're looking closely at what I included) to the
> gnome-2-32 branches and perhaps start doing a 'final call' for 2.32.3
> candidate bugs/patches.

Are there any plans to do the same for -mapi? I have quite the patch queue (maybe 10-20 patches) that I'm managing locally for various backported fixes there.

sean
[Evolution-hackers] Very poor performance and hangs with ema addressbook factory
Hi all,

First off, a quick intro for those who haven't already met me on IRC: I'm consulting with the IT department of a large corporation that is evaluating evolution-mapi as the basis for a native Linux mail client for large-scale deployment. In the past month or two I've been hanging out on IRC (pestering Milan, mostly :D ), doing a fair amount of QA and a little bit of hacking when necessary. At this point I'm pretty optimistic that we'll be able to move forward with this, which is awesome, though a couple of issues with "show stopper" status remain here.

So on that note, I'd like to talk about the awful, awful performance of the addressbook factory in EMA. I've filed a bug[1], was pointed at the ongoing discussion about "backend caches" on this list, and thought I'd join in and find out what needs to be done to get this fixed in a reasonable (and hopefully backportable-to-.32) manner. Since this is more architectural in nature, I figure this list is a better place to discuss it than the bug.

For those too lazy to follow the links, this is the issue as far as I can tell so far:

* evolution+mapi fetches the entire Global Address List (GAL) from Active Directory[2], and "caches"[3] it as an XML file
* while fetching, the entire application will often block/hang
* while fetching, the addressbook factory will monopolize one or more cores at 100%
* on any access to contact information from this list (contacts pane, autocomplete, etc.), this data is loaded by the backend factory in its entirety
* seemingly (I haven't read the code well enough to be sure), this data is searched linearly for matches. Or... it's really slow, anyway.
* if the frontend visits the contacts pane (where ALL contacts will be shown), or the "To/CC" type buttons, all information from the GAL is *also* loaded into the frontend, doubling memory usage.
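To illustrate why an index helps the autocompletion case compared with a linear search, here is a minimal sketch that sorts the names once and then locates every entry starting with a typed prefix by binary search: O(log n) comparisons to find the first match, plus one step per match, instead of a pass over every entry. This is a swapped-in in-memory technique for illustration, not how evolution-mapi or SQLite implement search (SQLite indexes are B-trees); all function names are hypothetical, and the comparison is case-sensitive, whereas real autocompletion would normalize case first.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Sort once, e.g. whenever the local copy of the GAL is (re)built. */
static void index_build(const char **names, size_t n)
{
    qsort(names, n, sizeof *names, cmp_str);
}

/* Index of the first name that is >= prefix (a "lower bound"). */
static size_t lower_bound(const char **names, size_t n, const char *prefix)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(names[mid], prefix) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Names sharing a prefix form one contiguous run in sorted order, starting
 * at the lower bound.  Stores the run start in *first, returns its length. */
static size_t prefix_matches(const char **names, size_t n, const char *prefix,
                             size_t *first)
{
    size_t len = strlen(prefix);
    size_t start = lower_bound(names, n, prefix);
    size_t count = 0;

    while (start + count < n && strncmp(names[start + count], prefix, len) == 0)
        count++;
    *first = start;
    return count;
}
```

Usage: after `index_build(names, n)`, a call such as `prefix_matches(names, n, "smith", &first)` returns how many entries start with "smith", and `names[first]` is the first of them.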
Some ballpark figures for the sizes we're talking about here:

* ~61k contact entries in the GAL
* ~40 MB XML file containing the "cached" contacts
* ~500 MB RAM usage in the addressbook factory backend
* ~500 MB RAM usage in the evolution frontend
* ~5-10 minutes of evolution being entirely hung/unresponsive (as in greyed out by the window manager, even)

I don't have any profiling output, but from just poking around via gdb I have a very strong suspicion that the majority of the time is spent doing various things with the in-memory XML file. I think this would be a great place to either (a) replace the XML file with an sqlite database, or (b) split the XML file out into individual XML files and/or vcards and keep a rebuildable sqlite index.

So my question to the list is: is anyone already working on something similar to this somewhere else? Otherwise, any opinions on how it ought to be done?

sean

[1] https://bugzilla.gnome.org/show_bug.cgi?id=644817
[2] the entire fetch is needed for stuff like autocompletion to work, I've been told.
[3] the term "cache" is not really appropriate here, as it's not a cache, it's a "replica", but I digress...
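Dividing the ballpark figures above by the entry count (taking MB as 10^6 bytes; these are rough averages derived from the estimates, not measurements) suggests that each contact takes roughly 12 times more memory in the addressbook factory than its serialized form does in the XML file, which is consistent with the suspicion about the in-memory XML handling:

```latex
\frac{40 \times 10^{6}\,\text{B}}{61\,000} \approx 656\,\text{B per entry on disk},
\qquad
\frac{500 \times 10^{6}\,\text{B}}{61\,000} \approx 8.2\,\text{kB per entry in memory},
\qquad
\frac{500\,\text{MB}}{40\,\text{MB}} = 12.5
```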