2008/1/15, Miguel Arroz <[EMAIL PROTECTED]>:
> Hi!
>
> I'm thinking how to approach the following problem, and I would
> like to know opinions about this, because I may be overcomplicating
> this, as I often do.
>
> I need to manage contact lists. A contact is an object with an
> email, first name, last name, and some flags. The important thing is
> the email, that's what makes a contact unique.
>
> A contact list may have tens of thousands of contacts (this is not
> a theoretical limit, it's a requirement), and cannot have duplicate
> records (ie, two contacts with the same email).
>
> Well, my first approach is to create a restriction on the DB that
> will prevent the existence of two records with the same email on the
> same contact list.
>
> Then, let's suppose I have a contact list with 10k contacts, and
> I'm adding another 10k contacts. The basic approach is:
>
> 1) Divide the 10k in batches of 100, to make this manageable.
> 2) Try to insert the 100 contacts.
> 3) If an exception raises due to the UNIQUE constraint, remove the
> offending object and try again.
>
> This has an obvious problem, which is the fact that in the worst
> case, the 100 contacts may be repeated, making this very inefficient.
>
> So, what I thought was, if I have a failure:
>
> 1) Do a fetch request to get the contacts with the emails of the
> 100 contacts batch (ie, blablabla where email = email1 or email =
> email2 or email = email3 ...).

Sorry, but this is very ugly. You should use something like InSetQualifier (e.g. "WHERE a IN (1, 2, 3, ...)").

> 2) Remove duplicates in memory using a fast method, like putting
> the stuff in NSSets or whatever.
> 3) Try to save again. Of course, it may still fail (concurrency
> sucks) but the probability is much lower.
>
> This is all thought with the assumption that the UNIQUE-related
> exception is thrown when the first offending object is inserted, so I
> won't get all the information I need in one single exception, which
> I'm not 100% sure it's true yet.
>
> So... suggestions! Is this too crappy? :)

In which format do you have your contacts?

BTW, I think the best way is to change the approach and parse the contacts before the insert phase. One idea is to look for duplicates within the list you are going to insert into the DB. I mean, if you have a list with:

[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]

you should remove the duplicates before trying to insert, and you can do this in Java, outside of EOF. This should cause fewer exceptions during the insert.

You can also reorder the contacts before inserting them. For example, order them by host name and fetch by host name: if the host isn't in the DB yet, you can insert every one of them; otherwise you will have very few records to search through.

One more thing: which DB do you use?

> Yours
>
> Miguel Arroz
>
> Miguel Arroz
> http://www.terminalapp.net
> http://www.ipragma.com
>
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Webobjects-dev mailing list ([email protected])
> Help/Unsubscribe/Update your Subscription:
> http://lists.apple.com/mailman/options/webobjects-dev/ildenae%40gmail.com
>
> This email sent to [EMAIL PROTECTED]

--
Daniele Corti
AIM: S0CR4TE5
Messenger: [EMAIL PROTECTED]
--
Computers are like air conditioners -- they stop working properly if you open WINDOWS
--
What about the four lusers of the apocalypse?
I nominate: "advertising", "can't log in", "power switch" and "what backup?" --Alistair Young
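
P.S. To make the "clean the list in Java first" idea concrete, here is a rough sketch in plain Java with no EOF classes at all. Everything in it is hypothetical and only for illustration: the Contact class, the ContactImportSketch name, and the helper methods are made up, and it assumes emails can be compared after trimming and lower-casing, which may not match your rules. It removes duplicates inside the incoming list, splits the result into batches, builds an IN (...) condition with one placeholder per email, and drops the contacts whose emails a fetch reported as already stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class ContactImportSketch {

    /** Hypothetical stand-in for the contact entity; not an EOF class. */
    static final class Contact {
        final String email;
        final String firstName;
        final String lastName;

        Contact(String email, String firstName, String lastName) {
            this.email = email;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    /** Assumption: two emails are "the same" after trimming and lower-casing. */
    static String key(String email) {
        return email.trim().toLowerCase(Locale.ROOT);
    }

    /** Keeps the first contact seen for each email, preserving input order. */
    static List<Contact> dedupeByEmail(List<Contact> incoming) {
        Map<String, Contact> byEmail = new LinkedHashMap<>();
        for (Contact c : incoming) {
            byEmail.putIfAbsent(key(c.email), c);
        }
        return new ArrayList<>(byEmail.values());
    }

    /** Splits a list into consecutive batches of at most `size` items. */
    static <T> List<List<T>> batches(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            out.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
        }
        return out;
    }

    /** Builds e.g. "email IN (?, ?, ?)" with one bind placeholder per value. */
    static String inClause(String column, int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        return column + " IN (" + String.join(", ", Collections.nCopies(count, "?")) + ")";
    }

    /** Drops contacts whose email is in the set the database already holds. */
    static List<Contact> removeExisting(List<Contact> batch, Collection<String> existingEmails) {
        Set<String> existing = new HashSet<>();
        for (String e : existingEmails) {
            existing.add(key(e));
        }
        List<Contact> fresh = new ArrayList<>();
        for (Contact c : batch) {
            if (!existing.contains(key(c.email))) {
                fresh.add(c);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        List<Contact> incoming = Arrays.asList(
                new Contact("ann@example.org", "Ann", "One"),
                new Contact("Ann@Example.org ", "Ann", "Again"),
                new Contact("bob@example.net", "Bob", "Two"));
        List<Contact> unique = dedupeByEmail(incoming);
        for (List<Contact> batch : batches(unique, 100)) {
            // In a real import, the emails of `batch` would be bound to this
            // condition and fetched; here the "already stored" set is faked.
            System.out.println(inClause("email", batch.size()));
            List<Contact> toInsert = removeExisting(batch, Arrays.asList("bob@example.net"));
            System.out.println(toInsert.size() + " contact(s) left to insert");
        }
    }
}
```

The UNIQUE constraint should stay in place either way: this only reduces how often the insert fails, and another session can still add the same email between the fetch and the save.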
