[Spanning Sync] Detailed info about the duplicate contacts problem

cwood Wed, 12 Aug 2009 14:04:47 -0700

For those of you who appreciate more information, here's the note I
just sent to Larry and Byron about the duplicate contact problem:


If the user had been using Address Book compatibility mode in v2.1.3,
his data in Google had Unicode ZLS characters in it. The Spanning Sync
3 server code didn't properly parse that, and instead pulled it down
as-is, which caused the client not to recognize the contact as the
same as its counterpart in Address Book, so it created a duplicate
(with "invisible" Unicode characters in the name and postal addresses
at that).
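
To illustrate the kind of mismatch involved, here is a minimal Python
sketch, not the actual Spanning Sync code. It assumes the "invisible"
characters fall in Unicode's "Cf" (format) category, which covers
zero-width characters such as U+200B; the helper names
strip_invisible and same_contact_name are hypothetical.

```python
import unicodedata


def strip_invisible(text):
    """Remove Unicode format characters (category "Cf").

    Assumption: the invisible characters are in this category,
    e.g. U+200B ZERO WIDTH SPACE.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def same_contact_name(a, b):
    """Compare two names after dropping invisible characters."""
    return strip_invisible(a).strip() == strip_invisible(b).strip()
```

Without a step like strip_invisible, "Adam\u200b Smith" and
"Adam Smith" look identical on screen but compare as different
strings, which is enough to make a matcher treat them as two contacts.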

I've now fixed the Spanning Sync 3 server code to properly parse ZLS-
encoded names, so they're copacetic.

People can remove the dupes with Tools, or they can simply restore
from the backup made when they installed Spanning Sync 3. Instructions
for that are on the web at
<http://spanningsync.com/help/#preinstall-restore>.

But wait, there's more!

Some time in the last two weeks, Google made an unannounced change to
the way their Contacts API parses names. Previously, if we gave them a
structured name (like title="Dr.", first name="Hans Peter", last
name="Schulz"), Google would store those values in their own
"structured" fields, and also create a "formatted" version for display
in Google Contacts, like "Dr. Hans Peter Schulz". If, however, someone
created a contact in Google Contacts, which doesn't know anything
about structured fields yet, the record would get a formatted version
("Dr. Hans Peter Schulz") and the structured fields would be left
empty. This works well, since our code can look first for the
structured data, use it if it's there, and if not then fall back to
parsing the formatted/unstructured data, which we do pretty well. We
worked hard on that.
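
The "structured first, formatted as fallback" lookup can be sketched
roughly as follows. This is an illustration under assumptions, not the
real parser: the record layout (a dict with "structured" and
"formatted" keys), the TITLES set, and the function names are all
hypothetical, and the fallback parser here is far simpler than one
that handles real-world names.

```python
# Illustrative title list only; a real parser would need far more.
TITLES = {"Dr.", "Mr.", "Mrs.", "Ms.", "Prof."}


def parse_formatted(formatted):
    """Very small fallback parser for an unstructured display name."""
    tokens = formatted.split()
    name = {"title": "", "first": "", "last": ""}
    if tokens and tokens[0] in TITLES:
        name["title"] = tokens.pop(0)
    if len(tokens) > 1:
        name["last"] = tokens.pop()
    # Whatever remains is treated as the (possibly multiword) first name.
    name["first"] = " ".join(tokens)
    return name


def resolve_name(record):
    """Prefer structured fields; fall back to parsing the formatted name."""
    structured = record.get("structured") or {}
    if any(structured.values()):
        return dict(structured)
    return parse_formatted(record.get("formatted", ""))
```

For a record that has only formatted="Dr. Hans Peter Schulz", this
sketch yields title "Dr.", first name "Hans Peter", last name
"Schulz"; a record that already carries structured fields is returned
unchanged.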

But then Google changed something. Now when someone enters a contact
name in Google Contacts, Google applies their own parser to the
formatted/unstructured name and populates the structured fields with
the output. That wouldn't be so bad except for the fact that their
parser is awful. "Dr. Adam Smith" gets parsed as (first name="Dr.",
middle name="Adam", last name="Smith"), to say nothing of multiword
names like "von Beethoven" or suffixes like "Ph.D.". But there's no
way for me to know if Google put the "structured" information in the
record or if we did, so if I see it I use it. And if Google put it
there, chances are it's going to cause a dupe unless it's a super-
simple name like "Adam Smith".
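
A purely positional parser reproduces the symptom described above.
The sketch below is a guess at that kind of behavior for illustration
only, not Google's actual code; naive_parse and would_duplicate are
hypothetical names, and the field-by-field comparison is an assumed
simplification of how two records might be matched.

```python
def naive_parse(formatted):
    """Positional split: first token, last token, everything between."""
    tokens = formatted.split()
    if not tokens:
        return {"first": "", "middle": "", "last": ""}
    if len(tokens) == 1:
        return {"first": tokens[0], "middle": "", "last": ""}
    return {
        "first": tokens[0],
        "middle": " ".join(tokens[1:-1]),
        "last": tokens[-1],
    }


def would_duplicate(local, remote):
    """True if any name field differs between the two records."""
    keys = ("title", "first", "middle", "last")
    return any(local.get(k, "") != remote.get(k, "") for k in keys)
```

Here naive_parse("Dr. Adam Smith") puts "Dr." in the first-name slot,
so it no longer matches a local record with title "Dr." and first name
"Adam", while a plain "Adam Smith" still lines up field for field.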

So the good news is people won't see every single contact duplicated
on their first sync after upgrading. The bad news is, until Google
fixes their parser, people will still see dupes sometimes for contacts
with anything other than super-simple names. The fix is easy: just
delete the version where the names aren't in the right fields and
sync, but it's still a pain.

I'm in touch with the Contacts API team and hope they can get this
fixed soon.
