Good question :) In short: yes, though not this very list. I've been using Google Refine to normalise the formats from the monthly dumps. This is a combination of string matching and manual labour to pick (what I think are) better terms. I then feed "bad" and "better" formats to VacuumBot, which updates them.
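To make the "bad" and "better" idea concrete, here is a minimal Python sketch. It is not VacuumBot's actual code: the names FORMAT_FIXES and normalise_format are hypothetical, and the target terms "Paperback" and "Hardcover" are assumed examples rather than the terms I necessarily picked.

```python
# Hypothetical mapping of "bad" format strings to "better" ones.
# The target terms are assumptions for illustration only.
FORMAT_FIXES = {
    "pbk": "Paperback",
    "hard cover": "Hardcover",
}


def _key(term):
    """Collapse whitespace, drop trailing dots and lowercase the term."""
    return " ".join(term.split()).rstrip(".").lower()


def normalise_format(term, fixes=FORMAT_FIXES, unmatched=None):
    """Return the better term, or the input unchanged when no fix is known.

    Terms without a known fix are appended to `unmatched` (if given),
    so they can be reviewed by hand later.
    """
    key = _key(term)
    if key in fixes:
        return fixes[key]
    if unmatched is not None:
        unmatched.append(term)
    return term


if __name__ == "__main__":
    pending = []
    for raw in ["Hard  Cover", "pbk.", "Spiral-bound"]:
        print(raw, "->", normalise_format(raw, unmatched=pending))
    print("needs manual review:", pending)
```

Matching on a lowercased, whitespace-collapsed key is one simple way to catch variants of the same bad term; anything that does not match is left alone and queued for manual review instead of being guessed at.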
A problem I encountered with that approach is that some bad terms come back every month, because people keep using "pbk" and "Hard Cover". I correct those terms every time, although the number of instances goes down after every run of VacuumBot.

Also, I didn't yet have a way of saving the dictionary in one publicly accessible list, and Nomenklatura looks like it may do just that. It can also be easily integrated with VacuumBot, as it's all Python. I have only just started experimenting: it's not _the_ list that I use yet, but it may grow into the list that VacuumBot uses. It can be made editable for logged-in users (you can log in with a GitHub account).

Ben

On 1 February 2013 01:20, Karen Coyle <[email protected]> wrote:
> Ben, I'm unclear what you are doing with these terms - are you using
> them to normalize the terms in OL?
>
> kc
>
> On 1/31/13 3:23 PM, Ben Companjen wrote:
> > If you're involved with OpenRefine, you may (not) want to know that I
> > just started experimenting with Nomenklatura, a reconciliation
> > service/software package running at OKFN Labs. It appears to me that you
> > can't use it as a reconciliation service inside OpenRefine, but a Python
> > library is provided.
> > <http://nomenklatura.okfnlabs.org/about>
> >
> > It works as follows:
> > You look up a term, and you get a matching authoritative term back, or a
> > "No Match" error. If I look up a term, authenticated with my key, and
> > get a "No Match" error, the candidate term is saved for manual
> > reconciliation. Others just get the error ;)
> >
> > The first three formats can now be viewed at
> > <http://nomenklatura.okfnlabs.org/ol_book_formats>
> >
> > There is little room for description of the "DataSet", so for those
> > interested: I'm not trying to be authoritative. If you disagree on a term,
> > let's discuss - everything can be changed afterwards.
> >
> > Ben
> >
> > On 31 January 2013 23:43, Tom Morris <[email protected]> wrote:
> > >
> > > Yay OpenRefine! (One of my other projects)
> > >
> > > Tom
>
> --
> Karen Coyle
> [email protected] http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to [email protected]
