On 2/1/13 4:31 PM, Ben Companjen wrote:
> Hi Karen,
>
> You replied only to me - if that was a mistake, feel free to include
> this email in an on-list response (or forward).
>
> After a run (or sweep) of VacuumBot, some "PPB"-like formats show up
> again, but about 5 at the most. It looks to me like human input. "H" or
> "P" could - in theory - have come from me: when I start typing
> "Hardcover", see Firefox's autocomplete come up with that, and press
> Enter before selecting it, I essentially submit "H".
>
> LC Bot hasn't run since August 8, ImportBot since October 25.

Ben,

As you surmised, the indexing that was restarted was the one that 
indexes the full text of digitized books in the IA collection, not the 
indexing of OL metadata.

Anand reports that the OL metadata search problem is not just a matter 
of starting up search but that it needs "fixing." As we all know, it's 
hard to know how much fixing something needs until you start fixing it. 
But Anand intends to turn his attention to that (and to importing the IA 
digital books) once he finishes the project he is currently working on 
(which I do not understand so best leave it at that).

kc


>
> Ben
>
>
> On 1 February 2013 16:21, Karen Coyle <[email protected]> wrote:
>
>     In terms of stuff "coming back" -- is there a way to see the source
>     of these things? I don't know if OL is still running in LC and
>     Amazon data on a regular basis, but "pbk" could well come from
>     library data (I can try to dig around in the MARC input programs
>     .... ). If that's the case then it would be better to deal with this
>     on input. If it's folks inputting these by hand, then a suggestion
>     list for that box would be great.
>
>     kc
>
>     On 1/31/13 4:48 PM, Ben Companjen wrote:
>
>         Good question :) In short: yes, though not this very list.
>
>         I've been using Google Refine to normalise the formats from the
>         monthly dumps. This is a combination of string matching and manual
>         labour to pick (what I think are) better terms.
>         I then feed "bad" and "better" formats to VacuumBot, which
>         updates them.
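A minimal sketch of the "bad" to "better" step described above, assuming the dictionary is a plain Python mapping. This is not VacuumBot's actual code: the names `BETTER_FORMATS`, `normalise_format` and `plan_updates` are hypothetical, "Paperback" is a guessed target term, and the edition keys in the usage note are made up.

```python
# Hypothetical "bad" -> "better" dictionary. Only "pbk", "Hard Cover" and
# "Hardcover" appear in this thread; "Paperback" is an illustrative guess.
BETTER_FORMATS = {
    "pbk": "Paperback",
    "hard cover": "Hardcover",
}


def normalise_format(raw):
    """Return the preferred term for a raw format string, or the input unchanged."""
    # Collapse runs of whitespace, drop a trailing dot, and fold case
    # so that "Hard  Cover" and "pbk." hit the same dictionary entries.
    key = " ".join(raw.split()).rstrip(".").lower()
    return BETTER_FORMATS.get(key, raw)


def plan_updates(editions):
    """Yield (edition_key, old, new) for every edition whose format would change."""
    for edition_key, fmt in editions:
        new = normalise_format(fmt)
        if new != fmt:
            yield edition_key, fmt, new
```

Calling `list(plan_updates([("/books/OL1M", "pbk"), ("/books/OL2M", "Hardcover")]))` would list only the first edition as needing an update, which is roughly the kind of list a bot could then act on.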
>
>         A problem I encountered with that approach is that some bad terms
>         come back every month, because people keep using "pbk" and "Hard
>         Cover". I correct those terms every time, although the number of
>         instances goes down after every run of VacuumBot. Also, I don't
>         yet have a way of saving the dictionary in one publicly accessible
>         list, and Nomenklatura looks like it may do just that. And it can
>         be easily integrated with VacuumBot, as it's all Python.
>
>         I've just started experimenting; it's not _the_ list that I use
>         yet, but it may grow towards being the list that VacuumBot uses.
>         It can be made editable for logged-in users (you can log in with
>         a GitHub account).
>
>         Ben
>
>
>         On 1 February 2013 01:20, Karen Coyle <[email protected]> wrote:
>
>              Ben, I'm unclear what you are doing with these terms - are
>              you using them to normalize the terms in OL?
>
>              kc
>
>              On 1/31/13 3:23 PM, Ben Companjen wrote:
>               > If you're involved with OpenRefine, you may (not) want to
>               > know that I just started experimenting with Nomenklatura, a
>               > reconciliation service/software package running at OKFN
>               > Labs. It appears to me that you can't use it as a
>               > reconciliation service inside OpenRefine, but a Python
>               > library is provided.
>               > <http://nomenklatura.okfnlabs.org/about>
>               >
>               > It works as follows:
>               > You look up a term, and you get a matching authoritative
>               > term back, or a "No Match" error. If I look up a term,
>               > authenticated with my key, and get a "No Match" error, the
>               > candidate term is saved for manual reconciliation. Others
>               > just get the error ;)
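The lookup behaviour described above can be modelled in a few lines. This is only an in-memory sketch of the workflow as Ben describes it, not the Nomenklatura Python library or its API: `ToyDataset`, `NoMatch`, `owner_key` and `pending` are hypothetical names chosen for illustration.

```python
class NoMatch(Exception):
    """Raised when a term has no authoritative match."""


class ToyDataset:
    """In-memory stand-in for a reconciliation dataset (not the real client)."""

    def __init__(self, aliases, owner_key):
        self._aliases = dict(aliases)  # candidate term -> authoritative term
        self._owner_key = owner_key    # hypothetical key of the dataset owner
        self.pending = []              # unmatched terms awaiting manual review

    def lookup(self, term, api_key=None):
        """Return the authoritative term, or raise NoMatch."""
        try:
            return self._aliases[term]
        except KeyError:
            # Only a lookup authenticated with the owner's key queues the
            # candidate for manual reconciliation; others just get the error.
            if api_key == self._owner_key and term not in self.pending:
                self.pending.append(term)
            raise NoMatch(term) from None
```

With `ds = ToyDataset({"pbk": "Paperback"}, owner_key="secret")`, an anonymous `ds.lookup("PPB")` raises `NoMatch` and leaves `ds.pending` empty, while the same call with `api_key="secret"` also raises but records "PPB" for later review.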
>               >
>               > The first three formats can now be viewed at
>               > <http://nomenklatura.okfnlabs.org/ol_book_formats>
>               >
>               > There is little room for description of the "DataSet", so
>               > for those interested: I'm not trying to be authoritative.
>               > If you disagree on a term, let's discuss - everything can
>               > be changed afterwards.
>               >
>               > Ben
>               >
>               > On 31 January 2013 23:43, Tom Morris <[email protected]> wrote:
>               >
>               >
>               >     Yay OpenRefine! (One of my other projects)
>               >
>               >     Tom
>               >
>               >     _______________________________________________
>               >     Ol-tech mailing list
>               >     [email protected]
>               >     http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
>               >     To unsubscribe from this mailing list, send email to
>               >     Ol-tech-unsubscribe@archive.org
>               >
>               >
>
>              --
>              Karen Coyle
>              [email protected] http://kcoyle.net
>              ph: 1-510-540-7596
>              m: 1-510-435-8234
>              skype: kcoylenet
>
>
>
>     --
>     Karen Coyle
>     [email protected] http://kcoyle.net
>     ph: 1-510-540-7596
>     m: 1-510-435-8234
>     skype: kcoylenet
>
>

-- 
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
