Jason, Dan is correct -- &amp; appears rather than an & because at some point the symbol was not interpreted correctly, or someone copied and pasted information from another source into a bib record and didn't notice that the & didn't render correctly. &amp; can also be the result of an OCR program not correctly interpreting the scan. We used to see it much more often. It shows up in 520s and 505s since those fields are often copied and pasted or scanned.
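
A minimal sketch of how that copy-and-paste path can produce a visible &amp;, written in Python purely for illustration (it is not code from Evergreen or OCLC, and the sample summary text is made up). It assumes the text was already escaped once in the source it was copied from and is escaped a second time when saved as MARCXML:

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical summary as the author intended it.
summary = "Cats & dogs"

# The raw source of a web page already carries the ampersand as an entity.
page_source = escape(summary)

# Pasting that raw source into a record escapes it again on save (assumed step).
stored = escape(page_source)

# One round of decoding for display leaves the entity visible to patrons.
displayed = unescape(stored)

print(page_source)
print(stored)
print(displayed)
```

If the stored value carries the extra layer, what displays is the entity text rather than the plain ampersand.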
There are also occasions where UTF-8 translation of diacritics and other special characters doesn't happen correctly and the codes display in the bib record. OCLC did an upgrade once that caused the problem for records imported through the Z39.50 gateway. Evergreen also had a bug many versions ago that didn't display such characters correctly.

When I find records that don't display correctly, I overlay a more current record from OCLC. If it is a scanning or copy-and-paste issue, I usually have to correct it in OCLC first (our only source of bib records is OCLC). While the display isn't as desired, if it isn't causing any search and retrieval issue, we just fix these as we come across them.

Elaine

J. Elaine Hardy
PINES & Collaborative Projects Manager
Georgia Public Library Service/PINES
1800 Century Place, Ste. 150
Atlanta, GA 30045
404.235.7128 Office
404.548.4241 Cell
404.235.7201 FAX

On Wed, Mar 21, 2018 at 10:29 AM, Josh Stompro <stomp...@exchange.larl.org> wrote:

> Hello Dan,
>
> We are still on 2.10 using the XUL client, so maybe the 520 display anomaly has been fixed in a later version. I’ll make a note to check back once we are on a more modern version.
>
> Would it be accurate to say that characters like & in the marc editor are encoded as html entities in the biblio.record_entry.marc since they are stored as marc xml?
>
> The record that I was looking at was one of the free overdrive records, which are very very very rough, so it wouldn’t surprise me that they are grabbing the 520 from a web page and not being very careful with encoding. I just looked for occurrences of ‘&amp’ and there are only 344 of them and all but 3 are from the free overdrive records. There are also quite a few instances (6000) of &#8212; (em dash), again they are all the free overdrive records. I guess we get what we pay for.
>
> I’m tempted to just use regexp_replace against biblio.record_entry to try and clean these up, like the example here: https://wiki.evergreen-ils.org/doku.php?id=scratchpad:random_magic_spells#how_to_prune_a_tag_under_the_hood
>
> Josh Stompro - LARL IT Director
>
> *From:* Open-ils-general [mailto:open-ils-general-boun...@list.georgialibraries.org] *On Behalf Of *Dan Scott
> *Sent:* Tuesday, March 20, 2018 4:23 PM
> *To:* Evergreen Discussion Group <open-ils-general@list.georgialibraries.org>
> *Subject:* Re: [OPEN-ILS-GENERAL] HTML entities in MARC Record editor
>
> Hi Josh:
>
> Quick question: XUL or web staff client? And version?
>
> In theory, what you see is what you should get - MARC has no idea what HTML entities are, so "&" in the editor should be displayed as "&" (properly escaped, of course) in the catalogue.
>
> If you see &amp; in the biblio.record_entry.marc, it may be the result of corrupted catalogue enrichment efforts (e.g. grabbing the summary for a book from a website via a script with a bug), and thus should just be corrected directly to "&". Unless it's a deliberately torturous book title like "Escaping <HTML> &amp; other Secure Web Practices" :)
>
> If &amp; in the MARC shows up as just & in the 520 catalogue output, it sounds like there might be a bug for us to track down...
>
> Thanks,
>
> Dan
>
> On Tue, Mar 20, 2018 at 9:50 PM, Josh Stompro <stomp...@exchange.larl.org> wrote:
>
> Hello, could someone give me some pointers in regards to html entities in marc data? Sometimes I see &amp; used in 490a data and displayed as &amp; in the evergreen marc editor, and in the catalog it is displayed as &amp; and not as &.
>
> We also see things like a 520 that contains &amp; but it does get displayed as & in the catalog?
>
> And when I look at the biblio.record_entry.marc it looks like & in the editor gets encoded as &amp;, so is this a double encoding error?
> Should I ever see html entities when looking at marc data in the editor?
>
> If those should be cleaned up, anyone have any magic spells/queries for doing so?
>
> Thanks
>
> Josh
>
> Lake Agassiz Regional Library - Moorhead MN larl.org
> Josh Stompro | Office 218.233.3757 EXT-139
> LARL IT Director | Cell 218.790.2110
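
On the cleanup question in the quoted thread, here is a rough sketch of one way to strip leftover HTML entities from 505 and 520 subfields. It is written in Python against a MARCXML string rather than as a regexp_replace query, and it is not an Evergreen tool. The function names, the sample record, and the choice of tags are all hypothetical, and it assumes the record uses the MARC21 slim namespace. Only entities that end in a semicolon are decoded, so a plain "AT&T" is left alone.

```python
import html
import re
import xml.etree.ElementTree as ET

# Assumption: records use the MARC21 slim namespace.
MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)  # keep the default namespace when serializing

# Named or numeric entities that end in a semicolon, e.g. &amp; or &#8212;
ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def decode_leftover_entities(text):
    """Decode entities still visible after the XML parser has decoded once."""
    return ENTITY.sub(lambda m: html.unescape(m.group(0)), text)


def clean_record(marcxml, tags=("505", "520")):
    """Return (cleaned MARCXML, number of subfields changed). Hypothetical helper."""
    root = ET.fromstring(marcxml)
    changed = 0
    for field in root.iter(f"{{{MARC_NS}}}datafield"):
        if field.get("tag") not in tags:
            continue
        for sub in field.findall(f"{{{MARC_NS}}}subfield"):
            original = sub.text or ""
            fixed = decode_leftover_entities(original)
            if fixed != original:
                sub.text = fixed
                changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Made-up record with a double-encoded ampersand and em dash in the 520.
sample = (
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    '<datafield tag="520" ind1=" " ind2=" ">'
    '<subfield code="a">Cats &amp;amp; dogs &amp;#8212; a guide</subfield>'
    "</datafield></record>"
)

cleaned, count = clean_record(sample)
print(count)
print(cleaned)
```

A title that really is meant to contain the text "&amp;" would be altered too, so the changed records are worth reviewing before anything is written back to biblio.record_entry.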