Ah, of course you're right that the & will get escaped as &amp; in the MARCXML in biblio.record_entry.marc. But then when it's displayed in the catalogue or if you load it back up in the editor, you should just see "&" again.
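To make that round trip concrete, here is a minimal sketch using Python's standard XML library rather than anything from Evergreen itself. The bare `subfield` element stands in for a real MARCXML subfield and is only an illustration; it is not how Evergreen serializes records internally.

```python
import xml.etree.ElementTree as ET

# What a cataloguer types into the editor (illustrative value).
editor_value = "This & that"

# Serializing to XML escapes the ampersand, much as the stored MARCXML does.
subfield = ET.Element("subfield", code="a")
subfield.text = editor_value
stored_xml = ET.tostring(subfield, encoding="unicode")
print(stored_xml)

# Parsing the stored XML gives back the plain character.
round_tripped = ET.fromstring(stored_xml).text
print(round_tripped)
```

If the parsed value itself still contains `&amp;`, the data was already entity-encoded before it reached the record, which is the double-encoding case.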
However, I just tested putting "This & that" into a 520 field in our 2.12 system and noticed that the displayed data is "This & that" -- meaning it is *not* getting escaped. Which is a Very Bad Thing. (Bug forthcoming).

As for your free Overdrive records, yes, I regularly spend time fixing mass corruption of records that are sourced from OCLC, where someone has almost-but-not-quite managed to flesh out the table of contents & summary fields correctly. There is some pretty heinous data out there :)

On Wed, Mar 21, 2018 at 3:29 PM, Josh Stompro <[email protected]> wrote:

> Hello Dan,
>
> We are still on 2.10 using the XUL client, so maybe the 520 display anomaly has been fixed in a later version. I’ll make a note to check back once we are on a more modern version.
>
> Would it be accurate to say that characters like & in the marc editor are encoded as html entities in the biblio.record_entry.marc since they are stored as marc xml?
>
> The record that I was looking at was one of the free overdrive records, which are very very very rough, so it wouldn’t surprise me that they are grabbing the 520 from a web page and not being very careful with encoding. I just looked for occurrences of ‘&amp’ and there are only 344 of them and all but 3 are from the free overdrive records. There are also quite a few instances(6000) of &#8212; (em dash), again they are all the free overdrive records. I guess we get what we pay for.
>
> I’m tempted to just use regexp_replace against biblio.record_entry to try and clean these up, like the example here: https://wiki.evergreen-ils.org/doku.php?id=scratchpad:random_magic_spells#how_to_prune_a_tag_under_the_hood
>
> Josh Stompro - LARL IT Director
>
> *From:* Open-ils-general [mailto:open-ils-general-[email protected]] *On Behalf Of *Dan Scott
> *Sent:* Tuesday, March 20, 2018 4:23 PM
> *To:* Evergreen Discussion Group <open-ils-general@list.georgialibraries.org>
> *Subject:* Re: [OPEN-ILS-GENERAL] HTML entities in MARC Record editor
>
> Hi Josh:
>
> Quick question: XUL or web staff client? And version?
>
> In theory, what you see is what you should get - MARC has no idea what HTML entities are, so "&" in the editor should be displayed as "&" (properly escaped, of course) in the catalogue.
>
> If you see &amp; in the biblio.record_entry.marc, it may be the result of corrupted catalogue enrichment efforts (e.g. grabbing the summary for a book from a website via a script with a bug), and thus should just be corrected directly to "&". Unless it's a deliberately torturous book title like "Escaping <HTML> &amp; other Secure Web Practices" :)
>
> If & in the MARC shows up as just & in the 520 catalogue output, it sounds like there might be a bug for us to track down...
>
> Thanks,
>
> Dan
>
> On Tue, Mar 20, 2018 at 9:50 PM, Josh Stompro <[email protected]> wrote:
>
> Hello, could someone give me some pointers in regards to html entities in marc data? Sometimes I see & used in 490a data and displayed as & in the evergreen marc editor, and in the catalog it is displayed as & and not as &.
>
> We also see things like a 520 that contains & but it does get displayed as & in the catalog?
>
> And when I look at the biblio.record_entry.marc It looks like & in the editor gets encoded as &amp;, so is this a double encoding error? Should I ever see html entities when looking at marc data in the editor?
>
> If those should be cleaned up, anyone have any magic spells/queries for doing so?
>
> Thanks
>
> Josh
>
> Lake Agassiz Regional Library - Moorhead MN larl.org
>
> Josh Stompro | Office 218.233.3757 EXT-139
>
> LARL IT Director | Cell 218.790.2110
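
On the clean-up question raised in the thread, one cautious option is a dry run that only reports subfields whose text still contains an entity after XML parsing, along with what a single round of decoding would produce. This is a hedged sketch in Python, not an Evergreen tool and not the regexp_replace approach from the wiki page. The function name `find_leftover_entities` and the sample record are made up for illustration, and fetching the MARCXML out of biblio.record_entry is assumed to happen elsewhere.

```python
import html
import re
import xml.etree.ElementTree as ET

# Named or numeric entities that survived XML parsing, e.g. "&amp;" or "&#8212;".
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def find_leftover_entities(marcxml):
    """Return (tag, code, current_text, proposed_text) for suspect subfields.

    Hypothetical helper: it changes nothing, it only reports candidates.
    """
    root = ET.fromstring(marcxml)
    findings = []
    for datafield in root.iter():
        # Match on the local name so the MARCXML namespace does not matter.
        if not datafield.tag.endswith("datafield"):
            continue
        for subfield in datafield:
            text = subfield.text or ""
            if ENTITY_RE.search(text):
                findings.append(
                    (datafield.get("tag"), subfield.get("code"),
                     text, html.unescape(text))
                )
    return findings


# Made-up record: the 520 is double-encoded, the 245 is fine.
sample = (
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    '<datafield tag="520" ind1=" " ind2=" ">'
    '<subfield code="a">Cats &amp;amp; dogs &amp;#8212; a history</subfield>'
    '</datafield>'
    '<datafield tag="245" ind1="1" ind2="0">'
    '<subfield code="a">This &amp; that</subfield>'
    '</datafield>'
    '</record>'
)

findings = find_leftover_entities(sample)
for tag, code, current, proposed in findings:
    print(tag, code, repr(current), "->", repr(proposed))
```

Reviewing the report before changing anything matters because a title such as "Escaping <HTML> &amp; other Secure Web Practices" would be flagged even though its entity is intentional.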
