Linda,

Your plan sounds OK to me. I think it can wait for the new features, since your site is the main (probably the only) user of the module.

I'll take a look at the Open Library bug you mentioned. We're not using it, but maybe there is something that we can do generically to resolve this in a way that doesn't require duplicated code.

Cheers,
Jason

On 1/14/19 9:19 AM, Linda Jansova wrote:
> Hi Jason,
>
> There is not a Launchpad bug for this yet because at this point it has
> only been fixed locally (in ObalkyKnih.pm module in our Evergreen
> installation). Currently we are in the process of adding some more
> features to ObalkyKnih.pm module and testing these features. Of course,
> we will create a wishlist bug and submit the enriched code (which would
> also contain the encoding fix) to GIT. Should there be a wider interest
> in the corrected code at this point, we would submit a patch for the
> encoding only (and add the new features at a later date).
>
> Maybe that this (or similar) solution would also work for encoding
> issues in added content coming from Open Library - for which a bug has
> been opened already: https://bugs.launchpad.net/evergreen/+bug/1610678.
> (Josh did some testing with Open Library data encoding.)
>
> Linda
>
> On 1/14/19 2:45 PM, Jason Stephenson wrote:
>> Hi, Linda.
>>
>> Good find! That sounds like a bug to me. Could you submit a Launchpad
>> bug (if there isn't one already) and a patch commit, please?
>>
>> Depending on which module this occurs in, the fix may not be as simple
>> as using utf8::decode. It could be that we need code to determine the
>> character set from the HTTP response.
>>
>> Cheers,
>> Jason
>>
>> On 1/14/19 2:38 AM, Linda Jansova wrote:
>>> Hi,
>>>
>>> Just letting you know that our developer has eventually changed the
>>> encoding of the data blob coming from Obalkyknih.cz from UTF-8 to Perl's
>>> internal representation:
>>>
>>> utf8::decode($response_content);
>>>
>>> Adding this step before further data processing takes place has
>>> successfully solved the problem of seeing gibberish characters in our
>>> catalog :-).
>>>
>>> Linda
>>>
>>> On 11/25/18 4:56 PM, Linda Jansova wrote:
>>>> Hi,
>>>>
>>>> We have encountered a problem in AddedContent.pm's get_url function
>>>> (both in Evergreen 3.1.4 and 2.12.6) - letters with diacritics from
>>>> Czech added content provider Obalkyknih.cz have started being
>>>> corrupted in our TPACs. Our added content provider has reported a
>>>> switch to application/json MIME type.
>>>>
>>>> After the switch we seem to be getting strange chars in summary (and
>>>> table of contents if available as text, not as an image). We have
>>>> tried to locate the problem in a separate Perl program and have come
>>>> to a conclusion that data only get corrupted when fetched by the
>>>> get_url function from AddedContent.pm.
>>>>
>>>> We have added additional logging to the following part of
>>>> AddedContent.pm:
>>>>
>>>> # returns an HTTP::Response object
>>>> sub get_url {
>>>>     my( $self, $url ) = @_;
>>>>
>>>>     $logger->info("added content getting [timeout=$net_timeout, errors_remaining=$error_countdown] URL = $url");
>>>>     my $agent = LWP::UserAgent->new(timeout => $net_timeout);
>>>>
>>>>     my $res = $agent->get($url);
>>>>     $logger->info("added content request returned with code " . $res->code);
>>>>
>>>>     #VJ
>>>>     $logger->info("added contet res is: " . $res->content);
>>>>
>>>>     die "added content request failed: " .
$res->status_line ."\n" >>>> unless $res->is_success; >>>> >>>> return $res; >>>> } >>>> >>>> And a corresponding sample log looks like this: >>>> >>>> [2018-11-24 22:22:58] /usr/sbin/apache2 >>>> [INFO:32231:AddedContent.pm:296:1543094555322319] added contet res is: >>>> [{"_id":"5bf904c905509b06182848ac","succ_toc_count":"0","cover_preview510_url":"https://cache.obalkyknih.cz/file/cover/1830989/preview510","ean":"9788026203667","uuid":["uuid:6ec863b0-055e-11e6-a611-005056827e51"],"cooperating_with":"https://www.cbdb.cz|CBDB.cz","succ_cover_count":"0","flag_bare_record":0,"csn_iso_690_source":"Národnà >>>> >>>> knihovna Ä<U+008C>eské Republiky >>>> 18.11.2018","rating_url":"https://www.obalkyknih.cz/stars?value=100","rating_sum":200,"cover_thumbnail_url":"https://cache.obalkyknih.cz/file/cover/1830989/thumbnail","oclc_other":[],"reviews":[],"part_root":1,"orig_height":"510","backlink_url":"https://www.obalkyknih.cz/view?isbn=9788026203667","toc_thumbnail_url":"https://cache.obalkyknih.cz/file/toc/362647/thumbnail","cover_medium_url":"https://cache.obalkyknih.cz/file/cover/1830989/medium","annotation":{"source":"Web >>>> >>>> obalkyknih.cz","html":"Encyklopedie sociálnà práce (ESP) >>>> pÅ<U+0099>inášà pÅ<U+0099>es 200 hesel ze vÅ¡ech oblastà >>>> sociálnà práce. ESP je postavena na interakÄ<U+008D>nÃm pojetà >>>> sociálnà práce. JedineÄ<U+008D>nost sociálnà práce spoÄ >>>> <U+008D>Ãvá v tom, že operuje v poli mezi klientem a jeho >>>> sociálnÃm prostÅ<U+0099>edÃm; pracovnÃk je v obecném smyslu >>>> mediátorem mezi jednotlivcem a spoleÄ<U+008D>nostÃ. Jeho úkolem je >>>> napomáhat sociálnÃmu fungovánà klientů a pomáhat >>>> spoleÄ<U+008D>nosti, aby citlivÄ<U+009B> reagovala na potÅ<U+0099>eby >>>> svých Ä<U+008D>lenů. Tato dvojitá mediaÄ<U+008D>nà role je role >>>> angažovaná. 
Je zakotvená hodnotovÄ<U+009B> v náboženstvà nebo v >>>> huma >>>> [2018-11-24 22:22:58] /usr/sbin/apache2 >>>> [INFO:32231:ObalkyKnih.pm:228:1543094555322319] ObalkyKnih.cz for >>>> books?isbn=9788026203667 response was >>>> [{"_id":"5bf904c905509b06182848ac","succ_toc_count":"0","cover_preview510_url":"https://cache.obalkyknih.cz/file/cover/1830989/preview510","ean":"9788026203667","uuid":["uuid:6ec863b0-055e-11e6-a611-005056827e51"],"cooperating_with":"https://www.cbdb.cz|CBDB.cz","succ_cover_count":"0","flag_bare_record":0,"csn_iso_690_source":"Národnà >>>> >>>> knihovna Ä<U+008C>eské Republiky >>>> 18.11.2018","rating_url":"https://www.obalkyknih.cz/stars?value=100","rating_sum":200,"cover_thumbnail_url":"https://cache.obalkyknih.cz/file/cover/1830989/thumbnail","oclc_other":[],"reviews":[],"part_root":1,"orig_height":"510","backlink_url":"https://www.obalkyknih.cz/view?isbn=9788026203667","toc_thumbnail_url":"https://cache.obalkyknih.cz/file/toc/362647/thumbnail","cover_medium_url":"https://cache.obalkyknih.cz/file/cover/1830989/medium","annotation":{"source":"Web >>>> >>>> obalkyknih.cz","html":"Encyklopedie sociálnà práce (ESP) >>>> pÅ<U+0099>inášà pÅ<U+0099>es 200 hesel ze vÅ¡ech oblastà >>>> sociálnà práce. ESP je postavena na interakÄ<U+008D>nÃm pojetà >>>> sociálnà práce. JedineÄ >>>> <U+008D>nost sociálnà práce spoÄ<U+008D>Ãvá v tom, že operuje v >>>> poli mezi klientem a jeho sociálnÃm prostÅ<U+0099>edÃm; pracovnÃk >>>> je v obecném smyslu mediátorem mezi jednotlivcem a >>>> spoleÄ<U+008D>nostÃ. Jeho úkolem je napomáhat sociálnÃmu >>>> fungovánà klientů a pomáhat spoleÄ<U+008D>nosti, aby >>>> citlivÄ<U+009B> reagovala na potÅ >>>> <U+0099>eby svých Ä<U+008D>lenů. Tato dvojitá mediaÄ<U+008D>nà >>>> role je role angažovaná. 
Je zakotvená hodnot
>>>>
>>>> Our Perl code used for testing purposes
>>>>
>>>> #!/usr/bin/perl
>>>> use LWP::UserAgent;
>>>>
>>>> my $ua = LWP::UserAgent->new;
>>>> my $response = $ua->get( 'http://cache.obalkyknih.cz/api/books?isbn=978-80-262-0366-7' );
>>>> my $content = $response->content;
>>>> print $content;
>>>>
>>>> produces data with the correct encoding:
>>>>
>>>> [{"_id":"5bf904c905509b06182848ac","succ_toc_count":"0","cover_preview510_url":"https://cache.obalkyknih.cz/file/cover/1830989/preview510","ean":"9788026203667","uuid":["uuid:6ec863b0-055e-11e6-a611-005056827e51"],"cooperating_with":"https://www.cbdb.cz|CBDB.cz","succ_cover_count":"0","flag_bare_record":0,"csn_iso_690_source":"Národní
>>>>
>>>> knihovna České Republiky
>>>> 18.11.2018","rating_url":"https://www.obalkyknih.cz/stars?value=100","rating_sum":200,"cover_thumbnail_url":"https://cache.obalkyknih.cz/file/cover/1830989/thumbnail","oclc_other":[],"reviews":[],"part_root":1,"orig_height":"510","backlink_url":"https://www.obalkyknih.cz/view?isbn=9788026203667","toc_thumbnail_url":"https://cache.obalkyknih.cz/file/toc/362647/thumbnail","cover_medium_url":"https://cache.obalkyknih.cz/file/cover/1830989/medium","annotation":{"source":"Web
>>>>
>>>> obalkyknih.cz","html":"Encyklopedie sociální práce (ESP) přináší přes
>>>> 200 hesel ze všech oblastí sociální práce. ESP je postavena na
>>>> interakčním pojetí sociální práce. Jedinečnost sociální práce spočívá
>>>> v tom, že operuje v poli mezi klientem a jeho sociálním prostředím;
>>>> pracovník je v obecném smyslu mediátorem mezi jednotlivcem a
>>>> společností. Jeho úkolem je napomáhat sociálnímu fungování klientů a
>>>> pomáhat společnosti, aby citlivě reagovala na potřeby svých členů.
>>>> Tato dvojitá mediační role je role angažovaná. Je zakotvená hodnotově
>>>> v náboženství nebo v humanitních ideálech.
V podobě tematicky
>>>> uspořádaných samostatných hesel poskytuje toto rozsáhlé dílo přehled
>>>> psychologických a sociologických teorií a přístupů s dopadem do
>>>> sociální práce, náboženský, filozofický a společenský kontext oboru.
>>>> Přináší přehled klíčových pojmů, technik a metod sociální práce,
>>>> ohrožených skupin a poskytovaných služeb. Samostatnou část tvoří hesla
>>>> charakterizující profesi sociálního pracovníka a hesla zabývající se
>>>> výzkumem v oblasti sociální práce. ESP reflektuje domácí vývoj oboru v
>>>> evropském kontextu a zohledňuje i širší mezinárodní zřetel. Hesla
>>>> popisují daný jev a jeho historii, hodnotová východiska, aplikační
>>>> možnosti a výzkum.","id":"2391500"},"csn_iso_690":"MATOUŠEK, Oldřich.
>>>> <i>Encyklopedie sociální práce. </i>Vyd. 1. Editor Alois KŘIŠŤAN.
>>>> Praha: Portál, 2013. 570
>>>> s.","nbn":"cnb002436000","rating_avg100":"100","orig_width":"346","rating_avg5":5,"toc_pdf_url":"https://cache.obalkyknih.cz/file/toc/362647/pdf","ean_other":[],"nbn_other":[],"rating_count":2,"cover_icon_url":"https://cache.obalkyknih.cz/file/cover/1830989/icon","dig_obj":{"BOA001":{"public":0,"url":"https://kramerius.mzk.cz/search/i.jsp?pid=uuid:6ec863b0-055e-11e6-a611-005056827e51","uuid":"uuid:6ec863b0-055e-11e6-a611-005056827e51"},"ABA001":{"public":0,"url":"http://kramerius4.nkp.cz/search/i.jsp?pid=uuid:6ec863b0-055e-11e6-a611-005056827e51","uuid":"uuid:6ec863b0-055e-11e6-a611-005056827e51"}},"bib_year":"2013","oclc":"(OCoLC)852382182","bib_title":"Encyklopedie
>>>>
>>>> sociální práce","succ_bib_count":"0","book_id":"112038753"}]
>>>>
>>>> It confirms that data from Obalkyknih.cz are in UTF-8.
>>>>
>>>> Basically, what happens when things go wrong is this (using letter á
>>>> as an example): the original character U+00E1
>>>> (http://www.fileformat.info/info/unicode/char/e1/index.htm) is encoded
>>>> as two letters á (\u00c3\u00a1 as represented in memcached log).
Ã is
>>>> U+00C3 (https://www.fileformat.info/info/unicode/char/00c3/index.htm)
>>>> while ¡ is U+00A1
>>>> (http://www.fileformat.info/info/unicode/char/00A1/index.htm).
>>>>
>>>> A description at
>>>> https://www.effectiveperlprogramming.com/2011/08/know-the-difference-between-character-strings-and-utf-8-strings/
>>>> probably gives some ideas of how encoding could get broken in Perl.
>>>>
>>>> However, we are not sure if the issue is in AddedContent.pm or in our
>>>> Apache configuration (because our test Perl code run from bash works
>>>> okay but AddedContent.pm called from Apache does not).
>>>>
>>>> Does anybody have any idea where to look next to fix it?
>>>>
>>>> Thank you in advance!
>>>>
>>>> Linda
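
For anyone following the thread, here is a minimal Perl sketch of the failure mode described above and of the two remedies discussed: utf8::decode, and picking the character set from the HTTP response. It is only an illustration under stated assumptions. The helper names decode_body and hex_dump and the sample Content-Type value are hypothetical and do not come from AddedContent.pm or ObalkyKnih.pm, and the UTF-8 fallback is an assumption that seems reasonable for JSON.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode find_encoding);

# Hypothetical helper, not taken from Evergreen: decode a response body with
# the charset named in a Content-Type header value. Falls back to UTF-8 when
# no charset is given or Encode does not recognise it (an assumption).
sub decode_body {
    my ($content_type, $bytes) = @_;
    my ($charset) = ($content_type // '') =~ /charset\s*=\s*"?([\w.:-]+)"?/i;
    $charset = 'UTF-8' unless defined $charset && find_encoding($charset);
    return decode($charset, $bytes);
}

# Hypothetical helper: render a string as space-separated hex code points.
sub hex_dump {
    my ($string) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $string);
}

# The two UTF-8 bytes of U+00E1, as they arrive over HTTP.
my $bytes = "\xC3\xA1";

# Failure mode: the undecoded bytes are treated as the two characters
# U+00C3 and U+00A1, so encoding them to UTF-8 again doubles the encoding.
my $double_encoded = encode('UTF-8', $bytes);

# Fix reported in the thread: decode the UTF-8 bytes in place.
my $decoded_in_place = $bytes;
utf8::decode($decoded_in_place);

# Charset-aware variant using the hypothetical helper above.
my $decoded_by_header = decode_body('application/json', $bytes);

print 'double-encoded bytes: ', hex_dump($double_encoded), "\n";
print 'utf8::decode result:  ', hex_dump($decoded_in_place), "\n";
print 'decode_body result:   ', hex_dump($decoded_by_header), "\n";
```

The first line of output shows the four bytes C3 83 C2 A1, which is what a terminal renders as the two-letter sequence quoted in the logs, while both decoded variants show the single code point E1. Whether get_url or the provider module is the right place for such a step is left open, as in the discussion above.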
