Hi Tom,

The AS API is UTF-8 by default, and AS also tries to make sure your database 
is set up correctly by checking the database and table encodings. As a data 
point: across dozens of migrations making millions of calls to the AS API and 
sending data in both directions, I have yet to come across a single instance 
of AS inserting spurious characters into API responses, but I have had plenty 
of encoding issues in those same migrations at the data/database level. I'm 
fairly confident you'll find the source of those characters if you look at the 
raw data.
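
As a rough illustration of what "looking at the raw data" could mean, here is 
a minimal Python sketch that flags strings which look like UTF-8 text that was 
decoded as Latin-1 at some point. The function name and the sample values are 
hypothetical, and it only covers the Latin-1 case (not Windows-1252 or other 
code pages), so treat it as a starting point rather than a complete check.

```python
def looks_like_mojibake(text: str) -> bool:
    """Heuristic: True if text is probably UTF-8 that was mis-decoded as Latin-1.

    Assumption: the damage came from a Latin-1 decode. Other code pages
    (for example Windows-1252) are not handled by this sketch.
    """
    try:
        # Undo the suspected wrong decode, then decode the bytes as UTF-8.
        repaired = text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either the text has characters outside Latin-1, or the bytes are
        # not valid UTF-8. In both cases this particular damage is unlikely.
        return False
    # Pure ASCII round-trips unchanged, so only report a real difference.
    return repaired != text


# Hypothetical sample values standing in for rows pulled from the database.
samples = ["café", "caf\u00c3\u00a9", "plain ascii"]
flags = [looks_like_mojibake(s) for s in samples]
print(flags)
```

Running something like this over the values as stored in the database, and 
again over the same values as returned by the API, should show at which stage 
the extra characters first appear.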

p

________________________________
From: archivesspace_users_group-boun...@lyralists.lyrasis.org 
<archivesspace_users_group-boun...@lyralists.lyrasis.org> on behalf of Tom 
Hanstra <hans...@nd.edu>
Sent: 03 September 2021 18:09
To: Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] API output - extra unicode

Brian (and others),

The data in the database should be UTF-8 as far as I can tell. So, I think this 
has to be happening at the API export level. Is there anything specific that 
needs to be done to have the API know that this is UTF-8 data?

Tom

On Fri, Sep 3, 2021 at 11:42 AM Brian Harrington 
<brian.harring...@lyrasis.org> wrote:

Hi Tom,



In my experience \u00c3 appearing in anything is almost always a sign of 
encoding issues.  I would make sure that everything is UTF-8 all the way 
through.
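
To show why \u00c3 is such a typical symptom, here is a small Python sketch. 
It assumes the text was encoded as UTF-8 and then decoded as Latin-1 somewhere 
along the way; the sample word is made up, and the actual cause on a given 
system may be different.

```python
import json

original = "café"

# Simulate the suspected fault: UTF-8 bytes read back as if they were Latin-1.
# "é" is the two bytes C3 A9 in UTF-8, which Latin-1 reads as "Ã" and "©".
mojibake = original.encode("utf-8").decode("latin-1")

# JSON output with ASCII escaping makes the stray characters easy to see.
escaped = json.dumps(mojibake)
print(escaped)

# Reversing the wrong step recovers the original text.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired == original)
```

The same mechanism turns "â" (bytes C3 A2) into \u00c3 followed by \u00a2, so 
seeing those two code points together is consistent with text that went 
through one extra encode/decode step.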



Brian



From: <archivesspace_users_group-boun...@lyralists.lyrasis.org>
 on behalf of Tom Hanstra <hans...@nd.edu>
Reply-To: Archivesspace Users Group 
<archivesspace_users_group@lyralists.lyrasis.org>
Date: Friday, September 3, 2021 at 11:06 AM
To: Archivesspace Users Group 
<archivesspace_users_group@lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] API output - extra unicode



On our local version of ArchivesSpace, we are testing API output and are 
finding that we are getting extra Unicode characters on export. It looks like 
the data is right in the database, but doesn't quite come out right from the 
API extract. It looks like an extra Unicode character is being added (in some 
of the output we reviewed, this was either \u00c3 or \u00a2).



Where might we have something set incorrectly?  Where might the extra data be 
coming from or have been introduced along the way?



Thanks,

Tom



--

Tom Hanstra

Sr. Systems Administrator

hans...@nd.edu




_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group@lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group


--
Tom Hanstra
Sr. Systems Administrator
hans...@nd.edu
