Perl 5's internal representation of character strings is not the same thing as UTF-8 (or UTF-X) encoded text, although a string variable can hold either. There is no automatic conversion between the two; the "use utf8;" pragma only tells Perl that the source code itself is written in UTF-8 (see "perldoc utf8"). Therefore every UTF-8 text coming from outside the program must be decoded, and all data leaving the program as UTF-8 text must be encoded.
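
To illustrate the difference between bytes and characters, here is a minimal sketch using only the core Encode module. The sample string is made up for the illustration, and the strict 'UTF-8' encoding name is my choice here; it is not taken from your script.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they would arrive from outside the program:
# "Zoë" in UTF-8, where the "ë" takes two bytes (hypothetical sample data).
my $bytes = "Zo\xC3\xAB";

# decode() turns UTF-8 bytes into a Perl character string.
my $chars = decode('UTF-8', $bytes);

# encode() turns the character string back into UTF-8 bytes for output.
my $out = encode('UTF-8', $chars);

# prints: bytes in: 4, characters: 3, bytes out: 4
printf "bytes in: %d, characters: %d, bytes out: %d\n",
    length($bytes), length($chars), length($out);
```

The length changes from 4 to 3 and back to 4, which shows that decoding and encoding really are conversions and not just labels.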

So, after loading the Encode module with "use Encode;", please replace your line
    my $data_structure = decode_json(`curl -X GET $url`);
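by something like
    my $data_structure = JSON::XS->new->decode(decode('utf8', `curl -X GET $url`));
(decode_json itself expects UTF-8 bytes that have not been decoded yet, so the decoded string is handed to a plain JSON::XS object instead)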
and replace analogously
    my $Post = `curl -X PUT $url -d '$returnJSON'`;
by
    my $JSONutf8 = encode('utf8', $returnJSON);
    my $Post = `curl -X PUT $url -d '$JSONutf8'`;
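
The whole round trip can be sketched without a server, assuming the core module JSON::PP as a stand-in for JSON::XS and a made-up response in place of the curl output. The document content and the names $response_bytes and $request_bytes are hypothetical. The JSON object is created without its utf8 option, so it works on character strings that have already been decoded.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use JSON::PP;

# Stand-in for the bytes curl would return (hypothetical document).
my $response_bytes = qq({"_id":"post-1","title":"Caf\xC3\xA9 \\"quoted\\""});

# utf8 option left off: this object reads and writes character strings.
# canonical() sorts the keys so the output is stable.
my $json = JSON::PP->new->canonical;

# Incoming: UTF-8 bytes -> characters -> Perl data structure.
my $doc = $json->decode(decode('UTF-8', $response_bytes));

# Add a value containing a double quote, a single quote and a backslash.
$doc->{content} = q{<p class="x">it's a \ test</p>};

# Outgoing: Perl data structure -> JSON characters -> UTF-8 bytes.
my $request_bytes = encode('UTF-8', $json->encode($doc));
print $request_bytes, "\n";
```

In this sketch the JSON encoder takes care of escaping the double quote and the backslash, and the final encode() produces the bytes that would be sent.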

This method helped me a lot to build and use a CouchDB database with many international names in its texts. Since the error message you included is related to UTF-8, it should be worthwhile trying it in your case.

Kind regards,
Raimund Riedel


On 07.04.2018 at 07:24, Bill Stephenson wrote:
I’ve been working on a “comments” feature for my “CherryPC blog”.

I don’t want readers to have to make a user account to comment so I’m wanting 
to use a perl script on the server side that has the user credentials in the 
$url variable below.

This is the code I’m using to update the document with the comment.

# Convert the JSON to a perl object

my $data_structure = decode_json(`curl -X GET $url`);

my $_id = $data_structure->{'_id'};
my $_rev = $data_structure->{'_rev'};
my $title = $data_structure->{'title'};
my $subtitle = $data_structure->{'subtitle'};
my $content = $data_structure->{'content'};
my $Text_publish = $data_structure->{'Text_publish'};
my $publishDate = $data_structure->{'publishDate'};


my $returnJSON = qq`{"_id": "$_id", "_rev": "$_rev", "title": "$title", "subtitle": "$subtitle", "content": "$content", 
"docType": "text", "Text_publish": "yes", "publishDate": "$publishDate",$newCommentsList}`;

my $Post = `curl -X PUT $url -d '$returnJSON'`;

This works fine with plain text, but the blog posts are made with TinyMCE and 
use HTML.  I can update them fine with Javascript and PouchDB, but Perl is 
dying on double quotes, single quotes, and backslashes:

' " \

I’ve narrowed it down to just those 3 characters. If I strip those from the 
html and comments it will all post fine, but html doesn’t work without those so 
that’s not an option.

I’m using these modules:

use strict;
use warnings;
use utf8;
use JSON::XS;
use Data::Dumper;
use CGI;

From what I understand, "use utf8" forces all the data to be UTF-8 encoded.
I’ve tried several different modules to encode the data, and I’ve also built
the entire document as a perl object and converted that to JSON instead of
using a simple string like the one above, but it still dies on those three
characters.

This is what the curl error tells me:

PUT Error: bad_request
reason: invalid UTF-8 JSON

So, it’s those 3 characters that are not being encoded correctly.

If anyone has any ideas and/or advice on how to deal with this I’d sure 
appreciate them. I’ve pretty much run out of them at this point.

Kindest Regards,

Bill Stephenson





--
Raimund Riedel
______________ rajmun...@gmail.com
______________ Mi parolas Esperanton
