Brian <br...@merrells.org> added the comment:
On Mon, Mar 14, 2011 at 4:09 PM, Raymond Hettinger
<rep...@bugs.python.org> wrote:
>
> Raymond Hettinger <rhettin...@users.sourceforge.net> added the comment:
>
> > We seem to be in the worst of both worlds right now
> > as I've generated and stored a lot of json that can
> > not be read back in
>
> This is unfortunate. dumps() should never have worked in the first
> place.
>
> I don't think that loads() should be changed to accommodate the dumps()
> error though. JSON is UTF-8 by definition and it is a useful feature that
> invalid UTF-8 won't load.
>
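For reference, here is a minimal Python 3 sketch of the strictness Raymond describes. It is an illustration, not code from the tracker. The strict UTF-8 codec rejects a lone surrogate such as U+DA00 in both directions. The `surrogatepass` error handler produces and accepts the three bytes `\xed\xa8\x80`, roughly the form a permissive encoder (such as Python 2's UTF-8 codec) would write. The helper names `strict_encode_fails` and `strict_decode_fails` are hypothetical.

```python
# Python 3 sketch: strict UTF-8 rejects a lone surrogate both ways.
lone = "\uda00"  # lone high surrogate, as discussed in this issue


def strict_encode_fails(s: str) -> bool:
    """Return True if strict UTF-8 encoding of s raises UnicodeEncodeError."""
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


def strict_decode_fails(b: bytes) -> bool:
    """Return True if strict UTF-8 decoding of b raises UnicodeDecodeError."""
    try:
        b.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# "surrogatepass" emits the 3-byte form a permissive encoder would write.
raw = lone.encode("utf-8", "surrogatepass")
print(raw)                                            # b'\xed\xa8\x80'
print(strict_encode_fails(lone))                      # True
print(strict_decode_fails(raw))                       # True
print(raw.decode("utf-8", "surrogatepass") == lone)   # True
```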
I may be wrong, but it appeared that json actually encoded the data as the
string "\uda00" (i.e., 6 bytes), which is slightly different from having
invalid UTF-8 bytes in the encoded JSON itself. I'm not sure whether this is
relevant, but it seems less severe than genuinely invalid UTF-8 in the bytes.
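>
> To fix the data you've already created (data that other compliant JSON
> readers wouldn't be able to parse), I think you need to reprocess those
> files to make them valid:
>
>     bs.decode('utf-8', errors='ignore').encode('utf-8')
>
Unfortunately, I don't believe this does anything on Python 2.x, as only
Python 3.x's encode/decode flags this as invalid.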
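The suggested repair, `bs.decode('utf-8', errors='ignore').encode('utf-8')`, can be checked in Python 3 against both ways the surrogate might be stored. The `repair` wrapper is a hypothetical name used only for this sketch. The sketch suggests the repair only helps when the surrogate was stored as raw invalid bytes. The ASCII escape passes through unchanged. On Python 2, whose UTF-8 codec accepted surrogates, the decode step would not drop anything either, which is consistent with the point that only Python 3 flags this as invalid.

```python
def repair(bs: bytes) -> bytes:
    """Suggested clean-up: decode as UTF-8 dropping undecodable bytes, then re-encode."""
    return bs.decode("utf-8", errors="ignore").encode("utf-8")


# Surrogate stored as raw bytes (invalid UTF-8 to Python 3): the bad bytes vanish.
invalid_bytes = b'["\xed\xa8\x80"]'
print(repair(invalid_bytes))  # b'[""]'

# Surrogate stored as the ASCII escape json.dumps writes by default: untouched.
escaped_bytes = b'["\\uda00"]'
print(repair(escaped_bytes) == escaped_bytes)  # True
```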
> ----------
> nosy: +rhettinger
> priority: normal -> high
>
> _______________________________________
> Python tracker <rep...@bugs.python.org>
> <http://bugs.python.org/issue11489>
> _______________________________________
>
----------
nosy: +merrellb
Added file: http://bugs.python.org/file21135/unnamed
_______________________________________________
Python-bugs-list mailing list