Brian <br...@merrells.org> added the comment:

On Mon, Mar 14, 2011 at 4:09 PM, Raymond Hettinger
<rep...@bugs.python.org> wrote:

>
> Raymond Hettinger <rhettin...@users.sourceforge.net> added the comment:
>
> > We seem to be in the worst of both worlds right now
> > as I've generated and stored a lot of json that can
> > not be read back in
>
> This is unfortunate.  The dumps() should have never worked in the first
> place.
>
> I don't think that loads() should be changed to accommodate the dumps()
> error though.  JSON is UTF-8 by definition and it is a useful feature that
> invalid UTF-8 won't load.
>

I may be wrong, but it appeared that json actually encoded the data as the
escape string "\uda00" (i.e. six ASCII bytes), which is not the same thing as
an invalid byte sequence in the UTF-8 encoding of the JSON text itself.  Not
sure if this is relevant, but it seems less severe than actually invalid
UTF-8 in the bytes.
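
The distinction can be checked with a small sketch, assuming Python 3 and the default ensure_ascii=True setting of json.dumps(). The value "\uda00" is used only as an illustration of a lone surrogate; it is not claimed to be the exact data from the original report.

```python
import json

# A lone high surrogate, used here purely as an illustrative value.
lone = "\uda00"

# With the default ensure_ascii=True, dumps() writes an ASCII escape
# sequence instead of raw bytes for the code point.
text = json.dumps(lone)
escape = text[1:-1]  # strip the surrounding double quotes

# The serialized text is pure ASCII, so encoding it as UTF-8 succeeds.
raw = text.encode("utf-8")

# Encoding the surrogate itself as UTF-8 is what Python 3 refuses to do.
try:
    lone.encode("utf-8")
    direct_encode_ok = True
except UnicodeEncodeError:
    direct_encode_ok = False

print(repr(text), len(escape), direct_encode_ok)
```

In this sketch the serialized document contains only ASCII characters, so any problem lies in what the escape denotes rather than in the bytes on disk.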

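> To fix the data you've already created (one that other compliant JSON
> readers wouldn't be able to parse), I think you need to reprocess those
> files to make them valid:
>
>    bs.decode('utf-8', errors='ignore').encode('utf-8')
>

Unfortunately, I don't believe this does anything on Python 2.x, as only
Python 3.x's encode/decode flags this as invalid.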
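
As a rough illustration of what the suggested round trip does, here is a sketch that assumes Python 3 only; Python 2 behavior is not shown. Both sample inputs and the helper name scrub are hypothetical and are not taken from the files discussed in this issue.

```python
# Hypothetical sample inputs, not taken from the reporter's files.
escaped = b'"\\uda00"'             # surrogate written as an ASCII escape
raw_surrogate = b'"\xed\xa8\x80"'  # surrogate written directly as bytes


def scrub(bs):
    """Apply the decode/encode round trip suggested in the quoted comment."""
    return bs.decode("utf-8", errors="ignore").encode("utf-8")


escaped_after = scrub(escaped)      # pure ASCII, nothing for 'ignore' to drop
raw_after = scrub(raw_surrogate)    # the three invalid bytes are dropped

print(escaped_after, raw_after)
```

Under these assumptions the round trip only alters data that contains invalid bytes; text that stores the surrogate as an ASCII escape passes through unchanged.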

> ----------
> nosy: +rhettinger
> priority: normal -> high
>
> _______________________________________
> Python tracker <rep...@bugs.python.org>
> <http://bugs.python.org/issue11489>
> _______________________________________
>

----------
nosy: +merrellb
Added file: http://bugs.python.org/file21135/unnamed