Gardenhose apparently returns illegal Unicode, as confirmed by both
PostgreSQL and Perl's Encode, which is well-trusted, high-mileage code.
We can certainly trap the illegal Unicode errors ourselves, but we need
to know whether you're aware of the problem, what the rationale is, and
what the plan of action is, if any. -- Alexy
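
For reference, a minimal sketch of the kind of trapping mentioned above,
written in Python rather than in the Perl loader. It assumes the problem
is the escape \uffff decoding to U+FFFF, which Unicode reserves as a
noncharacter. The helper names is_noncharacter and scrub are made up for
this example, the sample string is abbreviated from the quoted tweet, and
substituting U+FFFD is only one possible policy, not anything Twitter has
recommended.

```python
import json


def is_noncharacter(cp: int) -> bool:
    """True for the 66 code points Unicode reserves as noncharacters:
    U+FDD0..U+FDEF, plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def scrub(text: str, replacement: str = "\ufffd") -> str:
    """Replace noncharacters and lone surrogates with `replacement`.

    Hypothetical helper: the replacement policy is an assumption.
    """
    return "".join(
        replacement
        if is_noncharacter(ord(ch)) or 0xD800 <= ord(ch) <= 0xDFFF
        else ch
        for ch in text
    )


# Abbreviated from the tweet quoted in this thread; the raw string keeps
# the backslash escape exactly as it appears on the wire.
raw = r'{"text":"RT @TheLakersNation \uffffArtest looked great."}'
tweet = json.loads(raw)      # the escape is legal JSON and decodes to U+FFFF
clean = scrub(tweet["text"])  # safe to hand to a strict UTF-8 consumer
```

Scrubbing after the JSON parse, rather than editing the raw line, keeps
the escape handling in the parser and leaves the rest of the tweet
untouched.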

On Nov 21, 5:10 pm, braver <delivera...@gmail.com> wrote:
> I've tried loading the gardenhose via Perl's JSON, and it fails on
> quite a few Asian tweets with \uffff in them, e.g. the tweet with id
> 5277460813:
>
> {"text":"RT @RealLamarOdom \uffffIf you haven't heard it, go to
> www.richsoilclothing.com and look under \"updates\". Tell me what you
> think. It's hot!",...}
>
> Is it an artifact of downloading, or does Twitter serve illegal UTF-8?
> Here's an example of what Perl says about it, for another tweet:
>
> *** json ENCODING error: malformed or illegal unicode character in
> string [ Artest l], cannot convert to JSON at /home/alexyk/twitter/
> loader/jwilter.pl line 30, <> line 44817003.
>
>  {"in_reply_to_screen_name":null,"text":"RT @TheLakersNation
> \uffffArtest looked great. Lamar dominated the boards. Kobe is Kobe.
> And most importantly, the Lakers take the WIN!","source":"<a href=
> \"http://mobileways.de/gravity\" rel=\"nofollow\">Gravity</
> a>","in_reply_to_user_id":null,"in_reply_to_status_id":null,
> "truncated":false,"geo":null,"created_at":"Mon
> Nov 02 05:55:49 +0000 2009","user":
> {"profile_background_tile":false,"profile_sidebar_border_color":"BDDCAD",
> "following":null,"statuses_count":
> 243,"followers_count":33,"profile_image_url":"http://a3.twimg.com/
> profile_images/406146987/Real_Force_normal.jpg","friends_count":
> 93,"description":"My Love:Kobe Bryant,Los Angeles
> Lakers,NBA,Twitter,Music,Movie.I Love This Game.Determination:Let's
> again!","location":"CN","geo_enabled":false,
> "profile_background_color":"9AE4E8","screen_name":"Real_Force","favourites_count":
> 4,"verified":false,"notifications":null,"profile_text_color":"333333",
> "time_zone":"Beijing","protected":false,"url":"http://
> hi.baidu.com/real_force/","created_at":"Wed Sep 09 12:41:22 +0000
> 2009","profile_link_color":"0084B4","name":"Zhang
> Yuhao","profile_background_image_url":"http://a1.twimg.com/
> profile_background_images/36003404/
> photo_manipulation_photo_art_the_mansion.jpg","id":
> 72842359,"utc_offset":
> 28800,"profile_sidebar_fill_color":"DDFFCC"},"favorited":false,"id":
> 5357163705}
>
> PostgreSQL complains similarly when the text goes into a UTF8 text
> field.  Please clarify what you do to Unicode here!
> Cheers,
> Alexy
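
On the question of download artifact versus what is actually served: one
way to narrow it down is to check each raw line before JSON parsing. The
sketch below is in Python and rests on assumptions: diagnose and
ESCAPED_NONCHAR are names invented for this example, and it only looks
for BMP noncharacter escapes such as \uffff. Bytes that fail strict UTF-8
decoding point to transport or storage damage; a well-formed line that
carries a \uffff escape points to the served payload itself.

```python
import re

# A \u escape for a BMP noncharacter (U+FDD0..U+FDEF, U+FFFE, U+FFFF).
# The lookbehind plus the optional backslash pairs skip cases where the
# backslash is itself escaped (a literal backslash followed by "uffff").
ESCAPED_NONCHAR = re.compile(
    r"(?<!\\)(?:\\\\)*\\u(?:fdd[0-9a-f]|fde[0-9a-f]|fff[ef])",
    re.IGNORECASE,
)


def diagnose(line: bytes) -> str:
    """Classify one raw line of the stream (hypothetical helper)."""
    try:
        text = line.decode("utf-8")  # strict: raises on malformed bytes
    except UnicodeDecodeError:
        return "invalid-utf8-bytes"
    if ESCAPED_NONCHAR.search(text):
        return "escaped-noncharacter"
    return "ok"


print(diagnose(b'{"text":"RT @TheLakersNation \\uffffArtest looked great."}'))
```

If every failing line comes back as an escaped noncharacter, the bytes on
disk are fine and the complaint comes from the consumer's stricter view
of which code points are acceptable.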
