Siqi,

Yes, I wrote a custom parser for this, since I need only a small
part of the status object, and the volume of statuses is quite large.

I can share the Scala function that accounts for this change. It only
unescapes \uXXXX sequences whose code point is below 256 (ASCII plus
Latin-1) and leaves every other \u escape untouched.

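def unescape(s: String): String = {
  val len = s.length
  val sb = new StringBuilder(len)
  var i = 0

  while (i < len) {
    var c = s.charAt(i)
    i += 1

    // A backslash starts an escape sequence, provided a character follows it.
    if (c == '\\' && i < len) {
      val next = s.charAt(i)

      if (next == 'u') {
        // Unicode escape: backslash, 'u', then four hex digits.
        // Only decode it when all four digits are actually present.
        if (i + 5 <= len) {
          val code = Integer.parseInt(s.substring(i + 1, i + 5), 16)
          if (code < 256) {
            // Fits in a single byte: emit the decoded character.
            c = code.toChar
            i += 5
          }
          // Otherwise keep the backslash; the rest of the escape is
          // copied through unchanged on the following iterations.
        }
      } else {
        // Any other escaped character (quote, slash, backslash) is taken literally.
        c = next
        i += 1
      }
    }

    sb.append(c)
  }
  sb.toString
}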
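
For illustration, here is a more compact regex-based sketch of the same idea,
with a few sample inputs. The name unescapeLatin1 and the sample strings are
made up for this example and are not part of my parser. I have not run this
against real Twitter payloads, so treat it as a sketch. Note that a malformed
or truncated \u escape simply loses its backslash here.

```scala
import scala.util.matching.Regex

// Hypothetical helper, for illustration only.
// Group 1: the four hex digits of a unicode escape.
// Group 2: any other single character following a backslash.
val Escape: Regex = """\\(?:u([0-9a-fA-F]{4})|(.))""".r

def unescapeLatin1(s: String): String =
  Escape.replaceAllIn(s, m => {
    val out =
      if (m.group(1) != null) {
        val code = Integer.parseInt(m.group(1), 16)
        // Decode only code points below 256; keep larger ones escaped.
        if (code < 256) code.toChar.toString else m.matched
      } else {
        // Plain escape such as an escaped quote or slash: drop the backslash.
        m.group(2)
      }
    // Quote the result so backslashes and dollar signs are inserted literally.
    Regex.quoteReplacement(out)
  })

println(unescapeLatin1("caf\\u00e9"))       // prints: café
println(unescapeLatin1("a\\/b"))            // prints: a/b
println(unescapeLatin1("\\u4e2d\\u6587"))   // both code points are >= 256, so the text stays escaped
```

The regex version trades the explicit index bookkeeping for a single pattern,
at the cost of being slower on very large volumes.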

And I take back my statement about Twitter's encoding being wrong. It is
unusual but not wrong.


On Mon, Nov 1, 2010 at 6:37 PM, Siqi Li <siqi...@11outof10.com> wrote:

> Hi Harshad,
>
> What’s your parser? Did you write it yourself, or are you just modifying a plug-in?
>
> Could you share the main part?

cheers,
-- 
Harshad RJ
http://twitter.com/h__r__j
