Not sure if this helps: http://www.jguru.com/faq/view.jsp?EID=137049
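
In case it is useful, here is a minimal sketch of the state-machine idea from the quoted message below: scan the text, and whenever a literal backslash and "u" are followed by four hex digits, emit the character they name. The class and method names (UnicodeUnescaper, unescape, isHex) are made up for illustration and are not part of Tellurium or the JDK; the sketch also assumes only plain four-digit \uXXXX escapes need handling.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Tellurium or the JDK.
final class UnicodeUnescaper {

    /**
     * Replaces each literal "backslash, u, four hex digits" sequence with the
     * character it names. Everything else is copied through unchanged.
     */
    static String unescape(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char ch = s.charAt(i);
            // An escape needs six characters: backslash, 'u', and four hex digits.
            if (ch == '\\' && i + 5 < s.length()
                    && s.charAt(i + 1) == 'u' && isHex(s, i + 2, 4)) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(ch);
                i++;
            }
        }
        return out.toString();
    }

    // True if the 'count' characters starting at 'start' are all hex digits.
    private static boolean isHex(String s, int start, int count) {
        for (int k = start; k < start + count; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Doubled backslashes keep the escapes as literal text,
        // the same way they arrive from the data file.
        String raw = "My abstract is ... \\u004E\\u00FC\\u0068\\u0065 &";
        String decoded = unescape(raw);
        System.out.println(raw.length() + " -> " + decoded.length()); // prints "45 -> 25"
        // Encode with an explicit charset rather than the platform default.
        System.out.println(Arrays.toString(decoded.getBytes(StandardCharsets.UTF_8)));
    }
}
```

A custom type handler's valueOf(String s), like the PhoneNumberTypeHandler quoted below, could presumably return the result of unescape(s), but I have not tried this against Tellurium itself.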

On Fri, Aug 6, 2010 at 3:49 PM, Jian Fang <[email protected]> wrote:

> I see your problem here: in your input file the unicode escapes are stored
> as plain text, so the Java String treats them as ordinary characters. One
> thing you can do is convert the escaped text back to real unicode
> characters, then do the conversion to UTF-8.
>
> For example, you can have a state machine that looks for the start
> sequence "\u", i.e., the two characters "\" and "u"; once it sees them, it
> knows the next four hex digits are a unicode code.
>
> There may be a better way to handle this. I need to do some googling.
>
> Thanks,
>
> Jian
>
>
> On Fri, Aug 6, 2010 at 12:57 PM, Jade <[email protected]> wrote:
>
>> Hi Jian,
>>
>> Thank you for all of the information. I tried implementing the
>> UnicodeTypeHandler as you mentioned. However, the
>> new String(test.getBytes(),"UTF-8"); call isn't working correctly
>> because the bytes at that point are already incorrect.
>>
>> The input string s is:
>>
>> [\, Q, e, n, t, e, r, D, o, c, u, m, e, n, t, I, n, f, o, r, m, a, t,
>> i, o, n, |, T, e, s, t, , T, i, t, l, e, |, M, y, , a, b, s, t, r,
>> a, c, t, , i, s, , ., ., ., , \, u, 0, 0, 4, E, \, u, 0, 0, F, C,
>> \, u, 0, 0, 6, 8, \, u, 0, 0, 6, 5, , &, |, t, e, s, t, ;, , V, i,
>> r, e, o, |, 3, |, n, o, -, r, e, p, l, y, @, t, d, l, ., o, r, g, |,
>> t, r, u, e, \, E]
>>
>> and the part of the string that represents the data is:
>>
>> My abstract is ... \u004E\u00FC\u0068\u0065 &
>>
>> abstractText (in the data file):
>>
>> My abstract is ... \u004E\u00FC\u0068\u0065 &
>>
>> String test = "My abstract is ... \u004E\u00FC\u0068\u0065 &";
>>
>> In the console:
>>
>> abstractText: My abstract is ... \u004E\u00FC\u0068\u0065 &
>> test: My abstract is ... Nühe &
>>
>> I looked at the data in the debugger.
>>
>> In the debugger, the \u is double-escaped: \\u
>>
>> c: My abstract is ... \u004E\u00FC\u0068\u0065 &
>>
>> Each character is treated as a literal char; the \u was not
>> interpreted, so c is 45 chars:
>>
>> [M, y, , a, b, s, t, r, a, c, t, , i, s, , ., ., ., , \, u, 0, 0,
>> 4, E, \, u, 0, 0, F, C, \, u, 0, 0, 6, 8, \, u, 0, 0, 6, 5, , &]
>>
>> cBytes: [77, 121, 32, 97, 98, 115, 116, 114, 97, 99, 116, 32, 105,
>> 115, 32, 46, 46, 46, 32, 92, 117, 48, 48, 52, 69, 92, 117, 48, 48, 70,
>> 67, 92, 117, 48, 48, 54, 56, 92, 117, 48, 48, 54, 53, 32, 38]
>>
>> d: (25 chars) [M, y, , a, b, s, t, r, a, c, t, , i,
>> s, , ., ., ., , N, ü, h, e, , &]
>>
>> dBytes: [77, 121, 32, 97, 98, 115, 116, 114, 97, 99, 116, 32, 105,
>> 115, 32, 46, 46, 46, 32, 78, -61, -68, 104, 101, 32, 38]
>>
>>
>> Jade
>>
>> On Aug 5, 4:18 pm, Jian Fang <[email protected]> wrote:
>> > To save you time, I am posting an example type handler here:
>> >
>> > ----------------------------------------------------------------------
>> >
>> > package org.telluriumsource.ut
>> >
>> > import org.telluriumsource.test.ddt.mapping.type.TypeHandler
>> >
>> > class PhoneNumberTypeHandler implements TypeHandler{
>> >     protected final static String PHONE_SEPARATOR = "-"
>> >     protected final static int PHONE_LENGTH = 12
>> >
>> >     //remove the "-" inside the phone number
>> >     public String valueOf(String s) {
>> >         String value
>> >
>> >         if(s != null && (s.length() > 0)){
>> >             value = s.replaceAll(PHONE_SEPARATOR, "")
>> >         }else {
>> >             value = s
>> >         }
>> >
>> >         return value
>> >     }
>> >
>> > }
>> >
>> > On Thu, Aug 5, 2010 at 5:16 PM, Jian Fang <[email protected]> wrote:
>> >
>> > > It seems the following code could convert the unicode to a UTF-8
>> > > string.
>> > >
>> > > @Test
>> > > public void testUicode(){
>> > >     String test =
>> > >         "\u004E\u00FC\u0068\u0065\u00F0\u0061\u006E\u0020\u03AC\u03C1\u03C7";
>> > >     try {
>> > >         String c = new String(test.getBytes(),"UTF-8");
>> > >         System.out.println("Converted: " + c);
>> > >     } catch (UnsupportedEncodingException e) {
>> > >         e.printStackTrace();
>> > >     }
>> > > }
>> > >
>> > > For your data driven test, you need to create a custom type handler.
>> > > Please see the example here:
>> > >
>> > > http://code.google.com/p/aost/wiki/UserGuide070DetailsOnTellurium#typ...
>> > >
>> > > Thanks,
>> > >
>> > > Jian
>> > >
>> > > On Thu, Aug 5, 2010 at 4:59 PM, Jian Fang <[email protected]> wrote:
>> > >
>> > >> It seems you need to create a custom handler to convert the unicode
>> > >> to "UTF8" format. I will try to find some time to see if I can
>> > >> create some test code for you.
>> > >> Sorry for that, I am busy with Trump now.
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Jian
>> > >>
>> > >> On Thu, Aug 5, 2010 at 4:11 PM, Jade <[email protected]> wrote:
>> > >>
>> > >>> Hi,
>> > >>>
>> > >>> Some of our test data includes unicode characters, such as:
>> > >>>
>> > >>> enterDocumentInformation|Test Title|My abstract is ... \u004E\u00FC\u0068\u0065 &|test; Vireo|3|[email protected]|true
>> > >>>
>> > >>> However, the unicode characters aren't being decoded as they're
>> > >>> read in and bound to the variable.
>> > >>>
>> > >>> String abstractText = bind("DocumentInformationData.abstract")
>> > >>>
>> > >>> println "abstractText: ${abstractText}"
>> > >>>
>> > >>> String test = "\u004E\u00FC\u0068\u0065\u00F0\u0061\u006E\u0020\u03AC\u03C1\u03C7"
>> > >>> println "test: ${test}"
>> > >>>
>> > >>> abstractText: My abstract is ... \u004E\u00FC\u0068\u0065 &
>> > >>> test: Nüheðan άρχ
>> > >>>
>> > >>> Is there another method that I need to call to decode the unicode?
>> > >>>
>> > >>> Jade
>> > >>>
>> > >>> --
>> > >>> You received this message because you are subscribed to the Google
>> > >>> Groups "tellurium-users" group.
>> > >>> To post to this group, send email to [email protected].
>> > >>> To unsubscribe from this group, send email to [email protected].
>> > >>> For more options, visit this group at
>> > >>> http://groups.google.com/group/tellurium-users?hl=en.
