Thank you so much Jian! I got busy with the tests, and I missed your post. This is just the fix I needed.
Jade

On Aug 18, 12:21 am, Jian Fang <[email protected]> wrote:
> BTW, I used your data to create the following DDT module:
>
> class UnicodeModule extends TelluriumDataDrivenModule{
>   void defineModule() {
>     typeHandler "unicode", "org.telluriumsource.ddt.UnicodeTypeHandler"
>
>     fs.FieldSet(name: "record", description: "Data format for testing Unicode") {
>       Test(value: "testUnicode")
>       Field(name: "title", description: "test title")
>       Field(name: "abstract", type: "unicode", description: "abstract")
>       Field(name: "email", description: "email")
>       Field(name: "indicator", type: "boolean", description: "indicator")
>     }
>
>     defineTest("testUnicode") {
>       String title = bind("record.title")
>       String abst = bind("record.abstract")
>       String email = bind("record.email")
>       boolean indicator = bind("record.indicator")
>       println "$title, $abst, $email, $indicator"
>     }
>   }
> }
>
> where the "unicode" type handler is defined as follows.
>
> class UnicodeTypeHandler implements TypeHandler {
>
>   public String valueOf(String s) {
>     if(s == null || s.trim().length() == 0){
>       return s;
>     }
>     return parseUnicode(s);
>   }
>
> You can find the whole test case from trunk/core.
>
> Let us know if you have further problems.
>
> Thanks,
>
> Jian
>
> On Wed, Aug 18, 2010 at 1:18 AM, Jian Fang <[email protected]> wrote:
> > Finally found some time to get back to this topic. I found a utility method
> > as follows to parse unicode:
> >
> > // requires: import java.util.StringTokenizer;
> > public static String parseUnicode(String input)
> > {
> >     // Split on backslashes and keep each backslash as its own token.
> >     StringTokenizer st = new StringTokenizer(input, "\\", true);
> >     StringBuilder sb = new StringBuilder();
> >
> >     while (st.hasMoreTokens())
> >     {
> >         String token = st.nextToken();
> >         if (token.equals("\\"))
> >         {
> >             if (!st.hasMoreTokens())
> >             {
> >                 // Trailing backslash: keep it as-is.
> >                 sb.append(token);
> >                 break;
> >             }
> >             token = st.nextToken();
> >             if (token.charAt(0) == 'u')
> >             {
> >                 // Assumes up to four hex digits follow the 'u'.
> >                 String hexnum;
> >                 if (token.length() > 5)
> >                 {
> >                     hexnum = token.substring(1, 5);
> >                     token = token.substring(5);
> >                 }
> >                 else
> >                 {
> >                     hexnum = token.substring(1);
> >                     token = "";
> >                 }
> >                 sb.append((char) Integer.parseInt(hexnum, 16));
> >             }
> >             else
> >             {
> >                 // Not a unicode escape: keep the backslash that was consumed.
> >                 sb.append('\\');
> >             }
> >         }
> >         sb.append(token);
> >     }
> >     return sb.toString();
> > }
> >
> > On Fri, Aug 6, 2010 at 3:53 PM, Jian Fang <[email protected]> wrote:
> >
> >> Not sure if this helps:
> >
> >> http://www.jguru.com/faq/view.jsp?EID=137049
> >
> >> On Fri, Aug 6, 2010 at 3:49 PM, Jian Fang <[email protected]> wrote:
> >
> >>> I see your problem here. In your input file, the unicode is presented as
> >>> plain text, so the Java String treats each escape as ordinary characters.
> >>> One thing you can do is to convert the escaped text back to the actual
> >>> unicode characters, then do the conversion to utf-8.
> >
> >>> For example, you can have a state machine that looks for the start
> >>> sequence "\u", i.e., the two characters "\" and "u"; then you know the
> >>> next four characters are the hex digits of a unicode character.
> >
> >>> There may be some better way to handle this. Need to do some googling.
> >
> >>> Thanks,
> >
> >>> Jian
> >
> >>> On Fri, Aug 6, 2010 at 12:57 PM, Jade <[email protected]> wrote:
> >
> >>>> Hi Jian,
> >
> >>>> Thank you for all of the information. I tried implementing the
> >>>> UnicodeTypeHandler as you mentioned.
> >>>> However, the new String(test.getBytes(),"UTF-8"); call isn't working
> >>>> correctly because the bytes at that point are already not correct.
> >
> >>>> The input string s is:
> >
> >>>> [\, Q, e, n, t, e, r, D, o, c, u, m, e, n, t, I, n, f, o, r, m, a, t,
> >>>> i, o, n, |, T, e, s, t, , T, i, t, l, e, |, M, y, , a, b, s, t, r,
> >>>> a, c, t, , i, s, , ., ., ., , \, u, 0, 0, 4, E, \, u, 0, 0, F, C,
> >>>> \, u, 0, 0, 6, 8, \, u, 0, 0, 6, 5, , &, |, t, e, s, t, ;, , V, i,
> >>>> r, e, o, |, 3, |, n, o, -, r, e, p, l, y, @, t, d, l, ., o, r, g, |,
> >>>> t, r, u, e, \, E]
> >
> >>>> and the part of the string that represents the data is:
> >
> >>>> My abstract is ... \u004E\u00FC\u0068\u0065 &
> >
> >>>> abstractText (in the data file): My abstract is ... \u004E\u00FC\u0068\u0065 &
> >
> >>>> String test = "My abstract is ... \u004E\u00FC\u0068\u0065 &";
> >
> >>>> In the console:
> >
> >>>> abstractText: My abstract is ... \u004E\u00FC\u0068\u0065 &
> >>>> test: My abstract is ... Nühe &
> >
> >>>> I looked at the data in the debugger:
> >
> >>>> In the debugger, the \u is double-escaped: \\u
> >>>> c: My abstract is ... \u004E\u00FC\u0068\u0065 &
> >
> >>>> Each character of the escape is seen as a separate char; the \u was not
> >>>> interpreted, so c is 45 chars.
> >>>> [M, y, , a, b, s, t, r, a, c, t, , i, s, , ., ., ., , \, u, 0, 0,
> >>>> 4, E, \, u, 0, 0, F, C, \, u, 0, 0, 6, 8, \, u, 0, 0, 6, 5, , &]
> >
> >>>> cBytes: [77, 121, 32, 97, 98, 115, 116, 114, 97, 99, 116, 32, 105,
> >>>> 115, 32, 46, 46, 46, 32, 92, 117, 48, 48, 52, 69, 92, 117, 48, 48, 70,
> >>>> 67, 92, 117, 48, 48, 54, 56, 92, 117, 48, 48, 54, 53, 32, 38]
> >
> >>>> d: (25 chars) [M, y, , a, b, s, t, r, a, c, t, , i,
> >>>> s, , ., ., ., , N, ü, h, e, , &]
> >
> >>>> dBytes: [77, 121, 32, 97, 98, 115, 116, 114, 97, 99, 116, 32, 105,
> >>>> 115, 32, 46, 46, 46, 32, 78, -61, -68, 104, 101, 32, 38]
> >
> >>>> Jade
> >
> >>>> On Aug 5, 4:18 pm, Jian Fang <[email protected]> wrote:
> >>>> > To save you time, I am posting an example type handler here:
> >
> >>>> ----------------------------------------------------------------------------------------------------
> >
> >>>> > package org.telluriumsource.ut
> >
> >>>> > import org.telluriumsource.test.ddt.mapping.type.TypeHandler
> >
> >>>> > class PhoneNumberTypeHandler implements TypeHandler{
> >>>> >   protected final static String PHONE_SEPARATOR = "-"
> >>>> >   protected final static int PHONE_LENGTH = 12
> >
> >>>> >   //remove the "-" inside the phone number
> >>>> >   public String valueOf(String s) {
> >>>> >     String value
> >
> >>>> >     if(s != null && (s.length() > 0)){
> >>>> >       value = s.replaceAll(PHONE_SEPARATOR, "")
> >>>> >     }else {
> >>>> >       value = s
> >>>> >     }
> >
> >>>> >     return value
> >>>> >   }
> >
> >>>> > }
> >>>> > On Thu, Aug 5, 2010 at 5:16 PM, Jian Fang <[email protected]> wrote:
> >>>> > > Seems the following code could convert the unicode to a utf-8 string.
> >
> >>>> > > @Test
> >>>> > > public void testUicode(){
> >>>> > >   String test = "\u004E\u00FC\u0068\u0065\u00F0\u0061\u006E\u0020\u03AC\u03C1\u03C7";
> >>>> > >   try {
> >>>> > >     String c = new String(test.getBytes(),"UTF-8");
> >>>> > >     System.out.println("Converted: " + c);
> >>>> > >   } catch (UnsupportedEncodingException e) {
> >>>> > >     e.printStackTrace();
> >>>> > >   }
> >>>> > > }
> >
> >>>> > > For your data driven test, you need to create a custom type handler.
> >>>> > > Please see the example here:
> >
> >>>> http://code.google.com/p/aost/wiki/UserGuide070DetailsOnTellurium#typ...
> >
> >>>> > > Thanks,
> >
> >>>> > > Jian
> >
> >>>> > > On Thu, Aug 5, 2010 at 4:59 PM, Jian Fang <[email protected]> wrote:
> >
> >>>> > >> Seems you need to create a custom handler to convert the unicode to
> >>>> > >> "UTF8" format. I will try to find some time to see if I can create
> >>>> > >> some test code for you.
> >>>> > >> Sorry for that, I am busy with Trump now.
> >
> >>>> > >> Thanks,
> >
> >>>> > >> Jian
> >
> >>>> > >> On Thu, Aug 5, 2010 at 4:11 PM, Jade <[email protected]> wrote:
> >
> >>>> > >>> Hi,
> >
> >>>> > >>> Some of our test data includes unicode characters, such as:
> >
> >>>> > >>> enterDocumentInformation|Test Title|My abstract is ... \u004E\u00FC\u0068\u0065 &|test; Vireo|3|[email protected]|true
> >
> >>>> > >>> However, the unicode escapes aren't being decoded as they're read
> >>>> > >>> in and bound to the variable.
> >
> >>>> > >>> String abstractText = bind("DocumentInformationData.abstract")
> >
> >>>> > >>> println "abstractText: ${abstractText}"
> >
> >>>> > >>> String test = "\u004E\u00FC\u0068\u0065\u00F0\u0061\u006E\u0020\u03AC\u03C1\u03C7"
> >>>> > >>> println "test: ${test}"
> >
> >>>> > >>> abstractText: My abstract is ... \u004E\u00FC\u0068\u0065 &
> >>>> > >>> test: Nüheðan άρχ
> >
> >>>> > >>> Is there another method that I need to call to decode the
> >>>> > >>> unicode escapes?
> >
> >>>> > >>> Jade

--
You received this message because you are subscribed to the Google Groups "tellurium-users" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/tellurium-users?hl=en.
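
For anyone landing on this thread later: the escape-parsing idea discussed above can also be sketched with a regular expression instead of a tokenizer. This is only a minimal sketch, not the code in Tellurium trunk. The class name UnicodeEscapes and the method name decode are made up for illustration, and it assumes every escape in the data file is a backslash, the letter u, and exactly four hex digits, as in Jade's sample record. Anything that does not match that shape is left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tellurium.
public class UnicodeEscapes {

    // A literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces each textual escape with the character it names.
    // Text that does not match the pattern is copied through unchanged.
    public static String decode(String input) {
        if (input == null || input.isEmpty()) {
            return input;
        }
        Matcher m = ESCAPE.matcher(input);
        StringBuilder sb = new StringBuilder(input.length());
        int last = 0;
        while (m.find()) {
            // Copy the plain text before this escape.
            sb.append(input, last, m.start());
            // Convert the four hex digits into a single char.
            sb.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        // Copy whatever follows the final escape.
        sb.append(input, last, input.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        // The doubled backslashes make this the same 45-char text
        // that comes out of the data file, not a compiled escape.
        String raw = "My abstract is ... \\u004E\\u00FC\\u0068\\u0065 &";
        String decoded = decode(raw);
        System.out.println(raw.length() + " chars in, "
                + decoded.length() + " chars out");
        System.out.println(decoded);
    }
}
```

A custom type handler's valueOf(String) could delegate to a method like this, the same way UnicodeTypeHandler delegates to parseUnicode. Note that no getBytes() round trip is involved: once the escapes are decoded, the Java String already holds the right characters, and the encoding only matters when the string is written out.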
