BTW, I used your data to create the following DDT module:
class UnicodeModule extends TelluriumDataDrivenModule{
    void defineModule() {
        typeHandler "unicode", "org.telluriumsource.ddt.UnicodeTypeHandler"

        fs.FieldSet(name: "record", description: "Data format for testing Unicode") {
            Test(value: "testUnicode")
            Field(name: "title", description: "test title")
            Field(name: "abstract", type: "unicode", description: "abstract")
            Field(name: "email", description: "email")
            Field(name: "indicator", type: "boolean", description: "indicator")
        }

        defineTest("testUnicode") {
            String title = bind("record.title")
            String abst = bind("record.abstract")
            String email = bind("record.email")
            boolean indicator = bind("record.indicator")
            println "$title, $abst, $email, $indicator"
        }
    }
}
where the "unicode" type handler is defined as follows.
class UnicodeTypeHandler implements TypeHandler {
    public String valueOf(String s) {
        // Nothing to decode in a null or blank field
        if(s == null || s.trim().length() == 0){
            return s;
        }
        // parseUnicode is the utility method quoted below
        return parseUnicode(s);
    }
}
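If you want to try the conversion outside Tellurium, here is a minimal standalone Java sketch. It is not the code in trunk: the class name UnicodeEscapeDemo and the helpers decode and isHex are made up for illustration, and unlike the tokenizer-based utility it copies malformed escapes through unchanged instead of trying to parse them. The sample string is the abstract field from your data.

```java
class UnicodeEscapeDemo {

    // Hypothetical helper: replaces each literal backslash-u escape (four hex
    // digits) with the character it names and copies everything else as-is.
    static String decode(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // An escape needs a backslash, a 'u', and four hex digits.
            if (c == '\\' && i + 5 < input.length()
                    && input.charAt(i + 1) == 'u' && isHex(input, i + 2)) {
                out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True when the four characters starting at 'from' are all hex digits.
    static boolean isHex(String s, int from) {
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same text as the data file: the backslashes are literal characters.
        String raw = "My abstract is ... \\u004E\\u00FC\\u0068\\u0065 &";
        String decoded = decode(raw);
        System.out.println(raw.length() + " chars -> " + decoded.length() + " chars");
        System.out.println(decoded);
    }
}
```

On your sample this should report 45 chars going in and 25 coming out, which matches the c and d values you saw in the debugger.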
You can find the whole test case in trunk/core.
Let us know if you have further problems.
Thanks,
Jian
On Wed, Aug 18, 2010 at 1:18 AM, Jian Fang <[email protected]> wrote:
> I finally found some time to get back to this topic. I found the following
> utility method for parsing Unicode escapes:
>
> public static String parseUnicode(String input)
> {
>     StringTokenizer st = new StringTokenizer(input, "\\", true);
>
>     StringBuffer sb = new StringBuffer();
>
>     while(st.hasMoreTokens())
>     {
>         String token = st.nextToken();
>         if (token.charAt(0) == '\\' && token.length() == 1)
>         {
>             if(st.hasMoreTokens())
>             {
>                 token = st.nextToken();
>             }
>             if(token.charAt(0) == 'u')
>             {
>                 String hexnum;
>                 if (token.length() > 5)
>                 {
>                     hexnum = token.substring(1,5);
>                     token = token.substring(5);
>                 }
>                 else
>                 {
>                     hexnum = token.substring(1);
>                     token = "";
>                 }
>                 sb.append((char)Integer.parseInt(hexnum, 16));
>             }
>         }
>         sb.append(token);
>     }
>     return sb.toString();
> }
>
>
> On Fri, Aug 6, 2010 at 3:53 PM, Jian Fang <[email protected]> wrote:
>
>>
>>
>>
>> Not sure if this helps:
>>
>> http://www.jguru.com/faq/view.jsp?EID=137049
>>
>>
>> On Fri, Aug 6, 2010 at 3:49 PM, Jian Fang <[email protected]> wrote:
>>
>>> I see your problem: in your input file the Unicode escapes are stored as
>>> plain text, so Java reads each one as six literal characters rather than as
>>> a single character. One thing you can do is convert the escaped text back
>>> into real Unicode characters, then do the conversion to UTF-8.
>>>
>>> For example, you can write a small state machine that looks for the start
>>> sequence "\u", i.e., the two characters "\" and "u"; once it sees them, it
>>> knows the next few characters are the hex digits of a Unicode character.
>>>
>>> There may be a better way to handle this. I need to do some googling.
>>>
>>> Thanks,
>>>
>>> Jian
>>>
>>>
>>> On Fri, Aug 6, 2010 at 12:57 PM, Jade <[email protected]> wrote:
>>>
>>>> Hi Jian,
>>>>
>>>> Thank you for all of the information. I tried implementing the
>>>> UnicodeTypeHandler as you mentioned. However, the new
>>>> String(test.getBytes(),"UTF-8"); call isn't working correctly because
>>>> the bytes at that point are already not correct.
>>>>
>>>> The input string s is:
>>>>
>>>> [\, Q, e, n, t, e, r, D, o, c, u, m, e, n, t, I, n, f, o, r, m, a, t,
>>>> i, o, n, |, T, e, s, t, , T, i, t, l, e, |, M, y, , a, b, s, t, r,
>>>> a, c, t, , i, s, , ., ., ., , \, u, 0, 0, 4, E, \, u, 0, 0, F, C,
>>>> \, u, 0, 0, 6, 8, \, u, 0, 0, 6, 5, , &, |, t, e, s, t, ;, , V, i,
>>>> r, e, o, |, 3, |, n, o, -, r, e, p, l, y, @, t, d, l, ., o, r, g, |,
>>>> t, r, u, e, \, E]
>>>>
>>>> and the part of the string that represents the data is:
>>>>
>>>> My abstract is ... \u004E\u00FC\u0068\u0065 &
>>>>
>>>> abstractText (in the data file): My abstract is ... \u004E\u00FC\u0068\u0065 &
>>>>
>>>> String test = "My abstract is ... \u004E\u00FC\u0068\u0065 &";
>>>>
>>>> In the console:
>>>>
>>>> abstractText: My abstract is ... \u004E\u00FC\u0068\u0065 &
>>>> test: My abstract is ... Nühe &
>>>>
>>>> I looked at the data in the debugger:
>>>>
>>>> In the debugger, the \u is double-escaped: \\u
>>>> c: My abstract is ... \u004E\u00FC\u0068\u0065 &
>>>>
>>>> Each char is seen as a literal char; the \u was not interpreted as an
>>>> escape, so c is 45 chars:
>>>> [M, y, , a, b, s, t, r, a, c, t, , i, s, , ., ., ., , \, u, 0, 0,
>>>> 4, E, \, u, 0, 0, F, C, \, u, 0, 0, 6, 8, \, u, 0, 0, 6, 5, , &]
>>>>
>>>> cBytes: [77, 121, 32, 97, 98, 115, 116, 114, 97, 99, 116, 32, 105,
>>>> 115, 32, 46, 46, 46, 32, 92, 117, 48, 48, 52, 69, 92, 117, 48, 48, 70,
>>>> 67, 92, 117, 48, 48, 54, 56, 92, 117, 48, 48, 54, 53, 32, 38]
>>>>
>>>> d: (25 chars) [M, y, , a, b, s, t, r, a, c, t, , i,
>>>> s, , ., ., ., , N, ü, h, e, , &]
>>>>
>>>> dBytes: [77, 121, 32, 97, 98, 115, 116, 114, 97, 99, 116, 32, 105,
>>>> 115, 32, 46, 46, 46, 32, 78, -61, -68, 104, 101, 32, 38]
>>>>
>>>>
>>>> Jade
>>>>
>>>> On Aug 5, 4:18 pm, Jian Fang <[email protected]> wrote:
>>>> > To save you time, here is an example type handler:
>>>> >
>>>> >
>>>> ----------------------------------------------------------------------------------------------------
>>>> >
>>>> > package org.telluriumsource.ut
>>>> >
>>>> > import org.telluriumsource.test.ddt.mapping.type.TypeHandler
>>>> >
>>>> > class PhoneNumberTypeHandler implements TypeHandler{
>>>> >     protected final static String PHONE_SEPARATOR = "-"
>>>> >     protected final static int PHONE_LENGTH = 12
>>>> >
>>>> >     //remove the "-" inside the phone number
>>>> >     public String valueOf(String s) {
>>>> >         String value
>>>> >
>>>> >         if(s != null && (s.length() > 0)){
>>>> >             value = s.replaceAll(PHONE_SEPARATOR, "")
>>>> >         }else {
>>>> >             value = s
>>>> >         }
>>>> >
>>>> >         return value
>>>> >     }
>>>> >
>>>> > }
>>>> > On Thu, Aug 5, 2010 at 5:16 PM, Jian Fang <[email protected]>
>>>> wrote:
>>>> > > It seems the following code can convert the Unicode to a UTF-8 string.
>>>> >
>>>> > > @Test
>>>> > > public void testUicode(){
>>>> > >     String test = "\u004E\u00FC\u0068\u0065\u00F0\u0061\u006E\u0020\u03AC\u03C1\u03C7";
>>>> > >     try {
>>>> > >         String c = new String(test.getBytes(),"UTF-8");
>>>> > >         System.out.println("Converted: " + c);
>>>> > >     } catch (UnsupportedEncodingException e) {
>>>> > >         e.printStackTrace();
>>>> > >     }
>>>> > > }
>>>> >
>>>> > > For your data driven test, you need to create a custom type handler.
>>>> > > Please see the example here:
>>>> >
>>>> > >
>>>> http://code.google.com/p/aost/wiki/UserGuide070DetailsOnTellurium#typ.
>>>> ..
>>>> >
>>>> > > Thanks,
>>>> >
>>>> > > Jian
>>>> >
>>>> > > On Thu, Aug 5, 2010 at 4:59 PM, Jian Fang <[email protected]
>>>> >wrote:
>>>> >
>>>> > >> It seems you need to create a custom type handler to convert the
>>>> > >> Unicode to "UTF-8" format. I will try to find some time to see if I
>>>> > >> can create some test code for you.
>>>> > >> Sorry about that, I am busy with Trump now.
>>>> >
>>>> > >> Thanks,
>>>> >
>>>> > >> Jian
>>>> >
>>>> > >> On Thu, Aug 5, 2010 at 4:11 PM, Jade <[email protected]> wrote:
>>>> >
>>>> > >>> Hi,
>>>> >
>>>> > >>> Some of our test data includes unicode characters, such as:
>>>> >
>>>> > >>> enterDocumentInformation|Test Title|My abstract is ... \u004E\u00FC\u0068\u0065 &|test; Vireo|3|[email protected]|true
>>>> >
>>>> > >>> However, the unicode characters aren't being decoded as they're
>>>> > >>> read in and bound to the variable.
>>>> >
>>>> > >>> String abstractText = bind("DocumentInformationData.abstract")
>>>> >
>>>> > >>> println "abstractText: ${abstractText}"
>>>> >
>>>> > >>> String test = "\u004E\u00FC\u0068\u0065\u00F0\u0061\u006E\u0020\u03AC\u03C1\u03C7"
>>>> > >>> println "test: ${test}"
>>>> >
>>>> > >>> abstractText: My abstract is ... \u004E\u00FC\u0068\u0065 &
>>>> > >>> test: Nüheðan άρχ
>>>> >
>>>> > >>> Is there another method that I need to call to decode the unicode?
>>>> >
>>>> > >>> Jade
>>>> >
>>>>
>>>
>>>
>>
>>
>
--
You received this message because you are subscribed to the Google Groups
"tellurium-users" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/tellurium-users?hl=en.