unicode characters aren't displaying correctly with data driven test

Jian Fang Fri, 06 Aug 2010 12:53:06 -0700

Not sure if this helps:

http://www.jguru.com/faq/view.jsp?EID=137049
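
In case it is useful, here is a minimal sketch of the state-machine idea from my earlier message (quoted below). The names UnicodeUnescaper and unescape are made up for illustration and are not part of Tellurium. It only handles the simple case of a backslash, a "u", and exactly four hex digits, and copies everything else through unchanged:

```java
public class UnicodeUnescaper {

    // Hypothetical helper (not a Tellurium API): replaces each literal escape,
    // i.e. a backslash, a 'u' and four hex digits, with the character it names.
    public static String unescape(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            if (isEscapeAt(s, i)) {
                // Parse the four hex digits that follow the backslash and the 'u'.
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                // Not the start of an escape: copy the character as-is.
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // True when position i starts a complete six-character escape.
    private static boolean isEscapeAt(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the compiler from decoding the escapes,
        // so this string holds the same 45 chars that come from the data file.
        String fromFile = "My abstract is ... \\u004E\\u00FC\\u0068\\u0065 &";
        System.out.println(unescape(fromFile));
    }
}
```

For the abstract in your data file, this should turn the 45-char escaped text into the 25-char decoded text. It does not treat a doubled backslash specially, so adjust it if your data can contain one.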



On Fri, Aug 6, 2010 at 3:49 PM, Jian Fang <[email protected]> wrote:

> I see your problem here: in your input file, the Unicode escapes are stored
> as plain text, so Java reads each one as six ordinary characters rather than
> as a single character. One thing you can do is convert the escaped text back
> into the real Unicode characters, then do the conversion to UTF-8.
>
> For example, you can write a small state machine that looks for the start
> marker "\u", i.e., the two characters "\" and "u", and then treats the next
> four characters as the hex code of one Unicode character.
>
> There may be a better way to handle this. I need to do some googling.
>
> Thanks,
>
> Jian
>
>
> On Fri, Aug 6, 2010 at 12:57 PM, Jade <[email protected]> wrote:
>
>> Hi Jian,
>>
>> Thank you for all of the information. I tried implementing the
>> UnicodeTypeHandler as you mentioned. However, the new
>> String(test.getBytes(),"UTF-8"); call isn't working correctly because
>> the bytes at that point are already not correct.
>>
>> The input string s is:
>>
>> [\, Q, e, n, t, e, r, D, o, c, u, m, e, n, t, I, n, f, o, r, m, a, t,
>> i, o, n, |, T, e, s, t,  , T, i, t, l, e, |, M, y,  , a, b, s, t, r,
>> a, c, t,  , i, s,  , ., ., .,  , \, u, 0, 0, 4, E, \, u, 0, 0, F, C,
>> \, u, 0, 0, 6, 8, \, u, 0, 0, 6, 5,  , &, |, t, e, s, t, ;,  , V, i,
>> r, e, o, |, 3, |, n, o, -, r, e, p, l, y, @, t, d, l, ., o, r, g, |,
>> t, r, u, e, \, E]
>>
>> and the part of the string that represents the data is:
>>
>> My abstract is ... \u004E\u00FC\u0068\u0065 &
>>
>> abstractText (in the data file): My abstract is ... \u004E\u00FC\u0068\u0065 &
>>
>> String test = "My abstract is ... \u004E\u00FC\u0068\u0065 &";
>>
>> In the console:
>>
>> abstractText: My abstract is ... \u004E\u00FC\u0068\u0065 &
>> test: My abstract is ... Nühe &
>>
>> I looked at the data in the debugger:
>>
>> In the debugger, the \u is double-escaped: \\u
>> c: My abstract is ... \u004E\u00FC\u0068\u0065 &
>>
>> Each char is seen as a literal char; the \u was not interpreted as an
>> escape, so c is 45 chars:
>> [M, y,  , a, b, s, t, r, a, c, t,  , i, s,  , ., ., .,  , \, u, 0, 0,
>> 4, E, \, u, 0, 0, F, C, \, u, 0, 0, 6, 8, \, u, 0, 0, 6, 5,  , &]
>>
>> cBytes: [77, 121, 32, 97, 98, 115, 116, 114, 97, 99, 116, 32, 105,
>> 115, 32, 46, 46, 46, 32, 92, 117, 48, 48, 52, 69, 92, 117, 48, 48, 70,
>> 67, 92, 117, 48, 48, 54, 56, 92, 117, 48, 48, 54, 53, 32, 38]
>>
>> d: (25 chars) [M, y,  , a, b, s, t, r, a, c, t,  , i,
>> s,  , ., ., .,  , N, ü, h, e,  , &]
>>
>> dBytes: [77, 121, 32, 97, 98, 115, 116, 114, 97, 99, 116, 32, 105,
>> 115, 32, 46, 46, 46, 32, 78, -61, -68, 104, 101, 32, 38]
>>
>>
>> Jade
>>
>> On Aug 5, 4:18 pm, Jian Fang <[email protected]> wrote:
>> > To save you time, here is an example type handler:
>> >
>> >
>> ----------------------------------------------------------------------------------------------------
>> >
>> > package org.telluriumsource.ut
>> >
>> > import org.telluriumsource.test.ddt.mapping.type.TypeHandler
>> >
>> > class PhoneNumberTypeHandler implements TypeHandler{
>> >     protected final static String PHONE_SEPARATOR = "-"
>> >     protected final static int PHONE_LENGTH = 12
>> >
>> >     //remove the "-" inside the phone number
>> >     public String valueOf(String s) {
>> >         String value
>> >
>> >         if(s != null && (s.length() > 0)){
>> >              value = s.replaceAll(PHONE_SEPARATOR, "")
>> >         }else {
>> >             value = s
>> >         }
>> >
>> >         return value
>> >     }
>> >
>> > }
>> > On Thu, Aug 5, 2010 at 5:16 PM, Jian Fang <[email protected]> wrote:
>> > > It seems the following code could convert the Unicode to a UTF-8 string.
>> >
>> > >    @Test
>> > >     public void testUnicode(){
>> > >         String test =
>> > > "\u004E\u00FC\u0068\u0065\u00F0\u0061\u006E\u0020\u03AC\u03C1\u03C7";
>> > >         try {
>> > >             String c = new String(test.getBytes(),"UTF-8");
>> > >             System.out.println("Converted: " + c);
>> > >         } catch (UnsupportedEncodingException e) {
>> > >             e.printStackTrace();
>> > >         }
>> > >     }
>> >
>> > > For your data driven test, you need to create a custom type handler.
>> > > Please see the example here:
>> >
>> > >http://code.google.com/p/aost/wiki/UserGuide070DetailsOnTellurium#typ.
>> ..
>> >
>> > > Thanks,
>> >
>> > > Jian
>> >
>> > > On Thu, Aug 5, 2010 at 4:59 PM, Jian Fang <[email protected]> wrote:
>> >
>> > >> It seems you need to create a custom handler to convert the Unicode to
>> > >> "UTF8" format. I will try to find some time to see if I can create some
>> > >> test code for you.
>> > >> Sorry about that, I am busy with Trump now.
>> >
>> > >> Thanks,
>> >
>> > >> Jian
>> >
>> > >> On Thu, Aug 5, 2010 at 4:11 PM, Jade <[email protected]> wrote:
>> >
>> > >>> Hi,
>> >
>> > >>> Some of our test data includes unicode characters, such as:
>> >
>> > >>> enterDocumentInformation|Test Title|My abstract is ... \u004E\u00FC\u0068\u0065 &|test; Vireo|3|[email protected]|true
>> >
>> > >>> However, the unicode characters aren't being decoded as they're read
>> > >>> in and bound to the variable.
>> >
>> > >>> String abstractText = bind("DocumentInformationData.abstract")
>> >
>> > >>> println "abstractText: ${abstractText}"
>> >
>> > >>> String test = "\u004E\u00FC\u0068\u0065\u00F0\u0061\u006E\u0020\u03AC\u03C1\u03C7"
>> > >>> println "test: ${test}"
>> >
>> > >>> abstractText: My abstract is ... \u004E\u00FC\u0068\u0065 &
>> > >>> test: Nüheðan άρχ
>> >
>> > >>> Is there another method that I need to call to decode the unicode?
>> >
>> > >>> Jade
>> >
>> > >>> --
>> > >>> You received this message because you are subscribed to the Google
>> Groups
>> > >>> "tellurium-users" group.
>> > >>> To post to this group, send email to
>> [email protected].
>> > >>> To unsubscribe from this group, send email to
>> > >>> [email protected]<tellurium-users%[email protected]>
>> <tellurium-users%[email protected]<tellurium-users%[email protected]>
>> >
>> > >>> .
>> > >>> For more options, visit this group at
>> > >>>http://groups.google.com/group/tellurium-users?hl=en.
>>
>
>
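
For the data driven test itself, the unescaping could live in a custom type handler. Below is a rough sketch, assuming the TypeHandler interface has the single valueOf(String) method shown in the PhoneNumberTypeHandler example quoted above. The nested TypeHandler is only a stand-in so the snippet compiles on its own, and UnicodeTypeHandlerSketch is a made-up wrapper name. Note that new String(s.getBytes(), "UTF-8") cannot do this job by itself: it only re-encodes and re-decodes the characters already in the string, so with the usual ASCII-compatible default charsets the six characters of each escape come back unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeTypeHandlerSketch {

    // Stand-in for org.telluriumsource.test.ddt.mapping.type.TypeHandler so the
    // sketch compiles alone; the method mirrors the quoted phone number example.
    public interface TypeHandler {
        String valueOf(String s);
    }

    // Hypothetical handler: decodes escapes that were read from the data file
    // as plain text.
    public static class UnicodeTypeHandler implements TypeHandler {
        // A backslash, a 'u', then exactly four hex digits.
        private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

        public String valueOf(String s) {
            if (s == null || s.isEmpty()) {
                return s;
            }
            Matcher m = ESCAPE.matcher(s);
            StringBuffer out = new StringBuffer();
            while (m.find()) {
                // Turn the four captured hex digits into the char they name.
                char decoded = (char) Integer.parseInt(m.group(1), 16);
                // quoteReplacement keeps a decoded '$' or backslash literal.
                m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
            }
            m.appendTail(out);
            return out.toString();
        }
    }

    public static void main(String[] args) {
        TypeHandler handler = new UnicodeTypeHandler();
        System.out.println(handler.valueOf("My abstract is ... \\u004E\\u00FC\\u0068\\u0065 &"));
    }
}
```

In the real test, the class would implement the Tellurium interface instead of the stand-in and be registered the same way as the phone number handler.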

