souri datta wrote:
Hi,
I have an external function which passes 3 arguments (line, pattern, vector_of_matched_subexpression). When 'line' contains Unicode characters, the transcode method throws exception 202 (transcoding error).
The code looks like:
CharVectorType inputVector;
(args[0]->str()).transcode(inputVector);
 //transcode appends one terminating NULL('\0') char which
 // is not part of the original string
 inputString.assign(inputVector.begin(),inputVector.end()-1);
I have removed the try..catch block here.

How can I convert args[0]->str() to std::string?
You need to decide what encoding to use. Clearly, the local code page will not support all of the characters you need.

(I need this string so that I can pass it to the boost::regex_search method to search for the pattern.)
If boost::regex supports UTF-8, then you can transcode to UTF-8. To get a UTF-8 transcoder, you can either use the Xalan-C function XalanTranscodingServices::makeNewTranscoder(), or the Xerces-C function XMLTransService::makeNewTranscoderFor(). Search the Xerces-C and Xalan-C code bases for examples of how to use the transcoders.
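To illustrate what transcoding to UTF-8 amounts to, here is a minimal hand-rolled sketch. It is not the Xalan-C or Xerces-C transcoder API, and the helper name utf16ToUtf8 is hypothetical. It assumes the input is a buffer of UTF-16 code units (which, as far as I know, is what XalanDOMString holds internally) and it replaces unpaired surrogates with U+FFFD, which is a choice of this sketch rather than anything the libraries promise. In real code, a library transcoder is the better option.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Xalan-C or Xerces-C.
// Encodes a buffer of UTF-16 code units as a UTF-8 byte string.
// Unpaired surrogates are replaced with U+FFFD (an assumption of this sketch).
std::string utf16ToUtf8(const char16_t* src, std::size_t len)
{
    std::string out;
    out.reserve(len);

    for (std::size_t i = 0; i < len; ++i) {
        char32_t cp = src[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate, if any.
            if (i + 1 < len && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            cp = 0xFFFD;
        }

        if (cp < 0x80) {
            // 1 byte: plain ASCII.
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // 2 bytes.
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // 3 bytes.
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // 4 bytes: code points above the Basic Multilingual Plane.
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Note that the resulting std::string holds UTF-8 bytes, not characters, so a regex engine that works on narrow chars will match byte by byte unless it is UTF-8 aware.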

If boost::regex doesn't support UTF-8, then you will need to decide what code page will support the characters you need, and create a transcoder for that code page.

The larger problem of figuring out how the data you are searching with the regex is encoded is not something anyone can help you with. You may need to transcode all of that data to a common encoding to make sure your regular expressions work correctly.

Dave
