souri datta wrote:
Hi,
I have an external function which passes 3 arguments (line,
pattern, vector_of_matched_subexpression).
When this 'line' contains a Unicode character, the transcode method
throws exception 202 (transcoding error).
The code looks like:
CharVectorType inputVector;
(args[0]->str()).transcode(inputVector);
//transcode appends one terminating NULL('\0') char which
// is not part of the original string
inputString.assign(inputVector.begin(),inputVector.end()-1);
I have removed the try..catch block here.
How can I convert args[0]->str() to std::string?
You need to decide what encoding to use. Clearly, the local code page
will not support all of the characters you need.
(I need this string to be passed to the boost::regex_search method to
search for the pattern)
If boost::regex supports UTF-8, then you can transcode to UTF-8. To get
a UTF-8 transcoder, you can either use the Xalan-C function
XalanTranscodingServices::makeNewTranscoder(), or the Xerces-C function
XMLTransService::makeNewTranscoderFor(). Search the Xerces-C and
Xalan-C code bases for examples of how to use the transcoders.
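As a rough illustration of what the UTF-8 transcoding step produces, here is a minimal sketch in standard C++ only. It assumes the source string is a sequence of UTF-16 code units (which is my understanding of what XalanDOMString holds) and uses std::u16string as a stand-in for it. The function utf16ToUtf8 is a hypothetical helper, not part of Xalan-C or Xerces-C; in real code the library transcoders mentioned above would do this work.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a Xalan-C or Xerces-C API): encodes a sequence
// of UTF-16 code units as UTF-8 bytes in a std::string.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i; // consume the low surrogate as well
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }

        if (cp < 0x80) {            // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes: outside the BMP
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Unlike a transcode to the local code page, this cannot fail for a character the target encoding lacks, because UTF-8 can represent every Unicode code point. The only error case in the sketch is malformed input (an unpaired surrogate).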
If boost::regex doesn't support UTF-8, then you will need to decide what
code page will support the characters you need, and create a transcoder
for that code page.
The larger problem of figuring out how the data you are searching with
the regex is encoded is not something anyone can help you with. You may
need to transcode all of that data to a common encoding to make sure
your regular expressions work correctly.
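To show why a common encoding matters, here is a small sketch. It uses std::regex in place of boost::regex so that it is self-contained, and it assumes the regex engine compares plain char strings byte by byte with no knowledge of their encoding. Both containsPattern and latin1ToUtf8 are hypothetical helpers written for this example, and ISO-8859-1 is only an example of what the searched data might be encoded in.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: byte-wise regex search over char strings.
// The engine sees bytes, not characters, so pattern and text must
// use the same encoding for non-ASCII characters to match.
bool containsPattern(const std::string& text, const std::string& pattern)
{
    return std::regex_search(text, std::regex(pattern));
}

// Hypothetical helper: converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// Every Latin-1 byte maps directly to the code point of the same value.
std::string latin1ToUtf8(const std::string& in)
{
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

With a UTF-8 pattern such as "caf\xC3\xA9", searching the Latin-1 bytes "un caf\xE9 noir" finds nothing, because the same character is a different byte sequence in each encoding. Searching latin1ToUtf8("un caf\xE9 noir") with the same pattern does match.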
Dave