Working from my limited knowledge of BSTRs: you can treat a BSTR as a wchar_t* and assume that it is null terminated. Under that assumption a BSTR is identical to an XMLCh* on Windows, so you can easily create a DOMString from it. Since you are talking about BSTRs, I am assuming that you are working on Windows. Note that on other platforms the native wchar_t and XMLCh may differ (for example on Solaris and HP-UX).
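To illustrate the portability caveat, here is a minimal sketch that does not use Xerces or any Windows API. It assumes that XMLCh is a 16-bit UTF-16 code unit; Utf16Unit is a hypothetical stand-in for it, and wideToUtf16 is a made-up helper name. Where wchar_t is 16 bits wide (as on Windows) the loop is a plain copy, which is why no transcoding is needed there; where wchar_t is 32 bits wide, code points above the BMP have to be split into surrogate pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for Xerces' XMLCh, assumed here to be a 16-bit UTF-16 unit.
using Utf16Unit = char16_t;

// Convert a wchar_t buffer of known length into UTF-16 code units.
// The length is passed explicitly, so embedded nulls are preserved.
std::u16string wideToUtf16(const wchar_t* src, std::size_t len) {
    assert(src != nullptr || len == 0);
    std::u16string out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i) {
        const unsigned long cp = static_cast<unsigned long>(src[i]);
        if (sizeof(wchar_t) == 2 || cp < 0x10000UL) {
            // A 16-bit wchar_t is already a UTF-16 unit; BMP values fit in one unit.
            out.push_back(static_cast<Utf16Unit>(cp));
        } else {
            // A 32-bit wchar_t above the BMP becomes a surrogate pair.
            const unsigned long v = cp - 0x10000UL;
            out.push_back(static_cast<Utf16Unit>(0xD800UL + (v >> 10)));
            out.push_back(static_cast<Utf16Unit>(0xDC00UL + (v & 0x3FFUL)));
        }
    }
    return out;
}
```

The result's c_str() is null terminated, so under the assumption above it could be handed to an API that expects a null-terminated 16-bit string.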
The null-termination and length issues show up only if an application chooses to put embedded nulls in the data. A BSTR is capable of containing embedded nulls, but whether it does is application dependent. COM itself does not insert embedded nulls into the string; it just gives callers a uniform allocation/deallocation mechanism. Here's a link to the BSTR documentation on MSDN:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/automat/htm_hh2/chap7_2xgz.asp

IMHO, for most applications you can probably assume that a BSTR is identical to a wchar_t* or XMLCh* on Windows, so no transcoding is necessary. Note that this is not portable, but as I said, since you are working with BSTRs, portability is probably not a concern?

Samar

-----Original Message-----
From: Bavishi, Pankij [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 12, 2002 14:51
To: [EMAIL PROTECTED]
Subject: RE: BSTR

Hello Ellis,

Thanks a lot for the detailed explanation. It is very helpful. But right now, on practical grounds, I need to know how I can convert the BSTR data into something that Xerces-C++ can understand. Right now I am only aware of something like DOMString. Do you have any suggestions, such as using transcode()? If so, could you please spell out the details?

Thank you so much,
pankaj

-----Original Message-----
From: Mr Ellis Birt [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 11, 2002 9:10 AM
To: [EMAIL PROTECTED]
Subject: Re: BSTR

Pankij,

The previous replies to your question did not really explain why BSTR and DOMString are not compatible. I hope this helps you to understand why:

The only difference between a Unicode BSTR and an ordinary LPWSTR is that the memory allocated to the BSTR includes the two bytes (may be 4 bytes - don't manipulate them directly!) before the one that is pointed to (these hold the length of the string). The buffer that holds the characters can contain nulls, but they do not determine the end of the string.
In my experience, however, almost all BSTR strings do have a null in the first unused character.

DOMString is a class (much like the MFC CString and the C++ standard library's string) that makes the handling of variable-length string data much simpler for the programmer. Its underlying data type is XMLString, which has a memory image similar to that of BSTR. Indeed, you can get away with passing a BSTR with a null following the last character as a parameter to Xerces functions that do not modify the value (Xerces does not know about the length information and cannot update it). This is the same problem as with trying to use a BSTR instead of a DOMString (since DOMString's methods do not know about the BSTR length information). It would not make sense for the Xerces developers to add support, because BSTR does not exist on the other platforms supported by Xerces.

When using BSTR in place of other (null-terminated) Unicode string types, remember that, as Microsoft says in the documentation for SysAllocStringLen:

"The pch string can contain embedded null characters and does not need to end with a NULL"

This could be your undoing if you are not careful!

NB: if you try the reverse (passing a DOMString cast to a BSTR), the function you are calling is going to look for the non-existent length information just before the string. You are getting into serious problems here, and asking for different behaviour between debug and release builds.

A little bit of trivia: BSTR comes from Visual Basic, which has stored its variable-length strings like this (but using bytes for characters) since the early days (useful to know when you are manipulating a string passed from VB to a C++ DLL). When 16-bit OLE automation came out, BSTR was the obvious format because it already existed in VB. This was then migrated to 32 bit with the change to wide characters.

I hope this helps.

Ellis

Mr Ellis J Birt Dip. Comp.(Open)
Applications Developer
StruMap, Geodesys Ltd, Astwood House, 1296 Evesham Road, Astwood Bank, Redditch, Worcestershire B96 6AD, UK
Tel: +44 (0) 1527 893758
Fax: +44 (0)1527 893833
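
The length-prefix layout described in this thread, and the way an embedded null hides the tail of the string from null-terminated APIs, can be sketched portably as follows. This is only an imitation for illustration: FakeBstr, prefixedLength and terminatedLength are hypothetical names, no Windows API is called, and the four-byte byte-count prefix is modelled on the Win32 BSTR layout. Real code should use SysAllocStringLen and SysStringLen rather than touching the prefix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Number of 16-bit units occupied by the 4-byte length prefix.
constexpr std::size_t kPrefixUnits = sizeof(std::uint32_t) / sizeof(char16_t);

// Hypothetical imitation of a BSTR allocation:
// [4-byte byte count][len UTF-16 units][one terminating null unit].
struct FakeBstr {
    std::vector<char16_t> block;

    FakeBstr(const char16_t* src, std::size_t len)
        : block(kPrefixUnits + len + 1, u'\0') {
        assert(src != nullptr || len == 0);
        const std::uint32_t byteCount =
            static_cast<std::uint32_t>(len * sizeof(char16_t));
        std::memcpy(block.data(), &byteCount, sizeof byteCount);
        std::copy(src, src + len, block.data() + kPrefixUnits);
    }

    // The pointer a caller holds: first character, just past the prefix.
    const char16_t* chars() const { return block.data() + kPrefixUnits; }
};

// Length according to the prefix, in characters.
std::size_t prefixedLength(const char16_t* chars) {
    std::uint32_t byteCount = 0;
    std::memcpy(&byteCount, chars - kPrefixUnits, sizeof byteCount);
    return byteCount / sizeof(char16_t);
}

// Length as a null-terminated API would see it: stops at the first null.
std::size_t terminatedLength(const char16_t* chars) {
    std::size_t n = 0;
    while (chars[n] != u'\0') ++n;
    return n;
}
```

For a string without embedded nulls the two lengths agree, which is the case where passing the pointer to a read-only, null-terminated API works. When they differ, everything after the first null is silently dropped by the null-terminated view, so comparing the two is one way to detect the problem before handing the data over. Calling prefixedLength on a pointer that has no prefix in front of it (the "reverse" cast) would read whatever memory happens to precede the string.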
