Re: T_WCHAR final

Marvin Humphrey Thu, 30 Nov 2006 08:27:28 -0800


On Nov 30, 2006, at 4:42 AM, Thomas Busch wrote:

what do you think is faster ? Scanning the input or
allocing more memory ?

Benchmarking is the only way to know for sure. And overshooting on allocation is a design tradeoff.

If you knew the string's length already, naive allocation would probably be faster. Until you hit swap. ;)

But wcslen is doing a scan already, right? So replace that with your own custom scan and see what happens.
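A minimal sketch of what such a custom scan might look like, in C. The function name utf8_len_of_wcs is hypothetical, and the sketch assumes each wchar_t holds a whole code point; on platforms with a 16-bit wchar_t, a surrogate pair would be counted as 6 bytes instead of 4, which overshoots slightly but never undershoots.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: walk the wide string once, as wcslen would,
 * but add up the UTF-8 bytes each character needs instead of
 * counting characters. The result excludes the terminating null. */
static size_t utf8_len_of_wcs(const wchar_t *src)
{
    size_t bytes = 0;

    for (; *src != L'\0'; src++) {
        unsigned long cp = (unsigned long)*src;

        if (cp < 0x80)
            bytes += 1;   /* ASCII */
        else if (cp < 0x800)
            bytes += 2;   /* most accented Latin, Greek, Cyrillic */
        else if (cp < 0x10000)
            bytes += 3;   /* rest of the Basic Multilingual Plane */
        else
            bytes += 4;   /* supplementary planes */
    }
    return bytes;
}
```

Whether this beats wcslen plus a worst-case allocation is exactly what the benchmark would have to show.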

For European languages
the length should be between 1 and 2 wcslen(src).

Yes. This is a classic problem. It's the reason my big patch changing Java Lucene to use legal UTF-8 and a bytecount-based String header causes a 20% performance hit. (<https://issues.apache.org/jira/browse/LUCENE-510>) Java's internal routines for precisely this task -- negotiating how much memory is required when converting between two variable-length Unicode encodings -- are to blame.

You're working on this because you want to manipulate CLucene string data from perl-space, correct? You're starting down a long, well-traveled road. ;)

Also where does the +1 come from ?

Null termination. It should be there even though a Perl scalar knows its own length and may contain null bytes.
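To make the +1 concrete, here is a rough sketch in plain C (not XS, and not the code from the patch under discussion). The name wcs_to_utf8 is hypothetical, and it again assumes one code point per wchar_t. It allocates the exact byte count plus one, writes the terminator, and still reports the length separately, since the length rather than the null is what a length-aware consumer would rely on.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert src to a freshly malloc'd UTF-8 buffer.
 * *out_len receives the byte length, excluding the terminating null.
 * Returns NULL if the allocation fails; the caller frees the buffer. */
static char *wcs_to_utf8(const wchar_t *src, size_t *out_len)
{
    const wchar_t *p;
    size_t bytes = 0;
    char *buf, *d;

    /* First pass: count the UTF-8 bytes needed. */
    for (p = src; *p != L'\0'; p++) {
        unsigned long cp = (unsigned long)*p;
        bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    }

    buf = malloc(bytes + 1);   /* the +1 is for the terminating null */
    if (buf == NULL)
        return NULL;

    /* Second pass: encode each character. */
    d = buf;
    for (p = src; *p != L'\0'; p++) {
        unsigned long cp = (unsigned long)*p;
        if (cp < 0x80) {
            *d++ = (char)cp;
        } else if (cp < 0x800) {
            *d++ = (char)(0xC0 | (cp >> 6));
            *d++ = (char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *d++ = (char)(0xE0 | (cp >> 12));
            *d++ = (char)(0x80 | ((cp >> 6) & 0x3F));
            *d++ = (char)(0x80 | (cp & 0x3F));
        } else {
            *d++ = (char)(0xF0 | ((cp >> 18) & 0x07));
            *d++ = (char)(0x80 | ((cp >> 12) & 0x3F));
            *d++ = (char)(0x80 | ((cp >> 6) & 0x3F));
            *d++ = (char)(0x80 | (cp & 0x3F));
        }
    }
    *d = '\0';                 /* fills the extra byte */
    *out_len = bytes;
    return buf;
}
```

The terminator costs one byte and keeps the buffer safe to hand to any C routine that expects a null-terminated string.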


Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
