On the other hand, if you want it to be Unicode correct, you could
use my ElfData plugin's function Scan_NextUTF8
<http://www.elfdata.com/plugin/technicalref/ElfDataMCat12.html#Scan_NextUTF8>.
I wouldn't trust split to be fast on Unicode, especially with large
arrays.
How big are the files though? A few MB? KB? Or hundreds of MB? The
strategy depends on the size. Assuming "not too big" files, you can
just read the entire file in and process it using
ElfData.Scan_NextUTF8 :)
Once you start getting into files that don't comfortably fit in
memory, it's time to start reading in chunks, which requires a more
sophisticated approach. The idea is to read chunks small enough that
split() is fast on them again, so you wouldn't need my plugin for that.
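To make the chunked idea a bit more concrete, here is a rough sketch in Python rather than REALbasic, so treat it as an illustration of the technique and not as plugin or framework code. The function name split_in_chunks, the comma delimiter and the 64 KB chunk size are all just assumptions for the example. There are two things to watch for: a multi-byte UTF-8 character can be cut in half by a chunk boundary, and so can a field, so the tail of each chunk is carried over to the next one.

```python
import codecs
import io


def split_in_chunks(stream, delimiter=",", chunk_size=64 * 1024):
    """Yield delimiter-separated fields from a binary stream of UTF-8 text.

    Hypothetical helper for illustration: reads chunk_size bytes at a time
    instead of loading the whole file.
    """
    # The incremental decoder holds back the bytes of a character that is
    # cut in half by a chunk boundary until the remaining bytes arrive.
    decoder = codecs.getincrementaldecoder("utf-8")()
    pending = ""  # text after the last delimiter seen so far
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += decoder.decode(chunk)
        parts = pending.split(delimiter)
        # The last piece may be an incomplete field, so keep it for later.
        pending = parts.pop()
        yield from parts
    # Flush the decoder; this raises if the input ends mid-character.
    pending += decoder.decode(b"", final=True)
    yield pending


if __name__ == "__main__":
    # A tiny chunk size forces every multi-byte character to be split.
    data = io.BytesIO("α,βγ,δ".encode("utf-8"))
    for field in split_in_chunks(data, ",", chunk_size=1):
        print(field)
```

For a real file you would open it in binary mode and pass the file object in place of the BytesIO. The chunk size is a tuning knob, so it is worth measuring a few values against your actual data.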
--
http://elfdata.com/plugin/