Re: [Ecls-list] Unicode 16-bits
On Tue, Feb 22, 2011 at 3:33 AM, Daniel Herring dherr...@tentpost.com wrote:

> As for the database, you can always split it into separately loadable chunks and throw an error if a chunk is not available when needed.

It seems that I did not explain myself properly. There have been several threads already about embedding ECL into mobile devices which do not have a proper filesystem. For those, embedding the database is a must. It also becomes essential when ECL is moved around as a standalone executable, or when programs are derived from it -- also a source of confusion and problems for some users.

I am not going to move ECL from its current single-word (16 or 32 bit) encoding right now. It would be too much of a hassle. I am just offering the possibility of having a compromise for devices and platforms that do not care much about the full character set -- OS X or Windows, they only support 16-bit characters in their libraries AFAIK.

Juanjo

--
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28006 (Spain)
http://juanjose.garciaripoll.googlepages.com

___
Ecls-list mailing list
Ecls-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ecls-list
Re: [Ecls-list] Unicode 16-bits
On 2/22/11 3:14 AM, Juan Jose Garcia-Ripoll wrote:

> I am not going to move ECL from its current single-word (16 or 32 bit) encoding right now. It would be too much of a hassle. I am just offering the possibility of having a compromise for devices and platforms that do not care much about the full character set -- OS X or Windows, they only support 16-bit characters in their libraries AFAIK.

I believe they use UTF-16, so it's not really just a 16-bit character.

I think the best answer to your question would be to ask who actually needs to access characters outside the Basic Multilingual Plane (the code points that fit in 16 bits). If there are none, then 16 bits is fine. If there are some, then you can decide whether you want to make strings UTF-16 or not, or whether they can live with the 32-bit character type.

Ray
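Editorial illustration of the UTF-16 point above. This is a minimal sketch in Python, used purely for illustration; it is not ECL code, and the helper names `utf16_code_units` and `fits_in_bmp` are hypothetical. It shows that a code point inside the Basic Multilingual Plane takes one 16-bit code unit in UTF-16, while a code point above U+FFFF takes two (a surrogate pair), which is why UTF-16 is "not really just a 16-bit character".

```python
def utf16_code_units(text: str) -> int:
    """Return how many 16-bit code units are needed to encode text as UTF-16."""
    # "utf-16-le" emits no byte-order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


def fits_in_bmp(text: str) -> bool:
    """Return True if every code point in text is at most U+FFFF."""
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    # Three BMP characters, then U+1D11E (MUSICAL SYMBOL G CLEF), outside the BMP.
    for sample in ["A", "\u00e9", "\u4e2d", "\U0001D11E"]:
        print(
            f"U+{ord(sample):04X}: "
            f"{utf16_code_units(sample)} code unit(s), "
            f"BMP only: {fits_in_bmp(sample)}"
        )
```

A check like `fits_in_bmp` is one way to answer the question raised in the message, namely whether a given body of text ever needs characters outside the 16-bit range.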
Re: [Ecls-list] Unicode 16-bits
On Sat, 19 Feb 2011, Juan Jose Garcia-Ripoll wrote:

> Would you find it useful to have an ECL that only supports character codes 0 - 65535? That would make it probably easier to embed the part of the Unicode database associated to it (65535 bytes) and have a standalone executable. Executables would also be a bit faster and use less memory (16-bits vs 32-bits per character) ...

Not sure I follow. For many people, that would be fine; but it's a subset of Unicode and could cause confusion when it breaks.

Lately I've heard several fairly knowledgeable people say UTF-8 really is ideal. While UTF-32 allows immediate indexing to a given code point, that doesn't help with common tasks due to combining marks and such. They appear to be supported by (or have subverted) Wikipedia.

http://en.wikipedia.org/wiki/Utf-32
http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings#Processing_issues

As for the database, you can always split it into separately loadable chunks and throw an error if a chunk is not available when needed.

- Daniel
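Editorial illustration of the combining-marks remark above. This is a minimal Python sketch, not ECL code; the helper name `count_code_points_and_bases` is hypothetical. It shows that indexing by code point, which is what a fixed-width encoding such as UTF-32 makes cheap, does not index by what a reader perceives as one character. Counting code points with combining class 0 is only a rough approximation of user-perceived characters, not full grapheme-cluster segmentation.

```python
import unicodedata


def count_code_points_and_bases(text: str) -> tuple[int, int]:
    """Return (number of code points, number of non-combining code points)."""
    # unicodedata.combining() returns 0 for characters that are not combining marks.
    bases = sum(1 for ch in text if unicodedata.combining(ch) == 0)
    return len(text), bases


if __name__ == "__main__":
    decomposed = "e\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
    composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9

    print(count_code_points_and_bases(decomposed))
    print(count_code_points_and_bases(composed))
    # Indexing the decomposed form by code point yields the bare base letter.
    print(repr(decomposed[0]))
```

Both strings render as the same accented letter, yet they differ in length and compare unequal unless normalized first, regardless of how many bits each code point occupies.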
Re: [Ecls-list] Unicode 16-bits
On Sat, 19 Feb 2011 23:43:33 + Juan Jose Garcia-Ripoll juanjose.garciarip...@googlemail.com wrote:

> Would you find it useful to have an ECL that only supports character codes 0 - 65535? That would make it probably easier to embed the part of the Unicode database associated to it (65535 bytes) and have a standalone executable. Executables would also be a bit faster and use less memory (16-bits vs 32-bits per character)

Would this be an option, or would ECL internally use a 16-bit character representation all the time when Unicode support is enabled for the build?

Also, I understand that the representation would take less memory, but would it really be faster on 32-bit+ processors? I know that some processors (including older x86) have faster access times for 32-bit than for 16-bit or 8-bit values. For example, some time back I had to adapt an arcfour implementation to use 32-bit words rather than 8-bit ones for its internal state, despite it only holding values between 0 and 255, to improve its performance.

I also admit that I have some code assuming a 32-bit representation, but it is ECL-specific and could be adapted easily; I don't think that I make use of any character above 65535 myself. That said, I have no idea what input I might have to deal with eventually; it's unpredictable.

As for the 65535 bytes output file limitation, is that more difficult to fix? Is it a toolchain-dependent issue which ECL has no control over?

Thanks,
--
Matt
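Editorial illustration of the memory side of the question above. This is a minimal Python sketch of the arithmetic only; it is not ECL code and does not reflect ECL's actual string layout, and the helper name `storage_bytes` is hypothetical. It assumes one fixed-width cell per character, with no header or padding, and rejects code points that do not fit in the chosen width. It says nothing about speed, which depends on the processor, as the message points out.

```python
def storage_bytes(text: str, bits_per_char: int) -> int:
    """Bytes needed to store text with one fixed-width cell per character."""
    if bits_per_char not in (16, 32):
        raise ValueError("bits_per_char must be 16 or 32")
    # Largest code point a cell can hold: U+FFFF for 16 bits, U+10FFFF for Unicode.
    limit = 0xFFFF if bits_per_char == 16 else 0x10FFFF
    for ch in text:
        if ord(ch) > limit:
            raise ValueError(f"U+{ord(ch):04X} does not fit in {bits_per_char} bits")
    return len(text) * (bits_per_char // 8)


if __name__ == "__main__":
    sample = "hello"
    print(storage_bytes(sample, 16), "bytes with 16-bit characters")
    print(storage_bytes(sample, 32), "bytes with 32-bit characters")
```

Under these assumptions the 16-bit representation halves the character storage, at the cost of failing on any input above 65535, which is the unpredictable-input concern raised in the message.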