Re: [Ecls-list] Unicode 16-bits

2011-02-22 Thread Juan Jose Garcia-Ripoll
On Tue, Feb 22, 2011 at 3:33 AM, Daniel Herring dherr...@tentpost.com wrote:

> As for the database, you can always split it into separately loadable
> chunks and throw an error if a chunk is not available when needed.


It seems that I did not explain myself properly. There have already been
several threads about embedding ECL in mobile devices that do not have a
proper filesystem. For those, embedding the database is a must. It also
becomes essential when ECL is moved around as a standalone executable, or
when programs are derived from it -- which is also a source of confusion
and problems for some users.

I am not going to move ECL from its current single-word (16 or 32 bit)
encoding right now. It would be too much of a hassle. I am just offering the
possibility of having a compromise for devices and platforms that do not
care much about the full character set -- OS X or Windows, they only support
16-bit characters in their libraries AFAIK.

Juanjo

-- 
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28006 (Spain)
http://juanjose.garciaripoll.googlepages.com
___
Ecls-list mailing list
Ecls-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ecls-list


Re: [Ecls-list] Unicode 16-bits

2011-02-22 Thread Raymond Toy
On 2/22/11 3:14 AM, Juan Jose Garcia-Ripoll wrote:

> I am not going to move ECL from its current single-word (16 or 32 bit)
> encoding right now. It would be too much of a hassle. I am just
> offering the possibility of having a compromise for devices and
> platforms that do not care much about the full character set -- OS X
> or Windows, they only support 16-bit characters in their libraries AFAIK.

I believe they use UTF-16, so it's not really just a 16-bit character:
code points above 65535 take two 16-bit units.

I think the best answer to your question would be to ask who actually
needs to access characters outside the Basic Multilingual Plane, i.e.
beyond what fits in 16 bits.  If nobody does, then 16 bits is fine.  If
some do, then you can decide whether you want to make strings UTF-16 or
not, or whether they can live with the 32-bit character type.
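
To make the UTF-16 point concrete, here is a minimal sketch in Python (not
ECL code) of how a code point maps to 16-bit code units, and how to check
whether some text needs anything beyond the Basic Multilingual Plane. The
function names are made up for this illustration.

```python
def utf16_code_units(code_point):
    """Return the UTF-16 code units (as ints) for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point <= 0xFFFF:
        # Inside the Basic Multilingual Plane: one 16-bit unit is enough.
        return [code_point]
    # Outside the BMP: split the 20-bit offset into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


def needs_more_than_16_bits(text):
    """True if any character in text lies outside the BMP."""
    return any(ord(ch) > 0xFFFF for ch in text)


print([hex(u) for u in utf16_code_units(0x00E9)])   # ['0xe9']
print([hex(u) for u in utf16_code_units(0x1D11E)])  # ['0xd834', '0xdd1e']
```

A check like needs_more_than_16_bits could be run over real input to find
out whether anyone actually depends on characters above 65535.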

Ray




Re: [Ecls-list] Unicode 16-bits

2011-02-21 Thread Daniel Herring
On Sat, 19 Feb 2011, Juan Jose Garcia-Ripoll wrote:

> Would you find it useful to have an ECL that only supports character codes 0
> - 65535? That would make it probably easier to embed the part of the Unicode
> database associated to it ( 65535 bytes) and have a standalone executable.
> Executables would also be a bit faster and use less memory (16-bits vs
> 32-bits per character)

...

Not sure I follow.  For many people, that would be fine; but it's a subset
of Unicode and could cause confusion when it breaks.

Lately I've heard several fairly knowledgeable people say UTF-8 really is 
ideal.  While UTF-32 allows immediate indexing to a given codepoint, that 
doesn't help with common tasks due to combining marks and such.

They appear to be supported by (or have subverted) Wikipedia.
http://en.wikipedia.org/wiki/Utf-32
http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings#Processing_issues
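
As an illustration of the combining-mark issue, a small Python sketch (the
helper name is invented here, and skipping combining marks is only a rough
stand-in for real grapheme segmentation): indexing by code point can land on
a bare accent, and two spellings of the same visible character have
different lengths.

```python
import unicodedata

composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT


def approx_visible_length(text):
    """Count code points that are not combining marks (an approximation)."""
    decomposed_text = unicodedata.normalize("NFD", text)
    return sum(1 for ch in decomposed_text if unicodedata.combining(ch) == 0)


print(len(composed), len(decomposed))             # 1 2
print(approx_visible_length(composed),
      approx_visible_length(decomposed))          # 1 1
print(unicodedata.name(decomposed[1]))            # COMBINING ACUTE ACCENT
```

Constant-time access to the n-th code point, which is what a fixed-width
encoding buys, does not by itself give the n-th character a user would see.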

As for the database, you can always split it into separately loadable 
chunks and throw an error if a chunk is not available when needed.
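
A minimal sketch of that idea, in Python rather than ECL code. Everything
here is hypothetical: the chunk size, the class and exception names, and the
loader interface are assumptions made for the example, not how ECL stores
its Unicode database.

```python
CHUNK_SIZE = 0x1000  # hypothetical: 4096 code points per chunk


class ChunkMissingError(LookupError):
    """Raised when the chunk covering a code point is not available."""


class ChunkedTable:
    def __init__(self, loaders):
        # loaders maps a chunk index to a zero-argument callable that
        # returns a dict {code_point: property_value} for that chunk.
        self._loaders = loaders
        self._loaded = {}

    def lookup(self, code_point):
        index = code_point // CHUNK_SIZE
        if index not in self._loaded:
            loader = self._loaders.get(index)
            if loader is None:
                raise ChunkMissingError(
                    f"no data chunk for U+{code_point:04X}")
            # Load the chunk on first use and keep it for later lookups.
            self._loaded[index] = loader()
        return self._loaded[index].get(code_point)


# Only chunk 0 is shipped in this toy setup; the values are the general
# categories of 'A' and 'a', included purely as sample data.
table = ChunkedTable({0: lambda: {0x41: "Lu", 0x61: "Ll"}})
print(table.lookup(0x41))  # Lu
```

Looking up a code point whose chunk was not shipped, for example
table.lookup(0x1D11E) here, raises ChunkMissingError instead of silently
returning wrong data.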

- Daniel



Re: [Ecls-list] Unicode 16-bits

2011-02-20 Thread Matthew Mondor
On Sat, 19 Feb 2011 23:43:33 +
Juan Jose Garcia-Ripoll juanjose.garciarip...@googlemail.com wrote:

> Would you find it useful to have an ECL that only supports character codes 0
> - 65535? That would make it probably easier to embed the part of the Unicode
> database associated to it ( 65535 bytes) and have a standalone executable.
> Executables would also be a bit faster and use less memory (16-bits vs
> 32-bits per character)

Would this be an option, or would ECL internally use a 16-bit character
representation all the time when Unicode support is enabled for the
build?

Also, I understand that the representation would take less memory, but
would it really be faster on 32-bit+ processors?  I know that some
processors (including older x86) have faster access times for 32-bit
than for 16-bit or 8-bit values.  For example, some time back I had to
adapt an arcfour implementation to use 32-bit words rather than 8-bit
ones for the internal state, despite it only holding values between 0
and 255, to improve its performance.
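
On the memory half of that question, the per-character cost is easy to
check. The sketch below is Python, not ECL, and the function name is
invented for the example; it compares the encoded size of the same text in
UTF-8, UTF-16 and UTF-32. It says nothing about speed, which depends on the
processor as described above.

```python
def encoded_sizes(text):
    """Return the size in bytes of text in three Unicode encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # The -le variants are used so that no byte-order mark is added.
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


print(encoded_sizes("hello, world"))
# {'utf-8': 12, 'utf-16': 24, 'utf-32': 48}

print(encoded_sizes("\U0001D11E"))  # one character outside the BMP
# {'utf-8': 4, 'utf-16': 4, 'utf-32': 4}
```

For text inside the BMP, 16-bit storage is half the size of 32-bit storage;
for a character above 65535, UTF-16 needs two units and the saving
disappears.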

I also admit that I have some code assuming a 32-bit representation,
but it's also ECL-specific and could be adapted easily; I don't think
that I make use of any character above 65535 myself.  That said, I have
no idea what input I might eventually have to deal with; it's
unpredictable.

As for the 65535 bytes output file limitation, is that more difficult
to fix?  Is it a toolchain-dependent issue which ECL has no control
over?

Thanks,
-- 
Matt
