date:20050827

Re: Fwd: Lucene does NOT use UTF-8.

2005-08-27 Thread Daniel Naber

On Saturday 27 August 2005 16:05, Marvin Humphrey wrote:

> Lucene should not be advertising that it uses "standard UTF-8" -- or
> even UTF-8 at all, since "Modified UTF-8" is _illegal_ UTF-8.

For now, I've changed the information about the file format documentation.

Regards
Daniel

-- http

Re: Lucene does NOT use UTF-8.

2005-08-27 Thread jian chen

Hi, Ken,

Thanks for your email. You are right, I meant to propose that Lucene switch to true UTF-8, rather than working around this issue by fixing the resulting problems elsewhere. Also, conforming to standards like UTF-8 will make the code easier for new developers to pick up.

Re: Lucene does NOT use UTF-8.

2005-08-27 Thread Ken Krugler

On Aug 26, 2005, at 10:14 PM, jian chen wrote:

> It seems to me that in theory, Lucene storage code could use true UTF-8
> to store terms. Maybe it is just a legacy issue that the modified UTF-8
> is used?

The use of 0xC0 0x80 to encode a U+0000 Unicode code point is an aspect of Java serialization
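To make the 0xC0 0x80 point concrete, here is a minimal sketch comparing the bytes Java produces for the same string in standard UTF-8 and in the Modified UTF-8 written by DataOutputStream.writeUTF. The class and helper names (ModifiedUtf8Demo, modifiedUtf8, standardUtf8, hex) are made up for illustration, and this is plain JDK code, not Lucene's own string writer. The supplementary-character case (U+1D11E) is added here as a second well-known difference; it is not taken from the message above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class ModifiedUtf8Demo {

    // Encode with the JDK's Modified UTF-8 (DataOutputStream.writeUTF),
    // then drop the 2-byte length prefix that writeUTF emits first.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Encode with standard UTF-8.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nul = String.valueOf((char) 0);                // U+0000
        String clef = new String(Character.toChars(0x1D11E)); // U+1D11E

        System.out.println("U+0000  standard: " + hex(standardUtf8(nul)));
        System.out.println("U+0000  modified: " + hex(modifiedUtf8(nul)));
        System.out.println("U+1D11E standard: " + hex(standardUtf8(clef)));
        System.out.println("U+1D11E modified: " + hex(modifiedUtf8(clef)));
    }
}
```

In standard UTF-8, U+0000 is the single byte 00 and U+1D11E is four bytes; in Modified UTF-8, U+0000 becomes the two-byte sequence C0 80 and each half of the surrogate pair is written as its own three-byte sequence. A strict UTF-8 decoder rejects both of the modified forms.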

Re: Lucene does NOT use UTF-8.

2005-08-27 Thread Marvin Humphrey

On Aug 26, 2005, at 10:14 PM, jian chen wrote:

> It seems to me that in theory, Lucene storage code could use true UTF-8
> to store terms. Maybe it is just a legacy issue that the modified UTF-8
> is used?

It's not a matter of a simple switch. The VInt count at the head of a Lucene string is

Fwd: Lucene does NOT use UTF-8.

2005-08-27 Thread Marvin Humphrey

Greets,

Discussion moved from the users list as per suggestion...

-- Marvin Humphrey

Begin forwarded message:

From: Marvin Humphrey <[EMAIL PROTECTED]>
Date: August 26, 2005 9:18:21 PM PDT
To: java-user@lucene.apache.org, [EMAIL PROTECTED]
Subject: Lucene does NOT use UTF-8.
Reply-To: java-use

Fwd: Standard or Modified UTF-8?

2005-08-27 Thread Marvin Humphrey

Greets,

It was suggested that I move this to the developers list from the users list...

-- Marvin Humphrey

Begin forwarded message:

From: Marvin Humphrey <[EMAIL PROTECTED]>
Date: August 26, 2005 4:51:27 PM PDT
To: java-user@lucene.apache.org
Subject: Standard or Modified UTF-8?
Reply-To: j

6 matches
