On 7/17/07, DM Smith <[EMAIL PROTECTED]> wrote:
According to the UTF-8 spec \uFEFF is not a BOM. In UTF-8 the byte
order is always the same.
But there is a BOM for UTF-8 (even though there is no endian
component, it does serve as a marker indicating the text file is
Unicode text encoded in UTF-
[ https://issues.apache.org/jira/browse/LUCENE-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless resolved LUCENE-962.
---
Resolution: Fixed
Lucene Fields: [New, Patch Available] (was: [Patch Available
[ https://issues.apache.org/jira/browse/LUCENE-962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513390 ]
Michael McCandless commented on LUCENE-962:
---
Good catch Steven!
I think where you put the try/finally is g
"Peter Keegan" <[EMAIL PROTECTED]> wrote:
> I did some performance comparison testing of Lucene 2.0 vs. trunk (with
> LUCENE-843). I'm seeing at least a 4X increase in indexing rate with the new
> DocumentsWriter in LUCENE-843 (still doing single-threaded indexing). Better
> yet, the total time to
[ https://issues.apache.org/jira/browse/LUCENE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-963:
--
Lucene Fields: [New, Patch Available] (was: [New])
> Add setters to Field to allow re-
[ https://issues.apache.org/jira/browse/LUCENE-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-963:
--
Attachment: LUCENE-963.patch
> Add setters to Field to allow re-use of Field instances
Add setters to Field to allow re-use of Field instances during indexing
---
Key: LUCENE-963
URL: https://issues.apache.org/jira/browse/LUCENE-963
Project: Lucene - Java
Issu
According to the UTF-8 spec \uFEFF is not a BOM. In UTF-8 the byte
order is always the same.
UTF-16 defines a BOM.
If it is a BOM, I bet it was put in there by an MS editor.
On Jul 17, 2007, at 1:58 PM, Yonik Seeley wrote:
On 7/17/07, Steven Parkes <[EMAIL PROTECTED]> wrote:
Can we get rid
[ https://issues.apache.org/jira/browse/LUCENE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Ingersoll reassigned LUCENE-960:
--
Assignee: Grant Ingersoll
> SpanQueryFilter addition
>
>
>
I did some performance comparison testing of Lucene 2.0 vs. trunk (with
LUCENE-843). I'm seeing at least a 4X increase in indexing rate with the new
DocumentsWriter in LUCENE-843 (still doing single-threaded indexing). Better
yet, the total time to build the index is much shorter because I can now
[ https://issues.apache.org/jira/browse/LUCENE-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Parkes updated LUCENE-962:
-
Attachment: LUCENE-962.patch.txt
Patch adds wrappers in IndexWriter to catch exceptions thrown in
I/O exception in DocsWriter add or updateDocument may not delete unreferenced
files
---
Key: LUCENE-962
URL: https://issues.apache.org/jira/browse/LUCENE-962
Project: Luc
On 7/17/07, Steven Parkes <[EMAIL PROTECTED]> wrote:
Can we get rid of the binary characters at the front of CHANGES.txt? Or
do they mean something?
AFAIK, it's a UTF-8 BOM. I don't know whose editor put it there, but
it looks like the last time I edited CHANGES.txt it mangled it. I
just remov
: Can we get rid of the binary characters at the front of CHANGES.txt? Or
: do they mean something?
I may be wrong (I frequently am), but I think that signifies (to
someone) that it's a UTF-8 file...
http://www.firstobject.com/dn_markutf8preamble.htm
...assuming that source can be trusted, it's
Can we get rid of the binary characters at the front of CHANGES.txt? Or
do they mean something?
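For reference, the "binary characters" discussed in this thread are most likely the three bytes EF BB BF, which is U+FEFF encoded as UTF-8. Below is a minimal sketch of detecting and dropping them. The class name BomUtil, its method names, and the sample text are made up for illustration; none of this is part of Lucene or of any patch mentioned here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: detects and strips a leading UTF-8 BOM.
public class BomUtil {
    // U+FEFF encoded as UTF-8 is the byte sequence EF BB BF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns true if the data starts with the three BOM bytes.
    public static boolean hasUtf8Bom(byte[] data) {
        if (data.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (data[i] != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns a copy without the leading BOM, or the input unchanged if there is none.
    public static byte[] stripUtf8Bom(byte[] data) {
        return hasUtf8Bom(data)
                ? Arrays.copyOfRange(data, UTF8_BOM.length, data.length)
                : data;
    }

    public static void main(String[] args) {
        // Made-up sample: U+FEFF followed by some example text.
        byte[] sample = "\uFEFFExample text".getBytes(StandardCharsets.UTF_8);
        System.out.println(hasUtf8Bom(sample));
        System.out.println(new String(stripUtf8Bom(sample), StandardCharsets.UTF_8));
    }
}
```

Working on raw bytes rather than decoded text keeps the check independent of how a given reader or editor treats U+FEFF after decoding.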
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]