On May 17, 2006, at 2:04 PM, Doug Cutting wrote:
Detecting invalidly encoded text later doesn't help anything in
and of itself; lifting the requirement that everything be
converted to Unicode early on opens up some options.
How useful are those options? Are they worth the price?
Conv
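Early detection, which the strict-conversion approach gives you, can be sketched with Java's CharsetDecoder set to REPORT errors: invalid bytes fail fast at index time instead of slipping through. This is a minimal illustrative sketch, not code from Lucene or the patch under discussion; the class and method names are made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Returns true only if `raw` is a well-formed byte sequence in `cs`.
    // REPORT makes the decoder throw on malformed input rather than
    // silently substituting a replacement character.
    static boolean isValid(byte[] raw, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] bad  = { 'a', (byte) 0xFF, 'b' };  // 0xFF can never start a UTF-8 sequence
        System.out.println(isValid(good, StandardCharsets.UTF_8));  // true
        System.out.println(isValid(bad,  StandardCharsets.UTF_8));  // false
    }
}
```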
Marvin Humphrey wrote:
I *think* that whether it was invalidly encoded or not wouldn't impact
searching -- it doesn't in KinoSearch. It should only affect display.
I think Java's approach of converting everything to Unicode internally
is useful. One must still handle dirty input, but it
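The "convert everything, but still handle dirty input" approach Marvin describes can be sketched with the same CharsetDecoder API, this time set to REPLACE: malformed bytes become U+FFFD, so the index only ever holds valid Unicode and bad input degrades display rather than breaking search. The helper below is an illustrative assumption, not actual Lucene or KinoSearch code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class LenientDecode {
    // Convert possibly-dirty bytes to Unicode, substituting U+FFFD
    // for malformed or unmappable sequences instead of failing.
    static String toUnicode(byte[] raw, Charset cs) {
        try {
            return cs.newDecoder()
                     .onMalformedInput(CodingErrorAction.REPLACE)
                     .onUnmappableCharacter(CodingErrorAction.REPLACE)
                     .decode(ByteBuffer.wrap(raw))
                     .toString();
        } catch (CharacterCodingException e) {
            // Unreachable: with REPLACE the decoder never reports errors.
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        byte[] dirty = { 'a', (byte) 0xFF, 'b' };  // 0xFF is never valid in UTF-8
        System.out.println(toUnicode(dirty, StandardCharsets.UTF_8));  // "a\uFFFDb"
    }
}
```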
On May 17, 2006, at 11:08 AM, Doug Cutting wrote:
Marvin Humphrey wrote:
What I'd like to do is augment my existing patch by making it
possible to specify a particular encoding, both for Lucene and Luke.
What ensures that all documents in fact use the same encoding?
In KinoSearch at this
Marvin Humphrey wrote:
What I'd like to do is augment my existing patch by making it possible
to specify a particular encoding, both for Lucene and Luke.
What ensures that all documents in fact use the same encoding?
The current approach of converting everything to Unicode and then
writing U
On May 16, 2006, at 11:58 PM, Paul Elschot wrote:
Try and invoke Luke with a Lucene jar of your choice on the
classpath before luke itself:
java -cp lucene-core-1.9-rc1-dev.jar:lukeall.jar org.getopt.luke.Luke
I tried this on an index built with KinoSearch 0.05, which pre-dates
the addi
On Wednesday 17 May 2006 06:35, Marvin Humphrey wrote:
> Greets,
>
> There does not seem to be a lot of demand for one implementation of
> Lucene to read indexes generated by another implementation of Lucene
> for the purposes of indexing or searching. However, there is a
> demand for index
While you're at it, why not rewrite Luke in Perl as well...
Seems like a great use of your time.
-----Original Message-----
From: Marvin Humphrey [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 16, 2006 11:36 PM
To: java-dev@lucene.apache.org
Cc: Andrzej Bialecki
Subject: Hacking Luke for bytecount-