Re: Hacking Luke for bytecount-based strings

2006-05-17 Thread Marvin Humphrey
On May 17, 2006, at 2:04 PM, Doug Cutting wrote: Detecting invalidly encoded text later doesn't help anything in and of itself; lifting the requirement that everything be converted to Unicode early on opens up some options. How useful are those options? Are they worth the price? Conv

Re: Hacking Luke for bytecount-based strings

2006-05-17 Thread Doug Cutting
Marvin Humphrey wrote: I *think* that whether it was invalidly encoded or not wouldn't impact searching -- it doesn't in KinoSearch. It should only affect display. I think Java's approach of converting everything to Unicode internally is useful. One must still handle dirty input, but it

Re: Hacking Luke for bytecount-based strings

2006-05-17 Thread Marvin Humphrey
On May 17, 2006, at 11:08 AM, Doug Cutting wrote: Marvin Humphrey wrote: What I'd like to do is augment my existing patch by making it possible to specify a particular encoding, both for Lucene and Luke. What ensures that all documents in fact use the same encoding? In KinoSearch at this

Re: Hacking Luke for bytecount-based strings

2006-05-17 Thread Doug Cutting
Marvin Humphrey wrote: What I'd like to do is augment my existing patch by making it possible to specify a particular encoding, both for Lucene and Luke. What ensures that all documents in fact use the same encoding? The current approach of converting everything to Unicode and then writing U

Re: Hacking Luke for bytecount-based strings

2006-05-17 Thread Marvin Humphrey
On May 16, 2006, at 11:58 PM, Paul Elschot wrote: Try and invoke luke with a lucene jar of your choice on the classpath before luke itself: java -cp lucene-core-1.9-rc1-dev.jar:lukeall.jar org.getopt.luke.Luke I tried this on an index built with KinoSearch 0.05, which pre-dates the addi

Re: Hacking Luke for bytecount-based strings

2006-05-16 Thread Paul Elschot
On Wednesday 17 May 2006 06:35, Marvin Humphrey wrote:
> Greets,
>
> There does not seem to be a lot of demand for one implementation of Lucene to read indexes generated by another implementation of Lucene for the purposes of indexing or searching. However, there is a demand for index

RE: Hacking Luke for bytecount-based strings

2006-05-16 Thread Robert Engels
While you're at it, why not rewrite Luke in Perl as well... Seems like a great use of your time. -----Original Message----- From: Marvin Humphrey [mailto:[EMAIL PROTECTED] Sent: Tuesday, May 16, 2006 11:36 PM To: java-dev@lucene.apache.org Cc: Andrzej Bialecki Subject: Hacking Luke for bytecount-