[jira] Commented: (LUCENE-826) Language detector

2010-01-24, Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804285#action_12804285 ] Ken Krugler commented on LUCENE-826: I think Nutch (and eventually Mahout) plan to use

[jira] Commented: (LUCENE-1343) A replacement for ISOLatin1AccentFilter that does a more thorough job of removing diacritical marks or non-spacing modifiers.

2009-12-06, Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786712#action_12786712 ] Ken Krugler commented on LUCENE-1343: Just to make sure this point doesn't get lost
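
For readers following the LUCENE-1343 thread, here is a minimal sketch of one common JDK-only way to strip non-spacing marks. It is not the filter proposed in that issue; it is a swapped-in technique that decomposes text to Unicode NFD with `java.text.Normalizer` and then drops every code point in the `Mn` (nonspacing mark) category. The class name `StripDiacritics` is hypothetical.

```java
import java.text.Normalizer;

public class StripDiacritics {
    // Decompose to NFD so a precomposed letter such as U+00E9 becomes
    // 'e' + U+0301, then delete every code point in category Mn (nonspacing mark).
    static String stripNonSpacingMarks(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{Mn}", "");
    }

    public static void main(String[] args) {
        // Escapes keep the source independent of file encoding; the text is
        // "cafe naive Angstrom" with an acute e, a diaeresis i, a ring A and a diaeresis o.
        String text = "caf\u00e9 na\u00efve \u00c5ngstr\u00f6m";
        System.out.println(stripNonSpacingMarks(text)); // cafe naive Angstrom

        // U+00F8 (o with stroke) has no canonical decomposition, so it survives.
        System.out.println(stripNonSpacingMarks("\u00f8"));
    }
}
```

The second call shows the limitation of this approach: letters such as `ø`, `ł` or `æ` have no canonical decomposition, so NFD plus mark removal leaves them unchanged, and a more thorough folding would need something like an explicit mapping table.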

Re: I wanna contribute a Chinese analyzer to lucene

2009-04-16, Ken Krugler
. -- Ken -- Ken Krugler +1 530-210-6378

Use of Unicode data in Lucene

2009-02-25, Ken Krugler
compatibility with Apache 2.0. Does anybody know whether http://www.unicode.org/copyright.html creates an issue? What's the process for vetting a license? Or is this something I should be posting to a different list? Thanks, -- Ken

Re: TestIndexInput test failures on jdk 1.6/linux after r641303

2009-01-05, Ken Krugler
...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org -- Ken Krugler Krugle, Inc. +1 530-210-6378 If you can't find it, you can't fix it - To unsubscribe, e-mail: java-dev-unsubscr

[jira] Commented: (LUCENE-1343) A replacement for ISOLatin1AccentFilter that does a more thorough job of removing diacritical marks or non-spacing modifiers.

2008-08-14, Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622746#action_12622746 ] Ken Krugler commented on LUCENE-1343: Hi Robert, So given that you and the Unicode

[jira] Commented: (LUCENE-1343) A replacement for ISOLatin1AccentFilter that does a more thorough job of removing diacritical marks or non-spacing modifiers.

2008-08-13, Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622432#action_12622432 ] Ken Krugler commented on LUCENE-1343: Hi Robert, FWIW, the issues being discussed

Re: Hadoop RPC for distributed Lucene

2008-07-11, Ken Krugler

Potential bug in SloppyPhraseScorer

2008-06-24, Ken Krugler
, and the bug isn't obvious. Plus I worry about the probability of introducing a new bug with any modification. If anybody who's touched this code has time to look at the issue and comment, that would be great! Thanks, -- Ken

Re: How to solve the issue Unable to read entire block; 72 bytes read; expected 512 bytes

2007-11-12, Ken Krugler
/files/svn/svn.apache.org/poi/src/java/org/apache/poi/poifs/storage/HeaderBlockReader.java On line 83. -- Ken

Re: Analyzers, perfect hash, ICU

2006-01-11, Ken Krugler
to the mailing list a while back, but nothing definitive. FWIW, my experience w/Eclipse 3.1 was that trying to auto-create Eclipse projects using the Ant build file didn't work very well. So we wound up manually creating the project, setting up the classpath, etc. -- Ken

Re: Lucene and UTF-8

2005-09-27, Ken Krugler
or an extended (not in the BMP) Unicode code point. c. Old code is then used to read the index. It may still make sense to defer this change to 2.0, but it's not at the level of changing the format of an index file. -- Ken -- Ken Krugler Krugle, Inc. +1 530-470-9200
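
The snippet above turns on which strings are actually affected by a switch to standard UTF-8. As a sketch, assuming the older on-disk string encoding behaved like Java's "modified UTF-8" (the encoding `DataOutputStream.writeUTF` produces), the two encodings differ only for U+0000 and for code points outside the BMP; every other string produces identical bytes. The class and helper names below are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedVsStandardUtf8 {
    // Java's "modified UTF-8", as written by DataOutputStream.writeUTF,
    // with writeUTF's 2-byte length prefix stripped off.
    static byte[] modifiedUtf8(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(s);
            out.flush();
            byte[] withLength = bytes.toByteArray();
            return Arrays.copyOfRange(withLength, 2, withLength.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Standard UTF-8 (each code point encoded in 1 to 4 bytes).
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static boolean encodingsAgree(String s) {
        return Arrays.equals(modifiedUtf8(s), standardUtf8(s));
    }

    public static void main(String[] args) {
        String bmpOnly = "Lucene \u00e9\u4e2d";                 // BMP only, no U+0000
        String withNull = "a\u0000b";                           // contains U+0000
        String nonBmp = new String(Character.toChars(0x1D11E)); // U+1D11E, outside the BMP

        System.out.println(encodingsAgree(bmpOnly));  // true
        System.out.println(encodingsAgree(withNull)); // false (C0 80 vs 00)
        System.out.println(modifiedUtf8(nonBmp).length + " vs " + standardUtf8(nonBmp).length); // 6 vs 4
    }
}
```

In the modified form each surrogate of a non-BMP character is encoded on its own as a 3-byte sequence, which is why U+1D11E costs 6 bytes instead of 4.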

Re: Lucene does NOT use UTF-8

2005-08-30, Ken Krugler
On Monday 29 August 2005 19:56, Ken Krugler wrote: Lucene writes strings as a VInt representing the length of the string in Java chars (UTF-16 code units), followed by the character data. But wouldn't UTF-16 mean 2 bytes per character? Yes, UTF-16 means two bytes per code unit. A Unicode
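
To make the length-prefix point concrete, here is a small sketch. The `writeVInt` helper follows the VInt layout described in Lucene's file-format documentation (seven data bits per byte, low-order group first, high bit set when more bytes follow), but the class and helper names are hypothetical. It shows that `String.length()`, the count of Java chars, counts a supplementary character as two UTF-16 code units, so the prefix gives neither the number of code points nor the number of bytes that follow.

```java
import java.io.ByteArrayOutputStream;

public class VIntLengthSketch {
    // VInt: 7 data bits per byte, least-significant group first;
    // the high bit is set on every byte except the last. Assumes value >= 0.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static byte[] vint(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "G clef: " followed by U+1D11E, a character outside the BMP.
        String s = "G clef: " + new String(Character.toChars(0x1D11E));

        System.out.println(s.length());                      // 10 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length())); // 9 (code points)
        System.out.println(vint(s.length()).length);         // 1 byte of length prefix
        System.out.println(vint(300).length);                // 2 bytes: 0xAC 0x02
    }
}
```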

RE: Lucene does NOT use UTF-8.

2005-08-30, Ken Krugler
, August 29, 2005 4:24 PM To: java-dev@lucene.apache.org Subject: Re: Lucene does NOT use UTF-8. Ken Krugler wrote: The remaining issue is dealing with old-format indexes. I think that revving the version number on the segments file would be a good start. This file must be read before any others
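
A sketch of the idea in the quoted message: because the segments file is read before any other file, a version number stored there can tell a reader how to decode strings everywhere else, and a reader that does not recognize the number can refuse the index rather than misread it. The version constants, the enum, and the class name are hypothetical and do not reflect Lucene's actual segments-file layout.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class FormatVersionSketch {
    // HYPOTHETICAL version numbers, for illustration only.
    static final int LEGACY_STRING_FORMAT = 1;
    static final int UTF8_STRING_FORMAT = 2;

    enum StringDecoding { LEGACY_MODIFIED_UTF8, STANDARD_UTF8 }

    // Reads the leading version int of a (hypothetical) segments file and picks
    // the string decoding to use for every other file in the index.
    static StringDecoding chooseDecoding(byte[] segmentsFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(segmentsFile))) {
            int version = in.readInt();
            switch (version) {
                case LEGACY_STRING_FORMAT:
                    return StringDecoding.LEGACY_MODIFIED_UTF8;
                case UTF8_STRING_FORMAT:
                    return StringDecoding.STANDARD_UTF8;
                default:
                    // Unknown (e.g. newer) format: refuse instead of guessing.
                    throw new IllegalArgumentException("unknown segments format version " + version);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseDecoding(new byte[] {0, 0, 0, 1})); // LEGACY_MODIFIED_UTF8
        System.out.println(chooseDecoding(new byte[] {0, 0, 0, 2})); // STANDARD_UTF8
    }
}
```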

Re: Lucene does NOT use UTF-8

2005-08-30, Ken Krugler
Daniel Naber wrote: On Monday 29 August 2005 19:56, Ken Krugler wrote: Lucene writes strings as a VInt representing the length of the string in Java chars (UTF-16 code units), followed by the character data. But wouldn't UTF-16 mean 2 bytes per character? That doesn't seem to be the case

Re: Lucene does NOT use UTF-8.

2005-08-30, Ken Krugler
, -- Ken -- Ken Krugler TransPac Software, Inc. http://www.transpac.com +1 530-470-9200

Re: Lucene does NOT use UTF-8

2005-08-29, Ken Krugler
that create an interoperability problem. -- Ken

Re: Lucene does NOT use UTF-8

2005-08-29, Ken Krugler
for the low (least significant) surrogate. -- Ken
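
The snippet ends with a reference to the low (least significant) surrogate. For context, a supplementary code point in the range U+10000..U+10FFFF is stored in UTF-16 as a high surrogate carrying the upper 10 bits of (code point - 0x10000), followed by a low surrogate carrying the lower 10 bits. The arithmetic is sketched below; the class name is hypothetical, and the JDK's own `Character.highSurrogate`, `Character.lowSurrogate` and `Character.toCodePoint` do the same job.

```java
public class SurrogatePairs {
    // Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates.
    static char highSurrogate(int codePoint) {
        return (char) (0xD800 + ((codePoint - 0x10000) >>> 10)); // upper 10 bits
    }

    static char lowSurrogate(int codePoint) {
        return (char) (0xDC00 + ((codePoint - 0x10000) & 0x3FF)); // lower 10 bits
    }

    // Reassemble the code point from a high/low pair.
    static int combine(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        int gClef = 0x1D11E;
        char hi = highSurrogate(gClef);
        char lo = lowSurrogate(gClef);
        System.out.printf("%04X %04X%n", (int) hi, (int) lo); // D834 DD1E
        System.out.println(combine(hi, lo) == gClef);          // true
    }
}
```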

Re: Lucene does NOT use UTF-8

2005-08-29, Ken Krugler
On Aug 28, 2005, at 11:42 PM, Ken Krugler wrote: I'm not familiar with UTF-8 enough to follow the details of this discussion. I hope other Lucene developers are, so we can resolve this issue. Anyone raising a hand? I could, but recent posts make me think this is heading towards

Re: Lucene and UTF-8

2005-08-29, Ken Krugler
. But it shouldn't be too hard to generate. -- Ken

Re: Lucene does NOT use UTF-8.

2005-08-28, Ken Krugler
readers aren't too interested in the on-going discussion. If anybody else would like to be copied, send me an email. -- Ken