Re: Study Group (WAS Re: Normalized Scoring)

2005-02-07 Thread Kelvin Tan
Hey Paul, thanks for responding. On Sun, 6 Feb 2005 13:26:24 +0100, Paul Elschot wrote:  Tuning the scoring is difficult because one needs to avoid the trap  of optimizing for the test collection and test queries at hand. The  interplays between query structure, coord(), idf() and tf() add to  

RE: Study Group (WAS Re: Normalized Scoring)

2005-02-07 Thread Kelvin Tan
a good forum for IR, please share. Otis --- Kelvin Tan [EMAIL PROTECTED] wrote: Wouldn't it be great if we could form a study group of Lucene folks who want to take the next step? I feel uneasy posting non-Lucene-specific questions to dev or user even if it's related to IR. Feels to me like

Study Group (WAS Re: Normalized Scoring)

2005-02-06 Thread Kelvin Tan
On Sat, 5 Feb 2005 22:10:26 -0800 (PST), Otis Gospodnetic wrote:  Exactly.  Luckily, since then I've learned a bit from lucene-dev  discussions and side IR readings, so some of the topics are making  more sense now.  Otis  --- Kelvin Tan [EMAIL PROTECTED] wrote:  Hi Otis, I was re-reading

Re: Normalized Scoring -- was RE: idf and explain(), was Re: Search and Scoring

2005-02-05 Thread Kelvin Tan
Hi Otis, I was re-reading this whole theoretical thread about idf, scoring, normalization, etc. from last Oct and couldn't help laughing out loud when I read your post, coz it summed up what I was thinking the whole time. I think it's really great to have people like Chuck and Paul (Elschot)

Re: cvs commit: jakarta-lucene-sandbox/contributions/javascript/queryConstructor luceneQueryConstructor.html

2004-05-17 Thread Kelvin Tan
retrieving revision 1.3 retrieving revision 1.4 diff -u -r1.3 -r1.4 --- luceneQueryConstructor.html 17 May 2004 13:04:42 -1.3 +++ luceneQueryConstructor.html 17 May 2004 13:29:24 -1.4 @@ -3,14 +3,16 @@ meta name=Author content=Kelvin Tan titleLucene Query

Re: Bug/enhancement request 21921

2003-09-11 Thread Kelvin Tan
think those two changes are okay. Please submit a patch. I imagine you already made those two changes in your local copy of Lucene and have actually been using it with your highlighting code? Otis --- Kelvin Tan [EMAIL PROTECTED] wrote: On Tue, 9 Sep 2003 15:02:25 -0700 (PDT), Otis

Re: Fwd: Re: [PROPOSAL] Add Lucene Distribution To Mirrors

2003-09-09 Thread Kelvin Tan
On Tue, 9 Sep 2003 15:02:25 -0700 (PDT), Otis Gospodnetic said: What do you think about a 1.3 release? I think we should resolve the JavaCC situation and then make the 1.3 release. Perhaps it would be best to include JavaCC-generated .java files in the CVS, as Doug described the other day.

Text mining/classification

2003-07-27 Thread Kelvin Tan
Here's a pretty cool project at SF http://sourceforge.net/projects/exteca The Exteca platform is an ontology-based technology written in Java for high-quality knowledge management and document categorisation. It can be used in conjunction with search engines. SF Project page says

Normalizer

2003-07-27 Thread Kelvin Tan
I found this SF project doing a search for 'lucene' on SF. http://sourceforge.net/projects/normalizer/ Excerpt says Contextual rule-based text normalization engine written in java, that can be used to implement stemming algorithms or phonetic normalizers. The project includes a french

RE: Lucene

2003-06-06 Thread Kelvin Tan
On Thu, 5 Jun 2003 16:49:18 -0500, Armbrust, Daniel C. said: Maybe you should add another page (or section on the page) that is for people to list the names of their companies or products that are using lucene (with a brief description of what for), as the current page only shows that Lucene is

Re: Indyo

2003-05-27 Thread Kelvin Tan
Bryan, I've removed it from sandbox coz there never was a great deal of interest in it, and the codebase from which I had originated it has moved on, so I had no wish to keep maintaining it. Which aspect of Indyo interests you? Kelvin On Tue, 27 May 2003 22:36:39 -0500, Bryan LaPlante said:

Re: Indyo

2003-05-27 Thread Kelvin Tan
: Kelvin Tan [EMAIL PROTECTED] To: Lucene Developers List [EMAIL PROTECTED] Sent: Tuesday, May 27, 2003 10:56 PM Subject: Re: Indyo Bryan, I've removed it from sandbox coz there never was a great deal of interest in it, and the codebase from which I had originated it has moved on, so I had no wish

[FAQ] Finding number of occurrences of a given word in a document

2003-01-30 Thread Kelvin Tan
Maybe this is a good FAQ entry, under Searching? http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg01735.html Regards, Kelvin The book giving manifesto - http://how.to/sharethisbook - To unsubscribe,

Re: [FAQ] Finding number of occurrences of a given word in a document

2003-01-30 Thread Kelvin Tan
PROTECTED]/msg01738.html I haven't tested it... Otis --- Kelvin Tan [EMAIL PROTECTED] wrote: Maybe this is a good FAQ entry, under Searching? http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg01735.html Regards, Kelvin The book giving manifesto - http://how.to/sharethisbook

Failed Build: Query.java

2003-01-15 Thread Kelvin Tan
Someone's got JDK 1.4 installed...:-) compile: [javac] Compiling 73 source files to C:\checkout\jakarta-lucene\bin\classes [javac] C:\checkout\jakarta-lucene\src\java\org\apache\lucene\search\Query.java:175: cannot resolve symbol [javac] symbol : constructor RuntimeException

Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/document Document.java

2003-01-06 Thread Kelvin Tan
I couldn't help noticing that the code formatting for the new methods is different from the rest of the class (Turbine vs Sun). Shouldn't it be corrected? On 7 Jan 2003 02:29:21 -, [EMAIL PROTECTED] said: otis 2003/01/06 18:29:21 Modified: src/java/org/apache/lucene/document

Re: cvs commit: jakarta-lucene/src/java/org/apache/lucene/document Document.java

2003-01-06 Thread Kelvin Tan
* just thought I'd point it out... On Mon, 6 Jan 2003 18:36:50 -0800 (PST), Otis Gospodnetic said: Be my guest. --- Kelvin Tan [EMAIL PROTECTED] wrote: I couldn't help noticing that the code formatting for the new methods is different from the rest of the class (Turbine vs Sun). Shouldn't

[REPOST] [Benchmarks] Daniel's numbers

2002-12-11 Thread Kelvin Tan
Please see attached for diff to benchmarks.xml for Daniel's numbers. Thanks Dan! Regards, Kelvin The book giving manifesto - http://how.to/sharethisbook cvs -z9 diff benchmarks.xml (in directory C:\checkout\jakarta-lucene\xdocs\) Index: benchmarks.xml

[Benchmarks] Daniel's numbers

2002-12-09 Thread Kelvin Tan
Please see attached for diff to benchmarks.xml for Daniel's numbers. Thanks Dan! Regards, Kelvin The book giving manifesto - http://how.to/sharethisbook cvs -z9 diff benchmarks.xml (in directory C:\checkout\jakarta-lucene\xdocs\) Index: benchmarks.xml

Re: How do I get TermPositions for a given document?

2002-10-24 Thread Kelvin Tan
Dmitry would need commit access to the Lucene-sandbox to add the code in, I believe... Regards, Kelvin On Wed, 23 Oct 2002 23:21:45 -0700, Peter Carlson wrote: Into the sandbox area sounds great. Just add it to the contributions area in a project called TermPositions or something more clever if

Re: Updated Site - Indyo Tutorial

2002-09-17 Thread Kelvin Tan
Peter, The index.xml file in Lucene xdocs should contain a link to it because I modified it with: <subsection name="Indyo"> <p> Indyo is a datasource-independent Lucene indexing framework. </p> <p> A tutorial for using Indyo can be found <a href="indyo/tutorial.html">here</a>. </p> </subsection> It _should_ work,

Re: Configuration RFC

2002-07-14 Thread Kelvin Tan
[snip] Having a framework for dealing with multiple file types (text, HTML, PDF, Word, etc) is critical. There was a proposal that floated around a few months ago which should be dusted off. Indyo, the indexing framework I checked into Sandbox (under the appex project) handles this aspect of

SearchService

2002-06-11 Thread Kelvin Tan
there, and we could discuss refactorings to introduce to make it implementation-neutral. Regards, Kelvin Tan

Re: Proposal for Lucene

2002-05-04 Thread Kelvin Tan
Andy, I'm up for it. I've made further changes to what I previously posted and am keen on getting it into sandbox. K - Original Message - From: Andrew C. Oliver [EMAIL PROTECTED] To: Lucene Developers List [EMAIL PROTECTED] Sent: Saturday, May 04, 2002 10:23 AM Subject: Re: Proposal for

Re: [VOTE] Lucene Sandbox committer nomination

2002-05-01 Thread Kelvin Tan
Great! Thanks for the support. Regards, Kelvin - Original Message - From: Peter Carlson [EMAIL PROTECTED] To: Lucene Developers List [EMAIL PROTECTED] Cc: Kelvin Tan [EMAIL PROTECTED] Sent: Wednesday, May 01, 2002 6:46 AM Subject: Re: [VOTE] Lucene Sandbox committer nomination That's

Re: [VOTE] Lucene Sandbox committer nomination

2002-04-24 Thread Kelvin Tan
PROTECTED] Sent: Wednesday, April 24, 2002 11:56 PM Subject: Re: [VOTE] Lucene Sandbox committer nomination I'm using his contribution, so +1, if he wants it, of course. Otis --- Peter Carlson [EMAIL PROTECTED] wrote: I am nominating Kelvin Tan as a contributor to the Lucene-Sandbox Project

Minor javadoc patch for DateFilter

2002-04-09 Thread Kelvin Tan
Copy and paste javadoc error in DateFilter. Really minor. Please see attached. Regards, Kelvin Tan Relevanz Pte Ltd http://www.relevanz.com 180B Bencoolen St. The Bencoolen, #04-01 S(189648) Tel: 6238 6229 Fax: 6337 4417 cvs diff DateFilter.java (in directory C:\checkout\jakarta-lucene

Search framework

2002-03-01 Thread Kelvin Tan
Torque DB index that Reptile uses. Sounds interesting doesn't it? :) Regards, Kelvin Tan Relevanz Pte Ltd http://www.relevanz.com 180B Bencoolen St. The Bencoolen, #04-01 S(189648) Tel: 238 6229 Fax: 337 4417

Re: Proposal for Lucene

2002-02-26 Thread Kelvin Tan
Mark, My web server is acting all weird -- somehow this zip file refuses to download completely via HTTP (both in IE and Netscape, but downloading via FTP is fine). The workaround is that I've renamed it to http://www.relevanz.com/search_full.z. If your friendly zip program doesn't recognize it

Re: Proposal for Lucene

2002-02-26 Thread Kelvin Tan
the equivalent of the commons-sandbox or turbine-stratum, a workplace kind-of. Regards, Kelvin Has anyone else looked at this? Any objections? -Andy On Sat, 2002-02-09 at 07:58, Kelvin Tan wrote: Here it is. Released under APL (I kinda copied and pasted the license from some Fulcrum code). Some

Re: Proposal for Lucene

2002-02-09 Thread Kelvin Tan
] Sent: Friday, February 08, 2002 9:18 PM Subject: Re: Proposal for Lucene Is this open source? APL'd? Where can I look at it? -Andy On Thu, 2002-02-07 at 20:27, Kelvin Tan wrote: Great suggestions all around, and I'm pretty much in agreement with what's been said. In my app, I've built