Re: Clustering lucene's results

2004-10-07 Thread Dawid Weiss
Hi William,
Ok, here is some demo code I've put together that shows how you can 
achieve clustering of Lucene's results. I hope this will get you started 
on your projects. If you have questions, please don't hesitate to ask -- 
cross posts to carrot2-developers would be a good idea too.

The code (plus the binaries so that you don't have to check out all of 
Carrot2 ;) are at:
http://www.cs.put.poznan.pl/dweiss/tmp/carrot2-lucene.zip

Take a look at Demo.java -- it is the main link between Lucene and
Carrot2. Play with the parameters; I used 100 as the number of search
results to be clustered, so adjust it to your needs.

int start = 0;
int requiredHits = 100;
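
As a rough illustration of what these two parameters control, here is a minimal sketch. It is not the code from Demo.java: the class HitWindow and its select method are hypothetical names, and the ranked hits are modelled as a plain list rather than Lucene objects. It only shows how a start offset and a requested count pick the slice of results passed on to clustering.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Demo.java or Carrot2: selects the slice
// of ranked hits that would be handed to the clustering step.
final class HitWindow {

    /** Returns hits in [start, start + requiredHits), clipped to what is available. */
    static <T> List<T> select(List<T> rankedHits, int start, int requiredHits) {
        if (start < 0 || requiredHits < 0) {
            throw new IllegalArgumentException("start and requiredHits must be non-negative");
        }
        int from = Math.min(start, rankedHits.size());
        // long arithmetic avoids overflow when requiredHits is very large
        int to = (int) Math.min((long) from + requiredHits, rankedHits.size());
        return new ArrayList<>(rankedHits.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            hits.add("doc-" + i);
        }
        int start = 0;
        int requiredHits = 100;
        List<String> toCluster = select(hits, start, requiredHits);
        System.out.println(toCluster.size() + " hits passed to clustering");
    }
}
```

If fewer hits exist than requested, the sketch simply returns what is there instead of failing.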
I hope the code will be self-explanatory.
Good luck,
Dawid
From the readme file:
An example of using Carrot2 components to cluster search
results from Lucene.
===
Prerequisites
--
You must have an index created with Lucene and containing
documents with the following fields: url, title, summary.
The Lucene demo works with exactly these fields -- I just indexed
all of Lucene's source code and documentation using the following line:
mkdir index
java -Djava.ext.dirs=build org.apache.lucene.demo.IndexHTML -create -index index .

The index is now in the 'index' folder.
Remember that the quality of snippets and titles heavily influences the
output of the clustering; in fact, the above example index of Lucene's API
is not very good, because most queries will return nonsensical cluster
labels (see below).

Building Carrot2-Lucene demo

Basically you should have all of Carrot2 source code checked out and
issue the building command:
ant -Dcopy.dependencies=true
All of the required libraries and Carrot2 components will end up
in 'tmp/dist/deps-carrot2-lucene-example-jar' folder.
You can also spare yourself some time and download precompiled binaries
I've put at:
http://www.cs.put.poznan.pl/dweiss/tmp/carrot2-lucene.zip
Now, once you have the compiled binaries, issue the following command
(all on one line of course):
java -Djava.ext.dirs=tmp\dist;tmp\dist\deps-carrot2-lucene-example-jar \
com.dawidweiss.carrot.lucene.Demo index query
The first argument is the location of the Lucene index created before.
The second argument is a query. In the output you should see clusters and
at most three documents from every cluster:

Results for: query
Timings: index opened in: 0,181s, search: 0,13s, clustering: 0,721s
 : Search Lucene Rc1 Dev API
- F:/Repositories/cvs.apache.org/jakarta-lucene/build/docs/api/org/apache/lucene/search/class-use/Query.html
  Uses of Class org.apache.lucene.search.Query (Lucene 1.5-rc1-dev API)
- F:/Repositories/cvs.apache.org/jakarta-lucene/build/docs/api/org/apache/lucene/search/package-summary.html
  org.apache.lucene.search (Lucene 1.5-rc1-dev API)
- F:/Repositories/cvs.apache.org/jakarta-lucene/build/docs/api/org/apache/lucene/search/package-use.html
  Uses of Package org.apache.lucene.search (Lucene 1.5-rc1-dev API)
  (and 19 more)

 : Jakarta Lucene
- F:/Repositories/cvs.apache.org/jakarta-lucene/src/java/overview.html
  Jakarta Lucene API
- F:/Repositories/cvs.apache.org/jakarta-lucene/docs/whoweare.html
  Jakarta Lucene - Who We Are - Jakarta Lucene
- F:/Repositories/cvs.apache.org/jakarta-lucene/docs/index.html
  Jakarta Lucene - Overview - Jakarta Lucene
  (and 12 more)
If you look at the source code of Demo.java, there are plenty of things
apt for customization -- the number of results from each cluster and the
number of displayed clusters (I would cut it to some reasonable number,
say 10 or 15 -- the further a cluster is from the top, the less likely it
is to be important). Also keep in mind that some Carrot2 components
produce hierarchical clusters. This demonstration works with the flat
version of the Lingo algorithm, so you don't need to worry about that.
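
The two limits mentioned above (number of displayed clusters, number of documents shown per cluster) could be sketched roughly as follows. This is an illustration only: ClusterPrinter, Doc and Cluster are hypothetical types, not Carrot2 classes, and the output layout merely imitates the sample output shown earlier in the readme.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration; these are not Carrot2 classes.
final class ClusterPrinter {

    static final class Doc {
        final String url;
        final String title;
        Doc(String url, String title) { this.url = url; this.title = title; }
    }

    static final class Cluster {
        final String label;
        final List<Doc> docs;
        Cluster(String label, List<Doc> docs) { this.label = label; this.docs = docs; }
    }

    /** Renders at most maxClusters flat clusters with at most maxDocs documents each. */
    static String render(List<Cluster> clusters, int maxClusters, int maxDocs) {
        StringBuilder out = new StringBuilder();
        int shownClusters = Math.min(maxClusters, clusters.size());
        for (Cluster c : clusters.subList(0, shownClusters)) {
            out.append(" : ").append(c.label).append('\n');
            int shownDocs = Math.min(maxDocs, c.docs.size());
            for (Doc d : c.docs.subList(0, shownDocs)) {
                out.append("- ").append(d.url).append('\n');
                out.append("  ").append(d.title).append('\n');
            }
            int hidden = c.docs.size() - shownDocs;
            if (hidden > 0) {
                out.append("  (and ").append(hidden).append(" more)\n");
            }
            out.append('\n'); // blank line between clusters
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("docs/index.html", "Jakarta Lucene - Overview"),
                new Doc("docs/whoweare.html", "Jakarta Lucene - Who We Are"),
                new Doc("src/java/overview.html", "Jakarta Lucene API"),
                new Doc("docs/resources.html", "Jakarta Lucene - Resources"));
        List<Cluster> clusters = Arrays.asList(new Cluster("Jakarta Lucene", docs));
        // Show at most 10 clusters and 3 documents per cluster.
        System.out.print(render(clusters, 10, 3));
    }
}
```

Clusters are assumed to arrive already ordered by importance, so cutting the list at maxClusters drops the least important ones.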

Hope this gets you started with using Carrot2 and Lucene.
Please let me know about any successes or failures.
Dawid
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Clustering lucene's results

2004-10-07 Thread Albert Vila
That's great, thanks Dawid.
Just one question: how can I modify your code to use the
carrot2-output-xsltrenderer to output the clustering results in an HTML page?

Can you provide an example?
Thanks

--
Albert Vila
R&D Project Director
http://www.imente.com
902 933 242
[iMente - Information with more benefits]


Re: Clustering lucene's results

2004-10-07 Thread William W
Thanks Dawid ! :)



Re: Clustering lucene's results

2004-10-07 Thread Dawid Weiss
No problem. Let people know if it worked for you -- I look forward to
hearing about your experiences (good or bad).

Dawid

RE: Clustering lucene's results

2004-09-23 Thread William W
Hi Dawid,
I would like to use Carrot2 with Lucene. Do you have any examples?
Thanks a lot,
William.

From: Dawid Weiss [EMAIL PROTECTED]
Reply-To: Lucene Users List [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Clustering lucene's results
Date: Thu, 23 Sep 2004 13:36:03 +0200
Dear all,
I saw a post about an attempt to integrate Carrot2 with Lucene. It was a 
while ago, so I'm curious if any outcome has been achieved.

Anyway, as the project coordinator I can offer my help with such 
integration; if you're looking for some ready-to-use code then there is a 
clustering plugin for Nutch that integrates one of the clustering 
algorithms from Carrot2 with Nutch; I'm sure porting it to Lucene wouldn't 
be a big problem.

Regards,
Dawid


Re: Clustering lucene's results

2004-09-23 Thread Dawid Weiss
Hi William,
No, I don't have examples because I have never used Lucene directly. It
would help if you could provide me with a sample index and an API that
executes a query on this index (I need document titles, summaries or
snippets, and an anchor, i.e. an identifier, which can be a URL).

Send me such a snippet and I'll try to write the integration code with 
Lucene. It is only a matter of writing a simple InputComponent instance 
and this is really trivial (see Nutch's plugin code).

Dawid


Re: Clustering lucene's results

2004-09-23 Thread Andrzej Bialecki
Hi Dawid :-)
I believe the approach to this component should be that you first 
initialize it by reading a mapping of Lucene index field names to 
logical names (metadata) like title, url, body, etc. The reason is 
that each index uses its own metadata schema, i.e. in Lucene-speak, the 
field names.
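
A minimal sketch of such a mapping, under stated assumptions: FieldMapping is a hypothetical class (not part of Lucene or Carrot2), a stored document is modelled as a plain map of field name to value, and the index field names "headline" and "path" are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: maps the logical names the clustering input needs
// (title, url, summary, ...) to the field names a particular index uses.
final class FieldMapping {

    private final Map<String, String> logicalToIndexField = new HashMap<>();

    /** Registers which index field holds the given logical field. */
    FieldMapping map(String logicalName, String indexFieldName) {
        logicalToIndexField.put(logicalName, indexFieldName);
        return this;
    }

    /**
     * Reads a logical field from a stored document. Returns an empty string
     * when the document has no value for the mapped index field.
     */
    String get(Map<String, String> storedFields, String logicalName) {
        String indexField = logicalToIndexField.get(logicalName);
        if (indexField == null) {
            throw new IllegalArgumentException("No mapping for logical field: " + logicalName);
        }
        String value = storedFields.get(indexField);
        return value == null ? "" : value;
    }

    public static void main(String[] args) {
        // Assumed schema: this index stores titles in "headline" and URLs in "path".
        FieldMapping mapping = new FieldMapping()
                .map("title", "headline")
                .map("url", "path");

        Map<String, String> doc = new HashMap<>();
        doc.put("headline", "Jakarta Lucene API");
        doc.put("path", "src/java/overview.html");

        System.out.println(mapping.get(doc, "title") + " <" + mapping.get(doc, "url") + ">");
    }
}
```

Initializing the component with one such mapping per index keeps the clustering side independent of each index's schema.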

Moreover, when you execute a query you get just a document id plus its
score. It's up to you to build a snippet. There is code in the
jakarta-lucene-sandbox CVS repo (highlighter) that creates snippets from
the query and the hit list; take a look at it.
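
To make the idea of "building a snippet" concrete, here is a deliberately naive sketch. It is not the sandbox highlighter and does not use its API: NaiveSnippet is a hypothetical class that just cuts a fixed-width window of text around the first occurrence of a single query term.

```java
// Hypothetical, deliberately naive snippet builder; the highlighter in the
// Lucene sandbox is far more capable (analysis, scoring, multiple terms).
final class NaiveSnippet {

    /** Case-insensitive search for term in text; returns -1 if absent. */
    static int indexOfIgnoreCase(String text, String term) {
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns about `width` characters of text centred on the first match of
     * term, or the beginning of the text when the term does not occur.
     */
    static String around(String text, String term, int width) {
        if (width <= 0 || text.isEmpty()) {
            return "";
        }
        if (text.length() <= width) {
            return text;
        }
        int hit = indexOfIgnoreCase(text, term);
        int centre = hit < 0 ? 0 : hit + term.length() / 2;
        int from = Math.max(0, centre - width / 2);
        from = Math.min(from, text.length() - width); // keep the window inside the text
        int to = from + width;
        String prefix = from > 0 ? "..." : "";
        String suffix = to < text.length() ? "..." : "";
        return prefix + text.substring(from, to) + suffix;
    }

    public static void main(String[] args) {
        String body = "Jakarta Lucene is a high-performance, full-featured text search "
                + "engine library written entirely in Java.";
        System.out.println(around(body, "search", 40));
    }
}
```

Since snippet quality heavily influences cluster labels, a real integration would more likely rely on the highlighter than on a window like this.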

The basic usage scenario is that you open the IndexReader (either using 
directory name as a String or a Directory instance), and then create a 
Query instance, usually using QueryParser, and finally you search using 
IndexSearcher. You get a list of Hits, which you can use to get scores, 
and the contents of the documents. Take a look at the IndexFiles and 
SearchFiles classes in org.apache.lucene.demo package (under /src/demo).

--
Best regards,
Andrzej Bialecki
-
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-
FreeBSD developer (http://www.freebsd.org)


Re: Clustering lucene's results

2004-09-23 Thread Dawid Weiss
Hi Andrzej :)
Yep, OK, I'll take a look at it after I come back from abroad (next
week). I just wanted to save myself some time and have some already
written code that fetches the information we need for clustering; you
know what I mean, I'm sure. But I'll start from scratch when I get back.

D.


Re: Clustering lucene's results

2004-09-23 Thread William W
Hi Dawid,
The demos (under /src/demo) are very good. They have the basic usage 
scenario.
Thanks Andrzej.
William.




Re: Clustering lucene's results

2004-09-23 Thread Dawid Weiss
Yeah... I know there have to be demos... I tried to be lazy, you know :)
Anyway, as I told Andrzej -- I'll take a look at it (with pleasure)
after I come back. I don't think the delay will matter much.
And if it does, ask Andrzej -- he has excellent experience with both
projects -- he's just very shy by nature and doesn't talk much, hehe.

D.