Re: Clustering lucene's results
Hi William,

OK, here is some demo code I've put together that shows how you can cluster Lucene's search results. I hope it will get you started on your projects. If you have questions, please don't hesitate to ask; cross-posting to carrot2-developers would be a good idea too. The code (plus the binaries, so that you don't have to check out all of Carrot2 ;) is at:

http://www.cs.put.poznan.pl/dweiss/tmp/carrot2-lucene.zip

Take a look at Demo.java; it is the main link between Lucene and Carrot2. Play with the parameters: I used 100 as the number of search results to be clustered. Adjust it to your needs:

    int start = 0;
    int requiredHits = 100;

I hope the code is self-explanatory.

Good luck,
Dawid

From the readme file:

An example of using Carrot2 components to cluster search results from Lucene.
===

Prerequisites
--

You must have an index created with Lucene that contains documents with the following fields: url, title, summary. The Lucene demo works with exactly these fields; I just indexed all of Lucene's source code and documentation using the following commands:

    mkdir index
    java -Djava.ext.dirs=build org.apache.lucene.demo.IndexHTML -create -index index .

The index is now in the 'index' folder. Remember that the quality of snippets and titles heavily influences the clustering output; in fact, the example index of Lucene's API above is not very good, because most queries return nonsensical cluster labels (see below).

Building Carrot2-Lucene demo
--

Basically, you need all of the Carrot2 source code checked out; then issue the build command:

    ant -Dcopy.dependencies=true

All of the required libraries and Carrot2 components will end up in the 'tmp/dist/deps-carrot2-lucene-example-jar' folder.
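Demo.java selects the hits to cluster with `start` and `requiredHits`. A minimal sketch of how such a window might be clamped to the number of hits a query actually returns; `HitWindow` and `bounds` are hypothetical names for illustration, not code taken from Demo.java.

```java
/**
 * Hypothetical helper (not part of Demo.java): computes the half-open range
 * [from, to) of hit indices to pass on for clustering, given the
 * start/requiredHits parameters and the total number of hits.
 */
public final class HitWindow {
    private HitWindow() {}

    public static int[] bounds(int start, int requiredHits, int totalHits) {
        if (start < 0 || requiredHits < 0 || totalHits < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        // Never start beyond the last hit, and never read past it either.
        int from = Math.min(start, totalHits);
        int to = (int) Math.min((long) from + requiredHits, totalHits);
        return new int[] {from, to};
    }

    public static void main(String[] args) {
        int start = 0;
        int requiredHits = 100;
        // Suppose the query matched only 42 documents.
        int[] window = bounds(start, requiredHits, 42);
        System.out.println("clustering hits " + window[0] + " (inclusive) to " + window[1] + " (exclusive)");
    }
}
```

Clamping keeps a large `requiredHits` safe for queries with few matches, so the parameter can be tuned without worrying about running off the end of the hit list.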
You can also save yourself some time and download the precompiled binaries I've put at:

http://www.cs.put.poznan.pl/dweiss/tmp/carrot2-lucene.zip

Now, once you have the compiled binaries, issue the following command (all on one line, of course):

    java -Djava.ext.dirs=tmp\dist;tmp\dist\deps-carrot2-lucene-example-jar \
         com.dawidweiss.carrot.lucene.Demo index query

The first argument is the location of the Lucene index created earlier; the second argument is the query. The output should show the clusters and at most three documents from each cluster:

    Results for: query
    Timings: index opened in: 0,181s, search: 0,13s, clustering: 0,721s

    : Search Lucene Rc1 Dev API
     - F:/Repositories/cvs.apache.org/jakarta-lucene/build/docs/api/org/apache/lucene/search/class-use/Query.html
       Uses of Class org.apache.lucene.search.Query (Lucene 1.5-rc1-dev API)
     - F:/Repositories/cvs.apache.org/jakarta-lucene/build/docs/api/org/apache/lucene/search/package-summary.html
       org.apache.lucene.search (Lucene 1.5-rc1-dev API)
     - F:/Repositories/cvs.apache.org/jakarta-lucene/build/docs/api/org/apache/lucene/search/package-use.html
       Uses of Package org.apache.lucene.search (Lucene 1.5-rc1-dev API)
       (and 19 more)

    : Jakarta Lucene
     - F:/Repositories/cvs.apache.org/jakarta-lucene/src/java/overview.html
       Jakarta Lucene API
     - F:/Repositories/cvs.apache.org/jakarta-lucene/docs/whoweare.html
       Jakarta Lucene - Who We Are - Jakarta Lucene
     - F:/Repositories/cvs.apache.org/jakarta-lucene/docs/index.html
       Jakarta Lucene - Overview - Jakarta Lucene
       (and 12 more)

If you look at the source code of Demo.java, you'll find plenty of things that can be customized: the number of results shown from each cluster and the number of displayed clusters (I would cut it to some reasonable number, say 10 or 15; the further a cluster is from the top, the less likely it is to be important). Also keep in mind that some Carrot2 components produce hierarchical clusters.
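The display limits described above (only the top clusters, at most three documents each, then an "(and N more)" note) can be sketched as follows. This is a hypothetical illustration, not Carrot2's API: `ClusterPrinter` and its `Cluster` record are made-up names, the clusters are assumed to arrive already ranked, and only URLs are printed (no titles) to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Carrot2's API): formats at most maxClusters flat
// clusters, each with at most maxDocs documents, in the style of the sample output.
public final class ClusterPrinter {
    /** A flat cluster: a label plus the URLs of its documents, best first. */
    public record Cluster(String label, List<String> docUrls) {}

    public static List<String> format(List<Cluster> clusters, int maxClusters, int maxDocs) {
        List<String> lines = new ArrayList<>();
        // Clusters are assumed to be in ranking order, so the cut keeps the top ones.
        for (Cluster c : clusters.subList(0, Math.min(maxClusters, clusters.size()))) {
            lines.add(": " + c.label());
            List<String> docs = c.docUrls();
            int shown = Math.min(maxDocs, docs.size());
            for (int i = 0; i < shown; i++) {
                lines.add(" - " + docs.get(i));
            }
            // Summarize whatever was cut off for this cluster.
            if (docs.size() > shown) {
                lines.add("   (and " + (docs.size() - shown) + " more)");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Cluster> clusters = List.of(
            new Cluster("Search", List.of("a.html", "b.html", "c.html", "d.html")),
            new Cluster("Overview", List.of("e.html")));
        format(clusters, 10, 3).forEach(System.out::println);
    }
}
```

Returning lines instead of printing directly makes it easy to reuse the same truncation for a console, a log, or an HTML renderer.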
This demonstration uses the flat version of the Lingo algorithm, so you don't need to worry about that.

Hope this gets you started with Carrot2 and Lucene. Please let me know about any successes or failures.

Dawid

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Clustering lucene's results
That's great, thanks Dawid. Just one question: how can I modify your code to use carrot2-output-xsltrenderer to output the clustering results as an HTML page? Could you provide an example?

Thanks

-- 
Albert Vila
R&D Project Director
http://www.imente.com
902 933 242
[iMente: information with more benefits]
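The question above is about XSLT rendering. A minimal sketch of the general mechanism using only the JDK's built-in XSLT support (javax.xml.transform); it does not use carrot2-output-xsltrenderer, and the `<clusters>` XML shape and the stylesheet are assumptions made up for illustration, not Carrot2's actual output format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: renders an assumed cluster XML document to an HTML list via XSLT.
public final class ClustersToHtml {
    // Assumed input shape: <clusters><cluster label="..."><doc url="..." title="..."/></cluster></clusters>
    private static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/clusters'>"
        + "<ul><xsl:for-each select='cluster'>"
        + "<li><xsl:value-of select='@label'/><ul>"
        + "<xsl:for-each select='doc'><li><a href='{@url}'><xsl:value-of select='@title'/></a></li></xsl:for-each>"
        + "</ul></li></xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static String render(String clustersXml) {
        try {
            // Compile the stylesheet and apply it to the cluster XML.
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter html = new StringWriter();
            t.transform(new StreamSource(new StringReader(clustersXml)), new StreamResult(html));
            return html.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT rendering failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<clusters><cluster label='Jakarta Lucene'>"
            + "<doc url='docs/index.html' title='Jakarta Lucene - Overview'/>"
            + "</cluster></clusters>";
        System.out.println(render(xml));
    }
}
```

Keeping the presentation in a stylesheet means the HTML layout can change without touching the Java code that produces the clusters; in a real setup the stylesheet would normally live in its own .xsl file.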
Re: Clustering lucene's results
Thanks Dawid! :)

From: Dawid Weiss [EMAIL PROTECTED]
Subject: Re: Clustering lucene's results
Date: Thu, 07 Oct 2004 10:39:26 +0200
Re: Clustering lucene's results
No problem. Let people know if it worked for you -- I look forward to hearing your experiences (good or bad).

Dawid
RE: Clustering lucene's results
Hi Dawid,

I would like to use Carrot2 with Lucene. Do you have any examples?

Thanks a lot,
William.

From: Dawid Weiss [EMAIL PROTECTED]
Reply-To: Lucene Users List [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Clustering lucene's results
Date: Thu, 23 Sep 2004 13:36:03 +0200

Dear all,

I saw a post about an attempt to integrate Carrot2 with Lucene. It was a while ago, so I'm curious whether anything came of it. Anyway, as the project coordinator I can offer my help with such an integration. If you're looking for ready-to-use code, there is a clustering plugin for Nutch that integrates one of the Carrot2 clustering algorithms with Nutch; I'm sure porting it to Lucene wouldn't be a big problem.

Regards,
Dawid
Re: Clustering lucene's results
Hi William,

No, I don't have examples, because I have never used Lucene directly. If you can provide me with a sample index and code that executes a query against it (I need document titles, summaries or snippets, and an anchor (identifier), which can be a URL), send it to me and I'll try to write the integration code for Lucene. It is only a matter of writing a simple InputComponent, which is really trivial (see the Nutch plugin code).

Dawid
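The per-document data listed above (a title, a summary or snippet, and an anchor such as a URL) can be captured in a small record. A minimal sketch, assuming the summary is preferred when both a summary and a snippet exist; `ClusterInput` is a hypothetical name, not Carrot2's InputComponent API.

```java
import java.util.Objects;

// Hypothetical per-document payload handed to a clustering component.
public record ClusterInput(String anchor, String title, String text) {
    public ClusterInput {
        // The anchor identifies the document (e.g. its URL) and is mandatory.
        Objects.requireNonNull(anchor, "anchor");
        if (anchor.isBlank()) {
            throw new IllegalArgumentException("anchor must not be blank");
        }
        // Missing title or text becomes an empty string rather than null.
        title = title == null ? "" : title;
        text = text == null ? "" : text;
    }

    // Assumption: use the stored summary when present, else a query-dependent snippet.
    public static ClusterInput of(String url, String title, String summary, String snippet) {
        String text = (summary != null && !summary.isBlank()) ? summary : snippet;
        return new ClusterInput(url, title, text);
    }
}
```

Validating the anchor up front matters because it is what lets a clustered document be linked back to the original hit.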
Re: Clustering lucene's results
Hi Dawid :-)

I believe the approach for this component should be to first initialize it by reading a mapping from Lucene index field names to logical names (metadata) such as title, url, body, etc. The reason is that each index uses its own metadata schema, i.e. in Lucene-speak, its own field names.

Moreover, when you execute a query you get just a document id plus its score; it's up to you to build a snippet. There is code in the jakarta-lucene-sandbox CVS repository (the highlighter) that creates snippets from the query and the hit list; take a look at it.

The basic usage scenario is that you open an IndexReader (using either a directory name as a String or a Directory instance), then create a Query instance, usually with QueryParser, and finally search using IndexSearcher. You get back Hits, which you can use to get the scores and the contents of the documents. Take a look at the IndexFiles and SearchFiles classes in the org.apache.lucene.demo package (under /src/demo).

-- 
Best regards,
Andrzej Bialecki
- Software Architect, System Integration Specialist
- CEN/ISSS EC Workshop, ECIMF project chair
- EU FP6 E-Commerce Expert/Evaluator
- FreeBSD developer (http://www.freebsd.org)

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
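The field-mapping idea above, plus a simple snippet, can be sketched as follows. This is a hypothetical illustration: stored fields are modelled as a plain `Map` instead of a Lucene `Document`, the example field name "path" is made up, and the snippet routine is a naive word-window stand-in, not the sandbox highlighter.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: maps logical names (title, url, body) to one index's own
// field names, and builds a naive snippet around the first query-term match.
public final class FieldMapping {
    private final Map<String, String> logicalToField;

    public FieldMapping(Map<String, String> logicalToField) {
        this.logicalToField = new HashMap<>(logicalToField);
    }

    // Looks up a logical field (e.g. "url") through the index-specific field name.
    public String get(Map<String, String> storedFields, String logicalName) {
        String field = logicalToField.get(logicalName);
        return field == null ? null : storedFields.get(field);
    }

    // Returns up to `radius` words on each side of the first word matching a
    // query term; queryTerms are expected in lower case.
    public static String snippet(String body, Set<String> queryTerms, int radius) {
        String[] words = body.trim().split("\\s+");
        int hit = -1;
        for (int i = 0; i < words.length && hit < 0; i++) {
            // Compare case-insensitively, ignoring punctuation around the word.
            String w = words[i].toLowerCase(Locale.ROOT).replaceAll("\\W", "");
            if (queryTerms.contains(w)) {
                hit = i;
            }
        }
        int center = Math.max(hit, 0); // no match: fall back to the start of the body
        int from = Math.max(0, center - radius);
        int to = Math.min(words.length, center + radius + 1);
        return String.join(" ", Arrays.copyOfRange(words, from, to));
    }
}
```

Keeping the mapping in configuration rather than code is what lets the same component work against indexes with different schemas; a real integration would replace `snippet` with a proper highlighter.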
Re: Clustering lucene's results
Hi Andrzej :)

Yep, OK, I'll take a look at it after I come back from abroad (next week). I just wanted to save myself some time by getting already-written code that fetches the information we need for clustering; I'm sure you know what I mean. But I'll start from scratch when I get back.

D.
Re: Clustering lucene's results
Hi Dawid,

The demos (under /src/demo) are very good; they cover the basic usage scenario. Thanks, Andrzej.

William.
Re: Clustering lucene's results
Yeah... I knew there had to be demos... I tried to be lazy, you know :) Anyway, as I told Andrzej, I'll take a look at it (with pleasure) after I come back. I don't think the delay will matter much. And if it does, ask Andrzej: he has excellent experience with both projects; he's just very shy by nature and doesn't talk much, hehe.

D.