Re: Lucene demo ideas?
On Wed, Sep 17, 2003 at 08:00:42AM -0400, Erik Hatcher wrote:
> I'm about to start some refactorings on the web application demo that
> ships with Lucene to show off its features and be usable more easily
> and cleanly out of the box - i.e. just drop into Tomcat's webapps
> directory and go.
>
> Does anyone have any suggestions on what they'd like to see in the demo
> app?

One odd thought (may be out of scope) is to put together a google-flavored query language, since most users are going to be unfamiliar with the default Lucene query language. Lucene doesn't really match google, but something google-flavored might be better at showing off Lucene's features in the demo.

--
Steven J. Owens
[EMAIL PROTECTED]

"I'm going to make broad, sweeping generalizations and strong, declarative statements, because otherwise I'll be here all night and this document will be four times longer and much less fun to read. Take it all with a grain of salt." - Me at http://darksleep.com

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
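One way to read "google-flavored" is sketched below, purely as an illustration of Steven's suggestion. It assumes the flavor means: every bare term is required, a leading `-` excludes a term, and double quotes mark a phrase. The output uses Lucene's standard query syntax (`+` for required, `-` for prohibited, quotes for phrases), so it could be handed to the regular query parser. The class name `GoogleStyleQuery` is made up for this sketch, and it needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene): rewrites a google-style query
// string into Lucene's standard query syntax.
class GoogleStyleQuery {

    static String toLucene(String input) {
        List<String> clauses = new ArrayList<>();
        int n = input.length();
        int i = 0;
        while (i < n) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            // A leading '-' prohibits the clause; anything else is required.
            String prefix = "+";
            if (c == '-') {
                prefix = "-";
                i++;
            }
            String clause;
            if (i < n && input.charAt(i) == '"') {
                // Quoted phrase: keep everything up to the closing quote.
                int end = input.indexOf('"', i + 1);
                if (end < 0) {
                    end = n; // unterminated quote: take the rest of the input
                }
                clause = "\"" + input.substring(i + 1, end) + "\"";
                i = Math.min(end + 1, n);
            } else {
                // Bare term: runs up to the next whitespace character.
                int end = i;
                while (end < n && !Character.isWhitespace(input.charAt(end))) {
                    end++;
                }
                clause = input.substring(i, end);
                i = end;
            }
            // Skip empty terms (a lone '-') and empty phrases.
            if (!clause.isEmpty() && !clause.equals("\"\"")) {
                clauses.add(prefix + clause);
            }
        }
        return String.join(" ", clauses);
    }
}
```

For example, `GoogleStyleQuery.toLucene("lucene demo -jsp \"web application\"")` yields `+lucene +demo -jsp +"web application"`. The sketch does not escape characters that are special to Lucene's query parser (such as `:` or `*`); a real demo would have to decide whether to escape them or pass them through.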
Re: Lucene demo ideas?
Erik Hatcher wrote:
> [...]
> - Index text and HTML files. Any others? I don't want to get into
> putting too many dependencies in though - let's keep it relatively
> simple, although still demonstrative. Allow search filtering by last
> modified date range and document type (extension).

If I may plug our code again ;-)

Docco (http://tockit.sf.net) contains a framework for document handlers, with implementations for plain text, HTML, XML and OpenOffice based on JDK 1.4, and plugins for PDFBox, POI and Multivalent. There is also a notion of file mappings (i.e. mapping from a match on a FileFilter to a handler), and we plan to add code to mix in external information like meta-data stores or EAs from advanced file systems.

It is available on SF (within http://sf.net/projects/toscanaj) and is at the moment BSD-style licensed. We would be happy to contribute bits of that, and thanks to the plugin architecture the dependencies should be controllable. Admittedly the plugin loader is still a hack, but it works.

Peter
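The file-mapping idea Peter describes (a FileFilter match selects a document handler) might look roughly like the sketch below. This is not Docco's actual API: `DocumentHandler`, `HandlerRegistry` and `PlainTextHandler` are names invented here for illustration, and only `java.io.FileFilter` is a real JDK type. It assumes Java 8 or later.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface; Docco's real handler API may look quite different.
interface DocumentHandler {
    String extractText(File file) throws IOException;
}

// Hypothetical registry of file mappings: FileFilter -> DocumentHandler.
class HandlerRegistry {
    // Insertion order is kept so that the first matching filter wins.
    private final Map<FileFilter, DocumentHandler> mappings = new LinkedHashMap<>();

    void register(FileFilter filter, DocumentHandler handler) {
        mappings.put(filter, handler);
    }

    // Returns the handler of the first filter that accepts the file, or null.
    DocumentHandler handlerFor(File file) {
        for (Map.Entry<FileFilter, DocumentHandler> mapping : mappings.entrySet()) {
            if (mapping.getKey().accept(file)) {
                return mapping.getValue();
            }
        }
        return null;
    }

    // Convenience filter that matches on the file extension, ignoring case.
    static FileFilter extension(String ext) {
        return file -> file.getName().toLowerCase().endsWith("." + ext);
    }
}

// Simplest possible handler: the file content is the text (UTF-8 assumed).
class PlainTextHandler implements DocumentHandler {
    @Override
    public String extractText(File file) throws IOException {
        return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
    }
}
```

A handler for PDF or Word files would be registered the same way, which keeps the heavier dependencies out of the core and matches the "plugins" point above.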
Re: Lucene demo ideas?
On Wednesday 17 September 2003 07:07, Erik Hatcher wrote:
> On Wednesday, September 17, 2003, at 08:43 AM, Killeen, Tom wrote:
> > I would suggest XML as well.
>
> Again, I'd like to hear more about how you'd do this generically. Tell
> me what the field names and values would correspond to when presented
> with an XML file.

Perhaps just one generic "content" field, which would contain tokenized content from all XML segments. That could be done easily and efficiently with just SAX event handling? Since it's a simple demo, you can't get much simpler than that, but it should still be fairly useful?

Attributes could/should be ignored by default; common practice for XML markup seems to be for attributes not to contain any content that would make sense to index. So I'd think just stripping out all tags (and comments, PIs etc.) might be a reasonable, plain and simple approach for the demo app.

-+ Tatu +-
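The approach Tatu outlines can be sketched with the JDK's own SAX parser: collect only character data, and let tags, attributes, comments and processing instructions fall away. The class name `XmlContentExtractor` is made up for this sketch; the returned string is what would go into the single "content" field. Adding it to a Lucene document is left out, since that part depends on the Lucene version in use.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: flattens an XML document into one block of text.
class XmlContentExtractor extends DefaultHandler {
    private final StringBuilder content = new StringBuilder();

    // Character data is the only thing collected; attributes are ignored
    // because startElement is not overridden.
    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length);
    }

    // Separate the text of adjacent elements so that words do not run together.
    @Override
    public void endElement(String uri, String localName, String qName) {
        content.append(' ');
    }

    static String extract(String xml) {
        XmlContentExtractor handler = new XmlContentExtractor();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        // Collapse runs of whitespace left over from indentation and tags.
        return handler.content.toString().trim().replaceAll("\\s+", " ");
    }
}
```

Comments and processing instructions never reach `characters`, so they are dropped without extra code.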
Re: Lucene demo ideas?
Yeah, that would be great!

----- Original Message -----
From: "Jeff Linwood" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, September 17, 2003 5:15 PM
Subject: Re: Lucene demo ideas?

> Paging would be great for the results.
>
> Jeff
> [...]
Re: Lucene demo ideas?
I would have the code ready if wanted...

----- Original Message -----
From: "Pitre, Russell" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, September 17, 2003 2:21 PM
Subject: RE: Lucene demo ideas?

I know this may be far fetched, but how about being able to index .jsp's... I know this is a spindle thing, but it seems a lot of people need this functionality.

My suggestion

Russ

[...]
Re: Lucene demo ideas?
Paging would be great for the results.

Jeff

----- Original Message -----
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 17, 2003 7:00 AM
Subject: Lucene demo ideas?

> [...]
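For what it's worth, the paging arithmetic itself is small. The sketch below works on any list of hits and is independent of Lucene; `ResultPager` is a made-up name, and page numbers are assumed to start at 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: slices a full hit list into pages.
class ResultPager {

    // Returns the hits on the given 1-based page; empty if the page is past the end.
    static <T> List<T> page(List<T> hits, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        // Use long arithmetic so very large page numbers cannot overflow.
        long from = (long) (pageNumber - 1) * pageSize;
        if (from >= hits.size()) {
            return new ArrayList<>();
        }
        int to = (int) Math.min(from + pageSize, hits.size());
        return new ArrayList<>(hits.subList((int) from, to));
    }

    // Number of pages needed for totalHits results (rounded up).
    static int pageCount(int totalHits, int pageSize) {
        return (totalHits + pageSize - 1) / pageSize;
    }
}
```

With 25 hits and a page size of 10, page 3 holds hits 21 to 25 and `pageCount` returns 3.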
Re: Lucene demo ideas?
On Wednesday, September 17, 2003, at 09:21 AM, Pitre, Russell wrote:
> I know this may be far fetched, but how about being able to index
> .jsp's... I know this is a spindle thing, but it seems a lot of people
> need this functionality.

Like I communicated in a previous thread, indexing JSP's just has a "smell" to it for me. I can't argue with the pragmatic way others have done it by crawling, but I don't think of JSP's as "content" and I'd rather index actual content, that may or may not be later presented within a JSP.

Erik
Re: Lucene demo ideas?
On Wednesday, September 17, 2003, at 09:31 AM, Bryan LaPlante wrote:
> I would like to see the taglib for searching the index in the demo.
> There is an html form page and result page already built for the taglib
> that allows you to change search params and demonstrates a fair amount
> of the search capability of Lucene.

Bryan, no offense... but I won't be using the taglib in the demo. I just don't feel accessing a Lucene index via a taglib is the right way to do things. Coupling an index to JSP in that manner is too tight for my tastes. What happens if you want to use Velocity for presentation? Or a Swing app? See what I mean?

Erik
Re: Lucene demo ideas?
Erik Hatcher wrote:
> On Wednesday, September 17, 2003, at 08:42 AM, Ben Litchfield wrote:
> > What, no PDF files!!
>
> Haha! http://www.pdfbox.org
>
> And I've used pdfbox before - it's cool. And I'm cool with adding PDF
> and Word indexing to the demo personally, but I didn't want to increase
> the "weight" of the demo application. If folks feel strongly about it
> then I'll incorporate it.

A word of warning: PDFBox is fantastic, I agree - but some PDFs are not so... In my application I experienced numerous hangs when PDFBox would start parsing some PDFs (I can send the files to Ben if required), and then got stuck in an infinite wait somewhere...

So I came up with a workaround: I run the parser in a separate thread, while waiting in the main thread, and then after a certain timeout I kill the processing thread and return.

--
Best regards,
Andrzej Bialecki

Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
FreeBSD developer (http://www.freebsd.org)
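Andrzej does not show his code, so the following is only a guess at the shape of such a workaround, written against `java.util.concurrent` (which arrived in Java 5, after this thread; on JDK 1.4 the same idea would need a plain `Thread` and `join(timeout)`). `TimedParser` and the `parseTask` parameter are hypothetical; the task would wrap whatever PDF parsing call is in use.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a parse task on a worker thread and gives up
// after a timeout instead of hanging the caller.
class TimedParser {

    // Returns the extracted text, or null if the task timed out or failed.
    static String parseWithTimeout(Callable<String> parseTask, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "parser");
            thread.setDaemon(true); // a stuck parser must not keep the JVM alive
            return thread;
        });
        Future<String> result = worker.submit(parseTask);
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // interrupts the worker thread
            return null;
        } catch (ExecutionException e) {
            return null; // the parser itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
            result.cancel(true);
            return null;
        } finally {
            worker.shutdownNow();
        }
    }
}
```

One caveat: `cancel(true)` only interrupts the worker. If the parser is stuck in a loop that never checks for interruption, the thread keeps running in the background, which is why it is marked as a daemon here. The caller still gets control back after the timeout, which is the point of the workaround.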
Re: Lucene demo ideas?
I think all the attribute values together with the element text values should be indexed in the "content" part. Also, an XML map file could be used to pick out the nodes that need to be indexed separately, so we do not create too many fields by indexing non-critical nodes separately. Simple XPath could be used for the map source; the field name and index type should be the map target.

Regards,
Hui

----- Original Message -----
From: "Robert Koberg" <[EMAIL PROTECTED]>
To: "'Lucene Users List'" <[EMAIL PROTECTED]>
Sent: Wednesday, September 17, 2003 10:09 AM
Subject: RE: Lucene demo ideas?

> Hi,
>
> Here are a couple of ideas for XML demos:
>
> 1. simply index the content into one 'content' field. Don't worry about
> attributes.
>
> 2. index a linked Dublin core meta data file:
>
> And add fields for every element after rdf:Description
>
> Best,
> -Rob
> [...]
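Hui's map could be sketched as a plain XPath-to-field-name table applied with the JDK's `javax.xml.xpath` package (Java 5 and later). This is an illustration under assumptions, not a proposal for the demo's real format: `XPathFieldMapper` is a made-up name, the map is passed in as a `java.util.Map` rather than read from an XML map file, and the "index type" part of the map target is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: pulls selected nodes out of an XML document so that
// each can become its own index field.
class XPathFieldMapper {

    // xpathToField maps an XPath expression (source) to a field name (target).
    // Returns field name -> extracted text; nodes that are absent are skipped.
    static Map<String, String> extractFields(String xml, Map<String, String> xpathToField) {
        Map<String, String> fields = new LinkedHashMap<>();
        try {
            // Parse once, then evaluate every expression against the same tree.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (Map.Entry<String, String> mapping : xpathToField.entrySet()) {
                String value = xpath.evaluate(mapping.getKey(), doc).trim();
                if (!value.isEmpty()) {
                    fields.put(mapping.getValue(), value);
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("could not apply XPath map", e);
        }
        return fields;
    }
}
```

Everything not named in the map would still go into the generic "content" field, so the number of separate fields stays under the control of whoever writes the map.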
Re: Lucene demo ideas?
> Does anyone have any suggestions on what they'd like to see in the
> demo app?

Show that Lucene 1) can do incremental indexing, 2) isn't restricted to indexing file system resources and 3) can store and query arbitrary fields. These are in my opinion the features where most other search engines fall flat.

--
Eric Jain
RE: Lucene demo ideas?
Hi,

Here are a couple of ideas for XML demos:

1. simply index the content into one 'content' field. Don't worry about attributes.

2. index a linked Dublin core meta data file:

And add fields for every element after rdf:Description

Best,
-Rob

> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 17, 2003 6:08 AM
> To: Lucene Users List
>
> On Wednesday, September 17, 2003, at 08:43 AM, Killeen, Tom wrote:
> > I would suggest XML as well.
>
> Again, I'd like to hear more about how you'd do this generically. Tell
> me what the field names and values would correspond to when presented
> with an XML file.
>
> Erik
RE: Lucene demo ideas?
I know this may be far fetched, but how about being able to index .jsp's... I know this is a spindle thing, but it seems a lot of people need this functionality.

My suggestion

Russ

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 17, 2003 8:01 AM
To: [EMAIL PROTECTED]
Subject: Lucene demo ideas?

[...]
Re: Lucene demo ideas?
I would like to see the taglib for searching the index in the demo. There is an html form page and result page already built for the taglib that allows you to change search params and demonstrates a fair amount of the search capability of Lucene.

----- Original Message -----
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 17, 2003 7:00 AM
Subject: Lucene demo ideas?

> [...]
Re: Lucene demo ideas?
Might want two demos, one for Unix environments and one for Windows.

Most users will want a fast start that they can copy and adapt. So quick targets would be:

filesystems - html / text / pdf / office documents for windows.
xml - fairly simple example maybe against news items.
database - again simple maybe a pseudo employee database.
website - accessible from the filesystem.
website - that requires crawling.

Show hit markup.

Pete

----- Original Message -----
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 17, 2003 1:00 PM
Subject: Lucene demo ideas?

> [...]
Re: Lucene demo ideas?
On Wednesday, September 17, 2003, at 08:43 AM, Killeen, Tom wrote:
> I would suggest XML as well.

Again, I'd like to hear more about how you'd do this generically. Tell me what the field names and values would correspond to when presented with an XML file.

Erik
Re: Lucene demo ideas?
On Wednesday, September 17, 2003, at 08:42 AM, Ben Litchfield wrote:
> What, no PDF files!!

Haha! http://www.pdfbox.org

And I've used pdfbox before - it's cool. And I'm cool with adding PDF and Word indexing to the demo personally, but I didn't want to increase the "weight" of the demo application. If folks feel strongly about it then I'll incorporate it.
Re: Lucene demo ideas?
please keep the discussions on the lucene-user e-mail list.

of course the source code will be available... what is there is already in lucene's CVS and i will just revamp what is there and commit it. and when we make lucene releases it will be bundled and made available as a single download too.

as for indexing XML files that is a possibility, but that is a broad request. how would they be indexed? every element made a field? every attribute too? what are the field names? is this really appropriate for a "demo"?

On Wednesday, September 17, 2003, at 08:42 AM, Senthil Kumar K wrote:
> hi erik,
> Is it possible to send a source code for the lucene demo u proposed and
> i want to index xml files in my application. All i have to do from
> browser. I have to avoid the command line indexing.
RE: Lucene demo ideas?
I would suggest XML as well.

Tom

-----Original Message-----
From: Ben Litchfield [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 17, 2003 7:42 AM
To: Lucene Users List
Subject: Re: Lucene demo ideas?

> - Index text and HTML files. Any others?

What, no PDF files!!

Ben

--
http://www.pdfbox.org
Re: Lucene demo ideas?
> - Index text and HTML files. Any others?

What, no PDF files!!

Ben

--
http://www.pdfbox.org
Lucene demo ideas?
I'm about to start some refactorings on the web application demo that ships with Lucene to show off its features and be usable more easily and cleanly out of the box - i.e. just drop into Tomcat's webapps directory and go.

Does anyone have any suggestions on what they'd like to see in the demo app? Some of my ideas are:

- Eliminate the need to do a command-line indexing, let the web app do this upon command, allowing you to specify where the index lives (there will be a reasonable default like ~/lucenedemo/index perhaps) and what directory tree to index (perhaps defaulting to the root directory or c:\, or where instead?)

- Spin off a background indexing thread so the web app searching is immediately useful after kicking off the indexing process, and allow a status view of the indexing progress.

- Index text and HTML files. Any others? I don't want to get into putting too many dependencies in though - let's keep it relatively simple, although still demonstrative. Allow search filtering by last modified date range and document type (extension).

- Perhaps allow you to specify the analyzer to use when indexing.

- Show the explanation of how scores are computed in the search results as an option.

I'm all ears to possibilities of improvements! Send your wishlist.

Erik
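The background-indexing idea in the list above could be sketched as a worker thread that walks the directory tree and exposes a progress counter for a status page. This is only an outline under assumptions: `BackgroundIndexer` is a made-up name, and `indexFile` is an empty placeholder where the demo would actually add a document to the Lucene index.

```java
import java.io.File;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: walks a directory tree on a background thread and
// keeps counters that a status view can read at any time.
class BackgroundIndexer implements Runnable {
    private final File root;
    private final AtomicInteger filesIndexed = new AtomicInteger();
    private volatile boolean done;

    BackgroundIndexer(File root) {
        this.root = root;
    }

    // Starts indexing on its own thread and returns immediately.
    Thread start() {
        Thread thread = new Thread(this, "indexer");
        thread.setDaemon(true); // do not block web app shutdown
        thread.start();
        return thread;
    }

    @Override
    public void run() {
        try {
            walk(root);
        } finally {
            done = true;
        }
    }

    private void walk(File file) {
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children == null) {
                return; // unreadable directory: skip it
            }
            for (File child : children) {
                walk(child);
            }
        } else if (file.isFile()) {
            indexFile(file);
            filesIndexed.incrementAndGet();
        }
    }

    // Placeholder: a real demo would add the file to the Lucene index here.
    protected void indexFile(File file) {
    }

    int filesIndexed() {
        return filesIndexed.get();
    }

    boolean isDone() {
        return done;
    }

    // Text for a status view, e.g. "indexing: 120 files".
    String status() {
        return (done ? "finished: " : "indexing: ") + filesIndexed.get() + " files";
    }
}
```

A servlet would call `start()` once and then render `status()` on each refresh. The sketch does not guard against symbolic-link cycles, which matters if the default really is the root directory.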