Re: Lucene demo ideas?

2003-09-23 Thread Steven J. Owens
On Wed, Sep 17, 2003 at 08:00:42AM -0400, Erik Hatcher wrote:
 I'm about to start some refactorings on the web application demo that 
 ships with Lucene to show off its features and be usable more easily 
 and cleanly out of the box - i.e. just drop into Tomcat's webapps 
 directory and go.
 
 Does anyone have any suggestions on what they'd like to see in the demo 
 app?

 One odd thought (may be out of scope) is to put together a
Google-flavored query language, since most users are going to be
unfamiliar with the default Lucene query language.  Lucene doesn't
really match Google, but something Google-flavored might be better at
showing off Lucene's features in the demo.
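
A rough sketch of what such a translation layer might look like, in
plain Java with no Lucene dependency.  It assumes that "Google-flavored"
means all terms are required unless joined by OR, while Lucene's default
query parser treats bare terms as optional.  The class and method names
are made up for illustration, and special characters are not escaped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rewrites a Google-style query into Lucene query syntax.
class GoogleStyleQuery {

    // Split on whitespace, keeping "quoted phrases" together as one token.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : input.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
                current.append(c);
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Terms joined by OR become one required group; every other term is
    // marked required with '+', unless it is already excluded with '-'.
    static String toLuceneSyntax(String input) {
        List<List<String>> groups = new ArrayList<>();
        boolean joinNext = false;
        for (String token : tokenize(input)) {
            if (token.equals("OR")) {
                joinNext = !groups.isEmpty();
                continue;
            }
            if (joinNext) {
                groups.get(groups.size() - 1).add(token);
                joinNext = false;
            } else {
                List<String> group = new ArrayList<>();
                group.add(token);
                groups.add(group);
            }
        }
        StringBuilder out = new StringBuilder();
        for (List<String> group : groups) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (group.size() == 1) {
                String term = group.get(0);
                out.append(term.startsWith("-") ? term : "+" + term);
            } else {
                out.append("+(").append(String.join(" OR ", group)).append(')');
            }
        }
        return out.toString();
    }
}
```

The resulting string would then be handed to Lucene's query parser; for
example, the input `pdf OR word indexing` comes out as
`+(pdf OR word) +indexing`.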

-- 
Steven J. Owens
[EMAIL PROTECTED]

I'm going to make broad, sweeping generalizations and strong,
 declarative statements, because otherwise I'll be here all night and
 this document will be four times longer and much less fun to read.
 Take it all with a grain of salt. - Me at http://darksleep.com


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Lucene demo ideas?

2003-09-18 Thread Peter Becker
Erik Hatcher wrote:

[...]

- Index text and HTML files.  Any others?  I don't want to get into 
putting too many dependencies in though - let's keep it relatively 
simple, although still demonstrative.  Allow search filtering by last 
modified date range and document type (extension). 

If I may plug our code again ;-) Docco (http://tockit.sf.net) contains a 
framework for document handlers, with implementations for plain text, 
HTML, XML and OpenOffice based on JDK 1.4, and plugins for PDFBox, POI 
and Multivalent. There is also a notion of file mappings (i.e. mapping 
from a match on a FileFilter to a handler), and we plan to add code to 
mix in external information like metadata stores or EAs from advanced 
file systems. It is available on SF (within 
http://sf.net/projects/toscanaj) and is at the moment BSD-style 
licensed. We would be happy to contribute bits of that and, thanks to the 
plugin architecture, dependencies should be controllable. Admittedly the 
plugin loader is still a hack, but it works.
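
To illustrate the file-mapping idea (a FileFilter match selecting a
handler), here is a minimal sketch.  It is not Docco's actual API: the
DocumentHandler interface and HandlerRegistry class are hypothetical
names, and only java.io.FileFilter is a real JDK type.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical handler type; a real one would extract text from the file.
interface DocumentHandler {
    String name();
}

// Hypothetical registry: the first mapping whose filter accepts the file wins.
class HandlerRegistry {
    private static final class Mapping {
        final FileFilter filter;
        final DocumentHandler handler;

        Mapping(FileFilter filter, DocumentHandler handler) {
            this.filter = filter;
            this.handler = handler;
        }
    }

    private final List<Mapping> mappings = new ArrayList<>();

    void register(FileFilter filter, DocumentHandler handler) {
        mappings.add(new Mapping(filter, handler));
    }

    // Returns the handler for the file, or null if no mapping matches.
    DocumentHandler handlerFor(File file) {
        for (Mapping mapping : mappings) {
            if (mapping.filter.accept(file)) {
                return mapping.handler;
            }
        }
        return null;
    }

    // Convenience filter matching a file extension, ignoring case.
    static FileFilter extension(String ext) {
        String suffix = "." + ext.toLowerCase(Locale.ROOT);
        return file -> file.getName().toLowerCase(Locale.ROOT).endsWith(suffix);
    }
}
```

Optional formats such as PDF could then be registered only when the
corresponding plugin is present, which is one way to keep the demo's
dependencies under control.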

 Peter



Re: Lucene demo ideas?

2003-09-17 Thread Ben Litchfield

 - Index text and HTML files.  Any others?


What, no PDF files!!

Ben

--
http://www.pdfbox.org




RE: Lucene demo ideas?

2003-09-17 Thread Killeen, Tom
I would suggest XML as well.


Tom

-Original Message-
From: Ben Litchfield [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 17, 2003 7:42 AM
To: Lucene Users List
Subject: Re: Lucene demo ideas?



 - Index text and HTML files.  Any others?


What, no PDF files!!

Ben

--
http://www.pdfbox.org




Re: Lucene demo ideas?

2003-09-17 Thread Erik Hatcher
please keep the discussions on the lucene-user e-mail list.

of course the source code will be available... what is there is already 
in lucene's CVS and i will just revamp what is there and commit it.  
and when we make lucene releases it will be bundled and made available 
as a single download too.

as for indexing XML files that is a possibility, but that is a 
broad request.  how would they be indexed?  every element made a field? 
 every attribute too?  what are the field names?  is this really 
appropriate for a demo?

On Wednesday, September 17, 2003, at 08:42  AM, Senthil Kumar K wrote:
Hi Erik,

  Is it possible to send the source code for the Lucene demo
you proposed? I want to index XML files in my application,
and I have to do everything from the browser. I have to avoid
command-line indexing.




Re: Lucene demo ideas?

2003-09-17 Thread Erik Hatcher
On Wednesday, September 17, 2003, at 08:43  AM, Killeen, Tom wrote:
I would suggest XML as well.

Again, I'd like to hear more about how you'd do this generically.  Tell 
me what the field names and values would correspond to when presented 
with an XML file.

	Erik



Re: Lucene demo ideas?

2003-09-17 Thread Pete Lewis
Might want two demos, one for Unix environments and one for Windows.

Most users will want a fast start that they can copy and adapt.  So quick
targets would be:

filesystems - HTML / text / PDF / office documents for Windows.
xml - fairly simple example, maybe against news items.
database - again simple, maybe a pseudo employee database.
website - accessible from the filesystem.
website - that requires crawling.

Show hit markup.
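
For the hit markup, a very small sketch of the idea: wrap each query
term found in the result text in <b> tags.  This assumes simple
whole-word, case-insensitive matching on the raw text; it does not go
through a Lucene analyzer and does not escape HTML.  The class name is
made up.

```java
import java.util.regex.Pattern;

// Hypothetical helper that marks up query terms in a snippet of text.
class HitMarkup {

    // Wraps every whole-word, case-insensitive occurrence of a term in <b>...</b>.
    static String highlight(String text, String... terms) {
        if (terms.length == 0) {
            return text;
        }
        StringBuilder alternatives = new StringBuilder();
        for (String term : terms) {
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            // Quote the term so regex metacharacters are matched literally.
            alternatives.append(Pattern.quote(term));
        }
        Pattern pattern = Pattern.compile("\\b(" + alternatives + ")\\b",
                Pattern.CASE_INSENSITIVE);
        return pattern.matcher(text).replaceAll("<b>$1</b>");
    }
}
```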

Pete

- Original Message - 
From: Erik Hatcher [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, September 17, 2003 1:00 PM
Subject: Lucene demo ideas?


 I'm about to start some refactorings on the web application demo that
 ships with Lucene to show off its features and be usable more easily
 and cleanly out of the box - i.e. just drop into Tomcat's webapps
 directory and go.

 Does anyone have any suggestions on what they'd like to see in the demo
 app?  Some of my ideas are:

 - Eliminate the need to do a command-line indexing, let the web app do
 this upon command, allowing you to specify where the index lives (there
 will be a reasonable default like ~/lucenedemo/index perhaps) and what
 directory tree to index (perhaps defaulting to the root directory or
 c:\, or where instead?)

 - Spin off a background indexing thread so the web app searching is
 immediately useful after kicking off the indexing process, and allow a
 status view of the indexing progress.

 - Index text and HTML files.  Any others?  I don't want to get into
 putting too many dependencies in though - let's keep it relatively
 simple, although still demonstrative.  Allow search filtering by last
 modified date range and document type (extension).

 - Perhaps allow you to specify the analyzer to use when indexing.

 - Show the explanation of how scores are computed in the search results
 as an option.
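
 The background-indexing idea in the list above could be sketched
 roughly as follows.  This is only an illustration in plain Java: the
 class name, the status format and the per-file callback are
 assumptions, and the callback stands in for whatever would actually
 add a document to the Lucene index.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical background indexer with a pollable progress status.
class BackgroundIndexer {
    private final AtomicInteger processed = new AtomicInteger();
    private volatile int total;
    private volatile boolean done;
    private Thread worker;

    // Starts indexing on a separate thread and returns immediately.
    synchronized void start(List<String> paths, Consumer<String> indexOne) {
        if (worker != null && worker.isAlive()) {
            throw new IllegalStateException("indexing already in progress");
        }
        total = paths.size();
        processed.set(0);
        done = false;
        worker = new Thread(() -> {
            try {
                for (String path : paths) {
                    indexOne.accept(path);
                    processed.incrementAndGet();
                }
            } finally {
                done = true;
            }
        }, "indexer");
        worker.setDaemon(true);
        worker.start();
    }

    // A status page in the web app could poll this while searches continue.
    String status() {
        return processed.get() + "/" + total + (done ? " (done)" : " (indexing)");
    }

    // Blocks until the current indexing run has finished.
    void awaitCompletion() {
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```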

 I'm all ears to possibilities of improvements!  Send your wishlist.

 Erik





Re: Lucene demo ideas?

2003-09-17 Thread Bryan LaPlante
I would like to see the taglib for searching the index in the demo. There is
an html form page and result page already built for the taglib that allows
you to change search params and demonstrates a fair amount of the search
capability of Lucene.




RE: Lucene demo ideas?

2003-09-17 Thread Pitre, Russell
I know this may be far-fetched, but how about being able to index
.jsp's?  I know this is a spindle thing, but it seems a lot of people
need this functionality.


My suggestion

Russ




RE: Lucene demo ideas?

2003-09-17 Thread Robert Koberg
Hi,

Here are a couple of ideas for XML demos:

1. simply index the content into one 'content' field. Don't worry about
attributes.

2. index a linked Dublin Core metadata file:
<link rel="meta" href="index.rdf" />
and add fields for every element after rdf:Description
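
A sketch of option 2 using only the JDK's DOM parser.  It assumes that
"every element after rdf:Description" means the child elements of
rdf:Description, and it returns a plain map of field name to value
rather than Lucene fields.  The class name is made up; repeated
elements are joined with a space.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: one field per child element of rdf:Description.
class DublinCoreFields {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    static Map<String, String> extract(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rdfXml)));
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
            for (int i = 0; i < descriptions.getLength(); i++) {
                NodeList children = descriptions.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        // The local name (e.g. "title" for dc:title) is the field name.
                        fields.merge(child.getLocalName(),
                                child.getTextContent().trim(),
                                (oldValue, newValue) -> oldValue + " " + newValue);
                    }
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse RDF", e);
        }
    }
}
```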

Best,
-Rob






Re: Lucene demo ideas?

2003-09-17 Thread Eric Jain
 Does anyone have any suggestions on what they'd like to see in the
 demo app?

Show that Lucene 1) can do incremental indexing, 2) isn't restricted to
indexing file system resources and 3) can store and query arbitrary
fields. These are, in my opinion, the features where most other search
engines fall flat.

--
Eric Jain





Re: Lucene demo ideas?

2003-09-17 Thread hui
I think all the attribute values together with element text values should be
indexed in the content part. Also, an XML map file could be used to pick out
the nodes that need to be indexed separately, so we do not create too many
fields by indexing non-critical nodes separately. Simple XPath expressions
could be the map source; the field name and index type would be the map target.
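
A minimal sketch of the map idea with the JDK's XPath support.  The map
here is just a java.util.Map from XPath expression to field name; the
index type and the catch-all content field are left out to keep it
short, and the class name is made up.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates each mapped XPath and stores the result
// under the mapped field name.
class XPathFieldMap {

    static Map<String, String> extract(String xml, Map<String, String> xpathToField) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> fields = new LinkedHashMap<>();
            for (Map.Entry<String, String> entry : xpathToField.entrySet()) {
                // evaluate() returns the string value of the first matching node,
                // or an empty string when nothing matches.
                String value = xpath.evaluate(entry.getKey(), doc).trim();
                if (!value.isEmpty()) {
                    fields.put(entry.getValue(), value);
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not extract fields", e);
        }
    }
}
```

Nodes that are not listed in the map simply produce no separate field,
which keeps the number of fields small.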

Regards,
Hui




Re: Lucene demo ideas?

2003-09-17 Thread Andrzej Bialecki
Erik Hatcher wrote:
On Wednesday, September 17, 2003, at 08:42  AM, Ben Litchfield wrote:

What, no PDF files!!


Haha!

http://www.pdfbox.org


And I've used pdfbox before - it's cool.

And I'm cool with adding PDF and Word indexing to the demo personally, 
but I didn't want to increase the weight of the demo application.  If 
folks feel strongly about it then I'll incorporate it.

A word of warning: PDFBox is fantastic, I agree - but some PDFs are not 
so... In my application I experienced numerous hangs when PDFBox would 
start parsing some PDFs (I can send the files to Ben if required), and 
then got stuck in an infinite wait somewhere... So I came up with a 
workaround: I run the parser in a separate thread, while waiting in the 
main thread, and then after a certain timeout I kill the processing 
thread and return.
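
A sketch of this kind of workaround with java.util.concurrent.  It is
not the code from my application: the class and method names are made
up, and the task stands in for the actual parser call.  Note that
cancelling only interrupts the worker thread, so a parser that ignores
interrupts will keep running in the background (as a daemon thread)
even though the caller has moved on.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a parse task with a time limit.
class TimedParse {

    // Returns the task's result, or the fallback if it fails or times out.
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis, T fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "parser");
            thread.setDaemon(true); // a stuck parser must not block JVM shutdown
            return thread;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the parser itself threw an exception
        } finally {
            executor.shutdownNow();
        }
    }
}
```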

--
Best regards,
Andrzej Bialecki
-
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-
FreeBSD developer (http://www.freebsd.org)




Re: Lucene demo ideas?

2003-09-17 Thread Erik Hatcher
On Wednesday, September 17, 2003, at 09:21  AM, Pitre, Russell wrote:
I know this may be far-fetched, but how about being able to index
.jsp's?  I know this is a spindle thing, but it seems a lot of people
need this functionality.

Like I communicated in a previous thread, indexing JSP's just has a 
smell to it for me.  I can't argue with the pragmatic way others have 
done it by crawling, but I don't think of JSP's as content and I'd 
rather index actual content, that may or may not be later presented 
within a JSP.

	Erik



Re: Lucene demo ideas?

2003-09-17 Thread Jeff Linwood
Paging would be great for the results.
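
Something along these lines, as a rough sketch: given the full list of
hits, pick out the slice for one page.  The class name is made up, page
numbers start at 1, and the list stands in for whatever the search
returns.

```java
import java.util.List;

// Hypothetical helper for paging through search results.
class ResultPager {

    // Returns the hits on the given 1-based page; empty if past the end.
    static <T> List<T> page(List<T> hits, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page number and size must be >= 1");
        }
        // long arithmetic avoids int overflow for very large page numbers
        long start = (long) (pageNumber - 1) * pageSize;
        int from = (int) Math.min(start, hits.size());
        int to = (int) Math.min(start + pageSize, hits.size());
        return hits.subList(from, to);
    }

    // Number of pages needed to show all hits.
    static int pageCount(int totalHits, int pageSize) {
        return (totalHits + pageSize - 1) / pageSize;
    }
}
```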

Jeff



Re: Lucene demo ideas?

2003-09-17 Thread Marco Tedone
I would have the code ready if wanted...
- Original Message - 
From: Pitre, Russell [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, September 17, 2003 2:21 PM
Subject: RE: Lucene demo ideas?


I know this may be far-fetched, but how about being able to index
.jsp's?  I know this is a spindle thing, but it seems a lot of people
need this functionality.


My suggestion

Russ




Re: Lucene demo ideas?

2003-09-17 Thread Marco Tedone
Yeah, that would be great!
- Original Message - 
From: Jeff Linwood [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, September 17, 2003 5:15 PM
Subject: Re: Lucene demo ideas?


 Paging would be great for the results.

 Jeff



Re: Lucene demo ideas?

2003-09-17 Thread Tatu Saloranta
On Wednesday 17 September 2003 07:07, Erik Hatcher wrote:
 On Wednesday, September 17, 2003, at 08:43  AM, Killeen, Tom wrote:
  I would suggest XML as well.

 Again, I'd like to hear more about how you'd do this generically.  Tell
 me what the field names and values would correspond to when presented
 with an XML file.

Perhaps just one generic content field, which would contain tokenized
content from all XML segments. That could be done easily and efficiently
with just SAX event handling? Since it's a simple demo, you can't get much
simpler than that, but it should still be fairly useful?
Attributes could/should be ignored by default; common practice for XML markup
seems to be for attributes not to contain any content that would make sense to 
index.

So I'd think just stripping out all tags (and comments, PIs etc.) might be a 
reasonable, plain and simple approach for the demo app.
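
A minimal sketch of that approach with the JDK's SAX parser: collect
only character data, so tags, attributes, comments and PIs drop out.
The class name is made up, and the returned string is what would go
into the single content field.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: flattens an XML document to its text content.
class XmlContentExtractor {

    static String extractContent(String xml) {
        final StringBuilder content = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                content.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Keep text from adjacent elements from running together.
                content.append(' ');
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        // Collapse runs of whitespace into single spaces.
        return content.toString().trim().replaceAll("\\s+", " ");
    }
}
```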

-+ Tatu +-

