lucene usage without website

2004-03-24 Thread Pleasant, Tracy

I want to create a knowledge base, but it must not require a server that
runs constantly (as with JSP). It just needs to run on the Windows
platform.  Lucene works well on Windows from an applet, right?

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Lucene and Mysql

2003-12-16 Thread Pleasant, Tracy
You would just take the records from the MySQL database and create a document for each 
record. Then index all the documents.
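
A minimal sketch of the "one document per record" step, in plain Java so it runs without a database or Lucene on the classpath. The class name RowToDocument and the method toDocuments are made up for illustration; in a real indexer the rows would presumably come from JDBC and each field map would be turned into a Lucene Document and handed to an IndexWriter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for "create a document for each record".
// RowToDocument and toDocuments are hypothetical names, not Lucene API.
class RowToDocument {

    // Convert each record (column name -> value) into a flat field map.
    static List<Map<String, String>> toDocuments(List<Map<String, Object>> rows) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, String> doc = new LinkedHashMap<>();
            for (Map.Entry<String, Object> col : row.entrySet()) {
                // Skip NULL columns; everything else is kept as text.
                if (col.getValue() != null) {
                    doc.put(col.getKey(), String.valueOf(col.getValue()));
                }
            }
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 7);
        row.put("title", "Lucene and MySQL");
        row.put("body", null);
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row);
        System.out.println(toDocuments(rows)); // [{id=7, title=Lucene and MySQL}]
    }
}
```

Keeping the primary key as its own field makes it possible to get from a search hit back to the database row later.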


-Original Message-
From: Stefan Trcko [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 16, 2003 3:31 PM
To: [EMAIL PROTECTED]
Subject: Lucene and Mysql


Hello

I'm new to Lucene. I want users to be able to search text that is stored in a MySQL database.
Is there a tutorial on how to implement this kind of search feature?

Best regards,
Stefan




RE: Unindexed fields

2003-12-08 Thread Pleasant, Tracy
If you don't index something then it's not going to be searched.


-Original Message-
From: Chong, Herb [mailto:[EMAIL PROTECTED]
Sent: Monday, December 08, 2003 11:14 AM
To: Lucene Users List
Subject: Unindexed fields


is there a limit to the size of an UnIndexed field? i changed my code to increase the 
maximum string size per document from 300 bytes to 10,000 and although the index run 
completes without errors, i never find any documents while searching.

Herb




RE: Returning one result

2003-12-05 Thread Pleasant, Tracy
Ok thanks, but I still can't use the SimpleAnalyzer since it won't even
index the whole thing. I'll give TermQuery a try. Thanks.
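
For what it's worth, the 200 hits are consistent with how SimpleAnalyzer tokenizes: it splits text at non-letter characters and lowercases what is left, so the digits of an id never reach the index. The sketch below is plain Java, not Lucene code; the class name LetterTokens is made up, and it only imitates that documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Imitation of a letters-only, lowercasing tokenizer.
// LetterTokens is a hypothetical name used for illustration only.
class LetterTokens {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any non-letter (digit, space, punctuation) ends the token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("AR334")); // [ar]
        System.out.println(tokenize("AR345")); // [ar]
    }
}
```

Under this kind of tokenization every id that starts with the same letters collapses to the same term, so a query on one id matches all of them.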

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2003 6:18 PM
To: Lucene Users List
Subject: Re: Returning one result


You really should use a TermQuery in this case anyway, rather than 
using QueryParser.  You wouldn't have to worry about the analyzer at 
that point anyway (and I assume you're using Field.Keyword during 
indexing).

Erik


On Thursday, December 4, 2003, at 05:01  PM, Pleasant, Tracy wrote:

 Ok, I realized the SimpleAnalyzer does not index numbers, so I switched
 back to Standard.

 -Original Message-
 From: Pleasant, Tracy
 Sent: Thursday, December 04, 2003 4:53 PM
 To: Lucene Users List
 Subject: Returning one result


  I am indexing a group of items, and one field, id, is unique.  When the
 user clicks on a result I want just that one result to show.

  I index and search using SimpleAnalyzer.


  Query query_es = QueryParser.parse(query, "id", new SimpleAnalyzer());

  It should return only one result but returns 200.








RE: Returning one result

2003-12-05 Thread Pleasant, Tracy
Maybe I should have been more clear.

static Field Keyword(String name, String value) 
  Constructs a String-valued Field that is not tokenized, but is
indexed and stored. 

I need to have it tokenized because people will search for that also and
it needs to be searchable. 

Should I have two fields - one as a keyword and one as text? 


How would I do that when I want to return search results..

Right now, in the results page it will have something like
<a href="display_record.jsp?id=AR334">Record AR334</a>

Then in display_record.jsp:
 Searcher searcher = new IndexSearcher(index);
 String term = request.getParameter("id");

 Query query = QueryParser.parse(term, "id", new StandardAnalyzer());

 Hits hits = searcher.search(query);

Would it have to be something like:
 TermQuery query = ???

or 
 Query query = QueryParser.Term(id);

? ? ? 

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2003 6:18 PM
To: Lucene Users List
Subject: Re: Returning one result


You really should use a TermQuery in this case anyway, rather than 
using QueryParser.  You wouldn't have to worry about the analyzer at 
that point anyway (and I assume you're using Field.Keyword during 
indexing).

Erik


On Thursday, December 4, 2003, at 05:01  PM, Pleasant, Tracy wrote:

 Ok, I realized the SimpleAnalyzer does not index numbers, so I switched
 back to Standard.

 -Original Message-
 From: Pleasant, Tracy
 Sent: Thursday, December 04, 2003 4:53 PM
 To: Lucene Users List
 Subject: Returning one result


  I am indexing a group of items, and one field, id, is unique.  When the
 user clicks on a result I want just that one result to show.

  I index and search using SimpleAnalyzer.


  Query query_es = QueryParser.parse(query, "id", new SimpleAnalyzer());

  It should return only one result but returns 200.








RE: Returning one result

2003-12-05 Thread Pleasant, Tracy
What I meant is this.

Say the ID is Ar3453. The user may want to search for Ar3453, so in
order for it to be searchable it would have to be indexed and not a
keyword.

So after using
TermQuery query = new TermQuery(new Term("id", term));

How would I return the other fields in the document?

For instance to display a record it would get the record with the id #
and then display the title, contents, etc.




-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, December 05, 2003 11:32 AM
To: Lucene Users List
Subject: Re: Returning one result


On Friday, December 5, 2003, at 10:41  AM, Pleasant, Tracy wrote:
 Maybe I should have been more clear.

 static Field Keyword(String name, String value)
   Constructs a String-valued Field that is not tokenized, but 
 is
 indexed and stored.

 I need to have it tokenized because people will search for that also 
 and
 it needs to be searchable.

Search for *what* also?  Tokenized means that it is broken into pieces 
which will be separate terms.  For example: "see spot" is tokenized 
into "see" and "spot", and searching for either of those terms will 
match.

Just try it and see, please!  :)

 Should I have two fields - one as a keyword and one as text?

Depends on what you're doing... but an id field indicates 
Field.Keyword to me, only.

 How would I do that when I want to return search results..

  Searcher searcher = new IndexSearcher(index);
  String term = request.getParameter(id);

  Query query = QueryParser.parse(term, id, new
 StandardAnalyzer());

  Hits hits  = searcher.search(query);

 Would it have to be something like:
  TermQuery query = ???

Yes.  TermQuery query = new TermQuery(new Term("id", term));

Use searcher.search exactly as you did before.  Just don't use 
QueryParser to construct a query.

Erik





RE: Returning one result

2003-12-05 Thread Pleasant, Tracy
Also, what I am indexing is not a bunch of separate documents. If it
were, it would be easy to simply have a field called url, and the link
would go directly to that document. 

However, there is a text URL with many records.
During indexing, a function parses each record and puts each into a
document with appropriate fields. 

When I go to display a particular Document (Lucene Document) I just
query the index for that unique ID rather than go through and parse
through the URL with all the records. 

Wouldn't querying the index for that unique ID be better than going
through that entire page and parsing through it - there is more room for
error that way.  

It's a long story why there isn't a database but it can't be done (don't
ask ... long story). 

-Original Message-
From: Pleasant, Tracy 
Sent: Friday, December 05, 2003 1:25 PM
To: Lucene Users List
Subject: RE: Returning one result


What I meant is this.

Say the ID is Ar3453. The user may want to search for Ar3453, so in
order for it to be searchable it would have to be indexed and not a
keyword.

So after using
TermQuery query = new TermQuery(new Term("id", term));

How would I return the other fields in the document?

For instance to display a record it would get the record with the id #
and then display the title, contents, etc.




-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, December 05, 2003 11:32 AM
To: Lucene Users List
Subject: Re: Returning one result


On Friday, December 5, 2003, at 10:41  AM, Pleasant, Tracy wrote:
 Maybe I should have been more clear.

 static Field Keyword(String name, String value)
   Constructs a String-valued Field that is not tokenized, but 
 is
 indexed and stored.

 I need to have it tokenized because people will search for that also 
 and
 it needs to be searchable.

Search for *what* also?  Tokenized means that it is broken into pieces 
which will be separate terms.  For example: "see spot" is tokenized 
into "see" and "spot", and searching for either of those terms will 
match.

Just try it and see, please!  :)

 Should I have two fields - one as a keyword and one as text?

Depends on what you're doing... but an id field to me indicates 
Field.Keyword to me, only.

 How would I do that when I want to return search results..

  Searcher searcher = new IndexSearcher(index);
  String term = request.getParameter(id);

  Query query = QueryParser.parse(term, id, new
 StandardAnalyzer());

  Hits hits  = searcher.search(query);

 Would it have to be something like:
  TermQuery query = ???

Yes.  TermQuery query = new TermQuery(new Term(id, term));

Use searcher.search exactly as you did before.  Just don't use 
QueryParser to construct a query.

Erik





RE: Returning one result

2003-12-05 Thread Pleasant, Tracy
Maybe we are having some communication issues. 

At any rate, I did index it as a KEYWORD and when displaying used the
TermQuery.

The only problem with this though is by storing the ID (i.e. AR345) as a
Keyword, if I search for AR345 no results are returned when I use the
MultiFieldQueryParser .

*sigh* *arg*



-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, December 05, 2003 2:13 PM
To: Lucene Users List
Subject: Re: Returning one result


On Friday, December 5, 2003, at 01:25  PM, Pleasant, Tracy wrote:
 Say ID is Ar3453 .. well the user may want to search for Ar3453, so in
 order for it to be searchable then it would have to be indexed and not a
 keyword.

*arg* - we're having a serious communication issue here.  My advice to 
you is to actually write some simple tests (test-driven learning using 
JUnit is a wonderful way to experiment with Lucene, especially thanks 
to the RAMDirectory).  Please refer to my articles at java.net as well 
as the other great Lucene articles out there.

Let me try again: a Field.Keyword *IS* indexed!  Even Lucene's 
javadocs say this for this method:

   /** Constructs a String-valued Field that is not tokenized, but is indexed
       and stored.  Useful for non-text fields, e.g. date or url.  */

[I added the emphasis there]


 So after using
 TermQuery query = new TermQuery(new Term(id, term));

 How would I return the other fields in the document?

 For instance to display a record it would get the record with the id #
 and then display the title, contents, etc.

Umm you'd use *exactly* the same way as if you had used 
QueryParser.  QueryParser would create a TermQuery for you, in fact, 
except it would analyze your text first, which is what you want to 
avoid, right?

Hits.doc(n) gives you back a Document.  And then 
Document.get(fieldName) gives you back the fields (as long as you  
stored  them in the index too).

Again, please attempt some of these things in code.  It is a trivial 
matter to index and search using RAMDirectory and experiment with 
TermQuery, QueryParser, Analyzers, etc.

Erik
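
To make the keyword idea concrete, here is a toy model in plain Java (not Lucene; KeywordLookup and its methods are invented names). The whole id is kept as one untokenized term, an exact lookup finds the record, and the stored fields come back with it, which is roughly the role Hits.doc(n) and Document.get(fieldName) play above. It also shows that exact matching is case-sensitive: an analyzer that lowercases the query text, as StandardAnalyzer does, would turn AR345 into ar345, which no longer equals the stored keyword. Whether that is what happens with MultiFieldQueryParser in this thread is an assumption.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of an untokenized (keyword) id field with stored fields.
// KeywordLookup, add and find are hypothetical names, not Lucene API.
class KeywordLookup {

    // whole id value -> stored fields of the matching record
    private final Map<String, Map<String, String>> byId = new HashMap<>();

    void add(String id, String title, String contents) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("title", title);
        doc.put("contents", contents);
        byId.put(id, doc);
    }

    // Exact match on the whole value; returns null when nothing matches.
    Map<String, String> find(String id) {
        return byId.get(id);
    }

    public static void main(String[] args) {
        KeywordLookup index = new KeywordLookup();
        index.add("AR345", "Record AR345", "some contents");
        System.out.println(index.find("AR345").get("title")); // Record AR345
        System.out.println(index.find("ar345"));              // null
    }
}
```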





RE: Returning one result

2003-12-05 Thread Pleasant, Tracy
Thanks, but using it as a Keyword, it will not get returned with my
search results when I use MultiFieldQueryParser.

If I could I would use just parse(query) but that is not a static
method, only parse(query,field,analyzer) is... So when I do that and use
an analyzer, the keyword field isn't searched.



-Original Message-
From: Dror Matalon [mailto:[EMAIL PROTECTED]
Sent: Friday, December 05, 2003 2:14 PM
To: Lucene Users List
Subject: Re: Returning one result


On Fri, Dec 05, 2003 at 01:25:23PM -0500, Pleasant, Tracy wrote:
 What I meant is.
 
 Say ID is Ar3453 .. well the user may want to search for Ar3453, so in
 order for it to be searchable then it would have to be indexed and not
a
 keyword.

No. You should store it as a keyword. 

From the javadocs:
Keyword(String name, String value)
  Constructs a String-valued Field that is not tokenized, but is
indexed and stored.


 
 So after using
 TermQuery query = new TermQuery(new Term(id, term));
 
 How would I return the other fields in the document?
 
 For instance to display a record it would get the record with the id #
 and then display the title, contents, etc.
 
 
 
 
 -Original Message-
 From: Erik Hatcher [mailto:[EMAIL PROTECTED]
 Sent: Friday, December 05, 2003 11:32 AM
 To: Lucene Users List
 Subject: Re: Returning one result
 
 
 On Friday, December 5, 2003, at 10:41  AM, Pleasant, Tracy wrote:
  Maybe I should have been more clear.
 
  static Field Keyword(String name, String value)
Constructs a String-valued Field that is not tokenized,
but 
  is
  indexed and stored.
 
  I need to have it tokenized because people will search for that also

  and
  it needs to be searchable.
 
 Search for *what* also?  Tokenized means that it is broken into pieces

 which will be separate terms.  For example: see spot is tokenized 
 into see and spot, and searching for either of those terms will 
 match.
 
 Just try it and see, please!  :)
 
  Should I have two fields - one as a keyword and one as text?
 
 Depends on what you're doing... but an id field to me indicates 
 Field.Keyword to me, only.
 
  How would I do that when I want to return search results..
 
   Searcher searcher = new IndexSearcher(index);
   String term = request.getParameter(id);
 
   Query query = QueryParser.parse(term, id, new
  StandardAnalyzer());
 
   Hits hits  = searcher.search(query);
 
  Would it have to be something like:
   TermQuery query = ???
 
 Yes.  TermQuery query = new TermQuery(new Term(id, term));
 
 Use searcher.search exactly as you did before.  Just don't use 
 QueryParser to construct a query.
 
   Erik
 
 
 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com




Returning one result

2003-12-04 Thread Pleasant, Tracy
 I am indexing a group of items, and one field, id, is unique.  When the
user clicks on a result I want just that one result to show.  

 I index and search using SimpleAnalyzer.

 
 Query query_es = QueryParser.parse(query, "id", new SimpleAnalyzer());
  
 It should return only one result but returns 200.
  
 
 
 




RE: Search Question - not returning desired results

2003-11-26 Thread Pleasant, Tracy
Thanks this helps a lot :)

 



-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 26, 2003 4:58 AM
To: Lucene Users List
Subject: Re: Search Question - not returning desired results


On Tuesday, November 25, 2003, at 12:11  PM, Pleasant, Tracy wrote:

 The documents I have indexed also contain information regarding file 
 names.

 For instance 'return_results.pl' or something like that may be in the 
 document fields.

 I am not understanding Lucene's way of searching:

 1. If I search for 'return_results', the search does not return 
 anything
 2. If I search for 'results' or 'return', the search does not return 
 anything
 3. If I search for 'results.pl', the search does return the document 
 containing 'return_results.pl'
 4. If I search for 'results~', the search does return the document 
 containing 'return_results.pl'
 5. If I search for 'return_results~', the search does not return 
 anything

 What is going on?

 I want it to return the document in all of the situations.

 I also don't want to have to use '~' all the time.

We sure do have a recurring theme lately :)  Analysis!

Please refer to my article at java.net:

http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html

Look at the AnalysisDemo code.  Copy it over and try it out on the text 
you're using and the Analyzer you're using.  The bracketed text that 
comes out are the tokens that you can search on.  It is very very 
important to understand this process and to really know what terms come 
out of text you hand it - otherwise it is a mystery why some things can 
be found and some things cannot despite your expectations to the 
contrary.

A follow-up to the Analysis is querying - and QueryParser has its own 
set of quirks and caveats related to how things are tokenized/analyzed. 
  And, I've got just the follow-up article for you handy...


http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html

If you digest both of these articles (analysis one first please) then I 
think a lot of questions that get asked on this list will be implicitly 
answered.  Understanding analysis is key.

Erik





RE: Search Question - not returning desired results

2003-11-26 Thread Pleasant, Tracy
Erik,

I think there may be a typo in the website.

When I run the AnalysisDemo:

Analzying xyz corporation - [EMAIL PROTECTED]
org.apache.lucene.analysis.standard.StandardAnalyzer:
[xyz] [corporation] [EMAIL PROTECTED] 

Your website says:

org.apache.lucene.analysis.standard.StandardAnalyzer:
[xyz] [corporation] [EMAIL PROTECTED] [com] 

When I run it, it keeps the entire email '[EMAIL PROTECTED]',
but according to your website it separates the '[EMAIL PROTECTED]' from the
'com'.

Is there a difference between the versions of Lucene? I'm using 1.3rc2.

Plus, I think what I want is a StandardAnalyzer with a little tweaking.
The simple one was fine until I realized that it doesn't do numbers,
which I need as part of my search since numbers are important for what
I'm doing. The Standard one does numbers, but I need it to be a little
different, of course. Thanks for the site.

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 26, 2003 4:58 AM
To: Lucene Users List
Subject: Re: Search Question - not returning desired results


On Tuesday, November 25, 2003, at 12:11  PM, Pleasant, Tracy wrote:

 The documents I have indexed also contain information regarding file 
 names.

 For instance 'return_results.pl' or something like that may be in the 
 document fields.

 I am not understanding Lucene's way of searching:

 1. If I search for 'return_results', the search does not return 
 anything
 2. If I search for 'results' or 'return', the search does not return 
 anything
 3. If I search for 'results.pl', the search does return the document 
 containing 'return_results.pl'
 4. If I search for 'results~', the search does return the document 
 containing 'return_results.pl'
 5. If I search for 'return_results~', the search does not return 
 anything

 What is going on?

 I want it to return the document in all of the situations.

 I also don't want to have to use '~' all the time.

We sure do have a recurring theme lately :)  Analysis!

Please refer to my article at java.net:

http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html

Look at the AnalysisDemo code.  Copy it over and try it out on the text 
you're using and the Analyzer you're using.  The bracketed text that 
comes out are the tokens that you can search on.  It is very very 
important to understand this process and to really know what terms come 
out of text you hand it - otherwise it is a mystery why some things can 
be found and some things cannot despite your expectations to the 
contrary.

A follow-up to the Analysis is querying - and QueryParser has its own 
set of quirks and caveats related to how things are tokenized/analyzed. 
  And, I've got just the follow-up article for you handy...


http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html

If you digest both of these articles (analysis one first please) then I 
think a lot of questions that get asked on this list will be implicitly 
answered.  Understanding analysis is key.

Erik





RE: Eliminating duplicate result

2003-11-26 Thread Pleasant, Tracy
If you are searching for the same term and searching the same index twice, it 
will return the same results... 

I don't get what you are asking.


-Original Message-
From: Dragan Jotanovic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 26, 2003 3:19 AM
To: Lucene Users List
Subject: Re: Eliminating duplicate result


 When you are doing two searches are you searching for two different terms?
 

No, I am searching for the same term.


What is the easiest way to eliminate duplicate documents if one is doing two searches 
on the same index?

Has anybody done something similar?
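
One generic way to do this, sketched in plain Java rather than against the Lucene API: collect a unique key for each hit (for example a stored id or path field, which is an assumption about the index) and merge the two result lists through a LinkedHashSet, which drops repeats while keeping first-seen order. MergeResults and mergeUnique are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Merge two result lists, keeping each document key once.
// MergeResults and mergeUnique are hypothetical names.
class MergeResults {

    static List<String> mergeUnique(List<String> first, List<String> second) {
        // LinkedHashSet ignores repeated keys and preserves insertion order.
        Set<String> seen = new LinkedHashSet<>(first);
        seen.addAll(second);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("AR1", "AR2");
        List<String> b = Arrays.asList("AR2", "AR3");
        System.out.println(mergeUnique(a, b)); // [AR1, AR2, AR3]
    }
}
```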






RE: Lucene refresh index function (incremental indexing).

2003-11-25 Thread Pleasant, Tracy
I was able to get PDFBox to work with my JSP webpages. 

I think you will have to write your own code, in a way, to handle the PDF
files (while still calling the Lucene functions):

 doc = LucenePDFDocument.getDocument(file);


-Original Message-
From: Tun Lin [mailto:[EMAIL PROTECTED]
Sent: Monday, November 24, 2003 11:07 PM
To: 'Lucene Users List'
Subject: RE: Lucene refresh index function (incremental indexing).


Does it support indexing the contents of pdf files? I have found one project
called PDFBox that can be integrated with Lucene to search inside of the pdf
files. Currently, Lucene can only search for the pdf filename. I tried with
PDFBox and I got the following message when I typed the command: java
org.apache.lucene.demo.IndexHTML -create -index c:\\index .. 

log4j:WARN No appenders could be found for logger
(org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly.

Can anyone advise?

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 25, 2003 5:01 AM
To: Lucene Users List
Subject: Re: Lucene refresh index function (incremental indexing).

Tun Lin wrote:
 These are the steps I took:
 
 1) I compile all the files in a particular directory using the command: 
 java org.apache.lucene.demo.IndexHTML -create -index c:\\index .. 
 , putting all the indexed files in c:\\index.
 2) Every time I add an additional file to that directory, I need to 
 reindex/recompile that directory to generate the indexes again. As the
 directory gets larger, the indexing takes a longer time.
 
 My question is: how do I generate the indexes automatically every time a
 new document is added to that directory, without recompiling manually
 every time?

To update, try removing the '-create' from the command line.  The demo code
supports incremental updates.  It will re-scan the directory and figure out
which files have changed, what new files have appeared and which previously
existing files have been removed.
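
The re-scan idea can be sketched as a comparison of two snapshots. How the demo actually keeps track of files is not described in this thread; the sketch below simply assumes a remembered map from path to last-modified timestamp, and IndexDiff, toIndex and toDelete are invented names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Compare what was indexed last time with what is on disk now.
// Assumption: each file is identified by path plus last-modified time.
class IndexDiff {

    // Files that are new, or whose timestamp differs from the indexed one.
    static Set<String> toIndex(Map<String, Long> indexed, Map<String, Long> onDisk) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Long> file : onDisk.entrySet()) {
            Long known = indexed.get(file.getKey());
            if (known == null || !known.equals(file.getValue())) {
                result.add(file.getKey());
            }
        }
        return result;
    }

    // Files that were indexed before but no longer exist on disk.
    static Set<String> toDelete(Map<String, Long> indexed, Map<String, Long> onDisk) {
        Set<String> result = new TreeSet<>(indexed.keySet());
        result.removeAll(onDisk.keySet());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> indexed = new HashMap<>();
        indexed.put("a.html", 100L);
        indexed.put("b.html", 100L);
        indexed.put("c.html", 100L);
        Map<String, Long> onDisk = new HashMap<>();
        onDisk.put("a.html", 100L); // unchanged
        onDisk.put("b.html", 200L); // modified
        onDisk.put("d.html", 300L); // new
        System.out.println(toIndex(indexed, onDisk));  // [b.html, d.html]
        System.out.println(toDelete(indexed, onDisk)); // [c.html]
    }
}
```

A modified file would need its old entry removed from the index before being added again, so it would appear in both the delete and the add pass of a real updater.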

Doug





RE: Score

2003-11-25 Thread Pleasant, Tracy
Thanks for your input. 
I am using the standard analyzer for everything. I haven't created my
own analyzer yet.

The documents I am using: 

Plain text
PDF Documents

(I have two indexes) 

When I create my index: 
   IndexWriter writer = new IndexWriter(index_name, new StandardAnalyzer(), true);

When I search:
Analyzer analyzer = new StandardAnalyzer();
query = MultiFieldQueryParser.parse(queryString, fields, analyzer); 
(where queryString is the term to search and fields is the array of
fields)

When searching, it searches one index and then the other. 


When you say you use different analyzers for different fields in your
index, how would you accomplish that? When I create the index, it takes a
single analyzer parameter. Unless you create different indexes, how do
you use two different ones? 
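
The general shape of the answer is a wrapper that picks a tokenizer by field name and falls back to a default, which is the idea behind the PerFieldAnalyzerWrapper class mentioned in the quoted message below. The sketch here is a plain-Java imitation of that idea, not the Lucene class; PerFieldTokenizer, register and example are invented names, and the two tokenizers are deliberately simplistic.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Choose a tokenizer per field name, with a default for all other fields.
// PerFieldTokenizer is a hypothetical stand-in, not a Lucene class.
class PerFieldTokenizer {

    private final Function<String, List<String>> fallback;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldTokenizer(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    void register(String field, Function<String, List<String>> tokenizer) {
        perField.put(field, tokenizer);
    }

    List<String> tokenize(String field, String text) {
        return perField.getOrDefault(field, fallback).apply(text);
    }

    // Default: lowercase and split on whitespace. "id": keep the value whole.
    static PerFieldTokenizer example() {
        PerFieldTokenizer t = new PerFieldTokenizer(
                text -> Arrays.asList(text.trim().toLowerCase().split("\\s+")));
        t.register("id", text -> Collections.singletonList(text));
        return t;
    }

    public static void main(String[] args) {
        PerFieldTokenizer t = example();
        System.out.println(t.tokenize("title", "See Spot")); // [see, spot]
        System.out.println(t.tokenize("id", "AR 345"));      // [AR 345]
    }
}
```

The point of the design is that one object can be handed to both the indexing and the searching side, so the same field is always analyzed the same way.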



-Original Message-
From: Gerret Apelt [mailto:[EMAIL PROTECTED]
Sent: Monday, November 24, 2003 3:25 PM
To: Lucene Users List
Subject: Re: Score


Tracey --

it would help if you could give more detail on the types of documents, 
fields and analyzers you're using. Also what do you mean by "Multi Field
Search"? I presume you're using the MultiFieldQueryParser to have query 
terms in a user-submitted query be searched for in each field in your
index.

If I am understanding your problem, then it might be the same one I had 
a few weeks ago -- highly relevant matches would not receive a high 
ranking. (This paragraph will apply to you only if you use more than 
just one Analyzer for the set of your fields). I had six fields in my 
index, most of which were populated with a standard analyzer. I used 
self-made Analyzers for two of the fields. This turned out to be my 
problem when using MultiFieldQueryParser: I told my 
MultiFieldQueryParser instance to use only the standard analyzer. 
Instead I discovered that I needed to make use of 
org.apache.lucene.analysis.PerFieldAnalyzerWrapper and feed that to the 
MultiFieldQueryParser. Unless you do this, your problem is what's 
described here: 
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q15

Most likely, if your scoring is off, you're doing something wrong in 
the way you use the Lucene API -- at least, that's what I've discovered 
to be the case when my ranking is off.

If you're interested in the nitty-gritty of how scoring is done, check 
this FAQ entry:
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q31

cheers,
Gerret

Pleasant, Tracy wrote:

Hi,

I'm using the Multi Field Search to search all the fields of my
documents during the search. 

When it returns results the scores are numerically low - .06, .17, etc.
I would think if I searched for "Dog" and there was a doc with "Dog" in
the title and several times in the contents of a document that it would
receive a score more like 1.0 or close to it.

Is there a way that I can tweak the score?

I tried using Boost but that did absolutely nothing.



  







RE: Tokenizing text custom way

2003-11-25 Thread Pleasant, Tracy
Not exactly an answer to the question, but I haven't yet used the Token 
classes/functionality that come with Lucene. Can someone give me an idea of how and 
why one might use them?

 

-Original Message-
From: Dragan Jotanovic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 6:42 AM
To: Lucene Users List
Subject: Tokenizing text custom way


Hi. I need to tokenize text while indexing but I don't want space to be a delimiter. 
The delimiter should be my custom character (for example a comma). I understand that I would 
probably need to implement my own analyzer, but could someone help me with where to start? 
Is there any other way to do this without writing a custom analyzer?

This is what I want to achieve.
If I have some text that will be indexed like following:

man, people, time out, sun

and if I enter 'time' as a search word, I don't want to get 'time out' in results. I 
need exact keyword matching. I would achieve this if I tokenize 'time out' as one 
token while indexing.

Maybe someone has had a similar problem? If someone knows how to handle this, please help me.
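
The tokenization being asked for can be sketched in a few lines of plain Java. This is not a Lucene Analyzer, only the splitting rule one would put inside a custom tokenizer; CommaTokens is an invented name.

```java
import java.util.ArrayList;
import java.util.List;

// Split on commas only, so a phrase with spaces stays one token.
// CommaTokens is a hypothetical name used for illustration.
class CommaTokens {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split(",")) {
            String token = part.trim();
            // Ignore empty pieces such as those produced by ", ,".
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("man, people, time out, sun");
        System.out.println(tokens);                      // [man, people, time out, sun]
        System.out.println(tokens.contains("time"));     // false
        System.out.println(tokens.contains("time out")); // true
    }
}
```

With tokens like these, an exact-term query for "time" finds nothing, while one for "time out" matches, provided the query side does not split the phrase at the space.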

Dragan Jotanovic





RE: XML support in Lucene

2003-11-25 Thread Pleasant, Tracy
This may help you: 

http://www.jguru.com/faq/view.jsp?EID=1074235



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 11:07 AM
To: [EMAIL PROTECTED]
Subject: XML support in Lucene


Hello group,

Does Lucene offer an effective and flexible way to treat XML files? I know
that as soon as an InputStream is provided, Lucene can basically index
(possibly after cleaning) everything. How is it with XML files?

If there is a way, is it possible to have one big XML file with many
individual parts in it? These should be considered as documents, and the
repeated XML tags as fields.

Here is an example:

<MySMSList>
  <SMS>
    <From>Tim</From>
    <Content>How are you? Tom</Content>
  </SMS>
  <SMS>
    <From>Linda</From>
    <Content>bla bla bla</Content>
  </SMS>
</MySMSList>


Has somebody already developed classes which go through this XML file,
create TWO documents with the fields From and Content, and fill in the text
between the tags? The indexing business should then be the same, since it is
abstracted behind the Document object. The same goes for the search process.
The search process, however, could be optimised with structural information
(i.e. only search in Content)...
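
A minimal sketch of the parsing half, using only the DOM parser that ships with the JDK. It assumes the structure from the example (a root element containing repeated SMS elements whose children are the fields); SmsXmlToDocuments is an invented name, and each resulting field map would still have to be converted into a Lucene Document by the indexing code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Turn every <SMS> element into one field map (tag name -> text).
// SmsXmlToDocuments is a hypothetical name; the tag names follow the example.
class SmsXmlToDocuments {

    static List<Map<String, String>> parse(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList records = dom.getElementsByTagName("SMS");
            List<Map<String, String>> docs = new ArrayList<>();
            for (int i = 0; i < records.getLength(); i++) {
                Map<String, String> doc = new LinkedHashMap<>();
                NodeList children = records.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    // Skip whitespace text nodes; keep only child elements.
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        doc.put(child.getNodeName(), child.getTextContent().trim());
                    }
                }
                docs.add(doc);
            }
            return docs;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<MySMSList>"
                + "<SMS><From>Tim</From><Content>How are you? Tom</Content></SMS>"
                + "<SMS><From>Linda</From><Content>bla bla bla</Content></SMS>"
                + "</MySMSList>";
        for (Map<String, String> doc : parse(xml)) {
            System.out.println(doc);
        }
    }
}
```

For a very large file a streaming parser such as SAX would avoid loading the whole tree into memory; DOM is used here only because it keeps the sketch short.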

Cheers,
Ralph






RE: Lucene refresh index function (incremental indexing).

2003-11-25 Thread Pleasant, Tracy
I vaguely remember having a problem back when I used PDFBox 0.6.2. I reverted 
to 0.6.1 and haven't had any problems since.


-Original Message-
From: Zhou, Oliver [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 11:03 AM
To: 'Lucene Users List'
Subject: RE: Lucene refresh index function (incremental indexing).


I do have other problems with PDFBox-0.6.4.  For one, it prints annoying debug
information from the very low-level parsing process.  For another, I got an
infinite loop while indexing PDF files, although the release notes say the
infinite loop bug has been fixed.  Does anybody know what's going on?

Thanks,
Oliver

 

-Original Message-
From: Ben Litchfield [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 9:45 AM
To: Lucene Users List
Subject: RE: Lucene refresh index function (incremental indexing).



Yes, just add the log4j configuration.  The easiest way to do that is as a
system property, like this:

java -Dlog4j.configuration=log4j.xml org.apache.lucene.demo.IndexHTML -create -index c:\\index ..

where log4j.xml is the path to your log4j config. PDFBox has an example
one you can use.

Ben
http://www.pdfbox.org

On Tue, 25 Nov 2003, Zhou, Oliver wrote:

 Lucene doesn't have a PDF parser.  In order to index PDF files you have to
 add one yourself.  PDFBox is a good choice.  You may just ignore the
 log4j warning, or you can add log4j to your classpath.

 Oliver


 -Original Message-
 From: Tun Lin [mailto:[EMAIL PROTECTED]
 Sent: Monday, November 24, 2003 10:07 PM
 To: 'Lucene Users List'
 Subject: RE: Lucene refresh index function (incremental indexing).


 Does it support indexing the contents of PDF files? I have found a
 project called PDFBox that can be integrated with Lucene to search inside
 PDF files. Currently, Lucene can only search for the PDF filename. I tried
 with PDFBox and I got the following message when I typed the command:
 java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..

 log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
 log4j:WARN Please initialize the log4j system properly.

 Can anyone advise?

 -Original Message-
 From: Doug Cutting [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, November 25, 2003 5:01 AM
 To: Lucene Users List
 Subject: Re: Lucene refresh index function (incremental indexing).

 Tun Lin wrote:
  These are the steps I took:
 
  1) I index all the files in a particular directory using the command:
  java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..
  which puts all the index files in c:\\index.
  2) Every time I add a file to that directory, I need to
  re-index that directory to generate the indexes again. As the
  directory gets larger, the indexing takes longer.
 
  My question is: how do I update the indexes automatically every time a
  new document is added to that directory, without re-indexing manually
  each time?

 To update, try removing the '-create' from the command line.  The demo
 code supports incremental updates.  It will re-scan the directory and
 figure out which files have changed, what new files have appeared and
 which previously existing files have been removed.

 Doug
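
The re-scan described above boils down to comparing what the index knows about with what is on disk now. The following is only a sketch of that comparison in plain Java, not the demo's actual implementation; the class name IndexDiff and the idea of keeping a path-to-last-modified map are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: works out which files need adding, re-indexing or deleting. */
final class IndexDiff {
    final Set<String> added = new TreeSet<>();
    final Set<String> changed = new TreeSet<>();
    final Set<String> removed = new TreeSet<>();

    /** Both maps go from file path to last-modified timestamp. */
    static IndexDiff between(Map<String, Long> indexed, Map<String, Long> onDisk) {
        IndexDiff diff = new IndexDiff();
        for (Map.Entry<String, Long> entry : onDisk.entrySet()) {
            Long previous = indexed.get(entry.getKey());
            if (previous == null) {
                diff.added.add(entry.getKey());      // new file: index it
            } else if (!previous.equals(entry.getValue())) {
                diff.changed.add(entry.getKey());    // modified: delete, then re-index
            }
        }
        for (String path : indexed.keySet()) {
            if (!onDisk.containsKey(path)) {
                diff.removed.add(path);              // gone from disk: delete from index
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        Map<String, Long> indexed = new HashMap<>();
        indexed.put("a.html", 100L);
        indexed.put("b.html", 100L);
        Map<String, Long> onDisk = new HashMap<>();
        onDisk.put("a.html", 200L);
        onDisk.put("c.html", 150L);
        IndexDiff diff = between(indexed, onDisk);
        System.out.println("added=" + diff.added + " changed=" + diff.changed
                + " removed=" + diff.removed);
    }
}
```

Only the files in the three sets need touching, so the cost of an update depends on how much changed rather than on the size of the directory.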





Searching different types of words

2003-11-25 Thread Pleasant, Tracy
If I search for "like", I would want the search to return documents
containing "like", "liked", "likes", etc. (variations of the word).

Is there a way to tell Lucene to do this? 
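
The usual technique for this is stemming: reduce every word to a common root at index time and again at query time, so that all variants meet on the same term. Lucene ships a PorterStemFilter for this purpose. The sketch below is NOT the Porter algorithm, only a deliberately crude suffix stripper in plain Java to show the idea; the class name CrudeStemmer and its rules are invented for illustration and will mis-stem many English words.

```java
import java.util.Locale;

/** Toy suffix stripper (hypothetical, far cruder than a real Porter stemmer). */
final class CrudeStemmer {

    /** Lower-cases the word and strips one common English suffix, if present. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 4 && w.endsWith("ing")) {
            return w.substring(0, w.length() - 3);
        }
        if (w.length() > 3 && (w.endsWith("ed") || w.endsWith("es"))) {
            return w.substring(0, w.length() - 2);
        }
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        if (w.length() > 3 && w.endsWith("e")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        // All four variants are reduced to the same root, so they would match each other.
        for (String word : new String[] {"like", "liked", "likes", "liking"}) {
            System.out.println(word + " -> " + stem(word));
        }
    }
}
```

The important part is that the same reduction is applied to both the indexed text and the query; otherwise the stemmed index terms and the unstemmed query terms will not line up.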






Search Question - not returning desired results

2003-11-25 Thread Pleasant, Tracy

The documents I have indexed also contain information about file names.

For instance 'return_results.pl' or something like that may be in the document fields.

I am not understanding Lucene's way of searching:

1. If I search for 'return_results', the search does not return anything
2. If I search for 'results' or 'return', the search does not return anything
3. If I search for 'results.pl', the search does return the document containing 'return_results.pl'
4. If I search for 'results~', the search does return the document containing 'return_results.pl'
5. If I search for 'return_results~', the search does not return anything

What is going on? 

I want it to return the document in all of the situations.

I also don't want to have to use '~' all the time.
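
Results like these usually come down to how the text was cut into terms at index time, because a query term only matches a term that was actually stored. As a sketch of one possible fix (an assumption, not a description of what any Lucene analyzer does), the plain-Java example below splits on every character that is not a letter or digit, for both the document text and the query. The class name FileNameTokens is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy tokenizer (hypothetical): breaks file names on '_', '.' and other punctuation. */
final class FileNameTokens {

    /** Lower-cases the text and returns every run of letters or digits as a token. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** True if every token of the query also occurs in the document text. */
    static boolean matches(String documentText, String query) {
        Set<String> documentTokens = new HashSet<>(tokenize(documentText));
        List<String> queryTokens = tokenize(query);
        return !queryTokens.isEmpty() && documentTokens.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        String field = "see return_results.pl for details";
        System.out.println(tokenize("return_results.pl"));
        for (String query : new String[] {"return_results", "results", "return", "results.pl"}) {
            System.out.println(query + " -> " + matches(field, query));
        }
    }
}
```

In Lucene terms this would correspond to writing a custom Analyzer and using the same one for indexing and for query parsing. The trade-off is that a search for the exact file name can no longer be told apart from a search for its parts in any order.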






Hits Highlighting

2003-11-25 Thread Pleasant, Tracy

 Are there any hit highlighting functions?

 I have a simple one, but it gets complicated with searching multiple
 words, having tokens, etc.


 




RE: Search Question

2003-11-25 Thread Pleasant, Tracy
 How come searching for 'red_car*' returns nothing?

 I am using the standard analyzer, too.

-Original Message-
From: Dror Matalon [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 12:22 PM
To: Lucene Users List
Subject: Re: Search Question


No, but if you use the standard analyzer, searching red* will return
documents with red_car

On Tue, Nov 25, 2003 at 12:00:01PM -0500, Pleasant, Tracy wrote:
 
  If I have words within a document like 
  
  red_car
  
  If I search for 'red' would it return documents containing 'red_car'?

 
  
 
 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com




RE: Search Question

2003-11-25 Thread Pleasant, Tracy
Searching 'red_*' also returns nothing.







RE: Hits Highlighting

2003-11-25 Thread Pleasant, Tracy
I have seen that one, but it doesn't include the source code, only the
jar with classes.

I need something to actually highlight, as if you took a yellow
marker and highlighted the text, not something that puts it in bold.
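
As a rough sketch of the yellow-marker effect for HTML output, the plain-Java example below wraps each whole-word occurrence of the search terms in a span with a yellow background. It is not the contributed highlighter; the class name YellowHighlighter is made up, and the sketch assumes the terms are non-empty plain words and that the text has already been HTML-escaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy highlighter (hypothetical): marks search terms with a yellow background. */
final class YellowHighlighter {

    /** Wraps every case-insensitive whole-word match of any term in a yellow span. */
    static String highlight(String text, String... terms) {
        if (terms.length == 0) {
            return text;
        }
        StringBuilder alternation = new StringBuilder();
        for (String term : terms) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            // Quote each term so regex metacharacters in it are matched literally.
            alternation.append(Pattern.quote(term));
        }
        Pattern pattern = Pattern.compile("\\b(" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        // $1 re-inserts the matched text with its original capitalisation.
        return matcher.replaceAll("<span style=\"background-color: yellow\">$1</span>");
    }

    public static void main(String[] args) {
        System.out.println(highlight("The red car passed a Red truck.", "red", "truck"));
    }
}
```

This handles multiple search words, but it matches on the raw text only. Highlighting terms that the analyzer has altered (stemmed or split tokens, for example) needs access to the token stream, which is where a simple approach like this starts to get complicated.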

 

-Original Message-
From: Dror Matalon [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 12:29 PM
To: Lucene Users List
Subject: Re: Hits Highlighting


Hi,

The Lucene home page has a lot of resources, including the FAQs,
articles, javadocs and contributions. 

For instance, there's a query highlighter on the contributions page. 



On Tue, Nov 25, 2003 at 12:17:41PM -0500, Pleasant, Tracy wrote:
 
  Are there any  hits highlighting functions? 
 
  I have a simple one, but it gets complicated with searching multiple
 words, having tokens, etc.
 
 
  
 
 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com




Score

2003-11-24 Thread Pleasant, Tracy
Hi,

I'm using the Multi Field Search to search all the fields of my
documents during the search. 

When it returns results, the scores are numerically low: .06, .17, etc.
I would think that if I searched for "Dog" and there was a doc with "Dog" in
the title and several times in the contents, it would
receive a score of 1.0 or close to it.

Is there a way that I can tweak the score?

I tried using Boost, but that did absolutely nothing.
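
If the aim is only to show friendlier numbers, one option is to leave the ranking alone and rescale the scores for display so that the best hit reads 1.0. The plain-Java sketch below does just that. It is an assumption about what is wanted rather than a Lucene feature, the class name ScoreScaler is hypothetical, and it relies on the scores being useful mainly for ordering the hits of a single query against each other.

```java
/** Toy display helper (hypothetical): rescales hit scores relative to the best hit. */
final class ScoreScaler {

    /** Divides every score by the highest one; returns all zeros if no score is positive. */
    static float[] relativeToTop(float[] scores) {
        float max = 0f;
        for (float score : scores) {
            max = Math.max(max, score);
        }
        float[] scaled = new float[scores.length];
        if (max <= 0f) {
            return scaled;
        }
        for (int i = 0; i < scores.length; i++) {
            scaled[i] = scores[i] / max;
        }
        return scaled;
    }

    public static void main(String[] args) {
        // The raw values echo the ones mentioned above.
        float[] scaled = relativeToTop(new float[] {0.17f, 0.06f});
        for (float value : scaled) {
            System.out.println(value);
        }
    }
}
```

The order of the results is unchanged by this; only the numbers shown to the user differ, and they are not comparable between different queries.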
