Re: Tool for analyzing analyzers

2004-06-02 Thread Zilverline
Hi Erik,
Thanks for your reply. Have you tried it on a collection yet? I'd love 
to get some of your feedback. I have limited knowledge of the 
underlying capabilities of the Lucene library, which is a compliment to 
you, since it was extremely easy to integrate Lucene. But I'd like to 
get more out of Lucene, such as incremental indexing, to name one. On 
the other hand, I'm interested in general requirements and wishes for the app.

regards,
 Michael Franken
Erik Hatcher wrote:
On May 28, 2004, at 6:50 AM, Zilverline info wrote:
But I'd love to build a Lucene demo application that is powerful 
enough to be used as a foundation for folks to use out-of-the-box.

That's just what I thought. Here's one: http://www.zilverline.org

Michael - zilverline is nicely done!  I downloaded it and dropped it 
into Tomcat and it came right up.  I did not actually configure a 
collection yet, but from the docs on the website it looks like you 
have built something quite nice.  Maybe you could embed a built-in 
collection of the zilverline docs so something comes up right away and 
is searchable :)

Nice work.  I'll definitely stay tuned into your project.
Erik
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: a list of matching search term

2004-06-02 Thread Erik Hatcher
On Jun 1, 2004, at 9:19 PM, Anson Lau wrote:
Further to my previous email: The highlighter package should be able
to pick up the matching search terms.  Can some experienced highlighter
package users tell me if I should look down that line?
Yes, Highlighter (available in the sandbox) picks out matching terms.  
If you used a custom Formatter with Highlighter, you could pick out 
matching terms and have a list of them.  This would not be something 
you do for every hit, though, as it would take a little time to do for 
each document.

Erik


Range Query Sombody HELP please

2004-06-02 Thread Karthik N S

Hey Ype/Erick

Thanks in advance for helping me with the range queries.
I was finally able to trace the faulty processes within my code and close
them.

I still have 3 small questions.

1) When creating a range query, is it possible for Lucene to do something
similar to this?

 +(button AND shirt) +filename:[b10181_p100 TO b10181_p200]

 [Do you think this will work?]  It does not return any hits, but it does
return hits when I search for only one of them, shirt or button.

2) When the indexer starts indexing, does it proceed in alphabetical order
or in some other way?

3) The field type Keyword does not work for file names when indexing.
   [Try indexing file names and then searching on them; the search will
definitely return 0 hits, Lucene 1.3-final version.]

 doc.add(Field.Text("filename", file.getName()));
     Will return hits

 doc.add(Field.Keyword("filename", file.getName()));
   Will not return hits


 Why?



with regards
Karthik


On Monday 31 May 2004 13:47, Karthik N S wrote:
 Hey Ype...

 1) I switched off the multi-search scenario.

 2) Changing the field type from Text to Keyword
 fails when I search on the field 'filename',
 so I still keep it as Text.

Just make sure the file name is indexed as you show it,
ie. the underscore should be in the indexed term.
The best way to do that is to index the filename as keyword.
Check the output of the analyzer, or use luke to see what is in the index
for the filename field.

 D:\JAVA\lucene\src\demojava org.lucene.src.indexer.search.SearchFiles
 Search Keyword : b10181_p388
 Source path [ E:/po/ ] : e:/indexer3/b10181
 Query: ['b10181_p388'] in Folder e:/indexer3/b10181/b10181_indx_

 Found document(s) that matched : 'b10181_p388' no of hits :'1' in query
 Field :'filename'
 File Name : B10181_P388


 3)On Search for range between 2 file names  B10181_P702   to  B01081_P355
 still returns me  0 hits  [Included space before the 2nd '+' ]

 D:\JAVA\lucene\src\demojava org.lucene.src.indexer.search.SearchFiles
 Search Keyword : +button +filename:[b10181_p702 TO b10181_p355]

Could you try this:

+button +filename:[b10181_p355 TO b10181_p702]

?
If this does not work, please narrow your problem down to a java test
program
of 10-20 lines, and post the code.
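
Why the order of the two endpoints matters can be sketched without Lucene at all. The following is a stdlib-only illustration, assuming that an inclusive range is evaluated by walking lexicographically sorted terms from the lower bound until the upper bound is passed. The class name, the helper inclusiveRange, and the sample terms other than the ones in this thread are made up for the example.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

public class RangeOrderSketch {
    // Collects the terms t with lower <= t <= upper by scanning a sorted set,
    // starting at the first term that is >= lower.
    static SortedSet<String> inclusiveRange(SortedSet<String> terms, String lower, String upper) {
        SortedSet<String> matches = new TreeSet<>();
        for (String term : terms.tailSet(lower)) {
            if (term.compareTo(upper) > 0) {
                break; // past the upper bound, nothing further can match
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(Arrays.asList(
                "b10181_p100", "b10181_p355", "b10181_p388", "b10181_p702"));

        // Reversed endpoints: the scan starts at p702, which is already past p355.
        System.out.println(inclusiveRange(terms, "b10181_p702", "b10181_p355"));

        // Endpoints in lexicographic order: p355, p388 and p702 are all in range.
        System.out.println(inclusiveRange(terms, "b10181_p355", "b10181_p702"));
    }
}
```

With the endpoints reversed the result is empty, which would match the 0 hits reported above if the real range query behaves the same way.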

Regards,
Ype





Re: optimize() is not merging into single file? !!!!!!

2004-06-02 Thread iouli . golovatyi
I rechecked  the results. Here they are:

IndexWriter compiled with v.1.4-rc2 generates after optimization:

_36d.cfs   3779 kb

IndexWriter compiled with v.1.4-rc3 generates after optimization:

_36d.cfs   3778 kb
_36c.cfs     31 kb
_35z.cfs     14 kb
_35o.cfs     14 kb
...
etc.

In both cases the segments file contains _36d.cfs.

It looks like the new version just forgets to clean up.






Iouli Golovatyi/X/GP/[EMAIL PROTECTED]
01.06.2004 17:22
Please respond to Lucene Users List

 
To: [EMAIL PROTECTED]
cc: 
Subject:optimeze() is not merging into single file?
Category: 



I optimize and then close the index, but I don't get just one .cfs 
file as promised in the docs. Instead I see several small 
segments and a couple of big ones.
This weird behavior seems to have started when I changed from v 1.4-rc2 to 
1.4-rc3.
Before, I got just one .cfs segment. Any ideas?
Thanks in advance
J.



Re: Range Query Sombody HELP please

2004-06-02 Thread Erik Hatcher
On Jun 2, 2004, at 6:20 AM, Karthik N S wrote:
Hey Ype/Erick
If you're gonna ask for help, the least ya could do is spell my name 
correctly :)

I still have 3 small Questions.
1)While creating the Range Query Is it possible for Lucene to do 
somthing
similar..

 +(button AND shirt) +filename:[b10181_p100 TO b10181_p200]
 [Do you think this will work]  It's not on returning hits , but 
it does
return hits with either one of them  Shirt or button Only.
My guess is that none of your documents in that range 
have both button AND shirt in them.

2)When the indexer start indexing does it do according to alphabetic 
order
or is it some other way...
I don't understand the question, sorry.  Terms in the index are ordered 
lexicographically, if that is what you mean.

3)The Field Type  Keyword  is not accepting name of Files as it 
indexes
   [ Try indexing filenames and then do a search on them ,the hits will
return u 0 defnitly,  lucene1.3-final version ]

 doc.add(Field.Text(filename,file.getName()))
     Will return Hits
doc.add(Field.Keyword(filename,file.getName()))
  Will Not return Hits
 why???
Because of your analyzer.  Try indexing as a Keyword and search using a 
TermQuery.  Don't use QueryParser at first - it gets in the way of 
understanding what is really going on.  For fun, look at the .toString 
of the Query generated by QueryParser if you like.  Look at the 
AnalysisParalysis page on the wiki for more details.  Read my java.net 
articles to get a better understanding.   The short answer is that it 
is analysis that is bogging you down here.

You need to decide how to index file names based on how you plan on 
querying for them.  We cannot answer this for you.
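
To make the analysis point concrete, here is a stdlib-only sketch, not Lucene code. It assumes a hypothetical analyzer that does nothing but lowercase, a Keyword-style field that stores the value verbatim as one term, and a query parser that runs the query text through the same analyzer. The class and method names are invented for the illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class KeywordLookupSketch {
    // Hypothetical stand-in for an analyzer: it only lowercases the text.
    static String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // An exact-term lookup: the query term must equal an indexed term.
    static boolean matches(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String fileName = "B10181_P388";

        // Keyword-style indexing: the value becomes a single term, untouched.
        Set<String> keywordTerms = new HashSet<>();
        keywordTerms.add(fileName);

        // Text-style indexing: the value goes through the analyzer first.
        Set<String> textTerms = new HashSet<>();
        textTerms.add(analyze(fileName));

        // A parser that analyzes the query looks up the analyzed form.
        String parsedTerm = analyze(fileName);

        System.out.println("keyword field, parsed query: " + matches(keywordTerms, parsedTerm));
        System.out.println("text field, parsed query:    " + matches(textTerms, parsedTerm));
        System.out.println("keyword field, exact term:   " + matches(keywordTerms, fileName));
    }
}
```

Under these assumptions the parsed query misses the verbatim keyword term, while an exact term lookup (the role TermQuery plays in the advice above) finds it.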

Erik


RE : optimize() is not merging into single file? !!!!!!

2004-06-02 Thread Rasik Pandey
Hello,

I am running a two-week-old version of Lucene from CVS HEAD and am seeing the same 
behavior.

Regards,
RBP 

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, June 2, 2004 13:53
 To: Lucene Users List
 Subject: Re: optimize() is not merging into single file? !!
 
 I rechecked  the results. Here they are:
 
 IndexWriter compiled with v.1.4-rc2 generates after
 optimization:
 _36d.cfs   3779 kb
 
 IndexWriter compiled with v.1.4-rc3 generates after
 optimization:
 
 _36d.cfs   3778 kb
 _36c.cfs     31 kb
 _35z.cfs     14 kb
 _35o.cfs     14 kb
 ...
 etc.
 
 In both cases the segments file contains _36d.cfs.
 
 It looks like the new version just forgets to clean up.
 
 
 
 
 
 
 Iouli Golovatyi/X/GP/[EMAIL PROTECTED]
 01.06.2004 17:22
 Please respond to Lucene Users List
 
 
 To: [EMAIL PROTECTED]
 cc:
 Subject:optimeze() is not merging into single
 file?
 Category:
 
 
 
 I optimize and then close the index, but I don't get just one .cfs
 file as promised in the docs. Instead I see several small
 segments and a couple of big ones.
 This weird behavior seems to have started when I changed from
 v 1.4-rc2 to 1.4-rc3.
 Before, I got just one .cfs segment. Any ideas?
 Thanks in advance
 J.







Re: Tool for analyzing analyzers

2004-06-02 Thread Leo Galambos


Zilverline [EMAIL PROTECTED] wrote:
__

get more out of  lucene, such as incremental indexing, to name one. On 

Hello,

as far as I know, the incremental indexing
could be a real bottleneck if you implemented
your system without some knowledge
about Lucene internals.

The respective test is here:
http://www.egothor.org/twiki/bin/view/Know/LuceneIssue

Cheers,
Leo






indexing french text with Lucene

2004-06-02 Thread uddam chukmol
Hi all, 
 
Lucene is a very powerful tool for indexing English documents. I wonder whether it is 
as powerful for indexing French text.
 
In fact, I need to compute the similarity between two French texts. So, if anybody has 
experience indexing French text, your ideas and recommendations are 
most welcome. 
 
Thanks in advance.
 
Uddam
 
 



Re: similarity of two texts

2004-06-02 Thread Terry Steichen
Erik,

Could you expand on this just a wee bit, perhaps with an example of how to
compute this vector angle?

TIA,

Terry

- Original Message - 
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, June 01, 2004 9:39 AM
Subject: Re: similarity of two texts


 On Jun 1, 2004, at 9:24 AM, Grant Ingersoll wrote:
  Hey Eric,

 Eri*K*  :)

  What did you do to calc similarity?

 I computed the angle between two vectors.  The vectors are obtained
 from IndexReader.getTermFreqVector(docId, field).

I haven't had time, but was thinking of ways to add the ability to
  get the similarity score (as calculated when doing a search) given a
  term vector (or just a document id).

 It would be quite compute-intensive to do something like this.  This
 could be done through a custom sort as well, if applying it at the
 scoring level doesn't work.  I haven't given any thought to how this
 could work for scoring or sorting before, but does sound quite
 interesting.

Any ideas on how to approach this would be appreciated.  The scoring
  in Lucene has always been a bit confusing to me, despite looking at
  the code several times, especially once you get into boolean queries,
  etc.

 No doubt that it is confusing - to me also.  But Explanation is your
 friend.

 Erik





Can I prevent Sort fields from influencing score?

2004-06-02 Thread Andy Goodell
I have been using the new Lucene 1.4 SortField implementation with some
custom fields added to old indexes so that the results can be sorted
by them.  My problem is that some of the String fields that I add
to the index match the search terms, so my results in sort-by-score
order are different.  Here's an example:

I added the field AUTHOR_SORTABLE to most of the documents in the
index.  But if the AUTHOR_SORTABLE field in a document is set
to "andy", and I search for "andy", this document gets a very
different score than it used to.

Since my added fields aren't set in stone, I'm interested in a general
solution, where all fields containing the text "SORTABLE" in the name
aren't considered for matches, only for sorting.  Could I do this by
overriding Similarity?  I tried doing this to set the lengthNorm() for
each of my sortable fields to 0, but it hasn't worked yet.  Is there a
different way to store the sortable fields that will prevent this?

Any help would be greatly appreciated.

- andy g




Re: similarity of two texts

2004-06-02 Thread David Spencer
Terry Steichen wrote:
Erik,
Could you expand on this just a wee bit, perhaps with an example of how to
compute this vector angle?
I'm tempted to write the code to see how it works, but FYI this doc 
seems to nicely explain the concepts:

http://www.la2600.org/talks/files/20040102/Vector_Space_Search_Engine_Theory.pdf
TIA,
Terry
- Original Message - 
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, June 01, 2004 9:39 AM
Subject: Re: similarity of two texts


On Jun 1, 2004, at 9:24 AM, Grant Ingersoll wrote:
Hey Eric,
Eri*K*  :)

What did you do to calc similarity?
I computed the angle between two vectors.  The vectors are obtained
from IndexReader.getTermFreqVector(docId, field).

 I haven't had time, but was thinking of ways to add the ability to
get the similarity score (as calculated when doing a search) given a
term vector (or just a document id).
It would be quite compute-intensive to do something like this.  This
could be done through a custom sort as well, if applying it at the
scoring level doesn't work.  I haven't given any thought to how this
could work for scoring or sorting before, but does sound quite
interesting.

 Any ideas on how to approach this would be appreciated.  The scoring
in Lucene has always been a bit confusing to me, despite looking at
the code several times, especially once you get into boolean queries,
etc.
No doubt that it is confusing - to me also.  But Explanation is your
friend.
Erik


Re: similarity of two texts

2004-06-02 Thread Erik Hatcher
On Jun 2, 2004, at 1:39 PM, David Spencer wrote:
Erik,
Could you expand on this just a wee bit, perhaps with an example of  
how to
compute this vector angle?
I'm tempted to write the code to see how it works, but FYI this doc  
seems to nicely explain the concepts:

http://www.la2600.org/talks/files/20040102/Vector_Space_Search_Engine_Theory.pdf
This is, in fact, one of the documents I referenced to get a grasp on  
how to do it.

My code has some built-in assumptions on parts of the equation that get  
short-circuited (there is only 1 of each term in my case, for example)  
so it would not be a general-purpose algorithm.  It's basically just  
using the TermFreqVector information and plugging it into an equation  
like the one found in that PDF - nothing more than that, actually.
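
For readers who want the general form, the angle computation can be sketched as below. This is a stdlib-only illustration, not the code described above: the two vectors are hand-built maps from term to frequency, standing in for whatever a term frequency vector would supply, and the class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class VectorAngleSketch {
    // Euclidean length of a sparse term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int freq : v.values()) {
            sum += (double) freq * freq;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle: dot product divided by the product of the lengths.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other; // only shared terms contribute
            }
        }
        double lengths = norm(a) * norm(b);
        return lengths == 0.0 ? 0.0 : dot / lengths;
    }

    // Angle in degrees; the cosine is clamped to guard against rounding error.
    static double angleDegrees(Map<String, Integer> a, Map<String, Integer> b) {
        double c = Math.max(-1.0, Math.min(1.0, cosine(a, b)));
        return Math.toDegrees(Math.acos(c));
    }

    public static void main(String[] args) {
        Map<String, Integer> doc1 = new HashMap<>();
        doc1.put("lucene", 1);
        doc1.put("index", 1);

        Map<String, Integer> doc2 = new HashMap<>();
        doc2.put("lucene", 1);

        System.out.println("cosine = " + cosine(doc1, doc2));
        System.out.println("angle  = " + angleDegrees(doc1, doc2) + " degrees");
    }
}
```

A smaller angle means more similar documents; two documents with no terms in common come out at 90 degrees.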

Erik



Re: similarity of two texts - another question

2004-06-02 Thread Gerard Sychay
Hmm, the term vector does not have to consist of only term frequencies,
does it? To give weight to rare terms, could you create a term vector of
(TF*IDF) values for each term?  Then, a distance function would measure
how many terms two vectors have in common, giving weight to how many
rare terms two vectors have in common.
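
The weighting idea can be sketched as follows. This is a stdlib-only illustration that assumes the textbook form weight = tf * ln(N / df), which is not necessarily the formula Lucene uses for scoring. The class name, the method weigh, and the sample numbers are all made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class TfIdfSketch {
    // Turns raw term frequencies into TF*IDF weights.
    // docFreq maps a term to the number of documents that contain it;
    // numDocs is the total number of documents in the collection.
    static Map<String, Double> weigh(Map<String, Integer> termFreq,
                                     Map<String, Integer> docFreq,
                                     int numDocs) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreq.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 1); // unseen terms count as df = 1
            double idf = Math.log((double) numDocs / df); // rare terms get a larger idf
            weights.put(e.getKey(), e.getValue() * idf);
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Integer> termFreq = new HashMap<>();
        termFreq.put("the", 5);
        termFreq.put("stemmer", 1);

        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("the", 100);   // appears in every document
        docFreq.put("stemmer", 2); // appears in only two documents

        System.out.println(weigh(termFreq, docFreq, 100));
    }
}
```

The weighted maps could then be compared with any vector distance; a term that occurs in every document ends up with weight zero and no longer contributes.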

 David Spencer [EMAIL PROTECTED] 06/01/04 08:25PM 
Erik Hatcher wrote:

 On Jun 1, 2004, at 4:41 PM, uddam chukmol wrote:

 Well, a question again, how does Lucene compute the score between a 

 document and a query?


And I might add, thus, this approach to similarity gives more weight to

rare terms that match, which one might want for this kind of similarity

measure.






Re: similarity of two texts - another question

2004-06-02 Thread David Spencer
Gerard Sychay wrote:
Hmm, the term vector does not have to consist of only term frequencies,
does it? To give weight to rare terms, could you create a term vector of
(TF*IDF) values for each term?  Then, a distance function would measure
how many terms two vectors have in common, giving weight to how many
rare terms two vectors have in common.
Yeah, but if you're gonna do that why not just form a query with all 
words in the source document, and let the Lucene engine do the idf/tf 
calculations? I've done this and it seems to work fine.

Here's code I've used. It could be done better by avoiding QueryParser, 
and odds are it could hit that exception for too many clauses in a 
boolean expression unless you configure lucene from its default, but 
this is the idea. srch is the entire body of the source document.

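import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

// Builds one big query from every token the analyzer produces for srch.
public static Query formSimilarQuery(String srch, Analyzer a)
    throws org.apache.lucene.queryParser.ParseException, IOException
{
    StringBuffer sb = new StringBuffer();
    // The field name passed to tokenStream() is only a placeholder here.
    TokenStream ts = a.tokenStream("foo", new StringReader(srch));
    Token t;
    while ((t = ts.next()) != null)
    {
        sb.append(t.termText()).append(' ');   // space-separate the terms
    }
    // DFields.CONTENTS is my own constant naming the field to search.
    return QueryParser.parse(sb.toString(), DFields.CONTENTS, a);
}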


David Spencer [EMAIL PROTECTED] 06/01/04 08:25PM 
Erik Hatcher wrote:

On Jun 1, 2004, at 4:41 PM, uddam chukmol wrote:

Well, a question again, how does Lucene compute the score between a 

document and a query?

And I might add, thus, this approach to similarity gives more weight to
rare terms that match, which one might want for this kind of similarity
measure.



Re: Range Query Sombody HELP please

2004-06-02 Thread Ype Kingma
On Wednesday 02 June 2004 14:46, Erik Hatcher wrote:
 On Jun 2, 2004, at 6:20 AM, Karthik N S wrote:
...
  I still have 3 small Questions.
 
  1)While creating the Range Query Is it possible for Lucene to do
  somthing
  similar..
 
   +(button AND shirt) +filename:[b10181_p100 TO b10181_p200]
 
   [Do you think this will work]  It's not on returning hits , but
  it does
  return hits with either one of them  Shirt or button Only.

 My guess is that none of your documents in that range
 have both button AND shirt in them.

You can also try this:

+button +shirt +filename:[b10181_p100 TO b10181_p200]

I never got to completely understand the way the query parser deals with
AND and OR, so I prefer to avoid them.

Regards,
Ype





Re: Can I prevent Sort fields from influencing score?

2004-06-02 Thread Tim Jones
This seems like it would be determined by how you generate your query - if
your query doesn't search in the sorted fields, they shouldn't affect the
scoring of your documents ...


 -Original Message-
 From: Andy Goodell [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, June 02, 2004 12:22 PM
 To: [EMAIL PROTECTED]
 Subject: Can I prevent Sort fields from influencing score?
 
 
 I have been using the new lucene 1.4 SortField implementation wih some
 custom fields added to old indexes so that the results can be sorted
 by them.  My problem here is that some of the String fields that I add
 to the index come up in the search terms, so my results in sort by
 score order are different.  Here's an example:
 
 I added the field AUTHOR_SORTABLE to most of the documents in the
 index.  But if one of the AUTHOR_SORTABLE field in a document is set
 to andy, and i search for andy, this document gets a very
 different score than it used to.
 
 Since my added fields aren't set in stone, I'm interested in a general
 solution, where all fields containing the text SORTABLE in the name
 aren't considered for matches, only for sorting.  Could I do this by
 overriding Similarity?  I tried doing this to set the lengthNorm() for
 each of my sortable fields to 0, but it hasnt worked yet.  Is there a
 different way to store the sortable fields that will prevent this?
 
 Any help would be greatly appreciated.
 
 - andy g
 



Re: Can I prevent Sort fields from influencing score?

2004-06-02 Thread Andy Goodell
thanks that was my problem, i had code extending the search out to all
the fields, now it only extends the search out to the fields i'm
interested in.

- andy g

On Wed, 2 Jun 2004 14:21:24 -0500 , Tim Jones [EMAIL PROTECTED] wrote:
 
 This seems like it would be determined by how you generate your query - if
 your query doesn't search in the sorted fields, they shouldn't affect the
 scoring of your documents ...
 
 
 
  -Original Message-
  From: Andy Goodell [mailto:[EMAIL PROTECTED]
  Sent: Wednesday, June 02, 2004 12:22 PM
  To: [EMAIL PROTECTED]
  Subject: Can I prevent Sort fields from influencing score?
 
 
  I have been using the new lucene 1.4 SortField implementation wih some
  custom fields added to old indexes so that the results can be sorted
  by them.  My problem here is that some of the String fields that I add
  to the index come up in the search terms, so my results in sort by
  score order are different.  Here's an example:
 
  I added the field AUTHOR_SORTABLE to most of the documents in the
  index.  But if one of the AUTHOR_SORTABLE field in a document is set
  to andy, and i search for andy, this document gets a very
  different score than it used to.
 
  Since my added fields aren't set in stone, I'm interested in a general
  solution, where all fields containing the text SORTABLE in the name
  aren't considered for matches, only for sorting.  Could I do this by
  overriding Similarity?  I tried doing this to set the lengthNorm() for
  each of my sortable fields to 0, but it hasnt worked yet.  Is there a
  different way to store the sortable fields that will prevent this?
 
  Any help would be greatly appreciated.
 
  - andy g
  



RE: Can I prevent Sort fields from influencing score?

2004-06-02 Thread Gus Kormeier
Just curious,
Are you building your query or using a particular Query Parser?
which one?

Are you using MultiFieldQueryParser?  I had problems with MFQP before and
was looking for other solutions besides dumping fields into a massive
content field.

TIA,
-Gus

-Original Message-
From: Andy Goodell [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 02, 2004 1:30 PM
To: Lucene Users List
Subject: Re: Can I prevent Sort fields from influencing score?


thanks that was my problem, i had code extending the search out to all
the fields, now it only extends the search out to the fields i'm
interested in.

- andy g

On Wed, 2 Jun 2004 14:21:24 -0500 , Tim Jones [EMAIL PROTECTED] wrote:
 
 This seems like it would be determined by how you generate your query - if
 your query doesn't search in the sorted fields, they shouldn't affect the
 scoring of your documents ...
 
 
 
  -Original Message-
  From: Andy Goodell [mailto:[EMAIL PROTECTED]
  Sent: Wednesday, June 02, 2004 12:22 PM
  To: [EMAIL PROTECTED]
  Subject: Can I prevent Sort fields from influencing score?
 
 
  I have been using the new lucene 1.4 SortField implementation wih some
  custom fields added to old indexes so that the results can be sorted
  by them.  My problem here is that some of the String fields that I add
  to the index come up in the search terms, so my results in sort by
  score order are different.  Here's an example:
 
  I added the field AUTHOR_SORTABLE to most of the documents in the
  index.  But if one of the AUTHOR_SORTABLE field in a document is set
  to andy, and i search for andy, this document gets a very
  different score than it used to.
 
  Since my added fields aren't set in stone, I'm interested in a general
  solution, where all fields containing the text SORTABLE in the name
  aren't considered for matches, only for sorting.  Could I do this by
  overriding Similarity?  I tried doing this to set the lengthNorm() for
  each of my sortable fields to 0, but it hasnt worked yet.  Is there a
  different way to store the sortable fields that will prevent this?
 
  Any help would be greatly appreciated.
 
  - andy g
  



help needed in starting lucene

2004-06-02 Thread milind honrao
Hi,
 
I am just a beginner. I installed Lucene according to the instructions provided. 
I made all the changes to the environment variables.
When I try to run the test program for building indexes using the following command:  
java  org.apache.lucene.demo.IndexFiles test/Doc
I am getting the following exception 
Exception in thread main class java.lang.ExceptionInInitializerError: 
java.lang.RuntimeException: java.security.NoSuchAlgorithmException: MD5: Class not 
found.

 

 


RE: help needed in starting lucene

2004-06-02 Thread wallen
It sounds to me like you need a newer version of Java.
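
One way to check this independently of Lucene is to ask the JVM directly whether it can supply an MD5 digest. The small program below is only a diagnostic sketch using the standard java.security API; the class name Md5Check and the helper hasAlgorithm are invented for the example.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {
    // Returns true if the running JVM has a provider for the named digest.
    static boolean hasAlgorithm(String name) {
        try {
            MessageDigest.getInstance(name);
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version  = " + System.getProperty("java.version"));
        System.out.println("MD5 available = " + hasAlgorithm("MD5"));
    }
}
```

If this prints false for MD5, the problem lies with the Java installation rather than with Lucene or the classpath.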

-Original Message-
From: milind honrao [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 02, 2004 5:36 PM
To: [EMAIL PROTECTED]
Subject: help needed in starting lucene


Hi,
 
I am just a beginner. I installed lucene according to the intsructions
provided. 
I did all the changed to the environment variables
when i try to run the test program for building indexes using the following
command:  java  org.apache.lucene.demo.IndexFiles test/Doc
I am getting the following exception 
Exception in thread main class java.lang.ExceptionInInitializerError:
java.lang.RuntimeException: java.security.NoSuchAlgorithmException: MD5:
Class not found.

 

 





Re: Can I prevent Sort fields from influencing score?

2004-06-02 Thread Andy Goodell
I build the query myself; it's really easy. I just use the normal query
parser with IndexReader.getFieldNames(true) and loop through all of
them to search everything at once.  You can either make a really big
BooleanQuery or make a bunch of small queries and merge the results,
depending on what kind of results you are looking for.  It's probably
not as fast as the one big data field method, but speed is not an
issue yet for anything i've done, whereas code maintenance is a pain,
witness my question that started this thread.
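
The loop described above, with the sort-only fields left out, might look roughly like this. It is a stdlib-only sketch: the field names other than AUTHOR_SORTABLE are hypothetical, the field list is hard-coded instead of being read from an index, and the result is a plain query string that would still have to be handed to a query parser. The search term is not escaped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class SearchableFieldsSketch {
    // Keeps every field whose name does not mark it as sort-only.
    static List<String> searchableFields(Collection<String> allFields) {
        List<String> result = new ArrayList<>();
        for (String field : allFields) {
            if (!field.contains("SORTABLE")) {
                result.add(field);
            }
        }
        return result;
    }

    // Builds "field1:term field2:term ..." so the term is tried in each field.
    static String buildQuery(List<String> fields, String term) {
        StringBuilder sb = new StringBuilder();
        for (String field : fields) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(field).append(':').append(term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> allFields = Arrays.asList("AUTHOR", "AUTHOR_SORTABLE", "TITLE", "CONTENTS");
        System.out.println(buildQuery(searchableFields(allFields), "andy"));
    }
}
```

Because AUTHOR_SORTABLE never appears in the query, a value such as andy stored there cannot contribute to the score.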

- andy g

On Wed, 2 Jun 2004 13:43:41 -0700 , Gus Kormeier [EMAIL PROTECTED] wrote:
 
 Just curious,
 Are you building your query or using a particular Query Parser?
 which one?
 
 Are you using MultiFieldQueryParser?  I had problems with MFQP before and
 was looking for other solutions besides dumping fields into a massive
 content field.
 
 TIA,
 -Gus
 
 
 
 -Original Message-
 From: Andy Goodell [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, June 02, 2004 1:30 PM
 To: Lucene Users List
 Subject: Re: Can I prevent Sort fields from influencing score?
 
 thanks that was my problem, i had code extending the search out to all
 the fields, now it only extends the search out to the fields i'm
 interested in.
 
 - andy g
 
 On Wed, 2 Jun 2004 14:21:24 -0500 , Tim Jones [EMAIL PROTECTED] wrote:
 
  This seems like it would be determined by how you generate your query - if
  your query doesn't search in the sorted fields, they shouldn't affect the
  scoring of your documents ...
 
 
 
   -Original Message-
   From: Andy Goodell [mailto:[EMAIL PROTECTED]
   Sent: Wednesday, June 02, 2004 12:22 PM
   To: [EMAIL PROTECTED]
   Subject: Can I prevent Sort fields from influencing score?
  
  
   I have been using the new lucene 1.4 SortField implementation wih some
   custom fields added to old indexes so that the results can be sorted
   by them.  My problem here is that some of the String fields that I add
   to the index come up in the search terms, so my results in sort by
   score order are different.  Here's an example:
  
   I added the field AUTHOR_SORTABLE to most of the documents in the
   index.  But if one of the AUTHOR_SORTABLE field in a document is set
   to andy, and i search for andy, this document gets a very
   different score than it used to.
  
   Since my added fields aren't set in stone, I'm interested in a general
   solution, where all fields containing the text SORTABLE in the name
   aren't considered for matches, only for sorting.  Could I do this by
   overriding Similarity?  I tried doing this to set the lengthNorm() for
   each of my sortable fields to 0, but it hasnt worked yet.  Is there a
   different way to store the sortable fields that will prevent this?
  
   Any help would be greatly appreciated.
  
   - andy g
  



Marten Senkel/IS/EUROPE/SIALEUROPE is out of the office.

2004-06-02 Thread Marten Senkel




I will be out of the office starting  2004-06-02 and will not return until
2004-06-04.

Please contact Nicolas Guala-Molino for any request.

Thanks!

building custom-stemmer

2004-06-02 Thread Musku, Anil (LA)
Hi,
 
I have a fairly decent idea of using Lucene. I need to use it with some
non-European, Indian and CJK languages. There are some languages among these
that do not currently have a stemmer (I've looked in Snowball). I was
wondering how I could write my own stemmer, for example for Hindi.
 
Regards,
Anil


RE: a list of matching search term

2004-06-02 Thread Anson Lau
Thanks Erik I'll give that a try.

Anson

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 02, 2004 7:28 PM
To: Lucene Users List
Subject: Re: a list of matching search term

On Jun 1, 2004, at 9:19 PM, Anson Lau wrote:
 Further to my previous email: The highlighter package should be able
 to pick
 up the matching search terms.  Can some experienced highlighter package
 users tell me if I should look down that line?

Yes, Highlighter (available in the sandbox) picks out matching terms.
If you used a custom Formatter with Highlighter, you could pick out
matching terms and have a list of them.  This would not be something
you do for every hit, though, as it would take a little time to do for
each document.

Erik





RE: help needed in starting lucene

2004-06-02 Thread Karthik N S
Hey, I think you have a file path problem there. Try giving the full path:

java  org.apache.lucene.demo.IndexFiles  e:/lucene/../test/Doc

Also set the classpath to include lucene-1.3-final.jar or lucene-1.4-rc2.jar
before you start indexing.


with regards
Karthik

-Original Message-
From: milind honrao [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 03, 2004 3:06 AM
To: [EMAIL PROTECTED]
Subject: help needed in starting lucene


Hi,

I am just a beginner. I installed Lucene according to the instructions
provided and made all the changes to the environment variables.
When I try to run the test program for building indexes using the following
command:  java  org.apache.lucene.demo.IndexFiles test/Doc
I get the following exception:
Exception in thread main class java.lang.ExceptionInInitializerError:
java.lang.RuntimeException: java.security.NoSuchAlgorithmException: MD5:
Class not found.
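
The exception text says the JVM could not find an MD5 implementation. One
way to check this independently of Lucene is the small program below; it is
only a diagnostic sketch (the class name Md5Check is made up for
illustration) and uses nothing but the standard java.security API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Check {
    // Returns true if this JVM has a security provider that implements MD5.
    static boolean md5Available() {
        try {
            MessageDigest.getInstance("MD5");
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Print which JVM is actually being launched by the "java" command.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.vendor  = " + System.getProperty("java.vendor"));
        System.out.println("MD5 available: " + md5Available());
    }
}
```

If this prints "MD5 available: false", the problem probably lies with the
Java runtime being picked up rather than with the Lucene jars.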








problems with lucene in multithreaded environment

2004-06-02 Thread Jayant Kumar
We recently tested Lucene with a 2 GB index
containing about 1,500,000 documents, each document
having about 25 fields. The search load was
about 20 queries per second. This resulted in an
average response time of about 20 seconds
per search. What we observed was that Lucene queues
the queries and does not release them until the
results are found, so the queries that come in
later take about 500 seconds. Please let us know
whether there is a technique to optimize Lucene in
such circumstances.

Please note that we have created a single searcher
object (IndexSearcher) and all queries are
passed to this searcher only. We are using a dual-processor
P4 machine with 6 GB of RAM. We need results at
a rate of about 60 queries/second at peak load. Is
there a way to optimize Lucene to get this performance
from this machine? In what other ways can I optimize
Lucene for this throughput?

Regards
Jayant





Re: problems with lucene in multithreaded environment

2004-06-02 Thread Doug Cutting
Jayant Kumar wrote:
We recently tested Lucene with a 2 GB index
containing about 1,500,000 documents, each document
having about 25 fields. The search load was
about 20 queries per second. This resulted in an
average response time of about 20 seconds
per search.

That sounds slow, unless your queries are very complex.  What are your
queries like?

What we observed was that Lucene queues
the queries and does not release them until the
results are found, so the queries that come in
later take about 500 seconds. Please let us know
whether there is a technique to optimize Lucene in
such circumstances.

Multiple queries executed from different threads using a single searcher
should not queue, but should run in parallel.  A technique to find out 
where threads are queueing is to get a thread dump and see where all of 
the threads are stuck.  In Solaris and Linux, sending the JVM a SIGQUIT 
will give a thread dump.  On Windows, use Control-Break.
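
Where sending a signal to the JVM is inconvenient, a similar dump can be
produced from inside the application. The sketch below assumes Java 5 or
later, where Thread.getAllStackTraces() is available; the class name
ThreadDump is made up for illustration.

```java
import java.util.Map;

class ThreadDump {
    // Builds a text dump of every live thread and its current stack.
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName())
              .append("\" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

If many search threads show state=BLOCKED with the same top frame, that
frame is a likely place where the queries are queueing.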

Doug


Range Query Sombody HELP please

2004-06-02 Thread Karthik N S
Hey Ype,

The range query

   +button +shirt +filename:[b10181_p100 TO b10181_p200]

did not work for me, but the other form

  +(button OR shirt) +filename:[b10181_p100 TO b10181_p200]

gave me 2 hits, each page containing only one of the terms (button or shirt),
not both.

From the HTML files I can see that both words are present in more than 2
files.

Is there any other way to get the documents that contain both words?


with regards
Karthik


-Original Message-
From: Ype Kingma [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 03, 2004 12:26 AM
To: [EMAIL PROTECTED]
Subject: Re: Range Query Sombody HELP please


On Wednesday 02 June 2004 14:46, Erik Hatcher wrote:
 On Jun 2, 2004, at 6:20 AM, Karthik N S wrote:
...
  I still have 3 small questions.
 
  1) While creating the range query, is it possible for Lucene to do
  something similar to this?
 
   +(button AND shirt) +filename:[b10181_p100 TO b10181_p200]
 
  Do you think this will work? It is not returning hits for both terms,
  but I do get hits that contain only one of them, shirt or button.

 My guess is that none of your documents in that range
 have button AND shirt in them.

You can also try this:

+button +shirt +filename:[b10181_p100 TO b10181_p200]

I never got to completely understand the way the query parser deals with
AND and OR, so I prefer to avoid them.
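
To make the semantics of that query concrete, here is a small model in
plain Java. It does not use Lucene at all: the Doc class and search()
method are hypothetical and only mimic what "+term" (the clause is
required) and a range on a text field (inclusive, compared as strings) mean.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RequiredClauseModel {
    // Hypothetical stand-in for an indexed document: a filename plus its terms.
    static final class Doc {
        final String filename;
        final Set<String> terms;

        Doc(String filename, String... terms) {
            this.filename = filename;
            this.terms = new HashSet<>(Arrays.asList(terms));
        }
    }

    // Inclusive range check using plain string (lexicographic) ordering.
    static boolean inRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    // Models "+t1 +t2 ... +filename:[lower TO upper]": every clause is required.
    static List<String> search(List<Doc> docs, List<String> required,
                               String lower, String upper) {
        List<String> hits = new ArrayList<>();
        for (Doc doc : docs) {
            if (doc.terms.containsAll(required) && inRange(doc.filename, lower, upper)) {
                hits.add(doc.filename);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("b10181_p100", "button"),
            new Doc("b10181_p150", "button", "shirt"),
            new Doc("b10181_p1500", "shirt"),
            new Doc("b10181_p250", "button", "shirt"));
        // Only documents that contain both terms and fall inside the range match.
        System.out.println(search(docs, Arrays.asList("button", "shirt"),
                                  "b10181_p100", "b10181_p200"));
    }
}
```

With this sample data only b10181_p150 matches. Note that b10181_p1500 is
inside the range as far as string ordering is concerned, so a file can be
"in range" by name and still be excluded because it lacks one of the
required terms.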

Regards,
Ype

