RE: hypens

2006-04-18 Thread Ramana Jelda
Hi,
I would use separate index and search analyzers in this case.
"b-trunk" is analyzed and indexed as b, btrunk, trunk.
The search term "b-trunk" is analyzed by the search analyzer as "btrunk" and
searched, so you will find the result.

Similarly, 12412-235, 12412-121, 12412-etc. are indexed as
12412, 12412235, 235, etc.
So a search for 12412 will obviously find them.
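
The index-time and search-time token handling described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class HyphenTokens and its two methods are hypothetical names, not part of Lucene, and a real solution would put the same logic inside a custom Analyzer/TokenFilter pair.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Lucene class) for hyphenated terms. */
final class HyphenTokens {

    /** Index time: emit each part plus the concatenation, e.g. b-trunk -> b, trunk, btrunk. */
    static List<String> indexTokens(String term) {
        List<String> tokens = new ArrayList<>();
        String lower = term.toLowerCase();
        for (String part : lower.split("-")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        // Add the joined form only when the term really had more than one part.
        if (tokens.size() > 1) {
            tokens.add(lower.replace("-", ""));
        }
        return tokens;
    }

    /** Search time: collapse hyphens so b-trunk becomes the single token btrunk. */
    static String searchToken(String term) {
        return term.toLowerCase().replace("-", "");
    }
}
```

With this split, a query for 12412 matches because 12412 is one of the indexed parts, and a query for "b trunk" would be tokenized on whitespace into b and trunk, both of which are also indexed.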


Good luck,
Jelda


> -Original Message-
> From: John Powers [mailto:[EMAIL PROTECTED] 
> Sent: Monday, April 17, 2006 6:59 PM
> To: java-user@lucene.apache.org
> Subject: hypens
> 
> Hello,
>
> If I have a user search for "b-trunk" I would like them to be able to
> find "b-trunk" (with hyphen). I would also like someone searching for
> "b trunk" to also find "b-trunk".
>
> On the other side, if someone searches for 12412 I would like them to be
> able to find 12412-235, 12412-121, 12412-etc... as well as letting
> someone type in 12412-235 directly and get a good result list: the one
> item would be best, but a larger list with that one on top is good too.
>
> So for now I am using the StandardAnalyzer. I do a search for what
> they give me in quotes on all fields as well as the same thing w/o
> quotes. When I print out the final query, the half of the overall query
> in quotes seems to have the hyphens stripped out, but the w/o quotes
> version doesn't... so this lets me find what I want. But I have each
> search phrase in the final query twice now. It seems to work fine,
> but it seems pretty inelegant--unelegant even.
>
> It seems like I can't just strip out the hyphens, nor keep them. I am
> storing the name as keyword, but everything else as Text. I thought
> that would matter, but a description or keyword or other field may have
> something like "this also relates to 23523-235", so if someone was
> searching for 23523 I would also want this in the list... and if they
> searched for 23523-235 then I would still want this. So I
> don't know if it's solvable by the type of field I use to index it. Or
> do I have to store each field twice with a different analyzer?
> That seems just as clumsy as my double-search solution.
>
> Any thoughts?


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Max Frequency and Tf/Idf

2006-04-18 Thread Danilo Cicognani
Hi Grant Ingersoll and everybody.

> The Term Vector code can be used to get the term frequencies from a
> specific document.  Search this list, see the Lucene In Action book, or
> look at http://www.cnlp.org/apachecon2005 for examples on how to use
> Term Vectors

Maybe I didn't explain my question well.
Following is the code we are using now: we were considering the possibility of
getting more information from Lucene (for example, the maximum term frequency
in one document) to optimize the calculations.
The first method is the one that starts the Tf/Idf calculation using the
class TTfIdf, whose constructor is reported below.

public TTfIdf getFieldTfIdf(long tid, long aid, String field) throws 
RisorseMultipleException, IOException, RisorsaNonTrovataException, 
TTfIdfException {
reader= IndexReader.open(indexDir);
int id=getDocumentId(tid,aid);
TermFreqVector tfv = reader.getTermFreqVector(id,field);
int[] freqs=tfv.getTermFrequencies();
String[] terms=tfv.getTerms();
int[] df=new int[terms.length];
for(int i=0;imaxfreq) maxfreq=freqs[i];
}
this.freqs=new double[l];
double tf;
double idf;
for(int i=0;i

Re: Max Frequency and Tf/Idf

2006-04-18 Thread karl wettin


On 18 Apr 2006, at 11:45, Danilo Cicognani wrote:

> Following is the code we are using now: we were considering the
> possibility of getting more information from Lucene (for example, the
> maximum term frequency in one document) to optimize the calculations.
> The first method is the one that starts the Tf/Idf calculation using
> the class TTfIdf, whose constructor is reported below.
>
> for(int i=0;imaxfreq) maxfreq=freqs[i];
> }
> this.freqs=new double[l];
> double tf;
> double idf;
> for(int i=0;i

Not quite sure what you do above, but I guess you could calculate the
information at index time. To persist it in the index, extend/hack
TermFreqVector and related IO classes.
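
For readers following the thread, here is a minimal plain-Java sketch of a max-frequency-normalized Tf/Idf weighting. It is not the poster's TTfIdf class (that code is truncated in the archive); the class name MaxTfIdfSketch and the formulas tf = freq / maxFreq and idf = ln(numDocs / docFreq) are assumptions chosen for illustration. The point is that maxFrequency depends only on the document itself, so it could be computed once at index time as suggested above.

```java
/** Hypothetical sketch; not the TTfIdf class from the thread and not a Lucene API. */
final class MaxTfIdfSketch {

    /** Largest term frequency in one document; a candidate for index-time precomputation. */
    static int maxFrequency(int[] freqs) {
        int max = 0;
        for (int f : freqs) {
            if (f > max) {
                max = f;
            }
        }
        return max;
    }

    /**
     * Weight per term, assuming tf = freq / maxFreq and idf = ln(numDocs / docFreq).
     * Every docFreq is expected to be at least 1, since the term occurs in this document.
     */
    static double[] weights(int[] freqs, int[] docFreqs, int numDocs) {
        int maxFreq = maxFrequency(freqs);
        double[] out = new double[freqs.length];
        for (int i = 0; i < freqs.length; i++) {
            double tf = maxFreq == 0 ? 0.0 : (double) freqs[i] / maxFreq;
            double idf = Math.log((double) numDocs / docFreqs[i]);
            out[i] = tf * idf;
        }
        return out;
    }
}
```

In the thread's setting, freqs would come from TermFreqVector.getTermFrequencies() and the document frequencies from the IndexReader; here they are passed in as plain arrays so the sketch stays self-contained.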





Re: Indexing - scheduled batch process or server?

2006-04-18 Thread Marc Dauncey
Thanks for the response, Jeremy.

Quartz seems like a great solution - are you running
it within the app server?

I think the benefits of doing this would be
convenience of messaging the search server to pick up
fresh indexes. Previously I considered a CRON job and
was thinking of making a web services call to achieve
the same thing.

The only thing that concerns me (and this is maybe a
question for the Quartz mailing list rather than this
one) is the spawning of user threads issue. That kind
of thing makes me nervous in an app server context,
but lots of people use Quartz for J2EE scheduling so
it must be fairly stable.

What was your experience of it?

Many thanks

Marc


--- Jeremy Hanna <[EMAIL PROTECTED]> wrote:

> I'm pretty new with this, but with my index for a database, I'm using
> a Quartz scheduler.  Also at the end of the index update, I set my
> singleton of IndexSearcher to null.  That way the index searcher will
> be using the latest information.  That bit as well as setting it to
> null and not closing it I found searching around on forums.  The
> reason given for not closing it is to allow searches currently using
> the index searcher to finish using it.
> Anyway, I hope this helps.
> Jeremy
>
> On Apr 17, 2006, at 2:53 PM, Marc Dauncey wrote:
>
> > Hi everyone,
> >
> > I'm currently designing a Lucene search system and I'm
> > considering the indexing side of things.
> >
> > Just wondered what kind of architecture people have
> > adopted for indexing - are cron jobs sufficient for
> > high volume drip feed indexing or has anyone
> > implemented a more sophisticated solution with web
> > services to index on demand?
> >
> > And has anyone used Quartz to schedule Lucene index
> > updates?  Sounds like an interesting product in this
> > context.
> >
> > Many thanks
> >
> > Marc Dauncey







Re: Indexing - scheduled batch process or server?

2006-04-18 Thread Yonik Seeley
On 4/17/06, Marc Dauncey <[EMAIL PROTECTED]> wrote:
> or has anyone
> implemented a more sophisticated solution with web
> services to index on demand?

In Solr, documents (XML versions of Lucene Documents) are POSTed to the server.
There are explicit commands that cause a new IndexReader to
be opened and warmed in the background.
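
The "drop the shared searcher after an index update, but do not close it" idea discussed in this thread can be sketched as follows. This is a generic, hedged illustration: SearcherHolder is a hypothetical class, the type parameter S merely stands in for something like an IndexSearcher, and this is not how Solr implements its background warming.

```java
import java.util.function.Supplier;

/** Hypothetical holder; S stands in for an IndexSearcher-like object. */
final class SearcherHolder<S> {
    private final Supplier<S> opener;
    private S current;

    SearcherHolder(Supplier<S> opener) {
        this.opener = opener;
    }

    /** Returns the cached searcher, opening a fresh one lazily if none is cached. */
    synchronized S get() {
        if (current == null) {
            current = opener.get();
        }
        return current;
    }

    /**
     * Called at the end of an index update. The old instance is only dropped,
     * not closed, so searches that already hold a reference can finish with it.
     */
    synchronized void invalidate() {
        current = null;
    }
}
```

A scheduled job (Quartz, cron-triggered, or otherwise) would call invalidate() once the index update completes; the next search then opens a searcher that sees the new index.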

-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server




RE: hypens

2006-04-18 Thread John Powers
What do you mean by "use index and search analyzers"?  Don't you always
have to pass in an analyzer?  I am using the StandardAnalyzer in both
cases.

Which analyzer are you recommending I use for this?




RE: hypens

2006-04-18 Thread Ramana Jelda
I mean using separate analyzers for indexing and searching.

I would not use any of the standard analyzers provided by Lucene, but rather
implement a custom analyzer, which is not so difficult.


Jelda




Re: hypens

2006-04-18 Thread Yonik Seeley
On 4/18/06, John Powers <[EMAIL PROTECTED]> wrote:
> What do you mean by "use index and search analyzers".  Don't you always
> have to pass in an analyzer?   I am using the standardanalyzer in both
> cases.

I think he means a different analyzer for search than is used for
indexing.  It can make sense in certain cases.

Solr has a WordDelimiterFilter that handles hyphen (and many other)
cases like this.
It can make wi-fi match a query of wifi or "wi fi" or "WiFi".  Solr
also allows easy specification of different analyzers for index vs
query time.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters


-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server




Re: Indexing - scheduled batch process or server?

2006-04-18 Thread Jeremy Hanna

Marc,

I am using it within the web app.  I use Spring, and there are ways to
throttle a call down to one thread with Spring if you're worried
about overloading the server when you update the index.  I'm not sure
about Quartz and its ability to set a priority, limit the number of
threads, use a thread pool, or load-balance.  I did a
quick search on the Quartz user forum and found a lot of discussion
of Java threads though, so that might be promising
(http://forums.opensymphony.com/search.jspa?objID=f6&q=thread).


Anyway, Quartz has seemed to work for what I'm doing - I inherited
it from the previous developer and it's had a good history of
being reliable for our stuff.


Jeremy




wildcards with SpanQuery

2006-04-18 Thread Michael Dodson

Is it possible to use wildcards with SpanNearQuery?

For example, if the user enters "fast car" with a slop of 1, would things
like "fast cars", "faster cars", "fast brown cars", etc. be found?


Thanks,
Mike 





Re: wildcards with SpanQuery

2006-04-18 Thread karl wettin


On 18 Apr 2006, at 20:10, Michael Dodson wrote:

> Is it possible to use wildcards with SpanNearQuery?
>
> For example, if the user enters "fast car" with a slop of 1 things
> like "fast cars" "faster cars" "fast brown cars" etc would be found?

You might be looking for stem analysis? You can, for instance, take a
look at the Snowball stemmers.

Wildcards would be if the user enters 'fast* car'.

But to answer your question, there is no SpanWildcardQuery as far as
I know. You could make one though. And I can imagine it would, just
like SpanFuzzyQuery, be really slow and virtually unusable in the hands
of users.





using custom sort method

2006-04-18 Thread Urvashi Gadi

Hello All,

My requirement is to combine 2 or more fields using some criteria (for
example, a weighted average) and sort the search results based on the
combined fields.


I am looking at the DistanceComparatorSource class to implement a custom sort,
but it takes only one field for the calculation and then sorts the result.
Is there a way to use more than one field? I looked at sorting in
succession by the criteria in each SortField, but this doesn't
fulfill my requirement.


Please suggest.

Urvashi








Re: wildcards with SpanQuery

2006-04-18 Thread Erik Hatcher

There isn't a SpanWildcardQuery, per se, but there is a SpanRegexQuery.

It can be used to achieve the same sort of thing, only using standard
regex syntax like fast.* (instead of fast*)
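
To make the matching semantics concrete, here is a small plain-Java sketch of "regex terms within a slop". It does not use Lucene at all: NearRegexSketch and its matches method are hypothetical names, whitespace tokenization is an assumption, and a real span query works on indexed term positions rather than rescanning text.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of ordered regex-near matching; not a Lucene class. */
final class NearRegexSketch {

    /**
     * True if a token matching 'first' is followed by a token matching 'second'
     * with at most 'slop' other tokens in between.
     */
    static boolean matches(String text, String first, String second, int slop) {
        String[] tokens = text.toLowerCase().split("\\s+");
        Pattern p1 = Pattern.compile(first);
        Pattern p2 = Pattern.compile(second);
        for (int i = 0; i < tokens.length; i++) {
            if (!p1.matcher(tokens[i]).matches()) {
                continue;
            }
            // Furthest position still allowed by the slop.
            int last = Math.min(tokens.length - 1, i + 1 + slop);
            for (int j = i + 1; j <= last; j++) {
                if (p2.matcher(tokens[j]).matches()) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With the patterns fast.* and car.* and a slop of 1, this accepts "fast cars", "faster cars" and "fast brown cars", which are the cases asked about in the original question.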


But, stemming should be considered as well.  It'd certainly be more  
performant using a regular PhraseQuery with terms analyzed and  
stemmed, matching terms analyzed the same way during indexing.


Erik





Re: using custom sort method

2006-04-18 Thread Erik Hatcher
Could your computation be done at indexing time rather than at search  
time?  If so, pre-compute the value and index that into a single field.


Erik





Re: using custom sort method

2006-04-18 Thread Urvashi Gadi

No...the information is available only at search time




using boolean operators with the PhraseQuery

2006-04-18 Thread Vishal Bathija
Hi,
I am trying to find the number of hits for a phrase using the
PhraseQuery. I would like to know how I could search for 2 phrases at
the same time using the boolean operators OR, AND. The code snippet
that I use to search for one phrase is

String test = "avoids deadlock";
String[] phraseTerms = test.split(" ");
PhraseQuery query = new PhraseQuery();
searcher = new IndexSearcher(rd);
Term[] phrTerm = new Term[phraseTerms.length];
for (int u = 0; u < phraseTerms.length; u++)
{
    phrTerm[u] = new Term("contents", phraseTerms[u]);
    query.add(phrTerm[u]);
}

Hits hits = searcher.search(query);

How can I extend this to search for multiple phrases?

Regards
Vishal

Re: using boolean operators with the PhraseQuery

2006-04-18 Thread Erik Hatcher

Wrap the PhraseQuery's inside a BooleanQuery to achieve AND/OR.

Erik





Re: using boolean operators with the PhraseQuery

2006-04-18 Thread Vishal Bathija
I am not sure if I understand you.  Do I add the terms for the second
phrase immediately after I add the terms for the first phrase? When do
I wrap the PhraseQuery I construct into a BooleanQuery?

For instance
String newTerm1 = "avoids deadlock";
String newTerm2 = "reduces cost";
PhraseQuery query = new PhraseQuery();

String[] phraseTerms1 = newTerm1.split(" ");
Term[] phrTerm1 = new Term[phraseTerms1.length];

for(int u=0; u

The code above just adds the terms of phrase2 following the
terms for phrase1.
Can you give me an example building a BooleanQuery OR for the
newTerm1 and newTerm2?

Erik Hatcher wrote:
> Wrap the PhraseQuery's inside a BooleanQuery to achieve AND/OR.
>
>Erik
>
>
> On Apr 18, 2006, at 10:00 PM, Vishal Bathija wrote:
>
> > Hi,
> > I am trying to find the number of hits for a phrase using the
> > PhraseQuery. I would like to know how I could search for 2 phrases at
> > the same time using the boolean operators OR, AND. The code snippet
> > that I use to search for one phrase is
> >
> > String test = "avoids deadlock";
> > String[] phraseTerms = test.split(" ");
> > PhraseQuery query = new PhraseQuery();
> > searcher = new IndexSearcher(rd);
> > Term[] phrTerm = new Term[phraseTerms.length];
> > for (int u = 0; u < phraseTerms.length; u++)
> > {
> >     phrTerm[u] = new Term("contents", phraseTerms[u]);
> >     query.add(phrTerm[u]);
> > }
> >
> > Hits hits = searcher.search(query);
> >
> > How can I extend this to search for multiple phrases?
> >
> > Regards
> > Vishal


--
Vishal Bathija
Graduate Student
Department of Computer Science & Systems Analysis
Miami University
Oxford,Ohio
Phone: (513)-461-9239




Re: using custom sort method

2006-04-18 Thread Yang Sun
I asked the exact same question a few weeks ago. I just followed the
customized distance example and looped over the results again to get another
field and compute the scores. It will be painful if you need more than 3
fields. So far I haven't found any other way to do it. I hope we will see
some new classes that address the problem. Being able to customize ranking
is very important to a search engine.
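
The "loop over the results again and compute a combined score" approach described above can be sketched in plain Java. Everything here is hypothetical and independent of Lucene: the Result class, its two numeric fields a and b, and the weighted-average formula are assumptions for illustration. In practice the field values would be read from each hit's stored fields after the search.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical post-search sort on a weighted average of two fields. */
final class WeightedSortSketch {

    /** Stand-in for a search hit with two numeric field values. */
    static final class Result {
        final String id;
        final double a;
        final double b;

        Result(String id, double a, double b) {
            this.id = id;
            this.a = a;
            this.b = b;
        }
    }

    /** Weighted average of the two fields; the weights are only known at search time. */
    static double combined(Result r, double wa, double wb) {
        return (wa * r.a + wb * r.b) / (wa + wb);
    }

    /** Returns a new list sorted by combined value, highest first. */
    static List<Result> sortByCombined(List<Result> hits, double wa, double wb) {
        List<Result> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Result r) -> combined(r, wa, wb)).reversed());
        return sorted;
    }
}
```

Because the sort happens after the search, it needs every hit's field values in memory, which is why this gets painful for large result sets or many fields.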


Yang




Re: using boolean operators with the PhraseQuery

2006-04-18 Thread Chris Hostetter

: The code above just adds the terms of phrase2 following the
: terms  for phrase1.
: Can you give me an example building a BooleanQuery OR  for the
: newTerm1 and newTerm2.

At no point does your code use a BooleanQuery ... have you looked at the
javadocs for the BooleanQuery class?



-Hoss





Remote Parallel MultiSearcher

2006-04-18 Thread Sunil Kumar PK
Hi All,

As I understand it, the search procedure of the Lucene Remote Parallel Multi
Searcher is to first compute the weight for the query in each index
sequentially (one by one, e.g. calculate the "query weight" of index1 first
and then index2), then search each index one by one and merge
the results.

Is there any way to combine the weight calculation for index1 and its
search into a single RPC instead of doing both in separate steps?

Another question: in RemoteParallelMultiSearcher, the method
"docFreq (Term term)" is not parallelized. Is there a reason why it is
not parallelized?

I look forward to your feedback.

SuniL


Re: using custom sort method

2006-04-18 Thread Chris Hostetter

: I have asked the exact same question a few weeks ago. I just follow the
: customized distance example and loop the results again to get another
: field and compute the scores. It will be painful if you need more than 3
: fields. So far I didn't find any other way to do it. Hope we can see

It's not clear to me what exactly you've tried so far, and what it doesn't
do that you want it to ... can you post some code demonstrating your
problem?

: some new classes to address the problem. To be able to customize ranking
: is very important to a search engine.

Customizing ranking/scoring is not exactly the same thing as defining your
own sort -- there are lots of ways to customize the scoring of your
queries: doc boosts, field boosts, query boosts, a customized Similarity ...

You should also take a look at the FunctionQueries available in Solr; they
allow you to define custom functions that are factored into the total
score of your query...

http://incubator.apache.org/solr/docs/api/org/apache/solr/search/function/package-summary.html





Re: using boolean operators with the PhraseQuery

2006-04-18 Thread Vishal Bathija
I tried using the BooleanQuery to perform an OR as below:

BooleanQuery b1 = new BooleanQuery();
b1.add(query, BooleanClause.Occur.SHOULD);
b1.add(query2, BooleanClause.Occur.SHOULD);
Hits hits = searcher.search(b1);

System.out.println("Query= " + b1.toString());

gave me
Query= contents:"provides  distribution" contents:"supports  distribution"

This returns 0 hits.

I am not sure why it returns 0 when both phrases are present in the docs.


Vishal
--
Vishal Bathija
Graduate Student
Department of Computer Science & Systems Analysis
Miami University
Oxford,Ohio
Phone: (513)-461-9239



Any storage initatives for optimized indexing/searching

2006-04-18 Thread Prasenjit Mukherjee
It seems that the performance of any indexing/searching
algorithm depends very much on disk-access technology.
Just curious: does anybody know of any companies (mostly storage
companies) working on improving their storage/disk-access technology to make
indexing/searching more efficient by optimizing k-way merges/sorts?


prasen
