Lucene based projects...?

2004-01-12 Thread ambiesense
Hello group,

does anyone know of other software projects (like Nutch) that are based on and
built around Lucene? I think it would be quite interesting and helpful for
newcomers to see and learn from examples.

Cheers,
Ralf




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Getting top n most frequent words ?

2004-01-12 Thread Ralph
Hi,

does Lucene have functionality to get the top n most frequent words from a
given text, stream, token stream, etc., along with their frequencies?

Ralf




Re: Lucene based projects...?

2004-01-12 Thread Erik Hatcher
On Jan 12, 2004, at 6:24 AM, [EMAIL PROTECTED] wrote:
> does anyone know of other software projects (like Nutch) that are based
> on and built around Lucene? I think it would be quite interesting and
> helpful for newcomers to see and learn from examples.

This is the purpose of the "Powered by" section on Lucene's website.

More contributions are welcome!

	Erik



Re: merged search of document

2004-01-12 Thread Thomas Scheffler

Erik Hatcher wrote:
> On Jan 7, 2004, at 4:18 PM, Dror Matalon wrote:
>> Actually I would guess that performance should be fine. I would look at
>> the code generated by the standard analyzer,
>> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/standard/package-summary.html
>> which translates from (a AND b) OR c to +a +b c and then see what
>> it does with it.
>
> Huh?  StandardAnalyzer does not do that type of translation.
> QueryParser is what parses (a AND b) OR c into a nested BooleanQuery.
>
> Query.toString is what will visually turn it into the +/- view.
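
To make the quoted point concrete, here is a minimal, self-contained Java
sketch of how a nested boolean query tree can be rendered in the +/- notation.
ToyQuery and all of its methods are hypothetical names made up for
illustration; they are not Lucene classes. Only the (query, required,
prohibited) argument order is borrowed from the BooleanQuery.add calls that
appear elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy query tree for illustration only; this is NOT Lucene's Query API. */
final class ToyQuery {
    private final String term;                                // non-null only for a leaf term
    private final List<String> prefixes = new ArrayList<>();  // "+", "-" or "" per clause
    private final List<ToyQuery> clauses = new ArrayList<>();

    private ToyQuery(String term) { this.term = term; }

    static ToyQuery term(String text) { return new ToyQuery(text); }

    static ToyQuery bool() { return new ToyQuery(null); }

    /** Adds a clause; a prohibited clause is rendered "-", a required one "+". */
    ToyQuery add(ToyQuery clause, boolean required, boolean prohibited) {
        prefixes.add(prohibited ? "-" : required ? "+" : "");
        clauses.add(clause);
        return this;
    }

    /** Renders the tree in +/- notation, parenthesizing nested boolean groups. */
    String render() {
        if (term != null) {
            return term;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < clauses.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            ToyQuery clause = clauses.get(i);
            sb.append(prefixes.get(i));
            sb.append(clause.term == null ? "(" + clause.render() + ")" : clause.render());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // (a AND b) OR c: an inner group with two required terms, OR-ed with c.
        ToyQuery aAndB = bool().add(term("a"), true, false).add(term("b"), true, false);
        ToyQuery query = bool().add(aAndB, false, false).add(term("c"), false, false);
        System.out.println(query.render()); // prints "(+a +b) c"
    }
}
```

In this model the parentheses survive in the rendered form, so the nested
structure is still visible in the +/- view.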

OK, I've looked inside QueryParser and it seems to be the right place to
do that. But transforming one query into another is rather complicated:
QueryParserTokenManager, as an extreme example, is hard to understand,
and it would take a long time to work through all of that code.

Here's my progress. First I grab all values for DocID in the Lucene
index. This seems to work well, following a sample posted earlier on this
mailing list.
For every DocID I want to search for the submitted query. Therefore I
need to transform the queries according to the following rules:

"foo bar" --> +DocID:inserthere +"foo bar"

foo bar --> (+DocID:inserthere +foo) AND (+DocID:inserthere +bar)

foo -bar --> (+DocID:inserthere +foo) AND (+DocID:inserthere -bar)

I think this should do the trick, since (foo bar) should be understood as
a query in which both foo and bar must be in the document. So the relatively
simple syntax of my queries needs only a little adaptation to run with Lucene,
I think. But of course I don't know how to transform queries of this style.
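
As a rough illustration of the term-by-term rules above, the rewrite can be
sketched at the string level in plain Java, without touching QueryParser at
all. The class name DocIdQueryRewriter and the rewrite method are hypothetical.
The sketch assumes the user query is a whitespace-separated list of bare terms,
optionally prefixed with + or -, and it does not handle quoted phrases,
field prefixes, or nested parentheses.

```java
import java.util.ArrayList;
import java.util.List;

/** String-level sketch of the DocID rewrite rule; hypothetical helper, not part of Lucene. */
final class DocIdQueryRewriter {

    /**
     * Turns each term of a simple query into its own group restricted to one
     * DocID and joins the groups with AND, e.g.
     * "foo -bar" becomes "(+DocID:42 +foo) AND (+DocID:42 -bar)".
     */
    static String rewrite(String userQuery, String docId) {
        List<String> groups = new ArrayList<>();
        for (String token : userQuery.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for an empty input string
            }
            boolean prohibited = token.startsWith("-");
            boolean hasPrefix = prohibited || token.startsWith("+");
            String bare = hasPrefix ? token.substring(1) : token;
            if (bare.isEmpty()) {
                continue; // a lone "+" or "-" carries no term
            }
            // Every term that is not prohibited is treated as required.
            String sign = prohibited ? "-" : "+";
            groups.add("(+DocID:" + docId + " " + sign + bare + ")");
        }
        return String.join(" AND ", groups);
    }

    public static void main(String[] args) {
        System.out.println(rewrite("foo bar", "42"));
        System.out.println(rewrite("foo -bar", "42"));
    }
}
```

The resulting string would still have to be parsed as a query afterwards, so
this approach only suits inputs as simple as the ones shown in the rules.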

Any help on this topic available?

Thank you.

Thomas




Re: merged search of document

2004-01-12 Thread Erik Hatcher
On Jan 12, 2004, at 8:21 AM, Thomas Scheffler wrote:
> OK, I've looked inside QueryParser and it seems to be the right place to
> do that. But it's rather complicated to transform a query to another,
> since QueryParserTokenManager as an extreme example is not quite
> understandable and needs a huge time for all that stuff to work in.

Keep in mind that QueryParser is reasonably overridable.  Simply
subclass it and override one of the get*Query methods.  This may be the
simplest way for you to inject additional clauses into a query.

	Erik



Re: merged search of document

2004-01-12 Thread Thomas Scheffler

Erik Hatcher wrote:
> On Jan 12, 2004, at 8:21 AM, Thomas Scheffler wrote:
>> OK, I've looked inside QueryParser and it seems to be the right
>> place to do that. But it's rather complicated to transform a query to
>> another, since QueryParserTokenManager as an extreme example is not
>> quite understandable and needs a huge time for all that stuff to work in.
>
> Keep in mind that QueryParser is reasonably overridable.  Simply
> subclass it and override one of the get*Query methods.  This may be the
> simplest way for you to inject additional clauses into a query.

You're right if the source query is a non-atomic one; a combined query
already works quite well. For a single-term query, however, the last
check in Query(String) of QueryParser returns firstQuery directly, so
getBooleanQuery is never called. As I'm not able to override
Query(String), I'm not quite sure how to handle searches like: foo.

Thomas




Re: merged search of document

2004-01-12 Thread Thomas Scheffler

Thomas Scheffler wrote:
> Erik Hatcher wrote:
>> On Jan 12, 2004, at 8:21 AM, Thomas Scheffler wrote:
>>> OK, I've looked inside QueryParser and it seems to be the right
>>> place to do that. But it's rather complicated to transform a query to
>>> another, since QueryParserTokenManager as an extreme example is not
>>> quite understandable and needs a huge time for all that stuff to work in.
>>
>> Keep in mind that QueryParser is reasonably overridable.  Simply
>> subclass it and override one of the get*Query methods.  This may be the
>> simplest way for you to inject additional clauses into a query.
>
> You're right if the source query is a non-atomic one; a combined query
> already works quite well. For a single-term query, however, the last
> check in Query(String) of QueryParser returns firstQuery directly, so
> getBooleanQuery is never called. As I'm not able to override
> Query(String), I'm not quite sure how to handle searches like: foo.


Of course you can override parse(String), so that is what I do:

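public Query parse(String query) throws ParseException {
    Query queryTemp = super.parse(query);
    // If the parsed query renders back to the original string (e.g. a
    // single term such as foo), the parser returned it directly and
    // never called getBooleanQuery.
    if (queryTemp.toString(field).equals(query)) {
        // Wrap it as one required, non-prohibited clause so that
        // getBooleanQuery is called for it too.
        Vector v = new Vector();
        BooleanClause clause = new BooleanClause(queryTemp, true, false);
        v.add(clause);
        return getBooleanQuery(v);
    }
    return queryTemp;
}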

Now all relevant cases seem to work. Thanks for the help. You helped me
take a big step forward in my work.






Query question

2004-01-12 Thread Scott Smith
I have two fields, call them FieldA and FieldB.  I have a set of words I'm
looking for in FieldA, call them A1 and A2.  I have a different set of words
for FieldB, call them B1 and B2.  Now I want a hit list which contains items
that have at least one A item in FieldA and one B item in FieldB.  In
essence, I think I'm saying I want (A1 OR A2) AND (B1 OR B2)

Does the following do that:

BooleanQuery QA = new BooleanQuery();
Query qa1 = QueryParser.parse(A1, FieldA, analyzer());
Query qa2 = QueryParser.parse(A2, FieldA, analyzer());
QA.add(qa1, false, false);  // this term is not required
QA.add(qa2, false, false);  // this term is not required

BooleanQuery QB = new BooleanQuery();
Query qb1 = QueryParser.parse(B1, FieldB, analyzer());
Query qb2 = QueryParser.parse(B2, FieldB, analyzer());
QB.add(qb1, false, false);  // this term is not required
QB.add(qb2, false, false);  // this term is not required

BooleanQuery Qfinal = new BooleanQuery();
Qfinal.add(QA, true, false);// gotta have at least one from here
Qfinal.add(QB, true, false);// gotta have at least one from here

hits = mySearcher.search(Qfinal);

I guess I'm assuming that if I add queries to a BooleanQuery and none of
the clauses is required, there still needs to be a hit on at least one of
them for the Document to make it out of the BooleanQuery.

Is this the right way to do this?  Is there an easier/faster way to do the
same thing?
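
The assumption about optional clauses can be checked against a small model.
The following is a self-contained Java sketch, not Lucene code:
ToyBooleanQuery, term, docOf and example are hypothetical names, and a
document is reduced to a set of "field:word" strings. The model assumes the
matching rule described above: prohibited clauses must not match, required
clauses must all match, and a query with no required clauses needs at least
one optional clause to match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Toy model of boolean clause matching; this is NOT Lucene's BooleanQuery. */
final class ToyBooleanQuery implements Predicate<Set<String>> {
    private final List<Predicate<Set<String>>> required = new ArrayList<>();
    private final List<Predicate<Set<String>>> optional = new ArrayList<>();
    private final List<Predicate<Set<String>>> prohibited = new ArrayList<>();

    /** Same (clause, required, prohibited) argument order as in the post. */
    ToyBooleanQuery add(Predicate<Set<String>> clause, boolean isRequired, boolean isProhibited) {
        if (isProhibited) {
            prohibited.add(clause);
        } else if (isRequired) {
            required.add(clause);
        } else {
            optional.add(clause);
        }
        return this;
    }

    /** A leaf clause that matches when the document contains field:word. */
    static Predicate<Set<String>> term(String field, String word) {
        return d -> d.contains(field + ":" + word);
    }

    /** A document modelled as a set of "field:word" strings. */
    static Set<String> docOf(String... fieldTerms) {
        return new HashSet<>(Arrays.asList(fieldTerms));
    }

    @Override
    public boolean test(Set<String> doc) {
        for (Predicate<Set<String>> p : prohibited) {
            if (p.test(doc)) return false;
        }
        for (Predicate<Set<String>> p : required) {
            if (!p.test(doc)) return false;
        }
        if (!required.isEmpty()) return true;
        // No required clauses: at least one optional clause has to match.
        for (Predicate<Set<String>> p : optional) {
            if (p.test(doc)) return true;
        }
        return false;
    }

    /** Builds (A1 OR A2) AND (B1 OR B2) the same way as the code in the post. */
    static ToyBooleanQuery example() {
        ToyBooleanQuery qa = new ToyBooleanQuery()
                .add(term("FieldA", "A1"), false, false)
                .add(term("FieldA", "A2"), false, false);
        ToyBooleanQuery qb = new ToyBooleanQuery()
                .add(term("FieldB", "B1"), false, false)
                .add(term("FieldB", "B2"), false, false);
        return new ToyBooleanQuery().add(qa, true, false).add(qb, true, false);
    }

    public static void main(String[] args) {
        ToyBooleanQuery q = example();
        System.out.println(q.test(docOf("FieldA:A1", "FieldB:B2"))); // prints "true"
        System.out.println(q.test(docOf("FieldA:A1", "FieldA:A2"))); // prints "false"
    }
}
```

Under this model the nested construction behaves like (A1 OR A2) AND (B1 OR B2):
a document with hits only in FieldA is rejected because the required FieldB
group has no matching optional clause.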

Scott
