Lucene based projects...?
Hello group,

Does anyone know of other software projects (like Nutch) that are based on and built around Lucene? I think it would be interesting and helpful for newcomers to see and learn from examples.

Cheers,
Ralf

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Getting top n most frequent words ?
Hi,

Does Lucene have functionality to get the top n most frequent words, with their frequencies, from a given text, stream, or token stream?

Ralf
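For comparison, counting word frequencies does not strictly need Lucene. The following is a minimal sketch in plain Java (8 or later), not a Lucene API: the class name TopWords, the method topN, and the crude split on non-alphanumeric characters are all assumptions made for illustration. A real setup would presumably run the text through the same analyzer used for indexing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class TopWords {

    /** Returns up to n (word, count) entries, most frequent first; ties are broken alphabetically. */
    public static List<Map.Entry<String, Integer>> topN(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        // Crude tokenization (assumption): lowercase, split on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // Prints the two most frequent words and their counts.
        for (Map.Entry<String, Integer> e : topN("the cat and the dog and the bird", 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Sorting the whole map is fine for a single document; for very large vocabularies a bounded priority queue of size n would avoid the full sort.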
Re: Lucene based projects...?
On Jan 12, 2004, at 6:24 AM, [EMAIL PROTECTED] wrote:
> who knows other software projects (like Nutch) which are based and build around Lucene?? I think it can be quite interesting and helpful for new people to see and learn from examples...

This is the purpose of the "Powered by" section on Lucene's website. More contributions are welcome!

Erik
Re: merged search of document
Erik Hatcher wrote:
> On Jan 7, 2004, at 4:18 PM, Dror Matalon wrote:
> > Actually I would guess that performance should be fine. I would look at the code generated by the standard analyzer, http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/standard/package-summary.html which translates from (a AND b) OR c to +a +b c and then see what it does with it.
>
> Huh? StandardAnalyzer does not do that type of translation. QueryParser is what parses (a AND b) OR c into a nested BooleanQuery. Query.toString is what will visually turn it into the +/- view.

OK, I've looked inside QueryParser and it seems to be the right place to do that. But transforming one query into another is rather complicated: QueryParserTokenManager, as an extreme example, is hard to understand, and it would take a long time to work through all of it.

Here's my progress. First I collect all values of DocID in the Lucene index. This seems to work well, following a sample posted earlier on this mailing list. For every DocID I want to run the submitted query. Therefore I need to transform the queries according to the following rules:

"foo bar"  -->  +DocID:inserthere +"foo bar"
foo bar    -->  (+DocID:inserthere +foo) AND (+DocID:inserthere +bar)
foo -bar   -->  (+DocID:inserthere +foo) AND (+DocID:inserthere -bar)

I think this should do the trick, since (foo bar) should be understood as a query where both foo and bar must be in the document. So my relatively simple query syntax should need only a little adaptation to run with Lucene. But of course I don't know how to transform queries written in the style above. Is any help on this topic available?

Thank you.
Thomas
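The three rewriting rules above can be sketched at the string level, before the result is handed to QueryParser. This is only an illustration under assumptions: it reads the first rule as applying to a quoted phrase, it handles nothing beyond whitespace-separated terms, quoted phrases, and a leading + or -, and the names DocIdQueryRewriter, splitTerms, and rewrite are hypothetical. It does not use any Lucene class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical string-level rewriter; not part of Lucene.
final class DocIdQueryRewriter {

    /** Splits on whitespace but keeps "quoted phrases" together as one term. */
    public static List<String> splitTerms(String query) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : query.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
                current.append(c);
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (current.length() > 0) {
                    terms.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    /** Pairs every term with a required DocID clause and joins the pairs with AND. */
    public static String rewrite(String query, String docId) {
        List<String> clauses = new ArrayList<>();
        for (String term : splitTerms(query)) {
            // Keep an explicit leading '-' or '+'; otherwise mark the term as required.
            String prefixed = (term.startsWith("-") || term.startsWith("+")) ? term : "+" + term;
            clauses.add("+DocID:" + docId + " " + prefixed);
        }
        if (clauses.size() == 1) {
            return clauses.get(0);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < clauses.size(); i++) {
            if (i > 0) {
                out.append(" AND ");
            }
            out.append('(').append(clauses.get(i)).append(')');
        }
        return out.toString();
    }
}
```

With a DocID of 42, rewrite("foo -bar", "42") yields (+DocID:42 +foo) AND (+DocID:42 -bar). Working on the parsed Query objects instead of strings avoids re-implementing the query syntax, which is the trade-off to weigh against this approach.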
Re: merged search of document
On Jan 12, 2004, at 8:21 AM, Thomas Scheffler wrote:
> OK, I've looked inside QueryParser and it's seems to be the right place to do that. But it's rather complicated to transform a query to another, since QueryParserTokenManager as an extreme example is not quite understandable and needs a huge time for all that stuff to work in.

Keep in mind that QueryParser is reasonably overridable. Simply subclass it and override one of the get*Query methods. This may be the simplest way for you to inject additional clauses into a query.

Erik
Re: merged search of document
Erik Hatcher wrote:
> On Jan 12, 2004, at 8:21 AM, Thomas Scheffler wrote:
> > OK, I've looked inside QueryParser and it seems to be the right place to do that. But transforming one query into another is rather complicated: QueryParserTokenManager, as an extreme example, is hard to understand, and it would take a long time to work through all of it.
>
> Keep in mind that QueryParser is reasonably overridable. Simply subclass it and override one of the get*Query methods. This may be the simplest way for you to inject additional clauses into a query.

You're right as long as the source query is not atomic. For a combined query it already works quite well. For a single-term query, however, the last check in Query(String) of QueryParser returns firstQuery, so getBooleanQuery is never called. Since I can't override Query(String), I'm not sure how to handle searches like: foo.

Thomas
Re: merged search of document
Thomas Scheffler wrote:
> Erik Hatcher wrote:
> > On Jan 12, 2004, at 8:21 AM, Thomas Scheffler wrote:
> > > OK, I've looked inside QueryParser and it seems to be the right place to do that. But transforming one query into another is rather complicated: QueryParserTokenManager, as an extreme example, is hard to understand, and it would take a long time to work through all of it.
> >
> > Keep in mind that QueryParser is reasonably overridable. Simply subclass it and override one of the get*Query methods. This may be the simplest way for you to inject additional clauses into a query.
>
> You're right as long as the source query is not atomic. For a combined query it already works quite well. For a single-term query, however, the last check in Query(String) of QueryParser returns firstQuery, so getBooleanQuery is never called. Since I can't override Query(String), I'm not sure how to handle searches like: foo.

Of course you can override parse(String), so that is what I do:

    public Query parse(String query) throws ParseException {
        Query queryTemp = super.parse(query);
        // A single-term query comes back unchanged, which means
        // getBooleanQuery was never called for it.
        if (queryTemp.toString(field).equals(query)) {
            Vector v = new Vector();
            // Wrap the term as a required, non-prohibited clause.
            BooleanClause clause = new BooleanClause(queryTemp, true, false);
            v.add(clause);
            return getBooleanQuery(v);
        }
        return queryTemp;
    }

Now all relevant cases seem to work. Thanks for the help. You helped me take a big step forward in my work.
Query question
I have two fields, call them FieldA and FieldB. I have a set of words I'm looking for in FieldA, call them A1 and A2. I have a different set of words for FieldB, call them B1 and B2.

Now I want a hit list containing items that have at least one A item in FieldA and at least one B item in FieldB. In essence, I think I'm saying I want

    (A1 OR A2) AND (B1 OR B2)

Does the following do that?

    BooleanQuery QA = new BooleanQuery();
    Query qa1 = QueryParser.parse(A1, FieldA, analyzer());
    Query qa2 = QueryParser.parse(A2, FieldA, analyzer());
    QA.add(qa1, false, false);    // this term is not required
    QA.add(qa2, false, false);    // this term is not required

    BooleanQuery QB = new BooleanQuery();
    Query qb1 = QueryParser.parse(B1, FieldB, analyzer());
    Query qb2 = QueryParser.parse(B2, FieldB, analyzer());
    QB.add(qb1, false, false);    // this term is not required
    QB.add(qb2, false, false);    // this term is not required

    BooleanQuery Qfinal = new BooleanQuery();
    Qfinal.add(QA, true, false);  // gotta have at least one from here
    Qfinal.add(QB, true, false);  // gotta have at least one from here

    hits = mySearcher.search(Qfinal);

I'm assuming that if I add queries to a BooleanQuery and none of them is required, a Document still needs a hit on at least one of them to make it out of the BooleanQuery.

Is this the right way to do this? Is there an easier or faster way to do the same thing?

Scott
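The intended condition, (A1 OR A2) AND (B1 OR B2), can be written down as a plain predicate that is handy as a reference when checking a hit list by hand. This is a sketch in plain Java, not Lucene code, and it does not settle how BooleanQuery treats clauses that are all optional. The class TwoFieldMatch and its methods are hypothetical, and each field is modeled simply as a set of words.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical reference predicate; not part of Lucene.
final class TwoFieldMatch {

    /** True if the field contains at least one of the wanted words. */
    public static boolean containsAny(Set<String> fieldWords, Set<String> wanted) {
        return !Collections.disjoint(fieldWords, wanted);
    }

    /** (A1 OR A2) AND (B1 OR B2): each field must contribute at least one hit. */
    public static boolean matches(Set<String> fieldA, Set<String> fieldB,
                                  Set<String> wantedA, Set<String> wantedB) {
        return containsAny(fieldA, wantedA) && containsAny(fieldB, wantedB);
    }

    /** Convenience helper for building word sets. */
    public static Set<String> words(String... w) {
        return new HashSet<>(Arrays.asList(w));
    }
}
```

A document whose FieldA holds A1 but whose FieldB holds neither B1 nor B2 should be rejected by this predicate, so it should not appear in the hits if the query behaves as intended.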