Term weighting and Term boost
Hello all,

I am new to Lucene and have a few questions about the philosophy behind term boosts.

Is the term boost equal to a term weight? For example, if I boost a term with 0.2, does that mean the term then has a weight of 0.2? If not, how is the term weight of the query calculated? Is there a formula? Are there parts of it which I cannot influence? Does this formula depend on the type of Query, or is it independent of it?

Maybe somebody can provide a small code example. Given the following code:

TermQuery termQuery1 = new TermQuery(new Term("contents", "house"));
TermQuery termQuery2 = new TermQuery(new Term("contents", "tree"));
termQuery2.setBoost( ? );
BooleanQuery finalQuery = new BooleanQuery();
finalQuery.add(termQuery1, true, false);
finalQuery.add(termQuery2, true, false);

How can I make the term tree twice as important for the search as house?

Many questions, I know, but I am sure that the experts here can answer them easily.

Cheers,
Karl

To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
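A minimal sketch of the idea behind the question, not Lucene's actual scoring code: a boost acts as a multiplier on a term's contribution to the score, and a query that is never boosted has a boost of 1.0. Under that assumption, calling termQuery2.setBoost(2.0f) and leaving termQuery1 alone would make tree count twice as much as house relative to each other. The class BoostSketch, its score method, and the raw weights below are hypothetical and only illustrate the multiplication; real scores also include normalization factors that are left out here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a boost as a multiplier. This is NOT Lucene's real scoring code. */
class BoostSketch {

    /**
     * Sums boost * rawWeight over all terms of one document.
     * rawWeights: hypothetical unboosted weight of each query term in the document.
     * boosts: per-term boost; a term without an entry uses the default of 1.0.
     */
    static double score(Map<String, Double> rawWeights, Map<String, Double> boosts) {
        double total = 0.0;
        for (Map.Entry<String, Double> entry : rawWeights.entrySet()) {
            double boost = boosts.getOrDefault(entry.getKey(), 1.0);
            total += boost * entry.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical document in which both terms have the same raw weight.
        Map<String, Double> rawWeights = new LinkedHashMap<>();
        rawWeights.put("house", 0.5);
        rawWeights.put("tree", 0.5);

        // Only "tree" is boosted; "house" keeps the default boost of 1.0.
        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("tree", 2.0);

        System.out.println(score(rawWeights, boosts));
    }
}
```

With equal raw weights, tree contributes 1.0 and house 0.5 to the total, which is the "twice as important" relationship asked about. A boost of 0.2 would, in the same model, scale a term's contribution down to a fifth rather than set its weight to 0.2.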
Re: BooleanQuery question
Karl Koch said:
> Hi all,
> why does the boolean query have a required and a prohibited field (boolean value)? If something is required it cannot be forbidden, and vice versa. How does this match the Boolean model we know from theory?

What if required and prohibited are both off? That's something we need.

> Are there differences between Lucene and the Boolean model in theory?

To store three states you need at least 2 bits. So much for the theory.

Kind regards,
Thomas
Re: Term weighting and Term boost
Hello, and thank you for this link. I think this is a very useful tool for analysing Lucene internals.

> I realize this is not exactly the answer, but you may want to try one of the new features of Luke (http://www.getopt.org/luke), namely the query result explanation.

When I start it according to the description on your web site and select the index directory, I get the error message "current thread not owner". What does it mean, and what am I doing wrong?

Kind regards,
Karl

> Currently the best way to start Luke is to use Java WebStart. Then open an already existing index, go to the Search tab, enter a query (use the Update button to see exactly what it is parsed into), press Search, and then highlight one of the results and press Explain. It was revealing for me to see how weights, boosts, normalizations etc. are applied under the hood, so to speak, especially for Fuzzy or Phrase queries. After experimenting a little, you may want to consult the classes in org.apache.lucene.search (e.g. Scorer and Similarity) to see the gory details.
> --
> Best regards, Andrzej Bialecki
> Software Architect, System Integration Specialist
> CEN/ISSS EC Workshop, ECIMF project chair
> EU FP6 E-Commerce Expert/Evaluator
> FreeBSD developer (http://www.freebsd.org)
Indexing of deep structured XML
Hello all,

it is obviously possible to index the following XML structure in Lucene:

<address>
  <name/>
  <street/>
  <postcode/>
  <niceplace/>
</address>

by mapping all the XML tags (name, street, postcode and niceplace) directly to fields of the document (address). However, is it also possible to map this one?

<address>
  <name/>
  <street/>
  <area>
    <niceplace/>
  </area>
</address>

Here we have a hierarchy (niceplace inside area) which I want to preserve. Suppose that niceplace inside an area means something different from niceplace in the first XML structure (it is more closely specified). I want to keep that distinction. Is there a way to index this with Lucene's own means? If not, has anybody attempted this, or does somebody have ideas on how it could be solved?

Cheers,
Karl
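One possible way to keep the distinction described above, sketched under assumptions that are not from the thread: use the full element path as the field name, so that niceplace directly under address and niceplace under area end up in different fields. The class XmlPathFields, the method flatten, the dot-separated naming, and the sample values are all hypothetical. The sketch only builds the name-to-text map with the JDK's DOM parser; adding each entry to a Lucene document as a field is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Flattens an XML tree into path-qualified names (hypothetical naming scheme). */
class XmlPathFields {

    /** Returns "path.to.element" -> text for every element without child elements. */
    static Map<String, String> flatten(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Map<String, String> fields = new LinkedHashMap<>();
            collect(root, root.getTagName(), fields);
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Walks the tree depth-first; only leaf elements produce an entry. */
    private static void collect(Element element, String path, Map<String, String> fields) {
        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                Element childElement = (Element) child;
                collect(childElement, path + "." + childElement.getTagName(), fields);
            }
        }
        if (!hasChildElement) {
            // A repeated path overwrites the earlier value in this simple sketch.
            fields.put(path, element.getTextContent().trim());
        }
    }

    public static void main(String[] args) {
        String xml = "<address><name>Karl</name><street>Main St</street>"
                + "<area><niceplace>Hilltop</niceplace></area></address>";
        System.out.println(flatten(xml));
    }
}
```

With this naming, the first structure would yield a key address.niceplace and the second a key address.area.niceplace, so a query can target one without matching the other. It works for a known, shallow hierarchy; it does not give general structural queries over arbitrary XML.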
Re: Term weighting and Term boost
Karl Koch wrote:
> Hello and thank you for this link. I think this is a very useful tool to analyse Lucene internals.
>
>> I realize this is not exactly the answer, but you may want to try one of the new features of Luke (http://www.getopt.org/luke), namely the query result explanation.
>
> When I start it according to the description on your web site and select the index directory I get an error message "current thread not owner"...

I.e. via Java WebStart, or by getting the jars and starting it from the command line?

> What does it mean and what do I wrong?

Beats me... I've never seen anything like that. Could you please turn on the Java console and see what kind of exception is thrown, and where?

--
Best regards, Andrzej Bialecki
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
FreeBSD developer (http://www.freebsd.org)
Re: Indexing of deep structured XML
Hi Karl, ol' fellow,

try the Apache Commons Digester. There is a nice explanation of how it works, written by Thomas Habing.

Regards,
Thomas
RE: Indexing of deep structured XML
To really preserve the relationships in arbitrarily structured XML, you pretty much need to use a database that directly supports an XML query language like XQuery or XPath.

Mick

-----Original Message-----
From: [EMAIL PROTECTED]
Sent: Friday, January 16, 2004 8:19 AM
To: [EMAIL PROTECTED]
Subject: Indexing of deep structured XML

> Hello all, it is obviously possible to index the following XML structure in Lucene:
>
> <address> <name/> <street/> <postcode/> <niceplace/> </address>
>
> by mapping all the xml tags (name, street, postcode and city) to the document's (address) fields directly. However is it also possible to map these?
>
> <address> <name/> <street/> <area> <niceplace/> </area> </address>
>
> Here we have a hierarchy in area (niceplace) which I want to preserve. Suppose that the meaning of niceplace in an area is different from the niceplace in the first xml structure (closer specified). I want to preserve this. Is there a way to index with Lucene means? If not, are there any attempts of people doing this or does somebody have ideas how this could be solved?
>
> Cheers, Karl
Re: BooleanQuery question
No, a clause doesn't need to be either required or prohibited, but it can't be both. Here is a rundown:

* A required clause will allow a document to be selected if and only if it contains that clause, and will exclude any documents that don't.
* A prohibited clause will exclude any documents that contain that clause.
* A clause that is neither prohibited nor required will select a document if it contains the clause, but the clause will not prevent non-matching documents from being selected by other clauses.

Hopefully that helps,
Scott

On Jan 16, 2004, at 7:32 AM, Thomas Scheffler wrote:
> Karl Koch said:
>> Hi all, why does the boolean query have a required and a prohibited field (boolean value)? If something is required it cannot be forbidden and otherwise? How does this match with the Boolean model we know from theory?
>
> What if required and prohibited are both off? That's something we need.
>
>> Are there differences between Lucene and the Boolean model in theory?
>
> To save three conditions you have to take at least 2 bits. That's for the theory.
>
> Kind regards, Thomas
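The three cases in the rundown above can be sketched as a small matching function. This is a toy model over sets of terms, not Lucene's BooleanQuery implementation: the class BooleanClauseSketch, its Clause type, and the matches method are hypothetical names, and two behaviours are assumptions of the sketch: rejecting a clause that is both required and prohibited with an exception, and letting a query made up only of prohibited clauses match nothing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of required / prohibited / optional clauses over term sets. */
class BooleanClauseSketch {

    static final class Clause {
        final String term;
        final boolean required;
        final boolean prohibited;

        Clause(String term, boolean required, boolean prohibited) {
            // Assumption of this sketch: the contradictory combination is rejected.
            if (required && prohibited) {
                throw new IllegalArgumentException("a clause cannot be both required and prohibited");
            }
            this.term = term;
            this.required = required;
            this.prohibited = prohibited;
        }
    }

    /** Returns true if a document containing docTerms is selected by the clauses. */
    static boolean matches(Set<String> docTerms, List<Clause> clauses) {
        boolean sawRequired = false;
        boolean matchedOptional = false;
        for (Clause clause : clauses) {
            boolean present = docTerms.contains(clause.term);
            if (clause.prohibited && present) {
                return false; // a prohibited term excludes the document
            }
            if (clause.required) {
                sawRequired = true;
                if (!present) {
                    return false; // a missing required term excludes the document
                }
            }
            if (!clause.required && !clause.prohibited && present) {
                matchedOptional = true; // an optional term can select the document
            }
        }
        // All required clauses matched, or (with none required) an optional one did.
        return sawRequired || matchedOptional;
    }

    public static void main(String[] args) {
        List<Clause> query = Arrays.asList(
                new Clause("house", true, false),    // required
                new Clause("tree", false, false),    // optional
                new Clause("garage", false, true));  // prohibited
        Set<String> doc = new HashSet<>(Arrays.asList("house", "garden"));
        System.out.println(matches(doc, query));
    }
}
```

The two booleans give four combinations, of which only three are meaningful; that is the "three states in two bits" point made earlier in the thread.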
Re: Term weighting and Term boost
Hello Andrzej,

sorry. I mistakenly ran it under Java 1.2.2, which cannot work :-) That is when you get the thread exceptions... Anyway, solved now.

Thank you,
Karl
mergeFactor and maxMergeDocs
What effect do these two settings have, and what values are recommended for Lucene 1.3?

Herb
Lucene version in PERL - status update?
How long till there is a server version in PERL?
Ordening documents
Hi folks,

I have some documents:

doc 1 == name=Palm Zire
doc 2 == name=Palm Zilion Zire
doc 3 == name=Palm Test

I will insert these docs into my index in the order doc 1, doc 2, doc 3.

If I execute the query name:Palm, in which order will the documents come back? And if I execute the query name:(Palm Zire)?

I thought that the documents would ALWAYS come back in the order in which I added them to the index. How can I know the order of the results?

Thanks,
William.
RE: Ordening documents
Hi folks,

So for the order of the results, is the ONLY thing that really matters the order in which the information is stored in the index?

Thanks,
William.

From: William W [EMAIL PROTECTED]
Reply-To: Lucene Users List [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Ordening documents
Date: Fri, 16 Jan 2004 19:14:06 +

> Hi folks, I have some documents
> doc 1 == name=Palm Zire
> doc 2 == name=Palm Zilion Zire
> doc 3 == name=Palm Test
> I will insert these docs in my index following the order doc 1, doc 2, doc 3. If I execute the query name:Palm, which order will the documents come in? And if I execute the query name:(Palm Zire)? I thought that the documents would ALWAYS be in the order that I included in the index. How will I know the order of the result?
> Thanks, William.
RE: Ordening documents
William,

The order of the results is going to be based on how well they match the query (i.e. weighted by relevancy). So although all of those values contain the term Palm, I would assume you would get the shorter entries (i.e. 1 and 3) before the longer one (2), as they have a higher percentage of "palmness". The same goes for the second query (it is a better match for 1 than for 2).

If you want the documents to come back in their document order, you would need to sort the results yourself.

-Mike

At 02:33 PM 1/16/2004, you wrote:
> Hi Folks, To the order of the result, what really matters is ONLY the order in which the information is stored in the index?
> Thanks, William.
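The two orderings discussed in this thread can be sketched as follows. This is a self-contained illustration, not code against the Lucene API: the class HitOrderSketch, the Hit type, the document numbers 1 to 3, and the scores are all hypothetical, chosen only so that the two shorter names score higher than the longer one. Breaking ties between equal scores by document number is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sorts hypothetical search hits by relevance or by insertion order. */
class HitOrderSketch {

    static final class Hit {
        final int docId;     // position in which the document was added
        final double score;  // hypothetical relevance score

        Hit(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Highest score first; equal scores fall back to ascending docId (an assumption). */
    static List<Integer> byScore(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score)
                .reversed()
                .thenComparingInt(h -> h.docId));
        return ids(sorted);
    }

    /** Ascending docId, i.e. the order in which the documents were indexed. */
    static List<Integer> byInsertionOrder(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.docId));
        return ids(sorted);
    }

    private static List<Integer> ids(List<Hit> hits) {
        List<Integer> result = new ArrayList<>();
        for (Hit hit : hits) {
            result.add(hit.docId);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up scores for the query name:Palm over the three example documents.
        List<Hit> hits = Arrays.asList(
                new Hit(2, 0.5),   // "Palm Zilion Zire"
                new Hit(3, 0.6),   // "Palm Test"
                new Hit(1, 0.6));  // "Palm Zire"
        System.out.println(byScore(hits));
        System.out.println(byInsertionOrder(hits));
    }
}
```

Sorting a copy leaves the original hit list untouched, so both orderings can be produced from the same search result.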
RE: Ordening documents
Results are returned in order of score (highest first), not in the order in which they were inserted into the index. You may find the FAQ useful, http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi; in particular, take a look at the 'searching' section.

hth,
-John

-----Original Message-----
From: William W [mailto:[EMAIL PROTECTED]
Sent: Friday, January 16, 2004 2:34 PM
To: [EMAIL PROTECTED]
Subject: RE: Ordening documents

> Hi Folks, To the order of the result, what really matters is ONLY the order in which the information is stored in the index?
> Thanks, William.
Re: Ordening documents
What is the returned order for documents with identical scores?

Peter

----- Original Message -----
From: Chun, John [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Friday, January 16, 2004 3:44 PM
Subject: RE: Ordening documents

> Results are returned by order of score (highest first), not by the order they are inserted in the index. You may find the faq useful, http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi in particular take a look at the 'searching section'.
> hth, -John