Term weighting and Term boost

2004-01-16 Thread Karl Koch
Hello all,

I am new to the Lucene scene and have a few questions regarding the term
boost philosophy:

Is the term boost equal to a term weight? Example: if I boost a term with
0.2, does this mean the term then has a weight of 0.2?

If this is not the case, how is the term weight of the query calculated?
Is there a formula? Are there parts of it which I cannot influence? Does this
formula depend on the type of Query, or is it independent of it? Maybe somebody
can provide a small code example?

Given the following code:

TermQuery termQuery1 = new TermQuery(new Term("contents", "house"));
TermQuery termQuery2 = new TermQuery(new Term("contents", "tree"));
termQuery2.setBoost( ? );
BooleanQuery finalQuery = new BooleanQuery();
finalQuery.add(termQuery1, true, false);  // required, not prohibited
finalQuery.add(termQuery2, true, false);  // required, not prohibited

How can I make the term tree twice as important for the search
as house?

Many questions, I know, but I am sure that the experts here can answer them
easily.

Cheers,
Karl
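
A boost is not the term weight itself. In Lucene's classic scoring it is one multiplicative factor in a term's contribution, next to term frequency, inverse document frequency and normalization. The following is a simplified, self-contained model of that idea and not Lucene's own code: the class `BoostSketch`, its methods, and the tf/idf numbers are assumptions chosen for illustration.

```java
/** Toy model: a term's score contribution is tf * idf * boost. Not Lucene's actual code. */
final class BoostSketch {

    /** Contribution of one query term to a document's score in this simplified model. */
    static double termScore(double tf, double idf, double boost) {
        return tf * idf * boost;
    }

    /** Sum of the per-term contributions; each row holds {tf, idf, boost}. */
    static double documentScore(double[][] terms) {
        double sum = 0.0;
        for (double[] t : terms) {
            sum += termScore(t[0], t[1], t[2]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical statistics: both terms occur once and are equally rare.
        double tf = 1.0;
        double idf = 1.5;
        double house = termScore(tf, idf, 1.0); // default boost
        double tree = termScore(tf, idf, 2.0);  // boosted, as with setBoost(2.0f)
        System.out.println("house=" + house + " tree=" + tree);
    }
}
```

In this model a boost of 2.0 on tree doubles that term's contribution relative to an unboosted house with the same statistics. As soon as tf or idf differ between the two terms, the ratio of the contributions is no longer exactly 2, which is also why a boost of 0.2 does not mean a weight of 0.2.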


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: BooleanQuery question

2004-01-16 Thread Thomas Scheffler

Karl Koch said:
> Hi all,
>
> why does the boolean query have a required and a prohibited field (boolean
> value)? If something is required it cannot be forbidden, and vice versa? How
> does this match the Boolean model we know from theory?

What if required and prohibited are both off? That's something we need.

> Are there differences between Lucene and the Boolean model in theory?

To store three states you need at least 2 bits. So much for the
theory.


Kind regards

Thomas




Re: Term weighting and Term boost

2004-01-16 Thread Karl Koch

Hello, and thank you for this link. I think this is a very useful tool for
analysing Lucene internals.

> I realize this is not exactly the answer, but you may want to try one of
> the new features of Luke (http://www.getopt.org/luke), namely the query
> result explanation.

When I start it according to the description on your web site and select the
index directory, I get the error message "current thread not owner"...

What does it mean, and what am I doing wrong?

Kind Regards,
Karl


> Currently the best way to start Luke is to use Java WebStart. Then open
> an already existing index, go to the Search tab, enter a query (use the
> Update button to see exactly what it is parsed into), press Search,
> and then highlight one of the results and press Explain.
>
> It was revealing for me to see how weights, boosts, normalizations etc.
> are applied under the hood, so to speak, especially for Fuzzy or
> Phrase queries.
>
> After experimenting a little, you may want to consult the classes in
> org.apache.lucene.search (e.g. Scorer and Similarity) to see the gory
> details.
>
> --
> Best regards,
> Andrzej Bialecki
 



Indexing of deep structured XML

2004-01-16 Thread TheRanger
Hello all,

it is obviously possible to index the following XML structure in Lucene:

<address>
  <name/>
  <street/>
  <postcode/>
  <niceplace/>
</address>

by mapping all the XML tags (name, street, postcode and city) directly to the
fields of the document (address). However, is it also possible to map this one?

<address>
  <name/>
  <street/>
  <area>
    <niceplace/>
  </area>
</address>

Here we have a hierarchy in area (niceplace) which I want to preserve.
Suppose that the meaning of niceplace inside an area is different from that of
niceplace in the first XML structure (more closely specified).

Is there a way to index this with Lucene's own means? If not, has anybody
attempted this, or does somebody have ideas on how this could be solved?

Cheers,
Karl
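
One possible way to keep such a hierarchy, sketched here under assumptions and not taken from the thread, is to use the full element path as the field name, so that `address/niceplace` and `address/area/niceplace` stay distinct. The class `XmlPathFields`, the `/` separator and the sample values are hypothetical. The sketch uses the JDK's DOM parser rather than the Commons Digester suggested later in this thread, and it only produces name/value pairs; adding them to a Lucene document is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Flattens an XML tree into path-named fields, e.g. "address/area/niceplace". */
final class XmlPathFields {

    /** Parses the XML and returns the text of each leaf element keyed by its element path. */
    static Map<String, String> flatten(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Map<String, String> fields = new LinkedHashMap<>();
            collect(root, root.getTagName(), fields);
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Recurses into child elements; an element without child elements is treated as a leaf. */
    private static void collect(Element element, String path, Map<String, String> fields) {
        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                hasChildElement = true;
                Element e = (Element) child;
                collect(e, path + "/" + e.getTagName(), fields);
            }
        }
        if (!hasChildElement) {
            // A repeated path overwrites the earlier value in this sketch.
            fields.put(path, element.getTextContent().trim());
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample values.
        String xml = "<address><name>Karl</name><street>Main St</street>"
                + "<area><niceplace>Riverside</niceplace></area></address>";
        // Each entry could become one field of the indexed document.
        flatten(xml).forEach((path, text) -> System.out.println(path + " = " + text));
    }
}
```

The path-as-field-name approach preserves where a value sat in the tree, but it does not support structural queries across arbitrary nesting.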




Re: Term weighting and Term boost

2004-01-16 Thread Andrzej Bialecki
Karl Koch wrote:

> Hello, and thank you for this link. I think this is a very useful tool for
> analysing Lucene internals.
>
> > I realize this is not exactly the answer, but you may want to try one of
> > the new features of Luke (http://www.getopt.org/luke), namely the query
> > result explanation.
>
> When I start it according to the description on your web site and select the
> index directory, I get the error message "current thread not owner"...

I.e. via Java WebStart, or by getting the jars and starting it from the
command line?

> What does it mean, and what am I doing wrong?

Beats me... I've never seen anything like that. Could you please turn
on the Java console and see what kind of exception is thrown, and where?

--
Best regards,
Andrzej Bialecki
-
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-
FreeBSD developer (http://www.freebsd.org)


Re: Indexing of deep structured XML

2004-01-16 Thread Thomas Krämer
Hi Karl, ol' fellow,

try the Apache Commons Digester.
There is a nice explanation of how it works, written by Thomas Habing.

Regards,
Thomas



RE: Indexing of deep structured XML

2004-01-16 Thread Goulish, Michael

To really preserve the relationships in arbitrarily 
structured XML, you pretty much need to use a database 
that directly supports an XML query language like 
XQuery or XPath.

 Mick .






Re: BooleanQuery question

2004-01-16 Thread Scott ganyo
No, you don't need required or prohibited, but you can't have both.  
Here is a rundown:

* A required clause will allow a document to be selected if and only if 
it contains that clause and will exclude any documents that don't.

* A prohibited clause will exclude any documents that contain that 
clause.

* A clause that is neither prohibited nor required will select a 
document if it contains the clause, but the clause will not prevent 
non-matching documents from being selected by other clauses.

Hopefully that helps,

Scott
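
The three clause types in the rundown above can be modelled as follows. This is a toy model over documents represented as sets of terms, not Lucene's implementation: the names `ClauseSketch`, `Occur` and `Clause` are hypothetical, and the rule that a query without required clauses needs at least one optional match is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of required / prohibited / optional clauses over documents as term sets. */
final class ClauseSketch {

    enum Occur { REQUIRED, PROHIBITED, OPTIONAL }

    /** One clause: a term plus how it has to occur. */
    static final class Clause {
        final String term;
        final Occur occur;

        Clause(String term, Occur occur) {
            this.term = term;
            this.occur = occur;
        }
    }

    /** Returns true if the document (a set of terms) is selected by the clauses. */
    static boolean matches(Set<String> doc, List<Clause> clauses) {
        boolean hasRequired = false;
        boolean optionalHit = false;
        for (Clause c : clauses) {
            boolean present = doc.contains(c.term);
            switch (c.occur) {
                case REQUIRED:
                    hasRequired = true;
                    if (!present) return false; // a missing required term excludes the document
                    break;
                case PROHIBITED:
                    if (present) return false;  // a prohibited term excludes the document
                    break;
                default:
                    optionalHit |= present;     // optional terms can select but never exclude
            }
        }
        // Assumption: without any required clause, at least one optional clause has to match.
        return hasRequired || optionalHit;
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("house", "tree"));
        List<Clause> query = Arrays.asList(
                new Clause("house", Occur.REQUIRED),
                new Clause("garden", Occur.OPTIONAL),
                new Clause("car", Occur.PROHIBITED));
        System.out.println(matches(doc, query));
    }
}
```

In terms of the two booleans discussed in this thread, REQUIRED corresponds to (required=true, prohibited=false), PROHIBITED to (false, true) and OPTIONAL to (false, false). The fourth combination has no meaning, which is why two flags encode only three states.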



Re: Term weighting and Term boost

2004-01-16 Thread Karl Koch
Hello Andrzej,

sorry, I mistakenly ran it under Java 1.2.2, which cannot work :-) Then you
get thread exceptions...

Anyway, solved now. Thank you,
Karl




mergeFactor and maxMergeDocs

2004-01-16 Thread Chong, Herb
What effect do these two settings have, and which values are recommended for Lucene 1.3?

Herb




Lucene version in PERL - status update?

2004-01-16 Thread Charlie Smith
How long till there is a server version in PERL?





Ordering documents

2004-01-16 Thread William W
Hi folks,

I have some documents:

 doc 1 ==> name=Palm Zire
 doc 2 ==> name=Palm Zilion Zire
 doc 3 ==> name=Palm Test

I will insert these docs into my index in the order doc 1, doc 2, doc 3.

If I execute the query ==> name:Palm
in which order will the documents come back?
And if I execute the query ==> name:(Palm Zire)?

I thought that the documents would ALWAYS come back in the order in which I
added them to the index.

How can I know the order of the results?
Thanks,
William.


RE: Ordering documents

2004-01-16 Thread William W
Hi folks,

so for the order of the results, is the ONLY thing that really matters the order
in which the information is stored in the index?

Thanks,
William.




RE: Ordering documents

2004-01-16 Thread Michael Giles
William,

The order of the results is going to be based on how well they match the
query (i.e. weighted by relevancy). So although all of those values
contain the term Palm, I would assume you would get the shorter entries
(i.e. 1 & 3) before the longer one (2), as they have a higher percentage of
"palmness". The same goes for the second query (it is a better match for 1
than for 2). If you want the documents to come back in their document order,
you would need to sort the results yourself.

-Mike
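
The "percentage of palmness" idea can be illustrated with a small model in which a match counts for more in a shorter field. The sketch scores a term as sqrt(frequency) / sqrt(number of terms in the field), which to my understanding mirrors the shape of the defaults in Lucene's Similarity but leaves out idf, boosts and the reduced precision with which Lucene stores norms. The class `LengthNormSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy relevance ranking: for the same term match, shorter fields score higher. */
final class LengthNormSketch {

    /** Score of one term in one field value: sqrt(frequency) / sqrt(number of terms). */
    static double score(String fieldValue, String term) {
        String[] tokens = fieldValue.toLowerCase().split("\\s+");
        long freq = Arrays.stream(tokens).filter(term.toLowerCase()::equals).count();
        return Math.sqrt(freq) / Math.sqrt(tokens.length);
    }

    /** Returns the matching values by descending score; ties keep input order (stable sort). */
    static List<String> rank(List<String> fieldValues, String term) {
        List<String> result = new ArrayList<>();
        for (String value : fieldValues) {
            if (score(value, term) > 0) {
                result.add(value);
            }
        }
        result.sort((a, b) -> Double.compare(score(b, term), score(a, term)));
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Palm Zire", "Palm Zilion Zire", "Palm Test");
        for (String name : rank(names, "Palm")) {
            System.out.println(name + " -> " + score(name, "Palm"));
        }
    }
}
```

For the query term Palm, the two-word names score about 0.71 and the three-word name about 0.58, so documents 1 and 3 rank ahead of document 2, as described above.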



RE: Ordering documents

2004-01-16 Thread Chun, John
Results are returned in order of score (highest first), not in the order
in which they were inserted into the index.

You may find the FAQ useful:
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi
In particular, take a look at the 'searching' section.

hth,
-John




Re: Ordering documents

2004-01-16 Thread Peter Keegan
What is the returned order for documents with identical scores?

Peter
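
This question is not answered in the messages collected here. If a deterministic order for equal scores matters, one option is to sort the hits yourself with an explicit tie-breaker. The sketch below orders by descending score and, for equal scores, by ascending document id, i.e. insertion order. The `Hit` class and its fields are hypothetical stand-ins and not Lucene's own classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sorts hits by descending score and breaks ties by ascending document id. */
final class TieBreakSketch {

    /** Hypothetical search hit; docId stands for the insertion position in the index. */
    static final class Hit {
        final int docId;
        final double score;

        Hit(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Returns a sorted copy; the input list is left untouched. */
    static List<Hit> order(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingInt(h -> h.docId));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical scores: documents 0 and 1 tie, document 2 scores lower.
        List<Hit> hits = Arrays.asList(new Hit(2, 0.5), new Hit(1, 0.7), new Hit(0, 0.7));
        for (Hit h : order(hits)) {
            System.out.println("doc " + h.docId + " score " + h.score);
        }
    }
}
```

Dropping the score comparison and sorting by document id alone gives the pure insertion order asked for earlier in this thread.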
