RE: Question regarding using Lucene or not

2004-10-04 Thread AmitShukla
Thanks Daniel
Can you tell me two more things?
1. How difficult is it to implement our own Similarity class that can do the
things we want?
2. If more than one field is a percentage match like HP, can we also specify
which field gets preference during the search?
For example, in the search, the model has to be Cargo, the HP value should be
55,000 or near it (tolerance of 5,000) and the GVWR value should be 10,000 or
near it (tolerance of 1,000). GVWR also takes preference over HP. So if one
file contains
Cargo, HP=54,000 and GVWR=9,800
and a second file contains
Cargo, HP=55,000 and GVWR=9,200
then the first file should get the better rating, even though the second one
matches HP exactly, because GVWR carries more weight than HP.

Thanks in advance.

-Original Message-
From: Daniel Naber [mailto:[EMAIL PROTECTED] 
Sent: Saturday, October 02, 2004 6:37 AM
To: Lucene Users List
Subject: Re: Question regarding using Lucene or not


On Saturday 02 October 2004 02:06, [EMAIL PROTECTED] wrote:

 The parameters are both string and numeric. For example, the model 
 should be Cargo and its HP value should be 55,000 or near it. If we 
 specify a tolerance value of 5,000 then it should search for all the data 
 files where the model node is Cargo (definitive match) and the HP value is 
 between 50,000 and 60,000, with the one having 55,000 coming as the 100% 
 match.

That's possible with Lucene, you'll need to parse the XML files and put the 
required data into the Lucene index. Then you can search with a query like 
this:

+model:cargo^0 +hp:[50000 TO 60000] hp:55000^10

This will match all documents which contain cargo in the model field and a 
value of 50000 to 60000 in the hp field. Matches with hp 55000 will be 
boosted so they appear on top. However, matches 50000 to 54999 and 55001 
to 60000 will have the same ranking. To change that you will need to 
implement your own variation of Lucene's Similarity class.

Regards
 Daniel

-- 
http://www.danielnaber.de

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





Re: Question regarding using Lucene or not

2004-10-04 Thread Daniel Naber
On Monday 04 October 2004 22:22, you wrote:

 1. How difficult is it to implement our own Similarity class that can do
 the things we want?

It should be very easy. The API is described here: 
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html
I think in your case all methods (except one) that return a float can just 
return 1.0f. The one that doesn't return 1 then returns a value that 
represents how close the match is to the perfect value (something like 
1/difference, so that a smaller difference gives a higher score).

 2. If more than one field is a percentage match like HP, can we also
 specify which field gets preference during the search?

If you implement the method mentioned above so that it always ranks some 
field higher than another, that should be possible.
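To make the "closeness" and "field preference" ideas concrete, here is a minimal sketch in plain Java. It does not use the Lucene Similarity API at all; the class name, the method names and the weights (GVWR counted twice as much as HP) are assumptions made up for illustration. It also uses a linear falloff inside the tolerance band instead of 1/difference, to avoid dividing by zero on an exact match.

```java
import java.util.Locale;

/** Hypothetical helper, not part of Lucene: scores how close a spec is to the targets. */
final class WeightedCloseness {

    /** 1.0 at the target, falling linearly to 0.0 at the edge of the tolerance band. */
    static double closeness(double actual, double target, double tolerance) {
        double diff = Math.abs(actual - target);
        return Math.max(0.0, 1.0 - diff / tolerance);
    }

    /** Weighted average of the per-field closeness values; the weights are assumptions. */
    static double score(double hp, double gvwr) {
        final double hpWeight = 1.0;
        final double gvwrWeight = 2.0; // GVWR gets preference over HP
        double hpPart = closeness(hp, 55000, 5000);
        double gvwrPart = closeness(gvwr, 10000, 1000);
        return (hpWeight * hpPart + gvwrWeight * gvwrPart) / (hpWeight + gvwrWeight);
    }

    public static void main(String[] args) {
        // The two files from the question: the first should rank higher.
        System.out.printf(Locale.ROOT, "file 1: %.2f%n", score(54000, 9800)); // file 1: 0.80
        System.out.printf(Locale.ROOT, "file 2: %.2f%n", score(55000, 9200)); // file 2: 0.47
    }
}
```

With these assumed weights the first file scores higher than the second, which is the ordering asked for in the question. How such a value would be fed into Lucene's scoring is not shown here.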

But if you've only got 1000 documents (and that number won't increase) you 
could also just search for model:cargo, put all matches in your own Match 
objects and then sort these via your own implementation of Java's 
compareTo().
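A rough sketch of that "sort it yourself" approach might look like the following. The Match class, its fields and the hard-coded targets are hypothetical, and "GVWR gets preference" is interpreted here as: compare the GVWR distance first and use the HP distance only to break ties. Filling the list from the Lucene hits is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for one matching data file; closest match sorts first. */
final class Match implements Comparable<Match> {
    // Search targets from the question, hard-coded only for this sketch.
    static final int TARGET_HP = 55000;
    static final int TARGET_GVWR = 10000;

    final String file;
    final int hp;
    final int gvwr;

    Match(String file, int hp, int gvwr) {
        this.file = file;
        this.hp = hp;
        this.gvwr = gvwr;
    }

    int hpDistance() { return Math.abs(hp - TARGET_HP); }

    int gvwrDistance() { return Math.abs(gvwr - TARGET_GVWR); }

    /** Orders by GVWR distance first; HP distance only breaks ties. */
    @Override
    public int compareTo(Match other) {
        int byGvwr = Integer.compare(gvwrDistance(), other.gvwrDistance());
        return byGvwr != 0 ? byGvwr : Integer.compare(hpDistance(), other.hpDistance());
    }

    public static void main(String[] args) {
        List<Match> matches = new ArrayList<>();
        matches.add(new Match("second.xml", 55000, 9200));
        matches.add(new Match("first.xml", 54000, 9800));
        Collections.sort(matches);
        for (Match m : matches) {
            System.out.println(m.file); // first.xml, then second.xml
        }
    }
}
```

For around 1000 documents, sorting in memory like this is cheap, and the ordering rule stays in one small method that is easy to change.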

Regards
 Daniel

-- 
http://www.danielnaber.de




Re: Question regarding using Lucene or not

2004-10-02 Thread Daniel Naber
On Saturday 02 October 2004 02:06, [EMAIL PROTECTED] wrote:

 The parameters are both string and numeric. For example, the model
 should be Cargo and its HP value should be 55,000 or near it. If we
 specify a tolerance value of 5,000 then it should search for all the data
 files where the model node is Cargo (definitive match) and the HP value is
 between 50,000 and 60,000, with the one having 55,000 coming as the 100%
 match.

That's possible with Lucene, you'll need to parse the XML files and put the 
required data into the Lucene index. Then you can search with a query like 
this:

+model:cargo^0 +hp:[50000 TO 60000] hp:55000^10

This will match all documents which contain cargo in the model field and a 
value of 50000 to 60000 in the hp field. Matches with hp 55000 will be 
boosted so they appear on top. However, matches 50000 to 54999 and 55001 
to 60000 will have the same ranking. To change that you will need to 
implement your own variation of Lucene's Similarity class.
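As an illustration, a query string of this shape could be assembled from a target value and a tolerance roughly as follows. This is only a sketch: the class name, the method names and the padding width are assumptions, and it only builds the string without calling Lucene. The zero-padding is an addition not mentioned in the message: as far as I know, range queries in Lucene releases of this period compare terms as strings, so numeric values need the same fixed width at index time and at query time.

```java
import java.util.Locale;

/** Hypothetical helper that builds a "near value" query string for the hp field. */
final class NearQueryBuilder {
    // Assumed padding width; it has to match the width used when indexing.
    static final int WIDTH = 7;

    /** Left-pads a non-negative number with zeros so that string order equals numeric order. */
    static String pad(long value) {
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    /** Required model term, required hp range, and a boosted optional term for the exact value. */
    static String build(String model, long target, long tolerance) {
        long low = Math.max(0, target - tolerance);
        long high = target + tolerance;
        return "+model:" + model.toLowerCase(Locale.ROOT) + "^0"
                + " +hp:[" + pad(low) + " TO " + pad(high) + "]"
                + " hp:" + pad(target) + "^10";
    }

    public static void main(String[] args) {
        // prints: +model:cargo^0 +hp:[0050000 TO 0060000] hp:0055000^10
        System.out.println(build("Cargo", 55000, 5000));
    }
}
```

The resulting string would then be handed to Lucene's query parser. Lower-casing the model name assumes the field was indexed with a lower-casing analyzer.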

Regards
 Daniel

-- 
http://www.danielnaber.de




Question regarding using Lucene or not

2004-10-01 Thread AmitShukla
Hello
I have a stand-alone Java application. We have a new requirement where there
will be around 1000 data files in XML format. Each of them has the same
format. Nodes will have values and attributes. In the application, the user
will search for a particular spec (the data file) by defining parameters.
The parameters are both string and numeric. For example, the model should be
Cargo and its HP value should be 55,000 or near it. If we specify a tolerance
value of 5,000 then it should search for all the data files where the model node
is Cargo (definitive match) and the HP value is between 50,000 and 60,000, with
the one having 55,000 coming as the 100% match.
Do you think Lucene can meet this requirement, or do I need to look into
another product?

Please let me know.

Thanks.