Re: Atomicity in Lucene operations
As soon as I've cleaned up the code, I'll publish it; it needs a little more documentation as well.

Nader

Roy Shan wrote:
> Maybe you can contribute it to the sandbox?
>
> On Mon, 18 Oct 2004 08:31:30 -0700 (PDT), Yonik Seeley wrote:
> Hi Nader, I would greatly appreciate it if you could CC me on the docs or the code. Thanks!
> Yonik
>
> --- Nader Henein wrote:
> It's pretty integrated into our system at this point. I'm working on packaging it and cleaning up my documentation, and then I'll make it available. I can give you the documents, and if you still want the code I'll slap together a rough copy for you and ship it across.
> Nader Henein
>
> Roy Shan wrote:
> Hello, Nader: I am very interested in how you implement the atomicity. Could you send me a copy of your code? Thanks in advance.
> Roy

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Range Query
Hi guys,

Apologies. I have a Text field 'ItemPrice' that I use to store a numeric price such as 10, 25.25 or 50.00. When I search for a range between two prices, e.g.

    Contents:shoes +ItemPrice:[10.00 TO 50.60]

I get results outside the requested range. [This may be because the query compares the ASCII values instead of the numeric values.]

Am I missing something in the query syntax, or is this the wrong way to construct the query? Please, somebody advise me ASAP. :(

Thanks in advance.

With warm regards, have a nice day,
N.S. Karthik
RE: Range Query
Range queries use a lexicographic (dictionary) order. So, assuming all your values are positive, you need to ensure that the integer part of each number has a fixed number of digits (pad with leading 0s). The fractional part should be fine, although 1.0 will follow 1.

If you have negative numbers, you need to pad an extra 0 on the left of the positives, start the negatives with -, and invert the magnitude of the negatives (so they go in the other order).

Your actual example below should work as is, except that 10 will not be in the range, since 10.00 is strictly after 10. However, this won't work without the padding if you have any prices with an integer part of other than exactly two digits (e.g., 10 is before 6, but after 06).

Chuck

-Original Message-
From: Karthik N S
Sent: Tuesday, October 19, 2004 12:05 AM
To: LUCENE
Subject: Range Query

Hi guys,

Apologies. I have a Text field 'ItemPrice' that I use to store a numeric price such as 10, 25.25 or 50.00. When I search for a range between two prices, e.g.

    Contents:shoes +ItemPrice:[10.00 TO 50.60]

I get results outside the requested range. [This may be because the query compares the ASCII values instead of the numeric values.]

Am I missing something in the query syntax, or is this the wrong way to construct the query? Please, somebody advise me ASAP. :(

Thanks in advance.

With warm regards, have a nice day,
N.S. Karthik
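The ordering described in this reply can be reproduced with plain string comparison. The sketch below is only an illustration: `String.compareTo` stands in for the dictionary order of the indexed terms, no Lucene classes are involved, and the names `LexicographicPrices`, `pad` and `inRange` are made up for the example. It assumes non-negative prices below 1000, so three integer digits are enough.

```java
import java.util.Locale;

// Illustration only: String.compareTo stands in for the term order that a
// range query walks. Nothing here is Lucene API.
final class LexicographicPrices {

    // Hypothetical helper: zero-pad to three integer digits and two decimals.
    // Locale.ROOT keeps '.' as the decimal separator on every machine.
    static String pad(double price) {
        return String.format(Locale.ROOT, "%06.2f", price);
    }

    // True if term lies in the inclusive range [lower TO upper] in dictionary order.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded: "100" sorts between "10.00" and "50.60", so it matches by mistake.
        System.out.println(inRange("100", "10.00", "50.60"));
        // Unpadded: "10" sorts before "10.00", so it is missed.
        System.out.println(inRange("10", "10.00", "50.60"));
        // Padded: 100 is excluded and 10 is included, as expected numerically.
        System.out.println(inRange(pad(100), pad(10.00), pad(50.60)));
        System.out.println(inRange(pad(10), pad(10.00), pad(50.60)));
    }
}
```

The same padding has to be applied both to the values stored in the index and to the bounds of the query. Negative numbers need the extra handling described in the reply and are not covered by this sketch.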
Search Help in word doc
Hi FFI,

I am indexing multiple document types (Word, Excel, HTML, PPT, PDF). There is no problem at indexing time, but the content (description) in my search results comes out with small boxes. This happens only with Word documents. I think it is caused by special characters such as bullets and symbols.

How can I rectify this problem?

Regards,
Natarajan.
RE: QueryParsing
Hi Erik and everyone else,

OK, I will buy the book ;) but this still does not solve the problem of why

    String x = "\"jakarta apache\"~100";

is being translated as a PhraseQuery:

    FULL_TEXT:"jakarta apache"~100

Is this the correct query being formed? Or is there something wrong with the Proximity Search topic at http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

Regards,
Rupinder

-Original Message-
From: Erik Hatcher
Sent: 18 October 2004 21:05
To: Lucene Users List
Subject: Re: QueryParsing

QueryParser does not (currently) support SpanQuerys. PhraseQuery is what you'll always get with double-quoted strings. However, you can customize the behavior and get a SpanQuery instead by subclassing and overriding getPhraseQuery. In fact, this is an example I wrote for Lucene in Action.

Erik

On Oct 18, 2004, at 2:39 PM, Rupinder Singh Mazara wrote:

Hi all,

I have a question regarding the QueryParser and proximity searches. I executed the following piece of code:

    String x = "\"jakarta apache\"~100";
    QueryParser parser = new QueryParser("FULL_TEXT", new StandardAnalyzer());
    parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
    Query query = parser.parse(x);
    System.out.println(query.getClass() + " - " + query.toString());
    IndexReader indexReader = IndexReader.open(new File(luceneroot));
    query = query.rewrite(indexReader);
    System.out.println(query.getClass() + " - " + query.toString());

In both System.out.println calls I get the following result:

    class org.apache.lucene.search.PhraseQuery - FULL_TEXT:"jakarta apache"~100

Is this correct? I was expecting to see a SpanQuery being formed at the second println statement. I have taken this from the example in http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

If I remove the quotes I get a query parsing error which tells me that the similarity should be between 0.0 and 1.0, which is as expected.

Please let me know if I missed something.

Regards,
Rupinder
RE: Search Help in word doc
Seen that. I use the Character.isISOControl() function to identify and remove these characters.

-Original Message-
From: Natarajan.T
Sent: Tuesday, October 19, 2004 10:37
Subject: Search Help in word doc

Hi FFI,

I am indexing multiple document types (Word, Excel, HTML, PPT, PDF). There is no problem at indexing time, but the content (description) in my search results comes out with small boxes. This happens only with Word documents. I think it is caused by special characters such as bullets and symbols.

How can I rectify this problem?

Regards,
Natarajan.
RE: Search Help in word doc
Hi Remi,

Thanks for your response. Please send me the jar name with sample code.

Thanks,
Natarajan.

-Original Message-
From: Cocula Remi
Sent: Tuesday, October 19, 2004 2:26 PM
To: Lucene Users List
Subject: RE: Search Help in word doc

Seen that. I use the Character.isISOControl() function to identify and remove these characters.
RE: QueryParsing
Rupinder Singh Mazara writes:
> Hi Erik and everyone else,
> OK, I will buy the book ;) but this still does not solve the problem of why
> String x = "\"jakarta apache\"~100";
> is being translated as a PhraseQuery FULL_TEXT:"jakarta apache"~100
> Is this the correct query being formed? Or is there something wrong with the Proximity Search topic at http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

A proximity search is done by a PhraseQuery with a slop. The slop makes the PhraseQuery perform a proximity search (so you can argue that the name is problematic). That's what the query parser creates.

SpanQueries were introduced later. Maybe you can get the effect of a proximity search with SpanQueries as well, but that's not handled by the query parser.

Morus
RE: Search Help in word doc
This sample code changes undesired characters into underscores.

    Document doc = ...;
    char[] cs = doc.get(content).toCharArray();
    StringBuffer sb = new StringBuffer();
    // Copy every character, replacing ISO control characters with '_'.
    for (int j = 0; j < cs.length; j++) {
        if (!Character.isISOControl(cs[j])) {
            sb.append(cs[j]);
        } else {
            sb.append("_");
        }
    }
    System.out.println(sb.toString());

-Original Message-
From: Natarajan.T
Sent: Tuesday, October 19, 2004 11:06
To: 'Lucene Users List'
Subject: RE: Search Help in word doc

Hi Remi,

Thanks for your response. Please send me the jar name with sample code.

Thanks,
Natarajan.
RE: QueryParsing
Thank you, Morus. This makes things very clear to me.

Regards,
Rupinder

-Original Message-
From: Morus Walter
Sent: 19 October 2004 10:05
To: Lucene Users List
Subject: RE: QueryParsing

A proximity search is done by a PhraseQuery with a slop. The slop makes the PhraseQuery perform a proximity search (so you can argue that the name is problematic). That's what the query parser creates.

SpanQueries were introduced later. Maybe you can get the effect of a proximity search with SpanQueries as well, but that's not handled by the query parser.

Morus
RE: Search Help in word doc
OK, thanks Remi.

-Original Message-
From: Cocula Remi
Sent: Tuesday, October 19, 2004 2:37 PM
To: Lucene Users List
Subject: RE: Search Help in word doc

This sample code changes undesired characters into underscores.

    Document doc = ...;
    char[] cs = doc.get(content).toCharArray();
    StringBuffer sb = new StringBuffer();
    // Copy every character, replacing ISO control characters with '_'.
    for (int j = 0; j < cs.length; j++) {
        if (!Character.isISOControl(cs[j])) {
            sb.append(cs[j]);
        } else {
            sb.append("_");
        }
    }
    System.out.println(sb.toString());
RE: Search Help in word doc
Are you doing this in the indexing part or in the search part?

-Original Message-
From: Cocula Remi
Sent: Tuesday, October 19, 2004 2:37 PM
To: Lucene Users List
Subject: RE: Search Help in word doc

This sample code changes undesired characters into underscores.

    Document doc = ...;
    char[] cs = doc.get(content).toCharArray();
    StringBuffer sb = new StringBuffer();
    // Copy every character, replacing ISO control characters with '_'.
    for (int j = 0; j < cs.length; j++) {
        if (!Character.isISOControl(cs[j])) {
            sb.append(cs[j]);
        } else {
            sb.append("_");
        }
    }
    System.out.println(sb.toString());
RE: Search Help in word doc
OK, thanks a lot.

-Original Message-
From: Cocula Remi
Sent: Tuesday, October 19, 2004 3:14 PM
To: Lucene Users List
Subject: RE: Search Help in word doc

In my case, in the search part. But it is probably best to do it at indexing time.
Re: Arabic analyzer
Hi,

Scott Smith wrote:
> Is anyone aware of an open source (non-GPL; i.e., free for commercial use) Arabic analyzer for Lucene?

Unfortunately (for you), my Arabic Analyzer for Java (http://savannah.nongnu.org/projects/aramorph) is GPL-ed.

> Does Arabic really require a stemmer as well? (Some of the reading I've seen on the web would suggest that a stemmer is almost a necessity with Arabic to get anything useful, where it is not with other languages.)

IMHO, stemming *is* a necessity in Arabic, since this language involves prefixing, suffixing and infixing, as well as a few, yet very frequent, written word aggregations.

Good luck,

Pierrick Brihaye
using optimize and addDocument concurrently.
Hi,

My basic question is whether it is possible to continue adding documents to an index in one thread while running a long-running optimization of the index (approx. 30 minutes) in another thread. I'm using Lucene version 1.4.2.

The concurrency matrix at http://www.jguru.com/faq/view.jsp?EID=913302 shows that if you use the same IndexWriter object you can do concurrent writes and optimization. When I try it in my program, the addDocument calls wait until the optimization has finished. So in this respect it is thread-safe, but the operations cannot be performed at the same time.

Our problem is that the index needs to be continually kept up to date with new news articles, but also needs to be optimized regularly to keep it fast. If I cannot update and optimize one index at the same time, the best way I can see of doing this is to maintain multiple identical indexes, taking them offline in turn, optimizing them, letting them catch up, and putting them back online. Does that sound best to you?

Thanks a lot in advance,
Steve
Null or no analyzer
Hi all,

I have a question regarding the selection of Analyzers during query parsing. I have three fields in my index: db_id, full_text and subject. All three are indexed; however, while indexing I told Lucene to index db_id and subject but not to tokenize them.

I want to give a single search box in my application to enable searching for documents. Some queries can look like

    motor cross rally

This will get fed to QueryParser to do the relevant parsing. However, if the user enters

    John Kerry subject:Elections 2004

I want to make sure that no analyzer is used for the subject field. How can that be done? This is because I expect the users to know the subject from a list of controlled vocabularies, and I am also searching for documents that have the exact subject.

I tried using the PerFieldAnalyzerWrapper, but how do I get hold of an Analyzer that does nothing but pass the text through to the Searcher?
MultiSearcher object question
I've just indexed over 600,000 documents (index size = 12GB) and have a simple servlet to search the index. I am using the MultiSearcher object (I will add more indexes in the future) in a servlet to test searching.

I have noticed that the instantiation of my MultiSearcher object takes about 5 seconds. As a solution, I create the MultiSearcher object once and store it in the application context, so that subsequent requests reuse it.

My question is: is this a recommended practice? If I have 1000 users searching concurrently, will this approach cause problems? What do others do in web applications using the MultiSearcher object?
RE: using optimize and addDocument concurrently.
Steve,

The behavior that you describe is as expected. I have tackled a similar problem to yours by creating a proxy object that acts as a gatekeeper to all IndexReader, IndexSearcher and IndexWriter operations. With fully synchronized access to all methods of the proxy you will not run into any problems. Every time I need to perform something with the writer, I close the searcher, etc.

As to regular optimization, I tend to reindex now and again with a completely separate writer and replace the index by moving it to the new location. This, BTW, has also become a method in my proxy object.

Hope this helps.

Cheers,
Aad
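The gatekeeper idea in this reply might look roughly like the sketch below. It is not Aad's code and it uses no Lucene classes: an in-memory list stands in for the index, a boolean flag stands in for an open IndexSearcher, and the names `IndexGatekeeper`, `addDocument`, `search` and `replaceIndex` are hypothetical. The only point it tries to show is that every operation goes through one object whose methods are all `synchronized`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical gatekeeper. An in-memory list stands in for the Lucene index
// and a flag stands in for an open IndexSearcher; none of this is Lucene API.
final class IndexGatekeeper {
    private List<String> index = new ArrayList<>();
    private boolean searcherOpen = false;

    // Writer operation: "close" the searcher first, then modify the index.
    synchronized void addDocument(String text) {
        searcherOpen = false;
        index.add(text);
    }

    // Searcher operation: "open" the searcher if needed, then scan the index.
    synchronized List<String> search(String word) {
        searcherOpen = true;
        List<String> hits = new ArrayList<>();
        for (String doc : index) {
            if (doc.contains(word)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // Swap in an index that was rebuilt elsewhere by a separate writer.
    synchronized void replaceIndex(List<String> rebuilt) {
        searcherOpen = false;
        index = new ArrayList<>(rebuilt);
    }

    synchronized boolean isSearcherOpen() {
        return searcherOpen;
    }
}
```

Because every method holds the same lock, a search can never observe a half-finished write or a half-finished swap. The cost is that operations run one at a time, which matches the waiting behavior described in the question.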
Thesaurus ...
Hello,

I'm a new user of Lucene, and I would like to use it to create a thesaurus. Do you have any idea how to do this?

Thanks!

Kind regards,
P. Galeas
RE: Null or no analyzer
You can use WhitespaceAnalyzer.

Aviran
http://aviran.mordos.com

-Original Message-
From: Rupinder Singh Mazara
Sent: Tuesday, October 19, 2004 11:23 AM
To: Lucene Users List
Subject: Null or no analyzer

Hi all,

I have a question regarding the selection of Analyzers during query parsing. I have three fields in my index: db_id, full_text and subject. All three are indexed; however, while indexing I told Lucene to index db_id and subject but not to tokenize them.

I want to give a single search box in my application to enable searching for documents. Some queries can look like

    motor cross rally

This will get fed to QueryParser to do the relevant parsing. However, if the user enters

    John Kerry subject:Elections 2004

I want to make sure that no analyzer is used for the subject field. How can that be done? This is because I expect the users to know the subject from a list of controlled vocabularies, and I am also searching for documents that have the exact subject.

I tried using the PerFieldAnalyzerWrapper, but how do I get hold of an Analyzer that does nothing but pass the text through to the Searcher?
Downloading Full Copies of Web Pages
Hi folks,

I want to download full copies of web pages and store them locally, with the hyperlink structure reproduced as local directories. I tried to use Lucene, but I've realized that it doesn't have a crawler. Does anyone know of software that does this?

Thanks,
Re: Range Query
That is exactly right. It is searching the ASCII. To solve it, I pad my price using a method like this:

    /**
     * Pads the price so that all prices have the same number of characters
     * and can be compared lexicographically.
     *
     * @param price the price to format; may be null
     * @return the zero-padded price, or null if price is null
     */
    public static String formatPriceAsString(Double price) {
        if (price == null) {
            return null;
        }
        return PRICE_FORMATTER.format(price.doubleValue());
    }

where PRICE_FORMATTER contains enough digits for your largest number:

    private static final DecimalFormat PRICE_FORMATTER = new DecimalFormat("000.00");

When searching, I also pad the query term. I looked into hooking into QueryParser, but since the lower and upper prices are separate inputs in my application, I chose to handle them without hooking into the QueryParser.

Jonathan

On Tue, 19 Oct 2004 12:35:06 +0530, Karthik N S wrote:
> Hi guys,
> Apologies. I have a Text field 'ItemPrice' that I use to store a numeric price such as 10, 25.25 or 50.00. When I search for a range between two prices, e.g.
> Contents:shoes +ItemPrice:[10.00 TO 50.60]
> I get results outside the requested range. [This may be because the query compares the ASCII values instead of the numeric values.]
> Am I missing something in the query syntax, or is this the wrong way to construct the query? Please, somebody advise me ASAP. :(
> Thanks in advance.
> With warm regards, have a nice day,
> N.S. Karthik
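Padding the query side could look something like the sketch below. It reuses the "000.00" pattern from the message, but the class `PaddedRangeQuery` and the helper `rangeClause` are hypothetical, and building a query-parser string is an assumption made for illustration; the message itself says the bounds were handled without hooking into QueryParser. Pinning the symbols to `Locale.ROOT` is also an addition here, so that the decimal separator is always '.'. Whether an analyzer leaves the padded terms untouched has not been checked.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class PaddedRangeQuery {

    // Same "000.00" pattern as in the message. DecimalFormat is not
    // thread-safe, so this sketch creates a fresh instance per call.
    static String formatPrice(double price) {
        DecimalFormat format =
                new DecimalFormat("000.00", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(price);
    }

    // Hypothetical helper: builds an inclusive range clause with padded bounds.
    static String rangeClause(String field, double lower, double upper) {
        return field + ":[" + formatPrice(lower) + " TO " + formatPrice(upper) + "]";
    }

    public static void main(String[] args) {
        // Bounds taken from the example query in the original question.
        System.out.println(rangeClause("ItemPrice", 10.00, 50.60));
    }
}
```

For the bounds in the question this produces `ItemPrice:[010.00 TO 050.60]`, which only matches correctly if the indexed values were padded with the same pattern.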
Re: Thesaurus ...
Erik Hatcher wrote:
> Have a look at the WordNet contribution in the Lucene sandbox repository. It could be leveraged for part of a solution.

It's something I contributed. Relevant links are:

http://jakarta.apache.org/lucene/docs/lucene-sandbox/
http://www.tropo.com/techno/java/lucene/wordnet.html

Basically, it uses the Lucene index as a kind of associative array to map words to their synonyms using the thesaurus from WordNet, so a key like, say, "fast" will have mappings to "quick" and "rapid". This can then be used for query expansion. An example of this expansion in use is here:

http://www.hostmon.com/rfc/advanced.jsp

> Erik
>
> On Oct 19, 2004, at 12:40 PM, Patricio Galeas wrote:
> Hello, I'm a new user of Lucene, and I would like to use it to create a thesaurus. Do you have any idea how to do this? Thanks!
> Kind regards,
> P. Galeas
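The query expansion step mentioned above can be illustrated without Lucene or WordNet. In the sketch below a plain `Map` stands in for the synonym index, and the class `SynonymExpansion`, the method `expand` and the OR-group output format are assumptions made for the example, not the sandbox contribution's actual API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustration only: a Map stands in for the synonym index described above.
final class SynonymExpansion {

    // Rewrites each word that has synonyms as an OR group; other words pass through.
    static String expand(String query, Map<String, List<String>> synonyms) {
        StringBuilder out = new StringBuilder();
        for (String word : query.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            List<String> found = synonyms.get(word.toLowerCase(Locale.ROOT));
            if (found == null || found.isEmpty()) {
                out.append(word);
            } else {
                out.append('(').append(word);
                for (String synonym : found) {
                    out.append(" OR ").append(synonym);
                }
                out.append(')');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = new HashMap<>();
        synonyms.put("fast", Arrays.asList("quick", "rapid"));
        System.out.println(expand("fast car", synonyms));
    }
}
```

With the "fast" mapping from the message, the query `fast car` becomes `(fast OR quick OR rapid) car`. In the real contribution the lookup would go against the synonym index rather than an in-memory map.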