Re: ezmlm response
On Oct 5, 2004, at 4:29 PM, Patel, Viral wrote:
> Does anyone know how can I iterate through entire index and display all of the "records" without typing anything in the query?

You can use the IndexReader API to navigate the index and walk through all of the documents.

    Erik

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
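A minimal sketch of the walk Erik describes. So that it runs without Lucene on the classpath, it models the reader with a hypothetical `DocReader` interface whose three methods mirror `IndexReader.maxDoc()`, `IndexReader.isDeleted(int)` and `IndexReader.document(int)`; the interface, the in-memory sample reader and its data are assumptions made for illustration, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

class IndexWalk {
    /** Hypothetical stand-in mirroring three IndexReader methods (not a Lucene type). */
    interface DocReader {
        int maxDoc();              // one greater than the largest document number
        boolean isDeleted(int n);  // true if slot n holds a deleted document
        String document(int n);    // stored content of document n (a String here for brevity)
    }

    /** Visits every document slot and collects the documents that are not deleted. */
    static List<String> liveDocs(DocReader reader) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < reader.maxDoc(); i++) {
            if (reader.isDeleted(i)) {
                continue; // skip slots whose document has been deleted
            }
            out.add(reader.document(i));
        }
        return out;
    }

    /** Invented in-memory reader: three slots, the middle one deleted. */
    static DocReader sampleReader() {
        final String[] docs = {"first record", "second record", "third record"};
        final boolean[] deleted = {false, true, false};
        return new DocReader() {
            public int maxDoc() { return docs.length; }
            public boolean isDeleted(int n) { return deleted[n]; }
            public String document(int n) { return docs[n]; }
        };
    }

    public static void main(String[] args) {
        for (String doc : liveDocs(sampleReader())) {
            System.out.println(doc);
        }
    }
}
```

With a real index the loop body stays the same; only the reader comes from `IndexReader.open(...)` and has to be closed afterwards. No query is involved at any point, which is what the question asked for.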
RE: ezmlm response
Does anyone know how can I iterate through entire index and display all of the "records" without typing anything in the query?
RE: leakage in RAMDirectory ?
forgot to mention lucene 1.4.2 is the version I am currently using

>-Original Message-
>From: Rupinder Singh Mazara [mailto:[EMAIL PROTECTED]
>Sent: 05 October 2004 19:32
>To: Lucene Users List
>Subject: leakage in RAMDirectory ?
leakage in RAMDirectory ?
Hi all,

The following is some code that I use to index the contents of a table (there are 18746 records in the table). Using a database result set, I loop over all the records, creating a Document object for each and indexing it into a RAMDirectory and then onto the file system.

When I open an IndexReader and output numDocs I get 18740.

However, on running the same code but using an FSDirectory object, on opening an IndexReader I get 18476.

Has anyone else come across this behaviour? The JDK being used is 1.4.1.

    public class JournalIndexer extends JournalConstants {
        IndexWriter ramWriter;
        Directory ramDirectory;
        String dir;

        public JournalIndexer(String dir) throws Exception {
            this.dir = dir;
            ramDirectory = new RAMDirectory();
            ramWriter = new IndexWriter(ramDirectory, new SimpleAnalyzer(), true);
        }

        public static void main(String args[]) throws Exception {
            Statement stmt = connection.createStatement();
            JournalIndexer indexer = new JournalIndexer("journals");
            int main_counter = 0;
            // SELECT ID, JOURNALTITLE, NLM_ID, ISSN, MEDLINE_ABBREVIATION, ISO_ABBREVIATION, ESSN "+
            ResultSet rs = stmt.executeQuery(sqlFetchJournals);
            while (rs.next()) {
                Journal journal = new Journal();
                /// set values
                main_counter++;
                indexer.add(journal);
            }
            indexer.close();
        }

        int count = 0;

        public void add(Journal journal) throws Exception {
            Document j_doc = new Document();
            // Field(String name, String string, boolean store, boolean index, boolean token)
            Field id = new Field(ID, "" + journal.getId(), true, true, false);
            j_doc.add(id);
            ramWriter.addDocument(j_doc);
            count++;
        }

        public void close() throws Exception {
            IndexWriter fileWriter = new IndexWriter(
                FSDirectory.getDirectory(dir, true), new SimpleAnalyzer(), true);
            Directory dirs[] = { ramDirectory };
            fileWriter.addIndexes(dirs);
            fileWriter.optimize();
            fileWriter.close();
        }

        class JournalAnalyzer extends Analyzer {
            public TokenStream tokenStream(String field, Reader reader) {
                TokenStream result = new WhitespaceTokenizer(reader);
                result = new LowerCaseFilter(result);
                return result;
            }
        }
    }
Re: BooleanQuery - Too Many Clases on date range.
How about using an integer-based filter instead of a datetime-based filter? A datetime can be converted to a Unix timestamp for comparison.

Thanks,
Che Dong
http://www.chedong.com/

Chris Fraschetti wrote:
> Surely some folks out there have used Lucene on a large scale and have had to compensate for this somehow. Any other solutions?
>
> Morus, thank you very much for your input. I am looking into your solution, just putting my feelers out there once more.
>
> The Lucene API is very limited in its descriptions of its components. Short of digging into the code, is there a good doc somewhere out there that explains the workings of Lucene?
>
> On Mon, 4 Oct 2004 01:57:06 -0700, Chris Fraschetti <[EMAIL PROTECTED]> wrote:
>> So before I spend a significant amount of time digging into the Lucene code, how does your experience with Lucene shed light on my situation? Our current index is pretty huge, and with each increase in size I've had, I've experienced a problem like this.
>>
>> Without taking up too much of your time, because obviously this is my task, I thought I'd ask whether you'd had any experience with this boolean clause nonsense. Of course it can be overcome, but if you know a quick hack, awesome; otherwise, no big deal, but off to work I go :)
>>
>> -Fraschetti
>>
>> -- Forwarded message --
>> From: Morus Walter <[EMAIL PROTECTED]>
>> Date: Mon, 4 Oct 2004 09:01:50 +0200
>> Subject: Re: BooleanQuery - Too Many Clases on date range.
>> To: Lucene Users List <[EMAIL PROTECTED]>, Chris Fraschetti <[EMAIL PROTECTED]>
>>
>> Chris Fraschetti writes:
>>> So I decided to move my epoch date to a 20040608-style date, which fixed my boolean query problem with regard to my current data size (approx. 600,000), but now as soon as I do a query like a* I get the boolean error again. Google obviously can handle this query, and I'm pretty sure Lucene can handle it. Any ideas? With or without a date range specified I still get the TooManyClauses error. I tried cranking the max clauses up to Integer.MAX_VALUE, but Java gave me an out of memory error. Is this because the boolean search tried to allocate that many clauses by default, or because my query actually needed that many clauses?
>>
>> Boolean search allocates clauses for all tokens having the prefix or matching the wildcard expression.
>>
>>> Why does it work on small indexes but not large?
>>
>> Because there are fewer tokens starting with a.
>>
>>> Is there any way to have the parser create as many clauses as it can and then search with what it has, without recompiling the source?
>>
>> You need to create your own version of Wildcard- and PrefixQuery that takes a maximum term number and ignores further clauses. And you need a variant of the query parser that uses these queries. This can be done, even without recompiling Lucene, but you will have to do some programming at the level of Lucene queries. It shouldn't be hard, since you can use the sources as a starting point.
>>
>> I guess this does not exist because the Lucene developers decided to prefer a query error over incomplete results.
>>
>> Morus
>
> --
> Chris Fraschetti, Student CompSci System Admin
> University of San Francisco
> e [EMAIL PROTECTED] | http://meteora.cs.usfca.edu
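Morus's explanation (one clause per indexed term that matches the prefix, plus the idea of a capped variant that ignores further terms) can be illustrated with a small sketch. It uses a plain sorted set of strings in place of the index's term dictionary; the class name, the sample terms and the cap values are assumptions chosen for illustration and none of this is Lucene code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class PrefixExpansion {
    /**
     * Lists the terms that start with the prefix, one per would-be clause.
     * Stops after maxClauses terms, so the result may be incomplete.
     */
    static List<String> expand(SortedSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // tailSet starts at the first term >= prefix; matching terms are contiguous
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix) || clauses.size() >= maxClauses) {
                break;
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(Arrays.asList(
                "apple", "apply", "archive", "banana", "comment", "commentary", "comments"));
        System.out.println(expand(terms, "a", 1024));       // every term starting with "a"
        System.out.println(expand(terms, "comment", 1024)); // a narrower prefix
        System.out.println(expand(terms, "a", 2));          // capped: silently incomplete
    }
}
```

The number of clauses depends on how many distinct terms the index holds under the prefix, not on the number of documents as such, which is why the same a* query can pass on a small index and fail on a large one. The capped call shows the trade-off Morus mentions: no error, but terms beyond the cap are dropped without notice.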
WebLucene 0.5 released: with a SAX based indexing sample Re: XML Indexing
http://sourceforge.net/projects/weblucene/

Regards,
Che Dong
http://www.chedong.com/tech/weblucene.html

Sumathi wrote:
> Can any one give me a demo for indexing XML files ?
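For readers who want to see the shape of SAX-based extraction before opening the WebLucene sample, here is a rough sketch using only the JDK's SAX parser. It is not taken from WebLucene: the class name, the flat record layout and the element names are assumptions. It collects the text of each leaf element into a name-to-value map; each entry is what one would then turn into a field on a Lucene Document.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class XmlFieldExtractor extends DefaultHandler {
    private final Map<String, String> fields = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            fields.put(qName, value); // element name -> text; a repeated name overwrites
        }
        text.setLength(0);
    }

    /** Parses the XML string and returns element-name -> text pairs in document order. */
    static Map<String, String> extract(String xml) {
        XmlFieldExtractor handler = new XmlFieldExtractor();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.fields;
    }

    public static void main(String[] args) {
        String xml = "<record><title>Lucene in brief</title><author>Jane Doe</author></record>";
        System.out.println(extract(xml));
    }
}
```

The sketch assumes flat records with unique element names; nested or repeated elements would need a stack of names or multi-valued fields.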
Re: BooleanQuery - Too Many Clases on date range.
On Oct 4, 2004, at 2:12 PM, Chris Fraschetti wrote:
> Absolutely, limiting the user's query is no problem here. I've currently implemented the Lucene JavaScript to catch a lot of user queries that could cause issues: blank queries, ? or * at the beginning of a query, etc. But I couldn't think of a way to prevent the user from doing a* while still allowing comment* for someone wanting comments or commentary. Any suggestions would be warmly welcomed.

I recommend subclassing QueryParser and overriding getPrefixQuery and getWildcardQuery. In both of the overridden methods, throw a ParseException. You should be handling ParseException gracefully somehow already, so that should do the trick.

    Erik
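Erik's suggestion rejects every prefix and wildcard query. One possible variation, sketched here as an assumption rather than anything proposed in the thread, is to reject only terms whose literal prefix is too short, which would let comment* through while refusing a*. The sketch is plain Java so that it runs without Lucene: `WildcardGuard`, `QueryRejectedException` (a stand-in for the parser's ParseException) and the minimum length of 3 are all hypothetical. Inside a QueryParser subclass, the same check would sit in the overridden methods before delegating to the superclass.

```java
class WildcardGuard {
    /** Hypothetical stand-in for the query parser's ParseException. */
    static class QueryRejectedException extends Exception {
        QueryRejectedException(String message) {
            super(message);
        }
    }

    /** Number of literal characters before the first '*' or '?'. */
    static int literalPrefixLength(String term) {
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                return i;
            }
        }
        return term.length();
    }

    /** True if the term has at least minPrefix literal characters before any wildcard. */
    static boolean isAllowed(String term, int minPrefix) {
        return literalPrefixLength(term) >= minPrefix;
    }

    /** Throws for terms that would expand too broadly; this is what an override would call. */
    static void check(String term, int minPrefix) throws QueryRejectedException {
        if (!isAllowed(term, minPrefix)) {
            throw new QueryRejectedException("Wildcard term '" + term
                    + "' needs at least " + minPrefix + " characters before the wildcard");
        }
    }

    public static void main(String[] args) {
        for (String term : new String[] {"a*", "comment*", "te?t"}) {
            try {
                check(term, 3);
                System.out.println(term + " accepted");
            } catch (QueryRejectedException e) {
                System.out.println(e.getMessage());
            }
        }
    }
}
```

A length threshold is only a heuristic: a three-letter prefix can still match more terms than the clause limit on a large index, so the TooManyClauses error still has to be handled.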
Re: XML Indexing
Check this article out.
http://www-106.ibm.com/developerworks/library/j-lucene/

Simon

- Original Message -
From: "Sumathi" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, October 05, 2004 2:02 PM
Subject: XML Indexing

> Can any one give me a demo for indexing XML files ?
XML Indexing
Can any one give me a demo for indexing XML files ?

Mit freundlichen Grüssen - with kind regards

Sumathi P
Junior Consultant QA
GFT Technologies, India
95, Bharathidasan Salai
Cantonment, Trichy-620001
TamilNadu, India
T +91-431-2418 398
F +91-431-2418 698
[EMAIL PROTECTED]
www.gft.com
Re: LUCENE and algorithm for asing score to hits
I'm sorry, friends. I put the wrong title on the message twice.
Re: hibernate and algorithm for asing score to hits
Hello,

Lucene's API includes classes that let you explain hit scores (e.g. http://research.yahoo.com/demo/nutch/explain.xml?idx=6&id=4752515&query=web)

Otis

--- "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> we are programming a few tests battery to see how lucene creates the scores in a search into a very simple and controllable documents.
>
> but we don't understand why the results looks not ok for us..
hibernate and algorithm for asing score to hits
(First, excuse my English.)

Hi people,

We are writing a small battery of tests to see how Lucene computes scores when searching a set of very simple, controlled documents, but we don't understand why the results look wrong to us. I will try to explain; the details of the tests are described at the bottom.

If you look at the tests you will see:

In test 1, query: info:house

Lucene returns a score of 50.00 for this document:

id      info
house2  house noise noise noise

and the same score for this document:

id      info
house3  house noise noise noise noise

In test 2, query: info:house

Lucene returns a higher score for this document:

id      info
house3  noise noiseso noise noiseso noise noiseso house house house house

than for this one:

id      info
house5  noise noiseso noise noiseso noise noiseso house house house house house house

I have looked at Lucene's algorithm for computing these scores. I don't understand all of it, but I think the results of these searches are not right for me. I need comments, or URLs to study, to improve my search method.

Thanks for all,
d2clon

---
TESTS
---

We always execute this query, using the StandardAnalyzer:

query: info:house

test 1
---

document list:

id       info
house0   house noise
house1   house noise noise
house2   house noise noise noise
house3   house noise noise noise noise
house4   house noise noise noise noise noise
nohouse  noise noiseso noise noiseso noise noiseso

scores:

id      score
house0  62.5
house1  50.0
house2  50.0
house3  43.75
house4  37.5

test 2
---

document list:

id       info
house0   noise noiseso noise noiseso noise noiseso house
house1   noise noiseso noise noiseso noise noiseso house house
house2   noise noiseso noise noiseso noise noiseso house house house
house3   noise noiseso noise noiseso noise noiseso house house house house
house4   noise noiseso noise noiseso noise noiseso house house house house house
house5   noise noiseso noise noiseso noise noiseso house house house house house house
house6   noise noiseso noise noiseso noise noiseso house house house house house house house
house7   noise noiseso noise noiseso noise noiseso house house house house house house house house
house8   noise noiseso noise noiseso noise noiseso house house house house house house house house house
house9   noise noiseso noise noiseso noise noiseso house house house house house house house house house house
nohouse  noise noiseso noise noiseso noise noiseso noise noiseso noise noiseso noise noiseso nohouse

scores:

id      score
house9  79.0569
house8  75.0
house7  70.7107
house6  66.1438
house3  62.5
house5  61.2372
house4  55.9017
house2  54.1266
house1  44.1942
house0  37.5
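One way to make sense of these numbers is to recompute them by hand. The sketch below is plain Java, not Lucene code, and rests on stated assumptions: that the default similarity uses tf = sqrt(term frequency) and lengthNorm = 1/sqrt(number of terms in the field); that the length norm is stored in a single byte which keeps only about two bits of mantissa, modelled here by truncating the float; and that idf and the query norm both come out as 1 for these two collections (if idf is ln(numDocs / (docFreq + 1)) + 1, that holds because all documents but one contain "house"). The class and method names are invented.

```java
class NormSketch {
    /** Keeps sign, exponent and the top two mantissa bits, dropping the rest. */
    static float quantize(float f) {
        return Float.intBitsToFloat(Float.floatToIntBits(f) & 0xFFE00000);
    }

    /** tf * lengthNorm for a single-term query, with idf and queryNorm taken as 1. */
    static double score(int termFreq, int fieldLength) {
        float lengthNorm = (float) (1.0 / Math.sqrt(fieldLength));
        return Math.sqrt(termFreq) * quantize(lengthNorm);
    }

    public static void main(String[] args) {
        // test 1: "house" once, followed by 1..5 "noise" tokens
        for (int noise = 1; noise <= 5; noise++) {
            System.out.printf("test 1 house%d %.4f%n", noise - 1, 100 * score(1, 1 + noise));
        }
        // test 2: six noise tokens followed by 1..10 "house" tokens
        for (int freq = 1; freq <= 10; freq++) {
            System.out.printf("test 2 house%d %.4f%n", freq - 1, 100 * score(freq, 6 + freq));
        }
    }
}
```

Under these assumptions the coarse norm is what produces the surprises. Field lengths 3 and 4 both quantize to 0.5, so two documents of different length tie at 50. In test 2, lengths 8 to 10 share the norm 0.3125 while lengths 11 to 16 drop to 0.25, so house3 (four occurrences, norm 0.3125) scores 62.5 and overtakes house5 (six occurrences, norm 0.25) at about 61.24.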
Cannot navigate to respective screens
Hi,

I'm new to Lucene. I have copied the luceneweb.war that comes with Lucene and have also created an index. I can get to the index page through http://localhost:8080/luceneweb. When I search for a word, say 'sample', I get a list of URLs pointing to files inside /webapps/.. But when I click those URLs I get an error stating that the files could not be found:

    HTTP Status 404 - /webapps/tomcat-docs/appdev/printer/processes.html
    type: Status report
    message: /webapps/tomcat-docs/appdev/printer/processes.html
    description: The requested resource (/webapps/tomcat-docs/appdev/printer/processes.html) is not available.

What could be the problem?

Thanks in advance!