[newbie] Confused about PrefixQuery
All,

I'm investigating the use of Lucene as a search engine, and have been doing some 'proof-of-concept' coding today. I'm indexing about 650 text files, and then searching against them using QueryParser. Here's the indexing code snippet:

<snip>
public static void Result(IndexWriter indexWriter, File file) throws FileNotFoundException {
    Document document = null;
    String content = "";
    BufferedReader br = new BufferedReader(new FileReader(file));
    boolean EOF = false;
    try {
        while (!EOF) {
            String s = (String) br.readLine();
            if (null == s) {
                EOF = true;
            } else {
                if (!"".equals(s) && CC.equals(s.substring(0, 3))) {
                    document = new Document();
                    document.add(Field.Text("account", s.substring(3, 7)));
                    document.add(Field.Keyword("created", s.substring(s.indexOf(DC) + 3, s.indexOf(DC) + 11)));
                    content = new String();
                } else if (!"".equals(s) && AN.equals(s.substring(0, 3))) {
                    document.add(Field.Keyword("lastname", s.substring(3, 28).trim().toLowerCase()));
                    document.add(Field.Keyword("firstname", s.substring(28, 43).trim().toLowerCase()));
                    document.add(Field.Text("name", s.substring(28, 43).trim() + " " + s.substring(3, 28).trim()));
                    document.add(Field.Keyword("controlnumber", s.substring(44, 52)));
                    document.add(Field.Keyword("status", s.substring(52, 53).trim()));
                    document.add(Field.Keyword("ssn", s.substring(53, 62)));
                    document.add(Field.Keyword("dob", s.substring(62, 70)));
                    document.add(Field.Keyword("collected", s.substring(137, 145)));
                } else if (!"".equals(s) && FF.equals(s.substring(0, 3))) {
                    document.add(Field.UnStored("content", content));
                    indexWriter.addDocument(document);
                } else {
                    content = content + s + "\n";
                }
            }
        }
        br.close();
    } catch (IOException ioe) {
        System.out.println(ioe.getClass() + " caught with message " + ioe.getMessage());
    }
}
</snip>

The text files have two control lines at the beginning of them - CC and AN. I extract particular fields from these lines and add them to my document. Everything (I think) indexes correctly.
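One way to check the "extract particular fields" step without Lucene in the picture is to slice a single control line with the same offsets and print what comes out. The following is only a minimal sketch under stated assumptions: the class name ControlLineCheck, the helper slice, and the sample record are all made up for illustration, and only a few of the AN offsets from the snippet are included.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ControlLineCheck {

    // Returns the trimmed substring [start, end), or null when the line is
    // too short, instead of throwing StringIndexOutOfBoundsException.
    static String slice(String line, int start, int end) {
        if (line == null || line.length() < end) {
            return null;
        }
        return line.substring(start, end).trim();
    }

    // Offsets copied from the indexing snippet; only a few fields are shown.
    static Map<String, String> parseNameLine(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("lastname", slice(line, 3, 28));
        fields.put("firstname", slice(line, 28, 43));
        fields.put("controlnumber", slice(line, 44, 52));
        fields.put("status", slice(line, 52, 53));
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample record (not real data): 3-character marker,
        // 25-character last name, 15-character first name, one filler
        // character, 8-character control number, 1-character status.
        String line = String.format("%-3s%-25s%-15s%1s%-8s%-1s",
                "AN", "Smith", "Pat", "", "00012345", "A");
        for (Map.Entry<String, String> e : parseNameLine(line).entrySet()) {
            // Brackets make stray whitespace or shifted columns visible.
            System.out.println(e.getKey() + " = [" + e.getValue() + "]");
        }
    }
}
```

Printing each value in brackets makes it easy to spot a column that is shifted by one character or a field that came back null because the line was shorter than expected.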
When I search against this index, though, I get some weird results, especially when using an '*' at the end of my criteria. Here's the search code snippet:

<snip>
public static void main(String[] args) {
    try {
        Searcher searcher = new IndexSearcher("c:\\ResultIndex");
        Analyzer analyzer = new StandardAnalyzer();
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        while (true) {
            System.out.println("Query: ");
            String s = br.readLine();
            if (null == s) {
                break;
            } else {
                Query query = QueryParser.parse(s, "content", analyzer);
                System.out.println("Searching for: " + query.toString("content"));
                Hits hits = searcher.search(query);
                System.out.println("... Found " + hits.length() + " matching documents");
                System.out.println();
                for (int i = 0; i < hits.length(); i++) {
                    Document document = hits.doc(i);
                    System.out.println("Hit " + i + ": Specimen = " + document.get("controlnumber") + ", Account = " + document.get("account") + ", Status = " + document.get("status") + ", Name = " + document.get("name") + ", SSN = " + document.get("ssn") +
Re: [newbie] Confused about PrefixQuery
On Jan 19, 2005, at 3:16 PM, Jerry Jalenak wrote:
> The text files have two control lines at the beginning of them - CC and AN.

That's quite a complex example to ask a user list to decipher. Simplifying the example, besides making it easier for us to understand, would likely shed light on the problem.

> Everything (I think) indexes correctly.

To be sure, try Luke out and see what got indexed exactly. You can also use Luke as an ad-hoc search tool rather than writing your own.

> When I search against this index, though, I get some weird results, especially when using an '*' at the end of my criteria.

The results you got definitely are weird given the query, and in my initial glance through your code I did not see the issue pop out. Luke will likely shed much more light on the matter.

Erik

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
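Seeing "what got indexed exactly" matters for a trailing '*' because a prefix query matches against the exact term text stored for a field. As a rough, Lucene-free illustration of that idea, the sketch below keeps a sorted set of terms and collects the ones that start with a given prefix. It is only a simplified model, not Lucene's implementation; the class PrefixSketch, the method termsWithPrefix, and the term list are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class PrefixSketch {

    // Collects every term that starts with the prefix by walking the sorted
    // set from the prefix onward and stopping at the first non-match. This
    // works because all terms sharing a prefix are adjacent in sorted order.
    static List<String> termsWithPrefix(TreeSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical terms for a single field (not taken from the thread).
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(
                "smith", "smithers", "smythe", "Smith", "jones"));
        System.out.println(termsWithPrefix(terms, "smi")); // prints [smith, smithers]
        System.out.println(termsWithPrefix(terms, "Smi")); // prints [Smith]
    }
}
```

The two calls return different results because the comparison is case-sensitive, so a term stored as "Smith" is not found by the prefix "smi". Listing the actual terms in the index, as Luke does, is a quick way to see which form was stored.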
RE: [newbie] Confused about PrefixQuery
Erik,

Thanks for reply. Some lists want all the info, some don't. Just thought I'd try to provide as much info as possible 8-)

That being said, where do I find Luke?

Jerry Jalenak
Senior Programmer / Analyst, Web Publishing
LabOne, Inc.
10101 Renner Blvd.
Lenexa, KS 66219
(913) 577-1496
[EMAIL PROTECTED]

This transmission (and any information attached to it) may be confidential and is intended solely for the use of the individual or entity to which it is addressed. If you are not the intended recipient or the person responsible for delivering the transmission to the intended recipient, be advised that you have received this transmission in error and that any use, dissemination, forwarding, printing, or copying of this information is strictly prohibited. If you have received this transmission in error, please immediately notify LabOne at the following email address: [EMAIL PROTECTED]
RE: [newbie] Confused about PrefixQuery
oops / Never mind. Stupid, stupid assumption on my part with the data. Thanks anyway.

Jerry Jalenak
RE: [newbie] Confused about PrefixQuery
Sorry. Thought Luke came bundled with Lucene, and I was just missing it..

Jerry Jalenak

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 19, 2005 3:28 PM
To: Lucene Users List
Subject: Re: [newbie] Confused about PrefixQuery

On Jan 19, 2005, at 4:12 PM, Jerry Jalenak wrote:
> Thanks for reply. Some lists want all the info, some don't. Just thought I'd try to provide as much info as possible 8-)

The info is good... I just push for simple examples :) By simplifying, often the problem becomes apparent and trivial.

> That being said, where do I find Luke?

Silly response, but go to Google, type in _luke lucene_ and press I'm feeling lucky :) But, since I already have the URL handy, here it is: http://www.getopt.org/luke/

Erik