[newbie] Confused about PrefixQuery

2005-01-19 Jerry Jalenak
All,

I'm investigating the use of Lucene as a search engine, and have been doing
some 'proof-of-concept' coding today.  I'm indexing about 650 text files,
and then searching against them using QueryParser.  Here's the indexing code
snippet:

<snip>
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public static void Result(IndexWriter indexWriter, File file)
    throws FileNotFoundException
{
    Document document = null;
    String content = "";    // body text of the current record

    BufferedReader br = new BufferedReader(new FileReader(file));

    try
    {
        String s;
        while ((s = br.readLine()) != null)
        {
            // Control lines start with a code (CC, AN, FF); the fields after
            // the code begin at index 3. startsWith() also avoids an
            // exception on lines shorter than three characters.
            if (s.startsWith("CC"))
            {
                // CC line: start a new document for this record
                document = new Document();

                document.add(Field.Text("account", s.substring(3, 7)));

                document.add(Field.Keyword("created",
                    s.substring(s.indexOf("DC") + 3, s.indexOf("DC") + 11)));

                content = "";
            }
            else if (s.startsWith("AN"))
            {
                // AN line: fixed-width name and identification fields
                document.add(Field.Keyword("lastname",
                    s.substring(3, 28).trim().toLowerCase()));
                document.add(Field.Keyword("firstname",
                    s.substring(28, 43).trim().toLowerCase()));
                document.add(Field.Text("name",
                    s.substring(28, 43).trim() + " " + s.substring(3, 28).trim()));

                document.add(Field.Keyword("controlnumber", s.substring(44, 52)));
                document.add(Field.Keyword("status", s.substring(52, 53).trim()));
                document.add(Field.Keyword("ssn", s.substring(53, 62)));
                document.add(Field.Keyword("dob", s.substring(62, 70)));

                document.add(Field.Keyword("collected", s.substring(137, 145)));
            }
            else if (s.startsWith("FF"))
            {
                // FF line: end of record; index the accumulated body text
                document.add(Field.UnStored("content", content));
                indexWriter.addDocument(document);
            }
            else
            {
                // Any other line is body text
                content = content + s + "\n";
            }
        }
    }
    catch (IOException ioe)
    {
        System.out.println(ioe.getClass() + " caught with message "
            + ioe.getMessage());
    }
    finally
    {
        // Close the reader even if indexing fails part-way
        try { br.close(); } catch (IOException ignored) { }
    }
}
</snip>
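Stripped of the Lucene calls, the method above is a small state machine over a file's lines: a CC line opens a record, an AN line supplies fixed-width fields, an FF line closes the record, and anything else is body text. The following is a minimal plain-Java sketch of just that grouping step, so the record boundaries and column offsets can be checked without building an index. The class `RecordSplitter`, its `ResultRecord` holder and the sample lines are hypothetical; only the CC/AN/FF codes and the `substring` offsets are taken from the code above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: groups a file's lines into records the same way
// Result() does, but without Lucene, so the boundaries can be checked.
// Only the CC/AN/FF codes and the substring offsets come from the post.
public class RecordSplitter {

    // Plain holder standing in for a Lucene Document (hypothetical).
    public static class ResultRecord {
        public String account;
        public String lastname;
        public final StringBuilder content = new StringBuilder();
    }

    public static List<ResultRecord> split(List<String> lines) {
        List<ResultRecord> records = new ArrayList<>();
        ResultRecord current = null;   // record being built, if any
        // Unlike Result(), AN/FF lines that arrive before any CC line
        // are skipped here instead of causing a NullPointerException.
        for (String s : lines) {
            if (s.startsWith("CC")) {
                // CC opens a new record; account is s.substring(3, 7)
                current = new ResultRecord();
                current.account = s.substring(3, 7);
            } else if (s.startsWith("AN") && current != null) {
                // AN carries the fixed-width last name in s.substring(3, 28)
                current.lastname = s.substring(3, 28).trim().toLowerCase();
            } else if (s.startsWith("FF") && current != null) {
                // FF closes the record (where Result() calls addDocument)
                records.add(current);
                current = null;
            } else if (current != null) {
                // Anything else inside a record is body text
                current.content.append(s).append('\n');
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample: a 7-character CC line and a 28-character AN line
        List<String> lines = new ArrayList<>();
        lines.add("CC 1234");
        lines.add("AN " + String.format("%-25s", "Smith"));
        lines.add("first body line");
        lines.add("FF");

        List<ResultRecord> records = split(lines);
        ResultRecord r = records.get(0);
        System.out.println(records.size() + " " + r.account + " " + r.lastname);
    }
}
```

Running `main` prints `1 1234 smith`: one record, with the account and the trimmed, lowercased last name. Feeding a few real files through a sketch like this is a quick way to confirm that the offsets match the data before anything is indexed.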

The text files have two control lines at the beginning of them - CC and
AN.  I extract particular fields from these lines and add them to my
document.  Everything (I think) indexes correctly.  When I search against
this index, though, I get some weird results, especially when using an '*'
at the end of my criteria.  Here's the search code snippet:

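<snip>
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public static void main(String[] args)
{
    try
    {
        Searcher searcher = new IndexSearcher("c:\\ResultIndex");
        Analyzer analyzer = new StandardAnalyzer();

        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        while (true)
        {
            System.out.println("Query: ");
            String s = br.readLine();
            if (null == s)
            {
                break;
            }
            else
            {
                // Terms without an explicit field search the "content" field
                Query query = QueryParser.parse(s, "content", analyzer);
                System.out.println("Searching for: " + query.toString("content"));

                Hits hits = searcher.search(query);
                System.out.println("... Found " + hits.length() + " matching documents");
                System.out.println();

                for (int i = 0; i < hits.length(); i++)
                {
                    Document document = hits.doc(i);
                    System.out.println("Hit " + i + ": Specimen = " + document.get("controlnumber")
                        + ", Account = " + document.get("account")
                        + ", Status = " + document.get("status")
                        + ", Name = " + document.get("name")
                        + ", SSN = " + document.get("ssn") +
[...]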

Re: [newbie] Confused about PrefixQuery

2005-01-19 Erik Hatcher
On Jan 19, 2005, at 3:16 PM, Jerry Jalenak wrote:
> The text files have two control lines at the beginning of them - CC
> and AN.

That's quite a complex example to ask a user list to decipher.
Simplifying the example, besides making it easier for us to understand,
would likely shed light on the problem.

> Everything (I think) indexes correctly.

To be sure, try Luke out and see what got indexed exactly.  You can
also use Luke as an ad-hoc search tool rather than writing your own.

> When I search against this index, though, I get some weird results,
> especially when using an '*' at the end of my criteria.

The results you got definitely are weird given the query, and in my
initial glance through your code I did not see the issue pop out.  Luke
will likely shed much more light on the matter.
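As background for the subject line: QueryParser turns a term with a trailing `*` into a PrefixQuery, and a PrefixQuery is matched against the terms that were actually indexed, not against the original text. Conceptually it walks the field's sorted term dictionary from the prefix onward and collects every term that begins with it. The plain-Java model below uses a `TreeSet` to stand in for one field's term dictionary; the class name `PrefixExpansion` and the terms are made up, and it is a simplified model rather than Lucene's implementation. It illustrates why seeing exactly what got indexed is the fastest way to explain surprising prefix matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Simplified model of prefix expansion; NOT Lucene's code. The sorted
// set stands in for the term dictionary of a single field.
public class PrefixExpansion {

    // Collect every term that starts with prefix, scanning the sorted
    // terms from the first one >= prefix and stopping at the first miss.
    public static List<String> expand(SortedSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String t : terms.tailSet(prefix)) {
            if (!t.startsWith(prefix)) {
                break;   // sorted order: no later term can match
            }
            matches.add(t);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical indexed terms, e.g. a lowercased Keyword field
        SortedSet<String> terms = new TreeSet<>();
        terms.add("smith");
        terms.add("smithers");
        terms.add("smyth");
        terms.add("jones");

        System.out.println(expand(terms, "smi"));
        System.out.println(expand(terms, "Smi"));
    }
}
```

Here `smi` expands to `smith` and `smithers`, while `Smi` expands to nothing, because terms are compared exactly: a prefix that differs in case from the indexed terms (for example, the lowercased `lastname` terms the indexer writes) matches no documents.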

Erik
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: [newbie] Confused about PrefixQuery

2005-01-19 Jerry Jalenak
Erik,

Thanks for the reply.  Some lists want all the info, some don't.  Just thought
I'd try to provide as much info as possible  8-)

That being said, where do I find Luke?



Jerry Jalenak
Senior Programmer / Analyst, Web Publishing
LabOne, Inc.
10101 Renner Blvd.
Lenexa, KS  66219
(913) 577-1496

[EMAIL PROTECTED]


> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 19, 2005 2:42 PM
> To: Lucene Users List
> Subject: Re: [newbie] Confused about PrefixQuery

This transmission (and any information attached to it) may be confidential and
is intended solely for the use of the individual or entity to which it is
addressed. If you are not the intended recipient or the person responsible for
delivering the transmission to the intended recipient, be advised that you
have received this transmission in error and that any use, dissemination,
forwarding, printing, or copying of this information is strictly prohibited.
If you have received this transmission in error, please immediately notify
LabOne at the following email address: [EMAIL PROTECTED]





RE: [newbie] Confused about PrefixQuery

2005-01-19 Jerry Jalenak
<oops />

Never mind.  Stupid, stupid assumption on my part with the data.

Thanks anyway.



> -----Original Message-----
> From: Jerry Jalenak [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 19, 2005 3:12 PM
> To: 'Lucene Users List'
> Subject: RE: [newbie] Confused about PrefixQuery






RE: [newbie] Confused about PrefixQuery

2005-01-19 Jerry Jalenak
Sorry.  Thought Luke came bundled with Lucene, and I was just missing it.



> -----Original Message-----
> From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 19, 2005 3:28 PM
> To: Lucene Users List
> Subject: Re: [newbie] Confused about PrefixQuery
>
> On Jan 19, 2005, at 4:12 PM, Jerry Jalenak wrote:
> > Thanks for the reply.  Some lists want all the info, some don't.  Just
> > thought I'd try to provide as much info as possible  8-)
>
> The info is good... I just push for simple examples :)  By simplifying,
> often the problem becomes apparent and trivial.
>
> > That being said, where do I find Luke?
>
> Silly response, but go to Google, type in _luke lucene_ and press
> "I'm feeling lucky" :)
>
> But, since I already have the URL handy, here it is:
>
>   http://www.getopt.org/luke/
>
>   Erik


