Here is another version of something I had posted earlier. It attempts to
read the text out of binary files. It is not perfect and does not work at all
on PDF. It lets you use the Reader form of a Field for indexing.
import java.util.*;
import java.io.*;
/**
This class is designed to retrieve text
This program illustrates what may be a bug. It creates an index, a document
with two fields. The second field is the problem. I use the Field
constructor to make a field that is not stored, is indexed, and is not
tokenized (there is no factory method for this combination).
The program then queries
Inspired by the Unix strings command, I have written a subclass of
FilterReader, which I have called BinaryReader. The idea is simply to index
any proprietary file format by filtering out all non-printable characters.
The assumption is that text is text. It will end up with more than the
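
For readers who want to see the idea in code, here is a minimal sketch of a
FilterReader subclass that masks non-printable characters. It is not the
poster's BinaryReader: the class name PrintableTextReader, the helper readAll,
and the definition of "printable" as ASCII 0x20 to 0x7E are all assumptions
made for illustration. It also replaces each non-printable character with a
space instead of dropping it, so that words separated only by binary bytes do
not run together.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical sketch, not the BinaryReader from the original post.
class PrintableTextReader extends FilterReader {

    PrintableTextReader(Reader in) {
        super(in);
    }

    // Assumption: "printable" means ASCII 0x20..0x7E only.
    static boolean isPrintable(int c) {
        return c >= 0x20 && c <= 0x7E;
    }

    // Single-character read: pass printable characters through,
    // turn everything else into a space, and preserve end of stream (-1).
    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c == -1) {
            return -1;
        }
        return isPrintable(c) ? c : ' ';
    }

    // Bulk read: let the wrapped reader fill the buffer, then mask
    // non-printable characters in place. If n is -1 the loop does not run.
    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = off; i < off + n; i++) {
            if (!isPrintable(buf[i])) {
                buf[i] = ' ';
            }
        }
        return n;
    }

    // Convenience helper (hypothetical): drain a reader into a String.
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = r.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}
```

A reader like this can wrap any other Reader, for example one opened on a
binary file, and be handed to whatever expects a Reader. Restricting the
filter to ASCII is a simplification; text in other scripts or with accented
characters would need a wider definition of "printable".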
into account the accent if Latin type of locale?
-Original Message-
From: Cecil, Paula New [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 19, 2001 9:47 PM
To: LUCENE Text Search
Subject: Attribute Search
I am trying to index a set of data, storing only a primary key. This primary
Cecil, Paula New wrote:
This is my first message to this list. I have successfully created
several little tests of the Lucene API. In my last test, I am trying to
index data records. Only the primary key needs to be stored (and I did
not even index this field). For the others I want
I am trying to index a set of data, storing only a primary key. This primary key I
left un-indexed. There is one text field, which I indexed and tokenized.
The others I want neither to store nor to tokenize. My reasoning was that not
tokenizing would produce the smallest index. The remaining