the list of terms contained in a document

2001-11-26 Thread Chantal Ackermann


dear all,

we have a linguistics project running here and we want to use lucene for the
information retrieval. rather than just searching for specific terms, we want
to build frequency lists and detect co-occurrences of terms.

what we need is something like the following functionality (I will give what I
think could be a resulting API):

1. IndexSearcher.search(query) (already implemented)
2. Hits.length() (already implemented)
3. for (...) Hits.doc(i).getTerms() or Hits.doc(i).getTerms(Field) (required)
(4. and for each returned doc its frequency, but that is the same as above -
or could it be retrieved together with the term list?)

This means that if I get a Hits object back, I want to get the terms and their
frequencies for all its documents. sure, I could look each document up and
parse it - again. but if the first query produces, say, 20,000 hits, I would
have to reparse these 20,000 documents, even though this parsing has already
been done for the index creation. so I wanted to ask whether there is a
possibility within the existing classes (or at least with some use of them and
some new ones) to retrieve this information: which terms a single document is
assigned to.
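
A rough sketch of the idea, in plain Java rather than Lucene code: an inverted index maps each term to the documents that contain it, so the per-document term list asked for above can be rebuilt by walking the posting lists once and regrouping the entries by document, without reparsing any text. The class `DocTermFrequencies`, the `Posting` type and the `invert` method are hypothetical names used only for illustration. In Lucene the posting data would presumably come from the IndexReader's term and term-document enumerations; that mapping is an assumption and is not shown here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of regrouping an inverted index by document; not Lucene code. */
public class DocTermFrequencies {

    /** One posting: a document number and how often the term occurs in it. */
    public static final class Posting {
        final int doc;
        final int freq;

        public Posting(int doc, int freq) {
            this.doc = doc;
            this.freq = freq;
        }
    }

    /**
     * Walks every term's posting list once and regroups the entries by document.
     * Returns: document number -> (term -> frequency in that document).
     */
    public static Map<Integer, Map<String, Integer>> invert(Map<String, List<Posting>> postings) {
        Map<Integer, Map<String, Integer>> byDoc = new TreeMap<>();
        for (Map.Entry<String, List<Posting>> entry : postings.entrySet()) {
            for (Posting p : entry.getValue()) {
                byDoc.computeIfAbsent(p.doc, d -> new TreeMap<>())
                     .put(entry.getKey(), p.freq);
            }
        }
        return byDoc;
    }

    public static void main(String[] args) {
        // Hypothetical postings: term -> list of (document, frequency).
        Map<String, List<Posting>> postings = new LinkedHashMap<>();
        postings.put("lucene", Arrays.asList(new Posting(0, 2), new Posting(2, 1)));
        postings.put("index", Arrays.asList(new Posting(0, 1), new Posting(1, 3)));
        System.out.println(invert(postings));
    }
}
```

The regrouping costs one pass over all postings, so for a large index it would make sense to do it once and restrict the result to the documents in the hit list, rather than repeating it per query.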

thanx a lot for any help or hint
sincerely,
Chantal


--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




SqlDirectory

2001-11-26 Thread Marc Kramis

hi all

some time ago, there was a short discussion about a database store. I also
needed a persistence layer that was accessible via JDBC. It turned out that a
BLOB implementation depends strongly on the RDBMS used and also performs
poorly.

I implemented a SqlDirectory, based on the idea of RAMDirectory, with its
buffers as the basic element.
goals:
1. should work with all JDBC compliant RDBMS (no adaptation required, no
blobs!).
2. performance should be acceptable.
3. simple db schema.

status:
1. tested on Oracle 8i (free oracle JDBC driver type 4) and SQL Server 2000
(free microsoft JDBC beta driver type 4). works perfectly.
2. consists of 2 tables and 1 index. (one tablespace can have several
indexes of course)
3. promising performance.
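
The message does not describe the schema, so the following is only one possible reading of "buffers as basic element": each index file is split into fixed-size buffers, and each buffer is stored as one row keyed by file name and buffer number. The sketch below models that layout in memory with plain Java collections. `BufferRowStore`, its methods and the map standing in for a buffer table are all hypothetical; no JDBC calls are shown, and how the bytes would be encoded in a non-BLOB column is left open.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for a "file split into fixed-size buffer rows" layout. */
public class BufferRowStore {
    private final int bufferSize;
    // Stand-in for a hypothetical buffer table: file name -> (buffer number -> bytes).
    private final Map<String, TreeMap<Integer, byte[]>> bufferRows = new HashMap<>();

    public BufferRowStore(int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.bufferSize = bufferSize;
    }

    /** Splits the content into buffers of at most bufferSize bytes, one row each. */
    public void write(String fileName, byte[] content) {
        TreeMap<Integer, byte[]> rows = new TreeMap<>();
        int bufferNumber = 0;
        for (int offset = 0; offset < content.length; offset += bufferSize) {
            int end = Math.min(offset + bufferSize, content.length);
            rows.put(bufferNumber++, Arrays.copyOfRange(content, offset, end));
        }
        bufferRows.put(fileName, rows);
    }

    /** Reassembles a file by concatenating its rows in buffer-number order. */
    public byte[] read(String fileName) {
        TreeMap<Integer, byte[]> rows = bufferRows.get(fileName);
        if (rows == null) {
            throw new IllegalArgumentException("no such file: " + fileName);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] row : rows.values()) {
            out.write(row, 0, row.length);
        }
        return out.toByteArray();
    }

    /** Number of buffer rows currently held for the file (0 if unknown). */
    public int rowCount(String fileName) {
        TreeMap<Integer, byte[]> rows = bufferRows.get(fileName);
        return rows == null ? 0 : rows.size();
    }
}
```

With a buffer size of 4, a 10-byte file occupies three rows (4 + 4 + 2 bytes). Keeping rows small and uniformly sized is what would let such a layout avoid vendor-specific large-object types, at the cost of more rows per file.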

todo:
1. test reliability, performance, concurrency (multiple readers/writers); test
with MySQL
2. code review
3. introduce caching (maybe CacheDirectory)

if anyone has experience with this or would just like to test it, mail me.
Anyway, could I simply attach the SqlDirectory.java file to my mails?

marc







RE: Attribute Search

2001-11-26 Thread New, Cecil (GEAE)

this is exactly what I was doing.  Store=false, index=true, and token=false.

This combination is *not* represented by one of the factory methods.  It
appeared to work ok, but searches *never* returned any hits.

That's why I suspect it is a bug.

-----Original Message-----
From: Ype Kingma [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 21, 2001 2:51 PM
To: Lucene Users List
Subject: Re: Attribute Search


Paula,

> I came across a tutorial which had some details on the static factory Field
> methods.  But none of the factory methods return a Field object with the
> following settings:
> Store = false
> Index = true
> Tokenize = false
>
> I'm beginning to think this is a bug - that this combination is not handled
> correctly.

The Field() constructor is public, can't you use that instead of one
of the factory methods?

public Field(String name,
             String string,
             boolean store,
             boolean index,
             boolean token)

Regards,
Ype





RE: Attribute Search

2001-11-26 Thread Doug Cutting

> From: New, Cecil (GEAE) [mailto:[EMAIL PROTECTED]]
>
> this is exactly what I was doing.  Store=false, index=true,
> and token=false.
>
> It appeared to work ok, but searches *never* returned any hits.
>
> That's why I suspect it is a bug.

If you think this is a bug, please submit a test case, as a simple class
whose 'main()' method illustrates the problem.
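
The thread does not establish why the searches return nothing. One thing such a test case could rule out first is a mismatch between the indexed term and the query term: an untokenized field is indexed as a single term with its exact value, so a query whose text is lower-cased or split by an analyzer would look for different terms. The plain-Java model below illustrates only that effect. It uses no Lucene classes, and `UntokenizedFieldModel`, its methods and the toy analyzer are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy model of an indexed, untokenized field; not Lucene code. */
public class UntokenizedFieldModel {
    private final Set<String> indexedTerms = new HashSet<>();

    /** Index=true, Tokenize=false: the whole value becomes one term, unchanged. */
    public void addUntokenized(String value) {
        indexedTerms.add(value);
    }

    /** Toy analyzer (an assumption): lower-cases and splits on non-alphanumerics. */
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Exact term lookup: the query term is used as given. */
    public boolean matchesExact(String term) {
        return indexedTerms.contains(term);
    }

    /** Lookup after analysis: true if any analyzed token is an indexed term. */
    public boolean matchesAnalyzed(String queryText) {
        for (String token : analyze(queryText)) {
            if (indexedTerms.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        UntokenizedFieldModel field = new UntokenizedFieldModel();
        field.addUntokenized("PN-1234");
        System.out.println("exact term:     " + field.matchesExact("PN-1234"));
        System.out.println("analyzed query: " + field.matchesAnalyzed("PN-1234"));
    }
}
```

In the model the exact lookup finds the value while the analyzed query does not, because the analyzer turns "PN-1234" into the tokens "pn" and "1234". If a real test case built its query from the exact indexed value and still found nothing, that would point more strongly toward a bug.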

Doug
