the list of terms contained in a document
Dear all,
we have a linguistics project running here and we want to use Lucene for information retrieval. Rather than just searching for specific terms, we want to build frequency lists and detect co-occurrences of terms. What we need is roughly the following functionality (I give what I think the resulting API could look like):
1. IndexSearcher.search(query) (already implemented)
2. Hits.getLength() (already implemented)
3. for (...) Hits.doc(i).getTerms() or Hits.doc(i).getTerms(Field) (required)
(4. and, for each returned doc, the frequency of each term; but that is the same as above, or could it be retrieved together with the term list?)
This means that when I get a Hits object back, I want to get the terms and their frequencies for all of its documents. Sure, I could look each document up and parse it again. But if the first query produces, say, 20,000 hits, I would have to re-parse those 20,000 documents, even though this parsing was already done when the index was created. So I wanted to ask whether there is a way, within the existing classes (or at least using some of them plus some new ones), to retrieve this information: which terms a single document is assigned to.
Thanks a lot for any help or hints,
sincerely,
Chantal
-- To unsubscribe, e-mail: mailto:[EMAIL PROTECTED] For additional commands, e-mail: mailto:[EMAIL PROTECTED]
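Here is a minimal sketch of the behaviour requested above, written in plain Java rather than Lucene. It assumes that each document's term frequencies are recorded once, at indexing time, as a "forward index". With that in place, a frequency list or co-occurrence count over a result set needs no re-parsing. The class `ForwardIndex`, its methods, and the naive letter-only tokenizer are all hypothetical stand-ins, not Lucene API. A set of hits is modelled as a list of document numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Lucene API: each document's terms and frequencies
// are recorded once, when it is indexed, so they can be read back for any
// set of hits without re-parsing the documents.
class ForwardIndex {
    // Position in the list = document number; value = term -> frequency.
    private final List<Map<String, Integer>> docTerms = new ArrayList<>();

    // Naive tokenizer standing in for a real analyzer (assumption):
    // lowercase the text and split it on runs of non-letters.
    int addDocument(String text) {
        Map<String, Integer> freqs = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        docTerms.add(freqs);
        return docTerms.size() - 1;
    }

    // Roughly the requested Hits.doc(i).getTerms(): terms with their frequencies.
    Map<String, Integer> getTerms(int docId) {
        return Collections.unmodifiableMap(docTerms.get(docId));
    }

    // Frequency list over a result set: term frequencies summed across the hits.
    Map<String, Integer> frequencyList(List<Integer> hits) {
        Map<String, Integer> total = new TreeMap<>();
        for (int docId : hits) {
            docTerms.get(docId).forEach((term, freq) -> total.merge(term, freq, Integer::sum));
        }
        return total;
    }

    // Document-level co-occurrence: for each pair of distinct terms,
    // the number of hits that contain both.
    Map<String, Integer> cooccurrences(List<Integer> hits) {
        Map<String, Integer> pairs = new TreeMap<>();
        for (int docId : hits) {
            // A TreeMap's keys are sorted, so every pair is keyed "a+b" with a < b.
            List<String> terms = new ArrayList<>(docTerms.get(docId).keySet());
            for (int a = 0; a < terms.size(); a++) {
                for (int b = a + 1; b < terms.size(); b++) {
                    pairs.merge(terms.get(a) + "+" + terms.get(b), 1, Integer::sum);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        ForwardIndex index = new ForwardIndex();
        int d0 = index.addDocument("The cat saw the dog.");
        int d1 = index.addDocument("A dog and a cat");
        List<Integer> hits = Arrays.asList(d0, d1);
        System.out.println(index.getTerms(d0));        // {cat=1, dog=1, saw=1, the=2}
        System.out.println(index.frequencyList(hits)); // {a=2, and=1, cat=2, dog=2, saw=1, the=2}
        System.out.println(index.cooccurrences(hits).get("cat+dog")); // 2
    }
}
```

The design choice is to spend extra storage at indexing time so that reading a document's terms becomes a lookup rather than a second parse.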
SqlDirectory
Hi all,
some time ago there was a short discussion about a database store. I also needed a persistence layer that was accessible via JDBC. It turned out that a BLOB implementation is strongly dependent on the RDBMS used and also performs poorly. I implemented a SqlDirectory, based on the idea of RAMDirectory, with its buffers as the basic element.
Goals:
1. Should work with all JDBC-compliant RDBMSs (no adaptation required, no BLOBs!).
2. Performance should be acceptable.
3. Simple DB schema.
Status:
1. Tested on Oracle 8i (free Oracle type 4 JDBC driver) and SQL Server 2000 (free Microsoft type 4 JDBC beta driver); works perfectly.
2. Consists of 2 tables and 1 index (one tablespace can have several indexes, of course).
3. Promising performance.
To do:
1. Test reliability, performance and concurrency (multiple readers/writers); test with MySQL.
2. Code review.
3. Introduce caching (maybe a CacheDirectory).
If someone has experience with this or would just like to test it, mail me. Anyway, can I simply attach the SqlDirectory.java file to my mails?
marc
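The buffer-as-basic-element idea can be sketched as follows. This is a hypothetical illustration, not marc's SqlDirectory. A file is cut into fixed-size buffers, and each buffer would be stored as one row keyed by (file name, buffer number), so no BLOB column is needed. Random access becomes "pick the row, then the offset inside it". The buffer size, the `BufferedFileStore` name, and its methods are assumptions. The two "tables" are in-memory maps standing in for JDBC access.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the buffer-per-row idea, not marc's SqlDirectory.
// A file is split into fixed-size buffers, as RAMDirectory does in memory,
// and each buffer would be one row keyed by (file name, buffer number).
class BufferedFileStore {
    static final int BUFFER_SIZE = 1024; // assumed buffer size

    // Stands in for a files table: file name -> length in bytes.
    private final Map<String, Integer> files = new HashMap<>();
    // Stands in for a buffers table: (file name, buffer number) -> bytes.
    private final Map<String, byte[]> buffers = new HashMap<>();

    private static String rowKey(String name, int bufferNo) {
        return name + "#" + bufferNo;
    }

    private static int buffersFor(int length) {
        return (length + BUFFER_SIZE - 1) / BUFFER_SIZE; // ceiling division
    }

    void writeFile(String name, byte[] data) {
        deleteFile(name); // drop rows left over from an older version
        for (int i = 0; i < buffersFor(data.length); i++) {
            int from = i * BUFFER_SIZE;
            int to = Math.min(from + BUFFER_SIZE, data.length);
            buffers.put(rowKey(name, i), Arrays.copyOfRange(data, from, to));
        }
        files.put(name, data.length);
    }

    void deleteFile(String name) {
        Integer length = files.remove(name);
        if (length == null) {
            return;
        }
        for (int i = 0; i < buffersFor(length); i++) {
            buffers.remove(rowKey(name, i));
        }
    }

    // Random access: the position selects the row, then the offset inside it.
    byte readByte(String name, int position) {
        Integer length = files.get(name);
        if (length == null || position < 0 || position >= length) {
            throw new IndexOutOfBoundsException(name + " @ " + position);
        }
        byte[] buffer = buffers.get(rowKey(name, position / BUFFER_SIZE));
        return buffer[position % BUFFER_SIZE];
    }

    // Reassemble a whole file from its rows, in buffer order.
    byte[] readFile(String name) {
        Integer length = files.get(name);
        if (length == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        byte[] out = new byte[length];
        for (int i = 0; i < buffersFor(length); i++) {
            byte[] buffer = buffers.get(rowKey(name, i));
            System.arraycopy(buffer, 0, out, i * BUFFER_SIZE, buffer.length);
        }
        return out;
    }

    int rowCount() {
        return buffers.size();
    }
}
```

Each read in this sketch touches one row per buffer, so the caching layer listed under "to do" would target repeated reads of the same row.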
RE: Attribute Search
This is exactly what I was doing: Store=false, Index=true and Tokenize=false. This combination is *not* represented by any of the factory methods. It appeared to work OK, but searches *never* returned any hits. That's why I suspect it is a bug.
-----Original Message-----
From: Ype Kingma [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 21, 2001 2:51 PM
To: Lucene Users List
Subject: Re: Attribute Search
Paula,
I came across a tutorial which had some details on the static factory Field methods. But none of the factory methods return a Field object with the following settings: Store = false, Index = true, Tokenize = false. I'm beginning to think this is a bug: that this combination is not handled correctly.
The Field() constructor is public; can't you use that instead of one of the factory methods?
public Field(String name, String string, boolean store, boolean index, boolean token)
Regards,
Ype
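The gap can be made concrete with a small table. Each String-valued Field factory method in Lucene 1.x fixes the three flags it passes to the public constructor quoted above. The plain-Java model below records those flags. It is a model, not Lucene code, and `FieldFlags` and `factoryFor` are hypothetical names. Looking up the reported combination shows that it matches no factory method.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model (not Lucene code) of the flags that the Lucene 1.x
// String-valued Field factory methods pass to the public constructor
// Field(String name, String string, boolean store, boolean index, boolean token).
class FieldFlags {
    final String factory;
    final boolean store;
    final boolean index;
    final boolean token;

    FieldFlags(String factory, boolean store, boolean index, boolean token) {
        this.factory = factory;
        this.store = store;
        this.index = index;
        this.token = token;
    }

    static final List<FieldFlags> FACTORIES = Arrays.asList(
        new FieldFlags("Keyword",   true,  true,  false),
        new FieldFlags("UnIndexed", true,  false, false),
        new FieldFlags("Text",      true,  true,  true),
        new FieldFlags("UnStored",  false, true,  true));

    // Name of the factory method producing these flags, or null if none does.
    static String factoryFor(boolean store, boolean index, boolean token) {
        for (FieldFlags f : FACTORIES) {
            if (f.store == store && f.index == index && f.token == token) {
                return f.factory;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Print every combination that has index = true.
        for (boolean store : new boolean[] {true, false}) {
            for (boolean token : new boolean[] {true, false}) {
                String f = factoryFor(store, true, token);
                System.out.println("store=" + store + " index=true token=" + token + " -> "
                        + (f != null ? "Field." + f + "(...)" : "constructor only"));
            }
        }
        // The uncovered case would be built directly, as suggested in the reply:
        //   new Field(name, value, false, true, false)
    }
}
```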
RE: Attribute Search
From: New, Cecil (GEAE) [mailto:[EMAIL PROTECTED]]
This is exactly what I was doing: Store=false, Index=true and Tokenize=false. It appeared to work OK, but searches *never* returned any hits. That's why I suspect it is a bug.
If you think this is a bug, please submit a test case: a simple class whose main() method illustrates the problem.
Doug
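Such a test case would be a real Lucene class. The plain-Java toy below only illustrates one thing it should make explicit: the exact term that was indexed and the exact term that was searched. An untokenized field is indexed as its whole value, as a single term, so a query term matches only if it is identical. Searching such a field with text that has been through an analyzer is a common way to get zero hits. The thread does not say whether that is what happened here. `UntokenizedFieldCheck`, `termsFor` and `matches` are hypothetical. The lowercase-and-split rule merely stands in for an analyzer.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model, not Lucene: an untokenized field contributes its whole value as
// one term, so only an identical query term can match it.
class UntokenizedFieldCheck {
    // Terms produced for a field value: the whole string if untokenized,
    // otherwise lowercase words (a stand-in for an analyzer; assumption).
    static Set<String> termsFor(String value, boolean tokenize) {
        Set<String> terms = new HashSet<>();
        if (!tokenize) {
            terms.add(value);
        } else {
            for (String t : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!t.isEmpty()) {
                    terms.add(t);
                }
            }
        }
        return terms;
    }

    // Exact-term lookup, as a term query does against the indexed terms.
    static boolean matches(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        // Indexed with store=false, index=true, token=false: one term, "Part-42".
        Set<String> indexed = termsFor("Part-42", false);
        System.out.println(matches(indexed, "Part-42")); // true
        // The same text after analysis becomes "part" and "42"; neither matches.
        for (String q : termsFor("Part-42", true)) {
            System.out.println(q + " -> " + matches(indexed, q));
        }
    }
}
```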