Re: using list of items to be excluded while querying

2008-10-16 Thread Mark Harwood
Yes, use TermsFilter to add your 5000 terms by calling TermsFilter.addTerm(term) repeatedly, then put that single filter as a single "not" clause in a BooleanFilter. Cheers, Mark. On 17 Oct 2008, at 04:02, "prabin meitei" <[EMAIL PROTECTED]> wrote: Hi, Thanks for the reply. I looked through the Fi

Re: IndexSearcher update

2008-10-16 Thread Anshum
Yes you may do that as well... no updates are noted by the searcher until it (the searcher) is updated :) -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw On Thu, Oct 16, 2008

Re: using list of items to be excluded while querying

2008-10-16 Thread prabin meitei
Hi, Thanks for the reply. I looked through the Filter class. I think I can use the TermFilter for my requirement. But I have a few doubts regarding the use of termFilter. Can I add any number of terms to it, say 5000 terms? Is there any limit? Can I use this term filter by adding to another boolean

Re: Link map over results? or term freq

2008-10-16 Thread Darren Govoni
Very nice work. That's what I'm wanting to do. Without giving away your paper thesis algorithm, does it use TFV's? Are the tags in the cloud calculated from the result documents by some non-Lucene scoring mechanism? I could probably come up with one given the results, but was curious if Lucene mad

Re: Link map over results? or term freq

2008-10-16 Thread Darren Govoni
You guys are so awesome! Thank you for the detailed and thoughtful responses. I will eagerly look at your work! I see the tag cloud thing as similar to clustering, as N. Hira mentions, except it's more "fuzzy" than clustering. I also recently looked at carrot as well and am learning what it does, but ev

Re: Link map over results? or term freq

2008-10-16 Thread Glen Newton
See also: http://zzzoot.blogspot.com/2007/10/drill-clouds-for-search-refinement-id.html and http://zzzoot.blogspot.com/2007/10/tag-cloud-inspired-html-select-lists.html -glen 2008/10/16 Glen Newton <[EMAIL PROTECTED]>: > Yes, tag clouds. > > I've implemented them using Lucene here for NRC Resear

Re: Link map over results? or term freq

2008-10-16 Thread Glen Newton
Yes, tag clouds. I've implemented them using Lucene here for NRC Research Press articles: http://lab.cisti-icist.nrc-cnrc.gc.ca/ungava/Search?tagCloud=true&collection=jos&tagField=keyword&keyword=%22chromatin%22&numCloudDocs=200&numCloudTags=50&sortBy=relevance and here on the Colorado State Univ

Re: Link map over results? or term freq

2008-10-16 Thread N. Hira
I think I understand what you're describing as a "link map" to be a "tag cloud" where each tag is a "frequent" or "strong" term. We did something like this as an experiment (without Lucene): http://www.cognocys.com/prospector/news.html If you're talking about something similar, then I think yo

Re: Link map over results? or term freq

2008-10-16 Thread Darren Govoni
I guess a link map (as I understand it) is a collection of hyperlinks of words/phrases where the dominant ones are in a bolder color and a larger font. It's a relatively new scheme that some sites are using. For example, someone searches for a person and a link map would show them all the most frequent terms i
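
The link map described above (frequent terms shown in a larger font) can be sketched roughly as follows. This is a minimal illustration in plain Java, not Lucene code: the class name LinkMapSketch, its methods, and the pixel range are all hypothetical, and it simply counts words in result texts that are assumed to be already in memory. Stop-word removal is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: term counts over result texts mapped to font sizes.
final class LinkMapSketch {

    // Counts how often each lower-cased word occurs across the result texts.
    static Map<String, Integer> termCounts(List<String> resultTexts) {
        Map<String, Integer> counts = new HashMap<>();
        for (String text : resultTexts) {
            for (String word : text.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Keeps the topN most frequent terms and scales each one linearly
    // to a font size between minPx and maxPx (ties broken alphabetically).
    static Map<String, Integer> fontSizes(Map<String, Integer> counts,
                                          int topN, int minPx, int maxPx) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<Map.Entry<String, Integer>> top =
                entries.subList(0, Math.min(topN, entries.size()));
        Map<String, Integer> sizes = new LinkedHashMap<>();
        if (top.isEmpty()) {
            return sizes;
        }
        int max = top.get(0).getValue();
        int min = top.get(top.size() - 1).getValue();
        for (Map.Entry<String, Integer> e : top) {
            int size = (max == min)
                    ? maxPx
                    : minPx + (e.getValue() - min) * (maxPx - minPx) / (max - min);
            sizes.put(e.getKey(), size);
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<String> texts = List.of("lucene search lucene", "lucene index search", "index");
        System.out.println(fontSizes(termCounts(texts), 3, 10, 30));
    }
}
```

In a real index the counts would more likely come from stored term statistics than from re-tokenizing text, so treat the counting step as a stand-in.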

Re: using list of items to be excluded while querying

2008-10-16 Thread Erick Erickson
This sounds like a filter would work here. The basic idea of a filter is that it's a bitmap where each bit's ordinal position represents a doc ID. Only documents corresponding to "on" bits are returned. Filters can be combined, flipped, etc. All the things you'd expect to do with a bunch of bi
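
A minimal sketch of the bitmap idea described above, using java.util.BitSet from the JDK rather than any Lucene class. The class name BitmapFilterSketch and its methods are hypothetical, and the doc IDs are assumed to be already known; in Lucene the bits would be produced by a Filter from an index reader.

```java
import java.util.BitSet;

// Hypothetical sketch: a filter modeled as one bit per document ID.
final class BitmapFilterSketch {

    // Builds a bitmap where bit i is "on" when document i is allowed.
    static BitSet allowAllExcept(int maxDoc, int[] excludedDocIds) {
        BitSet bits = new BitSet(maxDoc);
        bits.set(0, maxDoc);            // start with every document allowed
        for (int docId : excludedDocIds) {
            bits.clear(docId);          // switch off each excluded document
        }
        return bits;
    }

    // Combines two filters: a document survives only if both allow it.
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet notExcluded = allowAllExcept(8, new int[] {2, 5});
        BitSet evenDocs = new BitSet(8);
        for (int i = 0; i < 8; i += 2) {
            evenDocs.set(i);
        }
        System.out.println(and(notExcluded, evenDocs)); // prints "{0, 4, 6}"
    }
}
```

Excluding a list of IDs then amounts to clearing the corresponding bits once, instead of adding one negative clause per ID to the query.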

Re: how to get the "highlight" code for v 2.2.0 (or any prior version)?

2008-10-16 Thread Chris Hostetter
: a) I can see how to get a zip/jar of the Lucene :v.2.2.0 (http://www.urlstructure.com/apache/lucene/java/archive/) :or v.2.3.0 (http://www.urlstructure.com/apache/lucene/java/) : : b) but none of those contain the package "org.apache.lucene.search.highlight" it's located in the highl

excluding a list of items while querying

2008-10-16 Thread prabin meitei
Hi, I have a large index of documents with fields "id", "name" and a few others. While querying, I want to exclude a list of ids that I pass in. For this, what I use is: Query query = new BooleanQuery(); for (int i=0; i

using list of items to be excluded while querying

2008-10-16 Thread prabin meitei
Hi, I have a large index of documents with fields "id", "name" and a few others. While querying, I want to exclude a list of ids that I pass in. For this, what I use is: Query query = new BooleanQuery(); for (int i=0; i

Re: Link map over results? or term freq

2008-10-16 Thread Glen Newton
Sorry, could you explain what you mean by a "link map over lucene results"? thanks, -glen 2008/10/16 Darren Govoni <[EMAIL PROTECTED]>: > Hi, > Has anyone created a link map over lucene results or know of a link > describing the process? If not, I would like to build one to contribute. > > Also,

RE: how to get the

2008-10-16 Thread rolarenfan
Steve- >On 10/16/2008 at 12:00 PM, [EMAIL PROTECTED] wrote: >> Still a newbie here, sorry: >> >d) Get the tagged version by checking out the "highlight" package through SVN: > > > Perfect -- thanks! (Just didn't

Link map over results? or term freq

2008-10-16 Thread Darren Govoni
Hi, Has anyone created a link map over lucene results or know of a link describing the process? If not, I would like to build one to contribute. Also, I read about term frequencies in the book, but wanted to know if I can extract the strongest occurring terms from a given result set or result?

RE: how to get the "highlight" code for v 2.2.0 (or any prior version)?

2008-10-16 Thread Steven A Rowe
Hi Paul, On 10/16/2008 at 12:00 PM, [EMAIL PROTECTED] wrote: > Still a newbie here, sorry: > > a) I can see how to get a zip/jar of the Lucene >v.2.2.0 (http://www.urlstructure.com/apache/lucene/java/archive/) >or v.2.3.0 (http://www.urlstructure.com/apache/lucene/java/) > > b) but none

how to get the "highlight" code for v 2.2.0 (or any prior version)?

2008-10-16 Thread rolarenfan
Still a newbie here, sorry: a) I can see how to get a zip/jar of the Lucene v.2.2.0 (http://www.urlstructure.com/apache/lucene/java/archive/) or v.2.3.0 (http://www.urlstructure.com/apache/lucene/java/) b) but none of those contain the package "org.apache.lucene.search.highlight" c) I

Re: bunch of newbie queries, PS

2008-10-16 Thread rolarenfan
Hoss -- Thanks for prior replies; now back to this one. Context: I am trying to work with 2.2.0 (as mentioned), but when I got the downloads (src and class-jar), they did not include something called "highlighter" that it now turns out I need. So ... >: the "anonymous" SVN (http://svn.apache

Equal distribution over a field

2008-10-16 Thread Anselmo
I am using Lucene to search for products in an online shop with several shops, brands etc. Is it possible to sort documents with the same score in such a way that a field (e.g. the brand field) is equally distributed? Otherwise, the products are sorted by the appearance of the brands in the database

Re: How to restore corrupted index

2008-10-16 Thread Michael McCandless
Can you post the full traceback for your exception, and describe your indexing process as well? Mike mahdi yari wrote: hi dears i have same problem i indexing on Ubuntu Linux Distro and i have large index (>30G) and mergeFactor = 10, my Lucene version is 2.2.0 i think this maybe bug on Luc

Re: How to restore corrupted index

2008-10-16 Thread Michael McCandless
You should run CheckIndex to quickly get back to a usable index, but it will remove any segments that have problems loading. But... I'd like to get to the root cause here. It looks like this is Lucene 2.2.0. Somehow the file _w5.cfs is entirely missing. Can you describe how your indexing p

Re: update IndexSearcher

2008-10-16 Thread Erick Erickson
Yes, assuming that your searcher does not close/reopen the reader. Conceptually, the indexsearcher takes a snapshot of your index at the instant it's opened and uses that snapshot until you close the underlying reader, so you should be fine. Best Erick On Thu, Oct 16, 2008 at 6:17 AM, mahdi yari

Re: No hits for longer search strings

2008-10-16 Thread Erick Erickson
query.toString() is your friend, as is Luke's explain tab. I'd strongly recommend that you try those, because I suspect that you're not quite getting the search string you think. That said, why use StandardAnalyzer for this? I'd recommend KeywordAnalyzer instead (but watch the case). The wildcard

Re: No hits for longer search strings

2008-10-16 Thread Karsten F.
Hi Chris, most likely this is not a Lucene problem. Did you look with Luke at the stored fields of your document? Please take a second look with Luke at the terms of your field 'unique_id' (with "Show top terms"): What do you see? Best regards, Karsten. btw: why do you use the prefix search? Thi

Re: How to restore corrupted index

2008-10-16 Thread mahdi yari
Hi dears, I have the same problem. I am indexing on an Ubuntu Linux distro, and I have a large index (>30G) and mergeFactor = 10; my Lucene version is 2.2.0. I think this may be a bug in Lucene 2.2.0, but I get this error only sometimes, not always. Thanks a lot. On Thu, Oct 16, 2008 at 3:01 PM, Chaula Ganatra <[EMAIL PROT

RE: How to restore corrupted index

2008-10-16 Thread Chaula Ganatra
Hi, I am again getting the following error while optimization. java.io.FileNotFoundException: \\machine01\indexes\_w5.cfs (The system cannot find the file specified) 16:20:57,533 INFO [STDOUT] : 140 at java.io.RandomAccessFile.open(Native Method) 16:20:57,533 INFO [STDOUT] : 140 at

Re: IndexSearcher update

2008-10-16 Thread mahdi yari
Thanks for your reply. And how about merge? If I search on index1 and, in another thread, I try to merge index2 into index1 and I do not update the searcher, can I continue searching on index1? Thanks, Mahdi. On Thu, Oct 16, 2008 at 2:19 PM, Anshum <[EMAIL PROTECTED]> wrote: > Yes you can! :) > Very no

Re: IndexSearcher update

2008-10-16 Thread Anshum
Yes you can! :) Very normally. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw On Thu, Oct 16, 2008 at 3:43 PM, mahdi yari <[EMAIL PROTECTED]> wrote: > hi dears > > i have a

update IndexSearcher

2008-10-16 Thread mahdi yari
Hi dears, I have a question about Lucene. I have one index with 1,000 documents with an id field (String: UUID) and one IndexSearcher for searching it. After that, I start one IndexWriter that writes 1,000,000 new documents into the index. Now, if I do not update the IndexSearcher, can I search on the first 1,000 documen

No hits for longer search strings

2008-10-16 Thread Chris Mannion
Hi All I have a bit of a puzzle in the Lucene system we've been running. Part of our use involves inserting documents indexed by a unique key and then running exact searches to find that single document again later to display (the documents are also indexed by several other fields and used in a b

IndexSearcher update

2008-10-16 Thread mahdi yari
Hi dears, I have a question about Lucene. I have one index with 1,000 documents with an id field (String: UUID) and one IndexSearcher for searching it. After that, I start one IndexWriter that writes 1,000,000 new documents into the index. Now, if I do not update the IndexSearcher, can I search on the first 1,000 documen

Re: highlighter / fragmenter performance for large fields

2008-10-16 Thread Karsten F.
Hi Brian, I don't know the internals of highlighting („explanation“) in lucene. But I know that XTF ( http://xtf.wiki.sourceforge.net/underHood_Documents#tocunderHood_Documents5 ) can handle very large documents (above 100 Mbyte) with highlighting very fast. The difference to your approach is, th

Using JdbcDirectory

2008-10-16 Thread Kalani Ruwanpathirana
Hi, I am using the Compass implementation of Lucene's JdbcDirectory interface, to create the index in a database. Currently I am using RAMDirectory to act in the middle to get some buffering support to reduce the performance hit. Am I doing anything unnecessary here? Somewhere I saw that JdbcDire