Problems about using Lucene to generate tag cloud..

2008-03-31 Thread wuqi
Hi, I am trying to use a Lucene index to implement a tag cloud system. I added a new field named "tags" to the index to store all the tags, and we don't support tags with more than one word, so different tags of the same document are simply separated by whitespace. The "tags" field in one document may

Re: java.lang.IllegalArgumentException: Segment is too large

2008-03-31 Thread Michael McCandless
This happens when addIndexesNoOptimize finds a segment that's larger than maxMergeDocs in the index(es) it's given. If you leave maxMergeDocs at Integer.MAX_VALUE it will fix that. Though really it's being a little too pedantic because that setting (maxMergeDocs) sets the maximum size of a

Tokenize on another character

2008-03-31 Thread fiaz.khan
Hello, I just joined the list and need some help. I have a database of music tracks. These tracks have been added to an index. They are classified using keywords, so a track can have up to 20 keywords assigned to it. I took the keywords and created a "keyword" FIELD which was not stored and tokeni

RE: setPositionIncrement questions

2008-03-31 Thread Itamar Syn-Hershko
Chris, Thanks for your input. Please let me make sure that I get this right: while iterating through the words in a document, I can use my tokenizer to call setPositionIncrement(150) on a specific token, which would make it more distant from the previous token than it would otherwise have been. The next tok
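To make the effect of a large position increment concrete, here is a minimal plain-Java sketch. It does not use the Lucene API: the class and method names are hypothetical, and the slop check is a simplified model of proximity matching rather than Lucene's actual scoring logic. It assumes positions accumulate from the increments, with the first token landing at position 0.

```java
import java.util.Arrays;

public class PositionGapSketch {

    /** Turns per-token position increments into absolute positions (counting starts at -1). */
    public static int[] absolutePositions(int[] increments) {
        int[] positions = new int[increments.length];
        int current = -1;
        for (int i = 0; i < increments.length; i++) {
            current += increments[i];
            positions[i] = current;
        }
        return positions;
    }

    /** Simplified proximity test: the number of positions between the two tokens must not exceed slop. */
    public static boolean withinSlop(int posA, int posB, int slop) {
        return Math.abs(posB - posA) - 1 <= slop;
    }

    public static void main(String[] args) {
        // Four tokens; the third one carries an increment of 150 instead of the usual 1.
        int[] positions = absolutePositions(new int[] {1, 1, 150, 1});
        System.out.println(Arrays.toString(positions));
        // Tokens on either side of the gap are too far apart for a slop of 10.
        System.out.println(withinSlop(positions[1], positions[2], 10));
        // Tokens after the gap are still adjacent to each other.
        System.out.println(withinSlop(positions[2], positions[3], 10));
    }
}
```

Only the token that carries the large increment is pushed away from its predecessor; the tokens that follow it keep their normal spacing relative to it.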

Re: Tokenize on another character

2008-03-31 Thread Erick Erickson
I'm confused about the use case you're trying to implement; could you add a bit more explanation? In particular, do you ever want ROCK to match ROCK AND ROLL? If you want both, that is, some searches match partial keywords and some match entire keywords, I recommend you create a second field in your d

Re: setPositionIncrement questions

2008-03-31 Thread Erick Erickson
See below... On Mon, Mar 31, 2008 at 7:02 AM, Itamar Syn-Hershko <[EMAIL PROTECTED]> wrote: > > Chris, > > Thanks for your input. > > Please let me make sure that I get this right: while iterating through the > words in a document, I can use my tokenizer to setPositionIncrement(150) > on > a spec

Re: Tokenize on another character

2008-03-31 Thread Fiaz Khan
Thanks, Erick. OK, I have a track called METAL MAN, which has 4 categories assigned to it: GUITAR, ROCK, ROCK AND ROLL, METAL. I have another track called NOISE with the following 3 categories: GUITAR, ROCK AND ROLL, METAL. When a user searches using the keyword ROCK, it is finding both w

Re: Tokenize on another character

2008-03-31 Thread Erick Erickson
Much clearer. Here's what I'd try. Index UN_TOKENIZED as follows: for METAL MAN (bad pseudo-code...) Document doc = new Document(); doc.add("category", "GUITAR", Store.NO, UN_TOKENIZED); doc.add("category", "ROCK", Store.NO, UN_TOKENIZED); doc.add("category", "ROCK AND ROLL" , Store.NO, UN_TOKENIZ
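The difference between tokenized and un-tokenized category values can be illustrated without Lucene. The following is a minimal plain-Java sketch, not the Lucene API: the class and method names are hypothetical, and whitespace splitting stands in for whatever analyzer the index really uses. It models the indexed terms as a set and checks whether the query ROCK hits the NOISE track from the thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KeywordFieldSketch {

    /** Un-tokenized: every category is kept as one whole term. */
    public static Set<String> untokenizedTerms(List<String> categories) {
        return new HashSet<>(categories);
    }

    /** Tokenized on whitespace: every word of every category becomes its own term. */
    public static Set<String> tokenizedTerms(List<String> categories) {
        Set<String> terms = new HashSet<>();
        for (String category : categories) {
            terms.addAll(Arrays.asList(category.split("\\s+")));
        }
        return terms;
    }

    public static void main(String[] args) {
        // The NOISE track from the thread: it has ROCK AND ROLL but not ROCK.
        List<String> noise = Arrays.asList("GUITAR", "ROCK AND ROLL", "METAL");
        // Word-level terms contain ROCK, so a search for ROCK wrongly finds NOISE.
        System.out.println(tokenizedTerms(noise).contains("ROCK"));
        // Whole-category terms do not contain ROCK on its own.
        System.out.println(untokenizedTerms(noise).contains("ROCK"));
        // The full category is still found as a single term.
        System.out.println(untokenizedTerms(noise).contains("ROCK AND ROLL"));
    }
}
```

With whole-category terms, the query side also has to supply the category as a single term, so a multi-word category such as ROCK AND ROLL should not be split into words before it is looked up.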

RE: setPositionIncrement questions

2008-03-31 Thread Itamar Syn-Hershko
Well, here is the thing - I don't necessarily want to get results per paragraphs - which your code will do just fine for. I want to have my article titles and sub-headers in the main text field, after I have duplicated them to give the words they contain more weight. So I will not want to return h

Re: java.lang.IllegalArgumentException: Segment is too large

2008-03-31 Thread Yonik Seeley
On Mon, Mar 31, 2008 at 5:19 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > I think we should remove those checks and allow addIndexesNoOptimize > to import and index even if it has segments over this limit. I'll > open an issue. +1 -Yonik ---

RE: setPositionIncrement questions

2008-03-31 Thread Chris Hostetter
: duplicated them to give the words they contain more weight. So I will not : want to return higher PositionIncrement for each instance of a field, just : those which I'm interested in (title/headers). Can this be done somehow : without injecting a "magic string", as Chris called it? there are mu

Lucene Search Engine Query

2008-03-31 Thread shrish garg
Hi, I am using the Lucene search engine in my website for document search. Though it is working fine and finding the keywords in the documents properly, I am facing a problem during the search. When I am searching for some keywords whose occurrence is very low in the document and only

Re: Lucene Search Engine Query

2008-03-31 Thread Michael McCandless
This could be the maxFieldLength default in IndexWriter? By default IndexWriter only indexes the first 10,000 tokens of a document. Mike shrish garg wrote: Hi, I am using Lucene search engine in my website for document search . though it is working fine and searching the keywords
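The effect of a token limit like the one described above can be sketched in plain Java. This is an illustration only, not Lucene code: the class and method names are hypothetical, and whitespace splitting stands in for the real analyzer. It shows how a term that first appears after the 10,000th token never reaches the index and therefore cannot be found.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldLengthLimitSketch {

    /** Keeps only the first maxFieldLength whitespace-separated tokens; later tokens are dropped. */
    public static List<String> indexedTokens(String text, int maxFieldLength) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (kept.size() >= maxFieldLength) {
                break;
            }
            if (!token.isEmpty()) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Build a document whose only rare term is token number 10,001.
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            doc.append("filler ");
        }
        doc.append("rareword");

        List<String> tokens = indexedTokens(doc.toString(), 10000);
        System.out.println(tokens.size());
        // The rare term falls past the limit, so it is not among the indexed tokens.
        System.out.println(tokens.contains("rareword"));
    }
}
```

If this is the cause, the usual remedy is to raise the limit on the IndexWriter before indexing and then re-index the affected documents; the exact setter or constructor argument depends on the Lucene version, so check the IndexWriter documentation for the release in use.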
