Re: QueryParser returning TermQuery instead of PhraseQuery?

2008-10-20 Thread Daniel Noll
samd wrote: I have a field, say "foo", and I need to match exactly foo, but there is also another field called "foo1". What I want is a PhraseQuery, so I surround foo with quotes before it gets passed to the QueryParser.parse method. However, I get back a TermQuery and the values th

QueryParser returning TermQuery instead of PhraseQuery?

2008-10-20 Thread samd
I have a field, say "foo", and I need to match exactly foo, but there is also another field called "foo1". What I want is a PhraseQuery, so I surround foo with quotes before it gets passed to the QueryParser.parse method. However, I get back a TermQuery and the values that match foo1

Re: How to search in metadata? (filename, document title, cocument creator, ...)

2008-10-20 Thread Grant Ingersoll
On Oct 20, 2008, at 5:07 PM, mil84 wrote: thx :) There was also another problem with filename (because I indexed full path, not only name). But I fixed it, and now it finally works. Last question - how to get number of hits in every document (not only global number of hits)? Best would

Re: highlighter / fragmenter performance for large fields

2008-10-20 Thread Brian Beard
Karsten, Thanks, I will look into this. >Hi Brian, > >I don't know the internals of highlighting („explanation“) in lucene. >But I know that XTF ( >http://xtf.wiki.sourceforge.net/underHood_Documents#tocunderHood_Documents5 >) can handle very large documents (above 100 Mbyte) with highlighting v

RE: How to search in metadata? (filename, document title, cocument creator, ...)

2008-10-20 Thread mil84
thx :) There was also another problem with filename (because I indexed full path, not only name). But I fixed it, and now it finally works. Last question - how to get number of hits in every document (not only global number of hits)? Best would be a simple example, if possible...

RE: How to search in metadata? (filename, document title, cocument creator, ...)

2008-10-20 Thread Steven A Rowe
On 10/20/2008 at 12:41 PM, mil84 wrote: > doc.add(new Field("Title", "hohoho", Field.Store.YES, Field.Index.TOKENIZED)); [...] > 3) Searching in title - it DON'T WORK (I try to find hohoho, and nothing). [...] > QueryParser parser = new QueryParser("title", new StandardAnalyzer()); Field names are

Re: How to search in metadata? (filename, document title, cocument creator, ...)

2008-10-20 Thread mil84
Ok, better example. For example I want to search pdf file. 1) Indexing: Document doc = LucenePDFDocument.getDocument(f); doc.add(new Field("contents", new FileReader(f))); doc.add(new Field("filename", "Example.pdf", Field.Store.YES, Field.Index.TOKENIZED)); doc.add(new Field("C

Re: How to search in metadata? (filename, document title, cocument creator, ...)

2008-10-20 Thread Erick Erickson
If Grant's suggestions don't help you, some examples of your search code would be helpful to further pinpoint things... Best Erick On Mon, Oct 20, 2008 at 11:29 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > > On Oct 20, 2008, at 10:32 AM, mil84 wrote: > > >> I've a problem with searching. I

Re: How to search in metadata? (filename, document title, cocument creator, ...)

2008-10-20 Thread Grant Ingersoll
On Oct 20, 2008, at 10:32 AM, mil84 wrote: I've a problem with searching. I need to search not only in file contents, but also in metadata. But I don't know how to do it. My code: Document doc = new Document(); doc.add(new Field("contents", new FileReader(f))); writer.addDocument(doc); ..

Re: Hiring etiquette

2008-10-20 Thread Richard Marr
> Can you post details of what you're hitting? Sorry Mike, my problem doesn't seem to be repeatable. I'll bring it back to the group if/when I nail it down.

How to search in metadata? (filename, document title, cocument creator, ...)

2008-10-20 Thread mil84
I've a problem with searching. I need to search not only in file contents, but also in metadata. But I don't know how to do it. My code: Document doc = new Document(); doc.add(new Field("contents", new FileReader(f))); writer.addDocument(doc); ... QueryParser parser = new QueryParser("contents",

RE: I am not able to run Lucene 2.4 Demo

2008-10-20 Thread Sudarsan, Sithu D.
Hi, I'm using Lucene 2.3.2, and no problem so far with Windows. One issue to look at would be whether your index directory has write permission. Probably, your index folder is read-only. Sincerely, Sithu Sudarsan Graduate Research Assistant, UALR & Visiting Researcher, CDRH/OSEL [EMAI

Re: Merge index will maintain index order

2008-10-20 Thread Erick Erickson
Let's say you have built indexes in this order (by date): index1, index2, index3. Now if your addIndexes call has them ordered the same way, then, in the merged index, the first doc ID from index2 will be greater than the last doc ID from index1. The first doc ID from index3 w

Re: robots.txt

2008-10-20 Thread Erik Hatcher
On Oct 20, 2008, at 8:58 AM, Alexander Aristov wrote: Just wonder if Nutch takes into consideration rules from the robots.txt file while crawling a site. Wrong e-mail list, but yeah, Nutch supports robots.txt considerations. Erik

Re: Merge index will maintain index order

2008-10-20 Thread mahdi yari
i can not understand about last sentence "So the final index will only ..." can you write more about this, because i have same question... thanks On Mon, Oct 20, 2008 at 2:04 PM, Michael McCandless < [EMAIL PROTECTED]> wrote: > > By merge you mean using addIndexes*, right? > > Those methods logic

Re: About TermQuery

2008-10-20 Thread Grant Ingersoll
You should be able to get the clauses from the BooleanQuery: http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/search/BooleanQuery.html#clauses() And, from there, you can do instanceof to determine the query type, eventually getting to a TermQuery, where you can do: http://lucene.

robots.txt

2008-10-20 Thread Alexander Aristov
Hi all, Just wonder if Nutch takes into consideration rules from the robots.txt file while crawling a site. -- Best Regards Alexander Aristov

Re: Hiring etiquette

2008-10-20 Thread Michael McCandless
Richard Marr wrote: I'm having issues merging indexes using indexWriter.addIndexes(). Can you post details of what you're hitting? Mike

Re: Hiring etiquette

2008-10-20 Thread Richard Marr
Ah, okay. Thanks for the advice Ard. I'll use that list in future. Anybody who is annoyed by my hiring notice is quite welcome to get in touch with me and I'll buy them a beer or something if and when they come through London. Oh dear, spamming _and_ bribery in one day. My moral fibre is looking

Re: Merge index will maintain index order

2008-10-20 Thread Michael McCandless
By merge you mean using addIndexes*, right? Those methods logically concatenate the indices in order by doc ID. So the original docs in your index will keep all their docIDs, and the newly added indices are assigned docIDs after that, in the order they were added. So the final index wil
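The doc ID assignment described in the message above can be modelled in plain Java. This is only a sketch of the arithmetic, not Lucene code: the class `MergedDocIds` and its method `toMerged` are hypothetical names, and it assumes none of the indexes contain deleted documents (deleted documents are dropped during a merge, which would shift the IDs that follow them).

```java
/** Hypothetical model of how concatenating indexes assigns doc IDs; not part of Lucene. */
final class MergedDocIds {
    // bases[i] is the merged doc ID given to the first document of the i-th index.
    private final int[] bases;

    /** sizes[i] is the document count of the i-th index, in the order the indexes are added. */
    MergedDocIds(int[] sizes) {
        bases = new int[sizes.length];
        int next = 0;
        for (int i = 0; i < sizes.length; i++) {
            bases[i] = next;   // this index starts where the previous one ended
            next += sizes[i];
        }
    }

    /** Maps a doc ID local to one source index onto its ID in the concatenated index. */
    int toMerged(int indexOrdinal, int localDocId) {
        return bases[indexOrdinal] + localDocId;
    }
}
```

With sizes 3, 2 and 4, documents from the first index keep IDs 0 to 2, the second index starts at 3, and the third starts at 5, so relative order within and across the indexes is preserved under these assumptions.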

Re: How to avoid Index corruption

2008-10-20 Thread Michael McCandless
If the application (JVM) crashes or is killed or otherwise ungracefully shut down, it should never corrupt the Lucene index. The only known "normal" (ie, assuming no errors in your hard drives, bugs in the OS/filesystem, etc.) case where corruption may occur is if power is lost to the ma

Re: Problem with updating Index continuously

2008-10-20 Thread Mr Shore
MySQL supports free-text search, why still stick with Nutch? 2008/10/20 Michael McCandless <[EMAIL PROTECTED]> > > Is it possible you are closing it somewhere else? > > This code fragment looks correct to me. > > Mike > > Cool The Breezer wrote: > > You need to close the old reader, only if the newRea

How to avoid Index corruption

2008-10-20 Thread Ganesh
Hello all, I am using Lucene 2.3.2 on Windows and Linux. I have to do incremental indexing. I am worried about corruption of the index in case of a power failure, a forced restart of the server, or an application crash. How can I avoid corruption? Any tips would be greatly appricate

RE: Hiring etiquette

2008-10-20 Thread Ard Schrijvers
Hello Rich, There is actually also a specific list indeed for it, [EMAIL PROTECTED], but it is a really low traffic list I must admit, most likely not read at all by the people you are looking for...though, officially, it is the list to use :-) Ard > Hi all, > > Is there a mailing-list-appropr

Merge index will maintain index order

2008-10-20 Thread Ganesh
Hello all, I am planning to merge two or more indexes. Once merged, will the DB maintain the same index order as before the merge? I am sorting on index order because sorting on date-time takes more RAM. If I merge the index DB, will the same index order be maintained or the indexes wil

Re: Problem with updating Index continuously

2008-10-20 Thread Michael McCandless
Is it possible you are closing it somewhere else? This code fragment looks correct to me. Mike Cool The Breezer wrote: You need to close the old reader, only if the newReader is different (ie, it was in fact reopened because there were changes in the index). I tried closing but getting "inde

Re: I am not able to run Lucene 2.4 Demo

2008-10-20 Thread Ganesh
Could you give me the full stack trace? Also provide the output messages displayed on the console. Regards Ganesh - Original Message - From: prabina pattanayak To: java-user@lucene.apache.org ; [EMAIL PROTECTED] ; Ganesh Sent: Saturday, October 18, 2008 10:40 AM Subject: Re: I am

Re: Problem with updating Index continuously

2008-10-20 Thread Cool The Breezer
> You need to close the old reader, only if the newReader is > different > (ie, it was in fact reopened because there were changes in > the index). I tried closing but getting "index already closed" error. IndexReader newReader = reader.reopen(); if (newReader != reader)

About TermQuery

2008-10-20 Thread Carlos Rodríguez Fernández
How can I get the boost value of the subqueries "TermQuery" from a BooleanQuery? In the Similarity equation http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/search/Similarity.html I don't know how I can get the t.getBoost() value. Could you help me? I need it because I need to recalculat

Re: Hiring etiquette

2008-10-20 Thread Richard Marr
> in this day/time, when you don't know if your job is safe next week, who's > really going to frown upon a potentially serious project/offer... > > just be cool on the spamming aspect, and you should be ok!! Thanks for the feedback everybody. If there's anybody in or near London that might be i

Re: TermDocs and "read"

2008-10-20 Thread Michael McCandless
It seems like you are trying to use the TermDocs iterator to load the term freq for that particular document (doc)? It doesn't work that way -- instead, it simply iterates over all documents that this term occurred in. (Ie it will replace the doc in the int[] that you passed in, with the
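To illustrate the point about iteration, here is a plain-Java model of one term's postings. It is a sketch, not the Lucene API: `PostingsModel`, `freqFor` and the two arrays are hypothetical stand-ins for the data a `TermDocs` walks over, and the model assumes postings are sorted by doc ID. In Lucene itself the corresponding calls are `TermDocs.next()`, `doc()` and `freq()`, or `skipTo(int)` to jump ahead to a target document.

```java
/** Hypothetical stand-in for one term's postings: parallel arrays sorted by doc ID. */
final class PostingsModel {
    private final int[] docs;   // IDs of the documents the term occurs in
    private final int[] freqs;  // freqs[i] is the term frequency in docs[i]

    PostingsModel(int[] docs, int[] freqs) {
        if (docs.length != freqs.length) {
            throw new IllegalArgumentException("docs and freqs must have the same length");
        }
        this.docs = docs;
        this.freqs = freqs;
    }

    /** Returns the term frequency in targetDoc, or 0 if the term does not occur there. */
    int freqFor(int targetDoc) {
        // The iterator yields every document containing the term, so the caller
        // has to advance until it reaches the document it is interested in.
        for (int i = 0; i < docs.length; i++) {
            if (docs[i] == targetDoc) {
                return freqs[i];
            }
            if (docs[i] > targetDoc) {
                break; // sorted by doc ID: the target cannot appear later
            }
        }
        return 0;
    }
}
```

The frequency for a given document is found by advancing through the postings until that document is reached, rather than by passing the document ID in and expecting its frequency back.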

Re: Problem with updating Index continuously

2008-10-20 Thread Michael McCandless
You need to close the old reader, only if the newReader is different (ie, it was in fact reopened because there were changes in the index). Not closing the old reader will cause the files it held open to be undeletable. Mike Cool The Breezer wrote: Hi, I have requirement of updating se

TermDocs and "read"

2008-10-20 Thread Carlos Rodríguez Fernández
Hello: I have a problem with TermDocs#read operation. the following code has an incorrect result: . int termFreq=0; . TermDocs termDocs = indexReader.termDocs(new Term(((Field)field).name(),termCons)); int[] freqs = new int[]{0};
