Re: How to add machine learning to Apache lucene

2014-05-19 Thread Ahmet Arslan
Hi Diego, There is no such thing in the Lucene ecosystem yet, although some ideas (http://search-lucene.com/m/WwzTb2nt1Tk1, http://search-lucene.com/m/WwzTb2d9o2m) float around from time to time. I would like to integrate https://code.google.com/p/jforests/ and create a prototype myself in the future. New a

Re: Performance issue when using multiple PhraseQueries against a 1+ million entries index

2014-05-19 Thread Liviu Matei
Thanks for the reply. When you mention system memory, are you referring to RAM (or the heap, since this is running as a Java process)? The index size is around 13G and the Java process is not given that much memory (in terms of -Xmx). Could this be the cause? My understanding while reading some articles on the in

Re: Performance issue when using multiple PhraseQueries against a 1+ million entries index

2014-05-19 Thread Jack Krupansky
Does your index fit fully in system memory, i.e. the OS file cache? If not, there could be a lot of thrashing (I/O) as Lucene accesses the index. -- Jack Krupansky -----Original Message----- From: Liviu Matei Sent: Monday, May 19, 2014 4:21 PM To: java-user@lucene.apache.org Subject: Performance

Performance issue when using multiple PhraseQueries against a 1+ million entries index

2014-05-19 Thread Liviu Matei
Hi, In order to achieve a somewhat "smarter" search that also takes the context into consideration, I decided to use PhraseQuery. I create ~100 phrase queries from the input text, combine them into one BooleanQuery, and issue a search against the index. Now if the index size is big

Getting payloads for query terms of constantscore query

2014-05-19 Thread Puneet Pawaia
Hi, I can get the payloads for query terms using getPayloadsForQuery from PayloadSpanUtil. However, this does not support ConstantScore queries. So how do I get the payloads for queries that get rewritten to a ConstantScoreQuery, for example PrefixQuery or WildcardQuery? Thanks, Puneet

Re: search time & number of segments

2014-05-19 Thread Toke Eskildsen
On Mon, 2014-05-19 at 11:54 +0200, De Simone, Alessandro wrote: [24GB index, 8GB disk cache, only indexed fields] > The "IO calls" I was referring to is the number of times the > "BufferedIndexInput.refill()" function is called. So it means that we > have 16 times more bytes read when there are 16

Re: writer.updateDocument() not working (possible bug?)

2014-05-19 Thread Jack Krupansky
Out of curiosity, do any of the current crowd of Lucene committers/users have any insight as to how or why that seemingly obvious design requirement was ignored or consciously avoided in the original design for Lucene? I've always assumed that Lucene (and Solr) were originally designed for a batc

NewBie To Lucene || Perfect configuration on a 64 bit server

2014-05-19 Thread Shruthi
Hi, We are using Lucene 4.7 in our server application for searching the documents placed on a NAS share. We have 10 million+ documents and have decided not to index all of them. The strategy we applied is as follows: 1. The client makes a request with a search phrase. Lucene applic

Re: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-19 Thread Michael McCandless
On Mon, May 19, 2014 at 6:14 AM, Clemens Wyss DEV wrote: > Mike, > first of all, thanks for all your input; I really appreciate it (as much as I > like reading your blog). You're welcome! >> Hmm, but you swap these files over while an IndexReader is still open on the >> index? > no IndexReader is

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-19 Thread Clemens Wyss DEV
Mike, first of all, thanks for all your input; I really appreciate it (as much as I like reading your blog). > Hmm, but you swap these files over while an IndexReader is still open on the > index? No IndexReader is open while swapping, at least not by design. We have at most one (current) reader per

Re: writer.updateDocument() not working (possible bug?)

2014-05-19 Thread Michael McCandless
I know, it's a commonly requested feature, but unfortunately it's very complex to implement. See e.g. the discussions on https://issues.apache.org/jira/browse/LUCENE-4258 Mike McCandless http://blog.mikemccandless.com On Mon, May 19, 2014 at 5:15 AM, Jamie wrote: > Michael > > Thanks for the

RE: search time & number of segments

2014-05-19 Thread De Simone, Alessandro
Thank you for your input. > How much RAM does your search machine have? We have 16GB of RAM, and at least 8GB of free memory for the OS file cache. The cache is working pretty well. > That sounds right. Although each segment is 1/16 of the full index size, the > number of seeks per segmen

Re: Issue with Lucene 3.6.1 and MMapDirectory

2014-05-19 Thread Liviu Matei
Also, one more thing (sorry, I forgot to add this): using lsof, I noticed deleted index files that are still in use by the application. Is this OK? Can't this cause issues, with the IndexReader trying to access an index file that was deleted? I suspect the deletion happens because of index merges during indexi

Re: Issue with Lucene 3.6.1 and MMapDirectory

2014-05-19 Thread Liviu Matei
Thank you all very much for the answers. Uwe, this is the strange thing: I am currently never closing the index reader, and I open a new one every 8 hours, yet I am noticing that crash in what is indeed a highly concurrent environment. The indexes reside on an NFS file system. And the location i

Re: writer.updateDocument() not working (possible bug?)

2014-05-19 Thread Jamie
Michael, Thanks for the clarification. This is a hefty limitation of Lucene. One would expect to be able to update a specific field in the index without having to reindex the entire document. Regards, Jamie On 2014/05/16, 11:34 PM, Michael McCandless wrote: You can retrieve

Re: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-19 Thread Michael McCandless
On Mon, May 19, 2014 at 4:59 AM, Clemens Wyss DEV wrote: >> Are you using doc-values updates? > Not to my knowledge, i.e. not explicitly Hmm ok. >> Are you ever removing files directly from the index directory yourself >> between reopens? > Yes. Reindexing an index completely(*) is done in a se

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-19 Thread Clemens Wyss DEV
> Are you using doc-values updates? Not to my knowledge, i.e. not explicitly > Are you ever removing files directly from the index directory yourself > between reopens? Yes. Reindexing an index completely(*) is done in a separate temporary index/folder. After that we (guarded by a mutex) swap th

Re: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-19 Thread Michael McCandless
Hmm, I was wrong before; the code is more complex than I thought. Are you using doc-values updates? Can you describe a bit more how your app works? Are you ever removing files directly from the index directory yourself between reopens? Mike McCandless http://blog.mikemccandless.com On Mon, M

Call for papers: JTRES 2014

2014-05-19 Thread w...@dtu.dk
(Apologies if you receive multiple copies of this CfP.) The 12th International Workshop on Java Technologies for Real-time and Embedded Systems (JTRES 2014), October 13th-14th, Niagara Falls, NY, USA