Fuzzy Searching on Lucene / Solr

2013-08-13 Thread Michael Tobias
My first post so please be gentle with me. I am about to start 'playing' with Solr to see if it will be the correct tool for a new searchable database development. One of my requirements is the ability to do 'fuzzy' searches and I understand that the latest versions of Lucene / Solr use an improv
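
For readers new to the term, a "fuzzy" match usually means a match within a small edit distance of the query term (Levenshtein distance is the measure named elsewhere in this listing). The sketch below is only an illustration of that measure in plain Java; it is not Lucene's or Solr's implementation, and the class and method names (EditDistance, levenshtein, fuzzyMatch) are made up for this example.

```java
public class EditDistance {

    /**
     * Classic dynamic-programming Levenshtein distance: the minimum number of
     * single-character insertions, deletions and substitutions that turn a into b.
     * Only two rows of the DP table are kept in memory.
     */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Hypothetical helper: a candidate "fuzzy matches" if it is within maxEdits of the term. */
    public static boolean fuzzyMatch(String term, String candidate, int maxEdits) {
        return levenshtein(term, candidate) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("lucene", "lucine"));
        System.out.println(fuzzyMatch("solr", "solar", 1));
    }
}
```

Comparing the query against every indexed term this way would be slow on a large index, so treat it as a definition of the matching rule rather than a search strategy.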

Re: IllegalStateException in SpanTermQuery

2013-08-13 Thread Yonghui Zhao
In our old code, we create the field like this: Field metaField = new Field(name, strVal, fldDef.store, Index.NOT_ANALYZED_NO_NORMS); metaField.setOmitNorms(true); *metaField.setIndexOptions(IndexOptions.DOCS_ONLY);* luceneDoc.add(metaFi

Re: Avoid automaton Memory Usage

2013-08-13 Thread Michael McCandless
On Tue, Aug 13, 2013 at 9:44 AM, Anna Björk Nikulásdóttir wrote: > I created these 3 issues for the discussed items: Thanks! If you (or anyone!) want to work up a patch that would be great ... > Thanks a lot for your suggestions (pun intended) ;) ;) Mike McCandless http://blog.mikemccandless

Re: problem found with DiskDocValuesFormat

2013-08-13 Thread Duke DAI
Hi Mike, Thanks for your quick response. All data was newly indexed, so compatibility is not the culprit. Could it be a multi-threading issue? I share IndexReaders between different IndexSearchers. I have no evidence for this guess, because I have many multi-threaded test cases and they passed, but t

Re: Avoid automaton Memory Usage

2013-08-13 Thread Anna Björk Nikulásdóttir
I created these 3 issues for the discussed items: On disk FST objects: https://issues.apache.org/jira/browse/LUCENE-5174 FuzzySuggester should boost terms with minimal Levenshtein Distance: https://issues.apache.org/jira/browse/LUCENE-5172 AnalyzingSuggester and FuzzySuggester should be able to

Trying to store Offsets. Dont know the exact meaning of some terms.

2013-08-13 Thread Ankit Murarka
Hello, I generally add fields to my document in the following manner. I wish to add offsets to this field. doc.add(new StringField("contents",line,Field.Store.YES)); I wish to also store offsets. So, I went through javadoc, and found I need to use FieldType. So, I ended up using :
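
As background on the terms in question: a token's position is its ordinal place in the token stream, while its offsets are the character indexes in the original text where it starts and ends (the end offset is one past the last character). The sketch below illustrates this with a whitespace tokenizer in plain Java. It does not use Lucene's analysis classes, and the names TokenOffsets, Token and tokenize are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenOffsets {

    /** One token with its position (ordinal) and character offsets [startOffset, endOffset). */
    public static final class Token {
        public final String text;
        public final int position;
        public final int startOffset;
        public final int endOffset;

        Token(String text, int position, int startOffset, int endOffset) {
            this.text = text;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return text + " pos=" + position + " offsets=[" + startOffset + "," + endOffset + ")";
        }
    }

    /** Splits on whitespace and records where each token sits in the original line. */
    public static List<Token> tokenize(String line) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < line.length()) {
            // Skip whitespace between tokens.
            while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token itself.
            while (i < line.length() && !Character.isWhitespace(line.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new Token(line.substring(start, i), position++, start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("remedial action taken")) {
            System.out.println(t);
        }
    }
}
```

Note that a StringField indexes the whole value as a single token, so per-word offsets like these only exist for a field that is actually tokenized.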

Re: IllegalStateException in SpanTermQuery

2013-08-13 Thread Michael McCandless
All span queries require positions to work; older Lucene releases failed to catch you if you tried to use a span query on a field that did not index positions, but now Lucene 4.x does catch you (this is an improvement). You should double check your unit test: it really should not have been passing
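
To see why positions matter, here is a toy inverted index in plain Java. It is only a sketch of the idea, not how Lucene stores postings; the class ToyPositionalIndex and its methods are hypothetical. With document ids alone it can say which documents contain a term, but a proximity question ("does B directly follow A?") cannot be answered, so it fails loudly, much like the exception discussed in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ToyPositionalIndex {
    private final boolean withPositions;
    // term -> docId -> token positions (lists stay empty when positions are not indexed)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    public ToyPositionalIndex(boolean withPositions) {
        this.withPositions = withPositions;
    }

    public void add(int docId, String text) {
        String[] terms = text.trim().toLowerCase().split("\\s+");
        for (int pos = 0; pos < terms.length; pos++) {
            List<Integer> positions = postings
                    .computeIfAbsent(terms[pos], t -> new HashMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>());
            if (withPositions) {
                positions.add(pos);
            }
        }
    }

    /** Term lookup: answerable from document ids alone. */
    public Set<Integer> docsWith(String term) {
        return new TreeSet<>(postings.getOrDefault(term, Collections.emptyMap()).keySet());
    }

    /** Documents where second occurs immediately after first: requires positions. */
    public Set<Integer> docsWithAdjacent(String first, String second) {
        if (!withPositions) {
            throw new IllegalStateException("indexed without position data");
        }
        Map<Integer, List<Integer>> a = postings.getOrDefault(first, Collections.emptyMap());
        Map<Integer, List<Integer>> b = postings.getOrDefault(second, Collections.emptyMap());
        Set<Integer> hits = new TreeSet<>();
        for (Map.Entry<Integer, List<Integer>> entry : a.entrySet()) {
            List<Integer> secondPositions = b.get(entry.getKey());
            if (secondPositions == null) {
                continue; // second term is absent from this document
            }
            for (int p : entry.getValue()) {
                if (secondPositions.contains(p + 1)) {
                    hits.add(entry.getKey());
                    break;
                }
            }
        }
        return hits;
    }
}
```

An index built with `new ToyPositionalIndex(false)` still answers `docsWith`, which is the situation a docs-only field is in: fine for term lookups, unusable for span-style queries.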

IllegalStateException in SpanTermQuery

2013-08-13 Thread Yonghui Zhao
One of my unit tests passes in Lucene 3.5 but fails in Lucene 4.3. The exception is: IllegalStateException("field \"" + term.field() + "\" was indexed without position data; cannot run SpanTermQuery (term=" + term.text() + ")"); After I change the index option of the field from DOCS_ONLY to DOCS_A

Re: problem found with DiskDocValuesFormat

2013-08-13 Thread Michael McCandless
DiskDVFormat does not have index back compatibility between minor releases; maybe that's what you are seeing? So, you must fully re-index any DiskDVFormat field after upgrading ... Only the default formats support index back compatibility between releases. Mike McCandless http://blog.mik

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-13 Thread Lingviston
I'm currently using this snippet (with older Highlighter): HitPositionCollector collector = new HitPositionCollector(); highlighter = new Highlighter(collector, scorer); highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer,

Re: Boolean Query when indexing each line as a document.

2013-08-13 Thread Ian Lea
remedialaction != "remedial action"? Show us your query. Show a small self-contained sample program or test case that demonstrates the problem. You need to give us something more to go on. -- Ian. On Tue, Aug 13, 2013 at 11:13 AM, Ankit Murarka wrote: > Hello, > I am aware of that l

Re: Boolean Query when indexing each line as a document.

2013-08-13 Thread Ankit Murarka
Hello, I am aware of that link and I have been through it many times. The problem I have is: 1. Each line is indexed. So an indexed line looks something like "\" 2. I can easily fire a phrase query on this line. It suggests the possible values. No problem. 3. If I fire

Re: How to get hits coordinates in Lucene 4.4.0

2013-08-13 Thread Michael McCandless
If you use PostingsHighlighter, then Passage.getMatchStarts/Ends gives you the offsets of each match. You'd need a custom PassageFormatter that takes these ints and saves them somewhere; or possibly the patch on LUCENE-4906 (allowing you to return custom objects, not just String) from your highlig

Re: Creating Indexes when data inside the file is being written.

2013-08-13 Thread Ian Lea
I'm not sure what you're getting at. If you've got one job reading data, writing to an output file and indexing as you go, it should work. If you've got multiple jobs trying to write to the same output file and lucene index you'll need some external synchronisation. -- Ian. On Tue, Aug 13, 20

Re: Creating Indexes when data inside the file is being written.

2013-08-13 Thread Jugal Kolariya
Probably. Last doubt: The data in my application comes from a stream after some processing. This stream is being written continuously to the file. So, effectively, if I open a Lucene index and create indexes using this file, I would be able to create the indexes ..??? Wo

Re: Boolean Query when indexing each line as a document.

2013-08-13 Thread Ian Lea
Should be straightforward enough. Work through the tips in the FAQ entry at http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F and post back if that doesn't help, with details of how you are analyzing the data and how you are searching. -- Ian. On Tue,

problem found with DiskDocValuesFormat

2013-08-13 Thread Duke DAI
Hi experts, I'm upgrading to Lucene 4.4 and trying to use DocValues instead of stored fields for performance reasons. Because the index size is unknown (it depends on the customer), I will use DiskDocValuesFormat, especially for some binary fields. Then I wrote my customized Codec: final Codec codec = n

Re: Creating Indexes when data inside the file is being written.

2013-08-13 Thread Ian Lea
If I've understood your question correctly, the answer is yes. Assuming the input data is coming from another file the flow will be along the lines of . Open input file for reading . Open output file for writing . Open (or create) lucene index . For each input record - write to output file
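
The flow described above (read each input record, write it to the output file, index it in the same pass) can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions: a simple in-memory map stands in for the Lucene index, and the names CopyAndIndex and copyAndIndex are invented here. In a real application the "add to index" step would be a call on a Lucene IndexWriter instead.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CopyAndIndex {

    /**
     * Copies input to output line by line and indexes each line as it goes.
     * Returns a stand-in index: term -> line numbers (0-based) containing it.
     */
    public static Map<String, List<Integer>> copyAndIndex(Path input, Path output) throws IOException {
        Map<String, List<Integer>> index = new TreeMap<>();
        // Open input for reading and output for writing; both are closed automatically.
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String record;
            int lineNo = 0;
            while ((record = in.readLine()) != null) {
                // Step 1: write the record to the output file.
                out.write(record);
                out.newLine();
                // Step 2: index the same record before moving on to the next one.
                for (String term : record.toLowerCase().split("\\s+")) {
                    if (term.isEmpty()) {
                        continue;
                    }
                    List<Integer> lines = index.computeIfAbsent(term, t -> new ArrayList<>());
                    // Record each line at most once per term.
                    if (lines.isEmpty() || lines.get(lines.size() - 1) != lineNo) {
                        lines.add(lineNo);
                    }
                }
                lineNo++;
            }
        }
        return index;
    }
}
```

Because a single job does both the writing and the indexing, no extra synchronisation is needed; that only becomes a concern when several jobs share the same output file and index.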

Re: Creating Indexes when data inside the file is being written.

2013-08-13 Thread Jugal Kolariya
That only answers the 2nd part. My most important question still remains. " In my case, I am creating a new file and writing data to that file. Now, while the file writing is in progress, I would like to create Lucene indexes. Once the indexes are created, I can then perform operations on the ind

Boolean Query when indexing each line as a document.

2013-08-13 Thread Ankit Murarka
Hello All, I have 2 different use cases. I am trying to provide both boolean query and phrase search query in the application. In every line of the document which I am indexing I have content like: \ Due to the phrase search requirement, I am indexing each line of the file as