[suggestions] fetch terms from a FilterAtomicReader(subclass)?

2014-10-27 Thread Clemens Wyss DEV
Is it possible to fetch the terms of a FilterAtomicReader in order to provide suggestions from a subset of all documents in an index? So my target is to provide suggestions from a subset of all documents in an index. Note: I have a similar discussion ongoing on the Solr mailing list. But I

real infix suggester, not AnalyzingInfixSuggester

2014-10-27 Thread Michael Breu
Hello, I'm looking for an infix suggester that allows infix search for a given term. This might not be that important in English. However, in German we have quite complex compound words like Donaudampfschifffahrtsgesellschaftskapitän, which is composed of the nouns Donau (Danube), Dampf

Re: [suggestions] fetch terms from a FilterAtomicReader(subclass)?

2014-10-27 Thread Olivier Binda
On 10/27/2014 07:32 AM, Clemens Wyss DEV wrote: Is it possible to fetch the terms of a FilterAtomicReader in order to provide suggestions from a subset of all documents in an index? Yes, it is possible. I do it by feeding a custom Dictionary with a custom InputIterator in the lookup.build()

Re: real infix suggester, not AnalyzingInfixSuggester

2014-10-27 Thread Michael Sokolov
Have you considered combining the AnalyzingInfixSuggester with a German decompounding filter? If you break compound words into their constituent parts during analysis, then the suggester will be able to do what you want (prefix matches on the word-parts). I found this project with a quick

RE: real infix suggester, not AnalyzingInfixSuggester

2014-10-27 Thread Oliver Christ
The hard way may be to use the standard AnalyzingSuggester but to add each (analyzed) suffix of the surface string (mapping to the full surface form) during automaton generation. I.e. when adding Donau..., you add all analyzed suffixes donau..., onau..., nau..., ... - all mapping to
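
The suffix idea in the message above can be illustrated with a minimal plain-Java sketch. It does not use Lucene or the AnalyzingSuggester API: the class `SuffixExpansion`, its methods, and the use of lowercasing as a stand-in for analysis are all hypothetical names and assumptions made for this illustration. A sorted map replaces the automaton, so a prefix lookup over suffix keys behaves like an infix lookup over the original surface forms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Lucene.
final class SuffixExpansion {

    // Stand-in for a real analysis chain (assumption): lowercase only.
    static String analyze(String surface) {
        return surface.toLowerCase(Locale.ROOT);
    }

    // Every non-empty suffix of the analyzed form, longest first.
    static List<String> suffixes(String surface) {
        String analyzed = analyze(surface);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < analyzed.length(); i++) {
            out.add(analyzed.substring(i));
        }
        return out;
    }

    // Maps each analyzed suffix to the full surface forms it came from.
    static TreeMap<String, List<String>> build(List<String> surfaces) {
        TreeMap<String, List<String>> index = new TreeMap<>();
        for (String surface : surfaces) {
            for (String suffix : suffixes(surface)) {
                index.computeIfAbsent(suffix, k -> new ArrayList<>()).add(surface);
            }
        }
        return index;
    }

    // Prefix match on suffix keys, which is an infix match on the surface form.
    static List<String> lookup(TreeMap<String, List<String>> index, String query) {
        String q = analyze(query);
        List<String> hits = new ArrayList<>();
        // Keys sharing the prefix q are contiguous in sorted order, starting at q.
        for (Map.Entry<String, List<String>> e : index.tailMap(q, true).entrySet()) {
            if (!e.getKey().startsWith(q)) {
                break;
            }
            for (String surface : e.getValue()) {
                if (!hits.contains(surface)) {
                    hits.add(surface);
                }
            }
        }
        return hits;
    }
}
```

With the surface forms "Donaudampfschiff" and "Dampfer" indexed, looking up "dampf" returns both, even though "dampf" is not a prefix of "Donaudampfschiff". The cost is one entry per character of every surface form, which is presumably why the message calls this the hard way.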

Lucene not showing Low Score Doc

2014-10-27 Thread Priyanka Tufchi
Hi all, I have a set of 10 documents that I submitted for comparison through Apache Lucene. When I check the scores for the set, I get 8 of the 10 in my database; the remaining 2 are not shown. Even if the score is very low, Lucene should still show something. How can I handle this, as I have to show all

RE: [suggestions] fetch terms from a FilterAtomicReader(subclass)?

2014-10-27 Thread Clemens Wyss DEV
Salut Olivier, would you mind providing me your Suggester-class code (or the relevant snippets) as an ideal jump-start? -Clemens -Original Message- From: Olivier Binda [mailto:olivier.bi...@wanadoo.fr] Sent: Monday, 27 October 2014 11:51 To: java-user@lucene.apache.org

Re: Lucene not showing Low Score Doc

2014-10-27 Thread Shai Erera
Hi, your question is a bit fuzzy -- what do you mean by not showing low scores? Are you sure that these 2 documents are matched by the query? Can you boil it down to a short test case that demonstrates the problem? In general though, when you search through IndexSearcher.search(Query, int), you

Re: real infix suggester, not AnalyzingInfixSuggester

2014-10-27 Thread Michael Breu
Hello Michael, Thank you for your kind support. I had a look at elasticsearch-analysis-decompound and tried to integrate it. However, it seemed to me that it is somewhat hard to integrate into our work, which is based on lucene-core. I have managed to set up a test environment, however I was not

Re: real infix suggester, not AnalyzingInfixSuggester

2014-10-27 Thread Michael Breu
Hello Oliver, I already had a look at the AnalyzingSuggester before. I was not able to spot the location where it generates the prefixes. It works with some path analysis based on an automaton (both for analysis and query). It is not really clear to me how to extend this automaton. Could you give

Weighted tags for document instances (at index time)

2014-10-27 Thread Ralf Bierig
I want to index documents together with a list of tags (usually between 10 and 30) that represent meta information about the document. Normally, I would create an extra field "tag", store every tag, by its name, inside that field, and create my 10-30 fields that way, adding them to the document before

RE: real infix suggester, not AnalyzingInfixSuggester

2014-10-27 Thread Oliver Christ
Hi Michael, There may be several entry points, I'm not sure which one still works - the suggester data processing chain has changed quite a bit since I looked at it about two years ago, maybe Mike or Robert can chime in if I'm totally off. One way I experimented with was to implement a custom

Re: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
Hi Nischal, I had a similar indexing issue. My Lucene indexing took 22 minutes for 70 MB of docs. When I debugged the problem, I found that indexWriter.addDocument(doc) was taking a really long time. Have you already found a solution for it? Thank you, Jason

RE: Making lucene indexing multi threaded

2014-10-27 Thread Fuad Efendi
I believe there were many reports of many thousands of docs per second on average. I experienced similar Solr speeds many years ago too, with small documents (512 bytes each). You can check hard drive performance first (use SSD, etc.); and second, check your indexing architecture: is it

RE: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
Hi Fuad, Thanks for your suggestions and quick response. I am adding docs with single-threaded indexing. I will try multi-threaded indexing to see if it resolves my issue. This issue has only existed since I upgraded Lucene from 2.4.1 (with Java 1.6) to 4.8.1 (with Java 1.7).

Re: Making lucene indexing multi threaded

2014-10-27 Thread G.Long
Like Nischal, did you check that you don't call the commit() method after each indexed document? :) Regards, Gary Long On 27/10/2014 16:47, Jason Wu wrote: Hi Fuad, Thanks for your suggestions and quick response. I am using a single-threaded indexing way to add docs. I will try the

Re: Making lucene indexing multi threaded

2014-10-27 Thread Jason Wu
Hi Gary, Thanks for your response. I only call the commit when all my docs are added. Here is the procedure of my Lucene indexing and re-indexing: 1. If index data exists inside index directory, remove all the index data. 2. Create IndexWriter with 256MB RAMBUFFERSIZE 3. Process

Indexing Weighted Tags per Document

2014-10-27 Thread Ralf Bierig
I want to index documents together with a list of tags (usually between 10 and 30) that represent meta information about the document. Normally, I would create an extra field "tag", store every tag, by its name, inside that field, and create my 10-30 fields that way, adding them to the document before

Re: Lucene not showing Low Score Doc

2014-10-27 Thread Shai Erera
I'm sorry, I still don't feel like I have all the information in order to help with the problem that you're seeing. Can you at least paste the contents of the documents and the query? Can you search with a TotalHitCountCollector only, and print the total number of hits? Shai On Mon, Oct 27,

Questions about the Lucene query language

2014-10-27 Thread Prad Nelluru
Hi everyone, I'm trying to understand how to use the Lucene query language. 1. Does Lucene support negative phrase queries like -"hello dolly"? Or do I need to subtract from some other term like: joy -"hello dolly"? My intention is to find all documents that do not have the words hello

Re: Questions about the Lucene query language

2014-10-27 Thread Jack Krupansky
Pure negative queries are not supported, but all you need to do is include *:*, which translates into MatchAllDocsQuery. "hello dolly" is the same as "hello dolly"~0 -- Jack Krupansky -Original Message- From: Prad Nelluru Sent: Monday, October 27, 2014 8:57 PM To:
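
Why a pure negative query needs *:* can be seen in a small plain-Java model of the semantics. This is only a sketch and does not call Lucene: the class `NegativeQueryModel` and its methods are hypothetical, documents are modeled as whitespace-separated strings, and a prohibited clause is modeled as removing matches from whatever candidate set the other clauses produced. With no other clause the candidate set is empty, so there is nothing to subtract from; a match-all clause supplies every document as a candidate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the query semantics; not Lucene code.
final class NegativeQueryModel {

    // True if the phrase tokens occur consecutively and in order (slop 0).
    static boolean hasPhrase(String doc, String phrase) {
        List<String> docTokens = Arrays.asList(doc.toLowerCase(Locale.ROOT).split("\\s+"));
        List<String> phraseTokens = Arrays.asList(phrase.toLowerCase(Locale.ROOT).split("\\s+"));
        return Collections.indexOfSubList(docTokens, phraseTokens) >= 0;
    }

    // Models the match-all clause: every document id is a candidate.
    static List<Integer> allDocs(List<String> docs) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            ids.add(i);
        }
        return ids;
    }

    // Models a prohibited phrase clause: it can only remove candidates.
    static List<Integer> exclude(List<Integer> candidates, List<String> docs, String phrase) {
        List<Integer> kept = new ArrayList<>();
        for (int id : candidates) {
            if (!hasPhrase(docs.get(id), phrase)) {
                kept.add(id);
            }
        }
        return kept;
    }
}
```

In this model, `exclude(allDocs(docs), docs, "hello dolly")` keeps every document that lacks the exact phrase, while `exclude` applied to an empty candidate list returns an empty list, which mirrors the behavior described for a query consisting only of a negative clause.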