[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488015#comment-14488015 ]

ASF subversion and git services commented on LUCENE-6339:
---------------------------------------------------------

Commit 1672458 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1672458 ]

LUCENE-6339: Maven config: add resource dir src/resources/ to the POM.

[suggest] Near real time Document Suggester
-------------------------------------------

Key: LUCENE-6339
URL: https://issues.apache.org/jira/browse/LUCENE-6339
Project: Lucene - Core
Issue Type: New Feature
Components: core/search
Affects Versions: 5.0
Reporter: Areek Zillur
Assignee: Areek Zillur
Fix For: Trunk, 5.1
Attachments: LUCENE-6339.patch, LUCENE-6339.patch, LUCENE-6339.patch, LUCENE-6339.patch, LUCENE-6339.patch, LUCENE-6339.patch, LUCENE-6339.patch

The idea is to index documents with one or more *SuggestField*(s) and be able to suggest documents with a *SuggestField* value that matches a given key.
A SuggestField can be assigned a numeric weight to be used to score the suggestion at query time.
Document suggestion can be done on an indexed *SuggestField*. The document suggester can filter out deleted documents in near real-time. The suggester can filter out documents based on a Filter (note: may change to a non-scoring query?) at query time.
A custom postings format (CompletionPostingsFormat) is used to index SuggestField(s) and perform document suggestions.

h4. Usage
{code:java}
// hook up custom postings format
// indexAnalyzer for SuggestField
Analyzer analyzer = ...
IndexWriterConfig config = new IndexWriterConfig(analyzer);
Codec codec = new Lucene50Codec() {
  PostingsFormat completionPostingsFormat = new Completion50PostingsFormat();

  @Override
  public PostingsFormat getPostingsFormatForField(String field) {
    if (isSuggestField(field)) {
      return completionPostingsFormat;
    }
    return super.getPostingsFormatForField(field);
  }
};
config.setCodec(codec);
IndexWriter writer = new IndexWriter(dir, config);

// index some documents with suggestions
Document doc = new Document();
doc.add(new SuggestField("suggest_title", "title1", 2));
doc.add(new SuggestField("suggest_name", "name1", 3));
writer.addDocument(doc);
...

// open an nrt reader for the directory
DirectoryReader reader = DirectoryReader.open(writer, false);
// SuggestIndexSearcher is a thin wrapper over IndexSearcher
// queryAnalyzer will be used to analyze the query string
SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader, queryAnalyzer);

// suggest 10 documents for "titl" on the suggest_title field
TopSuggestDocs suggest = indexSearcher.suggest("suggest_title", "titl", 10);
{code}

h4. Indexing
The index analyzer is set through *IndexWriterConfig*.
{code:java}
SuggestField(String name, String value, long weight)
{code}

h4. Query
The query analyzer is set through *SuggestIndexSearcher*.
Hits are collected in descending order of the suggestion's weight.
{code:java}
// full options for TopSuggestDocs (TopDocs)
TopSuggestDocs suggest(String field, CharSequence key, int num, Filter filter)

// full options for Collector
// note: only collects, does not score
void suggest(String field, CharSequence key, int num, Filter filter, TopSuggestDocsCollector collector)
{code}

h4. Analyzer
*CompletionAnalyzer* can be used instead to wrap another analyzer and tune parameters that apply only to suggest fields.
{code:java}
CompletionAnalyzer(Analyzer analyzer, boolean preserveSep, boolean preservePositionIncrements, int maxGraphExpansions)
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
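As a rough illustration of the query semantics described in the issue above (match a key against indexed suggestion values, optionally filter documents, return at most num hits in descending weight order), here is a minimal plain-Java sketch. It is not the Lucene implementation and uses no Lucene classes: ToySuggester, Entry, and the IntPredicate filter are hypothetical names invented for this example, and simple prefix matching on the raw string stands in for analyzed matching.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

/** Toy in-memory model of weighted, filtered suggestion lookup (hypothetical, not Lucene code). */
final class ToySuggester {

    /** One indexed suggestion: owning document id, suggestion text, and weight. */
    public static final class Entry {
        public final int docId;
        public final String value;
        public final long weight;

        public Entry(int docId, String value, long weight) {
            this.docId = docId;
            this.value = value;
            this.weight = weight;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Registers a suggestion value with a weight for the given document. */
    public void add(int docId, String value, long weight) {
        entries.add(new Entry(docId, value, weight));
    }

    /**
     * Returns up to num entries whose value starts with key and whose document
     * passes acceptDoc, ordered by descending weight.
     */
    public List<Entry> suggest(String key, int num, IntPredicate acceptDoc) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.value.startsWith(key) && acceptDoc.test(e.docId)) {
                hits.add(e);
            }
        }
        // Highest weight first, mirroring "descending order of the suggestion's weight".
        hits.sort(Comparator.comparingLong((Entry e) -> e.weight).reversed());
        return new ArrayList<>(hits.subList(0, Math.min(num, hits.size())));
    }

    public static void main(String[] args) {
        ToySuggester s = new ToySuggester();
        s.add(0, "title1", 2);
        s.add(1, "title2", 5);
        s.add(2, "other", 9);
        // Accept every document; a stricter predicate could model deletions or a filter.
        for (Entry e : s.suggest("titl", 10, doc -> true)) {
            System.out.println(e.docId + " " + e.value + " " + e.weight);
        }
    }
}
```

The predicate plays the role that deleted-document filtering and the Filter argument play in the description: a document that fails it never reaches the result list, whatever its weight.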
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488027#comment-14488027 ]

ASF subversion and git services commented on LUCENE-6339:
---------------------------------------------------------

Commit 1672461 from [~steve_rowe] in branch 'dev/branches/lucene_solr_5_1'
[ https://svn.apache.org/r1672461 ]

LUCENE-6339: Maven config: add resource dir src/resources/ to the POM. (merged trunk r1672458)
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488021#comment-14488021 ]

ASF subversion and git services commented on LUCENE-6339:
---------------------------------------------------------

Commit 1672459 from [~steve_rowe] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1672459 ]

LUCENE-6339: Maven config: add resource dir src/resources/ to the POM. (merged trunk r1672458)
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483734#comment-14483734 ]

ASF subversion and git services commented on LUCENE-6339:
---------------------------------------------------------

Commit 1671914 from [~areek] in branch 'dev/trunk'
[ https://svn.apache.org/r1671914 ]

LUCENE-6339: fix test (take into account inadmissible filtered search for multiple segments)
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483736#comment-14483736 ]

ASF subversion and git services commented on LUCENE-6339:
---------------------------------------------------------

Commit 1671916 from [~areek] in branch 'dev/branches/lucene_solr_5_1'
[ https://svn.apache.org/r1671916 ]

LUCENE-6339: fix test (take into account inadmissible filtered search for multiple segments)
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483735#comment-14483735 ]

ASF subversion and git services commented on LUCENE-6339:
---------------------------------------------------------

Commit 1671915 from [~areek] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1671915 ]

LUCENE-6339: fix test (take into account inadmissible filtered search for multiple segments)
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395194#comment-14395194 ]

ASF subversion and git services commented on LUCENE-6339:
---------------------------------------------------------

Commit 1671196 from [~areek] in branch 'dev/trunk'
[ https://svn.apache.org/r1671196 ]

LUCENE-6339: fix test (ensure the maximum requested size is bounded to 1000)
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395174#comment-14395174 ]

ASF subversion and git services commented on LUCENE-6339:
---------------------------------------------------------

Commit 1671187 from [~areek] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1671187 ]

LUCENE-6339: fix test (ensure the maximum requested size is bounded to 1000)
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395176#comment-14395176 ]

ASF subversion and git services commented on LUCENE-6339:
---------------------------------------------------------

Commit 1671189 from [~areek] in branch 'dev/branches/lucene_solr_5_1'
[ https://svn.apache.org/r1671189 ]

LUCENE-6339: fix test (ensure the maximum requested size is bounded to 1000)
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393425#comment-14393425 ]

ASF subversion and git services commented on LUCENE-6339:

Commit 1670969 from [~areek] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1670969 ]

LUCENE-6339: fix test bug (ensure opening nrt reader with applyAllDeletes)
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393474#comment-14393474 ]

ASF subversion and git services commented on LUCENE-6339:

Commit 1670978 from [~areek] in branch 'dev/branches/lucene_solr_5_1'
[ https://svn.apache.org/r1670978 ]

LUCENE-6339: fix test bug (ensure opening nrt reader with applyAllDeletes)
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393438#comment-14393438 ]

ASF subversion and git services commented on LUCENE-6339:

Commit 1670972 from [~areek] in branch 'dev/trunk'
[ https://svn.apache.org/r1670972 ]

LUCENE-6339: fix test bug (ensure opening nrt reader with applyAllDeletes)
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384405#comment-14384405 ]

Michael McCandless commented on LUCENE-6339:

I think the tie break should be a.doc < b.doc, for consistency with Lucene? I.e., on a score tie, the smaller doc ID should sort earlier than the bigger doc ID?

Otherwise +1 to commit! Thanks [~areek]!
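The tie-break rule proposed in this comment can be sketched in plain Java. This is only an illustration, not the code from the patch: the `Hit` class and the `rankedDocs` helper are hypothetical stand-ins and do not use any Lucene types. The comparator orders hits by descending score and, on a score tie, puts the smaller doc ID first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SuggestTieBreak {
    /** Hypothetical stand-in for a suggestion hit: a doc ID plus its weight-derived score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Higher score first; on a score tie, the smaller doc ID sorts earlier. */
    static final Comparator<Hit> ORDER = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    };

    /** Returns the doc IDs of the given hits in ranked order (input list is not modified). */
    static List<Integer> rankedDocs(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(ORDER);
        List<Integer> docs = new ArrayList<>();
        for (Hit h : sorted) {
            docs.add(h.doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(new Hit(7, 3f), new Hit(2, 3f), new Hit(5, 9f));
        System.out.println(rankedDocs(hits)); // prints [5, 2, 7]
    }
}
```

Docs 2 and 7 share a score of 3, so doc 2 is ranked ahead of doc 7, which makes the ordering deterministic across runs.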
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384583#comment-14384583 ]

Uwe Schindler commented on LUCENE-6339:

I just reviewed the patch, too. I like the API, but have not yet looked into it closely like Mike - I just skimmed it.

Just one question: What happens if 2 documents have the same SuggestField and same suggestion presented to user? This would now produce duplicates, right? I was just thinking about how to prevent that (coming from Elasticsearch world).
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384604#comment-14384604 ]

Uwe Schindler commented on LUCENE-6339:

+1
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384631#comment-14384631 ]

Areek Zillur commented on LUCENE-6339:

Hi [~thetaphi],

Thanks for the review! If two documents do have the same suggestion for the same SuggestField, it will produce duplicates in terms of the suggestion, but because they are from two documents (different doc ids) they are not considered as duplicates. Maybe we can add a boolean flag in the NRTSuggester to only collect unique suggestions, but then we will have to decide on which suggestion to throw out, as they are now tied to doc ids?
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384823#comment-14384823 ]

ASF subversion and git services commented on LUCENE-6339:

Commit 1669698 from [~areek] in branch 'dev/trunk'
[ https://svn.apache.org/r1669698 ]

LUCENE-6339: Added Near-real time Document Suggester via custom postings format
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384807#comment-14384807 ]

Uwe Schindler commented on LUCENE-6339:
---

bq. If two documents do have the same suggestion for the same SuggestField, it will produce duplicates in terms of the suggestion, but because they are from two documents (different doc ids) they are not considered as duplicates.

Yeah, that's what I mean by duplicate. The suggester only returns doc ids. For display to the user, you would read a stored field (the actual suggestion), and this produces the duplicate. I am not sure how to solve that. It was just an idea. If this is really an issue, one could filter the duplicates afterwards.
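The idea of filtering duplicates afterwards could look roughly like the following. This is only a sketch in plain Java, not code from the patch: the {{Hit}} holder, its fields, and the {{dedupe}} method are hypothetical names, and the input is assumed to be already sorted by descending weight, with the display text read from a stored field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SuggestionDedup {
    /** Hypothetical holder for one suggested document (not a Lucene class). */
    public static final class Hit {
        public final int docId;    // doc id returned by the suggester
        public final String text;  // display text, e.g. read from a stored field
        public final long weight;  // suggestion weight

        public Hit(int docId, String text, long weight) {
            this.docId = docId;
            this.text = text;
            this.weight = weight;
        }
    }

    /**
     * Keeps the first hit for each distinct display text.
     * Assumes the input is ordered by descending weight, so the kept hit
     * is the highest-weighted one for that text.
     */
    public static List<Hit> dedupe(List<Hit> hits) {
        Map<String, Hit> byText = new LinkedHashMap<>();
        for (Hit hit : hits) {
            byText.putIfAbsent(hit.text, hit); // later duplicates are dropped
        }
        return new ArrayList<>(byText.values());
    }
}
```

One consequence of filtering after the fact is that the caller may end up with fewer than the requested number of suggestions, so over-fetching might be needed.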
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384857#comment-14384857 ]

Uwe Schindler commented on LUCENE-6339:
---

Indeed the suggestion does not need to come from a stored field of the result document, nice! But one could use that to add additional suggestion information, right - instead of the payload?
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384859#comment-14384859 ]

ASF subversion and git services commented on LUCENE-6339:
-

Commit 1669703 from [~areek] in branch 'dev/trunk' [ https://svn.apache.org/r1669703 ]

LUCENE-6339: move changes entry from 6.0.0 to 5.1.0
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384865#comment-14384865 ]

Areek Zillur commented on LUCENE-6339:
--

Yes [~thetaphi], that is the idea :). The payload option has been removed entirely. Now, instead of using payloads, one can grab any associated values from the document with each suggestion.
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384976#comment-14384976 ]

Areek Zillur commented on LUCENE-6339:
--

Committed to branch_5x with revision r1669715 (missed out on prepending the commit message with jira #)
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382992#comment-14382992 ]

Michael McCandless commented on LUCENE-6339:

Patch looks great!

Can we pull out SuggestScoreDocPQ into its own .java source? Should its lessThan method tie break by docID?

I think the logic to compute maxQueueSize in getMaxTopNSearcherQueueSize could possibly overflow int? Maybe use long, and then cast back to int after the Math.min?
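The two review points about int overflow and tie breaking can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: the formula (topN times a multiplier), the {{MAX_QUEUE_SIZE}} cap, and the method names are hypothetical and not taken from the patch. The tie-break direction (for equal scores, the higher doc id is treated as "less") follows the convention commonly used for top-docs queues.

```java
public class QueueSizing {
    /** Hypothetical upper bound; the cap used in the patch may differ. */
    public static final int MAX_QUEUE_SIZE = 5000;

    /**
     * Computes the queue size in long arithmetic and casts back to int only
     * after Math.min, so the multiplication cannot wrap around.
     */
    public static int maxQueueSize(int topN, int multiplier) {
        long size = (long) topN * (long) multiplier; // widen before multiplying
        return (int) Math.min((long) MAX_QUEUE_SIZE, size);
    }

    /**
     * lessThan with a doc id tie break: a lower score is "less"; for equal
     * scores the higher doc id is "less", so lower doc ids rank first.
     */
    public static boolean lessThan(float scoreA, int docA, float scoreB, int docB) {
        if (scoreA == scoreB) {
            return docA > docB;
        }
        return scoreA < scoreB;
    }
}
```

With plain int arithmetic, {{Integer.MAX_VALUE * 2}} wraps to a negative value and Math.min would then return that negative number instead of the cap.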
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377513#comment-14377513 ]

Michael McCandless commented on LUCENE-6339:

New patch looks great, thanks [~areek]!

In TopSuggestDocsCollector:
- In collect, we seem to assume the suggest searcher will never call collect more than num times? How is that? If so, can you add that to the javadocs, and maybe add an assert upto < num in collect?
- Can we just allocate scoreDocs up front instead of lazily?
- In the javadocs, instead of "one hit can be..." maybe "one doc can be..."? Hit is a tricky word in this context since it could be a doc or a suggestion...

In SuggestIndexSearcher, does it really ever make sense to take a generic Collector/LeafCollector? Can we instead just strongly type the params to all the methods to be TopSuggestDocsCollector?

"In case a filter has to be applied, the queue size is doubled" is not quite correct? Maybe change the logic there so the int queueSize is first computed, and then if filter is enabled, it's doubled?

Can we remove the separate WeightProcessor class and just make encode/decode static methods on NRTSuggester? We can add back abstractions later if users somehow need control over weight encoding...

Can we add a test that tests the extreme case of nearly all docs filtered out and another test with nearly all docs deleted?

[suggest] Near real time Document Suggester --- Key: LUCENE-6339 URL: https://issues.apache.org/jira/browse/LUCENE-6339 Project: Lucene - Core Issue Type: New Feature Components: core/search Affects Versions: 5.0 Reporter: Areek Zillur Assignee: Areek Zillur Fix For: 5.0 Attachments: LUCENE-6339.patch, LUCENE-6339.patch, LUCENE-6339.patch The idea is to index documents with one or more *SuggestField*(s) and be able to suggest documents with a *SuggestField* value that matches a given key. A SuggestField can be assigned a numeric weight to be used to score the suggestion at query time.
Document suggestion can be done on an indexed *SuggestField*. The document suggester can filter out deleted documents in near real-time. The suggester can filter out documents based on a Filter (note: may change to a non-scoring query?) at query time. A custom postings format (CompletionPostingsFormat) is used to index SuggestField(s) and perform document suggestions. h4. Usage {code:java} // hook up custom postings format // indexAnalyzer for SuggestField Analyzer analyzer = ... IndexWriterConfig config = new IndexWriterConfig(analyzer); Codec codec = new Lucene50Codec() { PostingsFormat completionPostingsFormat = new Completion50PostingsFormat(); @Override public PostingsFormat getPostingsFormatForField(String field) { if (isSuggestField(field)) { return completionPostingsFormat; } return super.getPostingsFormatForField(field); } }; config.setCodec(codec); IndexWriter writer = new IndexWriter(dir, config); // index some documents with suggestions Document doc = new Document(); doc.add(new SuggestField(suggest_title, title1, 2)); doc.add(new SuggestField(suggest_name, name1, 3)); writer.addDocument(doc) ... // open an nrt reader for the directory DirectoryReader reader = DirectoryReader.open(writer, false); // SuggestIndexSearcher is a thin wrapper over IndexSearcher // queryAnalyzer will be used to analyze the query string SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader, queryAnalyzer); // suggest 10 documents for titl on suggest_title field TopSuggestDocs suggest = indexSearcher.suggest(suggest_title, titl, 10); {code} h4. Indexing Index analyzer set through *IndexWriterConfig* {code:java} SuggestField(String name, String value, long weight) {code} h4. Query Query analyzer set through *SuggestIndexSearcher*. 
Hits are collected in descending order of the suggestion's weight {code:java} // full options for TopSuggestDocs (TopDocs) TopSuggestDocs suggest(String field, CharSequence key, int num, Filter filter) // full options for Collector // note: only collects does not score void suggest(String field, CharSequence key, int maxNumPerLeaf, Filter filter, Collector collector) {code} h4. Analyzer *CompletionAnalyzer* can be used instead to wrap another analyzer to tune suggest field only parameters. {code:java} CompletionAnalyzer(Analyzer analyzer, boolean preserveSep, boolean preservePositionIncrements, int maxGraphExpansions) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail:
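The review comments on TopSuggestDocsCollector (allocate up front, never collect more than num times) and on the queue size (compute first, then double when a filter is enabled) might be sketched as below. This is a plain-Java illustration with hypothetical names ({{BoundedSuggestCollector}}, {{queueSize}}, the {{factor}} and {{cap}} parameters), not the collector from the patch. It throws an exception where the review suggests an assert, so the behaviour is visible without enabling assertions.

```java
public class BoundedSuggestCollector {
    private final int[] docs;     // allocated up front rather than lazily
    private final float[] scores;
    private int upto;             // number of docs collected so far

    public BoundedSuggestCollector(int num) {
        if (num <= 0) {
            throw new IllegalArgumentException("num must be > 0");
        }
        this.docs = new int[num];
        this.scores = new float[num];
    }

    /** Collects one doc; the caller must not call this more than num times. */
    public void collect(int docID, float score) {
        if (upto >= docs.length) {
            throw new IllegalStateException("collect called more than num times");
        }
        docs[upto] = docID;
        scores[upto] = score;
        upto++;
    }

    public int size() {
        return upto;
    }

    public int doc(int i) {
        return docs[i];
    }

    /**
     * Hypothetical queue sizing: the base size is computed first and doubled
     * only when a filter is applied; long arithmetic avoids int overflow.
     */
    public static int queueSize(int num, int factor, boolean hasFilter, int cap) {
        long size = (long) num * factor;
        if (hasFilter) {
            size *= 2;
        }
        return (int) Math.min((long) cap, size);
    }
}
```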
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351191#comment-14351191 ]

Areek Zillur commented on LUCENE-6339:
--

{quote}
you fetch the checksum for the dict file in {{CompletionFieldsProducer#ctor}} via {{CodecUtil.retrieveChecksum(dictIn);}} but you ignore its return value, was this intended? I think you don't wanna do that here? Did you intend to check the entire file?

I wonder if we should just write one file for both, the index and the FSTs? What's the benefit from having two?
{quote}

This was intentional; I used the same convention as for {{BlockTreeTermsReader#termsIn}} here. The thought was that doing the checksum check would be very costly, since in most cases the {{dict}} file would be large? If we write one file instead of two, then the checksum check would be more expensive for the index than it is now?

[suggest] Near real time Document Suggester --- Key: LUCENE-6339 URL: https://issues.apache.org/jira/browse/LUCENE-6339 Project: Lucene - Core Issue Type: New Feature Components: core/search Affects Versions: 5.0 Reporter: Areek Zillur Assignee: Areek Zillur Fix For: 5.0 Attachments: LUCENE-6339.patch, LUCENE-6339.patch The idea is to index documents with one or more *SuggestField*(s) and be able to suggest documents with a *SuggestField* value that matches a given key. A SuggestField can be assigned a numeric weight to be used to score the suggestion at query time. Document suggestion can be done on an indexed *SuggestField*. The document suggester can filter out deleted documents in near real-time. The suggester can filter out documents based on a Filter (note: may change to a non-scoring query?) at query time. A custom postings format (CompletionPostingsFormat) is used to index SuggestField(s) and perform document suggestions. h4. Usage {code:java} // hook up custom postings format // indexAnalyzer for SuggestField Analyzer analyzer = ... 
IndexWriterConfig config = new IndexWriterConfig(analyzer); Codec codec = new Lucene50Codec() { @Override public PostingsFormat getPostingsFormatForField(String field) { if (isSuggestField(field)) { return new CompletionPostingsFormat(super.getPostingsFormatForField(field)); } return super.getPostingsFormatForField(field); } }; config.setCodec(codec); IndexWriter writer = new IndexWriter(dir, config); // index some documents with suggestions Document doc = new Document(); doc.add(new SuggestField(suggest_title, title1, 2)); doc.add(new SuggestField(suggest_name, name1, 3)); writer.addDocument(doc) ... // open an nrt reader for the directory DirectoryReader reader = DirectoryReader.open(writer, false); // SuggestIndexSearcher is a thin wrapper over IndexSearcher // queryAnalyzer will be used to analyze the query string SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader, queryAnalyzer); // suggest 10 documents for titl on suggest_title field TopSuggestDocs suggest = indexSearcher.suggest(suggest_title, titl, 10); {code} h4. Indexing Index analyzer set through *IndexWriterConfig* {code:java} SuggestField(String name, String value, long weight) {code} h4. Query Query analyzer set through *SuggestIndexSearcher*. Hits are collected in descending order of the suggestion's weight {code:java} // full options for TopSuggestDocs (TopDocs) TopSuggestDocs suggest(String field, CharSequence key, int num, Filter filter) // full options for Collector // note: only collects does not score void suggest(String field, CharSequence key, int maxNumPerLeaf, Filter filter, Collector collector) {code} h4. Analyzer *CompletionAnalyzer* can be used instead to wrap another analyzer to tune suggest field only parameters. {code:java} CompletionAnalyzer completionAnalyzer = new CompletionAnalyzer(analyzer); completionAnalyzer.setPreserveSep(..) completionAnalyzer.setPreservePositionsIncrements(..) completionAnalyzer.setMaxGraphExpansions(..) 
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
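The cost trade-off discussed in this comment (reading a stored checksum versus verifying the whole file) can be shown with a toy example. This is not Lucene's file format or the {{CodecUtil}} implementation; the class, the 8-byte CRC32 footer layout, and the method names are assumptions made purely for illustration.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class FooterChecksum {
    /** Appends an 8-byte CRC32 footer to the payload (toy format, not Lucene's). */
    public static byte[] withFooter(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 8)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    /** Cheap: reads only the stored footer value; the body is not verified. */
    public static long retrieve(byte[] file) {
        return ByteBuffer.wrap(file, file.length - 8, 8).getLong();
    }

    /** Expensive: recomputes the CRC over the whole body and compares it to the footer. */
    public static boolean verify(byte[] file) {
        CRC32 crc = new CRC32();
        crc.update(file, 0, file.length - 8);
        return crc.getValue() == retrieve(file);
    }
}
```

In this toy model, {{retrieve}} touches 8 bytes regardless of file size, while {{verify}} reads every byte, which is why a full check on a large dictionary file at open time would be costly.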
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350351#comment-14350351 ] Simon Willnauer commented on LUCENE-6339: - Hey Areek, I agree with mike this looks awesome... lemme give you some comments * can we make {{CompletionAnalyzer}} immutable by any chance? I'd really like to not have setters if possible? For that I guess it's constants need to be public as well? * is {{private boolean isReservedInputCharacter(char c) }} needed since we then afterwards check it again in the {{checkKey}} method, maybe you just wanna use a switch here? * In {{CompletionFieldsConsumer#close()}} I think we need to make sure {{IOUtils.close(dictOut);}} is also called if an exception is hit? * do we need the extra {{InputStreamDataInput}} in {{CompletionTermWriter#parse}}, I mean it's a byte input stream so we should be able to read all of the bytes? * {{SuggestPayload}} doesn't need a default ctor * can we use {{ if (success == false) }} instead of {{ if (!success) }} as a pattern in general? * use try / finally in {{CompletionFieldsProducer#close()}} to ensure all resource are closed or pass both the dict and {{ delegateFieldsProducer }} to IOUtils#close()? * you fetch the checksum for the dict file in {{ CompletionFieldsProducer#ctor }} via {{ CodecUtil.retrieveChecksum(dictIn); } but you ignore it's return value, was this intended? I think you don't wanna do that here? Did you intend to check the entire file? * I wonder if we should just write one file for both, the index and the FSTs? What's the benefit from having two? 
For loading the dict you put a comment in there saying {{ // is there a better way of doing this?}} I think what you need to do is this:

{code}
public synchronized SegmentLookup lookup() throws IOException {
  if (lookup == null) {
    try (IndexInput dictClone = dictIn.clone()) { // let multiple fields load concurrently
      dictClone.seek(offset); // this is your field private clone
      lookup = NRTSuggester.load(dictClone);
    }
  }
  return lookup;
}
{code}

I'd appreciate a test that this works just fine, i.e. loading multiple FSTs concurrently. I didn't get further than this due to lack of time, but I will come back to this either today or tomorrow. Good stuff, Areek!

[suggest] Near real time Document Suggester
---

Key: LUCENE-6339
URL: https://issues.apache.org/jira/browse/LUCENE-6339
Project: Lucene - Core
Issue Type: New Feature
Components: core/search
Affects Versions: 5.0
Reporter: Areek Zillur
Assignee: Areek Zillur
Fix For: 5.0
Attachments: LUCENE-6339.patch

The idea is to index documents with one or more *SuggestField*(s) and be able to suggest documents with a *SuggestField* value that matches a given key. A SuggestField can be assigned a numeric weight to be used to score the suggestion at query time.

Document suggestion can be done on an indexed *SuggestField*. The document suggester can filter out deleted documents in near real-time. The suggester can filter out documents based on a Filter (note: may change to a non-scoring query?) at query time.

A custom postings format (CompletionPostingsFormat) is used to index SuggestField(s) and perform document suggestions.

h4. Usage

{code:java}
// hook up custom postings format
// indexAnalyzer for SuggestField
Analyzer analyzer = ...
IndexWriterConfig config = new IndexWriterConfig(analyzer);
Codec codec = new Lucene50Codec() {
  @Override
  public PostingsFormat getPostingsFormatForField(String field) {
    if (isSuggestField(field)) {
      return new CompletionPostingsFormat(super.getPostingsFormatForField(field));
    }
    return super.getPostingsFormatForField(field);
  }
};
config.setCodec(codec);
IndexWriter writer = new IndexWriter(dir, config);
// index some documents with suggestions
Document doc = new Document();
doc.add(new SuggestField("suggest_title", "title1", 2));
doc.add(new SuggestField("suggest_name", "name1", 3));
writer.addDocument(doc);
...
// open an nrt reader for the directory
DirectoryReader reader = DirectoryReader.open(writer, false);
// SuggestIndexSearcher is a thin wrapper over IndexSearcher
// queryAnalyzer will be used to analyze the query string
SuggestIndexSearcher indexSearcher = new SuggestIndexSearcher(reader, queryAnalyzer);
// suggest 10 documents for "titl" on the "suggest_title" field
TopSuggestDocs suggest = indexSearcher.suggest("suggest_title", "titl", 10);
{code}

h4. Indexing

Index analyzer set through *IndexWriterConfig*

{code:java}
SuggestField(String name, String value, long weight)
{code}

h4. Query

Query analyzer set through *SuggestIndexSearcher*.
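The review comment in this message asks for a test showing that lazily loading several per-field lookups concurrently works. The following is a rough JDK-only sketch of what such a check could look like. {{LazyLookupSketch}}, {{FieldLookup}} and {{hammer}} are hypothetical names, and the string assignment stands in for loading an FST from a cloned input; none of this is the patch's actual code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyLookupSketch {

    /** Hypothetical per-field holder; the "load" stands in for reading an FST at an offset. */
    public static final class FieldLookup {
        private final long offset;
        private final AtomicInteger loadCount = new AtomicInteger();
        private String lookup; // guarded by "this"

        public FieldLookup(long offset) {
            this.offset = offset;
        }

        /** Loads at most once per field; the lock is per field, so fields load independently. */
        public synchronized String lookup() {
            if (lookup == null) {
                loadCount.incrementAndGet();
                lookup = "loaded@" + offset;
            }
            return lookup;
        }

        public int loadCount() {
            return loadCount.get();
        }
    }

    /** Calls lookup() on every field from many threads at once; true if all threads finished. */
    public static boolean hammer(FieldLookup[] fields, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers together
                    for (FieldLookup field : fields) {
                        field.lookup();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            return pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        FieldLookup[] fields = { new FieldLookup(0), new FieldLookup(128) };
        boolean finished = hammer(fields, 8);
        for (FieldLookup field : fields) {
            System.out.println(field.lookup() + " loads=" + field.loadCount());
        }
        System.out.println("finished=" + finished);
    }
}
```

Because each holder synchronizes on itself, two threads asking for the same field serialize, while threads asking for different fields do not block each other. A real test would presumably exercise the actual postings format rather than a stand-in like this.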
[jira] [Commented] (LUCENE-6339) [suggest] Near real time Document Suggester
[ https://issues.apache.org/jira/browse/LUCENE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349653#comment-14349653 ]

Michael McCandless commented on LUCENE-6339:

This looks really nice!

I think AutomatonUtil is (nearly?) the same thing as TokenStreamToAutomaton? Can we somehow consolidate the two?

When I try to run {{ant test}} with the patch on current 5.x some things are angry:

{noformat}
[mkdir] Created dir: /l/areek/lucene/build/suggest/classes/java
[javac] Compiling 65 source files to /l/areek/lucene/build/suggest/classes/java
[javac] /l/areek/lucene/suggest/src/java/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.java:597: warning: [cast] redundant cast to TopFieldDocs
[javac]   TopFieldDocs hits = (TopFieldDocs) c.topDocs();
[javac]   ^
[javac] /l/areek/lucene/suggest/src/java/org/apache/lucene/search/suggest/document/NRTSuggester.java:208: error: local variable collector is accessed from within inner class; needs to be declared final
[javac]   collector.collect(docID);
[javac]   ^
[javac] /l/areek/lucene/suggest/src/java/org/apache/lucene/search/suggest/document/CompletionFieldsProducer.java:164: error: CompletionFieldsProducer.CompletionsTermsReader is not abstract and does not override abstract method getChildResources() in Accountable
[javac]   private class CompletionsTermsReader implements Accountable {
[javac]   ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] 2 errors
[javac] 1 warning
{noformat}

Not sure why we need an FSTBuilder inside the NRTSuggesterBuilder; can't the first be absorbed into the latter?

Can NRTSuggesterBuilder be package private? I.e. the public API here is the postings format and SuggestIndexSearcher / SuggestTopDocs? I think other things can be private, e.g. CompletionTokenStream.

Can you use CodecUtil.writeIndexHeader when storing the FST? It also stores the segment ID and file extension in the header. And then CodecUtil.checkIndexHeader at read-time.

CompletionTermsReader.lookup() should be sync'd? Else two threads could try to use the IndexInput (dictIn) at once?

Maybe we should move the code in SuggestIndexSearcher.suggest into a new TopSuggestDocs.merge method?

Do we really need the separate SegmentLookup interface? Seems like we can just invoke the lookup method directly on CompletionTerms?

Why do we allow -1 weight? And why do we restrict to int not long (other suggesters are long I think, though it does seem like overkill!).