[jira] [Updated] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings
[ https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-5594: --- Attachment: SOLR-5594.patch Patch with the following changes: * Fixes to SimpleQParserPlugin and PrefixQParserPlugin * A test showing that prefix queries on integer fields work as they did prior to this change * Tests showing how two different custom field types override the getPrefixQuery() method Enable using extended field types with prefix queries for non-default encoded strings - Key: SOLR-5594 URL: https://issues.apache.org/jira/browse/SOLR-5594 Project: Solr Issue Type: Improvement Components: query parsers, Schema and Analysis Affects Versions: 4.6 Reporter: Anshum Gupta Assignee: Anshum Gupta Priority: Minor Attachments: SOLR-5594-branch_4x.patch, SOLR-5594.patch, SOLR-5594.patch, SOLR-5594.patch Enable users to use prefix queries with custom field types that apply non-default encoding/decoding to query strings, e.g. a custom field that works with base64-encoded query strings. Currently, the workaround is to override at the getRewriteMethod level. Perhaps having the prefix query also use the calling FieldType's readableToIndexed method would work better. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
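To make the base64 use-case concrete: the idea is that a base64-aware field type decodes the external query string into its indexed form (which is what a FieldType's readableToIndexed is for), so a prefix query runs against the decoded prefix rather than the raw base64 text. Below is a minimal, self-contained sketch of just that decoding step; Base64FieldSketch and its method are hypothetical stand-ins, not Solr's FieldType API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of the decoding step a base64-aware field type might
// perform; this is NOT Solr's FieldType API.
public class Base64FieldSketch {
    // Analogue of readableToIndexed: map the external (base64) query string
    // to the form that was actually indexed.
    static String readableToIndexed(String external) {
        return new String(Base64.getDecoder().decode(external), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "aGVs" is base64 for "hel": a prefix query would then match terms
        // starting with the decoded prefix, not the raw base64 characters.
        System.out.println(readableToIndexed("aGVs"));
    }
}
```

In a real custom field type, this decoding would live inside the FieldType subclass so every query path, including prefix queries, goes through it.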
[jira] [Updated] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings
[ https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-5594: --- Attachment: (was: SOLR-5594.patch)
[jira] [Updated] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings
[ https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-5594: --- Attachment: SOLR-5594.patch There was something wrong with the last patch. Here's another one.
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866512#comment-13866512 ] ASF subversion and git services commented on LUCENE-5376: - Commit 1556775 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556775 ] LUCENE-5376: allow setting norms format, including compressed norms Add a demo search server Key: LUCENE-5376 URL: https://issues.apache.org/jira/browse/LUCENE-5376 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: lucene-demo-server.tgz I think it'd be useful to have a demo search server for Lucene. Rather than being fully featured, like Solr, it would be minimal, just wrapping the existing Lucene modules to show how you can make use of these features in a server setting. The purpose is to demonstrate how one can build a minimal search server on top of APIs like SearcherManager, SearcherLifetimeManager, etc. This is also useful for finding rough edges / issues in Lucene's APIs that make building a server unnecessarily hard. I don't think it should have back-compatibility promises (except Lucene's index back compatibility), so it's free to improve as Lucene's APIs change. As a starting point, I'll post what I built for the eating-your-own-dog-food search app for Lucene/Solr's jira issues http://jirasearch.mikemccandless.com (blog: http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It uses Netty to expose basic indexing/searching APIs via JSON, but it's very rough (lots of nocommits).
[jira] [Commented] (LUCENE-4906) PostingsHighlighter's PassageFormatter should allow for rendering to arbitrary objects
[ https://issues.apache.org/jira/browse/LUCENE-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866535#comment-13866535 ] ASF subversion and git services commented on LUCENE-4906: - Commit 1556786 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556786 ] LUCENE-4906, LUCENE-5376: using the expert 'render to Object' APIs in PostingsHighlighter to render directly to JSONArray in lucene server PostingsHighlighter's PassageFormatter should allow for rendering to arbitrary objects -- Key: LUCENE-4906 URL: https://issues.apache.org/jira/browse/LUCENE-4906 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.6, 5.0 Attachments: LUCENE-4906.patch, LUCENE-4906.patch, LUCENE-4906.patch For example, in a server, I may want to render the highlight result to a JsonObject to send back to the front-end. Today, since we render to a String, I have to render to a JSON string and then re-parse it to a JsonObject, which is inefficient... Or (Rob's idea): if we make a query that's like MoreLikeThis but pulls terms from snippets instead, so you get proximity-influenced salient/expanded terms, then perhaps that renders to just an array of tokens or fragments from each snippet.
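The "render to arbitrary objects" idea can be sketched with a toy formatter that is generic over its output type, so a server can render snippets directly into a JSON-array-like structure instead of building a String that must be re-parsed. All names below are hypothetical illustrations, not Lucene's PassageFormatter API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration of rendering highlights to arbitrary objects: the output
// type is a type parameter, so a server can pick a JSON-array-like structure.
// Hypothetical names; not Lucene's PassageFormatter API.
public class ObjectFormatterSketch {
    interface SnippetFormatter<T> {
        T format(List<String> snippets);
    }

    // The old fixed behavior: render everything into one String.
    static final SnippetFormatter<String> TO_STRING =
        snippets -> String.join(" ... ", snippets);

    // Render directly to a list (standing in for a JSON array), avoiding the
    // render-to-String-then-reparse round trip.
    static final SnippetFormatter<List<String>> TO_ARRAY =
        snippets -> new ArrayList<>(snippets);

    public static void main(String[] args) {
        List<String> snippets = Arrays.asList("first <b>hit</b>", "second <b>hit</b>");
        System.out.println(TO_STRING.format(snippets));
        System.out.println(TO_ARRAY.format(snippets));
    }
}
```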
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866536#comment-13866536 ] ASF subversion and git services commented on LUCENE-5376: - Commit 1556786 from [~mikemccand] in branch 'dev/branches/lucene5376' [ https://svn.apache.org/r1556786 ] LUCENE-4906, LUCENE-5376: using the expert 'render to Object' APIs in PostingsHighlighter to render directly to JSONArray in lucene server
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866561#comment-13866561 ] Benson Margulies commented on LUCENE-5388: -- Should I try to get the branch in git to match the .patch, or should I just let you proceed from here? I guess that might depend on reactions of others. Eliminate construction over readers for Tokenizer - Key: LUCENE-5388 URL: https://issues.apache.org/jira/browse/LUCENE-5388 Project: Lucene - Core Issue Type: Improvement Components: core/other Reporter: Benson Margulies Attachments: LUCENE-5388.patch In the modern world, Tokenizers are intended to be reusable, with input supplied via #setReader. The constructors that take Reader are a vestige. Worse yet, they invite people to make mistakes in handling the reader that tangle them up with the state machine in Tokenizer. The sensible thing is to eliminate these ctors, and force setReader usage.
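The reuse pattern the issue argues for (input supplied only via setReader, never through a constructor) might look like this toy tokenizer, a hypothetical stand-in and not Lucene's actual Tokenizer class:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

// Toy stand-in for a reusable tokenizer: no Reader in the constructor,
// input arrives only through setReader(). Hypothetical; not Lucene's API.
public class ReusableTokenizerSketch {
    private Reader input;

    // The single entry point for input, as the issue proposes.
    public void setReader(Reader r) {
        this.input = r;
    }

    // Splits the current input on whitespace.
    public String[] tokenize() {
        try (BufferedReader br = new BufferedReader(input)) {
            StringBuilder sb = new StringBuilder();
            for (int c; (c = br.read()) != -1; ) {
                sb.append((char) c);
            }
            return sb.toString().trim().split("\\s+");
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        ReusableTokenizerSketch t = new ReusableTokenizerSketch();
        t.setReader(new StringReader("hello tokenizer world"));
        System.out.println(Arrays.toString(t.tokenize()));
        // Reuse the same instance: just supply a new Reader.
        t.setReader(new StringReader("reused instance"));
        System.out.println(Arrays.toString(t.tokenize()));
    }
}
```

With no Reader-taking constructor, there is exactly one way to hand input to the instance, which is the point of removing the ctors.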
[jira] [Commented] (SOLR-5213) collections?action=SPLITSHARD parent vs. sub-shards numDocs
[ https://issues.apache.org/jira/browse/SOLR-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866565#comment-13866565 ] Ramkumar Aiyengar commented on SOLR-5213: - Shalin, any objection to this patch going in? Maybe with SOLR-5338, the severity of the 0-shard case can be reduced from log.error, but the patch should be good otherwise. collections?action=SPLITSHARD parent vs. sub-shards numDocs --- Key: SOLR-5213 URL: https://issues.apache.org/jira/browse/SOLR-5213 Project: Solr Issue Type: Improvement Components: update Affects Versions: 4.4 Reporter: Christine Poerschke Assignee: Shalin Shekhar Mangar Attachments: SOLR-5213.patch The problem we saw was that splitting a shard took a long time, and at the end of it the sub-shards contained fewer documents than the original shard. The root cause was eventually tracked down to the disappearing documents not falling into the hash ranges of the sub-shards. Could SolrIndexSplitter's split report per-segment numDocs for parent and sub-shards, with at least a warning logged for any discrepancies (documents falling into none of the sub-shards or documents falling into several sub-shards)? Additionally, could a case be made for erroring out when discrepancies are detected, i.e. not proceeding with the shard split? Either always erroring, or adding a verifyNumDocs=false/true optional parameter to the SPLITSHARD action.
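The discrepancy check being requested can be illustrated with a small, self-contained sketch: given each document's routing hash and the sub-shards' hash ranges, count documents that fall into no range (lost on split) or into more than one (duplicated). This is hypothetical illustration code, not SolrIndexSplitter's implementation.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative discrepancy check, not SolrIndexSplitter's code: count docs
// whose routing hash lands in zero or in multiple sub-shard hash ranges.
public class SplitCheckSketch {
    // Inclusive hash range [min, max].
    static class Range {
        final int min, max;
        Range(int min, int max) { this.min = min; this.max = max; }
        boolean contains(int hash) { return hash >= min && hash <= max; }
    }

    // Returns {orphans, duplicates}: docs in no range, docs in more than one.
    static int[] check(List<Integer> docHashes, List<Range> subShardRanges) {
        int orphans = 0, duplicates = 0;
        for (int hash : docHashes) {
            int owners = 0;
            for (Range r : subShardRanges) {
                if (r.contains(hash)) owners++;
            }
            if (owners == 0) orphans++;
            else if (owners > 1) duplicates++;
        }
        return new int[] { orphans, duplicates };
    }

    public static void main(String[] args) {
        List<Range> ranges = Arrays.asList(new Range(0, 49), new Range(50, 99));
        // Hash 120 falls into neither sub-shard range: one orphan, no duplicates.
        System.out.println(Arrays.toString(check(Arrays.asList(10, 60, 120), ranges)));
    }
}
```

A nonzero count in either slot is exactly the kind of discrepancy the issue suggests warning on, or erroring out on before committing the split.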
[jira] [Comment Edited] (SOLR-5213) collections?action=SPLITSHARD parent vs. sub-shards numDocs
[ https://issues.apache.org/jira/browse/SOLR-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866565#comment-13866565 ] Ramkumar Aiyengar edited comment on SOLR-5213 at 1/9/14 11:25 AM: -- Shalin, any objection to this patch going in? Maybe with SOLR-5338, the severity of the 0-shard case can be reduced from log.error (alternatively, it could check for split.key being present and decide severity, if we want to be smarter), but the patch should be good otherwise. was (Author: andyetitmoves): Shalin, any objection to this patch going in? Maybe with SOLR-5338, the severity of the 0-shard case can be reduced from log.error, but the patch should be good otherwise.
[jira] [Commented] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings
[ https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866615#comment-13866615 ] Robert Muir commented on SOLR-5594: --- Can we avoid reformatting SimpleQParser here? It makes it impossible to review the changes.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866617#comment-13866617 ] Robert Muir commented on LUCENE-5388: - Benson, don't worry about it, I think it's good. I just put up the patch so that Uwe might look at it.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866627#comment-13866627 ] Uwe Schindler commented on LUCENE-5388: --- I am fine with this patch in trunk only. We can decide later if we backport.
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866640#comment-13866640 ] ASF subversion and git services commented on LUCENE-5388: - Commit 1556801 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1556801 ] LUCENE-5388: remove Reader from Tokenizer ctor (closes #16)
lucene-solr pull request: LUCENE-5388: code and highlighting changes to rem...
Github user benson-basis closed the pull request at: https://github.com/apache/lucene-solr/pull/16
[jira] [Updated] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5388: -- Fix Version/s: 5.0
[jira] [Commented] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866641#comment-13866641 ] Uwe Schindler commented on LUCENE-5388: --- Woohoo!
[jira] [Resolved] (LUCENE-5388) Eliminate construction over readers for Tokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5388. - Resolution: Fixed Marking fixed for 5.0. Thanks a lot, Benson, for doing all the grunt work here. Note: if you really want a backport, please just open an issue and hash it out.
Re: Nested Grouping / Field Collapsing
Kranti, You've got it exactly. And yes, sorting and limiting the doclist within the nested groups will be supported. Joel Bernstein Search Engineer at Heliosearch On Wed, Jan 8, 2014 at 6:54 PM, Kranti Parisa kranti.par...@gmail.com wrote: Joel, 1) Collapse on the top-level group - done through CollapsingQParserPlugin 2) Expand a single page of collapsed results to display nested groups - probably done through ExpandComponent Is that correct? And does the scope of ExpandComponent include options to sort and limit the docList within the nested groups? Which means we first create the top-level groups, and while expanding each group we create nested groups and allow passing the sort and limit params? Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Wed, Jan 8, 2014 at 5:48 PM, Joel Bernstein joels...@gmail.com wrote: Kranti, I'm wondering if this can be separated into two phases: 1) Collapse on the top-level group. 2) Expand a single page of collapsed results to display nested groups. I'll be working on the ExpandComponent shortly, which will expand a single page of results that were collapsed by the CollapsingQParserPlugin. This seems like something that could be implemented as part of the ExpandComponent. Joel Joel Bernstein Search Engineer at Heliosearch On Wed, Jan 8, 2014 at 12:28 PM, Kranti Parisa kranti.par...@gmail.com wrote: Has anyone got the latest updates for https://issues.apache.org/jira/browse/SOLR-2553 ? I am trying to take a look at the implementation and see how complex this is to achieve. If someone else had a look into it earlier, could you please share your thoughts/comments? Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa
Analysis API next step: Reader->CharFilter?
Now that we're forcing everyone to think about the Analysis API in 5.0, what do you think of making the fundamental input source be a CharFilter, thus removing the need for instanceof-ing? To touch a hotter potato, I also wonder about 'reset()'. In a world where the only way to put something in there is setReader, do we need 'reset' in between setReader and incrementToken?
[jira] [Created] (LUCENE-5390) Loosen assert in IW on pending event after close
Simon Willnauer created LUCENE-5390: --- Summary: Loosen assert in IW on pending event after close Key: LUCENE-5390 URL: https://issues.apache.org/jira/browse/LUCENE-5390 Project: Lucene - Core Issue Type: Task Affects Versions: 4.6, 5.0, 4.7, 4.6.1 Reporter: Simon Willnauer Priority: Minor Fix For: 5.0, 4.7, 4.6.1 Sometimes the assert in the IW is tripped due to pending merge events. Those events can always happen, but they are meaningless since we close / rollback the IW anyway. I suggest we loosen the assert here to not fail if there are only pending merge events. {noformat} 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterWithThreads.testRollbackAndCommitWithThreads Error Message: Captured an uncaught exception in thread: Thread[id=288, name=Thread-222, state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads] Stack Trace: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=288, name=Thread-222, state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads] Caused by: java.lang.RuntimeException: java.lang.AssertionError: [org.apache.lucene.index.DocumentsWriter$MergePendingEvent@67ef293b] at __randomizedtesting.SeedInfo.seed([98DFB1602D9F9A2A]:0) at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:619) Caused by: java.lang.AssertionError: [org.apache.lucene.index.DocumentsWriter$MergePendingEvent@67ef293b] at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2026) at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:575) {noformat}
Re: Analysis API next step: Reader->CharFilter?
On Thu, Jan 9, 2014 at 9:08 AM, Benson Margulies bimargul...@gmail.com wrote: Now that we're forcing everyone to think about the Analysis API in 5.0, what do you think of making the fundamental input source be a CharFilter, thus removing the need for instanceof-ing? Personally, I don't like doing that, because when we change a parameter from a 'standard JDK' one to a custom Lucene one, it makes the API harder to grok, as it's more classes the user *must* wrap their head around. On the other hand, today users only have to grok CharFilter if they want to do CharFiltering, which is pretty expert. Instanceofs are cheap in Java; what is the benefit? To touch a hotter potato, I also wonder about 'reset()'. In a world where the only way to put something in there is setReader, do we need 'reset' in between setReader and incrementToken? But the main issue is TokenStream: it doesn't have any concept of Readers baked in. So there must be a way to reset state in things like TokenFilters, too.
[jira] [Updated] (LUCENE-5390) Loosen assert in IW on pending event after close
[ https://issues.apache.org/jira/browse/LUCENE-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-5390: Attachment: LUCENE-5390.patch Here is a patch.
[jira] [Commented] (SOLR-5579) Leader stops processing collection-work-queue after failed collection reload
[ https://issues.apache.org/jira/browse/SOLR-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866696#comment-13866696 ] Eric Bus commented on SOLR-5579: Just a quick update: the leader again stopped working. I had to restart the cluster to get everything working again. The script that is running to check the status did not work, so unfortunately I don't have additional information from the logs. When I do, I'll report back here. Leader stops processing collection-work-queue after failed collection reload Key: SOLR-5579 URL: https://issues.apache.org/jira/browse/SOLR-5579 Project: Solr Issue Type: Bug Affects Versions: 4.5.1 Environment: Debian Linux 6.0 running on VMWare Using embedded SOLR Jetty. Reporter: Eric Bus Assignee: Mark Miller Labels: collections, queue I've been experiencing the same problem a few times now. My leader in /overseer_elect/leader stops processing the collection queue at /overseer/collection-queue-work. The queue will build up and it will trigger an alert in my monitoring tool. I haven't been able to pinpoint the reason that the leader stops, but usually I kill the leader node to trigger a leader election. The new node will pick up the queue. And this is where the problems start. When the new leader is processing the queue and picks up a reload for a shard without an active leader, the queue stops. It keeps repeating the message that there is no active leader for the shard. But a new leader is never elected: {quote} ERROR - 2013-12-24 14:43:40.390; org.apache.solr.common.SolrException; Error while trying to recover. 
core=magento_349_shard1_replica1:org.apache.solr.common.SolrException: No registered leader was found, collection:magento_349 slice:shard1 at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:482) at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:465) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:317) at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:219) ERROR - 2013-12-24 14:43:40.391; org.apache.solr.cloud.RecoveryStrategy; Recovery failed - trying again... (7) core=magento_349_shard1_replica1 INFO - 2013-12-24 14:43:40.391; org.apache.solr.cloud.RecoveryStrategy; Wait 256.0 seconds before trying to recover again (8) {quote} Is the leader election in some way connected to the collection queue? If so, can this be a deadlock, because it won't elect until the reload is complete? -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings
[ https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866713#comment-13866713 ] Anshum Gupta commented on SOLR-5594: Robert, I thought about how to handle SimpleQParser with this change before I even put up this patch, but I can't think of another way to handle it here. This seems like the best way to go as far as handling SimpleQParser for this change is concerned. Enable using extended field types with prefix queries for non-default encoded strings - Key: SOLR-5594 URL: https://issues.apache.org/jira/browse/SOLR-5594 Project: Solr Issue Type: Improvement Components: query parsers, Schema and Analysis Affects Versions: 4.6 Reporter: Anshum Gupta Assignee: Anshum Gupta Priority: Minor Attachments: SOLR-5594-branch_4x.patch, SOLR-5594.patch, SOLR-5594.patch, SOLR-5594.patch Enable users to be able to use prefix query with custom field types with non-default encoding/decoding for queries more easily. e.g. having a custom field work with base64 encoded query strings. Currently, the workaround for it is to have the override at getRewriteMethod level. Perhaps having the prefixQuery also use the calling FieldType's readableToIndexed method would work better.
[jira] [Commented] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings
[ https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866720#comment-13866720 ] Robert Muir commented on SOLR-5594: --- That's not what I mean: I mean that in the patch it's not possible to see your actual logic changes, because every single line of code is reformatted. Enable using extended field types with prefix queries for non-default encoded strings - Key: SOLR-5594 URL: https://issues.apache.org/jira/browse/SOLR-5594 Project: Solr Issue Type: Improvement Components: query parsers, Schema and Analysis Affects Versions: 4.6 Reporter: Anshum Gupta Assignee: Anshum Gupta Priority: Minor Attachments: SOLR-5594-branch_4x.patch, SOLR-5594.patch, SOLR-5594.patch, SOLR-5594.patch Enable users to be able to use prefix query with custom field types with non-default encoding/decoding for queries more easily. e.g. having a custom field work with base64 encoded query strings. Currently, the workaround for it is to have the override at getRewriteMethod level. Perhaps having the prefixQuery also use the calling FieldType's readableToIndexed method would work better.
[jira] [Commented] (SOLR-1301) Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce.
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866723#comment-13866723 ] ASF subversion and git services commented on SOLR-1301: --- Commit 1556846 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1556846 ] SOLR-1301: IntelliJ config: morphlines-cell Solr contrib needs lucene-core test-scope dependency Add a Solr contrib that allows for building Solr indexes via Hadoop's Map-Reduce. - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: New Feature Reporter: Andrzej Bialecki Assignee: Mark Miller Fix For: 5.0, 4.7 Attachments: README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. 
Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses a custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
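The record-writer flow described above (convert each key/value pair to a document, add it to a batch, flush periodically, then commit on close) can be sketched independently of Hadoop and Solr. The class below is a hypothetical stand-in, not the actual SolrRecordWriter:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for SolrRecordWriter's batching behavior: documents
// accumulate until batchSize is reached, then the batch is flushed; close()
// flushes the remainder and "commits", mirroring commit()/optimize() on the
// embedded server.
public class BatchingWriter {
    private final int batchSize;
    private final List<String> batch = new ArrayList<>();
    public int flushes = 0;
    public boolean committed = false;

    public BatchingWriter(int batchSize) { this.batchSize = batchSize; }

    public void write(String key, String value) {
        // Stands in for SolrDocumentConverter turning (key, value) into a SolrInputDocument.
        batch.add(key + "=" + value);
        if (batch.size() >= batchSize) flush();
    }

    private void flush() {
        // In the real writer this submits the batch to EmbeddedSolrServer.
        batch.clear();
        flushes++;
    }

    public void close() {
        if (!batch.isEmpty()) flush();
        committed = true; // commit() and optimize() happen here in the real writer
    }
}
```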
[jira] [Commented] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings
[ https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866728#comment-13866728 ] Anshum Gupta commented on SOLR-5594: I'm sorry, I misread it. Perhaps it's something that IDEA did. Let me have a look at it and fix that. Thanks for pointing that out. Enable using extended field types with prefix queries for non-default encoded strings - Key: SOLR-5594 URL: https://issues.apache.org/jira/browse/SOLR-5594 Project: Solr Issue Type: Improvement Components: query parsers, Schema and Analysis Affects Versions: 4.6 Reporter: Anshum Gupta Assignee: Anshum Gupta Priority: Minor Attachments: SOLR-5594-branch_4x.patch, SOLR-5594.patch, SOLR-5594.patch, SOLR-5594.patch Enable users to be able to use prefix query with custom field types with non-default encoding/decoding for queries more easily. e.g. having a custom field work with base64 encoded query strings. Currently, the workaround for it is to have the override at getRewriteMethod level. Perhaps having the prefixQuery also use the calling FieldType's readableToIndexed method would work better.
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866729#comment-13866729 ] Markus Jelsma commented on SOLR-5379: - Yes +1 Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonyms at query time, Solr fails to work with multi-word synonyms for two reasons: - First, the Lucene query parser tokenizes the user query by spaces, so it splits a multi-word term into separate terms before feeding them to the synonym filter; the synonym filter therefore can't recognize the multi-word term in order to expand it. - Second, if the synonym filter expands into multiple terms that contain a multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to handle synonyms, but MultiPhraseQuery doesn't work with terms that have different numbers of words. For the first one, we can quote all multi-word synonyms in the user query so that the Lucene query parser doesn't split them. There is a jira task related to this one: https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery with an appropriate BooleanQuery of SHOULD clauses containing multiple PhraseQuery instances when the token stream has a multi-word synonym.
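The first workaround, quoting multi-word synonyms so the query parser doesn't split them, can be sketched as a simple pre-processing step. This is a naive illustration (the synonym set is hypothetical, and it does not handle overlapping or already-quoted synonyms):

```java
import java.util.Set;

public class SynonymQuoter {
    // Wraps each known multi-word synonym in quotes so that whitespace
    // tokenization in the query parser leaves it intact for the synonym filter.
    public static String quoteMultiWord(String query, Set<String> multiWordSynonyms) {
        String result = query;
        for (String syn : multiWordSynonyms) {
            result = result.replace(syn, "\"" + syn + "\"");
        }
        return result;
    }
}
```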
Re: Iterating BinaryDocValues
Don't you think it's worth raising a jira regarding those 'new byte[]' allocations? I'm able to provide a patch if you wish. On Wed, Jan 8, 2014 at 2:02 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: FWIW, a micro benchmark shows a 4% gain from reusing the incoming BytesRef.bytes in short binary docvalues (Test2BBinaryDocValues.testVariableBinary() with mmap directory). I wonder why it doesn't read into the incoming bytes: https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401 On Wed, Jan 8, 2014 at 12:53 AM, Michael McCandless luc...@mikemccandless.com wrote: Going sequentially should help, if the pages are not hot (in the OS's IO cache). You can also use a different DVFormat, e.g. Direct, but this holds all bytes in RAM. Mike McCandless http://blog.mikemccandless.com On Tue, Jan 7, 2014 at 1:09 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Joel, I tried to hack it straightforwardly, but found no free gain there. The only attempt I can suggest is to try to reuse bytes in https://github.com/apache/lucene-solr/blame/trunk/lucene/core/src/java/org/apache/lucene/codecs/lucene45/Lucene45DocValuesProducer.java#L401 right now it allocates bytes every time, which besides GC pressure can also hurt memory access locality. Could you try to fix the memory waste and repeat the performance test? Have a good hack! On Mon, Dec 23, 2013 at 9:51 PM, Joel Bernstein joels...@gmail.com wrote: Hi, I'm looking for a faster way to perform large scale docId-to-BytesRef lookups for BinaryDocValues. I'm finding that I can't get the performance that I need from the random access seek in the BinaryDocValues interface. I'm wondering if sequentially scanning the docValues would be a faster approach. I have a BitSet of matching docs, so if I sequentially moved through the docValues I could test each one against that bitset.
Wondering if that approach would be faster for bulk extracts and how tricky it would be to add an iterator to the BinaryDocValues interface? Thanks, Joel -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
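The sequential-scan idea in the thread, walking docIds in increasing order and skipping non-matching docs via the BitSet so reads stay page-local, can be sketched with plain Java stand-ins (a String[] takes the place of BinaryDocValues; this is not Lucene's API):

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class SequentialScan {
    // Sequential bulk extract: visit matching docIds in increasing order,
    // collecting a value per doc. values[doc] stands in for the
    // BinaryDocValues lookup that fills a reused BytesRef.
    public static List<String> extract(String[] values, BitSet matches) {
        List<String> out = new ArrayList<>();
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            out.add(values[doc]);
        }
        return out;
    }
}
```

The ordered traversal is the point: random-order seeks touch pages unpredictably, while an ascending scan gives the OS cache and readahead a chance to help.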
[jira] [Updated] (LUCENE-5354) Blended score in AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remi Melisson updated LUCENE-5354: -- Attachment: LUCENE-5354_3.patch Hi! Here is new patch including your comment for the coefficient calculation (I guess a Lambda function would be perfect here!). I ran the performance test on my laptop, here is the results compared to the AnalyzingInfixSuggester : -- construction time AnalyzingInfixSuggester input: 50001, time[ms]: 1780 [+- 367.58] BlendedInfixSuggester input: 50001, time[ms]: 6507 [+- 2106.52] -- prefixes: 2-4, num: 7, onlyMorePopular: false AnalyzingInfixSuggester queries: 50001, time[ms]: 6804 [+- 1403.13], ~kQPS: 7 BlendedInfixSuggester queries: 50001, time[ms]: 26503 [+- 2624.41], ~kQPS: 2 -- prefixes: 6-9, num: 7, onlyMorePopular: false AnalyzingInfixSuggester queries: 50001, time[ms]: 3995 [+- 551.20], ~kQPS: 13 BlendedInfixSuggester queries: 50001, time[ms]: 5355 [+- 1295.41], ~kQPS: 9 -- prefixes: 100-200, num: 7, onlyMorePopular: false AnalyzingInfixSuggester queries: 50001, time[ms]: 2626 [+- 588.43], ~kQPS: 19 BlendedInfixSuggester queries: 50001, time[ms]: 1980 [+- 574.16], ~kQPS: 25 -- RAM consumption AnalyzingInfixSuggester size[B]:1,430,920 BlendedInfixSuggester size[B]:1,630,488 If you have any idea on how we could improve the performance, let me know (see above my comment for your previous suggestion to avoid visiting term vectors). Blended score in AnalyzingInfixSuggester Key: LUCENE-5354 URL: https://issues.apache.org/jira/browse/LUCENE-5354 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Affects Versions: 4.4 Reporter: Remi Melisson Priority: Minor Labels: suggester Attachments: LUCENE-5354.patch, LUCENE-5354_2.patch, LUCENE-5354_3.patch I'm working on a custom suggester derived from the AnalyzingInfix. 
I require what is called a blended score (//TODO ln.399 in AnalyzingInfixSuggester) to transform the suggestion weights depending on the position of the searched term(s) in the text. Right now, I'm using an easy solution : If I want 10 suggestions, then I search against the current ordered index for the 100 first results and transform the weight : bq. a) by using the term position in the text (found with TermVector and DocsAndPositionsEnum) or bq. b) by multiplying the weight by the score of a SpanQuery that I add when searching and return the updated 10 most weighted suggestions. Since we usually don't need to suggest so many things, the bigger search + rescoring overhead is not so significant but I agree that this is not the most elegant solution. We could include this factor (here the position of the term) directly into the index. So, I can contribute to this if you think it's worth adding it. Do you think I should tweak AnalyzingInfixSuggester, subclass it or create a dedicated class ? -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
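One way to fold term position into the suggestion weight, in the spirit of option (a) above, is a position-based coefficient. The exact formula belongs to the patch; the linear decay below is only a hypothetical stand-in for illustration:

```java
public class BlendedWeight {
    // Hypothetical position-based coefficient: full weight for a match at
    // position 0, decaying linearly with position and bottoming out at 10%
    // of the original weight so late matches still surface.
    public static double blend(long weight, int termPosition, int maxPosition) {
        double coef = 1.0 - Math.min(termPosition, maxPosition) / (double) maxPosition;
        return weight * Math.max(coef, 0.1);
    }
}
```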
[jira] [Updated] (SOLR-5594) Enable using extended field types with prefix queries for non-default encoded strings
[ https://issues.apache.org/jira/browse/SOLR-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-5594: --- Attachment: SOLR-5594.patch Fixed the reformatting, however as things have moved (and there's been a level change.. new inner classes etc) it still looks a little tricky but yes, it's no longer just reformatted code in the patch. Enable using extended field types with prefix queries for non-default encoded strings - Key: SOLR-5594 URL: https://issues.apache.org/jira/browse/SOLR-5594 Project: Solr Issue Type: Improvement Components: query parsers, Schema and Analysis Affects Versions: 4.6 Reporter: Anshum Gupta Assignee: Anshum Gupta Priority: Minor Attachments: SOLR-5594-branch_4x.patch, SOLR-5594.patch, SOLR-5594.patch, SOLR-5594.patch, SOLR-5594.patch Enable users to be able to use prefix query with custom field types with non-default encoding/decoding for queries more easily. e.g. having a custom field work with base64 encoded query strings. Currently, the workaround for it is to have the override at getRewriteMethod level. Perhaps having the prefixQuery also use the calling FieldType's readableToIndexed method would work better. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
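The base64 use case from the description boils down to converting the readable (external) form of the prefix into its indexed form before the prefix query is built. A minimal stand-alone sketch of such a readableToIndexed-style conversion follows; the class is hypothetical and only mimics the idea, not Solr's actual FieldType API:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64FieldType {
    // Stand-in for a custom FieldType's readableToIndexed(): the external
    // (readable) value is base64, the indexed form is the decoded text.
    // getPrefixQuery() would run the user's prefix through this before
    // constructing the PrefixQuery, which is what the patch enables.
    public static String readableToIndexed(String external) {
        byte[] decoded = Base64.getDecoder().decode(external);
        return new String(decoded, StandardCharsets.UTF_8);
    }
}
```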
[jira] [Commented] (LUCENE-5389) Even more doc for construction of TokenStream components
[ https://issues.apache.org/jira/browse/LUCENE-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866765#comment-13866765 ] Benson Margulies commented on LUCENE-5389: -- [~rcmuir] I think that this is ready to go. If you commit this and merge down to 4.x, I can then tackle work on this file for the new stuff. Even more doc for construction of TokenStream components Key: LUCENE-5389 URL: https://issues.apache.org/jira/browse/LUCENE-5389 Project: Lucene - Core Issue Type: Improvement Reporter: Benson Margulies There are more useful things to tell would-be authors of tokenizers. Let's tell them.
[jira] [Commented] (LUCENE-5389) Even more doc for construction of TokenStream components
[ https://issues.apache.org/jira/browse/LUCENE-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866766#comment-13866766 ] Robert Muir commented on LUCENE-5389: - Thanks Benson! I'll take a look at this in a bit. Even more doc for construction of TokenStream components Key: LUCENE-5389 URL: https://issues.apache.org/jira/browse/LUCENE-5389 Project: Lucene - Core Issue Type: Improvement Reporter: Benson Margulies There are more useful things to tell would-be authors of tokenizers. Let's tell them. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: overriding getRangeQuery
On Jan 9, 2014, at 16:00, Shawn Grant shawn.gr...@orcatec.com wrote: Updating the parameters in the extension's method and rebuilding wasn't enough to fix the problem for me. Not sure what I'm missing. I checked in a fix on trunk a couple of days ago. Did you remove extensions.jar so that it would be rebuilt and rewrapped ? If this still doesn't fix it, please write a small test case so that I can reproduce this. Thanks ! It looks like this bug also affects the analysis of the range clause. I need the terms to be case sensitive so I'm using a per-field analyzer to make sure that field doesn't get lowercased but it's getting ignored and sent to the default analyzer. That's probably an issue with using Lucene itself. You should ask about this on java-u...@lucene.apache.org. Andi.. On 01/03/2014 05:38 PM, Andi Vajda wrote: On Jan 3, 2014, at 21:35, Shawn Grant shawn.gr...@orcatec.com wrote: whoops, bad link expansion. Was supposed to be: getRangeQuery(String field, String part1, String part2, boolean startInclusive, boolean endInclusive); Yes, that would be the problem. The signature changed but the extension's didn't. Andi.. On 01/03/2014 04:33 PM, Shawn Grant wrote: I have a subclass of PythonQueryParser that overrides several methods but I can't seem to get it to use getRangeQuery. 
I noticed that the method definition in PythonQueryParser is: getRangeQuery(String field, String part1, String part2, boolean inclusive); but the Lucene definition for QueryParser (in QueryParserBase, see https://lucene.apache.org/core/4_4_0/queryparser/org/apache/lucene/queryparser/classic/QueryParserBase.html) is: getRangeQuery(String field, String part1, String part2, boolean startInclusive, boolean endInclusive). Is that an issue?
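The root cause in this thread is an override whose signature no longer matches the superclass, so the subclass method silently becomes an unrelated overload. In plain Java, `@Override` turns that drift into a compile error. The classes below are hypothetical stand-ins, not PyLucene's actual wrappers:

```java
// Stand-in for QueryParserBase after the signature change to five arguments.
class BaseParser {
    public String getRangeQuery(String field, String part1, String part2,
                                boolean startInclusive, boolean endInclusive) {
        return "base:" + field;
    }
}

// Stand-in for the extension. With @Override, keeping the old four-argument
// signature would no longer compile, so the mismatch is caught immediately
// instead of the base implementation being called at runtime.
public class CustomParser extends BaseParser {
    @Override
    public String getRangeQuery(String field, String part1, String part2,
                                boolean startInclusive, boolean endInclusive) {
        return "custom:" + field + ":" + part1 + ".." + part2;
    }
}
```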
[jira] [Comment Edited] (LUCENE-5354) Blended score in AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866760#comment-13866760 ] Remi Melisson edited comment on LUCENE-5354 at 1/9/14 4:57 PM: --- Hi! Here is new patch including your comment for the coefficient calculation (I guess a Lambda function would be perfect here!). I ran the performance test on my laptop, here are the results compared to the AnalyzingInfixSuggester : -- construction time AnalyzingInfixSuggester input: 50001, time[ms]: 1780 [+- 367.58] BlendedInfixSuggester input: 50001, time[ms]: 6507 [+- 2106.52] -- prefixes: 2-4, num: 7, onlyMorePopular: false AnalyzingInfixSuggester queries: 50001, time[ms]: 6804 [+- 1403.13], ~kQPS: 7 BlendedInfixSuggester queries: 50001, time[ms]: 26503 [+- 2624.41], ~kQPS: 2 -- prefixes: 6-9, num: 7, onlyMorePopular: false AnalyzingInfixSuggester queries: 50001, time[ms]: 3995 [+- 551.20], ~kQPS: 13 BlendedInfixSuggester queries: 50001, time[ms]: 5355 [+- 1295.41], ~kQPS: 9 -- prefixes: 100-200, num: 7, onlyMorePopular: false AnalyzingInfixSuggester queries: 50001, time[ms]: 2626 [+- 588.43], ~kQPS: 19 BlendedInfixSuggester queries: 50001, time[ms]: 1980 [+- 574.16], ~kQPS: 25 -- RAM consumption AnalyzingInfixSuggester size[B]:1,430,920 BlendedInfixSuggester size[B]:1,630,488 If you have any idea on how we could improve the performance, let me know (see above my comment for your previous suggestion to avoid visiting term vectors). was (Author: rmelisson): Hi! Here is new patch including your comment for the coefficient calculation (I guess a Lambda function would be perfect here!). 
I ran the performance test on my laptop, here is the results compared to the AnalyzingInfixSuggester : -- construction time AnalyzingInfixSuggester input: 50001, time[ms]: 1780 [+- 367.58] BlendedInfixSuggester input: 50001, time[ms]: 6507 [+- 2106.52] -- prefixes: 2-4, num: 7, onlyMorePopular: false AnalyzingInfixSuggester queries: 50001, time[ms]: 6804 [+- 1403.13], ~kQPS: 7 BlendedInfixSuggester queries: 50001, time[ms]: 26503 [+- 2624.41], ~kQPS: 2 -- prefixes: 6-9, num: 7, onlyMorePopular: false AnalyzingInfixSuggester queries: 50001, time[ms]: 3995 [+- 551.20], ~kQPS: 13 BlendedInfixSuggester queries: 50001, time[ms]: 5355 [+- 1295.41], ~kQPS: 9 -- prefixes: 100-200, num: 7, onlyMorePopular: false AnalyzingInfixSuggester queries: 50001, time[ms]: 2626 [+- 588.43], ~kQPS: 19 BlendedInfixSuggester queries: 50001, time[ms]: 1980 [+- 574.16], ~kQPS: 25 -- RAM consumption AnalyzingInfixSuggester size[B]:1,430,920 BlendedInfixSuggester size[B]:1,630,488 If you have any idea on how we could improve the performance, let me know (see above my comment for your previous suggestion to avoid visiting term vectors). Blended score in AnalyzingInfixSuggester Key: LUCENE-5354 URL: https://issues.apache.org/jira/browse/LUCENE-5354 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Affects Versions: 4.4 Reporter: Remi Melisson Priority: Minor Labels: suggester Attachments: LUCENE-5354.patch, LUCENE-5354_2.patch, LUCENE-5354_3.patch I'm working on a custom suggester derived from the AnalyzingInfix. I require what is called a blended score (//TODO ln.399 in AnalyzingInfixSuggester) to transform the suggestion weights depending on the position of the searched term(s) in the text. Right now, I'm using an easy solution : If I want 10 suggestions, then I search against the current ordered index for the 100 first results and transform the weight : bq. a) by using the term position in the text (found with TermVector and DocsAndPositionsEnum) or bq. 
b) by multiplying the weight by the score of a SpanQuery that I add when searching and return the updated 10 most weighted suggestions. Since we usually don't need to suggest so many things, the bigger search + rescoring overhead is not so significant but I agree that this is not the most elegant solution. We could include this factor (here the position of the term) directly into the index. So, I can contribute to this if you think it's worth adding it. Do you think I should tweak AnalyzingInfixSuggester, subclass it or create a dedicated class ? -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866868#comment-13866868 ] ASF subversion and git services commented on SOLR-5541: --- Commit 1556903 from [~joel.bernstein] in branch 'dev/trunk' [ https://svn.apache.org/r1556903 ] SOLR-5541: Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Key: SOLR-5541 URL: https://issues.apache.org/jira/browse/SOLR-5541 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.6 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch The QueryElevationComponent currently uses an xml file to map query strings to elevateIds and excludeIds. This ticket adds the ability to pass in elevateIds and excludeIds through two new http parameters elevateIds and excludeIds. This will allow more sophisticated business logic to be used in selecting which ids to elevate/exclude. Proposed syntax: http://localhost:8983/solr/elevate?q=*:*&elevateIds=3,4&excludeIds=6,8 The elevateIds and excludeIds point to the unique document Id.
[jira] [Commented] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866922#comment-13866922 ] ASF subversion and git services commented on SOLR-5541: --- Commit 1556923 from [~joel.bernstein] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1556923 ] SOLR-5541: Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Key: SOLR-5541 URL: https://issues.apache.org/jira/browse/SOLR-5541 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.6 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch The QueryElevationComponent currently uses an xml file to map query strings to elevateIds and excludeIds. This ticket adds the ability to pass in elevateIds and excludeIds through two new http parameters elevateIds and excludeIds. This will allow more sophisticated business logic to be used in selecting which ids to elevate/exclude. Proposed syntax: http://localhost:8983/solr/elevate?q=*:*&elevateIds=3,4&excludeIds=6,8 The elevateIds and excludeIds point to the unique document Id.
[jira] [Resolved] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein resolved SOLR-5541. -- Resolution: Fixed Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Key: SOLR-5541 URL: https://issues.apache.org/jira/browse/SOLR-5541 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.6 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch The QueryElevationComponent currently uses an xml file to map query strings to elevateIds and excludeIds. This ticket adds the ability to pass in elevateIds and excludeIds through two new http parameters elevateIds and excludeIds. This will allow more sophisticated business logic to be used in selecting which ids to elevate/exclude. Proposed syntax: http://localhost:8983/solr/elevate?q=*:*&elevateIds=3,4&excludeIds=6,8 The elevateIds and excludeIds point to the unique document Id.
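Assembling the elevation request from id lists is simple string work; the URL shape follows the proposed syntax in the issue, while the helper class itself is hypothetical:

```java
import java.util.List;

public class ElevationRequest {
    // Builds the elevation request from document id lists, mirroring the
    // proposed syntax: /elevate?q=...&elevateIds=...&excludeIds=...
    // The ids are the documents' unique ids.
    public static String buildUrl(String q, List<String> elevate, List<String> exclude) {
        return "http://localhost:8983/solr/elevate?q=" + q
                + "&elevateIds=" + String.join(",", elevate)
                + "&excludeIds=" + String.join(",", exclude);
    }
}
```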
Re: Nested Grouping / Field Collapsing
That's cool. just curious, do you have any tentative timelines for the ExpandComponent? Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Thu, Jan 9, 2014 at 8:37 AM, Joel Bernstein joels...@gmail.com wrote: Kranti, You've got it exactly. And yes sorting and limiting the doclist within the nested groups will be supported. Joel Bernstein Search Engineer at Heliosearch On Wed, Jan 8, 2014 at 6:54 PM, Kranti Parisa kranti.par...@gmail.comwrote: Joel, 1) Collapse on the top level group. - done thru CollapsingQParserPlugin 2) Expand a single page of collapsed results to display nested groups. - probably done thru ExpandComponent Is that correct? and does the scope of ExpandComponent includes the options to sort and limit the docList within the nested groups? Which means, we are going to first create the top level groups and while expanding each group, we create nested groups and allow to pass the sort, limit params? Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Wed, Jan 8, 2014 at 5:48 PM, Joel Bernstein joels...@gmail.comwrote: Kranti, I'm wondering if this can be separated into two phases: 1) Collapse on the top level group. 2) Expand a single page of collapsed results to display nested groups. I'll be working on the ExpandComponent shortly, which will expand a single page of results that were collapsed by the CollapsingQParserPlugin. This seems like something that could be implemented as part of the ExpandComponent. Joel Joel Bernstein Search Engineer at Heliosearch On Wed, Jan 8, 2014 at 12:28 PM, Kranti Parisa kranti.par...@gmail.comwrote: Anyone has got latest updates for https://issues.apache.org/jira/browse/SOLR-2553 ? I am trying to take a look at the implementation and see how complex this is to achieve. If someone else had a look into it earlier, could you please share your thoughts/comments.. Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa
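The two phases Joel describes, collapse to one document per group, then expand a single group with its own sort and limit, can be sketched over plain collections. Field and class names here are hypothetical; this is a conceptual model, not the CollapsingQParserPlugin or ExpandComponent code:

```java
import java.util.*;
import java.util.stream.Collectors;

public class CollapseExpand {
    public record Doc(String id, String groupKey, double score) {}

    // Phase 1: keep only the highest-scoring doc per group (the collapsed page).
    public static List<Doc> collapse(List<Doc> docs) {
        Map<String, Doc> best = new LinkedHashMap<>();
        for (Doc d : docs) {
            best.merge(d.groupKey(), d, (a, b) -> a.score() >= b.score() ? a : b);
        }
        return new ArrayList<>(best.values());
    }

    // Phase 2: expand one collapsed group, applying a per-group sort and limit,
    // which is the behavior Joel confirms the ExpandComponent will support.
    public static List<Doc> expand(List<Doc> docs, String groupKey, int limit) {
        return docs.stream()
                .filter(d -> d.groupKey().equals(groupKey))
                .sorted(Comparator.comparingDouble(Doc::score).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```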
[jira] [Created] (SOLR-5621) Let Solr use Lucene's SearcherManager
Tomás Fernández Löbbe created SOLR-5621: --- Summary: Let Solr use Lucene's SearcherManager Key: SOLR-5621 URL: https://issues.apache.org/jira/browse/SOLR-5621 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe It would be nice if Solr could take advantage of Lucene's SearcherManager and get rid of most of the logic related to managing Searchers in SolrCore. I've been taking a look at how possible it is to achieve this, and even though I haven't finished the changes (there are some use cases that still don't work exactly the same) it looks like it is possible to do. Some things could still use a lot of improvement (like the realtime searcher management) and some others are not yet implemented, like searchers on deck or IndexReaderFactory. I'm attaching an initial patch (many TODOs yet).
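The core contract that both SolrCore's searcher logic and Lucene's SearcherManager implement is reference-counted acquire/release with an atomic swap on refresh: callers pin a consistent view, and an old searcher is closed only after its last user releases it. A plain-Java sketch of that contract — this is not the Lucene class, and the names are illustrative:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of SearcherManager-style lifecycle management (not Lucene code).
public class SearcherManagerSketch {
    static final class Searcher {
        final int generation;
        final AtomicInteger refCount = new AtomicInteger(1); // manager's own ref
        volatile boolean closed;
        Searcher(int generation) { this.generation = generation; }
        void incRef() { refCount.incrementAndGet(); }
        void decRef() { if (refCount.decrementAndGet() == 0) closed = true; }
    }

    private Searcher current = new Searcher(0);

    // Pin the current searcher; callers must release() it when done.
    synchronized Searcher acquire() {
        current.incRef();
        return current;
    }

    synchronized void release(Searcher s) { s.decRef(); }

    // Swap in a new searcher generation; the old one is closed only
    // once every outstanding acquire() has been released.
    synchronized void maybeRefresh() {
        Searcher old = current;
        current = new Searcher(old.generation + 1);
        old.decRef(); // drop the manager's reference to the old generation
    }

    public static void main(String[] args) {
        SearcherManagerSketch m = new SearcherManagerSketch();
        Searcher s = m.acquire();
        m.maybeRefresh();
        m.release(s);
        System.out.println("old searcher closed after release: " + s.closed);
    }
}
```

This is the piece of SolrCore logic the patch proposes to delete in favor of the Lucene implementation.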
[jira] [Updated] (SOLR-5621) Let Solr use Lucene's SearcherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-5621: Attachment: SOLR-5621.patch
[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SearcherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866962#comment-13866962 ] Uwe Schindler commented on SOLR-5621: - Thanks for opening this. This is really a good idea; I had the same idea in the past, but my Solr internals knowledge was too limited to be successful here.
[jira] [Commented] (LUCENE-5390) Loosen assert in IW on pending event after close
[ https://issues.apache.org/jira/browse/LUCENE-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866979#comment-13866979 ] Michael McCandless commented on LUCENE-5390: +1 Loosen assert in IW on pending event after close Key: LUCENE-5390 URL: https://issues.apache.org/jira/browse/LUCENE-5390 Project: Lucene - Core Issue Type: Task Affects Versions: 4.6, 5.0, 4.7, 4.6.1 Reporter: Simon Willnauer Priority: Minor Fix For: 5.0, 4.7, 4.6.1 Attachments: LUCENE-5390.patch Sometimes the assert in the IW is tripped due to pending merge events. Those events can always happen but they are meaningless since we close / rollback the IW anyway. I suggest we loosen the assert here to not fail if there are only pending merge events. noformat 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterWithThreads.testRollbackAndCommitWithThreads Error Message: Captured an uncaught exception in thread: Thread[id=288, name=Thread-222, state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads] Stack Trace: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=288, name=Thread-222, state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads] Caused by: java.lang.RuntimeException: java.lang.AssertionError: [org.apache.lucene.index.DocumentsWriter$MergePendingEvent@67ef293b] at __randomizedtesting.SeedInfo.seed([98DFB1602D9F9A2A]:0) at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:619) Caused by: java.lang.AssertionError: [org.apache.lucene.index.DocumentsWriter$MergePendingEvent@67ef293b] at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2026) at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:575) /noformat -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SearcherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866980#comment-13866980 ] Michael McCandless commented on SOLR-5621: -- +1, this would be awesome.
[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SearcherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866983#comment-13866983 ] Yonik Seeley commented on SOLR-5621: It seems like a ton of change, and a lot of risk, to gain really no additional functionality.
[jira] [Commented] (LUCENE-5354) Blended score in AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866989#comment-13866989 ] Michael McCandless commented on LUCENE-5354: Thanks Remi, the performance seems fine? But I realized this is not the best benchmark, since all suggestions are just a single token. New patch looks great; I think we should commit this approach, and performance improvements can come later if necessary. bq. see above my comment for your previous suggestion to avoid visiting term vectors Oh, the idea I had was to not use term vectors at all: you can get a TermsEnum for the normal inverted index, and then visit each term from the query, and then .advance to each doc from the top N results. But we can do this later ... I'll commit this patch (I'll make some small code style improvements, e.g. adding { } around all ifs). Blended score in AnalyzingInfixSuggester Key: LUCENE-5354 URL: https://issues.apache.org/jira/browse/LUCENE-5354 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Affects Versions: 4.4 Reporter: Remi Melisson Priority: Minor Labels: suggester Attachments: LUCENE-5354.patch, LUCENE-5354_2.patch, LUCENE-5354_3.patch I'm working on a custom suggester derived from the AnalyzingInfix. I require what is called a blended score (//TODO ln.399 in AnalyzingInfixSuggester) to transform the suggestion weights depending on the position of the searched term(s) in the text. Right now, I'm using an easy solution : If I want 10 suggestions, then I search against the current ordered index for the 100 first results and transform the weight : bq. a) by using the term position in the text (found with TermVector and DocsAndPositionsEnum) or bq. b) by multiplying the weight by the score of a SpanQuery that I add when searching and return the updated 10 most weighted suggestions. 
Since we usually don't need to suggest so many things, the bigger search + rescoring overhead is not so significant, but I agree that this is not the most elegant solution. We could include this factor (here, the position of the term) directly in the index. So, I can contribute to this if you think it's worth adding. Do you think I should tweak AnalyzingInfixSuggester, subclass it, or create a dedicated class?
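The position-based blending Remi describes (option a) can be shown with a toy scorer: a suggestion's weight is scaled by the first token position at which the query term matches, so matches on the first token keep full weight. A self-contained sketch of the idea — the `1/(1 + position)` coefficient is illustrative, not the suggester's actual formula:

```java
import java.util.*;

// Toy illustration of position-blended suggestion weights (not Lucene code).
public class BlendedWeight {
    // Scale weight by first match position: position 0 keeps full weight,
    // later positions are progressively discounted.
    static double blend(long weight, int firstPosition) {
        return weight / (1.0 + firstPosition);
    }

    // First whitespace-token position whose token starts with the query
    // term (prefix match, as in an infix suggester); -1 if none.
    static int firstPosition(String text, String term) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].startsWith(term)) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(blend(100, firstPosition("top gun", "gun")));
    }
}
```

In the real implementation the position would come from the index (term vectors or positions), not from re-tokenizing the stored text; the sketch only shows why "gun club" should outrank "top gun" for the query "gun" at equal raw weight.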
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 1207 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1207/ Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 10415 lines...] [junit4] JVM J0: stdout was not empty, see: /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20140109_195727_377.sysout [junit4] JVM J0: stdout (verbatim) [junit4] # [junit4] # A fatal error has been detected by the Java Runtime Environment: [junit4] # [junit4] # SIGBUS (0xa) at pc=0x0001394fd59f, pid=210, tid=111879 [junit4] # [junit4] # JRE version: Java(TM) SE Runtime Environment (7.0_45-b18) (build 1.7.0_45-b18) [junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.45-b08 mixed mode bsd-amd64 ) [junit4] # Problematic frame: [junit4] # C 0x0001394fd59f [junit4] # [junit4] # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try ulimit -c unlimited before starting Java again [junit4] # [junit4] # An error report file with more information is saved as: [junit4] # /Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/J0/hs_err_pid210.log [junit4] # [junit4] # If you would like to submit a bug report, please visit: [junit4] # http://bugreport.sun.com/bugreport/crash.jsp [junit4] # The crash happened outside the Java Virtual Machine in native code. [junit4] # See problematic frame for where to report the bug. [junit4] # [junit4] JVM J0: EOF [...truncated 1 lines...] 
[junit4] ERROR: JVM J0 ended with an exception, command line: /Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/jre/bin/java -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=DC1FA1AD8188BFD4 -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -Dtests.postingsformat=random -Dtests.docvaluesformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 -Dtests.cleanthreads=perClass -Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. -Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp -Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -Djdk.map.althashing.threshold=0 -Dtests.disableHdfs=true -Dfile.encoding=ISO-8859-1 -classpath
[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SearcherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867007#comment-13867007 ] Tomás Fernández Löbbe commented on SOLR-5621: - That's true; however, I think it's good because it allows Solr to reuse Lucene's components instead of duplicating the code. I understand that the SearcherManager was not originally used because it didn't exist by the time Solr was created, but now that it does (and AFAIK it's a Lucene best practice for cases like this) we should try to adopt it. Also, I think it would allow Solr to use Lucene's SearcherLifetimeManager for searcher leases, which I think could allow Solr to use internal docids for distributed search instead of the unique key. I know leases could be implemented in Solr too without using the SearcherLifetimeManager, but that way we continue duplicating functionality instead of using what's already built.
[jira] [Commented] (SOLR-4647) Grouping is broken on docvalues-only fields
[ https://issues.apache.org/jira/browse/SOLR-4647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867009#comment-13867009 ] Iker Huerga commented on SOLR-4647: --- Hi, I've been able to replicate the issue, which I think happens when stored=false is set in schema.xml for the docValues field type. I could start working on a patch for it if nobody else is already working on it. Thanks Iker Grouping is broken on docvalues-only fields --- Key: SOLR-4647 URL: https://issues.apache.org/jira/browse/SOLR-4647 Project: Solr Issue Type: Bug Affects Versions: 4.2 Reporter: Adrien Grand Labels: newdev There are a few places where grouping uses FieldType.toObject(SchemaField.createField(String, float)) to translate a String field value to an Object. The problem is that createField returns null when the field is neither stored nor indexed, even if it has doc values. An option to fix it could be to use the ValueSource instead to resolve the Object value (similarly to NumericFacets).
[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SearcherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867020#comment-13867020 ] Yonik Seeley commented on SOLR-5621: bq. That's true, however I think it's good because it allows Solr to reuse Lucene's components instead of duplicate the code. That's not a good enough reason for me. It would be if one were about to write the Solr code and it already existed in Lucene... but that's not the case. Lucene did the duplication of code here, and there's no reason Solr should have to move just because duplicated code now exists.
[jira] [Commented] (LUCENE-5390) Loosen assert in IW on pending event after close
[ https://issues.apache.org/jira/browse/LUCENE-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867016#comment-13867016 ] ASF subversion and git services commented on LUCENE-5390: - Commit 1556942 from [~simonw] in branch 'dev/trunk' [ https://svn.apache.org/r1556942 ] LUCENE-5390: Loosen assert in IW on pending event after close
[jira] [Commented] (LUCENE-5390) Loosen assert in IW on pending event after close
[ https://issues.apache.org/jira/browse/LUCENE-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867033#comment-13867033 ] ASF subversion and git services commented on LUCENE-5390: - Commit 1556947 from [~simonw] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1556947 ] LUCENE-5390: Loosen assert in IW on pending event after close
[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SearcherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867053#comment-13867053 ] Tomás Fernández Löbbe commented on SOLR-5621: - I'm not saying that Solr duplicated Lucene code or the other way around; I'm just saying that at this point, the code is duplicated. Lucene can't use Solr code, but Solr can use Lucene's. Making that happen would not only remove part of the code from Solr, it would also improve the testing in both Lucene and Solr. Using custom code also creates the need for more custom code (like in my previous example with the SearcherLifetimeManager). I think that as Lucene evolves, Solr should keep up to date with Lucene's changes and best practices; after all, it's the same Apache project, right? I do think these are good reasons.
[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SearcherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867062#comment-13867062 ] Yonik Seeley commented on SOLR-5621: bq. it's the same Apache project, right? It was supposed to be. Hasn't exactly worked out well IMO.
[jira] [Commented] (LUCENE-5345) range facets don't work with float/double fields
[ https://issues.apache.org/jira/browse/LUCENE-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867065#comment-13867065 ] ASF subversion and git services commented on LUCENE-5345: - Commit 1556952 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1556952 ] LUCENE-5345: add new BlendedInfixSuggester range facets don't work with float/double fields Key: LUCENE-5345 URL: https://issues.apache.org/jira/browse/LUCENE-5345 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Michael McCandless Fix For: 5.0, 4.7 Attachments: LUCENE-5345.patch With LUCENE-5297 we generalized range faceting to accept a ValueSource. But when I tried to use this to facet by distance (< 1 km, < 2 km, etc.), it's not working ... the problem is that the RangeAccumulator always uses .longVal and assumes this was a double encoded as a long (via DoubleField).
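The encoding assumption behind the bug: DoubleField stores doubles as order-preserving "sortable" longs, so a RangeAccumulator that always reads `.longVal` and decodes it as a double breaks for genuine long/int sources (and for floats, which use an analogous sortable-int scheme). The mapping itself, reconstructed here in plain Java — it mirrors what Lucene's NumericUtils does, but this is a standalone sketch, not the Lucene class:

```java
// Standalone reconstruction of the double <-> sortable-long mapping
// (mirrors Lucene's NumericUtils; not the Lucene class itself).
public class SortableDouble {
    // Flip all 63 value bits for negatives (sign bit stays set), so the
    // signed long ordering matches the double ordering.
    static long doubleToSortableLong(double v) {
        long bits = Double.doubleToLongBits(v);
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    static double sortableLongToDouble(long l) {
        return Double.longBitsToDouble(l ^ ((l >> 63) & 0x7fffffffffffffffL));
    }

    public static void main(String[] args) {
        long enc = doubleToSortableLong(2.5);
        System.out.println(enc + " -> " + sortableLongToDouble(enc));
    }
}
```

Applying this decode unconditionally is exactly what goes wrong: a value that was indexed as a plain long comes back as a nonsense double, which is why the accumulator needs to dispatch on the underlying numeric type.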
[jira] [Commented] (LUCENE-5374) Call processEvents before IndexWriter is closed
[ https://issues.apache.org/jira/browse/LUCENE-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867066#comment-13867066 ] ASF subversion and git services commented on LUCENE-5374: - Commit 1556953 from [~simonw] in branch 'dev/branches/lucene_solr_4_6' [ https://svn.apache.org/r1556953 ] LUCENE-5374: Call IW#processEvents before IndexWriter is closed Call processEvents before IndexWriter is closed --- Key: LUCENE-5374 URL: https://issues.apache.org/jira/browse/LUCENE-5374 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.6 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 5.0, 4.7 Attachments: LUCENE-5374.patch We saw failures on jenkins that complain about processing events in the IW while the IW is already closed: {noformat} com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=193, name=Thread-133, state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads] Caused by: java.lang.RuntimeException: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed at __randomizedtesting.SeedInfo.seed([3FAF37E1AFFB2502]:0) at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:619) Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:645) at org.apache.lucene.index.IndexWriter.numDeletedDocs(IndexWriter.java:622) at org.apache.lucene.index.IndexWriter.segString(IndexWriter.java:4265) at org.apache.lucene.index.IndexWriter.publishFlushedSegment(IndexWriter.java:2324) at org.apache.lucene.index.DocumentsWriterFlushQueue$FlushTicket.publishFlushedSegment(DocumentsWriterFlushQueue.java:198) at org.apache.lucene.index.DocumentsWriterFlushQueue$FlushTicket.finishFlush(DocumentsWriterFlushQueue.java:213) at 
org.apache.lucene.index.DocumentsWriterFlushQueue$SegmentFlushTicket.publish(DocumentsWriterFlushQueue.java:249) at org.apache.lucene.index.DocumentsWriterFlushQueue.innerPurge(DocumentsWriterFlushQueue.java:116) at org.apache.lucene.index.DocumentsWriterFlushQueue.forcePurge(DocumentsWriterFlushQueue.java:138) at org.apache.lucene.index.DocumentsWriter.purgeBuffer(DocumentsWriter.java:185) at org.apache.lucene.index.IndexWriter.purge(IndexWriter.java:4634) at org.apache.lucene.index.DocumentsWriter$ForcedPurgeEvent.process(DocumentsWriter.java:701) at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4665) at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4657) at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1067) at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2106) at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2024) at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:575) {noformat} we need to process the events before we enter the finally block in IW#closeInternal -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5345) range facets don't work with float/double fields
[ https://issues.apache.org/jira/browse/LUCENE-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867069#comment-13867069 ] ASF subversion and git services commented on LUCENE-5345: - Commit 1556954 from [~mikemccand] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1556954 ] LUCENE-5345: add new BlendedInfixSuggester
[jira] [Commented] (LUCENE-5345) range facets don't work with float/double fields
[ https://issues.apache.org/jira/browse/LUCENE-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867070#comment-13867070 ] Michael McCandless commented on LUCENE-5345: Woops, above commit was for LUCENE-5354 instead. range facets don't work with float/double fields Key: LUCENE-5345 URL: https://issues.apache.org/jira/browse/LUCENE-5345 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Michael McCandless Fix For: 5.0, 4.7 Attachments: LUCENE-5345.patch With LUCENE-5297 we generalized range faceting to accept a ValueSource. But, when I tried to use this to facet by distance ( 1 km, 2 km, etc.), it's not working ... the problem is that the RangeAccumulator always uses .longVal and assumes this was a double encoded as a long (via DoubleField).
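The encoding that RangeAccumulator assumes can be sketched in isolation. The methods below mirror the idea behind Lucene's NumericUtils.doubleToSortableLong/sortableLongToDouble (a simplified re-implementation for illustration, not the library source): the encoded longs sort in the same order as the original doubles, so a range accumulator that sees raw .longVal output must decode it rather than treat the long itself as the value.

```java
// Sketch of the sortable-long encoding of doubles that range faceting relies on.
// Mirrors the idea behind Lucene's NumericUtils; simplified, not library source.
public class SortableDouble {
    public static long doubleToSortableLong(double v) {
        long bits = Double.doubleToLongBits(v);
        // For negative doubles, flip the 63 non-sign bits so that the signed
        // long ordering matches the natural double ordering.
        if (bits < 0) bits ^= 0x7fffffffffffffffL;
        return bits;
    }

    public static double sortableLongToDouble(long v) {
        // Undo the flip applied to negative values, then reinterpret the bits.
        if (v < 0) v ^= 0x7fffffffffffffffL;
        return Double.longBitsToDouble(v);
    }
}
```

The bug described above is exactly the missing decode step: comparing a range endpoint against the encoded long instead of against sortableLongToDouble(longVal).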
[jira] [Resolved] (LUCENE-5354) Blended score in AnalyzingInfixSuggester
[ https://issues.apache.org/jira/browse/LUCENE-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-5354. Resolution: Fixed Fix Version/s: 4.7 5.0 Thanks Remi! I committed with the wrong issue LUCENE-5345 by accident... Blended score in AnalyzingInfixSuggester Key: LUCENE-5354 URL: https://issues.apache.org/jira/browse/LUCENE-5354 Project: Lucene - Core Issue Type: Improvement Components: modules/spellchecker Affects Versions: 4.4 Reporter: Remi Melisson Priority: Minor Labels: suggester Fix For: 5.0, 4.7 Attachments: LUCENE-5354.patch, LUCENE-5354_2.patch, LUCENE-5354_3.patch I'm working on a custom suggester derived from the AnalyzingInfix. I require what is called a blended score (//TODO ln.399 in AnalyzingInfixSuggester) to transform the suggestion weights depending on the position of the searched term(s) in the text. Right now, I'm using an easy solution : If I want 10 suggestions, then I search against the current ordered index for the 100 first results and transform the weight : bq. a) by using the term position in the text (found with TermVector and DocsAndPositionsEnum) or bq. b) by multiplying the weight by the score of a SpanQuery that I add when searching and return the updated 10 most weighted suggestions. Since we usually don't need to suggest so many things, the bigger search + rescoring overhead is not so significant but I agree that this is not the most elegant solution. We could include this factor (here the position of the term) directly into the index. So, I can contribute to this if you think it's worth adding it. Do you think I should tweak AnalyzingInfixSuggester, subclass it or create a dedicated class ?
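The position-dependent weighting Remi describes can be sketched as a simple blending function. The class and method names are hypothetical and the reciprocal decay is an illustrative assumption, not necessarily the coefficient the committed suggester uses:

```java
// Hedged sketch of the "blended score" idea: scale a suggestion's stored
// weight by a coefficient that decays with the position of the matched term,
// so matches near the start of the text rank higher.
public class BlendedWeight {
    // Illustrative reciprocal decay: position 0 keeps full weight,
    // position 1 keeps half, position 2 a third, and so on.
    public static double positionCoefficient(int position) {
        return 1.0 / (position + 1);
    }

    public static double blend(long weight, int position) {
        return weight * positionCoefficient(position);
    }
}
```

Rescoring the top-N candidates with such a function is the cheap variant Remi started with; baking the position factor into the index avoids the over-fetch (100 candidates for 10 suggestions) at lookup time.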
[jira] [Commented] (LUCENE-5390) Loosen assert in IW on pending event after close
[ https://issues.apache.org/jira/browse/LUCENE-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867074#comment-13867074 ] ASF subversion and git services commented on LUCENE-5390: - Commit 1556956 from [~simonw] in branch 'dev/branches/lucene_solr_4_6' [ https://svn.apache.org/r1556956 ] LUCENE-5390: Loosen assert in IW on pending event after close Loosen assert in IW on pending event after close Key: LUCENE-5390 URL: https://issues.apache.org/jira/browse/LUCENE-5390 Project: Lucene - Core Issue Type: Task Affects Versions: 4.6, 5.0, 4.7, 4.6.1 Reporter: Simon Willnauer Priority: Minor Fix For: 5.0, 4.7, 4.6.1 Attachments: LUCENE-5390.patch Sometimes the assert in the IW is tripped due to pending merge events. Those events can always happen but they are meaningless since we close / rollback the IW anyway. I suggest we loosen the assert here to not fail if there are only pending merge events. {noformat} 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterWithThreads.testRollbackAndCommitWithThreads Error Message: Captured an uncaught exception in thread: Thread[id=288, name=Thread-222, state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads] Stack Trace: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=288, name=Thread-222, state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads] Caused by: java.lang.RuntimeException: java.lang.AssertionError: [org.apache.lucene.index.DocumentsWriter$MergePendingEvent@67ef293b] at __randomizedtesting.SeedInfo.seed([98DFB1602D9F9A2A]:0) at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:619) Caused by: java.lang.AssertionError: [org.apache.lucene.index.DocumentsWriter$MergePendingEvent@67ef293b] at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2026) at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:575) {noformat}
[jira] [Resolved] (LUCENE-5390) Loosen assert in IW on pending event after close
[ https://issues.apache.org/jira/browse/LUCENE-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-5390. - Resolution: Fixed Loosen assert in IW on pending event after close Key: LUCENE-5390 URL: https://issues.apache.org/jira/browse/LUCENE-5390 Project: Lucene - Core Issue Type: Task Affects Versions: 4.6, 5.0, 4.7, 4.6.1 Reporter: Simon Willnauer Priority: Minor Fix For: 5.0, 4.7, 4.6.1 Attachments: LUCENE-5390.patch Sometimes the assert in the IW is tripped due to pending merge events. Those events can always happen but they are meaningless since we close / rollback the IW anyway. I suggest we loosen the assert here to not fail if there are only pending merge events. {noformat} 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterWithThreads.testRollbackAndCommitWithThreads Error Message: Captured an uncaught exception in thread: Thread[id=288, name=Thread-222, state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads] Stack Trace: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=288, name=Thread-222, state=RUNNABLE, group=TGRP-TestIndexWriterWithThreads] Caused by: java.lang.RuntimeException: java.lang.AssertionError: [org.apache.lucene.index.DocumentsWriter$MergePendingEvent@67ef293b] at __randomizedtesting.SeedInfo.seed([98DFB1602D9F9A2A]:0) at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:619) Caused by: java.lang.AssertionError: [org.apache.lucene.index.DocumentsWriter$MergePendingEvent@67ef293b] at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2026) at org.apache.lucene.index.TestIndexWriterWithThreads$1.run(TestIndexWriterWithThreads.java:575) {noformat}
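The loosened check could then ignore pending merge events and fail only on anything else. Here is a sketch with a hypothetical event model (not Lucene source; the event names are illustrative):

```java
import java.util.List;

// Sketch of the loosened assertion the issue proposes: pending *merge*
// events after close/rollback are harmless, so only a non-merge event
// left in the queue should trip the assert.
public class PendingEventCheck {
    public enum Event { MERGE_PENDING, FORCED_PURGE, APPLY_DELETES }

    public static boolean onlyMergeEventsPending(List<Event> pending) {
        // true for an empty queue or a queue containing only merge events
        return pending.stream().allMatch(e -> e == Event.MERGE_PENDING);
    }
}
```

The assert in close/rollback would then become `assert onlyMergeEventsPending(eventQueue)` instead of requiring the queue to be empty.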
[jira] [Created] (LUCENE-5391) uax29urlemailtokenizer - unexpected tokenisation of index2.php (and other inputs)
Chris created LUCENE-5391: - Summary: uax29urlemailtokenizer - unexpected tokenisation of index2.php (and other inputs) Key: LUCENE-5391 URL: https://issues.apache.org/jira/browse/LUCENE-5391 Project: Lucene - Core Issue Type: Bug Reporter: Chris The uax29urlemailtokenizer tokenises index2.php as: URL index2.ph ALPHANUM p While it does not do the same for index.php Screenshot from analyser: http://postimg.org/image/aj6c98n3b/
[jira] [Commented] (SOLR-5543) solr.xml duplicat eentries after SWAP 4.6
[ https://issues.apache.org/jira/browse/SOLR-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867142#comment-13867142 ] Shawn Heisey commented on SOLR-5543: With a verbal OK from [~markrmil...@gmail.com] via IRC, I am backporting the fix for this issue to 4.6.1. Both precommit and tests in solr/ are passing. The commits for trunk and branch_4x have no code changes, I'm just moving the CHANGES.txt entry. solr.xml duplicat eentries after SWAP 4.6 - Key: SOLR-5543 URL: https://issues.apache.org/jira/browse/SOLR-5543 Project: Solr Issue Type: Bug Affects Versions: 4.6 Reporter: Bill Bell Assignee: Alan Woodward Fix For: 5.0, 4.7 Attachments: SOLR-5543.patch We are having issues with SWAP CoreAdmin in 4.6. Using legacy solr.xml we issue a CoreAdmin SWAP, and we want it persistent. It has been running flawlessly since 4.5. Now it creates duplicate lines in solr.xml. Even the example multi core schema in doesn't work with persistent=true - it creates duplicate lines in solr.xml.
<cores adminPath="/admin/cores">
  <core name="autosuggest" loadOnStartup="true" instanceDir="autosuggest" transient="false"/>
  <core name="citystateprovider" loadOnStartup="true" instanceDir="citystateprovider" transient="false"/>
  <core name="collection1" loadOnStartup="true" instanceDir="collection1" transient="false"/>
  <core name="facility" loadOnStartup="true" instanceDir="facility" transient="false"/>
  <core name="inactiveproviders" loadOnStartup="true" instanceDir="inactiveproviders" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
  <core name="locationgeo" loadOnStartup="true" instanceDir="locationgeo" transient="false"/>
  <core name="market" loadOnStartup="true" instanceDir="market" transient="false"/>
  <core name="portalprovider" loadOnStartup="true" instanceDir="portalprovider" transient="false"/>
  <core name="practice" loadOnStartup="true" instanceDir="practice" transient="false"/>
  <core name="provider" loadOnStartup="true" instanceDir="provider" transient="false"/>
  <core name="providersearch" loadOnStartup="true" instanceDir="providersearch" transient="false"/>
  <core name="tridioncomponents" loadOnStartup="true" instanceDir="tridioncomponents" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
</cores>
[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SeacherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867147#comment-13867147 ] Uwe Schindler commented on SOLR-5621: - Hi Yonik, please don't be unfair to Tomás: You might be right, that this is too risky for the stable branch, but we have still LuSolr trunk, so I see no problem committing this (once it's done) to Solr trunk. It can then bake for a long time, until Solr 5.0 is released. You have to recognize that he opened this issue with affects/fix version 5.0. As Tomás describes: bq. Also, I think it would allow Solr to also use Lucene's SearcherLifetimeManager for searcher leases, which I think could allow Solr to use internal docids for distributed search instead of the unique key. This is a perfect use-case, although I am not sure if this would be easy. But a job for another followup issue for Solr 5.0. bq. It was supposed to be. Hasn't exactly worked out well IMO. You are the only one that uses this statement. In my opinion the same Apache project worked perfectly: - We got a lot of additional per-segment stuff in Solr. - I helped a lot to get lots of API changes in Lucene into Solr, e.g. the refactoring of Document, IndexReader. Others helped with TermsEnum,... - Better Analyzer support in Solr. Users don't need to write factories for stuff that's already in Lucene. Just plug e.g. lucene-analysis-kuromoji into your lib/ folder and it automatically works thanks to SPI. If we would still have factories solely in Solr, one would have to write factories for all Lucene modules or we would need to ship them with Solr Core (so depending on stuff like kuromoji the user doesn't need). - All codec support was mostly written by (originally) Lucene committers. With your statement, you are the only person who fights against working together even more! Some examples: - The Facet module improved so much, why not allow to use it from Solr?
To me it looks like you are against. Just because you would need to configure in the schema which fields you want to facet on! The current Solr faceting, uninverting all stuff, is a disaster performance- and memory-wise. - Extracting factories from Solr: No, you were against it, because your enemy ES could use it - But we did it anyhow. That's good! And ES has not yet completely taken over your code, so where is the problem? With the given possibilities for improvements and better maintainability of this code we are on the right path. I am sure with the new code maybe the crazy Solr failures hitting us all the time from Solr tests may get better (you know, the damnful: java.lang.AssertionError: ERROR: SolrIndexSearcher opens=59 closes=58). Uwe Let Solr use Lucene's SeacherManager Key: SOLR-5621 URL: https://issues.apache.org/jira/browse/SOLR-5621 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Attachments: SOLR-5621.patch It would be nice if Solr could take advantage of Lucene's SearcherManager and get rid of most of the logic related to managing Searchers in SolrCore. I've been taking a look at how possible it is to achieve this, and even if I haven't finished with the changes (there are some use cases that are still not working exactly the same) it looks like it is possible to do. Some things still could use a lot of improvement (like the realtime searcher management) and some others not yet implemented, like Searchers on deck or IndexReaderFactory. I'm attaching an initial patch (many TODOs yet).
[jira] [Commented] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867145#comment-13867145 ] Jan Høydahl commented on SOLR-5541: --- Great feature. Valid ids may contain commas. Shouldn't this feature provide a way to elevate/exclude such docs? Either allow escaping, i.e. {{one\,id,another\,id}}, or allow configuring the separator, i.e. {{elevate.sep=;}}. Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Key: SOLR-5541 URL: https://issues.apache.org/jira/browse/SOLR-5541 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.6 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch The QueryElevationComponent currently uses an xml file to map query strings to elevateIds and excludeIds. This ticket adds the ability to pass in elevateIds and excludeIds through two new http parameters elevateIds and excludeIds. This will allow more sophisticated business logic to be used in selecting which ids to elevate/exclude. Proposed syntax: http://localhost:8983/solr/elevate?q=*:*&elevatedIds=3,4&excludeIds=6,8 The elevateIds and excludeIds point to the unique document Id.
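The escaping Jan proposes could be handled by a splitter that treats a backslash-escaped comma as part of the id. This is a hypothetical helper sketched for illustration, not an existing Solr API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed escaping: split an elevateIds/excludeIds parameter
// on commas, but keep backslash-escaped commas ("one\,id,another\,id")
// inside the id. Hypothetical helper, not Solr code.
public class IdListParser {
    public static List<String> split(String param) {
        List<String> ids = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < param.length(); i++) {
            char c = param.charAt(i);
            if (c == '\\' && i + 1 < param.length() && param.charAt(i + 1) == ',') {
                cur.append(',');   // escaped comma is part of the id
                i++;               // skip the comma we just consumed
            } else if (c == ',') {
                ids.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        ids.add(cur.toString());
        return ids;
    }
}
```

A configurable separator (the {{elevate.sep=;}} alternative) would be the same loop with the ',' literal replaced by the configured character.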
[jira] [Commented] (SOLR-5543) solr.xml duplicat eentries after SWAP 4.6
[ https://issues.apache.org/jira/browse/SOLR-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867155#comment-13867155 ] ASF subversion and git services commented on SOLR-5543: --- Commit 1556965 from [~elyograg] in branch 'dev/branches/lucene_solr_4_6' [ https://svn.apache.org/r1556965 ] SOLR-5543: Backport to 4.6 branch, for 4.6.1 release. solr.xml duplicat eentries after SWAP 4.6 - Key: SOLR-5543 URL: https://issues.apache.org/jira/browse/SOLR-5543 Project: Solr Issue Type: Bug Affects Versions: 4.6 Reporter: Bill Bell Assignee: Alan Woodward Fix For: 5.0, 4.7 Attachments: SOLR-5543.patch We are having issues with SWAP CoreAdmin in 4.6. Using legacy solr.xml we issue a COreodmin SWAP, and we want it persistent. It has been running flawless since 4.5. Now it creates duplicate lines in solr.xml. Even the example multi core schema in doesn't work with persistent=true - it creates duplicate lines in solr.xml. cores adminPath=/admin/cores core name=autosuggest loadOnStartup=true instanceDir=autosuggest transient=false/ core name=citystateprovider loadOnStartup=true instanceDir=citystateprovider transient=false/ core name=collection1 loadOnStartup=true instanceDir=collection1 transient=false/ core name=facility loadOnStartup=true instanceDir=facility transient=false/ core name=inactiveproviders loadOnStartup=true instanceDir=inactiveproviders transient=false/ core name=linesvcgeo instanceDir=linesvcgeo loadOnStartup=true transient=false/ core name=linesvcgeofull instanceDir=linesvcgeofull loadOnStartup=true transient=false/ core name=locationgeo loadOnStartup=true instanceDir=locationgeo transient=false/ core name=market loadOnStartup=true instanceDir=market transient=false/ core name=portalprovider loadOnStartup=true instanceDir=portalprovider transient=false/ core name=practice loadOnStartup=true instanceDir=practice transient=false/ core name=provider loadOnStartup=true instanceDir=provider transient=false/ core 
name=providersearch loadOnStartup=true instanceDir=providersearch transient=false/ core name=tridioncomponents loadOnStartup=true instanceDir=tridioncomponents transient=false/ core name=linesvcgeo instanceDir=linesvcgeo loadOnStartup=true transient=false/ core name=linesvcgeofull instanceDir=linesvcgeofull loadOnStartup=true transient=false/ /cores
[jira] [Comment Edited] (SOLR-5621) Let Solr use Lucene's SeacherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867147#comment-13867147 ] Uwe Schindler edited comment on SOLR-5621 at 1/9/14 10:08 PM: -- Hi Yonik, please don't be unfair to Tomás: You might be right, that this is too risky for the stable branch, but we have still LuSolr trunk, so I see no problem committing this (once its done) to Solr trunk. It can then bake for long time, until Solr 5.0 is released. You have to recognize that he opened this issue with affects/fix version 5.0. As Tomás describes: bq. Also, I think it would allow Solr to also use Lucene's SearcherLifetimeManager for searcher leases, which I think could allow Solr to use internal docids for distributed search instead of the unique key. This is a perfect use-case, although I am not sure if this would be easy. But a Job for another followup issue for Solr 5.0. bq. It was supposed to be. Hasn't exactly worked out well IMO. You are the only one that uses this statement. In my opinion the same Apache project worked perfectly: - We got a lot of additional per-segment stuff in Solr. - I helped a lot to get lot's of API changes in Lucene into Solr, e.g. the refactoring of Document, IndexReader. Others helped with TermsEnum,... - Better Analyzer support in Solr. Users don't need to write factories for stuff that's already in Lucene. Just plugin e.g. {{lucene-analysis-kuromoji.jar}} into your lib/ folder and it automatically works thanks to SPI. If we would still have factories solely in Solr, one would have to write factories for all Lucene modules or we would need to ship with them with Solr Core (so dependiing on stuff like kuromoji the user don't needs). - All codec support was mostly written by (originally) Lucene committers. With your statement, you are the only person who fights against working together even more! Some examples: - The Facet module improved so much, why not allow to use it from Solr? 
To me it looks like you are against. Just because you would need to configure in the schema which fields you want facet on! The current Solr facetting uninverting all stuff is a disaster performance- and memory wise - Extracting factories from Solr: No you were against, because your enemy ES could use it - But we did it anyhow. That's good! And ES did not yet completely took over your code, so where is the problem? With the given possibilities for improvements and better maintainability of this code we are on the right path. I am sure with the new code maybe the crazy Solr failures hitting us all the time from Solr tests maybe get better (you know, the damnful: java.lang.AssertionError: ERROR: SolrIndexSearcher opens=59 closes=58). Uwe was (Author: thetaphi): Hi Yonik, please don't be unfair to Tomás: You might be right, that this is too risky for the stable branch, but we have still LuSolr trunk, so I see no problem committing this (once its done) to Solr trunk. It can then bake for long time, until Solr 5.0 is released. You have to recognize that he opened this issue with affects/fix version 5.0. As Tomás describes: bq. Also, I think it would allow Solr to also use Lucene's SearcherLifetimeManager for searcher leases, which I think could allow Solr to use internal docids for distributed search instead of the unique key. This is a perfect use-case, although I am not sure if this would be easy. But a Job for another followup issue for Solr 5.0. bq. It was supposed to be. Hasn't exactly worked out well IMO. You are the only one that uses this statement. In my opinion the same Apache project worked perfectly: - We got a lot of additional per-segment stuff in Solr. - I helped a lot to get lot's of API changes in Lucene into Solr, e.g. the refactoring of Document, IndexReader. Others helped with TermsEnum,... - Better Analyzer support in Solr. Users don't need to write factories for stuff that's already in Lucene. Just plugin e.g. 
lucene-analysis-kuromoji into your lib/ folder and it automatically works thaks to SPI. If we would still have facories solely in Sold, one would have to write factories for all Lucene modules or we would need to ship with them with Solr Core (so dependiing on stuff like kuromoji the user don't needs). - All codec support was mostly written by (originally) Lucene committers. With your statement, you are the only person who fights against working together even more! Some examples: - The Facet module improved so much, why not allow to use it from Solr? To me it looks like you are against. Just because you would need to configure in the schema which fields you want facet on! The current Solr facetting uninverting all stuff is a disaster performance- and memory wise - Extracting factories from Solr: No you were against, because your enemy ES could use it - But we did it anyhow. That's good! And ES did
[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry
[ https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867165#comment-13867165 ] Shawn Heisey commented on SOLR-5615: Noted while backporting SOLR-5543 to the 4.6 branch: In trunk's CHANGES.txt file, this issue number shows up in the 4.6.1 section, but does not appear to have been actually backported to the 4.6 branch yet. Deadlock while trying to recover after a ZK session expiry -- Key: SOLR-5615 URL: https://issues.apache.org/jira/browse/SOLR-5615 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.4, 4.5, 4.6 Reporter: Ramkumar Aiyengar Assignee: Mark Miller Fix For: 5.0, 4.7, 4.6.1 Attachments: SOLR-5615.patch, SOLR-5615.patch, SOLR-5615.patch The sequence of events which might trigger this is as follows: - Leader of a shard, say OL, has a ZK expiry - The new leader, NL, starts the election process - NL, through Overseer, clears the current leader (OL) for the shard from the cluster state - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread) - OL marks itself down - OL sets up watches for cluster state, and then retrieves it (with no leader for this shard) - NL, through Overseer, updates cluster state to mark itself leader for the shard - OL tries to register itself as a replica, and waits on the event thread till the cluster state is updated with the new leader - ZK sends a watch update to OL, but it is blocked on the event thread waiting for it. Oops. This finally breaks out when the attempt to register itself as a replica times out after 20 mins.
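The core of the deadlock is that OL waits on the event thread for a notification that only the event thread itself can deliver. A minimal, Solr-free illustration of that pattern (hypothetical names; a 200 ms timeout stands in for the 20-minute one):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Minimal illustration (not Solr code) of the deadlock pattern described:
// a single event thread blocks waiting for a notification that can only be
// delivered by a task on that same thread, so the wait always times out.
public class EventThreadDeadlock {
    // Returns false when the wait times out, mirroring the 20-minute break-out.
    public static boolean waitOnEventThread(long timeoutMs) {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        CountDownLatch leaderPublished = new CountDownLatch(1);
        try {
            // Event thread blocks waiting for the cluster-state update...
            Future<Boolean> blocked = eventThread.submit(
                () -> leaderPublished.await(timeoutMs, TimeUnit.MILLISECONDS));
            // ...but the update can only be delivered via the same event
            // thread, which is occupied, so this task runs too late.
            eventThread.submit(leaderPublished::countDown);
            return blocked.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            eventThread.shutdownNow();
        }
    }
}
```

The fix direction implied by the report is to move the blocking wait off the ZK event thread, so watch notifications can still be processed while the replica registration waits.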
[jira] [Commented] (SOLR-5543) solr.xml duplicat eentries after SWAP 4.6
[ https://issues.apache.org/jira/browse/SOLR-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867167#comment-13867167 ] ASF subversion and git services commented on SOLR-5543: --- Commit 1556968 from [~elyograg] in branch 'dev/trunk' [ https://svn.apache.org/r1556968 ] SOLR-5543: move changes entry from 4.7.0 to 4.6.1. solr.xml duplicat eentries after SWAP 4.6 - Key: SOLR-5543 URL: https://issues.apache.org/jira/browse/SOLR-5543 Project: Solr Issue Type: Bug Affects Versions: 4.6 Reporter: Bill Bell Assignee: Alan Woodward Fix For: 5.0, 4.7 Attachments: SOLR-5543.patch We are having issues with SWAP CoreAdmin in 4.6. Using legacy solr.xml we issue a COreodmin SWAP, and we want it persistent. It has been running flawless since 4.5. Now it creates duplicate lines in solr.xml. Even the example multi core schema in doesn't work with persistent=true - it creates duplicate lines in solr.xml. cores adminPath=/admin/cores core name=autosuggest loadOnStartup=true instanceDir=autosuggest transient=false/ core name=citystateprovider loadOnStartup=true instanceDir=citystateprovider transient=false/ core name=collection1 loadOnStartup=true instanceDir=collection1 transient=false/ core name=facility loadOnStartup=true instanceDir=facility transient=false/ core name=inactiveproviders loadOnStartup=true instanceDir=inactiveproviders transient=false/ core name=linesvcgeo instanceDir=linesvcgeo loadOnStartup=true transient=false/ core name=linesvcgeofull instanceDir=linesvcgeofull loadOnStartup=true transient=false/ core name=locationgeo loadOnStartup=true instanceDir=locationgeo transient=false/ core name=market loadOnStartup=true instanceDir=market transient=false/ core name=portalprovider loadOnStartup=true instanceDir=portalprovider transient=false/ core name=practice loadOnStartup=true instanceDir=practice transient=false/ core name=provider loadOnStartup=true instanceDir=provider transient=false/ core name=providersearch 
loadOnStartup=true instanceDir=providersearch transient=false/ core name=tridioncomponents loadOnStartup=true instanceDir=tridioncomponents transient=false/ core name=linesvcgeo instanceDir=linesvcgeo loadOnStartup=true transient=false/ core name=linesvcgeofull instanceDir=linesvcgeofull loadOnStartup=true transient=false/ /cores
[jira] [Updated] (SOLR-5543) solr.xml duplicat eentries after SWAP 4.6
[ https://issues.apache.org/jira/browse/SOLR-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey updated SOLR-5543: --- Fix Version/s: 4.6.1 solr.xml duplicat eentries after SWAP 4.6 - Key: SOLR-5543 URL: https://issues.apache.org/jira/browse/SOLR-5543 Project: Solr Issue Type: Bug Affects Versions: 4.6 Reporter: Bill Bell Assignee: Alan Woodward Fix For: 5.0, 4.7, 4.6.1 Attachments: SOLR-5543.patch We are having issues with SWAP CoreAdmin in 4.6. Using legacy solr.xml we issue a COreodmin SWAP, and we want it persistent. It has been running flawless since 4.5. Now it creates duplicate lines in solr.xml. Even the example multi core schema in doesn't work with persistent=true - it creates duplicate lines in solr.xml. cores adminPath=/admin/cores core name=autosuggest loadOnStartup=true instanceDir=autosuggest transient=false/ core name=citystateprovider loadOnStartup=true instanceDir=citystateprovider transient=false/ core name=collection1 loadOnStartup=true instanceDir=collection1 transient=false/ core name=facility loadOnStartup=true instanceDir=facility transient=false/ core name=inactiveproviders loadOnStartup=true instanceDir=inactiveproviders transient=false/ core name=linesvcgeo instanceDir=linesvcgeo loadOnStartup=true transient=false/ core name=linesvcgeofull instanceDir=linesvcgeofull loadOnStartup=true transient=false/ core name=locationgeo loadOnStartup=true instanceDir=locationgeo transient=false/ core name=market loadOnStartup=true instanceDir=market transient=false/ core name=portalprovider loadOnStartup=true instanceDir=portalprovider transient=false/ core name=practice loadOnStartup=true instanceDir=practice transient=false/ core name=provider loadOnStartup=true instanceDir=provider transient=false/ core name=providersearch loadOnStartup=true instanceDir=providersearch transient=false/ core name=tridioncomponents loadOnStartup=true instanceDir=tridioncomponents transient=false/ core name=linesvcgeo 
instanceDir=linesvcgeo loadOnStartup=true transient=false/ core name=linesvcgeofull instanceDir=linesvcgeofull loadOnStartup=true transient=false/ /cores
[jira] [Commented] (SOLR-5543) solr.xml duplicat eentries after SWAP 4.6
[ https://issues.apache.org/jira/browse/SOLR-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867170#comment-13867170 ] ASF subversion and git services commented on SOLR-5543: --- Commit 1556969 from [~elyograg] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1556969 ] SOLR-5543: Move changes entry from 4.7.0 to 4.6.1 (merge trunk r1556968) solr.xml duplicate entries after SWAP 4.6 - Key: SOLR-5543 URL: https://issues.apache.org/jira/browse/SOLR-5543 Project: Solr Issue Type: Bug Affects Versions: 4.6 Reporter: Bill Bell Assignee: Alan Woodward Fix For: 5.0, 4.7, 4.6.1 Attachments: SOLR-5543.patch We are having issues with CoreAdmin SWAP in 4.6. Using legacy solr.xml we issue a CoreAdmin SWAP, and we want it persistent. It had been running flawlessly since 4.5. Now it creates duplicate lines in solr.xml. Even the example multi-core setup doesn't work with persistent=true - it creates duplicate lines in solr.xml.
<cores adminPath="/admin/cores">
  <core name="autosuggest" loadOnStartup="true" instanceDir="autosuggest" transient="false"/>
  <core name="citystateprovider" loadOnStartup="true" instanceDir="citystateprovider" transient="false"/>
  <core name="collection1" loadOnStartup="true" instanceDir="collection1" transient="false"/>
  <core name="facility" loadOnStartup="true" instanceDir="facility" transient="false"/>
  <core name="inactiveproviders" loadOnStartup="true" instanceDir="inactiveproviders" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
  <core name="locationgeo" loadOnStartup="true" instanceDir="locationgeo" transient="false"/>
  <core name="market" loadOnStartup="true" instanceDir="market" transient="false"/>
  <core name="portalprovider" loadOnStartup="true" instanceDir="portalprovider" transient="false"/>
  <core name="practice" loadOnStartup="true" instanceDir="practice" transient="false"/>
  <core name="provider" loadOnStartup="true" instanceDir="provider" transient="false"/>
  <core name="providersearch" loadOnStartup="true" instanceDir="providersearch" transient="false"/>
  <core name="tridioncomponents" loadOnStartup="true" instanceDir="tridioncomponents" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
</cores>
-- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SeacherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867172#comment-13867172 ] Yonik Seeley commented on SOLR-5621: There's boatloads of FUD in your response Uwe, but I'm too tired of the politics to respond to them all. Solr support for Lucene faceting doesn't exist because no one has developed a patch yet. Let Solr use Lucene's SeacherManager Key: SOLR-5621 URL: https://issues.apache.org/jira/browse/SOLR-5621 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Fix For: 5.0 Attachments: SOLR-5621.patch It would be nice if Solr could take advantage of Lucene's SearcherManager and get rid of most of the logic related to managing Searchers in SolrCore. I've been taking a look at how possible it is to achieve this, and even though I haven't finished the changes (there are some use cases that still don't work exactly the same) it looks like it is possible to do. Some things could still use a lot of improvement (like the realtime searcher management) and some others are not yet implemented, like searchers on deck or IndexReaderFactory. I'm attaching an initial patch (many TODOs yet).
[jira] [Updated] (SOLR-5621) Let Solr use Lucene's SeacherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated SOLR-5621: --- Fix Version/s: 5.0 Let Solr use Lucene's SeacherManager Key: SOLR-5621 URL: https://issues.apache.org/jira/browse/SOLR-5621 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Fix For: 5.0 Attachments: SOLR-5621.patch
[jira] [Commented] (SOLR-5615) Deadlock while trying to recover after a ZK session expiry
[ https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867180#comment-13867180 ] Mark Miller commented on SOLR-5615: --- Yeah, I started it, but it turns out it's difficult without backporting another fix first. Deadlock while trying to recover after a ZK session expiry -- Key: SOLR-5615 URL: https://issues.apache.org/jira/browse/SOLR-5615 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.4, 4.5, 4.6 Reporter: Ramkumar Aiyengar Assignee: Mark Miller Fix For: 5.0, 4.7, 4.6.1 Attachments: SOLR-5615.patch, SOLR-5615.patch, SOLR-5615.patch The sequence of events which might trigger this is as follows: - Leader of a shard, say OL, has a ZK expiry - The new leader, NL, starts the election process - NL, through Overseer, clears the current leader (OL) for the shard from the cluster state - OL reconnects to ZK, calls onReconnect from the event thread (main-EventThread) - OL marks itself down - OL sets up watches for cluster state, and then retrieves it (with no leader for this shard) - NL, through Overseer, updates cluster state to mark itself leader for the shard - OL tries to register itself as a replica, and waits on the event thread until the cluster state is updated with the new leader - ZK sends a watch update to OL, but OL is blocked on the event thread waiting for it. Oops. This finally breaks out when the attempt to register as a replica times out after 20 minutes.
[jira] [Commented] (SOLR-5379) Query-time multi-word synonym expansion
[ https://issues.apache.org/jira/browse/SOLR-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867202#comment-13867202 ] Nolan Lawson commented on SOLR-5379: +1 as well. Tien's patch also seems to be a better candidate seeing as it includes Java tests, whereas my tests are in Python 'cuz I was lazy. :) Query-time multi-word synonym expansion --- Key: SOLR-5379 URL: https://issues.apache.org/jira/browse/SOLR-5379 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Tien Nguyen Manh Labels: multi-word, queryparser, synonym Fix For: 4.7 Attachments: quoted.patch, synonym-expander.patch While dealing with synonyms at query time, Solr fails to work with multi-word synonyms for two reasons: - First, the Lucene query parser tokenizes the user query by space, so it splits a multi-word term into separate terms before feeding them to the synonym filter, and the synonym filter can't recognize the multi-word term to do expansion. - Second, if the synonym filter expands into multiple terms which contain a multi-word synonym: SolrQueryParserBase currently uses MultiPhraseQuery to handle synonyms, but MultiPhraseQuery doesn't work with terms that have different numbers of words. For the first, we can quote all multi-word synonyms in the user query so that the Lucene query parser doesn't split them. There is a related JIRA task: https://issues.apache.org/jira/browse/LUCENE-2605. For the second, we can replace MultiPhraseQuery with an appropriate BooleanQuery of SHOULD clauses containing multiple PhraseQuerys when the token stream has a multi-word synonym.
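The first workaround above (quote multi-word synonyms before the query parser splits them on whitespace) can be sketched in isolation. This is an illustrative helper, not the attached quoted.patch: the class name and the hard-coded synonym list are assumptions, and a real implementation would consult the analyzer's synonym map.

```java
import java.util.List;

// Sketch: wrap any known multi-word synonym in quotes so a whitespace-splitting
// query parser treats it as a single phrase instead of separate terms.
public class SynonymQuoter {
    public static String quoteMultiWord(String query, List<String> multiWordSynonyms) {
        String result = query;
        for (String phrase : multiWordSynonyms) {
            // Skip phrases that are absent or already quoted.
            if (result.contains(phrase) && !result.contains("\"" + phrase + "\"")) {
                result = result.replace(phrase, "\"" + phrase + "\"");
            }
        }
        return result;
    }
}
```

The quoted phrase then survives parsing intact and can reach the synonym filter as one unit.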
[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SeacherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867218#comment-13867218 ] Michael McCandless commented on SOLR-5621: -- +1 to do this in trunk, and give it time to bake. A future cutover to SearcherLifetimeManager makes sense too; then Solr doesn't need to load stored documents to get the id field to reference documents anymore. Just use the searcher version + docID. Refactoring code is a healthy and ongoing process in good open-source projects, like ours. Yes, there is short-term risk of instability, but over time this trades off for a stronger long-term design for Solr. Solr should also be [more] per-segment, use Lucene Filters, etc. Let Solr use Lucene's SeacherManager Key: SOLR-5621 URL: https://issues.apache.org/jira/browse/SOLR-5621 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Fix For: 5.0 Attachments: SOLR-5621.patch
[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SeacherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867226#comment-13867226 ] Michael McCandless commented on SOLR-5621: -- bq. Solr support for lucene faceting doesn't exist because no one has developed a patch yet. In fact, cutting over to SearcherManager is a good step towards adding Lucene facets to Solr: the jump from SearcherManager (a ReferenceManager<IndexSearcher>) to SearcherTaxonomyManager (a ReferenceManager over an IndexSearcher + TaxoReader) is easy. Let Solr use Lucene's SeacherManager Key: SOLR-5621 URL: https://issues.apache.org/jira/browse/SOLR-5621 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Fix For: 5.0 Attachments: SOLR-5621.patch
[jira] [Commented] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867246#comment-13867246 ] Yonik Seeley commented on SOLR-5541: The canonical way to do this in Solr is a StrUtils.splitSmart variant (the second one that doesn't do quotes, I imagine) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Key: SOLR-5541 URL: https://issues.apache.org/jira/browse/SOLR-5541 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.6 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch The QueryElevationComponent currently uses an xml file to map query strings to elevateIds and excludeIds. This ticket adds the ability to pass in elevateIds and excludeIds through two new http parameters, elevateIds and excludeIds. This will allow more sophisticated business logic to be used in selecting which ids to elevate/exclude. Proposed syntax: http://localhost:8983/solr/elevate?q=*:*&elevateIds=3,4&excludeIds=6,8 The elevateIds and excludeIds point to the unique document Id.
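The escape-aware comma split being discussed can be sketched standalone. This is not Solr's actual StrUtils.splitSmart (the variant Yonik references); the class and method names here are illustrative, but the behavior — a backslash escapes the separator so ids may contain literal commas — matches what Joel describes adopting.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an escape-aware separator split for id lists like elevateIds=3,4.
public class IdSplitter {
    public static List<String> split(String s, char sep) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                cur.append(s.charAt(++i)); // escaped char kept literally
            } else if (c == sep) {
                out.add(cur.toString());   // separator ends the current id
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        out.add(cur.toString());           // final id
        return out;
    }
}
```

With this, `elevateIds=doc\,1,doc2` yields the two ids `doc,1` and `doc2`.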
[jira] [Updated] (SOLR-5618) false query result cache hits possible when duplicate filter queries exist in one query -- discovered via: Reproducible failure from TestFiltering.testRandomFiltering
[ https://issues.apache.org/jira/browse/SOLR-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5618: --- Attachment: SOLR-5618.patch bq. Probably most efficient for small lists would be to make a copy of one list and then remove equivalent elements as they are found. Attached patch fixes the bug and adds some randomized testing on the QueryResultKey equality comparisons, ensuring that both the positive and negative situations are covered. (I'm still running full tests, but unless there are any objections I'll probably commit & backport to 4.6.1 ASAP) false query result cache hits possible when duplicate filter queries exist in one query -- discovered via: Reproducible failure from TestFiltering.testRandomFiltering -- Key: SOLR-5618 URL: https://issues.apache.org/jira/browse/SOLR-5618 Project: Solr Issue Type: Bug Affects Versions: 4.5, 4.5.1, 4.6 Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.6.1 Attachments: SOLR-5618.patch, SOLR-5618.patch, SOLR-5618.patch, SOLR-5618.patch, SOLR-5618.patch SOLR-5057 introduced a bug in queryResultCaching such that the following circumstances can result in a false cache hit... * identical main query in both requests * identical number of filter queries in both requests * a filter query from one request exists multiple times in the other request * sum of hashCodes for all filter queries is equal in both requests Details of how this problem was initially uncovered are listed below... Uwe's jenkins found this in java8... http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9004/consoleText {noformat} [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestFiltering -Dtests.method=testRandomFiltering -Dtests.seed=C22042E80957AE3E -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_LY -Dtests.timezone=Asia/Katmandu -Dtests.file.encoding=UTF-8 [junit4] FAILURE 16.9s J1 | TestFiltering.testRandomFiltering [junit4] Throwable #1: java.lang.AssertionError: FAILURE: iiter=11 qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 tag=t}, fq, {!frange v=val_i l=0 u=1}, fq, {! cost=92}-_query_:{!frange v=val_i l=1 u=1}, fq, {!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! cache=true tag=t}-_query_:{!frange v=val_i l=1 u=1}] [junit4] at __randomizedtesting.SeedInfo.seed([C22042E80957AE3E:DD43E12DEC70EE37]:0) [junit4] at org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:327) {noformat} The seed fails consistently for me on trunk using java7, and on 4x using both java7 and java6 - details to follow in comment.
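The copy-and-remove comparison Hoss quotes above can be sketched as a standalone multiset check. The names here are illustrative, not Solr's QueryResultKey code; the point is that equal hashCode sums (the buggy shortcut) are not enough — the filter-query lists must match element-for-element with the same multiplicities.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: compare two lists as multisets by copying one and removing
// elements as they are matched. Efficient enough for small fq lists.
public class UnorderedCompare {
    public static <T> boolean sameElements(List<T> a, List<T> b) {
        if (a.size() != b.size()) return false;
        List<T> remaining = new ArrayList<>(b);      // mutable copy of b
        for (T item : a) {
            if (!remaining.remove(item)) return false; // no unmatched copy left
        }
        return true; // every element matched with equal multiplicity
    }
}
```

A request with fq list [A, A] and one with [A, B] have the same length and can have equal hashCode sums, yet this check correctly rejects them.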
[jira] [Updated] (SOLR-5617) Default SolrResourceLoader restrictions may be too tight
[ https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-5617: Summary: Default SolrResourceLoader restrictions may be too tight (was: Default classloader restrictions may be too tight) Default SolrResourceLoader restrictions may be too tight Key: SOLR-5617 URL: https://issues.apache.org/jira/browse/SOLR-5617 Project: Solr Issue Type: Bug Affects Versions: 4.6 Reporter: Shawn Heisey Priority: Minor Labels: security Fix For: 5.0, 4.7 SOLR-4882 introduced restrictions for the Solr class loader that cause resources outside the instanceDir to fail to load. This is a very good goal, but what if you have common resources like included config files that are outside instanceDir but are still fully inside the solr home? I can understand not wanting to load resources from an arbitrary path, but the solr home and its children should be about as trustworthy as instanceDir. Ideally I'd like to have anything that's in $\{solr.solr.home\} trusted automatically. If I need to define a system property to make this happen, I'm OK with that -- as long as I don't have to turn off the safety checking entirely.
[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SeacherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867272#comment-13867272 ] Ryan McKinley commented on SOLR-5621: - bq. Refactoring code is a healthy and ongoing process in good open-source projects, like ours. Yes, there is short-term risk of instability, but over time this trades off for a stronger long-term design for Solr. +1 for trunk Let Solr use Lucene's SeacherManager Key: SOLR-5621 URL: https://issues.apache.org/jira/browse/SOLR-5621 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Fix For: 5.0 Attachments: SOLR-5621.patch
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867274#comment-13867274 ] Hoss Man commented on SOLR-5463: I haven't seen any negative feedback or suspicious jenkins failures, so unless someone sees a problem I'll start backporting tomorrow. Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Fix For: 5.0 Attachments: SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man__MissingStringLastComparatorSource.patch I'd like to revisit a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the Lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. This is similar to the cursor model I've seen in several other REST APIs that support pagination over large sets of results (notably the twitter API and its since_id param) except that we'll want something that works with arbitrary multi-level sort criteria that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was committed quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode {panel:title=Basic Usage} * send a request with {{sort=X&start=0&rows=N&cursorMark=*}} ** sort can be anything, but must include the uniqueKey field (as a tie breaker) ** N can be any number you want per page ** start must be 0 ** \* denotes you want to use a cursor starting at the beginning mark * parse the response body and extract the (String) {{nextCursorMark}} value * replace the \* value in your initial request params with the {{nextCursorMark}} value from the response in the subsequent request * repeat until the {{nextCursorMark}} value stops changing, or you have collected as many docs as you need {panel}
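The client loop in the Basic Usage panel can be sketched with a stubbed-in page fetch standing in for the real HTTP request (sort=X&start=0&rows=N&cursorMark=...). The `fetch` method, the `Page` type, and the offset-based cursor encoding are all hypothetical stand-ins; only the loop shape — pass `*` first, feed back `nextCursorMark`, stop when it stops changing — mirrors the proposed API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the cursorMark pagination loop against a fake 5-doc index.
public class CursorWalk {
    static final List<String> INDEX = List.of("a", "b", "c", "d", "e");

    record Page(List<String> docs, String nextCursorMark) {}

    // Stub fetch: our fake "cursorMark" just encodes the offset already seen.
    static Page fetch(String cursorMark, int rows) {
        int from = cursorMark.equals("*") ? 0 : Integer.parseInt(cursorMark);
        int to = Math.min(from + rows, INDEX.size());
        return new Page(INDEX.subList(from, to), String.valueOf(to));
    }

    public static List<String> collectAll(int rows) {
        List<String> all = new ArrayList<>();
        String mark = "*"; // '*' = start at the beginning mark
        while (true) {
            Page p = fetch(mark, rows);
            all.addAll(p.docs());
            if (p.nextCursorMark().equals(mark)) break; // mark stopped changing
            mark = p.nextCursorMark();                  // feed token back
        }
        return all;
    }
}
```

Unlike start/rows paging, each request carries only the opaque mark, so the server never has to skip over earlier pages.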
[jira] [Commented] (SOLR-5541) Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters
[ https://issues.apache.org/jira/browse/SOLR-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867279#comment-13867279 ] Joel Bernstein commented on SOLR-5541: -- Looks like splitSmart will give us \ escapes. I'll slide that in. Allow QueryElevationComponent to accept elevateIds and excludeIds as http parameters Key: SOLR-5541 URL: https://issues.apache.org/jira/browse/SOLR-5541 Project: Solr Issue Type: Improvement Components: SearchComponents - other Affects Versions: 4.6 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch, SOLR-5541.patch
[jira] [Commented] (SOLR-5617) Default SolrResourceLoader restrictions may be too tight
[ https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867287#comment-13867287 ] Uwe Schindler commented on SOLR-5617: - Hi Shawn, in fact the code was written exactly to support symbolic links! So your workaround is actually wanted. The idea of also using the Solr home directory is theoretically possible, if you were to extend SolrResourceLoader.getResource to also look in the parent ResourceLoader. There is already work done so that this may work in the future (if ResourceLoaders had the same parent-child relations as ClassLoaders), but currently it is not easily possible. There is also another elegant workaround: if the file is not in the config dir directly, SolrResourceLoader looks in the classpath (through the core's ClassLoader) and tries to find the file from there. So the easiest for you is to add the shared directory as an additional lib folder in the solrconfig.xml of all cores. You may need to pack the files as a JAR, but we can improve Solr here so that it also accepts non-jarred classpath components for lib directives. That's in fact the cleanest solution, also working on Windows without symlinks. Also this is easy for the user to understand: just add another lib / classes / whatever-name folder where your shared config files are. Default SolrResourceLoader restrictions may be too tight Key: SOLR-5617 URL: https://issues.apache.org/jira/browse/SOLR-5617 Project: Solr Issue Type: Bug Affects Versions: 4.6 Reporter: Shawn Heisey Priority: Minor Labels: security Fix For: 5.0, 4.7
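Uwe's lib-folder workaround amounts to a one-line addition to each core's solrconfig.xml. The directory layout below is an assumption (a `shared` folder holding a JAR of the common config files, one level above the instanceDir); the `<lib>` directive itself is standard solrconfig syntax.

```xml
<config>
  <!-- Assumed layout: shared config files packed as a JAR in ../shared,
       relative to this core's instanceDir. The lib directive puts it on the
       core's classpath, so SolrResourceLoader can resolve the files from
       there without relaxing the instanceDir restriction. -->
  <lib dir="../shared" regex=".*\.jar" />
  <!-- ... rest of the core's configuration ... -->
</config>
```

Each core pointing at the same folder gets the same shared resources, with no symlinks required.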
[jira] [Updated] (SOLR-5617) Default SolrResourceLoader restrictions may be too tight
[ https://issues.apache.org/jira/browse/SOLR-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-5617: Fix Version/s: (was: 4.7) Issue Type: Task (was: Bug) Default SolrResourceLoader restrictions may be too tight Key: SOLR-5617 URL: https://issues.apache.org/jira/browse/SOLR-5617 Project: Solr Issue Type: Task Affects Versions: 4.6 Reporter: Shawn Heisey Priority: Minor Labels: security Fix For: 5.0
[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SeacherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867291#comment-13867291 ] Chris Male commented on SOLR-5621: -- +1 for trunk Let Solr use Lucene's SeacherManager Key: SOLR-5621 URL: https://issues.apache.org/jira/browse/SOLR-5621 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Fix For: 5.0 Attachments: SOLR-5621.patch
[jira] [Commented] (SOLR-5618) false query result cache hits possible when duplicate filter queries exist in one query -- discovered via: Reproducible failure from TestFiltering.testRandomFiltering
[ https://issues.apache.org/jira/browse/SOLR-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867305#comment-13867305 ] ASF subversion and git services commented on SOLR-5618: --- Commit 1556988 from hoss...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1556988 ] SOLR-5618: Fix false cache hits in queryResultCache when hashCodes are equal and duplicate filter queries exist in one of the requests false query result cache hits possible when duplicate filter queries exist in one query -- discovered via: Reproducible failure from TestFiltering.testRandomFiltering -- Key: SOLR-5618 URL: https://issues.apache.org/jira/browse/SOLR-5618 Project: Solr Issue Type: Bug Affects Versions: 4.5, 4.5.1, 4.6 Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.6.1 Attachments: SOLR-5618.patch, SOLR-5618.patch, SOLR-5618.patch, SOLR-5618.patch, SOLR-5618.patch
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867316#comment-13867316 ] Joel Bernstein commented on SOLR-5463: -- This is a great feature. I think this should work automatically with the CollapsingQParserPlugin so there's some grouping support. I'll do some testing on this to confirm. Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Fix For: 5.0 Attachments: SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man__MissingStringLastComparatorSource.patch I'd like to revist a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. This is similar to the cursor model I've seen in several other REST APIs that support pagnation over a large sets of results (notable the twitter API and it's since_id param) except that we'll want something that works with arbitrary multi-level sort critera that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was commited quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). 
It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which:
* supports arbitrary field sorts in addition to sorting by score
* works in distributed mode
{panel:title=Basic Usage}
* send a request with {{sort=X&start=0&rows=N&cursorMark=*}}
** sort can be anything, but must include the uniqueKey field (as a tie breaker)
** N can be any number you want per page
** start must be 0
** \* denotes you want to use a cursor starting at the beginning mark
* parse the response body and extract the (String) {{nextCursorMark}} value
* replace the \* value in your initial request params with the {{nextCursorMark}} value from the response in the subsequent request
* repeat until the {{nextCursorMark}} value stops changing, or you have collected as many docs as you need
{panel}
-- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
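The fetch/extract/replace/repeat loop from the Basic Usage panel can be sketched as a client-side loop. This is a minimal, self-contained simulation, not real Solr client code: the page() method stands in for an HTTP request, and the DOCS list, method names, and the encoding of the cursor as the last doc id are illustrative assumptions (Solr's actual cursorMark is an opaque encoding of the last document's sort values).

```java
import java.util.ArrayList;
import java.util.List;

public class CursorPagingSketch {
    // Stand-in for the index: doc ids already in sort order (uniqueKey as tie breaker).
    static final List<Integer> DOCS = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

    // Simulates one Solr request: up to 'rows' docs after the cursor position.
    static List<Integer> page(String cursorMark, int rows) {
        int from = cursorMark.equals("*") ? 0 : DOCS.indexOf(Integer.parseInt(cursorMark)) + 1;
        return DOCS.subList(from, Math.min(from + rows, DOCS.size()));
    }

    // An empty page leaves the cursor unchanged, which tells the client to stop.
    static String nextCursorMark(String cursorMark, List<Integer> docs) {
        return docs.isEmpty() ? cursorMark : String.valueOf(docs.get(docs.size() - 1));
    }

    public static List<Integer> fetchAll(int rows) {
        List<Integer> all = new ArrayList<>();
        String cursor = "*";                 // '*' = start at the beginning mark
        while (true) {
            List<Integer> docs = page(cursor, rows);
            all.addAll(docs);
            String next = nextCursorMark(cursor, docs);
            if (next.equals(cursor)) break;  // cursor stopped changing: done
            cursor = next;                   // feed nextCursorMark into the next request
        }
        return all;
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(3));
    }
}
```

Note that when the total is an exact multiple of the page size, the client only learns it is done from one extra request whose cursor comes back unchanged, which mirrors the "repeat until the nextCursorMark value stops changing" rule above.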
[jira] [Commented] (SOLR-5618) false query result cache hits possible when duplicate filter queries exist in one query -- discovered via: Reproducible failure from TestFiltering.testRandomFiltering
[ https://issues.apache.org/jira/browse/SOLR-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867325#comment-13867325 ] ASF subversion and git services commented on SOLR-5618: --- Commit 1556996 from hoss...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1556996 ] SOLR-5618: Fix false cache hits in queryResultCache when hashCodes are equal and duplicate filter queries exist in one of the requests (merge r1556988) false query result cache hits possible when duplicate filter queries exist in one query -- discovered via: Reproducible failure from TestFiltering.testRandomFiltering -- Key: SOLR-5618 URL: https://issues.apache.org/jira/browse/SOLR-5618 Project: Solr Issue Type: Bug Affects Versions: 4.5, 4.5.1, 4.6 Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.6.1 Attachments: SOLR-5618.patch, SOLR-5618.patch, SOLR-5618.patch, SOLR-5618.patch, SOLR-5618.patch SOLR-5057 introduced a bug in queryResultCaching such that the following circumstances can result in a false cache hit...
* identical main query in both requests
* identical number of filter queries in both requests
* a filter query from one request exists multiple times in the other request
* the sum of hashCodes for all filter queries is equal in both requests
Details of how this problem was initially uncovered are listed below... Uwe's Jenkins found this in java8... http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9004/consoleText
{noformat}
[junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestFiltering -Dtests.method=testRandomFiltering -Dtests.seed=C22042E80957AE3E -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=ar_LY -Dtests.timezone=Asia/Katmandu -Dtests.file.encoding=UTF-8
[junit4] FAILURE 16.9s J1 | TestFiltering.testRandomFiltering
[junit4] Throwable #1: java.lang.AssertionError: FAILURE: iiter=11 qiter=336 request=[q, {!frange v=val_i l=0 u=1 cost=139 tag=t}, fq, {!frange v=val_i l=0 u=1}, fq, {! cost=92}-_query_:{!frange v=val_i l=1 u=1}, fq, {!frange v=val_i l=0 u=1 cache=true tag=t}, fq, {! cache=true tag=t}-_query_:{!frange v=val_i l=1 u=1}]
[junit4] at __randomizedtesting.SeedInfo.seed([C22042E80957AE3E:DD43E12DEC70EE37]:0)
[junit4] at org.apache.solr.search.TestFiltering.testRandomFiltering(TestFiltering.java:327)
{noformat}
The seed fails consistently for me on trunk using java7, and on 4x using both java7 and java6 - details to follow in comment.
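The four conditions above combine more easily than they might look, because distinct Java strings (and, by analogy, distinct Query objects) can share a hashCode. Below is a minimal sketch of the flaw, using strings as stand-ins for parsed filter queries and a deliberately naive cache-key comparison; the naiveKeyEquals method and "*:*" main query are illustrative assumptions, not Solr's actual queryResultCache key, and only show why a count-plus-hash-sum check over an unordered filter list is unsafe.

```java
import java.util.List;

public class HashSumCollisionSketch {
    // A naive cache-key check in the spirit of the bug: same main query,
    // same filter count, and same sum of filter hashCodes => treated as a hit.
    static boolean naiveKeyEquals(String q1, List<String> fqs1, String q2, List<String> fqs2) {
        return q1.equals(q2)
                && fqs1.size() == fqs2.size()
                && fqs1.stream().mapToInt(String::hashCode).sum()
                   == fqs2.stream().mapToInt(String::hashCode).sum();
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are the classic String.hashCode collision: both hash to 2112.
        List<String> fqs1 = List.of("Aa", "BB"); // two distinct filters
        List<String> fqs2 = List.of("Aa", "Aa"); // one filter, duplicated
        System.out.println("Aa".hashCode() == "BB".hashCode());       // true
        // False cache hit: the naive key matches even though the filters differ.
        System.out.println(naiveKeyEquals("*:*", fqs1, "*:*", fqs2)); // true
        System.out.println(fqs1.equals(fqs2));                        // false
    }
}
```

This is exactly the shape of the failing request in the log: the duplicated filter in one request balances the hash sum against a different filter in the other, so equality of the filter multisets must be checked directly.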
[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SearcherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867327#comment-13867327 ] Mark Miller commented on SOLR-5621: --- I don't think we can flat out reject refactoring or code contributions because they might destabilize code that's been around for a while. If we do that, Solr will not evolve properly. I sympathize with the idea that we don't want to add a lot of instability - I'm fighting that battle with SolrCloud while it's still in its hardening phase. However, it's an argument that can easily be carried too far. Honestly, I wouldn't be that happy to have such a big change in 5 that is not in 4 - it starts making development and back porting a major pain. But the sad fact is, this is exactly what 5.0 is for and all about. I have not read the patch, so I don't know if I am for or against this, but simply sharing code with Lucene adds to the contributors to the code, so flat out, there are certainly some advantages. Perhaps some disadvantages too, but without a doubt, advantages. Anyway, I think we need to judge this on the technical merits of the final patch. Let Solr use Lucene's SearcherManager Key: SOLR-5621 URL: https://issues.apache.org/jira/browse/SOLR-5621 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Fix For: 5.0 Attachments: SOLR-5621.patch It would be nice if Solr could take advantage of Lucene's SearcherManager and get rid of most of the logic related to managing Searchers in SolrCore. I've been taking a look at how possible it is to achieve this, and even if I haven't finished the changes (there are some use cases that are still not working exactly the same), it looks like it is possible to do. Some things could still use a lot of improvement (like the realtime searcher management) and some others are not yet implemented, like Searchers on deck or IndexReaderFactory. I'm attaching an initial patch (many TODOs yet).
[jira] [Commented] (SOLR-5621) Let Solr use Lucene's SearcherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867331#comment-13867331 ] Mark Miller commented on SOLR-5621: --- One way to make such a refactoring a bit more palatable IMO, is to add a lot to the testing around this rather than just relying on the existing tests...
[jira] [Comment Edited] (LUCENE-5391) uax29urlemailtokenizer - unexpected tokenisation of index2.php (and other inputs)
[ https://issues.apache.org/jira/browse/LUCENE-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867359#comment-13867359 ] Steve Rowe edited comment on LUCENE-5391 at 1/10/14 12:52 AM: -- I understand why index.php is not broken up: the URL rule matches index.ph, but the ALPHANUM rule has a longer match, so it wins. Conversely, ALPHANUM does not match index2.php (likely because the [number][period] sequence is not allowed), so the shorter URL match is tokenized. Another improperly broken-up filename-looking thing: index-h.php - the URL rule matches index-h.ph, but the ALPHANUM rule doesn't match (likely because of the hyphen). I think the fix here is to disallow URLs when there is no trailing port, path, query or fragment, and the following character is [-A-Za-z0-9] (allowable domain label characters). I'll make a patch. was (Author: steve_rowe): I understand why index.php is not broken up: the URL rule matches index.ph, but the ALPHANUM rule has a longer match, so it wins. Conversely, ALPHANUM does not match index2.php (likely because the {[number][period]} sequence is not allowed), so the shorter URL match is tokenized. Another improperly broken-up filename-looking thing: index-h.php - the URL rule matches index-h.ph, but the ALPHANUM rule doesn't match (likely because of the hyphen). I think the fix here is to disallow URLs when there is no trailing port, path, query or fragment, and the following character is [-A-Za-z0-9] (allowable domain label characters). I'll make a patch.
uax29urlemailtokenizer - unexpected tokenisation of index2.php (and other inputs) --- Key: LUCENE-5391 URL: https://issues.apache.org/jira/browse/LUCENE-5391 Project: Lucene - Core Issue Type: Bug Reporter: Chris Geeringh The uax29urlemailtokenizer tokenises index2.php as: URL index2.ph ALPHANUM p While it does not do the same for index.php Screenshot from analyser: http://postimg.org/image/aj6c98n3b/
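The longest-match reasoning in Steve's comment can be reproduced with a toy maximal-munch scanner. The two regexes below are illustrative stand-ins, not the real JFlex/UAX29 grammar: they only encode the two properties at issue, namely that ALPHANUM allows a period only after a letter, and that the URL rule here matches a bare hostname ending in a two-letter TLD such as ".ph". At each position every rule is tried and the longest match wins, just as in the generated tokenizer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaximalMunchSketch {
    // Toy stand-ins for the JFlex rules (NOT the real UAX29 grammar).
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        // ALPHANUM: letters/digits; '.' allowed only when preceded by a letter.
        RULES.put("ALPHANUM", Pattern.compile("[A-Za-z0-9]+(?:(?<=[A-Za-z])\\.[A-Za-z][A-Za-z0-9]*)*"));
        // URL: bare hostname ending in a two-letter TLD.
        RULES.put("URL", Pattern.compile("[A-Za-z0-9-]+\\.[A-Za-z]{2}"));
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> e : RULES.entrySet()) {
                Matcher m = e.getValue().matcher(text).region(pos, text.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = e.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) { pos++; continue; } // skip chars no rule matches
            tokens.add(bestRule + ":" + text.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("index.php"));   // ALPHANUM has the longer match and wins
        System.out.println(tokenize("index2.php"));  // ALPHANUM stops at the digit, URL wins
        System.out.println(tokenize("index-h.php")); // ALPHANUM stops at the hyphen, URL wins
    }
}
```

Under these toy rules, "index.php" comes out as one ALPHANUM token, while "index2.php" and "index-h.php" split into a URL token plus a stray ALPHANUM "p", which is exactly the misbehavior reported in the issue.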
[jira] [Commented] (LUCENE-5391) uax29urlemailtokenizer - unexpected tokenisation of index2.php (and other inputs)
[ https://issues.apache.org/jira/browse/LUCENE-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867359#comment-13867359 ] Steve Rowe commented on LUCENE-5391: I understand why index.php is not broken up: the URL rule matches index.ph, but the ALPHANUM rule has a longer match, so it wins. Conversely, ALPHANUM does not match index2.php (likely because the {[number][period]} sequence is not allowed), so the shorter URL match is tokenized. Another improperly broken-up filename-looking thing: index-h.php - the URL rule matches index-h.ph, but the ALPHANUM rule doesn't match (likely because of the hyphen). I think the fix here is to disallow URLs when there is no trailing port, path, query or fragment, and the following character is [-A-Za-z0-9] (allowable domain label characters). I'll make a patch.
[jira] [Commented] (SOLR-5618) false query result cache hits possible when duplicate filter queries exist in one query -- discovered via: Reproducible failure from TestFiltering.testRandomFiltering
[ https://issues.apache.org/jira/browse/SOLR-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867398#comment-13867398 ] ASF subversion and git services commented on SOLR-5618: --- Commit 1557008 from hoss...@apache.org in branch 'dev/branches/lucene_solr_4_6' [ https://svn.apache.org/r1557008 ] SOLR-5618: Fix false cache hits in queryResultCache when hashCodes are equal and duplicate filter queries exist in one of the requests (merge r1556988)
[jira] [Resolved] (SOLR-5618) false query result cache hits possible when duplicate filter queries exist in one query -- discovered via: Reproducible failure from TestFiltering.testRandomFiltering
[ https://issues.apache.org/jira/browse/SOLR-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-5618. Resolution: Fixed Fix Version/s: 4.7 5.0
[jira] [Commented] (LUCENE-5389) Even more doc for construction of TokenStream components
[ https://issues.apache.org/jira/browse/LUCENE-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867429#comment-13867429 ] Robert Muir commented on LUCENE-5389: - OK, I took a look. I had to make a fix for documentation-lint to pass: basically it didn't like the multiline \{@code} element you had for the code sample, because 'javadoc' would give an error that it couldn't find the closing brace. Maybe the \{@override} was messing it up. In general I've never used multiline \{@code} before... Anyway, I just made it consistent with other code samples by doing this:
{code}
<pre class="prettyprint">
public class ForwardingTokenizer extends Tokenizer {
  private Tokenizer delegate;
  ...
  {@literal @Override}
  public void reset() {
    super.reset();
    delegate.setReader(this.input);
    delegate.reset();
  }
}
</pre>
{code}
The class="prettyprint" gives colored syntax highlighting in the javadocs, and the override is escaped with literal. At least that's the way the others are done. I'm committing this. Do you want to make a patch to trunk-only to update the 5.x docs with respect to LUCENE-5388? Stuff like (A future release of Apache Lucene may remove the reader parameters from the Tokenizer constructors.) Thanks! Even more doc for construction of TokenStream components Key: LUCENE-5389 URL: https://issues.apache.org/jira/browse/LUCENE-5389 Project: Lucene - Core Issue Type: Improvement Reporter: Benson Margulies There are more useful things to tell would-be authors of tokenizers. Let's tell them.
[jira] [Commented] (LUCENE-5389) Even more doc for construction of TokenStream components
[ https://issues.apache.org/jira/browse/LUCENE-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867430#comment-13867430 ] ASF subversion and git services commented on LUCENE-5389: - Commit 1557010 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1557010 ] LUCENE-5389: Add more guidance in the analyis documentation package overview (closes #14)
lucene-solr pull request: LUCENE-5389: more analysis advice.
Github user benson-basis closed the pull request at: https://github.com/apache/lucene-solr/pull/14
[jira] [Resolved] (LUCENE-5389) Even more doc for construction of TokenStream components
[ https://issues.apache.org/jira/browse/LUCENE-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5389. - Resolution: Fixed Fix Version/s: 4.7 5.0
[jira] [Commented] (LUCENE-5389) Even more doc for construction of TokenStream components
[ https://issues.apache.org/jira/browse/LUCENE-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867431#comment-13867431 ] ASF subversion and git services commented on LUCENE-5389: - Commit 1557011 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1557011 ] LUCENE-5389: Add more guidance in the analyis documentation package overview (closes #14)