[jira] [Updated] (SOLR-4619) Improve PreAnalyzedField query analysis
[ https://issues.apache.org/jira/browse/SOLR-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-4619: - Attachment: SOLR-4619.patch Updated patch, fixes PreAnalyzedFieldTest.testInvalidJson() to properly initialize its PreAnalyzedField, and to call reset() on the token streams created with the invalid JSON snippets, so that the exceptions triggered by the invalid JSON and stored during token stream creation are appropriately thrown. Also tests that PreAnalyzedAnalyzer can be reused with valid input after having been fed invalid input. > Improve PreAnalyzedField query analysis > --- > > Key: SOLR-4619 > URL: https://issues.apache.org/jira/browse/SOLR-4619 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.0, 4.1, 4.2, 4.2.1, Trunk >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki > Fix For: Trunk > > Attachments: SOLR-4619.patch, SOLR-4619.patch, SOLR-4619.patch, > SOLR-4619.patch > > > PreAnalyzed field extends plain FieldType and mistakenly uses the > DefaultAnalyzer as query analyzer, and doesn't allow for customization via > schema elements. > Instead it should extend TextField and support all query analysis supported > by that type. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
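The behaviour exercised by the updated test (a failure stored while the stream is created, thrown only when reset() is called, and the same instance reused afterwards with valid input) can be sketched without Lucene on the classpath. This is a minimal illustration under stated assumptions, not the patch itself: SketchTokenizer, its '!' marker for invalid input, and its next() method are hypothetical stand-ins for PreAnalyzedTokenizer and its JSON parsing.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the pattern; these are not Lucene or Solr classes. */
final class DeferredFailureSketch {

  /** Toy tokenizer: whitespace-separated tokens; a leading '!' marks the input invalid. */
  static final class SketchTokenizer {
    private final List<String> tokens = new ArrayList<>();
    private IOException stored; // failure remembered from setReader()
    private int pos;

    /** Models a setReader() that cannot throw: parse eagerly, remember any failure. */
    void setReader(Reader input) {
      tokens.clear(); // drop state left over from earlier input
      stored = null;
      pos = 0;
      try {
        decodeInput(input);
      } catch (IOException e) {
        stored = e; // surfaced later by reset()
      }
    }

    /** Consumes the whole reader and fills the token list. */
    private void decodeInput(Reader input) throws IOException {
      StringBuilder sb = new StringBuilder();
      for (int c = input.read(); c != -1; c = input.read()) {
        sb.append((char) c);
      }
      String text = sb.toString().trim();
      if (text.startsWith("!")) {
        throw new IOException("invalid input: " + text);
      }
      if (!text.isEmpty()) {
        for (String t : text.split("\\s+")) {
          tokens.add(t);
        }
      }
    }

    /** Throws the failure stored by setReader(), if there is one. */
    void reset() throws IOException {
      pos = 0;
      if (stored != null) {
        IOException e = stored;
        stored = null;
        throw e;
      }
    }

    /** Returns the next token, or null when the input is exhausted. */
    String next() {
      return pos < tokens.size() ? tokens.get(pos++) : null;
    }
  }

  public static void main(String[] args) throws IOException {
    SketchTokenizer tk = new SketchTokenizer();
    tk.setReader(new StringReader("!broken")); // no exception at this point
    try {
      tk.reset();
    } catch (IOException expected) {
      System.out.println("reset() threw: " + expected.getMessage());
    }
    tk.setReader(new StringReader("hello world")); // same instance, valid input
    tk.reset();
    System.out.println(tk.next() + " " + tk.next());
  }
}
```

Clearing the stored failure and the token list in setReader() is what lets the instance accept valid input after invalid input in this sketch. How the real patch achieves that is described only in the comments on this issue.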
[jira] [Updated] (SOLR-4619) Improve PreAnalyzedField query analysis
[ https://issues.apache.org/jira/browse/SOLR-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-4619: - Attachment: SOLR-4619.patch

{quote}
bq. A new analyzer class employing PreAnalyzedTokenizer could override initReader() or setReader().

I'll try with setReader(), since the docs for initReader() are focused on reader conditioning via char filters.

I was referring to TokenStreamComponents.setReader() here, which is called as part of Analyzer.tokenStream(): A subclass created in the new analyzer's overridden createComponents() could call a new method on PreAnalyzedTokenizer to consume the input reader and in so doing provide the attributes.
{quote}

Patch implementing the idea, splitting reader consumption out from reset() into its own method: decodeInput(). This method first removes all attributes from PreAnalyzedTokenizer's AttributeSource, then adds needed ones as a side effect of parsing the input.

There is a kludge here: because TokenStreamComponents.setReader() doesn't throw an exception, PreAnalyzedAnalyzer overrides createComponents() to create a TokenStreamComponents instance that catches and stores exceptions encountered during reader consumption with the stream's PreAnalyzedTokenizer instance, whose reset() method will then throw the stored exception, if any.

With this patch, PreAnalyzedAnalyzer can be reused; previously PreAnalyzedTokenizer reuse would ignore new input and re-emit tokens deserialized from the initial input.

With this patch, PreAnalyzedField analysis works like this:

# If a query analyzer is specified in the schema then it will be used at query time.
# If an analyzer is specified in the schema with no type (i.e., it is neither of "index" nor "query" type), then this analyzer will be used for query parsing, but will be ignored at index time.
# If only an analyzer of "index" type is specified in the schema, then this analyzer will be used for query parsing, but will be ignored at index time.

This patch adds a new method removeAllAttributes() to AttributeSource, to support reuse of token streams with variable attributes, like PreAnalyzedTokenizer. I think it's ready to go.
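The three query-time rules listed in this comment can be read as a simple fallback chain. The sketch below is one way to express that reading. Everything in it is an assumption made for illustration: analyzers are represented by plain strings, the class and method names are hypothetical, the precedence among the three rules when more than one analyzer is configured is assumed, and the final fallback to the pre-analyzed analyzer when nothing is configured is taken from the earlier comment on this issue.

```java
/** Hypothetical model of the selection rules; not Solr's actual FieldType code. */
final class AnalyzerChoiceSketch {

  /** Placeholder name for the analyzer that deserializes pre-analyzed content. */
  static final String PRE_ANALYZED = "PreAnalyzedAnalyzer";

  /** Index time: any schema analyzer is ignored, since field content arrives already analyzed. */
  static String indexAnalyzer(String untyped, String indexTyped, String queryTyped) {
    return PRE_ANALYZED;
  }

  /** Query time: arguments are null when the schema does not declare that analyzer. */
  static String queryAnalyzer(String untyped, String indexTyped, String queryTyped) {
    if (queryTyped != null) {
      return queryTyped; // rule 1: an explicit query analyzer is used
    }
    if (untyped != null) {
      return untyped; // rule 2: an analyzer with no type is used for query parsing
    }
    if (indexTyped != null) {
      return indexTyped; // rule 3: an index-only analyzer is used for query parsing
    }
    return PRE_ANALYZED; // assumption: nothing configured falls back to the pre-analyzed analyzer
  }

  public static void main(String[] args) {
    System.out.println(queryAnalyzer(null, null, "myQueryAnalyzer"));
    System.out.println(queryAnalyzer("myUntypedAnalyzer", null, null));
    System.out.println(queryAnalyzer(null, "myIndexAnalyzer", null));
    System.out.println(queryAnalyzer(null, null, null));
    System.out.println(indexAnalyzer("myUntypedAnalyzer", null, null));
  }
}
```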
[jira] [Updated] (SOLR-4619) Improve PreAnalyzedField query analysis
[ https://issues.apache.org/jira/browse/SOLR-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-4619: - Attachment: SOLR-4619.patch

Patch that brings Andrzej's patch up to date with trunk, and adds tests for query-time functionality.

I had assumed that {{PreAnalyzedField}}-s would use the {{PreAnalyzedTokenizer}} at query time, but that is not (currently) the case: instead {{FieldType.DefaultAnalyzer}} is used. This patch changes the behavior when no analyzer is specified to instead use {{PreAnalyzedTokenizer}}.

However, there is a chicken-and-egg interaction between {{PreAnalyzedTokenizer}} and {{QueryBuilder.createFieldQuery()}}, which aborts before performing any tokenization if the supplied analyzer's attribute factory doesn't contain a {{TermToBytesRefAttribute}}. But {{PreAnalyzedTokenizer}} doesn't have any attributes defined until the input stream is consumed, in {{reset()}}.

[~rcmuir] added a comment as part of LUCENE-5388 to {{PreAnalyzedTokenizer}}'s ctor, where {{AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY}} is set as the attribute factory rather than the default packed implementation: "we don't pack attributes: since we are used for (de)serialization and dont want bloat."

This patch moves the {{stream.reset()}} call in {{QueryBuilder.createFieldQuery()}} in front of the {{TermToBytesRefAttribute}} check, so that {{PreAnalyzedTokenizer}} (and other tokenizers that don't have a pre-added set of attributes) have their attributes defined before the check runs, and also moves the {{addAttribute(PositionIncrementAttribute.class)}} call to after the {{TermToBytesRefAttribute}} check.

An alternate approach to fix the chicken-and-egg problem might be to have {{PreAnalyzedTokenizer}} always include a dummy {{TermToBytesRefAttribute}} implementation, and then remove it when {{reset()}} is called, but that seems hackish.

I haven't run the full tests yet with this patch, but the included query-time {{PreAnalyzedField}} tests succeed. I welcome feedback.
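The ordering problem described in this comment can be illustrated with a toy model. The sketch is a simplification under stated assumptions: LazyStream, buildCheckFirst() and buildResetFirst() are hypothetical names, attributes are modelled as strings in a set, and nothing here reproduces the real {{QueryBuilder.createFieldQuery()}} logic beyond the relative order of the reset and the attribute check.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the ordering issue; not Lucene's QueryBuilder. */
final class ResetOrderSketch {

  static final String TERM_ATTRIBUTE = "TermToBytesRefAttribute";

  /** Toy stream whose attributes only exist once reset() has consumed the input. */
  static final class LazyStream {
    private final Set<String> attributes = new HashSet<>();

    /** Consuming the input is what defines the attributes. */
    void reset() {
      attributes.add(TERM_ATTRIBUTE);
    }

    boolean hasAttribute(String name) {
      return attributes.contains(name);
    }
  }

  /** Check before reset: a lazily populated stream is rejected before any tokenization. */
  static boolean buildCheckFirst(LazyStream stream) {
    if (!stream.hasAttribute(TERM_ATTRIBUTE)) {
      return false; // aborts, the stream is never reset
    }
    stream.reset();
    return true;
  }

  /** Reset before check: the stream gets the chance to define its attributes first. */
  static boolean buildResetFirst(LazyStream stream) {
    stream.reset();
    if (!stream.hasAttribute(TERM_ATTRIBUTE)) {
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println("check first: " + buildCheckFirst(new LazyStream()));
    System.out.println("reset first: " + buildResetFirst(new LazyStream()));
  }
}
```

A stream that registers its attributes in its constructor would pass either ordering in this model, which is why only tokenizers without a pre-added set of attributes are affected.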
[jira] [Updated] (SOLR-4619) Improve PreAnalyzedField query analysis
[ https://issues.apache.org/jira/browse/SOLR-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated SOLR-4619: Attachment: SOLR-4619.patch Improve PreAnalyzedField query analysis --- Key: SOLR-4619 URL: https://issues.apache.org/jira/browse/SOLR-4619 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 4.0, 4.1, 4.2, 5.0, 4.2.1 Reporter: Andrzej Bialecki Assignee: Andrzej Bialecki Fix For: 5.0, 4.2.1 Attachments: SOLR-4619.patch PreAnalyzed field extends plain FieldType and mistakenly uses the DefaultAnalyzer as query analyzer, and doesn't allow for customization via analyzer schema elements. Instead it should extend TextField and support all query analysis supported by that type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4619) Improve PreAnalyzedField query analysis
[ https://issues.apache.org/jira/browse/SOLR-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated SOLR-4619: Fix Version/s: (was: 4.2.1) Removing 4.2.1 from Fix version - this apparently needs more discussion.