[jira] [Updated] (SOLR-4619) Improve PreAnalyzedField query analysis

2016-01-19 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-4619:
-
Attachment: SOLR-4619.patch

Updated patch, fixes PreAnalyzedFieldTest.testInvalidJson() to properly 
initialize its PreAnalyzedField, and to call reset() on the token streams 
created with the invalid JSON snippets, so that the exceptions triggered by the 
invalid JSON and stored during token stream creation are appropriately thrown.  
Also tests that PreAnalyzedAnalyzer can be reused with valid input after having 
been fed invalid input. 

> Improve PreAnalyzedField query analysis
> ---
>
> Key: SOLR-4619
> URL: https://issues.apache.org/jira/browse/SOLR-4619
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.0, 4.1, 4.2, 4.2.1, Trunk
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
> Fix For: Trunk
>
> Attachments: SOLR-4619.patch, SOLR-4619.patch, SOLR-4619.patch, 
> SOLR-4619.patch
>
>
> PreAnalyzed field extends plain FieldType and mistakenly uses the 
> DefaultAnalyzer as query analyzer, and doesn't allow for customization via 
> analyzer schema elements.
> Instead it should extend TextField and support all query analysis supported 
> by that type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4619) Improve PreAnalyzedField query analysis

2016-01-19 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-4619:
-
Attachment: SOLR-4619.patch

{quote}
bq. A new analyzer class employing PreAnalyzedTokenizer could override 
initReader() or setReader(). I'll try with setReader(), since the docs for 
initReader() are focused on reader conditioning via char filters.

I was referring to TokenStreamComponents.setReader() here, which is called as 
part of Analyzer.tokenStream(): A subclass created in the new analyzer's 
overridden createComponents() could call a new method on PreAnalyzedTokenizer 
to consume the input reader and in so doing provide the attributes.
{quote}

Patch implementing the idea, splitting reader consumption out from reset() into 
its own method: decodeInput().  This method first removes all attributes from 
PreAnalyzedTokenizer's AttributeSource, then adds needed ones as a side effect 
of parsing the input.

There is a kludge here: because TokenStreamComponents.setReader() doesn't throw 
an exception, PreAnalyzedAnalyzer overrides createComponents() to create a 
TokenStreamComponents instance that catches and stores exceptions encountered 
during reader consumption with the stream's PreAnalyzedTokenizer instance, 
whose reset() method will then throw the stored exception, if any.
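
The exception-deferral kludge can be sketched in plain Java without Lucene on the classpath. Everything below is a hypothetical stand-in rather than the patch's code: SketchTokenizer models PreAnalyzedTokenizer, SketchComponents models the TokenStreamComponents subclass, and the "parser" is a toy check that only accepts input starting with an opening brace.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for PreAnalyzedTokenizer: input is decoded eagerly,
// but a decoding failure is only reported later, from reset().
class SketchTokenizer {
    private IOException stored;   // failure captured while consuming the reader
    private String decoded;       // result of the last successful decode

    // Consumes the whole reader; may throw, like a real parser would.
    void decodeInput(Reader reader) throws IOException {
        decoded = null;
        StringBuilder sb = new StringBuilder();
        for (int c = reader.read(); c != -1; c = reader.read()) {
            sb.append((char) c);
        }
        String text = sb.toString();
        if (!text.startsWith("{")) {   // toy validity check, not real JSON parsing
            throw new IOException("invalid input: " + text);
        }
        decoded = text;
    }

    void setStoredException(IOException e) { stored = e; }

    // Rethrows the failure captured during reader consumption, if any.
    void reset() throws IOException {
        if (stored != null) {
            IOException e = stored;
            stored = null;
            throw e;
        }
    }

    String decoded() { return decoded; }
}

// Hypothetical stand-in for the TokenStreamComponents subclass:
// setReader() cannot throw, so it parks the exception on the tokenizer.
class SketchComponents {
    final SketchTokenizer tokenizer = new SketchTokenizer();

    void setReader(Reader reader) {
        tokenizer.setStoredException(null);   // forget failures from earlier input
        try {
            tokenizer.decodeInput(reader);
        } catch (IOException e) {
            tokenizer.setStoredException(e);
        }
    }
}

class DeferredExceptionDemo {
    public static void main(String[] args) throws IOException {
        SketchComponents components = new SketchComponents();
        components.setReader(new StringReader("not json"));   // no exception yet
        try {
            components.tokenizer.reset();
        } catch (IOException e) {
            System.out.println("reset() threw: " + e.getMessage());
        }
        components.setReader(new StringReader("{\"v\":\"1\"}"));  // reuse with valid input
        components.tokenizer.reset();
        System.out.println("decoded: " + components.tokenizer.decoded());
    }
}
```

The point of the sketch is that the same components instance survives a bad input: the failure surfaces from reset(), and the next setReader() call starts from a clean slate.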

With this patch, PreAnalyzedAnalyzer can be reused; previously 
PreAnalyzedTokenizer reuse would ignore new input and re-emit tokens 
deserialized from the initial input.

With this patch, PreAnalyzedField analysis works like this: 
# If a query analyzer is specified in the schema then it will be used at query 
time.
# If an analyzer is specified in the schema with no type (i.e., it is neither 
of "index" nor "query" type), then this analyzer will be used for query 
parsing, but will be ignored at index time.
# If only an analyzer of "index" type is specified in the schema, then this 
analyzer will be used for query parsing, but will be ignored at index time.
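
The three rules above can be written out as a small decision function. This is a plain-Java sketch, not Solr code: chooseQueryAnalyzer and its parameter names are hypothetical, analyzers are represented by plain strings, and the fallback for a schema that declares no analyzer at all is an assumption, since the list above does not cover that case.

```java
class QueryAnalyzerChoice {
    // Each argument is the analyzer declared in the schema for that slot,
    // or null when the schema declares none. All names are hypothetical.
    static String chooseQueryAnalyzer(String queryTyped, String untyped,
                                      String indexTyped, String fallback) {
        if (queryTyped != null) {
            return queryTyped;   // rule 1: an explicit query analyzer wins
        }
        if (untyped != null) {
            return untyped;      // rule 2: untyped analyzer, query side only
        }
        if (indexTyped != null) {
            return indexTyped;   // rule 3: index-typed analyzer, query side only
        }
        return fallback;         // assumption: case not covered by the list
    }

    public static void main(String[] args) {
        System.out.println(chooseQueryAnalyzer("q", null, "i", "default"));
        System.out.println(chooseQueryAnalyzer(null, "u", null, "default"));
        System.out.println(chooseQueryAnalyzer(null, null, "i", "default"));
    }
}
```

In all three cases the schema analyzer only affects query parsing; index-time input is expected to carry its own token stream.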

This patch adds a new method removeAllAttributes() to AttributeSource, to 
support reuse of token streams with variable attributes, like 
PreAnalyzedTokenizer.
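
To illustrate why clearing attributes matters for reuse, here is a plain-Java model. It is a sketch under loud assumptions: SketchAttributeSource and SketchReusableStream are hypothetical classes, attributes are reduced to names in a set, and the "input" is just a comma-separated list of the attribute names it declares.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, simplified model of an attribute source.
class SketchAttributeSource {
    private final Set<String> attributes = new LinkedHashSet<>();

    void addAttribute(String name) { attributes.add(name); }

    boolean hasAttribute(String name) { return attributes.contains(name); }

    // Models the new method: drop everything so the next input starts clean.
    void removeAllAttributes() { attributes.clear(); }
}

// Hypothetical stream whose attribute set depends on each input.
class SketchReusableStream extends SketchAttributeSource {
    // Toy input format: comma-separated attribute names declared by the input.
    void decodeInput(String declared) {
        removeAllAttributes();   // without this, stale attributes would linger
        for (String name : declared.split(",")) {
            addAttribute(name.trim());
        }
    }
}

class AttributeReuseDemo {
    public static void main(String[] args) {
        SketchReusableStream stream = new SketchReusableStream();
        stream.decodeInput("term, offset");
        System.out.println("offset after first input: " + stream.hasAttribute("offset"));
        stream.decodeInput("term");
        System.out.println("offset after second input: " + stream.hasAttribute("offset"));
    }
}
```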

I think it's ready to go.

> Improve PreAnalyzedField query analysis
> ---
>
> Key: SOLR-4619
> URL: https://issues.apache.org/jira/browse/SOLR-4619
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.0, 4.1, 4.2, 4.2.1, Trunk
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
> Fix For: Trunk
>
> Attachments: SOLR-4619.patch, SOLR-4619.patch, SOLR-4619.patch
>
>
> PreAnalyzed field extends plain FieldType and mistakenly uses the 
> DefaultAnalyzer as query analyzer, and doesn't allow for customization via 
> analyzer schema elements.
> Instead it should extend TextField and support all query analysis supported 
> by that type.






[jira] [Updated] (SOLR-4619) Improve PreAnalyzedField query analysis

2016-01-13 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-4619:
-
Attachment: SOLR-4619.patch

Patch that brings Andrzej's patch up to date with trunk, and adds tests for 
query-time functionality.

I had assumed that {{PreAnalyzedField}}-s would use the 
{{PreAnalyzedTokenizer}} at query time, but that is not (currently) the case: 
instead {{FieldType.DefaultAnalyzer}} is used.  This patch changes the behavior 
when no analyzer is specified to instead use {{PreAnalyzedTokenizer}}.

However, there is a chicken-and-egg interaction between 
{{PreAnalyzedTokenizer}} and {{QueryBuilder.createFieldQuery()}}, which aborts 
before performing any tokenization if the supplied analyzer's attribute factory 
doesn't contain a {{TermToBytesRefAttribute}}.  But {{PreAnalyzedTokenizer}} 
doesn't have any attributes defined until the input stream is consumed, in 
{{reset()}}. [~rcmuir] added a comment as part of LUCENE-5388 to 
{{PreAnalyzedTokenizer}}'s ctor, where 
{{AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY}} is set as the attribute factory 
rather than the default packed implementation: "we don't pack attributes: since 
we are used for (de)serialization and dont want bloat."

This patch moves the {{stream.reset()}} call in 
{{QueryBuilder.createFieldQuery()}} in front of the {{TermToBytesRefAttribute}} 
check, so that {{PreAnalyzedTokenizer}} (and other tokenizers that don't have a 
pre-added set of attributes) can define their attributes before the check runs.  
It also moves the {{addAttribute(PositionIncrementAttribute.class)}} call to 
after the {{TermToBytesRefAttribute}} check.
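
The effect of the reordering can be shown with a plain-Java model. This is only a sketch: LazyStream and QueryBuilderSketch are hypothetical stand-ins, attributes are modeled as names in a set, and neither method reproduces the real {{QueryBuilder.createFieldQuery()}} logic beyond the order of the two steps.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stream that, like PreAnalyzedTokenizer before this patch,
// has no attributes until reset() has consumed its input.
class LazyStream {
    private final Set<String> attributes = new HashSet<>();

    boolean hasAttribute(String name) { return attributes.contains(name); }

    void reset() { attributes.add("TermToBytesRefAttribute"); }
}

class QueryBuilderSketch {
    // Old order: check first, reset later. Returns false ("abort") for lazy streams.
    static boolean checkThenReset(LazyStream stream) {
        if (!stream.hasAttribute("TermToBytesRefAttribute")) {
            return false;
        }
        stream.reset();
        return true;
    }

    // New order: reset first, so lazily defined attributes exist when checked.
    static boolean resetThenCheck(LazyStream stream) {
        stream.reset();
        return stream.hasAttribute("TermToBytesRefAttribute");
    }

    public static void main(String[] args) {
        System.out.println("old order proceeds: " + checkThenReset(new LazyStream()));
        System.out.println("new order proceeds: " + resetThenCheck(new LazyStream()));
    }
}
```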

An alternate approach to fix the chicken-and-egg problem might be to have 
{{PreAnalyzedTokenizer}} always include a dummy {{TermToBytesRefAttribute}} 
implementation, and then remove it when {{reset()}} is called, but that seems 
hackish.

I haven't run the full tests yet with this patch, but the included query-time 
{{PreAnalyzedField}} tests succeed.

I welcome feedback.

> Improve PreAnalyzedField query analysis
> ---
>
> Key: SOLR-4619
> URL: https://issues.apache.org/jira/browse/SOLR-4619
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.0, 4.1, 4.2, 4.2.1, Trunk
>Reporter: Andrzej Bialecki 
>Assignee: Andrzej Bialecki 
> Fix For: Trunk
>
> Attachments: SOLR-4619.patch, SOLR-4619.patch
>
>
> PreAnalyzed field extends plain FieldType and mistakenly uses the 
> DefaultAnalyzer as query analyzer, and doesn't allow for customization via 
> analyzer schema elements.
> Instead it should extend TextField and support all query analysis supported 
> by that type.






[jira] [Updated] (SOLR-4619) Improve PreAnalyzedField query analysis

2013-03-20 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-4619:


Attachment: SOLR-4619.patch

 Improve PreAnalyzedField query analysis
 ---

 Key: SOLR-4619
 URL: https://issues.apache.org/jira/browse/SOLR-4619
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 4.0, 4.1, 4.2, 5.0, 4.2.1
Reporter: Andrzej Bialecki 
Assignee: Andrzej Bialecki 
 Fix For: 5.0, 4.2.1

 Attachments: SOLR-4619.patch


 PreAnalyzed field extends plain FieldType and mistakenly uses the 
 DefaultAnalyzer as query analyzer, and doesn't allow for customization via 
 analyzer schema elements.
 Instead it should extend TextField and support all query analysis supported 
 by that type.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (SOLR-4619) Improve PreAnalyzedField query analysis

2013-03-20 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-4619:


Fix Version/s: (was: 4.2.1)

Removing 4.2.1 from Fix version - this apparently needs more discussion.

 Improve PreAnalyzedField query analysis
 ---

 Key: SOLR-4619
 URL: https://issues.apache.org/jira/browse/SOLR-4619
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 4.0, 4.1, 4.2, 5.0, 4.2.1
Reporter: Andrzej Bialecki 
Assignee: Andrzej Bialecki 
 Fix For: 5.0

 Attachments: SOLR-4619.patch


 PreAnalyzed field extends plain FieldType and mistakenly uses the 
 DefaultAnalyzer as query analyzer, and doesn't allow for customization via 
 analyzer schema elements.
 Instead it should extend TextField and support all query analysis supported 
 by that type.
