PreAnalyzed field analyzer
--------------------------
Key: SOLR-1020
URL: https://issues.apache.org/jira/browse/SOLR-1020
Project: Solr
Issue Type: New Feature
Components: Analysis
Affects Versions: 1.3
Reporter: Karl Wettin
Priority: Minor
An Analyzer that produces a TokenStream from XML input containing a
marshalled TokenStream. Also contains a static TokenStream XML marshaller.
I put this together without testing it in a real environment, in order to get
some comments on the solution before I add it to my project. So consider it a
beta patch.
It uses the JSR 173 XML streaming API (StAX), which is included in Java 1.6.
An implementation compatible with Java 1.5 can be downloaded from
https://sjsxp.dev.java.net/
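As a rough illustration of the marshalling side, the sketch below writes tokens with the StAX XMLStreamWriter. It is not taken from the patch: TokenXmlWriter and SimpleToken are hypothetical names, and a plain holder class stands in for Lucene's Token so the example runs without Lucene on the classpath. The element names and their order follow the XSD in this issue.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch, not the class from the patch.
final class TokenXmlWriter {

    // Plain holder standing in for a Lucene Token (assumption).
    static final class SimpleToken {
        final int positionIncrement;
        final String term;
        final String type;
        final int startOffset;
        final int endOffset;
        final int flags;
        final String payloadHex;

        SimpleToken(int positionIncrement, String term, String type,
                    int startOffset, int endOffset, int flags, String payloadHex) {
            this.positionIncrement = positionIncrement;
            this.term = term;
            this.type = type;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.flags = flags;
            this.payloadHex = payloadHex;
        }
    }

    private TokenXmlWriter() {}

    // Serializes the tokens to a <tokens> document, in XSD element order.
    static String marshal(List<SimpleToken> tokens) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("tokens");
            for (SimpleToken t : tokens) {
                w.writeStartElement("token");
                writeLeaf(w, "positionIncrement", Integer.toString(t.positionIncrement));
                writeLeaf(w, "term", t.term);
                writeLeaf(w, "type", t.type);
                writeLeaf(w, "startOffset", Integer.toString(t.startOffset));
                writeLeaf(w, "endOffset", Integer.toString(t.endOffset));
                writeLeaf(w, "flags", Integer.toString(t.flags));
                w.writeStartElement("payload");
                writeLeaf(w, "hex", t.payloadHex);
                w.writeEndElement(); // </payload>
                w.writeEndElement(); // </token>
            }
            w.writeEndElement(); // </tokens>
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not marshal tokens", e);
        }
        return out.toString();
    }

    // Writes <name>text</name>; writeCharacters escapes '&' and '<'.
    private static void writeLeaf(XMLStreamWriter w, String name, String text)
            throws XMLStreamException {
        w.writeStartElement(name);
        w.writeCharacters(text);
        w.writeEndElement();
    }
}
```

Using a streaming writer rather than string concatenation leaves escaping of characters such as & and < in term text to the XML library.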
XSD:
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="tokens" type="tokensType"/>
  <xs:complexType name="tokensType">
    <xs:sequence>
      <xs:element type="tokenType" name="token" minOccurs="0"
                  maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="tokenType">
    <xs:sequence>
      <xs:element type="xs:int" name="positionIncrement" maxOccurs="1"/>
      <xs:element type="xs:string" name="term" minOccurs="1"
                  maxOccurs="1"/>
      <xs:element type="xs:string" name="type" maxOccurs="1"/>
      <xs:element type="xs:int" name="startOffset" maxOccurs="1"/>
      <xs:element type="xs:int" name="endOffset" maxOccurs="1"/>
      <xs:element type="xs:int" name="flags" maxOccurs="1"/>
      <xs:element type="payloadType" name="payload" maxOccurs="1"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="payloadType">
    <xs:choice maxOccurs="1" minOccurs="1">
      <xs:element type="bytesType" name="bytes"/>
      <xs:element type="xs:string" name="hex"/>
      <xs:element type="xs:string" name="base64"/>
    </xs:choice>
  </xs:complexType>
  <xs:complexType name="bytesType">
    <xs:sequence>
      <xs:element type="xs:byte" name="byte" maxOccurs="unbounded"
                  minOccurs="1"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
{code}
Although the XSD defines several ways to encode a payload, only <hex> is
currently supported.
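To make the <hex> variant concrete, here is a minimal sketch of converting between a hex string and payload bytes. HexPayload is a hypothetical helper, not a class from the patch, and it assumes two hex digits per byte with no separators.

```java
// Hypothetical helper, not part of the patch.
final class HexPayload {

    private HexPayload() {}

    // Decodes a hex string such as "fffefd" into raw payload bytes.
    static byte[] decode(String hex) {
        String s = hex.trim();
        if (s.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits: " + s);
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex string: " + s);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // Encodes payload bytes as lower-case hex, two digits per byte.
    static String encode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }
}
```

With this sketch, HexPayload.decode("fffefd") yields the three bytes 0xff, 0xfe and 0xfd, and HexPayload.encode turns them back into the same string.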
Example XML:
{code:xml}
<tokens>
  <token>
    <positionIncrement>1</positionIncrement>
    <term>term</term>
    <type>type</type>
    <startOffset>0</startOffset>
    <endOffset>3</endOffset>
    <flags>65535</flags>
    <payload><hex>fffefd</hex></payload>
  </token>
</tokens>
{code}
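A document like the example above could be read back with the StAX XMLStreamReader roughly as follows. This is only a sketch under stated assumptions: TokenXmlReader and ParsedToken are hypothetical names, ParsedToken is a plain holder rather than a Lucene Token, and only the <hex> payload variant is read, with the other payload variants ignored.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch, not the analyzer from the patch.
final class TokenXmlReader {

    // Plain holder standing in for a Lucene Token (assumption).
    static final class ParsedToken {
        int positionIncrement = 1;
        String term;
        String type;
        int startOffset;
        int endOffset;
        int flags;
        String payloadHex; // hex text only; decoding is left to the caller
    }

    private TokenXmlReader() {}

    // Parses a <tokens> document into a list of ParsedToken objects.
    static List<ParsedToken> parse(String xml) {
        List<ParsedToken> tokens = new ArrayList<ParsedToken>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            ParsedToken current = null;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if ("token".equals(name)) {
                        current = new ParsedToken();
                    } else if (current != null) {
                        readField(r, name, current);
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "token".equals(r.getLocalName())) {
                    tokens.add(current);
                    current = null;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed token XML", e);
        }
        return tokens;
    }

    // Reads one leaf element; container elements such as <payload> fall through.
    private static void readField(XMLStreamReader r, String name, ParsedToken t)
            throws XMLStreamException {
        if ("positionIncrement".equals(name)) {
            t.positionIncrement = Integer.parseInt(r.getElementText().trim());
        } else if ("term".equals(name)) {
            t.term = r.getElementText();
        } else if ("type".equals(name)) {
            t.type = r.getElementText();
        } else if ("startOffset".equals(name)) {
            t.startOffset = Integer.parseInt(r.getElementText().trim());
        } else if ("endOffset".equals(name)) {
            t.endOffset = Integer.parseInt(r.getElementText().trim());
        } else if ("flags".equals(name)) {
            t.flags = Integer.parseInt(r.getElementText().trim());
        } else if ("hex".equals(name)) {
            t.payloadHex = r.getElementText().trim();
        }
    }
}
```

A real analyzer would presumably pull tokens from the reader one at a time instead of building a list, so that large streams do not have to be held in memory.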