[ 
https://issues.apache.org/jira/browse/SOLR-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640615#action_12640615
 ] 

Todd Feak commented on SOLR-814:
--------------------------------

Yes, they are used differently. 

However, a word written in Hiragana is the *same* word when written in 
Katakana, with the same meaning. Furthermore, it's not always cut and dried which 
to use. For example, a movie title may be written in Hiragana or Katakana, 
depending on the director's preference. The user (searcher) may not remember the 
director's choice and so may search using the other script. Without this 
normalization they would get a search miss.

I don't doubt your experience at Ultraseek, but this feature was explicitly 
requested by native Japanese-speaking engineers at Sony. I *just* (literally) 
double-checked with a couple of native Japanese-speaking engineers on site, and 
both agree that this is useful, at least for our searches.

I would say that it should be up to the schema developer to decide whether this 
functionality is useful for their situation. Either way, I offer it up to the 
community for their decision.


> Add new Japanese Hiragana Filter and Factory
> --------------------------------------------
>
>                 Key: SOLR-814
>                 URL: https://issues.apache.org/jira/browse/SOLR-814
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: SOLR-814.patch
>
>
> Japanese Hiragana and Katakana characters can easily be converted from one 
> to the other. This filter normalizes all Hiragana characters to their Katakana 
> counterparts, allowing for indexing and searching using either.
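For illustration only, here is a minimal sketch of the character mapping such a filter could apply. It is not the code in SOLR-814.patch, and it omits the Lucene/Solr TokenFilter and factory wiring entirely. The class name `HiraganaToKatakana` and method `normalize` are hypothetical. It relies on the fact that the Unicode Hiragana block (U+3041 to U+3096, plus the iteration marks U+309D and U+309E) sits exactly 0x60 code points below the matching Katakana characters. The `main` method mirrors the search-miss scenario above: a query typed in Hiragana matches a title indexed in Katakana once both sides are normalized.

```java
/**
 * Hypothetical sketch of Hiragana-to-Katakana normalization.
 * Not the SOLR-814 patch; no Lucene TokenFilter plumbing is shown.
 */
public final class HiraganaToKatakana {

    // Hiragana letters: small a (U+3041) through small ke (U+3096).
    private static final int HIRAGANA_START = 0x3041;
    private static final int HIRAGANA_END = 0x3096;
    // Hiragana iteration marks (U+309D, U+309E).
    private static final int HIRAGANA_ITER = 0x309D;
    private static final int HIRAGANA_VOICED_ITER = 0x309E;
    // Distance from each Hiragana code point to its Katakana counterpart.
    private static final int KATAKANA_OFFSET = 0x60;

    private HiraganaToKatakana() {}

    /** Shifts every Hiragana character to Katakana; all other characters pass through. */
    public static String normalize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean isHiragana = (c >= HIRAGANA_START && c <= HIRAGANA_END)
                    || c == HIRAGANA_ITER || c == HIRAGANA_VOICED_ITER;
            out.append(isHiragana ? (char) (c + KATAKANA_OFFSET) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String indexedTitle = "\u30C8\u30C8\u30ED"; // "totoro" written in Katakana
        String userQuery = "\u3068\u3068\u308D";    // "totoro" written in Hiragana
        // Without normalization these strings differ; with it, both become Katakana.
        System.out.println(normalize(userQuery).equals(normalize(indexedTitle))); // prints "true"
    }
}
```

Because the mapping is a fixed offset within the Basic Multilingual Plane, a per-`char` loop is sufficient; characters outside the Hiragana range, including Katakana, Latin text, and the prolonged sound mark U+30FC, are left unchanged.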

-- 
This message is automatically generated by JIRA.