Tomás, that won't really work: transliteration to Romaji works for individual terms only, so you would need to tokenize the Japanese prior to transliteration. I am not sure what tool you plan to use for transliteration; I have used ICU in the past, and from what I can tell it does not transliterate Kanji. Besides, transliterating Kanji is debatable for a variety of reasons.
What I would suggest is that you transliterate Hiragana to Katakana, leave the Kanji alone, and index/search using n-grams. If you want 'proper' tokenization, I would recommend MeCab. I have looked into this for a client and there is no clear-cut solution.

Cheers,
François

On Mar 11, 2011, at 10:29 AM, Tomás Fernández Löbbe wrote:

> This question is probably not strictly a Solr question, but it's related to
> it. I'm dealing with a Japanese Solr application in which I would like to be
> able to search in any of the Japanese alphabets. The content can also be in
> any Japanese alphabet. I've been thinking of this solution: convert
> everything to roma-ji, at index time and query time.
> For example:
>
> Indexing time:
> [Something in Hiragana] --> translate it to roma-ji --> index
>
> Searching time:
> [Something in Katakana] --> translate it to roma-ji --> search
> or
> [Something in Kanji] --> translate it to roma-ji --> search
>
> I don't have a deep understanding of Japanese, and that's my problem. Has
> somebody on the list tried something like this before? Did it work?
>
> Thanks,
>
> Tomás
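P.S. The kana-folding plus n-gram idea I suggested above can be sketched roughly like this. It relies on the fact that the Hiragana and Katakana Unicode blocks are offset by a fixed 0x60; Kanji falls outside that range and passes through untouched. In a real Solr deployment this folding would sit in the analyzer chain (e.g. a char filter plus an n-gram tokenizer) rather than in application code, so the function names here are illustrative only:

```python
# Fold Hiragana into Katakana, leave Kanji alone, then produce
# character n-grams for indexing/searching. Illustrative sketch only;
# in Solr this logic belongs in the field's analyzer chain.

KANA_OFFSET = 0x30A1 - 0x3041  # Katakana 'ァ' minus Hiragana 'ぁ' = 0x60

def hiragana_to_katakana(text: str) -> str:
    """Map each Hiragana letter (U+3041..U+3096) to its Katakana twin;
    everything else (Kanji, Latin, punctuation) passes through unchanged."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if 0x3041 <= ord(ch) <= 0x3096 else ch
        for ch in text
    )

def ngrams(text: str, n: int = 2) -> list[str]:
    """Character bigrams by default -- the usual fallback when no
    morphological tokenizer (such as MeCab) is in the pipeline."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]
```

With this, a Hiragana query and a Katakana document both fold to the same Katakana form before n-gramming, so they match without any Romaji step.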