Hi, I have some questions regarding Accumulo and Unicode. I'm working with the wikisearch example.
Given an article such as 1975–76 ... I see in the wikisearch example that the title is normalized and becomes encoded as 1975\xE2\x80\x9376. But if I ingest that same data myself without using the Normalizer, I get the same title that the Normalizer produced. Likewise, if I insert the Wikipedia data as plain XML rather than Base64-encoded, I see the same thing, specifically where articles link to other languages: the language characters are normalized.

Does Accumulo normalize automatically? Am I misunderstanding what I am seeing? What is the general guidance for using Accumulo with Unicode characters?

Thanks,
Ralph
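For reference, the escape sequence \xE2\x80\x93 in the title is exactly the UTF-8 encoding of U+2013 EN DASH, the "–" in "1975–76". The minimal sketch below is plain Java with no Accumulo dependency: it encodes the title as UTF-8 and prints every byte outside printable ASCII as \xHH. The class and method names are hypothetical, and whether Accumulo's shell applies exactly this display rule is an assumption; the sketch only shows that such escapes can come from raw UTF-8 bytes being displayed, without any normalization step.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: shows a string's UTF-8 bytes, escaping non-printable-ASCII bytes as \xHH.
public class Utf8Escape {
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned 0..255
            if (v >= 0x20 && v < 0x7F) {
                sb.append((char) v); // printable ASCII is shown as-is
            } else {
                sb.append(String.format("\\x%02X", v)); // anything else becomes \xHH
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String title = "1975\u201376"; // "1975–76" with U+2013 EN DASH
        System.out.println(escape(title)); // prints 1975\xE2\x80\x9376
    }
}
```

Decoding the same bytes with `new String(bytes, StandardCharsets.UTF_8)` gives back the original "1975–76", so the stored value and the escaped display can describe the same data.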
