I have 10M+ company names (in Chinese) that were extracted from the work
experience sections of user profiles. Because these names were entered
manually by users of our site, there is a lot of duplication. Our goal is
to extract and cleanse this data to build a company dictionary. For
example, the following terms should all be treated as one company:

Huawei Technologies Co. Ltd
Huawei
huawei.com
华为            -> (华为 is "Huawei" in Chinese)
华为有限公司    -> (有限公司 is "Co. Ltd" in Chinese)
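
To make the expected result concrete, here is a naive sketch in Python of the
grouping I have in mind. The suffix list and the alias table are hypothetical,
hand-written placeholders that cover only the five names above. Maintaining
them by hand would not scale to 10M+ names, so this only shows the desired
output, not a real solution.

```python
import re
from collections import defaultdict

# Hypothetical, hand-picked suffixes to strip (longest first);
# a real list would be far longer.
SUFFIXES = ["technologies co. ltd", "co. ltd", "有限公司", ".com"]

# Hypothetical alias table mapping a cleaned name to one canonical key.
ALIASES = {"华为": "huawei"}


def normalize(name):
    """Reduce a raw company name to a canonical key (naive rules only)."""
    key = re.sub(r"\s+", " ", name.strip().lower())
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            # Strip a known suffix, but never reduce the name to nothing.
            if key.endswith(suffix) and len(key) > len(suffix):
                key = key[: -len(suffix)].strip()
                changed = True
    return ALIASES.get(key, key)


def group(names):
    """Group raw names that share the same canonical key."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize(raw)].append(raw)
    return dict(groups)


names = [
    "Huawei Technologies Co. Ltd",
    "Huawei",
    "huawei.com",
    "华为",
    "华为有限公司",
]
result = group(names)
```

With these placeholder rules, all five names end up under a single key.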

This looks like a clustering problem, but I have no idea how to
implement it.

Regards.
- Jason
