I couldn't access the site you crawled to check, but it looks to me
as if Nutch couldn't determine the correct encoding/charset of the page.

Nutch looks for the encoding in the HTTP Content-Type header and in a
meta Content-Type tag in the HEAD section of the page. If the
webserver/page provides neither, I think it falls back to
parser.character.encoding.default, which is usually the wrong one for
Chinese pages.
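
The lookup just described can be sketched roughly as below. This is only
a Python illustration, not Nutch's actual code: the function name
guess_charset, the regular expressions, the header-before-meta order, and
the "windows-1252" fallback (standing in for whatever
parser.character.encoding.default is set to) are all assumptions.

```python
import re

# Assumed stand-in for the configured parser.character.encoding.default.
DEFAULT_CHARSET = "windows-1252"

# Matches "charset=..." in a Content-Type header value or inside a meta tag.
_CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)
_META_RE = re.compile(r"<meta[^>]*>", re.IGNORECASE)


def guess_charset(content_type_header, html_head_bytes):
    """Return the charset from the header, else from a meta tag, else the default."""
    # 1. HTTP Content-Type header, e.g. "text/html; charset=GB2312".
    if content_type_header:
        match = _CHARSET_RE.search(content_type_header)
        if match:
            return match.group(1).lower()

    # 2. Meta tag in the HEAD section. The tag itself is plain ASCII, so
    #    non-ASCII bytes elsewhere in the page can be ignored for this scan.
    head = html_head_bytes.decode("ascii", errors="ignore")
    for tag in _META_RE.findall(head):
        match = _CHARSET_RE.search(tag)
        if match:
            return match.group(1).lower()

    # 3. Neither source named a charset: fall back to the default.
    return DEFAULT_CHARSET
```

If a page ends up in the third branch, either fixing the page/server to
declare its charset or overriding parser.character.encoding.default in
nutch-site.xml would be the things to try.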

As for the characters turning normal again in your email, my guess is
that when you got them from Nutch they were in Java's internal Unicode
form rather than GB/UTF-8 (which means they show up as a shorter
squiggle, as you observed), but after you pasted them into an email,
the email was sent as Unicode, which turns them back into normal
characters on receipt.
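
The general effect, the same bytes reading as garbage under one charset
and as Chinese under another, can be reproduced with a small Python
sketch. The choice of GBK for the page and windows-1252 (cp1252) for the
wrong default is an assumption for illustration; I have not confirmed
either for this crawl.

```python
# A fragment of the page title from the search result quoted below.
title = "延边大学"

# Bytes as a GB-encoded server might send them (assumed encoding: GBK).
raw = title.encode("gbk")

# Decoding with the wrong charset yields unrelated characters; bytes that
# cp1252 cannot map are replaced with U+FFFD.
garbled = raw.decode("cp1252", errors="replace")

# Decoding with the charset the bytes were written in restores the text.
restored = raw.decode("gbk")

print(garbled)
print(restored)
```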


On 3/25/06, kauu <[EMAIL PROTECTED]> wrote:
> What's going on?
>  After sending my mail I see that the garbled characters turned normal.
> Why? Can anyone tell me something about it?
>  Another thing: after I enter some CHINESE into the query box, it turns
> into garbled characters when I click the query button. Why?
>  Any reply will be appreciated!
>
> On 3/25/06, kauu <[EMAIL PROTECTED]> wrote:
> >
> > Hi all,
> >   I have another problem now. After crawling and starting up Tomcat (I've
> > changed nutch-site.xml), I searched for something and got some garbled
> > results, which look like this:
> >
> >
> > * 延边大学本科生招生信息网-- <http://zsb.ybu.edu.cn/search.php>*
> > 延边大学本科生招生信息网--    延边大学本科生招生信息网 提示 请输入搜索关键字 点击此处返回上一页 处长* ... *
> > http://zsb.ybu.edu.cn/search.php 
> > (cached<http://localhost:8080/cached.jsp?idx=0&id=28>)
> > (explain <http://localhost:8080/explain.jsp?idx=0&id=28&query=search>) (
> > anchors <http://localhost:8080/anchors.jsp?idx=0&id=28>)
> >
> > #######         and the garbled results should be CHINESE.      ########
> > My
> >  OS is WinXP (SP2)
> >  browser is Firefox (I get the same result in IE)
> >
> > Everything goes well except this.
> > Can anyone help me? Any reply will be appreciated!!!
> >
> > --
> > www.babatu.com
> >
>
>
>
> --
> www.babatu.com
>
