Hello everyone!
I have been studying Nutch 1.2 for a few days. My task is to extract the body
text of web pages. I finally got the text dump file, and I wonder what the
specific format of the dump text file is. Is it UTF-8? The text is in a
language that is foreign to me, so I can't tell whether it is garbled or not.
Thank you so much!
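One way to check this without relying on any Nutch internals is to test whether the dump file decodes cleanly as UTF-8. The sketch below is only an assumption-based check, not a statement of the format Nutch actually writes. It streams the file through Python's strict incremental UTF-8 decoder, so it also works on large dumps. The script name `check_utf8.py` is hypothetical. A clean decode is strong evidence that the file is UTF-8, but not proof: text that was double-encoded can still be valid UTF-8 and yet look garbled.

```python
import codecs
import sys


def is_valid_utf8(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` decodes as strict UTF-8."""
    # The incremental decoder buffers multi-byte sequences that are split
    # across chunk boundaries, so the chunk size does not affect the result.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            try:
                decoder.decode(chunk)
            except UnicodeDecodeError:
                return False  # found a byte sequence that is not valid UTF-8
    try:
        # final=True reports a multi-byte sequence truncated at end of file.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: check_utf8.py <path-to-dump-file>", file=sys.stderr)
    else:
        ok = is_valid_utf8(sys.argv[1])
        print("valid UTF-8" if ok else "not valid UTF-8")
```

For example, `python check_utf8.py dump` run against the dump file prints either `valid UTF-8` or `not valid UTF-8`.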
--
View this message in context:
http://lucene.472066.n3.nabble.com/NUTCH1-2-the-specific-format-of-the-dump-text-file-tp4062845.html
Sent from the Nutch - User mailing list archive at Nabble.com.