It is in English. I am pretty sure it is not in another language, because this is the document URL:
http://www.irs.gov/pub/irs-pdf/f1040as1.pdf

On Jan 2, 2008 10:49 AM, Dennis Kubes <[EMAIL PROTECTED]> wrote:
> Most likely this page is in a different language.
>
> Dennis
>
> Developer Developer wrote:
> > Hello,
> >
> > I need to access the parse text of Nutch documents. I am using NutchBean to
> > search and then reading the ParseText from the hit. Here is the sample code:
> >
> > Configuration conf = NutchConfiguration.create();
> > NutchBean nb = new NutchBean(conf);
> > Hits hits = nb.search(Query.parse("irs", conf), 10);
> >
> > // get a sample hit
> > Hit hit = hits.getHit(8);
> >
> > HitDetails hitDetails = nb.getDetails(hit);
> >
> > ParseText pText = nb.getParseText(hitDetails);
> >
> > System.out.println(pText.getText());
> >
> > The System.out.println call prints unreadable characters, as follows:
> >
> > obj<</Length 31683/Filter/FlateDecode/Length1 1720/Length2 30704/Length3
> > 532>>stream
> > H‰¤U 8Të (R)Ýåt›=*鯶„P†Y†fÆ.'!vb )'´Ì,,fÖŒµÖ¸ÔV*—P
> > ¥›¨]QJî%'kDE…ЍØÏ"ê(EÚ‡JÍYklgËÉóœszæyþYÿ÷ ÿ»Þï{ßÿ_ºZ|g†•Hê ÛJQ‚ 1Í?õˆ Æ
> > (c) B &" 1Ù4]]k † DŠò
> >
> > &sä0° 
> >
> > Any idea what I am missing? The document is a PDF in English.
> >
> > Thanks!
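
The output quoted above begins with PDF object syntax (obj<< ... /Filter/FlateDecode ... >>stream), so the string returned by getText() contains raw PDF stream data rather than extracted text. Below is a minimal sketch of a check for that condition, using only the JDK. The class name RawPdfCheck, the method looksLikeRawPdf, and the list of markers are all hypothetical and chosen for illustration; none of them are part of Nutch, and the heuristic is an assumption, not a complete PDF detector.

```java
import java.util.regex.Pattern;

/** Hypothetical diagnostic helper; not part of Nutch. */
public class RawPdfCheck {

    // PDF syntax markers that should not appear in extracted text:
    // a FlateDecode filter entry, an "N G obj" object header, or "endstream".
    // The choice of markers is an assumption made for this sketch.
    private static final Pattern PDF_MARKER =
            Pattern.compile("/Filter\\s*/FlateDecode|\\d+\\s+\\d+\\s+obj|endstream");

    /** Returns true if the text appears to contain raw PDF syntax. */
    public static boolean looksLikeRawPdf(String text) {
        if (text == null || text.isEmpty()) {
            return false;
        }
        // "%PDF-" is the standard PDF file header.
        return text.startsWith("%PDF-") || PDF_MARKER.matcher(text).find();
    }

    public static void main(String[] args) {
        // First sample is shortened from the output quoted in the thread.
        String raw = "obj<</Length 31683/Filter/FlateDecode/Length1 1720>>stream";
        System.out.println(looksLikeRawPdf(raw));
        // Second sample is made-up readable text.
        System.out.println(looksLikeRawPdf("Plain readable English text."));
    }
}
```

In the sample code from the thread, the check could be applied to pText.getText() before printing. A positive result would point at how the PDF was parsed during the crawl rather than at the language of the document.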
