Re: Error parsing html

2012-10-09 Thread Sebastian Nagel
> I should mention, that I'm using Nutch in a Web-Application.
It's possible though it's hard.
> While debugging I came across the runParser method in ParseUtil class in
> which the task.get(MAX_PARSE_TIME, TimeUnit.SECONDS); returns null.
See http://wiki.apache.org/nutch/RunNutchInEclipse#Debuggi
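For anyone debugging the same call: a minimal, self-contained sketch of how a timed `Future.get` behaves in plain Java. It is not Nutch's `ParseUtil` code; the class name `TimedTaskSketch`, the helper `runWithTimeout`, the returned status strings and the one-second value of `MAX_PARSE_TIME` are all made up for illustration. It only shows that `get(timeout, unit)` hands back `null` when the submitted task itself returned `null`, while a timeout or a failure inside the task surfaces as an exception instead.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedTaskSketch {
    // Hypothetical timeout in seconds; the real Nutch setting may differ.
    static final int MAX_PARSE_TIME = 1;

    // Runs the given work with a time limit and reports which outcome occurred.
    static String runWithTimeout(Callable<String> work) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> task = executor.submit(work);
        try {
            String result = task.get(MAX_PARSE_TIME, TimeUnit.SECONDS);
            // A null here means the task finished in time and returned null itself.
            return result == null ? "NULL_RESULT" : "OK: " + result;
        } catch (TimeoutException e) {
            // The task did not finish within the limit.
            task.cancel(true);
            return "TIMEOUT";
        } catch (ExecutionException e) {
            // The task threw; the original exception is the cause.
            return "FAILED: " + e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithTimeout(() -> "parsed"));
        System.out.println(runWithTimeout(() -> null));
        System.out.println(runWithTimeout(() -> { Thread.sleep(5000); return "late"; }));
        System.out.println(runWithTimeout(() -> { throw new IllegalStateException("boom"); }));
    }
}
```

If the same pattern applies in the debugger, a `null` from `task.get(...)` would point at the task's own return value rather than at the timeout, so stepping into whatever the task executes is probably the more useful next step.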

Re: Error parsing html

2012-10-09 Thread CarinaBambina
I checked the directory permissions. They should be OK, set to read/write access. It's just hard to debug, as I can't get the Hadoop logs to work. I only see warnings and infos in the console.

Re: Error parsing html

2012-10-09 Thread alxsss
Sent: Tue, Oct 9, 2012 10:03 am
Subject: Re: Error parsing html
I have now also tried using all the source files themselves instead of the nutch.jar, but nothing changed. Does anyone have an idea what the reason for this error might be? Or at least where and what I should look for? Any hint?! Thanks in

Re: Error parsing html

2012-10-09 Thread CarinaBambina

Re: Error parsing html

2012-10-02 Thread alxsss
Can you provide a few lines of log or the URL that gives the exception?
-Original Message-
From: CarinaBambina
To: user
Sent: Tue, Oct 2, 2012 2:04 pm
Subject: Re: Error parsing html
Thanks for the reply. I'm now using Nutch 1.5.1, but nothing has changed so far. While debu

Re: Error parsing html

2012-10-02 Thread CarinaBambina
ogram raise the ParseException. Right now I have no clue what the problem could be. I also tried using all default configurations, but nothing changed.

Re: Error parsing html

2012-10-02 Thread Lewis John Mcgibbney
Hi, For starters can you please use 1.5.1.
On Tue, Oct 2, 2012 at 4:32 PM, CarinaBambina wrote:
> Hi,
> I'm curious if you have come up with any solution yet? As I'm having the
> exact same problem!
> When I start the crawl the entered URL is parsed perfectly, but for all
> 'links' on this site

Re: Error parsing html

2012-10-02 Thread CarinaBambina
I'm using Nutch 1.5. Thanks!

RE: Error parsing html

2012-07-12 Thread Markus Jelsma
Please provide the whole log snippet. Is it an HTML file? Can the parser parse it, is it large?
-Original message-
> From: Sudip Datta
> Sent: Thu 12-Jul-2012 23:47
> To: Markus Jelsma
> Cc: user@nutch.apache.org
> Subject: Re: Error parsing html
>
> In Parse

Re: Error parsing html

2012-07-12 Thread Sudip Datta
ma > > Cc: user@nutch.apache.org > > Subject: Re: Error parsing html > > > > Hi Markus, > > > > Yes, they seem to be rightly mapped: > > > > parse-plugins.xml reads: > > > > > > > > > > >

RE: Error parsing html

2012-07-12 Thread Markus Jelsma
Seems correct indeed. Please check the logs, they may tell some more.
-Original message-
> From: Sudip Datta
> Sent: Thu 12-Jul-2012 21:51
> To: Markus Jelsma
> Cc: user@nutch.apache.org
> Subject: Re: Error parsing html
>
> Hi Markus,
>
> Yes, the

Re: Error parsing html

2012-07-12 Thread Sudip Datta
a regex of content types. > > > -Original message- > > From:Sudip Datta > > Sent: Thu 12-Jul-2012 20:36 > > To: user@nutch.apache.org > > Subject: Re: Error parsing html > > > > Nopes. That didn't help. In fact, I had added that entry minutes before &

RE: Error parsing html

2012-07-12 Thread Markus Jelsma
tch.apache.org > Subject: Re: Error parsing html > > Nopes. That didn't help. In fact, I had added that entry minutes before > sending a mail to the group and after couple of hours of frustration in > trying to get the parser to work. > > On Thu, Jul 12, 2012 at 11:40 P

Re: Error parsing html

2012-07-12 Thread Sudip Datta
Nope. That didn't help. In fact, I had added that entry minutes before sending a mail to the group and after a couple of hours of frustration in trying to get the parser to work.
On Thu, Jul 12, 2012 at 11:40 PM, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:
> For starters there is no p

Re: Error parsing html

2012-07-12 Thread Lewis John Mcgibbney
For starters, there is no parse-xhtml plugin, unless of course this is a custom one you've written yourself. If it is not, remove it from the plugin.includes property and re-spin it.
hth
On Thu, Jul 12, 2012 at 7:00 PM, Sudip Datta wrote:
> Hi,
>
> I am using Nutch 1.4 and Solr. M

Error parsing html

2012-07-12 Thread Sudip Datta
Hi, I am using Nutch 1.4 and Solr. My crawls were working perfectly fine before I made some changes to the SolrWriter (which I believe has nothing to do with my problem). Since then, I am getting:
WARN : org.apache.nutch.parse.ParseUtil - Unable to successfully parse content of type text/html
IN