Hi Andrzej,

This is the setting that switches Fetcher parsing on (true) or off (false), right?
<property>
  <name>fetcher.parse</name>
  <value>true</value>
  <description>If true, fetcher will parse content.</description>
</property>

I don't have my nutch-default and nutch-site files with me right now, but I am about 95% sure that I didn't change this value in my nutch-site (and I didn't change nutch-default at all). So the answer is YES, Fetcher is in parsing mode (with ~95% confidence).

I am running Nutch against my local Apache server (not visible to you). But as you may have noticed, I used depth=2, so only a few pages (16, to be exact) are crawled. If you are interested, I can send you all of them so that you can upload this content to any server you need for your tests. See the crawl.log file (attached to my previous email, sent at 8:21am today) for details.

I will try to reproduce this issue with one or two arbitrary HTML pages. If that reproduces the issue, I can send them to you.

Lukas

On 1/5/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Lukas Vlcek wrote:
>
> >How can I learn that?
> >What I do is run the regular one-step command [/bin/nutch crawl]
> >
>
> In that case your nutch-default.xml / nutch-site.xml settings decide; there
> is a boolean option there. If you didn't change it, then it defaults to
> true (i.e. your fetcher is parsing the content).
>
> Would it be easy to reproduce this if I knew the seed URLs? If so,
> please send me the seed URLs (contact me off the list if they are
> sensitive).
>
> --
> Best regards,
> Andrzej Bialecki <><
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
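Since the open question is whether fetcher.parse was overridden locally, a small script can read both config files and report the effective value instead of relying on memory. This is a minimal sketch, assuming the files live under conf/ and that a property in nutch-site.xml takes precedence over the same property in nutch-default.xml. The function names read_properties and effective_value are hypothetical helpers, not part of Nutch. The script walks every <property> element, so it does not depend on the name of the root element.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_properties(path):
    """Return {name: value} for every <property> element in a config file.

    A missing file yields an empty dict, so an absent nutch-site.xml simply
    means "no overrides".
    """
    props = {}
    if not Path(path).exists():
        return props
    root = ET.parse(path).getroot()
    # iter() visits all descendants, regardless of the root tag's name.
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def effective_value(name,
                    default_path="conf/nutch-default.xml",  # assumed location
                    site_path="conf/nutch-site.xml"):       # assumed location
    """Merge defaults with site overrides (assumed precedence: site wins)."""
    merged = read_properties(default_path)
    merged.update(read_properties(site_path))
    return merged.get(name)


if __name__ == "__main__":
    print("fetcher.parse =", effective_value("fetcher.parse"))
```

Run from the Nutch installation directory: if it prints "true", the fetcher is parsing. A value of None means the property was not found in either file.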
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers