Re: svn commit: r230867 - /lucene/nutch/trunk/conf/crawl-urlfilter.txt.template

2005-08-08 Thread Piotr Kosiorowski
No problem for me. I have just run the test crawl on http://lucene.apache.org/nutch as described in the new tutorial, and a lot of PDF and PNG files were causing big exceptions and stack traces in the log. I thought that people (usually using Nutch for the first time) might think that they did something

User agent string

2005-08-08 Thread Piotr Kosiorowski
Hello, We should probably change the user agent string in nutch-default.xml to point to the Apache site. The only question is http.agent.version: should we set it to 0.07 for the release and 0.08-dev for future work? I do not know how it was used previously. Current values: property

Nutch website deployment

2005-08-07 Thread Piotr Kosiorowski
Hi, I just wanted to finally add myself to the list of Nutch committers on the Nutch website, and I am not sure how to deploy it. So I installed Forrest, modified src/site/src/documentation/content/xdocs and then ran 'forrest'. It generated content in src/site/build/site. And now the

Re: Strange search results

2005-08-05 Thread Piotr Kosiorowski
Hello, In my experience it is very important to use anchor text and give it quite a high boost. It allows me to return http://www.aa.com when a user searches for American Airlines; without anchor text this was impossible to achieve: a lot of sites (spam or not) with american airlines in the URL and

Re: near-term plan

2005-08-04 Thread Piotr Kosiorowski
Hello, I think it is a good idea to release ASAP. I wanted to contribute my code for fault-tolerant searching, but it is taking more time than I expected because, as some of you know, in the meantime I became a father. But I hope I will be able to send something for comments early next week. I will look at

Re: bin/nutch issue - on Mac OS X

2005-07-19 Thread Piotr Kosiorowski
Hello, Tested on Cygwin and on a Linux box. The ':'-based syntax is already used earlier in the nutch script too. Committed. Thanks Piotr Erik Hatcher wrote: I'm getting "expr: syntax error" when running all bin/nutch commands. It comes from this line: if expr match `uname` 'CYGWIN*' > /dev/null; then should
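A minimal sketch of the portable check, assuming the fix switches to the POSIX "expr STRING : REGEX" form the reply refers to; the committed line in bin/nutch itself is not shown in this excerpt. "expr match" is a GNU extension that the BSD expr on Mac OS X does not accept. The helper name is_cygwin is hypothetical, for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not from bin/nutch): succeeds when $1 starts with "CYGWIN".
# "expr STRING : REGEX" is POSIX and works with both GNU and BSD expr, unlike
# the GNU-only "expr match STRING REGEX". The regex is anchored at the start
# of STRING; expr exits 0 only if the match length is non-zero.
is_cygwin() {
  expr "$1" : 'CYGWIN.*' > /dev/null
}

# Same decision the original line made, using the portable syntax.
if is_cygwin "`uname`"; then
  echo "running under Cygwin"
else
  echo "not running under Cygwin"
fi
```

On Linux or Mac OS X this prints "not running under Cygwin"; under Cygwin, uname returns a string such as CYGWIN_NT-5.1, so the first branch is taken.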

Re: [Nutch-dev] Re: a silly question

2005-07-16 Thread Piotr Kosiorowski
Hello, I understand you have all your segments in /home/fji/SE/nutch-nightly/crawl.test/, but according to the log file you sent, Nutch is looking in /home/fji/SE/tomcat4/segments. Please copy your segment directory from /home/fji/SE/nutch-nightly/crawl.test/ to /home/fji/SE/tomcat4/ and restart
