Running ANT; was -- Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread David M. Cole
re's a place to vote to suggest that compiled versions still be distributed, I vote for that. Thanks. \dmc -- *+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+ David M. Cole d...@colegroup.com Editor & P

Re: Nutch near future - strategic directions

2009-11-16 Thread David M. Cole
require a different command-line? Thanks. \dmc

Re: Difference between Deiselpoint and Nutch?

2009-09-18 Thread David M. Cole
ct. \dmc PS: The robots.txt file shouldn't have any mention of a sitemap, except possibly to include the URL.

Re: Difference between Deiselpoint and Nutch?

2009-09-18 Thread David M. Cole
here) to the IP address where Nutch is running and the regular one to all other IP addresses. There may be other kludges available. Hope this helps. \dmc

Re: Ignoring Robots.txt

2009-09-11 Thread David M. Cole
user-agents in the http.robots.agents tag with an asterisk (*), i.e.: http.robots.agents my-robot,* Hope this helps. \dmc
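For readers who want to see the setting in context: the name/value pair quoted in the message would normally sit in a Nutch configuration override file. The fragment below is only a sketch, assuming the usual conf/nutch-site.xml location and the standard property layout; the agent name my-robot is the one from the message, and nothing else here comes from the original post.

```xml
<?xml version="1.0"?>
<!-- Sketch of conf/nutch-site.xml (assumed location for local overrides) -->
<configuration>
  <property>
    <!-- Agent names checked against robots.txt, in order;
         the trailing asterisk is the catch-all entry from the message -->
    <name>http.robots.agents</name>
    <value>my-robot,*</value>
  </property>
</configuration>
```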

Re: Crawling Password Protected Pages

2009-09-09 Thread David M. Cole
09-09-09 15:46:58,659 INFO fetcher.Fetcher - -activeThreads=0 Thank you in advance, bye, Kranthi Reddy. B

Re: Authentication

2009-09-05 Thread David M. Cole
-- *+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+ David M. Cole d...@colegroup.com Editor & Publisher, NewsInc. <http://newsinc.net> V: (650) 557-2993 Consultant: The Cole Group <http://colegroup.com/> F: (650) 475-8479 *+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+

Re: Can Nutch crawler Impersonate user-agent?

2009-06-01 Thread David M. Cole

Styling -- was Re: good documentation for nutch generate ?

2009-05-29 Thread David M. Cole
which I just pass along. I hope my humble little effort helps someone. \dmc

Re: nutch 1.0

2009-04-21 Thread David M. Cole
Peter Wang tutorial should work fine, though you do need to have Java 1.6 installed, as the Hadoop routines require it. \dmc

Re: Nutch Crawling Questions

2009-04-20 Thread David M. Cole
ports Java 1.6 on 10.5 Intel. \dmc PS: And I just used the standard Peter Wang tutorial for installing Nutch on a Mac; just figure on using Terminal rather than Cygwin.

Re: Can't build Nutch

2009-04-20 Thread David M. Cole
d, you can find an installer. \dmc

URL normalization ...

2009-03-22 Thread David M. Cole
nks. \dmc

Re: Build #722 won't start on Mac OS X, 10.4.11

2009-02-15 Thread David M. Cole
722. Thanks. \dmc

Build #722 won't start on Mac OS X, 10.4.11

2009-02-14 Thread David M. Cole
ine up? Alternately, is there a way to get basic HTTP authorization without using httpclient-auth? Your thoughts would be appreciated. Thanks. \dmc