Re: Why can't I inject a Google link to the database?
Larsson85,

Please read past responses. Google is blocking all crawlers, not just yours, from indexing their search results. Because of their robots.txt directives, you will not be able to do this. If you placed a DO NOT ENTER sign on your house and I entered anyway, you would be very upset. That is what the robots.txt file does for a site: it tells visiting bots what they can enter and what they can't.

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/jakecjacobson
http://twitter.com/jakejacobson

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS

On Fri, Jul 17, 2009 at 9:32 AM, Larsson85 kristian1...@hotmail.com wrote:

I think I need more help on how to do this. I tried using

<property>
  <name>http.robots.agents</name>
  <value>Mozilla/5.0*</value>
  <description>The agent strings we'll look for in robots.txt files,
  comma-separated, in decreasing order of precedence. You should put the
  value of http.agent.name as the first agent name, and keep the default * at
  the end of the list. E.g.: BlurflDev,Blurfl,*
  </description>
</property>

If I don't have the star at the end I get the same as earlier, "No URLs to fetch." And if I do, I get "0 records selected for fetching, exiting."

reinhard schwab wrote:

Identify Nutch as a popular user agent such as Firefox.

Larsson85 wrote:

Any workaround for this? Making Nutch identify as something else, or something similar?

reinhard schwab wrote:

http://www.google.se/robots.txt

Google disallows it:

User-agent: *
Allow: /searchhistory/
Disallow: /search

Larsson85 wrote:

Why isn't Nutch able to handle links from Google? I tried to start a crawl from the following URL:

http://www.google.se/search?q=site:se&hl=sv&start=100&sa=N

And all I get is "no more URLs to fetch". The reason I want to do this is that I had a thought that maybe I could use Google to generate my start list of URLs by injecting pages of search results. Why won't this page be parsed and its links extracted so the crawl can start?

--
View this message in context: http://www.nabble.com/Why-cant-I-inject-a-google-link-to-the-database--tp24533162p24534522.html
Sent from the Nutch - User mailing list archive at Nabble.com.
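For reference, the configuration that later resolved a similar warning in the "Can Nutch crawler Impersonate user-agent?" thread in this archive pairs http.agent.name with http.robots.agents in nutch-site.xml, with the agent name listed first and the default * kept at the end. A minimal sketch, with MyCrawler as a placeholder agent name:

<property>
  <name>http.agent.name</name>
  <value>MyCrawler</value>
  <description>Placeholder crawler name; substitute your own agent string.</description>
</property>
<property>
  <name>http.robots.agents</name>
  <value>MyCrawler,*</value>
  <description>Agent strings checked against robots.txt, most specific first,
  default * last.</description>
</property>

Note that no agent configuration gets around Google's rules here: the "Disallow: /search" line quoted above applies to every user agent (*), so the injected search URL would be skipped regardless of what the crawler calls itself.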
Re: Job failed help
Any suggestions on this problem?

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/jakecjacobson
http://twitter.com/jakejacobson

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS

On Wed, Jul 15, 2009 at 8:41 AM, Jake Jacobson jakecjacob...@gmail.com wrote:

I did this with the same results. In my home directory, a directory named linkdb-1292468754 was created, which caused the process to run out of disk space. In hadoop-site.xml I have this set up:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/webroot/oscrawlers/nutch/tmp/</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>

I am using the following command line options to run Nutch 1.0:

/webroot/oscrawlers/nutch/bin/nutch crawl /webroot/oscrawlers/nutch/urls/seed.txt -dir /webroot/oscrawlers/nutch/crawl -depth 10 > /webroot/oscrawlers/nutch/logs/crawl_log.txt

In my log file I see this error message:

LinkDb: adding segment: file:/webroot/oscrawlers/nutch/crawl/segments/20090714095100
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:170)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:147)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:129)

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/jakecjacobson
http://twitter.com/jakejacobson

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS

On Mon, Jul 13, 2009 at 9:00 AM, SunGod sun...@cheemer.org wrote:

If you use Hadoop to run Nutch, please add

<property>
  <name>hadoop.tmp.dir</name>
  <value>/youtempfs/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>

to your hadoop-site.xml.

2009/7/13 Jake Jacobson jakecjacob...@gmail.com

Hi,

I have tried to run Nutch 1.0 several times and it fails due to lack of disk space. I have defined the crawl to place all files on a disk that has plenty of space, but when it starts building the linkdb it wants to put temp files in the home dir, which doesn't have enough space. How can I force Nutch not to do this?

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/jakecjacobson
http://twitter.com/jakejacobson

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS
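One quick sanity check before re-running is to confirm the configured temp path actually resolves to the large partition rather than the one that fills up. A minimal shell sketch, using the paths from the config above:

mkdir -p /webroot/oscrawlers/nutch/tmp/
df -h /webroot/oscrawlers/nutch/tmp/   # should report the large data disk
df -h $HOME                            # compare against the partition that filled up

If the hadoop.tmp.dir override is not being picked up, the linkdb-* scratch directory will still appear under the home directory during the invert step.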
Crawling with a PKI Cert
Hi,

Has there been any work on getting Nutch to crawl with a PKI cert? How about sites that take a username/password and set cookies?

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/jakecjacobson
http://twitter.com/jakejacobson

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS
Re: how to crawl a page but not index it
Hi,

Nutch should follow the meta robots directives, so in page A add this meta directive:

<meta name="robots" content="noindex,follow">

http://www.seoresource.net/robots-metatags.htm

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/jakecjacobson
http://twitter.com/jakejacobson

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS

On Tue, Jul 14, 2009 at 8:32 AM, Beats tarun_agrawal...@yahoo.com wrote:

Hi, actually what I want is to crawl a web page, say 'page A', and all its outlinks. I want to index all the content gathered by crawling the outlinks, but not 'page A' itself. Is there any way to do it in a single run?

with Regards
Beats
be...@yahoo.com

SunGod wrote:

1. Create work dir "test" first.
2. Insert URL:
   ../bin/nutch inject test -urlfile urls
3. Create fetchlist:
   ../bin/nutch generate test test/segments
4. Fetch URL:
   s1=`ls -d crawl/segments/2* | tail -1`
   echo $s1
   ../bin/nutch fetch test/segments/20090628160619
5. Update crawldb:
   ../bin/nutch updatedb test test/segments/20090628160619

Loop steps 3 - 5; writing a bash script to run them is best! Next time please use Google search first.

2009/7/13 Beats tarun_agrawal...@yahoo.com

Can anyone help me on this? I'm using Solr to index the Nutch docs, so I think the prune tool will not work. I do not want to index the documents taken from a particular set of sites.

with regards
Beats

--
View this message in context: http://www.nabble.com/how-to-crawl-a-page-but-not-index-it-tp24437901p24459435.html
Sent from the Nutch - User mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/how-to-crawl-a-page-but-not-index-it-tp24437901p24478530.html
Sent from the Nutch - User mailing list archive at Nabble.com.
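To make the directive concrete, here is a minimal sketch of what 'page A' could look like (file names are hypothetical): the page itself stays out of the index, while its outlinks are still followed and indexed.

<html>
  <head>
    <title>Page A</title>
    <!-- crawled and followed, but kept out of the index -->
    <meta name="robots" content="noindex,follow">
  </head>
  <body>
    <!-- outlinks such as this one are still discovered and indexed -->
    <a href="http://example.com/page-b.html">Page B</a>
  </body>
</html>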
Re: Job failed help
I did this with the same results. In my home directory, a directory named linkdb-1292468754 was created, which caused the process to run out of disk space. In hadoop-site.xml I have this set up:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/webroot/oscrawlers/nutch/tmp/</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>

I am using the following command line options to run Nutch 1.0:

/webroot/oscrawlers/nutch/bin/nutch crawl /webroot/oscrawlers/nutch/urls/seed.txt -dir /webroot/oscrawlers/nutch/crawl -depth 10 > /webroot/oscrawlers/nutch/logs/crawl_log.txt

In my log file I see this error message:

LinkDb: adding segment: file:/webroot/oscrawlers/nutch/crawl/segments/20090714095100
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:170)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:147)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:129)

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/jakecjacobson
http://twitter.com/jakejacobson

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS

On Mon, Jul 13, 2009 at 9:00 AM, SunGod sun...@cheemer.org wrote:

If you use Hadoop to run Nutch, please add

<property>
  <name>hadoop.tmp.dir</name>
  <value>/youtempfs/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>

to your hadoop-site.xml.

2009/7/13 Jake Jacobson jakecjacob...@gmail.com

Hi,

I have tried to run Nutch 1.0 several times and it fails due to lack of disk space. I have defined the crawl to place all files on a disk that has plenty of space, but when it starts building the linkdb it wants to put temp files in the home dir, which doesn't have enough space. How can I force Nutch not to do this?

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/jakecjacobson
http://twitter.com/jakejacobson

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS
Re: Nutch Tutorial 1.0 based off of the French Version
I did attach it.

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/jakecjacobson
http://twitter.com/jakejacobson

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS

On Mon, Jul 13, 2009 at 9:04 PM, alx...@aim.com wrote:

Hi,

Is it available on the internet? If not, could you please attach it? Thanks.

A.

-----Original Message-----
From: Jake Jacobson jakecjacob...@gmail.com
To: nutch-user@lucene.apache.org
Sent: Mon, Jul 13, 2009 1:26 pm
Subject: Nutch Tutorial 1.0 based off of the French Version

Hi,

Not finding any other Nutch 1.0 tutorial, I took the one b.bouzid.moha...@gmail.com posted a few days ago and ran it through the Google translation page. I have not had time to go over the steps and I don't think I will for a few weeks, but I wanted to send this out to the community. Hope it helps someone and we can add to it.

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/jakecjacobson
http://twitter.com/jakejacobson

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS
Re: Nutch Tutorial 1.0 based off of the French Version
Posted it to my blog: http://jakecjacobson.blogspot.com/2009/07/nutch10installationguide.html

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/jakecjacobson
http://twitter.com/jakejacobson

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS

On Mon, Jul 13, 2009 at 4:26 PM, Jake Jacobson jakecjacob...@gmail.com wrote:

Hi,

Not finding any other Nutch 1.0 tutorial, I took the one b.bouzid.moha...@gmail.com posted a few days ago and ran it through the Google translation page. I have not had time to go over the steps and I don't think I will for a few weeks, but I wanted to send this out to the community. Hope it helps someone and we can add to it.

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/jakecjacobson
http://twitter.com/jakejacobson

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS
Job failed help
Hi,

I have tried to run Nutch 1.0 several times and it fails due to lack of disk space. I have defined the crawl to place all files on a disk that has plenty of space, but when it starts building the linkdb it wants to put temp files in the home dir, which doesn't have enough space. How can I force Nutch not to do this?

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/jakecjacobson
http://twitter.com/jakejacobson

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS
Nutch Tutorial 1.0 based off of the French Version
Hi,

Not finding any other Nutch 1.0 tutorial, I took the one b.bouzid.moha...@gmail.com posted a few days ago and ran it through the Google translation page. I have not had time to go over the steps and I don't think I will for a few weeks, but I wanted to send this out to the community. Hope it helps someone and we can add to it.

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/jakecjacobson
http://twitter.com/jakejacobson

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS
Script to crawl web
Hi,

I was wondering if anyone has a simple script using Nutch 1.0 to crawl an intranet site with multiple webservers. I can use

/webroot/oscrawlers/nutch/bin/nutch crawl /webroot/oscrawlers/nutch/urls/seed.txt -dir /webroot/oscrawlers/nutch/crawl -depth 8 -topN 1000

and get a big chunk of the files. I then tried to follow the steps outlined in the Nutch Tutorial, http://wiki.apache.org/nutch/NutchTutorial, on crawling the whole web, and nothing new seems to get into the index. It seems to be crawling the same URLs. When I run the -stats command against the database I get the same stats output.

Here is my script:

#!/bin/sh
# nutch_crawler.sh

echo "Set UMASK ..."
umask 002
echo

# Set Variables
LIMIT=1   # Max loops to execute
A=0
NUTCHBINARY='/webroot/oscrawlers/nutch/bin/nutch'
NUTCHDB='/webroot/oscrawlers/nutch/crawl/crawldb'
NUTCHSEGMENTS='/webroot/oscrawlers/nutch/crawl/segments'
NUTCHINDEXES='/webroot/oscrawlers/nutch/crawl/indexes'
NUTCHLINKDB='/webroot/oscrawlers/nutch/crawl/linkdb'

# Inject starting URLs into the database
#echo "Injecting Starting URLs ..."
#echo
#$NUTCHBINARY inject $NUTCHDB /webroot/oscrawlers/nutch/urls/seed.txt
#sleep 30

while [ $A -le $LIMIT ]
do
    # Generate a fetch list
    echo "Generating fetch list ..."
    $NUTCHBINARY generate $NUTCHDB $NUTCHSEGMENTS -topN 1000

    # Find the newest created segment
    echo
    echo "Get segment ..."
    s1=`ls -d /webroot/oscrawlers/nutch/crawl/segments/2* | tail -1`
    echo
    echo "Segment is: $s1 ..."

    # Fetch this segment
    $NUTCHBINARY fetch $s1

    # Add one to A and continue looping until LIMIT is reached
    A=$(($A+1))
    sleep 60
done

# Invert links
echo
echo "Building inverted links ..."
$NUTCHBINARY invertlinks $NUTCHLINKDB -dir $NUTCHSEGMENTS

# Before I can do this, I need to delete the current indexes.
# Doesn't seem to affect the current searches.
echo
echo "Remove old indexes ..."
rm -rf $NUTCHINDEXES

# Index Segments
echo
echo "Build new indexes ..."
$NUTCHBINARY index $NUTCHINDEXES $NUTCHDB $NUTCHLINKDB $NUTCHSEGMENTS/*

echo
echo "Done ..."
###

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.new.facebook.com/people/Jake_Jacobson/622727274

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS
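A likely reason the loop keeps fetching the same URLs: it never runs updatedb, so the links discovered in each fetched segment are never merged back into the crawldb, and each generate pass selects from the same unchanged URL set. The whole-web steps on the wiki, and SunGod's steps 3-5 in the "how to crawl a page but not index it" thread above, both include that update inside the loop. A minimal sketch of the corrected loop body, using the variables defined in the script above:

    # Fetch this segment
    $NUTCHBINARY fetch $s1

    # Merge newly discovered links back into the crawldb so the next
    # generate pass can select fresh URLs instead of the same ones.
    $NUTCHBINARY updatedb $NUTCHDB $s1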
Running Nutch on VMs
Hi,

Has anyone had experience running a large-scale Nutch installation on VMs running Red Hat Linux? I would like to set up a test bed that would index 80 million documents and support up to 5 searches per second. If so, can you provide me any guidance on how much RAM, disk space, and how many processors the configuration would need? Does Nutch get any performance boost from running on a 64-bit versus a 32-bit OS?

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.new.facebook.com/people/Jake_Jacobson/622727274

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS
Re: Can Nutch crawler Impersonate user-agent?
Hi,

Well, I found out the problem. In nutch-default.xml there is a setting:

<property>
  <name>http.robots.agents</name>
  <value>*</value>
  <description>The agent strings we'll look for in robots.txt files,
  comma-separated, in decreasing order of precedence. You should put the
  value of http.agent.name as the first agent name, and keep the default * at
  the end of the list. E.g.: BlurflDev,Blurfl,*
  </description>
</property>

I copied this to my nutch-site.xml file, edited it with my user-agent string, and the magic worked. I would suggest that this block of configuration be added to the nutch-site.xml file by default.

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.new.facebook.com/people/Jake_Jacobson/622727274

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS

On Mon, Jun 1, 2009 at 2:23 PM, Jake Jacobson jakecjacob...@gmail.com wrote:

Hi,

I am testing out Nutch 1.0 and it doesn't seem to be able to crawl my website, which has the following robots.txt file:

User-agent: imo-robot-intelink
Disallow: /App_Themes/
Disallow: /app_themes/
Disallow: /Archive/
Disallow: /archive/
Disallow: /Bin/
Disallow: /bin/

I have nutch-site.xml defined as:

<configuration>
  <property>
    <name>http.agent.name</name>
    <value>imo-robot-intelink</value>
    <description>ICES Robots Name</description>
  </property>
  <property>
    <name>http.agent.version</name>
    <value></value>
    <description></description>
  </property>
  <property>
    <name>http.agent.description</name>
    <value>ICES Open Source Web Crawler using Nutch 1.0</value>
    <description></description>
  </property>
  <property>
    <name>http.agent.url</name>
    <value>http://www.xxx.gov/search/</value>
    <description></description>
  </property>
  <property>
    <name>http.agent.email</name>
    <value></value>
    <description></description>
  </property>
</configuration>

When I run the following from the command line:

./nutch crawl ../urls -dir ../crawl/ -depth 3 -topN 50

I get:

crawl started in: ../crawl
rootUrlDir = ../urls
threads = 10
depth = 3
topN = 50
Injector: starting
Injector: crawlDb: ../crawl/crawldb
Injector: urlDir: ../urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: ../crawl/segments/20090601180745
Generator: filtering: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting
Fetcher: segment: ../crawl/segments/20090601180745
Fetcher: threads: 10
QueueFeeder finished: total 3 records.
fetching http://www.intelink.gov/
fetching http://www.intelink.gov/blogs/
fetching http://www.intelink.gov/wiki/Main_Page
-finishing thread FetcherThread, activeThreads=9
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: ../crawl/crawldb
CrawlDb update: segments: [../crawl/segments/20090601180745]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: ../crawl/segments/20090601180757
Generator: filtering: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=1 - no more URLs to fetch.
LinkDb: starting
LinkDb: linkdb: ../crawl/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: file:/linsearchtools1o/oscrawlers/nutch-1.0/crawl/segments/20090601180745
LinkDb: done
Indexer: starting
Indexer: done
Dedup: starting
Dedup: adding indexes in: ../crawl/indexes
Dedup: done
merging indexes to: ../crawl/index
Adding file:/linsearchtools1o/oscrawlers/nutch-1.0/crawl/indexes/part-0
done merging
crawl finished: ../crawl

I have a tail on my webserver log files and I see the robots.txt file requested with a 200, but nothing gets into the index.
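For anyone hitting the same "should be listed first" warning: the edited block in nutch-site.xml presumably looks like the sketch below, with the site's own agent name substituted in first (the exact value here is an assumption based on the http.agent.name above):

<property>
  <name>http.robots.agents</name>
  <!-- assumed value: the site's http.agent.name first, default * last -->
  <value>imo-robot-intelink,*</value>
  <description>The agent strings we'll look for in robots.txt files,
  comma-separated, in decreasing order of precedence.</description>
</property>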
Can Nutch crawler Impersonate user-agent?
Hi,

I am testing out Nutch 1.0 and it doesn't seem to be able to crawl my website, which has the following robots.txt file:

User-agent: imo-robot-intelink
Disallow: /App_Themes/
Disallow: /app_themes/
Disallow: /Archive/
Disallow: /archive/
Disallow: /Bin/
Disallow: /bin/

I have nutch-site.xml defined as:

<configuration>
  <property>
    <name>http.agent.name</name>
    <value>imo-robot-intelink</value>
    <description>ICES Robots Name</description>
  </property>
  <property>
    <name>http.agent.version</name>
    <value></value>
    <description></description>
  </property>
  <property>
    <name>http.agent.description</name>
    <value>ICES Open Source Web Crawler using Nutch 1.0</value>
    <description></description>
  </property>
  <property>
    <name>http.agent.url</name>
    <value>http://www.xxx.gov/search/</value>
    <description></description>
  </property>
  <property>
    <name>http.agent.email</name>
    <value></value>
    <description></description>
  </property>
</configuration>

When I run the following from the command line:

./nutch crawl ../urls -dir ../crawl/ -depth 3 -topN 50

I get:

crawl started in: ../crawl
rootUrlDir = ../urls
threads = 10
depth = 3
topN = 50
Injector: starting
Injector: crawlDb: ../crawl/crawldb
Injector: urlDir: ../urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: ../crawl/segments/20090601180745
Generator: filtering: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting
Fetcher: segment: ../crawl/segments/20090601180745
Fetcher: threads: 10
QueueFeeder finished: total 3 records.
fetching http://www.intelink.gov/
fetching http://www.intelink.gov/blogs/
fetching http://www.intelink.gov/wiki/Main_Page
-finishing thread FetcherThread, activeThreads=9
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: ../crawl/crawldb
CrawlDb update: segments: [../crawl/segments/20090601180745]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: ../crawl/segments/20090601180757
Generator: filtering: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=1 - no more URLs to fetch.
LinkDb: starting
LinkDb: linkdb: ../crawl/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: file:/linsearchtools1o/oscrawlers/nutch-1.0/crawl/segments/20090601180745
LinkDb: done
Indexer: starting
Indexer: done
Dedup: starting
Dedup: adding indexes in: ../crawl/indexes
Dedup: done
merging indexes to: ../crawl/index
Adding file:/linsearchtools1o/oscrawlers/nutch-1.0/crawl/indexes/part-0
done merging
crawl finished: ../crawl

I have a tail on my webserver log files and I see the robots.txt file requested with a 200, but nothing gets into the index. I see the error message "Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property." even though it is listed first. Any help given would be most appreciated.

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/people/Jake_Jacobson/622727274

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS
Re: Can Nutch crawler Impersonate user-agent?
The Allow directive in robots.txt is optional. If you don't have an explicit Disallow statement, it means that directory or file is available for indexing.

Jake Jacobson
http://www.linkedin.com/in/jakejacobson
http://www.facebook.com/people/Jake_Jacobson/622727274

Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter. -- ANONYMOUS

On Mon, Jun 1, 2009 at 2:46 PM, David M. Cole d...@colegroup.com wrote:

At 2:23 PM -0400 6/1/09, Jake Jacobson wrote:

User-agent: imo-robot-intelink
Disallow: /App_Themes/
Disallow: /app_themes/
Disallow: /Archive/
Disallow: /archive/
Disallow: /Bin/
Disallow: /bin/

Jake: I think you need to add one more line after the last line:

Allow: /

\dmc

--
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+
David M. Cole                         ...@colegroup.com
Editor & Publisher, NewsInc.          http://newsinc.net     V: (650) 557-2993
Consultant: The Cole Group            http://colegroup.com/  F: (650) 475-8479
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+
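A hypothetical robots.txt fragment makes the point concrete: with only Disallow rules present, any path not matched is implicitly allowed, so no trailing Allow line is needed.

User-agent: imo-robot-intelink
Disallow: /Archive/
# Under this rule, /Archive/report.html is blocked for that agent,
# while /wiki/Main_Page is implicitly allowed -- no "Allow: /" required.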