NullPointerExceptions in Fetch

2009-05-01 Thread tsmori
I'm having an interesting problem that I think revolves around the interplay of a few settings whose effect on the crawl I'm not really clear on. Currently I have: content.limit = -1, fetcher.threads = 1000, fetcher.threads.per.host = 100, indexer.max.tokens = 75. I also increased the JAVA

Nutch randomly skipping locations during crawl

2009-10-01 Thread tsmori
This is strange. I manage the webservers for a large university library. On our site we have a staff directory where each user has a location for information. The URLs take the form of: http://mydomain.edu/staff/userid I've added the staff URL to the urls seed file. But even with a crawl set to

RE: Nutch randomly skipping locations during crawl

2009-10-01 Thread tsmori
Both good ideas. Unfortunately, the content for each user is the same. It's a static PHP file that simply pulls information from our LDAP. It's very strange because I cannot see any difference between the user files/directories that are fetched and those that aren't. In checking both the crawl

Re: [VOTE] Apache Nutch 1.1 Release Candidate #1

2010-04-07 Thread tsmori
I'm not sure what exactly changed that made all my nullpointer errors go away, but I'm grateful for it, whatever it was. So, +1 from me, not that I'm even sure I get a vote in the matter, but if it's open to anyone on the list, I'm on board.

Weird crawl issue. Nutch picking up drop-down menu options.

2010-04-15 Thread tsmori
I have an old page on my site that Nutch is fetching. The results in the Nutch web app look like this: Site Map ... INSECT SYSTEMATIC RESOURCES Home : Site Map search Resources by Scientific Name ... Common Name Select NameAlderfliesAntsAntlionsAphidsBarkliceBeesBeetlesBookliceBristletailsBugs