Mark,

You have a web crawler (spider/bot) hitting your site; this one happens to be written in Java.
Why? It could simply be human or programming error. It could be an email harvester of some sort (but as your links are NPR related, I don't think this is it). Perhaps someone is attempting to download a site for offline viewing and has the wrong domain. There are hundreds of possibilities I suppose, some even nefarious.

I'm pretty sure you can block bots using your .htaccess file. You can also use a robots.txt file, but a lot of bots don't follow the rules.

I'd probably start by getting a list of all the open source Java spiders. I've never had to do this, so Google is your friend.

Regards,
Kaffeen

On Thu, Jun 3, 2010 at 5:29 PM, Mark Phillip <[email protected]> wrote:
> Evening folks,
>
> I have pretty high expectations for the Refresh Austin list whenever I have
> a tough question, but I might have found one stump-worthy.
>
> A couple months ago I started seeing requests in my web server access log
> for "/ombudsman". I don't have an Ombudsman page, so it returned a 404.
> Digging a little deeper, the same IP was repeatedly searching for the same
> set of non-existent pages on my site:
>
> /about/privacypolicy.html
> /about/termsofuse.html
> /audiohelp/progstream.html
> /blogs
> /corrections
> /email
> /help
> /help/communityfaq.html
> /music
> /ombudsman
> /podcast
>
> After a bit more digging, I realized that it wasn't coming from just one IP
> address. Turns out there are dozens of IP addresses all requesting the same
> non-existent URLs. Each IP is scattered across the globe without any common
> thread. The only user-agent listed in each request is a member of the
> "Java/1.6.0" family.
>
> I am 100% stumped on this one. All Googling for community-sourced
> Java-based search spiders comes up completely empty.
>
>
> Any thoughts? Solve this and I'll buy you a beer on Tuesday.
>
> Thanks,
> Mark
> http://markphillip.com
>
> --
> Our Web site: http://www.RefreshAustin.org/
>
> You received this message because you are subscribed to the Google Groups
> "Refresh Austin" group.
>
> [ Posting ]
> To post to this group, send email to [email protected]
> Job-related postings should follow http://tr.im/refreshaustinjobspolicy
> We do not accept job posts from recruiters.
>
> [ Unsubscribe ]
> To unsubscribe from this group, send email to
> [email protected]<refresh-austin%[email protected]>
>
> [ More Info ]
> For more options, visit this group at
> http://groups.google.com/group/Refresh-Austin
>

--
If you understand, things are just as they are. If you do not understand, things are just as they are.
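
Before blocking anything, it may help to confirm the pattern described in this thread (many IPs, the same missing URLs, a "Java/" user-agent) by tallying the 404s in the access log. What follows is only a rough sketch. It assumes an Apache-style "combined" log format, which the thread never actually specifies, and the function name `java_404s` and the regular expression are made up for illustration rather than taken from any library.

```python
import re
from collections import defaultdict

# ASSUMPTION: Apache "combined" log format, e.g.
#   1.2.3.4 - - [date] "GET /path HTTP/1.1" 404 209 "referer" "user-agent"
# Adjust the pattern if your server logs in a different layout.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def java_404s(lines, agent_prefix="Java/"):
    """Map each missing path to the set of IPs that requested it
    with a user-agent starting with agent_prefix."""
    hits = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        is_404 = match.group("status") == "404"
        is_java = match.group("agent").startswith(agent_prefix)
        if is_404 and is_java:
            hits[match.group("path")].add(match.group("ip"))
    return dict(hits)
```

To try it, pass an open log file, for example `java_404s(open("access.log"))` (the file name is a placeholder). If the same handful of paths shows up with dozens of distinct IPs, that matches what Mark reported, and a user-agent rule is likely to be more practical than blocking individual addresses.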
