Hey folks, thanks for all the thoughts.

My hope was that someone would recognize this Java-based
s...@home-esque spider, but there just might not be an answer as to the
source of it all.

See everybody Tuesday,
Mark
http://markphillip.com

On Fri, Jun 4, 2010 at 2:05 PM, Brandtley <[email protected]> wrote:

> Hey Mark,
>
> I recently read about this on a design blog while developing a client's
> mobile site; the blog had an article about requests like these.
>
>
> http://perishablepress.com/press/2010/04/26/stop-404-requests-for-mobile-versions-of-your-site/
>
> According to the article, these bots simply spider your site for
> page listings of various sorts and attempt to harvest data from them.
>
> The article also shows a couple of .htaccess techniques for stopping
> 404 requests like these.
>
> Hope this helps.
>
> - Brandtley McMinn
> http://giggleboxstudios.net
>
> On Jun 3, 5:29 pm, Mark Phillip <[email protected]> wrote:
> > Evening folks,
> >
> > I have pretty high expectations for the Refresh Austin list whenever I
> > have a tough question, but I might have found a stump-worthy one.
> >
> > A couple of months ago I started seeing requests in my web server access
> > log for "/ombudsman". I don't have an Ombudsman page, so the server
> > returned a 404. Digging a little deeper, I found the same IP repeatedly
> > requesting the same set of non-existent pages on my site:
> >
> > /about/privacypolicy.html
> > /about/termsofuse.html
> > /audiohelp/progstream.html
> > /blogs
> > /corrections
> > /email
> > /help
> > /help/communityfaq.html
> > /music
> > /ombudsman
> > /podcast
> >
> > After a bit more digging, I realized the requests weren't coming from
> > just one IP address. It turns out there are dozens of IP addresses, all
> > requesting the same non-existent URLs. The IPs are scattered across the
> > globe without any common thread. The only user-agent listed in each
> > request is a member of the "Java/1.6.0" family.
> >
> > I am 100% stumped on this one. Every Google search for community-sourced,
> > Java-based search spiders comes up completely empty.
> >
> > Any thoughts?  Solve this and I'll buy you a beer on Tuesday.
> >
> > Thanks,
> > Mark
> > http://markphillip.com
>
> --
> Our Web site: http://www.RefreshAustin.org/
>
> You received this message because you are subscribed to the Google Groups
> "Refresh Austin" group.
>
> [ Posting ]
> To post to this group, send email to [email protected]
> Job-related postings should follow http://tr.im/refreshaustinjobspolicy
> We do not accept job posts from recruiters.
>
> [ Unsubscribe ]
> To unsubscribe from this group, send email to
> [email protected]
>
> [ More Info ]
> For more options, visit this group at
> http://groups.google.com/group/Refresh-Austin
>
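For anyone who wants to repeat the log digging described in the original question, here is a minimal Python sketch. It scans access-log lines for 404 responses whose user-agent starts with "Java/1.6.0", groups the missed paths by client IP, and then reports the paths that every such IP requested. It assumes the server writes Apache's "combined" log format. The helper names (suspicious_404s, shared_paths) and the sample log lines are hypothetical, made up for illustration, and are not taken from Mark's actual logs. Java's built-in HTTP client sends a User-Agent of the form Java/<version> by default, which is why a bare "Java/1.6.0_xx" agent usually points to a script rather than a browser.

```python
import re
from collections import defaultdict

# Assumption: Apache "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def suspicious_404s(lines, agent_prefix="Java/1.6.0"):
    """Hypothetical helper: map each client IP to the set of paths it
    requested that returned 404, counting only requests whose user-agent
    starts with agent_prefix (so Java/1.6.0_17, Java/1.6.0_20, ... all match)."""
    hits = defaultdict(set)
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        if m["status"] == "404" and m["agent"].startswith(agent_prefix):
            hits[m["ip"]].add(m["path"])
    return dict(hits)


def shared_paths(hits):
    """Hypothetical helper: the paths that every IP in `hits` asked for."""
    path_sets = list(hits.values())
    return set.intersection(*path_sets) if path_sets else set()


if __name__ == "__main__":
    # Made-up sample lines using documentation IP ranges, not real log data.
    sample = [
        '203.0.113.5 - - [03/Jun/2010:17:29:00 -0500] "GET /ombudsman HTTP/1.1" 404 209 "-" "Java/1.6.0_17"',
        '198.51.100.7 - - [03/Jun/2010:17:30:12 -0500] "GET /ombudsman HTTP/1.1" 404 209 "-" "Java/1.6.0_20"',
        '198.51.100.7 - - [03/Jun/2010:17:30:13 -0500] "GET /podcast HTTP/1.1" 404 209 "-" "Java/1.6.0_20"',
        '192.0.2.1 - - [03/Jun/2010:17:31:00 -0500] "GET /ombudsman HTTP/1.1" 404 209 "-" "Mozilla/5.0"',
        '203.0.113.5 - - [03/Jun/2010:17:32:00 -0500] "GET / HTTP/1.1" 200 5120 "-" "Java/1.6.0_17"',
    ]
    hits = suspicious_404s(sample)
    for ip, paths in sorted(hits.items()):
        print(ip, sorted(paths))
    print("requested by every IP:", sorted(shared_paths(hits)))
    # prints: requested by every IP: ['/ombudsman']
```

Matching on the user-agent prefix rather than the exact string catches the whole "Java/1.6.0" family across update numbers; the regex would need adjusting if your server logs in a different format.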
