Re: Update: Robots cannot read JSP?

Leon Rosenberg Fri, 17 Feb 2006 07:10:50 -0800

wget www.theuniquepear.com
saves the welcome.do page. So it seems to work.
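
For anyone who wants to reproduce the curl vs. curl -L difference from Java, here is a minimal sketch. It does not talk to the real site: startDemoServer() is a hypothetical stand-in that answers "/" with a 302 and "/unique/welcome.do" with a 200, and the class and method names are my own. It only shows that a client which follows the Location header ends up on the page, while one that does not stops at the redirect.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class RedirectCheck {

    // Fetches the URL and returns the HTTP status code the client ends up with.
    // followRedirects=false is roughly plain curl, true is roughly curl -L.
    static int fetchStatus(String url, boolean followRedirects) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setInstanceFollowRedirects(followRedirects);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical stand-in for the site (an assumption, not the real setup):
    // "/" answers 302 with a Location header, "/unique/welcome.do" answers 200.
    static HttpServer startDemoServer() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                if ("/unique/welcome.do".equals(exchange.getRequestURI().getPath())) {
                    byte[] body = "<html><body>welcome</body></html>"
                            .getBytes(StandardCharsets.UTF_8);
                    exchange.sendResponseHeaders(200, body.length);
                    exchange.getResponseBody().write(body);
                } else {
                    exchange.getResponseHeaders().add("Location", "/unique/welcome.do");
                    exchange.sendResponseHeaders(302, -1); // -1 means no response body
                }
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startDemoServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            System.out.println("without following: " + fetchStatus(url, false));
            System.out.println("following:         " + fetchStatus(url, true));
        } finally {
            server.stop(0);
        }
    }
}
```

Whether a given robot follows the redirect is up to that robot; the sketch only demonstrates the client-side difference.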

Btw, I would suggest you change your mapping from .do to .html, or
switch from extension mapping to path mapping:
/unique/do/welcome instead of
/unique/welcome.do. For better indexing, replace 'do' with something
meaningful: "decor", "clocks", "lamps" - whatever is most important
on your page and what you want to be found under. Maybe a combination
of them; rumor has it that Google evaluates up to 5 'subfolders'.
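
To make the difference between the two mapping styles concrete, here is a rough sketch of how a Struts-style controller could derive the same action name from either URL form. It is a simplification under my own assumptions, not the framework's actual code: the class and method names are made up, "/decor" is a hypothetical keyword prefix, and the context path (/unique) is assumed to have been removed by the container already.

```java
final class ActionPathSketch {

    // Extension mapping (*.do): the action name is the servlet path with
    // the extension stripped, e.g. "/welcome.do" becomes "/welcome".
    static String fromExtensionMapping(String servletPath) {
        int slash = servletPath.lastIndexOf('/');
        int period = servletPath.lastIndexOf('.');
        // Only strip a dot that belongs to the last path segment.
        return (period > slash) ? servletPath.substring(0, period) : servletPath;
    }

    // Path mapping (/decor/*): the action name is whatever follows the
    // mapped prefix, e.g. "/decor/welcome" with prefix "/decor" becomes "/welcome".
    static String fromPathMapping(String requestPath, String prefix) {
        if (!requestPath.startsWith(prefix + "/")) {
            throw new IllegalArgumentException(requestPath + " is not under " + prefix);
        }
        return requestPath.substring(prefix.length());
    }

    public static void main(String[] args) {
        // Both URL styles resolve to the same action name.
        System.out.println(fromExtensionMapping("/welcome.do"));
        System.out.println(fromPathMapping("/decor/welcome", "/decor"));
    }
}
```

The point is that the action behind the URL stays the same; only the part of the URL the search engine sees changes.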


regards
leon

P.S. You may simply want to add n different mappings, one for each
keyword, but beware of delivering identical content under
different URLs. This would be considered spam and you'll be thrown out
of the index.





On 2/17/06, Scott Purcell <[EMAIL PROTECTED]> wrote:
> I started the below thread last weekend, and upon suggestions, I have
> changed some javascript redirects to get to my site, into some JSP
> redirects, based upon user input earlier this week.
>
> In a nutshell, I am trying to make sure that robots can index my web
> site.
> My web site is a Struts application and is deployed as the root app,
> which I configured to use a .jsp as its welcome-file. So when the user
> hits the url <www.theuniquepear.com> it goes to a JSP page, which then
> does a JSP redirect to www.theuniquepear.com/unique/welcome.do; from
> there, the way Struts is set up, the action finally forwards to the JSP.
>
> Due to my lack of robot understanding, if I use curl now, and just issue
>
> curl www.theuniquepear.com it shows nothing, and does not do the
> redirect.
> But if I hit curl -L www.theuniquepear.com all is good and it is what I
> want the robots to read.
>
> I made the change last Monday or so, and each day I check my access log
> and the only entry I see is the robots come in and get a 500 and they
> are gone. When I google for my site, nothing shows up.
>
> Does anyone know whether the robots follow redirects the way curl -L
> does, or do they behave like plain curl and never index my site? Also,
> what is really silly is that even this email will probably be found
> when I type in my url. Currently if one types 'the unique pear' into
> google, I see all the threads I started on this subject, but the site
> is never to be found ... not good for business.
>
> Any input would be appreciated.
>
> Thanks,
>
>
>
> -----Original Message-----
> From: Mike Sabroff [mailto:[EMAIL PROTECTED]
> Sent: Saturday, February 11, 2006 11:09 AM
> To: Tomcat Users List
> Subject: Re: Robots cannot read JSP?
>
> Scott,
> Your assessment is incorrect!  First off, curl doesn't read html pages,
> it does a get or post to a url just as though you clicked it in your
> browser (and a lot of other things you can do with curl). Second off, it
> is not the jsp that is the problem, it is the javascript as Tim said,
> and the lack of links.
>
> Mike
>
> David Smith wrote:
> > I doubt the problem is with curl not being able to read files other
> > than .htm or .html. The problem is only browsers execute javascript.
> > Think of curl or the search engines as a browser without javascript
> > enabled.  What would you get in IE or Firefox if you disabled
> > javascript?
> >
> > -- David
> >
> > Scott Purcell wrote:
> >> Tim,
> >> Thanks a lot for the info. I got to thinking, and tried invoking curl
> >> from my box on the url, and see exactly what you saw. The js screwing
> >> things up.
> >>
> >> So I decided to run curl on different pages, and I came to the
> >> conclusion that only htm, or html pages show up via curl?
> >>
> >> Does anyone think that the robots are just like curl, and that they
> >> can only read HTML files?
> >>
> >> Thanks for all, I know this is a bit off topic ...and I hope I don't
> >> hack anyone off.
> >>
> >> Thanks
> >> Scott
> >>
> >> -----Original Message-----
> >> From: Tim Funk [mailto:[EMAIL PROTECTED]
> >> Sent: Friday, February 10, 2006 8:50 PM
> >> To: Tomcat Users List
> >> Subject: Re: Access log to see where robots go.
> >>
> >> The problem is your home page, not robots.txt. When / is requested,
> >> the following is served back; notice the javascript redirect (the
> >> full file is below):
> >>
> >> ----
> >>    function invokeWebApp() {
> >>      top.location.href = "http://www.theuniquepear.com/unique/index.jsp";
> >>    }
> >> ----
> >> Search engines do not execute javascript, and there are no links on
> >> the page, so search engines have nowhere to go. (Except someone
> >> else's site).
> >>
> >> As much as I detest SEO companies, you might find it helpful to
> >> search for one for some assistance.
> >>
> >> <html>
> >> <head>
> >>    <head>
> >>      <title>The Unique Pear | Unique Home Decor & Accessories</title>
> >>      <meta name="description" content="The Unique Pear is an online
> >>        boutique specializing in home decor & accessories. Products
> >>        include clocks, candles, wall decor, garden, lighting, bath
> >>        and more.">
> >>      <meta name="keywords" content="The Unique Pear Timework clocks,
> >>        lamps, lamp shades, candles, aroma, aroma difuser, wall decor,
> >>        wall scounces, wrought iron, pitchers, bookstands, jaqua bath
> >>        products, candleholders">
> >>                  <meta name="description" content="">
> >> <meta name="keywords" content="">
> >>   </head>
> >> <body bgcolor="#FFFFFF">
> >>
> >> <script language = "javascript">
> >>    //<!--
> >>    function invokeWebApp() {
> >>      top.location.href = "http://www.theuniquepear.com/unique/index.jsp";
> >>    }
> >>    invokeWebApp();
> >>    // -->
> >> </script>
> >>
> >> hello
> >> </body>
> >> </html>
> >>
> >> -Tim
> >>
> >> Scott Purcell wrote:
> >>
> >>> I have had trouble getting search engines to see my site. I built it
> >>> with struts, and use some tags from the index.html page to get
> >>> business logic, to finally get to my page. The url is
> >>> http://www.theuniquepear.com
> >>>
> >>> Anyway, upon talking to some co-workers, they suggested I watch my
> >>> access log, so I can see what files they are indexing. I thought I
> >>> had the access log turned on for the site, and see when someone hits
> >>> my web site, but as far as the searchbots go, I only see this in my
> >>> logs daily.
> >>>
> >>> $ cat  localhost_access_log.2006-02-07.txt | less
> >>> 67.15.16.30 - - [07/Feb/2006:03:44:55 -0600] "GET /robots.txt HTTP/1.0" 404 985
> >>> 67.15.16.30 - - [07/Feb/2006:03:46:21 -0600] "GET / HTTP/1.0" 200 844
> >>> 67.15.16.30 - - [07/Feb/2006:03:51:57 -0600] "GET /robots.txt HTTP/1.0" 404 985
> >>> 62.114.208.233 - - [07/Feb/2006:03:52:42 -0600] "GET /unique/welcome.do?OVRAW=home%20decorating%20ideas&OVKEY=home
> >>> 62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/includes/siteWide.css HTTP/1.1" 200 15402
> >>> 62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/images/header_pear.jpg HTTP/1.1" 200 11227
> >>>
> >>> I see the entry for robots.txt, but I have no idea where they are
> >>> going, or what they are doing.
> >>>
> >>> I turned on the access log in server.xml like so:
> >>>         <Valve className="org.apache.catalina.valves.AccessLogValve"
> >>>                  directory="logs"  prefix="localhost_access_log."
> >>>                  suffix=".txt"
> >>>                  pattern="common" resolveHosts="false"/>
> >>>
> >>> And that is a snippet of the log from above.
> >>>
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> For additional commands, e-mail: [EMAIL PROTECTED]
> >>
> >>
> >
> >
> >
>
> --
> Mike Sabroff
> Web Services Developer
> [EMAIL PROTECTED]
> 920-568-8379
>
>
>
