RE: Robots cannot read JSP?
It's not the HTML-or-JSP nature of things. You are returning text/html as the MIME type, and a real HTML document. The problem is that the content you return does not give the robots any place to go. Perhaps responding with a redirect (302) will give them somewhere to go. You can use a meta refresh, or logic:redirect, or, if Tomcat is front-ended with Apache, just add a line like this:

RedirectMatch ^/$ /unique/index.jsp

HTH,
Tim

-----Original Message-----
From: Scott Purcell [mailto:[EMAIL PROTECTED]]
Sent: Saturday, February 11, 2006 9:34 AM
To: Tomcat Users List
Subject: Robots cannot read JSP?

Tim,

Thanks a lot for the info. I got to thinking, tried invoking curl from my box on the URL, and saw exactly what you saw: the JS is screwing things up. So I decided to run curl on different pages, and I came to the conclusion that only .htm or .html pages show up via curl? Does anyone think that the robots are just like curl, and that they can only read HTML files?

Thanks for everything. I know this is a bit off topic... and I hope I don't hack anyone off.

Thanks,
Scott

-----Original Message-----
From: Tim Funk [mailto:[EMAIL PROTECTED]]
Sent: Friday, February 10, 2006 8:50 PM
To: Tomcat Users List
Subject: Re: Access log to see where robots go.

The problem is your home page, not robots.txt. When / is requested, the following is served back; notice the JavaScript redirect (the full file is below):

function invokeWebApp() {
  top.location.href = "http://www.theuniquepear.com/unique/index.jsp";
}

Search engines do not execute JavaScript, and there are no links on the page, so search engines have nowhere to go (except someone else's site). As much as I detest SEO companies, you might find it helpful to search for one for some assistance.

<html>
<head>
<head>
<title>The Unique Pear | Unique Home Decor Accessories</title>
<meta name="description" content="The Unique Pear is an online boutique specializing in home decor accessories. Products include clocks, candles, wall decor, garden, lighting, bath and more.">
<meta name="keywords" content="The Unique Pear Timework clocks, lamps, lamp shades, candles, aroma, aroma difuser, wall decor, wall scounces, wrought iron, pitchers, bookstands, jaqua bath products, candleholders">
<meta name="description" content="">
<meta name="keywords" content="">
</head>
<body bgcolor="#FF">
<script language="javascript">
<!--
function invokeWebApp() {
  top.location.href = "http://www.theuniquepear.com/unique/index.jsp";
}
invokeWebApp();
// -->
</script>
hello
</body>
</html>

-Tim

Scott Purcell wrote:
I have had trouble getting search engines to see my site. I built it with Struts, and use some tags from the index.html page to get business logic, to finally get to my page. The URL is http://www.theuniquepear.com

Anyway, upon talking to some co-workers, they suggested I watch my access log so I can see what files they are indexing. I thought I had the access log turned on for the site, and I see when someone hits my web site, but as far as the search bots go, I only see this in my logs daily:

$ cat localhost_access_log.2006-02-07.txt | less
67.15.16.30 - - [07/Feb/2006:03:44:55 -0600] "GET /robots.txt HTTP/1.0" 404 985
67.15.16.30 - - [07/Feb/2006:03:46:21 -0600] "GET / HTTP/1.0" 200 844
67.15.16.30 - - [07/Feb/2006:03:51:57 -0600] "GET /robots.txt HTTP/1.0" 404 985
62.114.208.233 - - [07/Feb/2006:03:52:42 -0600] "GET /unique/welcome.do?OVRAW=home%20decorating%20ideasOVKEY=home
62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/includes/siteWide.css HTTP/1.1" 200 15402
62.114.208.233 - - [07/Feb/2006:03:52:44 -0600] "GET /unique/images/header_pear.jpg HTTP/1.1" 200 11227

I see the entry for robots.txt, but I have no idea where they are going or what they are doing. I turned on the access log in server.xml like so:

<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs" prefix="localhost_access_log." suffix=".txt" pattern="common" resolveHosts="false"/>

And that is a snippet of the log from above.
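To see where a crawler actually went, it helps to group the access log by client. With pattern="common", AccessLogValve writes host, ident, user, [time], the quoted request line, status and bytes. The class below is a minimal sketch under that assumption; the names AccessLogSummary, parse and pathsByHost are hypothetical, and lines that do not match the common layout (such as a truncated entry) are simply skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: summarize a Tomcat access log written with pattern="common". */
public class AccessLogSummary {

    // host ident user [time] "request line" status bytes
    private static final Pattern COMMON = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)$");

    /** The fields of one log line that matter for tracking a crawler. */
    public static final class Entry {
        public final String host;
        public final String method;
        public final String path;
        public final int status;

        Entry(String host, String method, String path, int status) {
            this.host = host;
            this.method = method;
            this.path = path;
            this.status = status;
        }
    }

    /** Parses one line; returns null if it is not in common log format. */
    public static Entry parse(String line) {
        Matcher m = COMMON.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        // The request line looks like "GET /robots.txt HTTP/1.0".
        String[] request = m.group(5).split(" ");
        String method = request.length > 0 ? request[0] : "";
        String path = request.length > 1 ? request[1] : "";
        return new Entry(m.group(1), method, path, Integer.parseInt(m.group(6)));
    }

    /** Groups "path -> status" by client host, keeping log order. */
    public static Map<String, List<String>> pathsByHost(List<String> lines) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String line : lines) {
            Entry e = parse(line);
            if (e != null) {
                result.computeIfAbsent(e.host, h -> new ArrayList<>())
                      .add(e.path + " -> " + e.status);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "67.15.16.30 - - [07/Feb/2006:03:44:55 -0600] \"GET /robots.txt HTTP/1.0\" 404 985",
            "67.15.16.30 - - [07/Feb/2006:03:46:21 -0600] \"GET / HTTP/1.0\" 200 844");
        pathsByHost(sample).forEach((host, paths) -> System.out.println(host + " " + paths));
    }
}
```

Read this way, the sample log shows the crawler at 67.15.16.30 fetching robots.txt and / and then stopping, which fits the diagnosis that the home page offers no followable link.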
- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
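Tim's suggestion is to answer / with a real HTTP redirect, which crawlers follow, rather than a page whose only exit is a script. In Tomcat itself that would typically be HttpServletResponse.sendRedirect(...), which sends a 302. So the sketch can run without a servlet container, it instead uses the JDK's built-in com.sun.net.httpserver to show the same exchange; the class name RootRedirect is hypothetical, and the target path is taken from the thread.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;

/** Sketch: answer "/" with a 302 and a Location header instead of a JS redirect. */
public class RootRedirect {

    // Target path from the thread; adjust for your own webapp context.
    static final String TARGET = "/unique/index.jsp";

    static void handle(HttpExchange exchange) throws IOException {
        if ("/".equals(exchange.getRequestURI().getPath())) {
            exchange.getResponseHeaders().set("Location", TARGET);
            exchange.sendResponseHeaders(302, -1); // -1 means no response body
        } else {
            exchange.sendResponseHeaders(404, -1);
        }
        exchange.close();
    }

    /** Starts the server on localhost; port 0 picks any free port. */
    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/", RootRedirect::handle);
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

Requesting / with curl -i (no -L) against this server should show the 302 status and the Location header, which is exactly what a crawler needs to find the next page.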
Re: Robots cannot read JSP?
I doubt the problem is curl being unable to read files other than .htm or .html. The problem is that only browsers execute JavaScript. Think of curl or the search engines as a browser with JavaScript disabled. What would you get in IE or Firefox if you disabled JavaScript?

--
David
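David's "browser with JavaScript disabled" view can be approximated by looking only at anchor href attributes in the raw HTML. This is a rough sketch: the class name VisibleLinks and the anchor text "Enter the store" are hypothetical, and a regex stands in for the proper HTML parser a real crawler would use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: the links a crawler that ignores JavaScript could follow. */
public class VisibleLinks {

    // Matches <a ... href="..."> or href='...'; case-insensitive.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*href\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    public static List<String> links(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Abbreviated version of the home page quoted in the thread:
        // the only way out is a script, which is never run.
        String home = "<html><body><script language=\"javascript\">"
                + "top.location.href = \"http://www.theuniquepear.com/unique/index.jsp\";"
                + "</script>hello</body></html>";
        System.out.println(links(home)); // prints []

        // A plain link (hypothetical anchor text) gives the crawler somewhere to go.
        String fixed = home.replace("hello",
                "<a href=\"/unique/index.jsp\">Enter the store</a>");
        System.out.println(links(fixed)); // prints [/unique/index.jsp]
    }
}
```

The script's URL never shows up as a link, which is why the crawler stops at /.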
Re: Robots cannot read JSP?
Scott,

Your assessment is incorrect! First off, curl doesn't "read" HTML pages; it does a GET or POST to a URL, just as though you had clicked it in your browser (and there are a lot of other things you can do with curl). Second, it is not the JSP that is the problem; it is the JavaScript, as Tim said, and the lack of links.

Mike

--
Mike Sabroff
Web Services Developer
[EMAIL PROTECTED]
920-568-8379
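Mike's point that curl simply issues a GET can be illustrated with a curl-like fetch in plain Java: one request, no redirects followed, no scripts run. The client sees only the status, the Content-Type header and the raw body, so a .jsp URL answered with text/html is indistinguishable from a static .html file. The class name PlainGet is hypothetical, and decoding the body as UTF-8 is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

/** Sketch of a curl-style fetch: a single GET, no redirects, no JavaScript. */
public class PlainGet {

    public static final class Result {
        public final int status;
        public final String contentType;
        public final String body;

        Result(int status, String contentType, String body) {
            this.status = status;
            this.contentType = contentType;
            this.body = body;
        }
    }

    public static Result get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setInstanceFollowRedirects(false); // like curl without -L
        int status = conn.getResponseCode();
        String contentType = conn.getContentType();
        // 4xx/5xx bodies arrive on the error stream, which may be null.
        InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        if (in != null) {
            try (InputStream s = in) {
                byte[] chunk = new byte[4096];
                int n;
                while ((n = s.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
            }
        }
        conn.disconnect();
        return new Result(status, contentType, buf.toString("UTF-8")); // assumes UTF-8
    }

    public static void main(String[] args) throws IOException {
        String url = args.length > 0 ? args[0] : "http://localhost:8080/";
        Result r = get(url);
        System.out.println(r.status + " " + r.contentType);
        System.out.println(r.body);
    }
}
```

Pointed at the home page from the thread, this would print the HTML with the script untouched, the same thing curl showed Scott.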