On 01/08/2012 03:53 PM, Guillaume Fenollar wrote:
Hi Kaya,

Yes, if you don't use a front web server (i.e. Apache or nginx), you should
put robots.txt directly into the /ROOT directory of Tomcat (if it listens
on port 80). After that, you can simply test your setup by trying to reach
http://youdomain.org/robots.txt. If you can't find it this way, bots won't
find it either.
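
The reachability test described above can also be scripted. The following is only a minimal sketch using Python's standard library; the function names robots_url and robots_is_reachable are made up for illustration, and the site root you pass in is whatever your own domain is.

```python
from urllib.error import URLError
from urllib.parse import urljoin
from urllib.request import urlopen


def robots_url(site_root):
    """Return the robots.txt URL at the root of the given site."""
    # The leading slash forces the path to the server root, which is
    # where crawlers look for robots.txt.
    return urljoin(site_root, "/robots.txt")


def robots_is_reachable(site_root, opener=urlopen):
    """Return True if fetching robots.txt answers with HTTP 200.

    `opener` is a hypothetical hook so the check can be exercised
    without touching the network; by default it is urllib's urlopen.
    """
    try:
        with opener(robots_url(site_root)) as response:
            return response.status == 200
    except (URLError, OSError):
        # 4xx/5xx responses, DNS failures and refused connections land here.
        return False
```

Calling robots_is_reachable("http://www.example.org") returns False when the file is missing or the server does not answer, which is the same outcome a crawler would see.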

Thanks for the response Guillaume!

I found a site: http://www.frobee.com/robots-txt-check

which actually tests the compliance of the robots.txt, and it seems mine are fine.


Concerning the disallow directives, it is your choice what you let the bots
index. My advice would be to make an inventory of the spaces and actions you
don't want indexed.
You could take this one as an example: http://cdlsworld.xwiki.com/robots.txt
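
To see how a set of disallow directives would be applied, the rules can be checked offline with Python's standard urllib.robotparser module. This is only a sketch: the two Disallow paths and the example.org URLs below are assumptions chosen for illustration, not the contents of the robots.txt linked above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep bots out of edit and export actions,
# leave ordinary page views open to indexing.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /xwiki/bin/edit/
Disallow: /xwiki/bin/export/
"""


def build_parser(robots_text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)
view_url = "http://example.org/xwiki/bin/view/Main/WebHome"
edit_url = "http://example.org/xwiki/bin/edit/Main/WebHome"
print(parser.can_fetch("Googlebot", view_url))  # view pages stay crawlable
print(parser.can_fetch("Googlebot", edit_url))  # edit action is blocked
```

Running the inventory of unwanted spaces and actions through a check like this before deploying makes it easier to spot a rule that blocks more than intended.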

I took a look at it and will compare it to the example on the XWiki site.


Finally, it's funny you're asking whether bots could harass your server,
because almost everyone wants them (except for bad robots) to come and
index their websites :-)
Anyway, I don't think robots will take up a significant amount of traffic.
But the users who find your content through search engines will ;-) I
guess that's what you want.

It's not that I don't want things to be indexed or viewed, but I am getting a strange issue on one of my XWiki sites. Whenever I load the site, i.e. start Tomcat, the memory usage is really low (~600MB); then after a while the CPU starts working a little (~10%) and the memory consumed by the process jumps up to 1.6GB. There's not much on that site to begin with: my wiki site has more information, images, etc. than this site, which is my www site, yet the www site is consuming far more memory.

I'm not really sure how to even begin debugging, as I have both Webalizer and AWStats working on my reverse Squid proxy in front of Tomcat. So far AWStats, which has been running from the beginning (3rd Jan this year), shows nearly 9000 hits :-S, a lot of which come from Googlebot.
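
One way to put a number on the crawler share is to tally hits per user agent straight from the access log. The sketch below assumes the proxy writes log lines in the Apache "combined" format, where the User-Agent is the last quoted field; that format, the bot names in KNOWN_BOTS, and the function names are all assumptions for illustration rather than details of this setup.

```python
import re
from collections import Counter

# In the combined log format the User-Agent is the final quoted field.
USER_AGENT_PATTERN = re.compile(r'"([^"]*)"\s*$')

# Illustrative list only; extend it with whatever crawlers show up.
KNOWN_BOTS = ("Googlebot", "bingbot", "Baiduspider")


def classify(line):
    """Return the matching bot name, 'other', or 'unparsed' for one log line."""
    match = USER_AGENT_PATTERN.search(line)
    if match is None:
        return "unparsed"
    user_agent = match.group(1).lower()
    for bot in KNOWN_BOTS:
        if bot.lower() in user_agent:
            return bot
    return "other"


def count_hits(lines):
    """Tally hits per category across an iterable of log lines."""
    return Counter(classify(line) for line in lines)
```

Feeding it an open log file, for example count_hits(open(path)) with path pointing at the access log, gives a per-crawler hit count that can be compared against the total reported by the statistics tools.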

That was my only issue.

The URLs of both sites are here:


http://www.optiplex-networks.com

http://wiki.optiplex-networks.com


and footprints are shown here:

  PID JID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
51547  22 www       46  44    0  3545M  1590M ucond  1   6:04  0.00% java
28878  14 www       49  44    0  3544M   404M ucond  0   3:47  0.00% java


with JID 14 being the wiki site and JID 22 being the www site.


Regards,


Kaya
