Re: [xwiki-users] Usage and placement of robots.txt file
On Jan 8, 2012, at 10:03 PM, Kaya Saman wrote:

> As an XWiki user, a while back I wrote a howto on installing XWiki with Tomcat and PostgreSQL on FreeBSD. I just can't remember where the 'user space' to put it was, as it's been a while since I last worked with XWiki.

I know that we have some external tutorials listed here: http://platform.xwiki.org/xwiki/bin/view/AdminGuide/Installation#HTutorials

> I would like to write some documentation about migrating XWiki from server to server or across platforms, as I've gone from FreeBSD over to Nexenta Core 3 with GlassFish v3 in the past and recently over to Fedora 11. I've got a lot of time on my hands and would like to contribute...

Hey, that's great! Feel free to ask specific questions if you need help knowing where to put things. You could start by adding the content of this thread that you started to the place on xwiki.org where we explain how to use robots.txt.

Thanks
-Vincent
Re: [xwiki-users] Usage and placement of robots.txt file
On 01/09/2012 11:49 AM, Vincent Massol wrote:

> I know that we have some external tutorials listed here: http://platform.xwiki.org/xwiki/bin/view/AdminGuide/Installation#HTutorials

I managed to find my old howto - it's still stored as a draft :-) I googled "freebsd xwiki": http://dev.xwiki.org/xwiki/bin/view/Drafts/BSD_Install

Can I get this onto the 'real' wiki?

P.S. The second Google result goes directly to my personal wiki site, where I put all of the XWiki material I'm working on - so it seems I managed to crack two eggs for the price of one!

> You could start by adding the content of this thread that you started to the place on xwiki.org where we explain how to use robots.txt.

I'll have a look at updating the robots.txt content today. I will also put some migration material in the drafts section today, as that's quite in-depth. Additionally, I will add to my draft on containing Java heap space errors, PermGen space allocation, and my custom scripts to bring Tomcat back online after being killed by the OS when the system runs out of memory.

As said, I'm unemployed at the moment, so I have waaay too much time :-) Maybe some kind person could donate some virtual bananas to keep my energy up while XWiki'ing the wiki ;-P

Regards,
Kaya
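For anyone wondering what such a restart watchdog could look like, here is a minimal sketch run from cron; it is not the actual script mentioned above (which isn't shown in the thread), and the CATALINA_HOME path and process pattern are assumptions to adjust for your install:

    #!/bin/sh
    # Minimal Tomcat watchdog sketch: restart the JVM if it has disappeared,
    # e.g. after being killed by the OS when memory ran out.
    # CATALINA_HOME and the pgrep pattern are assumptions, not from this thread.
    CATALINA_HOME=/usr/local/apache-tomcat-7.0
    if ! pgrep -f 'org.apache.catalina.startup.Bootstrap' > /dev/null; then
        echo "$(date): Tomcat down, restarting" >> /var/log/tomcat-watchdog.log
        "$CATALINA_HOME/bin/startup.sh"
    fi

A crontab entry such as "*/5 * * * * /usr/local/sbin/tomcat-watchdog.sh" would run the check every five minutes.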
Re: [xwiki-users] Usage and placement of robots.txt file
Hi Kaya, here you go: http://www.businesspundit.com/wp-content/uploads/2008/10/banana1.jpg

Looking forward to the FreeBSD documentation,

Guillaume
Re: [xwiki-users] Usage and placement of robots.txt file
On 01/09/2012 03:31 PM, Guillaume Lerouge wrote:

> Hi Kaya, here you go: http://www.businesspundit.com/wp-content/uploads/2008/10/banana1.jpg

Thanks!! :-)

I appended my first doc with a section at the end called "Memory Errors": http://dev.xwiki.org/xwiki/bin/view/Drafts/BSD_Install

That doc is now complete and should be taken over to the main wiki, so if I can get permission to do so I will move it.

Next up: robots.txt
Re: [xwiki-users] Usage and placement of robots.txt file
Hi Kaya,

Yes, if you don't use any front webserver (i.e. Apache or nginx), you should put robots.txt directly into the /ROOT directory of Tomcat (if it is the instance listening on port 80). After that, you can simply test your setup by trying to fetch http://yourdomain.org/robots.txt. If you can't reach it that way, bots won't find it either.

Concerning the disallow directives, it is your choice what you let the bots index. My advice would be to make an inventory of the spaces and actions you don't want indexed. You could take this one as an example: http://cdlsworld.xwiki.com/robots.txt

Finally, it's funny that you're asking whether bots could harass your server, because almost everyone wants them (bad robots excepted) to come and index their websites :-) Anyway, I don't think robots account for a remarkable amount of traffic. But the users who find your content through search engines will ;-) I guess that's what you want.

Regards,

--
Guillaume Fenollar
XWiki SysAdmin
Tel : +33 (0)1.83.62.65.97
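Concretely, the placement and check described above might look like the following on a standalone Tomcat; the version-numbered install path is illustrative, not from the thread:

    # Drop the file into the ROOT webapp of the Tomcat instance serving port 80:
    cp robots.txt /usr/local/apache-tomcat-7.0/webapps/ROOT/robots.txt

    # Then confirm it is reachable the same way a crawler would fetch it:
    curl http://yourdomain.org/robots.txt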
Re: [xwiki-users] Usage and placement of robots.txt file
On 01/08/2012 03:53 PM, Guillaume Fenollar wrote:

> Yes, if you don't use any front webserver (i.e. Apache or nginx), you should put robots.txt directly into the /ROOT directory of Tomcat (if it is the instance listening on port 80). After that, you can simply test your setup by trying to fetch http://yourdomain.org/robots.txt. If you can't reach it that way, bots won't find it either.

Thanks for the response Guillaume! I found a site, http://www.frobee.com/robots-txt-check, which tests the compliance of a robots.txt file, and it seems mine are fine.

> Concerning the disallow directives, it is your choice what you let the bots index. My advice would be to make an inventory of the spaces and actions you don't want indexed. You could take this one as an example: http://cdlsworld.xwiki.com/robots.txt

I took a look at it and will compare it to the example from the XWiki site.

> Finally, it's funny that you're asking whether bots could harass your server, because almost everyone wants them (bad robots excepted) to come and index their websites :-)

It's not that I don't want things to be indexed or viewed, but I am getting a strange issue on one of my XWiki sites: whenever I load the site, i.e. start Tomcat, the memory usage is really low, ~600MB; then after a while the CPU starts working a little, ~10%, and the memory consumed by the process jumps up to 1.6GB. There's not much on that site to begin with - my wiki site has more information, images, etc. than this one, which is my www site, yet the www site consumes far more memory.

I'm not really sure how to even begin debugging, as I have both Webalizer and AWStats running on my reverse Squid proxy in front of Tomcat. So far AWStats, which has been working from the beginning (3rd Jan this year), shows nearly 9000 hits :-S, out of which a lot come from Googlebot. That was my only issue.

The URLs of both sites are here:

http://www.optiplex-networks.com
http://wiki.optiplex-networks.com

and their footprints are shown here (top output):

      PID  JID USERNAME  THR PRI NICE  SIZE    RES STATE  C   TIME   WCPU COMMAND
    51547   22 www        46  44    0 3545M  1590M ucond  1   6:04  0.00% java
    28878   14 www        49  44    0 3544M   404M ucond  0   3:47  0.00% java

with JID 14 being the wiki. site and JID 22 being the www. site.

Regards,
Kaya
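One note on the numbers above: a JVM will generally keep growing its heap toward the configured -Xmx ceiling as traffic (crawler or otherwise) arrives, so a climb from ~600MB to ~1.6GB resident is not necessarily a leak. A minimal sketch of bounding this via Tomcat's setenv.sh mechanism follows; the values are examples, not taken from this setup:

    # $CATALINA_HOME/bin/setenv.sh - sourced by catalina.sh at startup.
    # Example ceilings only; size them to the machine's actual RAM.
    JAVA_OPTS="$JAVA_OPTS -Xms256m -Xmx1024m -XX:MaxPermSize=192m"
    export JAVA_OPTS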
Re: [xwiki-users] Usage and placement of robots.txt file
Hi Guys,

It would be great if you could update the existing wiki documentation with the information in this thread, since apparently it wasn't good enough in the first place :)

Thanks!
-Vincent
Re: [xwiki-users] Usage and placement of robots.txt file
On 01/08/2012 10:06 PM, Vincent Massol wrote:

> It would be great if you could update the existing wiki documentation with the information in this thread, since apparently it wasn't good enough in the first place :)

That's an interesting point Vincent!

As an XWiki user, a while back I wrote a howto on installing XWiki with Tomcat and PostgreSQL on FreeBSD. I just can't remember where the 'user space' to put it was, as it's been a while since I last worked with XWiki - although from my posting you can see that I'm back and things are online again.

I would like to write some documentation about migrating XWiki from server to server or across platforms, as I've gone from FreeBSD over to Nexenta Core 3 with GlassFish v3 in the past and recently over to Fedora 11. I've got a lot of time on my hands and would like to contribute...

Regards,
Kaya
[xwiki-users] Usage and placement of robots.txt file
Hi,

The XWiki documentation for the robots.txt file says to put it in the webserver configuration: http://platform.xwiki.org/xwiki/bin/view/AdminGuide/Performances#HRobots.txt

On Tomcat, where would this go - directly in the webapps/ROOT/ directory?

Also, among the directives used, the example says:

    # It can also be useful to block certain spaces from crawling,
    # especially if those spaces don't provide new content

Should the /xwiki/bin/view/Photos/ portion also be excluded?

Just as a last thing, what kind of performance benefit would stopping crawlers bring? I am imagining savings in CPU, RAM, and network bandwidth.

Regards,
Kaya
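For illustration, a minimal robots.txt along the lines of the examples linked in this thread might look like the following; the action paths and the space name are indicative only and should be checked against your own URL scheme:

    User-agent: *
    # Keep crawlers out of non-view actions (editing, exports, history, attachments):
    Disallow: /xwiki/bin/edit/
    Disallow: /xwiki/bin/export/
    Disallow: /xwiki/bin/viewrev/
    Disallow: /xwiki/bin/download/
    # Optionally block whole spaces that add no indexable content,
    # e.g. the Photos space asked about above:
    Disallow: /xwiki/bin/view/Photos/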