On Fri, Jan 21, 2011 at 06:58:26PM +0100, Miroslav Suchy wrote:
> I stumbled upon
>  https://bugzilla.redhat.com/show_bug.cgi?id=249743
> and I said, "Easy, let's write a reasonable robots.txt", but then it
> got complicated (as usual). And I would like to know your opinion.
> 
> Four types of URL are publicly reachable:
> 
> /rhn/Login.do
>       -- login page
> /rhn/help/*
>       -- help in jsp
> /help/
>       -- pxt help, or plain html
> /pub/
>       -- your local garbage
> 
> Did I forget something?

Definitely /network/ for things like

        https://FQDN/network/systems/ssm/index.pxt

and /ty/ (is that still live?), and the new cobbler kickstart URL format.

> Now the robots.txt...
> One approach can be "disallow everything":
>  User-agent: *
>  Disallow: /
> 
> But do we want that?
> 
> Is there a reason to forbid indexing the login page (you can put
> info about your company there and want that indexed)?
> Is there a reason to forbid indexing help?
> Is there a reason to forbid /pub? You either do not have it
> public (and put it behind a firewall) or you have it on the open
> internet, and then you probably do not care if somebody indexes
> it.
> 
> Ideas? Comments?

I'd probably just Disallow everything and leave it to administrators
to allow whatever they find useful to open up.

Spacewalk really does not provide any publicly available content
worth indexing, except maybe /pub/.
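
As a rough sketch of what "Disallow everything, then open up selected
paths" could look like, here is a small Python check using the standard
library's urllib.robotparser. The rule set (allowing only /pub/) and the
is_allowed helper are assumptions for illustration, not something
Spacewalk ships; /pub/some-file is a made-up path, the others are the
ones mentioned in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block everything, but let crawlers into /pub/.
# The Allow line is listed first because urllib.robotparser applies the
# first rule whose path prefix matches the requested URL.
ROBOTS_TXT = """\
User-agent: *
Allow: /pub/
Disallow: /
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    paths = [
        "/rhn/Login.do",
        "/rhn/help/",
        "/help/",
        "/network/systems/ssm/index.pxt",
        "/pub/",
        "/pub/some-file",  # made-up example path
    ]
    for path in paths:
        verdict = "allowed" if is_allowed(path) else "disallowed"
        print(f"{path}: {verdict}")
```

With these rules only the /pub/ paths come back as allowed. An
administrator who wants the login page or help indexed would add further
Allow lines above the Disallow. Note that not every crawler resolves
Allow/Disallow conflicts the same way, so the ordering matters here.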

-- 
Jan Pazdziora
Principal Software Engineer, Satellite Engineering, Red Hat

_______________________________________________
Spacewalk-devel mailing list
Spacewalk-devel@redhat.com
https://www.redhat.com/mailman/listinfo/spacewalk-devel
