I stumbled upon
 https://bugzilla.redhat.com/show_bug.cgi?id=249743
and thought, "Easy, let's write a reasonable robots.txt," but then it got complicated (as usual). I would like to know your opinion.

There are four types of publicly reachable URLs:

/rhn/Login.do
        -- login page
/rhn/help/*
        -- help in JSP
/help/
        -- PXT help, or plain HTML
/pub/
        -- your local garbage

Did I forget something?

Now, the robots.txt itself.
One approach would be to "disallow everything":
 User-agent: *
 Disallow: /
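To double-check what "disallow everything" means for the four URL types listed above, here is a small sketch using Python's standard urllib.robotparser module. The crawler name "ExampleBot" is a made-up placeholder, and the paths are just the prefixes from the list. Real crawlers may interpret edge cases slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The "disallow everything" robots.txt proposed above.
DISALLOW_ALL = """\
User-agent: *
Disallow: /
"""

# Prefixes of the four publicly reachable URL types listed above.
PUBLIC_PATHS = ["/rhn/Login.do", "/rhn/help/", "/help/", "/pub/"]


def crawlable(robots_txt, paths, agent="ExampleBot"):
    """Return {path: bool} telling whether `agent` may fetch each path.

    "ExampleBot" is a hypothetical crawler name; it has no record of
    its own, so it falls back to the "User-agent: *" record.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


if __name__ == "__main__":
    for path, allowed in crawlable(DISALLOW_ALL, PUBLIC_PATHS).items():
        print(f"{path:15} {'allowed' if allowed else 'blocked'}")
```

With this file, all four public prefixes end up blocked, including the login page and both help trees.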

But do we want that?

Is there a reason to forbid indexing the login page? You might put information about your company there and want it indexed.
Is there a reason to forbid indexing the help pages?
Is there a reason to forbid /pub? Either you do not expose it publicly (it sits behind a firewall), or it is out on the open internet, and then you probably do not care if somebody indexes it.
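If the answer to those questions is "no", one possible middle ground is to allow only the four public prefixes and disallow everything else. This is a hypothetical sketch, not a tested policy. It relies on Allow lines, which RFC 9309 standardizes but the original robots.txt convention lacked, so some older crawlers may ignore them. Python's urllib.robotparser applies the first rule whose prefix matches, so the Allow lines come before the final Disallow. Crawlers that use longest-match, as RFC 9309 specifies, reach the same verdicts for these paths. The path /rhn/other.do is a made-up stand-in for any non-public page, and "ExampleBot" is a placeholder crawler name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical selective policy: allow the four public prefixes and
# block the rest. The Allow lines come first because urllib.robotparser
# uses the first rule whose path prefix matches the requested path.
SELECTIVE = """\
User-agent: *
Allow: /rhn/Login.do
Allow: /rhn/help/
Allow: /help/
Allow: /pub/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(SELECTIVE.splitlines())

# "/rhn/other.do" is a made-up example of a page that should stay private.
for path in ["/rhn/Login.do", "/rhn/help/", "/help/", "/pub/", "/rhn/other.do"]:
    verdict = "allowed" if parser.can_fetch("ExampleBot", path) else "blocked"
    print(f"{path:15} {verdict}")
```

The trade-off is maintenance: every new public area has to be added as another Allow line, whereas "Disallow: /" never needs updating.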

Ideas? Comments?

Mirek

_______________________________________________
Spacewalk-devel mailing list
Spacewalk-devel@redhat.com
https://www.redhat.com/mailman/listinfo/spacewalk-devel
