I do it like this:

/etc/map/adobe/sling:match = http/adobe.example.com/(.+)
/etc/map/adobe/sling:internalRedirect = /content/adobe/$1
/etc/map/day/sling:match = http/day.example.com/(.+)
/etc/map/day/sling:internalRedirect = /content/day/$1
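As a rough illustration of what those two entries do for incoming requests, here is a minimal sketch in plain Java regex. It is not Sling's resolver code: the class name MapSketch, the ENTRIES table and the resolve() helper are made up for this example, and it assumes the request is presented as a "scheme/host/path" string exactly as the sling:match values above are written.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MapSketch {
    // Hypothetical in-memory copy of the two /etc/map entries:
    // sling:match pattern -> sling:internalRedirect replacement.
    static final Map<Pattern, String> ENTRIES = new LinkedHashMap<>();
    static {
        ENTRIES.put(Pattern.compile("http/adobe.example.com/(.+)"), "/content/adobe/$1");
        ENTRIES.put(Pattern.compile("http/day.example.com/(.+)"), "/content/day/$1");
    }

    // Returns the resource path for a "scheme/host/path" string,
    // or null when no entry matches the whole string.
    static String resolve(String request) {
        for (Map.Entry<Pattern, String> entry : ENTRIES.entrySet()) {
            Matcher m = entry.getKey().matcher(request);
            if (m.matches()) {
                // $1 in the replacement refers to the captured path.
                return m.replaceFirst(entry.getValue());
            }
        }
        return null;
    }
}
```

With this sketch, resolve("http/day.example.com/foo/bar.html") yields /content/day/foo/bar.html. Going the other way (resource path back to host) is the hard part, which is what the rest of this mail is about.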
Then I find every <a href="/content/adobe/foo/bar.html"> and transform it to <a href="http://adobe.example.com/foo/bar.html">. Likewise, <a href="/content/day/foo/bar.html"> becomes <a href="http://day.example.com/foo/bar.html">.

    import org.apache.felix.scr.annotations.Component;
    import org.apache.felix.scr.annotations.Properties;
    import org.apache.felix.scr.annotations.Property;
    import org.apache.felix.scr.annotations.Service;
    import org.apache.sling.rewriter.Transformer;
    import org.apache.sling.rewriter.TransformerFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.SAXException;

    @Component(immediate = true, label = "canonical url stuff")
    @Service
    @Properties({
        @Property(name = "pipeline.mode", value = "global"),
        @Property(name = "service.ranking", intValue = -1)
    })
    public class CanonicalHrefFactory implements TransformerFactory {

        @Override
        public Transformer createTransformer() {
            return new CanonicalHref();
        }

        // ContentHandlerDelegate is not shown here; it is assumed to implement
        // Transformer and to forward SAX events to the next handler.
        private static class CanonicalHref extends ContentHandlerDelegate {

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) throws SAXException {
                final ContentHandler contentHandler = getContentHandler();
                // Only <a> elements get their attributes rewritten.
                final Attributes modified = "a".equalsIgnoreCase(localName)
                        ? DO_THE_HREF_URL_REWRITE_HERE_TO_YOUR_LIKING(attributes)
                        : attributes;
                contentHandler.startElement(uri, localName, qName, modified);
            }
        }
    }

In DO_THE_HREF_URL_REWRITE_HERE_TO_YOUR_LIKING(), I could look into /etc/map and find a suitable entry by looking at the sling:internalRedirect property. Once the longest match is found, I can parse the sling:match of that node to get the hostname.

But, as you can see, the sling:match regex might not contain a single hostname. It could be something like http/(adobe|day)\.com/(.+)

I don't think org.apache.sling.jcr.resource.internal.JcrResourceResolver.resolve(HttpServletRequest, String) is injective: several URLs a, b, c can resolve to the same resource d, so the inverse of that mapping isn't a function.

If there is no instance where siteA links to siteB, you can just implement TransformerFactory and strip the prefix, /content/<siteName>, from each href (as long as you structured your repository consistently).

On Tue, Feb 7, 2012 at 7:38 AM, David Gonzalez <[email protected]> wrote:

> Sam, doesn't etc/map require a root mapping which can't be a regex
> (can't be a regex for outgoing mapping, at least)?
> How would I structure the etc/map nodes to only match on the
> resource path? Would I just put the resource mapping directly under
> the scheme (http) node in lieu of the root mapping?
>
> Thanks
>
> On Feb 7, 2012, at 7:18 AM, "sam ”" <[email protected]> wrote:
>
> > You can rewrite from http server.
> >
> > For the urls appearing in html, you can use rewriter:
> > http://sling.apache.org/site/output-rewriting-pipelines-orgapacheslingrewriter.html
> >
> > Or, since your mappings are simple, you can roll out your own
> > utility that walks /etc/map for sling:internalRedirect. And, find
> > the longest matching internalRedirect against resourcePath.
> > Once found, you can construct url from there.
> >
> > On Mon, Feb 6, 2012 at 10:42 PM, David G. <[email protected]> wrote:
> >
> >> Hey,
> >>
> >> I'm using dispatcher running under httpd as cache.
> >>
> >> One of the things I am trying to get around is serving pages from
> >> the usual /content/<site>/<lang>/page.html structure.
> >>
> >> I need to validate, but I think I could
> >>
> >> 1) handle incoming rewrites: mysite.com/page.html ->
> >>    /content/mysite/en/page.html
> >> 2) use the JCR Resource Resolver mappings to rewrite all my
> >>    in-page links to point at /page.html
> >>
> >> I haven't looked at the source code to see why sling can't handle
> >> bi-directional mapping when using regex (it seems like it should
> >> be able to, but I must be missing something).
> >>
> >> Thanks
> >>
> >> --
> >> David Gonzalez
> >> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> >>
> >> On Monday, February 6, 2012 at 12:29 PM, James Stansell wrote:
> >>
> >>> On Mon, Feb 6, 2012 at 5:26 AM, David Gonzalez
> >>> <[email protected](mailto:[email protected])> wrote:
> >>>
> >>>> Does mod-rewrite support rewriting all the links in the
> >>>> documents returned in the response?
> >>>
> >>> Probably not.
> >>> In fact right now a lot of our links are
> >>> /content/<site>/en/page.html and we have rewrite rule which gives
> >>> a redirect to /page.html.
> >>>
> >>> It should be possible to use a sling filter to modify the links
> >>> when serving the page but we haven't looked into that yet.
> >>>
> >>>> Have you seen perf hits doing this? (I'm assuming every html
> >>>> response must be parsed and rewritten.)
> >>>
> >>> As far as I know our performance concerns are in other areas. Our
> >>> sling is actually part of CQ5 so we already were using httpd in
> >>> order to host the dispatcher plugin for caching the pages. Plus we
> >>> are using mod_rewrite for rewriting 1000s of legacy URLs so I
> >>> don't think we ever considered another option.
> >>>
> >>>> Are there any gotchas w mod_rewrite that you've run into
> >>>> rewriting incoming and outgoing urls?
> >>>
> >>> Our biggest problems have been with the legacy URLs. I guess a
> >>> general gotcha could be the regexes for the rewrite; not thinking
> >>> of anything else.
> >>>
> >>> If we were using plain sling we would probably be caching with
> >>> varnish. I wonder if that has any rewrite support? Are you using
> >>> a web cache?
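
The "longest matching internalRedirect" idea from this thread could be sketched roughly as below. This is only an illustration under assumptions: the class ReverseMapSketch, the HOSTS table and the externalize() helper are hypothetical names, and the table is filled by hand instead of walking /etc/map. The host is stored explicitly next to each prefix because, as noted above, a sling:match regex does not necessarily contain one single hostname.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ReverseMapSketch {
    // Hypothetical table: sling:internalRedirect prefix -> public host.
    // A real utility would build this from the nodes under /etc/map.
    static final Map<String, String> HOSTS = new LinkedHashMap<>();
    static {
        HOSTS.put("/content/adobe", "adobe.example.com");
        HOSTS.put("/content/day", "day.example.com");
    }

    // Rewrites a resource path to an absolute URL using the longest
    // matching prefix; returns the path unchanged when nothing matches.
    static String externalize(String resourcePath) {
        String best = null;
        for (String prefix : HOSTS.keySet()) {
            // Match only on whole path segments, so that /content/day
            // does not match /content/daylight.
            boolean matches = resourcePath.equals(prefix)
                    || resourcePath.startsWith(prefix + "/");
            if (matches && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        if (best == null) {
            return resourcePath;
        }
        return "http://" + HOSTS.get(best) + resourcePath.substring(best.length());
    }
}
```

A transformer could call something like externalize() on each href attribute value. Paths outside the mapped trees, for example /etc/designs/site.css, pass through untouched.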
