If you don't want session ids in all (or most) bookmarkable links, make your own WebRequestCodingStrategy and override the method:

    protected CharSequence encode(RequestCycle requestCycle,
            IBookmarkablePageRequestTarget requestTarget)

Copy the complete code of that method, but instead of:

    return requestCycle.getOriginalResponse().encodeURL(url);

just return the url directly.
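
Roughly, as a sketch (the subclass name is invented, the package names are what I recall from Wicket 1.2, and the method body is left as a placeholder for the code you copy from the superclass):

    import wicket.RequestCycle;
    import wicket.protocol.http.request.WebRequestCodingStrategy;
    import wicket.request.target.component.IBookmarkablePageRequestTarget;

    // Hypothetical subclass; plug it into your request cycle processor.
    public class NoSessionIdCodingStrategy extends WebRequestCodingStrategy
    {
        protected CharSequence encode(RequestCycle requestCycle,
                IBookmarkablePageRequestTarget requestTarget)
        {
            // Paste the body of WebRequestCodingStrategy.encode(...) here; it
            // builds the bookmarkable URL into a CharSequence, called url below.
            CharSequence url = ""; // placeholder for the copied URL-building code

            // The original last line was:
            //   return requestCycle.getOriginalResponse().encodeURL(url);
            // Return the URL untouched instead, so no jsessionid is appended:
            return url;
        }
    }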

Or make your own buffered response object (via WebApplication.newWebResponse()) and override, in that response object:

    public CharSequence encodeURL(CharSequence url)

but then you have to analyze the URL and decide which ones you want to encode and which not.
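
As a sketch, assuming your application class extends WebApplication and the Wicket 1.2 signatures are as I remember them (getHomePage() and the rest of the application class are omitted, and needsSession() is just a placeholder you would have to write yourself):

    import javax.servlet.http.HttpServletResponse;
    import wicket.protocol.http.BufferedWebResponse;
    import wicket.protocol.http.WebApplication;
    import wicket.protocol.http.WebResponse;

    public class MyApplication extends WebApplication
    {
        // ... getHomePage() and other application methods omitted ...

        protected WebResponse newWebResponse(final HttpServletResponse servletResponse)
        {
            // A buffered response whose encodeURL decides per URL whether the
            // container may append the jsessionid.
            return new BufferedWebResponse(servletResponse)
            {
                public CharSequence encodeURL(CharSequence url)
                {
                    if (needsSession(url))
                    {
                        return super.encodeURL(url);
                    }
                    return url;
                }
            };
        }

        private boolean needsSession(CharSequence url)
        {
            // Placeholder: analyze the url here and decide which ones to encode.
            return true;
        }
    }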

The problem is that we have to encode the bookmarkable pages, even if those pages are completely stateless by themselves, because a bookmarkable link on such a page could still require the session, for example to get the logged-in user. So we can never know whether a bookmarkable page has to be encoded or not.

What we could do is add bot detection to Wicket: when we see that the client is a bot, we don't encode anything.
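
Nothing like that exists in Wicket today, but as an illustration, the custom response above could skip encoding when a crude User-Agent check says the client is a bot (the helper class and its list of substrings are invented, not an official list). The User-Agent value itself would come from the current HTTP request.

    // Illustrative only: crude User-Agent based bot detection.
    public final class BotDetector
    {
        private static final String[] BOT_SIGNATURES =
                { "googlebot", "slurp", "msnbot", "crawler", "spider" };

        public static boolean isBot(String userAgent)
        {
            if (userAgent == null)
            {
                return false;
            }
            String ua = userAgent.toLowerCase();
            for (int i = 0; i < BOT_SIGNATURES.length; i++)
            {
                if (ua.indexOf(BOT_SIGNATURES[i]) != -1)
                {
                    return true;
                }
            }
            return false;
        }
    }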

johan


On 6/8/06, John Patterson <[EMAIL PROTECTED]> wrote:

On 7 Jun 2006, at 22:15, Igor Vaynberg wrote:

Because Wicket components are stateful, that state needs to live somewhere. So far it can only live in the session (Wicket 1.2) or on the client (Wicket 2.0).

Thanks for the update.  I look forward to using this new facility.

Maybe in 2.0 what we do is check for the bot and then switch from server-side to client-side state. But even then the problem is that the URLs are not stable; they are still session-relative.

All my pages which should be indexed by bots require no "conversation" state to be stored and would all be bookmarkable.  Would it be possible to have common pages (or page components) stored in the application scope instead?  They could then have their rendered content cached too.

The workaround is simple but is a pain to implement. Basically you have to use only bookmarkable pages and bookmarkable links for the pages you want to be indexed. This means you are not using Wicket's session handling and are instead encoding state into the URL yourself, just like you would with WebWork. In 2.0 we have stateless forms so you can also perform POSTs. However, even if you do this, Wicket will still create a session upon first request. This may or may not be resolved in the 2.0 timeframe.
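
For illustration, a bookmarkable page and a bookmarkable link carrying their state in the URL look roughly like this (the page name, wicket:ids and parameter name are invented, and the matching HTML markup is omitted):

    import wicket.PageParameters;
    import wicket.markup.html.WebPage;
    import wicket.markup.html.basic.Label;
    import wicket.markup.html.link.BookmarkablePageLink;

    // A bookmarkable page: all of its state comes from the URL, not the session.
    public class ArticlePage extends WebPage
    {
        public ArticlePage(final PageParameters parameters)
        {
            add(new Label("title", parameters.getString("articleId")));

            // A bookmarkable link to another article; the state travels in the
            // URL, not in a server-side page map.
            PageParameters next = new PageParameters();
            next.put("articleId", "43");
            add(new BookmarkablePageLink("next", ArticlePage.class, next));
        }
    }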

It would be really great if it could be solved.  I remember Jonathan Locke talking about Wicket 1.0 on TSS and saying "Client side state, including zero-state will be available in 1.1."  That statement was one of the reasons I chose to start developing in Wicket.  Now I find I have to maintain two separate frameworks for different parts of my site.

jsessionid is a well-known variable and I'm sure Googlebot is smart enough to know what it is. If not, well, then you cannot use Wicket if you want your site to be crawled by Google.


Now isn't there a file that you can serve to the Google bot to tell it how to crawl your site? Which URLs to hit, etc.? If so, then you can create a bookmarkable "gateway" page to the rest of your application.

The sitemap is used to supplement the normal crawl mainly for pages that cannot be reached by crawling.  I would like both methods to crawl the maximum number of pages. I also want other bots to crawl my pages - not just Google.

Currently, I am considering either Apache URL rewriting to remove all session ids from non-conversational pages, or hacking Wicket to disable URL encoding for all pages that do not absolutely require a session.

Any comments on these approaches?  Is there anything currently in Wicket that would help me selectively disable URL encoding?

Thanks,

John.




_______________________________________________
Wicket-user mailing list
Wicket-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/wicket-user


