Ok, at least I'm not missing anything.  I understand the benefits it's
providing with its stateful framework.  Developing a site with Wicket is
easier than with any other framework I've used.  But this statefulness,
which makes websites so easy to develop, seems to be counterproductive to
SEO:  

Googlebot will follow and index stateful links.  Worst case scenario, these
actually become visible to Google users, and when they click the link it
takes them to an "invalid session" page.  They think, "This site is broken"
and move on to the next link in their search results.  

Another approach to solving this is to block all the stateful pages in my
robots.txt file.  But how can I block these links in robots.txt, since they
change per session?  Is there any way to know what the URL will resolve to
when Googlebot visits my site, so I can tell it to Disallow:
/?wicket:interface=:10:1::: and ?wicket:interface=:0:1::: and ...?  
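On the robots.txt question: what changes per session is the value of the `wicket:interface` parameter, not its name, so one wildcard rule may be enough. Googlebot supports `*` wildcards and a trailing `$` anchor in robots.txt paths (RFC 9309 later standardized both), so a rule such as `Disallow: /*wicket:interface` should cover every `?wicket:interface=...` URL regardless of session. The Java sketch below mimics that matching (prefix match against path plus query, `*` matching any run of characters, optional trailing `$`) so you can check which URLs a rule would block. The class and method names are hypothetical, and this is a simplified approximation, not Google's matcher.

```java
import java.util.regex.Pattern;

// Hypothetical helper approximating robots.txt path matching:
// '*' matches any run of characters, a trailing '$' anchors the end,
// and otherwise a rule is a prefix match. Not Google's implementation.
class RobotsRuleSketch {

    static boolean matches(String rule, String pathAndQuery) {
        boolean anchored = rule.endsWith("$");
        String body = anchored ? rule.substring(0, rule.length() - 1) : rule;
        StringBuilder regex = new StringBuilder();
        for (char c : body.toCharArray()) {
            if (c == '*') {
                regex.append(".*");                             // wildcard
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal char
            }
        }
        if (anchored) {
            regex.append("$");                                  // pin to end of URL
        }
        // lookingAt() matches from the start of the input (prefix semantics)
        return Pattern.compile(regex.toString()).matcher(pathAndQuery).lookingAt();
    }

    public static void main(String[] args) {
        String rule = "/*wicket:interface";
        System.out.println(matches(rule, "/?wicket:interface=:10:1:::")); // true
        System.out.println(matches(rule, "/?wicket:interface=:0:1:::"));  // true
        System.out.println(matches(rule, "/home"));                        // false
    }
}
```

Blocking these URLs only keeps them out of the crawl; it does not make the paging links themselves crawlable.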


> -----Original Message-----
> From: Igor Vaynberg [mailto:[EMAIL PROTECTED]
> Sent: Thursday, April 03, 2008 5:45 PM
> To: [email protected]
> Subject: Re: Removing the jsessionid for SEO
> 
> On Thu, Apr 3, 2008 at 5:31 PM, Dan Kaplan <[EMAIL PROTECTED]>
> wrote:
> > Ok, I did a little preliminary research on this. Right now
> > PagingNavigator uses PagingNavigationLinks to represent its pages.
> > PagingNavigationLink extends Link. I'm supposed to override
> > PagingNavigator's newPagingNavigationLink() method to accomplish this
> > (I think), but past that, this isn't very straightforward to me.
> >
> > Do I need to create my own BookmarkablePagingNavigationLink? When I do...
> > what next? I really don't know enough about BookmarkablePageLinks to do
> > this. Right now, all the magic happens inside PagingNavigationLink. Won't
> > I have to move all that logic into the WebPage that I'm passing into
> > BookmarkablePagingNavigationLink? This seems like a lot of work. Am I
> > missing something critical?
> 
> no, you are not missing anything. you see, when you go stateless, like
> what you want, then you have to recreate all the magic stuff that
> makes stateful links Just Work. Without state you are back to the
> servlet/mvc programming model: you have to encode the state that you
> want into the link, then on the trip back decode it, recreate
> something from it, and then apply that something onto the components.
> This is the crapwork that wicket does for you usually.
> 
> -igor
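A framework-free sketch of the round trip Igor describes: encode the state (here just a page index) into the link, then decode and validate it when the request comes back. The parameter name `page` and the `PagingState` helper are hypothetical; in Wicket the decoding side would typically sit in a bookmarkable page's `PageParameters` constructor, which is not shown here.

```java
import java.util.Map;

// Hypothetical sketch of the stateless round trip for a pager.
class PagingState {

    static final String PARAM = "page"; // hypothetical query parameter name

    // Encode: build the query string for a link to the given 0-based page
    static String encode(int pageIndex) {
        return "?" + PARAM + "=" + pageIndex;
    }

    // Decode: recover the page index from parsed query parameters.
    // Missing or malformed values fall back to the first page, and the
    // result is clamped to [0, pageCount - 1].
    static int decode(Map<String, String> params, int pageCount) {
        String raw = params.get(PARAM);
        int page = 0;
        if (raw != null) {
            try {
                page = Integer.parseInt(raw);
            } catch (NumberFormatException e) {
                page = 0; // garbage in the URL: show the first page
            }
        }
        return Math.max(0, Math.min(page, pageCount - 1));
    }
}
```

The clamping replaces the safety net that stateful links provide for free: a stateless URL can be edited by hand or go stale, so the decoded value has to be treated as untrusted input.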
> 
> 
> >
> >
> >  > -----Original Message-----
> >  > From: Igor Vaynberg [mailto:[EMAIL PROTECTED]
> >  > Sent: Thursday, April 03, 2008 3:40 PM
> >  > To: [email protected]
> >  > Subject: Re: Removing the jsessionid for SEO
> >  >
> >  > you subclass the pagingnavigator and make it use bookmarkable links
> >  > also. it has factory methods for all the links it uses.
> >  >
> >  > -igor
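The factory-method approach Igor mentions can be sketched without Wicket: the base navigator never constructs links directly but asks an overridable method for each one, so a subclass can swap in bookmarkable URLs without touching the rest. All class and method names here are hypothetical stand-ins, not Wicket's `PagingNavigator` API; in Wicket the corresponding hook is the `newPagingNavigationLink()` factory Dan refers to above.

```java
// Hypothetical stand-in for a paging navigator: renders one link per page
// and delegates link creation to an overridable factory method.
class SketchNavigator {
    private final int pageCount;

    SketchNavigator(int pageCount) {
        this.pageCount = pageCount;
    }

    // Factory method; the default stands in for a session-bound callback
    // link (placeholder text, not Wicket's real URL format)
    protected String newPageLink(int pageIndex) {
        return "#callback-" + pageIndex;
    }

    // Builds every page link through the factory method
    String render() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pageCount; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(newPageLink(i));
        }
        return sb.toString();
    }
}

// Overrides only the factory method to emit bookmarkable URLs
class BookmarkableSketchNavigator extends SketchNavigator {
    private final String mountPath;

    BookmarkableSketchNavigator(String mountPath, int pageCount) {
        super(pageCount);
        this.mountPath = mountPath;
    }

    @Override
    protected String newPageLink(int pageIndex) {
        return mountPath + "?page=" + pageIndex;
    }
}
```

Overriding the factory only changes how links are produced; the target page still has to read the page number back out of the URL.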
> >  >
> >  >
> >  > On Thu, Apr 3, 2008 at 3:36 PM, Dan Kaplan <[EMAIL PROTECTED]> wrote:
> >  > > I wasn't talking about the links that are on the list (I already make
> >  > > those bookmarkable). I'm talking about the links that the Navigator
> >  > > generates. How do I make it so page 2 is bookmarkable?
> >  > >
> >  > >
> >  > >  -----Original Message-----
> >  > >  From: Igor Vaynberg [mailto:[EMAIL PROTECTED]
> >  > >  Sent: Thursday, April 03, 2008 3:30 PM
> >  > >  To: [email protected]
> >  > >  Subject: Re: Removing the jsessionid for SEO
> >  > >
> >  > >  instead of
> >  > >
> >  > >  item.add(new Link("foo") { public void onClick() { /* ... */ } });
> >  > >
> >  > >  do
> >  > >
> >  > >  item.add(new BookmarkablePageLink("foo", TargetPage.class)); // TargetPage: placeholder for your page class
> >  > >
> >  > >  -igor
> >  > >
> >  > >
> >  > >  On Thu, Apr 3, 2008 at 3:28 PM, Dan Kaplan <[EMAIL PROTECTED]> wrote:
> >  > >  > How? I asked how to do it before and nobody suggested this as a
> >  > >  > possibility.
> >  > >  >
> >  > >  >
> >  > >  >
> >  > >  >  -----Original Message-----
> >  > >  >  From: Igor Vaynberg [mailto:[EMAIL PROTECTED]
> >  > >  >  Sent: Thursday, April 03, 2008 3:26 PM
> >  > >  >  To: [email protected]
> >  > >  >  Subject: Re: Removing the jsessionid for SEO
> >  > >  >
> >  > >  >  dataview can work in a stateless mode, just use bookmarkable links
> >  > >  >  inside it
> >  > >  >
> >  > >  >  -igor
> >  > >  >
> >  > >  >
> >  > >  >  On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan <[EMAIL PROTECTED]> wrote:
> >  > >  >  > Regardless, at the very least this makes your site look "weird" and
> >  > >  >  > unprofessional when Google puts a jsessionid on your URL. There has
> >  > >  >  > got to be some negative effect when Google visits it a second time,
> >  > >  >  > the jsessionid has changed, but it sees the exact same content.
> >  > >  >  > Worst case, it'll think you're trying to trick it.
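The duplicate URLs Dan describes come from the servlet container appending a `;jsessionid=...` path parameter, typically on the first response, before it knows whether the client accepts cookies. Two crawls then see two different URLs for identical content. A small sketch that normalizes such URLs, for example when comparing or logging crawled URLs; the helper name is hypothetical, and the pattern assumes the conventional `jsessionid` parameter name, matched case-insensitively.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes a ";jsessionid=..." path parameter,
// leaving any query string or fragment in place.
class SessionIdStripper {
    // The session id runs until the next '?', '#' or ';' (or end of string)
    private static final Pattern JSESSIONID =
        Pattern.compile(";jsessionid=[^?#;]*", Pattern.CASE_INSENSITIVE);

    static String strip(String url) {
        return JSESSIONID.matcher(url).replaceAll("");
    }
}
```

Normalizing URLs for comparison is harmless. Removing the id from links the application actually emits is a different matter: stateful links then lose their session, which is the 404 problem raised elsewhere in this thread.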
> >  > >  >  >
> >  > >  >  > About those 404s: with the fix I provided I'm finding that I don't
> >  > >  >  > get a 404, but the links refresh the page I'm already on. For
> >  > >  >  > example, if I'm on A and a link to B is non-bookmarkable, clicking B
> >  > >  >  > refreshes A.
> >  > >  >  >
> >  > >  >  > This issue is very disconcerting to me. It's one of the reasons I wish
> >  > >  >  > that DataView had an option to work in stateless mode. Because if I
> >  > >  >  > ban cookies and Googlebot visits my home page (with a navigator on
> >  > >  >  > it), it'll try to follow all these page links, and from its
> >  > >  >  > perspective they all lead back to the first page. So it's kind of a
> >  > >  >  > catch-22: include the jsessionid in the URLs and get bad SEO, or
> >  > >  >  > remove the jsessionid and get bad SEO :(
> >  > >  >  >
> >  > >  >  > Perhaps the answer to my prayers is a combination of the
> >  > >  >  > noindex/nofollow meta tag with a sitemap.xml. I'm thinking I can put
> >  > >  >  > a nofollow on the home page (so Googlebot doesn't try to follow the
> >  > >  >  > navigator links) and use the sitemap.xml to point out the individual
> >  > >  >  > pages I want it to index.
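The sitemap half of this idea is easy to generate, provided the listed URLs are bookmarkable ones that work without a session. A minimal sketch following the sitemaps.org protocol: a `urlset` root in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace with one `<url><loc>` entry per page, with entity-escaped values. The class name is hypothetical. For the other half, the robots meta tag would be `<meta name="robots" content="nofollow">` if the home page itself should stay indexed, or `noindex,nofollow` if not.

```java
import java.util.List;

// Hypothetical sketch: builds a sitemap.xml (sitemaps.org protocol 0.9)
// from a list of bookmarkable, session-free URLs.
class SitemapSketch {

    // Sitemap <loc> values must escape the five XML special characters
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    static String build(List<String> urls) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String url : urls) {
            xml.append("  <url><loc>").append(escapeXml(url)).append("</loc></url>\n");
        }
        xml.append("</urlset>\n");
        return xml.toString();
    }
}
```

The sitemap only tells crawlers which URLs exist. It does not stop them from following the navigator links, which is why it is paired with the meta tag here.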
> >  > >  >  >
> >  > >  >  >
> >  > >  >  > Matej: can you go into more detail about your hybrid URL statement?
> >  > >  >  > Won't Google index, for example, /home and /home.1 if I use it? When
> >  > >  >  > it follows the next page, won't the URL become /home.1.2 or
> >  > >  >  > something? That .2 is a page version: if Google indexes that and
> >  > >  >  > tries to visit it again, won't it report an invalid session?
> >  > >  >  >
> >  > >  >  >
> >  > >  >  >
> >  > >  >  >  -----Original Message-----
> >  > >  >  >  From: Matej Knopp [mailto:[EMAIL PROTECTED]
> >  > >  >  >  Sent: Thursday, April 03, 2008 11:10 AM
> >  > >  >  >  To: [email protected]
> >  > >  >  >  Subject: Re: Removing the jsessionid for SEO
> >  > >  >  >
> >  > >  >  >  On the other hand, crawling non-bookmarkable pages is not very useful
> >  > >  >  >  anyway, since a ?wicket:interface URL will always give "page
> >  > >  >  >  expired" when you click on the result.
> >  > >  >  >
> >  > >  >  >  However, preserving the session makes a lot of sense with hybrid
> >  > >  >  >  URLs. Google remembers the original URL (without the page instance)
> >  > >  >  >  while indexing the real page (after the redirect).
> >  > >  >  >
> >  > >  >  >  I think, though, that the crawler is quite advanced. I would think it
> >  > >  >  >  supports cookies (at least JSESSIONID) and that it evaluates some of
> >  > >  >  >  the JavaScript on the page.
> >  > >  >  >
> >  > >  >  >  -Matej
> >  > >  >  >
> >  > >  >  >  On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg <[EMAIL PROTECTED]> wrote:
> >  > >  >  >  > right. if you strip sessionid then all your nonbookmarkable urls
> >  > >  >  >  > will resolve to a 404. that will probably drop your rank a lot
> >  > >  >  >  > faster....
> >  > >  >  >  >
> >  > >  >  >  >  -igor
> >  > >  >  >  >
> >  > >  >  >  >
> >  > >  >  >  >
> >  > >  >  >  >
> >  > >  >  >  >  On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner <[EMAIL PROTECTED]> wrote:
> >  > >  >  >  >  > the problem is that then all your pages have to be stateless,
> >  > >  >  >  >  > else google can't crawl your website.
> >  > >  >  >  >  > And if that is the case, then you could be completely stateless,
> >  > >  >  >  >  > so you don't have a session (id) to worry about at all.
> >  > >  >  >  >  >
> >  > >  >  >  >  >  johan
> >  > >  >  >  >  >
> >  > >  >  >  >  >
> >  > >  >  >  >  >
> >  > >  >  >  >  >
> >  > >  >  >  >  >
> >  > >  >  >  >  >
> >  > >  >  >  >  >  On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry <[EMAIL PROTECTED]> wrote:
> >  > >  >  >  >  >  >
> >  > >  >  >  >  >  > When Google asks to not have special treatment for their bot, they
> >  > >  >  >  >  >  > are referring to content more than anything. Regarding the session
> >  > >  >  >  >  >  > id being coded in the URL, see the Technical guidelines section of
> >  > >  >  >  >  >  > Google's Webmaster Guidelines -
> >  > >  >  >  >  >  > http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design
> >  > >  >  >  >  >  >
> >  > >  >  >  >  >  > It specifically recommends "allow(ing) search bots to crawl your
> >  > >  >  >  >  >  > sites without session IDs or arguments that track their path
> >  > >  >  >  >  >  > through the site."
> >  > >  >  >  >  >  >
> >  > >  >  >  >  >  > -----Original Message-----
> >  > >  >  >  >  >  > From: Johan Compagner [mailto:[EMAIL PROTECTED]
> >  > >  >  >  >  >  > Sent: Thursday, April 03, 2008 7:35 AM
> >  > >  >  >  >  >  > To: [email protected]
> >  > >  >  >  >  >  > Subject: Re: Removing the jsessionid for SEO
> >  > >  >  >  >  >  >
> >  > >  >  >  >  >  > isn't google always saying that you shouldn't alter the behavior of
> >  > >  >  >  >  >  > your site depending on whether it is their bot or not?
> >  > >  >  >  >  >  >
> >  > >  >  >  >  >  > On Thu, Apr 3, 2008 at 1:00 PM, Artur W. <[EMAIL PROTECTED]> wrote:
> >  > >  >  >  >  >  >
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > > Hi!
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > > igor.vaynberg wrote:
> >  > >  >  >  >  >  > > >
> >  > >  >  >  >  >  > > > also by doing what you have done, users with cookies disabled
> >  > >  >  >  >  >  > > > won't be able to use your site...
> >  > >  >  >  >  >  > > >
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > > In my opinion the session id is a problem. Google indexes the same
> >  > >  >  >  >  >  > > page again and again.
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > > For users without cookies, we can do something like this:
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > > // Imports for the enclosing source file (Wicket 1.3-era API):
> >  > >  >  >  >  >  > > import javax.servlet.http.HttpServletRequest;
> >  > >  >  >  >  >  > > import javax.servlet.http.HttpServletResponse;
> >  > >  >  >  >  >  > > import org.apache.wicket.RequestCycle;
> >  > >  >  >  >  >  > > import org.apache.wicket.protocol.http.WebRequest;
> >  > >  >  >  >  >  > > import org.apache.wicket.protocol.http.WebResponse;
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > > // Response that skips session-id URL rewriting for known crawlers
> >  > >  >  >  >  >  > > static class Unbuffered extends WebResponse {
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >     // Lower-case plain substrings (not regexes) matched against
> >  > >  >  >  >  >  > >     // the User-Agent header
> >  > >  >  >  >  >  > >     private static final String[] botAgents = { "onetszukaj", "googlebot",
> >  > >  >  >  >  >  > >         "appie", "architext", "jeeves", "bjaaland", "ferret", "gulliver",
> >  > >  >  >  >  >  > >         "harvest", "htdig", "linkwalker", "lycos_", "moget", "muscatferret",
> >  > >  >  >  >  >  > >         "myweb", "nomad", "scooter", "yahoo! slurp china", "slurp",
> >  > >  >  >  >  >  > >         "weblayers", "antibot", "bruinbot", "digout4u", "echo!",
> >  > >  >  >  >  >  > >         "ia_archiver", "jennybot", "mercator", "netcraft", "msnbot",
> >  > >  >  >  >  >  > >         "petersnews", "unlost_web_crawler", "voila", "webbase",
> >  > >  >  >  >  >  > >         "webcollage", "cfetch", "zyborg", "wisenutbot", "robot", "crawl",
> >  > >  >  >  >  >  > >         "spider" /* and so on... */ };
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >     public Unbuffered(final HttpServletResponse res) {
> >  > >  >  >  >  >  > >         super(res);
> >  > >  >  >  >  >  > >     }
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >     // Bots get the URL unchanged (no ;jsessionid=...); everyone else
> >  > >  >  >  >  >  > >     // gets the container's normal URL encoding
> >  > >  >  >  >  >  > >     @Override
> >  > >  >  >  >  >  > >     public CharSequence encodeURL(final CharSequence url) {
> >  > >  >  >  >  >  > >         return isAgent() ? url : super.encodeURL(url);
> >  > >  >  >  >  >  > >     }
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >     private static boolean isAgent() {
> >  > >  >  >  >  >  > >         HttpServletRequest request = ((WebRequest) RequestCycle.get()
> >  > >  >  >  >  >  > >                 .getRequest()).getHttpServletRequest();
> >  > >  >  >  >  >  > >         String agent = request.getHeader("User-Agent");
> >  > >  >  >  >  >  > >         if (agent == null) {
> >  > >  >  >  >  >  > >             return false; // no User-Agent header: treat as a regular client
> >  > >  >  >  >  >  > >         }
> >  > >  >  >  >  >  > >         agent = agent.toLowerCase(); // lower-case once, not per bot
> >  > >  >  >  >  >  > >         for (String bot : botAgents) {
> >  > >  >  >  >  >  > >             if (agent.indexOf(bot) != -1) {
> >  > >  >  >  >  >  > >                 return true;
> >  > >  >  >  >  >  > >             }
> >  > >  >  >  >  >  > >         }
> >  > >  >  >  >  >  > >         return false;
> >  > >  >  >  >  >  > >     }
> >  > >  >  >  >  >  > > }
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > > I didn't test this code, but I do a similar thing in my old Spring
> >  > >  >  >  >  >  > > application and it works.
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > > Take care,
> >  > >  >  >  >  >  > > Artur
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > > --
> >  > >  >  >  >  >  > > View this message in context:
> >  > >  >  >  >  >  > > http://www.nabble.com/Removing-the-jsessionid-for-SEO-tp16464534p16467396.html
> >  > >  >  >  >  >  > > Sent from the Wicket - User mailing list archive at Nabble.com.
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > > ---------------------------------------------------------------------
> >  > >  >  >  >  >  > > To unsubscribe, e-mail: users-[EMAIL PROTECTED]
> >  > >  >  >  >  >  > > For additional commands, e-mail: [EMAIL PROTECTED]
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  >
> >  > >  >  >  >  >
> >  > >  >  >
> >  > >  >  >
> >  > >  >  >
> >  > >  >  >  --
> >  > >  >  >  Resizable and reorderable grid components.
> >  > >  >  >  http://www.inmethod.com
> >  > >  >  >
> >  > >  >
> >  > >
> >  >
> 