[PHP] Re: PATH INFO urls - replacing GET syntax

George Whiffen Mon, 25 Feb 2002 04:17:45 -0800

Navid,

The path info technique is moderately well known.  I thought I'd seen an
account of it at phpwizard.net, and I would have given you the link, but
when I looked just now I couldn't find it. Anyway, this is roughly how it
goes:-

Background
========
You can add all kinds of stuff at the end of a url pointing to your php
script and the php still gets run e.g.

http://www.mysite.net/myscript.php/this/and/thatand_the_other

will still cause myscript.php to be run.  Anything appearing after the script's path
name is known as the path info and is available in a pre-defined global
variable, $PATH_INFO (at least with Apache).  In this example $PATH_INFO
would be "/this/and/thatand_the_other" (note the / at the front).
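If you are on a newer PHP, the bare $PATH_INFO global will not exist.  register_globals was off by default from PHP 4.2 and was removed in PHP 5.4, and the same value now lives in $_SERVER['PATH_INFO'].  Here is a minimal sketch of reading it safely.  The function name get_path_info() is hypothetical, and it assumes PHP 7 or later for the ?? operator:

```php
<?php
// Hypothetical helper: modern replacement for the bare $PATH_INFO global.
// The PATH_INFO key is simply absent from $_SERVER when the URL carries no
// path info, so fall back to an empty string instead of raising a warning.
function get_path_info(): string
{
    return $_SERVER['PATH_INFO'] ?? '';
}

// For http://www.mysite.net/myscript.php/this/and/thatand_the_other
// get_path_info() returns "/this/and/thatand_the_other".
```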

Basic Technique
===========
So you can use PATH_INFO to communicate with your script from a link without
using a GET string e.g.
instead of /myscript.php?search=findthis, you could just have
/myscript.php/findthis.

With the first, "GET" syntax, php automatically sets $search to "findthis".
If you use the second, path info syntax instead, then you have to do a bit
more work to get the value into the variable you want.  In this case, you
would do something like:

list($null,$search) = explode('/',$PATH_INFO);

From this line on, it's exactly as if $search was set from the GET
information.

(The $null variable is simply there to "use up" the empty element that
explode() returns for whatever appears before the first /.  It is only
necessary because I use list() syntax to do all the unpacking and assignment
in one line; list($search) = explode('/',substr($PATH_INFO,1)); would be an
alternative that avoids this dummy variable.)
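To see that the two unpacking styles agree, here is a small sketch with a literal string standing in for $PATH_INFO (the variable $path_info is a stand-in for illustration, not something PHP sets):

```php
<?php
// $path_info stands in for $PATH_INFO (or $_SERVER['PATH_INFO']).
$path_info = '/findthis';

// explode() yields ['', 'findthis']: the '' is what precedes the first '/',
// and it lands in the throwaway $null.
list($null, $search) = explode('/', $path_info);

// substr($path_info, 1) drops the leading '/', so explode() yields
// ['findthis'] and no dummy variable is needed.
list($search_alt) = explode('/', substr($path_info, 1));

// Both $search and $search_alt now hold "findthis".
```

One caveat: if the URL has no path info at all, explode() returns a single empty element, so the first form leaves $search as null and PHP reports an undefined array key/offset.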

You can pass any number of parameters, but your script needs to know which
variables to set and in what order.  It doesn't need to know that with GET,
since a GET string passes the names of the variables as well as their
values.  For example, I've had urls that look something like
this:
    /product.php/cookers/range/belling
as a replacement for
    /product.php?section_code=cookers&type_code=range&manufacturer_code=belling

To unpack the PATH_INFO version in this case, I use the following code:

list($null,$section_code,$type_code,$manufacturer_code) =
explode('/',$PATH_INFO);
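When the same pattern turns up in several scripts, the ordered list of names can live in one place.  A sketch of a hypothetical helper, parse_path_info(), that pairs each path segment with a name the way a GET string would, and returns null for missing trailing segments instead of letting list() raise undefined-index warnings:

```php
<?php
// Hypothetical helper: map path-info segments onto an ordered list of names.
function parse_path_info(string $path_info, array $names): array
{
    // Drop the leading '/' so element 0 is the first parameter.
    $segments = explode('/', ltrim($path_info, '/'));
    $params = [];
    foreach (array_values($names) as $i => $name) {
        $value = $segments[$i] ?? '';
        // Treat a missing or empty segment as "not supplied".
        $params[$name] = ($value === '') ? null : $value;
    }
    return $params;
}

$params = parse_path_info(
    '/cookers/range/belling',
    ['section_code', 'type_code', 'manufacturer_code']
);
// $params['section_code'] is "cookers", $params['type_code'] is "range",
// $params['manufacturer_code'] is "belling".
```

Returning an array rather than setting globals keeps the "which names, in what order" knowledge in a single call, which is the bookkeeping GET would otherwise do for you.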

That's about all there is to it.

Eliminating the .php
=============
There is a further refinement that I like, which is to get rid of the .php
on the end of the script name.  So instead of /myscript.php/this you have
just /myscript/this, or, in my last example, /product/cookers/range/belling.
To achieve this you need to tell your web server that it should treat files
with no extension as if they were php files.  If you run Apache, you need the
following directive in the configuration file:

DefaultType application/x-httpd-php

(Don't assume I've got the syntax exactly right, check it!)

You can set it across your whole server, just some virtual servers or
individual directories.  This tells Apache that if a file has no "type" i.e.
no extension, treat it as php.

The Advantages
===========
Why bother using PATH_INFO, since it is extra work?

1. Elegant URLs
Like you, I much prefer elegant urls.  My reasoning is that the "location"
of a page has an extremely prominent position within any browser, so it
ought to be made as attractive, straightforward and unfrightening for the
user as possible.  It's as much a part of the design as the rest of the
page.   Whenever I see horrendous urls, (and some of the supposedly most
professional sites are the worst), I get a bad taste in my mouth.  I
immediately assume that if the location of the page has such unnecessary
geekery, there is probably a lot more unnecessary geekery to follow in the
body of the page...

2. Search Engines
An often quoted reason for using PATH_INFO is that search engines will not
follow links which include GET data, whereas they will always follow
PATH_INFO based links since there is no way they can distinguish them from
"real" static links.  In principle, this means that you can get every page
of search results indexed and treated as if it were an individual static
page by the search engine.  You just need "hidden" path info based links on
your pages that point to searches for "all" or various sub-sets of the
possible results.  But search engine placing is a black art, so don't take
my word for it ;)

3. Usability
Another point is that PATH_INFO urls are generally easier to
remember/bookmark/manage, since they have a simpler format and will always
use fewer characters than the GET version.  They are, for example, much more
likely to be fully visible in the browser's status bar when hovering
over links.

4. Multi-Output Services
There is a more subtle long-term advantage in eliminating the GET portion of
your urls.  Pure "PATH_INFO" urls are inherently more standards compliant
than urls with GET information.  They are significantly more likely to fit
in with non-html urls, e.g. satisfying wap or xml requests for "data" from a
"service" without changing the "addressing" of the service. You could
publish the fact that urls of the form mysite/product/cookers/range/belling
always return information about Belling range cookers, with the format in
which the data is returned (html, wml, xml or whatever) depending only on
the request type.  The basic address of the data is the same, regardless of
the format requested.

Before anyone jumps on my back for suggesting GET syntax is non-standard,
let me point out that it's always going to be true that an "address" format
with only one delimiter i.e. '/', is going to be more easily ported than an
address format with four i.e. /, ?, = and &.

5. Advanced Techniques
There are loads of simple but powerful techniques you can invent if you
combine PATH_INFO with the DefaultType technique.

For instance, you could take an existing static site and turn it into a
fully dynamic php site with minimal changes.  You might want to add standard
headers/footers to turn a framed into a frameless site, or to add
sophisticated database controlled user access control, all without changing
any of the static pages or their cross-links.  How?  Well you rename the
static site's directories and then put php scripts in with the old name of
the directory.  Then all requests for pages within the original directory
are immediately routed to the php script.  That script can then use
PATH_INFO to find what page was actually requested, read in the html of the
static page, manipulate it, e.g. "including" new headers/footers, and then
return the modified html just as if it was the original static page! It's a
lot less work than modifying all the static pages, and a lot more friendly
if you want the page contents to be maintained by html designers with no php
skills.
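Here is a sketch of the static-site wrapper described above.  It assumes a hypothetical layout: the old directory "docs" has been renamed "docs_static", and an extensionless script named "docs" (run as php through DefaultType) now sits in its place.  A request for /docs/a/page.html then reaches that script with path info "/a/page.html".  The function names and the index.html default are illustrative assumptions.  Because the path info comes straight from the URL, the sketch refuses any path that resolves outside the static directory:

```php
<?php
// Map the requested path onto a file under $root. Anything that is missing,
// is not a regular file, or resolves outside $root (e.g. "/../x") gives null.
function resolve_static_page(string $path_info, string $root): ?string
{
    $root = realpath($root);
    if ($root === false) {
        return null;
    }
    $relative = ltrim($path_info, '/');
    if ($relative === '') {
        $relative = 'index.html';   // assumed directory index
    }
    $file = realpath($root . '/' . $relative);
    if ($file === false || !is_file($file)
        || strpos($file, $root . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }
    return $file;
}

// Splice a common header in after <body ...> and a footer in before </body>.
function wrap_page(string $html, string $header, string $footer): string
{
    $html = preg_replace_callback(
        '/<body[^>]*>/i',
        function (array $m) use ($header): string {
            return $m[0] . $header;
        },
        $html,
        1
    );
    return str_ireplace('</body>', $footer . '</body>', $html);
}

// In the "docs" script itself, something like:
//   $file = resolve_static_page($_SERVER['PATH_INFO'] ?? '', __DIR__ . '/docs_static');
//   if ($file === null) { http_response_code(404); exit; }
//   echo wrap_page(file_get_contents($file), $header, $footer);
```

Images and other non-HTML files in the old directory also arrive through the script.  A fuller version would check the extension and pass those files through unchanged, for example with readfile().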

General Problems
============

1. Caching
If you are using PATH_INFO, you should watch out for caching issues.  I'm
not quite sure how it works, (can anyone help explain?), but it seems that
browsers and other cache-ers (e.g. proxy server caches) try to make sure
they do not cache dynamic pages, e.g. search results.  I suspect that they
identify dynamic pages by POST requests or the presence of GET information,
neither of which is there if you use PATH_INFO.  I believe that means you
can end up with users getting out of date results if they follow a PATH_INFO
link at a later date. The results will be served up from cache instead of
being re-generated.

There's a pretty simple failsafe solution.  You just have to make sure you
send out explicit headers to prevent caching, e.g. insert the following
php (or call a function with this code) at the top of every script:

header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");              // a date in the past
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");  // always "just modified"
header("Cache-Control: no-cache, must-revalidate");             // HTTP/1.1
header("Pragma: no-cache");                                      // HTTP/1.0
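Here is a sketch of the "call a function" variant.  The names no_cache_headers() and send_no_cache_headers() are hypothetical, and PHP 7.1 or later is assumed for the void return type.  The header lines are built separately from sending them so they are easy to inspect.  Note the space before GMT in the Last-Modified value; without it the date is malformed:

```php
<?php
// Hypothetical: the anti-caching headers as a reusable list.
function no_cache_headers(): array
{
    return [
        'Expires: Mon, 26 Jul 1997 05:00:00 GMT',               // a date in the past
        'Last-Modified: ' . gmdate('D, d M Y H:i:s') . ' GMT',  // "just modified"
        'Cache-Control: no-cache, must-revalidate',             // HTTP/1.1
        'Pragma: no-cache',                                     // HTTP/1.0
    ];
}

// Call this at the top of every script, before any output.
function send_no_cache_headers(): void
{
    // header() has no effect once output has started, so check first.
    if (!headers_sent()) {
        foreach (no_cache_headers() as $line) {
            header($line);
        }
    }
}
```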

2. Invisible php
One regret I have about developing sites which fully use these techniques,
is that it can end up being completely impossible for anyone, including the
robots that collect tool-usage data, to know that the site uses php.  If I
was using asp, perl, ColdFusion, Oracle, etc. I would consider that
desirable ;).  But it does seem a bit unfair on php that it can be so good
that it ends up completely invisible!

I don't know how many people are using this technique, (how can anyone
tell?), but it's even possible that this is part of the reason why php sites
are, apparently, not growing as fast relative to other technologies as they
used to.

Sample Site
========
http://tandridge.cpfc.co.uk uses PATH_INFO quite a bit.  In particular, it
has php scripts which produce dynamic images, (try the "League Tables" and
"Cup Competitions" options).  These dynamic image scripts receive their
control information i.e. where to find the data to display, via PATH_INFO.
That makes it easy to change the image displayed without reloading the
page.  There's a little bit of Javascript that runs when the user changes
their selection to tweak the image url (location), and so change the
image-generating script's "parameters".
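Here is the reverse of unpacking: building a path-info url from ordered values, as a page might do when it first writes out the image's src.  This is a sketch only.  The function build_path_info_url() and the script name "/league_image.php" are hypothetical, not taken from the site above.  rawurlencode() is used because it encodes spaces as %20 rather than '+', which suits path segments.  It also turns '/' into %2F, but Apache rejects encoded slashes in paths by default (AllowEncodedSlashes), so values are best kept slash-free:

```php
<?php
// Hypothetical: build "/script/seg1/seg2/..." from an ordered list of values.
function build_path_info_url(string $script, array $values): string
{
    // strval() lets integers through; rawurlencode() makes each value path-safe.
    $segments = array_map('rawurlencode', array_map('strval', $values));
    return rtrim($script, '/') . '/' . implode('/', $segments);
}

$src = build_path_info_url('/league_image.php', ['under 12', 'division 2']);
// $src is "/league_image.php/under%2012/division%202"
```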

By the way, if anyone is interested in bits of this site's code for another
sports results site, let me know and I'll see if the client will let me
release some or all of the code.  It took me more than a month's work, and,
IMHO, it has some semi-cute features.  It'd be a shame if it was only ever
used for a few hundred London youth soccer teams, when it could be easily
converted to handle league/cup fixtures and results for basketball,
baseball, ice-hockey or whatever.

Another of my sites that's pretty much path info throughout, but which
you'll probably find more entertaining is http://super10.lycos.co.uk, an
online gaming site.  There are free games, (flash based), and real money to
be won if you sign up for the prize games, so even though the techniques are
less obvious, (most of it runs in windows without location bars), you might
find it more interesting!

Hope this helps.  If you, or anyone else, find these notes useful, please
drop me a mail and I'll think about putting a proper article somewhere or
other.

George

mailto:[EMAIL PROTECTED]

Navid Yar wrote:

> George,
>
> Good point. I actually like your idea a lot. I have never thought about
> using $SCRIPT_NAME.
>
> You also mentioned using "$PATH_INFO to implement elegant (and
> search-engine safe) urls..." below. Can you give me a couple of examples
> of how I might do this? I always hated the GET strings at the end of the
> url. Sometimes I redirect a user to the same page two times just to get
> rid of the trailing GET string. I know that's a bad way of doing it, but
> it was a temporary thing until I could find a way around it. I would
> really appreciate your help on this one. Thanks...
>
> Navid
>
> -----Original Message-----
> From: George Whiffen [mailto:[EMAIL PROTECTED]]
> Sent: Monday, February 18, 2002 7:09 AM
> To: Navid Yar
> Subject: Re: [PHP] form submission error trapping
>
> Navid,
>
> $SCRIPT_NAME is sometimes a safer alternative than $PHP_SELF.
>
> The difference is that $PHP_SELF includes $PATH_INFO while $SCRIPT_NAME
> is just the name of the actual script running.
> http://www.php.net/manual/en/language.variables.predefined.php
>
> This becomes particularly important if you use $PATH_INFO to implement
> elegant (and search-engine safe) urls e.g. /search/products/myproduct
> rather than /search.php?category=products&key=myproduct.
>
> George
>
> Navid Yar wrote:
>
> > Simply, to send a form to itself, you can use a special variable called
> > $PHP_SELF. Here's an example of how to use it:
> >
> > if ($somevalue) {
> >    header("Location: $PHP_SELF");
> > } else {
> >    // execute some other code...
> > }
> >
> > Here, if $somevalue holds true, it will call itself and reload the same
> > script/file. This code is not very useful at all, but it gets the point
> > across. If you wanted to pass GET variables to this, then you could
> > easily say:
> >
> > header("Location: $PHP_SELF?var=value&var2=value2&var3=value3");
> >
> > ...and so on. You can also use this approach with Sessions if you wanted
> > to turn the values back over to the form page, assuming you had two
> > pages: one for the form, and one for form checking and entry into a
> > database. There are several ways to check forms, whether you want it on
> > one page or span it out to several pages. You just need to be creative
> > in what tools are available to you. Here is an example of how you can
> > pass session values:
> >
> > header("Location: some_file.php?" . SID);
> >
> > Here, whatever variables you've registered with session_register() will be
> > passed to the php page you specify, in this case some_file.php. Hope
> > this helps. Have fun, and happy coding.  :)

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
