Re: Starting Page

Alan Hoyle Tue, 22 Jul 2003 10:45:01 -0700

On Tue, 22 Jul 2003 at 12:44, David A. Desrosiers wrote:

> > Suppose you have a book with a ToC, and each of the N chapters is a
> > separate page which links to the next/prev chapter.
>
>       So you have:
>
>       [TOC.html]
>       |       |
>       |       |
>       [ch1]   [ch2]
>       /\      /\
>       p n     p n
>
>       Now where is the chapter itself? Where is the content?


No, something more like this structure on the web:

        [TOC.html] ___  ...
         |      \     \___     ...
         V       V       V           ...
        [ch1]<-->[ch2]<-->[ch3]<--> ... <-->[chN]

But instead of presenting the TOC first, I wanted to see Chapter 1.  For
whatever reason....

Currently, to make it display ch1 first, I'd have to start spidering
from ch1.html with a link depth of N-1.
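
To make the depth arithmetic concrete, here is a rough Python sketch of the
book layout above. It is only a toy model: the page names, the `book` and
`pages_within` helpers, and the assumption that chapters do not link back to
the ToC are all mine for illustration, and none of it is Plucker's actual
spidering code.

```python
from collections import deque

def pages_within(graph, start, max_depth):
    """Return {page: depth} for every page within max_depth links of start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # do not follow links past the depth limit
        for nxt in graph.get(page, ()):
            if nxt not in depth:
                depth[nxt] = depth[page] + 1
                queue.append(nxt)
    return depth

def book(n):
    """Toy book: a ToC linking to n chapters, each chapter linking prev/next."""
    graph = {"TOC": [f"ch{i}" for i in range(1, n + 1)]}
    for i in range(1, n + 1):
        links = []
        if i > 1:
            links.append(f"ch{i - 1}")
        if i < n:
            links.append(f"ch{i + 1}")
        graph[f"ch{i}"] = links
    return graph

N = 5
g = book(N)
from_toc = pages_within(g, "TOC", 1)      # ToC plus all N chapters
from_ch1 = pages_within(g, "ch1", N - 1)  # all N chapters, no ToC
too_shallow = pages_within(g, "ch1", N - 2)
```

In this model, depth 1 from the ToC reaches every chapter, while starting at
ch1 needs depth N-1 to reach chN, and one level less misses the last chapter.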

As I've said before, I don't have a need for this feature, but I can
understand why someone might want it.

>       This still doesn't make sense to me. How can you start at Chapter 1
> (assuming you want to read the content, not the empty page with two links on
> it; previous/next), if you start spidering from the TOC, 3 levels above it?
>
>       In a real world example, let's say I want to spider news.bbc.co.uk's
> news articles. How would I point the 'spider-from' value to www.cnn.com,
> which at 2-levels deep, points to news.bbc.co.uk, but exists "above" your
> initial point of spidering penetration?
>
>
>               [cnn.com]               # "TOC" in your example
>                  |
>               [links.html]            # previous/next page in your example
>                  |
> start ->      [news.bbc.co.uk]        # Chapter data
>                  |
>               [news_articles]         # Sub-chapter data
>
>       Can you give me an example of how this could be useful, because I'm
> afraid I'm missing the entire concept. You can't spider something that isn't
> linked from anywhere in your structure, and somehow have it included in your
> final document.

Here's an approximation of the BBC's structure:

        __[bbc]__
       /    |    \___
      /     |        \
   [UK]   [world]   [science]
    ||      |||       |||
    ||      |||       |||
  [........articles........]
    /   |   \   /   |  \
   /    |    \ /    |   \
 [...related previous articles...]

Let's say you want to spider all of news.bbc.co.uk's articles, but for
whatever reason, you want to see the Science ones first.  If this feature
existed, you'd set it to spider from news.bbc.co.uk/, but have the first
page displayed be news.bbc.co.uk/science/
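
As a sketch of what such a feature might look like, assuming a toy link graph
rather than the real site: crawl from one page, then put a different page at
the front of the result. The `crawl` and `with_start_page` functions and the
page names are hypothetical and are not an existing Plucker option or API.

```python
from collections import deque

def crawl(graph, root, max_depth):
    """Breadth-first crawl; return pages in the order they were discovered."""
    depth = {root: 0}
    order = [root]
    queue = deque([root])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # depth limit reached, do not follow further links
        for nxt in graph.get(page, ()):
            if nxt not in depth:
                depth[nxt] = depth[page] + 1
                order.append(nxt)
                queue.append(nxt)
    return order

def with_start_page(pages, start_page):
    """Hypothetical 'starting page' option: move start_page to the front."""
    if start_page not in pages:
        raise ValueError(f"{start_page} was not reached by the crawl")
    return [start_page] + [p for p in pages if p != start_page]

# Simplified stand-in for the BBC layout (one article per section).
graph = {
    "bbc": ["uk", "world", "science"],
    "uk": ["uk-article"],
    "world": ["world-article"],
    "science": ["science-article"],
}

pages = crawl(graph, "bbc", 2)                # spider from the top
document = with_start_page(pages, "science")  # but show science first
```

The crawl itself is unchanged; only the page shown first differs, so the link
depth stays at 2.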

The best way to spider all of the BBC would be to spider from [bbc] with a
link depth of 2.

Currently, the only way to get Plucker to display [science] first is to
spider from there.  In order to have it get all the other news content,
you'd have to set it to spider with a link depth of 3:

[sci] -> [bbc] -> [UK] -> [UK articles]

But, then you'd also get:

[sci] -> [sci articles] -> [related articles] -> [related^2 articles]

(This ignores the connections from siblings [sci] <-> [UK], etc, but even
when that's taken into consideration, spidering from [sci] with depth 2
still gets the unwanted [sci] -> [sci articles] -> [related articles].)
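
The same comparison as a rough Python sketch, assuming a cut-down model of the
site with one article and one related article per section, plus the sibling
and "back to front page" links. The graph and the `pages_within` helper are
illustrative assumptions of mine, not the BBC's real link structure or
Plucker's code.

```python
from collections import deque

def pages_within(graph, start, max_depth):
    """Return the set of pages within max_depth links of start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # do not follow links past the depth limit
        for nxt in graph.get(page, ()):
            if nxt not in depth:
                depth[nxt] = depth[page] + 1
                queue.append(nxt)
    return set(depth)

sections = ["uk", "world", "science"]
graph = {"bbc": list(sections)}
for s in sections:
    siblings = [t for t in sections if t != s]
    # Each section links back to the front page, to its siblings,
    # and to its own article.
    graph[s] = ["bbc"] + siblings + [f"{s}-article"]
    graph[f"{s}-article"] = [f"{s}-related"]
    graph[f"{s}-related"] = []

from_bbc = pages_within(graph, "bbc", 2)
from_sci = pages_within(graph, "science", 2)
unwanted = from_sci - from_bbc  # what starting at science drags in
```

Here `unwanted` comes out as just the related article hanging off the science
article, which is the extra content described above.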

-alan


-- 
    Alan Hoyle  -  [EMAIL PROTECTED]  -  http://www.alanhoyle.com/
      "I don't want the world, I just want your half." -TMBG
                 Get Horizontal, Play Ultimate.


_______________________________________________
plucker-list mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list
