> The ideal would be to input a list of URLs that must be 
> included in the collection, with a separate depth for
> each item on the list.  And an assumption that there 
> are interlinks between all of these.  

Do you know about the max_depth attribute? 

Or maybe it is maxdepth - they both appear, and 
I don't have the time this morning to verify that 
they are treated the same.  This is probably a good 
time for 

Disclosure:  I haven't used it much; there may be 
bugs in any of the code, the documentation, or 
my understanding, but this should get you started.

maxdepth is one of the plucker-specific attributes.  
Most web browsers will ignore it as unknown, but 
plucker will say "oh!  I thought I was going out x 
more layers, but from this link in particular, I'll 
now go out y layers instead."

> have a home page which then links to three different
> pages, A, B, C, D and E.  One could then get such 
> careful control that one could include the home page,
> two levels of links starting with A, three levels starting
> with C, just page B itself and no links from it, and 
> exclude D and E.  

<a href=A maxdepth=2>A</a>
<a href=B maxdepth=0>B</a>
<a href=C maxdepth=3>C</a>
<a href=D>D</a>
<a href=E>E</a>

You would need to put D and E on an exclude list, since
they appear directly in the home document.  Of course,
if you're editing the document to add a maxdepth, you
might as well take out the links to D and E while you're 
in there.
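
To make the depth arithmetic concrete, here is a toy model
in Python.  It is not Plucker's code; the link graph, the
page names A1, C1, etc., and the crawl() function are all
made up for illustration.  It assumes maxdepth=n means
"this page plus n further levels", which is how the example
above reads; Plucker may count differently.

```python
from collections import deque

# Hypothetical link graph: page -> [(target, maxdepth override or None)].
LINKS = {
    "home": [("A", 2), ("B", 0), ("C", 3), ("D", None), ("E", None)],
    "A": [("A1", None)], "A1": [("A2", None)], "A2": [("A3", None)],
    "B": [("B1", None)],
    "C": [("C1", None)], "C1": [("C2", None)],
    "C2": [("C3", None)], "C3": [("C4", None)],
    "D": [("D1", None)],
    "E": [("E1", None)],
}


def crawl(start, home_depth, exclude=()):
    """Breadth-first walk; 'remaining' counts levels still allowed below a page."""
    collected = []
    seen = set()
    queue = deque([(start, home_depth)])
    while queue:
        page, remaining = queue.popleft()
        if page in seen or page in exclude:
            continue
        seen.add(page)
        collected.append(page)
        if remaining <= 0:
            continue  # this page is kept, but its links are not followed
        for target, override in LINKS.get(page, []):
            # A per-link maxdepth replaces the inherited countdown.
            child = remaining - 1 if override is None else override
            queue.append((target, child))
    return collected


pages = crawl("home", 1, exclude={"D", "E"})
```

In this model, D and E still show up if you leave out the
exclude set, because the home page links to them directly.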

If you want to associate linkdepth with urlpatterns 
automatically, you will need to change the code, but
this gives you a start.
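
One possible workaround that avoids touching the parser is
to generate the home document from a list of patterns.  The
sketch below is only that: DEPTH_RULES, depth_for() and
anchor() are hypothetical names, and the example.org URLs
are placeholders.  It just writes the maxdepth attribute
into each link for you.

```python
import re
from html import escape

# Hypothetical rules: first matching pattern decides the depth.
DEPTH_RULES = [
    (re.compile(r"^https?://example\.org/news/"), 2),
    (re.compile(r"^https?://example\.org/docs/"), 3),
]


def depth_for(url, default=None):
    """Return the depth for the first rule matching url, else default."""
    for pattern, depth in DEPTH_RULES:
        if pattern.search(url):
            return depth
    return default


def anchor(url, label):
    """Build one home-page link, adding maxdepth only when a rule matches."""
    depth = depth_for(url)
    attr = "" if depth is None else f' maxdepth="{depth}"'
    return f'<a href="{escape(url, quote=True)}"{attr}>{escape(label)}</a>'


print(anchor("http://example.org/news/today", "News"))
print(anchor("http://example.org/other", "Other"))
```

This only covers links that appear in the home document
itself; links found deeper in the crawl would still need a
code change.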

Also, the way the caching works, there may be a risk
that it would exclude links that are close enough to
some (but not all) starting nodes.

From the example above, assume B and C both link
to F.  If it reads B first, it won't fetch F (because B 
says to stop).  Then, when it reads C, it may say 
"whoa, I already looked at F" and not bother to follow 
up.

I *think* what actually happens is that it treats the 
attributes as part of the key, so that it will follow up.
Unfortunately, this means that if A also links to F,
you will fetch (and save) it and its children twice,
with only the C branch pointing to grandchildren.
(Or *maybe* the alias list will catch it, and it will
fetch either 2 or 3 levels randomly.  There might 
be room for improvements here... :D)
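
The two possibilities can be compared with a toy model.
Again, this is not Plucker's code: the graph, the pages
G, H and I, and the crawl() function are invented, and I am
assuming a simple breadth-first walk.  Here A, B and C all
link to F, and the only difference between the two runs is
whether the "already seen" key is the URL alone or the URL
plus the remaining depth.

```python
from collections import Counter, deque

# Hypothetical graph: A, B and C all link to F; F -> G -> H -> I.
LINKS = {
    "home": [("A", 2), ("B", 0), ("C", 3)],
    "A": [("F", None)],
    "B": [("F", None)],
    "C": [("F", None)],
    "F": [("G", None)],
    "G": [("H", None)],
    "H": [("I", None)],
}


def crawl(key_includes_depth):
    """Count fetches per page under one of two cache-key choices."""
    fetches = Counter()
    seen = set()
    queue = deque([("home", 1)])
    while queue:
        page, remaining = queue.popleft()
        key = (page, remaining) if key_includes_depth else page
        if key in seen:
            continue
        seen.add(key)
        fetches[page] += 1
        if remaining <= 0:
            continue  # keep the page, do not follow its links
        for target, override in LINKS.get(page, []):
            child = remaining - 1 if override is None else override
            queue.append((target, child))
    return fetches


by_url = crawl(key_includes_depth=False)
by_url_and_depth = crawl(key_includes_depth=True)
```

With the URL-only key, F is reached through A first, so the
deeper visit through C is skipped and H is never fetched.
With the depth in the key, F and G are each fetched twice
and only the C branch reaches H.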

-jJ

_______________________________________________
plucker-dev mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev
