Re: Problem pluckering the LA Times

David A. Desrosiers Tue, 19 Mar 2002 09:00:58 -0800


> I don't think the problem is with Plucker here. If you actually try
> going to that URL in a browser and click a link, I don't actually get
> anything back from the server in terms of content.


        What browser? I just tried Netscape, Konqueror, Dillo, Mozilla, and
IE, and they all return content. Clicking the "Next" button repeatedly
advances me through their "Top Stories" header pages as expected. When I
reach "lno=9", the page has only "Home" and "Back" buttons, indicating I'm at
the end of the story list.

> It looks like it might be a problem on their side.

        It may be the way they handle spiders, but the site definitely works
in a "real" browser (vs. Plucker, wget, etc.). The problem can be compensated
for in the parser, and I believe that's where it lies at the moment.

        I believe the trouble comes from links whose HREF attribute is
constructed as follows:

        <a href="?csc=1&crq=300730&src=Top+Stories&lno=9">Ex-Workers Face
        Off With China Oil Firm</a>

        Look very closely at how the HREF value starts: "?csc...". It has no
host or path, only a query string, so it has to be resolved against the URL
of the current page. That may be confusing the parser as it splits links
from (or joins them to) the main URI and protocol. It's bad design to write
links that way, but it's perfectly legitimate.
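
        For illustration, here is a rough sketch of how a query-only link
like the one above should resolve, using urljoin from Python's standard
library. The base URL is a made-up placeholder (not the real LA Times
address), and resolve_link is a hypothetical helper, not a function in
Plucker's parser.

```python
from urllib.parse import urljoin

# Hypothetical address of the page containing the link (placeholder only,
# not the actual LA Times URL).
BASE_URL = "http://www.example.com/news/top.html?csc=1&lno=8"

# The HREF value quoted above: a query string with no host or path.
HREF = "?csc=1&crq=300730&src=Top+Stories&lno=9"


def resolve_link(base_url: str, href: str) -> str:
    """Resolve a possibly relative href against the page it appeared on.

    For a query-only href, the scheme, host, and path are taken from
    base_url and only the query string is replaced.
    """
    return urljoin(base_url, href)


if __name__ == "__main__":
    print(resolve_link(BASE_URL, HREF))
```

        With these placeholder values the link keeps the page's host and
path ("http://www.example.com/news/top.html") and only swaps in the new
query string, which is what the browsers appear to be doing.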


[dd]
