I am using JPluck 2.0.1 and have added the NY Times, which was working fine. I am very impressed with this application.

However, I was only crawling to a maximum depth of 1. Many of the main articles are split into a part one and a part two, so I added something like .*pagewanted=2.* to my inclusion patterns (and tried a bunch of variations on that). I was hoping it would pull in the second page that the first page of an article links to (when an article has two pages, the link to the second one ends with pagewanted=2). Instead it seemed to do the opposite and excluded pretty much everything except the main page.
I can't figure out why putting something on the include list acts to exclude everything else.
I tried increasing my depth to 2, but then The Times got MUCH larger, and there is not much extra in there that I need.
I don't really want to go through by hand and exclude all of the extra stuff.
I also tried adding an additional inclusion of .* and then I was right back to what I had started with.
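To make the behavior I am seeing concrete, here is a small Python model of what the include list appears to be doing. This is only my guess from the outside, not JPluck's actual code: the function name is_included and the example.com URLs are made up for illustration, and I am assuming a pattern has to match the whole URL.

```python
import re

def is_included(url, include_patterns):
    """Whitelist model (an assumption, not JPluck's real logic):
    with no patterns every URL passes; otherwise a URL must fully
    match at least one pattern to be kept."""
    if not include_patterns:
        return True
    return any(re.fullmatch(p, url) for p in include_patterns)

# Hypothetical URLs, for illustration only.
article = "http://example.com/story.html"
page_two = "http://example.com/story.html?pagewanted=2"

print(is_included(article, []))                            # True
print(is_included(article, [r".*pagewanted=2.*"]))         # False
print(is_included(page_two, [r".*pagewanted=2.*"]))        # True
print(is_included(article, [r".*pagewanted=2.*", r".*"]))  # True
```

If the include list really works like this, it would explain both things I saw: one pattern throws away every link that does not match it, and adding .* lets everything back in.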
So is there an easy way in JPluck to do what in English would read:
"do the main page of the NYTimes and one level deeper. If anything you find in that search ends with pagewanted=2 then additionally do that page, even if it is an additional depth of 1"?
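In case the English is ambiguous, here is the rule I have in mind as a short Python sketch. It says nothing about how JPluck works internally; select_pages, the links dictionary, and the short page names are all hypothetical, and the link graph is passed in by hand instead of being fetched.

```python
import re
from collections import deque

# A link qualifies for the extra level only if it ends with pagewanted=2.
EXTRA = re.compile(r"pagewanted=2$")

def select_pages(start, links, max_depth=1):
    """Breadth-first walk over a link graph.

    links maps a URL to the URLs it points to. Pages within max_depth
    are always taken; a page exactly one level beyond max_depth is
    taken only if its URL ends with pagewanted=2.
    """
    selected = [start]
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        url, depth = frontier.popleft()
        for target in links.get(url, []):
            if target in seen:
                continue
            next_depth = depth + 1
            within_limit = next_depth <= max_depth
            extra_page = next_depth == max_depth + 1 and EXTRA.search(target)
            if within_limit or extra_page:
                seen.add(target)
                selected.append(target)
                frontier.append((target, next_depth))
    return selected

# Hypothetical link graph: a front page, two articles, one second page.
links = {
    "front": ["story1", "story2"],
    "story1": ["story1?pagewanted=2", "ads"],
    "story2": ["archive"],
    "story1?pagewanted=2": ["story1?pagewanted=3"],
}
pages = select_pages("front", links)
```

With that graph I would want front, story1, story2 and story1?pagewanted=2 to be fetched, and ads, archive and anything deeper to be skipped.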


I was hoping that the include list would do that, but it seems not to.

Thanks,
--Adam