Pike wrote:
Hi Ricky, Chris

I've not noticed much difference, with both plugins failing on the feedburner feed:

- http://feeds.feedburner.com/Techcrunch


Strange, but that feed is indeed invalid xml if I wget it.
It starts with newlines and ends with comments. Very
picky, but that's not allowed afaik.


Yes, I ran wget shortly after posting and the feed is clearly invalid XML. Unfortunate as it is, though, the web is full of invalid XML feeds that still need to be parsed somehow. To paraphrase the well-known mantra, my feeling is that these plugins need to be more liberal in what they accept and people need to be more conservative in what they produce.
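As a rough illustration of "liberal in what they accept", here is a minimal Python sketch that strips a byte order mark and leading whitespace before handing the feed to a strict parser. It is not how either plugin works; the function name and the sample feed are made up for the example. As far as I know, trailing comments after the root element are permitted by the XML specification, so the sketch only deals with the leading newlines.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: newlines before the XML declaration, comment after the root.
SAMPLE_FEED = (
    b"\n\n<?xml version='1.0' encoding='UTF-8'?>\n"
    b"<rss version='2.0'><channel><title>Example</title></channel></rss>\n"
    b"<!-- generated by a feed service -->\n"
)

def parse_feed_leniently(raw: bytes) -> ET.Element:
    """Parse feed bytes, tolerating a BOM or whitespace before the XML declaration."""
    # Drop a UTF-8 byte order mark if one is present.
    if raw.startswith(b"\xef\xbb\xbf"):
        raw = raw[3:]
    # The XML declaration must be the very first thing in the document,
    # so remove any whitespace that precedes it.
    cleaned = raw.lstrip()
    return ET.fromstring(cleaned)

if __name__ == "__main__":
    root = parse_feed_leniently(SAMPLE_FEED)
    print(root.tag, root.findtext("channel/title"))
```

This only papers over one specific kind of breakage; feeds with unescaped ampersands or mismatched tags would need a genuinely tolerant parser.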

Another problem I seem to have just now is that some of the search results link to their XML feeds, rather than to the destination of their items.

I have this with all results: what is indexed
seems to be 1 record per feed, containing a
parsed version of the content including all its items,
with sometimes bits of xml and html markup in it.

I was assuming this is the intended behaviour?

It may well be the intended behaviour, but it's not the behaviour I want. The strategy I'd like to employ is the strategy you mention trying to get going on another thread; i.e. to crawl feed items (rather than the feed) with a depth of 1.
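To make the idea concrete, here is a small Python sketch of the first half of that strategy: pull the link of each item out of an RSS 2.0 feed, so that the items (not the feed) become the URLs to fetch at depth 1. This is an assumption-laden illustration and not the crawler's own API; the function name and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical RSS 2.0 feed with two items.
SAMPLE_RSS = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <item>
      <title>First post</title>
      <link>http://example.com/posts/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link> http://example.com/posts/2 </link>
    </item>
  </channel>
</rss>
"""

def item_links(feed_xml: bytes) -> List[str]:
    """Return the <link> URL of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    links = []
    for item in root.iter("item"):
        # Only the item's own <link> is used; the channel-level <link> is ignored.
        link = item.findtext("link")
        if link and link.strip():
            links.append(link.strip())
    return links

if __name__ == "__main__":
    # These URLs would be the seeds for a depth-1 crawl.
    for url in item_links(SAMPLE_RSS):
        print(url)
```

Indexing one record per returned URL would also make search results point at the item's destination and not at the feed.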

If you manage to do this successfully then I'd love to hear how.

R.
