Hi Pluckites

I am trying to pluck the Wikipedia current events page
http://en.wikipedia.org/wiki/Current_events
and only that page (spidering limit = 1).

I am getting the following error:
Error:  Runtime error parsing document 
http://en.wikipedia.org/wiki/Current_events: 
unexpected char in declaration: '<'
  Parsing failed.
[See below for full error listing]

Having read the previous threads on this error, I checked the page at the 
W3C validator.  It validates.
"This Page Is Valid XHTML 1.0 Transitional!"
http://validator.w3.org/check?uri=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FCurrent_events

Looking at the page source, Wikipedia seems to do some funny things with
style sheets: the import is wrapped in a CDATA section hidden inside CSS
comments.

<style type="text/css" media="screen,projection">/*<![CDATA[*/ @import 
"/style/monobook/main.css"; 
/*]]>*/</style>
My coding is not strong enough to know whether this is the problem.
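
To test that guess, here is a minimal sketch in Python (the language of the Plucker parser) that strips the CDATA markers from a saved copy of the page before it is plucked. The function name strip_cdata_markers and the regular expression are my own assumptions, not part of Plucker, and I have not confirmed that the CDATA section is what trips the parser.

```python
import re

# Matches the CDATA open/close markers, with or without the /* ... */
# comment wrappers that appear around them inside the <style> element.
# This pattern is an assumption based on the snippet quoted above.
_CDATA_MARKER = re.compile(
    r"/\*\s*<!\[CDATA\[\s*\*/"   # /*<![CDATA[*/
    r"|/\*\s*\]\]>\s*\*/"        # /*]]>*/
    r"|<!\[CDATA\["              # bare <![CDATA[
    r"|\]\]>"                    # bare ]]>
)


def strip_cdata_markers(html):
    """Return html with the CDATA markers removed; wrapped content is kept."""
    return _CDATA_MARKER.sub("", html)


# The style element quoted above, joined onto one line.
sample = (
    '<style type="text/css" media="screen,projection">'
    '/*<![CDATA[*/ @import "/style/monobook/main.css"; /*]]>*/'
    '</style>'
)

cleaned = strip_cdata_markers(sample)
print(cleaned)
```

If a local copy cleaned this way plucks without the "unexpected char in declaration" error, that would point at the CDATA section. If the error remains, the cause is elsewhere in the page.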

I have included the relevant .ini file at the end of this document.

As has been said here before, this is an oldie but a goodie.
http://lists.rubberchicken.org/pipermail/plucker-list/2003-January/001341.html
However, I seem to have found a new wrinkle.

Any advice would be gratefully appreciated.


Jonathan O'Donnell
mailto:[EMAIL PROTECTED]
http://purl.nla.gov.au/net/jod
+61 4 2575 5829

================== Full error listing begins ==============
Initializing Plucker spidering engine...
 
-----------------------------------------------------------
Updating channel: Wikipedia current events...
-----------------------------------------------------------
Pluckerdir is 'C:\Program Files\Plucker'...
Using proxy '' with authentication for user ''...
---- 0 collected, 1 to do ----
Processing http://en.wikipedia.org/wiki/Current_events...
  Retrieved ok.
Error:  Runtime error parsing document 
http://en.wikipedia.org/wiki/Current_events: 
unexpected char in declaration: '<'
  Parsing failed.
---- all 0 pages retrieved and parsed ----
Writing out collected data...
Writing document 'Wikipedia current events' to file C:\Program Files\Plucker\channels/Wikipediacurrentevents/Wikipediacurrentevents.pdb
Traceback (most recent call last):
  File "C:\Program Files\Plucker/parser/python/PyPlucker/Spider.py", line 1734, in ?
    sys.exit(realmain(None))
  File "C:\Program Files\Plucker/parser/python/PyPlucker/Spider.py", line 1719, in realmain
    retval = main (config, exclusion_lists)
  File "C:\Program Files\Plucker/parser/python/PyPlucker/Spider.py", line 1147, in main
    mapping = writer.write (verbose=verbosity, alias_list=alias_list)
  File "C:\Program Files\Plucker/parser/python\PyPlucker\Writer.py", line 535, in write
    result = Writer.write (self, verbose, alias_list=alias_list)
  File "C:\Program Files\Plucker/parser/python\PyPlucker\Writer.py", line 352, in write
    raise RuntimeError("The collection process failed to generate a 'home' document")
RuntimeError: The collection process failed to generate a 'home' document
Installing channel output to destinations...
Setting new due date...
Tasks completed for all channels.
================ Full error listing ends =====================

================ Relevant plucker.ini file starts ===============
[Wikipediacurrentevents]
copy_to_dir=I:MULTIMEDStaff_TeamToJonathanJonathanMy_Web
directory_on_card=
doc_file=channels/Wikipediacurrentevents/Wikipediacurrentevents
doc_name=Wikipedia current events
handheld_target_storage_mode=0
is_usb_pause=1
maxheight=500
maxwidth=300
user=Jonathan O'Donnell
home_url=http://en.wikipedia.org/wiki/Current_events
home_maxdepth=1
home_stayonhost=0
home_stayondomain=1
depth_first=0
verbosity=1
status_line_length=60
referrer=
user_agent=
before_command=
after_command=
home_url_pattern=
exclusion_lists=
charset=
indent_paragraphs=0
tables=0
anchor_color=#0000FF
bpp=0
alt_text=1
alt_maxwidth=0
alt_maxheight=0
try_reduce_bpp=0
try_reduce_dimension=0
compression=zlib
image_compression_limit=50
category=
no_urlinfo=1
owner_id_build=
copyprevention_bit=0
backup_bit=0
launchable_bit=0
big_icon=
small_icon=
update_enabled=1
update_frequency=1
update_period=daily
update_base=2004-11-13T10:31:00
close_on_exit=1
close_on_error=1
======================= Relevant plucker.ini file ends ================

