Hi Pluckites,

I am trying to pluck the Wikipedia current events page, http://en.wikipedia.org/wiki/Current_events, and only that page (spidering limit = 1).
I am getting the following error:

  Error: Runtime error parsing document http://en.wikipedia.org/wiki/Current_events: unexpected char in declaration: '<'
  Parsing failed.

[See below for full error listing]

Having read the previous threads on this error, I checked the page at the W3C validator. It validates: "This Page Is Valid XHTML 1.0 Transitional!"

http://validator.w3.org/check?uri=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FCurrent_events

Looking at the code, Wikipedia seems to do some funny things with style sheets:

  <style type="text/css" media="screen,projection">/*<![CDATA[*/ @import "/style/monobook/main.css"; /*]]>*/</style>

My coding is not strong enough to know if this is the problem.

I have included the relevant .ini file at the end of this message.

As has been said here before, this is an oldie but a goodie:

http://lists.rubberchicken.org/pipermail/plucker-list/2003-January/001341.html

However, I seem to have found a new wrinkle.

Any advice gratefully appreciated.

Jonathan O'Donnell
mailto:[EMAIL PROTECTED]
http://purl.nla.gov.au/net/jod
+61 4 2575 5829

================== Full error listing begins ==============
Initializing Plucker spidering engine...
-----------------------------------------------------------
Updating channel: Wikipedia current events...
-----------------------------------------------------------
Pluckerdir is 'C:Program FilesPlucker'...
Using proxy '' with authentication for user ''...
---- 0 collected, 1 to do ----
Processing http://en.wikipedia.org/wiki/Current_events...
Retrieved ok.
Error: Runtime error parsing document http://en.wikipedia.org/wiki/Current_events: unexpected char in declaration: '<'
Parsing failed.
---- all 0 pages retrieved and parsed ----
Writing out collected data...
Writing document 'Wikipedia current events' to file C:Program FilesPluckerchannels/Wikipediacurrentevents/Wikipediacurrentevents.pdb
Traceback (most recent call last):
  File "C:Program FilesPlucker/parser/python/PyPlucker/Spider.py", line 1734, in ?
    sys.exit(realmain(None))
  File "C:Program FilesPlucker/parser/python/PyPlucker/Spider.py", line 1719, in realmain
    retval = main (config, exclusion_lists)
  File "C:Program FilesPlucker/parser/python/PyPlucker/Spider.py", line 1147, in main
    mapping = writer.write (verbose=verbosity, alias_list=alias_list)
  File "C:Program FilesPlucker/parser/pythonPyPluckerWriter.py", line 535, in write
    result = Writer.write (self, verbose, alias_list=alias_list)
  File "C:Program FilesPlucker/parser/pythonPyPluckerWriter.py", line 352, in write
    raise RuntimeError("The collection process failed to generate a 'home' document")
RuntimeError: The collection process failed to generate a 'home' document
Installing channel output to destinations...
Setting new due date...
Tasks completed for all channels.
================ Full error listing ends =====================

================ Relevant plucker.ini file starts ===============
[Wikipediacurrentevents]
copy_to_dir=I:MULTIMEDStaff_TeamToJonathanJonathanMy_Web
directory_on_card=
doc_file=channels/Wikipediacurrentevents/Wikipediacurrentevents
doc_name=Wikipedia current events
handheld_target_storage_mode=0
is_usb_pause=1
maxheight=500
maxwidth=300
user=Jonathan O'Donnell
home_url=http://en.wikipedia.org/wiki/Current_events
home_maxdepth=1
home_stayonhost=0
home_stayondomain=1
depth_first=0
verbosity=1
status_line_length=60
referrer=
user_agent=
before_command=
after_command=
home_url_pattern=
exclusion_lists=
charset=
indent_paragraphs=0
tables=0
anchor_color=#0000FF
bpp=0
alt_text=1
alt_maxwidth=0
alt_maxheight=0
try_reduce_bpp=0
try_reduce_dimension=0
compression=zlib
image_compression_limit=50
category=
no_urlinfo=1
owner_id_build=
copyprevention_bit=0
backup_bit=0
launchable_bit=0
big_icon=
small_icon=
update_enabled=1
update_frequency=1
update_period=daily
update_base=2004-11-13T10:31:00
close_on_exit=1
close_on_error=1
======================= Relevant plucker.ini file ends ================

_______________________________________________
plucker-list mailing list [EMAIL PROTECTED] http://lists.rubberchicken.org/mailman/listinfo/plucker-list
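
One way to test the suspicion about the style sheet quoted above would be to remove the CDATA markers from a saved copy of the page and try parsing that copy instead. The following is only a rough sketch under that assumption: strip_cdata_markers is a hypothetical helper, not part of Plucker, and it has not been tried against the Plucker parser, so it cannot confirm that the CDATA wrapper is what triggers the "unexpected char in declaration" error.

```python
import re

# Hypothetical helper, not part of Plucker: removes CDATA section markers
# (optionally wrapped in CSS/JS comments) while keeping the enclosed text.
# The patterns are deliberately crude and would also strip a literal "]]>"
# that appears elsewhere in the page.
_CDATA_OPEN = re.compile(r"(/\*\s*)?<!\[CDATA\[(\s*\*/)?")
_CDATA_CLOSE = re.compile(r"(/\*\s*)?\]\]>(\s*\*/)?")


def strip_cdata_markers(html: str) -> str:
    """Return html with CDATA open/close markers removed."""
    html = _CDATA_OPEN.sub("", html)
    return _CDATA_CLOSE.sub("", html)


# The style element quoted in the message above.
sample = (
    '<style type="text/css" media="screen,projection">'
    '/*<![CDATA[*/ @import "/style/monobook/main.css"; /*]]>*/'
    "</style>"
)
cleaned = strip_cdata_markers(sample)
print(cleaned)
```

If a copy of the page cleaned this way parses while the original does not, that would point at the CDATA wrapper. If both fail, the cause lies elsewhere in the page.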

