Friday, August 29, 2003, 20:23
Dear Friends,
I've been trying to convert a series of HTML documents on my PC, but I
keep getting a strange error at the end of the conversion. There are
about 50 HTML files, starting from the index. Since I did not know the
correct link depth between them, I simply guessed 10, and the final
message says "3165 parsed".
The problem seems to occur when the parsed documents are being written
out to the PDB file: the PDB is never created, and Python reports an
error instead.
I am using Plucker Desktop 1.2.01, and I've never had such a problem,
even with very large numbers of documents.
Can anyone help? The full error message is as follows:
--- start of error log ---
---- all 3165 pages retrieved and parsed ----
Writing out collected data...
Writing document 'Thinking in Java, 3rd Ed.' to file C:\Arquivos de programas\Plucker\channels/ThinkinginJava3rdEd/ThinkinginJava3rdEd.pdb
Traceback (most recent call last):
  File "C:\Arquivos de programas\Plucker\PyPlucker\Spider.py", line 1532, in ?
    sys.exit(realmain())
  File "C:\Arquivos de programas\Plucker\PyPlucker\Spider.py", line 1524, in realmain
    retval = main (config, exclusion_lists)
  File "C:\Arquivos de programas\Plucker\PyPlucker\Spider.py", line 1046, in main
    mapping = writer.write (verbose=verbosity, alias_list=alias_list)
  File "C:\Arquivos de programas\Plucker\PyPlucker\Writer.py", line 518, in write
    result = Writer.write (self, verbose, alias_list=alias_list)
  File "C:\Arquivos de programas\Plucker\PyPlucker\Writer.py", line 310, in write
    self._mapper = Mapper(self._collection, alias_list.as_dict())
  File "C:\Arquivos de programas\Plucker\PyPlucker\Writer.py", line 102, in __init__
    self._get_id_for_doc(doc)
  File "C:\Arquivos de programas\Plucker\PyPlucker\Writer.py", line 112, in _get_id_for_doc
    id = self._url_to_id_mapping.get(doc.get_url())
AttributeError: 'None' object has no attribute 'get_url'
Installing channel output to destinations...
Setting channels new due date
Tasks completed for all channels.
--- end of error log ---
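My reading of the last two frames is that Mapper.__init__ walks over the
collected documents and calls get_url() on each one, and that at least
one entry in the collection is None instead of a document object. The
sketch below only illustrates that failure mode under that assumption:
FakeDoc and build_id_mapping are hypothetical names, not PyPlucker's
actual code, and on a modern Python the message reads 'NoneType' rather
than 'None'.

```python
# Hypothetical sketch: FakeDoc and build_id_mapping are illustrative
# names only, NOT taken from PyPlucker's Writer.py.

class FakeDoc:
    """Stand-in for a parsed document object that knows its URL."""

    def __init__(self, url):
        self._url = url

    def get_url(self):
        return self._url


def build_id_mapping(collection, skip_missing=False):
    """Assign a numeric record id to each document's URL.

    If the collection contains None (assumed here to be a page that was
    counted but never stored), calling get_url() on it raises
    AttributeError, like the last frame of the traceback.
    """
    mapping = {}
    next_id = 1
    for doc in collection:
        if doc is None and skip_missing:
            continue  # defensive: ignore placeholders for missing pages
        url = doc.get_url()
        if url not in mapping:
            mapping[url] = next_id
            next_id += 1
    return mapping


docs = [FakeDoc("index.html"), None, FakeDoc("chap1.html")]

try:
    build_id_mapping(docs)
except AttributeError as exc:
    print("naive mapping failed:", exc)

print(build_id_mapping(docs, skip_missing=True))
```

If that guess is right, the real fix would be to find out why a None
ends up in the collection (perhaps related to the guessed link depth of
10), rather than just skipping it.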
I have tried other sites and documents, and they convert fine. I also
tried cleaning the HTML files with HTML Tidy (this worked before when
converting HTML documents generated by PowerPoint), but it made no
difference here.
Sincerely,
Michael A. Lees
[EMAIL PROTECTED]
_______________________________________________
plucker-list mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list