I've been using Plucker for only a couple of weeks now (and 1.2 for just a week or so). I
have some questions, though. Hopefully they are as easy to answer as the one about blue
anchors. (PS: I did RTFM, but couldn't find M :-) ).
1) Has the documentation been updated for version 1.2 (or for anything after 1.1.13,
which is the last I got)?
2) I'm using Windows and I've tried the different options for image_parser in
plucker.ini. Most of them don't work; only pil2 and windowspil do.
But:
2a) windowspil doesn't make a link from small pictures (the maxwidth x maxheight ones)
to the larger versions (the alt-maxwidth x alt-maxheight ones). I've tried to figure
out the source code (I'm new to Python, so that was fun) and I found that some image
parsers are derived from ImageParser, which seems to be the base class, but others aren't:
class ImageParser:
class ImageMagickImageParser:
class NetPBMImageParser:
class NewNetPBMImageParser(ImageParser):
class PythonImagingLibraryParser:
class NewPythonImagingLibraryParser(ImageParser):
class WindowsImageParser:
class WindowsPILImageParser:
Note that pil2 (NewPythonImagingLibraryParser) is derived from ImageParser.
I also found that at the bottom of ImageParser::get_plucker_doc(), there is a piece of
code that seems to gather the bigger picture by calling _related_images(). I also saw
that these images are added by a call to PluckerImageDocument::add_related_image().
Then I looked at where add_related_image() is used and found two places: its
definition and its use at the bottom of ImageParser::get_plucker_doc(). Now you
probably see why I started with the class list: windowspil (WindowsPILImageParser) isn't
derived from ImageParser AND it doesn't generate those related images.
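To make the point concrete, here is a minimal sketch of the inheritance pattern described above. The class and method names are taken from the Plucker source as quoted, but every body and constructor signature here is a simplified stand-in written for illustration (modern Python, not the Plucker code itself), and the "#large" URL is purely hypothetical.

```python
# Simplified stand-ins: the real Plucker classes do far more work and have
# different constructors. Only the inheritance relationship is illustrated.

class PluckerImageDocument:
    def __init__(self, url):
        self.url = url
        self.related = []

    def add_related_image(self, url):
        self.related.append(url)


class ImageParser:
    """Base class: builds the document and attaches the larger version."""

    def __init__(self, url):
        self.url = url

    def _related_images(self):
        # Hypothetical: pretend one larger (alt-maxwidth) version exists.
        return [self.url + "#large"]

    def get_plucker_doc(self):
        doc = PluckerImageDocument(self.url)
        for related_url in self._related_images():
            doc.add_related_image(related_url)
        return doc


class NewPythonImagingLibraryParser(ImageParser):
    """Derived parser (pil2): inherits the related-image handling."""


class WindowsPILImageParser:
    """Standalone parser (windowspil): its own get_plucker_doc, no linking."""

    def __init__(self, url):
        self.url = url

    def get_plucker_doc(self):
        return PluckerImageDocument(self.url)


pil2_doc = NewPythonImagingLibraryParser("file:C:/x.jpg").get_plucker_doc()
winpil_doc = WindowsPILImageParser("file:C:/x.jpg").get_plucker_doc()
print(pil2_doc.related)    # one related image
print(winpil_doc.related)  # no related images
```

If this reading of the source is right, the fix for windowspil would presumably be to derive it from ImageParser as well, but I haven't tried that.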
2b) pil2 does call add_related_image(), but has another problem: it crashes when
I use bpp=16.
C:\>"c:\program files\plucker\python\python" "c:\program files\plucker\pyplucker\spider.py" "--pluckerhome=c:\program files\plucker" --home-url=file:c:/x.html --doc-file=c:\x --bpp=16
Pluckerdir is 'c:\program files\plucker'...
---- 0 collected, 1 to do ----
Processing file:C:/x.html...
Retrieved ok.
Parsed ok; 1 image.
---- 1 collected, 1 to do ----
Processing file:C:\x.jpg...
Retrieved ok.
Error: Unknown error parsing document file:C:\x.jpg:
Traceback (innermost last):
  File "C:\Program Files\plucker\PyPlucker\Parser.py", line 45, in generic_parser
    return parsed.get_plucker_doc ()
  File "C:\Program Files\plucker\PyPlucker\ImageParser.py", line 218, in get_plucker_doc
    raise ImageSize ("Image data too large (%d bytes) for a Plucker image record"
ImageSize: Image data too large (209024 bytes) for a Plucker image record (max 61440 bytes) when plucked at 500x209x16! Scale it down.
Parsed ok.
---- all 3 pages retrieved and parsed ----
Writing out collected data...
Writing document 'x' to file c:\x.pdb
Traceback (innermost last):
  File "c:\program files\plucker\pyplucker\spider.py", line 1512, in ?
    sys.exit(realmain())
  File "c:\program files\plucker\pyplucker\spider.py", line 1505, in realmain
    retval = main (config, exclusion_lists)
  File "c:\program files\plucker\pyplucker\spider.py", line 1041, in main
    mapping = writer.write (verbose=verbosity, alias_list=alias_list)
  File "C:\Program Files\plucker\PyPlucker\Writer.py", line 518, in write
    result = Writer.write (self, verbose, alias_list=alias_list)
  File "C:\Program Files\plucker\PyPlucker\Writer.py", line 310, in write
    self._mapper = Mapper(self._collection, alias_list.as_dict())
  File "C:\Program Files\plucker\PyPlucker\Writer.py", line 102, in __init__
    self._get_id_for_doc(doc)
  File "C:\Program Files\plucker\PyPlucker\Writer.py", line 112, in _get_id_for_doc
    id = self._url_to_id_mapping.get(doc.get_url())
AttributeError: 'None' object has no attribute 'get_url'
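As a side note on the first error, the numbers can be checked with a rough calculation. This is only a sketch in modern Python: it assumes the pixel data is stored uncompressed at width x height x bpp bits and ignores any record header or row padding (the reported 209024 bytes is slightly above the raw 209000, so there is presumably some overhead I'm not accounting for). The helper names are mine, not Plucker's.

```python
import math

MAX_RECORD_BYTES = 61440  # limit quoted in the ImageSize error message


def raw_image_bytes(width, height, bpp):
    """Uncompressed pixel data size in bytes, ignoring any record header."""
    return (width * height * bpp + 7) // 8


def fit_dimensions(width, height, bpp, limit=MAX_RECORD_BYTES):
    """Largest size with roughly the same aspect ratio that fits in limit."""
    size = raw_image_bytes(width, height, bpp)
    if size <= limit:
        return width, height
    # Area scales with the square of the linear factor, hence the sqrt.
    scale = math.sqrt(limit / size)
    new_width = max(1, int(width * scale))
    new_height = max(1, int(height * scale))
    return new_width, new_height


print(raw_image_bytes(500, 209, 16))
print(fit_dimensions(500, 209, 16))
```

So under these assumptions a 500x209 image at 16 bpp would need to be scaled to a bit over half its width and height before it fits in one record.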
I think _get_id_for_doc should simply return when doc is None, which would be a bit more graceful.
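Something along these lines is what I have in mind. It is a minimal sketch, not the actual Writer.py code: the Mapper below is a cut-down stand-in, the collection is assumed to be a plain list, and the Doc class, the skipped counter and the id numbering are hypothetical names added just to make the example runnable.

```python
class Doc:
    """Hypothetical minimal document: only get_url() is needed here."""

    def __init__(self, url):
        self._url = url

    def get_url(self):
        return self._url


class Mapper:
    """Cut-down stand-in for the Mapper in Writer.py (assumed behaviour)."""

    def __init__(self, collection):
        self._url_to_id_mapping = {}
        self._next_id = 1
        self.skipped = 0  # hypothetical counter, for reporting only
        for doc in collection:
            self._get_id_for_doc(doc)

    def _get_id_for_doc(self, doc):
        # The proposed guard: a document that failed to parse is None,
        # so skip it instead of calling get_url() on it.
        if doc is None:
            self.skipped += 1
            return None
        url = doc.get_url()
        doc_id = self._url_to_id_mapping.get(url)
        if doc_id is None:
            doc_id = self._next_id
            self._next_id += 1
            self._url_to_id_mapping[url] = doc_id
        return doc_id


mapper = Mapper([Doc("file:C:/x.html"), None, Doc("file:C:/x.jpg")])
print(mapper.skipped)
```

With the guard in place, the image that failed to parse would just be left out of the .pdb instead of aborting the whole run, although it might be better still not to put None into the collection in the first place.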
I hope someone can help me here.
agb