On 19/08/10 18:30, Stefan Thomas wrote:
Hey again,
The following modification drops the page number suffix from the
output filename when the "split" flag is disabled. Stefan, could
you review?
Makes sense and works great! Thanks!
On Saturday, 31 July 2010, Adrian Johnson wrote:
I've just done some testing with the patches and found a problem with
the generated output file name.
My original patch would, if no output file name is specified, use the
input filename to generate the output name, the way pdftops does, e.g.
"pdftocairo -ps foo.pdf" will create foo.ps
"pdftocairo -png foo.pdf" will create foo-001.png, foo-002.png, ...
The updated patches instead do:
"pdftocairo -ps foo.pdf" will create cairoout-001.ps
"pdftocairo -png foo.pdf" will create cairoout-001.png,
cairoout-002.png, ...
which is not very user friendly.
The reason for this change was that the previous way of determining
outRoot gave unwanted results when you gave it a remote PDF like:
"pdftocairo -ps http://m.je/test.pdf" will try to create
http://m.je/test.ps
It also mishandled URLs without a real filename:
"pdftocairo -ps http://example.com/get?format=pdf&asset=3948" will try
to create http://example.ps
Another problem was that it would create output files in the directory
where the PDF file resides rather than in the current working directory
which is unusual for *NIX command line tools.
"pdftocairo -ps /media/cdrom0/pdfs/mytest.pdf" -> write error
The current version (with mpsuzuki's fix) will create "cairoout.ps" in
the current working directory in all of those cases. Plus you can always
provide a second parameter if you'd like a different name. I'm liking
the predictability of it.
As I see it we have three options:
1. Current way: Always use "cairoout", unless otherwise specified by the
user.
2. Adrian's way: Create outRoot from input filename. Perhaps with some
improvements like cutting off everything before the last slash (if it
exists), then everything after the first question mark (if it exists),
then everything after the last dot (if it exists). If the result is
empty or otherwise not a valid filename, use "cairoout".
3. pdftoppm's way: Output to STDOUT unless a second parameter with an
output filename is provided.
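
As a rough illustration of option 2, the three cuts could be sketched like
this in Python. The function name `default_out_root` and the `fallback`
parameter are made up for illustration; this is not code from any of the
patches, and the validity check only covers the empty, "." and ".." cases.

```python
def default_out_root(source: str, fallback: str = "cairoout") -> str:
    """Derive an output root from an input path or URL (hypothetical helper)."""
    # 1. Cut off everything before the last slash, if there is one.
    name = source.rsplit("/", 1)[-1]
    # 2. Cut off everything after the first question mark, if there is one.
    name = name.split("?", 1)[0]
    # 3. Cut off everything after the last dot, if there is one.
    if "." in name:
        name = name.rsplit(".", 1)[0]
    # 4. Fall back when nothing usable is left (assumed minimal check).
    if name in ("", ".", ".."):
        return fallback
    return name


if __name__ == "__main__":
    for src in (
        "foo.pdf",
        "http://m.je/test.pdf",
        "http://example.com/get?format=pdf&asset=3948",
        "/media/cdrom0/pdfs/mytest.pdf",
        "http://m.je/",
    ):
        print(src, "->", default_out_root(src))
```

With these rules the cases from this thread come out as "foo", "test",
"get", "mytest" and "cairoout". A URL whose query string itself contains a
slash would still be cut in the wrong place, which is the kind of edge case
that makes this option harder to get right.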
I'd be happy with any of these. What I like about number one is that I
know there aren't any bugs in it. With number two I feel like there is
always going to be a URL or filename that breaks it or acts weird. What I
like about number three is that it's consistent with pdftoppm.
Thoughts?
My preference is to maintain consistency with the other poppler utils.
pdftops, pdftotext, and pdftohtml all use the source file with the
extension changed. I prefer pdftocairo to work the same way.
pdftoppm is the exception in writing to stdout if no output name is
specified. This only makes sense for the ppm format which allows
multiple images concatenated together. Writing multiple images to
stdout does not work with the png or jpeg formats.
Currently none of the other poppler utils (except pdftoppm which writes
to stdout) handle URLs with no output file specified. Whatever solution
is chosen for providing default filenames for URLs should be
consistently implemented across all the poppler utils. I would suggest
implementing behavior similar to wget, i.e. strip off everything up to
the last slash and escape any characters not supported by the
filesystem. Since the other poppler utils do not work with URLs with no
output file specified, I don't think finding a solution for this case
should block the committing of pdftocairo.
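
A minimal sketch of that wget-like idea, in Python for brevity. The function
name `wget_like_name`, the set of characters treated as unsupported, the
percent-style escaping and the "cairoout" fallback are all assumptions made
for illustration; this does not reproduce wget's actual rules, and the
extension would still need to be swapped afterwards.

```python
# Characters assumed to be unsafe on common filesystems (illustrative set).
UNSAFE_CHARS = set('\\/:*?"<>|')


def wget_like_name(url: str, fallback: str = "cairoout") -> str:
    """Build a local file name from a URL (hypothetical helper)."""
    # Strip off everything up to and including the last slash.
    name = url.rsplit("/", 1)[-1]
    # Escape unsupported and control characters as %XX.
    escaped = "".join(
        f"%{ord(c):02X}" if c in UNSAFE_CHARS or ord(c) < 32 else c
        for c in name
    )
    # Nothing left (URL ended in a slash): use the assumed fallback name.
    return escaped or fallback


if __name__ == "__main__":
    print(wget_like_name("http://m.je/test.pdf"))
    print(wget_like_name("http://example.com/get?format=pdf&asset=3948"))
```

Unlike cutting at the question mark, this keeps the query string in the
name, so two different assets fetched from the same script do not collide.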
I'll make a final patch once we've decided the filename issue.
Cheers,
Stefan
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler