Hi Michael.
> Here's an updated version of my previous patch, which adds a parser in addition to the tokeniser, and also makes major changes to pdf-obj.[ch]. There's still some incomplete stuff (like the pdf_obj_dict_ functions) but it would probably be OK to merge once I complete the copyright assignment. The next component to add would be a reader that can read the xref table and resolve indirect object references.

Before proceeding with the implementation of code pertaining to the object layer, we need to complete other activities. I estimate that by the end of this weekend I will finish the drafts for:

- The public API to be offered by the object layer to the client applications and the upper layers.

- The overall design of the object layer:

  + Which modules we are going to implement, the specific role of each module, and how the modules collaborate to implement the public interface. This is not trivial. Among other things, we will have to decide how to implement the garbage collection of indirect objects when saving the document, how to manage the creation of new stream objects (the Acrobat SDK uses temporary files, for example), and a long list of other things. Some decisions in this phase will have a direct impact on the implementation of the parser. For example: do we want a separate parser for xref tables? Can we use the same data type (pdf_obj_t) both for the public interface and inside the parser? Unlike the base layer (where the modules are pretty independent of one another), the object layer requires quite careful design in order to avoid severe design problems and thus later rewrites.

  + The internal interfaces between the modules, which should be carefully documented in the architecture guide and validated against the collaboration schema to detect flaws.

- Some guidelines about how to document the internal interfaces (the modules in the base layer implement public interfaces only) in the architecture guide, and other housekeeping.
Then we will have to work intensively (together) on the drafts to get a coherent and good enough architecture for the layer. After that I will create the tasks for the implementation of the layer. Following our general development procedures (see http://www.gnupdf.org/manuals/gnupdf-hg.html/Development-procedures.html) the tasks will cover:

- The design of the module-level tests.
- The implementation of the modules.
- The implementation of the tests.

Note also that the existing code in src/object/ is by all means obsolete and useless. I wrote it too quickly, before we started to follow a more "structured" development method.

So next week we will have a lot of work designing and brainstorming. I am trying to set up a reasonable base with the drafts, but a lot of things will be incomplete or simply wrong. Of course your code for the tokeniser will be quite useful, since I think that we will be able to use it as-is.

> One annoying thing I noticed was that pdf_list_t needs a heap allocation to use an iterator, which means that pdf_obj_equal_p could fail with an ENOMEM error (but currently has no way to return that error). It would be nice if the iterator could be kept on the stack -- struct pdf_list_iterator_s only contains one member anyway.

Gerel, what do you think about this? We would of course lose the benefits of the opaque pointers in this case, but 'pdf_obj_equal_p' throwing PDF_ENOMEM sounds quite weird, and I think that we could make an exception and publish the iterator structure.

> The gnulib list code will also need to be changed to return ENOMEM when necessary -- currently it just calls abort() when malloc fails.

Yes, we noticed that. It would be really nice to modify the list module so that it does not crash if 'xalloc_die' returns NULL (does nothing). Really, I think that it is the only way to use the list module in a library. I noticed that you are active in the gnulib development mailing list. Would you like to raise the issue there?
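Coming back to the iterator question for a moment, here is a rough sketch of how a published, stack-allocated iterator could let an equality predicate run without any allocation, so that it has no ENOMEM case to report. All the names below (node_s, list_s, list_iterator_s, list_equal_p) are made up for illustration; this is not the actual pdf_list_t or gnulib interface, and the real list holds more than plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the list types, for illustration only. */
struct node_s
{
  int value;
  struct node_s *next;
};

struct list_s
{
  struct node_s *head;
};

/* Published (non-opaque) iterator.  It has a single member, so the
   caller can simply declare it on the stack.  */
struct list_iterator_s
{
  const struct node_s *cur;
};

static void
list_iterator_init (struct list_iterator_s *it, const struct list_s *list)
{
  assert (it != NULL && list != NULL);
  it->cur = list->head;
}

/* Store the next element in *VALUE and return true, or return false
   when the list is exhausted.  Never allocates.  */
static bool
list_iterator_next (struct list_iterator_s *it, int *value)
{
  if (it->cur == NULL)
    return false;

  *value = it->cur->value;
  it->cur = it->cur->next;
  return true;
}

/* Element-wise comparison using two stack iterators.  Since nothing
   is allocated, the predicate cannot fail for lack of memory.  */
static bool
list_equal_p (const struct list_s *a, const struct list_s *b)
{
  struct list_iterator_s ia, ib;
  int va = 0, vb = 0;

  list_iterator_init (&ia, a);
  list_iterator_init (&ib, b);

  for (;;)
    {
      bool more_a = list_iterator_next (&ia, &va);
      bool more_b = list_iterator_next (&ib, &vb);

      if (!more_a || !more_b)
        return more_a == more_b;   /* Equal only if both ended.  */
      if (va != vb)
        return false;
    }
}
```

The price is the one already mentioned: the layout of the iterator structure becomes part of the interface, so it can no longer change freely behind an opaque pointer.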
> A message was posted a few days ago about error management in type 4 functions. My parser already handles syntax errors (for now, it just returns PDF_EBADFILE), and it would be fairly easy to make the parser read type 4 functions if it would help.

The (quite simple) parser for type 4 functions is implemented in src/base/pdf-fp-func.[ch] and it already provides detection of syntax errors and some run-time errors. It is also trivial to adapt it to cover more error situations.

As the task description states, the bigger problem there is to come up with a suitable and general enough interface for 'pdf_fp_func_eval'. For 'pdf_fp_func_4_new', since it is only used for type 4 functions, the addition of a new parameter of type 'struct pdf_fp_func_4_errors_s *' should do it.

-- 
Jose E. Marchesi <jema...@gnu.org>
GNU Project