Andrzej Bialecki wrote:
3. implement a catch-all plugin equivalent to the Unix command strings(1) (I have an implementation of that which I can contribute), and make it switchable in the config: if it's off, unknown content is skipped and logged; if it's on, the plugin makes a best effort to extract text.
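A strings(1)-style catch-all comes down to scanning the raw bytes for runs of printable characters. Andrzej's implementation isn't shown here, so this is only a minimal Java sketch under stated assumptions: the class name `StringsExtractor` is hypothetical, "printable" is taken to mean ASCII 0x20 to 0x7E plus tab, and runs shorter than `minLength` are dropped (GNU strings uses 4 by default). It makes no attempt to handle multi-byte encodings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a strings(1)-style catch-all text extractor. */
public class StringsExtractor {

    /** Assumption: printable means ASCII space through tilde, plus tab. */
    private static boolean isPrintable(byte b) {
        // Java bytes are signed, so bytes >= 0x80 are negative and excluded here.
        return (b >= 0x20 && b <= 0x7E) || b == '\t';
    }

    /** Returns every run of at least minLength printable bytes, in order. */
    public static List<String> extractRuns(byte[] content, int minLength) {
        List<String> runs = new ArrayList<>();
        int start = -1; // start index of the current printable run, or -1
        // Iterate one past the end so a run reaching end-of-input is flushed.
        for (int i = 0; i <= content.length; i++) {
            boolean printable = i < content.length && isPrintable(content[i]);
            if (printable) {
                if (start < 0) start = i;
            } else if (start >= 0) {
                if (i - start >= minLength) {
                    runs.add(new String(content, start, i - start, StandardCharsets.US_ASCII));
                }
                start = -1;
            }
        }
        return runs;
    }

    /** Joins the extracted runs with newlines, one run per line like strings(1). */
    public static String extract(byte[] content, int minLength) {
        return String.join("\n", extractRuns(content, minLength));
    }
}
```

Because it never interprets the format, a plugin built this way can accept any content type, which is what makes it usable as a fallback.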

This is possible now by configuring a catch-all plugin to match the empty suffix and removing the empty suffix from the other plugins. So the problem is not that this is currently impossible, but that it would be better done by altering the configuration rather than the plugin definitions.

So we might have ParserFactory read a config file that maps content types and URL suffixes to plugins. Folks could then edit this file instead of modifying the plugin declarations. It could also define default handlers for unknown content types and unknown suffixes, and could either augment or entirely replace the specifications in the plugins themselves. Does this make sense?
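One way such a mapping file could be read, sketched in Java with `java.util.Properties`. Everything here is hypothetical and not an existing Nutch format: the `ParserMapping` class, the key scheme (`type.<content-type>`, `suffix.<ext>`, `default.type`, `default.suffix`), and plugin ids such as `parse-html` and `parse-strings`. The lookup order (content type, then URL suffix, then the defaults) is also an assumption. An empty result stands for "no parser: skip and log".

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Optional;
import java.util.Properties;

/**
 * Hypothetical config-driven parser lookup. Example file:
 *
 *   type.text/html = parse-html
 *   suffix.pdf     = parse-pdf
 *   default.type   = parse-strings
 */
public class ParserMapping {
    private final Properties props;

    private ParserMapping(Properties props) {
        this.props = props;
    }

    /** Loads key = value mappings from a properties-style reader. */
    public static ParserMapping load(Reader in) {
        Properties p = new Properties();
        try {
            p.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ParserMapping(p);
    }

    /** Looks up a key, trimming trailing whitespace that Properties keeps. */
    private String get(String key) {
        String v = props.getProperty(key);
        return v == null ? null : v.trim();
    }

    /** Content type first, then URL suffix, then defaults (assumed order). */
    public Optional<String> resolve(String contentType, String url) {
        if (contentType != null) {
            String byType = get("type." + contentType.toLowerCase(Locale.ROOT));
            if (byType != null) return Optional.of(byType);
        }
        String bySuffix = get("suffix." + suffixOf(url));
        if (bySuffix != null) return Optional.of(bySuffix);
        // Nothing matched: try the default for unknown types, then for unknown suffixes.
        String fallback = get("default.type");
        if (fallback == null) fallback = get("default.suffix");
        return Optional.ofNullable(fallback);
    }

    /** Lower-cased extension of the URL path's last segment, or "" if none. */
    static String suffixOf(String url) {
        String path;
        try {
            path = new URI(url).getPath();
        } catch (URISyntaxException e) {
            return "";
        }
        if (path == null) return ""; // opaque URIs such as mailto: have no path
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

In this sketch, leaving out both `default.*` keys amounts to turning the catch-all off, and adding `default.type = parse-strings` turns it on, without touching any plugin declaration. A `suffix. = ...` line maps the empty suffix, mirroring the current empty-suffix approach.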

Doug
