Dear Nutch Community,

I'm working on a PoC with Nutch 1.X, and I'm also aware of 2.X and its features. I'd like to use Nutch 1.X with an alternative storage backend, for example Couchbase. Parsed documents would be pre-processed and analyzed at a Parser extension point, and a document following a specific JSON schema would then be sent to the external store (for example, Couchbase). However, the content should not also be stored in Nutch's segment table.
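To make the idea concrete, here is a rough sketch of the kind of pre-processing step I have in mind. It is plain Java with no Nutch or Couchbase dependencies. The class name, the method names, the field names (url, title, content) and the whitespace normalization are only placeholders for illustration, not an existing API and not my final schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: turn parsed fields into a JSON document string.
// Nothing here is Nutch or Couchbase API; all names are placeholders.
public class JsonDocSketch {

    // Escape the characters that must be escaped inside a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become \\uXXXX escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Placeholder pre-processing: trim and collapse runs of whitespace.
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    // Build the JSON document that would be handed to the external store.
    static String toJson(String url, String title, String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("url", url);
        fields.put("title", normalize(title));
        fields.put("content", normalize(text));

        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(",");
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson("http://example.org/", " Example  page ", "Hello\n\"world\""));
    }
}
```

In a real setup this would presumably be called from the parser plugin, and the resulting string passed to a Couchbase client. Neither of those parts is shown here.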
In other words, how can I use an external storage engine with Apache Nutch 1.X to bypass Gora altogether, add custom pre-processing before ingesting data into the external storage, and remove any duplicates from the segment table?

I appreciate your help, thanks!

Regards,
Zoltán

