Hi All,

Currently I have a stack of XML documents in MarkLogic. They get there via an XProc pipeline. I am working to run Apache Any23 on the XML *just* before it gets inserted into MarkLogic. I would then like to send the extracted structure (e.g. triples) to TDB and use it to complement structured or text-based queries within my search application.
I need clarification on a couple of areas, if possible...

1. The triples can easily be extracted and then written to a ByteArrayOutputStream (or a String representation of this stream), and I assume this can be fed into TDB?
2. If the above is achievable, how exactly would TDB persist the stream? Would this be one graph? Can someone please expand on this?
3. How would I synchronize the XML documents and the associated content within TDB? This is my major area of confusion. I accept that this is not in any way, shape or form related to TDB/Jena, but I am curious to hear from anyone out there who has attempted anything similar.

The idea is to complement structured or text-based queries in a search application and have the relevant TDB content complement the user's search... something like the provision of domain/document-specific metadata to give more body to the search experience.

Thanks very much for any feedback on this one. I realise it is a pretty lengthy question, but any suggestions would be great.

All the best,
Lewis
