On Mon, 25 Jul 2022, Oscar Rieken Jr via user wrote:
I am currently trying to validate our Tika setup and was looking for a set of example data I could use.

If you want a small number of files covering lots of different types, the test files in the Tika source tree will work. The main set is in
tika-parsers/src/test/resources/test-documents/
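To see which file types that directory actually covers before picking samples, something like the rough sketch below could help. It is not part of Tika: count_by_extension is a hypothetical helper, the default path assumes you run it from the root of a local Tika source checkout, and a file extension is only a hint, not the type Tika would detect.

```python
import os
import sys
from collections import Counter
from pathlib import Path


def count_by_extension(root):
    """Count files under root, grouped by lowercase file extension.

    Hypothetical helper for illustration only; the extension is a rough
    proxy for the file type, not a substitute for Tika's own detection.
    """
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Files with no extension are grouped under "(none)".
            ext = Path(name).suffix.lower() or "(none)"
            counts[ext] += 1
    return counts


if __name__ == "__main__":
    # Assumed default: the test-documents directory inside a Tika checkout.
    default_root = "tika-parsers/src/test/resources/test-documents"
    root = sys.argv[1] if len(sys.argv) > 1 else default_root
    for ext, n in count_by_extension(root).most_common():
        print(f"{n:6d}  {ext}")
```

Pass a different directory as the first argument to inventory another set of files the same way.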

If you want a very large number of files, then the Tika Corpora collection is a good source. We have a few different collections, including files from Common Crawl, govdocs and bug trackers. If you can let us know which file types you need and roughly how many files, we can suggest the most suitable collection.

Nick
