I'd strongly recommend rolling your own ingest code. See Erick's superb post: https://lucidworks.com/post/indexing-with-solrj/
You can easily get attachments via the RecursiveParserWrapper, e.g.
https://github.com/apache/tika/blob/master/tika-parsers/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java#L351

This will return a list of Metadata objects; the first one will be the main/container document, and each other entry will be an attachment.

Let us know if you have any questions/surprises. There are a couple of todos for .eml...

On Fri, Aug 2, 2019 at 3:43 AM Jan Høydahl <jan....@cominvent.com> wrote:
>
> Try the Apache Tika mailing list.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 2 Aug 2019, at 05:01, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >
> > Hi,
> >
> > Does anyone know if this can be done on the Solr side?
> > Or does it have to be done on the Tika side?
> >
> > Regards,
> > Edwin
> >
> > On Thu, 1 Aug 2019 at 09:38, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> I would like to check: is there any way to detect the number of
> >> attachments and their names during indexing of EML files in Solr, and index
> >> that information into Solr?
> >>
> >> Currently, Solr is able to use Tika and Tesseract OCR to extract the
> >> contents of the attachments. However, I could not find the information
> >> about the number of attachments in the EML file and what their
> >> filenames are.
> >>
> >> I am using Solr 7.6.0 in production, and am also trying out the new Solr
> >> 8.2.0.
> >>
> >> Regards,
> >> Edwin
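
For what it's worth, here is a minimal sketch of the step that comes after the RecursiveParserWrapper parse mentioned at the top of this message: turning the returned list into an attachment count and a list of names that could be sent to Solr. To keep it runnable without Tika on the classpath, a plain Map<String, String> stands in for each Tika Metadata object. The "resourceName" key is my assumption for where the attachment file name shows up, and the field names attachment_count and attachment_names are hypothetical, so adjust both to your schema and to what your Tika version actually emits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AttachmentSummary {

    // Assumption: key under which the parser reports an embedded file's name.
    static final String NAME_KEY = "resourceName";

    // Entry 0 is the main/container document; every later entry is treated
    // as an attachment, following the list convention described in the message.
    static List<String> attachmentNames(List<Map<String, String>> parsed) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i < parsed.size(); i++) {
            String name = parsed.get(i).get(NAME_KEY);
            // Some embedded parts may carry no file name at all.
            names.add(name != null ? name : "(unnamed)");
        }
        return names;
    }

    // Hypothetical Solr field names; map them to your own schema.
    static Map<String, Object> toSolrFields(List<Map<String, String>> parsed) {
        List<String> names = attachmentNames(parsed);
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("attachment_count", names.size());
        fields.put("attachment_names", names);
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in for the list a recursive parse of an .eml would return.
        List<Map<String, String>> parsed = List.of(
                Map.of("Content-Type", "message/rfc822"),
                Map.of(NAME_KEY, "report.pdf"),
                Map.of(NAME_KEY, "scan.png"));
        System.out.println(toSolrFields(parsed));
    }
}
```

In real ingest code the same loop would run over the List<Metadata> from the wrapper, and the resulting values would go into a SolrInputDocument via SolrJ. Note that this counts every entry after the first, so anything nested inside an attachment would be counted as well; filter if that is not what you want.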