Just to reconfirm: are you indexing file content? If so, be aware that most PDFs do not extract well, because they do not preserve text flow.
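One quick way to sanity-check extraction quality at scale is a small heuristic filter over the extracted text — this is a hypothetical helper, not part of Solr or Tika, and the thresholds are arbitrary starting points:

```python
def looks_garbled(text, min_chars=20, max_junk_ratio=0.3):
    """Heuristic check for bad PDF text extraction.

    Flags output that is empty, too short, or dominated by
    non-printable / replacement characters -- common symptoms of a
    PDF without a usable text layer. Thresholds are arbitrary;
    tune them against your own sample set.
    """
    if len(text.strip()) < min_chars:
        return True
    junk = sum(
        1 for c in text
        if not (c.isprintable() or c.isspace()) or c == "\ufffd"
    )
    return junk / len(text) > max_junk_ratio

looks_garbled("")  # True: nothing extracted at all
looks_garbled("A normal extracted paragraph of text from a PDF.")  # False
```

Running a few hundred sample files through Tika and piping the output through a check like this will tell you quickly whether content extraction is even worth pursuing for your corpus.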
If you are indexing PDF files, I would run a sample through Tika directly (that's what Solr uses under the covers anyway) and see what the output looks like. Apart from that, either SolrJ or DIH would work. For a production system, I'd use SolrJ with client-side Tika parsing; DIH is fine for a quick test run.

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/


On 3 August 2015 at 13:56, Mugeesh Husain <muge...@gmail.com> wrote:
> Hi Alexandre,
> I have 40 million files stored in a file system, with filenames saved
> like ARIA_SSN10_0007_LOCATION_0000129.pdf
> 1.) I have to split each underscore-separated value out of the filename,
> and those values have to be indexed into Solr.
> 2.) I do not need the file contents (text) to be indexed.
>
> You told me "The answer is Yes", but I didn't get in which way you said Yes.
>
> Thanks
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Can-Apache-Solr-Handle-TeraByte-Large-Data-tp3656484p4220527.html
> Sent from the Solr - User mailing list archive at Nabble.com.
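For the filename-only use case in the quoted message — splitting ARIA_SSN10_0007_LOCATION_0000129.pdf on underscores and indexing the parts — here is a minimal sketch of the parsing step. The field names are hypothetical placeholders (the `*_s` suffix assumes Solr's stock dynamic string-field rule); map them to whatever your schema actually defines, and feed the resulting documents to Solr via SolrJ or your client of choice:

```python
import os

def filename_to_doc(path):
    """Split an underscore-delimited filename into a Solr document.

    Field names here are hypothetical placeholders: '*_s' relies on
    the dynamic string-field rule in Solr's default schema. Adjust
    to match your real schema.
    """
    name = os.path.basename(path)
    stem = os.path.splitext(name)[0]
    doc = {"id": stem, "filename_s": name}
    for i, value in enumerate(stem.split("_")):
        doc["part_%d_s" % i] = value
    return doc

doc = filename_to_doc("/data/ARIA_SSN10_0007_LOCATION_0000129.pdf")
# doc["part_0_s"] == "ARIA", doc["part_1_s"] == "SSN10", ...,
# doc["part_4_s"] == "0000129"
```

Since no text extraction is needed, each of the 40 million documents is tiny; batching a few thousand docs per update request and committing infrequently should keep the load manageable.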