Well, if it is just file names, I'd probably use the SolrJ client, maybe with Java 8. Read the file names, split each name into parts with regular expressions, put the parts into different fields and send them to Solr. Java 8 has file-system walkers (e.g. Files.walk) to make this easier.
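
A rough sketch of the walk-and-split part, using only the JDK. It assumes every name has five underscore-separated parts plus ".pdf", like the example in the question. The class name, the field names part1..part5 and the choice of the whole file name as "id" are placeholders of mine, not anything from your schema.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class FileNameFields {
    // Assumed layout: five underscore-separated parts followed by ".pdf",
    // e.g. ARIA_SSN10_0007_LOCATION_0000129.pdf
    private static final Pattern NAME = Pattern.compile(
            "^([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_.]+)\\.pdf$");

    // Hypothetical field names; replace with the ones in your schema.
    private static final String[] FIELDS = {"part1", "part2", "part3", "part4", "part5"};

    // Returns field name -> value, or null if the name does not match the pattern.
    static Map<String, String> parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", fileName); // assumption: the file name is unique
        for (int i = 0; i < FIELDS.length; i++) {
            doc.put(FIELDS[i], m.group(i + 1));
        }
        return doc;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Files.walk visits the tree lazily; the file contents are never opened.
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .map(p -> parse(p.getFileName().toString()))
                 .filter(doc -> doc != null)
                 .forEach(System.out::println);
        }
    }
}
```

Instead of printing, each map could be copied into a SolrJ SolrInputDocument and sent to Solr in batches. I have left that part out because it depends on your Solr version and setup.
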
You could do it with DIH, but it would need nested entities, and the inner entity would probably try to parse each file. That is a lot of wasted effort if you only care about the file names.

Or, I would just do a directory listing in the operating system, use regular expressions to turn it into a CSV file, and import that into Solr directly.

In all of these cases, the question is which field is the ID of the record, to ensure there are no duplicates.

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/

On 3 August 2015 at 15:34, Mugeesh Husain <muge...@gmail.com> wrote:
> @Alexandre No i dont need a content of a file. i am repeating my requirement
>
> I have a 40 millions of files which is stored in a file systems,
> the filename saved as ARIA_SSN10_0007_LOCATION_0000129.pdf
>
> I just split all Value from a filename only,these values i have to index.
>
> I am interested to index value to solr not file contains.
>
> I have tested the DIH from a file system its work fine but i dont know how
> can i implement my code in DIH
> if my code get some value than how i can i index it using DIH.
>
> If i will use DIH then How i will make split operation and get value from
> it.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Can-Apache-Solr-Handle-TeraByte-Large-Data-tp3656484p4220552.html
> Sent from the Solr - User mailing list archive at Nabble.com.