FileListEntityProcessor can't handle directories containing lots of files
-------------------------------------------------------------------------
Key: SOLR-798
URL: https://issues.apache.org/jira/browse/SOLR-798
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Reporter: Grant Ingersoll
Priority: Minor

The FileListEntityProcessor currently tries to process all documents in a single directory at once and stores the results in a hashmap. On directories containing a large number of documents, this quickly causes OutOfMemory errors.

Unfortunately, the typical workaround is to hack a FileFilter so that it does the work itself and always returns false from its accept method.

It may be possible to hook up some type of Producer/Consumer multithreaded FileFilter approach whereby the FileFilter blocks until the nextRow() mechanism requests another row, thereby avoiding the need to cache everything in the map.
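To make the Producer/Consumer idea concrete, here is a minimal sketch in plain Java. It is not taken from the Solr code base: the class name StreamingFileLister, its constructor parameters, and the END sentinel are all hypothetical, and only the nextRow() name is borrowed from the description above. A background thread runs File.listFiles with a filter that hands each file to a bounded queue and then returns false, so the filter blocks whenever the consumer falls behind. Note that java.io.File.listFiles still reads the array of file names internally, so this bounds the number of queued File objects rather than eliminating all per-directory memory. Cancellation and error reporting are left out.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical helper (not part of Solr): streams a directory one file at a time. */
final class StreamingFileLister {
    // Sentinel marking the end of the listing; compared by identity, never returned.
    private static final File END = new File("");

    private final BlockingQueue<File> queue;
    private boolean done = false;

    StreamingFileLister(final File dir, int capacity) {
        // The bounded queue is what keeps memory use flat: put() blocks when it is full.
        this.queue = new ArrayBlockingQueue<File>(capacity);

        // The filter does the work itself and always returns false,
        // so listFiles() never accumulates a result list.
        final FileFilter handOff = f -> {
            if (f.isFile()) {
                put(f);
            }
            return false;
        };

        Thread producer = new Thread(() -> {
            try {
                dir.listFiles(handOff);
            } finally {
                put(END); // always signal completion, even for a missing directory
            }
        }, "file-lister");
        producer.setDaemon(true);
        producer.start();
    }

    // Blocks until the consumer has made room in the queue.
    private void put(File f) {
        try {
            queue.put(f);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while handing off " + f, e);
        }
    }

    /** Returns the next file, or null once the directory is exhausted. */
    File nextRow() throws InterruptedException {
        if (done) {
            return null;
        }
        File f = queue.take();
        if (f == END) {
            done = true;
            return null;
        }
        return f;
    }
}
```

A caller would construct the lister with a small capacity, for example new StreamingFileLister(new File("/some/dir"), 16), and call nextRow() until it returns null. The capacity is a tuning assumption: a value of 1 gives the strictest hand-off, while a slightly larger buffer lets the producer stay a few files ahead.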