Shalin, OK!
I got myself a JIRA account, opened SOLR-1000, followed the wiki instructions on creating a patch, and have now uploaded it! The only problem is that, while the fix seems fine, the test case I added to TestFileListEntityProcessor.java fails. I need somebody who knows what they are doing to point out what I am doing wrong and/or how to debug test failures. It would also be nice to know how to run or debug a single JUnit test rather than all of them, which takes almost 8 minutes.

  @Test
  public void testRECURSION() throws IOException {
    long time = System.currentTimeMillis();
    File childdir = new File("." + time + "/child" );
    childdir.mkdirs();
    childdir.deleteOnExit();
    createFile(childdir, "a.xml", "a.xml".getBytes(), true);
    createFile(childdir, "b.xml", "b.xml".getBytes(), true);
    createFile(childdir, "c.props", "c.props".getBytes(), true);
    Map attrs = AbstractDataImportHandlerTest.createMap(
            FileListEntityProcessor.FILE_NAME, "^.*\\.xml$",
            FileListEntityProcessor.BASE_DIR, childdir.getAbsolutePath(),
            FileListEntityProcessor.RECURSIVE, true);
    Context c = AbstractDataImportHandlerTest.getContext(null,
            new VariableResolverImpl(), null, 0, Collections.EMPTY_LIST, attrs);
    FileListEntityProcessor fileListEntityProcessor = new FileListEntityProcessor();
    fileListEntityProcessor.init(c);
    List<String> fList = new ArrayList<String>();
    while (true) {
      // add the documents to the index
      Map<String, Object> f = fileListEntityProcessor.nextRow();
      if (f == null)
        break;
      fList.add((String) f.get(FileListEntityProcessor.ABSOLUTE_FILE));
    }
    System.out.println("List of files indexed -- " + fList);
    Assert.assertEquals(3, fList.size());
  }

Regards
Fergus.

>On Mon, Feb 2, 2009 at 2:36 AM, Fergus McMenemie <fer...@twig.me.uk> wrote:
>
>> Hello
>>
>> I have been trying to find out why DIH in FileListEntityProcessor
>> mode did not appear to be recursing into subdirectories.
>> Going through FileListEntityProcessor.java, I eventually tumbled to the
>> fact that my fileName filter setting from data-config.xml also applied
>> to directory names.
>
>
>Hmm, not good.
>
>>
>> <entity name="jc"
>>         processor="FileListEntityProcessor"
>>         fileName=".*\.xml"
>>         newerThan="'NOW-1000DAYS'"
>>         recursive="true"
>>         rootEntity="false"
>>         dataSource="null"
>>         baseDir="/Volumes/spare/ts/stuff/ford">
>>
>> Now, I feel that the fileName filter should be applied to the files fed
>> into the parser; it should not be applied to the directory names we are
>> recursing through. I bodged the code as follows to adjust the behavior
>> so that the "fileName" and "excludes" attributes of "entity" only
>> apply to file names and not to directory names.
>
>
>I agree with you.
>
>Perhaps we can have separate filters for directories and files, but let's
>hold on till the need comes up.
>
>>
>> It now recurses through my directory tree, indexing only the appropriate
>> files! I think the new behavior is more standard.
>>
>> Is this change valid?
>
>
>Absolutely. Can you please create an issue and attach the patch? Thanks!
>
>--
>Regards,
>Shalin Shekhar Mangar.

--
===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
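The patch itself is not quoted in this thread, so here is a minimal sketch of the behavior being discussed, assuming a plain java.io.File walk: when recursive is true every subdirectory is descended into, and the fileName/excludes patterns are tested only against regular file names. The class RecursiveFileLister and its listFiles method are hypothetical illustrations, not DIH's FileListEntityProcessor API, and the use of Matcher.find() is an assumption; the real processor's matching rules may differ.

  import java.io.File;
  import java.io.IOException;
  import java.nio.file.Files;
  import java.util.ArrayList;
  import java.util.List;
  import java.util.regex.Pattern;

  /**
   * Hypothetical illustration, not the actual FileListEntityProcessor code:
   * a recursive file listing in which the fileName/excludes patterns are
   * applied only to regular files, never to the directories being descended.
   */
  public class RecursiveFileLister {

    /** Returns absolute paths of matching files under baseDir (order unspecified). */
    public static List<String> listFiles(File baseDir, Pattern fileName,
                                         Pattern excludes, boolean recursive) {
      List<String> out = new ArrayList<String>();
      collect(baseDir, fileName, excludes, recursive, out);
      return out;
    }

    private static void collect(File dir, Pattern fileName, Pattern excludes,
                                boolean recursive, List<String> out) {
      File[] children = dir.listFiles();
      if (children == null) {
        return; // dir is not a directory, or it could not be read
      }
      for (File child : children) {
        String name = child.getName();
        if (child.isDirectory()) {
          // Directory names are never tested against the patterns;
          // only the recursive flag decides whether to descend.
          if (recursive) {
            collect(child, fileName, excludes, true, out);
          }
        } else if (fileName.matcher(name).find()
            && (excludes == null || !excludes.matcher(name).find())) {
          out.add(child.getAbsolutePath());
        }
      }
    }

    public static void main(String[] args) throws IOException {
      // Throwaway tree mirroring the test: <tmp>/child/{a.xml, b.xml, c.props}
      File base = Files.createTempDirectory("fle").toFile();
      File child = new File(base, "child");
      child.mkdirs();
      // deleteOnExit runs in reverse registration order, so files go first.
      base.deleteOnExit();
      child.deleteOnExit();
      for (String n : new String[] {"a.xml", "b.xml", "c.props"}) {
        File f = new File(child, n);
        f.createNewFile();
        f.deleteOnExit();
      }
      // Start at the parent so the walk must pass through "child",
      // a directory name that does not match .*\.xml
      List<String> found = listFiles(base, Pattern.compile(".*\\.xml"), null, true);
      System.out.println(found.size() + " file(s): " + found);
    }
  }

Run as-is, this should report the two .xml files and skip c.props. If the directory name "child" were also tested against .*\.xml, as in the pre-patch behavior described above, the walk would stop at it and nothing would be listed.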