Re: Patch submission for DataImportHandler's FileListEntityProcessor to sort files

2011-10-25 Thread Martijn v Groningen
Hi Gabriel,

I'm not an expert FileEntityProcessor user, but I'd expect a
consistent process order. Your code seems kosher to me. You use the
last modified date as order, which seems ok to me. So create a Jira
issue and attach your patch!

Martijn

On 24 October 2011 21:49, Gabriel Cooper gabriel.coo...@jtv.com wrote:
 Hello,

 I noticed what appears to be a bug in DataImportHandler's
 FileListEntityProcessor. Specifically, it relies on Java's File.list()
 method to retrieve a list of files from the configured dataimport directory,
 but list() does not guarantee a sort order. This means that if you have two
 files that update the same record, the results are non-deterministic.
 Typically, list() does in fact return them lexigraphically sorted, but this
 is not guaranteed.

 An example of how you can get into trouble is to imagine the following:

 xyz.xml -- Created one hour ago. Contains updates to records Foo and
 Bar.
 abc.xml -- Created one minute ago. Contains updates to records Bar and
 Baz.

 In this case, the newest file, in abc.xml, would (likely, but not
 guaranteed) be run first, updating the Bar and Baz records. Next, the
 older file, xyz.xml, would update Foo and overwrite Bar with outdated
 changes.

 The HowToContribute wiki page suggested I send my request here before
 opening an actual bug ticket, so please let me know if there's anything else
 I can or should do to get this patch submitted and approved. I've attached a
 patch of FileListEntityProcessor, along with an updated test, please let me
 know if it's kosher.

 Thank you,

 Gabriel.


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 
Met vriendelijke groet,

Martijn van Groningen

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Patch submission for DataImportHandler's FileListEntityProcessor to sort files

2011-10-24 Thread Gabriel Cooper

Hello,

I noticed what appears to be a bug in DataImportHandler's 
FileListEntityProcessor. Specifically, it relies on Java's File.list() 
method to retrieve a list of files from the configured dataimport 
directory, but list() does not guarantee a sort order. This means that 
if you have two files that update the same record, the results are 
non-deterministic. Typically, list() does in fact return them 
lexigraphically sorted, but this is not guaranteed.


An example of how you can get into trouble is to imagine the following:

xyz.xml -- Created one hour ago. Contains updates to records Foo and 
Bar.
abc.xml -- Created one minute ago. Contains updates to records Bar and 
Baz.


In this case, the newest file, in abc.xml, would (likely, but not 
guaranteed) be run first, updating the Bar and Baz records. Next, 
the older file, xyz.xml, would update Foo and overwrite Bar with 
outdated changes.


The HowToContribute wiki page suggested I send my request here before 
opening an actual bug ticket, so please let me know if there's anything 
else I can or should do to get this patch submitted and approved. I've 
attached a patch of FileListEntityProcessor, along with an updated test, 
please let me know if it's kosher.


Thank you,

Gabriel.
Index: 
src/test/org/apache/solr/handler/dataimport/TestFileListEntityProcessor.java
===
--- 
src/test/org/apache/solr/handler/dataimport/TestFileListEntityProcessor.java
(revision 1188246)
+++ 
src/test/org/apache/solr/handler/dataimport/TestFileListEntityProcessor.java
(working copy)
@@ -36,12 +36,19 @@
   @Test
   @SuppressWarnings(unchecked)
   public void testSimple() throws IOException {
+final String CREATED_FIRST  = b.xml;
+final String CREATED_SECOND = a.xml;
 File tmpdir = File.createTempFile(test, tmp, TEMP_DIR);
 tmpdir.delete();
 tmpdir.mkdir();
 tmpdir.deleteOnExit();
+createFile(tmpdir, b.xml, b.xml.getBytes(), false);
+try {
+  Thread.sleep(1000);
+} catch (Exception e) {
+// Don't care if interrupted. Pass.
+}
 createFile(tmpdir, a.xml, a.xml.getBytes(), false);
-createFile(tmpdir, b.xml, b.xml.getBytes(), false);
 createFile(tmpdir, c.props, c.props.getBytes(), false);
 Map attrs = createMap(
 FileListEntityProcessor.FILE_NAME, xml$,
@@ -58,6 +65,9 @@
   fList.add((String) f.get(FileListEntityProcessor.ABSOLUTE_FILE));
 }
 assertEquals(2, fList.size());
+
+assertTrue(File created first should have appeared first, 
fList.get(0).endsWith(CREATED_FIRST));
+assertTrue(File created second should have appeared second, 
fList.get(1).endsWith(CREATED_SECOND));
   }
   
   @Test
Index: src/java/org/apache/solr/handler/dataimport/FileListEntityProcessor.java
===
--- src/java/org/apache/solr/handler/dataimport/FileListEntityProcessor.java
(revision 1188246)
+++ src/java/org/apache/solr/handler/dataimport/FileListEntityProcessor.java
(working copy)
@@ -219,25 +219,24 @@
   }
 
   private void getFolderFiles(File dir, final ListMapString, Object 
fileDetails) {
-// Fetch an array of file objects that pass the filter, however the
-// returned array is never populated; accept() always returns false.
-// Rather we make use of the fileDetails array which is populated as
-// a side affect of the accept method.
-dir.list(new FilenameFilter() {
-  public boolean accept(File dir, String name) {
-File fileObj = new File(dir, name);
-if (fileObj.isDirectory()) {
-  if (recursive) getFolderFiles(fileObj, fileDetails);
-} else if (fileNamePattern == null) {
-  addDetails(fileDetails, dir, name);
-} else if (fileNamePattern.matcher(name).find()) {
-  if (excludesPattern != null  excludesPattern.matcher(name).find())
-return false;
-  addDetails(fileDetails, dir, name);
+File[] files = dir.listFiles();
+Arrays.sort(files, new ComparatorFile(){
+public int compare(File f1, File f2) {
+return ((Long)f1.lastModified()).compareTo(f2.lastModified());
 }
-return false;
-  }
 });
+
+for(File fileObj : files) {
+  String name = fileObj.getName();
+  if (fileObj.isDirectory()) {
+if (recursive) getFolderFiles(fileObj, fileDetails);
+  } else if (fileNamePattern == null) {
+addDetails(fileDetails, dir, name);
+  } else if (fileNamePattern.matcher(name).find()) {
+if (excludesPattern == null || !excludesPattern.matcher(name).find())
+  addDetails(fileDetails, dir, name);
+  }
+}
   }
 
   private void addDetails(ListMapString, Object files, File dir, String 
name) {


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: