Re: DataImport Handler, writing a new EntityProcessor
Hi! Thanks for all the advice! I finally did it, the most annoying error that took me the best of a day to figure out was that the state variable here had to be reset: https://bitbucket.org/dermotte/liresolr/src/d27878a71c63842cb72b84162b599d99c4408965/src/main/java/net/semanticmetadata/lire/solr/LireEntityProcessor.java?at=master#cl-56 The EntityProcessor is part of this image search plugin if anyone is interested: https://bitbucket.org/dermotte/liresolr/ :) It's always the small things that are hard to find cheers and thanks, Mathias On Wed, Dec 18, 2013 at 7:26 PM, P Williams williams.tricia.l...@gmail.com wrote: Hi Mathias, I'd recommend testing one thing at a time. See if you can get it to work for one image before you try a directory of images. Also try testing using the solr-testframework using your ide (I use Eclipse) to debug rather than your browser/print statements. Hopefully that will give you some more specific knowledge of what's happening around your plugin. I also wrote an EntityProcessor plugin to read from a properties filehttps://issues.apache.org/jira/browse/SOLR-3928. Hopefully that'll give you some insight about this kind of Solr plugin and testing them. Cheers, Tricia On Wed, Dec 18, 2013 at 3:03 AM, Mathias Lux m...@itec.uni-klu.ac.atwrote: Hi all! I've got a question regarding writing a new EntityProcessor, in the same sense as the Tika one. My EntityProcessor should analyze jpg images and create document fields to be used with the LIRE Solr plugin (https://bitbucket.org/dermotte/liresolr). Basically I've taken the same approach as the TikaEntityProcessor, but my setup just indexes the first of 1000 images. I'm using a FileListEntityProcessor to get all JPEGs from a directory and then I'm handing them over (see [2]). My code for the EntityProcessor is at [1]. I've tried to use the DataSource as well as the filePath attribute, but it ends up all the same. However, the FileListEntityProcessor is able to read all the files according to the debug output, but I'm missing the link from the FileListEntityProcessor to the LireEntityProcessor. I'd appreciate any pointer or help :) cheers, Mathias [1] LireEntityProcessor http://pastebin.com/JFajkNtf [2] dataConfig http://pastebin.com/vSHucatJ -- Dr. Mathias Lux Klagenfurt University, Austria http://tinyurl.com/mlux-itec -- PD Dr. Mathias Lux Klagenfurt University, Austria http://tinyurl.com/mlux-itec
RE: DataImport Handler, writing a new EntityProcessor
The first thing I would suggest is to try and run it not in debug mode. DIH's debug mode limits the number of documents it will take in, so that might be all that is wrong here. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: mathias@gmail.com [mailto:mathias@gmail.com] On Behalf Of Mathias Lux Sent: Wednesday, December 18, 2013 4:04 AM To: solr-user@lucene.apache.org Subject: DataImport Handler, writing a new EntityProcessor Hi all! I've got a question regarding writing a new EntityProcessor, in the same sense as the Tika one. My EntityProcessor should analyze jpg images and create document fields to be used with the LIRE Solr plugin (https://bitbucket.org/dermotte/liresolr). Basically I've taken the same approach as the TikaEntityProcessor, but my setup just indexes the first of 1000 images. I'm using a FileListEntityProcessor to get all JPEGs from a directory and then I'm handing them over (see [2]). My code for the EntityProcessor is at [1]. I've tried to use the DataSource as well as the filePath attribute, but it ends up all the same. However, the FileListEntityProcessor is able to read all the files according to the debug output, but I'm missing the link from the FileListEntityProcessor to the LireEntityProcessor. I'd appreciate any pointer or help :) cheers, Mathias [1] LireEntityProcessor http://pastebin.com/JFajkNtf [2] dataConfig http://pastebin.com/vSHucatJ -- Dr. Mathias Lux Klagenfurt University, Austria http://tinyurl.com/mlux-itec
Re: DataImport Handler, writing a new EntityProcessor
Unfortunately it is the same in non-debug, just the first document. I also output the params to sout, but it seems only the first one is ever arriving at my custom class. I've the feeling that I'm doing something seriously wrong here, based on a complete misunderstanding :) I basically assume that the nested entity processor will be called for each of the rows that come out from its parent. I've read somewhere, that the data has to be taken from the data source, and I've implemented that, but it doesn't seem to change anything. cheers, Mathias On Wed, Dec 18, 2013 at 3:05 PM, Dyer, James james.d...@ingramcontent.com wrote: The first thing I would suggest is to try and run it not in debug mode. DIH's debug mode limits the number of documents it will take in, so that might be all that is wrong here. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: mathias@gmail.com [mailto:mathias@gmail.com] On Behalf Of Mathias Lux Sent: Wednesday, December 18, 2013 4:04 AM To: solr-user@lucene.apache.org Subject: DataImport Handler, writing a new EntityProcessor Hi all! I've got a question regarding writing a new EntityProcessor, in the same sense as the Tika one. My EntityProcessor should analyze jpg images and create document fields to be used with the LIRE Solr plugin (https://bitbucket.org/dermotte/liresolr). Basically I've taken the same approach as the TikaEntityProcessor, but my setup just indexes the first of 1000 images. I'm using a FileListEntityProcessor to get all JPEGs from a directory and then I'm handing them over (see [2]). My code for the EntityProcessor is at [1]. I've tried to use the DataSource as well as the filePath attribute, but it ends up all the same. However, the FileListEntityProcessor is able to read all the files according to the debug output, but I'm missing the link from the FileListEntityProcessor to the LireEntityProcessor. I'd appreciate any pointer or help :) cheers, Mathias [1] LireEntityProcessor http://pastebin.com/JFajkNtf [2] dataConfig http://pastebin.com/vSHucatJ -- Dr. Mathias Lux Klagenfurt University, Austria http://tinyurl.com/mlux-itec -- PD Dr. Mathias Lux Klagenfurt University, Austria http://tinyurl.com/mlux-itec
Re: DataImport Handler, writing a new EntityProcessor
Hi Mathias, I'd recommend testing one thing at a time. See if you can get it to work for one image before you try a directory of images. Also try testing using the solr-testframework using your ide (I use Eclipse) to debug rather than your browser/print statements. Hopefully that will give you some more specific knowledge of what's happening around your plugin. I also wrote an EntityProcessor plugin to read from a properties filehttps://issues.apache.org/jira/browse/SOLR-3928. Hopefully that'll give you some insight about this kind of Solr plugin and testing them. Cheers, Tricia On Wed, Dec 18, 2013 at 3:03 AM, Mathias Lux m...@itec.uni-klu.ac.atwrote: Hi all! I've got a question regarding writing a new EntityProcessor, in the same sense as the Tika one. My EntityProcessor should analyze jpg images and create document fields to be used with the LIRE Solr plugin (https://bitbucket.org/dermotte/liresolr). Basically I've taken the same approach as the TikaEntityProcessor, but my setup just indexes the first of 1000 images. I'm using a FileListEntityProcessor to get all JPEGs from a directory and then I'm handing them over (see [2]). My code for the EntityProcessor is at [1]. I've tried to use the DataSource as well as the filePath attribute, but it ends up all the same. However, the FileListEntityProcessor is able to read all the files according to the debug output, but I'm missing the link from the FileListEntityProcessor to the LireEntityProcessor. I'd appreciate any pointer or help :) cheers, Mathias [1] LireEntityProcessor http://pastebin.com/JFajkNtf [2] dataConfig http://pastebin.com/vSHucatJ -- Dr. Mathias Lux Klagenfurt University, Austria http://tinyurl.com/mlux-itec