Re: DataImport Handler, writing a new EntityProcessor

2013-12-19 Thread Mathias Lux
Hi!

Thanks for all the advice! I finally got it working. The most annoying error,
which took me the better part of a day to figure out, was that the state
variable here had to be reset:
https://bitbucket.org/dermotte/liresolr/src/d27878a71c63842cb72b84162b599d99c4408965/src/main/java/net/semanticmetadata/lire/solr/LireEntityProcessor.java?at=master#cl-56
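The linked source isn't reproduced here, so which state variable it was stays in the link. One common shape that produces exactly this symptom is a one-row-per-file processor with a "done" flag: nextRow() sets it after emitting the file's row. If init() doesn't clear it, every parent row after the first yields nothing, and only the first of the 1000 images gets indexed. The following plain-Java model is a sketch under that assumption. `RowProcessor`, `SingleRowProcessor` and `runImport` are hypothetical names, not Solr's EntityProcessor API, and the `fileAbsolutePath` key only mirrors a FileListEntityProcessor column name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a DIH-style nested entity: NOT the Solr API.
public class StateResetDemo {

    /** Hypothetical stand-in for an entity processor: init() once per parent row, then nextRow() until null. */
    public interface RowProcessor {
        void init(Map<String, Object> parentRow);

        Map<String, Object> nextRow();
    }

    /** Emits exactly one row per parent row (one document per image file). */
    public static class SingleRowProcessor implements RowProcessor {
        private final boolean resetOnInit;
        private boolean done = false; // per-file state: has this file's row been emitted yet?
        private String path;

        public SingleRowProcessor(boolean resetOnInit) {
            this.resetOnInit = resetOnInit;
        }

        @Override
        public void init(Map<String, Object> parentRow) {
            // "fileAbsolutePath" mirrors a FileListEntityProcessor column name (assumption).
            path = (String) parentRow.get("fileAbsolutePath");
            if (resetOnInit) {
                done = false; // the fix: clear per-file state for every new parent row
            }
        }

        @Override
        public Map<String, Object> nextRow() {
            if (done) {
                return null; // null means "no more rows for the current parent row"
            }
            done = true;
            Map<String, Object> row = new HashMap<>();
            row.put("id", path);
            return row;
        }
    }

    /** Drives the child once per parent row, roughly the way a nested entity is driven. */
    public static List<Map<String, Object>> runImport(List<String> files, RowProcessor child) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (String file : files) {
            Map<String, Object> parentRow = new HashMap<>();
            parentRow.put("fileAbsolutePath", file);
            child.init(parentRow);
            Map<String, Object> row;
            while ((row = child.nextRow()) != null) {
                docs.add(row);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> files = List.of("/img/a.jpg", "/img/b.jpg", "/img/c.jpg");
        System.out.println("without reset: " + runImport(files, new SingleRowProcessor(false)).size());
        System.out.println("with reset:    " + runImport(files, new SingleRowProcessor(true)).size());
    }
}
```

Running main prints 1 for the processor that never resets its flag and 3 for the one that resets it in init(), which is the "only the first image" symptom in miniature.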

The EntityProcessor is part of this image search plugin, if anyone is
interested: https://bitbucket.org/dermotte/liresolr/

:) It's always the small things that are hardest to find.

cheers and thanks, Mathias

On Wed, Dec 18, 2013 at 7:26 PM, P Williams
williams.tricia.l...@gmail.com wrote:
 Hi Mathias,

 I'd recommend testing one thing at a time. See if you can get it to work
 for a single image before you try a whole directory of images. Also try
 testing with the solr-test-framework and debugging in your IDE (I use
 Eclipse) rather than relying on your browser and print statements. That
 should give you more specific insight into what's happening around your
 plugin.

 I also wrote an EntityProcessor plugin that reads from a properties file:
 https://issues.apache.org/jira/browse/SOLR-3928. Hopefully it will give you
 some insight into this kind of Solr plugin and how to test it.

 Cheers,
 Tricia




 On Wed, Dec 18, 2013 at 3:03 AM, Mathias Lux m...@itec.uni-klu.ac.at wrote:

 Hi all!

 I've got a question about writing a new EntityProcessor along the same
 lines as the Tika one. My EntityProcessor should analyze JPEG images and
 create document fields to be used with the LIRE Solr plugin
 (https://bitbucket.org/dermotte/liresolr). Basically I've taken the
 same approach as the TikaEntityProcessor, but my setup indexes only
 the first of 1000 images. I'm using a FileListEntityProcessor to get
 all JPEGs from a directory and then handing them over (see [2]).
 My code for the EntityProcessor is at [1]. I've tried using the
 DataSource as well as the filePath attribute, but the result is the
 same either way. According to the debug output, the FileListEntityProcessor
 reads all the files, but I'm missing the link from the
 FileListEntityProcessor to the LireEntityProcessor.

 I'd appreciate any pointer or help :)

 cheers,
   Mathias

 [1] LireEntityProcessor http://pastebin.com/JFajkNtf
 [2] dataConfig http://pastebin.com/vSHucatJ

 --
 Dr. Mathias Lux
 Klagenfurt University, Austria
 http://tinyurl.com/mlux-itec




-- 
PD Dr. Mathias Lux
Klagenfurt University, Austria
http://tinyurl.com/mlux-itec


RE: DataImport Handler, writing a new EntityProcessor

2013-12-18 Thread Dyer, James
The first thing I would suggest is to try running it outside debug mode. DIH's
debug mode limits the number of documents it takes in, so that might be all
that is wrong here.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: mathias@gmail.com [mailto:mathias@gmail.com] On Behalf Of Mathias Lux
Sent: Wednesday, December 18, 2013 4:04 AM
To: solr-user@lucene.apache.org
Subject: DataImport Handler, writing a new EntityProcessor




Re: DataImport Handler, writing a new EntityProcessor

2013-12-18 Thread Mathias Lux
Unfortunately it's the same in non-debug mode: just the first document. I
also print the params to System.out, but it seems only the first one ever
arrives at my custom class. I have the feeling that I'm doing something
seriously wrong here, based on a complete misunderstanding :) My basic
assumption is that the nested entity processor is called once for each row
that comes out of its parent. I've read somewhere that the data has to be
taken from the data source, and I've implemented that, but it doesn't seem
to change anything.
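The assumption above matches the usual description of nested DIH entities: the child entity is invoked again for each row its parent produces, so the file path has to be resolved per row rather than once. The following plain-Java sketch models that flow under stated assumptions. The `${files.fileAbsolutePath}` placeholder (with `files` as an assumed name for the FileListEntityProcessor entity) and the hypothetical `ByteSource`, `resolve` and `importAll` names stand in for DIH's own variable resolution and a binary data source such as BinFileDataSource; none of them is Solr's actual API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model: NOT Solr's VariableResolver or DataSource API.
public class NestedEntityDemo {

    /** Hypothetical stand-in for a binary data source: opens a stream for a resolved path. */
    public interface ByteSource {
        InputStream open(String path) throws IOException;
    }

    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces ${entity.field} placeholders with values from the parent row; unknown ones become "". */
    public static String resolve(String template, String parentEntity, Map<String, Object> parentRow) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        String prefix = parentEntity + ".";
        while (m.find()) {
            String key = m.group(1); // e.g. "files.fileAbsolutePath"
            Object value = key.startsWith(prefix) ? parentRow.get(key.substring(prefix.length())) : null;
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Runs the child once per parent row: resolve the path, read the bytes, emit one document. */
    public static List<Map<String, Object>> importAll(List<Map<String, Object>> parentRows, String urlTemplate,
                                                      String parentEntity, ByteSource source) throws IOException {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (Map<String, Object> parent : parentRows) {
            // Resolved anew for every parent row; a path cached from the first row would repeat image #1.
            String path = resolve(urlTemplate, parentEntity, parent);
            try (InputStream in = source.open(path)) {
                byte[] data = in.readAllBytes(); // Java 9+
                Map<String, Object> doc = new HashMap<>();
                doc.put("id", path);
                doc.put("bytes", data.length); // a real processor would extract image features here
                docs.add(doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        // In-memory "files" standing in for JPEGs on disk.
        Map<String, byte[]> fakeDisk = new HashMap<>();
        fakeDisk.put("/img/a.jpg", "aaaa".getBytes(StandardCharsets.US_ASCII));
        fakeDisk.put("/img/b.jpg", "bb".getBytes(StandardCharsets.US_ASCII));
        ByteSource source = path -> {
            byte[] bytes = fakeDisk.get(path);
            if (bytes == null) {
                throw new IOException("no such file: " + path);
            }
            return new ByteArrayInputStream(bytes);
        };

        List<Map<String, Object>> parents = new ArrayList<>();
        for (String p : List.of("/img/a.jpg", "/img/b.jpg")) {
            Map<String, Object> row = new HashMap<>();
            row.put("fileAbsolutePath", p);
            parents.add(row);
        }
        for (Map<String, Object> doc : importAll(parents, "${files.fileAbsolutePath}", "files", source)) {
            System.out.println(doc.get("id") + " -> " + doc.get("bytes") + " bytes");
        }
    }
}
```

Running main prints "/img/a.jpg -> 4 bytes" and then "/img/b.jpg -> 2 bytes": one document per parent row, each read from the data source using that row's own path.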

cheers,
Mathias

On Wed, Dec 18, 2013 at 3:05 PM, Dyer, James
james.d...@ingramcontent.com wrote:
 The first thing I would suggest is to try running it outside debug mode.
 DIH's debug mode limits the number of documents it takes in, so that
 might be all that is wrong here.

 James Dyer
 Ingram Content Group
 (615) 213-4311




Re: DataImport Handler, writing a new EntityProcessor

2013-12-18 Thread P Williams
Hi Mathias,

I'd recommend testing one thing at a time. See if you can get it to work
for a single image before you try a whole directory of images. Also try
testing with the solr-test-framework and debugging in your IDE (I use
Eclipse) rather than relying on your browser and print statements. That
should give you more specific insight into what's happening around your
plugin.

I also wrote an EntityProcessor plugin that reads from a properties file:
https://issues.apache.org/jira/browse/SOLR-3928. Hopefully it will give you
some insight into this kind of Solr plugin and how to test it.
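The SOLR-3928 code itself isn't reproduced here. As a rough illustration of the contrasting case, a processor that emits several rows per invocation, here is a hypothetical plain-Java sketch that turns each key/value pair of a properties file into one row. It only mimics the init()/nextRow() shape; the class and method names are illustrative and are not Solr's EntityProcessor API. Rebuilding the iterator in init() is what lets every new invocation start from the first entry again.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch, NOT the SOLR-3928 code: mimics the init()/nextRow() shape only.
public class PropertiesRowsDemo {

    private Iterator<Map.Entry<String, String>> entries;

    /** (Re)loads the properties text and rebuilds the row iterator, so each call starts fresh. */
    public void init(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Properties.load declares IOException; an open StringReader won't actually throw it.
            throw new UncheckedIOException(e);
        }
        // Copy into a TreeMap so rows come out in a stable, sorted key order.
        TreeMap<String, String> sorted = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            sorted.put(key, props.getProperty(key));
        }
        entries = sorted.entrySet().iterator();
    }

    /** Returns one row per key/value pair, then null once the entries are exhausted. */
    public Map<String, Object> nextRow() {
        if (entries == null || !entries.hasNext()) {
            return null;
        }
        Map.Entry<String, String> entry = entries.next();
        Map<String, Object> row = new HashMap<>();
        row.put("key", entry.getKey());
        row.put("value", entry.getValue());
        return row;
    }

    public static void main(String[] args) {
        PropertiesRowsDemo processor = new PropertiesRowsDemo();
        processor.init("title=Sunset\ncamera=Canon\n");
        Map<String, Object> row;
        while ((row = processor.nextRow()) != null) {
            System.out.println(row.get("key") + " = " + row.get("value"));
        }
    }
}
```

Running main prints "camera = Canon" followed by "title = Sunset", since keys are emitted in sorted order.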

Cheers,
Tricia



