Hi Gora,

thanks a lot, very nice solution, works perfectly.
I will dig more into ScriptTransformer, it seems to be very powerful.

Regards,
Bernd

On 08.01.2011 14:38, Gora Mohanty wrote:
> On Fri, Jan 7, 2011 at 12:30 PM, Bernd Fehling
> <bernd.fehl...@uni-bielefeld.de> wrote:
>> Hello list,
>>
>> is it possible to load only selected documents with XPathEntityProcessor?
>> While loading docs I want to drop/skip/ignore documents with missing URL.
>>
>> Example:
>> <documents>
>>   <document>
>>     <title>first title</title>
>>     <id>identifier_01</id>
>>     <link>http://www.foo.com/path/bar.html</link>
>>   </document>
>>   <document>
>>     <title>second title</title>
>>     <id>identifier_02</id>
>>     <link></link>
>>   </document>
>> </documents>
>>
>> The first document should be loaded, the second document should be ignored
>> because it has an empty link (should also work for missing link field).
> [...]
>
> You can use a ScriptTransformer, along with $skipRow/$skipDoc.
> E.g., something like this for your data import configuration file:
>
> <dataConfig>
>   <script><![CDATA[
>     function skipRow(row) {
>       var link = row.get( 'link' );
>       if( link == null || link == '' ) {
>         row.put( '$skipRow', 'true' );
>       }
>       return row;
>     }
>   ]]></script>
>   <dataSource type="FileDataSource" />
>   <document>
>     <entity name="f" processor="FileListEntityProcessor"
>             baseDir="/home/gora/test" fileName=".*xml" newerThan="'NOW-3DAYS'"
>             recursive="true" rootEntity="false" dataSource="null">
>       <entity name="top" processor="XPathEntityProcessor"
>               forEach="/documents/document" url="${f.fileAbsolutePath}"
>               transformer="script:skipRow">
>         <field column="link" xpath="/documents/document/link"/>
>         <field column="title" xpath="/documents/document/title"/>
>         <field column="id" xpath="/documents/document/id"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> Regards,
> Gora
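
For anyone who wants to try the skip logic outside Solr first, here is a minimal sketch. It is only an approximation: `makeRow` is a hypothetical helper, not part of the DataImportHandler, and it merely imitates the `get`/`put` calls the script makes on the row (returning null for a missing key, like a Java Map). The `String(link)` conversion is a defensive assumption in case the script engine hands back a Java string object rather than a JavaScript string.

```javascript
// Hypothetical stand-in for the row that the DataImportHandler passes to the
// script; it only imitates get() and put(), and returns null for missing keys.
function makeRow(fields) {
  var data = {};
  for (var key in fields) {
    if (Object.prototype.hasOwnProperty.call(fields, key)) {
      data[key] = fields[key];
    }
  }
  return {
    get: function (k) {
      return Object.prototype.hasOwnProperty.call(data, k) ? data[k] : null;
    },
    put: function (k, v) {
      data[k] = v;
    }
  };
}

// Same logic as the quoted transformer: flag rows whose link is missing or empty.
function skipRow(row) {
  var link = row.get('link');
  if (link == null || String(link) === '') {
    row.put('$skipRow', 'true');
  }
  return row;
}

// The two documents from the example, plus one with no link field at all.
var kept = skipRow(makeRow({ id: 'identifier_01', link: 'http://www.foo.com/path/bar.html' }));
var emptyLink = skipRow(makeRow({ id: 'identifier_02', link: '' }));
var missingLink = skipRow(makeRow({ id: 'identifier_03' }));

console.log(kept.get('$skipRow'));        // null (row is kept)
console.log(emptyLink.get('$skipRow'));   // true (row is skipped)
console.log(missingLink.get('$skipRow')); // true (row is skipped)
```

The third row (`identifier_03`) is made up for illustration to cover the "missing link field" case from the original question.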