Re: DIH load only selected documents with XPathEntityProcessor

Bernd Fehling Mon, 10 Jan 2011 01:06:32 -0800

Hi Gora,

Thanks a lot, this is a very nice solution and it works perfectly.
I will dig deeper into ScriptTransformer; it seems to be very powerful.


Regards,
Bernd

On 08.01.2011 14:38, Gora Mohanty wrote:
> On Fri, Jan 7, 2011 at 12:30 PM, Bernd Fehling
> <bernd.fehl...@uni-bielefeld.de> wrote:
>> Hello list,
>>
>> is it possible to load only selected documents with XPathEntityProcessor?
>> While loading docs I want to drop/skip/ignore documents with missing URL.
>>
>> Example:
>> <documents>
>>    <document>
>>        <title>first title</title>
>>        <id>identifier_01</id>
>>        <link>http://www.foo.com/path/bar.html</link>
>>    </document>
>>    <document>
>>        <title>second title</title>
>>        <id>identifier_02</id>
>>        <link></link>
>>    </document>
>> </documents>
>>
>> The first document should be loaded, the second document should be ignored
>> because it has an empty link (should also work for missing link field).
> [...]
> 
> You can use a ScriptTransformer, along with $skipRow/$skipDoc.
> E.g., something like this for your data import configuration file:
> 
> <dataConfig>
>     <script><![CDATA[
>       function skipRow(row) {
>         var link = row.get( 'link' );
>         if( link == null || link == '' ) {
>           row.put( '$skipRow', 'true' );
>         }
>         return row;
>       }
>     ]]></script>
>     <dataSource type="FileDataSource" />
>     <document>
>         <entity name="f" processor="FileListEntityProcessor"
>                 baseDir="/home/gora/test" fileName=".*xml" newerThan="'NOW-3DAYS'"
>                 recursive="true" rootEntity="false" dataSource="null">
>             <entity name="top" processor="XPathEntityProcessor"
>                     forEach="/documents/document" url="${f.fileAbsolutePath}"
>                     transformer="script:skipRow">
>                 <field column="link" xpath="/documents/document/link"/>
>                 <field column="title" xpath="/documents/document/title"/>
>                 <field column="id" xpath="/documents/document/id"/>
>             </entity>
>             </entity>
>         </entity>
>     </document>
> </dataConfig>
> 
> Regards,
> Gora
