2009/7/30 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>:
> On Thu, Jul 30, 2009 at 1:23 AM, Erik Hatcher <e...@ehatchersolutions.com>
> wrote:
>> I've been troubleshooting an issue where we're trying to load documents
>> through DIH's URLDataSource and XPathEntityProcessor, where we want to
>> leverage the $hasMore feature to request a new URL.
>>
>> I've been tinkering with this using a very simple example with two XML files:
>>
>> solr.xml:
>> <add>
>>   <doc>
>>     <field name="id">SOLR1000</field>
>>   </doc>
>>   <doc>
>>     <field name="id">**HASMORE**</field>
>>   </doc>
>> </add>
>>
>> solr2.xml:
>> <add>
>>   <doc>
>>     <field name="id">SOLR2k</field>
>>   </doc>
>> </add>
>>
>> My DIH config is:
>>
>> <?xml version="1.0"?>
>> <dataConfig>
>>   <dataSource type="URLDataSource"
>>       baseUrl="file:///Users/erikhatcher/dev/solr/example/exampledocs/"
>>       readTimeout="180000" connectionTimeout="60000"/>
>>
>>   <script>
>>   <![CDATA[
>>     function checkForMore(row, context) {
>>       print("### checkForMore: " + row);
>>       if (row.get('id') == '**HASMORE**') {
>>         print("#### hasMore ####");
>>         row.put('$hasMore', 'true');
>>         row.put('$nextUrl',
>>             'file:///Users/erikhatcher/dev/solr/example/exampledocs/solr2.xml');
>>         row.put('$skipRow', 'true');
>>       } else {
>>         row.put('$hasMore', 'false');
>>       }
>>       return row;
>>     }
>>   ]]>
>>   </script>
>>
>>   <document name="docs">
>>     <entity name="doc"
>>         processor="XPathEntityProcessor"
>>         url="solr.xml"
>>         forEach="/add/doc"
>>         stream="true"
>>         transformer="DateFormatTransformer,TemplateTransformer,script:checkForMore"
>>         onError="abort">
>>       <field column="id" xpath="/add/doc/fie...@name='id']"/>
>>     </entity>
>>   </document>
>> </dataConfig>
>>
>> Without the else clause in checkForMore to set $hasMore to false, an
>> infinite loop occurs and solr2.xml is requested repeatedly. This is because
>> once $hasMore is set on a row, XPathEntityProcessor#readUsefulVars sets it in
>> entity scope and it never gets unset. Is this intentional?
>> Shouldn't $hasMore get reset after more is requested?
>
> I would say we must reset it after it has been used once.
>>
>> On a related note, it would seem useful to allow $hasMore/$skipRow/$nextUrl
>> to be controlled from the XML data rather than solely from a transformer.
>> But $prefixed fields are ignored by DIH, right?
> This is possible using a RegexTransformer (so you may not need to
> write your own).
>
> <field column="$hasMore" regex="HASMORE" replaceWith="true"/>
A small correction:

<field column="$hasMore" regex="HASMORE" replaceWith="true" sourceColName="id"/>

>
>>
>> I'm still looking for that holy grail of a good example leveraging
>> $hasMore/$nextUrl! :)
>>
>> Thanks,
>>     Erik
>
> --
> -----------------------------------------------------
> Noble Paul | Principal Engineer| AOL | http://aol.com
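To make the infinite-loop explanation in this thread concrete, here is a toy JavaScript model of URL pagination driven by $hasMore/$nextUrl. It is not DIH source code. Everything in it is hypothetical: `pages`, `runImport`, the `entityScope` object and the `maxRequests` cap are illustrative names, and `checkForMore` is only a plain-object analog of the thread's transformer. The model assumes, as described above, that $-prefixed row values are copied into entity scope and persist, and that the flag is consulted when the current page runs out of rows. It compares the sticky flag, the else-clause workaround, and the proposed reset-after-use fix.

```javascript
// Toy model of $hasMore/$nextUrl pagination. NOT DIH source code; all names are hypothetical.

// Hypothetical in-memory stand-ins for solr.xml and solr2.xml from the thread.
const pages = {
  "solr.xml": [{ id: "SOLR1000" }, { id: "**HASMORE**" }],
  "solr2.xml": [{ id: "SOLR2k" }],
};

// Plain-object analog of the thread's checkForMore transformer.
// withElse = false reproduces the version without the else clause.
function checkForMore(row, withElse) {
  if (row.id === "**HASMORE**") {
    row.$hasMore = "true";
    row.$nextUrl = "solr2.xml";
    row.$skipRow = "true";
  } else if (withElse) {
    row.$hasMore = "false";
  }
  return row;
}

// resetAfterUse: clear the entity-scoped flag once the next URL is requested
// (the fix suggested in the thread). maxRequests: safety cap so the sticky
// case terminates in this model instead of looping forever.
function runImport({ resetAfterUse = false, withElse = false, maxRequests = 10 } = {}) {
  const entityScope = {}; // stands in for variables kept in entity scope
  const requested = [];
  const indexed = [];
  let url = "solr.xml";

  while (url !== null && requested.length < maxRequests) {
    requested.push(url);
    for (const raw of pages[url]) {
      const row = checkForMore({ ...raw }, withElse);
      // Assumed behavior: $-prefixed values are copied into entity scope
      // and persist across rows and pages until overwritten.
      if ("$hasMore" in row) entityScope.hasMore = row.$hasMore;
      if ("$nextUrl" in row) entityScope.nextUrl = row.$nextUrl;
      if (row.$skipRow !== "true") indexed.push(row.id);
    }
    // Assumed behavior: the flag is consulted when the current page runs out of rows.
    if (entityScope.hasMore === "true") {
      url = entityScope.nextUrl;
      if (resetAfterUse) delete entityScope.hasMore;
    } else {
      url = null;
    }
  }
  return { requested, indexed };
}

console.log(runImport().requested);                        // sticky flag: runs until the cap
console.log(runImport({ withElse: true }).requested);      // else-clause workaround
console.log(runImport({ resetAfterUse: true }).requested); // reset-after-use fix
```

In this model the sticky version only stops because of the hypothetical maxRequests cap, re-requesting solr2.xml every time, while both the else-clause workaround and the reset-after-use fix stop after requesting solr.xml and solr2.xml once each.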