Hello. I have been beating my head against the data-config.xml listed at the end of this message. It breaks in a few different ways.
1) I have bodged TemplateTransformer to allow it to return when one of the variables is undefined. This ensures my uniqueKey is always defined. But thinking more on Noble's comments, there is use in having it work both ways, i.e. leaving the column undefined or replacing the variable with "". I still like my idea of using the default value of a Solr field from schema.xml, but I can't figure out how or where best to implement it.

2) Having used TemplateTransformer to assign a value to an entity column, that column cannot be used in other TemplateTransformer operations. In my project I am attempting to reuse "x.fileWebPath". To fix this, the last line of transformRow() in TemplateTransformer.java needs to be replaced with the following, which as well as putting the templated string into 'row' also saves it into the 'resolver'.

**originally**
      row.put(column, resolver.replaceTokens(expr));
    }

**new**
      String columnName = map.get(DataImporter.COLUMN);
      expr = resolver.replaceTokens(expr);
      row.put(columnName, expr);
      resolverMapCopy.put(columnName, expr);
    }

As an aside, I think I ran into the issues covered by SOLR-993. It took a while to figure out that I could not add a single column name/value to the resolver. I had instead to add to the map that was already stored within the resolver.

3) No entity column names can be used within RegexTransformer. I guess all the stuff that was added to TemplateTransformer to allow column names to be used in templates needs to be re-added into RegexTransformer. I am doing that now, but am confused by the fragment of code which copies from resolverMap into resolverMapCopy. As best I can see, resolverMap is always empty, but I am barely able to follow the code! Can somebody explain when and why resolverMap would be populated?

Also, I begin to understand the comments made by Noble in SOLR-1001 about resolving "entity attributes in ContextImpl.getEntityAttribute", and I guess Shalin was right as well.
However, it also seems wrong that at the top of every transformer we are going to repeat the same code to load the resolver with information about the entity.

4) Since I am reusing template output within other templates, the order of execution becomes important. Can I assume that the explicitly listed columns in an entity are processed by the various transformers in the order they appear within data-config.xml? I *think* that the list of columns within an entity as returned by getAllEntityFields() is actually an ArrayList, which I think is order dependent. Is this correct?

5) Should I raise this as a single JIRA issue?

6) Having played with this stuff, I was going to add a bit more to the wiki highlighting some of the possibilities and issues with transformers. But I want to check with the list first!

<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource"/>
  <document>
    <entity name="jc"
            processor="FileListEntityProcessor"
            fileName="^.*\.xml$"
            newerThan="'NOW-1000DAYS'"
            recursive="true"
            rootEntity="false"
            dataSource="null"
            baseDir="/Volumes/spare/ts/solr/content" >
      <entity name="x"
              dataSource="myfilereader"
              processor="XPathEntityProcessor"
              url="${jc.fileAbsolutePath}"
              rootEntity="true"
              stream="false"
              forEach="/record | /record/mediaBlock"
              transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer">

        <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
        <field column="fileWebPath"      regex="${x.test}(.*)" replaceWith="/ford$1" sourceColName="fileAbsolutePath"/>
        <field column="title"            xpath="/record/title" />
        <field column="para1" name="para" xpath="/record/sect1/para" />
        <field column="para2" name="para" xpath="/record/list/listitem/para" />
        <field column="pubdate"          xpath="/record/metadata/da...@qualifier='pubDate']" dateTimeFormat="yyyyMMdd" />
        <field column="vurl"             xpath="/record/mediaBlock/mediaObject/@vurl" />
        <field column="imgSrcArticle"    template="${dataimporter.request.fordinstalldir}" />
        <field column="imgCpation"       xpath="/record/mediaBlock/caption" />
        <field column="test"             template="${dataimporter.request.contentinstalldir}" />

        <!-- **problem is that vurl is just a fragment of the info needed to access the picture. -->
        <field column="imgWebPathICON"   regex="(.*)/.*" replaceWith="$1/imagery/${x.vurl}s.jpg" sourceColName="fileWebPath"/>
        <field column="imgWebPathFULL"   regex="(.*)/.*" replaceWith="$1/imagery/${x.vurl}.jpg" sourceColName="fileWebPath"/>
        <field column="vdkvgwkey"        template="${jc.fileAbsolutePath}#${x.vurl}" />
      </entity>
    </entity>
  </document>
</dataConfig>

Regards Fergus.

-- 
===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================