Concatenation of values is also quite common, and we should have a way of doing it without forcing everybody to write code. I think we should add a ConcatenateTransformer to DataImportHandler itself to take care of the basic use-cases. A syntax like this may be good enough:
<field column="myField" concatenate="field1, field2, field3,..." separateBy=" " />

What do you think?

On Sat, Apr 19, 2008 at 12:41 AM, Shalin Shekhar Mangar <[EMAIL PROTECTED]> wrote:

> Hi David,
>
> Actually you can concatenate values, however you'll have to write a bit of
> code. You can write this in JavaScript (if you're using Java 6) or in Java.
>
> Basically, you need to write a Transformer to do it. Look at
> http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9
>
> For example, let's say you get fields first-name and last-name in the XML.
> But in the schema.xml you have a field called "name" in which you need to
> concatenate the values of first-name and last-name (with a space in
> between). Create a Java class:
>
> import java.util.Map;
>
> public class ConcatenateTransformer {
>   public Object transformRow(Map<String, Object> row) {
>     // Row values are stored as Object, so cast them before concatenating
>     // (this assumes both fields are single-valued strings).
>     String firstName = (String) row.get("first-name");
>     String lastName = (String) row.get("last-name");
>     row.put("name", firstName + " " + lastName);
>     return row;
>   }
> }
>
> Add this class to solr's classpath by putting its jar in solr/WEB-INF/lib
>
> The data-config.xml should look like this:
>
> <entity name="myEntity" processor="XPathEntityProcessor"
>         url="http://myurl/example.xml"
>         transformer="com.yourpackage.ConcatenateTransformer">
>   <field column="first-name" xpath="/record/first-name" />
>   <field column="last-name" xpath="/record/last-name" />
>   <field column="name" />
> </entity>
>
> This will call the ConcatenateTransformer.transformRow method for each row,
> and you can concatenate any field with any field (or constant). Note that the
> solr document will keep only those fields which are in the schema.xml; the
> rest are thrown away.
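The class quoted above hard-codes first-name and last-name. A more generic version, closer to the concatenate/separateBy idea, could look roughly like the sketch below. The names FieldConcatenator and concatenate are hypothetical and are not part of DataImportHandler; skipping missing values is also just an assumption about what the default behaviour should be. It is written against a current JDK and has no Solr dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not an existing DataImportHandler class.
public class FieldConcatenator {

    // Joins the values of sourceColumns with the given separator and stores
    // the result in targetColumn. Columns missing from the row are skipped.
    public static Map<String, Object> concatenate(Map<String, Object> row,
                                                  String targetColumn,
                                                  List<String> sourceColumns,
                                                  String separator) {
        List<String> parts = new ArrayList<>();
        for (String column : sourceColumns) {
            Object value = row.get(column);
            if (value != null) {
                parts.add(value.toString());
            }
        }
        row.put(targetColumn, String.join(separator, parts));
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("first-name", "John");
        row.put("last-name", "Smith");

        // Roughly what concatenate="first-name, last-name" separateBy=" "
        // would be expected to do for the "name" column.
        concatenate(row, "name", Arrays.asList("first-name", "last-name"), " ");
        System.out.println(row.get("name")); // prints "John Smith"
    }
}
```

A transformRow method could delegate to such a helper once the column list and separator have been read from the field configuration; how those attributes would actually be exposed to a transformer is left open here.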
>
> If you don't want to write this in Java, you can use JavaScript by using
> the built-in ScriptTransformer, for an example look at
> http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
>
> However, I'm beginning to realize that XSLT is a common need, let me see
> how best we can accommodate it in DataImportHandler. Which XSLT processor
> would you prefer?
>
> On Sat, Apr 19, 2008 at 12:13 AM, David Smiley @MITRE.org <[EMAIL PROTECTED]> wrote:
>
> > I'm in the same situation as you Daniel. The DataImportHandler is pretty
> > awesome but I'd also prefer it had the power of XSLT. The XPath support in
> > it doesn't suffice for me. And I can't do very basic things like
> > concatenate one value with another, say a constant even. It's too bad there
> > isn't a mode that XSLT can be put into so that it does not build the whole
> > file in memory to do the transform. I've been looking into this and have
> > turned up nothing. It would be neat if there was a StAX to multi-document
> > adapter, at which point XSLT could be applied to the smaller fixed-size
> > documents instead of the entire data stream. I haven't found anything like
> > this so it'd need to be built. For now my documents aren't too big to XSLT
> > in-memory.
> >
> > ~ David
> >
> > Daniel Papasian wrote:
> > >
> > > Shalin Shekhar Mangar wrote:
> > >> Hi Daniel,
> > >>
> > >> Maybe if you can give us a sample of what your XML looks like, we can
> > >> suggest how to use SOLR-469 (Data Import Handler) to index it.
> > >> Most of the use-cases
> > >> we have encountered so far are solvable using the XPathEntityProcessor in
> > >> DataImportHandler without using XSLT, for details look at
> > >> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
> > >
> > > I think even if it is possible to use SOLR-469 for my needs, I'd still
> > > prefer the XSLT approach, because it's going to be a bit of
> > > configuration either way, and I'd rather it be an XSLT stylesheet than
> > > solrconfig.xml. In addition, I haven't yet decided whether I want to
> > > apply any patches to the version that we will deploy. If I do go
> > > down the route of the XSLT transform patch and end up having to back
> > > it out, the amount of work it would take me to do the transform at
> > > the XML source would be negligible, whereas it would be quite a bit of
> > > work to go from using the DataImportHandler to not using it
> > > at all.
> > >
> > > Because both the solr instance and the XML source are in house, I have
> > > the ability to apply the XSLT at the source instead of at solr.
> > > However, there are different teams of people that control the XML source
> > > and solr, so it would require a bit more office coordination to do it on
> > > the backend.
> > >
> > > The data is a filemaker XML export (DTD fmresultset) and it looks
> > > roughly like this:
> > > <fmresultset>
> > >   <resultset>
> > >     <field name="ID"><data>125</data></field>
> > >     <field name="organization"><data>Ford Foundation</data></field>
> > >     ...
> > >     <relatedset table="Employees">
> > >       <record>
> > >         <field name="ID"><data>Y5-A</data></field>
> > >         <field name="Name"><data>John Smith</data></field>
> > >       </record>
> > >       <record>
> > >         <field name="ID"><data>Y5-B</data></field>
> > >         <field name="Name"><data>Jane Doe</data></field>
> > >       </record>
> > >     </relatedset>
> > > </fmresultset>
> > >
> > > I'm taking the product of the resultset and the relatedset, using both
> > > IDs concatenated as a unique identifier, like so:
> > >
> > > <doc>
> > >   <field name="ID">125Y5-A</field>
> > >   <field name="organization">Ford Foundation</field>
> > >   <field name="Name">John Smith</field>
> > > </doc>
> > > <doc>
> > >   <field name="ID">125Y5-B</field>
> > >   <field name="organization">Ford Foundation</field>
> > >   <field name="Name">Jane Doe</field>
> > > </doc>
> > >
> > > I can do the transform pretty simply with XSLT. I suppose it is
> > > possible to get the DataImportHandler to do this, but I'm not yet
> > > convinced that it's easier.
> > >
> > > Daniel
> > >
> >
> > --
> > View this message in context:
> > http://www.nabble.com/XSLT-transform-before-update--tp16738227p16764009.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

--
Regards,
Shalin Shekhar Mangar.