Re: XSLT transform before update?

Shalin Shekhar Mangar Fri, 18 Apr 2008 12:12:04 -0700

Hi David,
Actually, you can concatenate values, but you'll have to write a bit of
code. You can write it in Java, or in JavaScript if you're using Java 6.


Basically, you need to write a Transformer to do it. Look at
http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9

For example, let's say you get the fields first-name and last-name in the XML,
but in schema.xml you have a field called "name" into which you need to
concatenate the values of first-name and last-name (with a space in
between). Create a Java class:

import java.util.Map;

public class ConcatenateTransformer {
    public Object transformRow(Map<String, Object> row) {
        // Row values arrive as Objects, so cast them to String
        String firstName = (String) row.get("first-name");
        String lastName = (String) row.get("last-name");
        row.put("name", firstName + " " + lastName);
        return row;
    }
}

Package this class in a jar and add it to Solr's classpath by putting the jar
in solr/WEB-INF/lib.

The data-config.xml should look like this:

<entity name="myEntity" processor="XPathEntityProcessor"
        url="http://myurl/example.xml" forEach="/record"
        transformer="com.yourpackage.ConcatenateTransformer">
  <field column="first-name" xpath="/record/first-name" />
  <field column="last-name" xpath="/record/last-name" />
  <field column="name" />
</entity>

DataImportHandler will call ConcatenateTransformer.transformRow for each row,
so you can concatenate any field with any other field (or with a constant).
Note that the Solr document keeps only the fields that are defined in
schema.xml; the rest are thrown away.
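To check the transformer outside Solr, it can be exercised on its own. The
sketch below is hypothetical: main() stands in for DataImportHandler, which is
assumed to pass each row as a Map<String, Object> keyed by column name, and it
adds a null check so a row missing either name does not end up as "null Smith".

```java
import java.util.HashMap;
import java.util.Map;

public class ConcatenateTransformerDemo {

    // Same shape as the transformer above, plus a null check (an addition,
    // not something DataImportHandler requires).
    public static class ConcatenateTransformer {
        public Object transformRow(Map<String, Object> row) {
            Object first = row.get("first-name");
            Object last = row.get("last-name");
            // Only build "name" when both parts are present.
            if (first != null && last != null) {
                row.put("name", first + " " + last);
            }
            return row;
        }
    }

    public static void main(String[] args) {
        // Hypothetical row, standing in for what XPathEntityProcessor would extract.
        Map<String, Object> row = new HashMap<String, Object>();
        row.put("first-name", "John");
        row.put("last-name", "Smith");

        Object result = new ConcatenateTransformer().transformRow(row);
        System.out.println(((Map<?, ?>) result).get("name"));
    }
}
```

Because the transformer mutates and returns the same map, any other columns in
the row pass through untouched.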

If you don't want to write this in Java, you can use JavaScript through the
built-in ScriptTransformer; for an example, look at
http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9

However, I'm beginning to realize that XSLT is a common need, so let me see
how best we can accommodate it in DataImportHandler. Which XSLT processor
would you prefer?
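On the memory concern David raises in the quoted message (XSLT building the
whole file in memory), one possible shape for the StAX-to-multi-document
adapter he describes is to stream the input with StAX, cut out each record
element as its own small document, and apply the stylesheet to that fragment
only. This is a sketch, not DataImportHandler code; the class and method names
(StaxXsltSplitter, transformEach) and the sample data are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical StAX-to-multi-document adapter: only one record is buffered at
// a time, and the XSLT sees each record as a standalone document.
public class StaxXsltSplitter {

    // Hypothetical sample input and per-record stylesheet (concatenation).
    public static final String SAMPLE_XML =
        "<people>"
      + "<record><first>John</first><last>Smith</last></record>"
      + "<record><first>Jane</first><last>Doe</last></record>"
      + "</people>";

    public static final String SAMPLE_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/record'>"
      + "<xsl:value-of select=\"concat(first, ' ', last)\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static List<String> transformEach(String xml, String recordName, String xslt)
            throws Exception {
        // Compile the stylesheet once and reuse it for every record.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        List<String> results = new ArrayList<String>();

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (!event.isStartElement()
                    || !event.asStartElement().getName().getLocalPart().equals(recordName)) {
                continue; // not the start of a record; keep streaming
            }
            // Copy this record's events (nested elements included) into a buffer.
            StringWriter fragment = new StringWriter();
            XMLEventWriter writer = outFactory.createXMLEventWriter(fragment);
            writer.add(event);
            int depth = 1;
            while (depth > 0) {
                XMLEvent inner = reader.nextEvent();
                if (inner.isStartElement()) {
                    depth++;
                } else if (inner.isEndElement()) {
                    depth--;
                }
                writer.add(inner);
            }
            writer.flush();
            writer.close();

            // Apply the stylesheet to this one small document.
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(fragment.toString())),
                    new StreamResult(out));
            results.add(out.toString());
        }
        reader.close();
        return results;
    }

    public static void main(String[] args) throws Exception {
        for (String line : transformEach(SAMPLE_XML, "record", SAMPLE_XSLT)) {
            System.out.println(line);
        }
    }
}
```

The trade-off is that each stylesheet can only see one record at a time, so
anything that needs cross-record context would not fit this approach.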

On Sat, Apr 19, 2008 at 12:13 AM, David Smiley @MITRE.org <[EMAIL PROTECTED]>
wrote:

>
> I'm in the same situation as you, Daniel. The DataImportHandler is pretty
> awesome, but I'd also prefer it had the power of XSLT. The XPath support
> in it doesn't suffice for me, and I can't do very basic things like
> concatenate one value with another, or even with a constant. It's too bad
> there isn't a mode XSLT can be put into so that it doesn't build the whole
> file in memory to do the transform. I've been looking into this and have
> turned up nothing. It would be neat if there were a StAX-to-multi-document
> adapter, at which point XSLT could be applied to the smaller fixed-size
> documents instead of the entire data stream. I haven't found anything like
> this, so it'd need to be built. For now, my documents aren't too big to
> XSLT in memory.
>
> ~ David
>
>
> Daniel Papasian wrote:
> >
> > Shalin Shekhar Mangar wrote:
> >> Hi Daniel,
> >>
> >> Maybe if you can give us a sample of what your XML looks like, we can
> >> suggest how to use SOLR-469 (Data Import Handler) to index it. Most of
> >> the use-cases we have encountered so far are solvable using the
> >> XPathEntityProcessor in DataImportHandler without XSLT; for details,
> >> look at
> >> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
> >
> > I think even if it is possible to use SOLR-469 for my needs, I'd still
> > prefer the XSLT approach, because it's going to be a bit of
> > configuration either way, and I'd rather it be an XSLT stylesheet than
> > solrconfig.xml. In addition, I haven't yet decided whether I want to
> > apply any patches to the version that we will deploy. If I do go down
> > the route of the XSLT transform patch and later have to back it out,
> > moving the transform to the XML source would be negligible work,
> > whereas going from using the DataImportHandler to not using it at all
> > would be quite a bit of work.
> >
> > Because both the solr instance and the XML source are in house, I have
> > the ability to apply the XSLT at the source instead of at solr.
> > However, there are different teams of people that control the XML source
> > and solr, so it would require a bit more office coordination to do it on
> > the backend.
> >
> > The data is a filemaker XML export (DTD fmresultset) and it looks
> > roughly like this:
> > <fmresultset>
> >    <resultset>
> >      <field name="ID"><data>125</data></field>
> >      <field name="organization"><data>Ford Foundation</data></field>
> >      ...
> >      <relatedset table="Employees">
> >        <record>
> >          <field name="ID"><data>Y5-A</data></field>
> >          <field name="Name"><data>John Smith</data></field>
> >        </record>
> >        <record>
> >          <field name="ID"><data>Y5-B</data></field>
> >          <field name="Name"><data>Jane Doe</data></field>
> >        </record>
> >      </relatedset>
> >    </resultset>
> > </fmresultset>
> >
> > I'm taking the product of the resultset and the relatedset, using both
> > IDs concatenated as a unique identifier, like so:
> >
> > <doc>
> > <field name="ID">125Y5-A</field>
> > <field name="organization">Ford Foundation</field>
> > <field name="Name">John Smith</field>
> > </doc>
> > <doc>
> > <field name="ID">125Y5-B</field>
> > <field name="organization">Ford Foundation</field>
> > <field name="Name">Jane Doe</field>
> > </doc>
> >
> > I can do the transform pretty simply with XSLT.  I suppose it is
> > possible to get the DataImportHandler to do this, but I'm not yet
> > convinced that it's easier.
> >
> > Daniel
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/XSLT-transform-before-update--tp16738227p16764009.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Shalin Shekhar Mangar.
