Thanks Erick, this is how I was doing it, but when I saw the Solr Cell stuff I figured I'd give it a go. What I ended up doing is the following:
ModifiableSolrParams params = indexer.index(artifact);
params.add("fmap.content", "my_custom_field");
params.add("extractFormat", "text");
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.setParams(params);
FileStream f = new FileStream(new File(""));
up.addContentStream(f);

On Fri, Sep 6, 2013 at 9:54 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> It's always frustrating when someone replies with "Why not do it
> a completely different way?". But I will anyway :).
>
> There's no requirement at all that you send things to Solr to make
> Solr Cell (aka Tika) do its tricks. Since you're already in SolrJ
> anyway, why not just parse on the client? This has the advantage
> of allowing you to offload the Tika processing from Solr, which can
> be quite expensive. You can use the same Tika jars that come
> with Solr or download whatever version from the Tika project
> you want. That way, you can exercise much better control over
> what's done.
>
> Here's a skeletal program with indexing from a DB mixed in, but
> it shouldn't be hard at all to pull the DB parts out.
>
> http://searchhub.org/dev/2012/02/14/indexing-with-solrj/
>
> FWIW,
> Erick
>
>
> On Thu, Sep 5, 2013 at 5:28 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>
> > Is it possible to configure Solr Cell to only extract and store the body
> > of a document when indexing?
> > I'm currently doing the following, which I
> > thought would work:
> >
> > ModifiableSolrParams params = new ModifiableSolrParams();
> > params.set("defaultField", "content");
> > params.set("xpath", "/xhtml:html/xhtml:body/descendant::node()");
> > ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
> > up.setParams(params);
> > FileStream f = new FileStream(new File(".."));
> > up.addContentStream(f);
> > up.setAction(ACTION.COMMIT, true, true);
> > solrServer.request(up);
> >
> > But the resulting content field looks like this:
> >
> > <arr name="content_mvtxt">
> >   <str/>
> >   <str>null</str>
> >   <str>ISO-8859-1</str>
> >   <str>text/plain; charset=ISO-8859-1</str>
> >   <str>Just a little test</str>
> > </arr>
> >
> > What I had hoped for was just
> >
> > <arr name="content_mvtxt">
> >   <str>Just a little test</str>
> > </arr>
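The xpath parameter in the original request narrows only the extracted body; the extra values in content_mvtxt (encoding, content type, and so on) look like document metadata. One plausible explanation, not confirmed in the thread, is that those metadata fields had no matching schema field and fell through to defaultField. The JDK-only sketch below evaluates the same XPath expression against a hand-written XHTML sample (an assumption, shaped loosely like Tika's output rather than captured from it) to show that the expression by itself selects only the body text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BodyXPathDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample: metadata lives in <head>, the document text in <body>.
    static final String SAMPLE =
        "<html xmlns=\"" + XHTML_NS + "\">"
        + "<head><meta name=\"Content-Encoding\" content=\"ISO-8859-1\"/><title/></head>"
        + "<body><p>Just a little test</p></body></html>";

    // Evaluates the xpath from the original request and returns the text nodes it selects.
    static List<String> bodyText(String xhtml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // needed so the xhtml: prefix can match elements
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "xhtml".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? "xhtml" : null;
                }
                @Override
                public Iterator<String> getPrefixes(String uri) {
                    List<String> prefixes = XHTML_NS.equals(uri)
                        ? Collections.singletonList("xhtml")
                        : Collections.<String>emptyList();
                    return prefixes.iterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(
                "/xhtml:html/xhtml:body/descendant::node()", doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                if (n.getNodeType() == Node.TEXT_NODE) { // skip element nodes such as <p>
                    texts.add(n.getNodeValue());
                }
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate XPath on input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bodyText(SAMPLE)); // prints [Just a little test]
    }
}
```

Nothing from the `<head>` (the encoding attribute or the empty title) is selected, which is consistent with the metadata reaching the field by some route other than the xpath filter.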
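The two parameters in the working snippet (fmap.content and extractFormat) are ordinary request parameters on /update/extract, so clients outside SolrJ can send them too. A minimal sketch, using only java.net.URLEncoder, that assembles that query string; the base URL and core name are hypothetical placeholders, and the document bytes would still have to be POSTed as the request body, which this sketch does not do.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class ExtractUrlSketch {
    // Joins Solr Cell request parameters onto an /update/extract URL.
    // baseUrl is a hypothetical placeholder for a Solr core's base URL.
    static String extractUrl(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return baseUrl + "/update/extract?" + query;
    }

    // Form-style URL encoding (spaces become '+', '&' becomes %26).
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("fmap.content", "my_custom_field"); // store extracted body text in this field
        params.put("extractFormat", "text");           // plain text instead of the default XHTML
        System.out.println(extractUrl("http://localhost:8983/solr/collection1", params));
    }
}
```

Run as-is, it prints `http://localhost:8983/solr/collection1/update/extract?fmap.content=my_custom_field&extractFormat=text`.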