No, you can't just send arbitrary XML to Solr for update. You need to send a Solr update request in XML. You can write software that transforms your arbitrary XML into a Solr update request; for simple cases it could even be just XSLT. Solr also comes with a variety of other mediator pieces for doing updates: you can send updates in comma-separated-value format, or, in some not-too-complicated cases, you can use the DataImportHandler to embed the translation from your arbitrary XML to Solr documents in your Solr instance itself.

But you can't just send arbitrary XML to the Solr update handler, no.
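
As a rough illustration of the "write software that transforms it" option, here is a minimal Python sketch that turns the <string_parameter> elements of a supplier file into a single Solr <add><doc> request. The function name and the trimmed sample are made up for illustration, and lowercasing the parameter names is only an assumption about how they might line up with schema field names; it builds the XML but does not post it.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for a supplier file (not a real file).
SAMPLE = (
    '<insert_list><parameter_list>'
    '<string_parameter name="RECEIVER" value="000907010391" />'
    '<string_parameter name="Model" value="R16-500" />'
    '</parameter_list></insert_list>'
)


def supplier_xml_to_solr_add(supplier_xml):
    """Build a Solr <add><doc> update request from <string_parameter> elements."""
    root = ET.fromstring(supplier_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for param in root.iter("string_parameter"):
        name = param.get("name")
        if name is None:
            continue  # skip parameters without a name attribute
        # ASSUMPTION: the schema field is the lowercased parameter name.
        field = ET.SubElement(doc, "field", {"name": name.lower()})
        field.text = param.get("value")
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(supplier_xml_to_solr_add(SAMPLE))
```

The resulting string is what you would send to the update handler (for example by saving it to a file and using post.jar), followed by a commit.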

No matter what method you use to send documents to Solr, you're going to have to think about what you want your Solr schema to look like (what fields, of what types) and then map your data to it. In Solr, unlike in an RDBMS, what you want your schema to look like has a lot to do with what kinds of queries you want it to support; it can't be designed from the nature of the data alone.
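
For the file-name case in the original question, the mapping step might look something like the Python sketch below. It splits a tilde-separated file name into the eleven schema fields. FIELD_ORDER is a guess: only the positions of "result", "stbmodel" and "filedate" are confirmed by the examples in the question, and the function name is hypothetical.

```python
import os

# ASSUMPTION: left-to-right order of the eleven file-name fields. Only the
# positions of result, stbmodel and filedate are confirmed by the question.
FIELD_ORDER = [
    "location", "scriptid", "slotid", "workcenter", "workcenterid",
    "computerid", "result", "stbmodel", "receiver", "filedate", "filetime",
]


def filename_to_fields(path):
    """Map a tilde-separated file name onto schema field names."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    values = stem.split("~")
    if len(values) != len(FIELD_ORDER):
        raise ValueError(
            "expected %d fields, got %d in %r" % (len(FIELD_ORDER), len(values), path)
        )
    return dict(zip(FIELD_ORDER, values))


if __name__ == "__main__":
    name = ("CTCA~PRE~PREP~1010123~ONTDTVP5A~41~P~R16-500"
            "~000912239878~20110125~212321.XML")
    for field, value in filename_to_fields(name).items():
        print(field, "=", value)
```

Each resulting dictionary would become one <doc> in an update request. Note that a field has to be declared indexed="true" in schema.xml before you can search on it.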

Jonathan

On 2/15/2011 12:45 PM, alan bonnemaison wrote:
Erick,

I think you put your finger on the problem. Our XML files (which we get from
our suppliers) do *not* look like that.

This is what a typical file looks like:

<insert_list>...................<result><result
outcome="PASS"></result><parameter_list><string_parameter name="SN"
value="NOVAL" /><string_parameter name="RECEIVER" value="000907010391"
/><string_parameter name="Model" value="R16-500" />...<string_parameter
name="WorkCenterID" value="PREP" /><string_parameter name="SiteID"
value="CTCA" /><string_parameter name="RouteID" value="ADV"
/><string_parameter name="LineID" value="Line5" /></parameter_list><config
enable_sfcs_comm="true" enable_param_db_comm="false"
force_param_db_update="false" driver_platform="LABVIEW" mode="PROD"
driver_revision="2.0"></config></insert_list>

Obviously, nothing like <add><doc>....</doc></add>

By the way, querying q=*:* retrieved "HTTP error 500 Null pointer
exception", which leads me to believe that my index is 100% empty.

What I am trying to do cannot be done, correct? I just don't want to waste
anyone's time.................

Thanks,

Alan.


On Tue, Feb 15, 2011 at 6:01 AM, Erick Erickson <erickerick...@gmail.com> wrote:

Can we see a small sample of an xml file you're posting? Because it should
look something like
<add>
   <doc>
     <field name="stbmodel">R16-500</field>
        more fields here.
   </doc>
</add>

Take a look at the Solr admin page after you've indexed data to see what's
actually in your index; I suspect what's in there isn't what you expect.

Try querying q=*:* just for yucks to see what the documents returned look
like.

I suspect your index doesn't contain anything like what you think, but
that's only a guess...

Best
Erick

On Mon, Feb 14, 2011 at 7:15 PM, alan bonnemaison <kg6...@gmail.com> wrote:
Hello!

We receive hardware manufacturing data from our suppliers in XML files.
On a typical day, we get 25,000 files. That is why I chose to implement Solr.

The file names are made of eleven fields separated by tildes, like so:


CTCA~PRE~PREP~1010123~ONTDTVP5A~41~P~R16-500~000912239878~20110125~212321.XML
Our R&D guys want to be able to search each field of the XML file names
(OR operation), but they don't care to search the file contents. Ideally,
they would like to query for all files where "stbmodel" equals "R16-500"
or "result" is "P" or "filedate" is "20110125"... you get the idea.

I defined each data field in schema.xml like so (from left to right; sorry
for the long list):

   <field name="location"       type="textgen"          indexed="false"
stored="true"   multiValued="false"/>
   <field name="scriptid"       type="textgen"          indexed="false"
stored="true"   multiValued="false"/>
   <field name="slotid"         type="textgen"          indexed="false"
stored="true"   multiValued="false"/>
   <field name="workcenter"     type="textgen"          indexed="false"
stored="false"  multiValued="false"/>
   <field name="workcenterid"   type="textgen"          indexed="false"
stored="fase"   multiValued="false"/>
   <field name="result"         type="string"           indexed="true"
stored="true"    multiValued="false"/>
   <field name="computerid"     type="textgen"          indexed="false"
stored="true"   multiValued="false"/>
   <field name="stbmodel"       type="textgen"          indexed="true"
stored="true"    multiValued="false"/>
   <field name="receiver"       type="string"           indexed="true"
stored="true"    multiValued="false"/>
   <field name="filedate"       type="textgen"          indexed="false"
stored="true"   multiValued="false"/>
   <field name="filetime"       type="textgen"          indexed="false"
stored="true"   multiValued="false"/>

Also, I defined the field "receiver" as the unique key. But no results are
returned by my queries. I made sure to update my index like so:
"java -jar apache-solr-1.4.1/example/exampledocs/post.jar *XML".

I am obviously missing something. Is there a way to configure schema.xml
to search for file names? I welcome your input.

Al.


