Re: Problems with DIH and missing fields.
Marcelo,

could you paste the relevant parts of your DIH config?

Regards
Stefan

On Thu, Mar 31, 2011 at 9:55 PM, Marcelo Iturbe marc...@santiago.cl wrote:
> The problem is that when certain fields are NOT present, SOLR is
> injecting the previous contacts data. [...]
Re: Problems with DIH and missing fields.
Hello,

I was able to reproduce this behaviour in Solr 3.1.0.

The procedure is:
- rename the directory example-DIH/rss to example-DIH/gcontacts
- modify solrconfig.xml to only load gcontacts
- rename rss-data-config.xml to gcontacts-data-config.xml and modify it (see content below)
- modify schema.xml

This is from my schema.xml:

    <field name="source" type="text" indexed="true" stored="true" />
    <field name="source-link" type="string" indexed="false" stored="true" />
    <field name="title" type="string" indexed="true" stored="true" />
    <field name="link" type="string" indexed="true" stored="true" />
    <field name="email" type="string" indexed="true" stored="true" multiValued="true" default="" />
    <field name="phoneNumber" type="string" indexed="true" stored="true" multiValued="true" default="" />
    <field name="organization" type="string" indexed="true" stored="true" multiValued="true" default="" />
    <field name="postalAddress" type="string" indexed="true" stored="true" multiValued="true" default="" />
    <field name="all_text" type="text" indexed="true" stored="true" multiValued="true" />

    <copyField source="title" dest="all_text" />
    <copyField source="email" dest="all_text" />
    <copyField source="phoneNumber" dest="all_text" />
    <copyField source="organization" dest="all_text" />
    <copyField source="postalAddress" dest="all_text" />

This is my gcontacts-data-config.xml file:

    <dataConfig>
      <dataSource type="URLDataSource" />
      <document>
        <entity name="gcontacts"
                pk="link"
                url="http://172.16.0.30/sayt2/contacts/testtim.xml"
                processor="XPathEntityProcessor"
                forEach="/feed/entry">
          <field column="source" xpath="/feed/entry/id" commonField="true" />
          <field column="source-link" xpath="/feed/entry/link[@rel='edit']/@href" commonField="true" />
          <field column="title" xpath="/feed/entry/title" commonField="true" />
          <field column="link" xpath="/feed/entry/link[@rel='edit']/@href" />
          <field column="email" xpath="/feed/entry/email/@address" commonField="true" />
          <field column="phoneNumber" xpath="/feed/entry/phoneNumber" commonField="true" />
          <field column="organization" xpath="/feed/entry/organization" commonField="true" />
          <field column="postalAddress" xpath="/feed/entry/postalAddress" commonField="true" />
        </entity>
      </document>
    </dataConfig>

This is from my solrconfig.xml file:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <solr sharedLib="lib" persistent="true">
      <cores adminPath="/admin/cores">
        <core default="false" instanceDir="gcontacts" name="gcontacts" />
      </cores>
    </solr>

Thanks for your help.

Regards

On Fri, Apr 1, 2011 at 4:27 AM, Stefan Matheis matheis.ste...@googlemail.com wrote:
> Marcelo, could you paste the relevant parts of your DIH config?
Re: Problems with DIH and missing fields.
Solved it!

commonField="true" should be commonField="false".

These are the mistakes that happen when copying from a sample project...

Thanks for your help.

On Fri, Apr 1, 2011 at 10:29 AM, Marcelo Iturbe marc...@santiago.cl wrote:
> Hello, I was able to repeat this behaviour in Solr 3.1.0 [...]
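
For anyone landing on this thread later, here is a rough sketch of what the corrected entity could look like. As far as I understand the DIH documentation, commonField="true" tells XPathEntityProcessor to copy a value, once it has been seen, into the records that follow, which matches the symptom of a contact inheriting the previous contact's email or postal address. The message above does not say which fields were changed, so switching every per-entry field to "false" is an assumption on my part; omitting the attribute should have the same effect, since "false" is the default as far as I know. The url and xpath values are simply the ones posted earlier in the thread.

```xml
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <entity name="gcontacts"
            pk="link"
            url="http://172.16.0.30/sayt2/contacts/testtim.xml"
            processor="XPathEntityProcessor"
            forEach="/feed/entry">
      <!-- Assumption: every xpath below is relative to a single /feed/entry,
           so none of these values should carry over to the next record. -->
      <field column="source" xpath="/feed/entry/id" commonField="false" />
      <field column="source-link" xpath="/feed/entry/link[@rel='edit']/@href" commonField="false" />
      <field column="title" xpath="/feed/entry/title" commonField="false" />
      <field column="link" xpath="/feed/entry/link[@rel='edit']/@href" />
      <field column="email" xpath="/feed/entry/email/@address" commonField="false" />
      <field column="phoneNumber" xpath="/feed/entry/phoneNumber" commonField="false" />
      <field column="organization" xpath="/feed/entry/organization" commonField="false" />
      <field column="postalAddress" xpath="/feed/entry/postalAddress" commonField="false" />
    </entity>
  </document>
</dataConfig>
```

In the rss example this config was copied from, commonField="true" makes sense for values that appear once per feed, outside the forEach element, and should be shared by every item. Here all the fields live inside each entry, so there is nothing to share.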
Problems with DIH and missing fields.
Hello,

I have an XML feed which contains personal contacts. Not all contacts have the same fields (email, phone, postal).

The problem is that when certain fields are NOT present, Solr is injecting the previous contact's data.

For example, assume the following from the XML feed:

    <entry>
      <title type='text'>Jane Doe</title>
      <gd:email rel='http://schemas.google.com/g/2005#work' address='jane@gmail.com' primary='true'/>
      <gd:postalAddress rel='http://schemas.google.com/g/2005#home'>Santiago Region Metropolitana Chile</gd:postalAddress>
    </entry>
    <entry>
      <title type='text'>Jeff Smith</title>
      <gd:email rel='http://schemas.google.com/g/2005#work' address='jeff.sm...@gmail.com' primary='true'/>
    </entry>
    <entry>
      <title type='text'>Ana Mercurio</title>
      <gd:phoneNumber rel='http://schemas.google.com/g/2005#mobile' primary='true'>+56912345678</gd:phoneNumber>
    </entry>

The second contact will have the first contact's postal address.
The third contact will have Jane's postal address and Jeff's email address:

    <lst>
      <arr name="title"><str>Ana Mercurio</str></arr>
      <arr name="phoneNumber"><str>+5612345678</str></arr>
      <arr name="email"><str>jeff.sm...@gmail.com</str></arr>
      <arr name="postalAddress"><str>Santiago Region Metropolitana Chile</str></arr>
    </lst>

This is how I have the fields specified in the schema.xml file:

    <field name="email" type="string" indexed="true" stored="true" multiValued="true" default="" />
    <field name="phoneNumber" type="string" indexed="true" stored="true" multiValued="true" default="" />
    <field name="postalAddress" type="string" indexed="true" stored="true" multiValued="true" default="" />

What did I miss?

Thanks for your help.