Re: Problems with DIH and missing fields.

2011-04-01 Thread Stefan Matheis
Marcelo,

could you paste the relevant parts of your DIH config?

Regards
Stefan




Re: Problems with DIH and missing fields.

2011-04-01 Thread Marcelo Iturbe
Hello,
I was able to reproduce this behaviour in Solr 3.1.0.

The procedure is:
 - rename the directory example-DIH/rss to example-DIH/gcontacts
 - modify solrconfig.xml to only load gcontacts
 - rename rss-data-config.xml to gcontacts-data-config.xml and modify it (see
content below)
 - modify schema.xml

This is from my schema.xml:
<field name="source" type="text" indexed="true" stored="true" />
<field name="source-link" type="string" indexed="false" stored="true" />

<field name="title" type="string" indexed="true" stored="true" />
<field name="link" type="string" indexed="true" stored="true" />
<field name="email" type="string" indexed="true" stored="true"
       multiValued="true" default="" />
<field name="phoneNumber" type="string" indexed="true" stored="true"
       multiValued="true" default="" />
<field name="organization" type="string" indexed="true" stored="true"
       multiValued="true" default="" />
<field name="postalAddress" type="string" indexed="true" stored="true"
       multiValued="true" default="" />

<field name="all_text" type="text" indexed="true" stored="true"
       multiValued="true" />
<copyField source="title" dest="all_text" />
<copyField source="email" dest="all_text" />
<copyField source="phoneNumber" dest="all_text" />
<copyField source="organization" dest="all_text" />
<copyField source="postalAddress" dest="all_text" />

This is my gcontacts-data-config.xml file:
<dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="gcontacts"
                pk="link"
                url="http://172.16.0.30/sayt2/contacts/testtim.xml"
                processor="XPathEntityProcessor"
                forEach="/feed/entry">

            <field column="source" xpath="/feed/entry/id" commonField="true" />
            <field column="source-link" xpath="/feed/entry/link[@rel='edit']/@href" commonField="true" />

            <field column="title" xpath="/feed/entry/title" commonField="true" />
            <field column="link" xpath="/feed/entry/link[@rel='edit']/@href" />
            <field column="email" xpath="/feed/entry/email/@address" commonField="true" />
            <field column="phoneNumber" xpath="/feed/entry/phoneNumber" commonField="true" />
            <field column="organization" xpath="/feed/entry/organization" commonField="true" />
            <field column="postalAddress" xpath="/feed/entry/postalAddress" commonField="true" />
        </entity>
    </document>
</dataConfig>

This is from my solrconfig.xml file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<solr sharedLib="lib" persistent="true">
    <cores adminPath="/admin/cores">
        <core default="false" instanceDir="gcontacts" name="gcontacts" />
    </cores>
</solr>

Thanks for your help.

Regards




Re: Problems with DIH and missing fields.

2011-04-01 Thread Marcelo Iturbe
Solved it!

commonField="true"
should be
commonField="false"

Mistakes that happen when copying from a sample project...

Thanks for your help.
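
For anyone finding this thread later: as far as the DIH documentation describes it, commonField="true" tells XPathEntityProcessor to copy a value, once it has been seen in one record, into the records that follow. That matches the symptom reported here, where an entry without an email or postal address picks up the one from an earlier entry. Below is a minimal sketch of the corrected entity, assuming the same feed layout, URL and field names as the config from the previous message in this thread. The attribute is simply left out, which should be equivalent to commonField="false" since false is the documented default.

```xml
<dataConfig>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="gcontacts"
                pk="link"
                url="http://172.16.0.30/sayt2/contacts/testtim.xml"
                processor="XPathEntityProcessor"
                forEach="/feed/entry">

            <!-- No commonField on per-entry fields: when an element is
                 missing from an entry, nothing is carried over from an
                 earlier entry. -->
            <field column="source" xpath="/feed/entry/id" />
            <field column="source-link" xpath="/feed/entry/link[@rel='edit']/@href" />

            <field column="title" xpath="/feed/entry/title" />
            <field column="link" xpath="/feed/entry/link[@rel='edit']/@href" />
            <field column="email" xpath="/feed/entry/email/@address" />
            <field column="phoneNumber" xpath="/feed/entry/phoneNumber" />
            <field column="organization" xpath="/feed/entry/organization" />
            <field column="postalAddress" xpath="/feed/entry/postalAddress" />
        </entity>
    </document>
</dataConfig>
```

Documents that were indexed with the leaked values keep them until they are imported again, so a fresh full-import is probably needed after changing the config.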







Problems with DIH and missing fields.

2011-03-31 Thread Marcelo Iturbe
Hello,
I have an XML which contains personal contacts. Not all contacts have the
same fields (email, phone, postal).

The problem is that when certain fields are NOT present, Solr injects
the previous contact's data.

For example, assume the following from the XML feed:
<entry>
    <title type='text'>Jane Doe</title>
    <gd:email rel='http://schemas.google.com/g/2005#work' address='jane@gmail.com' primary='true'/>
    <gd:postalAddress rel='http://schemas.google.com/g/2005#home'>Santiago
        Region Metropolitana
        Chile</gd:postalAddress>
</entry>
<entry>
    <title type='text'>Jeff Smith</title>
    <gd:email rel='http://schemas.google.com/g/2005#work' address='jeff.sm...@gmail.com' primary='true'/>
</entry>
<entry>
    <title type='text'>Ana Mercurio</title>
    <gd:phoneNumber rel='http://schemas.google.com/g/2005#mobile' primary='true'>+56912345678</gd:phoneNumber>
</entry>

The second contact will have the first contact's postal address.
The third contact will have Jane's postal address and Jeff's email address:

<lst>
    <arr name="title">
        <str>Ana Mercurio</str>
    </arr>
    <arr name="phoneNumber">
        <str>+5612345678</str>
    </arr>
    <arr name="email">
        <str>jeff.sm...@gmail.com</str>
    </arr>
    <arr name="postalAddress">
        <str>Santiago
            Region Metropolitana
            Chile</str>
    </arr>
</lst>

This is how I have the fields specified in the schema.xml file:
<field name="email" type="string" indexed="true" stored="true"
       multiValued="true" default="" />
<field name="phoneNumber" type="string" indexed="true" stored="true"
       multiValued="true" default="" />
<field name="postalAddress" type="string" indexed="true" stored="true"
       multiValued="true" default="" />

What did I miss?

Thanks for your help.