DataImportHandler w/ multivalued fields
Hello Solr Community!

I am implementing a data connection to Solr through the Data Import Handler. Non-multivalued fields are working correctly, but multivalued fields are not getting indexed properly.

I am new to DataImportHandler, but from what I could find, an entity is the way to go for a multivalued field. The weird thing is that data is being indexed for only one row, meaning only the first raw_tag gets populated.

Does anyone have any ideas?

Thanks,
Briggs

This is the relevant part of the schema:

    <field name="raw_tag" type="text_en_lessAggressive" indexed="true" stored="false" multivalued="true"/>
    <field name="raw_tag_string" type="string" indexed="false" stored="true" multivalued="true"/>
    <copyField source="raw_tag" dest="raw_tag_string"/>

And the relevant part of data-import.xml:

    <document name="merchant">
      <entity name="site" query="select * from site">
        <field column="siteId" name="siteId" />
        <field column="domain" name="domain" />
        <field column="aliasFor" name="aliasFor" />
        <field column="title" name="title" />
        <field column="description" name="description" />
        <field column="requests" name="requests" />
        <field column="requiresModeration" name="requiresModeration" />
        <field column="blocked" name="blocked" />
        <field column="affiliateLink" name="affiliateLink" />
        <field column="affiliateTracker" name="affiliateTracker" />
        <field column="affiliateNetwork" name="affiliateNetwork" />
        <field column="cjMerchantId" name="cjMerchantId" />
        <field column="thumbNail" name="thumbNail" />
        <field column="updateRankings" name="updateRankings" />
        <field column="couponCount" name="couponCount" />
        <field column="category" name="category" />
        <field column="adult" name="adult" />
        <field column="rank" name="rank" />
        <field column="redirectsTo" name="redirectsTo" />
        <field column="wwwRequired" name="wwwRequired" />
        <field column="avgSavings" name="avgSavings" />
        <field column="products" name="products" />
        <field column="nameChecked" name="nameChecked" />
        <field column="tempFlag" name="tempFlag" />
        <field column="created" name="created" />
        <field column="enableSplitTesting" name="enableSplitTesting" />
        <field column="affiliateLinklock" name="affiliateLinklock" />
        <field column="hasMobileSite" name="hasMobileSite" />
        <field column="blockSite" name="blockSite" />
        <entity name="merchant_tags" pk="siteId"
                query="select raw_tag, freetags.id, freetagged_objects.object_id as siteId from freetags inner join freetagged_objects on freetags.id=freetagged_objects.tag_id where freetagged_objects.object_id='${site.siteId}'">
          <field column="raw_tag" name="raw_tag"/>
        </entity>
      </entity>
    </document>
Re: DataImportHandler w/ multivalued fields
In addition, I tried a query like the one below and changed the column definition to

    <field column="raw_tag" name="raw_tag" splitBy="," />

and still no luck. It is indexing the full content now, but not as multiple values. It seems like splitBy isn't working properly.

    select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
    from site
    left outer join (freetags inner join freetagged_objects)
      on (freetags.id = freetagged_objects.tag_id and site.siteId = freetagged_objects.object_id)
    group by site.siteId

Am I doing something wrong?

Thanks,
Briggs Thompson

On Thu, Dec 1, 2011 at 11:46 AM, Briggs Thompson w.briggs.thomp...@gmail.com wrote:
> Hello Solr Community! I am implementing a data connection to Solr through the Data Import Handler and non-multivalued fields are working correctly, but multivalued fields are not getting indexed properly. [...]
Re: DataImportHandler w/ multivalued fields
Hi Briggs,

By saying that multivalued fields are not getting indexed properly, do you mean that you are not able to search on those fields? Have you tried actually searching your Solr index for those multivalued terms to make sure that it returns search results?

One possibility is that the multivalued fields are getting indexed correctly and are searchable. However, since the raw_tag field in your schema.xml has its stored attribute set to false, you may not be able to see its values in the results.

On Thu, Dec 1, 2011 at 1:43 PM, Briggs Thompson w.briggs.thomp...@gmail.com wrote:
> In addition, I tried a query like below and changed the column definition to field column=raw_tag name=raw_tag splitBy=, / and still no luck. [...]

--
Thanks and Regards
Rahul A. Warawdekar
Re: DataImportHandler w/ multivalued fields
Hey Rahul,

Thanks for the response. I actually just figured it out, thankfully :).

To answer your question: raw_tag is indexed and not stored (tokenized), and there is a copyField from raw_tag to raw_tag_string, which would be used for facets. That *should have* been displayed in the results.

The silly mistake I made was not camel-casing multiValued in the schema, which is clearly the source of the problem. The second email I sent, which changed the query and used the split for the multivalued field, also had an error in it: the entity declaration was missing transformer="RegexTransformer".

Anyhow, thanks for the quick response!

Briggs

On Thu, Dec 1, 2011 at 12:57 PM, Rahul Warawdekar rahul.warawde...@gmail.com wrote:
> Hi Briggs, By saying multivalued fields are not getting indexed properly, do you mean to say that you are not able to search on those fields? [...]
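
For readers landing on this thread, the two fixes Briggs describes can be sketched roughly as follows. This is a reconstruction from the snippets posted in the thread, not a copy of his final files. The first fragment is the schema with the attribute spelled multiValued; the indexed and stored values are carried over unchanged from the original post.

```xml
<!-- schema.xml: the attribute name is camel-cased, multiValued (capital V) -->
<field name="raw_tag" type="text_en_lessAggressive" indexed="true" stored="false" multiValued="true"/>
<field name="raw_tag_string" type="string" indexed="false" stored="true" multiValued="true"/>
<copyField source="raw_tag" dest="raw_tag_string"/>
```

The second fragment is the group_concat variant from the second email with the missing transformer attribute added. Placing the query on the entity named site is an assumption, since the thread does not show the full entity declaration for that attempt, and the other column mappings are elided.

```xml
<!-- data-import.xml: splitBy only takes effect when the entity declares RegexTransformer -->
<document name="merchant">
  <!-- assumption: the group_concat query replaces the original "select * from site" -->
  <entity name="site" transformer="RegexTransformer"
          query="select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
                 from site
                 left outer join (freetags inner join freetagged_objects)
                   on (freetags.id = freetagged_objects.tag_id
                       and site.siteId = freetagged_objects.object_id)
                 group by site.siteId">
    <field column="siteId" name="siteId" />
    <!-- ... remaining site columns mapped as in the original post ... -->
    <!-- splitBy is a regular expression; each piece becomes one value of raw_tag -->
    <field column="raw_tag" name="raw_tag" splitBy="," />
  </entity>
</document>
```

One thing to watch with this variant: the query joins tags with ', ' while the field splits on ',' alone, so every value after the first would likely keep a leading space in raw_tag_string. Using ',' as the SQL separator, or a pattern such as splitBy=",\s*", is one way to avoid that.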