DataImportHandler w/ multivalued fields

2011-12-01 Thread Briggs Thompson
Hello Solr Community!

I am implementing a data connection to Solr through the Data Import Handler
and non-multivalued fields are working correctly, but multivalued fields
are not getting indexed properly.

I am new to DataImportHandler, but from what I could find, a nested entity
is the way to populate a multivalued field. The weird thing is that data is
being indexed for only one row, meaning only the first raw_tag gets populated.


Anyone have any ideas?
Thanks,
Briggs

This is the relevant part of the schema:

   <field name="raw_tag" type="text_en_lessAggressive" indexed="true"
          stored="false" multivalued="true"/>
   <field name="raw_tag_string" type="string" indexed="false"
          stored="true" multivalued="true"/>
   <copyField source="raw_tag" dest="raw_tag_string"/>

And the relevant part of data-import.xml:

<document name="merchant">
  <entity name="site"
          query="select * from site">
    <field column="siteId" name="siteId" />
    <field column="domain" name="domain" />
    <field column="aliasFor" name="aliasFor" />
    <field column="title" name="title" />
    <field column="description" name="description" />
    <field column="requests" name="requests" />
    <field column="requiresModeration" name="requiresModeration" />
    <field column="blocked" name="blocked" />
    <field column="affiliateLink" name="affiliateLink" />
    <field column="affiliateTracker" name="affiliateTracker" />
    <field column="affiliateNetwork" name="affiliateNetwork" />
    <field column="cjMerchantId" name="cjMerchantId" />
    <field column="thumbNail" name="thumbNail" />
    <field column="updateRankings" name="updateRankings" />
    <field column="couponCount" name="couponCount" />
    <field column="category" name="category" />
    <field column="adult" name="adult" />
    <field column="rank" name="rank" />
    <field column="redirectsTo" name="redirectsTo" />
    <field column="wwwRequired" name="wwwRequired" />
    <field column="avgSavings" name="avgSavings" />
    <field column="products" name="products" />
    <field column="nameChecked" name="nameChecked" />
    <field column="tempFlag" name="tempFlag" />
    <field column="created" name="created" />
    <field column="enableSplitTesting" name="enableSplitTesting" />
    <field column="affiliateLinklock" name="affiliateLinklock" />
    <field column="hasMobileSite" name="hasMobileSite" />
    <field column="blockSite" name="blockSite" />
    <entity name="merchant_tags" pk="siteId"
            query="select raw_tag, freetags.id,
                   freetagged_objects.object_id as siteId
                   from freetags
                   inner join freetagged_objects
                   on freetags.id=freetagged_objects.tag_id
                   where freetagged_objects.object_id='${site.siteId}'">
      <field column="raw_tag" name="raw_tag"/>
    </entity>
  </entity>
</document>


Re: DataImportHandler w/ multivalued fields

2011-12-01 Thread Briggs Thompson
In addition, I tried a query like the one below and changed the column
definition to
<field column="raw_tag" name="raw_tag" splitBy="," />
and still no luck. It is indexing the full content now, but not as multiple
values. It seems like splitBy isn't working properly.

select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
from site
left outer join
  (freetags inner join freetagged_objects)
  on (freetags.id = freetagged_objects.tag_id
      and site.siteId = freetagged_objects.object_id)
group by site.siteId

Am I doing something wrong?
Thanks,
Briggs Thompson


Re: DataImportHandler w/ multivalued fields

2011-12-01 Thread Rahul Warawdekar
Hi Briggs,

By saying the multivalued fields are not getting indexed properly, do you
mean that you are not able to search on those fields?
Have you tried actually searching your Solr index for those multivalued
terms to check whether it returns any results?

One possibility is that the multivalued fields are getting indexed
correctly and are searchable.
However, since your schema.xml has a raw_tag field whose stored
attribute is set to false, you may not be able to see those values in the
results.




-- 
Thanks and Regards
Rahul A. Warawdekar


Re: DataImportHandler w/ multivalued fields

2011-12-01 Thread Briggs Thompson
Hey Rahul,

Thanks for the response. I actually just figured it out, thankfully :). To
answer your question, raw_tag is indexed (tokenized) and not stored, and
there is a copyField from raw_tag to raw_tag_string, which would be used
for facets. That *should have* been displayed in the results.

The silly mistake I made was not camel-casing multiValued in the schema,
which is clearly the source of the problem.
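
For reference, a sketch of how the schema lines from the first message
would read with the attribute camel-cased. It assumes everything else in
the definitions stays exactly as posted; only the spelling of multiValued
changes. With the lowercase spelling the attribute was apparently not
recognized, which would leave the field single-valued.

```xml
<!-- Same field definitions as in the original post; the only change is
     the attribute name, multivalued -> multiValued. -->
<field name="raw_tag" type="text_en_lessAggressive" indexed="true"
       stored="false" multiValued="true"/>
<field name="raw_tag_string" type="string" indexed="false"
       stored="true" multiValued="true"/>
<copyField source="raw_tag" dest="raw_tag_string"/>
```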

The second email I sent, which changed the query and used splitBy for the
multivalued field, also had an error: the entity declaration was missing
the line
transformer="RegexTransformer"
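
As a rough sketch of that second approach with the missing attribute
added: the query and field line are taken from the second email, and the
layout of the entity element is an assumption, since the full declaration
was not posted. splitBy is handled by RegexTransformer, so it has no
effect unless the transformer is named on the entity.

```xml
<!-- Sketch only: the other field mappings from the first message are
     omitted here for brevity and would sit alongside raw_tag. -->
<entity name="site" transformer="RegexTransformer"
        query="select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
               from site
               left outer join
                 (freetags inner join freetagged_objects)
                 on (freetags.id = freetagged_objects.tag_id
                     and site.siteId = freetagged_objects.object_id)
               group by site.siteId">
  <!-- splitBy is a regular expression; each piece becomes one value. -->
  <field column="raw_tag" name="raw_tag" splitBy="," />
</entity>
```

One thing to watch: with ', ' as the group_concat separator and "," as
the splitBy pattern, every value after the first keeps a leading space.
Using the same string for both (for example ',' in the query) should
avoid that.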

Anyhow, thanks for the quick response!

Briggs

