Re: AW: Facets on multiple values

Shawn Heisey Thu, 29 Jul 2010 12:24:50 -0700

 On 7/29/2010 1:13 PM, Chris Hostetter wrote:

: My initial approach was to grab the values (which are in another table) with a
: DIH subentity and store them in a multivalued field, but that reduced index
: speed to a crawl.  That's because instead of one query for the entire import,
: it was making an individual subquery for every document returned by the main
: query.  Switching to a left join, I couldn't see any performance difference,
: and it's still one query.


It's not clear to me how you are getting the values in the first place
such that getting them as a multivalued field slowed things down that much,
but if the data is already semicolon delimited, then the RegexTransformer can
make a multivalued field out of it using splitBy.


Here's the original query before adding this new field:

SELECT *,FROM_UNIXTIME(post_date) AS pd FROM ncdat WHERE blahblahblah

This is the new query:

SELECT d.*,FROM_UNIXTIME(post_date) AS pd,GROUP_CONCAT(w.webtable SEPARATOR ';') AS search_group FROM ncdat d LEFT JOIN ncdat_wt w ON d.feature=w.featurecode WHERE blahblahblah

The abandoned initial approach kept the same main query and used its primary key on a second query to gather the search groups. With 7.5 million rows in the first query, you get 7.5 million individual queries against the second table, which when it's complete will only have a few thousand rows. It went from taking about 5 hours to index (database is the bottleneck, not Solr) to about 12 hours. Is there a way to make this approach faster?
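
For reference, here is a rough sketch of how the joined query could be wired to the RegexTransformer's splitBy in a DIH data-config.xml. The dataSource attributes and the entity name are placeholders made up for illustration, not taken from the real config, and the elided WHERE clause is left exactly as written above. It also assumes search_group is declared multiValued="true" in schema.xml.

```xml
<dataConfig>
  <!-- Placeholder connection details: assumptions, not the real setup -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/exampledb"
              user="example_user"
              password="example_password"/>
  <document>
    <!-- A single query for the whole import; no sub-entity, so no per-row subquery -->
    <entity name="ncdat"
            transformer="RegexTransformer"
            query="SELECT d.*,FROM_UNIXTIME(post_date) AS pd,GROUP_CONCAT(w.webtable SEPARATOR ';') AS search_group FROM ncdat d LEFT JOIN ncdat_wt w ON d.feature=w.featurecode WHERE blahblahblah">
      <!-- splitBy is a regex; here it splits the semicolon-delimited string
           into separate values of the multivalued search_group field -->
      <field column="search_group" splitBy=";"/>
    </entity>
  </document>
</dataConfig>
```

The idea is that the database does the grouping once and Solr only has to split a string per document, rather than issuing one lookup query per row.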


Thanks,
Shawn
