On 7/29/2010 1:13 PM, Chris Hostetter wrote:
: My initial approach was to grab the values (which are in another table) with a
: DIH subentity and store them in a multivalued field, but that reduced index
: speed to a crawl.  That's because instead of one query for the entire import,
: it was making an individual subquery for every document returned by the main
: query.  Switching to a left join, I couldn't see any performance difference,
: and it's still one query.

It's not clear to me how you are getting the values in the first place
such that getting them as a multivalued field slowed things down that much,
but if the data is already semicolon delimited, then the RegexTransformer can
make a multivalued field out of it using splitBy.

Here's the original query before adding this new field:

SELECT *,FROM_UNIXTIME(post_date) AS pd FROM ncdat WHERE blahblahblah

This is the new query:

SELECT d.*,FROM_UNIXTIME(post_date) AS pd,GROUP_CONCAT(w.webtable SEPARATOR ';') AS search_group FROM ncdat d LEFT JOIN ncdat_wt w ON d.feature=w.featurecode WHERE blahblahblah
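
For reference, the splitBy suggestion applied to this query might look roughly like the following in the DIH config. This is only a sketch: the entity name "ncdat" is made up for illustration, the WHERE clause is left elided exactly as above (including any GROUP BY that the GROUP_CONCAT would need), and it assumes a Solr field named search_group that is declared multiValued in schema.xml.

```xml
<!-- Sketch only: entity name is illustrative, WHERE clause is elided as in the message -->
<entity name="ncdat"
        transformer="RegexTransformer"
        query="SELECT d.*,FROM_UNIXTIME(post_date) AS pd,GROUP_CONCAT(w.webtable SEPARATOR ';') AS search_group FROM ncdat d LEFT JOIN ncdat_wt w ON d.feature=w.featurecode WHERE blahblahblah">
  <!-- splitBy turns the semicolon-delimited column into separate values.
       Assumes search_group is multiValued="true" in schema.xml. -->
  <field column="search_group" splitBy=";"/>
</entity>
```

The split happens in DIH after the row comes back, so the database still runs a single query for the whole import.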

The abandoned initial approach kept the same main query and used its primary key in a second query to gather the search groups. With 7.5 million rows returned by the first query, that means 7.5 million individual queries against the second table, which will only have a few thousand rows once it is complete. Indexing went from about 5 hours (the database is the bottleneck, not Solr) to about 12 hours. Is there a way to make this approach faster?
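
One option that may be worth testing for the subentity approach is DIH's CachedSqlEntityProcessor, which reads the small table once and answers the per-document lookups from memory instead of issuing a query per row. The sketch below is untested against this data: the entity names "ncdat" and "wt" are made up, the main WHERE clause is left elided as above, search_group is assumed to be a multiValued field in schema.xml, and the where="..." lookup syntax is the one documented for Solr 1.4-era DIH (later releases use different cache attributes).

```xml
<!-- Sketch only: entity names are illustrative, main WHERE clause is elided as in the message -->
<entity name="ncdat"
        query="SELECT *,FROM_UNIXTIME(post_date) AS pd FROM ncdat WHERE blahblahblah">
  <!-- The subentity query runs once and its rows are cached.
       where="featurecode=ncdat.feature" matches the cached column featurecode
       against the parent row's feature column. -->
  <entity name="wt"
          processor="CachedSqlEntityProcessor"
          query="SELECT featurecode, webtable FROM ncdat_wt"
          where="featurecode=ncdat.feature">
    <!-- Assumes search_group is multiValued="true" in schema.xml -->
    <field column="webtable" name="search_group"/>
  </entity>
</entity>
```

With only a few thousand rows in ncdat_wt, the cache should fit in memory easily, but whether this beats the LEFT JOIN would have to be measured.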

Thanks,
Shawn
