On 7/29/2010 1:13 PM, Chris Hostetter wrote:
: My initial approach was to grab the values (which are in another table) with a
: DIH subentity and store them in a multivalued field, but that reduced index
: speed to a crawl. That's because instead of one query for the entire import,
: it was making an individual subquery for every document returned by the main
: query. Switching to a left join, I couldn't see any performance difference,
: and it's still one query.
It's not clear to me how you are getting the values in the first place
that getting them as a multivalued field slowed down that much, but
if the data is already semicolon-delimited, then the RegexTransformer can
make a multivalued field out of it using splitBy.
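For illustration, here is a minimal Python sketch of what splitBy does to one column value: the delimited string becomes a list, and each element becomes one value of the multivalued field. The helper name split_by and the choice to map a NULL (None) column to an empty list are assumptions of this sketch, not DIH code. In data-config.xml the real thing is configured with transformer="RegexTransformer" on the entity and splitBy=";" on the field element.

```python
import re


def split_by(value, pattern=";"):
    """Hypothetical stand-in for RegexTransformer's splitBy on one value.

    Splits a delimited string into a list, which is what a multivalued
    Solr field would receive. Treating None (a NULL column, e.g. from a
    LEFT JOIN with no match) as "no values" is this sketch's assumption.
    """
    if value is None:
        return []
    return re.split(pattern, value)


row = {"id": 1, "search_group": "news;sports;weather"}
row["search_group"] = split_by(row["search_group"])
print(row["search_group"])  # ['news', 'sports', 'weather']
```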
Here's the original query before adding this new field:
SELECT *,FROM_UNIXTIME(post_date) AS pd FROM ncdat WHERE blahblahblah
This is the new query:
SELECT d.*, FROM_UNIXTIME(post_date) AS pd,
  GROUP_CONCAT(w.webtable SEPARATOR ';') AS search_group
FROM ncdat d LEFT JOIN ncdat_wt w ON d.feature=w.featurecode
WHERE blahblahblah
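A minimal sketch of the new query above, using Python's built-in sqlite3 module instead of MySQL, shows what it hands to DIH: one row per document, with the matching webtable values joined by semicolons, and NULL where the LEFT JOIN finds no match. SQLite spells GROUP_CONCAT(x SEPARATOR ';') as group_concat(x, ';') and has no FROM_UNIXTIME, so post_date is left out. The did and title columns and all the data are hypothetical. The WHERE clause in the post is elided, so it isn't shown whether the real query already has a GROUP BY; this sketch adds GROUP BY d.did explicitly, because without one an aggregate query like this collapses to a single row (or, under MySQL's ONLY_FULL_GROUP_BY mode, is rejected).

```python
import sqlite3

# In-memory stand-in for the MySQL tables. The columns did and title are
# hypothetical; feature, featurecode and webtable come from the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ncdat (did INTEGER PRIMARY KEY, title TEXT, feature TEXT);
    CREATE TABLE ncdat_wt (featurecode TEXT, webtable TEXT);
    INSERT INTO ncdat VALUES (1, 'first', 'A'), (2, 'second', 'B'), (3, 'third', 'C');
    INSERT INTO ncdat_wt VALUES ('A', 'news'), ('A', 'sports'), ('B', 'weather');
""")

# group_concat(x, ';') is SQLite's form of GROUP_CONCAT(x SEPARATOR ';').
# GROUP BY d.did keeps one output row per document.
rows = conn.execute("""
    SELECT d.did, d.title, group_concat(w.webtable, ';') AS search_group
    FROM ncdat d LEFT JOIN ncdat_wt w ON d.feature = w.featurecode
    GROUP BY d.did
    ORDER BY d.did
""").fetchall()

for row in rows:
    print(row)
```

Document 3 has no match in ncdat_wt, so its search_group comes back as None. SQLite does not guarantee the order of values inside one group, so document 1 may show "news;sports" or "sports;news".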
The abandoned initial approach kept the same main query and used its
primary key on a second query to gather the search groups. With 7.5
million rows in the first query, you get 7.5 million individual queries
against the second table, which, when it's complete, will have only a few
thousand rows. Indexing went from about 5 hours (the database is the
bottleneck, not Solr) to about 12 hours. Is there a way to make
this approach faster?
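The slowdown described above is the one-query-per-row (N+1) pattern. A minimal Python sketch, using sqlite3 with hypothetical table contents and a hypothetical did key, counts the queries each approach issues: the per-row subquery issues 1 + N, while reading the small table once into a dictionary and joining in memory issues two. DIH's CachedSqlEntityProcessor is built on a similar cache-the-subentity idea, but this is not DIH code and not what was actually run here.

```python
import sqlite3
from collections import defaultdict


def make_db(n_docs, n_features):
    """Build hypothetical stand-ins for ncdat (big) and ncdat_wt (small)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ncdat (did INTEGER PRIMARY KEY, feature TEXT)")
    conn.execute("CREATE TABLE ncdat_wt (featurecode TEXT, webtable TEXT)")
    conn.executemany(
        "INSERT INTO ncdat VALUES (?, ?)",
        [(i, f"F{i % n_features}") for i in range(n_docs)],
    )
    conn.executemany(
        "INSERT INTO ncdat_wt VALUES (?, ?)",
        [(f"F{j}", f"group{j}") for j in range(n_features)],
    )
    return conn


def per_row_subquery(conn):
    """Subentity pattern: one main query plus one lookup query per row."""
    queries = 1
    docs = {}
    for did, feature in conn.execute("SELECT did, feature FROM ncdat").fetchall():
        queries += 1
        docs[did] = [r[0] for r in conn.execute(
            "SELECT webtable FROM ncdat_wt WHERE featurecode = ?", (feature,))]
    return docs, queries


def cached_lookup(conn):
    """Read the small table once into a dict, then join in memory."""
    lookup = defaultdict(list)
    for code, webtable in conn.execute("SELECT featurecode, webtable FROM ncdat_wt"):
        lookup[code].append(webtable)
    docs = {
        did: list(lookup.get(feature, []))
        for did, feature in conn.execute("SELECT did, feature FROM ncdat")
    }
    return docs, 2  # exactly two queries, regardless of row count


conn = make_db(n_docs=1000, n_features=20)
slow, slow_queries = per_row_subquery(conn)
fast, fast_queries = cached_lookup(conn)
assert slow == fast
print(slow_queries, fast_queries)  # 1001 2
```

With the numbers from the post, the per-row version would issue 7,500,001 queries, while the cached version still issues two, at the cost of holding the few-thousand-row table in memory.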
Thanks,
Shawn