Just to get the paranoid option out of the way, is 'id' actually the
column that has unique ids in your database? If you do "select
distinct id from imdb.director" - how many items do you get?
Regards,
Alex.
Hi, thanks for the continued support. I'm really worried because my project
deadline is near. The count was 1636549 in MySQL vs. 287041 in Solr. I put SELECT
DISTINCT at the beginning of the query because IMDB doesn't have a table
for cast and crew; it puts movies, people, and their roles into one huge
table.
Hello,
I'm having trouble with Solr's Data Import Handler (DIH). My Solr version is 5.3.1
and MySQL is 5.5. I tried to index IMDB data but found that Solr indexed it only
partially. I ran "SELECT DISTINCT COUNT(*) FROM imdb.director" and the query
result was 1636549. However, DIH only fetched and indexed 287041.
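A side note on the query above: in SQL, DISTINCT in "SELECT DISTINCT COUNT(*)" applies to the single result row, so it returns the plain row count rather than the number of distinct ids. Here is a minimal sketch of the difference, using an in-memory SQLite table as a stand-in for imdb.director. The column names and sample rows are made up for illustration and are not the real IMDB schema.

```python
import sqlite3

# Hypothetical stand-in for imdb.director: one row per (person, movie) credit,
# so the same person id may appear on several rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE director (id INTEGER, movie_id INTEGER)")
conn.executemany(
    "INSERT INTO director (id, movie_id) VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10), (3, 12), (3, 13), (3, 14)],
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Plain row count.
total_rows = scalar("SELECT COUNT(*) FROM director")
# DISTINCT is applied to the one-row result, so this is still the row count.
distinct_count_star = scalar("SELECT DISTINCT COUNT(*) FROM director")
# This is the number of different id values.
distinct_ids = scalar("SELECT COUNT(DISTINCT id) FROM director")

print(total_rows, distinct_count_star, distinct_ids)
```

If the first and last numbers differ on the real table, the id column is not unique.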
That's not quite the question I asked. Do a distinct on 'id' only, in
the database itself. If your ids are NOT unique, you need to create a
composite or a virtual id for Solr, because whatever your
schema (schema.xml or managed-schema) declares as the uniqueKey will be
used to deduplicate the documents. If you have 10 documents
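To illustrate the deduplication point, here is a rough sketch that models an index as a dictionary keyed on the uniqueKey field. This is only a simulation of the overwrite behaviour, not Solr code. The function index_documents, the field "uid", and the sample documents are all hypothetical.

```python
def index_documents(docs, unique_key="id"):
    """Simulate indexing: a later document with the same key replaces the earlier one."""
    index = {}
    for doc in docs:
        index[doc[unique_key]] = doc
    return index


# Three source rows, but two of them share the same id.
docs = [
    {"id": "1", "movie": "A"},
    {"id": "1", "movie": "B"},
    {"id": "2", "movie": "A"},
]

index = index_documents(docs)
print(len(docs), len(index))

# A composite key built from id and movie keeps every row distinct.
docs_with_uid = [{**d, "uid": f"{d['id']}-{d['movie']}"} for d in docs]
with_composite = index_documents(docs_with_uid, unique_key="uid")
print(len(with_composite))
```

With non-unique ids the index ends up smaller than the source table, which is the kind of mismatch worth ruling out first.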
Yes, the id is unique. If I only select distinct id, count(id), I get the same
results. However, I found this is more likely a MySQL issue. I created a new
table called director1 and ran the query "insert into director1 select * from
director". Only 287041 rows were inserted, which was the same as