Re: Data import handler not indexing all data

2015-11-07 Thread Alexandre Rafalovitch
Just to get the paranoid option out of the way: is 'id' actually the column that holds unique ids in your database? If you run "select distinct id from imdb.director", how many rows do you get? Regards, Alex.
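The check Alex asks for can be sketched with Python's built-in `sqlite3` module, which stands in for MySQL here. The `director` table and its rows are hypothetical toy data, not the real IMDb schema. The sketch compares the total row count with the number of distinct ids, then lists every id that appears on more than one row. Each such group would collapse into a single Solr document if `id` is the uniqueKey.

```python
import sqlite3

# Hypothetical toy stand-in for imdb.director (not the real schema or data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE director (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO director (id, name) VALUES (?, ?)",
    [(1, "A"), (2, "B"), (2, "B"), (3, "C"), (3, "D"), (3, "E")],
)

total_rows = conn.execute("SELECT COUNT(*) FROM director").fetchone()[0]
distinct_ids = conn.execute("SELECT COUNT(DISTINCT id) FROM director").fetchone()[0]

# Ids that occur on more than one row, with how many rows each one has.
duplicates = conn.execute(
    "SELECT id, COUNT(*) AS n FROM director "
    "GROUP BY id HAVING COUNT(*) > 1 ORDER BY id"
).fetchall()

print(total_rows, distinct_ids, duplicates)  # 6 3 [(2, 2), (3, 3)]
```

If `distinct_ids` comes out well below `total_rows` on the real table, the gap between the MySQL and Solr counts is expected rather than a DIH bug.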

Re: Data import handler not indexing all data

2015-11-07 Thread Yangrui Guo
Hi, thanks for the continued support. I'm really worried because my project deadline is near. The count was 1636549 in MySQL vs. 287041 in Solr. I put SELECT DISTINCT at the beginning of the query because IMDb doesn't have a separate table for cast & crew. It puts movies, people, and their roles into one huge table
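`SELECT DISTINCT` removes duplicate rows, not duplicate ids. When one person appears with several movies or roles, `DISTINCT` over all selected columns still returns one row per combination, and those rows share the same id. A minimal sketch, using Python's `sqlite3` as a stand-in for MySQL and a hypothetical `cast_crew` table with made-up columns:

```python
import sqlite3

# Hypothetical combined cast/crew table: one row per (person id, movie, role).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cast_crew (id INTEGER, movie TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO cast_crew VALUES (?, ?, ?)",
    [
        (10, "Movie X", "director"),
        (10, "Movie X", "director"),  # exact duplicate row: removed by DISTINCT
        (10, "Movie Y", "director"),  # same id, different movie: kept by DISTINCT
        (11, "Movie X", "writer"),
    ],
)

rows = conn.execute("SELECT DISTINCT id, movie, role FROM cast_crew").fetchall()
ids = {row[0] for row in rows}  # the distinct id values among those rows

print(len(rows), len(ids))  # 3 2
```

Three rows survive `DISTINCT`, but only two ids are distinct. With `id` as the uniqueKey, those rows would become two Solr documents.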

Data import handler not indexing all data

2015-11-07 Thread Yangrui Guo
Hello, I'm having trouble with Solr's Data Import Handler (DIH). My Solr version is 5.3.1 and MySQL is 5.5. I tried to index the IMDb data but found that Solr indexed it only partially. I ran "SELECT DISTINCT COUNT(*) FROM imdb.director" and the result was 1636549. However, DIH only fetched and indexed 287041
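The query in the original post does not count distinct ids. `DISTINCT` applies to the result set, which is a single row holding `COUNT(*)`, so `SELECT DISTINCT COUNT(*)` returns the same number as `SELECT COUNT(*)`. `COUNT(DISTINCT id)` is the figure to compare with Solr's document count. A small sketch with Python's `sqlite3` on hypothetical toy data. The semantics are standard SQL and should match MySQL for these queries, but that is worth confirming on the real server.

```python
import sqlite3

# Hypothetical toy table: 6 rows but only 3 distinct ids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE director (id INTEGER)")
conn.executemany(
    "INSERT INTO director VALUES (?)", [(1,), (2,), (2,), (3,), (3,), (3,)]
)

# DISTINCT is applied to the one-row result of COUNT(*), so it changes nothing.
distinct_count_star = conn.execute(
    "SELECT DISTINCT COUNT(*) FROM director"
).fetchone()[0]
count_star = conn.execute("SELECT COUNT(*) FROM director").fetchone()[0]
# Counts unique id values: the most documents Solr can keep when id is the uniqueKey.
count_distinct_id = conn.execute(
    "SELECT COUNT(DISTINCT id) FROM director"
).fetchone()[0]

print(distinct_count_star, count_star, count_distinct_id)  # 6 6 3
```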

Re: Data import handler not indexing all data

2015-11-07 Thread Alexandre Rafalovitch
That's not quite the question I asked. Run a DISTINCT on 'id' alone, in the database itself. If your ids are NOT unique, you need to create a composite or virtual id for Solr, because whatever field your schema (schema.xml or managed-schema) declares as the uniqueKey will be used to deduplicate the documents. If you have 10 documents
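The deduplication Alex describes can be modeled with a dictionary keyed by the uniqueKey value: indexing a document whose key already exists replaces the earlier document. This is a simplified model, not Solr code. The column names, the `index_rows` helper and the `|` separator are all hypothetical. In a DIH SQL query, a composite id could be built with MySQL's `CONCAT`, for example, but the columns to combine depend on the real table.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical row shape: (person_id, movie, role).
Row = Tuple[int, str, str]


def index_rows(rows: Iterable[Row], make_key: Callable[[int, str, str], str]) -> Dict[str, dict]:
    """Model uniqueKey overwrite: a later document with the same key replaces an earlier one."""
    index: Dict[str, dict] = {}
    for person_id, movie, role in rows:
        doc = {"person_id": person_id, "movie": movie, "role": role}
        index[make_key(person_id, movie, role)] = doc
    return index


rows: List[Row] = [
    (10, "Movie X", "director"),
    (10, "Movie Y", "director"),
    (10, "Movie Y", "writer"),
    (11, "Movie X", "writer"),
]

# Raw person id as the key: every row for person 10 collapses into one document.
by_id = index_rows(rows, lambda pid, movie, role: str(pid))
# Hypothetical composite key from all identifying columns: every row survives.
by_composite = index_rows(rows, lambda pid, movie, role: f"{pid}|{movie}|{role}")

print(len(by_id), len(by_composite))  # 2 4
```

Only the last row for each key remains in `by_id`, which matches the pattern in this thread: many more rows in MySQL than documents in Solr.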

Re: Data import handler not indexing all data

2015-11-07 Thread Yangrui Guo
Yes, the id is unique. If I select only distinct id, count(id), I get the same results. However, I found this is more likely a MySQL issue. I created a new table called director1 and ran the query "insert into director1 select * from director", and only 287041 rows were inserted, which was the same as