Hi, thanks for the continued support. I'm really worried because my project deadline is near. The count was 1636549 in MySQL versus 287041 in Solr. I put SELECT DISTINCT at the beginning of the query because IMDB doesn't have a separate table for cast and crew. It puts movies, people, and their roles into one huge table, 'cast_info'. Hence there are multiple rows for each director, one row per movie.
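To make the "one row per movie" point concrete, here is a minimal sketch of the check Alex suggested, run against a tiny in-memory SQLite table instead of the real MySQL database. The table contents and the `movie` column are made up for illustration; only `id` comes from the config in this thread. It compares the total row count with the number of distinct ids, then mimics what would likely happen in Solr if `id` is the schema's uniqueKey (an assumption, since the schema isn't shown here): documents sharing an id replace each other, so only one survives per id.

```python
import sqlite3

# Hypothetical miniature of imdb.director: one row per (director, movie),
# so the same id repeats. The `movie` column is an assumption for illustration.
rows = [
    (1, "Movie A"),
    (1, "Movie B"),
    (2, "Movie C"),
    (3, "Movie D"),
    (3, "Movie E"),
    (3, "Movie F"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE director (id INTEGER, movie TEXT)")
conn.executemany("INSERT INTO director VALUES (?, ?)", rows)

# COUNT(*) counts every row; COUNT(DISTINCT id) counts unique ids only.
total_rows, distinct_ids = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT id) FROM director"
).fetchone()

# List the ids that occur more than once, with their row counts.
repeated = conn.execute(
    "SELECT id, COUNT(*) FROM director "
    "GROUP BY id HAVING COUNT(*) > 1 ORDER BY id"
).fetchall()

# Rough model of an index keyed on id (assuming id is the uniqueKey):
# a later row with the same id replaces the earlier one.
index = {row[0]: row for row in rows}

print(total_rows, distinct_ids, repeated, len(index))
# prints: 6 3 [(1, 2), (3, 3)] 3
```

If the same comparison on the real table gives 1636549 rows but about 287041 distinct ids, the gap would be explained by repeated ids rather than by DIH dropping rows. Note that `SELECT DISTINCT *` only removes rows that are identical in every column, so rows for the same director with different movies all remain.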
On Saturday, November 7, 2015, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> Just to get the paranoid option out of the way, is 'id' actually the
> column that has unique ids in your database? If you do "select
> distinct id from imdb.director" - how many items do you get?
>
> Regards,
>    Alex.
> ----
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 7 November 2015 at 18:21, Yangrui Guo <guoyang...@gmail.com> wrote:
> > Hello
> >
> > I'm being troubled by solr's data import handler. My solr version is 5.3.1
> > and mysql is 5.5. I tried to index imdb data but found solr only partially
> > indexed. I ran "SELECT DISTINCT COUNT(*) FROM imdb.director" and the query
> > result was 1636549. However DIH only fetched and indexed 287041 rows. I
> > didn't see any error in the log. Why was this happening?
> >
> > Here's my data-config.xml
> >
> > <dataConfig>
> >   <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
> >     url="jdbc:mysql://localhost:3306/imdb" user="root" password="password" />
> >   <document>
> >     <entity name="director" transformer="RegexTransformer"
> >       query="SELECT DISTINCT * FROM imdb.director">
> >       <field name="id" column="id" />
> >       <field name="content_type" column="content_type" />
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> > Yangrui Guo