Hi thanks for the continued support. I'm really worried as my project
deadline is near. It was 1636549 in MySQL vs 287041 in Solr. I put select
distinct in the beginning of the query because IMDB doesn't have a table
for cast & crew. It puts movie and person and their roles into one huge
table 'cast_info'. Hence there are multiple rows for a director, one row
per his movie.

On Saturday, November 7, 2015, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> Just to get the paranoid option out of the way, is 'id' actually the
> column that has unique ids in your database? If you do "select
> distinct id from imdb.director" - how many items do you get?
>
> Regards,
>    Alex.
> ----
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 7 November 2015 at 18:21, Yangrui Guo <guoyang...@gmail.com
> <javascript:;>> wrote:
> > Hello
> >
> > I'm being troubled by solr's data import handler. My solr version is
> 5.3.1
> > and mysql is 5.5. I tried to index imdb data but found solr only
> partially
> > indexed. I ran "SELECT DISTINCT COUNT(*) FROM imdb.director" and the
> query
> > result was 1636549. However DIH only fetched and indexed 287041 rows. I
> > didn't see any error in the log. Why was this happening?
> >
> > Here's my data-config.xml
> >
> > <dataConfig>
> > <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
> > url="jdbc:mysql://localhost:3306/imdb" user="root" password="password" />
> > <document>
> > <entity name="director" transformer="RegexTransformer" query="SELECT
> > DISTINCT * FROM imdb.director">
> > <field name="id" column="id" />
> > <field name="content_type" column="content_type" />
> > </entity>
> > </document>
> > </dataConfig>
> >
> > Yangrui Guo
>

Reply via email to