[jira] [Updated] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-05-07 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-3360:
-

Attachment: SOLR-3360-test.patch

Mikhail,

Here is a patch incorporating your last comments. One problem, though: the test
fails for me with
-Dtests.seed=4fe0e50054781418:4b35f2f982289e2f:-668154ae4e16ecdb

This creates a configuration for the 1-thread test:

<dataConfig>
  <dataSource type="MockDataSource"/>
  <document>
    <entity name="x" threads="1" query="select * from x" processor="SqlEntityProcessor">
      <field column="id" />
      <entity name="y" query="select * from y" where="xid=x.id" processor="SqlEntityProcessor">
        <field column="desc" />
      </entity>
    </entity>
  </document>
</dataConfig>

Any ideas?
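
One thing that may be relevant here: as far as I understand DIH, the where="xid=x.id" attribute is a lookup key that only the cached processor acts on, so pairing it with a plain SqlEntityProcessor would leave "select * from y" unfiltered for every parent row. The sketch below is only an illustration of that kind of cached lookup, not DIH code; the class ChildRowCache and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a cached child-entity lookup; not DIH code. */
class ChildRowCache {
    private final Map<Object, List<Map<String, Object>>> rowsByKey = new HashMap<>();

    /** Index every child row once, keyed by the join column (for example "xid"). */
    ChildRowCache(List<Map<String, Object>> childRows, String keyColumn) {
        for (Map<String, Object> row : childRows) {
            rowsByKey.computeIfAbsent(row.get(keyColumn), k -> new ArrayList<>()).add(row);
        }
    }

    /** Return the child rows whose key equals the parent's id, or an empty list. */
    List<Map<String, Object>> lookup(Object parentId) {
        return rowsByKey.getOrDefault(parentId, Collections.emptyList());
    }

    /** Build one row of the hypothetical child table y. */
    static Map<String, Object> row(int xid, String desc) {
        Map<String, Object> r = new HashMap<>();
        r.put("xid", xid);
        r.put("desc", desc);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> y = new ArrayList<>();
        y.add(row(1, "one-a"));
        y.add(row(1, "one-b"));
        y.add(row(2, "two-a"));
        ChildRowCache cache = new ChildRowCache(y, "xid");
        // Parent x.id=1 matches two child rows, x.id=3 matches none.
        System.out.println(cache.lookup(1).size());
        System.out.println(cache.lookup(3).size());
    }
}
```

With a lookup like this the child query runs once and each parent row only costs a map access, which is why the where attribute and the cached processor belong together.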

 Problem with DataImportHandler multi-threaded
 -

 Key: SOLR-3360
 URL: https://issues.apache.org/jira/browse/SOLR-3360
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6
 Environment: Solr 3.6.0, Apache Tomcat 6.0.20, jdk1.6.0_15, Windows XP
Reporter: Claudio R
Assignee: James Dyer
 Fix For: 3.6.1

 Attachments: SOLR-3360-test.patch, SOLR-3360-test.patch, 
 SOLR-3360-test.patch, SOLR-3360-test.patch, SOLR-3360.patch


Hi,
If I use dataimport with 1 thread, I get:
<lst name="statusMessages">
  <str name="Total Requests made to DataSource">5001</str>
  <str name="Total Rows Fetched">1000</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2012-04-16 11:21:57</str>
  <str name="">Indexing completed. Added/Updated: 1000 documents. Deleted 0 documents.</str>
  <str name="Committed">2012-04-16 11:23:19</str>
  <str name="Total Documents Processed">1000</str>
  <str name="Time taken">0:1:22.390</str>
</lst>
If I use dataimport with 10 threads, I get:
<lst name="statusMessages">
  <str name="Total Requests made to DataSource">0</str>
  <str name="Total Rows Fetched">1</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2012-04-16 11:31:43</str>
  <str name="">Indexing completed. Added/Updated: 1 documents. Deleted 0 documents.</str>
  <str name="Committed">2012-04-16 11:41:50</str>
  <str name="Total Documents Processed">1</str>
  <str name="Time taken">0:10:7.586</str>
</lst>
The configuration with 10 threads took about 10 times longer than the
configuration with 1 thread.
I have 1000 records in the database.
My db-data-config.xml is shown below:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
      url="jdbc:sqlserver://200.XXX.XXX.XXX:1433;databaseName=test" user="user"
      password="pass"/>
  <document>
    <entity name="indice" rootEntity="true" threads="10"
        transformer="RegexTransformer,TemplateTransformer"
        query="select top 1000 i.id_indice, i.a, i.b from indice i where i.status = 'I'"
        deltaImportQuery="i.id_indice, i.a, i.b from indice i where id_indice in ('${dataimporter.delta.id_indice}')"
        deltaQuery="select id_indice from indice where status='I' and data_hora_modificacao = convert(datetime, '${dataimporter.last_index_time}', 120)"
        deletedPkQuery="select id_indice from indice where status='D' and data_hora_modificacao = convert(datetime, '${dataimporter.last_index_time}', 120)">
      <field column="id_indice" name="id_indice" />
      <field column="a" name="a" />
      <field column="b" name="b" />
      <entity name="filtro"
          transformer="RegexTransformer,TemplateTransformer"
          query="select categoria, sub_categoria from filtro where indice_id_indice = '${indice.id_indice}'">
        <field name="filtro_categoria" column="categoria" />
        <field name="filtro_sub_categoria" column="sub_categoria" />
        <field name="nv_sub_categoria" column="nv_sub_categoria"
            template="${filtro.categoria}|${filtro.sub_categoria}" />
      </entity>
      <entity name="pagina_relacionada"
          query="select url from pagina_relacionada where indice_id_indice = '${indice.id_indice}'">
        <field name="pagina_relacionada_url" column="url" />
      </entity>
      <entity name="veja_mais"
          query="select chamada, url from veja_mais where indice_id_indice = '${indice.id_indice}'">
        <field name="veja_mais_chamada" column="chamada" />
        <field name="veja_mais_url" column="url" />
      </entity>
      <entity name="video"
          query="select url from video where indice_id_indice = '${indice.id_indice}'">
        <field name="video_url" column="url" />
      </entity>
      <entity name="galeria"
          query="select url from galeria where indice_id_indice = '${indice.id_indice}'">
        <field name="galeria_url" column="url" />
      </entity>
    </entity>
  </document>
</dataConfig>
Thanks.
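
The numbers in the report can be sanity-checked with a little arithmetic. The config has five child entities (filtro, pagina_relacionada, veja_mais, video, galeria), each queried once per root row, so a healthy full import of 1000 rows should issue 1 + 5 x 1000 = 5001 requests, which matches the 1-thread run. The sketch below also converts the two "Time taken" values to milliseconds. The class and method names are hypothetical, and it assumes the part after the dot is a plain millisecond count.

```java
/** Hypothetical helper for checking the figures quoted in the report. */
class ImportStats {
    /** Parse a "H:M:S.mmm" duration into milliseconds (fraction read as a millisecond count). */
    static long parseMillis(String timeTaken) {
        String[] parts = timeTaken.split(":");
        long hours = Long.parseLong(parts[0]);
        long minutes = Long.parseLong(parts[1]);
        String[] secParts = parts[2].split("\\.");
        long seconds = Long.parseLong(secParts[0]);
        long millis = secParts.length > 1 ? Long.parseLong(secParts[1]) : 0;
        return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
    }

    /** One root query plus one query per child entity for every root row. */
    static long expectedRequests(long rootRows, int childEntities) {
        return 1 + rootRows * childEntities;
    }

    public static void main(String[] args) {
        long oneThread = parseMillis("0:1:22.390");
        long tenThreads = parseMillis("0:10:7.586");
        System.out.println(expectedRequests(1000, 5)); // 5001
        System.out.println(oneThread + " ms vs " + tenThreads + " ms");
        System.out.println((double) tenThreads / oneThread);
    }
}
```

By these figures the 10-thread run took roughly 7.4 times as long as the 1-thread run while indexing 1 document instead of 1000, so the slowdown is real even if "10 times" is a rounded estimate.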
 

[jira] [Updated] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-05-07 Thread Mikhail Khludnev (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-3360:
---

Attachment: SOLR-3360-test.patch

Hi,

I applied a tiny modification:
{code}
  + (useWhereParam
      ? "<entity name=\"y\" query=\"select * from y\" where=\"xid=x.id\" "
      : "<entity name=\"y\" query=\"select * from y where y.A=${x.id}\" "
    ) + " processor=\"" + (childCached || useWhereParam ? "Cached" : "") + "SqlEntityProcessor\">\n"
{code}

This prevents where="xid=x.id" from being used with a plain SqlEntityProcessor.
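
For readers without the patch at hand, here is a self-contained sketch of how such a test configuration string could be assembled around that ternary. Only the two boolean flags and the XML fragments come from this thread; the class TestConfigBuilder, the method buildConfig and the threads parameter are assumptions made for illustration, not the code in SOLR-3360-test.patch.

```java
/** Hypothetical builder for the two-level test config discussed in this thread. */
class TestConfigBuilder {
    static String buildConfig(int threads, boolean useWhereParam, boolean childCached) {
        return "<dataConfig>\n"
            + "<dataSource type=\"MockDataSource\"/>\n"
            + "<document>\n"
            + "<entity name=\"x\" threads=\"" + threads + "\" query=\"select * from x\""
            + " processor=\"SqlEntityProcessor\">\n"
            + "<field column=\"id\" />\n"
            // The child joins either through the where attribute or through the query text.
            + (useWhereParam
                ? "<entity name=\"y\" query=\"select * from y\" where=\"xid=x.id\" "
                : "<entity name=\"y\" query=\"select * from y where y.A=${x.id}\" ")
            // The where attribute always forces the cached processor.
            + "processor=\"" + (childCached || useWhereParam ? "Cached" : "")
            + "SqlEntityProcessor\">\n"
            + "<field column=\"desc\" />\n"
            + "</entity>\n"
            + "</entity>\n"
            + "</document>\n"
            + "</dataConfig>\n";
    }

    public static void main(String[] args) {
        System.out.println(buildConfig(1, true, false));
        System.out.println(buildConfig(10, false, false));
    }
}
```

The point of the ternary is that useWhereParam=true can never produce a plain SqlEntityProcessor child, whatever childCached says.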


 Problem with DataImportHandler multi-threaded
 -

 Key: SOLR-3360
 URL: https://issues.apache.org/jira/browse/SOLR-3360
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6
 Environment: Solr 3.6.0, Apache Tomcat 6.0.20, jdk1.6.0_15, Windows XP
Reporter: Claudio R
Assignee: James Dyer
 Fix For: 3.6.1

 Attachments: SOLR-3360-test.patch, SOLR-3360-test.patch, 
 SOLR-3360-test.patch, SOLR-3360-test.patch, SOLR-3360-test.patch, 
 SOLR-3360.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: 

[jira] [Updated] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-05-07 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-3360:
-

Attachment: SOLR-3360-test.patch

See the latest SOLR-3360-test.patch.  I changed the ternary operator as you 
suggested.

I still get a failure with this:  
-Dtests.seed=-55eeb72d0a16dfec:4e1a59f5738a6b25:4bf3cbf2bd3b659a
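
A note on why quoting the seed is useful: if the test picks its parameters from a seeded random source, the same seed yields the same combination on every run. The sketch below shows that idea with plain java.util.Random. It is not how the Lucene test framework interprets its three-part seed; the class SeededScenario, its fields, and the use of only the first seed component are assumptions for illustration.

```java
import java.util.Random;

/** Hypothetical example of deriving test parameters from a fixed seed. */
class SeededScenario {
    final int threads;
    final boolean useWhereParam;
    final boolean childCached;

    SeededScenario(long seed) {
        Random random = new Random(seed);
        this.threads = 1 + random.nextInt(10); // 1..10 threads
        this.useWhereParam = random.nextBoolean();
        this.childCached = random.nextBoolean();
    }

    @Override
    public String toString() {
        return "threads=" + threads
            + " useWhereParam=" + useWhereParam
            + " childCached=" + childCached;
    }

    public static void main(String[] args) {
        // First component of the seed quoted above, read as a signed hexadecimal long.
        long seed = Long.parseLong("-55eeb72d0a16dfec", 16);
        // Both lines print the same parameter combination.
        System.out.println(new SeededScenario(seed));
        System.out.println(new SeededScenario(seed));
    }
}
```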

 Problem with DataImportHandler multi-threaded
 -

 Key: SOLR-3360
 URL: https://issues.apache.org/jira/browse/SOLR-3360
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6
 Environment: Solr 3.6.0, Apache Tomcat 6.0.20, jdk1.6.0_15, Windows XP
Reporter: Claudio R
Assignee: James Dyer
 Fix For: 3.6.1

 Attachments: SOLR-3360-test.patch, SOLR-3360-test.patch, 
 SOLR-3360-test.patch, SOLR-3360-test.patch, SOLR-3360-test.patch, 
 SOLR-3360-test.patch, SOLR-3360.patch





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: 

[jira] [Updated] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-05-04 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-3360:
-

Attachment: SOLR-3360-test.patch

Mikhail,

See SOLR-3360-test.patch.  If you agree this covers everything your version 
does then I'll commit it shortly.

 Problem with DataImportHandler multi-threaded
 -

 Key: SOLR-3360
 URL: https://issues.apache.org/jira/browse/SOLR-3360
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6
 Environment: Solr 3.6.0, Apache Tomcat 6.0.20, jdk1.6.0_15, Windows XP
Reporter: Claudio R
Assignee: James Dyer
 Fix For: 3.6.1

 Attachments: SOLR-3360-test.patch, SOLR-3360-test.patch, 
 SOLR-3360-test.patch, SOLR-3360.patch




[jira] [Updated] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-04-22 Thread Mikhail Khludnev (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-3360:
---

Attachment: SOLR-3360-test.patch

OK, a fix is attached, but this patch breaks SOLR-3307. Work in progress.

 Problem with DataImportHandler multi-threaded
 -

 Key: SOLR-3360
 URL: https://issues.apache.org/jira/browse/SOLR-3360
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6
 Environment: Solr 3.6.0, Apache Tomcat 6.0.20, jdk1.6.0_15, Windows XP
Reporter: Claudio R
 Attachments: SOLR-3360-test.patch, SOLR-3360-test.patch




[jira] [Updated] (SOLR-3360) Problem with DataImportHandler multi-threaded

2012-04-22 Thread Mikhail Khludnev (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-3360:
---

Attachment: SOLR-3360.patch

My final solution is attached as SOLR-3360.patch. SOLR-3307 is mostly rolled back
and fixed in a slightly different way; see the change in XPathEntityProcessor.java.
All tests are green.

 Problem with DataImportHandler multi-threaded
 -

 Key: SOLR-3360
 URL: https://issues.apache.org/jira/browse/SOLR-3360
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6
 Environment: Solr 3.6.0, Apache Tomcat 6.0.20, jdk1.6.0_15, Windows XP
Reporter: Claudio R
 Attachments: SOLR-3360-test.patch, SOLR-3360-test.patch, 
 SOLR-3360.patch

