RE: How does DIH multithreading work?

2010-11-01 Thread Dyer, James
Mark,

I have the same question so I did a little research on this.  Not a complete 
answer but here is what I've found:

- threads was added with SOLR-1352 
(https://issues.apache.org/jira/browse/SOLR-1352).

- Also see 
http://www.lucidimagination.com/search/document/a9b26ade46466ee/queries_regarding_a_paralleldataimporthandler
 for background info.

- Only available in 3.x and trunk.  Committed on 1/12/2010 by Noble Paul (who 
surely can tell you more accurate info than I can).

- It seems that when threads is used, each thread calls nextRow on your root 
entity's data source in parallel.

- Not sure this will help with child entities (i.e., I had hoped I could get it 
to build child caches in parallel, but I don't think this is the case).

- A doc comment on ThreadedEntityProcessorWrapper indicates this will help 
speed up running transformers because they'd run in parallel.  This would make 
sense if your database can only return rows so fast, but you also have an 
intensive transformer.  Maybe adding a thread would make your processing no 
slower than the db...
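
To make the model above concrete, here is a minimal Python sketch of several threads pulling rows from one shared source and running a slow transformer in parallel. It is not DIH code: RowSource, next_row, slow_transform, and run_import are hypothetical names made up for illustration, and the locking shown is an assumption, not a description of how ThreadedEntityProcessorWrapper is implemented.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RowSource:
    """Hypothetical stand-in for a root entity's data source."""

    def __init__(self, rows):
        self._rows = iter(rows)
        self._lock = threading.Lock()

    def next_row(self):
        # Hand out one row at a time; return None once the source is exhausted.
        with self._lock:
            return next(self._rows, None)


def slow_transform(row):
    # Simulate an expensive transformer with a short sleep.
    time.sleep(0.01)
    return {**row, "title": row["title"].upper()}


def worker(source, out, out_lock):
    # Each thread loops: fetch a row, transform it, store the result.
    while True:
        row = source.next_row()
        if row is None:
            return
        doc = slow_transform(row)
        with out_lock:
            out.append(doc)


def run_import(rows, threads=4):
    source = RowSource(rows)
    out, out_lock = [], threading.Lock()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(worker, source, out, out_lock)
                   for _ in range(threads)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return out
```

In this sketch the source hands out rows quickly and the transformer is the slow part, so extra threads should mostly overlap transformer time, which is the case the doc comment seems to describe.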

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: markwaddle [mailto:m...@markwaddle.com] 
Sent: Tuesday, October 26, 2010 2:25 PM
To: solr-user@lucene.apache.org
Subject: How does DIH multithreading work?


I understand that the thread count is specified on root entities only. Does
it spawn multiple threads per root entity? Or multiple threads per
descendant entity? Can someone give an example of how you would make a
database query in an entity with 4 threads that would select 1 row per
thread?

Thanks,
Mark
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-does-DIH-multithreading-work-tp1776111p1776111.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How does DIH multithreading work?

2010-11-01 Thread Lance Norskog
It is useful for parsing PDFs on a multi-processor machine. It also helps if a
sub-entity makes an outbound I/O call to a database, a file, or another
Solr instance (SOLR-1499).

In general, it helps anywhere the pipeline time outweighs disk I/O time.

Threading happens at the per-document level: there is no concurrent
access inside a document pipeline.
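
A small Python sketch of that per-document model, under the assumption that each document's stages (root row, sub-entity lookup, transformer) run one after another on a single thread while different documents may run on different threads. build_document and import_all are hypothetical names for illustration, not DIH APIs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def build_document(root_row):
    """Run every stage for one document sequentially on the calling thread."""
    stages = []

    # Stage 1: the root entity row.
    stages.append(("root", threading.get_ident()))

    # Stage 2: hypothetical sub-entity lookup (e.g. an outbound I/O call).
    children = [f"child-of-{root_row['id']}"]
    stages.append(("sub_entity", threading.get_ident()))

    # Stage 3: transformer that assembles the final document.
    doc = {"id": root_row["id"], "children": children}
    stages.append(("transform", threading.get_ident()))

    return doc, stages


def import_all(root_rows, threads=4):
    # One task per document; the pool runs whole documents in parallel,
    # never individual stages of the same document.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(build_document, root_rows))
```

The recorded thread ids make the point checkable: within one document all stages share a thread id, so nothing inside a single pipeline needs to be thread-safe against itself.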

There is a bug that causes EntityProcessors that look up attributes to
throw an exception. This makes Tika unusable inside a thread. Two other
EPs also won't work, but I did not test them.

https://issues.apache.org/jira/browse/SOLR-2186


-- 
Lance Norskog
goks...@gmail.com


Re: How does DIH multithreading work?

2010-10-27 Thread markwaddle

Anyone know how it works?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-does-DIH-multithreading-work-tp1776111p1784419.html
Sent from the Solr - User mailing list archive at Nabble.com.