It is useful for parsing PDFs on a multi-processor machine. It also helps
if a sub-entity makes an outbound I/O call to a database, a file, or
another Solr (SOLR-1499).
Anything where the pipeline time outweighs disk I/O time.
Threading happens at the per-document level: there is no concurrent
access inside a document pipeline.
There is a bug which causes EntityProcessors that look up attributes to
throw an exception. This makes Tika unusable inside a thread. Two other
EPs also won't work, but I did not test them.
https://issues.apache.org/jira/browse/SOLR-2186
On Mon, Nov 1, 2010 at 10:43 AM, Dyer, James james.d...@ingrambook.com wrote:
Mark,
I have the same question so I did a little research on this. Not a complete
answer but here is what I've found:
- threads was added with SOLR-1352
(https://issues.apache.org/jira/browse/SOLR-1352).
- Also see
http://www.lucidimagination.com/search/document/a9b26ade46466ee/queries_regarding_a_paralleldataimporthandler
for background info.
- Only available in 3.x and trunk. Committed on 1/12/2010 by Noble Paul (who
surely can tell you more accurate info than I can).
- It seems that, when threads is used, each thread will call nextRow on
your root entity datasource in parallel.
- Not sure this will help with child entities (i.e., I had hoped I could get
it to build child caches in parallel, but I don't think this is the case).
- A doc comment on ThreadedEntityProcessorWrapper indicates this will help
speed up running transformers because they'd run in parallel. This would
make sense if your database can only return rows so fast, but you also
have an intensive transformer. Maybe adding a thread would make your
processing no slower than the db...
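To make the idea concrete, here is a rough sketch in plain Java of what the
points above seem to describe: several worker threads pull rows from one
shared root source and each runs the transformer step on its own row. This
is not the DIH code. RootRows, runImport and the transformer argument are
hypothetical names made up for illustration, and it assumes rows really are
handed out one per nextRow call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

public class ThreadedImportSketch {

    /** Hypothetical stand-in for a root entity: hands out one row per call. */
    static final class RootRows {
        private final Iterator<Map<String, Object>> it;

        RootRows(List<Map<String, Object>> rows) {
            this.it = rows.iterator();
        }

        // synchronized so several workers can share one row source safely;
        // returns null when the rows are exhausted
        synchronized Map<String, Object> nextRow() {
            return it.hasNext() ? it.next() : null;
        }
    }

    /** Runs the given number of workers; each row is handled by exactly one thread. */
    static List<Map<String, Object>> runImport(
            RootRows root,
            int threads,
            Function<Map<String, Object>, Map<String, Object>> transformer)
            throws InterruptedException {
        List<Map<String, Object>> docs =
                Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                Map<String, Object> row;
                // Each worker keeps asking the shared root for the next row.
                while ((row = root.nextRow()) != null) {
                    // The (possibly slow) transformer runs outside the lock,
                    // so transformers for different rows overlap in time.
                    docs.add(transformer.apply(row));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return docs;
    }

    public static void main(String[] args) throws Exception {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            Map<String, Object> r = new HashMap<>();
            r.put("id", i);
            rows.add(r);
        }
        List<Map<String, Object>> docs = runImport(new RootRows(rows), 4, row -> {
            Map<String, Object> doc = new HashMap<>(row);
            doc.put("text", "doc-" + row.get("id")); // stand-in for real work
            return doc;
        });
        System.out.println(docs.size() + " documents built");
    }
}
```

In this sketch only the call that fetches a row is serialized; the
transformer work is what gets spread across threads, which is the case
where extra threads would be expected to pay off.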
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
-----Original Message-----
From: markwaddle [mailto:m...@markwaddle.com]
Sent: Tuesday, October 26, 2010 2:25 PM
To: solr-user@lucene.apache.org
Subject: How does DIH multithreading work?
I understand that the thread count is specified on root entities only. Does
it spawn multiple threads per root entity? Or multiple threads per
descendant entity? Can someone give an example of how you would make a
database query in an entity with 4 threads that would select 1 row per
thread?
Thanks,
Mark
--
Lance Norskog
goks...@gmail.com