I’ll try to test the patch this week. Thanks!
On 10/10/11 2:01 AM, "Mikhail Khludnev" <mkhlud...@griddynamics.com> wrote: > Hello, > > Pls have a look to attached patch for > http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_4_0/ > > It makes all tests green, but I have a several doubts in it. > > Key points: > > # EntityProcessorBase has context and dataSourceRowCache fields which can't > be shared among threads (per thread ThreadedEntityProcessorWrappers share > single instance of EntityProcessor). > * I removed context field, and provide it via Context.CURRENT threadlocal. > * here is the really bad thing EntityProcessorBase.init() has a special case > for "white box" tests, which bypass putting context into threadlocal. i.e. > during full circuit run init() argument is ignored, but for tests it's pushing > into threadlocal. I don't know how to improve it. > * I removed dataSourceRowCache and getFromRowCacheTransformed() by inlining > it, because I see no purpose in it - it's iteration state. As a drawback > consuming CachedSqlEntProc empties it, and after some time of thinking I've > got that I'm wrong here. CachedSqlEntProc should be reused during debug > sessions. I'm going to return dataSourceRowCache back, but put into > threadlocal context too. Nevertheless, right now cacheWithWhereClause is used > to distribute locks by PK. > * synchronized CachedSqlEntityProcessor.init(), because it calls thread > sensitive cacheInit() > > # DocBuilder > * creating EntityProcessor in EntityRunner.<init> was wrong because it leads > to creating CachedSqlEntProc per thread as Maria has spotted. I created per > entity processors map in DocBuilder. It allows to share single > CachedSqlEntProc among threads. > * I implemented stacking Context which is passed through thread local. > (push/pop&oldCtx in runAThread()) > * I was wonder that there is the second code path in additiond to > EntityRunners: buildDocument() - stacking contexts was already done there, but > I had to reorder init() and push rows; > * and the third code path collectDelta() - here I had to add stacking context > too. > > # plenty changes in tests, there two kinds of it > * clearing context in threadlocal before and after "white-box" tests. (in > whiteboxing I mean unit tests that avoid "official" DocBuilder entry points) > * and in TestCachedSqlEntityProcessor you can see that I actually broke > Caching nature of this processor, required for debug most. > > # Summary: > * by applying this patch you'll have CachedSqlEntProc works in multiple > threads, but for every debug request it will pull all data every time. > * I'm going to fix this issue and provide non-invasive fix by putting > dataSourceRowCache into a context.entitySession. > > # Questions to DIH committers > * isn't ContextImpl.putVal(String, Object, Map) wrong (it always put into > entitySession) > * is there any plan to refactor DIH into single codepath on EntityRunners (and > remove separate flows for debug and delta); > > -- > Mikhail > > On Sat, Oct 8, 2011 at 1:47 AM, Mikhail Khludnev <mkhlud...@griddynamics.com> > wrote: >> Hello, >> >> First of all I've got the same feeling for some time. >> I've checked TestThreaded >> <http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_4_0/solr/contr >> ib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestThreaded >> .java> and got that it doesn't cover CachedSqlEntityProcessor with fancy >> where="xid=x.id <http://x.id> " feature. >> Pls find the test for it attached: >> dih-mt-cached-race-condition-TestThreaded.java.patch (as a patch for >> TestThreaded from tags/lucene_solr_3_4_0) >> >> Most times testCachedMultiThread_FullImport() fails (but really rare it >> passed) and testCachedSingleThread_FullImport() is always green. The >> difference is one and two threads. >> Pls also find attached: >> >> dih-mt-cached-race-condition-TestThreaded.log contains FINEST messages >> including the root cause (my guess): >> 2011/10/08 0:01:13 >> org.apache.solr.handler.dataimport.ThreadedEntityProcessorWrapper nextRow >> 詳細レベル (低): arow : null >> >> (I'm sorry for hieroglyphs, I don't know where they comes from, it should be >> just FINEST) >> and corrupted documents, where some of child entity fields are missed. >> >> dih-mt-cached-race-condition-TestThreaded.test.out - is the actual test >> failure (I'm sorry but it was another run than in .log). The failed assert is >> the searching root entities by the children terms. >> >> I'm not sure but creating children entity runners in multiple threads looks >> really suspicious for me, I hadn't have a time to look deeper into the code. >> >> PS. I want to contribute two bleeding ideas (and implementations too), which >> improves DIH much, but this race condition is a blocker for them. >> >> Regards >> >> On Mon, Oct 3, 2011 at 10:09 PM, Maria Vazquez <maria.vazq...@dexone.com> >> wrote: >>> When I'm debugging, if it is single threaded >>> CachedSqlEntityProcessor.getAllNonCachedRows is called only once, all the >>> rows cached and next time it requests a row it gets it from the cached data. >>> In the logs I see the SQL call only once. >>> >>> If I use multiple threads, it calls >>> CachedSqlEntityProcessor.getAllNonCachedRows multiple times, so it creates a >>> new cache per thread. >>> >>> The cache should be shared among threads and be safe, right? >>> >>> Thanks! >>> Maria >>> >>> >>> >>> >>> On 10/1/11 8:00 PM, "pulkitsing...@gmail.com" <pulkitsing...@gmail.com> >>> wrote: >>> >>>> > What part of the source code in debug mode behaved in a fashion such as >>>> to >>>> > make it seem like it is not thread-safe? >>>> > >>>> > If it feels difficult to put into words then you can always make a small >>>> 5 min >>>> > screencast to demo the issue and talk about it. I do that for really >>>> complex >>>> > stuff with Jing by techsmith (free version). >>>> > >>>> > Or you can put up a small and simplified test case together to demo the >>>> issue >>>> > and then paste the link for that hosted attachment :) >>>> > >>>> > Sent from my iPhone >>>> > >>>> > On Sep 30, 2011, at 1:28 PM, Maria Vazquez <maria.vazq...@dexone.com> >>>> wrote: >>>> > >>>>> >> Hi, >>>>> >> I¹m using threads with JdbcDataStore and CachedSqlEntityProcessor. >>>>> >> I noticed that if I make it single threaded CachedSqlEntityProcessor >>>>> behaves >>>>> >> as expected (it only queries the db once and caches all the rows). If I >>>>> make >>>>> >> it multi threaded it seems to make multiple db queries and when I debug >>>>> the >>>>> >> source code it looks like it is not thread safe. >>>>> >> Any ideas? >>>>> >> Thanks, >>>>> >> Maria >>>>> >> >>> >> >>