> On 19 Aug 2016, at 09:44, Hernán Morales Durand <[email protected]> 
> wrote:
> 
> 
> 2016-08-18 16:57 GMT-03:00 Sven Van Caekenberghe <[email protected]>:
> 
> > On 18 Aug 2016, at 18:35, Hernán Morales Durand <[email protected]> 
> > wrote:
> >
> > Hi Sven!
> >
> > Yes, this is lot of requests to a public database named GenBank through a 
> > public API called "Entrez". I repeated the test today to send you a log of 
> > what's happening. The requests are done in batches of 500 uid's because 
> > they limit/penalize huge requests. Each uid is human meta-data, I have 
> > requested 20,000 samples, and Argentina has one of the worst connections in 
> > the world.
> 
> I see that you do 10 request concurrently, correct ? In different 
> threads/processes ?
> 
> 
> Mmmm no, I wrote #splitDownload methods to handle batch requests 
> (historically NCBI limits to 3 requests per second)... This is the code 
> (entrezUrlUidLimit = 500)

Ah, I think I know what happened. You probably called #logToTranscript multiple 
times with an older version of Zinc. It seems as if every log event is 
displayed multiple times. The code should read:

ZnLogEvent class>>#logToTranscript
        self stopLoggingToTranscript.
        ^ self announcer when: ZnLogEvent do: [ :event | Transcript crShow: 
event ].

ZnLogEvent class>>stopLoggingToTranscript
        self announcer unsubscribe: self

> >>splitDownload
> 
>     | index splittedRs |
> 
>     splittedRs := OrderedCollection new: self entrezUrlUidLimit.    
>     index := 1.
>     self uids do: [: id  |
>             splittedRs add: id.
>             index \\ self entrezUrlUidLimit = 0
>                 ifTrue: [
>                     self bioLog: 'Requesting ' , splittedRs size asString , ' 
> records to Entrez'.
>                     self results add: (self genBankFetchRecords: splittedRs).
>                     splittedRs := OrderedCollection new: self 
> entrezUrlUidLimit ].
>             index := index + 1 ]
>         displayingProgress: 'Downloading...' translated.
>     " Add remaining records "
>     self bioLog: 'Requesting ' , splittedRs size asString , ' records to 
> Entrez'.
>     self results add: (self genBankFetchRecords: splittedRs).
>     ^ self results
> 
> Then I just isolated the code in Pharo 5 and did a break (Alt + . on Windows).
> 
> 2016-08-19 04:06:33 Requesting 500 records to Entrez
> 2016-08-19 04:06:33 Executing...BioEFetchSeq
> 2016-08-19 04:07:17 Requesting 500 records to Entrez
> 2016-08-19 04:07:17 Executing...BioEFetchSeq
> 2016-08-19 04:08:20 Requesting 500 records to Entrez
> 2016-08-19 04:08:20 Executing...BioEFetchSeq
> 2016-08-19 04:09:28 Requesting 500 records to Entrez
> 2016-08-19 04:09:28 Executing...BioEFetchSeq
> 2016-08-19 04:10:34 Requesting 500 records to Entrez
> 2016-08-19 04:10:34 Executing...BioEFetchSeq
> 2016-08-19 04:12:09 Requesting 500 records to Entrez
> 2016-08-19 04:12:09 Executing...BioEFetchSeq
> 2016-08-19 04:14:05 Requesting 500 records to Entrez
> 2016-08-19 04:14:05 Executing...BioEFetchSeq
> 2016-08-19 04:14:54 Requesting 500 records to Entrez
> 2016-08-19 04:14:54 Executing...BioEFetchSeq
> 2016-08-19 04:15:58 Requesting 500 records to Entrez
> 2016-08-19 04:15:58 Executing...BioEFetchSeq
> 2016-08-19 04:17:38 Requesting 500 records to Entrez
> 2016-08-19 04:17:38 Executing...BioEFetchSeq
> 2016-08-19 04:19:12 Requesting 500 records to Entrez
> 2016-08-19 04:19:12 Executing...BioEFetchSeq
> 2016-08-19 04:21:01 Requesting 500 records to Entrez
> 2016-08-19 04:21:01 Executing...BioEFetchSeq
> 2016-08-19 04:22:41 Requesting 500 records to Entrez
> 2016-08-19 04:22:41 Executing...BioEFetchSeq
> 2016-08-19 04:24:17 Requesting 500 records to Entrez
> 2016-08-19 04:24:17 Executing...BioEFetchSeq
> 2016-08-19 04:25:29 Requesting 500 records to Entrez
> 2016-08-19 04:25:29 Executing...BioEFetchSeq
> 2016-08-19 04:26:34 Requesting 500 records to Entrez
> 2016-08-19 04:26:34 Executing...BioEFetchSeq
> 2016-08-19 04:28:48 Requesting 500 records to Entrez
> 2016-08-19 04:28:48 Executing...BioEFetchSeq
> 2016-08-19 04:29:37 Requesting 500 records to Entrez
> 2016-08-19 04:29:37 Executing...BioEFetchSeq
> 2016-08-19 04:30:53 Requesting 500 records to Entrez
> 2016-08-19 04:30:53 Executing...BioEFetchSeq
> 2016-08-19 04:32:43 Requesting 500 records to Entrez
> 2016-08-19 04:32:43 Executing...BioEFetchSeq
> 2016-08-19 04:33:56 Requesting 500 records to Entrez
> 2016-08-19 04:33:56 Executing...BioEFetchSeq
> 2016-08-19 04:34:29 Requesting 500 records to Entrez
> 2016-08-19 04:34:29 Executing...BioEFetchSeq
> 
> And now Transcript displays the intervals (all at once of course), something 
> weird is happening there.

It is really hard for me to see the context, which is probably complex. Unless 
you isolate the problem for me so that I can run it in a standard image, I 
won't be able to say much more.

> I tried the first request that I saw,
> 
> ZnClient new url: 
> 'http://eutils.ncbi.nlm.nih.gov:80/entrez/eutils/efetch.fcgi?retmode=xml&id=188847,188840,188835,188827,188833,188831,5919044,5919041,188829,188824,188816,188841,188828,188822,188809,188838,188836,5919066,5919018,188826,188807,5919042,188812,5919052,5919028,188832,188830,188820,188823,188819,5919056,188784,5919051,5919010,188814,5919036,5919011,188815,188794,5919043,5919009,5919004,188802,5919040,5919003,5919006,5919001,5919037,5919020,188844,188782,188779,5919067,5919045,5918994,188783,5919062,5919019,5918993,188743,188734,5919013,5919000,5918987,188817,5919046,5919007,5918996,5919032,5919029,5919027,5919021,188719,188795,188785,188788,188780,5919026,5919024,5919023,5919022,5919017,5919002,5919064,5918997,5919059,5919048,5919005,188781,5919055,5919047,5919033,5919030,5919025,5918992,188739,5919016,188732,188731,5919035,5918991,5918990,188834,188774,188763,5919031,5919012,5918989,188825,188745,5919015,5918995,188715,188775,188730,188718,5919049,188821,188803,5919065,5919050,5919034,188714,5918988,188720,5919063,5919054,5919014,5919008,188797,5918999,5918998,188717,5919057,188750,188723,188758,188769,188752,188751,188724,188760,188754,5919060,188759,188757,188756,5919061,188771,188736,5919038,188761,5919053,188735,188786,5919058,188818,188753,188733,188845,188755,188837,188777,188725,188846,188744,1002820005,999844743,941582049,941581974,941581869,941581732,941581138,941580894,941580699,850201272,653603877,653603608,653603509,653603323,653603248,653602787,827529357,817037023,807985587,807985574,807985096,806641362,735659021,735658976,735658957,735658862,735658829,735658814,735658794,735658792,735658766,735658757,735658751,695118655,695118621,695118612,695118403,695118088,695118067,700283821,700282771,700281917,575524695,550825751,556709823,556709753,667060998,568246384,610260256,610260144,430766974,397187538,397181812,570436518,374089790,374089754,374089025,374089020,374088999,374088998,374088995,549444460,410062763,472836119,408775754,408775351,536189361,513791441,513791357,513791301,513791217,513790965,513790629,513790321,513790223,513788781,461681792,378744435,401666731,401666186,302377629,302377601,392496866,392496829,392496762,392496587,392496297,392496261,392496213,392496202,392496104,392495779,392495588,381261925,381261506,381253498,326200539,339745356,339744796,339744703,339744578,339744577,305656401,305656233,305656149,305656121,305655967,305655953,305655925,334189684,320339399,329741913,290758881,305387591,318039611,289600384,288806465,288806339,288805758,224474796,224474781,209916429,223015240,223015212,223015170,223015156,223012454,223009682,205278793,189396910,189175349,189175200,189175179,189175178,189175014,189175012,189174852,171675625,171674995,171674925,156077831,119034956,154100682,86450681,78775919,92109049,48596178,32891159,75905872,32348701,32348575,32348436,32348394,61287889,58333751,18845598,941581228,381273587,198445276,198445114,198445030,198444993,116009304,116009141,1046813254,1046813224,1046813183,1046813083,1046813056,1044327069,1043189398,961420771,961420766,961420707,961420612,961420499,961420495,961420492,961420468,961420465,961420273,961420254,961420206,961420158,914343336,1036141119,1031954968,1025815253,1025813517,1025813307,1024312154,961391305,961391239,961391204,961391190,961391164,961390978,1002820131,1002820047,1002819991,1002819935,1002819837,1002552665,1002552627,1002552595,1002552503,999844882,999844783,999844777,999844747,999844727,998453634,998453620,982278317,941583102,941582715,941582531,941582178,941582015,941581822,941581789,941581731,941581674,941581539,941581502,941581448,941581294,941581131,941581007,941580980,941580783,957761438,821345416,255367289,255367177,255367163,255366085,891497898,850203159,850203157,850202660,850202610,850202608,850202194,653604148,653604144,653604143,653604135,653604084,653604022,653604006,653603906,653603905,653603896,653603888,653603850,653603838,653603811,653603724,653603559,653603399,653603328,653603288,653603266,653603234,653603186,653603183,653603163,653603162,653603154,653603090,653602987,653602982,653602979,653602962,653602868,653602840,653602832,653602802,807985564,807985482,807985463,807985332,807985294,807985282,807985270,807985265,807985240,807985201,807985176,807985141,807985134,807985121,807985088,806642054,806641999,806641444,806641353,806641250,806641208,806641180,806641082,806640718,735659024,735658983,735658901,735658884,735658841,735658805,735658747,756762254,666695871,695118992,695118986,695118959,695118943,695118900,695118876,695118874,695118870,695118824,695118768,695118703,695118692,695118688,695118661,695118589,695118438,695118421,695118375,695118344,695118179,695118173,695118170&db=nuccore';
>  get; yourself.
> 
> Which resulted in about 11Mb XML, which you parse and convert I guess ?
> 
> 
> Yes, I have a parser for that in BioSmalltalk.
>  
> That would be between 100 and 200 Mb, at least, no ?
> 
> That already puts a heavy burden on the image/VM. Make sure to properly close 
> and cleanup everything as soon as you can.
> 
> And all this 40 times (500 x 40 = 20000) ?
> 
> I think that goes way over the 2Gb limit (which is rather theoretical, it is 
> more like 1.5).
> 
> Unless you clean up very well afterwards and only keep the minimal 
> useful/needed info in memory.
> 
> 
> This is the first time I try with human sequences (many samples there, we 
> normally do animals, bacteria and viruses which are considerably fewer), I 
> don't have the final numbers but I will serialize each downloaded file.
>  
> > Another problem I have is the Transcript writes everything once the 
> > exception is signaled :(
> > Is there a way to revert to the old behavior where each Transcript show: 
> > writes to Transcript in situ?
> 
> Do you block the UI thread (while running your main top level expression) ? 
> Try forking.
> 
> 
> I know, but I don't want my users opening and closing critical windows while 
> downloading data causing more disasters. 
> Is there a way to rollback to older behavior where a Transcript show: had 
> higher priority? I don't know why this was changed, it is *critical* to see 
> what happens in real time!

To see what I mean, try both of the following expressions with a clean 
Transcript:

1 to: 10 do: [ :n | ('Now doing ', n asString) crLog. 1 second wait ].

[ 1 to: 10 do: [ :n | ('Now doing ', n asString) crLog. 1 second wait ] ] fork.

If you want to signal progress, there are (better) options for that.

Have a look at ZnClientTest>>#testProgress or even 
MCHttpRepository>>#displayProgress:during: for examples.

>  
> > The ClosedConnection didn't signaled today, but OutOfMemory.
> 
> See above.
> 
> 
> Thanks Sven
> 
> Hernán


Reply via email to