Re: Divide Large Data Blob?

Bob Sneidar via use-livecode Mon, 16 May 2022 15:25:44 -0700

At most 7 recursions are needed to isolate a single instance of 100 possible 
values. 1000 requires a maximum of 10. 10000 values requires 14. The idea is 
that for every factor of 10, you need roughly 3 more recursions (log2 of 10 is 
about 3.3). This of course assumes the data is sorted, which in your case it 
effectively is: it falls into 3 contiguous regions. If you know the limits of 
how many lines can be garbage, and how many can be valid data, you narrow your 
scope significantly.
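
As a quick check of those figures, here is a minimal sketch in Python (chosen 
only because it runs anywhere; the thread itself contains no code). The helper 
name max_probes is hypothetical. It counts how many times a range of n 
candidates must be halved before a single one remains, which is the ceiling of 
log2(n).

```python
def max_probes(n):
    """Worst-case number of halvings needed to isolate one of n sorted values.

    (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, computed with
    integers only so there is no floating-point rounding to worry about.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1).bit_length()


for n in (100, 1000, 10000):
    print(n, max_probes(n))
# prints:
# 100 7
# 1000 10
# 10000 14
```

Going from 100 to 10000 values (a factor of 100) costs only 7 extra probes, 
which is why the line count barely matters for this approach.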


LiveCode is pretty damn quick at parsing this kind of data. If there are 
consistent delimiters (in this case a line break) then even 20 or 30 recursions 
is child's play. 
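
The divide and conquer idea can be sketched roughly as follows, again in 
Python rather than LiveCode. Everything here is an assumption for 
illustration: the names first_true and valid_range are hypothetical, is_valid 
stands in for whatever test tells junk from useful data, and the data is 
assumed to be laid out as junk, then valid lines, then junk, with at least one 
line (known_valid) already known to be valid. For data whose useful part is 
the middle third, the midpoint is a safe choice for that line.

```python
def first_true(lo, hi, pred):
    """Smallest i in [lo, hi] for which pred(i) is true.

    Assumes pred is false for a while and then true for the rest of the
    range, and that pred(hi) is true.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid        # boundary is at mid or earlier
        else:
            lo = mid + 1    # boundary is after mid
    return lo


def valid_range(lines, is_valid, known_valid):
    """Return (start, end), the 0-based inclusive indices of the valid block."""
    n = len(lines)
    # Left half: junk then valid, so look for the first valid line.
    start = first_true(0, known_valid, lambda i: is_valid(lines[i]))
    # Right half: valid then junk, so look for the first junk line.
    # Index n acts as a sentinel in case valid data runs to the very end.
    first_junk = first_true(
        known_valid, n, lambda i: i == n or not is_valid(lines[i])
    )
    return start, first_junk - 1


# Made-up sample data: 30 junk lines, 40 valid lines, 30 junk lines.
lines = ["junk"] * 30 + ["ok"] * 40 + ["junk"] * 30
start, end = valid_range(lines, lambda s: s == "ok", len(lines) // 2)
useful = lines[start:end + 1]
```

Each boundary takes only a handful of tests, after which the useful lines can 
be copied out in one step and searched on their own.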

Bob S


> On May 16, 2022, at 15:00 , Bob Sneidar via use-livecode 
> <use-livecode@lists.runrev.com> wrote:
> 
> Do you know exactly which lines you need to toss, or do you need to search 
> the data to find out where the beginning and end of the useful data are? 
> If the former, then just put line x to y of your data into a new variable. If 
> the latter, then a divide and conquer approach might be the answer. Get the 
> line 30% in, test for valid, get the line 40% in, test, then 35% then 32.5% 
> or 37.5% depending on your test. 
> 
> You may only have to do this a dozen or so times to find the exact line where 
> your valid data begins. 
> 
> The other way of course is to get it all into a SQL database (how did you all 
> know I was going to say that??) The downside is that you have to iterate 
> through all your data once. The upside is that a good one-line query 
> statement may be all you need to process your data. And if you need to make 
> multiple passes at your data, all the better. 
> 
> Bob S
> 
>> On May 16, 2022, at 10:46 , Rick Harrison via use-livecode 
>> <use-livecode@lists.runrev.com> wrote:
>> 
>> I have a large chunk of data that I want to
>> search as quickly as possible.  
>> 
>> Unfortunately the part I want to search is the 
>> middle third of the data.  The other thirds at 
>> the beginning and at the end are just junk and 
>> slow down my search so I want to get rid of them.
>> 
>> I don’t want to search line by line as that
>> takes way too long.
>> 
>> There’s no unique character dividing any
>> of these data regions.
>> 
>> What’s the best way to do this?
>> 
>> Thanks in advance!
>> 
>> Rick
>> 
>> 
>> 
>> _______________________________________________
>> use-livecode mailing list
>> use-livecode@lists.runrev.com
>> Please visit this url to subscribe, unsubscribe and manage your subscription 
>> preferences:
>> http://lists.runrev.com/mailman/listinfo/use-livecode
> 
