Hello Mark,

Thank you for the explanation; it is very helpful.
--- MBOX file format

Yes, as you also suggest, I am already reading the MBOX file in chunks, which are separated by the string CR & "From ", as defined for that file format. So it goes: "read from file <filename> at <position> until <string>". The only drawback is that there is no such string at the end of the file, so the last message needs a different way of reading, but that is possible.

Another way, as you suggest, is reading line by line and checking each line for that string to separate the messages. I do not know yet which will be more efficient in terms of speed. I will be testing both.

--- Checking available physical memory (in RAM, not on disk)

Another good approach would be to check the available amount of *physical* memory. That way one could limit the size of the chunks read into memory, and processing would be straightforward and fast, provided one also knows the limitations of the OS (32-bit or 64-bit, available RAM, etc., all as you suggested). Is there a function in LiveCode to find out the available physical memory? I could not find one yet.

--- Reading backwards in a file

> Well, reading backwards in that way is equivalent to knowing how long the
> file is:
>
>   read ... at -1000 until EOF
>
> is the same as
>
>   read ... at (fileSize - 1000) until EOF

By reading backwards I meant starting from EOF, or from any other position, and moving the file pointer backwards char by char to some earlier position. The syntax could be: "read from file <filename> at <position> down to <position>". But I am not sure there are many use cases for this.

--- Storing a large number of messages

You are right about storing the retrieved messages in a database. It is the best way, and it is what I was preparing to do, as it is obviously the only solution that makes sense for such large amounts of data. Only then does it become easy to do all kinds of post-processing. I will be using both: SQLite, and later a remote database system.

--- The detailed files

I was not aware of the "the detailed files" function. Something new I learned.
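As a rough illustration of two of the ideas above (splitting an MBOX file on the return & "From " boundary, and getting a file's size from 'the detailed files'), here is a minimal, untested sketch in LiveCode script. The handler names `processMbox`, `fileSizeOf` and `storeMessage` are hypothetical; `storeMessage` stands in for whatever writes one message to SQLite. The sketch assumes that a `read ... until` which reaches the end of the file sets `the result` to "eof" and leaves the remaining data in `it`, which would take care of the missing boundary after the last message.

```livecode
-- Untested sketch (LiveCode script). All handler names are hypothetical.

-- Size in bytes of a file, looked up in 'the detailed files'.
-- Each line there is: URL-encoded name, data size, resource size, ...
function fileSizeOf pPath
   local tOldFolder, tName, tList, tLine
   put the defaultFolder into tOldFolder
   set the itemDelimiter to slash
   set the defaultFolder to item 1 to -2 of pPath
   put item -1 of pPath into tName
   set the itemDelimiter to comma
   put the detailed files into tList
   set the defaultFolder to tOldFolder
   repeat for each line tLine in tList
      if urlDecode(item 1 of tLine) is tName then
         return item 2 of tLine
      end if
   end repeat
   return empty
end fileSizeOf

-- Walk an MBOX file one message at a time. Each read stops just after
-- the next return & "From " boundary, so roughly one message is held
-- in memory at a time. storeMessage is a hypothetical handler, e.g.
-- one that inserts the message into an SQLite database.
command processMbox pPath
   local tBoundary, tReadResult, tChunk, tKeep, tMessage, tIsLast
   put return & "From " into tBoundary
   open file pPath for read -- text mode: line endings become return
   if the result is not empty then
      return "cannot open" && pPath & ":" && the result
   end if
   put empty into tMessage
   repeat forever
      read from file pPath until tBoundary
      put the result into tReadResult
      put it into tChunk
      if tReadResult is not empty and tReadResult is not "eof" then
         close file pPath
         return "read error:" && tReadResult
      end if
      -- No boundary follows the last message: that read stops at the
      -- end of the file with the result "eof" and the remainder in it.
      put (tReadResult is "eof") into tIsLast
      if not tIsLast then
         -- drop the boundary that ended this read
         put the number of chars of tChunk - the number of chars of tBoundary into tKeep
         put char 1 to tKeep of tChunk into tChunk
      end if
      put tChunk after tMessage
      if tMessage is not empty then storeMessage tMessage
      if tIsLast then exit repeat
      -- the "From " just consumed starts the next message's first line
      put "From " into tMessage
   end repeat
   close file pPath
   return empty
end processMbox
```

Called as `processMbox "/path/to/archive.mbox"` (placeholder path), it would return empty on success and an error string otherwise; `fileSizeOf` could be used beforehand, for example, to decide whether a file is small enough to read in one go. Because each `read` on an open file continues from where the previous one stopped, no explicit `at <position>` bookkeeping is needed in this variant.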
Thank you again. I checked the dictionary, and it could be much more explicit about this function: searching for "detailed" only finds the keyword "detailed", and searching for "detailed files" finds nothing. But I found a good explanation in the forums. Maybe it is worth writing an enhancement request to document this function in the LiveCode dictionary.

Cheers to all,
Roland

On 22 March 2016 at 14:16, Mark Waddingham <m...@livecode.com> wrote:

> On 2016-03-22 12:45, Roland Huettmann wrote:
>
>> How to know how much we can read into memory? Is there any function to
>> know this? Is there a size limit for variables?
>>
>
> LiveCode has a limit of 2Gb characters for strings but that depends on how
> much memory a single process can have on your system.
>
> On 32-bit systems, you're generally limited to 768Mb-1Gb contiguous block
> of memory (32-bit Windows has an address space of 3Gb for a user process
> which also has to include all mapped resources such as executables and
> shared libraries; Mac has a user process address space of 4Gb which also
> has to include all mapped resources so you can generally get up to around
> 1.5Gb contiguous allocated memory block).
>
> On 64-bit systems then you should be able to have many 2Gb strings (or
> similar in LiveCode), although obviously how fast they will operate will
> depend on the amount of physical RAM in the machine, with disk-paged
> virtual memory taking up the slack.
>
>> It is not possible to read backwards - which could be a nice way of
>> reading a file in some special cases. So "read from file fName at eof
>> until -1000" does not work.
>>
>
> Well, reading backwards in that way is equivalent to knowing how long the
> file is:
>
>   read ... at -1000 until EOF
>
> is the same as
>
>   read ... at (fileSize - 1000) until EOF
>
>> So, the only way of reading a very large file is reading a chunk of data
>> of n bytes (whatever is allowed in memory), processing this, and then
>> reading the next chunk until the remaining part of the file is small
>> enough to be read until eof.
>>
>
> For such a large file (38gb) your only solution is to read and parse it in
> chunks. MBOX files are a sequence of records, so you need to use a process
> which reads in blocks from the file when there is not enough data left to
> find the current record boundary - that way you only load into memory (at
> any one time) enough of the file to process completely the next record.
>
> In terms of finding the size of a file in LiveCode you can use 'the
> detailed files'.
>
> It is worth pointing out that using 'open file' and 'read from file' are
> *stream* based in approach. From memory, the MBOX format is essentially
> line-based, so you should be able to write a relatively simple parsing loop
> with that in mind:
>
>   open file ...
>   repeat forever
>     read from file ... until return
>     if the result is not empty then
>       exit repeat
>     end if
>     if *it is a new message boundary* then
>       ... finish processing current message ...
>       ... start processing new boundary ...
>     else
>       ... append line to current message ...
>     end if
>   end repeat
>
> Of course, one thing to bear in mind, is that with a 38Gb file you are
> never going to fit all of that into memory; so the best approach would
> probably be to parse your mail messages and then store them into a storage
> scheme which doesn't require everything to appear in memory at once - e.g.
> an sqlite db or a more traditional dbms, or even lots of discrete files in
> a filesystem in some suitable hierarchy.
>
> Warmest Regards,
>
> Mark.
>
> --
> Mark Waddingham ~ m...@livecode.com ~ http://www.livecode.com/
> LiveCode: Everyone can create apps
>
>
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode