Hi,
That sounds like a better approach to me too; however, if the problem
is that the file is > 2 GB (or whatever the limit is on Windows),
then it still won't work.
All the Best
Dave
On 5 Jul 2007, at 13:20, Andre Garzia wrote:
Alejandro,
if this is the kind of XML that has a simple record structure repeated
over and over again, like a phone book, then why don't you break it
into smaller, edible chunks and insert it, chunk by chunk, into
something like SQLite or Valentina? By using an RDBMS you'll be able
to query and make sense of the XML data easily, and those databases
have no problem dealing with large data sets.
Because even if you manage to load 8 GB of data into Rev, manipulating
it will be kind of slow, I think; just imagine the loops needed to make
cross-references like "find everyone who was born in July and is
between 30 and 40 years old"...
I'd write a small program to insert this into a database piece by
piece, and then work from the database from there on.
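A rough sketch of that import loop in Transcript might look like the
following. This is only an illustration: the table and column names,
the parseRecord handler, and the one-record-per-line assumption are all
made up, and it assumes the SQLite driver is available through the
revDB library (revOpenDatabase / revExecuteSQL):

  -- hypothetical sketch: chunked import of record-per-line XML into SQLite
  put revOpenDatabase("sqlite", "customers.db", , , , ) into tDBid
  revExecuteSQL tDBid, "CREATE TABLE people (name TEXT, born TEXT)"
  open file tXMLFile for read
  repeat forever
    read from file tXMLFile until return -- one record per line, by assumption
    if it is empty then exit repeat -- nothing left to read
    put parseRecord(it) into tRecord -- hypothetical handler: returns "name,born"
    put item 1 of tRecord into tName
    put item 2 of tRecord into tBorn
    revExecuteSQL tDBid, "INSERT INTO people VALUES (:1, :2)", "tName", "tBorn"
    if the result is "eof" then exit repeat
  end repeat
  close file tXMLFile
  revCloseDatabase tDBid

Once the data is in SQLite, Andre's "born in July, aged 30 to 40"
example becomes a single SELECT with a WHERE clause on the born column
instead of a hand-rolled loop over 8 GB of text.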
Andre
On 7/4/07, Alejandro Tejada <[EMAIL PROTECTED]> wrote:
Hi all,
Recently, I was extracting data from an 8 gigabyte ANSI text file
(an XML customer database), but after processing approximately
3.5 gigabytes of data, Revolution quit and Windows XP presented the
familiar dialog asking to notify the developer of this error.
The log file that I saved while using the stack shows that after
reading character 3,758,096,384 (more than 3.7 billion characters)
the stack could not read any further into the XML database and
started repeating the last line of text that it had read.
Note that I checked processor and memory use with the Windows Task
Manager and everything looked normal: the stack was using between
30 and 70% of the processor, and memory use stayed between
45 MB and 125 MB.
The code used is similar to this:

  put 1 into tCounter
  repeat until tCounter >= 8589934592 -- 8 gigabytes
    read from file tData at tCounter for 10000
    -- reads 10,000 characters from the database;
    -- the characters read are placed in the variable "it"
    put processDATA(it) into tProcessedData
    write tProcessedData to file tNewFile
    put tCounter && last line of it & cr after URL tLOG
    add 10000 to tCounter
  end repeat
  etc...
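One thing that might be worth trying (a sketch only, on the assumption
that the stall comes from passing a very large character offset to
read rather than from the data itself): open the file once, read
sequentially without an "at" position so the engine tracks the file
pointer, and check the result after every read:

  -- hypothetical sketch: sequential reads, no explicit large offsets
  open file tData for read
  open file tNewFile for append
  repeat forever
    read from file tData for 10000 chars
    if it is empty then exit repeat -- nothing left to read
    put processDATA(it) into tProcessedData
    write tProcessedData to file tNewFile
    if the result is "eof" then exit repeat -- end of file reached
  end repeat
  close file tData
  close file tNewFile

If the sequential version also stops near the same character, that
would point at a file-size limit rather than at the seek offset.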
I have repeated the test at least three times :(( and the results are
almost the same, with only a small difference in the character at
which the stack quits while reading this 8 gigabyte XML database.
I checked for strange characters in that part of the database after
splitting the file into many parts, but have not found any.
Any insight that you could provide on processing this database from
start to end is more than welcome. :)
Thanks in advance.
alejandro
Visit my site:
http://www.geocities.com/capellan2000/
_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your
subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution