I'm a moron, but that's a different issue. I fixed the readline/eachline
issue, but that didn't address the crashing problem. I did some
experimenting, though, and I think I fixed the problem.
I added free(str) at the end of each loop iteration to free the memory
allocated by parse_string. I parse each line, and for some reason my program
was holding onto the results, so memory usage crept up slowly until the
program crashed. Adding free(str) kept memory usage flat, and the program ran
through the entire file.
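
A minimal sketch of that pattern, for anyone who finds this thread later. It assumes the LightXML package (which is where parse_string, root, name, content and free come from) and a current Julia version, so it uses Symbol rather than symbol. collect_fields and the sample input are hypothetical and only for illustration; this is not the original program. As far as I understand, the document returned by parse_string lives in libxml2 memory that Julia's garbage collector does not reclaim, which is why it has to be freed explicitly.

```julia
using LightXML  # assumed dependency: provides parse_string, root, name, content, free

# Hypothetical helper: reads one XML element per line and collects name => text pairs.
function collect_fields(io::IO)
    fields = Dict{Symbol,String}()
    for l in eachline(io)             # eachline is the only thing reading from `io`
        isempty(strip(l)) && continue # skip blank lines
        doc = parse_string(l)         # allocates a libxml2 document
        try
            r = root(doc)
            # name() and content() return Julia strings, so they remain valid after free()
            fields[Symbol(name(r))] = content(r)
        finally
            free(doc)                 # release the document on every iteration
        end
    end
    return fields
end

# Made-up sample input, two elements on two lines
io = IOBuffer("<field1>a</field1>\n<field2>b</field2>\n")
fields = collect_fields(io)
```

Putting free in a finally block means the document is released even if one line fails partway through processing.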
On Thursday, January 28, 2016 at 3:38:45 PM UTC-5, Stefan Karpinski wrote:
>
> At best, you'll only see every other line, right? At worst, eachline may
> do some IO lookahead (i.e. read one line ahead) and this will do something
> even more confusing.
>
> On Thu, Jan 28, 2016 at 3:35 PM, Brandon Booth wrote:
>
>> No real reason. I was going back and forth between eachline(f) and for i
>> = 1:n to see if it worked for 1000 rows, then 10,000 rows, etc. I ended up
>> with a hybrid of the two. Will that matter much?
>>
>>
>> On Thursday, January 28, 2016 at 1:32:09 PM UTC-5, Diego Javier Zea wrote:
>>>
>>> Hi!
>>>
>>> Why are you using
>>>
>>> for line in eachline(f)
>>>     l = readline(f)
>>>
>>>
>>> instead of
>>>
>>> for l in eachline(f)
>>>
>>>
>>> ?
>>>
>>> Best
>>>
>>> On Thursday, January 28, 2016, 12:42:35 (UTC-3), Brandon Booth wrote:
I'm parsing an XML file that's about 30 GB and wrote the loop below to
parse it line by line. My code cycles through each line and builds a 1x200
dataframe that is appended to a larger dataframe. When the larger dataframe
gets to 1000 rows I stream it to an SQLite table. The code works for the
first 25 million or so lines (which equates to 125,000 or so records in the
SQLite table) and then freezes. I've tried it without the larger dataframe
but that didn't help.
Any suggestions to avoid crashing?
Thanks.
Brandon
The XML structure:
<field1>value</field1>
<field2>value</field2>
...
<field1>value</field1>
<field2>value</field2>
...
My loop:
f = open("contracts.xml","r")
readline(f)
n = countlines(f)
tic()
for line in eachline(f)
    l = readline(f)
    if startswith(l,"...")
        append!(df1,df)
        if size(df1,1) == 1000
            source = convertdf(df1)
            Data.stream!(source,sink)
            deleterows!(df1,1:1000)
        end
    else
        str = parse_string(l)
        r = root(str)
        df[symbol(name(r))] = string(content(r))
    end
end
close(f)