[OpenBD] Re: Memory Issue while looping over large file

Aaron J. White Thu, 12 Jan 2012 21:06:28 -0800

Hey guys,

Using cfflush was the solution. I was able to process the entire file
while using less memory for Tomcat and the VPS. I had to write a blog
post about this one.


Thanks again.

On Jan 12, 1:53 pm, Jamie MacDonald <[email protected]> wrote:
> Hi Aaron,
>
> Just before your loop's closing tag, can you try adding <cfflush />?
>
> I remember doing similar work with a very large file, and it came down
> to this issue. Even though you may not be writing much output, with the
> number of rows you mention, the whitespace characters inside that loop
> could still add up to a large amount. The engine may be saving all of
> this up, thinking it will be rendering a page, and that buffer causes
> the memory issue. cfflush empties the buffer more frequently so it
> doesn't fill up. That would be my initial thought.
>
> Jamie MacDonald.
>
> On 12/01/2012 19:20, Aaron J. White wrote:
>
> > cfloop over a file doesn't put the whole file in memory, just the current
> > line.
>
> > http://www.bennadel.com/blog/2011-Reading-In-File-Data-One-Line-At-A-...
>
> > On Jan 12, 1:13 pm, Alex Skinner<[email protected]>  wrote:
> >> I think basically you don't want to hold the whole file in memory; there is
> >> no reason to. Try the code I provided, and instead of outputting the line,
> >> just output a counter, e.g.
> >> 1
> >> 2
> >> 3
> >> 4
> >> 5
> >> 6
> >> 7
> >> See if it barfs at the same line number.
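Alex's counter test, and Aaron's point that looping over a file holds only the current line, can both be sketched in Java with java.io.BufferedReader, the same class Alex's snippet creates through createObject. This is a sketch under assumptions: the class LineCounter and the reporting interval are hypothetical. Only one line is referenced at a time, and try-with-resources closes the reader even if an error occurs partway through.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LineCounter {
    /**
     * Reads the source one line at a time, printing a progress counter every
     * reportEvery lines (0 disables it). Returns the total number of lines.
     */
    static long countLines(Reader source, long reportEvery) throws IOException {
        long count = 0;
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader reader = new BufferedReader(source)) {
            while (reader.readLine() != null) { // null signals end of file
                count++;
                if (reportEvery > 0 && count % reportEvery == 0) {
                    System.out.println(count); // the last number printed shows how far it got
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Reader source = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0])) // a real file path, if given
                : new StringReader("line 1\nline 2\nline 3\n"); // small inline sample
        System.out.println("total lines: " + countLines(source, 100_000));
    }
}
```

With no argument, main prints "total lines: 3"; pointed at the real file, the last counter printed before a failure shows roughly where the run stopped.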
>
> >> A
>
> >> On 12 January 2012 19:09, Aaron J. White<[email protected]>  wrote:
>
> >>> midstring and split taken from cflib
> >>>http://www.cflib.org/udf/MidString
> >>>http://www.cflib.org/udf/split
> >>> On Jan 12, 1:03 pm, "Aaron J. White"<[email protected]>  wrote:
> >>>> Not really.
> >>>>          <cfset locals.startOfTitle = "<example_node>" />
> >>>>          <cfset locals.endOfTitle = "</example_node>" />
> >>>>          <cfloop index="locals.line" file="#locals.absFilePath#">
> >>>>                  <cfif locals.line DOES NOT CONTAIN locals.endOfTitle>
> >>>>                          <!--- add line to titleItem --->
> >>>>                          <cfset locals.titleItem &= locals.line />
> >>>>                          <cfset application.import.lineCount += 1 />
> >>>>                          <cfif application.import.stop>
> >>>>                                  <cfabort />
> >>>>                          </cfif>
> >>>>                  <cfelse>
> >>>>                          <cfset locals.titleItem &= locals.line />
> >>>>                          <cfset application.import.lineCount += 1 />
> >>>>                          <!--- we hit the end of a title. first get extra chars from the back; we'll need those later --->
> >>>>                          <cfset locals.tempArr = application.utility.split(locals.titleItem, locals.endOfTitle) />
> >>>>                          <cfset locals.tempItem = locals.tempArr[arraylen(locals.tempArr)] & "" />
> >>>>                          <!--- now get everything in the middle of the nodes --->
> >>>>                          <cfset locals.titleItem = locals.startOfTitle & application.utility.midstring(locals.titleItem, locals.startOfTitle, locals.endOfTitle) & locals.endOfTitle />
> >>>>                          <!--- convert title item to xml object --->
> >>>>                          <cfset locals.titleXml = xmlparse(locals.titleItem) />
> >>>>                          <!--- we have our node. prepare titleItem text for next iteration --->
> >>>>                          <cfset locals.titleItem = locals.tempItem />
> >>>>                          <cfif application.import.stop>
> >>>>                                  <cfabort />
> >>>>                          <cfelse>
> >>>>                                  <!--- process the title xml and add required info to the database --->
> >>>>                                  <cfset processTitleItem(locals.titleXml) />
> >>>>                          </cfif>
> >>>>                  </cfif>
> >>>>          </cfloop>
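The carry-over step in the loop above (split on the end tag, keep the trailing text as the start of the next node, extract everything between the tags) can be sketched in Java. This is a hypothetical re-expression, not Aaron's code: it uses StringBuilder.indexOf in place of the CFLib split and midString UDFs, and like the original it assumes the start tag appears literally, without attributes. Each complete node is handed to a callback, so only the text after the last end tag is retained between lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical streaming extractor: feed it lines, and it calls onNode with
 * each complete startTag...endTag fragment.
 */
public class NodeExtractor {
    private final String startTag;
    private final String endTag;
    private final Consumer<String> onNode;
    private final StringBuilder pending = new StringBuilder();

    NodeExtractor(String startTag, String endTag, Consumer<String> onNode) {
        this.startTag = startTag;
        this.endTag = endTag;
        this.onNode = onNode;
    }

    void accept(String line) {
        pending.append(line); // lines are joined without separators, as in the original loop
        int end;
        // one line can close more than one node, so keep going until none remain
        while ((end = pending.indexOf(endTag)) >= 0) {
            int start = pending.indexOf(startTag);
            int stop = end + endTag.length();
            if (start >= 0 && start < end) {
                onNode.accept(pending.substring(start, stop));
            }
            // discard through the end tag; what is left carries into the next node
            pending.delete(0, stop);
        }
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>();
        NodeExtractor extractor = new NodeExtractor("<example_node>", "</example_node>", nodes::add);
        String[] lines = {
            "<root>",
            "<example_node><title>A</title>",
            "</example_node><example_node>",
            "<title>B</title></example_node>",
            "</root>"
        };
        for (String line : lines) {
            extractor.accept(line);
        }
        nodes.forEach(System.out::println);
    }
}
```

Running main prints the two example_node fragments, one per line; in a real import the callback would parse and process each node instead of collecting them in a list.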
> >>>> On Jan 12, 12:43 pm, Alex Skinner<[email protected]>  wrote:
> >>>>> Seeing some code would be good. How are you doing the read?
> >>>>> I googled and found something like this:
> >>>>> <cfscript>
> >>>>> // Define the file to read, use forward slashes only
> >>>>> FileName="C:/Example/ReadMe.txt";
> >>>>> // Initialize Java file IO
> >>>>> FileIOClass=createObject("java","java.io.FileReader");
> >>>>> FileIO=FileIOClass.init(FileName);
> >>>>> LineIOClass=createObject("java","java.io.BufferedReader");
> >>>>> LineIO=LineIOClass.init(FileIO);
> >>>>> </cfscript>
> >>>>> <CFSET EOF=0>
> >>>>> <CFLOOP condition="NOT EOF">
> >>>>>      <!--- Read in next line --->
> >>>>>      <CFSET CurrLine=LineIO.readLine()>
> >>>>>      <!--- If CurrLine is not defined (readLine() returned null), we have reached the end of file --->
> >>>>>      <CFIF NOT IsDefined("CurrLine")>
> >>>>>          <CFSET EOF=1>
> >>>>>          <CFBREAK>
> >>>>>      </CFIF>
> >>>>>      <CFOUTPUT>#CurrLine#<br></CFOUTPUT><CFFLUSH>
> >>>>> </CFLOOP>
> >>>>> <!--- Close the reader (and the underlying FileReader) to release the file handle --->
> >>>>> <CFSET LineIO.close()>
> >>>>> Is your solution similar ?
> >>>>> A
> >>>>> On 12 January 2012 17:57, Aaron J. White<[email protected]>  wrote:
> >>>>>> Hey all,
> >>>>>> I am receiving an OutOfMemory error while running a script that is
> >>>>>> trying to loop over a 1.2 GB+ XML file (~12 million lines). I'm not
> >>>>>> really sure whether what I am doing is just horrible and there is a
> >>>>>> better way, or whether it is a memory issue in OpenBD.
> >>>>>> I have assigned Tomcat 2 GB max memory. While I'm running the script I
> >>>>>> can see the memory usage slowly creep up in Task Manager. With 4 GB of
> >>>>>> RAM on the VPS I get to about 7 million lines before Tomcat gives up.
> >>>>>> When I had 3 GB of RAM on the server and 1 GB assigned to Tomcat, I
> >>>>>> could only get to about 4 million lines.
> >>>>>> Here's the logic behind what I am doing.
> >>>>>> I am interested in one particular node in the large file, so I loop
> >>>>>> over the file line by line. As I loop, if the line does not contain
> >>>>>> the end of the node I'm looking for, then I do
> >>>>>> <cfset locals.exampleNode &= locals.line />.
> >>>>>> Once I hit a line that contains the end of the node (</example_node>),
> >>>>>> I do a few operations to clean up any extra text from the front and
> >>>>>> back of the node string and then convert it to XML with xmlparse.
> >>>>>> Once I have the node as XML, I push it to another function that does
> >>>>>> several things:
> >>>>>> ** It uses XPath to grab particular information from the node. Seven
> >>>>>> XPath searches are done on each node, unless I decide to skip the node
> >>>>>> after the first two XPath searches.
> >>>>>> ** Depending on the content, I either add the information to my
> >>>>>> database, update the information, or skip it. I have about 5 tables
> >>>>>> that are getting modified by the script. A few of the unimportant
> >>>>>> queries use background="yes".
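The per-node step described above (several XPath lookups, with an early exit after the first two) might look like this in Java using the standard javax.xml APIs. This is a sketch under stated assumptions: the class NodeProcessor, the element names id and status, and the "deleted" skip rule are hypothetical placeholders, and the database writes are left out. Compiling the expressions once and reusing them avoids recompiling the same XPath strings for every node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NodeProcessor {
    private final DocumentBuilder builder;
    // Compiled once and reused for every node (element names are hypothetical)
    private final XPathExpression idExpr;
    private final XPathExpression statusExpr;

    NodeProcessor() throws Exception {
        builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        XPath xpath = XPathFactory.newInstance().newXPath();
        idExpr = xpath.compile("/example_node/id");
        statusExpr = xpath.compile("/example_node/status");
    }

    /** Parses one node; returns its id if it should be kept, or null to skip it. */
    String process(String nodeXml) throws Exception {
        Document doc = builder.parse(new InputSource(new StringReader(nodeXml)));
        String id = idExpr.evaluate(doc);         // first lookup
        String status = statusExpr.evaluate(doc); // second lookup
        if (id.isEmpty() || "deleted".equals(status)) {
            return null; // skip: the remaining lookups never run
        }
        // ...the remaining XPath lookups and the insert/update would go here
        return id;
    }

    public static void main(String[] args) throws Exception {
        NodeProcessor p = new NodeProcessor();
        System.out.println(p.process("<example_node><id>42</id><status>active</status></example_node>"));
        System.out.println(p.process("<example_node><id>7</id><status>deleted</status></example_node>"));
    }
}
```

Running main prints "42" for the kept node and "null" for the skipped one.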
> >>>>>> The whole script runs in a cfthread so it doesn't time out.
> >>>>>> Can anyone give any insight? I could also post a code example, but
> >>>>>> my script is about 600 lines long.
> >>>>>> --
> >>>>>> online documentation:http://openbd.org/manual/
> >>>>>>    google+ hints/tips:https://plus.google.com/115990347459711259462
> >>>>>>    http://groups.google.com/group/openbd?hl=en
> >>>>>>      Join us @http://www.OpenCFsummit.org/Dallas, Feb 2012
> >>>>> --
> >>>>> Alex Skinner
> >>>>> Managing Director
> >>>>> Pixl8 Interactive
> >>>>> Tel: +448452600726
> >>>>> Email: [email protected]
> >>>>> Web: pixl8.co.uk
>
> --
> aw2.0
>    http://www.aw20.co.uk/


