Re: [OpenBD] Re: Memory Issue while looping over large file

Benjamin Davis Thu, 12 Jan 2012 11:57:21 -0800

Interesting point, Jamie.  To go along with this: instead of using cfflush
to push the whitespace to the browser, you can
use <cfscript>getPageContext().getOut().clearBuffer();</cfscript> to discard
whatever is sitting in the buffer.  This works great for me on an app that
does a fair amount of background work before each page.
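
Separately from the buffer question, the accumulate-until-the-end-tag loop
quoted further down can be sketched in plain Java. This is only a rough
illustration, not anyone's actual code from this thread: the class name
NodeStreamer, the method streamNodes and the handler callback are all made up
for the example, and it assumes the start and end tags appear literally in
the text. The point it shows is that only the unfinished tail of the current
node stays in memory, never the whole file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper, named for illustration only.
final class NodeStreamer {

    // Reads the source line by line, passes every complete
    // startTag...endTag block to the handler, and returns how many
    // blocks were handed over.
    static int streamNodes(Reader source, String startTag, String endTag,
                           Consumer<String> handler) {
        int count = 0;
        // Holds only the text that has not yet been turned into a node.
        StringBuilder pending = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // readLine() drops the line terminator, so lines are
                // joined without a separator, as in the CFML loop.
                pending.append(line);
                int end;
                while ((end = pending.indexOf(endTag)) >= 0) {
                    int start = pending.indexOf(startTag);
                    int stop = end + endTag.length();
                    if (start >= 0 && start < end) {
                        handler.accept(pending.substring(start, stop));
                        count++;
                    }
                    // Drop everything up to and including the end tag;
                    // any trailing text is kept for the next node.
                    pending.delete(0, stop);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

The handler is where the XML parsing and database work would go. Whatever
that handler keeps a reference to will still pile up, so this only rules out
the read loop itself as the source of the growth.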


Ben

On Thu, Jan 12, 2012 at 12:53 PM, Jamie MacDonald <[email protected]> wrote:

> Hi Aaron,
>
> Just before your loop's closing tag, can you try adding <cfflush />?
>
> I remember doing similar work with a very large file, and it came down to
> this issue. Even though you may not be writing a great deal of output, with
> the number of rows you mention the whitespace characters inside that loop
> could still add up to a large amount. The engine may be saving all of this
> up, thinking it will be rendering a page, and that buffer causes the memory
> issue. cfflush sends the buffered output to the client more frequently, so
> the buffer never becomes full. That would be my initial thought for this.
>
> Jamie MacDonald.
>
> On 12/01/2012 19:20, Aaron J. White wrote:
>
>> Cfloop over a file doesn't put the file in memory. Just the current
>> line.
>>
>> http://www.bennadel.com/blog/2011-Reading-In-File-Data-One-Line-At-A-Time-Using-ColdFusion-s-CFLoop-Tag-Or-Java-s-LineNumberReader.htm
>>
>> On Jan 12, 1:13 pm, Alex Skinner<[email protected]>  wrote:
>>
>>> I think basically you don't want to hold the whole file in memory; there
>>> is no reason to. Try the code I provided and, instead of outputting the
>>> line, just output a counter, e.g.
>>> 1
>>> 2
>>> 3
>>> 4
>>> 5
>>> 6
>>> 7
>>> See if it barfs at the same line number
>>>
>>> A
>>>
>>> On 12 January 2012 19:09, Aaron J. White<[email protected]>  wrote:
>>>
>>>> midstring and split taken from cflib
>>>> http://www.cflib.org/udf/MidString
>>>> http://www.cflib.org/udf/split
>>>> On Jan 12, 1:03 pm, "Aaron J. White"<[email protected]>  wrote:
>>>>
>>>>> Not really.
>>>>>     <cfset locals.startOfTitle = "<example_node>" />
>>>>>     <cfset locals.endOfTitle = "</example_node>" />
>>>>>     <cfloop index="locals.line" file="#locals.absFilePath#">
>>>>>         <cfif locals.line DOES NOT CONTAIN locals.endOfTitle>
>>>>>             <!--- add line to titleItem --->
>>>>>             <cfset locals.titleItem &= locals.line />
>>>>>             <cfset application.import.lineCount += 1 />
>>>>>             <cfif application.import.stop>
>>>>>                 <cfabort />
>>>>>             </cfif>
>>>>>         <cfelse>
>>>>>             <cfset locals.titleItem &= locals.line />
>>>>>             <cfset application.import.lineCount += 1 />
>>>>>             <!--- we hit the end of a title. first get extra chars from back. we'll need those later --->
>>>>>             <cfset locals.tempArr = application.utility.split(locals.titleItem, locals.endOfTitle) />
>>>>>             <cfset locals.tempItem = locals.tempArr[arraylen(locals.tempArr)] & "" />
>>>>>             <!--- now get everything in middle of nodes --->
>>>>>             <cfset locals.titleItem = locals.startOfTitle & application.utility.midstring(locals.titleItem, locals.startOfTitle, locals.endOfTitle) & locals.endOfTitle />
>>>>>             <!--- convert title item to xml object --->
>>>>>             <cfset locals.titleXml = xmlparse(locals.titleItem) />
>>>>>             <!--- we have our node. prepare titleItem text for next iteration --->
>>>>>             <cfset locals.titleItem = locals.tempItem />
>>>>>             <cfif application.import.stop>
>>>>>                 <cfabort />
>>>>>             <cfelse>
>>>>>                 <!--- process the title xml and add required info to the database --->
>>>>>                 <cfset processTitleItem(locals.titleXml) />
>>>>>             </cfif>
>>>>>         </cfif>
>>>>>     </cfloop>
>>>>> On Jan 12, 12:43 pm, Alex Skinner<[email protected]>  wrote:
>>>>>
>>>>>> Seeing some code would be good. How are you doing the read?
>>>>>> I googled and found something like this:
>>>>>> <cfscript>
>>>>>> // Define the file to read, use forward slashes only
>>>>>> FileName="C:/Example/ReadMe.txt";
>>>>>> // Initialize Java File IO
>>>>>> FileIOClass=createObject("java","java.io.FileReader");
>>>>>> FileIO=FileIOClass.init(FileName);
>>>>>> LineIOClass=createObject("java","java.io.BufferedReader");
>>>>>> LineIO=LineIOClass.init(FileIO);
>>>>>> </cfscript>
>>>>>> <CFSET EOF=0>
>>>>>> <CFLOOP condition="NOT EOF">
>>>>>>     <!--- Read in next line --->
>>>>>>     <CFSET CurrLine=LineIO.readLine()>
>>>>>>     <!--- If CurrLine is not defined, we have reached the end of file --->
>>>>>>     <CFIF IsDefined("CurrLine") EQ "NO">
>>>>>>         <CFSET EOF=1>
>>>>>>         <CFBREAK>
>>>>>>     </CFIF>
>>>>>>     <CFOUTPUT>#CurrLine#<br></CFOUTPUT><CFFLUSH>
>>>>>> </CFLOOP>
>>>>>> Is your solution similar?
>>>>>> A
>>>>>> On 12 January 2012 17:57, Aaron J. White<[email protected]> wrote:
>>>>>>
>>>>>>> Hey all,
>>>>>>> I am receiving an OutOfMemory error while running a script that is
>>>>>>> trying to loop over a 1.2 GB+ XML file (~12 million lines). I'm not
>>>>>>> really sure if what I am doing is just horrible and there is a better
>>>>>>> way, or if it is a memory issue in OpenBD.
>>>>>>> I have assigned Tomcat 2 GB max memory. While I'm running the script I
>>>>>>> can see the memory usage slowly creep up in Task Manager. With 4 GB of
>>>>>>> RAM on the VPS I get to about 7 million lines before Tomcat gives up.
>>>>>>> When I had 3 GB of RAM on the server and 1 GB applied to Tomcat I could
>>>>>>> only get to about 4 million lines.
>>>>>>> Here's the logic behind what I am doing.
>>>>>>> I am interested in one particular node in the large file, so I loop
>>>>>>> over the file line by line. As I loop, if the line does not contain the
>>>>>>> end of the node I'm looking for, then I
>>>>>>> <cfset locals.exampleNode &= locals.line />
>>>>>>> Once I hit a line that contains the end of the node
>>>>>>> (</example_node>), I do a few operations to clean up any extra text
>>>>>>> from the front and back of the node string and then convert it to XML
>>>>>>> with xmlparse.
>>>>>>> Once I have the node as XML I push it to another function that does
>>>>>>> several things.
>>>>>>> ** Uses XPath to grab particular information from the node. Seven
>>>>>>> XPath searches are done on each node unless I decide to skip the node
>>>>>>> after the first two XPath searches.
>>>>>>> ** Depending on the content I either add the information to my
>>>>>>> database, update the information, or skip it. I have about 5 tables
>>>>>>> that are getting modified from the script. A few of the unimportant
>>>>>>> queries use background="yes".
>>>>>>> The whole script runs in a cfthread so it doesn't time out.
>>>>>>> Can anyone give any insight? Also, I could post some code examples,
>>>>>>> but my script is about 600 lines long.
>>>>>>> --
>>>>>>> online documentation: http://openbd.org/manual/
>>>>>>>   google+ hints/tips: https://plus.google.com/115990347459711259462
>>>>>>>     http://groups.google.com/group/openbd?hl=en
>>>>>>>     Join us @ http://www.OpenCFsummit.org/ Dallas, Feb 2012
>>>>>>>
>>>>>> --
>>>>>> Alex Skinner
>>>>>> Managing Director
>>>>>> Pixl8 Interactive
>>>>>> Tel: +448452600726
>>>>>> Email: [email protected]
>>>>>> Web: pixl8.co.uk
>>>>>>
>>
>
> --
> aw2.0
>  http://www.aw20.co.uk/
>

-- 
online documentation: http://openbd.org/manual/
   google+ hints/tips: https://plus.google.com/115990347459711259462
     http://groups.google.com/group/openbd?hl=en

     Join us @ http://www.OpenCFsummit.org/ Dallas, Feb 2012
