Hey guys, using cfflush was the solution. I was able to process the entire file while using less memory for Tomcat and the VPS. Had to write a blog post for this one.
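For anyone who finds this thread later, the change can be sketched roughly as below. This is a minimal sketch and not the actual import script: `absFilePath` and the file name are hypothetical placeholders, and the per-line work is reduced to a counter.

```cfml
<!--- Hypothetical path; replace with the real location of the large file --->
<cfset absFilePath = expandPath("./large-file.xml") />
<cfset lineCount = 0 />

<!--- cfloop with the file attribute reads one line per iteration --->
<cfloop index="line" file="#absFilePath#">
    <cfset lineCount += 1 />
    <!--- per-line processing would go here --->

    <!--- Flush just before the closing loop tag so whitespace generated
          by the loop body is sent out and not held in the output buffer --->
    <cfflush />
</cfloop>
```

Flushing on every iteration follows the suggestion as it was given. Flushing only every N lines may be a reasonable variation, but that is an assumption and was not tested here.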
Thanks again.

On Jan 12, 1:53 pm, Jamie MacDonald <[email protected]> wrote:
> Hi Aaron,
>
> Just before your loop closing tag, can you try adding <cfflush />?
>
> I remember doing similar work with a very large file, and it came down
> to this issue. Even though you may not be writing a great deal of
> output, with the number of rows you mention the whitespace characters
> inside that loop could still add up to a large amount, and the engine
> may be saving all of it up on the assumption that it will be rendering
> a page, which causes the memory issue with the buffer. cfflush will
> empty the buffer more frequently to stop it becoming full. That would
> be my initial thought for this.
>
> Jamie MacDonald.
>
> On 12/01/2012 19:20, Aaron J. White wrote:
> > Cfloop over a file doesn't put the file in memory, just the current
> > line.
> > http://www.bennadel.com/blog/2011-Reading-In-File-Data-One-Line-At-A-...
> >
> > On Jan 12, 1:13 pm, Alex Skinner <[email protected]> wrote:
> >> I think basically you don't want to hold the whole file in memory; there is
> >> no reason to. Try the code I provided and, without outputting the line, just
> >> output a counter, e.g.
> >> 1
> >> 2
> >> 3
> >> 4
> >> 5
> >> 6
> >> 7
> >> See if it barfs at the same line number.
> >>
> >> A
> >>
> >> On 12 January 2012 19:09, Aaron J. White <[email protected]> wrote:
> >>> midstring and split taken from cflib
> >>> http://www.cflib.org/udf/MidString
> >>> http://www.cflib.org/udf/split
> >>>
> >>> On Jan 12, 1:03 pm, "Aaron J. White" <[email protected]> wrote:
> >>>> Not really.
> >>>> <cfset locals.startOfTitle = "<example_node>" />
> >>>> <cfset locals.endOfTitle = "</example_node>" />
> >>>> <cfloop index="locals.line" file="#locals.absFilePath#">
> >>>>     <cfif locals.line DOES NOT CONTAIN locals.endOfTitle>
> >>>>         <!--- add line to titleitem --->
> >>>>         <cfset locals.titleItem &= locals.line />
> >>>>         <cfset application.import.lineCount += 1 />
> >>>>         <cfif application.import.stop>
> >>>>             <cfabort />
> >>>>         </cfif>
> >>>>     <cfelse>
> >>>>         <cfset locals.titleItem &= locals.line />
> >>>>         <cfset application.import.lineCount += 1 />
> >>>>         <!--- we hit the end of a title. first get extra chars from back.
> >>>>               we'll need those later --->
> >>>>         <cfset locals.tempArr = application.utility.split(locals.titleItem, locals.endOfTitle) />
> >>>>         <cfset locals.tempItem = locals.tempArr[arraylen(locals.tempArr)] & "" />
> >>>>         <!--- now get everything in middle of nodes --->
> >>>>         <cfset locals.titleItem = locals.startOfTitle &
> >>>>             application.utility.midstring(locals.titleItem, locals.startOfTitle, locals.endOfTitle) &
> >>>>             locals.endOfTitle />
> >>>>         <!--- convert title item to xml object --->
> >>>>         <cfset locals.titleXml = xmlparse(locals.titleItem) />
> >>>>         <!--- we have our node.
> >>>>               prepare titleItem text for next iteration --->
> >>>>         <cfset locals.titleItem = locals.tempItem />
> >>>>         <cfif application.import.stop>
> >>>>             <cfabort />
> >>>>         <cfelse>
> >>>>             <!--- process the title xml and add required info to the database --->
> >>>>             <cfset processTitleItem(locals.titleXml) />
> >>>>         </cfif>
> >>>>     </cfif>
> >>>> </cfloop>
> >>>>
> >>>> On Jan 12, 12:43 pm, Alex Skinner <[email protected]> wrote:
> >>>>> Seeing some code would be good. How are you doing the read?
> >>>>> I googled and found something like this:
> >>>>>
> >>>>> <cfscript>
> >>>>> // Define the file to read, use forward slashes only
> >>>>> FileName="C:/Example/ReadMe.txt";
> >>>>> // Initialize Java File IO
> >>>>> FileIOClass=createObject("java","java.io.FileReader");
> >>>>> FileIO=FileIOClass.init(FileName);
> >>>>> LineIOClass=createObject("java","java.io.BufferedReader");
> >>>>> LineIO=LineIOClass.init(FileIO);
> >>>>> </cfscript>
> >>>>> <CFSET EOF=0>
> >>>>> <CFLOOP condition="NOT EOF">
> >>>>>     <!--- Read in next line --->
> >>>>>     <CFSET CurrLine=LineIO.readLine()>
> >>>>>     <!--- If CurrLine is not defined, we have reached the end of file --->
> >>>>>     <CFIF IsDefined("CurrLine") EQ "NO">
> >>>>>         <CFSET EOF=1>
> >>>>>         <CFBREAK>
> >>>>>     </CFIF>
> >>>>>     <CFOUTPUT>#CurrLine#<br></CFOUTPUT><CFFLUSH>
> >>>>> </CFLOOP>
> >>>>>
> >>>>> Is your solution similar?
> >>>>>
> >>>>> A
> >>>>>
> >>>>> On 12 January 2012 17:57, Aaron J. White <[email protected]> wrote:
> >>>>>> Hey all,
> >>>>>>
> >>>>>> I am receiving an OutOfMemory error while running a script that is
> >>>>>> trying to loop over a 1.2gb+ xml file (~ 12 million lines). I'm not
> >>>>>> really sure if what I am doing is just horrible and there is a better
> >>>>>> way or if it is a memory issue in openbd.
> >>>>>>
> >>>>>> I have assigned tomcat 2gb max memory. While I'm running the script I
> >>>>>> can see the memory usage slowly creep up in task manager.
> >>>>>> With 4gb of ram on the vps I get to about 7 million lines before
> >>>>>> tomcat gives up. When I had 3gb of ram on the server and 1gb applied
> >>>>>> to Tomcat I could only get to about 4 million lines.
> >>>>>>
> >>>>>> Here's the logic behind what I am doing.
> >>>>>>
> >>>>>> I am interested in one particular node in the large file, so I loop
> >>>>>> over the file line by line. As I loop, if the line does not contain
> >>>>>> the end of the node I'm looking for, then I
> >>>>>> <cfset locals.exampleNode &= locals.line />
> >>>>>>
> >>>>>> Once I hit a line that contains the end of the node
> >>>>>> (</example_node>), I do a few operations to clean up any extra text
> >>>>>> from the front and back of the node string and then convert it to xml
> >>>>>> with xmlparse.
> >>>>>>
> >>>>>> Once I have the node as xml I push it to another function that does
> >>>>>> several things.
> >>>>>> ** Uses xpath to grab particular information from the node. Seven
> >>>>>> xpath searches are done on each node unless I decide to skip the node
> >>>>>> after the first two xpath searches.
> >>>>>> ** Depending on the content I either add the information to my
> >>>>>> database, update the information, or skip it. I have about 5 tables
> >>>>>> that are getting modified from the script. A few of the unimportant
> >>>>>> queries use background="yes".
> >>>>>>
> >>>>>> The whole script runs in a cfthread so it doesn't time out.
> >>>>>>
> >>>>>> Can anyone give any insight? Also, I could post some code example,
> >>>>>> but my script is about 600 lines long.
> >>>>>> --
> >>>>>> online documentation: http://openbd.org/manual/
> >>>>>> google+ hints/tips: https://plus.google.com/115990347459711259462
> >>>>>> http://groups.google.com/group/openbd?hl=en
> >>>>>> Join us @ http://www.OpenCFsummit.org/ Dallas, Feb 2012
> >>>>>
> >>>>> --
> >>>>> Alex Skinner
> >>>>> Managing Director
> >>>>> Pixl8 Interactive
> >>>>> Tel: +448452600726
> >>>>> Email: [email protected]
> >>>>> Web: pixl8.co.uk
>
> --
> aw2.0
> http://www.aw20.co.uk/
