[OpenBD] Re: Memory Issue while looping over large file

Aaron J. White Thu, 12 Jan 2012 11:09:19 -0800

The midstring and split functions used below are taken from cflib:

http://www.cflib.org/udf/MidString
http://www.cflib.org/udf/split
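
For anyone skimming the quoted loop below, here is a minimal sketch of the same accumulate-and-extract idea in plain Java (OpenBD runs on the JVM, and the thread already uses java.io classes). NodeExtractor and forEachNode are hypothetical names, not cflib or OpenBD functions. Unlike the CFML version, it uses indexOf on a StringBuilder instead of the split/midstring UDFs, and it handles more than one closing tag per line. Treat it as an illustration under those assumptions, not as a fix for the memory problem.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of OpenBD or cflib.
class NodeExtractor {

    // Reads the input line by line, hands each complete startTag...endTag
    // block to onNode, and returns the number of lines read.
    static int forEachNode(Reader in, String startTag, String endTag,
                           Consumer<String> onNode) {
        StringBuilder buffer = new StringBuilder();
        int lineCount = 0;
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineCount++;
                buffer.append(line);
                // One line may close several nodes, so drain until no end tag is left.
                int end;
                while ((end = buffer.indexOf(endTag)) >= 0) {
                    int stop = end + endTag.length();
                    int start = buffer.indexOf(startTag);
                    if (start >= 0 && start < end) {
                        onNode.accept(buffer.substring(start, stop));
                    }
                    // Keep only the text after the closing tag for the next node.
                    buffer.delete(0, stop);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lineCount;
    }
}
```

Called with a java.io.FileReader and a callback that does the parse-and-store step, this would play the role of the cfloop plus processTitleItem. After each closing tag, only the text that follows it stays in the buffer.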


On Jan 12, 1:03 pm, "Aaron J. White" <[email protected]> wrote:
> Not really.
>
>         <cfset locals.startOfTitle = "<example_node>" />
>         <cfset locals.endOfTitle = "</example_node>" />
>
>         <cfloop index="locals.line" file="#locals.absFilePath#">
>                 <cfif locals.line DOES NOT CONTAIN locals.endOfTitle>
>                         <!--- add line to titleitem  --->
>                         <cfset locals.titleItem &= locals.line />
>                         <cfset application.import.lineCount += 1 />
>                         <cfif application.import.stop>
>                                 <cfabort />
>                         </cfif>
>                 <cfelse>
>                         <cfset locals.titleItem &= locals.line />
>                         <cfset application.import.lineCount += 1 />
>                         <!--- we hit the end of a title. first get extra chars from back. we'll need those later --->
>                         <cfset locals.tempArr = application.utility.split(locals.titleItem, locals.endOfTitle) />
>                         <cfset locals.tempItem = locals.tempArr[arraylen(locals.tempArr)] & "" />
>                         <!--- now get everything in middle of nodes --->
>                         <cfset locals.titleItem = locals.startOfTitle & application.utility.midstring(locals.titleItem, locals.startOfTitle, locals.endOfTitle) & locals.endOfTitle />
>                         <!--- convert title item to xml object--->
>                         <cfset locals.titleXml = xmlparse(locals.titleItem) />
>                         <!--- we have our node. prepare titleItem text for next iteration --->
>                         <cfset locals.titleItem = locals.tempItem/>
>                         <cfif application.import.stop >
>                                 <cfabort />
>                         <cfelse>
>                                 <!--- process the title xml and add required info to the database --->
>                                 <cfset processTitleItem(locals.titleXml) />
>                         </cfif>
>                 </cfif>
>         </cfloop>
>
> On Jan 12, 12:43 pm, Alex Skinner <[email protected]> wrote:
>
> > Seeing some code would be good. How are you doing the read?
>
> > I googled and found something like this:
>
> > <cfscript>
> > // Define the file to read, use forward slashes only
> > FileName="C:/Example/ReadMe.txt";
> > // Initialize Java File IO
> > FileIOClass=createObject("java","java.io.FileReader");
> > FileIO=FileIOClass.init(FileName);
> > LineIOClass=createObject("java","java.io.BufferedReader" );
> > LineIO=LineIOClass.init(FileIO);
> > </cfscript>
>
> > <CFSET EOF=0>
> > <CFLOOP condition="NOT EOF">
> >     <!--- Read in next line --->
> >     <CFSET CurrLine=LineIO.readLine()>
> >     <!--- If CurrLine is not defined, we have reached the end of file --->
> >     <CFIF NOT IsDefined("CurrLine")>
> >         <CFSET EOF=1>
> >         <CFBREAK>
> >     </CFIF>
> >     <CFOUTPUT>#CurrLine#<br></CFOUTPUT><CFFLUSH>
> > </CFLOOP>
>
> > Is your solution similar?
>
> > A
>
> > On 12 January 2012 17:57, Aaron J. White <[email protected]> wrote:
>
> > > Hey all,
>
> > > I am receiving an OutOfMemory error while running a script that is
> > > trying to loop over a 1.2 GB+ XML file (~12 million lines). I'm not
> > > really sure if what I am doing is just horrible and there is a better
> > > way, or if it is a memory issue in OpenBD.
>
> > > I have assigned Tomcat 2 GB max memory. While I'm running the script I
> > > can see the memory usage slowly creep up in Task Manager. With 4 GB of
> > > RAM on the VPS I get to about 7 million lines before Tomcat gives up.
> > > When I had 3 GB of RAM on the server and 1 GB assigned to Tomcat I could
> > > only get to about 4 million lines.
>
> > > Here's the logic behind what I am doing.
>
> > > I am interested in one particular node in the large file, so I loop
> > > over the file line by line. As I loop, if the line does not contain the
> > > end of the node I'm looking for, then I <cfset locals.exampleNode &=
> > > locals.line />.
> > > Once I hit a line that contains the end of the node
> > > (</example_node>), I do a few operations to clean up any extra text from
> > > the front and back of the node string and then convert it to XML with
> > > xmlparse.
>
> > > Once I have the node as XML I push it to another function that does
> > > several things.
> > > ** uses xpath to grab particular information from the node. Seven
> > > xpath searches are done on each node unless I decide to skip the node
> > > after the first two xpath searches.
> > > ** Depending on the content I either add the information to my
> > > database, update the information, or skip it. I have about 5 tables
> > > that are getting modified from the script. A few of the unimportant
> > > queries use background="yes".
> > > The whole script runs in a cfthread so it doesn't time out.
>
> > > Can anyone give any insight? Also, I could post some code examples, but
> > > my script is about 600 lines long.
>
> > > --
> > > online documentation: http://openbd.org/manual/
> > >   google+ hints/tips: https://plus.google.com/115990347459711259462
> > >    http://groups.google.com/group/openbd?hl=en
>
> > >     Join us @ http://www.OpenCFsummit.org/ Dallas, Feb 2012
>
> > --
> > Alex Skinner
> > Managing Director
> > Pixl8 Interactive
>
> > Tel: +448452600726
> > Email: [email protected]
> > Web: pixl8.co.uk

