midstring and split are taken from CFLib: http://www.cflib.org/udf/MidString and http://www.cflib.org/udf/split
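For readers without CFLib at hand, here is a rough Java sketch of what the two UDFs are used for in the quoted loop. It uses Java because the thread already calls java.io classes from CFML. `midString` and `split` below are hypothetical stand-ins, not the CFLib source, and their exact semantics are an assumption. In particular, it is assumed that `split` keeps a trailing empty element, so the last element is always the text after the final delimiter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class NodeText {
    // Hypothetical stand-in for CFLib MidString: the text between the first
    // occurrence of start and the next occurrence of end after it, or ""
    // if either marker is missing.
    static String midString(String s, String start, String end) {
        int from = s.indexOf(start);
        if (from < 0) return "";
        from += start.length();
        int to = s.indexOf(end, from);
        if (to < 0) return "";
        return s.substring(from, to);
    }

    // Hypothetical stand-in for CFLib split: splits on a literal,
    // multi-character delimiter. The -1 limit keeps trailing empty strings.
    static List<String> split(String s, String delimiter) {
        return Arrays.asList(s.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        String start = "<example_node>";
        String end = "</example_node>";
        String buf = "junk" + start + "<a>1</a>" + end + "tail";

        // Leftover text after the last end tag, kept for the next iteration.
        List<String> parts = split(buf, end);
        String leftover = parts.get(parts.size() - 1);

        // The node itself, re-wrapped in its tags as the quoted code does.
        String node = start + midString(buf, start, end) + end;

        System.out.println(node);     // <example_node><a>1</a></example_node>
        System.out.println(leftover); // tail
    }
}
```

The trailing-empty behaviour matters for the quoted loop. If a line ends exactly with `</example_node>` and the split drops trailing empty elements, the "leftover" would be the node text itself, which would then be carried into the next iteration.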

On Jan 12, 1:03 pm, "Aaron J. White" wrote:
> Not really.
>
> <cfset locals.startOfTitle = "<example_node>" />
> <cfset locals.endOfTitle = "</example_node>" />
>
> <cfloop index="locals.line" file="#locals.absFilePath#">
>     <cfif locals.line DOES NOT CONTAIN locals.endOfTitle>
>         <!--- add line to titleItem --->
>         <cfset locals.titleItem &= locals.line />
>         <cfset application.import.lineCount += 1 />
>         <cfif application.import.stop>
>             <cfabort />
>         </cfif>
>     <cfelse>
>         <cfset locals.titleItem &= locals.line />
>         <cfset application.import.lineCount += 1 />
>         <!--- we hit the end of a title. first get extra chars
>               from the back; we'll need those later --->
>         <cfset locals.tempArr = application.utility.split(locals.titleItem, locals.endOfTitle) />
>         <cfset locals.tempItem = locals.tempArr[arraylen(locals.tempArr)] & "" />
>         <!--- now get everything in the middle of the nodes --->
>         <cfset locals.titleItem = locals.startOfTitle & application.utility.midstring(locals.titleItem, locals.startOfTitle, locals.endOfTitle) & locals.endOfTitle />
>         <!--- convert title item to xml object --->
>         <cfset locals.titleXml = xmlparse(locals.titleItem) />
>         <!--- we have our node. prepare titleItem text for
>               next iteration --->
>         <cfset locals.titleItem = locals.tempItem />
>         <cfif application.import.stop>
>             <cfabort />
>         <cfelse>
>             <!--- process the title xml and add required
>                   info to the database --->
>             <cfset processTitleItem(locals.titleXml) />
>         </cfif>
>     </cfif>
> </cfloop>
>
> On Jan 12, 12:43 pm, Alex Skinner wrote:
> > Seeing some code would be good. How are you doing the read?
> >
> > I googled and found something like this:
> >
> > <cfscript>
> > // Define the file to read, use forward slashes only
> > FileName = "C:/Example/ReadMe.txt";
> > // Initialize Java File IO
> > FileIOClass = createObject("java", "java.io.FileReader");
> > FileIO = FileIOClass.init(FileName);
> > LineIOClass = createObject("java", "java.io.BufferedReader");
> > LineIO = LineIOClass.init(FileIO);
> > </cfscript>
> >
> > <CFSET EOF=0>
> > <CFLOOP condition="NOT EOF">
> >     <!--- Read in next line --->
> >     <CFSET CurrLine=LineIO.readLine()>
> >     <!--- If CurrLine is not defined, we have reached the end of file --->
> >     <CFIF IsDefined("CurrLine") EQ "NO">
> >         <CFSET EOF=1>
> >         <CFBREAK>
> >     </CFIF>
> >     <CFOUTPUT>#CurrLine#<br></CFOUTPUT><CFFLUSH>
> > </CFLOOP>
> > <!--- release the file handle --->
> > <CFSET LineIO.close()>
> >
> > Is your solution similar?
> >
> > A
> >
> > On 12 January 2012 17:57, Aaron J. White wrote:
> > > Hey all,
> > >
> > > I am receiving an OutOfMemory error while running a script that is
> > > trying to loop over a 1.2gb+ xml file (~12 million lines). I'm not
> > > really sure if what I am doing is just horrible and there is a better
> > > way, or if it is a memory issue in openbd.
> > >
> > > I have assigned tomcat 2gb max memory. While I'm running the script I
> > > can see the memory usage slowly creep up in task manager. With 4gb of
> > > ram on the vps I get to about 7 million lines before tomcat gives up.
> > > When I had 3gb of ram on the server and 1gb applied to Tomcat I could
> > > only get to about 4 million lines.
> > >
> > > Here's the logic behind what I am doing.
> > >
> > > I am interested in one particular node in the large file, so I loop
> > > over the file line by line. As I loop, if the line does not contain the
> > > end of the node I'm looking for, then I <cfset locals.exampleNode &=
> > > locals.line />.
> > > Once I hit a line that contains the end of the node (</example_node>),
> > > I do a few operations to clean up any extra text from the front and
> > > back of the node string and then convert it to xml with xmlparse.
> > >
> > > Once I have the node as xml I push it to another function that does
> > > several things:
> > > ** uses xpath to grab particular information from the node. Seven
> > > xpath searches are done on each node unless I decide to skip the node
> > > after the first two xpath searches.
> > > ** Depending on the content I either add the information to my
> > > database, update the information, or skip it. I have about 5 tables
> > > that are getting modified from the script. A few of the unimportant
> > > queries use background="yes".
> > > The whole script runs in a cfthread so it doesn't time out.
> > >
> > > Can anyone give any insight? Also, I could post a code example, but
> > > my script is about 600 lines long.
> > >
> > > --
> > > online documentation: http://openbd.org/manual/
> > > google+ hints/tips: https://plus.google.com/115990347459711259462
> > > http://groups.google.com/group/openbd?hl=en
> > >
> > > Join us @ http://www.OpenCFsummit.org/ Dallas, Feb 2012
> >
> > --
> > Alex Skinner
> > Managing Director
> > Pixl8 Interactive
> >
> > Tel: +448452600726
> > Web: pixl8.co.uk
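Putting the quoted pieces together, the accumulate-and-extract loop can be sketched in Java around the same BufferedReader Alex suggests. This is a hedged illustration, not the code from the thread and not a diagnosis of the OutOfMemory error. `NodeScanner`, `scan` and `parse` are hypothetical names. The sketch assumes, as the quoted code does, that the start tag is the literal `<example_node>` with no attributes. It differs from the quoted loop in three small ways:

- Text between nodes is dropped as soon as no start tag is pending.
- A line that closes several nodes yields all of them. As far as the quoted code goes, it extracts only the first.
- Lines are re-joined with `\n` instead of being concatenated directly, so words on adjacent lines inside a text node are not glued together.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NodeScanner {
    // Hypothetical helper: reads the source line by line and passes every
    // complete <tag>...</tag> chunk to onNode. Assumes the start tag has no
    // attributes, as with the literal "<example_node>" in the thread.
    static void scan(Reader source, String tag, Consumer<String> onNode) throws IOException {
        String start = "<" + tag + ">";
        String end = "</" + tag + ">";
        StringBuilder buf = new StringBuilder(); // only the node being assembled
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                buf.append(line).append('\n'); // keep line breaks inside text nodes
                int endIdx;
                // A single line may close more than one node, so loop.
                while ((endIdx = buf.indexOf(end)) >= 0) {
                    int startIdx = buf.indexOf(start);
                    if (startIdx >= 0 && startIdx < endIdx) {
                        onNode.accept(buf.substring(startIdx, endIdx + end.length()));
                    }
                    // Drop everything through the end tag; keep the leftover.
                    buf.delete(0, endIdx + end.length());
                }
                // No pending start tag: the buffer holds only text between nodes.
                if (buf.indexOf(start) < 0) {
                    buf.setLength(0);
                }
            }
        }
    }

    // Parses one node on its own, roughly what xmlparse() does per node.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String sample = "<root>\n"
                + "<example_node><t>one</t></example_node>\n"
                + "<example_node>\n<t>two</t>\n"
                + "</example_node><example_node><t>three</t></example_node>\n"
                + "</root>\n";
        List<String> nodes = new ArrayList<>();
        scan(new StringReader(sample), "example_node", nodes::add);
        for (String node : nodes) {
            // prints "one", "two", "three" on separate lines
            System.out.println(parse(node).getDocumentElement().getTextContent().trim());
        }
    }
}
```

For a real file you would pass something like `new java.io.FileReader(path)` and do the xpath/database work inside the consumer, so each node's string and DOM can be released once it has been handled. Whether that alone fixes the memory creep described above depends on what else the 600-line script keeps alive, which the thread does not show.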
