Re: Advice on optimizing an application
This may be something like what you want:

    <cfquery name="findDups">
        SELECT max(a.email_id) AS email_id, a.ml_email
        FROM nl_mailgroups a
        INNER JOIN nl_catREL c ON c.email_id = a.email_id
        WHERE c.groups_id = <cfqueryparam value="#req.thisGROUPID#" cfsqltype="CF_SQL_INTEGER" />
        GROUP BY a.ml_email
        ORDER BY email_id
    </cfquery>

Your other query may be taking a while due to a missing index on the ml_email column.

~|
Order the Adobe Coldfusion Anthology now!
http://www.amazon.com/Adobe-Coldfusion-Anthology/dp/1430272155/?tag=houseoffusion
Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:359652
Subscription: http://www.houseoffusion.com/groups/cf-talk/subscribe.cfm
Unsubscribe: http://www.houseoffusion.com/groups/cf-talk/unsubscribe.cfm
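The de-duplication pattern this reply proposes (one row per address via GROUP BY, keeping the highest id) can be sketched standalone. The table and data below are invented stand-ins for the poster's schema, not the real one:

```python
import sqlite3

# Minimal sketch of the "keep MAX(id) per email" pattern from the reply above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE nl_mailgroups (email_id INTEGER, ml_email TEXT)")
con.executemany(
    "INSERT INTO nl_mailgroups VALUES (?, ?)",
    [(1, "a@x.com"), (2, "b@x.com"), (3, "a@x.com")],  # a@x.com is duplicated
)
# One row per address, keeping the highest email_id
rows = con.execute(
    "SELECT max(email_id) AS email_id, ml_email "
    "FROM nl_mailgroups GROUP BY ml_email ORDER BY email_id"
).fetchall()
print(rows)  # [(2, 'b@x.com'), (3, 'a@x.com')]
```

With an index on ml_email the GROUP BY can be satisfied without a full scan, which is the point of the index suggestion that follows.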
Re: Advice on optimizing an application
Ended up finally creating a VIEW in the database to handle the problem. Even on a 15,000 record list, it runs in just a few seconds...

    SELECT DISTINCT nl_catREL.groups_id, duplicates.ml_id
    FROM nl_catREL
    INNER JOIN nl_mailgroups
        ON nl_mailgroups.ml_id = nl_catREL.ml_id
    INNER JOIN nl_mailgroups AS duplicates
        ON duplicates.ml_email = nl_mailgroups.ml_email
        AND duplicates.ml_id <> nl_mailgroups.ml_id
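The VIEW above uses the classic self-join duplicate finder: join the table to itself on the email column and require a different primary key, so only rows whose address appears more than once can match. A standalone sketch with an invented mini-schema:

```python
import sqlite3

# Self-join duplicate finder: a row matches only if ANOTHER row (different
# ml_id) has the same ml_email.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mailgroups (ml_id INTEGER PRIMARY KEY, ml_email TEXT)")
con.executemany("INSERT INTO mailgroups VALUES (?, ?)",
                [(1, "a@x.com"), (2, "b@x.com"), (3, "a@x.com")])
dupes = con.execute(
    "SELECT DISTINCT d.ml_id "
    "FROM mailgroups m "
    "JOIN mailgroups d ON d.ml_email = m.ml_email AND d.ml_id <> m.ml_id "
    "ORDER BY d.ml_id"
).fetchall()
print(dupes)  # [(1,), (3,)] -- both rows sharing a@x.com
```

Note that `<>` flags every copy of a duplicated address; if the goal is to keep one row and list only the extras to delete, `d.ml_id > m.ml_id` returns just the later copies.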
Re: Advice on optimizing an application
You need to use some debugging tools to help you identify the key culprits. For example, if your query to check whether a user has unsubscribed is taking 500ms, that alone would be the reason for the slowness. Either use the step debugger, a tool like FusionReactor, or just turn on verbose debugging output, so you can understand your current bottleneck. It's very possible you have some inefficiencies in some of your SQL lookups (due to missing indexes, etc.) that would cause slowness over lots of loops. You might look into using Java libs to read in the Excel file more efficiently.

On Saturday, November 15, 2014, Les Mizzell lesm...@bellsouth.net wrote:

Evening, I'm still fighting with my Excel Import application a bit. Basically, it works pretty well, as long as there's a limited number of records being imported. Problem is, the client tried to import a list with almost 15,000 addresses today, and it eventually timed out or the server reset the connection before it finished the import. Then my phone rings... The app does several things:

1. Import 4 columns from an Excel sheet [email][first name][last name][company] into an array
2. Check the email address to see if it's valid. If not, delete that row from the array
3. Check the email address to see if it's been previously unsubscribed from the list and delete that row from the array
4. Check the list against other email addresses already in the database for that list, and delete that row so as to not import it if it's already there
5. After all that, write the record

Running some tests with the 15,000 address sheet: reading the Excel sheet in and running steps 1 through 4 above is taking well over 10 minutes by itself. Writing the records after that adds enough extra time to be sure it borks up. Smaller lists (several hundred addresses) seem to import fine - MUCH faster than the former POI-based app. So my question is - I need to optimize something somewhere to cut down CPU cycles and increase efficiency.
The client uses this thing multiple times every day. Some of the code is from an example by Raymond Camden. Thanks Raymond. Full app below. Where are my major bottlenecks and what can I do about them?

[quoted app code trimmed; it appears in full in the original post at the end of this thread]
Re: Advice on optimizing an application
I'm still fighting with my Excel Import application a bit. Basically, it works pretty well, as long as there's a limited number of records being imported. Was testing last night with the 15,000 record Excel sheet. If I comment out the two filter queries in the code, it will read in the sheet and a cfdump of the array shows everything was imported properly. Didn't time it, but it only took a few minutes. Adding those two queries back in and trying to import the entire sheet - the server choked and I had to call support to reboot it. Whoops. It's the two filter queries below that are slowing the whole thing down and choking the server.

The one optimization I can think of is the "look for duplicates" query. Instead of running that against the entire database looking for matches (several hundred thousand records last time I checked), I should do a query of queries and run the loop ONLY against records for that one group, which I'd prefetch before starting the loop. That should help a lot. Working on that now...

One thought I had - I don't care how long the import of a large sheet takes. It just can't choke the server. Would there be any benefit of putting in a DELAY for every 500 records (or so) in the loop to give the server a rest to do other stuff?

    <cfif i MOD 500 EQ 0>
        <cfset sleep(5000) />
    </cfif>

FILTER QUERIES:

    <!--- Before we can save the list, we need to run two checks --->
    <!--- First, is there an email address in this list that has previously unsubscribed? --->
    <cfoutput>
    <cfloop from="#arrayLen(mailREC)#" to="1" step="-1" index="i">
        <!--- Delete the row if it was an invalid address. Got to be a way to fix this, but
              the data output block above wants to add an invalid row if you try to skip
              the entire row when the address is invalid --->
        <cfif mailREC[i][1] eq "-">
            <cfset ArrayDeleteAt(mailREC, i)>
        <cfelse>
            <!--- OK, it's a valid address. Have they unsubscribed before? --->
            <cfquery name="unsubbed" datasource="#req.datasource#" username="#username#" password="#password#">
                SELECT groups_id
                FROM maillist_unsubscribe
                WHERE dateRESUB IS NULL
                AND ml_email = <cfqueryparam value="#mailREC[i][1]#" cfsqltype="CF_SQL_VARCHAR">
                AND groups_id = <cfqueryparam value="#req.thisGROUPID#" cfsqltype="CF_SQL_INTEGER">
            </cfquery>
            <cfif unsubbed.recordcount EQ 1><cfset ArrayDeleteAt(mailREC, i)></cfif>
        </cfif>
    </cfloop>

    <!--- If Admin has checked the duplicate filter box on the form, be sure nothing incoming
          is a duplicate already in the database for this group --->
    <cfif form.dupFILTER EQ "on">
        <cfloop from="#arrayLen(mailREC)#" to="1" step="-1" index="i">
            <cfquery name="ckDUP" datasource="#req.datasource#" username="#username#" password="#password#">
                SELECT groups_id
                FROM vw_mailLIST
                WHERE groups_id = <cfqueryparam value="#trim(req.thisGROUPID)#" cfsqltype="CF_SQL_INTEGER">
                AND ml_email = <cfqueryparam value="#mailREC[i][1]#" cfsqltype="CF_SQL_VARCHAR">
            </cfquery>
            <cfif ckDUP.recordcount EQ 1><cfset ArrayDeleteAt(mailREC, i)></cfif>
        </cfloop>
    </cfif>
    </cfoutput>
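The "rest every 500 records" idea above can be sketched generically: process the list in chunks and pause between chunks so other requests get CPU time. The chunk size and pause length here are illustrative, not tuned values:

```python
import time

# Chunked processing with a pause between chunks, per the MOD 500 idea above.
def process_in_chunks(rows, chunk_size=500, pause_seconds=0.01):
    handled = 0
    for i, row in enumerate(rows, start=1):
        handled += 1                     # stand-in for the per-row work
        if i % chunk_size == 0:
            time.sleep(pause_seconds)    # yield so other work can run
    return handled

print(process_in_chunks(range(1500)))  # 1500
```

One caveat: sleeping inside a ColdFusion request keeps that request thread occupied, so this only spreads the load out; the bigger win, as the other replies suggest, is cutting the number of queries rather than pacing them.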
Re: Advice on optimizing an application
Do you have indexes on the ml_email and groups_id columns? I would likely create an index that has both in it. There are a couple of suggestions about changing your architecture a bit.

First, from the looks of it, I think you could probably combine both these queries into one with a LEFT JOIN to vw_mailLIST and then evaluate the outcome to determine if you delete the item from your array. As long as the items joined on are PKs or have an index, it should be faster than running two individual queries. Without knowing the actual table structure or data I can't be sure, but off-hand I'd think the following would be much quicker. (Note that since I can't test, there may need to be some tweaking.)

    SELECT mlu.groups_id AS mlu_gid, mlu.dateRESUB, vw.groups_id AS vw_gid
    FROM maillist_unsubscribe mlu
    LEFT JOIN vw_mailLIST vw
        ON mlu.groups_id = vw.groups_id
        AND mlu.ml_email = vw.ml_email
    WHERE mlu.ml_email = <cfqueryparam value="#mailREC[i][1]#" cfsqltype="CF_SQL_VARCHAR" />
    AND mlu.groups_id = <cfqueryparam value="#req.thisGROUPID#" cfsqltype="CF_SQL_INTEGER" />

You can then evaluate if dateRESUB is NULL, if mlu_gid and/or vw_gid are NULL, or if there's a recordCount at all, to determine whether you remove the item from the array. You can also have your if statement about the dupFILTER there as well. This puts everything into a single loop with 50% fewer queries.

Second, I would consider doing things a bit differently: instead of looping over your array with individual queries, it should be much faster to run a single query and then loop over the query result to determine if you update the array. So create a comma-separated list of email addresses from your mailREC array and use it in an IN statement on mlu.ml_email. IN statements aren't super efficient, but if you have an index on ml_email it should still be much faster than 15,000 individual queries. Then loop over the result set and use ArrayFind to see if the result is in the array and, if it is, delete it.
The result set should only contain the results you're looking to delete, if I'm understanding your queries correctly. This approach would be WAY faster and much more scalable, since looping over 15,000 or even 100,000 results in memory is way quicker than making that many database queries. Hopefully you find this useful. Without having your database to query against, it can be a bit tough to determine if the code I wrote works. I do think the architecture logic in approach two is worth the time to refactor, though.

Have a great day.
Mike

On Sat, Nov 15, 2014 at 8:44 AM, Les Mizzell lesm...@bellsouth.net wrote:
[quoted message and filter-query code trimmed; they appear in full earlier in the thread]
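"Approach two" above -- one query with an IN (...) list of every incoming address, then deleting the matches from the in-memory array -- can be sketched standalone. The schema and data below are invented stand-ins:

```python
import sqlite3

# One batched query instead of one query per row, then in-memory deletion.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE maillist_unsubscribe (ml_email TEXT, groups_id INTEGER)")
con.execute("INSERT INTO maillist_unsubscribe VALUES ('bad@x.com', 7)")

mail_rec = [["ok@x.com"], ["bad@x.com"], ["new@x.com"]]
emails = [row[0] for row in mail_rec]
placeholders = ",".join("?" * len(emails))        # one bind variable per address
unsubbed = {r[0] for r in con.execute(
    f"SELECT ml_email FROM maillist_unsubscribe "
    f"WHERE groups_id = ? AND ml_email IN ({placeholders})",
    [7, *emails],
)}
# Walk backwards so deletions don't shift the indexes still to visit
for i in range(len(mail_rec) - 1, -1, -1):
    if mail_rec[i][0] in unsubbed:
        del mail_rec[i]
print(mail_rec)  # [['ok@x.com'], ['new@x.com']]
```

One practical caveat: some databases cap the number of bind parameters per statement (SQL Server's limit is around 2,100), so a 15,000-address list would need to be sent in chunks.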
Re: Advice on optimizing an application
Any chance to have the database engine do all that record logic? That would be the first thing I would try. Stored procedures are great for things like this. I don't know what RDBMS you are using, but most have SQL that can do this. Complex database stuff is usually better off done within the database, as it can scale.

1. Store the four Excel fields in a temporary table in the database.
2. Use SQL to weed out all the records against the current tables. Google for examples.
3. Dump the parsed records into the production table.

Steps 2 and 3 are probably one step in the database engine, but the SQL might be tedious, so my first thought is to separate them.

Good luck, Roger

Les Mizzell lesm...@bellsouth.net wrote:
[quoted message trimmed; the full text appears in the original post below]

--
LinkedIn: http://www.linkedin.com/pub/roger-austin/8/a4/60
Twitter: http://twitter.com/RogerTheGeek
Blog: http://RogerTheGeek.wordpress.com/
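The staging-table plan above (bulk-load everything, then let SQL weed out the rejects in one statement) can be sketched standalone. Table and column names below are invented for illustration:

```python
import sqlite3

# Bulk-load into a staging table, then insert only rows that survive the
# unsubscribe and duplicate checks -- steps 2 and 3 as a single statement.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE staging      (ml_email TEXT);
    CREATE TABLE unsubscribed (ml_email TEXT);
    CREATE TABLE maillist     (ml_email TEXT);
""")
con.executemany("INSERT INTO staging VALUES (?)",
                [("a@x.com",), ("b@x.com",), ("c@x.com",)])
con.execute("INSERT INTO unsubscribed VALUES ('b@x.com')")  # opted out
con.execute("INSERT INTO maillist VALUES ('c@x.com')")      # already subscribed

# Anti-join the staging rows against both tables in one set-based pass
con.execute("""
    INSERT INTO maillist (ml_email)
    SELECT DISTINCT s.ml_email FROM staging s
    WHERE NOT EXISTS (SELECT 1 FROM unsubscribed u WHERE u.ml_email = s.ml_email)
      AND NOT EXISTS (SELECT 1 FROM maillist     m WHERE m.ml_email = s.ml_email)
""")
final = sorted(r[0] for r in con.execute("SELECT ml_email FROM maillist"))
print(final)  # ['a@x.com', 'c@x.com']
```

This is the same NOT EXISTS pattern suggested later in the thread; the database does all the filtering in one pass instead of 15,000 round trips.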
Re: Advice on optimizing an application
On 11/15/2014 12:42 PM, Roger Austin wrote:
> Any chance to have the database engine do all that record logic?

After killing myself on this, I finally realized I was doing it ALL WRONG! All those loops and everything to filter the array, then do the insert, was taking forever. I'm revising - BULK IMPORT EVERYTHING to a temp table. No filters. Then, run the filters against the database and write to the final table. A LOT faster. Duh..

Still got one problem, and it's the duplicates filter. So, I need to filter the inserted records and kill any duplicate email addresses. First, I need to be sure I'm filtering ONLY the addresses that belong to the specific group, as folks might be subscribed to more than one group:

    <cfquery name="getFULLLIST">
        SELECT groups_id, email_id
        FROM nl_catREL
        WHERE groups_id = <cfqueryparam value="#req.thisGROUPID#" cfsqltype="CF_SQL_INTEGER" />
    </cfquery>
    <cfset fullLIST = ValueList(getFULLLIST.email_id) />

Now, let's see if I can return JUST the duplicates. Once that works, it's no big deal to delete them...

    <cfquery name="findDups">
        SELECT DISTINCT a.email_id
        FROM nl_mailgroups a, nl_mailgroups b
        WHERE a.ml_id IN (<cfqueryparam value="#fullLIST#" cfsqltype="CF_SQL_INTEGER" list="yes" />)
        AND a.email_id <> b.email_id
        AND a.ml_email = b.ml_email
        ORDER BY a.email_id
    </cfquery>

That doesn't work. It returns everything ... and it takes a really long time, even with only 150 addresses or so. Ideas?
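A likely reason the findDups query "returns everything" is that only the a side is restricted to the group: the b alias still scans the whole table, so any address that also exists in another group matches as a "duplicate". A standalone sketch with invented data showing the effect, plus the GROUP BY ... HAVING alternative the first reply in this thread suggests:

```python
import sqlite3

# Hypothetical mini-version of the problem: the second alias (b) is not
# limited to the group's rows, so cross-group copies of an address match.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mg (email_id INTEGER, ml_email TEXT)")
con.executemany("INSERT INTO mg VALUES (?, ?)",
                [(1, "a@x.com"),   # this group
                 (2, "b@x.com"),   # this group
                 (9, "a@x.com"),   # same address, another group
                 (8, "b@x.com")])  # same address, another group
group_ids = (1, 2)  # the ids prefetched for this group (the fullLIST role)

unrestricted = con.execute(
    "SELECT DISTINCT a.email_id FROM mg a, mg b "
    "WHERE a.email_id IN (?, ?) AND a.email_id <> b.email_id "
    "AND a.ml_email = b.ml_email", group_ids).fetchall()
print(unrestricted)  # every row in the group matches -- "returns everything"

# Restricting the check to the group's own rows (here via GROUP BY/HAVING)
# finds only addresses duplicated WITHIN the group:
within = con.execute(
    "SELECT ml_email FROM mg WHERE email_id IN (?, ?) "
    "GROUP BY ml_email HAVING COUNT(*) > 1", group_ids).fetchall()
print(within)  # no duplicates inside this group
```

The fix in the real query would be to restrict b's rows to the group as well, or to drop the self-join entirely for a GROUP BY/HAVING over the group's rows.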
Re: Advice on optimizing an application
You might try something like this:

    SELECT *
    FROM t1
    WHERE NOT EXISTS (SELECT * FROM t2 WHERE t1.email = t2.email);

Without being more familiar with your use case, it is very difficult to suggest much.

Les Mizzell lesm...@bellsouth.net wrote:
[quoted message trimmed; the full text appears in the previous message]
Advice on optimizing an application
Evening,

I'm still fighting with my Excel Import application a bit. Basically, it works pretty well, as long as there's a limited number of records being imported. Problem is, the client tried to import a list with almost 15,000 addresses today, and it eventually timed out or the server reset the connection before it finished the import. Then my phone rings... The app does several things:

1. Import 4 columns from an Excel sheet [email][first name][last name][company] into an array
2. Check the email address to see if it's valid. If not, delete that row from the array
3. Check the email address to see if it's been previously unsubscribed from the list and delete that row from the array
4. Check the list against other email addresses already in the database for that list, and delete that row so as to not import it if it's already there
5. After all that, write the record

Running some tests with the 15,000 address sheet: reading the Excel sheet in and running steps 1 through 4 above is taking well over 10 minutes by itself. Writing the records after that adds enough extra time to be sure it borks up. Smaller lists (several hundred addresses) seem to import fine - MUCH faster than the former POI-based app. So my question is - I need to optimize something somewhere to cut down CPU cycles and increase efficiency. The client uses this thing multiple times every day. Some of the code is from an example by Raymond Camden. Thanks Raymond. Full app below. Where are my major bottlenecks and what can I do about them?

    <cftry>
    <cfset theISSUE = true />
    <cfif structKeyExists(form, "email_list") and len(form.email_list)>

        <!--- Each mail list has to belong to a group: 1. If a New Group - add to groups table and get ID --->
        <cfif IsDefined("form.newGROUP") AND form.newGROUP NEQ "">
            <cfquery name="addGROUP" datasource="#req.datasource#" username="#username#" password="#password#">
                SET NOCOUNT ON
                INSERT into nl_groupLIST ( mgroup, status )
                VALUES ( <cfqueryparam value="#trim(form.newGROUP)#" cfsqltype="CF_SQL_VARCHAR">, 'act' )
                SELECT @@IDENTITY AS newGROUPID
                SET NOCOUNT OFF
            </cfquery>
            <cfset req.thisGROUPID = addGROUP.newGROUPID />
        </cfif>

        <!--- 2. If an existing group - just get the id --->
        <cfif IsDefined("form.mgroup") AND form.mgroup NEQ "xx"><cfset req.thisGROUPID = form.mgroup /></cfif>

        <!--- Get the Excel Sheet and send it to a destination outside of web root --->
        <cfset dest = getTempDirectory()>
        <cffile action="upload" destination="#dest#" filefield="email_list" result="upload" nameconflict="makeunique">

        <cfif upload.fileWasSaved>
            <cfset theFile = upload.serverDirectory & "/" & upload.serverFile>
            <cfif isSpreadsheetFile(theFile)>
                <cfspreadsheet action="read" src="#theFile#" query="data">
                <cffile action="delete" file="#theFile#">
            <cfelse>
                <cfset theISSUE = "invalid" />
                <cfset errors = "The file was not an Excel file.">
                <cffile action="delete" file="#theFile#">
            </cfif>
        <cfelse>
            <cfset theISSUE = "loading" />
            <cfset errors = "The file was not properly uploaded.">
        </cfif>
    </cfif>

    <cfset metadata = getMetadata(data)>
    <cfset colList = "">
    <cfloop index="col" array="#metadata#">
        <cfset colList = listAppend(colList, col.name)>
    </cfloop>

    <cfif data.recordCount is 0>
        <cfset theISSUE = "empty" />
    <cfelse>
        <!--- Create an array and read the data into it --->
        <cfset mailREC = arraynew(2)>
        <cfoutput query="data">
            <cfif IsDefined("data.col_1") AND isValid("email", data.col_1)><cfset email = trim(data.col_1) /><cfelse><cfset email = "-" /></cfif>
            <cfif IsDefined("data.col_2") AND len(data.col_2)><cfset fname = trim(data.col_2) /><cfelse><cfset fname = "-" /></cfif>
            <cfif IsDefined("data.col_3") AND len(data.col_3)><cfset lname = trim(data.col_3) /><cfelse><cfset lname = "-" /></cfif>
            <cfif IsDefined("data.col_4") AND len(data.col_4)><cfset other = trim(data.col_4) /><cfelse><cfset other = "-" /></cfif>
            <cfset mailREC[CurrentRow][1] = email />
            <cfset mailREC[CurrentRow][2] = fname />
            <cfset mailREC[CurrentRow][3] = lname />
            <cfset mailREC[CurrentRow][4] = other />
        </cfoutput>

        <!--- Before we can save the list, we need to run two checks --->
        <!--- First, is there an email address in this list that has previously unsubscribed? --->
        <cfoutput>
        <cfloop from="#arrayLen(mailREC)#" to="1" step="-1" index="i">
            <!--- Delete the row if it was an invalid address. Got to be a way to fix this, but
                  the data output block above wants to add an invalid row if you try to skip
                  the entire row when the address is invalid --->
            <cfif mailREC[i][1] eq "-">
                <cfset ArrayDeleteAt(mailREC, i)>
            <cfelse>
                <!--- OK, it's a valid address. Have they unsubscribed before? --->
                <cfquery name="unsubbed" datasource="#req.datasource#" username="#username#" password="#password#">
                    SELECT groups_id
                    FROM maillist_unsubscribe
                    WHERE dateRESUB IS NULL