[api-dev] Extremely slow performance with getCellByPosition
That's exactly what I needed. Thank you so much. The time went from two hours down to about six seconds. I still don't understand why using getCellByPosition is so horribly slow, but my particular application doesn't depend on it now, so I'm happy. If I can be of any help tracking down the source of the abysmal performance, please don't hesitate to ask.
Re: [api-dev] Extremely slow performance with getCellByPosition
Hi, you have again posted on a mailing list where you need subscription. To ensure that you will get all replies to your emails directly please subscribe under http://api.openoffice.org/servlets/ProjectMailingListList Thanks Juergen On 11/9/10 12:29 AM, Scott Deerwester wrote: That's exactly what I needed. Thank you so much. The time went from two hours down to about six seconds. I still don't understand why using getCellByPosition is so horribly slow, but my particular application doesn't depend on it now, so I'm happy. If I can be of any help tracking down the source of the abysmal performance, please don't hesitate to ask. - To unsubscribe, e-mail: dev-unsubscr...@api.openoffice.org For additional commands, e-mail: dev-h...@api.openoffice.org
Re: [api-dev] Extremely slow performance with getCellByPosition
Hi, I used OOoBasic, which is not really fast. For the test my sheet had numbers on the first 637 rows, but the macro runs on 27001 rows. Sub Main Dim sh As Object, myCell As Object Dim x As Long, y As Long, t As Long sh = ThisComponent.Sheets(0) t = GetSystemTicks for y = 0 to 27000 for x = 0 to 32 myCell = sh.getCellByPosition(x,y) next next MsgBox(Duration : GetSystemTicks -t milliseconds) End Sub Result : approx. 110 seconds (less than 2 minutes) Windows XP SP3, Intel Core 2CPU at 2.13 GHz Your spreadsheet probably has something special. Or you do something else in the loop. Try to load the csv version instead of the original Excel file. Regards Bernard Message de Andrew Pitonyak date 2010-11-08 00:04 : On Fri, 5 Nov 2010 21:24:08 -0400, Scott Deerwester scott.deerwes...@gmail.com wrote: The following code (in Python): for r in range(dataRange.StartRow, dataRange.EndRow): for c in range(dataRange.StartColumn, dataRange.EndColumn): cell = sheet.getCellByPosition(c,r) takes nearly two hours to run on a reasonably fast workstation for a spreadsheet with 32 columns and ~27,000 rows (about .2 seconds per row). By comparison, opening the file takes about 8 seconds and saving the file to a CSV (which is functionally equivalent to the above) takes a few seconds. The original file was written in Excel as an XLS (not XLSX). This seems impossibly slow. Am I misusing the API somehow? What can I do to speed it up? - To unsubscribe, e-mail: dev-unsubscr...@api.openoffice.org For additional commands, e-mail: dev-h...@api.openoffice.org
[api-dev] Extremely slow performance with getCellByPosition
On Sun, 07 Nov 2010 18:04:36 -0500 Andrew Pitonyak and...@pitonyak.org wrote: On Fri, 5 Nov 2010 21:24:08 -0400, Scott Deerwester scott.deerwes...@gmail.com wrote: The following code (in Python): for r in range(dataRange.StartRow, dataRange.EndRow): for c in range(dataRange.StartColumn, dataRange.EndColumn): cell = sheet.getCellByPosition(c,r) takes nearly two hours to run on a reasonably fast workstation for a spreadsheet with 32 columns and ~27,000 rows (about .2 seconds per row). By comparison, opening the file takes about 8 seconds and saving the file to a CSV (which is functionally equivalent to the above) takes a few seconds. The original file was written in Excel as an XLS (not XLSX). This seems impossibly slow. Am I misusing the API somehow? What can I do to speed it up? First, I usually determine which cells are used. Depending on the operations, this may be a gross test such as simply finding the largest used block. Next, if I must access the data in the cells, I obtain the data by getting the entire range and then calling getDataArray() (or getData, depending on my need / usage). I have a section on timing in AndrewMacor.odt where I search a Calc document using different methods. Thanks for the response, Andrew. I'm having trouble understanding how to use cursors, cell ranges and cell range data. What I want to do is: - Create an aggregate that has a particular (rectangular) data range that is a subset of the used cells. - Call an efficient accessor (getDataArray sounds promising) to go over the cells and write them out to a tab-separated file, with some processing for date formats first. I don't think it exists, but what I'd like to do would be: cellRange = sheet.createRangeFromPosition(startRow, startCol, endRow, endCol) for rowObj in cellRange.getDataArray(): for colNum in range(len(rowObj)): value = rowObj[colNum] if needSpecialProcessing(startCol + colNum): doSpecialProcessing(value) else: print value Barring that, I could do something like: cellRange = sheet.getDataArray() for rowNum in range(len(cellRange)): if rowNum = startRow and rowNum = endRow: rowObj = cellRange[rowNum] for colNum in range(len(rowObj)): if colNum = startCol and colNum = endCol: value = rowObj[colNum] if needSpecialProcessing(colNum): doSpecialProcessing(value) else: print value but that ends up retrieving a lot more data than I need, and in case isn't very elegant. Could you give me a simple example of how you'd generate something that generates the data array, given a starting and ending row and column?
Re: [api-dev] Extremely slow performance with getCellByPosition
On Mon, 8 Nov 2010 12:43:59 -0500, Scott Deerwester scott.deerwes...@gmail.com wrote: On Sun, 07 Nov 2010 18:04:36 -0500 Andrew Pitonyak and...@pitonyak.org wrote: On Fri, 5 Nov 2010 21:24:08 -0400, Scott Deerwester scott.deerwes...@gmail.com wrote: The following code (in Python): for r in range(dataRange.StartRow, dataRange.EndRow): for c in range(dataRange.StartColumn, dataRange.EndColumn): cell = sheet.getCellByPosition(c,r) takes nearly two hours to run on a reasonably fast workstation for a spreadsheet with 32 columns and ~27,000 rows (about .2 seconds per row). By comparison, opening the file takes about 8 seconds and saving the file to a CSV (which is functionally equivalent to the above) takes a few seconds. The original file was written in Excel as an XLS (not XLSX). This seems impossibly slow. Am I misusing the API somehow? What can I do to speed it up? First, I usually determine which cells are used. Depending on the operations, this may be a gross test such as simply finding the largest used block. Next, if I must access the data in the cells, I obtain the data by getting the entire range and then calling getDataArray() (or getData, depending on my need / usage). I have a section on timing in AndrewMacor.odt where I search a Calc document using different methods. Thanks for the response, Andrew. I'm having trouble understanding how to use cursors, cell ranges and cell range data. What I want to do is: - Create an aggregate that has a particular (rectangular) data range that is a subset of the used cells. - Call an efficient accessor (getDataArray sounds promising) to go over the cells and write them out to a tab-separated file, with some processing for date formats first. I don't think it exists, but what I'd like to do would be: cellRange = sheet.createRangeFromPosition(startRow, startCol, endRow, endCol) for rowObj in cellRange.getDataArray(): for colNum in range(len(rowObj)): value = rowObj[colNum] if needSpecialProcessing(startCol + colNum): doSpecialProcessing(value) else: print value Barring that, I could do something like: cellRange = sheet.getDataArray() for rowNum in range(len(cellRange)): if rowNum = startRow and rowNum = endRow: rowObj = cellRange[rowNum] for colNum in range(len(rowObj)): if colNum = startCol and colNum = endCol: value = rowObj[colNum] if needSpecialProcessing(colNum): doSpecialProcessing(value) else: print value but that ends up retrieving a lot more data than I need, and in case isn't very elegant. Could you give me a simple example of how you'd generate something that generates the data array, given a starting and ending row and column? I cannot easily find or create an example from my present location (I am traveling and working from memory), but, you can get a range from a sheet by position. http://api.openoffice.org/docs/common/ref/com/sun/star/table/XCellRange.html So, to obtain B1:c6 Left = 1 = column B Top = 0 = row 1 Right = 2 = C Bottom = 5 = row 6 DIm od DIm rown As Long Dim coln As Long od = ThisComponent.getSheets().getSheetByIndex(0).getCellRangeByPosition(1, 0, 2, 5).getDataArray() for rown = LBound(od) To UBound(od) oDataRow = od(rown) For coln = LBound(oDataRow) to UBound(oDataRow) oDataRow(coln) = rown + coln Next Next ThisComponent.getSheets().getSheetByIndex(0).getCellRangeByPosition(1, 0, 2, 5).setDataArray(od) Also, note Bernard's response with respect to the speed that he observed. Finally, you want to get the used range for the sheet, not the entire sheet I expect. - To unsubscribe, e-mail: dev-unsubscr...@api.openoffice.org For additional commands, e-mail: dev-h...@api.openoffice.org
Re: [api-dev] Extremely slow performance with getCellByPosition
On Fri, 5 Nov 2010 21:24:08 -0400, Scott Deerwester scott.deerwes...@gmail.com wrote: The following code (in Python): for r in range(dataRange.StartRow, dataRange.EndRow): for c in range(dataRange.StartColumn, dataRange.EndColumn): cell = sheet.getCellByPosition(c,r) takes nearly two hours to run on a reasonably fast workstation for a spreadsheet with 32 columns and ~27,000 rows (about .2 seconds per row). By comparison, opening the file takes about 8 seconds and saving the file to a CSV (which is functionally equivalent to the above) takes a few seconds. The original file was written in Excel as an XLS (not XLSX). This seems impossibly slow. Am I misusing the API somehow? What can I do to speed it up? First, I usually determine which cells are used. Depending on the operations, this may be a gross test such as simply finding the largest used block. Next, if I must access the data in the cells, I obtain the data by getting the entire range and then calling getDataArray() (or getData, depending on my need / usage). I have a section on timing in AndrewMacor.odt where I search a Calc document using different methods. - To unsubscribe, e-mail: dev-unsubscr...@api.openoffice.org For additional commands, e-mail: dev-h...@api.openoffice.org
[api-dev] Extremely slow performance with getCellByPosition
The following code (in Python): for r in range(dataRange.StartRow, dataRange.EndRow): for c in range(dataRange.StartColumn, dataRange.EndColumn): cell = sheet.getCellByPosition(c,r) takes nearly two hours to run on a reasonably fast workstation for a spreadsheet with 32 columns and ~27,000 rows (about .2 seconds per row). By comparison, opening the file takes about 8 seconds and saving the file to a CSV (which is functionally equivalent to the above) takes a few seconds. The original file was written in Excel as an XLS (not XLSX). This seems impossibly slow. Am I misusing the API somehow? What can I do to speed it up?