[api-dev] Extremely slow performance with getCellByPosition

2010-11-09 Thread Scott Deerwester
That's exactly what I needed. Thank you so much. The time went from two
hours down to about six seconds. I still don't understand why using
getCellByPosition is so horribly slow, but my particular application doesn't
depend on it now, so I'm happy. If I can be of any help tracking down the
source of the abysmal performance, please don't hesitate to ask.


Re: [api-dev] Extremely slow performance with getCellByPosition

2010-11-09 Thread Jürgen Schmidt
Hi,

you have again posted on a mailing list where you need
subscription. To ensure that you will get all replies to your emails
directly please subscribe under
http://api.openoffice.org/servlets/ProjectMailingListList

Thanks

Juergen


On 11/9/10 12:29 AM, Scott Deerwester wrote:
 That's exactly what I needed. Thank you so much. The time went from two
 hours down to about six seconds. I still don't understand why using
 getCellByPosition is so horribly slow, but my particular application doesn't
 depend on it now, so I'm happy. If I can be of any help tracking down the
 source of the abysmal performance, please don't hesitate to ask.
 


-
To unsubscribe, e-mail: dev-unsubscr...@api.openoffice.org
For additional commands, e-mail: dev-h...@api.openoffice.org



Re: [api-dev] Extremely slow performance with getCellByPosition

2010-11-08 Thread Bernard Marcelly

Hi,

I used OOoBasic, which is not really fast.
For the test my sheet had numbers on the first 637 rows, but the macro runs on 
27001 rows.


Sub Main
Dim sh As Object, myCell As Object
Dim x As Long, y As Long, t As Long

sh = ThisComponent.Sheets(0)
t = GetSystemTicks
for y =  0 to 27000
  for x = 0 to 32
myCell = sh.getCellByPosition(x,y)
  next
next
MsgBox(Duration :   GetSystemTicks -t   milliseconds)
End Sub

Result : approx. 110 seconds (less than 2 minutes)
Windows XP SP3, Intel Core 2CPU at 2.13 GHz

Your spreadsheet probably has something special. Or you do something else in the 
loop.

Try to load the csv version instead of the original Excel file.

Regards
  Bernard

Message de Andrew Pitonyak  date 2010-11-08 00:04 :


On Fri, 5 Nov 2010 21:24:08 -0400, Scott Deerwester
scott.deerwes...@gmail.com  wrote:

The following code (in Python):

for r in range(dataRange.StartRow, dataRange.EndRow):
 for c in range(dataRange.StartColumn, dataRange.EndColumn):
 cell = sheet.getCellByPosition(c,r)


takes nearly two hours to run on a reasonably fast workstation for a
spreadsheet with 32 columns and ~27,000 rows (about .2 seconds per row).

By

comparison, opening the file takes about 8 seconds and saving the file

to a

CSV (which is functionally equivalent to the above) takes a few seconds.
The
original file was written in Excel as an XLS (not XLSX). This seems
impossibly slow. Am I misusing the API somehow? What can I do to speed

it

up?





-
To unsubscribe, e-mail: dev-unsubscr...@api.openoffice.org
For additional commands, e-mail: dev-h...@api.openoffice.org



[api-dev] Extremely slow performance with getCellByPosition

2010-11-08 Thread Scott Deerwester
On Sun, 07 Nov 2010 18:04:36 -0500 Andrew Pitonyak and...@pitonyak.org wrote:

 On Fri, 5 Nov 2010 21:24:08 -0400, Scott Deerwester 
 scott.deerwes...@gmail.com wrote:
 The following code (in Python):

 for r in range(dataRange.StartRow, dataRange.EndRow):
 for c in range(dataRange.StartColumn, dataRange.EndColumn):
 cell = sheet.getCellByPosition(c,r)


 takes nearly two hours to run on a reasonably fast workstation for a
 spreadsheet with 32 columns and ~27,000 rows (about .2 seconds per row). By
 comparison, opening the file takes about 8 seconds and saving the file to a
 CSV (which is functionally equivalent to the above) takes a few seconds.
 The
 original file was written in Excel as an XLS (not XLSX). This seems
 impossibly slow. Am I misusing the API somehow? What can I do to speed it
 up?


 First, I usually determine which cells are used. Depending on the
 operations, this may be a gross test such as simply finding the largest
 used block.

 Next, if I must access the data in the cells, I obtain the data by getting
 the entire range and then calling getDataArray() (or getData, depending on
 my need / usage).

 I have a section on timing in AndrewMacor.odt where I search a Calc
 document using different methods.

Thanks for the response, Andrew. I'm having trouble understanding how
to use cursors, cell ranges and cell range data. What I want to do is:

- Create an aggregate that has a particular (rectangular) data range
that is a subset of the used cells.
- Call an efficient accessor (getDataArray sounds promising) to go
over the cells and write them out to a tab-separated file, with some
processing for date formats first.

I don't think it exists, but what I'd like to do would be:

  cellRange = sheet.createRangeFromPosition(startRow, startCol, endRow, endCol)

  for rowObj in cellRange.getDataArray():
for colNum in range(len(rowObj)):
  value = rowObj[colNum]
  if needSpecialProcessing(startCol + colNum):
doSpecialProcessing(value)
  else:
print value

Barring that, I could do something like:

  cellRange = sheet.getDataArray()

  for rowNum in range(len(cellRange)):
if rowNum = startRow and rowNum = endRow:
  rowObj = cellRange[rowNum]
  for colNum in range(len(rowObj)):
if colNum = startCol and colNum = endCol:
  value = rowObj[colNum]
  if needSpecialProcessing(colNum):
doSpecialProcessing(value)
  else:
print value

but that ends up retrieving a lot more data than I need, and in case
isn't very elegant. Could you give me a simple example of how you'd
generate something that generates the data array, given a starting and
ending row and column?


Re: [api-dev] Extremely slow performance with getCellByPosition

2010-11-08 Thread Andrew Pitonyak
On Mon, 8 Nov 2010 12:43:59 -0500, Scott Deerwester
scott.deerwes...@gmail.com wrote:
 On Sun, 07 Nov 2010 18:04:36 -0500 Andrew Pitonyak and...@pitonyak.org
 wrote:

 On Fri, 5 Nov 2010 21:24:08 -0400, Scott Deerwester
 scott.deerwes...@gmail.com wrote:
 The following code (in Python):

 for r in range(dataRange.StartRow, dataRange.EndRow):
 for c in range(dataRange.StartColumn, dataRange.EndColumn):
 cell = sheet.getCellByPosition(c,r)


 takes nearly two hours to run on a reasonably fast workstation for a
 spreadsheet with 32 columns and ~27,000 rows (about .2 seconds per
 row). By
 comparison, opening the file takes about 8 seconds and saving the file
 to a
 CSV (which is functionally equivalent to the above) takes a few
seconds.
 The
 original file was written in Excel as an XLS (not XLSX). This seems
 impossibly slow. Am I misusing the API somehow? What can I do to speed
 it
 up?


 First, I usually determine which cells are used. Depending on the
 operations, this may be a gross test such as simply finding the largest
 used block.

 Next, if I must access the data in the cells, I obtain the data by
 getting
 the entire range and then calling getDataArray() (or getData, depending
 on
 my need / usage).

 I have a section on timing in AndrewMacor.odt where I search a Calc
 document using different methods.
 
 Thanks for the response, Andrew. I'm having trouble understanding how
 to use cursors, cell ranges and cell range data. What I want to do is:
 
 - Create an aggregate that has a particular (rectangular) data range
 that is a subset of the used cells.
 - Call an efficient accessor (getDataArray sounds promising) to go
 over the cells and write them out to a tab-separated file, with some
 processing for date formats first.
 
 I don't think it exists, but what I'd like to do would be:
 
   cellRange = sheet.createRangeFromPosition(startRow, startCol, endRow,
   endCol)
 
   for rowObj in cellRange.getDataArray():
 for colNum in range(len(rowObj)):
   value = rowObj[colNum]
   if needSpecialProcessing(startCol + colNum):
 doSpecialProcessing(value)
   else:
 print value
 
 Barring that, I could do something like:
 
   cellRange = sheet.getDataArray()
 
   for rowNum in range(len(cellRange)):
 if rowNum = startRow and rowNum = endRow:
   rowObj = cellRange[rowNum]
   for colNum in range(len(rowObj)):
 if colNum = startCol and colNum = endCol:
   value = rowObj[colNum]
 if needSpecialProcessing(colNum):
   doSpecialProcessing(value)
 else:
   print value
 
 but that ends up retrieving a lot more data than I need, and in case
 isn't very elegant. Could you give me a simple example of how you'd
 generate something that generates the data array, given a starting and
 ending row and column?

I cannot easily find or create an example from my present location (I am
traveling and working from memory), but, you can get a range from a sheet
by position.

http://api.openoffice.org/docs/common/ref/com/sun/star/table/XCellRange.html

So, to obtain B1:c6

Left = 1 = column B
Top = 0 = row 1
Right = 2 = C
Bottom = 5 = row 6

DIm od
DIm rown As Long
Dim coln As Long

od =
ThisComponent.getSheets().getSheetByIndex(0).getCellRangeByPosition(1, 0,
2, 5).getDataArray()

for rown = LBound(od) To UBound(od)
  oDataRow = od(rown)
  For coln = LBound(oDataRow) to UBound(oDataRow)
oDataRow(coln) = rown + coln
  Next
Next
ThisComponent.getSheets().getSheetByIndex(0).getCellRangeByPosition(1, 0,
2, 5).setDataArray(od)

Also, note Bernard's response with respect to the speed that he observed.

Finally, you want to get the used range for the sheet, not the entire
sheet I expect.

-
To unsubscribe, e-mail: dev-unsubscr...@api.openoffice.org
For additional commands, e-mail: dev-h...@api.openoffice.org



Re: [api-dev] Extremely slow performance with getCellByPosition

2010-11-07 Thread Andrew Pitonyak
On Fri, 5 Nov 2010 21:24:08 -0400, Scott Deerwester
scott.deerwes...@gmail.com wrote:
 The following code (in Python):
 
 for r in range(dataRange.StartRow, dataRange.EndRow):
 for c in range(dataRange.StartColumn, dataRange.EndColumn):
 cell = sheet.getCellByPosition(c,r)
 
 
 takes nearly two hours to run on a reasonably fast workstation for a
 spreadsheet with 32 columns and ~27,000 rows (about .2 seconds per row).
By
 comparison, opening the file takes about 8 seconds and saving the file
to a
 CSV (which is functionally equivalent to the above) takes a few seconds.
 The
 original file was written in Excel as an XLS (not XLSX). This seems
 impossibly slow. Am I misusing the API somehow? What can I do to speed
it
 up?


First, I usually determine which cells are used. Depending on the
operations, this may be a gross test such as simply finding the largest
used block.

Next, if I must access the data in the cells, I obtain the data by getting
the entire range and then calling getDataArray() (or getData, depending on
my need / usage).

I have a section on timing in AndrewMacor.odt where I search a Calc
document using different methods.


-
To unsubscribe, e-mail: dev-unsubscr...@api.openoffice.org
For additional commands, e-mail: dev-h...@api.openoffice.org



[api-dev] Extremely slow performance with getCellByPosition

2010-11-06 Thread Scott Deerwester
The following code (in Python):

for r in range(dataRange.StartRow, dataRange.EndRow):
for c in range(dataRange.StartColumn, dataRange.EndColumn):
cell = sheet.getCellByPosition(c,r)


takes nearly two hours to run on a reasonably fast workstation for a
spreadsheet with 32 columns and ~27,000 rows (about .2 seconds per row). By
comparison, opening the file takes about 8 seconds and saving the file to a
CSV (which is functionally equivalent to the above) takes a few seconds. The
original file was written in Excel as an XLS (not XLSX). This seems
impossibly slow. Am I misusing the API somehow? What can I do to speed it
up?