Thanks, always here to help.

On 11/27/06, Bob Gailer <[EMAIL PROTECTED]> wrote:

Bokverket wrote:
> I did program a lot in VB's earlier versions, but it has grown... My reason
> for not considering VB was that the actual processing would make excellent
> use of the Python collection objects /dictionaries/, which in my mind would
> hold words of the Microsoft Word document.
Are you aware that VBA with the Scripting Runtime offers a Dictionary
object almost identical to Python's dict? Here is a VB Sub that counts
the words in the document:

Sub test()
Dim d As Document, w As Variant, w2 As String
Dim dict As Object
Set dict = CreateObject("Scripting.Dictionary")
Set d = Documents(1)
For Each w In d.Range().Words
  w2 = Trim(w)
  If dict.Exists(w2) Then
    dict(w2) = dict(w2) + 1
  Else
    dict(w2) = 1  ' first occurrence counts as 1
  End If
Next
End Sub

For my test document of about 21000 occurrences of 120 words this took
about 5 seconds. The Python equivalent takes 0.15 seconds.
>>> import time
>>> import win32com.client
>>> a = win32com.client.Dispatch("word.application")
>>> d = a.Documents(1)
# wrap the process to get the text from the document, split it into
# words and build the dictionary
>>> def f():
...      t=time.time()
...      s=d.Range().Text
...      w=s.split()
...      wd={}
...      for i in w:
...          wd[i]=wd.setdefault(i,0)+1
...      print time.time()-t
...
>>> f()
0.15700006485
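
The counting loop in f() can also be tried without Word at all. Here is a minimal sketch, assuming Python 3 and a made-up sample string standing in for d.Range().Text. collections.Counter is in the standard library; count_words and sample are just illustrative names, not anything from pywin32.

```python
from collections import Counter


def count_words(text):
    """Split text on whitespace and count each distinct word."""
    return Counter(text.split())


# Hypothetical sample standing in for d.Range().Text
sample = "the cat sat on the mat the end"
counts = count_words(sample)

# Show the most frequent word and its count
print(counts.most_common(1))
```

Counter is a dict subclass, so counts["the"] works just like wd["the"] in the session above.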

> (The app's purpose is to analyze words of possibly very large Word
> documents.) Plus I suppose that a macro which would loop with a few lines
> over each word of the doc will be slow, although I don't know if there is a
> compiling or byte-code mechanism.  Am I wrong?
>

> I don't know if having VB as glue to shelling Python is perfectly fine
> performance-wise, and it certainly would be a simple way to handle the
> dialog boxes that collect the parameters.  Maybe that is a much better way
> than wondering about calling Python /shelling, calling a DLL, whatever/
> directly.
>
> Next question: Is Microsoft Word's API for Python published like for VB and
> easy to use?
>
Word has one API. It is what is published for VB. Your Python program
would use win32com.client to launch Word as a COM server, then interact
with it the same as a VB program (well, almost the same). For this you
need pywin32 http://sourceforge.net/projects/pywin32/.

import win32com.client
application = win32com.client.Dispatch("word.application")
# application is the same as the Application object you see at the top
# of the Word DOM in VB.
document = application.Documents.Add()  # creates a new document
# OR use application.Documents.Open(filename) to open an existing document.
# OR, if Word is already running, you can access an existing document
# using application.Documents(indexno OR name)

How is this different from VB? Objects do not have default properties, so
you must be explicit. There is no Set statement. Functions and subs must
have the () appended.
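
To make those differences concrete, here is a small sketch. The helper names get_document and body_text are made up for illustration; the only Word calls used are Documents.Add(), Documents.Open(filename) and Range().Text, as mentioned in this message. The helpers take an already-dispatched application object, so nothing here imports pywin32 by itself.

```python
def get_document(application, filename=None):
    """Return a document from an already-dispatched Word application.

    Opens filename if one is given, otherwise creates a new document.
    """
    if filename is not None:
        return application.Documents.Open(filename)
    # Methods need the explicit (), unlike a VB call with no arguments.
    return application.Documents.Add()


def body_text(document):
    """Return the body text of a document."""
    # No default properties in Python: ask for .Text explicitly,
    # and assign with plain "=" (there is no Set statement).
    return document.Range().Text


# Usage on Windows with Word and pywin32 installed (not run here):
#   import win32com.client
#   app = win32com.client.Dispatch("word.application")
#   doc = get_document(app)
#   print(repr(body_text(doc)))
```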

Hope that's enough to get you started.

Since your goal seems to be text processing I'd think you'd want to read
the entire document text into a Python string, then manipulate that.
text = document.Range().Text will get all the text of the document body
(excluding headers and footers). Note that paragraph breaks are \r, and that
table cells end in \r\x07.
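
As a rough illustration of handling those markers, the sketch below turns such a string into a list of plain lines. It assumes only what is stated above (paragraphs end in \r, table cells end in \r\x07). The function name word_text_to_lines and the sample string are invented for the example; real document text may contain other control characters this does not handle.

```python
def word_text_to_lines(text):
    """Split Word body text into non-empty lines.

    Assumes paragraphs end in a carriage return and table cells end in
    a carriage return followed by the cell marker (byte 0x07).
    """
    # Drop the cell marker first, then split on the paragraph marks.
    cleaned = text.replace("\x07", "")
    return [line for line in cleaned.split("\r") if line.strip()]


# Hypothetical sample: a title, a two-cell table row, and a paragraph
sample = "Title\rcell one\r\x07cell two\r\x07\r\x07Last paragraph\r"
lines = word_text_to_lines(sample)
print(lines)
```

From there the lines (or " ".join(lines)) can be split into words and counted as usual.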

--
Bob Gailer
510-978-4454

_______________________________________________
Python-win32 mailing list
Python-win32@python.org
http://mail.python.org/mailman/listinfo/python-win32




--
http://www.goldwatches.com