Re: Performance question (Am I doing something wrong?)

Jeff Greif 10 Nov 2002 18:33:03 -0000

Each invocation of the command line tool starts a new process, probably
reads some configuration files, and opens a CORBA connection. That alone
probably costs a second or more, which would account for a good part of the
2-second floor you are seeing on every operation.
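
One rough way to put a number on that fixed cost is to time the same command
several times from a script and look at the fastest run. The sketch below is
only an illustration in Python, not anything that ships with Xindice: the
function name time_command is made up, and when no command is given it times
a do-nothing Python process as a stand-in.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, repeats=5):
    """Run argv `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Output is discarded so that terminal I/O does not distort the timing.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Assumption: the command to time is passed on the command line.
    # With no arguments, a trivial Python process is timed instead.
    argv = sys.argv[1:] or [sys.executable, "-c", "pass"]
    timings = time_command(argv)
    print("min %.3fs  median %.3fs"
          % (min(timings), statistics.median(timings)))
```

Saved under some name such as time_cmd.py (hypothetical), it could be pointed
at the query from the original message:

     python time_cmd.py xindice xpath -c /db/myCollection -q '/TEI.2/text/body/s[w="another"]'

The minimum over several runs is a fair estimate of the per-invocation
overhead plus the query itself. Timing a query that matches nothing gets
closer to the overhead alone.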


You might try running under conditions closer to a production app: take one
of the sample apps from the developer's guide and modify it to run a set of
queries (possibly read from a file or the command line) in a loop within a
single process, perhaps with a varying number of passes through the list,
possibly selecting a query at random on each attempt. Try to make the set of
queries large enough that not all of the results are cached when you repeat,
or run the test twice, once with caching off if this can be done, to see the
difference.
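
The shape of such a test loop can be sketched as follows. This is a generic
harness in Python, not Xindice code: run_query is a placeholder for whatever
function sends one XPath query over an already-open connection, and the
second query string is just an example based on the sample text. It shuffles
the query list on each pass, which is one variant of the random selection
suggested above.

```python
import random
import statistics
import time


def benchmark(run_query, queries, passes=3, shuffle=True, seed=0):
    """Time run_query(q) for every query, `passes` times over the list.

    run_query is a placeholder callable supplied by the caller; it should
    reuse one long-lived connection so that startup cost is not measured.
    """
    rng = random.Random(seed)
    timings = {q: [] for q in queries}
    for _ in range(passes):
        order = list(queries)
        if shuffle:
            # A different order on each pass makes cache effects less regular.
            rng.shuffle(order)
        for q in order:
            start = time.perf_counter()
            run_query(q)
            timings[q].append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Map each query to (fastest, median) seconds."""
    return {q: (min(ts), statistics.median(ts)) for q, ts in timings.items()}


if __name__ == "__main__":
    def fake_query(xpath):
        # Stand-in only: replace with a real call into the database client.
        return len(xpath)

    queries = [
        '/TEI.2/text/body/s[w="another"]',
        '/TEI.2/text/body/s[w="sentence"]',
    ]
    for q, (best, med) in summarize(benchmark(fake_query, queries)).items():
        print("%-40s min %.6fs  median %.6fs" % (q, best, med))
```

Comparing the first pass with later passes for the same query gives a rough
idea of how much caching contributes.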

I would have expected a much larger improvement using the index.  It's
conceivable that an index on 's' might help also.

It would also be good to test with more documents to see how things scale,
since you can't tell this from what you have in the collection now.

Just out of curiosity, what part-of-speech tagger are you using?

Jeff

----- Original Message -----
From: "Beni Ruef" <[EMAIL PROTECTED]>
To: <xindice-users@xml.apache.org>
Sent: Sunday, November 10, 2002 6:50 AM
Subject: Performance question (Am I doing something wrong?)


> I just installed Xindice 1.0 on my iBook under Mac OS X (with enough
> RAM, i.e. a reasonably fast machine).
>
> The (TEI encoded) texts I'm interested in look like this:
>
>      .
>      .
>      <s n="id-1.1"><w pos="DD1">This</w> <w pos="VBZ">is</w> <w
> pos="AT1">a</w><w pos="NN1">sentence</w><c pos="YSTP">.</c></s>
>      <s n="id-1.2"><w pos="DD1">This</w> <w pos="VBZ">is</w> <w
> pos="DD1">another</w><w pos="PN1">one</w><c pos="YSTP">.</c></s>
>    </body></text>
> </TEI.2>
>
> and the simplest queries like this:
>
>      xindice xpath -c /db/myCollection -q
> '/TEI.2/text/body/s[w="another"]'
>
> With a test corpus containing three documents and a total of ca. 700
> KBytes the above query takes ca. 9 seconds...
> More surprisingly, even a simple retrieval (xindice rd) of a 200K
> document needs 5 seconds!
>
> After having run
>
>      xindiceadmin ai -c /db/myCollection -p w -n wordform
>
> things improve slightly as the same query takes now 4.5 seconds, a query
> with a non-existing word form "only" 2.5 seconds.  BTW, there seems to
> be an overhead of ca. 2 seconds as any operation takes at least 2
> seconds...
>
> Obviously, this is still way too slow to be usable as I'm planning to
> work with corpora containing some 100 million words...
>
> So what am I doing (terribly ;-) wrong and what can be improved?!  What
> about this overhead and what about the indexer switches like pagesize?
>
> Thanks in advance, Cheers
> -Beni
>
>
