Sounds like a good sample to me. The collection could be smaller if your document is mostly tags. My guess is that the internal storage is not your raw document but a parsed version, with the tags probably represented by numbers. If the ratio of your data to your tags goes way up, then you will probably see a difference. I don't know this for a fact; I don't actually code Xindice, I just play a coder on television.
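A rough sketch of that guess, in Python. This is not Xindice's actual storage format; the `encode` function and the record layout (2-byte tag id, 2-byte child count, 4-byte text length) are made up purely to illustrate why a tag-heavy document can shrink once each tag name is stored a single time in a symbol table and referenced by number.

```python
import struct
import xml.etree.ElementTree as ET


def encode(xml_text):
    """Pack an XML document, replacing tag names with numeric symbol ids.

    Hypothetical layout, for illustration only. Attributes and tail text
    are ignored to keep the sketch short.
    """
    symbols = {}       # tag name -> small integer id
    out = bytearray()

    def walk(elem):
        sym = symbols.setdefault(elem.tag, len(symbols))
        text = (elem.text or "").strip().encode("utf-8")
        # Per element: tag id, number of children, text length, then the text.
        out.extend(struct.pack(">HHI", sym, len(elem), len(text)))
        out.extend(text)
        for child in elem:
            walk(child)

    walk(ET.fromstring(xml_text))
    # Each tag name is stored once here, not once per element.
    table = "\n".join(symbols).encode("utf-8")
    return bytes(out), table


# A tag-heavy sample document: long element names, tiny values.
doc = "<INVOICE>" + "".join(
    f"<BILL_INVOICE.bill_ref_no>{i}</BILL_INVOICE.bill_ref_no>"
    for i in range(1000)
) + "</INVOICE>"

body, table = encode(doc)
raw_size = len(doc.encode("utf-8"))
packed_size = len(body) + len(table)
print("raw bytes:", raw_size, "packed bytes:", packed_size)
```

With long element names and short values the packed form comes out well under the raw size. With long text values and few tags the saving mostly disappears, which is the ratio effect described above.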
Tom/Kimbro can give more info or correct me if I am wrong :-).

Mark

Sreeni Chippada wrote:

> For each dataset, I have taken a few samples.
> For example, for 16MB, items 2, 100, 185, 370. All gave 210ms
> For the 1GB, items 2, 500, 1000, 2000, 5000, 10000, 20000, 2300.
> Couple of time I saw 400ms. But when i repeate the query, it takes
> 210/220ms.
> I also have other stuff running on the laptop.
>
> Why is the collection size is approximately or less than half the size of
> the dataset size?
> I was expecting the collection size is going to be much bigger than the
> actual dataset size when inserted as a dom.
>
>
> -----Original Message-----
> From: Mark J. Stang [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 06, 2002 5:15 PM
> To: [email protected]
> Subject: Re: indexing/xpath query question
>
> Thanks for the information, I haven't had time to run the tests yet!
> It appears that your access time is constant, that makes me think
> that it is hitting the same document in the same place everytime.
> How random is your selection of the document?
>
> Or your machine and Xindice are fast enough that it doesn't matter ;-).
>
> thanks,
>
> Mark
>
> Sreeni Chippada wrote:
>
> > Hi Mark,
> > Here are the stats I gathered.
> > I will try to load a much bigger file later this week. I will post
> > those stats as well when I am done.
> >
> > Thanks,
> > Sreeni
> >
> > ********************* BEGIN *****************
> >
> > Hardware: Dell Inspiron 800 / 512 MB
> >
> > Software: Microsoft Win2K / JDK 1.4.0
> >
> > Dataset Size: 16MB
> > Number of Documents : 373
> > Load Type: DOM
> > Collection Size: 7.46MB
> > Insertion Time : 125s
> > Index Populating Time: 5s 738ms
> > Index Size: 386KB
> > Retrieval Time : 210ms for any item with index/ 28 Secs without index
> >
> > Dataset Size: 102.5MB
> > Number of Documents : 2389
> > Load Type: DOM
> > Collection Size: 37MB
> > Insertion Time : 770 secs
> > Index Populating Time: 32s 878ms
> > Index Size: 3.14MB
> > Retrieval Time : 210ms for any item with index/ 192 Secs without index
> >
> > Dataset Size: 1025MB
> > Number of Documents : ~23890
> > Load Type: DOM
> > Collection Size: 432MB
> > Insertion Time : 770 secs
> > Index Populating Time: 6m 28s 258ms
> > Index Size: 26.258MB
> > Retrieval Time : 220ms for any item with index/ Didn't try without indexing
> >
> > ********************** END ****************
> >
> > -----Original Message-----
> > From: Mark J. Stang [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, March 05, 2002 10:13 PM
> > To: [email protected]
> > Subject: Re: indexing/xpath query question
> >
> > How did your speed comparison with and without the index go?
> >
> > thanks,
> >
> > Mark
> >
> > Sreeni Chippada wrote:
> >
> > > Mark,
> > > That worked.
> > > xindiceadmin xpath -c /db/lucent -q "//*/BILL_INVOICE.bill_ref_no[text()='2']"
> > > or
> > > xindiceadmin xpath -c /db/lucent -q "//INVOICE/BILL_INVOICE.bill_ref_no[text()='2']"
> > > gives the expected result.
> > >
> > > Really appreciate your help. Thanks to all who responded to the
> > > mails.
> > >
> > > -Sreeni
> > >
> > > -----Original Message-----
> > > From: Mark J. Stang [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, March 05, 2002 7:32 PM
> > > To: [email protected]
> > > Subject: Re: indexing/xpath query question
> > >
> > > Sreeni,
> > >
> > > I tried it using differernt formats and the only one that worked was:
> > >
> > > xindiceadmin xpath -c /db/customers -q "//*/[EMAIL PROTECTED]'Stang']"
> > >
> > > In my case, my collection is customers, I am telling it to search everything
> > > starting at the root of the document looking for any tag named "name" that
> > > has an attribute "lname" with a value of 'Stang'. I had to put the quotes
> > > around the whole thing. It didn't work any other way.
> > >
> > > Mark
> > >
> > > Sreeni Chippada wrote:
> > >
> > > > Thanks Tom.
> > > > But I still do not know why this does not work.
> > > > xindiceadmin xpath -c /db/test -q /INVOICE/BILL_INVOICE.bill_ref_no[text()="2"]
> > > > Also tried using single quotes. Any suggestions? I tried from both command
> > > > line and using the java api.
> > > >
> > > > Thanks,
> > > > Sreeni
> > > >
> > > > -----Original Message-----
> > > > From: Tom Bradford [mailto:[EMAIL PROTECTED]
> > > > Sent: Tuesday, March 05, 2002 3:00 PM
> > > > To: [email protected]
> > > > Subject: Re: indexing/xpath query question
> > > >
> > > > On Monday, March 4, 2002, at 01:13 PM, Sreeni Chippada wrote:
> > > > > I am new to xindice. I added a few documents as DOMs and ran xpath
> > > > > query successfully. Then I added an index on the collection and ran the
> > > > > query. It takes same amount of time.
> > > > >
> > > > > xindiceadmin ai -c /db/test -n BillRefNum -p /INOVICE/BILL_INVOICE.bill_ref_no
> > > >
> > > > The IndexManager should have thrown an error when you tried to create
> > > > this index, because the pattern that you used is invalid. This is a bug
> > > > in the IndexManager.
> > > >
> > > > Xindice indexing patterns *are not* XPaths, they are simple element,
> > > > attribute, or element/attribute combinations.
> > > >
> > > > You should have created your indexes like this:
> > > >
> > > > xindiceadmin ai -c /db/test -n BillRefNum -p BILL_INVOICE.bill_ref_no
> > > >
> > > > Read the Xindice Administrator docs for more information about Indexing
> > > > patterns.
> > > >
> > > > --
> > > > Tom Bradford - http://www.tbradford.org
> > > > Architect - XQRL (XQuery Engine) - http://www.xqrl.com
> > > > Apache Xindice (Native XML Database) - http://xml.apache.org/xindice
> > > > Project Labrador (Web Services Framework) - http://notdotnet.org
