Re: Index causes query to break

Natalia Shilenkova Sun, 20 Jun 2010 19:34:51 -0700

On Jun 17, 2010, at 8:05 PM, Liz Glasser wrote:

> We have xindice 1.1.
> 
> I'm not really sure what you mean by value type, but I created the index with 
> the following command:
> 
> xindice add_indexer -c url -p "*...@id" -n idindex


Value type determines how the indexed value is interpreted; the default is 
"string". Using another type, such as integer, can speed up queries that 
compare against numbers (e.g. eleme...@attribute > 5]). 
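
As a general illustration of why the value type matters for numeric 
comparisons (this is plain Java and says nothing about Xindice's internals), 
the same values sort differently as strings and as integers. The class name 
and the sample values are made up for this sketch:

```java
import java.util.Arrays;

public class ValueTypeOrdering {
    // Sorts the values as plain strings (lexicographic order).
    static String[] sortedAsStrings(String[] values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Parses the same values as integers and sorts them numerically.
    static int[] sortedAsIntegers(String[] values) {
        int[] parsed = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            parsed[i] = Integer.parseInt(values[i]);
        }
        Arrays.sort(parsed);
        return parsed;
    }

    public static void main(String[] args) {
        String[] ids = {"5", "10", "1234", "42"};
        // String order: [10, 1234, 42, 5]
        System.out.println(Arrays.toString(sortedAsStrings(ids)));
        // Integer order: [5, 10, 42, 1234]
        System.out.println(Arrays.toString(sortedAsIntegers(ids)));
    }
}
```

In string order "10" comes before "5", so a comparison such as "> 5" does not 
correspond to a contiguous range of string keys.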

> After sending my original email I made some progress. If I create just that 
> index above the query for "/DatabaseRecord[//tns:lizsno...@id='1234']]" 
> works, but all queries for "/DatabaseRecord[//lizsnodede...@id='1234']]" fail.
> 
> However, if I add a second index for "lizsnoded...@id" then both queries 
> work. Further testing shows that the first index speeds up the first query 
> drastically and the second index speeds up the second query drastically. So 
> it appears this combination is what I need.
> 

> But I don't understand why the first index a) doesn't speed up the second 
> query and b) breaks the second query altogether. I also do not understand why 
> the second query still works on my other server. I'm guessing that may be due 
> to the fact that I only have a few records total in that database since I 
> wiped it.

It appears that you are doing everything correctly; I don't see a reason for 
the index to break the query. I tried to reproduce the problem, but I was not 
able to. 

From your description I can tell that the second query uses the index and, for 
some reason, finds no matching documents. If there is another, more specific 
index, the query will use it instead of the more generic one, which seems to 
work for you. 
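
The "more specific index wins" behavior can be pictured with a toy model. This 
is only a sketch of the idea and not Xindice's actual selection code: the 
class, the method names and the "element@attribute" pattern strings with "*" 
as a wildcard are all assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.List;

public class IndexChoice {
    // A pattern part matches when it is the wildcard "*" or the exact name.
    static boolean partMatches(String part, String name) {
        return part.equals("*") || part.equals(name);
    }

    // Counts wildcard characters; fewer wildcards means a more specific pattern.
    static int wildcards(String pattern) {
        int count = 0;
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                count++;
            }
        }
        return count;
    }

    // Returns the matching pattern with the fewest wildcards, or null if none match.
    static String choose(List<String> patterns, String element, String attribute) {
        String best = null;
        for (String pattern : patterns) {
            String[] parts = pattern.split("@", 2);
            if (parts.length != 2) {
                continue; // not an element@attribute pattern
            }
            if (!partMatches(parts[0], element) || !partMatches(parts[1], attribute)) {
                continue;
            }
            if (best == null || wildcards(pattern) < wildcards(best)) {
                best = pattern;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical index patterns: one generic, one specific.
        List<String> patterns = Arrays.asList("*@code", "item@code");
        System.out.println(choose(patterns, "item", "code"));  // item@code
        System.out.println(choose(patterns, "other", "code")); // *@code
    }
}
```

In this model a query on an element that has its own index never consults the 
generic one, which would be consistent with the second index hiding whatever 
is wrong in the first.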

It is difficult for me to troubleshoot the problem when I cannot reproduce it 
myself. If you can narrow it down somehow (for example, to a specific document 
structure that causes the problem), it will help a lot. One thing you can try 
is to run the index scanner program that I attached to see the data in the 
index. It shows each indexed value, the corresponding document key, and the 
element/attribute where the value appears. 

The scanner expects three parameters:
1. Path to the database directory.
2. Path to the target collection, without the database name. For example, if 
you pass /db/my/collection to the context switch (-c) of the command line 
tool, then the second parameter should be /my/collection.
3. Index name.
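
For the second parameter, the conversion from the command line context to the 
collection path can be sketched as below. The helper is hypothetical and is 
not part of the attached scanner; it assumes the context always starts with 
"/" followed by the database name.

```java
public class ScannerArgs {
    // Drops the leading database name: "/db/my/collection" -> "/my/collection".
    static String stripDatabaseName(String contextPath) {
        if (contextPath == null || !contextPath.startsWith("/")) {
            throw new IllegalArgumentException("expected a path like /db/my/collection");
        }
        // Find the slash that ends the database name segment.
        int collectionStart = contextPath.indexOf('/', 1);
        if (collectionStart < 0) {
            throw new IllegalArgumentException("no collection after database name: " + contextPath);
        }
        return contextPath.substring(collectionStart);
    }

    public static void main(String[] args) {
        System.out.println(stripDatabaseName("/db/my/collection")); // /my/collection
    }
}
```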

You can check whether the value you are searching for actually appears in the 
index for a document that you know has that value, and whether its element 
name is correct, including the namespace. This can eliminate at least some 
possible problems.

Be sure to shut down the database before running the scanner: it is a 
low-level utility and scans the index file directly. I did not have much time 
to test it, so I would not run it on production.

Regards,
Natalia

IndexScanner.java
Description: Binary data


> Any insight you can provide would help me to feel more confident in the 
> ability for Xindice to work in a production environment. 
> Thank you
> 
> Liz
> 

> 
> Natalia Shilenkova wrote:
>> Liz, 
>> What Xindice version do you use and what is the value type that you used for 
>> creating the index? 
>> Natalia
>> 
>> On Jun 17, 2010, at 4:12 PM, Liz Glasser wrote:
>> 
>>  
>>> Hopefully this won't be too hard to follow. Please let me know if I need to 
>>> clarify something.
>>> 
>>> I have two types of Nodes (let's call them LizsNode and LizsNodeDesc) that 
>>> each have an attribute named ID. LizsNode uses a namespace (tns).
>>> 
>>> I have had the LizsNode data around for months, but the LizsNodeDesc are all 
>>> new. I have added an index defined as "*...@id" recently.
>>> 
>>> With the index defined, when I use the xpath query 
>>> "/DatabaseRecord[//tns:lizsno...@id='1234']]" the node is returned 
>>> correctly. But when I use the query 
>>> "/DatabaseRecord[//lizsnodede...@id='1234']]" 99.999% of the time nothing 
>>> is returned. Occasionally the right result is returned. Also if I query for 
>>> "/DatabaseRecord[//LizsNodeDesc]" all those nodes are returned and I can see 
>>> my ID is correct.
>>> 
>>> If I remove the index all queries return correctly, but it's way too slow.
>>> 
>>> Also, I have one machine that this seemed to occur on but I deleted the 
>>> entire collection and the newly created one is fine. I cannot do this to my 
>>> production server however.
>>> 
>>> Any ideas what is causing this? I have confirmed the issue by querying via 
>>> commandline and via Java code.
>>> 
>>> Liz
>>>    
>> 
>> 
>> 
>>  
>
