Re: [U2] UV index with duplicate nodes

Richard Lewis Fri, 01 Oct 2010 10:37:00 -0700

Thanks for all the suggestions.

Data issues were one of the first things we looked for, and we were able to
verify absolutely that there are no 'hidden' characters causing creation of
separate nodes. The nodes actually have exact duplicate IDs.


The file being indexed is actually a transaction log, and here is a partial
listing of the transactions, in creation order:

Trans.Log.Id MainDB.ID........ Trans.Dt. Trans.Time Trans.Dt.. Trans.Time..... Program

763038240    DBID12345         29 SEP 10 05:14:10pm 15613      62050.6937      NUS009
763038241    DBID12345         29 SEP 10 05:14:10pm 15613      62050.6954      NUS009
763038242    DBID12345         29 SEP 10 05:14:10pm 15613      62050.6971      NUS009
763038243    DBID12345         29 SEP 10 05:14:10pm 15613      62050.6988      NUS009
763038244    DBID12345         29 SEP 10 05:14:10pm 15613      62050.7006      NUS009
763038245    DBID12345         29 SEP 10 05:14:10pm 15613      62050.7024      NUS009
763038246    DBID12345         29 SEP 10 05:14:10pm 15613      62050.7041      NUS009
763038247    DBID12345         29 SEP 10 05:14:10pm 15613      62050.7057      NUS009
763038248    DBID12345         29 SEP 10 05:14:10pm 15613      62050.7075      NUS009
763038249    DBID12345         29 SEP 10 05:14:10pm 15613      62050.7095      NUS009
763038250    DBID12345         29 SEP 10 05:14:10pm 15613      62050.7112      NUS009
763038251    DBID12345         29 SEP 10 05:14:10pm 15613      62050.7129      NUS009
763038252    DBID12345         29 SEP 10 05:14:10pm 15613      62050.7146      NUS009
763038253    DBID12345         29 SEP 10 05:14:10pm 15613      62050.8737      NUS009
763038254    DBID12345         29 SEP 10 05:14:10pm 15613      62050.8761      NUS009
763038255    DBID12345         29 SEP 10 05:14:10pm 15613      62050.8777      NUS009
763038256    DBID12345         29 SEP 10 05:14:10pm 15613      62050.8793      NUS009
763038257    DBID12345         29 SEP 10 05:14:10pm 15613      62050.8811      NUS009
763038258    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9003      NUS009
763038259    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9017      NUS009
763038260    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9034      NUS009
763038261    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9053      NUS009
763038262    DBID12345         29 SEP 10 05:14:10pm 15613      62050.907       NUS009
763038263    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9494      NUS009
763038264    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9509      NUS009
763038265    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9524      NUS009
763038266    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9539      NUS009
763038267    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9553      NUS009
763038268    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9568      NUS009
763038269    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9584      NUS009
763038270    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9755      NUS009
763038271    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9769      NUS009
763038272    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9784      NUS009
763038273    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9799      NUS009
763038274    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9814      NUS009
763038275    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9829      NUS009
763038276    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9843      NUS009
763038277    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9858      NUS009
763038278    DBID12345         29 SEP 10 05:14:10pm 15613      62050.9873      NUS009
*763038279   DBID12345         29 SEP 10 05:14:10pm 15613      62050.9943      NUS009
763038280    DBID12345         29 SEP 10 05:14:11pm 15613      62051.0247      NUS009
763038281    DBID12345         29 SEP 10 05:14:11pm 15613      62051.0264      NUS009
763038282    DBID12345         29 SEP 10 05:14:11pm 15613      62051.0279      NUS009
763038283    DBID12345         29 SEP 10 05:14:11pm 15613      62051.1511      NUS009
763038284    DBID12345         29 SEP 10 05:14:11pm 15613      62051.161       NUS009

I have placed an asterisk at the beginning of the line with the transaction
that was indexed in the duplicate node (in case the red highlighting doesn't
carry through).  _All_ of the other transactions for DBID12345 were indexed
together.  All of the other transactions for DBID12345 were created by the
same instance of the same program, at pretty much the same point in time, as
you can see from the time and date stamps.  I've included the internal time
so that you can see to the millisecond level how close together these
records were created (and indexed).
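
For anyone who wants to check the arithmetic on the internal values: internal
dates count days from 31 Dec 1967 (day 0) and internal times are seconds past
midnight.  Below is a minimal sketch in Python (not UniVerse BASIC); the
helper name internal_to_datetime is made up for illustration, and the three
rows are copied from the listing above.

```python
from datetime import datetime, timedelta

# Internal dates count days from 31 Dec 1967 (day 0);
# internal times are seconds past midnight.
PICK_EPOCH = datetime(1967, 12, 31)

def internal_to_datetime(idate, itime):
    """Hypothetical helper: combine an internal date and time into a datetime."""
    return PICK_EPOCH + timedelta(days=idate, seconds=itime)

# Rows copied from the listing: the transaction that landed in the duplicate
# node, plus the transactions logged immediately before and after it.
prev_t = internal_to_datetime(15613, 62050.9873)  # 763038278
odd_t = internal_to_datetime(15613, 62050.9943)   # 763038279 (asterisked)
next_t = internal_to_datetime(15613, 62051.0247)  # 763038280

# Gaps in milliseconds on either side of the odd transaction.
one_ms = timedelta(milliseconds=1)
gap_before_ms = (odd_t - prev_t) / one_ms
gap_after_ms = (next_t - odd_t) / one_ms

print(odd_t.isoformat())
print(gap_before_ms, gap_after_ms)
```

The converted value should agree with the external 29 SEP 10 05:14:10pm
columns, and the two gaps show how little time separates the odd transaction
from its neighbours.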

There is no correlative at all in the dictionary item, and we have already
thought that we should probably change it to a D-style dict. item, but we're
trying to reliably reproduce the problem, preferably on a smaller database,
before we make that change, so we can tell whether or not it actually fixes
it.

We're looking into determining exactly how long a rebuild of this particular
index will take, as that seems like the best work-around for us at the
moment.

We are in the process of formalizing the steps we use to identify the
problem, and codifying them in a process that can be run on every index we
have, so that we can at least identify the scope of our issues and, if a
rebuild is the only remedy, make plans to schedule rebuilds regularly.

If you would like to check your indexes, here are the steps we have found to
be reliable:

1.    Create an F-pointer to the index directly, named indexfile, with the
Dict pointing to the dict of the data/source file.
2.    SSELECT indexfile
3.    SAVE.LIST XX
4.    Create a DICT fname DUP dictionary item like this:
    a.    0001: I
    b.    0002: @2;@ID;@1...@2
    c.    0003:
    d.    0004:
    e.    0005: 3R
    f.    0006: S
5.    GET.LIST XX
6.    LIST indexfile WITH DUP # "0" DUP
7.    Note the IDs listed (dup_id1, dup_id2, etc.).  Use them as listing
criteria as follows:
    a.    LIST indexfile WITH @ID = "dup_id1]""dup_id2]" (etc.) F1 F2
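
The idea behind steps 2 through 6 is roughly this: sort the index node IDs so
that equal keys sit next to each other, then flag any ID that equals the one
before it.  Here is a minimal sketch of that logic in Python rather than
UniVerse BASIC.  It assumes the DUP item compares each @ID with the @ID of the
previous record; the function name and the sample keys are hypothetical (the
sample borrows the ZIPCODE illustration from the original post).

```python
def find_duplicate_keys(index_ids):
    """Return each key that occurs more than once in a list of index node IDs."""
    dups = []
    prev = object()  # sentinel that compares unequal to every real key
    for key in sorted(index_ids):  # like SSELECT: equal keys become adjacent
        # Report a key the first time it repeats its predecessor.
        if key == prev and (not dups or dups[-1] != key):
            dups.append(key)
        prev = key
    return dups

# Sample node IDs modeled on the ZIPCODE illustration; not real data.
node_ids = ["12345", "12345-6789", "12345", "54321"]
print(find_duplicate_keys(node_ids))
```

The comparison is exact, so a key with a trailing space or other hidden
character would not be reported as a duplicate of the clean key, which is the
distinction the earlier replies were asking about.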

Building a program to perform these tasks will be one of our next immediate
steps.  Any further suggestions or comments are most welcome!

Sincerely Best Regards,

Richard


On Fri, Oct 1, 2010 at 7:39 AM, David Wolverton <dwolv...@flash.net> wrote:

> This has been our issue in the past -- unless you look at that data with a
> HexEditor, it's not obvious.  If you use the UniData "AE" Editor (now
> included with UniVerse) you can type a "^" (Shift 6 - Caret) and see
> non-printable characters.  Once you know the culprit, you fix the source
> data and the index heals itself.
>
> David W.
>
> -----Original Message-----
> From: u2-users-boun...@listserver.u2ug.org
> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Richard Brown
> Sent: Friday, October 01, 2010 6:02 AM
> To: U2 Users List
> Subject: Re: [U2] UV index with duplicate nodes
>
> Look for a space or non printing character in the data.  That would make it
> look the same but actually be unique.
>
>
> ----- Original Message -----
> From: "Richard Lewis" <rbl...@gmail.com>
> To: <U2-Users@listserver.u2ug.org>
> Sent: Thursday, September 30, 2010 8:47 PM
> Subject: [U2] UV index with duplicate nodes
>
>
> > We've just uncovered a rather unusual and unsettling situation.  We have a
> > file with a single index that has somehow gotten nodes with duplicate keys.
> > A simple example would be having an index on ZIPCODE in an address database,
> > and finding that there are _two_ nodes (records) in the index for ZIPCODE
> > 12345, for example.  The source records referred to in the nodes are not
> > duplicated, but since most operations find the 'first' node, any source
> > records referred to in the duplicate node appear to not exist in the index.
> >
> >>LIST.INDEX fname ALL
> > Alternate Key Index Summary for file fname
> > File........... fname
> > Indices........ 1 (1 A-type, 0 C-type, 0 D-type, 0 I-type, 0 SQL, 0 S-type)
> > Index Updates.. Enabled, No updates pending
> >
> > Index name      Type  Build    Nulls  In DICT  S/M  Just Unique Field num/I-type
> > fieldname        A    Not Reqd  Yes    Yes      M    L     N    2
> >
> >
> > The file contains 6,539,233 records, with 574,547 unique values in fieldname
> > (which is actually a single-valued field, and has been verified that each
> > record's fieldname contains one and only one value).  We found that 9 source
> > records appear to have not been included in the index, but upon further
> > research found the nodes with duplicate keys.  We created an F-pointer to
> > the index file itself (not normally recommended, but useful), then got the
> > results like the following:
> >
> > LIST indexfile WITH @ID = "12345]" F1 F2
> >
> > fname..... F1........ F2........
> > 12345      987654     876543
> > 12345-6789 765432     543219
> > 12345      654321
> >
> > We are having our UniVerse administrator ask our dealer for assistance, but
> > were interested if any other users have had any recent similar experiences,
> > or advice.
>
> _______________________________________________
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
>
