[RBASE-L] - Re: GROUP BY ... HAVING: Epic Fail

Albert Berry Fri, 06 Jul 2012 17:46:38 -0700

Bill, I think we were confused by the way we read what you said. Fromlater posts, I believe that you might have attrib1 on one row, attrib2on another, etc. If this is the case, then:

-- create a table with all of the columns, and one only instance of theserial number:

CREATE TEMP TABLE tHumpty (SerialNo integer, attrib1, ..., attribN)
INSERT INTO tHumpty (SerialNo) select distinct SerialNo from <table>
ALTER TABLE tHumpty ADD PRIMARY KEY (SerialNo)
-- If the source table(s) don't have an index on serialno, make one
CREATE INDEX TableSerialNo on table (SerialNo)
-- for each attrib:
UPDATE thumpty SET attrib1 = t2.attrib1 +
  FROM tHumpty t1, table t2 +
  WHERE t1.SerialNo = t2.SerialNo +
  AND Attrib1 IS NOT NULL
UPDATE thumpty SET attrib2 = t2.attrib2 +
  FROM tHumpty t1, table t2 +
  WHERE t1.SerialNo = t2.SerialNo +
  AND Attrib2 IS NOT NULL

CAVEAT: this will use the last value found where a particular attributehas more than one value, because it will update for all occurrences.


Albert


On 06/07/2012 5:26 PM, Bruce Chitiea wrote:

Bill:

PROJECTed a temp table 'humpty'. Created an index on serialno.

Command: DELETE DUPLICATES FROM humpty

...either as-is or with a USING collist results in:

'0 row(s) have been deleted from humpty'

There are a truck-load of duplicate rows.

Snake-bit

Bruce

    -------- Original Message --------
    Subject: [RBASE-L] - Re: GROUP BY ... HAVING: Epic Fail
    From: "Bill Eyring" <[email protected]
    <mailto:[email protected]>>
    Date: Fri, July 06, 2012 12:19 pm
    To: [email protected] <mailto:[email protected]> (RBASE-L Mailing
    List)

    Bruce
    Try this.
    Create a temp table and insert all columns from the flat file into
    the temp table. Create an index on the serialno column. Then
    DELETE DUPLICATES FROM TEMPTABLE using col1,col2,col3,col4. you
    will be left with unique serial number rows for each attrib.
    Bill Eyring

    ------------------------------------------------------------------------
    *From:* [email protected] <mailto:[email protected]>
    [mailto:[email protected]] *On Behalf Of *Bruce Chitiea
    *Sent:* Friday, July 06, 2012 2:18 PM
    *To:* RBASE-L Mailing List
    *Subject:* [RBASE-L] - Re: GROUP BY ... HAVING: Epic Fail

    Albert/Lawrence:

    Both of the suggested constructs return a listing of all
    serialnos, including those having a count of 1.

    Perhaps a restatement is useful. The serialno column in this flat
    file from Hell is not a key, but just one value in duplicate rows.
    The serialno WILL become the primary key in the table to which it
    and related attributes will be extracted. My problem right now is
    that essential attribute values are randomly strewn across related
    rows so that some rows have all values, many do not, some have
    none and some have wrong values requiring review.

    I need ultimately to 'collect' and collapse all those values into
    one unique serialno'd row to insert into the new table. The list
    of serialno values I need to create will drive the cursor that
    will 'walk' the accursed flat file table.

    I'm not interested in knowing serialnos with multiple rows in this
    flat file from Hell. Serialnos whose attribute sets duplicate each
    other are ok. I'm looking for only those serialnos in rows
    containing variable attribute value sets.

    So this row set is NOT what I'm looking for:

    serialno attrib1 attrib2 attrib3
    12345 | 100 | ABC | 200
    12345 | 100 | ABC | 200
    .
    12345 | 100 | ABC | 200


    Follows the jumbled row set I'm looking for:

    serialno attrib1 attrib2 attrib3
    45678 | 100 | ABC | 200
    45678 | 100 | XYZ | 200
    45678 | -0- | ABC | 200
    45678 | -0- | ABC | 100

    ... and that's why I'm hoping that the GROUP BY ... HAVING
    structure can identify serialno values for row-sets of the 45678
    variety while ignoring the 12345's.

    Thanks for hanging in there with this.

    Bruce


        -------- Original Message --------
        Subject: [RBASE-L] - Re: GROUP BY ... HAVING: Epic Fail
        From: Albert Berry <[email protected]
        <mailto:[email protected]>>
        Date: Fri, July 06, 2012 10:40 am
        To: [email protected] <mailto:[email protected]> (RBASE-L
        Mailing List)

        Your group by clause will select all the rows, Bruce.
        Try this to get only those rows with duplicated serial numbers.

        SELECT SerialNo, Attrib1, Attrib2, AttribN +
        FROM table +
        WHERE SerialNo IN +
        (SELECT SerialNo from table group by serialno having count(*) > 1)




        On 06/07/2012 10:59 AM, Bruce Chitiea wrote:
        > Ok, I'm stumped. My generic code structure, the last of many
        I've tried:
        >
        > SELECT serialno +
        > FROM table +
        > WHERE serialno IN +
        > (SELECT DISTINCT serialno +
        > FROM table t2 +
        > GROUP BY serialno,attrib1,attrib2,...,attribN +
        > HAVING COUNT(*) > 1)
        >
        >
        > I'm cleaning up all the sin and normalizing from a legacy
        flat-file
        > database with several hundred thousand rows involving maybe
        15,000
        > unique 'serialno' values.
        >
        > I need to make a list of serialno values where for each
        listed serialno
        > there exist rows encompassing more than one unique set of
        > [serialno,attrib1,attrib2,...,attribN] values.
        >
        > No matter what I try, I end up with a list of ALL 15,000
        distinct
        > serialno values.
        >
        > Help!
        >
        > Bruce
        >
        >
        >
        >
        >
        > -----
        > No virus found in this message.
        > Checked by AVG - www.avg.com <http://www.avg.com>
        > Version: 2012.0.2195 / Virus Database: 2437/5114 - Release
        Date: 07/06/12
        >
        >
        >


No virus found in this message.
Checked by AVG - www.avg.com <http://www.avg.com>
Version: 2012.0.2195 / Virus Database: 2437/5114 - Release Date: 07/06/12

[RBASE-L] - Re: GROUP BY ... HAVING: Epic Fail

Reply via email to