Albert: I'll be working your code this morning.

Bruce
-------- Original Message --------
Subject: [RBASE-L] - Re: GROUP BY ... HAVING: Epic Fail ADDENDUM
From: Albert Berry <[email protected]>
Date: Sat, July 07, 2012 7:58 am
To: [email protected] (RBASE-L Mailing List)

I should have made the last line read AND t2.Attrib2 IS NOT NULL

On 06/07/2012 6:45 PM, Albert Berry wrote:
> Bill, I think we were confused by the way we read what you said. From
> later posts, I believe that you might have attrib1 on one row, attrib2
> on another, etc. If this is the case, then:
>
> -- create a table with all of the columns, and one only instance of
> the serial number:
> CREATE TEMP TABLE tHumpty (SerialNo integer, attrib1, ..., attribN)
> INSERT INTO tHumpty (SerialNo) select distinct SerialNo from <table>
> ALTER TABLE tHumpty ADD PRIMARY KEY (SerialNo)
> -- If the source table(s) don't have an index on serialno, make one
> CREATE INDEX TableSerialNo on table (SerialNo)
> -- for each attrib:
> UPDATE thumpty SET attrib1 = t2.attrib1 +
> FROM tHumpty t1, table t2 +
> WHERE t1.SerialNo = t2.SerialNo +
> AND Attrib1 IS NOT NULL
> UPDATE thumpty SET attrib2 = t2.attrib2 +
> FROM tHumpty t1, table t2 +
> WHERE t1.SerialNo = t2.SerialNo +
> AND Attrib2 IS NOT NULL
>
> CAVEAT: this will use the last value found where a particular
> attribute has more than one value, because it will update for all
> occurrences.
>
> Albert
>
>
> On 06/07/2012 5:26 PM, Bruce Chitiea wrote:
>> Bill:
>>
>> PROJECTed a temp table 'humpty'. Created an index on serialno.
>>
>> Command: DELETE DUPLICATES FROM humpty
>>
>> ...either as-is or with a USING collist results in:
>>
>> '0 row(s) have been deleted from humpty'
>>
>> There are a truck-load of duplicate rows.
>>
>> Snake-bit
>>
>> Bruce
>>
>> -------- Original Message --------
>> Subject: [RBASE-L] - Re: GROUP BY ... HAVING: Epic Fail
>> From: "Bill Eyring" <[email protected]
>> <mailto:[email protected]>>
>> Date: Fri, July 06, 2012 12:19 pm
>> To: [email protected] <mailto:[email protected]> (RBASE-L Mailing
>> List)
>>
>> Bruce
>> Try this.
>> Create a temp table and insert all columns from the flat file into
>> the temp table. Create an index on the serialno column. Then
>> DELETE DUPLICATES FROM TEMPTABLE using col1,col2,col3,col4. you
>> will be left with unique serial number rows for each attrib.
>> Bill Eyring
>>
>> ------------------------------------------------------------------------
>> *From:* [email protected] <mailto:[email protected]>
>> [mailto:[email protected]] *On Behalf Of *Bruce Chitiea
>> *Sent:* Friday, July 06, 2012 2:18 PM
>> *To:* RBASE-L Mailing List
>> *Subject:* [RBASE-L] - Re: GROUP BY ... HAVING: Epic Fail
>>
>> Albert/Lawrence:
>>
>> Both of the suggested constructs return a listing of all
>> serialnos, including those having a count of 1.
>>
>> Perhaps a restatement is useful. The serialno column in this flat
>> file from Hell is not a key, but just one value in duplicate rows.
>> The serialno WILL become the primary key in the table to which it
>> and related attributes will be extracted. My problem right now is
>> that essential attribute values are randomly strewn across related
>> rows so that some rows have all values, many do not, some have
>> none and some have wrong values requiring review.
>>
>> I need ultimately to 'collect' and collapse all those values into
>> one unique serialno'd row to insert into the new table. The list
>> of serialno values I need to create will drive the cursor that
>> will 'walk' the accursed flat file table.
>>
>> I'm not interested in knowing serialnos with multiple rows in this
>> flat file from Hell. Serialnos whose attribute sets duplicate each
>> other are ok. I'm looking for only those serialnos in rows
>> containing variable attribute value sets.
>>
>> So this row set is NOT what I'm looking for:
>>
>> serialno attrib1 attrib2 attrib3
>> 12345 | 100 | ABC | 200
>> 12345 | 100 | ABC | 200
>> .
>> 12345 | 100 | ABC | 200
>>
>>
>> Follows the jumbled row set I'm looking for:
>>
>> serialno attrib1 attrib2 attrib3
>> 45678 | 100 | ABC | 200
>> 45678 | 100 | XYZ | 200
>> 45678 | -0- | ABC | 200
>> 45678 | -0- | ABC | 100
>>
>> ... and that's why I'm hoping that the GROUP BY ... HAVING
>> structure can identify serialno values for row-sets of the 45678
>> variety while ignoring the 12345's.
>>
>> Thanks for hanging in there with this.
>>
>> Bruce
>>
>>
>> -------- Original Message --------
>> Subject: [RBASE-L] - Re: GROUP BY ... HAVING: Epic Fail
>> From: Albert Berry <[email protected]
>> <mailto:[email protected]>>
>> Date: Fri, July 06, 2012 10:40 am
>> To: [email protected] <mailto:[email protected]> (RBASE-L
>> Mailing List)
>>
>> Your group by clause will select all the rows, Bruce.
>> Try this to get only those rows with duplicated serial numbers.
>>
>> SELECT SerialNo, Attrib1, Attrib2, AttribN +
>> FROM table +
>> WHERE SerialNo IN +
>> (SELECT SerialNo from table group by serialno having count(*)
>> > 1)
>>
>>
>>
>>
>> On 06/07/2012 10:59 AM, Bruce Chitiea wrote:
>> > Ok, I'm stumped. My generic code structure, the last of many
>> I've tried:
>> >
>> > SELECT serialno +
>> > FROM table +
>> > WHERE serialno IN +
>> > (SELECT DISTINCT serialno +
>> > FROM table t2 +
>> > GROUP BY serialno,attrib1,attrib2,...,attribN +
>> > HAVING COUNT(*) > 1)
>> >
>> >
>> > I'm cleaning up all the sin and normalizing from a legacy
>> flat-file
>> > database with several hundred thousand rows involving maybe
>> 15,000
>> > unique 'serialno' values.
>> >
>> > I need to make a list of serialno values where for each
>> listed serialno
>> > there exist rows encompassing more than one unique set of
>> > [serialno,attrib1,attrib2,...,attribN] values.
>> >
>> > No matter what I try, I end up with a list of ALL 15,000
>> distinct
>> > serialno values.
>> >
>> > Help!
>> >
>> > Bruce
>> >
>> >
>> >
>> >
>> >
>> > -----
>> > No virus found in this message.
>> > Checked by AVG - www.avg.com <http://www.avg.com>
>> > Version: 2012.0.2195 / Virus Database: 2437/5114 - Release
>> Date: 07/06/12
>> >
>> >
>> >
>>
>>
>> No virus found in this message.
>> Checked by AVG - www.avg.com <http://www.avg.com>
>> Version: 2012.0.2195 / Virus Database: 2437/5114 - Release Date:
>> 07/06/12
>>
>
>
>
>
> -----
> No virus found in this message.
> Checked by AVG - www.avg.com
> Version: 2012.0.2195 / Virus Database: 2437/5115 - Release Date: 07/06/12
>
>


Reply via email to