On Tuesday 28 June 2005 11:20 am, C. Ed Felt wrote:
> Fellow Open Software Supporters:
>
> I would like to ask for some input on a database issue I am working on.
>
> I am getting repeat records in my billing database for a VoIP company.
> There is no fix for this as it is a SIP, (Session Initiation Protocol),
> specific problem so I am working on a post process solution.
>
> I am writing a script in Python and the database is MySQL. The
> programming language doesn't matter since I am only looking for a
> suggestion on my sql (MySQL) query. My current plan is to pull all
> records (called Call Data Records or CDRs) from the database for the
> last 24 hours in to a big array. If any of the records match on a field
> called 'sessid' then they are copies (though not exact copies as there
> can be timestamp differences).
>
> So, in short, (since a paragraph will get confusing):
>
> 1. SELECT all records in the CDR table after a requested date (usually
> 24 hours).
> 2. Store these records in a huge array.
> 3. Find all repeats on the 'sessid' field and store this in an array.
> 4. Delete all repeats (save one copy of each repeat CDR).
>
> Is there a MySQL query, (version 3), to select all rows that have one or
> more matching rows on a specific field ('sessid')? This would
> essentially combine steps 1, 2 and 3 in one MySQL query.
Yes. There a few ways, some depending on your version of MySQL.
If you have 4.1 or newer with subselects you can do:
You'll need a primary key column in your table to identify rows uniquely for
this method:
DELETE t.*
FROM <table> t
LEFT JOIN (
SELECT sessid,
MIN(<uid>) AS <uid>, # any expression pick the uid of the row
to keep
COUNT(*) AS cnt
FROM <table>
GROUP BY sessid
HAVING cnt > 1
) dups USING(sessid)
WHERE t.uid != dups.uid;
With MySQL < 4.1, you can simply replace the subquery with a temp table
populated with the same--only one extra query.
If you don't have a primary key column in your table, it gets a bit more
difficult--and might require an iterative approach, unless you have some
other way of singling out rows, such as min(timestamp) != timestamp.
--
Respectfully,
Nicholas Leippe
Sales Team Automation, LLC
1335 West 1650 North, Suite C
Springville, UT 84663 +1 801.853.4090
http://www.salesteamautomation.com
.===================================.
| This has been a P.L.U.G. mailing. |
| Don't Fear the Penguin. |
| IRC: #utah at irc.freenode.net |
`==================================='