Hello Ryan,

On 2020-05-03 00:15, Ryan Blenis wrote:

Thanks for the quick reply. Yes they're all different... but very
oddly formatted.

I can send you a copy of the output directly if you'd like, as I don't
want to expose a bunch of emails. But the gist is that many have
periods, dashes, or numbers preceding the actual email address, in
some cases as odd as "------@------.com", "----@hotmail.com [2]",
"-------@hotmail.com [2]", "-@..immature", "-@..khairy",
"-@d.i.s.c.o", "......@www.hip",
"......redactedhotmailaddr...@hotmail.com..........me",
"._.pitch@n._", etc.

indeed, they are clearly bogus addresses.


I did a pilerget on the actual email and it contains one TO address (a
correct address, the only odd thing I noticed is that the TO address
is in all caps, and from looking at the DB/sphinx I know in some
places you were using "X" as a placeholder for periods.

I just migrated from 1.1.0 (since 2015) and set up a new server on
1.3.8 - so the issue would have been (at least with) 1.1.0.  My
upgrade method was a bit of a mix after reading all your documentation
on upgrading and server migrations (since I was doing both), as well
as Bitbucket issues. I know it's been a while, but I really wanted to
take the time to understand Piler before running an upgrade and
potentially screw something up.

I looked for a changelog with a fix like this, but I didn't see a
formal one, and looking through all the commits since 2015 was going
to take a LONG time so I figured I'd ask here. I can't see if I still
get new rcpt rows for that id, as I tossed my backups of prior DB
versions on the old server already, but I can certainly pay attention
on the new server.

select id, COUNT(*) from rcpt GROUP BY id HAVING COUNT(*) > 50 ORDER
BY COUNT(*) ASC; shows that there are definitely a couple other id's
with this issue, and likely many more, but there is a steep dropoff
from 3 million to 30k to 200-500 range per ID. Maybe you (or other
users) can use that query to check the recipient count for each stored
email for similar issues.


I just did, and found 10 such emails. Some of them had even ~150 recipients.
After checking those 10 emails I found that they are all legitimate in
a sense that the email really consisted of that many recipient, and the
rcpt table entries were valid email addresses.

So fortunately the issue is not present in my personal archive.

I just wanted to make sure that my assumption on the rcpt table was
correct (e.g. it should only have the "to" recipients for each email
matching on metadata.id [1]), so that I can delete all those rows and

yes, you are right.

either manually insert the address found from pilerget or delete the
matching metadata row and just re-import the email to fix the issue.

In the latter case you'll get two hits for the same message from sphinx,
and one of them would be a ghost, because you deleted its corresponding
message.

Janos



On Sat, May 2, 2020 at 8:29 AM <s...@acts.hu> wrote:

Hello Ryan,

this must be definitely a bug. What piler version do you use?
Try selecting the first 1000 rows for id=37, and check if the
recipients are actually all different.

Do you still get new rcpt rows for id=37?
I suspect that piler tries to keep processing the very same email.

Janos

On 2020-05-02 09:41, Ryan Blenis wrote:
Hello,

Just a question about the table. I took a look at the data, and
I've
got 23 million rows in there, however I kept seeing id 37 pop up.
I
queried for "SELECT COUNT(*) FROM rcpt WHERE id = 37" and got a
count
of 3,116,977.

I was under the impression (perhaps mistakenly) that the table was
used to store recipients of the email linked on metadata.id [1]
[1] - but
certainly no email had 3 million recipients. Can you let me know
what
I'm missing here? Thank you.



Links:
------
[1] http://metadata.id
[2] http://hotmail.com

Reply via email to