I should probably give that a try just to see, but I am 99.999% positive that Fedora has tabs in their changelog fields. I don't think a single-character delimiter will work across the board given the variety of the source data.
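For what it's worth, here is a rough sketch of one way around the delimiter problem: split on the multi-character ' | ' in a script and re-emit the rows as fully quoted CSV, which PostgreSQL's COPY can read in its csv format. The function name `rows_to_csv` and the sample row are made up for illustration, and I'm assuming one record per line with no ' | ' inside a field.

```python
import csv
import io


def rows_to_csv(lines, sep=" | "):
    """Split each line on a multi-character separator and return quoted CSV text.

    Hypothetical helper: assumes one record per line and that the
    separator string itself never appears inside a field.
    """
    out = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes, so embedded tabs,
    # commas, and vertical bars are treated as ordinary data.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for line in lines:
        writer.writerow(line.rstrip("\n").split(sep))
    return out.getvalue()


# Made-up sample row: a pipe-style address and a tab inside the last field.
sample = ['pkg | "Jane Doe" <jane|at|example.org> | fixed\ttabs\n']
csv_text = rows_to_csv(sample)
print(csv_text, end="")
```

The resulting file could then be loaded with something like `COPY mytable FROM '/path/to/file.csv' WITH (FORMAT csv)`, where the table name and path are placeholders. A field that itself contains ' | ' would still split incorrectly, so this only moves the problem to a less likely collision.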
' | ' was working just fine... but apparently the Postgres COPY command doesn't like multicharacter delimiters.

Thought I'd toss this out since I just started pulling in all 18 million updates (again) and now I'm vaguely bored. The previous attempt using manual INSERTs for each line took 2.5 hours, but I wasn't merging the new data with the old.

-Ben

On Monday, September 9th, 2024 at 4:30 PM, Russell Senior <[email protected]> wrote:

> I like tabs as delimiters, fwiw.
>
> --
> Russell Senior
> [email protected]
>
> On Mon, Sep 9, 2024 at 4:22 PM Ben Koenig [email protected] wrote:
>
> > I might be preaching to the choir here, but it turns out there are lots of
> > ways to write down an email address.
> >
> > I'm reading data into a database that is stored in "|" delimited strings -
> > "col1|col2|col3|" etc. This data includes email addresses. Much to my
> > surprise I ran into errors because someone (who shall remain nameless)
> > decided to enter their identity under the following format:
> >
> > "firstname lastname" <$USER|at|$HOST>
> >
> > VERTICAL LINES. Why they did this, I don't know. As it turns out, this
> > particular data set has no defined policy for users entering their email,
> > since I see the following formats...
> >
> > "firstname lastname" <$USER|at|$HOST>
> > "firstname lastname" <$USER at $HOST>
> >
> > "firstname lastname" <$USER@$HOST>
> >
> > "firstname lastname"
> >
> > I'm sure there are other formats I have yet to observe.......
> >
> > I've seen the human readable "at" in email addresses, but I was NOT
> > expecting to see someone combine it with vertical lines. But then again, I
> > suppose it would be unreasonable of me to expect volunteers working for
> > free to adhere to a strict policy for data entry.
> >
> > -Ben
