It appears Twitter returns a 'source' field that identifies the user agent.
Most of the time this is an HTML anchor (A HREF) string. I have found one
case where that is not true: "web" shows up as just that plain string.
Is "web" the only non-HTML string I can expect at this time? I'm
logging the event if I hit another edge case, but wanted to check.
Regarding regexes to get just the user agent name and strip the HTML:
I've heard some noise on this list about the format changing. I see
some have nofollow, others do not, and there are many variations.
I don't want to chase this around, though I also log non-matches here.
My regex looks for '\>.+<\i'.
I believe this to be pretty solid, but it feels a little too easy. I'm
also left with the returned match having > and < in the string. Of
course, those are easy enough to replace/trim off.
What do you think of this approach? Has anyone seen cases that
would make it fail?
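
For concreteness, here is a rough Python sketch of the kind of extraction
I mean. It is not my exact regex. It assumes the field is either a plain
string such as "web" or a single <a ...>name</a> anchor with arbitrary
attributes, and the names ANCHOR_RE and client_name are made up for
illustration.

```python
import re

# Assumption: an HTML 'source' value looks like <a href="..." ...>Name</a>.
# Attributes (e.g. rel="nofollow") may or may not be present, so skip them all.
ANCHOR_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def client_name(source):
    """Return the client name from a 'source' value (hypothetical helper)."""
    match = ANCHOR_RE.search(source)
    if match:
        # Group 1 is the anchor text only, so there is no > or < to trim.
        return match.group(1).strip()
    # Plain strings such as "web" have no anchor; return them unchanged.
    return source.strip()


if __name__ == "__main__":
    print(client_name('<a href="http://example.com" rel="nofollow">ExampleClient</a>'))
    print(client_name("web"))
```

Capturing only the text between the tags avoids the leftover > and <, and
the non-greedy (.*?) stops at the first closing tag. Anything that is not
an anchor falls through unchanged, so it can still be logged as a
non-match if it is not "web".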