On Thursday, September 1, 2016 at 4:21:50 PM UTC-7, Jeremy C. Reed wrote:
>
> Sorry if I overlooked it, but I cannot find documentation or 
> explanation for the "Spam Filtering: User handling" webpage at 
> /admin/spamfilter/user. 
>

As far as I know that's not an oversight; I haven't seen any documentation 
either. It would be great if we could write some.
 

> My page currently shows: 
>
> -=-=-=-=-=-=-=-=-= 
> Spam Filtering: User handling (data overview) 126893 entries 
>
>     Overview All Registered Unused 
>
> There are 126893 different entries in the database, 125992 users are 
> registered and 125889 have not been used. 
>
> Old user: __________  New user: __________ 
>
> [Button: Change unauthorized user] 
>
> [Button: Remove 44302 temporary sessions] 
>
> [Button: Convert emails to registered usernames] 
>
> -=-=-=-=-=-=-=-=-= 
>
> I cannot use the All, Registered, or Unused links as they all cause the 
> webserver or webbrowser to time out. 
>

The tables on those pages aren't paginated, so I'm not surprised the page 
load times out with so many entries.

The 1.0.9dev version string spans a long history of changes: 
https://trac.edgewall.org/log/plugins/1.0/spam-filter

Therefore I bumped the version to 1.0.9 just now, and it would be good if 
you upgraded so we can be certain which version you are running.
 

> What is the definition here of a "user"? What are these many entries? 
>

A user is an authenticated entry in the session table, or a username (of an 
unauthenticated session) that is used in a wiki edit, repository change, 
ticket edit, etc. You might have usernames that don't map to 
authenticated entries if you allow anonymous edits of your site, or if you 
are connected to version control repositories with revision authors that 
don't have accounts in your system.

When a user visits the site a session is created. For anonymous sessions 
the SID (session ID) is just a hash. If the user authenticates, the SID is 
the username. If the session saves a preference such as "days back" on the 
timeline, or username/email in prefs, then an entry is created in the 
session_attribute table.

You can see info on the session and session_attribute tables here:
https://trac.edgewall.org/wiki/TracDev/DatabaseSchema/Common#Tablesession
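
To get a quick feel for how the session table splits between authenticated 
and anonymous entries, something like the following might help. This is only 
a minimal sketch: it assumes the default SQLite backend (database file at 
$env/db/trac.db) and the session columns from the schema page linked above. 
The function name count_sessions is my own, not part of Trac.

```python
import sqlite3


def count_sessions(conn):
    """Return (authenticated, anonymous) row counts from the session table.

    Assumes the Trac schema: session(sid, authenticated, last_visit),
    where authenticated is 1 for logged-in users and 0 for anonymous ones.
    """
    rows = conn.execute(
        "SELECT authenticated, COUNT(*) FROM session GROUP BY authenticated"
    ).fetchall()
    counts = {int(auth): n for auth, n in rows}
    return counts.get(1, 0), counts.get(0, 0)
```

You would open the connection with sqlite3.connect() on a copy of the 
database file, so the live site is not touched.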
 

> (My /admin/accounts/users lists only 163 users. Only a few are known 
> spammers and a few are test accounts.) 
>


 

> What does it mean by 125889 users have not been used?  Does this imply 
> that we had that many users over time?  I don't believe that we removed 
> over 125,000 accounts.
>

Did you have a problem with spam registrations at some time in the past?

One possible explanation I see from looking at the code is that you have a 
lot of session_attribute entries that don't necessarily match a session entry.
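
One way to check that guess is to look for session_attribute rows with no 
matching session row. A rough sketch, again assuming the SQLite backend and 
the standard session/session_attribute columns; the helper name 
orphaned_attribute_sids is hypothetical, and this is not how SpamFilter 
itself computes its numbers.

```python
import sqlite3


def orphaned_attribute_sids(conn):
    """List (sid, authenticated) pairs that appear in session_attribute
    but have no corresponding row in session."""
    return conn.execute(
        """
        SELECT DISTINCT a.sid, a.authenticated
        FROM session_attribute AS a
        LEFT JOIN session AS s
          ON s.sid = a.sid AND s.authenticated = a.authenticated
        WHERE s.sid IS NULL
        ORDER BY a.sid
        """
    ).fetchall()
```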

I think we'll need to look at the sessions using TracAdmin to get more 
info. Try "trac-admin $env session list" and "trac-admin $env session list 
| wc -l". How many entries do you find? 

If you write those out into a spreadsheet and review them, we can count how 
many authenticated sessions there are, how many unauthenticated sessions 
there are, and maybe get some insight into the phantom registered accounts. 
TracAdmin's "session delete" can be used to remove entries you don't want.
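
If parsing the "session list" output is awkward, the same data could be 
pulled straight from the database into a CSV file for a spreadsheet. A 
minimal sketch under the same assumptions as before (SQLite backend, standard 
session table); export_sessions is a name I made up for illustration.

```python
import csv
import sqlite3


def export_sessions(conn, out):
    """Write every session row to the file-like object `out` as CSV.

    Authenticated sessions are listed first. Returns the number of
    data rows written (the header row is not counted).
    """
    writer = csv.writer(out)
    writer.writerow(["sid", "authenticated", "last_visit"])
    written = 0
    query = (
        "SELECT sid, authenticated, last_visit FROM session "
        "ORDER BY authenticated DESC, sid"
    )
    for row in conn.execute(query):
        writer.writerow(row)
        written += 1
    return written
```

The last_visit column is a Unix timestamp, so the spreadsheet can sort on it 
to separate stale entries from recent ones.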

It's also highly possible there is a bug in SpamFilter. Some of the code 
quality of the plugin is rather suspect, and there aren't many tests.
 

> I have been the primary Trac admin for the site for around 7 years. I 
> migrated the server a few times, upgraded Trac a few times, but the 
> content (wiki, tickets, user accounts) has never been reset. 
>
> What are the Old user/New user fields and Change unauthorized user 
> button for? (Any example?) 
>

From the implementation, it looks like the feature is used to change a 
username. When a user adds a ticket comment the username (session id) is 
stored in the database. So if a username is to be changed, the entries have 
to be changed in many tables throughout the database. This is a shortcoming 
of Trac, and it would be ideal if the username wasn't used in the columns 
of so many tables, and something like a foreign key was used instead.

If what I said is true, then the button is poorly named, and was probably 
named per a specific use-case that the author had in mind.
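
To illustrate why a rename touches so many places, here is a rough sketch of 
what such an operation has to do. It is not the plugin's actual code. The 
table/column list is only an illustrative subset of the Trac schema (other 
columns, such as ticket cc lists, also hold usernames), and rename_user is a 
hypothetical helper.

```python
import sqlite3

# Illustrative subset of (table, column) pairs that store a username.
USERNAME_COLUMNS = [
    ("ticket", "reporter"),
    ("ticket", "owner"),
    ("ticket_change", "author"),
    ("wiki", "author"),
    ("attachment", "author"),
]


def rename_user(conn, old, new):
    """Replace `old` with `new` in every listed column, in one transaction.

    Returns the total number of rows updated.
    """
    changed = 0
    with conn:  # commit on success, roll back if any UPDATE fails
        for table, column in USERNAME_COLUMNS:
            # Identifiers come from the constant list above, never from input.
            cur = conn.execute(
                "UPDATE {0} SET {1} = ? WHERE {1} = ?".format(table, column),
                (new, old),
            )
            changed += cur.rowcount
    return changed
```

With a numeric user ID as a foreign key, the rename would be a single UPDATE 
on one users table instead of a loop like this.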
 

> What does "Remove .... temporary sessions" mean? 
>

That purges anonymous sessions, same as the "trac-admin $env session purge" 
command:
https://trac.edgewall.org/wiki/TracAdmin

We get tens of thousands of anonymous sessions on trac-hacks.org each 
week, presumably due primarily to bots rather than actual users.
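
Conceptually, purging anonymous sessions comes down to deleting the 
unauthenticated rows older than some cutoff, along with their attributes. The 
following is only a sketch of that idea against the standard session tables on 
SQLite; it is not the plugin's or TracAdmin's implementation, and 
purge_anonymous_sessions and its parameters are names I chose. On a real site 
I would use TracAdmin rather than raw SQL.

```python
import sqlite3
import time


def purge_anonymous_sessions(conn, max_age_days, now=None):
    """Delete anonymous sessions last visited more than `max_age_days` ago.

    `now` is a Unix timestamp (defaults to the current time).
    Returns the number of session rows removed.
    """
    now = int(time.time()) if now is None else now
    cutoff = now - max_age_days * 86400
    with conn:  # single transaction
        # Remove attributes first, while the session rows still identify them.
        conn.execute(
            "DELETE FROM session_attribute WHERE authenticated = 0 AND sid IN "
            "(SELECT sid FROM session WHERE authenticated = 0 AND last_visit < ?)",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM session WHERE authenticated = 0 AND last_visit < ?",
            (cutoff,),
        )
    return cur.rowcount
```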

> What does "Convert emails to registered usernames" mean? 
>

I guess it's used to convert an email address to a username when a user has 
made an edit using an email address rather than a username as the session 
ID. This seems like a feature that isn't very useful for most sites.
 

> Any way to access the mode=all, mode=authorized (Registered), or 
> mode=unused pages?  Maybe I can restrict how many are displayed? (Sorry 
> I didn't read the source about this.) 
>

We'll probably have to try to purge entries from the database first using 
TracAdmin.
 

> What is the purpose and  best practice of this /admin/spamfilter/user 
> interface?
>

To remove "unused" accounts from the database. I haven't looked at the 
specific conditions to label an account as unused, but it's something like: 
more than one year old, and has been inactive: no repository changes, 
ticket edits or wiki edits. It's probably mainly useful for public-facing 
sites.

I've considered integrating SpamFilter into Trac, and now that I look 
closely at this user handling feature, if SpamFilter were integrated into 
Trac I think the "User handling" feature would probably be best as a separate 
plugin. Or, maybe as some of the features of AccountManager are integrated 
into Trac a feature like "Change unauthorized user" (i.e. rename username) 
could be added as an account management feature. Of course, if we fixed how 
usernames were stored in the tables it might not even be necessary.

https://trac.edgewall.org/ticket/12398
 

> Maybe answers in this thread can supplement docs. 
>
> This is Trac 1.0.12 installed from FreeBSD package. The 
> TracAccountManager-0.4.4 and TracSpamFilter-1.0.9 are installed from 
> subversion (then created egg and copied to plugins). 
>
> (By the way, the FreeBSD package for trac-accountmanager-0.5.12583_1,1 
> is incompatible due to: 
> 2016-08-31 16:57:51,588 Trac[loader] ERROR: Skipping 
> "spamfilter.registration = 
> tracspamfilter.filters.registration [account]": (version conflict 
> "VersionConflict: (TracAccountManager 0.5dev-r0 
> (/usr/local/lib/python2.7/site-packages), 
> Requirement.parse('TracAccountManager>=0.4'))") 
>
> I guess it didn't like the "dev-r0" part in the version check. I didn't 
> look at source code to workaround it. So I deinstalled package and 
> installed as mentioned above.) 
>

I think that issue is fixed in SpamFilter and AccountManager, so the latest 
packages just need to be put into the FreeBSD package management system:
https://trac-hacks.org/changeset/14554/accountmanagerplugin
https://trac.edgewall.org/changeset/14337/plugins/1.0/spam-filter

You might talk with Greg about that:
https://groups.google.com/d/msg/trac-dev/EUCPZYz0haQ/rNxuqqUUCAAJ

Feel free to contact me directly by email if you would be willing to share 
the output of "session list", which you probably don't want to post 
publicly at least. We could follow up here with the outcomes.

- Ryan
