https://bugzilla.wikimedia.org/show_bug.cgi?id=60484
--- Comment #7 from Nik Everett <[email protected]> ---
The biggest case I can think of for excluding text from search is the license information on Commons. Please take that as an example. It may be the only example, but I think it is pretty important.

1. The license information doesn't add a whole lot to the results. Try searching Commons with Cirrus for "distribute", "transmit", or "following" and you'll very quickly start to see the text of the CC license. And the searches find 14 million results. Heaven forbid you want to find "distributed transmits" or something. You'll almost exclusively get the license highlighted and you'll still find 14 million results. This isn't _horrible_ because the top results all have "distribute" or "transmit" in the title, but it isn't great.

2. Knock-on effect from #1: because relevance is calculated based on the inverse of the number of documents that contain the word, every term in the CC license is worth less than words not in the license. I can't point to any example of why that is bad but I feel it in my bones. Feel free to ignore this. I'm probably paranoid.

3. Entirely self-serving: given #1, the contents of the license take up an awful lot of space for very little benefit. If I had more space I could make Cirrus a beta on more wikis. It is kind of a lame reason and I'm attacking the space issue from other angles, so maybe it'll be moot long before we get this deployed and convince the community that it is worth doing.

4. Really, really self-serving: if .nosearch is the right solution and is useful then it is super duper easy to implement. Like one line of code, a few tests, and bam. It's already done, just waiting to be rebased and merged. It was so easy it would have taken longer to estimate the effort than to propose an implementation.

I really wouldn't be surprised if someone could come up with a great reason why #1 is silly and we just shouldn't do it.
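To put a rough number on the relevance point in item 2, here is a minimal sketch of textbook inverse document frequency. It is not necessarily the exact formula Cirrus or its search backend uses, and the corpus size and the rare-term count are hypothetical values picked for illustration; only the 14 million figure comes from the searches mentioned above.

```python
import math


def idf(total_docs: int, docs_with_term: int) -> float:
    """Textbook inverse document frequency: the rarer the term, the higher the weight."""
    return math.log(total_docs / docs_with_term)


# Hypothetical corpus size (assumption, not a measured Commons figure).
TOTAL_DOCS = 20_000_000

# A word that appears in the license text, so it matches about 14 million pages.
license_term_weight = idf(TOTAL_DOCS, 14_000_000)

# A hypothetical word that appears on only 20,000 pages.
rare_term_weight = idf(TOTAL_DOCS, 20_000)

print(f"license term weight: {license_term_weight:.2f}")
print(f"rare term weight:    {rare_term_weight:.2f}")
```

Under these assumptions the license word ends up with a weight close to zero, so a match on it contributes almost nothing to ranking even when it appears in a page's own description rather than in the license.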
The big problem with the nosearch class implementation is that it'd be pretty simple to abuse, and hard to catch the abuse because the text is still on the page. One of the nice things about the solution is that you could use a web browser's debugger to highlight all the text excluded from search by writing a simple CSS rule for the class.
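For readers who want to see the idea concretely, here is a rough sketch of dropping text inside elements marked with a nosearch class before the page text is indexed. This is not the Cirrus patch itself; the class `SearchTextExtractor` and the function `searchable_text` are hypothetical names, the sketch uses only Python's standard `html.parser`, and it assumes reasonably well-formed HTML where every non-void element is closed.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SearchTextExtractor(HTMLParser):
    """Collects page text, skipping anything nested inside an excluded class."""

    def __init__(self, excluded_class="nosearch"):
        super().__init__(convert_charrefs=True)
        self.excluded_class = excluded_class
        self.skip_depth = 0   # > 0 while inside an excluded element
        self.chunks = []      # text fragments that should be indexed

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Start skipping at an excluded element, and track nesting while skipping.
        if self.skip_depth or self.excluded_class in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def searchable_text(html: str) -> str:
    """Return whitespace-normalized text with excluded elements removed."""
    parser = SearchTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


page = (
    '<p>A photo of a bridge.</p>'
    '<div class="license nosearch"><p>You are free to <b>distribute</b> '
    'and transmit<br>the work.</p></div>'
    '<p>Taken in 2013.</p>'
)
print(searchable_text(page))
```

The excluded text is untouched in the page source, which is exactly why abuse would be hard to spot from search results alone and why highlighting the class in a browser is a useful check.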
