Re: [WikiEN-l] Custom Google search engines for finding RSs for subject areas

Gwern Branwen Sat, 23 Jan 2010 07:02:36 -0800

On Sat, Jan 23, 2010 at 4:31 AM, Carcharoth <carcharot...@googlemail.com> wrote:
> On Sat, Jan 23, 2010 at 3:21 AM, Gwern Branwen <gwe...@gmail.com> wrote:
>> On Fri, Jan 22, 2010 at 8:45 PM, K. Peachey <p858sn...@yahoo.com.au> wrote:
>>> On Sat, Jan 23, 2010 at 3:00 AM, Gwern Branwen <gwe...@gmail.com> wrote:
>>>> ...snip...
>>>> I started with all the links listed in
>>>> https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:WikiProject_Anime_and_manga/Online_reliable_sources
>>>> and then began running searches on random topics and pruning based on
>>>> that - chucking sites into the blacklist sinbin, or finding good sites
>>>> omitted from the list and adding them to the whitelist. At last count,
>>>> I had 200 sites on the nice list and 311 on the naughty list (but this
>>>> counts things like the Mirrors page as a single link, though they ban
>>>> dozens or hundreds of sites).
>>>> ...snip...
>>> Perhaps we should encourage more WikiProjects to create lists like the
>>> one displayed, then add them into a category, and someone could work on
>>> a custom search that is suitable to use across the project and is
>>> continuously updated with more allow lists and blacklists.
>>>
>>> -Peachey
>>
>> That would be an excellent idea, especially if they could then all be
>> {{subst}}ed into a single page - just as I can ban every site listed
>> in the consolidated WP:MIRRO page, so too I can *include* every site
>> listed on a page. It would probably be superior to the current AfD
>> template with just some normal Google/Books/News searches.
>
> Does your custom search aggregate books, news, and scholar searches,
> as well as ordinary web searches?


I put in the Books/News/Scholar URLs, but I'm unsure it did anything.
For example, AFAIK, a site search of Google Books will only turn up
the homepage for a book (the metadata, reviews, etc.); the actual OCR
page contents are part of the 'deep web' you can get at only through
the actual Google search box. One might think that Google's custom
search might recognize the Google service URLs and run the deep web
queries and not just query the surface details - but that seems to be
too much to expect. (So I am perhaps a little hasty in suggesting a
universal CSE would replace the AfD searches.)

> Those are the four Google searches I
> use most often, and it is interesting to see how some subjects get
> more coverage in one area of the information metasphere than other
> areas. It is all quite logical when you think about when the topic
> received the most coverage. The one thing I still find lacking is
> Google News - a lot of old newspapers still seem to need to be
> searched on separate databases. What is the best database out there
> for searching in old newspapers?
>
> Carcharoth

I don't know of any good non-proprietary old newspaper database, personally.

-- 
gwern

_______________________________________________
WikiEN-l mailing list
WikiEN-l@lists.wikimedia.org
To unsubscribe from this mailing list, visit:
https://lists.wikimedia.org/mailman/listinfo/wikien-l
