Georgi Kobilarov wrote:
>> In this particular one, it's two articles about the same topic, but
>> there could be some cases where the two articles are about something
>> different.
>>
>
> Yes, such as http://en.wikipedia.org/wiki/FROG
> and http://en.wikipedia.org/wiki/Frog
>
> I agree that this can be annoying. One has to make sure not to lose the
> case information (as happened to me with lookup.dbpedia.org once, which
> ended up merging FROG and Frog).
>
> But what do you suggest to do about that, Paul? Should Wikipedia make URLs
> case-insensitive and then enforce disambiguation with ()?
>   
    If Wikipedia were my site, I'd do two things:

(i) map all case-variant forms to a single canonical form (New yOrK cITy ->
New York City), so that "FROG" gets renamed to "FROG Cipher" or "Frog (Cipher)"
(ii) do a permanent redirect from variant forms to the canonical form
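
A minimal sketch of (i) and (ii), assuming a hypothetical in-memory table
of canonical titles keyed by their case-folded form. The CANONICAL table,
the resolve() function and the bare status codes are illustrative only;
this is not how MediaWiki actually resolves titles.

```python
# Hypothetical table: case-folded title -> the one canonical spelling.
# After the rename in (i), "FROG" no longer competes with "Frog".
CANONICAL = {
    "new york city": "New York City",
    "frog": "Frog",
    "frog (cipher)": "Frog (Cipher)",
}


def resolve(title):
    """Return (status, canonical_title) for a requested title.

    200: the request already uses the canonical form.
    301: a case variant; permanently redirect to the canonical form.
    404: no title matches, even ignoring case.
    """
    canonical = CANONICAL.get(title.casefold())
    if canonical is None:
        return 404, None
    if canonical == title:
        return 200, canonical
    return 301, canonical


if __name__ == "__main__":
    for requested in ("New yOrK cITy", "Frog", "FROG", "Toad"):
        print(requested, "->", resolve(requested))
```

Because every variant collapses onto one key, two articles can no longer
differ by case alone, which is why the rename in (i) has to come first.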

    I think what dbpedia is doing is reasonable considering the situation.

    My own system for handling generic databases has both a VARBINARY
and a VARCHAR field for dbpedia URLs/labels. It does a case-insensitive
lookup first, and if that fails, looks at the alternatives that turn
up. It's also got some heuristics for dealing with redirects,
disambiguation, and all that. In the big picture I see "naming and
identity" as a specific functional module for this kind of system...
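
One possible reading of the two-column lookup, sketched with SQLite from
the Python standard library rather than the VARBINARY/VARCHAR columns
mentioned above. The table name, column names and the tie-breaking rule
are assumptions made for illustration, and the redirect and
disambiguation heuristics are left out. Note that SQLite's NOCASE
collation only folds ASCII letters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "exact" keeps the original bytes (case preserved, like VARBINARY);
# "folded" compares case-insensitively (like a case-insensitive VARCHAR).
conn.execute(
    "CREATE TABLE label ("
    " exact BLOB NOT NULL,"
    " folded TEXT NOT NULL COLLATE NOCASE)"
)
titles = ["Frog", "FROG", "New_York_City"]
conn.executemany(
    "INSERT INTO label (exact, folded) VALUES (?, ?)",
    [(t.encode("utf-8"), t) for t in titles],
)


def lookup(title):
    """Return the stored titles that match `title`.

    Step 1: case-insensitive match on the folded column.
    Step 2: if that is ambiguous, prefer a byte-exact match; otherwise
            return every alternative so the caller can decide.
    """
    rows = conn.execute(
        "SELECT exact FROM label WHERE folded = ? ORDER BY exact",
        (title,),
    )
    candidates = [row[0].decode("utf-8") for row in rows]
    if len(candidates) <= 1:
        return candidates
    exact = [c for c in candidates if c == title]
    return exact or candidates


if __name__ == "__main__":
    for query in ("new_york_city", "FROG", "frog", "Toad"):
        print(query, "->", lookup(query))
```

Keeping the byte-exact column alongside the folded one means the case
information is never lost, so FROG and Frog stay distinguishable even
though both answer to a case-insensitive query.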

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l