Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-27 Thread Stas Malyshev
Hi!

> It should be a basic requirement of any SPARQL engine that it should be
> able to handle path queries that contain cycles.

So I did some simple checks, and on simple examples Blazegraph handles
cycles just fine. However, on more complex queries, the cycles seem to
be causing trouble. I don't know why yet; I'll look into it further,
probably next week.

So the problem is not "handling cycles" in general, it is handling some
specific data set, and most probably is a consequence of some bug. I'll
report when I have more data about what exactly triggers the bug.
-- 
Stas Malyshev
smalys...@wikimedia.org

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-27 Thread James Heald

On 27/10/2015 08:42, Stas Malyshev wrote:

Hi!


It should be a basic requirement of any SPARQL engine that it should be
able to handle path queries that contain cycles.


So I did some simple checks, and on simple examples Blazegraph handles
cycles just fine. However, on more complex queries, the cycles seem to
be causing trouble. I don't know why yet; I'll look into it further,
probably next week.

So the problem is not "handling cycles" in general, it is handling some
specific data set, and most probably is a consequence of some bug. I'll
report when I have more data about what exactly triggers the bug.



The key issue if a graph contains cycles is that you cannot then 
just assume that each successive generation of nodes, obtained by adding 
another path step, consists by definition of new nodes (as it would for an 
acyclic graph -- well, not entirely, because you might already have 
reached them by a shorter path; but nothing is going to seriously break 
with an acyclic graph if you get this check wrong).


In contrast, with a graph that contains cycles, you need to do some sort 
of hash join with what you have already seen, to specifically identify 
the new nodes.


If the query planner is somehow messing up those hash joins when given 
multiple interrelated path requirements, that could be a source of trouble.
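A minimal Python sketch of the "already seen" check described above, over a toy in-memory edge list (it illustrates the idea only and does not reflect how Blazegraph works internally). Each new generation is reduced to the nodes not seen before; that set difference plays the role of the hash join, and it is what makes the loop terminate on a cyclic graph.

```python
from collections import defaultdict

def reachable(edges, start):
    """Nodes reachable from `start` by one or more edges (like SPARQL `P+`).

    The search proceeds one generation at a time; each new generation is
    reduced to the nodes not already seen, which is what makes the loop
    terminate when the graph contains cycles.
    """
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)

    seen = set()
    frontier = {start}
    while frontier:
        next_generation = set()
        for node in frontier:
            next_generation |= successors[node]
        # Anti-join against everything seen so far (the "hash join" step).
        frontier = next_generation - seen
        seen |= frontier
    return seen

if __name__ == "__main__":
    # A -> B -> C -> A is a cycle; C -> D leaves it.
    edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
    print(sorted(reachable(edges, "A")))  # ['A', 'B', 'C', 'D']
```

Without the `- seen` step, the frontier in this example would cycle through A, B and C forever; with it, the loop stops as soon as a generation adds nothing new.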


   -- James.



Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-25 Thread Markus Krötzsch

On 25.10.2015 02:18, Kingsley Idehen wrote:

On 10/24/15 10:51 AM, Markus Krötzsch wrote:

On 24.10.2015 12:29, Martynas Jusevičius wrote:

I don't see how cycle queries can be a requirement for SPARQL engines if
they are not part of the SPARQL spec? The closest thing you have is property
paths.


We were talking about *cyclic data* not cyclic queries (which you can
also create easily using BGPs, but that's unrelated here). Apparently,
BlazeGraph has performance issues when computing a path expression
over a cyclic graph.

Markus


Markus,

Out of curiosity, can you share a SPARQL query example (text or query
results url) that demonstrates your point?


You mean a query with BlazeGraph having performance issues? That problem 
was reported by Stas. He should have examples. In any case, it is always 
a combination of query and data.


Markus




Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-25 Thread James Heald

On 25/10/2015 09:31, Markus Krötzsch wrote:

On 25.10.2015 02:18, Kingsley Idehen wrote:

On 10/24/15 10:51 AM, Markus Krötzsch wrote:


We were talking about *cyclic data* not cyclic queries (which you can
also create easily using BGPs, but that's unrelated here). Apparently,
BlazeGraph has performance issues when computing a path expression
over a cyclic graph.

Markus


Markus,

Out of curiosity, can you share a SPARQL query example (text or query
results url) that demonstrates your point?


You mean a query with BlazeGraph having performance issues? That problem
was reported by Stas. He should have examples. In any case, it is always
a combination of query and data.



Hi Kingsley,

I had a problem with Blazegraph queries whose path requirements 
contained a compound path predicate and ended in a variable, e.g.


   wd:Q289 wdt:P31/wdt:P279* ?o.

However, this particular example now appears to work (perhaps thanks to the 
recent upgrade of the SPARQL endpoint to the latest Blazegraph production 
release?).
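A pattern like `wd:Q289 wdt:P31/wdt:P279* ?o` starts from a fixed item: exactly one P31 step to its classes, then zero or more P279 steps upward. A minimal Python sketch of that evaluation order, assuming toy (subject, object) pair lists and made-up item names; it is not a description of Blazegraph's internals.

```python
from collections import defaultdict, deque

def sequence_then_star(p31, p279, start):
    """Evaluate `start P31/P279* ?o` over toy (subject, object) pair lists:
    exactly one P31 step, followed by zero or more P279 steps."""
    classes_of = defaultdict(set)
    for s, o in p31:
        classes_of[s].add(o)
    superclasses_of = defaultdict(set)
    for s, o in p279:
        superclasses_of[s].add(o)

    # Zero P279 steps: the direct classes of `start` are already answers.
    results = set(classes_of[start])
    queue = deque(results)
    while queue:
        cls = queue.popleft()
        for sup in superclasses_of[cls]:
            if sup not in results:  # already-seen check keeps cycles harmless
                results.add(sup)
                queue.append(sup)
    return results

if __name__ == "__main__":
    # Hypothetical data with a P279 cycle between ClassA and ClassB.
    p31 = [("item", "ClassA")]
    p279 = [("ClassA", "ClassB"), ("ClassB", "ClassA"), ("ClassB", "ClassC")]
    print(sorted(sequence_then_star(p31, p279, "item")))  # ['ClassA', 'ClassB', 'ClassC']
```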


On the other hand, it appears that path queries can still fail if they 
involve a variable that is meant to be a fixed constant set by a BIND 
statement (evaluating the BIND is usually the first thing a query engine will do).


So, for example, a query to count incidences of instances of subclasses 
of painting, where the key requirement statement is


  ?a wdt:P31/wdt:P279* wd:Q3305213 .

runs in about 0.4 seconds. However, a very similar query, where the 
identity of that target superclass is set using a BIND statement,


   BIND (wd:Q3305213 AS ?class) .
   ?a wdt:P31/wdt:P279* ?class .

times out. Or rather, it ought to report that it has timed out, and it 
used to; but instead of throwing a "Query Timed Out" error, it now returns 
an (incorrect) count of zero after 120 seconds. (An additional, new bug.)


Complete versions of these queries can be found at
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/suggestions#Path_assertions_that_end_in_a_variable_can_blow_up

and as a Blazegraph bug at

https://jira.blazegraph.com/browse/BLZG-1543

(although, as with a couple of other issues on the same wiki page for 
which I have filed Blazegraph bugs, there doesn't seem to be any sign 
that anybody has actually read the report...)



I'm not sure if Stas knows of other current issues with path queries.

I did post a complaint to this list, just after the query service was 
publicly announced, that path queries seemed very slow.  They *are* 
still slower than the equivalent search on WDQ.  But I think it was this 
issue with binding variables that was underlying the worst of what I was 
seeing.


As for cyclical paths, as I posted a couple of days ago, the queries at
https://www.wikidata.org/wiki/Wikidata:WikiProject_Names/given-name_variants
for counting up incidences of given-name variants involve graphs that 
are anything but directed (based on the P460 "said to be the same as" 
property), and Blazegraph seems to handle them without any particular 
difficulty, though there may have been earlier problems when the service 
was still at an alpha stage.


  -- James.






Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-25 Thread James Heald

The standard algorithm for a path search is very simple:

  Keep adding a new generation of links, until the latest generation brings 
in no node that has not already been seen.



This works for graphs of equivalence relations, it works for directed 
acyclic graphs.


It's not the /graphs/ that are causing the problem here, because 
Blazegraph can handle each of them on its own and give the right answer.


Rather, in a query like:

SELECT (COUNT(DISTINCT(?city)) AS ?count) WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 .  # find instances of subclasses of city
  ?city wdt:P131* wd:Q1202 .
}

something is going wrong with the way Blazegraph handles the two 
conditions *together*.
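One way to read the query above, sketched in Python over toy edge lists (all item names are hypothetical): each path condition is evaluated separately, starting from its constant end, and the two sets of `?city` bindings are then joined. Neither closure has trouble with cycles on its own. The point is only that the result is a join of two independently computable sets, whatever plan an engine actually picks.

```python
from collections import defaultdict

def reverse_closure(pairs, target):
    """Nodes x with `x P* target`, found by walking the pairs backwards from
    target. Zero steps are allowed, so target itself is included."""
    incoming = defaultdict(set)
    for s, o in pairs:
        incoming[o].add(s)
    seen, stack = {target}, [target]
    while stack:
        for prev in incoming[stack.pop()]:
            if prev not in seen:  # the seen-check keeps cycles from looping
                seen.add(prev)
                stack.append(prev)
    return seen

def count_cities_in_region(p31, p279, p131, city_class, region):
    """COUNT(DISTINCT ?city) for
           ?city P31/P279* city_class .
           ?city P131* region .
    Each pattern is evaluated on its own, starting from its constant, and the
    two sets of ?city bindings are then joined (here: intersected)."""
    city_classes = reverse_closure(p279, city_class)
    cond1 = {s for s, o in p31 if o in city_classes}
    cond2 = reverse_closure(p131, region)
    return len(cond1 & cond2)

if __name__ == "__main__":
    p279 = [("big_city", "city"), ("capital", "big_city")]
    p31 = [("c1", "city"), ("c2", "capital"), ("c3", "city"), ("v1", "village")]
    # Includes a deliberate P131 cycle between "county" and "state".
    p131 = [("c1", "county"), ("county", "state"), ("state", "county"),
            ("c2", "state"), ("c3", "elsewhere")]
    print(count_cities_in_region(p31, p279, p131, "city", "state"))  # 2
```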



I suspect this may be closely related to whatever is going wrong with a 
query like:


SELECT (COUNT(DISTINCT(?a)) AS ?count) WHERE {
   BIND (wd:Q3305213 AS ?class) .
   ?a wdt:P31/wdt:P279* ?class .
}

which times out.


It's the plan of joins which is going wrong, not whether the graph is 
acyclic or not.
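To illustrate why the plan of joins matters, here is a toy Python comparison of two plans for `?a wdt:P31/wdt:P279* ?class` with `?class` bound to a constant. It uses hypothetical data and function names and is not a model of Blazegraph's actual planner. Plan A treats the bound variable as the constant it is and walks backwards from it; plan B expands every instance's class hierarchy first and filters on `?class` afterwards. Both return the same answers, and a counter records how many edges each plan traverses.

```python
from collections import defaultdict

def index(pairs, reverse=False):
    """Adjacency sets built from (subject, object) pairs."""
    idx = defaultdict(set)
    for s, o in pairs:
        if reverse:
            idx[o].add(s)
        else:
            idx[s].add(o)
    return idx

def closure(idx, start, counter):
    """Zero-or-more steps over idx from start; counter[0] counts edge visits."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in idx[stack.pop()]:
            counter[0] += 1
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def constant_first(p31, p279, cls):
    """Plan A: use the bound constant and walk P279 backwards from it."""
    work = [0]
    subclasses = closure(index(p279, reverse=True), cls, work)
    answers = {s for s, o in p31 if o in subclasses}
    return answers, work[0]

def path_first(p31, p279, cls):
    """Plan B: expand P279* upwards for every instance, then filter on cls."""
    work = [0]
    up = index(p279)
    answers = set()
    for s, o in p31:
        if cls in closure(up, o, work):
            answers.add(s)
    return answers, work[0]

if __name__ == "__main__":
    # Toy data: a chain of classes c0 -> c1 -> ... -> c9 with ten instances
    # of c0, plus one instance of an unrelated class "painting".
    p279 = [(f"c{i}", f"c{i + 1}") for i in range(9)]
    p31 = [(f"i{i}", "c0") for i in range(10)] + [("p1", "painting")]
    print(constant_first(p31, p279, "painting"))  # ({'p1'}, 0)
    print(path_first(p31, p279, "painting"))      # ({'p1'}, 90)
```

On this toy data plan A touches no P279 edges at all, while plan B walks the whole chain once per instance; on the full Wikidata class hierarchy one would expect that gap to be far larger.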


  -- James.



On 25/10/2015 17:53, Daniel Kinzler wrote:

"Said to be the same as" is a good example of a case where cycles are
unavoidable. A possible workaround in this case is to make sure that the
transitive closure of "said to be the same as" is already in the data, such that
the path "P460+" returns the same results as a mere "P460" would. It's not
ideal, but maybe workable.


I think we have to distinguish between different use cases:

1) Antisymmetric transitive relations, like subclass-of or part-of, which should
form an acyclic graph. For these, the "*" notation in sparql can be used to
query a sub-graph, such as all kinds of cars or all places in Idaho. This is our
primary use case for path traversal, I believe.

2) Symmetric transitive relations, such as "said to be the same as". These
(should) form small "islands" of fully connected graphs that are (hopefully)
unconnected to each other. Here, the "*" notation can be used to include the
entire clique instead of only a single node in a query. This might be useful in
some cases, but doesn't strike me as a typical use case.

3) Cycles in non-transitive properties: these are not errors at all, and
problems only arise when such properties are used in a query as if they were
transitive. We could perhaps detect and reject attempts to apply the "*"
notation to properties that are not transitive.

4) Intransitive symmetric relations (e.g. "spouse of"). Do we need any special
handling for them, or do they just get treated like (3)?


Anyway: we need a solution for (1) that allows transitive queries, and a
solution for (3) that prevents pathological behavior. If we get nice handling
for case (2), that's a bonus, but not a requirement, I think.
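The idea of rejecting `*` on non-transitive properties could look something like the following hypothetical check. Which properties count as transitive is an assumption here (a hard-coded set chosen for illustration); in practice that information would have to come from property metadata. The regular expression only understands simple `wdt:` sequence paths, not grouped or alternative paths.

```python
import re

# Hypothetical, hard-coded for illustration: properties treated as transitive.
TRANSITIVE = {"P279", "P361", "P131", "P460"}

# One path step such as "wdt:P279*": property id plus an optional modifier.
STEP = re.compile(r"wdt:(P\d+)([*+?]?)")

def check_path(path, transitive=TRANSITIVE):
    """Return the properties in a simple property path that carry * or +
    although they are not in the set of transitive properties."""
    return [prop for prop, modifier in STEP.findall(path)
            if modifier in ("*", "+") and prop not in transitive]

if __name__ == "__main__":
    print(check_path("wdt:P31/wdt:P279*"))  # []
    print(check_path("wdt:P40*"))           # ['P40']
```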







Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-25 Thread Daniel Kinzler
On 25.10.2015 at 19:20, James Heald wrote:
> It's not the /graphs/ that are causing the problem here, because Blazegraph
> can handle each of them on its own and give the right answer.

That's an interesting observation, would you add your examples to
?

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.



Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-25 Thread Daniel Kinzler
On 25.10.2015 at 19:50, Daniel Kinzler wrote:
> On 25.10.2015 at 19:20, James Heald wrote:
>> It's not the /graphs/ that are causing the problem here, because Blazegraph
>> can handle each of them on its own and give the right answer.
> 
> That's an interesting observation, would you add your examples to
> ?

Oh, you just did :) thanks!


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.



Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-24 Thread Markus Krötzsch

On 24.10.2015 12:29, Martynas Jusevičius wrote:

I don't see how cycle queries can be a requirement for SPARQL engines if
they are not part of the SPARQL spec? The closest thing you have is property
paths.


We were talking about *cyclic data* not cyclic queries (which you can 
also create easily using BGPs, but that's unrelated here). Apparently, 
BlazeGraph has performance issues when computing a path expression over 
a cyclic graph.


Markus





Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-24 Thread Kingsley Idehen
On 10/24/15 10:51 AM, Markus Krötzsch wrote:
> On 24.10.2015 12:29, Martynas Jusevičius wrote:
>> I don't see how cycle queries can be a requirement for SPARQL engines if
>> they are not part of the SPARQL spec? The closest thing you have is property
>> paths.
>
> We were talking about *cyclic data* not cyclic queries (which you can
> also create easily using BGPs, but that's unrelated here). Apparently,
> BlazeGraph has performance issues when computing a path expression
> over a cyclic graph.
>
> Markus 

Markus,

Out of curiosity, can you share a SPARQL query example (text or query
results url) that demonstrates your point?

-- 
Regards,

Kingsley Idehen   
Founder & CEO 
OpenLink Software 
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this





Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-24 Thread James Heald

On 24/10/2015 00:50, Stas Malyshev wrote:

Hi!


least one Wikipedia) are considered to refer to equivalent classes on
Wikidata, which could be expressed by a small subclass-of cycle. For


We can do it, but I'd rather we didn't. The reason is that it would
require the engine that queries such data (e.g. a SPARQL engine) to be
comfortable with cycles in property paths (especially ones with + and
*), and not every engine is (Blazegraph, for example, does not seem to
handle them out of the box). It can be dealt with, I assume, but why
create trouble for ourselves?


It should be a basic requirement of any SPARQL engine that it should be 
able to handle path queries that contain cycles.


For example, consider equivalence relationships like P460 "said to be 
the same as", which is being used to link given names together.


If we want to find all the names in a particular equivalence class, and 
e.g. rank them by their incidence count, as is done in the 'query' columns at

https://www.wikidata.org/wiki/Wikidata:WikiProject_Names/given-name_variants

then being able to handle cycles in path queries is a basic requirement 
for the job.
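A minimal Python sketch of this use case: treat P460 links as undirected (the relation is symmetric), collect the connected group around one name, and rank its members by an incidence count. The names and counts are made up for illustration; the wiki page does this with SPARQL queries.

```python
from collections import defaultdict

def equivalence_class(p460, name):
    """All names connected to `name` through P460 links in either direction."""
    neighbours = defaultdict(set)
    for a, b in p460:
        neighbours[a].add(b)
        neighbours[b].add(a)  # "said to be the same as" is symmetric
    seen, stack = {name}, [name]
    while stack:
        for other in neighbours[stack.pop()]:
            if other not in seen:  # symmetric links form cycles; skip repeats
                seen.add(other)
                stack.append(other)
    return seen

def ranked_variants(p460, counts, name):
    """Names in the class of `name`, most frequent first (ties by name)."""
    members = equivalence_class(p460, name)
    return sorted(members, key=lambda n: (-counts.get(n, 0), n))

if __name__ == "__main__":
    # Hypothetical links and counts.
    p460 = [("Johann", "John"), ("John", "Jean"), ("Jean", "Johann"),
            ("Ivan", "Jean"), ("Maria", "Marie")]
    counts = {"John": 50, "Jean": 30, "Johann": 30, "Ivan": 5, "Maria": 7}
    print(ranked_variants(p460, counts, "John"))  # ['John', 'Jean', 'Johann', 'Ivan']
```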


   -- James.




Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-24 Thread Thomas Douillard
> Blazegraph for example looks like does not
handle them out of the box

As Wikidata is an open wiki, I think we can't avoid the query engine having
to deal with cycles from time to time. I can't imagine the Wikidata query
engine having trouble with cycles: it must be robust.



Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-24 Thread Markus Krötzsch

On 24.10.2015 09:36, James Heald wrote:

On 24/10/2015 00:50, Stas Malyshev wrote:

Hi!


least one Wikipedia) are considered to refer to equivalent classes on
Wikidata, which could be expressed by a small subclass-of cycle. For


We can do it, but I'd rather we didn't. The reason is that it would
require the engine that queries such data (e.g. a SPARQL engine) to be
comfortable with cycles in property paths (especially ones with + and
*), and not every engine is (Blazegraph, for example, does not seem to
handle them out of the box). It can be dealt with, I assume, but why
create trouble for ourselves?


It should be a basic requirement of any SPARQL engine that it should be
able to handle path queries that contain cycles.

For example, consider equivalence relationships like P460 "said to be
the same as", which is being used to link given names together.

If we want to find all the names in a particular equivalence class, and
e.g. rank them by their incidence count, as is done in the 'query' columns at
https://www.wikidata.org/wiki/Wikidata:WikiProject_Names/given-name_variants


then being able to handle cycles in path queries is a basic requirement
for the job.


I agree. Even if we discourage cycles in other cases, there is still no 
guarantee that there won't be any, so the engine should be robust 
against this.


On the other hand, we have to live with the technical infrastructure we 
got. If BlazeGraph does not handle cycles well, we should encourage 
their team to work on fixing this, but at the same time we need to work 
around the issue for a while.


"Said to be the same as" is a good example of a case where cycles are 
unavoidable. A possible workaround in this case is to make sure that the 
transitive closure of "said to be the same as" is already in the data, 
such that the path "P460+" returns the same results as a mere "P460" 
would. It's not ideal, but maybe workable.
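The workaround of materializing the closure, so that a single P460 step already reaches every equivalent name, could be computed offline along these lines. This is a sketch with a hypothetical function name; union-find is just one convenient way to group the names. Note that `P460+` on a symmetric relation also reaches each name itself via a round trip; the sketch deliberately leaves out such self-links.

```python
from itertools import permutations

def materialize_closure(p460):
    """Return P460 pairs extended so that every two distinct names in the same
    connected group are linked directly, in both directions."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in p460:
        parent[find(a)] = find(b)  # merge the groups of a and b

    groups = {}
    for name in list(parent):
        groups.setdefault(find(name), set()).add(name)

    closed = set()
    for members in groups.values():
        closed.update(permutations(members, 2))  # all ordered pairs, no self-links
    return closed

if __name__ == "__main__":
    # Hypothetical chain Johann - John - Jean: after materialization,
    # Johann and Jean are linked directly as well.
    print(len(materialize_closure([("Johann", "John"), ("John", "Jean")])))  # 6
```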


Markus




Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-23 Thread Gerard Meijssen
Hoi,
The problem with tools like this is that they get a moment of attention.
Particularly when they are standalone and not integrated, people will lose
interest in them.

Would it be an option to host this tool on Labs?
Thanks,
 GerardM



Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-23 Thread Lydia Pintscher
On Thu, Oct 22, 2015 at 5:31 PM, Markus Kroetzsch
 wrote:
> Hi all,
>
> I am happy to announce a new tool [1], written by Serge Stratan, which
> allows you to browse the taxonomy (subclass of & instance of relations)
> between Wikidata's most important class items.

Nice work! Thanks for sharing.


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.



Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-23 Thread Thomas Douillard
Integration is the purpose of templates like Q' with Reasonator, or P' and
Item Documentation; I don't know whether they are actually used. Templates
like Query have had limited success, however.



Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-23 Thread Markus Krötzsch

On 23.10.2015 11:16, Gerard Meijssen wrote:

Hoi,
The problem with tools like this is that they get a moment of attention.
Particularly when they are standalone and not integrated, people will lose
interest in them.


Problems, problems, ...



Would it be an option to host this tool on Labs?


Yes, this is planned for the future, especially to automate regular data 
updates, which Serge now has to do manually. Besides the changed URL, 
this move would make a big difference for users. What you see right now 
is a first prototype beta-release that is meant to gather user feedback 
on how to develop this tool further.


Markus





Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-23 Thread Stas Malyshev
Hi!

> least one Wikipedia) are considered to refer to equivalent classes on
> Wikidata, which could be expressed by a small subclass-of cycle. For

We can do it, but I'd rather we didn't. The reason is that it would
require the engine that queries such data (e.g. a SPARQL engine) to be
comfortable with cycles in property paths (especially ones with + and
*), and not every engine is (Blazegraph, for example, does not seem to
handle them out of the box). It can be dealt with, I assume, but why
create trouble for ourselves?

> We also have/had cycles involving instance-of, which is definitely an
> error. ;-)

Right. So I think we need to mark properties that should not form cycles
with https://www.wikidata.org/wiki/Q18647519 (asymmetric property) and have
constraint-checking scripts/bots find such cases and alert us about them.
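A constraint-checking script of the kind described here could start from a depth-first search that reports one cycle, if any, among the statements of a property. This is a hypothetical sketch on a plain list of (subject, object) pairs, not an existing bot. Strictly speaking, asymmetry only forbids two-element cycles (a → b together with b → a), while this check finds cycles of any length. For graphs the size of Wikidata's class hierarchy, the recursion would need to be made iterative.

```python
from collections import defaultdict

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished

def find_cycle(pairs):
    """Return one cycle as a list of nodes [n0, n1, ..., n0], or None if the
    graph given by (subject, object) pairs is acyclic."""
    graph = defaultdict(list)
    for s, o in pairs:
        graph[s].append(o)
    colour = defaultdict(int)  # every node starts WHITE
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:  # back edge: nxt is on the current path
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(graph):  # snapshot: visit() may add keys to graph
        if colour[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None

if __name__ == "__main__":
    print(find_cycle([("a", "b"), ("b", "c"), ("c", "a")]))  # ['a', 'b', 'c', 'a']
```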
-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-22 Thread Dario Taraborelli
I’m constantly getting 500 errors.

> On Oct 22, 2015, at 10:25 AM, Thomas Douillard wrote:
> 
> Great tool! The error detection is precious!
> 
> 2015-10-22 17:31 GMT+02:00 Markus Kroetzsch:
> Hi all,
> 
> I am happy to announce a new tool [1], written by Serge Stratan, which allows 
> you to browse the taxonomy (subclass of & instance of relations) between 
> Wikidata's most important class items. For example, here is the Wikidata 
> taxonomy for Pizza (discussed recently on this list):
> 
> http://sergestratan.bitbucket.org?draw=true=s0=177,2095,7802,28877,35120,223557,386724,488383,666242,736427,746549,2424752,1513,16686448
>  
> 
> 
> 
> == What you see there ==
> 
> Solid green lines mean "subclass of" relations (subclasses are lower), while 
> dashed purple lines are "instance of" relations (instances are lower). Drag 
> and zoom the view as usual. Hover over items for more information. Click on 
> arrows with numbers to display upper or lower neighbours. Right-click on 
> classes to get more options.
> 
> The sidebar on the left shows statistics and presumed problems in the data 
> (redundancies and likely errors). You can select a report type to see the 
> reports, and click on any line to show the error. If you search for a class 
> in the search field, the errors will be narrowed down to issues related to 
> the taxonomy of this class.
> 
> The toolbar at the top has options to show and hide items based on the 
> current selection (left click on any box).
> 
> Edges in red are the wrong way around (top to bottom). This occurs only when 
> there are cycles in the "taxonomy".
> 
> 
> == Micro tutorial ==
> 
> (1) Enter "Unicorn" in the search box, press return.
> (2) Zoom out a bit by scrolling your mouse/touchpad
> (3) Click on the "Unicorn" item box. It becomes blue (selected).
> (4) Click "Expand up" in the toolbar at the top
> (5) Zoom out to see the taxonomy of unicorn
> (6) Find the class "Fictional Horse" (directly above unicorn) and click its 
> downwards arrow labelled "3" to see all three children items of "fictional 
> horse".
> (7) Click the share button on the top right to get a link to this view.
> 
> You can also create your own share link manually by just changing the Qids in 
> the URL as you like.
> 
> 
> == Status and limitations ==
> 
> This is a prototype and it still has some limits:
> 
> * It only shows "proper" classes that have at least one instance or subclass. 
> This is to reduce the overall data size and load time.
> * The data is based on dumps (the date is shown on the right). It is not a 
> live view.
> * The layout is sometimes too dense. You can find a "hidden" option to make 
> it more spacious behind the sidebar (click "Sidebar" to see it). This helps to 
> disentangle larger graphs.
> * There are some minor bugs in the UI. You sometimes need to click more than 
> once until the right thing happens.
> * The help page at http://sergestratan.bitbucket.org/howtouse.html does not
> explain everything in detail yet.
> 
> It is planned to work on some of these limitations in the future.
> 
> The hope is that this tool will reveal many errors in Wikidata's taxonomy 
> that are otherwise hard to detect. For example, you can see easily that every 
> "Ship" is an "Event" in Wikidata, that every "Hobbit" is a "Fantasy Race", 
> and that every "Monday" is both a "Mathematical object" and a "Unit of 
> measurement".
> 
> Feedback is welcome (on the tool; better start new threads for feedback on 
> the Wikidata taxonomy ;-),
> 
> Markus
> 
> 
> [1] http://sergestratan.bitbucket.org 
> 
> -- 
> Markus Kroetzsch
> Faculty of Computer Science
> Technische Universität Dresden
> +49 351 463 38486 
> http://korrekt.org/ 
> 



Dario Taraborelli, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter


Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-22 Thread Benjamin Good
I am having the same kind of 500 errors. Bitbucket is generally
suffering today:  http://status.bitbucket.org



Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-22 Thread Markus Kroetzsch

On 22.10.2015 19:29, Dario Taraborelli wrote:

I’m constantly getting 500 errors.



I also observed short outages in the past, and I sometimes had to run a 
request twice to get an answer. It seems that the hosting on Bitbucket 
is not very reliable. At the moment, this is still a first preview of 
the tool without everything set up as it should be. The tool should 
certainly move to Wikimedia Labs in the future.


Markus


--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/



Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-22 Thread Asaf Bartov
Works for me now.

This is fantastic. :)

Please consider adding it to Hay's tools directory, so more people can
discover it.
https://tools.wmflabs.org/hay/directory/?search=taxonomy#/search/taxonomy

   A.

On Thu, Oct 22, 2015 at 10:29 AM, Dario Taraborelli <
dtarabore...@wikimedia.org> wrote:

> I’m constantly getting 500 errors.
>
> On Oct 22, 2015, at 10:25 AM, Thomas Douillard wrote:
>
> Great tool! The error detection is invaluable!
>
> 2015-10-22 17:31 GMT+02:00 Markus Kroetzsch <
> markus.kroetz...@tu-dresden.de>:
>
>> Hi all,
>>
>> I am happy to announce a new tool [1], written by Serge Stratan, which
>> allows you to browse the taxonomy (subclass of & instance of relations)
>> between Wikidata's most important class items. For example, here is the
>> Wikidata taxonomy for Pizza (discussed recently on this list):
>>
>>
>> http://sergestratan.bitbucket.org?draw=true=s0=177,2095,7802,28877,35120,223557,386724,488383,666242,736427,746549,2424752,1513,16686448
>> 
>>
>>
>> == What you see there ==
>>
>> Solid green lines mean "subclass of" relations (subclasses are lower),
>> while dashed purple lines are "instance of" relations (instances are
>> lower). Drag and zoom the view as usual. Hover over items for more
>> information. Click on arrows with numbers to display upper or lower
>> neighbours. Right-click on classes to get more options.
>>
>> The sidebar on the left shows statistics and presumed problems in the
>> data (redundancies and likely errors). You can select a report type to see
>> the reports, and click on any line to show the error. If you search for a
>> class in the search field, the errors will be narrowed down to issues
>> related to the taxonomy of this class.
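The exact definition of a "redundancy" used by the tool is not spelled out here. One plausible reading, assumed for this sketch, is a direct "subclass of" edge that is already implied by another chain of "subclass of" edges. A minimal Python sketch of such a check follows; the function names and the toy data are hypothetical and are not real Wikidata content. The graph maps each class to the set of its direct superclasses.

```python
from collections import deque


def reachable(graph, start):
    """All nodes reachable from `start` by following one or more edges.

    The `seen` set keeps the search finite even if the graph contains cycles."""
    seen = set()
    queue = deque(graph.get(start, ()))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, ()))
    return seen


def redundant_edges(graph):
    """Return direct edges (a, b) already implied by another path a -> c -> ... -> b."""
    result = []
    for a, supers in graph.items():
        for b in supers:
            # a -> b is redundant if some *other* direct superclass c also reaches b.
            if any(c != b and b in reachable(graph, c) for c in supers):
                result.append((a, b))
    return sorted(result)


# Hypothetical toy data, not actual Wikidata statements.
subclass_of = {
    "pizza": {"dish", "food"},
    "dish": {"food"},
    "food": set(),
}
print(redundant_edges(subclass_of))  # [('pizza', 'food')]
```

This re-runs a search per candidate edge, which is fine for a sketch but would be slow on the full class hierarchy.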
>>
>> The toolbar at the top has options to show and hide items based on the
>> current selection (left click on any box).
>>
>> Edges in red are the wrong way around (top to bottom). This occurs only
>> when there are cycles in the "taxonomy".
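How the tool chooses which edges to draw in red is not described here. A common technique, assumed for this sketch, is a depth-first search: every "back edge" (an edge leading to an item that is still on the current search path) closes a cycle, and removing all back edges leaves an acyclic graph that can be drawn top to bottom. A minimal Python sketch with hypothetical names and toy data:

```python
def back_edges(graph):
    """Return the back edges found by an iterative depth-first search.

    `graph` maps each node to the set of nodes it points to (e.g. its direct
    superclasses). The graph has a cycle iff at least one back edge is found."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    found = []

    # Include nodes that only ever appear as edge targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    for root in sorted(nodes):
        if color.get(root, WHITE) != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(sorted(graph.get(root, ()))))]
        while stack:
            node, children = stack[-1]
            child = next(children, None)
            if child is None:
                # All children explored: node leaves the current path.
                color[node] = BLACK
                stack.pop()
            elif color.get(child, WHITE) == GREY:
                # child is an ancestor on the current path, so this edge closes a cycle.
                found.append((node, child))
            elif color.get(child, WHITE) == WHITE:
                color[child] = GREY
                stack.append((child, iter(sorted(graph.get(child, ())))))
    return found


# Hypothetical toy data, not actual Wikidata statements.
subclass_of = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"C"}}
print(back_edges(subclass_of))  # [('C', 'A')]
```

Which edges come out as back edges depends on the traversal order; nodes are visited in sorted order here so the result is deterministic.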
>>
>>
>> == Micro tutorial ==
>>
>> (1) Enter "Unicorn" in the search box and press return.
>> (2) Zoom out a bit by scrolling with your mouse/touchpad.
>> (3) Click on the "Unicorn" item box. It becomes blue (selected).
>> (4) Click "Expand up" in the toolbar at the top.
>> (5) Zoom out to see the taxonomy of unicorn.
>> (6) Find the class "fictional horse" (directly above unicorn) and click
>> its downwards arrow labelled "3" to see all three child items of
>> "fictional horse".
>> (7) Click the share button on the top right to get a link to this view.
>>
>> You can also create your own share link manually by just changing the
>> Qids in the URL as you like.
>>
>>
>> == Status and limitations ==
>>
>> This is a prototype and it still has some limits:
>>
>> * It only shows "proper" classes that have at least one instance or
>> subclass. This is to reduce the overall data size and load time.
>> * The data is based on dumps (the date is shown on the right). It is not
>> a live view.
>> * The layout is sometimes too dense. A "hidden" option behind the
>> sidebar (click "Sidebar" to see it) spreads the layout out more. This
>> helps to disentangle larger graphs.
>> * There are some minor bugs in the UI. You sometimes need to click more
>> than once until the right thing happens.
>> * The help page at http://sergestratan.bitbucket.org/howtouse.html does
>> not explain everything in detail yet.
>>
>> It is planned to work on some of these limitations in the future.
>>
>> The hope is that this tool will reveal many errors in Wikidata's taxonomy
>> that are otherwise hard to detect. For example, you can easily see that
>> every "Ship" is an "Event" in Wikidata, that every "Hobbit" is a "Fantasy
>> Race", and that every "Monday" is both a "Mathematical object" and a "Unit
>> of measurement".
>>
>> Feedback is welcome (on the tool; better start new threads for feedback
>> on the Wikidata taxonomy ;-),
>>
>> Markus
>>
>>
>> [1] http://sergestratan.bitbucket.org
>>
>> --
>> Markus Kroetzsch
>> Faculty of Computer Science
>> Technische Universität Dresden
>> +49 351 463 38486
>> http://korrekt.org/
>>
>>
>
>
>
>
>
> Dario Taraborelli, Head of Research, Wikimedia Foundation
> wikimediafoundation.org • nitens.org • @readermeter
>
>
>
>


-- 
Asaf Bartov
Wikimedia Foundation