Re: [Wikidata-l] Fwd: Todos for RDF export

2013-01-29 Thread Gregor Hagedorn
Some of our insights into the SMW RDF export (which we found to be
difficult to configure and use):

1. Probably most relevant: total lack of support for xml:lang, which would
have been essential to our purposes.

Wikidata should be planned with support for language in mind.
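To make concrete what xml:lang support buys you: each label or description literal carries a language tag, so one subject can hold the same property in many languages. A minimal sketch in Python using rdflib (purely illustrative; this is not SMW or Wikibase code, and the namespace is made up):

from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/")          # made-up namespace for illustration
g = Graph()
item = URIRef(EX["Q64"])                       # hypothetical item URI

# The same label in several languages; each literal carries an xml:lang tag,
# so consumers can pick the language they need.
g.add((item, EX.label, Literal("Berlin", lang="de")))
g.add((item, EX.label, Literal("Berlin", lang="en")))
g.add((item, EX.label, Literal("Berlín", lang="es")))

print(g.serialize(format="turtle"))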

2. We also had serious problems managing structure, e.g. records and
subobjects. Because this information has to be obtained recursively through
repeated calls, and because there is no control over the URIs created for
these calls, easy solutions like applying a clean-up XSLT will not work.
This may not be relevant for Wikidata.

3. At first, the lack of variable datatypes (the datatype is fixed per
property) seems acceptable. However, we found it a major problem with
respect to the forced distinction between properties of datatype wiki-page
and properties of datatype global URI. Essentially, SMW forces one to
introduce two distinct dummy properties for a single semantic property
(e.g. dc:creator): property:creator_page and property:creator_uri. Since
the artificial distinction between pages and URIs disappears in the RDF
export, it would be desirable to merge them, but only one of them can be
mapped to an imported vocabulary.
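To spell out the merge point: in the exported RDF both dummy properties would ideally collapse onto the same term. A sketch of the desired output, again with rdflib and made-up URIs (the VIAF URI below is illustrative, not a real record):

from rdflib import Graph, Namespace, URIRef

DC = Namespace("http://purl.org/dc/elements/1.1/")
EX = Namespace("http://example.org/")          # made-up wiki namespace

g = Graph()
work = URIRef(EX["Some_book"])

# One creator value is a wiki page, the other an external URI; in SMW they
# need two dummy properties, but in the RDF export both can become dc:creator.
g.add((work, DC.creator, URIRef(EX["Jane_Doe"])))                    # from property:creator_page
g.add((work, DC.creator, URIRef("http://viaf.org/viaf/00000000")))   # from property:creator_uri

print(g.serialize(format="turtle"))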

I think this may be relevant to Wikidata, where a similar distinction
exists between properties pointing to a local Wikidata item and properties
pointing to a global resource.

Gregor

(PS: If any of the problems above does not actually exist in SMW and we
simply overlooked the solution, corrections are of course very welcome!)


-- 
-
Dr. G. Hagedorn
+49-(0)30-8304 2220 (work)
+49-(0)30-831 5785 (private)
http://www.linkedin.com/in/gregorhagedorn
https://profiles.google.com/g.m.hagedorn/about

This communication, together with any attachments, is made entirely on my
own behalf and in no way should be deemed to express official positions of
my employer. It is intended only for the person(s) to whom it is addressed.
Redistributing or publishing it without permission may be a violation of
copyright or privacy rights.
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] getting some stats for the Hungarian Wikipedia

2013-01-29 Thread Samat
On Tue, Jan 29, 2013 at 9:45 PM, Amir E. Aharoni <
amir.ahar...@mail.huji.ac.il> wrote:

> 2013/1/29 Samat :
> > On Tue, Jan 29, 2013 at 7:54 PM, Lydia Pintscher
> >  wrote:
> >>
> >> On Tue, Jan 29, 2013 at 7:51 PM, Samat  wrote:
> >> > I agree with you.
> >> > I am also waiting for "somebody", who can make pywiki compatible
> with
> >> > wikidata. I have no time and knowledge for it, but I have a bot (at
> >> > least on
> >> > huwiki, not on wikidata) and I have access to the Hungarian
> Toolserver,
> >> > so I
> >> > could run this bot for cleaning the wikicode on huwiki and update the
> >> > interwiki links on wikidata. But we need a/the "Somebody" first :)
> >>
> >> Have you looked at the link I posted? What exactly is missing for you
> >> to do what you want to do?
> >>
> >>
> >> Cheers
> >> Lydia
> >
> >
> > Yes, I have.
> > I mean that interwiki.py should do at least the following:
> > * delete interwikis from every article where there is no conflict;
> > * add these interwikis to the relevant page on Wikidata (create this
> page if
> > it doesn't exist yet, change the page if it already exists).
> > As far as I know, the Hungarian editors are doing these tasks manually now.
> > If there is (are) conflict(s) between interwiki links, it can be the next
> > step.
>
> Well, actually, I wouldn't think that it is immediately urgent. I
> completely understand that this should be done some time soon -
> probably in a couple of weeks from now. But it may be a good idea not
> to use a bot to immediately remove the links from all the
> (non-conflicting) articles until the post-deployment dust settles.
>
> And until the Big Links Remove, if the bots don't re-add the removed
> links by force, that should be enough.
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> ‪“We're living in pieces,
> I want to live in peace.” – T. Moore‬
>

OK.
I have time (to wait) :)

Samat
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] Fwd: Todos for RDF export

2013-01-29 Thread Denny Vrandečić
2013/1/25 Daniel Kinzler 

> Hi!
>
> I thought about the RDF export a bit, and I think we should break this up
> into
> several steps for better tracking. Here is what I think needs to be done:
>
>
Daniel,
I am replying to Wikidata-l, and adding Tpt (since he started working on
something similar), hoping to get more input on the open list.

I especially hope that Markus and maybe Jeroen can provide insight from the
experience with Semantic MediaWiki.

Just to reiterate internally: in my opinion we should learn from the
experience SMW has had here, but we should not immediately try to create
common code for this case. The first step should be to create something that
works for Wikibase, and then analyze whether we can refactor some code on
both Wikibase and SMW and end up with a common library that both build on.
This will give us two running systems that can be tested against while
refactoring. Starting the other way around -- designing a common library,
developing it for both Wikibase and SMW, while keeping SMW's constraints in
mind -- would be much more expensive in terms of resources. I guess we agree
on the end result -- share as much code as possible. But please let us not
*start* with that goal; let us first aim at the goal "Get an RDF export for
Wikidata". (This is especially true because Wikibase is basically reified
all the way through, something SMW does not have to deal with.)

In Semantic MediaWiki, the relevant parts of the code are (if I get it
right):

SMWSemanticData is roughly what we call Wikibase::Entity

includes/export/SMW_ExportController.php - SMWExportController - main
object responsible for creating serializations. Used for configuration, and
then calls the SMWExporter on the relevant data (which it collects itself)
and applies the defined SMWSerializer on the returned SMWExpData.

includes/export/SMW_Exporter.php -  SMWExporter - takes a SMWSemanticData
object and returns a SMWExpData object, which is optimized for being
exported
includes/export/SMW_Exp_Data.php -  SMWExpData - holds the data that is
needed for export
includes/export/SMW_Exp_Element.php - several classes used to represent the
data in SMWExpData. Note that there is some interesting interplay happening
with DataItems and DataValues here.

includes/export/SMW_Serializer.php - SMWSerializer - abstract class for
different serializers
includes/export/SMW_Serializer_RDFXML.php - SMWRDFXMLSerializer -
responsible to create the RDF/XML serialization
includes/export/SMW_Serializer_Turtle.php - SMWTurtleSerializer -
responsible to create the Turtle serialization

special/URIResolver/SMW_SpecialURIResolver.php - SMWURIResolver - Special
page that deals with content negotiation.
special/Export/SMW_SpecialOWLExport.php - SMWSpecialOWLExport - Special
page that serializes a single item.
maintenance/SMW_dumpRDF.php - calls the serialization code to create a
dump of the whole wiki, or of certain entity types. Basically configures an
SMWExportController and lets it do its job.

There are some smart ideas in the way the ExportController and Exporter
are called by both the dump script and the single-item serializer, which
allow the export to scale to almost any size.

Remember that unlike SMW, Wikibase contains mostly reified knowledge. Here
is the spec of how to translate the internal Wikibase representation to
RDF: http://meta.wikimedia.org/wiki/Wikidata/Development/RDF
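For readers who have not looked at the spec: "reified all the way through" means that every claim is itself a resource that can carry qualifiers, references and a rank, so one statement becomes several triples. A rough sketch of the shape in Python/rdflib (the predicate names below are made up for illustration and are not the vocabulary from the spec linked above):

from rdflib import BNode, Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/")   # illustrative, not the real Wikidata vocabulary
g = Graph()

item = URIRef(EX["Q42"])                # hypothetical item
stmt = BNode()                          # the statement node carrying the claim

# Instead of a single triple "item property value", the claim is reified:
g.add((item, EX.claim, stmt))
g.add((stmt, EX.property, EX["P69"]))
g.add((stmt, EX.value, EX["Q691283"]))
# ...and the statement node can then carry a rank, qualifiers and references:
g.add((stmt, EX.rank, Literal("normal")))
g.add((stmt, EX.reference, URIRef(EX["some-source"])))

print(g.serialize(format="turtle"))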

The other major influence is obviously the MediaWiki API, with its (almost)
clean separation of results and serialization formats. While we can also
draw inspiration here, the issue is that RDF is a graph-based model and the
MediaWiki API is really built for a tree. Therefore I am afraid that we
cannot reuse much here.

Note that this does not mean that the API cannot be used to access the
data about entities, but merely that the API answers with tree-based
objects, most prominently the JSON objects described here:
http://meta.wikimedia.org/wiki/Wikidata/Data_model/JSON
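As a concrete example of that tree-shaped view, a client can fetch an entity's JSON via the web API; a minimal sketch (assuming the wbgetentities module and the Python requests library; the exact response layout is the one documented at the link above):

import requests

# Fetch the tree-shaped JSON representation of one item from the repo's API.
resp = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={"action": "wbgetentities", "ids": "Q42", "format": "json"},
)
resp.raise_for_status()
entity = resp.json()["entities"]["Q42"]

# Labels, sitelinks etc. come back as nested dictionaries -- a tree, not a graph.
print(entity["labels"].get("en"))
print(sorted(entity.get("sitelinks", {}))[:5])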

So, after this lengthy prelude, let's get to the Todos that Daniel suggests:

* A low-level serializer for RDF triples, with namespace support. Would be
> nice
> if it had support for different forms of output (xml, n3, etc). I suppose
> we can
> just use an existing one, but it needs to be found and tried.
>
>
Re reuse: the thing is that, to the best of my knowledge, PHP RDF packages
are quite heavyweight (because they also contain parsers, not just
serializers, and often enough SPARQL processors, support for blank nodes,
etc.), and it is rare that they support the kind of high-throughput
streaming that we would require for the complete dump (i.e. there is
obviously no point in first putting all triples into a graph model and then
calling the model->serialize() method; that needs too much memory). There
are also some optimizations we can use (reordering of triples, use of
namespaces, some assumptions about the whole dump, etc.). I will ask the
Semantic Web mailing list about
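To illustrate the streaming point, here is a minimal sketch (plain Python, standing in for whatever PHP code we end up with) of a serializer that writes triples line by line instead of building a graph model in memory first; this is what keeps memory use flat for a full dump:

import sys
from typing import Iterable, TextIO, Tuple

Triple = Tuple[str, str, str]   # subject, predicate, object -- already-escaped N-Triples terms

def stream_ntriples(triples: Iterable[Triple], out: TextIO) -> None:
    """Write each triple as one N-Triples line; memory use stays constant
    no matter how large the dump is."""
    for s, p, o in triples:
        out.write(f"{s} {p} {o} .\n")

def example_triples() -> Iterable[Triple]:
    # A generator standing in for "walk over all entities in the wiki".
    for i in range(3):
        yield (f"<http://example.org/Q{i}>",
               "<http://example.org/label>",
               f'"Item {i}"@en')

if __name__ == "__main__":
    stream_ntriples(example_triples(), sys.stdout)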

Re: [Wikidata-l] getting some stats for the Hungarian Wikipedia

2013-01-29 Thread Bináris
2013/1/29 Lydia Pintscher 

>
>
> Have you looked at the link I posted? What exactly is missing for you
> to do what you want to do?
>
>
As far as I see, these are just code fragments, Lego elements to build
something from, but they are not yet integrated into interwiki.py.

-- 
Bináris
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] getting some stats for the Hungarian Wikipedia

2013-01-29 Thread Amir E. Aharoni
2013/1/29 Samat :
> On Tue, Jan 29, 2013 at 7:54 PM, Lydia Pintscher
>  wrote:
>>
>> On Tue, Jan 29, 2013 at 7:51 PM, Samat  wrote:
>> > I agree with you.
>> > I am also waiting for "somebody", who can make pywiki compatible with
>> > wikidata. I have no time and knowledge for it, but I have a bot (at
>> > least on
>> > huwiki, not on wikidata) and I have access to the Hungarian Toolserver,
>> > so I
>> > could run this bot for cleaning the wikicode on huwiki and update the
>> > interwiki links on wikidata. But we need a/the "Somebody" first :)
>>
>> Have you looked at the link I posted? What exactly is missing for you
>> to do what you want to do?
>>
>>
>> Cheers
>> Lydia
>
>
> Yes, I have.
> I mean that interwiki.py should do at least the following:
> * delete interwikis from every article where there is no conflict;
> * add these interwikis to the relevant page on Wikidata (create this page if
> it doesn't exist yet, change the page if it already exists).
> As far as I know, the Hungarian editors are doing these tasks manually now.
> If there is (are) conflict(s) between interwiki links, it can be the next
> step.

Well, actually, I wouldn't think that it is immediately urgent. I
completely understand that this should be done some time soon -
probably in a couple of weeks from now. But it may be a good idea not
to use a bot to immediately remove the links from all the
(non-conflicting) articles until the post-deployment dust settles.

And until the Big Links Remove, if the bots don't re-add the removed
links by force, that should be enough.

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
‪“We're living in pieces,
I want to live in peace.” – T. Moore‬

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] getting some stats for the Hungarian Wikipedia

2013-01-29 Thread Samat
On Tue, Jan 29, 2013 at 7:54 PM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Tue, Jan 29, 2013 at 7:51 PM, Samat  wrote:
> > I agree with you.
> > I am also waiting for "somebody", who can make pywiki compatible with
> > wikidata. I have no time and knowledge for it, but I have a bot (at
> least on
> > huwiki, not on wikidata) and I have access to the Hungarian Toolserver,
> so I
> > could run this bot for cleaning the wikicode on huwiki and update the
> > interwiki links on wikidata. But we need a/the "Somebody" first :)
>
> Have you looked at the link I posted? What exactly is missing for you
> to do what you want to do?
>
>
> Cheers
> Lydia
>

Yes, I have.
I mean that interwiki.py should do at least the following:
* delete interwikis from every article where there is no conflict;
* add these interwikis to the relevant page on Wikidata (create this page
if it doesn't exist yet, change the page if it already exists).
As far as I know, the Hungarian editors are doing these tasks manually now.
Handling conflicts between interwiki links can be the next step.

If these features already work, I am sorry and will go run my bot (or first
request bot approval on Wikidata). :)
If these features don't work yet, we need them urgently.
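(For what it is worth, the workflow described above corresponds roughly to the following sketch, written against today's pywikibot core API rather than the pywikipedia/compat branch discussed here; the names and details are assumptions and it is untested:)

import pywikibot
from pywikibot import textlib

site = pywikibot.Site("hu", "wikipedia")

def migrate(page: pywikibot.Page) -> None:
    """Move a page's local interlanguage links to its Wikidata item, then
    drop them from the wikitext -- only for the simple, conflict-free case."""
    langlinks = list(page.langlinks())
    if not langlinks:
        return

    item = pywikibot.ItemPage.fromPage(page, lazy_load=True)
    if not item.exists():
        # Creating a brand-new item is left out of this sketch.
        return

    # Add the article itself plus every local interwiki target as sitelinks.
    sitelinks = [{"site": page.site.dbName(), "title": page.title()}]
    sitelinks += [{"site": link.site.dbName(), "title": link.title}
                  for link in langlinks]
    item.setSitelinks(sitelinks, summary="Importing interwiki links from huwiki")

    # Finally remove the now-redundant [[xx:...]] links from the wikitext.
    page.text = textlib.removeLanguageLinks(page.text, site=page.site)
    page.save(summary="Interwiki links are now maintained on Wikidata")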

Samat
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] getting some stats for the Hungarian Wikipedia

2013-01-29 Thread Lydia Pintscher
On Tue, Jan 29, 2013 at 7:51 PM, Samat  wrote:
> I agree with you.
> I am also waiting for "somebody", who can make pywiki compatible with
> wikidata. I have no time and knowledge for it, but I have a bot (at least on
> huwiki, not on wikidata) and I have access to the Hungarian Toolserver, so I
> could run this bot for cleaning the wikicode on huwiki and update the
> interwiki links on wikidata. But we need a/the "Somebody" first :)

Have you looked at the link I posted? What exactly is missing for you
to do what you want to do?


Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Wikidata

Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] getting some stats for the Hungarian Wikipedia

2013-01-29 Thread Samat
On Tue, Jan 29, 2013 at 4:27 PM, Jan Dudík  wrote:

> Bots should remove links on non-conflicting pages on hu.wiki and update
> wikidata too, but no one has been able (or willing?) to write this feature
> since 30 October :-(
>
> In the next days more and more wikis will be "locked" for interwiki bots,
> but these problems will remain for at least one week after this feature
> exists (one week is necessary for granting a bot flag on wikidata - or
> should global bots be allowed?)
>
> JAnD
>

I agree with you.
I am also waiting for "somebody" who can make pywiki compatible with
wikidata. I have neither the time nor the knowledge for it, but I have a bot
(at least on huwiki, not on wikidata) and I have access to the Hungarian
Toolserver, so I could run this bot to clean up the wikicode on huwiki and
update the interwiki links on wikidata. But we need a/the "Somebody" first :)

Cheers,
Samat
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] getting some stats for the Hungarian Wikipedia

2013-01-29 Thread Lydia Pintscher
On Tue, Jan 29, 2013 at 4:27 PM, Jan Dudík  wrote:
> And on wikidata there is outdated data, because many new articles are
> created (moved and deleted) daily, but the most used platform -
> pywikipedia - is not ready yet for wikidata.

Depending on what you mean by ready, it is:
http://www.mediawiki.org/wiki/Manual:Pywikipediabot/Wikidata


Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Wikidata

Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] getting some stats for the Hungarian Wikipedia

2013-01-29 Thread Jan Dudík
As the owner of an interwiki bot, I now see:

Disabling interwiki bots is a matter of one line in a bot's source code,
depending on how often owners update. So within 1-2 days many bots can be
disabled; the others should be blocked for a while.

But there is a problem now - other wikis still use classic interwiki
links, and these links still remain on hu.wiki - outdated and in some
cases incorrect too - and this causes interwiki conflicts on other wikis,
because bots read the links in this wiki but cannot edit them.

And on wikidata there is outdated data, because many new articles are
created (moved and deleted) daily, but the most used platform -
pywikipedia - is not ready yet for wikidata.

Bots should remove links on non-conflicting pages on hu.wiki and update
wikidata too, but no one has been able (or willing?) to write this feature
since 30 October :-(

In the next days more and more wikis will be "locked" for interwiki bots,
but these problems will remain for at least one week after this feature
exists (one week is necessary for granting a bot flag on wikidata - or
should global bots be allowed?)

JAnD


"Bináris"  schrieb:

 >2013/1/28 Amir Ladsgroup 
 >
 >> What is exact time of the next deployment (it and he)?
 >>
 >If you want to catch it, join #wikimedia-wikidata on IRC. It was great
 >to
 >follow it on D-day!
 >
 >
 >> And what time you think is best to disable interwiki bots?
 >>
 >Xqt can modify the code, but pywiki is not deployed, it is updated by
 >bot
 >owners, so there is no chance to focus it on one hour. For this reason
 >I
 >would say to begin it after deployment of Wikibase as otherwise one
 >should
 >do it at least 1 or 2 days before which would cause a maintenance
 >pause.
 >Yes, people will try to remove iws and some of them will be put back by
 >bots.

> Would it also make sense to write a bot putting the remaining iws to
> wikidata and removing them from the wiki if they can be replaced by them
> from wikidata?

> Marco

-- 
--
Ing. Jan Dudík

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] wikidata-test-client is acting as a Hebrew client until deployment

2013-01-29 Thread Silke Meyer
Hi!
At the moment, wikidata-test-client.wikimedia.de/wiki is configured to
act as Hebrew client to wikidata-test-repo.wikimedia.de/wiki. wmf8 can
be tested there until tomorrow's deployment. (Sorry for the remaining
auto-imported content.)
Best,

-- 
Silke Meyer
Systemadministratorin und Projektassistenz Wikidata

Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. (030) 219 158 260

http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt
für Körperschaften I Berlin, Steuernummer 27/681/51985.

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] RDBMSes in Wikidata

2013-01-29 Thread Sumana Harihareswara
(was "Database used by wikidata")

We would LOVE for more developers or systems administrators to help
support MediaWiki core on other RDBMSes; one of the most helpful things
you can do is help look at the changes other developers are making that
might affect your preferred RDBMS, and provide testing and code review.

https://www.mediawiki.org/wiki/Code_review_guide

For instance, here's a search for not-yet-merged commits that mention
PostgreSQL in their commit summaries:

https://gerrit.wikimedia.org/r/#/q/message:postg,n,z

Anyone can comment on those code changes in Gerrit by getting "developer
access" which you can get instantly.  See
https://www.mediawiki.org/wiki/Developer_access .

Please see https://www.mediawiki.org/wiki/Database_testing for lists of
things to test, and feel free to add to "Creating a test plan for
databases".

As Aude pointed out, there is at least one db in the WMF cluster now
running MariaDB.  More details:
http://lists.wikimedia.org/pipermail/wikitech-l/2012-December/064994.html
-- 
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] getting some stats for the Hungarian Wikipedia

2013-01-29 Thread Ed Summers
On Tue, Jan 29, 2013 at 7:29 AM, Denny Vrandečić
 wrote:
> Are the numbers absolute, or a sample out of a thousand?

I believe they are absolute. I'll see if I can figure out what's going
on by asking over on the analytics list.

//Ed

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Wikidata synchronization (was: Re: getting some stats for the Hungarian Wikipedia)

2013-01-29 Thread Ed Summers
On Tue, Jan 29, 2013 at 7:14 AM, Daniel Kinzler
 wrote:
> 3rd party clients which want to embed data from Wikidata, but cannot access 
> the
> database directly, are not yet supported. We have designed the architecture 
> in a
> way that should allow us to support them easily enough, but the necessary
> mechanisms are not yet implemented.
>
> The plan is to eventually implement "remote" clients that fetch data via the
> API, and get notifications pushed to them probably via PubsubHubbub. I would
> very much like to see this, but our priority is to get Wikimedia sites feature
> complete first.

Being able to talk to the database directly does simplify things greatly,
I imagine, and I can completely understand wanting to focus on Wikimedia
sites first. Thanks for the details.

//Ed

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] getting some stats for the Hungarian Wikipedia

2013-01-29 Thread Denny Vrandečić
Yes, that would be very helpful. I am not sure who the right person to poke
is.

Are the numbers absolute, or a sample out of a thousand?


2013/1/29 Ed Summers 

> I just checked with a new stats file, and the wikidata page I accessed
> during the hour was not recorded in the file. So my suspicion is the
> page views are not getting recorded. I could double check with the
> analytics folks to see what the best course of action is if that is
> helpful.
>
> //Ed
>
> On Tue, Jan 29, 2013 at 7:09 AM, Ed Summers  wrote:
> > On Tue, Jan 29, 2013 at 6:02 AM, Daniel Kinzler
> >  wrote:
> >> But that's only for editing. Viewing should show up in the same way it
> shows for
> >> other wikis.
> >
> > Thanks. Maybe I'm not looking correctly (or enough) but I don't see
> > any wikidata pages being accessed, for example:
> >
> > 2013-01/pagecounts-20130129-11.gz | zcat - | egrep ' Q\d+ '
> > de Q10 3 17060
> > de Q7 1 20607
> > en Q1 4 26849
> > en Q10 2 16419
> > en Q100 2 15580
> > en Q106 1 8122
> > en Q17 1 9697
> > en Q2 9 45346
> > en Q22 1 7835
> > en Q3 1 8520
> > en Q35 1 377
> > en Q374 3 21466
> > en Q4 9 57882
> > en Q400 1 34656
> > en Q6700 1 29519
> > en Q711 1 7148
> > en Q8 2 78309
> > en Q9 1 274
> > en Q9450 1 29412
> > en Q96 1 11959
> > es Q2 1 29036
> > es Q4 1 9167
> > fr Q1 1 8243
> > fr Q400 6 145497
> > hu Q10 3 210830
> > it Q4 2 14962
> > ja Q10 24 583983
> > ko Q10 1 13121
> > ko Q3 1 7785
> > nl Q7 1 365
> > ru Q4000 1 401
> > zh Q1 1 10929
> > zh Q10 3 41551
> >
> > I suppose it's possible that nobody accessed wikidata.org during that
> > hour. I will check again in an hour after I accessed some pages :-)
> >
> >> It's talking directly to the database.
> >
> > Ok. From looking very quickly at pollForChanges I guess the polling
> > doesn't use the API either? Does that mean that users of Wikidata who
> > want to keep up to date with changes need to be hosted in the
> > Wikimedia datacenter and granted read access to the Wikidata database?
> >
> > //Ed
>
> ___
> Wikidata-l mailing list
> Wikidata-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>



-- 
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/681/51985.
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] getting some stats for the Hungarian Wikipedia

2013-01-29 Thread Ed Summers
I just checked with a new stats file, and the wikidata page I accessed
during the hour was not recorded in the file. So my suspicion is the
page views are not getting recorded. I could double check with the
analytics folks to see what the best course of action is if that is
helpful.

//Ed

On Tue, Jan 29, 2013 at 7:09 AM, Ed Summers  wrote:
> On Tue, Jan 29, 2013 at 6:02 AM, Daniel Kinzler
>  wrote:
>> But that's only for editing. Viewing should show up in the same way it shows 
>> for
>> other wikis.
>
> Thanks. Maybe I'm not looking correctly (or enough) but I don't see
> any wikidata pages being accessed, for example:
>
> 2013-01/pagecounts-20130129-11.gz | zcat - | egrep ' Q\d+ '
> de Q10 3 17060
> de Q7 1 20607
> en Q1 4 26849
> en Q10 2 16419
> en Q100 2 15580
> en Q106 1 8122
> en Q17 1 9697
> en Q2 9 45346
> en Q22 1 7835
> en Q3 1 8520
> en Q35 1 377
> en Q374 3 21466
> en Q4 9 57882
> en Q400 1 34656
> en Q6700 1 29519
> en Q711 1 7148
> en Q8 2 78309
> en Q9 1 274
> en Q9450 1 29412
> en Q96 1 11959
> es Q2 1 29036
> es Q4 1 9167
> fr Q1 1 8243
> fr Q400 6 145497
> hu Q10 3 210830
> it Q4 2 14962
> ja Q10 24 583983
> ko Q10 1 13121
> ko Q3 1 7785
> nl Q7 1 365
> ru Q4000 1 401
> zh Q1 1 10929
> zh Q10 3 41551
>
> I suppose it's possible that nobody accessed wikidata.org during that
> hour. I will check again in an hour after I accessed some pages :-)
>
>> It's talking directly to the database.
>
> Ok. From looking very quickly at pollForChanges I guess the polling
> doesn't use the API either? Does that mean that users of Wikidata who
> want to keep up to date with changes need to be hosted in the
> Wikimedia datacenter and granted read access to the Wikidata database?
>
> //Ed

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] Wikidata synchronization (was: Re: getting some stats for the Hungarian Wikipedia)

2013-01-29 Thread Daniel Kinzler
On 29.01.2013 13:09, Ed Summers wrote:
> Ok. From looking very quickly at pollForChanges I guess the polling
> doesn't use the API either? Does that mean that users of Wikidata who
> want to keep up to date with changes need to be hosted in the
> Wikimedia datacenter and granted read access to the Wikidata database?

Both, the now deprecated pollForChanges and the new dispatchChanges directly
poll the repo's database, and directly push to the client's database.

3rd party clients which want to embed data from Wikidata, but cannot access the
database directly, are not yet supported. We have designed the architecture in a
way that should allow us to support them easily enough, but the necessary
mechanisms are not yet implemented.

The plan is to eventually implement "remote" clients that fetch data via the
API, and get notifications pushed to them probably via PubsubHubbub. I would
very much like to see this, but our priority is to get Wikimedia sites feature
complete first.
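Until such a push mechanism exists, a third-party client can of course approximate this by polling the repo's public API for recent changes. A rough sketch (using the standard recentchanges module and the Python requests library; this is plain polling, not the dispatch mechanism described above):

import requests

API = "https://www.wikidata.org/w/api.php"     # the repo's public API endpoint

def fetch_recent_changes(since: str) -> list:
    """Return recent-change entries newer than `since` (an ISO timestamp),
    oldest first. A client would call this periodically and then re-fetch
    the data of any entity whose title shows up, e.g. via wbgetentities."""
    params = {
        "action": "query", "list": "recentchanges",
        "rcstart": since, "rcdir": "newer",
        "rcprop": "title|ids|timestamp", "rclimit": "max",
        "format": "json",
    }
    resp = requests.get(API, params=params)
    resp.raise_for_status()
    return resp.json()["query"]["recentchanges"]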

-- daniel

-- 
Daniel Kinzler, Softwarearchitekt
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] getting some stats for the Hungarian Wikipedia

2013-01-29 Thread Ed Summers
On Tue, Jan 29, 2013 at 6:02 AM, Daniel Kinzler
 wrote:
> But that's only for editing. Viewing should show up in the same way it shows 
> for
> other wikis.

Thanks. Maybe I'm not looking correctly (or enough) but I don't see
any wikidata pages being accessed, for example:

2013-01/pagecounts-20130129-11.gz | zcat - | egrep ' Q\d+ '
de Q10 3 17060
de Q7 1 20607
en Q1 4 26849
en Q10 2 16419
en Q100 2 15580
en Q106 1 8122
en Q17 1 9697
en Q2 9 45346
en Q22 1 7835
en Q3 1 8520
en Q35 1 377
en Q374 3 21466
en Q4 9 57882
en Q400 1 34656
en Q6700 1 29519
en Q711 1 7148
en Q8 2 78309
en Q9 1 274
en Q9450 1 29412
en Q96 1 11959
es Q2 1 29036
es Q4 1 9167
fr Q1 1 8243
fr Q400 6 145497
hu Q10 3 210830
it Q4 2 14962
ja Q10 24 583983
ko Q10 1 13121
ko Q3 1 7785
nl Q7 1 365
ru Q4000 1 401
zh Q1 1 10929
zh Q10 3 41551

I suppose it's possible that nobody accessed wikidata.org during that
hour. I will check again in an hour after I accessed some pages :-)

> It's talking directly to the database.

Ok. From looking very quickly at pollForChanges I guess the polling
doesn't use the API either? Does that mean that users of Wikidata who
want to keep up to date with changes need to be hosted in the
Wikimedia datacenter and granted read access to the Wikidata database?

//Ed

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] getting some stats for the Hungarian Wikipedia

2013-01-29 Thread Daniel Kinzler
On 29.01.2013 11:35, Ed Summers wrote:
> It does appear that wikidata is showing up in there, but it's just one line:
> 
>  undefined//www.wikidata.org/w/api.php 8 50103
> 
> It would be nice to correct the 'undefined' so that it was something
> like 'wd'. Also, it's too bad that we don't actually get to see in the
> logs what article is being looked up via the API, I guess because
> these requests were POSTs instead of GETs.

But that's only for editing. Viewing should show up in the same way it shows for
other wikis.

> I apologize if this has come up before, but is Hungarian Wikipedia
> using the Wikidata API for integration? Or is it talking directly to
> the Wikidata database?

It's talking directly to the database.

-- daniel

-- 
Daniel Kinzler, Softwarearchitekt
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] getting some stats for the Hungarian Wikipedia

2013-01-29 Thread Ed Summers
Ahh, I see that there was no response to Denny's question about wikidata stats?

I took a look in one of the hourly stats files with this:

curl 
http://dumps.wikimedia.org/other/pagecounts-raw/2013/2013-01/pagecounts-20130128-150001.gz
| zcat - | grep wikidata

It does appear that wikidata is showing up in there, but it's just one line:

 undefined//www.wikidata.org/w/api.php 8 50103

It would be nice to correct the 'undefined' so that it was something
like 'wd'. Also, it's too bad that we don't actually get to see in the
logs what article is being looked up via the API, I guess because
these requests were POSTs instead of GETs.
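For anyone who wants to poke at those files themselves: each line is just "project page_title request_count bytes_transferred", so summing hits per project takes only a few lines of Python (a sketch; the file name below is the one from the command above):

import gzip
from collections import Counter

def project_totals(path: str) -> Counter:
    """Sum request counts per project from one hourly pagecounts file.
    Each line has the form: project page_title request_count bytes_transferred."""
    totals = Counter()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            parts = line.split(" ")
            if len(parts) == 4:
                project, _title, count, _size = parts
                totals[project] += int(count)
    return totals

# e.g. project_totals("pagecounts-20130128-150001.gz").get("undefined", 0)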

I apologize if this has come up before, but is Hungarian Wikipedia
using the Wikidata API for integration? Or is it talking directly to
the Wikidata database?

//Ed

On Mon, Jan 28, 2013 at 10:39 AM, Ed Summers  wrote:
> On the subject of stats are there any plans to add wikidata access stats to:
>
> http://dumps.wikimedia.org/other/pagecounts-raw/
>
> Or are they available elsewhere?
>
> //Ed
>
> On Mon, Jan 28, 2013 at 10:31 AM, Nikola Smolenski  wrote:
>> On 28/01/13 15:39, Lydia Pintscher wrote:
>>>
>>> Is anyone interested in getting us some stats for the deployment on
>>> the Hungarian Wikipedia? There is a database dump at
>>> http://dumps.wikimedia.org/backup-index.html from the 22nd of January
>>> that could be used. I'm interested in the effect Wikidata had so far
>>> on this one Wikipedia.
>>
>>
>> Not from dumps, but from Toolserver, I don't see some reduction in bot
>> activity. Number of bot edits in last 12 months:
>>
>> 201201  61527
>> 201202  48472
>> 201203  3
>> 201204  60875
>> 201205  56364
>> 201206  56483
>> 201207  49836
>> 201208  50862
>> 201209  39235
>> 201210  44943
>> 201211  37492
>> 201212  52815
>> 201301  40258
>>
>>
>> ___
>> Wikidata-l mailing list
>> Wikidata-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata-l

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Interwiki bots - practical questions

2013-01-29 Thread Lydia Pintscher
On Tue, Jan 29, 2013 at 10:53 AM, Amir E. Aharoni
 wrote:
> Spin off from the "Phase 1" thread.
>
> 2013/1/29 Magnus Manske :
>> Why not just block the bots on wikis that use wikidata?
>
> This looks like the right thing to me, but I don't want to be too rude
> to the bot operators and I do want the bots to keep doing useful
> things.
>
> Imagine the scenario:
> * Wikidata Client is deployed to the Hebrew Wikipedia.
> * I remove interlanguage links from the Hebrew Wikipedia article
> [[ASCII]], an item for which is available in the Wikidata Repo (
> https://www.wikidata.org/wiki/Q8815 ).
> ** The article is supposed to show the links brought from Wikidata now.
> * After some time User:LovelyBot adds the links back.
> * I block User:LovelyBot.
>
> Now what do I say to User:Lovely?
>
> A: Stop changing interlanguage links on the Hebrew Wikipedia. We have
> Wikidata now.
> B: Update your pywikipedia bot configuration (or version). We have
> Wikidata now, and your bot must not touch articles that get the
> interlanguage links from the Wikidata repo.
>
> I prefer option B, but can pywikipediabot indeed identify that the
> links in the article are coming from Wikidata?

Yes, see 
http://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2013/1#Interwiki_bots_and_Wikidata

> And are there interwiki
> bots that are not using the pywikipediabot infrastructure?

Yes I think so.


Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Wikidata

Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] Interwiki bots - practical questions

2013-01-29 Thread Amir E. Aharoni
Spin off from the "Phase 1" thread.

2013/1/29 Magnus Manske :
> Why not just block the bots on wikis that use wikidata?

This looks like the right thing to me, but I don't want to be too rude
to the bot operators and I do want the bots to keep doing useful
things.

Imagine the scenario:
* Wikidata Client is deployed to the Hebrew Wikipedia.
* I remove interlanguage links from the Hebrew Wikipedia article
[[ASCII]], an item for which is available in the Wikidata Repo (
https://www.wikidata.org/wiki/Q8815 ).
** The article is supposed to show the links brought from Wikidata now.
* After some time User:LovelyBot adds the links back.
* I block User:LovelyBot.

Now what do I say to User:Lovely?

A: Stop changing interlanguage links on the Hebrew Wikipedia. We have
Wikidata now.
B: Update your pywikipedia bot configuration (or version). We have
Wikidata now, and your bot must not touch articles that get the
interlanguage links from the Wikidata repo.

I prefer option B, but can pywikipediabot indeed identify that the
links in the article are coming from Wikidata? And are there interwiki
bots that are not using the pywikipediabot infrastructure?
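(To make the first question concrete: with today's pywikibot core API the check could be sketched roughly as below; the compat/pywikipedia branch may expose it differently, so treat the names as assumptions:)

import pywikibot
from pywikibot import textlib

def links_come_from_wikidata(page: pywikibot.Page) -> bool:
    """Rough check: the page has no local [[xx:...]] links left in its
    wikitext, but a connected Wikidata item with sitelinks exists."""
    local_links = textlib.getLanguageLinks(page.text, insite=page.site)
    if local_links:
        return False                    # links are still maintained locally
    item = pywikibot.ItemPage.fromPage(page, lazy_load=True)
    return item.exists() and bool(item.get().get("sitelinks"))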

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
‪“We're living in pieces,
I want to live in peace.” – T. Moore‬

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Phase 1

2013-01-29 Thread Nikola Smolenski

On 29/01/13 10:28, Magnus Manske wrote:

So are the same bots doing different things? I seem to remember there
was one giant toolserver pybot instance doing only interwiki.


OTOH, yes, I believe there are bots doing only interwikis that could 
probably be blocked. But isn't anyone from Hungarian Wikipedia here to 
tell us how the test is going and what the community wants to do now?



___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Phase 1

2013-01-29 Thread Marco Fleckinger


AFAIK there are many iw-bots and several others.

I don't think there are iw-bots doing anything else as well; it would not
be very useful now. But actually I don't know. If there are, blocking those
specific ones might be a good idea.

Marco

Magnus Manske  schrieb:

>So are the same bots doing different things? I seem to remember there
>was
>one giant toolserver pybot instance doing only interwiki.
>
>
>On Tue, Jan 29, 2013 at 9:17 AM, Nikola Smolenski 
>wrote:
>
>> On 29/01/13 10:02, Magnus Manske wrote:
>>
>>> Why not just block the bots on wikis that use wikidata?
>>>
>>
>> Bots are used for much more than interwiki handling.
>>
>>  On Tue, Jan 29, 2013 at 6:51 AM, Bináris >> > wrote:
>>>
>>>
>>>
>>> 2013/1/28 Amir Ladsgroup >> >
>>>
>>>
>>> What is exact time of the next deployment (it and he)?
>>>
>>> If you want to catch it, join #wikimedia-wikidata on IRC. It was
>>> great to follow it on D-day!
>>>
>>> And what time you think is best to disable interwiki bots?
>>>
>>> Xqt can modify the code, but pywiki is not deployed, it is
>updated
>>> by bot owners, so there is no chance to focus it on one hour.
>For
>>> this reason I would say to begin it after deployment of Wikibase
>as
>>> otherwise one should do it at least 1 or 2 days before which
>would
>>> cause a maintenance pause. Yes, people will try to remove iws
>and
>>> some of them will be put back by bots.
>>>
>>
>> ___
>> Wikidata-l mailing list
>> Wikidata-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>>
>
>
>
>
>___
>Wikidata-l mailing list
>Wikidata-l@lists.wikimedia.org
>https://lists.wikimedia.org/mailman/listinfo/wikidata-l


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Phase 1

2013-01-29 Thread Magnus Manske
So are the same bots doing different things? I seem to remember there was
one giant toolserver pybot instance doing only interwiki.


On Tue, Jan 29, 2013 at 9:17 AM, Nikola Smolenski  wrote:

> On 29/01/13 10:02, Magnus Manske wrote:
>
>> Why not just block the bots on wikis that use wikidata?
>>
>
> Bots are used for much more than interwiki handling.
>
>  On Tue, Jan 29, 2013 at 6:51 AM, Bináris > > wrote:
>>
>>
>>
>> 2013/1/28 Amir Ladsgroup > >
>>
>>
>> What is exact time of the next deployment (it and he)?
>>
>> If you want to catch it, join #wikimedia-wikidata on IRC. It was
>> great to follow it on D-day!
>>
>> And what time you think is best to disable interwiki bots?
>>
>> Xqt can modify the code, but pywiki is not deployed, it is updated
>> by bot owners, so there is no chance to focus it on one hour. For
>> this reason I would say to begin it after deployment of Wikibase as
>> otherwise one should do it at least 1 or 2 days before which would
>> cause a maintenance pause. Yes, people will try to remove iws and
>> some of them will be put back by bots.
>>
>
> ___
> Wikidata-l mailing list
> Wikidata-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Phase 1

2013-01-29 Thread Nikola Smolenski

On 29/01/13 10:02, Magnus Manske wrote:

Why not just block the bots on wikis that use wikidata?


Bots are used for much more than interwiki handling.


On Tue, Jan 29, 2013 at 6:51 AM, Bináris <wikipo...@gmail.com> wrote:



2013/1/28 Amir Ladsgroup <ladsgr...@gmail.com>

What is exact time of the next deployment (it and he)?

If you want to catch it, join #wikimedia-wikidata on IRC. It was
great to follow it on D-day!

And what time you think is best to disable interwiki bots?

Xqt can modify the code, but pywiki is not deployed, it is updated
by bot owners, so there is no chance to focus it on one hour. For
this reason I would say to begin it after deployment of Wikibase as
otherwise one should do it at least 1 or 2 days before which would
cause a maintenance pause. Yes, people will try to remove iws and
some of them will be put back by bots.


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Phase 1

2013-01-29 Thread Magnus Manske
Why not just block the bots on wikis that use wikidata?


On Tue, Jan 29, 2013 at 6:51 AM, Bináris  wrote:

>
>
> 2013/1/28 Amir Ladsgroup 
>
>> What is exact time of the next deployment (it and he)?
>>
> If you want to catch it, join #wikimedia-wikidata on IRC. It was great to
> follow it on D-day!
>
>
>> And what time you think is best to disable interwiki bots?
>>
> Xqt can modify the code, but pywiki is not deployed, it is updated by bot
> owners, so there is no chance to focus it on one hour. For this reason I
> would say to begin it after deployment of Wikibase as otherwise one should
> do it at least 1 or 2 days before which would cause a maintenance pause.
> Yes, people will try to remove iws and some of them will be put back by
> bots.
>
>
> --
> Bináris
> ___
> Wikidata-l mailing list
> Wikidata-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>
>
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] running your own instance of Wikibase (for science!)

2013-01-29 Thread Marco Fleckinger


Hi,

For concrete plans I am (and, I assume, others are too) still waiting for
phase 2. Without it, people outside might not get the potential of this
great project.

They might think of it like Sara Torvalds did about her brother Linus' work,
which just printed "AAAA" and "BBBB" on the screen using multiple threads
on his new 386 machine.

Please do not take this the wrong way. Wikidata in its current state is much
more than that "AAAA" and "BBBB" stuff from early 1991. But non-technical
people outside the Wikipedia community might not see this.

So: keep on developing! We really need it!

Marco

Lydia Pintscher  schrieb:

>Heya folks :)
>
>I've heard a lot of interest from people in running their own instance
>of Wikibase (the software behind Wikidata) for scientific projects. I
>have however not seen any more concrete plans yet. I'd love to hear
>about them if they exist either on or off-list.
>
>
>Cheers
>Lydia
>
>--
>Lydia Pintscher - http://about.me/lydia.pintscher
>Community Communications for Wikidata
>
>Wikimedia Deutschland e.V.
>Obentrautstr. 72
>10963 Berlin
>www.wikimedia.de
>
>Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
>Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
>Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
>
>___
>Wikidata-l mailing list
>Wikidata-l@lists.wikimedia.org
>https://lists.wikimedia.org/mailman/listinfo/wikidata-l


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Database used by wikidata

2013-01-29 Thread Marco Fleckinger




Hor Meng Yoong  schrieb:

>On Mon, Jan 28, 2013 at 10:52 PM, Katie Filbert
>wrote:
>
>
>> PostgreSQL is really not supported at all yet for Wikibase. 
>MediaWiki
>> core support for Postgres is also currently broken, while patches for
>sites
>> schema updates, ORM support are pending on gerrit.
>>
>
>Looks like using PostgreSQL is a no-go for me at the moment for my project.
>Both MySQL and PostgreSQL are great, but the former is becoming more
>proprietary and its open-source future is uncertain.
>I would urge the MediaWiki development community to consider this further.
>
I would not be that definite. There are some companies using MediaWiki that
may also use Wikibase, and they might have professional database support.
Looking at this, it could also make sense to support MSSQL or Oracle (in
that case it's a pity that Apple doesn't have its own, AFAIK).

And no: I'm definitely not a business guy. :=D

Marco

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Phase 1

2013-01-29 Thread Marco Fleckinger




"Bináris"  schrieb:

>2013/1/28 Amir Ladsgroup 
>
>> What is exact time of the next deployment (it and he)?
>>
>If you want to catch it, join #wikimedia-wikidata on IRC. It was great
>to
>follow it on D-day!
>
>
>> And what time you think is best to disable interwiki bots?
>>
>Xqt can modify the code, but pywiki is not deployed, it is updated by
>bot
>owners, so there is no chance to focus it on one hour. For this reason
>I
>would say to begin it after deployment of Wikibase as otherwise one
>should
>do it at least 1 or 2 days before which would cause a maintenance
>pause.
>Yes, people will try to remove iws and some of them will be put back by
>bots.

Would it also make sense to write a bot putting the remaining iws to wikidata
and removing them from the wiki if they can be replaced by them from wikidata?

Marco


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l