Re: [Wikimedia-l] LsJbot and geonames

2015-09-06 Thread Ricordisamoa
Proper data-based stubs are being worked on: 
https://phabricator.wikimedia.org/project/profile/1416/

Lsjbot, you have no chance to survive make your time.

___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines

Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 






Re: [Wikimedia-l] LsJbot and geonames

2015-09-06 Thread Emilio J . Rodríguez-Posada
Congratulations on the stub creation; they are good (and better than the
handmade stubs in other languages).

About the Wikidata placeholder project, it sounds very interesting.



Re: [Wikimedia-l] LsJbot and geonames

2015-09-06 Thread Gerard Meijssen
Hoi,
PLEASE reconsider. A Wikidata based solution is not superior because it
started from Wikidata.

PLEASE consider collaboration. It will be so much more powerful when LSJBOT
and people at Wikidata collaborate. It will get things right the first
time. It does not have to be perfect from the start as long as it gets
better over time. As long as we always work on improving the data.

PLEASE consider text generation based on Wikidata. They are the scripts
LSJBOT uses, they can help us improve the text when more or better
information becomes available.
Thanks,
 GerardM



Re: [Wikimedia-l] LsJbot and geonames

2015-09-06 Thread Gerard Meijssen
Hoi,
As always, I have been a big fan of the wonderful work that has been done.
My reaction was very much to what I perceived as a negative reaction from
Ricordisamoa. Telling you to stop and become part of Wikidata is a bit off.
Asking for collaboration and work towards a common goal, a goal that you
very much want to share, as I perceive it in your reply, is most wonderful
and most welcome.

When your data is at a quality level where you create stubs, it is very
much at the level where we should have it in Wikidata. Obviously it is for
the Swedish community to have the stubs or experiment with cached articles
based on Wikidata data. Obviously, we are at a point where we can create
the stubs and where caching concepts is technically feasible but not
something we have done so far.

What does it take to have such an experiment?
Thanks,
 GerardM


Re: [Wikimedia-l] LsJbot and geonames

2015-09-06 Thread Steinsplitter Wiki
Hoi,

"Article Placeholders are automatically generated content pages in 
Wikipedia or other mediawiki projects displaying data from Wikidata."   
Seriously? RobotWiki? Do we really want this? Quality, not quantity. 


Re: [Wikimedia-l] LsJbot and geonames

2015-09-06 Thread Anders Wennersten
At svwp we work closely with Wikidata and see it as the natural base for 
our article substance. We closely follow the work on Phabricator and are 
eager to implement it as soon as that becomes feasible. Lsjbot in no way 
works against this. It will be easy to replace Lsjbot articles with 
Phabricator-generated ones when the time is right.


But I believe you miss the point of what Lsjbot is doing now. The 
extensive research done on the data in Geonames is one of the crucial 
efforts. In a way, this whole generation project is research into the 
viability of using this data in full in all language versions. If it 
is still seen as viable, we could extend our article coverage of 
geographical entities by a factor of 10 in all versions. And this 
research is a must regardless of which technique is used to 
generate the articles.


The other crucial effort is the extended intelligence built into the 
generation of facts in the articles. Finding nearby physical objects 
with clever algorithms is an intellectual effort of the highest order. 
When bot generation was first introduced, it was more or less a mapping 
of items in the input to items in the output (in articles). We now see 
more information being created from information that exists only 
implicitly in the input, combined with external (map) data.


I cannot stress enough how impressed I am by Sverker's 
outstanding intellectual effort and his creativity in implementing and 
running software that is of great help in reaching our common vision, 
"free knowledge for all".


 Anders









Re: [Wikimedia-l] LsJbot and geonames

2015-09-06 Thread Ricordisamoa

Hoi,
Wouldn't it have been a better use of Sverker's brainpower to implement 
a long-term solution that doesn't require saving articles to the wiki?



Re: [Wikimedia-l] LsJbot and geonames

2015-09-06 Thread Emilio J . Rodríguez-Posada
2015-09-06 13:22 GMT+02:00 Steinsplitter Wiki <steinsplitter-w...@live.com>:

> Hoi,
>
> "Article Placeholders are automatically generated content pages in
> Wikipedia or other mediawiki projects displaying data from Wikidata."
>  Seriously? RobotWiki? Do we really want this? Quality, not quantity.
>
>
Yeah. I REALLY want this.



[Wikimedia-l] LsJbot and geonames

2015-09-05 Thread Anders Wennersten
Geonames [1] is a database which holds around 9 M entries for 
geography-related items from all over the world.


Lsjbot is now generating articles from a subset of it, after several 
months of extensive research on its quality, Wikidata relations and 
notability issues. While the quality in some regions is substandard (and 
these will not be generated), it was seen as very good in most areas. In 
the discussion I was intrigued to learn that identical Arabic names 
should be transcribed differently depending on their geographic location. 
And I was fascinated by the question of the notability of wells in the 
Bahrain desert (which in the end were excluded, mostly because we knew 
too little of that reality).


In this run Lsjbot has extended its functionality even further than when 
it generated articles for species. It looks for relevant geographical 
items close to the actual one: a nearby lake, a mountain, the nearest 
major town, etc.
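
As a rough illustration of the kind of lookup described above (this is not 
Lsjbot's actual code, which is not shown in this thread), here is a minimal 
Python sketch. It assumes each Geonames entry has been reduced to a dict 
with the hypothetical keys "name", "kind", "lat" and "lon", and it simply 
picks the closest entry of each kind by great-circle distance. The 50 km 
cut-off and the sample entries are made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_by_kind(place, candidates, max_km=50.0):
    """Return {kind: (name, distance_km)} with the closest candidate of each kind.

    `place` and every candidate are dicts with the hypothetical keys
    "name", "kind", "lat" and "lon". Candidates farther away than
    `max_km` (an assumed cut-off) are ignored.
    """
    best = {}
    for c in candidates:
        if c is place:
            continue  # do not report the place as its own neighbour
        d = haversine_km(place["lat"], place["lon"], c["lat"], c["lon"])
        if d > max_km:
            continue
        if c["kind"] not in best or d < best[c["kind"]][1]:
            best[c["kind"]] = (c["name"], d)
    return best


if __name__ == "__main__":
    # Made-up entries, for illustration only.
    village = {"name": "Example village", "kind": "village", "lat": 0.0, "lon": 0.0}
    features = [
        {"name": "Example lake", "kind": "lake", "lat": 0.0, "lon": 0.1},
        {"name": "Example peak", "kind": "mountain", "lat": 0.2, "lon": 0.0},
        {"name": "Distant town", "kind": "town", "lat": 10.0, "lon": 10.0},
    ]
    for kind, (name, km) in nearest_by_kind(village, features).items():
        print(f"nearest {kind}: {name} ({km:.1f} km)")
```

A linear scan like this is fine for a sketch; for millions of entries a 
spatial index would be the more likely choice.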


Macedonia can be taken as one example. Lsjbot generated over 1 
articles (and 5000 disambiguation pages), an order of magnitude more than 
what exists in enwp. Also for a well defined type like villages, almost 
50% as many has been generated than existing in enwp. One example is [2], 
where you can see what has been generated (note the reuse of a 
relevant figure from frwp). Please compare the corresponding 
articles in other languages in this case, many of which have less 
information than the bot-generated one.


The generation is still at an early stage [3] but has already got the 
article count for svwp to pass 2 M today. It will take many more months 
before it is completed, and perhaps more M marks will be passed before it 
is through. If you want to give feedback, you are welcome to enter it at [4].


Anders
(with all credit for Lsjbot going to Sverker, its owner; I am 
just one of the many supporters of him and his bot on svwp)


[1]
http://www.geonames.org/about.html

[2]
https://sv.wikipedia.org/wiki/Polaki_%28ort_i_Makedonien%29

[3]
https://sv.wikipedia.org/wiki/Kategori:Robotskapade_geografiartiklar

[4]
https://sv.wikipedia.org/wiki/Anv%C3%A4ndardiskussion:Lsjbot/Projekt_alla_platser



