[Wikidata-bugs] [Maniphest] [Commented On] T143488: Save contents of URLs linked from Wikidata in the Internet Archive

2019-03-12 Thread Cyberpower678
Cyberpower678 added a comment.


  
https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/InternetArchiveBot

TASK DETAIL
  https://phabricator.wikimedia.org/T143488

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Cyberpower678
Cc: Spinster, Wittylama, Redalert2fan, Jane023, Multichill, Abbe98, 
Lydia_Pintscher, Micru, Sadads, Cyberpower678, Izno, Aklapper, abian, 
alaa_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, Chicocvenancio, QZanden, 
dachary, LawExplorer, _jensen, rosalieper, Cirdan, Wikidata-bugs, Hydriz, aude, 
Ricordisamoa, Sjoerddebruin, Mbch331, Jay8g
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T143488: Save contents of URLs linked from Wikidata in the Internet Archive

2019-03-12 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.


  \o/

TASK DETAIL
  https://phabricator.wikimedia.org/T143488

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Cyberpower678, Lydia_Pintscher
Cc: Spinster, Wittylama, Redalert2fan, Jane023, Multichill, Abbe98, 
Lydia_Pintscher, Micru, Sadads, Cyberpower678, Izno, Aklapper, abian, 
alaa_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, Chicocvenancio, QZanden, 
dachary, LawExplorer, _jensen, rosalieper, Cirdan, Wikidata-bugs, Hydriz, aude, 
Ricordisamoa, Sjoerddebruin, Mbch331, Jay8g
___


[Wikidata-bugs] [Maniphest] [Commented On] T143488: Save contents of URLs linked from Wikidata in the Internet Archive

2018-12-28 Thread Cyberpower678
Cyberpower678 added a comment.
Well, IABot can save to the Wayback Machine, and it probably will, but its primary job is to find an archive for the URL and add it to the Wikidata entry.

TASK DETAIL
  https://phabricator.wikimedia.org/T143488

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Cyberpower678
Cc: Spinster, Wittylama, Redalert2fan, Jane023, Multichill, Abbe98, Lydia_Pintscher, Micru, Sadads, Cyberpower678, Izno, Aklapper, abian, Dinadineke, Nandana, tabish.shaikh91, Lahi, Gq86, GoranSMilovanovic, Soteriaspace, Jayprakash12345, Chicocvenancio, JakeTheDeveloper, QZanden, dachary, merbst, LawExplorer, _jensen, D3r1ck01, Cirdan, Wikidata-bugs, Hydriz, aude, Ricordisamoa, Sjoerddebruin, TheDJ, Mbch331, Jay8g
___


[Wikidata-bugs] [Maniphest] [Commented On] T143488: Save contents of URLs linked from Wikidata in the Internet Archive

2018-12-27 Thread Multichill
Multichill added a comment.

In T143488#4843337, @Cyberpower678 wrote:
To answer your questions, IABot will be using the wbgetentities and wbeditentity API calls on MW.


Good to hear you're working on this! I guess for this part of the bot to work you need to be able to fetch all the external links on a Wikidata item so you can go to the next step of checking if you need to index these? I understand you're considering using "wbgetentities", but I'm not sure how "wbeditentity" fits in. If you're just reading, I would probably use EntityData ( https://www.mediawiki.org/wiki/Wikibase/EntityData ) to fetch the item in your favorite format. That will give you most of the links ( https://www.wikidata.org/wiki/Q219831 / http://www.wikidata.org/entity/Q219831.rdf / http://www.wikidata.org/entity/Q219831.json ). The RDF format is quite nice because some of the links get expanded already (look for wdtn). For some of the links you have to expand them yourself using the formatter URL ( https://www.wikidata.org/wiki/Property:P1630 ). At the start of the bot run I would do a SPARQL query for all the formatter URLs and make a lookup table out of it. For each item you process you could then do an easy lookup in this table whenever you encounter an external-id property.
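
A minimal sketch of the lookup-table idea, assuming Python with only the standard library. This is not IABot's code: the function names are hypothetical, the query is one plausible way to list formatter URLs (P1630), and the sample property/formatter pair and identifier at the bottom are made up for illustration. Identifiers are inserted as-is; some may need percent-encoding in practice.

```python
import json
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
# Assumed query: every property that has a formatter URL (P1630).
FORMATTER_QUERY = "SELECT ?prop ?formatter WHERE { ?prop wdt:P1630 ?formatter }"


def fetch_formatter_bindings():
    """Run the formatter query once at the start of a bot run (network call)."""
    url = SPARQL_ENDPOINT + "?" + urllib.parse.urlencode({"query": FORMATTER_QUERY})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "formatter-lookup-sketch/0.1",  # hypothetical agent name
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def build_lookup(bindings):
    """Map property IDs such as 'P214' to a list of their formatter URLs."""
    lookup = {}
    for row in bindings:
        prop_id = row["prop"]["value"].rsplit("/", 1)[-1]
        lookup.setdefault(prop_id, []).append(row["formatter"]["value"])
    return lookup


def expand_external_ids(entity, lookup):
    """Yield a URL for each external-id main snak that has a known formatter."""
    for prop_id, statements in entity.get("claims", {}).items():
        for statement in statements:
            snak = statement["mainsnak"]
            if snak.get("datatype") != "external-id" or snak.get("snaktype") != "value":
                continue
            identifier = snak["datavalue"]["value"]
            for formatter in lookup.get(prop_id, []):
                # "$1" is the placeholder used in formatter URLs.
                yield formatter.replace("$1", identifier)


# Illustrative data only, shaped like SPARQL JSON results and entity JSON.
sample_bindings = [
    {
        "prop": {"value": "http://www.wikidata.org/entity/P214"},
        "formatter": {"value": "https://viaf.org/viaf/$1"},
    }
]
sample_entity = {
    "claims": {
        "P214": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "datatype": "external-id",
                    "datavalue": {"value": "12345", "type": "string"},
                }
            }
        ]
    }
}
```

In a real run, `build_lookup(fetch_formatter_bindings())` would be called once and the resulting table reused for every item.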

Anyway, if you need any help, please let me know. Link rot is quite a large problem right now on Wikidata and I would love to have a bot start indexing links so we at least have a copy somewhere.

TASK DETAIL
  https://phabricator.wikimedia.org/T143488

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Cyberpower678, Multichill
Cc: Spinster, Wittylama, Redalert2fan, Jane023, Multichill, Abbe98, Lydia_Pintscher, Micru, Sadads, Cyberpower678, Izno, Aklapper, abian, Dinadineke, Nandana, tabish.shaikh91, Lahi, Gq86, GoranSMilovanovic, Soteriaspace, Jayprakash12345, Chicocvenancio, JakeTheDeveloper, QZanden, dachary, merbst, LawExplorer, _jensen, D3r1ck01, Cirdan, Wikidata-bugs, Hydriz, aude, Ricordisamoa, Sjoerddebruin, TheDJ, Mbch331, Jay8g
___


[Wikidata-bugs] [Maniphest] [Commented On] T143488: Save contents of URLs linked from Wikidata in the Internet Archive

2018-12-25 Thread Cyberpower678
Cyberpower678 added a comment.
To answer your questions, IABot will be using the wbgetentities and wbeditentity API calls on MW.

TASK DETAIL
  https://phabricator.wikimedia.org/T143488

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Cyberpower678
Cc: Spinster, Wittylama, Redalert2fan, Jane023, Multichill, Abbe98, Lydia_Pintscher, Micru, Sadads, Cyberpower678, Izno, Aklapper, abian, Dinadineke, Nandana, tabish.shaikh91, Lahi, Gq86, GoranSMilovanovic, Soteriaspace, Jayprakash12345, Chicocvenancio, JakeTheDeveloper, QZanden, dachary, merbst, LawExplorer, _jensen, D3r1ck01, Cirdan, Wikidata-bugs, Hydriz, aude, Ricordisamoa, Sjoerddebruin, TheDJ, Mbch331, Jay8g
___


[Wikidata-bugs] [Maniphest] [Commented On] T143488: Save contents of URLs linked from Wikidata in the Internet Archive

2018-12-25 Thread Cyberpower678
Cyberpower678 added a comment.

In T143488#4843283, @Multichill wrote:

In T143488#4677644, @Cyberpower678 wrote:
I am currently at that Hackathon in Thompson 150 right now.  If you care to meet me during lunch break I will be happy to work on this with you.


That didn't work out. Can you please have a look at my previous questions? You don't seem to be working on it and the task does not contain enough information for others to work on this.


And a Merry Christmas to you too. I guess you don't celebrate it, as I've personally been busy with the holiday season. Just because you don't see public movement, doesn't mean I'm not working on it. It takes some work to get IABot ready for Wikidata.

TASK DETAIL
  https://phabricator.wikimedia.org/T143488

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Cyberpower678
Cc: Spinster, Wittylama, Redalert2fan, Jane023, Multichill, Abbe98, Lydia_Pintscher, Micru, Sadads, Cyberpower678, Izno, Aklapper, abian, Dinadineke, Nandana, tabish.shaikh91, Lahi, Gq86, GoranSMilovanovic, Soteriaspace, Jayprakash12345, Chicocvenancio, JakeTheDeveloper, QZanden, dachary, merbst, LawExplorer, _jensen, D3r1ck01, Cirdan, Wikidata-bugs, Hydriz, aude, Ricordisamoa, Sjoerddebruin, TheDJ, Mbch331, Jay8g
___


[Wikidata-bugs] [Maniphest] [Commented On] T143488: Save contents of URLs linked from Wikidata in the Internet Archive

2018-12-25 Thread Multichill
Multichill added a comment.

In T143488#4677644, @Cyberpower678 wrote:
I am currently at that Hackathon in Thompson 150 right now.  If you care to meet me during lunch break I will be happy to work on this with you.


That didn't work out. Can you please have a look at my previous questions? You don't seem to be working on it and the task does not contain enough information for others to work on this.

TASK DETAIL
  https://phabricator.wikimedia.org/T143488

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Cyberpower678, Multichill
Cc: Redalert2fan, Jane023, Multichill, Abbe98, Lydia_Pintscher, Micru, Sadads, Cyberpower678, Izno, Aklapper, abian, Dinadineke, Nandana, tabish.shaikh91, Lahi, Gq86, GoranSMilovanovic, Soteriaspace, Jayprakash12345, Chicocvenancio, JakeTheDeveloper, QZanden, dachary, merbst, LawExplorer, _jensen, D3r1ck01, Cirdan, Wikidata-bugs, Hydriz, aude, Ricordisamoa, Sjoerddebruin, TheDJ, Mbch331, Jay8g
___


[Wikidata-bugs] [Maniphest] [Commented On] T143488: Save contents of URLs linked from Wikidata in the Internet Archive

2018-10-18 Thread Cyberpower678
Cyberpower678 added a comment.

In T143488#4673884, @Multichill wrote:

In T143488#4132389, @Cyberpower678 wrote:
It's doable, but not easy.  Wikidata has a different structure.


Extending the current bot seems to be the most future proof solution. In this task we only care about getting things into the archive, nothing else. So my guess is that parsing a Wikidata item is what you run into? Take for example https://www.wikidata.org/wiki/Q24066189 . You could force it to some other format like https://www.wikidata.org/entity/Q24066189.rdf or https://www.wikidata.org/entity/Q24066189.json to make it easier to find and extract urls.

Do you have some pointers where you think the challenge is going to be? We have an upcoming hackathon and we might be able to work on this.


I am currently at that Hackathon in Thompson 150 right now. If you care to meet me during lunch break I will be happy to work on this with you.

TASK DETAIL
  https://phabricator.wikimedia.org/T143488

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Cyberpower678
Cc: Jane023, Multichill, Abbe98, Lydia_Pintscher, Micru, Sadads, Cyberpower678, Izno, Aklapper, abian, Nandana, tabish.shaikh91, Lahi, Gq86, GoranSMilovanovic, Soteriaspace, Jayprakash12345, JakeTheDeveloper, QZanden, dachary, merbst, LawExplorer, D3r1ck01, Wikidata-bugs, Hydriz, aude, Ricordisamoa, Sjoerddebruin, TheDJ, Mbch331, Jay8g
___


[Wikidata-bugs] [Maniphest] [Commented On] T143488: Save contents of URLs linked from Wikidata in the Internet Archive

2018-10-17 Thread Multichill
Multichill added a comment.

In T143488#4132389, @Cyberpower678 wrote:
It's doable, but not easy.  Wikidata has a different structure.


Extending the current bot seems to be the most future proof solution. In this task we only care about getting things into the archive, nothing else. So my guess is that parsing a Wikidata item is what you run into? Take for example https://www.wikidata.org/wiki/Q24066189 . You could force it to some other format like https://www.wikidata.org/entity/Q24066189.rdf or https://www.wikidata.org/entity/Q24066189.json to make it easier to find and extract urls.
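
A rough sketch of what extracting URLs from the `.json` form could look like, assuming Python with only the standard library. The function names are hypothetical and this is not how IABot does it; it only collects values of snaks whose datatype is `url` (main value, qualifiers and references) and leaves external identifiers alone. The sample item at the bottom is made up.

```python
import json
import urllib.request


def fetch_entity(qid):
    """Download one item via the /entity/<QID>.json URL (network call)."""
    url = f"https://www.wikidata.org/entity/{qid}.json"
    request = urllib.request.Request(
        url, headers={"User-Agent": "url-extract-sketch/0.1"}  # hypothetical agent name
    )
    with urllib.request.urlopen(request) as response:
        entities = json.load(response)["entities"]
    # Take the single returned entity; for a redirected item the key
    # may differ from the requested ID.
    return next(iter(entities.values()))


def iter_snaks(statement):
    """Yield the main snak, then qualifier snaks, then reference snaks."""
    yield statement["mainsnak"]
    for snaks in statement.get("qualifiers", {}).values():
        yield from snaks
    for reference in statement.get("references", []):
        for snaks in reference.get("snaks", {}).values():
            yield from snaks


def extract_urls(entity):
    """Return the sorted, de-duplicated URL values found on one item."""
    urls = set()
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            for snak in iter_snaks(statement):
                if snak.get("datatype") == "url" and snak.get("snaktype") == "value":
                    urls.add(snak["datavalue"]["value"])
    return sorted(urls)


# Illustrative item only: one URL as a main value, one inside a reference.
sample_item = {
    "claims": {
        "P856": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "datatype": "url",
                    "datavalue": {"value": "https://example.org/", "type": "string"},
                },
                "references": [
                    {
                        "snaks": {
                            "P854": [
                                {
                                    "snaktype": "value",
                                    "datatype": "url",
                                    "datavalue": {
                                        "value": "https://example.com/source",
                                        "type": "string",
                                    },
                                }
                            ]
                        }
                    }
                ],
            }
        ]
    }
}
```

With network access, `extract_urls(fetch_entity("Q24066189"))` would be the call for the example item above.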

Do you have some pointers where you think the challenge is going to be? We have an upcoming hackathon and we might be able to work on this.

TASK DETAIL
  https://phabricator.wikimedia.org/T143488

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Multichill
Cc: Multichill, Abbe98, Lydia_Pintscher, Micru, Sadads, Cyberpower678, Izno, Aklapper, abian, Nandana, tabish.shaikh91, Lahi, Gq86, GoranSMilovanovic, Soteriaspace, Jayprakash12345, JakeTheDeveloper, QZanden, dachary, merbst, LawExplorer, D3r1ck01, Wikidata-bugs, Hydriz, aude, Ricordisamoa, Sjoerddebruin, TheDJ, Mbch331, Jay8g
___


[Wikidata-bugs] [Maniphest] [Commented On] T143488: Save contents of URLs linked from Wikidata in the Internet Archive

2018-04-15 Thread Cyberpower678
Cyberpower678 added a comment.
It's doable, but not easy. Wikidata has a different structure.

TASK DETAIL
  https://phabricator.wikimedia.org/T143488

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Cyberpower678
Cc: Multichill, Abbe98, Lydia_Pintscher, Micru, Sadads, Cyberpower678, Izno, Aklapper, abian, Lahi, Gq86, GoranSMilovanovic, Soteriaspace, Jayprakash12345, JakeTheDeveloper, QZanden, dachary, LawExplorer, Wikidata-bugs, Hydriz, aude, Ricordisamoa, Sjoerddebruin, TheDJ, Mbch331, Jay8g
___


[Wikidata-bugs] [Maniphest] [Commented On] T143488: Save contents of URLs linked from Wikidata in the Internet Archive

2018-04-15 Thread abian
abian added a comment.
Any news? Would it be easy to add Wikidata to the list of wikis where the InternetArchiveBot runs?

TASK DETAIL
  https://phabricator.wikimedia.org/T143488

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: abian
Cc: Multichill, Abbe98, Lydia_Pintscher, Micru, Sadads, Cyberpower678, Izno, Aklapper, abian, Lahi, Gq86, GoranSMilovanovic, Soteriaspace, Jayprakash12345, JakeTheDeveloper, QZanden, dachary, LawExplorer, Wikidata-bugs, Hydriz, aude, Ricordisamoa, Sjoerddebruin, TheDJ, Mbch331, Jay8g
___


[Wikidata-bugs] [Maniphest] [Commented On] T143488: Save contents of URLs linked from Wikidata in the Internet Archive

2017-04-20 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.
I'm currently in contact with them to get this done.

TASK DETAIL
  https://phabricator.wikimedia.org/T143488

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lydia_Pintscher
Cc: Lydia_Pintscher, Micru, Sadads, Cyberpower678, Izno, Aklapper, abian, Soteriaspace, JakeTheDeveloper, QZanden, dachary, Wikidata-bugs, aude, Ricordisamoa, Sjoerddebruin, TheDJ, Mbch331, Jay8g
___


[Wikidata-bugs] [Maniphest] [Commented On] T143488: Save contents of URLs linked from Wikidata in the Internet Archive

2016-12-08 Thread Micru
Micru added a comment.
Will it support URLs generated by external identifiers?

TASK DETAIL
  https://phabricator.wikimedia.org/T143488

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Micru
Cc: Micru, Sadads, Cyberpower678, Izno, Aklapper, abian, JakeTheDeveloper, dachary, D3r1ck01, Wikidata-bugs, aude, Ricordisamoa, Sjoerddebruin, TheDJ, Mbch331, Jay8g
___