Re: [Wikitech-l] Citation Hunt database

2017-02-05 Thread Takashi OTA
It worked. I finally managed to run the jawp version of Citation Hunt!
I really appreciate it.

[Closed]

--Takashi [[User:Takot]]

On Sun, Feb 5, 2017 at 11:39, MZMcBride <z...@mzmcbride.com> wrote:

> Takashi OTA wrote:
> >I imported categorylinks.sql and page.sql, downloaded from
> >https://dumps.wikimedia.org/jawiki/latest/jawiki-latest-category.sql.gz
> >https://dumps.wikimedia.org/jawiki/latest/jawiki-latest-page.sql.gz
> >
> >into a local MySQL database "jawiki_p", following the instructions at:
> >https://github.com/eggpi/citationhunt/blob/master/scripts/README.md .
> >
> >(I did it like this:
> >$ mysql -u root
> >mysql> create database jawiki_p;
> >mysql> use jawiki_p;
> >mysql> source jawiki-latest-category.sql;
> >mysql> source jawiki-latest-page.sql; )
> >
> >When I ran scripts/print_unsourced_pageids_from_wikipedia.py
> >after setting CH_LANG, it printed the error shown below:
> >
> >(ch-venv) Mac-mini:scripts takot$ export CH_LANG=ja
> >(ch-venv) Mac-mini:scripts takot$ echo $CH_LANG
> >ja
> >(ch-venv) Mac-mini:scripts takot$ ./print_unsourced_pageids_from_wikipedia.py > unsourced
> >Traceback (most recent call last):
> >  File "./print_unsourced_pageids_from_wikipedia.py", line 40, in <module>
> >    print_unsourced_ids_from_wikipedia()
> >  File "./print_unsourced_pageids_from_wikipedia.py", line 21, in print_unsourced_ids_from_wikipedia
> >    ' OR '.join(['cl_to = %s'] * len(categories)) + ')', categories)
> >  File "/Users/takot/ch-venv/lib/python2.7/site-packages/MySQLdb/cursors.py", line 205, in execute
> >    self.errorhandler(self, exc, value)
> >  File "/Users/takot/ch-venv/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
> >    raise errorclass, errorvalue
> >_mysql_exceptions.ProgrammingError: (1146, "Table 'jawiki_p.categorylinks' doesn't exist")
>
> I think you're confusing these two database tables:
>
> * https://www.mediawiki.org/wiki/Manual:Category_table
> * https://www.mediawiki.org/wiki/Manual:Categorylinks_table
>
> It looks like you loaded category, but the script is complaining about
> categorylinks.
>
> MZMcBride
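
The two dump links quoted above share one URL pattern, so the dump for the table the script asks for can probably be located the same way. The sketch below is only an illustration: `dump_url` and `REQUIRED_TABLES` are hypothetical names, and the assumption that a categorylinks dump follows the same pattern as the category and page dumps should be checked against the dumps site.

```python
# Sketch only: the URL pattern is inferred from the two dump links quoted
# in this thread, not taken from any official specification.
DUMP_BASE = "https://dumps.wikimedia.org"


def dump_url(wiki, table):
    """Return the 'latest' SQL dump URL for one table of one wiki."""
    return "{base}/{wiki}/latest/{wiki}-latest-{table}.sql.gz".format(
        base=DUMP_BASE, wiki=wiki, table=table)


# Tables mentioned in the thread: the script queries categorylinks,
# and page was imported alongside it. (Hypothetical constant name.)
REQUIRED_TABLES = ["categorylinks", "page"]

if __name__ == "__main__":
    for table in REQUIRED_TABLES:
        print(dump_url("jawiki", table))
```

Passing "category" instead of "categorylinks" reproduces the link that was actually downloaded, which is the mix-up described above.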
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Citation Hunt database

2017-02-04 Thread Takashi OTA
Thanks MZMcBride.
That makes sense. I will try that.

--Takashi


[Wikitech-l] Citation Hunt database

2017-02-04 Thread Takashi OTA
I'm working on running Citation Hunt on my home Mac mini (not on
Tool Labs), in order to enable it for Japanese Wikipedia. Sorry if
this is not the right channel for this. If so, I would appreciate
it if you could point me to a more appropriate one.

After reading
https://github.com/eggpi/citationhunt/blob/master/CONTRIBUTING.md ,
I managed to run it locally, at least for enwp, using the provided
en.sql.gz
from https://tools.wmflabs.org/citationhunt/static/exports/en.sql.gz
That's a good start, I assume.

---

Currently I'm stuck preparing the jawp database.

I imported categorylinks.sql and page.sql, downloaded from
https://dumps.wikimedia.org/jawiki/latest/jawiki-latest-category.sql.gz
https://dumps.wikimedia.org/jawiki/latest/jawiki-latest-page.sql.gz

into a local MySQL database "jawiki_p", following the instructions at:
https://github.com/eggpi/citationhunt/blob/master/scripts/README.md .

(I did it like this:
$ mysql -u root
mysql> create database jawiki_p;
mysql> use jawiki_p;
mysql> source jawiki-latest-category.sql;
mysql> source jawiki-latest-page.sql; )

When I ran scripts/print_unsourced_pageids_from_wikipedia.py
after setting CH_LANG, it printed the error shown below:

(ch-venv) Mac-mini:scripts takot$ export CH_LANG=ja
(ch-venv) Mac-mini:scripts takot$ echo $CH_LANG
ja
(ch-venv) Mac-mini:scripts takot$ ./print_unsourced_pageids_from_wikipedia.py > unsourced
Traceback (most recent call last):
  File "./print_unsourced_pageids_from_wikipedia.py", line 40, in <module>
    print_unsourced_ids_from_wikipedia()
  File "./print_unsourced_pageids_from_wikipedia.py", line 21, in print_unsourced_ids_from_wikipedia
    ' OR '.join(['cl_to = %s'] * len(categories)) + ')', categories)
  File "/Users/takot/ch-venv/lib/python2.7/site-packages/MySQLdb/cursors.py", line 205, in execute
    self.errorhandler(self, exc, value)
  File "/Users/takot/ch-venv/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
_mysql_exceptions.ProgrammingError: (1146, "Table 'jawiki_p.categorylinks' doesn't exist")
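
The traceback shows only a fragment of the failing query: one `cl_to = %s` placeholder per category, joined with OR, with the category names passed separately as parameters. A minimal sketch of that fragment in isolation follows. The function name and the sample category names are hypothetical, and the rest of the script's query is not visible in the traceback, so it is not reproduced here.

```python
def category_clause(categories):
    """Build '(cl_to = %s OR cl_to = %s ...)' with one placeholder per category."""
    if not categories:
        raise ValueError("at least one category is required")
    return "(" + " OR ".join(["cl_to = %s"] * len(categories)) + ")"


# Hypothetical category names, for illustration only.
categories = ["Category_A", "Category_B"]
clause = category_clause(categories)
print(clause)  # (cl_to = %s OR cl_to = %s)
```

Because the values travel as parameters rather than being pasted into the SQL string, the database driver handles quoting. The clause still refers to the categorylinks table's `cl_to` column, which is why the query fails when only the category table has been loaded.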

---

Apparently the MySQL database is not set up correctly.
My current config.py can be seen at:
https://github.com/takot/citationhunt/blob/master/config.py

The databases on my local MySQL server, and the tables in jawiki_p, currently look like this:

$ mysql -u root
mysql> show databases;
+-----------------------+
| Database              |
+-----------------------+
| information_schema    |
| jawiki_p              |
| mysql                 |
| performance_schema    |
| root__citationhunt_en |
| root__citationhunt_ja |
| root__stats_global    |
| sys                   |
+-----------------------+
8 rows in set (0.02 sec)

mysql> use jawiki_p;
mysql> show tables;
+--------------------+
| Tables_in_jawiki_p |
+--------------------+
| category           |
| page               |
+--------------------+
2 rows in set (0.01 sec)
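
The listing above contains `category` and `page`, while the traceback names `categorylinks`. A check like the following could catch such a gap before the script is run. It is a sketch under assumptions: `missing_tables` and `StubCursor` are hypothetical names, and the stub only imitates the `execute`/`fetchall` part of a DB-API cursor so that the example runs without a MySQL server.

```python
def missing_tables(cursor, required):
    """Return the names in `required` that SHOW TABLES does not list."""
    cursor.execute("SHOW TABLES")
    existing = set(row[0] for row in cursor.fetchall())
    return [name for name in required if name not in existing]


class StubCursor(object):
    """Stand-in for a DB-API cursor; returns a fixed table list (illustration only)."""

    def __init__(self, tables):
        self._rows = [(name,) for name in tables]

    def execute(self, query):
        # A real cursor would send the query to the server; the stub ignores it.
        pass

    def fetchall(self):
        return list(self._rows)


# Mirrors the two tables shown in the listing above.
cursor = StubCursor(["category", "page"])
print(missing_tables(cursor, ["categorylinks", "page"]))
```

With a real connection, the cursor would instead come from something like `MySQLdb.connect(user="root", db="jawiki_p").cursor()`, matching the setup described in this message.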

---

I hope you can provide some tip or hack to proceed.
Thanks in advance,

--Takashi [[User:Takot]]

[Wikitech-l] Acquiring list of templates including external links

2016-07-31 Thread Takashi OTA
Hoi,

This is an inquiry from a friend of mine in academia who is researching Wikipedia.

He would like to know whether there is a way to obtain a list of templates
that contain external links. Here are some examples of such templates.

https://ja.wikipedia.org/wiki/Template:JOI/doc
https://ja.wikipedia.org/wiki/Template:Twitter/doc

Such links are stored in externallinks.sql.gz, in expanded form.

If you want to track the increase or decrease of linked domains over time
through the edit history, you have to check pages-meta-history1.xml etc.
In that case, ordinary links and links generated by templates are mixed
together. Therefore, the latter (links generated by templates) need to be
expanded into ordinary link form.
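
One rough way to approach the first step, finding templates whose wikitext contains a literal URL, is sketched below. This is only a heuristic under stated assumptions: the function name, the regular expression, and the sample wikitext are all hypothetical, the (title, wikitext) pairs would have to be extracted from a dump separately, and templates that assemble URLs entirely from parameters or other templates would be missed.

```python
import re

# Heuristic pattern for literal http/https URLs in wikitext. It stops at
# whitespace, brackets, braces, pipes, angle brackets and double quotes.
# This is not MediaWiki's own link parser.
EXTERNAL_LINK_RE = re.compile(r'https?://[^\s\[\]{}|<>"]+', re.IGNORECASE)


def templates_with_external_links(templates):
    """Map template title -> list of literal URLs found in its wikitext.

    `templates` is an iterable of (title, wikitext) pairs.
    """
    found = {}
    for title, wikitext in templates:
        urls = EXTERNAL_LINK_RE.findall(wikitext)
        if urls:
            found[title] = urls
    return found


# Hypothetical sample data, not the content of any real template.
sample = [
    ("Template:Example link", "[https://example.org/user/{{{1}}} {{{2|{{PAGENAME}}}}}]"),
    ("Template:Plain", "'''{{{1}}}'''"),
]
print(templates_with_external_links(sample))
```

The URL prefixes collected this way could then be matched against link syntax in the page history, which is the expansion step described above.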

Sorry if what I am saying does not make sense.
Thanks in advance,

--Takashi Ota [[U:Takot]]