Re: [PATCH] mw-to-git/t9360: fix broken &&-chain

2018-08-21 Thread Antoine Beaupré
On 2018-08-21 15:22:43, Eric Sunshine wrote:
> On Tue, Aug 21, 2018 at 2:55 PM Antoine Beaupré  wrote:
>> On 2018-08-08 18:10:22, Matthieu Moy wrote:
>> > "jrnieder"  wrote:
>> >> (+cc: some folks interested in git-remote-mediawiki)
>> >
>> > In case it still matters, an obvious Acked-by: Matthieu Moy 
>> > 
>>
>> I seem to have lost context of the original email, and can't find a copy
>> on public-inbox.org... Is there a patch we should merge back into
>> git-mediawiki already?
>
> The patch is here[1].
>
> [1]: https://public-inbox.org/git/20180730204646.32312-1-sunsh...@sunshineco.com/

Thanks, so

Acked-by: Antoine Beaupré 

FWIW. :)

A.

-- 
The history of any one part of the earth, like the life of a soldier,
consists of long periods of boredom and short periods of terror.
   - British geologist Derek V. Ager


Re: [PATCH] mw-to-git/t9360: fix broken &&-chain

2018-08-21 Thread Antoine Beaupré
On 2018-08-08 18:10:22, Matthieu Moy wrote:
> "jrnieder"  wrote:
>
>> (+cc: some folks interested in git-remote-mediawiki)
>
> Thanks.
>
> In case it still matters, an obvious Acked-by: Matthieu Moy 
> 

Hi,

I seem to have lost context of the original email, and can't find a copy
on public-inbox.org... Is there a patch we should merge back into
git-mediawiki already?

Thanks!

A.

-- 
Your injured body has become the burden of your digital soul.
- Yin Aiwen, 2013, The Massage is the Medium


[PATCH v5 2/7] remote-mediawiki: allow fetching namespaces with spaces

2017-11-07 Thread Antoine Beaupré
From: Ingo Ruhnke <grum...@gmail.com>

we still want to use spaces as separators in the config, but we should
allow the user to specify namespaces with spaces, so we use underscore
for this.

Reviewed-by: Antoine Beaupré <anar...@debian.org>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index 5ffb57595..a1d783789 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -65,6 +65,7 @@ chomp(@tracked_categories);
 
 # Just like @tracked_categories, but for MediaWiki namespaces.
 my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces"));
+for (@tracked_namespaces) { s/_/ /g; }
 chomp(@tracked_namespaces);
 
 # Import media files on pull
-- 
2.11.0
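
For illustration, the parsing step this patch touches can be sketched in isolation. This is only a sketch under assumptions: `parse_namespaces` is a hypothetical helper, the hard-coded string stands in for the output of `git config --get-all remote.<name>.namespaces`, and the `grep` that drops empty fields is an addition that the patch itself does not make.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of git-remote-mediawiki): split the raw
# config output on spaces and newlines, then map "_" back to " ".
sub parse_namespaces {
    my ($raw) = @_;
    # Dropping empty fields is an assumption of this sketch.
    my @namespaces = grep { length } split(/[ \n]/, $raw);
    s/_/ /g for @namespaces;
    return @namespaces;
}

# Stand-in for the output of "git config --get-all remote.origin.namespaces".
my $raw = "Talk User_talk\nHelp\n";
my @parsed = parse_namespaces($raw);
print join('|', @parsed), "\n";
```

With this convention a namespace such as "User talk" is written as `User_talk` in the config, so the space can keep its role as a separator.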



[PATCH v5 0/7] namespace support

2017-11-07 Thread Antoine Beaupré
Yet another reroll to fix a typo.



Re: [PATCH v4 3/7] remote-mediawiki: show known namespace choices on failure

2017-11-07 Thread Antoine Beaupré
On 2017-11-07 10:45:27, Thomas Adam wrote:
> On Mon, Nov 06, 2017 at 04:19:49PM -0500, Antoine Beaupré wrote:
>> If we fail to find a requested namespace, we should tell the user
>> which ones we know about, since those were already fetched. This
>> allows users to fetch all namespaces by specifying a dummy namespace,
>> failing, then copying the list of namespaces in the config.
>> 
>> Eventually, we should have a flag that allows fetching all namespaces
>> automatically.
>> 
>> Reviewed-by: Antoine Beaupré <anar...@debian.org>
>> Signed-off-by: Antoine Beaupré <anar...@debian.org>
>> ---
>>  contrib/mw-to-git/git-remote-mediawiki.perl | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
>> index a1d783789..6364d4e91 100755
>> --- a/contrib/mw-to-git/git-remote-mediawiki.perl
>> +++ b/contrib/mw-to-git/git-remote-mediawiki.perl
>> @@ -1334,7 +1334,8 @@ sub get_mw_namespace_id {
>>  my $id;
>>  
>>  if (!defined $ns) {
>> -print {*STDERR} "No such namespace ${name} on MediaWiki.\n";
>> +my @namespaces = map { s/ /_/g; $_; } sort keys %namespaces_id;
>
> Oops.  This was my typo from my original suggestion.  The hash is
> '%namespace_id', not '%namespaces_id'.  However, how did this slip through
> testing?  I'm assuming you blindly copied this from my example, which although
> quick to do, is only being caught because of my sharp eyes...

I must admit I did not test that at all. Honestly, I'm just trying to
finalize this so we can move to GitHub and I can move on to other
things. :)

I rerolled with your fix.

A.
-- 
If builders built houses the way programmers built programs,
The first woodpecker to come along would destroy civilization.
- Gerald Weinberg
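
As a side note on how a typo like this can be caught without running the code path: when a script is compiled under `use strict`, a misspelled hash name is a compile-time error. The sketch below uses a made-up `%namespace_id` table and wraps the misspelled variant in a string `eval` so that the failure can be shown; it also uses `s///r` (Perl 5.14 or later), which is a substitute for the `map { s/ /_/g; $_; }` form in the patch, not what the patch does.

```perl
use strict;
use warnings;

# Made-up table standing in for the real %namespace_id cache.
my %namespace_id = ('User talk' => 3, 'Talk' => 1);

# Correct spelling: s///r returns a modified copy of each key.
my @namespaces = map { s/ /_/gr } sort keys %namespace_id;
print "known namespaces: @namespaces\n";

# Misspelled hash name: under "use strict" this does not compile.
my $ok = eval q{ my @n = sort keys %namespaces_id; 1 };
print "typo rejected: $@" if !$ok;
```

Running `perl -c` on the script would presumably surface the same error, provided the script enables `use strict`.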


[PATCH v5 7/7] remote-mediawiki: show progress while fetching namespaces

2017-11-07 Thread Antoine Beaupré
Without this, the fetch process appears to hang while we fetch page
listings across the namespaces. Obviously, it should be possible to
silence this with -q, but that's an issue already present everywhere
in the code and should be fixed separately:

https://github.com/Git-Mediawiki/Git-Mediawiki/issues/30

Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index c9f46359b..af9cbc9d0 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -279,6 +279,7 @@ sub get_mw_tracked_namespaces {
 aplimit => 'max' } )
 || die $mediawiki->{error}->{code} . ': '
 . $mediawiki->{error}->{details} . "\n";
+print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n";
 foreach my $page (@{$mw_pages}) {
 $pages->{$page->{title}} = $page;
 }
-- 
2.11.0
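
A note on the number being printed: in Perl, `$#{$ref}` is the index of the last element, which is one less than the number of elements. If the intent is to report how many pages were listed, `scalar @{$ref}` gives the count. A minimal sketch with made-up page data:

```perl
use strict;
use warnings;

# Made-up page list standing in for the result of the allpages query.
my $mw_pages = [ { title => 'A' }, { title => 'B' }, { title => 'C' } ];

my $last_index = $#{$mw_pages};        # index of the last element
my $count      = scalar @{$mw_pages};  # number of elements

print {*STDERR} "$count found (last index is $last_index)\n";
```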



[PATCH v5 4/7] remote-mediawiki: skip virtual namespaces

2017-11-07 Thread Antoine Beaupré
Virtual namespaces do not correspond to pages in the database and are
automatically generated by MediaWiki. It makes little sense,
therefore, to fetch pages from those namespaces and the MW API doesn't
support listing those pages.

According to the documentation, those virtual namespaces are currently
"Special" (-1) and "Media" (-2) but we treat all negative namespaces
as "virtual" as a future-proofing mechanism.

Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index 5e8845893..611a04cd7 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -264,10 +264,13 @@ sub get_mw_tracked_categories {
 sub get_mw_tracked_namespaces {
 my $pages = shift;
 foreach my $local_namespace (@tracked_namespaces) {
+my $namespace_id = get_mw_namespace_id($local_namespace);
+# virtual namespaces don't support allpages
+next if !defined($namespace_id) || $namespace_id < 0;
 my $mw_pages = $mediawiki->list( {
 action => 'query',
 list => 'allpages',
-apnamespace => get_mw_namespace_id($local_namespace),
+apnamespace => $namespace_id,
 aplimit => 'max' } )
 || die $mediawiki->{error}->{code} . ': '
 . $mediawiki->{error}->{details} . "\n";
-- 
2.11.0
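
The skip rule can be tried out on its own. In this sketch the `%id_for` table and the `@wanted` list are made up and stand in for what `get_mw_namespace_id()` and the config would provide; only the `next if` condition is taken from the patch.

```perl
use strict;
use warnings;

# Made-up id table; the ids for Special and Media follow the commit message.
my %id_for = (Special => -1, Media => -2, Talk => 1, Help => 12);
my @wanted = ('Special', 'Talk', 'Bogus', 'Media', 'Help');

my @fetchable;
foreach my $name (@wanted) {
    my $id = $id_for{$name};
    # Same condition as the patch: skip unknown and virtual namespaces.
    next if !defined($id) || $id < 0;
    push @fetchable, $name;
}
print "@fetchable\n";
```

Testing for any negative id, rather than for -1 and -2 specifically, is what makes the check tolerant of further virtual namespaces being added later.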



[PATCH v5 6/7] remote-mediawiki: process namespaces in order

2017-11-07 Thread Antoine Beaupré
Ideally, we'd process them in numeric order since that is more
logical, but we can't do that yet since this is where we find the
numeric identifiers in the first place. Lexicographic order is a good
compromise.

Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index 0e60b85c8..c9f46359b 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -263,7 +263,7 @@ sub get_mw_tracked_categories {
 
 sub get_mw_tracked_namespaces {
 my $pages = shift;
-foreach my $local_namespace (@tracked_namespaces) {
+foreach my $local_namespace (sort @tracked_namespaces) {
 my $namespace_id;
 if ($local_namespace eq "(Main)") {
 $namespace_id = 0;
-- 
2.11.0
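
To make the trade-off concrete, here is a small sketch comparing the two orders. The `%id_for` table is made up for the example; in the real code the ids are only known after the lookup, which is why the patch settles for the plain `sort`.

```perl
use strict;
use warnings;

# Made-up name-to-id table for illustration.
my %id_for = ('(Main)' => 0, Talk => 1, User => 2, File => 6, Help => 12);

# Lexicographic order, as in the patch: needs only the names.
my @by_name = sort keys %id_for;

# Numeric order: needs the ids up front.
my @by_id = sort { $id_for{$a} <=> $id_for{$b} } keys %id_for;

print "by name: @by_name\n";
print "by id:   @by_id\n";
```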



[PATCH v5 3/7] remote-mediawiki: show known namespace choices on failure

2017-11-07 Thread Antoine Beaupré
If we fail to find a requested namespace, we should tell the user
which ones we know about, since those were already fetched. This
allows users to fetch all namespaces by specifying a dummy namespace,
failing, then copying the list of namespaces in the config.

Eventually, we should have a flag that allows fetching all namespaces
automatically.

Reviewed-by: Antoine Beaupré <anar...@debian.org>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index a1d783789..5e8845893 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -1334,7 +1334,8 @@ sub get_mw_namespace_id {
my $id;
 
if (!defined $ns) {
-   print {*STDERR} "No such namespace ${name} on MediaWiki.\n";
+   my @namespaces = map { s/ /_/g; $_; } sort keys %namespace_id;
+   print {*STDERR} "No such namespace ${name} on MediaWiki, known namespaces: @namespaces\n";
$ns = {is_namespace => 0};
$namespace_id{$name} = $ns;
}
-- 
2.11.0



[PATCH v5 5/7] remote-mediawiki: support fetching from (Main) namespace

2017-11-07 Thread Antoine Beaupré
When we specify a list of namespaces to fetch from, by default the MW
API will not fetch from the default namespace, referred to as "(Main)"
in the documentation:

https://www.mediawiki.org/wiki/Manual:Namespace#Built-in_namespaces

I haven't found a way to address that "(Main)" namespace when getting
the namespace ids: indeed, when listing namespaces, there is no
"canonical" field for the main namespace, although there is a "*"
field that is set to "" (empty). So in theory, we could specify the
empty namespace to get the main namespace, but that would make
specifying namespaces harder for the user: we would need to teach
users about the "empty" default namespace. It would also make the code
more complicated: we'd need to parse quotes in the configuration.

So we simply override the query here and allow the user to specify
"(Main)" since that is the publicly documented name.

Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index 611a04cd7..0e60b85c8 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -264,7 +264,12 @@ sub get_mw_tracked_categories {
 sub get_mw_tracked_namespaces {
 my $pages = shift;
 foreach my $local_namespace (@tracked_namespaces) {
-my $namespace_id = get_mw_namespace_id($local_namespace);
+my $namespace_id;
+if ($local_namespace eq "(Main)") {
+$namespace_id = 0;
+} else {
+$namespace_id = get_mw_namespace_id($local_namespace);
+}
 # virtual namespaces don't support allpages
 next if !defined($namespace_id) || $namespace_id < 0;
 my $mw_pages = $mediawiki->list( {
-- 
2.11.0
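
The override can be sketched as a small wrapper. Everything named here is hypothetical: `resolve_namespace_id` does not exist in git-remote-mediawiki, and `%remote_ids` is a made-up stand-in for the lookup that `get_mw_namespace_id()` performs against the wiki.

```perl
use strict;
use warnings;

# Made-up stand-in for ids that would come from the MediaWiki API.
my %remote_ids = (Talk => 1, Help => 12);

# Hypothetical wrapper: "(Main)" short-circuits to id 0, anything else
# goes through the (stubbed) lookup and may come back undefined.
sub resolve_namespace_id {
    my ($name) = @_;
    return 0 if $name eq '(Main)';
    return $remote_ids{$name};
}

for my $name ('(Main)', 'Talk', 'Nope') {
    my $id = resolve_namespace_id($name);
    print "$name => ", (defined $id ? $id : 'unknown'), "\n";
}
```

Because 0 is a defined, non-negative value, "(Main)" passes the virtual-namespace check that follows in the loop.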



[PATCH v5 1/7] remote-mediawiki: add namespace support

2017-11-07 Thread Antoine Beaupré
From: Kevin <ke...@ki-ai.org>

This introduces a new remote.origin.namespaces argument that is a
space-separated list of namespaces. The list of pages to extract is then
taken from all the specified namespaces.

Reviewed-by: Antoine Beaupré <anar...@debian.org>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index e7f857c1a..5ffb57595 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -63,6 +63,10 @@ chomp(@tracked_pages);
 my @tracked_categories = split(/[ \n]/, run_git("config --get-all remote.${remotename}.categories"));
 chomp(@tracked_categories);
 
+# Just like @tracked_categories, but for MediaWiki namespaces.
+my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces"));
+chomp(@tracked_namespaces);
+
 # Import media files on pull
 my $import_media = run_git("config --get --bool remote.${remotename}.mediaimport");
 chomp($import_media);
@@ -256,6 +260,23 @@ sub get_mw_tracked_categories {
return;
 }
 
+sub get_mw_tracked_namespaces {
+my $pages = shift;
+foreach my $local_namespace (@tracked_namespaces) {
+my $mw_pages = $mediawiki->list( {
+action => 'query',
+list => 'allpages',
+apnamespace => get_mw_namespace_id($local_namespace),
+aplimit => 'max' } )
+|| die $mediawiki->{error}->{code} . ': '
+. $mediawiki->{error}->{details} . "\n";
+foreach my $page (@{$mw_pages}) {
+$pages->{$page->{title}} = $page;
+}
+}
+return;
+}
+
 sub get_mw_all_pages {
my $pages = shift;
# No user-provided list, get the list of pages from the API.
@@ -319,6 +340,10 @@ sub get_mw_pages {
$user_defined = 1;
get_mw_tracked_categories(\%pages);
}
+   if (@tracked_namespaces) {
+   $user_defined = 1;
+   get_mw_tracked_namespaces(\%pages);
+   }
if (!$user_defined) {
get_mw_all_pages(\%pages);
}
-- 
2.11.0
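
The accumulation step in `get_mw_tracked_namespaces` can be illustrated without a wiki. In this sketch `add_pages` is a hypothetical helper and the page lists are made up; the point is that keying the shared hash by title means a page seen twice is stored once.

```perl
use strict;
use warnings;

# Hypothetical helper: fold one namespace's page list into the shared
# hash, keyed by title, as the inner foreach loop in the patch does.
sub add_pages {
    my ($pages, $mw_pages) = @_;
    foreach my $page (@{$mw_pages}) {
        $pages->{ $page->{title} } = $page;
    }
    return;
}

my %pages;
add_pages(\%pages, [ { title => 'Talk:Foo' }, { title => 'Talk:Bar' } ]);
add_pages(\%pages, [ { title => 'Help:Baz' }, { title => 'Talk:Foo' } ]);
print join(', ', sort keys %pages), "\n";
```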



Re: [PATCH v4 2/7] remote-mediawiki: allow fetching namespaces with spaces

2017-11-07 Thread Antoine Beaupré
On 2017-11-07 07:08:08, Thomas Adam wrote:
> On Mon, Nov 06, 2017 at 04:19:48PM -0500, Antoine Beaupré wrote:
>> From: Ingo Ruhnke <grum...@gmail.com>
>> 
>> we still want to use spaces as separators in the config, but we should
>> allow the user to specify namespaces with spaces, so we use underscore
>> for this.
>> 
>> Reviewed-by: Antoine Beaupré <anar...@debian.org>
>> Signed-off-by: Antoine Beaupré <anar...@debian.org>
>> ---
>>  contrib/mw-to-git/git-remote-mediawiki.perl | 1 +
>>  1 file changed, 1 insertion(+)
>> 
>> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
>> index 5ffb57595..a1d783789 100755
>> --- a/contrib/mw-to-git/git-remote-mediawiki.perl
>> +++ b/contrib/mw-to-git/git-remote-mediawiki.perl
>> @@ -65,6 +65,7 @@ chomp(@tracked_categories);
>>  
>>  # Just like @tracked_categories, but for MediaWiki namespaces.
>>  my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces"));
>> +for (@tracked_namespaces) { s/_/ /g; }
>>  chomp(@tracked_namespaces);
>
> Depending on the number of namespaces returned, it might be easier to convert
> this to the following:
>
> my @tracked_namespaces = map {
>   chomp; s/_/ /g; $_;
> } split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces"));
>
> This would, once again, avoid creating @tracked_namespaces, and iterating over
> it.
>
> Note that this isn't about trying to 'golf' this; it's a performance
> consideration.

I'm not sure it's worth it. Mediawiki has only about 10 default
namespaces, and the user needs to specify them by hand here. I wouldn't
be concerned about the performance.

A.

-- 
Education is the most powerful weapon which we can use to change the
world.
   - Nelson Mandela


Re: future of the mediawiki extension?

2017-11-06 Thread Antoine Beaupré
On 2017-11-07 09:44:03, Junio C Hamano wrote:
> Antoine Beaupré <anar...@debian.org> writes:
>
>> On 2017-10-31 10:37:29, Junio C Hamano wrote:
>>>> There's also a hybrid solution used by git-multimail: have a copy of the
>>>> code in git.git, but do the development separately. I'm not sure it'd be
>>>> a good idea for Git-Mediawiki, but I'm mentioning it for completeness.
>>>
>>> I think the plan was to make code drop from time to time at major
>>> release points of git-multimail, but I do not think we've seen many
>>> updates recently.
>>
>> I'd be okay with a hybrid as well. It would require minimal work on
>> Git's side at this stage: things can just stay as is until there's a new
>> "release" of the mediawiki extension and at that point you can decide if
>> you merge it all in or if you drop it in favor of the contrib.
>>
>> I think it's also fine to punt it completely out to the community.
>>
>> Either way, I may have time to do some of that work in the coming month,
>> so let me know what you prefer, I guess you two have the last word
>> here. The community, on Mediawiki's side, seem to mostly favor GitHub.
>
> I guess I shouldn't leave this thread hanging.
>
> As contrib/README says, the "owners" of an area in contrib/ have the
> ultimate say and control over the area, and for contrib/mw-to-git,
> the "owners" have always been Matthieu, at least to me.
>
> As he made it clear earlier in this thread that (1) he sees you as a
> steady hand that can help guide the tool forward as its new "owner",
> and (2) he thinks Git-Mediawiki will be helped by being an
> independent project hosted at GitHub, now you have the say ;-)
>
> A few topics from you that are already on list may want to go
> through to 'master' as any other topics, but from there on, I am
> fine with the development of Git-Mediawiki primarily done as a
> separate project, optionally giving contrib/mw-to-git/ occasional
> update dumps.  You could even choose to remove contrib/mw-to-git/*
> except for git-remote-mediawiki.txt that says that the tool's main
> development effort happens at GitHub to redirect people, if you
> think that would reduce potential confusion.
>
> I am also OK to serve as a patch monkey and keep going; I won't be
> picking up patches to contrib/mw-to-git/ unless you (and others)
> review them, though.

Makes sense. I think that, for now, I'll keep some sort of status quo
and "copy" (as opposed to "move") development over to GitHub. We can
then make dumps when new releases are done over there. If that proves
impractical because of changes in the build system or some other reason,
I'll send patches to clear the code from core and replace it with the
suggested .txt file.

Thanks!

A.

-- 
Like slavery and apartheid, poverty is not natural. It is man-made and
it can be overcome and eradicated by the actions of human
beings. Overcoming poverty is not a gesture of charity. It is an act
of justice. - Nelson Mandela


[PATCH v4 0/7] remote-mediawiki: namespace support

2017-11-06 Thread Antoine Beaupré
Hopefully, the final series. This includes only one more fix, from
Thomas, to remove an extra loop.

This should, at last, be ready to merge.



[PATCH v4 2/7] remote-mediawiki: allow fetching namespaces with spaces

2017-11-06 Thread Antoine Beaupré
From: Ingo Ruhnke <grum...@gmail.com>

we still want to use spaces as separators in the config, but we should
allow the user to specify namespaces with spaces, so we use underscore
for this.

Reviewed-by: Antoine Beaupré <anar...@debian.org>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index 5ffb57595..a1d783789 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -65,6 +65,7 @@ chomp(@tracked_categories);
 
 # Just like @tracked_categories, but for MediaWiki namespaces.
 my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces"));
+for (@tracked_namespaces) { s/_/ /g; }
 chomp(@tracked_namespaces);
 
 # Import media files on pull
-- 
2.11.0



[PATCH v4 7/7] remote-mediawiki: show progress while fetching namespaces

2017-11-06 Thread Antoine Beaupré
Without this, the fetch process appears to hang while we fetch page
listings across the namespaces. Obviously, it should be possible to
silence this with -q, but that's an issue already present everywhere
in the code and should be fixed separately:

https://github.com/Git-Mediawiki/Git-Mediawiki/issues/30

Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index 7dccb44e0..fcdc29197 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -279,6 +279,7 @@ sub get_mw_tracked_namespaces {
 aplimit => 'max' } )
 || die $mediawiki->{error}->{code} . ': '
 . $mediawiki->{error}->{details} . "\n";
+print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n";
 foreach my $page (@{$mw_pages}) {
 $pages->{$page->{title}} = $page;
 }
-- 
2.11.0



[PATCH v4 3/7] remote-mediawiki: show known namespace choices on failure

2017-11-06 Thread Antoine Beaupré
If we fail to find a requested namespace, we should tell the user
which ones we know about, since those were already fetched. This
allows users to fetch all namespaces by specifying a dummy namespace,
failing, then copying the list of namespaces in the config.

Eventually, we should have a flag that allows fetching all namespaces
automatically.

Reviewed-by: Antoine Beaupré <anar...@debian.org>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index a1d783789..6364d4e91 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -1334,7 +1334,8 @@ sub get_mw_namespace_id {
my $id;
 
if (!defined $ns) {
-   print {*STDERR} "No such namespace ${name} on MediaWiki.\n";
+   my @namespaces = map { s/ /_/g; $_; } sort keys %namespaces_id;
+   print {*STDERR} "No such namespace ${name} on MediaWiki, known namespaces: @namespaces\n";
$ns = {is_namespace => 0};
$namespace_id{$name} = $ns;
}
-- 
2.11.0



[PATCH v4 4/7] remote-mediawiki: skip virtual namespaces

2017-11-06 Thread Antoine Beaupré
Virtual namespaces do not correspond to pages in the database and are
automatically generated by MediaWiki. It makes little sense,
therefore, to fetch pages from those namespaces and the MW API doesn't
support listing those pages.

According to the documentation, those virtual namespaces are currently
"Special" (-1) and "Media" (-2) but we treat all negative namespaces
as "virtual" as a future-proofing mechanism.

Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index 6364d4e91..7f483180f 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -264,10 +264,13 @@ sub get_mw_tracked_categories {
 sub get_mw_tracked_namespaces {
 my $pages = shift;
 foreach my $local_namespace (@tracked_namespaces) {
+my $namespace_id = get_mw_namespace_id($local_namespace);
+# virtual namespaces don't support allpages
+next if !defined($namespace_id) || $namespace_id < 0;
 my $mw_pages = $mediawiki->list( {
 action => 'query',
 list => 'allpages',
-apnamespace => get_mw_namespace_id($local_namespace),
+apnamespace => $namespace_id,
 aplimit => 'max' } )
 || die $mediawiki->{error}->{code} . ': '
 . $mediawiki->{error}->{details} . "\n";
-- 
2.11.0



[PATCH v4 6/7] remote-mediawiki: process namespaces in order

2017-11-06 Thread Antoine Beaupré
Ideally, we'd process them in numeric order since that is more
logical, but we can't do that yet since this is where we find the
numeric identifiers in the first place. Lexicographic order is a good
compromise.

Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index 7a0824f31..7dccb44e0 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -263,7 +263,7 @@ sub get_mw_tracked_categories {
 
 sub get_mw_tracked_namespaces {
 my $pages = shift;
-foreach my $local_namespace (@tracked_namespaces) {
+foreach my $local_namespace (sort @tracked_namespaces) {
 my $namespace_id;
 if ($local_namespace eq "(Main)") {
 $namespace_id = 0;
-- 
2.11.0



[PATCH v4 5/7] remote-mediawiki: support fetching from (Main) namespace

2017-11-06 Thread Antoine Beaupré
When we specify a list of namespaces to fetch from, by default the MW
API will not fetch from the default namespace, referred to as "(Main)"
in the documentation:

https://www.mediawiki.org/wiki/Manual:Namespace#Built-in_namespaces

I haven't found a way to address that "(Main)" namespace when getting
the namespace ids: indeed, when listing namespaces, there is no
"canonical" field for the main namespace, although there is a "*"
field that is set to "" (empty). So in theory, we could specify the
empty namespace to get the main namespace, but that would make
specifying namespaces harder for the user: we would need to teach
users about the "empty" default namespace. It would also make the code
more complicated: we'd need to parse quotes in the configuration.

So we simply override the query here and allow the user to specify
"(Main)" since that is the publicly documented name.

Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index 7f483180f..7a0824f31 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -264,7 +264,12 @@ sub get_mw_tracked_categories {
 sub get_mw_tracked_namespaces {
 my $pages = shift;
 foreach my $local_namespace (@tracked_namespaces) {
-my $namespace_id = get_mw_namespace_id($local_namespace);
+my $namespace_id;
+if ($local_namespace eq "(Main)") {
+$namespace_id = 0;
+} else {
+$namespace_id = get_mw_namespace_id($local_namespace);
+}
 # virtual namespaces don't support allpages
 next if !defined($namespace_id) || $namespace_id < 0;
 my $mw_pages = $mediawiki->list( {
-- 
2.11.0



[PATCH v4 1/7] remote-mediawiki: add namespace support

2017-11-06 Thread Antoine Beaupré
From: Kevin <ke...@ki-ai.org>

This introduces a new remote.origin.namespaces argument that is a
space-separated list of namespaces. The list of pages to extract is then
taken from all the specified namespaces.

Reviewed-by: Antoine Beaupré <anar...@debian.org>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index e7f857c1a..5ffb57595 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -63,6 +63,10 @@ chomp(@tracked_pages);
 my @tracked_categories = split(/[ \n]/, run_git("config --get-all remote.${remotename}.categories"));
 chomp(@tracked_categories);
 
+# Just like @tracked_categories, but for MediaWiki namespaces.
+my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces"));
+chomp(@tracked_namespaces);
+
 # Import media files on pull
 my $import_media = run_git("config --get --bool remote.${remotename}.mediaimport");
 chomp($import_media);
@@ -256,6 +260,23 @@ sub get_mw_tracked_categories {
return;
 }
 
+sub get_mw_tracked_namespaces {
+my $pages = shift;
+foreach my $local_namespace (@tracked_namespaces) {
+my $mw_pages = $mediawiki->list( {
+action => 'query',
+list => 'allpages',
+apnamespace => get_mw_namespace_id($local_namespace),
+aplimit => 'max' } )
+|| die $mediawiki->{error}->{code} . ': '
+. $mediawiki->{error}->{details} . "\n";
+foreach my $page (@{$mw_pages}) {
+$pages->{$page->{title}} = $page;
+}
+}
+return;
+}
+
 sub get_mw_all_pages {
my $pages = shift;
# No user-provided list, get the list of pages from the API.
@@ -319,6 +340,10 @@ sub get_mw_pages {
$user_defined = 1;
get_mw_tracked_categories(\%pages);
}
+   if (@tracked_namespaces) {
+   $user_defined = 1;
+   get_mw_tracked_namespaces(\%pages);
+   }
if (!$user_defined) {
get_mw_all_pages(\%pages);
}
-- 
2.11.0



Re: [PATCH v3 7/7] remote-mediawiki: show progress while fetching namespaces

2017-11-02 Thread Antoine Beaupré
On 2017-11-02 22:31:02, Thomas Adam wrote:
> On Thu, Nov 02, 2017 at 06:26:43PM -0400, Antoine Beaupré wrote:
>> On 2017-11-02 22:18:07, Thomas Adam wrote:
>> > Hi,
>> >
>> > On Thu, Nov 02, 2017 at 05:25:18PM -0400, Antoine Beaupré wrote:
>> >> +print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n";
>> >
>> > How is this any different to using warn()?  I appreciate you're using a
>> > globbed filehandle, but it seems superfluous to me.
>> 
>> It's what is used everywhere in the module, I'm just tagging along.
>> 
>> This was discussed before: there's an issue about cleaning up the
>> messaging in that module, that can be fixed separately.
>
> Understood.  That should happen sooner rather than later.

Actually, is there a standard way to do this in git with Perl
extensions? I know about "option verbosity N" but how should I translate
this into Perl? Carp? Warn? Log::Any? Log4perl?

Recommendations welcome...

A.

-- 
Si Dieu existe, j'espère qu'Il a une excuse valable
- Daniel Pennac
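
On the `warn()` versus `print {*STDERR}` question raised above, the observable differences in plain Perl can be sketched as follows. This only shows core Perl behaviour and does not claim to be a Git convention: `warn` routes its message through `$SIG{__WARN__}` when a handler is installed, and it appends the file name and line number when the message lacks a trailing newline, while a direct print to STDERR does neither. The message text is made up.

```perl
use strict;
use warnings;

# Collect whatever warn() emits instead of letting it reach STDERR.
my @captured;
local $SIG{__WARN__} = sub { push @captured, $_[0] };

warn "3 found in namespace Talk (1)\n";   # handler receives it verbatim
warn "no trailing newline";               # Perl appends " at FILE line N.\n"
print {*STDERR} "direct write, bypasses the handler\n";

print scalar(@captured), " messages captured\n";
```

A single handler of this kind is one possible place to honour a verbosity setting, although whether that suits this module is an open question in the thread.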


Re: [PATCH v3 4/7] remote-mediawiki: skip virtual namespaces

2017-11-02 Thread Antoine Beaupré
On 2017-11-02 18:43:00, Eric Sunshine wrote:
> On Thu, Nov 2, 2017 at 5:25 PM, Antoine Beaupré <anar...@debian.org> wrote:
>> Virtual namespaces do not correspond to pages in the database and are
>> automatically generated by MediaWiki. It makes little sense,
>> therefore, to fetch pages from those namespaces and the MW API doesn't
>> support listing those pages.
>>
>> According to the documentation, those virtual namespaces are currently
>> "Special" (-1) and "Media" (-2) but we treat all negative namespaces
>> as "virtual" as a future-proofing mechanism.
>>
>> Reviewed-by: Eric Sunshine <sunsh...@sunshineco.com>
>
> It probably would be best to omit this Reviewed-by: since it was not
> provided explicitly. More importantly, I'm neither a user of nor
> familiar with MediaWiki or its API, so a Reviewed-by: from me has
> little or no value. Probably best would be for someone such as
> Matthieu to give his Reviewed-by: if he so desires.

Alright, I was wondering what the process was for those. I didn't want
to leave your contributions by the wayside...

I'll wait a little while longer for more feedback and then resend
without those. unless...

@junio: my github repo has the branch without those Reviewed-by tags,
iirc. so if you can merge from there, that will keep me from sending
yet another pile of patches for such a trivial change...

a.

-- 
Semantics is the gravity of abstraction.


Re: [PATCH v3 7/7] remote-mediawiki: show progress while fetching namespaces

2017-11-02 Thread Antoine Beaupré
On 2017-11-02 22:18:07, Thomas Adam wrote:
> Hi,
>
> On Thu, Nov 02, 2017 at 05:25:18PM -0400, Antoine Beaupré wrote:
>> +print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n";
>
> How is this any different to using warn()?  I appreciate you're using a
> globbed filehandle, but it seems superfluous to me.

It's what is used everywhere in the module, I'm just tagging along.

This was discussed before: there's an issue about cleaning up the
messaging in that module, that can be fixed separately.

A.
-- 
To love only one is barbarism, for it is at the expense of all the
others. Even the love of God.
- Nietzsche, "Beyond Good and Evil"


[PATCH v3 1/7] remote-mediawiki: add namespace support

2017-11-02 Thread Antoine Beaupré
From: Kevin <ke...@ki-ai.org>

This introduces a new remote.origin.namespaces argument that is a
space-separated list of namespaces. The list of pages extracted is then
taken from all the specified namespaces.

Reviewed-by: Antoine Beaupré <anar...@debian.org>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index e7f857c1a..5ffb57595 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -63,6 +63,10 @@ chomp(@tracked_pages);
 my @tracked_categories = split(/[ \n]/, run_git("config --get-all remote.${remotename}.categories"));
 chomp(@tracked_categories);
 
+# Just like @tracked_categories, but for MediaWiki namespaces.
+my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces"));
+chomp(@tracked_namespaces);
+
 # Import media files on pull
 my $import_media = run_git("config --get --bool remote.${remotename}.mediaimport");
 chomp($import_media);
@@ -256,6 +260,23 @@ sub get_mw_tracked_categories {
return;
 }
 
+sub get_mw_tracked_namespaces {
+my $pages = shift;
+foreach my $local_namespace (@tracked_namespaces) {
+my $mw_pages = $mediawiki->list( {
+action => 'query',
+list => 'allpages',
+apnamespace => get_mw_namespace_id($local_namespace),
+aplimit => 'max' } )
+|| die $mediawiki->{error}->{code} . ': '
+. $mediawiki->{error}->{details} . "\n";
+foreach my $page (@{$mw_pages}) {
+$pages->{$page->{title}} = $page;
+}
+}
+return;
+}
+
 sub get_mw_all_pages {
my $pages = shift;
# No user-provided list, get the list of pages from the API.
@@ -319,6 +340,10 @@ sub get_mw_pages {
$user_defined = 1;
get_mw_tracked_categories(\%pages);
}
+   if (@tracked_namespaces) {
+   $user_defined = 1;
+   get_mw_tracked_namespaces(\%pages);
+   }
if (!$user_defined) {
get_mw_all_pages(\%pages);
}
-- 
2.11.0



[PATCH v3 4/7] remote-mediawiki: skip virtual namespaces

2017-11-02 Thread Antoine Beaupré
Virtual namespaces do not correspond to pages in the database and are
automatically generated by MediaWiki. It makes little sense,
therefore, to fetch pages from those namespaces and the MW API doesn't
support listing those pages.

According to the documentation, those virtual namespaces are currently
"Special" (-1) and "Media" (-2) but we treat all negative namespaces
as "virtual" as a future-proofing mechanism.

Reviewed-by: Eric Sunshine <sunsh...@sunshineco.com>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index e7616e1a2..21fb2e302 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -264,10 +264,13 @@ sub get_mw_tracked_categories {
 sub get_mw_tracked_namespaces {
 my $pages = shift;
 foreach my $local_namespace (@tracked_namespaces) {
+my $namespace_id = get_mw_namespace_id($local_namespace);
+# virtual namespaces don't support allpages
+next if !defined($namespace_id) || $namespace_id < 0;
 my $mw_pages = $mediawiki->list( {
 action => 'query',
 list => 'allpages',
-apnamespace => get_mw_namespace_id($local_namespace),
+apnamespace => $namespace_id,
 aplimit => 'max' } )
 || die $mediawiki->{error}->{code} . ': '
 . $mediawiki->{error}->{details} . "\n";
-- 
2.11.0



[PATCH v3 0/7] remote-mediawiki: namespace support

2017-11-02 Thread Antoine Beaupré
This should be the final roll of patches for namespace support. I
included the undef check even though that problem occurs elsewhere in
the code. I also removed the needless "my" move.

Hopefully that should be the last in the queue!



[PATCH v3 2/7] remote-mediawiki: allow fetching namespaces with spaces

2017-11-02 Thread Antoine Beaupré
From: Ingo Ruhnke <grum...@gmail.com>

we still want to use spaces as separators in the config, but we should
allow the user to specify namespaces with spaces, so we use underscore
for this.

Reviewed-by: Antoine Beaupré <anar...@debian.org>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index 5ffb57595..a1d783789 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -65,6 +65,7 @@ chomp(@tracked_categories);
 
 # Just like @tracked_categories, but for MediaWiki namespaces.
 my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces"));
+for (@tracked_namespaces) { s/_/ /g; }
 chomp(@tracked_namespaces);
 
 # Import media files on pull
-- 
2.11.0
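
The underscore convention described in this patch can be sketched in
isolation. This is only an illustration under stated assumptions: the
config value and namespace names below are hypothetical, and the string
stands in for what "git config --get-all remote.origin.namespaces"
might return.

```perl
use strict;
use warnings;

# Hypothetical config value; the namespace names are illustrative.
my $config_value = "Talk User_talk Help\n";

# Spaces (and newlines) separate entries in the config.
my @tracked_namespaces = split(/[ \n]/, $config_value);

# Underscores in the config stand for spaces in the MediaWiki name.
for (@tracked_namespaces) { s/_/ /g; }

print join('|', @tracked_namespaces), "\n";
```

With this input the list holds three entries, the second being
"User talk".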



[PATCH v3 3/7] remote-mediawiki: show known namespace choices on failure

2017-11-02 Thread Antoine Beaupré
If we fail to find a requested namespace, we should tell the user
which ones we know about, since those were already fetched. This
allows users to fetch all namespaces by specifying a dummy namespace,
failing, then copying the list of namespaces in the config.

Eventually, we should have a flag that allows fetching all namespaces
automatically.

Reviewed-by: Antoine Beaupré <anar...@debian.org>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index a1d783789..e7616e1a2 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -1334,7 +1334,9 @@ sub get_mw_namespace_id {
my $id;
 
if (!defined $ns) {
-   print {*STDERR} "No such namespace ${name} on MediaWiki.\n";
+   my @namespaces = sort keys %namespace_id;
+   for (@namespaces) { s/ /_/g; }
+   print {*STDERR} "No such namespace ${name} on MediaWiki, known namespaces: @namespaces\n";
$ns = {is_namespace => 0};
$namespace_id{$name} = $ns;
}
-- 
2.11.0
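
As a rough sketch of the message this patch builds, assuming a small
made-up cache: the contents of %namespace_id and the requested name
below are hypothetical (in the module the cache is filled from the
wiki and its values are not plain numbers).

```perl
use strict;
use warnings;

# Hypothetical cache contents, for illustration only.
my %namespace_id = ( 'Talk' => 1, 'User talk' => 3, 'Help' => 12 );
my $name = 'Bogus';

my @namespaces = sort keys %namespace_id;
# Show names in the underscore form that the config expects.
for (@namespaces) { s/ /_/g; }

my $message = "No such namespace ${name} on MediaWiki, known namespaces: @namespaces\n";
print {*STDERR} $message;
```

The names are printed in the same form a user would paste back into
remote.origin.namespaces.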



[PATCH v3 6/7] remote-mediawiki: process namespaces in order

2017-11-02 Thread Antoine Beaupré
Ideally, we'd process them in numeric order since that is more
logical, but we can't do that yet since this is where we find the
numeric identifiers in the first place. Lexicographic order is a good
compromise.

Reviewed-by: Eric Sunshine <sunsh...@sunshineco.com>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index 898541a9f..f53e638cf 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -263,7 +263,7 @@ sub get_mw_tracked_categories {
 
 sub get_mw_tracked_namespaces {
 my $pages = shift;
-foreach my $local_namespace (@tracked_namespaces) {
+foreach my $local_namespace (sort @tracked_namespaces) {
 my $namespace_id;
 if ($local_namespace eq "(Main)") {
 $namespace_id = 0;
-- 
2.11.0



[PATCH v3 7/7] remote-mediawiki: show progress while fetching namespaces

2017-11-02 Thread Antoine Beaupré
Without this, the fetch process seems hung while we fetch page
listings across the namespaces. Obviously, it should be possible to
silence this with -q, but that's an issue already present everywhere
in the code and should be fixed separately:

https://github.com/Git-Mediawiki/Git-Mediawiki/issues/30

Reviewed-by: Eric Sunshine <sunsh...@sunshineco.com>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index f53e638cf..dc43a950b 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -279,6 +279,7 @@ sub get_mw_tracked_namespaces {
 aplimit => 'max' } )
 || die $mediawiki->{error}->{code} . ': '
 . $mediawiki->{error}->{details} . "\n";
+print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n";
 foreach my $page (@{$mw_pages}) {
 $pages->{$page->{title}} = $page;
 }
-- 
2.11.0



[PATCH v3 5/7] remote-mediawiki: support fetching from (Main) namespace

2017-11-02 Thread Antoine Beaupré
When we specify a list of namespaces to fetch from, by default the MW
API will not fetch from the default namespace, referred to as "(Main)"
in the documentation:

https://www.mediawiki.org/wiki/Manual:Namespace#Built-in_namespaces

I haven't found a way to address that "(Main)" namespace when getting
the namespace ids: indeed, when listing namespaces, there is no
"canonical" field for the main namespace, although there is a "*"
field that is set to "" (empty). So in theory, we could specify the
empty namespace to get the main namespace, but that would make
specifying namespaces harder for the user: we would need to teach
users about the "empty" default namespace. It would also make the code
more complicated: we'd need to parse quotes in the configuration.

So we simply override the query here and allow the user to specify
"(Main)" since that is the publicly documented name.

Reviewed-by: Eric Sunshine <sunsh...@sunshineco.com>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index 21fb2e302..898541a9f 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -264,7 +264,12 @@ sub get_mw_tracked_categories {
 sub get_mw_tracked_namespaces {
 my $pages = shift;
 foreach my $local_namespace (@tracked_namespaces) {
-my $namespace_id = get_mw_namespace_id($local_namespace);
+my $namespace_id;
+if ($local_namespace eq "(Main)") {
+$namespace_id = 0;
+} else {
+$namespace_id = get_mw_namespace_id($local_namespace);
+}
 # virtual namespaces don't support allpages
 next if !defined($namespace_id) || $namespace_id < 0;
 my $mw_pages = $mediawiki->list( {
-- 
2.11.0



Re: [PATCH 4/7] remote-mediawiki: skip virtual namespaces

2017-11-02 Thread Antoine Beaupré
On 2017-11-02 10:24:40, Junio C Hamano wrote:
> Antoine Beaupré <anar...@debian.org> writes:
>
>> It might still be worth fixing this, but I'm not sure what the process is
>> here - in the latest "what's cooking" Junio said this patchset would be
>> merged in "next". Should I reroll the patchset to fix this or not?
>
> The process is for you (the contributor of the topic) to yell at me,
> "don't merge it yet, there still are updates to come".

YELL! "don't merge it yet, there still are updates to come". :)

> That message _may_ come too late, in which case we may have to go
> incremental, but I usually try to leave at least a few days between
> the time I mark a topic as "will merge" and the time I actually do
> the merge, for this exact reason.

Awesome, thanks for the update.

i'll roll a v4 with the last tweaks, hopefully that will be the last.

a.

-- 
How inappropriate to call this planet 'Earth' when it is quite clearly
'Ocean'.
- Arthur C. Clarke


Re: [PATCH 5/7] remote-mediawiki: support fetching from (Main) namespace

2017-11-02 Thread Antoine Beaupré
On 2017-11-01 15:56:51, Eric Sunshine wrote:
>> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
>> @@ -264,9 +264,14 @@ sub get_mw_tracked_categories {
>>  sub get_mw_tracked_namespaces {
>>  my $pages = shift;
>>  foreach my $local_namespace (@tracked_namespaces) {
>> -my $namespace_id = get_mw_namespace_id($local_namespace);
>> +my ($namespace_id, $mw_pages);
>> +if ($local_namespace eq "(Main)") {
>> +$namespace_id = 0;
>> +} else {
>> +$namespace_id = get_mw_namespace_id($local_namespace);
>> +}
>
> I meant to ask this in the previous round, but with the earlier patch
> mixing several distinct changes into one, I plumb forgot: Would it
> make sense to move this "(Main)" special case into
> get_mw_namespace_id() itself? After all, that function is all about
> determining an ID associated with a name, and "(Main)" is a name.

Right. At first sight, I agree: get_mw_namespace_id should do the right
thing. But then, I look at the code of that function, and it strikes me
as ... well... really hard to actually do this the right way.

In fact, I suspect that passing "" to get_mw_namespace_id would actually
do the right thing. The problem, as I explained before, is that passing
that in the configuration is pretty hard: it would needlessly complicate
the configuration setting, so I think it's a fair shortcut to do it
here.
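
For what it's worth, here is a minimal sketch of that shortcut pulled
out into a helper, so the "(Main)" case sits in one place without
touching get_mw_namespace_id itself. It is only a sketch under stated
assumptions: resolve_namespace_id is a hypothetical name, and the
lookup is stubbed with made-up ids instead of querying a wiki.

```perl
use strict;
use warnings;

# Stub standing in for the real get_mw_namespace_id(), which queries the
# wiki; the names and ids here are made up for the example.
my %fake_ids = ( 'Talk' => 1, 'Special' => -1 );
sub get_mw_namespace_id {
    my $name = shift;
    return $fake_ids{$name};    # undef when the name is unknown
}

# Hypothetical helper: resolve "(Main)" locally, defer everything else.
sub resolve_namespace_id {
    my $name = shift;
    return 0 if $name eq '(Main)';
    return get_mw_namespace_id($name);
}

for my $name ('(Main)', 'Talk', 'Special', 'Nope') {
    my $id = resolve_namespace_id($name);
    printf "%s => %s\n", $name, defined $id ? $id : 'undef';
}
```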

>>  next if $namespace_id < 0; # virtual namespaces don't support allpages
>> -my $mw_pages = $mediawiki->list( {
>> +$mw_pages = $mediawiki->list( {
>
> Why did the "my" of $mw_pages get moved up to the top of the foreach
> loop? I can't seem to see any reason for it. Is this an unrelated
> change accidentally included in this patch?

Just a habit of declaring variables at the beginning of a block. Maybe
it's because I'm old? :)

I'll reroll a last patchset with those fixes.

A.

-- 
One of the strongest motives that leads men to art and science is
escape from everyday life with its painful crudity and hopeless
dreariness. Such men make this cosmos and its construction the pivot
of their emotional life, in order to find the peace and security which
they cannot find in the narrow whirlpool of personal experience.
   - Albert Einstein


Re: [PATCH 4/7] remote-mediawiki: skip virtual namespaces

2017-11-01 Thread Antoine Beaupré
On 2017-11-01 09:52:09, Eric Sunshine wrote:
> On Sun, Oct 29, 2017 at 10:51 PM, Antoine Beaupré <anar...@debian.org> wrote:
>> Virtual namespaces do not correspond to pages in the database and are
>> automatically generated by MediaWiki. It makes little sense,
>> therefore, to fetch pages from those namespaces and the MW API doesn't
>> support listing those pages.
>>
>> According to the documentation, those virtual namespaces are currently
>> "Special" (-1) and "Media" (-2) but we treat all negative namespaces
>> as "virtual" as a future-proofing mechanism.
>
> This patch makes more sense now with the additional commentary.
> Thanks. More below.
>
>> Signed-off-by: Antoine Beaupré <anar...@debian.org>
>> ---
>> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
>> index e7616e1a2..5c85e64b6 100755
>> --- a/contrib/mw-to-git/git-remote-mediawiki.perl
>> +++ b/contrib/mw-to-git/git-remote-mediawiki.perl
>> @@ -264,10 +264,12 @@ sub get_mw_tracked_categories {
>>  sub get_mw_tracked_namespaces {
>>  my $pages = shift;
>>  foreach my $local_namespace (@tracked_namespaces) {
>> +my $namespace_id = get_mw_namespace_id($local_namespace);
>> +next if $namespace_id < 0; # virtual namespaces don't support allpages
>
> Since (it appears) that get_mw_namespace_id() can return undef, you
> probably still need to take that into account before performing a
> numeric comparison:
>
> next if !$namespace_id || $namespace_id < 0;

I would argue that this bug exists already elsewhere in the code - no
error handling exists there... Furthermore, it should be !defined()
because it can be 0.
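
To make the difference concrete, a minimal sketch with made-up ids
(0 standing for the main namespace, a negative value for a virtual
one, undef for an unknown name):

```perl
use strict;
use warnings;

# Candidate ids: main namespace (0), a regular one, a virtual one, unknown.
my @ids = (0, 4, -1, undef);

# Keep ids that are defined and non-negative; 0 must survive.
my @with_defined = grep { defined($_) && $_ >= 0 } @ids;

# A plain truthiness test also drops 0, i.e. the main namespace.
my @with_truthiness = grep { $_ && $_ >= 0 } @ids;

print "defined check keeps: @with_defined\n";
print "truthiness check keeps: @with_truthiness\n";
```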

It might still be worth fixing this, but I'm not sure what the process is
here - in the latest "what's cooking" Junio said this patchset would be
merged in "next". Should I reroll the patchset to fix this or not?

A.

-- 
To love only one is barbarism, for it is at the expense of all the
others. Even the love of God.
- Nietzsche, "Beyond Good and Evil"


Re: future of the mediawiki extension?

2017-10-30 Thread Antoine Beaupré
On 2017-10-31 10:37:29, Junio C Hamano wrote:
>> There's also a hybrid solution used by git-multimail: have a copy of the
>> code in git.git, but do the development separately. I'm not sure it'd be
>> a good idea for Git-Mediawiki, but I'm mentioning it for completeness.
>
> I think the plan was to make code drop from time to time at major
> release points of git-multimail, but I do not think we've seen many
> updates recently.

I'd be okay with a hybrid as well. It would require minimal work on
Git's side at this stage: things can just stay as is until there's a new
"release" of the mediawiki extension and at that point you can decide if
you merge it all in or if you drop it in favor of the contrib.

I think it's also fine to punt it completely out to the community.

Either way, I may have time to do some of that work in the coming month,
so let me know what you prefer, I guess you two have the last word
here. The community, on Mediawiki's side, seems to mostly favor GitHub.

A.

-- 
Never attribute to malice that which can be adequately explained by
stupidity, but don't rule out malice.
 - Albert Einstein


Re: [PATCH v2] remote-mediawiki: limit filenames to legal

2017-10-30 Thread Antoine Beaupré
On 2017-10-30 11:34:11, Matthieu Moy wrote:
> Antoine Beaupré <anar...@debian.org> writes:
>
>> @@ -52,7 +53,7 @@ sub smudge_filename {
>>  $filename =~ s/ /_/g;
>>  # Decode forbidden characters encoded in clean_filename
>>  $filename =~ s/_%_([0-9a-fA-F][0-9a-fA-F])/sprintf('%c', hex($1))/ge;
>> -return $filename;
>> +return substr($filename, 0, NAME_MAX-3);
>
> There's a request to allow a configurable extension (.mediawiki would
> help importing in some wikis, see
> https://github.com/Git-Mediawiki/Git-Mediawiki/issues/42). You should at
> least make this something like length(".mw") so that the next search
> for ".mw" finds this.

I believe I did that in v3.
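
Roughly, the idea is the following. This is a sketch under stated
assumptions, not the v3 code: NAME_MAX is hard-coded to an assumed 255
here (the real limit depends on the filesystem), and
truncate_for_extension is a hypothetical helper name.

```perl
use strict;
use warnings;

# Assumed limit for the example; the real value depends on the filesystem.
use constant NAME_MAX => 255;
my $extension = '.mw';

# Hypothetical helper: cut the name so name + extension fits in NAME_MAX.
sub truncate_for_extension {
    my ($filename, $ext) = @_;
    return substr($filename, 0, NAME_MAX - length($ext));
}

my $long = 'x' x 300;
my $name = truncate_for_extension($long, $extension) . $extension;
print length($name), "\n";
```

Deriving the margin from the extension string means a longer
extension such as ".mediawiki" would only need $extension changed.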

> Also, note that your solution works for using Git-Mediawiki in a
> read-only way, but if you start modifying and pushing such files, you'll
> get into trouble. It probably makes sense to issue a warning in such
> case.

True. I didn't consider that, but then again the patch is not a
regression: you couldn't have pushed those repos in the first place
anyways...

A.

-- 
The history of any one part of the earth, like the life of a soldier,
consists of long periods of boredom and short periods of terror.
   - British geologist Derek V. Ager


Re: future of the mediawiki extension?

2017-10-30 Thread Antoine Beaupré
On 2017-10-30 11:29:55, Matthieu Moy wrote:
>> It should also be mentioned that this contrib isn't very active: I'm not
>> part of the GitHub organization, yet I'm probably the one that's been
>> the most active with patches in the last year (and I wasn't very active
>> at all).
>
> FYI, I'm no longer using Mediawiki as much as I did, and I don't really
> use Git-Mediawiki anymore.
>
> The main blocking point to revive Git-Mediawiki is to find a new
> maintainer (https://github.com/Git-Mediawiki/Git-Mediawiki/issues/33). I
> believe I just found one ;-).

Eh. I assume you mean me here. As I hinted at in another thread, I am
not sure I can commit to leading the project - just scratching an
itch. But I may be able to review pull requests and make some releases
from time to time... I probably won't work on code or features I don't
need unless someone funds my work or something. ;)

We'll see where the community takes us, I guess... Always better to have
more than one maintainer, anyways, just for the bus factor... Worst
case, I'll delegate to a worthy successor. :)

A.

-- 
Your silence will not protect you.
- Audre Lorde


Re: [PATCH 0/4] WIP: git-remote-media wiki namespace support

2017-10-30 Thread Antoine Beaupré
On 2017-10-30 11:40:06, Matthieu Moy wrote:
> Antoine Beaupré <anar...@debian.org> writes:
>
>> Obviously, doing unit tests against a full MediaWiki instance isn't
>> exactly trivial.
>
> Not trivial, but doable: there is all the infrastructure to do so in t/:
> install-wiki.sh to automatically install Mediawiki, and then a testsuite
> that interacts with it.
>
> This has been written under the assumption that the developer had a
> lighttpd instance running on localhost, but this can probably be adapted
> to run on Travis-CI (install lighttpd & Mediawiki in the install: part,
> and run the tests afterwards), so that anyone can run the tests by just
> submitting a pull-request to Git-Mediawiki.
>
> If you are to work more on Git-Mediawiki, don't underestimate the
> usefulness of the testsuite (for example, Git-Mediawiki was developed
> against a prehistoric version of Mediawiki, the testsuite can help
> ensuring it still works on the latest version), nor the fun of playing
> with install scripts and CI systems ;-).

Hello!

Glad to hear from you. :)

So I actually tried install-wiki.sh, and it "failed to start lighttpd"
and told me to see logs. I couldn't find them and stopped there...

It would be great to hook this up into CI somewhere, but I suspect it
isn't, considering how it doesn't actually work out of the box.

I'm hoping we can still do things and fix some things without going
through that trouble, but I recognize it would be better to have unit
tests operational.

Honestly, I would prefer just having this thing work and not have to
work on it. :) I have lots of things on my plate and I'm just scratching
an itch on this one - some backup script broke and I am trying to fix
it. Once it works, my work is done, so unfortunately I cannot lead that
project (but I'd be happy to help when I can of course).

A.

-- 
The greatest tragedy in mankind's entire history may be the hijacking of
morality by religion.
- Arthur C. Clarke


Re: [PATCH 4/4] remote-mediawiki: allow using (Main) as a namespace and skip special namespaces

2017-10-30 Thread Antoine Beaupré
On 2017-10-29 23:52:16, Eric Sunshine wrote:
> On Sun, Oct 29, 2017 at 10:43 PM, Antoine Beaupré <anar...@debian.org> wrote:
>> On 2017-10-29 15:49:28, Eric Sunshine wrote:
>>> This may be problematic since get_mw_namespace_id() may return undef
>>> rather than a number, in which case Perl will complain.
>>
>> Actually, get_mw_namespace_id() doesn't seem like it can return undef -
>> did you mistake it with get_mw_namespace_id_for_page()?
>
> Hmm, no. What I see in the function is this:
>
> my $id;
> ...
> if ($ns->{is_namespace}) {
> $id = $ns->{id};
> }
> ...
> return $id;
>
> So, $id starts undefined and is assigned only conditionally before
> being returned, but perhaps I'm missing some subtlety.

Ah yes, you're probably right there.

-- 
During the initial stage of the struggle, the oppressed, instead of
striving for liberation, tend themselves to become oppressors. The very
structure of their thought has been conditioned by the contradictions of
the concrete, existential situation by which they were shaped. Their
ideal is to be men; but for them, to be men is to be oppressors. This is
their model of humanity.
- Paulo Freire, Pedagogy of the Oppressed


future of the mediawiki extension?

2017-10-29 Thread Antoine Beaupré
Hi,

First thanks for the excellent feedback regarding the mediawiki
extension, it's great that obscure extensions like this see such
excellent reviews.

I think, however, it would be good to have a discussion about the future
of that extension in Git. The extension has a bit of a hybrid presence -
it is partly in git core (as a contrib, but still) and partly on GitHub
here:

https://github.com/Git-Mediawiki/Git-Mediawiki/

This leads to some confusion as to where the changes should be
made. Some people make changes straight on GitHub by forking the above
repo, others fork the Git repo, and very few actually send the patches
here, on this mailing list.

There was a discussion last year about moving the module out of git core
and onto its own repository again:

https://github.com/Git-Mediawiki/Git-Mediawiki/issues/34

There is also a discussion on releasing the code to CPAN:

https://github.com/Git-Mediawiki/Git-Mediawiki/issues/18

It should also be mentioned that this contrib isn't very active: I'm not
part of the GitHub organization, yet I'm probably the one that's been
the most active with patches in the last year (and I wasn't very active
at all). There's an issue on GitHub about this as well:

https://github.com/Git-Mediawiki/Git-Mediawiki/issues/33

So, what should be done about this contrib? Should it stay in Git core?
Or should it be punted back to the community and managed on GitHub?

Please avoid "mailing list vs GitHub" flamewars and keep to the topic of
this specific contrib's future. :)

Thanks!

A.

PS: personally, I don't care much either way. It certainly seems that I
get way better feedback here than I previously got on GitHub, but that
could be because of the hybrid way things are set up in the first
place...
-- 
To punish me for my contempt for authority, fate made me an authority myself.
   - Albert Einstein


[PATCH v2 0/7] remote-mediawiki: add namespace support

2017-10-29 Thread Antoine Beaupré
This patch series tries to integrate all the feedback received in the
recent review from Eric Sunshine. It completely removes the confusing
changes to get_mw_namespace_id_for_page() because I believe they are
unrelated to the namespace support.

I also split up the last patch in 4 different patches for clarity and
fixed the vocabulary (it's "virtual" namespaces, not "special", which
is a specific namespace).

I left that die() in there because it makes the code a little cleaner
and I'm lazy.

Thanks again for the good feedback!


[PATCH 7/7] remote-mediawiki: show progress while fetching namespaces

2017-10-29 Thread Antoine Beaupré
Without this, the fetch process seems hung while we fetch page
listings across the namespaces. Obviously, it should be possible to
silence this with -q, but that's an issue already present everywhere
in the code and should be fixed separately:

https://github.com/Git-Mediawiki/Git-Mediawiki/issues/30

Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index 5199af6f6..61e6dd798 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -278,6 +278,7 @@ sub get_mw_tracked_namespaces {
 aplimit => 'max' } )
 || die $mediawiki->{error}->{code} . ': '
 . $mediawiki->{error}->{details} . "\n";
+print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n";
 foreach my $page (@{$mw_pages}) {
 $pages->{$page->{title}} = $page;
 }
-- 
2.11.0



[PATCH 2/7] remote-mediawiki: allow fetching namespaces with spaces

2017-10-29 Thread Antoine Beaupré
From: Ingo Ruhnke <grum...@gmail.com>

we still want to use spaces as separators in the config, but we should
allow the user to specify namespaces with spaces, so we use underscore
for this.

Reviewed-by: Antoine Beaupré <anar...@debian.org>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index 5ffb57595..a1d783789 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -65,6 +65,7 @@ chomp(@tracked_categories);
 
 # Just like @tracked_categories, but for MediaWiki namespaces.
 my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces"));
+for (@tracked_namespaces) { s/_/ /g; }
 chomp(@tracked_namespaces);
 
 # Import media files on pull
-- 
2.11.0



[PATCH 4/7] remote-mediawiki: skip virtual namespaces

2017-10-29 Thread Antoine Beaupré
Virtual namespaces do not correspond to pages in the database and are
automatically generated by MediaWiki. It makes little sense,
therefore, to fetch pages from those namespaces and the MW API doesn't
support listing those pages.

According to the documentation, those virtual namespaces are currently
"Special" (-1) and "Media" (-2) but we treat all negative namespaces
as "virtual" as a future-proofing mechanism.

Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index e7616e1a2..5c85e64b6 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -264,10 +264,12 @@ sub get_mw_tracked_categories {
 sub get_mw_tracked_namespaces {
 my $pages = shift;
 foreach my $local_namespace (@tracked_namespaces) {
+my $namespace_id = get_mw_namespace_id($local_namespace);
+next if $namespace_id < 0; # virtual namespaces don't support allpages
 my $mw_pages = $mediawiki->list( {
 action => 'query',
 list => 'allpages',
-apnamespace => get_mw_namespace_id($local_namespace),
+apnamespace => $namespace_id,
 aplimit => 'max' } )
 || die $mediawiki->{error}->{code} . ': '
 . $mediawiki->{error}->{details} . "\n";
-- 
2.11.0



[PATCH 5/7] remote-mediawiki: support fetching from (Main) namespace

2017-10-29 Thread Antoine Beaupré
When we specify a list of namespaces to fetch from, by default the MW
API will not fetch from the default namespace, referred to as "(Main)"
in the documentation:

https://www.mediawiki.org/wiki/Manual:Namespace#Built-in_namespaces

I haven't found a way to address that "(Main)" namespace when getting
the namespace ids: indeed, when listing namespaces, there is no
"canonical" field for the main namespace, although there is a "*"
field that is set to "" (empty). So in theory, we could specify the
empty namespace to get the main namespace, but that would make
specifying namespaces harder for the user: we would need to teach
users about the "empty" default namespace. It would also make the code
more complicated: we'd need to parse quotes in the configuration.

So we simply override the query here and allow the user to specify
"(Main)" since that is the publicly documented name.

Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index 5c85e64b6..2c2a7367b 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -264,9 +264,14 @@ sub get_mw_tracked_categories {
 sub get_mw_tracked_namespaces {
 my $pages = shift;
 foreach my $local_namespace (@tracked_namespaces) {
-my $namespace_id = get_mw_namespace_id($local_namespace);
+my ($namespace_id, $mw_pages);
+if ($local_namespace eq "(Main)") {
+$namespace_id = 0;
+} else {
+$namespace_id = get_mw_namespace_id($local_namespace);
+}
 next if $namespace_id < 0; # virtual namespaces don't support allpages
-my $mw_pages = $mediawiki->list( {
+$mw_pages = $mediawiki->list( {
 action => 'query',
 list => 'allpages',
 apnamespace => $namespace_id,
-- 
2.11.0



[PATCH 1/7] remote-mediawiki: add namespace support

2017-10-29 Thread Antoine Beaupré
From: Kevin <ke...@ki-ai.org>

This introduces a new remote.origin.namespaces argument that is a
space-separated list of namespaces. The list of pages extracted is then
taken from all the specified namespaces.

Reviewed-by: Antoine Beaupré <anar...@debian.org>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl
index e7f857c1a..5ffb57595 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -63,6 +63,10 @@ chomp(@tracked_pages);
 my @tracked_categories = split(/[ \n]/, run_git("config --get-all 
remote.${remotename}.categories"));
 chomp(@tracked_categories);
 
+# Just like @tracked_categories, but for MediaWiki namespaces.
+my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all 
remote.${remotename}.namespaces"));
+chomp(@tracked_namespaces);
+
 # Import media files on pull
 my $import_media = run_git("config --get --bool 
remote.${remotename}.mediaimport");
 chomp($import_media);
@@ -256,6 +260,23 @@ sub get_mw_tracked_categories {
return;
 }
 
+sub get_mw_tracked_namespaces {
+my $pages = shift;
+foreach my $local_namespace (@tracked_namespaces) {
+my $mw_pages = $mediawiki->list( {
+action => 'query',
+list => 'allpages',
+apnamespace => get_mw_namespace_id($local_namespace),
+aplimit => 'max' } )
+|| die $mediawiki->{error}->{code} . ': '
+. $mediawiki->{error}->{details} . "\n";
+foreach my $page (@{$mw_pages}) {
+$pages->{$page->{title}} = $page;
+}
+}
+return;
+}
+
 sub get_mw_all_pages {
my $pages = shift;
# No user-provided list, get the list of pages from the API.
@@ -319,6 +340,10 @@ sub get_mw_pages {
$user_defined = 1;
get_mw_tracked_categories(\%pages);
}
+   if (@tracked_namespaces) {
+   $user_defined = 1;
+   get_mw_tracked_namespaces(\%pages);
+   }
if (!$user_defined) {
get_mw_all_pages(\%pages);
}
-- 
2.11.0
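
The way the patch above merges pages from several namespaces into one hash keyed by title can be sketched as follows. This is a minimal illustration, assuming a hypothetical list_pages_in_namespace() helper and a %fake_wiki table in place of the real $mediawiki->list() call.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the wiki: namespace id => page records.
my %fake_wiki = (
    0 => [ { title => 'Main Page' }, { title => 'Help' } ],
    1 => [ { title => 'Talk:Main Page' } ],
);

# Hypothetical replacement for the API query; returns an array reference.
sub list_pages_in_namespace {
    my ($namespace_id) = @_;
    return $fake_wiki{$namespace_id} || [];
}

# Collect the pages of every requested namespace into one hash keyed by
# title, so a page seen twice is only recorded once.
sub collect_pages {
    my ($pages, @namespace_ids) = @_;
    foreach my $id (@namespace_ids) {
        foreach my $page (@{ list_pages_in_namespace($id) }) {
            $pages->{ $page->{title} } = $page;
        }
    }
    return;
}

my %pages;
collect_pages(\%pages, 0, 1, 0);
print scalar(keys %pages), " pages\n";
```

Keying on the title means that listing a namespace twice, or combining namespaces with tracked categories, does not produce duplicate entries.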



[PATCH 3/7] remote-mediawiki: show known namespace choices on failure

2017-10-29 Thread Antoine Beaupré
If we fail to find a requested namespace, we should tell the user
which ones we know about, since those were already fetched. This
allows users to fetch all namespaces by specifying a dummy namespace,
failing, then copying the list of namespaces in the config.

Eventually, we should have a flag that allows fetching all namespaces
automatically.

Reviewed-by: Antoine Beaupré <anar...@debian.org>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl 
b/contrib/mw-to-git/git-remote-mediawiki.perl
index a1d783789..e7616e1a2 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -1334,7 +1334,9 @@ sub get_mw_namespace_id {
my $id;
 
if (!defined $ns) {
-   print {*STDERR} "No such namespace ${name} on MediaWiki.\n";
+   my @namespaces = sort keys %namespace_id;
+   for (@namespaces) { s/ /_/g; }
+   print {*STDERR} "No such namespace ${name} on MediaWiki, known 
namespaces: @namespaces\n";
$ns = {is_namespace => 0};
$namespace_id{$name} = $ns;
}
-- 
2.11.0
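
The hint built by the patch above can be tried on its own. The sketch below assumes a hypothetical known_namespaces_hint() function and made-up cache contents; the real %namespace_id hash is filled from the wiki.

```perl
use strict;
use warnings;

# Build the "known namespaces" hint from a name => record mapping,
# replacing spaces with underscores so each name can be pasted into a
# space-separated configuration value.
sub known_namespaces_hint {
    my ($namespace_id) = @_;
    my @namespaces = sort keys %{$namespace_id};
    s/ /_/g for @namespaces;
    return "known namespaces: @namespaces";
}

# Hypothetical cache contents.
my %cache = (
    'Talk'      => { id => 1 },
    'User talk' => { id => 3 },
    'File'      => { id => 6 },
);
print known_namespaces_hint(\%cache), "\n";
```

Converting spaces to underscores keeps the printed list compatible with the underscore convention used when reading the configuration.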



[PATCH 6/7] remote-mediawiki: process namespaces in order

2017-10-29 Thread Antoine Beaupré
Ideally, we'd process them in numeric order since that is more
logical, but we can't do that yet since this is where we find the
numeric identifiers in the first place. Lexicographic order is a good
compromise.

Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl 
b/contrib/mw-to-git/git-remote-mediawiki.perl
index 2c2a7367b..5199af6f6 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -263,7 +263,7 @@ sub get_mw_tracked_categories {
 
 sub get_mw_tracked_namespaces {
 my $pages = shift;
-foreach my $local_namespace (@tracked_namespaces) {
+foreach my $local_namespace (sort @tracked_namespaces) {
 my ($namespace_id, $mw_pages);
 if ($local_namespace eq "(Main)") {
 $namespace_id = 0;
-- 
2.11.0
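
The difference between the two orderings discussed above can be seen in a small sketch. The %ids mapping is hypothetical; in the remote helper the numeric ids only become known once the wiki has been queried, which is why the patch settles for sorting by name.

```perl
use strict;
use warnings;

# Hypothetical name => id mapping, only available after querying the wiki.
my %ids = ('Talk' => 1, 'File' => 6, 'Help' => 12, 'User' => 2);

# Lexicographic order: needs nothing but the configured names.
my @by_name = sort keys %ids;

# Numeric order: needs the ids to be resolved first.
my @by_id = sort { $ids{$a} <=> $ids{$b} } keys %ids;

print "by name: @by_name\n";
print "by id:   @by_id\n";
```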



Re: [PATCH 4/4] remote-mediawiki: allow using (Main) as a namespace and skip special namespaces

2017-10-29 Thread Antoine Beaupré
On 2017-10-29 15:49:28, Eric Sunshine wrote:
[...]
>> Reviewed-by: Antoine Beaupré <anar...@debian.org>
>> Signed-off-by: Antoine Beaupré <anar...@debian.org>
>> ---
>> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl 
>> b/contrib/mw-to-git/git-remote-mediawiki.perl
>> @@ -264,16 +264,27 @@ sub get_mw_tracked_categories {
>>  sub get_mw_tracked_namespaces {
>>  my $pages = shift;
>> -foreach my $local_namespace (@tracked_namespaces) {
>> -my $mw_pages = $mediawiki->list( {
>> -action => 'query',
>> -list => 'allpages',
>> -apnamespace => get_mw_namespace_id($local_namespace),
>> -aplimit => 'max' } )
>> -|| die $mediawiki->{error}->{code} . ': '
>> -. $mediawiki->{error}->{details} . "\n";
>> -foreach my $page (@{$mw_pages}) {
>> -$pages->{$page->{title}} = $page;
>> +foreach my $local_namespace (sort @tracked_namespaces) {
>> +my ($mw_pages, $namespace_id);
>> +if ($local_namespace eq "(Main)") {
>> +$namespace_id = 0;
>> +} else {
>> +$namespace_id = get_mw_namespace_id($local_namespace);
>> +}
>> +if ($namespace_id >= 0) {
>
> This may be problematic since get_mw_namespace_id() may return undef
> rather than a number, in which case Perl will complain.

[...]

Actually, get_mw_namespace_id() doesn't look like it can return undef -
did you confuse it with get_mw_namespace_id_for_page()?

A.

-- 
Uncompromising war resistance and refusal to do military service under
any circumstances.
   - Albert Einstein


Re: [PATCH 1/4] remote-mediawiki: add namespace support

2017-10-29 Thread Antoine Beaupré
On 2017-10-29 23:08:00, Kevin wrote:
> So I shared the patch some time ago (~2 years). Surprisingly it's just
> now getting attention. I guess some renewed interest in using mediawiki
> with git.

I think what's happening is that someone (ie. me :p) figured it was
about frigging time to actually send those patches to the git mailing
list. ;) And I'm glad we're seeing such good reviews, so thanks Eric for
that... 

> Myself, however, am no longer using mediawiki. Nor am I
> completely clear on what the reasons were for using some variable or
> another a couple of years ago. So... the best of luck, sorry I couldn't
> be more helpful.

That's too bad, but thanks for the feedback anyways. :)

Frankly, I'm tempted to just completely remove the
get_mw_namespace_id_for_page hunk - it's completely unrelated to the
rest of the patch.

Could that be a bugfix for a separate issue that crept up in your
patchset? For example this?

https://github.com/Git-Mediawiki/Git-Mediawiki/issues/43

A.
-- 
That's one of the remarkable things about life: it's never so bad that
it can't get worse.
- Calvin


Re: [PATCH 4/4] remote-mediawiki: allow using (Main) as a namespace and skip special namespaces

2017-10-29 Thread Antoine Beaupré
On 2017-10-29 15:49:28, Eric Sunshine wrote:
> On Sun, Oct 29, 2017 at 12:08 PM, Antoine Beaupré <anar...@debian.org> wrote:
>> Subject: remote-mediawiki: allow using (Main) as a namespace and skip 
>> special namespaces
>
> This patch is more difficult to review than it perhaps ought to be
> since it is making multiple unrelated changes.
>
> It's not clear from the description what special namespaces are and
> why they need to be skipped. It's also not clear why (Main) is
> special. Perhaps the commit message(s) could explain these issues in
> more detail.
>
> To simplify review and make it easier to gauge what is going on, it
> might make sense to split this patch into at least two: one which
> skips "special namespaces", and one which gives special treatment to
> (Main).

Agreed, I'll try to do that.

> More below...
>
>> Reviewed-by: Antoine Beaupré <anar...@debian.org>
>> Signed-off-by: Antoine Beaupré <anar...@debian.org>
>> ---
>> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl 
>> b/contrib/mw-to-git/git-remote-mediawiki.perl
>> @@ -264,16 +264,27 @@ sub get_mw_tracked_categories {
>>  sub get_mw_tracked_namespaces {
>>  my $pages = shift;
>> -foreach my $local_namespace (@tracked_namespaces) {
>> -my $mw_pages = $mediawiki->list( {
>> -action => 'query',
>> -list => 'allpages',
>> -apnamespace => get_mw_namespace_id($local_namespace),
>> -aplimit => 'max' } )
>> -|| die $mediawiki->{error}->{code} . ': '
>> -. $mediawiki->{error}->{details} . "\n";
>> -foreach my $page (@{$mw_pages}) {
>> -$pages->{$page->{title}} = $page;
>> +foreach my $local_namespace (sort @tracked_namespaces) {
>> +my ($mw_pages, $namespace_id);
>> +if ($local_namespace eq "(Main)") {
>> +$namespace_id = 0;
>> +} else {
>> +$namespace_id = get_mw_namespace_id($local_namespace);
>> +}
>> +if ($namespace_id >= 0) {
>
> This may be problematic since get_mw_namespace_id() may return undef
> rather than a number, in which case Perl will complain. Since the code
> skips the $mediawiki query altogether when it encounters "(Main)", you
> could fix this problem and simplify the code overall by simply
> skipping the bulk of the foreach loop body instead of mucking around
> with $namespace_id. For instance:
>
> foreach my $local_namespace (sort @tracked_namespaces) {
> next if ($local_namespace eq "(Main)");
> ...normal processing...
> }

Ah yes. I see your point, but the code doesn't actually skip the query
when it encounters "(Main)": $namespace_id is 0 there, so the
($namespace_id >= 0) test passes and the query still runs.

>> +if ($mw_pages = $mediawiki->list( {
>> +action => 'query',
>> +list => 'allpages',
>> +apnamespace => $namespace_id,
>> +aplimit => 'max' } )) {
>> +print {*STDERR} "$#{$mw_pages} found in namespace 
>> $local_namespace ($namespace_id)\n";
>
> The original code did not emit this diagnostic but the new code does
> so unconditionally. Is this just leftover debugging code or is
> intended that all users should see this information all the time?

This is a known issue that permeates the whole remote at this point, and
it is quite annoying.

https://github.com/Git-Mediawiki/Git-Mediawiki/issues/30

I have, however, considered it useful to include this to show progress
as it can take a while to fetch all namespace information...

Obviously, once we figure out how to silence this stuff (ie. how to
recognize -q), it should be silenced like everything else, but until
then I think it's quite useful.

>> +foreach my $page (@{$mw_pages}) {
>> +$pages->{$page->{title}} = $page;
>> +}
>> +} else {
>> +warn $mediawiki->{error}->{code} . ': '
>> +. $mediawiki->{error}->{details} . "\n";
>
> I guess this is the part which "skips special namespaces". The
> original code die()'d but this merely warns. Aside from these "special
> namespaces", are there genuine cases when the $mediawiki query would
> return an error, and which should indeed die(), or is warning
> appropriate for all $mediawiki query error cases?

Maybe I didn't get the indentation right, but this } else { is for query
failures, *not* the if ($namespace_id < 0). So < 0 is just silently
skipped.

The original code was die()'ing on failures, but I think that's a
mistake: we should fetch what we can and warn on the failures. That
allows the user to fix multiple problems at once instead of having to
rerun the script repeatedly.
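
A minimal sketch of that warn-and-continue policy, assuming a hypothetical fetch_namespace() helper and $last_error variable that loosely mirror the list()/error pair used in the patch:

```perl
use strict;
use warnings;

# Hypothetical fetcher: returns an array ref on success, or undef after
# setting $last_error on failure.
my $last_error;
sub fetch_namespace {
    my ($name) = @_;
    if ($name eq 'Broken') {
        $last_error = "badnamespace: cannot list $name";
        return;
    }
    return [ "$name:Page1", "$name:Page2" ];
}

# Fetch every namespace, warning about failures instead of dying, and
# return both the pages found and the names that failed.
sub fetch_all {
    my @names = @_;
    my (@pages, @failures);
    foreach my $name (@names) {
        my $result = fetch_namespace($name);
        if ($result) {
            push @pages, @{$result};
        } else {
            warn "$last_error\n";
            push @failures, $name;
        }
    }
    return (\@pages, \@failures);
}

my ($pages, $failures) = fetch_all('Talk', 'Broken', 'Help');
print scalar(@{$pages}), " pages, failed: @{$failures}\n";
```

One run then reports every broken namespace, rather than stopping at the first one.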

A.

-- 
Feminism has never killed anyone.
Machismo kills every day.
- Benoîte Groulx


Re: [PATCH 3/4] remote-mediawiki: show known namespace choices on failure

2017-10-29 Thread Antoine Beaupré
On 2017-10-29 13:34:31, Eric Sunshine wrote:
> On Sun, Oct 29, 2017 at 12:08 PM, Antoine Beaupré <anar...@debian.org> wrote:
>> if we fail to find a requested namespace, we should tell the user
>
> s/if/If/

fixed.

>> which ones we know about, since we already do. this allows users to
>
> s/this/This/
>
> Not sure what ", since we already do" means here.

we already have fetched the mapping, fixed.

>> feetch all namespaces by specifying a dummy namespace, failing, then
>
> s/feetch/fetch/

fixed.

>> copying the list of namespaces in the config.
>>
>> eventually, we should have a flag that allows fetching all namespaces
>> automatically.
>>
>> Reviewed-by: Antoine Beaupré <anar...@debian.org>
>> Signed-off-by: Antoine Beaupré <anar...@debian.org>
>> ---
>> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl 
>> b/contrib/mw-to-git/git-remote-mediawiki.perl
>> @@ -1334,7 +1334,9 @@ sub get_mw_namespace_id {
>> my $id;
>>
>> if (!defined $ns) {
>> -   print {*STDERR} "No such namespace ${name} on MediaWiki.\n";
>> +   my @namespaces = sort keys %namespace_id;
>> +   for (@namespaces) { s/ /_/g; }
>> +   print {*STDERR} "No such namespace ${name} on MediaWiki, 
>> known namespaces: @namespaces.\n";
>
> Probably want to drop the terminating "." in the error message.

meh... i just respected what was already there, but it's true it can be
error-prone when copy-pasting, so removed.

a.
-- 
A ballot is like a bullet. You don't throw your ballots until you see
a target, and if that target is not within your reach, keep your
ballot in your pocket.
 - Malcolm X


Re: [PATCH 1/4] remote-mediawiki: add namespace support

2017-10-29 Thread Antoine Beaupré
On 2017-10-29 13:24:03, Eric Sunshine wrote:
> On Sun, Oct 29, 2017 at 12:08 PM, Antoine Beaupré <anar...@debian.org> wrote:
>> From: Kevin <ke...@ki-ai.org>
>>
>> this introduces a new remote.origin.namespaces argument that is a
>
> s/this/This/

ack.

>> space-separated list of namespaces. the list of pages extract is then
>
> s/the/The/

ack.

>> taken from all the specified namespaces.
>>
>> Reviewed-by: Antoine Beaupré <anar...@debian.org>
>> Signed-off-by: Antoine Beaupré <anar...@debian.org>
>> ---
>> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl 
>> b/contrib/mw-to-git/git-remote-mediawiki.perl
>> @@ -1331,7 +1356,12 @@ sub get_mw_namespace_id {
>>  sub get_mw_namespace_id_for_page {
>> my $namespace = shift;
>> if ($namespace =~ /^([^:]*):/) {
>
> This is not a new issue, but why capture if $1 is never referenced in
> the code below?

meh, i dunno.

>> -   return get_mw_namespace_id($namespace);
>> +   my ($ns, $id) = split(/:/, $namespace);
>> +   if (Scalar::Util::looks_like_number($id)) {
>> +   return get_mw_namespace_id($ns);
>
> So, the idea is that if the input has form "something:number", then
> you want to look up "something" as a namespace name. Anything else
> (such as "something:foobar") is not considered a valid page reference.
> Right?

frankly, i have no idea what's going on here.

>> +   } else{
>
> Missing space before open brace.

right.

>> +   return
>
> Not required, but missing semi-colon.

ok.

>> +   }
>> } else {
>> return;
>> }
>
> The multiple 'return's are a bit messy. Perhaps collapse the entire
> function to something like this:
>
> sub get_mw_namespace_id_for_page {
> my $arg = shift;
> if ($arg =~ /^([^:]+):\d+$/) {
> return get_mw_namespace_id($1);
> }
> return undef;
> }
>
> Then, you don't need even need Scalar::Util::looks_like_number()
> (unless, I suppose, the incoming number is expected to be something
> other than simple digits).
>
> In fact, it may be that the intent of the original code *was* meant to
> do exactly the same as shown in my example above, but that the person
> who wrote it accidentally typed:
>
> return get_mw_namespace_id($namespace);
>
> instead of the intended:
>
> return get_mw_namespace_id($1);
>
> So, a minimal fix would be simply to change $namespace to $1.
> Tightening the regex as I did in my example would be a bonus (though
> probably ought to be a separate patch).

so while i'm happy to just copy-paste your code in there, that's kind of
a sensitive area of the code, as it was originally used only in the
upload procedure, which I haven't tested at all. so i'm hesitant in just
merging that in as is.

i don't understand why or how this even works, to be honest: page names
don't necessarily look like numbers, in fact, they generally don't. i
don't understand why the patch submitted here even touches that function
at all, considering that the function is only used on uploads. I just
cargo-culted it from the original issue...

sigh.
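
To make the concern concrete, here is a small sketch of what the tightened regex from the review accepts. The function name namespace_of_numeric_ref() is hypothetical, and this mirrors the reviewer's suggestion rather than the code currently in the tree.

```perl
use strict;
use warnings;

# Return the namespace part of "Namespace:digits", or undef for
# anything else.
sub namespace_of_numeric_ref {
    my ($arg) = @_;
    if ($arg =~ /^([^:]+):\d+$/) {
        return $1;
    }
    return;
}

foreach my $title ('Talk:123', 'Talk:Main Page', 'Main Page') {
    my $ns = namespace_of_numeric_ref($title);
    printf "%-16s => %s\n", $title, defined $ns ? $ns : '(no match)';
}
```

Ordinary page titles such as "Talk:Main Page" do not match, which is why the numeric check looks suspicious for anything but a very specific caller.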

a.

-- 
It's too easy, once the wars are over,
To go around shouting that it was the last one.
Bourgeois friends, how I envy you:
Can you really not see your cemeteries?
- Jacques Brel


[no subject]

2017-10-29 Thread Antoine Beaupré

sorry for the noise here, but the original patch didn't fix the length
in the right place. v2 fixed it in the library properly, but i forgot
to also include the length of the suffix. this should be good to go...


[PATCH v3] remote-mediawiki: limit filenames to legal

2017-10-29 Thread Antoine Beaupré
MediaWiki pages can have names longer than NAME_MAX (generally 255)
characters, which will fail on checkout. We simply strip out extra
characters, which may mean one page's content will overwrite another
(the last edit winning).

Ideally, we would use a more clever scheme to find unique names, but
that would be more difficult and error prone for a situation that
should rarely happen in the first place.

Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/Git/Mediawiki.pm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/Git/Mediawiki.pm 
b/contrib/mw-to-git/Git/Mediawiki.pm
index d13c4dfa7..917d9e2d3 100644
--- a/contrib/mw-to-git/Git/Mediawiki.pm
+++ b/contrib/mw-to-git/Git/Mediawiki.pm
@@ -2,6 +2,7 @@ package Git::Mediawiki;
 
 use 5.008;
 use strict;
+use POSIX;
 use Git;
 
 BEGIN {
@@ -52,7 +53,7 @@ sub smudge_filename {
$filename =~ s/ /_/g;
# Decode forbidden characters encoded in clean_filename
$filename =~ s/_%_([0-9a-fA-F][0-9a-fA-F])/sprintf('%c', hex($1))/ge;
-   return $filename;
+   return substr($filename, 0, NAME_MAX-length('.mw'));
 }
 
 sub connect_maybe {
-- 
2.11.0
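
The truncation in the patch above, and the collision it accepts, can be sketched as follows. The $NAME_MAX and $SUFFIX variables and the truncated_filename() function are assumptions of this sketch; the patch itself reads NAME_MAX from the POSIX module and only returns the truncated name, leaving the suffix to be appended elsewhere.

```perl
use strict;
use warnings;

# Assumed limit and suffix, hard-coded to keep the sketch self-contained.
my $NAME_MAX = 255;
my $SUFFIX   = '.mw';

# Truncate a page title so that title plus suffix fits in $NAME_MAX
# characters, then append the suffix.
sub truncated_filename {
    my ($title) = @_;
    return substr($title, 0, $NAME_MAX - length($SUFFIX)) . $SUFFIX;
}

# Two titles that only differ after the cut-off point.
my $long_a = ('x' x 300) . 'A';
my $long_b = ('x' x 300) . 'B';

print length(truncated_filename($long_a)), "\n";
print truncated_filename($long_a) eq truncated_filename($long_b)
    ? "collision\n" : "distinct\n";
```

Note that NAME_MAX is a byte limit, while substr may count characters depending on whether the title is a decoded string, so titles with multi-byte characters might need a stricter cut. The sketch ignores that.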



[PATCH v2] remote-mediawiki: limit filenames to legal

2017-10-29 Thread Antoine Beaupré
MediaWiki pages can have names longer than NAME_MAX (generally 255)
characters, which will fail on checkout. We simply strip out extra
characters, which may mean one page's content will overwrite another
(the last edit winning).

Ideally, we would use a more clever scheme to find unique names, but
that would be more difficult and error prone for a situation that
should rarely happen in the first place.

Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/Git/Mediawiki.pm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/Git/Mediawiki.pm 
b/contrib/mw-to-git/Git/Mediawiki.pm
index d13c4dfa7..c9f22680a 100644
--- a/contrib/mw-to-git/Git/Mediawiki.pm
+++ b/contrib/mw-to-git/Git/Mediawiki.pm
@@ -2,6 +2,7 @@ package Git::Mediawiki;
 
 use 5.008;
 use strict;
+use POSIX;
 use Git;
 
 BEGIN {
@@ -52,7 +53,7 @@ sub smudge_filename {
$filename =~ s/ /_/g;
# Decode forbidden characters encoded in clean_filename
$filename =~ s/_%_([0-9a-fA-F][0-9a-fA-F])/sprintf('%c', hex($1))/ge;
-   return $filename;
+   return substr($filename, 0, NAME_MAX-3);
 }
 
 sub connect_maybe {
-- 
2.11.0



[PATCH] remote-mediawiki: limit filenames to legal

2017-10-29 Thread Antoine Beaupré
MediaWiki pages can have names longer than NAME_MAX (generally 255)
characters, which will fail on checkout. We simply strip out extra
characters, which may mean one page's content will overwrite another
(the last edit winning).

Ideally, we would use a more clever scheme to find unique names, but
that would be more difficult and error prone for a situation that
should rarely happen in the first place.
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl 
b/contrib/mw-to-git/git-remote-mediawiki.perl
index e7f857c1a..58870d197 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -18,6 +18,7 @@ use Git::Mediawiki qw(clean_filename smudge_filename 
connect_maybe
EMPTY HTTP_CODE_OK);
 use DateTime::Format::ISO8601;
 use warnings;
+use POSIX;
 
 # By default, use UTF-8 to communicate with Git and the user
 binmode STDERR, ':encoding(UTF-8)';
@@ -703,7 +704,7 @@ sub import_file_revision {
%mediafile = %{$mediafile};
}
 
-   my $title = $commit{title};
+   my $title = substr($commit{title}, 0, NAME_MAX);
my $comment = $commit{comment};
my $content = $commit{content};
my $author = $commit{author};
-- 
2.11.0



[PATCH 4/4] remote-mediawiki: allow using (Main) as a namespace and skip special namespaces

2017-10-29 Thread Antoine Beaupré
Reviewed-by: Antoine Beaupré <anar...@debian.org>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 31 +++--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl 
b/contrib/mw-to-git/git-remote-mediawiki.perl
index 07cc74bac..ccefde4dc 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -264,16 +264,27 @@ sub get_mw_tracked_categories {
 
 sub get_mw_tracked_namespaces {
 my $pages = shift;
-foreach my $local_namespace (@tracked_namespaces) {
-my $mw_pages = $mediawiki->list( {
-action => 'query',
-list => 'allpages',
-apnamespace => get_mw_namespace_id($local_namespace),
-aplimit => 'max' } )
-|| die $mediawiki->{error}->{code} . ': '
-. $mediawiki->{error}->{details} . "\n";
-foreach my $page (@{$mw_pages}) {
-$pages->{$page->{title}} = $page;
+foreach my $local_namespace (sort @tracked_namespaces) {
+my ($mw_pages, $namespace_id);
+if ($local_namespace eq "(Main)") {
+$namespace_id = 0;
+} else {
+$namespace_id = get_mw_namespace_id($local_namespace);
+}
+if ($namespace_id >= 0) {
+if ($mw_pages = $mediawiki->list( {
+action => 'query',
+list => 'allpages',
+apnamespace => $namespace_id,
+aplimit => 'max' } )) {
+print {*STDERR} "$#{$mw_pages} found in namespace 
$local_namespace ($namespace_id)\n";
+foreach my $page (@{$mw_pages}) {
+$pages->{$page->{title}} = $page;
+}
+} else {
+warn $mediawiki->{error}->{code} . ': '
+. $mediawiki->{error}->{details} . "\n";
+}
 }
 }
 return;
-- 
2.11.0



[PATCH 0/4] WIP: git-remote-mediawiki namespace support

2017-10-29 Thread Antoine Beaupré
Hi,

For a few years now, work has been happening in a [GitHub issue] to
improve git's support for MediaWiki sites, which is implemented in
the contrib/mw-to-git/ module and mostly visible through the
git-remote-mediawiki command.

 [GitHub issue]: https://github.com/Git-Mediawiki/Git-Mediawiki/issues/10

This specific patchset adds support for namespaces in
MediaWiki. Without this, it is impossible to fetch pages outside the
"(Main)" namespace (e.g. Talk pages or "meta"). Namespaces are heavily
used on many wikis and this seems like an essential feature to have.

I have been hesitant to push these patches here because I know how
strict the git community is regarding patchsets and I was afraid they
would just get shot down, especially because there are no unit tests
for the new functionality. Obviously, doing unit tests against a full
MediaWiki instance isn't exactly trivial. Even though the contrib
module features a test suite and a way to install MediaWiki, I haven't
had the chance to test this yet, so unit tests are still missing. This
is the main reason why this is marked WIP.

I have tried to follow the patch submission guide, but I believe this
is my first Git patch, so please be gentle. Any review would be
greatly appreciated and I hope this can be eventually merged in. This
work is also available on GitHub:

https://github.com/anarcat/git/tree/mediawiki-namespaces

Thanks in advance,

A.



[PATCH 3/4] remote-mediawiki: show known namespace choices on failure

2017-10-29 Thread Antoine Beaupré
if we fail to find a requested namespace, we should tell the user
which ones we know about, since we already do. this allows users to
feetch all namespaces by specifying a dummy namespace, failing, then
copying the list of namespaces in the config.

eventually, we should have a flag that allows fetching all namespaces
automatically.

Reviewed-by: Antoine Beaupré <anar...@debian.org>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl 
b/contrib/mw-to-git/git-remote-mediawiki.perl
index fc48846a1..07cc74bac 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -1334,7 +1334,9 @@ sub get_mw_namespace_id {
my $id;
 
if (!defined $ns) {
-   print {*STDERR} "No such namespace ${name} on MediaWiki.\n";
+   my @namespaces = sort keys %namespace_id;
+   for (@namespaces) { s/ /_/g; }
+   print {*STDERR} "No such namespace ${name} on MediaWiki, known 
namespaces: @namespaces.\n";
$ns = {is_namespace => 0};
$namespace_id{$name} = $ns;
}
-- 
2.11.0



[PATCH 2/4] remote-mediawiki: allow fetching namespaces with spaces

2017-10-29 Thread Antoine Beaupré
From: Ingo Ruhnke <grum...@gmail.com>

we still want to use spaces as separators in the config, but we should
allow the user to specify namespaces with spaces, so we use underscore
for this.

Reviewed-by: Antoine Beaupré <anar...@debian.org>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl 
b/contrib/mw-to-git/git-remote-mediawiki.perl
index 1c5e39831..fc48846a1 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -66,6 +66,7 @@ chomp(@tracked_categories);
 
 # Just like @tracked_categories, but for MediaWiki namespaces.
 my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all 
remote.${remotename}.namespaces"));
+for (@tracked_namespaces) { s/_/ /g; }
 chomp(@tracked_namespaces);
 
 # Import media files on pull
-- 
2.11.0
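
The underscore convention from the patch above can be exercised on a sample value. The parse_namespaces() function is a hypothetical wrapper for this sketch; the patch applies the same split and substitution directly to the output of the git config call.

```perl
use strict;
use warnings;

# Split a space- or newline-separated configuration value into
# namespace names, then turn underscores back into spaces.
sub parse_namespaces {
    my ($config_value) = @_;
    my @namespaces = split(/[ \n]/, $config_value);
    s/_/ /g for @namespaces;
    return @namespaces;
}

my @ns = parse_namespaces("Talk User_talk\nHelp");
print join('|', @ns), "\n";
```

The order matters: splitting first and substituting afterwards is what lets a space keep its role as separator while an underscore stands for a space inside a name.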



[PATCH 1/4] remote-mediawiki: add namespace support

2017-10-29 Thread Antoine Beaupré
From: Kevin <ke...@ki-ai.org>

this introduces a new remote.origin.namespaces argument that is a
space-separated list of namespaces. the list of pages extract is then
taken from all the specified namespaces.

Reviewed-by: Antoine Beaupré <anar...@debian.org>
Signed-off-by: Antoine Beaupré <anar...@debian.org>
---
 contrib/mw-to-git/git-remote-mediawiki.perl | 34 +++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl 
b/contrib/mw-to-git/git-remote-mediawiki.perl
index e7f857c1a..1c5e39831 100755
--- a/contrib/mw-to-git/git-remote-mediawiki.perl
+++ b/contrib/mw-to-git/git-remote-mediawiki.perl
@@ -17,6 +17,7 @@ use Git;
 use Git::Mediawiki qw(clean_filename smudge_filename connect_maybe
EMPTY HTTP_CODE_OK);
 use DateTime::Format::ISO8601;
+use Scalar::Util;
 use warnings;
 
 # By default, use UTF-8 to communicate with Git and the user
@@ -63,6 +64,10 @@ chomp(@tracked_pages);
 my @tracked_categories = split(/[ \n]/, run_git("config --get-all 
remote.${remotename}.categories"));
 chomp(@tracked_categories);
 
+# Just like @tracked_categories, but for MediaWiki namespaces.
+my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all 
remote.${remotename}.namespaces"));
+chomp(@tracked_namespaces);
+
 # Import media files on pull
 my $import_media = run_git("config --get --bool 
remote.${remotename}.mediaimport");
 chomp($import_media);
@@ -256,6 +261,23 @@ sub get_mw_tracked_categories {
return;
 }
 
+sub get_mw_tracked_namespaces {
+my $pages = shift;
+foreach my $local_namespace (@tracked_namespaces) {
+my $mw_pages = $mediawiki->list( {
+action => 'query',
+list => 'allpages',
+apnamespace => get_mw_namespace_id($local_namespace),
+aplimit => 'max' } )
+|| die $mediawiki->{error}->{code} . ': '
+. $mediawiki->{error}->{details} . "\n";
+foreach my $page (@{$mw_pages}) {
+$pages->{$page->{title}} = $page;
+}
+}
+return;
+}
+
 sub get_mw_all_pages {
my $pages = shift;
# No user-provided list, get the list of pages from the API.
@@ -319,6 +341,10 @@ sub get_mw_pages {
$user_defined = 1;
get_mw_tracked_categories(\%pages);
}
+if (@tracked_namespaces) {
+$user_defined = 1;
+get_mw_tracked_namespaces(\%pages);
+}
if (!$user_defined) {
get_mw_all_pages(\%pages);
}
@@ -1263,7 +1289,6 @@ my %cached_mw_namespace_id;
 sub get_mw_namespace_id {
$mediawiki = connect_maybe($mediawiki, $remotename, $url);
my $name = shift;
-
if (!exists $namespace_id{$name}) {
# Look at configuration file, if the record for that namespace 
is
# already cached. Namespaces are stored in form:
@@ -1331,7 +1356,12 @@ sub get_mw_namespace_id {
 sub get_mw_namespace_id_for_page {
my $namespace = shift;
if ($namespace =~ /^([^:]*):/) {
-   return get_mw_namespace_id($namespace);
+   my ($ns, $id) = split(/:/, $namespace);
+   if (Scalar::Util::looks_like_number($id)) {
+   return get_mw_namespace_id($ns);
+   } else{
+   return
+   }
} else {
return;
}
-- 
2.11.0



Re: [PATCH] graph.c: visual difference on subsequent series

2015-07-27 Thread Antoine Beaupré
Any reason why this patch wasn't included / reviewed?

Thanks,

A.

On 2014-11-10 08:33:32, Antoine Beaupré wrote:
 For projects with separate history lines and, thus, multiple root-commits, the
 linear arrangement of `git log --graph --oneline` does not allow the user to
 spot where the sequence ends, giving the impression that it's a contiguous
 history. E.g.

 History sequence A: a1 -- a2 -- a3 (root-commit)
 History sequence B: b1 -- b2 -- b3 (root-commit)

 git log --graph --oneline
 * a1
 * a2
 * a3
 * b1
 * b2
 * b3

 In a GUI tool, the root-commit of each series would stand out on the graph.

 This modification changes the commit char to a different symbol ('o'), so 
 users
 of the command-line graph tool can easily identify root-commits and make sense
 of where each series is limited to.

 git log --graph --oneline
 * a1
 * a2
 o a3
 * b1
 * b2
 o b3

 The 'o' character was chosen because it is the same character used in rev-list
 to mark root commits.

 This patch is similar to the one provided by Milton Soares Filho in
 1382734287.31768.1.git.send.email.milton.soares.fi...@gmail.com but was
 implemented independently and uses the 'o' character instead of 'x'.

 Other solutions were discarded for these reasons:

  * line delimiters: we want to keep one commit per line
  * tree indentation: it makes little sense with commit trees without
common history, and is more complicated to implement

 Signed-off-by: Antoine Beaupré <anar...@koumbit.org>
 ---
  revision.c |  8 ++--
  t/t4202-log.sh | 10 +-
  t/t6016-rev-list-graph-simplify-history.sh | 14 +++---
  3 files changed, 18 insertions(+), 14 deletions(-)

 diff --git a/revision.c b/revision.c
 index 75dda92..5f21e24 100644
 --- a/revision.c
 +++ b/revision.c
 @@ -3246,8 +3246,12 @@ char *get_revision_mark(const struct rev_info *revs, 
 const struct commit *commit
   return "<";
   else
   return ">";
 - } else if (revs->graph)
 - return "*";
 + } else if (revs->graph) {
 + if (commit->parents)
 + return "*";
 + else
 + return "o";
 + }
   else if (revs->cherry_mark)
   return "+";
   return "";
 diff --git a/t/t4202-log.sh b/t/t4202-log.sh
 index 99ab7ca..d11876e 100755
 --- a/t/t4202-log.sh
 +++ b/t/t4202-log.sh
 @@ -244,7 +244,7 @@ cat  expect EOF
  * fourth
  * third
  * second
 -* initial
 +o initial
  EOF
  
  test_expect_success 'simple log --graph' '
 @@ -272,7 +272,7 @@ cat  expect \EOF
  |/
  * third
  * second
 -* initial
 +o initial
  EOF
  
  test_expect_success 'log --graph with merge' '
 @@ -338,7 +338,7 @@ cat  expect \EOF
  |
  | second
  |
 -* commit tags/side-1~3
 +o commit tags/side-1~3
Author: A U Thor aut...@example.com
  
initial
 @@ -410,7 +410,7 @@ cat  expect \EOF
  * | third
  |/
  * second
 -* initial
 +o initial
  EOF
  
  test_expect_success 'log --graph with merge' '
 @@ -799,7 +799,7 @@ cat expect \EOF
  | -one
  | +ichi
  |
 -* commit COMMIT_OBJECT_NAME
 +o commit COMMIT_OBJECT_NAME
Author: A U Thor aut...@example.com
  
initial
 diff --git a/t/t6016-rev-list-graph-simplify-history.sh 
 b/t/t6016-rev-list-graph-simplify-history.sh
 index f7181d1..74b6fc3 100755
 --- a/t/t6016-rev-list-graph-simplify-history.sh
 +++ b/t/t6016-rev-list-graph-simplify-history.sh
 @@ -81,7 +81,7 @@ test_expect_success '--graph --all' '
  	echo "|/|   " >> expected &&
  	echo "* | $A2" >> expected &&
  	echo "|/  " >> expected &&
 -	echo "* $A1" >> expected &&
 +	echo "o $A1" >> expected &&
  	git rev-list --graph --all > actual &&
  	test_cmp expected actual
  	'
 @@ -111,7 +111,7 @@ test_expect_success '--graph --simplify-by-decoration' '
  	echo "|/|   " >> expected &&
  	echo "* | $A2" >> expected &&
  	echo "|/  " >> expected &&
 -	echo "* $A1" >> expected &&
 +	echo "o $A1" >> expected &&
  	git rev-list --graph --all --simplify-by-decoration > actual &&
  	test_cmp expected actual
  	'
 @@ -139,7 +139,7 @@ test_expect_success '--graph --simplify-by-decoration prune branch B' '
  	echo "* | $A3" >> expected &&
  	echo "|/  " >> expected &&
  	echo "* $A2" >> expected &&
 -	echo "* $A1" >> expected &&
 +	echo "o $A1" >> expected &&
  	git rev-list --graph --simplify-by-decoration --all > actual &&
  	test_cmp expected actual
  	'
 @@ -156,7 +156,7 @@ test_expect_success '--graph --full-history -- bar.txt' '
  	echo "| |/  " >> expected &&
  	echo "* | $A3" >> expected &&
  	echo "|/  " >> expected &&
 -	echo "* $A2" >> expected &&
 +	echo "o $A2" >> expected &&
  	git rev-list --graph --full-history --all -- bar.txt > actual &&
  	test_cmp expected actual
  	'
 @@ -170,7 +170,7 @@ test_expect_success '--graph --full-history --simplify-merges -- bar.txt' '
  	echo "* | $A5" >> expected &&
  	echo "* | $A3" >> expected &&
   echo

[PATCH] graph.c: visual difference on subsequent series

2014-11-10 Thread Antoine Beaupré
For projects with separate history lines and, thus, multiple root-commits, the
linear arrangement of `git log --graph --oneline` does not allow the user to
spot where the sequence ends, giving the impression that it's a contiguous
history. E.g.

History sequence A: a1 -- a2 -- a3 (root-commit)
History sequence B: b1 -- b2 -- b3 (root-commit)

git log --graph --oneline
* a1
* a2
* a3
* b1
* b2
* b3

In a GUI tool, the root-commit of each series would stand out on the graph.

This modification changes the commit char to a different symbol ('o'), so users
of the command-line graph tool can easily identify root-commits and see where
each series ends.

git log --graph --oneline
* a1
* a2
o a3
* b1
* b2
o b3
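
The two layouts above can be reproduced in a scratch repository. The following is a minimal sketch and not part of the patch: the branch names `series-a` and `series-b`, the `commit` helper and the throwaway identity are illustrative assumptions. It builds two histories with no common ancestor. The graph it prints also shows abbreviated commit hashes, and the marker on `a3` and `b3` depends on whether the patch is applied.

```shell
#!/bin/sh
# Build a throwaway repository holding two unrelated histories.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical helper: create an empty commit with a fixed identity,
# so the sketch does not depend on the caller's git configuration.
commit () {
	git -c user.name="A U Thor" -c user.email="author@example.com" \
		commit -q --allow-empty -m "$1"
}

# History sequence B: b3 is its root commit.
commit b3 && commit b2 && commit b1
git branch -m series-b

# History sequence A: an orphan branch shares no ancestor with B,
# so a3 becomes a second root commit.
git checkout -q --orphan series-a
commit a3 && commit a2 && commit a1

# Show both series in one graph, one commit per line.
git log --graph --oneline --all
```

`git rev-list --max-parents=0 --all` lists the root commits directly, which gives a way to check which lines of the graph should carry the distinct marker.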

The 'o' character was chosen because it is the same character used in rev-list
to mark root commits.

This patch is similar to the one provided by Milton Soares Filho in
1382734287.31768.1.git.send.email.milton.soares.fi...@gmail.com but was
implemented independently and uses the 'o' character instead of 'x'.

Other solutions were discarded for these reasons:

 * line delimiters: we want to keep one commit per line
 * tree indentation: it makes little sense with commit trees without
   common history, and is more complicated to implement

Signed-off-by: Antoine Beaupré <anar...@koumbit.org>
---
 revision.c                                 |  8 ++++++--
 t/t4202-log.sh                             | 10 +++++-----
 t/t6016-rev-list-graph-simplify-history.sh | 14 +++++++-------
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/revision.c b/revision.c
index 75dda92..5f21e24 100644
--- a/revision.c
+++ b/revision.c
@@ -3246,8 +3246,12 @@ char *get_revision_mark(const struct rev_info *revs, const struct commit *commit
 			return "<";
 		else
 			return ">";
-	} else if (revs->graph)
-		return "*";
+	} else if (revs->graph) {
+		if (commit->parents)
+			return "*";
+		else
+			return "o";
+	}
 	else if (revs->cherry_mark)
 		return "+";
 	return "";
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 99ab7ca..d11876e 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -244,7 +244,7 @@ cat > expect <<EOF
 * fourth
 * third
 * second
-* initial
+o initial
 EOF
 
 test_expect_success 'simple log --graph' '
@@ -272,7 +272,7 @@ cat > expect <<\EOF
 |/
 * third
 * second
-* initial
+o initial
 EOF
 
 test_expect_success 'log --graph with merge' '
@@ -338,7 +338,7 @@ cat > expect <<\EOF
 |
 | second
 |
-* commit tags/side-1~3
+o commit tags/side-1~3
   Author: A U Thor <aut...@example.com>
 
   initial
@@ -410,7 +410,7 @@ cat > expect <<\EOF
 * | third
 |/
 * second
-* initial
+o initial
 EOF
 
 test_expect_success 'log --graph with merge' '
@@ -799,7 +799,7 @@ cat >expect <<\EOF
 | -one
 | +ichi
 |
-* commit COMMIT_OBJECT_NAME
+o commit COMMIT_OBJECT_NAME
   Author: A U Thor <aut...@example.com>
 
   initial
diff --git a/t/t6016-rev-list-graph-simplify-history.sh b/t/t6016-rev-list-graph-simplify-history.sh
index f7181d1..74b6fc3 100755
--- a/t/t6016-rev-list-graph-simplify-history.sh
+++ b/t/t6016-rev-list-graph-simplify-history.sh
@@ -81,7 +81,7 @@ test_expect_success '--graph --all' '
 	echo "|/|   " >> expected &&
 	echo "* | $A2" >> expected &&
 	echo "|/  " >> expected &&
-	echo "* $A1" >> expected &&
+	echo "o $A1" >> expected &&
 	git rev-list --graph --all > actual &&
 	test_cmp expected actual
 	'
@@ -111,7 +111,7 @@ test_expect_success '--graph --simplify-by-decoration' '
 	echo "|/|   " >> expected &&
 	echo "* | $A2" >> expected &&
 	echo "|/  " >> expected &&
-	echo "* $A1" >> expected &&
+	echo "o $A1" >> expected &&
 	git rev-list --graph --all --simplify-by-decoration > actual &&
 	test_cmp expected actual
 	'
@@ -139,7 +139,7 @@ test_expect_success '--graph --simplify-by-decoration prune branch B' '
 	echo "* | $A3" >> expected &&
 	echo "|/  " >> expected &&
 	echo "* $A2" >> expected &&
-	echo "* $A1" >> expected &&
+	echo "o $A1" >> expected &&
 	git rev-list --graph --simplify-by-decoration --all > actual &&
 	test_cmp expected actual
 	'
@@ -156,7 +156,7 @@ test_expect_success '--graph --full-history -- bar.txt' '
 	echo "| |/  " >> expected &&
 	echo "* | $A3" >> expected &&
 	echo "|/  " >> expected &&
-	echo "* $A2" >> expected &&
+	echo "o $A2" >> expected &&
 	git rev-list --graph --full-history --all -- bar.txt > actual &&
 	test_cmp expected actual
 	'
@@ -170,7 +170,7 @@ test_expect_success '--graph --full-history --simplify-merges -- bar.txt' '
 	echo "* | $A5" >> expected &&
 	echo "* | $A3" >> expected &&
 	echo "|/  " >> expected &&
-	echo "* $A2" >> expected &&
+	echo "o $A2" >> expected &&
 	git rev-list --graph --full-history --simplify-merges --all \
 		-- bar.txt > actual &&