[MediaWiki-commits] [Gerrit] mediawiki/core[master]: Block same-file reuploads

2017-09-25 Thread jenkins-bot (Code Review)
jenkins-bot has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/378702 )

Change subject: Block same-file reuploads
..


Block same-file reuploads

When uploading a file, there are already a few checks that detect and block
(or at least warn about) duplicate uploads.

However, it occasionally happens that a file gets uploaded twice: usually
the exact same file, submitted at (almost) exactly the same time (possibly
an error in whatever submits the file upload, but still).

Given 2 uploads at (almost) exactly the same time, both of them are stored,
even if they are the exact same file.
The later upload also ends up with a `logging` entry with `log_page = 0`.

I don't believe such an upload should go through: if a file turns out to be
an exact duplicate of something that already exists, there is no reason to
store it again.

Note that with this patch, it will become impossible to reupload a file
with the exact same hash (which was possible before).
If we still want to allow same-file reuploads while also blocking this kind
of race-condition duplicate, we could make the check stricter (e.g. also
check timestamps, or whether the page already exists, or …). A sketch of
such a stricter check follows after the approvals below.

Bug: T158480
Change-Id: I76cbd2c64c3b893997f1f85974d6f82cbfe121e1
---
M includes/filerepo/file/LocalFile.php
M maintenance/importImages.php
2 files changed, 37 insertions(+), 15 deletions(-)

Approvals:
  MarkTraceur: Looks good to me, approved
  jenkins-bot: Verified

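For illustration only (not part of this patch): a minimal sketch of the
stricter check suggested in the commit message, which would only block a
reupload when the content hash matches and the existing row was written
within a short time window, so deliberate same-file reuploads would remain
possible. The function name, parameter names and the 5-second window below
are hypothetical.

<?php
// Hypothetical helper (not in this change): only treat a reupload as a
// race-condition duplicate when the SHA-1 matches AND the existing row was
// written within a short time window. $existingTimestamp and
// $uploadTimestamp are 14-digit YYYYMMDDHHMMSS strings, as stored in
// image.img_timestamp.
function isLikelyRaceDuplicate(
    $existingSha1, $existingTimestamp, $uploadSha1, $uploadTimestamp, $windowSeconds = 5
) {
    if ( $existingSha1 !== $uploadSha1 ) {
        // Different content, so this is not the same file at all.
        return false;
    }
    $toUnix = function ( $ts ) {
        return DateTime::createFromFormat( 'YmdHis', $ts, new DateTimeZone( 'UTC' ) )
            ->getTimestamp();
    };
    // Same hash: only block it when both uploads happened within a few
    // seconds of each other, i.e. it looks like a duplicate submission.
    return abs( $toUnix( $uploadTimestamp ) - $toUnix( $existingTimestamp ) ) <= $windowSeconds;
}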


diff --git a/includes/filerepo/file/LocalFile.php b/includes/filerepo/file/LocalFile.php
index 96e7a7e..c7aa40e 100644
--- a/includes/filerepo/file/LocalFile.php
+++ b/includes/filerepo/file/LocalFile.php
@@ -1244,8 +1244,22 @@
// Once the second operation goes through, then the current version was
// updated and we must therefore update the DB too.
$oldver = $status->value;
-   if ( !$this->recordUpload2( $oldver, $comment, $pageText, $props, $timestamp, $user, $tags ) ) {
-   $status->fatal( 'filenotfound', $srcPath );
+   $uploadStatus = $this->recordUpload2(
+   $oldver,
+   $comment,
+   $pageText,
+   $props,
+   $timestamp,
+   $user,
+   $tags
+   );
+   if ( !$uploadStatus->isOK() ) {
+   if ( $uploadStatus->hasMessage( 'filenotfound' ) ) {
+   // update filenotfound error with more specific path
+   $status->fatal( 'filenotfound', $srcPath );
+   } else {
+   $status->merge( $uploadStatus );
+   }
}
}
 
@@ -1275,7 +1289,7 @@
 
$pageText = SpecialUpload::getInitialPageText( $desc, $license, $copyStatus, $source );
 
-   if ( !$this->recordUpload2( $oldver, $desc, $pageText, false, $timestamp, $user ) ) {
+   if ( !$this->recordUpload2( $oldver, $desc, $pageText, false, $timestamp, $user )->isOK() ) {
return false;
}
 
@@ -1295,7 +1309,7 @@
 * @param string|bool $timestamp
 * @param null|User $user
 * @param string[] $tags
-* @return bool
+* @return Status
 */
function recordUpload2(
$oldver, $comment, $pageText, $props = false, $timestamp = false, $user = null, $tags = []
@@ -1329,7 +1343,7 @@
if ( !$this->fileExists ) {
wfDebug( __METHOD__ . ": File " . $this->getRel() . " went missing!\n" );
 
-   return false;
+   return Status::newFatal( 'filenotfound', $this->getRel() );
}
 
$dbw->startAtomic( __METHOD__ );
@@ -1362,16 +1376,24 @@
$reupload = ( $dbw->affectedRows() == 0 );
 
if ( $reupload ) {
+   $row = $dbw->selectRow(
+   'image',
+   [ 'img_timestamp', 'img_sha1' ],
+   [ 'img_name' => $this->getName() ],
+   __METHOD__,
+   [ 'LOCK IN SHARE MODE' ]
+   );
+
+   if ( $row && $row->img_sha1 === $this->sha1 ) {
+   $dbw->endAtomic( __METHOD__ );
+   wfDebug( __METHOD__ . ": File " . $this->getRel() . " already exists!\n" );
+   $title = Title::newFromText( $this->getName(), 

[MediaWiki-commits] [Gerrit] mediawiki/core[master]: Block same-file reuploads

2017-09-18 Thread Matthias Mullie (Code Review)
Matthias Mullie has uploaded a new change for review. ( 
https://gerrit.wikimedia.org/r/378702 )

Change subject: Block same-file reuploads
..

Block same-file reuploads

When uploading a file, there are already a few checks that detect and block
(or at least warn about) duplicate uploads.

However, it occasionally happens that a file gets uploaded twice: usually
the exact same file, submitted at (almost) exactly the same time (possibly
an error in whatever submits the file upload, but still).

Given 2 uploads at (almost) exactly the same time, both of them are stored,
even if they are the exact same file.
The later upload also ends up with a `logging` entry with `log_page = 0`.

I don't believe such an upload should go through: if a file turns out to be
an exact duplicate of something that already exists, there is no reason to
store it again.

Note that with this patch, it will become impossible to reupload a file
with the exact same hash (which was possible before).
If we still want to allow same-file reuploads while also blocking this kind
of race-condition duplicate, we could make the check stricter (e.g. also
check timestamps, or whether the page already exists, or …).

Bug: T158480
Change-Id: I76cbd2c64c3b893997f1f85974d6f82cbfe121e1
---
M includes/filerepo/file/LocalFile.php
M maintenance/importImages.php
2 files changed, 29 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/core 
refs/changes/02/378702/1
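
For context on the diff below: recordUpload2() now returns a Status object
instead of a bool, and LocalFile::upload() merges that Status into its own
return value, so a blocked same-file reupload surfaces as a
'fileexists-no-change' fatal. A rough sketch of how a caller of upload()
would observe this; the argument list and the surrounding setup ($file,
$srcPath, $comment, etc.) are assumed, not taken from this change.

<?php
// Sketch only: how a caller of LocalFile::upload() would see the new
// behaviour. $file is a LocalFile; $srcPath, $comment, $pageText, $props,
// $timestamp and $user are assumed to be prepared as for any upload.
$status = $file->upload( $srcPath, $comment, $pageText, 0, $props, $timestamp, $user );
if ( !$status->isOK() && $status->hasMessage( 'fileexists-no-change' ) ) {
    // The new file is byte-identical (same SHA-1) to the current version,
    // so with this patch the upload is rejected instead of being recorded
    // a second time.
}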

diff --git a/includes/filerepo/file/LocalFile.php b/includes/filerepo/file/LocalFile.php
index 96e7a7e..9697d7e 100644
--- a/includes/filerepo/file/LocalFile.php
+++ b/includes/filerepo/file/LocalFile.php
@@ -1244,8 +1244,14 @@
// Once the second operation goes through, then the current version was
// updated and we must therefore update the DB too.
$oldver = $status->value;
-   if ( !$this->recordUpload2( $oldver, $comment, $pageText, $props, $timestamp, $user, $tags ) ) {
-   $status->fatal( 'filenotfound', $srcPath );
+   $uploadStatus = $this->recordUpload2( $oldver, $comment, $pageText, $props, $timestamp, $user, $tags );
+   if ( !$uploadStatus->isOK() ) {
+   if ( $uploadStatus->hasMessage( 'filenotfound' ) ) {
+   // update filenotfound error with more specific path
+   $status->fatal( 'filenotfound', $srcPath );
+   } else {
+   $status->merge( $uploadStatus );
+   }
}
}
 
@@ -1275,7 +1281,7 @@
 
$pageText = SpecialUpload::getInitialPageText( $desc, $license, $copyStatus, $source );
 
-   if ( !$this->recordUpload2( $oldver, $desc, $pageText, false, $timestamp, $user ) ) {
+   if ( !$this->recordUpload2( $oldver, $desc, $pageText, false, $timestamp, $user )->isOK() ) {
return false;
}
 
@@ -1295,7 +1301,7 @@
 * @param string|bool $timestamp
 * @param null|User $user
 * @param string[] $tags
-* @return bool
+* @return Status
 */
function recordUpload2(
$oldver, $comment, $pageText, $props = false, $timestamp = false, $user = null, $tags = []
@@ -1329,7 +1335,7 @@
if ( !$this->fileExists ) {
wfDebug( __METHOD__ . ": File " . $this->getRel() . " went missing!\n" );
 
-   return false;
+   return Status::newFatal( 'filenotfound', $this->getRel() );
}
 
$dbw->startAtomic( __METHOD__ );
@@ -1362,16 +1368,24 @@
$reupload = ( $dbw->affectedRows() == 0 );
 
if ( $reupload ) {
+   $row = $dbw->selectRow(
+   'image',
+   [ 'img_timestamp', 'img_sha1' ],
+   [ 'img_name' => $this->getName() ],
+   __METHOD__,
+   [ 'LOCK IN SHARE MODE' ]
+   );
+
+   if ( $row && $row->img_sha1 === $this->sha1 ) {
+   $dbw->endAtomic( __METHOD__ );
+   wfDebug( __METHOD__ . ": File " . $this->getRel() . " already exists!\n" );
+   $title = Title::newFromText( $this->getName(), NS_FILE );
+   return Status::newFatal( 'fileexists-no-change', $title->getPrefixedText() );
+   }
+
if ( $allowTimeKludge ) {
# Use LOCK IN