[MediaWiki-commits] [Gerrit] Add .gitignore and .gitreview - change (openzim)

2013-09-18 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Add .gitignore and .gitreview
..


Add .gitignore and .gitreview

Bug: 54175
Change-Id: Ic900b6a1aa3848aa59b428adef19115a9c55ac61
---
A .gitignore
A .gitreview
2 files changed, 9 insertions(+), 0 deletions(-)

Approvals:
  Reedy: Looks good to me, but someone else must approve
  Kelson: Verified; Looks good to me, approved



diff --git a/.gitignore b/.gitignore
new file mode 100644
index 000..98b092a
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,4 @@
+.svn
+*~
+*.kate-swp
+.*.swp
diff --git a/.gitreview b/.gitreview
new file mode 100644
index 000..2a01b99
--- /dev/null
+++ b/.gitreview
@@ -0,0 +1,5 @@
+[gerrit]
+host=gerrit.wikimedia.org
+port=29418
+project=openzim.git
+defaultbranch=master
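The .gitignore patterns above are shell-style globs. As a rough illustration, the sketch below checks file names against the same four patterns. It uses POSIX fnmatch(3) on the last path component, which approximates how git treats patterns that contain no slash. Git's real matcher (wildmatch) supports more syntax, such as `**` and `!` negation. `isIgnored` and `baseName` are hypothetical helpers, not part of openzim.

```cpp
#include <fnmatch.h>

#include <string>
#include <vector>

// The four patterns added to .gitignore in this change.
const std::vector<std::string> kIgnorePatterns = {".svn", "*~", "*.kate-swp", ".*.swp"};

// Hypothetical helper: last path component, e.g. "src/.a.cpp.swp" -> ".a.cpp.swp".
std::string baseName(const std::string& path) {
  const std::string::size_type slash = path.rfind('/');
  return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Hypothetical helper: approximates git's handling of slash-free patterns by
// matching each pattern against the basename with POSIX fnmatch(3).
bool isIgnored(const std::string& path) {
  const std::string name = baseName(path);
  for (const std::string& pattern : kIgnorePatterns) {
    if (fnmatch(pattern.c_str(), name.c_str(), 0) == 0) {
      return true;  // fnmatch() returns 0 on a match
    }
  }
  return false;
}
```

For example, a Vim swap file such as `.zimwriterfs.cpp.swp` matches `.*.swp`. A plain `main.swp` does not match, because that pattern requires a leading dot.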

-- 
To view, visit https://gerrit.wikimedia.org/r/84432
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ic900b6a1aa3848aa59b428adef19115a9c55ac61
Gerrit-PatchSet: 2
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Reedy re...@wikimedia.org
Gerrit-Reviewer: Kelson kel...@kiwix.org
Gerrit-Reviewer: Reedy re...@wikimedia.org

___
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits


[MediaWiki-commits] [Gerrit] Fix mobile/desktop view choices sustainability on 3 parts d... - change (mediawiki...MobileFrontend)

2014-03-09 Thread Kelson (Code Review)
Kelson has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/117699

Change subject: Fix mobile/desktop view choices sustainability on 3 parts domain names (like wikimedia.org.uk) #54885
..

Fix mobile/desktop view choices sustainability on 3 parts domain names (like wikimedia.org.uk) #54885

Change-Id: I4cc4faf6c6f80571a65483299e04c596a7c1a5f8
---
M includes/MobileContext.php
1 file changed, 9 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/MobileFrontend 
refs/changes/99/117699/1

diff --git a/includes/MobileContext.php b/includes/MobileContext.php
index b65140d..ad19f00 100644
--- a/includes/MobileContext.php
+++ b/includes/MobileContext.php
@@ -493,11 +493,7 @@
$host = $parsedUrl['host'];
// Validates value as IP address
if ( !IP::isValid( $host ) ) {
-   $domainParts = explode( '.', $host );
-   $domainParts = array_reverse( $domainParts );
-   // Although some browsers will accept cookies without the initial ., » RFC 2109 requires it to be included.
-   wfProfileOut( __METHOD__ );
-   return count( $domainParts ) >= 2 ? '.' . $domainParts[1] . '.' . $domainParts[0] : $host;
+   return substr( $host, strpos( $host, "." ) );
}
wfProfileOut( __METHOD__ );
return $host;
@@ -511,10 +507,15 @@
 * @return string
 */
public function getStopMobileRedirectCookieDomain() {
-   global $wgMFStopRedirectCookieHost;
+   global $wgMFStopRedirectCookieHost, $wgMFCookieDomain;
+   $host = $this->getRequest()->getHeader( 'Host' );
 
-   if ( !$wgMFStopRedirectCookieHost ) {
-   $wgMFStopRedirectCookieHost = $this->getBaseDomain();
+   if ( !$wgMFStopRedirectCookieHost || $wgMFStopRedirectCookieHost != $host ) {
+   if ( $wgMFCookieDomain && ( strpos( $host, $wgMFCookieDomain ) !== false )) {
+   $wgMFStopRedirectCookieHost = $wgMFCookieDomain;
+   } else {
+   $wgMFStopRedirectCookieHost = $this->getBaseDomain();
+   }
}
 
return $wgMFStopRedirectCookieHost;
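To make the behaviour change concrete, here is a C++ sketch (not MediaWiki code) of the two getBaseDomain() strategies in this patch and of the inner branch of getStopMobileRedirectCookieDomain(). It assumes the host is not an IP address, which the PHP code checks first with IP::isValid(). It also leaves out the profiling calls and the caching in $wgMFStopRedirectCookieHost. All function names are hypothetical.

```cpp
#include <string>

// Old rule (removed lines): keep the last two labels, e.g. ".org.uk".
std::string baseDomainLastTwoLabels(const std::string& host) {
  const std::string::size_type last = host.rfind('.');
  if (last == std::string::npos) {
    return host;  // a single label such as "localhost" is returned unchanged
  }
  const std::string::size_type prev =
      last == 0 ? std::string::npos : host.rfind('.', last - 1);
  // No earlier dot means exactly two labels, e.g. "example.org".
  const std::string::size_type start = prev == std::string::npos ? 0 : prev + 1;
  return "." + host.substr(start);
}

// New rule (added line): drop the first label, keeping the dot that followed it.
std::string baseDomainDropFirstLabel(const std::string& host) {
  const std::string::size_type first = host.find('.');
  // In PHP, strpos() returns false here and substr($host, false) yields the whole host.
  return first == std::string::npos ? host : host.substr(first);
}

// Inner branch of getStopMobileRedirectCookieDomain(): prefer the configured
// cookie domain when the request host contains it, else use the base domain.
std::string chooseCookieDomain(const std::string& host, const std::string& configuredDomain) {
  if (!configuredDomain.empty() && host.find(configuredDomain) != std::string::npos) {
    return configuredDomain;
  }
  return baseDomainDropFirstLabel(host);
}
```

With a three-part suffix such as wikimedia.org.uk, the old rule yields `.org.uk`, a public suffix that browsers generally refuse to set cookies for. The new rule yields `.wikimedia.org.uk`. For a mobile host like en.m.wikipedia.org, however, the new rule yields `.m.wikipedia.org`. A configured $wgMFCookieDomain is presumably what lets the cookie span both the mobile and desktop hosts in that case.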

-- 
To view, visit https://gerrit.wikimedia.org/r/117699

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4cc4faf6c6f80571a65483299e04c596a7c1a5f8
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/MobileFrontend
Gerrit-Branch: master
Gerrit-Owner: Kelson kel...@kiwix.org



[MediaWiki-commits] [Gerrit] Allow to backup globalimagelinks table, T87571 - change (operations/dumps)

2015-03-28 Thread Kelson (Code Review)
Kelson has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/200313

Change subject: Allow to backup globalimagelinks table, T87571
..

Allow to backup globalimagelinks table, T87571

Change-Id: I17c40250c39f20c8981b6e840a8240e38bae43b8
---
M xmldumps-backup/README.config
M xmldumps-backup/WikiDump.py
M xmldumps-backup/worker.py
3 files changed, 13 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/dumps 
refs/changes/13/200313/1

diff --git a/xmldumps-backup/README.config b/xmldumps-backup/README.config
index 41a8699..03f763f 100644
--- a/xmldumps-backup/README.config
+++ b/xmldumps-backup/README.config
@@ -62,6 +62,9 @@
 wikidatalist-- File with list of databases which act as a wikibase
repo. For Wikimedia projects this currently consists
of the project 'wikidata'.
+globalusagelist -- File with list of databases which act as a media
+   repo with the GlobalUsage extension. For Wikimedia projects
+   this currently consists of the project 'commons'.
 biglist -- File with list of large wikis for which no history dumps are 
generated because they are too huge. (This must be an old 
deprecated option; these days we do not care how big they 
diff --git a/xmldumps-backup/WikiDump.py b/xmldumps-backup/WikiDump.py
index 6d0775c..7c7d624 100644
--- a/xmldumps-backup/WikiDump.py
+++ b/xmldumps-backup/WikiDump.py
@@ -176,6 +176,7 @@
"privatelist": "",
"flaggedrevslist": "",
"wikidatalist": "",
+   "globalusagelist": "",
 #  "dir": "",
"forcenormal": "0",
"halt": "0",
@@ -316,6 +317,7 @@
self.privateList = MiscUtils.dbList(self.conf.get("wiki", "privatelist"))
self.flaggedRevsList = MiscUtils.dbList(self.conf.get("wiki", "flaggedrevslist"))
self.wikidataList = MiscUtils.dbList(self.conf.get("wiki", "wikidatalist"))
+   self.globalUsageList = MiscUtils.dbList(self.conf.get("wiki", "globalusagelist"))
self.wikiDir = self.conf.get("wiki", "dir")
self.forceNormal = self.conf.getint("wiki", "forcenormal")
self.halt = self.conf.getint("wiki", "halt")
@@ -489,6 +491,9 @@
 
def hasWikidata(self):
return self.dbName in self.config.wikidataList
+
+   def hasGlobalUsage(self):
+   return self.dbName in self.config.globalUsageList

def isLocked(self):
return os.path.exists(self.lockFile())
diff --git a/xmldumps-backup/worker.py b/xmldumps-backup/worker.py
index c8a4be0..8ec96c0 100644
--- a/xmldumps-backup/worker.py
+++ b/xmldumps-backup/worker.py
@@ -558,6 +558,7 @@
self.wiki = wiki
self._hasFlaggedRevs = self.wiki.hasFlaggedRevs()
self._hasWikidata = self.wiki.hasWikidata()
+   self._hasGlobalUsage = self.wiki.hasGlobalUsage()
self._prefetch = prefetch
self._spawn = spawn
self.chunkInfo = chunkInfo
@@ -679,6 +680,10 @@
self.dumpItems.append(
PublicTable( "sites", "sitestable","This contains the SiteMatrix information from meta.wikimedia.org provided as a table." ))
 
+   if self._hasGlobalUsage:
+   self.dumpItems.append(
+   PublicTable( "globalimagelinks", "globalimagelinks","Global wiki media/files usage records." ))
+
self.dumpItems.append(
BigXmlDump("meta-history",
   "metahistorybz2dump",
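The new globalusagelist option works like the existing wikidatalist. A file of database names is read into a list, and a wiki gets the extra globalimagelinks table dump only if its dbName is in that list. A C++ sketch of that membership logic follows. `parseDbList` is a hypothetical stand-in for MiscUtils.dbList(). Its parsing rules here (one name per line, whitespace trimmed, blank lines skipped) are an assumption, and `commonswiki` is only an example value.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-in for MiscUtils.dbList(); assumes one database name per
// line, trims surrounding whitespace and skips blank lines.
std::set<std::string> parseDbList(std::istream& in) {
  const char* const whitespace = " \t\r\n";
  std::set<std::string> names;
  std::string line;
  while (std::getline(in, line)) {
    const std::string::size_type begin = line.find_first_not_of(whitespace);
    if (begin == std::string::npos) {
      continue;  // blank line
    }
    const std::string::size_type end = line.find_last_not_of(whitespace);
    names.insert(line.substr(begin, end - begin + 1));
  }
  return names;
}

// Convenience overload for list contents already held in memory.
std::set<std::string> parseDbList(const std::string& text) {
  std::istringstream in(text);
  return parseDbList(in);
}

// Mirrors hasGlobalUsage(): a wiki qualifies when its dbName is listed.
bool hasGlobalUsage(const std::string& dbName, const std::set<std::string>& globalUsageList) {
  return globalUsageList.count(dbName) > 0;
}
```

A std::set is used here for the lookups. The Python `in` test behaves the same way on whatever sequence dbList() returns.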

-- 
To view, visit https://gerrit.wikimedia.org/r/200313

Gerrit-MessageType: newchange
Gerrit-Change-Id: I17c40250c39f20c8981b6e840a8240e38bae43b8
Gerrit-PatchSet: 1
Gerrit-Project: operations/dumps
Gerrit-Branch: ariel
Gerrit-Owner: Kelson kel...@kiwix.org



[MediaWiki-commits] [Gerrit] Fixed wrong libmagic path - change (openzim)

2015-06-07 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Fixed wrong libmagic path
..


Fixed wrong libmagic path

Change-Id: Ic7674a62530abde447252fd6c0b693a8d741443e
---
M zimwriterfs/macosx-build.sh
1 file changed, 1 insertion(+), 2 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/macosx-build.sh b/zimwriterfs/macosx-build.sh
index 6f2e3bc..ae38a39 100755
--- a/zimwriterfs/macosx-build.sh
+++ b/zimwriterfs/macosx-build.sh
@@ -10,10 +10,9 @@
 LIBZIM_DIR=${ZIM_DIR}/build/lib
 LZMA_DIR=${KIWIX_ROOT}/src/dependencies/xz/build
 LIBLZMA_DIR=${LZMA_DIR}/lib
-MAGIC_DIR=/usr/local
+MAGIC_DIR=/usr/local/Cellar/libmagic/5.22_1
 LIBMAGIC_DIR=${MAGIC_DIR}/lib
 STATIC_LDFLAGS="${LIBZIM_DIR}/libzim.a ${LIBLZMA_DIR}/liblzma.a ${LIBMAGIC_DIR}/libmagic.a -lz"
-#LDFLAGS="-L${KIWIX_ROOT}/src/dependencies/zimlib-1.2/build/lib/ -lzim -L${KIWIX_ROOT}/src/dependencies/xz/build/lib/ -llzma -L/usr/local/Cellar/libmagic/5.22_1/lib/ -lmagic -lz"
 
 CC="clang -O3"
 CXX="clang++ -O3"

-- 
To view, visit https://gerrit.wikimedia.org/r/216530

Gerrit-MessageType: merged
Gerrit-Change-Id: Ic7674a62530abde447252fd6c0b693a8d741443e
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Rgaudin rgau...@gmail.com
Gerrit-Reviewer: Kelson kel...@kiwix.org



[MediaWiki-commits] [Gerrit] Fixed compilation on OSX. made a script to compile both stat... - change (openzim)

2015-06-06 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Fixed compilation on OSX. made a script to compile both static 
and shared
..


Fixed compilation on OSX. made a script to compile both static and shared

Change-Id: I9dd7b7039988be13f932bdb3e4b30c5bcf2e6783
---
M zimwriterfs/README
A zimwriterfs/macosx-build.sh
2 files changed, 79 insertions(+), 12 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/README b/zimwriterfs/README
index c1d2b10..d8d6a83 100644
--- a/zimwriterfs/README
+++ b/zimwriterfs/README
@@ -32,16 +32,10 @@
 OSX compilation
 
 
-Until the autotool is configured properly to support flexible options,
-you can still compile and use zimwriterfs with some manual work.
+On Mac OS X, a script helps you build zimwriterfs both statically and dynamically.
+You must have a working, fully set-up Kiwix repository (with dependencies ready).
 
-1. Have a working Kiwix dev environment (up to make inside src/dependencies)
-2. Install libmagic: brew install libmagic
-3. zimwriterfs compilation
-./autogen.sh
-LDFLAGS="-L$KIWIX_ROOT/src/dependencies/zimlib-1.1/build/lib/ -L/usr/local/Cellar/libmagic/5.14/lib/ -L$KIWIX_ROOT/src/dependencies/xz/build/lib/" CXXFLAGS="-I $KIWIX_ROOT/src/kiwix/src/dependencies/zimlib-1.1/include/ -I/usr/local/Cellar/libmagic/5.14/include/ -I$KIWIX_ROOT/src/kiwix/src/dependencies/xz/src/liblzma/lzma/ -I$KIWIX_ROOT/src/kiwix/src/dependencies/xz/src/liblzma/" ./configure && make
-4. Copy libs if not in LIBRARY_PATH
-ln -s $KIWIX_ROOT/src/dependencies/zimlib-1.1/build/lib/libzim.dylib .
-ln -s $KIWIX_ROOT/src/dependencies/xz/build/lib/liblzma.dylib .
-5. Use it
-./zimwriterfs
\ No newline at end of file
+1. Install libmagic with Homebrew (required)
+   - ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
+   - brew install libmagic
+2. KIWIX_ROOT=/Users/xxx/src/kiwix ./macosx-build.sh
diff --git a/zimwriterfs/macosx-build.sh b/zimwriterfs/macosx-build.sh
new file mode 100755
index 000..6f2e3bc
--- /dev/null
+++ b/zimwriterfs/macosx-build.sh
@@ -0,0 +1,73 @@
+#!/bin/bash
+
+if [ "x$KIWIX_ROOT" = "x" ];
+   then
+   echo "You must set the environment variable KIWIX_ROOT to the root of the Kiwix git repository. Exiting."
+   exit 1
+fi
+
+ZIM_DIR=${KIWIX_ROOT}/src/dependencies/zimlib-1.2
+LIBZIM_DIR=${ZIM_DIR}/build/lib
+LZMA_DIR=${KIWIX_ROOT}/src/dependencies/xz/build
+LIBLZMA_DIR=${LZMA_DIR}/lib
+MAGIC_DIR=/usr/local
+LIBMAGIC_DIR=${MAGIC_DIR}/lib
+STATIC_LDFLAGS="${LIBZIM_DIR}/libzim.a ${LIBLZMA_DIR}/liblzma.a ${LIBMAGIC_DIR}/libmagic.a -lz"
+#LDFLAGS="-L${KIWIX_ROOT}/src/dependencies/zimlib-1.2/build/lib/ -lzim -L${KIWIX_ROOT}/src/dependencies/xz/build/lib/ -llzma -L/usr/local/Cellar/libmagic/5.22_1/lib/ -lmagic -lz"
+
+CC="clang -O3"
+CXX="clang++ -O3"
+CXXFLAGS="-Igumbo -I${ZIM_DIR}/include -I${MAGIC_DIR}/include/ -I${LZMA_DIR}/include"
+CFLAGS=$CXXFLAGS
+LDFLAGS="-L. -lzim -llzma -lmagic -lz"
+SHARED_OUTPUT=zimwriterfs-shared
+STATIC_OUTPUT=zimwriterfs-static
+
+function compile {
+   $CXX $CXXFLAGS -c zimwriterfs.cpp
+   $CC $CFLAGS -c gumbo/utf8.c
+   $CC $CFLAGS -c gumbo/string_buffer.c
+   $CC $CFLAGS -c gumbo/parser.c
+   $CC $CFLAGS -c gumbo/error.c
+   $CC $CFLAGS -c gumbo/string_piece.c
+   $CC $CFLAGS -c gumbo/tag.c
+   $CC $CFLAGS -c gumbo/vector.c
+   $CC $CFLAGS -c gumbo/tokenizer.c
+   $CC $CFLAGS -c gumbo/util.c
+   $CC $CFLAGS -c gumbo/char_ref.c
+   $CC $CFLAGS -c gumbo/attribute.c
+}
+
+echo Compiling zimwriterfs for OSX as static then shared.
+
+# remove object files
+echo "Clean-up repository (*.o, zimwriterfs-*, *.dylib)"
+rm *.o ${STATIC_OUTPUT} ${SHARED_OUTPUT} *.dylib
+
+# compile source code
+echo Compile source code file objects
+compile
+
+# link statically
+echo Link statically into ${STATIC_OUTPUT}
+$CXX $CXXFLAGS $STATIC_LDFLAGS -o ${STATIC_OUTPUT} *.o
+
+# copy dylib to current folder
+echo Copy dylibs into the current folder
+cp -v ${KIWIX_ROOT}/src/dependencies/zimlib-1.2/build/lib/libzim.dylib .
+cp -v ${KIWIX_ROOT}/src/dependencies/xz/build/lib/liblzma.dylib .
+cp -v ${LIBMAGIC_DIR}/libmagic.dylib .
+chmod 644 ./libmagic.dylib
+
+# link dynamically
+echo Link dynamically into ${SHARED_OUTPUT}
+$CXX $CXXFLAGS $LDFLAGS -o zimwriterfs-shared *.o
+
+echo Fix install name tool on ${SHARED_OUTPUT}
+install_name_tool -change ${LIBZIM_DIR}/libzim.0.dylib libzim.dylib ${SHARED_OUTPUT}
+install_name_tool -change ${LIBLZMA_DIR}/liblzma.5.dylib liblzma.dylib ${SHARED_OUTPUT}
+install_name_tool -change ${LIBMAGIC_DIR}/libmagic.1.dylib libmagic.dylib ${SHARED_OUTPUT}
+otool -L ${SHARED_OUTPUT}
+
+ls -lh ${STATIC_OUTPUT}
+ls -lh ${SHARED_OUTPUT}

-- 
To view, visit https://gerrit.wikimedia.org/r/216507

Gerrit-MessageType: merged
Gerrit-Change-Id: 

[MediaWiki-commits] [Gerrit] Move few utility functions to a separate module. - change (openzim)

2016-06-22 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Move few utility functions to a separate module.
..


Move few utility functions to a separate module.

Change-Id: Ia26754d14f0cd6c557626675beb5a7c5fe2cadaa
---
M zimwriterfs/Makefile.am
A zimwriterfs/tools.cpp
A zimwriterfs/tools.h
M zimwriterfs/zimwriterfs.cpp
4 files changed, 578 insertions(+), 481 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/Makefile.am b/zimwriterfs/Makefile.am
index 0caa3c8..ea2ab7a 100644
--- a/zimwriterfs/Makefile.am
+++ b/zimwriterfs/Makefile.am
@@ -1,3 +1,6 @@
 AUTOMAKE_OPTIONS=subdir-objects
 bin_PROGRAMS=zimwriterfs
-zimwriterfs_SOURCES=zimwriterfs.cpp
+
+zimwriterfs_SOURCES= \
+zimwriterfs.cpp \
+tools.cpp
diff --git a/zimwriterfs/tools.cpp b/zimwriterfs/tools.cpp
new file mode 100644
index 000..019b22c
--- /dev/null
+++ b/zimwriterfs/tools.cpp
@@ -0,0 +1,525 @@
+/*
+ * Copyright 2013-2016 Emmanuel Engelhart 
+ * Copyright 2016 Matthieu Gautier 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU  General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+ * MA 02110-1301, USA.
+ */
+
+#include "tools.h"
+
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef _WIN32
+#define SEPARATOR "\\"
+#else
+#define SEPARATOR "/"
+#endif
+
+
+/* Init file extensions hash */
+static std::map<std::string, std::string> _create_extMimeTypes(){
+  std::map<std::string, std::string> extMimeTypes;
+  extMimeTypes["HTML"] = "text/html";
+  extMimeTypes["html"] = "text/html";
+  extMimeTypes["HTM"] = "text/html";
+  extMimeTypes["htm"] = "text/html";
+  extMimeTypes["PNG"] = "image/png";
+  extMimeTypes["png"] = "image/png";
+  extMimeTypes["TIFF"] = "image/tiff";
+  extMimeTypes["tiff"] = "image/tiff";
+  extMimeTypes["TIF"] = "image/tiff";
+  extMimeTypes["tif"] = "image/tiff";
+  extMimeTypes["JPEG"] = "image/jpeg";
+  extMimeTypes["jpeg"] = "image/jpeg";
+  extMimeTypes["JPG"] = "image/jpeg";
+  extMimeTypes["jpg"] = "image/jpeg";
+  extMimeTypes["GIF"] = "image/gif";
+  extMimeTypes["gif"] = "image/gif";
+  extMimeTypes["SVG"] = "image/svg+xml";
+  extMimeTypes["svg"] = "image/svg+xml";
+  extMimeTypes["TXT"] = "text/plain";
+  extMimeTypes["txt"] = "text/plain";
+  extMimeTypes["XML"] = "text/xml";
+  extMimeTypes["xml"] = "text/xml";
+  extMimeTypes["EPUB"] = "application/epub+zip";
+  extMimeTypes["epub"] = "application/epub+zip";
+  extMimeTypes["PDF"] = "application/pdf";
+  extMimeTypes["pdf"] = "application/pdf";
+  extMimeTypes["OGG"] = "application/ogg";
+  extMimeTypes["ogg"] = "application/ogg";
+  extMimeTypes["JS"] = "application/javascript";
+  extMimeTypes["js"] = "application/javascript";
+  extMimeTypes["JSON"] = "application/json";
+  extMimeTypes["json"] = "application/json";
+  extMimeTypes["CSS"] = "text/css";
+  extMimeTypes["css"] = "text/css";
+  extMimeTypes["otf"] = "application/vnd.ms-opentype";
+  extMimeTypes["OTF"] = "application/vnd.ms-opentype";
+  extMimeTypes["eot"] = "application/vnd.ms-fontobject";
+  extMimeTypes["EOT"] = "application/vnd.ms-fontobject";
+  extMimeTypes["ttf"] = "application/font-ttf";
+  extMimeTypes["TTF"] = "application/font-ttf";
+  extMimeTypes["woff"] = "application/font-woff";
+  extMimeTypes["WOFF"] = "application/font-woff";
+  extMimeTypes["vtt"] = "text/vtt";
+  extMimeTypes["VTT"] = "text/vtt";
+  
+  return extMimeTypes;
+}
+
+static std::map<std::string, std::string> extMimeTypes = _create_extMimeTypes();
+
+static std::map<std::string, std::string> fileMimeTypes;
+
+
+extern std::string directoryPath;
+extern bool inflateHtmlFlag;
+extern bool uniqueNamespace;
+extern magic_t magic;
+
+/* Decompress an STL string using zlib and return the original data. */
+inline std::string inflateString(const std::string& str) {
+  z_stream zs; // z_stream is zlib's control structure
+  memset(&zs, 0, sizeof(zs));
+
+  if (inflateInit(&zs) != Z_OK)
+throw(std::runtime_error("inflateInit failed while decompressing."));
+
+  zs.next_in = (Bytef*)str.data();
+  zs.avail_in = str.size();
+
+  int ret;
+  char outbuffer[32768];
+  std::string outstring;
+
+  // get the decompressed bytes blockwise using repeated calls to inflate
+  do {
+zs.next_out 
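The extension table in tools.cpp lists every extension twice, once per case. A hypothetical alternative, sketched below, stores lowercase keys only and lowercases the extension at lookup time, so mixed-case names such as `logo.Png` also resolve. The sketch covers only a subset of the table. It returns a caller-supplied fallback for unknown extensions. zimwriterfs, by contrast, also declares a libmagic handle (`extern magic_t magic`) that it may use for such files. `mimeTypeForExtension` is not an existing zimwriterfs function.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Lowercase keys only (a subset of the table in tools.cpp).
static const std::map<std::string, std::string> kExtMimeTypes = {
    {"html", "text/html"},  {"htm", "text/html"},   {"png", "image/png"},
    {"jpg", "image/jpeg"},  {"jpeg", "image/jpeg"}, {"svg", "image/svg+xml"},
    {"css", "text/css"},    {"js", "application/javascript"},
    {"json", "application/json"},
};

// Hypothetical helper: MIME type for a file name's extension, compared
// case-insensitively; returns `fallback` when there is no known extension.
std::string mimeTypeForExtension(const std::string& filename,
                                 const std::string& fallback = "application/octet-stream") {
  const std::string::size_type dot = filename.rfind('.');
  const std::string::size_type slash = filename.rfind('/');
  if (dot == std::string::npos || (slash != std::string::npos && dot < slash)) {
    return fallback;  // no extension in the last path component
  }
  std::string ext = filename.substr(dot + 1);
  std::transform(ext.begin(), ext.end(), ext.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  const auto it = kExtMimeTypes.find(ext);
  return it == kExtMimeTypes.end() ? fallback : it->second;
}
```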

[MediaWiki-commits] [Gerrit] Move article's related stuffs in article.(h|cpp). - change (openzim)

2016-06-22 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Move article's related stuffs in article.(h|cpp).
..


Move article's related stuffs in article.(h|cpp).

Change-Id: I2a257ea1a0a13eca0748b444838a525666a9090d
---
M zimwriterfs/Makefile.am
A zimwriterfs/article.cpp
A zimwriterfs/article.h
M zimwriterfs/zimwriterfs.cpp
4 files changed, 253 insertions(+), 199 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/Makefile.am b/zimwriterfs/Makefile.am
index ea2ab7a..3383e35 100644
--- a/zimwriterfs/Makefile.am
+++ b/zimwriterfs/Makefile.am
@@ -3,4 +3,5 @@
 
 zimwriterfs_SOURCES= \
 zimwriterfs.cpp \
-tools.cpp
+tools.cpp \
+article.cpp
diff --git a/zimwriterfs/article.cpp b/zimwriterfs/article.cpp
new file mode 100644
index 000..f743cde
--- /dev/null
+++ b/zimwriterfs/article.cpp
@@ -0,0 +1,158 @@
+/*
+ * Copyright 2013-2016 Emmanuel Engelhart 
+ * Copyright 2016 Matthieu Gautier 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU  General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+ * MA 02110-1301, USA.
+ */
+
+#include "article.h"
+#include "tools.h"
+
+
+extern std::string directoryPath;
+
+Article::Article(const std::string& path, const bool detectRedirects) {
+  invalid = false;
+
+  /* aid */
+  aid = path.substr(directoryPath.size()+1);
+
+  /* url */
+  url = aid;
+
+  /* mime-type */
+  mimeType = getMimeTypeForFile(aid);
+  
+  /* namespace */
+  ns = getNamespaceForMimeType(mimeType)[0];
+
+  /* HTML specific code */
+  if (mimeType.find("text/html") != std::string::npos) {
+std::size_t found;
+std::string html = getFileContent(path);
+GumboOutput* output = gumbo_parse(html.c_str());
+GumboNode* root = output->root;
+
+/* Search the content of the <title> tag in the HTML */
+if (root->type == GUMBO_NODE_ELEMENT && root->v.element.children.length >= 2) {
+  const GumboVector* root_children = &root->v.element.children;
+  GumboNode* head = NULL;
+  for (int i = 0; i < root_children->length; ++i) {
+   GumboNode* child = (GumboNode*)(root_children->data[i]);
+   if (child->type == GUMBO_NODE_ELEMENT &&
+   child->v.element.tag == GUMBO_TAG_HEAD) {
+ head = child;
+ break;
+   }
+  }
+
+  if (head != NULL) {
+   GumboVector* head_children = &head->v.element.children;
+   for (int i = 0; i < head_children->length; ++i) {
+ GumboNode* child = (GumboNode*)(head_children->data[i]);
+ if (child->type == GUMBO_NODE_ELEMENT &&
+ child->v.element.tag == GUMBO_TAG_TITLE) {
+   if (child->v.element.children.length == 1) {
+ GumboNode* title_text = (GumboNode*)(child->v.element.children.data[0]);
+ if (title_text->type == GUMBO_NODE_TEXT) {
+   title = title_text->v.text.text;
+ }
+   }
+ }
+   }
+
+   /* Detect if this is a redirection (if no redirects CSV specified) */
+   std::string targetUrl;
+   try {
+ targetUrl = detectRedirects ? extractRedirectUrlFromHtml(head_children) : "";
+   } catch (std::string &error) {
+ std::cerr << error << std::endl;
+   }
+   if (!targetUrl.empty()) {
+ redirectAid = computeAbsolutePath(aid, decodeUrl(targetUrl));
+ if (!fileExists(directoryPath + "/" + redirectAid)) {
+   redirectAid.clear();
+   invalid = true;
+ }
+   }
+  }
+
+  /* If no title, then compute one from the filename */
+  if (title.empty()) {
+   found = path.rfind("/");
+   if (found != std::string::npos) {
+ title = path.substr(found+1);
+ found = title.rfind(".");
+ if (found!=std::string::npos) {
+   title = title.substr(0, found);
+ }
+   } else {
+ title = path;
+   }
+   std::replace(title.begin(), title.end(), '_',  ' ');
+  }
+}
+
+gumbo_destroy_output(&kGumboDefaultOptions, output);
+  }
+}
+
+std::string Article::getAid() const
+{
+  return aid;
+}
+
+bool Article::isInvalid() const
+{
+  return invalid;
+}
+
+char Article::getNamespace() const
+{
+  return ns;
+}
+
+std::string Article::getUrl() const
+{
+  return url;
+}
+
+std::string Article::getTitle() const
+{
+  return title;
+}
+
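When a page has no usable <title>, the constructor derives one from the file name. The same steps are isolated below as a standalone function so they can be checked directly. `titleFromPath` is a hypothetical name. The body follows the fallback branch shown above, including its quirk that a path without any '/' keeps its extension.

```cpp
#include <algorithm>
#include <string>

// Mirrors the "If no title" fallback in Article::Article(): take the last path
// component, drop the final extension, and turn underscores into spaces.
std::string titleFromPath(const std::string& path) {
  std::string title;
  std::string::size_type found = path.rfind('/');
  if (found != std::string::npos) {
    title = path.substr(found + 1);
    found = title.rfind('.');
    if (found != std::string::npos) {
      title = title.substr(0, found);  // only the last extension is removed
    }
  } else {
    // As in the original, a path without '/' is used verbatim (extension kept).
    title = path;
  }
  std::replace(title.begin(), title.end(), '_', ' ');
  return title;
}
```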

[MediaWiki-commits] [Gerrit] Move articleSource's related stuffs in articlesource.(h|cpp). - change (openzim)

2016-06-22 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Move articleSource's related stuffs in articlesource.(h|cpp).
..


Move articleSource's related stuffs in articlesource.(h|cpp).

Change-Id: Iee91484679bf401a693af1ca7e1c7e34f2c741d0
---
M zimwriterfs/Makefile.am
A zimwriterfs/articlesource.cpp
A zimwriterfs/articlesource.h
M zimwriterfs/zimwriterfs.cpp
4 files changed, 305 insertions(+), 229 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/Makefile.am b/zimwriterfs/Makefile.am
index 3383e35..6e46553 100644
--- a/zimwriterfs/Makefile.am
+++ b/zimwriterfs/Makefile.am
@@ -4,4 +4,5 @@
 zimwriterfs_SOURCES= \
 zimwriterfs.cpp \
 tools.cpp \
-article.cpp
+article.cpp \
+articlesource.cpp
diff --git a/zimwriterfs/articlesource.cpp b/zimwriterfs/articlesource.cpp
new file mode 100644
index 000..8b0b34c
--- /dev/null
+++ b/zimwriterfs/articlesource.cpp
@@ -0,0 +1,256 @@
+/*
+ * Copyright 2013-2016 Emmanuel Engelhart 
+ * Copyright 2016 Matthieu Gautier 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU  General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+ * MA 02110-1301, USA.
+ */
+
+#include "articlesource.h"
+#include "article.h"
+#include "tools.h"
+
+#include 
+
+#include 
+#include 
+#include 
+
+bool popFromFilenameQueue(std::string &);
+bool isVerbose();
+
+extern std::string welcome;
+extern std::string language;
+extern std::string creator;
+extern std::string publisher;
+extern std::string title;
+extern std::string description;
+extern std::string directoryPath;
+
+std::map counters;
+char *data = NULL;
+unsigned int dataSize = 0;
+
+
+
+ArticleSource::ArticleSource() {
+  /* Prepare metadata */
+  metadataQueue.push("Language");
+  metadataQueue.push("Publisher");
+  metadataQueue.push("Creator");
+  metadataQueue.push("Title");
+  metadataQueue.push("Description");
+  metadataQueue.push("Date");
+  metadataQueue.push("Favicon");
+  metadataQueue.push("Counter");
+}
+
+void ArticleSource::init_redirectsQueue_from_file(const std::string& path){
+std::ifstream in_stream;
+std::string line;
+
+in_stream.open(path.c_str());
+while (std::getline(in_stream, line)) {
+  redirectsQueue.push(line);
+}
+in_stream.close();
+}
+
+std::string ArticleSource::getMainPage() {
+  return welcome;
+}
+
+Article *article = NULL;
+const zim::writer::Article* ArticleSource::getNextArticle() {
+  std::string path;
+
+  if (article != NULL) {
+delete(article);
+  }
+
+  if (!metadataQueue.empty()) {
+path = metadataQueue.front();
+metadataQueue.pop();
+article = new MetadataArticle(path);
+  } else if (!redirectsQueue.empty()) {
+std::string line = redirectsQueue.front();
+redirectsQueue.pop();
+article = new RedirectArticle(line);
+  } else if (popFromFilenameQueue(path)) {
+do {
+  article = new Article(path);
+} while (article && article->isInvalid() && popFromFilenameQueue(path));
+  } else {
+article = NULL;
+  }
+
+  /* Count mimetypes */
+  if (article != NULL && !article->isRedirect()) {
+
+if (isVerbose())
+  std::cout << "Creating entry for " << article->getAid() << std::endl;
+
+std::string mimeType = article->getMimeType();
+if (counters.find(mimeType) == counters.end()) {
+  counters[mimeType] = 1;
+} else {
+  counters[mimeType]++;
+}
+  }
+
+  return article;
+}
+
+zim::Blob ArticleSource::getData(const std::string& aid) {
+
+  if (isVerbose())
+std::cout << "Packing data for " << aid << std::endl;
+
+  if (data != NULL) {
+delete(data);
+data = NULL;
+  }
+
+  if (aid.substr(0, 3) == "/M/") {
+std::string value; 
+
+if ( aid == "/M/Language") {
+  value = language;
+} else if (aid == "/M/Creator") {
+  value = creator;
+} else if (aid == "/M/Publisher") {
+  value = publisher;
+} else if (aid == "/M/Title") {
+  value = title;
+} else if (aid == "/M/Description") {
+  value = description;
+} else if ( aid == "/M/Date") {
+  time_t t = time(0);
+  struct tm * now = localtime( & t );
+  std::stringstream stream;
+  stream << (now->tm_year + 1900) << '-' 
+<< std::setw(2) << std::setfill('0') << 
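The excerpt cuts off inside the "/M/Date" branch, but the visible part builds a zero-padded date from `tm_year + 1900`. The sketch below assumes the value is a YYYY-MM-DD string, the format the ZIM metadata convention uses for Date. It also shows the per-MIME-type counting from getNextArticle(). Both functions are hypothetical. `unsigned int` as the counter type is an assumption, because the template arguments of `counters` were lost from this copy.

```cpp
#include <ctime>
#include <iomanip>
#include <map>
#include <sstream>
#include <string>

// Formats a broken-down time as YYYY-MM-DD with zero-padded month and day.
std::string formatZimDate(const std::tm& now) {
  std::ostringstream stream;
  stream << (now.tm_year + 1900) << '-'
         << std::setw(2) << std::setfill('0') << (now.tm_mon + 1) << '-'
         << std::setw(2) << std::setfill('0') << now.tm_mday;
  return stream.str();
}

// Increments the counter for one MIME type; a missing entry starts at 0.
void countMimeType(std::map<std::string, unsigned int>& counters, const std::string& mimeType) {
  ++counters[mimeType];
}
```

Because `std::map::operator[]` value-initializes a missing entry to zero, a single `++counters[mimeType]` does the same job as the find()/else branch in the original.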

[MediaWiki-commits] [Gerrit] Use a Queue object to handle threadsafe access to a queue. - change (openzim)

2016-06-22 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Use a Queue object to handle threadsafe access to a queue.
..


Use a Queue object to handle threadsafe access to a queue.

By using a Queue object we avoid the declaration of popFromFilenameQueue
function in articlesource.cpp.

Change-Id: Ic685f7e22e4ce95f6e0eb280f65809fd0dff1a6a
---
M zimwriterfs/articlesource.cpp
M zimwriterfs/articlesource.h
A zimwriterfs/queue.h
M zimwriterfs/zimwriterfs.cpp
4 files changed, 116 insertions(+), 55 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/articlesource.cpp b/zimwriterfs/articlesource.cpp
index 8b0b34c..6cf5f30 100644
--- a/zimwriterfs/articlesource.cpp
+++ b/zimwriterfs/articlesource.cpp
@@ -28,7 +28,6 @@
 #include 
 #include 
 
-bool popFromFilenameQueue(std::string &);
 bool isVerbose();
 
 extern std::string welcome;
@@ -44,8 +43,9 @@
 unsigned int dataSize = 0;
 
 
-
-ArticleSource::ArticleSource() {
+ArticleSource::ArticleSource(Queue<std::string>& filenameQueue):
+filenameQueue(filenameQueue)
+{
   /* Prepare metadata */
   metadataQueue.push("Language");
   metadataQueue.push("Publisher");
@@ -88,10 +88,10 @@
 std::string line = redirectsQueue.front();
 redirectsQueue.pop();
 article = new RedirectArticle(line);
-  } else if (popFromFilenameQueue(path)) {
+  } else if (filenameQueue.popFromQueue(path)) {
 do {
   article = new Article(path);
-} while (article && article->isInvalid() && popFromFilenameQueue(path));
+} while (article && article->isInvalid() && filenameQueue.popFromQueue(path));
   } else {
 article = NULL;
   }
diff --git a/zimwriterfs/articlesource.h b/zimwriterfs/articlesource.h
index adbdbda..1ad6524 100644
--- a/zimwriterfs/articlesource.h
+++ b/zimwriterfs/articlesource.h
@@ -24,12 +24,13 @@
 #include 
 #include 
 #include 
+#include "queue.h"
 
 #include 
 
 class ArticleSource : public zim::writer::ArticleSource {
   public:
-explicit ArticleSource();
+explicit ArticleSource(Queue<std::string>& filenameQueue);
 virtual const zim::writer::Article* getNextArticle();
 virtual zim::Blob getData(const std::string& aid);
 virtual std::string getMainPage();
@@ -39,6 +40,7 @@
   private:
 std::queue<std::string> metadataQueue;
 std::queue<std::string> redirectsQueue;
+Queue<std::string>& filenameQueue;
 };
 
 #endif //OPENZIM_ZIMWRITERFS_ARTICLESOURCE_H
diff --git a/zimwriterfs/queue.h b/zimwriterfs/queue.h
new file mode 100644
index 000..d177568
--- /dev/null
+++ b/zimwriterfs/queue.h
@@ -0,0 +1,88 @@
+/*
+ * Copyright 2016 Matthieu Gautier 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU  General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+ * MA 02110-1301, USA.
+ */
+
+#ifndef OPENZIM_ZIMWRITERFS_QUEUE_H
+#define OPENZIM_ZIMWRITERFS_QUEUE_H
+
+#define MAX_QUEUE_SIZE 100
+
+#include 
+#include 
+
+template<typename T>
+class Queue {
+public:
+Queue() {pthread_mutex_init(&m_queueMutex,NULL);};
+virtual ~Queue() {pthread_mutex_destroy(&m_queueMutex);};
+virtual bool isEmpty();
+virtual void pushToQueue(const T& element);
+virtual bool popFromQueue(T& element);
+
+protected:
+std::queue<T>   m_realQueue;
+pthread_mutex_t m_queueMutex;
+
+private:
+// Make this queue non copyable
+Queue(const Queue&);
+Queue& operator=(const Queue&);
+};
+
+template<typename T>
+bool Queue<T>::isEmpty() {
+pthread_mutex_lock(&m_queueMutex);
+bool retVal = m_realQueue.empty();
+pthread_mutex_unlock(&m_queueMutex);
+return retVal;
+}
+
+template<typename T>
+void Queue<T>::pushToQueue(const T& element) {
+unsigned int wait = 0;
+unsigned int queueSize = 0;
+
+do {
+usleep(wait);
+pthread_mutex_lock(&m_queueMutex);
+queueSize = m_realQueue.size();
+pthread_mutex_unlock(&m_queueMutex);
+wait += 10;
+} while (queueSize > MAX_QUEUE_SIZE);
+
+pthread_mutex_lock(&m_queueMutex);
+m_realQueue.push(element);
+pthread_mutex_unlock(&m_queueMutex);
+}
+
+template<typename T>
+bool Queue<T>::popFromQueue(T& element) {
+pthread_mutex_lock(&m_queueMutex);
+if (m_realQueue.empty()) {
+pthread_mutex_unlock(&m_queueMutex);
+return false;
+}
+
+element = m_realQueue.front();
+m_realQueue.pop();
+pthread_mutex_unlock(&m_queueMutex);
+
+  return true;
+}
+
+#endif // 
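
The `Queue` above hands file names to `ArticleSource` under a pthread mutex, and `pushToQueue()` throttles the producer by sleeping in a loop while the queue is over `MAX_QUEUE_SIZE`. The sketch below shows the same bounded producer/consumer handoff using C++11 `std::mutex` and `std::condition_variable`, so a blocked producer wakes as soon as a slot frees up instead of polling with `usleep()`. It is an illustration, not part of this change: `BoundedQueue` and `countProducedItems` are hypothetical names, and the non-blocking `popFromQueue()` mirrors the patch's behavior of returning `false` when the queue is empty.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Hypothetical bounded, thread-safe FIFO (not the Queue<T> from the patch).
template <typename T>
class BoundedQueue {
 public:
  explicit BoundedQueue(std::size_t maxSize) : maxSize_(maxSize) {
    assert(maxSize_ > 0);
  }

  // Non-copyable, like Queue<T>.
  BoundedQueue(const BoundedQueue&) = delete;
  BoundedQueue& operator=(const BoundedQueue&) = delete;

  bool isEmpty() {
    std::lock_guard<std::mutex> lock(mutex_);
    return queue_.empty();
  }

  // Blocks while the queue already holds maxSize_ elements.
  void pushToQueue(const T& element) {
    std::unique_lock<std::mutex> lock(mutex_);
    notFull_.wait(lock, [this] { return queue_.size() < maxSize_; });
    queue_.push(element);
  }

  // Never blocks: returns false if the queue is empty, otherwise copies
  // the front element into `element`, removes it and wakes one producer.
  bool popFromQueue(T& element) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (queue_.empty()) return false;
    element = queue_.front();
    queue_.pop();
    notFull_.notify_one();
    return true;
  }

 private:
  const std::size_t maxSize_;
  std::queue<T> queue_;
  std::mutex mutex_;
  std::condition_variable notFull_;
};

// Hypothetical demo: one thread produces `n` names, the caller consumes
// them, and the number of consumed items is returned.
std::size_t countProducedItems(int n) {
  BoundedQueue<std::string> queue(100);
  std::atomic<bool> done(false);
  std::thread producer([&] {
    for (int i = 0; i < n; ++i) queue.pushToQueue("file" + std::to_string(i));
    done = true;
  });

  std::size_t consumed = 0;
  std::string name;
  // Stop only once the producer has finished and the queue is drained.
  while (!done || !queue.isEmpty()) {
    if (queue.popFromQueue(name)) ++consumed;
    else std::this_thread::yield();
  }
  producer.join();
  return consumed;
}
```

One design difference worth noting: the patch checks the size under the mutex, releases it, and re-locks to push, so with more than one producer the queue can briefly exceed its limit; here the check and the push happen under a single lock.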

[MediaWiki-commits] [Gerrit] Compress blobs and track file size immediately after adding ... - change (openzim)

2016-06-22 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Compress blobs and track file size immediately after adding each article.
..


Compress blobs and track file size immediately after adding each article.

Rather than first collecting all the directory entries and only afterwards
writing the blobs, write each cluster on the fly as we see each article.
Keep track of both compressed and uncompressed clusters so that we don't
needlessly terminate compressed clusters just because we happen to have
encountered an uncompressible file.  Account for additions to each of
the various indices as we go so that we maintain a fairly accurate
size for the file at every point, which will allow us to stop adding
articles once the ZIM file gets to a certain size.
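
The strategy described above can be modelled in a few dozen lines. This is a simplified sketch rather than the zimlib code: `SketchArticle`, `SketchWriter`, the 16-byte per-article index cost and the cluster size limit are hypothetical, and "writing" a cluster only adds its raw byte count to the running total instead of compressing it. The starting overhead of 80 + 1 + 16 bytes is taken from the patch. What it shows is that compressible and uncompressible articles go into separate open clusters, so an uncompressible file never forces the compressed cluster to be closed early, and the size estimate is updated after every article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an article: its payload and whether it is
// worth compressing.
struct SketchArticle {
  std::string data;
  bool shouldCompress;
};

// Hypothetical writer that keeps one open compressed and one open
// uncompressed cluster and updates its size estimate after every article.
class SketchWriter {
 public:
  explicit SketchWriter(std::size_t clusterLimit) : clusterLimit_(clusterLimit) {
    assert(clusterLimit_ > 0);
  }

  void add(const SketchArticle& article) {
    std::vector<std::string>& cluster = article.shouldCompress ? comp_ : uncomp_;
    std::size_t& bytes = article.shouldCompress ? compBytes_ : uncompBytes_;
    cluster.push_back(article.data);
    bytes += article.data.size();
    currentSize_ += kIndexEntryBytes;  // per-article index cost (assumed)
    // Only the cluster this article went into can have filled up.
    if (bytes >= clusterLimit_) flush(cluster, bytes);
  }

  // Writes out whatever is left in both open clusters.
  void finish() {
    if (!comp_.empty()) flush(comp_, compBytes_);
    if (!uncomp_.empty()) flush(uncomp_, uncompBytes_);
  }

  std::uint64_t currentSize() const { return currentSize_; }
  int clustersWritten() const { return clustersWritten_; }

 private:
  // Stand-in for writing a cluster: a real writer would compress the
  // compressed cluster first; here the raw byte count is added instead.
  void flush(std::vector<std::string>& cluster, std::size_t& bytes) {
    currentSize_ += bytes;
    cluster.clear();
    bytes = 0;
    ++clustersWritten_;
  }

  static constexpr std::uint64_t kIndexEntryBytes = 16;  // assumed
  const std::size_t clusterLimit_;
  std::vector<std::string> comp_, uncomp_;
  std::size_t compBytes_ = 0, uncompBytes_ = 0;
  // Initial overhead from the patch: header (80), MIME-type table
  // terminator (1) and MD5 checksum (16).
  std::uint64_t currentSize_ = 80 + 1 + 16;
  int clustersWritten_ = 0;
};
```

After each `add()`, a caller could compare `currentSize()` against a target and stop feeding articles, which is the use the patch's `getCurrentSize()` is meant to enable.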

Change-Id: Ib644fff4cb804320a07aadbea499c8416df66adc
---
M zimlib/include/zim/writer/zimcreator.h
M zimlib/src/zimcreator.cpp
2 files changed, 126 insertions(+), 86 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/writer/zimcreator.h b/zimlib/include/zim/writer/zimcreator.h
index 5134d98..4296371 100644
--- a/zimlib/include/zim/writer/zimcreator.h
+++ b/zimlib/include/zim/writer/zimcreator.h
@@ -52,10 +52,10 @@
 CompressionType compression;
 bool isEmpty;
 offset_type clustersSize;
+offset_type currentSize;
 
-void createDirents(ArticleSource& src);
+void createDirents(ArticleSource& src, const std::string& tmpfname);
 void createTitleIndex(ArticleSource& src);
-void createClusters(ArticleSource& src, const std::string& tmpfname);
 void fillHeader(ArticleSource& src);
 void write(const std::string& fname, const std::string& tmpfname);
 
@@ -84,6 +84,10 @@
 void setMinChunkSize(int s)   { minChunkSize = s; }
 
 void create(const std::string& fname, ArticleSource& src);
+
+/* The user can query `currentSize` after each article has been
+ * added to the ZIM file. */
+offset_type getCurrentSize() { return currentSize; }
 };
 
   }
diff --git a/zimlib/src/zimcreator.cpp b/zimlib/src/zimcreator.cpp
index 66ce902..ac2720b 100644
--- a/zimlib/src/zimcreator.cpp
+++ b/zimlib/src/zimcreator.cpp
@@ -56,28 +56,30 @@
   : minChunkSize(1024-64),
 nextMimeIdx(0),
 #ifdef ENABLE_LZMA
-compression(zimcompLzma)
+compression(zimcompLzma),
 #elif ENABLE_BZIP2
-compression(zimcompBzip2)
+compression(zimcompBzip2),
 #elif ENABLE_ZLIB
-compression(zimcompZip)
+compression(zimcompZip),
 #else
-compression(zimcompNone)
+compression(zimcompNone),
 #endif
+currentSize(0)
 {
 }
 
 ZimCreator::ZimCreator(int& argc, char* argv[])
   : nextMimeIdx(0),
 #ifdef ENABLE_LZMA
-compression(zimcompLzma)
+compression(zimcompLzma),
 #elif ENABLE_BZIP2
-compression(zimcompBzip2)
+compression(zimcompBzip2),
 #elif ENABLE_ZLIB
-compression(zimcompZip)
+compression(zimcompZip),
 #else
-compression(zimcompNone)
+compression(zimcompNone),
 #endif
+currentSize(0)
 {
   Arg minChunkSizeArg(argc, argv, "--min-chunk-size");
   if (minChunkSizeArg.isSet())
@@ -110,15 +112,12 @@
   log_debug("basename " << basename);
 
   INFO("create directory entries");
-  createDirents(src);
+  createDirents(src, basename + ".tmp");
   INFO(dirents.size() << " directory entries created");
 
   INFO("create title index");
   createTitleIndex(src);
   INFO(dirents.size() << " title index created");
-
-  INFO("create clusters");
-  createClusters(src, basename + ".tmp");
   INFO(clusterOffsets.size() << " clusters created");
 
   INFO("fill header");
@@ -132,9 +131,23 @@
   INFO("ready");
 }
 
-void ZimCreator::createDirents(ArticleSource& src)
+void ZimCreator::createDirents(ArticleSource& src, const std::string& tmpfname)
 {
   INFO("collect articles");
+  std::ofstream out(tmpfname.c_str());
+  currentSize =
+80 /* for header */ +
+1 /* for mime type table termination */ +
+16 /* for md5sum */;
+
+  // We keep both a "compressed cluster" and an "uncompressed cluster"
+  // because we don't know which one will fill up first.  We also need
+  // to track the dirents currently in each, so we can fix up the
+  // cluster index if the other one ends up written first.
+  DirentsType compDirents, uncompDirents;
+  Cluster compCluster, uncompCluster;
+  compCluster.setCompression(compression);
+  uncompCluster.setCompression(zimcompNone);
 
   const Article* article;
   while ((article = src.getNextArticle()) != 0)
@@ -163,13 +176,107 @@
 }
 else
 {
+  uint16_t oldMimeIdx = nextMimeIdx;
   

[MediaWiki-commits] [Gerrit] Update build instructions for MacOS X. - change (openzim)

2016-06-21 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Update build instructions for MacOS X.
..


Update build instructions for MacOS X.

Change-Id: I0190ada3bf02c42ed34cb6d253516a9c662357e1
---
M zimlib/README.md
M zimwriterfs/README.md
2 files changed, 31 insertions(+), 2 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/README.md b/zimlib/README.md
index ce35ed9..9aef82c 100644
--- a/zimlib/README.md
+++ b/zimlib/README.md
@@ -16,5 +16,24 @@
 make
 ```
 
+OSX compilation
+---
+On MacOSX, you'll need to install some packages from
+[homebrew](http://brew.sh/):
+```
+brew update
+brew install xz libmagic
+```
+You'll also want to add `/usr/local/include` to your search path,
+for example:
+```
+./autogen.sh
+./configure CFLAGS=-I/usr/local/include CXXFLAGS=-I/usr/local/include
+make
+```
+
+License
+---
+
 The `zimlib` library is released under the GPLv2 license
 terms.
diff --git a/zimwriterfs/README.md b/zimwriterfs/README.md
index 8695643..1cbb7dc 100644
--- a/zimwriterfs/README.md
+++ b/zimwriterfs/README.md
@@ -44,9 +44,19 @@
 ```
 
 OSX compilation
-
+---
+OSX builds are similar to Linux, except we use homebrew.  Change to
+`../zimlib` and build zimlib as instructed in the README there.  Then
+return here and:
+```
+brew install gumbo-parser
+./autogen.sh
+./configure CXXFLAGS="-I../zimlib/include -I/usr/local/include" LDFLAGS=-L../zimlib/src/.libs
+make
+```
 
-On MaxOSX, a script helps you build zimwriterfs both statically and dynamically.
+Alternatively, there is a script included here to help you build both
+static and dynamic binaries for `zimwriterfs`.
 You must have a working and set up Kiwix repository (with dependencies ready).
 
 1. Install libmagic with brew (it's important)

-- 
To view, visit https://gerrit.wikimedia.org/r/295382
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I0190ada3bf02c42ed34cb6d253516a9c662357e1
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Cscott 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] Include files from srcdir; built files in builddir. - change (openzim)

2016-06-19 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Include files from srcdir; built files in builddir.
..


Include files from srcdir; built files in builddir.

Change-Id: I84f097567b944601534cdbc3313216c0364211bd
---
M zimlib/examples/Makefile.am
M zimlib/src/Makefile.am
M zimlib/src/tools/Makefile.am
M zimlib/test/Makefile.am
M zimreader/src/Makefile.am
M zimwriterdb/src/Makefile.am
M zimwriterdb/test/Makefile.am
7 files changed, 7 insertions(+), 7 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/examples/Makefile.am b/zimlib/examples/Makefile.am
index 13ac63f..3157cf0 100644
--- a/zimlib/examples/Makefile.am
+++ b/zimlib/examples/Makefile.am
@@ -1,4 +1,4 @@
-AM_CPPFLAGS=-I$(top_builddir)/include
+AM_CPPFLAGS=-I$(top_srcdir)/include
 noinst_PROGRAMS = createZimExample
 createZimExample_SOURCES = createZimExample.cpp
 LDADD = $(top_builddir)/src/libzim.la
diff --git a/zimlib/src/Makefile.am b/zimlib/src/Makefile.am
index 823a3b6..c0bd632 100644
--- a/zimlib/src/Makefile.am
+++ b/zimlib/src/Makefile.am
@@ -1,4 +1,4 @@
-AM_CPPFLAGS=-I$(top_builddir)/include
+AM_CPPFLAGS=-I$(top_srcdir)/include
 
 lib_LTLIBRARIES = libzim.la
 
diff --git a/zimlib/src/tools/Makefile.am b/zimlib/src/tools/Makefile.am
index a50b7c8..410504a 100644
--- a/zimlib/src/tools/Makefile.am
+++ b/zimlib/src/tools/Makefile.am
@@ -1,4 +1,4 @@
-AM_CPPFLAGS=-I$(top_builddir)/include
+AM_CPPFLAGS=-I$(top_srcdir)/include -I$(top_srcdir)/src
 if MAKE_BENCHMARK
   ZIMBENCH = zimbench
 endif
diff --git a/zimlib/test/Makefile.am b/zimlib/test/Makefile.am
index 34dad12..29dd7bd 100644
--- a/zimlib/test/Makefile.am
+++ b/zimlib/test/Makefile.am
@@ -1,4 +1,4 @@
-AM_CPPFLAGS=-I$(top_builddir)/include
+AM_CPPFLAGS=-I$(top_srcdir)/include
 
 noinst_PROGRAMS = zimlib-test
 
diff --git a/zimreader/src/Makefile.am b/zimreader/src/Makefile.am
index ff40f97..b7fc747 100644
--- a/zimreader/src/Makefile.am
+++ b/zimreader/src/Makefile.am
@@ -10,7 +10,7 @@
 .css.cpp:
ecppc $(ECPPFLAGS) -b $(ECPPFLAGS_CSS) $<
 
-AM_CPPFLAGS=-I$(top_builddir)/include
+AM_CPPFLAGS=-I$(top_srcdir)/include
 
 bin_PROGRAMS = zimreader
 
diff --git a/zimwriterdb/src/Makefile.am b/zimwriterdb/src/Makefile.am
index 46b0fac..854f2cb 100644
--- a/zimwriterdb/src/Makefile.am
+++ b/zimwriterdb/src/Makefile.am
@@ -1,6 +1,6 @@
 bin_PROGRAMS = zimwriterdb zimindexer zimcreatorsearch wikizim
 
-AM_CPPFLAGS=-I$(top_builddir)/include
+AM_CPPFLAGS=-I$(top_srcdir)/include
 
 # wikizim
 #
diff --git a/zimwriterdb/test/Makefile.am b/zimwriterdb/test/Makefile.am
index fffe02a..52262b2 100644
--- a/zimwriterdb/test/Makefile.am
+++ b/zimwriterdb/test/Makefile.am
@@ -6,4 +6,4 @@
 createzim_t_LDFLAGS = -lcxxtools -lzim
 createzim_t_SOURCES = createzim-t.cpp
 
-AM_CPPFLAGS=-I$(top_builddir)/include
+AM_CPPFLAGS=-I$(top_srcdir)/include

-- 
To view, visit https://gerrit.wikimedia.org/r/295111
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I84f097567b944601534cdbc3313216c0364211bd
Gerrit-PatchSet: 2
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Cscott 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] Minor wording and typo fixes to the zimwriterfs README. - change (openzim)

2016-06-18 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Minor wording and typo fixes to the zimwriterfs README.
..


Minor wording and typo fixes to the zimwriterfs README.

Change-Id: Ia408f755b38f079f3063a6f0bc02ae95519f5e1c
---
M zimwriterfs/README
1 file changed, 7 insertions(+), 7 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/README b/zimwriterfs/README
index 91ea637..811d4a8 100644
--- a/zimwriterfs/README
+++ b/zimwriterfs/README
@@ -1,15 +1,15 @@
-zimwriterfs is a console tool to create ZIM (http://www.openzim.org)
-files from a localy stored directory containing a "self-sufficient"
-HTML content (with pictures, javascript, stylesheets). The result will
+`zimwriterfs` is a console tool to create [ZIM](http://www.openzim.org)
+files from a locally-stored directory containing "self-sufficient"
+HTML content (with pictures, javascript, and stylesheets). The result will
 contain all the files of the local directory compressed and merged in
 the ZIM file. Nothing more, nothing less. The generated file can be
-open with a ZIM reader, Kiwix (http://www.kiwix.org) for example, but
-you have other one (http://openzim.org/wiki/ZIM_Readers).
+opened with a ZIM reader; [Kiwix](http://www.kiwix.org) is one example, but
+there are [others](http://openzim.org/wiki/ZIM_Readers).
 
-zimwriterfs works for now only on POSIX compatible systems, you simply
+`zimwriterfs` works for now only on POSIX-compatible systems, you simply
 need to compile it and run it. The software does not need a lot of
 resources, but if you create a pretty big ZIM files, then it could
-take a wile to complete.
+take a while to complete.
 
 GNU/Linux compilation
 -

-- 
To view, visit https://gerrit.wikimedia.org/r/295032
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia408f755b38f079f3063a6f0bc02ae95519f5e1c
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Cscott 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] Move zim::writer::ArticleSource::getData() to zim::writer::A... - change (openzim)

2016-06-23 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Move zim::writer::ArticleSource::getData() to zim::writer::Article::getData().
..


Move zim::writer::ArticleSource::getData() to zim::writer::Article::getData().

When building a ZIM file incrementally, it makes more sense to provide
the data upfront as soon as you return the Article object from
ArticleSource::getNextArticle.  This is the way that the Category
object already works.
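
The compatibility shim this change introduces (a default `Article::getData()` that falls back to the old `ArticleSource::getData(aid)` through a source pointer stored on each article, apparently filled in by `ZimCreator`, which is declared a friend for that purpose) can be modelled as below. The classes are simplified, hypothetical stand-ins: `Article`, `Source`, `Creator` and the two example clients are not zimlib types, and the old-API fallback throws `std::logic_error` where the patch throws a string literal.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

class Source;

// Hypothetical stand-in for zim::writer::Article.
class Article {
 public:
  virtual ~Article() = default;
  virtual std::string getAid() const = 0;
  // New API. The default keeps old clients working by asking the source.
  virtual std::string getData() const;

 private:
  mutable Source* source_ = nullptr;  // filled in by the writer
  friend class Creator;
};

// Hypothetical stand-in for zim::writer::ArticleSource.
class Source {
 public:
  virtual ~Source() = default;
  virtual const Article* getNextArticle() = 0;
  // Old API: no longer pure virtual, so new clients need not implement it.
  virtual std::string getData(const std::string& /*aid*/) {
    throw std::logic_error("This should not be called");
  }
};

std::string Article::getData() const {
  assert(source_ != nullptr);
  return source_->getData(getAid());
}

// Hypothetical writer: records the source on each article, then reads its data.
class Creator {
 public:
  static std::string collect(Source& src) {
    std::string all;
    while (const Article* article = src.getNextArticle()) {
      article->source_ = &src;  // allowed because source_ is mutable
      all += article->getData();
    }
    return all;
  }
};

// Old-style client: only the source knows the data, keyed by aid.
struct OldArticle : Article {
  std::string aid;
  std::string getAid() const override { return aid; }
};
struct OldSource : Source {
  std::vector<OldArticle> articles;
  std::size_t next = 0;
  const Article* getNextArticle() override {
    return next < articles.size() ? &articles[next++] : nullptr;
  }
  std::string getData(const std::string& aid) override { return "<" + aid + ">"; }
};

// New-style client: each article carries its own data.
struct NewArticle : Article {
  std::string aid, data;
  std::string getAid() const override { return aid; }
  std::string getData() const override { return data; }
};
struct NewSource : Source {
  std::vector<NewArticle> articles;
  std::size_t next = 0;
  const Article* getNextArticle() override {
    return next < articles.size() ? &articles[next++] : nullptr;
  }
};
```

New clients only override `Article::getData()`; old clients keep working because the writer sets the source pointer before it asks an article for its data.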

Change-Id: I78f2a69ae3931cc43a51cdab360468e13fcc54cb
---
M zimlib/examples/createZimExample.cpp
M zimlib/include/zim/writer/articlesource.h
M zimlib/src/articlesource.cpp
M zimlib/src/zimcreator.cpp
4 files changed, 54 insertions(+), 14 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/examples/createZimExample.cpp b/zimlib/examples/createZimExample.cpp
index 914000c..95b8d09 100644
--- a/zimlib/examples/createZimExample.cpp
+++ b/zimlib/examples/createZimExample.cpp
@@ -40,7 +40,7 @@
 virtual std::string getMimeType() const;
 virtual std::string getRedirectAid() const;
 
-zim::Blob data()
+virtual zim::Blob getData() const
 { return zim::Blob(&_data[0], _data.size()); }
 };
 
@@ -96,7 +96,6 @@
 explicit TestArticleSource(unsigned max = 16);
 
 virtual const zim::writer::Article* getNextArticle();
-virtual zim::Blob getData(const std::string& aid);
 };
 
 TestArticleSource::TestArticleSource(unsigned max)
@@ -119,14 +118,6 @@
   unsigned n = _next++;
 
   return &_articles[n];
-}
-
-zim::Blob TestArticleSource::getData(const std::string& aid)
-{
-  unsigned n;
-  std::istringstream s(aid);
-  s >> n;
-  return _articles[n-1].data();
 }
 
 int main(int argc, char* argv[])
diff --git a/zimlib/include/zim/writer/articlesource.h b/zimlib/include/zim/writer/articlesource.h
index 1fda337..94ee91b 100644
--- a/zimlib/include/zim/writer/articlesource.h
+++ b/zimlib/include/zim/writer/articlesource.h
@@ -20,16 +20,16 @@
 #ifndef ZIM_WRITER_ARTICLESOURCE_H
 #define ZIM_WRITER_ARTICLESOURCE_H
 
+#include 
 #include 
 #include 
 #include 
 
 namespace zim
 {
-  class Blob;
-
   namespace writer
   {
+class ArticleSource;
 class Article
 {
   public:
@@ -45,9 +45,26 @@
 virtual bool shouldCompress() const;
 virtual std::string getRedirectAid() const;
 virtual std::string getParameter() const;
+/* Ideally this method should be pure virtual,
+ * but for compatibility reasons, provide a default implementation
+ * using the old ArticleSource::getData.
+ */
+virtual Blob getData() const;
 
 // returns the next category id, to which the article is assigned to
 virtual std::string getNextCategory();
+
+  //
+  /* For API compatibility.
+   * The default Article::getData calls ArticleSource::getData,
+   * so store the article's source in the article to let the default,
+   * API-compatible function do its job.
+   * This should be removed once all users switch to the new API.
+   */
+  private:
+mutable ArticleSource*  __source;
+friend class ZimCreator;
+  //
 };
 
 class Category
@@ -63,7 +80,6 @@
   public:
 virtual void setFilename(const std::string& fname) { }
 virtual const Article* getNextArticle() = 0;
-virtual Blob getData(const std::string& aid) = 0;
 virtual Uuid getUuid();
 virtual std::string getMainPage();
 virtual std::string getLayoutPage();
@@ -73,6 +89,16 @@
 // ids. Using this list, the writer fetches the category data using
 // this method.
 virtual Category* getCategory(const std::string& cid);
+
+/**/
+/* For API compatibility.
+ * The default Article::getData calls ArticleSource::getData,
+ * so keep getData. Do not make it pure virtual, because we want new
+ * code not to use it.
+ * This should be removed once all users switch to the new API.
+ */
+virtual Blob getData(const std::string& aid) { throw "This should not be called"; };
+/**/
 };
 
   }
diff --git a/zimlib/src/articlesource.cpp b/zimlib/src/articlesource.cpp
index 4d1ec91..cc72ae2 100644
--- a/zimlib/src/articlesource.cpp
+++ b/zimlib/src/articlesource.cpp
@@ -17,6 +17,7 @@
  *
  */
 
+#include 
 #include 
 
 namespace zim
@@ -68,6 +69,19 @@
   return std::string();
 }
 
+/**/
+/* For API compatibility.
+ * The default Article::getData calls ArticleSource::getData.
+ * This should be 

[MediaWiki-commits] [Gerrit] Readd call to stripTitleInvalidChars. - change (openzim)

2016-06-23 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Readd call to stripTitleInvalidChars.
..


Readd call to stripTitleInvalidChars.

Commit 1c969dd removed the call to stripTitleInvalidChars in Article.
This was an error made while rebasing changes. Fix it.

Change-Id: I001bcdce81db27a359703221c929620e543b0a32
---
M zimwriterfs/article.cpp
M zimwriterfs/tools.h
2 files changed, 2 insertions(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/article.cpp b/zimwriterfs/article.cpp
index f743cde..4aeb083 100644
--- a/zimwriterfs/article.cpp
+++ b/zimwriterfs/article.cpp
@@ -69,6 +69,7 @@
  GumboNode* title_text = (GumboNode*)(child->v.element.children.data[0]);
  if (title_text->type == GUMBO_NODE_TEXT) {
title = title_text->v.text.text;
+   stripTitleInvalidChars(title);
  }
}
  }
diff --git a/zimwriterfs/tools.h b/zimwriterfs/tools.h
index ec6b454..8b43da4 100644
--- a/zimwriterfs/tools.h
+++ b/zimwriterfs/tools.h
@@ -40,6 +40,7 @@
 
 void replaceStringInPlaceOnce(std::string& subject, const std::string& search, const std::string& replace);
 void replaceStringInPlace(std::string& subject, const std::string& search, const std::string& replace);
+void stripTitleInvalidChars(std::string & str);
 
 std::string extractRedirectUrlFromHtml(const GumboVector* head_children);
 void getLinks(GumboNode* node, std::map );

-- 
To view, visit https://gerrit.wikimedia.org/r/295633
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I001bcdce81db27a359703221c929620e543b0a32
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] Update zimwriterfs makefiles. - change (openzim)

2016-06-18 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Update zimwriterfs makefiles.
..


Update zimwriterfs makefiles.

Change-Id: Ic0f93e19eb3e07d86c115bd15f47ef5fbc74f954
---
M zimwriterfs/Makefile.am
M zimwriterfs/README.md
M zimwriterfs/configure.ac
3 files changed, 22 insertions(+), 49 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/Makefile.am b/zimwriterfs/Makefile.am
index 308149d..e52da96 100644
--- a/zimwriterfs/Makefile.am
+++ b/zimwriterfs/Makefile.am
@@ -1,5 +1,3 @@
 AUTOMAKE_OPTIONS=subdir-objects
 bin_PROGRAMS=zimwriterfs
 zimwriterfs_SOURCES= zimwriterfs.cpp gumbo/utf8.c gumbo/string_buffer.c gumbo/parser.c gumbo/error.c gumbo/string_piece.c gumbo/tag.c gumbo/vector.c gumbo/tokenizer.c gumbo/util.c gumbo/char_ref.c gumbo/attribute.c
-zimwriterfs_CXXFLAGS=$(LIBZIM_CFLAGS) $(LIBLZMA_CFLAGS) $(LIBZ_CFLAGS) $(LIBMAGIC_CFLAGS) $(LIBPTHREAD_CFLAGS) $(CFLAGS) $(CXXFLAGS)
-zimwriterfs_LDFLAGS=$(LIBZIM_LDFLAGS) $(LIBLZMA_LDFLAGS) $(LIBZ_LDFLAGS) $(LIBMAGIC_LDFLAGS) $(LIBPTHREAD_LDFLAGS)
\ No newline at end of file
diff --git a/zimwriterfs/README.md b/zimwriterfs/README.md
index 797f6fb..db8a01d 100644
--- a/zimwriterfs/README.md
+++ b/zimwriterfs/README.md
@@ -29,10 +29,16 @@
   packaged), resp. for the mimeType detection
 * libz (http://www.zlib.net/), resp. for unpack compressed HTML files
 
+On Debian, you can ensure these are installed with:
+```
+sudo apt-get install liblzma-dev libmagic-dev zlib1g-dev
+cd ../zimlib && ./autogen.sh && ./configure && make && cd ../zimwriterfs
+```
+
 Once the dependencies are in place, to build:
 ```
 ./autogen.sh
-./configure
+./configure CXXFLAGS=-I../zimlib/include LDFLAGS=-L../zimlib/src/.libs
 make
 ```
 
diff --git a/zimwriterfs/configure.ac b/zimwriterfs/configure.ac
index 5a01142..9c80493 100644
--- a/zimwriterfs/configure.ac
+++ b/zimwriterfs/configure.ac
@@ -33,70 +33,39 @@
   AC_MSG_ERROR([[cannot find pkg-config]])
 fi
 
-# Check if the liblzma is available
+# Set up CXXFLAGS/LDFLAGS and ensure they are substituted
+AC_ARG_VAR(CXXFLAGS, [C++ compiler flags])
+AC_ARG_VAR(LDFLAGS, linker flags)
+CFLAGS="-O3 -std=gnu99 -std=c99 $CFLAGS"
+CXXFLAGS="-O3 -Igumbo $CXXFLAGS"
+
+# Check if the liblzma library is available
 AC_CHECK_HEADER([lzma.h],, [AC_MSG_ERROR([[cannot find lzma header]])])
 AC_CHECK_LIB([lzma], [lzma_version_string],, [AC_MSG_ERROR([[cannot find lzma]])])
 
-# Check if the libzim is available
+# Check if the libzim library is available
 AC_CHECK_HEADER([zim/zim.h],, [AC_MSG_ERROR([[cannot find libzim header]])])
 AC_CHECK_LIB([zim], [zim_MD5Init],, [AC_MSG_ERROR([[cannot find libzim]])])
 
-# Check if the libmagic is available
+# Check if the libz library is available
+AC_CHECK_HEADER([zlib.h],, [AC_MSG_ERROR([[cannot find libz header]])])
+AC_CHECK_LIB([z], [deflate],, [AC_MSG_ERROR([[cannot find libz]])])
+
+# Check if the libmagic library is available
 AC_CHECK_HEADER([magic.h],, [AC_MSG_ERROR([[cannot find libmagic header]])])
 AC_CHECK_LIB([magic], [magic_file],, [AC_MSG_ERROR([[cannot find libmagic]])])
 
-# Check if the libpthread is available
+# Check if the libpthread library is available
 AC_CHECK_HEADER([pthread.h],, [AC_MSG_ERROR([[cannot find libpthread header]])])
 AC_CHECK_LIB([pthread], [pthread_exit],, [AC_MSG_ERROR([[cannot find libpthread]])])
 
-# Set current language to C++
-AC_LANG(C++)
-
 # Check the existence of stat64 (to handle file >2GB) in the libc
 AC_CHECK_FUNCS([stat64])
-
-# cxxflags
-CXXFLAGS="-O3 -Igumbo $CXXFLAGS"
-CFLAGS="-O3 -std=gnu99 -std=c99"
-
-# liblzma
-LIBLZMA_CFLAGS=""
-LIBLZMA_LDFLAGS=" -llzma"
-
-# libzim
-LIBZIM_CFLAGS=""
-LIBZIM_LDFLAGS=" -lzim"
-
-# libz
-LIBZ_CFLAGS=""
-LIBZ_LDFLAGS=" -lz"
-
-# libmagic
-LIBMAGIC_CFLAGS=""
-LIBMAGIC_LDFLAGS=" -lmagic"
-
-# libpthread
-LIBPTHREAD_CFLAGS=""
-LIBPTHREAD_LDFLAGS=" -lpthread"
 
 AC_DEFINE_UNQUOTED(CLUSTER_CACHE_SIZE, 16, [set zim cluster cache size to number of cached chunks])
 AC_DEFINE_UNQUOTED(DIRENT_CACHE_SIZE, 512, [set zim dirent cache size to number of cached chunks])
 AC_DEFINE_UNQUOTED(LZMA_MEMORY_SIZE, 128, [set lzma uncompress memory size to number of MB])
 AC_DEFINE(ENABLE_LZMA, [1], [defined if lzma compression is enabled])
-
-# export variables
-AC_SUBST(CXXFLAGS)
-AC_SUBST(CFLAGS)
-AC_SUBST(LIBLZMA_CFLAGS)
-AC_SUBST(LIBLZMA_LDFLAGS)
-AC_SUBST(LIBZIM_CFLAGS)
-AC_SUBST(LIBZIM_LDFLAGS)
-AC_SUBST(LIBZ_CFLAGS)
-AC_SUBST(LIBZ_LDFLAGS)
-AC_SUBST(LIBMAGIC_CFLAGS)
-AC_SUBST(LIBMAGIC_LDFLAGS)
-AC_SUBST(LIBPTHREAD_CFLAGS)
-AC_SUBST(LIBPTHREAD_LDFLAGS)
 
 # Configure the output files
 AC_CONFIG_FILES([
@@ -104,4 +73,4 @@
 ])
 
 AC_PROG_INSTALL
-AC_OUTPUT
\ No newline at end of file
+AC_OUTPUT

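`AC_CHECK_FUNCS([stat64])` defines `HAVE_STAT64` when the C library provides `stat64()`; with `AC_CONFIG_HEADERS` the macro lands in `config.h`, otherwise it is passed on the compiler command line. The diff does not show how zimwriterfs consumes that macro, so the following is only a sketch of one way C++ code could use it to read the size of files larger than 2 GB on platforms where plain `stat()` uses a 32-bit `off_t`; `fileSize` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <sys/stat.h>

// Returns the size of `path` in bytes, or -1 if it cannot be stat'ed.
// HAVE_STAT64 is assumed to come from configure; without it we fall back
// to stat(), which can fail with EOVERFLOW for files over 2 GB when off_t
// is 32 bits wide.
std::int64_t fileSize(const std::string& path) {
  assert(!path.empty());
#ifdef HAVE_STAT64
  struct stat64 st;
  if (::stat64(path.c_str(), &st) != 0) return -1;
#else
  struct stat st;
  if (::stat(path.c_str(), &st) != 0) return -1;
#endif
  return static_cast<std::int64_t>(st.st_size);
}
```
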
-- 
To view, visit https://gerrit.wikimedia.org/r/295038
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ic0f93e19eb3e07d86c115bd15f47ef5fbc74f954

[MediaWiki-commits] [Gerrit] Update .gitignore files. - change (openzim)

2016-06-18 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Update .gitignore files.
..


Update .gitignore files.

Change-Id: Iad8d95fa2deab178396e8804b6e195a5b1520a4a
---
C zimlib/.gitignore
R zimwriterfs/.gitignore
2 files changed, 3 insertions(+), 11 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/.gitignore b/zimlib/.gitignore
similarity index 75%
copy from .gitignore
copy to zimlib/.gitignore
index c50679e..0a4a991 100644
--- a/.gitignore
+++ b/zimlib/.gitignore
@@ -4,7 +4,6 @@
 compile
 config.*
 configure
-createZimExample
 depcomp
 .deps
 .dirstamp
@@ -25,8 +24,6 @@
 .svn
 .*.swp
 *.zim
-zimdiff
-zimdump
-zimpatch
-zimsearch
-zimwriterfs
+examples/createZimExample
+src/tools/zimdump
+src/tools/zimsearch
diff --git a/.gitignore b/zimwriterfs/.gitignore
similarity index 80%
rename from .gitignore
rename to zimwriterfs/.gitignore
index c50679e..1aaab40 100644
--- a/.gitignore
+++ b/zimwriterfs/.gitignore
@@ -4,7 +4,6 @@
 compile
 config.*
 configure
-createZimExample
 depcomp
 .deps
 .dirstamp
@@ -25,8 +24,4 @@
 .svn
 .*.swp
 *.zim
-zimdiff
-zimdump
-zimpatch
-zimsearch
 zimwriterfs

-- 
To view, visit https://gerrit.wikimedia.org/r/295037
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Iad8d95fa2deab178396e8804b6e195a5b1520a4a
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Cscott 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] Convert READMEs to markdown. - change (openzim)

2016-06-18 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Convert READMEs to markdown.
..


Convert READMEs to markdown.

Change-Id: I596bec7ef70bd0a7d45ba67d4fa313d93c861890
---
A README.md
D zimlib/README
A zimlib/README
A zimlib/README.md
D zimreader-java/README
A zimreader-java/README.md
D zimreader/README
A zimreader/README.md
A zimwriterdb/README.md
R zimwriterfs/README.md
10 files changed, 100 insertions(+), 61 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/README.md b/README.md
new file mode 100644
index 000..c5fee8e
--- /dev/null
+++ b/README.md
@@ -0,0 +1,11 @@
+This OpenZIM project repository contains the sources for:
+* [zimlib](./zimlib#readme): A library for reading and writing ZIM files.
+* [zimwriterfs](./zimwriterfs#readme): A tool for creating ZIM files based on
+  contents on a local filesystem.
+* [zimreader-java](./zimreader-java#readme): A ZIM reader written in Java.
+
+Old code, provided for archive purposes:
+* [zimreader](./zimreader#readme): An old reader for the Zeno file format, the
+  predecessor to the ZIM file format.
+* [zimwriterdb](./zimwriterdb#readme): Demonstrates how to write a ZIM file
+  based on data in a database.
diff --git a/zimlib/README b/zimlib/README
deleted file mode 100644
index e12d9c0..000
--- a/zimlib/README
+++ /dev/null
@@ -1,6 +0,0 @@
-The zimlib is the standard implementation of the ZIM specification. It
-is a library which implements the read and write method for ZIM
-files. The zimlib is released under the GPLv2 license terms. Use the
-zimlib in your own software - like reader applications - to make them
-ZIM-capable without the need having to dig too much into the ZIM file
-format.
diff --git a/zimlib/README b/zimlib/README
new file mode 12
index 000..42061c0
--- /dev/null
+++ b/zimlib/README
@@ -0,0 +1 @@
+README.md
\ No newline at end of file
diff --git a/zimlib/README.md b/zimlib/README.md
new file mode 100644
index 000..ce35ed9
--- /dev/null
+++ b/zimlib/README.md
@@ -0,0 +1,20 @@
+zimlib
+--
+
+The `zimlib` library is the standard implementation of the ZIM
+specification.  It is a library which implements read and write
+methods for ZIM files.
+
+Use the zimlib in your own software --- for example, reader
+applications --- to make them ZIM-capable without the need having to
+dig too much into the ZIM file format.
+
+To build:
+```
+./autogen.sh
+./configure
+make
+```
+
+The `zimlib` library is released under the GPLv2 license
+terms.
diff --git a/zimreader-java/README b/zimreader-java/README
deleted file mode 100644
index 973e7b2..000
--- a/zimreader-java/README
+++ /dev/null
@@ -1,46 +0,0 @@
-
-ZIMReader in Java
-=
-
-This is a port of the ZIMReader in Java. One 
-of the aims of this project is to enable mobile 
-users developing on Android, J2ME and other 
-platforms to use ZIM files and build offline 
-Wikipedia readers.
-
-I'll soon add a javadoc, in the mean time 
-you can go through the comments that I have 
-provided in the source code. Also, try running 
-the example ZIMTest.java. 
-
-This code was built on Java 1.6 and has not 
-been tested on previous versions. However, I'll
-do that soon on previous ones as well. In the 
-next release, I intend to provide an Ant file.
-
-If you find any bugs, please report them to 
- or visit the 
-IRC channel #openzim on Freenode and ping 
-'gremmachook'.
-
-This library is licensed under the LGPL v3.0 
-license. However, I understand that sometimes 
-licensing can be a problem for you. I would be 
-happy to provide a alternate lesser permissive 
-license if the need be.
-
-Found this library useful? Drop in a mail, I 
-love to hear feedback.
-
-Before this ends, I'd like to thank Lasse Collin 
-, who maintains the 
-Tukaani project, for his port of XZ in Java, 
-without which it wouldn't have been possible for 
-me to write this library.
- 
-
--- Arunesh Mathur
-   
-
-   
-
diff --git a/zimreader-java/README.md b/zimreader-java/README.md
new file mode 100644
index 000..abce858
--- /dev/null
+++ b/zimreader-java/README.md
@@ -0,0 +1,42 @@
+ZIMReader in Java
+=
+
+This is a port of the ZIMReader in Java. One
+of the aims of this project is to enable mobile
+users developing on Android, J2ME and other
+platforms to use ZIM files and build offline
+Wikipedia readers.
+
+I'll soon add a javadoc, in the mean time
+you can go through the comments that I have
+provided in the source code. Also, try running
+the example ZIMTest.java.
+
+This code was built on Java 1.6 and has not
+been tested on previous versions. However, I'll
+do that soon on previous ones as well. In the
+next release, I intend to provide an Ant file.
+
+If you find any bugs, please report 

[MediaWiki-commits] [Gerrit] Use the system's libgumbo. - change (openzim)

2016-06-18 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Use the system's libgumbo.
..


Use the system's libgumbo.

Change-Id: Id6b06033bba6c1d47d305b50259e7eaf41c073e2
---
M zimwriterfs/Makefile.am
M zimwriterfs/README.md
M zimwriterfs/configure.ac
D zimwriterfs/gumbo/attribute.c
D zimwriterfs/gumbo/attribute.h
D zimwriterfs/gumbo/char_ref.c
D zimwriterfs/gumbo/char_ref.h
D zimwriterfs/gumbo/error.c
D zimwriterfs/gumbo/error.h
D zimwriterfs/gumbo/gumbo.h
D zimwriterfs/gumbo/insertion_mode.h
D zimwriterfs/gumbo/parser.c
D zimwriterfs/gumbo/parser.h
D zimwriterfs/gumbo/string_buffer.c
D zimwriterfs/gumbo/string_buffer.h
D zimwriterfs/gumbo/string_piece.c
D zimwriterfs/gumbo/string_piece.h
D zimwriterfs/gumbo/tag.c
D zimwriterfs/gumbo/token_type.h
D zimwriterfs/gumbo/tokenizer.c
D zimwriterfs/gumbo/tokenizer.h
D zimwriterfs/gumbo/tokenizer_states.h
D zimwriterfs/gumbo/utf8.c
D zimwriterfs/gumbo/utf8.h
D zimwriterfs/gumbo/util.c
D zimwriterfs/gumbo/util.h
D zimwriterfs/gumbo/vector.c
D zimwriterfs/gumbo/vector.h
28 files changed, 9 insertions(+), 33,053 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved




-- 
To view, visit https://gerrit.wikimedia.org/r/295039
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id6b06033bba6c1d47d305b50259e7eaf41c073e2
Gerrit-PatchSet: 2
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Cscott 
Gerrit-Reviewer: Kelson 

___
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits


[MediaWiki-commits] [Gerrit] Move `throw` statement out of header file. - change (openzim)

2016-06-23 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Move `throw` statement out of header file.
..


Move `throw` statement out of header file.

This ensures that we don't fail compilation when including the header
file in an environment where exception handling is disabled (like
when binding to node.js).

Change-Id: Ib49060c7fe479054e58c20249dd9b3236ea7eb03
---
M zimlib/include/zim/writer/articlesource.h
M zimlib/src/articlesource.cpp
2 files changed, 5 insertions(+), 1 deletion(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/writer/articlesource.h b/zimlib/include/zim/writer/articlesource.h
index 94ee91b..54653ef 100644
--- a/zimlib/include/zim/writer/articlesource.h
+++ b/zimlib/include/zim/writer/articlesource.h
@@ -97,7 +97,8 @@
  * code to not use it.
  * This should be removed once every users switch to new API.
  */
-virtual Blob getData(const std::string& aid) { throw "This should not be called"; };
+virtual Blob getData(const std::string& aid);
+
 
/**/
 };
 
diff --git a/zimlib/src/articlesource.cpp b/zimlib/src/articlesource.cpp
index cc72ae2..a2087a7 100644
--- a/zimlib/src/articlesource.cpp
+++ b/zimlib/src/articlesource.cpp
@@ -80,6 +80,9 @@
   std::cerr << " You should override Article::getData directly." << std::endl;
   return __source->getData(getAid());
 }
+Blob ArticleSource::getData(const std::string& aid) {
+throw std::runtime_error("This should not be called");
+}
 
/**/
 
 Uuid ArticleSource::getUuid()

-- 
To view, visit https://gerrit.wikimedia.org/r/295720
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib49060c7fe479054e58c20249dd9b3236ea7eb03
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Cscott 
Gerrit-Reviewer: Kelson 
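
A minimal single-file sketch of the split this commit makes: the header part only declares the deprecated virtual, and the `throw` lives in the source file. Translation units that include only the header therefore contain no throw expression and can be built with exceptions disabled (for example GCC/Clang's `-fno-exceptions`). `BlobSketch` and `ArticleSourceSketch` are hypothetical stand-ins, not the real zimlib declarations.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for zim::Blob so the sketch compiles on its own.
struct BlobSketch {
  std::string bytes;
};

// "Header" part: declaration only, no throw expression visible to includers.
class ArticleSourceSketch {
 public:
  virtual ~ArticleSourceSketch() {}
  // Deprecated entry point; subclasses are expected to override it.
  virtual BlobSketch getData(const std::string& aid);
};

// "Source file" part: the only code that needs exception support.
BlobSketch ArticleSourceSketch::getData(const std::string& /*aid*/) {
  throw std::runtime_error("This should not be called");
}
```

A side effect of the change: the old `throw "This should not be called"` threw a `const char*`, which a `catch (const std::exception&)` handler does not catch; `std::runtime_error` is caught by it.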


[MediaWiki-commits] [Gerrit] Bug fix: correctly update cluster offsets in directory entries. - change (openzim)

2016-06-26 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Bug fix: correctly update cluster offsets in directory entries.
..


Bug fix: correctly update cluster offsets in directory entries.

This is a follow-up to f5de40f94b30795f42bb9388cbb46df9cd605167.

When we moved the blob writing to the main dirent-creation loop,
we ended up making separate *copies* of the dirents and updating
blob/cluster information in these, instead of the dirents in the
main list which will eventually be written.  Make the auxiliary
lists contain dirent *pointers* to avoid this problem.

Change-Id: I008fa700acd90c3c51614bde65d61ffbc6061872
---
M zimlib/include/zim/writer/zimcreator.h
M zimlib/src/zimcreator.cpp
2 files changed, 10 insertions(+), 9 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/writer/zimcreator.h b/zimlib/include/zim/writer/zimcreator.h
index 6f47402..2e52d7e 100644
--- a/zimlib/include/zim/writer/zimcreator.h
+++ b/zimlib/include/zim/writer/zimcreator.h
@@ -33,6 +33,7 @@
 {
   public:
 typedef std::vector DirentsType;
+typedef std::vector DirentPtrsType;
 typedef std::vector SizeVectorType;
 typedef std::vector OffsetsType;
 typedef std::map MimeTypes;
diff --git a/zimlib/src/zimcreator.cpp b/zimlib/src/zimcreator.cpp
index bc977ea..46c550f 100644
--- a/zimlib/src/zimcreator.cpp
+++ b/zimlib/src/zimcreator.cpp
@@ -144,7 +144,7 @@
   // because we don't know which one will fill up first.  We also need
   // to track the dirents currently in each, so we can fix up the
   // cluster index if the other one ends up written first.
-  DirentsType compDirents, uncompDirents;
+  DirentPtrsType compDirents, uncompDirents;
   Cluster compCluster, uncompCluster;
   compCluster.setCompression(compression);
   uncompCluster.setCompression(zimcompNone);
@@ -188,11 +188,11 @@
   }
 }
 
-dirents.push_back(dirent);
 currentSize +=
   dirent.getDirentSize() /* for directory entry */ +
   sizeof(offset_type) /* for url pointer list */ +
   sizeof(size_type) /* for title pointer list */;
+dirents.push_back(dirent);
 
 // If this is a redirect, we're done: there's no blob to add.
 if (dirent.isRedirect())
@@ -217,7 +217,7 @@
 }
 
 Cluster *cluster;
-DirentsType *myDirents, *otherDirents;
+DirentPtrsType *myDirents, *otherDirents;
 if (dirent.isCompress())
 {
   cluster = 
@@ -230,9 +230,9 @@
   myDirents = 
   otherDirents = 
 }
-myDirents->push_back(dirent);
-dirent.setCluster(clusterOffsets.size(), cluster->count());
+dirents.back().setCluster(clusterOffsets.size(), cluster->count());
 cluster->addBlob(blob);
+myDirents->push_back(&(dirents.back()));
 
 // If cluster is now large enough, write it to disk.
 if (cluster->size() >= minChunkSize * 1024)
@@ -247,10 +247,10 @@
   cluster->clear();
   myDirents->clear();
   // Update the cluster number of the dirents *not* written to disk.
-  for (DirentsType::iterator di = otherDirents->begin();
+  for (DirentPtrsType::iterator di = otherDirents->begin();
di != otherDirents->end(); ++di)
   {
-di->setCluster(clusterOffsets.size(), di->getBlobNumber());
+(*di)->setCluster(clusterOffsets.size(), (*di)->getBlobNumber());
   }
   offset_type end = out.tellp();
   currentSize += (end - start) +
@@ -263,10 +263,10 @@
   {
 clusterOffsets.push_back(out.tellp());
 out << compCluster;
-for (DirentsType::iterator di = uncompDirents.begin();
+for (DirentPtrsType::iterator di = uncompDirents.begin();
  di != uncompDirents.end(); ++di)
 {
-  di->setCluster(clusterOffsets.size(), di->getBlobNumber());
+  (*di)->setCluster(clusterOffsets.size(), (*di)->getBlobNumber());
 }
   }
   compCluster.clear();

-- 
To view, visit https://gerrit.wikimedia.org/r/296158
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I008fa700acd90c3c51614bde65d61ffbc6061872
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Cscott 
Gerrit-Reviewer: Kelson 
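
The idea behind the fix, reduced to a sketch: the auxiliary list stores pointers into the main dirent list, so fixing up the cluster number through them changes the entries that are eventually written, not copies. All names here are hypothetical. One caveat worth checking against the real code: if the main list is a `std::vector`, `push_back` may reallocate and invalidate earlier pointers, so this sketch uses `std::deque`, whose `push_back` keeps pointers to existing elements valid.

```cpp
#include <deque>
#include <vector>

// Hypothetical stand-in for zim::writer::Dirent: only the cluster number matters.
struct DirentSketch {
  unsigned cluster = 0;
  void setCluster(unsigned c) { cluster = c; }
};

struct WriterSketch {
  // Main list that is eventually written out. std::deque::push_back keeps
  // pointers to existing elements valid; std::vector may not.
  std::deque<DirentSketch> dirents;
  // Dirents whose cluster is not written yet: pointers, not copies.
  std::vector<DirentSketch*> pending;

  void add() {
    dirents.push_back(DirentSketch());
    pending.push_back(&dirents.back());
  }

  // Updating through the pointers modifies the entries in the main list.
  void flush(unsigned clusterNumber) {
    for (DirentSketch* d : pending) d->setCluster(clusterNumber);
    pending.clear();
  }
};
```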


[MediaWiki-commits] [Gerrit] Fix regex to enable upload from ETHZ Library with the GWT - change (operations/mediawiki-config)

2016-02-28 Thread Kelson (Code Review)
Kelson has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/273774

Change subject: Fix regex to enable upload from ETHZ Library with the GWT
..

Fix regex to enable upload from ETHZ Library with the GWT

Change-Id: I124d051922aa71137058271bbc7ee564d1c081c2
---
M wmf-config/InitialiseSettings.php
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config refs/changes/74/273774/1

diff --git a/wmf-config/InitialiseSettings.php b/wmf-config/InitialiseSettings.php
index 79ad31c..66dca0b 100644
--- a/wmf-config/InitialiseSettings.php
+++ b/wmf-config/InitialiseSettings.php
@@ -11934,7 +11934,7 @@
'*.davidabian.com', // Trusted user website, used by Museo del Romanticismo - T121383
'davidabian.com',
'webapi.aucklandmuseum.com',// Auckland Museum - T122995
-   '*.ethz.ch',// ETH Library  - T123109
+   'www.e-pics.ethz.ch',   // ETH Library  - T123109
'*.museumvictoria.com.au',  // Victoria State (AU) Museum, requested in T125387
),
 ),

-- 
To view, visit https://gerrit.wikimedia.org/r/273774
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I124d051922aa71137058271bbc7ee564d1c081c2
Gerrit-PatchSet: 1
Gerrit-Project: operations/mediawiki-config
Gerrit-Branch: master
Gerrit-Owner: Kelson 


[MediaWiki-commits] [Gerrit] openzim[master]: Fix memory corruption in decodeUrl.

2016-08-09 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Fix memory corruption in decodeUrl.
..


Fix memory corruption in decodeUrl.

Reuse the function from kiwix, as it does the same thing but with much
better C++ code.

If we pass the address of a char to sscanf while letting it think it is
an unsigned int ("%x"), sscanf will write 4 bytes. As only one byte is
associated with the char, this leads to undefined behavior.

This bug was not found before because we were using the function as an
inline function. Compiler optimization probably hid the error.

Change-Id: I618f8b9ede083e6580044f967153bfbe3ee3d294
---
M zimwriterfs/tools.cpp
1 file changed, 13 insertions(+), 10 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/tools.cpp b/zimwriterfs/tools.cpp
index 49bc79e..cc6091d 100644
--- a/zimwriterfs/tools.cpp
+++ b/zimwriterfs/tools.cpp
@@ -246,19 +246,22 @@
 
 }
 
-std::string decodeUrl(const std::string ) {
-  std::string decodedUrl = encodedUrl;
-  std::string::size_type pos = 0;
-  char ch;
+static char charFromHex(std::string a) {
+  std::istringstream Blat(a);
+  int Z;
+  Blat >> std::hex >> Z;
+  return char (Z);
+}
 
-  while ((pos = decodedUrl.find('%', pos)) != std::string::npos &&
-pos + 2 < decodedUrl.length()) {
-sscanf(decodedUrl.substr(pos + 1, 2).c_str(), "%x", (unsigned int*));
-decodedUrl.replace(pos, 3, 1, ch);
+std::string decodeUrl(const std::string ) {
+  std::string url = originalUrl;
+  std::string::size_type pos = 0;
+  while ((pos = url.find('%', pos)) != std::string::npos &&
+pos + 2 < url.length()) {
+url.replace(pos, 3, 1, charFromHex(url.substr(pos + 1, 2)));
 ++pos;
   }
-
-  return decodedUrl;
+  return url;
 }
 
 std::string removeLastPathElement(const std::string& path, const bool removePreSeparator, const bool removePostSeparator) {

-- 
To view, visit https://gerrit.wikimedia.org/r/303540
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I618f8b9ede083e6580044f967153bfbe3ee3d294
Gerrit-PatchSet: 2
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 
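
A minimal sketch of a percent-decoder that never hands sscanf a `char*`: each hex pair is converted into an `int` and narrowed explicitly, so nothing writes past a one-byte destination. `decodeUrlSketch` and `hexDigitValue` are hypothetical names; unlike the committed `charFromHex`, this version leaves malformed escapes such as `%zz` untouched, which is an assumption about the desired behavior.

```cpp
#include <string>

// Value of one hexadecimal digit, or -1 if `c` is not a hex digit.
static int hexDigitValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical percent-decoder: every valid %XX escape becomes one byte.
std::string decodeUrlSketch(const std::string& encoded) {
  std::string url = encoded;
  std::string::size_type pos = 0;
  // Same bounds check as the committed loop: two characters must follow '%'.
  while ((pos = url.find('%', pos)) != std::string::npos &&
         pos + 2 < url.length()) {
    const int high = hexDigitValue(url[pos + 1]);
    const int low = hexDigitValue(url[pos + 2]);
    if (high >= 0 && low >= 0) {
      // Compute the value in an int, then narrow explicitly to char.
      url.replace(pos, 3, 1, static_cast<char>(high * 16 + low));
    }
    ++pos;  // skip the decoded byte (or the '%' of an invalid escape)
  }
  return url;
}
```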


[MediaWiki-commits] [Gerrit] openzim[master]: The indexed url must contain the namespace.

2016-08-11 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: The indexed url must contain the namespace.
..


The indexed url must contain the namespace.

Change-Id: Icc17c7552a2b213a15edc2c674fe00aacb36952b
---
M zimwriterfs/xapianIndexer.cpp
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/xapianIndexer.cpp b/zimwriterfs/xapianIndexer.cpp
index 8325152..4d0effa 100644
--- a/zimwriterfs/xapianIndexer.cpp
+++ b/zimwriterfs/xapianIndexer.cpp
@@ -108,7 +108,7 @@
 return;
 
 token.title = article->getTitle();
-token.url = article->getUrl();
+token.url = std::string(1, article->getNamespace()) + '/' + article->getUrl();
 zim::Blob article_content = article->getData();
 token.content = std::string(article_content.data(), article_content.size());
 

-- 
To view, visit https://gerrit.wikimedia.org/r/304240
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Icc17c7552a2b213a15edc2c674fe00aacb36952b
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 
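
Why the fix starts from `std::string(1, ...)`: `getNamespace()` returns a `char`, and `ns + '/'` on two chars is integer addition, not concatenation. A tiny sketch of the same construction (`namespacedUrl` is a hypothetical name):

```cpp
#include <string>

// Hypothetical helper: prefix an article URL with its one-character namespace.
std::string namespacedUrl(char ns, const std::string& url) {
  // Build a one-character string first; `ns + '/'` would add char codes.
  return std::string(1, ns) + '/' + url;
}
```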


[MediaWiki-commits] [Gerrit] openzim[master]: Remove temporary ft_index.

2016-08-12 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Remove temporary ft_index.
..


Remove temporary ft_index.

The Xapian indexer creates temporary files while indexing content.
Remove those files at the end.

Change-Id: I637da46f69a8f127052a48de5028c185fb0748aa
---
M zimwriterfs/tools.cpp
M zimwriterfs/tools.h
M zimwriterfs/xapianIndexer.cpp
M zimwriterfs/xapianIndexer.h
4 files changed, 31 insertions(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/tools.cpp b/zimwriterfs/tools.cpp
index b73b5ca..af6d71d 100644
--- a/zimwriterfs/tools.cpp
+++ b/zimwriterfs/tools.cpp
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -539,3 +540,19 @@
   ustring.toUTF8String(unaccentedText);
   return unaccentedText;
 }
+
+void remove_all(const std::string& path) {
+  DIR *dir;
+  struct dirent *ent;
+  if ((dir = opendir (path.c_str())) != NULL) {
+/* It's a directory, remove all its entries. */
+while ((ent = readdir (dir)) != NULL) {
+  if (strcmp(ent->d_name, ".") and strcmp(ent->d_name, "..")) {
+std::string childPath = path + SEPARATOR + ent->d_name;
+remove_all(childPath);
+  }
+}
+closedir (dir);
+  }
+  remove(path.c_str());
+}
diff --git a/zimwriterfs/tools.h b/zimwriterfs/tools.h
index d85b292..bd7d6e6 100644
--- a/zimwriterfs/tools.h
+++ b/zimwriterfs/tools.h
@@ -47,4 +47,6 @@
 
 std::string removeAccents(const std::string );
 
+void remove_all(const std::string& path);
+
 #endif // OPENZIM_ZIMWRITERFS_TOOLS_H
diff --git a/zimwriterfs/xapianIndexer.cpp b/zimwriterfs/xapianIndexer.cpp
index 4d0effa..4117c14 100644
--- a/zimwriterfs/xapianIndexer.cpp
+++ b/zimwriterfs/xapianIndexer.cpp
@@ -32,6 +32,17 @@
   */
 }
 
+XapianIndexer::~XapianIndexer(){
+  if (!indexPath.empty()) {
+try {
+  remove_all(indexPath + ".tmp");
+  remove_all(indexPath);
+} catch(...) {
+  /* Do not raise */
+}
+  }
+}
+
 void XapianIndexer::indexingPrelude(const string indexPath_) {
 indexPath = indexPath_;
 this->writableDatabase = Xapian::WritableDatabase(indexPath + ".tmp", Xapian::DB_CREATE_OR_OVERWRITE);
diff --git a/zimwriterfs/xapianIndexer.h b/zimwriterfs/xapianIndexer.h
index 71dfe64..bbb170e 100644
--- a/zimwriterfs/xapianIndexer.h
+++ b/zimwriterfs/xapianIndexer.h
@@ -50,6 +50,7 @@
 class XapianIndexer : public Indexer, public IHandler {
 public:
 XapianIndexer(const std::string& language, bool verbose);
+virtual ~XapianIndexer();
 std::string getIndexPath() { return indexPath; }
 
 protected:

-- 
To view, visit https://gerrit.wikimedia.org/r/304450
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I637da46f69a8f127052a48de5028c185fb0748aa
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 
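
The commit hand-rolls a recursive delete with `opendir`/`readdir`. On a C++17 toolchain, `std::filesystem::remove_all` does the same job, which is a different technique from the committed code. A sketch, assuming C++17 (`removeIndexFiles` is a hypothetical name): it deletes the index and its `.tmp` working copy, and uses the `std::error_code` overloads so failures are reported through `ec` rather than thrown as `filesystem_error`, in the spirit of the destructor's `catch(...)`.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: remove the final index and its ".tmp" working copy.
void removeIndexFiles(const std::string& indexPath) {
  if (indexPath.empty()) return;
  std::error_code ec;  // receives errors instead of an exception
  fs::remove_all(indexPath + ".tmp", ec);
  fs::remove_all(indexPath, ec);
}
```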


[MediaWiki-commits] [Gerrit] openzim[master]: Fix memory leak in zimwriterfs.

2016-08-09 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Fix memory leak in zimwriterfs.
..


Fix memory leak in zimwriterfs.

We are using polymorphism in articlesource.cpp (an Article* pointer, but
a derived class as the real instance).
If we want to correctly delete allocated resources, we need to let the
compiler find the correct destructor through the vtable.

In our case, the data member of FileArticle was never destructed and
the internal buffer of the std::string was leaked (even though the article
object itself was deleted).

Change-Id: I58833f48a440e9c91a97637d6c11e5642eaf64ce
---
M zimwriterfs/article.h
1 file changed, 1 insertion(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/article.h b/zimwriterfs/article.h
index 49f7d25..4a26ead 100644
--- a/zimwriterfs/article.h
+++ b/zimwriterfs/article.h
@@ -47,6 +47,7 @@
 virtual std::string getMimeType() const;
 virtual std::string getRedirectAid() const;
 virtual bool shouldCompress() const;
+virtual ~Article() {};
 };
 
 class MetadataArticle : public Article {

-- 
To view, visit https://gerrit.wikimedia.org/r/303775
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I58833f48a440e9c91a97637d6c11e5642eaf64ce
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 
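
A self-contained sketch of the rule this commit applies, with hypothetical stand-in classes: making the base destructor virtual lets `delete` through a base pointer run the derived destructor, which in turn destroys members such as `data`. A counter shows that the derived destructor actually ran.

```cpp
#include <string>

static int fileArticleDestroyed = 0;  // instrumentation for the sketch

// Hypothetical stand-ins for zimwriterfs's Article / FileArticle.
struct BaseArticle {
  virtual std::string getTitle() const { return ""; }
  // Virtual: `delete` through a BaseArticle* dispatches to the derived
  // destructor. Without it, deleting a derived object through a base
  // pointer is undefined behavior, and derived members typically leak.
  virtual ~BaseArticle() {}
};

struct FileArticleSketch : BaseArticle {
  std::string data = std::string(1024, 'x');  // owns a heap buffer
  ~FileArticleSketch() override { ++fileArticleDestroyed; }
};
```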


[MediaWiki-commits] [Gerrit] openzim[master]: Delete all articles, even the last one.

2016-08-09 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Delete all articles, even the last one.
..


Delete all articles, even the last one.

Not so important: only one article *could* be impacted
(the last one, if it is invalid).

Change-Id: Ibd3e7587148eccbdb2c588d4a915a94e343fc003
---
M zimwriterfs/articlesource.cpp
1 file changed, 1 insertion(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/articlesource.cpp b/zimwriterfs/articlesource.cpp
index 0eea3e1..5ce0cc1 100644
--- a/zimwriterfs/articlesource.cpp
+++ b/zimwriterfs/articlesource.cpp
@@ -75,6 +75,7 @@
   article = new FileArticle(path);
 };
 if (article->isInvalid()) {
+  delete article;
   article = NULL;
 }
   }

-- 
To view, visit https://gerrit.wikimedia.org/r/303776
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ibd3e7587148eccbdb2c588d4a915a94e343fc003
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 
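
An alternative to the manual `delete article; article = NULL;` pair, not what this commit does: let `std::unique_ptr` own the article, so rejecting an invalid one with `reset()` also destroys it. A sketch with hypothetical names (`CandidateArticle`, `loadArticle`) and a live-object counter to show nothing leaks.

```cpp
#include <memory>

static int liveArticles = 0;  // objects constructed but not yet destroyed

// Hypothetical stand-in for zimwriterfs's Article subclasses.
struct CandidateArticle {
  explicit CandidateArticle(bool invalid) : invalid_(invalid) { ++liveArticles; }
  ~CandidateArticle() { --liveArticles; }
  bool isInvalid() const { return invalid_; }

 private:
  bool invalid_;
};

// Returns the article, or an empty pointer when it is invalid.
std::unique_ptr<CandidateArticle> loadArticle(bool invalid) {
  std::unique_ptr<CandidateArticle> article(new CandidateArticle(invalid));
  if (article->isInvalid()) {
    article.reset();  // deletes the object and leaves the pointer empty
  }
  return article;
}
```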


[MediaWiki-commits] [Gerrit] openzim[master]: Close the magic database.

2016-08-09 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Close the magic database.
..


Close the magic database.

Change-Id: Ib0c7d1736d804526eea701ef39cb9ae5780b13c1
---
M zimwriterfs/zimwriterfs.cpp
1 file changed, 2 insertions(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/zimwriterfs.cpp b/zimwriterfs/zimwriterfs.cpp
index 094e5c0..f69c530 100644
--- a/zimwriterfs/zimwriterfs.cpp
+++ b/zimwriterfs/zimwriterfs.cpp
@@ -404,6 +404,8 @@
 #if HAVE_XAPIAN
   delete xapianIndexer;
 #endif
+
+  magic_close(magic);
   /* Destroy mutex */
   pthread_mutex_destroy();
   pthread_mutex_destroy();

-- 
To view, visit https://gerrit.wikimedia.org/r/303777
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib0c7d1736d804526eea701ef39cb9ae5780b13c1
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 


[MediaWiki-commits] [Gerrit] openzim[master]: Better replaceStringInPlaceOnce.

2016-08-09 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Better replaceStringInPlaceOnce.
..


Better replaceStringInPlaceOnce.

If we do only one replacement, we need neither a while loop nor an
update of pos. (Thanks to cppcheck.)

Change-Id: I5c825590c3a5ed7e01020b415b5ea695baa070a0
---
M zimwriterfs/tools.cpp
1 file changed, 2 insertions(+), 4 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/tools.cpp b/zimwriterfs/tools.cpp
index cc6091d..b73b5ca 100644
--- a/zimwriterfs/tools.cpp
+++ b/zimwriterfs/tools.cpp
@@ -423,11 +423,9 @@
 void replaceStringInPlaceOnce(std::string& subject,
   const std::string& search,
   const std::string& replace) {
-  size_t pos = 0;
-  while ((pos = subject.find(search, pos)) != std::string::npos) {
+  size_t pos = subject.find(search, 0);
+  if (pos != std::string::npos) {
 subject.replace(pos, search.length(), replace);
-pos += replace.length();
-return; /* Do it once */
   }
 }
 

-- 
To view, visit https://gerrit.wikimedia.org/r/303778
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I5c825590c3a5ed7e01020b415b5ea695baa070a0
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 
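
The contrast cppcheck pointed at, as a sketch with hypothetical names (not the tools.cpp code): replacing once needs a single `find`, while replacing every occurrence needs the loop and the `pos += replace.length()` step, so that freshly inserted text is not searched again.

```cpp
#include <string>

// Replace only the first occurrence: one find, no loop.
void replaceOnce(std::string& subject, const std::string& search,
                 const std::string& replace) {
  std::string::size_type pos = subject.find(search);
  if (pos != std::string::npos) subject.replace(pos, search.length(), replace);
}

// Replace every occurrence; skipping past the inserted text avoids
// re-matching it (e.g. replacing "a" with "aa").
void replaceAll(std::string& subject, const std::string& search,
                const std::string& replace) {
  if (search.empty()) return;  // an empty pattern would loop forever
  std::string::size_type pos = 0;
  while ((pos = subject.find(search, pos)) != std::string::npos) {
    subject.replace(pos, search.length(), replace);
    pos += replace.length();
  }
}
```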


[MediaWiki-commits] [Gerrit] Revert "Use explicit LZMA_STREAM_INIT initializer, instead o... - change (openzim)

2016-07-01 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Revert "Use explicit LZMA_STREAM_INIT initializer, instead of memset."
..


Revert "Use explicit LZMA_STREAM_INIT initializer, instead of memset."

This reverts commit 498539d869bbb9add9e795432520f305523e09bf.

Turns out that the clang compiler doesn't like this form of initializer:
it must be a GCC extension of some kind.  There's nothing technically
wrong with the previous `memset` way of initializing the variable,
it just looks kind of gross.  But functionality trumps aesthetics.

Change-Id: I1d5a12c6eb40706023f81323c04a356da3356f42
---
M zimlib/src/lzmastream.cpp
1 file changed, 3 insertions(+), 2 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/src/lzmastream.cpp b/zimlib/src/lzmastream.cpp
index bd02bd8..d880933 100644
--- a/zimlib/src/lzmastream.cpp
+++ b/zimlib/src/lzmastream.cpp
@@ -58,10 +58,11 @@
   }
 
   LzmaStreamBuf::LzmaStreamBuf(std::streambuf* sink_, uint32_t preset, lzma_check check, unsigned bufsize_)
-: stream(LZMA_STREAM_INIT),
-  obuffer(bufsize_),
+: obuffer(bufsize_),
   sink(sink_)
   {
+std::memset(reinterpret_cast(), 0, sizeof(stream));
+
 checkError(
   ::lzma_easy_encoder(, preset, check));
 

-- 
To view, visit https://gerrit.wikimedia.org/r/296752
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I1d5a12c6eb40706023f81323c04a356da3356f42
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Cscott 
Gerrit-Reviewer: Kelson 
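
A third option neither commit tried, sketched under assumptions: in standard C++, writing `stream()` in the member initializer list value-initializes a plain C struct, which zero-initializes every member (pointers become null) without `memset` or a brace-list macro. The sketch uses a hypothetical stand-in struct rather than the real `lzma_stream`, so whether all-zero matches what `LZMA_STREAM_INIT` sets for every field should be checked against the liblzma headers.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical plain-C struct standing in for lzma_stream.
struct CStreamSketch {
  const std::uint8_t* next_in;
  std::size_t avail_in;
  std::uint8_t* next_out;
  std::size_t avail_out;
  void* internal;
};

class StreamBufSketch {
 public:
  // `stream()` value-initializes the struct; with no user-provided
  // constructor that zero-initializes every member.
  StreamBufSketch() : stream() {}
  const CStreamSketch& get() const { return stream; }

 private:
  CStreamSketch stream;
};
```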


[MediaWiki-commits] [Gerrit] Explain that ZIntStream only represents uint32_t values righ... - change (openzim)

2016-06-29 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Explain that ZIntStream only represents uint32_t values right now.
..


Explain that ZIntStream only represents uint32_t values right now.

Change-Id: Id712847949af1ce32b2e3118fc241be171124186
---
M zimlib/include/zim/zintstream.h
1 file changed, 4 insertions(+), 1 deletion(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/zintstream.h b/zimlib/include/zim/zintstream.h
index 1c78a1f..66fdfef 100644
--- a/zimlib/include/zim/zintstream.h
+++ b/zimlib/include/zim/zintstream.h
@@ -41,7 +41,10 @@
   substracted from the actual number, so a 2 byte zero is actually a 128.
 
   The same logic continues on the 3rd, 4th, ... byte. Up to 7 additional bytes
-  are used, so the first byte must contain at least one 0.
+  could used, since the first byte must contain at least one 0.
+
+  This particular implementation only represents uint32_t values (numbers up
+  to 2^32-1), so it will only ever emit 5 bytes per input value.
 
   binary  range
   --- 
--

-- 
To view, visit https://gerrit.wikimedia.org/r/296657
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id712847949af1ce32b2e3118fc241be171124186
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Cscott 
Gerrit-Reviewer: Kelson 
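
The full layout table is cut off in this excerpt. Assuming the UTF-8-like scheme the comment describes (n additional bytes are announced by n leading 1-bits in the first byte followed by a 0, leaving 7 + 7n payload bits, and each length starts where the shorter ones end, so a 2-byte zero is 128), this sketch computes how many bytes a value needs. It shows why a uint32_t needs at most 5 bytes: four bytes top out at 2^7 + 2^14 + 2^21 + 2^28 - 1 = 270549119. `encodedLength` is a hypothetical name.

```cpp
#include <cstdint>

// Bytes needed for `value` under the assumed layout; 0 if out of range.
unsigned encodedLength(std::uint64_t value) {
  std::uint64_t start = 0;  // smallest value representable with this length
  for (unsigned n = 0; n <= 7; ++n) {  // n = number of additional bytes
    const std::uint64_t capacity = std::uint64_t(1) << (7 + 7 * n);
    if (value - start < capacity) return n + 1;
    start += capacity;  // the next length begins where this one ends
  }
  return 0;
}
```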


[MediaWiki-commits] [Gerrit] Add a indexer. - change (openzim)

2016-07-03 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Add a indexer.
..


Add a indexer.

This indexer is not used.
This is mainly code from kiwix-indexer imported into openzim.
Unused functions in *Tools have been removed.
No dependency on Xapian.

Change-Id: I55079339d21d6903634c265f83f4d1c6ba0ac333
---
M zimwriterfs/Makefile.am
A zimwriterfs/indexer.cpp
A zimwriterfs/indexer.h
A zimwriterfs/pathTools.cpp
A zimwriterfs/pathTools.h
A zimwriterfs/resourceTools.cpp
A zimwriterfs/resourceTools.h
M zimwriterfs/zimwriterfs.cpp
8 files changed, 921 insertions(+), 2 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/Makefile.am b/zimwriterfs/Makefile.am
index 92641d9..628b74c 100644
--- a/zimwriterfs/Makefile.am
+++ b/zimwriterfs/Makefile.am
@@ -6,4 +6,7 @@
 tools.cpp \
 article.cpp \
 articlesource.cpp \
+indexer.cpp \
+resourceTools.cpp \
+pathTools.cpp \
 mimetypecounter.cpp
diff --git a/zimwriterfs/indexer.cpp b/zimwriterfs/indexer.cpp
new file mode 100644
index 000..7820a32
--- /dev/null
+++ b/zimwriterfs/indexer.cpp
@@ -0,0 +1,262 @@
+/*
+ * Copyright 2011-2014 Emmanuel Engelhart 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU  General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+ * MA 02110-1301, USA.
+ */
+
+#include "indexer.h"
+#include "resourceTools.h"
+#include "pathTools.h"
+#include 
+
+  /* Count word */
+  unsigned int Indexer::countWords(const string ) {
+unsigned int numWords = 1;
+unsigned int length = text.size();
+
+for(unsigned int i=0; istopWords.push_back(stopWord);
+}
+  }
+
+  /* Article indexer methods */
+  void *Indexer::indexArticles(void *ptr) {
+pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
+Indexer *self = (Indexer *)ptr;
+unsigned int indexedArticleCount = 0;
+indexerToken token;
+
+self->indexingPrelude(self->getIndexPath());
+
+while (self->popFromToIndexQueue(token)) {
+  self->index(token.url,
+ token.accentedTitle,
+ token.title,
+ token.keywords,
+ token.content,
+ token.snippet,
+ token.size,
+ token.wordCount
+ );
+
+  indexedArticleCount += 1;
+
+  /* Make a hard-disk flush every 10.000 articles */
+  if (indexedArticleCount % 5000 == 0) {
+   self->flush();
+  }
+
+  /* Test if the thread should be cancelled */
+  pthread_testcancel();
+}
+self->indexingPostlude();
+
+/* Write content id file */
+string path = appendToDirectory(self->getIndexPath(), "content.id");
+writeTextFile(path, self->getZimId());
+
+usleep(100);
+
+self->articleIndexerRunning(false);
+pthread_exit(NULL);
+return NULL;
+  }
+
+  void Indexer::articleIndexerRunning(bool value) {
+pthread_mutex_lock();
+this->articleIndexerRunningFlag = value;
+pthread_mutex_unlock();
+  }
+
+  bool Indexer::isArticleIndexerRunning() {
+pthread_mutex_lock();
+bool retVal = this->articleIndexerRunningFlag;
+pthread_mutex_unlock();
+return retVal;
+  }
+
+  /* ToIndexQueue methods */
+  bool Indexer::isToIndexQueueEmpty() {
+pthread_mutex_lock();
+bool retVal = this->toIndexQueue.empty();
+pthread_mutex_unlock();
+return retVal;
+  }
+
+  void Indexer::pushToIndexQueue(indexerToken ) {
+pthread_mutex_lock();
+this->toIndexQueue.push(token);
+pthread_mutex_unlock();
+usleep(int(this->toIndexQueue.size() / 200) / 10 * 1000);
+  }
+
+  bool Indexer::popFromToIndexQueue(indexerToken ) {
+while (this->isToIndexQueueEmpty()) {
+  usleep(500);
+  if (this->getVerboseFlag()) {
+   std::cout << "Waiting... ToIndexQueue is empty for now..." << std::endl;
+  }
+
+  pthread_testcancel();
+}
+
+pthread_mutex_lock();
+token = this->toIndexQueue.front();
+this->toIndexQueue.pop();
+pthread_mutex_unlock();
+
+if (token.title == ""){
+//This is a empty token, end of the queue.
+return false;
+}
+return true;
+  }
+
+  /* Index methods */
+  void 

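`popFromToIndexQueue` in the diff above waits by polling (`usleep(500)` in a loop). A different technique, not what this commit does, is a condition variable: the consumer sleeps until a producer signals that a token is available. A sketch using C++11 `std::mutex`/`std::condition_variable` instead of the pthread calls, keeping the commit's convention that a token with an empty title marks the end of the queue. `TokenSketch` and `TokenQueueSketch` are hypothetical names, and thread cancellation (`pthread_testcancel`) is not modeled.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

// Hypothetical subset of indexerToken.
struct TokenSketch {
  std::string title;
  std::string url;
};

class TokenQueueSketch {
 public:
  void push(const TokenSketch& token) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(token);
    }
    ready_.notify_one();  // wake a waiting consumer, if any
  }

  // Blocks until a token is available; returns false on the end-of-queue
  // sentinel (an empty title), mirroring popFromToIndexQueue().
  bool pop(TokenSketch& out) {
    std::unique_lock<std::mutex> lock(mutex_);
    ready_.wait(lock, [this] { return !queue_.empty(); });
    out = queue_.front();
    queue_.pop();
    return !out.title.empty();
  }

 private:
  std::mutex mutex_;
  std::condition_variable ready_;
  std::queue<TokenSketch> queue_;
};
```
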
[MediaWiki-commits] [Gerrit] Port zimwriterfs to the new API. - change (openzim)

2016-07-03 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Port zimwriterfs to the new API.
..


Port zimwriterfs to the new API.

No more ArticleSource::getData.

Change-Id: I76cd6f3e7e4a390ed6a58cf9815dda2a2f1bfde5
---
M zimwriterfs/Makefile.am
M zimwriterfs/article.cpp
M zimwriterfs/article.h
M zimwriterfs/articlesource.cpp
M zimwriterfs/articlesource.h
A zimwriterfs/mimetypecounter.cpp
A zimwriterfs/mimetypecounter.h
M zimwriterfs/zimwriterfs.cpp
8 files changed, 376 insertions(+), 248 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/Makefile.am b/zimwriterfs/Makefile.am
index 6e46553..92641d9 100644
--- a/zimwriterfs/Makefile.am
+++ b/zimwriterfs/Makefile.am
@@ -5,4 +5,5 @@
 zimwriterfs.cpp \
 tools.cpp \
 article.cpp \
-articlesource.cpp
+articlesource.cpp \
+mimetypecounter.cpp
diff --git a/zimwriterfs/article.cpp b/zimwriterfs/article.cpp
index 98ec882..3840f7b 100644
--- a/zimwriterfs/article.cpp
+++ b/zimwriterfs/article.cpp
@@ -21,11 +21,61 @@
 #include "article.h"
 #include "tools.h"
 
+#include 
+#include 
+
 
 extern std::string directoryPath;
 
-Article::Article(ArticleSource* source, const std::string& path, const bool detectRedirects):
-source(source)
+std::string Article::getAid() const
+{
+  return aid;
+}
+
+bool Article::isInvalid() const
+{
+  return invalid;
+}
+
+char Article::getNamespace() const
+{
+  return ns;
+}
+
+std::string Article::getUrl() const
+{
+  return url;
+}
+
+std::string Article::getTitle() const
+{
+  return title;
+}
+
+bool Article::isRedirect() const
+{
+  return !redirectAid.empty();
+}
+
+std::string Article::getMimeType() const
+{
+  return mimeType;
+}
+
+std::string Article::getRedirectAid() const
+{
+  return redirectAid;
+}
+
+bool Article::shouldCompress() const {
+  return (getMimeType().find("text") == 0 ||
+ getMimeType() == "application/javascript" ||
+ getMimeType() == "application/json" ||
+  getMimeType() == "image/svg+xml" ? true : false);
+}
+
+FileArticle::FileArticle(const std::string& path, const bool detectRedirects):
+dataRead(false)
 {
   invalid = false;
 
@@ -109,57 +159,125 @@
   }
 }
 
+/* Update links in the html to let them still be valid */
+std::map links;
+getLinks(root, links);
+std::map::iterator it;
+
+/* If a link appearch to be duplicated in the HTML, it will
+   occurs only one time in the links variable */
+for(it = links.begin(); it != links.end(); it++) {
+  if (!it->first.empty()
+&& it->first[0] != '#'
+&& it->first[0] != '?'
+&& it->first.substr(0, 5) != "data:") {
+replaceStringInPlace(html, "\"" + it->first + "\"", "\"" + computeNewUrl(aid, it->first) + "\"");
+  }
+}
+
+data = html;
+dataRead = true;
+
 gumbo_destroy_output(, output);
   }
 }
 
-std::string Article::getAid() const
+zim::Blob FileArticle::getData() const {
+if ( dataRead )
+return zim::Blob(data.data(), data.size());;
+
+std::string aidPath = directoryPath + "/" + aid;
+std::string fileContent = getFileContent(aidPath);
+
+if (getMimeType().find("text/css") == 0) {
+/* Rewrite url() values in the CSS */
+size_t startPos = 0;
+size_t endPos = 0;
+std::string url;
+
+while ((startPos = fileContent.find("url(", endPos)) && startPos != std::string::npos) {
+/* URL delimiters */
+endPos = fileContent.find(")", startPos);
+startPos = startPos + (fileContent[startPos+4] == '\'' || fileContent[startPos+4] == '"' ? 5 : 4);
+endPos = endPos - (fileContent[endPos-1] == '\'' || fileContent[endPos-1] == '"' ? 1 : 0);
+url = fileContent.substr(startPos, endPos - startPos);
+std::string startDelimiter = fileContent.substr(startPos-1, 1);
+std::string endDelimiter = fileContent.substr(endPos, 1);
+
+if (url.substr(0, 5) != "data:") {
+/* Deal with URL with arguments (using '? ') */
+std::string path = url;
+size_t markPos = url.find("?");
+if (markPos != std::string::npos) {
+path = url.substr(0, markPos);
+}
+
+/* Embeded fonts need to be inline because Kiwix is
+   otherwise not able to load same because of the
+   same-origin security */
+std::string mimeType = getMimeTypeForFile(path);
+if ( mimeType == "application/font-ttf"
+  || mimeType == "application/font-woff"
+  || mimeType == "application/vnd.ms-opentype"
+  || mimeType == "application/vnd.ms-fontobject") {
+try {
+std::string fontContent 

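The `url(` scan in `FileArticle::getData` combines the assignment and the test as `(startPos = fileContent.find("url(", endPos)) && startPos != std::string::npos`. If `url(` appears at position 0, the first operand is zero and the loop exits before handling it. Below is a standalone sketch of the extraction step that compares only against `npos`, strips one pair of optional quotes, stops on an unterminated `url(`, and skips `data:` URLs. `extractCssUrls` is a hypothetical helper, not part of zimwriterfs:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collect the targets of url(...) in a CSS string,
// skipping inline data: URLs.
std::vector<std::string> extractCssUrls(const std::string& css)
{
  std::vector<std::string> urls;
  std::string::size_type pos = 0;
  // Compare only against npos, so a match at position 0 is still handled.
  while ((pos = css.find("url(", pos)) != std::string::npos) {
    std::string::size_type start = pos + 4;  // just after "url("
    std::string::size_type close = css.find(')', start);
    if (close == std::string::npos)
      break;  // unterminated url(, stop scanning
    std::string::size_type end = close;
    // Strip one matching pair of quotes, if present.
    if (end - start >= 2 && (css[start] == '\'' || css[start] == '"')
        && css[end - 1] == css[start]) {
      ++start;
      --end;
    }
    std::string url = css.substr(start, end - start);
    if (url.compare(0, 5, "data:") != 0)
      urls.push_back(url);
    pos = close + 1;
  }
  return urls;
}
```

For example, `extractCssUrls("a{background:url('img.png')}")` returns a single entry, `img.png`.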
[MediaWiki-commits] [Gerrit] Handle the case of last article for filenameQueue is invalid. - change (openzim)

2016-07-03 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Handle the case of last article for filenameQueue is invalid.
..


Handle the case of last article for filenameQueue is invalid.

Change-Id: I970d7dc6cfbc572ed7c2b2c7e1b4d3a27cd98ce9
---
M zimwriterfs/articlesource.cpp
1 file changed, 8 insertions(+), 3 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/articlesource.cpp b/zimwriterfs/articlesource.cpp
index 06a773d..0eea3e1 100644
--- a/zimwriterfs/articlesource.cpp
+++ b/zimwriterfs/articlesource.cpp
@@ -58,6 +58,7 @@
 
   if (article != NULL) {
 delete article;
+article = NULL;
   }
 
   if (!metadataQueue.empty()) {
@@ -69,12 +70,16 @@
 article = new RedirectArticle(line);
   } else if (filenameQueue.popFromQueue(path)) {
 article = new FileArticle(path);
-while (article && article->isInvalid() && filenameQueue.popFromQueue(path)) {
+while (article->isInvalid() && filenameQueue.popFromQueue(path)) {
   delete article;
   article = new FileArticle(path);
 };
-  } else {
-article = NULL;
+if (article->isInvalid()) {
+  article = NULL;
+}
+  }
+
+  if (article == NULL) {
 if ( !loopOverHandlerStarted )
 {
 currentLoopHandler = articleHandlers.begin();

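The patch keeps popping paths until it builds a valid `FileArticle`, and gives up only when the queue is exhausted. As written, though, the new `if (article->isInvalid()) { article = NULL; }` branch drops the last invalid article without deleting it. Here is a minimal sketch of the same "skip invalid entries" loop that owns each object through `std::unique_ptr`, so discarded entries are freed. `Item`, `nextValidItem`, and the `std::deque` of paths are hypothetical stand-ins for `FileArticle` and `filenameQueue`:

```cpp
#include <deque>
#include <memory>
#include <string>

// Hypothetical stand-in for FileArticle: invalid when the path is empty.
struct Item {
  explicit Item(const std::string& p) : path(p) {}
  bool isInvalid() const { return path.empty(); }
  std::string path;
};

// Pop paths until a valid item is built. Returns nullptr when the queue is
// empty or every remaining path is invalid; discarded items are freed by
// unique_ptr when they go out of scope.
std::unique_ptr<Item> nextValidItem(std::deque<std::string>& paths)
{
  while (!paths.empty()) {
    std::unique_ptr<Item> item(new Item(paths.front()));
    paths.pop_front();
    if (!item->isInvalid())
      return item;
  }
  return nullptr;
}
```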
-- 
To view, visit https://gerrit.wikimedia.org/r/296915
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I970d7dc6cfbc572ed7c2b2c7e1b4d3a27cd98ce9
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 

___
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits


[MediaWiki-commits] [Gerrit] Add xapian indexer. - change (openzim)

2016-07-03 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Add xapian indexer.
..


Add xapian indexer.

Xapian is optional.
Build your index inside zim by adding "-i" or "--createFullTextIndex"
to zimwriterfs' command line.

Change-Id: I52c255e8335d0b6763c1c59eeb1549300d5f6f81
---
M zimwriterfs/Makefile.am
M zimwriterfs/configure.ac
M zimwriterfs/tools.cpp
M zimwriterfs/tools.h
A zimwriterfs/xapian/htmlparse.cc
A zimwriterfs/xapian/htmlparse.h
A zimwriterfs/xapian/myhtmlparse.cc
A zimwriterfs/xapian/myhtmlparse.h
A zimwriterfs/xapian/namedentities.h
A zimwriterfs/xapianIndexer.cpp
A zimwriterfs/xapianIndexer.h
M zimwriterfs/zimwriterfs.cpp
12 files changed, 1,490 insertions(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/Makefile.am b/zimwriterfs/Makefile.am
index 628b74c..1d40174 100644
--- a/zimwriterfs/Makefile.am
+++ b/zimwriterfs/Makefile.am
@@ -10,3 +10,15 @@
 resourceTools.cpp \
 pathTools.cpp \
 mimetypecounter.cpp
+
+zimwriterfs_CXXFLAGS = $(ICU_CFLAGS)
+zimwriterfs_LDFLAGS = $(ICU_LDFLAGS)
+
+if HAVE_XAPIAN
+zimwriterfs_CXXFLAGS += $(XAPIAN_CFLAGS)
+zimwriterfs_LDFLAGS += $(XAPIAN_LDFLAGS)
+zimwriterfs_SOURCES += \
+xapianIndexer.cpp \
+xapian/myhtmlparse.cc \
+xapian/htmlparse.cc
+endif
diff --git a/zimwriterfs/configure.ac b/zimwriterfs/configure.ac
index fb12c8f..795d3b1 100644
--- a/zimwriterfs/configure.ac
+++ b/zimwriterfs/configure.ac
@@ -71,6 +71,121 @@
 AC_DEFINE_UNQUOTED(LZMA_MEMORY_SIZE, 128, [set lzma uncompress memory size to number of MB])
 AC_DEFINE(ENABLE_LZMA, [1], [defined if lzma compression is enabled])
 
+
+function findLibrary {
+   found=0
+   for f in $(echo $LIBS_ROOT|tr ":" "\n") ; do
+   sf=`find $f -name $1 | grep $ARCH | head -1 2> /dev/null`
+   if [[ -f "$sf" -a $found -eq 0 ]]
+   then
+   found=1
+   echo $sf
+   fi
+   done
+   if [[ $found -eq 0 ]]
+   then
+   for f in $(echo $LIBS_ROOT|tr ":" "\n") ; do
+   sf=`find $f -name $1 | head -1 2> /dev/null`
+   if [[ -f "$sf" -a $found -eq 0 ]]
+   then
+   found=1
+   echo $sf
+   fi
+   done
+   fi
+   if [[ $found -eq 0 ]]
+   then
+   echo "no"
+   fi
+}
+
+
+
+ ICU
+
+
+
+ICU_CFLAGS=""
+ICU_LDFLAGS="-licui18n -licuuc -licudata" # replaced by icu-config
+ICU_STATIC_LDFLAGS=""
+
+# if --with-x, add path to LIBRARY_PATH
+AC_ARG_WITH(icu,
+AC_HELP_STRING([--with-icu=DIR], [alternate location for icu-config]),
+export LIBRARY_PATH="${withval}:${LIBRARY_PATH}";ICU_PATH=${withval}
+   )
+
+# look for shared library.
+# AC_CHECK_HEADER([zlib.h],, [AC_MSG_ERROR([[cannot find zlib header]])])
+# AC_CHECK_LIB([z], [zlibVersion],, [AC_MSG_ERROR([[cannot find zlib]]);COMPILE_ICU=1])
+# ICU_FILES=`findLibrary "libicuuc.${SHARED_EXT}"`
+
+AC_CHECK_TOOL(HAVE_ICU_CONFIG, icu-config,, "${ICU_PATH}:${PATH}")
+if test [ ! "$HAVE_ICU_CONFIG" ]
+then
+ AC_MSG_ERROR([[cannot find icu-config]])
+else
+OLDPATH=$PATH
+PATH="${ICU_PATH}:${PATH}"
+ICU_CFLAGS=`icu-config --cxxflags`;
+ICU_LDFLAGS=`icu-config --ldflags`;
+ICU_VER=`icu-config --version`;
+ICU_FILES="`findLibrary "libicuuc.${SHARED_EXT}"` `findLibrary "libicudata.${SHARED_EXT}"` `findLibrary "libicui18n.${SHARED_EXT}"`"
+PATH=$OLDPATH
+if [[ $ICU_VER \< "4.2" ]]
+   then
+AC_MSG_ERROR([[You need a version of libicu >= 4.2]])
+   fi
+fi
+
+
+AC_SUBST(ICU_CFLAGS)
+AC_SUBST(ICU_LDFLAGS)
+AC_SUBST(ICU_STATIC_LDFLAGS)
+AC_SUBST(ICU_FILES)
+AC_SUBST(COMPILED_ICUDATA_DAT)
+
+
+ XAPIAN
+
+
+XAPIAN_CFLAGS=""
+XAPIAN_LDFLAGS=""
+XAPIAN_STATIC_LDFLAGS=""
+XAPIAN_ENABLE=0
+
+# if --with-x, add path to LIBRARY_PATH
+AC_ARG_WITH([xapian],
+   [AS_HELP_STRING([--with-xapian=DIR], [alternat location for xapian-config] @@)],
+   [xapian_dir=$withval],
+   [with_xapian=yes])
+
+
+AS_IF([test "x$with_xapian" == xno],
+[AM_CONDITIONAL(HAVE_XAPIAN, false)],
+   [OLDPATH=$PATH
+AS_IF([test "x$with_xapian" != xyes],
+  PATH="$with_xapian:$PATH")
+AC_CHECK_TOOLS(XAPIAN_CONFIG, xapian-config-1.3, xapian-config,[],$PATH)
+AS_IF([test "x$XAPIAN_CONFIG" == x ],
+   AC_MSG_ERROR([[cannot find xapian-config file]])
+ )
+XAPIAN_VERSION=`$XAPIAN_CONFIG --version`
+good_version=yes
+

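The ICU check `[[ $ICU_VER \< "4.2" ]]` compares versions as strings, character by character. A two-digit major version such as `10.1` therefore sorts before `4.2` and would be rejected. Below is a sketch of a component-wise numeric comparison, written in C++ to match the rest of the code here. In `configure.ac` itself, autoconf's `AS_VERSION_COMPARE` macro is the more natural tool. `versionLess` and `versionParts` are hypothetical helpers:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split "4.2.1" into {4, 2, 1}. Each component contributes its leading
// digits; a component with no leading digits counts as 0.
static std::vector<int> versionParts(const std::string& v)
{
  std::vector<int> parts;
  std::istringstream in(v);
  std::string field;
  while (std::getline(in, field, '.')) {
    std::istringstream num(field);
    int n = 0;
    num >> n;
    parts.push_back(n);
  }
  return parts;
}

// Hypothetical helper: true if version a is strictly older than version b.
// Missing trailing components count as 0, so "4.2" equals "4.2.0".
bool versionLess(const std::string& a, const std::string& b)
{
  std::vector<int> pa = versionParts(a);
  std::vector<int> pb = versionParts(b);
  std::vector<int>::size_type n = pa.size() > pb.size() ? pa.size() : pb.size();
  pa.resize(n, 0);
  pb.resize(n, 0);
  return pa < pb;  // lexicographic over integers, not characters
}
```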
[MediaWiki-commits] [Gerrit] Revert compatibility with previous API. - change (openzim)

2016-07-03 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Revert compatibility with previous API.
..


Revert compatibility with previous API.

The API has changed. The behavior is different, so do not try to keep
a false API compatibility.

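With the compatibility shim gone, `Article::getData()` becomes pure virtual, so every article type must supply its own data. Here is a minimal sketch of that shape using hypothetical simplified types. `Blob` here is a plain byte-string wrapper, not `zim::Blob`, and `StringArticle` is illustrative only:

```cpp
#include <string>

// Hypothetical stand-in for zim::Blob: owns a copy of the bytes.
class Blob {
 public:
  Blob() {}
  explicit Blob(const std::string& bytes) : bytes_(bytes) {}
  std::string::size_type size() const { return bytes_.size(); }
  const std::string& bytes() const { return bytes_; }
 private:
  std::string bytes_;
};

// Simplified writer-side article. There is no default getData() to fall
// back on; a subclass that does not override it stays abstract and cannot
// be instantiated.
class Article {
 public:
  virtual ~Article() {}
  virtual std::string getAid() const = 0;
  virtual Blob getData() const = 0;
};

// Illustrative article whose content is held in memory.
class StringArticle : public Article {
 public:
  StringArticle(const std::string& aid, const std::string& content)
    : aid_(aid), content_(content) {}
  std::string getAid() const override { return aid_; }
  Blob getData() const override { return Blob(content_); }
 private:
  std::string aid_;
  std::string content_;
};
```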
Change-Id: I174f7df03d5ad5477c6b0fd1869258a31af3366a
---
M zimlib/include/zim/writer/articlesource.h
M zimlib/src/articlesource.cpp
M zimlib/src/zimcreator.cpp
M zimwriterfs/article.cpp
M zimwriterfs/article.h
M zimwriterfs/articlesource.cpp
6 files changed, 15 insertions(+), 59 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/writer/articlesource.h b/zimlib/include/zim/writer/articlesource.h
index dbe9263..a9ecffb 100644
--- a/zimlib/include/zim/writer/articlesource.h
+++ b/zimlib/include/zim/writer/articlesource.h
@@ -46,26 +46,10 @@
 virtual bool shouldCompress() const;
 virtual std::string getRedirectAid() const;
 virtual std::string getParameter() const;
-/* Idealy this method should be pure virtual,
- * but for compatibility reasons, provide a default implementation
- * using the old ArticleSourc::getData.
- */
-virtual Blob getData() const;
+virtual Blob getData() const = 0;
 
 // returns the next category id, to which the article is assigned to
 virtual std::string getNextCategory();
-
-  //
-  /* For API compatibility.
-   * The default Article::getData call ArticleSource::getData.
-   * So store the source of article in article to let default API compatible
-   * function do its job.
-   * This should be removed once every users switch to new API.
-   */
-  private:
-mutable ArticleSource*  __source;
-friend class ZimCreator;
-  //
 };
 
 class Category
@@ -90,17 +74,6 @@
 // ids. Using this list, the writer fetches the category data using
 // this method.
 virtual Category* getCategory(const std::string& cid);
-
-/**/
-/* For API compatibility.
- * The default Article::getData call ArticleSource::getData.
- * So keep the getData. Do not set it pure virtual cause we want new
- * code to not use it.
- * This should be removed once every users switch to new API.
- */
-virtual Blob getData(const std::string& aid);
-
-/**/
 };
 
   }
diff --git a/zimlib/src/articlesource.cpp b/zimlib/src/articlesource.cpp
index a2087a7..26d33f8 100644
--- a/zimlib/src/articlesource.cpp
+++ b/zimlib/src/articlesource.cpp
@@ -69,22 +69,6 @@
   return std::string();
 }
 
-/**/
-/* For API compatibility.
- * The default Article::getData call ArticleSource::getData.
- * This should be removed once every users switch to new API.
- */
-Blob Article::getData() const
-{
-  std::cerr << "DEPRECATED WARNING : Use of ArticleSource::getData is deprecated." << std::endl;
-  std::cerr << " You should override Article::getData directly." << std::endl;
-  return __source->getData(getAid());
-}
-Blob ArticleSource::getData(const std::string& aid) {
-throw std::runtime_error("This should not be called");
-}
-/**/
-
 Uuid ArticleSource::getUuid()
 {
   return Uuid::generate();
diff --git a/zimlib/src/zimcreator.cpp b/zimlib/src/zimcreator.cpp
index 1b528f4..0f0f6d0 100644
--- a/zimlib/src/zimcreator.cpp
+++ b/zimlib/src/zimcreator.cpp
@@ -202,15 +202,6 @@
 }
 
 // Add blob data to compressed or uncompressed cluster.
-/**/
-/* For API compatibility.
- * The default Article::getData call ArticleSource::getData.
- * So set the source of article to let default API compatible function
- * do its job.
- * This should be removed once every users switch to new API.
- */
-article->__source = 
-/**/
 Blob blob = article->getData();
 if (blob.size() > 0)
 {
diff --git a/zimwriterfs/article.cpp b/zimwriterfs/article.cpp
index 4aeb083..98ec882 100644
--- a/zimwriterfs/article.cpp
+++ b/zimwriterfs/article.cpp
@@ -24,7 +24,9 @@
 
 extern std::string directoryPath;
 
-Article::Article(const std::string& path, const bool detectRedirects) {

[MediaWiki-commits] [Gerrit] Add a API to get the offset of a article in the zimfile. - change (openzim)

2016-07-03 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Add a API to get the offset of a article in the zimfile.
..


Add a API to get the offset of a article in the zimfile.

To get the offset of an article:

- get the article
- call article.getOffset()

If the offset cannot be determined (the article is not a regular article,
e.g. a redirect, or its cluster is compressed), 0 is returned.
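A minimal model of the offset arithmetic follows. It rests on several assumptions:

- the in-memory blob offsets are relative to the start of the blob data;
- that data begins after the 1-byte compression flag and the offset table, at 4 bytes per entry (matching `offset / 4` in `cluster.cpp`), which is what `startOffset = offset + sizeof(char)` expresses;
- `File::getOffset` (its body is cut off above) adds the cluster's own position in the file and returns 0 for compressed clusters.

`ClusterModel` and `blobFileOffset` are hypothetical simplifications, not zimlib types:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint64_t offset_type;

// Hypothetical, simplified view of one cluster.
struct ClusterModel {
  offset_type filePosition;         // where the cluster starts in the file
  bool compressed;                  // no usable offset if compressed
  std::vector<offset_type> offsets; // blob starts relative to the data area,
                                    // plus one trailing end marker

  // Data area begins after the 1-byte compression flag and the offset table.
  offset_type startOffset() const { return 1 + 4 * offsets.size(); }
};

// Absolute position of blob n in the file, or 0 when it cannot be given
// (redirect, compressed cluster, or out-of-range blob index).
offset_type blobFileOffset(const ClusterModel& c, std::size_t n, bool isRedirect)
{
  if (isRedirect || c.compressed || n + 1 >= c.offsets.size())
    return 0;
  return c.filePosition + c.startOffset() + c.offsets[n];
}
```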

Change-Id: I5b4aced056c16aa8fc62ce4b8048553ae1f96c25
---
M zimlib/include/zim/article.h
M zimlib/include/zim/cluster.h
M zimlib/include/zim/file.h
M zimlib/src/cluster.cpp
M zimlib/src/file.cpp
5 files changed, 32 insertions(+), 7 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/article.h b/zimlib/include/zim/article.h
index 5172adb..b950173 100644
--- a/zimlib/include/zim/article.h
+++ b/zimlib/include/zim/article.h
@@ -85,6 +85,15 @@
: const_cast(file).getBlob(dirent.getClusterNumber(), dirent.getBlobNumber());
   }
 
+  offset_type getOffset() const
+  {
+Dirent dirent = getDirent();
+return isRedirect()
+|| isLinktarget()
+|| isDeleted() ? 0
+   : const_cast(file).getOffset(dirent.getClusterNumber(), dirent.getBlobNumber());
+  }
+
   std::string getPage(bool layout = true, unsigned maxRecurse = 10);
   void getPage(std::ostream&, bool layout = true, unsigned maxRecurse = 10);
 
diff --git a/zimlib/include/zim/cluster.h b/zimlib/include/zim/cluster.h
index bd55cb5..96b16f0 100644
--- a/zimlib/include/zim/cluster.h
+++ b/zimlib/include/zim/cluster.h
@@ -42,6 +42,7 @@
   CompressionType compression;
   Offsets offsets;
   Data data;
+  offset_type startOffset;
 
   void read(std::istream& in);
   void write(std::ostream& out) const;
@@ -49,14 +50,15 @@
 public:
   ClusterImpl();
 
-  void setCompression(CompressionType c)  { compression = c; }
-  CompressionType getCompression() const  { return compression; }
-  bool isCompressed() const   { return compression == zimcompZip || compression == zimcompBzip2 || compression == zimcompLzma; }
+  void setCompression(CompressionType c)   { compression = c; }
+  CompressionType getCompression() const   { return compression; }
+  bool isCompressed() const{ return compression == zimcompZip || compression == zimcompBzip2 || compression == zimcompLzma; }
 
-  size_type getCount() const  { return offsets.size() - 1; }
-  const char* getData(unsigned n) const   { return [ offsets[n] ]; }
-  size_type getSize(unsigned n) const { return offsets[n+1] - offsets[n]; }
-  size_type getSize() const   { return offsets.size() * sizeof(size_type) + data.size(); }
+  size_type getCount() const   { return offsets.size() - 1; }
+  const char* getData(unsigned n) const{ return [ offsets[n] ]; }
+  size_type getSize(unsigned n) const  { return offsets[n+1] - offsets[n]; }
+  size_type getSize() const{ return offsets.size() * sizeof(size_type) + data.size(); }
+  offset_type getOffset(size_type n) const { return startOffset + offsets[n]; }
   Blob getBlob(size_type n) const;
   void clear();
 
@@ -85,6 +87,7 @@
 
   const char* getBlobPtr(size_type n) const { return impl->getData(n); }
   size_type getBlobSize(size_type n) const  { return impl->getSize(n); }
+  offset_type getBlobOffset(size_type n) const  { return impl->getOffset(n); }
   Blob getBlob(size_type n) const;
 
   size_type count() const   { return impl ? impl->getCount() : 0; }
diff --git a/zimlib/include/zim/file.h b/zimlib/include/zim/file.h
index a6ac75b..0a3a2c3 100644
--- a/zimlib/include/zim/file.h
+++ b/zimlib/include/zim/file.h
@@ -62,6 +62,7 @@
 
   Blob getBlob(size_type clusterIdx, size_type blobIdx)
 { return getCluster(clusterIdx).getBlob(blobIdx); }
+  offset_type getOffset(size_type clusterIdx, size_type blobIdx);
 
   size_type getNamespaceBeginOffset(char ch)
 { return impl->getNamespaceBeginOffset(ch); }
diff --git a/zimlib/src/cluster.cpp b/zimlib/src/cluster.cpp
index 3630042..3b24fee 100644
--- a/zimlib/src/cluster.cpp
+++ b/zimlib/src/cluster.cpp
@@ -79,6 +79,9 @@
 
 size_type n = offset / 4;
 size_type a = offset;
+// offset are from start of cluster !after the char telling the compression!
+// but startOffset is offset from start of the cluster.
+startOffset = offset + sizeof(char);
 
 log_debug1("first offset is " << offset << " n=" << n << " a=" << a);
 
diff --git a/zimlib/src/file.cpp b/zimlib/src/file.cpp
index b6777e6..c5f25a4 100644
--- a/zimlib/src/file.cpp
+++ b/zimlib/src/file.cpp
@@ -201,6 +201,15 @@
   File::const_iterator File::findByTitle(char ns, const std::string& title)
   { 

[MediaWiki-commits] [Gerrit] Fix issue #2 Zimdump crashes on long titles. - change (openzim)

2016-07-03 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Fix issue #2 Zimdump crashes on long titles.
..


Fix issue #2 Zimdump crashes on long titles.

Most filesystems limit filenames to 255 bytes.
If a filename is longer than 255 bytes, truncate it and append a counter
to the truncated name to avoid name collisions.
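The patch keeps `254 - len(counter)` bytes and appends `~` plus the counter, so a truncated name is exactly 255 bytes. Here is a standalone sketch of that rule. `truncateFilename` is a hypothetical helper. Like the patch, it counts bytes, so it may cut a multi-byte UTF-8 character in half:

```cpp
#include <string>

// Hypothetical helper mirroring the zimDump rule: names longer than maxLen
// bytes are cut and suffixed with "~<counter>" so the result is exactly
// maxLen bytes. The counter is shared across calls to keep names unique.
std::string truncateFilename(const std::string& name, unsigned int& counter,
                             std::string::size_type maxLen = 255)
{
  if (name.length() <= maxLen)
    return name;
  std::string postfix = "~" + std::to_string(++counter);
  return name.substr(0, maxLen - postfix.length()) + postfix;
}
```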

Change-Id: I0475aaa2d1221be46c48c5a52814ca6659cc7940
---
M zimlib/src/tools/zimDump.cpp
1 file changed, 8 insertions(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/src/tools/zimDump.cpp b/zimlib/src/tools/zimDump.cpp
index 65d13b8..7c0149d 100644
--- a/zimlib/src/tools/zimDump.cpp
+++ b/zimlib/src/tools/zimDump.cpp
@@ -394,6 +394,7 @@
 
 void ZimDumper::dumpFiles(const std::string& directory)
 {
+  unsigned int truncatedFiles = 0;
   ::mkdir(directory.c_str(), 0777);
 
   std::set ns;
@@ -406,6 +407,13 @@
 std::string::size_type p;
 while ((p = t.find('/')) != std::string::npos)
   t.replace(p, 1, "%2f");
+if ( t.length() > 255 )
+{
+  std::ostringstream sspostfix, sst;
+  sspostfix << (++truncatedFiles);
+  sst << t.substr(0, 254-sspostfix.tellp()) << "~" << sspostfix.str();
+  t = sst.str();
+}
 std::string f = d + '/' + t;
 std::ofstream out(f.c_str());
 out << it->getData();

-- 
To view, visit https://gerrit.wikimedia.org/r/296931

Gerrit-MessageType: merged
Gerrit-Change-Id: I0475aaa2d1221be46c48c5a52814ca6659cc7940
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] Bug fix: don't store pointers inside a dynamic vector. - change (openzim)

2016-06-29 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Bug fix: don't store pointers inside a dynamic vector.
..


Bug fix: don't store pointers inside a dynamic vector.

Use offsets instead, since the actual objects change location whenever
the std::vector resizes its internal storage.

This is a follow-up to f5de40f94b30795f42bb9388cbb46df9cd605167.
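Here is a minimal illustration of the problem and of the fix. A `std::vector` may reallocate on `push_back`, which invalidates pointers to its elements, while indices stay valid. `Entry` and `buildEntries` are hypothetical stand-ins for `Dirent` and the cluster bookkeeping in `zimcreator.cpp`:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Dirent.
struct Entry {
  int cluster;
};

// Append entries while remembering *indices* of the pending ones, then
// update them after all insertions, as zimcreator.cpp now does.
std::vector<Entry> buildEntries(std::size_t count, int newCluster)
{
  std::vector<Entry> entries;
  std::vector<std::size_t> pending;  // indices survive reallocation
  for (std::size_t i = 0; i < count; ++i) {
    entries.push_back(Entry{0});     // may move every existing element
    pending.push_back(entries.size() - 1);
  }
  for (std::size_t idx : pending) {
    Entry* e = &entries[idx];        // take the address only when needed
    e->cluster = newCluster;
  }
  return entries;
}
```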

Change-Id: I166aa8dd209dd2755e68be70829f269a71a3aaca
---
M zimlib/include/zim/writer/zimcreator.h
M zimlib/src/zimcreator.cpp
2 files changed, 10 insertions(+), 8 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/writer/zimcreator.h b/zimlib/include/zim/writer/zimcreator.h
index 2e52d7e..8ce2c44 100644
--- a/zimlib/include/zim/writer/zimcreator.h
+++ b/zimlib/include/zim/writer/zimcreator.h
@@ -33,7 +33,7 @@
 {
   public:
 typedef std::vector DirentsType;
-typedef std::vector DirentPtrsType;
+typedef std::vector DirentPtrsType;
 typedef std::vector SizeVectorType;
 typedef std::vector OffsetsType;
 typedef std::map MimeTypes;
diff --git a/zimlib/src/zimcreator.cpp b/zimlib/src/zimcreator.cpp
index 46c550f..9c024a7 100644
--- a/zimlib/src/zimcreator.cpp
+++ b/zimlib/src/zimcreator.cpp
@@ -232,7 +232,7 @@
 }
 dirents.back().setCluster(clusterOffsets.size(), cluster->count());
 cluster->addBlob(blob);
-myDirents->push_back(&(dirents.back()));
+myDirents->push_back(dirents.size()-1);
 
 // If cluster is now large enough, write it to disk.
 if (cluster->size() >= minChunkSize * 1024)
@@ -247,10 +247,11 @@
   cluster->clear();
   myDirents->clear();
   // Update the cluster number of the dirents *not* written to disk.
-  for (DirentPtrsType::iterator di = otherDirents->begin();
-   di != otherDirents->end(); ++di)
+  for (DirentPtrsType::iterator dpi = otherDirents->begin();
+   dpi != otherDirents->end(); ++dpi)
   {
-(*di)->setCluster(clusterOffsets.size(), (*di)->getBlobNumber());
+Dirent *di = [*dpi];
+di->setCluster(clusterOffsets.size(), di->getBlobNumber());
   }
   offset_type end = out.tellp();
   currentSize += (end - start) +
@@ -263,10 +264,11 @@
   {
 clusterOffsets.push_back(out.tellp());
 out << compCluster;
-for (DirentPtrsType::iterator di = uncompDirents.begin();
- di != uncompDirents.end(); ++di)
+for (DirentPtrsType::iterator dpi = uncompDirents.begin();
+ dpi != uncompDirents.end(); ++dpi)
 {
-  (*di)->setCluster(clusterOffsets.size(), (*di)->getBlobNumber());
+  Dirent *di = [*dpi];
+  di->setCluster(clusterOffsets.size(), di->getBlobNumber());
 }
   }
   compCluster.clear();

-- 
To view, visit https://gerrit.wikimedia.org/r/296644

Gerrit-MessageType: merged
Gerrit-Change-Id: I166aa8dd209dd2755e68be70829f269a71a3aaca
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Cscott 
Gerrit-Reviewer: Kelson 
Gerrit-Reviewer: Mgautierfr 



[MediaWiki-commits] [Gerrit] Actually call the (previously-unused) ArticleSource#setFilen... - change (openzim)

2016-06-29 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Actually call the (previously-unused) ArticleSource#setFilename method.
..


Actually call the (previously-unused) ArticleSource#setFilename method.

Change-Id: I00d340f86c91419f1237976f6eb636ea8c32a743
---
M zimlib/src/zimcreator.cpp
1 file changed, 1 insertion(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/src/zimcreator.cpp b/zimlib/src/zimcreator.cpp
index 9c024a7..1b528f4 100644
--- a/zimlib/src/zimcreator.cpp
+++ b/zimlib/src/zimcreator.cpp
@@ -110,6 +110,7 @@
  ? fname.substr(0, fname.size() - 4)
  : fname;
   log_debug("basename " << basename);
+  src.setFilename(fname);
 
   INFO("create directory entries");
   createDirentsAndClusters(src, basename + ".tmp");

-- 
To view, visit https://gerrit.wikimedia.org/r/296632

Gerrit-MessageType: merged
Gerrit-Change-Id: I00d340f86c91419f1237976f6eb636ea8c32a743
Gerrit-PatchSet: 2
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Cscott 
Gerrit-Reviewer: Kelson 
Gerrit-Reviewer: Mgautierfr 



[MediaWiki-commits] [Gerrit] openzim[master]: libzim.pc: Add "Requires" field

2017-01-23 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/330857 )

Change subject: libzim.pc: Add "Requires" field
..


libzim.pc: Add "Requires" field

Follows-up b7e5564423b8644c.

Change-Id: I0710aefa8f706067c5e6cb11c70da785c7956608
---
M zimlib/libzim.pc.in
1 file changed, 1 insertion(+), 0 deletions(-)

Approvals:
  Mgautierfr: Looks good to me, but someone else must approve
  Kelson: Verified; Looks good to me, approved

Objections:
  Legoktm: There's a problem with this change, please improve



diff --git a/zimlib/libzim.pc.in b/zimlib/libzim.pc.in
index bbef0d6..58cc155 100644
--- a/zimlib/libzim.pc.in
+++ b/zimlib/libzim.pc.in
@@ -6,6 +6,7 @@
 Name: libzim
 Description: implements read and write methods for ZIM files
 Version: @VERSION@
+Requires: liblzma
 Libs: -L${libdir} -lzim
 Cflags: -I${includedir}
 

-- 
To view, visit https://gerrit.wikimedia.org/r/330857

Gerrit-MessageType: merged
Gerrit-Change-Id: I0710aefa8f706067c5e6cb11c70da785c7956608
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Legoktm 
Gerrit-Reviewer: Kelson 
Gerrit-Reviewer: Legoktm 
Gerrit-Reviewer: Mgautierfr 



[MediaWiki-commits] [Gerrit] openzim[master]: Make zimlib compilable with meson.

2017-01-23 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/328341 )

Change subject: Make zimlib compilable with meson.
..


Make zimlib compilable with meson.

Also install a correct pkg-config file that properly declares the
dependencies of zimlib.

Change-Id: I6b2e1bb0797cdc0afbf8c986dd91bfac757242d0
---
M zimlib/.gitignore
A zimlib/include/meson.build
A zimlib/meson.build
A zimlib/meson_options.txt
A zimlib/src/config.h.in
A zimlib/src/meson.build
6 files changed, 168 insertions(+), 1 deletion(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/.gitignore b/zimlib/.gitignore
index 550c09a..3563fa8 100644
--- a/zimlib/.gitignore
+++ b/zimlib/.gitignore
@@ -2,7 +2,7 @@
 *#*
 autom4te.cache
 compile
-config.*
+config.h
 configure
 depcomp
 .deps
diff --git a/zimlib/include/meson.build b/zimlib/include/meson.build
new file mode 100644
index 000..0caa2e8
--- /dev/null
+++ b/zimlib/include/meson.build
@@ -0,0 +1,34 @@
+install_headers(
+'zim/article.h',
+'zim/articlesearch.h',
+'zim/blob.h',
+'zim/cache.h',
+'zim/cluster.h',
+'zim/dirent.h',
+'zim/endian.h',
+'zim/error.h',
+'zim/file.h',
+'zim/fileheader.h',
+'zim/fileimpl.h',
+'zim/fileiterator.h',
+'zim/fstream.h',
+'zim/indexarticle.h',
+'zim/noncopyable.h',
+'zim/search.h',
+'zim/smartptr.h',
+'zim/refcounted.h',
+'zim/template.h',
+'zim/unicode.h',
+'zim/uuid.h',
+'zim/zim.h',
+'zim/zintstream.h',
+subdir:'zim'
+)
+
+install_headers(
+'zim/writer/articlesource.h',
+'zim/writer/dirent.h',
+'zim/writer/zimcreator.h',
+subdir:'zim/writer'
+)
+
diff --git a/zimlib/meson.build b/zimlib/meson.build
new file mode 100644
index 000..0eed3c4
--- /dev/null
+++ b/zimlib/meson.build
@@ -0,0 +1,44 @@
+project('libzim', ['c', 'cpp'],
+  version : '1.4',
+  license : 'GPL2')
+
+abi_current=2
+abi_revision=0
+abi_age=0
+  
+conf = configuration_data()
+conf.set('VERSION', '"@0@"'.format(meson.project_version()))
+conf.set('DIRENT_CACHE_SIZE', get_option('DIRENT_CACHE_SIZE'))
+conf.set('CLUSTER_CACHE_SIZE', get_option('CLUSTER_CACHE_SIZE'))
+conf.set('LZMA_MEMORY_SIZE', get_option('LZMA_MEMORY_SIZE'))
+
+zlib_dep = dependency('zlib', required:false)
+conf.set('ENABLE_ZLIB', zlib_dep.found())
+lzma_dep = dependency('liblzma', required:false)
+conf.set('ENABLE_LZMA', lzma_dep.found())
+bzip2_dep = dependency('bzip2', required:false)
+conf.set('ENABLE_BZIP2', bzip2_dep.found())
+
+pkg_requires = []
+if zlib_dep.found()
+pkg_requires += ['zlib']
+endif
+if lzma_dep.found()
+pkg_requires += ['liblzma']
+endif
+if bzip2_dep.found()
+pkg_requires += ['bzip2']
+endif
+
+inc = include_directories('include')
+
+subdir('include')
+subdir('src')
+
+pkg_mod = import('pkgconfig')
+pkg_mod.generate(libraries : libzim,
+ version : meson.project_version(),
+ name : 'libzim',
+ filebase : 'libzim',
+ description : 'A Library to zim.',
+ requires : pkg_requires)
diff --git a/zimlib/meson_options.txt b/zimlib/meson_options.txt
new file mode 100644
index 000..108edf9
--- /dev/null
+++ b/zimlib/meson_options.txt
@@ -0,0 +1,6 @@
+option('CLUSTER_CACHE_SIZE', type : 'string', value : '16',
+  description : 'set cluster cache size to number (default:16)')
+option('DIRENT_CACHE_SIZE', type : 'string', value : '512',
+  description : 'set dirent cache size to number (default:512)')
+option('LZMA_MEMORY_SIZE', type : 'string', value : '128',
+  description : 'set lzma uncompress memory in MB (default:128)')
\ No newline at end of file
diff --git a/zimlib/src/config.h.in b/zimlib/src/config.h.in
new file mode 100644
index 000..d9e4b7d
--- /dev/null
+++ b/zimlib/src/config.h.in
@@ -0,0 +1,14 @@
+
+#mesondefine VERSION
+
+#mesondefine DIRENT_CACHE_SIZE
+
+#mesondefine CLUSTER_CACHE_SIZE
+
+#mesondefine LZMA_MEMORY_SIZE
+
+#mesondefine ENABLE_ZLIB
+
+#mesondefine ENABLE_LZMA
+
+#mesondefine ENABLE_BZIP2
diff --git a/zimlib/src/meson.build b/zimlib/src/meson.build
new file mode 100644
index 000..ec550f4
--- /dev/null
+++ b/zimlib/src/meson.build
@@ -0,0 +1,69 @@
+
+configure_file(output : 'config.h',
+   configuration : conf,
+   input : 'config.h.in')
+
+common_sources = [
+#'config.h',
+'article.cpp',
+'articlesearch.cpp',
+'articlesource.cpp',
+'cluster.cpp',
+'dirent.cpp',
+'envvalue.cpp',
+'file.cpp',
+'fileheader.cpp',
+'fileimpl.cpp',
+'fstream.cpp',
+'indexarticle.cpp',
+'md5.c',
+'md5stream.cpp',
+'ptrstream.cpp',
+'search.cpp',
+'tee.cpp',
+'template.cpp',
+'unicode.cpp',
+'uuid.cpp',
+'zimcreator.cpp',
+'zintstream.cpp'
+]
+
+zlib_sources = [
+'deflatestream.cpp',
+'inflatestream.cpp'
+]
+
+bzip2_sources = [
+

[MediaWiki-commits] [Gerrit] openzim[master]: Fix use of ENABLE_* by preprocessor.

2017-01-23 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/330236 )

Change subject: Fix use of ENABLE_* by preprocessor.
..


Fix use of ENABLE_* by preprocessor.

If a compression library is not present, the associated ENABLE_* macro is
not defined.
So we must always test whether the macro is defined, not what its value
is.
zimcreator.cpp did not do this: in its #ifdef/#elif construction, the #elif
branches tested the value of the macros.

We also change the other (already correct) #ifdef uses to #if defined() for
consistency.
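Here is a minimal sketch of the corrected `#if defined(...)`/`#elif defined(...)` chain. It assumes a build where only bzip2 was found and the config header defines the macro with no value (`#define ENABLE_BZIP2`). With such an empty definition, `#elif ENABLE_BZIP2` has no expression to evaluate and fails to compile, while `defined()` works whether the macro is empty, `1`, or absent. The enum reuses zimlib's constant names but is declared locally for illustration:

```cpp
// Assumed build: bzip2 found, macro defined with an empty value.
#define ENABLE_BZIP2

// Local stand-ins for zimlib's CompressionType values.
enum CompressionType { zimcompNone, zimcompZip, zimcompBzip2, zimcompLzma };

// Choose the default compression at compile time. Every test uses
// defined(), so it works whether the macro is empty, 1, or absent.
CompressionType defaultCompression()
{
#if defined(ENABLE_LZMA)
  return zimcompLzma;
#elif defined(ENABLE_BZIP2)
  return zimcompBzip2;
#elif defined(ENABLE_ZLIB)
  return zimcompZip;
#else
  return zimcompNone;
#endif
}
```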

Change-Id: I86d4309bfcdeeb3356d0fb4f192d0849a5e57275
---
M zimlib/src/cluster.cpp
M zimlib/src/zimcreator.cpp
M zimlib/test/cluster.cpp
3 files changed, 24 insertions(+), 24 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/src/cluster.cpp b/zimlib/src/cluster.cpp
index 9dbefdc..e944ec6 100644
--- a/zimlib/src/cluster.cpp
+++ b/zimlib/src/cluster.cpp
@@ -28,17 +28,17 @@
 
 #include "config.h"
 
-#ifdef ENABLE_ZLIB
+#if defined(ENABLE_ZLIB)
 #include 
 #include 
 #endif
 
-#ifdef ENABLE_BZIP2
+#if defined(ENABLE_BZIP2)
 #include 
 #include 
 #endif
 
-#ifdef ENABLE_LZMA
+#if defined(ENABLE_LZMA)
 #include 
 #include 
 #endif
@@ -212,7 +212,7 @@
 
   case zimcompZip:
 {
-#ifdef ENABLE_ZLIB
+#if defined(ENABLE_ZLIB)
   log_debug("uncompress data (zlib)");
   zim::InflateStream is(in);
   is.exceptions(std::ios::failbit | std::ios::badbit);
@@ -226,7 +226,7 @@
 
   case zimcompBzip2:
 {
-#ifdef ENABLE_BZIP2
+#if defined(ENABLE_BZIP2)
   log_debug("uncompress data (bzip2)");
   zim::Bunzip2Stream is(in);
   is.exceptions(std::ios::failbit | std::ios::badbit);
@@ -240,7 +240,7 @@
 
   case zimcompLzma:
 {
-#ifdef ENABLE_LZMA
+#if defined(ENABLE_LZMA)
   log_debug("uncompress data (lzma)");
   zim::UnlzmaStream is(in);
   is.exceptions(std::ios::failbit | std::ios::badbit);
@@ -274,7 +274,7 @@
 
   case zimcompZip:
 {
-#ifdef ENABLE_ZLIB
+#if defined(ENABLE_ZLIB)
   log_debug("compress data (zlib)");
   zim::DeflateStream os(out);
   os.exceptions(std::ios::failbit | std::ios::badbit);
@@ -288,7 +288,7 @@
 
   case zimcompBzip2:
 {
-#ifdef ENABLE_BZIP2
+#if defined(ENABLE_BZIP2)
   log_debug("compress data (bzip2)");
   zim::Bzip2Stream os(out);
   os.exceptions(std::ios::failbit | std::ios::badbit);
@@ -302,7 +302,7 @@
 
   case zimcompLzma:
 {
-#ifdef ENABLE_LZMA
+#if defined(ENABLE_LZMA)
   uint32_t lzmaPreset = 3 | LZMA_PRESET_EXTREME;
   /**
* read lzma preset from environment
diff --git a/zimlib/src/zimcreator.cpp b/zimlib/src/zimcreator.cpp
index 1e4a21c..fac4c96 100644
--- a/zimlib/src/zimcreator.cpp
+++ b/zimlib/src/zimcreator.cpp
@@ -55,11 +55,11 @@
 ZimCreator::ZimCreator()
   : minChunkSize(1024-64),
 nextMimeIdx(0),
-#ifdef ENABLE_LZMA
+#if defined(ENABLE_LZMA)
 compression(zimcompLzma),
-#elif ENABLE_BZIP2
+#elif defined(ENABLE_BZIP2)
 compression(zimcompBzip2),
-#elif ENABLE_ZLIB
+#elif defined(ENABLE_ZLIB)
 compression(zimcompZip),
 #else
 compression(zimcompNone),
@@ -70,11 +70,11 @@
 
 ZimCreator::ZimCreator(int& argc, char* argv[])
   : nextMimeIdx(0),
-#ifdef ENABLE_LZMA
+#if defined(ENABLE_LZMA)
 compression(zimcompLzma),
-#elif ENABLE_BZIP2
+#elif defined(ENABLE_BZIP2)
 compression(zimcompBzip2),
-#elif ENABLE_ZLIB
+#elif defined( ENABLE_ZLIB)
 compression(zimcompZip),
 #else
 compression(zimcompNone),
@@ -87,15 +87,15 @@
   else
 minChunkSize = Arg(argc, argv, 's', 1024-64);
 
-#ifdef ENABLE_ZLIB
+#if defined(ENABLE_ZLIB)
   if (Arg(argc, argv, "--zlib"))
 compression = zimcompZip;
 #endif
-#ifdef ENABLE_BZIP2
+#if defined(ENABLE_BZIP2)
   if (Arg(argc, argv, "--bzip2"))
 compression = zimcompBzip2;
 #endif
-#ifdef ENABLE_LZMA
+#if defined(ENABLE_LZMA)
   if (Arg(argc, argv, "--lzma"))
 compression = zimcompLzma;
 #endif
diff --git a/zimlib/test/cluster.cpp b/zimlib/test/cluster.cpp
index 687c1e1..b907bad 100644
--- a/zimlib/test/cluster.cpp
+++ b/zimlib/test/cluster.cpp
@@ -39,13 +39,13 @@
   registerMethod("CreateCluster", *this, ::CreateCluster);
  registerMethod("ReadWriteCluster", *this, ::ReadWriteCluster);
   registerMethod("ReadWriteEmpty", *this, ::ReadWriteEmpty);
-#ifdef ENABLE_ZLIB
+#if defined(ENABLE_ZLIB)
  registerMethod("ReadWriteClusterZ", *this, ::ReadWriteClusterZ);
 #endif
-#ifdef ENABLE_BZIP2
+#if defined(ENABLE_BZIP2)
  registerMethod("ReadWriteClusterBz2", *this, ::ReadWriteClusterBz2);
 #endif
-#ifdef ENABLE_LZMA
+#if defined(ENABLE_LZMA)
  registerMethod("ReadWriteClusterLzma", *this, ::ReadWriteClusterLzma);
 #endif
 }
@@ -126,7 +126,7 @@
   

[MediaWiki-commits] [Gerrit] openzim[master]: Make zimlib compilable with meson.

2017-01-23 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/330417 )

Change subject: Make zimlib compilable with meson.
..


Make zimlib compilable with meson.

Also install a correct pkg-config file that declares zimlib's
dependencies.

Change-Id: I86b999e46fac42e8d552cba8a1be1c674bb2c67d
---
M zimlib/.gitignore
A zimlib/include/meson.build
A zimlib/meson.build
A zimlib/meson_options.txt
A zimlib/src/config.h.in
A zimlib/src/meson.build
6 files changed, 168 insertions(+), 1 deletion(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/.gitignore b/zimlib/.gitignore
index 550c09a..3563fa8 100644
--- a/zimlib/.gitignore
+++ b/zimlib/.gitignore
@@ -2,7 +2,7 @@
 *#*
 autom4te.cache
 compile
-config.*
+config.h
 configure
 depcomp
 .deps
diff --git a/zimlib/include/meson.build b/zimlib/include/meson.build
new file mode 100644
index 000..0caa2e8
--- /dev/null
+++ b/zimlib/include/meson.build
@@ -0,0 +1,34 @@
+install_headers(
+'zim/article.h',
+'zim/articlesearch.h',
+'zim/blob.h',
+'zim/cache.h',
+'zim/cluster.h',
+'zim/dirent.h',
+'zim/endian.h',
+'zim/error.h',
+'zim/file.h',
+'zim/fileheader.h',
+'zim/fileimpl.h',
+'zim/fileiterator.h',
+'zim/fstream.h',
+'zim/indexarticle.h',
+'zim/noncopyable.h',
+'zim/search.h',
+'zim/smartptr.h',
+'zim/refcounted.h',
+'zim/template.h',
+'zim/unicode.h',
+'zim/uuid.h',
+'zim/zim.h',
+'zim/zintstream.h',
+subdir:'zim'
+)
+
+install_headers(
+'zim/writer/articlesource.h',
+'zim/writer/dirent.h',
+'zim/writer/zimcreator.h',
+subdir:'zim/writer'
+)
+
diff --git a/zimlib/meson.build b/zimlib/meson.build
new file mode 100644
index 000..0eed3c4
--- /dev/null
+++ b/zimlib/meson.build
@@ -0,0 +1,44 @@
+project('libzim', ['c', 'cpp'],
+  version : '1.4',
+  license : 'GPL2')
+
+abi_current=2
+abi_revision=0
+abi_age=0
+  
+conf = configuration_data()
+conf.set('VERSION', '"@0@"'.format(meson.project_version()))
+conf.set('DIRENT_CACHE_SIZE', get_option('DIRENT_CACHE_SIZE'))
+conf.set('CLUSTER_CACHE_SIZE', get_option('CLUSTER_CACHE_SIZE'))
+conf.set('LZMA_MEMORY_SIZE', get_option('LZMA_MEMORY_SIZE'))
+
+zlib_dep = dependency('zlib', required:false)
+conf.set('ENABLE_ZLIB', zlib_dep.found())
+lzma_dep = dependency('liblzma', required:false)
+conf.set('ENABLE_LZMA', lzma_dep.found())
+bzip2_dep = dependency('bzip2', required:false)
+conf.set('ENABLE_BZIP2', bzip2_dep.found())
+
+pkg_requires = []
+if zlib_dep.found()
+pkg_requires += ['zlib']
+endif
+if lzma_dep.found()
+pkg_requires += ['liblzma']
+endif
+if bzip2_dep.found()
+pkg_requires += ['bzip2']
+endif
+
+inc = include_directories('include')
+
+subdir('include')
+subdir('src')
+
+pkg_mod = import('pkgconfig')
+pkg_mod.generate(libraries : libzim,
+ version : meson.project_version(),
+ name : 'libzim',
+ filebase : 'libzim',
+ description : 'A Library to zim.',
+ requires : pkg_requires)
diff --git a/zimlib/meson_options.txt b/zimlib/meson_options.txt
new file mode 100644
index 000..108edf9
--- /dev/null
+++ b/zimlib/meson_options.txt
@@ -0,0 +1,6 @@
+option('CLUSTER_CACHE_SIZE', type : 'string', value : '16',
+  description : 'set cluster cache size to number (default:16)')
+option('DIRENT_CACHE_SIZE', type : 'string', value : '512',
+  description : 'set dirent cache size to number (default:512)')
+option('LZMA_MEMORY_SIZE', type : 'string', value : '128',
+  description : 'set lzma uncompress memory in MB (default:128)')
\ No newline at end of file
diff --git a/zimlib/src/config.h.in b/zimlib/src/config.h.in
new file mode 100644
index 000..d9e4b7d
--- /dev/null
+++ b/zimlib/src/config.h.in
@@ -0,0 +1,14 @@
+
+#mesondefine VERSION
+
+#mesondefine DIRENT_CACHE_SIZE
+
+#mesondefine CLUSTER_CACHE_SIZE
+
+#mesondefine LZMA_MEMORY_SIZE
+
+#mesondefine ENABLE_ZLIB
+
+#mesondefine ENABLE_LZMA
+
+#mesondefine ENABLE_BZIP2
diff --git a/zimlib/src/meson.build b/zimlib/src/meson.build
new file mode 100644
index 000..ec550f4
--- /dev/null
+++ b/zimlib/src/meson.build
@@ -0,0 +1,69 @@
+
+configure_file(output : 'config.h',
+   configuration : conf,
+   input : 'config.h.in')
+
+common_sources = [
+#'config.h',
+'article.cpp',
+'articlesearch.cpp',
+'articlesource.cpp',
+'cluster.cpp',
+'dirent.cpp',
+'envvalue.cpp',
+'file.cpp',
+'fileheader.cpp',
+'fileimpl.cpp',
+'fstream.cpp',
+'indexarticle.cpp',
+'md5.c',
+'md5stream.cpp',
+'ptrstream.cpp',
+'search.cpp',
+'tee.cpp',
+'template.cpp',
+'unicode.cpp',
+'uuid.cpp',
+'zimcreator.cpp',
+'zintstream.cpp'
+]
+
+zlib_sources = [
+'deflatestream.cpp',
+'inflatestream.cpp'
+]
+
+bzip2_sources = [
+

[MediaWiki-commits] [Gerrit] openzim[master]: Support running tests with "make check"

2017-01-18 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/330859 )

Change subject: Support running tests with "make check"
..


Support running tests with "make check"

Change-Id: I1e15ba2dbda2f71c98bafa84d36b81c2a1772df1
---
M zimlib/.gitignore
M zimlib/test/Makefile.am
2 files changed, 4 insertions(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/.gitignore b/zimlib/.gitignore
index 550c09a..5896863 100644
--- a/zimlib/.gitignore
+++ b/zimlib/.gitignore
@@ -28,3 +28,6 @@
 src/tools/zimdump
 src/tools/zimsearch
 libzim.pc
+test-driver
+test/zimlib-test*
+test/test-suite.log
diff --git a/zimlib/test/Makefile.am b/zimlib/test/Makefile.am
index 29dd7bd..f28d6fe 100644
--- a/zimlib/test/Makefile.am
+++ b/zimlib/test/Makefile.am
@@ -1,6 +1,7 @@
 AM_CPPFLAGS=-I$(top_srcdir)/include
 
 noinst_PROGRAMS = zimlib-test
+TESTS = zimlib-test
 
 if WITH_ZLIB
 ZLIB_SOURCES = \

-- 
To view, visit https://gerrit.wikimedia.org/r/330859

Gerrit-MessageType: merged
Gerrit-Change-Id: I1e15ba2dbda2f71c98bafa84d36b81c2a1772df1
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Legoktm 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] openzim[master]: Provide time gap between uuid generation during tests

2017-01-18 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/330861 )

Change subject: Provide time gap between uuid generation during tests
..


Provide time gap between uuid generation during tests

The kernel clock of GNU Mach (the microkernel used by GNU Hurd) is not
very precise, so when the test generates and compares two UUIDs, both may
get the same timestamp, which yields identical UUIDs. This patch adds a
one-second sleep between the two generate statements. Thanks to Pino
Toscano.

This patch was written for the Debian package to fix a build failure on
GNU Hurd, and is suitable for inclusion upstream.

Change-Id: I1a1520bb0597244a1d2423ed17e87c5a39cb4aa6
---
M zimlib/test/uuid.cpp
1 file changed, 8 insertions(+), 0 deletions(-)

Approvals:
  Mgautierfr: Looks good to me, but someone else must approve
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/test/uuid.cpp b/zimlib/test/uuid.cpp
index 3348b73..0bace52 100644
--- a/zimlib/test/uuid.cpp
+++ b/zimlib/test/uuid.cpp
@@ -24,6 +24,8 @@
 #include 
 #include 
 
+#include 
+
 class UuidTest : public cxxtools::unit::TestSuite
 {
   public:
@@ -92,6 +94,12 @@
   CXXTOOLS_UNIT_ASSERT(uuid1 != zim::Uuid());
   CXXTOOLS_UNIT_ASSERT(uuid2 == zim::Uuid());
 
+  // Since GNU Mach's clock isn't precise hence the time might be
+  // same during generating uuid1 and uuid2 leading to test
+  // failure. To bring the time difference between 2 sleep for a
+  // second. Thanks to Pino Toscano.
+  sleep(1);
+
   uuid2 = zim::Uuid::generate();
   CXXTOOLS_UNIT_ASSERT(uuid1 != uuid2);
   CXXTOOLS_UNIT_ASSERT(uuid1 != zim::Uuid());

-- 
To view, visit https://gerrit.wikimedia.org/r/330861

Gerrit-MessageType: merged
Gerrit-Change-Id: I1a1520bb0597244a1d2423ed17e87c5a39cb4aa6
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Legoktm 
Gerrit-Reviewer: Kelson 
Gerrit-Reviewer: Mgautierfr 



[MediaWiki-commits] [Gerrit] openzim[master]: Read the cluster content only when necessary.

2016-10-09 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Read the cluster content only when necessary.
..


Read the cluster content only when necessary.

Instead of reading the full cluster content when the cluster is created,
read it only when it is needed.
This is mainly useful when we only need a cluster to get an article
offset.
Compressed clusters are still read all at once because:
- We create a proxy decompression stream on top of the input stream.
  This proxy stream does not support tellg(), which lazy reading needs.
- This change only helps when getOffset() is used, and no offset is
  available for compressed clusters.

We no longer use operator>>.
This operator is designed to allow chained reads, i.e.:
in >> cluster1 >> cluster2;

Since we may no longer read all of a cluster's content from the input
stream, the stream is not left positioned at the next cluster, so chained
reads with operator>> would no longer work.

Change-Id: I0709eb6b8fe49512ee302d13dfd5641cdc1c676b
---
M zimlib/include/zim/cluster.h
M zimlib/src/cluster.cpp
M zimlib/src/file.cpp
M zimlib/src/fileimpl.cpp
4 files changed, 87 insertions(+), 46 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/cluster.h b/zimlib/include/zim/cluster.h
index 96b16f0..4fd3971 100644
--- a/zimlib/include/zim/cluster.h
+++ b/zimlib/include/zim/cluster.h
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -33,7 +34,6 @@
 
   class ClusterImpl : public RefCounted
   {
-  friend std::istream& operator>> (std::istream& in, ClusterImpl& blobImpl);
  friend std::ostream& operator<< (std::ostream& out, const ClusterImpl& blobImpl);
 
   typedef std::vector Offsets;
@@ -41,11 +41,28 @@
 
   CompressionType compression;
   Offsets offsets;
-  Data data;
+  Data _data;
   offset_type startOffset;
 
-  void read(std::istream& in);
+  ifstream* lazy_read_stream;
+
+  offset_type read_header(std::istream& in);
+  void read_content(std::istream& in);
   void write(std::ostream& out) const;
+
+  void set_lazy_read(ifstream* in) {
+lazy_read_stream = in;
+  }
+
+  bool is_fully_initialised() const { return lazy_read_stream == 0; }
+  void finalise_read();
+  const Data& data() const {
+if ( !is_fully_initialised() )
+{
+   const_cast(this)->finalise_read();
+}
+return _data;
+  }
 
 public:
   ClusterImpl();
@@ -55,20 +72,21 @@
  bool isCompressed() const{ return compression == zimcompZip || compression == zimcompBzip2 || compression == zimcompLzma; }
 
   size_type getCount() const   { return offsets.size() - 1; }
-  const char* getData(unsigned n) const{ return [ offsets[n] ]; }
+  const char* getData(unsigned n) const{ return ()[ offsets[n] ]; }
  size_type getSize(unsigned n) const  { return offsets[n+1] - offsets[n]; }
-  size_type getSize() const{ return offsets.size() * sizeof(size_type) + data.size(); }
+  size_type getSize() const{ return offsets.size() * sizeof(size_type) + data().size(); }
  offset_type getOffset(size_type n) const { return startOffset + offsets[n]; }
   Blob getBlob(size_type n) const;
   void clear();
 
   void addBlob(const Blob& blob);
   void addBlob(const char* data, unsigned size);
+
+  void init_from_stream(ifstream& in, offset_type offset);
   };
 
   class Cluster
   {
-  friend std::istream& operator>> (std::istream& in, Cluster& blob);
   friend std::ostream& operator<< (std::ostream& out, const Cluster& blob);
 
   SmartPtr impl;
@@ -98,10 +116,10 @@
  void addBlob(const Blob& blob){ getImpl()->addBlob(blob); }
 
   operator bool() const   { return impl; }
+
+  void init_from_stream(ifstream& in, offset_type offset);
   };
 
-  std::istream& operator>> (std::istream& in, ClusterImpl& blobImpl);
-  std::istream& operator>> (std::istream& in, Cluster& blob);
   std::ostream& operator<< (std::ostream& out, const ClusterImpl& blobImpl);
   std::ostream& operator<< (std::ostream& out, const Cluster& blob);
 
diff --git a/zimlib/src/cluster.cpp b/zimlib/src/cluster.cpp
index 3b24fee..9dbefdc 100644
--- a/zimlib/src/cluster.cpp
+++ b/zimlib/src/cluster.cpp
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -60,34 +61,35 @@
   }
 
   ClusterImpl::ClusterImpl()
-: compression(zimcompNone)
+: compression(zimcompNone),
+  startOffset(0),
+  lazy_read_stream(NULL)
   {
 offsets.push_back(0);
   }
 
-  void ClusterImpl::read(std::istream& in)
+  /* This return the number of char read */
+  offset_type ClusterImpl::read_header(std::istream& in)
   {
-log_debug1("read");
-
+log_debug1("read_header");
 // read first offset, which specifies, how 

[MediaWiki-commits] [Gerrit] openzim[master]: Avoid get the dirent too many time.

2016-10-09 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Avoid get the dirent too many time.
..


Avoid get the dirent too many time.

The Article::is* functions are aliases for Article::getDirent().is*.
As we have already fetched the dirent just before, call these functions
directly on the dirent rather than on the article.
This avoids redundant calls to getDirent().

Change-Id: Ida3531c540b4848ee7028fee420eefa68aea3d40
---
M zimlib/include/zim/article.h
1 file changed, 8 insertions(+), 8 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/article.h b/zimlib/include/zim/article.h
index b950173..f2ece1e 100644
--- a/zimlib/include/zim/article.h
+++ b/zimlib/include/zim/article.h
@@ -79,19 +79,19 @@
   Blob getData() const
   {
 Dirent dirent = getDirent();
-return isRedirect()
-|| isLinktarget()
-|| isDeleted() ? Blob()
-   : const_cast(file).getBlob(dirent.getClusterNumber(), dirent.getBlobNumber());
+return dirent.isRedirect()
+|| dirent.isLinktarget()
+|| dirent.isDeleted() ? Blob()
+  : const_cast(file).getBlob(dirent.getClusterNumber(), dirent.getBlobNumber());
   }
 
   offset_type getOffset() const
   {
 Dirent dirent = getDirent();
-return isRedirect()
-|| isLinktarget()
-|| isDeleted() ? 0
-   : const_cast(file).getOffset(dirent.getClusterNumber(), dirent.getBlobNumber());
+return dirent.isRedirect()
+|| dirent.isLinktarget()
+|| dirent.isDeleted() ? 0
+  : const_cast(file).getOffset(dirent.getClusterNumber(), dirent.getBlobNumber());
   }
 
   std::string getPage(bool layout = true, unsigned maxRecurse = 10);

-- 
To view, visit https://gerrit.wikimedia.org/r/314717

Gerrit-MessageType: merged
Gerrit-Change-Id: Ida3531c540b4848ee7028fee420eefa68aea3d40
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] openzim[master]: Adapt tests to new internal cluster API.

2016-11-07 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Adapt tests to new internal cluster API.
..


Adapt tests to new internal cluster API.

- Cluster no longer has operator>>(); use init_from_stream() instead.
- The stream must be a zim::ifstream, not a std::istream.

Change-Id: I58b8e1d43b0973129d02393b83c3f248b77768fd
---
M zimlib/test/cluster.cpp
1 file changed, 48 insertions(+), 20 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/test/cluster.cpp b/zimlib/test/cluster.cpp
index c4c25e6..687c1e1 100644
--- a/zimlib/test/cluster.cpp
+++ b/zimlib/test/cluster.cpp
@@ -18,9 +18,12 @@
  */
 
 #include 
+#include 
 #include 
 #include 
+#include 
 #include 
+#include 
 
 #include 
 #include 
@@ -69,7 +72,9 @@
 
 void ReadWriteCluster()
 {
-  std::stringstream s;
+  std::string name = std::tmpnam(NULL);
+  std::ofstream os;
+  os.open(name.c_str());
 
   zim::Cluster cluster;
 
@@ -81,20 +86,25 @@
   cluster.addBlob(blob1.data(), blob1.size());
   cluster.addBlob(blob2.data(), blob2.size());
 
-  s << cluster;
+  os << cluster;
+  os.close();
 
+  zim::ifstream is(name);
   zim::Cluster cluster2;
-  s >> cluster2;
-  CXXTOOLS_UNIT_ASSERT(!s.fail());
+  cluster2.init_from_stream(is, 0);
+  CXXTOOLS_UNIT_ASSERT(!is.fail());
   CXXTOOLS_UNIT_ASSERT_EQUALS(cluster2.count(), 3);
   CXXTOOLS_UNIT_ASSERT_EQUALS(cluster2.getBlobSize(0), blob0.size());
   CXXTOOLS_UNIT_ASSERT_EQUALS(cluster2.getBlobSize(1), blob1.size());
   CXXTOOLS_UNIT_ASSERT_EQUALS(cluster2.getBlobSize(2), blob2.size());
+  std::remove(name.c_str());
 }
 
 void ReadWriteEmpty()
 {
-  std::stringstream s;
+  std::string name = std::tmpnam(NULL);
+  std::ofstream os;
+  os.open(name.c_str());
 
   zim::Cluster cluster;
 
@@ -102,21 +112,26 @@
   cluster.addBlob(0, 0);
   cluster.addBlob(0, 0);
 
-  s << cluster;
+  os << cluster;
+  os.close();
 
+  zim::ifstream is(name);
   zim::Cluster cluster2;
-  s >> cluster2;
-  CXXTOOLS_UNIT_ASSERT(!s.fail());
+  cluster2.init_from_stream(is, 0);
+  CXXTOOLS_UNIT_ASSERT(!is.fail());
   CXXTOOLS_UNIT_ASSERT_EQUALS(cluster2.count(), 3);
   CXXTOOLS_UNIT_ASSERT_EQUALS(cluster2.getBlobSize(0), 0);
   CXXTOOLS_UNIT_ASSERT_EQUALS(cluster2.getBlobSize(1), 0);
   CXXTOOLS_UNIT_ASSERT_EQUALS(cluster2.getBlobSize(2), 0);
+  std::remove(name.c_str());
 }
 
 #ifdef ENABLE_ZLIB
 void ReadWriteClusterZ()
 {
-  std::stringstream s;
+  std::string name = std::tmpnam(NULL);
+  std::ofstream os;
+  os.open(name.c_str());
 
   zim::Cluster cluster;
 
@@ -129,11 +144,13 @@
   cluster.addBlob(blob2.data(), blob2.size());
   cluster.setCompression(zim::zimcompZip);
 
-  s << cluster;
+  os << cluster;
+  os.close();
 
+  zim::ifstream is(name);
   zim::Cluster cluster2;
-  s >> cluster2;
-  CXXTOOLS_UNIT_ASSERT(!s.fail());
+  cluster2.init_from_stream(is, 0);
+  CXXTOOLS_UNIT_ASSERT(!is.fail());
   CXXTOOLS_UNIT_ASSERT_EQUALS(cluster2.count(), 3);
   CXXTOOLS_UNIT_ASSERT_EQUALS(cluster2.getCompression(), zim::zimcompZip);
   CXXTOOLS_UNIT_ASSERT_EQUALS(cluster2.getBlobSize(0), blob0.size());
@@ -142,6 +159,7 @@
  CXXTOOLS_UNIT_ASSERT(std::equal(cluster2.getBlobPtr(0), cluster2.getBlobPtr(0) + cluster2.getBlobSize(0), blob0.data()));
  CXXTOOLS_UNIT_ASSERT(std::equal(cluster2.getBlobPtr(1), cluster2.getBlobPtr(1) + cluster2.getBlobSize(1), blob1.data()));
  CXXTOOLS_UNIT_ASSERT(std::equal(cluster2.getBlobPtr(2), cluster2.getBlobPtr(2) + cluster2.getBlobSize(2), blob2.data()));
+  std::remove(name.c_str());
 }
 
 #endif
@@ -149,7 +167,9 @@
 #ifdef ENABLE_BZIP2
 void ReadWriteClusterBz2()
 {
-  std::stringstream s;
+  std::string name = std::tmpnam(NULL);
+  std::ofstream os;
+  os.open(name.c_str());
 
   zim::Cluster cluster;
 
@@ -162,11 +182,13 @@
   cluster.addBlob(blob2.data(), blob2.size());
   cluster.setCompression(zim::zimcompBzip2);
 
-  s << cluster;
+  os << cluster;
+  os.close();
 
+  zim::ifstream is(name);
   zim::Cluster cluster2;
-  s >> cluster2;
-  CXXTOOLS_UNIT_ASSERT(!s.fail());
+  cluster2.init_from_stream(is, 0);
+  CXXTOOLS_UNIT_ASSERT(!is.fail());
   CXXTOOLS_UNIT_ASSERT_EQUALS(cluster2.count(), 3);
  CXXTOOLS_UNIT_ASSERT_EQUALS(cluster2.getCompression(), zim::zimcompBzip2);
   CXXTOOLS_UNIT_ASSERT_EQUALS(cluster2.getBlobSize(0), blob0.size());
@@ -175,6 +197,7 @@
  CXXTOOLS_UNIT_ASSERT(std::equal(cluster2.getBlobPtr(0), cluster2.getBlobPtr(0) + cluster2.getBlobSize(0), blob0.data()));
   CXXTOOLS_UNIT_ASSERT(std::equal(cluster2.getBlobPtr(1), 
cluster2.getBlobPtr(1) + 

[MediaWiki-commits] [Gerrit] openzim[master]: [zimwriterfs] Try to avoid too big cluster.

2016-10-12 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: [zimwriterfs] Try to avoid too big cluster.
..


[zimwriterfs] Try to avoid too big cluster.

We now check whether the cluster would become too big *before* adding
the content. This way, clusters are always closed before reaching the
maximum size rather than just after exceeding it.

The only way a cluster can still exceed the maximum size is when the
content of a single article is itself bigger than the maximum size.

Change-Id: I77a581df46ae87e01a3fe2689570a7c7355d1877
---
M zimlib/src/zimcreator.cpp
1 file changed, 9 insertions(+), 5 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/src/zimcreator.cpp b/zimlib/src/zimcreator.cpp
index 0f0f6d0..1e4a21c 100644
--- a/zimlib/src/zimcreator.cpp
+++ b/zimlib/src/zimcreator.cpp
@@ -222,12 +222,12 @@
   myDirents = 
   otherDirents = 
 }
-dirents.back().setCluster(clusterOffsets.size(), cluster->count());
-cluster->addBlob(blob);
-myDirents->push_back(dirents.size()-1);
 
-// If cluster is now large enough, write it to disk.
-if (cluster->size() >= minChunkSize * 1024)
+// If cluster will be too large, write it to dis, and open a new
+// one for the content.
+if ( cluster->count()
+  && cluster->size()+blob.size() >= minChunkSize * 1024
+   )
 {
   log_info("cluster with " << cluster->count() << " articles, " <<
cluster->size() << " bytes; current title \"" <<
@@ -249,6 +249,10 @@
   currentSize += (end - start) +
 sizeof(offset_type) /* for cluster pointer entry */;
 }
+
+dirents.back().setCluster(clusterOffsets.size(), cluster->count());
+cluster->addBlob(blob);
+myDirents->push_back(dirents.size()-1);
   }
 
   // When we've seen all articles, write any remaining clusters.

-- 
To view, visit https://gerrit.wikimedia.org/r/315238

Gerrit-MessageType: merged
Gerrit-Change-Id: I77a581df46ae87e01a3fe2689570a7c7355d1877
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] openzim[master]: Do not cache uncompressed cluster by default.

2016-10-12 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Do not cache uncompressed cluster by default.
..


Do not cache uncompressed cluster by default.

Uncompressed clusters can be big (for example, if they contain videos).
We should not cache them by default until we have a cache system that
limits its own memory usage. Caching them can still be enabled with the
ZIM_CACHEUNCOMPRESSEDCLUSTER environment variable.

Change-Id: If307af3f91a614b943fa408b2bf30e2016ebfe81
---
M zimlib/include/zim/fileimpl.h
M zimlib/src/fileimpl.cpp
2 files changed, 10 insertions(+), 3 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/fileimpl.h b/zimlib/include/zim/fileimpl.h
index 1cf584d..ddf7bc1 100644
--- a/zimlib/include/zim/fileimpl.h
+++ b/zimlib/include/zim/fileimpl.h
@@ -41,6 +41,7 @@
 
   Cache direntCache;
   Cache clusterCache;
+  bool cacheUncompressedCluster;
   typedef std::map NamespaceCache;
   NamespaceCache namespaceBeginCache;
   NamespaceCache namespaceEndCache;
diff --git a/zimlib/src/fileimpl.cpp b/zimlib/src/fileimpl.cpp
index 2aa9e1f..a1b4eee 100644
--- a/zimlib/src/fileimpl.cpp
+++ b/zimlib/src/fileimpl.cpp
@@ -41,7 +41,8 @@
   FileImpl::FileImpl(const char* fname)
 : zimFile(fname),
   direntCache(envValue("ZIM_DIRENTCACHE", DIRENT_CACHE_SIZE)),
-  clusterCache(envValue("ZIM_CLUSTERCACHE", CLUSTER_CACHE_SIZE))
+  clusterCache(envValue("ZIM_CLUSTERCACHE", CLUSTER_CACHE_SIZE)),
+  cacheUncompressedCluster(envValue("ZIM_CACHEUNCOMPRESSEDCLUSTER", false))
   {
 log_trace("read file \"" << fname << '"');
 
@@ -181,8 +182,13 @@
 if (zimFile.fail())
   throw ZimFileFormatError("error reading cluster data");
 
-log_debug("put cluster " << idx << " into cluster cache; hits " << clusterCache.getHits() << " misses " << clusterCache.getMisses() << " ratio " << clusterCache.hitRatio() * 100 << "% fillfactor " << clusterCache.fillfactor());
-clusterCache.put(idx, cluster);
+if (cacheUncompressedCluster || cluster.isCompressed())
+{
+  log_debug("put cluster " << idx << " into cluster cache; hits " << clusterCache.getHits() << " misses " << clusterCache.getMisses() << " ratio " << clusterCache.hitRatio() * 100 << "% fillfactor " << clusterCache.fillfactor());
+  clusterCache.put(idx, cluster);
+}
+else
+  log_debug("cluster " << idx << " is not compressed - do not cache");
 
 return cluster;
   }

-- 
To view, visit https://gerrit.wikimedia.org/r/315237

Gerrit-MessageType: merged
Gerrit-Change-Id: If307af3f91a614b943fa408b2bf30e2016ebfe81
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] openzim[master]: Always try to cache a cluster, even if it is a uncompressed ...

2016-10-09 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Always try to cache a cluster, even if it is a uncompressed one.
..


Always try to cache a cluster, even if it is a uncompressed one.

Creating a cluster is costly, so we want to avoid doing it more often than
necessary.

Change-Id: I5a5c384a13d77387d1d7ae020df08c010f82502c
---
M zimlib/src/fileimpl.cpp
1 file changed, 2 insertions(+), 7 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/src/fileimpl.cpp b/zimlib/src/fileimpl.cpp
index 8c072eb..04fcacd 100644
--- a/zimlib/src/fileimpl.cpp
+++ b/zimlib/src/fileimpl.cpp
@@ -182,13 +182,8 @@
 if (zimFile.fail())
   throw ZimFileFormatError("error reading cluster data");
 
-if (cluster.isCompressed())
-{
-  log_debug("put cluster " << idx << " into cluster cache; hits " << clusterCache.getHits() << " misses " << clusterCache.getMisses() << " ratio " << clusterCache.hitRatio() * 100 << "% fillfactor " << clusterCache.fillfactor());
-  clusterCache.put(idx, cluster);
-}
-else
-  log_debug("cluster " << idx << " is not compressed - do not cache");
+log_debug("put cluster " << idx << " into cluster cache; hits " << clusterCache.getHits() << " misses " << clusterCache.getMisses() << " ratio " << clusterCache.hitRatio() * 100 << "% fillfactor " << clusterCache.fillfactor());
+clusterCache.put(idx, cluster);
 
 return cluster;
   }

-- 
To view, visit https://gerrit.wikimedia.org/r/314721

Gerrit-MessageType: merged
Gerrit-Change-Id: I5a5c384a13d77387d1d7ae020df08c010f82502c
Gerrit-PatchSet: 2
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] openzim[master]: Invalidate internal streambuf buffer when we change it.

2016-10-09 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Invalidate internal streambuf buffer when we change it.
..


Invalidate internal streambuf buffer when we change it.

This may not be strictly necessary, as it has never failed before, but
let's be cautious.

Change-Id: Ie54f95e08e2683c43ef7b0fdc70bd9f74fb1fbe9
---
M zimlib/include/zim/fstream.h
M zimlib/src/fstream.cpp
2 files changed, 2 insertions(+), 1 deletion(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/fstream.h b/zimlib/include/zim/fstream.h
index 970920e..4b99814 100644
--- a/zimlib/include/zim/fstream.h
+++ b/zimlib/include/zim/fstream.h
@@ -75,7 +75,7 @@
 
   void seekg(zim::offset_type off);
   void setBufsize(unsigned s)
-  { buffer.resize(s); }
+  { buffer.resize(s); setg(0, 0, 0);}
   zim::offset_type fsize() const;
   time_t getMTime() const;
   };
diff --git a/zimlib/src/fstream.cpp b/zimlib/src/fstream.cpp
index b925fc5..ef91b57 100644
--- a/zimlib/src/fstream.cpp
+++ b/zimlib/src/fstream.cpp
@@ -258,6 +258,7 @@
   throw std::runtime_error(msg.str());
 }
   }
+  setg(0, 0, 0);
 }
 
 void streambuf::seekg(zim::offset_type off)

-- 
To view, visit https://gerrit.wikimedia.org/r/314720

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie54f95e08e2683c43ef7b0fdc70bd9f74fb1fbe9
Gerrit-PatchSet: 2
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] openzim[master]: Remove unused currentPos from fstream.

2016-10-09 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged.

Change subject: Remove unused currentPos from fstream.
..


Remove unused currentPos from fstream.

Change-Id: Idd7e392b35dbe36e5d4ee4a03f6119bb01ab4e2e
---
M zimlib/include/zim/fstream.h
M zimlib/src/fstream.cpp
2 files changed, 0 insertions(+), 2 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/fstream.h b/zimlib/include/zim/fstream.h
index 4b99814..b971da0 100644
--- a/zimlib/include/zim/fstream.h
+++ b/zimlib/include/zim/fstream.h
@@ -60,7 +60,6 @@
   FilesType files;
   OpenFilesCacheType openFilesCache;
   OpenfileInfoPtr currentFile;
-  zim::offset_type currentPos;
 
   std::streambuf::int_type overflow(std::streambuf::int_type ch);
   std::streambuf::int_type underflow();
diff --git a/zimlib/src/fstream.cpp b/zimlib/src/fstream.cpp
index ef91b57..b8e5a98 100644
--- a/zimlib/src/fstream.cpp
+++ b/zimlib/src/fstream.cpp
@@ -264,7 +264,6 @@
 void streambuf::seekg(zim::offset_type off)
 {
   setg(0, 0, 0);
-  currentPos = off;
 
   zim::offset_type o = off;
   FilesType::iterator it;

-- 
To view, visit https://gerrit.wikimedia.org/r/314719

Gerrit-MessageType: merged
Gerrit-Change-Id: Idd7e392b35dbe36e5d4ee4a03f6119bb01ab4e2e
Gerrit-PatchSet: 3
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 


[MediaWiki-commits] [Gerrit] openzim[master]: Add libzim.pc for pkg-config

2016-12-31 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/329759)

Change subject: Add libzim.pc for pkg-config
..


Add libzim.pc for pkg-config

This registers the libzim library with pkg-config, mostly so adding it
as a dependency with meson works as expected.

After building and installing the library, the following should work:
 $ pkg-config --modversion libzim
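Listing libzim.pc in AC_CONFIG_FILES makes config.status generate libzim.pc from libzim.pc.in. It does this by replacing each @NAME@ placeholder with its configure-time value. References written as ${libdir} are left in place for pkg-config to expand when it is queried. The function below is a simplified, hypothetical sketch of that substitution step only; config.status itself handles more cases. `substituteAtVars` is an illustrative name, and leaving unknown placeholders untouched is a choice made for this sketch.

```cpp
#include <map>
#include <string>

// Replaces every @NAME@ token whose NAME is a key of `vars`.
// Tokens that are not known variables are copied through unchanged.
std::string substituteAtVars(const std::string& in,
                             const std::map<std::string, std::string>& vars) {
  std::string out;
  std::string::size_type pos = 0;
  while (true) {
    const std::string::size_type open = in.find('@', pos);
    if (open == std::string::npos) break;
    const std::string::size_type close = in.find('@', open + 1);
    if (close == std::string::npos) break;
    out.append(in, pos, open - pos);  // text before the '@'
    const auto it = vars.find(in.substr(open + 1, close - open - 1));
    if (it != vars.end()) {
      out += it->second;  // known placeholder: substitute its value
      pos = close + 1;
    } else {
      out += '@';         // not a placeholder: keep '@' and rescan after it
      pos = open + 1;
    }
  }
  out.append(in, pos, std::string::npos);  // trailing text
  return out;
}
```

For example, with `prefix` mapped to `/usr/local`, the template line `prefix=@prefix@` becomes `prefix=/usr/local`. The line `Libs: -L${libdir} -lzim` passes through untouched.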

Change-Id: Ieb41c0b3a9445e6d651f8c0d528d30c2ebaf06dc
---
M zimlib/.gitignore
M zimlib/Makefile.am
M zimlib/configure.ac
A zimlib/libzim.pc.in
4 files changed, 16 insertions(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/.gitignore b/zimlib/.gitignore
index 0a4a991..550c09a 100644
--- a/zimlib/.gitignore
+++ b/zimlib/.gitignore
@@ -27,3 +27,4 @@
 examples/createZimExample
 src/tools/zimdump
 src/tools/zimsearch
+libzim.pc
diff --git a/zimlib/Makefile.am b/zimlib/Makefile.am
index 504fe34..4053e7f 100644
--- a/zimlib/Makefile.am
+++ b/zimlib/Makefile.am
@@ -13,3 +13,6 @@
include \
$(UNITTEST_DIR) \
examples
+
+pkgconfigdir = $(libdir)/pkgconfig
+pkgconfig_DATA = libzim.pc
diff --git a/zimlib/configure.ac b/zimlib/configure.ac
index 0472538..da71e40 100644
--- a/zimlib/configure.ac
+++ b/zimlib/configure.ac
@@ -146,6 +146,7 @@
 # output
 #
 AC_CONFIG_FILES([
+  libzim.pc
   Makefile
   src/Makefile
   src/tools/Makefile
diff --git a/zimlib/libzim.pc.in b/zimlib/libzim.pc.in
new file mode 100644
index 000..bbef0d6
--- /dev/null
+++ b/zimlib/libzim.pc.in
@@ -0,0 +1,11 @@
+prefix=@prefix@
+exec_prefix=@exec_prefix@
+libdir=@libdir@
+includedir=@includedir@
+
+Name: libzim
+Description: implements read and write methods for ZIM files
+Version: @VERSION@
+Libs: -L${libdir} -lzim
+Cflags: -I${includedir}
+

-- 
To view, visit https://gerrit.wikimedia.org/r/329759

Gerrit-MessageType: merged
Gerrit-Change-Id: Ieb41c0b3a9445e6d651f8c0d528d30c2ebaf06dc
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Legoktm 
Gerrit-Reviewer: Kelson 


[MediaWiki-commits] [Gerrit] openzim[master]: pkg_config dependencies are properly declared.

2017-03-23 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/333593)

Change subject: pkg_config dependencies are properly declared.
..


pkg_config dependencies are properly declared.

The 'Requires' field now uses a variable set by configure, so the declared
dependencies match the compression algorithms that are actually enabled.
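The configure logic amounts to appending one pkg-config module name for each enabled backend. A C++ sketch of the same rule follows. `CompressionOptions` and `pkgConfigRequires` are hypothetical names; the module names are the ones the patch appends.

```cpp
#include <string>

// Hypothetical record of which optional compression backends were enabled
// (mirroring --enable-zlib / --enable-bzip2 / --enable-lzma).
struct CompressionOptions {
  bool zlib;
  bool bzip2;
  bool lzma;
};

// Builds the space-separated module list for the "Requires:" field.
std::string pkgConfigRequires(const CompressionOptions& opts) {
  std::string deps;
  auto add = [&deps](const char* module) {
    if (!deps.empty()) deps += ' ';  // separator only between entries
    deps += module;
  };
  if (opts.zlib) add("zlib");
  if (opts.bzip2) add("bzip2");
  if (opts.lzma) add("liblzma");
  return deps;
}
```

The shell version writes `pkg_config_deps+=" zlib"`, which leaves a leading space; this sketch does not. Also, `+=` on a shell variable is a bash extension rather than POSIX sh. A portable spelling in configure.ac would be `pkg_config_deps="$pkg_config_deps zlib"`.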

Change-Id: I282a8039bce4cfec23ae13d8a3240d45c5cab1ac
---
M zimlib/configure.ac
M zimlib/libzim.pc.in
2 files changed, 7 insertions(+), 1 deletion(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/configure.ac b/zimlib/configure.ac
index da71e40..6be35df 100644
--- a/zimlib/configure.ac
+++ b/zimlib/configure.ac
@@ -65,6 +65,7 @@
 # compression algorithms
 #
 
+pkg_config_deps=""
 # zlib
 AC_ARG_ENABLE([zlib],
   AS_HELP_STRING([--enable-zlib], [add support for zlib compression (disabled by default)]),
@@ -75,6 +76,7 @@
 then
 AC_CHECK_HEADER([zlib.h], , AC_MSG_ERROR([zlib header not found]))
 AC_DEFINE(ENABLE_ZLIB, [1], [defined if zlib compression is enabled])
+pkg_config_deps+=" zlib"
 fi
 
 AM_CONDITIONAL(WITH_ZLIB, test "$enable_zlib" = "yes")
@@ -89,6 +91,7 @@
 then
 AC_CHECK_HEADER([bzlib.h], , AC_MSG_ERROR([bzip2 header files not found]))
 AC_DEFINE(ENABLE_BZIP2, [1], [defined if bzip2 compression is enabled])
+pkg_config_deps+=" bzip2"
 fi
 
 AM_CONDITIONAL(WITH_BZIP2, test "$enable_bzip2" = "yes")
@@ -103,10 +106,13 @@
 then
 AC_CHECK_HEADER([lzma.h], , AC_MSG_ERROR([lzma header files not found]))
 AC_DEFINE(ENABLE_LZMA, [1], [defined if lzma compression is enabled])
+pkg_config_deps+=" liblzma"
 fi
 
 AM_CONDITIONAL(WITH_LZMA, test "$enable_lzma" = "yes")
 
+AC_SUBST(PKG_CONFIG_DEPENDENCIES, $pkg_config_deps)
+
 #
 # unittest
 #
diff --git a/zimlib/libzim.pc.in b/zimlib/libzim.pc.in
index 58cc155..d17e236 100644
--- a/zimlib/libzim.pc.in
+++ b/zimlib/libzim.pc.in
@@ -6,7 +6,7 @@
 Name: libzim
 Description: implements read and write methods for ZIM files
 Version: @VERSION@
-Requires: liblzma
+Requires: @PKG_CONFIG_DEPENDENCIES@
 Libs: -L${libdir} -lzim
 Cflags: -I${includedir}
 

-- 
To view, visit https://gerrit.wikimedia.org/r/333593

Gerrit-MessageType: merged
Gerrit-Change-Id: I282a8039bce4cfec23ae13d8a3240d45c5cab1ac
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 


[MediaWiki-commits] [Gerrit] openzim[master]: Add the valuesmap metadata.

2017-03-23 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/343896)

Change subject: Add the valuesmap metadata.
..


Add the valuesmap metadata.

This way, the searcher knows which value slot holds each stored field.
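On the reading side, a searcher could fetch this string with `Xapian::Database::get_metadata("valuesmap")`. It could then turn the string into a name-to-slot table before calling `Xapian::Document::get_value(slot)`. The parser below is a minimal sketch of that step. `parseValuesMap` is a hypothetical helper, and it assumes `name:slot` pairs separated by `;`, as in the string written above.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Parses "name:slot;name:slot;..." into a name -> value-slot lookup table.
std::map<std::string, unsigned> parseValuesMap(const std::string& valuesmap) {
  std::map<std::string, unsigned> slots;
  std::istringstream entries(valuesmap);
  std::string entry;
  while (std::getline(entries, entry, ';')) {
    if (entry.empty()) continue;  // tolerate a trailing ';'
    const std::string::size_type colon = entry.find(':');
    if (colon == std::string::npos)
      throw std::invalid_argument("malformed valuesmap entry: " + entry);
    // std::stoul throws std::invalid_argument if the slot is not a number.
    slots[entry.substr(0, colon)] =
        static_cast<unsigned>(std::stoul(entry.substr(colon + 1)));
  }
  return slots;
}
```

Looking fields up by name, rather than hard-coding slot numbers, matters here. A later change in this thread rewrites the map to "title:0;wordcount:1", which moves wordcount from slot 3 to slot 1.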

Change-Id: I1228ebfefd70dc7c22fc37034c50d0019968c796
---
M zimwriterfs/xapianIndexer.cpp
1 file changed, 1 insertion(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/xapianIndexer.cpp b/zimwriterfs/xapianIndexer.cpp
index 6cb6713..65129b7 100644
--- a/zimwriterfs/xapianIndexer.cpp
+++ b/zimwriterfs/xapianIndexer.cpp
@@ -52,6 +52,7 @@
 void XapianIndexer::indexingPrelude(const string indexPath_) {
 indexPath = indexPath_;
 this->writableDatabase = Xapian::WritableDatabase(indexPath + ".tmp", 
Xapian::DB_CREATE_OR_OVERWRITE);
+this->writableDatabase.set_metadata("valuesmap", "title:0;snippet:1;size:2;wordcount:3");
 this->writableDatabase.begin_transaction(true);
 
 /* Insert the stopwords */

-- 
To view, visit https://gerrit.wikimedia.org/r/343896

Gerrit-MessageType: merged
Gerrit-Change-Id: I1228ebfefd70dc7c22fc37034c50d0019968c796
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 


[MediaWiki-commits] [Gerrit] openzim[master]: Few clean and better verbose message.

2017-03-23 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/343898)

Change subject: Few clean and better verbose message.
..


Few clean and better verbose message.

Change-Id: Id4675f70422ecef42198ca33fe82bc1f33866548
---
M zimwriterfs/indexer.cpp
M zimwriterfs/indexer.h
2 files changed, 4 insertions(+), 8 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/indexer.cpp b/zimwriterfs/indexer.cpp
index b83abd4..33989f4 100644
--- a/zimwriterfs/indexer.cpp
+++ b/zimwriterfs/indexer.cpp
@@ -89,6 +89,10 @@
 
   indexedArticleCount += 1;
 
+  if ( (indexedArticleCount % 1000 == 0) && self->getVerboseFlag()) {
+  std::cout << indexedArticleCount << " articled indexed." <flush();
@@ -137,10 +141,6 @@
   bool Indexer::popFromToIndexQueue(indexerToken ) {
 while (this->isToIndexQueueEmpty()) {
   usleep(500);
-  if (this->getVerboseFlag()) {
-   std::cout << "Waiting... ToIndexQueue is empty for now..." << std::endl;
-  }
-
   pthread_testcancel();
 }
 
diff --git a/zimwriterfs/indexer.h b/zimwriterfs/indexer.h
index 3291e36..686d156 100644
--- a/zimwriterfs/indexer.h
+++ b/zimwriterfs/indexer.h
@@ -29,14 +29,10 @@
 #include 
 
 #include 
-/*#include 
-#include 
-#include */
 #include 
 #include 
 #include 
 #include 
-/*#include "reader.h"*/
 
 using namespace std;
 

-- 
To view, visit https://gerrit.wikimedia.org/r/343898

Gerrit-MessageType: merged
Gerrit-Change-Id: Id4675f70422ecef42198ca33fe82bc1f33866548
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 


[MediaWiki-commits] [Gerrit] openzim[master]: Do not store the snippet nor the size of the content in the ...

2017-03-23 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/343897)

Change subject: Do not store the snippet nor the size of the content in the database.
..


Do not store the snippet nor the size of the content in the database.

Change-Id: I354a1e76dd2214e844d67ddb4b94f43087664729
---
M zimwriterfs/indexer.cpp
M zimwriterfs/indexer.h
M zimwriterfs/xapianIndexer.cpp
M zimwriterfs/xapianIndexer.h
4 files changed, 2 insertions(+), 28 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/indexer.cpp b/zimwriterfs/indexer.cpp
index 6c26fc9..b83abd4 100644
--- a/zimwriterfs/indexer.cpp
+++ b/zimwriterfs/indexer.cpp
@@ -84,8 +84,6 @@
  token.title,
  token.keywords,
  token.content,
- token.snippet,
- token.size,
  token.wordCount
  );
 
diff --git a/zimwriterfs/indexer.h b/zimwriterfs/indexer.h
index 02d989b..3291e36 100644
--- a/zimwriterfs/indexer.h
+++ b/zimwriterfs/indexer.h
@@ -46,8 +46,6 @@
 string title;
 string keywords;
 string content;
-string snippet;
-string size;
 string wordCount;
 };
 
@@ -70,8 +68,6 @@
   const string ,
   const string ,
   const string ,
-  const string ,
-  const string ,
   const string ) = 0;
 virtual void flush() = 0;
 virtual void indexingPostlude() = 0;
diff --git a/zimwriterfs/xapianIndexer.cpp b/zimwriterfs/xapianIndexer.cpp
index 65129b7..db27f9d 100644
--- a/zimwriterfs/xapianIndexer.cpp
+++ b/zimwriterfs/xapianIndexer.cpp
@@ -52,7 +52,7 @@
 void XapianIndexer::indexingPrelude(const string indexPath_) {
 indexPath = indexPath_;
 this->writableDatabase = Xapian::WritableDatabase(indexPath + ".tmp", 
Xapian::DB_CREATE_OR_OVERWRITE);
-this->writableDatabase.set_metadata("valuesmap", "title:0;snippet:1;size:2;wordcount:3");
+this->writableDatabase.set_metadata("valuesmap", "title:0;wordcount:1");
 this->writableDatabase.begin_transaction(true);
 
 /* Insert the stopwords */
@@ -72,17 +72,13 @@
   const string ,
   const string ,
   const string ,
-  const string ,
-  const string ,
   const string ) {
 
 /* Put the data in the document */
 Xapian::Document currentDocument;
 currentDocument.clear_values();
 currentDocument.add_value(0, title);
-currentDocument.add_value(1, snippet);
-currentDocument.add_value(2, size);
-currentDocument.add_value(3, wordCount);
+currentDocument.add_value(1, wordCount);
 currentDocument.set_data(url);
 indexer.set_document(currentDocument);
 
@@ -149,20 +145,6 @@
stringstream countWordStringStream;
countWordStringStream << countWords(htmlParser.dump);
token.wordCount = countWordStringStream.str();
-
-   /* snippet */
-   std::string snippet = std::string(htmlParser.dump, 0, 300);
-   std::string::size_type last = snippet.find_last_of('.');
-   if (last == snippet.npos)
- last = snippet.find_last_of(' ');
-   if (last != snippet.npos)
- snippet = snippet.substr(0, last);
-   token.snippet = snippet;
-
-   /* size */
-   stringstream sizeStringStream;
-   sizeStringStream << token.content.size() / 1024;
-   token.size = sizeStringStream.str();
 
/* Remove accent */
token.title = removeAccents(token.accentedTitle);
diff --git a/zimwriterfs/xapianIndexer.h b/zimwriterfs/xapianIndexer.h
index 1d854da..16dc094 100644
--- a/zimwriterfs/xapianIndexer.h
+++ b/zimwriterfs/xapianIndexer.h
@@ -61,8 +61,6 @@
const string ,
const string ,
const string ,
-   const string ,
-   const string ,
const string );
 void flush();
 void indexingPostlude();

-- 
To view, visit https://gerrit.wikimedia.org/r/343897

Gerrit-MessageType: merged
Gerrit-Change-Id: I354a1e76dd2214e844d67ddb4b94f43087664729
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 


[MediaWiki-commits] [Gerrit] openzim[master]: Store the stop words used by the indexer in the database.

2017-03-27 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/344963)

Change subject: Store the stop words used by the indexer in the database.
..


Store the stop words used by the indexer in the database.

To search the database correctly, a user needs to use the same stop words
as the ones used while indexing the content.

By storing the stop words in the database, user code can load them on its
side and parse queries correctly.
Storing only the language is not enough: the stop word list is read from a
file/resource, and user code may not have it.
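A client would read the list back with `Xapian::Database::get_metadata("stopwords")`. The indexer stores the raw resource text there, one stop word per line. The helper below is a hypothetical sketch of the splitting step. Its CRLF handling and its skipping of empty lines are assumptions added for robustness, not behavior taken from zimwriterfs.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits the newline-separated "stopwords" metadata back into single words.
std::vector<std::string> splitStopwords(const std::string& stopwords) {
  std::vector<std::string> words;
  std::istringstream lines(stopwords);
  std::string word;
  while (std::getline(lines, word, '\n')) {
    if (!word.empty() && word.back() == '\r') word.pop_back();  // CRLF input
    if (!word.empty()) words.push_back(word);                   // skip blanks
  }
  return words;
}
```

Each returned word can then be passed to `Xapian::SimpleStopper::add()`. The stopper can be installed on the query side with `Xapian::QueryParser::set_stopper()`, mirroring what the indexer does with its TermGenerator.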

Change-Id: I6cbc9f8d30c39d4fc1e65a356347d8fbfd456494
---
M zimwriterfs/indexer.cpp
M zimwriterfs/indexer.h
M zimwriterfs/xapianIndexer.cpp
M zimwriterfs/xapianIndexer.h
4 files changed, 14 insertions(+), 28 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/indexer.cpp b/zimwriterfs/indexer.cpp
index 33989f4..8e0c211 100644
--- a/zimwriterfs/indexer.cpp
+++ b/zimwriterfs/indexer.cpp
@@ -57,18 +57,6 @@
   Indexer::~Indexer() {
   }
 
-  /* Read the stopwords */
-  void Indexer::readStopWords(const string languageCode) {
-std::string stopWord;
-std::istringstream file(getResourceAsString("stopwords/" + languageCode));
-
-this->stopWords.clear();
-
-while (getline(file, stopWord, '\n')) {
-  this->stopWords.push_back(stopWord);
-}
-  }
-
   /* Article indexer methods */
   void *Indexer::indexArticles(void *ptr) {
 pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
diff --git a/zimwriterfs/indexer.h b/zimwriterfs/indexer.h
index 686d156..797db7a 100644
--- a/zimwriterfs/indexer.h
+++ b/zimwriterfs/indexer.h
@@ -68,10 +68,6 @@
 virtual void flush() = 0;
 virtual void indexingPostlude() = 0;
 
-/* Stop words */
-std::vector stopWords;
-void readStopWords(const string languageCode);
-
 /* Others */
 unsigned int countWords(const string );
 
diff --git a/zimwriterfs/xapianIndexer.cpp b/zimwriterfs/xapianIndexer.cpp
index 5abec26..c4c0b2e 100644
--- a/zimwriterfs/xapianIndexer.cpp
+++ b/zimwriterfs/xapianIndexer.cpp
@@ -18,13 +18,13 @@
  */
 
 #include "xapianIndexer.h"
+#include "resourceTools.h"
 
 /* Constructor */
 XapianIndexer::XapianIndexer(const std::string& language, const bool verbose) :
 language(language)
 {
 setVerboseFlag(verbose);
-readStopWords(language);
 
 /* Build ICU Local object to retrieve ISO-639 language code (from
ISO-639-3) */
@@ -38,6 +38,17 @@
 } catch (...) {
 std::cout << "No steemming for language '" << languageLocale.getLanguage() << "'" << std::endl;
 }
+
+ /* Read the stopwords */
+std::string stopWord;
+this->stopwords = getResourceAsString("stopwords/"+language);
+std::istringstream file(this->stopwords);
+while (std::getline(file, stopWord, '\n')) {
+this->stopper.add(stopWord);
+}
+
+this->indexer.set_stopper(&(this->stopper));
+this->indexer.set_stopper_strategy(Xapian::TermGenerator::STOP_ALL);
 }
 
 XapianIndexer::~XapianIndexer(){
@@ -56,18 +67,8 @@
 this->writableDatabase = Xapian::WritableDatabase(indexPath + ".tmp", 
Xapian::DB_CREATE_OR_OVERWRITE);
 this->writableDatabase.set_metadata("valuesmap", "title:0;wordcount:1");
 this->writableDatabase.set_metadata("language", language);
+this->writableDatabase.set_metadata("stopwords", stopwords);
 this->writableDatabase.begin_transaction(true);
-
-/* Insert the stopwords */
-if (!this->stopWords.empty()) {
-  std::vector::iterator it = this->stopWords.begin();
-  for( ; it != this->stopWords.end(); ++it) {
-   this->stopper.add(*it);
-  }
-
-  this->indexer.set_stopper(&(this->stopper));
-  this->indexer.set_stopper_strategy(Xapian::TermGenerator::STOP_ALL);
-}
 }
 
 void XapianIndexer::index(const string ,
diff --git a/zimwriterfs/xapianIndexer.h b/zimwriterfs/xapianIndexer.h
index 8f85337..510692d 100644
--- a/zimwriterfs/xapianIndexer.h
+++ b/zimwriterfs/xapianIndexer.h
@@ -74,6 +74,7 @@
 Xapian::TermGenerator indexer;
 std::string indexPath;
 std::string language;
+std::string stopwords;
 };
 
 #endif // OPENZIM_ZIMWRITERFS_XAPIANINDEXER_H

-- 
To view, visit https://gerrit.wikimedia.org/r/344963

Gerrit-MessageType: merged
Gerrit-Change-Id: I6cbc9f8d30c39d4fc1e65a356347d8fbfd456494
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 


[MediaWiki-commits] [Gerrit] openzim[master]: Store the language used by the stemmer in the database.

2017-03-27 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/344962)

Change subject: Store the language used by the stemmer in the database.
..


Store the language used by the stemmer in the database.

To search the database correctly, a user needs to use the same stemming
algorithm/data as the one used while indexing the content.

By storing the language in the database, user code can create the same
stemmer on its side and parse queries correctly.

Change-Id: Idb7049f3639d4e96f50ca1af6bc491096ec2d52f
---
M zimwriterfs/xapianIndexer.cpp
M zimwriterfs/xapianIndexer.h
2 files changed, 10 insertions(+), 6 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/xapianIndexer.cpp b/zimwriterfs/xapianIndexer.cpp
index db27f9d..5abec26 100644
--- a/zimwriterfs/xapianIndexer.cpp
+++ b/zimwriterfs/xapianIndexer.cpp
@@ -20,21 +20,23 @@
 #include "xapianIndexer.h"
 
 /* Constructor */
-XapianIndexer::XapianIndexer(const std::string& language, const bool verbose) {
+XapianIndexer::XapianIndexer(const std::string& language, const bool verbose) :
+language(language)
+{
 setVerboseFlag(verbose);
 readStopWords(language);
 
 /* Build ICU Local object to retrieve ISO-639 language code (from
ISO-639-3) */
-icu::Locale *languageLocale = new icu::Locale(language.c_str());
+icu::Locale languageLocale(language.c_str());
 
 /* Configuring language base steemming */
 try {
-  this->stemmer = Xapian::Stem(languageLocale->getLanguage());
-  this->indexer.set_stemmer(this->stemmer);
-  this->indexer.set_stemming_strategy(Xapian::TermGenerator::STEM_ALL);
+this->stemmer = Xapian::Stem(languageLocale.getLanguage());
+this->indexer.set_stemmer(this->stemmer);
+this->indexer.set_stemming_strategy(Xapian::TermGenerator::STEM_ALL);
 } catch (...) {
-  std::cout << "No steemming for language '" << languageLocale->getLanguage() << "'" << std::endl;
+std::cout << "No steemming for language '" << languageLocale.getLanguage() << "'" << std::endl;
 }
 }
 
@@ -53,6 +55,7 @@
 indexPath = indexPath_;
 this->writableDatabase = Xapian::WritableDatabase(indexPath + ".tmp", 
Xapian::DB_CREATE_OR_OVERWRITE);
 this->writableDatabase.set_metadata("valuesmap", "title:0;wordcount:1");
+this->writableDatabase.set_metadata("language", language);
 this->writableDatabase.begin_transaction(true);
 
 /* Insert the stopwords */
diff --git a/zimwriterfs/xapianIndexer.h b/zimwriterfs/xapianIndexer.h
index 16dc094..8f85337 100644
--- a/zimwriterfs/xapianIndexer.h
+++ b/zimwriterfs/xapianIndexer.h
@@ -73,6 +73,7 @@
 Xapian::SimpleStopper stopper;
 Xapian::TermGenerator indexer;
 std::string indexPath;
+std::string language;
 };
 
 #endif // OPENZIM_ZIMWRITERFS_XAPIANINDEXER_H

-- 
To view, visit https://gerrit.wikimedia.org/r/344962

Gerrit-MessageType: merged
Gerrit-Change-Id: Idb7049f3639d4e96f50ca1af6bc491096ec2d52f
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 


[MediaWiki-commits] [Gerrit] openzim[master]: Actually call the (previously-unused) ArticleSource#setFilen...

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/296632)

Change subject: Actually call the (previously-unused) ArticleSource#setFilename method.
..


Actually call the (previously-unused) ArticleSource#setFilename method.

Change-Id: I00d340f86c91419f1237976f6eb636ea8c32a743
---
M zimlib/src/zimcreator.cpp
1 file changed, 1 insertion(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/src/zimcreator.cpp b/zimlib/src/zimcreator.cpp
index 9c024a7..1b528f4 100644
--- a/zimlib/src/zimcreator.cpp
+++ b/zimlib/src/zimcreator.cpp
@@ -110,6 +110,7 @@
  ? fname.substr(0, fname.size() - 4)
  : fname;
   log_debug("basename " << basename);
+  src.setFilename(fname);
 
   INFO("create directory entries");
   createDirentsAndClusters(src, basename + ".tmp");

-- 
To view, visit https://gerrit.wikimedia.org/r/296632

Gerrit-MessageType: merged
Gerrit-Change-Id: I00d340f86c91419f1237976f6eb636ea8c32a743
Gerrit-PatchSet: 2
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: C. Scott Ananian 
Gerrit-Reviewer: Kelson 
Gerrit-Reviewer: Mgautierfr 


[MediaWiki-commits] [Gerrit] openzim[master]: Bug fix: don't store pointers inside a dynamic vector.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/296644)

Change subject: Bug fix: don't store pointers inside a dynamic vector.
..


Bug fix: don't store pointers inside a dynamic vector.

Use offsets instead, since the actual objects change location whenever
the std::vector resizes its internal storage.

This is a follow-up to f5de40f94b30795f42bb9388cbb46df9cd605167.
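The failure mode can be reproduced in a few lines. A `Dirent*` taken from `dirents.back()` may dangle as soon as a later `push_back()` reallocates the vector. An index stays valid, because it is resolved against the current storage at the point of use. The types below are simplified, hypothetical stand-ins for the zimcreator internals, not the real `zim::writer` classes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a writer Dirent: a URL and a cluster number.
struct Dirent {
  std::string url;
  unsigned cluster;
};

// Tracks the dirents of the cluster being filled, by index into `dirents`.
struct ClusterTracker {
  std::vector<Dirent> dirents;
  std::vector<std::size_t> pending;  // indices into `dirents`

  void add(const std::string& url, unsigned cluster) {
    dirents.push_back(Dirent{url, cluster});  // may reallocate `dirents`
    pending.push_back(dirents.size() - 1);    // an index survives that
  }

  // Re-number every pending dirent, e.g. once a cluster has been written out.
  void renumber(unsigned newCluster) {
    for (std::size_t idx : pending) {
      Dirent& d = dirents[idx];  // resolved now, against the current storage
      d.cluster = newCluster;
    }
  }
};
```

Adding 1000 entries forces several reallocations, yet `renumber()` still reaches every entry through its index.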

Change-Id: I166aa8dd209dd2755e68be70829f269a71a3aaca
---
M zimlib/include/zim/writer/zimcreator.h
M zimlib/src/zimcreator.cpp
2 files changed, 10 insertions(+), 8 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/writer/zimcreator.h b/zimlib/include/zim/writer/zimcreator.h
index 2e52d7e..8ce2c44 100644
--- a/zimlib/include/zim/writer/zimcreator.h
+++ b/zimlib/include/zim/writer/zimcreator.h
@@ -33,7 +33,7 @@
 {
   public:
 typedef std::vector DirentsType;
-typedef std::vector DirentPtrsType;
+typedef std::vector DirentPtrsType;
 typedef std::vector SizeVectorType;
 typedef std::vector OffsetsType;
 typedef std::map MimeTypes;
diff --git a/zimlib/src/zimcreator.cpp b/zimlib/src/zimcreator.cpp
index 46c550f..9c024a7 100644
--- a/zimlib/src/zimcreator.cpp
+++ b/zimlib/src/zimcreator.cpp
@@ -232,7 +232,7 @@
 }
 dirents.back().setCluster(clusterOffsets.size(), cluster->count());
 cluster->addBlob(blob);
-myDirents->push_back(&(dirents.back()));
+myDirents->push_back(dirents.size()-1);
 
 // If cluster is now large enough, write it to disk.
 if (cluster->size() >= minChunkSize * 1024)
@@ -247,10 +247,11 @@
   cluster->clear();
   myDirents->clear();
   // Update the cluster number of the dirents *not* written to disk.
-  for (DirentPtrsType::iterator di = otherDirents->begin();
-   di != otherDirents->end(); ++di)
+  for (DirentPtrsType::iterator dpi = otherDirents->begin();
+   dpi != otherDirents->end(); ++dpi)
   {
-(*di)->setCluster(clusterOffsets.size(), (*di)->getBlobNumber());
+Dirent *di = &dirents[*dpi];
+di->setCluster(clusterOffsets.size(), di->getBlobNumber());
   }
   offset_type end = out.tellp();
   currentSize += (end - start) +
@@ -263,10 +264,11 @@
   {
 clusterOffsets.push_back(out.tellp());
 out << compCluster;
-for (DirentPtrsType::iterator di = uncompDirents.begin();
- di != uncompDirents.end(); ++di)
+for (DirentPtrsType::iterator dpi = uncompDirents.begin();
+ dpi != uncompDirents.end(); ++dpi)
 {
-  (*di)->setCluster(clusterOffsets.size(), (*di)->getBlobNumber());
+  Dirent *di = &dirents[*dpi];
+  di->setCluster(clusterOffsets.size(), di->getBlobNumber());
 }
   }
   compCluster.clear();

-- 
To view, visit https://gerrit.wikimedia.org/r/296644

Gerrit-MessageType: merged
Gerrit-Change-Id: I166aa8dd209dd2755e68be70829f269a71a3aaca
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: C. Scott Ananian 
Gerrit-Reviewer: Kelson 
Gerrit-Reviewer: Mgautierfr 


[MediaWiki-commits] [Gerrit] openzim[master]: Move article's related stuffs in article.(h|cpp).

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/295516)

Change subject: Move article's related stuffs in article.(h|cpp).
..


Move article's related stuffs in article.(h|cpp).

Change-Id: I2a257ea1a0a13eca0748b444838a525666a9090d
---
M zimwriterfs/Makefile.am
A zimwriterfs/article.cpp
A zimwriterfs/article.h
M zimwriterfs/zimwriterfs.cpp
4 files changed, 253 insertions(+), 199 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/Makefile.am b/zimwriterfs/Makefile.am
index ea2ab7a..3383e35 100644
--- a/zimwriterfs/Makefile.am
+++ b/zimwriterfs/Makefile.am
@@ -3,4 +3,5 @@
 
 zimwriterfs_SOURCES= \
 zimwriterfs.cpp \
-tools.cpp
+tools.cpp \
+article.cpp
diff --git a/zimwriterfs/article.cpp b/zimwriterfs/article.cpp
new file mode 100644
index 000..f743cde
--- /dev/null
+++ b/zimwriterfs/article.cpp
@@ -0,0 +1,158 @@
+/*
+ * Copyright 2013-2016 Emmanuel Engelhart 
+ * Copyright 2016 Matthieu Gautier 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU  General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+ * MA 02110-1301, USA.
+ */
+
+#include "article.h"
+#include "tools.h"
+
+
+extern std::string directoryPath;
+
+Article::Article(const std::string& path, const bool detectRedirects) {
+  invalid = false;
+
+  /* aid */
+  aid = path.substr(directoryPath.size()+1);
+
+  /* url */
+  url = aid;
+
+  /* mime-type */
+  mimeType = getMimeTypeForFile(aid);
+  
+  /* namespace */
+  ns = getNamespaceForMimeType(mimeType)[0];
+
+  /* HTML specific code */
+  if (mimeType.find("text/html") != std::string::npos) {
+std::size_t found;
+std::string html = getFileContent(path);
+GumboOutput* output = gumbo_parse(html.c_str());
+GumboNode* root = output->root;
+
+/* Search the content of the  tag in the HTML */
+if (root->type == GUMBO_NODE_ELEMENT && root->v.element.children.length >= 2) {
+  const GumboVector* root_children = &root->v.element.children;
+  GumboNode* head = NULL;
+  for (int i = 0; i < root_children->length; ++i) {
+   GumboNode* child = (GumboNode*)(root_children->data[i]);
+   if (child->type == GUMBO_NODE_ELEMENT &&
+   child->v.element.tag == GUMBO_TAG_HEAD) {
+ head = child;
+ break;
+   }
+  }
+
+  if (head != NULL) {
+   GumboVector* head_children = &head->v.element.children;
+   for (int i = 0; i < head_children->length; ++i) {
+ GumboNode* child = (GumboNode*)(head_children->data[i]);
+ if (child->type == GUMBO_NODE_ELEMENT &&
+ child->v.element.tag == GUMBO_TAG_TITLE) {
+   if (child->v.element.children.length == 1) {
+ GumboNode* title_text = (GumboNode*)(child->v.element.children.data[0]);
+ if (title_text->type == GUMBO_NODE_TEXT) {
+   title = title_text->v.text.text;
+ }
+   }
+ }
+   }
+
+   /* Detect if this is a redirection (if no redirects CSV specified) */
+   std::string targetUrl;
+   try {
+ targetUrl = detectRedirects ? extractRedirectUrlFromHtml(head_children) : "";
+   } catch (std::string& error) {
+ std::cerr << error << std::endl;
+   }
+   if (!targetUrl.empty()) {
+ redirectAid = computeAbsolutePath(aid, decodeUrl(targetUrl));
+ if (!fileExists(directoryPath + "/" + redirectAid)) {
+   redirectAid.clear();
+   invalid = true;
+ }
+   }
+  }
+
+  /* If no title, then compute one from the filename */
+  if (title.empty()) {
+   found = path.rfind("/");
+   if (found != std::string::npos) {
+ title = path.substr(found+1);
+ found = title.rfind(".");
+ if (found!=std::string::npos) {
+   title = title.substr(0, found);
+ }
+   } else {
+ title = path;
+   }
+   std::replace(title.begin(), title.end(), '_',  ' ');
+  }
+}
+
+gumbo_destroy_output(&kGumboDefaultOptions, output);
+  }
+}
+
+std::string Article::getAid() const
+{
+  return aid;
+}
+
+bool Article::isInvalid() const
+{
+  return invalid;
+}
+
+char Article::getNamespace() const
+{
+  return ns;
+}
+
+std::string Article::getUrl() const
+{
+  return url;
+}
+
+std::string 

[MediaWiki-commits] [Gerrit] openzim[master]: Port zimwriterfs to the new API.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/296911)

Change subject: Port zimwriterfs to the new API.
..


Port zimwriterfs to the new API.

No more ArticleSource::getData.

Change-Id: I76cd6f3e7e4a390ed6a58cf9815dda2a2f1bfde5
---
M zimwriterfs/Makefile.am
M zimwriterfs/article.cpp
M zimwriterfs/article.h
M zimwriterfs/articlesource.cpp
M zimwriterfs/articlesource.h
A zimwriterfs/mimetypecounter.cpp
A zimwriterfs/mimetypecounter.h
M zimwriterfs/zimwriterfs.cpp
8 files changed, 376 insertions(+), 248 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/Makefile.am b/zimwriterfs/Makefile.am
index 6e46553..92641d9 100644
--- a/zimwriterfs/Makefile.am
+++ b/zimwriterfs/Makefile.am
@@ -5,4 +5,5 @@
 zimwriterfs.cpp \
 tools.cpp \
 article.cpp \
-articlesource.cpp
+articlesource.cpp \
+mimetypecounter.cpp
diff --git a/zimwriterfs/article.cpp b/zimwriterfs/article.cpp
index 98ec882..3840f7b 100644
--- a/zimwriterfs/article.cpp
+++ b/zimwriterfs/article.cpp
@@ -21,11 +21,61 @@
 #include "article.h"
 #include "tools.h"
 
+#include 
+#include 
+
 
 extern std::string directoryPath;
 
-Article::Article(ArticleSource* source, const std::string& path, const bool detectRedirects):
-source(source)
+std::string Article::getAid() const
+{
+  return aid;
+}
+
+bool Article::isInvalid() const
+{
+  return invalid;
+}
+
+char Article::getNamespace() const
+{
+  return ns;
+}
+
+std::string Article::getUrl() const
+{
+  return url;
+}
+
+std::string Article::getTitle() const
+{
+  return title;
+}
+
+bool Article::isRedirect() const
+{
+  return !redirectAid.empty();
+}
+
+std::string Article::getMimeType() const
+{
+  return mimeType;
+}
+
+std::string Article::getRedirectAid() const
+{
+  return redirectAid;
+}
+
+bool Article::shouldCompress() const {
+  return (getMimeType().find("text") == 0 ||
+ getMimeType() == "application/javascript" ||
+ getMimeType() == "application/json" ||
+  getMimeType() == "image/svg+xml" ? true : false);
+}
+
+FileArticle::FileArticle(const std::string& path, const bool detectRedirects):
+dataRead(false)
 {
   invalid = false;
 
@@ -109,57 +159,125 @@
   }
 }
 
+/* Update links in the html to let them still be valid */
+std::map links;
+getLinks(root, links);
+std::map::iterator it;
+
+/* If a link appears several times in the HTML, it
+   occurs only once in the links variable */
+for(it = links.begin(); it != links.end(); it++) {
+  if (!it->first.empty()
+&& it->first[0] != '#'
+&& it->first[0] != '?'
+&& it->first.substr(0, 5) != "data:") {
+replaceStringInPlace(html, "\"" + it->first + "\"", "\"" + computeNewUrl(aid, it->first) + "\"");
+  }
+}
+
+data = html;
+dataRead = true;
+
 gumbo_destroy_output(&kGumboDefaultOptions, output);
   }
 }
 
-std::string Article::getAid() const
+zim::Blob FileArticle::getData() const {
+if ( dataRead )
+return zim::Blob(data.data(), data.size());
+
+std::string aidPath = directoryPath + "/" + aid;
+std::string fileContent = getFileContent(aidPath);
+
+if (getMimeType().find("text/css") == 0) {
+/* Rewrite url() values in the CSS */
+size_t startPos = 0;
+size_t endPos = 0;
+std::string url;
+
+while ((startPos = fileContent.find("url(", endPos)) && startPos != std::string::npos) {
+/* URL delimiters */
+endPos = fileContent.find(")", startPos);
+startPos = startPos + (fileContent[startPos+4] == '\'' || fileContent[startPos+4] == '"' ? 5 : 4);
+endPos = endPos - (fileContent[endPos-1] == '\'' || fileContent[endPos-1] == '"' ? 1 : 0);
+url = fileContent.substr(startPos, endPos - startPos);
+std::string startDelimiter = fileContent.substr(startPos-1, 1);
+std::string endDelimiter = fileContent.substr(endPos, 1);
+
+if (url.substr(0, 5) != "data:") {
+/* Deal with URLs with arguments (using '?') */
+std::string path = url;
+size_t markPos = url.find("?");
+if (markPos != std::string::npos) {
+path = url.substr(0, markPos);
+}
+
+/* Embedded fonts need to be inlined because Kiwix is
+   otherwise not able to load them because of the
+   same-origin security policy */
+std::string mimeType = getMimeTypeForFile(path);
+if ( mimeType == "application/font-ttf"
+  || mimeType == "application/font-woff"
+  || mimeType == "application/vnd.ms-opentype"
+  || mimeType == "application/vnd.ms-fontobject") {
+try {
+ 

[MediaWiki-commits] [Gerrit] openzim[master]: Move few utility functions to a separate module.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/295515 )

Change subject: Move few utility functions to a separate module.
..


Move few utility functions to a separate module.

Change-Id: Ia26754d14f0cd6c557626675beb5a7c5fe2cadaa
---
M zimwriterfs/Makefile.am
A zimwriterfs/tools.cpp
A zimwriterfs/tools.h
M zimwriterfs/zimwriterfs.cpp
4 files changed, 578 insertions(+), 481 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/Makefile.am b/zimwriterfs/Makefile.am
index 0caa3c8..ea2ab7a 100644
--- a/zimwriterfs/Makefile.am
+++ b/zimwriterfs/Makefile.am
@@ -1,3 +1,6 @@
 AUTOMAKE_OPTIONS=subdir-objects
 bin_PROGRAMS=zimwriterfs
-zimwriterfs_SOURCES=zimwriterfs.cpp
+
+zimwriterfs_SOURCES= \
+zimwriterfs.cpp \
+tools.cpp
diff --git a/zimwriterfs/tools.cpp b/zimwriterfs/tools.cpp
new file mode 100644
index 000..019b22c
--- /dev/null
+++ b/zimwriterfs/tools.cpp
@@ -0,0 +1,525 @@
+/*
+ * Copyright 2013-2016 Emmanuel Engelhart 
+ * Copyright 2016 Matthieu Gautier 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU  General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+ * MA 02110-1301, USA.
+ */
+
+#include "tools.h"
+
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef _WIN32
+#define SEPARATOR "\\"
+#else
+#define SEPARATOR "/"
+#endif
+
+
+/* Init file extensions hash */
+static std::map<std::string, std::string> _create_extMimeTypes(){
+  std::map<std::string, std::string> extMimeTypes;
+  extMimeTypes["HTML"] = "text/html";
+  extMimeTypes["html"] = "text/html";
+  extMimeTypes["HTM"] = "text/html";
+  extMimeTypes["htm"] = "text/html";
+  extMimeTypes["PNG"] = "image/png";
+  extMimeTypes["png"] = "image/png";
+  extMimeTypes["TIFF"] = "image/tiff";
+  extMimeTypes["tiff"] = "image/tiff";
+  extMimeTypes["TIF"] = "image/tiff";
+  extMimeTypes["tif"] = "image/tiff";
+  extMimeTypes["JPEG"] = "image/jpeg";
+  extMimeTypes["jpeg"] = "image/jpeg";
+  extMimeTypes["JPG"] = "image/jpeg";
+  extMimeTypes["jpg"] = "image/jpeg";
+  extMimeTypes["GIF"] = "image/gif";
+  extMimeTypes["gif"] = "image/gif";
+  extMimeTypes["SVG"] = "image/svg+xml";
+  extMimeTypes["svg"] = "image/svg+xml";
+  extMimeTypes["TXT"] = "text/plain";
+  extMimeTypes["txt"] = "text/plain";
+  extMimeTypes["XML"] = "text/xml";
+  extMimeTypes["xml"] = "text/xml";
+  extMimeTypes["EPUB"] = "application/epub+zip";
+  extMimeTypes["epub"] = "application/epub+zip";
+  extMimeTypes["PDF"] = "application/pdf";
+  extMimeTypes["pdf"] = "application/pdf";
+  extMimeTypes["OGG"] = "application/ogg";
+  extMimeTypes["ogg"] = "application/ogg";
+  extMimeTypes["JS"] = "application/javascript";
+  extMimeTypes["js"] = "application/javascript";
+  extMimeTypes["JSON"] = "application/json";
+  extMimeTypes["json"] = "application/json";
+  extMimeTypes["CSS"] = "text/css";
+  extMimeTypes["css"] = "text/css";
+  extMimeTypes["otf"] = "application/vnd.ms-opentype";
+  extMimeTypes["OTF"] = "application/vnd.ms-opentype";
+  extMimeTypes["eot"] = "application/vnd.ms-fontobject";
+  extMimeTypes["EOT"] = "application/vnd.ms-fontobject";
+  extMimeTypes["ttf"] = "application/font-ttf";
+  extMimeTypes["TTF"] = "application/font-ttf";
+  extMimeTypes["woff"] = "application/font-woff";
+  extMimeTypes["WOFF"] = "application/font-woff";
+  extMimeTypes["vtt"] = "text/vtt";
+  extMimeTypes["VTT"] = "text/vtt";
+  
+  return extMimeTypes;
+}
+
+static std::map<std::string, std::string> extMimeTypes = _create_extMimeTypes();
+
+static std::map fileMimeTypes;
+
+
+extern std::string directoryPath;
+extern bool inflateHtmlFlag;
+extern bool uniqueNamespace;
+extern magic_t magic;
+
+/* Decompress an STL string using zlib and return the original data. */
+inline std::string inflateString(const std::string& str) {
+  z_stream zs; // z_stream is zlib's control structure
+  memset(&zs, 0, sizeof(zs));
+
+  if (inflateInit(&zs) != Z_OK)
+throw(std::runtime_error("inflateInit failed while decompressing."));
+
+  zs.next_in = (Bytef*)str.data();
+  zs.avail_in = str.size();
+
+  int ret;
+  char outbuffer[32768];
+  std::string outstring;
+
+  // get the decompressed bytes blockwise using repeated 

[MediaWiki-commits] [Gerrit] openzim[master]: Add xapian indexer.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/296913 )

Change subject: Add xapian indexer.
..


Add xapian indexer.

Xapian is optional.
Build a full-text index inside the ZIM file by adding "-i" or "--createFullTextIndex"
to the zimwriterfs command line.

Change-Id: I52c255e8335d0b6763c1c59eeb1549300d5f6f81
---
M zimwriterfs/Makefile.am
M zimwriterfs/configure.ac
M zimwriterfs/tools.cpp
M zimwriterfs/tools.h
A zimwriterfs/xapian/htmlparse.cc
A zimwriterfs/xapian/htmlparse.h
A zimwriterfs/xapian/myhtmlparse.cc
A zimwriterfs/xapian/myhtmlparse.h
A zimwriterfs/xapian/namedentities.h
A zimwriterfs/xapianIndexer.cpp
A zimwriterfs/xapianIndexer.h
M zimwriterfs/zimwriterfs.cpp
12 files changed, 1,490 insertions(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/Makefile.am b/zimwriterfs/Makefile.am
index 628b74c..1d40174 100644
--- a/zimwriterfs/Makefile.am
+++ b/zimwriterfs/Makefile.am
@@ -10,3 +10,15 @@
 resourceTools.cpp \
 pathTools.cpp \
 mimetypecounter.cpp
+
+zimwriterfs_CXXFLAGS = $(ICU_CFLAGS)
+zimwriterfs_LDFLAGS = $(ICU_LDFLAGS)
+
+if HAVE_XAPIAN
+zimwriterfs_CXXFLAGS += $(XAPIAN_CFLAGS)
+zimwriterfs_LDFLAGS += $(XAPIAN_LDFLAGS)
+zimwriterfs_SOURCES += \
+xapianIndexer.cpp \
+xapian/myhtmlparse.cc \
+xapian/htmlparse.cc
+endif
diff --git a/zimwriterfs/configure.ac b/zimwriterfs/configure.ac
index fb12c8f..795d3b1 100644
--- a/zimwriterfs/configure.ac
+++ b/zimwriterfs/configure.ac
@@ -71,6 +71,121 @@
 AC_DEFINE_UNQUOTED(LZMA_MEMORY_SIZE, 128, [set lzma uncompress memory size to number of MB])
 AC_DEFINE(ENABLE_LZMA, [1], [defined if lzma compression is enabled])
 
+
+function findLibrary {
+   found=0
+   for f in $(echo $LIBS_ROOT|tr ":" "\n") ; do
+   sf=`find $f -name $1 | grep $ARCH | head -1 2> /dev/null`
+   if [[ -f "$sf" -a $found -eq 0 ]]
+   then
+   found=1
+   echo $sf
+   fi
+   done
+   if [[ $found -eq 0 ]]
+   then
+   for f in $(echo $LIBS_ROOT|tr ":" "\n") ; do
+   sf=`find $f -name $1 | head -1 2> /dev/null`
+   if [[ -f "$sf" -a $found -eq 0 ]]
+   then
+   found=1
+   echo $sf
+   fi
+   done
+   fi
+   if [[ $found -eq 0 ]]
+   then
+   echo "no"
+   fi
+}
+
+
+
+ ICU
+
+
+
+ICU_CFLAGS=""
+ICU_LDFLAGS="-licui18n -licuuc -licudata" # replaced by icu-config
+ICU_STATIC_LDFLAGS=""
+
+# if --with-x, add path to LIBRARY_PATH
+AC_ARG_WITH(icu,
+AC_HELP_STRING([--with-icu=DIR], [alternate location for icu-config]),
+export LIBRARY_PATH="${withval}:${LIBRARY_PATH}";ICU_PATH=${withval}
+   )
+
+# look for shared library.
+# AC_CHECK_HEADER([zlib.h],, [AC_MSG_ERROR([[cannot find zlib header]])])
+# AC_CHECK_LIB([z], [zlibVersion],, [AC_MSG_ERROR([[cannot find zlib]]);COMPILE_ICU=1])
+# ICU_FILES=`findLibrary "libicuuc.${SHARED_EXT}"`
+
+AC_CHECK_TOOL(HAVE_ICU_CONFIG, icu-config,, "${ICU_PATH}:${PATH}")
+if test [ ! "$HAVE_ICU_CONFIG" ]
+then
+ AC_MSG_ERROR([[cannot find icu-config]])
+else
+OLDPATH=$PATH
+PATH="${ICU_PATH}:${PATH}"
+ICU_CFLAGS=`icu-config --cxxflags`;
+ICU_LDFLAGS=`icu-config --ldflags`;
+ICU_VER=`icu-config --version`;
+ICU_FILES="`findLibrary "libicuuc.${SHARED_EXT}"` `findLibrary "libicudata.${SHARED_EXT}"` `findLibrary "libicui18n.${SHARED_EXT}"`"
+PATH=$OLDPATH
+if [[ $ICU_VER \< "4.2" ]]
+   then
+AC_MSG_ERROR([[You need a version of libicu >= 4.2]])
+   fi
+fi
+
+
+AC_SUBST(ICU_CFLAGS)
+AC_SUBST(ICU_LDFLAGS)
+AC_SUBST(ICU_STATIC_LDFLAGS)
+AC_SUBST(ICU_FILES)
+AC_SUBST(COMPILED_ICUDATA_DAT)
+
+
+ XAPIAN
+
+
+XAPIAN_CFLAGS=""
+XAPIAN_LDFLAGS=""
+XAPIAN_STATIC_LDFLAGS=""
+XAPIAN_ENABLE=0
+
+# if --with-x, add path to LIBRARY_PATH
+AC_ARG_WITH([xapian],
+   [AS_HELP_STRING([--with-xapian=DIR], [alternate location for xapian-config] @@)],
+   [xapian_dir=$withval],
+   [with_xapian=yes])
+
+
+AS_IF([test "x$with_xapian" == xno],
+[AM_CONDITIONAL(HAVE_XAPIAN, false)],
+   [OLDPATH=$PATH
+AS_IF([test "x$with_xapian" != xyes],
+  PATH="$with_xapian:$PATH")
+AC_CHECK_TOOLS(XAPIAN_CONFIG, xapian-config-1.3, xapian-config,[],$PATH)
+AS_IF([test "x$XAPIAN_CONFIG" == x ],
+   AC_MSG_ERROR([[cannot find xapian-config file]])
+ )
+XAPIAN_VERSION=`$XAPIAN_CONFIG --version`
+

[MediaWiki-commits] [Gerrit] openzim[master]: Revert "Use explicit LZMA_STREAM_INIT initializer, instead o...

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/296752 )

Change subject: Revert "Use explicit LZMA_STREAM_INIT initializer, instead of memset."
..


Revert "Use explicit LZMA_STREAM_INIT initializer, instead of memset."

This reverts commit 498539d869bbb9add9e795432520f305523e09bf.

Turns out that the clang compiler doesn't like this form of initializer:
it must be a GCC extension of some kind.  There's nothing technically
wrong with the previous `memset` way of initializing the variable,
it just looks kind of gross.  But functionality trumps aesthetics.
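A minimal sketch of the two initialization styles discussed above. It assumes a hypothetical C-style struct `FakeStream` in place of liblzma's `lzma_stream`, so it compiles without liblzma. `MemsetHolder` mirrors the restored approach, which zeroes the bytes in the constructor body. `ValueInitHolder` shows a possible alternative that is not the one chosen in this change: it value-initializes the member with `stream()` in the initializer list, which zero-initializes a plain aggregate in standard C++ without relying on an initializer macro.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a plain C struct such as liblzma's lzma_stream.
struct FakeStream {
  const std::uint8_t* next_in;
  std::size_t avail_in;
  std::uint64_t total_in;
  void* internal;
};

// The style restored by the revert: zero every byte in the constructor body.
class MemsetHolder {
 public:
  MemsetHolder() { std::memset(&stream, 0, sizeof(stream)); }
  FakeStream stream;
};

// An alternative: value-initialize the aggregate in the initializer list.
// `stream()` zero-initializes every member (pointers become null) and does
// not depend on any INIT macro.
class ValueInitHolder {
 public:
  ValueInitHolder() : stream() {}
  FakeStream stream;
};

// Returns true when every field of the stream is zero / null.
inline bool isZeroed(const FakeStream& s) {
  return s.next_in == nullptr && s.avail_in == 0 && s.total_in == 0 &&
         s.internal == nullptr;
}
```

Both holders should yield a zeroed stream. The memset version assumes that all-zero bytes represent a null pointer. This is true on common platforms, but the standard does not guarantee it.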

Change-Id: I1d5a12c6eb40706023f81323c04a356da3356f42
---
M zimlib/src/lzmastream.cpp
1 file changed, 3 insertions(+), 2 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/src/lzmastream.cpp b/zimlib/src/lzmastream.cpp
index bd02bd8..d880933 100644
--- a/zimlib/src/lzmastream.cpp
+++ b/zimlib/src/lzmastream.cpp
@@ -58,10 +58,11 @@
   }
 
   LzmaStreamBuf::LzmaStreamBuf(std::streambuf* sink_, uint32_t preset, lzma_check check, unsigned bufsize_)
-: stream(LZMA_STREAM_INIT),
-  obuffer(bufsize_),
+: obuffer(bufsize_),
   sink(sink_)
   {
+std::memset(reinterpret_cast<void*>(&stream), 0, sizeof(stream));
+
 checkError(
   ::lzma_easy_encoder(&stream, preset, check));
 

-- 
To view, visit https://gerrit.wikimedia.org/r/296752

Gerrit-MessageType: merged
Gerrit-Change-Id: I1d5a12c6eb40706023f81323c04a356da3356f42
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: C. Scott Ananian 
Gerrit-Reviewer: Kelson 


[MediaWiki-commits] [Gerrit] openzim[master]: Revert compatibility with previous API.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/296910 )

Change subject: Revert compatibility with previous API.
..


Revert compatibility with previous API.

The API has changed and its behavior is different, so do not try to keep
a false API compatibility.
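A minimal sketch of what making `getData()` pure virtual means for writers. It uses hypothetical stand-ins (`Blob`, `Article`, `StringArticle`) rather than the real zimlib classes. A subclass that forgets to implement `getData()` can no longer be instantiated, so the mistake surfaces at compile time instead of through a deprecation warning or a "This should not be called" exception at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal stand-in for zim::Blob: a non-owning view of bytes.
class Blob {
 public:
  Blob(const char* data, std::size_t size) : data_(data), size_(size) {}
  const char* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  const char* data_;
  std::size_t size_;
};

// Hypothetical writer-side interface: every concrete article must supply
// its own data, with no fallback through the article source.
class Article {
 public:
  virtual ~Article() {}
  virtual std::string getAid() const = 0;
  virtual Blob getData() const = 0;
};

// Hypothetical concrete article that owns its content in memory.
class StringArticle : public Article {
 public:
  StringArticle(const std::string& aid, const std::string& content)
      : aid_(aid), content_(content) {}
  std::string getAid() const override { return aid_; }
  // The returned Blob points into content_, so it is valid while *this lives.
  Blob getData() const override { return Blob(content_.data(), content_.size()); }

 private:
  std::string aid_;
  std::string content_;
};
```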

Change-Id: I174f7df03d5ad5477c6b0fd1869258a31af3366a
---
M zimlib/include/zim/writer/articlesource.h
M zimlib/src/articlesource.cpp
M zimlib/src/zimcreator.cpp
M zimwriterfs/article.cpp
M zimwriterfs/article.h
M zimwriterfs/articlesource.cpp
6 files changed, 15 insertions(+), 59 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/writer/articlesource.h b/zimlib/include/zim/writer/articlesource.h
index dbe9263..a9ecffb 100644
--- a/zimlib/include/zim/writer/articlesource.h
+++ b/zimlib/include/zim/writer/articlesource.h
@@ -46,26 +46,10 @@
 virtual bool shouldCompress() const;
 virtual std::string getRedirectAid() const;
 virtual std::string getParameter() const;
-/* Ideally this method should be pure virtual,
- * but for compatibility reasons, provide a default implementation
- * using the old ArticleSource::getData.
- */
-virtual Blob getData() const;
+virtual Blob getData() const = 0;
 
 // returns the next category id, to which the article is assigned to
 virtual std::string getNextCategory();
-
-  //
-  /* For API compatibility.
-   * The default Article::getData call ArticleSource::getData.
-   * So store the source of article in article to let default API compatible
-   * function do its job.
-   * This should be removed once every users switch to new API.
-   */
-  private:
-mutable ArticleSource*  __source;
-friend class ZimCreator;
-  //
 };
 
 class Category
@@ -90,17 +74,6 @@
 // ids. Using this list, the writer fetches the category data using
 // this method.
 virtual Category* getCategory(const std::string& cid);
-
-/**/
-/* For API compatibility.
- * The default Article::getData call ArticleSource::getData.
- * So keep the getData. Do not set it pure virtual cause we want new
- * code to not use it.
- * This should be removed once every users switch to new API.
- */
-virtual Blob getData(const std::string& aid);
-
-/**/
 };
 
   }
diff --git a/zimlib/src/articlesource.cpp b/zimlib/src/articlesource.cpp
index a2087a7..26d33f8 100644
--- a/zimlib/src/articlesource.cpp
+++ b/zimlib/src/articlesource.cpp
@@ -69,22 +69,6 @@
   return std::string();
 }
 
-/**/
-/* For API compatibility.
- * The default Article::getData call ArticleSource::getData.
- * This should be removed once every users switch to new API.
- */
-Blob Article::getData() const
-{
-  std::cerr << "DEPRECATED WARNING : Use of ArticleSource::getData is deprecated." << std::endl;
-  std::cerr << " You should override Article::getData directly." << std::endl;
-  return __source->getData(getAid());
-}
-Blob ArticleSource::getData(const std::string& aid) {
-throw std::runtime_error("This should not be called");
-}
-/**/
-
 Uuid ArticleSource::getUuid()
 {
   return Uuid::generate();
diff --git a/zimlib/src/zimcreator.cpp b/zimlib/src/zimcreator.cpp
index 1b528f4..0f0f6d0 100644
--- a/zimlib/src/zimcreator.cpp
+++ b/zimlib/src/zimcreator.cpp
@@ -202,15 +202,6 @@
 }
 
 // Add blob data to compressed or uncompressed cluster.
-/**/
-/* For API compatibility.
- * The default Article::getData call ArticleSource::getData.
- * So set the source of article to let default API compatible function
- * do its job.
- * This should be removed once every users switch to new API.
- */
-article->__source = &src;
-/**/
 Blob blob = article->getData();
 if (blob.size() > 0)
 {
diff --git a/zimwriterfs/article.cpp b/zimwriterfs/article.cpp
index 4aeb083..98ec882 100644
--- a/zimwriterfs/article.cpp
+++ b/zimwriterfs/article.cpp
@@ -24,7 +24,9 @@
 
 extern std::string directoryPath;
 
-Article::Article(const std::string& path, const bool 

[MediaWiki-commits] [Gerrit] openzim[master]: Compress blobs and track file size immediately after adding ...

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/295383 )

Change subject: Compress blobs and track file size immediately after adding each article.
..


Compress blobs and track file size immediately after adding each article.

Rather than first collecting all the directory entries and only afterwards
writing the blobs, write each cluster on the fly as we see each article.
Keep track of both compressed and uncompressed clusters so that we don't
needlessly terminate compressed clusters just because we happen to have
encountered an uncompressible file.  Account for additions to each of
the various indices as we go so that we maintain a fairly accurate
size for the file at every point, which will allow us to stop adding
articles once the ZIM file gets to a certain size.
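The on-the-fly approach described above can be sketched as a simplified, hypothetical model. `SizeTracker` is not a zimlib class. In this model:

- Each blob goes to one of two open clusters, depending on whether it is compressible.
- A cluster is closed only when it fills up.
- Cluster numbers are assigned when a cluster is written.
- The running size starts from the same fixed 80 + 1 + 16 bytes used in the diff.

Several things are assumptions: the per-article `overhead`, the 8 bytes counted per written cluster, and the use of uncompressed byte counts in place of compressed ones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simplified model of writing clusters on the fly.
class SizeTracker {
 public:
  explicit SizeTracker(std::uint64_t maxClusterBytes)
      : maxClusterBytes_(maxClusterBytes),
        // Fixed parts, as in the diff: header, MIME table terminator, MD5.
        currentSize_(80 + 1 + 16) {}

  // Adds one article; `overhead` stands for its dirent and index entries.
  // Returns the article index; its cluster number stays -1 until the
  // cluster holding it is written.
  std::size_t addArticle(const std::string& blob, bool compressible,
                         std::uint64_t overhead) {
    std::size_t idx = clusterOf_.size();
    clusterOf_.push_back(-1);
    Pending& c = compressible ? comp_ : uncomp_;
    c.articles.push_back(idx);
    c.bytes += blob.size();
    currentSize_ += overhead;
    // Only the cluster that filled up is written; the other stays open.
    if (c.bytes >= maxClusterBytes_) flush(c);
    return idx;
  }

  // Writes whatever is still pending once all articles have been added.
  void finish() {
    flush(comp_);
    flush(uncomp_);
  }

  std::uint64_t currentSize() const { return currentSize_; }
  int clusterOf(std::size_t article) const { return clusterOf_[article]; }

 private:
  struct Pending {
    std::vector<std::size_t> articles;
    std::uint64_t bytes = 0;
  };

  void flush(Pending& c) {
    if (c.articles.empty()) return;
    // Cluster numbers follow write order, not insertion order.
    for (std::size_t idx : c.articles) clusterOf_[idx] = nextCluster_;
    ++nextCluster_;
    currentSize_ += c.bytes + 8;  // assumed 8-byte cluster pointer entry
    c.articles.clear();
    c.bytes = 0;
  }

  std::uint64_t maxClusterBytes_;
  std::uint64_t currentSize_;
  int nextCluster_ = 0;
  std::vector<int> clusterOf_;
  Pending comp_, uncomp_;
};
```

An uncompressible article is written only when its own cluster fills up (or at `finish()`). It can therefore land in a later cluster than articles added after it, which is why the real writer has to track the dirents of each open cluster and fix up cluster indices.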

Change-Id: Ib644fff4cb804320a07aadbea499c8416df66adc
---
M zimlib/include/zim/writer/zimcreator.h
M zimlib/src/zimcreator.cpp
2 files changed, 126 insertions(+), 86 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/writer/zimcreator.h b/zimlib/include/zim/writer/zimcreator.h
index 5134d98..4296371 100644
--- a/zimlib/include/zim/writer/zimcreator.h
+++ b/zimlib/include/zim/writer/zimcreator.h
@@ -52,10 +52,10 @@
 CompressionType compression;
 bool isEmpty;
 offset_type clustersSize;
+offset_type currentSize;
 
-void createDirents(ArticleSource& src);
+void createDirents(ArticleSource& src, const std::string& tmpfname);
 void createTitleIndex(ArticleSource& src);
-void createClusters(ArticleSource& src, const std::string& tmpfname);
 void fillHeader(ArticleSource& src);
 void write(const std::string& fname, const std::string& tmpfname);
 
@@ -84,6 +84,10 @@
 void setMinChunkSize(int s)   { minChunkSize = s; }
 
 void create(const std::string& fname, ArticleSource& src);
+
+/* The user can query `currentSize` after each article has been
+ * added to the ZIM file. */
+offset_type getCurrentSize() { return currentSize; }
 };
 
   }
diff --git a/zimlib/src/zimcreator.cpp b/zimlib/src/zimcreator.cpp
index 66ce902..ac2720b 100644
--- a/zimlib/src/zimcreator.cpp
+++ b/zimlib/src/zimcreator.cpp
@@ -56,28 +56,30 @@
   : minChunkSize(1024-64),
 nextMimeIdx(0),
 #ifdef ENABLE_LZMA
-compression(zimcompLzma)
+compression(zimcompLzma),
 #elif ENABLE_BZIP2
-compression(zimcompBzip2)
+compression(zimcompBzip2),
 #elif ENABLE_ZLIB
-compression(zimcompZip)
+compression(zimcompZip),
 #else
-compression(zimcompNone)
+compression(zimcompNone),
 #endif
+currentSize(0)
 {
 }
 
 ZimCreator::ZimCreator(int& argc, char* argv[])
   : nextMimeIdx(0),
 #ifdef ENABLE_LZMA
-compression(zimcompLzma)
+compression(zimcompLzma),
 #elif ENABLE_BZIP2
-compression(zimcompBzip2)
+compression(zimcompBzip2),
 #elif ENABLE_ZLIB
-compression(zimcompZip)
+compression(zimcompZip),
 #else
-compression(zimcompNone)
+compression(zimcompNone),
 #endif
+currentSize(0)
 {
   Arg minChunkSizeArg(argc, argv, "--min-chunk-size");
   if (minChunkSizeArg.isSet())
@@ -110,15 +112,12 @@
   log_debug("basename " << basename);
 
   INFO("create directory entries");
-  createDirents(src);
+  createDirents(src, basename + ".tmp");
   INFO(dirents.size() << " directory entries created");
 
   INFO("create title index");
   createTitleIndex(src);
   INFO(dirents.size() << " title index created");
-
-  INFO("create clusters");
-  createClusters(src, basename + ".tmp");
   INFO(clusterOffsets.size() << " clusters created");
 
   INFO("fill header");
@@ -132,9 +131,23 @@
   INFO("ready");
 }
 
-void ZimCreator::createDirents(ArticleSource& src)
+void ZimCreator::createDirents(ArticleSource& src, const std::string& tmpfname)
 {
   INFO("collect articles");
+  std::ofstream out(tmpfname.c_str());
+  currentSize =
+80 /* for header */ +
+1 /* for mime type table termination */ +
+16 /* for md5sum */;
+
+  // We keep both a "compressed cluster" and an "uncompressed cluster"
+  // because we don't know which one will fill up first.  We also need
+  // to track the dirents currently in each, so we can fix up the
+  // cluster index if the other one ends up written first.
+  DirentsType compDirents, uncompDirents;
+  Cluster compCluster, uncompCluster;
+  compCluster.setCompression(compression);
+  uncompCluster.setCompression(zimcompNone);
 
   const Article* article;
   while ((article = src.getNextArticle()) != 0)
@@ -163,13 +176,107 @@
 }
 else
 {
+  uint16_t oldMimeIdx = 

[MediaWiki-commits] [Gerrit] openzim[master]: Bug fix: preserve cluster compression type after clear().

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/296547 )

Change subject: Bug fix: preserve cluster compression type after clear().
..


Bug fix: preserve cluster compression type after clear().

By nulling out the cluster implementation on clear, we were also
resetting the cluster compression type.  This caused the ZimCreator
API to not compress compressible clusters after the first one.
This is a follow-up to f5de40f94b30795f42bb9388cbb46df9cd605167,
which reused cluster objects instead of recreating them from scratch
each time.

Related fixes: the `size()` of an empty cluster is actually 4, not 0,
since even an empty cluster contains a field which counts the number
of offsets in the cluster; and the ClusterImpl initializer now uses
zimcompNone, so the compression type no longer changes from zimcompNone
to zimcompDefault as soon as the ClusterImpl is created.
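A minimal sketch of the bug and the fix, using hypothetical simplified `Cluster`/`ClusterImpl` classes, with `std::shared_ptr` standing in for zimlib's own smart pointer. `clear()` empties the implementation instead of dropping it, so a compression type set once survives reuse. An empty cluster reports 4 bytes for its single leading offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical subset of the compression types.
enum CompressionType { zimcompNone, zimcompLzma };

// Hypothetical simplified ClusterImpl: a list of offsets plus the blob bytes.
class ClusterImpl {
 public:
  // Start uncompressed, matching the changed initializer.
  ClusterImpl() : compression(zimcompNone) { offsets.push_back(0); }

  void addBlob(const std::string& blob) {
    data += blob;
    offsets.push_back(static_cast<std::uint32_t>(data.size()));
  }

  // Drops the blobs but leaves `compression` untouched.
  void clear() {
    data.clear();
    offsets.assign(1, 0);
  }

  // Every offset takes 4 bytes, so even an empty cluster has a size of 4.
  std::size_t getSize() const {
    return offsets.size() * sizeof(std::uint32_t) + data.size();
  }

  CompressionType compression;

 private:
  std::vector<std::uint32_t> offsets;
  std::string data;
};

class Cluster {
 public:
  void setCompression(CompressionType c) { getImpl()->compression = c; }
  CompressionType getCompression() const {
    return impl ? impl->compression : zimcompNone;
  }
  void addBlob(const std::string& blob) { getImpl()->addBlob(blob); }
  std::size_t size() const {
    return impl ? impl->getSize() : sizeof(std::uint32_t);
  }
  // The old code set `impl = 0` (here: `impl.reset()`), which also
  // discarded the compression type; clearing in place preserves it.
  void clear() {
    if (impl) impl->clear();
  }

 private:
  ClusterImpl* getImpl() {
    if (!impl) impl = std::make_shared<ClusterImpl>();
    return impl.get();
  }
  std::shared_ptr<ClusterImpl> impl;
};
```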

Change-Id: I468a1719a33c450db9a28d9704b539bdb97cd7fc
---
M zimlib/include/zim/cluster.h
M zimlib/src/cluster.cpp
2 files changed, 3 insertions(+), 3 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/cluster.h b/zimlib/include/zim/cluster.h
index c26c24c..bd55cb5 100644
--- a/zimlib/include/zim/cluster.h
+++ b/zimlib/include/zim/cluster.h
@@ -88,8 +88,8 @@
   Blob getBlob(size_type n) const;
 
   size_type count() const   { return impl ? impl->getCount() : 0; }
-  size_type size() const{ return impl ? impl->getSize() : 0; }
-  void clear()  { impl = 0; }
+  size_type size() const{ return impl ? impl->getSize(): sizeof(size_type); }
+  void clear()  { if (impl) impl->clear(); }
 
   void addBlob(const char* data, unsigned size) { getImpl()->addBlob(data, size); }
   void addBlob(const Blob& blob){ getImpl()->addBlob(blob); }
diff --git a/zimlib/src/cluster.cpp b/zimlib/src/cluster.cpp
index 6f6ea14..3630042 100644
--- a/zimlib/src/cluster.cpp
+++ b/zimlib/src/cluster.cpp
@@ -60,7 +60,7 @@
   }
 
   ClusterImpl::ClusterImpl()
-: compression(zimcompDefault)
+: compression(zimcompNone)
   {
 offsets.push_back(0);
   }

-- 
To view, visit https://gerrit.wikimedia.org/r/296547

Gerrit-MessageType: merged
Gerrit-Change-Id: I468a1719a33c450db9a28d9704b539bdb97cd7fc
Gerrit-PatchSet: 3
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: C. Scott Ananian 
Gerrit-Reviewer: Kelson 
Gerrit-Reviewer: Mgautierfr 


[MediaWiki-commits] [Gerrit] openzim[master]: Update .gitignore files.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/295037 )

Change subject: Update .gitignore files.
..


Update .gitignore files.

Change-Id: Iad8d95fa2deab178396e8804b6e195a5b1520a4a
---
C zimlib/.gitignore
R zimwriterfs/.gitignore
2 files changed, 3 insertions(+), 11 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/.gitignore b/zimlib/.gitignore
similarity index 75%
copy from .gitignore
copy to zimlib/.gitignore
index c50679e..0a4a991 100644
--- a/.gitignore
+++ b/zimlib/.gitignore
@@ -4,7 +4,6 @@
 compile
 config.*
 configure
-createZimExample
 depcomp
 .deps
 .dirstamp
@@ -25,8 +24,6 @@
 .svn
 .*.swp
 *.zim
-zimdiff
-zimdump
-zimpatch
-zimsearch
-zimwriterfs
+examples/createZimExample
+src/tools/zimdump
+src/tools/zimsearch
diff --git a/.gitignore b/zimwriterfs/.gitignore
similarity index 80%
rename from .gitignore
rename to zimwriterfs/.gitignore
index c50679e..1aaab40 100644
--- a/.gitignore
+++ b/zimwriterfs/.gitignore
@@ -4,7 +4,6 @@
 compile
 config.*
 configure
-createZimExample
 depcomp
 .deps
 .dirstamp
@@ -25,8 +24,4 @@
 .svn
 .*.swp
 *.zim
-zimdiff
-zimdump
-zimpatch
-zimsearch
 zimwriterfs

-- 
To view, visit https://gerrit.wikimedia.org/r/295037

Gerrit-MessageType: merged
Gerrit-Change-Id: Iad8d95fa2deab178396e8804b6e195a5b1520a4a
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: C. Scott Ananian 
Gerrit-Reviewer: Kelson 


[MediaWiki-commits] [Gerrit] openzim[master]: Update zimwriterfs makefiles.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/295038 )

Change subject: Update zimwriterfs makefiles.
..


Update zimwriterfs makefiles.

Change-Id: Ic0f93e19eb3e07d86c115bd15f47ef5fbc74f954
---
M zimwriterfs/Makefile.am
M zimwriterfs/README.md
M zimwriterfs/configure.ac
3 files changed, 22 insertions(+), 49 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/Makefile.am b/zimwriterfs/Makefile.am
index 308149d..e52da96 100644
--- a/zimwriterfs/Makefile.am
+++ b/zimwriterfs/Makefile.am
@@ -1,5 +1,3 @@
 AUTOMAKE_OPTIONS=subdir-objects
 bin_PROGRAMS=zimwriterfs
 zimwriterfs_SOURCES= zimwriterfs.cpp gumbo/utf8.c gumbo/string_buffer.c gumbo/parser.c gumbo/error.c gumbo/string_piece.c gumbo/tag.c gumbo/vector.c gumbo/tokenizer.c gumbo/util.c gumbo/char_ref.c gumbo/attribute.c
-zimwriterfs_CXXFLAGS=$(LIBZIM_CFLAGS) $(LIBLZMA_CFLAGS) $(LIBZ_CFLAGS) $(LIBMAGIC_CFLAGS) $(LIBPTHREAD_CFLAGS) $(CFLAGS) $(CXXFLAGS)
-zimwriterfs_LDFLAGS=$(LIBZIM_LDFLAGS) $(LIBLZMA_LDFLAGS) $(LIBZ_LDFLAGS) $(LIBMAGIC_LDFLAGS) $(LIBPTHREAD_LDFLAGS)
\ No newline at end of file
diff --git a/zimwriterfs/README.md b/zimwriterfs/README.md
index 797f6fb..db8a01d 100644
--- a/zimwriterfs/README.md
+++ b/zimwriterfs/README.md
@@ -29,10 +29,16 @@
   packaged), resp. for the mimeType detection
 * libz (http://www.zlib.net/), resp. for unpack compressed HTML files
 
+On Debian, you can ensure these are installed with:
+```
+sudo apt-get install liblzma-dev libmagic-dev zlib1g-dev
+cd ../zimlib && ./autogen.sh && ./configure && make && cd ../zimwriterfs
+```
+
 Once the dependencies are in place, to build:
 ```
 ./autogen.sh
-./configure
+./configure CXXFLAGS=-I../zimlib/include LDFLAGS=-L../zimlib/src/.libs
 make
 ```
 
diff --git a/zimwriterfs/configure.ac b/zimwriterfs/configure.ac
index 5a01142..9c80493 100644
--- a/zimwriterfs/configure.ac
+++ b/zimwriterfs/configure.ac
@@ -33,70 +33,39 @@
   AC_MSG_ERROR([[cannot find pkg-config]])
 fi
 
-# Check if the liblzma is available
+# Set up CXXFLAGS/LDFLAGS and ensure they are substituted
+AC_ARG_VAR(CXXFLAGS, [C++ compiler flags])
+AC_ARG_VAR(LDFLAGS, linker flags)
+CFLAGS="-O3 -std=gnu99 -std=c99 $CFLAGS"
+CXXFLAGS="-O3 -Igumbo $CXXFLAGS"
+
+# Check if the liblzma library is available
 AC_CHECK_HEADER([lzma.h],, [AC_MSG_ERROR([[cannot find lzma header]])])
 AC_CHECK_LIB([lzma], [lzma_version_string],, [AC_MSG_ERROR([[cannot find lzma]])])
 
-# Check if the libzim is available
+# Check if the libzim library is available
 AC_CHECK_HEADER([zim/zim.h],, [AC_MSG_ERROR([[cannot find libzim header]])])
 AC_CHECK_LIB([zim], [zim_MD5Init],, [AC_MSG_ERROR([[cannot find libzim]])])
 
-# Check if the libmagic is available
+# Check if the libz library is available
+AC_CHECK_HEADER([zlib.h],, [AC_MSG_ERROR([[cannot find libz header]])])
+AC_CHECK_LIB([z], [deflate],, [AC_MSG_ERROR([[cannot find libz]])])
+
+# Check if the libmagic library is available
 AC_CHECK_HEADER([magic.h],, [AC_MSG_ERROR([[cannot find libmagic header]])])
 AC_CHECK_LIB([magic], [magic_file],, [AC_MSG_ERROR([[cannot find libmagic]])])
 
-# Check if the libpthread is available
+# Check if the libpthread library is available
 AC_CHECK_HEADER([pthread.h],, [AC_MSG_ERROR([[cannot find libpthread header]])])
 AC_CHECK_LIB([pthread], [pthread_exit],, [AC_MSG_ERROR([[cannot find libpthread]])])
 
-# Set current language to C++
-AC_LANG(C++)
-
 # Check the existence of stat64 (to handle file >2GB) in the libc
 AC_CHECK_FUNCS([stat64])
-
-# cxxflags
-CXXFLAGS="-O3 -Igumbo $CXXFLAGS"
-CFLAGS="-O3 -std=gnu99 -std=c99"
-
-# liblzma
-LIBLZMA_CFLAGS=""
-LIBLZMA_LDFLAGS=" -llzma"
-
-# libzim
-LIBZIM_CFLAGS=""
-LIBZIM_LDFLAGS=" -lzim"
-
-# libz
-LIBZ_CFLAGS=""
-LIBZ_LDFLAGS=" -lz"
-
-# libmagic
-LIBMAGIC_CFLAGS=""
-LIBMAGIC_LDFLAGS=" -lmagic"
-
-# libpthread
-LIBPTHREAD_CFLAGS=""
-LIBPTHREAD_LDFLAGS=" -lpthread"
 
 AC_DEFINE_UNQUOTED(CLUSTER_CACHE_SIZE, 16, [set zim cluster cache size to number of cached chunks])
 AC_DEFINE_UNQUOTED(DIRENT_CACHE_SIZE, 512, [set zim dirent cache size to number of cached chunks])
 AC_DEFINE_UNQUOTED(LZMA_MEMORY_SIZE, 128, [set lzma uncompress memory size to number of MB])
 AC_DEFINE(ENABLE_LZMA, [1], [defined if lzma compression is enabled])
-
-# export variables
-AC_SUBST(CXXFLAGS)
-AC_SUBST(CFLAGS)
-AC_SUBST(LIBLZMA_CFLAGS)
-AC_SUBST(LIBLZMA_LDFLAGS)
-AC_SUBST(LIBZIM_CFLAGS)
-AC_SUBST(LIBZIM_LDFLAGS)
-AC_SUBST(LIBZ_CFLAGS)
-AC_SUBST(LIBZ_LDFLAGS)
-AC_SUBST(LIBMAGIC_CFLAGS)
-AC_SUBST(LIBMAGIC_LDFLAGS)
-AC_SUBST(LIBPTHREAD_CFLAGS)
-AC_SUBST(LIBPTHREAD_LDFLAGS)
 
 # Configure the output files
 AC_CONFIG_FILES([
@@ -104,4 +73,4 @@
 ])
 
 AC_PROG_INSTALL
-AC_OUTPUT
\ No newline at end of file
+AC_OUTPUT

-- 
To view, visit https://gerrit.wikimedia.org/r/295038
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: 

[MediaWiki-commits] [Gerrit] openzim[master]: Explain that ZIntStream only represents uint32_t values righ...

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/296657 )

Change subject: Explain that ZIntStream only represents uint32_t values right 
now.
..


Explain that ZIntStream only represents uint32_t values right now.

Change-Id: Id712847949af1ce32b2e3118fc241be171124186
---
M zimlib/include/zim/zintstream.h
1 file changed, 4 insertions(+), 1 deletion(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/zintstream.h b/zimlib/include/zim/zintstream.h
index 1c78a1f..66fdfef 100644
--- a/zimlib/include/zim/zintstream.h
+++ b/zimlib/include/zim/zintstream.h
@@ -41,7 +41,10 @@
   substracted from the actual number, so a 2 byte zero is actually a 128.
 
   The same logic continues on the 3rd, 4th, ... byte. Up to 7 additional bytes
-  are used, so the first byte must contain at least one 0.
+  could used, since the first byte must contain at least one 0.
+
+  This particular implementation only represents uint32_t values (numbers up
+  to 2^32-1), so it will only ever emit 5 bytes per input value.
 
   binary  range
   --- 
--

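The header comment describes the zint encoding only in prose. The sketch below assumes what that comment states: a first byte with n leading 1-bits followed by a 0 announces n additional bytes, and the range for each length begins where the previous length's range ends (hence the 2-byte zero meaning 128). Under those assumptions the first byte carries 7 - n payload bits, giving 7 * (n + 1) bits in total. `zintEncodedLength` is a hypothetical helper for illustration, not a zimlib function. It only computes lengths and makes no claim about byte order.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: how many bytes a value needs under the zint scheme
// as described in the comment above. A first byte with n leading 1-bits and
// then a 0 announces n extra bytes, leaving (7 - n) + 8 * n = 7 * (n + 1)
// payload bits. Each length covers the range right after the previous one.
unsigned zintEncodedLength(std::uint32_t value)
{
  const std::uint64_t v = value;
  std::uint64_t firstValue = 0;  // smallest value encoded with n extra bytes
  for (unsigned n = 0; n <= 7; ++n) {
    const std::uint64_t count = std::uint64_t(1) << (7 * (n + 1));
    if (v < firstValue + count)
      return n + 1;
    firstValue += count;
  }
  return 8;  // never reached for 32-bit input (5 bytes always suffice)
}
```

With these ranges, 127 needs 1 byte, 128 needs 2, and 0xFFFFFFFF needs 5. This is why an implementation limited to uint32_t emits at most 5 bytes per value.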
-- 
To view, visit https://gerrit.wikimedia.org/r/296657

Gerrit-MessageType: merged
Gerrit-Change-Id: Id712847949af1ce32b2e3118fc241be171124186
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: C. Scott Ananian 
Gerrit-Reviewer: Kelson 

___
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits


[MediaWiki-commits] [Gerrit] openzim[master]: Use a Queue object to handle threadsafe access to a queue.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/295518 )

Change subject: Use a Queue object to handle threadsafe access to a queue.
..


Use a Queue object to handle threadsafe access to a queue.

By using a Queue object, we no longer need to declare the
popFromFilenameQueue function in articlesource.cpp.

Change-Id: Ic685f7e22e4ce95f6e0eb280f65809fd0dff1a6a
---
M zimwriterfs/articlesource.cpp
M zimwriterfs/articlesource.h
A zimwriterfs/queue.h
M zimwriterfs/zimwriterfs.cpp
4 files changed, 116 insertions(+), 55 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/articlesource.cpp b/zimwriterfs/articlesource.cpp
index 8b0b34c..6cf5f30 100644
--- a/zimwriterfs/articlesource.cpp
+++ b/zimwriterfs/articlesource.cpp
@@ -28,7 +28,6 @@
 #include 
 #include 
 
-bool popFromFilenameQueue(std::string );
 bool isVerbose();
 
 extern std::string welcome;
@@ -44,8 +43,9 @@
 unsigned int dataSize = 0;
 
 
-
-ArticleSource::ArticleSource() {
+ArticleSource::ArticleSource(Queue& filenameQueue):
+filenameQueue(filenameQueue)
+{
   /* Prepare metadata */
   metadataQueue.push("Language");
   metadataQueue.push("Publisher");
@@ -88,10 +88,10 @@
 std::string line = redirectsQueue.front();
 redirectsQueue.pop();
 article = new RedirectArticle(line);
-  } else if (popFromFilenameQueue(path)) {
+  } else if (filenameQueue.popFromQueue(path)) {
 do {
   article = new Article(path);
-} while (article && article->isInvalid() && popFromFilenameQueue(path));
+} while (article && article->isInvalid() && filenameQueue.popFromQueue(path));
   } else {
 article = NULL;
   }
diff --git a/zimwriterfs/articlesource.h b/zimwriterfs/articlesource.h
index adbdbda..1ad6524 100644
--- a/zimwriterfs/articlesource.h
+++ b/zimwriterfs/articlesource.h
@@ -24,12 +24,13 @@
 #include 
 #include 
 #include 
+#include "queue.h"
 
 #include 
 
 class ArticleSource : public zim::writer::ArticleSource {
   public:
-explicit ArticleSource();
+explicit ArticleSource(Queue& filenameQueue);
 virtual const zim::writer::Article* getNextArticle();
 virtual zim::Blob getData(const std::string& aid);
 virtual std::string getMainPage();
@@ -39,6 +40,7 @@
   private:
 std::queue metadataQueue;
 std::queue redirectsQueue;
+Queue& filenameQueue;
 };
 
 #endif //OPENZIM_ZIMWRITERFS_ARTICLESOURCE_H
diff --git a/zimwriterfs/queue.h b/zimwriterfs/queue.h
new file mode 100644
index 000..d177568
--- /dev/null
+++ b/zimwriterfs/queue.h
@@ -0,0 +1,88 @@
+/*
+ * Copyright 2016 Matthieu Gautier 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU  General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+ * MA 02110-1301, USA.
+ */
+
+#ifndef OPENZIM_ZIMWRITERFS_QUEUE_H
+#define OPENZIM_ZIMWRITERFS_QUEUE_H
+
+#define MAX_QUEUE_SIZE 100
+
+#include 
+#include 
+
+template
+class Queue {
+public:
+Queue() {pthread_mutex_init(&m_queueMutex,NULL);};
+virtual ~Queue() {pthread_mutex_destroy(&m_queueMutex);};
+virtual bool isEmpty();
+virtual void pushToQueue(const T& element);
+virtual bool popFromQueue(T &element);
+
+protected:
+std::queue   m_realQueue;
+pthread_mutex_t m_queueMutex;
+
+private:
+// Make this queue non copyable
+Queue(const Queue&);
+Queue& operator=(const Queue&);
+};
+
+template
+bool Queue::isEmpty() {
+pthread_mutex_lock(&m_queueMutex);
+bool retVal = m_realQueue.empty();
+pthread_mutex_unlock(&m_queueMutex);
+return retVal;
+}
+
+template
+void Queue::pushToQueue(const T &element) {
+unsigned int wait = 0;
+unsigned int queueSize = 0;
+
+do {
+usleep(wait);
+pthread_mutex_lock(&m_queueMutex);
+queueSize = m_realQueue.size();
+pthread_mutex_unlock(&m_queueMutex);
+wait += 10;
+} while (queueSize > MAX_QUEUE_SIZE);
+
+pthread_mutex_lock(&m_queueMutex);
+m_realQueue.push(element);
+pthread_mutex_unlock(&m_queueMutex);
+}
+
+template
+bool Queue::popFromQueue(T &element) {
+pthread_mutex_lock(&m_queueMutex);
+if (m_realQueue.empty()) {
+pthread_mutex_unlock(&m_queueMutex);
+return false;
+}
+
+element = m_realQueue.front();
+m_realQueue.pop();
+

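For comparison, here is a minimal C++11 sketch of the same thread-safe queue idea. It uses std::mutex and std::condition_variable instead of raw pthread calls and usleep() polling. `pushToQueue` blocks on a condition variable while the queue is full. `popFromQueue` stays non-blocking and returns false when the queue is empty, as in the pthread version. `BoundedQueue` and its `maxSize` constructor argument are hypothetical names, not part of zimwriterfs.

```cpp
#include <cassert>  // used by the usage example
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

template <typename T>
class BoundedQueue {
public:
  explicit BoundedQueue(std::size_t maxSize = 100) : m_maxSize(maxSize) {}
  BoundedQueue(const BoundedQueue&) = delete;             // non copyable
  BoundedQueue& operator=(const BoundedQueue&) = delete;

  bool isEmpty() {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_queue.empty();
  }

  // Waits (without busy polling) until there is room, then pushes.
  void pushToQueue(const T& element) {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_notFull.wait(lock, [this] { return m_queue.size() < m_maxSize; });
    m_queue.push(element);
  }

  // Returns false immediately if the queue is empty.
  bool popFromQueue(T& element) {
    std::lock_guard<std::mutex> lock(m_mutex);
    if (m_queue.empty())
      return false;
    element = m_queue.front();
    m_queue.pop();
    m_notFull.notify_one();  // wake one producer waiting for room
    return true;
  }

private:
  std::queue<T> m_queue;
  std::mutex m_mutex;
  std::condition_variable m_notFull;
  std::size_t m_maxSize;
};
```

Usage mirrors the original class. For example, after `BoundedQueue<std::string> q; q.pushToQueue("a.html");`, a consumer calls `q.popFromQueue(path)` in a loop until it returns false.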
[MediaWiki-commits] [Gerrit] openzim[master]: Move `throw` statement out of header file.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/295720 )

Change subject: Move `throw` statement out of header file.
..


Move `throw` statement out of header file.

This ensures that we don't fail compilation when including the header
file in an environment where exception handling is disabled (like
when binding to node.js).

Change-Id: Ib49060c7fe479054e58c20249dd9b3236ea7eb03
---
M zimlib/include/zim/writer/articlesource.h
M zimlib/src/articlesource.cpp
2 files changed, 5 insertions(+), 1 deletion(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/writer/articlesource.h b/zimlib/include/zim/writer/articlesource.h
index 94ee91b..54653ef 100644
--- a/zimlib/include/zim/writer/articlesource.h
+++ b/zimlib/include/zim/writer/articlesource.h
@@ -97,7 +97,8 @@
  * code to not use it.
  * This should be removed once every users switch to new API.
  */
-virtual Blob getData(const std::string& aid) { throw "This should not be called"; };
+virtual Blob getData(const std::string& aid);
+
 
/**/
 };
 
diff --git a/zimlib/src/articlesource.cpp b/zimlib/src/articlesource.cpp
index cc72ae2..a2087a7 100644
--- a/zimlib/src/articlesource.cpp
+++ b/zimlib/src/articlesource.cpp
@@ -80,6 +80,9 @@
   std::cerr << " You should override Article::getData directly." << std::endl;
   return __source->getData(getAid());
 }
+Blob ArticleSource::getData(const std::string& aid) {
+throw std::runtime_error("This should not be called");
+}
 
/**/
 
 Uuid ArticleSource::getUuid()

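The pattern is to declare the virtual function in the public header and define it, with the throw expression, in a source file that the library compiles with exceptions enabled. Below is a single-file sketch of that split. `Source` and `Blob` are hypothetical stand-ins for `zim::writer::ArticleSource` and `zim::Blob`, and the two comment banners mark what would go in the header and in the .cpp. Note that the .cpp needs `<stdexcept>` for std::runtime_error.

```cpp
#include <cassert>  // used by the usage example
#include <stdexcept>
#include <string>

struct Blob {};  // hypothetical stand-in for zim::Blob

// --- header part: declaration only, no throw expression ---
class Source {
public:
  virtual ~Source() {}
  virtual Blob getData(const std::string& aid);
};

// --- source-file part: compiled by the library with exceptions enabled ---
Blob Source::getData(const std::string& /*aid*/)
{
  throw std::runtime_error("This should not be called");
}
```

A consumer built with exceptions disabled can include the header part, because it contains no `throw`. The throw expression exists only in the library's own object code.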
-- 
To view, visit https://gerrit.wikimedia.org/r/295720

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib49060c7fe479054e58c20249dd9b3236ea7eb03
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: C. Scott Ananian 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] openzim[master]: Support running tests with "make check"

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/330859 )

Change subject: Support running tests with "make check"
..


Support running tests with "make check"

Change-Id: I1e15ba2dbda2f71c98bafa84d36b81c2a1772df1
---
M zimlib/.gitignore
M zimlib/test/Makefile.am
2 files changed, 4 insertions(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/.gitignore b/zimlib/.gitignore
index 550c09a..5896863 100644
--- a/zimlib/.gitignore
+++ b/zimlib/.gitignore
@@ -28,3 +28,6 @@
 src/tools/zimdump
 src/tools/zimsearch
 libzim.pc
+test-driver
+test/zimlib-test*
+test/test-suite.log
diff --git a/zimlib/test/Makefile.am b/zimlib/test/Makefile.am
index 29dd7bd..f28d6fe 100644
--- a/zimlib/test/Makefile.am
+++ b/zimlib/test/Makefile.am
@@ -1,6 +1,7 @@
 AM_CPPFLAGS=-I$(top_srcdir)/include
 
 noinst_PROGRAMS = zimlib-test
+TESTS = zimlib-test
 
 if WITH_ZLIB
 ZLIB_SOURCES = \

-- 
To view, visit https://gerrit.wikimedia.org/r/330859

Gerrit-MessageType: merged
Gerrit-Change-Id: I1e15ba2dbda2f71c98bafa84d36b81c2a1772df1
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Legoktm 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] openzim[master]: Add a indexer.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/296912 )

Change subject: Add a indexer.
..


Add a indexer.

This indexer is not used.
It is mainly code imported from kiwix-indexer into openzim.
Unused functions in *Tools have been removed.
No dependency on xapian.

Change-Id: I55079339d21d6903634c265f83f4d1c6ba0ac333
---
M zimwriterfs/Makefile.am
A zimwriterfs/indexer.cpp
A zimwriterfs/indexer.h
A zimwriterfs/pathTools.cpp
A zimwriterfs/pathTools.h
A zimwriterfs/resourceTools.cpp
A zimwriterfs/resourceTools.h
M zimwriterfs/zimwriterfs.cpp
8 files changed, 921 insertions(+), 2 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/Makefile.am b/zimwriterfs/Makefile.am
index 92641d9..628b74c 100644
--- a/zimwriterfs/Makefile.am
+++ b/zimwriterfs/Makefile.am
@@ -6,4 +6,7 @@
 tools.cpp \
 article.cpp \
 articlesource.cpp \
+indexer.cpp \
+resourceTools.cpp \
+pathTools.cpp \
 mimetypecounter.cpp
diff --git a/zimwriterfs/indexer.cpp b/zimwriterfs/indexer.cpp
new file mode 100644
index 000..7820a32
--- /dev/null
+++ b/zimwriterfs/indexer.cpp
@@ -0,0 +1,262 @@
+/*
+ * Copyright 2011-2014 Emmanuel Engelhart 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU  General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+ * MA 02110-1301, USA.
+ */
+
+#include "indexer.h"
+#include "resourceTools.h"
+#include "pathTools.h"
+#include 
+
+  /* Count word */
+  unsigned int Indexer::countWords(const string &text) {
+unsigned int numWords = 1;
+unsigned int length = text.size();
+
+for(unsigned int i=0; istopWords.push_back(stopWord);
+}
+  }
+
+  /* Article indexer methods */
+  void *Indexer::indexArticles(void *ptr) {
+pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
+Indexer *self = (Indexer *)ptr;
+unsigned int indexedArticleCount = 0;
+indexerToken token;
+
+self->indexingPrelude(self->getIndexPath());
+
+while (self->popFromToIndexQueue(token)) {
+  self->index(token.url,
+ token.accentedTitle,
+ token.title,
+ token.keywords,
+ token.content,
+ token.snippet,
+ token.size,
+ token.wordCount
+ );
+
+  indexedArticleCount += 1;
+
+  /* Make a hard-disk flush every 10.000 articles */
+  if (indexedArticleCount % 5000 == 0) {
+   self->flush();
+  }
+
+  /* Test if the thread should be cancelled */
+  pthread_testcancel();
+}
+self->indexingPostlude();
+
+/* Write content id file */
+string path = appendToDirectory(self->getIndexPath(), "content.id");
+writeTextFile(path, self->getZimId());
+
+usleep(100);
+
+self->articleIndexerRunning(false);
+pthread_exit(NULL);
+return NULL;
+  }
+
+  void Indexer::articleIndexerRunning(bool value) {
+pthread_mutex_lock();
+this->articleIndexerRunningFlag = value;
+pthread_mutex_unlock();
+  }
+
+  bool Indexer::isArticleIndexerRunning() {
+pthread_mutex_lock();
+bool retVal = this->articleIndexerRunningFlag;
+pthread_mutex_unlock();
+return retVal;
+  }
+
+  /* ToIndexQueue methods */
+  bool Indexer::isToIndexQueueEmpty() {
+pthread_mutex_lock();
+bool retVal = this->toIndexQueue.empty();
+pthread_mutex_unlock();
+return retVal;
+  }
+
+  void Indexer::pushToIndexQueue(indexerToken &token) {
+pthread_mutex_lock();
+this->toIndexQueue.push(token);
+pthread_mutex_unlock();
+usleep(int(this->toIndexQueue.size() / 200) / 10 * 1000);
+  }
+
+  bool Indexer::popFromToIndexQueue(indexerToken &token) {
+while (this->isToIndexQueueEmpty()) {
+  usleep(500);
+  if (this->getVerboseFlag()) {
+   std::cout << "Waiting... ToIndexQueue is empty for now..." << std::endl;
+  }
+
+  pthread_testcancel();
+}
+
+pthread_mutex_lock();
+token = this->toIndexQueue.front();
+this->toIndexQueue.pop();
+pthread_mutex_unlock();
+
+if (token.title == ""){
+//This is a empty token, end of the queue.
+return false;
+}
+return true;
+  }
+
+  /* Index 

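In the imported `Indexer::pushToIndexQueue`, the producer throttles itself with `usleep(int(this->toIndexQueue.size() / 200) / 10 * 1000)`. Every division there is an integer division, so the delay is 1000 microseconds per full 2000 queued tokens and zero below 2000. The hypothetical helper below only reproduces that expression so the arithmetic can be checked. Note also that, in the diff, the queue size is read after the mutex has been released.

```cpp
#include <cassert>  // used by the usage example
#include <cstddef>

// Hypothetical helper reproducing the delay passed to usleep() in
// Indexer::pushToIndexQueue. Both divisions truncate, so the result
// equals (queueSize / 2000) * 1000 microseconds.
unsigned throttleMicroseconds(std::size_t queueSize)
{
  return static_cast<unsigned>(int(queueSize / 200) / 10 * 1000);
}
```

For example, 1999 queued tokens give no delay, 2000 give 1 ms, and 10000 give 5 ms.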
[MediaWiki-commits] [Gerrit] openzim[master]: Bug fix: correctly update cluster offsets in directory entries.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/296158 )

Change subject: Bug fix: correctly update cluster offsets in directory entries.
..


Bug fix: correctly update cluster offsets in directory entries.

This is a follow-up to f5de40f94b30795f42bb9388cbb46df9cd605167.

When we moved the blob writing to the main dirent-creation loop,
we ended up making separate *copies* of the dirents and updating
blob/cluster information in these, instead of the dirents in the
main list, which will eventually be written.  Make the auxiliary
lists contain dirent *pointers* to avoid this problem.

Change-Id: I008fa700acd90c3c51614bde65d61ffbc6061872
---
M zimlib/include/zim/writer/zimcreator.h
M zimlib/src/zimcreator.cpp
2 files changed, 10 insertions(+), 9 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/writer/zimcreator.h b/zimlib/include/zim/writer/zimcreator.h
index 6f47402..2e52d7e 100644
--- a/zimlib/include/zim/writer/zimcreator.h
+++ b/zimlib/include/zim/writer/zimcreator.h
@@ -33,6 +33,7 @@
 {
   public:
 typedef std::vector DirentsType;
+typedef std::vector DirentPtrsType;
 typedef std::vector SizeVectorType;
 typedef std::vector OffsetsType;
 typedef std::map MimeTypes;
diff --git a/zimlib/src/zimcreator.cpp b/zimlib/src/zimcreator.cpp
index bc977ea..46c550f 100644
--- a/zimlib/src/zimcreator.cpp
+++ b/zimlib/src/zimcreator.cpp
@@ -144,7 +144,7 @@
   // because we don't know which one will fill up first.  We also need
   // to track the dirents currently in each, so we can fix up the
   // cluster index if the other one ends up written first.
-  DirentsType compDirents, uncompDirents;
+  DirentPtrsType compDirents, uncompDirents;
   Cluster compCluster, uncompCluster;
   compCluster.setCompression(compression);
   uncompCluster.setCompression(zimcompNone);
@@ -188,11 +188,11 @@
   }
 }
 
-dirents.push_back(dirent);
 currentSize +=
   dirent.getDirentSize() /* for directory entry */ +
   sizeof(offset_type) /* for url pointer list */ +
   sizeof(size_type) /* for title pointer list */;
+dirents.push_back(dirent);
 
 // If this is a redirect, we're done: there's no blob to add.
 if (dirent.isRedirect())
@@ -217,7 +217,7 @@
 }
 
 Cluster *cluster;
-DirentsType *myDirents, *otherDirents;
+DirentPtrsType *myDirents, *otherDirents;
 if (dirent.isCompress())
 {
   cluster = 
@@ -230,9 +230,9 @@
   myDirents = 
   otherDirents = 
 }
-myDirents->push_back(dirent);
-dirent.setCluster(clusterOffsets.size(), cluster->count());
+dirents.back().setCluster(clusterOffsets.size(), cluster->count());
 cluster->addBlob(blob);
+myDirents->push_back(&(dirents.back()));
 
 // If cluster is now large enough, write it to disk.
 if (cluster->size() >= minChunkSize * 1024)
@@ -247,10 +247,10 @@
   cluster->clear();
   myDirents->clear();
   // Update the cluster number of the dirents *not* written to disk.
-  for (DirentsType::iterator di = otherDirents->begin();
+  for (DirentPtrsType::iterator di = otherDirents->begin();
di != otherDirents->end(); ++di)
   {
-di->setCluster(clusterOffsets.size(), di->getBlobNumber());
+(*di)->setCluster(clusterOffsets.size(), (*di)->getBlobNumber());
   }
   offset_type end = out.tellp();
   currentSize += (end - start) +
@@ -263,10 +263,10 @@
   {
 clusterOffsets.push_back(out.tellp());
 out << compCluster;
-for (DirentsType::iterator di = uncompDirents.begin();
+for (DirentPtrsType::iterator di = uncompDirents.begin();
  di != uncompDirents.end(); ++di)
 {
-  di->setCluster(clusterOffsets.size(), di->getBlobNumber());
+  (*di)->setCluster(clusterOffsets.size(), (*di)->getBlobNumber());
 }
   }
   compCluster.clear();

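The bug and its fix come down to copy versus pointer semantics. The minimal sketch below uses a hypothetical `Dirent` stand-in that holds only a cluster number (it is not the real zim::writer::Dirent). Updating an element of a side vector of copies leaves the main list untouched, while updating through a side vector of pointers changes it.

As a general C++ rule, pointers into a std::vector stay valid only while the vector does not reallocate. Code that keeps `&dirents.back()` across later `push_back` calls therefore has to reserve enough capacity up front, or use a container whose `push_back` keeps element addresses stable, such as std::deque. Whether zimcreator.cpp does either is not visible in this diff.

```cpp
#include <cassert>  // used by the usage example
#include <vector>

struct Dirent {  // hypothetical stand-in for zim::writer::Dirent
  unsigned cluster = 0;
  void setCluster(unsigned c) { cluster = c; }
};

// Buggy pattern: the side list holds copies, so updates never reach `all`.
void updateViaCopies(std::vector<Dirent>& all)
{
  std::vector<Dirent> pending;
  pending.push_back(all.back());
  for (Dirent& d : pending) d.setCluster(7);
}

// Fixed pattern: the side list holds pointers into `all`.
void updateViaPointers(std::vector<Dirent>& all)
{
  std::vector<Dirent*> pending;
  pending.push_back(&all.back());
  for (Dirent* d : pending) d->setCluster(7);
}
```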
-- 
To view, visit https://gerrit.wikimedia.org/r/296158

Gerrit-MessageType: merged
Gerrit-Change-Id: I008fa700acd90c3c51614bde65d61ffbc6061872
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: C. Scott Ananian 
Gerrit-Reviewer: C. Scott Ananian 
Gerrit-Reviewer: Kelson 
Gerrit-Reviewer: Mgautierfr 



[MediaWiki-commits] [Gerrit] openzim[master]: Fix memory corruption in decodeUrl.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/303540 )

Change subject: Fix memory corruption in decodeUrl.
..


Fix memory corruption in decodeUrl.

Reuse the function from kiwix, as it does the same thing with much
better C++ code.

If we pass a char to sscanf while letting it think it is an unsigned int,
sscanf will write 4 bytes. As only one byte is reserved for the char,
this leads to undefined behavior.

This bug was not found before because we were using the function as an
inline function. Compiler optimization probably hid the error.

Change-Id: I618f8b9ede083e6580044f967153bfbe3ee3d294
---
M zimwriterfs/tools.cpp
1 file changed, 13 insertions(+), 10 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/tools.cpp b/zimwriterfs/tools.cpp
index 49bc79e..cc6091d 100644
--- a/zimwriterfs/tools.cpp
+++ b/zimwriterfs/tools.cpp
@@ -246,19 +246,22 @@
 
 }
 
-std::string decodeUrl(const std::string &encodedUrl) {
-  std::string decodedUrl = encodedUrl;
-  std::string::size_type pos = 0;
-  char ch;
+static char charFromHex(std::string a) {
+  std::istringstream Blat(a);
+  int Z;
+  Blat >> std::hex >> Z;
+  return char (Z);
+}
 
-  while ((pos = decodedUrl.find('%', pos)) != std::string::npos &&
-pos + 2 < decodedUrl.length()) {
-sscanf(decodedUrl.substr(pos + 1, 2).c_str(), "%x", (unsigned int*)&ch);
-decodedUrl.replace(pos, 3, 1, ch);
+std::string decodeUrl(const std::string &originalUrl) {
+  std::string url = originalUrl;
+  std::string::size_type pos = 0;
+  while ((pos = url.find('%', pos)) != std::string::npos &&
+pos + 2 < url.length()) {
+url.replace(pos, 3, 1, charFromHex(url.substr(pos + 1, 2)));
 ++pos;
   }
-
-  return decodedUrl;
+  return url;
 }
 
 std::string removeLastPathElement(const std::string& path, const bool removePreSeparator, const bool removePostSeparator) {

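The root cause is that `%x` expects an `unsigned int*`. Below is a minimal sketch of a narrower fix: it keeps sscanf but reads into an `unsigned int` and only then converts to `char`. The `%2x` width limits the conversion to two hex digits, and checking the return value skips malformed escapes. `decodeUrlSketch` is a hypothetical name; the merged change instead uses the istringstream-based `charFromHex` shown in the diff above.

```cpp
#include <cassert>  // used by the usage example
#include <cstdio>
#include <string>

// Hypothetical sketch: decode %XX escapes, giving sscanf a real
// unsigned int to write into before narrowing the value to char.
std::string decodeUrlSketch(const std::string& encoded)
{
  std::string url = encoded;
  std::string::size_type pos = 0;
  while ((pos = url.find('%', pos)) != std::string::npos &&
         pos + 2 < url.length()) {
    unsigned int value = 0;  // %x stores an unsigned int, not a char
    if (std::sscanf(url.substr(pos + 1, 2).c_str(), "%2x", &value) == 1) {
      url.replace(pos, 3, 1, static_cast<char>(value));
    }
    ++pos;  // never re-scan a decoded character (e.g. "%25" -> "%")
  }
  return url;
}
```

For instance, `decodeUrlSketch("a%20b")` returns `"a b"` and `decodeUrlSketch("100%25")` returns `"100%"`.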
-- 
To view, visit https://gerrit.wikimedia.org/r/303540

Gerrit-MessageType: merged
Gerrit-Change-Id: I618f8b9ede083e6580044f967153bfbe3ee3d294
Gerrit-PatchSet: 2
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] openzim[master]: Include files from srcdir; built files in builddir.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/295111 )

Change subject: Include files from srcdir; built files in builddir.
..


Include files from srcdir; built files in builddir.

Change-Id: I84f097567b944601534cdbc3313216c0364211bd
---
M zimlib/examples/Makefile.am
M zimlib/src/Makefile.am
M zimlib/src/tools/Makefile.am
M zimlib/test/Makefile.am
M zimreader/src/Makefile.am
M zimwriterdb/src/Makefile.am
M zimwriterdb/test/Makefile.am
7 files changed, 7 insertions(+), 7 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/examples/Makefile.am b/zimlib/examples/Makefile.am
index 13ac63f..3157cf0 100644
--- a/zimlib/examples/Makefile.am
+++ b/zimlib/examples/Makefile.am
@@ -1,4 +1,4 @@
-AM_CPPFLAGS=-I$(top_builddir)/include
+AM_CPPFLAGS=-I$(top_srcdir)/include
 noinst_PROGRAMS = createZimExample
 createZimExample_SOURCES = createZimExample.cpp
 LDADD = $(top_builddir)/src/libzim.la
diff --git a/zimlib/src/Makefile.am b/zimlib/src/Makefile.am
index 823a3b6..c0bd632 100644
--- a/zimlib/src/Makefile.am
+++ b/zimlib/src/Makefile.am
@@ -1,4 +1,4 @@
-AM_CPPFLAGS=-I$(top_builddir)/include
+AM_CPPFLAGS=-I$(top_srcdir)/include
 
 lib_LTLIBRARIES = libzim.la
 
diff --git a/zimlib/src/tools/Makefile.am b/zimlib/src/tools/Makefile.am
index a50b7c8..410504a 100644
--- a/zimlib/src/tools/Makefile.am
+++ b/zimlib/src/tools/Makefile.am
@@ -1,4 +1,4 @@
-AM_CPPFLAGS=-I$(top_builddir)/include
+AM_CPPFLAGS=-I$(top_srcdir)/include -I$(top_srcdir)/src
 if MAKE_BENCHMARK
   ZIMBENCH = zimbench
 endif
diff --git a/zimlib/test/Makefile.am b/zimlib/test/Makefile.am
index 34dad12..29dd7bd 100644
--- a/zimlib/test/Makefile.am
+++ b/zimlib/test/Makefile.am
@@ -1,4 +1,4 @@
-AM_CPPFLAGS=-I$(top_builddir)/include
+AM_CPPFLAGS=-I$(top_srcdir)/include
 
 noinst_PROGRAMS = zimlib-test
 
diff --git a/zimreader/src/Makefile.am b/zimreader/src/Makefile.am
index ff40f97..b7fc747 100644
--- a/zimreader/src/Makefile.am
+++ b/zimreader/src/Makefile.am
@@ -10,7 +10,7 @@
 .css.cpp:
ecppc $(ECPPFLAGS) -b $(ECPPFLAGS_CSS) $<
 
-AM_CPPFLAGS=-I$(top_builddir)/include
+AM_CPPFLAGS=-I$(top_srcdir)/include
 
 bin_PROGRAMS = zimreader
 
diff --git a/zimwriterdb/src/Makefile.am b/zimwriterdb/src/Makefile.am
index 46b0fac..854f2cb 100644
--- a/zimwriterdb/src/Makefile.am
+++ b/zimwriterdb/src/Makefile.am
@@ -1,6 +1,6 @@
 bin_PROGRAMS = zimwriterdb zimindexer zimcreatorsearch wikizim
 
-AM_CPPFLAGS=-I$(top_builddir)/include
+AM_CPPFLAGS=-I$(top_srcdir)/include
 
 # wikizim
 #
diff --git a/zimwriterdb/test/Makefile.am b/zimwriterdb/test/Makefile.am
index fffe02a..52262b2 100644
--- a/zimwriterdb/test/Makefile.am
+++ b/zimwriterdb/test/Makefile.am
@@ -6,4 +6,4 @@
 createzim_t_LDFLAGS = -lcxxtools -lzim
 createzim_t_SOURCES = createzim-t.cpp
 
-AM_CPPFLAGS=-I$(top_builddir)/include
+AM_CPPFLAGS=-I$(top_srcdir)/include

-- 
To view, visit https://gerrit.wikimedia.org/r/295111

Gerrit-MessageType: merged
Gerrit-Change-Id: I84f097567b944601534cdbc3313216c0364211bd
Gerrit-PatchSet: 2
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: C. Scott Ananian 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] openzim[master]: Convert READMEs to markdown.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/295035 )

Change subject: Convert READMEs to markdown.
..


Convert READMEs to markdown.

Change-Id: I596bec7ef70bd0a7d45ba67d4fa313d93c861890
---
A README.md
D zimlib/README
A zimlib/README
A zimlib/README.md
D zimreader-java/README
A zimreader-java/README.md
D zimreader/README
A zimreader/README.md
A zimwriterdb/README.md
R zimwriterfs/README.md
10 files changed, 100 insertions(+), 61 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/README.md b/README.md
new file mode 100644
index 000..c5fee8e
--- /dev/null
+++ b/README.md
@@ -0,0 +1,11 @@
+This OpenZIM project repository contains the sources for:
+* [zimlib](./zimlib#readme): A library for reading and writing ZIM files.
+* [zimwriterfs](./zimwriterfs#readme): A tool for creating ZIM files based on
+  contents on a local filesystem.
+* [zimreader-java](./zimreader-java#readme): A ZIM reader written in Java.
+
+Old code, provided for archive purposes:
+* [zimreader](./zimreader#readme): An old reader for the Zeno file format, the
+  predecessor to the ZIM file format.
+* [zimwriterdb](./zimwriterdb#readme): Demonstrates how to write a ZIM file
+  based on data in a database.
diff --git a/zimlib/README b/zimlib/README
deleted file mode 100644
index e12d9c0..000
--- a/zimlib/README
+++ /dev/null
@@ -1,6 +0,0 @@
-The zimlib is the standard implementation of the ZIM specification. It
-is a library which implements the read and write method for ZIM
-files. The zimlib is released under the GPLv2 license terms. Use the
-zimlib in your own software - like reader applications - to make them
-ZIM-capable without the need having to dig too much into the ZIM file
-format.
diff --git a/zimlib/README b/zimlib/README
new file mode 12
index 000..42061c0
--- /dev/null
+++ b/zimlib/README
@@ -0,0 +1 @@
+README.md
\ No newline at end of file
diff --git a/zimlib/README.md b/zimlib/README.md
new file mode 100644
index 000..ce35ed9
--- /dev/null
+++ b/zimlib/README.md
@@ -0,0 +1,20 @@
+zimlib
+--
+
+The `zimlib` library is the standard implementation of the ZIM
+specification.  It is a library which implements read and write
+methods for ZIM files.
+
+Use the zimlib in your own software --- for example, reader
+applications --- to make them ZIM-capable without the need having to
+dig too much into the ZIM file format.
+
+To build:
+```
+./autogen.sh
+./configure
+make
+```
+
+The `zimlib` library is released under the GPLv2 license
+terms.
diff --git a/zimreader-java/README b/zimreader-java/README
deleted file mode 100644
index 973e7b2..000
--- a/zimreader-java/README
+++ /dev/null
@@ -1,46 +0,0 @@
-
-ZIMReader in Java
-=
-
-This is a port of the ZIMReader in Java. One 
-of the aims of this project is to enable mobile 
-users developing on Android, J2ME and other 
-platforms to use ZIM files and build offline 
-Wikipedia readers.
-
-I'll soon add a javadoc, in the mean time 
-you can go through the comments that I have 
-provided in the source code. Also, try running 
-the example ZIMTest.java. 
-
-This code was built on Java 1.6 and has not 
-been tested on previous versions. However, I'll
-do that soon on previous ones as well. In the 
-next release, I intend to provide an Ant file.
-
-If you find any bugs, please report them to 
- or visit the 
-IRC channel #openzim on Freenode and ping 
-'gremmachook'.
-
-This library is licensed under the LGPL v3.0 
-license. However, I understand that sometimes 
-licensing can be a problem for you. I would be 
-happy to provide a alternate lesser permissive 
-license if the need be.
-
-Found this library useful? Drop in a mail, I 
-love to hear feedback.
-
-Before this ends, I'd like to thank Lasse Collin 
-, who maintains the 
-Tukaani project, for his port of XZ in Java, 
-without which it wouldn't have been possible for 
-me to write this library.
- 
-
--- Arunesh Mathur
-   
-
-   
-
diff --git a/zimreader-java/README.md b/zimreader-java/README.md
new file mode 100644
index 000..abce858
--- /dev/null
+++ b/zimreader-java/README.md
@@ -0,0 +1,42 @@
+ZIMReader in Java
+=
+
+This is a port of the ZIMReader in Java. One
+of the aims of this project is to enable mobile
+users developing on Android, J2ME and other
+platforms to use ZIM files and build offline
+Wikipedia readers.
+
+I'll soon add a javadoc, in the mean time
+you can go through the comments that I have
+provided in the source code. Also, try running
+the example ZIMTest.java.
+
+This code was built on Java 1.6 and has not
+been tested on previous versions. However, I'll
+do that soon on previous ones as well. In the
+next release, I intend to provide an Ant 

[MediaWiki-commits] [Gerrit] openzim[master]: Readd call to stripTitleInvalidChars.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/295633 )

Change subject: Readd call to stripTitleInvalidChars.
..


Readd call to stripTitleInvalidChars.

Commit 1c969dd removed the call to stripTitleInvalidChars in Article.
This was an error made while rebasing changes. Fix this.

Change-Id: I001bcdce81db27a359703221c929620e543b0a32
---
M zimwriterfs/article.cpp
M zimwriterfs/tools.h
2 files changed, 2 insertions(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/article.cpp b/zimwriterfs/article.cpp
index f743cde..4aeb083 100644
--- a/zimwriterfs/article.cpp
+++ b/zimwriterfs/article.cpp
@@ -69,6 +69,7 @@
  GumboNode* title_text = (GumboNode*)(child->v.element.children.data[0]);
  if (title_text->type == GUMBO_NODE_TEXT) {
title = title_text->v.text.text;
+   stripTitleInvalidChars(title);
  }
}
  }
diff --git a/zimwriterfs/tools.h b/zimwriterfs/tools.h
index ec6b454..8b43da4 100644
--- a/zimwriterfs/tools.h
+++ b/zimwriterfs/tools.h
@@ -40,6 +40,7 @@
 
 void replaceStringInPlaceOnce(std::string& subject, const std::string& search, const std::string& replace);
 void replaceStringInPlace(std::string& subject, const std::string& search, const std::string& replace);
+void stripTitleInvalidChars(std::string & str);
 
 std::string extractRedirectUrlFromHtml(const GumboVector* head_children);
 void getLinks(GumboNode* node, std::map );

-- 
To view, visit https://gerrit.wikimedia.org/r/295633

Gerrit-MessageType: merged
Gerrit-Change-Id: I001bcdce81db27a359703221c929620e543b0a32
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] openzim[master]: Move articleSource's related stuffs in articlesource.(h|cpp).

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/295517 )

Change subject: Move articleSource's related stuffs in articlesource.(h|cpp).
..


Move articleSource's related stuffs in articlesource.(h|cpp).

Change-Id: Iee91484679bf401a693af1ca7e1c7e34f2c741d0
---
M zimwriterfs/Makefile.am
A zimwriterfs/articlesource.cpp
A zimwriterfs/articlesource.h
M zimwriterfs/zimwriterfs.cpp
4 files changed, 305 insertions(+), 229 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/Makefile.am b/zimwriterfs/Makefile.am
index 3383e35..6e46553 100644
--- a/zimwriterfs/Makefile.am
+++ b/zimwriterfs/Makefile.am
@@ -4,4 +4,5 @@
 zimwriterfs_SOURCES= \
 zimwriterfs.cpp \
 tools.cpp \
-article.cpp
+article.cpp \
+articlesource.cpp
diff --git a/zimwriterfs/articlesource.cpp b/zimwriterfs/articlesource.cpp
new file mode 100644
index 000..8b0b34c
--- /dev/null
+++ b/zimwriterfs/articlesource.cpp
@@ -0,0 +1,256 @@
+/*
+ * Copyright 2013-2016 Emmanuel Engelhart 
+ * Copyright 2016 Matthieu Gautier 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU  General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+ * MA 02110-1301, USA.
+ */
+
+#include "articlesource.h"
+#include "article.h"
+#include "tools.h"
+
+#include 
+
+#include 
+#include 
+#include 
+
+bool popFromFilenameQueue(std::string );
+bool isVerbose();
+
+extern std::string welcome;
+extern std::string language;
+extern std::string creator;
+extern std::string publisher;
+extern std::string title;
+extern std::string description;
+extern std::string directoryPath;
+
+std::map counters;
+char *data = NULL;
+unsigned int dataSize = 0;
+
+
+
+ArticleSource::ArticleSource() {
+  /* Prepare metadata */
+  metadataQueue.push("Language");
+  metadataQueue.push("Publisher");
+  metadataQueue.push("Creator");
+  metadataQueue.push("Title");
+  metadataQueue.push("Description");
+  metadataQueue.push("Date");
+  metadataQueue.push("Favicon");
+  metadataQueue.push("Counter");
+}
+
+void ArticleSource::init_redirectsQueue_from_file(const std::string& path){
+std::ifstream in_stream;
+std::string line;
+
+in_stream.open(path.c_str());
+while (std::getline(in_stream, line)) {
+  redirectsQueue.push(line);
+}
+in_stream.close();
+}
+
+std::string ArticleSource::getMainPage() {
+  return welcome;
+}
+
+Article *article = NULL;
+const zim::writer::Article* ArticleSource::getNextArticle() {
+  std::string path;
+
+  if (article != NULL) {
+delete(article);
+  }
+
+  if (!metadataQueue.empty()) {
+path = metadataQueue.front();
+metadataQueue.pop();
+article = new MetadataArticle(path);
+  } else if (!redirectsQueue.empty()) {
+std::string line = redirectsQueue.front();
+redirectsQueue.pop();
+article = new RedirectArticle(line);
+  } else if (popFromFilenameQueue(path)) {
+do {
+  article = new Article(path);
+} while (article && article->isInvalid() && popFromFilenameQueue(path));
+  } else {
+article = NULL;
+  }
+
+  /* Count mimetypes */
+  if (article != NULL && !article->isRedirect()) {
+
+if (isVerbose())
+  std::cout << "Creating entry for " << article->getAid() << std::endl;
+
+std::string mimeType = article->getMimeType();
+if (counters.find(mimeType) == counters.end()) {
+  counters[mimeType] = 1;
+} else {
+  counters[mimeType]++;
+}
+  }
+
+  return article;
+}
+
+zim::Blob ArticleSource::getData(const std::string& aid) {
+
+  if (isVerbose())
+std::cout << "Packing data for " << aid << std::endl;
+
+  if (data != NULL) {
+delete(data);
+data = NULL;
+  }
+
+  if (aid.substr(0, 3) == "/M/") {
+std::string value; 
+
+if ( aid == "/M/Language") {
+  value = language;
+} else if (aid == "/M/Creator") {
+  value = creator;
+} else if (aid == "/M/Publisher") {
+  value = publisher;
+} else if (aid == "/M/Title") {
+  value = title;
+} else if (aid == "/M/Description") {
+  value = description;
+} else if ( aid == "/M/Date") {
+  time_t t = time(0);
+  struct tm * now = localtime( & t );
+  std::stringstream stream;
+  stream << (now->tm_year + 1900) << '-' 
+   

[MediaWiki-commits] [Gerrit] openzim[master]: Use the system's libgumbo.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/295039 )

Change subject: Use the system's libgumbo.
..


Use the system's libgumbo.

Change-Id: Id6b06033bba6c1d47d305b50259e7eaf41c073e2
---
M zimwriterfs/Makefile.am
M zimwriterfs/README.md
M zimwriterfs/configure.ac
D zimwriterfs/gumbo/attribute.c
D zimwriterfs/gumbo/attribute.h
D zimwriterfs/gumbo/char_ref.c
D zimwriterfs/gumbo/char_ref.h
D zimwriterfs/gumbo/error.c
D zimwriterfs/gumbo/error.h
D zimwriterfs/gumbo/gumbo.h
D zimwriterfs/gumbo/insertion_mode.h
D zimwriterfs/gumbo/parser.c
D zimwriterfs/gumbo/parser.h
D zimwriterfs/gumbo/string_buffer.c
D zimwriterfs/gumbo/string_buffer.h
D zimwriterfs/gumbo/string_piece.c
D zimwriterfs/gumbo/string_piece.h
D zimwriterfs/gumbo/tag.c
D zimwriterfs/gumbo/token_type.h
D zimwriterfs/gumbo/tokenizer.c
D zimwriterfs/gumbo/tokenizer.h
D zimwriterfs/gumbo/tokenizer_states.h
D zimwriterfs/gumbo/utf8.c
D zimwriterfs/gumbo/utf8.h
D zimwriterfs/gumbo/util.c
D zimwriterfs/gumbo/util.h
D zimwriterfs/gumbo/vector.c
D zimwriterfs/gumbo/vector.h
28 files changed, 9 insertions(+), 33,053 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved




-- 
To view, visit https://gerrit.wikimedia.org/r/295039

Gerrit-MessageType: merged
Gerrit-Change-Id: Id6b06033bba6c1d47d305b50259e7eaf41c073e2
Gerrit-PatchSet: 2
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: C. Scott Ananian 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] openzim[master]: Use explicit LZMA_STREAM_INIT initializer, instead of memset.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/296548 )

Change subject: Use explicit LZMA_STREAM_INIT initializer, instead of memset.
..


Use explicit LZMA_STREAM_INIT initializer, instead of memset.

This follows the recommendations in lzma.h.

Change-Id: Ib6b392ba3c6a249fa55b8cf8c04cdae1ae407925
---
M zimlib/src/lzmastream.cpp
1 file changed, 2 insertions(+), 3 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/src/lzmastream.cpp b/zimlib/src/lzmastream.cpp
index d880933..bd02bd8 100644
--- a/zimlib/src/lzmastream.cpp
+++ b/zimlib/src/lzmastream.cpp
@@ -58,11 +58,10 @@
   }
 
   LzmaStreamBuf::LzmaStreamBuf(std::streambuf* sink_, uint32_t preset, lzma_check check, unsigned bufsize_)
-: obuffer(bufsize_),
+: stream(LZMA_STREAM_INIT),
+  obuffer(bufsize_),
   sink(sink_)
   {
-std::memset(reinterpret_cast(), 0, sizeof(stream));
-
 checkError(
  ::lzma_easy_encoder(&stream, preset, check));
 

-- 
To view, visit https://gerrit.wikimedia.org/r/296548

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib6b392ba3c6a249fa55b8cf8c04cdae1ae407925
Gerrit-PatchSet: 3
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: C. Scott Ananian 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] openzim[master]: Move zim::writer::ArticleSource::getData() to zim::writer::A...

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/295460 )

Change subject: Move zim::writer::ArticleSource::getData() to 
zim::writer::Article::getData().
..


Move zim::writer::ArticleSource::getData() to zim::writer::Article::getData().

When building a ZIM file incrementally, it makes more sense to provide
the data upfront as soon as you return the Article object from
ArticleSource::getNextArticle.  This is the way that the Category
object already works.
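A minimal, self-contained sketch of the ownership model this change moves to. The classes below are hypothetical stand-ins, not the real zimlib types (`BlobView` plays the role of `zim::Blob`): each article carries its own payload, so a writer can call `getData()` on the object returned by `getNextArticle()` instead of asking the source to look the data up again by article id.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for zim::Blob: a non-owning view of bytes.
struct BlobView {
  const char* data;
  std::size_t size;
};

// Hypothetical article interface: the article itself exposes its data.
class Article {
 public:
  virtual ~Article() = default;
  virtual std::string getAid() const = 0;
  virtual BlobView getData() const = 0;
};

// An article that owns its content in memory.
class MemoryArticle : public Article {
 public:
  MemoryArticle(std::string aid, std::string content)
      : aid_(std::move(aid)), content_(std::move(content)) {}
  std::string getAid() const override { return aid_; }
  BlobView getData() const override { return {content_.data(), content_.size()}; }

 private:
  std::string aid_;
  std::string content_;
};

// Hypothetical source: it only hands out articles; there is no
// getData(aid) lookup on the source any more.
class ArticleSource {
 public:
  // Add all articles before iterating: growing the vector may invalidate
  // pointers already returned by getNextArticle().
  void add(std::string aid, std::string content) {
    articles_.emplace_back(std::move(aid), std::move(content));
  }
  const Article* getNextArticle() {
    return next_ < articles_.size() ? &articles_[next_++] : nullptr;
  }

 private:
  std::vector<MemoryArticle> articles_;
  std::size_t next_ = 0;
};
```

The writer loop then only needs `getNextArticle()` followed by `article->getData()`, which matches how the commit message describes the `Category` object already working.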

Change-Id: I78f2a69ae3931cc43a51cdab360468e13fcc54cb
---
M zimlib/examples/createZimExample.cpp
M zimlib/include/zim/writer/articlesource.h
M zimlib/src/articlesource.cpp
M zimlib/src/zimcreator.cpp
4 files changed, 54 insertions(+), 14 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/examples/createZimExample.cpp b/zimlib/examples/createZimExample.cpp
index 914000c..95b8d09 100644
--- a/zimlib/examples/createZimExample.cpp
+++ b/zimlib/examples/createZimExample.cpp
@@ -40,7 +40,7 @@
 virtual std::string getMimeType() const;
 virtual std::string getRedirectAid() const;
 
-zim::Blob data()
+virtual zim::Blob getData() const
 { return zim::Blob(&_data[0], _data.size()); }
 };
 
@@ -96,7 +96,6 @@
 explicit TestArticleSource(unsigned max = 16);
 
 virtual const zim::writer::Article* getNextArticle();
-virtual zim::Blob getData(const std::string& aid);
 };
 
 TestArticleSource::TestArticleSource(unsigned max)
@@ -119,14 +118,6 @@
   unsigned n = _next++;
 
   return &_articles[n];
-}
-
-zim::Blob TestArticleSource::getData(const std::string& aid)
-{
-  unsigned n;
-  std::istringstream s(aid);
-  s >> n;
-  return _articles[n-1].data();
 }
 
 int main(int argc, char* argv[])
diff --git a/zimlib/include/zim/writer/articlesource.h b/zimlib/include/zim/writer/articlesource.h
index 1fda337..94ee91b 100644
--- a/zimlib/include/zim/writer/articlesource.h
+++ b/zimlib/include/zim/writer/articlesource.h
@@ -20,16 +20,16 @@
 #ifndef ZIM_WRITER_ARTICLESOURCE_H
 #define ZIM_WRITER_ARTICLESOURCE_H
 
+#include 
 #include 
 #include 
 #include 
 
 namespace zim
 {
-  class Blob;
-
   namespace writer
   {
+class ArticleSource;
 class Article
 {
   public:
@@ -45,9 +45,26 @@
 virtual bool shouldCompress() const;
 virtual std::string getRedirectAid() const;
 virtual std::string getParameter() const;
+/* Ideally this method should be pure virtual,
+ * but for compatibility reasons, provide a default implementation
+ * using the old ArticleSource::getData.
+ */
+virtual Blob getData() const;
 
 // returns the next category id, to which the article is assigned to
 virtual std::string getNextCategory();
+
+  //
+  /* For API compatibility.
+   * The default Article::getData calls ArticleSource::getData,
+   * so the article stores its source to let the default, API-compatible
+   * function do its job.
+   * This should be removed once all users have switched to the new API.
+   */
+  private:
+mutable ArticleSource*  __source;
+friend class ZimCreator;
+  //
 };
 
 class Category
@@ -63,7 +80,6 @@
   public:
 virtual void setFilename(const std::string& fname) { }
 virtual const Article* getNextArticle() = 0;
-virtual Blob getData(const std::string& aid) = 0;
 virtual Uuid getUuid();
 virtual std::string getMainPage();
 virtual std::string getLayoutPage();
@@ -73,6 +89,16 @@
 // ids. Using this list, the writer fetches the category data using
 // this method.
 virtual Category* getCategory(const std::string& cid);
+
+/**/
+/* For API compatibility.
+ * The default Article::getData calls ArticleSource::getData,
+ * so keep getData here. Do not make it pure virtual, because new
+ * code should not use it.
+ * This should be removed once all users have switched to the new API.
+ */
+virtual Blob getData(const std::string& aid) { throw "This should not be called"; };
+/**/
 };
 
   }
diff --git a/zimlib/src/articlesource.cpp b/zimlib/src/articlesource.cpp
index 4d1ec91..cc72ae2 100644
--- a/zimlib/src/articlesource.cpp
+++ b/zimlib/src/articlesource.cpp
@@ -17,6 +17,7 @@
  *
  */
 
+#include 
 #include 
 
 namespace zim
@@ -68,6 +69,19 @@
   return std::string();
 }
 
+/**/
+/* For API compatibility.
+ * The default Article::getData call 

[MediaWiki-commits] [Gerrit] openzim[master]: Update build instructions for MacOS X.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/295382 )

Change subject: Update build instructions for MacOS X.
..


Update build instructions for MacOS X.

Change-Id: I0190ada3bf02c42ed34cb6d253516a9c662357e1
---
M zimlib/README.md
M zimwriterfs/README.md
2 files changed, 31 insertions(+), 2 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/README.md b/zimlib/README.md
index ce35ed9..9aef82c 100644
--- a/zimlib/README.md
+++ b/zimlib/README.md
@@ -16,5 +16,24 @@
 make
 ```
 
+OSX compilation
+---
+On MacOSX, you'll need to install some packages from
+[homebrew](http://brew.sh/):
+```
+brew update
+brew install xz libmagic
+```
+You'll also want to add `/usr/local/include` to your search path,
+for example:
+```
+./autogen.sh
+./configure CFLAGS=-I/usr/local/include CXXFLAGS=-I/usr/local/include
+make
+```
+
+License
+---
+
 The `zimlib` library is released under the GPLv2 license
 terms.
diff --git a/zimwriterfs/README.md b/zimwriterfs/README.md
index 8695643..1cbb7dc 100644
--- a/zimwriterfs/README.md
+++ b/zimwriterfs/README.md
@@ -44,9 +44,19 @@
 ```
 
 OSX compilation
-
+---
+OSX builds are similar to Linux, except we use homebrew.  Change to
+`../zimlib` and build zimlib as instructed in the README there.  Then
+return here and:
+```
+brew install gumbo-parser
+./autogen.sh
+./configure CXXFLAGS="-I../zimlib/include -I/usr/local/include" LDFLAGS=-L../zimlib/src/.libs
+make
+```
 
-On MaxOSX, a script helps you build zimwriterfs both statically and dynamically.
+Alternatively, there is a script included here to help you build both
+static and dynamic binaries for `zimwriterfs`.
 You must have a working and set up Kiwix repository (with dependencies ready).
 
 1. Install libmagic with brew (it's important)

-- 
To view, visit https://gerrit.wikimedia.org/r/295382

Gerrit-MessageType: merged
Gerrit-Change-Id: I0190ada3bf02c42ed34cb6d253516a9c662357e1
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: C. Scott Ananian 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] openzim[master]: Fixed compilation on OSX. made a script to compile both stat...

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/216507 )

Change subject: Fixed compilation on OSX. made a script to compile both static 
and shared
..


Fixed compilation on OSX. made a script to compile both static and shared

Change-Id: I9dd7b7039988be13f932bdb3e4b30c5bcf2e6783
---
M zimwriterfs/README
A zimwriterfs/macosx-build.sh
2 files changed, 79 insertions(+), 12 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/README b/zimwriterfs/README
index c1d2b10..d8d6a83 100644
--- a/zimwriterfs/README
+++ b/zimwriterfs/README
@@ -32,16 +32,10 @@
 OSX compilation
 
 
-Until the autotool is configured properly to support flexible options,
-you can still compile and use zimwriterfs with some manual work.
+On MaxOSX, a script helps you build zimwriterfs both statically and dynamically.
+You must have a working and set up Kiwix repository (with dependencies ready).
 
-1. Have a working Kiwix dev environnment (up to make inside src/dependencies)
-2. Install libmagic: brew install libmagic
-3. zimwriterfs compilation
-./autogen.sh
-LDFLAGS="-L$KIWIX_ROOT/src/dependencies/zimlib-1.1/build/lib/ -L/usr/local/Cellar/libmagic/5.14/lib/ -L$KIWIX_ROOT/src/dependencies/xz/build/lib/" CXXFLAGS="-I $KIWIX_ROOT/src/kiwix/src/dependencies/zimlib-1.1/include/ -I/usr/local/Cellar/libmagic/5.14/include/ -I$KIWIX_ROOT/src/kiwix/src/dependencies/xz/src/liblzma/lzma/ -I$KIWIX_ROOT/src/kiwix/src/dependencies/xz/src/liblzma/" ./configure && make
-4. Copy libs if not in LIBRARY_PATH
-ln -s $KIWIX_ROOT/src/dependencies/zimlib-1.1/build/lib/libzim.dylib .
-ln -s $KIWIX_ROOT/src/dependencies/xz/build/lib/liblzma.dylib .
-5. Use it
-./zimwriterfs
\ No newline at end of file
+1. Install libmagic with brew (it's important)
+   - ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
+   - brew install libmagic
+2. KIWIX_ROOT=/Users/xxx/src/kiwix ./macosx-build.sh
diff --git a/zimwriterfs/macosx-build.sh b/zimwriterfs/macosx-build.sh
new file mode 100755
index 000..6f2e3bc
--- /dev/null
+++ b/zimwriterfs/macosx-build.sh
@@ -0,0 +1,73 @@
+#!/bin/bash
+
+if [ "x$KIWIX_ROOT" = "x" ];
+   then
+   echo "You must define environment variable KIWIX_ROOT to the root of Kiwix git repository. Exiting."
+   exit 1
+fi
+
+ZIM_DIR="${KIWIX_ROOT}/src/dependencies/zimlib-1.2"
+LIBZIM_DIR="${ZIM_DIR}/build/lib"
+LZMA_DIR="${KIWIX_ROOT}/src/dependencies/xz/build"
+LIBLZMA_DIR="${LZMA_DIR}/lib"
+MAGIC_DIR="/usr/local"
+LIBMAGIC_DIR="${MAGIC_DIR}/lib"
+STATIC_LDFLAGS="${LIBZIM_DIR}/libzim.a ${LIBLZMA_DIR}/liblzma.a ${LIBMAGIC_DIR}/libmagic.a -lz"
+#LDFLAGS="-L${KIWIX_ROOT}/src/dependencies/zimlib-1.2/build/lib/ -lzim -L${KIWIX_ROOT}/src/dependencies/xz/build/lib/ -llzma -L/usr/local/Cellar/libmagic/5.22_1/lib/ -lmagic -lz"
+
+CC="clang -O3"
+CXX="clang++ -O3"
+CXXFLAGS="-Igumbo -I${ZIM_DIR}/include -I${MAGIC_DIR}/include/ -I${LZMA_DIR}/include"
+CFLAGS="$CXXFLAGS"
+LDFLAGS="-L. -lzim -llzma -lmagic -lz"
+SHARED_OUTPUT="zimwriterfs-shared"
+STATIC_OUTPUT="zimwriterfs-static"
+
+function compile {
+   $CXX $CXXFLAGS -c zimwriterfs.cpp
+   $CC $CFLAGS -c gumbo/utf8.c
+   $CC $CFLAGS -c gumbo/string_buffer.c
+   $CC $CFLAGS -c gumbo/parser.c
+   $CC $CFLAGS -c gumbo/error.c
+   $CC $CFLAGS -c gumbo/string_piece.c
+   $CC $CFLAGS -c gumbo/tag.c
+   $CC $CFLAGS -c gumbo/vector.c
+   $CC $CFLAGS -c gumbo/tokenizer.c
+   $CC $CFLAGS -c gumbo/util.c
+   $CC $CFLAGS -c gumbo/char_ref.c
+   $CC $CFLAGS -c gumbo/attribute.c
+}
+
+echo "Compiling zimwriterfs for OSX as static then shared."
+
+# remove object files
+echo "Clean-up repository (*.o, zimwriterfs-*, *.dylib)"
+rm *.o ${STATIC_OUTPUT} ${SHARED_OUTPUT} *.dylib
+
+# compile source code
+echo "Compile source code file objects"
+compile
+
+# link statically
+echo "Link statically into ${STATIC_OUTPUT}"
+$CXX $CXXFLAGS $STATIC_LDFLAGS -o ${STATIC_OUTPUT} *.o
+
+# copy dylib to current folder
+echo "Copy dylibs into the current folder"
+cp -v ${KIWIX_ROOT}/src/dependencies/zimlib-1.2/build/lib/libzim.dylib .
+cp -v ${KIWIX_ROOT}/src/dependencies/xz/build/lib/liblzma.dylib .
+cp -v ${LIBMAGIC_DIR}/libmagic.dylib .
+chmod 644 ./libmagic.dylib
+
+# link dynamically
+echo "Link dynamically into ${SHARED_OUTPUT}"
+$CXX $CXXFLAGS $LDFLAGS -o zimwriterfs-shared *.o
+
+echo "Fix install name tool on ${SHARED_OUTPUT}"
+install_name_tool -change ${LIBZIM_DIR}/libzim.0.dylib libzim.dylib ${SHARED_OUTPUT}
+install_name_tool -change ${LIBLZMA_DIR}/liblzma.5.dylib liblzma.dylib ${SHARED_OUTPUT}
+install_name_tool -change ${LIBMAGIC_DIR}/libmagic.1.dylib libmagic.dylib ${SHARED_OUTPUT}
+otool -L ${SHARED_OUTPUT}
+
+ls -lh ${STATIC_OUTPUT}
+ls -lh ${SHARED_OUTPUT}

-- 
To view, visit https://gerrit.wikimedia.org/r/216507

[MediaWiki-commits] [Gerrit] openzim[master]: Invalidate internal streambuf buffer when we change it.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/314720 )

Change subject: Invalidate internal streambuf buffer when we change it.
..


Invalidate internal streambuf buffer when we change it.

It may not be necessary, as it has never failed before, but let's be cautious.

Change-Id: Ie54f95e08e2683c43ef7b0fdc70bd9f74fb1fbe9
---
M zimlib/include/zim/fstream.h
M zimlib/src/fstream.cpp
2 files changed, 2 insertions(+), 1 deletion(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/fstream.h b/zimlib/include/zim/fstream.h
index 970920e..4b99814 100644
--- a/zimlib/include/zim/fstream.h
+++ b/zimlib/include/zim/fstream.h
@@ -75,7 +75,7 @@
 
   void seekg(zim::offset_type off);
   void setBufsize(unsigned s)
-  { buffer.resize(s); }
+  { buffer.resize(s); setg(0, 0, 0);}
   zim::offset_type fsize() const;
   time_t getMTime() const;
   };
diff --git a/zimlib/src/fstream.cpp b/zimlib/src/fstream.cpp
index b925fc5..ef91b57 100644
--- a/zimlib/src/fstream.cpp
+++ b/zimlib/src/fstream.cpp
@@ -258,6 +258,7 @@
   throw std::runtime_error(msg.str());
 }
   }
+  setg(0, 0, 0);
 }
 
 void streambuf::seekg(zim::offset_type off)

-- 
To view, visit https://gerrit.wikimedia.org/r/314720

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie54f95e08e2683c43ef7b0fdc70bd9f74fb1fbe9
Gerrit-PatchSet: 2
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 
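The reason for resetting the get area can be shown with a small, hypothetical `std::streambuf` (not zimlib's `zim::streambuf`) that serves an in-memory string through a resizable buffer. After `std::vector::resize` the old storage may be freed, so the pointers behind `eback()`, `gptr()` and `egptr()` could dangle; `setg(nullptr, nullptr, nullptr)` forces the next read through `underflow()`, which refills from the new storage. As an assumption of this sketch only, `setBufsize` also rewinds the source position by the number of buffered but unread bytes, so resizing mid-stream skips nothing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// Hypothetical chunked reader: serves bytes from a string through a
// get area of configurable size.
class ChunkedStringBuf : public std::streambuf {
 public:
  ChunkedStringBuf(std::string src, std::size_t bufsize)
      : src_(std::move(src)), pos_(0), buffer_(bufsize) {
    assert(bufsize > 0);
  }

  void setBufsize(std::size_t s) {
    assert(s > 0);
    // Give back bytes that were buffered but not yet read
    // (egptr() - gptr() is 0 while the get area is null).
    pos_ -= static_cast<std::size_t>(egptr() - gptr());
    buffer_.resize(s);                // may reallocate: old pointers dangle
    setg(nullptr, nullptr, nullptr);  // next read goes through underflow()
  }

 protected:
  int_type underflow() override {
    if (pos_ >= src_.size()) return traits_type::eof();
    const std::size_t n = std::min(buffer_.size(), src_.size() - pos_);
    std::copy_n(src_.data() + pos_, n, buffer_.data());
    pos_ += n;
    setg(buffer_.data(), buffer_.data(), buffer_.data() + n);
    return traits_type::to_int_type(*gptr());
  }

 private:
  std::string src_;
  std::size_t pos_;  // next byte of src_ to copy into buffer_
  std::vector<char> buffer_;
};
```

Without the `setg` call, a read after a reallocating resize would keep dereferencing the freed storage until the old get area was exhausted.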



[MediaWiki-commits] [Gerrit] openzim[master]: Delete all articles, even the last one.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/303776 )

Change subject: Delete all articles, even the last one.
..


Delete all articles, even the last one.

Not so important: only one article *could* be impacted
(the last one, if it is invalid).

Change-Id: Ibd3e7587148eccbdb2c588d4a915a94e343fc003
---
M zimwriterfs/articlesource.cpp
1 file changed, 1 insertion(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/articlesource.cpp b/zimwriterfs/articlesource.cpp
index 0eea3e1..5ce0cc1 100644
--- a/zimwriterfs/articlesource.cpp
+++ b/zimwriterfs/articlesource.cpp
@@ -75,6 +75,7 @@
   article = new FileArticle(path);
 };
 if (article->isInvalid()) {
+  delete article;
   article = NULL;
 }
   }

-- 
To view, visit https://gerrit.wikimedia.org/r/303776

Gerrit-MessageType: merged
Gerrit-Change-Id: Ibd3e7587148eccbdb2c588d4a915a94e343fc003
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 
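This change and the related "Handle the case of last article for filenameQueue is invalid" change both free rejected articles by hand. As a hedged alternative, not what zimwriterfs does, the same selection loop can be written with `std::unique_ptr`, so every invalid article (including an invalid last one) is freed automatically and `nullptr` signals an exhausted queue. `FileArticle`, its validity rule and `nextValidArticle` below are hypothetical stand-ins.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for zimwriterfs's FileArticle; here an empty path
// counts as invalid (the real validity rules differ).
struct FileArticle {
  explicit FileArticle(std::string p) : path(std::move(p)) {}
  bool isInvalid() const { return path.empty(); }
  std::string path;
};

// Pops paths until a valid article is built. A rejected article is
// destroyed when `article` goes out of scope at the end of the iteration,
// so nothing leaks even if the last entry is invalid.
std::unique_ptr<FileArticle> nextValidArticle(std::deque<std::string>& queue) {
  while (!queue.empty()) {
    auto article = std::make_unique<FileArticle>(queue.front());
    queue.pop_front();
    if (!article->isInvalid()) {
      return article;
    }
  }
  return nullptr;  // queue exhausted, like article == NULL in the patch
}
```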



[MediaWiki-commits] [Gerrit] openzim[master]: Add a API to get the offset of a article in the zimfile.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/296914 )

Change subject: Add a API to get the offset of a article in the zimfile.
..


Add a API to get the offset of a article in the zimfile.

To get the offset of an article:

- get the article
- use article.getOffset()

If the offset cannot be determined (the entry is not a regular article,
e.g. a redirect, or its cluster is compressed), 0 is returned.
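The offset arithmetic added in `cluster.cpp` can be checked with a small worked example. This is a sketch, assuming an uncompressed cluster laid out as one compression-type byte, then the offset table, then the blob data, where each raw table entry is relative to the start of the table and the first entry equals the table's size. It also assumes, as the `startOffset + offsets[n]` formula implies, that `ClusterImpl` stores offsets rebased to the first entry. `blobFileOffset` is a hypothetical helper, not part of zimlib.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Absolute file offset of blob n in an uncompressed cluster.
// rawOffsets is the table as stored on disk; it has one more entry
// than there are blobs (the last entry marks the end of the data).
std::uint64_t blobFileOffset(std::uint64_t clusterOffset,
                             const std::vector<std::uint32_t>& rawOffsets,
                             std::size_t n) {
  assert(n + 1 < rawOffsets.size());  // n must name a blob, not the end marker
  const std::uint32_t first = rawOffsets[0];  // == size of the offset table
  // Mirrors "startOffset = offset + sizeof(char)": skip the compression byte.
  const std::uint64_t startOffset = first + sizeof(char);
  const std::uint64_t rebased = rawOffsets[n] - first;  // assumed stored form
  return clusterOffset + startOffset + rebased;
}
```

For a cluster at file offset 1000 with the raw table {12, 15, 20} (two blobs), blob 1 starts at 1000 + (12 + 1) + (15 - 12) = 1016, the same as 1000 + 1 + 15.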

Change-Id: I5b4aced056c16aa8fc62ce4b8048553ae1f96c25
---
M zimlib/include/zim/article.h
M zimlib/include/zim/cluster.h
M zimlib/include/zim/file.h
M zimlib/src/cluster.cpp
M zimlib/src/file.cpp
5 files changed, 32 insertions(+), 7 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/include/zim/article.h b/zimlib/include/zim/article.h
index 5172adb..b950173 100644
--- a/zimlib/include/zim/article.h
+++ b/zimlib/include/zim/article.h
@@ -85,6 +85,15 @@
: const_cast(file).getBlob(dirent.getClusterNumber(), dirent.getBlobNumber());
   }
 
+  offset_type getOffset() const
+  {
+Dirent dirent = getDirent();
+return isRedirect()
+|| isLinktarget()
+|| isDeleted() ? 0
+   : const_cast(file).getOffset(dirent.getClusterNumber(), dirent.getBlobNumber());
+  }
+
   std::string getPage(bool layout = true, unsigned maxRecurse = 10);
   void getPage(std::ostream&, bool layout = true, unsigned maxRecurse = 10);
 
diff --git a/zimlib/include/zim/cluster.h b/zimlib/include/zim/cluster.h
index bd55cb5..96b16f0 100644
--- a/zimlib/include/zim/cluster.h
+++ b/zimlib/include/zim/cluster.h
@@ -42,6 +42,7 @@
   CompressionType compression;
   Offsets offsets;
   Data data;
+  offset_type startOffset;
 
   void read(std::istream& in);
   void write(std::ostream& out) const;
@@ -49,14 +50,15 @@
 public:
   ClusterImpl();
 
-  void setCompression(CompressionType c)  { compression = c; }
-  CompressionType getCompression() const  { return compression; }
-  bool isCompressed() const   { return compression == zimcompZip || compression == zimcompBzip2 || compression == zimcompLzma; }
+  void setCompression(CompressionType c)   { compression = c; }
+  CompressionType getCompression() const   { return compression; }
+  bool isCompressed() const{ return compression == zimcompZip || compression == zimcompBzip2 || compression == zimcompLzma; }

-  size_type getCount() const  { return offsets.size() - 1; }
-  const char* getData(unsigned n) const   { return &data[ offsets[n] ]; }
-  size_type getSize(unsigned n) const { return offsets[n+1] - offsets[n]; }
-  size_type getSize() const   { return offsets.size() * sizeof(size_type) + data.size(); }
+  size_type getCount() const   { return offsets.size() - 1; }
+  const char* getData(unsigned n) const{ return &data[ offsets[n] ]; }
+  size_type getSize(unsigned n) const  { return offsets[n+1] - offsets[n]; }
+  size_type getSize() const{ return offsets.size() * sizeof(size_type) + data.size(); }
+  offset_type getOffset(size_type n) const { return startOffset + offsets[n]; }
   Blob getBlob(size_type n) const;
   void clear();
 
@@ -85,6 +87,7 @@
 
   const char* getBlobPtr(size_type n) const { return impl->getData(n); }
   size_type getBlobSize(size_type n) const  { return impl->getSize(n); }
+  offset_type getBlobOffset(size_type n) const  { return impl->getOffset(n); }
   Blob getBlob(size_type n) const;
 
   size_type count() const   { return impl ? impl->getCount() : 0; }
diff --git a/zimlib/include/zim/file.h b/zimlib/include/zim/file.h
index a6ac75b..0a3a2c3 100644
--- a/zimlib/include/zim/file.h
+++ b/zimlib/include/zim/file.h
@@ -62,6 +62,7 @@
 
   Blob getBlob(size_type clusterIdx, size_type blobIdx)
 { return getCluster(clusterIdx).getBlob(blobIdx); }
+  offset_type getOffset(size_type clusterIdx, size_type blobIdx);
 
   size_type getNamespaceBeginOffset(char ch)
 { return impl->getNamespaceBeginOffset(ch); }
diff --git a/zimlib/src/cluster.cpp b/zimlib/src/cluster.cpp
index 3630042..3b24fee 100644
--- a/zimlib/src/cluster.cpp
+++ b/zimlib/src/cluster.cpp
@@ -79,6 +79,9 @@
 
 size_type n = offset / 4;
 size_type a = offset;
+// offsets are from the start of the cluster *after* the compression byte,
+// but startOffset is the offset from the start of the cluster itself.
+startOffset = offset + sizeof(char);
 
 log_debug1("first offset is " << offset << " n=" << n << " a=" << a);
 
diff --git a/zimlib/src/file.cpp b/zimlib/src/file.cpp
index b6777e6..c5f25a4 100644
--- a/zimlib/src/file.cpp
+++ b/zimlib/src/file.cpp
@@ -201,6 +201,15 @@
   File::const_iterator 

[MediaWiki-commits] [Gerrit] openzim[master]: Fix issue #2 Zimdump crashes on long titles.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/296931 )

Change subject: Fix issue #2 Zimdump crashes on long titles.
..


Fix issue #2 Zimdump crashes on long titles.

Most filesystems limit filenames to 255 bytes.
If a filename is longer than 255 bytes, truncate it and append
a counter to the truncated name to avoid name collisions.

Change-Id: I0475aaa2d1221be46c48c5a52814ca6659cc7940
---
M zimlib/src/tools/zimDump.cpp
1 file changed, 8 insertions(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimlib/src/tools/zimDump.cpp b/zimlib/src/tools/zimDump.cpp
index 65d13b8..7c0149d 100644
--- a/zimlib/src/tools/zimDump.cpp
+++ b/zimlib/src/tools/zimDump.cpp
@@ -394,6 +394,7 @@
 
 void ZimDumper::dumpFiles(const std::string& directory)
 {
+  unsigned int truncatedFiles = 0;
   ::mkdir(directory.c_str(), 0777);
 
   std::set ns;
@@ -406,6 +407,13 @@
 std::string::size_type p;
 while ((p = t.find('/')) != std::string::npos)
   t.replace(p, 1, "%2f");
+if ( t.length() > 255 )
+{
+  std::ostringstream sspostfix, sst;
+  sspostfix << (++truncatedFiles);
+  sst << t.substr(0, 254-sspostfix.tellp()) << "~" << sspostfix.str();
+  t = sst.str();
+}
 std::string f = d + '/' + t;
 std::ofstream out(f.c_str());
 out << it->getData();

-- 
To view, visit https://gerrit.wikimedia.org/r/296931

Gerrit-MessageType: merged
Gerrit-Change-Id: I0475aaa2d1221be46c48c5a52814ca6659cc7940
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 
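The truncation rule can be isolated into a small helper. This is a sketch mirroring the change, not code taken from zimDump: `truncateFilename` is a hypothetical name, and the limit is a parameter rather than hard-coded. The `~N` suffix is counted inside the limit, so a truncated result is exactly `maxLen` bytes; like the original, it cuts at a byte position and could therefore split a multi-byte UTF-8 character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Shortens `name` to at most maxLen bytes. A truncated name gets a "~N"
// suffix (N from a running counter) so that long titles sharing the same
// prefix still map to distinct files.
std::string truncateFilename(const std::string& name, unsigned& counter,
                             std::size_t maxLen = 255) {
  if (name.size() <= maxLen) return name;
  const std::string postfix = "~" + std::to_string(++counter);
  return name.substr(0, maxLen - postfix.size()) + postfix;
}
```

With the default limit this matches the patch's arithmetic: `254 - digits` kept bytes, plus `~`, plus the digits, gives 255 bytes.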



[MediaWiki-commits] [Gerrit] openzim[master]: Handle the case of last article for filenameQueue is invalid.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/296915 )

Change subject: Handle the case of last article for filenameQueue is invalid.
..


Handle the case of last article for filenameQueue is invalid.

Change-Id: I970d7dc6cfbc572ed7c2b2c7e1b4d3a27cd98ce9
---
M zimwriterfs/articlesource.cpp
1 file changed, 8 insertions(+), 3 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/articlesource.cpp b/zimwriterfs/articlesource.cpp
index 06a773d..0eea3e1 100644
--- a/zimwriterfs/articlesource.cpp
+++ b/zimwriterfs/articlesource.cpp
@@ -58,6 +58,7 @@
 
   if (article != NULL) {
 delete article;
+article = NULL;
   }
 
   if (!metadataQueue.empty()) {
@@ -69,12 +70,16 @@
 article = new RedirectArticle(line);
   } else if (filenameQueue.popFromQueue(path)) {
 article = new FileArticle(path);
-while (article && article->isInvalid() && filenameQueue.popFromQueue(path)) {
+while (article->isInvalid() && filenameQueue.popFromQueue(path)) {
   delete article;
   article = new FileArticle(path);
 };
-  } else {
-article = NULL;
+if (article->isInvalid()) {
+  article = NULL;
+}
+  }
+
+  if (article == NULL) {
 if ( !loopOverHandlerStarted )
 {
 currentLoopHandler = articleHandlers.begin();
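
As far as this hunk shows, the merged change treats a trailing invalid article as "no article", so control falls through to the article handlers. Below is a sketch of the same skip-invalid loop. It makes three assumptions. `FileArticle` is a simplified stand-in whose validity test (an empty path) is made up for illustration. A `std::deque` replaces `filenameQueue`. `nextValidArticle` is a hypothetical name. `std::unique_ptr` frees every rejected article. In the diff itself, the last invalid article is set to `NULL` without a `delete`, which appears to leak it.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>

// Hypothetical stand-in for zimwriterfs' FileArticle: here an article is
// "invalid" when its path is empty.
struct FileArticle {
  explicit FileArticle(const std::string& p) : path(p) {}
  bool isInvalid() const { return path.empty(); }
  std::string path;
};

// Pop paths until a valid article is found. Returns nullptr when the queue is
// exhausted without a valid one, so the caller can move on to the next source
// (the `if (article == NULL)` branch in the diff). Each rejected article,
// including the last one, is released when its unique_ptr goes out of scope.
std::unique_ptr<FileArticle> nextValidArticle(std::deque<std::string>& queue)
{
  while (!queue.empty()) {
    std::unique_ptr<FileArticle> article(new FileArticle(queue.front()));
    queue.pop_front();
    if (!article->isInvalid())
      return article;
  }
  return nullptr;
}
```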

-- 
To view, visit https://gerrit.wikimedia.org/r/296915
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I970d7dc6cfbc572ed7c2b2c7e1b4d3a27cd98ce9
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] openzim[master]: libzim.pc: Add "Requires" field

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/330857)

Change subject: libzim.pc: Add "Requires" field
..


libzim.pc: Add "Requires" field

Follows-up b7e5564423b8644c.

Change-Id: I0710aefa8f706067c5e6cb11c70da785c7956608
---
M zimlib/libzim.pc.in
1 file changed, 1 insertion(+), 0 deletions(-)

Approvals:
  Mgautierfr: Looks good to me, but someone else must approve
  Kelson: Verified; Looks good to me, approved

Objections:
  Legoktm: There's a problem with this change, please improve



diff --git a/zimlib/libzim.pc.in b/zimlib/libzim.pc.in
index bbef0d6..58cc155 100644
--- a/zimlib/libzim.pc.in
+++ b/zimlib/libzim.pc.in
@@ -6,6 +6,7 @@
 Name: libzim
 Description: implements read and write methods for ZIM files
 Version: @VERSION@
+Requires: liblzma
 Libs: -L${libdir} -lzim
 Cflags: -I${includedir}
 

-- 
To view, visit https://gerrit.wikimedia.org/r/330857
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I0710aefa8f706067c5e6cb11c70da785c7956608
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Legoktm 
Gerrit-Reviewer: Kelson 
Gerrit-Reviewer: Legoktm 
Gerrit-Reviewer: Mgautierfr 



[MediaWiki-commits] [Gerrit] openzim[master]: Bug Fix for bug no 52324 on openzim. Sorts the MIME Types be...

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/79021)

Change subject: Bug Fix for bug no 52324 on openzim. Sorts the MIME Types before storing them to file.
..


Bug Fix for bug no 52324 on openzim. Sorts the MIME Types before storing them to file.

Since this will be the new standard, old ZIM files will have to be
regenerated in order to obtain a checksum match during zimdiff/zimpatch.

Change-Id: I37a32d4d9adad7402ab7dbb0d9bee143b76d89b7
---
M zimlib/include/zim/dirent.h
M zimlib/src/zimcreator.cpp
2 files changed, 33 insertions(+), 2 deletions(-)



diff --git a/zimlib/include/zim/dirent.h b/zimlib/include/zim/dirent.h
index a5e0511..2ceea35 100644
--- a/zimlib/include/zim/dirent.h
+++ b/zimlib/include/zim/dirent.h
@@ -108,7 +108,10 @@
 clusterNumber = 0;
 blobNumber = 0;
   }
-
+  void setMimeType(uint16_t mime)
+  {
+mimeType=mime;
+  }
   void setLinktarget()
   {
 mimeType = linktargetMimeType;
diff --git a/zimlib/src/zimcreator.cpp b/zimlib/src/zimcreator.cpp
index bbc9420..e767dd3 100644
--- a/zimlib/src/zimcreator.cpp
+++ b/zimlib/src/zimcreator.cpp
@@ -401,12 +401,40 @@
   log_debug("after writing header - pos=" << zimfile.tellp());
 
   // write mime type list
+  std::vector  oldMImeList;
+  std::vector  newMImeList;
+  std::vectormapping;
+  for (RMimeTypes::const_iterator it = rmimeTypes.begin(); it != rmimeTypes.end(); ++it)
+  {
+oldMImeList.push_back(it->second);
+newMImeList.push_back(it->second);
+  }
+  mapping.resize(oldMImeList.size());
+  std::sort(newMImeList.begin(),newMImeList.end());
 
+  for(unsigned int i=0;isecond << '\0';
   }
-
+  */
   out << '\0';
 
   // write url ptr list
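
This change sorts the MIME type list so that its order no longer depends on the order in which types were first encountered. Two ZIM files with the same content then get the same MIME list bytes. Each directory entry stores its MIME type as an index into that list, so the old indices must be remapped. That appears to be the purpose of `mapping` and the new `Dirent::setMimeType`. Below is a sketch of building the sorted list and the old-to-new index table. `sortMimeTypes` is hypothetical, and the sketch assumes a type's old id is its position in the input vector:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: given MIME types in their current order (index = old
// id), produce the sorted list and a table mapping old id -> new id.
void sortMimeTypes(const std::vector<std::string>& oldList,
                   std::vector<std::string>& newList,
                   std::vector<std::uint16_t>& mapping)
{
  newList = oldList;
  std::sort(newList.begin(), newList.end());
  mapping.resize(oldList.size());
  for (std::size_t i = 0; i < oldList.size(); ++i) {
    // MIME type strings are unique, so lower_bound lands on the exact match.
    std::size_t newIndex =
        std::lower_bound(newList.begin(), newList.end(), oldList[i]) -
        newList.begin();
    mapping[i] = static_cast<std::uint16_t>(newIndex);
  }
}
```

Each entry's MIME id would then be rewritten as `mapping[oldId]` before the entries are written. Because the list is sorted once, each lookup costs O(log n).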

-- 
To view, visit https://gerrit.wikimedia.org/r/79021
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I37a32d4d9adad7402ab7dbb0d9bee143b76d89b7
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Kiran mathew koshy 1993 
Gerrit-Reviewer: Kelson 



[MediaWiki-commits] [Gerrit] openzim[master]: Add .gitignore and .gitreview

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/84432)

Change subject: Add .gitignore and .gitreview
..


Add .gitignore and .gitreview

Bug: 54175
Change-Id: Ic900b6a1aa3848aa59b428adef19115a9c55ac61
---
A .gitignore
A .gitreview
2 files changed, 9 insertions(+), 0 deletions(-)

Approvals:
  Reedy: Looks good to me, but someone else must approve
  Kelson: Verified; Looks good to me, approved



diff --git a/.gitignore b/.gitignore
new file mode 100644
index 000..98b092a
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,4 @@
+.svn
+*~
+*.kate-swp
+.*.swp
diff --git a/.gitreview b/.gitreview
new file mode 100644
index 000..2a01b99
--- /dev/null
+++ b/.gitreview
@@ -0,0 +1,5 @@
+[gerrit]
+host=gerrit.wikimedia.org
+port=29418
+project=openzim.git
+defaultbranch=master

-- 
To view, visit https://gerrit.wikimedia.org/r/84432
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ic900b6a1aa3848aa59b428adef19115a9c55ac61
Gerrit-PatchSet: 2
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Reedy 
Gerrit-Reviewer: Kelson 
Gerrit-Reviewer: Reedy 



[MediaWiki-commits] [Gerrit] openzim[refs/meta/config]: Modify access rules

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/344685)

Change subject: Modify access rules
..


Modify access rules

Change-Id: I8ff42c3054bc6b67f0dba0e3ee96d101bde215d9
---
M project.config
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/project.config b/project.config
index ba42c5f..8effac7 100644
--- a/project.config
+++ b/project.config
@@ -1,7 +1,6 @@
 [access]
inheritFrom = All-Projects
 [project]
-   state = active
description = openzim project
 [access "refs/*"]
owner = group openzim
@@ -10,6 +9,7 @@
push = group openzim
pushMerge = group openzim
submit = group openzim
+   forgeAuthor = group openzim
 [receive]
requireChangeId = true
 [submit]

-- 
To view, visit https://gerrit.wikimedia.org/r/344685
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I8ff42c3054bc6b67f0dba0e3ee96d101bde215d9
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: refs/meta/config
Gerrit-Owner: Paladox 
Gerrit-Reviewer: Kelson 
Gerrit-Reviewer: Reedy 



[MediaWiki-commits] [Gerrit] openzim[refs/meta/config]: Modify access rules

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/344794)

Change subject: Modify access rules
..


Modify access rules

Change-Id: Ifb79d6cb21c10144d5cce325136a3f68257b2ec9
---
M project.config
1 file changed, 1 insertion(+), 0 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/project.config b/project.config
index 8effac7..7e7ca50 100644
--- a/project.config
+++ b/project.config
@@ -5,6 +5,7 @@
 [access "refs/*"]
owner = group openzim
create = group openzim
+   forgeCommitter = group openzim
forgeCommitter = group platform-engineering
push = group openzim
pushMerge = group openzim

-- 
To view, visit https://gerrit.wikimedia.org/r/344794
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ifb79d6cb21c10144d5cce325136a3f68257b2ec9
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: refs/meta/config
Gerrit-Owner: Paladox 
Gerrit-Reviewer: Kelson 
Gerrit-Reviewer: Reedy 



[MediaWiki-commits] [Gerrit] openzim[master]: Few clean and better verbose message.

2017-03-25 Thread Kelson (Code Review)
Kelson has submitted this change and it was merged. (https://gerrit.wikimedia.org/r/343898)

Change subject: Few clean and better verbose message.
..


Few clean and better verbose message.

Change-Id: Id4675f70422ecef42198ca33fe82bc1f33866548
---
M zimwriterfs/indexer.cpp
M zimwriterfs/indexer.h
2 files changed, 4 insertions(+), 8 deletions(-)

Approvals:
  Kelson: Verified; Looks good to me, approved



diff --git a/zimwriterfs/indexer.cpp b/zimwriterfs/indexer.cpp
index b83abd4..33989f4 100644
--- a/zimwriterfs/indexer.cpp
+++ b/zimwriterfs/indexer.cpp
@@ -89,6 +89,10 @@
 
   indexedArticleCount += 1;
 
+  if ( (indexedArticleCount % 1000 == 0) && self->getVerboseFlag()) {
+  std::cout << indexedArticleCount << " articled indexed." <flush();
@@ -137,10 +141,6 @@
   bool Indexer::popFromToIndexQueue(indexerToken ) {
 while (this->isToIndexQueueEmpty()) {
   usleep(500);
-  if (this->getVerboseFlag()) {
-   std::cout << "Waiting... ToIndexQueue is empty for now..." << std::endl;
-  }
-
   pthread_testcancel();
 }
 
diff --git a/zimwriterfs/indexer.h b/zimwriterfs/indexer.h
index 3291e36..686d156 100644
--- a/zimwriterfs/indexer.h
+++ b/zimwriterfs/indexer.h
@@ -29,14 +29,10 @@
 #include 
 
 #include 
-/*#include 
-#include 
-#include */
 #include 
 #include 
 #include 
 #include 
-/*#include "reader.h"*/
 
 using namespace std;
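
The new indexer message fires on every 1000th indexed article when verbose output is on. It replaces the removed per-wait message in `popFromToIndexQueue`. Below is a small sketch of that throttled reporting, written as a hypothetical `ProgressReporter` class rather than the indexer's actual members. The interval is a parameter (1000 by default) so the behaviour is easy to check:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Hypothetical progress reporter in the spirit of the indexer.cpp hunk: print
// a line every `interval` items, and only when verbose output is enabled.
class ProgressReporter {
 public:
  ProgressReporter(std::ostream& out, bool verbose, unsigned interval = 1000)
      : out_(out), verbose_(verbose), interval_(interval), count_(0)
  {
    assert(interval_ > 0);  // avoid division by zero in tick()
  }

  // Call once per indexed article.
  void tick()
  {
    ++count_;
    if (verbose_ && count_ % interval_ == 0)
      out_ << count_ << " articles indexed." << std::endl;
  }

  unsigned count() const { return count_; }

 private:
  std::ostream& out_;
  bool verbose_;
  unsigned interval_;
  unsigned count_;
};
```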
 

-- 
To view, visit https://gerrit.wikimedia.org/r/343898
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id4675f70422ecef42198ca33fe82bc1f33866548
Gerrit-PatchSet: 1
Gerrit-Project: openzim
Gerrit-Branch: master
Gerrit-Owner: Mgautierfr 
Gerrit-Reviewer: Kelson 


