Keywords: runtime error, package check, 32 bit architectures, large files

This is the second of two reports on CRAN check problems that I found in my
package and that affect only some particular architectures (in this case,
x86_32).

Problem description:

 When compiling a package with C++ source code using Rcpp on a Linux system
 (kernel 5.19.16-100, distribution Fedora 35), the generated package passed
 the R CMD check --as-cran test, giving no compilation warnings and no
 execution errors.

 Nevertheless, the runtime tests on the CRAN server produced an error
 exclusively for the x86_32 architecture (found mostly in old PCs).

 Let's suppose you have stored a variable of unsigned long long type at the end
 of a binary file. You think you can read it with:

 unsigned long long endofbindata;
 std::string fname="yourfilename";

 std::ifstream f(fname.c_str());
 f.seekg(-sizeof(unsigned long long),std::ios::end);
 f.read((char *)&endofbindata,sizeof(unsigned long long));

 and indeed you can, but ONLY on 64-bit architectures. The function seekg does
 not work as expected on 32-bit architectures, since the first parameter
 (offset) is of type streamoff, which does not seem to be defined equally by
 g++ for 32- and 64-bit architectures. On 32-bit systems this provokes
 over/underflow and absurd results at execution time EVEN IF THE FILE is
 smaller than 2^32 bytes. No error or warning is raised at compile time, even
 on a 32-bit computer, so you don't notice the problem.
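
 One possible explanation, which the report above does not confirm, is the
 unsigned arithmetic in the offset expression. sizeof yields a std::size_t,
 which is unsigned, so -sizeof(unsigned long long) is a very large positive
 value rather than -8. On a 64-bit build the conversion to std::streamoff
 usually wraps back to -8. On a 32-bit build, assuming size_t is 32 bits and
 streamoff is wider, the value would stay at 4294967288 and the seek would
 point far beyond the end of the file. The sketch below converts to the signed
 offset type before negating. ReadTrailingValue is a hypothetical helper name
 used only for illustration.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reads the unsigned long long stored in the last bytes
// of a binary file. Returns false if the file cannot be opened, is too short,
// or the read fails.
bool ReadTrailingValue(const std::string &fname, unsigned long long &value)
{
    std::ifstream f(fname.c_str(), std::ios::binary);
    if (!f.is_open())
        return false;

    // Convert to the signed offset type BEFORE negating. Negating sizeof(...)
    // directly is done in size_t, which is unsigned.
    const std::streamoff back =
        static_cast<std::streamoff>(sizeof(unsigned long long));
    f.seekg(-back, std::ios::end);
    if (!f)
        return false;

    f.read(reinterpret_cast<char *>(&value), sizeof(value));
    return static_cast<bool>(f);
}
```

 Checking the stream state after seekg and read makes a bad offset show up as
 a failure instead of as an absurd value.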

My solution has been:

 Write a more portable function that gets the size of a file using the stat
 system call, like:

 #include <string>
 #include <sys/stat.h>
 #include <Rcpp.h>

 // Returns the size in bytes of file fname, or stops with an R error.
 unsigned long long GetFileSize(const std::string &fname)
 {
        struct stat stat_buf;
        int rc = stat(fname.c_str(), &stat_buf);
        if (rc != 0)
        {
         std::string err="Cannot obtain information (with stat system call) of file "+fname+"\n";
         err += "This is probably because you are running this in a 32-bit architecture and the file is bigger than 2 GB.\n";
         err += "Unfortunately, we have not found yet a solution for that and, if you need to manage so big files,\n";
         err += "probably you should consider using a 64-bit architecture.\n";
         Rcpp::stop(err);
         // NOTE: maybe defining _FILE_OFFSET_BITS=64 (which makes glibc set
         // __USE_FILE_OFFSET64) could solve this, but it might provoke other problems...
        }
        // Rcpp::stop throws, so this line is reached only when stat succeeded.
        return ((unsigned long long)stat_buf.st_size);
 }

 According to the stat manual, stat returns this error:

   EOVERFLOW
        pathname or fd refers to a file whose size, inode number, or number
        of blocks cannot be represented in, respectively, the types off_t,
        ino_t, or blkcnt_t. This error can occur when, for example, an
        application compiled on a 32-bit platform without
        -D_FILE_OFFSET_BITS=64 calls stat() on a file whose size exceeds
        (1<<31)-1 bytes.
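
 Since stat can also fail for unrelated reasons (a missing file, for
 instance), it may help to look at errno before blaming the architecture. The
 following is only a sketch of that idea. DescribeStatFailure is a
 hypothetical helper name, and the message texts are illustrative.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/stat.h>

// Hypothetical helper: returns an empty string if stat() succeeds on fname,
// otherwise a message that separates EOVERFLOW from other failures.
std::string DescribeStatFailure(const std::string &fname)
{
    struct stat stat_buf;
    if (stat(fname.c_str(), &stat_buf) == 0)
        return "";

    // Save errno at once, before any other call can overwrite it.
    const int saved_errno = errno;
    if (saved_errno == EOVERFLOW)
        return "File " + fname + " is too large for the off_t of this build.";

    return "Cannot stat " + fname + ": " + std::strerror(saved_errno);
}
```

 With this, the 32-bit warning would be shown only when the size really cannot
 be represented, and a wrong file name gets its own message.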

 Once that is done, use the returned number (if the call succeeded) to seek to
 the end of the file (or to the end minus an offset) with a f.seekg call.
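
 A minimal sketch of that last step follows. To keep it self-contained it
 calls stat directly instead of GetFileSize, so it does not need Rcpp.
 ReadTrailerWithStat is a hypothetical name, and returning false on any
 failure is an assumption made for the example.

```cpp
#include <fstream>
#include <string>
#include <sys/stat.h>

// Hypothetical helper: gets the file size with stat() and seeks from the
// beginning of the file to its last sizeof(value) bytes.
bool ReadTrailerWithStat(const std::string &fname, unsigned long long &value)
{
    struct stat stat_buf;
    if (stat(fname.c_str(), &stat_buf) != 0)
        return false;                 // includes the EOVERFLOW case

    const unsigned long long size =
        static_cast<unsigned long long>(stat_buf.st_size);
    if (size < sizeof(value))
        return false;                 // file too short to hold the value

    std::ifstream f(fname.c_str(), std::ios::binary);
    if (!f.is_open())
        return false;

    // The absolute position is computed in unsigned 64-bit arithmetic and
    // only then converted to the signed offset type of the stream.
    const std::streamoff pos =
        static_cast<std::streamoff>(size - sizeof(value));
    f.seekg(pos, std::ios::beg);
    f.read(reinterpret_cast<char *>(&value), sizeof(value));
    return static_cast<bool>(f);
}
```

 Seeking from std::ios::beg with a non-negative position avoids negative
 offsets altogether.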

 As you see, I have not found a real solution, but at least this warns the
 user about the problem of using large files on 32-bit architectures.

 This should now be infrequent in practice, since fewer 32-bit computers
 remain in use every day, but since CRAN still checks on them I have preferred
 to document it, just in case anyone else may benefit from the information.

    Juan

--
________________________________________________________________
Juan Domingo Esteve
Dept. of Informatics, School of Engineering
University of Valencia
Avda. de la Universidad, s/n.
        46100-Burjasot (Valencia)
           SPAIN

Telephone:      +34-963543572
Fax:            +34-963543550
email:  juan.domi...@uv.es
________________________________________________________________