Re: Implementing own file system: passing back large int64_t values (>2 GB) in lseek, fstat & friends

2018-03-07 Thread Jukka Jylänki
The same issue of the stat syscall truncating at 32 bits exists in native
code as well, and there is a stat64 syscall specifically for this purpose,
which replaces the overflowing 32-bit fields with 64-bit ones. See
https://www.ibm.com/support/knowledgecenter/en/ssw_i5_54/apis/stat64.htm .
It does look like those 64-bit counterparts of the syscalls are
implemented, so perhaps you can use those instead?


Re: Implementing own file system: passing back large int64_t values (>2 GB) in lseek, fstat & friends

2018-03-05 Thread Soeren Balko
Nah, we read these files progressively. ;-)


Re: Implementing own file system: passing back large int64_t values (>2 GB) in lseek, fstat & friends

2018-03-05 Thread Alon Zakai
It's probably something customizable in musl, since it can run in 32 and 64
bit systems. Probably for emscripten we defined it as 32-bit since memory
is 32-bit anyhow. So if you want to change this, just defining it as 64-bit
and fixing up the syscalls would be enough.

Do you really use files larger than you can fit in memory all at once,
though? :)
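
A minimal sketch of what "fixing up the syscalls" could involve on the JS side, assuming a 64-bit field is stored as two little-endian 32-bit words. The helper names (splitU64, writeU64, readU64) are hypothetical and not part of Emscripten, and a DataView stands in for the heap:

```javascript
// Hypothetical helpers, not Emscripten APIs.
// Valid for non-negative integers up to Number.MAX_SAFE_INTEGER (2^53 - 1).
const TWO_POW_32 = 4294967296;

function splitU64(value) {
  const low = value >>> 0;                            // low 32 bits, unsigned
  const high = Math.floor(value / TWO_POW_32) >>> 0;  // high 32 bits, unsigned
  return { low, high };
}

function writeU64(view, offset, value) {
  const { low, high } = splitU64(value);
  view.setUint32(offset, low, true);       // little-endian low word
  view.setUint32(offset + 4, high, true);  // little-endian high word
}

function readU64(view, offset) {
  const low = view.getUint32(offset, true);
  const high = view.getUint32(offset + 4, true);
  return high * TWO_POW_32 + low;
}

// Usage: a 5 GiB file size survives the round trip.
const view = new DataView(new ArrayBuffer(8));
writeU64(view, 0, 5 * 1024 ** 3);
console.log(readU64(view, 0)); // 5368709120
```

The struct layout would also have to change so that the field actually has 8 bytes available; this only shows the value handling.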


Re: Implementing own file system: passing back large int64_t values (>2 GB) in lseek, fstat & friends

2018-03-04 Thread Sören Balko
Thanks, Alon - very helpful! Having unsigned 32-bit ints would help, but not
necessarily a lot. We process video files that can occasionally be huge,
especially when dealing with poorly compressed video streams such as motion
JPEGs. The fact that off_t is declared as a 32-bit int (signed or not)
strikes me as odd. Is that a musl limitation?
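
To put numbers on the limits mentioned above, here is a small sketch. The function name fitsIn is made up for illustration:

```javascript
// Largest values representable in a 32-bit off_t.
const INT32_MAX = 2 ** 31 - 1;   // 2147483647, just under 2 GiB
const UINT32_MAX = 2 ** 32 - 1;  // 4294967295, just under 4 GiB

// Hypothetical helper: report which 32-bit representations can hold a size.
function fitsIn(sizeInBytes) {
  return {
    signed32: sizeInBytes <= INT32_MAX,
    unsigned32: sizeInBytes <= UINT32_MAX,
  };
}

console.log(fitsIn(3 * 1024 ** 3)); // { signed32: false, unsigned32: true }
console.log(fitsIn(5 * 1024 ** 3)); // { signed32: false, unsigned32: false }
```

Going from signed to unsigned only moves the ceiling from about 2 GiB to about 4 GiB, which is why it helps only a little for very large files.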



Re: Implementing own file system: passing back large int64_t values (>2 GB) in lseek, fstat & friends

2018-03-04 Thread Alon Zakai
About the 31-bit issue: it may be that the asm.js FFI boundary is treated
as signed (an asm.js function returning a 32-bit integer will use | 0). In
that case, the bug might be that when JS calls a function returning an
unsigned value, it should apply >>> 0 to the result. Another possibility is
that the loads/stores of that struct value (makeSetValue etc.) may need to
be marked as unsigned.
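
The difference between the two coercions can be seen in plain JavaScript. This is only an illustration of the operators, not code from Emscripten:

```javascript
// 3 GiB fits in 32 unsigned bits but not in 31 bits.
const size = 3 * 1024 ** 3;         // 3221225472

// "| 0" coerces to a signed 32-bit integer, so the top bit becomes the sign.
const asSigned = size | 0;

// ">>> 0" reinterprets the same 32 bits as an unsigned integer.
const asUnsigned = asSigned >>> 0;

console.log(asSigned);   // -1073741824
console.log(asUnsigned); // 3221225472
```

Any value with bit 31 set comes out negative after | 0, which would match a limit at 31 bits.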


Re: Implementing own file system: passing back large int64_t values (>2 GB) in lseek, fstat & friends

2018-03-04 Thread Alon Zakai
C_STRUCTS is generated from the headers in gen_struct_info.py, basically by
compiling small C programs to see what the offsets are. I believe it does
not look at sizes, though (except for __size__ which is computed for the
entire struct). The numbers there are the offsets, not the sizes. So
st_size is at offset 36.

The stat.h says

off_t st_size;
blksize_t st_blksize;

I'm not sure how to easily find the definition of off_t, but looking at the
offsets, st_size is at 36 and st_blksize, which comes right after it, is at
40, so the size must be 4 bytes. That is not big enough if you need more than
32 bits; off_t would need to be redefined. (Do you really need more than 32 bits, though?)

A separate question is if 32-bit values work - I think you said 31 bits
seems to be the limit. That could be due to treating the value as signed
somewhere ( | 0 will do that). If 32 unsigned bits are enough for you,
finding that bug might be practical.
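
The same reasoning can be applied to every field. This is a rough sketch: the offsets are the ones from the struct_info.compiled.json excerpt in this thread (nested timespec structs are represented by the offset of their first member), and fieldWidths is a made-up helper. The gap between two offsets is only an upper bound on a field's size, since it may include padding:

```javascript
// Byte offsets of struct stat members, as posted in this thread.
const statOffsets = {
  st_dev: 0, __st_dev_padding: 4, __st_ino_truncated: 8, st_mode: 12,
  st_nlink: 16, st_uid: 20, st_gid: 24, st_rdev: 28, __st_rdev_padding: 32,
  st_size: 36, st_blksize: 40, st_blocks: 44,
  st_atim: 48, st_mtim: 56, st_ctim: 64, st_ino: 72,
  __size__: 76,
};

// Hypothetical helper: width of each field = next offset - own offset.
function fieldWidths(offsets) {
  const total = offsets.__size__;
  const fields = Object.entries(offsets)
    .filter(([name]) => name !== "__size__")
    .sort((a, b) => a[1] - b[1]);
  const widths = {};
  fields.forEach(([name, offset], i) => {
    const next = i + 1 < fields.length ? fields[i + 1][1] : total;
    widths[name] = next - offset;
  });
  return widths;
}

const widths = fieldWidths(statOffsets);
console.log(widths.st_size); // 4 (bytes, i.e. 32 bits)
console.log(widths.st_atim); // 8 (tv_sec + tv_nsec)
```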




Re: Implementing own file system: passing back large int64_t values (>2 GB) in lseek, fstat & friends

2018-03-03 Thread Soeren Balko
In struct_info.compiled.json, the "stat" struct is declared like so:

"stat": {
   "st_rdev": 28,
   "st_mtim": {
      "tv_sec": 56,
      "tv_nsec": 60,
      "__size__": 8
   },
   "st_blocks": 44,
   "st_atim": {
      "tv_sec": 48,
      "tv_nsec": 52,
      "__size__": 8
   },
   "st_nlink": 16,
   "__st_ino_truncated": 8,
   "st_ctim": {
      "tv_sec": 64,
      "tv_nsec": 68,
      "__size__": 8
   },
   "st_mode": 12,
   "st_blksize": 40,
   "__st_dev_padding": 4,
   "st_dev": 0,
   "st_size": 36,
   "st_gid": 24,
   "__st_rdev_padding": 32,
   "st_uid": 20,
   "st_ino": 72,
   "__size__": 76
}


I assume the properties are the bit widths of the various fields (?). 
According to this, st_size is 36 bits, which is enough to cater even for 
very large files.

Can you please confirm, Alon?




Re: Implementing own file system: passing back large int64_t values (>2 GB) in lseek, fstat & friends

2018-03-03 Thread Soeren Balko
Thanks, Alon! This does indeed seem to be the issue. In library_syscall.js,
the "st_size" member is treated as an i32 (see below). I do not yet fully
understand how C_STRUCTS is generated. I can see that compiler.js receives
a JSON object STRUCT_INFO that contains the type definitions. Is this
generated from the musl headers?

doStat: function(func, path, buf) {
  try {
    var stat = func(path);
  } catch (e) {
    if (e && e.node && PATH.normalize(path) !== PATH.normalize(FS.getPath(e.node))) {
      // an error occurred while trying to look up the path; we should just report ENOTDIR
      return -ERRNO_CODES.ENOTDIR;
    }
    throw e;
  }
  {{{ makeSetValue('buf', C_STRUCTS.stat.st_dev, 'stat.dev', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.__st_dev_padding, '0', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.__st_ino_truncated, 'stat.ino', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.st_mode, 'stat.mode', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.st_nlink, 'stat.nlink', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.st_uid, 'stat.uid', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.st_gid, 'stat.gid', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.st_rdev, 'stat.rdev', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.__st_rdev_padding, '0', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.st_size, 'stat.size', 'i32') }}};  // <-- st_size is written as an i32
  {{{ makeSetValue('buf', C_STRUCTS.stat.st_blksize, '4096', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.st_blocks, 'stat.blocks', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.st_atim.tv_sec, '(stat.atime.getTime() / 1000)|0', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.st_atim.tv_nsec, '0', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.st_mtim.tv_sec, '(stat.mtime.getTime() / 1000)|0', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.st_mtim.tv_nsec, '0', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.st_ctim.tv_sec, '(stat.ctime.getTime() / 1000)|0', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.st_ctim.tv_nsec, '0', 'i32') }}};
  {{{ makeSetValue('buf', C_STRUCTS.stat.st_ino, 'stat.ino', 'i32') }}};
  return 0;
},
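
To illustrate why an i32 store loses the size, here is a minimal
standalone sketch. It does not use the Emscripten heap: a plain
Int32Array stands in for HEAP32, and storeSizeAsI32 is a hypothetical
helper name, not something from library_syscall.js.

```javascript
// Stand-in for the Emscripten heap view (assumption: behaves like HEAP32).
var heap32 = new Int32Array(2);

// Hypothetical helper: store a size the way an 'i32' write would,
// then read back what C code sees in a signed 32-bit field.
function storeSizeAsI32(size) {
  heap32[0] = size;   // the store wraps modulo 2^32, same as "size | 0"
  return heap32[0];
}

var threeGiB = 3 * 1024 * 1024 * 1024;      // 3221225472 bytes, above 2^31
var truncated = storeSizeAsI32(threeGiB);   // reads back as a negative number
var asUnsigned = truncated >>> 0;           // reinterpret the same 32 bits as unsigned

console.log(truncated);    // -1073741824
console.log(asUnsigned);   // 3221225472
```

Reading the field back as unsigned recovers sizes below 4 GiB only;
anything at or above 2^32 bytes needs a genuinely 64-bit field.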
On Saturday, March 3, 2018 at 6:27:32 AM UTC+10, Alon Zakai wrote:
>
> It's possible the issue is that 64-bit integers are passed as two 32-bit
> integers, in which case the fix is to receive/send those properly (maybe
> only the low bits are received, for example). Building with LIBRARY_DEBUG=1
> or SYSCALL_DEBUG=1 might help here: it prints each call with its
> arguments and return value, so you can find which syscall is relevant.
>
> However, it's also possible the issue is that musl uses a 32-bit signed 
> integer for those syscalls, in which case the syscall interface would need 
> to be changed.

Re: Implementing own file system: passing back large int64_t values (>2 GB) in lseek, fstat & friends

2018-03-02 Thread Alon Zakai
It's possible the issue is that 64-bit integers are passed as two 32-bit
integers, in which case the fix is to receive/send those properly (maybe
only the low bits are received, for example). Building with LIBRARY_DEBUG=1
or SYSCALL_DEBUG=1 might help here: it prints each call with its
arguments and return value, so you can find which syscall is relevant.

However, it's also possible the issue is that musl uses a 32-bit signed
integer for those syscalls, in which case the syscall interface would need
to be changed.
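
As a rough sketch of the "two 32-bit integers" idea, assuming
non-negative values that fit in a JS double (below 2^53): the helper
names splitI64 and joinI64 are made up for illustration and are not
Emscripten APIs.

```javascript
var TWO_POW_32 = 4294967296;

// Hypothetical helper: split a non-negative integer (< 2^53) into
// unsigned low and high 32-bit halves.
function splitI64(value) {
  return {
    low: value >>> 0,                            // value mod 2^32
    high: Math.floor(value / TWO_POW_32) >>> 0   // upper bits
  };
}

// Hypothetical helper: rebuild the number from the two halves.
// Both halves are forced to unsigned first, so signed inputs also work.
function joinI64(low, high) {
  return (high >>> 0) * TWO_POW_32 + (low >>> 0);
}

var fiveGiB = 5 * 1024 * 1024 * 1024;   // 5368709120 bytes
var parts = splitI64(fiveGiB);
console.log(parts.low, parts.high);             // 1073741824 1
console.log(joinI64(parts.low, parts.high));    // 5368709120
```

If only the low half makes it across the boundary, a 5 GiB file would
show up as 1 GiB, which matches the kind of symptom described.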

On Thu, Mar 1, 2018 at 11:24 PM, Sören Balko wrote:

> Hi,
>
> I have run into an issue where our code tries to read very large files
> (>2^31 bytes in size) and is effectively running into what looks like an
> integer overflow issue. What happens is that the int64_t members of stat_t
> ("size") and also the return value of llseek are implicitly down-cast into
> signed ints. Here is what we do to mount our file system (slightly
> simplified for brevity):
>
>     var node = Module.FS.createFile('/', emscriptenPath, null, true, true);
>
>     node.node_ops = {
>         getattr: function(ganode) {
>             return {
>                 dev: 1,
>                 ino: ganode.id,
>                 mode: ganode.mode,
>                 nlink: 1,
>                 uid: 0,
>                 gid: 0,
>                 rdev: ganode.rdev,
>                 size: size,  // <-- this is a file size > 2^31
>                 atime: new Date(ganode.timestamp),
>                 mtime: new Date(ganode.timestamp),
>                 ctime: new Date(ganode.timestamp),
>                 blksize: 4096,
>                 blocks: Math.ceil(size / 4096)
>             };
>         }
>     };
>
>     node.stream_ops = {
>         llseek: function(stream, offset, whence) {
>             switch (whence) {
>                 case 0: // SEEK_SET
>                     stream.position = offset;
>                     break;
>                 case 1: // SEEK_CUR
>                     stream.position += offset;
>                     break;
>                 case 2: // SEEK_END
>                     stream.position = size + offset;
>                     break;
>                 default:
>                     throw new Module.FS.ErrnoError(22); // EINVAL
>             }
>
>             return stream.position; // <-- can be > 2^31
>         },
>         read: function(stream, buffer, heapOffset, numberOfBytes, fileOffset) {
>             // ...
>         }
>     };
>
> I suspect that the issue arises from the fact that int64_t has no native
> counterpart in JS and is, hence, downcast in the interface between the
> asm.js and the file system code. Is there a quick fix to address this
> issue? I tried -s PRECISE_I64_MATH=2, but to no avail. Also, I am not
> entirely sure where exactly the precision is lost. I guess it happens in
> the __syscallXY functions for fstat and lseek (and probably also for the
> arguments passed into read).
>
> One idea I had was to patch the syscalls in a way that I render the
> int64_t values as strings on the heap and pass back the pointer to that
> string inside the stat_t structure and the return value of llseek. These
> strings would then have to be parsed back into int64_t values inside the
> syscalls. Not exactly elegant, but it might work. Or is there a generic
> solution?
>
> Thanks heaps in advance for any suggestions...
>
> Soeren