date:20191213

Re: [osv-dev] how to find missing symbols if it doesn't print the name

2019-12-13 Thread Waldek Kozaczuk

I downloaded the wheel file but I get this error (I am using Ubuntu 19.10):

pip3 install ./tensorflow-1.14.1-cp36-cp36m-linux_x86_64.whl 
tensorflow-1.14.1-cp36-cp36m-linux_x86_64.whl is not a supported wheel on 
this platform.
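
One possible cause, not confirmed anywhere in this thread, is an interpreter mismatch: the cp36 and cp36m tags in the file name mean the wheel targets CPython 3.6, while Ubuntu 19.10 ships Python 3.7 as its default python3. The sketch below shows one way to check this. The helper names wheel_tags and interpreter_tag are made up for illustration, and the comparison assumes a CPython interpreter.

```python
import sys


def wheel_tags(filename):
    """Return the (python, abi, platform) tags from a wheel file name.

    Wheel names follow name-version[-build]-python-abi-platform.whl,
    so the last three dash-separated fields are the compatibility tags.
    """
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def interpreter_tag():
    """Tag of the running interpreter, e.g. 'cp37' (assumes CPython)."""
    return "cp{}{}".format(sys.version_info.major, sys.version_info.minor)


wheel = "tensorflow-1.14.1-cp36-cp36m-linux_x86_64.whl"
python_tag, abi_tag, platform_tag = wheel_tags(wheel)
print("wheel targets:", python_tag, abi_tag, platform_tag)
print("running under:", interpreter_tag())
if python_tag != interpreter_tag():
    print("pip would likely reject this wheel for this interpreter")
```

If the two tags differ, installing with a matching interpreter or rebuilding the wheel for the local Python version are the usual ways out.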

Could you please send us the full recipe that would build the app? Ideally an 
"app" just like any we have under the osv-apps repo, so that I can simply run:
./scripts/build image= 

and then run it.

Thanks,
Waldek

On Friday, December 13, 2019 at 5:52:03 PM UTC-5, zhiting zhu wrote:
>
> No. It's
>    Num:    Value  Size Type    Bind   Vis      Ndx Name
>      1: 0affa7f8     0 SECTION LOCAL  DEFAULT   21
>
> It's not hidden. 
>
> Here's the new link: 
> https://send.firefox.com/download/9a8bf3fa2909635f/#0ZecrR7UJwspr743vNBo6A 
> to the file. 
>
> On Fri, Dec 13, 2019 at 4:44 PM Waldek Kozaczuk  > wrote:
>
>> I am not sure but this issue is similar to what I encountered when 
>> dealing with dotnet:
>>
>> readelf -s libcoreclr.so | grep gCurrentThreadInfo
>> readelf: Warning: local symbol 31 found at index >= .dynsym's sh_info 
>> value of 1
>> 31: 24 TLS LOCAL  HIDDEN19 
>> gCurrentThreadInfo
>>   9799: 24 TLS LOCAL  HIDDEN19 
>> gCurrentThreadInfo
>>
>> When you use readelf does it show it as a hidden, local TLS symbol?
>>
>> That link with compiles expired. Can you upload it somewhere again?
>>
>> Thanks
>>
>> On Friday, December 13, 2019 at 1:16:21 PM UTC-5, zhiting zhu wrote:
>>>
>>> I was reading the print wrong.
>>> I think the problem is this one:
>>>
>>> 0b23ca60  00010010 R_X86_64_DTPMOD64 0aee9f58 .tbss + 0
>>>
>>> The sym is 1 but in this case, .tbss is still in the same shared object 
>>> but in arch_relocate_rela, it's going to the else branch that handles the 
>>> variable is located in DIFFERENT shared object that the caller. 
>>>
>>> On Wed, Dec 11, 2019 at 3:12 PM zhiting zhu  
>>> wrote:
>>>
 I get rid of the __VA_OPT and replace the __VA_ARGS__ with 
 ##__VA_ARGS__. ## seems to be the extension from GNU that does similar 
 things.

 Here's the link of the TensorFlow that I compile myself: 
 https://send.firefox.com/download/68d2a81e2cacdafb/#ui_lMMYAU9UW9e6Fd1VG7w

 Install it locally and build an osv vm that includes all the 
 dependencies:
 "tensorflow grpc google _cffi_backend past future \
   absl wrapt gast astor termcolor numpy unittest libfuturize \
   keras_applications keras_preprocessing tensorflow_estimator \
   tensorboard". 

 Then just run "import tensorflow" in python shell, it should show you 
 the error I'm looking at. 

 On Wed, Dec 11, 2019 at 2:55 PM Waldek Kozaczuk  
 wrote:

> I wonder if that is related to a similar issue as described here - 
> https://groups.google.com/forum/#!topic/osv-dev/k69cHw7qvTg.
>
> I will try to fix and apply this debug patch to master so it makes 
> easier to debug it.
>
> Meanwhile, can you provide a reproducing test? 
>
> On Wednesday, December 11, 2019 at 3:49:07 PM UTC-5, zhiting zhu wrote:
>>
>> After tracing inside arch_relocate_rela, it's failed in case 
>> R_X86_64_DTPMOD64. 
>>
>> I print out the name index of that symbol and it's 0. I use readelf 
>> -a to check the .rela.dyn section. 
>> There's a line that doesn't have Sym. Value and Sym.Name + Addend is 
>> 0. I think I'm failing at that line. 
>>
>>   Offset  Info   Type   Sym. ValueSym. 
>> Name + Addend
>> *0b202f90  0010 R_X86_64_DTPMOD640*
>> 0b23ca60  00010010 R_X86_64_DTPMOD64 0aee9f58 .tbss + 
>> 0
>>
>>
>>
>> On Wed, Dec 11, 2019 at 10:36 AM zhiting zhu  
>> wrote:
>>
>>> Thanks for the debug patch. When I apply it, g++ complains about 
>>> "core/elf.cc:36:118: error: expected ‘)’ before ‘__VA_OPT__’"
>>> I think __VA_OPT__ is only added to c++2a according to 
>>> https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html. Gcc 7.4.0 
>>> on Ubuntu cannot compile that. 
>>>
>>> On Tue, Dec 10, 2019 at 7:20 AM Waldek Kozaczuk  
>>> wrote:
>>>
 You may want to try to apply this patch - 
 https://groups.google.com/forum/#!topic/osv-dev/LbnnY2Kcmak - that 
 should provide many useful debug printouts.

 There is another patch I have sent that fixes the versioned 
 self-lookup problem - 
 https://groups.google.com/forum/#!topic/osv-dev/d56plMGXi6E. I 
 wonder if it fixes your problem (my gut tells me yours is different).

 On Tuesday, December 10, 2019 at 3:07:09 AM UTC-5, Nadav Har'El 
 wrote:
>
> On Mon, Dec 9, 2019 at 10:51 PM zhiting zhu  
> wrote:
>
>> Hey,
>>
>> I'm encountering this when I'm using some

Re: [osv-dev] Lazily allocating thread stacks WIP

2019-12-13 Thread Waldek Kozaczuk



On Friday, December 13, 2019 at 5:26:46 PM UTC-5, Waldek Kozaczuk wrote:
>
> Some other thoughts:
>
> 1. All the increments and decrements of the *stack_page_read_counter* 
> should be symmetrical. Decrementing it in wait_for_interrupts() does not 
> seem to but if I remember correctly when I was trying to get it all to work 
> I had to put it there, maybe because we start with 1. But I am not longer 
> convinced it is necessary. We need to better understand it.
>
> 2. Ideally, we should not even be trying to read the next page on kernel 
> threads or even better on threads with stack mapped with mmap_stack. That 
> can be accomplished by initializing the *stack_page_read_counter *to some 
> higher value than 1 (10 or 11) so it never reaches 0 to trigger page read. 
> This possibly can be set as a 1 thing in the thread_main_c method (see 
> arch/x64/arch-switch.hh). It received thread pointer so we should somehow 
> be able to determine how its stack got constructed. If not we can add a 
> boolean flag to the thread class. 
>
I really meant: "we should not even be trying to read the next page on 
kernel threads or, even better, on threads with a stack NOT mapped with 
mmap_stack".

> 3. In theory, we could get by without *stack_page_read_counter *with just 
> checking sched::preemptable() AND somehow read the flags register (PUSHF?) 
> to directly check if interrupt flag is set or not in the read_stack_page 
> method. But I  have a feeling it would be more expensive.
>
> 4. Using the counter may not be that expensive given we already have use a 
> similar mechanism to implement preemptable().
>
> It would be nice to have Nadav to weigh in on that as it seems it all 
> either is related to the scheduler or affects it in some way.
>
> On Wednesday, December 11, 2019 at 5:07:58 PM UTC-5, Matthew Pabst wrote:
>>
>> Great stuff Waldek! That solved a few of the issues I was running into. 
>> However, tst-vfs.so is still failing occasionally with the error I 
>> mentioned earlier (std::length_error), which apparently is usually the 
>> result of an illegal memory access, probably caused by the changes to 
>> thread::init_stack() or arch::read_next_stack_page(). I was trying to debug 
>> the test case using GDB, but I couldn't figure out how to run tst-vfs.so in 
>> debug mode like the wiki examples. What is the best way to do this?
>>
>
> Are you saying that this test runs just fine without the lazy stack patch?
>
> The best way is to run the test in repeatable mode until it breaks like so:
>
> ./scripts/tests.py --name "/tests/tst-vfs.so" -r 
>
> Though by default when it fails or aborts it powers down automatically. 
> There is no way to prevent it by passing an option to the test.py script, 
> but you can manually edit the scripts/tests/testing.py and remove 
> '"--power-off-on-abort" from the command line constructed by 
> run_command_in_guest method. This will keep qemu and OSv running after 
> crash and let you connect with gdb from another terminal. Best is also 
> limit the number of vcpus to 1 (add '-c 1' to that same line in 
> run_command_in_guest). You can debug using regular release version:
>
> gdb build/release/loader.elf
> connect
> osv syms
> bt
>
> The interesting stack trace may be on other thread so use 'osv thread ' 
> to switch to whatever thread you need to. See 
> https://github.com/cloudius-systems/osv/wiki/Debugging-OSv#debugging-osv-with-gdb
>  for 
> more details.
>
>>
>> Another question I had was if the calls to barrier() are necessary in 
>> your patch. Do they ensure that the running thread gets the correct 
>> stack_page_read_counter?
>>
> As I understand barrier() is a hint to a compiler not to perform certain 
> optimizations that might skew what we want to achieve. I was roughly 
> following what we do around preempt_counter but not sure if that is 
> necessary for stack_page_read_counter. Nadav would probably be the best 
> person to explain that.
>
>>
>> On Wednesday, December 11, 2019 at 9:08:11 AM UTC-6, Waldek Kozaczuk 
>> wrote:
>>
>>> It was pretty late when I was sending this email. So to recap more 
>>> clearly how this patch works:
>>>
>>> 1. The stack_
>>> page_read_counter is a thread-local variable initialized to 1 and 
>>> behaves similarly to the preempt_counter
>>> 2. For every new thread (except the very first one) its thread_main_c 
>>> (see 
>>> https://github.com/cloudius-systems/osv/blob/master/arch/x64/arch-switch.hh#L312-L323)
>>>  
>>> calls irq_enable() and preempt_enable() which more importantly decrements 
>>> and resets the stack_page_read_counter to 0.
>>> 3. From this point on any time preempt_disable() or the lock() method on 
>>> irq_lock_type or irq_save_lock_type is called, the read_next_stack_page is 
>>> called which ALWAYS increments the stack_page_read_counter counter but 
>>> ONLY reads one byte from the page ahead on stack if that counter is 1 (to 
>>> prevent nesting problem). If nested preempt_disable() or lock() on those 
>>> irq

Re: [osv-dev] how to find missing symbols if it doesn't print the name

2019-12-13 Thread zhiting zhu

No. It's
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     1: 0affa7f8     0 SECTION LOCAL  DEFAULT   21

It's not hidden.

Here's the new link:
https://send.firefox.com/download/9a8bf3fa2909635f/#0ZecrR7UJwspr743vNBo6A
to the file.

On Fri, Dec 13, 2019 at 4:44 PM Waldek Kozaczuk 
wrote:

> I am not sure but this issue is similar to what I encountered when dealing
> with dotnet:
>
> readelf -s libcoreclr.so | grep gCurrentThreadInfo
> readelf: Warning: local symbol 31 found at index >= .dynsym's sh_info
> value of 1
> 31: 24 TLS LOCAL  HIDDEN19
> gCurrentThreadInfo
>   9799: 24 TLS LOCAL  HIDDEN19
> gCurrentThreadInfo
>
> When you use readelf does it show it as a hidden, local TLS symbol?
>
> That link with compiles expired. Can you upload it somewhere again?
>
> Thanks
>
> On Friday, December 13, 2019 at 1:16:21 PM UTC-5, zhiting zhu wrote:
>>
>> I was reading the print wrong.
>> I think the problem is this one:
>>
>> 0b23ca60  00010010 R_X86_64_DTPMOD64 0aee9f58 .tbss + 0
>>
>> The sym is 1 but in this case, .tbss is still in the same shared object
>> but in arch_relocate_rela, it's going to the else branch that handles the
>> variable is located in DIFFERENT shared object that the caller.
>>
>> On Wed, Dec 11, 2019 at 3:12 PM zhiting zhu 
>> wrote:
>>
>>> I get rid of the __VA_OPT and replace the __VA_ARGS__ with
>>> ##__VA_ARGS__. ## seems to be the extension from GNU that does similar
>>> things.
>>>
>>> Here's the link of the TensorFlow that I compile myself:
>>> https://send.firefox.com/download/68d2a81e2cacdafb/#ui_lMMYAU9UW9e6Fd1VG7w
>>>
>>> Install it locally and build an osv vm that includes all the
>>> dependencies:
>>> "tensorflow grpc google _cffi_backend past future \
>>>   absl wrapt gast astor termcolor numpy unittest libfuturize \
>>>   keras_applications keras_preprocessing tensorflow_estimator \
>>>   tensorboard".
>>>
>>> Then just run "import tensorflow" in python shell, it should show you
>>> the error I'm looking at.
>>>
>>> On Wed, Dec 11, 2019 at 2:55 PM Waldek Kozaczuk 
>>> wrote:
>>>
 I wonder if that is related to a similar issue as described here -
 https://groups.google.com/forum/#!topic/osv-dev/k69cHw7qvTg.

 I will try to fix and apply this debug patch to master so it makes
 easier to debug it.

 Meanwhile, can you provide a reproducing test?

 On Wednesday, December 11, 2019 at 3:49:07 PM UTC-5, zhiting zhu wrote:
>
> After tracing inside arch_relocate_rela, it's failed in case
> R_X86_64_DTPMOD64.
>
> I print out the name index of that symbol and it's 0. I use readelf -a
> to check the .rela.dyn section.
> There's a line that doesn't have Sym. Value and Sym.Name + Addend is
> 0. I think I'm failing at that line.
>
>   Offset  Info   Type   Sym. ValueSym.
> Name + Addend
> *0b202f90  0010 R_X86_64_DTPMOD640*
> 0b23ca60  00010010 R_X86_64_DTPMOD64 0aee9f58 .tbss + 0
>
>
>
> On Wed, Dec 11, 2019 at 10:36 AM zhiting zhu 
> wrote:
>
>> Thanks for the debug patch. When I apply it, g++ complains about
>> "core/elf.cc:36:118: error: expected ‘)’ before ‘__VA_OPT__’"
>> I think __VA_OPT__ is only added to c++2a according to
>> https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html. Gcc 7.4.0
>> on Ubuntu cannot compile that.
>>
>> On Tue, Dec 10, 2019 at 7:20 AM Waldek Kozaczuk 
>> wrote:
>>
>>> You may want to try to apply this patch -
>>> https://groups.google.com/forum/#!topic/osv-dev/LbnnY2Kcmak - that
>>> should provide many useful debug printouts.
>>>
>>> There is another patch I have sent that fixes the versioned
>>> self-lookup problem -
>>> https://groups.google.com/forum/#!topic/osv-dev/d56plMGXi6E. I
>>> wonder if it fixes your problem (my gut tells me yours is different).
>>>
>>> On Tuesday, December 10, 2019 at 3:07:09 AM UTC-5, Nadav Har'El
>>> wrote:

 On Mon, Dec 9, 2019 at 10:51 PM zhiting zhu 
 wrote:

> Hey,
>
> I'm encountering this when I'm using some tensorflow functions:
>
> /lib/python3.6/tensorflow/python/_pywrap_tensorflow_internal.so:
> failed looking up symbol
>

 This is interesting, because the "failed looking up symbol" message
 is always followed by the name of the symbol looked up:

 core/elf.cc:abort("%s: failed looking up symbol %s\n",
 pathname().c_str(), demangle(name).c_str());

 You can try to add printouts in object::arch_relocate_rela() to try
 to understand which symbol() is being called
 with an empty name.


> [backtrace]
> 0x403442a7 
>

Re: [osv-dev] how to find missing symbols if it doesn't print the name

2019-12-13 Thread Waldek Kozaczuk

I am not sure but this issue is similar to what I encountered when dealing 
with dotnet:

readelf -s libcoreclr.so | grep gCurrentThreadInfo
readelf: Warning: local symbol 31 found at index >= .dynsym's sh_info value of 1
    31:    24 TLS     LOCAL  HIDDEN    19 gCurrentThreadInfo
  9799:    24 TLS     LOCAL  HIDDEN    19 gCurrentThreadInfo

When you use readelf does it show it as a hidden, local TLS symbol?
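
To make that check less error-prone than reading the columns by eye, a row of `readelf -s` output can be parsed mechanically. This is only a sketch: the function names are hypothetical, and the sample rows below use a zero-filled or padded Value column as an assumption, because the values shown in this thread appear to have been cut down by the mail archive.

```python
def parse_symbol_line(line):
    """Parse one row of `readelf -s` output into a dict.

    Expected columns: Num, Value, Size, Type, Bind, Vis, Ndx, Name.
    The name may be absent (section symbols are printed without one).
    """
    fields = line.split()
    if len(fields) < 7:
        raise ValueError("not a symbol table row: %r" % line)
    return {
        "num": int(fields[0].rstrip(":")),
        "value": int(fields[1], 16),
        "size": int(fields[2], 0),  # readelf prints large sizes as 0x...
        "type": fields[3],
        "bind": fields[4],
        "vis": fields[5],
        "ndx": fields[6],
        "name": fields[7] if len(fields) > 7 else "",
    }


def is_hidden_local_tls(sym):
    """True for a symbol that is TLS, LOCAL and HIDDEN at the same time."""
    return (sym["type"], sym["bind"], sym["vis"]) == ("TLS", "LOCAL", "HIDDEN")


# Assumed full-width rows; the Value fields are placeholders.
tls_row = "    31: 0000000000000000    24 TLS     LOCAL  HIDDEN    19 gCurrentThreadInfo"
section_row = "     1: 000000000affa7f8     0 SECTION LOCAL  DEFAULT   21"

for row in (tls_row, section_row):
    sym = parse_symbol_line(row)
    print(sym["num"], sym["type"], sym["vis"], is_hidden_local_tls(sym))
```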

The link to your compiled TensorFlow build has expired. Can you upload it somewhere again?

Thanks

On Friday, December 13, 2019 at 1:16:21 PM UTC-5, zhiting zhu wrote:
>
> I was reading the print wrong.
> I think the problem is this one:
>
> 0b23ca60  00010010 R_X86_64_DTPMOD64 0aee9f58 .tbss + 0
>
> The sym is 1 but in this case, .tbss is still in the same shared object 
> but in arch_relocate_rela, it's going to the else branch that handles the 
> variable is located in DIFFERENT shared object that the caller. 
>
> On Wed, Dec 11, 2019 at 3:12 PM zhiting zhu  > wrote:
>
>> I get rid of the __VA_OPT and replace the __VA_ARGS__ with ##__VA_ARGS__. 
>> ## seems to be the extension from GNU that does similar things.
>>
>> Here's the link of the TensorFlow that I compile myself: 
>> https://send.firefox.com/download/68d2a81e2cacdafb/#ui_lMMYAU9UW9e6Fd1VG7w
>>
>> Install it locally and build an osv vm that includes all the dependencies:
>> "tensorflow grpc google _cffi_backend past future \
>>   absl wrapt gast astor termcolor numpy unittest libfuturize \
>>   keras_applications keras_preprocessing tensorflow_estimator \
>>   tensorboard". 
>>
>> Then just run "import tensorflow" in python shell, it should show you the 
>> error I'm looking at. 
>>
>> On Wed, Dec 11, 2019 at 2:55 PM Waldek Kozaczuk > > wrote:
>>
>>> I wonder if that is related to a similar issue as described here - 
>>> https://groups.google.com/forum/#!topic/osv-dev/k69cHw7qvTg.
>>>
>>> I will try to fix and apply this debug patch to master so it makes 
>>> easier to debug it.
>>>
>>> Meanwhile, can you provide a reproducing test? 
>>>
>>> On Wednesday, December 11, 2019 at 3:49:07 PM UTC-5, zhiting zhu wrote:

 After tracing inside arch_relocate_rela, it's failed in case 
 R_X86_64_DTPMOD64. 

 I print out the name index of that symbol and it's 0. I use readelf -a 
 to check the .rela.dyn section. 
 There's a line that doesn't have Sym. Value and Sym.Name + Addend is 0. 
 I think I'm failing at that line. 

   Offset  Info   Type   Sym. ValueSym. Name 
 + Addend
 *0b202f90  0010 R_X86_64_DTPMOD640*
 0b23ca60  00010010 R_X86_64_DTPMOD64 0aee9f58 .tbss + 0



 On Wed, Dec 11, 2019 at 10:36 AM zhiting zhu  
 wrote:

> Thanks for the debug patch. When I apply it, g++ complains about 
> "core/elf.cc:36:118: error: expected ‘)’ before ‘__VA_OPT__’"
> I think __VA_OPT__ is only added to c++2a according to 
> https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html. Gcc 7.4.0 on 
> Ubuntu cannot compile that. 
>
> On Tue, Dec 10, 2019 at 7:20 AM Waldek Kozaczuk  
> wrote:
>
>> You may want to try to apply this patch - 
>> https://groups.google.com/forum/#!topic/osv-dev/LbnnY2Kcmak - that 
>> should provide many useful debug printouts.
>>
>> There is another patch I have sent that fixes the versioned 
>> self-lookup problem - 
>> https://groups.google.com/forum/#!topic/osv-dev/d56plMGXi6E. I 
>> wonder if it fixes your problem (my gut tells me yours is different).
>>
>> On Tuesday, December 10, 2019 at 3:07:09 AM UTC-5, Nadav Har'El wrote:
>>>
>>> On Mon, Dec 9, 2019 at 10:51 PM zhiting zhu  
>>> wrote:
>>>
 Hey,

 I'm encountering this when I'm using some tensorflow functions:

 /lib/python3.6/tensorflow/python/_pywrap_tensorflow_internal.so: 
 failed looking up symbol 

>>>
>>> This is interesting, because the "failed looking up symbol" message 
>>> is always followed by the name of the symbol looked up:
>>>
>>> core/elf.cc:abort("%s: failed looking up symbol %s\n", 
>>> pathname().c_str(), demangle(name).c_str());
>>>
>>> You can try to add printouts in object::arch_relocate_rela() to try 
>>> to understand which symbol() is being called
>>> with an empty name. 
>>>
>>>
 [backtrace]
 0x403442a7 
 0x40397dce >>> unsigned int, void*, long)+574>

>>>  
>>>
 0x4033eed4 
 0x40341d27 
 0x40345623 
 >>> std::char_traits, std::allocator >, 
 std::vector, 
 std::allocator >, 
 std::allocator>>> std::char_traits, std::allocator > > >, 
 std::vector, 
 std::allocator > >&)+1459>
 0x40345e70

Re: [osv-dev] Lazily allocating thread stacks WIP

2019-12-13 Thread Waldek Kozaczuk

Some other thoughts:

1. All the increments and decrements of the *stack_page_read_counter* 
should be symmetrical. Decrementing it in wait_for_interrupts() does not 
seem to be, but if I remember correctly, when I was trying to get it all to 
work I had to put it there, maybe because we start with 1. But I am no longer 
convinced it is necessary. We need to understand it better.

2. Ideally, we should not even be trying to read the next page on kernel 
threads or even better on threads with stack mapped with mmap_stack. That 
can be accomplished by initializing the *stack_page_read_counter* to some 
value higher than 1 (10 or 11) so it never reaches 0 to trigger a page read. 
This could possibly be set as the first thing in the thread_main_c method (see 
arch/x64/arch-switch.hh). It receives the thread pointer, so we should somehow 
be able to determine how its stack got constructed. If not, we can add a 
boolean flag to the thread class. 

3. In theory, we could get by without *stack_page_read_counter* by just 
checking sched::preemptable() AND somehow reading the flags register (PUSHF?) 
to directly check whether the interrupt flag is set in the read_stack_page 
method. But I have a feeling it would be more expensive.

4. Using the counter may not be that expensive given we already use a 
similar mechanism to implement preemptable().

It would be nice to have Nadav weigh in on that, as it all seems to be 
either related to the scheduler or to affect it in some way.
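
The counter behaviour discussed in the points above can be modelled in a few lines. This is a toy simulation, not the OSv code: the class and method names are invented, the start-up decrement in thread_main_c is modelled as a single enable() call, and the read is assumed to happen exactly when the counter goes from 0 to 1.

```python
class StackPageReadCounter:
    """Toy model of the per-thread counter described in this thread."""

    def __init__(self, initial=1):
        self.value = initial
        self.page_reads = 0  # how many times the next stack page was touched

    def disable(self):
        """Models preempt_disable() or lock() on the irq lock types."""
        self.value += 1
        if self.value == 1:  # outermost call only; nested calls skip the read
            self.page_reads += 1

    def enable(self):
        """Models preempt_enable(), unlock(), or the thread start-up step."""
        self.value -= 1


def run_thread(initial):
    counter = StackPageReadCounter(initial)
    counter.enable()   # thread start-up: interrupts and preemption enabled
    counter.disable()  # outer critical section
    counter.disable()  # nested critical section
    counter.enable()
    counter.enable()
    counter.disable()  # a second, separate critical section
    counter.enable()
    return counter


lazy = run_thread(initial=1)    # thread whose stack is populated lazily
eager = run_thread(initial=11)  # thread that should never read ahead
print("lazy :", lazy.page_reads, "reads, counter ends at", lazy.value)
print("eager:", eager.page_reads, "reads, counter ends at", eager.value)
```

With a start value of 1 the model touches the next page once per outermost critical section. With a start value of 11 the counter never passes through 0, so no read is ever triggered, which is the effect point 2 is after.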

On Wednesday, December 11, 2019 at 5:07:58 PM UTC-5, Matthew Pabst wrote:
>
> Great stuff Waldek! That solved a few of the issues I was running into. 
> However, tst-vfs.so is still failing occasionally with the error I 
> mentioned earlier (std::length_error), which apparently is usually the 
> result of an illegal memory access, probably caused by the changes to 
> thread::init_stack() or arch::read_next_stack_page(). I was trying to debug 
> the test case using GDB, but I couldn't figure out how to run tst-vfs.so in 
> debug mode like the wiki examples. What is the best way to do this?
>

Are you saying that this test runs just fine without the lazy stack patch?

The best way is to run the test in repeatable mode until it breaks like so:

./scripts/tests.py --name "/tests/tst-vfs.so" -r 

By default, though, when the test fails or aborts, the guest powers down 
automatically. There is no way to prevent that by passing an option to the 
test.py script, but you can manually edit scripts/tests/testing.py and remove 
"--power-off-on-abort" from the command line constructed by the 
run_command_in_guest method. This will keep qemu and OSv running after a 
crash and let you connect with gdb from another terminal. It is also best to 
limit the number of vcpus to 1 (add '-c 1' to that same line in 
run_command_in_guest). You can debug using the regular release version:

gdb build/release/loader.elf
connect
osv syms
bt

The interesting stack trace may be on another thread, so use 'osv thread ' 
to switch to whatever thread you need. See 
https://github.com/cloudius-systems/osv/wiki/Debugging-OSv#debugging-osv-with-gdb 
for more details.

>
> Another question I had was if the calls to barrier() are necessary in your 
> patch. Do they ensure that the running thread gets the correct 
> stack_page_read_counter?
>
As I understand it, barrier() is a hint to the compiler not to perform certain 
optimizations that might skew what we want to achieve. I was roughly 
following what we do around preempt_counter, but I am not sure whether that is 
necessary for stack_page_read_counter. Nadav would probably be the best 
person to explain that.

>
> On Wednesday, December 11, 2019 at 9:08:11 AM UTC-6, Waldek Kozaczuk wrote:
>
>> It was pretty late when I was sending this email. So to recap more 
>> clearly how this patch works:
>>
>> 1. The stack_
>> page_read_counter is a thread-local variable initialized to 1 and 
>> behaves similarly to the preempt_counter
>> 2. For every new thread (except the very first one) its thread_main_c 
>> (see 
>> https://github.com/cloudius-systems/osv/blob/master/arch/x64/arch-switch.hh#L312-L323)
>>  
>> calls irq_enable() and preempt_enable() which more importantly decrements 
>> and resets the stack_page_read_counter to 0.
>> 3. From this point on any time preempt_disable() or the lock() method on 
>> irq_lock_type or irq_save_lock_type is called, the read_next_stack_page is 
>> called which ALWAYS increments the stack_page_read_counter counter but 
>> ONLY reads one byte from the page ahead on stack if that counter is 1 (to 
>> prevent nesting problem). If nested preempt_disable() or lock() on those 
>> irq locks is called it will only increment the counter but not read from 
>> stack.
>> 4. Any time preempt_enable() or unlock() on on irq_lock_type or 
>> irq_save_lock_type is called, correspondingly the 
>> stack_page_read_counter is decremented (eventually to 0).
>> 5. Lastly, any time wait_for_interrupt is called (re-enabled interrupts) 
>> we also decrement the

Re: [osv-dev] how to find missing symbols if it doesn't print the name

2019-12-13 Thread zhiting zhu

I was reading the print wrong.
I think the problem is this one:

0b23ca60  00010010 R_X86_64_DTPMOD64 0aee9f58 .tbss + 0

The sym is 1, but in this case .tbss is still in the same shared object. In
arch_relocate_rela, however, it goes to the else branch, which handles the
case where the variable is located in a DIFFERENT shared object than the caller.
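
For reference, the symbol index and relocation type can be read out of the Info column of a relocation entry: in ELF64 the symbol index is the upper 32 bits of r_info and the type is the lower 32 bits. The sketch below assumes the Info values printed in this thread were shortened by the archive, so the full-width constants are a reconstruction, not a copy of the readelf output.

```python
R_X86_64_DTPMOD64 = 16  # relocation type number from the x86-64 psABI


def elf64_r_sym(info):
    """Symbol table index: upper 32 bits of r_info."""
    return info >> 32


def elf64_r_type(info):
    """Relocation type: lower 32 bits of r_info."""
    return info & 0xFFFFFFFF


# Assumed full-width Info values for the two DTPMOD64 entries.
with_symbol = 0x000100000010     # the ".tbss + 0" entry, symbol index 1
without_symbol = 0x000000000010  # the entry with no symbol name

for info in (with_symbol, without_symbol):
    print(hex(info), "sym =", elf64_r_sym(info), "type =", elf64_r_type(info))
```

Under that assumption both entries have type 16, and only the first one refers to a symbol table entry (index 1, the .tbss section symbol).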

On Wed, Dec 11, 2019 at 3:12 PM zhiting zhu  wrote:

> I got rid of the __VA_OPT__ and replaced the __VA_ARGS__ with ##__VA_ARGS__.
> ## seems to be a GNU extension that does something similar.
>
> Here's the link of the TensorFlow that I compile myself:
> https://send.firefox.com/download/68d2a81e2cacdafb/#ui_lMMYAU9UW9e6Fd1VG7w
>
> Install it locally and build an osv vm that includes all the dependencies:
> "tensorflow grpc google _cffi_backend past future \
>   absl wrapt gast astor termcolor numpy unittest libfuturize \
>   keras_applications keras_preprocessing tensorflow_estimator \
>   tensorboard".
>
> Then just run "import tensorflow" in python shell, it should show you the
> error I'm looking at.
>
> On Wed, Dec 11, 2019 at 2:55 PM Waldek Kozaczuk 
> wrote:
>
>> I wonder if that is related to a similar issue as described here -
>> https://groups.google.com/forum/#!topic/osv-dev/k69cHw7qvTg.
>>
>> I will try to fix and apply this debug patch to master so it makes easier
>> to debug it.
>>
>> Meanwhile, can you provide a reproducing test?
>>
>> On Wednesday, December 11, 2019 at 3:49:07 PM UTC-5, zhiting zhu wrote:
>>>
>>> After tracing inside arch_relocate_rela, it's failed in case
>>> R_X86_64_DTPMOD64.
>>>
>>> I print out the name index of that symbol and it's 0. I use readelf -a
>>> to check the .rela.dyn section.
>>> There's a line that doesn't have Sym. Value and Sym.Name + Addend is 0.
>>> I think I'm failing at that line.
>>>
>>>   Offset  Info   Type   Sym. ValueSym. Name
>>> + Addend
>>> *0b202f90  0010 R_X86_64_DTPMOD640*
>>> 0b23ca60  00010010 R_X86_64_DTPMOD64 0aee9f58 .tbss + 0
>>>
>>>
>>>
>>> On Wed, Dec 11, 2019 at 10:36 AM zhiting zhu 
>>> wrote:
>>>
 Thanks for the debug patch. When I apply it, g++ complains about
 "core/elf.cc:36:118: error: expected ‘)’ before ‘__VA_OPT__’"
 I think __VA_OPT__ is only added to c++2a according to
 https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html. Gcc 7.4.0 on
 Ubuntu cannot compile that.

 On Tue, Dec 10, 2019 at 7:20 AM Waldek Kozaczuk 
 wrote:

> You may want to try to apply this patch -
> https://groups.google.com/forum/#!topic/osv-dev/LbnnY2Kcmak - that
> should provide many useful debug printouts.
>
> There is another patch I have sent that fixes the versioned
> self-lookup problem -
> https://groups.google.com/forum/#!topic/osv-dev/d56plMGXi6E. I wonder
> if it fixes your problem (my gut tells me yours is different).
>
> On Tuesday, December 10, 2019 at 3:07:09 AM UTC-5, Nadav Har'El wrote:
>>
>> On Mon, Dec 9, 2019 at 10:51 PM zhiting zhu  wrote:
>>
>>> Hey,
>>>
>>> I'm encountering this when I'm using some tensorflow functions:
>>>
>>> /lib/python3.6/tensorflow/python/_pywrap_tensorflow_internal.so:
>>> failed looking up symbol
>>>
>>
>> This is interesting, because the "failed looking up symbol" message
>> is always followed by the name of the symbol looked up:
>>
>> core/elf.cc:abort("%s: failed looking up symbol %s\n",
>> pathname().c_str(), demangle(name).c_str());
>>
>> You can try to add printouts in object::arch_relocate_rela() to try
>> to understand which symbol() is being called
>> with an empty name.
>>
>>
>>> [backtrace]
>>> 0x403442a7 
>>> 0x40397dce >> unsigned int, void*, long)+574>
>>>
>>
>>
>>> 0x4033eed4 
>>> 0x40341d27 
>>> 0x40345623
>>> >> std::char_traits, std::allocator >,
>>> std::vector,
>>> std::allocator >, std::allocator>> std::char_traits, std::allocator > > >,
>>> std::vector,
>>> std::allocator > >&)+1459>
>>> 0x40345e70
>>> >> std::char_traits, std::allocator >,
>>> std::vector,
>>> std::allocator >, std::allocator>> std::char_traits, std::allocator > > >, bool)+336>
>>> 0x40465fd8 
>>> 0x10937228 <_PyImport_FindSharedFuncptr+376>
>>> 0x745f70617277796f 
>>>
>>> It seems the name is not printed out.
>>>
>>> If I'm using the check-libcfunc-avail.sh to check the
>>> _pywrap_tensorflow_internal.so, I get the following output:
>>>
>>> pthread_mutex_consistent not found
>>> pthread_mutexattr_setrobust not found
>>> fmaf not found
>>> fma not found
>>> mallinfo not found
>>>
>>
>> All of these we should eventually add, these are real Linux glibc
>> functions...
>> Feel free to open issues

Re: [osv-dev] [PATCH] Signed-off-by: BassMatt

2019-12-13 Thread Pekka Enberg

On Thu, Dec 12, 2019 at 10:02 PM BassMatt  wrote:
>
> Main scripts in scripts/ folder updated to use Python3
>
> I went through the scripts detailed in scripts/README and updated them to use 
> Python3. I used the Python "Future" module to provide suggestions, then 
> manually went through and applied the changes. The "Future" module gives 
> suggestions to allow for cross-compatibility between Python2/3, but since it 
> was expressed that only Python3 needed to be supported, I left all that out.
>
> The issue is detailed here:
> https://github.com/cloudius-systems/osv/issues/1056

Looks good, thanks!

Acked-by: Pekka Enberg 

I assume this is you:

https://github.com/BassMatt

Please fix the sign-off to be:

Signed-off-by: Real Name 

as per contributions guide:

https://github.com/cloudius-systems/osv/blob/master/CONTRIBUTING

- Pekka

