Re: [v8-users] gc and threading

2019-05-20 Thread 'Peter Marshall' via v8-users
If you block the main thread at a safe time (e.g. not during GC), then you 
can probably access heap objects from your other threads without handles, as 
long as you do your own synchronization between the background threads.
I'm not sure how the GC's concurrent marking threads will react to that, 
though.

On Friday, May 17, 2019 at 5:43:08 PM UTC+2, Ledion Bitincka wrote:
>
> > You could read from the heap on a concurrent thread but there is no 
> > synchronization when writing to the heap from the main thread
>
> My current thinking is something like this, i.e. I'd be blocking the 
> main thread until the serialization is done.
> In the main thread:
>  handles[] = getHandlesToSerialize();
>  results[] = [];
>  threads[] = spawnThreads(N, handles, results);
>
>  join(threads);
>
>
> One problem though is that I can't use the Handles directly in other 
> threads, as they'd require access to the Isolate and thus would require 
> synchronization to enter/lock the Isolate. So the only option, as you 
> alluded to, is to copy off-heap and then serialize. I need to test whether 
> that would result in any perf benefits, given the high setup cost of 
> going multi-threaded. Maybe I'll find a way to keep pointers to the 
> underlying data rather than copy it; I will report here what I find. 
>
>
>
>
> On Thursday, May 16, 2019 at 6:18:06 AM UTC-7, Peter Marshall wrote:
>>
>> On Wednesday, May 15, 2019 at 9:09:18 PM UTC+2, Ledion Bitincka wrote:
>>>
>>> Thanks! 
>>>
>>> > While I understand that this is tempting, please be aware that only 
>>> > one thread may be active in one Isolate at any given time.
>>>
>>> I was hoping that there could be multiple threads that had "read-only" 
>>> access to an Isolate's heap. I was looking through how ValueSerializer 
>>> works, and now I understand there's no such thing as "read-only" given 
>>> the getter/setter functions. However, I'm wondering whether this would 
>>> be possible for simple objects (key/value), bailing out for more complex 
>>> ones. Any other suggestions for how that work could be parallelized? Is 
>>> it even possible? (This is custom serialization, not JSON.)
>>>
>>
>> You could read from the heap on a concurrent thread but there is no 
>> synchronization when writing to the heap from the main thread, so there's 
>> no guarantee that what you are reading is not being concurrently written 
>> e.g. when it is being allocated, when it is modified by user JS code or 
>> when the GC moves it.
>>
>> If the serialization work itself was particularly expensive (e.g. the 
>> format is very complicated or the data requires a lot of processing) then 
>> you could copy the relevant parts of the heap objects off-heap and then 
>> serialize from concurrent threads.
>>
>

-- 
-- 
v8-users mailing list
v8-users@googlegroups.com
http://groups.google.com/group/v8-users
--- 
You received this message because you are subscribed to the Google Groups 
"v8-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to v8-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/v8-users/2e9675e8-1f1e-4c8d-9cac-f990342b01ea%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [v8-users] gc and threading

2019-05-17 Thread Ledion Bitincka
> You could read from the heap on a concurrent thread but there is no 
> synchronization when writing to the heap from the main thread

My current thinking is something like this, i.e. I'd be blocking the main 
thread until the serialization is done.
In the main thread:
 handles[] = getHandlesToSerialize();
 results[] = [];
 threads[] = spawnThreads(N, handles, results);

 join(threads);


One problem though is that I can't use the Handles directly in other 
threads, as they'd require access to the Isolate and thus would require 
synchronization to enter/lock the Isolate. So the only option, as you 
alluded to, is to copy off-heap and then serialize. I need to test whether 
that would result in any perf benefits, given the high setup cost of 
going multi-threaded. Maybe I'll find a way to keep pointers to the 
underlying data rather than copy it; I will report here what I find. 
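
The copy-then-serialize plan above could be sketched roughly as follows. This is only an illustration under assumptions: PlainObject, SerializeOne and SerializeParallel are made-up names, the "key=value;" output stands in for the custom format, and the step that copies fields out of the V8 heap on the main thread is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical off-heap snapshot of a "simple" key/value object. The main
// thread would fill this in while it still holds the Isolate; that copying
// step is not shown here.
struct PlainObject {
  std::vector<std::pair<std::string, std::string>> fields;
};

// Stand-in for the custom format: "key=value;" pairs. Touches only off-heap
// data, so it is safe to call from any thread.
std::string SerializeOne(const PlainObject& obj) {
  std::string out;
  for (const auto& kv : obj.fields) {
    out += kv.first;
    out += '=';
    out += kv.second;
    out += ';';
  }
  return out;
}

// Splits `objects` into contiguous batches, serializes each batch on its own
// thread, and blocks the caller until every worker has been joined.
std::vector<std::string> SerializeParallel(
    const std::vector<PlainObject>& objects, std::size_t num_threads) {
  assert(num_threads > 0);
  std::vector<std::string> results(objects.size());
  std::vector<std::thread> threads;
  const std::size_t batch = (objects.size() + num_threads - 1) / num_threads;
  for (std::size_t t = 0; t < num_threads; ++t) {
    const std::size_t begin = std::min(t * batch, objects.size());
    const std::size_t end = std::min(begin + batch, objects.size());
    if (begin == end) break;  // nothing left to hand out
    // Each worker writes a disjoint range of `results`, so no lock is needed.
    threads.emplace_back([&objects, &results, begin, end] {
      for (std::size_t i = begin; i < end; ++i) {
        results[i] = SerializeOne(objects[i]);
      }
    });
  }
  for (auto& th : threads) th.join();
  return results;
}
```

Calling SerializeParallel(objects, 4) on 1K snapshots would give four batches of 250. Whether this beats a single-threaded pass depends on how expensive the copy step and the thread startup are compared with the serialization itself.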




On Thursday, May 16, 2019 at 6:18:06 AM UTC-7, Peter Marshall wrote:
>
> On Wednesday, May 15, 2019 at 9:09:18 PM UTC+2, Ledion Bitincka wrote:
>>
>> Thanks! 
>>
>> > While I understand that this is tempting, please be aware that only one 
>> > thread may be active in one Isolate at any given time.
>>
>> I was hoping that there could be multiple threads that had "read-only" 
>> access to an Isolate's heap. I was looking through how ValueSerializer 
>> works, and now I understand there's no such thing as "read-only" given 
>> the getter/setter functions. However, I'm wondering whether this would be 
>> possible for simple objects (key/value), bailing out for more complex 
>> ones. Any other suggestions for how that work could be parallelized? Is 
>> it even possible? (This is custom serialization, not JSON.)
>>
>
> You could read from the heap on a concurrent thread but there is no 
> synchronization when writing to the heap from the main thread, so there's 
> no guarantee that what you are reading is not being concurrently written 
> e.g. when it is being allocated, when it is modified by user JS code or 
> when the GC moves it.
>
> If the serialization work itself was particularly expensive (e.g. the 
> format is very complicated or the data requires a lot of processing) then 
> you could copy the relevant parts of the heap objects off-heap and then 
> serialize from concurrent threads.
>



Re: [v8-users] gc and threading

2019-05-16 Thread 'Peter Marshall' via v8-users
On Wednesday, May 15, 2019 at 9:09:18 PM UTC+2, Ledion Bitincka wrote:
>
> Thanks! 
>
> > While I understand that this is tempting, please be aware that only one 
> > thread may be active in one Isolate at any given time.
>
> I was hoping that there could be multiple threads that had "read-only" 
> access to an Isolate's heap. I was looking through how ValueSerializer 
> works, and now I understand there's no such thing as "read-only" given 
> the getter/setter functions. However, I'm wondering whether this would be 
> possible for simple objects (key/value), bailing out for more complex 
> ones. Any other suggestions for how that work could be parallelized? Is 
> it even possible? (This is custom serialization, not JSON.)
>

You could read from the heap on a concurrent thread but there is no 
synchronization when writing to the heap from the main thread, so there's 
no guarantee that what you are reading is not being concurrently written 
e.g. when it is being allocated, when it is modified by user JS code or 
when the GC moves it.

If the serialization work itself was particularly expensive (e.g. the 
format is very complicated or the data requires a lot of processing) then 
you could copy the relevant parts of the heap objects off-heap and then 
serialize from concurrent threads.



Re: [v8-users] gc and threading

2019-05-15 Thread Ledion Bitincka
Thanks! 

> While I understand that this is tempting, please be aware that only one 
> thread may be active in one Isolate at any given time.

I was hoping that there could be multiple threads that had "read-only" 
access to an Isolate's heap. I was looking through how ValueSerializer 
works, and now I understand there's no such thing as "read-only" given 
the getter/setter functions. However, I'm wondering whether this would be 
possible for simple objects (key/value), bailing out for more complex 
ones. Any other suggestions for how that work could be parallelized? Is 
it even possible? (This is custom serialization, not JSON.)


On Wednesday, May 15, 2019 at 11:34:11 AM UTC-7, Jakob Kummerow wrote:
>
>> I'm trying to improve some serialization code in NodeJS and was wondering 
>> if this could be achieved by going down to the native code and using 
>> multiple threads to parallelize the serialization of multiple objects. For 
>> example, imagine we need to serialize 1K objects to JSON/OtherFormat - 
>> split into 4 x 250 batches and use 4 (non-main) threads to serialize the 
>> objects, then gather the results and return.  
>>
>
> While I understand that this is tempting, please be aware that only one 
> thread may be active in one Isolate at any given time. Lifting this 
> restriction would require a huge effort, and/or would only apply in 
> limited circumstances (e.g., when an object has a custom toJSON function, 
> you would have to bail out somehow, because such a function could cause 
> arbitrary modifications to any other object).
>  
>
>> However, I'm having a bit of a hard time understanding a few concepts:
>>
>> 1. When does the GC kick in for an Isolate? My understanding is that 
>> *any* V8 code executed in an Isolate can trigger GC for that Isolate - 
>> is this correct? 
>>
>
> Any *allocation* on the managed heap can trigger GC. Of course, when you 
> don't know what code you're calling (e.g., a user-provided function), then 
> you have to assume that it might allocate. We use DisallowHeapAllocation 
> scopes to guard sections where we are sure that no allocation (and hence no 
> GC) can happen.
>
>> 2. When GC triggers, what happens to pointers that were extracted from 
>> a Handle *while* the Handle is still in scope?  
>>
>> HandleScope scope(isolate);
>> Handle<Object> foo = ...;
>> Object* fooPtr = *foo;
>> someCallThatTriggersGC();
>> // is fooPtr still valid???
>>
>  
> Raw pointers will become stale, no matter where they came from. So in this 
> example, fooPtr will be invalid; foo will still be valid (that's the 
> point of having Handles). 
>
>



Re: [v8-users] gc and threading

2019-05-15 Thread Jakob Kummerow
>
> I'm trying to improve some serialization code in NodeJS and was wondering
> if this could be achieved by going down to the native code and using
> multiple threads to parallelize the serialization of multiple objects. For
> example, imagine we need to serialize 1K objects to JSON/OtherFormat -
> split into 4 x 250 batches and use 4 (non-main) threads to serialize the
> objects, then gather the results and return.
>

While I understand that this is tempting, please be aware that only one
thread may be active in one Isolate at any given time. Lifting this
restriction would require a huge effort, and/or would only apply in limited
circumstances (e.g., when an object has a custom toJSON function, you would
have to bail out somehow, because such a function could cause arbitrary
modifications to any other object).


> However, I'm having a bit of a hard time understanding a few concepts:
>
> 1. When does the GC kick in for an Isolate? My understanding is that
> *any* V8 code executed in an Isolate can trigger GC for that Isolate -
> is this correct?
>

Any *allocation* on the managed heap can trigger GC. Of course, when you
don't know what code you're calling (e.g., a user-provided function), then
you have to assume that it might allocate. We use DisallowHeapAllocation
scopes to guard sections where we are sure that no allocation (and hence no
GC) can happen.
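
To illustrate the idea of a scope that rules out allocation, here is a toy sketch. It is not V8's DisallowHeapAllocation: ToyHeap and NoAllocationScope are hypothetical names, and throwing on a violation is an assumption made for the demo (as far as I know, the real scope is an internal check rather than something embedders call).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy stand-in for a managed heap; none of these names are V8 APIs.
class ToyHeap {
 public:
  // Every allocation is a potential GC point, so it is refused while a
  // NoAllocationScope is alive.
  std::size_t Allocate(int value) {
    if (no_alloc_depth_ > 0) {
      throw std::logic_error("allocation inside a no-allocation scope");
    }
    cells_.push_back(value);
    return cells_.size() - 1;
  }

  std::size_t size() const { return cells_.size(); }

 private:
  friend class NoAllocationScope;
  std::vector<int> cells_;
  int no_alloc_depth_ = 0;
};

// RAII guard: while an instance is alive, ToyHeap::Allocate throws.
// Scopes may nest; the depth counter tracks how many are active.
class NoAllocationScope {
 public:
  explicit NoAllocationScope(ToyHeap& heap) : heap_(heap) {
    ++heap_.no_alloc_depth_;
  }
  ~NoAllocationScope() {
    assert(heap_.no_alloc_depth_ > 0);
    --heap_.no_alloc_depth_;
  }
  NoAllocationScope(const NoAllocationScope&) = delete;
  NoAllocationScope& operator=(const NoAllocationScope&) = delete;

 private:
  ToyHeap& heap_;
};
```

Code that holds raw pointers into the heap would sit inside such a scope, so that an accidental allocation (and therefore a possible GC) is caught right away instead of showing up later as a stale pointer.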

> 2. When GC triggers, what happens to pointers that were extracted from
> a Handle *while* the Handle is still in scope?
>
> HandleScope scope(isolate);
> Handle<Object> foo = ...;
> Object* fooPtr = *foo;
> someCallThatTriggersGC();
> // is fooPtr still valid???
>
>

Raw pointers will become stale, no matter where they came from. So in this
example, fooPtr will be invalid; foo will still be valid (that's the point
of having Handles).
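
The difference between the raw pointer and the Handle can be shown with a toy moving heap. This is a sketch of the general indirection idea only, not of V8's implementation: ToyMovingHeap, ToyObject and kPoison are made-up names, and old copies are deliberately kept alive and poisoned so that reading through the stale pointer stays well-defined in the demo.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Value written into an object's old location after it has been moved.
constexpr int kPoison = -1;

struct ToyObject {
  int value;
};

// Toy moving heap; an illustration only, not how V8 is implemented.
class ToyMovingHeap {
 public:
  // A handle points at a slot; the slot holds the object's current address.
  using Handle = ToyObject**;

  Handle Allocate(int value) {
    objects_.push_back(std::make_unique<ToyObject>(ToyObject{value}));
    slots_.push_back(std::make_unique<ToyObject*>(objects_.back().get()));
    return slots_.back().get();
  }

  // "GC": copy every object to a new address, update its slot, and poison
  // the old copy.
  void MoveEverything() {
    for (std::size_t i = 0; i < objects_.size(); ++i) {
      auto moved = std::make_unique<ToyObject>(*objects_[i]);
      objects_[i]->value = kPoison;
      *slots_[i] = moved.get();
      // The old copy is parked here so that stale raw pointers still point
      // at live (but poisoned) memory in this demo.
      graveyard_.push_back(std::move(objects_[i]));
      objects_[i] = std::move(moved);
    }
  }

 private:
  std::vector<std::unique_ptr<ToyObject>> objects_;
  std::vector<std::unique_ptr<ToyObject*>> slots_;
  std::vector<std::unique_ptr<ToyObject>> graveyard_;
};
```

With `foo = heap.Allocate(42)` and `fooPtr = *foo`, calling `heap.MoveEverything()` leaves `(*foo)->value` at 42 while `fooPtr->value` reads the poison value: the slot behind the handle was updated, the copied address was not. In a real collector the old location may be reused or unmapped, so a stale raw pointer there is simply invalid.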
