Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

2022-02-01 Thread Murphy, Sean
> On Tuesday, 1 February 2022 11:12:29 PST Murphy, Sean wrote:
> > So if I understand you correctly, instantiating 60,000 of them when
> > you really only need one would be considered Not Advised?!?!
> 
> Correct.
> 
> I find it hard to believe you have a valid use-case for 60,000 deterministic
> pseudo-random generators, though.

I definitely didn't! But that didn't stop me from doing it...
Sean
___
Interest mailing list
Interest@qt-project.org
https://lists.qt-project.org/listinfo/interest


Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

2022-02-01 Thread Thiago Macieira
On Tuesday, 1 February 2022 11:12:29 PST Murphy, Sean wrote:
> So if I understand you correctly, instantiating 60,000 of them when you
> really only need one would be considered Not Advised?!?!

Correct.

I find it hard to believe you have a valid use-case for 60,000 deterministic 
pseudo-random generators, though.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering



___
Interest mailing list
Interest@qt-project.org
https://lists.qt-project.org/listinfo/interest


Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

2022-02-01 Thread Murphy, Sean
> On Tuesday, 1 February 2022 08:30:52 PST Murphy, Sean wrote:
> > I just made that switch - removed the QRandomGenerator member variable
> > from the tile class, and calling QRandomGenerator::global()->bounded(min,
> > max). Now creating + assigning each tile plummeted from 18 seconds to
> > 15 milliseconds.
> 
> Unfortunately, QRNG now has an ABI requirement that it uses the Mersenne
> twister. That means it's 624 * 4 bytes in size (plus overhead) and must seed
> that thing, which is non-trivial math.

So if I understand you correctly, instantiating 60,000 of them when you really 
only need one would be considered Not Advised?!?!

Sean


Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

2022-02-01 Thread Thiago Macieira
On Tuesday, 1 February 2022 08:30:52 PST Murphy, Sean wrote:
> I just made that switch - removed the QRandomGenerator member variable
> from the tile class, and calling QRandomGenerator::global()->bounded(min,
> max). Now creating + assigning each tile plummeted from 18 seconds to 15
> milliseconds.

Unfortunately, QRNG now has an ABI requirement that it uses the Mersenne
twister. That means it's 624 * 4 bytes in size (plus overhead) and must seed
that thing, which is non-trivial math.
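To make the cost concrete, here is a standard-C++ analogue rather than Qt code: std::mt19937 is also a Mersenne Twister with 624 words of state, so embedding one in every tile means tens of thousands of multi-kilobyte objects, each seeded on construction. HeavyTile, LightTile and randomDelayMs() are hypothetical names, and the sizes checked are properties of std::mt19937, not measurements of QRandomGenerator. The sketch keeps one generator per thread instead; in the Qt code that role is played by QRandomGenerator::global(), whose shared instance the Qt documentation describes as safe to use from any thread.

```cpp
#include <cassert>
#include <random>

// Illustration with std::mt19937, not Qt's QRandomGenerator.
// Anti-pattern: every tile owns (and seeds) a Mersenne Twister with 624 words of state.
struct HeavyTile {
    int index = 0;
    std::mt19937 rng{std::random_device{}()};
};

// Preferred: the tile holds only its own data.
struct LightTile {
    int index = 0;
};

// One generator per thread, created on first use and reused for every tile that
// thread processes. Returns a value in [lowest, highest), the same half-open
// range QRandomGenerator::bounded(lowest, highest) uses.
int randomDelayMs(int lowest, int highest) {
    thread_local std::mt19937 rng{std::random_device{}()};
    std::uniform_int_distribution<int> dist(lowest, highest - 1);
    return dist(rng);
}
```

In the thread's code the equivalent change was simply deleting the member and calling QRandomGenerator::global()->bounded(min, max) inside tile::load() and tile::remap().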

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel DPG Cloud Engineering





Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

2022-02-01 Thread Murphy, Sean
> Depending on your QFuture setup, you could monitor each tile, and when it
> completes, update the min max...

From the testing I did yesterday, and this comment in
https://doc.qt.io/qt-5/qfuturewatcher.html, I think you can't know WHICH
result is ready if your map function returns void:

"QFutureWatcher<void> is specialized to not contain any of the result fetching
functions. Any QFuture<T> can be watched by a QFutureWatcher<void> as well.
This is useful if only status or progress information is needed; not the actual
result data."

One of my early go-arounds yesterday had me connecting the
QFutureWatcher::resultReadyAt(int index) signal from the watcher that was
monitoring the tile::load() call to a slot in my tileManager class. That slot
was never called. If I used the
QFutureWatcher::progressValueChanged(int progressValue) signal instead, the
slot was called as each tile completed. But the issue there is that you only
know HOW MANY have completed, not WHICH ONES.

So from what I could tell, if you want to work with the result of an individual
item right when it comes in, the function you pass as the map function has to
have a non-void return type T. That way you aren't instantiating a
QFutureWatcher<void>, which doesn't fire those signals, but a
QFutureWatcher<T>, and then you can use resultReadyAt(int index).

> Passing the manager to each tile, and when the tile is finished update the
> manager with the tile's value.  Then a simple mutex should allow you to not
> collide.  No signals, just a simple function call.

Oh man, my head is already bursting with the QtConcurrent and QFuture stuff,
now you want to toss QMutex at me too?!?!

> When the future watcher says all tiles are done, you will also have the
> min/max computed at that point, so you can kick off phase 2.
> 
> While I love signals and slots for "generic, I don’t know who needs this
> information" designs, sometimes the cost/overhead of being a QObject and
> sending the signal can be overkill, especially when it’s a very discrete
> "signal" and not generic in any way.
> 
> To me, the manager you have doesn’t need to know "is a tile finished", i.e. a
> generic signal, but rather "what is the min/max of the tile when it finished",
> a specific signal.  For that level, a simple function works.

From where I'm at right now, I think that can be accomplished by changing my
loadTile function from:

    void loadTile(tile &t)
    {
        t.load();
    }

to:

    QPair loadTile(tile &t)
    {
        t.load();
        return t.getMinMax();
    }

Then I'd switch to QtConcurrent::mapped() instead of map(), since my future's
result type is now a QPair, and connect the watcher's resultReadyAt(int index)
signal to a slot that updates the global min/max.

Although as I'm reading through everything, this might also be the point to use
QtConcurrent::mappedReduced(). So many options to choose from...
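A rough sketch of the min/max reduction, in plain standard C++ so it stands alone: tileMinMax() is the map step (one tile's range) and mergeMinMax() the reduce step, written in the shapes QtConcurrent::mappedReduced() expects (a map function taking const T& and returning U, and a reduce function taking (R &result, const U &intermediate)). Tile, MinMax and all three functions are hypothetical stand-ins, not the thread's actual classes. The sequential globalMinMax() loop only shows what the reduction computes; with Qt you would hand the two functions to QtConcurrent::mappedReduced(tiles, tileMinMax, mergeMinMax) and watch the returned future.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical tile: only the raw samples matter for this step.
struct Tile {
    std::vector<int> samples;
};

// A default-constructed MinMax is the identity element of the reduction.
struct MinMax {
    int min = std::numeric_limits<int>::max();
    int max = std::numeric_limits<int>::min();
};

// Map step: the range of one tile (an empty tile contributes the identity).
MinMax tileMinMax(const Tile &t) {
    if (t.samples.empty())
        return MinMax{};
    auto range = std::minmax_element(t.samples.begin(), t.samples.end());
    return MinMax{*range.first, *range.second};
}

// Reduce step: fold one tile's range into the running global range.
void mergeMinMax(MinMax &result, const MinMax &tileRange) {
    result.min = std::min(result.min, tileRange.min);
    result.max = std::max(result.max, tileRange.max);
}

// Sequential reference for what the map + reduce pair computes.
MinMax globalMinMax(const std::vector<Tile> &tiles) {
    MinMax result;
    for (const Tile &t : tiles)
        mergeMinMax(result, tileMinMax(t));
    return result;
}
```

The identity default matters: as far as I know, in Qt 5 the reduction result starts out default-constructed, so a plain QPair<int, int> would start at (0, 0) and could report a wrong range. QtConcurrent also calls the reduce function from only one thread at a time, so mergeMinMax() needs no mutex.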

Sean


Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

2022-02-01 Thread Scott Bloom
Depending on your QFuture setup, you could monitor each tile, and when it
completes, update the min/max...

You could pass the manager to each tile, and when the tile is finished, update
the manager with the tile's value. Then a simple mutex should keep the updates
from colliding. No signals, just a simple function call.

When the future watcher says all tiles are done, you will also have the min/max
computed at that point, so you can kick off phase 2.

While I love signals and slots for "generic, I don’t know who needs this
information" designs, sometimes the cost/overhead of being a QObject and
sending the signal can be overkill, especially when it’s a very discrete
"signal" and not generic in any way.

To me, the manager you have doesn’t need to know "is a tile finished", i.e. a
generic signal, but rather "what is the min/max of the tile when it finished",
a specific signal. For that level, a simple function works.
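A minimal sketch of that "simple function call plus a mutex" idea, written with std::mutex and std::lock_guard as stand-ins for QMutex and QMutexLocker so it compiles without Qt. TileManager and reportTileRange() are hypothetical names: each worker calls reportTileRange() when its tile finishes, so the global range is complete as soon as the last tile is.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical manager that workers report into directly: no QObject, no signal.
class TileManager {
public:
    // Called from worker threads as each tile finishes loading.
    void reportTileRange(int tileMin, int tileMax) {
        std::lock_guard<std::mutex> lock(mMutex);  // serialize concurrent updates
        mMin = std::min(mMin, tileMin);
        mMax = std::max(mMax, tileMax);
    }

    int globalMin() const {
        std::lock_guard<std::mutex> lock(mMutex);
        return mMin;
    }

    int globalMax() const {
        std::lock_guard<std::mutex> lock(mMutex);
        return mMax;
    }

private:
    mutable std::mutex mMutex;
    int mMin = std::numeric_limits<int>::max();
    int mMax = std::numeric_limits<int>::min();
};
```

Compared with returning the range from the map function, this takes a lock on every tile completion; that is probably cheap next to the per-tile work, but a mappedReduced-style reduction avoids the shared state altogether.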

Scott

-Original Message-
From: Interest  On Behalf Of Murphy, Sean
Sent: Tuesday, February 1, 2022 9:32 AM
To: interest@qt-project.org
Subject: Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

>   that definitely does.
> 
> Of course I wonder if you had removed that, but left in the 
> QObject etc etc, what would it have been.  Likely, not much worse than 15ms.

Yep, once I removed the random generator object, I had the same thought: 
"could I go back to QObject?!?!", but as you mention, I don't think there's any 
reason for me to do that, unless I can think of a reason that I need to provide 
some sort of signal down at a tile level beyond just completion progress (since 
QFutureWatcher can handle the progress part). 

The only potential signal that comes to mind is between the tile::load() and
tile::remap() steps: I do need to know the min/max raw data values to compute
the remap parameters. Currently I have to wait for the loadFutureWatcher to
report that it is finished, then I take a pass through every tile to calculate
the global min/max values, and then I can kick off the remap process with those
parameters in hand. If the tile were a QObject and emitted a signal to relay
that information back, the global min/max would get updated as each tile
completes its first step, which means that as soon as the last tile finishes,
I'm good to go with the remap step instead of taking that pass through all the
tiles to determine the global min/max.

But I could also accomplish the same thing by modifying the loadTile function
that is passed to QtConcurrent to return a QPair instead of returning void like
it currently does. Then I would connect the future watcher's
resultReadyAt(int index) signal to a slot in my tile manager that accomplishes
the same thing, without making my tile class inherit from QObject.

> 
> Why? IMO, Qt wouldn’t be what it is today if simply allocating 60k 
> QObjects and connecting a signal to them took that long.
> 
> But I think the overall architecture of what you have now is MUCH 
> better and will scale to a more complex "actually do the work" system 
> much better than the other.

I think I'm happy with it as a proof-of-concept design as I have it now - or at
least once I replace the placeholder delays with the code that actually works
on the data file. After that, I also want to implement the idea Konstantin
suggested of just dividing the original image up into
QThread::idealThreadCount() blocks, then compare the results on actual data
and see which performs better. I'm guessing it'll be the latter since, on my
idealThreadCount() == 8 system, there's probably way less overhead in creating
8 things that do the work than 60,000+. Especially since the 60,000 items are
still going to be funneled through 8 threads anyways.

Sean


Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

2022-02-01 Thread Murphy, Sean
>   that definitely does.
> 
> Of course I wonder if you had removed that, but left in the QObject etc
> etc, what would it have been.  Likely, not much worse than 15ms.

Yep, once I removed the random generator object, I had the same thought: 
"could I go back to QObject?!?!", but as you mention, I don't think there's any 
reason for me to do that, unless I can think of a reason that I need to provide 
some sort of signal down at a tile level beyond just completion progress (since 
QFutureWatcher can handle the progress part). 

The only potential signal that comes to mind is between the tile::load() and
tile::remap() steps: I do need to know the min/max raw data values to compute
the remap parameters. Currently I have to wait for the loadFutureWatcher to
report that it is finished, then I take a pass through every tile to calculate
the global min/max values, and then I can kick off the remap process with those
parameters in hand. If the tile were a QObject and emitted a signal to relay
that information back, the global min/max would get updated as each tile
completes its first step, which means that as soon as the last tile finishes,
I'm good to go with the remap step instead of taking that pass through all the
tiles to determine the global min/max.

But I could also accomplish the same thing by modifying the loadTile function
that is passed to QtConcurrent to return a QPair instead of returning void like
it currently does. Then I would connect the future watcher's
resultReadyAt(int index) signal to a slot in my tile manager that accomplishes
the same thing, without making my tile class inherit from QObject.

> 
> Why? IMO, Qt wouldn’t be what it is today if simply allocating 60k QObjects
> and connecting a signal to them took that long.
> 
> But I think the overall architecture of what you have now is MUCH better
> and will scale to a more complex "actually do the work" system much better
> than the other.

I think I'm happy with it as a proof-of-concept design as I have it now - or at
least once I replace the placeholder delays with the code that actually works
on the data file. After that, I also want to implement the idea Konstantin
suggested of just dividing the original image up into
QThread::idealThreadCount() blocks, then compare the results on actual data
and see which performs better. I'm guessing it'll be the latter since, on my
idealThreadCount() == 8 system, there's probably way less overhead in creating
8 things that do the work than 60,000+. Especially since the 60,000 items are
still going to be funneled through 8 threads anyways.

Sean


Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

2022-02-01 Thread Scott Bloom
  that definitely does.

Of course I wonder if you had removed that, but left in the QObject etc 
etc, what would it have been.  Likely, not much worse than 15ms.

Why? IMO, Qt wouldn’t be what it is today if simply allocating 60k QObjects
and connecting a signal to them took that long.

But I think the overall architecture of what you have now is MUCH better and
will scale to a more complex "actually do the work" system much better than the
other.

Scott

-Original Message-
From: Interest  On Behalf Of Murphy, Sean
Sent: Tuesday, February 1, 2022 8:31 AM
To: interest@qt-project.org
Subject: Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

> Subject: RE: [Interest] [External]Re: How to get QtConcurrent to do 
> what I want?
> 
> Something seems off.
> 
> But without looking at the actual code that is allocating 60k tiles 
> and the constructor itself, it just seems like a very expensive 
> construction if the "new" + "moving a pointer" is taking 3ms each.

I finally figured that part out.

Since this is just test code and doesn't do any real work yet, I was putting
delays in tile::load() and tile::remap() just to simulate that those functions
take time to execute. To create a little realism, I was randomizing the amount
of that delay instead of using a fixed delay. To accomplish that, I mistakenly
added a QRandomGenerator as a member variable of the tile class, instead of
just calling QRandomGenerator::global()->bounded(min, max) in the tile::load()
and tile::remap() functions. Each tile certainly doesn't need its own uniquely
seeded random number sequence...

I just made that switch - removed the QRandomGenerator member variable from the
tile class, and calling QRandomGenerator::global()->bounded(min, max). Now
creating + assigning each tile plummeted from 18 seconds to 15 milliseconds.

That seems pretty acceptable to me...

Sean



Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

2022-02-01 Thread Murphy, Sean
> Subject: RE: [Interest] [External]Re: How to get QtConcurrent to do what I
> want?
> 
> Something seems off.
> 
> But without looking at the actual code that is allocating 60k tiles and the
> constructor itself, it just seems like a very expensive construction if the
> "new" + "moving a pointer" is taking 3ms each.

I finally figured that part out.

Since this is just test code and doesn't do any real work yet, I was putting
delays in tile::load() and tile::remap() just to simulate that those functions
take time to execute. To create a little realism, I was randomizing the amount
of that delay instead of using a fixed delay. To accomplish that, I mistakenly
added a QRandomGenerator as a member variable of the tile class, instead of
just calling QRandomGenerator::global()->bounded(min, max) in the tile::load()
and tile::remap() functions. Each tile certainly doesn't need its own uniquely
seeded random number sequence...

I just made that switch - removed the QRandomGenerator member variable from the
tile class, and calling QRandomGenerator::global()->bounded(min, max). Now
creating + assigning each tile plummeted from 18 seconds to 15 milliseconds.

That seems pretty acceptable to me...

Sean



Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

2022-02-01 Thread Scott Bloom
Something seems off.  

But without looking at the actual code that is allocating 60k tiles and the
constructor itself, it just seems like a very expensive construction if the
"new" + "moving a pointer" is taking 3ms each.

Scott

-Original Message-
From: Interest  On Behalf Of Murphy, Sean
Sent: Tuesday, February 1, 2022 7:32 AM
To: interest@qt-project.org
Subject: Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

> Not knowing if a partial value makes any sense to your system,
> QtConcurrent::mappedReduced might make more sense; if it's purely a 
> speedup you are looking for, and not "keep the GUI alive during it", 
> possibly blockingMappedReduced.

I don't think mappedReduced would help me until after the remapping step, when
I want to assemble the individual tiles into a QImage - although since the
ultimate destination for the image is inside a QGraphicsView, I'm tempted to
just leave it as individual tiles, but I'm not there yet as far as testing.

Regarding the keep-GUI-alive portion, my tileManager is already running in a
separate thread from the UI thread, so I do have the option of making blocking
calls in the tileManager class, with the exception that I do need to fire
progress signals out of tileManager back to MainWindow to provide progress to
the user.

> 
> If you need the gui, setting up a qfuturewatcher on the results of the 
> mapped call, would be my approach
> 
> QFutureWatcher< XXX > watcher;
> connect( , , manager, );
> 
> auto future = QtConcurrent::mappedReduced(tiles, processTile, 
> mergeFunction);
> watcher.setFuture( future );

I actually spent yesterday refactoring the tileManager class as you've just 
described, as well as changing out the tile class to no longer inherit from 
QObject. I've got a couple more things I want to try/clean up today but I still 
seem to be having trouble speeding up the allocation & assignment of the tiles 
themselves.

As my code currently stands, I now have two vectors, each of which will have
60,000 items in them once they're populated:
  QVector mTileIndices;
  QList mTiles;

The mTileIndices vector implements Andrei's idea of quickly generating a list
of unique tile indices, which can then be fed to a QtConcurrent::map() call to
create & uniquely assign the tile items in parallel. The mTiles vector holds
the tiles that will do the work.

As I build the vectors up, these are the timings I get:
  1. resizing tile index vector to 6 took 0.1514 ms
     a. This is just calling QVector::resize(6) on the index vector. It takes
        less than a millisecond, so no complaints here.
  2. allocated 6 indices in 0.0207 ms
     a. This is calling std::iota(mTileIndices.begin(), mTileIndices.end(), 0).
        It also takes less than a millisecond, still no complaints.
  3. assigning 6 tiles took 18087.1 ms
     a. This timing is the result of QtConcurrent::mapped(mTileIndices,
        initTile), where the initTile function takes in an integer from
        mTileIndices and calls the tile constructor using the combination of
        the tile index and tile size to do the assignment. The assigned tiles
        end up in mTiles.
     b. This step takes 18 seconds, which seems excessive to me, and I'd love
        to continue to reduce that time.
  4. load finished in 37507.1 ms
     a. This is calling tile::load() on each tile. Right now that is just a
        dummy function that calls msleep for a random number of milliseconds
        to simulate doing the actual work.
  5. remapping 6 tiles took 48519.3 ms
     a. This is calling tile::remap() on each tile. Right now that is just a
        dummy function that calls msleep for a random number of milliseconds
        to simulate doing the actual work.

So the only step in this process that still bothers me is step 3: creating and
assigning each tile object takes 18 seconds. I log the total time each tile
spent in steps 4 & 5, and when I compare how long steps 4 & 5 actually take
against the sum of how long each tile spent sleeping, I routinely get a speedup
factor of about 7.5. QThread::idealThreadCount() reports 8 on my machine, so I
think I'm getting what I should expect from those steps on this machine.

I'm not sure what else to try at this point. One thing I was thinking about
measuring is that even though my tile class no longer inherits from QObject, it
is still a class with a constructor and some getter functions. When I look at
steps 1 & 2 combined, where I both resize a vector of integers AND assign each
one a unique ID in less than a millisecond total, but then it takes me 18
seconds to create and assign each tile, I keep wondering if there isn't still
room for improvement there.

And as I was typing this whole thing up, Konstantin gave me a different
approach that I'll probably pursue, but I would like to better understand how
to solve the question of "if you absolutely need to have a lot of items
(whatever type an "item" needs to be), what's the right design approach to be
able to create and populate them quickly..."

Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

2022-02-01 Thread Michael Jackson
I've gotten a bit lost in these requirements, but I have written a large piece
of open source data analysis software (including image processing and tiling of
data to form an image), and there are certain scenarios when processing an
image where you can share the entire image with *all* of the threads, because
each thread works on its own small part of the image and does NOT care about
any other region of the image. This is the "embarrassingly parallel" algorithm
you always hear about.

As Konstantin suggested, get the total number of "CPUs" that you want to use.
Tell each thread what area it will work on and send the pointer to the image to
process to each thread. You can work out the user feedback as the threads go
through each section. We do this all the time in our program (also built with
Qt5). Just my 2 cents.
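A minimal standard-C++ sketch of that pattern: one contiguous region per thread, every thread given the same buffer pointer, and no locking because no two threads touch the same pixel. splitRange() and invertInParallel() are hypothetical names, and inverting 8-bit pixels only stands in for whatever per-pixel work the real program does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>;  // [begin, end)

// Split [0, total) into `parts` contiguous, non-overlapping ranges whose sizes
// differ by at most one element.
std::vector<Range> splitRange(std::size_t total, std::size_t parts) {
    if (parts == 0)
        parts = 1;
    std::vector<Range> ranges;
    const std::size_t base = total / parts;
    const std::size_t extra = total % parts;  // the first `extra` ranges get one more
    std::size_t begin = 0;
    for (std::size_t i = 0; i < parts; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + len);
        begin += len;
    }
    return ranges;
}

// Every thread receives the same buffer pointer but writes only inside its own
// range, so no mutex is needed.
void invertInParallel(std::vector<std::uint8_t> &pixels, unsigned threadCount) {
    std::uint8_t *data = pixels.data();
    std::vector<std::thread> workers;
    for (const Range &r : splitRange(pixels.size(), threadCount)) {
        const std::size_t begin = r.first;
        const std::size_t end = r.second;
        workers.emplace_back([data, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                data[i] = static_cast<std::uint8_t>(255 - data[i]);
        });
    }
    for (std::thread &w : workers)
        w.join();  // all regions done: safe to hand the buffer on
}
```

In a Qt program the thread count would come from QThread::idealThreadCount(); std::thread::hardware_concurrency() is the standard-library analogue (it may return 0 when the count is unknown, which splitRange() treats as 1).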

Also, an alternate suggestion would be to take a look at Threading Building
Blocks (TBB). It has worked out very nicely for us and is a complement to
QThread in certain scenarios. We use both QThread and TBB in our application.

--
Mike Jackson
https://www.github.com/bluequartzsoftware/DREAM3D

On 2/1/22, 10:21 AM, "Interest on behalf of Murphy, Sean" wrote:

> On Mon, Jan 31, 2022 at 7:15 PM Murphy, Sean wrote:
> >   1. Creating 60,000 QObjects in a single thread appears to be slow
> [...]
>
> Ehm, maybe I'm not understanding something, but why do you need objects to
> begin with?

I do need to report progress back to the UI, and I mistakenly thought I needed
to have each tile inherit from QObject to provide that functionality. After
reading up a bit more about QFuture and QFutureWatcher and then refactoring
yesterday to use those classes, as of the moment my tile class no longer
inherits from anything, and I'm able to use signals from QFutureWatcher to
relay progress back to the UI.

> The actual loop is this:
>    // generate each tile with its assignment
> [snip]
>
> What's the significance of the tiles? As far as I can tell from your
> requirements, you don't care about the "true geometry" of the data.

Either I'm understanding what you mean by "true geometry", or this assumption
is at least partially incorrect. Looking back on the list of requirements I've
posted, I left off the last step:

  At the end of all this processing, I do need to produce an onscreen image to
  the user.

So any way I slice up the work that needs to be done using threads, once they
are all finished, I do need to know what chunk of the original image each
thread was working on to know where to place its normalized pixels in what I
display to the user.

> At least to me it seems you want something like (pseudo algorithm):
>
> 1) Start QThread::idealThreadCount threads (QThread::create<> / std::thread)
> 2) Each thread works on "total samples" / QThread::idealThreadCount buffers
> that are completely independent.
> 2.1) Each thread goes through each sample from a partially mapped (from the
> file) buffer, takes the min/max to get the dynamic range
> 2.2) Sync the threads to get the global min/max
> 2.3) Go through each of the buffers a second time to normalize the dynamic
> range (again no tiles involved, just samples)
> 3) Done.

I think this is a preferable approach to what I was attempting, and I'm glad
you suggested it. This being my first attempt at this, I naively started by
asking "what seems like it would be a reasonable tile size?", arbitrarily
thought "256 pixels square" and worked backwards from there, which is how I got
into this mindset of "I might have 60,000+ tiles to deal with". Your approach
starts from what now appears to me a much better question: "what's the ideal
number of threads for your machine? Don't bother creating more threads than
that, because you're not going to benefit from having more", and then works
forward from there.

> Note: As each thread works on its own piece of data in both cases there's no
> sync required at any one point - you just read/write different parts of the
> same thing. Which is true both for when you load/write from/to a file and
> from/to memory.

I'm not sure I quite understand what you meant by this note. There is the sync
you pointed out as your step 2.2, and then, since I need to form the results
into an onscreen image (most likely a QGraphicsPixmapItem, etc.), there's
another sync at your step 3 before I can make the final onscreen image.
Otherwise I think I understand and prefer your concept to what I was doing.

Thanks again for your help! Now to test this out in practice...
Sean


Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

2022-02-01 Thread Murphy, Sean
> Not knowing if a partial value makes any sense to your system,
> QtConcurrent::mappedReduced might make more sense; if it's purely a
> speedup you are looking for, and not "keep the GUI alive during it", possibly
> blockingMappedReduced.

I don't think mappedReduced would help me until after the remapping step, when
I want to assemble the individual tiles into a QImage - although since the
ultimate destination for the image is inside a QGraphicsView, I'm tempted to
just leave it as individual tiles, but I'm not there yet as far as testing.

Regarding the keep-GUI-alive portion, my tileManager is already running in a
separate thread from the UI thread, so I do have the option of making blocking
calls in the tileManager class, with the exception that I do need to fire
progress signals out of tileManager back to MainWindow to provide progress to
the user.

> 
> If you need the gui, setting up a qfuturewatcher on the results of the
> mapped call, would be my approach
> 
> QFutureWatcher< XXX > watcher;
> connect( , , manager, );
> 
> auto future = QtConcurrent::mappedReduced(tiles, processTile, mergeFunction);
> watcher.setFuture( future );

I actually spent yesterday refactoring the tileManager class as you've just
described, as well as changing out the tile class to no longer inherit from
QObject. I've got a couple more things I want to try/clean up today, but I
still seem to be having trouble speeding up the allocation & assignment of the
tiles themselves.

As my code currently stands, I now have two vectors, each of which will have
60,000 items in them once they're populated:
  QVector mTileIndices;
  QList mTiles;

The mTileIndices vector implements Andrei's idea of quickly generating a list
of unique tile indices, which can then be fed to a QtConcurrent::map() call to
create & uniquely assign the tile items in parallel. The mTiles vector holds
the tiles that will do the work.
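For comparison, a sketch of index-based tile construction when the tile is a plain value type with no QObject base and no per-tile generator: each tile's rectangle is computed from its index and the tile size alone. TileRect, makeTile() and makeTiles() are hypothetical, and the loop is deliberately sequential; once construction is this cheap, building tens of thousands of tiles on one thread should be fast enough that parallelizing the construction itself buys little.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical plain-value tile: just its rectangle within the full image.
struct TileRect {
    int x, y, w, h;
};

// Geometry of tile `index` in a row-major grid; edge tiles are clipped to the image.
TileRect makeTile(int index, int tileSize, int imageW, int imageH) {
    const int tilesPerRow = (imageW + tileSize - 1) / tileSize;  // ceil division
    const int x = (index % tilesPerRow) * tileSize;
    const int y = (index / tilesPerRow) * tileSize;
    return TileRect{x, y, std::min(tileSize, imageW - x), std::min(tileSize, imageH - y)};
}

std::vector<TileRect> makeTiles(int imageW, int imageH, int tileSize) {
    const int cols = (imageW + tileSize - 1) / tileSize;
    const int rows = (imageH + tileSize - 1) / tileSize;

    // A vector of indices filled with std::iota...
    std::vector<int> indices(static_cast<std::size_t>(cols) * rows);
    std::iota(indices.begin(), indices.end(), 0);

    // ...then one tile per index; reserve() avoids repeated reallocation.
    std::vector<TileRect> tiles;
    tiles.reserve(indices.size());
    std::transform(indices.begin(), indices.end(), std::back_inserter(tiles),
                   [&](int i) { return makeTile(i, tileSize, imageW, imageH); });
    return tiles;
}
```

With a 1000 x 600 image and 256-pixel tiles this yields a 4 x 3 grid, where the bottom-right tile is clipped to 232 x 88 pixels.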

As I build the vectors up, these are the timings I get:
  1. resizing tile index vector to 6 took 0.1514 ms
     a. This is just calling QVector::resize(6) on the index vector. It takes
        less than a millisecond, so no complaints here.
  2. allocated 6 indices in 0.0207 ms
     a. This is calling std::iota(mTileIndices.begin(), mTileIndices.end(), 0).
        It also takes less than a millisecond, still no complaints.
  3. assigning 6 tiles took 18087.1 ms
     a. This timing is the result of QtConcurrent::mapped(mTileIndices,
        initTile), where the initTile function takes in an integer from
        mTileIndices and calls the tile constructor using the combination of
        the tile index and tile size to do the assignment. The assigned tiles
        end up in mTiles.
     b. This step takes 18 seconds, which seems excessive to me, and I'd love
        to continue to reduce that time.
  4. load finished in 37507.1 ms
     a. This is calling tile::load() on each tile. Right now that is just a
        dummy function that calls msleep for a random number of milliseconds
        to simulate doing the actual work.
  5. remapping 6 tiles took 48519.3 ms
     a. This is calling tile::remap() on each tile. Right now that is just a
        dummy function that calls msleep for a random number of milliseconds
        to simulate doing the actual work.

So the only step in this process that still bothers me is step 3: creating and
assigning each tile object takes 18 seconds. I log the total time each tile
spent in steps 4 & 5, and when I compare how long steps 4 & 5 actually take
against the sum of how long each tile spent sleeping, I routinely get a speedup
factor of about 7.5. QThread::idealThreadCount() reports 8 on my machine, so I
think I'm getting what I should expect from those steps on this machine.
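The speedup check described above, written out (T_i is the simulated sleep time of tile i, T_wall the measured duration of the step, p the number of worker threads): a factor of about 7.5 on p = 8 threads corresponds to roughly 94% parallel efficiency, which supports the conclusion that those steps are already close to what this machine can deliver.

```latex
S = \frac{\sum_{i} T_i}{T_{\text{wall}}} \approx 7.5,
\qquad
E = \frac{S}{p} \approx \frac{7.5}{8} \approx 0.94
```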

I'm not sure what else to try at this point. One thing I was thinking about
measuring is that even though my tile class no longer inherits from QObject, it
is still a class with a constructor and some getter functions. When I look at
steps 1 & 2 combined, where I both resize a vector of integers AND assign each
one a unique ID in less than a millisecond total, but then it takes me 18
seconds to create and assign each tile, I keep wondering if there isn't still
room for improvement there.

And as I was typing this whole thing up, Konstantin gave me a different
approach that I'll probably pursue, but I would like to better understand how
to solve the question of "if you absolutely need to have a lot of items
(whatever type an "item" needs to be), what's the right design approach to be
able to create and populate them quickly..."

Sean



Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

2022-02-01 Thread Murphy, Sean
> > What's the significance of the tiles? As far as I can tell from your
> requirements, you don't care about
> > the "true geometry" of the data.
> 
> Either I'm understanding what you mean by "true geometry", or this

Oops! This was supposed to say "Either I'm NOT understanding..."



Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

2022-02-01 Thread Murphy, Sean
> On Mon, Jan 31, 2022 at 7:15 PM Murphy, Sean wrote:
> >   1. Creating 60,000 QObjects in a single thread appears to be slow  
> [...]
>
> Ehm, maybe I'm not understanding something, but why do you need objects to 
> begin with?

I do need to report progress back to the UI, and I mistakenly thought each
tile needed to inherit from QObject to provide that. After reading up a bit
more on QFuture and QFutureWatcher and refactoring yesterday to use those
classes, my tile class no longer inherits from anything, and I'm able to use
signals from QFutureWatcher to relay progress back to the UI.
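QFutureWatcher needs a Qt build, so as a hedged, Qt-free illustration of the same idea, here is a sketch in which workers share one atomic progress counter that the UI side can read, instead of each item being its own QObject. `Progress` and `processAll` are hypothetical names; in Qt, QFutureWatcher's `progressValueChanged` signal would take the place of reading `done` directly.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// One shared counter for the whole job, instead of one QObject per tile.
// Both fields are atomic so another thread (e.g. the UI) may read them
// while the work is running.
struct Progress {
    std::atomic<std::size_t> done{0};
    std::atomic<std::size_t> total{0};
};

// Process items [0, count) on `threads` workers, striding so each worker
// touches a disjoint set of indices. This call blocks until all work is
// finished, so in an application it would itself run off the UI thread.
void processAll(std::size_t count, unsigned threads, Progress& progress,
                const std::function<void(std::size_t)>& work)
{
    progress.total = count;
    progress.done = 0;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            for (std::size_t i = t; i < count; i += threads) {
                work(i);
                progress.done.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
}
```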

> The actual loop is this:
>    // generate each tile with its assignment
> [snip]
>
> What's the significance of the tiles? As far as I can tell from your 
> requirements, you don't care about
> the "true geometry" of the data.

Either I'm not understanding what you mean by "true geometry", or this
assumption is at least partially incorrect. Looking back at the list of
requirements I've posted, I left off the last step:

  At the end of all this processing, I do need to produce an onscreen image
  for the user.

So however I slice up the work across threads, once they are all finished I
need to know which chunk of the original image each thread was working on, so
that I know where to place its normalized pixels in what I display to the
user.

> At least to me it seems you want something like (pseudo algorithm):
>
> 1) Start QThread::idealThreadCount threads (QThread::create<> / std::thread)
> 2) Each thread works on "total samples" / QThread::idealThreadCount buffers 
> that are completely independent.
> 2.1) Each thread goes through each sample from a partially mapped (from the 
> file) buffer, takes the min/max to get the dynamic range
> 2.2) Sync the threads to get the global min/max
> 2.3) Go through each of the buffers a second time to normalize the dynamic 
> range (again no tiles involved, just samples)
> 3) Done.

I think this is a preferable approach to what I was attempting, and I'm glad
you suggested it. This being my first attempt at this, I naively started by
asking "what seems like a reasonable tile size?", arbitrarily picked "256
pixels square", and worked backwards from there, which is how I got into the
mindset of "I might have 60,000+ tiles to deal with". Your approach starts
from what now seems to me a much better question: "what's the ideal number of
threads for your machine? Don't bother creating more threads than that,
because you won't benefit from having more", and then works forward from
there.
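As a rough sketch of working forward from the thread count, assuming the image is split into horizontal bands of whole rows: each band records its first row, which also answers the earlier question of where each thread's normalized pixels belong in the final image. `Band` and `splitRows` are hypothetical; `std::thread::hardware_concurrency()` is the standard-library counterpart of `QThread::idealThreadCount()` and may return 0, hence the clamp.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A horizontal band of the image handled by one worker. Recording firstRow
// is what lets each worker's output be placed back in the right spot.
struct Band {
    std::size_t firstRow;
    std::size_t rowCount;
};

// Split `rows` image rows into at most `workers` contiguous bands whose
// sizes differ by at most one row.
std::vector<Band> splitRows(std::size_t rows, unsigned workers)
{
    workers = std::max(1u, workers);  // hardware_concurrency() may report 0
    const std::size_t n = std::min<std::size_t>(workers, rows);
    std::vector<Band> bands;
    if (n == 0) return bands;  // empty image: nothing to split

    const std::size_t base = rows / n;   // rows every band gets
    const std::size_t extra = rows % n;  // first `extra` bands get one more
    std::size_t row = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t count = base + (i < extra ? 1 : 0);
        bands.push_back(Band{row, count});
        row += count;
    }
    return bands;
}
```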

> Note: As each thread works on its own piece of data in both cases there's no
> sync required at any one point - you just read/write different parts of the
> same thing. Which is true both for when you load/write from/to a file and
> from/to memory.

I'm not sure I quite understand what you meant by this note. There is the
sync you pointed out in your step 2.2, and since I need to assemble the
results into an onscreen image (most likely a QGraphicsPixmapItem or
similar), there's another sync at your step 3 before I can build the final
image. Otherwise, I think I understand your concept and prefer it to what I
was doing.

Thanks again for your help! Now to test this out in practice... 
Sean



Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

2022-02-01 Thread Konstantin Shegunov
On Mon, Jan 31, 2022 at 7:15 PM Murphy, Sean 
wrote:

>   1. Creating 60,000 QObjects in a single thread appears to be slow

[...]
>

Ehm, maybe I'm not understanding something, but why do you need objects to
begin with?


> The actual loop is this:
> // generate each tile with its assignment
> [snip]
>

What's the significance of the tiles? As far as I can tell from your
requirements, you don't care about the "true geometry" of the data.

At least to me it seems you want something like (pseudo algorithm):

1) Start QThread::idealThreadCount threads (QThread::create<> / std::thread)
2) Each thread works on "total samples" / QThread::idealThreadCount buffers
that are completely independent.
2.1) Each thread goes through each sample from a partially mapped (from the
file) buffer, takes the min/max to get the dynamic range
2.2) Sync the threads to get the global min/max
2.3) Go through each of the buffers a second time to normalize the dynamic
range (again no tiles involved, just samples)
3) Done.
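A minimal sketch of this outline with `std::thread`, assuming the samples are 16-bit values already in memory (the partial file mapping is left out) and that the output is rescaled to 8 bits. The join after each pass is the synchronization point of step 2.2; within each pass every thread reads and writes only its own range, so no locking is needed. `normalize` is a hypothetical name, not an existing API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <thread>
#include <vector>

// Two-pass dynamic-range normalization: pass 1 finds each chunk's min/max,
// the threads are joined (step 2.2), then pass 2 rescales to [0, 255].
std::vector<std::uint8_t> normalize(const std::vector<std::uint16_t>& samples,
                                    unsigned threads)
{
    threads = std::max(1u, threads);
    const std::size_t n = samples.size();
    const std::size_t chunk = (n + threads - 1) / threads;  // ceiling division
    std::vector<std::uint16_t> mins(threads, std::numeric_limits<std::uint16_t>::max());
    std::vector<std::uint16_t> maxs(threads, 0);

    // Run body(t, begin, end) on one thread per chunk, then join them all.
    auto run = [&](auto body) {
        std::vector<std::thread> pool;
        for (unsigned t = 0; t < threads; ++t) {
            const std::size_t begin = std::min(n, t * chunk);
            const std::size_t end = std::min(n, begin + chunk);
            pool.emplace_back(body, t, begin, end);
        }
        for (auto& th : pool) th.join();  // the sync point
    };

    // Pass 1: each thread writes only its own slot in mins/maxs.
    run([&](unsigned t, std::size_t b, std::size_t e) {
        for (std::size_t i = b; i < e; ++i) {
            mins[t] = std::min(mins[t], samples[i]);
            maxs[t] = std::max(maxs[t], samples[i]);
        }
    });

    // Step 2.2: combine the per-thread results into the global range.
    const std::uint16_t lo = *std::min_element(mins.begin(), mins.end());
    const std::uint16_t hi = *std::max_element(maxs.begin(), maxs.end());
    const double scale = hi > lo ? 255.0 / (hi - lo) : 0.0;

    // Pass 2: disjoint output ranges, so no locking is needed.
    std::vector<std::uint8_t> out(n);
    run([&](unsigned, std::size_t b, std::size_t e) {
        for (std::size_t i = b; i < e; ++i)
            out[i] = static_cast<std::uint8_t>((samples[i] - lo) * scale + 0.5);
    });
    return out;
}
```

Calling `normalize(samples, std::thread::hardware_concurrency())` would give one chunk per hardware thread; a thread pool could replace the two rounds of thread creation if that cost ever mattered.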

Note: As each thread works on its own piece of data in both cases there's
no sync required at any one point - you just read/write different parts of
the same thing. Which is true both for when you load/write from/to a file
and from/to memory.

What am I missing here?

Kind regards,
Konstantin.