Re: [Maya-Python] Re: [python] which library using to read / write huge amount of data

2016-10-19 Thread fruity
Hi!

I realized that I didn't post the results of the tests I did, so here they are in case they help someone else!
Treating the Maya part (getting / setting the weights) and the I/O part (writing / reading the file) separately, I tried:

*Maya:*
- cmds.skinPercent
- MFnSkinCluster.setWeights
- get/setAttr (maya.cmds)
- get/setAttr (MPlug)

I got the best results with get/setAttr through MPlug. I don't remember the number of vertices of my test mesh, but I assume it was the same as in my previous email (i.e. 39k); importing the weights (the tricky part) took around 1.4s, against 4.7s with maya.cmds. That time includes reading the file (since I was interested in the whole operation, I didn't take the time to restructure the code and separate my timers).
Exporting is roughly similar between cmds and MPlug (less than 1s for the same 10k mesh).
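
As a rough illustration, setting weights through plugs looks something like this (API 2.0); the skin cluster name and the weights-dict layout (vertex index -> influence index -> value) are placeholders rather than the exact code I used:

from maya.api import OpenMaya as om2

def set_weights_via_plugs(skin_name, weights):
    """weights: {vertex_index: {influence_index: value}} -- placeholder layout."""
    sel = om2.MSelectionList()
    sel.add(skin_name)
    skin_fn = om2.MFnDependencyNode(sel.getDependNode(0))

    # skinCluster.weightList[vtx].weights[influence]
    weight_list = skin_fn.findPlug("weightList", False)
    for vtx, per_influence in weights.items():
        weights_plug = weight_list.elementByLogicalIndex(vtx).child(0)
        for influence, value in per_influence.items():
            weights_plug.elementByLogicalIndex(influence).setDouble(value)

# e.g. set_weights_via_plugs("skinCluster1", {0: {0: 0.75, 1: 0.25}})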
*I/O:*
I finally went with a JSON dict, and the weights entry is serialized with cPickle. It seemed to be the fastest way, and it lets me keep everything in one file that is easy to edit and understand.
There are probably better options (e.g. zlib, or HDF5, although I'd like to stick with something native); I'll look into that in more depth later.
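
The file ends up looking something like this (a sketch, not the exact code: the base64 step is just one way to fit the pickled bytes inside a JSON string, and the field names are placeholders):

import json
import base64
import cPickle  # Python 2, as used here

def export_weights(path, influences, weights):
    payload = {
        "influences": influences,  # the human-readable part
        "weights": base64.b64encode(
            cPickle.dumps(weights, cPickle.HIGHEST_PROTOCOL)),  # the opaque part
    }
    with open(path, "w") as f:
        json.dump(payload, f, indent=2)

def import_weights(path):
    with open(path) as f:
        payload = json.load(f)
    payload["weights"] = cPickle.loads(base64.b64decode(payload["weights"]))
    return payload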
Thanks a lot for your help, anyway!



Re: [Maya-Python] Re: [python] which library using to read / write huge amount of data

2016-10-13 Thread fruity
Hi Marcus!

Thanks for your answer and your help! Well, I'm still working on the optimisation. I used JSON for the readable info (influences, etc.) and cPickle for the weights array, but I think most of the optimisation now has to come from how I export / import the values to and from the vertices.
Exporting is not that expensive (0.777652025223s for 39k vertices and 2 influences), but importing still takes 4.7600607872s. There seem to be several ways of reading/writing weights, and it takes some time to try them all! For now, skinPercent is definitely the worst idea (about 29s for importing ^^), and I read that MFnSkinCluster is not necessarily the best option either, at least through getWeights() and setWeights() (http://www.macaronikazoo.com/?p=417). The fastest may be to query and set the values via API plugs.
Long story short, there are a lot of ways of doing it and I need to try them all, but I think the part I need to work on is the Maya part rather than the 'data' part.
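
For comparison, the MFnSkinCluster.setWeights route mentioned above looks roughly like this (API 1.0); the node names and the vertex-major flat weight layout are assumptions for the sake of the example:

import maya.OpenMaya as om
import maya.OpenMayaAnim as oma

def set_all_weights(skin_name, mesh_name, influence_ids, flat_weights):
    """flat_weights: MDoubleArray ordered vertex-major (v0i0, v0i1, v1i0, ...)."""
    sel = om.MSelectionList()
    sel.add(skin_name)
    sel.add(mesh_name)
    skin_obj = om.MObject()
    sel.getDependNode(0, skin_obj)
    mesh_path = om.MDagPath()
    sel.getDagPath(1, mesh_path)

    # a component spanning every vertex of the mesh
    comp_fn = om.MFnSingleIndexedComponent()
    components = comp_fn.create(om.MFn.kMeshVertComponent)
    comp_fn.setCompleteData(om.MFnMesh(mesh_path).numVertices())

    # one call for the whole mesh, with normalisation disabled while writing
    oma.MFnSkinCluster(skin_obj).setWeights(
        mesh_path, components, influence_ids, flat_weights, False)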
HDF5 looks great (I wish I could have had a look at the book you mentioned on StackOverflow, too late now... ;-), but it isn't native (because of the dependency on numpy?), unfortunately. I don't really know what Alembic can and can't do, but it's definitely something I want to investigate, it looks super powerful!
Thanks for the help!




Re: [Maya-Python] Re: [python] which library using to read / write huge amount of data

2016-10-13 Thread Marcus Ottosson
Hey @fruity, how did it go with this? Did you make any progress? :)

I also thought of another consideration for what you're after: random access, i.e. being able to query the weights for any given vertex without (1) reading everything into memory and (2) physically searching for it.

There's a file format called HDF5 which was designed for this purpose (and which has Python bindings as well). It comes from the scientific community, but it applies well to VFX in that both deal with large, high-precision datasets (in this case, millions of vertices and floating-point weights). To give you some intuition for how it works, I asked a StackOverflow question about it a while back that compares it to a “filesystem in a file”; it has some good discussion around it.

In more technical terms, you can think of it as Alembic. In fact Alembic is a “fork” of HDF5, which was later rewritten (i.e. “Ogawa”) but maintains (to my knowledge) the gist of how things are organised and accessed internally.

At the end of the day, it means you can store your weights in one of these HDF5 files and read them back either as you would any normal file (i.e. entirely into memory) or via random access - for example, if you're only interested in applying weights to a selected area of a very dense polygonal mesh. Or, if you have multiple “channels” or “versions” of weights within the same file (e.g. 50 GB of weights), you can pick one without needing all of that memory to be available.
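
With h5py, a minimal sketch of that looks like this; the file name, dataset path and array shapes are made up for the example:

import numpy as np
import h5py

weights = np.random.rand(39000, 2)                        # stand-in data
influences = np.array(["joint_a", "joint_b"], dtype="S")

# write: one dataset per mesh, with readable metadata as attributes
with h5py.File("skin_weights.h5", "w") as f:
    dset = f.create_dataset("body/weights", data=weights, compression="gzip")
    dset.attrs["influences"] = influences

# read back only a region of interest - only this slice is pulled from disk
with h5py.File("skin_weights.h5", "r") as f:
    partial = f["body/weights"][5000:5200]   # 200 vertices, not the whole file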
​



Re: [Maya-Python] Re: [python] which library using to read / write huge amount of data

2016-10-06 Thread Marcus Ottosson
Cool :)

About reading a file a little bit at a time: it's not as difficult as it seems.

f = open("100gb.json")

At this point, you've opened an imaginary 100-gigabyte file. What is the effect on memory? Zero. Nada.

Now, if you were to do this..

data = f.read()

You’d be in trouble.

Now you’ve told Python (and the OS) to go out there and bring back the
entire contents of this file and put it in your variable data. That’s no
good.

But there are other ways of reading from a file.

first_line = next(f)

Bam, you’ve opened a huge file, read the first line and stopped. No more
data is read, memory is barely affected.

You can iterate this too.

for line in f:
  print(line)

It will read one line at a time, print it and throw the data away. Memory
is barely affected.

The thing about various file formats is that some of them can't be read like this; some won't make sense until you've read the entire file.

For example, consider JSON.

{
  "key": "value"
}

That first line is meaningless. The second line too. For this file to make
sense, you will need to read the entire file.

Some formats, including a variation of JSON, support “streaming”, which is what we did up there. So if you're looking to conserve memory, you'll have to add this criterion to your requirements for the file format.
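
For example, a sketch of the line-oriented variant (JSON Lines), where each line is a complete JSON document; the file name and fields here are made up:

import json

with open("weights.jsonl") as f:
    for line in f:
        record = json.loads(line)   # only one record in memory at a time
        print(record["vertex"], record["weights"])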

PS. Don't forget to close the file, or use a context manager.

f.close()

​




-- 
*Marcus Ottosson*
konstrukt...@gmail.com


Re: [Maya-Python] Re: [python] which library using to read / write huge amount of data

2016-10-06 Thread fruity
Actually, the file would be split into two parts: the first would be the info that I want the user to be able to modify (a couple of lines), and the second would be a text of n lines (n being, for instance, the number of vertices of a mesh) that the user will definitely not modify. So you're right, maybe I should split it into two files: the first in JSON or similar, the second encoded.
About the memory leak: yes, I did experience something similar during my last project (yes, in rigging too, it was nothing but fun on that project =p), but with in-house tools. So I've seen it, but didn't cause it myself, and that's why I'm concerned about it. By memory leak, I'm also talking about memory management in general. For instance, you work on super heavy sets (like really super heavy sets ;-), and you want to load some data attached to such a set. By default, you'd have to load the entire data file, which would result in huge RAM consumption and ultimately, probably, a crash from lack of memory. So it wouldn't be a memory leak so to speak, but something you'd have to handle by reading the file chunk by chunk and flushing the memory after each iteration. I'm not sure any module would do that automatically, though.
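
Something like this is what I have in mind for the chunk-by-chunk reading (a sketch; the file name and the per-chunk work are placeholders):

def iter_chunks(path, chunk_size=1024 * 1024):
    """Yield a file one fixed-size chunk (1 MB here) at a time."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

# only one chunk is alive at any point; the previous one gets garbage-collected
total_bytes = 0
for chunk in iter_chunks("huge_weights.bin"):
    total_bytes += len(chunk)   # stand-in for the real per-chunk work
print(total_bytes)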
I'll think about my problem differently and try to split it into two parts; that seems so obvious now that you've mentioned it! Thanks!





Re: [Maya-Python] Re: [python] which library using to read / write huge amount of data

2016-10-06 Thread Marcus Ottosson
> cPickle looks great

If you want it to be human-readable, you can't encrypt, compress or otherwise obfuscate it, so both pickling and zlib are out. But think about that for a second: if you are talking about multiple megabytes or gigabytes worth of data, the sheer quantity would make it uneditable regardless of the format.

> The part that I'm not sure of is the memory management when working with huge chunks of data, and I'm not even comfortable with how to measure and control it.

I'm not sure why you would worry about that, unless you've experienced some issues already? As long as you read the file into a variable, and that variable isn't kept around globally, garbage collection will handle this for you no matter which format you choose.
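
That non-leaking pattern looks something like this (a sketch; the file name is a placeholder):

import json

def read_file(fname):
    with open(fname) as f:
        data = json.load(f)   # lives only inside this call
    return data

weights = read_file("weights.json")  # freed once nothing references it any more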

To get a memory leak, you’d really have to try.

import json

leak = list()

def read_file(fname):
    with open(fname) as f:
        data = json.load(f)
        leak.append(data)  # builds up over time

​
