Hi,

Just as a follow-up, we have found some quite weird memory-footprint
behavior: it seems that once the objects have been iterated, deallocating
them does not return all of the memory back to the process [1].
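
A minimal sketch of the kind of check we are talking about - it assumes
psutil is installed and that foo_pb2 is the module generated from the
message definition in the quoted mail below; see [1] for the actual
numbers:

import gc

import psutil
from foo_pb2 import Foo

proc = psutil.Process()

def rss_mib():
    # Current resident set size of this process, in MiB.
    return proc.memory_info().rss // 2 ** 20

with open("/tmp/foo", "rb") as fd:
    foo = Foo()
    foo.ParseFromString(fd.read())
print("after parse:    ", rss_mib(), "MiB")

for bar in foo.bar:
    for x in bar.x:
        for y in x.y:
            pass
print("after iteration:", rss_mib(), "MiB")

del foo
gc.collect()
print("after del + gc: ", rss_mib(), "MiB")  # we would expect RSS to drop back here, but it does not fully (see [1])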

Any thoughts?


[1] https://github.com/protocolbuffers/protobuf/issues/5737

On Thu, Feb 14, 2019 at 10:47 PM Pau Freixes <[email protected]> wrote:
>
> Hi folks,
>
> Recently we have been experiencing a really weird memory consumption
> pattern with the Python protobuf implementation - the one that uses the
> C++ extension under the hood. In a specific scenario, we have spotted a
> sudden increase in memory usage just by iterating over some of the proto
> message attributes.
>
> Some context: the message is composed of three levels of nested repeated
> fields, like the following message definitions:
>
> message Bar {
>     message X {
>         message Y {
>             int32 value = 1;
>         }
>         repeated Y y = 1;
>     }
>     repeated X x = 1;
> }
> message Foo {
>     repeated Bar bar = 1;
> }
>
> We have a serialized protobuf file using the previous message format
> that takes around 1 GB on disk and contains around 10M Bar messages as
> repeated elements of a single Foo message. We run code similar to the
> following:
>
> from foo_pb2 import Foo
>
> with open("/tmp/foo", "rb") as fd:
>     foo = Foo()
>     foo.ParseFromString(fd.read())
>
> # First pass: iterate only over the top-level repeated field.
> for bar in foo.bar:
>     pass
>
> # Second pass: also touch the nested repeated fields.
> for bar in foo.bar:
>     for x in bar.x:
>         for y in x.y:
>             pass
>
> We have noticed that after the first loop - which simply iterates over
> all of the repeated Bar elements within the Foo object - the memory
> usage grows to around 16 GB. After the second loop, it grows to almost
> 30 GB.
>
> Besides the sheer amount of memory consumed, what really surprised us
> was that the memory footprint grows just because of a simple iteration.
> We first wondered whether we had found a memory leak, but that seemed
> quite unlikely given the maturity of the project. Digging a bit into the
> C extension implementation, we found something interesting: reading this
> piece of code [1], which refers to the `getattribute` method, it seems
> that the Python objects are created lazily, i.e. they are only
> materialized when they are first accessed.
>
> Is this true? Is there a lazy-loading pattern that creates the Python
> objects only when they are accessed?
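>
> For what it's worth, here is a small self-contained check (a sketch
> only: it assumes psutil is available and uses a much smaller message
> built in memory, then serialized and re-parsed, so it runs quickly)
> that would be consistent with that hypothesis - if the wrappers are
> created on first access and then cached, RSS should grow during the
> first pass over the message but stay roughly flat during a second,
> identical pass:
>
> import gc
>
> import psutil
> from foo_pb2 import Foo
>
> proc = psutil.Process()
>
> def rss_mib():
>     # Current resident set size of this process, in MiB.
>     return proc.memory_info().rss // 2 ** 20
>
> # Build a smaller Foo (100k Bar messages), serialize it and parse it
> # back, so that the submessages initially live only in the C++ layer.
> src = Foo()
> for _ in range(100000):
>     src.bar.add().x.add().y.add().value = 1
> data = src.SerializeToString()
> del src
>
> foo = Foo()
> foo.ParseFromString(data)
> del data
> gc.collect()
> print("after parse:      ", rss_mib(), "MiB")
>
> for bar in foo.bar:
>     for x in bar.x:
>         for y in x.y:
>             pass
> gc.collect()
> print("after first pass: ", rss_mib(), "MiB")  # grows if wrappers are created lazily here
>
> for bar in foo.bar:
>     for x in bar.x:
>         for y in x.y:
>             pass
> gc.collect()
> print("after second pass:", rss_mib(), "MiB")  # roughly flat if wrappers are cached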
>
> And if so, can this be circumvented in some way? If we do not need to
> mutate the attribute, can we access the underlying data directly without
> paying the cost of creating these Python objects?
>
> I forgot to call out that we are using the 3.6.x version of protobuf.
> I can see that the "message.cc" implementation has changed a bit in
> master; is there anything in master or in the 3.7.x release that might
> help us reduce the memory footprint?
>
> Thanks,
>
> [1] 
> https://github.com/protocolbuffers/protobuf/blob/3.6.x/python/google/protobuf/pyext/message.cc#L2732
> --
> --pau



-- 
--pau
