Re: Customizable Serialization check-in

2012-08-24 Thread Russell Keith-Magee
Hi Piotr,

Thank you so much for your efforts over the summer.

I'd also like to apologise for my lack of communication; I certainly
haven't been a model mentor over the course of the program.

Although we may not have achieved all the goals we set out to achieve
at the start of the program, I don't think it's been a complete loss
-- we've certainly thrashed out some interesting ideas, and between
your work and Tom's, I'm sure we can salvage something that the
community can make use of.

Now that the program has finished, if you have any feedback about what
we could do differently next year, I'd love to hear it. Obviously,
we'd like every SoC student to be a complete success, so if there's
anything the Django team could do to improve the chances of success
for next year's program, I'd like to be able to learn from the
mistakes of this year.

Yours,
Russ Magee %-)


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: Customizable Serialization check-in

2012-08-24 Thread Tom Christie
Thanks Piotr,

  It's been interesting and helpful watching your progress on this project.
I wouldn't worry too much about not quite meeting all the goals you'd hoped 
for - it's a deceptively difficult task.
In particular it's really difficult trying to maintain full backwards 
compatibility with the existing fixture serialization implementation,
whilst also totally redesigning the API to support the requirements of a 
more flexible serialization system.
Like you say, I think the overall direction of your project is right, and 
personally I've found it useful for my own work watching how you've tackled 
various parts of it.

All the best,

  Tom





Re: Customizable Serialization check-in

2012-08-22 Thread Piotr Grabowski

Hi,

Google Summer of Code has almost ended. I was working on customizable 
serialization. This project was a lot harder than I expected, and sadly, 
in my opinion, I failed to do it right. I want to apologize for that and 
especially for my poor communication with this group and my mentor. I 
wanted to improve it after the midterm evaluation, but it only got worse.


I don't think my project is all wrong, but a lot of things are 
different from how I planned. This is how it looks now (I wrote more in 
the documentation):
There is a Serializer class that is made of two classes: NativeSerializer 
and FormatSerializer.
NativeSerializer serializes and deserializes Python objects 
to/from native Python datatypes.
FormatSerializer serializes and deserializes native Python 
datatypes to/from some format (xml, json, yaml).


I wanted NativeSerializer to be fully independent of FormatSerializer 
(and vice versa), but this isn't possible. Either NativeSerializer must 
return some additional data or FormatSerializer must give 
NativeSerializer some context. For example, in xml all native Python 
datatypes must be converted to strings before being serialized to xml. 
Some custom model fields can have a more sophisticated way of converting 
to a string than unicode(), so `field.value_to_string` must be called, and 
`field` is only accessible in the NativeSerializer object. So either 
NativeSerializer will also return `field`, or FormatSerializer will 
inform NativeSerializer that it handles only text data.
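
The second option above (the format serializer telling the native serializer that it handles only text) could look roughly like the sketch below. The class names follow this message; the `text_only` flag, the method signatures and the `DateField` and `Event` stand-ins are assumptions made for illustration, not code from the branch.

```python
import datetime


class DateField(object):
    """Stand-in for a model field with its own string conversion."""

    def __init__(self, name):
        self.name = name

    def value_to_string(self, obj):
        # A custom field may format its value differently from str().
        return getattr(obj, self.name).strftime('%d.%m.%Y')


class NativeSerializer(object):
    def __init__(self, fields):
        self.fields = fields

    def serialize(self, obj, text_only=False):
        # text_only is the context handed over by the format serializer.
        if text_only:
            return {f.name: f.value_to_string(obj) for f in self.fields}
        return {f.name: getattr(obj, f.name) for f in self.fields}


class XmlFormatSerializer(object):
    text_only = True  # XML can only hold strings

    def serialize(self, native):
        return ''.join('<%s>%s</%s>' % (k, v, k) for k, v in native.items())


class Event(object):
    def __init__(self, day):
        self.day = day


event = Event(datetime.date(2012, 8, 22))
native = NativeSerializer([DateField('day')])
xml = XmlFormatSerializer()
output = xml.serialize(native.serialize(event, text_only=xml.text_only))
```

With the flag set, the first phase already produces strings through the field, so the XML phase never needs to see the field object. Without it, the first phase returns the raw `date`, which is what a json or yaml phase would want.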


Backward-compatible dumpdata is almost working. Only a few tests still 
fail, but I am not sure why.


Nested serialization of fk and m2m related fields, which was the main 
functionality of this project, is working but not well tested. There are 
some issues, especially with xml. I must write a new xml format because 
the old one won't work with nested serialization.


I didn't do any performance tests. Running the full test suite takes 
about 40 seconds longer with my serialization (about 1500 s in total), if 
I remember correctly.


I will try to complete this project so that it is at least bug-free and 
usable. If someone is interested in using nested serialization, there is 
another great project: https://github.com/tomchristie/django-serializers


Code: https://github.com/grapo/django/tree/soc2012-serialization
Documentation: https://gist.github.com/3085250

--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-08-06 Thread Piotr Grabowski

Hi,

In the past 3 weeks, my project has changed a lot. First of all, I 
changed the output of the first phase of serialization. Previously it was 
native Python datatypes. At some point I added a dictionary with metadata 
to it. The metadata was used in the second phase of serialization. Now the 
first phase returns ObjectWithMetadata, which is a wrapper for native 
Python datatypes. It's a bit hackish, so I don't know if it is a good 
solution:


class ObjectWithMetadata(object):
    def __init__(self, obj, metadata=None, fields=None):
        self._object = obj
        self.metadata = metadata or {}
        self.fields = fields or {}

    def get_object(self):
        return self._object

    def __getattribute__(self, attr):
        if attr not in ['_object', 'metadata', 'fields', 'get_object']:
            return self._object.__getattribute__(attr)
        else:
            return object.__getattribute__(self, attr)

    # there are a few more methods like this (for acting like a
    # MutableMapping and Iterable) and all are similar

    def __getitem__(self, key):
        return self._object.__getitem__(key)

    ...

Thanks to this solution, ObjectWithMetadata acts like the object stored 
in _object in almost all cases (including isinstance tests), and there is 
a place for storing additional data.
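
A short usage sketch may make the behaviour concrete. It repeats a reduced version of the class so that it runs on its own; `__len__` stands in for the "few more methods" mentioned above, and the sample data is made up.

```python
class ObjectWithMetadata(object):
    def __init__(self, obj, metadata=None, fields=None):
        self._object = obj
        self.metadata = metadata or {}
        self.fields = fields or {}

    def get_object(self):
        return self._object

    def __getattribute__(self, attr):
        # Everything except the wrapper's own names is forwarded.
        if attr not in ['_object', 'metadata', 'fields', 'get_object']:
            return getattr(self._object, attr)
        return object.__getattribute__(self, attr)

    # Implicit special-method lookups (obj[key], len(obj)) go through the
    # type and bypass __getattribute__, so they are defined explicitly.
    def __getitem__(self, key):
        return self._object[key]

    def __len__(self):
        return len(self._object)


wrapped = ObjectWithMetadata({'title': 'Django'},
                             metadata={'attributes': ['pk']})

looks_like_dict = isinstance(wrapped, dict)   # __class__ is forwarded too
real_type = type(wrapped)                     # type() is not fooled
title = wrapped['title']
same_title = wrapped.get('title')             # forwarded dict method
```

The isinstance check succeeds because it falls back to the forwarded `__class__` attribute, while `type()` still reports the wrapper. Code that checks the concrete type (as C extensions tend to do) therefore sees through the disguise.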


I didn't change deserialization, so its output is native Python 
datatypes without wrapping. I don't know if this is good, because there 
is no symmetry in this:
Django object -> python native datatype packed in ObjectWithMetadata -> 
json -> python native datatype -> Django object



I have all dumpdata formats working now (xml, json, yaml). All tests 
pass, but there is a problem with the order of fields in yaml. It will be 
fixed soon.
I made a new format, new_xml, which is similar to json and yaml. It's 
easier to parse.


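Old (element tags were lost in the archive; only these fragments remain):

    ... rel="ManyToOneRel">1
    ... rel="ManyToManyRel">

New (element tags were lost in the archive; only the values remain):

    1
      1
      2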


There is also a problem with json and serialization to a stream, because 
json uses an extension written in C (_json) for performance, and this 
leads to exceptions when ObjectWithMetadata is used. So before passing 
objects to json.dump, they should be unpacked from ObjectWithMetadata.
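
The unpacking step could be sketched as a small recursive helper. The `unwrap` function is hypothetical, and the class here is a minimal stand-in for the wrapper shown earlier in this message (no attribute forwarding), kept only so that the example runs on its own.

```python
import json


class ObjectWithMetadata(object):
    """Minimal stand-in for the wrapper described earlier in this message."""

    def __init__(self, obj, metadata=None):
        self._object = obj
        self.metadata = metadata or {}

    def get_object(self):
        return self._object


def unwrap(value):
    """Recursively replace wrappers with the plain objects they hold."""
    if isinstance(value, ObjectWithMetadata):
        value = value.get_object()
    if isinstance(value, dict):
        return {key: unwrap(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [unwrap(item) for item in value]
    return value


wrapped = ObjectWithMetadata({
    'title': ObjectWithMetadata('Django'),
    'tags': ObjectWithMetadata([ObjectWithMetadata('web'), 'python']),
})
encoded = json.dumps(unwrap(wrapped), sort_keys=True)
```

The cost is one extra pass over the data before encoding, and the metadata is dropped at that point, which is fine for json because it does not use it.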



There is probably no chance of achieving one of the most important 
requirements that I specified: using a single Serializer to serialize 
Django models to multiple formats:

serializers.serialize('json', objects, serializer=MySerializer)
serializers.serialize('xml', objects, serializer=MySerializer)

The trouble is with xml (like always ;). In xml every (model) field must 
be converted to a string before the xml serializer can serialize it. In 
json and yaml, if a field has a protected type (string, int, datetime 
etc.), then nothing is done with it. The conversion is done in the first 
phase because only there is access to field.value_to_string, the field 
method that is used to convert a field value to a string. It can be 
overridden by the user, so simply calling smart_unicode in the second 
phase instead isn't enough.



Most important tasks in TODO:
* handling natural keys
* tests
  x correctness
  x performance (I suspect my solution will be slower than the one 
    currently used in Django, but by how much?)
* documentation

https://github.com/grapo/django/tree/soc2012-serialization/django/core/serializers
--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-07-12 Thread Piotr Grabowski

On 11.07.2012 14:04, Russell Keith-Magee wrote:

> > There is still problem with API and how to do some things but in my opinion
> > it's going in right direction.
>
> Generally, I agree. I still have some concerns however; mostly around
> the things that you're putting onto the Meta class.
>
> related_serializer, for example -- Why is this a single attribute in
> the meta, rather than a method? By using an attribute, you're saying
> that on any given serializer, *all* related objects will be serialised
> the same, and I don't see why that should be the case.

Not *all* related objects, only those that aren't declared in the class 
definition. I think the related_serializer attribute is useful when you 
want to serialize all related objects in one way: to their primary key 
value, to their natural key value, or to the dumpdata format. If you want 
to make an exception for some fields, then you declare them in the class 
definition.



class MySerializer(ModelSerializer):
    special_object = SpecialSerializer()

    class Meta:
        related_serializer = PkSerializer

In this case all related objects except special_object will be 
serialized to pk value.


What more would you do with a related_serializer method? If you want to 
serialize some related objects with one serializer and some with another, 
the simplest way to do it is to declare this in the class definition.
I see only two cases where a method would be needed: if you want to choose 
the serializer by some pattern in the field name, or by the related 
object type (m2m, fk). Then you can override the 
get_object_field_serializer(self, obj, field_name) method to do it. 
By default this method returns related_serializer or field_serializer 
based on the field type. Maybe it would be a good idea to split this 
method into two, one for related objects and one for non-related ones. 
Then overriding it would be very similar to setting an attribute in Meta, 
but I think attributes are more "declarative".

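Overriding get_object_field_serializer to pick a serializer by a pattern in the field name might look like the sketch below. Only the method name and its signature come from this message; the stub base class, the `_natural` suffix convention, `NaturalKeySerializer` and `Author` are assumptions, and related_serializer is kept as a plain class attribute here instead of living in Meta.

```python
class PkSerializer(object):
    def serialize(self, related):
        return related.pk


class NaturalKeySerializer(object):
    def serialize(self, related):
        return related.natural_key()


class ModelSerializer(object):
    """Stub with just the hook discussed above."""
    related_serializer = PkSerializer

    def get_object_field_serializer(self, obj, field_name):
        return self.related_serializer()


class MySerializer(ModelSerializer):
    def get_object_field_serializer(self, obj, field_name):
        # Choose by a pattern in the field name, else use the default.
        if field_name.endswith('_natural'):
            return NaturalKeySerializer()
        return super(MySerializer, self).get_object_field_serializer(
            obj, field_name)


class Author(object):
    pk = 7

    def natural_key(self):
        return ('piotr',)


serializer = MySerializer()
by_pk = serializer.get_object_field_serializer(
    None, 'author').serialize(Author())
by_key = serializer.get_object_field_serializer(
    None, 'author_natural').serialize(Author())
```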

> The same argument goes for class_name (which I think I've mentioned
> before), field_serializer, and so on.

And there is a method for that :)

def create_instance(self, serialized_obj):
    if self.opts.class_name is not None:
        if isinstance(self.opts.class_name, str):
            return _get_model(serialized_obj[self.opts.class_name])()
        else:
            return self.opts.class_name()
    raise base.DeserializationError(
        u"Can't resolve class for object creation")


Maybe this isn't the proper way to do it, since there are two ways of 
doing the same operation, but I think this is the simplest solution for 
the end user.



> The only fields that I can see
> that *should* be declarative are 'fields' and 'exclude' -- and if
> you've been tracking django-dev recently, there's been a discussion
> about whether the idea of 'exclude' should be deprecated from Django
> APIs (due to potential security issues -- explicit inclusion is safer
> than implicit inclusion, because you can accidentally forget to
> exclude sensitive data from an output list)

I have read this discussion. I'm +1 on deprecating 'exclude' :) Personally 
I almost never use it.




> Some other API questions:
>
> Why is deserialized_value decoupled from set_object? It isn't obvious
> to me why this separation exists.

It's possible that I overcomplicated this. There are three methods: 
set_object, deserialize and deserialize_value. When you want to 
deserialize an object, you should:
* Ensure that it is a single object, not a list of objects or a dict 
(dicts in deserialization are another problem, which I describe below). 
The 'deserialize' method handles this: it recursively deserializes 
lists and dicts.
* Do some processing on the object you get (e.g. change a string to an 
int). The 'deserialize_value' method handles this.
* Set this object on the upper-level object. The 'set_object' method 
handles this. There shouldn't be a reason to override it very often.


I think deserialize_value will be the method that users most often 
need to override.
I would be willing to merge deserialize and deserialize_value, but 
set_object should be left as is.
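
How the three methods could cooperate is sketched below. The method names come from this message, but the signatures, the stub `Field` base class and the `IntegerField` and `Article` examples are assumptions, and only lists (not dicts) are handled in the recursive step.

```python
class Field(object):
    """Stub showing one possible split between the three methods."""

    def deserialize(self, data, instance, field_name):
        # Step 1: unpack lists so the next step only sees single values.
        if isinstance(data, list):
            value = [self.deserialize_value(item) for item in data]
        else:
            value = self.deserialize_value(data)
        # Step 3: attach the result to the enclosing object.
        self.set_object(value, instance, field_name)
        return instance

    def deserialize_value(self, data):
        # Step 2: per-value processing; the usual override point.
        return data

    def set_object(self, value, instance, field_name):
        setattr(instance, field_name, value)


class IntegerField(Field):
    def deserialize_value(self, data):
        return int(data)


class Article(object):
    pass


article = IntegerField().deserialize('42', Article(), 'views')
IntegerField().deserialize(['1', '2'], article, 'ratings')
```

In this arrangement a subclass that only changes the conversion never has to touch the traversal or the assignment, which is the argument for keeping set_object separate.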


The problem with deserializing dicts:
In the current implementation there is no way to tell during 
deserialization whether a given dict is a serialized object or a dict of 
objects. So it might be better not to serialize dicts automatically, but 
to leave that decision to the user?
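
A tiny illustration of the ambiguity, using a made-up `Tag` class and a made-up `serialize` helper that handles dicts automatically:

```python
class Tag(object):
    def __init__(self, name):
        self.name = name


def serialize(value):
    if isinstance(value, Tag):
        return {'name': value.name}
    if isinstance(value, dict):
        return {key: serialize(item) for key, item in value.items()}
    return value


from_object = serialize(Tag('web'))        # a serialized object
from_mapping = serialize({'name': 'web'})  # a plain dict of values
```

Both calls produce the same native structure, so a deserializer that receives it has nothing to tell the two cases apart unless the user says which one to expect.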


  
> I see where you're going with metainfo on fields (and that's a
> reasonably elegant way of tackling the problem of XML needing
> additional info to serialize), but what is the purpose of metadata on
> Serializers?
>
> Yours, Russ Magee %-)


Because a Serializer should also be able to give additional info 
to the format serializer, for example which fields should be treated as 
attributes (pk and model in dumpdata).



--
Piotr Grabowski


Re: Customizable Serialization check-in

2012-07-11 Thread Russell Keith-Magee
On Wed, Jul 11, 2012 at 8:18 AM, Piotr Grabowski  wrote:
> Hi,
>
> It is time to midterm evaluation of my participation in gsoc so I want to
> summarize in this check-in what I have done in last month.
> https://gist.github.com/3085250 - here is something that can be
> "documentation". I wrote some examples of ModelSerializer usage and how it
> should work.
> https://github.com/grapo/django - in branch soc2012-serialization is code
> that I wrote.

It's good that you're starting to work on some documentation -- my
feedback is that you need to think about the purpose of this
documentation -- I can discover the API myself with Python's
interactive shell; what that won't tell me is what output I will
expect.

For example, you give an example of how to define a 'metadata'
method, but you don't show the effect of adding that declaration on
the output serialised object. In fact, there doesn't seem to be a
single example of serialised *output* in the whole docs.

Giving lots of code examples of input doesn't really help me unless I
know how that input will shape the output. This is especially
important when we're dealing with serializers.

> There is still problem with API and how to do some things but in my opinion
> it's going in right direction.

Generally, I agree. I still have some concerns however; mostly around
the things that you're putting onto the Meta class.

related_serializer, for example -- Why is this a single attribute in
the meta, rather than a method? By using an attribute, you're saying
that on any given serializer, *all* related objects will be serialised
the same, and I don't see why that should be the case.

The same argument goes for class_name (which I think I've mentioned
before), field_serializer, and so on. The only fields that I can see
that *should* be declarative are 'fields' and 'exclude' -- and if
you've been tracking django-dev recently, there's been a discussion
about whether the idea of 'exclude' should be deprecated from Django
APIs (due to potential security issues -- explicit inclusion is safer
than implicit inclusion, because you can accidentally forget to
exclude sensitive data from an output list)

Some other API questions:

Why is deserialized_value decoupled from set_object? It isn't obvious
to me why this separation exists.

I see where you're going with metainfo on fields (and that's a
reasonably elegant way of tackling the problem of XML needing
additional info to serialize), but what is the purpose of metadata on
Serializers?

> Serialization and deserialization of Python objects is almost done. There is
> quite stable API, i used some ideas (and little code) from
> https://github.com/tomchristie/django-serializers
> Objects are serialized to metadicts which are dicts with additional data.
> this additional data can be used by format serializer to change presentation
> of data (e.g. attributes in xml)
>
> Serialization of Django models is started. I don't know what fields of model
> should be serialized by default: for sure all declared in model fields. What
> with pk field, reverse related fields?

Your goal here should be to exactly replicate Django's existing
serializers. That means serialising all local model fields, with the
PK being handled as a special case; reverse related fields aren't
included.
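
For reference, the shape being replicated is the one used by Django's JSON fixtures: a list of records, each with "model", "pk" and "fields". The sketch below builds such a record by hand without Django; the `dumpdata_like` helper and the `blog.article` sample are made up for illustration.

```python
import json


def dumpdata_like(model_label, pk, local_fields):
    """Build one record in the shape Django's JSON fixtures use."""
    # The pk sits next to "fields", not inside it; a foreign key appears
    # in "fields" as the related object's pk; reverse relations are absent.
    return {'model': model_label, 'pk': pk, 'fields': local_fields}


record = dumpdata_like('blog.article', 1, {'title': 'Hello', 'author': 3})
fixture = json.dumps([record], indent=2, sort_keys=True)
```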

> Json dumpdata serializer is more or less written - I have not done fields
> sorting yet.
>
> I am sure that I can finish all this work until gsoc end.
>
> Sadly not all is going well. Especially my communication in this list and
> with my mentor should be improved. It's all by my fault. I should wrote
> check-ins more regularly and meet the deadlines that I set. I am not very
> satisfied with progress I have made. It can be done much more in about one
> and a half month.

My sincere apologies for not responding as often as I should. I
haven't been a very good mentor for this project. I'll try and improve
for the second half of the GSoC.

I can see you've been getting some feedback from Tom Christie; the
good news is that I'm generally in agreement with the feedback he's
been giving you, so he hasn't been leading you astray :-)

If you ever want to get my attention for a solid block of time to kick
around an idea, you can always grab me on IRC. I lurk in #django-dev
most of the time.

Yours,
Russ Magee %-)




Re: Customizable Serialization check-in

2012-07-10 Thread Piotr Grabowski

Hi,

It is time for the midterm evaluation of my participation in GSoC, so I 
want to summarize in this check-in what I have done in the last month.
https://gist.github.com/3085250 - here is something that can serve as 
"documentation". I wrote some examples of ModelSerializer usage and how 
it should work.
https://github.com/grapo/django - the code that I wrote is in the 
soc2012-serialization branch.


There are still problems with the API and with how to do some things, but 
in my opinion it's going in the right direction.


Serialization and deserialization of Python objects is almost done. 
The API is quite stable; I used some ideas (and a little code) from 
https://github.com/tomchristie/django-serializers
Objects are serialized to metadicts, which are dicts with additional 
data. This additional data can be used by the format serializer to change 
the presentation of the data (e.g. attributes in xml).


Serialization of Django models is started. I don't know which fields of a 
model should be serialized by default: for sure all fields declared in 
the model. What about the pk field and reverse related fields?


The JSON dumpdata serializer is more or less written; I have not done 
field sorting yet.


I am sure that I can finish all this work before GSoC ends.

Sadly, not all is going well. In particular, my communication on this 
list and with my mentor should be improved. It's all my fault. I should 
write check-ins more regularly and meet the deadlines that I set. I am 
not very satisfied with the progress I have made. Much more can be done 
in about one and a half months.


Regards,
Piotr Grabowski








Re: Customizable Serialization check-in

2012-06-28 Thread Piotr Grabowski

On 26.06.2012 11:52, Tom Christie wrote:

> > It is the way I am doing deserialization - pass instance to subfields
>
> Seems fine.  It's worth keeping in mind that there's two ways around 
> of doing this.
>
> 1. Create an empty instance first, then populate it with the field 
> values in turn.
> 2. Populate a dictionary with the field values first, and then create 
> an instance using those values.
>
> The current deserialization does something closer to the second.
> I don't know if there's any issues with doing things the other way 
> around, but you'll want to consider which makes more sense.


The second approach assumes that every field returns some value. But what 
if we don't want to deserialize some field? In my deserialization the 
instance is passed to the field, and the field will eventually fill it 
with some value.

def deserialize_value(self, obj, instance, field_name):
    setattr(instance, field_name, obj)

If we don't want to deserialize a field, we simply do nothing in 
deserialize_value.
If the second approach is used, we must return a value. One idea is to 
mark the field as not deserializable:

class MyField(Field):
    deserializable = False
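
The two approaches can be compared side by side in a small sketch. `deserialize_value` and the `deserializable` flag come from this message; `to_python`, `ReadOnlyField`, `Article` and the sample data are assumptions for illustration.

```python
class Field(object):
    deserializable = True

    def deserialize_value(self, obj, instance, field_name):
        # Approach 1: the field writes straight into the instance.
        setattr(instance, field_name, obj)

    def to_python(self, obj):
        # Approach 2: the field returns a value for the caller to collect.
        return obj


class ReadOnlyField(Field):
    deserializable = False

    def deserialize_value(self, obj, instance, field_name):
        pass  # approach 1: skipping a field means doing nothing


class Article(object):
    def __init__(self, title=None, views=0):
        self.title = title
        self.views = views


fields = {'title': Field(), 'views': ReadOnlyField()}
data = {'title': 'Hello', 'views': 99}

# 1. Create an empty instance, then let every field fill it in.
first = Article()
for name, field in fields.items():
    field.deserialize_value(data[name], first, name)

# 2. Collect values in a dictionary, then create the instance from it.
kwargs = {name: field.to_python(data[name])
          for name, field in fields.items() if field.deserializable}
second = Article(**kwargs)
```

Both routes leave `views` at its default. The difference is where the decision lives: inside the field's method in the first case, in a flag checked by the caller in the second.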


> > Where I returned (native, attributes) I will return (native, 
> > metainfo). It's only matter of renaming but metainfo will be more than 
> > attributes.
>
> Again, there's two main ways around I can think of for populating 
> metadata such as xml attributes.
>
> 1. Return the metadata upfront to the renderer.
> 2. Include some way for the renderer to get whatever metadata it needs 
> at the point it's needed.
>
> This is one point where what I'm doing in django-serializers differs 
> from your work, in that rather than return extra metadata upfront, the 
> serializers return a dictionary-like object (that e.g. can be directly 
> serialized to json or yaml), that also includes a way of returning the 
> fields for each key (so that e.g. the xml renderer can call 
> field.attributes() when it's rendering each field.)
>
> Again, you might decide that (1) makes more sense, but it's worth 
> considering.
>
> As ever, if there's any of this you'd like to talk over off-list, feel 
> free to drop me a mail - t...@tomchristie.com
>
> Regards,
>
>   Tom


I rewrote this so it's more similar to django-serializers.
But from the beginning: what did I do this week? :)
I agree that xml attributes are overemphasized in my solution, so I want 
to change that. Attributes in xml are one of (two) ways of presenting 
information. I still want to have fields for attributes, but done 
this way:


class MyField(Field):
    attr1 = Field()
    attr2 = Field()

    def serialized_value(self, obj, field_name):
        return field_value

    def metainfo(self):
        return {'attributes': ['attr1', 'attr2']}


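JSON will skip the attributes entirely:
some_field : field_value

XML will render it:

<some_field attr1="val1" attr2="val2">field_value</some_field>


If metainfo doesn't return a dict with attributes, XML will render this:

<some_field>
<attr1>val1</attr1>
<attr2>val2</attr2>
field_value
</some_field>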
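
The rendering rule described here (subfields named under 'attributes' in metainfo become XML attributes, the others become child elements) can be sketched with the standard library. The `render_field` helper is hypothetical, and placing the field's own value after the child elements is an assumption based on the order of the values above.

```python
import xml.etree.ElementTree as ET


def render_field(name, value, subfields, metainfo=None):
    """Render one field; names in metainfo['attributes'] become attributes."""
    attribute_names = (metainfo or {}).get('attributes', [])
    element = ET.Element(name)
    last_child = None
    for sub_name, sub_value in subfields.items():
        if sub_name in attribute_names:
            element.set(sub_name, str(sub_value))
        else:
            last_child = ET.SubElement(element, sub_name)
            last_child.text = str(sub_value)
    # The field's own value goes after the child elements, if there are any.
    if last_child is None:
        element.text = str(value)
    else:
        last_child.tail = str(value)
    return ET.tostring(element, encoding='unicode')


subfields = {'attr1': 'val1', 'attr2': 'val2'}
with_attrs = render_field('some_field', 'field_value', subfields,
                          {'attributes': ['attr1', 'attr2']})
without_attrs = render_field('some_field', 'field_value', subfields)
```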


I coded it like django-serializers's DictWithMeta, but I added one more 
piece of functionality: representing a Field that has subfields and one 
extra value. I'm still not convinced it is a good solution, so I rewrote 
it several times, but always ended up with something like this :)

I will push the code tomorrow because I still want to do some tweaks.

--
Piotr Grabowski









Re: Customizable Serialization check-in

2012-06-26 Thread Tom Christie
> default deserialized_value don't returns anything. It sets the field 
> value.

Cool, that's exactly what I meant.

> but declaring function only to say "do nothing" isn't good solution for 
> me.

 Declaring a method to simply 'pass' seems fine to me if you want to 
override it to do nothing.

> It is the way I am doing deserialization - pass instance to subfields

Seems fine.  It's worth keeping in mind that there's two ways around of 
doing this.

1. Create an empty instance first, then populate it with the field values 
in turn.
2. Populate a dictionary with the field values first, and then create an 
instance using those values.

The current deserialization does something closer to the second.
I don't know if there's any issues with doing things the other way around, 
but you'll want to consider which makes more sense.

> Where I returned (native, attributes) I will return (native, metainfo). 
> It's only matter of renaming but metainfo will be more than attributes.

Again, there's two main ways around I can think of for populating metadata 
such as xml attributes.

1. Return the metadata upfront to the renderer.
2. Include some way for the renderer to get whatever metadata it needs at 
the point it's needed.

This is one point where what I'm doing in django-serializers differs from 
your work, in that rather than return extra metadata upfront, the 
serializers return a dictionary-like object (that e.g. can be directly 
serialized to json or yaml), that also includes a way of returning the 
fields for each key (so that e.g. the xml renderer can call 
field.attributes() when it's rendering each field.)

Again, you might decide that (1) makes more sense, but it's worth 
considering.
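
To make option (2) concrete, here is a minimal sketch of a dictionary-like result that also remembers which field produced each key. It is not the django-serializers implementation; `DictWithFields`, `set_with_field` and `TitleField` are made-up names, and only the idea of the renderer calling field.attributes() comes from the text above.

```python
import json


class TitleField(object):
    """Hypothetical field that can describe XML attributes for its value."""

    def attributes(self):
        return {'type': 'CharField'}


class DictWithFields(dict):
    """A dict that remembers which field produced each key."""

    def __init__(self, *args, **kwargs):
        super(DictWithFields, self).__init__(*args, **kwargs)
        self.fields = {}

    def set_with_field(self, key, value, field):
        self[key] = value
        self.fields[key] = field


data = DictWithFields()
data.set_with_field('title', 'Hello', TitleField())

# A json renderer can ignore the extra information entirely...
as_json = json.dumps(data)
# ...while an xml renderer can ask the field at the point it needs to.
xml_attrs = data.fields['title'].attributes()
```

Because the object is a real dict subclass, json and yaml encoders accept it unchanged, and nothing format-specific has to be computed unless a renderer asks for it.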

As ever, if there's any of this you'd like to talk over off-list, feel free 
to drop me a mail - t...@tomchristie.com

Regards,

  Tom

On Wednesday, 20 June 2012 16:28:51 UTC+1, Piotr Grabowski wrote:
>
>  On 20.06.2012 13:50, Tom Christie wrote:
>  
>
> > deserialized_value function with empty content
>
> Are you asking about how to be able to differentiate between a field that 
> deserializes to `None`, and a field that doesn't deserialize a value at 
> all? 
>
> No :) I had this problem before and I managed to resolve it - default 
> deserialized_value don't returns anything. It sets the field value.
> def deserialized_value(self, obj, instance, field_name):
>     setattr(instance, field_name, obj)
>
> It is the way I am doing deserialization - pass instance to subfields, 
> retrieve it from them (should be same instance, but in specific cases eg. 
> immutable instance, I can imagine that another instance of same class is 
> returned)  and return it.
>
> If I don't declare deserialized_value function then function from base 
> class is taken. It's expected behavior. So how to say "This field shouldn't 
> be deserialized".  Now I declare:
> def deserialized_value(self, obj, instance, field_name):
>     pass
> For true, I can do anything in this function excepting set some value to 
> instance, but declaring function only to say "do nothing"  isn't good 
> solution for me.
>
>  
> > I changed python datatype format returned from serializer.serialize 
> method.  Now it is tuple (native, attributes)
>
>  I'm not very keen on either this, or on the way that attributes are 
> represented as fields.
> To me this looks like taking the particular requirements of serializing to 
> xml, and baking them deep into the API, rather than treating them as a 
> special case, and dealing with them in a more decoupled and extensible way.
>
>  For example, I'd rather see an optional method `attributes` on the 
> `Field` class that returns a dictionary of attributes.  You'd then make 
> sure that when you serialize into the native python datatypes prior to 
> rendering, you also have some way of passing through the original Field 
> instances to the renderer in order to provide any additional metadata that 
> might be required in rendering the basic structure.
>
>  Wiring up things this way around lets you support other formats that 
> have extra information attached to the basic structure of the data.  As an 
> example use-case - In addition to json, yaml and xml, a developer might 
> also want to be able to serialize to say, a tabular HTML output.  In order 
> to do this they might need to be able attach template_name or widget 
> information to a field, that'd only be used if rendering to HTML.
>
>  It might be that it's a bit late in the day for API changes like that, 
> but hopefully it at least makes clear why I think that treating XML 
> attributes as anything other than a special case isn't quite the right 
> thing to do.  - Just my personal opinion of course :)
>  
>  
>  Regards,
>
>Tom
>
>  
> You right that I shouldn't treated attributes so special. I have idea how 
> to fix this. Where I returned (native, attributes) I will return (native, 
> metainfo). It's only matter of renaming but metainfo will be more t

Re: Customizable Serialization check-in

2012-06-20 Thread Piotr Grabowski

On 20.06.2012 13:50, Tom Christie wrote:


> > deserialized_value function with empty content
>
> Are you asking about how to be able to differentiate between a field
> that deserializes to `None`, and a field that doesn't deserialize a
> value at all?

No :) I had this problem before and I managed to resolve it - the default
deserialized_value doesn't return anything. It sets the field value.

def deserialized_value(self, obj, instance, field_name):
    setattr(instance, field_name, obj)

This is how I do deserialization - pass the instance to the subfields,
get it back from them (it should be the same instance, but in specific
cases, e.g. an immutable instance, I can imagine that another instance of
the same class is returned) and return it.


If I don't declare a deserialized_value function then the function from
the base class is used. That's expected behavior. So how do I say "This
field shouldn't be deserialized"?  Right now I declare:

def deserialized_value(self, obj, instance, field_name):
    pass

In fact, I can do anything in this function except set a value on the
instance, but declaring a function only to say "do nothing" isn't a good
solution for me.




> > I changed python datatype format returned from serializer.serialize
> > method.  Now it is tuple (native, attributes)
>
> I'm not very keen on either this, or on the way that attributes are
> represented as fields.
> To me this looks like taking the particular requirements of
> serializing to xml, and baking them deep into the API, rather than
> treating them as a special case, and dealing with them in a more
> decoupled and extensible way.
>
> For example, I'd rather see an optional method `attributes` on the
> `Field` class that returns a dictionary of attributes.  You'd then
> make sure that when you serialize into the native python datatypes
> prior to rendering, you also have some way of passing through the
> original Field instances to the renderer in order to provide any
> additional metadata that might be required in rendering the basic
> structure.
>
> Wiring up things this way around lets you support other formats that
> have extra information attached to the basic structure of the data.
> As an example use-case - In addition to json, yaml and xml, a
> developer might also want to be able to serialize to say, a tabular
> HTML output.  In order to do this they might need to be able attach
> template_name or widget information to a field, that'd only be used if
> rendering to HTML.
>
> It might be that it's a bit late in the day for API changes like that,
> but hopefully it at least makes clear why I think that treating XML
> attributes as anything other than a special case isn't quite the right
> thing to do.  - Just my personal opinion of course :)
>
> Regards,
>
>   Tom



You're right that I shouldn't have treated attributes as so special. I
have an idea how to fix this. Where I returned (native, attributes) I will
return (native, metainfo). It's only a matter of renaming, but metainfo
will be more than attributes. In xml, metainfo can contain attributes for
a field; in html it can be template_name or a widget for rendering. If I
don't use metainfo in my serializer class then it's still universal - it
can be used for serialization to any format.


How to create metainfo? Having a method `metainfo` on the `Field` class
that returns a dictionary seems to be a good idea, and it fits the html
use case. But what to do with xml attributes again? :) They aren't only
field meta information; they can also contain instance information that
is valuable in deserialization (like the instance pk in the current django
solution), so they should be treated as fields and should have access to
the instance in serialization and deserialization.


My last thought is that attributes should be treated as normal fields
and live in the tuple's native object, while metainfo tells the xml
format which fields in native should be rendered as attributes.

After first phase:
native == {
    'field_1': value1,
    'field_2': value2,
    'field_3': value3,
}
metainfo == {
    'as_attributes': ['field_2', 'field_3'],
    'template_name': 'my_template'
}

So if we use json in the second phase, field_2 and field_3 will be
rendered the same way as field_1, because json doesn't read metainfo. Xml
will render fields according to metainfo['as_attributes']. Html will
render the native dict using my_template.
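
A minimal sketch of how the second phase could treat such a pair, assuming
two hypothetical renderer functions (`render_json` and `render_xml` are
names invented here, and the html case is left out):

```python
import json
import xml.etree.ElementTree as ET

native = {"field_1": "a", "field_2": "b", "field_3": "c"}
metainfo = {"as_attributes": ["field_2", "field_3"]}


def render_json(native, metainfo):
    # JSON has no notion of attributes, so metainfo is not consulted.
    return json.dumps(native, sort_keys=True)


def render_xml(native, metainfo, tag="object"):
    # Fields named in metainfo['as_attributes'] become XML attributes;
    # the remaining fields become child elements.
    as_attributes = set(metainfo.get("as_attributes", []))
    root = ET.Element(tag)
    for name in sorted(native):
        if name in as_attributes:
            root.set(name, str(native[name]))
        else:
            ET.SubElement(root, name).text = str(native[name])
    return ET.tostring(root, encoding="unicode")


json_out = render_json(native, metainfo)
xml_out = render_xml(native, metainfo)
```

The same native dict goes to both renderers; only the xml one reads
metainfo, which keeps the first phase independent of the output format.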


--
Piotr Grabowski


On Tuesday, 19 June 2012 21:48:37 UTC+1, Piotr Grabowski wrote:

Hi!

This week I wrote simple serialization and deserialization for
json format so it's possible now to encode objects from and to json:


import django.core.serializers as s

class Foo(object):
    def __init__(self):
        self.bar = [Bar(), Bar(), Bar()]
        self.x = "X"

class Bar(object):
    def __init__(self):
        self.six = 6

class MyField2(s.Field):
    def deserialized_value(self, obj, instance, field_name):
        pass

class MyField(s.Field):
    x = MyField2(label="my_attribut

Re: Customizable Serialization check-in

2012-06-20 Thread Tom Christie
> if I put list in input I want list in output, not generator

I wouldn't worry about that.  The input and output should be *comparable*, 
but it doesn't mean they should be *identical*.
A couple of cases for example:

*) You should be able to pass both lists and generator expressions to a 
given serializer, but they'll end up with the same representation - there's 
no way to distinguish between the two cases and deserialize accordingly. 
*) Assuming you're going to maintain backwards compatibility, model
instances will be deserialized into
django.core.serializers.base.DeserializedObject instances, rather than
deserializing directly back into complete model instances.  These match up
with the original serialized instances, but they are not identical objects.

> deserialized_value function with empty content

Are you asking about how to be able to differentiate between a field that 
deserializes to `None`, and a field that doesn't deserialize a value at 
all?  I'd suggest that the deserialization hook for a field needs to take 
eg. the dictionary that the value should be deserialized into, then it can 
determine which key to deserialize the field into, or simply 'pass' if it 
doesn't deserialize a value.
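
A small sketch of that suggestion, assuming a hypothetical hook called
`deserialize_into` (the name and signature are made up for this example):

```python
class Field(object):
    """Hypothetical base field: writes its value into the target dict."""
    def deserialize_into(self, target, field_name, value):
        target[field_name] = value


class ReadOnlyField(Field):
    """A field that opts out of deserialization by writing nothing."""
    def deserialize_into(self, target, field_name, value):
        pass


fields = {"title": Field(), "computed": ReadOnlyField(), "note": Field()}
incoming = {"title": "Hello", "computed": 42, "note": None}

target = {}
for name, field in fields.items():
    field.deserialize_into(target, name, incoming[name])
```

After the loop, a field that deserialized to `None` is present in `target`
with the value `None`, while a field that did not deserialize anything is
simply missing, so the two cases stay distinguishable.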

> I changed python datatype format returned from serializer.serialize
> method.  Now it is tuple (native, attributes)

I'm not very keen on either this, or on the way that attributes are 
represented as fields.
To me this looks like taking the particular requirements of serializing to 
xml, and baking them deep into the API, rather than treating them as a 
special case, and dealing with them in a more decoupled and extensible way.

For example, I'd rather see an optional method `attributes` on the `Field` 
class that returns a dictionary of attributes.  You'd then make sure that 
when you serialize into the native python datatypes prior to rendering, you 
also have some way of passing through the original Field instances to the 
renderer in order to provide any additional metadata that might be required 
in rendering the basic structure.

Wiring up things this way around lets you support other formats that have 
extra information attached to the basic structure of the data.  As an 
example use-case - In addition to json, yaml and xml, a developer might 
also want to be able to serialize to say, a tabular HTML output.  In order 
to do this they might need to be able attach template_name or widget 
information to a field, that'd only be used if rendering to HTML.

It might be that it's a bit late in the day for API changes like that, but 
hopefully it at least makes clear why I think that treating XML attributes 
as anything other than a special case isn't quite the right thing to do.  - 
Just my personal opinion of course :)

Regards,

  Tom


On Tuesday, 19 June 2012 21:48:37 UTC+1, Piotr Grabowski wrote:
>
>  Hi!
>
> This week I wrote simple serialization and deserialization for json format 
> so it's possible now to encode objects from and to json:
>
>
> import django.core.serializers as s
>
> class Foo(object):
>     def __init__(self):
>         self.bar = [Bar(), Bar(), Bar()]
>         self.x = "X"
>
> class Bar(object):
>     def __init__(self):
>         self.six = 6
>
> class MyField2(s.Field):
>     def deserialized_value(self, obj, instance, field_name):
>         pass
>
> class MyField(s.Field):
>     x = MyField2(label="my_attribute", attribute=True)
>
>     def serialized_value(self, obj, field_name):
>         return getattr(obj, field_name, "No field like this")
>
>     def deserialized_value(self, obj, instance, field_name):
>         pass
>
> class BarSerializer(s.ObjectSerializer):
>     class Meta:
>         class_name = Bar
>
> class FooSerializer(s.ObjectSerializer):
>     my_field = MyField(label="MYFIELD")
>     bar = BarSerializer()
>     class Meta:
>         class_name = Foo
>
>
> foos = [Foo(), Foo(), Foo()]
> ser = s.serialize('json', foos, serializer=FooSerializer, indent=4)
> new_foos = s.deserialize('json', ser, deserializer=FooSerializer)
>
>
> There are cases that I don't like:
>
>- deserialized_value function with empty content - what to do with 
>fields that we don't want to deserialize. Should be better way to handle 
>this, 
>- I put list foos but return generator new_foos, also bar in Foo 
>object is generator, not list like in input. Generators are better for 
>performance but if I put list in input I want list in output, not 
>generator. I don't know what to do with this. 
>
>
> Next week I will handle rest of issues that I mentioned in my last week 
> check-in and refactor json format (de)serialization - usage of streams and 
> proper parameters handling (like indent, etc.)
>  
> --
> Piotr Grabowski
>  
>
>
> 


Re: Customizable Serialization check-in

2012-06-19 Thread Piotr Grabowski

Hi!

This week I wrote simple serialization and deserialization for the json
format, so it's now possible to encode objects to json and decode them
back:



import django.core.serializers as s

class Foo(object):
    def __init__(self):
        self.bar = [Bar(), Bar(), Bar()]
        self.x = "X"

class Bar(object):
    def __init__(self):
        self.six = 6

class MyField2(s.Field):
    def deserialized_value(self, obj, instance, field_name):
        pass

class MyField(s.Field):
    x = MyField2(label="my_attribute", attribute=True)

    def serialized_value(self, obj, field_name):
        return getattr(obj, field_name, "No field like this")

    def deserialized_value(self, obj, instance, field_name):
        pass

class BarSerializer(s.ObjectSerializer):
    class Meta:
        class_name = Bar

class FooSerializer(s.ObjectSerializer):
    my_field = MyField(label="MYFIELD")
    bar = BarSerializer()
    class Meta:
        class_name = Foo


foos = [Foo(), Foo(), Foo()]
ser = s.serialize('json', foos, serializer=FooSerializer, indent=4)
new_foos = s.deserialize('json', ser, deserializer=FooSerializer)


There are cases that I don't like:

 * deserialized_value function with empty content - what to do with
   fields that we don't want to deserialize? There should be a better
   way to handle this,
 * I pass in the list foos but get back the generator new_foos; bar in
   the Foo object is also a generator, not a list like in the input.
   Generators are better for performance, but if I put a list in I want
   a list out, not a generator. I don't know what to do with this.


Next week I will handle the rest of the issues that I mentioned in last
week's check-in and refactor json format (de)serialization - usage of
streams and proper parameter handling (like indent, etc.)


--
Piotr Grabowski




--
You received this message because you are subscribed to the Google Groups "Django 
developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: Customizable Serialization check-in

2012-06-11 Thread Piotr Grabowski

Hi!

This week I managed to write deserialization functions and tests.

*Issues with deserialization*
Working on deserialization gave me a lot of thoughts about previous
concepts. I rewrote the Field class so now a Field can't be nested. A
Field can only have subfields if the subfields are attributes:

class ContentField(Field):
    title = Field(attribute=True)  # valid
    content = Field()  # invalid -> raise exception in class declaration time

    def serialized_value(...):
        ...

Of course, if ContentField is initialized as an attribute and has
subfields, an exception is raised (when ContentField is initialized)
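
Both checks can be sketched roughly as below. This is not the code from the
branch, which copies the metaclass approach from Django forms; as a swapped-in
technique it uses `__init_subclass__` (Python 3.6+), and the `_subfields`
name and the error messages are assumptions.

```python
class Field(object):
    """Hypothetical field base class, not the one from the branch."""
    _subfields = {}

    def __init__(self, attribute=False):
        # A field that already declares subfields cannot itself be an attribute.
        if attribute and self._subfields:
            raise TypeError("%s has subfields and cannot be used as an attribute"
                            % type(self).__name__)
        self.attribute = attribute

    def __init_subclass__(cls, **kwargs):
        # Runs once the subclass body has been evaluated, so an error raised
        # here surfaces at class declaration time.
        super().__init_subclass__(**kwargs)
        declared = {name: value for name, value in vars(cls).items()
                    if isinstance(value, Field)}
        for name, value in declared.items():
            if not value.attribute:
                raise TypeError("%s.%s must be declared with attribute=True"
                                % (cls.__name__, name))
        subfields = dict(cls._subfields)  # inherited subfields, if any
        subfields.update(declared)
        cls._subfields = subfields


class ContentField(Field):
    title = Field(attribute=True)   # valid

try:
    class BrokenField(Field):
        content = Field()           # invalid: subfield is not an attribute
except TypeError as exc:
    declaration_error = str(exc)

try:
    ContentField(attribute=True)    # invalid: ContentField has subfields
except TypeError as exc:
    init_error = str(exc)
```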


I changed the python datatype format returned from the
serializer.serialize method. Previously it was a dict with serialized
fields (label or field name as key) and a special key __attributes__ with
a dict of attributes. Now it is a tuple (native, attributes) where native
is a dict with serialized fields (or a generator of dicts)


serializer.deserialize always returns an object instance

After the first phase of serialization, python_serialized_object will be
serialized by a NativeFormat instance. Each format (json, xml, yaml, ...)
has one NativeFormat that will translate python_serialized_object to
serialized_string. I want to be able to do this:
object
  -> python_serial = object_serializer.serialize(object)
  -> string_serial = native_format.serialize(python_serial)
  -> python_deserial = native_format.deserialize(string_serial)
  -> object2 = object_serializer.deserialize(python_deserial)

object2 has same content as object
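
The intended round trip can be sketched as follows, with json standing in
for the format. `Point`, `JsonFormat` and this tiny `ObjectSerializer` are
hypothetical stand-ins for illustration, not the classes from the branch.

```python
import json


class Point(object):
    """Hypothetical object to round-trip."""
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


class ObjectSerializer(object):
    # Phase 1: object <-> native Python datatypes.
    def __init__(self, cls):
        self.cls = cls

    def serialize(self, obj):
        return dict(vars(obj))

    def deserialize(self, native):
        return self.cls(**native)


class JsonFormat(object):
    # Phase 2: native Python datatypes <-> string.
    def serialize(self, native):
        return json.dumps(native)

    def deserialize(self, text):
        return json.loads(text)


object_serializer = ObjectSerializer(Point)
native_format = JsonFormat()

obj = Point(1, 2)
python_serial = object_serializer.serialize(obj)
string_serial = native_format.serialize(python_serial)
python_deserial = native_format.deserialize(string_serial)
obj2 = object_serializer.deserialize(python_deserial)
```

Each phase only sees the output of the other through plain dicts, so a new
format would only need its own pair of serialize/deserialize methods.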

Now I have:
object
  -> python_serial = object_serializer.serialize(object)
  -> object2 = object_serializer.deserialize(python_serial)


*Tests*
I wrote some tests (NativeSerializersTests) for ObjectSerializer in
django/tests/modeltests/serializers/tests.py but I'm not sure this is a
good place for them. I used a model (Article) defined in models.py but I
used it like a normal object. Relation fields aren't serialized properly.


So far I have tested the most important functions of ObjectSerializer:
creating custom fields, attributes, and renaming fields (using labels).


Next I want to resolve issues with:

 * Instance creation when deserializing. I have the create_instance
   method and Meta.class_name. I must make a public API out of them.
 * Ensuring that the Field serialize method always returns simple native
   python datatypes
 * Writing a NativeFormat for (at least) json
 * Finding better names for already defined classes, methods and files
 * More tests and documentation

When I do this, serialization and deserialization will be more or less
done for (non model) python objects.



--
Piotr Grabowski








Re: Customizable Serialization check-in

2012-06-04 Thread Piotr Grabowski

Hi,

Sorry for being late with the weekly update. Due to some issues with Meta
and my wrong understanding of metaclasses that Russell pointed out, I
spent time improving my knowledge of this. I also rewrote some parts of
the code that I had written the week before.
This week I will do what I was supposed to do last week - initial tests
and documentation. After this week serialization should work with simple
objects.



--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-05-30 Thread Piotr Grabowski

On 29.05.2012 02:28, Russell Keith-Magee wrote:

> Hi Piotr;
>
> Apologies for the delay in responding to your updated API.
>
> On Tue, May 22, 2012 at 6:59 AM, Piotr Grabowski  wrote:
> > I do some changes to my previous API: (https://gist.github.com/2597306 <-
> > change are included)
> >
> >  * which fields of object are default serialized. It's depend on
> > include_default_field but opposite to Tom Christie solution it's default
> > value is True so all fields (eventually specified in Meta.model_fields) are
> > present
>
> Field options:
> ~~~~~~~~~~~~~~
>
>  * There's a complication here that doesn't make sense to me.
> Following your syntax, the following would appear to be legal:
>
> class FieldA(Field):
>     def serialize(…):
>     def deserialize(…):
>
> class FieldB(Field):
>     to = FieldA()
>
>     def serialize(…):
>     def deserialize(…):
>
> class FieldC(Field):
>     to = FieldB(attribute=True)
>
>     def serialize(…):
>     def deserialize(…):
>
> i.e., if Field allows declaration style definitions, and Field can be
> *used* in declaration style definitions, then it's possible to define
> them in a nested fashion -- at which point, it isn't clear to me what
> is going to be output.
>
> It seems to me that "attribute" shouldn't be an option on a field
> declaration; it should either be something that's encompassed in a
> similar way to serialise/deserialize (i.e., either additional
> input/output from the serialise methods, or a parallel pair of
> methods), or the use of a Field as a declarative definition implies
> that it is of type attribute, and prevents the use of field types that
> themselves have attributes.

In the example that you present I thought about raising an exception when
FieldC is defined. Another option is to define the class as being an
attribute:

class FieldB(Field):
    to = FieldA()

    def serialize(…):
    def deserialize(…):

    class Meta:
        attribute = True

Then raise an exception when FieldB is defined because of the 'to' field.
Still, one of my principles is to have one Serializer for all formats (or
at least the possibility to serialize a Serializer in each format), and
attribute is something really problematic.

About the value returned by Field.serialize (Serializer.serialize in
general) - now it is a dict with the key __attribute__; maybe it would be
better to return a tuple (dict/field_value, attributes_dict) because of
issues if there is no field_name and attributes are present.





> Field methods:
> ~~~~~~~~~~~~~~
>
>  * serialize_value(), deserialize_value(); this is bike shedding, but
> is there any reason to not use just "serialize() and deserialize()"?

I'm using serialize and deserialize in my code.
Serializer.serialize(...) returns a native python datatype. It's a matter
of naming, but in my opinion serialize is a method that should return the
serialized Field/ObjectSerializer, not only part of the result
(serialized_value returns only part of the data needed for Field
serialization)




> ObjectSerializer methods:
>
>  * Why does ObjectSerializer have options at all? How can it be "meta"
> operating on a generic object? Consider -- if you pass in an instance
> of an object, you'll need to use obj.field_name to access fields; if
> you pass in a dictionary, you'll need to use obj['field_name']. And if
> you're given a generic object what's the list of default fields to
> serialize?
>
> Like I said last time, ObjectSerializer should be completely
> definition based. Look at Django's Form base class - it has no "meta"
> concept -- it's fully declaration based. Then there's ModelForm, which
> has a meta class; but the output of the ModelForm could be completely
> manually generated using a base Form.

Ok, I think I finally get this idea. Before, I thought about class Meta
more as options for the class where it is declared. ObjectSerializer is
now more like ModelForm than like Form. I have an idea how to rewrite it
and I will let you know when it is done.

>  * I mentioned this last time -- why is class_name a meta option,
> rather than a method on the base class with a default implementation?
> Having it as an Meta attribute

I answered you last time; I should add this to the proposal. Probably I
don't understand the issue.


def get_class(self, data):
    if self._meta.class_name is not None:
        if isinstance(self._meta.class_name, str):
            return object_from_string(data[self._meta.class_name])
        else:
            return self._meta.class_name
    raise Exception('No class for deserialization provided')

If someone wants more sophisticated resolving of the class from data, they
can override get_class.


When I rewrite ObjectSerializer it will be different from this, but my
idea is to have class_name as a shortcut for writing the get_class method.
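
A rough sketch of that relationship, with `class_name` as a shortcut that
the default `get_class` reads and a subclass that overrides `get_class`
instead. The shape classes, the "kind" key and this simplified
`ObjectSerializer` are hypothetical, chosen only to show the override.

```python
class ObjectSerializer(object):
    """Simplified stand-in: class_name is only a shortcut for get_class."""
    class_name = None

    def get_class(self, data):
        if self.class_name is None:
            raise ValueError("No class for deserialization provided")
        return self.class_name

    def deserialize(self, data):
        instance = self.get_class(data)()
        for name, value in data.items():
            setattr(instance, name, value)
        return instance


class Circle(object):
    pass


class Square(object):
    pass


class ShapeSerializer(ObjectSerializer):
    # More sophisticated resolving: pick the class from the data itself.
    classes = {"circle": Circle, "square": Square}

    def get_class(self, data):
        return self.classes[data["kind"]]


shape = ShapeSerializer().deserialize({"kind": "circle", "radius": 2})
```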




>  * I'm not wild about the way related_serializer seems to work,
> either. Again, like class_name, it seems like it should be a method,
> not an option. By making it an option, you're assuming that it will
> have a single obvious value, which definitely won't be true -- e.g., I
> have an object with relations to users, groups and permissions; I want
> to output users as a li

Re: Customizable Serialization check-in

2012-05-28 Thread Russell Keith-Magee
Hi Piotr;

Apologies for the delay in responding to your updated API.

On Tue, May 22, 2012 at 6:59 AM, Piotr Grabowski  wrote:
> I do some changes to my previous API: (https://gist.github.com/2597306 <-
> change are included)
>
>  * which fields of object are default serialized. It's depend on
> include_default_field but opposite to Tom Christie solution it's default
> value is True so all fields (eventually specified in Meta.model_fields) are
> present

Field options:
~~~~~~~~~~~~~~

 * There's a complication here that doesn't make sense to me.
Following your syntax, the following would appear to be legal:

class FieldA(Field):
    def serialize(…):
    def deserialize(…):

class FieldB(Field):
    to = FieldA()

    def serialize(…):
    def deserialize(…):

class FieldC(Field):
    to = FieldB(attribute=True)

    def serialize(…):
    def deserialize(…):

i.e., if Field allows declaration style definitions, and Field can be
*used* in declaration style definitions, then it's possible to define
them in a nested fashion -- at which point, it isn't clear to me what
is going to be output.

It seems to me that "attribute" shouldn't be an option on a field
declaration; it should either be something that's encompassed in a
similar way to serialise/deserialize (i.e., either additional
input/output from the serialise methods, or a parallel pair of
methods), or the use of a Field as a declarative definition implies
that it is of type attribute, and prevents the use of field types that
themselves have attributes.

Field methods:
~~~~~~~~~~~~~~

 * serialize_value(), deserialize_value(); this is bike shedding, but
is there any reason to not use just "serialize() and deserialize()"?

ObjectSerializer methods:

 * Why does ObjectSerializer have options at all? How can it be "meta"
operating on a generic object? Consider -- if you pass in an instance
of an object, you'll need to use obj.field_name to access fields; if
you pass in a dictionary, you'll need to use obj['field_name']. And if
you're given a generic object what's the list of default fields to
serialize?

Like I said last time, ObjectSerializer should be completely
definition based. Look at Django's Form base class - it has no "meta"
concept -- it's fully declaration based. Then there's ModelForm, which
has a meta class; but the output of the ModelForm could be completely
manually generated using a base Form.

 * I mentioned this last time -- why is class_name a meta option,
rather than a method on the base class with a default implementation?
Having it as an Meta attribute

 * I'm not wild about the way related_serializer seems to work,
either. Again, like class_name, it seems like it should be a method,
not an option. By making it an option, you're assuming that it will
have a single obvious value, which definitely won't be true -- e.g., I
have an object with relations to users, groups and permissions; I want
to output users as a list of nested objects, permissions as a list of
natural keys, and groups as a list of primary keys.

 * I'm not sure I see why include_default_fields is needed. Isn't this
implied by the values for "fields" and "exclude"? i.e., if fields or
exclude is defined, you're not including everything by default;
otherwise you are. Why the additional setting? What's the interaction
of include_default_fields with fields and exclude?

 * I don't understand what follow_object is trying to do. Isn't the
issue here whether you use a serializer that just outputs a primary
key, or an object that outputs field values? And if it's the latter,
the sub-serializer determines how deep things go?
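
To illustrate the point about related objects being handled by a method
rather than a single option, here is a minimal sketch. `Record`,
`serialize_related` and `AccountSerializer` are hypothetical names, and
plain objects stand in for model instances.

```python
class Record(object):
    """Hypothetical stand-in for a related model instance."""
    def __init__(self, pk, name):
        self.pk = pk
        self.name = name

    def natural_key(self):
        return (self.name,)


class ObjectSerializer(object):
    """Hypothetical base class: related objects default to primary keys."""
    def serialize_related(self, field_name, related):
        return [obj.pk for obj in related]


class AccountSerializer(ObjectSerializer):
    def serialize_related(self, field_name, related):
        if field_name == "users":
            # Nested objects.
            return [{"pk": obj.pk, "name": obj.name} for obj in related]
        if field_name == "permissions":
            # Natural keys.
            return [obj.natural_key() for obj in related]
        # Everything else (here: groups) falls back to primary keys.
        return super(AccountSerializer, self).serialize_related(field_name, related)


relations = {
    "users": [Record(1, "alice")],
    "groups": [Record(7, "editors")],
    "permissions": [Record(3, "can_publish")],
}
serializer = AccountSerializer()
output = dict((name, serializer.serialize_related(name, objs))
              for name, objs in relations.items())
```

With a method, each relation can choose its own representation, which a
single related_serializer option cannot express.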

ModelSerializer options:

 * I'm really not a fan of model_fields. This seems like a short cut
that will make the implementation a whole lot more complex, and
ultimately is much less explicit than just naming the fields that you
want to serialize.

> I'm aware that there will be lot of small issues but I believe that ideas
> are good.

I'm still optimistic, but there's still some fundamental issues here
-- in particular, the existence of Meta on ObjectSerializer, and the
way that attributes on XML tags are being handled. I don't think we've
hit any blockers, but we need to get these sorted out before you start
producing too much code.

Yours,
Russ Magee %-)




Re: Customizable Serialization check-in

2012-05-27 Thread Anssi Kääriäinen
On May 27, 7:37 pm, Piotr Grabowski  wrote:
> Hi,
>
> This week I started coding my project. It's available on branch
> soc2012-serialization on https://github.com/grapo/django.
>
> I'm not very familiar with git so I'm not suer that I do it right:
> * I forked django repo from github
> * clone it to my computer
> * create new branch soc2012
> * work in this branch
> * push it to origin
>
> When I want to synchronize my branch with django trunk I will fetch
> master from upstream (django/django) and  merge master to my branch.
> It's this flow good?

I think that is a good way to go. It might be that the branch will need
some history rewriting when it is otherwise ready for commit, but
until then keeping your history intact so that others can easily
follow your work is good. One piece of advice I have seen is that you
should not merge upstream changes too often, as it will just mess up the
history. You can easily enough create another branch where you test how
your work interacts with the master branch. Only merge into your soc2012
branch if upstream changes are such that your work needs major changes
because of them. Trivial merge conflicts do not require merging upstream
back.

Another option is a rebase workflow for the branch, but in this case you
should make it absolutely clear that others should not consider your
github branch as anything other than a convenient way to publish
your work as a patch series. The good thing about this way of working is
that your changes will be on top of the commit log all the time, and
thus it is very easy to see what you have done in your branch.

 - Anssi




Re: Customizable Serialization check-in

2012-05-27 Thread Piotr Grabowski

Hi,

This week I started coding my project. It's available on branch
soc2012-serialization on https://github.com/grapo/django.


I'm not very familiar with git so I'm not sure that I am doing it right:
* I forked the django repo on github
* cloned it to my computer
* created a new branch soc2012
* work in this branch
* push it to origin

When I want to synchronize my branch with django trunk I will fetch
master from upstream (django/django) and merge master into my branch.
Is this flow good?


So far I have coded the base for Serializers and Fields. I haven't
included any tests or documentation so it can be hard to try it. I am
pretty sure that writing appropriate docstrings will be a challenge for
me :) I copied some metaclass code from django forms and models. You can
instantiate ObjectSerializer and try to serialize some simple python
objects with it. It will serialize all fields present in object.__dict__
and return a python native datatype. The code is still in an early phase
so it's not polished and needs some refactoring, but if you have some tips
for me I will be very grateful.
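
For readers who want a feel for what "serialize everything in
object.__dict__" means without checking out the branch, here is a small
stand-alone sketch. The `to_native` function is hypothetical and is not the
ObjectSerializer code itself.

```python
def to_native(obj):
    # Simple values are already native.
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if isinstance(obj, dict):
        return dict((key, to_native(value)) for key, value in obj.items())
    if isinstance(obj, (list, tuple)):
        return [to_native(item) for item in obj]
    # Any other object: walk the attributes stored in its __dict__.
    return dict((name, to_native(value)) for name, value in vars(obj).items())


class Bar(object):
    def __init__(self):
        self.six = 6


class Foo(object):
    def __init__(self):
        self.bar = [Bar(), Bar()]
        self.x = "X"


native = to_native(Foo())
```

Objects without a `__dict__` (for example ones using `__slots__`) would need
extra handling, which this sketch leaves out.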


Next week I will fix some issues, code ModelSerializer and write
documentation and tests for what I have done so far. I must also think
about renaming some functions so the API will be more convenient.


--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-05-21 Thread Piotr Grabowski
I made some changes to my previous API (https://gist.github.com/2597306
<- the changes are included):


 * which fields of an object are serialized by default. It depends on
   include_default_fields, but unlike Tom Christie's solution its default
   value is True, so all fields (or those specified in Meta.model_fields,
   if any) are present.

 * follow_object attribute. In short: which object a Serializer's child
   Serializer should work on. Tom wrote about this in a previous mail, but
   I didn't fully understand the problem, so I gave him a bad answer. It's
   better described in the algorithm I present below.

 * get rid of the aliases and preserve_field_ordering fields

 * change the class hierarchy:
     class Serializer(object)            # base class for serializing
     class Field(Serializer)             # class for serializing fields in objects
     class ObjectSerializer(Serializer)  # class for serializing objects
     class ModelSerializer(Serializer)   # class for serializing Django Models



I prepared a list of steps for the first phase of serialization. It's written
in English-Python pseudocode :) I hope the indentation will be preserved.
Serializer.serialize is a function that, for an object, returns a dict of
Python native datatypes.


(Object|Model)Serializer.serialize(object, field_name (can be None),
**options)

1. Get object
   1.1. If object is iterable, run this algorithm for every element
        and return the list of returned values
   1.2. If field_name for object is set from the upper level, we have
        object Obj:
        1.2.1. If Meta.follow_object == True, work on object Obj.field_name
        1.2.2. Else work on Obj

2. Find all fields Fs that should be serialized
   2.1. Get all fields declared in Serializer
   2.2. Get all fields from Meta.fields
   2.3. If Meta.include_default_fields == True, get all fields whose
        type is valid in Meta.model_fields and that are not in Meta.exclude

3. Create dictionary A and for F in Fs:
   3.1. Find serializer for F
        3.1.1. If F is declared in Serializer, the serializer is
               explicitly declared
        3.1.2. Else get the serializer for F's type (m2m related etc)
   3.2. Save in dictionary A[field_name] = serializer_value
        3.2.1. If the field has label set, field_name = label
        3.2.2. If the field has attribute=True, add it to the
               dictionary as A[__attributes__][field_name] = serializer_value

4. Return A
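
The steps above can be sketched roughly as follows. This is only an
illustration under assumptions: Field and ObjectSerializer here are
hypothetical simplifications, the Meta options are flattened into class
attributes, default fields are taken from the instance __dict__, and
follow_object (step 1.2) is left out.

```python
class Field:
    """Hypothetical field: reads one attribute, optionally renamed via label."""

    def __init__(self, label=None, attribute=False):
        self.label = label
        self.attribute = attribute

    def serialize(self, obj, field_name):
        return getattr(obj, field_name)


class ObjectSerializer:
    declared_fields = {}           # step 2.1: explicitly declared fields
    fields = ()                    # step 2.2: Meta.fields
    exclude = ()                   # Meta.exclude
    include_default_fields = True  # step 2.3

    def field_names(self, obj):
        names = list(self.declared_fields)
        names += [n for n in self.fields if n not in names]
        if self.include_default_fields:
            names += [n for n in vars(obj)
                      if n not in names and n not in self.exclude]
        return names

    def serialize(self, obj):
        if isinstance(obj, (list, tuple)):                   # step 1.1
            return [self.serialize(item) for item in obj]
        result = {}                                          # step 3
        for name in self.field_names(obj):
            field = self.declared_fields.get(name, Field())  # step 3.1
            value = field.serialize(obj, name)
            key = field.label or name                        # step 3.2.1
            if field.attribute:                              # step 3.2.2
                result.setdefault("__attributes__", {})[key] = value
            else:
                result[key] = value
        return result                                        # step 4


class User:
    def __init__(self, pk, first_name, password):
        self.pk = pk
        self.first_name = first_name
        self.password = password


class UserSerializer(ObjectSerializer):
    declared_fields = {"pk": Field(attribute=True),
                       "first_name": Field(label="fname")}
    exclude = ("password",)


data = UserSerializer().serialize(User(1, "Piotr", "secret"))
```

Here pk ends up under the __attributes__ key, first_name is stored under its
label, and password is dropped because it is listed in exclude.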


Field.serialize(object, field_name (can be None), **options)

1. Get object
   1.1. If it is iterable, run this algorithm for every element
   1.2. Work on the object Obj passed from the upper level

2. Find all fields Fs that should be serialized
   2.1. Get all fields from the declared fields

3. Create dictionary A and for F in Fs:
   3.1. Find serializer for F
        3.1.1. F is in the declared fields, so its serializer is
               explicitly declared
   3.2. Save in dictionary A[field_name] = serializer_value
        3.2.1. If the field has label set, field_name = label
        3.2.2. If the field has attribute=True, add it to the
               dictionary as A[__attributes__][field_name] = serializer_value

4. Resolve function serialized_value
   4.1. If Fs (and A) is empty:
        4.1.1. If function field_name returns None, return
               serialized_value
        4.1.2. Else return {field_name() : serialized_value()}
   4.2. Else
        4.2.1. If function field_name returns None, raise Exception
        4.2.2. Else A.update({field_name() : serialized_value()})

5. Return A

After the first phase of serialization we have a dict (or a list of dicts).
Next, __attributes__ must be resolved (this depends on the format and the
strategy).
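
As a rough illustration of what resolving __attributes__ could look like in
the second phase, here is a sketch with two hypothetical helpers, to_json and
to_xml. Folding the attributes back into ordinary keys for JSON is just one
possible strategy, not something the proposal fixes, and the XML helper
assumes flat data whose keys are valid XML names.

```python
import json
import xml.etree.ElementTree as ET


def to_json(native):
    """JSON has no attributes, so fold them back into ordinary keys."""
    flat = dict(native)
    flat.update(flat.pop("__attributes__", {}))
    return json.dumps(flat, sort_keys=True)


def to_xml(native, tag="object"):
    """XML can carry the same values as real attributes on the element."""
    element = ET.Element(tag)
    for key, value in native.items():
        if key == "__attributes__":
            for name, attr_value in value.items():
                element.set(name, str(attr_value))
        else:
            ET.SubElement(element, key).text = str(value)
    return ET.tostring(element, encoding="unicode")


native = {"__attributes__": {"pk": 1}, "fname": "Piotr"}
json_text = to_json(native)
xml_text = to_xml(native)
```

Both encoders start from the same intermediate dict; only the treatment of
the __attributes__ key differs per format.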



Deserialization (it's an early idea):

SomeSerializer.deserialize(D - python_native_datatype_objects (dict or
list of dicts), instance=None, field_name=None, class_name=None, **options)

1. Get object instance  # Resolving this may be more complicated than what
                        # I wrote below (e.g. based on D's fields - duck typing)
   1.1. If instance is not None, use it
   1.2. Else try to resolve class_name
        1.2.1. If class_name is a class object, instantiate it
        1.2.2. If class_name is a string, find the string value for this
               key in D and instantiate it
        1.2.3. If class_name is None, raise Exception

2. Find all fields in D and find the fields in Serializer for deserializing them
   2.1. Resolve the label attribute for fields

3. Pass instance, data D and field_name to all field Serializers

4. Return instance
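
A minimal sketch of these deserialization steps, under assumptions: the
functions resolve_instance and deserialize, the labels mapping (output label
-> attribute name) and the registry mapping (string -> class, standing in for
whatever lookup the real code would use) are all hypothetical names
introduced here.

```python
def resolve_instance(data, instance=None, class_name=None, registry=None):
    if instance is not None:                      # step 1.1
        return instance
    if isinstance(class_name, type):              # step 1.2.1
        return class_name()
    if isinstance(class_name, str):               # step 1.2.2
        # class_name names the key in the data that holds the type string.
        return registry[data[class_name]]()
    raise ValueError("No class for deserialization provided")  # step 1.2.3


def deserialize(data, labels=None, **kwargs):
    if isinstance(data, list):
        return [deserialize(item, labels=labels, **kwargs) for item in data]
    obj = resolve_instance(data, **kwargs)
    labels = labels or {}
    type_key = kwargs.get("class_name")
    for key, value in data.items():
        if key == type_key:
            continue                              # the type marker is not a field
        setattr(obj, labels.get(key, key), value)  # steps 2.1 and 3
    return obj                                    # step 4


class Person:
    pass


person = deserialize(
    {"fname": "Piotr", "type": "person"},
    labels={"fname": "first_name"},
    class_name="type",
    registry={"person": Person},
)
```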


I'm aware that there will be a lot of small issues, but I believe that
the ideas are good.


--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-05-20 Thread Piotr Grabowski

Hi,

During this week I was focused on my exams. Now I have more time for the
serialization project. Sadly, the API isn't finished yet. According to the
GSoC calendar, coding starts on 21 May. Tomorrow I will send updates to the
API proposal and present the idea of the algorithm (maybe "list of steps"
would be a better name) used for serialization. On Wednesday 23 May I want
to start coding, and on Saturday 27 May I will write the next check-in and
present my initial code.


The first thing I want to code is the basis for the serializers.serializer
method and the Serializer and Field classes. After the first two weeks I
want to be able to serialize very simple objects to JSON. As I wrote in my
first proposal, I'm ready to spend 20 hours per week on this. In the first
two weeks it will be less because of my university work.



--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-05-12 Thread Piotr Grabowski

Hi,

This week I thought about the internal API for Serializer. I want developers
to be able to use it, if they need to, for better customization of their
solutions.


Next week I must study for my exams, so I suppose I will not do much on the
serialization project. I will try to resolve some of the issues with my API
that Tom Christie pointed out.


I know that I haven't done much, but at the end of the semester I have many
tasks related to my studies. After the end of May I will have much more time.


--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-05-07 Thread Piotr Grabowski

On 07.05.2012 20:13, Tom Christie wrote:

> Hey Piotr,
>
> Here's a few comments...
>
> You have 'fields' and 'exclude' option, but it feels like it's missing
> an 'include' option - How would you represent serializing all the
> fields on a model instance (without replicating them), and
> additionally including one other field?  I see that you could do that
> by explicitly adding a Field declaration, but 'include' would seem
> like an obvious addition to the other two options.

By default, all model fields will be serialized, plus all fields added by
Field declarations. If 'fields' is set, then only the fields listed in
'fields' plus the fields added by Field declarations will be serialized.
Too many fields :). If 'exclude' is set, then all model fields except those
listed in 'exclude' will be serialized, plus the fields added by explicit
declaration. I think it's like in a ModelForm declaration. Am I missing
some case?
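
The selection rules just described can be written down as a small function.
This is only a sketch of the rules as stated in this reply;
resolve_field_names and its parameters are hypothetical names, not part of
the proposed API.

```python
def resolve_field_names(model_fields, declared=(), fields=None, exclude=()):
    """Return the names to serialize, following the rules described above."""
    if fields is not None:
        # 'fields' set: only the listed model fields, in the listed order.
        selected = [name for name in fields if name in model_fields]
    else:
        # Default: every model field.
        selected = list(model_fields)
    # 'exclude' removes model fields from the selection.
    selected = [name for name in selected if name not in exclude]
    # Explicitly declared fields are always added on top.
    return selected + [name for name in declared if name not in selected]


model_fields = ["id", "username", "password"]
everything = resolve_field_names(model_fields)
only_username = resolve_field_names(
    model_fields, fields=["username"], declared=["full_name"])
without_password = resolve_field_names(
    model_fields, exclude=["password"], declared=["full_name"])
```

With these rules a separate 'include' option is not strictly needed: an
extra field is added by declaring it, whether or not 'fields' or 'exclude'
is set.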


> I'd second Russell's comment about aliases.  Defining a label on the
> field would seem more tidy.
>
> Likewise the comment about 'preserve_field_order'.  I've still got this
> in for 'django-serializers' at the moment, but I think it's something
> that should become private.  (At an implementation level it's still
> needed, in order to make sure you can exactly preserve the field
> ordering for json and yaml dumpdata, which is unsorted (determined by
> Python's dict key order).)

I answered Russell about that.


> Being able to nest serializers inside other serializers makes sense,
> but I don't understand why you need to be able to nest fields inside
> fields.  Shouldn't serializers be used to represent complex outputs
> and fields be used to represent flat outputs?

At first I thought a Serializer should be tied to an object (one Serializer =
one object). But then I figured out that a Serializer can work with the object
passed from the upper-level Serializer (so a 'source' field isn't needed).
Maybe nested serializers and flat fields is the better approach. I must
consider this.




> The "class_name" option for deserialization is making too many
> assumptions.  The class that's being deserialized may not be present
> in the data - for example if you're building an API, the class that's
> being deserialized might depend on the URL that the data is being sent
> to, e.g. "http://example.com/api/my-model/12"

I wrote about class_name in my answer to Russell. If the model class is in
the URL, then we can do something like this:

    serializers.deserialize("json", data_from_response,
        deserializer=UserSerializer(class_name=model_from_url(url)))

> In your dump data serializer, how do you distinguish that the 'fields'
> field is the entire object being serialized rather than the 'fields'
> attribute of the object being serialized?

fields = ModelFieldsSerializer(...) will be fed with the object to
serialize and the name 'fields'. I'm only interested in its output. It
must be a Python native datatype, and I do something like
serialized_dict['fields'] = output_of_model_fields_serializer.
ModelFieldsSerializer knows what to do with the object.
> Also, the existing dumpdata serialization only serializes local fields
> on the model - if you're using multi-table inheritance only the
> child's fields will be serialized, so you'll need some way of handling
> that.
>
> Your PKFlatField implementation will need to be a bit more complex in
> order to handle eg many to many relationships.  Also, you'll want to
> make sure you're accessing the pk's from the model without causing
> another database lookup.

Thanks for pointing that out. I have to think about it.


> Is there a particular reason you've chosen to drop 'depth' from the
> API?  Wouldn't it sometimes be useful to specify the depth you want to
> serialize to?

Sometimes, maybe. But in most cases, no. And there are other ways to
do that. In my opinion, going (globally) more than one level deep will
almost never be needed. If there is a need to go deeper in only one (or a
few, but not all) fields, 'depth' is unusable.


> There's two approaches you can take to declaring the 'xml' format for
> dumpdata, given that it doesn't map nicely to the json and yaml
> formats.  One is to define a custom serializer (as you've done), the
> other is to keep the serializer the same and define a custom renderer
> (or encoder, or whatever you want to call the second stage).  Of the
> two, I think that the second is probably a simpler cleaner approach.
> When you come to writing a dumpdata serializer, you'll find that
> there's quite a few corner cases that you'll need to deal with in
> order to maintain full byte-for-byte backwards compatibility,
> including how natural keys are serialized, how many to many
> relationships are encoded, how None is handled for different types,
> down to making sure you preserve the correct field ordering across
> each of json/yaml/xml.  I *think* that getting the details of all of
> those will end up being awkward to express using your current approach.
> The second approach would be to a dict-like format, that can easily be
> encoded into json or yam

Re: Customizable Serialization check-in

2012-05-07 Thread Piotr Grabowski

On 06.05.2012 10:45, Russell Keith-Magee wrote:


>  - I'm not sure I follow how class_name would be used in practice. The
> act of deserialization is to take a block of data, and process it to
> populate an object.
>
> In the simplest case, you could provide an empty instance (or factory)
> that is then populated by deserialization. In this case, no class name
> is required -- it's provided explicitly by the object you provide.

I have this functionality with class_name:

    serializers.deserialize("json", data,
        deserializer=UserSerializer(class_name=User))




> A more complex case is to use the data itself to determine the type of
> object to create. This seems to be the reason you have "class_name",
> but I'm not sure it's that simple. Consider a case where you're
> deserializing a thing of objects; if the data has a "name" attribute,
> create a "Person" object, otherwise create a "Thing" object. The
> object required is well defined, but not neatly available in a field.

If we have a homogeneous list of objects, there is no problem. We can use
the same construction as above, or depend (via class_name) on some field in
the object. But if the list is heterogeneous and we don't have information
about the type, then it's difficult. Is there a need for a feature like
that? My first thought is to have a method in the Serializer class like:

    def get_class(self, data):
        # data is the object (dict?) produced by the first phase of
        # deserialization; the user can look for some field in it and
        # return the class to use for object creation
        if 'name' in data:
            return Person
        return Thing

It can be more of an internal API, whose default could be:

    def get_class(self, data):
        if self._meta.class_name is not None:
            if isinstance(self._meta.class_name, str):
                # class_name names the key in data that holds the class
                return object_from_string(data[self._meta.class_name])
            else:
                return self._meta.class_name
        raise Exception('No class for deserialization provided')

So if someone has simple needs, they can use the simple functionality like
class_name=Profile, but if there is a need to find the class by duck
typing, overriding get_class will be suitable.
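
For the heterogeneous-list case, here is a self-contained sketch of what
overriding get_class might look like in use. The Serializer base class below
is a hypothetical simplification (class_name is a plain class attribute
rather than a Meta option), and Person and Thing are the example classes
from the question.

```python
class Person:
    pass


class Thing:
    pass


class Serializer:
    """Hypothetical base class with a simplified default lookup."""

    class_name = None

    def get_class(self, data):
        if self.class_name is None:
            raise ValueError("No class for deserialization provided")
        return self.class_name


class PersonOrThingSerializer(Serializer):
    def get_class(self, data):
        # Duck typing: anything with a "name" key is treated as a Person.
        return Person if "name" in data else Thing


serializer = PersonOrThingSerializer()
items = [{"name": "Russ"}, {"weight": 3}]
classes = [serializer.get_class(item) for item in items]
```

The simple case stays simple (set class_name once), while the override
decides per item which class to build.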




> There's also no requirement that deserialization into an object is
> handled by a ModelSerializer. ModelSerializer should just be a
> convenient factory for populating a Serializer based on attributes of
> a model -- so anything you do with ModelSerializer should be possible
> to do manually with a Serializer. If class_name is tied to
> ModelSerializer, we lose this ability.

Yes, I made a mistake: where I wrote ModelSerializer options I should
have written Serializer options, because ModelSerializer is just a Serializer
that understands the differences between kinds of fields in an object
(m2m, fk ...).




>  - I'm not sure I see the purpose of "aliases" -- or, why this role
> can't be played by other parts of the system. In particular, I see
> Field() as a definition for how to fill out one 'piece' of a
> serialised object. Why doesn't Field() contain the logic for how to
> extract its value from the underlying object?

Previously I used it with an additional meaning -> if aliases[x] =
aliases[y] then x = [value[x], value[y]], but now it's only a shortcut for
writing:

1) fname = Field(label=first_name)

2) aliases = {'fname' :'first_name'}

It's redundant, but I think it can be helpful.


>  - Why is preserve_field_ordering needed? Can't ordering be handled by
> the explicit order of field definitions, or the ordering in the
> "fields" attribute?

I agree, ordering in the 'fields' attribute (like in Forms) will be better.


>  * As a matter of style, serializer_field_value and
> deserialize_field_value seem excessively long as names. Is there
> something wrong with serialize and deserialize?

For now I want to reserve the names serialize and deserialize, because I
think they would be more appropriate for the methods that return
Python native datatypes after the first phase of serialization. If a user
overrides them, they can do anything they want but must return native
datatypes. But sure, (de)serializer_field_value seems too long. Any other
suggestions? Maybe get_value (because it must get the value from the
object's field for serialization) and set_value (it sets the value of the
object's field in deserialization)?


>  * I don't think getattr() works quite how you think it does. In
> particular, I don't think:
>
>    getattr(instance, instance_field_name) = getattr(obj, field_name)
>
> will do what you think it does. I think you're looking for setattr() here.

Oops :) Definitely setattr should be there.
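
For reference, a tiny runnable illustration of the difference. The Source
and Target classes and the attribute names are made up for the example;
getattr and setattr are the Python builtins.

```python
class Source:
    first_name = "Piotr"


class Target:
    pass


obj, instance = Source(), Target()

# getattr() only reads a value. Writing
#     getattr(instance, "fname") = getattr(obj, "first_name")
# is rejected as a SyntaxError, because a call cannot be assigned to.

# setattr() is the counterpart that writes an attribute by name.
setattr(instance, "fname", getattr(obj, "first_name"))
```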



>  * Can you elaborate some more on the XML attribute syntax in your
> proposal? One of your original statements (that I agree with) is that
> the "format" is independent of the syntax, and that a single set of
> formatting rules should be able to be used for XML or for JSON. The
> big difference between XML and JSON is that XML allows for values to
> be packed as attributes. I can see that you've got an 'attribute'
> argument on a Field, but it isn't clear to me how JSON would interpret
> this, or how XML would interpret:

I

Re: Customizable Serialization check-in

2012-05-07 Thread Tom Christie
Hey Piotr,

Here's a few comments...

You have 'fields' and 'exclude' option, but it feels like it's missing an 
'include' option - How would you represent serializing all the fields on a 
model instance (without replicating them), and additionally including one 
other field?  I see that you could do that by explicitly adding a Field 
declaration, but 'include' would seem like an obvious addition to the other 
two options.  

I'd second Russell's comment about aliases.  Defining a label on the field 
would seem more tidy.

Likewise the comment about 'preserve_field_order'.  I've still got this in 
for 'django-serializers' at the moment, but I think it's something that 
should become private.  (At an implementation level it's still needed, in 
order to make sure you can exactly preserve the field ordering for json and 
yaml dumpdata, which is unsorted (determined by Python's dict key order).)

Being able to nest serializers inside other serializers makes sense, but I 
don't understand why you need to be able to nest fields inside fields. 
 Shouldn't serializers be used to represent complex outputs and fields be 
used to represent flat outputs?

When defining custom fields it'd be good if there was a way of overriding 
the serialization that's independent of how the field is retrieved from the 
model.  For example, with model relation fields, you'd like to be able to 
subclass between representing as a natural key, representing as a url, 
representing as a string name etc... without having to replicate all the 
logic that handles the differences between relationship, multiple 
relationships, and reverse relationships.

The "class_name" option for deserialization is making too many assumptions. 
 The class that's being deserialized may not be present in the data - for 
example if you're building an API, the class that's being deserialized 
might depend on the URL that the data is being sent to, e.g. 
"http://example.com/api/my-model/12"

In your dump data serializer, how do you distinguish that the 'fields' 
field is the entire object being serialized rather than the 'fields' 
attribute of the object being serialized?  Also, the existing dumpdata 
serialization only serializes local fields on the model - if you're using 
multi-table inheritance only the child's fields will be serialized, so 
you'll need some way of handling that.

Your PKFlatField implementation will need to be a bit more complex in order 
to handle eg many to many relationships.  Also, you'll want to make sure 
you're accessing the pk's from the model without causing another database 
lookup.

Is there a particular reason you've chosen to drop 'depth' from the API? 
 Wouldn't it sometimes be useful to specify the depth you want to serialize 
to?

There's two approaches you can take to declaring the 'xml' format for 
dumpdata, given that it doesn't map nicely to the json and yaml formats. 
 One is to define a custom serializer (as you've done), the other is to 
keep the serializer the same and define a custom renderer (or encoder, or 
whatever you want to call the second stage).  Of the two, I think that the 
second is probably a simpler cleaner approach.
When you come to writing a dumpdata serializer, you'll find that there's 
quite a few corner cases that you'll need to deal with in order to maintain 
full byte-for-byte backwards compatibility, including how natural keys are 
serialized, how many to many relationships are encoded, how None is handled 
for different types, down to making sure you preserve the correct field 
ordering across each of json/yaml/xml.  I *think* that getting the details 
of all of those will end up being awkward to express using your current 
approach.
The second approach would be to a dict-like format, that can easily be 
encoded into json or yaml, but that can also include metadata specific to 
particular encodings such as xml (or perhaps, say, html).  You'd have a 
generic xml renderer, that handles encoding into fields and attributes in a 
fairly obvious way, and a dumpdata-specific renderer, that handles the odd 
edge cases that the dumpdata xml format requires.  The dumpdata-specific 
renderer would use the same intermediate data that's used for json and yaml.

I hope all of that makes sense, let me know if I've not explained myself 
very well anywhere.

Regards,

  Tom

On Friday, 4 May 2012 21:08:14 UTC+1, Piotr Grabowski wrote:
>
> Hi, 
>
> During this week I have a lot of work so I didn't manage to present my 
> revised proposal in Monday like i said. Sorry. I have it now: 
> https://gist.github.com/2597306 
>
> Next week I hope there will be some discussion about my proposal. I will 
> also think how it should be done under the hood. There should be some 
> internal API. I should also resolve one Django ticket. I think about 
> this https://code.djangoproject.com/ticket/9279 There will be good for 
> test cases in my future solution. 
>
> I should write my proposal on this group? In github I have nice 

Re: Customizable Serialization check-in

2012-05-06 Thread Russell Keith-Magee
On Sat, May 5, 2012 at 4:08 AM, Piotr Grabowski  wrote:
> Hi,
>
> During this week I have a lot of work so I didn't manage to present my
> revised proposal in Monday like i said. Sorry. I have it now:
> https://gist.github.com/2597306

Hi Piotr,

At a high level, I think you're headed in the right direction. I like
the way you've separated Field and Serializer, and I like the way that
Serializer represents one "nesting level" of the final output (so if
you want complex formats for a single object, such as with the way
Django's JSON serializer has id, model and fields at the top level,
you nest Serializers to suit).

Here's some specific feedback:

 * I can see that ModelSerializer will play an important part in your
proposal. However, some of your API proposals seem a little
unnecessary -- or are unclear why they're needed. Some areas that need
clarification:

 - I'm not sure I follow how class_name would be used in practice. The
act of deserialization is to take a block of data, and process it to
populate an object.

In the simplest case, you could provide an empty instance (or factory)
that is then populated by deserialization. In this case, no class name
is required -- it's provided explicitly by the object you provide.

A more complex case is to use the data itself to determine the type of
object to create. This seems to be the reason you have "class_name",
but I'm not sure it's that simple. Consider a case where you're
deserializing a thing of objects; if the data has a "name" attribute,
create a "Person" object, otherwise create a "Thing" object. The
object required is well defined, but not neatly available in a field.

There's also no requirement that deserialization into an object is
handled by a ModelSerializer. ModelSerializer should just be a
convenient factory for populating a Serializer based on attributes of
a model -- so anything you do with ModelSerializer should be possible
to do manually with a Serializer. If class_name is tied to
ModelSerializer, we lose this ability.

 - I'm not sure I see the purpose of "aliases" -- or, why this role
can't be played by other parts of the system. In particular, I see
Field() as a definition for how to fill out one 'piece' of a
serialised object. Why doesn't Field() contain the logic for how to
extract its value from the underlying object?

 - Why is preserve_field_ordering needed? Can't ordering be handled by
the explicit order of field definitions, or the ordering in the
"fields" attribute?

 * As a matter of style, serializer_field_value and
deserialize_field_value seem excessively long as names. Is there
something wrong with serialize and deserialize?

 * I don't think getattr() works quite how you think it does. In
particular, I don't think:

  getattr(instance, instance_field_name) = getattr(obj, field_name)

will do what you think it does. I think you're looking for setattr() here.

 * Can you elaborate some more on the XML attribute syntax in your
proposal? One of your original statements (that I agree with) is that
the "format" is independent of the syntax, and that a single set of
formatting rules should be able to be used for XML or for JSON. The
big difference between XML and JSON is that XML allows for values to
be packed as attributes. I can see that you've got an 'attribute'
argument on a Field, but it isn't clear to me how JSON would interpret
this, or how XML would interpret:

  - A Field that had multiple sub-Fields, all of which were attribute=True
  - A Field that had multiple sub-Fields, several of which were attribute=False
 - The difference between these two definitions by your formatting rules:


subval


main value

In particular, why is the top level structure of the JSON serializer
handled with nested Serializers, but the structure of the XML
serializer is handled with nested Fields?

> Next week I hope there will be some discussion about my proposal. I will
> also think how it should be done under the hood. There should be some
> internal API. I should also resolve one Django ticket. I think about this
> https://code.djangoproject.com/ticket/9279 There will be good for test cases
> in my future solution.

I would suggest that you don't spend *too* much time on this. It's
certainly a good idea to get to know our committing procedures, and
historically we've encouraged students to get to use working on a
small ticket as a way to do this. However, your project is unusual in
that you've been accepted without a firm API proposal. Given that you
won't really be able to work on the GSoC without an accepted proposal,
I'd suggest that your API should take precedence in your pre-GSoC
plans.

> I should write my proposal on this group? In github I have nice formatting
> and in this group my Python code was badly formatted.

It's up to you; however, the problem with posting to a Gist (or
similar) is that it's very hard to comment on specific parts of your
proposal. I know code formatting is a pain in Google groups, but it is
a much be

Re: Customizable Serialization check-in

2012-05-04 Thread Piotr Grabowski

Hi,

During this week I had a lot of work, so I didn't manage to present my
revised proposal on Monday as I said I would. Sorry. I have it now:

https://gist.github.com/2597306

Next week I hope there will be some discussion about my proposal. I will
also think about how it should be done under the hood. There should be some
internal API. I should also resolve one Django ticket. I am thinking about
https://code.djangoproject.com/ticket/9279, which would be a good source of
test cases for my future solution.


Should I write my proposal on this group? On GitHub I have nice
formatting, and in this group my Python code was badly formatted.


--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-04-27 Thread Piotr Grabowski

On 27.04.2012 12:39, Tom Christie wrote:

> Hey Piotr,
>
> > I quickly skimmed the proposal and I noticed speed/performance wasn't
> > mentioned. I believe performance is important in serialization and
> > especially in deserialization.
>
> Right.  Also worth considering is making sure the API can deal with
> streaming large querysets,
> rather than loading all the data into memory at once.
> (See also https://code.djangoproject.com/ticket/5423)
>
> - Tom.

Maybe it can be done with a chain of two black-box generators. The first
generator's inputs are a queryset (an iterable sequence) and a user-defined
Serializer class that describes how to transform a single object; its output
is Python primitive-type objects. The second is fed with these objects and
outputs the serialized string. What about nested objects - more generators?
Generators are good because we can also reuse Serializer objects ==
better performance. But as Anssi said - optimize after the code is
written, not before :)
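
A rough sketch of such a two-generator chain, under assumptions: to_native,
to_json_chunks and lazy_rows are hypothetical names, a plain generator
stands in for a queryset, and JSON is used as the output format. Nothing
here is taken from Django's serializers.

```python
import json


def to_native(objects, serialize_one):
    """First stage: lazily turn each object into native datatypes."""
    for obj in objects:
        yield serialize_one(obj)


def to_json_chunks(natives):
    """Second stage: lazily emit pieces of one JSON array."""
    yield "["
    for index, native in enumerate(natives):
        if index:
            yield ", "
        yield json.dumps(native, sort_keys=True)
    yield "]"


def lazy_rows():
    # Stand-in for a large queryset: rows are produced one at a time.
    for pk in range(3):
        yield {"pk": pk}


# dict() plays the role of the per-object serializer here.
chunks = to_json_chunks(to_native(lazy_rows(), dict))
output = "".join(chunks)
```

Because both stages are generators, only one object is held in memory at a
time until the chunks are joined; a streaming response could write each
chunk out instead of joining them.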


--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-04-27 Thread Tom Christie
Hey Piotr,

  Thanks for the quick response.

> However sharing ideas and discuss how the API should look and work it 
> will be very desirable.

That'd be great, yup.  I've got a couple of comments and questions about 
bits of the API, but I'll wait until you've had a chance to post your 
proposal to the list before starting that discussion. 

> I quickly skimmed the proposal and I noticed speed/performance wasn't 
> mentioned. I believe performance is important in serialization and 
> especially in deserialization.

Right.  Also worth considering is making sure the API can deal with 
streaming large querysets,
rather than loading all the data into memory at once.
(See also https://code.djangoproject.com/ticket/5423)

- Tom.

On Friday, 27 April 2012 10:11:56 UTC+1, Piotr Grabowski wrote:
>
> On 27.04.2012 10:36, Anssi Kääriäinen wrote: 
> > On Apr 27, 11:14 am, Piotr Grabowski  wrote: 
> >> Hi! 
> >> 
> >> I'm Piotr Grabowski, student from University of Wroclaw, Poland 
> >> In this Google Summer of Code I will  deal with problem of customizable 
> >> serialization in Django. 
> >> 
> >> You can find my proposal here https://gist.github.com/2319638 
> > I quickly skimmed the proposal and I noticed speed/performance wasn't 
> > mentioned. I believe performance is important in serialization and 
> > especially in deserialization. It is not the number one priority item, 
> > but it might be worth it to write a couple of benchmarks (preferably 
> > to djangobench [1]) and check that there are no big regressions 
> > introduced by your work. If somebody already has good real-life 
> > testcases available, please share them... 
> > 
> >   - Anssi 
> > 
> > [1] https://github.com/jacobian/djangobench/ 
> > 
> I haven't thought much about performance. There will be regressions.
> Right now serialization is very simple: iterate over the fields, transform
> each one into a string (or something serializable), and serialize it with
> json|yaml|xml.
> In my approach it is: transform the (Model) object into a Serializer object,
> where each field of the original object is a FieldSerializer object; next
> (maybe recursively) get a native Python object from each field, and
> serialize it with json|yaml|xml.
> I can make some optimizations in this process, but it's clear it will
> take longer to serialize (and deserialize) an object than it does now. That
> could be a problem for the time taken by tests if there are a lot of fixtures.
> I will try to write good, fast code, but I will be very glad if someone
> gives me tips about performance bottlenecks in it.
>
> -- 
> Piotr Grabowski 
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/django-developers/-/K9cslx5Fa_sJ.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: Customizable Serialization check-in

2012-04-27 Thread Anssi Kääriäinen
On Apr 27, 12:11 pm, Piotr Grabowski  wrote:
> I haven't thought much about performance. There will be regressions.
> Right now serialization is very simple: iterate over the fields, transform
> each one into a string (or something serializable), and serialize it with
> json|yaml|xml.
> In my approach it is: transform the (Model) object into a Serializer object,
> where each field of the original object is a FieldSerializer object; next
> (maybe recursively) get a native Python object from each field, and
> serialize it with json|yaml|xml.
> I can make some optimizations in this process, but it's clear it will
> take longer to serialize (and deserialize) an object than it does now. That
> could be a problem for the time taken by tests if there are a lot of fixtures.
> I will try to write good, fast code, but I will be very glad if someone
> gives me tips about performance bottlenecks in it.

One possibility is to have a fast path for simple cases. But premature
optimization is the root of all evil, so let's first see how fast the
code is, and then check whether anything needs to be done.

I still think it is a good idea to actually check how fast the new
serialization code is, not just assume it is fast enough. So, please
include some simple benchmarks in your project.
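
As a rough illustration of what such a simple benchmark could look like, here
is a sketch using only the standard library `timeit` module rather than
djangobench. Everything in it is an assumption made for the example:
`FieldWrapper` is a hypothetical stand-in for a per-field serializer object,
and plain dicts stand in for model instances.

```python
import json
import timeit

# Plain dicts stand in for model instances (assumption for the sketch).
ROWS = [
    {"pk": i, "title": "entry %d" % i, "published": i % 2 == 0}
    for i in range(1000)
]


def serialize_direct(rows):
    """Baseline: hand the field values straight to the format encoder."""
    return json.dumps(rows)


class FieldWrapper:
    """Hypothetical per-field object, one instance created per value."""

    def __init__(self, value):
        self.value = value

    def to_native(self):
        return self.value


def serialize_wrapped(rows):
    """Same output, but every value passes through a wrapper object first."""
    native = [
        {name: FieldWrapper(value).to_native() for name, value in row.items()}
        for row in rows
    ]
    return json.dumps(native)


def best_time(func, repeat=3, number=20):
    """Best average time per call, in seconds, over `repeat` rounds."""
    timings = timeit.repeat(lambda: func(ROWS), repeat=repeat, number=number)
    return min(timings) / number


direct = best_time(serialize_direct)
wrapped = best_time(serialize_wrapped)
print("direct:   %.6f s per run" % direct)
print("wrapped:  %.6f s per run" % wrapped)
print("slowdown: %.2fx" % (wrapped / direct))
```

The absolute numbers depend on the machine; the ratio between the two paths is
the figure worth tracking for regressions.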

I hope users who have a need for fast serialization will participate
in this discussion by telling their use cases.

 - Anssi




Re: Customizable Serialization check-in

2012-04-27 Thread Piotr Grabowski

On 27.04.2012 10:36, Anssi Kääriäinen wrote:
> On Apr 27, 11:14 am, Piotr Grabowski  wrote:
>> Hi!
>>
>> I'm Piotr Grabowski, a student from the University of Wroclaw, Poland.
>> In this Google Summer of Code I will deal with the problem of customizable
>> serialization in Django.
>>
>> You can find my proposal here https://gist.github.com/2319638
> I quickly skimmed the proposal and I noticed speed/performance wasn't
> mentioned. I believe performance is important in serialization and
> especially in deserialization. It is not the number one priority item,
> but it might be worth it to write a couple of benchmarks (preferably
> to djangobench [1]) and check that there are no big regressions
> introduced by your work. If somebody already has good real-life
> testcases available, please share them...
>
>   - Anssi
>
> [1] https://github.com/jacobian/djangobench/


I haven't thought much about performance. There will be regressions.
Right now serialization is very simple: iterate over the fields, transform
each one into a string (or something serializable), and serialize it with
json|yaml|xml.
In my approach it is: transform the (Model) object into a Serializer object,
where each field of the original object is a FieldSerializer object; next
(maybe recursively) get a native Python object from each field, and
serialize it with json|yaml|xml.
I can make some optimizations in this process, but it's clear it will
take longer to serialize (and deserialize) an object than it does now. That
could be a problem for the time taken by tests if there are a lot of fixtures.
I will try to write good, fast code, but I will be very glad if someone
gives me tips about performance bottlenecks in it.
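
The two-stage flow described above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the code from the project:
`FieldSerializer` here is a simplified class written for the example,
`ObjectSerializer`, `Author` and `Article` are hypothetical names, and plain
Python classes stand in for Django models.

```python
import json
from datetime import date


class FieldSerializer:
    """Simplified stand-in: turns one attribute into a native Python value."""

    def __init__(self, attr, nested=None):
        self.attr = attr
        self.nested = nested  # optional serializer for a related object

    def to_native(self, obj):
        value = getattr(obj, self.attr)
        if self.nested is not None:
            # Recursive step: a related object has its own serializer.
            return self.nested.to_native(value)
        if isinstance(value, date):
            return value.isoformat()
        return value


class ObjectSerializer:
    """Hypothetical object-level serializer made of named field serializers."""

    def __init__(self, **fields):
        self.fields = fields

    def to_native(self, obj):
        return {name: f.to_native(obj) for name, f in self.fields.items()}


class Author:
    def __init__(self, name):
        self.name = name


class Article:
    def __init__(self, title, published, author):
        self.title = title
        self.published = published
        self.author = author


author_serializer = ObjectSerializer(name=FieldSerializer("name"))
article_serializer = ObjectSerializer(
    title=FieldSerializer("title"),
    published=FieldSerializer("published"),
    author=FieldSerializer("author", nested=author_serializer),
)

article = Article("Hello", date(2012, 4, 27), Author("Piotr"))
native = article_serializer.to_native(article)  # stage 1: object -> native types
text = json.dumps(native, sort_keys=True)       # stage 2: native types -> format
```

The extra cost compared with the current serializers comes from stage 1: one
method call per field for every object, plus the recursion into related
objects.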


--
Piotr Grabowski




Re: Customizable Serialization check-in

2012-04-27 Thread Anssi Kääriäinen
On Apr 27, 11:14 am, Piotr Grabowski  wrote:
> Hi!
>
> I'm Piotr Grabowski, a student from the University of Wroclaw, Poland.
> In this Google Summer of Code I will deal with the problem of customizable
> serialization in Django.
>
> You can find my proposal here https://gist.github.com/2319638

I quickly skimmed the proposal and I noticed speed/performance wasn't
mentioned. I believe performance is important in serialization and
especially in deserialization. It is not the number one priority item,
but it might be worth it to write a couple of benchmarks (preferably
to djangobench [1]) and check that there are no big regressions
introduced by your work. If somebody already has good real-life
testcases available, please share them...

 - Anssi

[1] https://github.com/jacobian/djangobench/




[GSoC] Customizable Serialization check-in

2012-04-27 Thread Piotr Grabowski

Hi!

I'm Piotr Grabowski, a student from the University of Wroclaw, Poland.
In this Google Summer of Code I will deal with the problem of customizable
serialization in Django.


You can find my proposal here https://gist.github.com/2319638

It's obviously not a finished idea, and it certainly needs to be simplified.
My mentor, Russell Keith-Magee, told me to look at Tom Christie's
serialization API. I found it similar to my proposal: there is a lot in
common (declarative fields, the same approach to various aspects of
serialization), but his API is simpler and it feels better.


Since Tom has already posted to the group about his project, I can refer to it:

On 27.04.2012 06:44, Tom Christie wrote:
> ...
>
> Given that Piotr's GSoC proposal has now been accepted, I'm wondering what the
> right way forward is?  I'd like to continue to push forward with this, but I'm
> also aware that it might be a bit of an issue if there's already an ongoing
> GSoC project along the same lines?
>
> Having taken a good look through the GSoC proposal, it looks good, and there
> seems to be a fair bit of overlap, so hopefully he'll find what I've done
> useful, and I'm sure I'll have plenty of comments on his project as it
> progresses.
>
> I'd consider suggesting a collaborative approach, but the rules of the GSoC
> wouldn't allow that right?

--
As I said above, your work will be very useful to me. I must read the
GSoC regulations carefully, but collaborating on writing the code is
certainly impossible. I don't know whether I could use your existing code
base, but I think that's impossible too. However, sharing ideas and
discussing how the API should look and work would be very welcome.



My plan for the next few weeks is to meet the Django contribution
requirements, solve a ticket to prove I know the process of doing it, and,
most importantly, have a discussion about the serialization API. I hope the
community will be interested in this feature.


After the weekend I will post my proposal with updates based on Tom's API.

--
Piotr Grabowski

