C++ class to serialize to a buffer

2014-01-08 Thread Pedro Larroy
Hi

I need to serialize to a buffer. I first tried the streams and the private
OutputStream-derived classes, but that failed because the padding was
included in the buffer, so I implemented something like:



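#include <cassert>
#include <cstdint>
#include <vector>

#include <avro/Stream.hh>

// Output stream that accumulates the encoded bytes in a std::vector.
class BufferedOutputStream : public avro::OutputStream {
public:
    std::vector<uint8_t> m_buf;

    BufferedOutputStream():
        m_buf()
    {
        m_buf.reserve(4096);
    }

    ~BufferedOutputStream()
    {
    }

    // Hands out writable space at the end of m_buf (definition not shown).
    bool next(uint8_t** data, size_t* len);

    // Gives back the last len bytes handed out by next() that were not used.
    void backup(size_t len)
    {
        assert(m_buf.size() >= len);
        m_buf.resize(m_buf.size() - len);
    }

    uint64_t byteCount() const
    {
        return m_buf.size();
    }

    void flush() {}
};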


Any comments on this? I think the C++ library should provide something
similar, or maybe I just wasn't able to find it.
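
A minimal sketch of how next() could be implemented for such a vector-backed
stream. It assumes the usual zero-copy contract, where next() hands out a
writable chunk and backup() returns the unused tail, so that no padding is
left in the buffer. VectorSink, write_bytes and the 4096-byte chunk size are
hypothetical names and values chosen for illustration; the class is
standalone and does not derive from avro::OutputStream:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical standalone stand-in: mirrors the next()/backup() contract only.
class VectorSink {
public:
    std::vector<uint8_t> buf;

    // Grow the vector by one chunk and hand the new region to the caller.
    // Pointers from an earlier next() call become invalid after this.
    bool next(uint8_t** data, size_t* len) {
        const size_t old_size = buf.size();
        buf.resize(old_size + kChunk);
        *data = buf.data() + old_size;
        *len = kChunk;
        return true;
    }

    // Give back the unused tail of the last chunk.
    void backup(size_t len) {
        assert(buf.size() >= len);
        buf.resize(buf.size() - len);
    }

    uint64_t byteCount() const { return buf.size(); }

private:
    static constexpr size_t kChunk = 4096;  // arbitrary chunk size
};

// Write n bytes through the sink the way an encoder would, returning any
// unused space so that no padding ends up in the buffer.
inline void write_bytes(VectorSink& sink, const uint8_t* src, size_t n) {
    while (n > 0) {
        uint8_t* dst = nullptr;
        size_t avail = 0;
        sink.next(&dst, &avail);
        const size_t k = n < avail ? n : avail;
        for (size_t i = 0; i < k; ++i) dst[i] = src[i];
        sink.backup(avail - k);
        src += k;
        n -= k;
    }
}
```

After write_bytes(sink, data, n), sink.buf holds exactly the n bytes written,
with no trailing padding.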


Pedro.


[jira] [Commented] (AVRO-1382) Support for python3

2014-01-08 Thread Pedro Larroy (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865386#comment-13865386
 ] 

Pedro Larroy commented on AVRO-1382:


Hi Christophe, did you integrate my changes? Should I make a pull request on 
github?

> Support for python3
> -------------------
>
> Key: AVRO-1382
> URL: https://issues.apache.org/jira/browse/AVRO-1382
> Project: Avro
> Issue Type: Bug
> Components: python
> Affects Versions: 1.7.5
> Reporter: Christophe Taton
> Attachments: AVRO-1382.20131203-001922.diff,
> AVRO-1382.20140101-123233-0800.diff, AVRO-1382.20140107-231626-0800.diff
>
>
> Hi,
> I'd need to use Avro from Python3, which would require essentially the 
> following changes, which I am happy to contribute:
>  - rewrite except statements according to new syntax
>  - rewrite print statements according to new syntax
>  - basestring becomes str
>  - update some imports (StringIO becomes io.StringIO, httplib becomes 
> http.client)
> This would apparently require branching the python code to maintain a version 
> for python2 and a separate version for python3.
> Any thoughts on how to approach this?
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Re: unsigned types

2013-12-12 Thread Pedro Larroy
This would be very good. Is there a plan / schedule for Avro 2.0?

Pedro.


On Wed, Dec 11, 2013 at 9:16 PM, Doug Cutting wrote:

> On Wed, Dec 11, 2013 at 9:37 AM, Pedro Larroy wrote:
>
> > I think it would be good to have avro as a generic
> > serialization format not only limited by jvm implementation details.
> >
>
> There are two issues here.  One is how to represent such things in Avro
> schemas and the other is how Avro schemas are mapped to programming
> languages.  The latter is much easier to alter compatibly.
>
> In Avro 1.0, for maximal interoperability and simple implementation, we
> sought to restrict schemas to types common to popular programming
> languages.  For example, we restricted map keys to strings since some
> languages don't permit other types as map keys, and we didn't directly
> support unsigned integers.
>
> One way to add support for unsigned integers would be to add new primitive
> types to Avro schemas.  We could then map the new primitive type to
> corresponding unsigned primitive types in C, C++ and C#, and perhaps to
> java.math.BigInteger in Java.  All implementations would need to be updated
> to somehow implement the new primitive type.
>
> However we cannot add new primitive types to Avro without breaking
> compatibility.  We can thus only consider adding such new primitive types
> in Avro 2.0.
>
> Another way to add support for unsigned integers in Avro would be to find a
> way to represent these as an Avro 1.0 schema.  For example, an unsigned
> 64-bit integer might be represented with the schema {"type":"fixed",
> "size":8, "is":"uint"}.  This optional extension could be defined in the
> specification.  We could then map this schema to the corresponding unsigned
> primitive types in C, C++ and C#, and perhaps to java.math.BigInteger in
> Java.  Schemas that use unsigned values would look a little less natural
> than if we add a new primitive type to Avro, but compatibility would be
> maintained.  Implementations could be updated incrementally to provide
> better support for unsigned values.
>
> With either approach we could also extend Avro IDL to better support
> unsigned types.
>
> Doug
>


Re: [jira] [Commented] (AVRO-1382) Support for python3

2013-12-12 Thread Pedro Larroy
Ant only replaced a few strings, and I would prefer the python module to be
more standalone. On the other hand, we would then still need to put the avro
version somewhere; so far it is an ant replacement string in the released
versions.

Pedro.


On Thu, Dec 12, 2013 at 12:18 AM, Doug Cutting (JIRA) wrote:

>
> [
> https://issues.apache.org/jira/browse/AVRO-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845820#comment-13845820]
>
> Doug Cutting commented on AVRO-1382:
> 
>
> Ant is not required.
>
> All that releases require is that the top-level build.sh script works.  In
> particular, that './build.sh dist' puts binary release artifacts in the
> top-level dist/ directory, that 'test' runs unit tests, and 'clean' removes
> files generated by the other commands.
>
> If ant is replaced with some other build tool then the top-level build.sh
> should be updated to invoke the new tool rather than ant.  Some languages
> implement a lang/*/build.sh script that invokes a language-specific build
> tool and then copies source code archive files up to ../../dist.
>
> Also, if the build tools change then the top-level BUILD.txt file should
> be updated.


Re: unsigned types

2013-12-11 Thread Pedro Larroy
I like the idea about the "fixed" type, but it's not only about the number
range; it's also a matter of type correctness on decoding.
In C and C++ it is common to use unsigned types for data that will never be
negative and that might indeed use the full range. One would then like
this type to be properly decoded in other languages. I'm more interested in
getting the data in the correct type in Python (int) than in getting a "bytes"
object that I have to manually convert to an integer.
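
For comparison, this is roughly the manual conversion a reader has to do when
an unsigned 64-bit value arrives as an 8-byte fixed. The byte order is an
assumption (big-endian here), since the proposal does not specify one, and
uint64_to_fixed and fixed_to_uint64 are hypothetical helper names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pack an unsigned 64-bit value into 8 bytes, most significant byte first
// (the byte order is an assumption made for this sketch).
inline std::array<uint8_t, 8> uint64_to_fixed(uint64_t x) {
    std::array<uint8_t, 8> out{};
    for (size_t i = 8; i-- > 0;) {
        out[i] = static_cast<uint8_t>(x & 0xFF);
        x >>= 8;
    }
    return out;
}

// Rebuild the unsigned value from the 8 raw bytes of the fixed.
inline uint64_t fixed_to_uint64(const std::array<uint8_t, 8>& in) {
    uint64_t x = 0;
    for (uint8_t b : in) {
        x = (x << 8) | b;
    }
    return x;
}
```

Every reader has to know, outside of the schema, that the 8 bytes are meant
to be an unsigned integer and which byte order was used.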


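In my case, in C++ I'm using the following hack to encode uint64 types:

template<>
struct codec_traits<uint64> {
    // Encode the unsigned value as a signed Avro long with the same bits.
    static void encode(Encoder& e, uint64 x)
    {
        avro::encode(e, static_cast<int64>(x));
    }

    // Decode the signed long and convert it back to unsigned.
    static void decode(Decoder& d, uint64& x)
    {
        int64 r = 0;
        avro::decode(d, r);
        x = static_cast<uint64>(r);
    }
};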

Again, I think it's just not practical that one encodes
0xFFFFFFFFFFFFFFFFULL in C++ and then reads -1 in other languages. You
need to build wrapper functions around it and remember that some
fields are actually unsigned, which defeats the advantages of having a
schema.
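
The effect described above can be reproduced without Avro. The sketch below
reinterprets the 64 bits of an unsigned value as a signed one, which is what
a reader without unsigned support sees. to_wire and from_wire are
hypothetical helper names, and memcpy is used in place of static_cast only to
keep the conversion well defined on every compiler:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of an unsigned value as a signed 64-bit integer,
// i.e. what gets stored when the value is written as an Avro "long".
inline int64_t to_wire(uint64_t x) {
    int64_t r;
    std::memcpy(&r, &x, sizeof r);
    return r;
}

// Reverse step: only a reader that knows the field is unsigned can do this.
inline uint64_t from_wire(int64_t r) {
    uint64_t x;
    std::memcpy(&x, &r, sizeof x);
    return x;
}
```

With these helpers, to_wire(0xFFFFFFFFFFFFFFFFULL) is -1, and only a reader
that applies from_wire gets the original value back.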

Of the languages avro supports, C, C++ and C# have unsigned types. Python
has arbitrarily long integers, so it's not an issue there.

If you think adding unsigned types is not a good idea, how would you solve
the problem I described above in a manner that is convenient to read
from other languages? A bunch of bytes doesn't have the same semantics as
an unsigned integer. I think it would be good to have avro as a generic
serialization format not only limited by jvm implementation details.

Thanks.

Pedro.





On Wed, Dec 11, 2013 at 6:16 PM, Martin Kleppmann wrote:

> Personally, I think it's a good design decision that Avro doesn't support
> unsigned types.
>
> Whether you use signed or unsigned only makes a difference if you expect to
> have numbers between 2^63 and 2^64-1 (if you have numbers between 2^31 and
> 2^32-1 you can use the Avro 'long' type instead of the 'int' type). And if
> your numbers are indeed between 2^63 and 2^64-1, you're better off using a
> 'fixed' type, which will only use 8 bytes, rather than a 'long' which would
> use 10 bytes for such a large number, due to the variable-length encoding.
>
> Another problem with unsigned types can be seen in Protocol Buffers (which
> supports both signed and unsigned): if you do accidentally put -1 in a
> field with an unsigned type, the resulting encoding is ten bytes long — a
> surprising and unnecessary gotcha. (
> https://developers.google.com/protocol-buffers/docs/encoding#types)
>
> Interested to hear other opinions on the matter!
>
> Martin
>
>
> On 11 December 2013 12:38, Pedro Larroy wrote:
>
> > Hi
> >
> > Is there any reason except the java centric focus of avro that it shouldn't
> > support unsigned types? We use them extensively and I'm thinking for us* it
> > would be useful to have them as we use mostly C++ <-> python communication
> > with avro.
> >
> > Would this be accepted in the official avro distribution?
> >
> > Pedro.
> >
> >
> > *us: Here, a Nokia business.
> >
>


[jira] [Commented] (AVRO-1382) Support for python3

2013-12-11 Thread Pedro Larroy (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845527#comment-13845527
 ] 

Pedro Larroy commented on AVRO-1382:


https://pypi.python.org/pypi/avro3k/1.7.6-SNAPSHOT



[jira] [Commented] (AVRO-1382) Support for python3

2013-12-11 Thread Pedro Larroy (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845464#comment-13845464
 ] 

Pedro Larroy commented on AVRO-1382:


Hi

The unit tests are all passing now.



unsigned types

2013-12-11 Thread Pedro Larroy
Hi

Is there any reason except the java centric focus of avro that it shouldn't
support unsigned types? We use them extensively and I'm thinking for us* it
would be useful to have them as we use mostly C++ <-> python communication
with avro.

Would this be accepted in the official avro distribution?

Pedro.


*us: Here, a Nokia business.


[jira] [Commented] (AVRO-1382) Support for python3

2013-12-11 Thread Pedro Larroy (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845348#comment-13845348
 ] 

Pedro Larroy commented on AVRO-1382:


Hi
I'm also working on this, as I need a python3 implementation of avro. I am
currently fixing the unit tests; my work is based on Christophe's branch:

https://github.com/larroy/avro/tree/AVRO-1382

I would also like to remove the dependency on ant.
