[jira] [Resolved] (AVRO-1190) C++ json parser fails to decode multibyte unicode code points
[ https://issues.apache.org/jira/browse/AVRO-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. resolved AVRO-1190.
---------------------------------------
    Resolution: Fixed
 Fix Version/s: 1.9.0

Merged the Pull Request

> C++ json parser fails to decode multibyte unicode code points
> -------------------------------------------------------------
>
>                 Key: AVRO-1190
>                 URL: https://issues.apache.org/jira/browse/AVRO-1190
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: c++
>    Affects Versions: 1.7.0
>            Reporter: Keh-Li Sheng
>            Priority: Major
>             Fix For: 1.9.0
>
> The parser in JsonIO.cc does not handle decoding a multibyte unicode character into any kind of valid character encoding for a std::string in C++. The following snippet from JsonParser::tryString() has several flaws:
> 1. sv is a std::string used as a vector, where each unit is a char
> 2. a single unicode hex quad encoded in JSON can represent a 16-bit value
> 3. a unicode hex quad can represent a "high surrogate" character, meaning that it must be combined with the following quad to derive the full unicode code point
> 4. \U is not a valid unicode escape for JSON (see http://www.ietf.org/rfc/rfc4627.txt)
> {code:title=JsonIO.cc}
> case 'u':
> case 'U':
>     {
>         unsigned int n = 0;
>         char e[4];
>         in_.readBytes(reinterpret_cast<uint8_t *>(e), 4);
>         for (int i = 0; i < 4; i++) {
>             n *= 16;
>             char c = e[i];
>             if (isdigit(c)) {
>                 n += c - '0';
>             } else if (c >= 'a' && c <= 'f') {
>                 n += c - 'a' + 10;
>             } else if (c >= 'A' && c <= 'F') {
>                 n += c - 'A' + 10;
>             } else {
>                 throw unexpected(c);
>             }
>         }
>         sv.push_back(n);
>     }
> {code}
> This loop decodes the quad into a temporary int and then simply pushes that int (which may be a 16-bit value) onto the std::string. This essentially means that the JSON parser does not decode any unicode characters. For example, this JSON string:
> {noformat}
> "Dress up if you dare! Free cover all night! \uD83C\uDF83\uD83D\uDC7B"
> {noformat}
> results in a decoded byte sequence for the last 4 characters:
> {noformat}
> 3C 83 3D 7B 00
> {noformat}
> where you can see that it simply drops the high-order bytes. In this particular example, \uD83C is a high-surrogate character, which requires some additional handling. I am not sure what users of the C++ library expect the encoding to be, but given that we are working with JSON and given that Avro C++ uses char instead of wchar, I would assume users would expect a UTF-8 encoded string. However, I could be wrong. There are many examples of decoders that handle this string properly - I found this one helpful while implementing a fix: http://rishida.net/tools/conversion/
> For basics on UTF-8: http://www.utf-8.com/

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (AVRO-1190) C++ json parser fails to decode multibyte unicode code points
[ https://issues.apache.org/jira/browse/AVRO-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730900#comment-16730900 ]

ASF subversion and git services commented on AVRO-1190:
-------------------------------------------------------

Commit 8f94f5647b6351c219fa105f37cd01f156427f71 in avro's branch refs/heads/master from Thiruvalluvan M. G.
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=8f94f56 ]

Merge pull request #417 from thiru-apache/AVRO-1190

UTF-8 support for JSON in C++

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (AVRO-1137) Could we have a folder with examples/samples in the source code
[ https://issues.apache.org/jira/browse/AVRO-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1137: -- Component/s: build > Could we have a folder with examples/samples in the source code > --- > > Key: AVRO-1137 > URL: https://issues.apache.org/jira/browse/AVRO-1137 > Project: Apache Avro > Issue Type: Improvement > Components: build > Environment: all >Reporter: Ajo Fod >Priority: Major > Attachments: avro.helloworld.zip > > > I don't know if there is a collection of examples of usages of Avro anywhere > (something like jfreechart has). I've recently posted on stack overflow an > example of a problem I ran into: > http://stackoverflow.com/questions/11866466/using-apache-avro-reflect -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1774) Update documentation with instructions/examples for using different code generation template
[ https://issues.apache.org/jira/browse/AVRO-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1774: -- Component/s: doc > Update documentation with instructions/examples for using different code > generation template > > > Key: AVRO-1774 > URL: https://issues.apache.org/jira/browse/AVRO-1774 > Project: Apache Avro > Issue Type: Improvement > Components: doc >Affects Versions: 1.7.7 >Reporter: Jake Robb >Priority: Major > > AVRO-1209 added a template for generating immutable classes. I can't find > anything in the Avro docs that tells me how to use it, and it is not obvious > to me (as a total n00b to Avro) how to do so. It seems like I'm supposed to > specify a different {{templateDirectory}} to avro-maven-plugin, but I'm not > sure what value to provide there. > Hopefully this is an easy one. I'll keep trying to figure it out, and if I do > so before anybody beats me to it, I'll happily write some instructions in a > comment here so that someone with edit privs to the docs can just paste it in > (or should it be in the Wiki section of the docs? It's unclear why there are > Wiki and non-Wiki docs, or what should go in which part). :) > Thanks! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1269) AVRO is converting ORACLE, Netezza, Teradata decimals & long integers to Strings.
[ https://issues.apache.org/jira/browse/AVRO-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1269: -- Component/s: java > AVRO is converting ORACLE, Netezza, Teradata decimals & long integers to > Strings. > --- > > Key: AVRO-1269 > URL: https://issues.apache.org/jira/browse/AVRO-1269 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.7.1 >Reporter: Prasad Dasari >Priority: Major > > I tried to sqoop ORACLE, NETEZZA, TERADATA tables with AVRO format using plain > JDBC (without using Cloudera connectors). I can see DECIMAL & NUMERIC data > types are being converted to AVRO Strings. > Oracle -- NUMBER & INTEGER data types are being converted to > AVRO String format. > NETEZZA -- DECIMAL, NUMERIC data types are converted to AVRO String > format. > Teradata -- DECIMAL AND LONG data types are converted to AVRO String > format. > When I tried with map-columns to BigDecimal, BigInteger I got an "AVRO does > not support BigDecimal" error message. > Thanks, > Prasad Dasari. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1180) Broken links on Code Review Checklist page on confluence
[ https://issues.apache.org/jira/browse/AVRO-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1180: -- Component/s: build > Broken links on Code Review Checklist page on confluence > > > Key: AVRO-1180 > URL: https://issues.apache.org/jira/browse/AVRO-1180 > Project: Apache Avro > Issue Type: Task > Components: build >Reporter: Pradeep Gollakota >Priority: Trivial > > The [Code Review > Checklist|https://cwiki.apache.org/confluence/display/AVRO/Code+Review+Checklist] > has two broken links. > The link referencing Sun's code conventions points to > http://java.sun.com/docs/codeconv/ > This link should be updated to (I'm guessing) > http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html > The link referencing Log4j Levels points to > http://logging.apache.org/log4j/docs/api/org/apache/log4j/Level.html > This should be updated to > https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1025) migrate website & dist to svnpubsub
[ https://issues.apache.org/jira/browse/AVRO-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1025: -- Component/s: build > migrate website & dist to svnpubsub > --- > > Key: AVRO-1025 > URL: https://issues.apache.org/jira/browse/AVRO-1025 > Project: Apache Avro > Issue Type: Improvement > Components: build >Reporter: Doug Cutting >Assignee: Doug Cutting >Priority: Major > > ASF infrastructure has requested that all projects migrate to svnpubsub for > their websites and release distributions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1557) downloads AVRO from the website
[ https://issues.apache.org/jira/browse/AVRO-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1557: -- Component/s: build > downloads AVRO from the website > > > Key: AVRO-1557 > URL: https://issues.apache.org/jira/browse/AVRO-1557 > Project: Apache Avro > Issue Type: Improvement > Components: build > Environment: web site >Reporter: evgeny >Priority: Major > Original Estimate: 1h > Remaining Estimate: 1h > > Hi, > I think we have a little problem with the main Avro web site. > Usually, projects put links to the latest build and the current stable > version under a static URL, which allows automated tools and people to > download them easily. > I believe many administrators would appreciate it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2030) Fix broken URL to "this book chapter" about Rabin fingerprints in 1.8.1 spec
[ https://issues.apache.org/jira/browse/AVRO-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2030: -- Component/s: build > Fix broken URL to "this book chapter" about Rabin fingerprints in 1.8.1 spec > > > Key: AVRO-2030 > URL: https://issues.apache.org/jira/browse/AVRO-2030 > Project: Apache Avro > Issue Type: Task > Components: build >Affects Versions: 1.8.1 >Reporter: CJ Gaconnet >Priority: Trivial > > The [1.8.1 > specification|https://avro.apache.org/docs/current/spec.html#N1088B] has a > sentence saying: > bq. Readers interested in the mathematics behind this algorithm may want to > read [this book chapter|http://www.scribd.com/fb-6001967/d/84795-Crc]. > The URL http://www.scribd.com/fb-6001967/d/84795-Crc now serves up a 404. > Does anyone know what book the link was pointing to? > Searching around leads me to think it was probably pointing to "14-2 Theory" > of _Hacker's Delight_ by Henry S. Warren. If so, it would be nice to either > update or remove the hyperlink and have the text cite the book and chapter by > name so that interested readers can still find it even if an updated link > were to go away. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1059) Apache project branding requirements: DOAP file [PATCH]
[ https://issues.apache.org/jira/browse/AVRO-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1059: -- Component/s: build > Apache project branding requirements: DOAP file [PATCH] > --- > > Key: AVRO-1059 > URL: https://issues.apache.org/jira/browse/AVRO-1059 > Project: Apache Avro > Issue Type: Improvement > Components: build >Reporter: Shane Curcuru >Priority: Major > Attachments: doap_Avro.rdf > > > Attached. Re: http://www.apache.org/foundation/marks/pmcs > See Also: http://projects.apache.org/create.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1924) Variable named 'date' in IDL
[ https://issues.apache.org/jira/browse/AVRO-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1924: -- Component/s: java > Variable named 'date' in IDL > > > Key: AVRO-1924 > URL: https://issues.apache.org/jira/browse/AVRO-1924 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.8.1 >Reporter: Niels Basjes >Assignee: Ryan Blue >Priority: Critical > > I was compiling Apache Parquet and found that the switch from Avro 1.8.0 to > 1.8.1 broke their build. > The error: {code} > [ERROR] Failed to execute goal > org.apache.avro:avro-maven-plugin:1.8.1:idl-protocol (schemas) ... > org.apache.avro.compiler.idl.ParseException: Encountered " "date" "date "" at > line 23, column 14. > [ERROR] Was expecting one of: > [ERROR] ... > [ERROR] "@" ... > [ERROR] "`" ... > [ERROR] -> [Help 1] > {code} > As it turns out they have a test idl that contains this: > {code} > @namespace("org.apache.parquet.avro") > protocol Cars { > record Service { > long date; > } > } > {code} > And the change in AVRO-1684 turned the word 'date' into something different for > the idl compiler. > So changing the word 'date' into something else fixes the problem. > Yet I think this is an undesirable effect for end user applications. > [~rdblue]: I assigned this to you since you implemented the mentioned change. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-425) Would be very helpful if there was a wireshark "plugin" for decoding the binary wireformat AVRO uses.
[ https://issues.apache.org/jira/browse/AVRO-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-425: - Component/s: misc > Would be very helpful if there was a wireshark "plugin" for decoding the > binary wireformat AVRO uses. > - > > Key: AVRO-425 > URL: https://issues.apache.org/jira/browse/AVRO-425 > Project: Apache Avro > Issue Type: Improvement > Components: misc > Environment: Wireshark >Reporter: Mark Wolfe >Priority: Major > Labels: avro, wireshark > > This would be of great assistance to developers and network engineers when > debugging issues in production environments using AVRO. > It would certainly make adoption of this format easier for new developers in > the longer term. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2117) Overall cleanup of code
[ https://issues.apache.org/jira/browse/AVRO-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2117: -- Component/s: misc > Overall cleanup of code > --- > > Key: AVRO-2117 > URL: https://issues.apache.org/jira/browse/AVRO-2117 > Project: Apache Avro > Issue Type: Improvement > Components: misc >Reporter: Niels Basjes >Assignee: Niels Basjes >Priority: Major > > When opening Avro in my IDE I see lots of warnings and notifications that are > easy to fix. > I'm going to pick up several types of those issues (only on master / 1.9.0 !) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1187) Dart codegen + JSON encoding/decoding
[ https://issues.apache.org/jira/browse/AVRO-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1187: -- Component/s: misc > Dart codegen + JSON encoding/decoding > - > > Key: AVRO-1187 > URL: https://issues.apache.org/jira/browse/AVRO-1187 > Project: Apache Avro > Issue Type: Wish > Components: misc >Reporter: Quinn Slack >Priority: Major > > It would be nice to have [Dart|http://www.dartlang.org/] codegen and JSON > encoding/decoding support. > There has been some (unfinished) work on protobuf support for Dart: > http://code.google.com/p/dart/issues/detail?id=951 > https://chromiumcodereview.appspot.com/user/Dan%20Rice > https://chromiumcodereview.appspot.com/10595002/ > But there are no Dart implementations of Avro, as best I can determine. If > anybody is aware of any, or interested in helping create one, please post > here. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1791) Please delete old releases from mirroring system
[ https://issues.apache.org/jira/browse/AVRO-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1791: -- Component/s: misc > Please delete old releases from mirroring system > > > Key: AVRO-1791 > URL: https://issues.apache.org/jira/browse/AVRO-1791 > Project: Apache Avro > Issue Type: Bug > Components: misc >Affects Versions: 1.7.7 > Environment: https://dist.apache.org/repos/dist/release/avro/ >Reporter: Sebb >Priority: Major > > To reduce the load on the ASF mirrors, projects are required to delete old > releases [1] > Please can you remove all non-current releases? > i.e. 1.7.7 > Thanks! > [1] http://www.apache.org/dev/release.html#when-to-archive -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2255) Implement shellcheck automatically on Pull Requests
[ https://issues.apache.org/jira/browse/AVRO-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2255: -- Component/s: misc > Implement shellcheck automatically on Pull Requests > --- > > Key: AVRO-2255 > URL: https://issues.apache.org/jira/browse/AVRO-2255 > Project: Apache Avro > Issue Type: Improvement > Components: misc >Reporter: Michael A. Smith >Priority: Major > > In the several PRs to AVRO-2229 I suggested improvements to some of the shell > scripts. Many of those improvements were bugs caught by > [https://github.com/koalaman/shellcheck]. I think we should implement > shellcheck in our automatic checks so that contributors get fast feedback on > their shell scripts. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1464) Implement Avro serialization in OCaml
[ https://issues.apache.org/jira/browse/AVRO-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1464: -- Component/s: misc > Implement Avro serialization in OCaml > - > > Key: AVRO-1464 > URL: https://issues.apache.org/jira/browse/AVRO-1464 > Project: Apache Avro > Issue Type: Wish > Components: misc >Reporter: Jeff Hammerbacher >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1105) Scala API for Avro
[ https://issues.apache.org/jira/browse/AVRO-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1105: -- Component/s: misc > Scala API for Avro > -- > > Key: AVRO-1105 > URL: https://issues.apache.org/jira/browse/AVRO-1105 > Project: Apache Avro > Issue Type: New Feature > Components: misc >Reporter: Christophe Taton >Priority: Major > Attachments: avro-scala.patch > > > Umbrella issue. > Goal is to provide Scala friendly APIs for Avro records and protocols (RPCs). > Related project: http://code.google.com/p/avro-scala-compiler-plugin/ looks > dead (no change since Sep 2010). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2075) Allow SchemaCompatibility to report possibly lossy conversions
[ https://issues.apache.org/jira/browse/AVRO-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2075: -- Component/s: java > Allow SchemaCompatibility to report possibly lossy conversions > -- > > Key: AVRO-2075 > URL: https://issues.apache.org/jira/browse/AVRO-2075 > Project: Apache Avro > Issue Type: Improvement > Components: java >Affects Versions: 1.7.7, 1.8.2 > Environment: Java >Reporter: Anders Sundelin >Assignee: Anders Sundelin >Priority: Minor > Attachments: > 0001-AVRO-2075-Add-option-to-report-possible-data-loss-in.patch > > > It is stated in the Avro spec that int and long values are promotable to > floats and doubles. > However, numeric promotions to floats are lossy (losing precision), as is > long promotion to double. > It is suggested that the SchemaCompatibility class be updated to be able to > flag conversions that have the possibility to be lossy as errors. The > attached patch does just that, by adding a new boolean flag (allowDataLoss), > preserving backwards compatibility by defaulting this flag to true. > Testcases illustrating the problem have been added to the unit test class > TestReadingWritingDataInEvolvedSchemas -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2045) Avro should warn about corrupt EOF files
[ https://issues.apache.org/jira/browse/AVRO-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2045: -- Component/s: java > Avro should warn about corrupt EOF files > > > Key: AVRO-2045 > URL: https://issues.apache.org/jira/browse/AVRO-2045 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.7.6 >Reporter: Lars Volker >Assignee: Nandor Kollar >Priority: Major > > When running queries on truncated files, Impala's Avro scanner issues a > warning: > {noformat} > WARNINGS: Problem parsing file > hdfs://host.company.com:8020/tmp/datagen/some_db/some_table/col1=A/col2=B/col3=D/col4=C/2017-05-18-18-5-9-876-0.avro > at 1327214080(EOF) > Tried to read 64653 bytes but could only read 16549 bytes. This may indicate > data file corruption. (file > hdfs://host.company.com:8020/tmp/datagen/some_db/some_table/col1=A/col2=B/col3=D/col4=C/2017-05-18-18-5-9-876-0.avro, > byte offset: 1327214080) > {noformat} > {{avro-tools tojson}} eventually prints the same number of rows that Impala > reads, but does not print a warning. Instead it seems to quietly swallow the > EOFException. > I think it should print a warning instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2075) Allow SchemaCompatibility to report possibly lossy conversions
[ https://issues.apache.org/jira/browse/AVRO-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2075: -- Environment: (was: Java) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1950) Better Json serialization for Avro decimal logical types?
[ https://issues.apache.org/jira/browse/AVRO-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1950: -- Component/s: java > Better Json serialization for Avro decimal logical types? > - > > Key: AVRO-1950 > URL: https://issues.apache.org/jira/browse/AVRO-1950 > Project: Apache Avro > Issue Type: Improvement > Components: java >Reporter: Zoltan Farkas >Priority: Minor > > Currently as I understand it, decimal logical types are encoded on top of bytes > and fixed avro types. This makes them a bit "unnatural" in the json > encoding... > I worked around this with a hack in my fork that encodes them naturally as json > decimals. A good starting point to look at is in: > https://github.com/zolyfarkas/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/io/DecimalEncoder.java > > My approach is a bit hacky, so I would be interested in suggestions to bring > this closer to something we can integrate into avro... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1827) Handling correctly optional fields when converting Protobuf to Avro
[ https://issues.apache.org/jira/browse/AVRO-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1827: -- Component/s: java > Handling correctly optional fields when converting Protobuf to Avro > --- > > Key: AVRO-1827 > URL: https://issues.apache.org/jira/browse/AVRO-1827 > Project: Apache Avro > Issue Type: Improvement > Components: java >Affects Versions: 1.7.7, 1.8.0 >Reporter: Jakub Kahovec >Assignee: Karel Fuka >Priority: Major > Attachments: AVRO-1827.patch, AVRO-1827.patch, AVRO-1827.patch, > AVRO-1827.patch > > > Hello, > as of the current implementation of converting protobuf files into avro > format, protobuf optional fields are being given default values in the avro > schema if not specified explicitly. > So for instance when the protobuf field is defined as > {quote} > optional int64 fieldInt64 = 1; > {quote} > in the avro schema it appears as > {quote} > "name" : "fieldInt64", > "type" : "long", > "default" : 0 > {quote} > The problem with this implementation is that we are losing information about > whether the field was present or not in the original protobuf, as when we ask > for this field's value in avro we will be given the default value. > What I'm proposing instead is that if the field in the protobuf is defined as > optional and has no default value then the generated avro schema type will use > a union comprising the matching type and null type with default value null. > It is going to look like this: > {quote} > "name" : "fieldIn64", > "type" : [ "null", "long" ], > "default" : null > {quote} > I'm aware that this is a breaking change but I think that it is the proper way > to handle optional fields. > I've also created a patch which fixes the conversion > Jakub -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1726) Add support for appending a variable number of blocks to DataFileWriter
[ https://issues.apache.org/jira/browse/AVRO-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1726: -- Component/s: java > Add support for appending a variable number of blocks to DataFileWriter > --- > > Key: AVRO-1726 > URL: https://issues.apache.org/jira/browse/AVRO-1726 > Project: Apache Avro > Issue Type: Improvement > Components: java >Affects Versions: 1.7.7 >Reporter: Bryan Bende >Priority: Minor > Labels: starter > Fix For: 1.9.0 > > Attachments: AVRO-1726-2.patch, AVRO-1726.patch > > > It would be helpful to have the ability to append a variable number of raw > blocks from a DataFileReader to a DataFileWriter, similar to appendAllFrom() > but specifying how many blocks to append. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1643) Add non-String maps as a logical type
[ https://issues.apache.org/jira/browse/AVRO-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1643: -- Component/s: java > Add non-String maps as a logical type > - > > Key: AVRO-1643 > URL: https://issues.apache.org/jira/browse/AVRO-1643 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.7.6, 1.7.7 >Reporter: Sachin Goyal >Priority: Minor > > Other languages might not be able to duplicate the logic in AVRO-680, so a > logical type that indicates a non-string map is indeed a map would be great. > Reference: > https://github.com/apache/avro/pull/17 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1124) RESTful service for holding schemas
[ https://issues.apache.org/jira/browse/AVRO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1124: -- Component/s: java > RESTful service for holding schemas > --- > > Key: AVRO-1124 > URL: https://issues.apache.org/jira/browse/AVRO-1124 > Project: Apache Avro > Issue Type: New Feature > Components: java >Reporter: Jay Kreps >Assignee: Jay Kreps >Priority: Major > Attachments: AVRO-1124-can-read-with.patch, AVRO-1124-draft.patch, > AVRO-1124-validators-preliminary.patch, AVRO-1124.2.patch, AVRO-1124.3.patch, > AVRO-1124.4.patch, AVRO-1124.patch, AVRO-1124.patch > > > Motivation: It is nice to be able to pass around data in serialized form but > still know the exact schema that was used to serialize it. The overhead of > storing the schema with each record is too high unless the individual records > are very large. There are workarounds for some common cases: in the case of > files a schema can be stored once with a file of many records amortizing the > per-record cost, and in the case of RPC the schema can be negotiated ahead of > time and used for many requests. For other uses, though it is nice to be able > to pass a reference to a given schema using a small id and allow this to be > looked up. Since only a small number of schemas are likely to be active for a > given data source, these can easily be cached, so the number of remote > lookups is very small (one per active schema version). > Basically this would consist of two things: > 1. A simple REST service that stores and retrieves schemas > 2. Some helper java code for fetching and caching schemas for people using > the registry > We have used something like this at LinkedIn for a few years now, and it > would be nice to standardize this facility to be able to build up common > tooling around it. This proposal will be based on what we have, but we can > change it as ideas come up. 
> The facilities this provides are super simple: you can register a schema, which gives back a unique id for it, or you can query for a schema. There is almost no code, and nothing very complex. The contract is that before emitting/storing a record you must first publish its schema to the registry or know that it has already been published (by checking your cache of published schemas). When reading you check your cache and if you don't find the id/schema pair there you query the registry to look it up. I will explain some of the nuances in more detail below.
> An added benefit of such a repository is that it makes a few other things possible:
> 1. A graphical browser of the various data types that are currently used and all their previous forms.
> 2. Automatic enforcement of compatibility rules. Data is always compatible in the sense that the reader will always deserialize it (since they are using the same schema as the writer), but this does not mean it is compatible with the expectations of the reader. For example, if an int field is changed to a string, that will almost certainly break anyone relying on that field. This definition of compatibility can differ for different use cases and should likely be pluggable.
> Here is a description of one of our uses of this facility at LinkedIn. We use this to retain a schema with "log" data end-to-end from the producing app to various real-time consumers, as well as a set of resulting Avro files in Hadoop. This schema metadata can then be used to auto-create hive tables (or add new fields to existing tables) or to infer pig fields, all without manual intervention. One important definition of compatibility that is nice to enforce is compatibility with historical data for a given "table". 
Log data is usually loaded in an append-only manner, so if someone changes an int field in a particular data set to be a string, tools like pig or hive that expect static columns will be unusable. Even using plain-vanilla map/reduce, processing data where columns and types change willy-nilly is painful. However, the person emitting this kind of data may not know all the details of compatible schema evolution. We use the schema repository to validate that any change made to a schema doesn't violate the compatibility model, and reject the update if it does. We do this check both at run time, and also as part of the ant task that generates specific record code (as an early warning).
> Some details to consider:
> Deployment
> This can just be programmed against the servlet API and deployed as a standard war. You can run lots of instances and load-balance traffic over them.
> Persistence
> The storage needs are not very heavy. The clients are expected to cache the id=>schema mapping, and the server can
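[Editorial sketch] The client-side caching contract described above (publish or check your cache before writing; on a read miss, fetch from the registry once per active schema id) can be outlined in a few lines. Everything here is a stand-in, not LinkedIn's or Avro's actual code; `fetchFromRegistry` represents the REST lookup:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class SchemaCache {
    private final Map<Long, String> idToSchema = new ConcurrentHashMap<>();
    private final Function<Long, String> fetchFromRegistry; // hypothetical REST call

    SchemaCache(Function<Long, String> fetchFromRegistry) {
        this.fetchFromRegistry = fetchFromRegistry;
    }

    // Look up the id locally; only on a miss go to the registry,
    // so remote lookups happen once per active schema version.
    String schemaById(long id) {
        return idToSchema.computeIfAbsent(id, fetchFromRegistry);
    }

    public static void main(String[] args) {
        final int[] remoteCalls = {0};
        SchemaCache cache = new SchemaCache(id -> {
            remoteCalls[0]++;
            return "{\"type\":\"string\"}"; // pretend registry response
        });
        cache.schemaById(1L);
        cache.schemaById(1L); // served from cache, no second remote call
        System.out.println(remoteCalls[0]); // 1
    }
}
```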
[jira] [Updated] (AVRO-2087) Allow specifying default values for logical types in human-readable form
[ https://issues.apache.org/jira/browse/AVRO-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2087: -- Component/s: spec > Allow specifying default values for logical types in human-readable form > > > Key: AVRO-2087 > URL: https://issues.apache.org/jira/browse/AVRO-2087 > Project: Apache Avro > Issue Type: New Feature > Components: spec >Reporter: Zoltan Ivanfi >Priority: Major > > Currently default values for logical types have to be specified as the binary > representation of the backing primary type. > For example, if one wanted to specify 0.00 as the default value for a decimal > field, "\u" has to be specified as the default value. If the user tries > to specify "0.00", like in AVRO-2086, it is silently accepted but results in > unexpected behaviour. This value is not parsed and interpreted as a decimal > number but is taken to be the byte representation, i.e. the corresponding > hexadecimal ASCII byte sequence 30 2E 30 30 = 80860 with a precision of 2 > results in a default decimal value of 808.60. > To set the default value to an arbitrary non-zero value, e.g., 31.80, one has > to multiply it by 10^2=100 for a precision of 2, resulting in 3180, which is > 0x0C6C when converted to hex. This means that "\u000C\u006C" has to be > specified as the default value. Having to do these calculations by hand is > not too user (programmer) friendly. > For a date or timestamp type, the default value has to be specified as a > number and not as a string, so an unexpected default value can not be set > accidentally in this case. However, one can't use a human-readable > representation in this case either, the number of days or seconds > (respectively) elapsed since the epoch must be specified, e.g., 1507216329 > for the current timestamp. > The first step towards solving this problem will be coming up with a > suggested solution. Once we have that, the JIRA description should be > extended with details. 
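[Editorial sketch] The by-hand conversion the issue complains about (decimal value, times 10^scale, to unscaled integer, to big-endian bytes, to \uXXXX escapes) can be reproduced with plain JDK classes; the method name is made up for this sketch:

```java
import java.math.BigDecimal;

public class DecimalDefault {
    // Turn a decimal's unscaled value into the escaped byte string that a
    // schema's "default" currently has to carry (big-endian two's complement).
    static String toEscapedDefault(BigDecimal d) {
        byte[] bytes = d.unscaledValue().toByteArray();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\u%04X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 31.80 with scale 2 -> unscaled 3180 = 0x0C6C -> \\u000C\\u006C
        System.out.println(toEscapedDefault(new BigDecimal("31.80")));
    }
}
```

This is exactly the arithmetic the reporter argues users should not have to do by hand.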
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2078) Avro does not enforce schema resolution rules for Decimal type
[ https://issues.apache.org/jira/browse/AVRO-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2078: -- Component/s: java
> Avro does not enforce schema resolution rules for Decimal type
> --------------------------------------------------------------
>
> Key: AVRO-2078
> URL: https://issues.apache.org/jira/browse/AVRO-2078
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.2
> Reporter: Anthony Hsu
> Assignee: Nandor Kollar
> Priority: Major
> Attachments: dec.avro
>
> According to http://avro.apache.org/docs/1.8.2/spec.html#Decimal
> bq. For the purposes of schema resolution, two schemas that are {{decimal}} logical types _match_ if their scales and precisions match.
> This is not enforced.
> I wrote a file with (precision 5, scale 2) and tried to read it with a reader schema with (precision 3, scale 1). I expected an AvroTypeException to be thrown, but none was thrown.
> Test data file attached. The code to read it is:
> {noformat:title=ReadDecimal.java}
> import java.io.File;
> import org.apache.avro.Schema;
> import org.apache.avro.file.DataFileReader;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.generic.GenericRecord;
> import org.apache.avro.io.DatumReader;
>
> public class ReadDecimal {
>   public static void main(String[] args) throws Exception {
>     Schema schema = new Schema.Parser().parse("{\n"
>         + "  \"type\" : \"record\",\n"
>         + "  \"name\" : \"some_schema\",\n"
>         + "  \"namespace\" : \"com.howdy\",\n"
>         + "  \"fields\" : [ {\n"
>         + "    \"name\" : \"name\",\n"
>         + "    \"type\" : \"string\"\n"
>         + "  }, {\n"
>         + "    \"name\" : \"value\",\n"
>         + "    \"type\" : {\n"
>         + "      \"type\" : \"bytes\",\n"
>         + "      \"logicalType\" : \"decimal\",\n"
>         + "      \"precision\" : 3,\n"
>         + "      \"scale\" : 1\n"
>         + "    }\n"
>         + "  } ]\n"
>         + "}");
>     DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
>     // dec.avro has precision 5, scale 2
>     DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(new File("/tmp/dec.avro"), datumReader);
>     GenericRecord foo = null;
>     while (dataFileReader.hasNext()) {
>       foo = dataFileReader.next(foo); // AvroTypeException expected due to change in scale/precision but none occurs
>     }
>   }
> }
> {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
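[Editorial sketch] The missing check, per the spec sentence quoted above, is straightforward; the method name here is hypothetical and the real fix would live in Avro's schema-resolution code:

```java
public class DecimalResolution {
    // Per the spec, two decimal logical types match for schema resolution
    // only if both precision and scale are equal.
    static boolean decimalTypesMatch(int writerPrecision, int writerScale,
                                     int readerPrecision, int readerScale) {
        return writerPrecision == readerPrecision && writerScale == readerScale;
    }

    public static void main(String[] args) {
        // writer (5, 2) vs reader (3, 1), as in the attached dec.avro
        System.out.println(decimalTypesMatch(5, 2, 3, 1)); // false: should raise AvroTypeException
    }
}
```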
[jira] [Updated] (AVRO-2047) NettyTransceiver can NPE when getRemoteName() is called
[ https://issues.apache.org/jira/browse/AVRO-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2047: -- Component/s: java > NettyTransceiver can NPE when getRemoteName() is called > --- > > Key: AVRO-2047 > URL: https://issues.apache.org/jira/browse/AVRO-2047 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.7.7 >Reporter: Clement Pang >Priority: Major > > NettyTransceiver can NPE if the channel is closed while a request is > underway. The correct thing to do seems to be to check for null and throw an > IOException ("underlying transport no longer available"). > {code} > ! java.lang.NullPointerException: null > ! at > org.apache.avro.ipc.NettyTransceiver.getRemoteName(NettyTransceiver.java:431) > ! at org.apache.avro.ipc.Requestor.writeHandshake(Requestor.java:202) > ! at org.apache.avro.ipc.Requestor.access$300(Requestor.java:52) > ! at org.apache.avro.ipc.Requestor$Request.getBytes(Requestor.java:478) > ! at org.apache.avro.ipc.Requestor.request(Requestor.java:181) > ! at org.apache.avro.ipc.Requestor.request(Requestor.java:129) > ! at > org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:84) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
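[Editorial sketch] The suggested fix is a null check that converts the closed-channel state into a checked IOException. This stand-alone class mimics the shape of NettyTransceiver.getRemoteName() but is not the actual implementation:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;

public class RemoteNameGuard {
    private volatile SocketAddress remoteAddress; // becomes null once the channel closes

    RemoteNameGuard(SocketAddress remoteAddress) {
        this.remoteAddress = remoteAddress;
    }

    // Instead of dereferencing a null address (the NPE in the trace above),
    // surface an IOException the Requestor can handle.
    String getRemoteName() throws IOException {
        SocketAddress addr = remoteAddress;
        if (addr == null) {
            throw new IOException("underlying transport no longer available");
        }
        return addr.toString();
    }

    public static void main(String[] args) {
        RemoteNameGuard closed = new RemoteNameGuard(null);
        try {
            closed.getRemoteName();
        } catch (IOException expected) {
            System.out.println("IOException: " + expected.getMessage());
        }
    }
}
```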
[jira] [Updated] (AVRO-2099) Decimal precision is ignored
[ https://issues.apache.org/jira/browse/AVRO-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2099: -- Component/s: spec
> Decimal precision is ignored
> ----------------------------
>
> Key: AVRO-2099
> URL: https://issues.apache.org/jira/browse/AVRO-2099
> Project: Apache Avro
> Issue Type: Improvement
> Components: spec
> Reporter: Kornel Kiełczewski
> Priority: Major
>
> According to the documentation https://avro.apache.org/docs/1.8.1/spec.html#Decimal
> {quote}
> The decimal logical type represents an arbitrary-precision signed decimal number of the form unscaled × 10^-scale.
> {quote}
> Then in the schema we might have an entry like:
> {code}
> {
>   "type": "bytes",
>   "logicalType": "decimal",
>   "precision": 4,
>   "scale": 2
> }
> {code}
> However, in the java deserialization I see that the precision is ignored:
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/Conversions.java#L79
> {code}
> @Override
> public BigDecimal fromBytes(ByteBuffer value, Schema schema, LogicalType type) {
>   int scale = ((LogicalTypes.Decimal) type).getScale();
>   // always copy the bytes out because BigInteger has no offset/length ctor
>   byte[] bytes = new byte[value.remaining()];
>   value.get(bytes);
>   return new BigDecimal(new BigInteger(bytes), scale);
> }
> {code}
> The logical type definition in the java api requires the precision to be set:
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/LogicalTypes.java#L116
> {code}
> /** Create a Decimal LogicalType with the given precision and scale */
> public static Decimal decimal(int precision, int scale) {
>   return new Decimal(precision, scale);
> }
> {code}
> Is this a feature, that we allow arbitrary precision? If so, why do we have the precision in the API and schema, if it's ignored?
> Maybe that's some java specific issue?
> Thanks for any hints. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
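[Editorial sketch] One way the conversion could honour the declared precision, rebuilding the BigDecimal the same way Conversions.fromBytes does and then rejecting out-of-range values. The checked variant is an assumption for illustration, not current Avro behaviour:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalPrecisionCheck {
    // Rebuild the BigDecimal from its unscaled big-endian bytes, then
    // reject values whose precision exceeds the schema's declared precision.
    static BigDecimal fromBytesChecked(byte[] bytes, int scale, int precision) {
        BigDecimal value = new BigDecimal(new BigInteger(bytes), scale);
        if (value.precision() > precision) {
            throw new ArithmeticException("value precision " + value.precision()
                + " exceeds declared precision " + precision);
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] unscaled = BigInteger.valueOf(12345).toByteArray();
        System.out.println(fromBytesChecked(unscaled, 2, 5)); // 123.45 fits precision 5
        try {
            fromBytesChecked(unscaled, 2, 4); // 5 significant digits cannot fit precision 4
        } catch (ArithmeticException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```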
[jira] [Updated] (AVRO-1763) Avro Schema Generator to handle polymorphic types
[ https://issues.apache.org/jira/browse/AVRO-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1763: -- Component/s: spec
> Avro Schema Generator to handle polymorphic types
> -------------------------------------------------
>
> Key: AVRO-1763
> URL: https://issues.apache.org/jira/browse/AVRO-1763
> Project: Apache Avro
> Issue Type: Improvement
> Components: spec
> Affects Versions: 1.9.0
> Reporter: Qiangqiang Shi
> Priority: Major
>
> Inheritance and polymorphism are widely used in Java libraries. If a more sophisticated Avro schema generator can be added to Avro, users can easily generate Avro schemas for classes in complex contexts, third-party code and legacy code.
> For example, for the following class:
> {code:java}
> public class TestReflectPolymorphismData {
>   public static class SuperclassA1 {
>     private String SuperclassA1;
>   }
>   public static class SubclassA1 extends SuperclassA1 {
>     private String SubclassA1;
>   }
>   public static class SubclassA2 extends SuperclassA1 {
>     private String SubclassA2;
>   }
>   public static class SuperB1 {
>     private SubclassA1 SubclassA1;
>     private List<SubclassA2> SubclassA2List;
>     private Map<String, SuperclassA1> stringSuperclassA1Map;
>     private Map<Integer, SuperclassA1> integerSuperclassA1Map;
>   }
> }
> {code}
> It'll be good if Avro can provide a schema generator to generate a schema like the following automatically for class SuperB1:
> {code:java}
> {
>   "type": "record",
>   "name": "SuperB1",
>   "namespace": "org.apache.avro.reflect.TestReflectPolymorphismData$",
>   "fields": [
>     {
>       "name": "SubclassA1",
>       "type": {
>         "type": "record",
>         "name": "SuperclassA1",
>         "fields": [
>           {"name": "SuperclassA1", "type": "string"},
>           {
>             "name": "SuperclassA1Subclasses",
>             "type": [
>               "null",
>               {
>                 "type": "record",
>                 "name": "SubclassA1",
>                 "fields": [{"name": "SubclassA1", "type": "string"}]
>               },
>               {
>                 "type": "record",
>                 "name": "SubclassA2",
>                 "fields": [{"name": "SubclassA2", "type": "string"}]
>               }
>             ]
>           }
>         ]
>       }
>     },
>     {
>       "name": "SubclassA2List",
>       "type": {"type": "array", "items": "SuperclassA1", "java-class": "java.util.List"}
>     },
>     {
>       "name": "stringSuperclassA1Map",
>       "type": {"type": "map", "values": "SuperclassA1"}
>     },
>     {
>       "name": "integerSuperclassA1Map",
>       "type": {
>         "type": "array",
>         "items": {
>           "type": "record",
>           "name": "Pair34255fab6d3d79ff",
>           "namespace": "org.apache.avro.reflect",
>           "fields": [
>             {"name": "key", "type": "int"},
>             {"name": "value", "type": "org.apache.avro.reflect.TestReflectPolymorphismData$.SuperclassA1"}
>           ]
>         },
>         "java-class": "java.util.Map"
>       }
>     }
>   ]
> }
> {code}
> related story: AVRO-1568 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1752) Aliases for enum symbols.
[ https://issues.apache.org/jira/browse/AVRO-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1752: -- Component/s: spec > Aliases for enum symbols. > - > > Key: AVRO-1752 > URL: https://issues.apache.org/jira/browse/AVRO-1752 > Project: Apache Avro > Issue Type: Improvement > Components: spec >Reporter: Zoltan Farkas >Priority: Minor > > Currently named types and fields might have aliases acording to the spec. > It would be great if enum symbols could have aliases as well... > This would be useful to compatibly fix misspellings... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1934) Avro test resources reference old avro dev versions
[ https://issues.apache.org/jira/browse/AVRO-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1934: -- Component/s: java
> Avro test resources reference old avro dev versions
> ---------------------------------------------------
>
> Key: AVRO-1934
> URL: https://issues.apache.org/jira/browse/AVRO-1934
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.1
> Reporter: Zoltan Farkas
> Priority: Minor
>
> For example:
> https://github.com/apache/avro/blob/master/lang/java/maven-plugin/src/test/resources/unit/idl/pom.xml
> references 1.7.3-SNAPSHOT:
> {code}
> <parent>
>   <artifactId>avro-parent</artifactId>
>   <groupId>org.apache.avro</groupId>
>   <version>1.7.3-SNAPSHOT</version>
>   <relativePath>../../../../../../../../../</relativePath>
> </parent>
> {code}
> this does not seem right. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1911) for avro HTTP content type instead of avro/binary, application/octet-stream;fmt=avro might be more appropriate?
[ https://issues.apache.org/jira/browse/AVRO-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1911: -- Component/s: java > for avro HTTP content type instead of avro/binary, > application/octet-stream;fmt=avro might be more appropriate? > --- > > Key: AVRO-1911 > URL: https://issues.apache.org/jira/browse/AVRO-1911 > Project: Apache Avro > Issue Type: Improvement > Components: java >Reporter: Zoltan Farkas >Priority: Major > > the content type is defined in: > {code} > /** An HTTP-based {@link Transceiver} implementation. */ > public class HttpTransceiver extends Transceiver { > static final String CONTENT_TYPE = "avro/binary"; > {code} > I suggest using for avro binary: > application/octet-stream;fmt=avro > and for avro json: > application/json;fmt=avro > this would take advantage of standard mime types... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1850) Align JSON and binary record serialization
[ https://issues.apache.org/jira/browse/AVRO-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1850: -- Component/s: spec > Align JSON and binary record serialization > -- > > Key: AVRO-1850 > URL: https://issues.apache.org/jira/browse/AVRO-1850 > Project: Apache Avro > Issue Type: Improvement > Components: spec >Reporter: David Kay >Priority: Major > Labels: Encoding, Record > > The documentation describes the encoding of Avro records as: > bq.Binary: A record is encoded by encoding the values of its fields in the > order that they are declared. In other words, a record is encoded as just the > concatenation of the encodings of its fields. Field values are encoded per > their schema. > bq.JSON: Except for unions, the JSON encoding is the same as is used to > encode field default values. > The _field default values_ table says that records and maps are both encoded > as JSON type _object_. > *Enhancement:* > There is currently no way to write an Avro schema describing a JSON array of > positional parameters (i.e. an array containing variables of possibly > different type). An Avro record is the datatype representing an ordered > collection of values. For consistency with the binary encoding, and to allow > Avro to represent a schema for JSON tuples, encoding should say: > bq.JSON: Except for unions and records, the JSON encoding is the same as is > used to encode field default values. A record is encoded as an array by > encoding the values of its fields in the order that they are declared. 
> For the example schema:
> {noformat}
> {"namespace": "example.avro",
>  "type": "record",
>  "name": "User",
>  "fields": [
>    {"name": "name", "type": "string"},
>    {"name": "favorite_number", "type": ["int", "null"]},
>    {"name": "favorite_color", "type": ["string", "null"]}
>  ]
> }
> {noformat}
> the JSON encoding currently converts an Avro record to an Avro map (JSON object):
> {noformat}
> { "name": "Joe",
>   "favorite_number": 42,
>   "favorite_color": null }
> {noformat}
> Instead Avro records should be encoded in JSON in the same manner as they are encoded in binary, as a JSON array containing the fields in the order they are defined:
> {noformat}
> ["Joe", 42, null]
> {noformat}
> The set of JSON texts validated by the example Avro schema and by the corresponding JSON schema should be equal:
> {noformat}
> {
>   "$schema": "http://json-schema.org/draft-04/schema#",
>   "type": "array",
>   "name": "User",
>   "items": [
>     {"name":"name", "type": "string"},
>     {"name":"favorite_number", "type":["integer","null"]},
>     {"name":"favorite_color", "type":["string","null"]}
>   ]
> }
> {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
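[Editorial sketch] The proposed positional encoding can be illustrated by building the JSON by hand: field values are emitted in declaration order, exactly as the binary encoding concatenates them. No Avro classes are involved; this is purely illustrative:

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class RecordAsJsonArray {
    // Encode field values in declaration order, mirroring the binary encoding.
    static String encode(List<Object> fieldValues) {
        StringJoiner out = new StringJoiner(", ", "[", "]");
        for (Object v : fieldValues) {
            out.add(v instanceof String s ? "\"" + s + "\"" : String.valueOf(v));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // User{name="Joe", favorite_number=42, favorite_color=null}
        System.out.println(encode(Arrays.asList("Joe", 42, null))); // ["Joe", 42, null]
    }
}
```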
[jira] [Updated] (AVRO-1768) stdin support for getschema
[ https://issues.apache.org/jira/browse/AVRO-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1768: -- Component/s: java > stdin support for getschema > --- > > Key: AVRO-1768 > URL: https://issues.apache.org/jira/browse/AVRO-1768 > Project: Apache Avro > Issue Type: Improvement > Components: java >Affects Versions: 1.7.7 >Reporter: Bennie Schut >Priority: Minor > > It would be nice to support reading from stdin on getschema calls so you > don't need a local file first. > Somewhat similar to AVRO-1583. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1717) [Github] Support for optional fields when converting json to avro
[ https://issues.apache.org/jira/browse/AVRO-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1717: -- Component/s: java
> [Github] Support for optional fields when converting json to avro
> -----------------------------------------------------------------
>
> Key: AVRO-1717
> URL: https://issues.apache.org/jira/browse/AVRO-1717
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Reporter: Bartosz Wojtkiewicz
> Priority: Major
>
> Currently there is an issue when we want to convert a json object to avro using a schema that allows optional fields (fields of type 'null'). When a json object does not explicitly have such a field with a 'null' value, it is treated as not conforming to the schema. I added a few test cases that illustrate this problem.
> PR: https://github.com/apache/avro/pull/47 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1714) Nullable Named Schema definition in IDL fails.
[ https://issues.apache.org/jira/browse/AVRO-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1714: -- Component/s: java > Nullable Named Schema definition in IDL fails. > -- > > Key: AVRO-1714 > URL: https://issues.apache.org/jira/browse/AVRO-1714 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.7.6 >Reporter: Mark Perris >Priority: Major > > According to Section 7.2 of the Avro IDL, named schemata may be treated as > primitive types. > As such, I believe it should be possible to create a nullable schema > reference: > {code} >record coordinate { > string type; >} >record tweet { > union {null, coordinate} coordinate; >} > {code} > to accommodate > {code} > { >"coordinate":{ > "type":"point" >} > } > {code} and {code} > { >"coordinate":null > } > {code} > however, any attempt to store data against that schema results in > {code} > Exception in thread "main" org.apache.avro.AvroTypeException: Unknown union > branch type > at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:445) > at > org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290) > at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) > at > org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267) > at > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155) > at > org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193) > at > org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183) > at > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151) > at > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142) > at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:99) > at org.apache.avro.tool.Main.run(Main.java:85) > at org.apache.avro.tool.Main.main(Main.java:74) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1665) Provide way to represent BYTES type using base64 encoding in JSON
[ https://issues.apache.org/jira/browse/AVRO-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1665: -- Component/s: spec > Provide way to represent BYTES type using base64 encoding in JSON > - > > Key: AVRO-1665 > URL: https://issues.apache.org/jira/browse/AVRO-1665 > Project: Apache Avro > Issue Type: Improvement > Components: spec >Affects Versions: 1.7.7 >Reporter: Konstantin Shaposhnikov >Priority: Major > > Currently JsonEncoder and JsonDecoder represent BYTES type as String encoded > using ISO-8859-1 charset. > It would be good to provide option to use base64 encoding (e.g. using jackson > JsonGenerator.writeBinary(byte[] data, int offset, int len) method). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
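[Editorial sketch] The two representations the issue contrasts can be produced with the JDK alone (Avro's JsonEncoder itself is not involved here):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BytesJsonEncodings {
    // Proposed option: standard base64 text, safe to embed in any JSON document.
    static String toBase64(byte[] payload) {
        return Base64.getEncoder().encodeToString(payload);
    }

    // Current behaviour (for comparison): bytes squeezed through ISO-8859-1
    // into a string, which can contain control and non-printable characters.
    static String toIso88591(byte[] payload) {
        return new String(payload, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] payload = {(byte) 0xDE, (byte) 0xAD, (byte) 0xBE, (byte) 0xEF};
        System.out.println(toBase64(payload)); // 3q2+7w==
    }
}
```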
[jira] [Updated] (AVRO-1631) Support for field long names
[ https://issues.apache.org/jira/browse/AVRO-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1631: -- Component/s: spec
> Support for field long names
> ----------------------------
>
> Key: AVRO-1631
> URL: https://issues.apache.org/jira/browse/AVRO-1631
> Project: Apache Avro
> Issue Type: Improvement
> Components: spec
> Reporter: Nikoleta Verbeck
> Priority: Minor
>
> It would be of benefit to allow for a way to define different aliases to reference a field by than just its name value.
> The use case for this would be when you have a defined spec for communicating between two services, and within this spec fields use short names like bId. But within code you would like to reference that field by a longer, more descriptive name. Example: setBidderId/getBidderId vs setBId/getBId.
> Aliases somewhat solve this, but only in a one-sided way (read or write), not bidirectionally (read and write). The only way to make aliases work in a bidirectional way would be to define two records with the same field set but with the field name and alias values swapped, basically creating one record for reading data and the other for writing data.
> One option to improve this would be to expose all field aliases as getters and setters. Another would be to add another attribute to the field def such as 'as' or 'knownAs'.
> Example of option two:
> {code:title=Option2.avsc}
> {
>   "namespace":"options",
>   "type":"record",
>   "name":"Bidder",
>   "fields":[
>     {"name":"bId", "as":"bidderId", "value":"string"}
>   ]
> }
> {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2148) Avro: Schema compatibility/evolution - attribute size change
[ https://issues.apache.org/jira/browse/AVRO-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2148: -- Component/s: spec
> Avro: Schema compatibility/evolution - attribute size change
> ------------------------------------------------------------
>
> Key: AVRO-2148
> URL: https://issues.apache.org/jira/browse/AVRO-2148
> Project: Apache Avro
> Issue Type: Improvement
> Components: spec
> Affects Versions: 1.8.2
> Reporter: Ashok
> Priority: Critical
> Labels: patch
>
> Let's assume the schema changed from V1 to V2. Currently, we can't create a merge of the two schemas that makes it compatible with both. Depending on what size we keep for the specific attribute in the reader schema (the merged one), you can read avro files of schema V1 or V2, but not both. If we keep the higher attribute size value (64) as part of the merged schema, it should allow reading avro files with the lower attribute size value (16).
>
> * *V1 schema:*
> { "name": "sid", "type": [ "null",
> { "type": "fixed", "name": "SID", "namespace": "com.int.datatype", "doc": "", "size": *64* }
> ], "doc": "", "default": null, "businessLogic": "" }
>
> * *V2 schema:*
> { "name": "sid", "type": [ "null",
> { "type": "fixed", "name": "SID", "namespace": "com.int.datatype", "doc": "", "size": *16* }
> ], "doc": "", "default": null, "businessLogic": "" } -- This message was sent by Atlassian JIRA (v7.6.3#76005)
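[Editorial sketch] The underlying resolution rule that blocks the merge: a writer's fixed matches a reader's fixed only when name and size both match, so no single merged size covers both V1 (64) and V2 (16). A minimal stand-in for that rule:

```java
public class FixedResolution {
    // Per the Avro spec's schema-resolution rules, a writer's fixed matches
    // a reader's fixed only if both the name and the size are equal.
    static boolean fixedMatches(String writerName, int writerSize,
                                String readerName, int readerSize) {
        return writerName.equals(readerName) && writerSize == readerSize;
    }

    public static void main(String[] args) {
        System.out.println(fixedMatches("SID", 64, "SID", 16)); // false: V1 data unreadable with V2 reader
        System.out.println(fixedMatches("SID", 64, "SID", 64)); // true
    }
}
```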
[jira] [Updated] (AVRO-1612) typo in documentation for "fixed" type
[ https://issues.apache.org/jira/browse/AVRO-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1612: -- Component/s: spec > typo in documentation for "fixed" type > -- > > Key: AVRO-1612 > URL: https://issues.apache.org/jira/browse/AVRO-1612 > Project: Apache Avro > Issue Type: Bug > Components: spec >Reporter: Peter Amstutz >Priority: Minor > > There appears to be a cut-and-paste error in the documentation for the > "Fixed" type. The "namespace" and "aliases" fields probably shouldn't be > there. > Text of the current online documentation (1.7.7): > https://avro.apache.org/docs/current/spec.html#Fixed > Fixed > Fixed uses the type name "fixed" and supports two attributes: > name: a string naming this fixed (required). > namespace, a string that qualifies the name; > aliases: a JSON array of strings, providing alternate names for this enum > (optional). > size: an integer, specifying the number of bytes per value (required). > For example, 16-byte quantity may be declared with: > {"type": "fixed", "size": 16, "name": "md5"} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1571) Support parameterized types in Avro
[ https://issues.apache.org/jira/browse/AVRO-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1571: -- Component/s: java
> Support parameterized types in Avro
> -----------------------------------
>
> Key: AVRO-1571
> URL: https://issues.apache.org/jira/browse/AVRO-1571
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.6, 1.7.7, 1.8.1
> Reporter: Sachin Goyal
> Priority: Major
> Attachments: ParameterizedTypesTest.java
>
> The below code cannot be serialized by Avro.
> {code}
> class Leaf<P, Q> {
>   P p;
>   Q q;
> }
> class Root {
>   Middle1 m1;
>   Middle2 m2;
>   Middle3 m3;
> }
> class Middle1 {
>   Leaf foo;
> }
> class Middle2 {
>   Leaf foo;
> }
> class Middle3 {
>   Leaf foo;
> }
> {code}
> This is because only the current class is used when generating the schema. The parent class's context is missing in the ReflectData#createSchema() functions, where the actual type information is present.
> Please see the attached test too for a simpler case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
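[Editorial sketch] The "parent class' context" the report mentions is exactly what java.lang.reflect exposes through a field's *generic* type: the enclosing class's declaration still carries the actual type arguments. A self-contained illustration (names invented for this sketch):

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class ResolveTypeArgs {
    static class Leaf<P, Q> { P p; Q q; }
    static class Middle1 { Leaf<String, Integer> foo; }

    // Reading the field from the enclosing (parent) class recovers the
    // actual type arguments that a per-class schema walk would lose.
    static String[] typeArgNames() throws NoSuchFieldException {
        Field f = Middle1.class.getDeclaredField("foo");
        ParameterizedType pt = (ParameterizedType) f.getGenericType();
        Type[] args = pt.getActualTypeArguments();
        String[] names = new String[args.length];
        for (int i = 0; i < args.length; i++) {
            names[i] = args[i].getTypeName();
        }
        return names;
    }

    public static void main(String[] args) throws NoSuchFieldException {
        for (String n : typeArgNames()) {
            System.out.println(n);
        }
        // java.lang.String
        // java.lang.Integer
    }
}
```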
[jira] [Updated] (AVRO-2150) Improved idl syntax support for "marker properties"
[ https://issues.apache.org/jira/browse/AVRO-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2150: -- Component/s: spec > Improved idl syntax support for "marker properties" > --- > > Key: AVRO-2150 > URL: https://issues.apache.org/jira/browse/AVRO-2150 > Project: Apache Avro > Issue Type: Improvement > Components: spec >Reporter: Zoltan Farkas >Priority: Minor > > It would be nice to allow in IDL "marker properties" like: > {code} > @MarkerProperty > record TestRecord { > > } > {code} > this would be only a simpler syntax for: > {code} > @MarkerProperty("") > record TestRecord { > > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1122) Java: Avro RPC Requestor can block during handshake in async mode
[ https://issues.apache.org/jira/browse/AVRO-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1122: -- Component/s: java > Java: Avro RPC Requestor can block during handshake in async mode > - > > Key: AVRO-1122 > URL: https://issues.apache.org/jira/browse/AVRO-1122 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.6.3 >Reporter: Mike Percy >Priority: Major > Attachments: Screen Shot 2012-06-27 at 12.43.32 AM.png > > > We are seeing an issue in Flume where the Avro RPC Requestor is blocking for > long periods of time waiting for the Avro handshake to complete. Since we are > using the API with Futures, this should not block. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1562) Add support for types extending Maps/Collections
[ https://issues.apache.org/jira/browse/AVRO-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1562: -- Component/s: java
> Add support for types extending Maps/Collections
> ------------------------------------------------
>
> Key: AVRO-1562
> URL: https://issues.apache.org/jira/browse/AVRO-1562
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.6
> Reporter: Sachin Goyal
> Priority: Major
> Attachments: custom_map_and_collections1.patch
>
> Consider the following code:
> {code}
> import java.io.ByteArrayOutputStream;
> import java.util.*;
> import org.apache.avro.Schema;
> import org.apache.avro.file.DataFileWriter;
> import org.apache.avro.reflect.ReflectData;
> import org.apache.avro.reflect.ReflectDatumWriter;
>
> public class AvroDerivingMaps {
>   public static void main(String[] args) throws Exception {
>     MapDerivedContainer orig = new MapDerivedContainer();
>     ReflectData rdata = ReflectData.AllowNull.get();
>     Schema schema = rdata.getSchema(MapDerivedContainer.class);
>     System.out.println(schema);
>
>     ReflectDatumWriter<MapDerivedContainer> datumWriter = new ReflectDatumWriter<>(MapDerivedContainer.class, rdata);
>     DataFileWriter<MapDerivedContainer> fileWriter = new DataFileWriter<>(datumWriter);
>     ByteArrayOutputStream baos = new ByteArrayOutputStream();
>     fileWriter.create(schema, baos);
>     fileWriter.append(orig);
>     fileWriter.close();
>   }
> }
>
> class MapDerived extends HashMap {
>   Integer a = 1;
>   String b = "b";
> }
>
> class MapDerivedContainer {
>   MapDerived2 map = new MapDerived2();
> }
>
> class MapDerived2 extends MapDerived {
>   String c = "c";
> }
> {code}
>
> It throws the following exception:
> {code:javascript}
> {"type":"record","name":"MapDerivedContainer","namespace":"avro","fields":[{"name":"map","type":["null",{"type":"record","name":"MapDerived2","fields":[{"name":"c","type":["null","string"],"default":null},{"name":"a","type":["null","int"],"default":null},{"name":"b","type":["null","string"],"default":null}]}],"default":null}]}
> {code}
> {color:brown}
> Exception in thread "main" org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.UnresolvedUnionException:
> Caused by: org.apache.avro.UnresolvedUnionException: Not in union ["null",{"type":"record","name":"MapDerived2","namespace":"avro","fields":[{"name":"c","type":["null","string"],"default":null},{"name":"a","type":["null","int"],"default":null},{"name":"b","type":["null","string"],"default":null}]}]: {}
> at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:600)
> at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
> at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
> at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
> at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
> at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:203)
> at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
> at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
> at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
> ... 1 more
> {color}
>
> It appears that ReflectData#createSchema() checks for "type instanceof ParameterizedType" and, because of this, it skips handling of the map.
> The same is not true of GenericData#isMap(), and GenericData#resolveUnion() fails because of this.
> The same may be true for classes extending ArrayList, Collection, Set etc.
> Also, note the schema for the class extending Map:
> {code:javascript}
> {
>   "type":"record",
>   "name":"MapDerived2",
>   "fields":[
>     {
>       "name":"c",
>       "type":["null","string"],
>       "default":null
>     },
>     {
>       "name":"a",
>       "type":["null","int"],
>       "default":null
>     },
>     {
>       "name":"b",
>       "type":["null","string"],
>       "default":null
>     }
>   ]
> }
> {code}
> This schema ignores the Map completely.
> Probably, for such a class, the schema should look like:
> {code:javascript}
> {
>   "type":"record",
>   "name":"MapDerived2",
>   "fields":[
>     {
[jira] [Updated] (AVRO-1570) ReflectData.AllowNull fails with polymorphism and @Union annotation
[ https://issues.apache.org/jira/browse/AVRO-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1570: -- Component/s: java > ReflectData.AllowNull fails with polymorphism and @Union annotation > --- > > Key: AVRO-1570 > URL: https://issues.apache.org/jira/browse/AVRO-1570 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.7.6 >Reporter: Sachin Goyal >Priority: Major > > Nested union exception is thrown if the following structure is serialized > with ReflectData.AllowNull > (Plain ReflectData works fine) > {code} > @Union({Derived.class}) > class Base > { >Integer a = 5; > } > class Derived extends Base > { > String b = "Foo"; > } > class PolymorphicDO > { >Base obj = new Derived(); > } > // Serialization code: > ReflectData rdata = ReflectData.AllowNull.get(); > Schema schema = rdata.getSchema(PolymorphicDO.class); > ReflectDatumWriter datumWriter = new ReflectDatumWriter > (PolymorphicDO.class, rdata); > DataFileWriter fileWriter = new DataFileWriter (datumWriter); > fileWriter.create(schema, new ByteArrayOutputStream()); > fileWriter.append(new PolymorphicDO()); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1371) Support of data encryption for Avro file
[ https://issues.apache.org/jira/browse/AVRO-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1371: -- Component/s: spec > Support of data encryption for Avro file > > > Key: AVRO-1371 > URL: https://issues.apache.org/jira/browse/AVRO-1371 > Project: Apache Avro > Issue Type: Improvement > Components: spec >Affects Versions: 1.8.0 >Reporter: Haifeng Chen >Priority: Major > Labels: Rhino > Original Estimate: 672h > Remaining Estimate: 672h > > The Avro file format is widely used in Hadoop. As data security is getting more > and more attention in the Hadoop community, we propose to improve the Avro file > format so that it can handle data encryption and decryption. > Similar to compression and decompression, encryption and decryption can be > implemented with Codecs, a concept that already exists in Avro. However, Avro > Codec context handling needs to be extended to support per-codec contexts, > such as encryption keys, for encryption and decryption. > Avro supports multiple language implementations. This is an umbrella JIRA for > this work, and the implementation work for each language will be addressed in > sub-tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
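An illustrative sketch of the codec analogy in the proposal above. Avro's real Codec API lives in org.apache.avro.file and is not reproduced here; this class only mirrors a codec's compress/decompress contract, with AES standing in where deflate or snappy would normally go, and the key playing the role of the proposed per-codec context:

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class EncryptionCodecSketch {
    private final SecretKey key; // the per-codec context the JIRA proposes

    public EncryptionCodecSketch(SecretKey key) { this.key = key; }

    public static SecretKey newKey() {
        try {
            return KeyGenerator.getInstance("AES").generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // "compress" = encrypt a block; the random IV is prepended so that
    // decompress() can recover it without external state.
    public byte[] compress(byte[] plain) {
        try {
            byte[] iv = new byte[16];
            new SecureRandom().nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
            byte[] enc = c.doFinal(plain);
            return ByteBuffer.allocate(16 + enc.length).put(iv).put(enc).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // "decompress" = split off the IV and decrypt the remainder.
    public byte[] decompress(byte[] block) {
        try {
            ByteBuffer buf = ByteBuffer.wrap(block);
            byte[] iv = new byte[16];
            buf.get(iv);
            byte[] enc = new byte[buf.remaining()];
            buf.get(enc);
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
            return c.doFinal(enc);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        EncryptionCodecSketch codec = new EncryptionCodecSketch(newKey());
        byte[] round = codec.decompress(codec.compress("avro data block".getBytes()));
        System.out.println(new String(round));
    }
}
```

Key management (the interesting part of the proposal) is exactly what the existing context-free Codec interface cannot express, which is why the issue calls for extending it.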
[jira] [Updated] (AVRO-1429) Exception on storing Null value through AvroStorage using PIG
[ https://issues.apache.org/jira/browse/AVRO-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1429: -- Component/s: java > Exception on storing Null value through AvroStorage using PIG > - > > Key: AVRO-1429 > URL: https://issues.apache.org/jira/browse/AVRO-1429 > Project: Apache Avro > Issue Type: Task > Components: java > Environment: Hadoop 0.20.2-cdh3u5 > Apache Pig version 0.8.1-cdh3u5 > java version "1.6.0_27" >Reporter: Sudhir Ranjan >Priority: Major > Labels: features, patch > > Getting an exception when storing a null-valued record/tuple as Avro. > The input file has one column with long values (one of them is null, meaning > nothing), and when I try to store the data in Avro format, it throws an error. > Please suggest if I am missing anything somewhere in the code below, > or else please provide a patch. > **My code base. > REGISTER > /home/hadoop/work/sudhir/AvroAnalysis/Avrojars/snappy-java-1.0.4.1.jar > REGISTER /home/hadoop/work/sudhir/AvroAnalysis/Avrojars/avro-1.7.5.jar > REGISTER /home/hadoop/work/sudhir/AvroAnalysis/Avrojars/json-simple-1.1.jar; > REGISTER /home/hadoop/work/sudhir/AvroAnalysis/Avrojars/piggybank.jar; > REGISTER > /home/hadoop/work/sudhir/AvroAnalysis/Avrojars/jackson-core-asl-1.5.5.jar; > REGISTER > /home/hadoop/work/sudhir/AvroAnalysis/Avrojars/jackson-mapper-asl-1.5.5.jar; > -- The input file only has 1 column (normal TEXT data, i.e. TSV format), and > the file has a null value, meaning nothing > A = load '/home/hadoop/work/sudhir/AvroAnalysis/input/TSV_uncompressed/part*' > using PigStorage('\t') as (USER_ID:long); > -- The output is to be stored in Avro data format > STORE A INTO > '/home/hadoop/work/sudhir/AvroAnalysis/output/TSV_uncompressed/part*' USING > org.apache.pig.piggybank.storage.avro.AvroStorage('schema','{"namespace":"com.sudhir.schema.users.avro","type":"long","name":"users_avro","doc":"Avro > storing with schema using > 
Pig.","fields":[{"name":"USER_ID","type":["null","long"],"default":null}]}'); > ***Getting Error like: > INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher > - 100% complete > ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 2997: Unable to recreate > exception from backed error: > org.apache.avro.file.DataFileWriter$AppendWriteException: > java.lang.NullPointerException: null of long > ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1354) SortedKeyValueFiles should support appends
[ https://issues.apache.org/jira/browse/AVRO-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1354: -- Component/s: java > SortedKeyValueFiles should support appends > -- > > Key: AVRO-1354 > URL: https://issues.apache.org/jira/browse/AVRO-1354 > Project: Apache Avro > Issue Type: Improvement > Components: java >Reporter: Dipti Desai >Priority: Major > > SortedKeyValueFiles currently don't allow for appends. This functionality > would be a nice to have. > http://apache-avro.679487.n3.nabble.com/JIRA-to-support-append-for-SortedKeyValueFiles-td4027834.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2250) Release 1.9.0
[ https://issues.apache.org/jira/browse/AVRO-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2250: -- Component/s: release > Release 1.9.0 > - > > Key: AVRO-2250 > URL: https://issues.apache.org/jira/browse/AVRO-2250 > Project: Apache Avro > Issue Type: Task > Components: release >Reporter: Nandor Kollar >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1226) Non-Avro data causes runtime exceptions/errors when sent to Avro NettyTransceiver
[ https://issues.apache.org/jira/browse/AVRO-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1226: -- Component/s: java > Non-Avro data causes runtime exceptions/errors when sent to Avro > NettyTransceiver > - > > Key: AVRO-1226 > URL: https://issues.apache.org/jira/browse/AVRO-1226 > Project: Apache Avro > Issue Type: Improvement > Components: java >Affects Versions: 1.7.3 >Reporter: Brock Noland >Priority: Major > > AVRO- put in a stop gap measure to stop Avro from throwing an OOMError > when something like an HTTP request was sent to an AVRO IPC port. The general > issue of port scanning/monitoring causing Avro to throw opaque runtime errors > still exists. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1156) Avro responder swallows thrown Errors
[ https://issues.apache.org/jira/browse/AVRO-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1156: -- Component/s: java > Avro responder swallows thrown Errors > - > > Key: AVRO-1156 > URL: https://issues.apache.org/jira/browse/AVRO-1156 > Project: Apache Avro > Issue Type: Bug > Components: java >Reporter: Mike Percy >Priority: Major > Attachments: AVRO-1156-1.patch > > > The Avro responder wraps caught Errors, such as OutOfMemoryErrors, in > Exceptions and rethrows them. That's problematic because an Error should be > allowed to crash the JVM, since it's often irrecoverable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1032) Add AvroMapDriver and AvroReduceDriver
[ https://issues.apache.org/jira/browse/AVRO-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1032: -- Component/s: java > Add AvroMapDriver and AvroReduceDriver > -- > > Key: AVRO-1032 > URL: https://issues.apache.org/jira/browse/AVRO-1032 > Project: Apache Avro > Issue Type: Wish > Components: java >Reporter: Daniel Micol-Ponce >Priority: Major > > I think Avro should include an AvroMapDriver and AvroReduceDriver, similar to > Hadoop's MapDriver and ReduceDriver, in order to allow easier unit tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-628) Unions have no getter function (avro_union_get)
[ https://issues.apache.org/jira/browse/AVRO-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-628: - Component/s: c > Unions have no getter function (avro_union_get) > --- > > Key: AVRO-628 > URL: https://issues.apache.org/jira/browse/AVRO-628 > Project: Apache Avro > Issue Type: Improvement > Components: c >Reporter: Gavin M. Roy >Priority: Major > > The union data type has no getter function and as such, the only way to get > to the data in the union is to create a struct > {code}typedef struct avro_union_datum_t { > struct avro_obj_t obj; > int64_t discriminant; > avro_datum_t value; > } avro_union_datum_t;{code} > in your own include or code as the struct for this is in datum.h which is not > installed with the library. In addition, there is a type warning when trying > to use avro_record_get using the avro_union_datum_t. > Ideally there would be a getter that exposes the datum in the union. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1009) Use ExecutionHandler by default in NettyServer and/or clarify documentation
[ https://issues.apache.org/jira/browse/AVRO-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1009: -- Component/s: java > Use ExecutionHandler by default in NettyServer and/or clarify documentation > --- > > Key: AVRO-1009 > URL: https://issues.apache.org/jira/browse/AVRO-1009 > Project: Apache Avro > Issue Type: Improvement > Components: java >Affects Versions: 1.6.2 >Reporter: James Baldassari >Priority: Major > Labels: java > > It may be a good idea to use an ExecutionHandler with a cached thread pool by > default in NettyServer. If an ExecutionHandler is not used then, as pointed > out in AVRO-976 and AVRO-1001, each Netty session can only execute one RPC at > a time. Users should still be allowed to override the ExecutionHandler with > their own implementation. Whether we make this change or not, I think the > documentation in NettyServer should explain in a little more detail the > behavior of NettyServer with and without an ExecutionHandler. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2159) Naming Limitations of Schemas in Stricter Reference Contexts
[ https://issues.apache.org/jira/browse/AVRO-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2159: -- Component/s: spec > Naming Limitations of Schemas in Stricter Reference Contexts > > > Key: AVRO-2159 > URL: https://issues.apache.org/jira/browse/AVRO-2159 > Project: Apache Avro > Issue Type: New Feature > Components: spec >Reporter: Bridger Howell >Priority: Major > > (Excuse the lengthiness of this ticket description - it was initially written > as an email that became too long. Feel free to correct any misguided > reasoning.) > I've come to realize that there are some undesirable constraints on how avro > schemas can be used in Java code generation and IDL, that only appear as > minor annoyances when you use schemas generically. In particular, I'm focused > on cases where it's desirable to use two schemas that have the same name in > some context. > > *Issue:* > Suppose I'm writing an application that publishes many different kinds of > data somewhere, with each type of data having its own schema. And then > suppose that some number of those schemas would like to share some kind of > common schema, to start with. > > If I do this, and I happen to be using Java code generation to manage > schemas, I'll soon find difficulty in two directions: > > - I would find it difficult to upgrade the data shared among all of these > external schemas by way of the common schema, without upgrading all of those > schemas at the same time. The problem here being that neither Java's > classpath nor an IDL protocol can support the way avro's name field maps as a > class name onto the classpath or a reference name onto a protocol's symbols. > > The intermediate step of the application being partially migrated between > version 1 and version 2 of a common schema has no representation in either of > these contexts. 
Using a different name becomes a very annoying option in many > cases, since it is an incompatible change (or with aliases, it's at least not > consistently compatible across implementations). > - I would find it difficult to migrate away from the external schemas using > that shared schema, for the same reasons listed above. > In IDL (without code generation), these issues can usually be avoided by > creating a second protocol, and in generic avro, the issues would be avoided > by using a different schema parser or schema builder. > > *Analysis:* > At first glance, it is tempting to blame the name-matching requirement for > schema resolution as a culprit - and it may be correct in many cases that > requiring schemas have compatible structure is all that is needed. > > However, the way I see it is that the name-matching requirement for schema > resolution is there to ensure that there is _the intent for two schemas to > resolve with each other_, and the rest of the checks are just there to make > sure that such an intent can be reasonably carried out. > > The difficulty from either the two examples above happens not because of a > lack of pre-determined intent for schemas to resolve, but rather the > inability to simultaneously supply a unique reference for each of the > schemas, while intending that the correct groups of schemas can resolve. 
> > Thus, the way to avoid these issues so far has been to create a new > reference context, and the severity of the issue in each case corresponds to > the difficulty of creating a new reference context: > * For generic schemas, create a new parser or schema builder [easy - minorly > annoying] > * For IDL, create a new protocol [minorly annoying - somewhat annoying] > * For Java code generation, create a new classpath [very annoying (Java 9) - > impossible] > Based on that, I understand a schema's name as expressing two overlapping > meanings: > - the intent to be able to resolve with other schemas with the same name > (let's call this the {{resolveName}}) > - the ability to be uniquely referenced from some context (let's call this > the {{referenceName}}) > > If these two meanings were able to be specified independently, I think that > schemas would be much easier to use in contexts where references are more > limited. > > *Speculative Solutions:* > Minimally, I think it's reasonable to create at least one new field to > separate the meaning of a schema's {{referenceName}} from its > {{resolveName}}, and use the old name field to compatibly handle missing > values. Then other tools that don't immediately apply schema resolution, can > optionally upgrade to support using the {{referenceName}} instead of the > {{resolveName}}. > > Beyond that, having {{name}} continue to mean {{resolveName}} would mean > that old avro
[jira] [Updated] (AVRO-2187) Add RPC Streaming constructs/keywords to Avro IDL or schema
[ https://issues.apache.org/jira/browse/AVRO-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2187: -- Component/s: spec > Add RPC Streaming constructs/keywords to Avro IDL or schema > --- > > Key: AVRO-2187 > URL: https://issues.apache.org/jira/browse/AVRO-2187 > Project: Apache Avro > Issue Type: New Feature > Components: spec >Reporter: Srujan Narkedamalli >Priority: Major > > Motivation: > We recently added support for transporting Avro serialization and IDL over > gRPC for Java. In order to use the streaming features of gRPC, or any other > transport that supports streaming, we need to be able to specify them in IDL and > schema. > Details: > Currently, gRPC supports 3 types of streaming calls: > # server streaming (server can send multiple responses for a single request) > # client streaming (client can send multiple requests and server sends a single > response) > # bi-directional streaming call (ongoing RPC with multiple requests and > responses) > We would want a way to represent these types of calls in Avro's IDL, similar > to one-way calls, using keywords. Usually in gRPC with other IDLs, a > streaming request or response is a repeated payload of the same type. For client > streaming and bi-directional streaming it would be simpler to have a single > request argument when representing their type in callbacks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2254) Unions with 2 records declared downward fail
[ https://issues.apache.org/jira/browse/AVRO-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2254: -- Component/s: java > Unions with 2 records declared downward fail > > > Key: AVRO-2254 > URL: https://issues.apache.org/jira/browse/AVRO-2254 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.9.0 >Reporter: Zoltan Farkas >Priority: Major > > The following IDL will fail, complaining that the same type is declared twice in > the union: > {code} > @namespace("org.apache.avro.gen") > protocol UnionFwd { > record TestRecord { > union {SR1, SR2} unionField; > } > record SR1 { > string field; > } > record SR2 { > string field; > } > } > {code} > The fix for this can be pretty simple: > https://github.com/zolyfarkas/avro/commit/56b215f73f34cc80d505875c90217916b271abb5 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2205) Add IP address logical type and convertors
[ https://issues.apache.org/jira/browse/AVRO-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2205: -- Component/s: spec > Add IP address logical type and convertors > -- > > Key: AVRO-2205 > URL: https://issues.apache.org/jira/browse/AVRO-2205 > Project: Apache Avro > Issue Type: Improvement > Components: spec >Reporter: Tristan Stevens >Priority: Major > > IP addresses can be represented much more compactly as a 64-bit integer, > meaning that they are much more efficient to store and that consumers can > do equality or subnet (range) comparisons using long-integer arithmetic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
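As a sketch of the arithmetic this proposal enables (illustrative code, not part of Avro): packing an IPv4 address into a long makes equality and subnet checks plain integer operations.

```java
public class IpAsLong {
    // Pack a dotted-quad IPv4 address into the low 32 bits of a long.
    static long toLong(String ip) {
        long n = 0;
        for (String part : ip.split("\\.")) {
            n = (n << 8) | Integer.parseInt(part);
        }
        return n;
    }

    // A subnet (range) check becomes a mask-and-compare on longs.
    static boolean inSubnet(long addr, long network, int prefix) {
        long mask = prefix == 0 ? 0 : (-1L << (32 - prefix)) & 0xFFFFFFFFL;
        return (addr & mask) == (network & mask);
    }

    public static void main(String[] args) {
        long a = toLong("192.168.1.77");
        long net = toLong("192.168.1.0");
        System.out.println(a);                     // 3232235853
        System.out.println(inSubnet(a, net, 24));  // true
    }
}
```

A logical type would let the schema carry `long` on the wire while converters expose the dotted-quad (or IPv6) form to application code.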
[jira] [Updated] (AVRO-2273) Release 1.8.3
[ https://issues.apache.org/jira/browse/AVRO-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2273: -- Component/s: release > Release 1.8.3 > - > > Key: AVRO-2273 > URL: https://issues.apache.org/jira/browse/AVRO-2273 > Project: Apache Avro > Issue Type: Task > Components: release >Reporter: Thiruvalluvan M. G. >Priority: Major > Fix For: 1.8.3 > > > This ticket is for releasing Avro 1.8.3 and discussing any topics related to > it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2093) Extend "custom coders" to fully support union types
[ https://issues.apache.org/jira/browse/AVRO-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2093: -- Component/s: java > Extend "custom coders" to fully support union types > --- > > Key: AVRO-2093 > URL: https://issues.apache.org/jira/browse/AVRO-2093 > Project: Apache Avro > Issue Type: Improvement > Components: java >Reporter: Raymie Stata >Priority: Major > > The initial implementation of "custom coders" for SpecificRecord (AVRO-2090) > only supports "nullable unions" (two-branch unions where one branch is the > null type). This JIRA extends that implementation to support all forms of > unions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1222) unable to install avro 1.7.3, avro-c.pc missing
[ https://issues.apache.org/jira/browse/AVRO-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1222: -- Component/s: c > unable to install avro 1.7.3, avro-c.pc missing > --- > > Key: AVRO-1222 > URL: https://issues.apache.org/jira/browse/AVRO-1222 > Project: Apache Avro > Issue Type: Bug > Components: c >Affects Versions: 1.7.3 > Environment: racky# uname -a > Linux racky 2.6.32.14-127.nuMetra.1.fc12.x86_64 #1 SMP Sat Jun 19 07:08:40 > PDT 2010 x86_64 x86_64 x86_64 GNU/Linux >Reporter: Alfonso Urdaneta >Priority: Major > > ./cmake_avrolib.sh in > http://mirror.nexcess.net/apache/avro/stable/c/avro-c-1.7.3.tar.gz fails > during the installation phase. > Install the project... > -- Install configuration: "Debug" > -- Installing: /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro.h > -- Installing: /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro > -- Installing: > /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro/consumer.h > -- Installing: > /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro/data.h > -- Installing: > /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro/legacy.h > -- Installing: > /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro/msstdint.h > -- Installing: > /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro/refcount.h > -- Installing: /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro/io.h > -- Installing: > /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro/allocation.h > -- Installing: > /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro/platform.h > -- Installing: > /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro/resolver.h > -- Installing: > /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro/value.h > -- Installing: > /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro/basics.h > -- Installing: > /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro/errors.h > -- Installing: > 
/home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro/msinttypes.h > -- Installing: > /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro/schema.h > -- Installing: > /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/include/avro/generic.h > -- Installing: /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/lib/libavro.a > -- Installing: > /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/lib/libavro.so.22.0.0 > -- Installing: /home/alfonso/tmp/avro-c-1.7.3/build/avrolib/lib/libavro.so > CMake Error at src/cmake_install.cmake:65 (FILE): > file INSTALL cannot find file > "/home/alfonso/tmp/avro-c-1.7.3/build/src/avro-c.pc" to install. > Call Stack (most recent call first): > cmake_install.cmake:37 (INCLUDE) > make: *** [install] Error 1 > 6.695u 3.161s 0:09.83 100.2% 0+0k 0+65560io 0pf+0w -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2146) getting Expected start-union. Got VALUE_STRING
[ https://issues.apache.org/jira/browse/AVRO-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2146: -- Component/s: java > getting Expected start-union. Got VALUE_STRING > -- > > Key: AVRO-2146 > URL: https://issues.apache.org/jira/browse/AVRO-2146 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.8.2 > Environment: error message: > Exception in thread "main" org.apache.avro.AvroTypeException: Expected > start-union. Got VALUE_STRING > at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:698) > at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441) > at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290) > at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) > at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267) > at > org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179) > at > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) > at > org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232) > at > org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222) > at > org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175) > at > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) > at > org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232) > at > org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222) > at > org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175) > at > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) > at > org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145) > at myJson2Avro.fromJasonToAvro(myJson2Avro.java:81) > at myJson2Avro.main(myJson2Avro.java:48) >Reporter: laki >Priority: 
Major > > Here is the schema, no unions, but getting union error : > > { > "type" : "record", > "name" : "edm_generic_publisher_avro_schema", > "namespace" : "edm.avro", > "doc" : "The generic avro schema used by publishers to publish events to the > enterprise streaming service", > "fields" : [ > {"name" : "event", > "type" : { > "type" : "record", > "name" : "event_meta_data", > "fields" : [ > {"name" : "event_name", "type" : "string", "doc" : "The name of the event. In > the CDC, this field is populated with the name of the data base table or > segment."} > , > {"name" : "operation_type", "type" : "string", "doc": "The operation or > action that triggered the event. e.g., Insert, Update, Delete, etc."} > , > {"name" : "transaction_identifier", "type" : "string", "default" : "NONE", > "doc" : "A unique identifier that identifies a unit or work or transaction. > Useful in relating multiple events together."} > , > {"name" : "event_publication_timestamp_millis", "type" : "string", "doc": > "timestamp when the event was published"} > , > > {"name" : "event_publisher", "type" : "string", "doc" : "The system or > application that published the event"} > , > > {"name" : "event_publisher_identity", "type": "string", "default" : "NONE", > "doc": "The identity (user) of the system or application that published the > event"} > , > > {"name" : "event_timestamp_millis", "type" : "string", "default" : "NONE", > "doc": "timestamp when the event occured"} > , > > {"name": "event_initiator", "type": "string", "default" : "NONE", "doc" : > "The system or application that initiated the event"} > , > {"name": "event_initiator_identity", "type" : "string", "default" : "NONE", > "doc": "The system id or application id that initiated the event" } > ]}, > "doc" : "The data about the published event" > }, > { "name" : "contents", > "type" : { > "name": "data_field_groups", > "type": "array", > "items": { > "type": "record", > "name": "data_field_group", > "fields" : [ > {"name": 
"data_group_name", "type": "string" } > , > { > "name": "data_fields", > "type": { > "type": "array", > "items": { > "name": "data_field", > "type": "record", > "fields":[ > {"name" : "data_field_name", "type" : "string", "doc" : "The field name"} > , > > {"name": "data_field_type", "type": "string", "doc" : "The data type is one > of the following values: string, boolean, int, long, float, double or bytes"} > , > {"name" : "data_field_value", "type" : ["string"], "doc" : "The value"} > ] > } > } > } > ] > } > }, > "doc" : "The data fields for the published event" > } > ] > } > ; > > > Here is the code that is causing the issue: > > static byte[] fromJasonToAvro(
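For what it's worth, one likely cause: the schema above does contain a union, since `data_field_value` is declared as `["string"]`, and Avro's JSON encoding requires union values (other than null) to be wrapped in a single-key object naming the branch. Assuming illustrative field values, the JSON input would need to look like:

```json
{
  "data_field_name": "example_field",
  "data_field_type": "string",
  "data_field_value": {"string": "example value"}
}
```

A bare `"data_field_value": "example value"` is exactly what produces "Expected start-union. Got VALUE_STRING" from JsonDecoder.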
[jira] [Updated] (AVRO-1568) Allow Java polymorphism in Avro for third-party code
[ https://issues.apache.org/jira/browse/AVRO-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1568: -- Component/s: java > Allow Java polymorphism in Avro for third-party code > > > Key: AVRO-1568 > URL: https://issues.apache.org/jira/browse/AVRO-1568 > Project: Apache Avro > Issue Type: Improvement > Components: java >Affects Versions: 1.7.6 >Reporter: Sachin Goyal >Priority: Major > > A large number of Java designs interacting with databases via > Hibernate/Couchbase (perhaps even otherwise) have Java polymorphism of the > form: > {code:java} > class Base > { >Integer a = 5; > } > class Derived extends Base > { > String b = "Foo"; > } > class PolymorphicDO > { >Base b = new Derived(); > } > {code} > Jackson handles this kind of field by using annotations such as: > {code} > @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS, include = > JsonTypeInfo.As.PROPERTY, property = "@class") > {code} > If such a thing can be added to Avro, all those Java designs could become > immediately usable with Avro. They would also become Hadoop compatible due to > AvroSerde. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
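With that Jackson annotation, the concrete class name is embedded as an extra JSON property, so the `PolymorphicDO` above would serialize roughly as follows (the package name here is illustrative):

```json
{"b": {"@class": "com.example.Derived", "a": 5, "b": "Foo"}}
```

The proposal is for Avro to record equivalent type information so that a field declared as `Base` can be deserialized back into a `Derived` instance.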
[jira] [Updated] (AVRO-2094) Extend "custom coders" to support logical types
[ https://issues.apache.org/jira/browse/AVRO-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2094: -- Component/s: java > Extend "custom coders" to support logical types > --- > > Key: AVRO-2094 > URL: https://issues.apache.org/jira/browse/AVRO-2094 > Project: Apache Avro > Issue Type: Improvement > Components: java >Reporter: Raymie Stata >Priority: Major > > The initial implementation of "custom coders" (AVRO-2090) does not support > Avro's logical types. This JIRA extends that implementation to remove this > limitation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2116) unknown fields in json not ignored
[ https://issues.apache.org/jira/browse/AVRO-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2116: -- Component/s: java > unknown fields in json not ignored > -- > > Key: AVRO-2116 > URL: https://issues.apache.org/jira/browse/AVRO-2116 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.8.1 > Environment: java 1.8 >Reporter: redlion >Priority: Major > > As shown in the screenshot, I put in two unknown fields: 'unknown1' at the > root level, and 'unknown2' under the sub-record startRule. When I try to > parse this JSON, I get an error saying: > Expected Unkown fileds: [unkown2], Got FIELD_NAME. > !http://images-1254198035.file.myqcloud.com/avro_issue.png|height=350,width=550! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2260) IDL Json Parsing is lossy, and it could be made more accurate.
[ https://issues.apache.org/jira/browse/AVRO-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2260: -- Component/s: java > IDL Json Parsing is lossy, and it could be made more accurate. > -- > > Key: AVRO-2260 > URL: https://issues.apache.org/jira/browse/AVRO-2260 > Project: Apache Avro > Issue Type: Bug > Components: java >Reporter: Zoltan Farkas >Priority: Minor > > Currently all integers are handled as Long, and all floating point as Double, > having basically the following issues: > 1) cannot handle numbers larger that MAXLONG. > 2) introducing unnecessary precision > {code} > JsonNode Json() : > { String s; Token t; JsonNode n; } > { > ( s = JsonString() { n = new TextNode(s); } > | (t= { n = new LongNode(Long.parseLong(t.image)); }) > | (t= {n=new > DoubleNode(Double.parseDouble(t.image));}) > | n=JsonObject() > | n=JsonArray() > | ( "true" { n = BooleanNode.TRUE; } ) > | ( "false" { n = BooleanNode.FALSE; } ) > | ( "null" { n = NullNode.instance; } ) > ) > { return n; } > } > {code} > This should be improved to: > {code} > JsonNode Json() : > { String s; Token t; JsonNode n; } > { > ( s = JsonString() { n = new TextNode(s); } > | (t= { >try { > n = new IntNode(Integer.parseInt(t.image)); >} catch(NumberFormatException e) { > try { > n = new LongNode(Long.parseLong(t.image)); > } catch(NumberFormatException ex2) { > n = new BigIntegerNode(new java.math.BigInteger(t.image)); > } >} > }) > | (t= {n=new DecimalNode(new > java.math.BigDecimal(t.image));}) > | n=JsonObject() > | n=JsonArray() > | ( "true" { n = BooleanNode.TRUE; } ) > | ( "false" { n = BooleanNode.FALSE; } ) > | ( "null" { n = NullNode.instance; } ) > ) > { return n; } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
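The narrowing-first fallback proposed in the grammar above can be exercised on its own: try the smallest integral type, widening only on `NumberFormatException`, so small literals stay `Integer` and only truly large ones become `BigInteger`. A standalone sketch of that strategy (outside the javacc grammar):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Parse an integral literal into the narrowest boxed type that can hold it,
// mirroring the IntNode -> LongNode -> BigIntegerNode fallback proposed above.
public class NarrowestNumber {
    static Object parseIntegral(String image) {
        try {
            return Integer.parseInt(image);      // fits in 32 bits
        } catch (NumberFormatException e) {
            try {
                return Long.parseLong(image);    // fits in 64 bits
            } catch (NumberFormatException e2) {
                return new BigInteger(image);    // arbitrary precision
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parseIntegral("42").getClass().getSimpleName());
        System.out.println(parseIntegral("92233720368547758080").getClass().getSimpleName());
        // Floating point: BigDecimal keeps the literal exactly, avoiding the
        // "unnecessary precision" a binary double would introduce.
        System.out.println(new BigDecimal("0.1")); // 0.1
    }
}
```

This also makes the first issue concrete: `Long.parseLong` throws on anything past MAXLONG, which is exactly where the `BigIntegerNode` arm takes over.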
[jira] [Updated] (AVRO-1273) JavaScript dynamic generation of constructor funcs for Avro records
[ https://issues.apache.org/jira/browse/AVRO-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1273: -- Component/s: javascript > JavaScript dynamic generation of constructor funcs for Avro records > --- > > Key: AVRO-1273 > URL: https://issues.apache.org/jira/browse/AVRO-1273 > Project: Apache Avro > Issue Type: Improvement > Components: javascript >Reporter: Quinn Slack >Priority: Minor > Labels: javascript > Attachments: AVRO-1273.patch > > > Per https://issues.apache.org/jira/browse/AVRO-485, I have extended Avro's > JavaScript support to dynamically generate constructors for Avro records. > Validation of JS objects against Avro schemas is still supported, but the API > is different: Avro.validate(schema, obj) instead of > Validator.validate(schema, obj). This is a breaking change but may be worth > it because there are now several Avro.* funcs. > Code is at https://github.com/sqs/avro/tree/lang-js/lang/js. I will attach a > diff. > Here is sample usage. We compile a ManyFieldsRecord constructor function > using the Avro schema as input. The constructor function accepts a JS object, > which it validates against the Avro schema and then uses to populate the new > object's fields. ManyFieldsRecord objects then use Object.defineProperty > setters to ensure that the object remains valid Avro.
> {code:javascript} > var manyFieldsRecordSchema = { > type: 'record', name: 'ManyFieldsRecord', fields: [ > {name: 'nullField', type: 'null'}, > {name: 'booleanField', type: 'boolean'}, > {name: 'intField', type: 'int'}, > {name: 'longField', type: 'long'}, > {name: 'floatField', type: 'float'}, > {name: 'doubleField', type: 'double'}, > {name: 'stringField', type: 'string'}, > {name: 'bytesField', type: 'bytes'} > ] > }; > var compiledTypes = Avro.compile(manyFieldsRecordSchema), > ManyFieldsRecord = compiledTypes.ManyFieldsRecord, > mfr = new ManyFieldsRecord(); > test.throws(function() { mfr.nullField = undefined; }); > test.throws(function() { mfr.nullField = 1; }); > test.throws(function() { mfr.booleanField = 'a'; }); > test.throws(function() { mfr.intField = 'a'; }); // TODO: warn if setting > int/long field to a non-integer > test.throws(function() { mfr.longField = 'a'; }); > test.throws(function() { mfr.floatField = 'a'; }); > test.throws(function() { mfr.doubleField = 'a'; }); > test.throws(function() { mfr.stringField = 3; }); > mfr.nullField = null; > mfr.booleanField = true; > mfr.intField = 1; > mfr.longField = 2; > mfr.floatField = 3.5; > mfr.doubleField = 4.5; > mfr.stringField = 'a'; > test.equal(mfr.nullField, null); > test.equal(mfr.booleanField, true); > test.equal(mfr.intField, 1); > test.equal(mfr.longField, 2); > test.equal(mfr.floatField, 3.5); > test.equal(mfr.doubleField, 4.5); > test.equal(mfr.stringField, 'a'); > // Standard JavaScript JSON API interface: > mfr.toJSON(); // --> returns plain JS object (without Avro-validating setters) > JSON.stringify(mfr); // --> returns Avro JSON > {code} > More examples are in the test dir: > https://github.com/sqs/avro/tree/lang-js/lang/js/test. > This is still rough and I am very interested in getting feedback. Thanks! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1518) Python client support decimal.Decimal types -> double encoding / decoding
[ https://issues.apache.org/jira/browse/AVRO-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1518: -- Component/s: python > Python client support decimal.Decimal types -> double encoding / decoding > - > > Key: AVRO-1518 > URL: https://issues.apache.org/jira/browse/AVRO-1518 > Project: Apache Avro > Issue Type: Improvement > Components: python >Reporter: Scott Reynolds >Assignee: Scott Reynolds >Priority: Major > Fix For: 1.7.9 > > > Python standard library > 2.4 provides a Decimal type that has much better > semantics than standard binary float. The Avro library should be able to accept > Decimals and encode them as doubles. > (https://docs.python.org/2/library/decimal.html) > I also believe it should, by default, turn Avro doubles into Decimal objects > instead of floats. > A simple patch allows for encoding a Decimal into an Avro double: > {code} > --- io.py 2014-05-23 13:41:14.0 -0700 > +++ /Users/sreynolds/Projects/avro-1.7.6 2/src/avro/io.py 2014-05-23 > 13:44:03.0 -0700 > @@ -46,6 +46,11 @@ try: > except ImportError: > import simplejson as json > +try: > +from decimal import Decimal > +except ImportError: > +Decimal = float > + > # > # Constants > # > @@ -117,7 +122,7 @@ def validate(expected_schema, datum): > and LONG_MIN_VALUE <= datum <= LONG_MAX_VALUE) >elif schema_type in ['float', 'double']: > return (isinstance(datum, int) or isinstance(datum, long) > -or isinstance(datum, float)) > +or isinstance(datum, float) or isinstance(datum, Decimal)) >elif schema_type == 'fixed': > return isinstance(datum, str) and len(datum) == expected_schema.size >elif schema_type == 'enum': > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1236) AvroMultipleOutputs fails to close successfuly
[ https://issues.apache.org/jira/browse/AVRO-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1236: -- Component/s: java > AvroMultipleOutputs fails to close successfuly > -- > > Key: AVRO-1236 > URL: https://issues.apache.org/jira/browse/AVRO-1236 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.7.3 >Reporter: Victor Iacoban >Priority: Major > > When I'm using AvroMultipleOutputs my job fails with exception, but works ok > if I replace AvroMultipleOutputs with MultipleOutputs: > 2013-01-29 14:29:04,771 WARN org.apache.hadoop.mapred.Child: Error running > child > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): > No lease on > /tmp/avros/_temporary/_attempt_201301290714_0012_m_00_0/part-m-0 File > is not open for writing. Holder DFSClient_NONMAPREDUCE_-853305103_1 does not > have any open files. > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2316) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2299) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2095) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689) > at java.security.AccessController.doPrivileged(Native 
Method) > at javax.security.auth.Subject.doAs(Subject.java:416) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687) > at org.apache.hadoop.ipc.Client.call(Client.java:1160) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) > at $Proxy10.addBlock(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:616) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) > at $Proxy10.addBlock(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1479) JavaScript encoder
[ https://issues.apache.org/jira/browse/AVRO-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1479: -- Component/s: javascript > JavaScript encoder > -- > > Key: AVRO-1479 > URL: https://issues.apache.org/jira/browse/AVRO-1479 > Project: Apache Avro > Issue Type: Improvement > Components: javascript >Reporter: Sal Zsosn >Priority: Trivial > > I've been working on an encoder for JavaScript. > From the other JavaScript implementations I've seen out there, this one is > different in that it supports all Avro types and not only works in node.js > but also in the browser. Also the Java implementation is able to parse the > resulting file formats. > I've included examples. It's only able to encode, not decode yet. > Maybe that's something worth including in future releases. > http://www.speedyshare.com/CAwTj/avro-encoder.7z -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2084) API changes review for Avro
[ https://issues.apache.org/jira/browse/AVRO-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2084: -- Component/s: java > API changes review for Avro > --- > > Key: AVRO-2084 > URL: https://issues.apache.org/jira/browse/AVRO-2084 > Project: Apache Avro > Issue Type: Test > Components: java >Reporter: Andrey Ponomarenko >Priority: Major > Attachments: Avro-Report-1.png, Avro-Report-2.png > > > The review of API changes for the Avro library since 1.0.0 version: > https://abi-laboratory.pro/java/tracker/timeline/avro/ > The report is updated three times a week. Hope it will be helpful for users > and maintainers of the library. > The report is generated by https://github.com/lvc/japi-tracker > Thank you. > !Avro-Report-1.png|API changes review! > !Avro-Report-2.png|API symbols timeline! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1547) AvroApp Schema Tool
[ https://issues.apache.org/jira/browse/AVRO-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1547: -- Component/s: java > AvroApp Schema Tool > --- > > Key: AVRO-1547 > URL: https://issues.apache.org/jira/browse/AVRO-1547 > Project: Apache Avro > Issue Type: New Feature > Components: java >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney >Priority: Minor > Fix For: 1.9.0 > > > Over in Gora, I have been thinking for a while that the process of writing > JSON data beans is rather time consuming when beans are LARGE. > I have wanted to open this ticket for a while and am only now getting around to it. I > propose the following: > A simple HTML webpage that defines a form of sorts; the form will enable > users to create JSON schemas and will be driven by enabling users to enter > Object values based on the current Avro specification document, i.e. it will > be restrictive in scope. > On top of this I propose to then use simple JQuery to send a request to the > JSONBlob API [0], obtain a JSON representation of the data and then > pretty-print this information to a file within the browser. Users can then > save this file locally and do with it what they wish. > I think that this page can easily be hosted alongside the current static Avro > website and that there is no need to write a web application for this yet. > I'll try to work on it sooner rather than later as this would also lower the > barrier for users of Gora (as I am sure it would for users of other technologies > requiring definition of Objects via JSON schemas). > I've not assigned this against any component as there is none which I feel is > appropriate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1899) PascalCase for property names generated by avrogen for C#
[ https://issues.apache.org/jira/browse/AVRO-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1899: -- Component/s: csharp > PascalCase for property names generated by avrogen for C# > - > > Key: AVRO-1899 > URL: https://issues.apache.org/jira/browse/AVRO-1899 > Project: Apache Avro > Issue Type: Improvement > Components: csharp >Reporter: Xtra Coder >Priority: Major > > Currently (code in branch 1.8) avrogen generates properties in C# data > classes 1:1 as they are defined in shema, what results for field named > 'favorite_color' in code like following: > public string favorite_color { > get { return this._favorite_color; } > set { this._favorite_color = value; } > } > In general property names should use PascalCasing (see: > https://msdn.microsoft.com/en-us/library/ms229043.aspx) and correctly > generated code would look like > public string FavoriteColor { > get { return this._favorite_color; } > set { this._favorite_color = value; } > } > Potential change is rather minor: > .\avro\lang\csharp\src\apache\main\CodeGen\CodeGen.cs : 581 > change > var mangledName = CodeGenUtil.Instance.Mangle(field.Name); > to > var mangledName = CodeGenUtil.Instance.Mangle(AsPropName(field.Name)); > where AsPropName function may look like following > public string AsPropName(string name) { > return Regex.Replace(name, @"^\S|_\S", match => > match.Value.Replace("_","").ToUpper()); > } -- This message was sent by Atlassian JIRA (v7.6.3#76005)
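The AsPropName idea translates directly to other regex engines. Here is a hedged Java rendering of the same transformation — uppercase the first letter and any letter following an underscore, dropping the underscore — for illustration only (the method name mirrors the C# proposal; this is not existing avrogen code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Convert a snake_case field name to a PascalCase property name, mirroring
// the Regex.Replace match-evaluator in the C# proposal above.
public class PropName {
    // Match the leading letter, or an underscore followed by a letter.
    private static final Pattern BOUNDARY = Pattern.compile("^[a-z]|_[a-z]");

    static String asPropName(String name) {
        Matcher m = BOUNDARY.matcher(name);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Strip the underscore (if any) and capitalize the letter.
            m.appendReplacement(sb, m.group().replace("_", "").toUpperCase());
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(asPropName("favorite_color")); // FavoriteColor
    }
}
```

As with the C# snippet, the backing field (`_favorite_color`) keeps its original mangled name; only the public property name is re-cased.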
[jira] [Updated] (AVRO-1938) Python (2) support for generating canonical forms of schema
[ https://issues.apache.org/jira/browse/AVRO-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1938: -- Component/s: python > Python (2) support for generating canonical forms of schema > --- > > Key: AVRO-1938 > URL: https://issues.apache.org/jira/browse/AVRO-1938 > Project: Apache Avro > Issue Type: Improvement > Components: python >Reporter: Erik Forsberg >Priority: Major > > The python implementation(s) lack support for generating canonical forms of > schema. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1452) Problem when using AvroMultipleOutputs with multiple schemas
[ https://issues.apache.org/jira/browse/AVRO-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1452: -- Component/s: java > Problem when using AvroMultipleOutputs with multiple schemas > > > Key: AVRO-1452 > URL: https://issues.apache.org/jira/browse/AVRO-1452 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.7.6 > Environment: Any Platform >Reporter: Vladislav Spivak >Priority: Major > Labels: easyfix > > When using multiple named outputs with different Key/Value Schemas, the last > provided schema overrides any previous schema definitions after the first write > attempt. This happens due to an issue with the following code in > AvroMultipleOutputs.java:509 > /*begin*/ > Job job = new Job(context.getConfiguration()); >... > setSchema(job, keySchema, valSchema); > taskContext = createTaskAttemptContext( > job.getConfiguration(), context.getTaskAttemptID()); > /*end*/ > Every time this code runs, the actual configuration instance passed to > createTaskAttemptContext remains the same, because the Job constructor creates a > new configuration copy only if it is not an instanceof JobConf. As a result, the > properties "avro.schema.output.XXX" are overwritten each time a new > TaskAttemptContext is initialised, and the Configuration > instance is mistakenly shared across all TaskAttemptContexts. > Proposed fix: > a) use "Job getInstance(Configuration conf)" or > b) call "new Job(new Configuration(context.getConfiguration()))" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1818) Avoid buffer copy in DeflateCodec.compress and decompress
[ https://issues.apache.org/jira/browse/AVRO-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1818: -- Component/s: java > Avoid buffer copy in DeflateCodec.compress and decompress > - > > Key: AVRO-1818 > URL: https://issues.apache.org/jira/browse/AVRO-1818 > Project: Apache Avro > Issue Type: Improvement > Components: java >Reporter: Rohini Palaniswamy >Assignee: Nandor Kollar >Priority: Major > > One of our jobs reading avro hit OOM due to the buffer copy in compress and > decompress methods which is very inefficient. > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/file/DeflateCodec.java#L71-L86 > {code} > java.lang.OutOfMemoryError: Java heap space > at java.util.Arrays.copyOf(Arrays.java:3236) > at > java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191) > at org.apache.avro.file.DeflateCodec.decompress(DeflateCodec.java:84) > {code} > I would suggest using a class that extends ByteArrrayOutputStream like > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java#L51-L53 > and do > ByteBuffer result = ByteBuffer.wrap(buf.getData(), 0, buf.getLength()); -- This message was sent by Atlassian JIRA (v7.6.3#76005)
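The suggested fix is small enough to sketch: a `ByteArrayOutputStream` subclass that exposes its internal `buf`/`count` fields so the result can be wrapped in a `ByteBuffer` without the defensive copy that `toByteArray()` performs. This is an illustration of the idea, not the actual Avro or Hadoop class:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Expose ByteArrayOutputStream's protected buf/count fields so callers can
// wrap the existing array instead of copying it via toByteArray().
public class DataOutputBuffer extends ByteArrayOutputStream {
    public byte[] getData() { return buf; }   // may be larger than getLength()
    public int getLength() { return count; }  // number of valid bytes

    public static void main(String[] args) {
        DataOutputBuffer out = new DataOutputBuffer();
        out.write(new byte[] {1, 2, 3}, 0, 3);
        // Wraps the existing array, no copy; valid bytes are [0, getLength()).
        ByteBuffer result = ByteBuffer.wrap(out.getData(), 0, out.getLength());
        System.out.println(result.remaining()); // 3
    }
}
```

The trade-off is that the returned array aliases the stream's live buffer (and is usually over-allocated), so the caller must respect `getLength()` and not hold the buffer across further writes.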
[jira] [Updated] (AVRO-1559) Drop support for Ruby 1.8
[ https://issues.apache.org/jira/browse/AVRO-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1559: -- Component/s: ruby > Drop support for Ruby 1.8 > - > > Key: AVRO-1559 > URL: https://issues.apache.org/jira/browse/AVRO-1559 > Project: Apache Avro > Issue Type: Wish > Components: ruby >Affects Versions: 1.7.7 >Reporter: Willem van Bergen >Assignee: Willem van Bergen >Priority: Major > Fix For: 1.9.0 > > Attachments: AVRO-1559.patch > > > - Ruby 1.8 is EOL, and even security issues aren't addressed anymore. > - It is also getting hard to set up Ruby 1.8 to run the tests (e.g. on a > recent OSX, it won't compile without manual fiddling). > - Handling character encodings in Ruby 1.9 is very different from Ruby 1.8. > Supporting both at the same time adds a lot of overhead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1742) Avro C# DataFileWriter Flush() does not flush the buffer to disk
[ https://issues.apache.org/jira/browse/AVRO-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1742: -- Component/s: csharp > Avro C# DataFileWriter Flush() does not flush the buffer to disk > > > Key: AVRO-1742 > URL: https://issues.apache.org/jira/browse/AVRO-1742 > Project: Apache Avro > Issue Type: Bug > Components: csharp >Reporter: Mika Ristimaki >Priority: Minor > > In C#, DataFileWriter.Flush() is implemented as > {code} > public void Flush() > { > EnsureHeader(); > Sync(); > } > {code} > Is this per the Avro spec, or is it a bug? That is, should calling > DataFileWriter.Flush() just start a new Sync block and not flush the file to > disk? > In Java the implementation is > {code} > @Override > public void flush() throws IOException { > sync(); > vout.flush(); > } > {code} > where vout is a BinaryEncoder. So I think the correct implementation in C# is > {code} > public void Flush() > { > EnsureHeader(); > Sync(); >_encoder.Flush(); > } > {code} > If someone can confirm my suspicion I'll try to contribute a fix in the near > future. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1526) Add support for writing/reading json records using python3.
[ https://issues.apache.org/jira/browse/AVRO-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1526: -- Component/s: python > Add support for writing/reading json records using python3. > --- > > Key: AVRO-1526 > URL: https://issues.apache.org/jira/browse/AVRO-1526 > Project: Apache Avro > Issue Type: New Feature > Components: python >Reporter: Robert Chu >Assignee: Christophe Taton >Priority: Major > > Currently the avro python3 package only supports reading/writing binary data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1916) Building python version uses wrong version avro-tools.
[ https://issues.apache.org/jira/browse/AVRO-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1916: -- Component/s: python > Building python version uses wrong version avro-tools. > -- > > Key: AVRO-1916 > URL: https://issues.apache.org/jira/browse/AVRO-1916 > Project: Apache Avro > Issue Type: Bug > Components: python >Reporter: Niels Basjes >Priority: Major > > During {{./build.sh test}} I see this during the build of {{lang/py}} > {code} > [ivy:retrieve]found org.apache.avro#avro-tools;1.9.0-SNAPSHOT in > apache-snapshots > [ivy:retrieve] downloading > https://repository.apache.org/content/groups/snapshots/org/apache/avro/avro-tools/1.9.0-SNAPSHOT/avro-tools-1.9.0-20160122.173016-35.jar > ... > {code} > So apparently the py build phase uses an external version of avro-tools. > What if I just updated avro-tools? Then it is quite possible the test will > pass while in reality it should have failed. > I suspect the fix can be as simple as doing a {{mvn install}} on the java > avro-tools before building/testing the rest of the languages. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1527) support bzip2 in python avro tool
[ https://issues.apache.org/jira/browse/AVRO-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1527: -- Component/s: python > support bzip2 in python avro tool > - > > Key: AVRO-1527 > URL: https://issues.apache.org/jira/browse/AVRO-1527 > Project: Apache Avro > Issue Type: Improvement > Components: python >Affects Versions: 1.7.6 >Reporter: Eustache >Priority: Minor > Labels: avro, bzip2, python > Fix For: 1.7.9 > > Attachments: AVRO-1527.diff > > > The python tool to decode avro files is missing bzip2 support. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1478) protobuf namespaces causing problem for avro c++ reader
[ https://issues.apache.org/jira/browse/AVRO-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1478: -- Component/s: c++ > protobuf namespaces causing problem for avro c++ reader > --- > > Key: AVRO-1478 > URL: https://issues.apache.org/jira/browse/AVRO-1478 > Project: Apache Avro > Issue Type: Bug > Components: c++ >Reporter: George Baxter >Priority: Major > > Utilizing the ProtobufData functionality to generate avro output, we run into > a complication when consuming this output using the c++ based avro reader. > Seems it doesn't much like the '$' of a nesting outer class that is inherent > with protocol buffers in java. > Exception opening file for read:Invalid namespace: > com.xxx.base.message.MessageProtos$ > in > avro::DataFileReader* file_reader; > file_reader = new > avro::DataFileReader(file_name.c_str());] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1208) Improve Trevni's performance on row-oriented data access
[ https://issues.apache.org/jira/browse/AVRO-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1208: -- Component/s: java > Improve Trevni's performance on row-oriented data access > > > Key: AVRO-1208 > URL: https://issues.apache.org/jira/browse/AVRO-1208 > Project: Apache Avro > Issue Type: Improvement > Components: java >Affects Versions: 1.7.3 >Reporter: Yin Huai >Assignee: Yin Huai >Priority: Major > Attachments: AVRO-1208.1.patch, AVRO-1208.2.patch > > > Trevni uses a 64KB internal buffer to store the values of a column. When > accessing a column, it reads 64KB of data (if we do not consider compression and > checksums) from the storage layer. However, when the table is accessed in > a row-oriented fashion (an entire row needs to be handed over to the upper > layer), in the worst case (a full table scan where the values of this table are all > the same size), every 64KB read can cause a seek. > This jira is used to discuss whether we should consider the data access pattern > mentioned above and, if so, how to improve the performance of Trevni. > Row-oriented data processing engines, e.g. Hive, can benefit from this work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1192) trevni should support RLE encoding based on selectivity
[ https://issues.apache.org/jira/browse/AVRO-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1192: -- Component/s: java > trevni should support RLE encoding based on selectivity > --- > > Key: AVRO-1192 > URL: https://issues.apache.org/jira/browse/AVRO-1192 > Project: Apache Avro > Issue Type: Improvement > Components: java >Affects Versions: 1.8.0 >Reporter: alex gemini >Priority: Minor > Labels: compression, performance > > It would be nice if trevni supported run-length encoding. A columnar format > should first sort the column order based on selectivity; for higher-selectivity > columns, trevni should support run-length encoding. More > information can be found in the paper "C-Store: A Column-oriented DBMS", section > 3.1: Encoding Schemes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
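For intuition, the run-length scheme the C-Store paper describes collapses each run of equal values in a sorted, low-cardinality column into a (value, count) pair, which is where the space win comes from. A toy sketch of the encoding (illustrative only, not trevni code):

```java
import java.util.ArrayList;
import java.util.List;

// Run-length encode a column: each run of equal adjacent values becomes one
// {value, count} pair. Sorting the column first maximizes run lengths.
public class RunLengthSketch {
    static List<String[]> encode(List<String> column) {
        List<String[]> runs = new ArrayList<>();
        String prev = null;
        int count = 0;
        for (String v : column) {
            if (v.equals(prev)) {
                count++;
            } else {
                if (prev != null) runs.add(new String[] {prev, String.valueOf(count)});
                prev = v;
                count = 1;
            }
        }
        if (prev != null) runs.add(new String[] {prev, String.valueOf(count)});
        return runs;
    }

    public static void main(String[] args) {
        // A sorted column of six values collapses to three runs.
        for (String[] run : encode(List.of("CA", "CA", "CA", "NY", "NY", "TX"))) {
            System.out.println(run[0] + " x" + run[1]);
        }
    }
}
```

On a high-selectivity (many distinct values) column, runs have length 1 and RLE expands the data, which is why the issue ties the encoding choice to selectivity.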
[jira] [Updated] (AVRO-1801) Generated code results in java.lang.ClassCastException
[ https://issues.apache.org/jira/browse/AVRO-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-1801: -- Component/s: java > Generated code results in java.lang.ClassCastException > -- > > Key: AVRO-1801 > URL: https://issues.apache.org/jira/browse/AVRO-1801 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.8.0 >Reporter: Alex Baumgarten >Priority: Major > > Create and compile avro schema: > { > "namespace": "com.abc.def.ghi.schema", > "type": "record", > "name": "MyDataRecord", > "fields": [ > {"name": "Heading", "type": ["null", {"type": "fixed", "name": > "short", "size": 2}]} > ] > } > which leads to compiled code: > public void put(int field$, java.lang.Object value$) { > switch (field$) { > case 0: Heading = (com.abc.def.ghi.schema.short$)value$; break; > default: throw new org.apache.avro.AvroRuntimeException("Bad index"); > } > } > When this function is called the type of value is > org.apache.avro.generic.GenericData$Fixed and when it tries to cast to the > short$ type it throws a java.lang.ClassCastException. > This occurs when running the following code: > SpecificDatumReader datumReader = new > SpecificDatumReader<>(MyDataRecord.class); > DataFileReader dataFileReader = new DataFileReader<>(new > FsInput(inputAvroPath, configuration), datumReader); > for (MyDataRecord record : dataFileReader) { > // Do something with record > } > If I manually modify the generated code to extract the bytes from value$ and > call the constructor of short$ it works as expected. But this is not what is > generated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2029) Specific Data generated class missing Decimal Conversion
[ https://issues.apache.org/jira/browse/AVRO-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. updated AVRO-2029: -- Component/s: java > Specific Data generated class missing Decimal Conversion > > > Key: AVRO-2029 > URL: https://issues.apache.org/jira/browse/AVRO-2029 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.8.2 >Reporter: Adrian McCague >Priority: Major > Original Estimate: 4h > Remaining Estimate: 4h > > Using 1.8.2-rc3 > Given a class generated with {{-bigDecimal}}, the generated class defines the > DECIMAL_CONVERSION but does not set it to a {{BigDecimal}} field index. > Fields for illustration: > {code} > @Deprecated public java.lang.String id; > @Deprecated public org.joda.time.DateTime timestamp; > @Deprecated public java.lang.String applicationId; > @Deprecated public java.math.BigDecimal amount;; > ... > protected static final > org.apache.avro.data.TimeConversions.TimestampConversion TIMESTAMP_CONVERSION > = new org.apache.avro.data.TimeConversions.TimestampConversion(); > protected static final org.apache.avro.Conversions.DecimalConversion > DECIMAL_CONVERSION = new org.apache.avro.Conversions.DecimalConversion(); > private static final org.apache.avro.Conversion[] conversions = > new org.apache.avro.Conversion[] { > null, > TIMESTAMP_CONVERSION, > null, > null, // Should be DECIMAL_CONVERSION > null, > null, > null, > null, > null > }; > {code} > I am currently unsure of the impact of this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-1196) trevni should add max-min value on file header
[ https://issues.apache.org/jira/browse/AVRO-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-1196:
--------------------------------------
    Component/s: java

> trevni should add max-min value on file header
> ----------------------------------------------
>
>                 Key: AVRO-1196
>                 URL: https://issues.apache.org/jira/browse/AVRO-1196
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.8.0
>            Reporter: alex gemini
>            Priority: Minor
>              Labels: performance
>
> trevni's file header should contain the max and min values for the current
> block. This would further support predicate push-down in query engines.
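A minimal sketch of the predicate push-down this would enable (names are illustrative, not trevni's API): with per-block min/max in the header, an engine can skip any block whose range cannot contain the wanted value, without reading the block at all.

```java
public class BlockSkipDemo {

    // A block whose [min, max] range excludes the wanted value can be skipped.
    static boolean canSkip(long blockMin, long blockMax, long wanted) {
        return wanted < blockMin || wanted > blockMax;
    }

    public static void main(String[] args) {
        // Three blocks with (min, max) ranges; only the middle one can hold 150.
        long[][] blocks = { {0, 99}, {100, 199}, {200, 299} };
        for (long[] b : blocks) {
            System.out.println("[" + b[0] + ", " + b[1] + "] skip=" + canSkip(b[0], b[1], 150));
        }
    }
}
```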
[jira] [Updated] (AVRO-1666) avro.ipc.Responder logger is too noisy and have system/user error as WARN
[ https://issues.apache.org/jira/browse/AVRO-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-1666:
--------------------------------------
    Component/s: java

> avro.ipc.Responder logger is too noisy and have system/user error as WARN
> -------------------------------------------------------------------------
>
>                 Key: AVRO-1666
>                 URL: https://issues.apache.org/jira/browse/AVRO-1666
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.7
>            Reporter: Jacek Migdal
>            Priority: Major
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I use avro-ipc a lot and enjoy it. Great work! I would love to contribute back.
> We sometimes use avro-ipc exceptions to signal rare but correct situations
> (e.g. a user session has ended). Because of our scale, this causes tons of
> WARN logs with stack traces from avro.ipc.Responder:
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.avro/avro-ipc/1.7.5/org/apache/avro/ipc/Responder.java#156
> Though I would like to exclude these in log4j, I can't, because I am
> interested in "system errors", which signal real bugs and are also logged at
> the WARN level.
> Potential solutions that would make me happy:
> 1. Move "user errors" to the INFO level.
> 2. Move "system errors" to the ERROR level.
> 3. Have some option/flag to switch off the INFO-level logging.
> Happy to write a patch for that, once I get a blessing from a core developer.
[jira] [Updated] (AVRO-1707) Java serialization readers/writers in generated Java classes
[ https://issues.apache.org/jira/browse/AVRO-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-1707:
--------------------------------------
    Component/s: java

> Java serialization readers/writers in generated Java classes
> ------------------------------------------------------------
>
>                 Key: AVRO-1707
>                 URL: https://issues.apache.org/jira/browse/AVRO-1707
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.8.0
>            Reporter: Zoltan Farkas
>            Priority: Major
>
> The following static instances are declared in the generated classes:
> {code}
> private static final org.apache.avro.io.DatumWriter WRITER$ =
>     new org.apache.avro.specific.SpecificDatumWriter(SCHEMA$);
> private static final org.apache.avro.io.DatumReader READER$ =
>     new org.apache.avro.specific.SpecificDatumReader(SCHEMA$);
> {code}
> The reader/writer hold on to a reference to the "creator thread":
> {code}
> private final Thread creator;
> {code}
> which inhibits GC-ing thread-locals for this thread.
[jira] [Updated] (AVRO-1466) Avro Tools fromjson (ie JsonDecoder) cannot parse "NaN" values created by tojson (ie JsonEncoder)
[ https://issues.apache.org/jira/browse/AVRO-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-1466:
--------------------------------------
    Component/s: java

> Avro Tools fromjson (ie JsonDecoder) cannot parse "NaN" values created by
> tojson (ie JsonEncoder)
> -------------------------------------------------------------------------
>
>                 Key: AVRO-1466
>                 URL: https://issues.apache.org/jira/browse/AVRO-1466
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.6
>            Reporter: Jamie Olson
>            Priority: Major
>
> Avro files containing NaN values are converted to JSON as the string "NaN" by
> the Avro Tools tojson command (i.e. JsonEncoder). These values cannot be
> parsed by the Avro Tools fromjson command (i.e. JsonDecoder.readDouble).
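The underlying mismatch: the JSON number grammar in RFC 4627 has no representation for NaN or Infinity, even though Java's own `Double.parseDouble` accepts the string "NaN" (which is why the encoder can emit it). A small self-contained check makes this concrete; the regex below is a transcription of the RFC's number grammar, not Avro code.

```java
public class NanJsonDemo {

    // RFC 4627 number grammar: optional minus, int part, optional frac, optional exp.
    static final String JSON_NUMBER = "-?(0|[1-9][0-9]*)(\\.[0-9]+)?([eE][+-]?[0-9]+)?";

    static boolean isJsonNumber(String token) {
        return token.matches(JSON_NUMBER);
    }

    public static void main(String[] args) {
        System.out.println(isJsonNumber("1.5e3"));  // valid JSON number
        System.out.println(isJsonNumber("NaN"));    // not in the grammar: a strict decoder rejects it
        System.out.println(Double.isNaN(Double.parseDouble("NaN")));  // Java itself accepts it
    }
}
```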
[jira] [Updated] (AVRO-1321) Avro-ipc-tests in compile scope instead of test in Avro-mapred
[ https://issues.apache.org/jira/browse/AVRO-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-1321:
--------------------------------------
    Component/s: java

> Avro-ipc-tests in compile scope instead of test in Avro-mapred
> --------------------------------------------------------------
>
>                 Key: AVRO-1321
>                 URL: https://issues.apache.org/jira/browse/AVRO-1321
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.3
>            Reporter: Benyi Wang
>            Priority: Trivial
>
> org.apache.avro:avro-ipc:1.7.3:tests is listed in "compile" scope instead of
> "test" scope.
[jira] [Updated] (AVRO-1703) Specific record should not only be determined by presence of SCHEMA$ field
[ https://issues.apache.org/jira/browse/AVRO-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-1703:
--------------------------------------
    Component/s: java

> Specific record should not only be determined by presence of SCHEMA$ field
> --------------------------------------------------------------------------
>
>                 Key: AVRO-1703
>                 URL: https://issues.apache.org/jira/browse/AVRO-1703
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Marius Soutier
>            Priority: Major
>              Labels: starter
>
> I want to use Avro from Scala, i.e. generate case classes from an Avro
> schema. So far this is working fine except for one thing: fields in Scala
> classes are always private. This doesn't work with Avro SpecificRecords (at
> least when inferring the schema from the class) and results in the following
> exception:
> {noformat}
> org.apache.avro.AvroRuntimeException: java.lang.IllegalAccessException: Class
> org.apache.avro.specific.SpecificData can not access a member of class
> with modifiers "private"
> {noformat}
> The exception is thrown from the following line in
> org.apache.avro.specific.SpecificData:
> {code}
> schema = (Schema)(c.getDeclaredField("SCHEMA$").get(null));
> {code}
> My suggestion would be to additionally check for a method called {{getSchema}}
> and read the schema from that method.
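The suggested fallback can be sketched roughly as follows. This is a hedged sketch, not Avro's shipped code: `getSchema` is assumed static here for simplicity, and `Object` stands in for `org.apache.avro.Schema`. Note that `Class.getField` only finds public fields, which models the accessibility failure the reporter hits.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class SchemaLookupDemo {

    static Object schemaOf(Class<?> c) throws ReflectiveOperationException {
        try {
            // Today's behavior, approximated: read the (public) SCHEMA$ field.
            Field f = c.getField("SCHEMA$");
            return f.get(null);
        } catch (NoSuchFieldException e) {
            // Suggested fallback: a getSchema() method (assumed static in this sketch).
            Method m = c.getMethod("getSchema");
            return m.invoke(null);
        }
    }

    // Mimics a Scala-generated class: the field is private, but a getter is public.
    static class ScalaLikeRecord {
        private static final Object SCHEMA$ = "schema-from-field";
        public static Object getSchema() { return "schema-from-method"; }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(schemaOf(ScalaLikeRecord.class));
    }
}
```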
[jira] [Updated] (AVRO-2291) GenericData.Array not reusable after AVRO-2050
[ https://issues.apache.org/jira/browse/AVRO-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-2291:
--------------------------------------
    Component/s: java

> GenericData.Array not reusable after AVRO-2050
> ----------------------------------------------
>
>                 Key: AVRO-2291
>                 URL: https://issues.apache.org/jira/browse/AVRO-2291
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Martin Jubelgas
>            Priority: Minor
>
> The fix for AVRO-2050 left the reuse functionality of GenericData.Array
> broken. Because GenericData.Array.clear() now nulls all elements of the
> underlying array, there is no more data to be reused in
> GenericDatumReader.readArray().
> I have already posted a pull request that alleviates this issue in a
> backward-compatible manner as a possible solution. Comments welcome.
[jira] [Updated] (AVRO-1648) @Union annotation cannot handle the class on which its used
[ https://issues.apache.org/jira/browse/AVRO-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-1648:
--------------------------------------
    Component/s: java

> @Union annotation cannot handle the class on which its used
> -----------------------------------------------------------
>
>                 Key: AVRO-1648
>                 URL: https://issues.apache.org/jira/browse/AVRO-1648
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.7
>            Reporter: Sachin Goyal
>            Priority: Major
>
> The bug is as shown in the following code:
> {code}
> // Having Base.class in the union results in infinite recursion:
> @Union({Base.class, Derived.class})
> // ...while having no Base.class in the union fails for PolymorphicDO.obj2:
> // @Union({Derived.class})
> private static class Base
> {
>     Integer a = 5;
> }
> private static class Derived extends Base
> {
>     String b = "Foo";
> }
> private static class PolymorphicDO
> {
>     Base obj = new Derived();
>     Base obj2 = new Base();
> }
> {code}
[jira] [Updated] (AVRO-1194) trevni should support delta encoding based on selectivity and data type
[ https://issues.apache.org/jira/browse/AVRO-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-1194:
--------------------------------------
    Component/s: java

> trevni should support delta encoding based on selectivity and data type
> -----------------------------------------------------------------------
>
>                 Key: AVRO-1194
>                 URL: https://issues.apache.org/jira/browse/AVRO-1194
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.8.0
>            Reporter: alex gemini
>            Priority: Minor
>
> It would be nice if trevni supported delta encoding. A columnar format should
> first sort the column order based on selectivity. For a middle-selectivity
> column such as a timestamp, trevni should support delta encoding. More
> information can be found in the paper "C-Store: A Column-oriented DBMS",
> section 3.1: Encoding Schemes, type 3.
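The requested encoding can be sketched as follows (illustrative code, not trevni's API): a sorted column of timestamps is stored as its first value plus small deltas, and small deltas compress far better than the raw values.

```java
import java.util.Arrays;

public class DeltaEncodingDemo {

    // Store each value as the difference from its predecessor.
    static long[] encode(long[] sorted) {
        long[] deltas = new long[sorted.length];
        long prev = 0;
        for (int i = 0; i < sorted.length; i++) {
            deltas[i] = sorted[i] - prev;
            prev = sorted[i];
        }
        return deltas;
    }

    // Recover the original values by running a prefix sum over the deltas.
    static long[] decode(long[] deltas) {
        long[] values = new long[deltas.length];
        long acc = 0;
        for (int i = 0; i < deltas.length; i++) {
            acc += deltas[i];
            values[i] = acc;
        }
        return values;
    }

    public static void main(String[] args) {
        long[] timestamps = {1_350_000_000L, 1_350_000_003L, 1_350_000_010L};
        long[] deltas = encode(timestamps);
        System.out.println(Arrays.toString(deltas));                    // [1350000000, 3, 7]
        System.out.println(Arrays.equals(decode(deltas), timestamps));  // true
    }
}
```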
[jira] [Updated] (AVRO-1468) implement interface-based code-generation
[ https://issues.apache.org/jira/browse/AVRO-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-1468:
--------------------------------------
    Component/s: java

> implement interface-based code-generation
> -----------------------------------------
>
>                 Key: AVRO-1468
>                 URL: https://issues.apache.org/jira/browse/AVRO-1468
>             Project: Apache Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Doug Cutting
>            Priority: Major
>
> The current specific compiler generates a concrete class per record.
> Instead, we might generate an interface per record that might be implemented
> in different ways. Implementations might include:
> - A wrapper for a generic record. This would permit the schema that is
>   compiled against to differ from that of the runtime instance. A field that
>   was added since code-generation could be retained as records are filtered
>   or sorted and re-written.
> - A concrete record. This would be similar to the existing specific.
> - A wrapped POJO. The generated class could wrap a POJO using reflection.
>   Aliases could map between the schema used at compilation and that of the
>   POJO, so field and class names need not match exactly. This would permit
>   one to evolve from a POJO-based Avro application to using generated code
>   without breaking existing code.
> This approach was first described in http://s.apache.org/AvroFlex
[jira] [Updated] (AVRO-1641) parser.java stack should expand quickly up to some threshold rather than start at the threshold
[ https://issues.apache.org/jira/browse/AVRO-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-1641:
--------------------------------------
    Component/s: java

> parser.java stack should expand quickly up to some threshold rather than
> start at the threshold
> ------------------------------------------------------------------------
>
>                 Key: AVRO-1641
>                 URL: https://issues.apache.org/jira/browse/AVRO-1641
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.7, 1.8.0
>            Reporter: Zoltan Farkas
>            Assignee: Zoltan Farkas
>            Priority: Minor
>         Attachments: AVRO-1641.patch
>
> At Parser.java line 65
> (https://github.com/apache/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/io/parsing/Parser.java#L65):
> {noformat}
> private void expandStack() {
>   stack = Arrays.copyOf(stack, stack.length + Math.max(stack.length, 1024));
> }
> {noformat}
> should probably be:
> {noformat}
> private void expandStack() {
>   stack = Arrays.copyOf(stack, stack.length + Math.min(stack.length, 1024));
> }
> {noformat}
> The expansion is probably intended to grow exponentially up to 1024, and not
> exponentially after 1024...
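The difference is easy to see by replaying both growth sequences from a small initial stack: `Math.max` jumps straight past the 1024 threshold on the first expansion and then doubles, while the proposed `Math.min` doubles up to the threshold and grows linearly (by 1024 per step) afterwards.

```java
import java.util.ArrayList;
import java.util.List;

public class StackGrowthDemo {

    // Replays len += useMin ? Math.min(len, 1024) : Math.max(len, 1024),
    // collecting the resulting sizes.
    static List<Integer> sizes(int start, int steps, boolean useMin) {
        List<Integer> out = new ArrayList<>();
        int len = start;
        for (int i = 0; i < steps; i++) {
            len += useMin ? Math.min(len, 1024) : Math.max(len, 1024);
            out.add(len);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("max: " + sizes(16, 4, false)); // [1040, 2080, 4160, 8320]
        System.out.println("min: " + sizes(16, 8, true));  // [32, 64, 128, 256, 512, 1024, 2048, 3072]
    }
}
```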
[jira] [Updated] (AVRO-1195) trevni should support dictionary encoding based on selectivity and data type
[ https://issues.apache.org/jira/browse/AVRO-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thiruvalluvan M. G. updated AVRO-1195:
--------------------------------------
    Component/s: java

> trevni should support dictionary encoding based on selectivity and data type
> ----------------------------------------------------------------------------
>
>                 Key: AVRO-1195
>                 URL: https://issues.apache.org/jira/browse/AVRO-1195
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.8.0
>            Reporter: alex gemini
>            Priority: Minor
>              Labels: compression, performance
>
> It would be nice if trevni supported dictionary encoding. A columnar format
> should first sort the column order based on selectivity. For a
> lower-selectivity column such as email or address, trevni should support
> dictionary encoding. More information can be found in the paper "C-Store: A
> Column-oriented DBMS", section 3.1: Encoding Schemes, type 4.
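The requested encoding can be sketched as follows (illustrative code, not trevni's API): a low-cardinality column is stored as a small dictionary of distinct values plus one integer code per row, so repeated strings are stored only once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DictionaryEncodingDemo {

    final List<String> dictionary = new ArrayList<>();
    final Map<String, Integer> codesByValue = new LinkedHashMap<>();

    // Replace each value with its dictionary code, growing the dictionary
    // the first time a value is seen.
    int[] encode(String[] column) {
        int[] codes = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            Integer code = codesByValue.get(column[i]);
            if (code == null) {
                code = dictionary.size();
                dictionary.add(column[i]);
                codesByValue.put(column[i], code);
            }
            codes[i] = code;
        }
        return codes;
    }

    String decode(int code) { return dictionary.get(code); }

    public static void main(String[] args) {
        DictionaryEncodingDemo d = new DictionaryEncodingDemo();
        int[] codes = d.encode(new String[] {"a@x.com", "b@x.com", "a@x.com"});
        System.out.println(java.util.Arrays.toString(codes));  // [0, 1, 0]
        System.out.println(d.dictionary);                      // [a@x.com, b@x.com]
        System.out.println(d.decode(codes[2]));                // a@x.com
    }
}
```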