[jira] [Created] (AVRO-2764) Request to improve error message on type mismatch

2020-02-28 Thread Suren Nihalani (Jira)
Suren Nihalani created AVRO-2764:


 Summary: Request to improve error message on type mismatch
 Key: AVRO-2764
 URL: https://issues.apache.org/jira/browse/AVRO-2764
 Project: Apache Avro
  Issue Type: Wish
Affects Versions: 1.7.7
Reporter: Suren Nihalani


Let's say I am looking at a stack trace like the one at the bottom. It's hard 
for me to tell which data field is corrupted. I'd like the exception message to 
say what is wrong, so that I know what I need to fix.

{{Caused by: org.apache.avro.AvroTypeException: Found TUPLE_1, expecting union}}
{{at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)}}
{{at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)}}
{{at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)}}
{{at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)}}
{{at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)}}
{{at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)}}
{{at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)}}
{{at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)}}
{{at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)}}
{{at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)}}
{{at org.apache.spark.sql.avro.AvroFileFormat$$anonfun$buildReader$1$$anon$1.next(AvroFileFormat.scala:302)}}
{{at org.apache.spark.sql.avro.AvroFileFormat$$anonfun$buildReader$1$$anon$1.next(AvroFileFormat.scala:282)}}
{{at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)}}
{{at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.next(FileScanRDD.scala:101)}}
{{at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys$(Unknown Source)}}
{{at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)}}
{{at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)}}
{{at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$12$$anon$1.hasNext(WholeStageCodegenExec.scala:634)}}
{{at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)}}
{{at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)}}
{{at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)}}
{{at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)}}
{{at org.apache.spark.scheduler.Task.run(Task.scala:109)}}
{{at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:384)}}
{{at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
{{at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
{{at java.lang.Thread.run(Thread.java:748)}}
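
To illustrate the kind of message being asked for, here is a minimal Python sketch of a checker that carries the field path along while it walks a datum, so a mismatch can say where it happened. This is not how Avro's ResolvingDecoder works; the names check, TypeMismatch and PRIMITIVES are hypothetical, and only records, flat unions and a few primitive types are handled.

```python
class TypeMismatch(Exception):
    """Raised when a datum does not match its schema; the message names the path."""


# Assumption: only these primitive type names are supported in this sketch.
PRIMITIVES = {"null": type(None), "boolean": bool, "int": int, "string": str}


def check(datum, schema, path="$"):
    """Walk a simplified Avro-style schema and report where a mismatch occurs."""
    found = type(datum).__name__
    if isinstance(schema, list):
        # Union: the datum must match at least one branch.
        for branch in schema:
            try:
                check(datum, branch, path)
                return
            except TypeMismatch:
                continue
        raise TypeMismatch(f"{path}: found {found}, expecting union {schema}")
    if isinstance(schema, dict):
        # Assumption: the only complex type in this sketch is a record.
        if not isinstance(datum, dict):
            raise TypeMismatch(
                f"{path}: found {found}, expecting record {schema['name']}"
            )
        for field in schema["fields"]:
            # Extend the path with the field name before descending.
            check(datum.get(field["name"]), field["type"], f"{path}.{field['name']}")
        return
    if type(datum) is not PRIMITIVES[schema]:
        raise TypeMismatch(f"{path}: found {found}, expecting {schema}")
```

With a message of this shape, the failing field is named directly instead of having to be inferred from the stack trace.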

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] version numbers and where changes should land

2020-02-28 Thread Sean Busbey
Counterpoint on independently versioning the various languages. Do we
know if Python Avro X works with Java Avro Y as it is? It seems like
we already get surprised pretty often when they don't.

If we stop including the "data compatibility version" or whatever
we're calling the first number, we'll need to get more formal on
versioning the specification and having libraries plainly label which
specification(s) they comport to.

At the very least it seems like we'd make the _easy_ path easier for
the languages that are well maintained. Sure, it'll be a burden on those
languages that aren't well maintained, but it seems like those are
already in that position?

On Thu, Feb 27, 2020 at 9:13 AM Ismaël Mejía wrote:
>
> Bringing my comment from the JIRA ticket here for discussion:
>
> > "One argument against semantic versioning is the fact that Avro supports
> 9 language APIs, so if let's say C++ breaks its backwards compatibility
> should we move the version number up for every single language? Sounds like
> a burden and in particular a not easy task to track since we do not have
> proper validation of breaking changes in place for every language at this
> point.
> > ... (even if we separate release numbers per language) that seems like a
> lot of work for probably a similar output because then users will doubt,
> wait is Python Avro 3.1.0 compatible with Java Avro 5.2.0? and they will
> probably be for the binary format."
>
> There is also the question of interop tests: how would those work in this
> case? We would need a compatibility matrix. Again, I am not sure it is the
> best approach; it looks like a lot of work for not much in return.
>
>
>
> On Thu, Feb 27, 2020 at 12:21 PM Ryan Skraba wrote:
>
> > Hello!  Resurrecting -- I think this was the last thread bringing up this
> > issue!
> >
> > Since we've talked about releasing 1.10.x in May, and it's a nice
> > round number... what do you think about
> >
> > 1) finally dropping the prefix for the "specification version" and
> > calling it Avro 10.x
> >
> > 2) committing to semantic versioning for future releases
> >
> > I can see this being a hugely positive move for aligning with the
> > expectations of developers and projects... but it leads to a lot of
> > questions about releasing all the artifacts together.
> >
> > There's already a JIRA: https://issues.apache.org/jira/browse/AVRO-2687
> >
> > Ryan
> >
> > On Fri, Sep 13, 2019 at 12:00 PM Driesprong, Fokko wrote:
> > >
> > > Thanks Sean for bringing this up.
> > >
> > > For the 1.9 branch there were some incompatible changes in the API with
> > > respect to 1.8.2. We've removed Jackson and Netty from the public API.
> > > This is actually breaking some of the builds, so, unfortunately,
> > > it isn't compatible, and therefore the major version bump.
> > >
> > > The 1.9.x branch still supports the Joda time library but defaults
> > > to jsr310, and is still compatible (I believe). For 1.10 the plan is to
> > > completely remove Joda from the codebase since it is officially deprecated
> > > in favor of Java8 time (jsr310). A lot of this stuff is just changes to the
> > > Java API of Avro, which mostly involves changes to the LogicalTypes, so the
> > > actual format is still compatible (as it should be).
> > >
> > > I agree with you Sean, that a lot of the changes that are targeted for 1.10
> > > could be cherry-picked back to the 1.9 branch. If someone is willing to do
> > > this, I would be grateful. However, maintaining a lot of different branches
> > > is quite time-consuming in terms of release management of the different
> > > versions. For Apache Avro 1.9.0 we actually had some regression bugs which
> > > were blocking, therefore the 1.9.1 release.
> > >
> > > Personally I don't have a big objection to bumping the major version if
> > > there are breaking changes to one of the APIs. But a big +1 on having a
> > > standardized approach to versioning; this also includes a clearer
> > > approach to documenting the upgrade process and a better changelog. I've
> > > added summaries of the releases on the GitHub releases page:
> > > https://github.com/apache/avro/releases but I think having this on the Avro
> > > website might be more appropriate.
> > >
> > > Cheers, Fokko Driesprong
> > >
> > >
> > >
> > > On Wed, Sep 11, 2019 at 18:17, Ryan Blue wrote:
> > >
> > > > > What would it look like if we *did* have to make an incompatible data
> > > > > format change after adopting "conventional" library version strings?
> > > >
> > > > Let's call these format v1 and v2. The library must produce v1 by default,
> > > > so it's a matter of having support for writing v2. When the default
> > > > changes to v2, then that behavior change would require a major version
> > > > increase to signal changes to compatibility. I think we would also want
> > > > 

[jira] [Commented] (AVRO-2748) python schema resolution occurs on every read

2020-02-28 Thread Erik Erlandson (Jira)


[ https://issues.apache.org/jira/browse/AVRO-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047663#comment-17047663 ]

Erik Erlandson commented on AVRO-2748:
--

In a sense, resolving "int" against ["int", "string"] is not a type-safe match. 
 I can see why someone might want to allow it, but I can also imagine not 
wanting it to succeed, for exactly the reason you showed - it can fail partway 
through a data set.

It makes me wonder if there should be two modes of schema resolution. The mode 
that exists, which is sort of like "runtime type checking" and another mode 
that is closer to "compile-time type checking" in the sense that it (1) happens 
once, up front, and (2) if it does succeed, you can safely assume all your data 
reads will succeed.
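
A minimal Python sketch of the two modes, assuming schemas that are either a primitive type name or a flat union (a list of names). The function names are hypothetical and this is not the Avro Python API; the promotion table follows the primitive promotions listed in the Avro specification.

```python
# Writer type -> reader types it may be promoted to (per the Avro spec).
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def branches(schema):
    """Treat a non-union schema as a union with a single branch."""
    return schema if isinstance(schema, list) else [schema]


def matches(writer_type, reader_type):
    """True if a value written as writer_type can be read as reader_type."""
    return writer_type == reader_type or reader_type in PROMOTIONS.get(writer_type, ())


def can_sometimes_read(writer, reader):
    """'Runtime' mode: at least one writer branch is readable, so reads may
    still fail partway through a data set."""
    return any(matches(w, r) for w in branches(writer) for r in branches(reader))


def can_always_read(writer, reader):
    """'Compile-time' mode: every writer branch is readable, so a single
    up-front check guarantees that all reads succeed."""
    return all(
        any(matches(w, r) for r in branches(reader)) for w in branches(writer)
    )
```

Under these assumptions, data written as ["int", "string"] and read as "int" passes the first check but not the second, which is the case that can fail partway through.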

> python schema resolution occurs on every read
> -
>
> Key: AVRO-2748
> URL: https://issues.apache.org/jira/browse/AVRO-2748
> Project: Apache Avro
>  Issue Type: Bug
>  Components: python
>Affects Versions: 1.9.2
>Reporter: Erik Erlandson
>Priority: Minor
>
> In python, the schema resolution appears to be happening on each read 
> operation. I'm not an avro expert but in my perusing through the python io 
> code I haven't yet noticed a reason that the schema resolution couldn't 
> happen once up front, during the construction of DataFileReader, when it 
> first loads the write_schema.
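
A rough Python sketch of "resolve once, up front", for illustration only. The resolve function below is a stand-in for the real resolution step and handles just two cases; ResolveOnceReader is a hypothetical class, not the avro DataFileReader.

```python
def resolve(writer_schema, reader_schema):
    """Stand-in for schema resolution: returns a function that converts one
    decoded value from the writer's type to the reader's type."""
    resolve.calls += 1  # counted only to show how often resolution runs
    if writer_schema == reader_schema:
        return lambda value: value
    if (writer_schema, reader_schema) == ("int", "double"):
        return float
    raise ValueError(f"cannot resolve {writer_schema!r} against {reader_schema!r}")


resolve.calls = 0


class ResolveOnceReader:
    """Hypothetical reader that resolves schemas at construction time."""

    def __init__(self, writer_schema, reader_schema, values):
        # Resolution happens here, once; a mismatch is reported before any read.
        self._convert = resolve(writer_schema, reader_schema)
        self._values = values

    def __iter__(self):
        # Each read only applies the precomputed conversion.
        return (self._convert(value) for value in self._values)
```

Iterating over such a reader calls resolve exactly once no matter how many values are read.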





[jira] [Commented] (AVRO-2687) Semantic Versioning

2020-02-28 Thread Elliotte Rusty Harold (Jira)


[ https://issues.apache.org/jira/browse/AVRO-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047533#comment-17047533 ]

Elliotte Rusty Harold commented on AVRO-2687:
-

You're right, it's a lot of work. I do think that loosening coupling between 
format and languages is a big win for the ecosystem, independent of its impact 
on semver. Products can move faster when developers only have to work in one 
language at a time.

> Semantic Versioning
> ---
>
> Key: AVRO-2687
> URL: https://issues.apache.org/jira/browse/AVRO-2687
> Project: Apache Avro
>  Issue Type: Improvement
>Reporter: Elliotte Rusty Harold
>Priority: Major
>
> API level and other incompatibility between Avro minor versions is causing 
> significant problems for Apache Beam. E.g. 
> [https://github.com/apache/beam/pull/9779]
>  
> Stable releases that don't break backwards compatibility would help us and 
> other users a great deal. E.g. not removing joda.time support in 1.10.
>  
> Absent that, at a minimum Avro should update its major version for any API 
> breaking change. E.g. 1.9 should have been 2.0 because it was not API 
> compatible with 1.8. In the case of Avro, this would apply not just to the 
> public Java API but also to the serialization format. 
>  
>  





[jira] [Commented] (AVRO-2758) Bump istanbul to 0.4.5

2020-02-28 Thread ASF subversion and git services (Jira)

[ https://issues.apache.org/jira/browse/AVRO-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047437#comment-17047437 ]

ASF subversion and git services commented on AVRO-2758:
---

Commit bb77e772f3f388824f23f6b9079ec810427e07e6 in avro's branch 
refs/heads/branch-1.9 from Kengo Seki
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=bb77e77 ]

AVRO-2758: Bump istanbul to 0.4.5 (#833)



> Bump istanbul to 0.4.5
> --
>
> Key: AVRO-2758
> URL: https://issues.apache.org/jira/browse/AVRO-2758
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: js
>Affects Versions: 1.9.2
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
> Fix For: 1.10.0, 1.9.3
>
>
> As reported in AVRO-2642, istanbul 0.4.4 or earlier has some vulnerabilities 
> as follows:
> {code}
> sekikn@0327d61710c0:~/avro/lang/js$ grep istanbul package.json 
> "cover": "istanbul cover _mocha -- -f interop -i",
> "istanbul": "^0.3.19",
> sekikn@0327d61710c0:~/avro/lang/js$ npm i
> audited 361 packages in 1.044s
> 4 packages are looking for funding
>   run `npm fund` for details
> found 3 vulnerabilities (1 moderate, 2 high)
>   run `npm audit fix` to fix them, or `npm audit` for details
> sekikn@0327d61710c0:~/avro/lang/js$ npm audit
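>
>                        === npm audit security report ===
>
> ┌────────────────────────────────────────────────────────────────┐
> │                         Manual Review                          │
> │     Some vulnerabilities require your attention to resolve     │
> │                                                                │
> │  Visit https://go.npm.me/audit-guide for additional guidance   │
> └────────────────────────────────────────────────────────────────┘
> ┌───────────────┬────────────────────────────────────────────────┐
> │ High          │ Regular Expression Denial of Service           │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Package       │ minimatch                                      │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Patched in    │ >=3.0.2                                        │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Dependency of │ istanbul [dev]                                 │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Path          │ istanbul > fileset > minimatch                 │
> ├───────────────┼────────────────────────────────────────────────┤
> │ More info     │ https://npmjs.com/advisories/118               │
> └───────────────┴────────────────────────────────────────────────┘
> ┌───────────────┬────────────────────────────────────────────────┐
> │ Moderate      │ Denial of Service                              │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Package       │ js-yaml                                        │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Patched in    │ >=3.13.0                                       │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Dependency of │ istanbul [dev]                                 │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Path          │ istanbul > js-yaml                             │
> ├───────────────┼────────────────────────────────────────────────┤
> │ More info     │ https://npmjs.com/advisories/788               │
> └───────────────┴────────────────────────────────────────────────┘
> ┌───────────────┬────────────────────────────────────────────────┐
> │ High          │ Code Injection                                 │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Package       │ js-yaml                                        │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Patched in    │ >=3.13.1                                       │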

[jira] [Updated] (AVRO-2758) Bump istanbul to 0.4.5

2020-02-28 Thread Fokko Driesprong (Jira)

[ https://issues.apache.org/jira/browse/AVRO-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fokko Driesprong updated AVRO-2758:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Bump istanbul to 0.4.5
> --
>
> Key: AVRO-2758
> URL: https://issues.apache.org/jira/browse/AVRO-2758
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: js
>Affects Versions: 1.9.2
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Major
> Fix For: 1.10.0, 1.9.3
>
>
> As reported in AVRO-2642, istanbul 0.4.4 or earlier has some vulnerabilities 
> as follows:
> {code}
> sekikn@0327d61710c0:~/avro/lang/js$ grep istanbul package.json 
> "cover": "istanbul cover _mocha -- -f interop -i",
> "istanbul": "^0.3.19",
> sekikn@0327d61710c0:~/avro/lang/js$ npm i
> audited 361 packages in 1.044s
> 4 packages are looking for funding
>   run `npm fund` for details
> found 3 vulnerabilities (1 moderate, 2 high)
>   run `npm audit fix` to fix them, or `npm audit` for details
> sekikn@0327d61710c0:~/avro/lang/js$ npm audit
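>
>                        === npm audit security report ===
>
> ┌────────────────────────────────────────────────────────────────┐
> │                         Manual Review                          │
> │     Some vulnerabilities require your attention to resolve     │
> │                                                                │
> │  Visit https://go.npm.me/audit-guide for additional guidance   │
> └────────────────────────────────────────────────────────────────┘
> ┌───────────────┬────────────────────────────────────────────────┐
> │ High          │ Regular Expression Denial of Service           │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Package       │ minimatch                                      │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Patched in    │ >=3.0.2                                        │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Dependency of │ istanbul [dev]                                 │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Path          │ istanbul > fileset > minimatch                 │
> ├───────────────┼────────────────────────────────────────────────┤
> │ More info     │ https://npmjs.com/advisories/118               │
> └───────────────┴────────────────────────────────────────────────┘
> ┌───────────────┬────────────────────────────────────────────────┐
> │ Moderate      │ Denial of Service                              │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Package       │ js-yaml                                        │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Patched in    │ >=3.13.0                                       │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Dependency of │ istanbul [dev]                                 │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Path          │ istanbul > js-yaml                             │
> ├───────────────┼────────────────────────────────────────────────┤
> │ More info     │ https://npmjs.com/advisories/788               │
> └───────────────┴────────────────────────────────────────────────┘
> ┌───────────────┬────────────────────────────────────────────────┐
> │ High          │ Code Injection                                 │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Package       │ js-yaml                                        │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Patched in    │ >=3.13.1                                       │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Dependency of │ istanbul [dev]                                 │
> ├───────────────┼────────────────────────────────────────────────┤
> │ Path          │ istanbul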