[jira] [Resolved] (ARROW-337) UnionListWriter.list() is doing more than it should, this can cause data corruption
[ https://issues.apache.org/jira/browse/ARROW-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Le Dem resolved ARROW-337.
---
Resolution: Fixed

Issue resolved by pull request 183
[https://github.com/apache/arrow/pull/183]

> UnionListWriter.list() is doing more than it should, this can cause data
> corruption
> ---
>
> Key: ARROW-337
> URL: https://issues.apache.org/jira/browse/ARROW-337
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java - Vectors
> Reporter: Deneche A. Hakim
> Assignee: Deneche A. Hakim
>
> If you run the following code:
> {code}
> MapVector parent = new MapVector("parent", allocator, null);
> ComplexWriter writer = new ComplexWriterImpl("root", parent);
> MapWriter rootWriter = writer.rootAsMap();
> ListWriter listWriter = rootWriter.list("list");
> ListWriter list = listWriter.list();
> rootWriter.start();
> {
>   listWriter.startList();
>   {
>     list.startList();
>     list.bigInt().writeBigInt(0);
>     list.endList();
>   }
>   {
>     list.startList();
>     list.bigInt().writeBigInt(1);
>     list.endList();
>   }
>   listWriter.endList();
> }
> rootWriter.end();
> writer.setValueCount(1);
> MapReader rootReader = new SingleMapReaderImpl(parent).reader("root");
> System.out.println(rootReader.reader("list").readObject());
> {code}
> You would expect it to print {noformat}[[0],[1]]{noformat},
> but it actually prints {noformat}[[0,1]]{noformat}.
> If you change the code so that UnionListWriter.list() is called before each
> startList(), the code works fine:
> {code}
> MapVector parent = new MapVector("parent", allocator, null);
> ComplexWriter writer = new ComplexWriterImpl("root", parent);
> MapWriter rootWriter = writer.rootAsMap();
> rootWriter.start();
> {
>   ListWriter listWriter = rootWriter.list("mylist");
>   listWriter.startList();
>   {
>     ListWriter list = listWriter.list();
>     list.startList();
>     list.bigInt().writeBigInt(0);
>     list.endList();
>   }
>   {
>     ListWriter list = listWriter.list();
>     list.startList();
>     list.bigInt().writeBigInt(1);
>     list.endList();
>   }
>   listWriter.endList();
> }
> rootWriter.end();
> writer.setValueCount(1);
> MapReader rootReader = new SingleMapReaderImpl(parent).reader("root");
> System.out.println(rootReader.reader("mylist").readObject());
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
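To make the difference between the expected [[0],[1]] and the observed [[0,1]] in ARROW-337 concrete: the Arrow columnar format stores a list column as an offsets buffer (n + 1 entries for n lists) that points into a child array, so a list<list<bigint>> value needs two offset buffers plus a values buffer. The Java sketch below decodes one row from hand-written buffers. It is only an illustration of the layout, not Arrow's Java API, and it does not reproduce the writer bug itself; the class and method names are hypothetical, and validity (null) buffers are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the offset-based nested list layout.
// Not Arrow's Java API; validity (null) buffers are ignored.
class NestedListLayout {

    // Decodes row `row` of a list<list<bigint>> column. The outer offsets
    // select a range of inner lists; each inner list's offsets select a
    // range of entries in the values buffer.
    static List<List<Long>> decodeRow(int[] outerOffsets, int[] innerOffsets,
                                      long[] values, int row) {
        List<List<Long>> result = new ArrayList<>();
        for (int i = outerOffsets[row]; i < outerOffsets[row + 1]; i++) {
            List<Long> inner = new ArrayList<>();
            for (int j = innerOffsets[i]; j < innerOffsets[i + 1]; j++) {
                inner.add(values[j]);
            }
            result.add(inner);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] values = {0L, 1L};
        // Expected value: one row holding two inner lists of one element each.
        System.out.println(decodeRow(new int[]{0, 2}, new int[]{0, 1, 2}, values, 0));
        // Observed value: one row holding a single inner list of two elements.
        System.out.println(decodeRow(new int[]{0, 1}, new int[]{0, 2}, values, 0));
    }
}
```

Both cases share the same values buffer; only the offsets differ. In the observed output, the boundary between the two inner lists has effectively been lost.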
[jira] [Commented] (ARROW-300) [Format] Add buffer compression option to IPC file format
[ https://issues.apache.org/jira/browse/ARROW-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608798#comment-15608798 ]

Uwe L. Korn commented on ARROW-300:
---

+1
Compression makes sense to me, as does the list of initial algorithms. High compression ratios probably only make sense once you have cross-datacenter traffic.

> [Format] Add buffer compression option to IPC file format
> ---
>
> Key: ARROW-300
> URL: https://issues.apache.org/jira/browse/ARROW-300
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Format
> Reporter: Wes McKinney
>
> If data is to be sent over the wire, it may be useful to compress the data
> buffers themselves as they are being written in the file layout.
> I would propose that we keep this extremely simple with a global buffer
> compression setting in the file Footer. Probably the only two compressors worth
> supporting out of the box would be zlib (higher compression ratios) and lz4
> (better performance).
> What does everyone think?
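The ARROW-300 proposal amounts to compressing each data buffer independently as it is written, with a single codec recorded once for the whole file. A minimal Java sketch of that idea follows. It assumes zlib because, of the codecs named in this thread, it is the only one that ships with the JDK (java.util.zip); lz4, snappy, and zstd would need third-party libraries. The Codec enum, the method names, and the 4-byte uncompressed-length prefix are hypothetical choices for illustration, not anything the Arrow format defines.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: the Codec enum, method names, and length prefix are
// illustrative assumptions, not part of the Arrow IPC format or any Arrow library.
class BufferCompressionSketch {

    // One file-wide setting, as the proposal suggests storing in the file Footer.
    enum Codec { NONE, ZLIB }

    // Compresses a single data buffer. The uncompressed length is prefixed
    // (big-endian int) so a reader can allocate the output buffer up front.
    static byte[] compress(Codec codec, byte[] buffer) {
        if (codec == Codec.NONE) {
            return buffer.clone();
        }
        Deflater deflater = new Deflater(); // zlib format, default level
        deflater.setInput(buffer);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        byte[] compressed = out.toByteArray();
        return ByteBuffer.allocate(4 + compressed.length)
                .putInt(buffer.length)
                .put(compressed)
                .array();
    }

    // Reverses compress(); the reader only needs the global codec setting.
    static byte[] decompress(Codec codec, byte[] stored) {
        if (codec == Codec.NONE) {
            return stored.clone();
        }
        int uncompressedLength = ByteBuffer.wrap(stored).getInt();
        Inflater inflater = new Inflater();
        inflater.setInput(stored, 4, stored.length - 4);
        byte[] result = new byte[uncompressedLength];
        int written = 0;
        try {
            while (written < uncompressedLength) {
                int n = inflater.inflate(result, written, uncompressedLength - written);
                if (n == 0) {
                    break; // stream ended early or needs more input: buffer is malformed
                }
                written += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed buffer", e);
        } finally {
            inflater.end();
        }
        if (written != uncompressedLength) {
            throw new IllegalArgumentException("truncated compressed buffer");
        }
        return result;
    }

    public static void main(String[] args) {
        // A repetitive buffer, standing in for a column with few distinct values.
        byte[] data = new byte[64 * 1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 16);
        }
        byte[] stored = compress(Codec.ZLIB, data);
        byte[] roundTrip = decompress(Codec.ZLIB, stored);
        System.out.println(data.length + " bytes -> " + stored.length + " bytes stored");
        System.out.println("round trip ok: " + Arrays.equals(data, roundTrip));
    }
}
```

Because the codec is global, a reader never has to inspect individual buffers to decide how to decode them. Swapping in a faster codec would only change the two branches guarded by the Codec check.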
[jira] [Commented] (ARROW-300) [Format] Add buffer compression option to IPC file format
[ https://issues.apache.org/jira/browse/ARROW-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608768#comment-15608768 ]

Wes McKinney commented on ARROW-300:
---

It may make sense to limit ourselves to compressors designed for fast decompression: snappy, zstd, lz4. High compression ratios might be less interesting, but I'm interested in more feedback on use cases.