[jira] [Resolved] (ARROW-337) UnionListWriter.list() is doing more than it should, this can cause data corruption

2016-10-26 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved ARROW-337.
-
Resolution: Fixed

Issue resolved by pull request 183
[https://github.com/apache/arrow/pull/183]

> UnionListWriter.list() is doing more than it should, this can cause data 
> corruption
> ---
>
> Key: ARROW-337
> URL: https://issues.apache.org/jira/browse/ARROW-337
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java - Vectors
> Reporter: Deneche A. Hakim
> Assignee: Deneche A. Hakim
>
> If you run the following code:
> {code}
> MapVector parent = new MapVector("parent", allocator, null);
> ComplexWriter writer = new ComplexWriterImpl("root", parent);
> MapWriter rootWriter = writer.rootAsMap();
> ListWriter listWriter = rootWriter.list("list");
> ListWriter list = listWriter.list();
> rootWriter.start();
> {
>   listWriter.startList();
>   {
>     list.startList();
>     list.bigInt().writeBigInt(0);
>     list.endList();
>   }
>   {
>     list.startList();
>     list.bigInt().writeBigInt(1);
>     list.endList();
>   }
>   listWriter.endList();
> }
> rootWriter.end();
> writer.setValueCount(1);
> MapReader rootReader = new SingleMapReaderImpl(parent).reader("root");
> System.out.println(rootReader.reader("list").readObject());
> {code}
> You would expect it to print {noformat}[[0],[1]]{noformat}
> but it actually prints {noformat}[[0,1]]{noformat}
> If you change the code so that UnionListWriter.list() is called each time, 
> just before startList(), then the code works fine:
> {code}
> MapVector parent = new MapVector("parent", allocator, null);
> ComplexWriter writer = new ComplexWriterImpl("root", parent);
> MapWriter rootWriter = writer.rootAsMap();
> rootWriter.start();
> {
>   ListWriter listWriter = rootWriter.list("mylist");
>   listWriter.startList();
>   {
>     ListWriter list = listWriter.list();
>     list.startList();
>     list.bigInt().writeBigInt(0);
>     list.endList();
>   }
>   {
>     ListWriter list = listWriter.list();
>     list.startList();
>     list.bigInt().writeBigInt(1);
>     list.endList();
>   }
>   listWriter.endList();
> }
> rootWriter.end();
> writer.setValueCount(1);
> MapReader rootReader = new SingleMapReaderImpl(parent).reader("root");
> System.out.println(rootReader.reader("mylist").readObject());
> {code}
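
To make the difference between the two outputs concrete, here is a minimal 
sketch that models a list of lists as a flat value array plus inner offsets. 
It is not Arrow code: NestedListOffsets and decode are hypothetical names, and 
the assumption is only that the symptom amounts to a missing inner-list 
boundary between the two writes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; this does not use or reproduce the Arrow vector classes.
class NestedListOffsets {

    // Rebuild the inner lists from flat values and inner offsets.
    // Inner list i covers values[innerOffsets[i]] up to values[innerOffsets[i + 1] - 1].
    static List<List<Long>> decode(long[] values, int[] innerOffsets) {
        List<List<Long>> out = new ArrayList<>();
        for (int i = 0; i + 1 < innerOffsets.length; i++) {
            List<Long> inner = new ArrayList<>();
            for (int j = innerOffsets[i]; j < innerOffsets[i + 1]; j++) {
                inner.add(values[j]);
            }
            out.add(inner);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] values = {0L, 1L};
        // A boundary after each value: two inner lists.
        System.out.println(decode(values, new int[] {0, 1, 2})); // [[0], [1]]
        // The boundary between the values is missing: one merged inner list.
        System.out.println(decode(values, new int[] {0, 2}));    // [[0, 1]]
    }
}
```

The values are identical in both cases; only the inner boundaries differ, which 
matches the shape of the expected and actual output reported above.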



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-300) [Format] Add buffer compression option to IPC file format

2016-10-26 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608798#comment-15608798
 ] 

Uwe L. Korn commented on ARROW-300:
---

+1. Compression makes sense to me, as does the list of initial algorithms. High 
compression ratios probably only make sense once you have cross-datacenter 
traffic.

> [Format] Add buffer compression option to IPC file format
> -
>
> Key: ARROW-300
> URL: https://issues.apache.org/jira/browse/ARROW-300
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Format
> Reporter: Wes McKinney
>
> If data is to be sent over the wire, it may be useful to compress the data 
> buffers themselves as they're being written in the file layout.
> I would propose that we keep this extremely simple with a global buffer 
> compression setting in the file Footer. Probably the only two compressors 
> worth supporting out of the box would be zlib (higher compression ratios) and 
> lz4 (better performance).
> What does everyone think?
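
As a rough illustration of per-buffer compression, the following sketch 
round-trips a single data buffer through zlib using the JDK's java.util.zip 
classes. It is not part of any Arrow API: BufferCompressionSketch, compress and 
decompress are hypothetical names, and keeping the uncompressed length next to 
the compressed bytes is an assumption, since the proposal does not say how 
lengths would be recorded. The Footer setting itself is not modelled, and lz4 
is left out because the JDK does not ship it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch; not an Arrow API.
class BufferCompressionSketch {

    // Compress one data buffer with zlib at the given level.
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restore a buffer. Assumption: the uncompressed length is stored alongside it.
    static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] result = new byte[originalLength];
        int filled = 0;
        try {
            while (filled < originalLength && !inflater.finished()) {
                int n = inflater.inflate(result, filled, originalLength - filled);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("compressed buffer is truncated");
                }
                filled += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return result;
    }

    public static void main(String[] args) {
        // A buffer of 1024 consecutive 64-bit integers, standing in for a data buffer.
        ByteBuffer buf = ByteBuffer.allocate(1024 * Long.BYTES);
        for (long i = 0; i < 1024; i++) {
            buf.putLong(i);
        }
        byte[] data = buf.array();

        byte[] fast = compress(data, Deflater.BEST_SPEED);
        byte[] small = compress(data, Deflater.BEST_COMPRESSION);
        System.out.println("raw=" + data.length + " fast=" + fast.length + " small=" + small.length);
        System.out.println("round trip ok: " + Arrays.equals(data, decompress(small, data.length)));
    }
}
```

Comparing the two levels on representative buffers is one way to weigh the 
ratio versus speed trade-off raised in the description.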





[jira] [Commented] (ARROW-300) [Format] Add buffer compression option to IPC file format

2016-10-26 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608768#comment-15608768
 ] 

Wes McKinney commented on ARROW-300:


It may make sense to limit to compressors designed for fast decompression 
performance: snappy, zstd, lz4. High compression ratios might be less 
interesting, but I'm interested in more feedback on use cases. 

> [Format] Add buffer compression option to IPC file format
> -
>
> Key: ARROW-300
> URL: https://issues.apache.org/jira/browse/ARROW-300
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Format
> Reporter: Wes McKinney
>
> If data is to be sent over the wire, it may be useful to compress the data 
> buffers themselves as they're being written in the file layout.
> I would propose that we keep this extremely simple with a global buffer 
> compression setting in the file Footer. Probably the only two compressors 
> worth supporting out of the box would be zlib (higher compression ratios) and 
> lz4 (better performance).
> What does everyone think?


