[ https://issues.apache.org/jira/browse/ARROW-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Francois Saint-Jacques resolved ARROW-6046.
-------------------------------------------
    Resolution: Fixed

Issue resolved by pull request 5126
[https://github.com/apache/arrow/pull/5126]

> [C++] Slice RecordBatch of String array with offset 0 returns whole batch
> -------------------------------------------------------------------------
>
>                 Key: ARROW-6046
>                 URL: https://issues.apache.org/jira/browse/ARROW-6046
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>   Affects Versions: 0.14.1
>            Reporter: Sascha Hofmann
>            Assignee: Wes McKinney
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.15.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We are seeing a bug very similar to ARROW-809, this time for a RecordBatch of
> strings. Slicing a RecordBatch that has a string column with offset = 0
> returns the whole batch instead of only the requested rows.
>
> {code:python}
> import pandas as pd
> import pyarrow as pa
>
> # One string column with a million identical values
> df = pd.DataFrame({'b': ['test' for _ in range(1_000_000)]})
> tbl = pa.Table.from_pandas(df)
> batch = tbl.to_batches()[0]
>
> # Two rows starting at offset 0: serializes the whole batch
> batch.slice(0, 2).serialize().size
> # 4000232
>
> # Two rows starting at offset 1: serializes only the slice
> batch.slice(1, 2).serialize().size
> # 240
> {code}
>

--
This message was sent by Atlassian Jira
(v8.3.2#803003)