Re: Strategy for mixing large_string and string with chunked arrays

2019-12-17 Thread Maarten Breddels
On Wed, Nov 27, 2019 at 19:37, Wes McKinney wrote:
> On Tue, Nov 26, 2019 at 9:40 AM Maarten Breddels wrote:
> >
> > On Tue, Nov 26, 2019 at 15:02, Wes McKinney wrote:
> >
> > > hi Maarten
> > >
> > > I opened https://issues.apache.org/jira/browse/ARROW-7245 in part based
> > > on this.
> > >

Re: Strategy for mixing large_string and string with chunked arrays

2019-11-27 Thread Wes McKinney
On Tue, Nov 26, 2019 at 9:40 AM Maarten Breddels wrote:
>
> On Tue, Nov 26, 2019 at 15:02, Wes McKinney wrote:
>
> > hi Maarten
> >
> > I opened https://issues.apache.org/jira/browse/ARROW-7245 in part based
> > on this.
> >
> > I think that normalizing to a common type (which would require castin

Re: Strategy for mixing large_string and string with chunked arrays

2019-11-26 Thread Maarten Breddels
On Tue, Nov 26, 2019 at 15:02, Wes McKinney wrote:
> hi Maarten
>
> I opened https://issues.apache.org/jira/browse/ARROW-7245 in part based
> on this.
>
> I think that normalizing to a common type (which would require casting
> the offsets buffer, but not the data -- which can be shared -- so not

Re: Strategy for mixing large_string and string with chunked arrays

2019-11-26 Thread Wes McKinney
hi Maarten

I opened https://issues.apache.org/jira/browse/ARROW-7245 in part based on this.

I think that normalizing to a common type (which would require casting the offsets buffer, but not the data -- which can be shared -- so not too wasteful) during concatenation would be the approach I would
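
As a rough illustration of the point about casting only the offsets buffer, here is a minimal pure-Python sketch. It does not use the Arrow API: a chunk is modelled as a hypothetical (offsets, data) pair, with typecode "i" standing in for the narrower offsets of string and "q" for the 64-bit offsets of large_string. The helper names make_string_chunk, widen_offsets and get_value are made up for this example.

```python
from array import array


def make_string_chunk(values, offset_typecode):
    """Build an (offsets, data) pair for a list of str.

    offset_typecode "i" (typically 4 bytes) stands in for string,
    "q" (8 bytes) stands in for large_string. Illustrative only.
    """
    data = bytearray()
    offsets = array(offset_typecode, [0])
    for value in values:
        data += value.encode("utf-8")
        offsets.append(len(data))
    return offsets, bytes(data)


def widen_offsets(chunk):
    """Normalize a chunk to 64-bit offsets; the data buffer is reused as-is."""
    offsets, data = chunk
    if offsets.typecode == "q":
        return chunk
    # Only the offsets are rewritten; `data` is the same object, not a copy.
    return array("q", offsets.tolist()), data


def get_value(chunk, i):
    """Decode the i-th string from an (offsets, data) pair."""
    offsets, data = chunk
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")


small = make_string_chunk(["foo", "ba"], "i")  # "string"-like chunk
large = make_string_chunk(["quux"], "q")       # "large_string"-like chunk

# Normalize every chunk to the common (wider) type before combining them.
chunks = [widen_offsets(chunk) for chunk in (small, large)]
values = [
    get_value(chunk, i)
    for chunk in chunks
    for i in range(len(chunk[0]) - 1)
]
```

After normalization all chunks carry 64-bit offsets, while the character data of the originally narrow chunk is still the same bytes object, so the extra cost is proportional to the number of strings rather than to their total length.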

Strategy for mixing large_string and string with chunked arrays

2019-11-26 Thread Maarten Breddels
Hi Arrow devs,

Small intro: I'm the main developer of Vaex, an out-of-core dataframe library for Python (https://github.com/vaexio/vaex), and we're looking into moving Vaex to use Apache Arrow for the data structure. At the beginning of this year, we added string support in Vaex, which required 64