Hi, I’m not sure I fully understood your point, but I’ll try to give you the answer that looks simplest to me. Your code doesn’t perform any operation on table1 and table2 separately; it looks like you just want to merge all those RecordBatches. I think you have two options:
1. You can use the to_batches() method listed in the API for Table, though I have never tried it myself. That way you create two tables, create batches from those tables, and put the batches together.
2. I would rather store ALL the batches from the two streams in the SAME Python list, and then create a single table using from_batches(), as you suggested. That’s because your code creates two tables even though you don’t seem to need them.

I haven’t tried either, but I think you can go both ways and then tell us whether the result is the same and whether one of the two is faster than the other.

Alberto

From: Rares Vernica<mailto:rvern...@gmail.com>
Sent: Wednesday, 14 February 2018 05:13
To: dev@arrow.apache.org<mailto:dev@arrow.apache.org>
Subject: Merge multiple record batches

Hi,

If I have multiple RecordBatchStreamReader inputs, what is the recommended way to get all the RecordBatch objects from all the inputs together, maybe in a Table? They all have the same schema. The sources for the readers are different files.

So, I do something like:

    reader1 = pa.open_stream('foo')
    table1 = reader1.read_all()

    reader2 = pa.open_stream('bar')
    table2 = reader2.read_all()

    # table_all = ???
    # OR maybe I don't need to create table1 and table2
    # table_all = pa.Table.from_batches( ??? )

Thanks!
Rares