Hi,
I’m not sure I fully understood your point, but I’ll try to give you the 
answer that looks simplest to me.
Your code doesn’t perform any operation on table 1 and table 2 separately; it just 
looks like you want to merge all those RecordBatches.
Now I think that:

  1.  You can use the to_batches() method listed in the API for Table, though 
I have never tried it myself. This way you create two tables, create batches from 
these tables, and put the batches together.
  2.  I would rather store ALL the BATCHES from the two streams in the SAME 
Python LIST, and then create a single table using from_batches(), as you 
suggested. That’s because your code creates two tables even though you 
don’t seem to need them.

I haven’t tried it, but I think you can go both ways and then tell us whether the 
result is the same and whether one of the two is faster than the other.

Alberto

From: Rares Vernica<mailto:rvern...@gmail.com>
Sent: Wednesday, 14 February 2018 05:13
To: dev@arrow.apache.org<mailto:dev@arrow.apache.org>
Subject: Merge multiple record batches

Hi,

If I have multiple RecordBatchStreamReader inputs, what is the recommended
way to get all the RecordBatches from all the inputs together, maybe in a
Table? They all have the same schema. The sources for the readers are
different files.

So, I do something like:

reader1 = pa.open_stream('foo')
table1 = reader1.read_all()

reader2 = pa.open_stream('bar')
table2 = reader2.read_all()

# table_all = ???
# OR maybe I don't need to create table1 and table2
# table_all = pa.Table.from_batches( ??? )

Thanks!
Rares
