Re: Load performance with partitioned table

2016-09-19 Thread naveen mahadevuni
Hi Franke, 1) We are using 4 identical AWS machines, each with 8 vCPUs, 32 GB RAM, and 1 TB of storage. 2) We are setting up bloom filters on only two other string columns, not on all of them. 3) The data is event data, e.g. syslog. 4) Queries usually run on a timestamp range with additional predicates on other columns.

Re: Load performance with partitioned table

2016-09-15 Thread Jörn Franke
What is your hardware setup? Are the bloom filters necessary on all columns? Usually they only make sense for non-numeric columns. Updating bloom filters takes time and should be avoided where they do not make sense. Can you provide an example of the data and the select queries that you execute?
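
To illustrate the point about bloom filter cost, here is a minimal Python sketch of a generic bloom filter. It is not ORC's actual implementation; the class name, bit-array size, hash count, and sample host names are all assumptions made up for this example. It shows two things: every inserted value costs several hash computations (which is paid per row and per filtered column at load time), and the structure can only answer equality lookups ("possibly present" or "definitely absent"), so it does not help a timestamp range predicate.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter (hypothetical sizes, not ORC's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; real filters pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, value):
        # Derive num_hashes bit positions by salting one hash function.
        data = str(value).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        # Write-side cost: num_hashes hash computations per value.
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(value))


# Hypothetical string column values, e.g. a host name in syslog data.
hosts = BloomFilter()
for host in ["web-01", "web-02", "db-01"]:
    hosts.add(host)
```

An equality predicate such as host = 'web-01' can be checked with hosts.might_contain("web-01"). A range predicate such as TS between two values has no equivalent lookup here, which is why it would have to rely on min/max statistics instead.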

Load performance with partitioned table

2016-09-15 Thread naveen mahadevuni
Hi, I'm using the ORC format for our table storage. The table has a timestamp column (say TS) and 25 other columns. The other ORC properties we are using are storage index and bloom filters. We are loading 100 million records into this table on a 4-node cluster. Our source table is a text table with