Hi,

1) I was using Pivotal HAWQ v1.3, which is why it reported that number. The version of HAWQ released to Apache is based on the HAWQ 2.0 branch, and it works somewhat differently: it creates a single "segment" per node, but that segment can spawn any number of executors per node (as long as you have enough resources).
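For instance, a minimal way to see this from the catalog (a sketch, assuming a HAWQ 2.x cluster; the exact column set differs between releases):

  SELECT * FROM gp_segment_configuration;
  -- lists the registered segments, roughly one row per host
  -- (plus master/standby entries, depending on the release),
  -- rather than one row per executor

So a small row count here does not mean only one worker is running per node.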
2) In Apache HAWQ the number of virtual segments (i.e. executors) brought up depends on many factors, the most important one being the data size. This is why you observe many executors per node. To get more information on what's happening, I'd recommend running "explain analyze <your query>" — this gives you all the details, including the number of executors used.

Regarding parameters, I'd recommend checking the documentation here: http://hdb.docs.pivotal.io/. gp_vmem_protect_limit is deprecated.

1) You can enforce the number of virtual segments with the enforce_virtual_segment_number GUC at the session level, and set the default with default_segment_num in hawq-site.xml.
2) You mean the number of buckets? You can set it at table creation time:
CREATE TABLE t1(c1 int) WITH (bucketnum = 3);
(A short sketch of both settings follows below the quoted message.)

On Sun, Feb 21, 2016 at 9:28 PM, Marek Wiewiorka <[email protected]> wrote:

> Hi All,
> I've spent a lot of time trying to figure out how to control the number of
> segment instances/threads per node in HAWQ and couldn't find any
> information on that.
> In fact I'm a bit confused:
> 1) In this blog entry:
> http://0x0fff.com/spark-dataframes-are-faster-arent-they/#more-268
> I found that Alexey had a cluster of 4 nodes with 10 threads running on
> each node.
> If I query the same table (gp_segment_configuration) in my installation I
> get only 5 rows - one per node - which might indicate that I have only
> one segment instance running per node.
>
> 2) On the other hand, when I monitor CPU utilization while running some
> test queries, I can observe that 8 threads per node are actually active.
> I can also see that all my tables have 40 segments at most.
>
> I found some pieces of information on 2 params:
> NSegs
> The number of segment instances to run per segment host.
>
> gp_vmem_protect_limit
> The amount of memory allowed to a single segment instance on a host.
>
> But when I tried to put them in postgresql.conf on my nodes I found in
> the log that both are unrecognized:
>
> 2016-02-21 20:16:36.720257 GMT,,,p20790,th-670398080,,,,0,,,seg-10000,,,,,"LOG","42704","unrecognized configuration parameter ""gp_vmem_protect_limit""",,,,,,,,"set_config_option","guc.c",9933,
> 2016-02-21 20:16:36.720913 GMT,,,p20790,th-670398080,,,,0,,,seg-10000,,,,,"LOG","42704","unrecognized configuration parameter ""NSegs""",,,,,,,,"set_config_option","guc.c",9933,
>
> So my question is how to:
> 1) Control the number of threads/segment instances per node?
> 2) Control the number of segments per table?
> I suspect that these 2 things might be somehow interconnected.
>
> Many thanks for any hints on that.
>
> Marek

--
Alexey Grishchenko
http://0x0fff.com
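P.S. A quick sketch of the settings mentioned above, in case it helps. The GUC and property names come from the answer; the numeric values are only illustrative, not recommendations for your cluster:

  -- session level: pin the number of virtual segments for the queries that follow
  SET enforce_virtual_segment_number = 8;

  -- see how many virtual segments (executors) a query actually used
  EXPLAIN ANALYZE SELECT count(*) FROM t1;

and the cluster-wide default in hawq-site.xml:

  <property>
    <name>default_segment_num</name>
    <value>24</value>
  </property>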
