Your default_hash_table_bucket_number value is set to 6. Typically, this
should be adjusted as 6 x #_of_your_segment_hosts. With 3 segments, you
should set this value to 18. Any time you change this parameter, you should
redistribute your HASH distributed tables, if you have any (unless the
table DDL has # of buckets defined, I think).

Increase default_hash_table_bucket_number to 18 and retry the insert. Since
you have 18 ext table URLs, it should work with the change.

Thanks
Vineet


On Wed, Sep 21, 2016 at 12:26 AM 来熊 <yin....@163.com> wrote:

>
> My environment is >>>> : 1 master 3 segments
> SQL >>>>>>:
> CREATE TABLE call_center (
>     cc_call_center_sk integer,
>     cc_call_center_id character varying(16),
>     cc_rec_start_date date,
>     cc_rec_end_date date,
>     cc_closed_date_sk integer,
>     cc_open_date_sk integer,
>     cc_name character varying(50),
>     cc_class character varying(50),
>     cc_employees integer,
>     cc_sq_ft integer,
>     cc_hours character varying(20),
>     cc_manager character varying(40),
>     cc_mkt_id integer,
>     cc_mkt_class character varying(50),
>     cc_mkt_desc character varying(100),
>     cc_market_manager character varying(40),
>     cc_division text,
>     cc_division_name character varying(50),
>     cc_company text,
>     cc_company_name character varying(50),
>     cc_street_number character varying(10),
>     cc_street_name character varying(60),
>     cc_street_type character varying(15),
>     cc_suite_number character varying(10),
>     cc_city character varying(60),
>     cc_county character varying(30),
>     cc_state text,
>     cc_zip character varying(10),
>     cc_country character varying(20),
>     cc_gmt_offset numeric(5,2),
>     cc_tax_percentage numeric(5,2)
> )
> WITH (appendonly=true, orientation=parquet)
> DISTRIBUTED BY (cc_call_center_sk);
>
> CREATE EXTERNAL TABLE ext_call_center (like call_center)
> LOCATION (
> 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat')
> FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');
>
> insert into call_center select * from ext_call_center;
>
> ERROR:  External scan error: There are more external files (URLs) than
> primary segments that can read them (COptTasks.cpp:1756)
>
> hawq config >>>>:
>
> GUC : add_missing_from Value : off
> GUC : application_name Value :
> GUC : array_nulls Value : on
> GUC : authentication_timeout Value : 60
> GUC : backslash_quote Value : safe_encoding
> GUC : block_size Value : 32768
> GUC : bonjour_name Value :
> GUC : check_function_bodies Value : on
> GUC : client_encoding Value : UTF8
> GUC : client_min_messages Value : ERROR
> GUC : cpu_index_tuple_cost Value : 0.005
> GUC : cpu_operator_cost Value : 0.0025
> GUC : cpu_tuple_cost Value : 0.01
> GUC : cursor_tuple_fraction Value : 1
> GUC : custom_variable_classes Value :
> GUC : DateStyle Value : ISO, MDY
> GUC : db_user_namespace Value : off
> GUC : deadlock_timeout Value : 1000
> GUC : debug_assertions Value : off
> GUC : debug_pretty_print Value : off
> GUC : debug_print_parse Value : off
> GUC : debug_print_plan Value : off
> GUC : debug_print_prelim_plan Value : off
> GUC : debug_print_rewritten Value : off
> GUC : debug_print_slice_table Value : off
> GUC : default_hash_table_bucket_number Value : 6
> GUC : default_statement_mem Value : 128000
> GUC : default_statistics_target Value : 25
> GUC : default_tablespace Value :
> GUC : default_transaction_isolation Value : read committed
> GUC : default_transaction_read_only Value : off
> GUC : dfs_url Value : localhost:8020/hawq
> GUC : dynamic_library_path Value : $libdir
> GUC : effective_cache_size Value : 16384
> GUC : enable_bitmapscan Value : on
> GUC : enable_groupagg Value : on
> GUC : enable_hashagg Value : on
> GUC : enable_hashjoin Value : on
> GUC : enable_indexscan Value : on
> GUC : enable_mergejoin Value : off
> GUC : enable_nestloop Value : off
> GUC : enable_secure_filesystem Value : off
> GUC : enable_seqscan Value : on
> GUC : enable_sort Value : on
> GUC : enable_tidscan Value : on
> GUC : escape_string_warning Value : on
> GUC : explain_memory_verbosity Value : suppress
> GUC : explain_pretty_print Value : on
> GUC : extra_float_digits Value : 0
> GUC : filesystem_support_truncate Value : on
> GUC : from_collapse_limit Value : 20
> GUC : gp_adjust_selectivity_for_outerjoins Value : on
> GUC : gp_analyze_relative_error Value : 0.25
> GUC : gp_autostats_mode Value : NONE
> GUC : gp_autostats_on_change_threshold Value : 2147483647
> GUC : gp_backup_directIO Value : off
> GUC : gp_backup_directIO_read_chunk_mb Value : 20
> GUC : gp_cached_segworkers_threshold Value : 5
> GUC : gp_command_count Value : 6
> GUC : gp_connections_per_thread Value : 64
> GUC : gp_debug_linger Value : 0
> GUC : gp_dynamic_partition_pruning Value : on
> GUC : gp_email_connect_avoid_duration Value : 7200
> GUC : gp_email_connect_failures Value : 5
> GUC : gp_email_connect_timeout Value : 15
> GUC : gp_email_from Value :
> GUC : gp_email_smtp_password Value :
> GUC : gp_email_smtp_server Value : localhost:25
> GUC : gp_email_smtp_userid Value :
> GUC : gp_email_to Value :
> GUC : gp_enable_agg_distinct Value : on
> GUC : gp_enable_agg_distinct_pruning Value : on
> GUC : gp_enable_direct_dispatch Value : on
> GUC : gp_enable_fallback_plan Value : on
> GUC : gp_enable_fast_sri Value : on
> GUC : gp_enable_gpperfmon Value : off
> GUC : gp_enable_groupext_distinct_gather Value : on
> GUC : gp_enable_groupext_distinct_pruning Value : on
> GUC : gp_enable_multiphase_agg Value : on
> GUC : gp_enable_predicate_propagation Value : on
> GUC : gp_enable_preunique Value : on
> GUC : gp_enable_sequential_window_plans Value : on
> GUC : gp_enable_sort_distinct Value : on
> GUC : gp_enable_sort_limit Value : on
> GUC : gp_external_enable_exec Value : on
> GUC : gp_external_grant_privileges Value : off
> GUC : gp_external_max_segs Value : 64
> GUC : gp_filerep_tcp_keepalives_count Value : 2
> GUC : gp_filerep_tcp_keepalives_idle Value : 60
> GUC : gp_filerep_tcp_keepalives_interval Value : 30
> GUC : gp_force_use_default_temporary_directory Value : off
> GUC : gp_gpperfmon_send_interval Value : 1
> GUC : gp_hashjoin_tuples_per_bucket Value : 5
> GUC : gp_idf_deduplicate Value : auto
> GUC : gp_interconnect_cache_future_packets Value : on
> GUC : gp_interconnect_default_rtt Value : 20
> GUC : gp_interconnect_fc_method Value : LOSS
> GUC : gp_interconnect_hash_multiplier Value : 2
> GUC : gp_interconnect_min_retries_before_timeout Value : 100
> GUC : gp_interconnect_min_rto Value : 20
> GUC : gp_interconnect_queue_depth Value : 4
> GUC : gp_interconnect_setup_timeout Value : 7200
> GUC : gp_interconnect_snd_queue_depth Value : 2
> GUC : gp_interconnect_timer_checking_period Value : 20
> GUC : gp_interconnect_timer_period Value : 5
> GUC : gp_interconnect_transmit_timeout Value : 3600
> GUC : gp_interconnect_type Value : UDP
> GUC : gp_log_format Value : csv
> GUC : gp_max_csv_line_length Value : 1048576
> GUC : gp_max_databases Value : 16
> GUC : gp_max_filespaces Value : 8
> GUC : gp_max_packet_size Value : 8192
> GUC : gp_max_partition_level Value : 0
> GUC : gp_max_plan_size Value : 0
> GUC : gp_max_relations Value : 65536
> GUC : gp_max_tablespaces Value : 16
> GUC : gp_motion_cost_per_row Value : 0
> GUC : gp_num_contents_in_cluster Value : -10000
> GUC : gp_query_context_mem_limit Value : 102400
> GUC : gp_reject_percent_threshold Value : 300
> GUC : gp_reraise_signal Value : on
> GUC : gp_role Value : utility
> GUC : gp_safefswritesize Value : 0
> GUC : gp_segment_connect_timeout Value : 600
> GUC : gp_segments_for_planner Value : 0
> GUC : gp_session_id Value : -1
> GUC : gp_set_proc_affinity Value : off
> GUC : gp_snmp_community Value : public
> GUC : gp_snmp_monitor_address Value :
> GUC : gp_snmp_use_inform_or_trap Value : trap
> GUC : gp_statistics_pullup_from_child_partition Value : on
> GUC : gp_statistics_use_fkeys Value : on
> GUC : gp_subtrans_warn_limit Value : 16777216
> GUC : gp_temporary_directory_mark_error Value : 0
> GUC : gp_udp_bufsize_k Value : 0
> GUC : gp_vmem_protect_segworker_cache_limit Value : 500
> GUC : gp_workfile_checksumming Value : on
> GUC : gp_workfile_compress_algorithm Value : none
> GUC : gp_workfile_limit_per_query Value : 0
> GUC : gp_workfile_limit_per_segment Value : 0
> GUC : gpperfmon_port Value : 8888
> GUC : hawq_global_rm_type Value : yarn
> GUC : hawq_master_address_host Value : master.bigdata
> GUC : hawq_master_address_port Value : 6432
> GUC : hawq_master_temp_directory Value : /tmp
> GUC : hawq_metadata_cache_block_capacity Value : 2097152
> GUC : hawq_metadata_cache_check_interval Value : 30
> GUC : hawq_metadata_cache_flush_ratio Value : 0.85
> GUC : hawq_metadata_cache_free_block_max_ratio Value : 0.05
> GUC : hawq_metadata_cache_free_block_normal_ratio Value : 0.2
> GUC : hawq_metadata_cache_max_hdfs_file_num Value : 524288
> GUC : hawq_metadata_cache_reduce_ratio Value : 0.7
> GUC : hawq_metadata_cache_refresh_interval Value : 3600
> GUC : hawq_metadata_cache_refresh_max_num Value : 1000
> GUC : hawq_metadata_cache_refresh_timeout Value : 3600
> GUC : hawq_re_cgroup_hierarchy_name Value : hadoop-yarn
> GUC : hawq_re_cgroup_mount_point Value : /sys/fs/cgroup
> GUC : hawq_re_cpu_enable Value : off
> GUC : hawq_re_cpu_weight Value : 1024
> GUC : hawq_re_memory_overcommit_max Value : 8192
> GUC : hawq_re_vcore_pcore_ratio Value : 1
> GUC : hawq_rm_cluster_report_period Value : 60
> GUC : hawq_rm_clusterratio_core_to_memorygb_factor Value : 5
> GUC : hawq_rm_connpool_sameaddr_buffersize Value : 2
> GUC : hawq_rm_container_batch_limit Value : 1000
> GUC : hawq_rm_enable_connpool Value : on
> GUC : hawq_rm_force_alterqueue_cancel_queued_request Value : on
> GUC : hawq_rm_force_fifo_queuing Value : on
> GUC : hawq_rm_master_domain_port Value : 5436
> GUC : hawq_rm_master_port Value : 5437
> GUC : hawq_rm_memory_limit_perseg Value : 480GB
> GUC : hawq_rm_min_resource_perseg Value : 2
> GUC : hawq_rm_nocluster_timeout Value : 60
> GUC : hawq_rm_nresqueue_limit Value : 128
> GUC : hawq_rm_nslice_perseg_limit Value : 5000
> GUC : hawq_rm_nvcore_limit_perseg Value : 16
> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_limit Value : 512
> GUC : hawq_rm_nvseg_for_analyze_nopart_perquery_perseg_limit Value : 8
> GUC : hawq_rm_nvseg_for_analyze_part_perquery_limit Value : 256
> GUC : hawq_rm_nvseg_for_analyze_part_perquery_perseg_limit Value : 4
> GUC : hawq_rm_nvseg_for_copy_from_perquery Value : 6
> GUC : hawq_rm_nvseg_perquery_limit Value : 512
> GUC : hawq_rm_nvseg_perquery_perseg_limit Value : 6
> GUC : hawq_rm_nvseg_variance_amon_seg_limit Value : 1
> GUC : hawq_rm_nvseg_variance_amon_seg_respool_limit Value : 2
> GUC : hawq_rm_regularize_io_factor Value : 1
> GUC : hawq_rm_regularize_io_max Value : 1.37439e+11
> GUC : hawq_rm_regularize_nvseg_factor Value : 1
> GUC : hawq_rm_regularize_nvseg_max Value : 300
> GUC : hawq_rm_regularize_usage_factor Value : 1
> GUC : hawq_rm_rejectrequest_nseg_limit Value : 0.25
> GUC : hawq_rm_request_timeoutcheck_interval Value : 1
> GUC : hawq_rm_resource_allocation_timeout Value : 600
> GUC : hawq_rm_resource_idle_timeout Value : 300
> GUC : hawq_rm_respool_test_file Value :
> GUC : hawq_rm_return_percent_on_overcommit Value : 10
> GUC : hawq_rm_segment_config_refresh_interval Value : 30
> GUC : hawq_rm_segment_heartbeat_interval Value : 30
> GUC : hawq_rm_segment_heartbeat_timeout Value : 300
> GUC : hawq_rm_segment_port Value : 5438
> GUC : hawq_rm_segment_tmpdir_detect_interval Value : 300
> GUC : hawq_rm_session_lease_heartbeat_enable Value : on
> GUC : hawq_rm_session_lease_heartbeat_interval Value : 10
> GUC : hawq_rm_session_lease_timeout Value : 180
> GUC : hawq_rm_stmt_nvseg Value : 0
> GUC : hawq_rm_stmt_vseg_memory Value : 128mb
> GUC : hawq_rm_tolerate_nseg_limit Value : 0.25
> GUC : hawq_rm_yarn_address Value : worker1.bigdata:8050
> GUC : hawq_rm_yarn_app_name Value : hawq
> GUC : hawq_rm_yarn_queue_name Value : default
> GUC : hawq_rm_yarn_scheduler_address Value : worker1.bigdata:8030
> GUC : hawq_segment_address_port Value : 40000
> GUC : hawq_segment_history_keep_period Value : 365
> GUC : hawq_segment_temp_directory Value : /tmp
> GUC : integer_datetimes Value : on
> GUC : IntervalStyle Value : postgres
> GUC : join_collapse_limit Value : 20
> GUC : krb5_ccname Value : /tmp/postgres.ccname
> GUC : krb_caseins_users Value : off
> GUC : krb_server_keyfile Value :
> FILE:/data/pulse2-agent/agents/agent1/work/HAWQ-main-opt/rhel5_x86_64/src/hawq-db-dist/etc/krb5.keytab
> GUC : krb_srvname Value : postgres
> GUC : lc_collate Value : C
> GUC : lc_ctype Value : C
> GUC : lc_messages Value : en_US.utf8
> GUC : lc_monetary Value : en_US.utf8
> GUC : lc_numeric Value : en_US.utf8
> GUC : lc_time Value : en_US.utf8
> GUC : listen_addresses Value : *
> GUC : local_preload_libraries Value :
> GUC : log_autostats Value : off
> GUC : log_connections Value : off
> GUC : log_disconnections Value : off
> GUC : log_dispatch_stats Value : off
> GUC : log_duration Value : off
> GUC : log_error_verbosity Value : default
> GUC : log_executor_stats Value : off
> GUC : log_hostname Value : off
> GUC : log_min_duration_statement Value : -1
> GUC : log_min_error_statement Value : error
> GUC : log_min_messages Value : warning
> GUC : log_parser_stats Value : off
> GUC : log_planner_stats Value : off
> GUC : log_rotation_age Value : 1440
> GUC : log_rotation_size Value : 0
> GUC : log_statement Value : none
> GUC : log_statement_stats Value : off
> GUC : log_timezone Value : PRC
> GUC : log_truncate_on_rotation Value : off
> GUC : maintenance_work_mem Value : 65536
> GUC : master_directory Value :
> GUC : max_appendonly_segfiles Value : 262144
> GUC : max_appendonly_tables Value : 10000
> GUC : max_connections Value : 1280
> GUC : max_files_per_process Value : 150
> GUC : max_fsm_pages Value : 200000
> GUC : max_fsm_relations Value : 1000
> GUC : max_function_args Value : 100
> GUC : max_identifier_length Value : 63
> GUC : max_index_keys Value : 32
> GUC : max_locks_per_transaction Value : 128
> GUC : max_prepared_transactions Value : 250
> GUC : max_stack_depth Value : 2048
> GUC : max_work_mem Value : 1024000
> GUC : metadata_cache_testfile Value :
> GUC : optimizer Value : on
> GUC : optimizer_analyze_root_partition Value : on
> GUC : optimizer_minidump Value : onerror
> GUC : optimizer_parts_to_force_sort_on_insert Value : 160
> GUC : password_encryption Value : on
> GUC : password_hash_algorithm Value : MD5
> GUC : pljava_classpath Value :
> GUC : pljava_release_lingering_savepoints Value : off
> GUC : pljava_statement_cache_size Value : 0
> GUC : pljava_vmoptions Value :
> GUC : port Value : 6432
> GUC : pxf_enable_filter_pushdown Value : on
> GUC : pxf_enable_locality_optimizations Value : on
> GUC : pxf_enable_stat_collection Value : on
> GUC : pxf_remote_service_login Value :
> GUC : pxf_remote_service_secret Value :
> GUC : pxf_service_address Value : localhost:51200
> GUC : pxf_stat_max_fragments Value : 100
> GUC : random_page_cost Value : 100
> GUC : regex_flavor Value : advanced
> GUC : runaway_detector_activation_percent Value : 95
> GUC : search_path Value : "$user",public
> GUC : seg_max_connections Value : 3000
> GUC : segment_directory Value :
> GUC : seq_page_cost Value : 1
> GUC : server_encoding Value : UTF8
> GUC : server_ticket_renew_interval Value : 43200000
> GUC : server_version Value : 8.2.15
> GUC : server_version_num Value : 80215
> GUC : shared_buffers Value : 4000
> GUC : shared_preload_libraries Value :
> GUC : ssl Value : off
> GUC : ssl_ciphers Value : ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
> GUC : standard_conforming_strings Value : off
> GUC : standby_address_host Value : localhost
> GUC : statement_timeout Value : 0
> GUC : superuser_reserved_connections Value : 3
> GUC : tcp_keepalives_count Value : 9
> GUC : tcp_keepalives_idle Value : 7200
> GUC : tcp_keepalives_interval Value : 75
> GUC : temp_buffers Value : 1024
> GUC : TimeZone Value : PRC
> GUC : timezone_abbreviations Value : Default
> GUC : track_activities Value : on
> GUC : track_counts Value : off
> GUC : transaction_isolation Value : read committed
> GUC : transaction_read_only Value : off
> GUC : transform_null_equals Value : off
> GUC : unix_socket_directory Value :
> GUC : unix_socket_group Value :
> GUC : unix_socket_permissions Value : 511
> GUC : update_process_title Value : on
> GUC : vacuum_cost_delay Value : 0
> GUC : vacuum_cost_limit Value : 200
> GUC : vacuum_cost_page_dirty Value : 20
> GUC : vacuum_cost_page_miss Value : 10
> GUC : vacuum_freeze_min_age Value : 100000000
> GUC : work_mem Value : 51200
>
>
>
>
>
> At 2016-09-21 13:00:41, "Vineet Goel" <vvin...@apache.org> wrote:
>
> Could you please post your SQL DDL statement? How many URLs do you have in
> your external table? Also, your HASH dist table - how many buckets are
> defined, if any? Are the # of URLs more than the # of buckets or
> default_hash_table_bucket_number value? Perhaps you can attach your
> hawq-site.xml file as well.
>
> Also see:
> http://hdb.docs.pivotal.io/20/datamgmt/load/g-gpfdist-protocol.html
>
> Thanks
> Vineet
>
>
> On Tue, Sep 20, 2016 at 7:07 PM 来熊 <yin....@163.com> wrote:
>
>> Hi,all:
>>     I am testing hawq 2.0.0 , and I find a problem like this:
>>  I load data from an external table (created using "like target_table"
>> statement) ,
>> if the target table was distributed by some column(s), it raise this
>> error:
>>  External scan error: There are more external files (URLs) than primary
>> segments that can read them (COptTasks.cpp:1756)
>> if the target table was distributed randomly, it works well,
>> I don't set any parameter special,does anybody know how to resolve this
>> problem?
>> thanks a lot.
>>
>

Reply via email to