Re: Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)
Also on your location clause you should not reference the same file more than one time. If you want to scale gpfdist process you need to use different range on each port for a same server. (Not sure if I explain myself :)) If you use one gpfdist per server performance should be fine. One gpfdist process usually can do 250MB/s if underlying infra allows. Rgds --- Sent from my Nexus 5x Em 21 de set de 2016 12:14 PM, "Vineet Goel" <vvin...@apache.org> escreveu: > Your default_hash_table_bucket_number value is set to 6. Typically, this > should be adjusted as 6 x #_of_your_segment_hosts. With 3 segments, you > should set this value to 18. Any time you change this parameter, you should > redistribute your HASH distributed tables, if you have any (unless the > table DDL has # of buckets defined, I think). > > Increase default_hash_table_bucket_number to 18 and retry the insert. > Since you have 18 ext table URLs, it should work with the change. > > Thanks > Vineet > > > On Wed, Sep 21, 2016 at 12:26 AM 来熊 <yin@163.com> wrote: > >> >> My environment is >>>> : 1 master 3 segments >> SQL >>>>>>: >> CREATE TABLE call_center ( >> cc_call_center_sk integer, >> cc_call_center_id character varying(16), >> cc_rec_start_date date, >> cc_rec_end_date date, >> cc_closed_date_sk integer, >> cc_open_date_sk integer, >> cc_name character varying(50), >> cc_class character varying(50), >> cc_employees integer, >> cc_sq_ft integer, >> cc_hours character varying(20), >> cc_manager character varying(40), >> cc_mkt_id integer, >> cc_mkt_class character varying(50), >> cc_mkt_desc character varying(100), >> cc_market_manager character varying(40), >> cc_division text, >> cc_division_name character varying(50), >> cc_company text, >> cc_company_name character varying(50), >> cc_street_number character varying(10), >> cc_street_name character varying(60), >> cc_street_type character varying(15), >> cc_suite_number character varying(10), >> cc_city character varying(60), >> cc_county character varying(30), >> cc_state text, >> cc_zip character varying(10), >> cc_country character varying(20), >> cc_gmt_offset numeric(5,2), >> cc_tax_percentage numeric(5,2) >> ) >> WITH (appendonly=true, orientation=parquet) >> DISTRIBUTED BY (cc_call_center_sk); >> >> CREATE EXTERNAL TABLE ext_call_center (like call_center) >> LOCATION ( >> 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat', >> 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat') >> FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\'); >> >> insert into call_center select * from ext_call_center; >> >> ERROR: External scan error: There are more external files (URLs) than >> primary segments that can read them (COptTasks.cpp:1756) >> >> hawq config >>>>: >> >> GUC : add_missing_from Value : off >> GUC : application_name Value : >> GUC : array_nulls Value : on >> GUC : authentication_timeout Value : 60 >> GUC : backslash_quote Value : safe_encoding >> GUC : block_size Value : 32768 >> GUC : bonjour_name Value : >> GUC : check_function_bodies Value : on >> GUC : client_encoding Value : UTF8 >> GUC : client_min_messages Value : ERROR >> GUC : cpu_index_tuple_cost Value : 0.005 >> GUC : cpu_operator_cost Value : 0.0025 >> GUC : cpu_tuple_cost Value : 0.01 >> GUC : cursor_tuple_fraction Value : 1 >> GUC : cust
Re: Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)
Your default_hash_table_bucket_number value is set to 6. Typically, this should be adjusted as 6 x #_of_your_segment_hosts. With 3 segments, you should set this value to 18. Any time you change this parameter, you should redistribute your HASH distributed tables, if you have any (unless the table DDL has # of buckets defined, I think). Increase default_hash_table_bucket_number to 18 and retry the insert. Since you have 18 ext table URLs, it should work with the change. Thanks Vineet On Wed, Sep 21, 2016 at 12:26 AM 来熊 <yin@163.com> wrote: > > My environment is >>>> : 1 master 3 segments > SQL >>>>>>: > CREATE TABLE call_center ( > cc_call_center_sk integer, > cc_call_center_id character varying(16), > cc_rec_start_date date, > cc_rec_end_date date, > cc_closed_date_sk integer, > cc_open_date_sk integer, > cc_name character varying(50), > cc_class character varying(50), > cc_employees integer, > cc_sq_ft integer, > cc_hours character varying(20), > cc_manager character varying(40), > cc_mkt_id integer, > cc_mkt_class character varying(50), > cc_mkt_desc character varying(100), > cc_market_manager character varying(40), > cc_division text, > cc_division_name character varying(50), > cc_company text, > cc_company_name character varying(50), > cc_street_number character varying(10), > cc_street_name character varying(60), > cc_street_type character varying(15), > cc_suite_number character varying(10), > cc_city character varying(60), > cc_county character varying(30), > cc_state text, > cc_zip character varying(10), > cc_country character varying(20), > cc_gmt_offset numeric(5,2), > cc_tax_percentage numeric(5,2) > ) > WITH (appendonly=true, orientation=parquet) > DISTRIBUTED BY (cc_call_center_sk); > > CREATE EXTERNAL TABLE ext_call_center (like call_center) > LOCATION ( > 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat', > 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat') > FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\'); > > insert into call_center select * from ext_call_center; > > ERROR: External scan error: There are more external files (URLs) than > primary segments that can read them (COptTasks.cpp:1756) > > hawq config >>>>: > > GUC : add_missing_from Value : off > GUC : application_name Value : > GUC : array_nulls Value : on > GUC : authentication_timeout Value : 60 > GUC : backslash_quote Value : safe_encoding > GUC : block_size Value : 32768 > GUC : bonjour_name Value : > GUC : check_function_bodies Value : on > GUC : client_encoding Value : UTF8 > GUC : client_min_messages Value : ERROR > GUC : cpu_index_tuple_cost Value : 0.005 > GUC : cpu_operator_cost Value : 0.0025 > GUC : cpu_tuple_cost Value : 0.01 > GUC : cursor_tuple_fraction Value : 1 > GUC : custom_variable_classes Value : > GUC : DateStyle Value : ISO, MDY > GUC : db_user_namespace Value : off > GUC : deadlock_timeout Value : 1000 > GUC : debug_assertions Value : off > GUC : debug_pretty_print Value : off > GUC : debug_print_parse Value : off > GUC : debug_print_plan Value : off > GUC : debug_print_prelim_plan Value : off > GUC : debug_print_rewritten Value : off > GUC : debug_print_slice_table Value : off > GUC : default_hash_table_bucket_number Value : 6 > GUC : default_statement_mem Value : 128000 > GUC : default_statistics_target Value : 25 > GUC : default_tablespace Value : > GUC : default_transaction_isolation Value : read committed > GUC : default_transaction_read_only Value : off > GUC : dfs_url Value : localhost:8020/hawq > GUC : dynamic_library_path Value : $libdir > GUC : effective_cache
回复:Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)
My environment is >>>> : 1 master 3 segments SQL >>>>>>: CREATE TABLE call_center ( cc_call_center_sk integer, cc_call_center_id character varying(16), cc_rec_start_date date, cc_rec_end_date date, cc_closed_date_sk integer, cc_open_date_sk integer, cc_name character varying(50), cc_class character varying(50), cc_employees integer, cc_sq_ft integer, cc_hours character varying(20), cc_manager character varying(40), cc_mkt_id integer, cc_mkt_class character varying(50), cc_mkt_desc character varying(100), cc_market_manager character varying(40), cc_division text, cc_division_name character varying(50), cc_company text, cc_company_name character varying(50), cc_street_number character varying(10), cc_street_name character varying(60), cc_street_type character varying(15), cc_suite_number character varying(10), cc_city character varying(60), cc_county character varying(30), cc_state text, cc_zip character varying(10), cc_country character varying(20), cc_gmt_offset numeric(5,2), cc_tax_percentage numeric(5,2) ) WITH (appendonly=true, orientation=parquet) DISTRIBUTED BY (cc_call_center_sk); CREATE EXTERNAL TABLE ext_call_center (like call_center) LOCATION ( 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat', 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat') FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\'); insert into call_center select * from ext_call_center; ERROR: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756) hawq config >>>>: GUC: add_missing_fromValue: off GUC: application_nameValue: GUC: array_nullsValue: on GUC: authentication_timeoutValue: 60 GUC: backslash_quoteValue: safe_encoding GUC: block_sizeValue: 32768 GUC: bonjour_nameValue: GUC: check_function_bodiesValue: on GUC: client_encodingValue: UTF8 GUC: client_min_messagesValue: ERROR GUC: cpu_index_tuple_costValue: 0.005 GUC: cpu_operator_costValue: 0.0025 GUC: cpu_tuple_costValue: 0.01 GUC: cursor_tuple_fractionValue: 1 GUC: custom_variable_classesValue: GUC: DateStyleValue: ISO, MDY GUC: db_user_namespaceValue: off GUC: deadlock_timeoutValue: 1000 GUC: debug_assertionsValue: off GUC: debug_pretty_printValue: off GUC: debug_print_parseValue: off GUC: debug_print_planValue: off GUC: debug_print_prelim_planValue: off GUC: debug_print_rewrittenValue: off GUC: debug_print_slice_tableValue: off GUC: default_hash_table_bucket_numberValue: 6 GUC: default_statement_memValue: 128000 GUC: default_statistics_targetValue: 25 GUC: default_tablespaceValue: GUC: default_transaction_isolationValue: read committed GUC: default_transaction_read_onlyValue: off GUC: dfs_urlValue: localhost:8020/hawq GUC: dynamic_library_pathValue: $libdir GUC: effective_cache_sizeValue: 16384 GUC: enable_bitmapscanValue: on GUC: enable_groupaggValue: on GUC: enable_hashaggValue: on GUC: enable_hashjoinValue: on GUC: enable_indexscanValue: on GUC: enable_mergejoinValue: off GUC: enable_nestloopValue: off GUC: enable_secure_filesystemValue: off GUC: enable_seqscanValue: on GUC: enable_sortValue: on GUC: enable_tidscanValue: on GUC: escape_string_warningValue: on GUC: explain_memory_verbosityValue: suppress GUC: explain_pretty_printValue: on GUC: extra_float_digitsValue: 0 GUC: filesystem_support_truncateValue: on GUC: from_collapse_limitValue: 20 GUC: gp_adjust_selectivity_for_outerjoinsValue: on GUC: gp_analyze_relative_errorValue: 0.25 GUC: gp_autostats_modeValue: NONE GUC: gp_autostats_on_change_thresholdValue: 2147483647 GUC: gp_backup_directIOValue: off GUC: gp_backup_directIO_read_chunk_mbValue: 20 GUC: gp_cached_segworkers_thresholdValue: 5 GUC: gp_command_countValue: 6 GUC: gp_connections_per_threadValue: 64 GUC: gp_debug_lingerValue: 0 GUC: gp_dynamic_partition_pruningValue: on GUC: gp_email_connect_avoid_durationValue: 7200 GUC: gp_email_connect_failuresValue: 5 GUC: gp_email_connect_time
Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)
Could you please post your SQL DDL statement? How many URLs do you have in your external table? Also, your HASH dist table - how many buckets are defined, if any? Are the # of URLs more than the # of buckets or default_hash_table_bucket_number value? Perhaps you can attach your hawq-site.xml file as well. Also see: http://hdb.docs.pivotal.io/20/datamgmt/load/g-gpfdist-protocol.html Thanks Vineet On Tue, Sep 20, 2016 at 7:07 PM 来熊 <yin@163.com> wrote: > Hi,all: > I am testing hawq 2.0.0 , and I find a problem like this: > I load data from an external table (created using "like target_table" > statement) , > if the target table was distributed by some column(s), it raise this error: > External scan error: There are more external files (URLs) than primary > segments that can read them (COptTasks.cpp:1756) > if the target table was distributed randomly, it works well, > I don't set any parameter special,does anybody know how to resolve this > problem? > thanks a lot. >
External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)
Hi,all: I am testing hawq 2.0.0 , and I find a problem like this: I load data from an external table (created using "like target_table" statement) , if the target table was distributed by some column(s), it raise this error: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756) if the target table was distributed randomly, it works well, I don't set any parameter special,does anybody know how to resolve this problem? thanks a lot.