Re: Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)

2016-09-21 Thread Luis Macedo
Also on your location clause you should not reference the same file more
than one time.

If you want to scale gpfdist process you need to use different range on
each port for a same server. (Not sure if I explain myself :))

If you use one gpfdist per server performance should be fine. One gpfdist
process usually can do 250MB/s if underlying infra allows.

Rgds

--- Sent from my Nexus 5x

Em 21 de set de 2016 12:14 PM, "Vineet Goel" <vvin...@apache.org> escreveu:

> Your default_hash_table_bucket_number value is set to 6. Typically, this
> should be adjusted as 6 x #_of_your_segment_hosts. With 3 segments, you
> should set this value to 18. Any time you change this parameter, you should
> redistribute your HASH distributed tables, if you have any (unless the
> table DDL has # of buckets defined, I think).
>
> Increase default_hash_table_bucket_number to 18 and retry the insert.
> Since you have 18 ext table URLs, it should work with the change.
>
> Thanks
> Vineet
>
>
> On Wed, Sep 21, 2016 at 12:26 AM 来熊 <yin@163.com> wrote:
>
>>
>> My environment is >>>> : 1 master 3 segments
>> SQL >>>>>>:
>> CREATE TABLE call_center (
>> cc_call_center_sk integer,
>> cc_call_center_id character varying(16),
>> cc_rec_start_date date,
>> cc_rec_end_date date,
>> cc_closed_date_sk integer,
>> cc_open_date_sk integer,
>> cc_name character varying(50),
>> cc_class character varying(50),
>> cc_employees integer,
>> cc_sq_ft integer,
>> cc_hours character varying(20),
>> cc_manager character varying(40),
>> cc_mkt_id integer,
>> cc_mkt_class character varying(50),
>> cc_mkt_desc character varying(100),
>> cc_market_manager character varying(40),
>> cc_division text,
>> cc_division_name character varying(50),
>> cc_company text,
>> cc_company_name character varying(50),
>> cc_street_number character varying(10),
>> cc_street_name character varying(60),
>> cc_street_type character varying(15),
>> cc_suite_number character varying(10),
>> cc_city character varying(60),
>> cc_county character varying(30),
>> cc_state text,
>> cc_zip character varying(10),
>> cc_country character varying(20),
>> cc_gmt_offset numeric(5,2),
>> cc_tax_percentage numeric(5,2)
>> )
>> WITH (appendonly=true, orientation=parquet)
>> DISTRIBUTED BY (cc_call_center_sk);
>>
>> CREATE EXTERNAL TABLE ext_call_center (like call_center)
>> LOCATION (
>> 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat',
>> 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat')
>> FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');
>>
>> insert into call_center select * from ext_call_center;
>>
>> ERROR:  External scan error: There are more external files (URLs) than
>> primary segments that can read them (COptTasks.cpp:1756)
>>
>> hawq config >>>>:
>>
>> GUC : add_missing_from Value : off
>> GUC : application_name Value :
>> GUC : array_nulls Value : on
>> GUC : authentication_timeout Value : 60
>> GUC : backslash_quote Value : safe_encoding
>> GUC : block_size Value : 32768
>> GUC : bonjour_name Value :
>> GUC : check_function_bodies Value : on
>> GUC : client_encoding Value : UTF8
>> GUC : client_min_messages Value : ERROR
>> GUC : cpu_index_tuple_cost Value : 0.005
>> GUC : cpu_operator_cost Value : 0.0025
>> GUC : cpu_tuple_cost Value : 0.01
>> GUC : cursor_tuple_fraction Value : 1
>> GUC : cust

Re: Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)

2016-09-21 Thread Vineet Goel
Your default_hash_table_bucket_number value is set to 6. Typically, this
should be adjusted as 6 x #_of_your_segment_hosts. With 3 segments, you
should set this value to 18. Any time you change this parameter, you should
redistribute your HASH distributed tables, if you have any (unless the
table DDL has # of buckets defined, I think).

Increase default_hash_table_bucket_number to 18 and retry the insert. Since
you have 18 ext table URLs, it should work with the change.

Thanks
Vineet


On Wed, Sep 21, 2016 at 12:26 AM 来熊 <yin@163.com> wrote:

>
> My environment is >>>> : 1 master 3 segments
> SQL >>>>>>:
> CREATE TABLE call_center (
> cc_call_center_sk integer,
> cc_call_center_id character varying(16),
> cc_rec_start_date date,
> cc_rec_end_date date,
> cc_closed_date_sk integer,
> cc_open_date_sk integer,
> cc_name character varying(50),
> cc_class character varying(50),
> cc_employees integer,
> cc_sq_ft integer,
> cc_hours character varying(20),
> cc_manager character varying(40),
> cc_mkt_id integer,
> cc_mkt_class character varying(50),
> cc_mkt_desc character varying(100),
> cc_market_manager character varying(40),
> cc_division text,
> cc_division_name character varying(50),
> cc_company text,
> cc_company_name character varying(50),
> cc_street_number character varying(10),
> cc_street_name character varying(60),
> cc_street_type character varying(15),
> cc_suite_number character varying(10),
> cc_city character varying(60),
> cc_county character varying(30),
> cc_state text,
> cc_zip character varying(10),
> cc_country character varying(20),
> cc_gmt_offset numeric(5,2),
> cc_tax_percentage numeric(5,2)
> )
> WITH (appendonly=true, orientation=parquet)
> DISTRIBUTED BY (cc_call_center_sk);
>
> CREATE EXTERNAL TABLE ext_call_center (like call_center)
> LOCATION (
> 'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat',
> 'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat')
> FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');
>
> insert into call_center select * from ext_call_center;
>
> ERROR:  External scan error: There are more external files (URLs) than
> primary segments that can read them (COptTasks.cpp:1756)
>
> hawq config >>>>:
>
> GUC : add_missing_from Value : off
> GUC : application_name Value :
> GUC : array_nulls Value : on
> GUC : authentication_timeout Value : 60
> GUC : backslash_quote Value : safe_encoding
> GUC : block_size Value : 32768
> GUC : bonjour_name Value :
> GUC : check_function_bodies Value : on
> GUC : client_encoding Value : UTF8
> GUC : client_min_messages Value : ERROR
> GUC : cpu_index_tuple_cost Value : 0.005
> GUC : cpu_operator_cost Value : 0.0025
> GUC : cpu_tuple_cost Value : 0.01
> GUC : cursor_tuple_fraction Value : 1
> GUC : custom_variable_classes Value :
> GUC : DateStyle Value : ISO, MDY
> GUC : db_user_namespace Value : off
> GUC : deadlock_timeout Value : 1000
> GUC : debug_assertions Value : off
> GUC : debug_pretty_print Value : off
> GUC : debug_print_parse Value : off
> GUC : debug_print_plan Value : off
> GUC : debug_print_prelim_plan Value : off
> GUC : debug_print_rewritten Value : off
> GUC : debug_print_slice_table Value : off
> GUC : default_hash_table_bucket_number Value : 6
> GUC : default_statement_mem Value : 128000
> GUC : default_statistics_target Value : 25
> GUC : default_tablespace Value :
> GUC : default_transaction_isolation Value : read committed
> GUC : default_transaction_read_only Value : off
> GUC : dfs_url Value : localhost:8020/hawq
> GUC : dynamic_library_path Value : $libdir
> GUC : effective_cache

回复:Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)

2016-09-21 Thread 来熊


My environment is >>>> : 1 master 3 segments
SQL >>>>>>: 
CREATE TABLE call_center (
cc_call_center_sk integer,
cc_call_center_id character varying(16),
cc_rec_start_date date,
cc_rec_end_date date,
cc_closed_date_sk integer,
cc_open_date_sk integer,
cc_name character varying(50),
cc_class character varying(50),
cc_employees integer,
cc_sq_ft integer,
cc_hours character varying(20),
cc_manager character varying(40),
cc_mkt_id integer,
cc_mkt_class character varying(50),
cc_mkt_desc character varying(100),
cc_market_manager character varying(40),
cc_division text,
cc_division_name character varying(50),
cc_company text,
cc_company_name character varying(50),
cc_street_number character varying(10),
cc_street_name character varying(60),
cc_street_type character varying(15),
cc_suite_number character varying(10),
cc_city character varying(60),
cc_county character varying(30),
cc_state text,
cc_zip character varying(10),
cc_country character varying(20),
cc_gmt_offset numeric(5,2),
cc_tax_percentage numeric(5,2)
)
WITH (appendonly=true, orientation=parquet)
DISTRIBUTED BY (cc_call_center_sk);


CREATE EXTERNAL TABLE ext_call_center (like call_center)
LOCATION (
'gpfdist://segment3:9001/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment3:9002/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment3:9003/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment3:9004/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment3:9005/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment3:9006/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment2:9001/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment2:9002/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment2:9003/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment2:9004/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment2:9005/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment2:9006/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment1:9001/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment1:9002/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment1:9003/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment1:9004/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment1:9005/call_center_[0-9]*_[0-9]*.dat', 
'gpfdist://segment1:9006/call_center_[0-9]*_[0-9]*.dat')
FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');


insert into call_center select * from ext_call_center;


ERROR:  External scan error: There are more external files (URLs) than primary 
segments that can read them (COptTasks.cpp:1756)


hawq config >>>>:


GUC: add_missing_fromValue: off
GUC: application_nameValue: 
GUC: array_nullsValue: on
GUC: authentication_timeoutValue: 60
GUC: backslash_quoteValue: safe_encoding
GUC: block_sizeValue: 32768
GUC: bonjour_nameValue: 
GUC: check_function_bodiesValue: on
GUC: client_encodingValue: UTF8
GUC: client_min_messagesValue: ERROR
GUC: cpu_index_tuple_costValue: 0.005
GUC: cpu_operator_costValue: 0.0025
GUC: cpu_tuple_costValue: 0.01
GUC: cursor_tuple_fractionValue: 1
GUC: custom_variable_classesValue: 
GUC: DateStyleValue: ISO, MDY
GUC: db_user_namespaceValue: off
GUC: deadlock_timeoutValue: 1000
GUC: debug_assertionsValue: off
GUC: debug_pretty_printValue: off
GUC: debug_print_parseValue: off
GUC: debug_print_planValue: off
GUC: debug_print_prelim_planValue: off
GUC: debug_print_rewrittenValue: off
GUC: debug_print_slice_tableValue: off
GUC: default_hash_table_bucket_numberValue: 6
GUC: default_statement_memValue: 128000
GUC: default_statistics_targetValue: 25
GUC: default_tablespaceValue: 
GUC: default_transaction_isolationValue: read committed
GUC: default_transaction_read_onlyValue: off
GUC: dfs_urlValue: localhost:8020/hawq
GUC: dynamic_library_pathValue: $libdir
GUC: effective_cache_sizeValue: 16384
GUC: enable_bitmapscanValue: on
GUC: enable_groupaggValue: on
GUC: enable_hashaggValue: on
GUC: enable_hashjoinValue: on
GUC: enable_indexscanValue: on
GUC: enable_mergejoinValue: off
GUC: enable_nestloopValue: off
GUC: enable_secure_filesystemValue: off
GUC: enable_seqscanValue: on
GUC: enable_sortValue: on
GUC: enable_tidscanValue: on
GUC: escape_string_warningValue: on
GUC: explain_memory_verbosityValue: suppress
GUC: explain_pretty_printValue: on
GUC: extra_float_digitsValue: 0
GUC: filesystem_support_truncateValue: on
GUC: from_collapse_limitValue: 20
GUC: gp_adjust_selectivity_for_outerjoinsValue: on
GUC: gp_analyze_relative_errorValue: 0.25
GUC: gp_autostats_modeValue: NONE
GUC: gp_autostats_on_change_thresholdValue: 2147483647
GUC: gp_backup_directIOValue: off
GUC: gp_backup_directIO_read_chunk_mbValue: 20
GUC: gp_cached_segworkers_thresholdValue: 5
GUC: gp_command_countValue: 6
GUC: gp_connections_per_threadValue: 64
GUC: gp_debug_lingerValue: 0
GUC: gp_dynamic_partition_pruningValue: on
GUC: gp_email_connect_avoid_durationValue: 7200
GUC: gp_email_connect_failuresValue: 5
GUC: gp_email_connect_time

Re: External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)

2016-09-20 Thread Vineet Goel
Could you please post your SQL DDL statement? How many URLs do you have in
your external table? Also, your HASH dist table - how many buckets are
defined, if any? Are the # of URLs more than the # of buckets or
default_hash_table_bucket_number value? Perhaps you can attach your
hawq-site.xml file as well.

Also see:
http://hdb.docs.pivotal.io/20/datamgmt/load/g-gpfdist-protocol.html

Thanks
Vineet


On Tue, Sep 20, 2016 at 7:07 PM 来熊 <yin@163.com> wrote:

> Hi,all:
> I am testing hawq 2.0.0 , and I find a problem like this:
>  I load data from an external table (created using "like target_table"
> statement) ,
> if the target table was distributed by some column(s), it raise this error:
>  External scan error: There are more external files (URLs) than primary
> segments that can read them (COptTasks.cpp:1756)
> if the target table was distributed randomly, it works well,
> I don't set any parameter special,does anybody know how to resolve this
> problem?
> thanks a lot.
>


External scan error: There are more external files (URLs) than primary segments that can read them (COptTasks.cpp:1756)

2016-09-20 Thread 来熊
Hi,all:
I am testing hawq 2.0.0 , and I find a problem like this:
 I load data from an external table (created using "like target_table" 
statement) ,
if the target table was distributed by some column(s), it raise this error:
 External scan error: There are more external files (URLs) than primary 
segments that can read them (COptTasks.cpp:1756)
if the target table was distributed randomly, it works well,
I don't set any parameter special,does anybody know how to resolve this problem?
thanks a lot.