Re: Fwd: Creating a Tableau extracts with Drill 1.12 uses unlimited memory

Francis McGregor-Macdonald Sun, 28 Jan 2018 14:49:24 -0800

Hi all,

A physical plan attached ... all memory appears to be 0.0 which seems odd?


Thanks

On Sun, Jan 28, 2018 at 10:37 PM, Francis McGregor-Macdonald <
[email protected]> wrote:

> And with logs as attachments.
>
> On Sun, Jan 28, 2018 at 9:40 PM, Francis McGregor-Macdonald <
> [email protected]> wrote:
>
>> Thanks Paul and Kunal,
>> I think I have the right information now. With Paul's changes (and fixing
>> up a zoo.cfg error) it isn't crashing, rather failing. Logs attached, still
>> blowing past memory limits. It does the same thing when re-running the
>> query from the web console so presumably its not actually Tableau related
>> despite me first generating it that way.
>>
>> Thanks.
>>
>> On Sat, Jan 27, 2018 at 1:15 PM, Francis McGregor-Macdonald <
>> [email protected]> wrote:
>>
>>> Thanks Paul,
>>>
>>> I will update with your suggested memory allocations also and retry.
>>>
>>> Zookeeper crashed too which might explain more? I have attached the logs
>>> from Zookeeper too.
>>>
>>> Thanks
>>>
>>> On Sat, Jan 27, 2018 at 6:45 AM, Paul Rogers <[email protected]> wrote:
>>>
>>>> Hi Francis,
>>>>
>>>> Thanks much for the log. The log shows running a query, then
>>>> immediately shows entries that occur when starting Drill. I'm guessing that
>>>> Drill literally crashed at this point? This is more severe than the usual
>>>> error in which a query exhausts memory.
>>>>
>>>> Some general observations. The Drill memory is 60 GB, but system memory
>>>> is 61 GB. Perhaps try dropping total Drill memory some to give the OS and
>>>> other tasks more headroom. For a SELECT * memory, Drill needs far less than
>>>> what you have, so maybe try giving Drill 48 GB total.
>>>>
>>>> Then, Drill needs direct memory much more than heap. So, maybe give
>>>> Drill 39 GB direct, 8 GB heap and 1 GB (the default) for code cache. These
>>>> settings are in drill-env.sh.
>>>>
>>>> Kunal, you have more experience with these issues. Can you make
>>>> additional suggestions by looking at the log?
>>>>
>>>> Thanks,
>>>>
>>>> - Paul
>>>>
>>>>
>>>>
>>>> On Thursday, January 25, 2018, 10:20:29 PM PST, Francis
>>>> McGregor-Macdonald <[email protected]> wrote:
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> I am guessing that each of your EMR nodes are quite large? EMR nodes
>>>> are: r4.2xlarge ('vcpu': 8, 'memory': 61)
>>>>
>>>> Property "planner.width.max_per_node" is set to = 6
>>>>
>>>> What is the system memory and what are the allocations for heap and
>>>> direct?
>>>> System Memory: 61GB (EMR nodes above)
>>>> drill_mem_heap: 12G
>>>> drill_mem_max: 48G
>>>>
>>>> The view is simple: SELECT * FROM s3://myparquet.parquet (14GB)
>>>>
>>>> planner.memory.max_query_memor y_per_node = 10479720202
>>>>
>>>> Drillbit.log attached (I think I have the correct selection included).
>>>>
>>>> Thanks
>>>>
>>>> On Fri, Jan 26, 2018 at 2:41 PM, Kunal Khatua <[email protected]> wrote:
>>>>
>>>> What is the system memory and what are the allocations for heap and
>>>> direct? The memory crash might be occurring due to insufficient heap. The
>>>> limits parameter applies to the direct memory and not Heap.
>>>>
>>>> Can you share details in the logs from the crash?
>>>>
>>>> -----Original Message-----
>>>> From: Timothy Farkas [mailto:[email protected]]
>>>> Sent: Thursday, January 25, 2018 2:58 PM
>>>> To: [email protected]
>>>> Subject: Re: Creating a Tableau extracts with Drill 1.12 uses unlimited
>>>> memory
>>>>
>>>> Hi Francis,
>>>>
>>>> I am guessing that each of your EMR nodes are quite large (32 or 64
>>>> vcpus). On large machines Drill's planner over parallelizes and over
>>>> allocates memory. There is a property "planner.width.max_per_node" which
>>>> limits the number of operators that can simultaneously execute on a
>>>> Drillbit for a query. If you configure the width per node to something like
>>>> 5 or 10 (you may have to play around with it) things should start working.
>>>>
>>>> Thanks,
>>>> Tim
>>>>
>>>> ______________________________ __
>>>> From: Francis McGregor-Macdonald <[email protected]>
>>>> Sent: Thursday, January 25, 2018 1:58:22 PM
>>>> To: [email protected]
>>>> Subject: Creating a Tableau extracts with Drill 1.12 uses unlimited
>>>> memory
>>>>
>>>> Creating a creating a Tableau (with 10.3, 10.5 desktop) extract from a
>>>> Drill (1.12 on EMR) cluster memory appears not to adhere to the limits set
>>>> by planner.memory.max_query_memor y_per_node.
>>>>
>>>> The extract query consumes all memory and then crashes drill.
>>>>
>>>> Running the same query as a create table memory behaves as expected.
>>>>
>>>> The query complexity is trivial:
>>>> select * from view only a single parquet with no calculated fields.
>>>>
>>>> Has anyone else observed this behavior?
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
>

00-00    Screen : rowType = RecordType(INTEGER Number of Records, ANY dat_fact, 
ANY dat_market, ANY dat_period, ANY dat_product, ANY dat_value, ANY dataset, 
ANY mar_ccy, ANY mar_country, ANY mar_long_desc, ANY mar_sequence, ANY 
mar_short_desc, ANY mar_tag, ANY per_date, ANY per_list, ANY per_long_desc, ANY 
per_sequence, ANY per_short_desc, ANY per_tag, ANY pro_aroma/fragancia/sabor, 
ANY pro_atributo, ANY pro_barcode, ANY pro_categoria, ANY pro_category, ANY 
pro_char08, ANY pro_char10, ANY pro_char16, ANY pro_char17, ANY pro_color, ANY 
pro_consistencia, ANY pro_empaque/envase, ANY pro_empaque, ANY 
pro_envasado/granel, ANY pro_envase, ANY pro_fabricante, ANY pro_formato, ANY 
pro_gasificacion, ANY pro_item, ANY pro_level, ANY pro_long_desc, ANY 
pro_marca, ANY pro_marcas, ANY pro_natural/sabor, ANY pro_rangos, ANY 
pro_reg/diet, ANY pro_regular/light, ANY pro_regular_/_light_-_diet, ANY 
pro_sabor, ANY pro_sal, ANY pro_segmento, ANY pro_segmento_1, ANY 
pro_segmentos, ANY pro_sequence, ANY pro_short_desc, ANY pro_submarca, ANY 
pro_submarcas, ANY pro_tag, ANY pro_tamano, ANY pro_tipo, ANY pro_tipo_ii, ANY 
pro_total_fam_e_ind, ANY pro_variedad, ANY pro_variedad_2, ANY publisher, ANY 
version): rowcount = 3.88031625E8, cumulative cost = {1.5909296625E9 rows, 
5.4712459125E9 cpu, 0.0 io, 1.0330953984E14 network, 0.0 memory}, id = 3009
00-01      ComplexToJson : rowType = RecordType(INTEGER Number of Records, ANY 
dat_fact, ANY dat_market, ANY dat_period, ANY dat_product, ANY dat_value, ANY 
dataset, ANY mar_ccy, ANY mar_country, ANY mar_long_desc, ANY mar_sequence, ANY 
mar_short_desc, ANY mar_tag, ANY per_date, ANY per_list, ANY per_long_desc, ANY 
per_sequence, ANY per_short_desc, ANY per_tag, ANY pro_aroma/fragancia/sabor, 
ANY pro_atributo, ANY pro_barcode, ANY pro_categoria, ANY pro_category, ANY 
pro_char08, ANY pro_char10, ANY pro_char16, ANY pro_char17, ANY pro_color, ANY 
pro_consistencia, ANY pro_empaque/envase, ANY pro_empaque, ANY 
pro_envasado/granel, ANY pro_envase, ANY pro_fabricante, ANY pro_formato, ANY 
pro_gasificacion, ANY pro_item, ANY pro_level, ANY pro_long_desc, ANY 
pro_marca, ANY pro_marcas, ANY pro_natural/sabor, ANY pro_rangos, ANY 
pro_reg/diet, ANY pro_regular/light, ANY pro_regular_/_light_-_diet, ANY 
pro_sabor, ANY pro_sal, ANY pro_segmento, ANY pro_segmento_1, ANY 
pro_segmentos, ANY pro_sequence, ANY pro_short_desc, ANY pro_submarca, ANY 
pro_submarcas, ANY pro_tag, ANY pro_tamano, ANY pro_tipo, ANY pro_tipo_ii, ANY 
pro_total_fam_e_ind, ANY pro_variedad, ANY pro_variedad_2, ANY publisher, ANY 
version): rowcount = 3.88031625E8, cumulative cost = {1.5521265E9 rows, 
5.43244275E9 cpu, 0.0 io, 1.0330953984E14 network, 0.0 memory}, id = 3008
00-02        Project(Number of Records=[$0], dat_fact=[$1], dat_market=[$2], 
dat_period=[$3], dat_product=[$4], dat_value=[$5], dataset=[$6], mar_ccy=[$7], 
mar_country=[$8], mar_long_desc=[$9], mar_sequence=[$10], mar_short_desc=[$11], 
mar_tag=[$12], per_date=[$13], per_list=[$14], per_long_desc=[$15], 
per_sequence=[$16], per_short_desc=[$17], per_tag=[$18], 
pro_aroma/fragancia/sabor=[$19], pro_atributo=[$20], pro_barcode=[$21], 
pro_categoria=[$22], pro_category=[$23], pro_char08=[$24], pro_char10=[$25], 
pro_char16=[$26], pro_char17=[$27], pro_color=[$28], pro_consistencia=[$29], 
pro_empaque/envase=[$30], pro_empaque=[$31], pro_envasado/granel=[$32], 
pro_envase=[$33], pro_fabricante=[$34], pro_formato=[$35], 
pro_gasificacion=[$36], pro_item=[$37], pro_level=[$38], pro_long_desc=[$39], 
pro_marca=[$40], pro_marcas=[$41], pro_natural/sabor=[$42], pro_rangos=[$43], 
pro_reg/diet=[$44], pro_regular/light=[$45], pro_regular_/_light_-_diet=[$46], 
pro_sabor=[$47], pro_sal=[$48], pro_segmento=[$49], pro_segmento_1=[$50], 
pro_segmentos=[$51], pro_sequence=[$52], pro_short_desc=[$53], 
pro_submarca=[$54], pro_submarcas=[$55], pro_tag=[$56], pro_tamano=[$57], 
pro_tipo=[$58], pro_tipo_ii=[$59], pro_total_fam_e_ind=[$60], 
pro_variedad=[$61], pro_variedad_2=[$62], publisher=[$63], version=[$64]) : 
rowType = RecordType(INTEGER Number of Records, ANY dat_fact, ANY dat_market, 
ANY dat_period, ANY dat_product, ANY dat_value, ANY dataset, ANY mar_ccy, ANY 
mar_country, ANY mar_long_desc, ANY mar_sequence, ANY mar_short_desc, ANY 
mar_tag, ANY per_date, ANY per_list, ANY per_long_desc, ANY per_sequence, ANY 
per_short_desc, ANY per_tag, ANY pro_aroma/fragancia/sabor, ANY pro_atributo, 
ANY pro_barcode, ANY pro_categoria, ANY pro_category, ANY pro_char08, ANY 
pro_char10, ANY pro_char16, ANY pro_char17, ANY pro_color, ANY 
pro_consistencia, ANY pro_empaque/envase, ANY pro_empaque, ANY 
pro_envasado/granel, ANY pro_envase, ANY pro_fabricante, ANY pro_formato, ANY 
pro_gasificacion, ANY pro_item, ANY pro_level, ANY pro_long_desc, ANY 
pro_marca, ANY pro_marcas, ANY pro_natural/sabor, ANY pro_rangos, ANY 
pro_reg/diet, ANY pro_regular/light, ANY pro_regular_/_light_-_diet, ANY 
pro_sabor, ANY pro_sal, ANY pro_segmento, ANY pro_segmento_1, ANY 
pro_segmentos, ANY pro_sequence, ANY pro_short_desc, ANY pro_submarca, ANY 
pro_submarcas, ANY pro_tag, ANY pro_tamano, ANY pro_tipo, ANY pro_tipo_ii, ANY 
pro_total_fam_e_ind, ANY pro_variedad, ANY pro_variedad_2, ANY publisher, ANY 
version): rowcount = 3.88031625E8, cumulative cost = {1.164094875E9 rows, 
5.044411125E9 cpu, 0.0 io, 1.0330953984E14 network, 0.0 memory}, id = 3007
00-03          UnionExchange : rowType = RecordType(INTEGER Number of Records, 
ANY dat_fact, ANY dat_market, ANY dat_period, ANY dat_product, ANY dat_value, 
ANY dataset, ANY mar_ccy, ANY mar_country, ANY mar_long_desc, ANY mar_sequence, 
ANY mar_short_desc, ANY mar_tag, ANY per_date, ANY per_list, ANY per_long_desc, 
ANY per_sequence, ANY per_short_desc, ANY per_tag, ANY 
pro_aroma/fragancia/sabor, ANY pro_atributo, ANY pro_barcode, ANY 
pro_categoria, ANY pro_category, ANY pro_char08, ANY pro_char10, ANY 
pro_char16, ANY pro_char17, ANY pro_color, ANY pro_consistencia, ANY 
pro_empaque/envase, ANY pro_empaque, ANY pro_envasado/granel, ANY pro_envase, 
ANY pro_fabricante, ANY pro_formato, ANY pro_gasificacion, ANY pro_item, ANY 
pro_level, ANY pro_long_desc, ANY pro_marca, ANY pro_marcas, ANY 
pro_natural/sabor, ANY pro_rangos, ANY pro_reg/diet, ANY pro_regular/light, ANY 
pro_regular_/_light_-_diet, ANY pro_sabor, ANY pro_sal, ANY pro_segmento, ANY 
pro_segmento_1, ANY pro_segmentos, ANY pro_sequence, ANY pro_short_desc, ANY 
pro_submarca, ANY pro_submarcas, ANY pro_tag, ANY pro_tamano, ANY pro_tipo, ANY 
pro_tipo_ii, ANY pro_total_fam_e_ind, ANY pro_variedad, ANY pro_variedad_2, ANY 
publisher, ANY version): rowcount = 3.88031625E8, cumulative cost = 
{1.164094875E9 rows, 5.044411125E9 cpu, 0.0 io, 1.0330953984E14 network, 0.0 
memory}, id = 3006
01-01            Project(Number of Records=[1], dat_fact=[ITEM($0, 
'dat_fact')], dat_market=[ITEM($0, 'dat_market')], dat_period=[ITEM($0, 
'dat_period')], dat_product=[ITEM($0, 'dat_product')], dat_value=[ITEM($0, 
'dat_value')], dataset=[ITEM($0, 'dataset')], mar_ccy=[ITEM($0, 'mar_ccy')], 
mar_country=[ITEM($0, 'mar_country')], mar_long_desc=[ITEM($0, 
'mar_long_desc')], mar_sequence=[ITEM($0, 'mar_sequence')], 
mar_short_desc=[ITEM($0, 'mar_short_desc')], mar_tag=[ITEM($0, 'mar_tag')], 
per_date=[ITEM($0, 'per_date')], per_list=[ITEM($0, 'per_list')], 
per_long_desc=[ITEM($0, 'per_long_desc')], per_sequence=[ITEM($0, 
'per_sequence')], per_short_desc=[ITEM($0, 'per_short_desc')], 
per_tag=[ITEM($0, 'per_tag')], pro_aroma/fragancia/sabor=[ITEM($0, 
'pro_aroma/fragancia/sabor')], pro_atributo=[ITEM($0, 'pro_atributo')], 
pro_barcode=[ITEM($0, 'pro_barcode')], pro_categoria=[ITEM($0, 
'pro_categoria')], pro_category=[ITEM($0, 'pro_category')], 
pro_char08=[ITEM($0, 'pro_char08')], pro_char10=[ITEM($0, 'pro_char10')], 
pro_char16=[ITEM($0, 'pro_char16')], pro_char17=[ITEM($0, 'pro_char17')], 
pro_color=[ITEM($0, 'pro_color')], pro_consistencia=[ITEM($0, 
'pro_consistencia')], pro_empaque/envase=[ITEM($0, 'pro_empaque/envase')], 
pro_empaque=[ITEM($0, 'pro_empaque')], pro_envasado/granel=[ITEM($0, 
'pro_envasado/granel')], pro_envase=[ITEM($0, 'pro_envase')], 
pro_fabricante=[ITEM($0, 'pro_fabricante')], pro_formato=[ITEM($0, 
'pro_formato')], pro_gasificacion=[ITEM($0, 'pro_gasificacion')], 
pro_item=[ITEM($0, 'pro_item')], pro_level=[ITEM($0, 'pro_level')], 
pro_long_desc=[ITEM($0, 'pro_long_desc')], pro_marca=[ITEM($0, 'pro_marca')], 
pro_marcas=[ITEM($0, 'pro_marcas')], pro_natural/sabor=[ITEM($0, 
'pro_natural/sabor')], pro_rangos=[ITEM($0, 'pro_rangos')], 
pro_reg/diet=[ITEM($0, 'pro_reg/diet')], pro_regular/light=[ITEM($0, 
'pro_regular/light')], pro_regular_/_light_-_diet=[ITEM($0, 
'pro_regular_/_light_-_diet')], pro_sabor=[ITEM($0, 'pro_sabor')], 
pro_sal=[ITEM($0, 'pro_sal')], pro_segmento=[ITEM($0, 'pro_segmento')], 
pro_segmento_1=[ITEM($0, 'pro_segmento_1')], pro_segmentos=[ITEM($0, 
'pro_segmentos')], pro_sequence=[ITEM($0, 'pro_sequence')], 
pro_short_desc=[ITEM($0, 'pro_short_desc')], pro_submarca=[ITEM($0, 
'pro_submarca')], pro_submarcas=[ITEM($0, 'pro_submarcas')], pro_tag=[ITEM($0, 
'pro_tag')], pro_tamano=[ITEM($0, 'pro_tamano')], pro_tipo=[ITEM($0, 
'pro_tipo')], pro_tipo_ii=[ITEM($0, 'pro_tipo_ii')], 
pro_total_fam_e_ind=[ITEM($0, 'pro_total_fam_e_ind')], pro_variedad=[ITEM($0, 
'pro_variedad')], pro_variedad_2=[ITEM($0, 'pro_variedad_2')], 
publisher=[ITEM($0, 'publisher')], version=[ITEM($0, 'version')]) : rowType = 
RecordType(INTEGER Number of Records, ANY dat_fact, ANY dat_market, ANY 
dat_period, ANY dat_product, ANY dat_value, ANY dataset, ANY mar_ccy, ANY 
mar_country, ANY mar_long_desc, ANY mar_sequence, ANY mar_short_desc, ANY 
mar_tag, ANY per_date, ANY per_list, ANY per_long_desc, ANY per_sequence, ANY 
per_short_desc, ANY per_tag, ANY pro_aroma/fragancia/sabor, ANY pro_atributo, 
ANY pro_barcode, ANY pro_categoria, ANY pro_category, ANY pro_char08, ANY 
pro_char10, ANY pro_char16, ANY pro_char17, ANY pro_color, ANY 
pro_consistencia, ANY pro_empaque/envase, ANY pro_empaque, ANY 
pro_envasado/granel, ANY pro_envase, ANY pro_fabricante, ANY pro_formato, ANY 
pro_gasificacion, ANY pro_item, ANY pro_level, ANY pro_long_desc, ANY 
pro_marca, ANY pro_marcas, ANY pro_natural/sabor, ANY pro_rangos, ANY 
pro_reg/diet, ANY pro_regular/light, ANY pro_regular_/_light_-_diet, ANY 
pro_sabor, ANY pro_sal, ANY pro_segmento, ANY pro_segmento_1, ANY 
pro_segmentos, ANY pro_sequence, ANY pro_short_desc, ANY pro_submarca, ANY 
pro_submarcas, ANY pro_tag, ANY pro_tamano, ANY pro_tipo, ANY pro_tipo_ii, ANY 
pro_total_fam_e_ind, ANY pro_variedad, ANY pro_variedad_2, ANY publisher, ANY 
version): rowcount = 3.88031625E8, cumulative cost = {7.7606325E8 rows, 
1.940158125E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 3005
01-02              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=s3a://mycustomer-data/tmp/nielsen_cl.parquet]], 
selectionRoot=s3a://mycustomer-data/tmp/nielsen_cl.parquet, numFiles=1, 
numRowGroups=41, usedMetadataFile=false, columns=[`*`]]]) : rowType = 
(DrillRecordRow[*]): rowcount = 3.88031625E8, cumulative cost = {3.88031625E8 
rows, 3.88031625E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 3004

Re: Fwd: Creating a Tableau extracts with Drill 1.12 uses unlimited memory

Reply via email to