Hi,

1.  What version of Drill are you running your query on ?
2.  How were your parquet files created ? (were they created on Drill or
another system, like Hive etc.)
3.  Can you please share the parquet schema of the parquet file being
queried ?

Thanks,
Khurram

On Wed, Aug 1, 2018 at 3:24 AM, Peter Edike <
peter.ed...@interswitchgroup.com> wrote:

> Hello everyone,
>
> I am currently using apache drill to query a directory of parquet files
> (total size 64gb with each file averaging 6.4Gb per size).  I am running
> the following query
>
> select * from dfs.rj.vw_reports_201805 where unique_key in
>     (
>          select unique_key from dfs.rj.vw_reports_201805 group by
> unique_key having count(unique_key) > 1
>     )
> limit 40
>
> Now this query takes about 42 minutes after which it fails because one of
> the drillbits in the drillbit cluster crashes. I check the logs and I see
> several errors summarized as follows
>
> 20:35:18.006 [Curator-Framework-0] ERROR org.apache.curator.ConnectionState
> - Connection timed out for connection string (172.35.15.129:5181,
> 172.35.15.151:5181,172.35.15.152:5181) and timeout (5000) / elapsed
> (13347)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
> ConnectionLoss
>         at 
> org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197)
> [curator-client-2.7.1.jar:na]
>         at 
> org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
> [curator-client-2.7.1.jar:na]
>         at org.apache.curator.CuratorZookeeperClient.getZooKeeper(
> CuratorZookeeperClient.java:115) [curator-client-2.7.1.jar:na]
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.
> performBackgroundOperation(CuratorFrameworkImpl.java:806)
> [curator-framework-2.7.1.jar:na]
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.
> backgroundOperationsLoop(CuratorFrameworkImpl.java:792)
> [curator-framework-2.7.1.jar:na]
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.
> access$300(CuratorFrameworkImpl.java:62) [curator-framework-2.7.1.jar:na]
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.
> call(CuratorFrameworkImpl.java:257) [curator-framework-2.7.1.jar:na]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [na:1.8.0_171]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [na:1.8.0_171]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [na:1.8.0_171]
>         at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
> AND
>
>
> Aug 1, 2018 7:54:18 AM WARNING: org.apache.parquet.CorruptStatistics:
> Ignoring statistics because created_by could not be parsed (see
> PARQUET-251): parquet-mr (build 6aa21f8776625b5fa6b18059cfebe7549f2e00cb)
>
> org.apache.parquet.VersionParser$VersionParseException: Could not parse
> created_by: parquet-mr (build 6aa21f8776625b5fa6b18059cfebe7549f2e00cb)
> using format: (.+) version ((.*) )?\(build ?(.*)\)
>
>         at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
>
>         at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(
> CorruptStatistics.java:66)
>
>         at org.apache.parquet.format.converter.ParquetMetadataConverter.
> fromParquetStatistics(ParquetMetadataConverter.java:264)
>
>         at org.apache.parquet.format.converter.ParquetMetadataConverter.
> fromParquetMetadata(ParquetMetadataConverter.java:568)
>
>         at org.apache.parquet.format.converter.ParquetMetadataConverter.
> readParquetMetadata(ParquetMetadataConverter.java:545)
>
>         at org.apache.parquet.hadoop.ParquetFileReader.readFooter(
> ParquetFileReader.java:455)
>
>         at org.apache.parquet.hadoop.ParquetFileReader.readFooter(
> ParquetFileReader.java:404)
>
>         at org.apache.parquet.hadoop.ParquetFileReader.readFooter(
> ParquetFileReader.java:390)
>
>         at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.
> getBatch(ParquetScanBatchCreator.java:117)
>
>         at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.
> getBatch(ParquetScanBatchCreator.java:53)
>
>         at org.apache.drill.exec.physical.impl.ImplCreator$2.
> run(ImplCreator.java:146)
>
>         at org.apache.drill.exec.physical.impl.ImplCreator$2.
> run(ImplCreator.java:142)
>
>         at java.security.AccessController.doPrivileged(Native Method)
>
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>
>         at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1633)
>
>         at org.apache.drill.exec.physical.impl.ImplCreator.
> getRecordBatch(ImplCreator.java:142)
>
>         at org.apache.drill.exec.physical.impl.ImplCreator.
> getChildren(ImplCreator.java:182)
>
>         at org.apache.drill.exec.physical.impl.ImplCreator.
> getRootExec(ImplCreator.java:110)
>
>         at org.apache.drill.exec.physical.impl.ImplCreator.
> getExec(ImplCreator.java:87)
>
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(
> FragmentExecutor.java:206)
>
>         at org.apache.drill.common.SelfCleaningRunnable.run(
> SelfCleaningRunnable.java:38)
>
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1149)
>
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:624)
>
>         at java.lang.Thread.run(Thread.java:748)
>
> The Schema for the data I am querying in question is as follows
>
> index_no
>
> ANY
>
> YES
>
> bank_code
>
> ANY
>
> YES
>
> trxn_category
>
> ANY
>
> YES
>
> Debit_Account_Type
>
> ANY
>
> YES
>
> Credit_Account_Type
>
> ANY
>
> YES
>
> trxn_amount
>
> ANY
>
> YES
>
> trxn_fee
>
> ANY
>
> YES
>
> trxn_date
>
> ANY
>
> YES
>
> Currency
>
> ANY
>
> YES
>
> Late_reversal
>
> ANY
>
> YES
>
> card_type
>
> ANY
>
> YES
>
> terminal_type
>
> ANY
>
> YES
>
> Acquirer
>
> ANY
>
> YES
>
> Issuer
>
> ANY
>
> YES
>
> Value_SettleAmount
>
> ANY
>
> YES
>
> Value_RequestedAmount
>
> ANY
>
> YES
>
> Volume
>
> ANY
>
> YES
>
> unique_key
>
> ANY
>
> YES
>
> source_node_name
>
> ANY
>
> YES
>
> business_date
>
> ANY
>
> YES
>
> PT_datetime_tran_local
>
> ANY
>
> YES
>
> PTC_pan
>
> ANY
>
> YES
>
> PTC_terminal_id
>
> ANY
>
> YES
>
> PTC_card_acceptor_id_code
>
> ANY
>
> YES
>
> PTC_card_acceptor_name_loc
>
> ANY
>
> YES
>
> PT_system_trace_audit_nr
>
> ANY
>
> YES
>
> PT_message_type
>
> ANY
>
> YES
>
> PT_from_account_id
>
> ANY
>
> YES
>
> PT_to_account_id
>
> ANY
>
> YES
>
> PT_tran_type
>
> ANY
>
> YES
>
> PT_rsp_code_req
>
> ANY
>
> YES
>
> PT_rsp_code_rsp
>
> ANY
>
> YES
>
> PT_settle_amount_req
>
> ANY
>
> YES
>
> PT_settle_amount_rsp
>
> ANY
>
> YES
>
> PTC_merchant_type
>
> ANY
>
> YES
>
> PT_settle_amount_impact
>
> ANY
>
> YES
>
> PT_settle_tran_fee_req
>
> ANY
>
> YES
>
> PT_settle_tran_fee_rsp
>
> ANY
>
> YES
>
> PT_auth_id_rsp
>
> ANY
>
> YES
>
> PT_retrieval_reference_nr
>
> ANY
>
> YES
>
> PTC_totals_group
>
> ANY
>
> YES
>
> PTC_card_product
>
> ANY
>
> YES
>
> PT_tran_currency_code
>
> ANY
>
> YES
>
> PT_payee
>
> ANY
>
> YES
>
> PT_tran_amount_req
>
> ANY
>
> YES
>
> PT_tran_amount_rsp
>
> ANY
>
> YES
>
> PT_extended_tran_type
>
> ANY
>
> YES
>
> PT_tran_postilion_originated
>
> ANY
>
> YES
>
> PT_tran_completed
>
> ANY
>
> YES
>
> PT_tran_nr
>
> ANY
>
> YES
>
> PT_retention_data
>
> ANY
>
> YES
>
> PT_acquiring_inst_id_code
>
> ANY
>
> YES
>
> PT_message_reason_code
>
> ANY
>
> YES
>
> PT_sponsor_bank
>
> ANY
>
> YES
>
> PT_datetime_tran_gmt
>
> ANY
>
> YES
>
> PT_datetime_req
>
> ANY
>
> YES
>
> PT_datetime_rsp
>
> ANY
>
> YES
>
> PT_realtime_business_date
>
> ANY
>
> YES
>
> PT_recon_business_date
>
> ANY
>
> YES
>
> PT_from_account_type
>
> ANY
>
> YES
>
> PT_to_account_type
>
> ANY
>
> YES
>
> PT_tran_cash_req
>
> ANY
>
> YES
>
> PT_tran_cash_rsp
>
> ANY
>
> YES
>
> PT_tran_tran_fee_req
>
> ANY
>
> YES
>
> PT_tran_tran_fee_rsp
>
> ANY
>
> YES
>
> PT_tran_tran_fee_currency_code
>
> ANY
>
> YES
>
> PT_tran_proc_fee_req
>
> ANY
>
> YES
>
> PT_tran_proc_fee_rsp
>
> ANY
>
> YES
>
> PT_tran_proc_fee_currency_code
>
> ANY
>
> YES
>
> PT_settle_currency_code
>
> ANY
>
> YES
>
> PT_pos_entry_mode
>
> ANY
>
> YES
>
> PT_pos_condition_code
>
> ANY
>
> YES
>
> PT_tran_reversed
>
> ANY
>
> YES
>
> PT_card_verification_result
>
> ANY
>
> YES
>
> PT_online_system_id
>
> ANY
>
> YES
>
> PT_participant_id
>
> ANY
>
> YES
>
> PT_receiving_inst_id_code
>
> ANY
>
> YES
>
> PT_routing_type
>
> ANY
>
> YES
>
> PT_pt_pos_card_input_mode
>
> ANY
>
> YES
>
> PT_source_node_key
>
> ANY
>
> YES
>
> PT_proc_online_system_id
>
> ANY
>
> YES
>
> PTC_post_tran_cust_id
>
> ANY
>
> YES
>
> PTC_source_node_name
>
> ANY
>
> YES
>
> PTC_card_seq_nr
>
> ANY
>
> YES
>
> PTC_expiry_date
>
> ANY
>
> YES
>
> PTC_service_restriction_code
>
> ANY
>
> YES
>
> PTC_terminal_owner
>
> ANY
>
> YES
>
> PTC_mapped_card_acceptor_id_code
>
> ANY
>
> YES
>
> PTC_pos_terminal_type
>
> ANY
>
> YES
>
> PTC_pan_encrypted
>
> ANY
>
> YES
>
> PTSP_Account_Nr
>
> ANY
>
> YES
>
> PTSP_Code
>
> ANY
>
> YES
>
> account_PTSP_Code
>
> ANY
>
> YES
>
> PTSP_Name
>
> ANY
>
> YES
>
> Sort_Code
>
> ANY
>
> YES
>
> rdm_amt
>
> ANY
>
> YES
>
> Reward_Code
>
> ANY
>
> YES
>
> Reward_discount
>
> ANY
>
> YES
>
> ptsp_terminal_id
>
> ANY
>
> YES
>
> reward_terminal_id
>
> ANY
>
> YES
>
>
> txn_id
>
> ANY
>
> YES
>
> web_category_code
>
> ANY
>
> YES
>
> web_category_name
>
> ANY
>
> YES
>
> web_fee_type
>
> ANY
>
> YES
>
> web_merchant_disc
>
> ANY
>
> YES
>
> web_fee_cap
>
> ANY
>
> YES
>
> Account_Name
>
> ANY
>
> YES
>
> account_nr
>
> ANY
>
> YES
>
> Acquiring_bank
>
> ANY
>
> YES
>
> journal_amount
>
> ANY
>
> YES
>
> xls_amount
>
> ANY
>
> YES
>
> merch_cat_amount_cap
>
> ANY
>
> YES
>
> merch_cat_visa_amount_cap
>
> ANY
>
> YES
>
> reward_amount_cap
>
> ANY
>
> YES
>
> Amount_config_state
>
> ANY
>
> YES
>
> Amount_description
>
> ANY
>
> YES
>
> Authorized_Person
>
> ANY
>
> YES
>
> ACC_BANK_CODE
>
> ANY
>
> YES
>
> merch_cat_bearer
>
> ANY
>
> YES
>
> merch_cat_visa_bearer
>
> ANY
>
> YES
>
> merch_cat_visa_category_code
>
> ANY
>
> YES
>
> merch_cat_category_code
>
> ANY
>
> YES
>
> merch_cat_visa_category_name
>
> ANY
>
> YES
>
> merch_cat_category_name
>
> ANY
>
> YES
>
> credit_acc_id
>
> ANY
>
> YES
>
> credit_acc_nr_id
>
> ANY
>
> YES
>
> credit_cardholder_acc_id
>
> ANY
>
> YES
>
> credit_cardholder_acc_type
>
> ANY
>
> YES
>
> CreditAccNr_acc_id
>
> ANY
>
> YES
>
> CreditAccNr_acc_nr
>
> ANY
>
> YES
>
> CreditAccNr_acc_nr_id
>
> ANY
>
> YES
>
> CreditAccNr_se_id
>
> ANY
>
> YES
>
> CreditAccNr_state
>
> ANY
>
> YES
>
> Date_Modified
>
> ANY
>
> YES
>
> debit_acc_id
>
> ANY
>
> YES
>
> debit_acc_nr_id
>
> ANY
>
> YES
>
> debit_cardholder_acc_id
>
> ANY
>
> YES
>
> debit_cardholder_acc_type
>
> ANY
>
> YES
>
> DebitAccNr_acc_id
>
> ANY
>
> YES
>
> DebitAccNr_acc_nr
>
> ANY
>
> YES
>
> DebitAccNr_acc_nr_id
>
> ANY
>
> YES
>
> extended_trans_type
>
> ANY
>
> YES
>
> fee
>
> ANY
>
> YES
>
> Fee_amount_id
>
> ANY
>
> YES
>
> merch_cat_fee_cap
>
> ANY
>
> YES
>
> merch_cat_visa_fee_cap
>
> ANY
>
> YES
>
> reward_fee_cap
>
> ANY
>
> YES
>
> Fee_description
>
> ANY
>
> YES
>
> Fee_Discount
>
> ANY
>
> YES
>
> Fee_fee_id
>
> ANY
>
> YES
>
> fee_id
>
> ANY
>
> YES
>
> Fee_name
>
> ANY
>
> YES
>
> Fee_se_id
>
> ANY
>
> YES
>
> merch_cat_category_fee_type
>
> ANY
>
> YES
>
> merch_cat_category_visa_fee_type
>
> ANY
>
> YES
>
> merch_cat_category_merch_discount
>
> ANY
>
> YES
>
> merch_cat_category_visa_merch_discount
>
> ANY
>
> YES
>
> PT_sink_node_name
>
> ANY
>
> YES
>
> post_tran_id
>
> ANY
>
> YES
>
> Miscellaneous
>
> ANY
>
> YES
>
> terminal_owner
>
> ANY
>
> YES
>
> terminal_owner_name
>
> ANY
>
> YES
>
> REF_rpt_account_type
>
> ANY
>
> YES
>
> REF_IsPoolAccount
>
> ANY
>
> YES
>
> From_Account_Type
>
> ANY
>
> YES
>
> To_Account_Type
>
> ANY
>
> YES
>
> Tran_type_description
>
> ANY
>
> YES
>
> Region
>
> ANY
>
> YES
>
> Transaction_Status
>
> ANY
>
> YES
>
> Transaction_type_Impact
>
> ANY
>
> YES
>
> Message_Type_Desc
>
> ANY
>
> YES
>
> Response_Code_description
>
> ANY
>
> YES
>
> REF_BatchId
>
> ANY
>
> YES
>
> Beneficiary_Account
>
> ANY
>
> YES
>
> Rate
>
> ANY
>
> YES
>
> final_fee
>
> ANY
>
> YES
>
> final_fee_type
>
> ANY
>
> YES
>
> settlement_date
>
> ANY
>
> YES
>
>
> Please what can I do to improve performance of this query
>
>

Reply via email to