Hello,
I am working on a proof of concept to optimize the performance of the join
queries executed through drill. The underlying storage is a NO-SQL based
database - Mongo DB. The time it takes to return the result of the join query
is very high (46 seconds). Upon further analysis, as per the physical plan of
the query it is observed that both the left side (15 lakh records) and right
side table (13 lakh) are fully scanned and it takes 24 seconds and 20 seconds
respectively for the same.
Here is the query -
select ta.[SOME_COLUMN] from mongo.Test.TABLEA ta INNER JOIN mongo.Test.TABLEB
ta ON ta.Id = tb.Id and ta.Id ='123'
Records in table A : 15 Lakhs
Records in table B : 13 Lakhs
Filter condition : Id is indexed field in both the tables (ascending)
Drill Plan shows Hash-join being performed as shown in the plan (as attached
above)
I have the following questions -
1) Why does drill fetch all the records into the memory even though filter
condition on the indexed column was provided for one of the table ?
2) If the Drill planner/optimizer is intelligent then records could have
been filtered on the second table as well based on the join condition (to
reduce the dataset and hence result in faster execution time)
3) Are there any performance tuning options available to bring down the
query response time and generate a better query plan ?
I would appreciate if you can provide me pointers/directions or answers to the
above questions.
Please feel free if you need any additional information.
Thanks,
Mridul
***********************************************************************************
The Royal Bank of Scotland plc. Registered in Scotland No 90312.
Registered Office: 36 St Andrew Square, Edinburgh EH2 2YB.
Authorised by the Prudential Regulation Authority and regulated
by the Financial Conduct Authority and Prudential Regulation Authority.
The Royal Bank of Scotland N.V. is authorised and regulated by the
De Nederlandsche Bank and has its seat at Amsterdam, the
Netherlands, and is registered in the Commercial Register under
number 33002587. Registered Office: Gustav Mahlerlaan 350,
Amsterdam, The Netherlands. The Royal Bank of Scotland N.V. and
The Royal Bank of Scotland plc are authorised to act as agent for each
other in certain jurisdictions.
This e-mail message is confidential and for use by the addressee only.
If the message is received by anyone other than the addressee, please
return the message to the sender by replying to it and then delete the
message from your computer. Internet e-mails are not necessarily
secure. The Royal Bank of Scotland plc and The Royal Bank of Scotland
N.V. including its affiliates ("RBS group") does not accept responsibility
for changes made to this message after it was sent. For the protection
of RBS group and its clients and customers, and in compliance with
regulatory requirements, the contents of both incoming and outgoing
e-mail communications, which could include proprietary information and
Non-Public Personal Information, may be read by authorised persons
within RBS group other than the intended recipient(s).
Whilst all reasonable care has been taken to avoid the transmission of
viruses, it is the responsibility of the recipient to ensure that the onward
transmission, opening or use of this message and any attachments will
not adversely affect its systems or data. No responsibility is accepted
by the RBS group in this regard and the recipient should carry out such
virus and other checks as it considers appropriate.
Visit our website at www.rbs.com
***********************************************************************************