[GitHub] drill pull request #997: DRILL-5582: C++ Client: [Threat Modeling] Drillbit ...
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/997#discussion_r145319769 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -518,6 +518,22 @@ bool DrillClientImpl::clientNeedsEncryption(const DrillUserProperties* userPrope return needsEncryption; } +/* + * Checks if the client has explicitly expressed interest in authenticated connections only. + * If the USERPROP_PASSWORD or USERPROP_AUTH_MECHANISM connection string properties are set, + * then it is implied that the client wants authentication. + */ +bool DrillClientImpl::clientNeedsAuthentication(const DrillUserProperties* userProperties) { +bool needsAuthentication = false; +if(!userProperties) { +return false; +} +needsAuthentication = userProperties->isPropSet(USERPROP_PASSWORD) || +userProperties->isPropSet(USERPROP_AUTH_MECHANISM); --- End diff -- I think we should also check that the `password & auth parameter` values are not empty strings. ---
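The strengthened check suggested above could look like the following standalone Java sketch (it uses `java.util.Properties` and hypothetical property names rather than the actual DrillUserProperties API), treating a property set to an empty string the same as one that is absent:

```java
import java.util.Properties;

public class AuthCheck {
    // Hypothetical helper: a property only counts as "set" if it is
    // present AND non-empty, per the review suggestion above.
    static boolean isPropSetAndNonEmpty(Properties props, String key) {
        String value = props.getProperty(key);
        return value != null && !value.isEmpty();
    }

    static boolean clientNeedsAuthentication(Properties props) {
        if (props == null) {
            return false;
        }
        // "password" and "auth" are placeholder names for the
        // USERPROP_PASSWORD / USERPROP_AUTH_MECHANISM properties.
        return isPropSetAndNonEmpty(props, "password")
                || isPropSetAndNonEmpty(props, "auth");
    }
}
```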
[GitHub] drill pull request #997: DRILL-5582: C++ Client: [Threat Modeling] Drillbit ...
Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/997#discussion_r145319760 --- Diff: contrib/native/client/src/clientlib/saslAuthenticatorImpl.cpp --- @@ -145,6 +145,8 @@ int SaslAuthenticatorImpl::init(const std::vector& mechanisms, exec authMechanismToUse = value; } } +// clientNeedsAuth cannot be false if the code above picks an authMechanism --- End diff -- clientNeedsAuth --> clientNeedsAuthentication ---
[GitHub] drill issue #999: DRILL-5881: Java Client: [Threat Modeling] Drillbit may be ...
Github user sohami commented on the issue: https://github.com/apache/drill/pull/999 @parthchandra - Please help review this PR. I have added new unit tests for the change and made sure all the existing tests are also passing. ---
[GitHub] drill pull request #997: DRILL-5582: C++ Client: [Threat Modeling] Drillbit ...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/997#discussion_r145317288 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -595,6 +611,12 @@ connectionStatus_t DrillClientImpl::validateHandshake(DrillUserProperties* prope switch(this->m_handshakeStatus) { case exec::user::SUCCESS: +// Check if client needs auth/encryption and server is not requiring it +if(clientNeedsAuthentication(properties) || clientNeedsEncryption(properties)) { --- End diff -- Generally, all error messages come from errmsgs.cpp so we can localize them when we need to. ---
[GitHub] drill pull request #997: DRILL-5582: C++ Client: [Threat Modeling] Drillbit ...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/997#discussion_r145317403 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -595,6 +611,12 @@ connectionStatus_t DrillClientImpl::validateHandshake(DrillUserProperties* prope switch(this->m_handshakeStatus) { case exec::user::SUCCESS: +// Check if client needs auth/encryption and server is not requiring it --- End diff -- Not too clear about the SASL flow, but I assume that the server returning SUCCESS is sufficient to conclude that no auth is required by the server. ---
[GitHub] drill pull request #999: DRILL-5881: Java Client: [Threat Modeling] Drillbit ...
GitHub user sohami opened a pull request: https://github.com/apache/drill/pull/999 DRILL-5881: Java Client: [Threat Modeling] Drillbit may be spoofed by … …an attacker and this may lead to data being written to the attacker's target instead of Drillbit You can merge this pull request into a Git repository by running: $ git pull https://github.com/sohami/drill DRILL-5881 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/999.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #999 ---
[GitHub] drill pull request #998: DRILL-5887: Display process user/groups info in Dri...
GitHub user prasadns14 opened a pull request: https://github.com/apache/drill/pull/998 DRILL-5887: Display process user/groups info in Drill UI Display process user and process user groups in Drill UI @paul-rogers please review You can merge this pull request into a Git repository by running: $ git pull https://github.com/prasadns14/drill DRILL-5887 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/998.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #998 commit 9f0719423980dfb5d825e4f03a2a450c915ada7c Author: Prasad Nagaraj Subramanya Date: 2017-10-18T00:49:11Z DRILL-5887: Display process user/groups info in Drill UI ---
[GitHub] drill issue #991: DRILL-5876: Remove netty-tcnative dependency from java-exe...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/991 @vrozov per your other tests, this is still broken for Eclipse. So it seems that the best bet is to comment out the dependency and the os extension. Developers needing to debug will need to uncomment the dependency. I will remove the additional commit. ---
[GitHub] drill pull request #991: DRILL-5876: Remove netty-tcnative dependency from j...
Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/991#discussion_r145291048 --- Diff: exec/java-exec/pom.xml --- @@ -22,7 +22,10 @@ 1.8-rev1 + --- End diff -- Please uncomment (should be harmless) or move to the openssl profile. ---
[GitHub] drill pull request #991: DRILL-5876: Remove netty-tcnative dependency from j...
Github user vrozov commented on a diff in the pull request: https://github.com/apache/drill/pull/991#discussion_r145289493 --- Diff: exec/java-exec/pom.xml --- @@ -693,6 +699,19 @@ + + openssl + + + io.netty + netty-tcnative + 2.0.1.Final --- End diff -- Please add provided scope. ---
[GitHub] drill pull request #991: DRILL-5876: Remove netty-tcnative dependency from j...
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/991#discussion_r145288182 --- Diff: exec/java-exec/pom.xml --- @@ -701,18 +707,21 @@ - -kr.motd.maven -os-maven-plugin -1.5.0.Final - - + --- End diff -- Updated the PR with the latest recommendations. Using a different profile seems to work well. ---
[jira] [Created] (DRILL-5887) Display process user/ groups in Drill UI
Prasad Nagaraj Subramanya created DRILL-5887: Summary: Display process user/groups in Drill UI Key: DRILL-5887 URL: https://issues.apache.org/jira/browse/DRILL-5887 Project: Apache Drill Issue Type: Bug Components: Client - HTTP Affects Versions: 1.11.0 Reporter: Prasad Nagaraj Subramanya Assignee: Prasad Nagaraj Subramanya Priority: Minor Fix For: 1.12.0 Drill UI only lists admin users/groups specified as options. We should also display the process user/groups, who have admin privilege. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] drill pull request #997: DRILL-5582: C++ Client: [Threat Modeling] Drillbit ...
GitHub user bitblender opened a pull request: https://github.com/apache/drill/pull/997 DRILL-5582: C++ Client: [Threat Modeling] Drillbit may be spoofed by … …an attacker and this may lead to data being written to the attacker's target instead of Drillbit You can merge this pull request into a Git repository by running: $ git pull https://github.com/bitblender/drill KM-DRILL-5582 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/997.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #997 commit 488ebefd4a2d096c9f02cbcdfd8c6984901b3444 Author: karthik Date: 2017-10-17T23:18:45Z DRILL-5582: C++ Client: [Threat Modeling] Drillbit may be spoofed by an attacker and this may lead to data being written to the attacker's target instead of Drillbit ---
[jira] [Resolved] (DRILL-5804) External Sort times out, may be infinite loop
[ https://issues.apache.org/jira/browse/DRILL-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou resolved DRILL-5804. --- Resolution: Fixed > External Sort times out, may be infinite loop > - > > Key: DRILL-5804 > URL: https://issues.apache.org/jira/browse/DRILL-5804 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.11.0 >Reporter: Robert Hou >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: drillbit.log > > > Query is: > {noformat} > ALTER SESSION SET `exec.sort.disable_managed` = false; > select count(*) from ( > select * from ( > select s1.type type, flatten(s1.rms.rptd) rptds, s1.rms, s1.uid > from ( > select d.type type, d.uid uid, flatten(d.map.rm) rms from > dfs.`/drill/testdata/resource-manager/nested_large` d order by d.uid > ) s1 > ) s2 > order by s2.rms.mapid, s2.rptds.a, s2.rptds.do_not_exist > ); > {noformat} > Plan is: > {noformat} > | 00-00Screen > 00-01 Project(EXPR$0=[$0]) > 00-02StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) > 00-03 UnionExchange > 01-01StreamAgg(group=[{}], EXPR$0=[COUNT()]) > 01-02 Project($f0=[0]) > 01-03SingleMergeExchange(sort0=[4 ASC], sort1=[5 ASC], > sort2=[6 ASC]) > 02-01 SelectionVectorRemover > 02-02Sort(sort0=[$4], sort1=[$5], sort2=[$6], dir0=[ASC], > dir1=[ASC], dir2=[ASC]) > 02-03 Project(type=[$0], rptds=[$1], rms=[$2], uid=[$3], > EXPR$4=[$4], EXPR$5=[$5], EXPR$6=[$6]) > 02-04HashToRandomExchange(dist0=[[$4]], dist1=[[$5]], > dist2=[[$6]]) > 03-01 UnorderedMuxExchange > 04-01Project(type=[$0], rptds=[$1], rms=[$2], > uid=[$3], EXPR$4=[$4], EXPR$5=[$5], EXPR$6=[$6], > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($6, hash32AsDouble($5, > hash32AsDouble($4, 1301011)))]) > 04-02 Project(type=[$0], rptds=[$1], rms=[$2], > uid=[$3], EXPR$4=[ITEM($2, 'mapid')], EXPR$5=[ITEM($1, 'a')], > EXPR$6=[ITEM($1, 'do_not_exist')]) > 04-03Flatten(flattenField=[$1]) > 04-04 Project(type=[$0], rptds=[ITEM($2, > 'rptd')], rms=[$2], uid=[$1]) > 
04-05SingleMergeExchange(sort0=[1 ASC]) > 05-01 SelectionVectorRemover > 05-02Sort(sort0=[$1], dir0=[ASC]) > 05-03 Project(type=[$0], uid=[$1], > rms=[$2]) > 05-04 > HashToRandomExchange(dist0=[[$1]]) > 06-01 UnorderedMuxExchange > 07-01Project(type=[$0], > uid=[$1], rms=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1, 1301011)]) > 07-02 > Flatten(flattenField=[$2]) > 07-03Project(type=[$0], > uid=[$1], rms=[ITEM($2, 'rm')]) > 07-04 > Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=maprfs:///drill/testdata/resource-manager/nested_large]], > selectionRoot=maprfs:/drill/testdata/resource-manager/nested_large, > numFiles=1, usedMetadataFile=false, columns=[`type`, `uid`, `map`.`rm`]]]) > {noformat} > Here is a segment of the drillbit.log, starting at line 55890: > {noformat} > 2017-09-19 04:22:56,258 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:2] DEBUG > o.a.d.e.t.g.SingleBatchSorterGen44 - Took 142 us to sort 1023 records > 2017-09-19 04:22:56,265 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:4] DEBUG > o.a.d.e.t.g.SingleBatchSorterGen44 - Took 105 us to sort 1023 records > 2017-09-19 04:22:56,268 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:3:0] DEBUG > o.a.d.e.p.i.p.PartitionSenderRootExec - Partitioner.next(): got next record > batch with status OK > 2017-09-19 04:22:56,275 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:7] DEBUG > o.a.d.e.t.g.SingleBatchSorterGen44 - Took 145 us to sort 1023 records > 2017-09-19 04:22:56,354 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:3:0] DEBUG > o.a.d.e.p.i.p.PartitionSenderRootExec - Partitioner.next(): got next record > batch with status OK > 2017-09-19 04:22:56,357 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:2] DEBUG > o.a.d.e.t.g.SingleBatchSorterGen44 - Took 143 us to sort 1023 records > 2017-09-19 04:22:56,361 [263f0252-fc60-7f8d-a1b1-c075876d1bd2:frag:2:0] DEBUG > o.a.d.exec.compile.ClassTransformer - Compiled and merged > PriorityQueueCopierGen50: bytecode size = 11.0 KiB, time = 124 ms. > 2017-09-19
[jira] [Created] (DRILL-5886) Operators should create batch sizes that the next operator can consume to avoid OOM
Robert Hou created DRILL-5886: - Summary: Operators should create batch sizes that the next operator can consume to avoid OOM Key: DRILL-5886 URL: https://issues.apache.org/jira/browse/DRILL-5886 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.11.0 Reporter: Robert Hou Attachments: 26478262-f0a7-8fc1-1887-4f27071b9c0f.sys.drill, drillbit.log.exchange Query is: {noformat} ALTER SESSION SET `exec.sort.disable_managed` = false alter session set `planner.memory.max_query_memory_per_node` = 482344960 alter session set `planner.width.max_per_node` = 1 alter session set `planner.width.max_per_query` = 1 alter session set `planner.disable_exchanges` = true select count(*) from (select * from dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50], columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[],columns[30],columns[2420],columns[1520], columns[1410], columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350], columns[],columns[153],columns[356],columns[84],columns[745],columns[1450],columns[103],columns[2065],columns[343],columns[3420],columns[530], columns[3210] ) d where d.col433 = 'sjka skjf'; {noformat} This is the error from drillbit.log: 2017-09-12 17:36:53,155 [26478262-f0a7-8fc1-1887-4f27071b9c0f:frag:0:0] ERROR o.a.d.e.p.i.x.m.ExternalSortBatch - Insufficient memory to merge two batches. 
Incoming batch size: 409305088, available memory: 482344960 Here is the plan: {noformat} | 00-00Screen 00-01 Project(EXPR$0=[$0]) 00-02StreamAgg(group=[{}], EXPR$0=[COUNT()]) 00-03 Project($f0=[0]) 00-04SelectionVectorRemover 00-05 Filter(condition=[=(ITEM($0, 'col433'), 'sjka skjf')]) 00-06Project(T8¦¦*=[$0]) 00-07 SelectionVectorRemover 00-08Sort(sort0=[$1], sort1=[$2], sort2=[$3], sort3=[$4], sort4=[$5], sort5=[$6], sort6=[$7], sort7=[$8], sort8=[$9], sort9=[$10], sort10=[$11], sort11=[$12], sort12=[$9], sort13=[$13], sort14=[$14], sort15=[$15], sort16=[$16], sort17=[$17], sort18=[$18], sort19=[$19], sort20=[$20], sort21=[$21], sort22=[$12], sort23=[$22], sort24=[$23], sort25=[$24], sort26=[$25], sort27=[$26], sort28=[$27], sort29=[$28], sort30=[$29], sort31=[$30], sort32=[$31], sort33=[$32], sort34=[$33], sort35=[$34], sort36=[$35], sort37=[$36], sort38=[$37], sort39=[$38], sort40=[$39], sort41=[$40], sort42=[$41], sort43=[$42], sort44=[$43], sort45=[$44], sort46=[$45], sort47=[$46], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC], dir4=[ASC], dir5=[ASC], dir6=[ASC], dir7=[ASC], dir8=[ASC], dir9=[ASC], dir10=[ASC], dir11=[ASC], dir12=[ASC], dir13=[ASC], dir14=[ASC], dir15=[ASC], dir16=[ASC], dir17=[ASC], dir18=[ASC], dir19=[ASC], dir20=[ASC], dir21=[ASC], dir22=[ASC], dir23=[ASC], dir24=[ASC], dir25=[ASC], dir26=[ASC], dir27=[ASC], dir28=[ASC], dir29=[ASC], dir30=[ASC], dir31=[ASC], dir32=[ASC], dir33=[ASC], dir34=[ASC], dir35=[ASC], dir36=[ASC], dir37=[ASC], dir38=[ASC], dir39=[ASC], dir40=[ASC], dir41=[ASC], dir42=[ASC], dir43=[ASC], dir44=[ASC], dir45=[ASC], dir46=[ASC], dir47=[ASC]) 00-09 Project(T8¦¦*=[$0], EXPR$1=[ITEM($1, 450)], EXPR$2=[ITEM($1, 330)], EXPR$3=[ITEM($1, 230)], EXPR$4=[ITEM($1, 220)], EXPR$5=[ITEM($1, 110)], EXPR$6=[ITEM($1, 90)], EXPR$7=[ITEM($1, 80)], EXPR$8=[ITEM($1, 70)], EXPR$9=[ITEM($1, 40)], EXPR$10=[ITEM($1, 10)], EXPR$11=[ITEM($1, 20)], EXPR$12=[ITEM($1, 30)], EXPR$13=[ITEM($1, 50)], EXPR$14=[ITEM($1, 454)], EXPR$15=[ITEM($1, 
413)], EXPR$16=[ITEM($1, 940)], EXPR$17=[ITEM($1, 834)], EXPR$18=[ITEM($1, 73)], EXPR$19=[ITEM($1, 140)], EXPR$20=[ITEM($1, 104)], EXPR$21=[ITEM($1, )], EXPR$22=[ITEM($1, 2420)], EXPR$23=[ITEM($1, 1520)], EXPR$24=[ITEM($1, 1410)], EXPR$25=[ITEM($1, 1110)], EXPR$26=[ITEM($1, 1290)], EXPR$27=[ITEM($1, 2380)], EXPR$28=[ITEM($1, 705)], EXPR$29=[ITEM($1, 45)], EXPR$30=[ITEM($1, 1054)], EXPR$31=[ITEM($1, 2430)], EXPR$32=[ITEM($1, 420)], EXPR$33=[ITEM($1, 404)], EXPR$34=[ITEM($1, 3350)], EXPR$35=[ITEM($1, )], EXPR$36=[ITEM($1, 153)], EXPR$37=[ITEM($1, 356)], EXPR$38=[ITEM($1, 84)], EXPR$39=[ITEM($1, 745)], EXPR$40=[ITEM($1, 1450)], EXPR$41=[ITEM($1, 103)], EXPR$42=[ITEM($1, 2065)], EXPR$43=[ITEM($1, 343)], EXPR$44=[ITEM($1, 3420)], EXPR$45=[ITEM($1, 530)], EXPR$46=[ITEM($1, 3210)]) 00-10Project(T8¦¦*=[$0], columns=[$1]) 00-11 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/resource-manager/3500cols.tbl, numFiles=1, columns=[`*`],
[jira] [Created] (DRILL-5885) Drill consumes 2x memory when sorting and reading a spilled batch from disk.
Robert Hou created DRILL-5885: - Summary: Drill consumes 2x memory when sorting and reading a spilled batch from disk. Key: DRILL-5885 URL: https://issues.apache.org/jira/browse/DRILL-5885 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.11.0 Reporter: Robert Hou The query is: {noformat} select count(*) from (select * from dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50], columns[454],columns[413],columns[940],columns[834],columns[73],columns[140],columns[104],columns[],columns[30],columns[2420],columns[1520], columns[1410], columns[1110],columns[1290],columns[2380],columns[705],columns[45],columns[1054],columns[2430],columns[420],columns[404],columns[3350], columns[],columns[153],columns[356],columns[84],columns[745],columns[1450],columns[103],columns[2065],columns[343],columns[3420],columns[530], columns[3210] ) d where d.col433 = 'sjka skjf'; {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] drill pull request #996: DRILL-5878: TableNotFound exception is being report...
Github user HanumathRao commented on a diff in the pull request: https://github.com/apache/drill/pull/996#discussion_r145268123 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java --- @@ -481,6 +485,19 @@ public RelOptTableImpl getTable(final List names) { .message("Temporary tables usage is disallowed. Used temporary table name: %s.", names) .build(logger); } + + // Check the schema and throw a valid SchemaNotFound exception instead of TableNotFound exception. --- End diff -- Thank you for the review. I agree that this should ideally be handled at the Calcite layer. I also think that even after Calcite provides this functionality, some customization may still be needed, since we understand the context better than Calcite does. Once Calcite fixes this issue, we can change the code accordingly. ---
Drill Questions (Developer)
Hi, I'm not exactly sure which mailing list I'm supposed to ask these developer questions on, so sorry for any inconvenience. I'm developing a web application that relies on Drill as its main search/querying functionality. I've gone through the documentation, but there are a couple of things that are still unclear to me when using Drill. If anyone on the core/developer team could address any of these questions, I would appreciate it. 1. From a terminal session I'm able to start Drill and execute queries on the CLI. One task I can do from the terminal is CREATE TEMPORARY TABLE name AS query; execute that, and right after the execution I'm able to query the temporary table as long as I keep the terminal session open. I would like to be able to do this from a REST client, so I was wondering if there is any way to chain SQL queries when making a request to POST http://localhost:8047/query.json? When I submit a query via the web console or the REST API, the temporary table gets created, but when I issue another request against the temporary table I just created, I'm not able to, because by that point the table has already been dropped. Is there a way to chain two queries using the REST API so that they execute one after another and return the last query's results? 2. I have streams of data being written to separate folders (folderA, folderB, folderC) in Parquet format. Each stream has common columns shared across all streams, but each also has unique columns that only apply to that particular stream. I know I'm able to query all streams by issuing a wildcard for the pattern of the directories, and the results will return with an extra column titled dir0 referencing the directory each record came from.
I'm wondering if there's a way to sort among the results that are returned, because in my trials I have not been able to sort when querying across different stream schemas; only when I query one schema at a time am I able to sort the results. Is there a way to construct my query that could help with this? 3. Do you have examples of constructing a histogram-like query against sample data by date? Thank you for your time. Best regards, -- Max Orelus +1 (202) 361-9946 maxore...@fastmail.com
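On question 1: a single submission to the REST endpoint named above is a POST of a small JSON body to http://localhost:8047/query.json. The sketch below only builds that body (the helper name is hypothetical), assuming, per the behavior described in the question, that each REST request runs independently on the server side:

```java
public class DrillRestQuery {
    // Builds the JSON body for POST http://localhost:8047/query.json.
    // Each REST request is handled independently, so session-scoped state
    // such as a temporary table created by one POST is not visible to the
    // next one -- which matches the behavior described in the question.
    static String queryPayload(String sql) {
        // Minimal JSON escaping: backslashes first, then double quotes.
        String escaped = sql.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"queryType\": \"SQL\", \"query\": \"" + escaped + "\"}";
    }
}
```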
[GitHub] drill pull request #996: DRILL-5878: TableNotFound exception is being report...
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/996#discussion_r145231215 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/SqlConverter.java --- @@ -481,6 +485,19 @@ public RelOptTableImpl getTable(final List names) { .message("Temporary tables usage is disallowed. Used temporary table name: %s.", names) .build(logger); } + + // Check the schema and throw a valid SchemaNotFound exception instead of TableNotFound exception. --- End diff -- Does it mean that Calcite, instead of returning a schema-not-found exception, returns a table-not-found exception? Per my understanding this PR customizes Drill, but what if we go a different path and enhance Calcite (or maybe this is already done in newer Calcite versions)? ---
RE: log flooded by "date values definitively CORRECT"
Ouch! Looks like a logger was left behind in DEBUG mode. Can you manually turn that off? More memory would help in this case, because it seems that the foreman node is the one running out of heap space as it goes through the metadata for all the files. Is there a reason you are generating so many files to query? There is most likely a lower threshold for a parquet file size, below which you might be better off just using something like a CSV format. -Original Message- From: François Méthot [mailto:fmetho...@gmail.com] Sent: Tuesday, October 17, 2017 10:35 AM To: dev@drill.apache.org Subject: log flooded by "date values definitively CORRECT" Hi again, I am running into an issue on a query done on 760 000 parquet files stored in HDFS. We are using Drill 1.10, 8GB heap, 20GB direct mem. Drill runs with debug log enabled all the time. The query is standard select on 8 fields from hdfs.`/path` where this = that For about an hour I see this message on the foreman: [pool-9-thread-##] DEBUG o.a.d.exec.store.parquet.Metadata - It is determined from metadata that the date values are definitely CORRECT Then [some UUID:foreman] INFO o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata : Executed 761659 out of 761659 using 16 threads. Time : 3022416ms Then : Java.lang.OutOfMemoryError: Java Heap Space at java.util.Arrays.copyOf ... at java.io.PrintWriter.println(PrintWriter.java:757) at org.apache.calcite.rel.externalize.RelWriterImplt.explain (RelWriterImpl.java:118) at org.apachje.calcite.rel.externalize.RelWriterImpl.done (RelWriterImpl.java:160) ... at org.apache.calcite.plan.RelOptUtil.toString (RelOptUtil.java:1927) at org.apache.drill.exec.planner.sql.handlers.DefaultSQLHandler.log(DefaultSQLHandler.java:138) ... at org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.getPlan(CreateTableHandler:102) at org.apache.drill.exec.planner.DrillSqlWorker.getQueryPlan(DrillSqlWorker:131) ... 
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1050) at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) I think it might be caused by having too much files to query, chunking our select into smaller piece actually helped. Also suspect that the DEBUG logging is taxing the poor node a bit much. Do you think adding more memory would address the issue (I can't try this right now) or you would think it is caused by a bug? Thank in advance for any advises, Francois
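To manually turn that logger off, as suggested above, an override in Drill's logback configuration would look roughly like this (a sketch; the logger name is taken from the log lines quoted above, and conf/logback.xml as the location is an assumption about the installation):

```xml
<!-- In conf/logback.xml: raise the parquet metadata logger above DEBUG
     so the "date values ... definitely CORRECT" messages are suppressed -->
<logger name="org.apache.drill.exec.store.parquet.Metadata" level="INFO"/>
```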
[GitHub] drill issue #970: DRILL-5832: Migrate OperatorFixture to use SystemOptionMan...
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/970 There is a funny thing about the way Drill works. I can review your changes and commit them as soon as I provide a +1. My changes must wait for another committer to find time in their very busy schedule to consider this work. So, we'll likely commit yours first, I'll rebase mine on top of it, then wait for another committer to find time to consider it. The one exception would be if a non-committer can give this PR a +1 and a committer agrees to do a bulk commit this week. ---
log flooded by "date values definitively CORRECT"
Hi again, I am running into an issue on a query done on 760,000 Parquet files stored in HDFS. We are using Drill 1.10, 8GB heap, 20GB direct memory. Drill runs with debug logging enabled all the time. The query is a standard select on 8 fields from hdfs.`/path` where this = that. For about an hour I see this message on the foreman: [pool-9-thread-##] DEBUG o.a.d.exec.store.parquet.Metadata - It is determined from metadata that the date values are definitely CORRECT Then [some UUID:foreman] INFO o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata : Executed 761659 out of 761659 using 16 threads. Time : 3022416ms Then : java.lang.OutOfMemoryError: Java Heap Space at java.util.Arrays.copyOf ... at java.io.PrintWriter.println(PrintWriter.java:757) at org.apache.calcite.rel.externalize.RelWriterImpl.explain (RelWriterImpl.java:118) at org.apache.calcite.rel.externalize.RelWriterImpl.done (RelWriterImpl.java:160) ... at org.apache.calcite.plan.RelOptUtil.toString (RelOptUtil.java:1927) at org.apache.drill.exec.planner.sql.handlers.DefaultSQLHandler.log(DefaultSQLHandler.java:138) ... at org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.getPlan(CreateTableHandler:102) at org.apache.drill.exec.planner.DrillSqlWorker.getQueryPlan(DrillSqlWorker:131) ... at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1050) at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) I think it might be caused by having too many files to query; chunking our select into smaller pieces actually helped. I also suspect that the DEBUG logging is taxing the poor node a bit much. Do you think adding more memory would address the issue (I can't try this right now), or would you say it is caused by a bug? Thanks in advance for any advice, Francois
[GitHub] drill issue #970: DRILL-5832: Migrate OperatorFixture to use SystemOptionMan...
Github user ilooner commented on the issue: https://github.com/apache/drill/pull/970 @paul-rogers Some of the changes I am making on top of https://github.com/apache/drill/pull/978/ as part of DRILL-5730 will likely conflict with this change. When do you think this could make it in? It would be helpful to have it merged sooner to avoid more conflicts down the line :) . ---
[GitHub] drill issue #936: DRILL-5772: Add unit tests to indicate how utf-8 support c...
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/936 @paul-rogers agree with you that charsets used in saffron properties should be defaulted in Drill to `UTF-8`, since Drill can read UTF-8 data and it's strange that it would fail by default when Calcite attempts to parse a string into a literal used in a query. I have looked into the Calcite code and there is no option to hard-code charset values for Calcite, but the charset can be changed using properties. There are two options for setting saffron properties: 1. as system properties; 2. using a `saffron.properties` file. I don't really like passing them as `-D` when starting the drillbit (since there are at least two), so I am more inclined to use the `saffron.properties` file. Unfortunately, in the Calcite code the `saffron.properties` location is expected to be the working folder [1], i.e. the place where the java process was started. I have created a Jira and pull request in Calcite to allow `saffron.properties` to be present in the classpath since it's more convenient [2]. I'll keep you updated on Calcite community feedback. [1] https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/util/SaffronProperties.java#L113 [2] https://issues.apache.org/jira/browse/CALCITE-2014 ---
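For reference, the file-based option might look like the sketch below (key names as defined in Calcite's SaffronProperties, linked as [1] above; the exact values Drill should adopt are still under discussion):

```properties
# Hypothetical saffron.properties, placed in the working folder of the
# Drillbit process (the location Calcite currently reads it from)
saffron.default.charset=UTF-8
saffron.default.nationalcharset=UTF-8
saffron.default.collation.name=UTF-8$en_US
```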
[GitHub] drill pull request #971: Drill-5834 Add Networking Functions
Github user arina-ielchiieva commented on a diff in the pull request: https://github.com/apache/drill/pull/971#discussion_r145080845 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/NetworkFunctions.java --- @@ -0,0 +1,619 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.exec.expr.fn.impl; + +import io.netty.buffer.DrillBuf; +import org.apache.drill.exec.expr.DrillSimpleFunc; +import org.apache.drill.exec.expr.annotations.FunctionTemplate; +import org.apache.drill.exec.expr.annotations.Output; +import org.apache.drill.exec.expr.annotations.Param; +import org.apache.drill.exec.expr.holders.BigIntHolder; +import org.apache.drill.exec.expr.holders.BitHolder; +import org.apache.drill.exec.expr.holders.VarCharHolder; + +import javax.inject.Inject; + +public class NetworkFunctions { + static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(NetworkFunctions.class); + + private NetworkFunctions() {} + + /** + * This function takes two arguments, an input IPv4 and a CIDR, and returns true if the IP is in the given CIDR block + * + */ + @FunctionTemplate( +name = "in_network", +scope = FunctionTemplate.FunctionScope.SIMPLE, +nulls = FunctionTemplate.NullHandling.NULL_IF_NULL + ) + public static class InNetworkFunction implements DrillSimpleFunc { + +@Param +VarCharHolder inputIP; + +@Param +VarCharHolder inputCIDR; + +@Output +BitHolder out; + +@Inject +DrillBuf buffer; + +public void setup() { +} + + +public void eval() { + + String ipString = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(inputIP.start, inputIP.end, inputIP.buffer); + String cidrString = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(inputCIDR.start, inputCIDR.end, inputCIDR.buffer); + + int result = 0; + org.apache.commons.net.util.SubnetUtils utils = new org.apache.commons.net.util.SubnetUtils(cidrString); + + if(utils.getInfo().isInRange(ipString) ){ +result = 1; + } + + out.value = result; +} + } + + + /** + * This function returns the number of IP addresses in the input CIDR block. 
+ */ + @FunctionTemplate( +name = "address_count", +scope = FunctionTemplate.FunctionScope.SIMPLE, +nulls = FunctionTemplate.NullHandling.NULL_IF_NULL + ) + public static class AddressCountFunction implements DrillSimpleFunc { + +@Param +VarCharHolder inputCIDR; + +@Output +BigIntHolder out; + +@Inject +DrillBuf buffer; + +public void setup() { +} + +public void eval() { + + String cidrString = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(inputCIDR.start, inputCIDR.end, inputCIDR.buffer); + org.apache.commons.net.util.SubnetUtils utils = new org.apache.commons.net.util.SubnetUtils(cidrString); + + out.value = utils.getInfo().getAddressCount(); + +} + + } + + /** + * This function returns the broadcast address of a given CIDR block. + */ + @FunctionTemplate( +name = "broadcast_address", +scope = FunctionTemplate.FunctionScope.SIMPLE, +nulls = FunctionTemplate.NullHandling.NULL_IF_NULL + ) + public static class BroadcastAddressFunction implements DrillSimpleFunc { + +@Param +VarCharHolder inputCIDR; + +@Output +VarCharHolder out; + +@Inject +DrillBuf buffer; + +public void setup() { +} + +public void eval() { + + String cidrString = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(inputCIDR.start, inputCIDR.end,
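The in_network UDF above delegates the membership check to commons-net's SubnetUtils. As a dependency-free illustration of what that check computes, here is a stdlib-only sketch of IPv4 CIDR membership; note that SubnetUtils.isInRange, with its default settings, additionally excludes the network and broadcast addresses themselves, which this sketch does not. The class and method names are illustrative, not Drill's.

```java
public class CidrDemo {
    // Parse a dotted-quad IPv4 address into a 32-bit integer.
    static int toInt(String ip) {
        String[] o = ip.split("\\.");
        return (Integer.parseInt(o[0]) << 24) | (Integer.parseInt(o[1]) << 16)
             | (Integer.parseInt(o[2]) << 8)  |  Integer.parseInt(o[3]);
    }

    // True if ip falls inside the CIDR block (network/broadcast included):
    // mask off the host bits of both addresses and compare the network parts.
    static boolean inNetwork(String ip, String cidr) {
        String[] parts = cidr.split("/");
        int prefixLen = Integer.parseInt(parts[1]);
        int mask = prefixLen == 0 ? 0 : -1 << (32 - prefixLen);
        return (toInt(ip) & mask) == (toInt(parts[0]) & mask);
    }

    public static void main(String[] args) {
        System.out.println(inNetwork("192.168.1.42", "192.168.1.0/24")); // true
        System.out.println(inNetwork("192.168.2.1", "192.168.1.0/24"));  // false
    }
}
```

In the UDF, this whole computation would run once per row, which is why a reviewer below suggests caching the parsed SubnetUtils in a @Workspace field.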
[GitHub] drill pull request #971: Drill-5834 Add Networking Functions
Github user arina-ielchiieva commented on a diff in the pull request:
https://github.com/apache/drill/pull/971#discussion_r145078505

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/NetworkFunctions.java ---
@@ -0,0 +1,668 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.expr.fn.impl;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.commons.net.util.SubnetUtils;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.annotations.Workspace;
+import org.apache.drill.exec.expr.holders.BigIntHolder;
+import org.apache.drill.exec.expr.holders.BitHolder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import javax.inject.Inject;
+
+public class NetworkFunctions{
+  static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(NetworkFunctions.class);
+
+  private NetworkFunctions() {}
+
+  /**
+   * This function takes two arguments, an input IPv4 and a CIDR, and returns true if the IP is in the given CIDR block.
+   */
+  @FunctionTemplate(
+      name = "in_network",
+      scope = FunctionTemplate.FunctionScope.SIMPLE,
+      nulls = FunctionTemplate.NullHandling.NULL_IF_NULL
+  )
+  public static class InNetworkFunction implements DrillSimpleFunc {
+
+    @Param
+    VarCharHolder inputIP;
+
+    @Param
+    VarCharHolder inputCIDR;
+
+    @Output
+    BitHolder out;
+
+    @Inject
+    DrillBuf buffer;
+
+    @Workspace
+    SubnetUtils utils;
+
+    public void setup() {
+    }
+
+    public void eval() {
+      String ipString = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(inputIP.start, inputIP.end, inputIP.buffer);
+      String cidrString = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(inputCIDR.start, inputCIDR.end, inputCIDR.buffer);
+
+      int result = 0;
+      utils = new org.apache.commons.net.util.SubnetUtils(cidrString);
+
+      if( utils.getInfo().isInRange( ipString ) ){
+        result = 1;
+      }
+      else{
+        result = 0;
+      }
+      out.value = result;
+    }
+  }
+
+  /**
+   * This function returns the number of IP addresses in the input CIDR block.
+   */
+  @FunctionTemplate(
+      name = "getAddressCount",
+      scope = FunctionTemplate.FunctionScope.SIMPLE,
+      nulls = FunctionTemplate.NullHandling.NULL_IF_NULL
+  )
+  public static class getAddressCountFunction implements DrillSimpleFunc {
+
+    @Param
+    VarCharHolder inputCIDR;
+
+    @Output
+    BigIntHolder out;
+
+    @Inject
+    DrillBuf buffer;
+
+    @Workspace
+    SubnetUtils utils;
+
+    public void setup() {
+    }
+
+    public void eval() {
+      String cidrString = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(inputCIDR.start, inputCIDR.end, inputCIDR.buffer);
+      utils = new org.apache.commons.net.util.SubnetUtils(cidrString);
+
+      out.value = utils.getInfo().getAddressCount();
+    }
+  }
+
+  /**
+   * This function returns the broadcast address of a given CIDR block.
+   */
+  @FunctionTemplate(
+      name = "getBroadcastAddress",
+      scope = FunctionTemplate.FunctionScope.SIMPLE,
+      nulls = FunctionTemplate.NullHandling.NULL_IF_NULL
+  )
+  public static class getBroadcastAddressFunction implements DrillSimpleFunc {
+
+    @Param
+    VarCharHolder inputCIDR;
+
+    @Output
+    VarCharHolder out;
+
+    @Inject
+    DrillBuf buffer;
+
+    @Workspace
[GitHub] drill pull request #971: Drill-5834 Add Networking Functions
Github user arina-ielchiieva commented on a diff in the pull request:
https://github.com/apache/drill/pull/971#discussion_r145078865

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/NetworkFunctions.java ---
@@ -0,0 +1,619 @@
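The address_count and broadcast_address UDFs in this PR likewise wrap SubnetUtils.getInfo(). The arithmetic behind those two calls can be sketched with the standard library alone; this assumes SubnetUtils' default behavior of excluding the network and broadcast addresses from the count, and the class and method names are illustrative, not Drill's.

```java
public class CidrInfoDemo {
    // Parse a dotted-quad IPv4 address into a 32-bit integer.
    static int toInt(String ip) {
        String[] o = ip.split("\\.");
        return (Integer.parseInt(o[0]) << 24) | (Integer.parseInt(o[1]) << 16)
             | (Integer.parseInt(o[2]) << 8)  |  Integer.parseInt(o[3]);
    }

    // Usable host addresses in the block: 2^(32 - prefix) minus the network
    // and broadcast addresses, floored at 0 for /31 and /32.
    static long addressCount(String cidr) {
        int hostBits = 32 - Integer.parseInt(cidr.split("/")[1]);
        return Math.max(0, (1L << hostBits) - 2);
    }

    // Broadcast address: keep the network bits, set all host bits to one.
    static String broadcastAddress(String cidr) {
        String[] parts = cidr.split("/");
        int prefixLen = Integer.parseInt(parts[1]);
        int mask = prefixLen == 0 ? 0 : -1 << (32 - prefixLen);
        int b = (toInt(parts[0]) & mask) | ~mask;
        return ((b >>> 24) & 0xFF) + "." + ((b >>> 16) & 0xFF) + "."
             + ((b >>> 8) & 0xFF)  + "." + (b & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(addressCount("192.168.1.0/24"));     // 254
        System.out.println(broadcastAddress("192.168.1.0/24")); // 192.168.1.255
    }
}
```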
[GitHub] drill issue #976: DRILL-5797: Choose parquet reader from read columns
Github user dprofeta commented on the issue:
https://github.com/apache/drill/pull/976

I updated the javadoc with Paul's remarks.

---
[jira] [Created] (DRILL-5884) Encode dot characters and other special characters in identifiers
second88 created DRILL-5884:
---

Summary: Encode dot characters and other special characters in identifiers
Key: DRILL-5884
URL: https://issues.apache.org/jira/browse/DRILL-5884
Project: Apache Drill
Issue Type: Wish
Components: Client - JDBC, Client - ODBC, Metadata, SQL Parser, Storage - JDBC
Affects Versions: 1.10.0
Environment: OS: Windows 7 32-bit
Reporting tools: Crystal Reports 2008, Crystal Reports 2016
Reporter: second88

Crystal Reports 2008 and 2016 do not work with generic JDBC / ODBC drivers (including Drill) if there are dot characters in identifiers such as schema names. For example, given that there exists a view called `dfs.tmp`.`A`, it is not listed under schema `dfs.tmp` in the report creation wizard of Crystal Reports 2008 / 2016. This is because Crystal Reports chops the schema name from "dfs.tmp" to "tmp" due to the dot character and then tries to retrieve the table names under the non-existent schema "tmp" using the metadata API of JDBC / ODBC.

I suggest adding an optional parameter called "url_encodes_id" to the connection string, with a default value of false. When url_encodes_id=true, the JDBC / ODBC driver or the SQL parser on the server side provides URL-encoded metadata information such as schema names and table names, and URL-decodes the identifiers before it actually executes the metadata API or SQL statements.

For example, the following methods of DatabaseMetaData would take URL-encoded IDs / patterns and return URL-encoded IDs:

getSchemas()
getSchemas(String catalog, String schemaPattern)
getTables(String catalog, String schemaPattern, String tableNamePattern, String types[])

And the following select statement, in which the schema name is URL-encoded, could then be executed by the JDBC / ODBC driver:

{code:sql}
SELECT `A`.`ID` FROM `dfs%2etmp`.`A` `A`
{code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
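The round trip the reporter proposes can be sketched in a few lines. One caveat: java.net.URLEncoder leaves '.' untouched (it is an unreserved character), so a driver implementing url_encodes_id would have to escape the dot explicitly. The class and method names below are illustrative, not part of any Drill API.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class IdEncodingDemo {
    // Escape dots so metadata consumers that split identifiers on '.'
    // (like Crystal Reports) see a single opaque token.
    static String encodeId(String id) {
        return id.replace(".", "%2e");
    }

    // Standard percent-decoding restores the original identifier before the
    // driver executes the metadata call or SQL statement.
    static String decodeId(String id) {
        return URLDecoder.decode(id, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encodeId("dfs.tmp"));   // dfs%2etmp
        System.out.println(decodeId("dfs%2etmp")); // dfs.tmp
    }
}
```

Note that URLDecoder also maps '+' to a space, so identifiers containing '+' would need the same kind of explicit handling as the dot.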