[GitHub] drill issue #713: DRILL-3562: Query fails when using flatten on JSON data wh...
Github user Serhii-Harnyk commented on the issue: https://github.com/apache/drill/pull/713 @amansinha100, could you please review new changes? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Drill To Hive connectivity issue
Hi Team,

I need your help resolving this; I have been struggling with it for many days. I am getting the error "ERROR hive.log - Got exception: org.apache.thrift.transport.TTransportException java.net.SocketException: Broken pipe (Write failed)" while trying to connect Drill to Hive. Hive runs on Microsoft HDInsight (remote metastore on MS SQL Server), and Drill runs on a separate VM in the same VNet as the cluster. I am able to create the Drill storage plugin with the configuration below:

{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://hn0-xyz.cloudapp.net:9083,thrift:// hn1-xyz.cloudapp.net:9083",
    "hive.metastore.warehouse.dir": "/hive/warehouse",
    "fs.default.name": "wasb://qwerty @demo.blob.core.windows.net",
    "hive.metastore.sasl.enabled": "false"
  }
}

Stack trace of error: PFA

core-site.xml (property name = value):
fs.azure.account.keyprovider.kkhdistore.blob.core.windows.net = org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider
fs.azure.shellkeyprovider.script = /usr/lib/python2.7/dist-packages/hdinsight_common/decrypt.sh
fs.azure.account.key.kkhdistore.blob.core.windows.net = {COPY FROM CLUSTER core-site.xml}
fs.AbstractFileSystem.wasb.impl = org.apache.hadoop.fs.azure.Wasb

Regards
Uday Sharma
er.udaysha...@gmail.com

0: jdbc:drill:zk=local> use hive;
17:57:19.515 [2779bbff-d7a9-058c-d133-b41795a0ee58:foreman] ERROR hive.log - Got exception: org.apache.thrift.transport.TTransportException java.net.SocketException: Broken pipe (Write failed)
org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe (Write failed)
    at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161) ~[drill-hive-exec-shaded-1.9.0.jar:1.9.0]
    at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65) ~[drill-hive-exec-shaded-1.9.0.jar:1.9.0]
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.send_get_all_databases(ThriftHiveMetastore.java:733) ~[hive-metastore-1.2.1.jar:1.2.1]
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_all_databases(ThriftHiveMetastore.java:726) ~[hive-metastore-1.2.1.jar:1.2.1]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1031) ~[hive-metastore-1.2.1.jar:1.2.1]
    at org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient.getDatabasesHelper(DrillHiveMetaStoreClient.java:205) [drill-storage-hive-core-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$DatabaseLoader.load(DrillHiveMetaStoreClient.java:489) [drill-storage-hive-core-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$DatabaseLoader.load(DrillHiveMetaStoreClient.java:482) [drill-storage-hive-core-1.9.0.jar:1.9.0]
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527) [guava-18.0.jar:na]
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319) [guava-18.0.jar:na]
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282) [guava-18.0.jar:na]
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197) [guava-18.0.jar:na]
    at com.google.common.cache.LocalCache.get(LocalCache.java:3937) [guava-18.0.jar:na]
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) [guava-18.0.jar:na]
    at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) [guava-18.0.jar:na]
    at org.apache.drill.exec.store.hive.DrillHiveMetaStoreClient$HiveClientWithCaching.getDatabases(DrillHiveMetaStoreClient.java:449) [drill-storage-hive-core-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.getSubSchema(HiveSchemaFactory.java:139) [drill-storage-hive-core-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.store.hive.schema.HiveSchemaFactory$HiveSchema.<init>(HiveSchemaFactory.java:133) [drill-storage-hive-core-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.store.hive.schema.HiveSchemaFactory.registerSchemas(HiveSchemaFactory.java:118) [drill-storage-hive-core-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.store.hive.HiveStoragePlugin.registerSchemas(HiveStoragePlugin.java:100) [drill-storage-hive-core-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.store.StoragePluginRegistryImpl$DrillSchemaFactory.registerSchemas(StoragePluginRegistryImpl.java:365) [drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.store.SchemaTreeProvider.createRootSchema(SchemaTreeProvider.java:72) [drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.store.SchemaTreeProvider.createRootSchema(SchemaTreeProvider.java:61) [drill-java-exec-1.9.0.jar:1.9.0]
    at
[GitHub] drill issue #713: DRILL-3562: Query fails when using flatten on JSON data wh...
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/713 LGTM. +1
[GitHub] drill issue #656: DRILL-5034: Select timestamp from hive generated parquet a...
Github user vdiravka commented on the issue: https://github.com/apache/drill/pull/656 I've rebased the branch to the latest master version. @bitblender Could you please review?
Re: [HANGOUT] Topics for 01/24/17
Join us here: https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

On Jan 23, 2017, at 6:43 PM, Sudheesh Katkam wrote:

I meant 01/24/17, 10 AM PT.

On Jan 23, 2017, at 12:43 PM, Sudheesh Katkam wrote:

Hi drillers,

Our bi-weekly hangout is tomorrow (01/23/17, 10 AM PT). If you have any suggestions for hangout topics, you can add them to this thread. We will also ask around at the beginning of the hangout for topics.

Thank you,
Sudheesh
[jira] [Created] (DRILL-5216) Set FetchSize to Speed up Metadata retrieval for JDBC storage plugin over high latency connections
Thomas Bünger created DRILL-5216:

Summary: Set FetchSize to Speed up Metadata retrieval for JDBC storage plugin over high latency connections
Key: DRILL-5216
URL: https://issues.apache.org/jira/browse/DRILL-5216
Project: Apache Drill
Issue Type: Improvement
Components: Storage - JDBC
Affects Versions: 1.9.0
Environment: drill-embedded on ubuntu client - connected to a remote Oracle
Reporter: Thomas Bünger
Priority: Minor

The metadata retrieval uses the default fetch size of the underlying JDBC driver, which in the case of Oracle is only 10. In larger scenarios, as in mine, the Oracle cluster hosts thousands of schemas and the small fetch size results in hundreds of individual round trips. In the end, every Drill query against this storage takes at least a minute (the server is remote).

So far, Drill is using the JDBC metadata API {{java.sql.DatabaseMetaData.getSchemas()}} inside JdbcStoragePlugin.java and could set an appropriate fetch size before iterating the result set. I've tested this locally and it improved latency a lot, but I am not sure how this affects other non-Oracle JDBC drivers.

The other (potentially long) query is the table enumeration. From what I've seen, Drill does not call the JDBC driver directly but goes through Apache Calcite, calling {{getTableNames()}}, which under the hood calls {{java.sql.DatabaseMetaData.getTables()}} and also contributes to slow metadata retrieval due to the small default fetch size.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
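The round-trip cost described in DRILL-5216 can be illustrated with a small Java sketch. It is not the code in JdbcStoragePlugin.java; the class name `FetchSizeSketch`, both method names, and the figure of 5000 schemas are hypothetical and chosen only for illustration. The only JDBC calls used are `DatabaseMetaData.getSchemas()` and `ResultSet.setFetchSize(int)`, and the fetch size is a hint that a driver is free to ignore.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class FetchSizeSketch {

    /** Rough number of fetch round trips for a result set: ceil(rows / fetchSize). */
    static long estimateRoundTrips(long rows, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        return (rows + fetchSize - 1) / fetchSize;
    }

    /**
     * Lists schema names while asking the driver to fetch {@code fetchSize}
     * rows per round trip. Hypothetical helper, not Drill's implementation.
     */
    static List<String> listSchemas(DatabaseMetaData metaData, int fetchSize) throws SQLException {
        List<String> schemas = new ArrayList<>();
        try (ResultSet rs = metaData.getSchemas()) {
            rs.setFetchSize(fetchSize); // only a hint; some drivers may ignore it
            while (rs.next()) {
                schemas.add(rs.getString("TABLE_SCHEM"));
            }
        }
        return schemas;
    }

    public static void main(String[] args) {
        // Assumed workload: 5000 schemas on a remote server.
        System.out.println("fetch size 10:   " + estimateRoundTrips(5000, 10) + " round trips");
        System.out.println("fetch size 1000: " + estimateRoundTrips(5000, 1000) + " round trips");
    }
}
```

With a fixed network latency per round trip, the total metadata retrieval time scales roughly with the round-trip count, which is why a larger fetch size helps most on high-latency connections.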
[GitHub] drill pull request #685: Drill 5043: Function that returns a unique id per s...
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/685#discussion_r97449967

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/ContextFunctions.java ---
@@ -64,17 +65,45 @@ public void eval() {
     @Inject DrillBuf buffer;
     @Workspace int currentSchemaBytesLength;

+    @Override
     public void setup() {
       final byte[] currentSchemaBytes = contextInfo.getCurrentDefaultSchema().getBytes();
       buffer = buffer.reallocIfNeeded(currentSchemaBytes.length);
       currentSchemaBytesLength= currentSchemaBytes.length;
       buffer.setBytes(0, currentSchemaBytes);
     }

+    @Override
     public void eval() {
       out.start = 0;
       out.end = currentSchemaBytesLength;
       out.buffer = buffer;
     }
   }
+
+  /**
+   * Implement "session_id" function. Returns the unique id of the current session.
+   */
+  @FunctionTemplate(name = "session_id", scope = FunctionTemplate.FunctionScope.SIMPLE, isNiladic = true)
--- End diff --

I think the newly introduced "isNiladic" flag should apply not only to the new "session_id" function but also to all existing functions that fall into the same category, for example current_date / current_time?

1. https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DateTypeFunctions.java#L234
2. https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/DateTypeFunctions.java#L290
[jira] [Created] (DRILL-5218) Support Disabling Heartbeats in C++ Client
Sudheesh Katkam created DRILL-5218:

Summary: Support Disabling Heartbeats in C++ Client
Key: DRILL-5218
URL: https://issues.apache.org/jira/browse/DRILL-5218
Project: Apache Drill
Issue Type: Bug
Components: Client - C++
Reporter: Sudheesh Katkam
Assignee: Sudheesh Katkam

Heartbeats between bits allow for detecting the health of remotes, but heartbeats between client and bit are not necessary. So allow disabling heartbeats, at least between the C++ client and bit.
[jira] [Created] (DRILL-5217) Heartbeat Fails when C++ client receives a large ResultSet
Sudheesh Katkam created DRILL-5217:

Summary: Heartbeat Fails when C++ client receives a large ResultSet
Key: DRILL-5217
URL: https://issues.apache.org/jira/browse/DRILL-5217
Project: Apache Drill
Issue Type: Bug
Components: Client - C++
Reporter: Sudheesh Katkam
Priority: Critical

If the listener thread is occupied for longer than 15 seconds (heartbeat timeout) while [handling a message from the drillbit|https://github.com/apache/drill/blob/master/contrib/native/client/src/clientlib/drillClientImpl.cpp#L1286] e.g. [processing query data blocks if the query result listener's buffer is full|https://github.com/apache/drill/blob/master/contrib/native/client/src/clientlib/drillClientImpl.cpp#L899], heartbeats fail because the same thread is responsible for sending heartbeats!

Fix is to [handle long running operations|http://stackoverflow.com/questions/17648725/long-running-blocking-operations-in-boost-asio-handlers] separately using boost asio.
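The starvation described in DRILL-5217 can be reproduced in miniature. The sketch below is a Java analogue only, not the C++ client and not boost asio: a single-threaded executor stands in for the listener thread, a latch stands in for "the result listener's buffer has room again", and the class and method names are hypothetical. It shows a heartbeat queued behind a blocking handler, first with the handler running on the listener thread and then with it handed to a separate worker.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class HeartbeatStarvationSketch {

    /**
     * Returns true if a heartbeat queued behind a blocking handler runs within
     * waitMillis. With offload == false the handler blocks the listener thread
     * itself; with offload == true the listener hands it to a worker thread.
     */
    static boolean heartbeatRuns(boolean offload, long waitMillis) {
        ExecutorService listener = Executors.newSingleThreadExecutor();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch bufferDrained = new CountDownLatch(1); // released in finally
        CountDownLatch heartbeatSent = new CountDownLatch(1);

        // Stands in for a handler that waits until the result buffer has room.
        Runnable blockingHandler = () -> {
            try {
                bufferDrained.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        try {
            if (offload) {
                listener.execute(() -> worker.execute(blockingHandler));
            } else {
                listener.execute(blockingHandler);
            }
            // The heartbeat is queued on the same listener thread.
            listener.execute(heartbeatSent::countDown);
            return heartbeatSent.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            bufferDrained.countDown(); // unblock the handler so threads can exit
            listener.shutdown();
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("inline handler, heartbeat ran:    " + heartbeatRuns(false, 200));
        System.out.println("offloaded handler, heartbeat ran: " + heartbeatRuns(true, 2000));
    }
}
```

In the inline case the heartbeat cannot run until the handler returns, which mirrors the timeout in the report; offloading keeps the listener thread free to send it.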
[jira] [Created] (DRILL-5220) Add api to set application name in C++ connector
Laurent Goujon created DRILL-5220:

Summary: Add api to set application name in C++ connector
Key: DRILL-5220
URL: https://issues.apache.org/jira/browse/DRILL-5220
Project: Apache Drill
Issue Type: Improvement
Components: Client - C++
Affects Versions: 1.8.0
Reporter: Laurent Goujon
Priority: Minor

There's no API for a C++ connector user to specify the name of the application and to provide it to the server (optional field added in DRILL-4369).
[jira] [Created] (DRILL-5219) Remove DrillUserProperties filtering in C++ driver
Laurent Goujon created DRILL-5219:

Summary: Remove DrillUserProperties filtering in C++ driver
Key: DRILL-5219
URL: https://issues.apache.org/jira/browse/DRILL-5219
Project: Apache Drill
Issue Type: Bug
Components: Client - C++
Reporter: Laurent Goujon
Priority: Minor

Unlike the Java client, the C++ connector filters out unknown Drill user properties: https://github.com/apache/drill/blob/master/contrib/native/client/src/clientlib/drillClientImpl.cpp#L374

This prevents a client (like the ODBC driver) from passing extra properties to the server (like extra meta-information, or some specific behavior for a given software).
[GitHub] drill pull request #726: DRILL-5218: Support disabling hearbeats from C++ cl...
GitHub user sudheeshkatkam opened a pull request: https://github.com/apache/drill/pull/726

DRILL-5218: Support disabling hearbeats from C++ client

+ remove invalid code (server should not request "handshake" type), in fact, client should fail in that case

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sudheeshkatkam/drill DRILL-5218

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/726.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #726

commit 6bc07bbba246ae376b6accc4e38f076bb15b83aa
Author: Sudheesh Katkam
Date: 2017-01-24T21:10:31Z

DRILL-5218: Support disabling hearbeats from C++ client
[GitHub] drill issue #673: DRILL-4764: Parquet file with INT_16, etc. logical types n...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/673 +1
[jira] [Created] (DRILL-5221) cancel message is delayed until queryid or data is received
Laurent Goujon created DRILL-5221:

Summary: cancel message is delayed until queryid or data is received
Key: DRILL-5221
URL: https://issues.apache.org/jira/browse/DRILL-5221
Project: Apache Drill
Issue Type: Improvement
Components: Client - C++
Affects Versions: 1.9.0
Reporter: Laurent Goujon

When a user calls the cancel method of the C++ client, the client waits for a message from the server before replying with a cancellation message. For queries that take a long time to return their first batch, this means cancellation takes the same amount of time to become effective, instead of cancelling the query right away (assuming the query id has already been received, which is generally the case).

It seems this was foreseen by [~vkorukanti] in his initial patch (https://github.com/vkorukanti/drill/commit/e0ef6349aac48de5828b6d725c2cf013905d18eb) but was omitted when I backported it post metadata changes.
[GitHub] drill pull request #685: Drill 5043: Function that returns a unique id per s...
Github user nagarajanchinnasamy commented on a diff in the pull request: https://github.com/apache/drill/pull/685#discussion_r97721703

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/ContextFunctions.java ---
@@ -64,17 +65,45 @@ public void eval() {
     @Inject DrillBuf buffer;
     @Workspace int currentSchemaBytesLength;

+    @Override
     public void setup() {
       final byte[] currentSchemaBytes = contextInfo.getCurrentDefaultSchema().getBytes();
       buffer = buffer.reallocIfNeeded(currentSchemaBytes.length);
       currentSchemaBytesLength= currentSchemaBytes.length;
       buffer.setBytes(0, currentSchemaBytes);
     }

+    @Override
     public void eval() {
       out.start = 0;
       out.end = currentSchemaBytesLength;
       out.buffer = buffer;
     }
   }
+
+  /**
+   * Implement "session_id" function. Returns the unique id of the current session.
+   */
+  @FunctionTemplate(name = "session_id", scope = FunctionTemplate.FunctionScope.SIMPLE, isNiladic = true)
--- End diff --

- Functions like current_time/current_date already behave as niladic because Calcite recognizes them as such.
- The isNiladic flag can be set by any UDF; it is not restricted to session_id.
- If I understand correctly, there is currently no categorization of UDFs in Drill. If we want to restrict the isNiladic flag to a particular category of UDFs, then a categorization of UDFs needs to be designed and implemented.
[GitHub] drill pull request #726: DRILL-5218: Support disabling hearbeats from C++ cl...
Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/726#discussion_r97665284

--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -179,9 +179,11 @@ connectionStatus_t DrillClientImpl::sendHeartbeat(){
 }

 void DrillClientImpl::resetHeartbeatTimer(){
-    m_heartbeatTimer.cancel();
-    DRILL_MT_LOG(DRILL_LOG(LOG_TRACE) << "Reset Heartbeat timer." << std::endl;)
-    startHeartbeatTimer();
+    if (DrillClientConfig::getHeartbeatFrequency() > 0) {
+        m_heartbeatTimer.cancel();
--- End diff --

I was wondering whether one needs to cancel the timer at all, since startHeartbeatTimer sets it to expire again. Maybe all this logic can be done in one place (like startHeartbeatTimer?).

I also noticed another place where m_heartbeatTimer.cancel() is called, in broadcastError. I guess this is fine (calling cancel() when the timer is not set is probably not an error, but I haven't checked the asio doc on it), but maybe this should be cleaned up/guarded too...
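The suggestion to keep the "disabled when the frequency is 0" check in one place can be sketched as follows. This is a Java analogue built on `ScheduledExecutorService`, not the C++ client's boost asio timer; the class `HeartbeatTimerSketch` and its methods are hypothetical names used only for illustration. Here `reset()` both cancels any pending timer and re-arms it, and it is the only place that checks the configured frequency, so callers never need their own guard.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class HeartbeatTimerSketch {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final int frequencySeconds;   // 0 means heartbeats are disabled
    private final Runnable sendHeartbeat;
    private ScheduledFuture<?> pending;   // null when no timer has been armed

    HeartbeatTimerSketch(int frequencySeconds, Runnable sendHeartbeat) {
        this.frequencySeconds = frequencySeconds;
        this.sendHeartbeat = sendHeartbeat;
    }

    /** Cancels any pending timer, then re-arms it only if heartbeats are enabled. */
    synchronized void reset() {
        cancel();
        if (frequencySeconds > 0) {
            pending = scheduler.schedule(sendHeartbeat, frequencySeconds, TimeUnit.SECONDS);
        }
    }

    /** Safe to call whether or not a timer is currently armed. */
    synchronized void cancel() {
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }

    /** True while a heartbeat is scheduled and has not yet fired. */
    synchronized boolean isArmed() {
        return pending != null && !pending.isDone();
    }

    void shutdown() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        HeartbeatTimerSketch disabled = new HeartbeatTimerSketch(0, () -> { });
        disabled.reset();
        System.out.println("frequency 0 armed:  " + disabled.isArmed());
        disabled.shutdown();

        HeartbeatTimerSketch enabled = new HeartbeatTimerSketch(15, () -> { });
        enabled.reset();
        System.out.println("frequency 15 armed: " + enabled.isArmed());
        enabled.shutdown();
    }
}
```

Because `cancel()` tolerates an unarmed timer, an error path comparable to broadcastError could call it unconditionally in this design.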
[GitHub] drill pull request #726: DRILL-5218: Support disabling hearbeats from C++ cl...
Github user laurentgo commented on a diff in the pull request: https://github.com/apache/drill/pull/726#discussion_r97663988

--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -1400,22 +1404,6 @@ void DrillClientImpl::handleRead(ByteBuf_t _buf,
                 break;
             case exec::user::HANDSHAKE:
--- End diff --

maybe you can also remove the case statement
[GitHub] drill issue #710: DRILL-5126: Provide simplified, unified "cluster fixture" ...
Github user sudheeshkatkam commented on the issue: https://github.com/apache/drill/pull/710 +1, thanks!
[GitHub] drill issue #685: Drill 5043: Function that returns a unique id per session/...
Github user nagarajanchinnasamy commented on the issue: https://github.com/apache/drill/pull/685 @arina-ielchiieva conflicts resolved and rebased on master. Please check.
[GitHub] drill pull request #725: DRILL-5215: CTTAS: disallow temp tables in view exp...
GitHub user arina-ielchiieva opened a pull request: https://github.com/apache/drill/pull/725

DRILL-5215: CTTAS: disallow temp tables in view expansion logic

1. Disallowed temporary table usage during view expansion.
2. Added appropriate unit test.
3. Replaced the link to the gist with the CTTAS design doc with a link to Jira.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-5215

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/725.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #725

commit 158f37233913e314a6212e8fb85a2ddf9cda
Author: Arina Ielchiieva
Date: 2017-01-24T12:33:11Z

DRILL-5215: CTTAS: disallow temp tables in view expansion logic
[GitHub] drill issue #685: Drill 5043: Function that returns a unique id per session/...
Github user arina-ielchiieva commented on the issue: https://github.com/apache/drill/pull/685 @nagarajanchinnasamy, thank you!
[GitHub] drill pull request #713: DRILL-3562: Query fails when using flatten on JSON ...
Github user Serhii-Harnyk commented on a diff in the pull request: https://github.com/apache/drill/pull/713#discussion_r97576603

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java ---
@@ -59,6 +59,12 @@
   private final boolean readNumbersAsDouble;

   /**
+   * Collection for tracking empty array writers during reading
+   * and storing them for initializing empty arrays
+   */
+  private final Set emptyArrayWritersSet = Sets.newHashSet();
--- End diff --

@amansinha100 Yes, you are right, a List should be used here. Fixed it.