Re: queries not being submitted in Impala cluster despite free resources

2017-01-31 Thread Jeszy
Hey William, IIUC you have configured both a memory-based upper bound and a # queries upper bound for the default pool. A query can get queued if it would exceed either of these limits. If you're not hitting the number of queries one, then it's probably memory, which can happen even if not fully u
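
To illustrate the "either limit" rule above, here is a minimal sketch of the queueing decision. It is a simplified model and not Impala's actual admission control code: the names PoolLimits, PoolState and queue_reason are hypothetical, and it assumes the memory check counts memory reserved for admitted queries rather than memory currently in use.

```python
from dataclasses import dataclass
from typing import Optional

GIB = 1024 ** 3


@dataclass
class PoolLimits:
    # Hypothetical pool configuration; None means "no limit configured".
    max_running_queries: Optional[int]
    max_memory_bytes: Optional[int]


@dataclass
class PoolState:
    # Assumption: memory is tracked as the amount reserved for admitted
    # queries, not the amount currently in use.
    running_queries: int
    reserved_memory_bytes: int


def queue_reason(limits: PoolLimits, state: PoolState,
                 query_memory_bytes: int) -> Optional[str]:
    """Return why a new query would be queued, or None if it can be admitted."""
    # Limit 1: admitting one more query must not exceed the query count cap.
    if (limits.max_running_queries is not None
            and state.running_queries + 1 > limits.max_running_queries):
        return "query count limit"
    # Limit 2: the query's memory requirement must fit under the memory cap.
    if (limits.max_memory_bytes is not None
            and state.reserved_memory_bytes + query_memory_bytes
            > limits.max_memory_bytes):
        return "memory limit"
    return None


if __name__ == "__main__":
    limits = PoolLimits(max_running_queries=20, max_memory_bytes=100 * GIB)
    state = PoolState(running_queries=5, reserved_memory_bytes=80 * GIB)
    # Only 5 of 20 query slots are taken, but 80 GiB + 25 GiB > 100 GiB.
    print(queue_reason(limits, state, 25 * GIB))
```

In this made-up example the pool is far below its query count limit, yet the new query is still queued because its memory requirement does not fit under the pool's memory bound.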

Re: queries not being submitted in Impala cluster despite free resources

2017-01-31 Thread Jeszy
That would be good. If they eventually run successfully, a query profile would also be welcome. Thanks On Tue, Jan 31, 2017 at 4:28 PM, William Cox wrote: > Jeszy, > > Thanks for the suggestion. We also have a 25GB per-query limit set up. > Queries that estimate a large size are r

Re: Impala Hbase Security

2017-02-08 Thread Jeszy
Hey Danny, As far as I know Sentry doesn't work with HBase out of the box (so not sure about Sentry's HBase model). If you use Sentry, Impala will validate against Sentry's privilege db (its cached version in the catalog) before doing anything. That means that if Impala is the only interface to

Re: Impala Failed to read file from HDFS

2017-03-09 Thread Jeszy
Hello, Sounds like Impala expected 1.parquet to be in the folder, but it wasn't. You probably forgot to do 'refresh ' after altering data from the outside. HTH On Fri, Mar 10, 2017 at 7:30 AM, 俊杰陈 wrote: > Hi, > I'm using latest impala built from github, and setup impala cluster with > 2-nodes

Re: Impala Failed to read file from HDFS

2017-03-10 Thread Jeszy
on: 2 | >>>> +--+ >>>> Fetched 1 row(s) in 0.50s >>>> [bdpe30-cjj:21000] > select count(*) from test; >>>> Query: select count(*) from test >>>> Query submitted at: 2017-03-10 0

Re: low CPU usage

2017-03-13 Thread Jeszy
Hey Joseph, Yes, basically everything except scans uses one thread per operator. IMPALA-3902 tracks multithreading. You can trick Impala into using more threads for the same job by altering the query (break it up into smaller tasks by limiting the scans, then union), but it's not really a viable/nice

Re: Is there any way to retrieve table metadata using select rather than show?

2017-04-06 Thread Jeszy
Hey, that's not possible from within Impala. If you go directly to the HMS's backing DB, you can query that. What information are you looking for? Thanks. On Thu, Apr 6, 2017 at 3:02 PM, 吴朱华 wrote: > Hi guys: > > Currently, we are using "show databases","show tables" or "Describe table" > to re

Re: CDH 5.10.1 - Cannot perform hash join at node with id 58. Repartitioning did not reduce the size of a spilled partition. Repartitioning level 7. Number of rows 1.

2017-04-13 Thread Jeszy
Hello, Yes, it's an alternate way of saying 'Out of memory'. Judging by the number of rows (1) that should fit in the memory, my guess would be that some other operator (maybe same, maybe other query) is using up all the memory on this node. HTH 2017-04-13 20:28 GMT+02:00 Krishnanand Khambadkone

Re: Enable Impala-kudu table, column stats.

2017-04-25 Thread Jeszy
Hey, The difference in the distinct values is expected; the estimate that the NDV function (https://www.cloudera.com/documentation/enterprise/latest/topics/impala_ndv.html) gives is good enough, and the execution is much faster. You can set both the table and the column stats manually as descri

Re: Specifying multiple hosts when connecting to Impala over JDBC

2017-06-27 Thread Jeszy
Hello, Failover would have to be handled by the JDBC client so it's outside of Impala's scope. Using HAProxy to route connections is fairly common. Regards, Jeszy On 27 June 2017 at 16:41, Aleksei Maželis wrote: > Hi, > > Is it possible to specify multiple hosts when connec

Re: [Impala] bucketing

2017-08-09 Thread Jeszy
Hello Andrey, No, it is an open request: https://issues.apache.org/jira/browse/IMPALA-3118 HTH On 9 August 2017 at 16:40, Andrey Kuznetsov wrote: > Hi folk, > > Need help, > > Do you know if we have bucketing or some replacement for bucketing in > impala? > > > > Best regards, > > ANDREY KUZNET

Re: Impala daemon memory

2017-08-17 Thread Jeszy
Hey, The table itself is just Impala's metadata (indeed located in every coordinator's catalog cache), the underlying data (read from HDFS) is stored on [replication factor] number of nodes. Jeszy On 17 August 2017 at 16:38, Alexander Shoshin wrote: > Hi Petter, > > > &

Re: Cloudera jdbc driver vs apache hive jdbc driver performance difference

2017-09-07 Thread Jeszy
Hello, It's not possible unfortunately to get the profile directly through JDBC. Can you please clarify which driver is faster? I expect Cloudera's Impala (and not hive, that's untested) JDBC driver to be faster than the Apache Hive driver. Can you attach profiles for both runs? Even though you ca

Re: HDFS timeout setting

2017-09-08 Thread Jeszy
Hello Nasron, The read hanging is an HDFS timeout issue; to work around it you would need an HDFS version which has HADOOP-12672. After that, you should be able to set ipc.client.rpc-timeout.ms to a reasonable value in Impala's HDFS client settings. HTH On 8 September 2017 at 20:24, Nasron Cheon

Re: 答复: Impala Memory Leak (version 2.7.0-cdh5.10.2 )

2017-10-10 Thread Jeszy
Hello Carl, Impala will not close the query until you explicitly call Close() on the statement - you can cause queries like this to time out automatically by setting --idle_session_timeout as a startup option for Impala. This behaviour is tracked at https://issues.apache.org/jira/browse/IMPALA-157

Re: Parquet min/max statistics & null values

2017-10-26 Thread Jeszy
Hello Bruno, Thanks for bringing this up. While not apparent from the commit comments, this limitation was mentioned during the code review: 'min/max are only set when there are non-null values, so we don't consider statistics for "is null".' (see https://gerrit.cloudera.org/#/c/6147/). It looks t