Hive LIMIT clause slows query

2017-11-02 Thread Igor Kuzmenko
I'm using HDP 2.5.0 with Hive 1.2.1. Performing some tests, I noticed that my query works better if I don't use the LIMIT clause. My query is: insert into table *results_table* partition (task_id=xxx) select * from *data_table* where dt=20171102 and … limit 100. This query runs in about 30
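A minimal HiveQL sketch of the comparison being described; the original filter predicate is elided above, so the predicate and task_id value below are hypothetical stand-ins:

-- Reported as slow: insert with LIMIT (one unconfirmed guess: the LIMIT adds an extra single-reducer stage)
insert into table results_table partition (task_id=123)
select * from data_table
where dt=20171102 and some_column = 'x'   -- hypothetical predicate
limit 100;

-- Reported as fast: the same insert without LIMIT
insert into table results_table partition (task_id=123)
select * from data_table
where dt=20171102 and some_column = 'x';  -- hypothetical predicate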

Re: Hive locking mechanism on read partition.

2017-10-13 Thread Igor Kuzmenko
You could use https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.strict.locking.mode to change the X lock on write to an S lock to get around this, but this may not be appropriate for the rest of your logic.
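For reference, the property named in the reply is a session-level setting; a one-line sketch (whether it is available depends on the Hive version, and it may not exist in Hive 1.2.1):

-- Relax exclusive write locks to shared locks (version-dependent; see the wiki link above)
SET hive.txn.strict.locking.mode=false;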

Re: Hive locking mechanism on read partition.

2017-10-13 Thread Igor Kuzmenko
Eugene. From: Igor Kuzmenko <f1she...@gmail.com>, Reply-To: "user@hive.apache.org" <user@hive.apache.org>, Date: Thursday, October 12, 2017 at 3:58 AM, To: "user@hive.apache.org" <user@hive.apache.org>

Hive locking mechanism on read partition.

2017-10-12 Thread Igor Kuzmenko
Hello, I'm using HDP 2.5.0.0 with the included Hive 1.2.1, and I have a problem with the locking mechanism. Most of my queries to Hive look like this: *(1) insert into table results_table partition(task_id=${task_id}) select * from data_table where …;* results_table is partitioned by
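A sketch of that query pattern together with a way to see which locks it takes; SHOW LOCKS is standard HiveQL under the DbTxnManager, and the assumption here is that dt is a partition column of data_table:

insert into table results_table partition (task_id=${task_id})
select * from data_table where dt=20171102;   -- filter is illustrative

-- Inspect the locks currently held on the partition being read
SHOW LOCKS data_table PARTITION (dt=20171102) EXTENDED;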

Unexpected query result

2017-08-21 Thread Igor Kuzmenko
Running a simple '*select count(*) from test_table*' query returned 500_000. But when I run '*select count(distinct field) from test_table*' the result is 500_001. How can a table with 500_000 records have 500_001 unique field values? I'm using Hive from HDP 2.5.0
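A small diagnostic sketch for narrowing this down, reusing the column name from the original query; counting the groups directly should agree with count(distinct):

SELECT count(*) FROM test_table;                  -- returned 500_000
SELECT count(DISTINCT field) FROM test_table;     -- returned 500_001

-- Cross-check: the number of distinct groups, computed a different way
SELECT count(*) FROM (SELECT field FROM test_table GROUP BY field) g;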

Re: Hive TxnHandler::lock method runs into a deadlock.

2017-03-28 Thread Igor Kuzmenko
Explicit configuration is a workaround, but it doesn't solve the deadlock problem. On Mon, Mar 27, 2017 at 8:28 PM, Eugene Koifman <ekoif...@hortonworks.com> wrote: There is an open ticket: https://issues.apache.org/jira/browse/HIVE-13842. Eugene

Re: Hive TxnHandler::lock method runs into a deadlock.

2017-03-27 Thread Igor Kuzmenko
ot;hikaricp.configurationFile"*); > *if*(systemProp != *null*) { > *this*.loadProperties(systemProp); > } > > } > > > > > > *From: *Igor Kuzmenko <f1she...@gmail.com> > *Reply-To: *"user@hive.apache.org" <user@hive.apache.org> > *Date: *Sa

Re: Hive TxnHandler::lock method runs into a deadlock.

2017-03-25 Thread Igor Kuzmenko
Can you try using the "hikaricp" connection pool manager? It seems to be using the default, which has no limit. Eugene. From: Igor Kuzmenko <f1she...@gmail.com>, Reply-To: "user@hive.apache.org" <user@hive.apache.org>

Hive TxnHandler::lock method runs into a deadlock.

2017-03-20 Thread Igor Kuzmenko
Hello, I'm running Hortonworks Data Platform 2.5.0.0 with the included Hive. I'm using the Storm Hive bolt to load data into Hive, but launching many Hive bolts always leads to a TimeoutException when calling the Hive metastore. The metastore logs are full of exceptions like this: 2017-03-15 18:46:12,436 ERROR

Re: Hive TxnHandler::lock method runs into a deadlock.

2017-03-20 Thread Igor Kuzmenko
…/hadoop/hive/metastore/txn/TxnHandler.java. I guess the closest branch in the Apache repo is: https://github.com/apache/hive/blob/branch-2.1/metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java On Tue, Mar 21, 2017 at 12:07 AM, Igor Kuzmenko <f1she...@gmail.com> wrote:

Re: How to remove Hive table property?

2016-08-24 Thread Igor Kuzmenko
…destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On 23 August 2016 at 12:42, Igor

Re: Hive transaction doesn't release lock.

2016-08-24 Thread Igor Kuzmenko
…LOCKS for this txn_id? If all you see is an entry in the TXN table in 'a' state, that is OK; that just means that this transaction was aborted. Eugene. From: Igor Kuzmenko <f1she...@gmail.com>, Reply-To: "user@hive.apache.org" <user@hive.ap…

Re: Hive transaction doesn't release lock.

2016-08-23 Thread Igor Kuzmenko
…any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On 22 August 2016 at 16:27, Igor Kuzmenko <f1she...@g…

Hive transaction doesn't release lock.

2016-08-22 Thread Igor Kuzmenko
Hello, I'm using Apache Hive 1.2.1 and Apache Storm to stream data into a Hive table. After some tests I tried to truncate my table, but the SQL statement doesn't complete because of a lock on the table: select * from HIVE_LOCKS; # TXN_ID, TXN_STATE, TXN_STARTED, TXN_LAST_HEARTBEAT, TXN_USER,
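Lock and transaction state lives in the metastore backing database, so here is a hedged sketch of the queries one might run there (table and column names follow the standard ACID metastore schema and can differ between versions):

-- Run against the metastore RDBMS, not through HiveServer2
SELECT hl_txnid, hl_db, hl_table, hl_partition, hl_lock_state, hl_lock_type
FROM HIVE_LOCKS;

SELECT txn_id, txn_state, txn_user, txn_host
FROM TXNS
WHERE txn_state = 'o';   -- 'o' = open, 'a' = aborted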

Re: Malformed ORC file

2016-08-05 Thread Igor Kuzmenko
…sed and hence may not be flushed completely. Did the transaction commit successfully? Or was there any exception thrown during writes/commit? Thanks, Prasanth. On Aug 3, 2016, at 6:09 AM, Igor Kuzmenko <f1she...@gmail.com> wrote: Hello, I've

Re: Hive LIKE predicate. '_' wildcard decreases performance

2016-08-05 Thread Igor Kuzmenko
Thanks for the reply, Gopal. Very helpful. On Thu, Aug 4, 2016 at 10:15 PM, Gopal Vijayaraghavan wrote: where res_url like '%mts.ru%' … where res_url like '%mts_ru%' … Why does the '_' wildcard decrease performance? Because it misses the fast path by just one "_".

Hive LIKE predicate. '_' wildcard decreases performance

2016-08-04 Thread Igor Kuzmenko
I've got a Hive transactional table 'data_http' in ORC format containing around 100,000,000 rows. When I execute the query select * from data_http where res_url like '%mts.ru%' it completes in 10 seconds, but the query select * from data_http where res_url like '%mts_ru%' takes more than
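The two predicates side by side for clarity; per the reply above, the only difference is the single-character wildcard '_', which makes the pattern miss the fast plain-substring path:

select * from data_http where res_url like '%mts.ru%';  -- completes in about 10 seconds
select * from data_http where res_url like '%mts_ru%';  -- far slower: the '_' wildcard misses the fast path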

Malformed ORC file

2016-08-03 Thread Igor Kuzmenko
Hello, I've got a malformed ORC file in my Hive table. The file was created by the Hive Streaming API and I have no idea under what circumstances it became corrupted. File on Google Drive: link. Exception message when trying

Re: Hive compaction didn't launch

2016-07-29 Thread Igor Kuzmenko
…mmit. The compactor cannot make any assumptions about what sessions with open transactions will do in the future. Alan. On Jul 28, 2016, at 09:19, Igor Kuzmenko <f1she...@gmail.com> wrote: But this minOpenTxn value isn'…

Re: Hive compaction didn't launch

2016-07-28 Thread Igor Kuzmenko
…elta. Storm should be committing on some frequency even if it doesn't have enough data to commit. Alan. On Jul 28, 2016, at 05:36, Igor Kuzmenko <f1she...@gmail.com> wrote: I did some research on that issue. The problem is in Vali…

Re: Hive compaction didn't launch

2016-07-28 Thread Igor Kuzmenko
…rm Hive Bolt. The Hive bolt gets a transaction and keeps it open with heartbeats until there is data to commit. So if I get a transaction and keep it open, all compactions will stop. Is this incorrect Hive behavior, or should Storm close the transaction? On Wed, Jul 27, 2016 at 8:46 PM, Igor Kuzmenko

Re: Hive compaction didn't launch

2016-07-27 Thread Igor Kuzmenko
…storm list to see if people have seen this issue before. Alan. On Jul 27, 2016, at 03:31, Igor Kuzmenko <f1she...@gmail.com> wrote: One more thing. I'm using Apache Storm to stream data into Hive, and when I turned off the Storm topology, compactions st…

Re: Hive compaction didn't launch

2016-07-27 Thread Igor Kuzmenko
One more thing: I'm using Apache Storm to stream data into Hive, and when I turned off the Storm topology, compactions started to work properly. On Tue, Jul 26, 2016 at 6:28 PM, Igor Kuzmenko <f1she...@gmail.com> wrote: I'm using a Hive 1.2.1 transactional table. Inserting data into it

Hive compaction didn't launch

2016-07-26 Thread Igor Kuzmenko
I'm using a Hive 1.2.1 transactional table and inserting data into it via the Hive Streaming API. After some time I expected compaction to start, but it didn't happen. Here's the part of the log showing that the compactor initiator thread doesn't see any delta files: *2016-07-26 18:06:52,459 INFO [Thread-8]:
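For reference, compaction state can be inspected and a compaction requested by hand; a hedged HiveQL sketch (the table name and partition spec are illustrative, not from the thread):

-- List compactions as the initiator sees them: initiated / working / ready for cleaning
SHOW COMPACTIONS;

-- Ask for a compaction explicitly instead of waiting for the initiator
ALTER TABLE my_acid_table PARTITION (dt=20160726) COMPACT 'major';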

Does Hive JDBC return the same sequence of records?

2016-07-04 Thread Igor Kuzmenko
If I run the query "*SELECT * FROM table t WHERE t.partition = value*" with Hive JDBC several times, is there a guarantee that when I iterate through the result set I get the records in the same order every time? Intuitively it feels like yes, because in that query there's no MapReduce and Hive just reads data
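SQL gives no ordering guarantee without an explicit ORDER BY, even for a simple partition scan, so the safe version of the query sorts on a key; a minimal sketch where record_id is a hypothetical unique column:

-- Deterministic order across runs requires an explicit sort
SELECT * FROM some_table t
WHERE t.partition_col = 'value'
ORDER BY t.record_id;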

What is the best way to store an IPv6 address in Hive?

2016-06-28 Thread Igor Kuzmenko
Currently I'm using ORC transactional tables, and I need to store a lot of data containing IP addresses. With IPv4 it can be an integer (exactly 4 bytes), but what about IPv6? Obviously it should be space-efficient and easy to search for an exact match. As an extra feature it would be good to do fast
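One possible layout, offered only as a sketch under the stated requirements: split the 128-bit address into two BIGINT columns so the encoding stays fixed-width and an exact-match lookup is a pair of equality predicates. All table, column, and bucket choices below are hypothetical:

CREATE TABLE ip_traffic (
  ip_hi   BIGINT,   -- upper 64 bits of the IPv6 address (0 for IPv4-mapped addresses)
  ip_lo   BIGINT,   -- lower 64 bits
  payload STRING
)
CLUSTERED BY (ip_lo) INTO 16 BUCKETS   -- bucketing required for a transactional table
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Exact-match lookup is two integer comparisons
SELECT * FROM ip_traffic WHERE ip_hi = ${hiveconf:hi} AND ip_lo = ${hiveconf:lo};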

Re: Delete Hive partition while executing query.

2016-06-09 Thread Igor Kuzmenko
…id this won't prevent the exception you are getting, but it needs to be fixed to prevent a partition from disappearing while queries 3 and 4 are in progress. Could you file a Jira please? Thanks, Eugene. From: Igor Kuzmenko <f1she...@gmail.com>

Re: Delete Hive partition while executing query.

2016-06-08 Thread Igor Kuzmenko
…above explanation is not what's happening. Would it be possible for you to turn on debug logging on your Thrift metastore process, rerun this test, and post the logs somewhere? Apache lists strip attachments so you won't be able to attach them here; you'll have to p…

Re: Delete Hive partition while executing query.

2016-06-07 Thread Igor Kuzmenko
…e transaction manager is what manages locking and makes sure that your queries don't stomp each other. Alan. On Jun 6, 2016, at 06:01, Igor Kuzmenko <f1she...@gmail.com> wrote: Hello, I'm trying to find a safe way to delete partiti…

Delete Hive partition while executing query.

2016-06-06 Thread Igor Kuzmenko
Hello, I'm trying to find a safe way to delete a partition with all the data it includes. I'm using Hive 1.2.1 and Hive JDBC driver 1.2.1 and perform a simple test on a transactional table: asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=2",
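For context, the deletion side of this test presumably looks something like the statement below; the partition columns are inferred from the WHERE clause above, so treat the exact spec as an assumption:

-- Drop the partition while the SELECT above is still running
ALTER TABLE mobile_connections DROP IF EXISTS PARTITION (dt=20151124, msisdn_last_digit=2);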

Hive HCatalog Streaming. Why must the Hive table be bucketed?

2016-04-08 Thread Igor Kuzmenko
Hello, I've got a few questions about Hive HCatalog Streaming. This feature has the requirement: "*The Hive table must be bucketed, but not sorted.
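A minimal DDL sketch of a table that meets the quoted streaming requirements (bucketed, ORC, transactional); the column names, partition, and bucket count are illustrative only:

CREATE TABLE streaming_target (
  msisdn     STRING,
  event_time TIMESTAMP,
  payload    STRING
)
PARTITIONED BY (dt INT)
CLUSTERED BY (msisdn) INTO 8 BUCKETS   -- bucketed, as the streaming API requires
STORED AS ORC
TBLPROPERTIES ('transactional'='true');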

Hive Streaming API leaves table in an inconsistent state

2016-03-10 Thread Igor Kuzmenko
Hello, I'm using Hortonworks Data Platform 2.3.4, which includes Apache Hive 1.2.1 and Apache Storm 0.10. I've built a Storm topology using the Hive bolt, which ultimately uses the Hive Streaming API to stream data into a Hive table. In Hive I've created a transactional table: 1. CREATE EXTERNAL TABLE cdr1