Re: how to create hive table name with special character

2022-06-24 Thread Peter Vary
We try to make these table names work. By my understanding the `.` is still problematic, but the others should work, at least with 4.0.0. Not sure about the other versions though. TBH we do not have many tests around this, so there could be some issues. > On 2022. Jun 24., at 7:32, second_co..

Re: Hive unable to Launch job to spark

2022-05-31 Thread Peter Vary
Hi Prasanth, I would suggest not investing too heavily in Hive on Spark. In recent years there was no movement around the feature and it will be removed in Hive 4.0.0. Thanks, Peter > On 2022. May 27., at 13:00, Prasanth M Sasidharan wrote: > > Hello team, > > I am trying to use spark as the

Re: time travel using hive cli

2022-05-20 Thread Peter Vary
his in the future? Any issue ticket in github tracking this? > > On Friday, May 20, 2022, 03:28:04 PM GMT+8, Peter Vary > wrote: > > > Time travel for Hive is only for Iceberg tables: > https://issues.apache.org/jira/browse/HIVE-25344 > <https://issues.apache

Re: time travel using hive cli

2022-05-20 Thread Peter Vary
Time travel for Hive is only for Iceberg tables: https://issues.apache.org/jira/browse/HIVE-25344 The syntax is: SELECT * FROM t FOR SYSTEM_TIME AS OF ; SELECT * FROM t FOR SYSTEM_VERSION AS OF ; Currently only Hive 4.0.0-alpha-1 release support
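As a hedged illustration of the two forms above (not taken from the thread): the table name `t` comes from the original message, while the timestamp and snapshot ID below are made-up placeholder values. This assumes `t` is an Iceberg-backed table on a Hive build that supports time travel.

```sql
-- Read the table as it was at a given point in time (placeholder timestamp).
SELECT * FROM t FOR SYSTEM_TIME AS OF '2022-05-01 00:00:00';

-- Read a specific Iceberg snapshot (placeholder snapshot ID).
SELECT * FROM t FOR SYSTEM_VERSION AS OF 1234567890123456789;
```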

Re: Issue with the "hive.io.file.readcolumn.names" property

2022-05-19 Thread Peter Vary
Hi Julien, I am not sure about the MR codepath, but I seem to remember a case where the MR plan was optimized in a way that the table is only read once (with a wrong configuration) instead of twice with different configuration. When I asked around it was said that the issue was fixed for Tez. T

Re: [DISCUSS] End of life for Hive 1.x, 2.x, 3.x

2022-05-10 Thread Peter Vary
for now since the core classified artifact has >been removed and the shading issue has to be solved before they can >consume the new jar. > >On Mon, May 9, 2022 at 4:10 AM Peter Vary wrote: >> >> Hi Team, >> >> My experience with the Iceberg c

Re: [DISCUSS] End of life for Hive 1.x, 2.x, 3.x

2022-05-09 Thread Peter Vary
.x and 3.x, and > periodically they may need new fixes in these. Upgrading them to use > 4.x seems not an option for now since the core classified artifact has > been removed and the shading issue has to be solved before they can > consume the new jar. > > On Mon, May 9, 2022 at 4

Re: [DISCUSS] End of life for Hive 1.x, 2.x, 3.x

2022-05-09 Thread Peter Vary
Hi Team, My experience with the Iceberg community shows that there is a sizeable userbase around Hive 2.x. I have seen patches, contributions to Hive 2.3.x branches, and the tests are in much better shape there. I would definitely vote for EOL Hive 1.x, but until we have a stable 4.x, I wo

Re: AvroSerde's inferred schema for NOT NULL columns

2022-05-05 Thread Peter Vary
Hi Julien, Your question again reminds me about the things we were facing with Iceberg :) In Iceberg there is a possibility to define `required` columns, and we thought it would be a good idea to convert these columns to `NOT NULL` columns in Hive. We tried to use the HiveMetaHook API to do this

Re: Custom OutputCommitter not called by Tez

2022-05-02 Thread Peter Vary
ur question, I'm currently working on a rewrite of the > > Hive-BigQuery connector ( > > https://github.com/GoogleCloudDataproc/hive-bigquery-storage-handler > > <https://github.com/GoogleCloudDataproc/hive-bigquery-storage-handler>). > > I'll > > be happy

Re: Using IntelliJ debugger with Tez

2022-04-28 Thread Peter Vary
Hi Julien, You could set the java options for the TezAM to start in debug mode. IIRC it could be done by tez.am.java.opts Thanks, Peter > On 2022. Apr 27., at 19:16, Julien Phalip wrote: > > Hi, > > I'm able to successfully use the IntelliJ debugger and set breakpoints with > Hive while u
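A minimal sketch of that suggestion, assuming the setting is applied in the Hive session before the Tez AM is launched. The JDWP agent string is standard JVM syntax; port 5005 and `suspend=n` are arbitrary choices, and overriding `tez.am.java.opts` replaces whatever options the cluster configured there by default.

```sql
-- Ask the Tez AM JVM to listen for a remote debugger (the port is an assumption).
SET tez.am.java.opts=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005;
```

A remote debug run configuration in the IDE can then attach to the AM host on that port.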

Re: Custom OutputCommitter not called by Tez

2022-04-28 Thread Peter Vary
; Julien > > On 2022/04/27 08:59:08 Peter Vary wrote: > > We had the same issue with the IcebergOutputCommitter. > > > > The first solution was this: > > https://issues.apache.org/jira/browse/HIVE-25006 > > <https://issues.apache.org/jira/browse/HIVE-2500

Re: Custom OutputCommitter not called by Tez

2022-04-27 Thread Peter Vary
We had the same issue with the IcebergOutputCommitter. The first solution was this: https://issues.apache.org/jira/browse/HIVE-25006 It needed https://issues.apache.org/jira/browse/TEZ-4279 Later

Re: Time to Remove Hive-on-Spark

2022-04-12 Thread Peter Vary
+1 from my side too. I have created a PR against the current branch. Still needs some work, and as many reviews as possible, because it is quite big, and I might have made some mistakes https://issues.apache.org/jira/browse/HIVE-26134 https://github.com/apache/hive/pull/3201 Thanks, Peter On Thu, 10 Fe

Re: Next gen metastore

2022-04-04 Thread Peter Vary
Hi Edward, We are currently working on integrating Apache Iceberg tables to Hive. In the latest released Hive 4.0.0-alpha-1 it is possible to create tables backed by Iceberg tables, and those could be queried by Hive. You can define the partitioning using Iceberg specification like this: CREATE

Re: Request write access to the Hive wiki

2022-04-01 Thread Peter Vary
Could you please check your rights now? Thanks, Peter > On 2022. Mar 30., at 22:09, Yu-Wen Lai wrote: > > Hello folks! > > We recently landed a patch regarding HS2 JWT authentication. > https://issues.apache.org/jira/browse/HIVE-25575 > > >

[ANNOUNCE] Apache Hive 4.0.0-alpha-1 Released

2022-03-30 Thread Peter Vary
The Apache Hive team is proud to announce the release of Apache Hive version 4.0.0-alpha-1 The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides, among others: * Tools to ena

Re: write access to the wiki

2022-03-03 Thread Peter Vary
Hi Alessandro, I have added you to the wiki. Could you please check it? Thanks, Peter > On 2022. Mar 3., at 13:29, Alessandro Solimando > wrote: > > Hello, > I have recently gone through regenerating some thrift files, and I have > noticed that the info in the wiki is outdated, I'd like to c

Re: Add Hive AUX Jars across the cluster

2021-04-12 Thread Peter Vary
Hi Abhishek, You might want to take a look at https://tez.apache.org/install.html . I think the interesting part starts with "Various ways to configure tez.lib.uris". In our case we had to update the tez tarball on HDFS by adding the new jars to it, but it r

Re: Maintaining Hive 2 and 3 branches,

2021-03-18 Thread Peter Vary
Thanks Sungwoo for sharing this! A few questions: - Are these patches you mention below bugfixes, or new features on Hive 3.1.3? (This might be a typo as I think the last Hive release is 3.1.2) - Could you backport these patches to the apache branch-3, and branch-3.1? - Is there any reason not to

Re: Any plan for new hive 3 or 4 release?

2021-02-25 Thread Peter Vary
Hi Lee, When I started to work on Hive around 4 years ago, MR was already set as deprecated. So you definitely should scan even older archives. For Iceberg integration, it would be good to have more frequent releases for Hive as well. Thanks, Peter Lee Ming-Ta wrote (date: 2021 Feb.

Re: How useful are tools for Hive data modeling

2020-11-11 Thread Peter Vary
Hi Mich, Index support was removed from hive: https://issues.apache.org/jira/browse/HIVE-21968 https://issues.apache.org/jira/browse/HIVE-18715 Thanks, Peter > On Nov 11, 2020, at 17:25, Mich

Re: Hive SQL extension

2020-11-02 Thread Peter Vary
(Astronomical Data Query Language) is a SQL-like language that defines some >>> higher-level functions that enable powerful geospatial queries. Projects >>> like queryparser <https://github.com/aipescience/queryparser >>> <https://github.com/aipescience/queryparser>>

Re: Hive SQL extension

2020-10-22 Thread Peter Vary
a job of the > parser. > > Best, > Stamatis > > > On Thu, Oct 22, 2020 at 9:49 AM Peter Vary <mailto:pv...@cloudera.com>> wrote: > Hi Hive experts, > > I would like to extend Hive SQL language to provide a way to create Iceberg > partitioned table

Hive SQL extension

2020-10-22 Thread Peter Vary
Hi Hive experts, I would like to extend Hive SQL language to provide a way to create Iceberg partitioned tables like this: create table iceberg_test( level string, event_time timestamp, message string, register_time date, telephone array ) partitio

Re: compaction issue: Compaction cannot compact above this txnid

2020-06-02 Thread Peter Vary
es regularly. > If we can reduce the duration for some big Merge queries and make auto > compaction work properly it should be ok. > Problem: at the moment, compactions are not triggered automatically. Do you > have an idea ? > > On Tue, Jun 2, 2020 at 12:57, Peter Vary <mai

Re: compaction issue: Compaction cannot compact above this txnid

2020-06-02 Thread Peter Vary
e, the only way is to reduce the time it takes for these "merge" queries > in order to cancel locks and related transactions. Am I right ? > > On Tue, Jun 2, 2020 at 11:52, Peter Vary <mailto:pv...@cloudera.com>> wrote: > Hi David, > > I think this jira des

Re: compaction issue: Compaction cannot compact above this txnid

2020-06-02 Thread Peter Vary
Hi David, I think this jira describes your situation: https://issues.apache.org/jira/browse/HIVE-16360 "The reason is that compactor won't compact anything above the level of an open transaction. So if there is a very long running txn, it may
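A rough sketch of how one might check for this situation from a Hive session, assuming the ACID transaction manager (DbTxnManager) is in use and the Hive version supports these statements. The transaction ID and table name are hypothetical.

```sql
-- List open and aborted transactions; look for a long-running open one.
SHOW TRANSACTIONS;

-- See what the compactor has queued or already processed.
SHOW COMPACTIONS;

-- If a stale transaction is blocking compaction, it can be aborted (hypothetical ID).
ABORT TRANSACTIONS 12345;

-- Then request a compaction explicitly (hypothetical table name).
ALTER TABLE my_acid_table COMPACT 'major';
```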

Re: Count bug in Hive 3.0.0.3.1

2020-04-28 Thread Peter Vary
Hi Deepak, If I were you, I would test your repro case on the master branch. If it is fixed, I think you should try to find the fix which solves the problem and cherry-pick the fix to branch-3 and branch-3.1 so the fix is there in the next release. If the problem is still present on the master

Re: Exclusive locks acquired at the hive table has not got released automatically when hive services gets restarted

2020-03-10 Thread Peter Vary
Hi, Quick questions: - Which version of Hive are you using? - What type of LockManager is used? ZKLockManager is using ephemeral nodes to remove stale locks, but the default timeout is a bit high. DBLockManager uses heartbeat to identify stale locks, but you need an AcidHouseKeeperService to clean

Re: ORC: duplicate record - rowid meaning ?

2020-02-25 Thread Peter Vary
problem. Thx > I'll keep you in touch > > On 2020/02/06 09:42:39, Peter Vary <mailto:pv...@cloudera.com>> wrote: > > Hi David, > > > > I am more familiar with ACID v2 :( > > What I would do is to run an update operation with your version of Hive

Re: ORC: duplicate record - rowid meaning ?

2020-02-06 Thread Peter Vary
originalTransaction,bucketId,rowId ascendingly and currentTransaction > descendingly. It works pretty well except for some tables with lot of updates. > The only thing I can see at the moment it is the fact that I mix different > types of operations in one bucket. The Merge query for example

Re: ORC: duplicate record - rowid meaning ?

2020-02-05 Thread Peter Vary
that write Hive Delta Files for the managed tables >> directly from Flink. >> The current streaming apis for Hive 2 are not suitable for our needs and we >> cannot use the new Hive 3 streaming api yet. This system uses the Flink >> state to store Hive metadata (originalTransact

Re: Performance issue with hive metastore

2020-01-31 Thread Peter Vary
cale by using multiple threads even when > there's only one client machine and one server is involved. Do you know any > other spark config that can help with this? > > Do you think if I use hive jdbc instead of spark to submit these queries in > parallel they will execute

Re: Performance issue with hive metastore

2020-01-30 Thread Peter Vary
Hi Nirav, There are several configurations which could affect the number of parallel queries running in your environment depending on your Hive version. Thrift client is not thread safe and this causes a bottleneck in the client - HS2, and HS2 - HMS communication. Hive solves this by creating its

Re: Apache Iceberg integration

2020-01-09 Thread Peter Vary
Hi Elliot, I think it would be really worthwhile to have Iceberg integration with Hive. Minimally for reading through the available interfaces, then handling schema evolution / schema synchronization etc. Later having the possibility to write to an Iceberg table would be good as well, but integrati

Re: ORC: duplicate record - rowid meaning ?

2019-12-01 Thread Peter Vary
for your reply because yes, when files are ordered by > originalTransacion, bucket, rowId > it works ! I just have to use 1 transaction instead of 2 at the moment and it > will be ok. > > Thanks > David > > On 2019/11/29 11:18:05, Peter Vary wrote: >> Hi D

Re: Update Performance in Hive with data stored as Parquet, ORC

2019-11-29 Thread Peter Vary
Hi Shivam, There were a lot of changes around ACID with the Hive 3.0 release. I assume below, that your question is about Hive 3.x release. Hive ACID v2 implements UPDATE as deleting the old row, and creating a new one for performance reasons. See Eugene's nice presentation for the details: http

Re: ORC: duplicate record - rowid meaning ?

2019-11-29 Thread Peter Vary
Hi David, Not entirely sure what you are doing here :), my guess is that you are trying to write ACID tables outside of hive. Am I right? What is the exact use-case? There might be better solutions out there than writing the files by hand. As for your question below: Yes, the files should be or

Re: review the code question

2019-10-31 Thread Peter Vary
Hi! There is a wiki page outlining the way how to contribute. See: https://cwiki.apache.org/confluence/display/Hive/HowToContribute Thanks, Peter > On Oct 30, 2019, at 09:28, 阿伦 <849551...@qq.com> wrote: > > Hello: > >How

Re: So many SQL ROLLBACK commands on the Hive PostgreSQL table

2019-10-10 Thread Peter Vary
Souza > wrote: > > Hi Peter, thanks for the support. > > Every select count(*) from TXNS where txn_state = 'o' runs fine if I run it > from psql. No ROLLBACK happens after. > > Maybe the result , I’m only seeing 0 (zero), trigger the ROLLBACK command

Re: So many SQL ROLLBACK commands on the Hive PostgreSQL table

2019-10-09 Thread Peter Vary
Hi Fernando, Checked the compaction_queue related one, and that is definitely normal. Checked the txn related one, and that seems more interesting. I would try to run the query above against your HMS DB - my guess is that it is failing with some error. Peter > On Oct 9, 2019, at 12:56, Antunes, Fern

Re: Support Parquet through HCatalog

2019-09-27 Thread Peter Vary
cket. > Would you be able to do it for me, please? > > Thanks, > > Jay > > From: Peter Vary > Reply-To: "user@hive.apache.org" > Date: Thursday, 26 September 2019 at 18:07 > To: "user@hive.apache.org" > Subject: Re: Support Parquet th

Re: Support Parquet through HCatalog

2019-09-26 Thread Peter Vary
Hi Jay, I suggest opening a new jira if the patch does not cleanly apply to the branches you need. Thanks, Peter > On Sep 26, 2019, at 15:41, Jay Green Stevens wrote: > > Hi all, > > I’ve been asked to work with backporting the patch HIVE-8838.4.patch (from > HIVE-8838

Re: Questions about HIVE-20508

2019-09-20 Thread Peter Vary
Hi Julien, See my answers below: > On Sep 19, 2019, at 21:55, Julien Phalip wrote: > > Hi, > > I'm interested in a new config property that was added as part of HIVE-20508 > , and had a few questions: > > 1) The update was merged >

Re: Are Hive locks really ephemeral ?

2018-11-17 Thread Peter Vary
Hi Rajesh, When the queries are locking tables/partitions they are using the acquireLocks method. If you are using DummyTxnManager then it acquires locks with keepAlive false. When using manual locking then that is a different story, and most probably you are right when you say that it creates

Re: UDFJson cannot Make Progress and Looks Like Deadlock

2018-07-25 Thread Peter Vary
ing this out, it makes the problem clear. > > > For a quick workaround and low cost without upgrading, I'm considering > to reimplement the UDF get_json_object to a new name to avoid the problem. > > > > Thanks > Guizhou > -- > *From:* Pet

Re: UDFJson cannot Make Progress and Looks Like Deadlock

2018-07-24 Thread Peter Vary
Hi Guizhou, I would guess, that this is caused by: HIVE-16196 UDFJson having thread-safety issues Try to upgrade to a CDH version where this patch is already included (5.12.0 or later) Regards, Peter > On Jul 24, 2018, at 10:15, Proust (Feng

Re: Alter table using Thrift client

2018-07-12 Thread Peter Vary
wrote: > > Hi Peter, > > client.setMetaConf("metastore.disallow.incompatible.col.type.changes", > "false") gets me the same error: > > MetaException(message:Invalid configuration key > metastore.disallow.incompatible.col.type.changes) > > Thanks > > On Wed, Jul 11, 2018 at

Re: Alter table using Thrift client

2018-07-11 Thread Peter Vary
t; hive.metastore.disallow.incompatible.col.type.changes) > > Am I not setting the property correctly, or will I have to upgrade the hive > server to version 2.4/3.0 as well? > > Thanks! > > On Wed, Jul 11, 2018 at 4:56 AM, Peter Vary <mailto:pv...@cloudera.com>> w

Re: Alter table using Thrift client

2018-07-11 Thread Peter Vary
Hi Sylvester, You can set this specific configuration value per session since HIVE-17832 - Fixed in: Hive 3.0.0, Hive 2.4.0 So you can change this value through thrift if your metastore version is at least 2.4.0 or 3.0.0, but not with 1.2.1 If you want to use 1.2.1 version
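For comparison, a hedged sketch of the same per-session override issued from a Hive session rather than a raw Thrift client, using the `metaconf:` prefix. Per the message above, this is only expected to be accepted by metastore versions that include HIVE-17832.

```sql
-- Session-level metastore override; older metastores reject the key as invalid.
SET metaconf:hive.metastore.disallow.incompatible.col.type.changes=false;
```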

Re: A misspelling about hive.server2.webui.use.spnego

2018-06-07 Thread Peter Vary
Hi, That was a documentation error. Fixed it. Thanks for reporting Kylin Jin! Peter > On Jun 7, 2018, at 9:12 AM, Kylin Jin wrote: > > hi > In Hive Configuration Properties: > > https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties >

Re: Jimmy Xiang now a Hive PMC member

2017-05-25 Thread Peter Vary
Congratulations Jimmy! > On May 25, 2017, at 6:16 AM, Xuefu Zhang wrote: > > Hi all, > > It's an honor to announce that Apache Hive PMC has recently voted to invite > Jimmy Xiang as a new Hive PMC member. Please join me in congratulating him > and looking forward to a bigger role that he will

Re: Welcome Rui Li to Hive PMC

2017-05-25 Thread Peter Vary
Congratulations Rui! > On May 25, 2017, at 6:19 AM, Xuefu Zhang wrote: > > Hi all, > > It's an honor to announce that Apache Hive PMC has recently voted to invite > Rui Li as a new Hive PMC member. Rui is a long time Hive contributor and > committer, and has made significant contribution in Hiv

Re: operation log is missing when using hive.execution.engine=mr

2017-05-15 Thread Peter Vary
Hi Jessica, Is it possible that you are affected by this? https://issues.apache.org/jira/browse/HIVE-16061 Thanks, Peter On May 15, 2017, 19:44, "Jie Zhang" wrote: Hi, My team just upgraded Hive from 0.14.0 to 2.1.1. The operation log is missing when running the query, no query progress i

Re: Welcome new Hive committer, Zhihai Xu

2017-05-05 Thread Peter Vary
Congratulations Zhihai! On May 5, 2017, 18:52, "Xuefu Zhang" wrote: > Hi all, > > I'm very pleased to announce that Hive PMC has recently voted to offer > Zhihai a committership which he accepted. Please join me in congratulating > on this recognition and thanking him for his contributions to H

Re: [ANNOUNCE] New Hive Committer - Rajesh Balamohan

2016-12-15 Thread Peter Vary
Congratulations Rajesh! > On Dec 15, 2016, at 6:40 AM, Rui Li wrote: > > Congratulations :) > > On Thu, Dec 15, 2016 at 6:50 AM, Gunther Hagleitner < > ghagleit...@hortonworks.com> wrote: > >> Congrats Rajesh! >> >> From: Jimmy Xiang >> Sent: Wednesday

Re: Can I specify database name in hive metastore service?

2016-10-27 Thread Peter Vary
make the metastore db stored in the same RDBMS in a different > database? > From: Peter Vary > Sent: 2016-10-26 21:11:26 > To: Huang Meilong > Cc: user@hive.apache.org > Subject: Re: Can I specify database name in hive metastore service? > > Hi Huang, > > Accordin

Re: Can I specify database name in hive metastore service?

2016-10-26 Thread Peter Vary
F-8" > and > "jdbc:mysql://x/hivemeta_2?createDatabaseIfNotExist=true&characterEncoding=UTF-8", > will the two metastore services work fine? > > In short, I want to use the same RDBMS database for the two hive metastore > services, and the meta data is isola

Re: Can I specify database name in hive metastore service?

2016-10-25 Thread Peter Vary
Hi Huang, Hive metastore is a component of the "Hive database". See: https://cwiki.apache.org/confluence/display/Hive/Design The metastore uses traditional RDBMS to store "the structure information of the various tables and partitions in the warehouse". The javax.jdo.option.ConnectionURL and the

Re: [ANNOUNCE] New PMC Member : Pengcheng

2016-07-18 Thread Peter Vary
Congratulations Pengcheng! > On Jul 18, 2016, at 6:55 AM, Wei Zheng wrote: > > Congrats Pengcheng! > > Thanks, > > Wei > > > > > > > On 7/17/16, 16:01, "Xuefu Zhang" wrote: > >> Congrats, PengCheng! >> >> On Sun, Jul 17, 2016 at 2:28 PM, Sushanth Sowmyan >> wrote: >> >>> Welcome ab

Re: [ANNOUNCE] New PMC Member : Jesus

2016-07-18 Thread Peter Vary
Congratulations Jesus! > On Jul 18, 2016, at 6:55 AM, Wei Zheng wrote: > > Congrats Jesus! > > Thanks, > > Wei > > > > > > > > On 7/17/16, 14:29, "Sushanth Sowmyan" wrote: > >> Good to have you onboard, Jesus! :) >> >> On Jul 17, 2016 12:00, "Lefty Leverenz" wrote: >> >>> Congratul

Re: [Announce] New Hive Committer - Mohit Sabharwal

2016-07-01 Thread Peter Vary
Congratulations Mohit! On Jul 1, 2016, 19:10, "Vihang Karajgaonkar" wrote: > Congratulations Mohit! > > > On Jul 1, 2016, at 10:05 AM, Chao Sun wrote: > > > > Congratulations Mohit! Good job! > > > > Best, > > Chao > > > > On Fri, Jul 1, 2016 at 9:57 AM, Szehon Ho

Re: LINES TERMINATED BY only supports newline '\n' right now

2016-06-02 Thread Peter Vary
Hi, According to the documentation you should write and set your own InputFormat when creating the table. Mike Sukmanowsky solved a similar problem here, this might help you: http://stackoverflow.com/questions/7692994/custom-inputformat-with-hive Regards, Peter For some of the columns '\n' chara