Re: Query Failures

2020-02-14 Thread David Mollitor
https://community.cloudera.com/t5/Support-Questions/Map-and-Reduce-Error-Java-heap-space/td-p/45874

Re: Query Failures

2020-02-14 Thread David Mollitor
Hive has many optimizations. One is that it will load the data directly from storage (HDFS) if it's a trivial query. For example: Select * from table limit 10; In natural language it says "give me any ten rows (if available) from the table." You don't need the overhead of launching a full
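The shortcut described above is governed by Hive's fetch-task conversion setting; a minimal sketch (the table name is invented, and the property's default value varies by Hive version):

```sql
-- Let trivial SELECT/LIMIT/filter queries bypass MapReduce/Tez entirely
-- and read straight from storage (values: none | minimal | more).
SET hive.fetch.task.conversion=more;

-- A query this simple is served by a fetch task, not a full job:
SELECT * FROM some_table LIMIT 10;
```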

Re: Issues with aggregating on map values

2020-02-12 Thread Nakul Khanna (BLOOMBERG/ LONDON)

Re: Issues with aggregating on map values

2020-02-12 Thread Zoltan Haindrich
Hey Nakul! It's not clear which version you are using; I've checked this issue on apache/master and the 3.1.2 release, and both of them returned accurate results. You could execute 'select version();' or run 'hive --version' on the command line. Cheers, Zoltan On 2/11/20 11:38 AM, Nakul Khanna

Re: Query Failures

2020-02-11 Thread Pau Tallada
Hi, Do you have more complete tracebacks? Message from Charles Givre on Tue, 11 Feb 2020 at 2:54: > Hello Everyone! > I recently joined a project that has a Hive/Impala installation and we are > experiencing a significant number of query failures. We are using an older > version of

Re: Is there any way to find Hive query to Datanucleus queries mapping

2020-02-11 Thread Chinna Rao Lalam
Thanks Zoltan for the prompt reply. I have checked the code with your insights. Yes, with this call we can get information like the following. Using this data we can add a log for each Hive SQL statement showing overall how much time was spent in metadata operations. metadata.Hive: Time spent in each metastore function

Re: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large.

2020-02-11 Thread Bernard Quizon
Hi. We fixed the issue by patching protobuf-java-2.5.0.jar, we changed CodedInputStream.DEFAULT_SIZE_LIMIT to 1GB. Uploaded the patched version on our servers and added the location of the aforementioned jar to the *tez.cluster.additional.classpath.prefix* (tez-site.xml) to

Re: Is there any way to find Hive query to Datanucleus queries mapping

2020-02-10 Thread Zoltan Haindrich
Hey Chinna! I don't think a mapping like that is easy to get...I would rather try to narrow down to a single call which consumes most of the time. There is a log message which can help you get to the most relevant metastore call:

Re: ORC: duplicate record - rowid meaning ?

2020-02-06 Thread David Morin
OK, Peter, no problem. Thanks, I'll keep you in touch. On 2020/02/06 09:42:39, Peter Vary wrote: > Hi David, > > I'm more familiar with ACID v2 :( > What I would do is to run an update operation with your version of Hive and > try to see how it handles this case. > > Would be nice to hear back from

Re: ORC: duplicate record - rowid meaning ?

2020-02-06 Thread Peter Vary
Hi David, I'm more familiar with ACID v2 :( What I would do is to run an update operation with your version of Hive and try to see how it handles this case. Would be nice to hear back from you if you found something. Thanks, Peter > On Feb 5, 2020, at 16:55, David Morin wrote: > > Hello, > >

Re: ORC: duplicate record - rowid meaning ?

2020-02-05 Thread David Morin
Hello, Thanks. In fact I use HDP 2.6.5 and the previous Orc version, with transactionid for example and the update flag. Sorry, with the row__id it would have been easier. So, here is the Orc files' content (with hive --orcfiledump): hive --orcfiledump

Re: ORC: duplicate record - rowid meaning ?

2020-02-05 Thread Peter Vary
Hi David, There is no tombstone for the updated record. In ACID v2 there is no update for the rows, only insert and delete. So an update is handled as delete (old) row, insert (new/independent) row. The delete is stored in the delete delta directories, and the file does not have to contain the
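The delete-plus-insert behaviour described here can be illustrated with a small sketch; the table and column names are invented, and the exact directory names depend on the transaction ids:

```sql
-- Transactional (ACID) table; requires ACID support enabled on the cluster.
CREATE TABLE t (id INT, val STRING)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- In ACID v2 this UPDATE writes a delete event for the old row into a
-- delete_delta_* directory and the new row into a delta_* directory;
-- no row is rewritten in place.
UPDATE t SET val = 'new' WHERE id = 1;
```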

Re: ORC: duplicate record - rowid meaning ?

2020-02-05 Thread David Morin
Hi, It works pretty well, but... problems still sometimes occur. Do we have to separate operations? Here is the Orc files' content: hive --orcfiledump hdfs:///delta_0198994_0198994_/bucket_0

Re: rename output error during hive query on AWSs3-external table

2020-02-04 Thread Sungwoo Park
Not a solution, but looking at the source code of S3AFileSystem.java (Hadoop 2.8.5), I think the exception raised inside S3AFileSystem.rename() is swallowed and only a new HiveException is reported. So, in order to find out the root cause, I guess you might need to set the log level to DEBUG and see

RE: rename output error during hive query on AWSs3-external table

2020-02-04 Thread Aaron Grubb
Check this thread: https://forums.aws.amazon.com/thread.jspa?messageID=922594 From: Souvikk Roy Sent: Tuesday, February 4, 2020 3:06 AM To: user@hive.apache.org Subject: rename output error during hive query on AWSs3-external table Hello, We are using some external tables backed by aws S3.

Re: Performance issue with hive metastore

2020-01-31 Thread Peter Vary
Hi Nirav, I am not sure how spark uses Hive. If the ALTER TABLE sql is issued through Hive then Spark is not connecting directly to the HMS, but it connects to HS2 instead. If it is using only HMS uri, then the sql is translated inside Spark, and only metastore calls are sent to the HMS. This

Re: Performance issue with hive metastore

2020-01-30 Thread Nirav Patel
Thanks for responding Peter. It indeed seems like one session per client (we can see in every log record - source:10.250.70.14 ). I don't create a session with the hive thrift server. Spark basically requires the property "hive.metastore.uris" in the spark config, which we set to "thrift://hivebox:9083"

Re: Performance issue with hive metastore

2020-01-30 Thread Peter Vary
Hi Nirav, There are several configurations which could affect the number of parallel queries running in your environment, depending on your Hive version. The Thrift client is not thread safe, and this causes a bottleneck in the client - HS2, and HS2 - HMS communication. Hive solves this by creating its

Re: UDF timestamp columns

2020-01-28 Thread Nicolas Paris
Thanks Shawn. Finally I got it working with java.sql.Timestamp. Indeed the version of hive is important. However I am using hive UDFs with apache spark. Paradoxically, spark-sql only handles hive UDFs. It does not handle spark UDFs (they only apply in spark - not spark-sql). On Wed, Jan 22, 2020 at

RE: Hive exception: Class org.apache.hadoop.fs.adl.AdlFileSystem not found

2020-01-27 Thread FABIAN Juan-antonio
Hello, just an update. I finally got it working. There were two issues: * Even though we're not using HDFS, we need the core-site.xml, where we define the storage layer (minio in our case). * Apart from that, before the PutHiveQL, the queue still contained some elements pointing to an

Re: UDF timestamp columns

2020-01-22 Thread Shawn Weeks
Depending on your version of Hive, you are looking for TimestampWritable or one of its related classes. Thanks Shawn On 1/22/20, 6:51 AM, "Nicolas Paris" wrote: Hi I cannot find the way to implement a hive UDF dealing with the timestamp type. I tried both java.sql.Timestamp and

Re: If Hive Metastore is compatible with MariaDB version 10.x?

2020-01-20 Thread Zoltan Haindrich
Hello, Locally I use Mariadb 10.4.8 when I validate metastore schema/etc changes. So far, I've not uncovered any issues with it... I'm planning to integrate some kind of smoke tests against all the supported DBs to help uncover metastore related issues earlier. To evaluate that we have

Re: If Hive Metastore is compatible with MariaDB version 10.x?

2020-01-17 Thread Alan Gates
Hive is tested against MariaDB 5.5, so I can't say whether it will work against version 10. You would need to do some testing with it to see. Alan. On Fri, Jan 17, 2020 at 4:29 AM Oleksiy S wrote: > Hi all. > > Could you please help? Customer asked if Hive Metastore is compatible with >

Re: Alternatives to Streaming Mutation API in Hive 3.x

2020-01-16 Thread Nandakishore Mm
Here is some more context of what we are trying out. We have streaming mutable data in kafka and we want to stream these upserts into Hive (HDFS backed). I gather that Hive has a streaming interface through which upserts can be streamed into Hive. Have you guys tested this and/or done some POC

Re: Why Hive uses MetaStore?

2020-01-15 Thread Akash Mishra
sorry, I still don't > understand. Can you elaborate more? > > > -- Original Message -- > *From:* "David Mollitor"; > *Sent:* Wednesday, January 15, 2020, 11:01 PM > *To:* "user"; > *Subject:* Re: Why Hive uses MetaStore? > > In the beginning, hive

Re: Why Hive uses MetaStore?

2020-01-15 Thread David Mollitor
In the beginning, hive was a command line tool. All the heavy lifting happened on the user's local box. If a user wanted to execute hive from their laptop, or a server, it always needed access to the list of available tables (and their schemas and their locations); otherwise every SQL script

Re: Alternatives to Streaming Mutation API in Hive 3.x

2020-01-14 Thread Nandakishore Mm
Hi David, Thanks for the response. I'm actually trying to do streaming upserts into hive. Since we already use Hive to perform our analytics we are looking for solutions based around Hive itself. Also as you mentioned Hive 3.x for upserts, could you point me to something specific in Hive that

Re: Which version of Hive support : creating Procedure

2020-01-13 Thread Suresh Kumar Sethuramaswamy
Hi Raviprasad, Though Hive procedural SQL was introduced as part of Hive 2.0.0, tracked against HIVE-11055, it is an unsupported feature in CDH6 (Release Notes

Re: Alternatives to Streaming Mutation API in Hive 3.x

2020-01-13 Thread David Mollitor
Hello, Streaming? NiFi Upserts? HBase, Kudu, Hive 3.x Doing upserts on Hive can be cumbersome, depending on the use case. If Upserts are being submitted continuously and quickly, it can overwhelm the system because it will require a scan across the data set (for all intents and purposes) for

Re: OutOfMemoryError after loading lots of dynamic partitions

2020-01-09 Thread Suresh Kumar Sethuramaswamy
That's awesome. Thanks Suresh Sethuramaswamy On Thu, Jan 9, 2020 at 2:00 PM Patrick Duin wrote: > Thanks Suresh, changing the heap was our first guess as well actually. I > think we were on the right track there. Weird thing is that our jobs seems > to now run fine (all partitions are added)

Re: OutOfMemoryError after loading lots of dynamic partitions

2020-01-09 Thread Patrick Duin
Thanks Suresh, changing the heap was our first guess as well, actually. I think we were on the right track there. The weird thing is that our jobs now seem to run fine (all partitions are added) despite still giving this error. Weird, but it seems to be OK now. Thanks for the help. On Wed, Jan 8, 2020

Re: Apache Iceberg integration

2020-01-09 Thread Feng Lu
For someone like me who is new to the hive community, is there a (semi-)formal process on contributing a large-scale feature like Hive-Iceberg integration? For example, hive improvement proposal, community voting, development and code review, release, etc. Thank you and sorry for derailing this

Re: Apache Iceberg integration

2020-01-09 Thread Peter Vary
Hi Elliot, I think it would be really worthwhile to have Iceberg integration with Hive. Minimally for reading through the available interfaces, then handling schema evolution / schema synchronization etc. Later, having the possibility to write to an Iceberg table would be good as well, but

Re: OutOfMemoryError after loading lots of dynamic partitions

2020-01-08 Thread Suresh Kumar Sethuramaswamy
Thanks for the query and the hive options. It looks like the JVM heap space for the Hive CLI is running out of memory, as per the EMR documentation https://aws.amazon.com/premiumsupport/knowledge-center/emr-hive-outofmemoryerror-heap-space/ On Wed, Jan 8, 2020 at 11:38 AM Patrick Duin wrote: > The

Re: OutOfMemoryError after loading lots of dynamic partitions

2020-01-08 Thread Patrick Duin
The query is rather large and won't tell you much (it's generated). It comes down to this: WITH gold AS ( select * from table1), delta AS (select * from table2) INSERT OVERWRITE TABLE my_db.temp__v1_2019_12_03_182627 PARTITION (`c_date`,`c_hour`,`c_b`,`c_p`) SELECT * FROM gold UNION
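Reformatted, the statement has the shape below, shown with the SET statements such a many-partition dynamic load usually needs. The UNION ALL completion and the limit values are assumptions; the table and partition names follow the snippet:

```sql
-- Allow dynamic partitioning without a static leading partition key;
-- raise the partition limits for a ~3000-partition insert (illustrative values).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=5000;
SET hive.exec.max.dynamic.partitions.pernode=5000;

WITH gold  AS (SELECT * FROM table1),
     delta AS (SELECT * FROM table2)
INSERT OVERWRITE TABLE my_db.temp__v1_2019_12_03_182627
PARTITION (`c_date`, `c_hour`, `c_b`, `c_p`)
SELECT * FROM gold
UNION ALL
SELECT * FROM delta;
```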

Re: OutOfMemoryError after loading lots of dynamic partitions

2020-01-08 Thread Suresh Kumar Sethuramaswamy
Could you please post your insert query snippet along with the SET statements ? On Wed, Jan 8, 2020 at 11:17 AM Patrick Duin wrote: > Hi, > I got a query that's producing about 3000 partitions which we load > dynamically (On Hive 2.3.5). > At the end of this query (running on M/R which runs

Re: HIVE-2.4 release plans

2020-01-08 Thread Oleksiy S
Thanks for answering. It would be nice to have Hive-2.4.0, but the versioning is up to you. Waiting for new Hive! On Fri, Jan 3, 2020 at 11:31 AM Mass Dosage wrote: > +1 for this, or for a Hive 2.3.7 release. We are blocked from releasing > some of our projects which use Hive 2.3.x on Java >8

Re: HIVE-2.4 release plans

2020-01-03 Thread Mass Dosage
+1 for this, or for a Hive 2.3.7 release. We are blocked from releasing some of our projects which use Hive 2.3.x on Java >8 due to https://issues.apache.org/jira/browse/HIVE-21508 which we helped get merged but it hasn't been released yet. Similarly we'd like to be able to use some Parquet

Re: How to build Hive HA?

2020-01-02 Thread Aaron Grubb

Re: How to build Hive HA?

2020-01-02 Thread hernan saab
Hey qq, This is not an answer but a few hints for you before you go ahead with your Hive project. - First of all, do you really really really really need to work directly on Hive instead of using an integrated solution such as Cloudera? - Ask yourself again, do you really really

Re: How to decide Hive Cluster capacity

2019-12-21 Thread Sungwoo Park
I think this problem of choosing a cluster capacity is really challenging because the desired cluster capacity depends not only on the size of the dataset but also on the complexity of queries. For example, the execution time of the TPC-DS queries on the same dataset can range from sub-10 seconds

Re: Hive 1.1.0 support on hive metastore 2.3.0

2019-12-09 Thread Priyam Gupta
Thanks. Will try out MR3. On Mon, Dec 9, 2019 at 3:51 PM Sungwoo Park wrote: > I didn't try to run multiple versions of Hive on the same cluster. If your > installation of Hive uses Tez installed on the Hadoop system, I guess > running multiple versions of Hive might not be easy because

Re: Subquery failing - is there an alternative?

2019-12-09 Thread Dan Horne
Alas, we’re on Hive 2.1 On Mon, 9 Dec 2019 at 9:28 PM, Vineet G wrote: > What version of hive are you using? Support for scalar subqueries was > added in 2.2 (ref: HIVE-15544 > ) > > Vineet > > On Dec 9, 2019, at 7:58 AM, Devopam Mittra wrote:

Re: Hive 1.1.0 support on hive metastore 2.3.0

2019-12-09 Thread Sungwoo Park
I didn't try to run multiple versions of Hive on the same cluster. If your installation of Hive uses Tez installed on the Hadoop system, I guess running multiple versions of Hive might not be easy because different versions of Hive use different versions of Tez (especially if you want to run

Re: Hive 1.1.0 support on hive metastore 2.3.0

2019-12-09 Thread Priyam Gupta
Thanks Park for sharing the tests that you did. I will try it out for the specific versions in my use cases. Is there any better way/approach where we can have a single metastore and launch multiple hive clusters of different versions pointing to the same metastore db? Thanks. On Mon, Dec 9,

Re: Subquery failing - is there an alternative?

2019-12-09 Thread Vineet G
What version of hive are you using? Support for scalar subqueries was added in 2.2 (ref: HIVE-15544 ) Vineet > On Dec 9, 2019, at 7:58 AM, Devopam Mittra wrote: > > Please try with subquery alias . > Regards > > > On Mon, Dec 9, 2019, 6:06

Re: Hive 1.1.0 support on hive metastore 2.3.0

2019-12-08 Thread Sungwoo Park
Not a definitive answer, but my test result might help. I tested with HiveServer2 1.2.2 and Metastore 2.3.6. Queries in the TPC-DS benchmark (which only read data and never update) run okay. Creating new tables and loading data to tables also work okay. So, I guess for basic uses of Hive, running

Re: Subquery failing - is there an alternative?

2019-12-08 Thread Devopam Mittra
Please try with a subquery alias. Regards On Mon, Dec 9, 2019, 6:06 AM Dan Horne wrote: > Hi All > > I'm trying to run the following subquery but it returns an error that says > "cannot recognize input near 'select' 'max' '(' in expression specification" > > select id, > > first_name, > >
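On Hive versions before 2.2 (where HIVE-15544 scalar subqueries are unavailable), one workaround is to pre-aggregate in a derived table and join it in; a hedged sketch with invented table and column names, loosely following the snippet:

```sql
-- Instead of a scalar subquery in the SELECT list, e.g.
--   SELECT id, first_name, (SELECT MAX(x) FROM t2) FROM t1
-- compute the aggregate once and cross join the single-row result:
SELECT t1.id,
       t1.first_name,
       m.max_x
FROM t1
CROSS JOIN (SELECT MAX(x) AS max_x FROM t2) m;
```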

Re: Hive Query Performance Tuning

2019-12-03 Thread Matthew Dixon
Hi Rajbir, some thoughts to consider, I’m wondering what the row_number() functionality is doing. Because the window frame has no ORDER BY clause the result may not be deterministic, is this the expected behaviour? I ask because analytic functions can be expensive to compute so make sure you

Re: ORC: duplicate record - rowid meaning ?

2019-12-01 Thread Peter Vary
Thanks David, Hope that Hive 3 streaming will help you soon to avoid these kinds of headaches :) Peter > On Dec 1, 2019, at 17:57, David Morin wrote: > > Hi Peter, > > At the moment I have a pipeline based on Flink to write Orc Files. These Orc > Files can be read from Hive thanks to external

Re: ORC: duplicate record - rowid meaning ?

2019-12-01 Thread David Morin
Hi Peter, At the moment I have a pipeline based on Flink to write Orc Files. These Orc Files can be read from Hive thanks to external tables and, then, a merge statement (triggered by oozie) push these data into tables managed by Hive (transactional tables => ORC). Hive version is 2.1 because

Re: hive error: "Too many bytes before delimiter: 2147483648"

2019-12-01 Thread Shawn Weeks
That looks like you’ve encountered a file with no delimiter, as that’s near the max size for an array or string. Also, I don’t think you can terminate fields with a line feed, as that’s the hard-coded row delimiter. Thanks Shawn From: xuanhuang <18351886...@163.com> Reply-To:

Re: Update Performance in Hive with data stored as Parquet, ORC

2019-11-29 Thread Peter Vary
Hi Shivam, There were a lot of changes around ACID with the Hive 3.0 release. I assume below, that your question is about Hive 3.x release. Hive ACID v2 implements UPDATE as deleting the old row, and creating a new one for performance reasons. See Eugene's nice presentation for the details:

Re: ORC: duplicate record - rowid meaning ?

2019-11-29 Thread Peter Vary
Hi David, Not entirely sure what you are doing here :), my guess is that you are trying to write ACID tables outside of hive. Am I right? What is the exact use-case? There might be better solutions out there than writing the files by hand. As for your question below: Yes, the files should be

Re: How to manage huge partitioned table with 1000+ columns in Hive

2019-11-26 Thread Furcy Pin
Hello, Sorry for the late reply, but this problem is very interesting. How did you end up solving it in the end? I have an idea which is very ugly but might work: create a big view that is a union of all partitions: SELECT '2019-10-01' as ds, * FROM test_1 a JOIN test_2 b ON a.id = b.id JOIN
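The "big view" idea can be sketched as below. Only the `test_1`/`test_2` names and the first date come from the snippet; the per-day table suffixes and column names are invented for illustration:

```sql
-- One UNION ALL branch per daily set of subtables, each tagged with its date.
CREATE VIEW unified AS
SELECT '2019-10-01' AS ds, a.id, a.val AS val_1, b.val AS val_2
FROM test_1 a JOIN test_2 b ON a.id = b.id
UNION ALL
SELECT '2019-10-02' AS ds, a.id, a.val AS val_1, b.val AS val_2
FROM test_1_v2 a JOIN test_2_v2 b ON a.id = b.id;
```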

Re: ORC: duplicate record - rowid meaning ?

2019-11-19 Thread David Morin
Here are more details about the ORC content and the fact that we have duplicate rows: /delta_0011365_0011365_/bucket_3 {"operation":0,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11365,"row":{"TS":1574156027915254212,"cle":5218,...}}

Re: Gather Partition Locations

2019-11-17 Thread mb
Vivek Shrivastava wrote: > If you have access to HCatalog, it also has jdbc connection that would > allow you to get faster response. ah ok. sounds awesome as well! I will check. thanks marko -- Marko Bauhardt Software Engineer www.datameer.com Phone: +49 345 279 5030 Datameer GmbH

Re: Gather Partition Locations

2019-11-16 Thread Vivek Shrivastava
If you have access to HCatalog, it also has jdbc connection that would allow you to get faster response. On Tue, Nov 12, 2019 at 6:53 AM Elliot West wrote: > Hello, > > We faced a similar problem. Additionally, we had job clients were > difficult to integrate directly with the Thirft API, but

Re: What is the Hive HA processing mechanism?

2019-11-15 Thread David Mollitor
is generally considered to be highly available. HiveServer2 does not share client session state between instances. Therefore, if a single HiveServer2 instance is lost, all of the work being performed which was originated from this instance will be lost. Clients will have to re-connect to another

Re: Help Needed to handle Hive Error: Container [xxx] is running beyond physical memory limits.

2019-11-15 Thread Pau Tallada
yes, 2 nodes is very few On Fri, Nov 15, 2019, 16:37 Sai Teja Desu wrote: > Thanks for your detailed explanation Pau. The query actually never > returned even after 4 hours, I had to cancel the query. The reason might > be, I have too many small orc files as an input to Hive table. > > Also,

Re: Help Needed to handle Hive Error: Container [xxx] is running beyond physical memory limits.

2019-11-15 Thread Sai Teja Desu
Thanks for your detailed explanation Pau. The query actually never returned even after 4 hours; I had to cancel the query. The reason might be that I have too many small orc files as input to the Hive table. Also, you are right, my cluster capacity is very low. But do you suggest we should keep on

Re: Help Needed to handle Hive Error: Container [xxx] is running beyond physical memory limits.

2019-11-15 Thread Pau Tallada
Hi Sai, Let me summarize some of your data: You have a 9 billion record table with 4 columns, which should account for a minimum raw size of about 200 GiB (not including the string column). You want to select ALL columns from rows with a specific value in a column which is not partitioned, so Hive

Re: Help Needed to handle Hive Error: Container [xxx] is running beyond physical memory limits.

2019-11-15 Thread Sai Teja Desu
Hey Pau, Thanks for the clarification. Yes, that helped to start the query; however, the query was taking a huge amount of time to retrieve a few records. May I know what steps I can take to make this kind of query perform better? I mean predicates which do not use partitioning. Thanks, Sai. On

Re: Help Needed to handle Hive Error: Container [xxx] is running beyond physical memory limits.

2019-11-14 Thread Pau Tallada
Hi, The error is from the AM (Application Master), because it has so many partitions to orchestrate that it needs lots of RAM. As Venkat said, try increasing tez.am.resource.memory.mb to 2G; even 4 or 8 might be needed. Cheers, Pau. Message from Sai Teja Desu on Thu, 14 Nov 2019 at
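In a session, the AM-memory knob mentioned here looks like the following; the values are illustrative, and the JVM opts line is an assumption (the AM's -Xmx is usually scaled to roughly 80% of the container size):

```sql
-- Give the Tez Application Master a bigger container, and scale its heap.
SET tez.am.resource.memory.mb=4096;
SET tez.am.launch.cmd-opts=-Xmx3277m;
```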

Re: Help Needed to handle Hive Error: Container [xxx] is running beyond physical memory limits.

2019-11-14 Thread Sai Teja Desu
Thanks for the reply Venkatesh. I did try to increase the tez container size to 4GB, but it still gives me the same error. In addition, below are the settings I have tried: set mapreduce.map.memory.mb=4096; set mapreduce.map.java.opts=-Xmx3686m; set mapreduce.reduce.memory.mb=8192; set

Re: Help Needed to handle Hive Error: Container [xxx] is running beyond physical memory limits.

2019-11-14 Thread Venkatesh Selvaraj
Try increasing the AM container memory. Set it to 2 gigs, maybe. Regards, Venkat On Thu, Nov 14, 2019, 6:46 AM Sai Teja Desu < saiteja.d...@globalfoundries.com> wrote: > Hello All, > > I'm new to hive development and I'm hitting a memory limitation error running a > simple query with a predicate

RE: LLAP/Protobuffers Error: Class Cannot Be Cast to Class

2019-11-12 Thread Aaron Grubb
Turns out I was using the wrong JAR to provide the base classes for LlapDaemon. Removing hadoop-client-* from the classpath and using hadoop-common instead fixed this problem. From: Aaron Grubb Sent: Monday, November 11, 2019 1:11 PM To: user@hive.apache.org Subject: LLAP/Protobuffers Error:

Re: Gather Partition Locations

2019-11-12 Thread Elliot West
Hello, We faced a similar problem. Additionally, we had job clients that were difficult to integrate directly with the Thrift API, but needed to resolve file locations via the metastore. To handle this, we built a cut-down service with a REST API that fronts the Hive metastore. The API is optimised

Re: Gather Partition Locations

2019-11-12 Thread mb
Gopal Vijayaraghavan wrote: > That was the reason Hive shipped with metatool, though it remains fairly > obscure outside of the devs. > > hive --service metatool -executeJDOQL "select database.name + '.' + tableName > from org.apache.hadoop.hive.metastore.model.MTable" > > You need to join

Re: Gather Partition Locations

2019-11-12 Thread mb
Ashutosh Bapat wrote: > There are multiple ways > 1. Query the HiveMetaStore directly. Do you mean via the thrift client? Or directly via native jdbc? But I think this is not possible in an enterprise env, when I'm not on the same machine where the hive server is running. I believe the mysql or postgres

re: Gather Partition Locations

2019-11-11 Thread Gopal Vijayaraghavan
Hi, > I have a question about how to get the location for a bunch of partitions. ... > But in an enterprise environment I'm pretty sure this approach would not be > the best because the RDS (mysql or derby) is maybe not reachable or > I don't have the permission to it. That was the reason Hive

Re: Gather Partition Locations

2019-11-11 Thread Ashutosh Bapat
There are multiple ways 1. Query the HiveMetaStore directly. 2. Use sys.* tables or better even information_schema to get the same. On Fri, Nov 8, 2019 at 9:14 PM wrote: > Hi, > I have a question about how to get the location for a bunch of partitions. > My answer is: using the hive query
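Option 2 can be sketched against the Hive 3 `sys` database, which mirrors the metastore tables; the table and column names below follow the metastore schema but should be verified against your version, and 'my_table' is a placeholder:

```sql
-- Partition locations for one table, via the metastore-backed sys db.
SELECT p.PART_NAME, s.LOCATION
FROM sys.PARTITIONS p
JOIN sys.SDS  s ON p.SD_ID  = s.SD_ID
JOIN sys.TBLS t ON p.TBL_ID = t.TBL_ID
WHERE t.TBL_NAME = 'my_table';
```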

Re: unsubscribe

2019-11-10 Thread Liping Zhang
Hello Hive Team, Please unsubscribe me from this email list. Thank you. Sent from my iPhone > On Nov 10, 2019, at 7:30 PM, Dawood Munavar S M > wrote: > > Hello Hive Team, > > Please unsubscribe me from this email list. > > Thank you.

Re: Unsubscribe

2019-11-10 Thread Liping Zhang
Unsubscribe Sent from my iPhone > On Nov 10, 2019, at 6:29 PM, 王志刚 wrote: > > Unsubscribe

Re: Unsubscribe

2019-11-08 Thread Liping Zhang
Unsubscribe Sent from my iPhone > On Nov 8, 2019, at 8:49 AM, Delgado Talavera, Alfredo R. > wrote: > > Unsubscribe

Re: Unsubscribe

2019-11-08 Thread Delgado Talavera, Alfredo R.
Unsubscribe Get Outlook for iOS From: Ajit Kumar Shreevastava Sent: Friday, November 8, 2019 10:49:22 AM To: user@hive.apache.org Subject: Unsubscribe Unsubscribe With Regards Ajit Kumar Shreevastava ::DISCLAIMER::

RE: Hive Not Returning YARN Application Results Correctly Nor Inserting Into Local Tables

2019-11-08 Thread Aaron Grubb
that mapreduce.framework.name=local (default in Hadoop 3.2.1) caused the container to use the local filesystem for everything. “Set mapreduce.framework.name=yarn” solved this problem. Thanks, Aaron From: Sungwoo Park Sent: Wednesday, November 6, 2019 8:59 PM To: user@hive.apache.org Subject: Re: Hive Not Returning YARN

Re: Hive Not Returning YARN Application Results Correctly Nor Inserting Into Local Tables

2019-11-06 Thread Sungwoo Park
For the problem of not returning the result to the console, I think it occurs because the default file system is set to local file system, not to HDFS. Perhaps hive.exec.scratchdir is already set to /tmp/hive, but if the default file system is local, FileSinkOperator writes the final result to the

Re: INSERT OVERWRITE Failure Saftey

2019-11-06 Thread David M

Re: INSERT OVERWRITE Failure Saftey

2019-11-06 Thread Shawn Weeks
I’m not sure specifically about Hive 1.3, but in other versions the data is written to a temp location and then, at the end of the query, the previous data is deleted and the new data is renamed/moved. Something to watch out for: if the query returns no rows, then the old data isn’t removed. Thanks

Re: review the code question

2019-10-31 Thread Peter Vary
Hi! There is a wiki page outlining how to contribute. See: https://cwiki.apache.org/confluence/display/Hive/HowToContribute Thanks, Peter > On Oct 30, 2019, at 09:28, 阿伦 <849551...@qq.com> wrote: > > Hello: > >

Re: How to migrate a table of a Hive database to another database

2019-10-30 Thread Pau Tallada
reply! > > > -- Original Message -- > *From:* "我自己的邮箱"<987626...@qq.com>; > *Sent:* Wednesday, October 30, 2019, 3:59 PM > *To:* "user"; "user"; > *Subject:* Re: How to migrate a table of a Hive database to another database > > thank you very much >

Re: How to migrate a table of a Hive database to another database

2019-10-30 Thread Pau Tallada
hi! alter table db1.tablename rename to db2.tablename Cheers! On Wed, Oct 30, 2019, 08:16 qq <987626...@qq.com> wrote: > Hello: >How to migrate a table of a Hive database to another database? >For example: >Hive contains two databases: databaseA and databaseB. I want to

Re: So many SQL ROLLBACK commands on the Hive PostgreSQL table

2019-10-10 Thread Antunes, Fernando De Souza
"user@hive.apache.org" Date: Thursday, 10 October 2019 04:37 To: "user@hive.apache.org" Subject: Re: So many SQL ROLLBACK commands on the Hive PostgreSQL table Hi Fernando, My guess is that this is the query: https:/

Re: So many SQL ROLLBACK commands on the Hive PostgreSQL table

2019-10-10 Thread Peter Vary
Hive program. (???) > > From: Peter Vary > Reply-To: "user@hive.apache.org" > Date: Wednesday, 9 October 2019 08:32 > To: "user@hive.apache.org"

Re: So many SQL ROLLBACK commands on the Hive PostgreSQL table

2019-10-09 Thread Antunes, Fernando De Souza

Re: So many SQL ROLLBACK commands on the Hive PostgreSQL table

2019-10-09 Thread Peter Vary
Hi Fernando, I checked the compaction_queue related one, and that is definitely normal. I checked the txn related one, and that seems more interesting. I would try to run the query above against your HMS DB - my guess is that it is failing with some error. Peter > On Oct 9, 2019, at 12:56, Antunes,

Re: How to manage huge partitioned table with 1000+ columns in Hive

2019-10-02 Thread Pau Tallada
Hi, I would say the most efficient way would be option (3), where all the subtables are partitioned by date, and clustered+**sorted** by id. This way, efficient SMB map joins can be performed over the 10 tables of the same partition. Unfortunately, I haven't found a way to achieve SMB map joins*
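The SMB map-join behaviour referred to here is gated by a few session settings; a hedged sketch (property names are standard Hive settings, but defaults differ across versions, and both tables must be bucketed and sorted on the join key with compatible bucket counts):

```sql
-- Enable sort-merge-bucket map joins for bucketed+sorted tables.
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;
SET hive.auto.convert.sortmerge.join=true;
```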

Re: Delegation tokens for HDFS

2019-09-29 Thread Julien Phalip
Hi, thanks for your reply. Regarding your statement: > If you aren't using Hive Server 2, the user acquires tokens before the query gets submitted to Yarn. So is it right to say that Beeline doesn't support this pattern, i.e. collecting HDFS delegation tokens before submitting the job? Do you

Re: Support Parquet through HCatalog

2019-09-27 Thread Jay Green Stevens
Hi Peter, Do you have any pointers for next steps to take? Thanks in advance, Jay From: Jay Green Stevens Reply-To: "user@hive.apache.org" Date: Friday, 27 September 2019 at 14:53 To: "user@hive.apache.org" Subject: Re: Support Parquet through HCatalog Ah okay. I have

Re: Support Parquet through HCatalog

2019-09-27 Thread Jay Green Stevens
Ah okay. I have opened a new ticket (https://issues.apache.org/jira/browse/HIVE-22249). Thanks Peter! Jay From: Peter Vary Reply-To: "user@hive.apache.org" Date: Friday, 27 September 2019 at 13:37 To: "user@hive.apache.org" Subject: Re: Support Parquet throug

Re: Support Parquet through HCatalog

2019-09-27 Thread Peter Vary
ticket. > Would you be able to do it for me, please? > > Thanks, > > Jay > > From: Peter Vary > Reply-To: "user@hive.apache.org" > Date: Thursday, 26 September 2019 at 18:07 > To: "user@hive.apache.org" > Subject: Re: Support Parquet th

Re: Support Parquet through HCatalog

2019-09-27 Thread Jay Green Stevens
hive.apache.org" Subject: Re: Support Parquet through HCatalog Hi Jay, I suggest open a new jira if the patch does not cleanly apply to the branches you need. Thanks, Peter On Sep 26, 2019, at 15:41, Jay Green Stevens wrote: Hi all, I’ve been aske

Re: Rename Hive Database

2019-09-27 Thread Miklos Gergely
Hi Tharun, Currently there is no way to rename a database in Hive. Regards, Miklos On Fri, Sep 27, 2019 at 6:47 AM Tharun Mothukuri wrote: > Is there a way to rename database name in Hive? > -- *Miklós Gergely* | Staff Software Engineer t. +36 (30) 579-6433 cloudera.com

Re: Support Parquet through HCatalog

2019-09-26 Thread Peter Vary
Hi Jay, I suggest open a new jira if the patch does not cleanly apply to the branches you need. Thanks, Peter > On Sep 26, 2019, at 15:41, Jay Green Stevens wrote: > > Hi all, > > I’ve been asked to work with backporting the patch HIVE-8838.4.patch (from > HIVE-8838

Re: Please share a document to install Hive on top of Hadoop node

2019-09-24 Thread Dawood Munavar S M
Thanks, Nandini Mankale for sharing the document. On Tue, Sep 24, 2019 at 12:09 PM Nandini Mankale wrote: > Try this > > > https://kontext.tech/docs/DataAndBusinessIntelligence/p/apache-hive-300-installation-on-windows-10-step-by-step-guide > > > > If you want to install on Windows 10 > > > >

RE: Please share a document to install Hive on top of Hadoop node

2019-09-24 Thread Nandini Mankale
Try this https://kontext.tech/docs/DataAndBusinessIntelligence/p/apache-hive-300-installation-on-windows-10-step-by-step-guide If you want to install on Windows 10 Sent from Mail for Windows 10 From: Dawood Munavar S M Sent: 21 September 2019 00:12 To: user@hive.apache.org Subject: Please

Re: Delegation tokens for HDFS

2019-09-20 Thread Owen O'Malley
If you are using Hive Server 2 through jdbc: - The most common way is to have the data only accessible to the 'hive' user. Since the users don't have access to the underlying HDFS files, Hive can enforce column/row permissions. - The other option is to use doAs and run as the user.
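The second option above (run queries as the connecting user rather than the `hive` service user) is controlled by the standard HiveServer2 impersonation property. A minimal hive-site.xml fragment:

```xml
<!-- hive-site.xml: with doAs enabled, HiveServer2 submits jobs and
     accesses HDFS as the authenticated end user, so HDFS permissions
     apply per user instead of Hive enforcing column/row access. -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
```

With `doAs` set to `false` (the first option), all access happens as the `hive` user and authorization is enforced inside Hive, e.g. via Ranger or SQL standard authorization.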

Re: Questions about HIVE-20508

2019-09-20 Thread Peter Vary
Hi Julien, See my answers below: > On Sep 19, 2019, at 21:55, Julien Phalip wrote: > > Hi, > > I'm interested in a new config property that was added as part of HIVE-20508 > , and had a few questions: > > 1) The update was merged >

Re: [External Email] Re: Hive load data OpenCSVSerde comment control

2019-09-19 Thread 黄璞
Thanks for your reply. I want to know how to update those comments. > On Sep 20, 2019, at 00:22, Suresh Kumar Sethuramaswamy wrote: > > You are concerned about data dictionary getting overwritten? > > > Or do you want to know how to update those comments( from deserializer)? > > > Regards > Suresh
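One common way to set a column comment on a SerDe-backed table is to redefine the column with the same name and type via `ALTER TABLE ... CHANGE COLUMN`. A minimal sketch with hypothetical table and column names (note OpenCSVSerde columns are all STRING):

```sql
-- Re-declare the column with the same name/type, attaching a comment
-- in place of the default "from deserializer" placeholder.
ALTER TABLE my_csv_table
  CHANGE COLUMN col1 col1 STRING COMMENT 'customer id';
```

Whether the comment survives in `DESCRIBE` output can depend on the Hive version and SerDe, so it is worth verifying on the version in use.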
