Re: ODBC-hiveserver2 question
Add JAR works with HDFS, though perhaps not with ODBC drivers.

ADD JAR hdfs://:8020/hive_jars/hive-contrib-2.1.1.jar

should work (depending on your namenode port; confirm that this file exists). Alternative syntax:

ADD JAR hdfs:/hive_jars/hive-contrib-2.1.1.jar

The ODBC driver could be having an issue with the forward slashes. The guaranteed method is to create a permanent association by adding the JAR to hive/lib or hadoop/lib on the HiveServer2 node. Copying it to hive-client/auxlib/ and restarting Hive is an option. Adding the following property to hive-env.sh is another option:

HIVE_AUX_JARS_PATH=

There may be a trace function for your ODBC driver that would show a more detailed error. Some ODBC drivers may not support the ADD JAR syntax.

cheers,
Andrew

On February 23, 2018 at 3:27 PM Jörn Franke wrote:

Add jar works only with local files on the Hive server.

On 23. Feb 2018, at 21:08, Andy Srine <andy.sr...@gmail.com> wrote:

Team,

Is ADD JAR from HDFS (ADD JAR hdfs:///hive_jars/hive-contrib-2.1.1.jar;) supported in HiveServer2 via an ODBC connection? Some relevant points:

- I am able to do it in Hive 2.1.1 via JDBC (beeline), but not via an ODBC client.
- In Hive 1.2.1, I can add a jar from the local node, but not a JAR on HDFS.
- Some old blogs say HiveServer2 doesn't support "ADD JAR" at all, but that's not what I experience via beeline.

Let me know your thoughts and experiences.

Thanks,
Andy
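The HDFS-based syntax discussed above can be sketched as follows; the namenode host "nn-host" is an illustrative placeholder, not taken from the thread:

```sql
-- Placeholder namenode host/port; adjust to your cluster and confirm the JAR exists
ADD JAR hdfs://nn-host:8020/hive_jars/hive-contrib-2.1.1.jar;

-- LIST JARS shows what the current session has registered,
-- a quick way to confirm whether the ADD JAR took effect
LIST JARS;
```

Running this via beeline first gives a useful baseline before trying the same statements over ODBC.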
Re: Need help with query
Hi there,

The detailed error should be in hiveserver2.log.

Cheers,
Andrew

On Wed, Sep 21, 2016 at 3:36 PM, Igor Kravzov <igork.ine...@gmail.com> wrote:

I ran MSCK REPAIR TABLE mytable; and got:

Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

On Mon, Sep 12, 2016 at 6:56 PM, Lefty Leverenz <leftylever...@gmail.com> wrote:

Here's a list of the wikidocs about dynamic partitions: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DynamicPartitions

-- Lefty

On Mon, Sep 12, 2016 at 3:25 PM, Devopam Mittra <devo...@gmail.com> wrote:

Kindly read up on dynamic partitioning on the cwiki; in my opinion that will be the perfect solution for your requirement.

Regards,
Dev

On 13 Sep 2016 12:49 am, "Igor Kravzov" <igork.ine...@gmail.com> wrote:

Hi,

I have a query like this one:

alter table my_table add if not exists partition (mmdd=20160912) location '/mylocation/20160912';

Is it possible to make it so I don't have to change the date every day? Something with CURRENT_DATE? Thanks in advance.
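The dynamic-partitioning approach suggested above can be sketched like this; table and column names are illustrative:

```sql
-- Allow Hive to derive partitions from the data itself
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition value (here dt) comes from the last SELECT expression,
-- so no literal date needs to be edited each day
INSERT INTO TABLE my_table PARTITION (dt)
SELECT col1, col2, CURRENT_DATE AS dt
FROM staging_table;
```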
HiveServer2 thrift service thread pool error
Hi everyone,

We have the Hive 1.2.1.2.3 Thrift service installed with the Atlas and Ranger plugins. After some days we exhaust the running threads, receive an error such as the one below in the logs, and the service stops responding, requiring a restart.

org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Hive Internal Error: java.util.concurrent.RejectedExecutionException(Task java.util.concurrent.FutureTask@1c9f4873 rejected from java.util.concurrent.ThreadPoolExecutor@1bacbbcc[Running, pool size = 1, active threads = 1, queued tasks = 1, completed tasks = 345])
Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@1c9f4873 rejected from java.util.concurrent.ThreadPoolExecutor@1bacbbcc[Running, pool size = 1, active threads = 1, queued tasks = 1, completed tasks = 345]

Does anyone have further information on troubleshooting this issue, or a means to determine what is exhausting the thread pool?

thanks,
Andrew
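Two server-side properties commonly associated with HiveServer2's async execution pool are sketched below; treat this as an assumption to verify against your Hive version's defaults, since the log does not identify which executor was exhausted:

```xml
<!-- hive-site.xml on the HiveServer2 node; values are illustrative -->
<property>
  <name>hive.server2.async.exec.threads</name>
  <value>100</value>
</property>
<property>
  <name>hive.server2.async.exec.wait.queue.size</name>
  <value>100</value>
</property>
```

A thread dump (jstack) of the HiveServer2 process while the pool is full can also show what the active threads are blocked on.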
Re: Some dates add/less a day...
It is HIVE-13948.

https://github.com/apache/hive/commit/da3ed68eda10533f3c50aae19731ac6d059cda87
https://issues.apache.org/jira/browse/HIVE-13948

Regards,
Andrew

On July 29, 2016 at 6:44 PM Julián Arocena <jaroc...@temperies.com> wrote:

Hey, thank you so much! I was going crazy, you can imagine it :) Please let me know if you have it. I will have a nice weekend with this news.

Best regards,

On 29/7/2016 at 18:44, "Andrew Sears" <andrew.se...@analyticsdream.com> wrote:

Hi there,

This is a critical bug fixed by a JIRA; I will see if I can get the number for you. It involves patching lib/hive-* files.

Cheers,
Andrew

On Fri, Jul 29, 2016 at 4:37 PM, Julián Arocena <jaroc...@temperies.com> wrote:

Hi,

I'm having a problem with some dates using external tables over a text file. Let me give you an example.

File content:

1946-10-01
1946-10-02

Table:

create external table date_issue_test (date_test Date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/hive/test';

Select * from date_issue_test;
OK
1946-10-02
1946-10-02

As you can see, in this case it adds a day; there are a few cases like this. I also tried with a CAST and a fixed date, as below:

hive> select CAST('1946-10-01' as date) from date_issue_test limit 1;
OK
1946-10-02

Any idea to help me? Thank you so much!

Julian
Re: Some dates add/less a day...
Hi there,

This is a critical bug fixed by a JIRA; I will see if I can get the number for you. It involves patching lib/hive-* files.

Cheers,
Andrew

On Fri, Jul 29, 2016 at 4:37 PM, Julián Arocena <jaroc...@temperies.com> wrote:

Hi,

I'm having a problem with some dates using external tables over a text file. Let me give you an example.

File content:

1946-10-01
1946-10-02

Table:

create external table date_issue_test (date_test Date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/hive/test';

Select * from date_issue_test;
OK
1946-10-02
1946-10-02

As you can see, in this case it adds a day; there are a few cases like this. I also tried with a CAST and a fixed date, as below:

hive> select CAST('1946-10-01' as date) from date_issue_test limit 1;
OK
1946-10-02

Any idea to help me? Thank you so much!

Julian
Re: Hive on TEZ + LLAP
HDP 2.5 includes LLAP.

Cheers,
Andrew

On Fri, Jul 15, 2016 at 11:36 AM, Jörn Franke <jornfra...@gmail.com> wrote:

I would recommend a distribution such as Hortonworks where everything is already configured. As far as I know, LLAP is currently not part of any distribution.

On 15 Jul 2016, at 17:04, Ashok Kumar <ashok34...@yahoo.com> wrote:

Hi,

Has anyone managed to make Hive work with Tez + LLAP as the query engine in place of MapReduce, please? If you configured it yourself, which versions of Tez and LLAP work with Hive 2? Do I need to build Tez from source, for example?

Thanks
Re: Best Hive Authorization Model for Shared data
Hi there,

Depending on your distribution, you may need to look at tools like Ranger or Sentry, which should extend the model to meet your needs.

Regards,
Andrew

On Tue, Apr 12, 2016 at 6:42 PM, Udit Mehta <ume...@groupon.com> wrote:

Hi all,

I wanted to understand which authorization model is most suitable for a production environment where most of the data is shared between multiple teams and users. I know this would depend on the use case, but I can't seem to figure out the best model for our use: we have data that is owned by a certain process (R/W access for that user) while other users only have read access to that data. We have a lot of instances where users would want to create external tables pointing to this data. We tried the following 3 auth models:

1. Default authorization model: We think this is less secure, since any user can grant himself access to create/modify tables and databases even where they are not supposed to. We want much tighter security than this model provides.

2. Storage-based authorization: While this helps by preventing users from modifying metadata (it checks the HDFS permissions of the underlying directories), it prevents our most important use case of letting users create external tables on data they don't have write access to. I would assume external tables won't actually delete the data when dropping tables/partitions, so this operation should be allowed. But because it is not, even this authorization model does not meet our use case.

3. SQL standards-based authorization: This does give us fine-grained control over which users can perform specific commands, but when it comes to creating external tables, even this authorization scheme seems to fall back on the filesystem's permissions.

So overall, none of the 3 models seemed to fulfill our requirement, which I think is a fairly common one. I want to know how other users manage security on Hive, or if I am missing something.

Thanks in advance,
Udit
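For the SQL standards-based model mentioned in point 3, read access for a team is typically expressed through roles; a minimal sketch, with the role, database, and user names as illustrative assumptions:

```sql
-- Grant read-only access to shared data via a role
CREATE ROLE readers;
GRANT SELECT ON TABLE shared_db.shared_data TO ROLE readers;
GRANT ROLE readers TO USER some_user;
```

This does not by itself solve the external-table case described above, since that path still consults filesystem permissions.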
Re: Best way of Unpivoting of Hive table data. Any Analytic function for unpivoting
Something like this might work for you?

Select id, 'mycol' as name, col1 as value from mytable
Union all
Select id, 'mycol2' as name, col2 as value from mytable;

Cheers,
Andrew

On Mon, Mar 28, 2016 at 7:53 PM, Ryan Harris <ryan.har...@zionsbancorp.com> wrote:

collect_list(col) will give you an array with all of the data from that column. However, the scalability of this approach will have limits.

-----Original Message-----
From: mahender bigdata [mailto:mahender.bigd...@outlook.com]
Sent: Monday, March 28, 2016 5:47 PM
To: user@hive.apache.org
Subject: Best way of Unpivoting of hiva table data. Any Analytic function for unpivoting

Hi,

Has anyone implemented unpivoting of Hive external table data? We would like to convert columns into multiple rows. We have an external table which holds almost 2 GB of data. Is there a good, quick way of converting columns into rows? Are any analytic functions available in Hive to do unpivoting?
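As an alternative to the UNION approach, Hive's built-in stack() UDTF can unpivot a fixed set of columns in a single pass over the table; a sketch with illustrative table and column names:

```sql
-- stack(2, ...) emits two rows per input row: one per (label, column) pair
SELECT id, name, value
FROM mytable
LATERAL VIEW stack(2,
    'col1', col1,
    'col2', col2) t AS name, value;
```

Like the UNION approach, this requires listing the columns explicitly, but it reads the table only once.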
Re: Automatic Update statistics on ORC tables in Hive
It would be useful to have a script that could be scheduled as part of a low-priority background job, to update stats at least where none are available, and a report in the Hive GUI on stats per table. I recently encountered a Tez out-of-memory issue caused by the lack of auto-updated stats.

Cheers,
Andrew

On Mon, Mar 28, 2016 at 2:27 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi Alan,

Thanks for the clarification. I gather you are referring to the following notes in the JIRA:

"Given the work that's going on in HIVE-11160 [https://issues.apache.org/jira/browse/HIVE-11160] and HIVE-12763 [https://issues.apache.org/jira/browse/HIVE-12763] I don't think it makes sense to continue down this path. These JIRAs will lay the groundwork for auto-gathering stats on data as it is inserted rather than having a background process do the work."

I concur; I am not a fan of automatic update statistics, although many RDBMS vendors were touting it in earlier days. The whole thing turned out to be a hindrance, as UPDATE STATISTICS was being fired in the middle of the business day, adding to the workload by taking resources away. Most vendors base the need for updating/gathering stats on the number of rows changed, relying on some function, say datachange(). When the datachange() function indicates changes of 10%, it is time for update stats to run. Again, in my opinion, rather arbitrary and void of any scientific basis.

For Hive the important one is inserts. For transactional tables one will have updates and deletes as well. My understanding is that the classical approach is to report on how many "row change operations", say inserts, have been performed since the last time any kind of analyze statistics was run.

This came to mind as I was using Spark to load CSV files and create and insert into Hive ORC tables. The problem I have is that analyzing statistics through Spark fails. This is not a show stopper, as the load shell script invokes beeline to log in to Hive and analyze statistics on the newly created table. Although some proponents might argue for saving the data in Spark as a Parquet file, when one has millions and millions of rows then stats matter, and then ORC adds its value.

Cheers,

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 28 March 2016 at 18:43, Alan Gates <alanfga...@gmail.com> wrote:

I resolved that as Won't Fix. See the last comment on the JIRA for my rationale.

Alan.

> On Mar 28, 2016, at 03:53, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Thanks. This does not seem to be implemented although the JIRA says resolved. It also mentions the timestamp of the last update stats; I do not see it yet.
>
> Regards,
> Mich
>
> On 28 March 2016 at 06:19, Gopal Vijayaraghavan <gop...@apache.org> wrote:
>
> > This might be a bit far fetched but is there any plan for background
> > ANALYZE STATISTICS to be performed on ORC tables
>
> https://issues.apache.org/jira/browse/HIVE-12669
>
> Cheers,
> Gopal
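The beeline step described above amounts to statements like the following; the table name and partition column are illustrative:

```sql
-- Basic table/partition statistics (row counts, sizes)
ANALYZE TABLE orc_table PARTITION (dt) COMPUTE STATISTICS;

-- Column-level statistics, which the optimizer also uses
ANALYZE TABLE orc_table COMPUTE STATISTICS FOR COLUMNS;
```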
Re: read-only mode for hive
Another option might be to lock using a ZooKeeper script.

Andrew

On Wed, Mar 09, 2016 at 7:05 PM, Andrew Sears <andrew.se...@analyticsdream.com> wrote:

What about renaming the table, or moving it to another schema with limited rights? I'm not sure why just flipping the access grant to select-only wouldn't also work, provided authorization is enabled and the table is not external. An HDFS snapshot could also give you a point-in-time copy; set ACLs to restrict access if enabled.

Cheers,
Andrew

On Wed, Mar 09, 2016 at 1:15 PM, PG User <pguser1...@gmail.com> wrote:

Thank you all for the replies. My use case is as follows: I want to put a table (or database) in read-only mode, then do some operations such as capturing the table definition and taking an HDFS snapshot. I want to put the table in read-only mode to maintain consistency. After all my operations are done, I will put Hive back in read-write mode. Sentry may not be a solution, as it will not handle existing transactions. Creating a view will not solve the problem either if inserts are going on.

- Nachiket

On Wed, Mar 9, 2016 at 7:20 AM, David Capwell <dcapw...@gmail.com> wrote:

You could always set the table's output format to be the null output format.

On Mar 8, 2016 11:01 PM, "Jörn Franke" <jornfra...@gmail.com> wrote:

What is the use case? You can try security solutions such as Ranger or Sentry. As already mentioned, another alternative could be a view.

> On 08 Mar 2016, at 21:09, PG User <pguser1...@gmail.com> wrote:
>
> Hi All,
> I have one question about putting Hive in read-only mode.
>
> What are the ways of putting Hive in read-only mode? Can I take a lock at the database level to serve the purpose? What will happen to existing transactions? My guess is it will not grant a lock until all transactions are complete.
>
> I read about changing ownership of /user/hive/warehouse/, but it is not a foolproof solution.
>
> Thank you.
>
> - PG User
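The explicit-lock route mentioned at the top of the thread can be sketched as follows; it assumes a lock manager is configured (e.g. hive.support.concurrency=true with the ZooKeeper-based lock manager), which the thread does not confirm for this cluster:

```sql
-- A shared lock blocks writers (exclusive locks) but allows readers
LOCK TABLE mytable SHARED;
-- ... capture the table definition and take the HDFS snapshot here ...
UNLOCK TABLE mytable;
```

SHOW LOCKS mytable; can be used to verify the lock is held during the read-only window.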