Re: Hive Config for ignoring Glacier Objects while querying
Hi Team, It would be really helpful if someone could guide me here. Regards, Anup Tiwari On Fri, Feb 28, 2020 at 6:12 PM Anup Tiwari wrote: > Hi Team, > > Do we have any config, like the one in Presto, that ignores Glacier objects > rather than failing the query? > > *hive.s3.skip-glacier-objects > <https://prestodb.io/docs/current/connector/hive.html> :- Ignore Glacier > objects rather than failing the query. This will skip data that may be > expected to be part of the table or partition. Defaults to false.* > > I have partitioned tables whose data is on S3. We have > archived some older data to Glacier, so queries now fail. We have > applied a partition date condition, after which the query scans only > those objects that are on S3, but I just wanted to know whether there is any > config for ignoring Glacier objects. > > Regards, > Anup Tiwari >
Hive Config for ignoring Glacier Objects while querying
Hi Team, Do we have any config, like the one in Presto, that ignores Glacier objects rather than failing the query? *hive.s3.skip-glacier-objects <https://prestodb.io/docs/current/connector/hive.html> :- Ignore Glacier objects rather than failing the query. This will skip data that may be expected to be part of the table or partition. Defaults to false.* I have partitioned tables whose data is on S3. We have archived some older data to Glacier, so queries now fail. We have applied a partition date condition, after which the query scans only those objects that are on S3, but I just wanted to know whether there is any config for ignoring Glacier objects. Regards, Anup Tiwari
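No equivalent Hive-side switch turns up in this thread, so the partition-predicate workaround described above seems to be the practical stopgap. A minimal sketch of it (the table, column, and cutoff date are hypothetical):

SELECT dt, count(1) AS cnt
FROM mydb.events
-- restrict the scan to partitions whose objects are still in S3;
-- partitions archived to Glacier before the cutoff are never listed
WHERE dt >= '2019-09-01'
GROUP BY dt;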
Unable to Cast "array<...>" column to "string"
Hi Team, I have a column of datatype "array<...>" in a Parquet table, and I have now created another table in which the datatype of this column is "string". But when I try to insert data into the new table, I get the *error below*; let me know how I can achieve this (I already tried an explicit cast to string, but it failed with a similar error):-

*FAILED: ClassCastException org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo*

2020-01-06T22:19:08,994 ERROR [main] ql.Driver: FAILED: ClassCastException org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo
java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo
at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.isRedundantConversionFunction(TypeCheckProcFactory.java:765)
at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:869)
at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1317)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:219)
at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:163)

Regards, Anup Tiwari
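For the record, Hive's CAST does not convert between complex and primitive types, which is what this ClassCastException reflects. A minimal sketch of one workaround, assuming a hypothetical column `events array<struct<id:bigint,name:string>>`: explode the array, render each element as text, and re-aggregate the pieces into a single string per row key.

INSERT INTO TABLE target_table
SELECT s.sid,
       -- rebuild each struct as "id:name", then join elements with '|'
       concat_ws('|', collect_list(concat(CAST(e.id AS STRING), ':', e.name))) AS events_str
FROM source_table s
LATERAL VIEW explode(s.events) t AS e
GROUP BY s.sid;

Note that explode() drops rows whose array is empty or NULL; LATERAL VIEW OUTER explode(...) keeps them with a NULL element instead.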
Re: Less than(<) & Greater than(>) condition failing on string column but min/max is working
Hi Team, Can someone look into this, help me understand the behavior, and suggest a workaround? On Thu, 15 Aug 2019 08:02 Anup Tiwari, wrote: > Hi Team, > > Can someone help me with this? > > On Tue, 13 Aug 2019 01:11 Anup Tiwari, wrote: > >> Hi All, >> >> I have a use case where I want to store multiple columns in a single >> map-type column. While doing so, I came across a case where I mainly have >> 2 types of columns, bigint and string, so I stored them in a map >> column defined as map<string,string>, and then I queried a key that holds >> integer values in the map column. >> >> Case 1 :- when I apply the > or < conditional operators, the returned >> output is correct. >> Case 2 :- taking min/max on the integer-valued key gives me >> incorrect output. >> >> Now I understand that this is due to the value datatype of the map column, which is "string", >> but just a question: why do the <, > conditions work on the string values while >> min/max gives incorrect results? >> >> Please note that the key I am querying will always contain an int >> value, so let me know if you have any suggestions. >> >> Combining and storing values in a map datatype is necessary because each row >> can have a different number of metadata fields depending upon the event name. >> >> Please find the table structure below :- >> >> CREATE TABLE `anup.test_map`( >> `SID` string, >> `eventName` string, >> `timestamp` timestamp, >> `merge` map<string,string>) >> STORED AS PARQUET; >> >> Regards, >> Anup Tiwari >> >
Re: Less than(<) & Greater than(>) condition failing on string column but min/max is working
Hi Team, Can someone help me with this? On Tue, 13 Aug 2019 01:11 Anup Tiwari, wrote: > Hi All, > > I have a use case where I want to store multiple columns in a single > map-type column. While doing so, I came across a case where I mainly have > 2 types of columns, bigint and string, so I stored them in a map > column defined as map<string,string>, and then I queried a key that holds > integer values in the map column. > > Case 1 :- when I apply the > or < conditional operators, the returned > output is correct. > Case 2 :- taking min/max on the integer-valued key gives me > incorrect output. > > Now I understand that this is due to the value datatype of the map column, which is "string", > but just a question: why do the <, > conditions work on the string values while > min/max gives incorrect results? > > Please note that the key I am querying will always contain an int > value, so let me know if you have any suggestions. > > Combining and storing values in a map datatype is necessary because each row > can have a different number of metadata fields depending upon the event name. > > Please find the table structure below :- > > CREATE TABLE `anup.test_map`( > `SID` string, > `eventName` string, > `timestamp` timestamp, > `merge` map<string,string>) > STORED AS PARQUET; > > Regards, > Anup Tiwari >
Less than(<) & Greater than(>) condition failing on string column but min/max is working
Hi All, I have a use case where I want to store multiple columns in a single map-type column. While doing so, I came across a case where I mainly have 2 types of columns, bigint and string, so I stored them in a map column defined as map<string,string>, and then I queried a key that holds integer values in the map column. Case 1 :- when I apply the > or < conditional operators, the returned output is correct. Case 2 :- taking min/max on the integer-valued key gives me incorrect output. Now I understand that this is due to the value datatype of the map column, which is "string", but just a question: why do the <, > conditions work on the string values while min/max gives incorrect results? Please note that the key I am querying will always contain an int value, so let me know if you have any suggestions. Combining and storing values in a map datatype is necessary because each row can have a different number of metadata fields depending upon the event name. Please find the table structure below :-

CREATE TABLE `anup.test_map`(
  `SID` string,
  `eventName` string,
  `timestamp` timestamp,
  `merge` map<string,string>)
STORED AS PARQUET;

Regards, Anup Tiwari
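Not answered in the thread, but the usual explanation fits both cases: comparing the string value against a numeric literal with < or > makes Hive coerce both sides to double, so those filters behave numerically, whereas min/max aggregate the raw strings and compare them lexicographically (e.g. "9" > "10"). Under that assumption, casting before aggregating should fix case 2; a sketch with a hypothetical key name:

SELECT SID,
       -- cast the map value so min/max compares numbers, not strings
       max(CAST(merge['game_number'] AS BIGINT)) AS max_game_number
FROM anup.test_map
GROUP BY SID;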
Unable to query remote hive metastore
Hi All, I have Hive 2.1.1 installed on one of our clusters (hadoop-1xx:9000), and now I am trying to read it from another Hive (2.1.1) cluster (ds-1xx), but it throws the error below. It also looks like HIVE-14380 <https://jira.apache.org/jira/browse/HIVE-14380>, so do we have any workaround for this, or is upgrading Hive the only option? :-

*ERROR :-*
2018-10-22T17:13:40,100 ERROR [main] ql.Driver: FAILED: SemanticException Unable to determine if hdfs://hadoop-1xx:9000/usr/hive/warehouse/cad_s3 is encrypted: java.lang.IllegalArgumentException: Wrong FS: hdfs://hadoop-1xx:9000/usr/hive/warehouse/cad_s3, expected: hdfs://ds-1xx:9000
org.apache.hadoop.hive.ql.parse.SemanticException: Unable to determine if hdfs://hadoop-1xx:9000/usr/hive/warehouse/cad_s3 is encrypted: java.lang.IllegalArgumentException: Wrong FS: hdfs://hadoop-1xx:9000/usr/hive/warehouse/cad_s3, expected: hdfs://ds-1xx:9000
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1860)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10775)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10826)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:246)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:477)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1242)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1384)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)

Regards, Anup Tiwari
Re: How to Grant All Privileges for All Databases except one in Hive SQL
Hive doesn't have a "grant select on db.*" option, which is what I think you're looking for here. Yes, I am looking for exactly something like this, and since it is not available, does that mean I have to grant on each table? I am asking because we have many DBs and a lot of tables within each DB, so is there any other way? Regards, Anup Tiwari On Mon, Sep 17, 2018 at 8:48 PM Alan Gates wrote: > What you are seeing is correct behavior. Select on the database means the > user can see objects in the database (ie, tables, views). To see contents > of those objects you have to grant access on those objects. Hive doesn't > have a "grant select on db.*" option, which is what I think you're looking > for here. > > Alan. > > On Mon, Sep 17, 2018 at 5:50 AM Anup Tiwari > wrote: > >> Hi Alan, >> >> I have given SELECT access on a database to a role that is attached to a >> user, but even after this, that user is not able to execute select statements >> on tables of that database. But if I provide access at the table level, then >> that works. Can you please help me here? >> >> Hive Version : 2.3.2 >> >> Please find the steps below :- >> >> 1. Added the below configuration in hive-site.xml :-
>>
>> <property>
>>   <name>hive.server2.enable.doAs</name>
>>   <value>false</value>
>> </property>
>> <property>
>>   <name>hive.users.in.admin.role</name>
>>   <value>hadoop</value>
>> </property>
>> <property>
>>   <name>hive.security.authorization.manager</name>
>>   <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
>> </property>
>> <property>
>>   <name>hive.security.authorization.enabled</name>
>>   <value>true</value>
>> </property>
>> <property>
>>   <name>hive.security.authenticator.manager</name>
>>   <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
>> </property>
>>
>> 2. Restarted HiveServer2.
>>
>> 3. Logged in to the hive shell as the hadoop user and executed the below commands without any error :-
>>
>> set role admin;
>> create role readonly;
>> GRANT ROLE readonly TO USER `user2`;
>> GRANT SELECT ON DATABASE anup TO ROLE readonly;
>>
>> 4. Logged in to the hive shell as user2 and executed the below commands :-
>>
>> select * from anup.t2 limit 5;
>>
>> *Error :-*
>> Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: Principal [name=mohan.b, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=anup.t2]] (state=42000,code=4)
>>
>> show current roles;
>> +-----------+
>> |   role    |
>> +-----------+
>> | public    |
>> | readonly  |
>> +-----------+
>> 2 rows selected (0.085 seconds)
>>
>> SHOW GRANT ROLE `readonly` ON DATABASE anup;
>> | database | table | partition | column | principal_name | principal_type | privilege | grant_option | grant_time    | grantor |
>> | anup     |       |           |        | readonly       | ROLE           | SELECT    | false        | 1537187896000 | hadoop  |
>>
>> Regards, >> Anup Tiwari >> >> On Fri, Sep 14, 2018 at 10:50 PM Alan Gates wrote: >> >>> You can see a full list of what grant supports at >>> https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization#SQLStandardBasedHiveAuthorization-Grant >>> >>> There is no "grant x to user on all databases" or regex expressions for >>> database names. So you'll have to do the databases one by one. >>> >>> External security managers such as Apache Ranger (and I think Apache >>> Sentry, but I'm not sure) can do blanket policies or default policies. >>> This has the added advantage that as new databases are created the policies >>> immediately apply. >>> >>> Alan.
>>> >>> On Thu, Sep 13, 2018 at 10:37 PM Anup Tiwari >>> wrote: >>> >>>> Hi, >>>> >>>> Can someone reply on this? >>>> >>>> On Tue, 11 Sep 2018 19:21 Anup Tiwari, wrote: >>>> >>>>> Hi All, >>>>> >>>>> I have similar requirement as mentioned in the link Link to question >>>>> <https://stackoverflow.com/questions/38199021/how-to-grant-all-privileges-for-all-databases-except-one-in-hive-sql> >>>>> . >>>>> >>>>> *Requirement :-* >>>>> >>>>> I know how to grant privileges on a database to a role in Hive SQL. >>>>> For example, GRANT ALL ON DATABASE analyst1 TO ROLE analyst_role; >>>>> But there are hundreds of databases on my system, it's almost >>>>> impossible to grant one by one. >>>>> Is it possible to grant all privileges for all databases ? >>>>> Also Is it possible to grant all privileges for all databases except >>>>> one database(ex: db.name = temp)? >>>>> >>>>> >>>>> Regards, >>>>> Anup Tiwari >>>>> >>>>
Re: How to Grant All Privileges for All Databases except one in Hive SQL
Hi Alan, I have given SELECT access on a database to a role that is attached to a user, but even after this, that user is not able to execute select statements on tables of that database. But if I provide access at the table level, then that works. Can you please help me here? Hive Version : 2.3.2 Please find the steps below :-

1. Added the below configuration in hive-site.xml :-

<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
<property>
  <name>hive.users.in.admin.role</name>
  <value>hadoop</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
</property>

2. Restarted HiveServer2.

3. Logged in to the hive shell as the hadoop user and executed the below commands without any error :-

set role admin;
create role readonly;
GRANT ROLE readonly TO USER `user2`;
GRANT SELECT ON DATABASE anup TO ROLE readonly;

4. Logged in to the hive shell as user2 and executed the below commands :-

select * from anup.t2 limit 5;

*Error :-*
Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: Principal [name=mohan.b, type=USER] does not have following privileges for operation QUERY [[SELECT] on Object [type=TABLE_OR_VIEW, name=anup.t2]] (state=42000,code=4)

show current roles;
+-----------+
|   role    |
+-----------+
| public    |
| readonly  |
+-----------+
2 rows selected (0.085 seconds)

SHOW GRANT ROLE `readonly` ON DATABASE anup;
| database | table | partition | column | principal_name | principal_type | privilege | grant_option | grant_time    | grantor |
| anup     |       |           |        | readonly       | ROLE           | SELECT    | false        | 1537187896000 | hadoop  |

Regards, Anup Tiwari

On Fri, Sep 14, 2018 at 10:50 PM Alan Gates wrote: > You can see a full list of what grant supports at > https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization#SQLStandardBasedHiveAuthorization-Grant > > There is no "grant x to user on all databases" or regex expressions for > database names. So you'll have to do the databases one by one. > > External security managers such as Apache Ranger (and I think Apache > Sentry, but I'm not sure) can do blanket policies or default policies. > This has the added advantage that as new databases are created the policies > immediately apply. > > Alan. > > On Thu, Sep 13, 2018 at 10:37 PM Anup Tiwari > wrote: > >> Hi, >> >> Can someone reply on this? >> >> On Tue, 11 Sep 2018 19:21 Anup Tiwari, wrote: > >>> Hi All, >>> >>> I have a similar requirement to the one mentioned in the link Link to question >>> <https://stackoverflow.com/questions/38199021/how-to-grant-all-privileges-for-all-databases-except-one-in-hive-sql> >>> . >>> >>> *Requirement :-* >>> >>> I know how to grant privileges on a database to a role in Hive SQL. >>> For example, GRANT ALL ON DATABASE analyst1 TO ROLE analyst_role; >>> But there are hundreds of databases on my system; it's almost impossible >>> to grant one by one. >>> Is it possible to grant all privileges for all databases? >>> Also, is it possible to grant all privileges for all databases except one >>> database (e.g. db.name = temp)? >>> >>> >>> Regards, >>> Anup Tiwari >>> >>
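Consistent with Alan's explanation above (database-level SELECT only makes objects visible), the error in step 4 should go away once the grant is made at the object level. A minimal sketch, run by a user with the admin role:

set role admin;
GRANT SELECT ON TABLE anup.t2 TO ROLE readonly;
-- repeat per table; SQL Standard Based Authorization has no db.* wildcard,
-- so blanket policies need an external manager such as Apache Ranger.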
Re: How to Grant All Privileges for All Databases except one in Hive SQL
Hi, Can someone reply on this? On Tue, 11 Sep 2018 19:21 Anup Tiwari, wrote: > Hi All, > > I have similar requirement as mentioned in the link Link to question > <https://stackoverflow.com/questions/38199021/how-to-grant-all-privileges-for-all-databases-except-one-in-hive-sql> > . > > *Requirement :-* > > I know how to grant privileges on a database to a role in Hive SQL. > For example, GRANT ALL ON DATABASE analyst1 TO ROLE analyst_role; > But there are hundreds of databases on my system, it's almost impossible > to grant one by one. > Is it possible to grant all privileges for all databases ? > Also Is it possible to grant all privileges for all databases except one > database(ex: db.name = temp)? > > > Regards, > Anup Tiwari >
Re: Not able to read Hive ACID table data created by Hive 2.1.1 in hive 2.3.3
Hi, I tried option 1 mentioned in the mail trail and it is working, but it entails various access-related tasks that need to be done. So can anyone reply on points 2 and 3? It would be helpful to understand other possible solutions. Regards, Anup Tiwari On Fri, Sep 7, 2018 at 10:15 AM Anup Tiwari wrote: > Hi Gopal, > > Thanks for the reply. I have 3 questions :- > > 1. We have our metastore DB for Hive 2.1.1 in MySQL, so when you say > "the only way it works is if you connect to the old metastore", > does that mean that if we point Hive 2.3.3 to the same metastore DB as 2.1.1, > this will work? > > 2. Is this feature available in Hive 3.x? > > 3. Also, can you please suggest the best possible workaround in such a case on > Hive 2.x? > > On Fri, 7 Sep 2018 05:25 Gopal Vijayaraghavan, wrote: > >> > msck repair table ; >> >> msck repair does not work on ACID tables. >> >> In Hive 2.x, there is no way to move, replicate or rehydrate ACID tables >> from a cold store - the only way it works is if you connect to the old >> metastore. >> >> Cheers, >> Gopal >> >> >>
How to Grant All Privileges for All Databases except one in Hive SQL
Hi All, I have a similar requirement to the one mentioned in the link Link to question <https://stackoverflow.com/questions/38199021/how-to-grant-all-privileges-for-all-databases-except-one-in-hive-sql>. *Requirement :-* I know how to grant privileges on a database to a role in Hive SQL. For example, GRANT ALL ON DATABASE analyst1 TO ROLE analyst_role; But there are hundreds of databases on my system; it's almost impossible to grant one by one. Is it possible to grant all privileges for all databases? Also, is it possible to grant all privileges for all databases except one database (e.g. db.name = temp)? Regards, Anup Tiwari
Re: Not able to read Hive ACID table data created by Hive 2.1.1 in hive 2.3.3
Hi Gopal, Thanks for the reply. I have 3 questions :- 1. We have our metastore DB for Hive 2.1.1 in MySQL, so when you say "the only way it works is if you connect to the old metastore", does that mean that if we point Hive 2.3.3 to the same metastore DB as 2.1.1, this will work? 2. Is this feature available in Hive 3.x? 3. Also, can you please suggest the best possible workaround in such a case on Hive 2.x? On Fri, 7 Sep 2018 05:25 Gopal Vijayaraghavan, wrote: > > msck repair table ; > > msck repair does not work on ACID tables. > > In Hive 2.x, there is no way to move, replicate or rehydrate ACID tables > from a cold store - the only way it works is if you connect to the old > metastore. > > Cheers, > Gopal > > >
Not able to read Hive ACID table data created by Hive 2.1.1 in hive 2.3.3
Hi All, I have created an ACID managed table in Hive 2.1.1 whose location is S3. Now I am creating the same table (same DDL) in Hive 2.3.3, pointing to the same location, and I successfully executed the 2 commands below, but after this, when I try to execute a sample select statement, I get an empty result set. The data is present on S3, and I can access it via Hive 2.1.1. set hive.msck.path.validation=ignore; msck repair table ; Let me know if I have missed anything. Regards, Anup Tiwari
Re: repair partition on hive transactional table is not working
Hi, I was able to resolve the above issue by setting the parameter below :- set hive.msck.path.validation=ignore; Regards, Anup Tiwari On Thu, Sep 6, 2018 at 3:48 PM Anup Tiwari wrote: > Hi All, > > Can anyone look into it? > > On Wed, 5 Sep 2018 19:28 Anup Tiwari, wrote: > >> Hi All, >> >> I have executed "msck repair table " on my Hive ACID table and >> it printed a message that partitions were added, but when I query it, it >> gives the error below :- >> >> Vertex failed, vertexName=Map 1, vertexId=vertex_1536134751043_0020_2_00, diagnostics=[Task failed, taskId=task_1536134751043_0020_2_00_03, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1536134751043_0020_2_00_03_0:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: java.io.FileNotFoundException: *Path is not a file: /user/hive/warehouse/fact_t_mp_basic_all_report_bucket/genddate=2017-04-02*
>> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
>> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62)
>> at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1819)
>> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:692)
>> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
>> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:850)
>> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:793)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489)
>>
>> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>> at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
>> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>> at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.RuntimeException: java.io.IOException: java.io.FileNotFoundException: Path is not a file: /user/hive/warehouse/fact_t_mp_basic_all_report_bucket/genddate=2017-04-02
>>
>> I can understand the error, but is it a bug that Hive is not able to >> navigate down to the final file/data? >> >> Regards, >> Anup Tiwari >> >
Re: repair partition on hive transactional table is not working
Hi All, Can anyone look into it? On Wed, 5 Sep 2018 19:28 Anup Tiwari, wrote: > Hi All, > > I have executed "msck repair table " on my hive ACID table and > it printed message that partitions added but when i am querying it; it is > giving below error :- > > > > Vertex failed, vertexName=Map 1, vertexId=vertex_1536134751043_0020_2_00, > diagnostics=[Task failed, taskId=task_1536134751043_0020_2_00_03, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1536134751043_0020_2_00_03_0:java.lang.RuntimeException: > java.lang.RuntimeException: java.io.IOException: > java.io.FileNotFoundException: *Path is not a file: > /user/hive/warehouse/fact_t_mp_basic_all_report_bucket/genddate=2017-04-02* > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76) > at > org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62) > at > org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1819) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:692) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:850) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:793) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489) > > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.RuntimeException: java.io.IOException: > java.io.FileNotFoundException: Path is not a file: > /user/hive/warehouse/fact_t_mp_basic_all_report_bucket/genddate=2017-04-02 > > > > > I 
can understand the error, but is it a bug that hive is not able to > navigate to final file /data ? > > Regards, > Anup Tiwari >
repair partition on hive transactional table is not working
Hi All, I have executed "msck repair table " on my Hive ACID table and it printed a message that partitions were added, but when I query it, it gives the error below :-

Vertex failed, vertexName=Map 1, vertexId=vertex_1536134751043_0020_2_00, diagnostics=[Task failed, taskId=task_1536134751043_0020_2_00_03, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1536134751043_0020_2_00_03_0:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: java.io.FileNotFoundException: *Path is not a file: /user/hive/warehouse/fact_t_mp_basic_all_report_bucket/genddate=2017-04-02*
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1819)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:692)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:850)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:793)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489)

at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.IOException: java.io.FileNotFoundException: Path is not a file: /user/hive/warehouse/fact_t_mp_basic_all_report_bucket/genddate=2017-04-02

I can understand the error, but is it a bug that Hive is not able to navigate down to the final file/data?

Regards, Anup Tiwari
Re: Problem in reading parquet data from 2 different sources(Hive + Glue) using hive tables
Hi Thai, Any links or examples for achieving this? Since I do not have much idea of this. On Thu, 30 Aug 2018 20:08 Thai Bui, wrote: > Another option is to implement a custom ParquetInputFormat extending the > current Hive MR Parquet format and handle schema coersion at the input > split/record reader level. This would be more involving but guarantee to > work if you could add auxiliary jars to your Hive cluster. > > On Wed, Aug 29, 2018 at 8:06 AM Anup Tiwari > wrote: > >> Hi All, >> >> We have a use case where we have created a partition external table in >> hive 2.3.3 which is pointing to a parquet location where we have date level >> folder and on some days parquet was created by hive 2.1.1 and on some days >> it was created by using glue. Now when we trying to read this data, we are >> getting below error :- >> >> Vertex failed, vertexName=Map 1, vertexId=vertex_1535191533874_0135_2_00, >> diagnostics=[Task failed, taskId=task_1535191533874_0135_2_00_00, >> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( >> failure ) : >> attempt_1535191533874_0135_2_00_00_0:java.lang.RuntimeException: >> java.lang.RuntimeException: >> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while >> processing row [Error getting row data with exception >> java.lang.UnsupportedOperationException: Cannot inspect >> org.apache.hadoop.hive.serde2.io.DateWritable >> at >> org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveJavaObject(ParquetStringInspector.java:77) >> at >> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:247) >> at >> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:366) >> at >> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:202) >> at >> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:188) >> at >> org.apache.hadoop.hive.ql.exec.MapOperator.toErrorMessage(MapOperator.java:588) >> at >> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:554) >> at >> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86) >> at >> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70) >> at >> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419) >> at >> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185) >> at >> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) >> at >> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) >> at >> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) >> at >> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) >> at java.security.AccessController.doPrivileged(Native Method) >> at javax.security.auth.Subject.doAs(Subject.java:422) >> at >> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886) >> at >> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) >> at >> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) >> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) >> at java.util.concurrent.FutureTask.run(FutureTask.java:266) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) >> at >> 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) >> at java.lang.Thread.run(Thread.java:748) >> ] >> at >> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) >> at >> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) >> at >> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) >> at >> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) >> at >> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) >> at java.security.AccessController.doPrivileged(Native Method) >> at javax.security.auth.Subject.doAs(Subject.java:422) >> at >> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886) >> at >> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) >> at >> org.apache.tez.runtime.task.TaskRunner2Callable.call
Re: Problem in reading parquet data from 2 different sources(Hive + Glue) using hive tables
Hi, > optional int32 action_date (DATE); > optional binary action_date (UTF8); Those two column types aren't convertible implicitly between each other, which is probably the problem. In the above statement, are you referring to DATE/UTF8 or int32/binary? Because I believe a string column should be able to hold an integer value as well. On Wed, 29 Aug 2018 21:49 Gopal Vijayaraghavan, wrote: > Hi, > > > on some days parquet was created by hive 2.1.1 and on some days it was > created by using glue > … > > After some drill down i saw schema of columns inside both type of > parquet file using parquet tool and found different data types for some > column > ... > > optional int32 action_date (DATE); > > optional binary action_date (UTF8); > > Those two column types aren't convertible implicitly between each other, > which is probably the problem. > > You'll have to create 2 different external tables and separate the files > into different directories, create tables with appropriate schema and write > a view to do UNION ALL. > > If your goal is to have good performance for Hive, it is best to write the > data from Hive & maintain the schema on write. > > Cheers, > Gopal > > >
Problem in reading parquet data from 2 different sources(Hive + Glue) using hive tables
for us as we have data for the last 2-3 years. Please find below a table that shows the data type in both kinds of file for some columns, along with the data type in the external table for the same column; "Test Result" specifies whether I was able to read it or not.

Parquet made with Hive 2.1.1           | Parquet made with AWS Glue             | Final Hive Table for reading both | Test Result
optional int32 action_date (DATE);     | optional binary action_date (UTF8);    | action_date : string              | Fail
optional int64 user_id;                | optional int64 user_id;                | user_id : bigint                  | Pass
optional binary tracking_type (UTF8);  | required binary tracking_type (UTF8);  | tracking_type : string            | Fail
optional int32 game_number;            | optional int64 game_number;            | game_number : bigint              | Pass

Regards, Anup Tiwari
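A sketch of the two-tables-plus-view approach Gopal suggested, using the column names from the table above (the table names are hypothetical): each external table points only at the files written with its schema, and the view coerces both sides to the common types.

CREATE VIEW mydb.events AS
SELECT CAST(action_date AS STRING) AS action_date,  -- DATE -> string
       user_id,
       tracking_type,
       CAST(game_number AS BIGINT) AS game_number   -- int32 -> bigint
FROM mydb.events_hive211
UNION ALL
SELECT action_date, user_id, tracking_type, game_number
FROM mydb.events_glue;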
Apache Hadoop - Hive - Tez compatibility
Hi All, I just wanted to know whether Hadoop 2.9.1 is compatible with Hive 2.3.3 and Tez 0.9.0. Also, do we maintain such compatibility info anywhere?
Re: Hive Row number Use case
Can someone look into this and revert if possible? Regards, Anup Tiwari On Sat, Jul 14, 2018 at 12:28 AM, Anup Tiwari wrote: > Hi All, > > Can someone look into this and revert if possible? > > Thanks. > > > On Thu, 12 Jul 2018 12:56 Anup Tiwari, wrote: > >> Hi All, >> >> We have a use case where we want to assign a row number to a table based >> on 3 column ( uid, update_date, flag) i.e. if value of any of the column >> gets changed, we want to reset this number. Please find below sample input >> data and expected output data. >> >> Also please note that we have tried row_number() over(partition by uid, >> update_date, flag order by update_time asc) but due to this actual >> input ordering got break due to I believe partition by clause because it >> seems partition by creates group within column specified and then it start >> row number and due to this actual ordering is breaking. So i just wanted to >> know that is there any function available in hive which can give us below >> result OR we are missing something in window function? >> >> >> *Input Data :- * >> >> *uid* *update_date* *update_time* *flag* >> 468730 2017-07-12 12/07/2017 22:59:17 1 >> 468730 2017-07-12 12/07/2017 23:02:14 0 >> 468730 2017-07-12 12/07/2017 23:07:40 0 >> 468730 2017-07-12 12/07/2017 23:12:41 0 >> 468730 2017-07-12 12/07/2017 23:22:06 0 >> 468730 2017-07-12 12/07/2017 23:38:35 0 >> 468730 2017-07-12 12/07/2017 23:44:19 0 >> 468730 2017-07-12 12/07/2017 23:47:49 1 >> 468730 2017-07-12 12/07/2017 23:48:49 1 >> 468730 2017-07-12 12/07/2017 23:53:31 0 >> 468730 2017-07-12 12/07/2017 23:57:01 1 >> 468730 2017-07-13 13/07/2017 00:03:10 1 >> 468730 2017-07-13 13/07/2017 00:06:35 0 >> 468730 2017-07-13 13/07/2017 00:07:29 1 >> 468731 2017-07-13 12/07/2017 12:59:17 1 >> 468731 2017-07-13 12/07/2017 13:02:14 0 >> 468731 2017-07-13 12/07/2017 13:07:40 0 >> 468731 2017-07-13 12/07/2017 13:12:41 0 >> >> >> *Output Data :-* >> >> *uid* *update_date* *update_time* *flag* *required_row_num* >> 468730 2017-07-12 12/07/2017 22:59:17 1 1 >> 468730 2017-07-12 12/07/2017 23:02:14 0 1 >> 468730 2017-07-12 12/07/2017 23:07:40 0 2 >> 468730 2017-07-12 12/07/2017 23:12:41 0 3 >> 468730 2017-07-12 12/07/2017 23:22:06 0 4 >> 468730 2017-07-12 12/07/2017 23:38:35 0 5 >> 468730 2017-07-12 12/07/2017 23:44:19 0 6 >> 468730 2017-07-12 12/07/2017 23:47:49 1 1 >> 468730 2017-07-12 12/07/2017 23:48:49 1 2 >> 468730 2017-07-12 12/07/2017 23:53:31 0 1 >> 468730 2017-07-12 12/07/2017 23:57:01 1 1 >> 468730 2017-07-13 13/07/2017 00:03:10 1 1 >> 468730 2017-07-13 13/07/2017 00:06:35 0 1 >> 468730 2017-07-13 13/07/2017 00:07:29 1 1 >> 468731 2017-07-13 12/07/2017 12:59:17 1 1 >> 468731 2017-07-13 12/07/2017 13:02:14 0 1 >> 468731 2017-07-13 12/07/2017 13:07:40 0 2 >> 468731 2017-07-13 12/07/2017 13:12:41 0 3 >> *FYI :* We are one Hive 2.3.1. >> >
Re: Hive Row number Use case
Hi All, Can someone look into this and revert if possible? Thanks. On Thu, 12 Jul 2018 12:56 Anup Tiwari, wrote: > Hi All, > > We have a use case where we want to assign a row number to a table based > on 3 column ( uid, update_date, flag) i.e. if value of any of the column > gets changed, we want to reset this number. Please find below sample input > data and expected output data. > > Also please note that we have tried row_number() over(partition by uid, > update_date, flag order by update_time asc) but due to this actual input > ordering got break due to I believe partition by clause because it seems > partition by creates group within column specified and then it start row > number and due to this actual ordering is breaking. So i just wanted to > know that is there any function available in hive which can give us below > result OR we are missing something in window function? > > > *Input Data :- * > > *uid* *update_date* *update_time* *flag* > 468730 2017-07-12 12/07/2017 22:59:17 1 > 468730 2017-07-12 12/07/2017 23:02:14 0 > 468730 2017-07-12 12/07/2017 23:07:40 0 > 468730 2017-07-12 12/07/2017 23:12:41 0 > 468730 2017-07-12 12/07/2017 23:22:06 0 > 468730 2017-07-12 12/07/2017 23:38:35 0 > 468730 2017-07-12 12/07/2017 23:44:19 0 > 468730 2017-07-12 12/07/2017 23:47:49 1 > 468730 2017-07-12 12/07/2017 23:48:49 1 > 468730 2017-07-12 12/07/2017 23:53:31 0 > 468730 2017-07-12 12/07/2017 23:57:01 1 > 468730 2017-07-13 13/07/2017 00:03:10 1 > 468730 2017-07-13 13/07/2017 00:06:35 0 > 468730 2017-07-13 13/07/2017 00:07:29 1 > 468731 2017-07-13 12/07/2017 12:59:17 1 > 468731 2017-07-13 12/07/2017 13:02:14 0 > 468731 2017-07-13 12/07/2017 13:07:40 0 > 468731 2017-07-13 12/07/2017 13:12:41 0 > > > *Output Data :-* > > *uid* *update_date* *update_time* *flag* *required_row_num* > 468730 2017-07-12 12/07/2017 22:59:17 1 1 > 468730 2017-07-12 12/07/2017 23:02:14 0 1 > 468730 2017-07-12 12/07/2017 23:07:40 0 2 > 468730 2017-07-12 12/07/2017 23:12:41 0 3 > 468730 2017-07-12 12/07/2017 23:22:06 0 4 > 468730 2017-07-12 12/07/2017 23:38:35 0 5 > 468730 2017-07-12 12/07/2017 23:44:19 0 6 > 468730 2017-07-12 12/07/2017 23:47:49 1 1 > 468730 2017-07-12 12/07/2017 23:48:49 1 2 > 468730 2017-07-12 12/07/2017 23:53:31 0 1 > 468730 2017-07-12 12/07/2017 23:57:01 1 1 > 468730 2017-07-13 13/07/2017 00:03:10 1 1 > 468730 2017-07-13 13/07/2017 00:06:35 0 1 > 468730 2017-07-13 13/07/2017 00:07:29 1 1 > 468731 2017-07-13 12/07/2017 12:59:17 1 1 > 468731 2017-07-13 12/07/2017 13:02:14 0 1 > 468731 2017-07-13 12/07/2017 13:07:40 0 2 > 468731 2017-07-13 12/07/2017 13:12:41 0 3 > *FYI :* We are one Hive 2.3.1. >
Hive Row number Use case
Hi All, We have a use case where we want to assign a row number to a table based on 3 columns (uid, update_date, flag), i.e. if the value of any of these columns changes, we want to reset this number. Please find below sample input data and expected output data. Also please note that we have tried row_number() over(partition by uid, update_date, flag order by update_time asc), but this breaks the actual input ordering; I believe the partition by clause creates one group per distinct combination of the specified columns and starts the row number within that group, so consecutive runs in the input are not respected. So I just wanted to know whether there is any function available in Hive that can give us the result below, or whether we are missing something in the window function.

*Input Data :-*

uid     update_date  update_time          flag
468730  2017-07-12   12/07/2017 22:59:17  1
468730  2017-07-12   12/07/2017 23:02:14  0
468730  2017-07-12   12/07/2017 23:07:40  0
468730  2017-07-12   12/07/2017 23:12:41  0
468730  2017-07-12   12/07/2017 23:22:06  0
468730  2017-07-12   12/07/2017 23:38:35  0
468730  2017-07-12   12/07/2017 23:44:19  0
468730  2017-07-12   12/07/2017 23:47:49  1
468730  2017-07-12   12/07/2017 23:48:49  1
468730  2017-07-12   12/07/2017 23:53:31  0
468730  2017-07-12   12/07/2017 23:57:01  1
468730  2017-07-13   13/07/2017 00:03:10  1
468730  2017-07-13   13/07/2017 00:06:35  0
468730  2017-07-13   13/07/2017 00:07:29  1
468731  2017-07-13   12/07/2017 12:59:17  1
468731  2017-07-13   12/07/2017 13:02:14  0
468731  2017-07-13   12/07/2017 13:07:40  0
468731  2017-07-13   12/07/2017 13:12:41  0

*Output Data :-*

uid     update_date  update_time          flag  required_row_num
468730  2017-07-12   12/07/2017 22:59:17  1     1
468730  2017-07-12   12/07/2017 23:02:14  0     1
468730  2017-07-12   12/07/2017 23:07:40  0     2
468730  2017-07-12   12/07/2017 23:12:41  0     3
468730  2017-07-12   12/07/2017 23:22:06  0     4
468730  2017-07-12   12/07/2017 23:38:35  0     5
468730  2017-07-12   12/07/2017 23:44:19  0     6
468730  2017-07-12   12/07/2017 23:47:49  1     1
468730  2017-07-12   12/07/2017 23:48:49  1     2
468730  2017-07-12   12/07/2017 23:53:31  0     1
468730  2017-07-12   12/07/2017 23:57:01  1     1
468730  2017-07-13   13/07/2017 00:03:10  1     1
468730  2017-07-13   13/07/2017 00:06:35  0     1
468730  2017-07-13   13/07/2017 00:07:29  1     1
468731  2017-07-13   12/07/2017 12:59:17  1     1
468731  2017-07-13   12/07/2017 13:02:14  0     1
468731  2017-07-13   12/07/2017 13:07:40  0     2
468731  2017-07-13   12/07/2017 13:12:41  0     3

*FYI :* We are on Hive 2.3.1.
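No reply appears in the thread, but this is the classic gaps-and-islands pattern, which plain window functions can express. A sketch (assuming update_time sorts correctly as written and the source table is called input_table): mark each row where (update_date, flag) differs from the previous row of the same uid, turn the marks into group ids with a running sum, then number the rows within each group.

SELECT uid, update_date, update_time, flag,
       row_number() OVER (PARTITION BY uid, grp ORDER BY update_time) AS required_row_num
FROM (
  SELECT uid, update_date, update_time, flag,
         sum(chg) OVER (PARTITION BY uid ORDER BY update_time
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
  FROM (
    SELECT uid, update_date, update_time, flag,
           -- 1 marks the start of a new run (lag() is NULL on the first row)
           CASE WHEN update_date = lag(update_date) OVER (PARTITION BY uid ORDER BY update_time)
                 AND flag = lag(flag) OVER (PARTITION BY uid ORDER BY update_time)
                THEN 0 ELSE 1 END AS chg
    FROM input_table
  ) marked
) grouped;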
Ways to reduce launching time of query in Hive 2.2.1
Hi All, We have a use case where we need to return output in < 10 sec. We have evaluated a different set of tools for execution; they work fine, but they do not cover all cases, and they are not reliable (since they are in an evolving phase). Hive, however, works well in this context. Using Hive LLAP, we have reduced query time to 6-7 sec. But query launching takes ~12-15 sec, due to which the response time becomes 18-21 sec. Is there any way we can reduce this launching time? Please note that we have tried prewarm containers, but when we launch a query from the Hive client, it does not pick containers from the already-initialized ones; rather, it launches its own. Please let me know how we can overcome this issue, since this is the only problem stopping us from using Hive. Any links/descriptions are really appreciated. Regards, Anup Tiwari
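No fix surfaces in this thread, so the following is only a guess, not an established answer: prewarmed LLAP daemons are bypassed unless the session actually runs in LLAP execution mode, so it is worth confirming these settings in the client session. The config names are real Hive settings, but whether they resolve this particular launch delay is an assumption:

set hive.execution.engine=tez;
set hive.execution.mode=llap;      -- route work to the LLAP daemons
set hive.llap.execution.mode=all;  -- run all operators in LLAP where possible

If queries go through HiveServer2, pre-initialized Tez AM sessions (hive.server2.tez.initialize.default.sessions=true) can also shave time off query launch.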
Re: Unable to read hive external table data which is linked to s3 after upgradation from 2.1.1 to 2.3.3
looping dev in for suggestions. Regards, Anup Tiwari On Thu, Apr 12, 2018 at 8:39 PM, Richard A. Bross <r...@oaktreepeak.com> wrote: > To be honest, my advice was just based on your post - we only use Hive in > AWS EMR, so I couldn't tell you. > > Glad that at least you're back up though. > > - Original Message - > From: "Anup Tiwari" <anupsdtiw...@gmail.com> > To: user@hive.apache.org > Sent: Thursday, April 12, 2018 10:50:23 AM > Subject: Re: Unable to read hive external table data which is linked to s3 > after upgradation from 2.1.1 to 2.3.3 > > > > > > > > > Hi Richard , > > After looking at hive 2.3.3 logs i found that when we are loading all > configuration parameters then below message is getting printed :- > > Values omitted for security reason if present: [fs.s3n.awsAccessKeyId, > fs.s3a.access.key, fs.s3.awsAccessKeyId, hive.server2.keystore.password, f > s.s3a.proxy.password, javax.jdo.option.ConnectionPassword, > fs.s3.awsSecretAccessKey, fs.s3n.awsSecretAccessKey, fs.s3a.secret.key] > > > while in hive 2.1.1 logs i found below message :- > > Values omitted for security reason if present: > [hive.server2.keystore.password, > javax.jdo.option.ConnectionPassword] > > Can this be the reason why hive 2.3.3 is not able to read s3 related > params from hive-site.xml? > > I found one of JIRA : https://issues.apache.org/jira/browse/HIVE-14588 > > > > Also i have set below property in hive-site.xml by excluding above s3 > variables(fs.s3a.access.key,fs.s3a.secret.key) from list and it worked. > > > hive.conf.hidden.list > javax.jdo.option.ConnectionPassword,hive. > server2.keystore.password,fs.s3.awsAccessKeyId,fs.s3. > awsSecretAccessKey,fs.s3n.awsAccessKeyId,fs.s3n.awsSecretAccessKey,fs.s3a. > proxy.password > Comma separated list of configuration options which should > not be read by normal user like passwords.Anup has excluded 2 variable > which were fs.s3a.access.key,fs.s3a.secret.key > > > Let me know if there is any other solution because i think if these > variables are by default part of hidden.list then there will be some other > proper workaround for this. > > > > > > > > > > Regards, > Anup Tiwari > > > On Thu, Apr 12, 2018 at 7:44 PM, Richard A. Bross < r...@oaktreepeak.com > > wrote: > > > I hear you, but given the exception log, it does seem that it can't > authenticate you. You can try using the AWS environment variables. If that > resolves the issue then you'll have some more to go on. According to > Hortonworks here: > > https://hortonworks.github.io/hdp-aws/s3-security/ > > "AWS CLI supports authentication through environment variables. These same > environment variables will be used by Hadoop if no configuration properties > are set." > > > - Original Message - > From: "Anup Tiwari" < anupsdtiw...@gmail.com > > To: user@hive.apache.org > Sent: Thursday, April 12, 2018 10:06:33 AM > Subject: Re: Unable to read hive external table data which is linked to s3 > after upgradation from 2.1.1 to 2.3.3 > > > We are not using EMR. Also we have set below params for accessing s3 > bucket in hive-site.xml which are same as what we have set in hive 2.1.1. > > > * fs.s3a.access.key > * fs.s3a.secret.key > * fs.s3a.connection.maximum > * fs.s3a.impl > > > > > > > > > Regards, > Anup Tiwari > > > On Thu, Apr 12, 2018 at 7:19 PM, Richard A. Bross < r...@oaktreepeak.com > > wrote: > > > Based on the exception, it looks more like an AWS credentials issue than a > Hive issue. Are you running in AWS EMR, on-prem? 
> > In AWS the resource accessing the S3 bucket would have to have an IAM that > gave permission. If you are running somewhere else whatever AWS login you > use would have to have the correct permissions in the IAM. > > > > - Original Message - > From: "Anup Tiwari" < anupsdtiw...@gmail.com > > To: user@hive.apache.org > Sent: Thursday, April 12, 2018 9:11:37 AM > Subject: Unable to read hive external table data which is linked to s3 > after upgradation from 2.1.1 to 2.3.3 > > > > Hi All, > > When i am trying to read s3 linked external table in 2.3.3 ; i am getting > errors.It was working properly in 2.1.1. please find below details and let > me know if i am missing something :- > > > Hadoop Version :- 2.8.0 > > Query :- > > select log_date,count(1) as cnt from test.tt1 group by log_date; > > Error :- > > Vertex failed, vertexName=Map 1, vertexId=vertex_1523502631429_0029_3_00, > diagnostics=[Vertex vertex_1523502631429_0029_3_00
Re: Unable to read hive external table data which is linked to s3 after upgradation from 2.1.1 to 2.3.3
Hi Richard, After looking at the Hive *2.3.3* logs, I found that when all configuration parameters are being loaded, the below message is printed :- Values omitted for security reason if present: [fs.s3n.awsAccessKeyId, fs.s3a.access.key, fs.s3.awsAccessKeyId, hive.server2.keystore.password, fs.s3a.proxy.password, javax.jdo.option.ConnectionPassword, fs.s3.awsSecretAccessKey, fs.s3n.awsSecretAccessKey, fs.s3a.secret.key] while in the Hive *2.1.1* logs I found the below message :- Values omitted for security reason if present: [hive.server2.keystore.password, javax.jdo.option.ConnectionPassword] Can this be the reason why Hive 2.3.3 is not able to read the s3-related params from hive-site.xml? I found one JIRA : https://issues.apache.org/jira/browse/HIVE-14588 Also, I have set the below property in hive-site.xml, excluding the above s3 variables (fs.s3a.access.key, fs.s3a.secret.key) from the list, and it worked.

<property>
  <name>hive.conf.hidden.list</name>
  <value>javax.jdo.option.ConnectionPassword,hive.server2.keystore.password,fs.s3.awsAccessKeyId,fs.s3.awsSecretAccessKey,fs.s3n.awsAccessKeyId,fs.s3n.awsSecretAccessKey,fs.s3a.proxy.password</value>
  <description>Comma separated list of configuration options which should not be read by normal user like passwords. Anup has excluded 2 variables, which were fs.s3a.access.key,fs.s3a.secret.key</description>
</property>

Let me know if there is any other solution, because I think if these variables are part of the hidden list by default, then there should be some other proper workaround for this. Regards, Anup Tiwari On Thu, Apr 12, 2018 at 7:44 PM, Richard A. Bross <r...@oaktreepeak.com> wrote: > I hear you, but given the exception log, it does seem that it can't > authenticate you. You can try using the AWS environment variables. If that > resolves the issue then you'll have some more to go on. According to > Hortonworks here: > > https://hortonworks.github.io/hdp-aws/s3-security/ > > "AWS CLI supports authentication through environment variables. These same > environment variables will be used by Hadoop if no configuration properties > are set." > > > - Original Message - > From: "Anup Tiwari" <anupsdtiw...@gmail.com> > To: user@hive.apache.org > Sent: Thursday, April 12, 2018 10:06:33 AM > Subject: Re: Unable to read hive external table data which is linked to s3 > after upgradation from 2.1.1 to 2.3.3 > > > We are not using EMR. Also we have set the below params for accessing the s3 > bucket in hive-site.xml, which are the same as what we have set in hive 2.1.1. > > > * fs.s3a.access.key > * fs.s3a.secret.key > * fs.s3a.connection.maximum > * fs.s3a.impl > > > Regards, > Anup Tiwari > > > On Thu, Apr 12, 2018 at 7:19 PM, Richard A. Bross < r...@oaktreepeak.com > > wrote: > > > Based on the exception, it looks more like an AWS credentials issue than a > Hive issue. Are you running in AWS EMR, on-prem? > > In AWS the resource accessing the S3 bucket would have to have an IAM that > gave permission. If you are running somewhere else whatever AWS login you > use would have to have the correct permissions in the IAM. > > > > - Original Message - > From: "Anup Tiwari" < anupsdtiw...@gmail.com > > To: user@hive.apache.org > Sent: Thursday, April 12, 2018 9:11:37 AM > Subject: Unable to read hive external table data which is linked to s3 > after upgradation from 2.1.1 to 2.3.3 > > > > Hi All, > > When I am trying to read the s3-linked external table in 2.3.3, I am getting > errors. It was working properly in 2.1.1.
please find below details and let > me know if i am missing something :- > > > Hadoop Version :- 2.8.0 > > Query :- > > select log_date,count(1) as cnt from test.tt1 group by log_date; > > Error :- > > Vertex failed, vertexName=Map 1, vertexId=vertex_1523502631429_0029_3_00, > diagnostics=[Vertex vertex_1523502631429_0029_3_00 [Map 1] killed/failed > due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: tt1 initializer failed, > vertex=vertex_1523502631429_0029_3_00 [Map 1], > org.apache.hadoop.fs.s3a.AWSClientIOException: > doesBucketExist on g24x7.new-analytics: com.amazonaws.AmazonClientException: > No AWS Credentials provided by BasicAWSCredentialsProvider > EnvironmentVariableCredentialsProvider > SharedInstanceProfileCredentialsProvider > : com.amazonaws.AmazonClientException: The requested metadata is not > found at http://16x.xxx.xxx.xx4/latest/meta-data/iam/security-credentials/ > : No AWS Credentials provided by BasicAWSCredentialsProvider > EnvironmentVariableCredentialsProvider > SharedInstanceProfileCredentialsProvider > : com.amazonaws.AmazonClientException: The requested metadata is not > found at http://16x.xxx.xxx.xx4/latest/meta-data/iam/security-credentia
Re: Unable to read hive external table data which is linked to s3 after upgradation from 2.1.1 to 2.3.3
We are not using EMR. Also, we have set the below params for accessing the s3 bucket in hive-site.xml, which are the same as what we have set in hive 2.1.1.

- fs.s3a.access.key
- fs.s3a.secret.key
- fs.s3a.connection.maximum
- fs.s3a.impl

Regards,
Anup Tiwari

On Thu, Apr 12, 2018 at 7:19 PM, Richard A. Bross <r...@oaktreepeak.com> wrote:
> Based on the exception, it looks more like an AWS credentials issue than a
> Hive issue. Are you running in AWS EMR, on-prem?
>
> In AWS the resource accessing the S3 bucket would have to have an IAM that
> gave permission. If you are running somewhere else whatever AWS login you
> use would have to have the correct permissions in the IAM.
>
> - Original Message -
> From: "Anup Tiwari" <anupsdtiw...@gmail.com>
> To: user@hive.apache.org
> Sent: Thursday, April 12, 2018 9:11:37 AM
> Subject: Unable to read hive external table data which is linked to s3
> after upgradation from 2.1.1 to 2.3.3
>
> Hi All,
>
> When I am trying to read an s3-linked external table in 2.3.3, I am getting
> errors. It was working properly in 2.1.1. Please find the details below and
> let me know if I am missing something:
>
> Hadoop Version :- 2.8.0
>
> Query :-
>
> select log_date,count(1) as cnt from test.tt1 group by log_date;
>
> Error :-
>
> Vertex failed, vertexName=Map 1, vertexId=vertex_1523502631429_0029_3_00,
> diagnostics=[Vertex vertex_1523502631429_0029_3_00 [Map 1] killed/failed
> due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: tt1 initializer failed,
> vertex=vertex_1523502631429_0029_3_00 [Map 1],
> org.apache.hadoop.fs.s3a.AWSClientIOException:
> doesBucketExist on g24x7.new-analytics: com.amazonaws.AmazonClientException:
> No AWS Credentials provided by BasicAWSCredentialsProvider
> EnvironmentVariableCredentialsProvider
> SharedInstanceProfileCredentialsProvider
> : com.amazonaws.AmazonClientException: The requested metadata is not
> found at http://16x.xxx.xxx.xx4/latest/meta-data/iam/security-credentials/
> : No AWS Credentials provided by BasicAWSCredentialsProvider
> EnvironmentVariableCredentialsProvider
> SharedInstanceProfileCredentialsProvider
> : com.amazonaws.AmazonClientException: The requested metadata is not
> found at http://16x.xxx.xxx.xx4/latest/meta-data/iam/security-credentials/
> at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:128)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:288)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:236)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
> at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:265)
> at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:236)
> at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:442)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:561)
> at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: com.amazonaws.AmazonClientException: No AWS Credentials
> provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider
> SharedInstanceProfileCredentialsProvider
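A quick diagnostic for this thread (a sketch only, assuming standard SET behavior; the key names come from the Hadoop S3A connector): printing a key from the Hive shell shows what the session actually resolves, and on builds where fs.s3a.* sits on hive.conf.hidden.list the value is withheld, which lines up with the "Values omitted for security reason" log line discussed above.

    -- Print the effective value of a config key in the current session;
    -- keys on hive.conf.hidden.list have their values withheld.
    SET fs.s3a.access.key;
    SET fs.s3a.impl;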
Loading data from one swift container into external table pointing to another container fails
Hello,

Loading data from one swift container into an external table pointing to another container is failing for me.

hive> load data inpath 'swift://mycontainer.default/source_test' into table my_test;
Loading data to table default.my_test
Failed with exception copyFiles: error while moving files!!! Cannot move swift://mycontainer.default/source_test/AK.TXT to swift://test.default/my_test/AK.TXT
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
hive>

Here is the complete stack trace:

2018-02-02 21:27:28,309 ERROR [main]: exec.Task (SessionState.java:printError(962)) - Failed with exception copyFiles: error while moving files!!! Cannot move swift://mycontainer.default/source_test/AK.TXT to swift://test.default/my_test/AK.TXT
org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error while moving files!!! Cannot move swift://mycontainer.default/source_test/AK.TXT to swift://test.default/my_test/AK.TXT
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2784)
at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1664)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1116)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:168)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:379)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:739)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Cannot move swift://mycontainer.default/source_test/AK.TXT to swift://test.default/my_test/AK.TXT
at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2777)
... 21 more

Hive version = 1.2.1

Any help is appreciated. Thanks
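One workaround worth trying (a sketch only, not verified against the Swift connector): avoid LOAD DATA's cross-container file move entirely and copy the rows with a query, since INSERT ... SELECT writes fresh files inside the target container instead of renaming files across containers. The external table name and the single-string-column schema below are hypothetical placeholders; match them to the real file layout.

    -- Map the source files in the other container as an external table
    -- (schema here is a hypothetical single-column layout).
    CREATE EXTERNAL TABLE source_test_ext (line string)
    LOCATION 'swift://mycontainer.default/source_test';

    -- Copy with a query instead of LOAD DATA's file move.
    INSERT INTO TABLE my_test SELECT * FROM source_test_ext;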
insert on condition
Hello All,

Is it possible to achieve "insert overwrite into a partition only if it does not exist"? The partition is a string date.

Thanks,
Anup
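Hive's DML syntax supports this for static partition specs via an IF NOT EXISTS clause on INSERT OVERWRITE. A minimal sketch with hypothetical table, column, and partition names:

    -- Overwrites the partition only when it does not already exist;
    -- IF NOT EXISTS is accepted only with a static (fully spelled out)
    -- partition value, which fits a string date partition.
    INSERT OVERWRITE TABLE my_table PARTITION (dt='2016-09-21') IF NOT EXISTS
    SELECT col1, col2
    FROM staging_table
    WHERE dt = '2016-09-21';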
issue with hive jdbc
Hello,

I am getting this exception when my query finishes, which results in job failure:

java.sql.SQLException: org.apache.http.NoHttpResponseException: The target server failed to respond
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)

Any help is appreciated!

Thanks,
Anup
Re: need help with hive session getting killed
I am doing insert overwrite, so the results of the query are getting written to an external table stored in Parquet. Which properties should I set, and can I set those in the query?

Thanks

> On Sep 20, 2016, at 8:27 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>
> Increase timeout or let the result of the query be written in a dedicated
> table.
>
>> On 20 Sep 2016, at 16:57, anup ahire <ahirea...@gmail.com> wrote:
>>
>> Hello,
>>
>> I am using hive-jdbc-1.2.1 to run a query. The query runs for around an hour
>> and eventually completes, but my hive session terminates before I can get
>> results from the completed query, and the job fails.
>>
>> java.sql.SQLException: org.apache.http.NoHttpResponseException: The target
>> server failed to respond
>>
>> at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
>> ~[hive-jdbc-1.2.1.jar!/:1.2.1]
>>
>> What is the best way to address this?
>>
>> Thanks,
>> Anup
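On the "dedicated table" suggestion above, a sketch of how it could look (the table names and the query are hypothetical placeholders): the long-running INSERT persists its output server-side, so even if the JDBC session drops afterwards, the result survives and can be fetched cheaply from a fresh connection. As far as I know, the relevant timeout knobs (for example hive.server2.idle.session.timeout and hive.server2.idle.operation.timeout) are HiveServer2 settings in hive-site.xml, not something you can set inside the query itself.

    -- One-time setup: a results table for the expensive query's output.
    CREATE TABLE IF NOT EXISTS query_results (log_date string, cnt bigint)
    STORED AS PARQUET;

    -- The long-running part; if the client connection drops afterwards,
    -- the data is already safely written.
    INSERT OVERWRITE TABLE query_results
    SELECT log_date, count(1) AS cnt
    FROM big_source_table
    GROUP BY log_date;

    -- From a fresh (short-lived) JDBC connection, fetching is quick:
    SELECT * FROM query_results;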