Re: Hive Config for ignoring Glacier Object while querying

2020-03-02 Thread Anup Tiwari
Hi Team,

It would be really helpful if someone could guide me here.

Regards,
Anup Tiwari


On Fri, Feb 28, 2020 at 6:12 PM Anup Tiwari  wrote:

> Hi Team,
>
> Do we have any config like we have in presto which ignores glacier objects
> rather than failing the query ?
>
> *hive.s3.skip-glacier-objects
> <https://prestodb.io/docs/current/connector/hive.html> :- Ignore Glacier
> objects rather than failing the query. This will skip data that may be
> expected to be part of the table or partition. Defaults to false.*
>
> Actually i have a partition tables of which data is on S3. We have
> archived some older data to glacier so now if we run query it fails so we
> have applied a partition date condition after which query is scanning only
> those objects which are on S3 but just wanted to know that do we have any
> config for ignoring glacier objects ?
>
> Regards,
> Anup Tiwari
>


Hive Config for ignoring Glacier Object while querying

2020-02-28 Thread Anup Tiwari
Hi Team,

Do we have any config in Hive, like the one in Presto, which ignores Glacier
objects rather than failing the query ?

*hive.s3.skip-glacier-objects
<https://prestodb.io/docs/current/connector/hive.html> :- Ignore Glacier
objects rather than failing the query. This will skip data that may be
expected to be part of the table or partition. Defaults to false.*

Actually, I have partitioned tables whose data is on S3. We have archived
some older data to Glacier, so now if we run a query it fails. As a workaround
we have applied a partition date condition, after which the query scans only
those objects which are still on S3, but I just wanted to know whether we have
any config for ignoring Glacier objects ?

Regards,
Anup Tiwari


Unable to Cast "array>" column to "string"

2020-01-06 Thread Anup Tiwari
Hi Team,

I have a column of datatype "array<...>" (a complex array type) in a Parquet
table, and now I have created one more table in which the datatype of this
column is "string".

But when I am trying to insert data into the new table, I am getting the *below
error*. Let me know how I can achieve this (I already tried an explicit cast
to string, but it failed with a similar error) :-


*FAILED: ClassCastException
org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to
org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo*
2020-01-06T22:19:08,994 ERROR [main] ql.Driver: FAILED: ClassCastException
org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to
org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo
java.lang.ClassCastException:
org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo cannot be cast to
org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo
at
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.isRedundantConversionFunction(TypeCheckProcFactory.java:765)
at
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:869)
at
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1317)
at
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:219)
at
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:163)
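
For what it is worth, one direction that avoids the cast altogether is to flatten the array into a delimited string, since Hive cannot CAST a complex type to string directly. A sketch only: the src/tgt table names, the id column and the struct fields k and v are placeholders, because the real field names are not shown here:

INSERT INTO TABLE tgt
SELECT s.id,
       concat_ws(',', collect_list(concat(item.k, '=', item.v))) AS col_as_string
FROM src s
LATERAL VIEW explode(s.complex_col) e AS item
GROUP BY s.id;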

Regards,
Anup Tiwari


Re: Less than(<) & Greater than(>) condition failing on string column but min/max is working

2019-08-19 Thread Anup Tiwari
Hi Team,

Can someone look into this, help me understand the behaviour, and suggest a
workaround?

On Thu, 15 Aug 2019 08:02 Anup Tiwari,  wrote:

> Hi Team,
>
> Can someone help me with this?
>
> On Tue, 13 Aug 2019 01:11 Anup Tiwari,  wrote:
>
>> Hi All,
>>
>> I have a use case where i wanted to store multiple columns into a single
>> map data type column but while doing so i came across a use case where i
>> have mainly 2 type of column bigint and string so i stored them in a map
>> column defined as  and then i queried the key which hold
>> integer values in a map column.
>>
>> Case 1 :- when i am applying > , < conditional operator then returning
>> output is correct.
>> Case 2 :- Taking min/max on integer column and then it is giving me
>> incorrect output.
>>
>> Now i understood that it is due datatype of map column which is "string"
>> but just a question, why min/max is working on string and <,> conditions
>> failing ?
>>
>> Please note that the column which i am querying, will always contain int
>> value so let me know if you have any suggestions.
>>
>> Combining and storing value in map datatype is necessary because each row
>> can have different-different number of metadata depending upon event name.
>>
>> Please find table structure below :-
>>
>> CREATE TABLE `anup.test_map`(
>>   `SID` string,
>>   `eventName` string,
>>   `timestamp` timestamp,
>>   `merge` map)
>> STORED AS PARQUET;
>>
>> Regards,
>> Anup Tiwari
>>
>


Re: Less than(<) & Greater than(>) condition failing on string column but min/max is working

2019-08-14 Thread Anup Tiwari
Hi Team,

Can someone help me with this?

On Tue, 13 Aug 2019 01:11 Anup Tiwari,  wrote:

> Hi All,
>
> I have a use case where i wanted to store multiple columns into a single
> map data type column but while doing so i came across a use case where i
> have mainly 2 type of column bigint and string so i stored them in a map
> column defined as  and then i queried the key which hold
> integer values in a map column.
>
> Case 1 :- when i am applying > , < conditional operator then returning
> output is correct.
> Case 2 :- Taking min/max on integer column and then it is giving me
> incorrect output.
>
> Now i understood that it is due datatype of map column which is "string"
> but just a question, why min/max is working on string and <,> conditions
> failing ?
>
> Please note that the column which i am querying, will always contain int
> value so let me know if you have any suggestions.
>
> Combining and storing value in map datatype is necessary because each row
> can have different-different number of metadata depending upon event name.
>
> Please find table structure below :-
>
> CREATE TABLE `anup.test_map`(
>   `SID` string,
>   `eventName` string,
>   `timestamp` timestamp,
>   `merge` map)
> STORED AS PARQUET;
>
> Regards,
> Anup Tiwari
>


Less than(<) & Greater than(>) condition failing on string column but min/max is working

2019-08-12 Thread Anup Tiwari
Hi All,

I have a use case where I wanted to store multiple columns in a single
map-type column. While doing so, I came across a case where I mainly have 2
types of columns, bigint and string, so I stored them in a map column defined
as map<string,string>, and then I queried a key which holds integer values in
that map column.

Case 1 :- When I apply the > , < conditional operators, the returned output
is correct.
Case 2 :- Taking min/max on the integer-valued key gives me incorrect output.

Now I understand that it is due to the datatype of the map column, which is
"string", but just a question: why is min/max working on the string while the
<,> conditions are failing ?

Please note that the column which I am querying will always contain int
values, so let me know if you have any suggestions.

Combining and storing the values in a map datatype is necessary because each
row can have a different number of metadata fields depending upon the event name.

Please find table structure below :-

CREATE TABLE `anup.test_map`(
  `SID` string,
  `eventName` string,
  `timestamp` timestamp,
  `merge` map<string,string>)
STORED AS PARQUET;
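
For illustration, a sketch of the two operations with the map value cast explicitly to bigint; the key name 'game_number' is only a placeholder, and the cast is what forces numeric rather than lexicographic comparison (where, e.g., '9' > '10'):

SELECT `SID`,
       min(CAST(`merge`['game_number'] AS bigint)) AS min_val,
       max(CAST(`merge`['game_number'] AS bigint)) AS max_val
FROM anup.test_map
WHERE CAST(`merge`['game_number'] AS bigint) > 100
GROUP BY `SID`;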

Regards,
Anup Tiwari


Unable to query remote hive metastore

2018-10-22 Thread Anup Tiwari
Hi All,

I have Hive 2.1.1 installed on one of our clusters (hadoop-1xx:9000) and now
I am trying to read it from another Hive (2.1.1) cluster (ds-1xx), but it is
throwing the below error. It also looks like HIVE-14380
<https://jira.apache.org/jira/browse/HIVE-14380>, so do we have any
workaround for this, or is upgrading Hive the only option? :-

*ERROR :-*

2018-10-22T17:13:40,100 ERROR [main] ql.Driver: FAILED: SemanticException
Unable to determine if hdfs://hadoop-1xx:9000/usr/hive/warehouse/cad_s3 is
encrypted: java.lang.IllegalArgumentException: Wrong FS:
hdfs://hadoop-1xx:9000/usr/hive/warehouse/cad_s3, expected:
hdfs://ds-1xx:9000
org.apache.hadoop.hive.ql.parse.SemanticException: Unable to determine if
hdfs://hadoop-1xx:9000/usr/hive/warehouse/cad_s3 is encrypted:
java.lang.IllegalArgumentException: Wrong FS:
hdfs://hadoop-1xx:9000/usr/hive/warehouse/cad_s3, expected:
hdfs://ds-1xx:9000
at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1860)
at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10775)
at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10826)
at
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:246)
at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:477)
at
org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1242)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1384)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
at
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
at
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
at
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)


Regards,
Anup Tiwari


Re: How to Grant All Privileges for All Databases except one in Hive SQL

2018-09-17 Thread Anup Tiwari
Hive doesn't have a "grant select on db.*" option, which is what I think
you're looking for here.

Yes, I am looking for something like this only, and since it is not available,
does that mean I have to grant on each table individually?

I am asking because we have many DBs and a lot of tables within each DB, so
is there any other way?
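
If it does have to be table by table, then for each database the grants would look roughly like the below (t1, t2 are placeholder table names); in practice the statements would probably be generated from the output of SHOW TABLES IN each database:

SET ROLE admin;
GRANT SELECT ON TABLE anup.t1 TO ROLE readonly;
GRANT SELECT ON TABLE anup.t2 TO ROLE readonly;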

Regards,
Anup Tiwari


On Mon, Sep 17, 2018 at 8:48 PM Alan Gates  wrote:

> What you are seeing is correct behavior.  Select on the database means the
> user can see objects in the database (ie, tables, views).  To see contents
> of those objects you have to grant access on those objects.  Hive doesn't
> have a "grant select on db.*" option, which is what I think you're looking
> for here.
>
> Alan.
>
> On Mon, Sep 17, 2018 at 5:50 AM Anup Tiwari 
> wrote:
>
>> Hi Alan,
>>
>> I have given select access of a database to a role which is attached to a
>> user but after this also that user is not able to execute select statements
>> on tables of that database. But if i provide access at table level then
>> that is working. Can you please help me here ?
>>
>> Hive Version : 2.3.2
>>
>> Please find below steps :-
>>
>> 1. Added below confifuration in hive-site.xml
>>
>>   
>> hive.server2.enable.doAs
>> false
>>   
>>
>>   
>> hive.users.in.admin.role
>> hadoop
>>   
>>
>> 
>>  hive.security.authorization.manager
>>
>>  
>> org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory
>> 
>>
>> 
>>  hive.security.authorization.enabled
>>  true
>> 
>>
>> 
>>  hive.security.authenticator.manager
>>
>>  
>> org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator
>> 
>>
>> 2. Restarted Hive Server2.
>>
>> 3. Logged in to hive shell with hadoop user and executed below command
>> without any error :-
>>
>> set role admin;
>> create role readonly;
>> GRANT ROLE readonly TO USER `user2`;
>> GRANT SELECT ON DATABASE anup TO ROLE readonly;
>>
>> 4. Logged in to hive shell with user2 and executed below commands :-
>>
>> select * from anup.t2 limit 5;
>>
>> *Error :-*
>> Error: Error while compiling statement: FAILED:
>> HiveAccessControlException Permission denied: Principal [name=mohan.b,
>> type=USER] does not have following privileges for operation QUERY [[SELECT]
>> on Object [type=TABLE_OR_VIEW, name=anup.t2]] (state=42000,code=4)
>>
>>
>> show current roles;
>> +---+
>> |   role|
>> +---+
>> | public|
>> | readonly  |
>> +---+
>> 2 rows selected (0.085 seconds)
>>
>> SHOW GRANT ROLE `readonly` ON DATABASE anup;
>>
>> +---+++-+-+-++---++--+
>> | database  | table  | partition  | column  | principal_name  |
>> principal_type  | privilege  | grant_option  |   grant_time   | grantor  |
>>
>> +---+++-+-+-++---++--+
>> | anup  ||| | readonly|
>> ROLE| SELECT | false | 1537187896000  | hadoop   |
>>
>> +---+++-+-+-++---++--+
>>
>> Regards,
>> Anup Tiwari
>>
>>
>> On Fri, Sep 14, 2018 at 10:50 PM Alan Gates  wrote:
>>
>>> You can see a full list of what grant supports at
>>> https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization#SQLStandardBasedHiveAuthorization-Grant
>>>
>>> There is no "grant x to user on all databases" or regex expressions for
>>> database names.  So you'll have to do the databases one by one.
>>>
>>> External security managers such as Apache Ranger (and I think Apache
>>> Sentry, but I'm not sure) can do blanket policies or default policies.
>>> This has the added advantage that as new databases are created the policies
>>> immediately apply.
>>>
>>> Alan.
>>>
>>> On Thu, Sep 13, 2018 at 10:37 PM Anup Tiwari 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Can someone reply on this?
>>>>
>>>> On Tue, 11 Sep 2018 19:21 Anup Tiwari,  wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I have similar requirement as mentioned in the link Link to question
>>>>> <https://stackoverflow.com/questions/38199021/how-to-grant-all-privileges-for-all-databases-except-one-in-hive-sql>
>>>>> .
>>>>>
>>>>> *Requirement :-*
>>>>>
>>>>> I know how to grant privileges on a database to a role in Hive SQL.
>>>>> For example, GRANT ALL ON DATABASE analyst1 TO ROLE analyst_role;
>>>>> But there are hundreds of databases on my system, it's almost
>>>>> impossible to grant one by one.
>>>>> Is it possible to grant all privileges for all databases ?
>>>>> Also Is it possible to grant all privileges for all databases except
>>>>> one database(ex: db.name = temp)?
>>>>>
>>>>>
>>>>> Regards,
>>>>> Anup Tiwari
>>>>>
>>>>


Re: How to Grant All Privileges for All Databases except one in Hive SQL

2018-09-17 Thread Anup Tiwari
Hi Alan,

I have given SELECT access on a database to a role which is attached to a
user, but even after this, that user is not able to execute select statements
on tables of that database. However, if I provide access at the table level,
then that works. Can you please help me here ?

Hive Version : 2.3.2

Please find below steps :-

1. Added the below configuration in hive-site.xml

  
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>

<property>
  <name>hive.users.in.admin.role</name>
  <value>hadoop</value>
</property>

<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory</value>
</property>

<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>

<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
</property>


2. Restarted Hive Server2.

3. Logged in to hive shell with hadoop user and executed below command
without any error :-

set role admin;
create role readonly;
GRANT ROLE readonly TO USER `user2`;
GRANT SELECT ON DATABASE anup TO ROLE readonly;

4. Logged in to hive shell with user2 and executed below commands :-

select * from anup.t2 limit 5;

*Error :-*
Error: Error while compiling statement: FAILED: HiveAccessControlException
Permission denied: Principal [name=mohan.b, type=USER] does not have
following privileges for operation QUERY [[SELECT] on Object
[type=TABLE_OR_VIEW, name=anup.t2]] (state=42000,code=4)


show current roles;
+---+
|   role|
+---+
| public|
| readonly  |
+---+
2 rows selected (0.085 seconds)

SHOW GRANT ROLE `readonly` ON DATABASE anup;
+-----------+--------+------------+---------+-----------------+-----------------+------------+---------------+----------------+----------+
| database  | table  | partition  | column  | principal_name  | principal_type  | privilege  | grant_option  |   grant_time   | grantor  |
+-----------+--------+------------+---------+-----------------+-----------------+------------+---------------+----------------+----------+
| anup      |        |            |         | readonly        | ROLE            | SELECT     | false         | 1537187896000  | hadoop   |
+-----------+--------+------------+---------+-----------------+-----------------+------------+---------------+----------------+----------+

Regards,
Anup Tiwari


On Fri, Sep 14, 2018 at 10:50 PM Alan Gates  wrote:

> You can see a full list of what grant supports at
> https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization#SQLStandardBasedHiveAuthorization-Grant
>
> There is no "grant x to user on all databases" or regex expressions for
> database names.  So you'll have to do the databases one by one.
>
> External security managers such as Apache Ranger (and I think Apache
> Sentry, but I'm not sure) can do blanket policies or default policies.
> This has the added advantage that as new databases are created the policies
> immediately apply.
>
> Alan.
>
> On Thu, Sep 13, 2018 at 10:37 PM Anup Tiwari 
> wrote:
>
>> Hi,
>>
>> Can someone reply on this?
>>
>> On Tue, 11 Sep 2018 19:21 Anup Tiwari,  wrote:
>>
>>> Hi All,
>>>
>>> I have similar requirement as mentioned in the link Link to question
>>> <https://stackoverflow.com/questions/38199021/how-to-grant-all-privileges-for-all-databases-except-one-in-hive-sql>
>>> .
>>>
>>> *Requirement :-*
>>>
>>> I know how to grant privileges on a database to a role in Hive SQL.
>>> For example, GRANT ALL ON DATABASE analyst1 TO ROLE analyst_role;
>>> But there are hundreds of databases on my system, it's almost impossible
>>> to grant one by one.
>>> Is it possible to grant all privileges for all databases ?
>>> Also Is it possible to grant all privileges for all databases except one
>>> database(ex: db.name = temp)?
>>>
>>>
>>> Regards,
>>> Anup Tiwari
>>>
>>


Re: How to Grant All Privileges for All Databases except one in Hive SQL

2018-09-13 Thread Anup Tiwari
Hi,

Can someone reply on this?

On Tue, 11 Sep 2018 19:21 Anup Tiwari,  wrote:

> Hi All,
>
> I have similar requirement as mentioned in the link Link to question
> <https://stackoverflow.com/questions/38199021/how-to-grant-all-privileges-for-all-databases-except-one-in-hive-sql>
> .
>
> *Requirement :-*
>
> I know how to grant privileges on a database to a role in Hive SQL.
> For example, GRANT ALL ON DATABASE analyst1 TO ROLE analyst_role;
> But there are hundreds of databases on my system, it's almost impossible
> to grant one by one.
> Is it possible to grant all privileges for all databases ?
> Also Is it possible to grant all privileges for all databases except one
> database(ex: db.name = temp)?
>
>
> Regards,
> Anup Tiwari
>


Re: Not able to read Hive ACID table data created by Hive 2.1.1 in hive 2.3.3

2018-09-11 Thread Anup Tiwari
Hi,

I tried option 1 mentioned in the mail trail and it is working, but because of
this there are various access-related tasks which need to be done.

So can anyone reply on points 2 and 3? It will be helpful to understand
other possible solutions.
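
For reference, option 1 essentially amounts to pointing the Hive 2.3.3 hive-site.xml at the existing 2.1.1 metastore database; a sketch is below, where the MySQL host, database name and credentials are placeholders rather than the actual values:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-host:3306/hive_metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>*</value>
</property>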

Regards,
Anup Tiwari


On Fri, Sep 7, 2018 at 10:15 AM Anup Tiwari  wrote:

> Hi Gopal,
>
> Thanks for reply.  I have 3 questions:-
>
> 1. We have our metastore DB for hive 2.1.1 in MySQL so when you say
> the only way it works if you connect to the old metastore.
> Does that mean, if we point hive 2.3.3 to same metastore DB as 2.1.1 ,
> this will work?
>
> 2. Is this feature available in Hive 3.x?
>
> 3. Also can you please suggest best possible work around in such case on
> Hive 2.x ?
>
> On Fri, 7 Sep 2018 05:25 Gopal Vijayaraghavan,  wrote:
>
>> > msck repair table ;
>>
>> msck repair does not work on ACID tables.
>>
>> In Hive 2.x, there is no way to move, replicate or rehydrate ACID tables
>> from a cold store - the only way it works if you connect to the old
>> metastore.
>>
>> Cheers,
>> Gopal
>>
>>
>>


How to Grant All Privileges for All Databases except one in Hive SQL

2018-09-11 Thread Anup Tiwari
Hi All,

I have a similar requirement to the one mentioned in this link: Link to question
<https://stackoverflow.com/questions/38199021/how-to-grant-all-privileges-for-all-databases-except-one-in-hive-sql>.

*Requirement :-*

I know how to grant privileges on a database to a role in Hive SQL.
For example, GRANT ALL ON DATABASE analyst1 TO ROLE analyst_role;
But there are hundreds of databases on my system, so it's almost impossible to
grant them one by one.
Is it possible to grant all privileges for all databases?
Also, is it possible to grant all privileges for all databases except one
database (e.g. db.name = temp)?


Regards,
Anup Tiwari


Re: Not able to read Hive ACID table data created by Hive 2.1.1 in hive 2.3.3

2018-09-06 Thread Anup Tiwari
Hi Gopal,

Thanks for the reply. I have 3 questions :-

1. We have our metastore DB for Hive 2.1.1 in MySQL, so when you say "the
only way it works is if you connect to the old metastore", does that mean
that if we point Hive 2.3.3 to the same metastore DB as 2.1.1, this will work?

2. Is this feature available in Hive 3.x?

3. Also, can you please suggest the best possible workaround for such a case
on Hive 2.x ?

On Fri, 7 Sep 2018 05:25 Gopal Vijayaraghavan,  wrote:

> > msck repair table ;
>
> msck repair does not work on ACID tables.
>
> In Hive 2.x, there is no way to move, replicate or rehydrate ACID tables
> from a cold store - the only way it works if you connect to the old
> metastore.
>
> Cheers,
> Gopal
>
>
>


Not able to read Hive ACID table data created by Hive 2.1.1 in hive 2.3.3

2018-09-06 Thread Anup Tiwari
Hi All,

I have created an ACID managed table in Hive 2.1.1 whose location is on S3.
Now I have created the same table (same DDL) in Hive 2.3.3 pointing to the
same location and successfully executed the below 2 commands, but after this,
when I try to execute a sample select statement, I get an empty result set,
even though the data is present on S3 and I can access it via Hive 2.1.1.

set hive.msck.path.validation=ignore;
msck repair table <table_name>;

Let me know if I have missed anything.

Regards,
Anup Tiwari


Re: repair partition on hive transactional table is not working

2018-09-06 Thread Anup Tiwari
Hi,

I was able to resolve the above issue by setting the below parameter :-

set hive.msck.path.validation=ignore;


Regards,
Anup Tiwari


On Thu, Sep 6, 2018 at 3:48 PM Anup Tiwari  wrote:

> Hi All,
>
> Can anyone look into it?
>
> On Wed, 5 Sep 2018 19:28 Anup Tiwari,  wrote:
>
>> Hi All,
>>
>> I have executed "msck repair table " on my hive ACID table and
>> it printed message that partitions added but when i am querying it; it is
>> giving below error :-
>>
>>
>>
>> Vertex failed, vertexName=Map 1, vertexId=vertex_1536134751043_0020_2_00,
>> diagnostics=[Task failed, taskId=task_1536134751043_0020_2_00_03,
>> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task (
>> failure ) :
>> attempt_1536134751043_0020_2_00_03_0:java.lang.RuntimeException:
>> java.lang.RuntimeException: java.io.IOException:
>> java.io.FileNotFoundException: *Path is not a file:
>> /user/hive/warehouse/fact_t_mp_basic_all_report_bucket/genddate=2017-04-02*
>> at
>> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
>> at
>> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1819)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:692)
>> at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
>> at
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:850)
>> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:793)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489)
>>
>> at
>> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>> at
>> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>> at
>> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>> at
>> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>> at
>> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
>> at
>> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>> at
>> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.RuntimeException: java.io.IOException:
>> java.io.FileNotFoundException: Path is not a file:
>> /user/hive/warehouse/fact_t_mp_basic_all_report_bucket/genddate=2017-04-02
>>
>>
>>
>>
>> I can understand the error, but is it a bug that hive is not able to
>> navigate to final file /data ?
>>
>> Regards,
>> Anup Tiwari
>>
>


Re: repair partition on hive transactional table is not working

2018-09-06 Thread Anup Tiwari
Hi All,

Can anyone look into it?

On Wed, 5 Sep 2018 19:28 Anup Tiwari,  wrote:

> Hi All,
>
> I have executed "msck repair table " on my hive ACID table and
> it printed message that partitions added but when i am querying it; it is
> giving below error :-
>
>
>
> Vertex failed, vertexName=Map 1, vertexId=vertex_1536134751043_0020_2_00,
> diagnostics=[Task failed, taskId=task_1536134751043_0020_2_00_03,
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task (
> failure ) :
> attempt_1536134751043_0020_2_00_03_0:java.lang.RuntimeException:
> java.lang.RuntimeException: java.io.IOException:
> java.io.FileNotFoundException: *Path is not a file:
> /user/hive/warehouse/fact_t_mp_basic_all_report_bucket/genddate=2017-04-02*
> at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
> at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1819)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:692)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:850)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:793)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489)
>
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
> at
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
> at
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
> at
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: java.io.IOException:
> java.io.FileNotFoundException: Path is not a file:
> /user/hive/warehouse/fact_t_mp_basic_all_report_bucket/genddate=2017-04-02
>
>
>
>
> I can understand the error, but is it a bug that hive is not able to
> navigate to final file /data ?
>
> Regards,
> Anup Tiwari
>


repair partition on hive transactional table is not working

2018-09-05 Thread Anup Tiwari
Hi All,

I have executed "msck repair table " on my hive ACID table and
it printed message that partitions added but when i am querying it; it is
giving below error :-



Vertex failed, vertexName=Map 1, vertexId=vertex_1536134751043_0020_2_00,
diagnostics=[Task failed, taskId=task_1536134751043_0020_2_00_03,
diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task (
failure ) :
attempt_1536134751043_0020_2_00_03_0:java.lang.RuntimeException:
java.lang.RuntimeException: java.io.IOException:
java.io.FileNotFoundException: *Path is not a file:
/user/hive/warehouse/fact_t_mp_basic_all_report_bucket/genddate=2017-04-02*
at
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
at
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62)
at
org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1819)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:692)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:850)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:793)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489)

at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
at
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.IOException:
java.io.FileNotFoundException: Path is not a file:
/user/hive/warehouse/fact_t_mp_basic_all_report_bucket/genddate=2017-04-02




I can understand the error, but is it a bug that Hive is not able to
navigate down to the final file/data ?
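
In case it is useful, a sketch of adding the partition explicitly instead of relying on msck; the partition value is taken from the error path above, and whether this actually helps depends on why msck registered the partition the way it did:

ALTER TABLE fact_t_mp_basic_all_report_bucket
  ADD IF NOT EXISTS PARTITION (genddate='2017-04-02');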

Regards,
Anup Tiwari


Re: Problem in reading parquet data from 2 different sources(Hive + Glue) using hive tables

2018-08-30 Thread Anup Tiwari
Hi Thai,

Any links or examples for achieving this? I do not have much of an idea
about it.

On Thu, 30 Aug 2018 20:08 Thai Bui,  wrote:

> Another option is to implement a custom ParquetInputFormat extending the
> current Hive  MR Parquet format and handle schema coersion at the input
> split/record reader level. This would be more involving but guarantee to
> work if you could add auxiliary jars to your Hive cluster.
>
> On Wed, Aug 29, 2018 at 8:06 AM Anup Tiwari 
> wrote:
>
>> Hi All,
>>
>> We have a use case where we have created a partition external table in
>> hive 2.3.3 which is pointing to a parquet location where we have date level
>> folder and on some days parquet was created by hive 2.1.1 and on some days
>> it was created by using glue. Now when we trying to read this data, we are
>> getting below error :-
>>
>> Vertex failed, vertexName=Map 1, vertexId=vertex_1535191533874_0135_2_00,
>> diagnostics=[Task failed, taskId=task_1535191533874_0135_2_00_00,
>> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task (
>> failure ) :
>> attempt_1535191533874_0135_2_00_00_0:java.lang.RuntimeException:
>> java.lang.RuntimeException:
>> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
>> processing row [Error getting row data with exception
>> java.lang.UnsupportedOperationException: Cannot inspect
>> org.apache.hadoop.hive.serde2.io.DateWritable
>> at
>> org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveJavaObject(ParquetStringInspector.java:77)
>> at
>> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:247)
>> at
>> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:366)
>> at
>> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:202)
>> at
>> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:188)
>> at
>> org.apache.hadoop.hive.ql.exec.MapOperator.toErrorMessage(MapOperator.java:588)
>> at
>> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:554)
>> at
>> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86)
>> at
>> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70)
>> at
>> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
>> at
>> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>> at
>> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>> at
>> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>> at
>> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>> at
>> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>> at
>> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>> at
>> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>>  ]
>> at
>> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>> at
>> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>> at
>> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>> at
>> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>> at
>> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>> at
>> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>> at
>> org.apache.tez.runtime.task.TaskRunner2Callable.call

Re: Problem in reading parquet data from 2 different sources(Hive + Glue) using hive tables

2018-08-29 Thread Anup Tiwari
Hi,

> optional int32 action_date (DATE);
> optional binary action_date (UTF8);

Those two column types aren't convertible implicitly between each other,
which is probably the problem.

In the above statement, are you referring to DATE/UTF8 or to int32/binary?
Because I believe a string should be able to handle an integer as well.
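
If I follow the suggestion below correctly, a sketch of it would be the following, assuming the Hive-written and Glue-written files are first separated into different prefixes (the S3 locations are placeholders; the columns and types are the ones from the parquet-tools output):

CREATE EXTERNAL TABLE events_hive (
  action_date    date,
  user_id        bigint,
  tracking_type  string,
  game_number    int)
STORED AS PARQUET
LOCATION 's3a://my-bucket/events/hive_written/';

CREATE EXTERNAL TABLE events_glue (
  action_date    string,
  user_id        bigint,
  tracking_type  string,
  game_number    bigint)
STORED AS PARQUET
LOCATION 's3a://my-bucket/events/glue_written/';

CREATE VIEW events AS
SELECT CAST(action_date AS string) AS action_date, user_id, tracking_type,
       CAST(game_number AS bigint) AS game_number
FROM events_hive
UNION ALL
SELECT action_date, user_id, tracking_type, game_number
FROM events_glue;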

On Wed, 29 Aug 2018 21:49 Gopal Vijayaraghavan,  wrote:

> Hi,
>
> > on some days parquet was created by hive 2.1.1 and on some days it was
> created by using glue
> …
> > After some drill down i saw schema of columns inside both type of
> parquet file using parquet tool and found different data types for some
> column
> ...
> > optional int32 action_date (DATE);
> > optional binary action_date (UTF8);
>
> Those two column types aren't convertible implicitly between each other,
> which is probably the problem.
>
> You'll have to create 2 different external tables and separate the files
> into different directories, create tables with appropriate schema and write
> a view to do UNION ALL.
>
> If your goal is to have good performance for Hive, it is best to write the
> data from Hive & maintain the schema on write.
>
> Cheers,
> Gopal
>
>
>


Problem in reading parquet data from 2 different sources(Hive + Glue) using hive tables

2018-08-29 Thread Anup Tiwari
 for us
as we have data for the last 2-3 years. Please find below a table which gives
the data type in both kinds of files for some columns, along with the data
type in the external table for the same column; the "test result" column
specifies whether I was able to read it or not.

*Parquet made with Hive 2.1.1*         | *Parquet made with AWS Glue*           | *Final Hive Table for reading both of these* | *Test Result*
optional int32 action_date (DATE);     | optional binary action_date (UTF8);    | action_date : string                         | Fail
optional int64 user_id;                | optional int64 user_id;                | user_id : bigint                             | Pass
optional binary tracking_type (UTF8);  | required binary tracking_type (UTF8);  | tracking_type : string                       | Fail
optional int32 game_number;            | optional int64 game_number;            | game_number : bigint                         | Pass

Regards,
Anup Tiwari


Apache Hadoop - Hive - Tez compatibility

2018-08-02 Thread Anup Tiwari
Hi All,

I just wanted to know whether Hadoop 2.9.1 is compatible with Hive 2.3.3
and Tez 0.9.0.

Also, are we maintaining such compatibility info anywhere?


Re: Hive Row number Use case

2018-07-18 Thread Anup Tiwari
Can someone look into this and revert if possible?


Regards,
Anup Tiwari

On Sat, Jul 14, 2018 at 12:28 AM, Anup Tiwari 
wrote:

> Hi All,
>
> Can someone look into this and revert if possible?
>
> Thanks.
>
>
> On Thu, 12 Jul 2018 12:56 Anup Tiwari,  wrote:
>
>> Hi All,
>>
>> We have a use case where we want to assign a row number to a table based
>> on 3 column ( uid, update_date, flag) i.e. if value of any of the column
>> gets changed, we want to reset this number. Please find below sample input
>> data and expected output data.
>>
>> Also please note that we have tried row_number() over(partition by uid,
>> update_date, flag order by update_time asc)  but due to this actual
>> input ordering got break due to I believe partition by clause because it
>> seems partition by creates group within column specified and then it start
>> row number and due to this actual ordering is breaking. So i just wanted to
>> know that is there any function available in hive which can give us below
>> result OR we are missing something in window function?
>>
>>
>> *Input Data :- *
>>
>> *uid* *update_date* *update_time* *flag*
>> 468730 2017-07-12 12/07/2017 22:59:17 1
>> 468730 2017-07-12 12/07/2017 23:02:14 0
>> 468730 2017-07-12 12/07/2017 23:07:40 0
>> 468730 2017-07-12 12/07/2017 23:12:41 0
>> 468730 2017-07-12 12/07/2017 23:22:06 0
>> 468730 2017-07-12 12/07/2017 23:38:35 0
>> 468730 2017-07-12 12/07/2017 23:44:19 0
>> 468730 2017-07-12 12/07/2017 23:47:49 1
>> 468730 2017-07-12 12/07/2017 23:48:49 1
>> 468730 2017-07-12 12/07/2017 23:53:31 0
>> 468730 2017-07-12 12/07/2017 23:57:01 1
>> 468730 2017-07-13 13/07/2017 00:03:10 1
>> 468730 2017-07-13 13/07/2017 00:06:35 0
>> 468730 2017-07-13 13/07/2017 00:07:29 1
>> 468731 2017-07-13 12/07/2017 12:59:17 1
>> 468731 2017-07-13 12/07/2017 13:02:14 0
>> 468731 2017-07-13 12/07/2017 13:07:40 0
>> 468731 2017-07-13 12/07/2017 13:12:41 0
>>
>>
>> *Output Data :-*
>>
>> *uid* *update_date* *update_time* *flag* *required_row_num*
>> 468730 2017-07-12 12/07/2017 22:59:17 1 1
>> 468730 2017-07-12 12/07/2017 23:02:14 0 1
>> 468730 2017-07-12 12/07/2017 23:07:40 0 2
>> 468730 2017-07-12 12/07/2017 23:12:41 0 3
>> 468730 2017-07-12 12/07/2017 23:22:06 0 4
>> 468730 2017-07-12 12/07/2017 23:38:35 0 5
>> 468730 2017-07-12 12/07/2017 23:44:19 0 6
>> 468730 2017-07-12 12/07/2017 23:47:49 1 1
>> 468730 2017-07-12 12/07/2017 23:48:49 1 2
>> 468730 2017-07-12 12/07/2017 23:53:31 0 1
>> 468730 2017-07-12 12/07/2017 23:57:01 1 1
>> 468730 2017-07-13 13/07/2017 00:03:10 1 1
>> 468730 2017-07-13 13/07/2017 00:06:35 0 1
>> 468730 2017-07-13 13/07/2017 00:07:29 1 1
>> 468731 2017-07-13 12/07/2017 12:59:17 1 1
>> 468731 2017-07-13 12/07/2017 13:02:14 0 1
>> 468731 2017-07-13 12/07/2017 13:07:40 0 2
>> 468731 2017-07-13 12/07/2017 13:12:41 0 3
>> *FYI :* We are one Hive 2.3.1.
>>
>


Re: Hive Row number Use case

2018-07-13 Thread Anup Tiwari
Hi All,

Can someone look into this and revert if possible?

Thanks.


On Thu, 12 Jul 2018 12:56 Anup Tiwari,  wrote:

> Hi All,
>
> We have a use case where we want to assign a row number to a table based
> on 3 column ( uid, update_date, flag) i.e. if value of any of the column
> gets changed, we want to reset this number. Please find below sample input
> data and expected output data.
>
> Also please note that we have tried row_number() over(partition by uid,
> update_date, flag order by update_time asc)  but due to this actual input
> ordering got break due to I believe partition by clause because it seems
> partition by creates group within column specified and then it start row
> number and due to this actual ordering is breaking. So i just wanted to
> know that is there any function available in hive which can give us below
> result OR we are missing something in window function?
>
>
> *Input Data :- *
>
> *uid* *update_date* *update_time* *flag*
> 468730 2017-07-12 12/07/2017 22:59:17 1
> 468730 2017-07-12 12/07/2017 23:02:14 0
> 468730 2017-07-12 12/07/2017 23:07:40 0
> 468730 2017-07-12 12/07/2017 23:12:41 0
> 468730 2017-07-12 12/07/2017 23:22:06 0
> 468730 2017-07-12 12/07/2017 23:38:35 0
> 468730 2017-07-12 12/07/2017 23:44:19 0
> 468730 2017-07-12 12/07/2017 23:47:49 1
> 468730 2017-07-12 12/07/2017 23:48:49 1
> 468730 2017-07-12 12/07/2017 23:53:31 0
> 468730 2017-07-12 12/07/2017 23:57:01 1
> 468730 2017-07-13 13/07/2017 00:03:10 1
> 468730 2017-07-13 13/07/2017 00:06:35 0
> 468730 2017-07-13 13/07/2017 00:07:29 1
> 468731 2017-07-13 12/07/2017 12:59:17 1
> 468731 2017-07-13 12/07/2017 13:02:14 0
> 468731 2017-07-13 12/07/2017 13:07:40 0
> 468731 2017-07-13 12/07/2017 13:12:41 0
>
>
> *Output Data :-*
>
> *uid* *update_date* *update_time* *flag* *required_row_num*
> 468730 2017-07-12 12/07/2017 22:59:17 1 1
> 468730 2017-07-12 12/07/2017 23:02:14 0 1
> 468730 2017-07-12 12/07/2017 23:07:40 0 2
> 468730 2017-07-12 12/07/2017 23:12:41 0 3
> 468730 2017-07-12 12/07/2017 23:22:06 0 4
> 468730 2017-07-12 12/07/2017 23:38:35 0 5
> 468730 2017-07-12 12/07/2017 23:44:19 0 6
> 468730 2017-07-12 12/07/2017 23:47:49 1 1
> 468730 2017-07-12 12/07/2017 23:48:49 1 2
> 468730 2017-07-12 12/07/2017 23:53:31 0 1
> 468730 2017-07-12 12/07/2017 23:57:01 1 1
> 468730 2017-07-13 13/07/2017 00:03:10 1 1
> 468730 2017-07-13 13/07/2017 00:06:35 0 1
> 468730 2017-07-13 13/07/2017 00:07:29 1 1
> 468731 2017-07-13 12/07/2017 12:59:17 1 1
> 468731 2017-07-13 12/07/2017 13:02:14 0 1
> 468731 2017-07-13 12/07/2017 13:07:40 0 2
> 468731 2017-07-13 12/07/2017 13:12:41 0 3
> *FYI :* We are one Hive 2.3.1.
>


Hive Row number Use case

2018-07-12 Thread Anup Tiwari
Hi All,

We have a use case where we want to assign a row number to a table based on
3 columns (uid, update_date, flag), i.e. if the value of any of these columns
changes, we want to reset this number. Please find below sample input data
and the expected output data.

Also please note that we have tried row_number() over(partition by uid,
update_date, flag order by update_time asc), but this breaks the actual input
ordering, I believe because of the partition by clause: it seems partition by
creates a group within the specified columns and then starts the row number,
and due to this the actual ordering is breaking. So I just wanted to know
whether there is any function available in Hive which can give us the below
result, or whether we are missing something in the window function?


*Input Data :- *

*uid* *update_date* *update_time* *flag*
468730 2017-07-12 12/07/2017 22:59:17 1
468730 2017-07-12 12/07/2017 23:02:14 0
468730 2017-07-12 12/07/2017 23:07:40 0
468730 2017-07-12 12/07/2017 23:12:41 0
468730 2017-07-12 12/07/2017 23:22:06 0
468730 2017-07-12 12/07/2017 23:38:35 0
468730 2017-07-12 12/07/2017 23:44:19 0
468730 2017-07-12 12/07/2017 23:47:49 1
468730 2017-07-12 12/07/2017 23:48:49 1
468730 2017-07-12 12/07/2017 23:53:31 0
468730 2017-07-12 12/07/2017 23:57:01 1
468730 2017-07-13 13/07/2017 00:03:10 1
468730 2017-07-13 13/07/2017 00:06:35 0
468730 2017-07-13 13/07/2017 00:07:29 1
468731 2017-07-13 12/07/2017 12:59:17 1
468731 2017-07-13 12/07/2017 13:02:14 0
468731 2017-07-13 12/07/2017 13:07:40 0
468731 2017-07-13 12/07/2017 13:12:41 0


*Output Data :-*

*uid* *update_date* *update_time* *flag* *required_row_num*
468730 2017-07-12 12/07/2017 22:59:17 1 1
468730 2017-07-12 12/07/2017 23:02:14 0 1
468730 2017-07-12 12/07/2017 23:07:40 0 2
468730 2017-07-12 12/07/2017 23:12:41 0 3
468730 2017-07-12 12/07/2017 23:22:06 0 4
468730 2017-07-12 12/07/2017 23:38:35 0 5
468730 2017-07-12 12/07/2017 23:44:19 0 6
468730 2017-07-12 12/07/2017 23:47:49 1 1
468730 2017-07-12 12/07/2017 23:48:49 1 2
468730 2017-07-12 12/07/2017 23:53:31 0 1
468730 2017-07-12 12/07/2017 23:57:01 1 1
468730 2017-07-13 13/07/2017 00:03:10 1 1
468730 2017-07-13 13/07/2017 00:06:35 0 1
468730 2017-07-13 13/07/2017 00:07:29 1 1
468731 2017-07-13 12/07/2017 12:59:17 1 1
468731 2017-07-13 12/07/2017 13:02:14 0 1
468731 2017-07-13 12/07/2017 13:07:40 0 2
468731 2017-07-13 12/07/2017 13:12:41 0 3
*FYI :* We are on Hive 2.3.1.
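
A sketch of one gaps-and-islands style rewrite that should produce required_row_num as above; the table name mydb.user_events is a placeholder, the column names are the ones from the sample data, and update_time is assumed to sort correctly as stored:

WITH marked AS (
  SELECT uid, update_date, update_time, flag,
         CASE WHEN lag(flag) OVER (PARTITION BY uid, update_date ORDER BY update_time) = flag
              THEN 0 ELSE 1 END AS grp_start   -- 1 whenever the flag changes (or on the first row)
  FROM mydb.user_events
),
grouped AS (
  SELECT uid, update_date, update_time, flag,
         sum(grp_start) OVER (PARTITION BY uid, update_date ORDER BY update_time
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp_id   -- running group id
  FROM marked
)
SELECT uid, update_date, update_time, flag,
       row_number() OVER (PARTITION BY uid, update_date, grp_id ORDER BY update_time) AS required_row_num
FROM grouped;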


Ways to reduce launching time of query in Hive 2.2.1

2018-04-16 Thread Anup Tiwari
Hi All,

We have a use case where we need to return output in < 10 sec. We have
evaluated a different set of tools for execution and they work fine, but they
do not cover all cases and they are not reliable (since they are in an
evolving phase). Hive, however, works well in this context.

Using Hive LLAP, we have reduced query time to 6-7 sec. But query launching
takes ~12-15 sec, due to which the response time becomes 18-21 sec.

Is there any way we can reduce this launching time?

Please note that we have tried prewarm containers, but when we launch a query
from the hive client it does not pick containers from the already initialized
ones; rather it launches its own.

Please let me know how we can overcome this issue, since this is the only
problem which is stopping us from using Hive. Any links/description would be
really appreciated.
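
For reference, the settings usually involved in cutting query start-up time look roughly like the below; the values are illustrative rather than from this setup, and pre-initialized Tez/LLAP sessions are held by HiveServer2, so they generally benefit beeline/JDBC connections rather than the standalone hive CLI:

<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
<property>
  <name>hive.llap.execution.mode</name>
  <value>all</value>
</property>
<property>
  <!-- keep Tez AM sessions warm inside HiveServer2 so queries skip AM launch -->
  <name>hive.server2.tez.initialize.default.sessions</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.tez.sessions.per.default.queue</name>
  <value>2</value>
</property>
<property>
  <name>hive.prewarm.enabled</name>
  <value>true</value>
</property>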


Regards,
Anup Tiwari


Re: Unable to read hive external table data which is linked to s3 after upgradation from 2.1.1 to 2.3.3

2018-04-12 Thread Anup Tiwari
looping dev in for suggestions.

Regards,
Anup Tiwari

On Thu, Apr 12, 2018 at 8:39 PM, Richard A. Bross <r...@oaktreepeak.com>
wrote:

> To be honest, my advice was just based on your post - we only use Hive in
> AWS EMR, so I couldn't tell you.
>
> Glad that at least you're back up though.
>
> - Original Message -
> From: "Anup Tiwari" <anupsdtiw...@gmail.com>
> To: user@hive.apache.org
> Sent: Thursday, April 12, 2018 10:50:23 AM
> Subject: Re: Unable to read hive external table data which is linked to s3
> after upgradation from 2.1.1 to 2.3.3
>
>
>
>
>
>
>
>
> Hi Richard ,
>
> After looking at hive 2.3.3 logs i found that when we are loading all
> configuration parameters then below message is getting printed :-
>
> Values omitted for security reason if present: [fs.s3n.awsAccessKeyId,
> fs.s3a.access.key, fs.s3.awsAccessKeyId, hive.server2.keystore.password, f
> s.s3a.proxy.password, javax.jdo.option.ConnectionPassword,
> fs.s3.awsSecretAccessKey, fs.s3n.awsSecretAccessKey, fs.s3a.secret.key]
>
>
> while in hive 2.1.1 logs i found below message :-
>
> Values omitted for security reason if present: 
> [hive.server2.keystore.password,
> javax.jdo.option.ConnectionPassword]
>
> Can this be the reason why hive 2.3.3 is not able to read s3 related
> params from hive-site.xml?
>
> I found one of JIRA : https://issues.apache.org/jira/browse/HIVE-14588
>
>
>
> Also i have set below property in hive-site.xml by excluding above s3
> variables(fs.s3a.access.key,fs.s3a.secret.key) from list and it worked.
>
> 
> hive.conf.hidden.list
> javax.jdo.option.ConnectionPassword,hive.
> server2.keystore.password,fs.s3.awsAccessKeyId,fs.s3.
> awsSecretAccessKey,fs.s3n.awsAccessKeyId,fs.s3n.awsSecretAccessKey,fs.s3a.
> proxy.password
> Comma separated list of configuration options which should
> not be read by normal user like passwords.Anup has excluded 2 variable
> which were fs.s3a.access.key,fs.s3a.secret.key
> 
>
> Let me know if there is any other solution because i think if these
> variables are by default part of hidden.list then there will be some other
> proper workaround for this.
>
>
>
>
>
>
>
>
>
> Regards,
> Anup Tiwari
>
>
> On Thu, Apr 12, 2018 at 7:44 PM, Richard A. Bross < r...@oaktreepeak.com >
> wrote:
>
>
> I hear you, but given the exception log, it does seem that it can't
> authenticate you. You can try using the AWS environment variables. If that
> resolves the issue then you'll have some more to go on. According to
> Hortonworks here:
>
> https://hortonworks.github.io/hdp-aws/s3-security/
>
> "AWS CLI supports authentication through environment variables. These same
> environment variables will be used by Hadoop if no configuration properties
> are set."
>
>
> - Original Message -
> From: "Anup Tiwari" < anupsdtiw...@gmail.com >
> To: user@hive.apache.org
> Sent: Thursday, April 12, 2018 10:06:33 AM
> Subject: Re: Unable to read hive external table data which is linked to s3
> after upgradation from 2.1.1 to 2.3.3
>
>
> We are not using EMR. Also we have set below params for accessing s3
> bucket in hive-site.xml which are same as what we have set in hive 2.1.1.
>
>
> * fs.s3a.access.key
> * fs.s3a.secret.key
> * fs.s3a.connection.maximum
> * fs.s3a.impl
>
>
>
>
>
>
>
>
> Regards,
> Anup Tiwari
>
>
> On Thu, Apr 12, 2018 at 7:19 PM, Richard A. Bross < r...@oaktreepeak.com >
> wrote:
>
>
> Based on the exception, it looks more like an AWS credentials issue than a
> Hive issue. Are you running in AWS EMR, on-prem?
>
> In AWS the resource accessing the S3 bucket would have to have an IAM that
> gave permission. If you are running somewhere else whatever AWS login you
> use would have to have the correct permissions in the IAM.
>
>
>
> - Original Message -
> From: "Anup Tiwari" < anupsdtiw...@gmail.com >
> To: user@hive.apache.org
> Sent: Thursday, April 12, 2018 9:11:37 AM
> Subject: Unable to read hive external table data which is linked to s3
> after upgradation from 2.1.1 to 2.3.3
>
>
>
> Hi All,
>
> When i am trying to read s3 linked external table in 2.3.3 ; i am getting
> errors.It was working properly in 2.1.1. please find below details and let
> me know if i am missing something :-
>
>
> Hadoop Version :- 2.8.0
>
> Query :-
>
> select log_date,count(1) as cnt from test.tt1 group by log_date;
>
> Error :-
>
> Vertex failed, vertexName=Map 1, vertexId=vertex_1523502631429_0029_3_00,
> diagnostics=[Vertex vertex_1523502631429_0029_3_00

Re: Unable to read hive external table data which is linked to s3 after upgradation from 2.1.1 to 2.3.3

2018-04-12 Thread Anup Tiwari
Hi Richard,

After looking at the Hive *2.3.3* logs, I found that when all configuration
parameters are loaded, the below message is printed :-

Values omitted for security reason if present: [fs.s3n.awsAccessKeyId,
fs.s3a.access.key, fs.s3.awsAccessKeyId, hive.server2.keystore.password,
fs.s3a.proxy.password, javax.jdo.option.ConnectionPassword,
fs.s3.awsSecretAccessKey, fs.s3n.awsSecretAccessKey, fs.s3a.secret.key]


while in the Hive *2.1.1* logs I found the below message :-

Values omitted for security reason if present:
[hive.server2.keystore.password, javax.jdo.option.ConnectionPassword]

Can this be the reason why Hive 2.3.3 is not able to read the S3-related
params from hive-site.xml?

I found one related JIRA : https://issues.apache.org/jira/browse/HIVE-14588



Also, I have set the below property in hive-site.xml, excluding the above S3
variables (fs.s3a.access.key, fs.s3a.secret.key) from the list, and it worked.


<property>
  <name>hive.conf.hidden.list</name>
  <value>javax.jdo.option.ConnectionPassword,hive.server2.keystore.password,fs.s3.awsAccessKeyId,fs.s3.awsSecretAccessKey,fs.s3n.awsAccessKeyId,fs.s3n.awsSecretAccessKey,fs.s3a.proxy.password</value>
  <description>Comma separated list of configuration options which should not be read by normal user like passwords. Anup has excluded 2 variables which were fs.s3a.access.key,fs.s3a.secret.key</description>
</property>


Let me know if there is any other solution, because if these variables are
part of the hidden list by default, I think there should be some other, more
proper workaround for this.



Regards,
Anup Tiwari

On Thu, Apr 12, 2018 at 7:44 PM, Richard A. Bross <r...@oaktreepeak.com>
wrote:

> I hear you, but given the exception log, it does seem that it can't
> authenticate you.  You can try using the AWS environment variables.  If
> that resolves the issue then you'll have some more to go on.  According to
> Hortonworks here:
>
> https://hortonworks.github.io/hdp-aws/s3-security/
>
> "AWS CLI supports authentication through environment variables. These same
> environment variables will be used by Hadoop if no configuration properties
> are set."
>
>
> - Original Message -
> From: "Anup Tiwari" <anupsdtiw...@gmail.com>
> To: user@hive.apache.org
> Sent: Thursday, April 12, 2018 10:06:33 AM
> Subject: Re: Unable to read hive external table data which is linked to s3
> after upgradation from 2.1.1 to 2.3.3
>
>
> We are not using EMR. Also we have set below params for accessing s3
> bucket in hive-site.xml which are same as what we have set in hive 2.1.1.
>
>
> * fs.s3a.access.key
> * fs.s3a.secret.key
> * fs.s3a.connection.maximum
> * fs.s3a.impl
>
>
>
>
>
>
> Regards,
> Anup Tiwari
>
>
> On Thu, Apr 12, 2018 at 7:19 PM, Richard A. Bross < r...@oaktreepeak.com >
> wrote:
>
>
> Based on the exception, it looks more like an AWS credentials issue than a
> Hive issue. Are you running in AWS EMR, on-prem?
>
> In AWS the resource accessing the S3 bucket would have to have an IAM that
> gave permission. If you are running somewhere else whatever AWS login you
> use would have to have the correct permissions in the IAM.
>
>
>
> - Original Message -
> From: "Anup Tiwari" < anupsdtiw...@gmail.com >
> To: user@hive.apache.org
> Sent: Thursday, April 12, 2018 9:11:37 AM
> Subject: Unable to read hive external table data which is linked to s3
> after upgradation from 2.1.1 to 2.3.3
>
>
>
> Hi All,
>
> When i am trying to read s3 linked external table in 2.3.3 ; i am getting
> errors.It was working properly in 2.1.1. please find below details and let
> me know if i am missing something :-
>
>
> Hadoop Version :- 2.8.0
>
> Query :-
>
> select log_date,count(1) as cnt from test.tt1 group by log_date;
>
> Error :-
>
> Vertex failed, vertexName=Map 1, vertexId=vertex_1523502631429_0029_3_00,
> diagnostics=[Vertex vertex_1523502631429_0029_3_00 [Map 1] killed/failed
> due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: tt1 initializer failed,
> vertex=vertex_1523502631429_0029_3_00 [Map 1], 
> org.apache.hadoop.fs.s3a.AWSClientIOException:
> doesBucketExist on g24x7.new-analytics: com.amazonaws.AmazonClientException:
> No AWS Credentials provided by BasicAWSCredentialsProvider
> EnvironmentVariableCredentialsProvider 
> SharedInstanceProfileCredentialsProvider
> : com.amazonaws.AmazonClientException: The requested metadata is not
> found at http://16x.xxx.xxx.xx4/latest/meta-data/iam/security-credentials/
> : No AWS Credentials provided by BasicAWSCredentialsProvider
> EnvironmentVariableCredentialsProvider 
> SharedInstanceProfileCredentialsProvider
> : com.amazonaws.AmazonClientException: The requested metadata is not
> found at http://16x.xxx.xxx.xx4/latest/meta-data/iam/security-credentia

Re: Unable to read hive external table data which is linked to s3 after upgradation from 2.1.1 to 2.3.3

2018-04-12 Thread Anup Tiwari
We are not using EMR. Also, we have set the params below in hive-site.xml for
accessing the S3 bucket; they are the same as what we had set in Hive 2.1.1.


   - fs.s3a.access.key
   - fs.s3a.secret.key
   - fs.s3a.connection.maximum
   - fs.s3a.impl
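
(For reference, a minimal sketch of how these typically look in hive-site.xml;
the key/secret values and the connection limit below are placeholders, not our
real settings:)

<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3a.connection.maximum</name>
  <value>100</value>
</property>
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>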


Regards,
Anup Tiwari

On Thu, Apr 12, 2018 at 7:19 PM, Richard A. Bross <r...@oaktreepeak.com>
wrote:

> Based on the exception, it looks more like an AWS credentials issue than a
> Hive issue.  Are you running in AWS EMR, on-prem?
>
> In AWS the resource accessing the S3 bucket would have to have an IAM that
> gave permission.  If you are running somewhere else whatever AWS login you
> use would have to have the correct permissions in the IAM.
>
> - Original Message -
> From: "Anup Tiwari" <anupsdtiw...@gmail.com>
> To: user@hive.apache.org
> Sent: Thursday, April 12, 2018 9:11:37 AM
> Subject: Unable to read hive external table data which is linked to s3
> after upgradation from 2.1.1 to 2.3.3
>
>
>
> Hi All,
>
> When i am trying to read s3 linked external table in 2.3.3 ; i am getting
> errors.It was working properly in 2.1.1. please find below details and let
> me know if i am missing something :-
>
>
> Hadoop Version :- 2.8.0
>
> Query :-
>
> select log_date,count(1) as cnt from test.tt1 group by log_date;
>
> Error :-
>
> Vertex failed, vertexName=Map 1, vertexId=vertex_1523502631429_0029_3_00,
> diagnostics=[Vertex vertex_1523502631429_0029_3_00 [Map 1] killed/failed
> due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: tt1 initializer failed,
> vertex=vertex_1523502631429_0029_3_00 [Map 1], 
> org.apache.hadoop.fs.s3a.AWSClientIOException:
> doesBucketExist on g24x7.new-analytics: com.amazonaws.AmazonClientException:
> No AWS Credentials provided by BasicAWSCredentialsProvider
> EnvironmentVariableCredentialsProvider 
> SharedInstanceProfileCredentialsProvider
> : com.amazonaws.AmazonClientException: The requested metadata is not
> found at http://16x.xxx.xxx.xx4/latest/meta-data/iam/security-credentials/
> : No AWS Credentials provided by BasicAWSCredentialsProvider
> EnvironmentVariableCredentialsProvider 
> SharedInstanceProfileCredentialsProvider
> : com.amazonaws.AmazonClientException: The requested metadata is not
> found at http://16x.xxx.xxx.xx4/latest/meta-data/iam/security-credentials/
> at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:128)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(
> S3AFileSystem.java:288)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(
> S3AFileSystem.java:236)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2811)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
> at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(
> FileInputFormat.java:265)
> at org.apache.hadoop.mapred.FileInputFormat.listStatus(
> FileInputFormat.java:236)
> at org.apache.hadoop.mapred.FileInputFormat.getSplits(
> FileInputFormat.java:322)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(
> HiveInputFormat.java:442)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(
> HiveInputFormat.java:561)
> at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.
> initialize(HiveSplitGenerator.java:196)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$
> InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$
> InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1807)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$
> InputInitializerCallable.call(RootInputInitializerManager.java:269)
> at org.apache.tez.dag.app.dag.RootInputInitializerManager$
> InputInitializerCallable.call(RootInputInitializerManager.java:253)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: com.amazonaws.AmazonClientException: No AWS Credentials
> provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider
> Shared

Loading data from one swift container into external table pointing to another container fails

2018-02-02 Thread anup ahire
Hello,


Loading data from one Swift container into an external table that points to
another container is failing for me.


hive> load data inpath 'swift://mycontainer.default/source_test' into table my_test;
Loading data to table default.my_test
Failed with exception copyFiles: error while moving files!!! Cannot move swift://mycontainer.default/source_test/AK.TXT to swift://test.default/my_test/AK.TXT
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask
hive>
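
(A sketch of one possible workaround, under two assumptions: the staging table
definition below is illustrative, and copying the rows -- rather than
physically moving the files between containers -- is acceptable. It sidesteps
the cross-filesystem move by letting Hive write fresh files under the target
table's own location:)

-- hypothetical staging table with my_test's definition, but pointing at the source directory
CREATE EXTERNAL TABLE my_test_staging LIKE my_test
LOCATION 'swift://mycontainer.default/source_test';

-- copies the rows; Hive writes new files into the target container
INSERT INTO TABLE my_test SELECT * FROM my_test_staging;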



Here is complete stacktrace

2018-02-02 21:27:28,309 ERROR [main]: exec.Task
(SessionState.java:printError(962)) - Failed with exception copyFiles:
error while moving files!!! Cannot move
swift://mycontainer.default/source_test/AK.TXT
to swift://test.default/my_test/AK.TXT
org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error while
moving files!!! Cannot move swift://mycontainer.default/source_test/AK.TXT
to swift://test.default/my_test/AK.TXT
   at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2784)
   at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1664)
   at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
   at
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1116)
   at
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:216)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:168)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:379)
   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:739)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:498)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException:Cannot move
swift://mycontainer.default/source_test/AK.TXT
to swift://test.default/my_test/AK.TXT
   at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2777)
   ... 21 more

hive version = 1.2.1

Any help is appreciated.

Thanks


insert on condition

2016-12-27 Thread anup ahire
Hello All,

Is it possible to achieve "insert overwrite into a partition only if it does
not exist"? The partition column is a string date.
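
For a static partition spec, Hive's INSERT OVERWRITE syntax does accept an IF
NOT EXISTS clause, which skips the write when the partition already exists. A
minimal sketch -- the table and column names are made up:

-- the overwrite is skipped if partition dt='2016-12-27' already exists in target_table
INSERT OVERWRITE TABLE target_table PARTITION (dt = '2016-12-27') IF NOT EXISTS
SELECT col1, col2
FROM source_table
WHERE dt = '2016-12-27';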

Thanks,
Anup


issue with hive jdbc

2016-09-23 Thread anup ahire
Hello,

I am getting this exception when my query finishes, and it results in job
failure.

java.sql.SQLException: org.apache.http.NoHttpResponseException: The target
server failed to respond
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)

Any help is appreciated !!


THanks !!!
Anup


Re: need help with hive session getting killed

2016-09-20 Thread Anup
I am doing an insert overwrite, so the results of the query are written to an
external table stored as Parquet.

Which properties should I set, and can I set them in the query?
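
(For reference, a sketch of the kind of server-side settings Jörn may be
referring to; they live in hive-site.xml on the HiveServer2 side rather than
in a per-query SET, and the values below are assumptions, not recommendations:)

<property>
  <name>hive.server2.idle.operation.timeout</name>
  <!-- milliseconds; example value only (2 hours) -->
  <value>7200000</value>
</property>
<property>
  <name>hive.server2.idle.session.timeout</name>
  <!-- milliseconds; example value only (4 hours) -->
  <value>14400000</value>
</property>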

Thanks 


> On Sep 20, 2016, at 8:27 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> 
> Increase timeout or let the result of the query be written in a dedicated 
> table.
> 
>> On 20 Sep 2016, at 16:57, anup ahire <ahirea...@gmail.com> wrote:
>> 
>> 
>> 
>> 
>> Hello,
>> 
>> I am using hive-jdbc-1.2.1 to run a query. Query runs around an hour and 
>> eventually completes.
>> But my hive session terminats before I can get results from completed query 
>> and job fails.
>> java.sql.SQLException: org.apache.http.NoHttpResponseException: The target 
>> server failed to respond
>> 
>> at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296) 
>> ~[hive-jdbc-1.2.1.jar!/:1.2.1]
>> 
>> What is the best way to address this ?
>> 
>> 
>> 
>> Thanks ,
>> 
>> Anup
>> 
>>