Re: HDFS small files to Sequence file using Hive

2016-09-23 Thread Gopal Vijayaraghavan
> Is there a way to create an external table on a directory, extract 'key' as > file name and 'value' as file content and write to a sequence file table? Do you care that it is a sequence file? The HDFS HAR format was invented for this particular problem, check if the "hadoop archive" command
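The "hadoop archive" command Gopal refers to packs many small files into a single HAR file that HDFS and Hive can still read through. A minimal sketch (all paths here are illustrative, not from the thread):

```sh
# Pack the small files under /data/small into one archive named small.har,
# written to /data/archived (syntax: -archiveName <name> -p <parent> <src> <dest>).
hadoop archive -archiveName small.har -p /data small /data/archived

# Hive can then be pointed at the archive through the har:// scheme, e.g.:
#   ALTER TABLE t SET LOCATION 'har:///data/archived/small.har';
```

This reduces NameNode pressure from many small files without converting the data to a sequence file.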

issue with hive jdbc

2016-09-23 Thread anup ahire
Hello, I am getting this exception when my query finishes, which results in job failure. java.sql.SQLException: org.apache.http.NoHttpResponseException: The target server failed to respond at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296) Any help is appreciated! Thanks


RE: HDFS small files to Sequence file using Hive

2016-09-23 Thread Markovitz, Dudu
Hi I’m not sure how this will solve the issue you mentioned, but just for the fun of it – here is the code. Dudu set textinputformat.record.delimiter='\0'; set hive.mapred.supports.subdirectories=true; set mapred.input.dir.recursive=true; create external table if not exists files_ext
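Dudu's snippet is truncated in the digest, but the trick it sets up is clear: with the record delimiter set to '\0', each input file is read as a single row, and Hive's virtual column INPUT__FILE__NAME supplies the file name. A hedged completion of such a pipeline might look like this (table and column names are guesses, not Dudu's actual code):

```sql
-- Read each small file as one record.
set textinputformat.record.delimiter='\0';
set hive.mapred.supports.subdirectories=true;
set mapred.input.dir.recursive=true;

create external table if not exists files_ext (content string)
location '/data/small_files';

-- INPUT__FILE__NAME is a Hive virtual column holding each row's source file path.
create table files_seq stored as sequencefile as
select INPUT__FILE__NAME as key, content as value
from files_ext;
```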

Re: on duplicate update equivalent?

2016-09-23 Thread Mich Talebzadeh
The fundamental question is: do you need these recurring updates to dimension tables throttling your Hive tables? Besides, why bother with ETL when one can do ELT? For the dimension table, just add two additional columns, namely op_type int, op_time timestamp. op_type = 1/2/3

Re: on duplicate update equivalent?

2016-09-23 Thread Gopal Vijayaraghavan
> Dimensions change, and I'd rather do update than recreate a snapshot. Slow changing dimensions are the common use-case for Hive's ACID MERGE. The feature you need is most likely covered by https://issues.apache.org/jira/browse/HIVE-10924 2nd comment from that JIRA "Once an hour, a set of
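HIVE-10924, linked above, tracks the MERGE statement that eventually landed in Hive 2.2 for transactional (ACID) tables. A sketch of the syntax it delivers, with illustrative table and column names:

```sql
-- Requires a transactional (ACID) target table; Hive 2.2+ syntax.
merge into dim_customer t
using staging_updates s
on t.customer_id = s.customer_id
when matched then
  update set name = s.name, city = s.city
when not matched then
  insert values (s.customer_id, s.name, s.city);
```

This is the direct Hive equivalent of an upsert: matched keys are updated in place, unmatched keys are inserted.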

Encrypted communication in WebHCat

2016-09-23 Thread Алина Абрамова
Hi, Do we have plans to add SSL to connection client - REST service in WebHCat? Thanks, Alina

HDFS small files to Sequence file using Hive

2016-09-23 Thread Arun Patel
I'm trying to resolve small files issue using Hive. Is there a way to create an external table on a directory, extract 'key' as file name and 'value' as file content and write to a sequence file table? Or any other better option in Hive? Thank you Arun

Re: on duplicate update equivalent?

2016-09-23 Thread Mich Talebzadeh
Hi Vijay, If dimensional tables are reasonable size and frequently updated, then you can deploy *Spark SQL* to get data directly from your MySQL table through JDBC and do your join with your fact table stored in Hive. In general these days one can do better with Spark SQL. Your fact table still
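The approach Mich describes can be sketched in Spark SQL itself, which can register a JDBC-backed table as a temporary view and join it against a Hive fact table (connection details below are illustrative assumptions):

```sql
-- Spark SQL: expose the MySQL dimension table through JDBC, then join it
-- with the fact table stored in Hive.
CREATE TEMPORARY VIEW dim_mysql
USING org.apache.spark.sql.jdbc
OPTIONS (
  url      'jdbc:mysql://mysql-host:3306/dwh',
  dbtable  'dim_customer',
  user     'etl',
  password '...'
);

SELECT f.*, d.name
FROM fact_sales f
JOIN dim_mysql d ON f.customer_id = d.customer_id;
```

Because the dimension rows come straight from MySQL at query time, there is no need to replay updates into Hive at all.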

RE: on duplicate update equivalent?

2016-09-23 Thread Vijay Ramachandran
Dimensions change, and I'd rather do update than recreate a snapshot. On 23-Sep-2016 17:23, "Markovitz, Dudu" wrote: > If these are dimension tables, what do you need to update there? > > > > Dudu > > > > *From:* Vijay Ramachandran [mailto:vi...@linkedin.com] > *Sent:*

RE: on duplicate update equivalent?

2016-09-23 Thread Markovitz, Dudu
If these are dimension tables, what do you need to update there? Dudu From: Vijay Ramachandran [mailto:vi...@linkedin.com] Sent: Friday, September 23, 2016 1:46 PM To: user@hive.apache.org Subject: Re: on duplicate update equivalent? On Fri, Sep 23, 2016 at 3:47 PM, Mich Talebzadeh

Re: on duplicate update equivalent?

2016-09-23 Thread Vijay Ramachandran
On Fri, Sep 23, 2016 at 3:47 PM, Mich Talebzadeh wrote: > What is the use case for UPSERT in Hive? The functionality does not exist > but there are other solutions. > > Are we talking about a set of dimension tables with primary keys that need > to be updated (existing

RE: on duplicate update equivalent?

2016-09-23 Thread Markovitz, Dudu
You may however use a code similar to the following. The main idea is to work with 2 target tables. Instead of merging the source table into a target table, we create an additional target table based of the merge results. A view is pointing all the time to the most updated target table. Dudu
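Dudu's two-target-table idea can be sketched as follows: rather than updating in place, each merge round writes a fresh table from a full outer join of the current target and the source, and a view is repointed at the newest table. All names below are illustrative, not Dudu's actual code:

```sql
-- Round 1: merge src into tgt_1, producing tgt_2 (source row wins on a key match).
create table tgt_2 as
select coalesce(s.id,  t.id)  as id,
       coalesce(s.val, t.val) as val
from tgt_1 t
full outer join src s
  on t.id = s.id;

-- Repoint the stable name at the newest result; readers always query the view.
alter view current_tgt as select * from tgt_2;
```

The next round merges into tgt_1 again and the view flips back, so readers never see a half-merged table.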

Re: on duplicate update equivalent?

2016-09-23 Thread Mich Talebzadeh
Hi Vijay, What is the use case for UPSERT in Hive? The functionality does not exist but there are other solutions. Are we talking about a set of dimension tables with primary keys that need to be updated (existing rows) or inserted (new rows)? HTH Dr Mich Talebzadeh LinkedIn *

RE: on duplicate update equivalent?

2016-09-23 Thread Markovitz, Dudu
We’re not there yet… https://issues.apache.org/jira/browse/HIVE-10924 Dudu From: Vijay Ramachandran [mailto:vi...@linkedin.com] Sent: Friday, September 23, 2016 11:47 AM To: user@hive.apache.org Subject: on duplicate update equivalent? Hello. Is there a way to write a query with a behaviour

on duplicate update equivalent?

2016-09-23 Thread Vijay Ramachandran
Hello. Is there a way to write a query with a behaviour equivalent to mysql's "on duplicate update"? i.e., try to insert, and if key exists, update the row instead? thanks,
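For reference, the MySQL behaviour the question is about looks like this:

```sql
-- MySQL: insert the row, or update the existing one when the primary key collides.
INSERT INTO counters (id, hits)
VALUES (42, 1)
ON DUPLICATE KEY UPDATE hits = hits + 1;
```

Hive (as of this thread) has no single-statement equivalent, which is what the replies above work around.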

Re: iso 8601 to utc with timezone conversion

2016-09-23 Thread Manish R
Yes Sekine, I am talking about AWS ELB logs in the Mumbai region. Let me try implementing what Andres suggested; I am also on the verge of implementing some other solution as well. I will let you all know once any of the solutions works. On Sep 23, 2016 1:11 PM, "Sékine Coulibaly"

Re: iso 8601 to utc with timezone conversion

2016-09-23 Thread Sékine Coulibaly
Manish, UTC is not a format (but ISO 8601 is). Consider UTC as the "+00:00" (or "Z") at the end of an ISO 8601 time. E.g. 2016-01-01T23:45:22.943762+00:00 is strictly equivalent to: 2016-01-01T23:45:22.943762Z and is also strictly equivalent to the same time expressed in another timezone such as
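In Hive, one way to normalize such ISO 8601 timestamps is to parse them to epoch seconds with unix_timestamp(), which honors the zone offset. Hive's patterns follow Java's SimpleDateFormat, where 'X' matches an ISO 8601 offset such as Z or +05:30 (a sketch, assuming Hive 1.2+; the sample value is illustrative):

```sql
-- Parse an ISO 8601 string with a zone offset into epoch seconds.
-- The offset is folded in, so the result is timezone-independent.
select unix_timestamp('2016-01-01T23:45:22+05:30',
                      "yyyy-MM-dd'T'HH:mm:ssX");
```

Note that from_unixtime() renders epoch seconds in the cluster's configured timezone, so to display UTC explicitly one would combine it with to_utc_timestamp() rather than rely on the default rendering.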