Re: Anyone successfully deployed Hive on TEZ engine?

2016-05-30 Thread Damien Carol
HIVE 1.2.1 with Tez 0.5.2 or 0.7.0 works pretty well.

We are beginning to use HIVE 2.0.0 with Tez 0.8.x, but it is not stable yet :/
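
For anyone trying the same thing, the engine is selected per session with a standard Hive setting (a minimal sketch; it assumes the Tez client libraries are already installed and published to HDFS):

SET hive.execution.engine=tez;
-- and back to plain MapReduce if Tez misbehaves:
SET hive.execution.engine=mr;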

2016-05-29 22:26 GMT+02:00 Mich Talebzadeh :

>
> Please bear in mind that I am talking about your own build, not anything
> that comes as part of a vendor's package.
>
> If so kindly specify both Hive and TEZ versions.
>
> Thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>


Re: last stats time on table columns

2016-06-16 Thread Damien Carol
ANALYZE TABLE  COMPUTE STATISTICS => changes the stats for the table,
and DESC FORMATTED should show it.
ANALYZE TABLE  COMPUTE STATISTICS FOR COLUMNS => changes the stats
for the columns, but NOT for the table.

That's it.
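
For example (a minimal sketch; the table name sales and the column amount are hypothetical):

ANALYZE TABLE sales COMPUTE STATISTICS;              -- table-level stats
DESC FORMATTED sales;                                -- shows numRows, totalSize, etc.
ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS;  -- column-level stats only
DESC FORMATTED sales amount;                         -- shows min/max/NDV for that column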

2016-06-16 21:10 GMT+02:00 Ashok Kumar :

> Greetings gurus,
>
> When I use
>
> ANALYZE TABLE  COMPUTE STATISTICS for COLUMNS,
>
> Where can I get the last stats time?
>
> DESC FORMATTED  does not show it
>
> thanking you
>


Re: De-identification_in Hive

2016-03-18 Thread Damien Carol
For the record, see this ticket:
https://issues.apache.org/jira/browse/HIVE-13125
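
In the meantime, a minimal sketch of masking at load time with plain string functions (the target table xyz_masked and the surrounding columns are hypothetical; xyz and the ssn column come from this thread, and regexp_replace is a standard Hive UDF):

INSERT INTO TABLE xyz_masked
SELECT name,
       regexp_replace(ssn, '[0-9]', '*') AS ssn,  -- every digit becomes '*'
       city
FROM xyz;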

2016-03-17 17:02 GMT+01:00 Ajay Chander :

> Thanks for your time Mich! I will try this one out.
>
>
> On Thursday, March 17, 2016, Mich Talebzadeh 
> wrote:
>
>> Then probably the easiest option would be an INSERT/SELECT from the external
>> table to the target table that sets the column to NULL.
>>
>> Check the VAT column below, which I made NULL:
>>
>> DROP TABLE IF EXISTS stg_t2;
>> CREATE EXTERNAL TABLE stg_t2 (
>>  INVOICENUMBER string
>> ,PAYMENTDATE string
>> ,NET string
>> ,VAT string
>> ,TOTAL string
>> )
>> COMMENT 'from csv file from excel sheet '
>> ROW FORMAT serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
>> STORED AS TEXTFILE
>> LOCATION '/data/stg/table2'
>> TBLPROPERTIES ("skip.header.line.count"="1")
>> ;
>> --3)
>> DROP TABLE IF EXISTS t2;
>> CREATE TABLE t2 (
>>  INVOICENUMBER  INT
>> ,PAYMENTDATEtimestamp
>> ,NETDECIMAL(20,2)
>> ,VATDECIMAL(20,2)
>> ,TOTAL  DECIMAL(20,2)
>> )
>> COMMENT 'from csv file from excel sheet '
>> CLUSTERED BY (INVOICENUMBER) INTO 256 BUCKETS
>> STORED AS ORC
>> TBLPROPERTIES ( "orc.compress"="ZLIB",
>> "transactional"="true")
>> ;
>> --4) Put data in target table. do the conversion and ignore empty rows
>> INSERT INTO TABLE t2
>> SELECT
>>   INVOICENUMBER
>> , CAST(UNIX_TIMESTAMP(paymentdate,'DD/MM/')*1000 as timestamp)
>> , CAST(REGEXP_REPLACE(net,'[^\\d\\.]','') AS DECIMAL(20,2))
>> , NULL
>> , CAST(REGEXP_REPLACE(total,'[^\\d\\.]','') AS DECIMAL(20,2))
>> FROM
>> stg_t2
>> WHERE
>> --INVOICENUMBER > 0 AND
>> CAST(REGEXP_REPLACE(total,'[^\\d\\.]','') AS DECIMAL(20,2)) > 0.0
>> -- Exclude empty rows
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 17 March 2016 at 15:32, Ajay Chander  wrote:
>>
>>> Mich, I am okay with replacing the column's data with some characters
>>> like asterisks. Thanks
>>>
>>>
>>> On Thursday, March 17, 2016, Mich Talebzadeh 
>>> wrote:
>>>
 Hi Ajay,

 Do you want to be able to unmask it (at any time) or just have it
 totally scrambled (for example replace the column with random characters)
 in Hive?

 Dr Mich Talebzadeh



 LinkedIn:
 https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



 http://talebzadehmich.wordpress.com



 On 17 March 2016 at 15:14, Ajay Chander  wrote:

> Mich, thanks for looking into this. I have a 'csvfile.txt' on HDFS. I
> have created an external table 'xyz' to load that data into it. One of the
> columns, 'ssn', needs to be masked. Is there any built-in function
> available that I could use?
>
>
> On Thursday, March 17, 2016, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Are you loading your CSV file from an external table into a Hive table?
>>
>> Basically, you want to scramble that column before putting it into the Hive
>> table?
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 17 March 2016 at 14:37, Ajay Chander 
>> wrote:
>>
>>> Tustin, is there any way I can de-identify it in Hive?
>>>
>>>
>>> On Thursday, March 17, 2016, Marcin Tustin 
>>> wrote:
>>>
 This is a classic transform-load problem. You'll want to anonymise
 it once before making it available for analysis.

 On Thursday, March 17, 2016, Ajay Chander 
 wrote:

> Hi Everyone,
>
> I have a csv file which has some sensitive data in a particular
> column in it. Now I have to create a table in Hive and load the data into
> it. But when loading the data I have to make sure that the data is masked.
> Is there any built-in function which supports this, or do I have to
> write a UDF? Any suggestions are appreciated. Thanks


Problem with HBase

2016-03-14 Thread Damien Carol
I get this error sometimes:

{noformat}
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 3, vertexId=vertex_1457964631631_0013_1_03, diagnostics=[Task failed, taskId=task_1457964631631_0013_1_03_22, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task: attempt_1457964631631_0013_1_03_22_0:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: Must specify table name
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:195)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:160)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:351)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:59)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:59)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:36)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: Must specify table name
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createHiveOutputFormat(FileSinkOperator.java:1128)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:354)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:355)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:504)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:457)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:365)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:504)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:457)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:365)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:169)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
        ... 14 more
Caused by: java.lang.IllegalArgumentException: Must specify table name
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:188)
        at org.apache.hive.common.util.ReflectionUtil.setConf(ReflectionUtil.java:101)
        at org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:87)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:300)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:290)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createHiveOutputFormat(FileSinkOperator.java:1126)
        ... 24 more
], TaskAttempt 1 failed, info=[Error: Failure while running task: attempt_1457964631631_0013_1_03_22_1:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: Must specify table name
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:195)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:160)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:351)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:59)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:59)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:36)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at

Re: Hive system catalog

2016-05-19 Thread Damien Carol
1) If your metastore uses an RDBMS backend, you can query it
programmatically.

2) If you want a more "command line" approach, try the HCatalog CLI tool.

3) If you want to stay in Hive, you can use beeline (Hive has a few metadata
commands like SHOW DATABASES).
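
For example (a minimal sketch; option 1 assumes a MySQL-backed metastore with the standard schema table names):

-- option 1: SQL against the metastore backend
SELECT d.NAME AS db_name, t.TBL_NAME
FROM DBS d
JOIN TBLS t ON t.DB_ID = d.DB_ID;

-- option 3: from beeline, including the built-in function list asked about below
SHOW DATABASES;
SHOW FUNCTIONS;
DESCRIBE FUNCTION EXTENDED concat;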

Hope it helps,
Damien

2016-05-19 11:01 GMT+02:00 Mich Talebzadeh :

> The Hive 2 metastore with concurrency capability has 194 tables, 127 views
> and 38 relationships for a metastore created on Oracle 12c.
>
> I have created an Entity-Relationship diagram but need to decide in what
> format to post it.
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 19 May 2016 at 04:51, Mich Talebzadeh 
> wrote:
>
>> Hi Braj,
>>
>> Any GUI or OS-level tool can log in and see the schema created for Hive.
>>
>> For example, my metadata for Hive is on Oracle and I can use SQL Developer
>> Data Modeler to create a logical model from the physical model.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 19 May 2016 at 03:34, brajmohan saxena 
>> wrote:
>>
>>> Hi,
>>>
>>> Could anybody please help me find out how to get at the Hive system
>>> catalog tables?
>>> Also, I am looking for a catalog table which can show me the list of all
>>> of Hive's built-in functions.
>>>
>>> Thanks in advance.
>>>
>>> Regards
>>> Braj
>>>
>>
>>
>


Re: Does HIVE JDBC return same sequence of records?

2016-07-04 Thread Damien Carol
There is no guarantee.

Use an ORDER BY and a LIMIT if you want some sort of fixed result set.
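
For example (a sketch; the table and column names are hypothetical, and ORDER BY is used because it gives a total ordering across reducers):

SELECT *
FROM t
WHERE t.partition_col = 'value'
ORDER BY t.id
LIMIT 100;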

2016-07-04 12:28 GMT+02:00 Igor Kuzmenko :

> If I run the query "SELECT * FROM table t WHERE t.partition = value" with
> Hive JDBC several times, is there a guarantee that when I iterate through
> the result set I get records in the same order every time?
> Intuitively, it feels like yes, because in that query there's no MapReduce
> and Hive just reads data from the HDFS directory.
>


Re: Hive LLAP daemon failing with Tez

2017-02-17 Thread Damien Carol
For the record,

We had the same issue at my company. We use Ambari and HDP. Ambari in HDP
2.5.3 doesn't set some configuration keys properly.

My advice for HDP users: check the XML configuration files that Ambari
generates.
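
A sketch of the kind of keys worth verifying in the generated hive-site.xml (the property names are the ones Gopal mentions below; the values here are illustrative only):

<property>
  <name>hive.llap.daemon.num.executors</name>
  <value>10</value>
</property>
<property>
  <name>hive.llap.daemon.memory.per.instance.mb</name>
  <value>5000</value>
</property>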

Regards,
Damien

2017-02-16 22:32 GMT+01:00 Mistry, Jigar :

> Hi Gopal,
>
> Thanks for your response. It was indeed a configuration issue. I was able
> to make it work by using something like this:
>
> hive --service llap --name llap_test\
>  --instances 1\
>  --cache 2000m\
>  --executors 10\
>  --iothreads 10\
>  --size 5000m\
>  --xmx 2000m\
>  --loglevel WARN\
>  --args "-XX:+UseG1GC -XX:+ResizeTLAB -XX:+UseNUMA  -XX:-ResizePLAB"\
>  --javaHome $JAVA_HOME
>
> Thanks again for pointing me in the correct direction.
>
> Regards,
> Jigar
>
> On 2/16/17, 12:23 PM, "Gopal Vijayaraghavan" (on behalf of gop...@apache.org) wrote:
>
>
> > at org.apache.hadoop.hive.llap.registry.impl.LlapZookeeperRegistryImpl$DynamicServiceInstance.getResource(LlapZookeeperRegistryImpl.java:474)
>
> Most likely incorrect configuration.
>
> hive.llap.daemon.num.executors
> or
> hive.llap.daemon.memory.per.instance.mb
>
> Are you using something like this?
>
> https://github.com/t3rmin4t0r/tez-autobuild/blob/llap/slider-gen.sh
>
> Cheers,
> Gopal
>
>
>
>
>
>


Re: on duplicate update equivalent?

2016-09-24 Thread Damien Carol
Another solution is to use Hive over HBase.
When you insert into such a table, Hive does an upsert.
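
A minimal sketch (the Hive table, the HBase table name and the column family are all hypothetical):

CREATE TABLE dim_hbase (key STRING, val STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:val')
TBLPROPERTIES ('hbase.table.name' = 'dim');

-- inserting an existing row key overwrites the HBase cell, i.e. an upsert
INSERT INTO TABLE dim_hbase VALUES ('k1', 'v2');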




2016-09-23 21:00 GMT+02:00 Mich Talebzadeh :

> The fundamental question is: do you want these recurring updates to
> dimension tables throttling your Hive tables?
>
> Besides, why bother with ETL when one can do ELT?
>
> For the dimension table just add two additional columns, namely
>
>, op_type int
>, op_time timestamp
>
> op_type = 1/2/3 (INSERT/UPDATE/DELETE) and op_time = the timestamp of the
> operation on the original table. New records will be appended to the
> dimension table, so you have the full Entity Life History (one INSERT,
> multiple UPDATEs and one DELETE) for a given primary key. Then you can do
> whatever you want, plus of course a full audit of every record (for example
> what happened to every trade, who changed what, etc.).
>
> In your join with the FACT table you will need to use analytic functions to
> find the last entry for a given primary key (ignoring deletes), or just use
> standard HQL.
>
> If you are going to bring in Hbase etc to it, then Spark solution that I
> suggested earlier on may serve better.
>
> HTH
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 23 September 2016 at 19:36, Gopal Vijayaraghavan 
> wrote:
>
>> > Dimensions change, and I'd rather do update than recreate a snapshot.
>>
>> Slowly changing dimensions are the common use case for Hive's ACID MERGE.
>>
>> The feature you need is most likely covered by
>>
>> https://issues.apache.org/jira/browse/HIVE-10924
>>
>> 2nd comment from that JIRA
>>
>> "Once an hour, a set of inserts and updates (up to 500k rows) for various
>> dimension tables (eg. customer, inventory, stores) needs to be processed.
>> The dimension tables have primary keys and are typically bucketed and
>> sorted on those keys."
>>
>> Any other approach would need a full snapshot re-materialization, because
>> ACID can generate DELETE + INSERT instead of rewriting the original file
>> for a 2% upsert.
>>
>> If you do not have any isolation concerns (as in, a query doing a read
>> when 50% of your update has applied), using HBase backed dimension tables
>> in Hive is possible, but it does not offer the same transactional
>> consistency as the ACID merge will.
>>
>> Cheers,
>> Gopal
>>
>>
>>
>


Re: incosistent result from orc select count(*)

2016-09-18 Thread Damien Carol
Could you please get the result of this query:

EXPLAIN EXTENDED SELECT count(*) FROM table WHERE id = <...>
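
For reference, the stats shortcut mentioned below can be toggled like this (worth confirming it is really off when comparing the two queries):

SET hive.compute.query.using.stats=false;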

2016-09-17 4:15 GMT+02:00 Sanjeev Verma :

> Hi
>
> On a hive-1.2.1 ORC-backed table I am running the query select * from table
> where id=some and it is returning me some 40 rows, but when I run select
> count(*) from table where id= then
> it returns 14. I tried to disable compute query using stats, but no
> luck.
>
> Could you please help me find the issue? Is there any caching involved
> here which is giving me bad results?
>
> Thanks
>


Re: ELB Log processing

2016-09-20 Thread Damien Carol
See the UDF parse_url_tuple:

SELECT b.*
FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY',
'QUERY:id') b AS host, path, query, query_id LIMIT 1;


https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-parse_url_tuple

2016-09-20 11:22 GMT+02:00 Manish Rangari :

> Guys,
>
> I want to get the fields of the ELB logs. A sample ELB log is given below,
> and I am using the create table definition below. It is working fine and I
> am getting what I wanted, but now I want the query-string parameters as
> well, for example eid, tid, aid. Can anyone help me with how I can match
> them as well?
>
> NOTE: The position of aid, eid, tid is not fixed and it may change.
>
> 2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.168.1.5:80 0.21
> 0.000596 0.2 200 200 0 43 "GET https://site1.example.com:443/peek?
> *eid=aw123=fskc235n=2ADSFGSDG* HTTP/1.1" "Mozilla/5.0 (Windows NT
> 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85
> Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
>
>
> CREATE TABLE elblog (
> Request_date STRING,
>   ELBName STRING,
>   RequestIP STRING,
>   RequestPort INT,
>   BackendIP STRING,
>   BackendPort INT,
>   RequestProcessingTime DOUBLE,
>   BackendProcessingTime DOUBLE,
>   ClientResponseTime DOUBLE,
>   ELBResponseCode STRING,
>   BackendResponseCode STRING,
>   ReceivedBytes BIGINT,
>   SentBytes BIGINT,
>   RequestVerb STRING,
>   URL STRING,
>   Protocol STRING,
> Useragent STRING,
> ssl_cipher STRING,
> ssl_protocol STRING
> )
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
> WITH SERDEPROPERTIES (
>   "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
> )
> STORED AS TEXTFILE;
>


Re: ELB Log processing

2016-09-20 Thread Damien Carol
The best way to do that, IMHO, is a view.
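
For example (a minimal sketch built on the elblog table from this thread; the view name is hypothetical):

CREATE VIEW elblog_parsed AS
SELECT e.*, p.eid, p.tid, p.aid
FROM elblog e
LATERAL VIEW parse_url_tuple(e.url, 'QUERY:eid', 'QUERY:tid', 'QUERY:aid') p
  AS eid, tid, aid;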

2016-09-20 12:14 GMT+02:00 Manish Rangari <linuxtricksfordev...@gmail.com>:

> Thanks for the reply, Damien. The suggestion you gave is really useful.
> Currently I am achieving my desired output by performing the steps below,
> but I want to achieve the desired result in one step instead of two. Is
> there any way I can get the aid, did, etc. in the create table statement?
> If not, I will have to look at the option that you mentioned.
>
> 1.
> CREATE TABLE elblog (
> Request_date STRING,
>   ELBName STRING,
>   RequestIP STRING,
>   RequestPort INT,
>   BackendIP STRING,
>   BackendPort INT,
>   RequestProcessingTime DOUBLE,
>   BackendProcessingTime DOUBLE,
>   ClientResponseTime DOUBLE,
>   ELBResponseCode STRING,
>   BackendResponseCode STRING,
>   ReceivedBytes BIGINT,
>   SentBytes BIGINT,
>   RequestVerb STRING,
>   URL STRING,
>   Protocol STRING,
> Useragent STRING,
> ssl_cipher STRING,
> ssl_protocol STRING
> )
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
> WITH SERDEPROPERTIES (
>   "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
> )
> STORED AS TEXTFILE;
>
> 2.
> create table elb_raw_log as select request_date, elbname, requestip,
> requestport, backendip, backendport, requestprocessingtime,
> backendprocessingtime, clientresponsetime, elbresponsecode,
> backendresponsecode, receivedbytes, sentbytes, requestverb, url,
> regexp_extract(url, '.*aid=([a-zA-Z0-9]+).*', 1) as aid,
> regexp_extract(url, '.*tid=([a-zA-Z0-9]+).*', 1) as tid,
> regexp_extract(url, '.*eid=([a-zA-Z0-9]+).*', 1) as eid,
> regexp_extract(url, '.*did=([a-zA-Z0-9]+).*', 1) as did, protocol,
> useragent, ssl_cipher, ssl_protocol from elblog;
>
> On Tue, Sep 20, 2016 at 3:12 PM, Damien Carol <damien.ca...@gmail.com>
> wrote:
>
>> see the udf
>> *parse_url_tuple*
>> SELECT b.*
>> FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY',
>> 'QUERY:id') b as host, path, query, query_id LIMIT 1;
>>
>>
>> https://cwiki.apache.org/confluence/display/Hive/LanguageMan
>> ual+UDF#LanguageManualUDF-parse_url_tuple
>>
>> 2016-09-20 11:22 GMT+02:00 Manish Rangari <linuxtricksfordev...@gmail.com
>> >:
>>
>>> Guys,
>>>
>>> I want to get the fields of the ELB logs. A sample ELB log is given below,
>>> and I am using the create table definition below. It is working fine and I
>>> am getting what I wanted, but now I want the query-string parameters as
>>> well, for example eid, tid, aid. Can anyone help me with how I can match
>>> them as well?
>>>
>>> NOTE: The position of aid, eid, tid is not fixed and it may change.
>>>
>>> 2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.168.1.5:80
>>> 0.21 0.000596 0.2 200 200 0 43 "GET
>>> https://site1.example.com:443/peek?
>>> eid=aw123=fskc235n=2ADSFGSDG HTTP/1.1" "Mozilla/5.0 (Windows
>>> NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85
>>> Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
>>>
>>>
>>> CREATE TABLE elblog (
>>> Request_date STRING,
>>>   ELBName STRING,
>>>   RequestIP STRING,
>>>   RequestPort INT,
>>>   BackendIP STRING,
>>>   BackendPort INT,
>>>   RequestProcessingTime DOUBLE,
>>>   BackendProcessingTime DOUBLE,
>>>   ClientResponseTime DOUBLE,
>>>   ELBResponseCode STRING,
>>>   BackendResponseCode STRING,
>>>   ReceivedBytes BIGINT,
>>>   SentBytes BIGINT,
>>>   RequestVerb STRING,
>>>   URL STRING,
>>>   Protocol STRING,
>>> Useragent STRING,
>>> ssl_cipher STRING,
>>> ssl_protocol STRING
>>> )
>>> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
>>> WITH SERDEPROPERTIES (
>>>   "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
>>> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
>>> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
>>> )
>>> STORED AS TEXTFILE;
>>>
>>
>>
>


Re: Connect metadata

2016-10-25 Thread Damien Carol
You could use CTAS in Presto.
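
For example (a sketch; the hive catalog and default schema names are assumptions about your Presto setup):

CREATE TABLE hive.default.events AS
SELECT 1 AS id, 'first' AS label;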

2016-10-25 9:09 GMT+02:00 Rajendra Bhat :

> Hi Team,
>
> I have configured only the metastore and started the metastore service,
> which I am using with Presto.
>
> I need to create a table in the metastore. How can I create it? I have
> not started the HiveServer service, because Hadoop is not installed on my
> system.
>
> --
> Thanks and
> Regards
>
> Rajendra Bhat
>


Re: [ANNOUNCE] New committer for Apache Hive: Laszlo Vegh

2023-02-08 Thread Damien Carol
 Congratulations, Laszlo!

Regards,
Damien

Le mer. 8 févr. 2023 à 11:11, Stamatis Zampetakis  a
écrit :

> Congratulations Laszlo!
>
> ACID and compactions are a complex beast and the slightest problem there
> can have a huge impact on the system.
> Many thanks for all your work in this area that makes the life of the rest
> of us much easier.
>
> Best,
> Stamatis
>
> On Wed, Feb 8, 2023 at 9:46 AM Akshat m  wrote:
>
>> Congratulations Laszlo, Very well deserved :)
>>
>> Regards,
>> Akshat Mathur
>>
>> On Tue, Feb 7, 2023 at 9:08 PM Sai Hemanth Gantasala
>>  wrote:
>>
>>> Congratulations Laszlo Vegh, Great work on the compaction stuff!!
>>>
>>> Thanks,
>>> Sai.
>>>
>>> On Tue, Feb 7, 2023 at 4:24 AM Naveen Gangam 
>>> wrote:
>>>
>>> > The Project Management Committee (PMC) for Apache Hive has invited
>>> Laszlo
>>> > Vegh (veghlaci05) to become a committer and we are pleased
>>> > to announce that he has accepted.
>>> >
>>> > Contributions from Laszlo:
>>> >
>>> > He has authored 25 patches. Significant contributions to stabilization
>>> of
>>> > ACID compaction. Helped review other patches as well.
>>> >
>>> >
>>> >
>>> https://github.com/apache/hive/pulls?q=is%3Amerged+is%3Apr+author%3Aveghlaci05
>>> >
>>> > Being a committer enables easier contribution to the project since
>>> there
>>> > is no need to go via the patch submission process. This should enable
>>> > better productivity. A PMC member helps manage and guide the direction
>>> of
>>> > the project.
>>> >
>>> > Congratulations
>>> > Hive PMC
>>> >
>>>
>>