Re: Hive SQL extension

2020-10-27 Thread Jesus Camacho Rodriguez
Hi Peter,

Thanks for bringing this up.

Why are you targeting the 'partition by spec' syntax? Is it for convenience?
Was it already introduced by Iceberg?

I did not understand the reasoning for not introducing the new syntax in
Hive. As Stamatis already mentioned, there is some advantage to
doing syntax validation through the usual flow.
If the question is whether we could make it useful beyond Iceberg format
and there is no attachment to the syntax above, could this be generalized
to introducing generated virtual column declarations in Hive (many RDBMSs
support these) + using the current partitioning declaration? For instance,
for your DDL declaration above:

create table iceberg_test(
level string,
event_time timestamp,
message string,
register_time date,
telephone array 
)
partitioned by (
v_level [GENERATED ALWAYS] AS level,
v_event_time [GENERATED ALWAYS] AS event_time,
v_event_time_hour [GENERATED ALWAYS] AS hour(event_time),
v_register_time [GENERATED ALWAYS] AS day(register_time)
)
stored as iceberg;

This would assume that the underlying storage format supports partitioning
by virtual columns. I believe this syntax would allow us to take some of
these ideas even further, e.g., introduce 'stored' derived columns or
custom partitioning specs (although without the underlying storage format
support, probably they would need to be 'stored' instead of 'virtual').

Even if you introduce this syntax, you could still do the transformation
that you described above internally, i.e., storage handler resolution and
table properties generation. Thus, the internal handling would be the same.
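
To make that concrete, here is a rough sketch (the property name and its format
are assumptions for illustration, not the actual Iceberg integration contract)
of what the DDL above might desugar into after storage handler resolution and
table properties generation:

create table iceberg_test(
  level string,
  event_time timestamp,
  message string,
  register_time date,
  telephone array<string>   -- element type assumed; elided in the message above
)
stored by 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'  -- handler class provided by the Iceberg integration
tblproperties (
  -- hypothetical property carrying the resolved partition transforms
  'iceberg.partition.spec'='level, hour(event_time), day(register_time)'
);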

Thanks,
Jesús

On Mon, Oct 26, 2020 at 2:27 AM Stamatis Zampetakis 
wrote:

> I do like extensions and things that simplify our life when
> writing queries.
>
> Regarding the partitioning syntax for Iceberg, there may be better
> alternatives.
> I was also leaning towards a syntax like the one proposed by Jesus (in
> another thread) based on virtual columns, which is also part of the SQL
> standard.
>
> Regarding the other use cases mentioned (temporal queries, time travel
> etc.), there are things that are part of the SQL standard, so we could start from
> there and then introduce extensions if needed.
>
> Syntactic sugar is powerful but in terms of design I find it more
> appropriate to perform "desugaring" after having an AST; either AST to AST
> transformations or afterwards.
> The syntax (sugared or not) is part and responsibility of the parser so an
> architecture with sub-parser hooks seems a bit brittle, especially if we
> start using it extensively.
> Having said that, you have thought about this much more than I have, so maybe
> the hook approach is a better idea after all :)
>
> Best,
> Stamatis
>
> On Fri, Oct 23, 2020 at 2:26 PM Pau Tallada  wrote:
>
>> Hi all,
>>
>> I do not know if that may be of interest to you, but there are other
>> projects that could benefit from this.
>> For instance, ADQL
>> 
>> (Astronomical Data Query Language) is a SQL-like language that defines some
>> higher-level functions that enable powerful geospatial queries. Projects
>> like queryparser  are able
>> to translate from ADQL to vendor-SQL for MySQL or PostgreSQL. In this case,
>> the syntactic sugar is implemented as an external layer on top, but could
>> very well be implemented in a rewrite hook if available.
>>
>> Cheers,
>>
>> Pau.
>>
>> Message from Peter Vary on Thu., Oct 22, 2020 at 16:21:
>>
>>>
>>> Let's assume that this feature would be useful for Iceberg tables, but
>>> useless and even problematic/forbidden for other tables. :)
>>>
>>> My thinking is that it could make Hive much more user friendly if we
>>> allowed for extensions in the language.
>>>
>>> With Iceberg integration we plan to do several extensions which might
>>> not be useful for other tables. Some examples:
>>>
>>>- When creating tables we want to send additional information to the
>>>storage layer, and pushing everything in properties is a pain (not really
>>>user friendly)
>>>- We would like to allow querying table history for iceberg tables
>>>(previous snapshotId-s, timestamps, etc)
>>>- We would like to allow time travel for iceberg tables based on the
>>>data queried above
>>>- We would like to allow the user to see / manage / remove old
>>>snapshots
>>>
>>>
>>> These are all very Iceberg-specific features, and most probably will not
>>> work / be useful for any other type of table, so I think adding them to the
>>> Hive parser would be a stretch.
>>>
>>> On the other hand, if we do not provide a SQL interface for accessing these
>>> features, then users will turn to Spark/Impala/Presto to be able to work
>>> with Iceberg tables.
>>>
>>> As for your specific question for handling syntax errors (I 

Re: Consuming delta from Hive tables

2019-05-06 Thread Jesus Camacho Rodriguez
Hi Bhargav,

We solve a similar problem for incremental maintenance for materialized views.

row__id.writeid can be used for that scenario indeed. You just need to know
the current snapshot of the system at reading time. Then you just need to add
a filter operator on top of that table, making explicit the data contained in
it. The filter will roughly take the form ROW__ID.writeid <= high_watermark
AND ROW__ID.writeid NOT IN (open/invalid_ids). Information about how "writeid"
is generated can be found in https://jira.apache.org/jira/browse/HIVE-18192 .
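
A minimal sketch of such a delta read (assuming an ACID table named 'orders';
the write id boundaries and the open/invalid ids are placeholders the consumer
would capture from its previous and current snapshots):

SELECT *
FROM orders
WHERE ROW__ID.writeid >  42   -- high watermark recorded at the previous read
  AND ROW__ID.writeid <= 57   -- high watermark at the current read
  AND ROW__ID.writeid NOT IN (45, 51);  -- write ids still open/invalid in the current snapshot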

Note that when source tables are not append-only and update/delete record
operations may have been executed over them, the problem becomes trickier since
currently there is no way to retrieve update/delete records from the delta
files (contributions would be welcome).

Cheers,
Jesús


On Mon, May 6, 2019 at 6:23 AM Bhargav Bipinchandra Naik (Seller
Platform-BLR)  wrote:

> We have a scenario where we want to consume only delta updates from Hive
> tables.
> - Multiple producers are updating data in the Hive table
> - Multiple consumers are reading data from the Hive table
>
> Consumption pattern:
> - Get all data that has been updated since last time I read.
>
> Is there any mechanism in Hive 3.0 which can enable the above consumption
> pattern?
>
> I see there is a construct of row__id(writeid, bucketid, rowid) in ACID
> tables.
> - Can row__id be used in this scenario?
> - How is the "writeid" generated?
> - Is there some meta information which captures the time when the rows
> were actually visible for read?
>


Re: Creating temp tables in select statements

2019-03-28 Thread Jesus Camacho Rodriguez
Depending on the version you are using, table + values syntax is supported.
https://issues.apache.org/jira/browse/HIVE-18416

SELECT a, b FROM TABLE(VALUES(1,2),(3,4)) AS x(a,b);
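
Applied to the "id in <list of integers>" case from the original question, a
sketch (my_table and the values are just illustrative) could be:

SELECT t.*
FROM my_table t
WHERE t.id IN (SELECT a FROM TABLE(VALUES(1),(2),(3)) AS x(a));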

-Jesús


From: David Lavati 
Reply-To: "user@hive.apache.org" 
Date: Thursday, March 28, 2019 at 4:44 AM
To: "user@hive.apache.org" 
Subject: Re: Creating temp tables in select statements

Hi Mainak,

For select queries the only way I know of for multiple records is by using
union:

0: jdbc:hive2://localhost:1> with x as (select 1 num union select 2 union 
select 3) select * from x;
+--------+
| x.num  |
+--------+
| 1      |
| 2      |
| 3      |
+--------+

For table insertion you can use a syntax somewhat similar to VALUES 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingvaluesintotablesfromSQL
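
For completeness, a minimal sketch of that insert-based route (the temporary
table and its values are just illustrative):

CREATE TEMPORARY TABLE ids (id INT);
INSERT INTO TABLE ids VALUES (1), (2), (3);
SELECT * FROM ids;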

Kind Regards,
David


On Wed, Mar 27, 2019 at 12:40 AM Mainak Ghosh <mgh...@twitter.com> wrote:
Hello,

We want to create temp tables at a select query level. For example:

with x as (1, 2, 3) select * from x;

Or

Select * from table where id in <list of integers>; Here the list of integers is an
input and can change.

Currently the Postgres VALUES syntax is not supported in Hive. Are there any easy
workarounds which do not involve explicitly creating temporary tables and
can be specified at the select query level?

Thanks and Regards,
Mainak


--
David Lavati | Software Engineer

t. (+3620) 951-7468

cloudera.com



Re: CURRENT_TIMESTAMP timezone in hive 3.1

2018-12-13 Thread Jesus Camacho Rodriguez
Hi Andrey,

Thanks for reporting this. This is a bug introduced by the JIRA you reference,
since CURRENT_TIMESTAMP should not use UTC in every case. I have created
https://issues.apache.org/jira/browse/HIVE-21039 to fix the issue and discuss
the desired behavior.
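
As a possible interim workaround (a sketch only; 'Europe/Moscow' is just an
example zone id, replace it with the zone you expect), the UTC value can be
shifted explicitly:

select from_utc_timestamp(current_timestamp, 'Europe/Moscow');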

-Jesús

From: Andrey Zinovyev 
Reply-To: "user@hive.apache.org" 
Date: Wednesday, December 12, 2018 at 11:57 AM
To: "user@hive.apache.org" 
Subject: CURRENT_TIMESTAMP timezone in hive 3.1

We're upgrading from Hive 1.2 to 3.1 and it seems like the new Hive returns 
CURRENT_TIMESTAMP in the UTC timezone. But before it was in the local (system's 
default) timezone.

According to HIVE-5472 current_timestamp should use user's local timezone. This 
behaviour was changed in HIVE-12192  (if I got it right). 
GenericUDFCurrentTimestamp now explicitly uses UTC as timezone to initialise 
org.apache.hadoop.hive.common.type.Timestamp

Is this an intentional change or some misconfiguration on our side?

For example
Old hive:
hive> select current_timestamp;
OK
2018-12-12 22:43:39.024

New hive:
> select current_timestamp;
+--------------------------+
|           _c0            |
+--------------------------+
| 2018-12-12 19:43:57.024  |
+--------------------------+


[CVE-2017-12625] Apache Hive information disclosure vulnerability for column masking

2017-10-31 Thread Jesus Camacho Rodriguez
CVE-2017-12625: Apache Hive information disclosure vulnerability for column 
masking

Severity: Important

Vendor: The Apache Software Foundation

Versions Affected: Hive 2.1.0 to 2.3.0

Description:
Hive exposes an interface through which masking policies can be defined on 
tables or
views, e.g., using Apache Ranger. When a view is created over a given table, the
policy enforcement does not happen correctly on the table for masked columns.

Mitigation:
2.3.0 users should upgrade to 2.3.1
2.2.0 users should upgrade to 2.3.1, obtain the latest source from git for 
branch-2.2
or apply this patch which will be included from 2.2.1
https://git1-us-west.apache.org/repos/asf?p=hive.git;a=commit;h=0e795debddf261b0ac6ace90e2d774f9a99b7f4b
2.1.x users should upgrade to 2.3.1, obtain the latest source from git for 
branch-2.1
or apply this patch which will be included from 2.1.2
https://git1-us-west.apache.org/repos/asf?p=hive.git;a=commit;h=6db9fd6e43f6eef3c9d1ca8e324b2edaa54fb0d3

To mitigate this vulnerability until Hive is upgraded to a new version, there 
are two
possible options. These steps need to be done manually in Ranger / Hive.
1) Restrict users from creating views on tables with column masking rules 
defined. For
this in Ranger Hive Policy:
 - Users should not have SELECT permission for those Table columns with masking 
rules
defined.
 - Give SELECT permission only for those columns without masking rules defined.
2) Review the Hive Column Masking Policies maintained in Ranger for the tables. 
Then
check in Hive if views that read those tables have been defined.
If present, either change the view definition so those columns are not selected 
or
directly drop those views.

Credit:
This issue was reported by Suja Santhosh of Hortonworks.


If you have any question, please reach out to us in the Hive dev list.

Regards,

The Apache Hive Team




[ANNOUNCE] Apache Hive 2.3.1 Released

2017-10-25 Thread Jesus Camacho Rodriguez
The Apache Hive team is proud to announce the release of Apache Hive
version 2.3.1.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top
of Apache Hadoop (TM), it provides, among others:

* Tools to enable easy data extract/transform/load (ETL)

* A mechanism to impose structure on a variety of data formats

* Access to files stored either directly in Apache HDFS (TM) or in other
  data storage systems such as Apache HBase (TM)

* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark 
frameworks.

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 2.3.1 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12341418&styleName=Text&projectId=12310843

We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team




Re: [ANNOUNCE] New Hive Committer - Rajesh Balamohan

2016-12-14 Thread Jesus Camacho Rodriguez
Congrats Rajesh, well deserved! :)

--
Jesús




On 12/14/16, 8:41 AM, "Lefty Leverenz"  wrote:

>Congratulations Rajesh!
>
>-- Lefty
>
>
>On Tue, Dec 13, 2016 at 11:58 PM, Rajesh Balamohan 
>wrote:
>
>> Thanks a lot for providing this opportunity and to all for their messages.
>> :)
>>
>> ~Rajesh.B
>>
>> On Wed, Dec 14, 2016 at 11:33 AM, Dharmesh Kakadia 
>> wrote:
>>
>> > Congrats Rajesh !
>> >
>> > Thanks,
>> > Dharmesh
>> >
>> > On Tue, Dec 13, 2016 at 7:37 PM, Vikram Dixit K 
>> > wrote:
>> >
>> >> Congrats Rajesh! :)
>> >>
>> >> On Tue, Dec 13, 2016 at 9:36 PM, Pengcheng Xiong 
>> >> wrote:
>> >>
>> >>> Congrats Rajesh! :)
>> >>>
>> >>> On Tue, Dec 13, 2016 at 6:51 PM, Prasanth Jayachandran <
>> >>> prasan...@apache.org
>> >>> > wrote:
>> >>>
>> >>> > The Apache Hive PMC has voted to make Rajesh Balamohan a committer on
>> >>> the
>> >>> > Apache Hive Project. Please join me in congratulating Rajesh.
>> >>> >
>> >>> > Congratulations Rajesh!
>> >>> >
>> >>> > Thanks
>> >>> > Prasanth
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Nothing better than when appreciated for hard work.
>> >> -Mark
>> >>
>> >
>> >
>>


[ANNOUNCE] Apache Hive 2.1.1 Released

2016-12-08 Thread Jesus Camacho Rodriguez
The Apache Hive team is proud to announce the release of Apache Hive
version 2.1.1.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top
of Apache Hadoop (TM), it provides, among others:

* Tools to enable easy data extract/transform/load (ETL)

* A mechanism to impose structure on a variety of data formats

* Access to files stored either directly in Apache HDFS (TM) or in other
  data storage systems such as Apache HBase (TM)

* Query execution via Apache Hadoop MapReduce and Apache Tez frameworks.

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 2.1.1 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12335838

We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team




Re: [ANNOUNCE] New PMC Member : Jesus

2016-07-18 Thread Jesus Camacho Rodriguez
Thanks everybody! Looking forward to continuing to contribute to the project!

--
Jesús




On 7/18/16, 6:21 PM, "Prasanth Jayachandran" <pjayachand...@hortonworks.com> 
wrote:

>Congratulations Jesus!
>
>> On Jul 18, 2016, at 10:10 AM, Jimmy Xiang <jxc...@gmail.com> wrote:
>> 
>> Congrats!!
>> 
>> On Mon, Jul 18, 2016 at 9:54 AM, Vihang Karajgaonkar
>> <vih...@cloudera.com> wrote:
>>> Congratulations Jesus!
>>> 
>>>> On Jul 18, 2016, at 8:30 AM, Sergio Pena <sergio.p...@cloudera.com> wrote:
>>>> 
>>>> Congrats Jesus !!!
>>>> 
>>>> On Mon, Jul 18, 2016 at 7:28 AM, Peter Vary <pv...@cloudera.com> wrote:
>>>> 
>>>>> Congratulations Jesus!
>>>>> 
>>>>>> On Jul 18, 2016, at 6:55 AM, Wei Zheng <wzh...@hortonworks.com> wrote:
>>>>>> 
>>>>>> Congrats Jesus!
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Wei
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 7/17/16, 14:29, "Sushanth Sowmyan" <khorg...@gmail.com> wrote:
>>>>>> 
>>>>>>> Good to have you onboard, Jesus! :)
>>>>>>> 
>>>>>>> On Jul 17, 2016 12:00, "Lefty Leverenz" <leftylever...@gmail.com>
>>>>> wrote:
>>>>>>> 
>>>>>>>> Congratulations Jesus!
>>>>>>>> 
>>>>>>>> -- Lefty
>>>>>>>> 
>>>>>>>> On Sun, Jul 17, 2016 at 1:01 PM, Ashutosh Chauhan <
>>>>> hashut...@apache.org>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hello Hive community,
>>>>>>>>> 
>>>>>>>>> I'm pleased to announce that Jesus Camacho Rodriguez has accepted the
>>>>>>>>> Apache Hive PMC's
>>>>>>>>> invitation, and is now our newest PMC member. Many thanks to Jesus for
>>>>>>>>> all of
>>>>>>>>> his hard work.
>>>>>>>>> 
>>>>>>>>> Please join me congratulating Jesus!
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Ashutosh
>>>>>>>>> (On behalf of the Apache Hive PMC)
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>> 
>>>>> 
>>> 
>> 
>
>


Re: [ANNOUNCE] New PMC Member : Pengcheng

2016-07-18 Thread Jesus Camacho Rodriguez
Congrats Pengcheng, well deserved! :)



On 7/18/16, 6:25 PM, "Vaibhav Gumashta"  wrote:

>Congrats Pengcheng!
>
>From: Prasanth Jayachandran 
>Sent: Monday, July 18, 2016 10:21 AM
>To: user@hive.apache.org
>Cc: d...@hive.apache.org
>Subject: Re: [ANNOUNCE] New PMC Member : Pengcheng
>
>Congratulations Pengcheng!
>
>> On Jul 18, 2016, at 10:10 AM, Jimmy Xiang  wrote:
>>
>> Congrats!!
>>
>> On Mon, Jul 18, 2016 at 9:55 AM, Vihang Karajgaonkar
>>  wrote:
>>> Congratulations!
>>>
 On Jul 18, 2016, at 5:28 AM, Peter Vary  wrote:

 Congratulations Pengcheng!


> On Jul 18, 2016, at 6:55 AM, Wei Zheng  wrote:
>
> Congrats Pengcheng!
>
> Thanks,
>
> Wei
>
>
>
>
>
>
> On 7/17/16, 16:01, "Xuefu Zhang"  wrote:
>
>> Congrats, PengCheng!
>>
>> On Sun, Jul 17, 2016 at 2:28 PM, Sushanth Sowmyan 
>> wrote:
>>
>>> Welcome aboard Pengcheng! :)
>>>
>>> On Jul 17, 2016 12:01, "Lefty Leverenz"  wrote:
>>>
 Congratulations Pengcheng!

 -- Lefty

 On Sun, Jul 17, 2016 at 1:03 PM, Ashutosh Chauhan 
 
 wrote:

>>
>> Hello Hive community,
>>
>> I'm pleased to announce that Pengcheng Xiong has accepted the Apache
> Hive
>> PMC's
>> invitation, and is now our newest PMC member. Many thanks to 
>> Pengcheng
> for
>> all of his hard work.
>>
>> Please join me congratulating Pengcheng!
>>
>> Best,
>> Ashutosh
>> (On behalf of the Apache Hive PMC)
>>
>


>>>

>>>
>>
>
>
>


Re: multiple selects on a left join give incorrect result

2016-05-05 Thread Jesus Camacho Rodriguez
I created https://issues.apache.org/jira/browse/HIVE-13693 to track the issue.

I have already submitted a fix.

Thanks,
Jesús


From: Frank Luo
Reply-To: "user@hive.apache.org"
Date: Wednesday, May 4, 2016 at 1:04 AM
To: "user@hive.apache.org"
Cc: Rebecca Yang
Subject: RE: multiple selects on a left join give incorrect result

Hdp 2.2 also works fine.

From: Markovitz, Dudu [mailto:dmarkov...@paypal.com]
Sent: Tuesday, May 03, 2016 6:56 PM
To: user@hive.apache.org
Cc: Rebecca Yang
Subject: RE: multiple selects on a left join give incorrect result

There is no issue on Cloudera VM

Dudu


[cloudera@quickstart ~]$ hadoop version
Hadoop 2.6.0-cdh5.5.0
Subversion http://github.com/cloudera/hadoop -r 
fd21232cef7b8c1f536965897ce20f50b83ee7b2
Compiled by jenkins on 2015-11-09T20:37Z
Compiled with protoc 2.5.0
From source with checksum 98e07176d1787150a6a9c087627562c
This command was run using /usr/jars/hadoop-common-2.6.0-cdh5.5.0.jar

[cloudera@quickstart ~]$ hive --version
Hive 1.1.0-cdh5.5.0
Subversion 
file:///data/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hive-1.1.0-cdh5.5.0
 -r Unknown
Compiled by jenkins on Mon Nov 9 12:37:34 PST 2015
From source with checksum 8dfc2aac3731e4e5f0e8bd1b442be0e2

From: Frank Luo [mailto:j...@merkleinc.com]
Sent: Wednesday, May 04, 2016 1:58 AM
To: user@hive.apache.org
Cc: Rebecca Yang
Subject: multiple selects on a left join give incorrect result

All,

I have found that when doing multiple selects on a left join, the “on” clause 
seems to be ignored!!! (It is hard to believe.)

Below is a very simple test case and please tell me I am crazy. I am on hdp 
2.3.4.7.


CREATE TABLE T_A ( id STRING, val STRING );
CREATE TABLE T_B ( id STRING, val STRING );
CREATE TABLE join_result_1 ( ida STRING, vala STRING, idb STRING, valb STRING );
CREATE TABLE join_result_2 ( ida STRING, vala STRING, idb STRING, valb STRING );
CREATE TABLE join_result_3 ( ida STRING, vala STRING, idb STRING, valb STRING );

INSERT INTO TABLE T_A
VALUES ('Id_1', 'val_101'), ('Id_2', 'val_102'), ('Id_3', 'val_103');

INSERT INTO TABLE T_B
VALUES ('Id_1', 'val_103'), ('Id_2', 'val_104');

FROM T_A a LEFT JOIN T_B b ON a.id = b.id
INSERT OVERWRITE TABLE join_result_1
   SELECT a.*, b.*
WHERE b.id = 'Id_1' AND b.val = 'val_103'
INSERT OVERWRITE TABLE join_result_2
   SELECT a.*, b.*
WHERE b.val IS NULL OR (b.id = 'Id_3' AND b.val = 'val_101')
INSERT OVERWRITE TABLE join_result_3
   SELECT a.*, b.*
WHERE b.val = 'val_104' AND b.id = 'Id_2' AND a.val <> b.val;


And here is the result:

0: jdbc:hive2 > select * from join_result_1;
+--------------------+---------------------+--------------------+---------------------+
| join_result_1.ida  | join_result_1.vala  | join_result_1.idb  | join_result_1.valb  |
+--------------------+---------------------+--------------------+---------------------+
| Id_1               | val_101             | Id_1               | val_103             |
| Id_2               | val_102             | Id_1               | val_103             |
| Id_3               | val_103             | Id_1               | val_103             |
+--------------------+---------------------+--------------------+---------------------+
3 rows selected (0.057 seconds)




I am expecting join_result_1 to have one row, but got three!!!
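
For reference, the single row that matches the WHERE clause of the first insert
(b.id = 'Id_1' AND b.val = 'val_103', joined on a.id = b.id) would be:

| Id_1  | val_101  | Id_1  | val_103  |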

Have other people run into the same thing?


Re: [ANNOUNCE] New Hive Committer - Wei Zheng

2016-03-10 Thread Jesus Camacho Rodriguez
Congrats Wei!


From: Madhu Thalakola
Reply-To: "user@hive.apache.org", Madhu Thalakola
Date: Thursday, March 10, 2016 at 2:47 PM
To: "user@hive.apache.org"
Cc: "d...@hive.apache.org", "w...@apache.org"
Subject: Re: [ANNOUNCE] New Hive Committer - Wei Zheng

Congratulations Wei Zheng

Thanks,
MAdhu
Help ever, Hurt never






Re: How to use grouping__id in a query

2015-10-21 Thread Jesus Camacho Rodriguez
I created HIVE-12223 to track this issue.

Thanks,
Jesús


From: Jesus Camachorodriguez
Reply-To: "user@hive.apache.org"
Date: Friday, October 16, 2015 at 8:00 AM
To: "user@hive.apache.org"
Subject: Re: How to use grouping__id in a query

Hi Michal,

Sorry I didn't catch your message before. The change of behavior might be due 
to a bug; certainly we should filter or at least produce a proper error.

Could you file a JIRA case and assign it to me? I'll check further.

Thanks,
Jesús



From: Michal Krawczyk
Reply-To: "user@hive.apache.org"
Date: Friday, October 16, 2015 at 8:15 AM
To: "user@hive.apache.org"
Subject: Re: How to use grouping__id in a query

Hi all,

Unfortunately I didn't get any answer on this one, perhaps I asked the question 
incorrectly. I'll try another one then ;).

Should it be possible to use the grouping__id function in the having clause to
filter out null values in the same query? It used to work in Hive 0.11 and 0.13,
but doesn't work in Hive 1.0.

Thanks,
Michal

On Fri, Sep 25, 2015 at 1:14 PM, Michal Krawczyk wrote:
Hi all,

During the migration from Hive 0.11 to 1.0 on Amazon EMR I ran into an issue with 
the grouping__id function. I'd like to use it to filter out NULL values that didn't 
come from grouping sets. Here's an example:

We have a simple table with some data:

hive> create table grouping_test (col1 string, col2 string);
hive> insert into grouping_test values (1, 2), (1, 3), (1, null), (null, 2);
hive> select * from grouping_test;
OK
1   2
1   3
1   NULL
NULL    2

hive> select col1, col2, GROUPING__ID, count(*)
from grouping_test
group by col1, col2
grouping sets ((), (col1))
having !(col1 IS NULL AND ((CAST(GROUPING__ID as int) & 1) > 0))

I expect the query above to filter out NULL col1 for the col1 grouping set; it 
used to work on Hive 0.11. But on Hive 1.0 it doesn't filter any values and 
still returns NULL col1:

NULL    NULL    0   4
NULL    NULL    1   1   <=== this row is expected to be removed by the having clause
1       NULL    1   3

I also tried a few other conditions on grouping__id in the having clause and none 
of them seem to work correctly:

select col1, col2, GROUPING__ID, count(*)
from grouping_test
group by col1, col2
grouping sets ((), (col1))
having GROUPING__ID = '1'

This query doesn't return any data.


I also tried to embed it into a subquery, but still no luck. It finally worked 
when I saved the output of the main query to a temp table and filtered out the 
data using a where clause, but this looks like overkill.
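
A sketch of that temp-table workaround on the example above (column names as in
the example; assumes CREATE TEMPORARY TABLE ... AS SELECT is available in your
version):

create temporary table grouping_tmp as
select col1, col2, GROUPING__ID as gid, count(*) as cnt
from grouping_test
group by col1, col2
grouping sets ((), (col1));

select * from grouping_tmp
where !(col1 IS NULL AND ((CAST(gid as int) & 1) > 0));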

So my question is: How to filter out values using grouping__id in Hive 1.0?

Thanks for your help,
Michal


--
Michal Krawczyk
Project Manager / Tech Lead
Union Square Internet Development
http://www.u2i.com/



--
Michal Krawczyk
Project Manager / Tech Lead
Union Square Internet Development
http://www.u2i.com/


Re: How to use grouping__id in a query

2015-10-16 Thread Jesus Camacho Rodriguez
Hi Michal,

Sorry I didn't catch your message before. The change of behavior might be due 
to a bug; certainly we should filter or at least produce a proper error.

Could you file a JIRA case and assign it to me? I'll check further.

Thanks,
Jesús



From: Michal Krawczyk
Reply-To: "user@hive.apache.org"
Date: Friday, October 16, 2015 at 8:15 AM
To: "user@hive.apache.org"
Subject: Re: How to use grouping__id in a query

Hi all,

Unfortunately I didn't get any answer on this one, perhaps I asked the question 
incorrectly. I'll try another one then ;).

Should it be possible to use the grouping__id function in the having clause to
filter out null values in the same query? It used to work in Hive 0.11 and 0.13,
but doesn't work in Hive 1.0.

Thanks,
Michal

On Fri, Sep 25, 2015 at 1:14 PM, Michal Krawczyk wrote:
Hi all,

During the migration from Hive 0.11 to 1.0 on Amazon EMR I ran into an issue with 
the grouping__id function. I'd like to use it to filter out NULL values that didn't 
come from grouping sets. Here's an example:

We have a simple table with some data:

hive> create table grouping_test (col1 string, col2 string);
hive> insert into grouping_test values (1, 2), (1, 3), (1, null), (null, 2);
hive> select * from grouping_test;
OK
1   2
1   3
1   NULL
NULL    2

hive> select col1, col2, GROUPING__ID, count(*)
from grouping_test
group by col1, col2
grouping sets ((), (col1))
having !(col1 IS NULL AND ((CAST(GROUPING__ID as int) & 1) > 0))

I expect the query above to filter out NULL col1 for the col1 grouping set; it 
used to work on Hive 0.11. But on Hive 1.0 it doesn't filter any values and 
still returns NULL col1:

NULL    NULL    0   4
NULL    NULL    1   1   <=== this row is expected to be removed by the having clause
1       NULL    1   3

I also tried a few other conditions on grouping__id in the having clause and none 
of them seem to work correctly:

select col1, col2, GROUPING__ID, count(*)
from grouping_test
group by col1, col2
grouping sets ((), (col1))
having GROUPING__ID = '1'

This query doesn't return any data.


I also tried to embed it into a subquery, but still no luck. It finally worked 
when I saved the output of the main query to a temp table and filtered out the 
data using a where clause, but this looks like overkill.

So my question is: How to filter out values using grouping__id in Hive 1.0?

Thanks for your help,
Michal


--
Michal Krawczyk
Project Manager / Tech Lead
Union Square Internet Development
http://www.u2i.com/



--
Michal Krawczyk
Project Manager / Tech Lead
Union Square Internet Development
http://www.u2i.com/


Re: Hive wiki (jcamachorodriguez)

2014-10-30 Thread Jesus Camacho Rodriguez
Thanks, Lefty!

--
Jesús

On Wed, Oct 29, 2014 at 10:20 PM, Lefty Leverenz leftylever...@gmail.com
wrote:

 Done.  Welcome to the Hive wiki team, Jesús!

 -- Lefty

 On Wed, Oct 29, 2014 at 10:54 PM, Jesus Camacho Rodriguez 
 jcamachorodrig...@hortonworks.com wrote:

 Hi,

 I would like to get write access for the Hive wiki.
 My Confluence username is jcamachorodriguez.

 Thanks,
 Jesús







Hive wiki (jcamachorodriguez)

2014-10-29 Thread Jesus Camacho Rodriguez
Hi,

I would like to get write access for the Hive wiki. My Confluence username
is jcamachorodriguez.

Thanks,
Jesús
