[jira] [Commented] (HIVE-15272) "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark

2016-12-15 Thread Vikash Pareek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15750784#comment-15750784
 ] 

Vikash Pareek commented on HIVE-15272:
--

You can find the query in the issue description itself:
SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1
LEFT OUTER
JOIN my_db.my_table2 t2 ON (t1.id = t2.id
AND t1.name = t2.name)

The relevant column types from the DDL are:
region -> STRING
amount -> DECIMAL
name -> STRING
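
A minimal sketch of how the two engines could be compared in a single session. It assumes the engine is switched with the `hive.execution.engine` session property; the issue does not say how the engine was actually selected.

```sql
-- Assumption: the engine is chosen per session via hive.execution.engine.
-- Run the identical query under each engine and compare the two counts.
SET hive.execution.engine=spark;
SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1
LEFT OUTER JOIN my_db.my_table2 t2
  ON (t1.id = t2.id AND t1.name = t2.name);

SET hive.execution.engine=mr;
SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1
LEFT OUTER JOIN my_db.my_table2 t2
  ON (t1.id = t2.id AND t1.name = t2.name);
```

Repeating each half several times would show whether the count is stable under one engine and unstable under the other, as reported.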


> "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark
> --
>
> Key: HIVE-15272
> URL: https://issues.apache.org/jira/browse/HIVE-15272
> Project: Hive
> Issue Type: Bug
> Components: Hive, Spark
> Affects Versions: 1.1.0
> Environment: Hive 1.1.0, CentOS, Cloudera 5.7.4
> Reporter: Vikash Pareek
>
> I ran the following Hive query multiple times, with Hive on Spark and with
> Hive on MapReduce as the execution engine.
> {code}
> SELECT COUNT(DISTINCT t1.region, t1.amount)
> FROM my_db.my_table1 t1
> LEFT OUTER
> JOIN my_db.my_table2 t2 ON (t1.id = t2.id
> AND t1.name = t2.name)
> {code}
> With Hive on Spark: the result (count) was different on every execution.
> With Hive on MapReduce: the result (count) was the same on every execution.
> It seems that Hive on Spark behaves differently on each execution and does
> not produce the correct result.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15272) "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark

2016-12-15 Thread Vikash Pareek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikash Pareek updated HIVE-15272:
-
Description: 
I ran the following Hive query multiple times, with Hive on Spark and with
Hive on MapReduce as the execution engine.
{code}
SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1
LEFT OUTER
JOIN my_db.my_table2 t2 ON (t1.id = t2.id
AND t1.name = t2.name)
{code}

With Hive on Spark: the result (count) was different on every execution.
With Hive on MapReduce: the result (count) was the same on every execution.

It seems that Hive on Spark behaves differently on each execution and does
not produce the correct result.


  was:
I ran the following Hive query multiple times, with Hive on Spark and with
Hive on MapReduce as the execution engine.
{code}
SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1
LEFT OUTER
JOIN my-db.my_table2 t2 ON (t1.id = t2.id
AND t1.name = t2.name)
{code}

With Hive on Spark: the result (count) was different on every execution.
With Hive on MapReduce: the result (count) was the same on every execution.

It seems that Hive on Spark behaves differently on each execution and does
not produce the correct result.








[jira] [Comment Edited] (HIVE-15272) "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark

2016-11-24 Thread Vikash Pareek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15693243#comment-15693243
 ] 

Vikash Pareek edited comment on HIVE-15272 at 11/24/16 3:10 PM:


I am only calculating a count of the records, and the result (count) does not
depend on ordering.
The result should be the same on every execution, as it is with MR.

my_table1 (left) has ~30 million records.
my_table2 (right) has ~85 million records.
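
The reasoning can be taken one step further. The query selects only `t1` columns, and a LEFT OUTER JOIN keeps every left-side row at least once, so the join can duplicate `(region, amount)` pairs but cannot add or remove any. A possible sanity check, offered as a suggestion rather than something from the issue, is to compare against the same count with the join dropped:

```sql
-- Baseline without the join. DISTINCT collapses any duplicates the
-- LEFT OUTER JOIN would introduce, so this count should equal the
-- count from the joined query on either engine.
SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1;
```

Whichever engine disagrees with this baseline is the one returning the wrong count.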



was (Author: vpareek):
I am only calculating a count of the records, and the result (count) does not
depend on ordering.
The result should be the same on every execution, as it is with MR.

I have around 30 million records in my_table1 (left) and 85 million records in
my_table2 (right).







[jira] [Updated] (HIVE-15272) "LEFT OUTER JOIN" Is not populating correct records with Hive On Spark

2016-11-24 Thread Vikash Pareek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikash Pareek updated HIVE-15272:
-
Summary: "LEFT OUTER JOIN" Is not populating correct records with Hive On 
Spark  (was: "LEFT OUTER JOIN" Is not populating different records with Hive On 
Spark)






[jira] [Commented] (HIVE-15272) "LEFT OUTER JOIN" Is not populating different records with Hive On Spark

2016-11-24 Thread Vikash Pareek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15693243#comment-15693243
 ] 

Vikash Pareek commented on HIVE-15272:
--

I am only calculating a count of the records, and the result (count) does not
depend on ordering.
The result should be the same on every execution, as it is with MR.

I have around 30 million records in my_table1 (left) and 85 million records in
my_table2 (right).







[jira] [Updated] (HIVE-15272) "LEFT OUTER JOIN" Is not populating different records with Hive On Spark

2016-11-24 Thread Vikash Pareek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikash Pareek updated HIVE-15272:
-
Description: 
I ran the following Hive query multiple times, with Hive on Spark and with
Hive on MapReduce as the execution engine.
{code}
SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1
LEFT OUTER
JOIN my-db.my_table2 t2 ON (t1.id = t2.id
AND t1.name = t2.name)
{code}

With Hive on Spark: the result (count) was different on every execution.
With Hive on MapReduce: the result (count) was the same on every execution.

It seems that Hive on Spark behaves differently on each execution and does
not produce the correct result.


  was:
I ran the following Hive query multiple times, with Hive on Spark and with
Hive on MapReduce as the execution engine.
{code}
SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1
LEFT OUTER
JOIN my-db.my_table2 t2 ON (t1.id = t2.id
AND t1.name = t2.name)
{code}

With Hive on Spark: the result (count) was different on every execution.
With Hive on MapReduce: the result (count) was the same on every execution.









[jira] [Updated] (HIVE-15272) "LEFT OUTER JOIN" Is not populating different records with Hive On Spark

2016-11-24 Thread Vikash Pareek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikash Pareek updated HIVE-15272:
-
Description: 
I ran the following Hive query multiple times, with Hive on Spark and with
Hive on MapReduce as the execution engine.
{code}
SELECT COUNT(DISTINCT t1.region, t1.amount)
FROM my_db.my_table1 t1
LEFT OUTER
JOIN my-db.my_table2 t2 ON (t1.id = t2.id
AND t1.name = t2.name)
{code}

With Hive on Spark: the result (count) was different on every execution.
With Hive on MapReduce: the result (count) was the same on every execution.



  was:
The following query produces a different result every time I run it with Hive
on Spark:
{code}
SELECT COUNT(*)
FROM
  (SELECT DISTINCT mt1.name,
   mt1.id
   FROM
 (SELECT mt1.*,
 mt2.region,
 mt2.,
 regexp_replace(mt2.tr_dat,"\\.","") AS TRANSACTION_DATE
  FROM my_database.my_table1 mt1
  LEFT OUTER JOIN my_database.my_table2 mt2 ON (mt1.id=mt2.id
  AND mt1.name = mt2.name))t6)A;
{code}

But the same query produces the same result with Hive on MapReduce every time.







[jira] [Commented] (HIVE-8960) ParsingException in the WHERE statement with a Sub Query

2016-08-20 Thread Vikash Pareek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429315#comment-15429315
 ] 

Vikash Pareek commented on HIVE-8960:
-

I am trying the following query. It works in Impala but not in Hive.

SELECT t1.col1
FROM table1 t1
LEFT OUTER JOIN table2 t2 ON (t1.col2 = t2.col2 AND t1.col3 = t2.col3)
WHERE t2.col4 = (SELECT MAX(t22.col4) FROM table2 t22 WHERE t22.col4 <= t1.col4);

Is there an alternative for this in Hive?
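
One possible direction, sketched here as an assumption and not verified on Hive, is to replace the scalar subquery with a join against a derived table. The subquery depends only on `t1.col4`, so the maximum can be precomputed once per distinct `col4` value. The aliases `d` and `m` and the column names `t1_col4` and `max_col4` are hypothetical.

```sql
-- Hypothetical join-based rewrite of the correlated scalar subquery.
-- Step 1 (derived table m): for each distinct table1.col4, find the
--         largest table2.col4 that is <= it.
-- Step 2: join m back to the main query and apply the equality filter.
SELECT t1.col1
FROM table1 t1
JOIN table2 t2
  ON (t1.col2 = t2.col2 AND t1.col3 = t2.col3)
JOIN (
  SELECT d.col4 AS t1_col4, MAX(t22.col4) AS max_col4
  FROM (SELECT DISTINCT col4 FROM table1) d
  CROSS JOIN table2 t22
  WHERE t22.col4 <= d.col4
  GROUP BY d.col4
) m
  ON (t1.col4 = m.t1_col4)
WHERE t2.col4 = m.max_col4;
```

The sketch uses an inner JOIN because the original WHERE filter on `t2.col4` already discards rows without a match in `table2`. The cross join can be expensive on large tables, and Hive may reject it when strict mode is enabled.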

> ParsingException in the WHERE statement with a Sub Query
> 
>
> Key: HIVE-8960
> URL: https://issues.apache.org/jira/browse/HIVE-8960
> Project: Hive
> Issue Type: Bug
> Components: Parser
> Affects Versions: 0.13.0
> Environment: Secured HDP 2.1.3 with Hive 0.13.0
> Reporter: Rémy SAISSY
>
> Comparison with a subquery in a WHERE clause does not work.
> Given that id_chargement is an integer:
> USE db1;
> SELECT * FROM tbl1 a WHERE a.id_chargement > (SELECT MAX(b.id_chargement) FROM tbl2 b);
> or
> SELECT * FROM tbl1 a WHERE a.id_chargement > (SELECT b.id_chargement FROM tbl2 b LIMIT 1);
> Both return the following parsing error:
> Error: Error while compiling statement: FAILED: ParseException line 1:88 cannot recognize input near 'SELECT' 'b' '.' in expression specification (state=42000,code=4)
> java.sql.SQLException: Error while compiling statement: FAILED: ParseException line 1:88 cannot recognize input near 'SELECT' 'b' '.' in expression specification
> at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121)
> at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109)
> at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231)
> at org.apache.hive.beeline.Commands.execute(Commands.java:736)
> at org.apache.hive.beeline.Commands.sql(Commands.java:657)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:804)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:659)
> at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:368)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:351)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)


