[jira] [Updated] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables

2013-01-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3699:
-

Status: Open  (was: Patch Available)

A lot of tests are failing -- can you debug?

 Multiple insert overwrite into multiple tables query stores same results in 
 all tables
 --

 Key: HIVE-3699
 URL: https://issues.apache.org/jira/browse/HIVE-3699
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
 Environment: Cloudera 4.1 on Amazon Linux (rebranded Centos 6): 
 hive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch
Reporter: Alexandre Fouché
Assignee: Navis
 Attachments: HIVE-3699.D7743.1.patch, HIVE-3699.D7743.2.patch, 
 HIVE-3699_hive-0.9.1.patch.txt


 (Note: This might be related to HIVE-2750)
 I am doing a query with multiple INSERT OVERWRITE to multiple tables in order 
 to scan the dataset only once, and I end up with all of these tables having 
 the same content! It seems the GROUP BY query that returns results is 
 overwriting all the temp tables.
 Oddly enough, if I add further GROUP BY queries into additional temp tables, 
 grouped by a different field, then all temp tables, even the ones that would 
 otherwise have had the wrong content, are correctly populated.
 This is the misbehaving query:
 FROM nikon
 INSERT OVERWRITE TABLE e1
 SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions
 WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid
 INSERT OVERWRITE TABLE e2
 SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues
 WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid
 ;
 It launches only one MR job, and here are the results. Why does table 'e1' 
 contain results from table 'e2'?! Table 'e1' should have been empty (see the 
 individual SELECTs further below).
 hive> SELECT * from e1;
 OK
 NULL    2
 1627575 25
 1627576 70
 1690950 22
 1690952 42
 1696705 199
 1696706 66
 1696730 229
 1696759 85
 1696893 218
 Time taken: 0.229 seconds
 hive> SELECT * from e2;
 OK
 NULL    2
 1627575 25
 1627576 70
 1690950 22
 1690952 42
 1696705 199
 1696706 66
 1696730 229
 1696759 85
 1696893 218
 Time taken: 0.11 seconds
 Here are the results of the individual queries (only the second query 
 returns a result set):
 hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions FROM 
 nikon
 WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid;
 (...)
 OK
   - There are no results, this is normal
 Time taken: 41.471 seconds
 hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues FROM nikon
 WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;
 (...)
 OK
 NULL  2
 1627575 25
 1627576 70
 1690950 22
 1690952 42
 1696705 199
 1696706 66
 1696730 229
 1696759 85
 1696893 218
 Time taken: 39.607 seconds
 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD

2013-01-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545816#comment-13545816
 ] 

Phabricator commented on HIVE-3853:
---

njain has commented on the revision HIVE-3853 [jira] UDF unix_timestamp is 
deterministic if an argument is given, but it treated as non-deterministic 
preventing PPD.

  This calls for 'deterministic' not being an annotation --
  by any chance, do you know if the annotation can be overwritten dynamically?
  Otherwise, a duplicate function is OK.

INLINE COMMENTS
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToUnixTimestamp.java:52
 Can you share the code between this and unix_timestamp?

  I mean, create a common class, and both functions can extend it.
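For readers following along, here is a minimal sketch of the refactoring being suggested: hoist the shared conversion into a common parent class that both functions extend. All names below (BaseUnixTimestamp, UnixTimestampUdf, ToUnixTimestampUdf, evaluate) are illustrative, not Hive's actual GenericUDF classes.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical common base class holding the shared parsing logic.
abstract class BaseUnixTimestamp {
    // Conversion used by both concrete functions.
    protected long toUnixTimestamp(String text, String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
        return LocalDateTime.parse(text, fmt).toEpochSecond(ZoneOffset.UTC);
    }
}

// Stand-in for unix_timestamp(string): fixed default pattern.
class UnixTimestampUdf extends BaseUnixTimestamp {
    long evaluate(String text) {
        return toUnixTimestamp(text, "yyyy-MM-dd HH:mm:ss");
    }
}

// Stand-in for to_unix_timestamp(string, pattern): caller-supplied pattern.
class ToUnixTimestampUdf extends BaseUnixTimestamp {
    long evaluate(String text, String pattern) {
        return toUnixTimestamp(text, pattern);
    }
}

public class CommonClassSketch {
    public static void main(String[] args) {
        System.out.println(new UnixTimestampUdf().evaluate("1970-01-01 00:00:01"));
        System.out.println(new ToUnixTimestampUdf()
                .evaluate("1970-01-01 00:00:01", "yyyy-MM-dd HH:mm:ss"));
    }
}
```

In the real code the shared logic would live in a common parent of GenericUDFToUnixTimestamp and the unix_timestamp UDF, as the review suggests.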

REVISION DETAIL
  https://reviews.facebook.net/D7767

To: JIRA, navis
Cc: njain


 UDF unix_timestamp is deterministic if an argument is given, but it treated 
 as non-deterministic preventing PPD
 ---

 Key: HIVE-3853
 URL: https://issues.apache.org/jira/browse/HIVE-3853
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Navis
Assignee: Navis
Priority: Trivial
  Labels: udf
 Attachments: HIVE-3853.D7767.1.patch


 unix_timestamp is declared as a non-deterministic function. But if the user 
 provides an argument, the result is deterministic and the call is eligible for PPD.



[jira] [Updated] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD

2013-01-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3853:
-

Status: Open  (was: Patch Available)

comments




[jira] [Updated] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2013-01-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3803:
-

Attachment: hive.3803.7.patch

 explain dependency should show the dependencies hierarchically in presence of 
 views
 ---

 Key: HIVE-3803
 URL: https://issues.apache.org/jira/browse/HIVE-3803
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch, 
 hive.3803.4.patch, hive.3803.5.patch, hive.3803.6.patch, hive.3803.7.patch


 It should also include tables whose partitions are being accessed.



[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545819#comment-13545819
 ] 

Namit Jain commented on HIVE-3852:
--

[~navis], I had a higher-level question.
Should we have this optimization now ?
I mean, is this really needed with map-side aggregates, or can we remove this 
code completely ?

 Multi-groupby optimization fails when same distinct column is used twice or 
 more
 

 Key: HIVE-3852
 URL: https://issues.apache.org/jira/browse/HIVE-3852
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-3852.D7737.1.patch


 {code}
 FROM INPUT
 INSERT OVERWRITE TABLE dest1 
 SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), count(distinct 
 substr(INPUT.value,5)) GROUP BY INPUT.key
 INSERT OVERWRITE TABLE dest2 
 SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), avg(distinct 
 substr(INPUT.value,5)) GROUP BY INPUT.key;
 {code}
 fails with: FAILED: IndexOutOfBoundsException Index: 0, Size: 0



[jira] [Updated] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3852:
-

Status: Open  (was: Patch Available)




[jira] [Created] (HIVE-3868) Use Hive's serde to parse HBase's byte Data in LazyHBaseRow and

2013-01-07 Thread binlijin (JIRA)
binlijin created HIVE-3868:
--

 Summary: Use Hive's serde to parse HBase's byte Data in 
LazyHBaseRow and
 Key: HIVE-3868
 URL: https://issues.apache.org/jira/browse/HIVE-3868
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.9.0
Reporter: binlijin


In LazyHBaseRow,
{code}
  private Object uncheckedGetField(int fieldID) {
    // it is a column, i.e. a column family with a column qualifier
    byte[] res = result.getValue(colMap.familyNameBytes,
        colMap.qualifierNameBytes);

    if (res == null) {
      return null;
    }
    ByteArrayRef ref = new ByteArrayRef();
    ref.setData(res);
    fields[fieldID].init(ref, 0, ref.getData().length);
    return fields[fieldID].getObject();
  }
{code}



[jira] [Updated] (HIVE-3868) Use Hive's serde to parse HBase's byte Data in LazyHBaseRow

2013-01-07 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HIVE-3868:
---

Summary: Use Hive's serde to parse HBase's byte Data in LazyHBaseRow  (was: 
Use Hive's serde to parse HBase's byte Data in LazyHBaseRow and)




[jira] [Updated] (HIVE-3868) Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow

2013-01-07 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HIVE-3868:
---

Description: 
In LazyHBaseRow,
{code}
  private Object uncheckedGetField(int fieldID) {
    // it is a column, i.e. a column family with a column qualifier
    byte[] res = result.getValue(colMap.familyNameBytes,
        colMap.qualifierNameBytes);

    if (res == null) {
      return null;
    }
    ByteArrayRef ref = new ByteArrayRef();
    ref.setData(res);
    fields[fieldID].init(ref, 0, ref.getData().length);
    return fields[fieldID].getObject();
  }
{code}
For example, if fields[fieldID] is a bigint and ref stores HBase byte data (a 
long serialized by HBase's Bytes), LazyLong will be used to parse this data and 
will return a NULL value; Bytes.toLong(ref.getData()) should be used to parse 
this byte data instead.

  was:
In LazyHBaseRow,
{code}
  private Object uncheckedGetField(int fieldID) {
  // it is a column i.e. a column-family with column-qualifier
  byte [] res = result.getValue(colMap.familyNameBytes, 
colMap.qualifierNameBytes);

  if (res == null) {
return null;
  } else {
ref = new ByteArrayRef();
ref.setData(res);
  }
  if (ref != null) {
fields[fieldID].init(ref, 0, ref.getData().length);
  }
  }

{code}





[jira] [Updated] (HIVE-3868) Use Hive's serde to parse HBase's byte Data in LazyHBaseRow

2013-01-07 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HIVE-3868:
---

Description: 
In LazyHBaseRow,
{code}
  private Object uncheckedGetField(int fieldID) {
    // it is a column, i.e. a column family with a column qualifier
    byte[] res = result.getValue(colMap.familyNameBytes,
        colMap.qualifierNameBytes);

    if (res == null) {
      return null;
    }
    ByteArrayRef ref = new ByteArrayRef();
    ref.setData(res);
    fields[fieldID].init(ref, 0, ref.getData().length);
    return fields[fieldID].getObject();
  }
{code}
For example, if fields[fieldID] is a bigint and ref stores HBase byte data (a 
long serialized by HBase's Bytes), LazyLong will be used to parse this data and 
will return a NULL value; Bytes.toLong(ref.getData()) should be used to parse 
this byte data instead.

  was:
In LazyHBaseRow,
{code}
  private Object uncheckedGetField(int fieldID) {
  // it is a column i.e. a column-family with column-qualifier
  byte [] res = result.getValue(colMap.familyNameBytes, 
colMap.qualifierNameBytes);

  if (res == null) {
return null;
  } else {
ref = new ByteArrayRef();
ref.setData(res);
  }
  if (ref != null) {
fields[fieldID].init(ref, 0, ref.getData().length);
  }
  }
  For example, if the fields[fieldID] is Bigint, and ref stores HBase byte data 
(Long), it will use LazyLong to parse this data and will return NULL value, it 
should use Bytes.toLong(res.getData()) to parse this byte data
{code}





[jira] [Commented] (HIVE-3868) Use Hive's serde to parse HBase's byte Data in LazyHBaseRow

2013-01-07 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545860#comment-13545860
 ] 

binlijin commented on HIVE-3868:


The reason is:
We use HBase's Bytes to convert longs and other data types to byte data and 
store them in HBase.
Then we use Hive to analyze the data in HBase.
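A self-contained sketch of the mismatch described above. Assumptions: no HBase on the classpath, so toLongBigEndian stands in for HBase's Bytes.toLong, and parseAsText mimics what a LazyLong-style text parser effectively does with the raw bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ByteDecodeDemo {
    // What HBase's Bytes.toLong does: interpret 8 bytes as a big-endian long.
    static long toLongBigEndian(byte[] data) {
        return ByteBuffer.wrap(data).getLong();
    }

    // What a LazyLong-style text parser effectively does: treat the bytes as
    // a decimal string; on binary input this fails and Hive surfaces NULL.
    static Long parseAsText(byte[] data) {
        try {
            return Long.parseLong(new String(data, StandardCharsets.UTF_8));
        } catch (NumberFormatException e) {
            return null;  // the NULL value binlijin observes
        }
    }

    public static void main(String[] args) {
        // A long written the way HBase's Bytes.toBytes would write it.
        byte[] binary = ByteBuffer.allocate(8).putLong(42L).array();
        System.out.println(parseAsText(binary));      // null: text parse of binary bytes
        System.out.println(toLongBigEndian(binary));  // 42: correct binary decode
    }
}
```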




Hive-trunk-h0.21 - Build # 1898 - Fixed

2013-01-07 Thread Apache Jenkins Server
Changes for Build #1896

Changes for Build #1897

Changes for Build #1898
[namit] HIVE-3300 LOAD DATA INPATH fails if a hdfs file with same name is added 
to table
(Navis via namit)

[namit] HIVE-3842 Remove redundant test codes
(Navis via namit)




All tests passed

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1898)

Status: Fixed

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1898/ to 
view the results.

[jira] [Commented] (HIVE-3842) Remove redundant test codes

2013-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545875#comment-13545875
 ] 

Hudson commented on HIVE-3842:
--

Integrated in Hive-trunk-h0.21 #1898 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1898/])
HIVE-3842 Remove redundant test codes
(Navis via namit) (Revision 1429682)

 Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429682
Files : 
* /hive/trunk/hbase-handler/src/test/templates/TestHBaseCliDriver.vm
* /hive/trunk/hbase-handler/src/test/templates/TestHBaseNegativeCliDriver.vm
* /hive/trunk/ql/src/test/templates/TestCliDriver.vm
* /hive/trunk/ql/src/test/templates/TestNegativeCliDriver.vm
* /hive/trunk/ql/src/test/templates/TestParse.vm
* /hive/trunk/ql/src/test/templates/TestParseNegative.vm


 Remove redundant test codes
 ---

 Key: HIVE-3842
 URL: https://issues.apache.org/jira/browse/HIVE-3842
 Project: Hive
  Issue Type: Test
  Components: Tests
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 0.11.0

 Attachments: HIVE-3842.D7773.1.patch


 Currently Hive writes the same test code again and again for each test, making 
 the test classes huge (50k lines for ql).



[jira] [Commented] (HIVE-3300) LOAD DATA INPATH fails if a hdfs file with same name is added to table

2013-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545874#comment-13545874
 ] 

Hudson commented on HIVE-3300:
--

Integrated in Hive-trunk-h0.21 #1898 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1898/])
HIVE-3300 LOAD DATA INPATH fails if a hdfs file with same name is added to 
table
(Navis via namit) (Revision 1429686)

 Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429686
Files : 
* /hive/trunk/build-common.xml
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
* /hive/trunk/ql/src/test/queries/clientpositive/load_fs2.q
* /hive/trunk/ql/src/test/results/clientpositive/load_fs2.q.out


 LOAD DATA INPATH fails if a hdfs file with same name is added to table
 --

 Key: HIVE-3300
 URL: https://issues.apache.org/jira/browse/HIVE-3300
 Project: Hive
  Issue Type: Bug
  Components: Import/Export
Affects Versions: 0.10.0
 Environment: ubuntu linux, hadoop 1.0.3, hive 0.9
Reporter: Bejoy KS
Assignee: Navis
 Fix For: 0.11.0

 Attachments: HIVE-3300.1.patch.txt, HIVE-3300.D4383.3.patch, 
 HIVE-3300.D4383.4.patch


 If we are loading data from local fs to hive tables using 'LOAD DATA LOCAL 
 INPATH' and if a file with the same name exists in the table's location then 
 the new file will be suffixed by *_copy_1.
 But if we do 'LOAD DATA INPATH' for a file in HDFS, there is no rename 
 happening; just a move task is triggered. Since a file with the same name 
 exists in the same HDFS location, the hadoop fs move operation throws an 
 error.
 hive> LOAD DATA INPATH '/userdata/bejoy/site.txt' INTO TABLE test.site;
 Loading data to table test.site
 Failed with exception null
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.MoveTask
 hive> 



[jira] [Commented] (HIVE-2935) Implement HiveServer2

2013-01-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545973#comment-13545973
 ] 

Nicolas Fouché commented on HIVE-2935:
--

Using CDH 4.1.2, which includes this patch. I think there's a problem with 
hive-jdbc, which includes a JDBC driver for the two versions of hiveserver.

For the first version of hiveserver, hive-jdbc-0.9.0-cdh4.1.2 depends on 
libthrift-1.5.0, which defines org.apache.thrift.TServiceClient as an interface.

For hiveserver2, hive-jdbc-0.9.0-cdh4.1.2 depends on 
hive-service-0.9.0-cdh4.1.2. The latter seems to include code from libthrift, 
and defines org.apache.thrift.TServiceClient as an abstract class.

Thus this happens:

java.lang.IncompatibleClassChangeError: class 
org.apache.hive.service.cli.thrift.TCLIService$Client has interface 
org.apache.thrift.TServiceClient as super class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(Unknown Source)
at java.lang.ClassLoader.defineClass(Unknown Source)
at java.security.SecureClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.access$000(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at 
org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:157)
at org.apache.hive.jdbc.HiveConnection.&lt;init&gt;(HiveConnection.java:96)

Of course, I just have to remove libthrift from my libpath. But I just wanted 
to let Carl Steinbach know. (I used maven-dependency-plugin to get all 
dependent JARs, without thinking about what would be useless or incompatible.)
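As a side note, a small diagnostic (not from the thread; the class name is illustrative) can show which jar actually supplies the conflicting class on a given classpath, which is useful for "which jar wins?" errors like the one above:

```java
public class ClassOrigin {
    // Ask the classloader where it would load the given resource from.
    static String locate(String resource) {
        java.net.URL url = ClassOrigin.class.getClassLoader().getResource(resource);
        return url == null ? "not found" : url.toString();
    }

    public static void main(String[] args) {
        // With both libthrift and the HS2 hive-service jar on the classpath,
        // this prints whichever jar appears first and will define the class.
        System.out.println(locate("org/apache/thrift/TServiceClient.class"));
    }
}
```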

 Implement HiveServer2
 -

 Key: HIVE-2935
 URL: https://issues.apache.org/jira/browse/HIVE-2935
 Project: Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Carl Steinbach
Assignee: Carl Steinbach
  Labels: HiveServer2
 Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, 
 HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt, 
 HS2-changed-files-only.patch, HS2-with-thrift-patch-rebased.patch






[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-01-07 Thread Liu Zongquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545984#comment-13545984
 ] 

Liu Zongquan commented on HIVE-2206:


If I plan to merge HIVE-2206 into the hive source code, which branch should I 
use? Can someone tell me?

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called the Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries sentence by sentence, for every 
 operation which may need to shuffle the data (e.g. join and aggregation 
 operations), Hive will generate a MapReduce job for that operation. However, 
 those operations may involve the correlations explained below and thus can be 
 executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
 The current implementation of the correlation optimizer only detects correlations 
 among MR jobs for reduce-side join operators and reduce-side aggregation 
 operators (not map-only aggregation). A query will be optimized if it 
 satisfies the following conditions.
 # There exists an MR job for a reduce-side join operator or reduce-side 
 aggregation operator which has JFC with all of its parent MR jobs (TCs will 
 also be exploited if JFC exists);
 # All input tables of those correlated MR jobs are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 The correlation optimizer is implemented as a logical optimizer. The main reasons 
 are that it only needs to manipulate the query plan tree and it can leverage 
 the existing components for generating MR jobs.
 The current implementation can serve as a framework for correlation-related 
 optimizations. I think that it is better than adding individual optimizers. 
 There is further work that can be done in future to improve this optimizer. 
 Here are three examples.
 # Support queries that only involve TC;
 # Support queries in which input tables of correlated MR jobs involve 
 intermediate tables; and 
 # Optimize queries involving self joins. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc
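To make the three correlation definitions concrete, here is a toy model that encodes IC, TC, and JFC as stated above. The Job type, the predicates, and the lineitem/orders example are illustrative, not Hive's classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class CorrelationDemo {
    // Minimal stand-in for an MR job: its input tables, its shuffle
    // (partition) key, and the parent jobs feeding it.
    static final class Job {
        final Set<String> inputs;
        final String partitionKey;
        final List<Job> parents = new ArrayList<>();
        Job(Set<String> inputs, String partitionKey) {
            this.inputs = inputs;
            this.partitionKey = partitionKey;
        }
    }

    // Input correlation (IC): input relation sets are not disjoint.
    static boolean inputCorrelated(Job a, Job b) {
        return !Collections.disjoint(a.inputs, b.inputs);
    }

    // Transit correlation (TC): input correlation plus the same partition key.
    static boolean transitCorrelated(Job a, Job b) {
        return inputCorrelated(a, b) && a.partitionKey.equals(b.partitionKey);
    }

    // Job flow correlation (JFC): a job shares its partition key with a parent.
    static boolean jobFlowCorrelated(Job child) {
        return child.parents.stream()
                .anyMatch(p -> p.partitionKey.equals(child.partitionKey));
    }

    public static void main(String[] args) {
        Job agg = new Job(Set.of("lineitem"), "l_orderkey");            // group-by job
        Job join = new Job(Set.of("lineitem", "orders"), "l_orderkey"); // join job
        join.parents.add(agg);
        System.out.println(inputCorrelated(agg, join));   // true: both read lineitem
        System.out.println(transitCorrelated(agg, join)); // true: same key too
        System.out.println(jobFlowCorrelated(join));      // true: key matches parent
    }
}
```

Under these definitions, the two jobs in main would be candidates for merging into a single MR job.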



Build failed in Jenkins: Hive-0.10.0-SNAPSHOT-h0.20.1 #27

2013-01-07 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/27/

--
[...truncated 8145 lines...]
 [echo] Project: common

create-dirs:
 [echo] Project: serde
 [copy] Warning: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/serde/src/test/resources
 does not exist.

init:
 [echo] Project: serde

create-dirs:
 [echo] Project: metastore
 [copy] Warning: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/metastore/src/test/resources
 does not exist.

init:
 [echo] Project: metastore

create-dirs:
 [echo] Project: ql

init:
 [echo] Project: ql

create-dirs:
 [echo] Project: contrib
 [copy] Warning: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/contrib/src/test/resources
 does not exist.

init:
 [echo] Project: contrib

create-dirs:
 [echo] Project: service
 [copy] Warning: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/service/src/test/resources
 does not exist.

init:
 [echo] Project: service

create-dirs:
 [echo] Project: cli
 [copy] Warning: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/cli/src/test/resources
 does not exist.

init:
 [echo] Project: cli

create-dirs:
 [echo] Project: jdbc
 [copy] Warning: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/jdbc/src/test/resources
 does not exist.

init:
 [echo] Project: jdbc

create-dirs:
 [echo] Project: hwi
 [copy] Warning: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/hwi/src/test/resources
 does not exist.

init:
 [echo] Project: hwi

create-dirs:
 [echo] Project: hbase-handler
 [copy] Warning: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/hbase-handler/src/test/resources
 does not exist.

init:
 [echo] Project: hbase-handler

create-dirs:
 [echo] Project: pdk
 [copy] Warning: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/pdk/src/test/resources
 does not exist.

init:
 [echo] Project: pdk

create-dirs:
 [echo] Project: builtins
 [copy] Warning: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/builtins/src/test/resources
 does not exist.

init:
 [echo] Project: builtins

jar:
 [echo] Project: hive

create-dirs:
 [echo] Project: shims
 [copy] Warning: 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/shims/src/test/resources
 does not exist.

init:
 [echo] Project: shims

ivy-init-settings:
 [echo] Project: shims

ivy-resolve:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/ivy/ivysettings.xml
[ivy:report] Processing 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/27/artifact/hive/build/ivy/resolution-cache/org.apache.hive-hive-shims-default.xml
 to 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/27/artifact/hive/build/ivy/report/org.apache.hive-hive-shims-default.html

ivy-retrieve:
 [echo] Project: shims

compile:
 [echo] Project: shims
 [echo] Building shims 0.20

build-shims:
 [echo] Project: shims
 [echo] Compiling 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/shims/src/common/java;/home/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/shims/src/0.20/java
 against hadoop 0.20.2 
(https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/27/artifact/hive/build/hadoopcore/hadoop-0.20.2)

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/ivy/ivysettings.xml

ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.20S

build-shims:
 [echo] Project: shims
 [echo] Compiling 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/shims/src/common/java;/home/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/shims/src/common-secure/java;/home/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/shims/src/0.20S/java
 against hadoop 1.0.0 
(https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/27/artifact/hive/build/hadoopcore/hadoop-1.0.0)

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/ivy/ivysettings.xml

ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.23

build-shims:
 [echo] Project: shims
 [echo] Compiling 

Hive-trunk-h0.21 - Build # 1899 - Failure

2013-01-07 Thread Apache Jenkins Server
Changes for Build #1899



No tests ran.

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1899)

Status: Failure

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1899/ to 
view the results.

Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false #253

2013-01-07 Thread Apache Jenkins Server
See 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/253/

--
[...truncated 9916 lines...]

compile-test:
 [echo] Project: serde
[javac] Compiling 26 source files to 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/serde/test/classes
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

create-dirs:
 [echo] Project: service
 [copy] Warning: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/service/src/test/resources
 does not exist.

init:
 [echo] Project: service

ivy-init-settings:
 [echo] Project: service

ivy-resolve:
 [echo] Project: service
[ivy:resolve] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/ivy/ivysettings.xml
[ivy:report] Processing 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/resolution-cache/org.apache.hive-hive-service-default.xml
 to 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/report/org.apache.hive-hive-service-default.html

ivy-retrieve:
 [echo] Project: service

compile:
 [echo] Project: service

ivy-resolve-test:
 [echo] Project: service

ivy-retrieve-test:
 [echo] Project: service

compile-test:
 [echo] Project: service
[javac] Compiling 2 source files to 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/service/test/classes

test:
 [echo] Project: hive

test-shims:
 [echo] Project: hive

test-conditions:
 [echo] Project: shims

gen-test:
 [echo] Project: shims

create-dirs:
 [echo] Project: shims
 [copy] Warning: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/test/resources
 does not exist.

init:
 [echo] Project: shims

ivy-init-settings:
 [echo] Project: shims

ivy-resolve:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/ivy/ivysettings.xml
[ivy:report] Processing 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/resolution-cache/org.apache.hive-hive-shims-default.xml
 to 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/report/org.apache.hive-hive-shims-default.html

ivy-retrieve:
 [echo] Project: shims

compile:
 [echo] Project: shims
 [echo] Building shims 0.20

build_shims:
 [echo] Project: shims
 [echo] Compiling 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.20/java
 against hadoop 0.20.2 
(/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/hadoopcore/hadoop-0.20.2)

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/ivy/ivysettings.xml

ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.20S

build_shims:
 [echo] Project: shims
 [echo] Compiling 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common-secure/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.20S/java
 against hadoop 1.0.0 
(/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/hadoopcore/hadoop-1.0.0)

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/ivy/ivysettings.xml

ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.23

build_shims:
 [echo] Project: shims
 [echo] Compiling 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common-secure/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.23/java
 against hadoop 0.23.3 
(/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/hadoopcore/hadoop-0.23.3)

ivy-init-settings:
 [echo] Project: shims

[jira] [Created] (HIVE-3869) SELECT foo, NULL UNION ALL SELECT bar, baz fails

2013-01-07 Thread David Morel (JIRA)
David Morel created HIVE-3869:
-

 Summary: SELECT foo, NULL UNION ALL SELECT bar, baz fails
 Key: HIVE-3869
 URL: https://issues.apache.org/jira/browse/HIVE-3869
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: David Morel


In order to avoid the curse of the last reducer that a left outer join would 
cause (most joined rows would be NULLs), I rewrote the query as:
{code}

SELECT * FROM (
  SELECT
    A.user_id id,
    B.created
  FROM (
    SELECT DISTINCT user_id
    FROM users
  ) A
  JOIN buyhist B
    ON A.user_id = B.user_id
    AND B.created = '2013-01-01'
  UNION ALL
  SELECT
    DISTINCT(user_id) id,
    NULL created
  FROM users
) foo;
{code}

The exception thrown is this:

{code}
2013-01-07 17:00:01,081 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:389)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at 
org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
... 22 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:60)
at java.lang.String.valueOf(String.java:2826)
at java.lang.StringBuilder.append(StringBuilder.java:115)
at 
org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:110)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at 
org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
... 22 more
{code}

The frame 
org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:60)
 caught my attention, so I replaced NULL with an empty string:

{code}
...
UNION ALL
SELECT
DISTINCT(user_id) id,
'' created
{code}

Shouldn't the query parser accept the form using NULL, or at least output a 
message before the job is sent to the jobtracker?
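One workaround commonly suggested for this class of error (an assumption on my 
part, not something stated in this thread) is to give the NULL constant an 
explicit type, so both branches of the UNION ALL agree on the column schema:

{code}
-- Hypothetical variant of the failing branch: CAST gives the NULL
-- column a concrete type (string here), so the object inspectors of
-- the two UNION ALL branches match.
...
UNION ALL
SELECT
  DISTINCT(user_id) id,
  CAST(NULL AS STRING) created
FROM users
{code}

Whether CAST(NULL AS STRING) avoids the NPE on 0.8.1 specifically is untested 
here; the empty-string substitution shown above is the reporter's verified 
workaround.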

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3870) SELECT foo, NULL UNION ALL SELECT bar, baz fails

2013-01-07 Thread David Morel (JIRA)
David Morel created HIVE-3870:
-

 Summary: SELECT foo, NULL UNION ALL SELECT bar, baz fails
 Key: HIVE-3870
 URL: https://issues.apache.org/jira/browse/HIVE-3870
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: David Morel


In order to avoid the curse of the last reducer that a left outer join would 
cause (most joined rows would be NULLs), I rewrote the query as:
{code}

SELECT * FROM (
  SELECT
    A.user_id id,
    B.created
  FROM (
    SELECT DISTINCT user_id
    FROM users
  ) A
  JOIN buyhist B
    ON A.user_id = B.user_id
    AND B.created = '2013-01-01'
  UNION ALL
  SELECT
    DISTINCT(user_id) id,
    NULL created
  FROM users
) foo;
{code}

The exception thrown is this:

{code}
2013-01-07 17:00:01,081 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:389)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at 
org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
... 22 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:60)
at java.lang.String.valueOf(String.java:2826)
at java.lang.StringBuilder.append(StringBuilder.java:115)
at 
org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:110)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
at 
org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
... 22 more
{code}

The frame 
org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:60)
 caught my attention, so I replaced NULL with an empty string:

{code}
...
UNION ALL
SELECT
DISTINCT(user_id) id,
'' created
{code}

Shouldn't the query parser accept the form using NULL, or at least output a 
message before the job is sent to the jobtracker?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3431) Avoid race conditions while downloading resources from non-local filesystem

2013-01-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3431:
---

Summary: Avoid race conditions while downloading resources from non-local 
filesystem  (was: Resources on non-local file system should be downloaded to 
temporary directory sometimes)

 Avoid race conditions while downloading resources from non-local filesystem
 ---

 Key: HIVE-3431
 URL: https://issues.apache.org/jira/browse/HIVE-3431
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Affects Versions: 0.10.0
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-3431.1.patch.txt, HIVE-3431.D5199.2.patch, 
 HIVE-3431.D5199.3.patch, HIVE-3431.D5199.4.patch


 The add resource remote-uri command downloads the resource file to the 
 location specified by the conf hive.downloaded.resources.dir on the local 
 file system. But when the command is executed concurrently against 
 hive-server for the same file, some clients fail with a VM crash, caused by 
 the file being overwritten by other requests.
 So there should be a configuration that provides a per-request location for 
 the add resource command, something like set 
 hiveconf:hive.downloaded.resources.dir=temporary
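Pending a proper fix, a per-client workaround along the lines proposed above 
might look like the following sketch (the directory path and JAR URI are 
illustrative, not taken from this issue):

{code}
-- Hypothetical per-invocation workaround: give each concurrent client
-- its own download directory so ADD JAR calls against hive-server do
-- not overwrite each other's local copy.
SET hive.downloaded.resources.dir=/tmp/hive_resources/client_42;
ADD JAR hdfs:///shared/udfs/example-udf.jar;
{code}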

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3431) Avoid race conditions while downloading resources from non-local filesystem

2013-01-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3431:
---

   Resolution: Fixed
Fix Version/s: 0.11.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

 Avoid race conditions while downloading resources from non-local filesystem
 ---

 Key: HIVE-3431
 URL: https://issues.apache.org/jira/browse/HIVE-3431
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Affects Versions: 0.10.0
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 0.11.0

 Attachments: HIVE-3431.1.patch.txt, HIVE-3431.D5199.2.patch, 
 HIVE-3431.D5199.3.patch, HIVE-3431.D5199.4.patch


 The add resource remote-uri command downloads the resource file to the 
 location specified by the conf hive.downloaded.resources.dir on the local 
 file system. But when the command is executed concurrently against 
 hive-server for the same file, some clients fail with a VM crash, caused by 
 the file being overwritten by other requests.
 So there should be a configuration that provides a per-request location for 
 the add resource command, something like set 
 hiveconf:hive.downloaded.resources.dir=temporary

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-3697) External JAR files on HDFS can lead to race condition with hive.downloaded.resources.dir

2013-01-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-3697.


   Resolution: Fixed
Fix Version/s: 0.11.0

HIVE-3431 should fix this issue. Please reopen if you find otherwise.

 External JAR files on HDFS can lead to race condition with 
 hive.downloaded.resources.dir
 

 Key: HIVE-3697
 URL: https://issues.apache.org/jira/browse/HIVE-3697
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Chris McConnell
 Fix For: 0.11.0


 I've seen situations where utilizing JAR files on HDFS can cause job failures 
 via CNFE or JVM crashes. 
 This is difficult to replicate and seems to be related to JAR size and latency 
 between the client and the HDFS cluster, but I've got some example stack traces 
 below. It seems that the calls made to FileSystem (copyToLocal), which are 
 static and delete the current local copy, can cause the file(s) to be removed 
 during job processing.
 We should consider changing the default for hive.downloaded.resources.dir to 
 include some level of uniqueness per job. We should not consider 
 hive.session.id, however, since multiple statements executed via the same 
 user/session that access the same JAR files will reuse the same 
 session.
 A proposal might be to utilize System.nanoTime() as part of the default 
 (/tmp/${user.name}/resources/System.nanoTime()/) -- which might be enough to 
 avoid the issue, although it's not perfect (the level of precision depends on 
 the JVM and system). 
 If anyone else has hit this, I would like to capture environment information 
 as well. Perhaps there is something else at play here. 
 Here are some examples of the errors:
 for i in {0..2}; do hive -S -f query.q & done
 [2] 48405
 [3] 48406
 [4] 48407
 % #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGBUS (0x7) at pc=0x7fb10bd931f0, pid=48407, tid=140398456698624
 #
 # JRE version: 6.0_31-b04
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.6-b01 mixed mode linux-amd64 
 compressed oops)
 # Problematic frame:
 # C  [libzip.so+0xb1f0]  __int128+0x60
 #
 # An error report file with more information is saved as:
 # /home/.../hs_err_pid48407.log
 #
 # If you would like to submit a bug report, please visit:
 #   http://java.sun.com/webapps/bugreport/crash.jsp
 # The crash happened outside the Java Virtual Machine in native code.
 # See problematic frame for where to report the bug.
 #
 java.lang.NoClassDefFoundError: com/example/udf/Lower
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:247)
 at 
 org.apache.hadoop.hive.ql.exec.FunctionTask.getUdfClass(FunctionTask.java:105)
 at 
 org.apache.hadoop.hive.ql.exec.FunctionTask.createFunction(FunctionTask.java:75)
 at 
 org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:63)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1331)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1117)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:439)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:449)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processInitFiles(CliDriver.java:485)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:692)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:607)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
 Caused by: java.lang.ClassNotFoundException: com.example.udf.Lower
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)

[jira] [Resolved] (HIVE-3870) SELECT foo, NULL UNION ALL SELECT bar, baz fails

2013-01-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-3870.


Resolution: Duplicate

Dupe of HIVE-3869

 SELECT foo, NULL UNION ALL SELECT bar, baz fails
 

 Key: HIVE-3870
 URL: https://issues.apache.org/jira/browse/HIVE-3870
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: David Morel

 In order to avoid the curse of the last reducer that a left outer join would 
 cause (most joined rows would be NULLs), I rewrote the query as:
 {code}
 SELECT * FROM (
   SELECT
     A.user_id id,
     B.created
   FROM (
     SELECT DISTINCT user_id
     FROM users
   ) A
   JOIN buyhist B
     ON A.user_id = B.user_id
     AND B.created = '2013-01-01'
   UNION ALL
   SELECT
     DISTINCT(user_id) id,
     NULL created
   FROM users
 ) foo;
 {code}
 The exception thrown is this:
 {code}
 2013-01-07 17:00:01,081 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 java.lang.RuntimeException: Error in configuring object
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
   at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:389)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.mapred.Child.main(Child.java:264)
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
   ... 9 more
 Caused by: java.lang.RuntimeException: Error in configuring object
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
   at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
   at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
   ... 14 more
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
   ... 17 more
 Caused by: java.lang.RuntimeException: Map operator initialization failed
   at 
 org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
   ... 22 more
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:60)
   at java.lang.String.valueOf(String.java:2826)
   at java.lang.StringBuilder.append(StringBuilder.java:115)
   at 
 org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:110)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
   at 
 org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
   ... 22 more
 {code}
 The frame 
 org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:60)
  caught my attention, so I replaced NULL with an empty string:
 {code}
 ...
 UNION ALL
 SELECT
 DISTINCT(user_id) id,
 '' created
 {code}
 Shouldn't the query parser accept the form using NULL, or at least output a 
 message before the job is sent to the jobtracker?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Hive-trunk-h0.21 - Build # 1900 - Still Failing

2013-01-07 Thread Apache Jenkins Server
Changes for Build #1899

Changes for Build #1900
[hashutosh] HIVE-3431 : Avoid race conditions while downloading resources from 
non-local filesystem (Navis via Ashutosh Chauhan)




No tests ran.

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1900)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1900/ to 
view the results.

[jira] [Commented] (HIVE-3431) Avoid race conditions while downloading resources from non-local filesystem

2013-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546111#comment-13546111
 ] 

Hudson commented on HIVE-3431:
--

Integrated in Hive-trunk-h0.21 #1900 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1900/])
HIVE-3431 : Avoid race conditions while downloading resources from 
non-local filesystem (Navis via Ashutosh Chauhan) (Revision 1429916)

 Result = FAILURE
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429916
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java


 Avoid race conditions while downloading resources from non-local filesystem
 ---

 Key: HIVE-3431
 URL: https://issues.apache.org/jira/browse/HIVE-3431
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Affects Versions: 0.10.0
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 0.11.0

 Attachments: HIVE-3431.1.patch.txt, HIVE-3431.D5199.2.patch, 
 HIVE-3431.D5199.3.patch, HIVE-3431.D5199.4.patch


 The add resource remote-uri command downloads the resource file to the 
 location specified by the conf hive.downloaded.resources.dir on the local 
 file system. But when the command is executed concurrently against 
 hive-server for the same file, some clients fail with a VM crash, caused by 
 the file being overwritten by other requests.
 So there should be a configuration that provides a per-request location for 
 the add resource command, something like set 
 hiveconf:hive.downloaded.resources.dir=temporary

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Work started] (HIVE-3773) Share input scan by unions across multiple queries

2013-01-07 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3773 started by Gang Tim Liu.

 Share input scan by unions across multiple queries
 --

 Key: HIVE-3773
 URL: https://issues.apache.org/jira/browse/HIVE-3773
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu

 Consider a query like:
 select * from
 (
   select key, 1 as value, count(1) from src group by key
 union all
   select 1 as key, value, count(1) from src group by value
 union all
   select key, value, count(1) from src group by key, value
 ) s;
 src is scanned multiple times currently (one per sub-query).
 This should be treated like a multi-table insert by the optimizer.
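 The multi-table-insert treatment suggested above can already be written by 
 hand; a sketch of the rewrite, scanning src only once (t1, t2, t3 are 
 hypothetical target tables, not part of the original request):
 {code}
 -- Hypothetical manual rewrite: one scan of src feeding three inserts.
 FROM src
 INSERT OVERWRITE TABLE t1
   SELECT key, 1 AS value, COUNT(1) GROUP BY key
 INSERT OVERWRITE TABLE t2
   SELECT 1 AS key, value, COUNT(1) GROUP BY value
 INSERT OVERWRITE TABLE t3
   SELECT key, value, COUNT(1) GROUP BY key, value;
 {code}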

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3773) Share input scan by unions across multiple queries

2013-01-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546139#comment-13546139
 ] 

Ashutosh Chauhan commented on HIVE-3773:


Isn't this already implemented in HIVE-2206?

 Share input scan by unions across multiple queries
 --

 Key: HIVE-3773
 URL: https://issues.apache.org/jira/browse/HIVE-3773
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu

 Consider a query like:
 select * from
 (
   select key, 1 as value, count(1) from src group by key
 union all
   select 1 as key, value, count(1) from src group by value
 union all
   select key, value, count(1) from src group by key, value
 ) s;
 src is scanned multiple times currently (one per sub-query).
 This should be treated like a multi-table insert by the optimizer.



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-01-07 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546150#comment-13546150
 ] 

Yin Huai commented on HIVE-2206:


[~liuzongquan] The latest patch was developed based on hive trunk revision 
1410581.

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called the Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries sentence by sentence, it generates a MapReduce 
 job for every operation that may need to shuffle the data (e.g. join and 
 aggregation operations). However, those shuffle operations may involve the 
 correlations explained below and thus can be executed in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of 
 its child nodes if it has the same partition key as that child node.
 The current implementation of the Correlation Optimizer only detects 
 correlations among MR jobs for reduce-side join operators and reduce-side 
 aggregation operators (not map-only aggregation). A query will be optimized if 
 it satisfies the following conditions:
 # There exists an MR job for a reduce-side join or reduce-side aggregation 
 operator that has JFC with all of its parent MR jobs (TCs will also be 
 exploited if JFC exists);
 # All input tables of those correlated MR jobs are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in the correlated MR jobs.
 The Correlation Optimizer is implemented as a logical optimizer, mainly 
 because it only needs to manipulate the query plan tree and can leverage the 
 existing components for generating MR jobs.
 The current implementation can serve as a framework for correlation-related 
 optimizations, which I think is better than adding individual optimizers. 
 Several improvements can be made in the future; here are three examples:
 # Support queries that only involve TC;
 # Support queries in which the input tables of correlated MR jobs involve 
 intermediate tables; and 
 # Optimize queries involving self joins. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc
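As a hedged illustration of the job-flow correlation described above, consider a query (the `dim` table is hypothetical) where the aggregation and the join both shuffle on the same key, so their two MR jobs are candidates for merging:

```sql
-- Without the optimizer: one MR job for the GROUP BY and a second for
-- the JOIN; both partition on `key`, so they have job-flow correlation
-- and could be executed as a single MR job.
SELECT t.key, t.cnt, d.value
FROM (SELECT key, COUNT(1) AS cnt FROM src GROUP BY key) t
JOIN dim d ON (t.key = d.key);
```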



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546152#comment-13546152
 ] 

He Yongqiang commented on HIVE-3585:


HBaseSerde was first added to contrib and then moved to core later.
  
bq. Pig is adding TrevniStorage as a builtin, and interoperability is desired.
I think interoperability is not a problem no matter where the code resides.

 Integrate Trevni as another columnar oriented file format
 -

 Key: HIVE-3585
 URL: https://issues.apache.org/jira/browse/HIVE-3585
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.10.0
Reporter: alex gemini
Assignee: Mark Wagner
Priority: Minor

 Add the new Avro module Trevni as another columnar format. A new columnar 
 format needs a columnar SerDe; fastutil seems a good choice. The Shark 
 project uses the fastutil library as its columnar SerDe library, but it seems 
 too large (almost 15 MB) for just a few primitive array collections.



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546170#comment-13546170
 ] 

Sean Busbey commented on HIVE-3585:
---

[~namita] Trevni defines a columnar format that can be used with different 
serialization systems. I believe initial efforts across different components 
are planning to use Avro for serialization.

Eventually, Trevni support should also work for Thrift and Protobufs.

 Integrate Trevni as another columnar oriented file format
 -

 Key: HIVE-3585
 URL: https://issues.apache.org/jira/browse/HIVE-3585
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.10.0
Reporter: alex gemini
Assignee: Mark Wagner
Priority: Minor

 add new avro module trevni as another columnar format.New columnar format 
 need a columnar SerDe,seems fastutil is a good choice.the shark project use 
 fastutil library as columnar serde library but it seems too large (almost 
 15m) for just a few primitive array collection.



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546210#comment-13546210
 ] 

Carl Steinbach commented on HIVE-3585:
--

bq. HBaseSerde is first added to contrib and then moved to core later.

And what did this accomplish? Wouldn't it have been better to put it in core to 
begin with? In fact, can anyone tell me why we shouldn't abolish contrib 
altogether?

 Integrate Trevni as another columnar oriented file format
 -

 Key: HIVE-3585
 URL: https://issues.apache.org/jira/browse/HIVE-3585
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.10.0
Reporter: alex gemini
Assignee: Mark Wagner
Priority: Minor

 add new avro module trevni as another columnar format.New columnar format 
 need a columnar SerDe,seems fastutil is a good choice.the shark project use 
 fastutil library as columnar serde library but it seems too large (almost 
 15m) for just a few primitive array collection.



[jira] [Commented] (HIVE-3773) Share input scan by unions across multiple queries

2013-01-07 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546230#comment-13546230
 ] 

Gang Tim Liu commented on HIVE-3773:


Thank you for the great point.

Yes, it can. In addition, it can handle much more complex queries, such as 
joins, and will bring other benefits.

This issue is targeted at solving the simple use case in a simple way. It will 
benefit the general case, including the use case where the configuration for 
HIVE-2206 is not turned on. 

 Share input scan by unions across multiple queries
 --

 Key: HIVE-3773
 URL: https://issues.apache.org/jira/browse/HIVE-3773
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu

 Consider a query like:
 select * from
 (
   select key, 1 as value, count(1) from src group by key
 union all
   select 1 as key, value, count(1) from src group by value
 union all
   select key, value, count(1) from src group by key, value
 ) s;
 src is scanned multiple times currently (one per sub-query).
 This should be treated like a multi-table insert by the optimizer.




[jira] [Commented] (HIVE-2693) Add DECIMAL data type

2013-01-07 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546256#comment-13546256
 ] 

Mark Grover commented on HIVE-2693:
---

Non-committer +1

Namit, any thoughts on the UDF method selection logic?

 Add DECIMAL data type
 -

 Key: HIVE-2693
 URL: https://issues.apache.org/jira/browse/HIVE-2693
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor, Types
Affects Versions: 0.10.0
Reporter: Carl Steinbach
Assignee: Prasad Mujumdar
 Attachments: 2693_7.patch, 2693_8.patch, 2693_fix_all_tests1.patch, 
 HIVE-2693-10.patch, HIVE-2693-11.patch, HIVE-2693-12-SortableSerDe.patch, 
 HIVE-2693-13.patch, HIVE-2693-14.patch, HIVE-2693-15.patch, 
 HIVE-2693-16.patch, HIVE-2693-17.patch, HIVE-2693-18.patch, 
 HIVE-2693-19.patch, HIVE-2693-1.patch.txt, HIVE-2693-all.patch, 
 HIVE-2693.D7683.1.patch, HIVE-2693-fix.patch, HIVE-2693.patch, 
 HIVE-2693-take3.patch, HIVE-2693-take4.patch


 Add support for the DECIMAL data type. HIVE-2272 (TIMESTAMP) provides a nice 
 template for how to do this.



[jira] [Commented] (HIVE-3789) Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9

2013-01-07 Thread Arup Malakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546285#comment-13546285
 ] 

Arup Malakar commented on HIVE-3789:


Hi Ashutosh, you are right. My concern was that checkPath() should look for 
the pfile:// scheme in the path that is passed.

For the test cases to pass, adding resolvePath() is sufficient. I will submit 
a patch without the modification in checkPath().

 Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9
 

 Key: HIVE-3789
 URL: https://issues.apache.org/jira/browse/HIVE-3789
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Tests
Affects Versions: 0.9.0, 0.10.0
 Environment: Hadoop 0.23.5, JDK 1.6.0_31
Reporter: Chris Drome
Assignee: Arup Malakar
 Attachments: HIVE-3789.branch-0.9_1.patch, HIVE-3789.trunk.1.patch


 Rolling back to before this patch shows that the unit tests are passing, 
 after the patch, the majority of the unit tests are failing.



[jira] [Updated] (HIVE-3789) Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9

2013-01-07 Thread Arup Malakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arup Malakar updated HIVE-3789:
---

Attachment: HIVE-3789.branch-0.9_2.patch
HIVE-3789.trunk.2.patch

Patch with reverted checkPath()

 Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9
 

 Key: HIVE-3789
 URL: https://issues.apache.org/jira/browse/HIVE-3789
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Tests
Affects Versions: 0.9.0, 0.10.0
 Environment: Hadoop 0.23.5, JDK 1.6.0_31
Reporter: Chris Drome
Assignee: Arup Malakar
 Attachments: HIVE-3789.branch-0.9_1.patch, 
 HIVE-3789.branch-0.9_2.patch, HIVE-3789.trunk.1.patch, HIVE-3789.trunk.2.patch


 Rolling back to before this patch shows that the unit tests are passing, 
 after the patch, the majority of the unit tests are failing.



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546348#comment-13546348
 ] 

He Yongqiang commented on HIVE-3585:


contrib is a good place for any project that is not yet mature. There are so 
many custom data formats out there that it does not make sense to support all 
of them in the core Hive code base. contrib is a good place for them to grow. 

Another good place I can think of is the HCatalog project.  


 Integrate Trevni as another columnar oriented file format
 -

 Key: HIVE-3585
 URL: https://issues.apache.org/jira/browse/HIVE-3585
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.10.0
Reporter: alex gemini
Assignee: Mark Wagner
Priority: Minor

 add new avro module trevni as another columnar format.New columnar format 
 need a columnar SerDe,seems fastutil is a good choice.the shark project use 
 fastutil library as columnar serde library but it seems too large (almost 
 15m) for just a few primitive array collection.



[jira] [Comment Edited] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546348#comment-13546348
 ] 

He Yongqiang edited comment on HIVE-3585 at 1/7/13 10:40 PM:
-

contrib is a good place for any project that is not yet mature. There are so 
many custom data formats out there that it does not make sense to support all 
of them in the core Hive code base. contrib is a good place for them to grow. 

From http://incubator.apache.org/hcatalog/docs/r0.4.0/, another good place I 
can think of is the HCatalog project. But I don't know whether HCatalog itself 
includes custom data format support or not.

  was (Author: he yongqiang):
contrib is a good place for any projects that is not mature. There are so 
many custom data formats out there, it does not make sense to support all of 
them in core hive code base. contrib is a good place for them to grow. 

Another good place i can think of is the hcatalog project.  

  
 Integrate Trevni as another columnar oriented file format
 -

 Key: HIVE-3585
 URL: https://issues.apache.org/jira/browse/HIVE-3585
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.10.0
Reporter: alex gemini
Assignee: Mark Wagner
Priority: Minor

 add new avro module trevni as another columnar format.New columnar format 
 need a columnar SerDe,seems fastutil is a good choice.the shark project use 
 fastutil library as columnar serde library but it seems too large (almost 
 15m) for just a few primitive array collection.



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546400#comment-13546400
 ] 

Carl Steinbach commented on HIVE-3585:
--

The only concrete difference between core and contrib that I'm aware of is that 
the latter doesn't appear on Hive's classpath by default. As such I can only 
see two advantages to putting code in contrib: 1) it makes it harder for folks 
to use, and 2) it makes it harder for us to test. Did I miss anything?

 Integrate Trevni as another columnar oriented file format
 -

 Key: HIVE-3585
 URL: https://issues.apache.org/jira/browse/HIVE-3585
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.10.0
Reporter: alex gemini
Assignee: Mark Wagner
Priority: Minor

 add new avro module trevni as another columnar format.New columnar format 
 need a columnar SerDe,seems fastutil is a good choice.the shark project use 
 fastutil library as columnar serde library but it seems too large (almost 
 15m) for just a few primitive array collection.



Blue tables in Hive xdocs

2013-01-07 Thread Lefty Leverenz
Tables in Hive xdocs have a default background color that's rather
overpowering (see Hive interactive shell commands in
http://hive.apache.org/docs/r0.9.0/language_manual/cli.html).  I'm working
on a new doc that has lots of tables, so I tried to change the color to
white (or any quieter color) but had no luck.

Is this an Anakia issue, or Velocity?  Does anyone know how to set the
color either cell-by-cell or for the whole table?

Thanks for any help or pointers to help.


– Lefty Leverenz


[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546516#comment-13546516
 ] 

Russell Jurney commented on HIVE-3585:
--

He, HCatalog uses the Hive SerDe layer. By adding the Trevni builtin to Apache 
Hive, Apache Hive, Shark, Apache HCatalog and Apache Pig will all get Trevni 
support. Synergy, baby!

Apache Trevni is part of an actual Apache top-level project, Apache Avro, so it 
is nothing like Zebra, which I notice you reported yourself for addition in 
HIVE-781. Avro and Trevni are specifically designed for Hadoop workloads, and 
other tools like Pig are including Trevni immediately.


 Integrate Trevni as another columnar oriented file format
 -

 Key: HIVE-3585
 URL: https://issues.apache.org/jira/browse/HIVE-3585
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.10.0
Reporter: alex gemini
Assignee: Mark Wagner
Priority: Minor

 add new avro module trevni as another columnar format.New columnar format 
 need a columnar SerDe,seems fastutil is a good choice.the shark project use 
 fastutil library as columnar serde library but it seems too large (almost 
 15m) for just a few primitive array collection.



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546517#comment-13546517
 ] 

Russell Jurney commented on HIVE-3585:
--

This ticket now has 5 votes, and 22 watchers. Support for a Trevni builtin is 
overwhelming.

 Integrate Trevni as another columnar oriented file format
 -

 Key: HIVE-3585
 URL: https://issues.apache.org/jira/browse/HIVE-3585
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.10.0
Reporter: alex gemini
Assignee: Mark Wagner
Priority: Minor

 add new avro module trevni as another columnar format.New columnar format 
 need a columnar SerDe,seems fastutil is a good choice.the shark project use 
 fastutil library as columnar serde library but it seems too large (almost 
 15m) for just a few primitive array collection.



[jira] [Commented] (HIVE-3789) Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9

2013-01-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546535#comment-13546535
 ] 

Ashutosh Chauhan commented on HIVE-3789:


+1

 Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9
 

 Key: HIVE-3789
 URL: https://issues.apache.org/jira/browse/HIVE-3789
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Tests
Affects Versions: 0.9.0, 0.10.0
 Environment: Hadoop 0.23.5, JDK 1.6.0_31
Reporter: Chris Drome
Assignee: Arup Malakar
 Attachments: HIVE-3789.branch-0.9_1.patch, 
 HIVE-3789.branch-0.9_2.patch, HIVE-3789.trunk.1.patch, HIVE-3789.trunk.2.patch


 Rolling back to before this patch shows that the unit tests are passing, 
 after the patch, the majority of the unit tests are failing.



[jira] [Updated] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD

2013-01-07 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3853:
--

Attachment: HIVE-3853.D7767.2.patch

navis updated the revision HIVE-3853 [jira] UDF unix_timestamp is 
deterministic if an argument is given, but it treated as non-deterministic 
preventing PPD.
Reviewers: JIRA

  Addressed comments


REVISION DETAIL
  https://reviews.facebook.net/D7767

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUnixTimeStamp.java
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToUnixTimeStamp.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFUnixTimeStamp.java
  ql/src/test/queries/clientpositive/udf_to_unix_timestamp.q
  ql/src/test/queries/clientpositive/udf_unix_timestamp.q
  ql/src/test/results/clientpositive/show_functions.q.out
  ql/src/test/results/clientpositive/udf5.q.out
  ql/src/test/results/clientpositive/udf_to_unix_timestamp.q.out
  ql/src/test/results/clientpositive/udf_unix_timestamp.q.out

To: JIRA, navis
Cc: njain


 UDF unix_timestamp is deterministic if an argument is given, but it treated 
 as non-deterministic preventing PPD
 ---

 Key: HIVE-3853
 URL: https://issues.apache.org/jira/browse/HIVE-3853
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Navis
Assignee: Navis
Priority: Trivial
  Labels: udf
 Attachments: HIVE-3853.D7767.1.patch, HIVE-3853.D7767.2.patch


 unix_timestamp is declared as a non-deterministic function. But if user 
 provides an argument, it makes deterministic result and eligible to PPD.



[jira] [Updated] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD

2013-01-07 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-3853:


Status: Patch Available  (was: Open)

 UDF unix_timestamp is deterministic if an argument is given, but it treated 
 as non-deterministic preventing PPD
 ---

 Key: HIVE-3853
 URL: https://issues.apache.org/jira/browse/HIVE-3853
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Navis
Assignee: Navis
Priority: Trivial
  Labels: udf
 Attachments: HIVE-3853.D7767.1.patch, HIVE-3853.D7767.2.patch


 unix_timestamp is declared as a non-deterministic function. But if user 
 provides an argument, it makes deterministic result and eligible to PPD.



[jira] [Commented] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD

2013-01-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546578#comment-13546578
 ] 

Phabricator commented on HIVE-3853:
---

navis has commented on the revision HIVE-3853 [jira] UDF unix_timestamp is 
deterministic if an argument is given, but it treated as non-deterministic 
preventing PPD.

  I've heard that annotation information is part of the class definition, 
which cannot be overwritten at runtime.
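The point that an annotation is fixed at class-definition time can be illustrated with a self-contained Java sketch. The `Deterministic` annotation below is a hypothetical stand-in for Hive's `@UDFType(deterministic = ...)` marker, and the two classes mirror the split into separate UDF classes visible in the affected-files list:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AnnotationDemo {

    // Hypothetical stand-in for Hive's @UDFType(deterministic = ...) marker.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Deterministic {
        boolean value();
    }

    // No-argument variant: the result changes per call, so non-deterministic.
    @Deterministic(false)
    static class UnixTimestampNoArgs {}

    // With-argument variant: the result depends only on the input.
    @Deterministic(true)
    static class UnixTimestampWithArg {}

    static boolean isDeterministic(Class<?> c) {
        Deterministic d = c.getAnnotation(Deterministic.class);
        return d != null && d.value();
    }

    public static void main(String[] args) {
        // The annotation value is baked into each class file; it cannot be
        // flipped at runtime based on how the UDF is invoked, which is why
        // each behavior needs its own annotated class.
        System.out.println(isDeterministic(UnixTimestampNoArgs.class));  // prints "false"
        System.out.println(isDeterministic(UnixTimestampWithArg.class)); // prints "true"
    }
}
```

This matches the approach the affected-files list suggests: splitting the UDF into separate classes rather than trying to change the annotation dynamically.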

REVISION DETAIL
  https://reviews.facebook.net/D7767

To: JIRA, navis
Cc: njain


 UDF unix_timestamp is deterministic if an argument is given, but it treated 
 as non-deterministic preventing PPD
 ---

 Key: HIVE-3853
 URL: https://issues.apache.org/jira/browse/HIVE-3853
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Navis
Assignee: Navis
Priority: Trivial
  Labels: udf
 Attachments: HIVE-3853.D7767.1.patch, HIVE-3853.D7767.2.patch


 unix_timestamp is declared as a non-deterministic function. But if user 
 provides an argument, it makes deterministic result and eligible to PPD.



[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format

2013-01-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546609#comment-13546609
 ] 

Namit Jain commented on HIVE-3585:
--

The main reason that contrib exists is to add new features/projects which are 
being tested, may take some time to mature, and are reasonably stand-alone, so 
that they don't need many changes in existing code. New 
SerDes/file formats/UDFs are good use cases for it.

I don't see why testing/development in contrib is so difficult or different 
compared to development in any other component.
This is the reason contrib was added: so new stand-alone components can bake. 
We can definitely move it out of contrib once it is mature/safe.

Why is development in contrib such a bad idea?

 Integrate Trevni as another columnar oriented file format
 -

 Key: HIVE-3585
 URL: https://issues.apache.org/jira/browse/HIVE-3585
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.10.0
Reporter: alex gemini
Assignee: Mark Wagner
Priority: Minor

 add new avro module trevni as another columnar format.New columnar format 
 need a columnar SerDe,seems fastutil is a good choice.the shark project use 
 fastutil library as columnar serde library but it seems too large (almost 
 15m) for just a few primitive array collection.



[jira] [Updated] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2013-01-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3803:
-

Attachment: hive.3803.8.patch

 explain dependency should show the dependencies hierarchically in presence of 
 views
 ---

 Key: HIVE-3803
 URL: https://issues.apache.org/jira/browse/HIVE-3803
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch, 
 hive.3803.4.patch, hive.3803.5.patch, hive.3803.6.patch, hive.3803.7.patch, 
 hive.3803.8.patch


 It should also include tables whose partitions are being accessed



[jira] [Created] (HIVE-3871) show number of mappers/reducers as part of explain extended

2013-01-07 Thread Namit Jain (JIRA)
Namit Jain created HIVE-3871:


 Summary: show number of mappers/reducers as part of explain 
extended
 Key: HIVE-3871
 URL: https://issues.apache.org/jira/browse/HIVE-3871
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain


It would be useful to show the number of mappers/reducers as part of explain 
extended.
For the MR jobs referencing intermediate data, the number can be approximate.



[jira] [Updated] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views

2013-01-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3803:
-

Attachment: hive.3803.9.patch

 explain dependency should show the dependencies hierarchically in presence of 
 views
 ---

 Key: HIVE-3803
 URL: https://issues.apache.org/jira/browse/HIVE-3803
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch, 
 hive.3803.4.patch, hive.3803.5.patch, hive.3803.6.patch, hive.3803.7.patch, 
 hive.3803.8.patch, hive.3803.9.patch


 It should also include tables whose partitions are being accessed



[jira] [Commented] (HIVE-3825) Add Operator level Hooks

2013-01-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546616#comment-13546616
 ] 

Namit Jain commented on HIVE-3825:
--

Look at optrstat_groupby.q for an example.

 Add Operator level Hooks
 

 Key: HIVE-3825
 URL: https://issues.apache.org/jira/browse/HIVE-3825
 Project: Hive
  Issue Type: New Feature
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor
 Attachments: HIVE-3825.txt






[jira] [Commented] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD

2013-01-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546629#comment-13546629
 ] 

Namit Jain commented on HIVE-3853:
--

+1

 UDF unix_timestamp is deterministic if an argument is given, but it treated 
 as non-deterministic preventing PPD
 ---

 Key: HIVE-3853
 URL: https://issues.apache.org/jira/browse/HIVE-3853
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Navis
Assignee: Navis
Priority: Trivial
  Labels: udf
 Attachments: HIVE-3853.D7767.1.patch, HIVE-3853.D7767.2.patch


 unix_timestamp is declared as a non-deterministic function. But if the user 
 provides an argument, it produces a deterministic result and becomes eligible for PPD.
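The distinction can be illustrated outside Hive: with no argument the result depends on the current clock, while with a fixed argument the result is a pure function of its inputs. The sketch below is not Hive's UDFUnixTimeStamp; the UTC zone is pinned here purely for reproducibility of the example (Hive itself uses the session time zone).

```java
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Illustration (not Hive's UDFUnixTimeStamp): unix_timestamp() depends on the
// clock, while unix_timestamp('2013-01-07', 'yyyy-MM-dd') is a pure function
// of its arguments, hence safe for predicate pushdown.
public class DeterminismDemo {
    static long unixTimestamp(String value, String pattern) throws Exception {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone, for a reproducible example
        return fmt.parse(value).getTime() / 1000L;
    }

    public static void main(String[] args) throws Exception {
        // Same input, same output, on every call and on every host.
        System.out.println(unixTimestamp("2013-01-07", "yyyy-MM-dd"));  // 1357516800
        // By contrast, System.currentTimeMillis() / 1000 changes between calls,
        // which is why the zero-argument form must stay non-deterministic.
    }
}
```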



[jira] [Commented] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD

2013-01-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546631#comment-13546631
 ] 

Phabricator commented on HIVE-3853:
---

njain has accepted the revision HIVE-3853 [jira] UDF unix_timestamp is 
deterministic if an argument is given, but it treated as non-deterministic 
preventing PPD.

REVISION DETAIL
  https://reviews.facebook.net/D7767

BRANCH
  DPAL-1956

To: JIRA, njain, navis
Cc: njain


 UDF unix_timestamp is deterministic if an argument is given, but it treated 
 as non-deterministic preventing PPD
 ---

 Key: HIVE-3853
 URL: https://issues.apache.org/jira/browse/HIVE-3853
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Navis
Assignee: Navis
Priority: Trivial
  Labels: udf
 Attachments: HIVE-3853.D7767.1.patch, HIVE-3853.D7767.2.patch


 unix_timestamp is declared as a non-deterministic function. But if the user 
 provides an argument, it produces a deterministic result and becomes eligible for PPD.



[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

2013-01-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546643#comment-13546643
 ] 

Phabricator commented on HIVE-3562:
---

njain has commented on the revision HIVE-3562 [jira] Some limit can be pushed 
down to map stage.

INLINE COMMENTS
  conf/hive-default.xml.template:1434 Can you add more details here - an example 
query would really help ?
  ql/src/test/queries/clientpositive/limit_pushdown.q:16 What is so special 
about 40 ?

  Setting hive.limit.pushdown.heap.threshold explicitly at the beginning of the 
test makes the test easier to maintain in the long run.

  ql/src/test/queries/clientpositive/limit_pushdown.q:34 What is the difference 
between this and line 3 ?

  ql/src/test/queries/clientpositive/limit_pushdown.q:10 I think this plan is 
not correct.

  Let us say, the values are
  v1
  v2
  ..
  v10
  v11
  v12
  ..
  v20

  The first mapper does not have v8-10, so it emits v1-v7, v11-v13
  The second mapper contains data for all values, but it only emits v1-v10

  Since it does not involve an order by, it is possible that v11 gets picked 
up, even though its data does not include rows from the second mapper. If you 
are pushing the limit down, you should create an additional MR job which orders 
the rows - in the above example, making sure that only v1-v10 are picked up.

  Am I missing something here ?
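The concern above can be made concrete with a toy simulation (plain Java, not Hive code): two mappers each emit rows only for the first ten distinct keys they see, and a reducer then sums the per-key counts. The key v11 reaches the reducer only from the first mapper, so its aggregated count misses the second mapper's row.

```java
import java.util.*;

// Toy simulation of the scenario described in the comment above (not Hive
// code): per-mapper limiting of distinct keys followed by a count reducer.
public class LimitPushdownConcern {
    // Emit only rows whose key is among the first `limit` distinct keys seen.
    static List<String> mapWithLimit(List<String> rows, int limit) {
        Set<String> kept = new LinkedHashSet<>();
        List<String> out = new ArrayList<>();
        for (String key : rows) {
            if (kept.contains(key) || kept.size() < limit) {
                kept.add(key);
                out.add(key);
            }
        }
        return out;
    }

    // Shuffle by key and count: every emitted row for a key lands here.
    static Map<String, Integer> reduceCounts(List<List<String>> mapperOutputs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> out : mapperOutputs) {
            for (String key : out) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Mapper 1 has no v8-v10 rows, so its first 10 distinct keys include v11-v13.
        List<String> mapper1 = new ArrayList<>();
        for (int i = 1; i <= 13; i++) {
            if (i < 8 || i > 10) mapper1.add("v" + i);
        }
        // Mapper 2 has one row for each of v1..v20 but emits only v1-v10.
        List<String> mapper2 = new ArrayList<>();
        for (int i = 1; i <= 20; i++) mapper2.add("v" + i);

        Map<String, Integer> counts = reduceCounts(Arrays.asList(
            mapWithLimit(mapper1, 10), mapWithLimit(mapper2, 10)));
        // True count for v11 over all input is 2, but mapper 2 dropped its
        // v11 row, so the aggregated value is undercounted.
        System.out.println("v11 count = " + counts.get("v11"));  // v11 count = 1
    }
}
```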

REVISION DETAIL
  https://reviews.facebook.net/D5967

To: JIRA, tarball, navis
Cc: njain


 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
 HIVE-3562.D5967.3.patch


 Queries with a limit clause (with a reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 make the operator tree
 TS-SEL-RS-EXT-LIMIT-FS
 But LIMIT can be partially computed in RS, reducing the size of the shuffle:
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
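When the query does impose a total order on the key, as in the `order by key limit 10` example above, the RS(TOP-N) step can be realized as a bounded heap: each mapper retains only its N smallest keys, and the global top N is guaranteed to be contained in the union of the per-mapper top Ns. The following is a hypothetical sketch of that retention logic, not the actual ReduceSinkOperator code:

```java
import java.util.*;

// Hypothetical illustration of top-N retention in a reduce sink
// (not Hive's ReduceSinkOperator implementation).
public class TopNSketch {
    // Keep only the n smallest keys seen, using a max-heap of size n.
    static List<Integer> topN(Iterable<Integer> keys, int n) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(Comparator.reverseOrder());
        for (int key : keys) {
            if (heap.size() < n) {
                heap.add(key);
            } else if (key < heap.peek()) {
                heap.poll();   // evict the largest currently retained key
                heap.add(key);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(42, 7, 19, 3, 88, 5, 64, 1, 23, 50, 11, 2);
        // The mapper emits only its 5 smallest keys instead of all 12 rows.
        System.out.println(topN(keys, 5));  // [1, 2, 3, 5, 7]
    }
}
```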



[jira] [Updated] (HIVE-3562) Some limit can be pushed down to map stage

2013-01-07 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3562:
-

Status: Open  (was: Patch Available)

comments

 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
 HIVE-3562.D5967.3.patch


 Queries with a limit clause (with a reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 make the operator tree
 TS-SEL-RS-EXT-LIMIT-FS
 But LIMIT can be partially computed in RS, reducing the size of the shuffle:
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS



[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

2013-01-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546649#comment-13546649
 ] 

Phabricator commented on HIVE-3562:
---

njain has commented on the revision HIVE-3562 [jira] Some limit can be pushed 
down to map stage.

INLINE COMMENTS
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java:75 
remove the TODO
  ql/src/test/queries/clientpositive/limit_pushdown.q:51 There is no test where 
the limit is greater than hive.limit.pushdown.heap.threshold.
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java:87 
Do you want to compare the threshold with the actual limit here ?


REVISION DETAIL
  https://reviews.facebook.net/D5967

To: JIRA, tarball, navis
Cc: njain


 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
 HIVE-3562.D5967.3.patch


 Queries with a limit clause (with a reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 make the operator tree
 TS-SEL-RS-EXT-LIMIT-FS
 But LIMIT can be partially computed in RS, reducing the size of the shuffle:
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS



[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

2013-01-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546661#comment-13546661
 ] 

Phabricator commented on HIVE-3562:
---

njain has commented on the revision HIVE-3562 [jira] Some limit can be pushed 
down to map stage.

  Sorry, my earlier comments were assuming that the threshold is for the number 
of rows.

INLINE COMMENTS
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:483 Coming to an 
earlier comment from Sivaramakrishnan Narayanan, would it be simpler if this 
were the number of rows ?
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:414 Define 
40 as a constant somewhere

REVISION DETAIL
  https://reviews.facebook.net/D5967

To: JIRA, tarball, navis
Cc: njain


 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
 HIVE-3562.D5967.3.patch


 Queries with a limit clause (with a reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 make the operator tree
 TS-SEL-RS-EXT-LIMIT-FS
 But LIMIT can be partially computed in RS, reducing the size of the shuffle:
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
