[jira] [Updated] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables
[ https://issues.apache.org/jira/browse/HIVE-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3699: - Status: Open (was: Patch Available) A lot of tests are failing - can you debug? Multiple insert overwrite into multiple tables query stores same results in all tables -- Key: HIVE-3699 URL: https://issues.apache.org/jira/browse/HIVE-3699 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Environment: Cloudera 4.1 on Amazon Linux (rebranded CentOS 6): hive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch Reporter: Alexandre Fouché Assignee: Navis Attachments: HIVE-3699.D7743.1.patch, HIVE-3699.D7743.2.patch, HIVE-3699_hive-0.9.1.patch.txt (Note: This might be related to HIVE-2750) I am running a query with multiple INSERT OVERWRITE clauses into multiple tables so that the dataset is scanned only once, and I end up with all of these tables having the same content! It seems that the GROUP BY query that returns results overwrites all the temp tables. Oddly enough, if I add further GROUP BY queries into additional temp tables, grouped by a different field, then all temp tables are correctly populated, including the ones that would otherwise have had the wrong content. This is the misbehaving query:
FROM nikon
INSERT OVERWRITE TABLE e1
  SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions
  WHERE qs_cs_s_cat='PRINT'
  GROUP BY qs_cs_s_aid
INSERT OVERWRITE TABLE e2
  SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues
  WHERE qs_cs_s_cat='VIEW'
  GROUP BY qs_cs_s_aid;
It launches only one MR job, and here are the results. Why does table 'e1' contain the results from table 'e2'?! Table 'e1' should have been empty (see the individual SELECTs further below):
hive> SELECT * from e1;
OK
NULL 2
1627575 25
1627576 70
1690950 22
1690952 42
1696705 199
1696706 66
1696730 229
1696759 85
1696893 218
Time taken: 0.229 seconds
hive> SELECT * from e2;
OK
NULL 2
1627575 25
1627576 70
1690950 22
1690952 42
1696705 199
1696706 66
1696730 229
1696759 85
1696893 218
Time taken: 0.11 seconds
Here are the results of the individual queries (only the second query returns a result set):
hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions FROM nikon WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid;
(...)
OK
- There are no results, this is normal
Time taken: 41.471 seconds
hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues FROM nikon WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;
(...)
OK
NULL 2
1627575 25
1627576 70
1690950 22
1690952 42
1696705 199
1696706 66
1696730 229
1696759 85
1696893 218
Time taken: 39.607 seconds
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
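The reporter expects each INSERT branch of the multi-insert to apply its own WHERE filter before its own GROUP BY/COUNT(*), even though the source is scanned only once. The Java sketch below models that expected result over in-memory rows. It does not model Hive's execution plan. The Row class, the multiInsert helper, and any sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MultiInsertSketch {
    // Hypothetical stand-in for a row of table nikon (only the two columns the query uses).
    static final class Row {
        final String aid; // qs_cs_s_aid, may be null
        final String cat; // qs_cs_s_cat

        Row(String aid, String cat) {
            this.aid = aid;
            this.cat = cat;
        }
    }

    // One scan of the source rows. Branch i keeps only rows whose category equals
    // categories.get(i) and counts them per aid, so each branch gets its own result.
    static List<Map<String, Long>> multiInsert(List<Row> rows, List<String> categories) {
        List<Map<String, Long>> outputs = new ArrayList<>();
        for (int i = 0; i < categories.size(); i++) {
            outputs.add(new HashMap<>()); // HashMap accepts the NULL group key
        }
        for (Row r : rows) {
            for (int i = 0; i < categories.size(); i++) {
                if (categories.get(i).equals(r.cat)) {          // per-branch WHERE
                    outputs.get(i).merge(r.aid, 1L, Long::sum); // per-branch GROUP BY + COUNT(*)
                }
            }
        }
        return outputs;
    }
}
```

Suppose the input contains no 'PRINT' rows, and you call `multiInsert(rows, List.of("PRINT", "VIEW"))`. The first map (table e1) then stays empty, which matches the reporter's individual SELECT results.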
[jira] [Commented] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD
[ https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545816#comment-13545816 ] Phabricator commented on HIVE-3853: --- njain has commented on the revision HIVE-3853 [jira] UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD. This calls for "deterministic" not being an annotation. By any chance, do you know whether the annotation can be overridden dynamically? Otherwise, a duplicate function is OK. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToUnixTimestamp.java:52 Can you share the code between this and unix_timestamp? I mean, create a common class that both functions can extend. REVISION DETAIL https://reviews.facebook.net/D7767 To: JIRA, navis Cc: njain UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD --- Key: HIVE-3853 URL: https://issues.apache.org/jira/browse/HIVE-3853 Project: Hive Issue Type: Improvement Components: UDF Reporter: Navis Assignee: Navis Priority: Trivial Labels: udf Attachments: HIVE-3853.D7767.1.patch unix_timestamp is declared as a non-deterministic function. But if the user provides an argument, the result is deterministic and therefore eligible for PPD.
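The review makes two points: share code through a common class, and treat the result as deterministic only when an argument is supplied. Together they could look roughly like the Java sketch below. All class and method names are hypothetical; this is not Hive's GenericUDF API or the annotation mechanism discussed above. The time zone is passed explicitly only to keep the example reproducible. Patterns must include time-of-day fields in this sketch.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical shared base: both functions reuse the same string-to-epoch conversion.
abstract class TimestampToEpochBase {
    static final String DEFAULT_PATTERN = "yyyy-MM-dd HH:mm:ss";

    private final ZoneId zone;

    TimestampToEpochBase(ZoneId zone) {
        this.zone = zone;
    }

    // Parses value with the given pattern (which must include time-of-day fields)
    // and returns seconds since the epoch in the configured zone.
    protected long toEpochSeconds(String value, String pattern) {
        LocalDateTime parsed = LocalDateTime.parse(value, DateTimeFormatter.ofPattern(pattern));
        return parsed.atZone(zone).toEpochSecond();
    }

    // Whether a call with argCount arguments always returns the same output for the same input.
    abstract boolean isDeterministic(int argCount);
}

// Models unix_timestamp(): with no argument it returns "now", so only calls with an
// argument are deterministic and therefore candidates for predicate pushdown (PPD).
final class UnixTimestampSketch extends TimestampToEpochBase {
    UnixTimestampSketch(ZoneId zone) {
        super(zone);
    }

    @Override
    boolean isDeterministic(int argCount) {
        return argCount > 0;
    }

    long evaluate() {
        return System.currentTimeMillis() / 1000L;
    }

    long evaluate(String value) {
        return toEpochSeconds(value, DEFAULT_PATTERN);
    }

    long evaluate(String value, String pattern) {
        return toEpochSeconds(value, pattern);
    }
}

// Models the function in GenericUDFToUnixTimestamp.java (the file the inline comment
// refers to), assumed here to always take an argument and thus always be deterministic.
final class ToUnixTimestampSketch extends TimestampToEpochBase {
    ToUnixTimestampSketch(ZoneId zone) {
        super(zone);
    }

    @Override
    boolean isDeterministic(int argCount) {
        return true;
    }

    long evaluate(String value) {
        return toEpochSeconds(value, DEFAULT_PATTERN);
    }
}
```

Keeping determinism as a method of the argument count, rather than a fixed per-class flag, is the design choice the review is weighing against a duplicated function.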
[jira] [Updated] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD
[ https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3853: - Status: Open (was: Patch Available) comments UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD --- Key: HIVE-3853 URL: https://issues.apache.org/jira/browse/HIVE-3853 Project: Hive Issue Type: Improvement Components: UDF Reporter: Navis Assignee: Navis Priority: Trivial Labels: udf Attachments: HIVE-3853.D7767.1.patch unix_timestamp is declared as a non-deterministic function. But if the user provides an argument, the result is deterministic and therefore eligible for PPD.
[jira] [Updated] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views
[ https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3803: - Attachment: hive.3803.7.patch explain dependency should show the dependencies hierarchically in presence of views --- Key: HIVE-3803 URL: https://issues.apache.org/jira/browse/HIVE-3803 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch, hive.3803.4.patch, hive.3803.5.patch, hive.3803.6.patch, hive.3803.7.patch It should also include tables whose partitions are being accessed.
[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more
[ https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545819#comment-13545819 ] Namit Jain commented on HIVE-3852: -- [~navis], I have a higher-level question. Do we still need this optimization? I mean, is it really needed now that we have map-side aggregation, or can we remove this code completely? Multi-groupby optimization fails when same distinct column is used twice or more Key: HIVE-3852 URL: https://issues.apache.org/jira/browse/HIVE-3852 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-3852.D7737.1.patch
{code}
FROM INPUT
INSERT OVERWRITE TABLE dest1
  SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), count(distinct substr(INPUT.value,5))
  GROUP BY INPUT.key
INSERT OVERWRITE TABLE dest2
  SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), avg(distinct substr(INPUT.value,5))
  GROUP BY INPUT.key;
{code}
fails with exception
FAILED: IndexOutOfBoundsException Index: 0,Size: 0
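For reference, the Java sketch below shows what the two INSERT branches of the failing query are meant to compute per key. It models only the semantics, not Hive's multi-group-by plan or the cause of the exception. Both branches aggregate the same distinct expression, substr(INPUT.value, 5), which is the situation named in the issue title. The value format (e.g. "val_238") and all names are assumptions for illustration. count(distinct) is returned as a double only to keep the result in one array.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class DistinctAggSketch {
    // Hive's substr is 1-based, so substr(value, 5) corresponds to Java's substring(4);
    // the numeric suffix is then treated as a double, as sum/avg would.
    static double substrFrom5(String value) {
        return Double.parseDouble(value.substring(4));
    }

    // For each key returns {sum(distinct x), count(distinct x), avg(distinct x)},
    // where x = substr(value, 5). One distinct set per key serves all three aggregates.
    static Map<String, double[]> aggregate(List<String[]> rows) {
        Map<String, Set<Double>> distinctByKey = new TreeMap<>();
        for (String[] row : rows) { // row[0] = key, row[1] = value
            distinctByKey.computeIfAbsent(row[0], k -> new HashSet<>()).add(substrFrom5(row[1]));
        }
        Map<String, double[]> result = new TreeMap<>();
        for (Map.Entry<String, Set<Double>> e : distinctByKey.entrySet()) {
            double sum = 0.0;
            for (double x : e.getValue()) {
                sum += x;
            }
            int count = e.getValue().size();
            result.put(e.getKey(), new double[] {sum, count, sum / count});
        }
        return result;
    }
}
```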
[jira] [Updated] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more
[ https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3852: - Status: Open (was: Patch Available) Multi-groupby optimization fails when same distinct column is used twice or more Key: HIVE-3852 URL: https://issues.apache.org/jira/browse/HIVE-3852 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-3852.D7737.1.patch {code} FROM INPUT INSERT OVERWRITE TABLE dest1 SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), count(distinct substr(INPUT.value,5)) GROUP BY INPUT.key INSERT OVERWRITE TABLE dest2 SELECT INPUT.key, sum(distinct substr(INPUT.value,5)), avg(distinct substr(INPUT.value,5)) GROUP BY INPUT.key; {code} fails with exception FAILED: IndexOutOfBoundsException Index: 0,Size: 0
[jira] [Created] (HIVE-3868) Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow and
binlijin created HIVE-3868: -- Summary: Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow and Key: HIVE-3868 URL: https://issues.apache.org/jira/browse/HIVE-3868 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.9.0 Reporter: binlijin In LazyHBaseRow,
{code}
private Object uncheckedGetField(int fieldID) {
  // it is a column i.e. a column-family with column-qualifier
  byte [] res = result.getValue(colMap.familyNameBytes, colMap.qualifierNameBytes);
  if (res == null) {
    return null;
  } else {
    ref = new ByteArrayRef();
    ref.setData(res);
  }
  if (ref != null) {
    fields[fieldID].init(ref, 0, ref.getData().length);
  }
}
{code}
[jira] [Updated] (HIVE-3868) Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow
[ https://issues.apache.org/jira/browse/HIVE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HIVE-3868: --- Summary: Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow (was: Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow and) Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow --- Key: HIVE-3868 URL: https://issues.apache.org/jira/browse/HIVE-3868 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.9.0 Reporter: binlijin In LazyHBaseRow, {code} private Object uncheckedGetField(int fieldID) { // it is a column i.e. a column-family with column-qualifier byte [] res = result.getValue(colMap.familyNameBytes, colMap.qualifierNameBytes); if (res == null) { return null; } else { ref = new ByteArrayRef(); ref.setData(res); } if (ref != null) { fields[fieldID].init(ref, 0, ref.getData().length); } } {code}
[jira] [Updated] (HIVE-3868) Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow
[ https://issues.apache.org/jira/browse/HIVE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HIVE-3868: --- Description: In LazyHBaseRow,
{code}
private Object uncheckedGetField(int fieldID) {
  // it is a column i.e. a column-family with column-qualifier
  byte [] res = result.getValue(colMap.familyNameBytes, colMap.qualifierNameBytes);
  if (res == null) {
    return null;
  } else {
    ref = new ByteArrayRef();
    ref.setData(res);
  }
  if (ref != null) {
    fields[fieldID].init(ref, 0, ref.getData().length);
  }
}
{code}
For example, if fields[fieldID] is a BIGINT and ref holds HBase-encoded bytes (a long), LazyLong is used to parse the data and returns NULL; it should use Bytes.toLong(res) to decode these bytes instead.
was: In LazyHBaseRow, {code} private Object uncheckedGetField(int fieldID) { // it is a column i.e. a column-family with column-qualifier byte [] res = result.getValue(colMap.familyNameBytes, colMap.qualifierNameBytes); if (res == null) { return null; } else { ref = new ByteArrayRef(); ref.setData(res); } if (ref != null) { fields[fieldID].init(ref, 0, ref.getData().length); } } {code}
Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow --- Key: HIVE-3868 URL: https://issues.apache.org/jira/browse/HIVE-3868 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.9.0 Reporter: binlijin In LazyHBaseRow, {code} private Object uncheckedGetField(int fieldID) { // it is a column i.e. a column-family with column-qualifier byte [] res = result.getValue(colMap.familyNameBytes, colMap.qualifierNameBytes); if (res == null) { return null; } else { ref = new ByteArrayRef(); ref.setData(res); } if (ref != null) { fields[fieldID].init(ref, 0, ref.getData().length); } } {code} For example, if fields[fieldID] is a BIGINT and ref holds HBase-encoded bytes (a long), LazyLong is used to parse the data and returns NULL; it should use Bytes.toLong(res) to decode these bytes instead.
[jira] [Updated] (HIVE-3868) Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow
[ https://issues.apache.org/jira/browse/HIVE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HIVE-3868: --- Description: In LazyHBaseRow, {code} private Object uncheckedGetField(int fieldID) { // it is a column i.e. a column-family with column-qualifier byte [] res = result.getValue(colMap.familyNameBytes, colMap.qualifierNameBytes); if (res == null) { return null; } else { ref = new ByteArrayRef(); ref.setData(res); } if (ref != null) { fields[fieldID].init(ref, 0, ref.getData().length); } } {code} For example, if fields[fieldID] is a BIGINT and ref holds HBase-encoded bytes (a long), LazyLong is used to parse the data and returns NULL; it should use Bytes.toLong(res) to decode these bytes instead. was: In LazyHBaseRow, {code} private Object uncheckedGetField(int fieldID) { // it is a column i.e. a column-family with column-qualifier byte [] res = result.getValue(colMap.familyNameBytes, colMap.qualifierNameBytes); if (res == null) { return null; } else { ref = new ByteArrayRef(); ref.setData(res); } if (ref != null) { fields[fieldID].init(ref, 0, ref.getData().length); } } For example, if the fields[fieldID] is Bigint, and ref stores HBase byte data (Long), it will use LazyLong to parse this data and will return NULL value, it should use Bytes.toLong(res.getData()) to parse this byte data {code} Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow --- Key: HIVE-3868 URL: https://issues.apache.org/jira/browse/HIVE-3868 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.9.0 Reporter: binlijin In LazyHBaseRow, {code} private Object uncheckedGetField(int fieldID) { // it is a column i.e. a column-family with column-qualifier byte [] res = result.getValue(colMap.familyNameBytes, colMap.qualifierNameBytes); if (res == null) { return null; } else { ref = new ByteArrayRef(); ref.setData(res); } if (ref != null) { fields[fieldID].init(ref, 0, ref.getData().length); } } {code} For example, if fields[fieldID] is a BIGINT and ref holds HBase-encoded bytes (a long), LazyLong is used to parse the data and returns NULL; it should use Bytes.toLong(res) to decode these bytes instead.
[jira] [Commented] (HIVE-3868) Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow
[ https://issues.apache.org/jira/browse/HIVE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545860#comment-13545860 ] binlijin commented on HIVE-3868: The reason is that we use HBase's Bytes to convert longs and other data types to bytes before storing them in HBase, and then use Hive to analyze the data in HBase. Use Hive‘s serde to parse HBase’s byte Data in LazyHBaseRow --- Key: HIVE-3868 URL: https://issues.apache.org/jira/browse/HIVE-3868 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.9.0 Reporter: binlijin In LazyHBaseRow, {code} private Object uncheckedGetField(int fieldID) { // it is a column i.e. a column-family with column-qualifier byte [] res = result.getValue(colMap.familyNameBytes, colMap.qualifierNameBytes); if (res == null) { return null; } else { ref = new ByteArrayRef(); ref.setData(res); } if (ref != null) { fields[fieldID].init(ref, 0, ref.getData().length); } } {code} For example, if fields[fieldID] is a BIGINT and ref holds HBase-encoded bytes (a long), LazyLong is used to parse the data and returns NULL; it should use Bytes.toLong(res) to decode these bytes instead.
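The mismatch binlijin describes can be reproduced without HBase. HBase's Bytes.toBytes(long) writes a long as 8 big-endian bytes, while a text-oriented lazy parser reads the cell as a decimal string.

The Java sketch below uses stand-ins for both sides:
- java.nio.ByteBuffer, whose default byte order is big-endian, stands in for Bytes.toBytes/Bytes.toLong.
- Long.parseLong stands in, roughly, for text parsing.

The class and method names are hypothetical, and this is not the LazyHBaseRow patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class HBaseLongDecodingSketch {
    // Same layout as a long written with HBase's Bytes.toBytes(long): 8 bytes, big-endian.
    static byte[] encodeBinary(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Binary decoding, equivalent in effect to Bytes.toLong(byte[]) for an 8-byte cell.
    // Returning null for other lengths is a choice made for this sketch.
    static Long decodeBinary(byte[] cell) {
        if (cell == null || cell.length != Long.BYTES) {
            return null;
        }
        return ByteBuffer.wrap(cell).getLong();
    }

    // Text decoding: interpret the cell as a decimal string; unparseable bytes become NULL.
    static Long decodeAsText(byte[] cell) {
        try {
            return Long.parseLong(new String(cell, StandardCharsets.UTF_8));
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

Decoding a binary-encoded 42 as text yields null, the NULL reported in the issue, while binary decoding recovers 42. Text decoding still works for cells that were written as strings.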
Hive-trunk-h0.21 - Build # 1898 - Fixed
Changes for Build #1896 Changes for Build #1897 Changes for Build #1898 [namit] HIVE-3300 LOAD DATA INPATH fails if a hdfs file with same name is added to table (Navis via namit) [namit] HIVE-3842 Remove redundant test codes (Navis via namit) All tests passed The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1898) Status: Fixed Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1898/ to view the results.
[jira] [Commented] (HIVE-3842) Remove redundant test codes
[ https://issues.apache.org/jira/browse/HIVE-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545875#comment-13545875 ] Hudson commented on HIVE-3842: -- Integrated in Hive-trunk-h0.21 #1898 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1898/]) HIVE-3842 Remove redundant test codes (Navis via namit) (Revision 1429682) Result = SUCCESS namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429682 Files : * /hive/trunk/hbase-handler/src/test/templates/TestHBaseCliDriver.vm * /hive/trunk/hbase-handler/src/test/templates/TestHBaseNegativeCliDriver.vm * /hive/trunk/ql/src/test/templates/TestCliDriver.vm * /hive/trunk/ql/src/test/templates/TestNegativeCliDriver.vm * /hive/trunk/ql/src/test/templates/TestParse.vm * /hive/trunk/ql/src/test/templates/TestParseNegative.vm Remove redundant test codes --- Key: HIVE-3842 URL: https://issues.apache.org/jira/browse/HIVE-3842 Project: Hive Issue Type: Test Components: Tests Reporter: Navis Assignee: Navis Priority: Trivial Fix For: 0.11.0 Attachments: HIVE-3842.D7773.1.patch Currently, Hive generates the same test code over and over for each test, which makes the test classes huge (50k lines for ql).
[jira] [Commented] (HIVE-3300) LOAD DATA INPATH fails if a hdfs file with same name is added to table
[ https://issues.apache.org/jira/browse/HIVE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545874#comment-13545874 ] Hudson commented on HIVE-3300: -- Integrated in Hive-trunk-h0.21 #1898 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1898/]) HIVE-3300 LOAD DATA INPATH fails if a hdfs file with same name is added to table (Navis via namit) (Revision 1429686) Result = SUCCESS namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429686 Files : * /hive/trunk/build-common.xml * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java * /hive/trunk/ql/src/test/queries/clientpositive/load_fs2.q * /hive/trunk/ql/src/test/results/clientpositive/load_fs2.q.out LOAD DATA INPATH fails if a hdfs file with same name is added to table -- Key: HIVE-3300 URL: https://issues.apache.org/jira/browse/HIVE-3300 Project: Hive Issue Type: Bug Components: Import/Export Affects Versions: 0.10.0 Environment: ubuntu linux, hadoop 1.0.3, hive 0.9 Reporter: Bejoy KS Assignee: Navis Fix For: 0.11.0 Attachments: HIVE-3300.1.patch.txt, HIVE-3300.D4383.3.patch, HIVE-3300.D4383.4.patch When we load data from the local file system into Hive tables with 'LOAD DATA LOCAL INPATH' and a file with the same name already exists in the table's location, the new file is suffixed with *_copy_1. But with 'LOAD DATA INPATH' for a file in HDFS, no rename happens; only a move task is triggered. Because a file with the same name already exists at the same HDFS location, the Hadoop FS move operation throws an error.
hive> LOAD DATA INPATH '/userdata/bejoy/site.txt' INTO TABLE test.site;
Loading data to table test.site
Failed with exception null
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
hive>
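For 'LOAD DATA LOCAL INPATH', the report describes a rename on collision: the new file gets a _copy_1 suffix, and presumably _copy_2 and so on if further collisions occur. That behaviour can be sketched in Java as a pure function, under this assumption. It is an illustration, not the code in Hive.java. The set of existing names stands in for a listing of the table directory. The suffix is appended to the whole file name to match the report's *_copy_1.

```java
import java.util.Set;

final class CopySuffixSketch {
    // Returns name unchanged if it is free in the target directory; otherwise the first
    // of name_copy_1, name_copy_2, ... that is not already present.
    static String nonCollidingName(String name, Set<String> existing) {
        if (!existing.contains(name)) {
            return name;
        }
        int counter = 1;
        String candidate = name + "_copy_" + counter;
        while (existing.contains(candidate)) {
            counter++;
            candidate = name + "_copy_" + counter;
        }
        return candidate;
    }
}
```

Running a check of this kind before the HDFS move is one way to avoid the collision. The committed fix may work differently.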
[jira] [Commented] (HIVE-2935) Implement HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545973#comment-13545973 ] Nicolas Fouché commented on HIVE-2935: -- Using CDH 4.1.2, which includes this patch. I think there's a problem with hive-jdbc, which includes a JDBC driver for both versions of HiveServer. For the first version of HiveServer, hive-jdbc-0.9.0-cdh4.1.2 depends on libthrift-1.5.0, which defines org.apache.thrift.TServiceClient as an interface. For HiveServer2, hive-jdbc-0.9.0-cdh4.1.2 depends on hive-service-0.9.0-cdh4.1.2, which depends on hive-service-0.9.0-cdh4.1.2. The latter seems to include code from libthrift and defines org.apache.thrift.TServiceClient as an abstract class. As a result, this happens:
java.lang.IncompatibleClassChangeError: class org.apache.hive.service.cli.thrift.TCLIService$Client has interface org.apache.thrift.TServiceClient as super class
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClassCond(Unknown Source)
    at java.lang.ClassLoader.defineClass(Unknown Source)
    at java.security.SecureClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.access$000(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:157)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:96)
Of course, I just have to remove libthrift from my libpath. But I wanted to let Carl Steinbach know. (I used maven-dependency-plugin to get all dependent JARs, without thinking about which ones would be useless or incompatible.) Implement HiveServer2 - Key: HIVE-2935 URL: https://issues.apache.org/jira/browse/HIVE-2935 Project: Hive Issue Type: New Feature Components: Server Infrastructure Reporter: Carl Steinbach Assignee: Carl Steinbach Labels: HiveServer2 Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt, HS2-changed-files-only.patch, HS2-with-thrift-patch-rebased.patch
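An IncompatibleClassChangeError of this kind means that two JARs on the classpath define the same class differently. A useful check is to ask at runtime how the loaded class was defined (interface or abstract class) and which JAR it came from. The Java sketch below does this using only JDK reflection: Class.isInterface, Modifier.isAbstract, and the class's ProtectionDomain/CodeSource. In a real project you would pass "org.apache.thrift.TServiceClient". The tests use JDK classes so they run anywhere. The helper names are hypothetical.

```java
import java.lang.reflect.Modifier;
import java.security.CodeSource;

final class ClassOriginSketch {
    // Describes how the JVM sees a class and where it was loaded from.
    static String describe(Class<?> c) {
        String kind;
        if (c.isInterface()) {
            kind = "an interface";
        } else if (Modifier.isAbstract(c.getModifiers())) {
            kind = "an abstract class";
        } else {
            kind = "a class";
        }
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // JDK (bootstrap) classes have no CodeSource; application classes report their JAR or directory.
        String origin = (src == null || src.getLocation() == null)
                ? "<bootstrap>"
                : src.getLocation().toString();
        return c.getName() + " is " + kind + " loaded from " + origin;
    }

    // Convenience overload taking a class name, e.g. "org.apache.thrift.TServiceClient".
    static String describe(String className) {
        try {
            return describe(Class.forName(className));
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("not on the classpath: " + className, e);
        }
    }
}
```

If the JAR reported for TServiceClient is libthrift rather than the hive-service JAR, the stray libthrift dependency is the one shadowing the expected class.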
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545984#comment-13545984 ] Liu Zongquan commented on HIVE-2206: If I plan to merge HIVE-2206 into the Hive source code, which branch should I use? Can someone tell me? add a new optimizer for query correlation discovery and optimization Key: HIVE-2206 URL: https://issues.apache.org/jira/browse/HIVE-2206 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.10.0 Reporter: He Yongqiang Assignee: Yin Huai Attachments: HIVE-2206.10-r1384442.patch.txt, HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch This issue proposes a new logical optimizer called Correlation Optimizer, which is used to merge correlated MapReduce jobs (MR jobs) into a single MR job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The paper and slides of YSmart are linked at the bottom. Because Hive translates a query one operation at a time, it generates a separate MapReduce job for every operation that may need to shuffle the data (e.g. join and aggregation operations). However, such operations may be correlated in the ways explained below, in which case they can be executed in a single MR job.
# Input Correlation: Multiple MR jobs have input correlation (IC) if their input relation sets are not disjoint;
# Transit Correlation: Multiple MR jobs have transit correlation (TC) if they have not only input correlation but also the same partition key;
# Job Flow Correlation: An MR job has job flow correlation (JFC) with one of its child nodes if it has the same partition key as that child node.
The current implementation of the correlation optimizer only detects correlations among MR jobs for reduce-side join operators and reduce-side aggregation operators (not map-only aggregation). A query will be optimized if it satisfies the following conditions:
# There exists an MR job for a reduce-side join operator or reduce-side aggregation operator that has JFC with all of its parent MR jobs (TCs will also be exploited if JFC exists);
# All input tables of those correlated MR jobs are original input tables (not intermediate tables generated by sub-queries); and
# No self join is involved in those correlated MR jobs.
The correlation optimizer is implemented as a logical optimizer. The main reasons are that it only needs to manipulate the query plan tree and that it can leverage the existing component for generating MR jobs. The current implementation can serve as a framework for correlation-related optimizations, which I think is better than adding individual optimizers. Several things can be done in the future to improve this optimizer. Here are three examples:
# Support queries that only involve TC;
# Support queries in which the input tables of correlated MR jobs involve intermediate tables; and
# Optimize queries involving self join.
References: Paper and presentation of YSmart. Paper: http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf Slides: http://sdrv.ms/UpwJJc
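The three correlation definitions in the HIVE-2206 description can be written as simple predicates over a toy job description. The Java sketch below only illustrates those definitions. MRJob, its fields, and the helper names are hypothetical and are not the optimizer's classes. A job's partition key is modelled as an ordered list of column names.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

final class CorrelationSketch {
    // Hypothetical summary of one MapReduce job in a query plan.
    static final class MRJob {
        final Set<String> inputRelations; // relations the job reads
        final List<String> partitionKey;  // columns the job shuffles (partitions) on

        MRJob(Set<String> inputRelations, List<String> partitionKey) {
            this.inputRelations = inputRelations;
            this.partitionKey = partitionKey;
        }
    }

    // Input correlation (IC): the input relation sets are not disjoint.
    static boolean inputCorrelated(MRJob a, MRJob b) {
        return !Collections.disjoint(a.inputRelations, b.inputRelations);
    }

    // Transit correlation (TC): IC and, in addition, the same partition key.
    static boolean transitCorrelated(MRJob a, MRJob b) {
        return inputCorrelated(a, b) && a.partitionKey.equals(b.partitionKey);
    }

    // Job flow correlation (JFC): a job has the same partition key as one of its child jobs.
    static boolean jobFlowCorrelated(MRJob job, MRJob child) {
        return job.partitionKey.equals(child.partitionKey);
    }
}
```

For example, consider a join on key followed by a GROUP BY on the same key. The aggregation job and its parent join job share a partition key, so they satisfy JFC. That is the shape of query the optimizer targets.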
Build failed in Jenkins: Hive-0.10.0-SNAPSHOT-h0.20.1 #27
See https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/27/ -- [...truncated 8145 lines...] [echo] Project: common create-dirs: [echo] Project: serde [copy] Warning: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/serde/src/test/resources does not exist. init: [echo] Project: serde create-dirs: [echo] Project: metastore [copy] Warning: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/metastore/src/test/resources does not exist. init: [echo] Project: metastore create-dirs: [echo] Project: ql init: [echo] Project: ql create-dirs: [echo] Project: contrib [copy] Warning: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/contrib/src/test/resources does not exist. init: [echo] Project: contrib create-dirs: [echo] Project: service [copy] Warning: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/service/src/test/resources does not exist. init: [echo] Project: service create-dirs: [echo] Project: cli [copy] Warning: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/cli/src/test/resources does not exist. init: [echo] Project: cli create-dirs: [echo] Project: jdbc [copy] Warning: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/jdbc/src/test/resources does not exist. init: [echo] Project: jdbc create-dirs: [echo] Project: hwi [copy] Warning: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/hwi/src/test/resources does not exist. init: [echo] Project: hwi create-dirs: [echo] Project: hbase-handler [copy] Warning: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/hbase-handler/src/test/resources does not exist. init: [echo] Project: hbase-handler create-dirs: [echo] Project: pdk [copy] Warning: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/pdk/src/test/resources does not exist. 
init: [echo] Project: pdk create-dirs: [echo] Project: builtins [copy] Warning: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/builtins/src/test/resources does not exist. init: [echo] Project: builtins jar: [echo] Project: hive create-dirs: [echo] Project: shims [copy] Warning: https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/shims/src/test/resources does not exist. init: [echo] Project: shims ivy-init-settings: [echo] Project: shims ivy-resolve: [echo] Project: shims [ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/ivy/ivysettings.xml [ivy:report] Processing https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/27/artifact/hive/build/ivy/resolution-cache/org.apache.hive-hive-shims-default.xml to https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/27/artifact/hive/build/ivy/report/org.apache.hive-hive-shims-default.html ivy-retrieve: [echo] Project: shims compile: [echo] Project: shims [echo] Building shims 0.20 build-shims: [echo] Project: shims [echo] Compiling https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/shims/src/common/java;/home/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/shims/src/0.20/java against hadoop 0.20.2 (https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/27/artifact/hive/build/hadoopcore/hadoop-0.20.2) ivy-init-settings: [echo] Project: shims ivy-resolve-hadoop-shim: [echo] Project: shims [ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/ivy/ivysettings.xml ivy-retrieve-hadoop-shim: [echo] Project: shims [echo] Building shims 0.20S build-shims: [echo] Project: shims [echo] Compiling 
https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/shims/src/common/java;/home/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/shims/src/common-secure/java;/home/jenkins/jenkins-slave/workspace/Hive-0.10.0-SNAPSHOT-h0.20.1/hive/shims/src/0.20S/java against hadoop 1.0.0 (https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/27/artifact/hive/build/hadoopcore/hadoop-1.0.0) ivy-init-settings: [echo] Project: shims ivy-resolve-hadoop-shim: [echo] Project: shims [ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Hive-0.10.0-SNAPSHOT-h0.20.1/ws/hive/ivy/ivysettings.xml ivy-retrieve-hadoop-shim: [echo] Project: shims [echo] Building shims 0.23 build-shims: [echo] Project: shims [echo] Compiling
Hive-trunk-h0.21 - Build # 1899 - Failure
Changes for Build #1899 No tests ran. The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1899) Status: Failure Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1899/ to view the results.
Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false #253
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/253/ -- [...truncated 9916 lines...] compile-test: [echo] Project: serde [javac] Compiling 26 source files to /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/serde/test/classes [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. create-dirs: [echo] Project: service [copy] Warning: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/service/src/test/resources does not exist. init: [echo] Project: service ivy-init-settings: [echo] Project: service ivy-resolve: [echo] Project: service [ivy:resolve] :: loading settings :: file = /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/ivy/ivysettings.xml [ivy:report] Processing /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/resolution-cache/org.apache.hive-hive-service-default.xml to /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/report/org.apache.hive-hive-service-default.html ivy-retrieve: [echo] Project: service compile: [echo] Project: service ivy-resolve-test: [echo] Project: service ivy-retrieve-test: [echo] Project: service compile-test: [echo] Project: service [javac] Compiling 2 source files to /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/service/test/classes test: [echo] Project: hive test-shims: [echo] Project: hive test-conditions: [echo] Project: shims gen-test: [echo] Project: shims create-dirs: [echo] Project: shims [copy] Warning: /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/test/resources does not exist. 
init: [echo] Project: shims ivy-init-settings: [echo] Project: shims ivy-resolve: [echo] Project: shims [ivy:resolve] :: loading settings :: file = /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/ivy/ivysettings.xml [ivy:report] Processing /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/resolution-cache/org.apache.hive-hive-shims-default.xml to /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/ivy/report/org.apache.hive-hive-shims-default.html ivy-retrieve: [echo] Project: shims compile: [echo] Project: shims [echo] Building shims 0.20 build_shims: [echo] Project: shims [echo] Compiling /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.20/java against hadoop 0.20.2 (/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/hadoopcore/hadoop-0.20.2) ivy-init-settings: [echo] Project: shims ivy-resolve-hadoop-shim: [echo] Project: shims [ivy:resolve] :: loading settings :: file = /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/ivy/ivysettings.xml ivy-retrieve-hadoop-shim: [echo] Project: shims [echo] Building shims 0.20S build_shims: [echo] Project: shims [echo] Compiling /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common-secure/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.20S/java against hadoop 1.0.0 (/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/hadoopcore/hadoop-1.0.0) ivy-init-settings: [echo] Project: shims ivy-resolve-hadoop-shim: [echo] Project: shims [ivy:resolve] :: loading 
settings :: file = /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/ivy/ivysettings.xml ivy-retrieve-hadoop-shim: [echo] Project: shims [echo] Building shims 0.23 build_shims: [echo] Project: shims [echo] Compiling /x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common-secure/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.23/java against hadoop 0.23.3 (/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/build/hadoopcore/hadoop-0.23.3) ivy-init-settings: [echo] Project: shims
[jira] [Created] (HIVE-3869) SELECT foo, NULL UNION ALL SELECT bar, baz fails
David Morel created HIVE-3869: - Summary: SELECT foo, NULL UNION ALL SELECT bar, baz fails Key: HIVE-3869 URL: https://issues.apache.org/jira/browse/HIVE-3869 Project: Hive Issue Type: Bug Affects Versions: 0.8.1 Reporter: David Morel To avoid the curse of the last reducer when using a left outer join where most joined rows would be NULLs, I rewrote the query as:
{code}
SELECT * FROM (
  SELECT A.user_id id, B.created FROM (
    SELECT DISTINCT user_id FROM users
  ) A JOIN buyhist B ON A.user_id = B.user_id AND B.created = '2013-01-01'
  UNION ALL
  SELECT DISTINCT(user_id) id, NULL created FROM users
) foo;
{code}
The exception thrown is this:
{code}
2013-01-07 17:00:01,081 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:389)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:72)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:130)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:103)
    ... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
    ... 22 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:60)
    at java.lang.String.valueOf(String.java:2826)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:110)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:427)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:98)
    ... 22 more
{code}
The frame org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:60) caught my attention, so I replaced NULL with an empty string:
{code}
...
UNION ALL
SELECT DISTINCT(user_id) id, '' created
{code}
Shouldn't the query parser accept the form using NULL, or at least output a message before the job is sent to the jobtracker?
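A sketch of the up-front check the reporter asks for: resolve each output column's type across the two UNION ALL branches before the job is submitted, let an untyped NULL take the other branch's type, and fail with a readable message on a real mismatch. This is a hypothetical illustration, not Hive's type resolution or UnionOperator code; types are plain strings, the "void" tag for a bare NULL is an assumption, and the column types used in main() are assumed because the issue does not state them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical compile-time check for UNION ALL branch schemas (not Hive's code).
// Each branch is a list of column type names; VOID stands for a bare NULL literal.
class UnionSchemaResolver {
    static final String VOID = "void";

    // Returns the unified column types, or throws with a readable message
    // before any job is submitted.
    static List<String> resolve(List<String> left, List<String> right) {
        if (left.size() != right.size()) {
            throw new IllegalArgumentException("UNION ALL branches have "
                + left.size() + " and " + right.size() + " columns");
        }
        List<String> unified = new ArrayList<>();
        for (int i = 0; i < left.size(); i++) {
            String l = left.get(i);
            String r = right.get(i);
            if (l.equals(VOID)) {
                unified.add(r);                 // NULL takes the other branch's type
            } else if (r.equals(VOID) || l.equals(r)) {
                unified.add(l);
            } else {
                throw new IllegalArgumentException("UNION ALL column " + i
                    + " has incompatible types " + l + " and " + r);
            }
        }
        return unified;
    }

    public static void main(String[] args) {
        // Branch 1: SELECT A.user_id id, B.created ...   (types assumed)
        // Branch 2: SELECT DISTINCT(user_id) id, NULL created ...
        List<String> joined = Arrays.asList("bigint", "string");
        List<String> withNull = Arrays.asList("bigint", VOID);
        System.out.println(resolve(joined, withNull)); // [bigint, string]
    }
}
```

Doing this while the plan is compiled means the user sees an error (or a correctly typed NULL column) on the client, instead of an NPE inside a map task.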
[jira] [Created] (HIVE-3870) SELECT foo, NULL UNION ALL SELECT bar, baz fails
David Morel created HIVE-3870: - Summary: SELECT foo, NULL UNION ALL SELECT bar, baz fails Key: HIVE-3870 URL: https://issues.apache.org/jira/browse/HIVE-3870 Project: Hive Issue Type: Bug Affects Versions: 0.8.1 Reporter: David Morel
[jira] [Updated] (HIVE-3431) Avoid race conditions while downloading resources from non-local filesystem
[ https://issues.apache.org/jira/browse/HIVE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3431: --- Summary: Avoid race conditions while downloading resources from non-local filesystem (was: Resources on non-local file system should be downloaded to temporary directory sometimes) Avoid race conditions while downloading resources from non-local filesystem --- Key: HIVE-3431 URL: https://issues.apache.org/jira/browse/HIVE-3431 Project: Hive Issue Type: Improvement Components: Configuration Affects Versions: 0.10.0 Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-3431.1.patch.txt, HIVE-3431.D5199.2.patch, HIVE-3431.D5199.3.patch, HIVE-3431.D5199.4.patch The add resource remote-uri command downloads the resource file to the location on the local file system specified by the conf hive.downloaded.resources.dir. But when this command is executed concurrently against hive-server for the same file, some clients fail with a VM crash, caused by the file being overwritten by other requests. So there should be a configuration that provides a per-request location for the add resource command, something like set hiveconf:hive.downloaded.resources.dir=temporary
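A minimal sketch of the per-request location described above, assuming the value "temporary" for hive.downloaded.resources.dir is treated as a sentinel meaning "create a fresh directory for this request". The class and method names are hypothetical and this is not the committed HIVE-3431 patch (which touched SessionState.java); it only shows why a unique, atomically created directory removes the overwrite race.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical resolution of the download directory for one "add resource" request.
class ResourceDirResolver {
    // Sentinel value suggested in the issue: set hiveconf:hive.downloaded.resources.dir=temporary
    static final String TEMPORARY = "temporary";

    // configured: value of hive.downloaded.resources.dir; tmpRoot: where per-request dirs go.
    static Path resolve(String configured, Path tmpRoot) throws IOException {
        if (TEMPORARY.equals(configured)) {
            // Picks a fresh name and creates it atomically, so concurrent
            // requests never download into the same directory.
            return Files.createTempDirectory(tmpRoot, "hive-resources-");
        }
        // Shared directory: concurrent downloads of the same file can overwrite each other.
        Path shared = Paths.get(configured);
        Files.createDirectories(shared);
        return shared;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("demo-");
        Path first = resolve(TEMPORARY, root);
        Path second = resolve(TEMPORARY, root);
        System.out.println(first.equals(second)); // false
    }
}
```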
[jira] [Updated] (HIVE-3431) Avoid race conditions while downloading resources from non-local filesystem
[ https://issues.apache.org/jira/browse/HIVE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-3431: --- Resolution: Fixed Fix Version/s: 0.11.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! Avoid race conditions while downloading resources from non-local filesystem --- Key: HIVE-3431 URL: https://issues.apache.org/jira/browse/HIVE-3431 Project: Hive Issue Type: Improvement Components: Configuration Affects Versions: 0.10.0 Reporter: Navis Assignee: Navis Priority: Trivial Fix For: 0.11.0 Attachments: HIVE-3431.1.patch.txt, HIVE-3431.D5199.2.patch, HIVE-3431.D5199.3.patch, HIVE-3431.D5199.4.patch
[jira] [Resolved] (HIVE-3697) External JAR files on HDFS can lead to race condition with hive.downloaded.resources.dir
[ https://issues.apache.org/jira/browse/HIVE-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-3697. Resolution: Fixed Fix Version/s: 0.11.0 HIVE-3431 should fix this issue. Please reopen if you find otherwise. External JAR files on HDFS can lead to race condition with hive.downloaded.resources.dir Key: HIVE-3697 URL: https://issues.apache.org/jira/browse/HIVE-3697 Project: Hive Issue Type: Bug Affects Versions: 0.9.0 Reporter: Chris McConnell Fix For: 0.11.0 I've seen situations where using JAR files on HDFS can cause job failures via CNFE or JVM crashes. This is difficult to replicate and seems to be related to JAR size and the latency between the client and the HDFS cluster, but I've got some example stack traces below. It seems that the static FileSystem calls (copyToLocal), which delete the current local copy when executed, can cause the file(s) to be removed during job processing. We should consider changing the default for hive.downloaded.resources.dir to include some level of uniqueness per job. We should not use hive.session.id, however, as multiple statements executed by the same user/session that access the same JAR files would share that session. A proposal might be to use System.nanoTime() as part of the default (/tmp/${user.name}/resources/System.nanoTime()/), which might be enough to avoid the issue, although it's not perfect (its precision depends on the JVM and the system). If anyone else has hit this, I would like to capture environment information as well. Perhaps there is something else at play here.
Here are some examples of the errors:
for i in {0..2}; do
  hive -S -f query.q &
done
[2] 48405
[3] 48406
[4] 48407
%
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x7fb10bd931f0, pid=48407, tid=140398456698624
#
# JRE version: 6.0_31-b04
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.6-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libzip.so+0xb1f0] __int128+0x60
#
# An error report file with more information is saved as:
# /home/.../hs_err_pid48407.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
java.lang.NoClassDefFoundError: com/example/udf/Lower
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.hive.ql.exec.FunctionTask.getUdfClass(FunctionTask.java:105)
    at org.apache.hadoop.hive.ql.exec.FunctionTask.createFunction(FunctionTask.java:75)
    at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:63)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1331)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1117)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:439)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:449)
    at org.apache.hadoop.hive.cli.CliDriver.processInitFiles(CliDriver.java:485)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:692)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:607)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: com.example.udf.Lower
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
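The default proposed in HIVE-3697, /tmp/${user.name}/resources/System.nanoTime()/, can be sketched as below. As the reporter notes, System.nanoTime() is not guaranteed to be unique, so this hypothetical helper creates the leaf directory with Files.createDirectory (which fails if the name already exists) and retries on collision, rather than assuming the timestamp alone is enough. The base directory is passed in so the sketch can run outside /tmp; the class name is an assumption.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for the proposed per-job default:
//   <base>/<user>/resources/<System.nanoTime()>/
class NanoTimeResourceDir {
    static Path create(Path base, String user) throws IOException {
        Path parent = base.resolve(user).resolve("resources");
        Files.createDirectories(parent);              // shared parent; fine if it already exists
        while (true) {
            Path candidate = parent.resolve(Long.toString(System.nanoTime()));
            try {
                // createDirectory is atomic: only one caller can create a given name.
                return Files.createDirectory(candidate);
            } catch (FileAlreadyExistsException collision) {
                // Another process (or a coarse clock) produced the same value; try again.
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("tmp-");   // stands in for /tmp
        String user = System.getProperty("user.name");
        Path first = create(base, user);
        Path second = create(base, user);
        System.out.println(first.equals(second));        // false
    }
}
```

The retry loop is what closes the gap the reporter points out: the timestamp only makes collisions unlikely, while the atomic create makes them harmless.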
[jira] [Resolved] (HIVE-3870) SELECT foo, NULL UNION ALL SELECT bar, baz fails
[ https://issues.apache.org/jira/browse/HIVE-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-3870. Resolution: Duplicate Dupe of HIVE-3869 SELECT foo, NULL UNION ALL SELECT bar, baz fails Key: HIVE-3870 URL: https://issues.apache.org/jira/browse/HIVE-3870 Project: Hive Issue Type: Bug Affects Versions: 0.8.1 Reporter: David Morel
Hive-trunk-h0.21 - Build # 1900 - Still Failing
Changes for Build #1899 Changes for Build #1900 [hashutosh] HIVE-3431 : Avoid race conditions while downloading resources from non-local filesystem (Navis via Ashutosh Chauhan) No tests ran. The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1900) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1900/ to view the results.
[jira] [Commented] (HIVE-3431) Avoid race conditions while downloading resources from non-local filesystem
[ https://issues.apache.org/jira/browse/HIVE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546111#comment-13546111 ] Hudson commented on HIVE-3431: -- Integrated in Hive-trunk-h0.21 #1900 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1900/]) HIVE-3431 : Avoid race conditions while downloading resources from non-local filesystem (Navis via Ashutosh Chauhan) (Revision 1429916) Result = FAILURE hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1429916 Files : * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java Avoid race conditions while downloading resources from non-local filesystem --- Key: HIVE-3431 URL: https://issues.apache.org/jira/browse/HIVE-3431 Project: Hive Issue Type: Improvement Components: Configuration Affects Versions: 0.10.0 Reporter: Navis Assignee: Navis Priority: Trivial Fix For: 0.11.0 Attachments: HIVE-3431.1.patch.txt, HIVE-3431.D5199.2.patch, HIVE-3431.D5199.3.patch, HIVE-3431.D5199.4.patch
[jira] [Work started] (HIVE-3773) Share input scan by unions across multiple queries
[ https://issues.apache.org/jira/browse/HIVE-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-3773 started by Gang Tim Liu. Share input scan by unions across multiple queries -- Key: HIVE-3773 URL: https://issues.apache.org/jira/browse/HIVE-3773 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Consider a query like: select * from ( select key, 1 as value, count(1) from src group by key union all select 1 as key, value, count(1) from src group by value union all select key, value, count(1) from src group by key, value ) s; Currently src is scanned multiple times (once per sub-query). This should be treated like a multi-table insert by the optimizer.
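To illustrate what "treated like a multi-table insert" means, here is a toy in-memory sketch in which all three GROUP BY branches of the example query are computed from a single pass over src. It is not how Hive's optimizer is implemented; the Row and Counts types are hypothetical and only model the key/value columns of the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (not Hive's optimizer): the three GROUP BY branches of the
// example query are answered from ONE pass over src instead of three scans.
class SharedScan {
    static final class Row {
        final String key;
        final String value;
        Row(String key, String value) { this.key = key; this.value = value; }
    }

    // count(1) per group for: GROUP BY key, GROUP BY value, GROUP BY key, value
    static final class Counts {
        final Map<String, Long> byKey = new HashMap<>();
        final Map<String, Long> byValue = new HashMap<>();
        final Map<List<String>, Long> byKeyValue = new HashMap<>();
    }

    static Counts scanOnce(List<Row> src) {
        Counts c = new Counts();
        for (Row r : src) {  // a single scan feeds every branch
            c.byKey.merge(r.key, 1L, Long::sum);
            c.byValue.merge(r.value, 1L, Long::sum);
            c.byKeyValue.merge(Arrays.asList(r.key, r.value), 1L, Long::sum);
        }
        return c;
    }

    public static void main(String[] args) {
        List<Row> src = Arrays.asList(new Row("1", "a"), new Row("1", "b"), new Row("2", "a"));
        Counts c = scanOnce(src);
        System.out.println(c.byKey.get("1"));                          // 2
        System.out.println(c.byValue.get("a"));                        // 2
        System.out.println(c.byKeyValue.get(Arrays.asList("1", "a"))); // 1
    }
}
```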
[jira] [Commented] (HIVE-3773) Share input scan by unions across multiple queries
[ https://issues.apache.org/jira/browse/HIVE-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546139#comment-13546139 ] Ashutosh Chauhan commented on HIVE-3773: Isn't this already implemented in HIVE-2206? Share input scan by unions across multiple queries -- Key: HIVE-3773 URL: https://issues.apache.org/jira/browse/HIVE-3773 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu
[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization
[ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546150#comment-13546150 ] Yin Huai commented on HIVE-2206: [~liuzongquan] The latest patch was developed against Hive trunk revision 1410581. add a new optimizer for query correlation discovery and optimization Key: HIVE-2206 URL: https://issues.apache.org/jira/browse/HIVE-2206 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.10.0 Reporter: He Yongqiang Assignee: Yin Huai Attachments: HIVE-2206.10-r1384442.patch.txt, HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch This issue proposes a new logical optimizer called Correlation Optimizer, which is used to merge correlated MapReduce jobs (MR jobs) into a single MR job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The paper and slides of YSmart are linked at the bottom. Since Hive translates queries in a sentence-by-sentence fashion, it generates a MapReduce job for every operation that may need to shuffle the data (e.g. join and aggregation operations). However, such operations may involve the correlations explained below and thus can be executed in a single MR job.
# Input Correlation: Multiple MR jobs have input correlation (IC) if their input relation sets are not disjoint;
# Transit Correlation: Multiple MR jobs have transit correlation (TC) if they have not only input correlation, but also the same partition key;
# Job Flow Correlation: An MR job has job flow correlation (JFC) with one of its child nodes if it has the same partition key as that child node.
The current implementation of the correlation optimizer only detects correlations among MR jobs for reduce-side join operators and reduce-side aggregation operators (not map-only aggregation). A query will be optimized if it satisfies the following conditions:
# There exists an MR job for a reduce-side join operator or reduce-side aggregation operator which has JFC with all of its parent MR jobs (TCs will also be exploited if JFC exists);
# All input tables of those correlated MR jobs are original input tables (not intermediate tables generated by sub-queries); and
# No self join is involved in those correlated MR jobs.
The correlation optimizer is implemented as a logical optimizer. The main reasons are that it only needs to manipulate the query plan tree and it can leverage the existing component for generating MR jobs. The current implementation can serve as a framework for correlation-related optimizations, which I think is better than adding individual optimizers. Several things can be done in the future to improve this optimizer. Here are three examples:
# Support queries that only involve TC;
# Support queries in which the input tables of correlated MR jobs involve intermediate tables; and
# Optimize queries involving self join.
References: Paper and presentation of YSmart. Paper: http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf Slides: http://sdrv.ms/UpwJJc
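The three correlation definitions in HIVE-2206 can be written as simple predicates. This is a toy model, not the classes in the HIVE-2206 patch: an MR job is reduced to its set of input table names and a single partition-key string (real partition keys are column lists), and the "multiple jobs" definitions are checked pairwise.

```java
import java.util.Collections;
import java.util.Set;

// Toy model of IC / TC / JFC as defined in the issue (not the patch's classes).
class Correlations {
    static final class MRJob {
        final Set<String> inputs;      // input relation set
        final String partitionKey;     // simplified: one key string instead of a column list
        MRJob(Set<String> inputs, String partitionKey) {
            this.inputs = inputs;
            this.partitionKey = partitionKey;
        }
    }

    // Input correlation: the input relation sets are not disjoint.
    static boolean inputCorrelated(MRJob a, MRJob b) {
        return !Collections.disjoint(a.inputs, b.inputs);
    }

    // Transit correlation: input correlation plus the same partition key.
    static boolean transitCorrelated(MRJob a, MRJob b) {
        return inputCorrelated(a, b) && a.partitionKey.equals(b.partitionKey);
    }

    // Job flow correlation: a job and one of its child jobs share the partition key.
    static boolean jobFlowCorrelated(MRJob parent, MRJob child) {
        return parent.partitionKey.equals(child.partitionKey);
    }

    public static void main(String[] args) {
        // e.g. an aggregation over t and a join of t with s, both shuffled on t.key
        MRJob agg = new MRJob(Set.of("t"), "t.key");
        MRJob join = new MRJob(Set.of("t", "s"), "t.key");
        System.out.println(inputCorrelated(agg, join));   // true
        System.out.println(transitCorrelated(agg, join)); // true
    }
}
```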
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546152#comment-13546152 ] He Yongqiang commented on HIVE-3585: HBaseSerde was first added to contrib and later moved to core. bq. Pig is adding TrevniStorage as a builtin, and interoperability is desired. I think interoperability is not a problem no matter where the code resides. Integrate Trevni as another columnar oriented file format - Key: HIVE-3585 URL: https://issues.apache.org/jira/browse/HIVE-3585 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.10.0 Reporter: alex gemini Assignee: Mark Wagner Priority: Minor Add the new Avro module Trevni as another columnar format. A new columnar format needs a columnar SerDe; fastutil seems a good choice. The Shark project uses the fastutil library as its columnar SerDe library, but it seems too large (almost 15 MB) for just a few primitive array collections.
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546170#comment-13546170 ] Sean Busbey commented on HIVE-3585: --- [~namita] Trevni defines a columnar format that can be used with different serialization systems. I believe initial efforts across different components are planning to use Avro for serialization. Eventually, Trevni support should also work for Thrift and Protobufs.
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546210#comment-13546210 ] Carl Steinbach commented on HIVE-3585: -- bq. HBaseSerde was first added to contrib and later moved to core. And what did this accomplish? Wouldn't it have been better to put it in core to begin with? In fact, can anyone tell me why we shouldn't abolish contrib altogether?
[jira] [Commented] (HIVE-3773) Share input scan by unions across multiple queries
[ https://issues.apache.org/jira/browse/HIVE-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546230#comment-13546230 ] Gang Tim Liu commented on HIVE-3773: Thank you for the great point. Yes, it can. In addition, it can handle much more complex queries, such as joins, and will bring other benefits. This issue targets the simple use case with a simple solution. It will benefit the general case, including the case where the HIVE-2206 configuration is not turned on. Share input scan by unions across multiple queries -- Key: HIVE-3773 URL: https://issues.apache.org/jira/browse/HIVE-3773 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Gang Tim Liu Consider a query like: select * from ( select key, 1 as value, count(1) from src group by key union all select 1 as key, value, count(1) from src group by value union all select key, value, count(1) from src group by key, value ) s; src is currently scanned multiple times (once per sub-query). This should be treated like a multi-table insert by the optimizer.
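Treating the UNION ALL query like a multi-table insert means reading `src` once and feeding every row to all three aggregations. Below is a minimal sketch of that idea. It assumes in-memory rows with string `key`/`value` columns. `Row`, `Result`, and the `\u0001`-joined composite key are hypothetical simplifications, not Hive operators.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: compute the three count(1) GROUP BY branches with one pass over src,
// as a multi-table insert would, instead of one pass per sub-query.
public class SharedScanSketch {

    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    static final class Result {
        final Map<String, Long> byKey = new HashMap<>();
        final Map<String, Long> byValue = new HashMap<>();
        final Map<String, Long> byKeyValue = new HashMap<>();
        int rowsScanned = 0;
    }

    // Reads each row exactly once and updates all three aggregations.
    static Result sharedScan(List<Row> src) {
        Result r = new Result();
        for (Row row : src) {
            r.rowsScanned++;
            r.byKey.merge(row.key, 1L, Long::sum);
            r.byValue.merge(row.value, 1L, Long::sum);
            // Composite grouping key for "group by key, value".
            r.byKeyValue.merge(row.key + "\u0001" + row.value, 1L, Long::sum);
        }
        return r;
    }

    public static void main(String[] args) {
        List<Row> src = List.of(new Row("a", "x"), new Row("a", "y"), new Row("b", "x"));
        Result r = sharedScan(src);
        System.out.println(r.rowsScanned);      // 3: src is read once, not three times
        System.out.println(r.byKey.get("a"));   // 2
        System.out.println(r.byValue.get("x")); // 2
    }
}
```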
[jira] [Commented] (HIVE-2693) Add DECIMAL data type
[ https://issues.apache.org/jira/browse/HIVE-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546256#comment-13546256 ] Mark Grover commented on HIVE-2693: --- Non-committer +1 Namit, any thoughts on the UDF method selection logic? Add DECIMAL data type - Key: HIVE-2693 URL: https://issues.apache.org/jira/browse/HIVE-2693 Project: Hive Issue Type: New Feature Components: Query Processor, Types Affects Versions: 0.10.0 Reporter: Carl Steinbach Assignee: Prasad Mujumdar Attachments: 2693_7.patch, 2693_8.patch, 2693_fix_all_tests1.patch, HIVE-2693-10.patch, HIVE-2693-11.patch, HIVE-2693-12-SortableSerDe.patch, HIVE-2693-13.patch, HIVE-2693-14.patch, HIVE-2693-15.patch, HIVE-2693-16.patch, HIVE-2693-17.patch, HIVE-2693-18.patch, HIVE-2693-19.patch, HIVE-2693-1.patch.txt, HIVE-2693-all.patch, HIVE-2693.D7683.1.patch, HIVE-2693-fix.patch, HIVE-2693.patch, HIVE-2693-take3.patch, HIVE-2693-take4.patch Add support for the DECIMAL data type. HIVE-2272 (TIMESTAMP) provides a nice template for how to do this.
[jira] [Commented] (HIVE-3789) Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9
[ https://issues.apache.org/jira/browse/HIVE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546285#comment-13546285 ] Arup Malakar commented on HIVE-3789: Hi Ashutosh, you are right. My concern was that checkPath() should look for the pfile:// scheme in the path that is passed. For the test cases to pass, adding resolvePath() is sufficient. I will submit a patch without the modification in checkPath(). Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9 Key: HIVE-3789 URL: https://issues.apache.org/jira/browse/HIVE-3789 Project: Hive Issue Type: Bug Components: Metastore, Tests Affects Versions: 0.9.0, 0.10.0 Environment: Hadoop 0.23.5, JDK 1.6.0_31 Reporter: Chris Drome Assignee: Arup Malakar Attachments: HIVE-3789.branch-0.9_1.patch, HIVE-3789.trunk.1.patch Rolling back to before this patch shows that the unit tests pass; after the patch, the majority of the unit tests fail.
[jira] [Updated] (HIVE-3789) Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9
[ https://issues.apache.org/jira/browse/HIVE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arup Malakar updated HIVE-3789: --- Attachment: HIVE-3789.branch-0.9_2.patch HIVE-3789.trunk.2.patch Patch with reverted checkPath() Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9 Key: HIVE-3789 URL: https://issues.apache.org/jira/browse/HIVE-3789 Project: Hive Issue Type: Bug Components: Metastore, Tests Affects Versions: 0.9.0, 0.10.0 Environment: Hadoop 0.23.5, JDK 1.6.0_31 Reporter: Chris Drome Assignee: Arup Malakar Attachments: HIVE-3789.branch-0.9_1.patch, HIVE-3789.branch-0.9_2.patch, HIVE-3789.trunk.1.patch, HIVE-3789.trunk.2.patch Rolling back to before this patch shows that the unit tests pass; after the patch, the majority of the unit tests fail.
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546348#comment-13546348 ] He Yongqiang commented on HIVE-3585: contrib is a good place for any projects that are not mature. There are so many custom data formats out there that it does not make sense to support all of them in the core Hive code base. contrib is a good place for them to grow. Another good place I can think of is the HCatalog project.
[jira] [Comment Edited] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546348#comment-13546348 ] He Yongqiang edited comment on HIVE-3585 at 1/7/13 10:40 PM: - contrib is a good place for any projects that are not mature. There are so many custom data formats out there that it does not make sense to support all of them in the core Hive code base. contrib is a good place for them to grow. From http://incubator.apache.org/hcatalog/docs/r0.4.0/, another good place I can think of is the HCatalog project. But I don't know whether HCatalog itself includes custom data format support.
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546400#comment-13546400 ] Carl Steinbach commented on HIVE-3585: -- The only concrete difference between core and contrib that I'm aware of is that the latter doesn't appear on Hive's classpath by default. As such I can only see two advantages to putting code in contrib: 1) it makes it harder for folks to use, and 2) it makes it harder for us to test. Did I miss anything?
Blue tables in Hive xdocs
Tables in Hive xdocs have a default background color that's rather overpowering (see Hive interactive shell commands in http://hive.apache.org/docs/r0.9.0/language_manual/cli.html). I'm working on a new doc that has lots of tables, so I tried to change the color to white (or any quieter color) but had no luck. Is this an Anakia issue, or Velocity? Does anyone know how to set the color either cell-by-cell or for the whole table? Thanks for any help or pointers to help. – Lefty Leverenz
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546516#comment-13546516 ] Russell Jurney commented on HIVE-3585: -- He, HCatalog uses Hive Serde. By adding the Trevni builtin for Apache Hive, Apache Hive, Shark, Apache HCatalog and Apache Pig will all get Trevni support. Synergy, baby! Apache Trevni is part of an actual Apache top-level project, Apache Avro, so it is nothing like Zebra, which I notice you reported yourself for addition in HIVE-781. Avro and Trevni are specifically designed for Hadoop workloads, and other tools like Pig are including Trevni immediately.
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546517#comment-13546517 ] Russell Jurney commented on HIVE-3585: -- This ticket now has 5 votes, and 22 watchers. Support for a Trevni builtin is overwhelming.
[jira] [Commented] (HIVE-3789) Patch HIVE-3648 causing the majority of unit tests to fail on branch 0.9
[ https://issues.apache.org/jira/browse/HIVE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546535#comment-13546535 ] Ashutosh Chauhan commented on HIVE-3789: +1
[jira] [Updated] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD
[ https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-3853: -- Attachment: HIVE-3853.D7767.2.patch navis updated the revision HIVE-3853 [jira] UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD. Reviewers: JIRA Addressed comments REVISION DETAIL https://reviews.facebook.net/D7767 AFFECTED FILES
ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUnixTimeStamp.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFToUnixTimeStamp.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFUnixTimeStamp.java
ql/src/test/queries/clientpositive/udf_to_unix_timestamp.q
ql/src/test/queries/clientpositive/udf_unix_timestamp.q
ql/src/test/results/clientpositive/show_functions.q.out
ql/src/test/results/clientpositive/udf5.q.out
ql/src/test/results/clientpositive/udf_to_unix_timestamp.q.out
ql/src/test/results/clientpositive/udf_unix_timestamp.q.out
To: JIRA, navis Cc: njain UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD --- Key: HIVE-3853 URL: https://issues.apache.org/jira/browse/HIVE-3853 Project: Hive Issue Type: Improvement Components: UDF Reporter: Navis Assignee: Navis Priority: Trivial Labels: udf Attachments: HIVE-3853.D7767.1.patch, HIVE-3853.D7767.2.patch unix_timestamp is declared as a non-deterministic function. But if the user provides an argument, it produces a deterministic result and is eligible for predicate pushdown (PPD).
[jira] [Updated] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD
[ https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3853: Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD
[ https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546578#comment-13546578 ] Phabricator commented on HIVE-3853: --- navis has commented on the revision HIVE-3853 [jira] UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD. I've heard that annotation information is part of the class definition, which cannot be overwritten at runtime. REVISION DETAIL https://reviews.facebook.net/D7767 To: JIRA, navis Cc: njain
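Navis's point is that an annotation value is fixed when the class is compiled, so one class cannot be "deterministic only when called with an argument". The AFFECTED FILES list adds a GenericUDFToUnixTimeStamp next to GenericUDFUnixTimeStamp. That suggests splitting the behaviour across classes, though this is an inference, not a description of the patch. Below is a minimal sketch of that idea. The `Deterministic` annotation and both UDF classes are hypothetical stand-ins, not Hive's `UDFType` or its real UDFs. Treating an unannotated class as deterministic is an assumption of the sketch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Annotation values are fixed per class at compile time, so the determinism
// decision is made by choosing a class, not by changing the annotation.
public class DeterminismSketch {

    @Retention(RetentionPolicy.RUNTIME) // keep the annotation readable via reflection
    @Target(ElementType.TYPE)
    @interface Deterministic {
        boolean value();
    }

    // unix_timestamp() with no argument reads the clock, so repeated calls can differ.
    @Deterministic(false)
    static final class CurrentUnixTimestamp { }

    // unix_timestamp(<string>) maps the same input to the same output.
    @Deterministic(true)
    static final class ToUnixTimestamp { }

    // Reads the flag from the class; an unannotated class counts as deterministic here.
    static boolean isDeterministic(Class<?> udfClass) {
        Deterministic d = udfClass.getAnnotation(Deterministic.class);
        return d == null || d.value();
    }

    // Picks the implementation class from the call shape.
    static Class<?> resolve(int argCount) {
        return argCount == 0 ? CurrentUnixTimestamp.class : ToUnixTimestamp.class;
    }

    public static void main(String[] args) {
        System.out.println(isDeterministic(resolve(0))); // false: not safe to push down
        System.out.println(isDeterministic(resolve(1))); // true: eligible for PPD
    }
}
```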
[jira] [Commented] (HIVE-3585) Integrate Trevni as another columnar oriented file format
[ https://issues.apache.org/jira/browse/HIVE-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546609#comment-13546609 ] Namit Jain commented on HIVE-3585: -- The main reason that contrib exists is to add new features/projects which are being tested, may take some time to mature, and are reasonably stand-alone, so that they don't need many changes in existing code. New SerDes/file formats/UDFs are good use cases for it. I don't see why testing/development in contrib is so difficult or different compared to development in any other component. This is the reason contrib was added: so that new stand-alone components can bake. We can definitely move it out of contrib once it is mature/safe. Why is development in contrib such a bad idea?
[jira] [Updated] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views
[ https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3803: - Attachment: hive.3803.8.patch explain dependency should show the dependencies hierarchically in presence of views --- Key: HIVE-3803 URL: https://issues.apache.org/jira/browse/HIVE-3803 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch, hive.3803.4.patch, hive.3803.5.patch, hive.3803.6.patch, hive.3803.7.patch, hive.3803.8.patch It should also include tables whose partitions are being accessed
[jira] [Created] (HIVE-3871) show number of mappers/reducers as part of explain extended
Namit Jain created HIVE-3871: Summary: show number of mappers/reducers as part of explain extended Key: HIVE-3871 URL: https://issues.apache.org/jira/browse/HIVE-3871 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain It would be useful to show the number of mappers/reducers as part of explain extended. For the MR jobs referencing intermediate data, the number can be approximate.
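Because the counts may only be approximate, EXPLAIN EXTENDED could print estimates rather than exact task counts. Below is a minimal sketch of such estimates under two assumptions. First, there is roughly one map task per input split; this ignores per-file splitting and combining input formats. Second, there is one reducer per `bytesPerReducer` of input, clamped to `[1, maxReducers]`. That follows the general shape of Hive's size-based reducer heuristic (`hive.exec.reducers.bytes.per.reducer`, `hive.exec.reducers.max`), but it is not Hive's code. The class and method names are hypothetical.

```java
// Rough task-count estimates of the kind EXPLAIN EXTENDED could print.
public class TaskCountEstimateSketch {

    // Ceiling division for non-negative sizes.
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    // Roughly one map task per split, but at least one task.
    static long estimateMappers(long totalInputBytes, long splitBytes) {
        return Math.max(1, ceilDiv(totalInputBytes, splitBytes));
    }

    // One reducer per bytesPerReducer of input, clamped to [1, maxReducers].
    static long estimateReducers(long totalInputBytes, long bytesPerReducer, long maxReducers) {
        long reducers = ceilDiv(totalInputBytes, bytesPerReducer);
        return Math.min(maxReducers, Math.max(1, reducers));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(estimateMappers(300 * mb, 128 * mb));                    // 3
        System.out.println(estimateReducers(2_500_000_000L, 1_000_000_000L, 999L)); // 3
    }
}
```

For jobs that read intermediate data, the input size is unknown at plan time. There, an estimate like this would have to rely on statistics or guesses, which is why the issue allows the number to be approximate.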
[jira] [Updated] (HIVE-3803) explain dependency should show the dependencies hierarchically in presence of views
[ https://issues.apache.org/jira/browse/HIVE-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3803: - Attachment: hive.3803.9.patch explain dependency should show the dependencies hierarchically in presence of views --- Key: HIVE-3803 URL: https://issues.apache.org/jira/browse/HIVE-3803 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Attachments: hive.3803.1.patch, hive.3803.2.patch, hive.3803.3.patch, hive.3803.4.patch, hive.3803.5.patch, hive.3803.6.patch, hive.3803.7.patch, hive.3803.8.patch, hive.3803.9.patch It should also include tables whose partitions are being accessed
[jira] [Commented] (HIVE-3825) Add Operator level Hooks
[ https://issues.apache.org/jira/browse/HIVE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546616#comment-13546616 ] Namit Jain commented on HIVE-3825: -- Look at optrstat_groupby.q for an example. Add Operator level Hooks Key: HIVE-3825 URL: https://issues.apache.org/jira/browse/HIVE-3825 Project: Hive Issue Type: New Feature Reporter: Pamela Vagata Assignee: Pamela Vagata Priority: Minor Attachments: HIVE-3825.txt
[jira] [Commented] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD
[ https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546629#comment-13546629 ] Namit Jain commented on HIVE-3853: -- +1
[jira] [Commented] (HIVE-3853) UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD
[ https://issues.apache.org/jira/browse/HIVE-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546631#comment-13546631 ] Phabricator commented on HIVE-3853: --- njain has accepted the revision HIVE-3853 [jira] UDF unix_timestamp is deterministic if an argument is given, but it treated as non-deterministic preventing PPD. REVISION DETAIL https://reviews.facebook.net/D7767 BRANCH DPAL-1956 To: JIRA, njain, navis Cc: njain
[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage
[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546643#comment-13546643 ] Phabricator commented on HIVE-3562: --- njain has commented on the revision HIVE-3562 [jira] Some limit can be pushed down to map stage. INLINE COMMENTS
conf/hive-default.xml.template:1434 Can you add more details here? An example query would really help.
ql/src/test/queries/clientpositive/limit_pushdown.q:16 What is so special about 40? Setting hive.limit.pushdown.heap.threshold explicitly at the beginning of the test makes the test easier to maintain in the long run.
ql/src/test/queries/clientpositive/limit_pushdown.q:34 What is the difference between this and line 3?
ql/src/test/queries/clientpositive/limit_pushdown.q:10 I think this plan is not correct. Let us say the values are v1 v2 .. v10 v11 v12 .. v20. The first mapper does not have v8-v10, so it emits v1-v7 and v11-v13. The second mapper contains data for all values, but it only emits v1-v10. Since the query does not involve an ORDER BY, it is possible that v11 gets picked up, and its result would not include the data from the second mapper. If you are pushing the limit up, you should create an additional MR job which orders the rows, in the above example making sure that only v1-v10 are picked up. Am I missing something here?
REVISION DETAIL https://reviews.facebook.net/D5967 To: JIRA, tarball, navis Cc: njain Some limit can be pushed down to map stage -- Key: HIVE-3562 URL: https://issues.apache.org/jira/browse/HIVE-3562 Project: Hive Issue Type: Bug Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, HIVE-3562.D5967.3.patch Queries with a limit clause (with a reasonable number), for example {noformat} select * from src order by key limit 10; {noformat} produce the operator tree TS-SEL-RS-EXT-LIMIT-FS. But LIMIT can be partially calculated in RS, reducing the size of the shuffle:
TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
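For `order by key limit n`, a TOP-N step in the map-side RS is safe. Any key in the global first n must also be among the first n keys of the split that contains it. So forwarding only each split's top n loses nothing, and the reduce side still sorts and applies the real LIMIT. Below is a minimal sketch with plain `int` keys and a bounded max-heap. It is not Hive's ReduceSinkOperator code, and the class and method names are hypothetical. It covers only the ordered case from the issue description. Namit's review comment on limit_pushdown.q:10 concerns a plan without an ORDER BY, which this argument does not address.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// TOP-N pushdown for "order by key limit n": each map task forwards only its
// n smallest keys; the reduce side merges, sorts, and applies the final LIMIT.
public class TopNPushdownSketch {

    // Map side: keep the n smallest keys seen, using a max-heap bounded at n.
    static List<Integer> mapSideTopN(List<Integer> keys, int n) {
        List<Integer> out = new ArrayList<>();
        if (n <= 0) {
            return out;
        }
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (int k : keys) {
            if (heap.size() < n) {
                heap.add(k);
            } else if (k < heap.peek()) {
                heap.poll(); // evict the largest key kept so far
                heap.add(k);
            }
        }
        out.addAll(heap);
        Collections.sort(out);
        return out;
    }

    // Reduce side: merge what the mappers emitted, sort, and apply the real LIMIT.
    static List<Integer> reduceSideLimit(List<List<Integer>> mapOutputs, int n) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> part : mapOutputs) {
            all.addAll(part);
        }
        Collections.sort(all);
        return new ArrayList<>(all.subList(0, Math.max(0, Math.min(n, all.size()))));
    }

    public static void main(String[] args) {
        List<Integer> split1 = List.of(5, 1, 9, 3, 7);
        List<Integer> split2 = List.of(2, 8, 4, 6, 0);
        int limit = 3;
        List<List<Integer>> shuffled = List.of(mapSideTopN(split1, limit), mapSideTopN(split2, limit));
        System.out.println(shuffled);                         // [[1, 3, 5], [0, 2, 4]]
        System.out.println(reduceSideLimit(shuffled, limit)); // [0, 1, 2]
    }
}
```

Here the shuffle carries 6 rows instead of 10, and the final answer matches sorting all 10 keys and taking the first 3.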
[jira] [Updated] (HIVE-3562) Some limit can be pushed down to map stage
[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-3562: - Status: Open (was: Patch Available) comments
[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage
[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546649#comment-13546649 ] Phabricator commented on HIVE-3562: --- njain has commented on the revision HIVE-3562 [jira] Some limit can be pushed down to map stage.

INLINE COMMENTS

ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java:75 Remove the TODO.

ql/src/test/queries/clientpositive/limit_pushdown.q:51 There is no test where the limit is hive.limit.pushdown.heap.threshold.

ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java:87 Do you want to compare the threshold with the actual limit here?

REVISION DETAIL https://reviews.facebook.net/D5967

To: JIRA, tarball, navis Cc: njain
[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage
[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13546661#comment-13546661 ] Phabricator commented on HIVE-3562: --- njain has commented on the revision HIVE-3562 [jira] Some limit can be pushed down to map stage.

Sorry, my earlier comments assumed that the threshold is a number of rows.

INLINE COMMENTS

common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:483 Coming back to an earlier comment from Sivaramakrishnan Narayanan: would it be simpler if this were the number of rows?

ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:414 Define 40 as a constant somewhere.

REVISION DETAIL https://reviews.facebook.net/D5967

To: JIRA, tarball, navis Cc: njain
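The last two review comments turn on what the threshold measures. The sketch below contrasts the two readings; it is not the patch's code. `LimitPushdownThresholdSketch`, `pushDownByRows`, `pushDownByHeap`, and the `estimatedRowBytes` parameter are hypothetical, and treating the threshold as a fraction of the JVM heap is an assumption about `hive.limit.pushdown.heap.threshold`, not its documented semantics.

```java
/** Hypothetical pushdown checks for the two readings of the threshold. */
final class LimitPushdownThresholdSketch {

    private LimitPushdownThresholdSketch() {}

    /** Row-count reading: push down only if the top-N buffer holds at most maxRows rows. */
    static boolean pushDownByRows(int limit, int maxRows) {
        return limit > 0 && limit <= maxRows;
    }

    /**
     * Heap-fraction reading (assumed): the buffer may use heapFraction of the heap,
     * so the check also needs an estimate of the bytes buffered per row.
     */
    static boolean pushDownByHeap(int limit, double heapFraction, long maxHeapBytes, long estimatedRowBytes) {
        if (limit <= 0 || heapFraction <= 0 || estimatedRowBytes <= 0) {
            return false;
        }
        long budgetBytes = (long) (heapFraction * maxHeapBytes);
        // Multiply in long arithmetic so large limits do not overflow int.
        return (long) limit * estimatedRowBytes <= budgetBytes;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory(); // JVM max heap in bytes
        System.out.println("by rows: " + pushDownByRows(10, 1000));
        System.out.println("by heap: " + pushDownByHeap(10, 0.1, maxHeap, 100));
    }
}
```

In this sketch the row-count reading reduces to one comparison between the threshold and the actual LIMIT, the comparison asked about at LimitPushdownOptimizer.java:87, while the heap reading cannot decide anything without a per-row size estimate.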