[jira] [Resolved] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang resolved HIVE-2201.
--------------------------------
    Resolution: Fixed

Committed, thanks Siying!

reduce name node calls in hive by creating temporary directories
----------------------------------------------------------------

                Key: HIVE-2201
                URL: https://issues.apache.org/jira/browse/HIVE-2201
            Project: Hive
         Issue Type: Improvement
           Reporter: Namit Jain
           Assignee: Siying Dong
        Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch, HIVE-2201.4.patch

Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows:

1. In tmp directory tmp1, create a tmp file _tmp_1
2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1
3. Move directory /tmp1 to /tmp2
4. For all files in /tmp2, remove all files starting with _tmp and all duplicate files.

Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a large number of name node calls, especially for large queries. The protocol above can be modified slightly:

1. In tmp directory tmp1, create a tmp file _tmp_1
2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1
3. Move directory /tmp2 to /tmp3
4. For all files in /tmp3, remove all duplicate files.

This should reduce the number of tmp files.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
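The saving the description aims at can be sketched with a toy name-node-call count (Python; the accounting below only illustrates the protocol change, it is not Hive's actual FileSystem traffic):

```python
# Toy accounting of name node RPCs for the two commit protocols described
# above. Every create/rename/inspect counts as one call; speculative
# execution means several attempts per task create _tmp files, but only
# one attempt per task commits.

def namenode_calls(tasks, attempts_per_task, new_protocol):
    calls = tasks * attempts_per_task   # each attempt creates a _tmp file
    calls += tasks                      # one commit rename per task
    calls += 1                          # one directory move at the end
    if new_protocol:
        # committed files were moved into a fresh directory in step 2,
        # so the final cleanup only inspects one file per task
        calls += tasks
    else:
        # old flow: cleanup must inspect committed files *and* the _tmp
        # leftovers from speculative attempts
        calls += tasks * attempts_per_task
    return calls

# with 100 tasks and 3 attempts each, the new flow saves 200 calls
print(namenode_calls(100, 3, False) - namenode_calls(100, 3, True))  # 200
```

The dominant term removed is the per-attempt inspection in the final cleanup, which is why the saving grows with the amount of speculation.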
Build failed in Jenkins: Hive-trunk-h0.21 #838
See https://builds.apache.org/job/Hive-trunk-h0.21/838/changes

Changes:

[heyongqiang] HIVE-2209:add support for map comparision in serde layer (Krishna Kumar via He Yongqiang)

------------------------------------------
[...truncated 20304 lines...]
    [junit] #
    [junit] Running org.apache.hadoop.hive.ql.parse.TestParseNegative
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
    [junit] Test org.apache.hadoop.hive.ql.parse.TestParseNegative FAILED (crashed)
    [junit] #
    [junit] # A fatal error has been detected by the Java Runtime Environment:
    [junit] #
    [junit] # SIGBUS (0x7) at pc=0xf76fd7a4, pid=19865, tid=4137978736
    [junit] #
    [junit] # JRE version: 6.0_20-b02
    [junit] # Java VM: Java HotSpot(TM) Server VM (16.3-b01 mixed mode linux-x86 )
    [junit] # Problematic frame:
    [junit] # C [libc.so.6+0x1117a4]
    [junit] #
    [junit] # An error report file with more information is saved as:
    [junit] # /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/ql/hs_err_pid19865.log
    [junit] #
    [junit] # If you would like to submit a bug report, please visit:
    [junit] # http://java.sun.com/webapps/bugreport/crash.jsp
    [junit] #
    [junit] Running org.apache.hadoop.hive.ql.tool.TestLineageInfo
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
    [junit] Test org.apache.hadoop.hive.ql.tool.TestLineageInfo FAILED (crashed)
    [junit] #
    [junit] # A fatal error has been detected by the Java Runtime Environment:
    [junit] #
    [junit] # SIGBUS (0x7) at pc=0xf76fa7a4, pid=19872, tid=4137966448
    [junit] #
    [junit] # JRE version: 6.0_20-b02
    [junit] # Java VM: Java HotSpot(TM) Server VM (16.3-b01 mixed mode linux-x86 )
    [junit] # Problematic frame:
    [junit] # C [libc.so.6+0x1117a4]
    [junit] #
    [junit] # An error report file with more information is saved as:
    [junit] # /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/ql/hs_err_pid19872.log
    [junit] #
    [junit] # If you would like to submit a bug report, please visit:
    [junit] # http://java.sun.com/webapps/bugreport/crash.jsp
    [junit] #
    [junit] Running org.apache.hadoop.hive.ql.udf.TestUDFDateAdd
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
    [junit] Test org.apache.hadoop.hive.ql.udf.TestUDFDateAdd FAILED (crashed)
    [junit] #
    [junit] # A fatal error has been detected by the Java Runtime Environment:
    [junit] #
    [junit] # SIGBUS (0x7) at pc=0xf76937a4, pid=19879, tid=4137544560
    [junit] #
    [junit] # JRE version: 6.0_20-b02
    [junit] # Java VM: Java HotSpot(TM) Server VM (16.3-b01 mixed mode linux-x86 )
    [junit] # Problematic frame:
    [junit] # C [libc.so.6+0x1117a4]
    [junit] #
    [junit] # An error report file with more information is saved as:
    [junit] # /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/ql/hs_err_pid19879.log
    [junit] #
    [junit] # If you would like to submit a bug report, please visit:
    [junit] # http://java.sun.com/webapps/bugreport/crash.jsp
    [junit] #
    [junit] Running org.apache.hadoop.hive.ql.udf.TestUDFDateDiff
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
    [junit] Test org.apache.hadoop.hive.ql.udf.TestUDFDateDiff FAILED (crashed)
    [junit] #
    [junit] # A fatal error has been detected by the Java Runtime Environment:
    [junit] #
    [junit] # SIGBUS (0x7) at pc=0xf76db7a4, pid=19886, tid=4137839472
    [junit] #
    [junit] # JRE version: 6.0_20-b02
    [junit] # Java VM: Java HotSpot(TM) Server VM (16.3-b01 mixed mode linux-x86 )
    [junit] # Problematic frame:
    [junit] # C [libc.so.6+0x1117a4]
    [junit] #
    [junit] # An error report file with more information is saved as:
    [junit] # /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/ql/hs_err_pid19886.log
    [junit] #
    [junit] # If you would like to submit a bug report, please visit:
    [junit] # http://java.sun.com/webapps/bugreport/crash.jsp
    [junit] #
    [junit] Running org.apache.hadoop.hive.ql.udf.TestUDFDateSub
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
    [junit] Test org.apache.hadoop.hive.ql.udf.TestUDFDateSub FAILED (crashed)
    [junit] #
    [junit] # A fatal error has been detected by the Java Runtime Environment:
    [junit] #
    [junit] # SIGBUS (0x7) at pc=0xf76857a4, pid=19893, tid=4137487216
    [junit] #
    [junit] # JRE version: 6.0_20-b02
    [junit] # Java VM: Java HotSpot(TM) Server VM (16.3-b01 mixed mode linux-x86 )
    [junit] # Problematic frame:
    [junit] # C [libc.so.6+0x1117a4]
    [junit] #
    [junit] # An error report file with more information is saved as:
    [junit] # /x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/ql/hs_err_pid19893.log
    [junit] #
    [junit] # If you would like to submit a bug report, please visit:
    [junit] # http://java.sun.com/webapps/bugreport/crash.jsp
    [junit] #
[jira] [Commented] (HIVE-2209) Provide a way by which ObjectInspectorUtils.compare can be extended by the caller for comparing maps which are part of the object
[ https://issues.apache.org/jira/browse/HIVE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068838#comment-13068838 ]

Hudson commented on HIVE-2209:
------------------------------

Integrated in Hive-trunk-h0.21 #838 (See [https://builds.apache.org/job/Hive-trunk-h0.21/838/])
HIVE-2209:add support for map comparision in serde layer (Krishna Kumar via He Yongqiang)

heyongqiang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1149027
Files :
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestCrossMapEqualComparer.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestSimpleMapEqualComparer.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java
* /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestFullMapEqualComparer.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/FullMapEqualComparer.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java

Provide a way by which ObjectInspectorUtils.compare can be extended by the caller for comparing maps which are part of the object
--------------------------------------------------------------------------------------------------------------------------------

                Key: HIVE-2209
                URL: https://issues.apache.org/jira/browse/HIVE-2209
            Project: Hive
         Issue Type: Improvement
           Reporter: Krishna Kumar
           Assignee: Krishna Kumar
           Priority: Minor
        Attachments: HIVE-2209v0.patch, HIVE-2209v2.patch, HIVE2209v1.patch

Currently, ObjectInspectorUtils.compare throws an exception if a map is contained (recursively) within the objects being compared.

Two obvious implementations are:
- a simple map comparer, which assumes keys of the first map can be used to fetch values from the second
- a 'cross-product' comparer, which compares every pair of key-value pairs in the two maps and declares a match if and only if all pairs are matched

Note that it would be difficult to provide a transitive greater-than/less-than indication with maps, so that is not in scope.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
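The two strategies can be sketched in a few lines (Python stand-ins for the comparers; the real classes work through ObjectInspectors, which is elided here):

```python
def simple_map_equal(m1, m2):
    """SimpleMapEqualComparer-style check: assumes keys of the first map
    can be used directly to fetch values from the second."""
    if len(m1) != len(m2):
        return False
    return all(k in m2 and m1[k] == m2[k] for k in m1)

def cross_map_equal(m1, m2, eq=lambda a, b: a == b):
    """CrossMapEqualComparer-style check: compares every key-value pair of
    one map against every pair of the other using a caller-supplied
    comparator, and declares a match iff every pair finds one. O(n*m),
    but makes no assumption that the two maps' keys are interchangeable."""
    if len(m1) != len(m2):
        return False
    return all(
        any(eq(k1, k2) and eq(v1, v2) for k2, v2 in m2.items())
        for k1, v1 in m1.items()
    )

# Maps whose keys only match under a custom comparison need the
# cross-product comparer:
as_str = lambda a, b: str(a) == str(b)
print(simple_map_equal({1: 2}, {"1": "2"}))          # False
print(cross_map_equal({1: 2}, {"1": "2"}, as_str))   # True
```

The trade-off is exactly the one the comment describes: the simple comparer is fast but assumes interchangeable keys, while the cross-product comparer is general but quadratic.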
Build failed in Jenkins: Hive-trunk-h0.21 #839
See https://builds.apache.org/job/Hive-trunk-h0.21/839/changes

Changes:

[heyongqiang] HIVE-2201:reduce name node calls in hive by creating temporary directories (Siying Dong via He Yongqiang)

------------------------------------------
[...truncated 4330 lines...]
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java
AU   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIndex.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStringToMap.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPNull.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/UDTFCollector.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayContains.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPOr.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEWAHBitmap.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NumericHistogram.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStruct.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMax.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPNotEqual.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVarianceSample.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStd.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBridge.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBridge.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapEmpty.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFUtils.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/NGramEstimator.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPAnd.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPEqual.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCoalesce.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPLessThan.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFField.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFElt.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFExplode.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentileApprox.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFUnion.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVariance.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/Collector.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPEqualOrGreaterThan.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFInstr.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFWhen.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCollectSet.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTF.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPGreaterThan.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapAnd.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDFEWAHBitmapBop.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPEqualOrLessThan.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSize.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCovarianceSample.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseCompare.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogramNumeric.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFConcatWS.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEWAHBitmapOr.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFHash.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFResolver2.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMin.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFReflect.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/AbstractGenericUDAFResolver.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSentences.java
A    ql/src/java/org/apache/hadoop/hive/ql/udf/generic/package-info.java
A
[jira] [Commented] (HIVE-2201) reduce name node calls in hive by creating temporary directories
[ https://issues.apache.org/jira/browse/HIVE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068839#comment-13068839 ]

Hudson commented on HIVE-2201:
------------------------------

Integrated in Hive-trunk-h0.21 #839 (See [https://builds.apache.org/job/Hive-trunk-h0.21/839/])
HIVE-2201:reduce name node calls in hive by creating temporary directories (Siying Dong via He Yongqiang)

heyongqiang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1149047
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java

reduce name node calls in hive by creating temporary directories
----------------------------------------------------------------

                Key: HIVE-2201
                URL: https://issues.apache.org/jira/browse/HIVE-2201
            Project: Hive
         Issue Type: Improvement
           Reporter: Namit Jain
           Assignee: Siying Dong
        Attachments: HIVE-2201.1.patch, HIVE-2201.2.patch, HIVE-2201.3.patch, HIVE-2201.4.patch

Currently, in Hive, when a file gets written by a FileSinkOperator, the sequence of operations is as follows:

1. In tmp directory tmp1, create a tmp file _tmp_1
2. At the end of the operator, move /tmp1/_tmp_1 to /tmp1/1
3. Move directory /tmp1 to /tmp2
4. For all files in /tmp2, remove all files starting with _tmp and all duplicate files.

Due to speculative execution, a lot of temporary files are created in /tmp1 (or /tmp2). This leads to a large number of name node calls, especially for large queries. The protocol above can be modified slightly:

1. In tmp directory tmp1, create a tmp file _tmp_1
2. At the end of the operator, move /tmp1/_tmp_1 to /tmp2/1
3. Move directory /tmp2 to /tmp3
4. For all files in /tmp3, remove all duplicate files.

This should reduce the number of tmp files.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2296) bad compressed file names from insert into
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Franklin Hu updated HIVE-2296:
------------------------------
    Fix Version/s: 0.8.0
           Status: Patch Available  (was: Open)

bad compressed file names from insert into
------------------------------------------

                Key: HIVE-2296
                URL: https://issues.apache.org/jira/browse/HIVE-2296
            Project: Hive
         Issue Type: Bug
   Affects Versions: 0.8.0
           Reporter: Franklin Hu
           Assignee: Franklin Hu
            Fix For: 0.8.0
        Attachments: hive-2296.1.patch, hive-2296.2.patch

When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in with bad file names:

Before INSERT INTO:
00_0.gz

After INSERT INTO:
00_0.gz
00_0.gz_copy_1

This causes corrupted output when doing a SELECT * on the table. Correct behavior should be to pick a valid filename such as:
00_0_copy_1.gz

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
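The intended fix amounts to inserting the copy tag before the compression extension instead of appending it after the full name (a Python sketch of the naming rule only, not Hive's renaming code):

```python
import os

def copy_target_name(filename, copy_index):
    """Correct behavior: '00_0.gz' with copy 1 -> '00_0_copy_1.gz', so the
    file keeps a valid compression extension that readers recognize."""
    stem, ext = os.path.splitext(filename)   # ('00_0', '.gz')
    return f"{stem}_copy_{copy_index}{ext}"

def buggy_target_name(filename, copy_index):
    """Buggy behavior: the tag is appended after the extension, producing
    '00_0.gz_copy_1', which is no longer treated as a .gz file."""
    return f"{filename}_copy_{copy_index}"

print(copy_target_name("00_0.gz", 1))    # 00_0_copy_1.gz
print(buggy_target_name("00_0.gz", 1))   # 00_0.gz_copy_1
```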
[jira] [Work stopped] (HIVE-2296) bad compressed file names from insert into
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-2296 stopped by Franklin Hu.

bad compressed file names from insert into
------------------------------------------

                Key: HIVE-2296
                URL: https://issues.apache.org/jira/browse/HIVE-2296
            Project: Hive
         Issue Type: Bug
   Affects Versions: 0.8.0
           Reporter: Franklin Hu
           Assignee: Franklin Hu
            Fix For: 0.8.0
        Attachments: hive-2296.1.patch, hive-2296.2.patch

When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in with bad file names:

Before INSERT INTO:
00_0.gz

After INSERT INTO:
00_0.gz
00_0.gz_copy_1

This causes corrupted output when doing a SELECT * on the table. Correct behavior should be to pick a valid filename such as:
00_0_copy_1.gz

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: Cli: Print Hadoop's CPU milliseconds
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/948/
-----------------------------------------------------------

(Updated 2011-07-21 17:30:55.228025)

Review request for hive, Yongqiang He, Ning Zhang, and namit jain.

Changes
-------
fix a bug

Summary
-------
In the hive CLI, print out CPU msec from Hadoop MapReduce counters.

This addresses bug HIVE-2236.
https://issues.apache.org/jira/browse/HIVE-2236

Diffs (updated)
-----
trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1148623
trunk/ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 1148623
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java 1148623
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1148623
trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 1148623

Diff: https://reviews.apache.org/r/948/diff

Testing
-------
Ran the updated code against real clusters and verified that the printed output is correct.

Thanks,

Siying
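The accumulation this review describes can be sketched as follows (a Python stand-in; MapRedStats is a real class per the diff list, but the field name, counter name, and output format below are illustrative assumptions, not the patch's actual code):

```python
class MapRedStats:
    """Accumulates per-job counters so the CLI can print a total at the end."""
    def __init__(self):
        self.cpu_msec = 0

    def add_job_counters(self, counters):
        # Hadoop exposes cumulative task CPU time as a job-level counter;
        # the counter key here is a placeholder for illustration.
        self.cpu_msec += counters.get("CPU_MILLISECONDS", 0)

def cpu_summary(stats):
    # convert milliseconds to seconds for a human-readable summary line
    return f"Total MapReduce CPU Time Spent: {stats.cpu_msec / 1000.0:.2f} sec"

stats = MapRedStats()
stats.add_job_counters({"CPU_MILLISECONDS": 83120})   # first MR job of a query
stats.add_job_counters({"CPU_MILLISECONDS": 12880})   # second MR job
print(cpu_summary(stats))   # Total MapReduce CPU Time Spent: 96.00 sec
```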
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2236:
------------------------------
    Attachment: HIVE-2236.3.patch

fix a bug

Cli: Print Hadoop's CPU milliseconds
------------------------------------

                Key: HIVE-2236
                URL: https://issues.apache.org/jira/browse/HIVE-2236
            Project: Hive
         Issue Type: New Feature
         Components: CLI
           Reporter: Siying Dong
           Assignee: Siying Dong
           Priority: Minor
        Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch

CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users learn more about their jobs.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2236:
------------------------------
    Status: Open  (was: Patch Available)

Cli: Print Hadoop's CPU milliseconds
------------------------------------

                Key: HIVE-2236
                URL: https://issues.apache.org/jira/browse/HIVE-2236
            Project: Hive
         Issue Type: New Feature
         Components: CLI
           Reporter: Siying Dong
           Assignee: Siying Dong
           Priority: Minor
        Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch

CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users learn more about their jobs.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2236:
------------------------------
    Status: Patch Available  (was: Open)

Cli: Print Hadoop's CPU milliseconds
------------------------------------

                Key: HIVE-2236
                URL: https://issues.apache.org/jira/browse/HIVE-2236
            Project: Hive
         Issue Type: New Feature
         Components: CLI
           Reporter: Siying Dong
           Assignee: Siying Dong
           Priority: Minor
        Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch

CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users learn more about their jobs.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2236) Cli: Print Hadoop's CPU milliseconds
[ https://issues.apache.org/jira/browse/HIVE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069094#comment-13069094 ]

jirapos...@reviews.apache.org commented on HIVE-2236:
-----------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/948/
-----------------------------------------------------------

(Updated 2011-07-21 17:30:55.228025)

Review request for hive, Yongqiang He, Ning Zhang, and namit jain.

Changes
-------
fix a bug

Summary
-------
In the hive CLI, print out CPU msec from Hadoop MapReduce counters.

This addresses bug HIVE-2236.
https://issues.apache.org/jira/browse/HIVE-2236

Diffs (updated)
-----
trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1148623
trunk/ql/src/java/org/apache/hadoop/hive/ql/MapRedStats.java PRE-CREATION
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 1148623
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java 1148623
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1148623
trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 1148623

Diff: https://reviews.apache.org/r/948/diff

Testing
-------
Ran the updated code against real clusters and verified that the printed output is correct.

Thanks,

Siying

Cli: Print Hadoop's CPU milliseconds
------------------------------------

                Key: HIVE-2236
                URL: https://issues.apache.org/jira/browse/HIVE-2236
            Project: Hive
         Issue Type: New Feature
         Components: CLI
           Reporter: Siying Dong
           Assignee: Siying Dong
           Priority: Minor
        Attachments: HIVE-2236.1.patch, HIVE-2236.2.patch, HIVE-2236.3.patch

CPU milliseconds information is available from Hadoop's framework. Printing it out to the Hive CLI when executing a job will help users learn more about their jobs.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2285) Pretty-print output of DESCRIBE TABLE EXTENDED
[ https://issues.apache.org/jira/browse/HIVE-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069117#comment-13069117 ]

Carl Steinbach commented on HIVE-2285:
--------------------------------------

Formatted output should be the default behavior of DESCRIBE TABLE EXTENDED, but due to backwards-compatibility concerns DESCRIBE FORMATTED was introduced instead. In this ticket I'm proposing that we introduce a configuration variable, hive.formatted.describe.extended, which when set to true will cause the output of DESCRIBE EXTENDED to be formatted.

Pretty-print output of DESCRIBE TABLE EXTENDED
----------------------------------------------

                Key: HIVE-2285
                URL: https://issues.apache.org/jira/browse/HIVE-2285
            Project: Hive
         Issue Type: Improvement
         Components: CLI
           Reporter: Carl Steinbach
           Assignee: Carl Steinbach

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
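A sketch of the proposed toggle (the config key is the one named in the comment; the table representation and the formatting itself are made up for illustration):

```python
def describe_extended(properties, conf):
    """Emit DESCRIBE EXTENDED output, formatted only when the proposed
    hive.formatted.describe.extended variable is set to true; otherwise
    fall back to the legacy single-line output for compatibility."""
    if conf.get("hive.formatted.describe.extended", "false") == "true":
        # one aligned "name  value" row per property
        return "\n".join(f"{k:<24}{v}" for k, v in properties.items())
    # legacy behavior: everything on one unformatted line
    return " ".join(f"{k}={v}" for k, v in properties.items())

props = {"owner": "carl", "location": "/user/hive/warehouse/t"}
print(describe_extended(props, {}))
print(describe_extended(props, {"hive.formatted.describe.extended": "true"}))
```

Defaulting the flag to false preserves the existing output, which is exactly the backwards-compatibility constraint the comment raises.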
[jira] [Commented] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069126#comment-13069126 ]

Siying Dong commented on HIVE-2247:
-----------------------------------

I'm looking at the patch. Please test backward compatibility in both directions: old server with new client, and new server with old client. Please come by if you don't know how to test it.

ALTER TABLE RENAME PARTITION
----------------------------

                Key: HIVE-2247
                URL: https://issues.apache.org/jira/browse/HIVE-2247
            Project: Hive
         Issue Type: New Feature
           Reporter: Siying Dong
           Assignee: Weiyan Wang
        Attachments: HIVE-2247.3.patch.txt, HIVE-2247.4.patch.txt, HIVE-2247.5.patch.txt

We need an ALTER TABLE RENAME PARTITION function that is similar to ALTER TABLE RENAME.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2139) Enables HiveServer to accept -hiveconf option
[ https://issues.apache.org/jira/browse/HIVE-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2139:
---------------------------------
       Resolution: Fixed
    Fix Version/s: 0.8.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Patrick!

Enables HiveServer to accept -hiveconf option
---------------------------------------------

                Key: HIVE-2139
                URL: https://issues.apache.org/jira/browse/HIVE-2139
            Project: Hive
         Issue Type: Improvement
         Components: CLI
        Environment: Linux + CDH3u0 (Hive 0.7.0+27.1-2~lucid-cdh3)
           Reporter: Kazuki Ohta
           Assignee: Patrick Hunt
            Fix For: 0.8.0
        Attachments: HIVE-2139.patch, HIVE-2139.patch, HIVE-2139.patch

Currently, I'm trying to test HiveHBaseIntegration on HiveServer, but it doesn't seem to accept the -hiveconf option.

{code}
hive --service hiveserver -hiveconf hbase.zookeeper.quorum=hdp0,hdp1,hdp2
Starting Hive Thrift Server
java.lang.NumberFormatException: For input string: "-hiveconf"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
        at java.lang.Integer.parseInt(Integer.java:449)
        at java.lang.Integer.parseInt(Integer.java:499)
        at org.apache.hadoop.hive.service.HiveServer.main(HiveServer.java:382)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
{code}

Therefore, you have to issue a query like set hbase.zookeeper.quorum=hdp0,hdp1,hdp2 every time. It's not convenient for separating the configuration between server-side and client-side.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
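The stack trace shows HiveServer.main feeding its first argument straight into Integer.parseInt as a port number. Option-aware parsing that tolerates -hiveconf can be sketched like this (Python; the default port and the overall structure are assumptions for illustration, not the committed patch):

```python
def parse_server_args(argv, default_port=10000):
    """Scan arguments for -hiveconf key=value pairs; any remaining plain
    argument is treated as the port, instead of blindly parsing argv[0]
    (which is what raised NumberFormatException on '-hiveconf')."""
    port, conf = default_port, {}
    args = list(argv)
    while args:
        arg = args.pop(0)
        if arg == "-hiveconf":
            # consume the following key=value token
            key, _, value = args.pop(0).partition("=")
            conf[key] = value
        else:
            port = int(arg)   # plain positional argument is the port
    return port, conf

print(parse_server_args(["-hiveconf", "hbase.zookeeper.quorum=hdp0,hdp1,hdp2"]))
```

With this shape, server-side settings ride along on the command line and no longer have to be re-issued as a set statement by every client.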
[jira] [Updated] (HIVE-2297) Fix NPE in ConditionalResolverSkewJoin
[ https://issues.apache.org/jira/browse/HIVE-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2297: --- Attachment: fix_npe.patch Fix NPE in ConditionalResolverSkewJoin -- Key: HIVE-2297 URL: https://issues.apache.org/jira/browse/HIVE-2297 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: fix_npe.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2297) Fix NPE in ConditionalResolverSkewJoin
[ https://issues.apache.org/jira/browse/HIVE-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069134#comment-13069134 ]

Vaibhav Aggarwal commented on HIVE-2297:
----------------------------------------

Some file systems can return null if there are no objects to list. Added a fix for that.

Fix NPE in ConditionalResolverSkewJoin
--------------------------------------

                Key: HIVE-2297
                URL: https://issues.apache.org/jira/browse/HIVE-2297
            Project: Hive
         Issue Type: Bug
           Reporter: Vaibhav Aggarwal
           Assignee: Vaibhav Aggarwal
        Attachments: fix_npe.patch

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
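The defensive pattern is simply treating a null listing as empty before iterating (a Python sketch of the idea; the real fix lives in ConditionalResolverSkewJoin's handling of FileSystem listing results, and the entry fields below are made up):

```python
def non_empty_entries(list_status_result):
    """Some FileSystem implementations return null (here: None) rather than
    an empty array when a directory has nothing to list; normalizing to an
    empty list before iterating avoids the NullPointerException."""
    statuses = list_status_result if list_status_result is not None else []
    # keep only entries that actually contain data
    return [s for s in statuses if s.get("length", 0) > 0]

print(non_empty_entries(None))                              # []
print(non_empty_entries([{"path": "/t/a", "length": 42}]))  # the one real entry
```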
[jira] [Created] (HIVE-2297) Fix NPE in ConditionalResolverSkewJoin
Fix NPE in ConditionalResolverSkewJoin -- Key: HIVE-2297 URL: https://issues.apache.org/jira/browse/HIVE-2297 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: fix_npe.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2297) Fix NPE in ConditionalResolverSkewJoin
[ https://issues.apache.org/jira/browse/HIVE-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2297: --- Status: Patch Available (was: Open) Fix NPE in ConditionalResolverSkewJoin -- Key: HIVE-2297 URL: https://issues.apache.org/jira/browse/HIVE-2297 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: fix_npe.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2298) Fix UDAFPercentile to tolerate null percentiles
[ https://issues.apache.org/jira/browse/HIVE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2298: --- Attachment: HIVE-2298.patch Fix UDAFPercentile to tolerate null percentiles --- Key: HIVE-2298 URL: https://issues.apache.org/jira/browse/HIVE-2298 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2298.patch UDAFPercentile when passed null percentile list will throw a null pointer exception. Submitting a small fix for that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
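The shape of the fix can be sketched as a null check ahead of the percentile math (Python; the linear interpolation below is a generic percentile formula for illustration and not necessarily UDAFPercentile's exact one):

```python
def percentiles(values, ps):
    """Return the requested percentiles of `values`, or None when the
    percentile list itself is null — instead of raising, which is the
    NullPointerException being fixed."""
    if ps is None or not values:
        return None
    xs = sorted(values)
    out = []
    for p in ps:
        idx = p * (len(xs) - 1)          # fractional position in the sorted sample
        lo = int(idx)
        hi = min(lo + 1, len(xs) - 1)
        frac = idx - lo
        out.append(xs[lo] * (1 - frac) + xs[hi] * frac)
    return out

print(percentiles([1, 2, 3, 4, 5], [0.5]))   # [3.0]
print(percentiles([1, 2, 3, 4, 5], None))    # None
```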
[jira] [Created] (HIVE-2298) Fix UDAFPercentile to tolerate null percentiles
Fix UDAFPercentile to tolerate null percentiles --- Key: HIVE-2298 URL: https://issues.apache.org/jira/browse/HIVE-2298 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2298.patch UDAFPercentile when passed null percentile list will throw a null pointer exception. Submitting a small fix for that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2298) Fix UDAFPercentile to tolerate null percentiles
[ https://issues.apache.org/jira/browse/HIVE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2298: --- Status: Patch Available (was: Open) Fix UDAFPercentile to tolerate null percentiles --- Key: HIVE-2298 URL: https://issues.apache.org/jira/browse/HIVE-2298 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2298.patch UDAFPercentile when passed null percentile list will throw a null pointer exception. Submitting a small fix for that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2299) Optimize Hive query startup time for multiple partitions
Optimize Hive query startup time for multiple partitions Key: HIVE-2299 URL: https://issues.apache.org/jira/browse/HIVE-2299 Project: Hive Issue Type: Improvement Reporter: Vaibhav Aggarwal Added an optimization to the way input splits are computed. Reduced an O(n^2) operation to O(n) operation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
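The patch itself is not quoted here, but the generic form of such a reduction is replacing a nested scan with a one-pass index (a Python sketch with made-up partition records, not Hive's split-computation code):

```python
def match_paths_quadratic(input_paths, partitions):
    """O(n*m): for every input path, scan the whole partition list."""
    matched = {}
    for path in input_paths:
        for part in partitions:
            if part["location"] == path:
                matched[path] = part["name"]
    return matched

def match_paths_linear(input_paths, partitions):
    """O(n+m): index partitions by location once, then probe per path."""
    by_location = {part["location"]: part["name"] for part in partitions}
    return {p: by_location[p] for p in input_paths if p in by_location}

# both strategies agree; only the work done differs
parts = [{"name": f"ds={i}", "location": f"/w/t/ds={i}"} for i in range(1000)]
paths = [f"/w/t/ds={i}" for i in range(0, 1000, 2)]
assert match_paths_quadratic(paths, parts) == match_paths_linear(paths, parts)
```

For a query touching thousands of partitions, the quadratic form performs millions of comparisons at startup while the indexed form performs one dictionary lookup per path, which is the kind of startup-time win the issue describes.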
[jira] [Updated] (HIVE-2299) Optimize Hive query startup time for multiple partitions
[ https://issues.apache.org/jira/browse/HIVE-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2299: --- Description: Added an optimization to the way input splits are computed. Reduced an O(n^2) operation to O n operation. was: Added an optimization to the way input splits are computed. Reduced an O(n^2) operation to O(n) operation. Optimize Hive query startup time for multiple partitions Key: HIVE-2299 URL: https://issues.apache.org/jira/browse/HIVE-2299 Project: Hive Issue Type: Improvement Reporter: Vaibhav Aggarwal Added an optimization to the way input splits are computed. Reduced an O(n^2) operation to O n operation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2297) Fix NPE in ConditionalResolverSkewJoin
[ https://issues.apache.org/jira/browse/HIVE-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2297: --- Attachment: HIVE-2297.patch Fix NPE in ConditionalResolverSkewJoin -- Key: HIVE-2297 URL: https://issues.apache.org/jira/browse/HIVE-2297 Project: Hive Issue Type: Bug Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2297.patch, fix_npe.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2299) Optimize Hive query startup time for multiple partitions
[ https://issues.apache.org/jira/browse/HIVE-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2299: --- Attachment: HIVE-2299.patch Optimize Hive query startup time for multiple partitions Key: HIVE-2299 URL: https://issues.apache.org/jira/browse/HIVE-2299 Project: Hive Issue Type: Improvement Reporter: Vaibhav Aggarwal Attachments: HIVE-2299.patch Added an optimization to the way input splits are computed. Reduced an O(n^2) operation to O(n) operation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2299) Optimize Hive query startup time for multiple partitions
[ https://issues.apache.org/jira/browse/HIVE-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal updated HIVE-2299: --- Assignee: Vaibhav Aggarwal Status: Patch Available (was: Open) Optimize Hive query startup time for multiple partitions Key: HIVE-2299 URL: https://issues.apache.org/jira/browse/HIVE-2299 Project: Hive Issue Type: Improvement Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-2299.patch Added an optimization to the way input splits are computed. Reduced an O(n^2) operation to O(n) operation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2247) ALTER TABLE RENAME PARTITION
[ https://issues.apache.org/jira/browse/HIVE-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069146#comment-13069146 ] jirapos...@reviews.apache.org commented on HIVE-2247: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1105/#review1156 --- Please try to add the new column in the middle first. If that works, we should do that way to make it consistent with alter_table() call. If that doesn't work, it's OK to add it to the end now. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java https://reviews.apache.org/r/1105/#comment2385 Why we still need another function call rename_partition_core()? Can't we just modify alter_partition_core() to always use the same logic? - Siying On 2011-07-21 01:20:25, Weiyan Wang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1105/ bq. --- bq. bq. (Updated 2011-07-21 01:20:25) bq. bq. bq. Review request for Siying Dong. bq. bq. bq. Summary bq. --- bq. bq. Implement ALTER TABLE PARTITION RENAME function to rename a partition. bq. Add HiveQL syntax ALTER TABLE bar PARTITION (k1='v1', k2='v2') RENAME TO PARTITION (k1='v3', k2='v4'); bq. This is my first Hive diff, I just learn everything from existing codebase and may not have a good understanding on it. bq. Feel free to inform me if I make something wrong. Thanks bq. bq. bq. This addresses bug HIVE-2247. bq. https://issues.apache.org/jira/browse/HIVE-2247 bq. bq. bq. Diffs bq. - bq. bq.trunk/metastore/if/hive_metastore.thrift 1145366 bq.trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 1145366 bq.trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 1145366 bq. trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp 1145366 bq. trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java 1145366 bq. 
trunk/metastore/src/gen/thrift/gen-php/hive_metastore/ThriftHiveMetastore.php 1145366 bq. trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote 1145366 bq. trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 1145366 bq.trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 1145366 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java 1145366 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1145366 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 1145366 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 1145366 bq. trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1145366 bq.trunk/metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 1145366 bq. trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 1145366 bq.trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1145366 bq.trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1145366 bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1145366 bq.trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1145366 bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 1145366 bq.trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/AlterTableDesc.java 1145366 bq.trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/DDLWork.java 1145366 bq.trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java 1145366 bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/RenamePartitionDesc.java PRE-CREATION bq. trunk/ql/src/test/queries/clientnegative/alter_rename_partition_failure.q PRE-CREATION bq. trunk/ql/src/test/queries/clientnegative/alter_rename_partition_failure2.q PRE-CREATION bq. 
trunk/ql/src/test/queries/clientnegative/alter_rename_partition_failure3.q PRE-CREATION bq.trunk/ql/src/test/queries/clientpositive/alter_rename_partition.q PRE-CREATION bq. trunk/ql/src/test/queries/clientpositive/alter_rename_partition_authorization.q PRE-CREATION bq. trunk/ql/src/test/results/clientnegative/alter_rename_partition_failure.q.out PRE-CREATION bq. trunk/ql/src/test/results/clientnegative/alter_rename_partition_failure2.q.out PRE-CREATION bq. trunk/ql/src/test/results/clientnegative/alter_rename_partition_failure3.q.out PRE-CREATION bq.
[jira] [Updated] (HIVE-2086) Add test coverage for external table data loss issue
[ https://issues.apache.org/jira/browse/HIVE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2086: - Summary: Add test coverage for external table data loss issue (was: Data loss with external table) Add test coverage for external table data loss issue Key: HIVE-2086 URL: https://issues.apache.org/jira/browse/HIVE-2086 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Environment: Amazon Elastic MapReduce cluster Reporter: Q Long Assignee: Jonathan Natkins Attachments: HIVE-2086.1.patch, HIVE-2086.2.patch, HIVE-2086.3.patch, create_like.q.out Data loss when using the create external table like statement. 1) Set up an external table S pointing to location L. Populate data in S. 2) Create another external table T using a statement like this: create external table T like S location L. Make sure table T points to the same location as the original table S. 3) Query table T and see the same set of data as in S. 4) Drop table T. 5) Querying table S now returns nothing, and location L is deleted. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
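The invariant the scenario above violates can be sketched in miniature (the names below are hypothetical, not Hive's metastore API): dropping an EXTERNAL table must leave its storage location untouched, so a second table sharing that location still sees the data.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical miniature of the HIVE-2086 drop semantics: only managed
// (non-external) tables own their data directory, so only they delete it.
// The "filesystem" here is just a set of existing paths for illustration.
public class DropSemantics {
    static class Table {
        final String location;
        final boolean external;
        Table(String location, boolean external) {
            this.location = location;
            this.external = external;
        }
    }

    static void drop(Table t, Set<String> filesystem) {
        if (!t.external) {
            filesystem.remove(t.location); // managed table: data is deleted
        }                                  // external table: data survives
    }

    public static void main(String[] args) {
        Set<String> fs = new HashSet<>();
        fs.add("/data/L");
        Table s = new Table("/data/L", true);           // external table S at L
        Table t = new Table("/data/L", true);           // T created LIKE S, same L
        drop(t, fs);                                    // dropping T...
        System.out.println(fs.contains("/data/L"));     // ...must not delete L
    }
}
```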
[jira] [Updated] (HIVE-2086) Add test coverage for external table data loss issue
[ https://issues.apache.org/jira/browse/HIVE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-2086: - Resolution: Fixed Fix Version/s: 0.8.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed to trunk. Thanks Natty! Add test coverage for external table data loss issue Key: HIVE-2086 URL: https://issues.apache.org/jira/browse/HIVE-2086 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Environment: Amazon elastics mapreduce cluster Reporter: Q Long Assignee: Jonathan Natkins Fix For: 0.8.0 Attachments: HIVE-2086.1.patch, HIVE-2086.2.patch, HIVE-2086.3.patch, create_like.q.out Data loss when using create external table like statement. 1) Set up an external table S, point to location L. Populate data in S. 2) Create another external table T, using statement like this: create external table T like S location L Make sure table T point to the same location as the original table S. 3) Query table T, see the same set of data in S. 4) drop table T. 5) Query table S will return nothing, and location L is deleted. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Build failed in Jenkins: Hive-trunk-h0.21 #840
See https://builds.apache.org/job/Hive-trunk-h0.21/840/ -- [...truncated 36020 lines...] [junit] OK [junit] PREHOOK: query: select count(1) as cnt from testhivedrivertable [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/jenkins/hive_2011-07-21_12-25-43_113_324537898771116411/-mr-1 [junit] Total MapReduce jobs = 1 [junit] Launching Job 1 out of 1 [junit] Number of reduce tasks determined at compile time: 1 [junit] In order to change the average load for a reducer (in bytes): [junit] set hive.exec.reducers.bytes.per.reducer=number [junit] In order to limit the maximum number of reducers: [junit] set hive.exec.reducers.max=number [junit] In order to set a constant number of reducers: [junit] set mapred.reduce.tasks=number [junit] Job running in-process (local Hadoop) [junit] Hadoop job information for null: number of mappers: 0; number of reducers: 0 [junit] 2011-07-21 12:25:46,011 null map = 100%, reduce = 100% [junit] Ended Job = job_local_0001 [junit] POSTHOOK: query: select count(1) as cnt from testhivedrivertable [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/jenkins/hive_2011-07-21_12-25-43_113_324537898771116411/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history file=/x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201107211225_1946650877.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] 
POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: load data local inpath '/x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/data/files/kv1.txt' into table testhivedrivertable [junit] PREHOOK: type: LOAD [junit] PREHOOK: Output: default@testhivedrivertable [junit] Copying data from file:/x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/data/files/kv1.txt [junit] Loading data to table default.testhivedrivertable [junit] POSTHOOK: query: load data local inpath '/x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/data/files/kv1.txt' into table testhivedrivertable [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: select * from testhivedrivertable limit 10 [junit] PREHOOK: type: QUERY [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: file:/tmp/jenkins/hive_2011-07-21_12-25-47_339_5931340635384748549/-mr-1 [junit] POSTHOOK: query: select * from testhivedrivertable limit 10 [junit] POSTHOOK: type: QUERY [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: file:/tmp/jenkins/hive_2011-07-21_12-25-47_339_5931340635384748549/-mr-1 [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] PREHOOK: Input: default@testhivedrivertable [junit] PREHOOK: Output: default@testhivedrivertable [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] POSTHOOK: Input: default@testhivedrivertable [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-trunk-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201107211225_522256161.txt [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit] POSTHOOK: query: drop table testhivedrivertable [junit] POSTHOOK: type: DROPTABLE [junit] OK [junit] PREHOOK: query: create table testhivedrivertable (num int) [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: create table testhivedrivertable (num int) [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@testhivedrivertable [junit] OK [junit] PREHOOK: query: drop table testhivedrivertable [junit] PREHOOK: type: DROPTABLE [junit]
Build failed in Jenkins: Hive-trunk-h0.21 #841
See https://builds.apache.org/job/Hive-trunk-h0.21/841/changes Changes: [cws] HIVE-2086. Add test coverage for external table data loss issue (Jonathan Natkins via cws) [cws] HIVE-2139. Enable HiveServer to accept -hiveconf option (Patrick Hunt via cws) -- [...truncated 14342 lines...] [junit] o[2] class = class java.util.HashMap [junit] o = [234, [firstString, secondString], {firstKey=1, secondKey=2}, -234, 1.0, -2.5] [junit] Testing protocol: org.apache.thrift.protocol.TBinaryProtocol [junit] TypeName = struct_hello:int,2bye:arraystring,another:mapstring,int,nhello:int,d:double,nd:double [junit] bytes =x08xffxffx00x00x00xeax0fxffxfex0bx00x00x00x02x00x00x00x0bx66x69x72x73x74x53x74x72x69x6ex67x00x00x00x0cx73x65x63x6fx6ex64x53x74x72x69x6ex67x0dxffxfdx0bx08x00x00x00x02x00x00x00x08x66x69x72x73x74x4bx65x79x00x00x00x01x00x00x00x09x73x65x63x6fx6ex64x4bx65x79x00x00x00x02x08xffxfcxffxffxffx16x04xffxfbx3fxf0x00x00x00x00x00x00x04xffxfaxc0x04x00x00x00x00x00x00x00 [junit] o class = class java.util.ArrayList [junit] o size = 6 [junit] o[0] class = class java.lang.Integer [junit] o[1] class = class java.util.ArrayList [junit] o[2] class = class java.util.HashMap [junit] o = [234, [firstString, secondString], {firstKey=1, secondKey=2}, -234, 1.0, -2.5] [junit] Testing protocol: org.apache.thrift.protocol.TJSONProtocol [junit] TypeName = struct_hello:int,2bye:arraystring,another:mapstring,int,nhello:int,d:double,nd:double [junit] bytes =x7bx22x2dx31x22x3ax7bx22x69x33x32x22x3ax32x33x34x7dx2cx22x2dx32x22x3ax7bx22x6cx73x74x22x3ax5bx22x73x74x72x22x2cx32x2cx22x66x69x72x73x74x53x74x72x69x6ex67x22x2cx22x73x65x63x6fx6ex64x53x74x72x69x6ex67x22x5dx7dx2cx22x2dx33x22x3ax7bx22x6dx61x70x22x3ax5bx22x73x74x72x22x2cx22x69x33x32x22x2cx32x2cx7bx22x66x69x72x73x74x4bx65x79x22x3ax31x2cx22x73x65x63x6fx6ex64x4bx65x79x22x3ax32x7dx5dx7dx2cx22x2dx34x22x3ax7bx22x69x33x32x22x3ax2dx32x33x34x7dx2cx22x2dx35x22x3ax7bx22x64x62x6cx22x3ax31x2ex30x7dx2cx22x2dx36x22x3ax7bx22x64x62x6cx22x3ax2dx32x2ex35x7dx7d [junit] 
bytes in text ={-1:{i32:234},-2:{lst:[str,2,firstString,secondString]},-3:{map:[str,i32,2,{firstKey:1,secondKey:2}]},-4:{i32:-234},-5:{dbl:1.0},-6:{dbl:-2.5}} [junit] o class = class java.util.ArrayList [junit] o size = 6 [junit] o[0] class = class java.lang.Integer [junit] o[1] class = class java.util.ArrayList [junit] o[2] class = class java.util.HashMap [junit] o = [234, [firstString, secondString], {firstKey=1, secondKey=2}, -234, 1.0, -2.5] [junit] Testing protocol: org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol [junit] TypeName = struct_hello:int,2bye:arraystring,another:mapstring,int,nhello:int,d:double,nd:double [junit] bytes =x32x33x34x01x66x69x72x73x74x53x74x72x69x6ex67x02x73x65x63x6fx6ex64x53x74x72x69x6ex67x01x66x69x72x73x74x4bx65x79x03x31x02x73x65x63x6fx6ex64x4bx65x79x03x32x01x2dx32x33x34x01x31x2ex30x01x2dx32x2ex35 [junit] bytes in text =234firstStringsecondStringfirstKey1secondKey2-2341.0-2.5 [junit] o class = class java.util.ArrayList [junit] o size = 6 [junit] o[0] class = class java.lang.Integer [junit] o[1] class = class java.util.ArrayList [junit] o[2] class = class java.util.HashMap [junit] o = [234, [firstString, secondString], {firstKey=1, secondKey=2}, -234, 1.0, -2.5] [junit] Beginning Test testTBinarySortableProtocol: [junit] Testing struct test { double hello} [junit] Testing struct test { i32 hello} [junit] Testing struct test { i64 hello} [junit] Testing struct test { string hello} [junit] Testing struct test { string hello, double another} [junit] Test testTBinarySortableProtocol passed! 
[junit] bytes in text =234 firstStringsecondString firstKey1secondKey2 [junit] compare to=234 firstStringsecondString firstKey1secondKey2 [junit] o class = class java.util.ArrayList [junit] o size = 3 [junit] o[0] class = class java.lang.Integer [junit] o[1] class = class java.util.ArrayList [junit] o[2] class = class java.util.HashMap [junit] o = [234, [firstString, secondString], {firstKey=1, secondKey=2}] [junit] bytes in text =234 firstStringsecondString firstKey1secondKey2 [junit] compare to=234 firstStringsecondString firstKey1secondKey2 [junit] o class = class java.util.ArrayList [junit] o size = 3 [junit] o = [234, null, {firstKey=1, secondKey=2}] [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.305 sec [junit] Running org.apache.hadoop.hive.serde2.lazy.TestLazyArrayMapStruct [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.169 sec [junit] Running org.apache.hadoop.hive.serde2.lazy.TestLazyPrimitive [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.146 sec [junit] Running org.apache.hadoop.hive.serde2.lazy.TestLazySimpleSerDe
[jira] [Commented] (HIVE-2086) Add test coverage for external table data loss issue
[ https://issues.apache.org/jira/browse/HIVE-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069164#comment-13069164 ] Hudson commented on HIVE-2086: -- Integrated in Hive-trunk-h0.21 #841 (See [https://builds.apache.org/job/Hive-trunk-h0.21/841/]) HIVE-2086. Add test coverage for external table data loss issue (Jonathan Natkins via cws) cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1149331 Files : * /hive/trunk/data/files/ext_test * /hive/trunk/ql/src/test/queries/clientpositive/create_like.q * /hive/trunk/ql/src/test/results/clientpositive/create_like.q.out * /hive/trunk/data/files/ext_test/test.dat * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java * /hive/trunk/build-common.xml Add test coverage for external table data loss issue Key: HIVE-2086 URL: https://issues.apache.org/jira/browse/HIVE-2086 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Environment: Amazon elastics mapreduce cluster Reporter: Q Long Assignee: Jonathan Natkins Fix For: 0.8.0 Attachments: HIVE-2086.1.patch, HIVE-2086.2.patch, HIVE-2086.3.patch, create_like.q.out Data loss when using create external table like statement. 1) Set up an external table S, point to location L. Populate data in S. 2) Create another external table T, using statement like this: create external table T like S location L Make sure table T point to the same location as the original table S. 3) Query table T, see the same set of data in S. 4) drop table T. 5) Query table S will return nothing, and location L is deleted. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2139) Enables HiveServer to accept -hiveconf option
[ https://issues.apache.org/jira/browse/HIVE-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069163#comment-13069163 ] Hudson commented on HIVE-2139: -- Integrated in Hive-trunk-h0.21 #841 (See [https://builds.apache.org/job/Hive-trunk-h0.21/841/]) HIVE-2139. Enable HiveServer to accept -hiveconf option (Patrick Hunt via cws) cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1149311 Files : * /hive/trunk/common/src/java/org/apache/hadoop/hive/common/LogUtils.java * /hive/trunk/hwi/src/java/org/apache/hadoop/hive/hwi/HWISessionItem.java * /hive/trunk/service/src/java/org/apache/hadoop/hive/service/HiveServer.java * /hive/trunk/common/src/java/org/apache/hadoop/hive/common/cli/CommonCliOptions.java * /hive/trunk/metastore/ivy.xml * /hive/trunk/bin/ext/metastore.sh * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java * /hive/trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/history/TestHiveHistory.java * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java * /hive/trunk/common/ivy.xml * /hive/trunk/common/build.xml * /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java * /hive/trunk/common/src/java/org/apache/hadoop/hive/common/cli * /hive/trunk/bin/ext/hiveserver.sh Enables HiveServer to accept -hiveconf option - Key: HIVE-2139 URL: https://issues.apache.org/jira/browse/HIVE-2139 Project: Hive Issue Type: Improvement Components: CLI Environment: Linux + CDH3u0 (Hive 0.7.0+27.1-2~lucid-cdh3) Reporter: Kazuki Ohta Assignee: Patrick Hunt Fix For: 0.8.0 Attachments: HIVE-2139.patch, HIVE-2139.patch, HIVE-2139.patch Currently, I'm trying to test HiveHBaseIntegration on HiveServer. But it doesn't seem to accept -hiveconf command. 
{code}
hive --service hiveserver -hiveconf hbase.zookeeper.quorum=hdp0,hdp1,hdp2
Starting Hive Thrift Server
java.lang.NumberFormatException: For input string: -hiveconf
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:449)
    at java.lang.Integer.parseInt(Integer.java:499)
    at org.apache.hadoop.hive.service.HiveServer.main(HiveServer.java:382)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
{code}
Therefore, you need to issue a query like set hbase.zookeeper.quorum=hdp0,hdp1,hdp2 every time. This is not convenient for separating server-side configuration from client-side configuration. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
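The stack trace above suggests the server parsed its first argument as a port number unconditionally. A sketch of an option loop that consumes -hiveconf pairs before falling back to the port argument (hypothetical names, not HiveServer's actual code):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of why "-hiveconf key=value" crashed: treating
// args[0] as the port causes Integer.parseInt("-hiveconf") to throw.
// Recognizing the option first avoids the NumberFormatException.
public class ServerArgs {
    int port = 10000;                               // assumed default port
    Map<String, String> hiveconf = new HashMap<>();

    static ServerArgs parse(String[] args) {
        ServerArgs parsed = new ServerArgs();
        for (int i = 0; i < args.length; i++) {
            if ("-hiveconf".equals(args[i]) && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2); // consume key=value pair
                parsed.hiveconf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                parsed.port = Integer.parseInt(args[i]); // bare arg is the port
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        ServerArgs a = parse(new String[]{"-hiveconf", "hbase.zookeeper.quorum=hdp0,hdp1,hdp2"});
        System.out.println(a.port + " " + a.hiveconf.get("hbase.zookeeper.quorum"));
    }
}
```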
[jira] [Commented] (HIVE-2139) Enables HiveServer to accept -hiveconf option
[ https://issues.apache.org/jira/browse/HIVE-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069184#comment-13069184 ] Patrick Hunt commented on HIVE-2139: Should I update the docs for this? Where? If so any guidelines for doing so? (version differences for example) Enables HiveServer to accept -hiveconf option - Key: HIVE-2139 URL: https://issues.apache.org/jira/browse/HIVE-2139 Project: Hive Issue Type: Improvement Components: CLI Environment: Linux + CDH3u0 (Hive 0.7.0+27.1-2~lucid-cdh3) Reporter: Kazuki Ohta Assignee: Patrick Hunt Fix For: 0.8.0 Attachments: HIVE-2139.patch, HIVE-2139.patch, HIVE-2139.patch Currently, I'm trying to test HiveHBaseIntegration on HiveServer. But it doesn't seem to accept -hiveconf command. {code} hive --service hiveserver -hiveconf hbase.zookeeper.quorum=hdp0,hdp1,hdp2 Starting Hive Thrift Server java.lang.NumberFormatException: For input string: -hiveconf at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) at java.lang.Integer.parseInt(Integer.java:449) at java.lang.Integer.parseInt(Integer.java:499) at org.apache.hadoop.hive.service.HiveServer.main(HiveServer.java:382) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) {code} Therefore, you need to throw the query like set hbase.zookeeper.quorum=hdp0,hdp1,hdp2 everytime. It's not convenient for separating the configuration between server-side and client-side. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-2300) Move TempStatsStore directory to build/test
Move TempStatsStore directory to build/test --- Key: HIVE-2300 URL: https://issues.apache.org/jira/browse/HIVE-2300 Project: Hive Issue Type: Bug Components: Statistics, Testing Infrastructure Reporter: Carl Steinbach -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-1604) Patch to allow variables in Hive
[ https://issues.apache.org/jira/browse/HIVE-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Aggarwal reassigned HIVE-1604: -- Assignee: Vaibhav Aggarwal Patch to allow variables in Hive Key: HIVE-1604 URL: https://issues.apache.org/jira/browse/HIVE-1604 Project: Hive Issue Type: Improvement Components: CLI Reporter: Vaibhav Aggarwal Assignee: Vaibhav Aggarwal Attachments: HIVE-1604.patch Patch to Hive which allows command line substitution. The patch modifies the Hive command line driver and options processor to support the following arguments: hive [-d key=value] [-define key=value] -d Substitution to apply to script -define Substitution to apply to script -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
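The substitution described above can be sketched minimally: each -d key=value defines a variable that replaces ${key} occurrences in the script text. The names here are hypothetical; the actual patch wires this into the Hive CLI driver and options processor.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of command-line variable substitution in the spirit of
// HIVE-1604: variables given as -d key=value replace ${key} in the script.
public class VarSubst {
    static String substitute(String script, Map<String, String> vars) {
        String out = script;
        for (Map.Entry<String, String> e : vars.entrySet()) {
            out = out.replace("${" + e.getKey() + "}", e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("ds", "2011-07-21"); // as if passed: -d ds=2011-07-21
        System.out.println(substitute("SELECT * FROM t WHERE ds='${ds}'", vars));
    }
}
```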
[jira] [Updated] (HIVE-1078) CREATE VIEW followup: CREATE OR REPLACE
[ https://issues.apache.org/jira/browse/HIVE-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Chen updated HIVE-1078: --- Status: Open (was: Patch Available) Fails create_or_replace_view.q CREATE VIEW followup: CREATE OR REPLACE Key: HIVE-1078 URL: https://issues.apache.org/jira/browse/HIVE-1078 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Charles Chen Attachments: HIVE-1078v3.patch, HIVE-1078v4.patch, HIVE-1078v5.patch, HIVE-1078v6.patch, HIVE-1078v7.patch, HIVE-1078v8.patch Currently, replacing a view requires DROP VIEW v; CREATE VIEW v AS new-definition; CREATE OR REPLACE would allow these to be combined into a single operation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request: HIVE-1078: CREATE VIEW followup: CREATE OR REPLACE
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1058/ --- (Updated 2011-07-21 22:07:29.150219) Review request for hive. Changes --- Fix failure of create_or_replace_view.q Summary --- https://issues.apache.org/jira/browse/HIVE-1078 This addresses bug HIVE-1078. https://issues.apache.org/jira/browse/HIVE-1078 Diffs (updated) - http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java 1146902 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1146902 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1146902 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1146902 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1146902 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CreateViewDesc.java 1146902 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_or_replace_view1.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_or_replace_view2.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_or_replace_view3.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_or_replace_view4.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_or_replace_view5.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_or_replace_view6.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_or_replace_view7.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_or_replace_view8.q PRE-CREATION 
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/recursive_view.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/create_or_replace_view.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view1.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view2.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view3.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view4.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view5.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view6.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view7.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view8.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/recursive_view.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/create_or_replace_view.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/create_view.q.out 1146902 Diff: https://reviews.apache.org/r/1058/diff Testing --- Passes unit tests Thanks, Charles
[jira] [Commented] (HIVE-1078) CREATE VIEW followup: CREATE OR REPLACE
[ https://issues.apache.org/jira/browse/HIVE-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069240#comment-13069240 ] jirapos...@reviews.apache.org commented on HIVE-1078: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1058/ --- (Updated 2011-07-21 22:07:29.150219) Review request for hive. Changes --- Fix failure of create_or_replace_view.q Summary --- https://issues.apache.org/jira/browse/HIVE-1078 This addresses bug HIVE-1078. https://issues.apache.org/jira/browse/HIVE-1078 Diffs (updated) - http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java 1146902 http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1146902 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1146902 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 1146902 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1146902 http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CreateViewDesc.java 1146902 http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_or_replace_view1.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_or_replace_view2.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_or_replace_view3.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_or_replace_view4.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_or_replace_view5.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_or_replace_view6.q PRE-CREATION 
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_or_replace_view7.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/create_or_replace_view8.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/recursive_view.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/create_or_replace_view.q PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view1.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view2.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view3.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view4.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view5.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view6.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view7.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/create_or_replace_view8.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/recursive_view.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/create_or_replace_view.q.out PRE-CREATION http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/create_view.q.out 1146902 Diff: https://reviews.apache.org/r/1058/diff Testing --- Passes unit tests Thanks, Charles CREATE VIEW followup: CREATE OR REPLACE Key: HIVE-1078 URL: https://issues.apache.org/jira/browse/HIVE-1078 Project: Hive Issue Type: Improvement 
Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Charles Chen Attachments: HIVE-1078v3.patch, HIVE-1078v4.patch, HIVE-1078v5.patch, HIVE-1078v6.patch, HIVE-1078v7.patch, HIVE-1078v8.patch, HIVE-1078v9.patch Currently, replacing a view requires DROP VIEW v; CREATE VIEW v AS new-definition; CREATE OR REPLACE would allow these to be combined into a single operation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1078) CREATE VIEW followup: CREATE OR REPLACE
[ https://issues.apache.org/jira/browse/HIVE-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Chen updated HIVE-1078: --- Attachment: HIVE-1078v9.patch CREATE VIEW followup: CREATE OR REPLACE Key: HIVE-1078 URL: https://issues.apache.org/jira/browse/HIVE-1078 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Charles Chen Attachments: HIVE-1078v3.patch, HIVE-1078v4.patch, HIVE-1078v5.patch, HIVE-1078v6.patch, HIVE-1078v7.patch, HIVE-1078v8.patch, HIVE-1078v9.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1078) CREATE VIEW followup: CREATE OR REPLACE
[ https://issues.apache.org/jira/browse/HIVE-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Chen updated HIVE-1078: --- Status: Patch Available (was: Open) CREATE VIEW followup: CREATE OR REPLACE Key: HIVE-1078 URL: https://issues.apache.org/jira/browse/HIVE-1078 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Charles Chen Attachments: HIVE-1078v3.patch, HIVE-1078v4.patch, HIVE-1078v5.patch, HIVE-1078v6.patch, HIVE-1078v7.patch, HIVE-1078v8.patch, HIVE-1078v9.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
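The benefit described in the HIVE-1078 ticket — collapsing the DROP VIEW plus CREATE VIEW pair into a single operation, with no window where the view is missing — can be modeled with a toy in-memory catalog. This is an illustrative sketch only (hypothetical class and method names, not Hive's actual DDLTask code):

```java
import java.util.HashMap;
import java.util.Map;

public class ReplaceViewSketch {
    // Toy catalog: view name -> view definition text.
    static Map<String, String> views = new HashMap<>();

    // CREATE VIEW semantics: fails if the view already exists.
    static void createView(String name, String definition) {
        if (views.containsKey(name)) {
            throw new IllegalStateException("view already exists: " + name);
        }
        views.put(name, definition);
    }

    // CREATE OR REPLACE VIEW semantics: one operation replaces the
    // definition in place, so no intermediate DROP is needed.
    static void createOrReplaceView(String name, String definition) {
        views.put(name, definition);
    }
}
```

With plain CREATE VIEW, redefining `v` requires dropping it first; `createOrReplaceView` makes the redefinition a single step.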
[jira] [Commented] (HIVE-2139) Enables HiveServer to accept -hiveconf option
[ https://issues.apache.org/jira/browse/HIVE-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069245#comment-13069245 ] Carl Steinbach commented on HIVE-2139: -- I think doc updates should go here: https://cwiki.apache.org/confluence/display/Hive/HiveServer Enables HiveServer to accept -hiveconf option - Key: HIVE-2139 URL: https://issues.apache.org/jira/browse/HIVE-2139 Project: Hive Issue Type: Improvement Components: CLI Environment: Linux + CDH3u0 (Hive 0.7.0+27.1-2~lucid-cdh3) Reporter: Kazuki Ohta Assignee: Patrick Hunt Fix For: 0.8.0 Attachments: HIVE-2139.patch, HIVE-2139.patch, HIVE-2139.patch Currently, I'm trying to test HiveHBaseIntegration on HiveServer. But it doesn't seem to accept the -hiveconf option. {code} hive --service hiveserver -hiveconf hbase.zookeeper.quorum=hdp0,hdp1,hdp2 Starting Hive Thrift Server java.lang.NumberFormatException: For input string: -hiveconf at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) at java.lang.Integer.parseInt(Integer.java:449) at java.lang.Integer.parseInt(Integer.java:499) at org.apache.hadoop.hive.service.HiveServer.main(HiveServer.java:382) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) {code} Therefore, you need to issue a query like set hbase.zookeeper.quorum=hdp0,hdp1,hdp2 every time. It's not convenient for separating the configuration between server-side and client-side. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2139) Enables HiveServer to accept -hiveconf option
[ https://issues.apache.org/jira/browse/HIVE-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069251#comment-13069251 ] Patrick Hunt commented on HIVE-2139: Ok, I'll update that. Do you all have some common way to handle documenting the fact that pre-0.8.0 it's one way, post 0.8.0 it's another? Is there an example you can point me to? Enables HiveServer to accept -hiveconf option - Key: HIVE-2139 URL: https://issues.apache.org/jira/browse/HIVE-2139 Project: Hive Issue Type: Improvement Components: CLI Environment: Linux + CDH3u0 (Hive 0.7.0+27.1-2~lucid-cdh3) Reporter: Kazuki Ohta Assignee: Patrick Hunt Fix For: 0.8.0 Attachments: HIVE-2139.patch, HIVE-2139.patch, HIVE-2139.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
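The NumberFormatException in the stack trace above comes from HiveServer.main treating every command-line argument as the Thrift port number, so `-hiveconf` itself gets handed to Integer.parseInt. A minimal sketch of the kind of parsing the fix needs — consume `-hiveconf key=value` pairs and treat only bare arguments as the port — is shown below (hypothetical class and method names; the actual HIVE-2139 patch uses Hive's own option-processing classes):

```java
import java.util.Properties;

public class ArgParseSketch {
    // Parse args of the form: [-hiveconf key=value]... [port]
    // Returns the -hiveconf overrides; writes the port into port[0]
    // (defaulting to 10000, the usual Thrift server port).
    static Properties parse(String[] args, int[] port) {
        Properties overrides = new Properties();
        port[0] = 10000;
        for (int i = 0; i < args.length; i++) {
            if ("-hiveconf".equals(args[i]) && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2); // split key=value once
                overrides.setProperty(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                // Only arguments that are not option values are port numbers.
                port[0] = Integer.parseInt(args[i]);
            }
        }
        return overrides;
    }
}
```

With this shape of loop, `hive --service hiveserver -hiveconf hbase.zookeeper.quorum=hdp0,hdp1,hdp2` no longer reaches Integer.parseInt with the string "-hiveconf".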
[jira] [Updated] (HIVE-2128) Automatic Indexing with multiple tables
[ https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed S. Albiz updated HIVE-2128: Attachment: HIVE-2128.6.patch Automatic Indexing with multiple tables --- Key: HIVE-2128 URL: https://issues.apache.org/jira/browse/HIVE-2128 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.8.0 Reporter: Russell Melick Assignee: Syed S. Albiz Attachments: HIVE-2128.1.patch, HIVE-2128.1.patch, HIVE-2128.2.patch, HIVE-2128.4.patch, HIVE-2128.5.patch, HIVE-2128.6.patch Make automatic indexing work with jobs which access multiple tables. We'll probably need to modify the way that the index input format works in order to associate index formats/files with specific tables. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2128) Automatic Indexing with multiple tables
[ https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed S. Albiz updated HIVE-2128: Status: Patch Available (was: Open) Automatic Indexing with multiple tables --- Key: HIVE-2128 URL: https://issues.apache.org/jira/browse/HIVE-2128 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.8.0 Reporter: Russell Melick Assignee: Syed S. Albiz Attachments: HIVE-2128.1.patch, HIVE-2128.1.patch, HIVE-2128.2.patch, HIVE-2128.4.patch, HIVE-2128.5.patch, HIVE-2128.6.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2128) Automatic Indexing with multiple tables
[ https://issues.apache.org/jira/browse/HIVE-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069299#comment-13069299 ] jirapos...@reviews.apache.org commented on HIVE-2128: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1010/ --- (Updated 2011-07-21 23:52:23.929900) Review request for hive and John Sichi. Changes --- Added order by to testcases. This revealed an existing bug where we would walk the entire operator tree for each task in the task tree in IndexWhereTaskDispatcher. I amended this to only walk the subset of the operator tree in the current task. Summary --- Grab the indexed tables during optimized query generation, grab the associated path URIs, and keep those around in the Configuration object. When the job is passed to ExecDriver, this data is extracted and used in HiveIndexedInputFormat to decide whether to use the index file or delegate to the parent (HiveInputFormat) class. Not sure if this is robust. This addresses bug HIVE-2128. 
https://issues.apache.org/jira/browse/HIVE-2128 Diffs (updated) - ql/src/test/results/clientpositive/index_bitmap_auto_partitioned.q.out 4c9efd1 ql/src/test/results/clientpositive/index_auto_self_join.q.out PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java b9b586e ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java f1ee95d ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 61bbbf5 ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 7c91946 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereProcessor.java dbc489f ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/index/IndexWhereTaskDispatcher.java da084f6 ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java a03a9a6 ql/src/test/queries/clientpositive/index_auto_mult_tables.q PRE-CREATION ql/src/test/queries/clientpositive/index_auto_mult_tables_compact.q PRE-CREATION ql/src/test/queries/clientpositive/index_auto_self_join.q PRE-CREATION ql/src/test/results/clientpositive/index_auto_mult_tables.q.out PRE-CREATION ql/src/test/results/clientpositive/index_auto_mult_tables_compact.q.out PRE-CREATION Diff: https://reviews.apache.org/r/1010/diff Testing --- added new testcase index_auto_mult_tables.q Thanks, Syed Automatic Indexing with multiple tables --- Key: HIVE-2128 URL: https://issues.apache.org/jira/browse/HIVE-2128 Project: Hive Issue Type: Improvement Components: Indexing Affects Versions: 0.8.0 Reporter: Russell Melick Assignee: Syed S. Albiz Attachments: HIVE-2128.1.patch, HIVE-2128.1.patch, HIVE-2128.2.patch, HIVE-2128.4.patch, HIVE-2128.5.patch, HIVE-2128.6.patch Make automatic indexing work with jobs which access multiple tables. We'll probably need to modify the way that the index input format works in order to associate index formats/files with specific tables. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
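The bug Syed describes in the review above — walking the entire operator tree once per task in IndexWhereTaskDispatcher — makes the dispatch cost grow with tasks times operators; scoping the walk to each task's own operators brings it back to the total operator count. A toy illustration of the difference (hypothetical Task structure, not Hive's actual Task/Operator classes):

```java
import java.util.ArrayList;
import java.util.List;

public class TaskWalkSketch {
    static class Task {
        // Operators owned by this task's plan fragment.
        List<String> operators = new ArrayList<>();
    }

    // Amended behavior: each task visits only its own operator subtree,
    // so total visits equal the total number of operators.
    static int visitPerTask(List<Task> tasks) {
        int visits = 0;
        for (Task t : tasks) {
            visits += t.operators.size();
        }
        return visits;
    }

    // Old behavior: every task re-walks every operator in the whole plan,
    // giving tasks * operators visits.
    static int visitWholePlan(List<Task> tasks) {
        int total = 0;
        for (Task t : tasks) {
            total += t.operators.size();
        }
        return tasks.size() * total;
    }
}
```

For a plan with 3 tasks of 2 operators each, the per-task walk does 6 visits where the whole-plan walk does 18.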
[jira] [Commented] (HIVE-1078) CREATE VIEW followup: CREATE OR REPLACE
[ https://issues.apache.org/jira/browse/HIVE-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069304#comment-13069304 ] John Sichi commented on HIVE-1078: -- +1. Will commit when tests pass. CREATE VIEW followup: CREATE OR REPLACE Key: HIVE-1078 URL: https://issues.apache.org/jira/browse/HIVE-1078 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: Charles Chen Attachments: HIVE-1078v3.patch, HIVE-1078v4.patch, HIVE-1078v5.patch, HIVE-1078v6.patch, HIVE-1078v7.patch, HIVE-1078v8.patch, HIVE-1078v9.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2296) bad compressed file names from insert into
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069314#comment-13069314 ] Siying Dong commented on HIVE-2296: --- +1 bad compressed file names from insert into -- Key: HIVE-2296 URL: https://issues.apache.org/jira/browse/HIVE-2296 Project: Hive Issue Type: Bug Affects Versions: 0.8.0 Reporter: Franklin Hu Assignee: Franklin Hu Fix For: 0.8.0 Attachments: hive-2296.1.patch, hive-2296.2.patch When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy in the new files with bad file names: Before INSERT INTO: 00_0.gz After INSERT INTO: 00_0.gz 00_0.gz_copy_1 This causes corrupted output when doing a SELECT * on the table. Correct behavior should be to pick a valid filename such as: 00_0_copy_1.gz -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
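The correct behavior described in HIVE-2296 amounts to splicing the `_copy_N` suffix in before the compression extension rather than appending it after, so readers can still recognize the codec from the file extension. A rough sketch of that renaming rule (hypothetical helper, not the actual HIVE-2296 patch code):

```java
public class CopyFileName {
    // Insert "_copy_N" before the extension so the codec suffix stays last:
    // "00_0.gz" -> "00_0_copy_1.gz", not "00_0.gz_copy_1".
    static String makeCopyName(String name, int copyNumber) {
        int dot = name.indexOf('.');                  // start of the extension, if any
        String stem = dot < 0 ? name : name.substring(0, dot);
        String ext = dot < 0 ? "" : name.substring(dot);
        return stem + "_copy_" + copyNumber + ext;
    }

    public static void main(String[] args) {
        System.out.println(makeCopyName("00_0.gz", 1)); // 00_0_copy_1.gz
    }
}
```

Keeping the `.gz` suffix at the end matters because the input format decides how to decompress a file from its extension; `00_0.gz_copy_1` is read as uncompressed bytes, which is the corruption the ticket reports.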
[jira] [Commented] (HIVE-2139) Enables HiveServer to accept -hiveconf option
[ https://issues.apache.org/jira/browse/HIVE-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069328#comment-13069328 ] Carl Steinbach commented on HIVE-2139: -- @Patrick: Good idea. I created a page on the wiki for notes like this: https://cwiki.apache.org/confluence/display/Hive/HiveChangeLog Please add a blurb there and be sure to include a link back to this ticket. Thanks! Enables HiveServer to accept -hiveconf option - Key: HIVE-2139 URL: https://issues.apache.org/jira/browse/HIVE-2139 Project: Hive Issue Type: Improvement Components: CLI Environment: Linux + CDH3u0 (Hive 0.7.0+27.1-2~lucid-cdh3) Reporter: Kazuki Ohta Assignee: Patrick Hunt Fix For: 0.8.0 Attachments: HIVE-2139.patch, HIVE-2139.patch, HIVE-2139.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request: HIVE-2246: Dedupe tables' column schemas from partitions in the metastore db
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1183/ --- Review request for hive, Ning Zhang and Paul Yang. Summary --- This patch tries to make minimal changes to the API while keeping migration short and somewhat easy to revert. The new schema can be described as follows: - CDS is a table corresponding to Column Descriptor objects. Currently, it only stores a CD_ID. - COLUMNS_V2 is a table corresponding to MFieldSchema objects, or columns. A Column Descriptor holds a list of columns. COLUMNS_V2 has a foreign key to the CD_ID to which it belongs. - SDS was modified to reference a Column Descriptor. So SDS now has a foreign key to a CD_ID which describes its columns. During migration, we create Column Descriptors for tables in a straightforward manner: their columns are now just wrapped inside a column descriptor. The SDS of partitions use their parent table's column descriptor, since currently a partition and its table share the same list of columns. When altering or adding a partition, give it its parent table's column descriptor IF the columns they describe are the same. Otherwise, create a new column descriptor for its columns. When adding or altering a table, create a new column descriptor every time. Whenever you drop a storage descriptor (e.g., when dropping tables or partitions), check to see if the related column descriptor has any other references in the table. That is, check to see if any other storage descriptors point to that column descriptor. If none do, then delete that column descriptor. This check is in place so we don't have unreferenced column descriptors and columns hanging around after schema evolution for tables. This addresses bug HIVE-2246. 
https://issues.apache.org/jira/browse/HIVE-2246 Diffs - trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1148945 trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1148945 trunk/metastore/src/model/package.jdo 1148945 Diff: https://reviews.apache.org/r/1183/diff Testing --- Passes Facebook's regression testing and all existing test cases. In one instance, before migration, the overhead involved with storage descriptors and columns was ~11 GB. After migration, the overhead was ~1.5 GB. Thanks, Sohan
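The drop rule in the summary above — delete a column descriptor only when the last storage descriptor referencing it goes away — can be sketched with a simple in-memory reference map. This is illustrative only; the real ObjectStore performs the equivalent check with a query over the SDS rows:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CdRefCountSketch {
    // cdId -> the set of storage descriptor ids that reference it.
    static Map<Long, Set<Long>> refs = new HashMap<>();

    static void link(long sdId, long cdId) {
        refs.computeIfAbsent(cdId, k -> new HashSet<>()).add(sdId);
    }

    // Drop an SD; delete its CD only when no other SD still points at it.
    // Returns true when the column descriptor itself was deleted.
    static boolean dropStorageDescriptor(long sdId, long cdId) {
        Set<Long> holders = refs.get(cdId);
        if (holders == null) {
            return false; // unknown descriptor, nothing to delete
        }
        holders.remove(sdId);
        if (holders.isEmpty()) {
            refs.remove(cdId); // last reference gone: delete the descriptor
            return true;
        }
        return false;
    }
}
```

Two SDs sharing one CD illustrate the rule: dropping the first SD leaves the CD alive, dropping the second deletes it, which is exactly the "no unreferenced column descriptors hanging around" invariant the review describes.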
[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db
[ https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069389#comment-13069389 ] jirapos...@reviews.apache.org commented on HIVE-2246: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1183/ --- Review request for hive, Ning Zhang and Paul Yang.
Dedupe tables' column schemas from partitions in the metastore db - Key: HIVE-2246 URL: https://issues.apache.org/jira/browse/HIVE-2246 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2246.2.patch We can re-organize the JDO models to reduce space usage to keep the metastore scalable for the future. Currently, partitions are the fastest growing objects in the metastore, and the metastore keeps a separate copy of the columns list for each partition. We can normalize the metastore db by decoupling Columns from Storage Descriptors and not storing duplicate lists of the columns for each partition. An idea is to create an additional level of indirection with a Column Descriptor that has a list of columns. A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution). Partitions and Indexes can reference the same Column Descriptors as their parent table. Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns per table) rows. We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db
[ https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2246: - Description: Note: this patch proposes a schema change, and is therefore incompatible with the current metastore. We can re-organize the JDO models to reduce space usage to keep the metastore scalable for the future. Currently, partitions are the fastest growing objects in the metastore, and the metastore keeps a separate copy of the columns list for each partition. We can normalize the metastore db by decoupling Columns from Storage Descriptors and not storing duplicate lists of the columns for each partition. An idea is to create an additional level of indirection with a Column Descriptor that has a list of columns. A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution). Partitions and Indexes can reference the same Column Descriptors as their parent table. Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns per table) rows. We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors. Please see the latest review board for additional implementation details. was: We can re-organize the JDO models to reduce space usage to keep the metastore scalable for the future. Currently, partitions are the fastest growing objects in the metastore, and the metastore keeps a separate copy of the columns list for each partition. We can normalize the metastore db by decoupling Columns from Storage Descriptors and not storing duplicate lists of the columns for each partition. An idea is to create an additional level of indirection with a Column Descriptor that has a list of columns.
A table has a reference to its latest Column Descriptor (note: a table may have more than one Column Descriptor in the case of schema evolution). Partitions and Indexes can reference the same Column Descriptors as their parent table. Currently, the COLUMNS table in the metastore has roughly (number of partitions + number of tables) * (average number of columns per table) rows. We can reduce this to (number of tables) * (average number of columns per table) rows, while incurring a small cost proportional to the number of tables to store the Column Descriptors. Tags: metastore, schema, JDO Dedupe tables' column schemas from partitions in the metastore db - Key: HIVE-2246 URL: https://issues.apache.org/jira/browse/HIVE-2246 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2246.2.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
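The row-count estimate in the description — roughly (partitions + tables) * average columns before, versus (tables * average columns) plus a per-table descriptor row after — is easy to sanity-check with illustrative sizes. The partition, table, and column counts below are made up for illustration, not taken from the ticket:

```java
public class MetastoreRowEstimate {
    // Rows in COLUMNS when every table and every partition stores
    // its own copy of the column list.
    static long rowsBefore(long partitions, long tables, long avgCols) {
        return (partitions + tables) * avgCols;
    }

    // Rows after dedup: partitions share the table's column descriptor,
    // so only tables store columns (COLUMNS_V2), plus one CDS row per table.
    static long rowsAfter(long tables, long avgCols) {
        return tables * avgCols + tables;
    }

    public static void main(String[] args) {
        // Assumed sizes: 100k partitions, 1k tables, 20 columns on average.
        System.out.println(rowsBefore(100_000, 1_000, 20)); // 2020000
        System.out.println(rowsAfter(1_000, 20));           // 21000
    }
}
```

At these assumed sizes the column storage shrinks by roughly two orders of magnitude, which is consistent in spirit with the ~11 GB to ~1.5 GB overhead reduction reported in the review's testing notes.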
Re: Review Request: HIVE-2246: Dedupe tables' column schemas from partitions in the metastore db
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1183/ --- (Updated 2011-07-22 05:30:29.026246) Review request for hive, Ning Zhang and Paul Yang. Changes --- Adding some files I missed in the last diff. Summary --- This patch tries to make minimal changes to the API while keeping migration short and somewhat easy to revert. The new schema can be described as follows: - CDS is a table corresponding to Column Descriptor objects. Currently, it only stores a CD_ID. - COLUMNS_V2 is a table corresponding to MFieldSchema objects, or columns. A Column Descriptor holds a list of columns. COLUMNS_V2 has a foreign key to the CD_ID to which it belongs. - SDS was modified to reference a Column Descriptor. So SDS now has a foreign key to a CD_ID which describes its columns. During migration, we create Column Descriptors for tables in a straightforward manner: their columns are now just wrapped inside a column descriptor. The SDS of partitions use their parent table's column descriptor, since currently a partition and its table share the same list of columns. When altering or adding a partition, give it its parent table's column descriptor IF the columns they describe are the same. Otherwise, create a new column descriptor for its columns. When adding or altering a table, create a new column descriptor every time. Whenever you drop a storage descriptor (e.g., when dropping tables or partitions), check to see if the related column descriptor has any other references in the table. That is, check to see if any other storage descriptors point to that column descriptor. If none do, then delete that column descriptor. This check is in place so we don't have unreferenced column descriptors and columns hanging around after schema evolution for tables. This addresses bug HIVE-2246. 
https://issues.apache.org/jira/browse/HIVE-2246 Diffs (updated) - trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql PRE-CREATION trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1148945 trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1148945 trunk/metastore/src/model/package.jdo 1148945 Diff: https://reviews.apache.org/r/1183/diff Testing --- Passes Facebook's regression testing and all existing test cases. In one instance, before migration, the overhead involved with storage descriptors and columns was ~11 GB. After migration, the overhead was ~1.5 GB. Thanks, Sohan
[jira] [Updated] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db
[ https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sohan Jain updated HIVE-2246: - Attachment: HIVE-2246.3.patch Adding some missing files that I forgot to svn add Dedupe tables' column schemas from partitions in the metastore db - Key: HIVE-2246 URL: https://issues.apache.org/jira/browse/HIVE-2246 Project: Hive Issue Type: Improvement Components: Metastore Reporter: Sohan Jain Assignee: Sohan Jain Attachments: HIVE-2246.2.patch, HIVE-2246.3.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db
[ https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069392#comment-13069392 ]

jirapos...@reviews.apache.org commented on HIVE-2246:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1183/
---

(Updated 2011-07-22 05:30:29.026246)

Review request for hive, Ning Zhang and Paul Yang.

Changes
---
Adding some files I missed in the last diff.

Summary
---
This patch tries to make minimal changes to the API while keeping migration short and somewhat easy to revert. The new schema can be described as follows:

- CDS is a table corresponding to Column Descriptor objects. Currently, it only stores a CD_ID.
- COLUMNS_V2 is a table corresponding to MFieldSchema objects, or columns. A Column Descriptor holds a list of columns. COLUMNS_V2 has a foreign key to the CD_ID to which it belongs.
- SDS was modified to reference a Column Descriptor, so SDS now has a foreign key to the CD_ID that describes its columns.

During migration, we create Column Descriptors for tables in a straightforward manner: their columns are now just wrapped inside a column descriptor. The SDS of partitions use their parent table's column descriptor, since currently a partition and its table share the same list of columns.

When altering or adding a partition, give it its parent table's column descriptor IF the columns they describe are the same; otherwise, create a new column descriptor for its columns. When adding or altering a table, create a new column descriptor every time.

Whenever you drop a storage descriptor (e.g., when dropping tables or partitions), check to see whether the related column descriptor has any other references in the table. That is, check to see if any other storage descriptors point to that column descriptor. If none do, then delete that column descriptor.
This check is in place so we don't have unreferenced column descriptors and columns hanging around after schema evolution for tables.

This addresses bug HIVE-2246. https://issues.apache.org/jira/browse/HIVE-2246

Diffs (updated)
- trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql PRE-CREATION
- trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1148945
- trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION
- trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1148945
- trunk/metastore/src/model/package.jdo 1148945

Diff: https://reviews.apache.org/r/1183/diff

Testing
---
Passes facebook's regression testing and all existing test cases. In one instance, before migration, the overhead involved with storage descriptors and columns was ~11 GB. After migration, the overhead was ~1.5 GB.

Thanks,
Sohan

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
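The column-descriptor lifecycle rules from the review summary (reuse the parent table's descriptor only when the columns match, always create a fresh descriptor for a table, and delete a descriptor once its last referencing storage descriptor is dropped) can be sketched as a toy model. This is an illustrative Python sketch, not the actual JDO logic in ObjectStore.java; every class and method name here is hypothetical:

```python
import itertools

_cd_ids = itertools.count(1)

class ColumnDescriptor:
    """Toy stand-in for MColumnDescriptor: an id plus a list of columns."""
    def __init__(self, cols):
        self.cd_id = next(_cd_ids)
        self.cols = list(cols)          # list of (name, type) pairs

class StorageDescriptor:
    """Toy stand-in for MStorageDescriptor: references one ColumnDescriptor."""
    def __init__(self, cd):
        self.cd = cd

class MetastoreModel:
    def __init__(self):
        self.sds = []                   # all live storage descriptors
        self.cds = {}                   # cd_id -> ColumnDescriptor

    def _new_cd(self, cols):
        cd = ColumnDescriptor(cols)
        self.cds[cd.cd_id] = cd
        return cd

    def create_table(self, cols):
        # Adding or altering a table always creates a new column descriptor.
        sd = StorageDescriptor(self._new_cd(cols))
        self.sds.append(sd)
        return sd

    def add_partition(self, table_sd, cols):
        # Reuse the parent table's descriptor only if the columns are the same;
        # otherwise create a new descriptor for the partition's columns.
        cd = table_sd.cd if cols == table_sd.cd.cols else self._new_cd(cols)
        sd = StorageDescriptor(cd)
        self.sds.append(sd)
        return sd

    def drop_sd(self, sd):
        # On drop, delete the column descriptor only when no other storage
        # descriptor still points to it, so nothing unreferenced hangs around.
        self.sds.remove(sd)
        if not any(s.cd is sd.cd for s in self.sds):
            del self.cds[sd.cd.cd_id]
```

Under these rules, a table and its matching partitions share one descriptor; a partition whose schema has diverged gets its own; and dropping the last referencing storage descriptor garbage-collects the descriptor, mirroring the check described in the review.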