[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-8292:
--------------------------
    Resolution: Fixed
  Release Note: HIVE-8292: MapRecordSource should obtain its ExecContext from a MapOperator (Gopal V, reviewed by Vikram Dixit)
        Status: Resolved  (was: Patch Available)

Committed to trunk and branch, thanks [~vikram.dixit]!

                Key: HIVE-8292
                URL: https://issues.apache.org/jira/browse/HIVE-8292
            Project: Hive
         Issue Type: Bug
         Components: Query Processor
   Affects Versions: 0.14.0
        Environment: cn105
           Reporter: Mostafa Mokhtar
           Assignee: Gopal V
            Fix For: 0.14.0
        Attachments: 2014_09_29_14_46_04.jfr, HIVE-8292.1.patch, HIVE-8292.2.patch

Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed, non-partitioned files. 50% of the profile is spent in MapOperator.cleanUpInputFileChangedOp: 5% of the CPU in
{code}
Path onepath = normalizePath(onefile);
{code}
and 45% of the CPU in
{code}
onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}
From the profiler:
{code}
Stack Trace                                                Sample Count   Percentage(%)
hive.ql.exec.tez.MapRecordSource.processRow(Object)               5,327          62.348
hive.ql.exec.vector.VectorMapOperator.process(Writable)           5,326          62.336
hive.ql.exec.Operator.cleanUpInputFileChanged()                   4,851          56.777
hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()              4,849          56.753
java.net.URI.relativize(URI)                                      3,903          45.681
java.net.URI.relativize(URI, URI)                                 3,903          45.681
java.net.URI.normalize(String)                                    2,169          25.386
java.net.URI.equal(String, String)                                  526           6.156
java.net.URI.equalIgnoringCase(String, String)                        1           0.012
java.lang.String.substring(int)                                       1           0.012
hive.ql.exec.MapOperator.normalizePath(String)                      506           5.922
org.apache.commons.logging.impl.Log4JLogger.info(Object)             32           0.375
java.net.URI.equals(Object)                                          12           0.14
java.util.HashMap$KeySet.iterator()                                   5           0.059
java.util.HashMap.get(Object)                                         4           0.047
java.util.LinkedHashMap.get(Object)                                   3           0.035
hive.ql.exec.Operator.cleanUpInputFileChanged()                       1           0.012
hive.ql.exec.Operator.forward(Object, ObjectInspector)              473           5.536
hive.ql.exec.mr.ExecMapperContext.inputFileChanged()                  1           0.012
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
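For context on why that one-line check dominates the profile: `URI.relativize` returns its argument unchanged when the base is not a prefix, so the expression is really asking "is fpath *not* under onepath?", and it pays for full URI parsing and normalization on every call. The sketch below demonstrates that equivalence and a hypothetical cheaper string-prefix variant; the committed fix instead reuses the MapOperator's ExecContext (per the release note), and `PathMatch`, `notUnder`, and `notUnderFast` are illustrative names, not Hive APIs.

```java
import java.net.URI;

public class PathMatch {
    // The hot expression from MapOperator.cleanUpInputFileChangedOp:
    // URI.relativize returns fpath unchanged when onepath is not a prefix of it,
    // so equals(fpath) means "fpath is NOT under onepath". Each call re-walks and
    // normalizes both URIs, which is the URI.normalize time the profile shows.
    static boolean notUnder(URI onepath, URI fpath) {
        return onepath.relativize(fpath).equals(fpath);
    }

    // Hypothetical cheaper check: once both strings are known to already be
    // normalized, a plain prefix comparison gives the same answer with no
    // per-call URI parsing or normalization.
    static boolean notUnderFast(String onepath, String fpath) {
        String dir = onepath.endsWith("/") ? onepath : onepath + "/";
        return !fpath.startsWith(dir);
    }

    public static void main(String[] args) {
        URI dir = URI.create("hdfs://nn:8020/warehouse/t/ds=1/");
        URI inside = URI.create("hdfs://nn:8020/warehouse/t/ds=1/bucket_0");
        URI outside = URI.create("hdfs://nn:8020/warehouse/t/ds=2/bucket_0");
        System.out.println(notUnder(dir, inside));    // false: bucket file is under the partition dir
        System.out.println(notUnder(dir, outside));   // true: different partition
        System.out.println(notUnderFast(dir.toString(), inside.toString()));   // false
        System.out.println(notUnderFast(dir.toString(), outside.toString()));  // true
    }
}
```

The prefix-check shortcut assumes both paths come from the same normalized source (as they do when MapOperator builds them from the same conf), which is exactly the invariant that makes the repeated normalization redundant.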
[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-8292:
--------------------------
    Attachment: HIVE-8292.2.patch
[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-8292:
--------------------------
      Assignee: Gopal V  (was: Vikram Dixit K)
        Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar updated HIVE-8292:
----------------------------------
    Attachment: HIVE-8292.1.patch

This patch addresses the regression but doesn't handle multiple inputs for SMB join.
[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar updated HIVE-8292:
----------------------------------
      Assignee: Vikram Dixit K  (was: Prasanth J)
[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar updated HIVE-8292:
----------------------------------
      Assignee: Prasanth J  (was: Owen O'Malley)

Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed, non-partitioned files. 50% of the time is spent in these two lines of code in OrcInputFormat.getReader():
{code}
String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY, Long.MAX_VALUE + ":");
ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
{code}
{code}
Stack Trace                                                                                    Sample Count   Percentage(%)
hive.ql.exec.tez.MapRecordSource.pushRecord()                                                         2,981          87.215
org.apache.tez.mapreduce.lib.MRReaderMapred.next()                                                    2,002          58.572
mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)            2,002          58.572
mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()          1,984          58.046
hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)                             1,983          58.016
hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)                          1,891          55.325
hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)                          1,723          50.41
hive.common.ValidTxnListImpl.init(String)                                                               934          27.326
conf.Configuration.get(String, String)                                                                  621          18.169
{code}
Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp: 5% of the CPU in
{code}
Path onepath = normalizePath(onefile);
{code}
and 15% of the CPU in
{code}
onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}
From the profiler:
{code}
Stack Trace                                                                  Sample Count   Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)                 978          28.613
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)             978          28.613
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()                     866          25.336
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()                866          25.336
java.net.URI.relativize(URI)                                                          655          19.163
java.net.URI.relativize(URI, URI)                                                     655          19.163
java.net.URI.normalize(String)                                                        517          15.126
java.net.URI.needsNormalization(String)                                               372          10.884
java.lang.String.charAt(int)                                                          235           6.875
java.net.URI.equal(String, String)                                                     27           0.79
java.lang.StringBuilder.toString()                                                      1           0.029
java.lang.StringBuilder.init()                                                          1           0.029
java.lang.StringBuilder.append(String)                                                  1           0.029
org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)                      167           4.886
org.apache.hadoop.fs.Path.init(String)                                                162           4.74
org.apache.hadoop.fs.Path.initialize(String, String, String, String)                  162           4.74
org.apache.hadoop.fs.Path.normalizePath(String, String)                                97           2.838
org.apache.commons.lang.StringUtils.replace(String, String, String)                    97           2.838
org.apache.commons.lang.StringUtils.replace(String, String, String, int)               97           2.838
java.lang.String.indexOf(String, int)                                                  97           2.838
java.net.URI.init(String, String, String, String, String)                              65           1.902
{code}
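The ValidTxnList half of this profile comes from re-reading the same txn string out of the JobConf and re-parsing it for every split a task opens. Since the string is identical across those calls, one hedged way to remove the cost is to memoize the parsed object keyed on the raw string. Below is a minimal sketch of that idea with a stand-in parser; `TxnListCache` and `ParsedTxnList` are hypothetical names, not the actual Hive classes, and the real ValidTxnListImpl parses an exception list as well.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TxnListCache {
    // Stand-in for ValidTxnListImpl: the cost the profile attributes to
    // ValidTxnListImpl.init(String) is parsing "highWatermark:txn,txn,..."
    // on every OrcInputFormat.getReader() call.
    static final class ParsedTxnList {
        final long highWatermark;
        ParsedTxnList(String s) {
            int colon = s.indexOf(':');
            highWatermark = Long.parseLong(colon < 0 ? s : s.substring(0, colon));
        }
    }

    private static final Map<String, ParsedTxnList> CACHE = new ConcurrentHashMap<>();

    // The txn string is the same for every split the task reads, so parse
    // each distinct value once instead of once per record reader.
    static ParsedTxnList get(String txnString) {
        return CACHE.computeIfAbsent(txnString, ParsedTxnList::new);
    }

    public static void main(String[] args) {
        ParsedTxnList a = get("5000:");
        ParsedTxnList b = get("5000:");
        System.out.println(a.highWatermark); // 5000
        System.out.println(a == b);          // true: second lookup hits the cache
    }
}
```

The same caching shape would also cover the `Configuration.get` samples, since the lookup happens only on a cache miss rather than per split.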
[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar updated HIVE-8292:
----------------------------------
    Description:

Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed, non-partitioned files. 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp: 5% of the CPU in
{code}
Path onepath = normalizePath(onefile);
{code}
and 15% of the CPU in
{code}
onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}
From the profiler:
{code}
Stack Trace                                                                  Sample Count   Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)                 978          28.613
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)             978          28.613
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()                     866          25.336
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()                866          25.336
java.net.URI.relativize(URI)                                                          655          19.163
java.net.URI.relativize(URI, URI)                                                     655          19.163
java.net.URI.normalize(String)                                                        517          15.126
java.net.URI.needsNormalization(String)                                               372          10.884
java.lang.String.charAt(int)                                                          235           6.875
java.net.URI.equal(String, String)                                                     27           0.79
java.lang.StringBuilder.toString()                                                      1           0.029
java.lang.StringBuilder.init()                                                          1           0.029
java.lang.StringBuilder.append(String)                                                  1           0.029
org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)                      167           4.886
org.apache.hadoop.fs.Path.init(String)                                                162           4.74
org.apache.hadoop.fs.Path.initialize(String, String, String, String)                  162           4.74
org.apache.hadoop.fs.Path.normalizePath(String, String)                                97           2.838
org.apache.commons.lang.StringUtils.replace(String, String, String)                    97           2.838
org.apache.commons.lang.StringUtils.replace(String, String, String, int)               97           2.838
java.lang.String.indexOf(String, int)                                                  97           2.838
java.net.URI.init(String, String, String, String, String)                              65           1.902
{code}
[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar updated HIVE-8292:
----------------------------------
    Attachment: 2014_09_29_14_46_04.jfr

Hot function profile
[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar updated HIVE-8292:
----------------------------------
    Description:

Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed, non-partitioned files. 50% of the profile is spent in MapOperator.cleanUpInputFileChangedOp: 5% of the CPU in
{code}
Path onepath = normalizePath(onefile);
{code}
and 45% of the CPU in
{code}
onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}
From the profiler:
{code}
Stack Trace                                                                  Sample Count   Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)                 978          28.613
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)             978          28.613
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()                     866          25.336
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()                866          25.336
java.net.URI.relativize(URI)                                                          655          19.163
java.net.URI.relativize(URI, URI)                                                     655          19.163
java.net.URI.normalize(String)                                                        517          15.126
java.net.URI.needsNormalization(String)                                               372          10.884
java.lang.String.charAt(int)                                                          235           6.875
java.net.URI.equal(String, String)                                                     27           0.79
java.lang.StringBuilder.toString()                                                      1           0.029
java.lang.StringBuilder.init()                                                          1           0.029
java.lang.StringBuilder.append(String)                                                  1           0.029
org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)                      167           4.886
org.apache.hadoop.fs.Path.init(String)                                                162           4.74
org.apache.hadoop.fs.Path.initialize(String, String, String, String)                  162           4.74
org.apache.hadoop.fs.Path.normalizePath(String, String)                                97           2.838
org.apache.commons.lang.StringUtils.replace(String, String, String)                    97           2.838
org.apache.commons.lang.StringUtils.replace(String, String, String, int)               97           2.838
java.lang.String.indexOf(String, int)                                                  97           2.838
java.net.URI.init(String, String, String, String, String)                              65           1.902
{code}
[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar updated HIVE-8292:
----------------------------------
    Description:

Reading from bucketed partitioned tables has significantly higher overhead compared to non-bucketed, non-partitioned files. 50% of the profile is spent in MapOperator.cleanUpInputFileChangedOp: 5% of the CPU in
{code}
Path onepath = normalizePath(onefile);
{code}
and 45% of the CPU in
{code}
onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}
From the profiler:
{code}
Stack Trace                                                Sample Count   Percentage(%)
hive.ql.exec.tez.MapRecordSource.processRow(Object)               5,327          62.348
hive.ql.exec.vector.VectorMapOperator.process(Writable)           5,326          62.336
hive.ql.exec.Operator.cleanUpInputFileChanged()                   4,851          56.777
hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()              4,849          56.753
java.net.URI.relativize(URI)                                      3,903          45.681
java.net.URI.relativize(URI, URI)                                 3,903          45.681
java.net.URI.normalize(String)                                    2,169          25.386
java.net.URI.equal(String, String)                                  526           6.156
java.net.URI.equalIgnoringCase(String, String)                        1           0.012
java.lang.String.substring(int)                                       1           0.012
hive.ql.exec.MapOperator.normalizePath(String)                      506           5.922
org.apache.commons.logging.impl.Log4JLogger.info(Object)             32           0.375
java.net.URI.equals(Object)                                          12           0.14
java.util.HashMap$KeySet.iterator()                                   5           0.059
java.util.HashMap.get(Object)                                         4           0.047
java.util.LinkedHashMap.get(Object)                                   3           0.035
hive.ql.exec.Operator.cleanUpInputFileChanged()                       1           0.012
hive.ql.exec.Operator.forward(Object, ObjectInspector)              473           5.536
hive.ql.exec.mr.ExecMapperContext.inputFileChanged()                  1           0.012
{code}