[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

2014-10-13 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-8292:
--
  Resolution: Fixed
Release Note: HIVE-8292: MapRecordSource should obtain its ExecContext from 
a MapOperator (Gopal V, reviewed by Vikram Dixit)
  Status: Resolved  (was: Patch Available)

Committed to trunk and branch, thanks [~vikram.dixit]!

 Reading from partitioned bucketed tables has high overhead in 
 MapOperator.cleanUpInputFileChangedOp
 ---

 Key: HIVE-8292
 URL: https://issues.apache.org/jira/browse/HIVE-8292
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
 Environment: cn105
Reporter: Mostafa Mokhtar
Assignee: Gopal V
 Fix For: 0.14.0

 Attachments: 2014_09_29_14_46_04.jfr, HIVE-8292.1.patch, 
 HIVE-8292.2.patch


 Reading from bucketed partitioned tables has significantly higher overhead 
 compared to non-bucketed non-partitioned files.
 50% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:
 5% of the CPU is spent in
 {code}
  Path onepath = normalizePath(onefile);
 {code}
 and 45% of the CPU is spent in
 {code}
  onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
 {code}
 From the profiler:
 {code}
 Stack Trace                                                       Sample Count   Percentage(%)
 hive.ql.exec.tez.MapRecordSource.processRow(Object)               5,327          62.348
   hive.ql.exec.vector.VectorMapOperator.process(Writable)         5,326          62.336
     hive.ql.exec.Operator.cleanUpInputFileChanged()               4,851          56.777
       hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()        4,849          56.753
         java.net.URI.relativize(URI)                              3,903          45.681
           java.net.URI.relativize(URI, URI)                       3,903          45.681
             java.net.URI.normalize(String)                        2,169          25.386
             java.net.URI.equal(String, String)                    526            6.156
             java.net.URI.equalIgnoringCase(String, String)        1              0.012
             java.lang.String.substring(int)                       1              0.012
         hive.ql.exec.MapOperator.normalizePath(String)            506            5.922
         org.apache.commons.logging.impl.Log4JLogger.info(Object)  32             0.375
         java.net.URI.equals(Object)                               12             0.14
         java.util.HashMap$KeySet.iterator()                       5              0.059
         java.util.HashMap.get(Object)                             4              0.047
         java.util.LinkedHashMap.get(Object)                       3              0.035
         hive.ql.exec.Operator.cleanUpInputFileChanged()           1              0.012
     hive.ql.exec.Operator.forward(Object, ObjectInspector)        473            5.536
     hive.ql.exec.mr.ExecMapperContext.inputFileChanged()          1              0.012
 {code}
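
For readers unfamiliar with the check quoted above: java.net.URI.relativize returns its argument unchanged when that argument does not sit under the base URI, so comparing the result with fpath amounts to a prefix test on the two paths. The sketch below is only an illustration and not the committed patch. InputPathCheck and Memo are hypothetical names, the HDFS URIs are made up, and it assumes the answer for a given file can be remembered so that the URI work runs once per input file rather than once per row.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class InputPathCheck {

    /** True when file lies under dir (the inverse of the expression in the description). */
    public static boolean isUnder(URI dir, URI file) {
        // relativize() hands back `file` untouched when `dir` is not a prefix of it.
        return !dir.relativize(file).equals(file);
    }

    /** Hypothetical memo: remembers the answer per file so repeated rows skip the URI work. */
    public static final class Memo {
        private final Map<URI, Boolean> seen = new HashMap<>();
        private final URI dir;
        public int misses = 0; // how many times the expensive check actually ran

        public Memo(URI dir) {
            this.dir = dir;
        }

        public boolean isUnder(URI file) {
            Boolean cached = seen.get(file);
            if (cached == null) {
                misses++;
                cached = InputPathCheck.isUnder(dir, file);
                seen.put(file, cached);
            }
            return cached;
        }
    }

    public static void main(String[] args) {
        // Made-up partition directory and bucket files.
        URI dir = URI.create("hdfs://nn:8020/warehouse/t/ds=1");
        URI file = URI.create("hdfs://nn:8020/warehouse/t/ds=1/000000_0");
        URI other = URI.create("hdfs://nn:8020/warehouse/t/ds=2/000000_0");

        Memo memo = new Memo(dir);
        for (int row = 0; row < 1000; row++) {
            memo.isUnder(file); // same file for every row of the split
        }
        System.out.println(isUnder(dir, file) + " " + isUnder(dir, other) + " " + memo.misses);
    }
}
```

The memo is keyed on the file URI, so it is only worthwhile when many consecutive rows come from the same file, which is the situation the profile above describes.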



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

2014-10-08 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-8292:
--
Attachment: HIVE-8292.2.patch

 Reading from partitioned bucketed tables has high overhead in 
 MapOperator.cleanUpInputFileChangedOp
 ---

 Key: HIVE-8292
 URL: https://issues.apache.org/jira/browse/HIVE-8292
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
 Environment: cn105
Reporter: Mostafa Mokhtar
Assignee: Vikram Dixit K
 Fix For: 0.14.0

 Attachments: 2014_09_29_14_46_04.jfr, HIVE-8292.1.patch, 
 HIVE-8292.2.patch







[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

2014-10-08 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-8292:
--
Assignee: Gopal V  (was: Vikram Dixit K)
  Status: Patch Available  (was: Open)

 Reading from partitioned bucketed tables has high overhead in 
 MapOperator.cleanUpInputFileChangedOp
 ---

 Key: HIVE-8292
 URL: https://issues.apache.org/jira/browse/HIVE-8292
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
 Environment: cn105
Reporter: Mostafa Mokhtar
Assignee: Gopal V
 Fix For: 0.14.0

 Attachments: 2014_09_29_14_46_04.jfr, HIVE-8292.1.patch, 
 HIVE-8292.2.patch







[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

2014-10-06 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8292:
--
Attachment: HIVE-8292.1.patch

This patch addresses the regression but doesn't handle multiple inputs for SMB 
join.

 Reading from partitioned bucketed tables has high overhead in 
 MapOperator.cleanUpInputFileChangedOp
 ---

 Key: HIVE-8292
 URL: https://issues.apache.org/jira/browse/HIVE-8292
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
 Environment: cn105
Reporter: Mostafa Mokhtar
Assignee: Vikram Dixit K
 Fix For: 0.14.0

 Attachments: 2014_09_29_14_46_04.jfr, HIVE-8292.1.patch







[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

2014-10-05 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8292:
--
Assignee: Vikram Dixit K  (was: Prasanth J)

 Reading from partitioned bucketed tables has high overhead in 
 MapOperator.cleanUpInputFileChangedOp
 ---

 Key: HIVE-8292
 URL: https://issues.apache.org/jira/browse/HIVE-8292
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
 Environment: cn105
Reporter: Mostafa Mokhtar
Assignee: Vikram Dixit K
 Fix For: 0.14.0

 Attachments: 2014_09_29_14_46_04.jfr







[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8292:
--
Assignee: Prasanth J  (was: Owen O'Malley)

 Reading from partitioned bucketed tables has high overhead in 
 MapOperator.cleanUpInputFileChangedOp
 ---

 Key: HIVE-8292
 URL: https://issues.apache.org/jira/browse/HIVE-8292
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
 Environment: cn105
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
 Fix For: 0.14.0


 Reading from bucketed partitioned tables has significantly higher overhead 
 compared to non-bucketed non-partitioned files.
 50% of the time is spent in these two lines of code in 
 OrcInputFormat.getReader():
 {code}
 String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
     Long.MAX_VALUE + ":");
 ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
 {code}
 {code}
 Stack Trace                                                                                         Sample Count   Percentage(%)
 hive.ql.exec.tez.MapRecordSource.pushRecord()                                                       2,981          87.215
   org.apache.tez.mapreduce.lib.MRReaderMapred.next()                                                2,002          58.572
     mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)      2,002          58.572
       mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()  1,984          58.046
         hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)                   1,983          58.016
           hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)              1,891          55.325
             hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)            1,723          50.41
               hive.common.ValidTxnListImpl.<init>(String)                                           934            27.326
               conf.Configuration.get(String, String)                                                621            18.169
 {code}
 Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:
 5% of the CPU is spent in
 {code}
  Path onepath = normalizePath(onefile);
 {code}
 and 15% of the CPU is spent in
 {code}
  onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
 {code}
 From the profiler:
 {code}
 Stack Trace                                                                                 Sample Count   Percentage(%)
 org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)                       978            28.613
   org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)                 978            28.613
     org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()                       866            25.336
       org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()                866            25.336
         java.net.URI.relativize(URI)                                                        655            19.163
           java.net.URI.relativize(URI, URI)                                                 655            19.163
             java.net.URI.normalize(String)                                                  517            15.126
               java.net.URI.needsNormalization(String)                                       372            10.884
                 java.lang.String.charAt(int)                                                235            6.875
             java.net.URI.equal(String, String)                                              27             0.79
             java.lang.StringBuilder.toString()                                              1              0.029
             java.lang.StringBuilder.<init>()                                                1              0.029
             java.lang.StringBuilder.append(String)                                          1              0.029
         org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)                    167            4.886
           org.apache.hadoop.fs.Path.<init>(String)                                          162            4.74
             org.apache.hadoop.fs.Path.initialize(String, String, String, String)            162            4.74
               org.apache.hadoop.fs.Path.normalizePath(String, String)                       97             2.838
                 org.apache.commons.lang.StringUtils.replace(String, String, String)         97             2.838
                   org.apache.commons.lang.StringUtils.replace(String, String, String, int)  97             2.838
                     java.lang.String.indexOf(String, int)                                   97             2.838
               java.net.URI.<init>(String, String, String, String, String)                   65             1.902
 {code}
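
The two lines quoted from OrcInputFormat.getReader() earlier in this description read a configuration string and parse it each time a reader is opened. As a rough illustration of why that shows up in a profile, and of one generic way to avoid re-parsing an unchanged string, here is a minimal sketch. It is not Hive code and not the fix applied for this issue: ParsedOnce is a hypothetical helper, and parseHighWatermark is a stand-in parser that only assumes the string starts with a number followed by ':'.

```java
import java.util.function.Function;

public class ParsedOnce<T> {
    private final Function<String, T> parser;
    private String lastRaw;   // the string that produced lastParsed
    private T lastParsed;     // cached result of parser.apply(lastRaw)
    public int parses = 0;    // how many times the parser actually ran

    public ParsedOnce(Function<String, T> parser) {
        this.parser = parser;
    }

    /** Re-parses only when the raw string differs from the previous call. Not thread-safe. */
    public T get(String raw) {
        if (lastParsed == null || !raw.equals(lastRaw)) {
            parses++;
            lastParsed = parser.apply(raw);
            lastRaw = raw;
        }
        return lastParsed;
    }

    /** Stand-in parser (an assumption for this sketch): the number before the first ':'. */
    public static Long parseHighWatermark(String raw) {
        return Long.parseLong(raw.substring(0, raw.indexOf(':')));
    }

    public static void main(String[] args) {
        String txnString = Long.MAX_VALUE + ":"; // same default as in the quoted snippet
        ParsedOnce<Long> cache = new ParsedOnce<>(ParsedOnce::parseHighWatermark);
        for (int reader = 0; reader < 1000; reader++) {
            cache.get(txnString); // one call per record reader opened
        }
        System.out.println("parses=" + cache.parses);
    }
}
```

Whether a cache like this is appropriate depends on how often the configuration value changes between readers, which the description does not say.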





[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8292:
--
Description: 
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:

5% of the CPU is spent in
{code}
 Path onepath = normalizePath(onefile);
{code}

and 15% of the CPU is spent in
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}

From the profiler:
{code}
Stack Trace                                                                                 Sample Count   Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)                       978            28.613
  org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)                 978            28.613
    org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()                       866            25.336
      org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()                866            25.336
        java.net.URI.relativize(URI)                                                        655            19.163
          java.net.URI.relativize(URI, URI)                                                 655            19.163
            java.net.URI.normalize(String)                                                  517            15.126
              java.net.URI.needsNormalization(String)                                       372            10.884
                java.lang.String.charAt(int)                                                235            6.875
            java.net.URI.equal(String, String)                                              27             0.79
            java.lang.StringBuilder.toString()                                              1              0.029
            java.lang.StringBuilder.<init>()                                                1              0.029
            java.lang.StringBuilder.append(String)                                          1              0.029
        org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)                    167            4.886
          org.apache.hadoop.fs.Path.<init>(String)                                          162            4.74
            org.apache.hadoop.fs.Path.initialize(String, String, String, String)            162            4.74
              org.apache.hadoop.fs.Path.normalizePath(String, String)                       97             2.838
                org.apache.commons.lang.StringUtils.replace(String, String, String)         97             2.838
                  org.apache.commons.lang.StringUtils.replace(String, String, String, int)  97             2.838
                    java.lang.String.indexOf(String, int)                                   97             2.838
              java.net.URI.<init>(String, String, String, String, String)                   65             1.902
{code}


  was:
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


50% of the time is spent in these two lines of code in 
OrcInputFormat.getReader():
{code}
String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
    Long.MAX_VALUE + ":");
ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
{code}

{code}
Stack Trace                                                                                         Sample Count   Percentage(%)
hive.ql.exec.tez.MapRecordSource.pushRecord()                                                       2,981          87.215
  org.apache.tez.mapreduce.lib.MRReaderMapred.next()                                                2,002          58.572
    mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)      2,002          58.572
      mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()  1,984          58.046
        hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)                   1,983          58.016
          hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)              1,891          55.325
            hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)            1,723          50.41
              hive.common.ValidTxnListImpl.<init>(String)                                           934            27.326
              conf.Configuration.get(String, String)                                                621            18.169
{code}

Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:

5% of the CPU is spent in
{code}
 Path onepath = normalizePath(onefile);
{code}

and 15% of the CPU is spent in
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}

From the profiler:
{code}
Stack Trace                                                                                 Sample Count   Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)                       978            28.613
  org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)                 978            28.613
    org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()                       866            25.336
      org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()                866            25.336
        java.net.URI.relativize(URI)                                                        655            19.163
   

[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8292:
--
Attachment: 2014_09_29_14_46_04.jfr

Hot function profile

 Reading from partitioned bucketed tables has high overhead in 
 MapOperator.cleanUpInputFileChangedOp
 ---

 Key: HIVE-8292
 URL: https://issues.apache.org/jira/browse/HIVE-8292
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
 Environment: cn105
Reporter: Mostafa Mokhtar
Assignee: Prasanth J
 Fix For: 0.14.0

 Attachments: 2014_09_29_14_46_04.jfr







[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8292:
--
Description: 
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


50% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:

5% of the CPU is spent in
{code}
 Path onepath = normalizePath(onefile);
{code}

and 45% of the CPU is spent in
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}

From the profiler:
{code}
Stack Trace                                                                                 Sample Count   Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)                       978            28.613
  org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)                 978            28.613
    org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()                       866            25.336
      org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()                866            25.336
        java.net.URI.relativize(URI)                                                        655            19.163
          java.net.URI.relativize(URI, URI)                                                 655            19.163
            java.net.URI.normalize(String)                                                  517            15.126
              java.net.URI.needsNormalization(String)                                       372            10.884
                java.lang.String.charAt(int)                                                235            6.875
            java.net.URI.equal(String, String)                                              27             0.79
            java.lang.StringBuilder.toString()                                              1              0.029
            java.lang.StringBuilder.<init>()                                                1              0.029
            java.lang.StringBuilder.append(String)                                          1              0.029
        org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)                    167            4.886
          org.apache.hadoop.fs.Path.<init>(String)                                          162            4.74
            org.apache.hadoop.fs.Path.initialize(String, String, String, String)            162            4.74
              org.apache.hadoop.fs.Path.normalizePath(String, String)                       97             2.838
                org.apache.commons.lang.StringUtils.replace(String, String, String)         97             2.838
                  org.apache.commons.lang.StringUtils.replace(String, String, String, int)  97             2.838
                    java.lang.String.indexOf(String, int)                                   97             2.838
              java.net.URI.<init>(String, String, String, String, String)                   65             1.902
{code}


  was:
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:

5% of the CPU is spent in
{code}
 Path onepath = normalizePath(onefile);
{code}

and 15% of the CPU is spent in
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}

From the profiler:
{code}
Stack Trace                                                                                 Sample Count   Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)                       978            28.613
  org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)                 978            28.613
    org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()                       866            25.336
      org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()                866            25.336
        java.net.URI.relativize(URI)                                                        655            19.163
          java.net.URI.relativize(URI, URI)                                                 655            19.163
            java.net.URI.normalize(String)                                                  517            15.126
              java.net.URI.needsNormalization(String)                                       372            10.884
                java.lang.String.charAt(int)                                                235            6.875
            java.net.URI.equal(String, String)                                              27             0.79
            java.lang.StringBuilder.toString()                                              1              0.029
            java.lang.StringBuilder.<init>()                                                1              0.029
            java.lang.StringBuilder.append(String)                                          1              0.029
        org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)                    167            4.886
          org.apache.hadoop.fs.Path.<init>(String)                                          162            4.74
            org.apache.hadoop.fs.Path.initialize(String, String, String, String)            162            4.74

[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

2014-09-29 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-8292:
--
Description: 
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


50% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:

5% of the CPU is spent in
{code}
 Path onepath = normalizePath(onefile);
{code}

and 45% of the CPU is spent in
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}

From the profiler:
{code}
Stack Trace                                                       Sample Count   Percentage(%)
hive.ql.exec.tez.MapRecordSource.processRow(Object)               5,327          62.348
  hive.ql.exec.vector.VectorMapOperator.process(Writable)         5,326          62.336
    hive.ql.exec.Operator.cleanUpInputFileChanged()               4,851          56.777
      hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()        4,849          56.753
        java.net.URI.relativize(URI)                              3,903          45.681
          java.net.URI.relativize(URI, URI)                       3,903          45.681
            java.net.URI.normalize(String)                        2,169          25.386
            java.net.URI.equal(String, String)                    526            6.156
            java.net.URI.equalIgnoringCase(String, String)        1              0.012
            java.lang.String.substring(int)                       1              0.012
        hive.ql.exec.MapOperator.normalizePath(String)            506            5.922
        org.apache.commons.logging.impl.Log4JLogger.info(Object)  32             0.375
        java.net.URI.equals(Object)                               12             0.14
        java.util.HashMap$KeySet.iterator()                       5              0.059
        java.util.HashMap.get(Object)                             4              0.047
        java.util.LinkedHashMap.get(Object)                       3              0.035
        hive.ql.exec.Operator.cleanUpInputFileChanged()           1              0.012
    hive.ql.exec.Operator.forward(Object, ObjectInspector)        473            5.536
    hive.ql.exec.mr.ExecMapperContext.inputFileChanged()          1              0.012
{code}


  was:
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


50% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:

5% of the CPU is spent in
{code}
 Path onepath = normalizePath(onefile);
{code}

and 45% of the CPU is spent in
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}

From the profiler:
{code}
Stack Trace                                                                                 Sample Count   Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)                       978            28.613
  org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)                 978            28.613
    org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()                       866            25.336
      org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()                866            25.336
        java.net.URI.relativize(URI)                                                        655            19.163
          java.net.URI.relativize(URI, URI)                                                 655            19.163
            java.net.URI.normalize(String)                                                  517            15.126
              java.net.URI.needsNormalization(String)                                       372            10.884
                java.lang.String.charAt(int)                                                235            6.875
            java.net.URI.equal(String, String)                                              27             0.79
            java.lang.StringBuilder.toString()                                              1              0.029
            java.lang.StringBuilder.<init>()                                                1              0.029
            java.lang.StringBuilder.append(String)                                          1              0.029
        org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)                    167            4.886
          org.apache.hadoop.fs.Path.<init>(String)                                          162            4.74
            org.apache.hadoop.fs.Path.initialize(String, String, String, String)            162            4.74
              org.apache.hadoop.fs.Path.normalizePath(String, String)                       97             2.838
                org.apache.commons.lang.StringUtils.replace(String, String, String)         97             2.838
                  org.apache.commons.lang.StringUtils.replace(String, String, String, int)  97             2.838
                    java.lang.String.indexOf(String, int)                                   97             2.838
              java.net.URI.<init>(String, String, String, String, String)                   65             1.902
{code}
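
The two profiler tables in this update report different sample counts (5,327 samples at 62.348% versus 978 samples at 28.613% for the same top frame), so the percentages are easier to compare once the implied total sample count of each profile is worked out. The following is a small arithmetic sketch only; ProfileShare and its method names are hypothetical, and the inputs are copied from the tables above.

```java
public class ProfileShare {

    /** Total samples implied by one row: count divided by its fraction of the profile. */
    public static long impliedTotal(long samples, double percent) {
        return Math.round(samples / (percent / 100.0));
    }

    /** Share of the whole profile, in percent, for a frame with the given sample count. */
    public static double share(long samples, long total) {
        return 100.0 * samples / total;
    }

    public static void main(String[] args) {
        long newTotal = impliedTotal(5327, 62.348); // processRow row of the newer profile
        long oldTotal = impliedTotal(978, 28.613);  // processRow row of the older profile

        // cleanUpInputFileChangedOp: 4,849 samples (newer) and 866 samples (older)
        System.out.printf("newer: total=%d cleanUpInputFileChangedOp=%.3f%%%n",
                newTotal, share(4849, newTotal));
        System.out.printf("older: total=%d cleanUpInputFileChangedOp=%.3f%%%n",
                oldTotal, share(866, oldTotal));
    }
}
```

The recomputed shares should agree with the Percentage(%) column to within rounding, which is a quick way to confirm that a reconstructed row still carries the right numbers.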



 Reading from partitioned bucketed tables has high overhead in 
 MapOperator.cleanUpInputFileChangedOp