[jira] [Commented] (HIVE-18362) Introduce a parameter to control the max row number for map join convertion

2018-01-03 Thread wan kun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16309791#comment-16309791
 ] 

wan kun commented on HIVE-18362:


Hi [~gopalv],
What we do is similar, but there are some differences in the implementation. My 
implementation takes the table's or partition's ROW_COUNT information directly 
from the Hive metastore, which does not need any additional calculation.

I also have a few questions:

1. Why do you use NDV instead of ROW_COUNT directly? I think the NDV will be 
lower than the actual number of rows, but the actual memory use is linearly 
related to the number of rows.
2. I'm sorry, I haven't had a Hive 2.* test environment for a while. Hive 
branch-2.* depends on ColStatistics. Can you tell me where ColStatistics comes 
from? Is it necessary to run an extra calculation for the additional column 
statistics before our job?
3. The checkNumberOfEntriesForHashTable function only checks the number of 
entries of one RS at a time. Can it happen that multiple map-side tables are 
loaded into memory together, resulting in an OOM?

There are also two follow-up questions:
1. Is the ConvertJoinMapJoin optimization only used in TezCompiler? Spark uses 
SparkMapJoinOptimizer. Is there no such optimizer for MapReduce?
2. Hive branch-1.2 does not have this part of the code (but the parameter is 
added in hive-default.xml.template, where it should not take effect).
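
For reference, a minimal, self-contained sketch (not the actual ConvertJoinMapJoin 
code) of the row-count gate described above: read the candidate's ROW_COUNT 
("numRows") statistic from the metastore table or partition parameters and refuse 
the conversion once it exceeds the configured limit. The class, method, and 
threshold parameter are illustrative stand-ins for the proposed 
hive.auto.convert.join.max.number.

{code:java}
import java.util.Map;

// Simplified illustration of a row-count-based map-join gate; not Hive source code.
public class MapJoinRowCountGate {

  /**
   * Returns true only when every small-table candidate reports a row count
   * at or below maxRows (the proposed hive.auto.convert.join.max.number).
   * A missing "numRows" statistic disables the conversion, because hash-table
   * memory grows roughly linearly with the number of rows.
   */
  public static boolean canConvertToMapJoin(
      Iterable<Map<String, String>> smallTableParams, long maxRows) {
    for (Map<String, String> params : smallTableParams) {
      String numRows = params.get("numRows"); // row-count stat kept by the metastore
      if (numRows == null || Long.parseLong(numRows) > maxRows) {
        return false; // no stats, or too many rows: keep the common join
      }
    }
    return true;
  }
}
{code}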

> Introduce a parameter to control the max row number for map join convertion
> ---
>
> Key: HIVE-18362
> URL: https://issues.apache.org/jira/browse/HIVE-18362
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: wan kun
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-18362-branch-1.2.patch
>
>
> The compression ratio of an ORC compressed file can be very high in some 
> cases.
> The test table has three int columns, with twelve million records, but the 
> compressed file size is only 4M. Hive will automatically convert the join to 
> a map join, but this will cause a memory overflow. So I think it is better to 
> have a parameter to limit the total number of table records in the map join 
> conversion, and if the total number of records is larger than that, the join 
> cannot be converted to a map join.
> *hive.auto.convert.join.max.number = 250L*
> The default value for this parameter is 250, because that many records 
> occupy about 700M of memory in the client JVM, and 250 records is already 
> a large table for a map join.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18362) Introduce a parameter to control the max row number for map join convertion

2018-01-03 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-18362:
---
Status: Patch Available  (was: In Progress)

> Introduce a parameter to control the max row number for map join convertion
> ---
>
> Key: HIVE-18362
> URL: https://issues.apache.org/jira/browse/HIVE-18362
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
>
> The compression ratio of an ORC compressed file can be very high in some 
> cases.
> The test table has three int columns, with twelve million records, but the 
> compressed file size is only 4M. Hive will automatically convert the join to 
> a map join, but this will cause a memory overflow. So I think it is better to 
> have a parameter to limit the total number of table records in the map join 
> conversion, and if the total number of records is larger than that, the join 
> cannot be converted to a map join.
> *hive.auto.convert.join.max.number = 250L*
> The default value for this parameter is 250, because that many records 
> occupy about 700M of memory in the client JVM, and 250 records is already 
> a large table for a map join.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18362) Introduce a parameter to control the max row number for map join convertion

2018-01-03 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-18362:
---
Attachment: HIVE-18362-branch-1.2.patch

> Introduce a parameter to control the max row number for map join convertion
> ---
>
> Key: HIVE-18362
> URL: https://issues.apache.org/jira/browse/HIVE-18362
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
> Attachments: HIVE-18362-branch-1.2.patch
>
>
> The compression ratio of an ORC compressed file can be very high in some 
> cases.
> The test table has three int columns, with twelve million records, but the 
> compressed file size is only 4M. Hive will automatically convert the join to 
> a map join, but this will cause a memory overflow. So I think it is better to 
> have a parameter to limit the total number of table records in the map join 
> conversion, and if the total number of records is larger than that, the join 
> cannot be converted to a map join.
> *hive.auto.convert.join.max.number = 250L*
> The default value for this parameter is 250, because that many records 
> occupy about 700M of memory in the client JVM, and 250 records is already 
> a large table for a map join.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-18362) Introduce a parameter to control the max row number for map join convertion

2018-01-03 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-18362 started by wan kun.
--
> Introduce a parameter to control the max row number for map join convertion
> ---
>
> Key: HIVE-18362
> URL: https://issues.apache.org/jira/browse/HIVE-18362
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
>
> The compression ratio of an ORC compressed file can be very high in some 
> cases.
> The test table has three int columns, with twelve million records, but the 
> compressed file size is only 4M. Hive will automatically convert the join to 
> a map join, but this will cause a memory overflow. So I think it is better to 
> have a parameter to limit the total number of table records in the map join 
> conversion, and if the total number of records is larger than that, the join 
> cannot be converted to a map join.
> *hive.auto.convert.join.max.number = 250L*
> The default value for this parameter is 250, because that many records 
> occupy about 700M of memory in the client JVM, and 250 records is already 
> a large table for a map join.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18362) Introduce a parameter to control the max row number for map join convertion

2018-01-03 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun reassigned HIVE-18362:
--


> Introduce a parameter to control the max row number for map join convertion
> ---
>
> Key: HIVE-18362
> URL: https://issues.apache.org/jira/browse/HIVE-18362
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
>
> The compression ratio of an ORC compressed file can be very high in some 
> cases.
> The test table has three int columns, with twelve million records, but the 
> compressed file size is only 4M. Hive will automatically convert the join to 
> a map join, but this will cause a memory overflow. So I think it is better to 
> have a parameter to limit the total number of table records in the map join 
> conversion, and if the total number of records is larger than that, the join 
> cannot be converted to a map join.
> *hive.auto.convert.join.max.number = 250L*
> The default value for this parameter is 250, because that many records 
> occupy about 700M of memory in the client JVM, and 250 records is already 
> a large table for a map join.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18055) cache like pattern object using map object in like function

2017-11-15 Thread wan kun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253211#comment-16253211
 ] 

wan kun commented on HIVE-18055:


The LRU cache is very similar to the code in the UDFJson class.
Maybe this LRU map could be a public utility class?
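
For context, the cache in UDFJson is essentially an access-ordered LinkedHashMap 
whose removeEldestEntry bounds its size; a shared utility version of that idea 
could look like the sketch below (class and field names are illustrative, not 
Hive's).

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal bounded LRU map, similar in spirit to the cache inside UDFJson.
public class LruMap<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  public LruMap(int maxEntries) {
    // accessOrder = true: get() moves an entry to the most-recently-used position.
    super(16, 0.75f, true);
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxEntries; // evict the least-recently-used entry
  }
}
{code}

For the LIKE case, the key would be the pattern string and the value the compiled 
java.util.regex.Pattern, so repeated rows against the same column reuse the 
compiled pattern instead of recompiling it.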

> cache like pattern object using map object in like function
> ---
>
> Key: HIVE-18055
> URL: https://issues.apache.org/jira/browse/HIVE-18055
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
> Fix For: 1.2.3
>
> Attachments: HIVE-18055-branch-1.2.patch, 
> HIVE-18055.2-branch-1.2.patch, HIVE-18055.3-branch-1.2.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Now, only one pattern object is cached in the like function. If the like 
> function is working on one column, pattern objects will be generated 
> continuously for the regular expression matching, which is very inefficient. 
> So should we use an LRU map to cache a batch of objects?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18055) cache like pattern object using map object in like function

2017-11-15 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-18055:
---
Attachment: HIVE-18055.3-branch-1.2.patch

> cache like pattern object using map object in like function
> ---
>
> Key: HIVE-18055
> URL: https://issues.apache.org/jira/browse/HIVE-18055
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
> Fix For: 1.2.3
>
> Attachments: HIVE-18055-branch-1.2.patch, 
> HIVE-18055.2-branch-1.2.patch, HIVE-18055.3-branch-1.2.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Now, only one pattern object is cached in the like function. If the like 
> function is working on one column, pattern objects will be generated 
> continuously for the regular expression matching, which is very inefficient. 
> So should we use an LRU map to cache a batch of objects?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18055) cache like pattern object using map object in like function

2017-11-14 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-18055:
---
Attachment: HIVE-18055.2-branch-1.2.patch

> cache like pattern object using map object in like function
> ---
>
> Key: HIVE-18055
> URL: https://issues.apache.org/jira/browse/HIVE-18055
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
> Fix For: 1.2.3
>
> Attachments: HIVE-18055-branch-1.2.patch, 
> HIVE-18055.2-branch-1.2.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Now, only one pattern object is cached in the like function. If the like 
> function is working on one column, pattern objects will be generated 
> continuously for the regular expression matching, which is very inefficient. 
> So should we use an LRU map to cache a batch of objects?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18055) cache like pattern object using map object in like function

2017-11-13 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-18055:
---
Status: Patch Available  (was: In Progress)

> cache like pattern object using map object in like function
> ---
>
> Key: HIVE-18055
> URL: https://issues.apache.org/jira/browse/HIVE-18055
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
> Fix For: 1.2.3
>
> Attachments: HIVE-18055-branch-1.2.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Now, only one pattern object is cached in the like function. If the like 
> function is working on one column, pattern objects will be generated 
> continuously for the regular expression matching, which is very inefficient. 
> So should we use an LRU map to cache a batch of objects?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18055) cache like pattern object using map object in like function

2017-11-13 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-18055:
---
Attachment: HIVE-18055-branch-1.2.patch

> cache like pattern object using map object in like function
> ---
>
> Key: HIVE-18055
> URL: https://issues.apache.org/jira/browse/HIVE-18055
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
> Fix For: 1.2.3
>
> Attachments: HIVE-18055-branch-1.2.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Now, only one pattern object is cached in the like function. If the like 
> function is working on one column, pattern objects will be generated 
> continuously for the regular expression matching, which is very inefficient. 
> So should we use an LRU map to cache a batch of objects?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-18055) cache like pattern object using map object in like function

2017-11-13 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-18055 started by wan kun.
--
> cache like pattern object using map object in like function
> ---
>
> Key: HIVE-18055
> URL: https://issues.apache.org/jira/browse/HIVE-18055
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
> Fix For: 1.2.3
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Now, only one pattern object is cached in the like function. If the like 
> function is working on one column, pattern objects will be generated 
> continuously for the regular expression matching, which is very inefficient. 
> So should we use an LRU map to cache a batch of objects?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18055) cache like pattern object using map object in like function

2017-11-13 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun reassigned HIVE-18055:
--


> cache like pattern object using map object in like function
> ---
>
> Key: HIVE-18055
> URL: https://issues.apache.org/jira/browse/HIVE-18055
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
> Fix For: 1.2.3
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Now, only one pattern object is cached in the like function. If the like 
> function is working on one column, pattern objects will be generated 
> continuously for the regular expression matching, which is very inefficient. 
> So should we use an LRU map to cache a batch of objects?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17985) When check the partitions size in the partitioned table, it will throw NullPointerException

2017-11-08 Thread wan kun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245293#comment-16245293
 ] 

wan kun commented on HIVE-17985:


[~kgyrtkirk]
  I have uploaded the patch file to https://reviews.apache.org/r/63692/
  Could you help me review the code?

> When check the partitions size in the partitioned table, it will throw  
> NullPointerException
> 
>
> Key: HIVE-17985
> URL: https://issues.apache.org/jira/browse/HIVE-17985
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, Physical Optimizer
>Affects Versions: 1.2.2, 2.3.0, 3.0.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 1.2.3, 2.3.1
>
> Attachments: HIVE-17985-branch-1.2.patch, 
> HIVE-17985-branch-2.3.patch, HIVE-17985.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When the hive.limit.query.max.table.partition parameter is set, the 
> SemanticAnalyzer will throw NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17985) When check the partitions size in the partitioned table, it will throw NullPointerException

2017-11-08 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-17985:
---
   Resolution: Fixed
Fix Version/s: (was: 3.0.0)
   2.3.1
   1.2.3
   Status: Resolved  (was: Patch Available)

> When check the partitions size in the partitioned table, it will throw  
> NullPointerException
> 
>
> Key: HIVE-17985
> URL: https://issues.apache.org/jira/browse/HIVE-17985
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, Physical Optimizer
>Affects Versions: 1.2.2, 2.3.0, 3.0.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 1.2.3, 2.3.1
>
> Attachments: HIVE-17985-branch-1.2.patch, 
> HIVE-17985-branch-2.3.patch, HIVE-17985.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When the hive.limit.query.max.table.partition parameter is set, the 
> SemanticAnalyzer will throw NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17985) When check the partitions size in the partitioned table, it will throw NullPointerException

2017-11-06 Thread wan kun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16241425#comment-16241425
 ] 

wan kun commented on HIVE-17985:


Hi [~kgyrtkirk], thank you for your reply!
I have reproduced this issue with the qtest table srcpart.

{code:sql}
set hive.limit.query.max.table.partition=1;

CREATE TABLE srcpart_like (key STRING COMMENT 'default', value STRING COMMENT 
'default')
PARTITIONED BY (ds STRING, hr STRING)
STORED AS TEXTFILE;

INSERT OVERWRITE TABLE srcpart_like PARTITION (ds='2008-04-08', hr='12')
select key,value from srcpart;
{code}

> When check the partitions size in the partitioned table, it will throw  
> NullPointerException
> 
>
> Key: HIVE-17985
> URL: https://issues.apache.org/jira/browse/HIVE-17985
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, Physical Optimizer
>Affects Versions: 1.2.2, 2.3.0, 3.0.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 3.0.0
>
> Attachments: HIVE-17985-branch-1.2.patch, 
> HIVE-17985-branch-2.3.patch, HIVE-17985.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When the hive.limit.query.max.table.partition parameter is set, the 
> SemanticAnalyzer will throw NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17985) When check the partitions size in the partitioned table, it will throw NullPointerException

2017-11-05 Thread wan kun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239985#comment-16239985
 ] 

wan kun commented on HIVE-17985:


I have just fixed a NullPointerException, but the build has some unexpected 
errors that I have never met before.
Could [~thejas] or [~gopalv] take a look?

Many thanks!

> When check the partitions size in the partitioned table, it will throw  
> NullPointerException
> 
>
> Key: HIVE-17985
> URL: https://issues.apache.org/jira/browse/HIVE-17985
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, Physical Optimizer
>Affects Versions: 1.2.2, 2.3.0, 3.0.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 3.0.0
>
> Attachments: HIVE-17985-branch-1.2.patch, 
> HIVE-17985-branch-2.3.patch, HIVE-17985.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When the hive.limit.query.max.table.partition parameter is set, the 
> SemanticAnalyzer will throw NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17974) If the job resource jar already exists in the HDFS fileSystem, do not upload!

2017-11-05 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-17974:
---
Attachment: HIVE-17974.2-branch-1.2.patch

> If the job resource jar already exists in the HDFS fileSystem, do not upload!
> -
>
> Key: HIVE-17974
> URL: https://issues.apache.org/jira/browse/HIVE-17974
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Query Processor, Tez
>Affects Versions: 1.2.2
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
> Fix For: 1.2.3
>
> Attachments: HIVE-17974-branch-1.2.patch, 
> HIVE-17974.2-branch-1.2.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> For an MR or Tez application, if the jar resources already exist on HDFS, 
> the application will still upload the jars to HDFS when it starts. I think 
> this is not needed.
> So, if the original resource file is already on HDFS, I will record it, and 
> when the application starts, it will use the original file on HDFS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17985) When check the partitions size in the partitioned table, it will throw NullPointerException

2017-11-05 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-17985:
---
Attachment: HIVE-17985-branch-2.3.patch

> When check the partitions size in the partitioned table, it will throw  
> NullPointerException
> 
>
> Key: HIVE-17985
> URL: https://issues.apache.org/jira/browse/HIVE-17985
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, Physical Optimizer
>Affects Versions: 1.2.2, 2.3.0, 3.0.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 3.0.0
>
> Attachments: HIVE-17985-branch-1.2.patch, 
> HIVE-17985-branch-2.3.patch, HIVE-17985.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When the hive.limit.query.max.table.partition parameter is set, the 
> SemanticAnalyzer will throw NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17985) When check the partitions size in the partitioned table, it will throw NullPointerException

2017-11-05 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-17985:
---
Attachment: HIVE-17985-branch-1.2.patch

> When check the partitions size in the partitioned table, it will throw  
> NullPointerException
> 
>
> Key: HIVE-17985
> URL: https://issues.apache.org/jira/browse/HIVE-17985
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, Physical Optimizer
>Affects Versions: 1.2.2, 2.3.0, 3.0.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 3.0.0
>
> Attachments: HIVE-17985-branch-1.2.patch, HIVE-17985.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When the hive.limit.query.max.table.partition parameter is set, the 
> SemanticAnalyzer will throw NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17985) When check the partitions size in the partitioned table, it will throw NullPointerException

2017-11-05 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-17985:
---
Attachment: HIVE-17985.patch

> When check the partitions size in the partitioned table, it will throw  
> NullPointerException
> 
>
> Key: HIVE-17985
> URL: https://issues.apache.org/jira/browse/HIVE-17985
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, Physical Optimizer
>Affects Versions: 1.2.2, 2.3.0, 3.0.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 3.0.0
>
> Attachments: HIVE-17985.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When the hive.limit.query.max.table.partition parameter is set, the 
> SemanticAnalyzer will throw NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17985) When check the partitions size in the partitioned table, it will throw NullPointerException

2017-11-05 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-17985:
---
Status: Patch Available  (was: Open)

> When check the partitions size in the partitioned table, it will throw  
> NullPointerException
> 
>
> Key: HIVE-17985
> URL: https://issues.apache.org/jira/browse/HIVE-17985
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, Physical Optimizer
>Affects Versions: 2.3.0, 1.2.2, 3.0.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 3.0.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When the hive.limit.query.max.table.partition parameter is set, the 
> SemanticAnalyzer will throw NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17985) When check the partitions size in the partitioned table, it will throw NullPointerException

2017-11-05 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun reassigned HIVE-17985:
--


> When check the partitions size in the partitioned table, it will throw  
> NullPointerException
> 
>
> Key: HIVE-17985
> URL: https://issues.apache.org/jira/browse/HIVE-17985
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, Physical Optimizer
>Affects Versions: 2.3.0, 1.2.2, 3.0.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 3.0.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When the hive.limit.query.max.table.partition parameter is set, the 
> SemanticAnalyzer will throw NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17574) Avoid multiple copies of HDFS-based jars when localizing job-jars

2017-11-03 Thread wan kun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238753#comment-16238753
 ] 

wan kun commented on HIVE-17574:


Hi [~mithun], [~cdrome]:
I have some questions and look forward to your advice:
1. MapReduce jobs have a similar problem with tmpJars. I think we can also 
use tmpJars files that are already on HDFS.
2. For the destFS.copyFromLocalFile call in the Tez DagUtils class, if the 
source file system and the target file system are both the same HDFS, could 
the upload be skipped? Then the jars would not be uploaded again when MR jobs 
are submitted.
3. Could we set the resources' permission to PUBLIC, so they would be 
downloaded only once by the NodeManager?

Thank you
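
A minimal sketch of the check implied by question 2 (illustrative only, not the 
DagUtils code; the class and method names are hypothetical): compare the source 
and destination file systems and skip the upload when the resource already lives 
on the destination; otherwise copy it once and reuse it.

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Hypothetical helper; not part of Hive.
public class ResourceLocalizer {

  /** Returns a path on the destination file system, copying only when needed. */
  public static Path ensureOnDestFs(Path src, Path destDir, Configuration conf)
      throws IOException {
    FileSystem srcFs = src.getFileSystem(conf);
    FileSystem destFs = destDir.getFileSystem(conf);

    // Same file system (e.g. both the cluster HDFS): reuse the resource in place.
    if (srcFs.getUri().equals(destFs.getUri())) {
      return src;
    }

    Path target = new Path(destDir, src.getName());
    if (!destFs.exists(target)) {
      // Copy once; later jobs that resolve the same target path skip this branch.
      FileUtil.copy(srcFs, src, destFs, target, false, conf);
    }
    return target;
  }
}
{code}

For question 3, YARN's LocalResourceVisibility.PUBLIC would let the NodeManager 
share one localized copy across applications, at the cost of requiring 
world-readable permissions on the source files.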

> Avoid multiple copies of HDFS-based jars when localizing job-jars
> -
>
> Key: HIVE-17574
> URL: https://issues.apache.org/jira/browse/HIVE-17574
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
>Priority: Major
> Attachments: HIVE-17574.1-branch-2.2.patch, 
> HIVE-17574.1-branch-2.patch, HIVE-17574.1.patch, HIVE-17574.2.patch
>
>
> Raising this on behalf of [~selinazh]. (For my own reference: YHIVE-1035.)
> This has to do with the classpaths of Hive actions run from Oozie, and 
> affects scripts that adds jars/resources from HDFS locations.
> As part of Oozie's "sharelib" deploys, foundation jars (such as Hive jars) 
> tend to be stored in HDFS paths, as are any custom user-libraries used in 
> workflows. An {{ADD JAR|FILE|ARCHIVE}} statement in a Hive script causes the 
> following steps to occur:
> # Files are downloaded from HDFS to local temp dir.
> # UDFs are resolved/validated.
> # All jars/files, including those just downloaded from HDFS, are shipped 
> right back to HDFS-based scratch-directories, for job submission.
> For HDFS-based files, this is wasteful and time-consuming. #3 above should 
> skip shipping HDFS-based resources, and add those directly to the Tez session.
> We have a patch that's being used internally at Yahoo.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17974) If the job resource jar already exists in the HDFS fileSystem, do not upload!

2017-11-03 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-17974:
---
Status: Patch Available  (was: Open)

> If the job resource jar already exists in the HDFS fileSystem, do not upload!
> -
>
> Key: HIVE-17974
> URL: https://issues.apache.org/jira/browse/HIVE-17974
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Query Processor, Tez
>Affects Versions: 1.2.2
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
> Fix For: 1.2.3
>
> Attachments: HIVE-17974-branch-1.2.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> For an MR or Tez application, if the jar resources already exist on HDFS, 
> the application will still upload the jars to HDFS when it starts. I think 
> this is not needed.
> So, if the original resource file is already on HDFS, I will record it, and 
> when the application starts, it will use the original file on HDFS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17974) If the job resource jar already exists in the HDFS fileSystem, do not upload!

2017-11-03 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-17974:
---
Attachment: (was: HIVE-17974-branch-1.2.3.patch)

> If the job resource jar already exists in the HDFS fileSystem, do not upload!
> -
>
> Key: HIVE-17974
> URL: https://issues.apache.org/jira/browse/HIVE-17974
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Query Processor, Tez
>Affects Versions: 1.2.2
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
> Fix For: 1.2.3
>
> Attachments: HIVE-17974-branch-1.2.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> For an MR or Tez application, if the jar resources already exist on HDFS, 
> the application will still upload the jars to HDFS when it starts. I think 
> this is not needed.
> So, if the original resource file is already on HDFS, I will record it, and 
> when the application starts, it will use the original file on HDFS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17974) If the job resource jar already exists in the HDFS fileSystem, do not upload!

2017-11-03 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-17974:
---
Attachment: HIVE-17974-branch-1.2.patch

> If the job resource jar already exists in the HDFS fileSystem, do not upload!
> -
>
> Key: HIVE-17974
> URL: https://issues.apache.org/jira/browse/HIVE-17974
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Query Processor, Tez
>Affects Versions: 1.2.2
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
> Fix For: 1.2.3
>
> Attachments: HIVE-17974-branch-1.2.3.patch, 
> HIVE-17974-branch-1.2.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> For an MR or Tez application, if the jar resources already exist on HDFS, 
> the application will still upload the jars to HDFS when it starts. I think 
> this is not needed.
> So, if the original resource file is already on HDFS, I will record it, and 
> when the application starts, it will use the original file on HDFS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17974) If the job resource jar already exists in the HDFS fileSystem, do not upload!

2017-11-03 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-17974:
---
Attachment: HIVE-17974-branch-1.2.3.patch

> If the job resource jar already exists in the HDFS fileSystem, do not upload!
> -
>
> Key: HIVE-17974
> URL: https://issues.apache.org/jira/browse/HIVE-17974
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Query Processor, Tez
>Affects Versions: 1.2.2
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
> Fix For: 1.2.3
>
> Attachments: HIVE-17974-branch-1.2.3.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> For an MR or Tez application, if the jar resources already exist on HDFS, 
> the application will still upload the jars to HDFS when it starts. I think 
> this is not needed.
> So, if the original resource file is already on HDFS, I will record it, and 
> when the application starts, it will use the original file on HDFS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17974) If the job resource jar already exists in the HDFS fileSystem, do not upload!

2017-11-03 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun reassigned HIVE-17974:
--


> If the job resource jar already exists in the HDFS fileSystem, do not upload!
> -
>
> Key: HIVE-17974
> URL: https://issues.apache.org/jira/browse/HIVE-17974
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Query Processor, Tez
>Affects Versions: 1.2.2
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
> Fix For: 1.2.3
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> For an MR or Tez application, if the jar resources already exist on HDFS, 
> the application will still upload the jars to HDFS when it starts. I think 
> this is not needed.
> So, if the original resource file is already on HDFS, I will record it, and 
> when the application starts, it will use the original file on HDFS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-03-08 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: HIVE-15944.8.patch

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.0, 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944.1.patch, 
> HIVE-15944.2-branch-1.1.1.path.erroroutput, HIVE-15944.3.patch, 
> HIVE-15944.4-branch-1.1.1.patch, HIVE-15944.4.patch, HIVE-15944.5.patch, 
> HIVE-15944.6.patch, HIVE-15944.7.patch, HIVE-15944.8.patch, 
> HIVE-15944-branch-1.1.patch, HIVE-15944.patch, STAGE_DEPENDENCIES
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL statement has two jobs:
> job 1: the order of the cols is updated in ColumnPrunerReduceSinkProc because 
> of the sort operator.
> job 2 will hit a read error in the map operation because the col order is the 
> old one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-03-07 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: HIVE-15944.7.patch

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.0, 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944.1.patch, 
> HIVE-15944.2-branch-1.1.1.path.erroroutput, HIVE-15944.3.patch, 
> HIVE-15944.4-branch-1.1.1.patch, HIVE-15944.4.patch, HIVE-15944.5.patch, 
> HIVE-15944.6.patch, HIVE-15944.7.patch, HIVE-15944-branch-1.1.patch, 
> HIVE-15944.patch, STAGE_DEPENDENCIES
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL statement has two jobs:
> job 1: the order of the cols is updated in ColumnPrunerReduceSinkProc because 
> of the sort operator.
> job 2 will hit a read error in the map operation because the col order is the 
> old one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-03-04 Thread wan kun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895683#comment-15895683
 ] 

wan kun commented on HIVE-15944:


[~ashutoshc] I find that my test case is somewhat special.

In the semantic analysis stage, the plan tree is:

TS[0]-FIL[17]-SEL[2]-LIM[3]-RS[4] -SEL[5]-LIM[6]-RS[11] -JOIN[14]-SEL[15]-FS[16]
TS[7]-FIL[18]-SEL[9]-RS[13] -JOIN[14]

but when it is compiled into MR jobs, Hive will add FIL[19] and TS[20] between 
the LIM[6] and RS[11] operators.

FIL[19] will take its schema from LIM[6], but LIM[6] does not have the right 
output cols.

So I see two ways to solve this problem:
1. When generating the MR jobs, use an if condition to get the right file sink schema.
2. Update the LIM operator's schema during semantic optimization.

For now I am trying to fix this bug the first way. Do you have any better suggestions?


The HIVE-15944.6.patch run also failed with "No space left on device"; could you 
have a look at the output message?

Many thanks!


> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.0, 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944.1.patch, 
> HIVE-15944.2-branch-1.1.1.path.erroroutput, HIVE-15944.3.patch, 
> HIVE-15944.4-branch-1.1.1.patch, HIVE-15944.4.patch, HIVE-15944.5.patch, 
> HIVE-15944.6.patch, HIVE-15944-branch-1.1.patch, HIVE-15944.patch, 
> STAGE_DEPENDENCIES
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL statement has two jobs:
> job 1: the order of the cols is updated in ColumnPrunerReduceSinkProc because 
> of the sort operator.
> job 2 will hit a read error in the map operation because the col order is the 
> old one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-03-02 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: HIVE-15944.6.patch

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.0, 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944.1.patch, 
> HIVE-15944.2-branch-1.1.1.path.erroroutput, HIVE-15944.3.patch, 
> HIVE-15944.4-branch-1.1.1.patch, HIVE-15944.4.patch, HIVE-15944.5.patch, 
> HIVE-15944.6.patch, HIVE-15944-branch-1.1.patch, HIVE-15944.patch, 
> STAGE_DEPENDENCIES
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL statement has two jobs:
> job 1: the order of the cols is updated in ColumnPrunerReduceSinkProc because 
> of the sort operator.
> job 2 will hit a read error in the map operation because the col order is the 
> old one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-03-02 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: HIVE-15944.5.patch

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.0, 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944.1.patch, 
> HIVE-15944.2-branch-1.1.1.path.erroroutput, HIVE-15944.3.patch, 
> HIVE-15944.4-branch-1.1.1.patch, HIVE-15944.4.patch, HIVE-15944.5.patch, 
> HIVE-15944-branch-1.1.patch, HIVE-15944.patch, STAGE_DEPENDENCIES
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL statement has two jobs:
> job 1: the order of the cols is updated in ColumnPrunerReduceSinkProc because 
> of the sort operator.
> job 2 will hit a read error in the map operation because the col order is the 
> old one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-25 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: (was: HIVE-15944.3-branch-1.1.1.path)

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.0, 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944.1.patch, 
> HIVE-15944.2-branch-1.1.1.path.erroroutput, HIVE-15944.3.patch, 
> HIVE-15944.4-branch-1.1.1.patch, HIVE-15944.4.patch, 
> HIVE-15944-branch-1.1.patch, HIVE-15944.patch, STAGE_DEPENDENCIES
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL statement has two jobs:
> job 1: the order of the cols is updated in ColumnPrunerReduceSinkProc because 
> of the sort operator.
> job 2 will hit a read error in the map operation because the col order is the 
> old one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-25 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: (was: HIVE-15944.4-branch-1.1.1.path)

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.0, 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944.1.patch, 
> HIVE-15944.2-branch-1.1.1.path.erroroutput, HIVE-15944.3.patch, 
> HIVE-15944.4-branch-1.1.1.patch, HIVE-15944.4.patch, 
> HIVE-15944-branch-1.1.patch, HIVE-15944.patch, STAGE_DEPENDENCIES
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL statement has two jobs:
> job 1: the order of the cols is updated in ColumnPrunerReduceSinkProc because 
> of the sort operator.
> job 2 will hit a read error in the map operation because the col order is the 
> old one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-25 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: HIVE-15944.4.patch
HIVE-15944.4-branch-1.1.1.patch

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.0, 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944.1.patch, 
> HIVE-15944.2-branch-1.1.1.path.erroroutput, HIVE-15944.3.patch, 
> HIVE-15944.4-branch-1.1.1.patch, HIVE-15944.4.patch, 
> HIVE-15944-branch-1.1.patch, HIVE-15944.patch, STAGE_DEPENDENCIES
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL statement has two jobs:
> job 1: the order of the cols is updated in ColumnPrunerReduceSinkProc because 
> of the sort operator.
> job 2 will hit a read error in the map operation because the col order is the 
> old one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-25 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: HIVE-15944.4-branch-1.1.1.path

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.0, 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944.1.patch, 
> HIVE-15944.2-branch-1.1.1.path.erroroutput, HIVE-15944.3-branch-1.1.1.path, 
> HIVE-15944.3.patch, HIVE-15944.4-branch-1.1.1.path, 
> HIVE-15944-branch-1.1.patch, HIVE-15944.patch, STAGE_DEPENDENCIES
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL statement has two jobs:
> job 1: the order of the cols is updated in ColumnPrunerReduceSinkProc because 
> of the sort operator.
> job 2 will hit a read error in the map operation because the col order is the 
> old one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-23 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: HIVE-15944.3.patch

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.0, 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944.1.patch, 
> HIVE-15944.2-branch-1.1.1.path.erroroutput, HIVE-15944.3-branch-1.1.1.path, 
> HIVE-15944.3.patch, HIVE-15944-branch-1.1.patch, HIVE-15944.patch, 
> STAGE_DEPENDENCIES
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL statement has two jobs:
> job 1: the order of the cols is updated in ColumnPrunerReduceSinkProc because 
> of the sort operator.
> job 2 will hit a read error in the map operation because the col order is the 
> old one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-23 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: HIVE-15944.3-branch-1.1.1.path

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.0, 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944.1.patch, 
> HIVE-15944.2-branch-1.1.1.path.erroroutput, HIVE-15944.3-branch-1.1.1.path, 
> HIVE-15944-branch-1.1.patch, HIVE-15944.patch, STAGE_DEPENDENCIES
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL statement has two jobs:
> job 1: the order of the cols is updated in ColumnPrunerReduceSinkProc because 
> of the sort operator.
> job 2 will hit a read error in the map operation because the col order is the 
> old one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-20 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: HIVE-15944.2-branch-1.1.1.path.erroroutput

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.0, 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944.1.patch, 
> HIVE-15944.2-branch-1.1.1.path.erroroutput, HIVE-15944-branch-1.1.patch, 
> HIVE-15944.patch, STAGE_DEPENDENCIES
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL statement has two jobs:
> job 1: the order of the cols is updated in ColumnPrunerReduceSinkProc because 
> of the sort operator.
> job 2 will hit a read error in the map operation because the col order is the 
> old one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-20 Thread wan kun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874776#comment-15874776
 ] 

wan kun commented on HIVE-15944:


[~ashutoshc]
I have added a test case, with_column_pruner.q.
For this test case, the STAGE_DEPENDENCIES attachment is my local test output, 
which is wrong:
outputColumnNames: _col0, _col1, _col10, _col11, _col2, _col3, _col4, _col5, 
_col6, _col7, _col8, _col9
while the output in the patch is right: _col0, _col1, _col2, _col3, _col4, _col5, 
_col6, _col7, _col8, _col9, _col10, _col11
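
For what it's worth, the erroneous ordering above matches what plain string 
sorting of the internal _colN names would produce, while the expected output 
follows their numeric positions; a tiny standalone illustration (plain Java, not 
Hive code):

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ColumnOrderDemo {
  public static void main(String[] args) {
    List<String> cols = new ArrayList<>(Arrays.asList(
        "_col0", "_col1", "_col2", "_col3", "_col4", "_col5",
        "_col6", "_col7", "_col8", "_col9", "_col10", "_col11"));

    // Lexicographic order puts _col10/_col11 right after _col1 (the wrong output above).
    List<String> asStrings = new ArrayList<>(cols);
    Collections.sort(asStrings);
    System.out.println(asStrings);

    // Ordering by the numeric suffix gives the expected _col0 .. _col11 sequence.
    List<String> byIndex = new ArrayList<>(cols);
    byIndex.sort(Comparator.comparingInt(
        c -> Integer.parseInt(c.substring("_col".length()))));
    System.out.println(byIndex);
  }
}
{code}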

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.0, 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944.1.patch, HIVE-15944-branch-1.1.patch, 
> HIVE-15944.patch, STAGE_DEPENDENCIES
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL statement has two jobs:
> job 1: the order of the cols is updated in ColumnPrunerReduceSinkProc because 
> of the sort operator.
> job 2 will hit a read error in the map operation because the col order is the 
> old one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-20 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: HIVE-15944.1.patch
STAGE_DEPENDENCIES

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.0, 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944.1.patch, HIVE-15944-branch-1.1.patch, 
> HIVE-15944.patch, STAGE_DEPENDENCIES
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL statement has two jobs:
> job 1: the order of the cols is updated in ColumnPrunerReduceSinkProc because 
> of the sort operator.
> job 2 will hit a read error in the map operation because the col order is the 
> old one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-16 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Affects Version/s: 1.1.0

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.0, 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944-branch-1.1.patch, HIVE-15944.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL statement has two jobs:
> job 1: the order of the cols is updated in ColumnPrunerReduceSinkProc because 
> of the sort operator.
> job 2 will hit a read error in the map operation because the col order is the 
> old one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15945) Remove debug parameter in HADOOP_OPTS environment when start a new job local.

2017-02-16 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15945:
---
Status: Patch Available  (was: Open)

> Remove debug parameter in HADOOP_OPTS environment when start a new job local.
> -
>
> Key: HIVE-15945
> URL: https://issues.apache.org/jira/browse/HIVE-15945
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
>  Labels: patch
> Fix For: 2.2.0
>
> Attachments: HIVE-15945.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When Hive starts a new job in a child VM, the debug parameter will be defined 
> twice.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-16 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: HIVE-15944-branch-1.1.patch

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944-branch-1.1.patch, HIVE-15944.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL statement has two jobs:
> job 1: the order of the cols is updated in ColumnPrunerReduceSinkProc because 
> of the sort operator.
> job 2 will hit a read error in the map operation because the col order is the 
> old one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15945) Remove debug parameter in HADOOP_OPTS environment when start a new job local.

2017-02-16 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15945:
---
Attachment: HIVE-15945.patch

> Remove debug parameter in HADOOP_OPTS environment when start a new job local.
> -
>
> Key: HIVE-15945
> URL: https://issues.apache.org/jira/browse/HIVE-15945
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
>  Labels: patch
> Fix For: 2.2.0
>
> Attachments: HIVE-15945.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When Hive starts a new job in a child VM, the debug parameter will be defined 
> twice.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15945) Remove debug parameter in HADOOP_OPTS environment when start a new job local.

2017-02-16 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun reassigned HIVE-15945:
--


> Remove debug parameter in HADOOP_OPTS environment when start a new job local.
> -
>
> Key: HIVE-15945
> URL: https://issues.apache.org/jira/browse/HIVE-15945
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: wan kun
>Assignee: wan kun
>Priority: Minor
>  Labels: patch
> Fix For: 2.2.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When Hive starts a new job in a child VM, the debug parameter is defined 
> twice in HADOOP_OPTS.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-16 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: HIVE-15944.patch

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL query is compiled into two jobs:
> job 1: the order of cols is changed in ColumnPrunerReduceSinkProc because of 
> the sort operator.
> job 2: the map operation then reads the wrong values because it still 
> expects the old column order.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-16 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: (was: HIVE-15944.patch)

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL query is compiled into two jobs:
> job 1: the order of cols is changed in ColumnPrunerReduceSinkProc because of 
> the sort operator.
> job 2: the map operation then reads the wrong values because it still 
> expects the old column order.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-16 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: HIVE-15944.patch

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: HIVE-15944.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL query is compiled into two jobs:
> job 1: the order of cols is changed in ColumnPrunerReduceSinkProc because of 
> the sort operator.
> job 2: the map operation then reads the wrong values because it still 
> expects the old column order.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-16 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: (was: 
0001-remove-sort-colList-operator-in-ColumnPrunerProcFact.patch)

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL query is compiled into two jobs:
> job 1: the order of cols is changed in ColumnPrunerReduceSinkProc because of 
> the sort operator.
> job 2: the map operation then reads the wrong values because it still 
> expects the old column order.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-16 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Attachment: 0001-remove-sort-colList-operator-in-ColumnPrunerProcFact.patch

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
> Attachments: 
> 0001-remove-sort-colList-operator-in-ColumnPrunerProcFact.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL query is compiled into two jobs:
> job 1: the order of cols is changed in ColumnPrunerReduceSinkProc because of 
> the sort operator.
> job 2: the map operation then reads the wrong values because it still 
> expects the old column order.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-16 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun updated HIVE-15944:
---
Status: Patch Available  (was: Open)

> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL query is compiled into two jobs:
> job 1: the order of cols is changed in ColumnPrunerReduceSinkProc because of 
> the sort operator.
> job 2: the map operation then reads the wrong values because it still 
> expects the old column order.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15944) The order of cols is error in ColumnPrunerReduceSinkProc because of sort operator

2017-02-16 Thread wan kun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wan kun reassigned HIVE-15944:
--


> The order of cols is error in ColumnPrunerReduceSinkProc because of sort 
> operator
> -
>
> Key: HIVE-15944
> URL: https://issues.apache.org/jira/browse/HIVE-15944
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: wan kun
>Assignee: wan kun
> Fix For: 2.2.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If one SQL query is compiled into two jobs:
> job 1: the order of cols is changed in ColumnPrunerReduceSinkProc because of 
> the sort operator.
> job 2: the map operation then reads the wrong values because it still 
> expects the old column order.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)