[jira] [Updated] (HIVE-14290) Refactor HIVE-14054 to use Collections#newSetFromMap

2016-08-15 Thread Peter Slawski (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Slawski updated HIVE-14290:
-
Attachment: HIVE-14290.1.patch

Re-uploading patch to trigger Hive QA

> Refactor HIVE-14054 to use Collections#newSetFromMap
> 
>
> Key: HIVE-14290
> URL: https://issues.apache.org/jira/browse/HIVE-14290
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.0
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Trivial
> Attachments: HIVE-14290.1.patch, HIVE-14290.1.patch, 
> HIVE-14290.1.patch
>
>
> There is a minor refactor that can be made to HiveMetaStoreChecker so that it 
> cleanly creates and uses a set that is backed by a Map implementation. In 
> this case, the underlying Map implementation is ConcurrentHashMap. This 
> refactor will help prevent issues such as the one reported in HIVE-14054.
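For reference, a minimal sketch of the pattern the description refers to, assuming the set is built roughly like this (the field name is illustrative, not the actual HiveMetaStoreChecker code). Collections#newSetFromMap returns a Set view of the supplied empty map, so the set inherits ConcurrentHashMap's thread safety:

{code:java}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class NewSetFromMapExample {
    public static void main(String[] args) {
        // Set view backed by a ConcurrentHashMap: add/contains/remove are
        // thread safe, unlike a plain HashSet shared across worker threads.
        Set<String> partitionsNotInMs =
                Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
        partitionsNotInMs.add("part=1");
        System.out.println(partitionsNotInMs.contains("part=1")); // true
    }
}
{code}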



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14290) Refactor HIVE-14054 to use Collections#newSetFromMap

2016-07-20 Thread Peter Slawski (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387122#comment-15387122
 ] 

Peter Slawski commented on HIVE-14290:
--

Thank you [~prasanth_j] for the review. It looks like an unrelated error caused 
the build to fail. I have attached the same patch again to this JIRA to 
hopefully trigger the QA build.

{code}
Could not transfer artifact 
org.pentaho:pentaho-aggdesigner-algorithm:pom:5.1.5-jhyde from/to datanucleus
{code}

> Refactor HIVE-14054 to use Collections#newSetFromMap
> 
>
> Key: HIVE-14290
> URL: https://issues.apache.org/jira/browse/HIVE-14290
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.0
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Trivial
> Attachments: HIVE-14290.1.patch, HIVE-14290.1.patch
>
>
> There is a minor refactor that can be made to HiveMetaStoreChecker so that it 
> cleanly creates and uses a set that is backed by a Map implementation. In 
> this case, the underlying Map implementation is ConcurrentHashMap. This 
> refactor will help prevent issues such as the one reported in HIVE-14054.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14290) Refactor HIVE-14054 to use Collections#newSetFromMap

2016-07-20 Thread Peter Slawski (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Slawski updated HIVE-14290:
-
Attachment: HIVE-14290.1.patch

> Refactor HIVE-14054 to use Collections#newSetFromMap
> 
>
> Key: HIVE-14290
> URL: https://issues.apache.org/jira/browse/HIVE-14290
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.0
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Trivial
> Attachments: HIVE-14290.1.patch, HIVE-14290.1.patch
>
>
> There is a minor refactor that can be made to HiveMetaStoreChecker so that it 
> cleanly creates and uses a set that is backed by a Map implementation. In 
> this case, the underlying Map implementation is ConcurrentHashMap. This 
> refactor will help prevent issues such as the one reported in HIVE-14054.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14290) Refactor HIVE-14054 to use Collections#newSetFromMap

2016-07-19 Thread Peter Slawski (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Slawski updated HIVE-14290:
-
Target Version/s: 2.2.0
   Fix Version/s: (was: 2.2.0)

> Refactor HIVE-14054 to use Collections#newSetFromMap
> 
>
> Key: HIVE-14290
> URL: https://issues.apache.org/jira/browse/HIVE-14290
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.0
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Trivial
> Attachments: HIVE-14290.1.patch
>
>
> There is a minor refactor that can be made to HiveMetaStoreChecker so that it 
> cleanly creates and uses a set that is backed by a Map implementation. In 
> this case, the underlying Map implementation is ConcurrentHashMap. This 
> refactor will help prevent issues such as the one reported in HIVE-14054.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14290) Refactor HIVE-14054 to use Collections#newSetFromMap

2016-07-19 Thread Peter Slawski (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Slawski updated HIVE-14290:
-
Fix Version/s: 2.2.0
   Status: Patch Available  (was: Open)

I've attached a patch that makes this minor refactor.

> Refactor HIVE-14054 to use Collections#newSetFromMap
> 
>
> Key: HIVE-14290
> URL: https://issues.apache.org/jira/browse/HIVE-14290
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.0
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Trivial
> Fix For: 2.2.0
>
> Attachments: HIVE-14290.1.patch
>
>
> There is a minor refactor that can be made to HiveMetaStoreChecker so that it 
> cleanly creates and uses a set that is backed by a Map implementation. In 
> this case, the underlying Map implementation is ConcurrentHashMap. This 
> refactor will help prevent issues such as the one reported in HIVE-14054.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13699) Make JavaDataModel#get thread safe for parallel compilation

2016-05-18 Thread Peter Slawski (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Slawski updated HIVE-13699:
-
Attachment: HIVE-13699.2.patch

Attached updated patch with fix to use SLF4J logger.

> Make JavaDataModel#get thread safe for parallel compilation
> ---
>
> Key: HIVE-13699
> URL: https://issues.apache.org/jira/browse/HIVE-13699
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, storage-api
>Affects Versions: 2.0.0
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Minor
> Attachments: HIVE-13699.1.patch, HIVE-13699.2.patch
>
>
> The class JavaDataModel has a static method, #get, that is not thread safe. 
> This may be an issue when parallel query compilation is enabled because two 
> threads may attempt to call JavaDataModel#get at the same time, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13699) Make JavaDataModel#get thread safe for parallel compilation

2016-05-18 Thread Peter Slawski (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289449#comment-15289449
 ] 

Peter Slawski commented on HIVE-13699:
--

Yeah, I should be using SLF4J. I will correct that and post an updated patch. 
This is a preemptive patch for an issue found by static analysis of the code 
path for Driver#compile.

> Make JavaDataModel#get thread safe for parallel compilation
> ---
>
> Key: HIVE-13699
> URL: https://issues.apache.org/jira/browse/HIVE-13699
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, storage-api
>Affects Versions: 2.0.0
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Minor
> Attachments: HIVE-13699.1.patch
>
>
> The class JavaDataModel has a static method, #get, that is not thread safe. 
> This may be an issue when parallel query compilation is enabled because two 
> threads may attempt to call JavaDataModel#get at the same time, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13699) Make JavaDataModel#get thread safe for parallel compilation

2016-05-10 Thread Peter Slawski (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278857#comment-15278857
 ] 

Peter Slawski commented on HIVE-13699:
--

The tests that failed look to be unrelated. All of the failed tests had also 
failed in the Jenkins build before this one, see 
[http://ec2-54-177-240-2.us-west-1.compute.amazonaws.com/job/PreCommit-HIVE-MASTER-Build/205/console].
To double-check, I reran the failed unit tests to verify this patch did not 
introduce failures, and noticed some of them now pass on latest-pulled master 
with my patch applied. Note the change itself is isolated to making the 
implementation of JavaDataModel#get thread safe; it should not change 
behavior, as the added unit tests show.
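
As an illustration of the kind of verification involved, a minimal concurrency check might look like the following (a hypothetical sketch, not the tests in the patch; the import's package is assumed):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.hive.ql.util.JavaDataModel; // package assumed

public class JavaDataModelRaceCheck {
    public static void main(String[] args) throws Exception {
        // Call the static accessor from many threads at once...
        ExecutorService pool = Executors.newFixedThreadPool(16);
        List<Future<JavaDataModel>> results = new ArrayList<>();
        for (int i = 0; i < 16; i++) {
            results.add(pool.submit(JavaDataModel::get));
        }
        // ...and verify every caller observed the same instance.
        JavaDataModel first = results.get(0).get();
        for (Future<JavaDataModel> f : results) {
            if (f.get() != first) {
                throw new AssertionError("observed different instances");
            }
        }
        pool.shutdown();
    }
}
{code}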


> Make JavaDataModel#get thread safe for parallel compilation
> ---
>
> Key: HIVE-13699
> URL: https://issues.apache.org/jira/browse/HIVE-13699
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, storage-api
>Affects Versions: 2.0.0
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Minor
> Attachments: HIVE-13699.1.patch
>
>
> The class JavaDataModel has a static method, #get, that is not thread safe. 
> This may be an issue when parallel query compilation is enabled because two 
> threads may attempt to call JavaDataModel#get at the same time, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13699) Make JavaDataModel#get thread safe for parallel compilation

2016-05-05 Thread Peter Slawski (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Slawski updated HIVE-13699:
-
Status: Patch Available  (was: Open)

I have attached a patch which fixes this by initializing the system's 
JavaDataModel via the lazy holder pattern to ensure thread safety.
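
For readers unfamiliar with the idiom, here is a generic sketch of the lazy holder (initialization-on-demand holder) pattern; this is not the actual JavaDataModel code, and detect() is a hypothetical stand-in for the real detection logic. The JVM guarantees class initialization runs exactly once and is thread safe, so the holder's field is created lazily on the first call to get() without explicit locking:

{code:java}
public final class JavaDataModel {
    // The holder class is not loaded until get() first references it; the
    // JVM serializes class initialization, so MODEL is built exactly once.
    private static final class LazyHolder {
        private static final JavaDataModel MODEL = detect();
    }

    public static JavaDataModel get() {
        return LazyHolder.MODEL;
    }

    // Hypothetical stand-in for the detection logic in the real class.
    private static JavaDataModel detect() {
        return new JavaDataModel();
    }
}
{code}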

> Make JavaDataModel#get thread safe for parallel compilation
> ---
>
> Key: HIVE-13699
> URL: https://issues.apache.org/jira/browse/HIVE-13699
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, storage-api
>Affects Versions: 2.0.0
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Minor
> Attachments: HIVE-13699.1.patch
>
>
> The class JavaDataModel has a static method, #get, that is not thread safe. 
> This may be an issue when parallel query compilation is enabled because two 
> threads may attempt to call JavaDataModel#get at the same time, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13699) Make JavaDataModel#get thread safe for parallel compilation

2016-05-05 Thread Peter Slawski (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Slawski updated HIVE-13699:
-
Attachment: HIVE-13699.1.patch

> Make JavaDataModel#get thread safe for parallel compilation
> ---
>
> Key: HIVE-13699
> URL: https://issues.apache.org/jira/browse/HIVE-13699
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, storage-api
>Affects Versions: 2.0.0
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Minor
> Attachments: HIVE-13699.1.patch
>
>
> The class JavaDataModel has a static method, #get, that is not thread safe. 
> This may be an issue when parallel query compilation is enabled because two 
> threads may attempt to call JavaDataModel#get at the same time, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13512) Make initializing dag ids in TezWork thread safe for parallel compilation

2016-04-29 Thread Peter Slawski (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264476#comment-15264476
 ] 

Peter Slawski commented on HIVE-13512:
--

[~gopalv], could you please confirm my statement above regarding the test 
failures? I would like to know the next steps I need to take to get this 
patch in. Thank you!

> Make initializing dag ids in TezWork thread safe for parallel compilation
> -
>
> Key: HIVE-13512
> URL: https://issues.apache.org/jira/browse/HIVE-13512
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Planning
>Affects Versions: 2.0.0
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Minor
> Attachments: HIVE-13512.1.patch, HIVE-13512.1.patch
>
>
> When parallel query compilation is enabled, it is possible for concurrent 
> running threads to create TezWork objects that have the same dag id. This is 
> because the counter used to obtain the next dag id is not thread safe. The 
> counter should be an AtomicInteger rather than an int.
> {code:java}
>   private static int counter;
>   ...
>   public TezWork(String queryId, Configuration conf) {
>     this.dagId = queryId + ":" + (++counter);
>     ...
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13512) Make initializing dag ids in TezWork thread safe for parallel compilation

2016-04-27 Thread Peter Slawski (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259608#comment-15259608
 ] 

Peter Slawski commented on HIVE-13512:
--

The test failures appear not to be related to this patch. Between the two test 
runs, the patch has not changed. The failure common to both runs is 
TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3. However, that test 
runs under TestMiniSparkOnYarnCliDriver, which should not create TezWork 
objects, the only class this patch touches.

> Make initializing dag ids in TezWork thread safe for parallel compilation
> -
>
> Key: HIVE-13512
> URL: https://issues.apache.org/jira/browse/HIVE-13512
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Planning
>Affects Versions: 2.0.0
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Minor
> Attachments: HIVE-13512.1.patch, HIVE-13512.1.patch
>
>
> When parallel query compilation is enabled, it is possible for concurrent 
> running threads to create TezWork objects that have the same dag id. This is 
> because the counter used to obtain the next dag id is not thread safe. The 
> counter should be an AtomicInteger rather than an int.
> {code:java}
>   private static int counter;
>   ...
>   public TezWork(String queryId, Configuration conf) {
>     this.dagId = queryId + ":" + (++counter);
>     ...
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13512) Make initializing dag ids in TezWork thread safe for parallel compilation

2016-04-22 Thread Peter Slawski (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Slawski updated HIVE-13512:
-
Attachment: HIVE-13512.1.patch

Attaching same patch to trigger tests. The logs for the previous test run are 
lost due to the Jenkins server being down.

> Make initializing dag ids in TezWork thread safe for parallel compilation
> -
>
> Key: HIVE-13512
> URL: https://issues.apache.org/jira/browse/HIVE-13512
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Planning
>Affects Versions: 2.0.0
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Minor
> Attachments: HIVE-13512.1.patch, HIVE-13512.1.patch
>
>
> When parallel query compilation is enabled, it is possible for concurrent 
> running threads to create TezWork objects that have the same dag id. This is 
> because the counter used to obtain the next dag id is not thread safe. The 
> counter should be an AtomicInteger rather than an int.
> {code:java}
>   private static int counter;
>   ...
>   public TezWork(String queryId, Configuration conf) {
>     this.dagId = queryId + ":" + (++counter);
>     ...
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13512) Make initializing dag ids in TezWork thread safe for parallel compilation

2016-04-13 Thread Peter Slawski (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Slawski updated HIVE-13512:
-
Status: Patch Available  (was: Open)

I have attached a patch which replaces int with AtomicInteger.
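
For reference, a minimal sketch of that change (the field and constructor shape are taken from the snippet quoted below; the rest of the class is omitted):

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

  // incrementAndGet() is a single atomic read-modify-write, so concurrent
  // compilation threads can no longer mint duplicate dag ids.
  private static final AtomicInteger counter = new AtomicInteger();
  ...
  public TezWork(String queryId, Configuration conf) {
    this.dagId = queryId + ":" + counter.incrementAndGet();
    ...
  }
{code}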

> Make initializing dag ids in TezWork thread safe for parallel compilation
> -
>
> Key: HIVE-13512
> URL: https://issues.apache.org/jira/browse/HIVE-13512
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Planning
>Affects Versions: 2.0.0
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Minor
> Attachments: HIVE-13512.1.patch
>
>
> When parallel query compilation is enabled, it is possible for concurrent 
> running threads to create TezWork objects that have the same dag id. This is 
> because the counter used to obtain the next dag id is not thread safe. The 
> counter should be an AtomicInteger rather than an int.
> {code:java}
>   private static int counter;
>   ...
>   public TezWork(String queryId, Configuration conf) {
>     this.dagId = queryId + ":" + (++counter);
>     ...
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9147) Add unit test for HIVE-7323

2016-01-26 Thread Peter Slawski (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117741#comment-15117741
 ] 

Peter Slawski commented on HIVE-9147:
-

Thank you [~ashutoshc] for reviewing this change. 

The test failures are not relevant to this change, as it only adds a new unit 
test. Are there any further actions I need to take? This has been open for a 
while and I am hoping it can be resolved.

> Add unit test for HIVE-7323
> ---
>
> Key: HIVE-9147
> URL: https://issues.apache.org/jira/browse/HIVE-9147
> Project: Hive
>  Issue Type: Test
>  Components: Statistics
>Affects Versions: 0.14.0, 0.13.1
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Minor
> Attachments: HIVE-9147.1.patch, HIVE-9147.2.patch
>
>
> This unit test verifies that DateStatisticImpl doesn't store mutable objects 
> from callers for minimum and maximum values. This ensures callers cannot 
> modify the internal minimum and maximum values outside of DateStatisticImpl.
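
As an illustration of what such a defensive-copy test checks (dateStats, updateDate, and getMinimum are hypothetical names, not the actual Hive API): mutate the value after handing it over, then assert the stored minimum is unaffected, which holds only if the implementation copies its input.

{code:java}
import java.sql.Date;

// Hypothetical sketch: java.sql.Date is mutable via setTime(), so an
// implementation that keeps the caller's reference would see its stored
// minimum silently change after this call.
Date min = new Date(1000L);
dateStats.updateDate(min);       // should store a copy of 'min'
min.setTime(999999999L);         // caller mutates its own object afterwards
assert dateStats.getMinimum().getTime() == 1000L;  // holds only with a copy
{code}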



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9147) Add unit test for HIVE-7323

2016-01-19 Thread Peter Slawski (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Slawski updated HIVE-9147:

Attachment: HIVE-9147.2.patch

I rebased this patch on top of the latest master branch. Please see attachments.

> Add unit test for HIVE-7323
> ---
>
> Key: HIVE-9147
> URL: https://issues.apache.org/jira/browse/HIVE-9147
> Project: Hive
>  Issue Type: Test
>  Components: Statistics
>Affects Versions: 0.14.0, 0.13.1
>Reporter: Peter Slawski
>Priority: Minor
> Attachments: HIVE-9147.1.patch, HIVE-9147.2.patch
>
>
> This unit test verifies that DateStatisticImpl doesn't store mutable objects 
> from callers for minimum and maximum values. This ensures callers cannot 
> modify the internal minimum and maximum values outside of DateStatisticImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch

2015-05-07 Thread Peter Slawski (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533450#comment-14533450
 ] 

Peter Slawski commented on HIVE-10538:
--

[~prasanth_j], Thanks for resolving that last failed test and getting this fix 
in.

I was initially using Oracle's JDK 1.7.0_60, but didn't have luck with 1.7.0_45 
either. I also tried the OpenJDK version I already had installed, 1.7.0_79.

 Fix NPE in FileSinkOperator from hashcode mismatch
 --

 Key: HIVE-10538
 URL: https://issues.apache.org/jira/browse/HIVE-10538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.0.0, 1.2.0
Reporter: Peter Slawski
Assignee: Peter Slawski
Priority: Critical
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, 
 HIVE-10538.1.patch, HIVE-10538.2.patch, HIVE-10538.3.patch


 A Null Pointer Exception occurs in FileSinkOperator when using bucketed 
 tables and distribute by with multiFileSpray enabled. The following query 
 snippet reproduces this issue:
 {code}
 set hive.enforce.bucketing = true;
 set hive.exec.reducers.max = 20;
 create table bucket_a(key int, value_a string) clustered by (key) into 256 
 buckets;
 create table bucket_b(key int, value_b string) clustered by (key) into 256 
 buckets;
 create table bucket_ab(key int, value_a string, value_b string) clustered by 
 (key) into 256 buckets;
 -- Insert data into bucket_a and bucket_b
 insert overwrite table bucket_ab
 select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key 
 = b.key) distribute by key;
 {code}
 The following stack trace is logged.
 {code}
 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer 
 (ExecReducer.java:reduce(255)) - 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}}
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
   ... 8 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch

2015-05-05 Thread Peter Slawski (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Slawski updated HIVE-10538:
-
Attachment: HIVE-10538.2.patch

I've attached the second revision of the patch, which updates the failed Spark 
qtests.

 Fix NPE in FileSinkOperator from hashcode mismatch
 --

 Key: HIVE-10538
 URL: https://issues.apache.org/jira/browse/HIVE-10538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.0.0, 1.2.0
Reporter: Peter Slawski
Assignee: Peter Slawski
Priority: Critical
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, 
 HIVE-10538.1.patch, HIVE-10538.2.patch


 A Null Pointer Exception occurs in FileSinkOperator when using bucketed 
 tables and distribute by with multiFileSpray enabled. The following query 
 snippet reproduces this issue:
 {code}
 set hive.enforce.bucketing = true;
 set hive.exec.reducers.max = 20;
 create table bucket_a(key int, value_a string) clustered by (key) into 256 
 buckets;
 create table bucket_b(key int, value_b string) clustered by (key) into 256 
 buckets;
 create table bucket_ab(key int, value_a string, value_b string) clustered by 
 (key) into 256 buckets;
 -- Insert data into bucket_a and bucket_b
 insert overwrite table bucket_ab
 select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key 
 = b.key) distribute by key;
 {code}
 The following stack trace is logged.
 {code}
 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer 
 (ExecReducer.java:reduce(255)) - 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}}
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
   ... 8 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch

2015-05-05 Thread Peter Slawski (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529707#comment-14529707
 ] 

Peter Slawski commented on HIVE-10538:
--

Great, I've been working on just that. I'll be able to post an updated patch 
tomorrow.

 Fix NPE in FileSinkOperator from hashcode mismatch
 --

 Key: HIVE-10538
 URL: https://issues.apache.org/jira/browse/HIVE-10538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.0.0, 1.2.0
Reporter: Peter Slawski
Assignee: Peter Slawski
Priority: Critical
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, 
 HIVE-10538.1.patch


 A Null Pointer Exception occurs in FileSinkOperator when using bucketed 
 tables and distribute by with multiFileSpray enabled. The following query 
 snippet reproduces this issue:
 {code}
 set hive.enforce.bucketing = true;
 set hive.exec.reducers.max = 20;
 create table bucket_a(key int, value_a string) clustered by (key) into 256 
 buckets;
 create table bucket_b(key int, value_b string) clustered by (key) into 256 
 buckets;
 create table bucket_ab(key int, value_a string, value_b string) clustered by 
 (key) into 256 buckets;
 -- Insert data into bucket_a and bucket_b
 insert overwrite table bucket_ab
 select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key 
 = b.key) distribute by key;
 {code}
 The following stack trace is logged.
 {code}
 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer 
 (ExecReducer.java:reduce(255)) - 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}}
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
   ... 8 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch

2015-05-05 Thread Peter Slawski (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529673#comment-14529673
 ] 

Peter Slawski commented on HIVE-10538:
--

The Spark driver failures are caused by this change, which would be expected 
if a row's hashcode affects its ordering in Spark. This patch makes it so that 
the HiveKey hashcode emitted from ReduceSinkOperator is no longer always 
multiplied by 31 (as explained previously).

Also, for at least those failed qtests, the expected row ordering/output 
differs across MapRed, Tez, and Spark. So, the execution engine affects 
ordering.

From 
[spark/groupby_complex_types_multi_single_reducer.q.out#L221|https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/spark/groupby_complex_types_multi_single_reducer.q.out#L221]
{code}
POSTHOOK: query: SELECT DEST2.* FROM DEST2
POSTHOOK: type: QUERY
POSTHOOK: Input: default@dest2
#### A masked pattern was here ####
{120:val_120}   2
{129:val_129}   2
{160:val_160}   1
{26:val_26} 2
{27:val_27} 1
{288:val_288}   2
{298:val_298}   3
{30:val_30} 1
{311:val_311}   3
{74:val_74} 1
{code}
From 
[groupby_complex_types_multi_single_reducer.q.out#L240|https://github.com/apache/hive/blob/master/ql/src/test/results/clientpositive/groupby_complex_types_multi_single_reducer.q.out#L240]
{code}
POSTHOOK: query: SELECT DEST2.* FROM DEST2
POSTHOOK: type: QUERY
POSTHOOK: Input: default@dest2
#### A masked pattern was here ####
{0:val_0}   3
{10:val_10} 1
{100:val_100}   2
{103:val_103}   2
{104:val_104}   2
{105:val_105}   1
{11:val_11} 1
{111:val_111}   1
{113:val_113}   2
{114:val_114}   1
{code}

 Fix NPE in FileSinkOperator from hashcode mismatch
 --

 Key: HIVE-10538
 URL: https://issues.apache.org/jira/browse/HIVE-10538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.0.0, 1.2.0
Reporter: Peter Slawski
Assignee: Peter Slawski
Priority: Critical
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, 
 HIVE-10538.1.patch


 A Null Pointer Exception occurs in FileSinkOperator when using bucketed 
 tables and distribute by with multiFileSpray enabled. The following query 
 snippet reproduces this issue:
 {code}
 set hive.enforce.bucketing = true;
 set hive.exec.reducers.max = 20;
 create table bucket_a(key int, value_a string) clustered by (key) into 256 
 buckets;
 create table bucket_b(key int, value_b string) clustered by (key) into 256 
 buckets;
 create table bucket_ab(key int, value_a string, value_b string) clustered by 
 (key) into 256 buckets;
 -- Insert data into bucket_a and bucket_b
 insert overwrite table bucket_ab
 select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key 
 = b.key) distribute by key;
 {code}
 The following stack trace is logged.
 {code}
 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer 
 (ExecReducer.java:reduce(255)) - 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}}
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
   ... 8 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch

2015-05-05 Thread Peter Slawski (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529134#comment-14529134
 ] 

Peter Slawski commented on HIVE-10538:
--

Will do.

 Fix NPE in FileSinkOperator from hashcode mismatch
 --

 Key: HIVE-10538
 URL: https://issues.apache.org/jira/browse/HIVE-10538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.0.0, 1.2.0
Reporter: Peter Slawski
Assignee: Peter Slawski
Priority: Critical
 Fix For: 1.2.0, 1.3.0

 Attachments: HIVE-10538.1.patch, HIVE-10538.1.patch, 
 HIVE-10538.1.patch


 A Null Pointer Exception occurs in FileSinkOperator when using bucketed 
 tables and distribute by with multiFileSpray enabled. The following query 
 snippet reproduces this issue:
 {code}
 set hive.enforce.bucketing = true;
 set hive.exec.reducers.max = 20;
 create table bucket_a(key int, value_a string) clustered by (key) into 256 
 buckets;
 create table bucket_b(key int, value_b string) clustered by (key) into 256 
 buckets;
 create table bucket_ab(key int, value_a string, value_b string) clustered by 
 (key) into 256 buckets;
 -- Insert data into bucket_a and bucket_b
 insert overwrite table bucket_ab
 select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key 
 = b.key) distribute by key;
 {code}
 The following stack trace is logged.
 {code}
 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer 
 (ExecReducer.java:reduce(255)) - 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}}
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
   ... 8 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch

2015-05-01 Thread Peter Slawski (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523765#comment-14523765
 ] 

Peter Slawski commented on HIVE-10538:
--

Yes. Here is an explanation that refers to how this transient is used.

The transient is used to compute the row's hash in 
[ReduceSinkOperator.java#L368|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L368].
{code}
hashCode = computeHashCode(row, bucketNumber);
{code}
If the given bucket number is valid (which it always is, since the transient 
is initialized to a valid number), the computed hashcode is always multiplied 
by 31, see 
[ReduceSinkOperator.java#L488|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L488]:
{code}
  ...
  private int computeHashCode(Object row, int buckNum) throws HiveException {
  ...
} else {
  for (int i = 0; i < partitionEval.length; i++) {
    Object o = partitionEval[i].evaluate(row);
    keyHashCode = keyHashCode * 31
        + ObjectInspectorUtils.hashCode(o, partitionObjectInspectors[i]);
  }
}
int hashCode = buckNum < 0 ? keyHashCode : keyHashCode * 31 + buckNum;
...
return hashCode;
  }
{code}
FileSinkOperator recomputes the hashcode in findWriterOffset(), but does not 
multiply by 31. This causes a different bucket number to be computed than 
expected; bucketMap only contains mappings for the bucket numbers the current 
reducer is expected to receive. From 
[FileSinkOperator.java#L811|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L811]:
{code}
  private int findWriterOffset(Object row) throws HiveException {
    ...
    for (int i = 0; i < partitionEval.length; i++) {
      Object o = partitionEval[i].evaluate(row);
      keyHashCode = keyHashCode * 31
          + ObjectInspectorUtils.hashCode(o, partitionObjectInspectors[i]);
    }
    key.setHashCode(keyHashCode);
    int bucketNum = prtner.getBucket(key, null, totalFiles);
    return bucketMap.get(bucketNum);
  }
{code}
The transient was introduced in [HIVE-8151], which refactored the bucket 
number from a local variable into a transient field. The local variable had 
been initialized to -1; the refactor changed the code to use the transient 
field instead.
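
To summarize the mismatch, a simplified sketch of the two code paths quoted above (not the patch itself):

{code:java}
// ReduceSinkOperator: buckNum is always >= 0 after HIVE-8151, so the key
// hash is always folded with the bucket number:
int rsHash = keyHashCode * 31 + buckNum;

// FileSinkOperator.findWriterOffset(): recomputes from the same partition
// columns but never applies the "* 31 + buckNum" step:
int fsHash = keyHashCode;

// rsHash != fsHash, so getBucket() can map the row to a bucket number that
// is absent from this reducer's bucketMap; bucketMap.get(...) then returns
// null, and unboxing it to int throws the NullPointerException above.
{code}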

 Fix NPE in FileSinkOperator from hashcode mismatch
 --

 Key: HIVE-10538
 URL: https://issues.apache.org/jira/browse/HIVE-10538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.0.0, 1.2.0
Reporter: Peter Slawski
 Fix For: 1.3.0

 Attachments: HIVE-10538.1.patch


 A Null Pointer Exception occurs in FileSinkOperator when using bucketed 
 tables and distribute by with multiFileSpray enabled. The following query 
 snippet reproduces this issue:
 {code}
 set hive.enforce.bucketing = true;
 set hive.exec.reducers.max = 20;
 create table bucket_a(key int, value_a string) clustered by (key) into 256 
 buckets;
 create table bucket_b(key int, value_b string) clustered by (key) into 256 
 buckets;
 create table bucket_ab(key int, value_a string, value_b string) clustered by 
 (key) into 256 buckets;
 -- Insert data into bucket_a and bucket_b
 insert overwrite table bucket_ab
 select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key 
 = b.key) distribute by key;
 {code}
 The following stack trace is logged.
 {code}
 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer 
 (ExecReducer.java:reduce(255)) - 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}}
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
   at 

[jira] [Updated] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch

2015-05-01 Thread Peter Slawski (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Slawski updated HIVE-10538:
-
Attachment: HIVE-10538.1.patch

 Fix NPE in FileSinkOperator from hashcode mismatch
 --

 Key: HIVE-10538
 URL: https://issues.apache.org/jira/browse/HIVE-10538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.0.0, 1.2.0
Reporter: Peter Slawski
 Attachments: HIVE-10538.1.patch


 A Null Pointer Exception occurs in FileSinkOperator when using bucketed 
 tables and distribute by with multiFileSpray enabled. The following query 
 snippet reproduces this issue:
 {code}
 set hive.enforce.bucketing = true;
 set hive.exec.reducers.max = 20;
 create table bucket_a(key int, value_a string) clustered by (key) into 256 
 buckets;
 create table bucket_b(key int, value_b string) clustered by (key) into 256 
 buckets;
 create table bucket_ab(key int, value_a string, value_b string) clustered by 
 (key) into 256 buckets;
 -- Insert data into bucket_a and bucket_b
 insert overwrite table bucket_ab
 select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key 
 = b.key) distribute by key;
 {code}
 The following stack trace is logged.
 {code}
 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer 
 (ExecReducer.java:reduce(255)) - 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}}
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
   ... 8 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10538) Fix NPE in FileSinkOperator from hashcode mismatch

2015-04-29 Thread Peter Slawski (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520122#comment-14520122
 ] 

Peter Slawski commented on HIVE-10538:
--

Currently working on testing the fix for this issue.

 Fix NPE in FileSinkOperator from hashcode mismatch
 --

 Key: HIVE-10538
 URL: https://issues.apache.org/jira/browse/HIVE-10538
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.0.0, 1.2.0
Reporter: Peter Slawski

 A Null Pointer Exception occurs in FileSinkOperator when using bucketed 
 tables and distribute by with multiFileSpray enabled. The following query 
 snippet reproduces this issue:
 {code}
 set hive.enforce.bucketing = true;
 set hive.exec.reducers.max = 20;
 create table bucket_a(key int, value_a string) clustered by (key) into 256 
 buckets;
 create table bucket_b(key int, value_b string) clustered by (key) into 256 
 buckets;
 create table bucket_ab(key int, value_a string, value_b string) clustered by 
 (key) into 256 buckets;
 -- Insert data into bucket_a and bucket_b
 insert overwrite table bucket_ab
 select a.key, a.value_a, b.value_b from bucket_a a join bucket_b b on (a.key 
 = b.key) distribute by key;
 {code}
 The following stack trace is logged.
 {code}
 2015-04-29 12:54:12,841 FATAL [pool-110-thread-1]: ExecReducer 
 (ExecReducer.java:reduce(255)) - 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing row (tag=0) {key:{},value:{_col0:113,_col1:val_113}}
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:819)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:747)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
   ... 8 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)