[jira] [Assigned] (HIVE-16068) BloomFilter expectedEntries not always using NDV when it's available during runtime filtering

2017-02-28 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-16068:
-


> BloomFilter expectedEntries not always using NDV when it's available during 
> runtime filtering
> -
>
> Key: HIVE-16068
> URL: https://issues.apache.org/jira/browse/HIVE-16068
> Project: Hive
>  Issue Type: Bug
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> The current logic only uses NDV if it's the only ColumnStat available, but it 
> looks like there can sometimes be other ColStats in the semijoin Select 
> operator.
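
A minimal, self-contained sketch of the intended selection logic (the ColumnStat holder and names here are hypothetical stand-ins, not Hive's actual planner classes): prefer the join key's NDV even when other column stats are present, and fall back to the row count only when no usable NDV exists.

{code}
import java.util.List;

public class BloomFilterSizing {
  // Hypothetical stand-in for Hive's column statistics object.
  static final class ColumnStat {
    final String columnName;
    final long numDistinctValues;  // NDV
    ColumnStat(String columnName, long ndv) {
      this.columnName = columnName;
      this.numDistinctValues = ndv;
    }
  }

  // Prefer the key column's NDV even when other ColumnStats are present.
  static long expectedEntries(String keyColumn, List<ColumnStat> stats, long rowCount) {
    for (ColumnStat cs : stats) {
      if (keyColumn.equals(cs.columnName) && cs.numDistinctValues > 0) {
        return cs.numDistinctValues;
      }
    }
    return rowCount;  // no usable NDV; fall back to the row count
  }

  public static void main(String[] args) {
    List<ColumnStat> stats = List.of(
        new ColumnStat("account_id", 50_000), new ColumnStat("year", 10));
    System.out.println(expectedEntries("account_id", stats, 1_000_000));  // 50000
  }
}
{code}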



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16065) Vectorization: Wrong Key/Value information used by Vectorizer

2017-03-02 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893603#comment-15893603
 ] 

Jason Dere commented on HIVE-16065:
---

+1 pending tests

> Vectorization: Wrong Key/Value information used by Vectorizer
> -
>
> Key: HIVE-16065
> URL: https://issues.apache.org/jira/browse/HIVE-16065
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16065.01.patch, HIVE-16065.07.patch
>
>
> Make Vectorizer class get reducer key/value information the same way 
> ExecReducer/ReduceRecordProcessor do.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16386) Add debug logging to describe why runtime filtering semijoins are removed

2017-04-06 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959492#comment-15959492
 ] 

Jason Dere commented on HIVE-16386:
---

[~prasanth_j], can you review? Just adding logging statements.

> Add debug logging to describe why runtime filtering semijoins are removed
> -
>
> Key: HIVE-16386
> URL: https://issues.apache.org/jira/browse/HIVE-16386
> Project: Hive
>  Issue Type: Improvement
>  Components: Logging
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16386.1.patch
>
>
> Add a few logging statements to detail the reason why semijoin optimizations 
> are being removed, which can help during debugging.
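
For illustration, the added statements would look something like the following sketch (hypothetical names, not the exact lines in the patch):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SemijoinRemovalLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(SemijoinRemovalLogging.class);

  static void removeSemijoinBranch(String branch, String reason) {
    // Record why the optimization is being dropped before removing it.
    LOG.debug("Removing semijoin branch {}: {}", branch, reason);
    // ... actual removal logic elided ...
  }
}
{code}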



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16462) Vectorization: Enabling hybrid grace disables specialization of all reduce side joins

2017-04-17 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-16462:
-


> Vectorization: Enabling hybrid grace disables specialization of all reduce 
> side joins
> -
>
> Key: HIVE-16462
> URL: https://issues.apache.org/jira/browse/HIVE-16462
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> Observed by [~gopalv].
> Having grace hash join enabled prevents the specialized vector hash joins 
> during the vectorizer stage of query planning. However 
> hive.llap.enable.grace.join.in.llap will later disable grace hash join 
> (LlapDecider runs after Vectorizer). If we can disable the grace hash join 
> before vectorization kicks in then we can still benefit from the specialized 
> vector hash joins.
> This can be special cased for the llap.execution.mode=only case.
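
As a sketch of that idea (my reconstruction, not the committed patch): clear the hybrid grace flag before the Vectorizer runs whenever LlapDecider would disable grace hash join later anyway. The config names used are Hive's hive.llap.execution.mode, hive.llap.enable.grace.join.in.llap, and hive.mapjoin.hybridgrace.hashtable; the interplay shown is an assumption based on the description above.

{code}
import org.apache.hadoop.conf.Configuration;

public class PreVectorizerGraceCheck {
  // Sketch only: if llap.execution.mode=only and grace join is disabled
  // in LLAP, turn off hybrid grace before vectorization so the
  // specialized vector map join operators can still be selected.
  static void maybeDisableGraceBeforeVectorizer(Configuration conf) {
    boolean llapOnly = "only".equalsIgnoreCase(
        conf.get("hive.llap.execution.mode", ""));
    boolean graceDisabledInLlap =
        !conf.getBoolean("hive.llap.enable.grace.join.in.llap", false);
    if (llapOnly && graceDisabledInLlap) {
      conf.setBoolean("hive.mapjoin.hybridgrace.hashtable", false);
    }
  }
}
{code}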



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16462) Vectorization: Enabling hybrid grace disables specialization of all reduce side joins

2017-04-17 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16462:
--
Attachment: HIVE-16462.1.patch

> Vectorization: Enabling hybrid grace disables specialization of all reduce 
> side joins
> -
>
> Key: HIVE-16462
> URL: https://issues.apache.org/jira/browse/HIVE-16462
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16462.1.patch
>
>
> Observed by [~gopalv].
> Having grace hash join enabled prevents the specialized vector hash joins 
> during the vectorizer stage of query planning. However 
> hive.llap.enable.grace.join.in.llap will later disable grace hash join 
> (LlapDecider runs after Vectorizer). If we can disable the grace hash join 
> before vectorization kicks in then we can still benefit from the specialized 
> vector hash joins.
> This can be special cased for the llap.execution.mode=only case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16462) Vectorization: Enabling hybrid grace disables specialization of all reduce side joins

2017-04-17 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16462:
--
Status: Patch Available  (was: Open)

> Vectorization: Enabling hybrid grace disables specialization of all reduce 
> side joins
> -
>
> Key: HIVE-16462
> URL: https://issues.apache.org/jira/browse/HIVE-16462
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16462.1.patch
>
>
> Observed by [~gopalv].
> Having grace hash join enabled prevents the specialized vector hash joins 
> during the vectorizer stage of query planning. However 
> hive.llap.enable.grace.join.in.llap will later disable grace hash join 
> (LlapDecider runs after Vectorizer). If we can disable the grace hash join 
> before vectorization kicks in then we can still benefit from the specialized 
> vector hash joins.
> This can be special cased for the llap.execution.mode=only case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16462) Vectorization: Enabling hybrid grace disables specialization of all reduce side joins

2017-04-18 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16462:
--
Attachment: HIVE-16462.2.patch

Changing a debug message:

{noformat}
-  LOG.debug("Skipping llap decider");
+  LOG.debug("Skipping llap pre-vectorization pass");
{noformat}

> Vectorization: Enabling hybrid grace disables specialization of all reduce 
> side joins
> -
>
> Key: HIVE-16462
> URL: https://issues.apache.org/jira/browse/HIVE-16462
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16462.1.patch, HIVE-16462.2.patch
>
>
> Observed by [~gopalv].
> Having grace hash join enabled prevents the specialized vector hash joins 
> during the vectorizer stage of query planning. However 
> hive.llap.enable.grace.join.in.llap will later disable grace hash join 
> (LlapDecider runs after Vectorizer). If we can disable the grace hash join 
> before vectorization kicks in then we can still benefit from the specialized 
> vector hash joins.
> This can be special cased for the llap.execution.mode=only case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16462) Vectorization: Enabling hybrid grace disables specialization of all reduce side joins

2017-04-18 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16462:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> Vectorization: Enabling hybrid grace disables specialization of all reduce 
> side joins
> -
>
> Key: HIVE-16462
> URL: https://issues.apache.org/jira/browse/HIVE-16462
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 3.0.0
>
> Attachments: HIVE-16462.1.patch, HIVE-16462.2.patch
>
>
> Observed by [~gopalv].
> Having grace hash join enabled prevents the specialized vector hash joins 
> during the vectorizer stage of query planning. However 
> hive.llap.enable.grace.join.in.llap will later disable grace hash join 
> (LlapDecider runs after Vectorizer). If we can disable the grace hash join 
> before vectorization kicks in then we can still benefit from the specialized 
> vector hash joins.
> This can be special cased for the llap.execution.mode=only case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16441) De-duplicate semijoin branches in n-way joins

2017-04-24 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16441:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> De-duplicate semijoin branches in n-way joins
> -
>
> Key: HIVE-16441
> URL: https://issues.apache.org/jira/browse/HIVE-16441
> Project: Hive
>  Issue Type: Improvement
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Fix For: 3.0.0
>
> Attachments: HIVE-16441.1.patch, HIVE-16441.2.patch, 
> HIVE-16441.3.patch, HIVE-16441.4.patch
>
>
> Currently in n-way joins, semijoin optimization creates n branches on the same 
> key. Instead it should reuse one branch for all the joins.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-31 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107690#comment-16107690
 ] 

Jason Dere commented on HIVE-17113:
---

[~ashutoshc] can you review this one?

> Duplicate bucket files can get written to table by runaway task
> ---
>
> Key: HIVE-17113
> URL: https://issues.apache.org/jira/browse/HIVE-17113
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17113.1.patch, HIVE-17113.2.patch, 
> HIVE-17113.3.patch
>
>
> Saw a table get a duplicate bucket file from a Hive query. It looks like the 
> following happened:
> 1. Task attempt A_0 starts, but then stops making progress
> 2. The job was running with speculative execution on, and task attempt A_1 is 
> started
> 3. Task attempt A_1 finishes execution and saves its output to the temp 
> directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0
> 6. The job for the query finishes and Utilities.mvFileToFinalPath() calls 
> Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
> 7. A_0 (still running) finally finishes and saves its file to the temp 
> directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the 
> final location, where it is later moved to the partition directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17217) SMB Join : Assert if paths are different in TezGroupedSplit in KeyValueInputMerger

2017-08-02 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17217:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> SMB Join : Assert if paths are different in TezGroupedSplit in 
> KeyValueInputMerger
> --
>
> Key: HIVE-17217
> URL: https://issues.apache.org/jira/browse/HIVE-17217
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Fix For: 3.0.0
>
> Attachments: HIVE-17217.1.patch, HIVE-17217.2.patch
>
>
> In KeyValueInputMerger, a TezGroupedSplit may contain more than 1 splits. 
> However, the splits should all belong to same path.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17283) Enable parallel edges of semijoin along with mapjoins

2017-08-11 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17283:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> Enable parallel edges of semijoin along with mapjoins
> -
>
> Key: HIVE-17283
> URL: https://issues.apache.org/jira/browse/HIVE-17283
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Fix For: 3.0.0
>
> Attachments: HIVE-17283.1.patch, HIVE-17283.2.patch
>
>
> https://issues.apache.org/jira/browse/HIVE-16260 removes parallel edges of 
> semijoin with mapjoin. However, in some cases it may be beneficial to have it.
> We need a config which can enable it.
> The default should be false, which maintains the existing behavior.
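
Usage would then look like the following; the property name below is my assumption of what the patch adds, so verify against HiveConf before relying on it:

{code}
-- assumed property name; default false preserves the existing behavior
set hive.tez.dynamic.semijoin.reduction.for.mapjoin=true;
{code}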



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17281) LLAP external client not properly handling KILLED notification that occurs when a fragment is rejected

2017-08-11 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17281:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> LLAP external client not properly handling KILLED notification that occurs 
> when a fragment is rejected
> --
>
> Key: HIVE-17281
> URL: https://issues.apache.org/jira/browse/HIVE-17281
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 3.0.0
>
> Attachments: HIVE-17281.1.patch
>
>
> When LLAP fragment submission is rejected, the external client receives both 
> REJECTED and KILLED notifications for the fragment. The KILLED notification 
> is being treated as an error, which prevents the retry logic from 
> resubmitting the fragment. This needs to be fixed in the client logic.
> {noformat}
> 17/08/02 04:36:16 INFO LlapBaseInputFormat: Registered id: 
> attempt_2519876382789748565_0005_0_00_21_0
> 17/08/02 04:36:16 INFO LlapTaskUmbilicalExternalClient: Fragment: 
> attempt_2519876382789748565_0005_0_00_21_0 rejected. Server Busy.
> 17/08/02 04:36:16 ERROR LlapTaskUmbilicalExternalClient: Task killed - 
> attempt_2519876382789748565_0005_0_00_21_0
> {noformat}
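
One way the client could tolerate the trailing KILLED event, sketched with hypothetical names (the actual fix is whatever the attached patch does): remember fragments that were already REJECTED so the companion KILLED notification is swallowed instead of surfaced as an error, letting the retry logic resubmit.

{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class KilledAfterRejectedFilter {
  private final Set<String> rejected = ConcurrentHashMap.newKeySet();

  void onRejected(String fragmentId) {
    rejected.add(fragmentId);
    // ... schedule the fragment for resubmission ...
  }

  void onKilled(String fragmentId) {
    if (rejected.remove(fragmentId)) {
      // Expected companion of the earlier REJECTED event; ignore it so
      // the retry logic can resubmit instead of reporting an error.
      return;
    }
    // ... a genuine kill: report the error to the reader ...
  }
}
{code}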



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17091) "Timed out getting readerEvents" error from external LLAP client

2017-07-13 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086699#comment-16086699
 ] 

Jason Dere commented on HIVE-17091:
---

Looks like a couple of different issues at play here:
1) On the LLAP daemon, the executor finished but it's somehow still stuck 
waiting for the ChannelOutputStream to finish all writes (even though all of 
the data was already received by the client). This might be related to the 
pendingWrites/writeMonitor logic being used by the ChannelOutputStream to 
manage the number of outstanding writes for an external fragment request. I've 
tried replacing this mechanism with a Semaphore, and so far I haven't seen 
this issue reoccur.
{noformat}
Thread 1802 (TezTR-683826_93_0_0_29_0):
  State: WAITING
  Blocked count: 456
  Waited count: 458
  Waiting on java.lang.Object@7e3b8b1
  Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:502)

org.apache.hadoop.hive.llap.ChannelOutputStream.waitForWritesToFinish(ChannelOutputStream.java:153)

org.apache.hadoop.hive.llap.ChannelOutputStream.close(ChannelOutputStream.java:136)
java.io.FilterOutputStream.close(FilterOutputStream.java:159)

org.apache.hadoop.hive.llap.io.ChunkedOutputStream.close(ChunkedOutputStream.java:81)
java.io.FilterOutputStream.close(FilterOutputStream.java:159)
org.apache.hadoop.hive.llap.LlapRecordWriter.close(LlapRecordWriter.java:47)

org.apache.hadoop.hive.ql.io.HivePassThroughRecordWriter.close(HivePassThroughRecordWriter.java:46)

org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:190)

org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1039)
org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:711)
org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:711)
org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:711)
org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:711)

org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:464)

org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:206)
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)

org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
{noformat}
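
A minimal sketch of that Semaphore-based bounding (my reconstruction with hypothetical names, not the committed change): permits bound the in-flight writes, and draining all permits proves every write has completed, so close() cannot wedge on a missed notify.

{code}
import java.util.concurrent.Semaphore;

public class BoundedChannelWriter {
  private final int maxOutstanding;
  private final Semaphore permits;

  BoundedChannelWriter(int maxOutstanding) {
    this.maxOutstanding = maxOutstanding;
    this.permits = new Semaphore(maxOutstanding);
  }

  void write(byte[] chunk) throws InterruptedException {
    permits.acquire();  // blocks when too many writes are in flight
    startAsyncWrite(chunk, permits::release);  // release from the listener
  }

  void waitForWritesToFinish() throws InterruptedException {
    // Acquiring every permit means all outstanding writes completed.
    permits.acquire(maxOutstanding);
    permits.release(maxOutstanding);
  }

  private void startAsyncWrite(byte[] chunk, Runnable onComplete) {
    // ... hand the chunk to the channel; its completion listener must
    // invoke onComplete exactly once. Run inline here to stay runnable.
    onComplete.run();
  }
}
{code}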

2) The LLAP client received the end of the data stream and is expecting a 
heartbeat with a task complete notification:
{noformat}
05:47:44,060 DEBUG 
org.apache.hadoop.hive.llap.ext.LlapTaskUmbilicalExternalClient: Heartbeat from 
attempt_7085310350540683826_0089_0_00_33_0 events: 1
2017-06-29 05:47:44,060 DEBUG 
org.apache.hadoop.hive.llap.ext.LlapTaskUmbilicalExternalClient: Task update 
event for attempt_7085310350540683826_0089_0_00_33_0
2017-06-29 05:47:44,065 DEBUG 
org.apache.hadoop.hive.llap.io.ChunkedInputStream: 
LlapTaskUmbilicalExternalClient(attempt_7085310350540683826_0089_0_00_33_0):
 Chunk size 131072
2017-06-29 05:47:44,081 DEBUG 
org.apache.hadoop.hive.llap.io.ChunkedInputStream: 
LlapTaskUmbilicalExternalClient(attempt_7085310350540683826_0089_0_00_33_0):
 Chunk size 131072
2017-06-29 05:47:44,097 DEBUG 
org.apache.hadoop.hive.llap.io.ChunkedInputStream: 
LlapTaskUmbilicalExternalClient(attempt_7085310350540683826_0089_0_00_33_0):
 Chunk size 131072
2017-06-29 05:47:44,119 DEBUG 
org.apache.hadoop.hive.llap.io.ChunkedInputStream: 
LlapTaskUmbilicalExternalClient(attempt_7085310350540683826_0089_0_00_33_0):
 Chunk size 30244
2017-06-29 05:47:44,123 DEBUG 
org.apache.hadoop.hive.llap.io.ChunkedInputStream: 
LlapTaskUmbilicalExternalClient(attempt_7085310350540683826_0089_0_00_33_0):
 Chunk size 0
2017-06-29 05:47:44,123 DEBUG 
org.apache.hadoop.hive.llap.io.ChunkedInputStream: 
LlapTaskUmbilicalExternalClient(attempt_7085310350540683826_0089_0_00_33_0):
 Hit end of data
2017-06-29 05:47:44,123 INFO org.apache.hadoop.hive.llap.LlapBaseRecordReader: 
1498729664123 Waiting for reader event for 
LlapTaskUmbilicalExternalClient(attempt_7085310350540683826_0089_0_00_33_0)
{noformat}

Due to issue 1 the task completed event never arrives at the client, though the 
client continues to receive heartbeats from LLAP. Eventually (after 30 
seconds), the external client times out waiting for the task completed event. 
I'm guessing the solution on the client side is that we shouldn't be timing out 
while waiting for the task complete event as long as we are still receiving 
heartbeats. I'll try setting it to an indefinite wait.
{noformat}
2017-06-29 05:48:14,106 DEBUG 
org.apache.hadoop.hive.llap.ext.LlapTaskUmbilicalExternalClient: Received 
heartbeat from container, request={  
containerId=container_7085310350540683826_0089_00_33, requestId=487, 
startIndex=0, ...
{noformat}
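
The heartbeat-gated wait could look like this sketch (assumed names, not LlapBaseRecordReader's actual fields): the poll timeout only becomes fatal when no heartbeat has arrived within the window either.

{code}
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ReaderEventWaiter {
  private final BlockingQueue<Object> readerEvents = new LinkedBlockingQueue<>();
  private volatile long lastHeartbeatMillis = System.currentTimeMillis();

  void onHeartbeat() {
    lastHeartbeatMillis = System.currentTimeMillis();  // liveness signal
  }

  Object waitForReaderEvent(long staleAfterMs)
      throws IOException, InterruptedException {
    while (true) {
      Object event = readerEvents.poll(staleAfterMs, TimeUnit.MILLISECONDS);
      if (event != null) {
        return event;
      }
      if (System.currentTimeMillis() - lastHeartbeatMillis >= staleAfterMs) {
        throw new IOException("Timed out getting readerEvents");
      }
      // Heartbeats are still arriving; keep waiting for the event.
    }
  }
}
{code}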

[jira] [Updated] (HIVE-17091) "Timed out getting readerEvents" error from external LLAP client

2017-07-16 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17091:
--
Status: Patch Available  (was: Open)

> "Timed out getting readerEvents" error from external LLAP client
> 
>
> Key: HIVE-17091
> URL: https://issues.apache.org/jira/browse/HIVE-17091
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17091.1.patch
>
>
> {noformat}
> Caused by: java.io.IOException: Timed out getting readerEvents
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.getReaderEvent(LlapBaseRecordReader.java:261)
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:148)
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:48)
>   at 
> org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:121)
>   at 
> org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:68)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17070) remove .orig files from src

2017-07-11 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082595#comment-16082595
 ] 

Jason Dere commented on HIVE-17070:
---

+1

> remove .orig files from src
> ---
>
> Key: HIVE-17070
> URL: https://issues.apache.org/jira/browse/HIVE-17070
> Project: Hive
>  Issue Type: Bug
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Trivial
> Attachments: HIVE-17070.patch
>
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig
> ql/src/test/results/clientpositive/llap/vector_join30.q.out.orig



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-07-12 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16926:
--
Attachment: HIVE-16926.5.patch

Patch v5, with changes per feedback.

> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, 
> HIVE-16926.3.patch, HIVE-16926.4.patch, HIVE-16926.5.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-07-14 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16926:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master.
Crap I forgot about the imports - I'll try to take care of that in another item.

> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 3.0.0
>
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, 
> HIVE-16926.3.patch, HIVE-16926.4.patch, HIVE-16926.5.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17091) "Timed out getting readerEvents" error from external LLAP client

2017-07-14 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17091:
--
Attachment: HIVE-17091.1.patch

> "Timed out getting readerEvents" error from external LLAP client
> 
>
> Key: HIVE-17091
> URL: https://issues.apache.org/jira/browse/HIVE-17091
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17091.1.patch
>
>
> {noformat}
> Caused by: java.io.IOException: Timed out getting readerEvents
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.getReaderEvent(LlapBaseRecordReader.java:261)
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:148)
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:48)
>   at 
> org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:121)
>   at 
> org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:68)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17091) "Timed out getting readerEvents" error from external LLAP client

2017-07-14 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088356#comment-16088356
 ] 

Jason Dere commented on HIVE-17091:
---

Patch for the issues mentioned above, plus added the fragment ID to the logging 
to make things easier to track.
[~sseth] [~prasanth_j] can you review?

> "Timed out getting readerEvents" error from external LLAP client
> 
>
> Key: HIVE-17091
> URL: https://issues.apache.org/jira/browse/HIVE-17091
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17091.1.patch
>
>
> {noformat}
> Caused by: java.io.IOException: Timed out getting readerEvents
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.getReaderEvent(LlapBaseRecordReader.java:261)
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:148)
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:48)
>   at 
> org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:121)
>   at 
> org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:68)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-17 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090666#comment-16090666
 ] 

Jason Dere commented on HIVE-17113:
---

Talked to [~ashutoshc] and [~sseth] about this. According to Sid this is 
normally handled in MR using the OutputCommitter. However Ashutosh mentioned 
that Hive does not use the Hadoop OutputCommitter functionality and instead 
tries to handle duplicate task attempts by itself - thus the call to 
Utilities.removeTempOrDuplicateFiles().

A couple of solutions to this on the Hive side:
1) Changing Hive to properly use the OutputCommitter
2) Utilities.mvFileToFinalPath() should call 
Utilities.removeTempOrDuplicateFiles() after renaming the temp directory rather 
than before renaming. This is basically swapping the order of steps 6 and 8 in 
the Jira description, within Utilities.mvFileToFinalPath().

Gonna try to do option 2 as it looks like a simpler fix.
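
Option 2 as a sketch (mirroring the shape of Utilities.mvFileToFinalPath with hypothetical signatures, not the actual patch): rename first so a runaway attempt can no longer add files, then dedupe.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SwapMoveAndDedupe {
  static void mvFileToFinalPath(FileSystem fs, Path tmpPath, Path finalPath)
      throws IOException {
    // 1. Move the temp dir out from under any still-running attempts.
    if (!fs.rename(tmpPath, finalPath)) {
      throw new IOException("Failed to rename " + tmpPath + " to " + finalPath);
    }
    // 2. Only now dedupe; late writers can no longer race files into the
    //    directory being checked.
    removeTempOrDuplicateFiles(fs, finalPath);
  }

  static void removeTempOrDuplicateFiles(FileSystem fs, Path dir) {
    // ... placeholder for Hive's existing duplicate-attempt check ...
  }
}
{code}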

> Duplicate bucket files can get written to table by runaway task
> ---
>
> Key: HIVE-17113
> URL: https://issues.apache.org/jira/browse/HIVE-17113
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> Saw a table get a duplicate bucket file from a Hive query. It looks like the 
> following happened:
> 1. Task attempt A_0 starts, but then stops making progress
> 2. The job was running with speculative execution on, and task attempt A_1 is 
> started
> 3. Task attempt A_1 finishes execution and saves its output to the temp 
> directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0
> 6. The job for the query finishes and Utilities.mvFileToFinalPath() calls 
> Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
> 7. A_0 (still running) finally finishes and saves its file to the temp 
> directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the 
> final location, where it is later moved to the partition directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-17 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17113:
--
Status: Patch Available  (was: Open)

> Duplicate bucket files can get written to table by runaway task
> ---
>
> Key: HIVE-17113
> URL: https://issues.apache.org/jira/browse/HIVE-17113
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17113.1.patch
>
>
> Saw a table get a duplicate bucket file from a Hive query. It looks like the 
> following happened:
> 1. Task attempt A_0 starts, but then stops making progress
> 2. The job was running with speculative execution on, and task attempt A_1 is 
> started
> 3. Task attempt A_1 finishes execution and saves its output to the temp 
> directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0
> 6. The job for the query finishes and Utilities.mvFileToFinalPath() calls 
> Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
> 7. A_0 (still running) finally finishes and saves its file to the temp 
> directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the 
> final location, where it is later moved to the partition directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-17 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-17113:
-


> Duplicate bucket files can get written to table by runaway task
> ---
>
> Key: HIVE-17113
> URL: https://issues.apache.org/jira/browse/HIVE-17113
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> Saw a table get a duplicate bucket file from a Hive query. It looks like the 
> following happened:
> 1. Task attempt A_0 starts, but then stops making progress
> 2. The job was running with speculative execution on, and task attempt A_1 is 
> started
> 3. Task attempt A_1 finishes execution and saves its output to the temp 
> directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0
> 6. The job for the query finishes and Utilities.mvFileToFinalPath() calls 
> Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
> 7. A_0 (still running) finally finishes and saves its file to the temp 
> directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the 
> final location, where it is later moved to the partition directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-17 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17113:
--
Attachment: HIVE-17113.1.patch

Patch to switch the order of file operations during 
Utilities.mvFileToFinalPath() - move the temp directory to the final location 
first, then remove duplicate bucket files.
[~ashutoshc] can you take a look?
[~rajesh.balamohan] FYI this may undo some of the file operation optimization 
you did in HIVE-14323.

> Duplicate bucket files can get written to table by runaway task
> ---
>
> Key: HIVE-17113
> URL: https://issues.apache.org/jira/browse/HIVE-17113
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17113.1.patch
>
>
> Saw a table get a duplicate bucket file from a Hive query. It looks like the 
> following happened:
> 1. Task attempt A_0 starts, but then stops making progress
> 2. The job was running with speculative execution on, and task attempt A_1 is 
> started
> 3. Task attempt A_1 finishes execution and saves its output to the temp 
> directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0
> 6. The job for the query finishes and Utilities.mvFileToFinalPath() calls 
> Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
> 7. A_0 (still running) finally finishes and saves its file to the temp 
> directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the 
> final location, where it is later moved to the partition directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17091) "Timed out getting readerEvents" error from external LLAP client

2017-07-17 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090901#comment-16090901
 ] 

Jason Dere commented on HIVE-17091:
---

That is true, the task heartbeat should either be interrupted or 
getReaderEvent() will end up receiving the ErrorEvent that is generated by the 
heartbeat timeout.

> "Timed out getting readerEvents" error from external LLAP client
> 
>
> Key: HIVE-17091
> URL: https://issues.apache.org/jira/browse/HIVE-17091
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17091.1.patch
>
>
> {noformat}
> Caused by: java.io.IOException: Timed out getting readerEvents
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.getReaderEvent(LlapBaseRecordReader.java:261)
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:148)
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:48)
>   at 
> org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:121)
>   at 
> org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:68)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17091) "Timed out getting readerEvents" error from external LLAP client

2017-07-18 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17091:
--
Attachment: HIVE-17091.2.patch

Small fix in LlapBaseRecordReader for an NPE in TestLlapOutputFormat.testValues 
- it was calling toString() on a potentially null client.
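
The shape of the guard, as a sketch (method and names are illustrative, not the patch itself):

{code}
public class ClientDescriber {
  // Avoid calling toString() on a possibly-null client.
  // String.valueOf(client) would also work, rendering null as "null".
  static String describeClient(Object client) {
    return client == null ? "(no client)" : client.toString();
  }
}
{code}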

> "Timed out getting readerEvents" error from external LLAP client
> 
>
> Key: HIVE-17091
> URL: https://issues.apache.org/jira/browse/HIVE-17091
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17091.1.patch, HIVE-17091.2.patch
>
>
> {noformat}
> Caused by: java.io.IOException: Timed out getting readerEvents
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.getReaderEvent(LlapBaseRecordReader.java:261)
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:148)
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:48)
>   at 
> org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:121)
>   at 
> org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:68)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-18 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092149#comment-16092149
 ] 

Jason Dere commented on HIVE-17113:
---

Seems to be causing a failure in TestSparkCliDriver skewjoin.q

> Duplicate bucket files can get written to table by runaway task
> ---
>
> Key: HIVE-17113
> URL: https://issues.apache.org/jira/browse/HIVE-17113
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17113.1.patch
>
>
> Saw a table get a duplicate bucket file from a Hive query. It looks like the 
> following happened:
> 1. Task attempt A_0 starts, but then stops making progress
> 2. The job was running with speculative execution on, and task attempt A_1 is 
> started
> 3. Task attempt A_1 finishes execution and saves its output to the temp 
> directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0
> 6. The job for the query finishes and Utilities.mvFileToFinalPath() calls 
> Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
> 7. A_0 (still running) finally finishes and saves its file to the temp 
> directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the 
> final location, where it is later moved to the partition directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-18 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17113:
--
Status: Open  (was: Patch Available)

> Duplicate bucket files can get written to table by runaway task
> ---
>
> Key: HIVE-17113
> URL: https://issues.apache.org/jira/browse/HIVE-17113
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17113.1.patch
>
>
> Saw a table get a duplicate bucket file from a Hive query. It looks like the 
> following happened:
> 1. Task attempt A_0 starts, but then stops making progress
> 2. The job was running with speculative execution on, and task attempt A_1 is 
> started
> 3. Task attempt A_1 finishes execution and saves its output to the temp 
> directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0
> 6. The job for the query finishes and Utilities.mvFileToFinalPath() calls 
> Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
> 7. A_0 (still running) finally finishes and saves its file to the temp 
> directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the 
> final location, where it is later moved to the partition directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17091) "Timed out getting readerEvents" error from external LLAP client

2017-07-18 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17091:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> "Timed out getting readerEvents" error from external LLAP client
> 
>
> Key: HIVE-17091
> URL: https://issues.apache.org/jira/browse/HIVE-17091
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 3.0.0
>
> Attachments: HIVE-17091.1.patch, HIVE-17091.2.patch
>
>
> {noformat}
> Caused by: java.io.IOException: Timed out getting readerEvents
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.getReaderEvent(LlapBaseRecordReader.java:261)
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:148)
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.next(LlapBaseRecordReader.java:48)
>   at 
> org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:121)
>   at 
> org.apache.hadoop.hive.llap.LlapRowRecordReader.next(LlapRowRecordReader.java:68)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-25 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100951#comment-16100951
 ] 

Jason Dere commented on HIVE-17113:
---

Spoke offline to [~ashutoshc], who recommended the following approach:
- During Utilities.removeTempOrDuplicateFiles(), maintain a list of files 
found/deduped. This list of files will be used to determine which files are 
moved to the destination directory.
- A configurable setting will be added to control whether this file list 
determines which files are moved, or whether the existing behavior is kept.
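
Sketched with hypothetical signatures (not the actual patch), the dedupe pass would return the files it kept and the mover would consult that list, gated by the new setting:

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DedupeThenMove {
  // Return the files that survived deduplication.
  static List<Path> removeTempOrDuplicateFiles(FileSystem fs, Path tmpDir)
      throws IOException {
    List<Path> kept = new ArrayList<>();
    for (FileStatus st : fs.listStatus(tmpDir)) {
      // ... existing keep/delete decision elided; record the keepers ...
      kept.add(st.getPath());
    }
    return kept;
  }

  // useFileList=false preserves the old move-everything behavior.
  static void moveToFinal(FileSystem fs, Path tmpDir, Path finalDir,
      boolean useFileList) throws IOException {
    List<Path> kept = removeTempOrDuplicateFiles(fs, tmpDir);
    if (!useFileList) {
      fs.rename(tmpDir, finalDir);
      return;
    }
    for (Path p : kept) {
      fs.rename(p, new Path(finalDir, p.getName()));
    }
  }
}
{code}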

> Duplicate bucket files can get written to table by runaway task
> ---
>
> Key: HIVE-17113
> URL: https://issues.apache.org/jira/browse/HIVE-17113
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17113.1.patch
>
>
> Saw a table get a duplicate bucket file from a Hive query. It looks like the 
> following happened:
> 1. Task attempt A_0 starts, but then stops making progress
> 2. The job was running with speculative execution on, and task attempt A_1 is 
> started
> 3. Task attempt A_1 finishes execution and saves its output to the temp 
> directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0
> 6. The job for the query finishes and Utilities.mvFileToFinalPath() calls 
> Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
> 7. A_0 (still running) finally finishes and saves its file to the temp 
> directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the 
> final location, where it is later moved to the partition directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-25 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100564#comment-16100564
 ] 

Jason Dere commented on HIVE-17113:
---

Looks like in the case of skewjoin in Spark, there can be multiple jobs which 
copy files into the same temp directory. When this happens, there can be name 
collisions - in the test there are collisions on files 00_0 and 01_0, 
which get renamed to 00_0_1 and 01_0_1. Since removeTempOrDuplicateFiles() is 
now being called on the destination directory, it is not able to correctly 
disambiguate the 00_0_1 and 01_0_1 files.

Since it looks like the destination directory can potentially hold results from 
more than one job, it does not seem to be correct to simply run 
removeTempOrDuplicateFiles() on the destination directory. Maybe we have to 
change the logic to the following:
1) Move the temp directory to a new directory name, to prevent additional files 
from being added by any runaway processes.
2) Run removeTempOrDuplicateFiles() on this renamed temp directory
3) Run renameOrMoveFiles() to move the renamed temp directory to the final 
location.

Though step 1 might be expensive for cloud storage (it basically means performing 
twice the file moves, right?) ... [~ashutoshc], should doing step 1 be a 
configurable setting?
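
Those three steps as a sketch (hypothetical names; the helpers are placeholders for Hive's existing routines): the intermediate rename fences out runaway writers before the dedupe and final move.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FencedFinalize {
  static void finalizeOutput(FileSystem fs, Path tmpDir, Path finalDir)
      throws IOException {
    // 1. Rename the temp dir so late task attempts cannot add files to it.
    Path fenced = new Path(tmpDir.getParent(), tmpDir.getName() + ".fenced");
    fs.rename(tmpDir, fenced);
    // 2. Dedupe inside the fenced directory.
    removeTempOrDuplicateFiles(fs, fenced);
    // 3. Move the surviving contents to the final location.
    renameOrMoveFiles(fs, fenced, finalDir);
  }

  static void removeTempOrDuplicateFiles(FileSystem fs, Path dir) { /* elided */ }
  static void renameOrMoveFiles(FileSystem fs, Path src, Path dst) { /* elided */ }
}
{code}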

> Duplicate bucket files can get written to table by runaway task
> ---
>
> Key: HIVE-17113
> URL: https://issues.apache.org/jira/browse/HIVE-17113
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17113.1.patch
>
>
> Saw a table get a duplicate bucket file from a Hive query. It looks like the 
> following happened:
> 1. Task attempt A_0 starts, but then stops making progress
> 2. The job was running with speculative execution on, and task attempt A_1 is 
> started
> 3. Task attempt A_1 finishes execution and saves its output to the temp 
> directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0
> 6. The job for the query finishes and Utilities.mvFileToFinalPath() calls 
> Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
> 7. A_0 (still running) finally finishes and saves its file to the temp 
> directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the 
> final location, where it is later moved to the partition directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-26 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17113:
--
Attachment: HIVE-17113.3.patch

> Duplicate bucket files can get written to table by runaway task
> ---
>
> Key: HIVE-17113
> URL: https://issues.apache.org/jira/browse/HIVE-17113
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17113.1.patch, HIVE-17113.2.patch, 
> HIVE-17113.3.patch
>
>
> Saw a table get a duplicate bucket file from a Hive query. It looks like the 
> following happened:
> 1. Task attempt A_0 starts, but then stops making progress
> 2. The job was running with speculative execution on, and task attempt A_1 is 
> started
> 3. Task attempt A_1 finishes execution and saves its output to the temp 
> directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0
> 6. The job for the query finishes and Utilities.mvFileToFinalPath() calls 
> Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
> 7. A_0 (still running) finally finishes and saves its file to the temp 
> directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the 
> final location, where it is later moved to the partition directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-07-11 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083184#comment-16083184
 ] 

Jason Dere commented on HIVE-16926:
---

Maybe I can just replace pendingClients/registeredClients with a single list 
and the RequestInfo can keep a state to show if the request is 
pending/running/etc.
Correct, the shared umbilical server will not be shut down. Is there any action 
needed on this part? I don't think anything is exposed to shut it down.
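
That single-list shape might look like this sketch (hypothetical names, not the actual client code):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UmbilicalRequestRegistry {
  enum State { PENDING, RUNNING, COMPLETED, FAILED }

  static final class RequestInfo {
    final String fragmentId;
    volatile State state = State.PENDING;
    RequestInfo(String fragmentId) { this.fragmentId = fragmentId; }
  }

  // One registry instead of separate pendingClients/registeredClients.
  private final Map<String, RequestInfo> requests = new ConcurrentHashMap<>();

  void register(String fragmentId) {
    requests.put(fragmentId, new RequestInfo(fragmentId));
  }

  void transition(String fragmentId, State newState) {
    RequestInfo ri = requests.get(fragmentId);
    if (ri != null) {
      ri.state = newState;
    }
  }
}
{code}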

> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, 
> HIVE-16926.3.patch, HIVE-16926.4.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-06-29 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068894#comment-16068894
 ] 

Jason Dere commented on HIVE-16926:
---

[~sseth] [~sershe] can you review?

> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, 
> HIVE-16926.3.patch, HIVE-16926.4.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16761) LLAP IO: SMB joins fail elevator

2017-06-28 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066986#comment-16066986
 ] 

Jason Dere commented on HIVE-16761:
---

+1. Maybe add a comment to llap_smb.q about the SMB results being incorrect 
until HIVE-16965 is fixed.

> LLAP IO: SMB joins fail elevator 
> -
>
> Key: HIVE-16761
> URL: https://issues.apache.org/jira/browse/HIVE-16761
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16761.01.patch, HIVE-16761.02.patch, 
> HIVE-16761.03.patch, HIVE-16761.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:153)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:78)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 26 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.exec.vector.LongColumnVector cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextString(BatchToRowReader.java:334)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.nextValue(BatchToRowReader.java:602)
>   at 
> org.apache.hadoop.hive.ql.io.BatchToRowReader.next(BatchToRowReader.java:149)
>   ... 28 more
> {code}
> {code}
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=500;
> select year,quarter,count(*) from transactions_raw_orc_200 a join 
> customer_accounts_orc_200 b on a.account_id=b.account_id group by 
> year,quarter;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-06-28 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16926:
--
Attachment: HIVE-16926.4.patch

> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, 
> HIVE-16926.3.patch, HIVE-16926.4.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16947) Semijoin Reduction : Task cycle created due to multiple semijoins in conjunction with hashjoin

2017-06-28 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16947:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> Semijoin Reduction : Task cycle created due to multiple semijoins in 
> conjunction with hashjoin
> --
>
> Key: HIVE-16947
> URL: https://issues.apache.org/jira/browse/HIVE-16947
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Fix For: 3.0.0
>
> Attachments: HIVE-16947.1.patch, HIVE-16947.2.patch, 
> HIVE-16947.3.patch
>
>
> Typically a semijoin branch and a mapjoin may create a cycle when on the same 
> operator tree. This is already handled; however, a semijoin branch can serve 
> more than one filter, and the cycle detection logic currently only handles 
> the first one, causing cycles that prevent the queries from running.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-10616) TypeInfoUtils doesn't handle DECIMAL with just precision specified

2017-06-28 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16067269#comment-16067269
 ] 

Jason Dere commented on HIVE-10616:
---

A bit late to the dance here, but the behavior is definitely odd; it should 
either
1) correctly parse decimal(precision)
2) raise an error because decimal(precision, scale) is required
Returning decimal(10,0) in this case is just wrong - if we do make a change, 
might as well use [~tfriedr]'s patch. Do you mind adding a testcase to 
TestTypeInfoUtils for this?
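
Something along these lines would do (a sketch of the requested test, assuming the fix makes decimal(p) parse as decimal(p,0); the test class name is illustrative, the real one would live in TestTypeInfoUtils):

{code}
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
import org.junit.Assert;
import org.junit.Test;

public class TestDecimalPrecisionOnly {
  @Test
  public void testPrecisionOnlyDecimal() {
    // With the patch, a precision-only decimal should default scale to 0.
    TypeInfo ti = TypeInfoUtils.getTypeInfoFromTypeString("decimal(5)");
    Assert.assertEquals("decimal(5,0)", ti.getTypeName());
  }
}
{code}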

> TypeInfoUtils doesn't handle DECIMAL with just precision specified
> --
>
> Key: HIVE-10616
> URL: https://issues.apache.org/jira/browse/HIVE-10616
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.0.0
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
>Priority: Minor
> Attachments: HIVE-10616.1.patch
>
>
> The parseType method in TypeInfoUtils doesn't handle decimal types with just 
> precision specified although that's a valid type definition. 
> As a result, TypeInfoUtils.getTypeInfoFromTypeString will always return 
> decimal(10,0) for any decimal() string. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-10616) TypeInfoUtils doesn't handle DECIMAL with just precision specified

2017-06-30 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-10616:
--
Attachment: HIVE-10616.2.patch

Added test case for patch. [~hagleitn], can you take a look?

> TypeInfoUtils doesn't handle DECIMAL with just precision specified
> --
>
> Key: HIVE-10616
> URL: https://issues.apache.org/jira/browse/HIVE-10616
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.0.0
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
>Priority: Minor
> Attachments: HIVE-10616.1.patch, HIVE-10616.2.patch
>
>
> The parseType method in TypeInfoUtils doesn't handle decimal types with just 
> precision specified although that's a valid type definition. 
> As a result, TypeInfoUtils.getTypeInfoFromTypeString will always return 
> decimal(10,0) for any decimal() string. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-10616) TypeInfoUtils doesn't handle DECIMAL with just precision specified

2017-06-30 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-10616:
--
Status: Patch Available  (was: Open)

> TypeInfoUtils doesn't handle DECIMAL with just precision specified
> --
>
> Key: HIVE-10616
> URL: https://issues.apache.org/jira/browse/HIVE-10616
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.0.0
>Reporter: Thomas Friedrich
>Assignee: Jason Dere
>Priority: Minor
> Attachments: HIVE-10616.1.patch, HIVE-10616.2.patch
>
>
> The parseType method in TypeInfoUtils doesn't handle decimal types with just 
> precision specified, although that's a valid type definition. 
> As a result, TypeInfoUtils.getTypeInfoFromTypeString will always return 
> decimal(10,0) for any decimal() string. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-10616) TypeInfoUtils doesn't handle DECIMAL with just precision specified

2017-07-05 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-10616:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Failures do not look related. I've committed to master.

> TypeInfoUtils doesn't handle DECIMAL with just precision specified
> --
>
> Key: HIVE-10616
> URL: https://issues.apache.org/jira/browse/HIVE-10616
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.0.0
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-10616.1.patch, HIVE-10616.2.patch
>
>
> The parseType method in TypeInfoUtils doesn't handle decimal types with just 
> precision specified, although that's a valid type definition. 
> As a result, TypeInfoUtils.getTypeInfoFromTypeString will always return 
> decimal(10,0) for any decimal() string. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-10616) TypeInfoUtils doesn't handle DECIMAL with just precision specified

2017-07-05 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-10616:
-

Assignee: Thomas Friedrich  (was: Jason Dere)

> TypeInfoUtils doesn't handle DECIMAL with just precision specified
> --
>
> Key: HIVE-10616
> URL: https://issues.apache.org/jira/browse/HIVE-10616
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 1.0.0
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
>Priority: Minor
> Attachments: HIVE-10616.1.patch, HIVE-10616.2.patch
>
>
> The parseType method in TypeInfoUtils doesn't handle decimal types with just 
> precision specified, although that's a valid type definition. 
> As a result, TypeInfoUtils.getTypeInfoFromTypeString will always return 
> decimal(10,0) for any decimal() string. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16947) Semijoin Reduction : Task cycle created due to multiple semijoins in conjunction with hashjoin

2017-06-27 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065579#comment-16065579
 ] 

Jason Dere commented on HIVE-16947:
---

+1

> Semijoin Reduction : Task cycle created due to multiple semijoins in 
> conjunction with hashjoin
> --
>
> Key: HIVE-16947
> URL: https://issues.apache.org/jira/browse/HIVE-16947
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16947.1.patch, HIVE-16947.2.patch, 
> HIVE-16947.3.patch
>
>
> Typically a semijoin branch and a mapjoin may create a cycle when they are on 
> the same operator tree. This is already handled; however, a semijoin branch can 
> serve more than one filter, and the cycle detection logic currently handles 
> only the first one, causing cycles that prevent the queries from running.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16553) Change default value for hive.tez.bigtable.minsize.semijoin.reduction

2017-04-28 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16553:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> Change default value for hive.tez.bigtable.minsize.semijoin.reduction
> -
>
> Key: HIVE-16553
> URL: https://issues.apache.org/jira/browse/HIVE-16553
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 3.0.0
>
> Attachments: HIVE-16553.1.patch
>
>
> Current value is 1M rows; we would like to bump this up to make sure we are not 
> creating semijoin optimizations on dimension tables, since having too many 
> semijoin optimizations can cause serialized execution of tasks if lots of 
> tasks are waiting for semijoin optimizations to be computed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16553) Change default value for hive.tez.bigtable.minsize.semijoin.reduction

2017-04-27 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-16553:
-


> Change default value for hive.tez.bigtable.minsize.semijoin.reduction
> -
>
> Key: HIVE-16553
> URL: https://issues.apache.org/jira/browse/HIVE-16553
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> Current value is 1M rows; we would like to bump this up to make sure we are not 
> creating semijoin optimizations on dimension tables, since having too many 
> semijoin optimizations can cause serialized execution of tasks if lots of 
> tasks are waiting for semijoin optimizations to be computed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16553) Change default value for hive.tez.bigtable.minsize.semijoin.reduction

2017-04-27 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15987794#comment-15987794
 ] 

Jason Dere commented on HIVE-16553:
---

[~hagleitn] [~gopalv] can you review? Just a simple default config change.

> Change default value for hive.tez.bigtable.minsize.semijoin.reduction
> -
>
> Key: HIVE-16553
> URL: https://issues.apache.org/jira/browse/HIVE-16553
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> Current value is 1M rows; we would like to bump this up to make sure we are not 
> creating semijoin optimizations on dimension tables, since having too many 
> semijoin optimizations can cause serialized execution of tasks if lots of 
> tasks are waiting for semijoin optimizations to be computed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16553) Change default value for hive.tez.bigtable.minsize.semijoin.reduction

2017-04-27 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16553:
--
Attachment: HIVE-16553.1.patch

Whoops, didn't post the patch! Posting it now.

> Change default value for hive.tez.bigtable.minsize.semijoin.reduction
> -
>
> Key: HIVE-16553
> URL: https://issues.apache.org/jira/browse/HIVE-16553
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16553.1.patch
>
>
> Current value is 1M rows; we would like to bump this up to make sure we are not 
> creating semijoin optimizations on dimension tables, since having too many 
> semijoin optimizations can cause serialized execution of tasks if lots of 
> tasks are waiting for semijoin optimizations to be computed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16553) Change default value for hive.tez.bigtable.minsize.semijoin.reduction

2017-04-27 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16553:
--
Status: Patch Available  (was: Open)

> Change default value for hive.tez.bigtable.minsize.semijoin.reduction
> -
>
> Key: HIVE-16553
> URL: https://issues.apache.org/jira/browse/HIVE-16553
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16553.1.patch
>
>
> Current value is 1M rows; we would like to bump this up to make sure we are not 
> creating semijoin optimizations on dimension tables, since having too many 
> semijoin optimizations can cause serialized execution of tasks if lots of 
> tasks are waiting for semijoin optimizations to be computed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results

2017-07-28 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16965:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Fix For: 3.0.0
>
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, 
> HIVE-16965.6.patch, HIVE-16965.7.patch, HIVE-16965.8.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-31 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17113:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> Duplicate bucket files can get written to table by runaway task
> ---
>
> Key: HIVE-17113
> URL: https://issues.apache.org/jira/browse/HIVE-17113
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 3.0.0
>
> Attachments: HIVE-17113.1.patch, HIVE-17113.2.patch, 
> HIVE-17113.3.patch
>
>
> Saw a table get a duplicate bucket file from a Hive query. It looks like the 
> following happened:
> 1. Task attempt A_0 starts, but then stops making progress
> 2. The job was running with speculative execution on, and task attempt A_1 is 
> started
> 3. Task attempt A_1 finishes execution and saves its output to the temp 
> directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0
> 6. The job for the query finishes and Utilities.mvFileToFinalPath() calls 
> Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
> 7. A_0 (still running) finally finishes and saves its file to the temp 
> directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the 
> final location, where it is later moved to the partition directory.
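
For reference, a simplified, hypothetical sketch of the duplicate-attempt check 
involved (illustrative names, not the actual Utilities code) - note it cannot 
help with an attempt whose output lands after the check runs, which is the race 
described above:
{code}
import java.util.HashMap;
import java.util.Map;

public class DedupSketch {
  // Keep one output file per task ID; a second attempt's file for the same
  // task is a duplicate bucket file.
  static Map<String, String> dedupe(Map<String, String> filePathsByName) {
    Map<String, String> byTask = new HashMap<>();
    for (Map.Entry<String, String> e : filePathsByName.entrySet()) {
      String taskId = taskIdFromFileName(e.getKey());
      if (byTask.putIfAbsent(taskId, e.getValue()) != null) {
        System.err.println("Duplicate output for task " + taskId);
      }
    }
    return byTask;
  }

  // Hypothetical naming scheme: "000000_1" -> task ID "000000".
  static String taskIdFromFileName(String name) {
    int idx = name.indexOf('_');
    return idx < 0 ? name : name.substring(0, idx);
  }
}
{code}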



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17281) LLAP external client not properly handling KILLED notification that occurs when a fragment is rejected

2017-08-09 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17281:
--
Status: Patch Available  (was: Open)

[~rajesh.balamohan] [~sershe] Can you review?
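
The gist of the fix, as a minimal sketch (hypothetical names, not the actual 
LlapTaskUmbilicalExternalClient code):
{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class FragmentStateSketch {
  private final Set<String> rejected = ConcurrentHashMap.newKeySet();

  void onRejected(String fragmentId) {
    rejected.add(fragmentId);
    // ... hand the fragment back to the retry logic for resubmission ...
  }

  void onKilled(String fragmentId) {
    if (rejected.remove(fragmentId)) {
      // A KILLED notification is expected after a rejection; treating it
      // as an error would block the retry, so just ignore it.
      return;
    }
    // ... otherwise treat as a genuine task failure ...
  }
}
{code}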

> LLAP external client not properly handling KILLED notification that occurs 
> when a fragment is rejected
> --
>
> Key: HIVE-17281
> URL: https://issues.apache.org/jira/browse/HIVE-17281
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17281.1.patch
>
>
> When LLAP fragment submission is rejected, the external client receives both 
> REJECTED and KILLED notifications for the fragment. The KILLED notification 
> is being treated as an error, which prevents the retry logic from 
> resubmitting the fragment. This needs to be fixed in the client logic.
> {noformat}
> 17/08/02 04:36:16 INFO LlapBaseInputFormat: Registered id: 
> attempt_2519876382789748565_0005_0_00_21_0
> 17/08/02 04:36:16 INFO LlapTaskUmbilicalExternalClient: Fragment: 
> attempt_2519876382789748565_0005_0_00_21_0 rejected. Server Busy.
> 17/08/02 04:36:16 ERROR LlapTaskUmbilicalExternalClient: Task killed - 
> attempt_2519876382789748565_0005_0_00_21_0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17281) LLAP external client not properly handling KILLED notification that occurs when a fragment is rejected

2017-08-09 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17281:
--
Attachment: HIVE-17281.1.patch

> LLAP external client not properly handling KILLED notification that occurs 
> when a fragment is rejected
> --
>
> Key: HIVE-17281
> URL: https://issues.apache.org/jira/browse/HIVE-17281
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17281.1.patch
>
>
> When LLAP fragment submission is rejected, the external client receives both 
> REJECTED and KILLED notifications for the fragment. The KILLED notification 
> is being treated as an error, which prevents the retry logic from 
> resubmitting the fragment. This needs to be fixed in the client logic.
> {noformat}
> 17/08/02 04:36:16 INFO LlapBaseInputFormat: Registered id: 
> attempt_2519876382789748565_0005_0_00_21_0
> 17/08/02 04:36:16 INFO LlapTaskUmbilicalExternalClient: Fragment: 
> attempt_2519876382789748565_0005_0_00_21_0 rejected. Server Busy.
> 17/08/02 04:36:16 ERROR LlapTaskUmbilicalExternalClient: Task killed - 
> attempt_2519876382789748565_0005_0_00_21_0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17281) LLAP external client not properly handling KILLED notification that occurs when a fragment is rejected

2017-08-09 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-17281:
-


> LLAP external client not properly handling KILLED notification that occurs 
> when a fragment is rejected
> --
>
> Key: HIVE-17281
> URL: https://issues.apache.org/jira/browse/HIVE-17281
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> When LLAP fragment submission is rejected, the external client receives both 
> REJECTED and KILLED notifications for the fragment. The KILLED notification 
> is being treated as an error, which prevents the retry logic from 
> resubmitting the fragment. This needs to be fixed in the client logic.
> {noformat}
> 17/08/02 04:36:16 INFO LlapBaseInputFormat: Registered id: 
> attempt_2519876382789748565_0005_0_00_21_0
> 17/08/02 04:36:16 INFO LlapTaskUmbilicalExternalClient: Fragment: 
> attempt_2519876382789748565_0005_0_00_21_0 rejected. Server Busy.
> 17/08/02 04:36:16 ERROR LlapTaskUmbilicalExternalClient: Task killed - 
> attempt_2519876382789748565_0005_0_00_21_0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-25 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17113:
--
Attachment: HIVE-17113.2.patch

> Duplicate bucket files can get written to table by runaway task
> ---
>
> Key: HIVE-17113
> URL: https://issues.apache.org/jira/browse/HIVE-17113
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17113.1.patch, HIVE-17113.2.patch
>
>
> Saw a table get a duplicate bucket file from a Hive query. It looks like the 
> following happened:
> 1. Task attempt A_0 starts, but then stops making progress
> 2. The job was running with speculative execution on, and task attempt A_1 is 
> started
> 3. Task attempt A_1 finishes execution and saves its output to the temp 
> directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0
> 6. The job for the query finishes and Utilities.mvFileToFinalPath() calls 
> Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
> 7. A_0 (still running) finally finishes and saves its file to the temp 
> directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the 
> final location, where it is later moved to the partition directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17113) Duplicate bucket files can get written to table by runaway task

2017-07-25 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17113:
--
Status: Patch Available  (was: Open)

> Duplicate bucket files can get written to table by runaway task
> ---
>
> Key: HIVE-17113
> URL: https://issues.apache.org/jira/browse/HIVE-17113
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17113.1.patch, HIVE-17113.2.patch
>
>
> Saw a table get a duplicate bucket file from a Hive query. It looks like the 
> following happened:
> 1. Task attempt A_0 starts, but then stops making progress
> 2. The job was running with speculative execution on, and task attempt A_1 is 
> started
> 3. Task attempt A_1 finishes execution and saves its output to the temp 
> directory.
> 5. A task kill is sent to A_0, though this does not appear to actually kill A_0
> 6. The job for the query finishes and Utilities.mvFileToFinalPath() calls 
> Utilities.removeTempOrDuplicateFiles() to check for duplicate bucket files
> 7. A_0 (still running) finally finishes and saves its file to the temp 
> directory. At this point we now have duplicate bucket files - oops!
> 8. Utilities.removeTempOrDuplicateFiles() moves the temp directory to the 
> final location, where it is later moved to the partition directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16550) Semijoin Hints should be able to skip the optimization if needed.

2017-05-03 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16550:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> Semijoin Hints should be able to skip the optimization if needed.
> -
>
> Key: HIVE-16550
> URL: https://issues.apache.org/jira/browse/HIVE-16550
> Project: Hive
>  Issue Type: Improvement
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Fix For: 3.0.0
>
> Attachments: HIVE-16550.1.patch, HIVE-16550.2.patch, 
> HIVE-16550.3.patch
>
>
> Currently semi join hints are designed to enforce a particular semi join; 
> however, it should also be possible to skip the optimization altogether in a 
> query using hints.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16651) LlapProtocolClientProxy stack trace when using llap input format

2017-05-12 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16651:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> LlapProtocolClientProxy stack trace when using llap input format
> 
>
> Key: HIVE-16651
> URL: https://issues.apache.org/jira/browse/HIVE-16651
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 3.0.0
>
> Attachments: HIVE-16651.1.patch
>
>
> Seeing this after LlapBaseRecordReader.close():
> {noformat}
> 16/06/28 22:05:32 WARN LlapProtocolClientProxy: RequestManager shutdown with 
> error
> java.util.concurrent.CancellationException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:121)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at com.google.common.util.concurrent.Futures$4.run(Futures.java:1170)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>   at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>   at 
> com.google.common.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
>   at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
>   at java.util.concurrent.FutureTask.cancel(FutureTask.java:180)
>   at 
> org.apache.hadoop.hive.llap.tez.LlapProtocolClientProxy.serviceStop(LlapProtocolClientProxy.java:131)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.hive.llap.ext.LlapTaskUmbilicalExternalClient.serviceStop(LlapTaskUmbilicalExternalClient.java:135)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.close(LlapBaseRecordReader.java:84)
>   at 
> org.apache.hadoop.hive.llap.LlapRowRecordReader.close(LlapRowRecordReader.java:80)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16652) LlapInputFormat: Seeing "output error" WARN message

2017-05-12 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16652:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> LlapInputFormat: Seeing "output error" WARN message
> ---
>
> Key: HIVE-16652
> URL: https://issues.apache.org/jira/browse/HIVE-16652
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 3.0.0
>
> Attachments: HIVE-16652.1.patch
>
>
> Another warning message I'm seeing in the logs for TestJdbcWithMiniLlap after 
> adding the line to close the RecordReader in the test:
> {noformat}
> 2017-05-11T11:08:34,511  WARN [IPC Server handler 0 on 54847] ipc.Server: IPC 
> Server handler 0 on 54847, call Call#341 Retry#0 heartbeat({  
> containerId=container_6830411502416918223_0003_00_00, requestId=2, 
> startIndex=0, preRoutedStartIndex=0, maxEventsToGet=500, 
> taskAttemptId=attempt_6830411502416918223_0003_0_00_00_0, eventCount=2 
> }), rpc version=2, client version=1, methodsFingerPrint=996603002 from 
> 10.22.8.180:54849: output error
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16651) LlapProtocolClientProxy stack trace when using llap input format

2017-05-12 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16651:
--
Status: Patch Available  (was: Open)

> LlapProtocolClientProxy stack trace when using llap input format
> 
>
> Key: HIVE-16651
> URL: https://issues.apache.org/jira/browse/HIVE-16651
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16651.1.patch
>
>
> Seeing this after LlapBaseRecordReader.close():
> {noformat}
> 16/06/28 22:05:32 WARN LlapProtocolClientProxy: RequestManager shutdown with 
> error
> java.util.concurrent.CancellationException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:121)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at com.google.common.util.concurrent.Futures$4.run(Futures.java:1170)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>   at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>   at 
> com.google.common.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
>   at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
>   at java.util.concurrent.FutureTask.cancel(FutureTask.java:180)
>   at 
> org.apache.hadoop.hive.llap.tez.LlapProtocolClientProxy.serviceStop(LlapProtocolClientProxy.java:131)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.hive.llap.ext.LlapTaskUmbilicalExternalClient.serviceStop(LlapTaskUmbilicalExternalClient.java:135)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.close(LlapBaseRecordReader.java:84)
>   at 
> org.apache.hadoop.hive.llap.LlapRowRecordReader.close(LlapRowRecordReader.java:80)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16652) LlapInputFormat: Seeing "output error" WARN message

2017-05-12 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16652:
--
Attachment: HIVE-16652.1.patch

Patch to close the external client after a slight delay, to allow the 
connection to be closed by the LLAP daemon before we shut down the umbilical 
server used by the client.
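
A minimal sketch of the delayed-close idea (hypothetical names; the actual 
patch may differ):
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DelayedCloseSketch {
  private static final ScheduledExecutorService CLOSER =
      Executors.newSingleThreadScheduledExecutor();

  // Give the LLAP daemon a moment to close its side of the connection
  // before the client's umbilical server is torn down.
  static void closeAfterDelay(AutoCloseable client, long delayMs) {
    CLOSER.schedule(() -> {
      try {
        client.close();
      } catch (Exception e) {
        // best-effort close
      }
    }, delayMs, TimeUnit.MILLISECONDS);
  }
}
{code}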
[~sseth], can you take a look?

> LlapInputFormat: Seeing "output error" WARN message
> ---
>
> Key: HIVE-16652
> URL: https://issues.apache.org/jira/browse/HIVE-16652
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16652.1.patch
>
>
> Another warning message I'm seeing in the logs for TestJdbcWithMiniLlap after 
> adding the line to close the RecordReader in the test:
> {noformat}
> 2017-05-11T11:08:34,511  WARN [IPC Server handler 0 on 54847] ipc.Server: IPC 
> Server handler 0 on 54847, call Call#341 Retry#0 heartbeat({  
> containerId=container_6830411502416918223_0003_00_00, requestId=2, 
> startIndex=0, preRoutedStartIndex=0, maxEventsToGet=500, 
> taskAttemptId=attempt_6830411502416918223_0003_0_00_00_0, eventCount=2 
> }), rpc version=2, client version=1, methodsFingerPrint=996603002 from 
> 10.22.8.180:54849: output error
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16652) LlapInputFormat: Seeing "output error" WARN message

2017-05-12 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16652:
--
Status: Patch Available  (was: Open)

> LlapInputFormat: Seeing "output error" WARN message
> ---
>
> Key: HIVE-16652
> URL: https://issues.apache.org/jira/browse/HIVE-16652
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16652.1.patch
>
>
> Another warning message I'm seeing in the logs for TestJdbcWithMiniLlap after 
> adding the line to close the RecordReader in the test:
> {noformat}
> 2017-05-11T11:08:34,511  WARN [IPC Server handler 0 on 54847] ipc.Server: IPC 
> Server handler 0 on 54847, call Call#341 Retry#0 heartbeat({  
> containerId=container_6830411502416918223_0003_00_00, requestId=2, 
> startIndex=0, preRoutedStartIndex=0, maxEventsToGet=500, 
> taskAttemptId=attempt_6830411502416918223_0003_0_00_00_0, eventCount=2 
> }), rpc version=2, client version=1, methodsFingerPrint=996603002 from 
> 10.22.8.180:54849: output error
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16599) NPE in runtime filtering cost when handling SMB Joins

2017-05-10 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16599:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> NPE in runtime filtering cost when handling SMB Joins
> -
>
> Key: HIVE-16599
> URL: https://issues.apache.org/jira/browse/HIVE-16599
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Fix For: 3.0.0
>
> Attachments: HIVE-16599.1.patch
>
>
> A test with SMB joins failed with NPE in runtime filtering costing logic.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16610) Semijoin Hint : Should be able to handle more than one hint per alias

2017-05-10 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16610:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> Semijoin Hint : Should be able to handle more than one hint per alias
> -
>
> Key: HIVE-16610
> URL: https://issues.apache.org/jira/browse/HIVE-16610
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Fix For: 3.0.0
>
> Attachments: HIVE-16610.1.patch, HIVE-16610.2.patch, 
> HIVE-16610.3.patch, HIVE-16610.4.patch
>
>
> Currently the semi join hints can be used to create only one semi join 
> optimization per alias, which is very limiting.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16637) Improve end-of-data checking for LLAP input format

2017-05-10 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16637:
--
Attachment: HIVE-16637.2.patch

It is not necessarily an EOF error there; fixing the error message in patch v2.

> Improve end-of-data checking for LLAP input format
> --
>
> Key: HIVE-16637
> URL: https://issues.apache.org/jira/browse/HIVE-16637
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16637.1.patch, HIVE-16637.2.patch
>
>
> The existing end of stream checking in the record reader is too forgiving of 
> errors and does not recognize situations where the server connection has 
> closed abruptly, as in HIVE-14093.
> Try to add a way to indicate that we have truly hit the end of the stream.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16651) LlapProtocolClientProxy stack trace when using llap input format

2017-05-11 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16651:
--
Attachment: HIVE-16651.1.patch

Looks like this has to do with cancelling the requestManagerFuture in 
LlapProtocolClientProxy. Patch to ignore CancellationException in the 
FutureCallback. I suppose another fix for this might be to avoid cancelling 
the future in the first place.
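
Roughly like this (a sketch of the idea, not the exact patch):
{code}
import java.util.concurrent.CancellationException;

import com.google.common.util.concurrent.FutureCallback;

public class RequestManagerCallbackSketch implements FutureCallback<Void> {
  @Override
  public void onSuccess(Void result) {
    // normal shutdown path
  }

  @Override
  public void onFailure(Throwable t) {
    if (t instanceof CancellationException) {
      // Expected when serviceStop() cancels the request manager future;
      // not a real error, so don't log a stack trace for it.
      return;
    }
    // ... log/handle genuine failures here ...
  }
}
{code}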

[~sseth], can you review?

> LlapProtocolClientProxy stack trace when using llap input format
> 
>
> Key: HIVE-16651
> URL: https://issues.apache.org/jira/browse/HIVE-16651
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16651.1.patch
>
>
> Seeing this after LlapBaseRecordReader.close():
> {noformat}
> 16/06/28 22:05:32 WARN LlapProtocolClientProxy: RequestManager shutdown with 
> error
> java.util.concurrent.CancellationException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:121)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at com.google.common.util.concurrent.Futures$4.run(Futures.java:1170)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>   at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>   at 
> com.google.common.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
>   at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
>   at java.util.concurrent.FutureTask.cancel(FutureTask.java:180)
>   at 
> org.apache.hadoop.hive.llap.tez.LlapProtocolClientProxy.serviceStop(LlapProtocolClientProxy.java:131)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.hive.llap.ext.LlapTaskUmbilicalExternalClient.serviceStop(LlapTaskUmbilicalExternalClient.java:135)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.close(LlapBaseRecordReader.java:84)
>   at 
> org.apache.hadoop.hive.llap.LlapRowRecordReader.close(LlapRowRecordReader.java:80)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Reopened] (HIVE-16073) Fix partition column check during runtime filtering

2017-05-11 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reopened HIVE-16073:
---

> Fix partition column check during runtime filtering
> ---
>
> Key: HIVE-16073
> URL: https://issues.apache.org/jira/browse/HIVE-16073
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Jason Dere
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16073.1.patch
>
>
> Followup of incorrect partition column check from HIVE-16022.
> Couple things to look at:
> 1. Does this check need to happen at all? Seems like this was a workaround.
> 2. If it is necessary, the logic looked incorrect.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HIVE-16073) Fix partition column check during runtime filtering

2017-05-11 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere resolved HIVE-16073.
---
Resolution: Duplicate

> Fix partition column check during runtime filtering
> ---
>
> Key: HIVE-16073
> URL: https://issues.apache.org/jira/browse/HIVE-16073
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Jason Dere
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16073.1.patch
>
>
> Followup of incorrect partition column check from HIVE-16022.
> Couple things to look at:
> 1. Does this check need to happen at all? Seems like this was a workaround.
> 2. If it is necessary, the logic looked incorrect.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16637) Improve end-of-data checking for LLAP input format

2017-05-11 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16637:
--
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-12991

> Improve end-of-data checking for LLAP input format
> --
>
> Key: HIVE-16637
> URL: https://issues.apache.org/jira/browse/HIVE-16637
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 3.0.0
>
> Attachments: HIVE-16637.1.patch, HIVE-16637.2.patch
>
>
> The existing end of stream checking in the record reader is too forgiving of 
> errors and does not recognize situations where the server connection has 
> closed abruptly, as in HIVE-14093.
> Try to add a way to indicate that we have truly hit the end of the stream.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16651) LlapProtocolClientProxy stack trace when using llap input format

2017-05-11 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-16651:
-


> LlapProtocolClientProxy stack trace when using llap input format
> 
>
> Key: HIVE-16651
> URL: https://issues.apache.org/jira/browse/HIVE-16651
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> Seeing this after LlapBaseRecordReader.close():
> {noformat}
> 16/06/28 22:05:32 WARN LlapProtocolClientProxy: RequestManager shutdown with 
> error
> java.util.concurrent.CancellationException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:121)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at com.google.common.util.concurrent.Futures$4.run(Futures.java:1170)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>   at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>   at 
> com.google.common.util.concurrent.ListenableFutureTask.done(ListenableFutureTask.java:91)
>   at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:384)
>   at java.util.concurrent.FutureTask.cancel(FutureTask.java:180)
>   at 
> org.apache.hadoop.hive.llap.tez.LlapProtocolClientProxy.serviceStop(LlapProtocolClientProxy.java:131)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.hive.llap.ext.LlapTaskUmbilicalExternalClient.serviceStop(LlapTaskUmbilicalExternalClient.java:135)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
>   at 
> org.apache.hadoop.hive.llap.LlapBaseRecordReader.close(LlapBaseRecordReader.java:84)
>   at 
> org.apache.hadoop.hive.llap.LlapRowRecordReader.close(LlapRowRecordReader.java:80)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16652) LlapInputFormat: Seeing "output error" WARN message

2017-05-11 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-16652:
-


> LlapInputFormat: Seeing "output error" WARN message
> ---
>
> Key: HIVE-16652
> URL: https://issues.apache.org/jira/browse/HIVE-16652
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> Another warning message I'm seeing in the logs for TestJdbcWithMiniLlap after 
> adding the line to close the RecordReader in the test:
> {noformat}
> 2017-05-11T11:08:34,511  WARN [IPC Server handler 0 on 54847] ipc.Server: IPC 
> Server handler 0 on 54847, call Call#341 Retry#0 heartbeat({  
> containerId=container_6830411502416918223_0003_00_00, requestId=2, 
> startIndex=0, preRoutedStartIndex=0, maxEventsToGet=500, 
> taskAttemptId=attempt_6830411502416918223_0003_0_00_00_0, eventCount=2 
> }), rpc version=2, client version=1, methodsFingerPrint=996603002 from 
> 10.22.8.180:54849: output error
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16637) Improve end-of-data checking for LLAP input format

2017-05-11 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16637:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> Improve end-of-data checking for LLAP input format
> --
>
> Key: HIVE-16637
> URL: https://issues.apache.org/jira/browse/HIVE-16637
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 3.0.0
>
> Attachments: HIVE-16637.1.patch, HIVE-16637.2.patch
>
>
> The existing end of stream checking in the record reader is too forgiving of 
> errors and does not recognize situations where the server connection has 
> closed abruptly, as in HIVE-14093.
> Try to add a way to indicate that we have truly hit the end of the stream.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16652) LlapInputFormat: Seeing "output error" WARN message

2017-05-11 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006967#comment-16006967
 ] 

Jason Dere commented on HIVE-16652:
---

I suspect we may be closing the LlapTaskUmbilicalExternalClient and the 
umbilical connection to the LLAP daemon before the client has had the chance 
to send the heartbeat response back to the LLAP daemon.


> LlapInputFormat: Seeing "output error" WARN message
> ---
>
> Key: HIVE-16652
> URL: https://issues.apache.org/jira/browse/HIVE-16652
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> Another warning message I'm seeing in the logs for TestJdbcWithMiniLlap after 
> adding the line to close the RecordReader in the test:
> {noformat}
> 2017-05-11T11:08:34,511  WARN [IPC Server handler 0 on 54847] ipc.Server: IPC 
> Server handler 0 on 54847, call Call#341 Retry#0 heartbeat({  
> containerId=container_6830411502416918223_0003_00_00, requestId=2, 
> startIndex=0, preRoutedStartIndex=0, maxEventsToGet=500, 
> taskAttemptId=attempt_6830411502416918223_0003_0_00_00_0, eventCount=2 
> }), rpc version=2, client version=1, methodsFingerPrint=996603002 from 
> 10.22.8.180:54849: output error
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16637) Improve end-of-data checking for LLAP input format

2017-05-10 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-16637:
-


> Improve end-of-data checking for LLAP input format
> --
>
> Key: HIVE-16637
> URL: https://issues.apache.org/jira/browse/HIVE-16637
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> The existing end of stream checking in the record reader is too forgiving of 
> errors and does not recognize situations where the server connection has 
> closed abruptly, as in HIVE-14093.
> Try to add a way to indicate that we have truly hit the end of the stream.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16637) Improve end-of-data checking for LLAP input format

2017-05-10 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16637:
--
Status: Patch Available  (was: Open)

> Improve end-of-data checking for LLAP input format
> --
>
> Key: HIVE-16637
> URL: https://issues.apache.org/jira/browse/HIVE-16637
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16637.1.patch
>
>
> The existing end of stream checking in the record reader is too forgiving of 
> errors and does not recognize situations where the server connection has 
> closed abruptly, as in HIVE-14093.
> Try to add a way to indicate that we have truly hit the end of the stream.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16637) Improve end-of-data checking for LLAP input format

2017-05-10 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16637:
--
Attachment: HIVE-16637.1.patch

Fix the way we check end of data in LlapBaseRecordReader.next().
Also change communication from the LlapOutputFormatService to use a chunked 
format (a series of length-prefixed chunks), with a final 0-length chunk to 
indicate the end of the data stream.
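
A minimal sketch of the framing (illustrative; the actual 
ChunkedInputStream/OutputStream classes may differ):
{code}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ChunkFramingSketch {
  // Writer side: each chunk is <int length><length bytes>; a 0-length
  // chunk marks the clean end of the data stream.
  static void writeChunk(DataOutputStream out, byte[] data) throws IOException {
    out.writeInt(data.length);
    out.write(data);
  }

  static void writeEndOfStream(DataOutputStream out) throws IOException {
    out.writeInt(0);
  }

  // Reader side: returns null at the terminating 0-length chunk; an abrupt
  // connection close now surfaces as an EOFException instead of looking
  // like a normal end of data.
  static byte[] readChunk(DataInputStream in) throws IOException {
    int len = in.readInt();
    if (len == 0) {
      return null;
    }
    byte[] buf = new byte[len];
    in.readFully(buf);
    return buf;
  }
}
{code}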

> Improve end-of-data checking for LLAP input format
> --
>
> Key: HIVE-16637
> URL: https://issues.apache.org/jira/browse/HIVE-16637
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16637.1.patch
>
>
> The existing end of stream checking in the record reader is too forgiving of 
> errors and does not recognize situations where the server connection has 
> closed abruptly, as in HIVE-14093.
> Try to add a way to indicate that we have truly hit the end of the stream.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-13673) LLAP: handle case where no service instance is found on the host specified in the input split

2017-05-17 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-13673:
--
Status: Patch Available  (was: Open)

> LLAP: handle case where no service instance is found on the host specified in 
> the input split
> -
>
> Key: HIVE-13673
> URL: https://issues.apache.org/jira/browse/HIVE-13673
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-13673.1.patch
>
>
> From [~sseth] on review of HIVE-13620, in regards to 
> LlapBaseInputFormat.getServiceInstance() and how to handle the case of no 
> LLAP service instance for the host specified in the LLAP input split:
> {quote}
> This should really be a jira and TODO (post merge to master) - to either 1) 
> go to an alternate random address from the available llap instances, or 2) 
> have additional locations provided by HS2.
> I'd lean towards 1. It's absolutely possible for an llap instance to go down, 
> or the node to go down, which would end up causing failures.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-13673) LLAP: handle case where no service instance is found on the host specified in the input split

2017-05-17 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-13673:
-

Assignee: Jason Dere

> LLAP: handle case where no service instance is found on the host specified in 
> the input split
> -
>
> Key: HIVE-13673
> URL: https://issues.apache.org/jira/browse/HIVE-13673
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-13673.1.patch
>
>
> From [~sseth] on review of HIVE-13620, in regards to 
> LlapBaseInputFormat.getServiceInstance() and how to handle the case of no 
> LLAP service instance for the host specified in the LLAP input split:
> {quote}
> This should really be a jira and TODO (post merge to master) - to either 1) 
> go to an alternate random address from the available llap instances, or 2) 
> have additional locations provided by HS2.
> I'd lean towards 1. It's absolutely possible for an llap instance to go down, 
> or the node to go down, which would end up causing failures.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-13673) LLAP: handle case where no service instance is found on the host specified in the input split

2017-05-17 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-13673:
--
Attachment: HIVE-13673.1.patch

[~sseth], can you take a look?
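
A minimal sketch of option 1 from the description - prefer an instance on the 
split's host, otherwise fall back to a random live instance (hypothetical 
names, not the actual LlapBaseInputFormat code):
{code}
import java.util.List;
import java.util.Map;
import java.util.Random;

public class InstanceFallbackSketch {
  // Fall back to a random live instance rather than failing outright when
  // no instance is found on the host named in the input split.
  static <I> I pickInstance(String splitHost, Map<String, I> instancesByHost,
                            List<I> allLiveInstances, Random rnd) {
    I onHost = instancesByHost.get(splitHost);
    if (onHost != null) {
      return onHost;
    }
    if (allLiveInstances.isEmpty()) {
      throw new IllegalStateException("No LLAP service instances available");
    }
    return allLiveInstances.get(rnd.nextInt(allLiveInstances.size()));
  }
}
{code}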

> LLAP: handle case where no service instance is found on the host specified in 
> the input split
> -
>
> Key: HIVE-13673
> URL: https://issues.apache.org/jira/browse/HIVE-13673
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
> Attachments: HIVE-13673.1.patch
>
>
> From [~sseth] on review of HIVE-13620, in regards to 
> LlapBaseInputFormat.getServiceInstance() and how to handle the case of no 
> LLAP service instance for the host specified in the LLAP input split:
> {quote}
> This should really be a jira and TODO (post merge to master) - to either 1) 
> go to an alternate random address from the available llap instances, or 2) 
> have additional locations provided by HS2.
> I'd lean towards 1. It's absolutely possible for an llap instance to go down, 
> or the node to go down, which would end up causing failures.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16702) Use LazyBinarySerDe for LLAP InputFormat

2017-05-17 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16702:
--
Description: Currently using LazySimpleSerDe

> Use LazyBinarySerDe for LLAP InputFormat
> 
>
> Key: HIVE-16702
> URL: https://issues.apache.org/jira/browse/HIVE-16702
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> Currently using LazySimpleSerDe



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16702) Use LazyBinarySerDe for LLAP InputFormat

2017-05-17 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-16702:
-


> Use LazyBinarySerDe for LLAP InputFormat
> 
>
> Key: HIVE-16702
> URL: https://issues.apache.org/jira/browse/HIVE-16702
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16702) Use LazyBinarySerDe for LLAP InputFormat

2017-05-17 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16702:
--
Attachment: HIVE-16702.1.patch

[~hagleitn] [~prasanth_j] can you take a look?

> Use LazyBinarySerDe for LLAP InputFormat
> 
>
> Key: HIVE-16702
> URL: https://issues.apache.org/jira/browse/HIVE-16702
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16702.1.patch
>
>
> Currently using LazySimpleSerDe



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16702) Use LazyBinarySerDe for LLAP InputFormat

2017-05-17 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16702:
--
Status: Patch Available  (was: Open)

> Use LazyBinarySerDe for LLAP InputFormat
> 
>
> Key: HIVE-16702
> URL: https://issues.apache.org/jira/browse/HIVE-16702
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16702.1.patch
>
>
> Currently using LazySimpleSerDe



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14052) Cleanup of structures required when LLAP access from external clients completes

2017-05-17 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015083#comment-16015083
 ] 

Jason Dere commented on HIVE-14052:
---

I think this looks good .. [~sershe] any other comments?

> Cleanup of structures required when LLAP access from external clients 
> completes
> ---
>
> Key: HIVE-14052
> URL: https://issues.apache.org/jira/browse/HIVE-14052
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Siddharth Seth
> Attachments: HIVE-14052.02.patch, HIVE-14052.04.patch, 
> HIVE-14052.1.patch
>
>
> Per [~sseth]: There's no cleanup at the moment, and structures used in LLAP 
> to track a query will keep building up slowly over time.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16599) NPE in runtime filtering cost when handling SMB Joins

2017-05-09 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003856#comment-16003856
 ] 

Jason Dere commented on HIVE-16599:
---

+1

> NPE in runtime filtering cost when handling SMB Joins
> -
>
> Key: HIVE-16599
> URL: https://issues.apache.org/jira/browse/HIVE-16599
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16599.1.patch
>
>
> A test with SMB joins failed with NPE in runtime filtering costing logic.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16637) Improve end-of-data checking for LLAP input format

2017-05-10 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005425#comment-16005425
 ] 

Jason Dere commented on HIVE-16637:
---

Whoops, forgot to link the RB: https://reviews.apache.org/r/59152/
In the existing logic, LlapBaseRecordReader would simply try to read from the 
DataInputStream regardless of whether the underlying stream was finished. This 
does not work - methods like WritableUtils.readVInt() assume there is 
sufficient input in the stream and throw EOFException if there is not. In any 
case, we should never have ignored EOFException - that was not correct. We 
should never initiate a read when the stream is empty, hence the added 
hasInput() check. I believe the mark/reset usage is correct - that is the 
point of those methods.
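For reference, a rough sketch of what I mean by the hasInput() check (class 
and method names here are illustrative only, not the actual patch):
{code}
import java.io.BufferedInputStream;
import java.io.IOException;

// Peek one byte via mark/reset so the reader never initiates a read on a
// stream that has already ended.
final class InputProbe {
  static boolean hasInput(BufferedInputStream in) throws IOException {
    in.mark(1);        // remember the current position; 1 byte is enough
    int b =;  // blocks until a byte arrives or the stream ends
    in.reset();        // rewind so the peeked byte is not consumed
    return b != -1;    // -1 means end of the underlying stream
  }
}
{code}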


> Improve end-of-data checking for LLAP input format
> --
>
> Key: HIVE-16637
> URL: https://issues.apache.org/jira/browse/HIVE-16637
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16637.1.patch
>
>
> The existing end of stream checking in the record reader is too forgiving of 
> errors and does not recognize situations where the server connection has 
> closed abruptly like HIVE-14093.
> Try to add a way to indicate that we have truly hit the end of the stream.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16610) Semijoin Hint : Should be able to handle more than one hint per alias

2017-05-10 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005436#comment-16005436
 ] 

Jason Dere commented on HIVE-16610:
---

+1 pending test results

> Semijoin Hint : Should be able to handle more than one hint per alias
> -
>
> Key: HIVE-16610
> URL: https://issues.apache.org/jira/browse/HIVE-16610
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16610.1.patch, HIVE-16610.2.patch, 
> HIVE-16610.3.patch, HIVE-16610.4.patch
>
>
> Currently the semi join hints can be used to create only one semi join 
> optimization per alias, which is very limiting.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16637) Improve end-of-data checking for LLAP input format

2017-05-10 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005518#comment-16005518
 ] 

Jason Dere commented on HIVE-16637:
---

I think for this to work (basically, for the ChunkedInputStream to always read 
the next chunk length after it finishes reading a chunk), the 
ChunkedInputStream would have to read ahead for the very first chunk size 
before any calls to read() have occurred, probably in the constructor. 
However, this would mean that getRecordReader() (which constructs the 
ChunkedInputStream) becomes a blocking call, potentially taking as long as the 
fragment execution if the fragment returns no rows. I don't think that is a 
good thing .. unless there is another approach you are suggesting?

I'm not sure that trying to read beyond the end of input is incorrect - it is 
valid to do 
{code}while (in.read() != -1) { }{code}
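To illustrate the trade-off, a rough sketch (illustrative names only, not the 
actual ChunkedInputStream):
{code}
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Reading the first chunk length eagerly in the constructor would make
// getRecordReader() block; reading it lazily on the first read() avoids that.
final class ChunkedStreamSketch extends InputStream {
  private final DataInputStream in;
  private int remaining = 0;      // bytes left in the current chunk
  private boolean done = false;

  ChunkedStreamSketch(DataInputStream in) { = in;
    // the eager variant would do: remaining = in.readInt();  // blocks here
  }

  @Override
  public int read() throws IOException {
    if (done) {
      return -1;
    }
    if (remaining == 0) {
      remaining = in.readInt();   // lazy: blocks only inside read()
      if (remaining == 0) {       // assume a zero-length chunk ends the data
        done = true;
        return -1;
      }
    }
    remaining--;
    return;
  }
}
{code}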



> Improve end-of-data checking for LLAP input format
> --
>
> Key: HIVE-16637
> URL: https://issues.apache.org/jira/browse/HIVE-16637
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16637.1.patch
>
>
> The existing end of stream checking in the record reader is too forgiving of 
> errors and does not recognize situations where the server connection has 
> closed abruptly like HIVE-14093.
> Try to add a way to indicate that we have truly hit the end of the stream.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16637) Improve end-of-data checking for LLAP input format

2017-05-10 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005518#comment-16005518
 ] 

Jason Dere edited comment on HIVE-16637 at 5/10/17 9:54 PM:


I think for this to work (basically, for the ChunkedInputStream to always read 
the next chunk length after it finishes reading a chunk), the 
ChunkedInputStream would have to read ahead for the very first chunk size 
before any calls to read() have occurred, probably in the constructor. 
However, this would mean that getRecordReader() (which constructs the 
ChunkedInputStream) becomes a blocking call, potentially taking as long as the 
fragment execution if the fragment returns no rows. I don't think that is a 
good thing .. unless there is another approach you are suggesting?

I'm not sure that trying to read beyond the end of input is incorrect (at 
least the first time) - it is valid to do 
{code}while (in.read() != -1) { }{code}




was (Author: jdere):
I think for this to work (basically, for the ChunkedInputStream to always read 
the next chunk length after it finishes reading a chunk), the 
ChunkedInputStream would have to read ahead for the very first chunk size 
before any calls to read() have occurred, probably in the constructor. 
However, this would mean that getRecordReader() (which constructs the 
ChunkedInputStream) becomes a blocking call, potentially taking as long as the 
fragment execution if the fragment returns no rows. I don't think that is a 
good thing .. unless there is another approach you are suggesting?

I'm not sure that trying to read beyond the end of input is incorrect - it is 
valid to do 
{code}while (in.read() != -1) { }{code}



> Improve end-of-data checking for LLAP input format
> --
>
> Key: HIVE-16637
> URL: https://issues.apache.org/jira/browse/HIVE-16637
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16637.1.patch
>
>
> The existing end of stream checking in the record reader is too forgiving of 
> errors and does not recognize situations where the server connection has 
> closed abruptly like HIVE-14093.
> Try to add a way to indicate that we have truly hit the end of the stream.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-06-20 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16926:
--
Attachment: HIVE-16926.1.patch

Initial patch; restructured the LlapTaskUmbilicalExternalClient code a bit.
- Uses a shared LLAP umbilical server rather than a new server per external 
client (rough sketch below)
- Retries rejected submissions (WorkSubmitter helper class)
- No more deferred cleanup (from HIVE-16652). One consequence: once a client 
is closed/unregistered, communicator.stop() is called and the client is 
removed from the registered list, so we may see a few warning messages about 
untracked taskAttemptIds arriving during heartbeat(). If that proves 
undesirable, we could instead leave closed clients in the registeredClients 
list (ignoring heartbeats to them, since they are tagged as closed) and remove 
them via the HeartbeatCheckTask once they get too old.
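Roughly, the shared-server structure looks like this (illustrative sketch 
only; names and types are not the actual classes in the patch):
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// One lazily created umbilical server shared by all external clients,
// instead of a new server per fragment request.
final class SharedUmbilicalServer {
  private static volatile SharedUmbilicalServer instance;
  private final Map<String, Runnable> registeredClients = new ConcurrentHashMap<>();

  static SharedUmbilicalServer getInstance() {
    if (instance == null) {
      synchronized (SharedUmbilicalServer.class) {
        if (instance == null) {
          instance = new SharedUmbilicalServer();  // started once, on first use
        }
      }
    }
    return instance;
  }

  void registerClient(String taskAttemptId, Runnable heartbeatHandler) {
    registeredClients.put(taskAttemptId, heartbeatHandler);
  }

  void unregisterClient(String taskAttemptId) {
    // removed immediately; a late heartbeat for this id is simply ignored
    registeredClients.remove(taskAttemptId);
  }

  void heartbeat(String taskAttemptId) {
    Runnable handler = registeredClients.get(taskAttemptId);
    if (handler == null) {
      System.out.println("WARN: untracked taskAttemptId " + taskAttemptId);
      return;
    };
  }
}
{code}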

> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-06-20 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-16926:
-


> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16937) INFORMATION_SCHEMA usability: everything is currently a string

2017-06-23 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061405#comment-16061405
 ] 

Jason Dere commented on HIVE-16937:
---

+1 pending test results

> INFORMATION_SCHEMA usability: everything is currently a string
> --
>
> Key: HIVE-16937
> URL: https://issues.apache.org/jira/browse/HIVE-16937
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Carter Shanklin
>Assignee: Gunther Hagleitner
> Attachments: HIVE-16937.1.patch
>
>
> HIVE-1010 adds an information schema to Hive, also taking the opportunity to 
> expose some non-standard but valuable things like statistics in a SYS table.
> A challenge I have noted with the SYS table is that all statistic counts are 
> exposed as string types rather than numerics.
> {code}
> hive> show create table sys.tab_col_stats;
> OK
> CREATE TABLE `sys.tab_col_stats`(
>   `cs_id` string COMMENT 'from deserializer',
>   `db_name` string COMMENT 'from deserializer',
>   `table_name` string COMMENT 'from deserializer',
>   `column_name` string COMMENT 'from deserializer',
>   `column_type` string COMMENT 'from deserializer',
>   `tbl_id` string COMMENT 'from deserializer',
>   `long_low_value` string COMMENT 'from deserializer',
>   `long_high_value` string COMMENT 'from deserializer',
>   `double_high_value` string COMMENT 'from deserializer',
>   `double_low_value` string COMMENT 'from deserializer',
>   `big_decimal_low_value` string COMMENT 'from deserializer',
>   `big_decimal_high_value` string COMMENT 'from deserializer',
>   `num_nulls` string COMMENT 'from deserializer',
>   `num_distincts` string COMMENT 'from deserializer',
>   `avg_col_len` string COMMENT 'from deserializer',
>   `max_col_len` string COMMENT 'from deserializer',
>   `num_trues` string COMMENT 'from deserializer',
>   `num_falses` string COMMENT 'from deserializer',
>   `last_analyzed` string COMMENT 'from deserializer')
> ROW FORMAT SERDE
>   'org.apache.hive.storage.jdbc.JdbcSerDe'
> STORED BY
>   'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> {code}
> So you might run this query to try and find the column(s) which have the most 
> distinct values.
> {code}
> select
>   db_name, table_name, column_name
> from
>   sys.tab_col_stats
> where
>   num_distincts = ( select max(num_distincts) from sys.tab_col_stats );
> {code}
> Unfortunately this maximum is computed by string comparison, so it is 
> probably not what you want.
> It would be better to use numeric types where appropriate, such as for all 
> the numbers in tab_col_stats; most likely bigint should be used for stats 
> like row counts.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16947) Semijoin Reduction : Task cycle created due to multiple semijoins in conjunction with hashjoin

2017-06-23 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061281#comment-16061281
 ] 

Jason Dere commented on HIVE-16947:
---

Can you add a test for this?

> Semijoin Reduction : Task cycle created due to multiple semijoins in 
> conjunction with hashjoin
> --
>
> Key: HIVE-16947
> URL: https://issues.apache.org/jira/browse/HIVE-16947
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16947.1.patch
>
>
> Typically a semijoin branch and a mapjoin may create a cycle when on the same 
> operator tree. This is already handled; however, a semijoin branch can serve 
> more than one filter, and the cycle detection logic currently only handles 
> the first one, causing cycles that prevent the queries from running.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-06-23 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16926:
--
Status: Patch Available  (was: Open)

> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-06-23 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16926:
--
Attachment: HIVE-16926.2.patch

- Umbilical token should be the same for all fragments of the same request
- Minor restructuring of retried requests
- Minor renaming


> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-06-27 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16926:
--
Attachment: HIVE-16926.3.patch

Those test results were unexpected. Re-attaching the same patch to trigger 
another test run.

> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, 
> HIVE-16926.3.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-13673) LLAP: handle case where no service instance is found on the host specified in the input split

2017-05-18 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-13673:
--
Attachment: HIVE-13673.2.patch

Updating patch to instantiate the Random instance up front.
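For reference, the pattern is simply this (illustrative sketch, not the actual 
class in the patch):
{code}
import java.util.List;
import java.util.Random;

// The Random is created once as a field rather than a new instance per
// lookup, so fallback host selection does not re-seed on every call.
final class ServiceInstancePicker {
  private final Random rand = new Random();   // instantiated up front

  <T> T pickRandom(List<T> instances) {
    return instances.get(rand.nextInt(instances.size()));
  }
}
{code}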

> LLAP: handle case where no service instance is found on the host specified in 
> the input split
> -
>
> Key: HIVE-13673
> URL: https://issues.apache.org/jira/browse/HIVE-13673
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-13673.1.patch, HIVE-13673.2.patch
>
>
> From [~sseth] on review of HIVE-13620, in regards to 
> LlapBaseInputFormat.getServiceInstance() and how to handle the case of no 
> LLAP service instance for the host specified in the LLAP input split:
> {quote}
> This should really be a jira and TODO (post merge to master) - to either 1) 
> go to an alternate random address from the available llap instances, or 2) 
> have additional locations provided by HS2.
> I'd lean towards 1. It's absolutely possible for an llap instance to go down, 
> or the node to go down, which would end up causing failures.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16691) Add test for more datatypes for LlapInputFormat

2017-05-19 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16691:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> Add test for more datatypes for LlapInputFormat
> ---
>
> Key: HIVE-16691
> URL: https://issues.apache.org/jira/browse/HIVE-16691
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 3.0.0
>
> Attachments: HIVE-16691.1.patch
>
>
> Update the test to include more data types.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-13673) LLAP: handle case where no service instance is found on the host specified in the input split

2017-05-19 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-13673:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> LLAP: handle case where no service instance is found on the host specified in 
> the input split
> -
>
> Key: HIVE-13673
> URL: https://issues.apache.org/jira/browse/HIVE-13673
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 3.0.0
>
> Attachments: HIVE-13673.1.patch, HIVE-13673.2.patch
>
>
> From [~sseth] on review of HIVE-13620, in regards to 
> LlapBaseInputFormat.getServiceInstance() and how to handle the case of no 
> LLAP service instance for the host specified in the LLAP input split:
> {quote}
> This should really be a jira and TODO (post merge to master) - to either 1) 
> go to an alternate random address from the available llap instances, or 2) 
> have additional locations provided by HS2.
> I'd lean towards 1. It's absolutely possible for an llap instance to go down, 
> or the node to go down, which would end up causing failures.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16702) Use LazyBinarySerDe for LLAP InputFormat

2017-05-19 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16702:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> Use LazyBinarySerDe for LLAP InputFormat
> 
>
> Key: HIVE-16702
> URL: https://issues.apache.org/jira/browse/HIVE-16702
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 3.0.0
>
> Attachments: HIVE-16702.1.patch
>
>
> Currently using LazySimpleSerDe



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16691) Add test for more datatypes for LlapInputFormat

2017-05-16 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-16691:
-


> Add test for more datatypes for LlapInputFormat
> ---
>
> Key: HIVE-16691
> URL: https://issues.apache.org/jira/browse/HIVE-16691
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> Update the test to include more data types.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

