[jira] [Commented] (HIVE-21560) Update Derby DDL to use CLOB instead of LONG VARCHAR

2019-04-03 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809183#comment-16809183
 ] 

Eric Wohlstadter commented on HIVE-21560:
-

Spoke to [~Rajkumar Singh] about this.
lgtm (pending clean test run)

> Update Derby DDL to use CLOB instead of LONG VARCHAR
> 
>
> Key: HIVE-21560
> URL: https://issues.apache.org/jira/browse/HIVE-21560
> Project: Hive
>  Issue Type: Bug
>Reporter: Shawn Weeks
>Assignee: Rajkumar Singh
>Priority: Minor
> Attachments: HIVE-21560.01.patch, HIVE-21560.02.patch, 
> HIVE-21560.patch
>
>
> In the Hive 1.x and 2.x metastore schemas for Derby there are two columns in 
> "TBLS" that are defined as LONG VARCHAR. This causes larger CREATE VIEW 
> statements to fail when using the embedded metastore for testing.
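For reference, a hedged, standalone Java/JDBC illustration of why the column type matters: Derby's LONG VARCHAR caps out at 32,700 characters while a CLOB can hold up to 2 GB, so only the latter fits a large view definition. It assumes the embedded Derby driver is on the classpath; the table and column names below are invented for the demo and are not the metastore's.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class DerbyLongVarcharVsClob {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:derby:memory:demo;create=true");
         Statement stmt = conn.createStatement()) {
      stmt.executeUpdate("CREATE TABLE T_LONGVARCHAR (VIEW_TEXT LONG VARCHAR)");
      stmt.executeUpdate("CREATE TABLE T_CLOB (VIEW_TEXT CLOB)");

      // A "view definition" larger than LONG VARCHAR's 32,700-character limit.
      String bigViewText = new String(new char[40000]).replace('\0', 'x');

      try (PreparedStatement ps = conn.prepareStatement("INSERT INTO T_LONGVARCHAR VALUES (?)")) {
        ps.setString(1, bigViewText);
        ps.executeUpdate();                 // fails: value exceeds the LONG VARCHAR limit
      } catch (SQLException e) {
        System.out.println("LONG VARCHAR rejected it: " + e.getSQLState());
      }

      try (PreparedStatement ps = conn.prepareStatement("INSERT INTO T_CLOB VALUES (?)")) {
        ps.setString(1, bigViewText);
        ps.executeUpdate();                 // succeeds: CLOB holds up to 2 GB
        System.out.println("CLOB accepted " + bigViewText.length() + " characters");
      }
    }
  }
}
{code}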



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20419) Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key

2019-01-19 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter reassigned HIVE-20419:
---

Assignee: Teddy Choi

> Vectorization: Prevent mutation of VectorPartitionDesc after being used in a 
> hashmap key
> 
>
> Key: HIVE-20419
> URL: https://issues.apache.org/jira/browse/HIVE-20419
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Teddy Choi
>Priority: Major
>
> This ends up looping because the VectorPartitionDesc is modified after it is 
> used as a HashMap key, so its hashcode and equals results change after it has 
> been placed in the hashmap (a minimal illustration follows the stack trace).
> {code}
> HiveServer2-Background-Pool: Thread-6049 State: RUNNABLE CPU usage on sample: 621ms
> java.util.HashMap$TreeNode.find(int, Object, Class) HashMap.java:1869  <7 recursive calls>
> java.util.HashMap$TreeNode.putTreeVal(HashMap, HashMap$Node[], int, Object, Object) HashMap.java:1989
> java.util.HashMap.putVal(int, Object, Object, boolean, boolean) HashMap.java:637
> java.util.HashMap.put(Object, Object) HashMap.java:611
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.addVectorPartitionDesc(PartitionDesc, VectorPartitionDesc, Map) Vectorizer.java:1272
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.verifyAndSetVectorPartDesc(PartitionDesc, boolean, List, Set, Map, Set, ArrayList, Set) Vectorizer.java:1323
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateInputFormatAndSchemaEvolution(MapWork, String, TableScanOperator, Vectorizer$VectorTaskColumnInfo) Vectorizer.java:1654
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateAndVectorizeMapWork(MapWork, Vectorizer$VectorTaskColumnInfo, boolean) Vectorizer.java:1865
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(MapWork, boolean) Vectorizer.java:1109
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Node, Stack, Object[]) Vectorizer.java:961
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(Node, Stack, TaskGraphWalker$TaskGraphWalkerContext) TaskGraphWalker.java:111
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(Node) TaskGraphWalker.java:180
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(Collection, HashMap) TaskGraphWalker.java:125
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(PhysicalContext) Vectorizer.java:2442
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(List, ParseContext, Context) TezCompiler.java:717
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(ParseContext, List, HashSet, HashSet) TaskCompiler.java:258
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(ASTNode, SemanticAnalyzer$PlannerContextFactory) SemanticAnalyzer.java:12443
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(ASTNode) CalcitePlanner.java:358
> {code}
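A minimal, self-contained illustration of the failure mode described above (not Hive code): mutating an object after it has been used as a HashMap key changes its hashCode/equals, so the existing entry can no longer be found and re-inserting it grows the map instead of replacing it.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MutableKeyDemo {
  static final class Key {
    String path; // mutable on purpose, mirroring VectorPartitionDesc being modified
    Key(String path) { this.path = path; }
    @Override public int hashCode() { return Objects.hash(path); }
    @Override public boolean equals(Object o) {
      return o instanceof Key && Objects.equals(((Key) o).path, this.path);
    }
  }

  public static void main(String[] args) {
    Map<Key, String> map = new HashMap<>();
    Key key = new Key("partition-1");
    map.put(key, "desc");

    key.path = "partition-2";                 // mutation after insertion, as in the bug

    System.out.println(map.get(key));         // null: the lookup now hashes elsewhere
    System.out.println(map.containsKey(key)); // false
    map.put(key, "desc");                     // adds a second entry instead of replacing
    System.out.println(map.size());           // 2
  }
}
{code}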



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21031) Array with one empty string is inserted as an empty array

2018-12-11 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718320#comment-16718320
 ] 

Eric Wohlstadter commented on HIVE-21031:
-

[~pbyrnes]

This may be related to HIVE-20827

/cc [~teddy.choi]

> Array with one empty string is inserted as an empty array
> -
>
> Key: HIVE-21031
> URL: https://issues.apache.org/jira/browse/HIVE-21031
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.2
>Reporter: Patrick Byrnes
>Priority: Major
>
> In beeline the output of
> {code:java}
> select array("");{code}
> is:
> {code:java}
> [""]
> {code}
> However, the output of
> {code:java}
> insert into table a select array("");select * from a;{code}
> is one row of:
> {code:java}
> []{code}
>  
>  
> Similarly, the output of
> {code:java}
> select array(array()){code}
> is:
> {code:java}
> [[]]{code}
> However, the output of
> {code:java}
> insert into table b select array(array());select a,size(a) from b;{code}
> is one row of:
> {code:java}
> []{code}
>  
> Is there a way to insert an array whose only element is an empty string or an 
> array whose only element is an empty array into a table?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20985) If select operator inputs are temporary columns vectorization may reuse some of them as output

2018-11-29 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704095#comment-16704095
 ] 

Eric Wohlstadter commented on HIVE-20985:
-

[~teddy.choi]

Can you please review? Does the current patch use memory efficiently?

> If select operator inputs are temporary columns vectorization may reuse some 
> of them as output
> --
>
> Key: HIVE-20985
> URL: https://issues.apache.org/jira/browse/HIVE-20985
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-20985.01.patch, HIVE-20985.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20979) Fix memory leak in hive streaming

2018-11-29 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703878#comment-16703878
 ] 

Eric Wohlstadter commented on HIVE-20979:
-

[~ShubhamChaurasia]

lgtm

> Fix memory leak in hive streaming
> -
>
> Key: HIVE-20979
> URL: https://issues.apache.org/jira/browse/HIVE-20979
> Project: Hive
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 3.1.1
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-20979.1.patch, HIVE-20979.1.patch, 
> HIVE-20979.2.patch, HIVE-20979.3.patch, HIVE-20979.4.patch
>
>
> 1) HiveStreamingConnection.Builder#init() adds a shutdown hook handler via 
> ShutdownHookManager.addShutdownHook, but it is never removed, which causes 
> all the handlers to accumulate and hence a memory leak.
> 2) AbstractRecordWriter creates an instance of FileSystem but does not close 
> it, which in turn causes a leak due to accumulation in FileSystem$Cache#map.
>  
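A hedged sketch of the two cleanups, using plain JDK shutdown hooks and the Hadoop FileSystem API for illustration; the actual fix goes through Hive's ShutdownHookManager and the streaming connection/writer classes, so treat the class and method wiring below as assumptions.

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class StreamingCleanupSketch implements AutoCloseable {
  private final Thread shutdownHook = new Thread(() -> { /* abort open transactions */ });
  private final FileSystem fs;

  public StreamingCleanupSketch(Configuration conf) throws IOException {
    // 1) Register a hook for abnormal shutdown ...
    Runtime.getRuntime().addShutdownHook(shutdownHook);
    // 2) ... and keep a handle on the FileSystem instance so it can be closed later.
    fs = FileSystem.newInstance(conf);
  }

  @Override
  public void close() throws IOException {
    // 1) Deregister the hook on normal close so handlers don't pile up per connection.
    Runtime.getRuntime().removeShutdownHook(shutdownHook);
    // 2) Close the FileSystem so it does not accumulate in FileSystem$Cache.
    fs.close();
  }

  public static void main(String[] args) throws Exception {
    try (StreamingCleanupSketch writer = new StreamingCleanupSketch(new Configuration())) {
      // ... write records ...
    }
  }
}
{code}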



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20981) streaming/AbstractRecordWriter leaks HeapMemoryMonitor

2018-11-28 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20981:

Attachment: HIVE-20981.1.patch

> streaming/AbstractRecordWriter leaks HeapMemoryMonitor
> --
>
> Key: HIVE-20981
> URL: https://issues.apache.org/jira/browse/HIVE-20981
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20981.1.patch
>
>
> Each record writer registers a memory monitor with the MemoryMXBean, but these 
> monitors are never removed, so the listener objects/lambdas accumulate in the 
> bean over time.
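A sketch of the register/deregister pairing being described, using only the standard JMX API; the writer/monitor wiring is simplified and is not the actual AbstractRecordWriter code.

{code:java}
import java.lang.management.ManagementFactory;
import javax.management.ListenerNotFoundException;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

public class HeapMonitorLifecycle implements AutoCloseable {
  private final NotificationEmitter emitter =
      (NotificationEmitter) ManagementFactory.getMemoryMXBean();
  private final NotificationListener listener =
      (notification, handback) -> { /* react to memory threshold notifications */ };

  public HeapMonitorLifecycle() {
    // What the writer does today: register a listener with the platform MemoryMXBean.
    emitter.addNotificationListener(listener, null, null);
  }

  @Override
  public void close() {
    // The missing step: deregister on writer close so listeners don't accumulate.
    try {
      emitter.removeNotificationListener(listener);
    } catch (ListenerNotFoundException ignored) {
      // already removed
    }
  }

  public static void main(String[] args) {
    try (HeapMonitorLifecycle monitor = new HeapMonitorLifecycle()) {
      // ... write records ...
    }
  }
}
{code}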



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20981) streaming/AbstractRecordWriter leaks HeapMemoryMonitor

2018-11-28 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20981:

Status: Patch Available  (was: Open)

> streaming/AbstractRecordWriter leaks HeapMemoryMonitor
> --
>
> Key: HIVE-20981
> URL: https://issues.apache.org/jira/browse/HIVE-20981
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20981.1.patch
>
>
> Each record writer registers a memory monitor with the MemoryMXBean, but these 
> monitors are never removed, so the listener objects/lambdas accumulate in the 
> bean over time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20981) streaming/AbstractRecordWriter leaks HeapMemoryMonitor

2018-11-28 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter reassigned HIVE-20981:
---


> streaming/AbstractRecordWriter leaks HeapMemoryMonitor
> --
>
> Key: HIVE-20981
> URL: https://issues.apache.org/jira/browse/HIVE-20981
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
>
> Each record writer registers a memory monitor with the MemoryMXBean, but these 
> monitors are never removed, so the listener objects/lambdas accumulate in the 
> bean over time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (HIVE-20552) Get Schema from LogicalPlan faster

2018-10-29 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter reopened HIVE-20552:
-

Reopened. The patch is failing for me with:
{code:java}
Caused by: java.lang.NullPointerException: null
  at org.apache.hadoop.hive.ql.parse.QBSubQuery.(QBSubQuery.java:489)
  at org.apache.hadoop.hive.ql.parse.SubQueryUtils.buildSubQuery(SubQueryUtils.java:249)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.subqueryRestrictionCheck(CalcitePlanner.java:3141)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSubQueryRelNode(CalcitePlanner.java:3322)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterRelNode(CalcitePlanner.java:3379)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genFilterLogicalPlan(CalcitePlanner.java:3434)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4970)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1722)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1670)
  at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
  at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
  at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
  at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1431)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genLogicalPlan(CalcitePlanner.java:393)
  at org.apache.hadoop.hive.ql.parse.ParseUtils.parseQueryAndGetSchema(ParseUtils.java:554)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:254)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:206)
  at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)
  at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:519)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:511)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
  ... 16 more{code}

> Get Schema from LogicalPlan faster
> --
>
> Key: HIVE-20552
> URL: https://issues.apache.org/jira/browse/HIVE-20552
> Project: Hive
>  Issue Type: Improvement
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-20552.1.patch, HIVE-20552.2.patch, 
> HIVE-20552.3.patch
>
>
> To get the schema of a query faster, it currently needs to compile, optimize, 
> and generate a TezPlan, which creates extra overhead when only the 
> LogicalPlan is needed.
> 1. Copy the method {{HiveMaterializedViewsRegistry.parseQuery}}, making it 
> {{public static}} and putting it in a utility class. 
> 2. Change the return statement of the method to {{return 
> analyzer.getResultSchema();}}
> 3. Change the return type of the method to {{List}}
> 4. Call the new method from {{GenericUDTFGetSplits.createPlanFragment}} 
> replacing the current code which does this:
> {code}
>  if(num == 0) {
>  //Schema only
>  return new PlanFragment(null, schema, null);
>  }
> {code}
> moving the call earlier in {{getPlanFragment}} ... right after the HiveConf 
> is created ... bypassing the code that uses {{HiveTxnManager}} and 
> {{Driver}}.
> 5. Convert the {{List}} to {{org.apache.hadoop.hive.llap.Schema}}.
> 6. return from {{getPlanFragment}} by returning {{new PlanFragment(null, 
> schema, null)}}
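A structural sketch of the flow in the steps above. All Hive types are replaced by tiny local placeholders (FieldSchema, Schema, PlanFragment here are stand-ins, not the real classes), so it shows only the shape of the change without claiming exact signatures.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class SchemaOnlyPlanSketch {
  record FieldSchema(String name, String type) {}        // stand-in for the metastore FieldSchema
  record Schema(List<String> columns) {}                 // stand-in for org.apache.hadoop.hive.llap.Schema
  record PlanFragment(Object plan, Schema schema, Object splits) {}

  // Steps 1-3: the parseQuery-style utility made public static, compiling only as far
  // as the logical plan and returning the analyzer's result schema (stubbed here).
  static List<FieldSchema> parseQueryAndGetSchema(String query) {
    List<FieldSchema> result = new ArrayList<>();
    result.add(new FieldSchema("_c0", "string"));        // placeholder result
    return result;
  }

  // Steps 4-6: when only the schema is wanted, return right after configuration setup,
  // skipping HiveTxnManager/Driver and Tez plan generation entirely.
  static PlanFragment getPlanFragment(String query, boolean schemaOnly) {
    if (schemaOnly) {
      List<String> cols = new ArrayList<>();
      for (FieldSchema fs : parseQueryAndGetSchema(query)) {
        cols.add(fs.name() + " " + fs.type());           // step 5: convert to the llap Schema
      }
      return new PlanFragment(null, new Schema(cols), null); // step 6
    }
    throw new UnsupportedOperationException("full plan generation elided in this sketch");
  }

  public static void main(String[] args) {
    System.out.println(getPlanFragment("select 1", true).schema());
  }
}
{code}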



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20321) Vectorization: Cut down memory size of 1 col VectorHashKeyWrapper to <1 CacheLine

2018-08-13 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578834#comment-16578834
 ] 

Eric Wohlstadter commented on HIVE-20321:
-

"specific case was query23 and query65"

"query23 has group by ss_customer_sk across store_sales"

"query65 has group by ss_store_sk, ss_item_sk"

> Vectorization: Cut down memory size of 1 col VectorHashKeyWrapper to <1 
> CacheLine
> -
>
> Key: HIVE-20321
> URL: https://issues.apache.org/jira/browse/HIVE-20321
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Priority: Major
>
> With a full sized LLAP instance, the memory size of the VectorHashKeyWrapper 
> is bigger than the low Xmx JVMs.
> {code}
> *** 64-bit VM: ***
> org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapper object internals:
>  OFFSET  SIZE  TYPE                                                       DESCRIPTION                                 VALUE
>       0    16                                                             (object header)                             N/A
>      16     4  int                                                        VectorHashKeyWrapper.hashcode               N/A
>      20     4                                                             (alignment/padding gap)
>      24     8  long[]                                                     VectorHashKeyWrapper.longValues             N/A
>      32     8  double[]                                                   VectorHashKeyWrapper.doubleValues           N/A
>      40     8  byte[][]                                                   VectorHashKeyWrapper.byteValues             N/A
>      48     8  int[]                                                      VectorHashKeyWrapper.byteStarts             N/A
>      56     8  int[]                                                      VectorHashKeyWrapper.byteLengths            N/A
>      64     8  org.apache.hadoop.hive.serde2.io.HiveDecimalWritable[]     VectorHashKeyWrapper.decimalValues          N/A
>      72     8  java.sql.Timestamp[]                                       VectorHashKeyWrapper.timestampValues        N/A
>      80     8  org.apache.hadoop.hive.common.type.HiveIntervalDayTime[]   VectorHashKeyWrapper.intervalDayTimeValues  N/A
>      88     8  boolean[]                                                  VectorHashKeyWrapper.isNull                 N/A
>      96     8  org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapper.HashContext  VectorHashKeyWrapper.hashCtx  N/A
> Instance size: 104 bytes
> Space losses: 4 bytes internal + 0 bytes external = 4 bytes total
> {code}
> Pulling this up to a parent class allows for this to be cut down to 32 bytes 
> for the single column case.
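A hedged sketch of the layout idea (not the actual Hive classes): keep only what a single-column key needs in a slim class and push the per-type arrays into the wrapper used for the general case; with compressed oops the slim variant lands at roughly the 32 bytes mentioned above.

{code:java}
public class KeyWrapperLayoutSketch {
  // Slim base: only what every key needs.
  abstract static class KeyWrapperBase {
    int hashcode;
  }

  // Single long column: one inlined value instead of a long[1] array, ~32 bytes
  // including the object header under compressed oops.
  static class SingleLongKeyWrapper extends KeyWrapperBase {
    long longValue;
    boolean isNull;
  }

  // General case keeps the full set of per-type arrays (roughly the 104-byte layout
  // shown in the object-layout dump above).
  static class GeneralKeyWrapper extends KeyWrapperBase {
    long[] longValues;
    double[] doubleValues;
    byte[][] byteValues;
    int[] byteStarts;
    int[] byteLengths;
    boolean[] isNull;
    // ... decimal/timestamp/interval arrays ...
  }

  public static void main(String[] args) {
    System.out.println(new SingleLongKeyWrapper().hashcode);
  }
}
{code}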



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-08-10 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter reassigned HIVE-20203:
---

Assignee: Eric Wohlstadter  (was: SAUVAGEAU Eric)

> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Fix For: 4.0.0
>
> Attachments: HIVE-20203.1.patch, HIVE-20203.2.patch, 
> HIVE-20203.3.patch, HIVE-20203.4.patch
>
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 
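A sketch of the allocator lifecycle the three points describe, using the real Arrow allocator API (RootAllocator, newChildAllocator, close); the RecordWriter wiring is elided, so this is an illustration rather than the patch itself.

{code:java}
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;

public class TaskAllocatorLifecycle {
  public static void main(String[] args) {
    BufferAllocator root = new RootAllocator(Long.MAX_VALUE);

    // One child allocator per task, carved out of the shared RootAllocator, so each
    // task's off-heap usage is accounted for separately.
    BufferAllocator taskAllocator = root.newChildAllocator("task-42", 0, Long.MAX_VALUE);

    // ... the task's RecordWriter would allocate its vectors from taskAllocator and
    // close them in RecordWriter.close(), even when the task fails ...

    // Closing the child allocator enforces that everything it handed out has been
    // released; the allocator complains (IllegalStateException) if memory is still
    // outstanding when the task completes.
    taskAllocator.close();
    root.close();
  }
}
{code}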



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-08-10 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter reassigned HIVE-20203:
---

Assignee: SAUVAGEAU Eric  (was: M. Arshad Khan)

> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: SAUVAGEAU Eric
>Priority: Blocker
> Fix For: 4.0.0
>
> Attachments: HIVE-20203.1.patch, HIVE-20203.2.patch, 
> HIVE-20203.3.patch, HIVE-20203.4.patch
>
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService

2018-08-08 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573728#comment-16573728
 ] 

Eric Wohlstadter commented on HIVE-20312:
-

[~teddy.choi]

Can you go ahead and merge to master?

Thanks for the review!

> Allow arrow clients to use their own BufferAllocator with 
> LlapOutputFormatService
> -
>
> Key: HIVE-20312
> URL: https://issues.apache.org/jira/browse/HIVE-20312
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20312.1.patch, HIVE-20312.2.patch, 
> HIVE-20312.3.patch
>
>
> Clients should be able to provide their own BufferAllocator to 
> LlapBaseInputFormat if allocator operations depend on client-side logic. For 
> example, clients may want to manage the allocator hierarchy per client-side 
> task, thread, etc.. 
> Currently the client is forced to use one global RootAllocator per process.
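A hedged sketch of the client-side usage this would enable. The Arrow allocator calls are the standard arrow-memory API, but the hand-off to LlapBaseInputFormat is shown as a commented-out, hypothetical hook because the exact API added by the patch is not restated here.

{code:java}
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;

public class ClientAllocatorSketch {
  public static void main(String[] args) {
    try (BufferAllocator clientRoot = new RootAllocator(Long.MAX_VALUE);
         // One child allocator per client-side task, so usage can be tracked and
         // released per task rather than only per process.
         BufferAllocator taskAllocator =
             clientRoot.newChildAllocator("client-task-0", 0, Long.MAX_VALUE)) {
      // Hypothetical hook: hand the task-scoped allocator to the input format
      // instead of relying on a single process-wide RootAllocator.
      // LlapBaseInputFormat.setAllocator(taskAllocator);  // assumed API, not verified
      System.out.println("allocated so far: " + taskAllocator.getAllocatedMemory());
    }
  }
}
{code}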



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService

2018-08-07 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572630#comment-16572630
 ] 

Eric Wohlstadter commented on HIVE-20312:
-

I modified {{LlapArrowRowInputFormat}} so this will be tested by the existing 
{{TestJdbcWithMiniLlapArrow}}

> Allow arrow clients to use their own BufferAllocator with 
> LlapOutputFormatService
> -
>
> Key: HIVE-20312
> URL: https://issues.apache.org/jira/browse/HIVE-20312
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20312.1.patch, HIVE-20312.2.patch, 
> HIVE-20312.3.patch
>
>
> Clients should be able to provide their own BufferAllocator to 
> LlapBaseInputFormat if allocator operations depend on client-side logic. For 
> example, clients may want to manage the allocator hierarchy per client-side 
> task, thread, etc.. 
> Currently the client is forced to use one global RootAllocator per process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService

2018-08-07 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20312:

Status: Patch Available  (was: Open)

> Allow arrow clients to use their own BufferAllocator with 
> LlapOutputFormatService
> -
>
> Key: HIVE-20312
> URL: https://issues.apache.org/jira/browse/HIVE-20312
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20312.1.patch, HIVE-20312.2.patch, 
> HIVE-20312.3.patch
>
>
> Clients should be able to provide their own BufferAllocator to 
> LlapBaseInputFormat if allocator operations depend on client-side logic. For 
> example, clients may want to manage the allocator hierarchy per client-side 
> task, thread, etc.. 
> Currently the client is forced to use one global RootAllocator per process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService

2018-08-07 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20312:

Attachment: HIVE-20312.3.patch

> Allow arrow clients to use their own BufferAllocator with 
> LlapOutputFormatService
> -
>
> Key: HIVE-20312
> URL: https://issues.apache.org/jira/browse/HIVE-20312
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20312.1.patch, HIVE-20312.2.patch, 
> HIVE-20312.3.patch
>
>
> Clients should be able to provide their own BufferAllocator to 
> LlapBaseInputFormat if allocator operations depend on client-side logic. For 
> example, clients may want to manage the allocator hierarchy per client-side 
> task, thread, etc.. 
> Currently the client is forced to use one global RootAllocator per process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService

2018-08-07 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20312:

Status: Open  (was: Patch Available)

> Allow arrow clients to use their own BufferAllocator with 
> LlapOutputFormatService
> -
>
> Key: HIVE-20312
> URL: https://issues.apache.org/jira/browse/HIVE-20312
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20312.1.patch, HIVE-20312.2.patch
>
>
> Clients should be able to provide their own BufferAllocator to 
> LlapBaseInputFormat if allocator operations depend on client-side logic. For 
> example, clients may want to manage the allocator hierarchy per client-side 
> task, thread, etc.. 
> Currently the client is forced to use one global RootAllocator per process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20044) Arrow Serde should pad char values and handle empty strings correctly

2018-08-07 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20044:

Attachment: HIVE-20044.3.patch

> Arrow Serde should pad char values and handle empty strings correctly
> -
>
> Key: HIVE-20044
> URL: https://issues.apache.org/jira/browse/HIVE-20044
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Teddy Choi
>Assignee: Eric Wohlstadter
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20044.1.branch-3.patch, HIVE-20044.1.patch, 
> HIVE-20044.1.patch, HIVE-20044.2.patch, HIVE-20044.3.patch, HIVE-20044.patch
>
>
> When Arrow Serde serializes char values, it loses their padding. Also, when it 
> counts empty strings, it sometimes produces a smaller count than it should. It 
> should pad char values and handle empty strings correctly.
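A hedged illustration of the two behaviors the fix targets (not the actual serde code): blank-padding CHAR(n) values to their declared length before writing them out, and treating an empty string as a real zero-length value rather than dropping it.

{code:java}
import java.nio.charset.StandardCharsets;

public class CharPaddingSketch {
  // Pad a CHAR(n) value with trailing spaces to its declared length, the way a
  // char column presents it, before writing the bytes into an Arrow vector.
  static byte[] padded(String value, int declaredLength) {
    StringBuilder sb = new StringBuilder(value);
    while (sb.length() < declaredLength) {
      sb.append(' ');
    }
    return sb.toString().getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println("[" + new String(padded("ab", 5), StandardCharsets.UTF_8) + "]"); // [ab   ]
    // An empty string is a real value of length 0, distinct from NULL, and must be
    // recorded with its own (empty) entry rather than being skipped.
    System.out.println(padded("", 3).length); // 3
  }
}
{code}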



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20044) Arrow Serde should pad char values and handle empty strings correctly

2018-08-07 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20044:

Status: Patch Available  (was: Open)

> Arrow Serde should pad char values and handle empty strings correctly
> -
>
> Key: HIVE-20044
> URL: https://issues.apache.org/jira/browse/HIVE-20044
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Teddy Choi
>Assignee: Eric Wohlstadter
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20044.1.branch-3.patch, HIVE-20044.1.patch, 
> HIVE-20044.1.patch, HIVE-20044.2.patch, HIVE-20044.3.patch, HIVE-20044.patch
>
>
> When Arrow Serde serializes char values, it loses their padding. Also, when it 
> counts empty strings, it sometimes produces a smaller count than it should. It 
> should pad char values and handle empty strings correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20044) Arrow Serde should pad char values and handle empty strings correctly

2018-08-07 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572363#comment-16572363
 ] 

Eric Wohlstadter commented on HIVE-20044:
-

[~mmccline] [~teddy.choi]

Updated the patch to merge with changes from HIVE-20300

> Arrow Serde should pad char values and handle empty strings correctly
> -
>
> Key: HIVE-20044
> URL: https://issues.apache.org/jira/browse/HIVE-20044
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20044.1.branch-3.patch, HIVE-20044.1.patch, 
> HIVE-20044.1.patch, HIVE-20044.2.patch, HIVE-20044.3.patch, HIVE-20044.patch
>
>
> When Arrow Serde serializes char values, it loses their padding. Also, when it 
> counts empty strings, it sometimes produces a smaller count than it should. It 
> should pad char values and handle empty strings correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20044) Arrow Serde should pad char values and handle empty strings correctly

2018-08-07 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter reassigned HIVE-20044:
---

Assignee: Teddy Choi  (was: Eric Wohlstadter)

> Arrow Serde should pad char values and handle empty strings correctly
> -
>
> Key: HIVE-20044
> URL: https://issues.apache.org/jira/browse/HIVE-20044
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20044.1.branch-3.patch, HIVE-20044.1.patch, 
> HIVE-20044.1.patch, HIVE-20044.2.patch, HIVE-20044.3.patch, HIVE-20044.patch
>
>
> When Arrow Serde serializes char values, it loses their padding. Also, when it 
> counts empty strings, it sometimes produces a smaller count than it should. It 
> should pad char values and handle empty strings correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20044) Arrow Serde should pad char values and handle empty strings correctly

2018-08-07 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20044:

Status: Open  (was: Patch Available)

> Arrow Serde should pad char values and handle empty strings correctly
> -
>
> Key: HIVE-20044
> URL: https://issues.apache.org/jira/browse/HIVE-20044
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20044.1.branch-3.patch, HIVE-20044.1.patch, 
> HIVE-20044.1.patch, HIVE-20044.2.patch, HIVE-20044.patch
>
>
> When Arrow Serde serializes char values, it loses their padding. Also, when it 
> counts empty strings, it sometimes produces a smaller count than it should. It 
> should pad char values and handle empty strings correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20044) Arrow Serde should pad char values and handle empty strings correctly

2018-08-07 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter reassigned HIVE-20044:
---

Assignee: Eric Wohlstadter  (was: Teddy Choi)

> Arrow Serde should pad char values and handle empty strings correctly
> -
>
> Key: HIVE-20044
> URL: https://issues.apache.org/jira/browse/HIVE-20044
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Teddy Choi
>Assignee: Eric Wohlstadter
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20044.1.branch-3.patch, HIVE-20044.1.patch, 
> HIVE-20044.1.patch, HIVE-20044.2.patch, HIVE-20044.patch
>
>
> When Arrow Serde serializes char values, it loses their padding. Also, when it 
> counts empty strings, it sometimes produces a smaller count than it should. It 
> should pad char values and handle empty strings correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService

2018-08-06 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20312:

Status: Patch Available  (was: Open)

> Allow arrow clients to use their own BufferAllocator with 
> LlapOutputFormatService
> -
>
> Key: HIVE-20312
> URL: https://issues.apache.org/jira/browse/HIVE-20312
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20312.1.patch, HIVE-20312.2.patch
>
>
> Clients should be able to provide their own BufferAllocator to 
> LlapBaseInputFormat if allocator operations depend on client-side logic. For 
> example, clients may want to manage the allocator hierarchy per client-side 
> task, thread, etc.. 
> Currently the client is forced to use one global RootAllocator per process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService

2018-08-06 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20312:

Attachment: HIVE-20312.2.patch

> Allow arrow clients to use their own BufferAllocator with 
> LlapOutputFormatService
> -
>
> Key: HIVE-20312
> URL: https://issues.apache.org/jira/browse/HIVE-20312
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20312.1.patch, HIVE-20312.2.patch
>
>
> Clients should be able to provide their own BufferAllocator to 
> LlapBaseInputFormat if allocator operations depend on client-side logic. For 
> example, clients may want to manage the allocator hierarchy per client-side 
> task, thread, etc.. 
> Currently the client is forced to use one global RootAllocator per process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService

2018-08-06 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20312:

Status: Open  (was: Patch Available)

> Allow arrow clients to use their own BufferAllocator with 
> LlapOutputFormatService
> -
>
> Key: HIVE-20312
> URL: https://issues.apache.org/jira/browse/HIVE-20312
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20312.1.patch, HIVE-20312.2.patch
>
>
> Clients should be able to provide their own BufferAllocator to 
> LlapBaseInputFormat if allocator operations depend on client-side logic. For 
> example, clients may want to manage the allocator hierarchy per client-side 
> task, thread, etc.. 
> Currently the client is forced to use one global RootAllocator per process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-06 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20300:

Attachment: HIVE-20300.4.patch

> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20300.1.patch, HIVE-20300.2.patch, 
> HIVE-20300.3.patch, HIVE-20300.4.patch
>
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-06 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20300:

Status: Patch Available  (was: Open)

> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20300.1.patch, HIVE-20300.2.patch, 
> HIVE-20300.3.patch, HIVE-20300.4.patch
>
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-06 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20300:

Status: Open  (was: Patch Available)

> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20300.1.patch, HIVE-20300.2.patch, 
> HIVE-20300.3.patch, HIVE-20300.4.patch
>
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-05 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569631#comment-16569631
 ] 

Eric Wohlstadter commented on HIVE-20300:
-

[~mmccline] [~teddy.choi]

I'll make a followup ticket to use randomized schemas/data in 
TestArrowColumnarBatchSerDe.

I think that is orthogonal to this ticket, as these tests weren't randomized 
previously. 

> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20300.1.patch, HIVE-20300.2.patch, 
> HIVE-20300.3.patch
>
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-05 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569628#comment-16569628
 ] 

Eric Wohlstadter edited comment on HIVE-20300 at 8/6/18 1:14 AM:
-

[~jdere] [~mmccline]

Updated patch based on comments:
 # Lazy initialize VectorFileSinkArrowOperator.recordWriter
 # Make VectorFileSinkArrowOperator fields transient unless they were created 
in the constructor


was (Author: ewohlstadter):
[~jdere] [~mmccline]

Updated patch based on comments:
 # Lazy initialize VectorFileSinkArrowOperator.recordWriter
 # Make VectorFileSinkArrowOperator transient unless they were created in the 
constructor

> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20300.1.patch, HIVE-20300.2.patch, 
> HIVE-20300.3.patch
>
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-05 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569628#comment-16569628
 ] 

Eric Wohlstadter commented on HIVE-20300:
-

[~jdere] [~mmccline]

Updated patch based on comments:
 # Lazy initialize VectorFileSinkArrowOperator.recordWriter
 # Make VectorFileSinkArrowOperator transient unless they were created in the 
constructor
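A simplified sketch of the two changes listed above (not the actual VectorFileSinkArrowOperator): operator plans are serialized out to tasks, so runtime-only state is marked transient and the record writer is created lazily on first use instead of in the constructor.

{code:java}
import java.io.Serializable;

public class OperatorInitSketch implements Serializable {
  private final String destination;             // plan-time state, set in the constructor
  private transient StringBuilder recordWriter; // runtime-only, never serialized

  public OperatorInitSketch(String destination) {
    this.destination = destination;
  }

  public void process(String row) {
    if (recordWriter == null) {
      // Lazy initialization: happens in the executing task, after deserialization.
      recordWriter = new StringBuilder();
    }
    recordWriter.append(destination).append(':').append(row).append('\n');
  }

  public static void main(String[] args) {
    OperatorInitSketch op = new OperatorInitSketch("llap-output");
    op.process("row-1");
    System.out.println("writer initialized lazily: " + (op.recordWriter != null));
  }
}
{code}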

> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20300.1.patch, HIVE-20300.2.patch, 
> HIVE-20300.3.patch
>
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20300:

Status: Patch Available  (was: Open)

> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20300.1.patch, HIVE-20300.2.patch, 
> HIVE-20300.3.patch
>
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20300:

Attachment: HIVE-20300.3.patch

> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20300.1.patch, HIVE-20300.2.patch, 
> HIVE-20300.3.patch
>
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20300:

Status: Open  (was: Patch Available)

> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20300.1.patch, HIVE-20300.2.patch, 
> HIVE-20300.3.patch
>
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService

2018-08-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20312:

Status: Patch Available  (was: In Progress)

> Allow arrow clients to use their own BufferAllocator with 
> LlapOutputFormatService
> -
>
> Key: HIVE-20312
> URL: https://issues.apache.org/jira/browse/HIVE-20312
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20312.1.patch
>
>
> Clients should be able to provide their own BufferAllocator to 
> LlapBaseInputFormat if allocator operations depend on client-side logic. For 
> example, clients may want to manage the allocator hierarchy per client-side 
> task, thread, etc.. 
> Currently the client is forced to use one global RootAllocator per process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService

2018-08-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20312:

Attachment: HIVE-20312.1.patch

> Allow arrow clients to use their own BufferAllocator with 
> LlapOutputFormatService
> -
>
> Key: HIVE-20312
> URL: https://issues.apache.org/jira/browse/HIVE-20312
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20312.1.patch
>
>
> Clients should be able to provide their own BufferAllocator to 
> LlapBaseInputFormat if allocator operations depend on client-side logic. For 
> example, clients may want to manage the allocator hierarchy per client-side 
> task, thread, etc.. 
> Currently the client is forced to use one global RootAllocator per process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20290) Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during GetSplits

2018-08-04 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20290:

Status: Patch Available  (was: Open)

> Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during 
> GetSplits
> ---
>
> Key: HIVE-20290
> URL: https://issues.apache.org/jira/browse/HIVE-20290
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20290.1.patch, HIVE-20290.2.patch
>
>
> When using {{GenericUDTFGetSplits}} to create {{LlapInputSplit}} for 
> submission to {{LlapOutputFormatService}}, the physical plan generation 
> initializes whatever SerDe is being used.
> {{ArrowColumnarBatchSerDe}} initializes buffers for Arrow and 
> {{VectorizedRowBatch}} at this point inside HS2 which are never used.
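A hedged sketch of the lazy-initialization pattern being described (the real ArrowColumnarBatchSerDe differs): defer buffer allocation out of initialize(), which also runs inside HS2 during GetSplits, into the first serialize() call, which only happens in the tasks that actually write data.

{code:java}
public class LazySerDeSketch {
  private long[] reusableBatch;   // stands in for the Arrow/VectorizedRowBatch buffers

  public void initialize() {
    // Only record configuration here; allocate nothing.
  }

  public void serialize(long value) {
    if (reusableBatch == null) {
      // First real use: allocate the (expensive) buffers on demand.
      reusableBatch = new long[1024];
    }
    reusableBatch[0] = value;
  }

  public static void main(String[] args) {
    LazySerDeSketch serde = new LazySerDeSketch();
    serde.initialize();   // cheap: safe to call during GetSplits planning
    serde.serialize(42L); // buffers come into existence only here
  }
}
{code}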



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20290) Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during GetSplits

2018-08-04 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20290:

Attachment: HIVE-20290.2.patch

> Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during 
> GetSplits
> ---
>
> Key: HIVE-20290
> URL: https://issues.apache.org/jira/browse/HIVE-20290
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20290.1.patch, HIVE-20290.2.patch
>
>
> When using {{GenericUDTFGetSplits}} to create {{LlapInputSplit}} for 
> submission to {{LlapOutputFormatService}}, the physical plan generation 
> initializes whatever SerDe is being used.
> {{ArrowColumnarBatchSerDe}} initializes buffers for Arrow and 
> {{VectorizedRowBatch}} at this point inside HS2 which are never used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20290) Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during GetSplits

2018-08-04 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20290:

Status: Open  (was: Patch Available)

> Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during 
> GetSplits
> ---
>
> Key: HIVE-20290
> URL: https://issues.apache.org/jira/browse/HIVE-20290
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20290.1.patch, HIVE-20290.2.patch
>
>
> When using {{GenericUDTFGetSplits}} to create {{LlapInputSplit}} for 
> submission to {{LlapOutputFormatService}}, the physical plan generation 
> initializes whatever SerDe is being used.
> {{ArrowColumnarBatchSerDe}} initializes buffers for Arrow and 
> {{VectorizedRowBatch}} at this point inside HS2 which are never used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService

2018-08-03 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-20312 started by Eric Wohlstadter.
---
> Allow arrow clients to use their own BufferAllocator with 
> LlapOutputFormatService
> -
>
> Key: HIVE-20312
> URL: https://issues.apache.org/jira/browse/HIVE-20312
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
>
> Clients should be able to provide their own BufferAllocator to 
> LlapBaseInputFormat if allocator operations depend on client-side logic. For 
> example, clients may want to manage the allocator hierarchy per client-side 
> task, thread, etc.. 
> Currently the client is forced to use one global RootAllocator per process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService

2018-08-03 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter reassigned HIVE-20312:
---


> Allow arrow clients to use their own BufferAllocator with 
> LlapOutputFormatService
> -
>
> Key: HIVE-20312
> URL: https://issues.apache.org/jira/browse/HIVE-20312
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
>
> Clients should be able to provide their own BufferAllocator to 
> LlapBaseInputFormat if allocator operations depend on client-side logic. For 
> example, clients may want to manage the allocator hierarchy per client-side 
> task, thread, etc.. 
> Currently the client is forced to use one global RootAllocator per process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-02 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20300:

Status: Patch Available  (was: Open)

> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20300.1.patch, HIVE-20300.2.patch
>
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-02 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20300:

Attachment: HIVE-20300.2.patch

> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20300.1.patch, HIVE-20300.2.patch
>
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-02 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20300:

Status: Open  (was: Patch Available)

> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20300.1.patch, HIVE-20300.2.patch
>
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-02 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567716#comment-16567716
 ] 

Eric Wohlstadter commented on HIVE-20300:
-

[https://reviews.apache.org/r/68178/]

 

> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20300.1.patch
>
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-02 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20300:

Status: Patch Available  (was: Open)

> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20300.1.patch
>
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-02 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20300:

Attachment: HIVE-20300.1.patch

> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20300.1.patch
>
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20300) VectorFileSinkArrowOperator

2018-08-02 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter reassigned HIVE-20300:
---


> VectorFileSinkArrowOperator
> ---
>
> Key: HIVE-20300
> URL: https://issues.apache.org/jira/browse/HIVE-20300
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
>
> Bypass the row-mode FileSinkOperator for pushing Arrow format to the 
> LlapOutputFormatService.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20290) Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during GetSplits

2018-08-01 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20290:

Attachment: HIVE-20290.1.patch

> Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during 
> GetSplits
> ---
>
> Key: HIVE-20290
> URL: https://issues.apache.org/jira/browse/HIVE-20290
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20290.1.patch
>
>
> When using {{GenericUDTFGetSplits}} to create {{LlapInputSplit}} for 
> submission to {{LlapOutputFormatService}}, the physical plan generation 
> initializes whatever SerDe is being used.
> {{ArrowColumnarBatchSerDe}} initializes buffers for Arrow and 
> {{VectorizedRowBatch}} at this point inside HS2 which are never used.
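[Editorial note] For illustration, a minimal sketch of the lazy-initialization idea the description refers to, not the actual HIVE-20290 patch; the class and field names below are made up, and a plain direct ByteBuffer stands in for the Arrow and {{VectorizedRowBatch}} buffers. The point is that nothing off-heap is allocated when the object is constructed during GetSplits planning; the first real serialize call pays the cost.

{code:java}
import java.nio.ByteBuffer;

// Sketch of lazy initialization: construction (as during plan generation in HS2)
// allocates nothing; the first serialize() call triggers the off-heap allocation.
public class LazySerDeSketch {
  private ByteBuffer directBuffer;   // stand-in for the Arrow/VectorizedRowBatch buffers

  private ByteBuffer buffer() {
    if (directBuffer == null) {
      directBuffer = ByteBuffer.allocateDirect(1024);   // deferred allocation
    }
    return directBuffer;
  }

  public void serialize(byte[] row) {
    ByteBuffer buf = buffer();
    buf.clear();
    buf.put(row, 0, Math.min(row.length, buf.capacity()));
  }
}
{code}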



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20290) Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during GetSplits

2018-08-01 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20290:

Status: Patch Available  (was: In Progress)

> Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during 
> GetSplits
> ---
>
> Key: HIVE-20290
> URL: https://issues.apache.org/jira/browse/HIVE-20290
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20290.1.patch
>
>
> When using {{GenericUDTFGetSplits}} to create {{LlapInputSplit}} for 
> submission to {{LlapOutputFormatService}}, the physical plan generation 
> initializes whatever SerDe is being used.
> {{ArrowColumnarBatchSerDe}} initializes buffers for Arrow and 
> {{VectorizedRowBatch}} at this point inside HS2 which are never used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20290) Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during GetSplits

2018-08-01 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter reassigned HIVE-20290:
---


> Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during 
> GetSplits
> ---
>
> Key: HIVE-20290
> URL: https://issues.apache.org/jira/browse/HIVE-20290
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
>
> When using {{GenericUDTFGetSplits}} to create {{LlapInputSplit}} for 
> submission to {{LlapOutputFormatService}}, the physical plan generation 
> initializes whatever SerDe is being used.
> {{ArrowColumnarBatchSerDe}} initializes buffers for Arrow and 
> {{VectorizedRowBatch}} at this point inside HS2 which are never used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (HIVE-20290) Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during GetSplits

2018-08-01 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-20290 started by Eric Wohlstadter.
---
> Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during 
> GetSplits
> ---
>
> Key: HIVE-20290
> URL: https://issues.apache.org/jira/browse/HIVE-20290
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
>
> When using {{GenericUDTFGetSplits}} to create {{LlapInputSplit}} for 
> submission to {{LlapOutputFormatService}}, the physical plan generation 
> initializes whatever SerDe is being used.
> {{ArrowColumnarBatchSerDe}} initializes buffers for Arrow and 
> {{VectorizedRowBatch}} at this point inside HS2 which are never used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-07-24 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554809#comment-16554809
 ] 

Eric Wohlstadter commented on HIVE-20203:
-

[~teddy.choi]

I think this should be ok to commit now.

> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20203.1.patch, HIVE-20203.2.patch, 
> HIVE-20203.3.patch, HIVE-20203.4.patch
>
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 
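[Editorial note] A minimal sketch of the per-task accounting described in items 2 and 3 of the quoted description, written against the standard Arrow allocator API; the class and method names here are illustrative and are not taken from the actual patch.

{code:java}
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;

// Each task gets a child allocator carved out of the shared RootAllocator; closing
// the child when the task's RecordWriter closes fails loudly if memory is still held.
public class PerTaskAllocatorSketch {
  private static final BufferAllocator ROOT = new RootAllocator(Long.MAX_VALUE);

  public static BufferAllocator allocatorForTask(String taskId, long limitBytes) {
    return ROOT.newChildAllocator("task-" + taskId, 0, limitBytes);
  }

  // Called from RecordWriter.close(), even when the task fails.
  public static void closeTaskAllocator(BufferAllocator taskAllocator) {
    // BufferAllocator.close() throws IllegalStateException if any buffer it handed
    // out has not been released, which is what enforces item 3 of the description.
    taskAllocator.close();
  }
}
{code}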



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-07-20 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20203:

Status: Open  (was: Patch Available)

> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20203.1.patch, HIVE-20203.2.patch, 
> HIVE-20203.3.patch, HIVE-20203.4.patch
>
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-07-20 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20203:

Attachment: HIVE-20203.4.patch

> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20203.1.patch, HIVE-20203.2.patch, 
> HIVE-20203.3.patch, HIVE-20203.4.patch
>
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-07-20 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20203:

Status: Patch Available  (was: Open)

> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20203.1.patch, HIVE-20203.2.patch, 
> HIVE-20203.3.patch, HIVE-20203.4.patch
>
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20044) Arrow Serde should pad char values and handle empty strings correctly

2018-07-19 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549934#comment-16549934
 ] 

Eric Wohlstadter commented on HIVE-20044:
-

[~teddy.choi]

Looks like you should kick this patch again to get a green build.

 

/cc [~ashutoshc]

I think TestMiniDruidCliDriver is flaky.

> Arrow Serde should pad char values and handle empty strings correctly
> -
>
> Key: HIVE-20044
> URL: https://issues.apache.org/jira/browse/HIVE-20044
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-20044.1.branch-3.patch, HIVE-20044.1.patch, 
> HIVE-20044.patch
>
>
> When Arrow Serde serializes char values, it loses padding. Also when it 
> counts empty strings, sometimes it makes a smaller number. It should pad char 
> values and handle empty strings correctly.
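[Editorial note] A tiny plain-Java illustration of the CHAR padding rule the fix enforces, not the SerDe code itself: a CHAR(n) value is space-padded to its declared length before it is written, and an empty string is still a zero-length, non-null value rather than being dropped from the count.

{code:java}
// Right-pad a CHAR(n) value to its declared length, as Hive's CHAR semantics require.
public final class CharPadSketch {
  static String padChar(String value, int declaredLength) {
    StringBuilder sb = new StringBuilder(declaredLength).append(value == null ? "" : value);
    while (sb.length() < declaredLength) {
      sb.append(' ');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println("[" + padChar("ab", 5) + "]");   // prints [ab   ]
    System.out.println("[" + padChar("", 3) + "]");     // prints [   ]  (empty string is still a value)
  }
}
{code}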



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-07-19 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20203:

Status: Patch Available  (was: Open)

> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20203.1.patch, HIVE-20203.2.patch, 
> HIVE-20203.3.patch
>
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-07-19 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20203:

Attachment: HIVE-20203.3.patch

> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20203.1.patch, HIVE-20203.2.patch, 
> HIVE-20203.3.patch
>
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-07-19 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20203:

Status: Open  (was: Patch Available)

> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20203.1.patch, HIVE-20203.2.patch
>
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-07-19 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20203:

Status: Open  (was: Patch Available)

Made an error in the first patch.

Line in Serializer.java should have been 
{code:java}
rootVector = NullableMapVector.empty(null, allocator);
{code}

instead of
{code}
rootVector = NullableMapVector.empty(null, serDe.rootAllocator);
{code}

so it uses the ChildAllocator.

> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20203.1.patch, HIVE-20203.2.patch
>
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-07-19 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20203:

Attachment: HIVE-20203.2.patch

> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20203.1.patch, HIVE-20203.2.patch
>
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-07-19 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20203:

Status: Patch Available  (was: Open)

> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20203.1.patch, HIVE-20203.2.patch
>
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-07-18 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20203:

Description: 
ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task that 
uses the serde.

The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.

This buffer is never closed and leaks about 1K of physical memory for each task.

This patch does three things:
 # Ensure the buffer is closed when the RecordWriter for the task is closed. 
 # Adds per-task memory accounting by assigning a ChildAllocator to each task 
from the RootAllocator.
 # Enforces that the ChildAllocator for a task has released all memory assigned 
to it, when the task is completed. 

The patch assumes that close() is always called on the RecordWriter when a task 
is finished (even if there is a failure during task execution). 

  was:
ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task that 
uses the serde.

The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.

This buffer is never closed and leaks about 1K of physical memory for each task.

This patch does three things:
 # Ensure the buffer is closed when the RecordWriter for the task is closed. 
 # Adds per-task memory accounting by assigning a ChildAllocator to each task 
from the RootAllocator.
 # Enforces that the ChildAllocator for a task has released all memory assigned 
to it, when the task is completed. 

The patch assumes that close() is always called on the RecordWriter when a task 
is finished (even if their is a failure during task execution). 


> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20203.1.patch
>
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-07-18 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548266#comment-16548266
 ] 

Eric Wohlstadter commented on HIVE-20203:
-

[~teddy.choi]

Can you please review the patch?

The patch is covered by the existing TestJdbcWithMiniLlapArrow,

i.e. if the memory leak still existed, LlapArrowRecordWriter would throw an 
exception during the execution of TestJdbcWithMiniLlapArrow.

/cc [~mmccline]
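[Editorial note] Roughly the close path that test exercises, sketched below with illustrative names; this is not the actual LlapArrowRecordWriter code, and it assumes the Arrow version Hive builds with, where the root vector is a {{NullableMapVector}}. Closing the writer releases the root vector and then the task's child allocator, so any leftover Arrow allocation surfaces as an exception.

{code:java}
import java.io.Closeable;
import java.io.IOException;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.complex.NullableMapVector;

// Close path: free the root vector's direct buffer, then verify and close the
// task's child allocator; a leak turns into an exception the test can observe.
public class ArrowWriterCloseSketch implements Closeable {
  private final NullableMapVector rootVector;
  private final BufferAllocator taskAllocator;

  public ArrowWriterCloseSketch(NullableMapVector rootVector, BufferAllocator taskAllocator) {
    this.rootVector = rootVector;
    this.taskAllocator = taskAllocator;
  }

  @Override
  public void close() throws IOException {
    rootVector.close();                                     // releases the off-heap buffer
    long outstanding = taskAllocator.getAllocatedMemory();
    if (outstanding > 0) {
      throw new IOException("Arrow memory leak: " + outstanding + " bytes still allocated");
    }
    taskAllocator.close();
  }
}
{code}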

> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20203.1.patch
>
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-07-18 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20203:

Attachment: HIVE-20203.1.patch

> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20203.1.patch
>
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-07-18 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20203:

Status: Patch Available  (was: Open)

> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
> Attachments: HIVE-20203.1.patch
>
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20203) Arrow SerDe leaks a DirectByteBuffer

2018-07-18 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter reassigned HIVE-20203:
---


> Arrow SerDe leaks a DirectByteBuffer
> 
>
> Key: HIVE-20203
> URL: https://issues.apache.org/jira/browse/HIVE-20203
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Blocker
>
> ArrowColumnarBatchSerDe allocates an arrow NullableMapVector for each task 
> that uses the serde.
> The vector is a DirectByteBuffer allocated from Arrow's off-heap buffer pool.
> This buffer is never closed and leaks about 1K of physical memory for each 
> task.
> This patch does three things:
>  # Ensure the buffer is closed when the RecordWriter for the task is closed. 
>  # Adds per-task memory accounting by assigning a ChildAllocator to each task 
> from the RootAllocator.
>  # Enforces that the ChildAllocator for a task has released all memory 
> assigned to it, when the task is completed. 
> The patch assumes that close() is always called on the RecordWriter when a 
> task is finished (even if there is a failure during task execution). 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20093) LlapOutputFomatService: Use ArrowBuf with Netty for Accounting

2018-07-09 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20093:

Status: Open  (was: Patch Available)

> LlapOutputFomatService: Use ArrowBuf with Netty for Accounting
> --
>
> Key: HIVE-20093
> URL: https://issues.apache.org/jira/browse/HIVE-20093
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20093.1.patch, HIVE-20093.2.patch, 
> HIVE-20093.3.patch
>
>
> Combining {{Unpooled.wrappedBuffer}} with Arrow buffers can create corrupted 
> buffers from a buffer-reuse race condition.
> This change ensures Arrow memory is accounted for by the same BufferAllocator.
> The RootAllocator will return an ArrowBuf that cooperates with Arrow memory 
> accounting after Netty calls {{release(1)}} on the buffer.
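[Editorial note] A hedged sketch of the accounting idea in the description above, assuming the Arrow version Hive builds with, where {{ArrowBuf}} is itself a Netty ByteBuf; the class and method names are illustrative, not taken from the patch.

{code:java}
import io.netty.buffer.ArrowBuf;
import io.netty.channel.Channel;
import org.apache.arrow.memory.BufferAllocator;

// The bytes sent to the client are copied into a buffer owned by the task's Arrow
// allocator, so Netty's release(1) after the write credits that allocator. Wrapping
// the same memory with Unpooled.wrappedBuffer instead leaves Netty holding a view
// over memory the Arrow allocator may reuse concurrently.
public final class ArrowNettyWriteSketch {
  private ArrowNettyWriteSketch() {}

  public static void writeBatch(Channel channel, BufferAllocator taskAllocator, byte[] serializedBatch) {
    ArrowBuf buf = taskAllocator.buffer(serializedBatch.length);  // accounted against the task
    buf.writeBytes(serializedBatch);
    channel.writeAndFlush(buf);  // Netty releases the ArrowBuf, returning the memory
  }
}
{code}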



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20093) LlapOutputFomatService: Use ArrowBuf with Netty for Accounting

2018-07-09 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20093:

Attachment: HIVE-20093.3.patch

> LlapOutputFomatService: Use ArrowBuf with Netty for Accounting
> --
>
> Key: HIVE-20093
> URL: https://issues.apache.org/jira/browse/HIVE-20093
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20093.1.patch, HIVE-20093.2.patch, 
> HIVE-20093.3.patch
>
>
> Combining {{Unpooled.wrappedBuffer}} with Arrow buffers can create corrupted 
> buffers from a buffer-reuse race condition.
> This change ensures Arrow memory is accounted for by the same BufferAllocator.
> The RootAllocator will return an ArrowBuf that cooperates with Arrow memory 
> accounting after Netty calls {{release(1)}} on the buffer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20093) LlapOutputFomatService: Use ArrowBuf with Netty for Accounting

2018-07-09 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20093:

Status: Patch Available  (was: Open)

> LlapOutputFomatService: Use ArrowBuf with Netty for Accounting
> --
>
> Key: HIVE-20093
> URL: https://issues.apache.org/jira/browse/HIVE-20093
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20093.1.patch, HIVE-20093.2.patch, 
> HIVE-20093.3.patch
>
>
> Combining {{Unpooled.wrappedBuffer}} with Arrow buffers can create corrupted 
> buffers from a buffer-reuse race condition.
> This change ensures Arrow memory is accounted for by the same BufferAllocator.
> The RootAllocator will return an ArrowBuf that cooperates with Arrow memory 
> accounting after Netty calls {{release(1)}} on the buffer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20093) LlapOutputFomatService: Use ArrowBuf with Netty for Accounting

2018-07-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20093:

Attachment: HIVE-20093.2.patch

> LlapOutputFomatService: Use ArrowBuf with Netty for Accounting
> --
>
> Key: HIVE-20093
> URL: https://issues.apache.org/jira/browse/HIVE-20093
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20093.1.patch, HIVE-20093.2.patch
>
>
> Combining {{Unpooled.wrappedBuffer}} with Arrow buffers can create corrupted 
> buffers from a buffer-reuse race condition.
> This change ensures Arrow memory is accounted for by the same BufferAllocator.
> The RootAllocator will return an ArrowBuf that cooperates with Arrow memory 
> accounting after Netty calls {{release(1)}} on the buffer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20093) LlapOutputFomatService: Use ArrowBuf with Netty for Accounting

2018-07-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20093:

Status: Patch Available  (was: Open)

> LlapOutputFomatService: Use ArrowBuf with Netty for Accounting
> --
>
> Key: HIVE-20093
> URL: https://issues.apache.org/jira/browse/HIVE-20093
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20093.1.patch, HIVE-20093.2.patch
>
>
> Combining {{Unpooled.wrappedBuffer}} with Arrow buffers can create corrupted 
> buffers from a buffer-reuse race condition.
> This change ensures Arrow memory is accounted for by the same BufferAllocator.
> The RootAllocator will return an ArrowBuf that cooperates with Arrow memory 
> accounting after Netty calls {{release(1)}} on the buffer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20093) LlapOutputFomatService: Use ArrowBuf with Netty for Accounting

2018-07-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20093:

Status: Open  (was: Patch Available)

> LlapOutputFomatService: Use ArrowBuf with Netty for Accounting
> --
>
> Key: HIVE-20093
> URL: https://issues.apache.org/jira/browse/HIVE-20093
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20093.1.patch, HIVE-20093.2.patch
>
>
> Combining {{Unpooled.wrappedBuffer}} with Arrow buffers can create corrupted 
> buffers from a buffer-reuse race condition.
> This change ensures Arrow memory is accounted for by the same BufferAllocator.
> The RootAllocator will return an ArrowBuf that cooperates with Arrow memory 
> accounting after Netty calls {{release(1)}} on the buffer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20093) LlapOutputFomatService: Use ArrowBuf with Netty for Accounting

2018-07-05 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534202#comment-16534202
 ] 

Eric Wohlstadter commented on HIVE-20093:
-

Retrying the same patch due to unrelated test failures.

> LlapOutputFomatService: Use ArrowBuf with Netty for Accounting
> --
>
> Key: HIVE-20093
> URL: https://issues.apache.org/jira/browse/HIVE-20093
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20093.1.patch, HIVE-20093.2.patch
>
>
> Combining {{Unpooled.wrappedBuffer}} with Arrow buffers can create corrupted 
> buffers from a buffer-reuse race condition.
> This change ensures Arrow memory is accounted for by the same BufferAllocator.
> The RootAllocator will return an ArrowBuf that cooperates with Arrow memory 
> accounting after Netty calls {{release(1)}} on the buffer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20093) LlapOutputFomatService: Use ArrowBuf with Netty for Accounting

2018-07-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20093:

Status: Patch Available  (was: Open)

> LlapOutputFomatService: Use ArrowBuf with Netty for Accounting
> --
>
> Key: HIVE-20093
> URL: https://issues.apache.org/jira/browse/HIVE-20093
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20093.1.patch
>
>
> Combining {{Unpooled.wrappedBuffer}} with Arrow buffers can create corrupted 
> buffers from a buffer-reuse race condition.
> This change ensures Arrow memory is accounted for by the same BufferAllocator.
> The RootAllocator will return an ArrowBuf that cooperates with Arrow memory 
> accounting after Netty calls {{release(1)}} on the buffer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20093) LlapOutputFomatService: Use ArrowBuf with Netty for Accounting

2018-07-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-20093:

Attachment: HIVE-20093.1.patch

> LlapOutputFomatService: Use ArrowBuf with Netty for Accounting
> --
>
> Key: HIVE-20093
> URL: https://issues.apache.org/jira/browse/HIVE-20093
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-20093.1.patch
>
>
> Combining {{Unpooled.wrappedBuffer}} with Arrow buffers can create corrupted 
> buffers from a buffer-reuse race condition.
> This change ensures Arrow memory is accounted for by the same BufferAllocator.
> The RootAllocator will return an ArrowBuf that cooperates with Arrow memory 
> accounting after Netty calls {{release(1)}} on the buffer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-20093) LlapOutputFomatService: Use ArrowBuf with Netty for Accounting

2018-07-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter reassigned HIVE-20093:
---


> LlapOutputFomatService: Use ArrowBuf with Netty for Accounting
> --
>
> Key: HIVE-20093
> URL: https://issues.apache.org/jira/browse/HIVE-20093
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
>
> Combining {{Unpooled.wrappedBuffer}} with Arrow buffers can create corrupted 
> buffers from a buffer-reuse race condition.
> This change ensures Arrow memory is accounted for by the same BufferAllocator.
> The RootAllocator will return an ArrowBuf that cooperates with Arrow memory 
> accounting after Netty calls {{release(1)}} on the buffer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19853) Arrow serializer needs to create a TimeStampMicroTZVector instead of TimeStampMicroVector

2018-06-16 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514855#comment-16514855
 ] 

Eric Wohlstadter commented on HIVE-19853:
-

The test failure was previously reported here, so I assume it is unrelated:

https://issues.apache.org/jira/browse/HIVE-19922

[https://builds.apache.org/job/PreCommit-HIVE-Build/11826/testReport/]

 

 

> Arrow serializer needs to create a TimeStampMicroTZVector instead of 
> TimeStampMicroVector
> -
>
> Key: HIVE-19853
> URL: https://issues.apache.org/jira/browse/HIVE-19853
> Project: Hive
>  Issue Type: Bug
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-19853.1.patch, HIVE-19853.2.patch
>
>
> HIVE-19723 changed nanosecond to microsecond in Arrow serialization. However, 
> it needs to be microsecond with time zone.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19853) Arrow serializer needs to create a TimeStampMicroTZVector instead of TimeStampMicroVector

2018-06-15 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514396#comment-16514396
 ] 

Eric Wohlstadter commented on HIVE-19853:
-

lgtm

[~mmccline], can you merge to master and branch-3?

Thanks!

> Arrow serializer needs to create a TimeStampMicroTZVector instead of 
> TimeStampMicroVector
> -
>
> Key: HIVE-19853
> URL: https://issues.apache.org/jira/browse/HIVE-19853
> Project: Hive
>  Issue Type: Bug
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-19853.1.patch, HIVE-19853.2.patch
>
>
> HIVE-19723 changed nanosecond to microsecond in Arrow serialization. However, 
> it needs to be microsecond with time zone.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-19723) Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"

2018-06-08 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506809#comment-16506809
 ] 

Eric Wohlstadter edited comment on HIVE-19723 at 6/9/18 4:24 AM:
-

[~teddy.choi]

Serializer needs to create a {{TimeStampMicroTZVector}} instead of 
{{TimeStampMicroVector}}. 

See: 
{{org.apache.spark.sql.vectorized.ArrowColumnVector.ArrowColumnVector(ValueVector
 vector)}}

Can you create a new JIRA for that?


was (Author: ewohlstadter):
[~teddy.choi]

Serializer needs to create a {{TimeStampMicroTZVector}} instead of 
{{TimeStampMicroVector}}. 

See: 
{{org.apache.spark.sql.vectorized.ArrowColumnVector.ArrowColumnVector(ValueVector
 vector)}}

> Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"
> -
>
> Key: HIVE-19723
> URL: https://issues.apache.org/jira/browse/HIVE-19723
> Project: Hive
>  Issue Type: Bug
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
> Attachments: HIVE-19723.1.patch, HIVE-19723.3.patch, 
> HIVE-19723.4.patch, HIVE-19732.2.patch
>
>
> Spark's Arrow support only provides Timestamp at MICROSECOND granularity. 
> Spark 2.3.0 won't accept NANOSECOND. Switch it back to MICROSECOND.
> The unit test org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow will just need 
> to change the assertion to test microsecond. And we'll need to add this to 
> documentation on supported datatypes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19723) Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"

2018-06-08 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506809#comment-16506809
 ] 

Eric Wohlstadter commented on HIVE-19723:
-

[~teddy.choi]

Serializer needs to create a {{TimeStampMicroTZVector}} instead of 
{{TimeStampMicroVector}}. 

See: 
{{org.apache.spark.sql.vectorized.ArrowColumnVector.ArrowColumnVector(ValueVector
 vector)}}
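[Editorial note] For context, a small illustration (not Hive's serializer code) of the difference, written against recent Arrow Java; the exact class and constructor shapes may differ in the Arrow version Hive ships with. The TZ variant carries a timezone in its {{ArrowType.Timestamp}}, which is what Spark's ArrowColumnVector expects for MICROSECOND timestamps.

{code:java}
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.TimeStampMicroTZVector;
import org.apache.arrow.vector.types.TimeUnit;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.FieldType;

// Build a microsecond timestamp vector that declares a timezone ("UTC" here),
// instead of the timezone-less TimeStampMicroVector.
public final class TimestampVectorSketch {
  public static void main(String[] args) {
    try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE)) {
      FieldType microsUtc =
          FieldType.nullable(new ArrowType.Timestamp(TimeUnit.MICROSECOND, "UTC"));
      try (TimeStampMicroTZVector ts = new TimeStampMicroTZVector("ts", microsUtc, allocator)) {
        ts.allocateNew(1);
        ts.setSafe(0, 1_528_502_400_000_000L);   // microseconds since the epoch
        ts.setValueCount(1);
      }
    }
  }
}
{code}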

> Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"
> -
>
> Key: HIVE-19723
> URL: https://issues.apache.org/jira/browse/HIVE-19723
> Project: Hive
>  Issue Type: Bug
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0
>
> Attachments: HIVE-19723.1.patch, HIVE-19723.3.patch, 
> HIVE-19723.4.patch, HIVE-19732.2.patch
>
>
> Spark's Arrow support only provides Timestamp at MICROSECOND granularity. 
> Spark 2.3.0 won't accept NANOSECOND. Switch it back to MICROSECOND.
> The unit test org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow will just need 
> to change the assertion to test microsecond. And we'll need to add this to 
> documentation on supported datatypes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HIVE-19839) sssssssssssss

2018-06-08 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter resolved HIVE-19839.
-
Resolution: Invalid

> s
> -
>
> Key: HIVE-19839
> URL: https://issues.apache.org/jira/browse/HIVE-19839
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, UDF
>Affects Versions: 2.3.1
>Reporter: sadashiv
>Priority: Major
> Fix For: 0.10.1
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19808) GenericUDTFGetSplits should support ACID reads in the temp. table read path

2018-06-07 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505656#comment-16505656
 ] 

Eric Wohlstadter commented on HIVE-19808:
-

[~jdere]

Got a green run. Can you help merge to master and branch-3?

> GenericUDTFGetSplits should support ACID reads in the temp. table read path
> ---
>
> Key: HIVE-19808
> URL: https://issues.apache.org/jira/browse/HIVE-19808
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-19808.1.patch, HIVE-19808.2.patch
>
>
> 1. Map-only reads work on ACID tables.
> 2. Temp. table reads (for multi-vertex queries) work on non-ACID tables.
> 3. But temp. table reads don't work on ACID tables.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create 
> temp table: java.lang.IllegalStateException: calling recordValidTxn() more 
> than once in the same txnid:420
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:303)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:202)
>   at 
> org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:918)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:492)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:484)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:145)
>   ... 16 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19808) GenericUDTFGetSplits should support ACID reads in the temp. table read path

2018-06-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-19808:

Status: Patch Available  (was: Open)

> GenericUDTFGetSplits should support ACID reads in the temp. table read path
> ---
>
> Key: HIVE-19808
> URL: https://issues.apache.org/jira/browse/HIVE-19808
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-19808.1.patch, HIVE-19808.2.patch
>
>
> 1. Map-only reads work on ACID tables.
> 2. Temp. table reads (for multi-vertex queries) work on non-ACID tables.
> 3. But temp. table reads don't work on ACID tables.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create 
> temp table: java.lang.IllegalStateException: calling recordValidTxn() more 
> than once in the same txnid:420
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:303)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:202)
>   at 
> org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:918)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:492)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:484)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:145)
>   ... 16 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19808) GenericUDTFGetSplits should support ACID reads in the temp. table read path

2018-06-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-19808:

Attachment: HIVE-19808.2.patch

> GenericUDTFGetSplits should support ACID reads in the temp. table read path
> ---
>
> Key: HIVE-19808
> URL: https://issues.apache.org/jira/browse/HIVE-19808
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-19808.1.patch, HIVE-19808.2.patch
>
>
> 1. Map-only reads work on ACID tables.
> 2. Temp. table reads (for multi-vertex queries) work on non-ACID tables.
> 3. But temp. table reads don't work on ACID tables.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create 
> temp table: java.lang.IllegalStateException: calling recordValidTxn() more 
> than once in the same txnid:420
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:303)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:202)
>   at 
> org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:918)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:492)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:484)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:145)
>   ... 16 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19808) GenericUDTFGetSplits should support ACID reads in the temp. table read path

2018-06-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-19808:

Status: Open  (was: Patch Available)

> GenericUDTFGetSplits should support ACID reads in the temp. table read path
> ---
>
> Key: HIVE-19808
> URL: https://issues.apache.org/jira/browse/HIVE-19808
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-19808.1.patch
>
>
> 1. Map-only reads work on ACID tables.
> 2. Temp. table reads (for multi-vertex queries) work on non-ACID tables.
> 3. But temp. table reads don't work on ACID tables.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create 
> temp table: java.lang.IllegalStateException: calling recordValidTxn() more 
> than once in the same txnid:420
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:303)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:202)
>   at 
> org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:918)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:492)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:484)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:145)
>   ... 16 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19808) GenericUDTFGetSplits should support ACID reads in the temp. table read path

2018-06-05 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502579#comment-16502579
 ] 

Eric Wohlstadter commented on HIVE-19808:
-

[~jdere] [~prasanth_j]

[https://reviews.apache.org/r/67462/]

 

> GenericUDTFGetSplits should support ACID reads in the temp. table read path
> ---
>
> Key: HIVE-19808
> URL: https://issues.apache.org/jira/browse/HIVE-19808
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-19808.1.patch
>
>
> 1. Map-only reads work on ACID tables.
> 2. Temp. table reads (for multi-vertex queries) work on non-ACID tables.
> 3. But temp. table reads don't work on ACID tables.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create 
> temp table: java.lang.IllegalStateException: calling recordValidTxn() more 
> than once in the same txnid:420
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:303)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:202)
>   at 
> org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:918)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:492)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:484)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:145)
>   ... 16 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19808) GenericUDTFGetSplits should support ACID reads in the temp. table read path

2018-06-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-19808:

Status: Patch Available  (was: Open)

> GenericUDTFGetSplits should support ACID reads in the temp. table read path
> ---
>
> Key: HIVE-19808
> URL: https://issues.apache.org/jira/browse/HIVE-19808
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-19808.1.patch
>
>
> 1. Map-only reads work on ACID tables.
> 2. Temp. table reads (for multi-vertex queries) work on non-ACID tables.
> 3. But temp. table reads don't work on ACID tables.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create 
> temp table: java.lang.IllegalStateException: calling recordValidTxn() more 
> than once in the same txnid:420
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:303)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:202)
>   at 
> org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:918)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:492)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:484)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:145)
>   ... 16 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19808) GenericUDTFGetSplits should support ACID reads in the temp. table read path

2018-06-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-19808:

Attachment: HIVE-19808.1.patch

> GenericUDTFGetSplits should support ACID reads in the temp. table read path
> ---
>
> Key: HIVE-19808
> URL: https://issues.apache.org/jira/browse/HIVE-19808
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-19808.1.patch
>
>
> 1. Map-only reads work on ACID tables.
> 2. Temp. table reads (for multi-vertex queries) work on non-ACID tables.
> 3. But temp. table reads don't work on ACID tables.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create 
> temp table: java.lang.IllegalStateException: calling recordValidTxn() more 
> than once in the same txnid:420
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:303)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:202)
>   at 
> org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:918)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:492)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:484)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:145)
>   ... 16 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19808) GenericUDTFGetSplits should support ACID reads in the temp. table read path

2018-06-05 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502563#comment-16502563
 ] 

Eric Wohlstadter commented on HIVE-19808:
-

[~ekoifman]

Yeah.

What happens in GenericUDTFGetSplits is: 
{code:java}
"create temporary table " + tableName + " as " + query
{code}
and then the temp table is read into LLAP and exported by the 
{{LlapOutputFormatService}}. 

Currently if {{query}} references an ACID table, then it fails.

The temp table itself is not ACID; the issue occurs when any of the source 
tables are ACID.
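
A minimal, hedged sketch of that flow (the class, query text, and temp-table 
naming are illustrative assumptions, not the actual GenericUDTFGetSplits 
internals):
{code:java}
import java.util.UUID;

public class GetSplitsTempTableSketch {
  public static void main(String[] args) {
    // Hypothetical user query; it may reference an ACID source table.
    String query = "SELECT key, value FROM acid_source_table";
    // GenericUDTFGetSplits wraps the user query in a CTAS for a temp table.
    String tableName = "tmp_getsplits_" + UUID.randomUUID().toString().replace('-', '_');
    String ctas = "create temporary table " + tableName + " as " + query;
    System.out.println(ctas);
    // The (non-ACID) temp table is then scanned in LLAP and streamed out via
    // LlapOutputFormatService. The IllegalStateException in the stack trace
    // above is thrown while running the CTAS when `query` reads an ACID table,
    // because recordValidTxn() has already been called for the enclosing txn.
  }
}
{code}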

> GenericUDTFGetSplits should support ACID reads in the temp. table read path
> ---
>
> Key: HIVE-19808
> URL: https://issues.apache.org/jira/browse/HIVE-19808
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
>
> 1. Map-only reads work on ACID tables.
> 2. Temp. table reads (for multi-vertex queries) work on non-ACID tables.
> 3. But temp. table reads don't work on ACID tables.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create 
> temp table: java.lang.IllegalStateException: calling recordValidTxn() more 
> than once in the same txnid:420
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:303)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:202)
>   at 
> org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:918)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:492)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:484)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:145)
>   ... 16 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-19808) GenericUDTFGetSplits should support ACID reads in the temp. table read path

2018-06-05 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502563#comment-16502563
 ] 

Eric Wohlstadter edited comment on HIVE-19808 at 6/5/18 9:59 PM:
-

[~ekoifman] [~jdere]

Yeah.

What happens in GenericUDTFGetSplits is: 
{code:java}
"create temporary table " + tableName + " as " + query
{code}
and then the temp table is read into LLAP and exported by the 
{{LlapOutputFormatService}}.

Currently if {{query}} references an ACID table, then it fails.

The temp table itself is not ACID; the issue occurs when any of the source 
tables are ACID.


was (Author: ewohlstadter):
[~ekoifman]

Yeah.

What happens in GenericUDTFGetSplits is: 
{code:java}
"create temporary table " + tableName + " as " + query
{code}
and then the temp table is read into LLAP and exported by the 
{{LlapOutputFormatService}}. 

Currently if {{query}} references an ACID table, then it fails.

The temp table itself is not ACID; the issue occurs when any of the source 
tables are ACID.

> GenericUDTFGetSplits should support ACID reads in the temp. table read path
> ---
>
> Key: HIVE-19808
> URL: https://issues.apache.org/jira/browse/HIVE-19808
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
>
> 1. Map-only reads work on ACID tables.
> 2. Temp. table reads (for multi-vertex queries) work on non-ACID tables.
> 3. But temp. table reads don't work on ACID tables.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create 
> temp table: java.lang.IllegalStateException: calling recordValidTxn() more 
> than once in the same txnid:420
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:303)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:202)
>   at 
> org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:918)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:492)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:484)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:145)
>   ... 16 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-19808) GenericUDTFGetSplits should support ACID reads in the temp. table read path

2018-06-05 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter reassigned HIVE-19808:
---


> GenericUDTFGetSplits should support ACID reads in the temp. table read path
> ---
>
> Key: HIVE-19808
> URL: https://issues.apache.org/jira/browse/HIVE-19808
> Project: Hive
>  Issue Type: Bug
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
>
> 1. Map-only reads work on ACID tables.
> 2. Temp. table reads (for multi-vertex queries) work on non-ACID tables.
> 3. But temp. table reads don't work on ACID tables.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create 
> temp table: java.lang.IllegalStateException: calling recordValidTxn() more 
> than once in the same txnid:420
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.createPlanFragment(GenericUDTFGetSplits.java:303)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:202)
>   at 
> org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:918)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:985)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:931)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:492)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:484)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:145)
>   ... 16 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-19723) Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"

2018-06-01 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498593#comment-16498593
 ] 

Eric Wohlstadter edited comment on HIVE-19723 at 6/1/18 9:20 PM:
-

[~teddy.choi]

Hive's Arrow serializer appears to truncate down to MILLISECONDS, but the Jira 
description calls for MICROSECONDS.

This is motivated by {{org.apache.spark.sql.execution.arrow.ArrowUtils.scala}}
{code:java}
case ts: ArrowType.Timestamp if ts.getUnit == TimeUnit.MICROSECOND => 
TimestampType{code}

My understanding is that, since the primary use case for {{ArrowUtils}} is 
Python integration, some of the conversions are currently tailored to Python. 
Perhaps Python/Pandas only supports MICROSECOND timestamps. 

FYI: [~hyukjin.kwon] [~bryanc]




was (Author: ewohlstadter):
[~teddy.choi]

The Arrow serializer appears to truncate down to MILLISECONDS, but the Jira 
description calls for MICROSECONDS.

This is motivated by {{org.apache.spark.sql.execution.arrow.ArrowUtils.scala}}
{code:java}
case ts: ArrowType.Timestamp if ts.getUnit == TimeUnit.MICROSECOND => 
TimestampType{code}

My understanding is that, since the primary use case for {{ArrowUtils}} is 
Python integration, some of the conversions are currently tailored to Python. 
Perhaps Python/Pandas only supports MICROSECOND timestamps. 

FYI: [~hyukjin.kwon] [~bryanc]



> Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"
> -
>
> Key: HIVE-19723
> URL: https://issues.apache.org/jira/browse/HIVE-19723
> Project: Hive
>  Issue Type: Bug
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19723.1.patch, HIVE-19732.2.patch
>
>
> Spark's Arrow support only provides Timestamp at MICROSECOND granularity. 
> Spark 2.3.0 won't accept NANOSECOND. Switch it back to MICROSECOND.
> The unit test org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow will just need its 
> assertion changed to test microsecond granularity, and we'll need to add this to 
> the documentation on supported data types.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19723) Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"

2018-06-01 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498593#comment-16498593
 ] 

Eric Wohlstadter commented on HIVE-19723:
-

[~teddy.choi]

The Arrow serializer appears to truncate down to MILLISECONDS, but the Jira 
description calls for MICROSECONDS.

This is motivated by {{org.apache.spark.sql.execution.arrow.ArrowUtils.scala}}
{code:java}
case ts: ArrowType.Timestamp if ts.getUnit == TimeUnit.MICROSECOND => 
TimestampType{code}

My understanding is that, since the primary use case for {{ArrowUtils}} is 
Python integration, some of the conversions are currently tailored to Python. 
Perhaps Python/Pandas only supports MICROSECOND timestamps. 

FYI: [~hyukjin.kwon] [~bryanc]
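
For illustration, a hedged sketch of the granularity mismatch (the epoch value 
is made up; only the unit arithmetic matters):
{code:java}
public class TimestampGranularitySketch {
  public static void main(String[] args) {
    // Hypothetical epoch timestamp captured at nanosecond precision.
    long nanos = 1_527_811_200_123_456_789L;
    long micros = nanos / 1_000L;       // what a MICROSECOND consumer (Spark 2.3, Pandas) expects
    long millis = nanos / 1_000_000L;   // what a MILLISECOND serializer would emit
    System.out.println("micros=" + micros + ", millis=" + millis);
    // Truncating to MILLISECOND drops the last three digits of sub-second
    // precision that a MICROSECOND reader would otherwise preserve.
  }
}
{code}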



> Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"
> -
>
> Key: HIVE-19723
> URL: https://issues.apache.org/jira/browse/HIVE-19723
> Project: Hive
>  Issue Type: Bug
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19723.1.patch, HIVE-19732.2.patch
>
>
> Spark's Arrow support only provides Timestamp at MICROSECOND granularity. 
> Spark 2.3.0 won't accept NANOSECOND. Switch it back to MICROSECOND.
> The unit test org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow will just need its 
> assertion changed to test microsecond granularity, and we'll need to add this to 
> the documentation on supported data types.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19305) Arrow format for LlapOutputFormatService (umbrella)

2018-05-29 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494347#comment-16494347
 ] 

Eric Wohlstadter commented on HIVE-19305:
-

[~ashutoshc]

Yes, except that HIVE-19713 was also added to get past the storage-handler 
version problem.

> Arrow format for LlapOutputFormatService (umbrella)
> ---
>
> Key: HIVE-19305
> URL: https://issues.apache.org/jira/browse/HIVE-19305
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-19305.1-branch-3.patch, 
> HIVE-19305.2-branch-3.patch, HIVE-19305.3-branch-3.patch, 
> HIVE-19305.4-branch-3.patch
>
>
> Allows external clients to consume output from LLAP daemons in Arrow stream 
> format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19682) Provide option for GenericUDTFGetSplits to return only schema metadata

2018-05-28 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-19682:

Status: Open  (was: Patch Available)

> Provide option for GenericUDTFGetSplits to return only schema metadata
> --
>
> Key: HIVE-19682
> URL: https://issues.apache.org/jira/browse/HIVE-19682
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-19682.1.patch, HIVE-19682.2.patch, 
> HIVE-19682.3.patch
>
>
> For some use cases it is necessary to know the output schema of a HiveQL query 
> before executing it, but there is no existing client API that provides this 
> information.
> Hive JDBC doesn't provide the schema for parametric types in 
> {{ResultSetMetaData}}.
> GenericUDTFGetSplits bundles the proper schema metadata with the fragments 
> for input splits. An option can be added to return only the schema metadata 
> from compilation, and the generation of input splits can be skipped.
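
A rough usage sketch of the proposed option follows; the {{get_splits}} 
invocation, the JDBC URL, and the use of a zero split count as the schema-only 
trigger are assumptions for illustration, not the final API defined by the patch.
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class GetSplitsSchemaOnlySketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical HiveServer2 endpoint; adjust the URL for a real cluster and
    // make sure the Hive JDBC driver is on the classpath.
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement();
         // get_splits is the UDTF backed by GenericUDTFGetSplits; with the proposed
         // option, a call like this would return only the schema metadata and skip
         // input-split generation.
         ResultSet rs = stmt.executeQuery("SELECT get_splits('SELECT * FROM src', 0)")) {
      while (rs.next()) {
        System.out.println(rs.getObject(1));
      }
    }
  }
}
{code}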



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19682) Provide option for GenericUDTFGetSplits to return only schema metadata

2018-05-28 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-19682:

Attachment: HIVE-19682.3.patch

> Provide option for GenericUDTFGetSplits to return only schema metadata
> --
>
> Key: HIVE-19682
> URL: https://issues.apache.org/jira/browse/HIVE-19682
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-19682.1.patch, HIVE-19682.2.patch, 
> HIVE-19682.3.patch
>
>
> For some use cases it is necessary to know the output schema of a HiveQL query 
> before executing it, but there is no existing client API that provides this 
> information.
> Hive JDBC doesn't provide the schema for parametric types in 
> {{ResultSetMetaData}}.
> GenericUDTFGetSplits bundles the proper schema metadata with the fragments 
> for input splits. An option can be added to return only the schema metadata 
> from compilation, and the generation of input splits can be skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19682) Provide option for GenericUDTFGetSplits to return only schema metadata

2018-05-28 Thread Eric Wohlstadter (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wohlstadter updated HIVE-19682:

Status: Patch Available  (was: Open)

> Provide option for GenericUDTFGetSplits to return only schema metadata
> --
>
> Key: HIVE-19682
> URL: https://issues.apache.org/jira/browse/HIVE-19682
> Project: Hive
>  Issue Type: Improvement
>Reporter: Eric Wohlstadter
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: HIVE-19682.1.patch, HIVE-19682.2.patch, 
> HIVE-19682.3.patch
>
>
> For some use cases it is necessary to know the output schema of a HiveQL query 
> before executing it, but there is no existing client API that provides this 
> information.
> Hive JDBC doesn't provide the schema for parametric types in 
> {{ResultSetMetaData}}.
> GenericUDTFGetSplits bundles the proper schema metadata with the fragments 
> for input splits. An option can be added to return only the schema metadata 
> from compilation, and the generation of input splits can be skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

